Technical Article

A short and unscientific trawl through our support call database has revealed Summarizations to be the answer to an unexpectedly high number of client enquiries. This article aims to explain, in detail, the workings of Summarizations and to highlight some of the ways that summarizations can be utilized, sometimes alone, sometimes in collaboration with other tools and features of IDEA.

Found under IDEA's Analysis menu, the primary objective of a summarization is to return totals for unique items (or values) found within a given field or a combination of fields.

There are two variations of Summarization available; the Quick Summarization is designed to search a single field for unique values (known in IDEAspeak as a "key" value), whereas the Key Field Summarization can search a combination of fields for the unique "key". Both variations re-organise the data behind the scenes and add up values to facilitate the generation of the results but they employ slightly different methods to reach this end.

A more extensive understanding of what each one does can help the user decide which option is the best for the task in hand.

The Quick option will temporarily "index" the file by the single, selected "Summarization Field" in a way intended to speed the process up (hence the name) this leads to another "feature" of the Quick summarization; it is incapable of displaying more than 32,000 different key values even if it finds more.

In contrast, the Key Field Summarization allows the user to create a "key" made from a combination of up to eight fields (although up to three or four is the more likely limit), the key used will be become a fixed "index" or specific order of records


Setting Up

Quick Field Summarization.

Open the database to be examined as the active database, select the Analysis menu then Field Summarization... Quick.

A dialog box opens and presents the following options:-
Section 1 Summarization Field: the field to be searched for unique items, use the drop down button to select a field from the list of all the fields in the active database.
Section 2 Fields to Summarize: the numeric field(s) to be added up to give a total for each unique item within the Summarization field, I prefer to call these "Fields to Total". Select a single field by clicking on the field name in the presented list; more than one field can be selected for "totalling" by holding down the Shift key and clicking on consecutive fields from the list or by holding down the Ctrl Control) key and clicking on the desired fields if not consecutive. Depending on the information required in the result of the test it is not always necessary to select any fields.
Section 3 Create Summarization Database: (The default option) Click on the check box to select or de-select this option. This option means that when the test is completed the results will be created in the form of a new IDEA database (.IMD file).

Create Summarization Result: Click on the check box to select or de-select this option. This option is used when the results of the test are required as a "Results View" attached to the original database, it produces this as an additional "tab" on the active database view.

Note: the two results options outlined above have different properties and can be used in different ways, see results below for more detail. They are not mutually exclusive so either or both can be selected.

QFS TAB 4.gif (865 bytes) Criteria: Click on the Equation Editor icon to open the Equation Editor dialog box to add a "criteria" or "equation" to act as filter or control which records the test is applied to.
QFS TAB 5.gif (907 bytes) File Name: A text box to allow the user to type in a meaningful name for the result of the test. This only applies to the Summarization Database created by the test and therefore the option disabled if only the Summarization Result option is selected.
QFS TAB 6.gif (915 bytes) Help button: Clicking this button opens the on-line help pages related to the Quick Field Summarization task. On these pages a description of the test can be found along with a step by step guide to setting up the test, an example of the application of the test, hints and tips and related topics.
QFS TAB 7.gif (900 bytes) Cancel button: Click this button to leave the dialog box without running the test; all settings are restored to the defaults.

QFS TAB 8.gif (876 bytes) Fields button: Clicking on this button opens a sub dialog box.

The Fields dialog box is where additional fields can be selected to add data to each record created in the Summarization result. Note. When applied to a quick field summarization this additional data will be obtained from the first record (after indexing) holding information for each unique item. By default, no additional fields are selected, click on a field name to select it; multiple fields can be selected by clicking on further field names, there are Include All and Clear All buttons available if required. The Cancel button will return the user to the main dialog box without including any selected fields, the Help will open the on-line help to page describing the function of the Fields dialog box and the OK button will return the user to the main dialog box and fields selected will added to the Summarization Database.
QFS TAB 9.gif (870 bytes) OK button: Invokes the running of the test, click this button when the test has been set up as required.

Key Field Summarization

Open the database to be examined as the active database, select the Analysis menu then Field Summarization... Key Field.

A dialog box opens and presents similar options to those on the "Quick Field Summarization" diaolg box, however there are some differences:-

Key: The dialog box for the Key Field Summarization is almost the same as for the Quick Summarization. However, it differs in that the Summarization Field box is replaced with a Key button. Clicking on this button opens another dialog box where a "key" can be created made up of up to eight fields. The "unique items" that the Summarization will group together is made of the values in the "key" fields selected.
Use fields from... : Also added to the Key Field Summarization is the ability to select whether to acquire data from the additional fields selected by clicking the Fields button from the first or last occurrence of the Key. This means that after the summarization has found and grouped together all records sharing the same "key " values, data from the additional fields are added to the Summarization database from either the first record in the list or the last.


Operation of the test

After the task is set up and run, IDEA will search through all the records in the active database, looking at the specified “Summarization field” or “Key” fields, it finds and then collects the records with matching unique items or values together. This is done behind the scenes, and once IDEA has the records grouped together by the values in the Summarization field, IDEA is then in a position to add up the number of times each unique or “key” value appears in the list, returned as a “number of records” for each “key”.

IDEA will also take the numeric values in the fields selected as “Fields to Summarize” and add them up for each “key” value returning an accumulated value for each “key”.

In addition to this, the process will make a record of the value(s) from any additional fields selected from the Fields to Include dialog box, this value will taken from the first record found for each “key” after IDEA has grouped the “keys” together. This additional information about each “key” is only available if a Summarization Database is created by the test.

Results

As mentioned earlier the test can output its results in two ways;-

Summarization Database

The Summarization Database output is a simple view of the result of the test. It shows the data in a tabular format with a column(s) listing the “summarization” or “key” fields, one row for different variation of “key”, the number of records each “key” is found in, and a column for each numeric field that has been selected for totalling, if any (these are all displayed in the same table). This type of result is where any fields of additional data, (those selected by clicking on the Fields button) are displayed – remember that data in these fields is collected from the first “occurrence” of a key when a Quick Field Summarization is used and may not be representative of every record for the “key”.

Summarization Result

The more flexible and some say, more useful form of output is the Summarization Result.

The result is found attached to the original database as an extra tab appearing at the base of the database window (initially labelled “Quick Field Summarization” or “Key Field Summarization”, the name can be changed to be more descriptive by double-clicking on the tab and overtyping the default name). The result is displayed in a tabular form, each different “key” value is shown on a different line and along side each value is the number of records for the “key” or, to put it another way, the number of times each “key “ appears in the original database.

A third column is added if any Numeric fields have been selected as “Fields to Summarize”, this will display the accumulated value for each “key”. If more than one “Fields to Summarize” has been selected then the values in the third column can be switched by clicking on the dropdown box beside the text field near the top of the view labelled “Field to Summarize” and then selecting a different field from the list.

An interesting feature of this type of output is the ability to “drill-down” to the records for each “key”. “Drill-down” can be activated in two ways; double-clicking on the line of interest or by clicking on the line once to highlight it and then clicking on the “drill-down” icon - at the top of the view. A preview of the records for the selected “key” is displayed in a separate window; this can saved as a new database (click the Save button and you will be prompted to give a name for the new database), sent to a printer (click Print) or just viewed and then closed (click Done). As with most dialog boxes within IDEA there is also a Help button to open the on-line about the current dialog box. NOTE: Although it is possible to highlight more than one line in the table only the top-most will be used for the “drill-down”.

There are other options for further use of the view;

The Extract icon - allows the user to create a new database of the records for a “key” – shortcutting the “drill-down” method.

The results can be converted to be displayed as a graph -, provided there are not too many different “keys”, by clicking on the Graph view icon .

There are a variety of different graph types to choose from and many other options to allow the user to display and distribute the results. “Drill-down” and “Extract” can also be used on graphs to get to the records related to a particular “key”.

The view can printed in either tabular or graphical form by clicking the Print icon , (there is also a Print Preview icon only available in the tabular view)

The final option is to save or export the view into a different file format , there are 22 to choose from including Excel worksheet and Word document options.


Examples of Summarization in action

  • A database includes information about a number of transactions, the individual transactions are made up of different elements each of which occupies a different row of the database. Different types of transaction have different numbers of elements and each element has a value related to it. To find out the value of an individual transaction the database needs to be summarized.
    • The “summarization field” or “key” of the summarization task would the field or combination of fields that uniquely identify a transaction. This would dictate whether a Quick Field Summarization or Key Field Summarization is used; if transactions are uniquely identified by information in more than one column then the latter should be employed.
    • To produce totals of fields containing the values related to each element of a transaction these fields must be selected as “Fields to Summarize”. Remember, if necessary, more than one field can be selected to be totalled.
    • Decide on what needs to be done with result and select the appropriate output option. For instance, if the result is and end in itself and only needs to be displayed as part of a report to others then the “Summarization Result” is more than likely sufficient. However, if the data contains several fields of values to be totalled and these totals also have to added together to form a grand total for each key then a “Summarization Database” is better suited. Should the need arise both options can be selected.
    • This is all that is generally required to perform a summarization, most of the time no additional data needs to be added by making selection from the Fields to Include dialog box.

  • In the next example a customer number, appears on a number of records, you have been asked to find out the last date each customer number occurred in the file. The database fortunately includes a date field otherwise the task would be impossible.
    • Firstly, it could be dangerous to assume that the file is in date order so a “Sorted Database” should be created, using the date field/ascending as the order to sort the file upon.
    • With the data recreated in the new “Sorted” database but now in date order a summarization can be performed. As the file is now in date ascending order the last occurrence of each different customer number should be last date on which it has been recorded. To obtain this a “Key Field Summarization” is required.
    • The “key” for the summarization should the field containing the customer number.
    • Any fields containing values to be totalled should selected from the Fields to Summarize list, although to arrive at the desired result this step is not strictly necessary.
    • Ensure that a Database Result will be created.
    • Select the “Use Fields from Last Occurrence” option, then click on the “Fields” button and select the date field from the list. Click OK to close the Fields to include dialog box and lastly click OK to run the test.
    • The Database Result will display a list of all the “key” values or in this example, customer numbers, found in the file and how many times the key occurs in the file tested. It will also display a date field containing the date from the last occurrence of each customer number; this will be the last date of occurrence.
Please let us know if there any areas of IDEA use you would like to see covered in future articles.

AuditWare Systems Limited:
The Old Sawmills, Eridge Road, Eridge Green, Tunbridge Wells, Kent, TN3 9JR
Telephone: + 44 (0) 1892 512348 / Fax: + 44 (0) 1892 512342