|
Technical
Article |
A short and unscientific trawl through our support
call database has revealed Summarizations to be the answer to an unexpectedly
high number of client enquiries. This article aims to explain, in
detail, the workings of Summarizations and to highlight some of the
ways that summarizations can be utilized, sometimes alone, sometimes
in collaboration with other tools and features of IDEA.
Found under IDEA's Analysis menu, the primary objective
of a summarization is to return totals for unique items (or values)
found within a given field or a combination of fields.
There are two variations of Summarization available;
the Quick Summarization is designed to search a single field for unique
values (known in IDEAspeak as a "key" value), whereas the Key Field
Summarization can search a combination of fields for the unique "key".
Both variations re-organise the data behind the scenes and add up
values to facilitate the generation of the results but they employ
slightly different methods to reach this end.
A more extensive understanding of what each one does
can help the user decide which option is the best for the task in
hand.
The Quick option will temporarily "index" the file
by the single, selected "Summarization Field" in a way intended to
speed the process up (hence the name) this leads to another "feature"
of the Quick summarization; it is incapable of displaying more than
32,000 different key values even if it finds more.
In contrast, the Key Field Summarization allows the
user to create a "key" made from a combination of up to eight fields
(although up to three or four is the more likely limit), the key used
will be become a fixed "index" or specific order of records
Setting Up
Quick Field Summarization.
Open the database to be examined as the active database,
select the Analysis menu then Field Summarization... Quick.
A dialog box opens and presents the following options:-
|
 |
Summarization Field: the field to be searched
for unique items, use the drop down button to select a field from
the list of all the fields in the active database. |
|
Fields to Summarize: the numeric field(s) to
be added up to give a total for each unique item within the Summarization
field, I prefer to call these "Fields to Total". Select
a single field by clicking on the field name in the presented
list; more than one field can be selected for "totalling"
by holding down the Shift key and clicking on consecutive fields
from the list or by holding down the Ctrl Control) key and clicking
on the desired fields if not consecutive. Depending on the information
required in the result of the test it is not always necessary
to select any fields. |
|
Create Summarization Database: (The default option)
Click on the check box to select or de-select this option. This
option means that when the test is completed the results will
be created in the form of a new IDEA database (.IMD file).
Create Summarization Result: Click on the
check box to select or de-select this option. This option is
used when the results of the test are required as a "Results
View" attached to the original database, it produces this
as an additional "tab" on the active database view.
Note: the two results options outlined above have different
properties and can be used in different ways, see results below
for more detail. They are not mutually exclusive so either or
both can be selected. |
|
Criteria: Click on the Equation Editor icon to
open the Equation Editor dialog box to add a "criteria"
or "equation" to act as filter or control which records
the test is applied to. |
|
File Name: A text box to allow the user to type
in a meaningful name for the result of the test. This only applies
to the Summarization Database created by the test and therefore
the option disabled if only the Summarization Result option is
selected. |
|
Help button: Clicking this button opens the on-line
help pages related to the Quick Field Summarization task. On these
pages a description of the test can be found along with a step
by step guide to setting up the test, an example of the application
of the test, hints and tips and related topics. |
|
Cancel button: Click this button to leave the
dialog box without running the test; all settings are restored
to the defaults. |
|
Fields button: Clicking on this button opens
a sub dialog box.
The Fields dialog box is where additional fields can be selected
to add data to each record created in the Summarization result.
Note. When applied to a quick field summarization this additional
data will be obtained from the first record (after indexing) holding
information for each unique item. By default, no additional fields
are selected, click on a field name to select it; multiple fields
can be selected by clicking on further field names, there are
Include All and Clear All buttons available if required. The Cancel
button will return the user to the main dialog box without including
any selected fields, the Help will open the on-line help to page
describing the function of the Fields dialog box and the OK button
will return the user to the main dialog box and fields selected
will added to the Summarization Database. |
|
OK button: Invokes the running of the test, click
this button when the test has been set up as required. |
|
|
Key Field Summarization
Open the database to be examined as the active database,
select the Analysis menu then Field Summarization... Key Field.
A dialog box opens and presents similar options to
those on the "Quick Field Summarization" diaolg box, however
there are some differences:- |
 |
Key: The dialog box for the Key Field Summarization
is almost the same as for the Quick Summarization. However, it
differs in that the Summarization Field box is replaced with a
Key button. Clicking on this button opens another dialog box where
a "key" can be created made up of up to eight fields.
The "unique items" that the Summarization will group
together is made of the values in the "key" fields selected.
|
|
Use fields from... : Also added to the Key Field
Summarization is the ability to select whether to acquire data
from the additional fields selected by clicking the Fields button
from the first or last occurrence of the Key. This means that
after the summarization has found and grouped together all records
sharing the same "key " values, data from the additional
fields are added to the Summarization database from either the
first record in the list or the last. |
|
|
Operation of the test
After the task is set
up and run, IDEA will search through all the records in the active
database, looking at the specified “Summarization field”
or “Key” fields, it finds and then collects the records
with matching unique items or values together. This is done behind
the scenes, and once IDEA has the records grouped together by the
values in the Summarization field, IDEA is then in a position to add
up the number of times each unique or “key” value appears
in the list, returned as a “number of records” for each
“key”.
IDEA will also take the numeric values in the fields
selected as “Fields to Summarize” and add them up for each
“key” value returning an accumulated value for each “key”.
In addition to this, the process will make a record
of the value(s) from any additional fields selected from the Fields
to Include dialog box, this value will taken from the first record
found for each “key” after IDEA has grouped the “keys”
together. This additional information about each “key” is
only available if a Summarization Database is created by the test.
|
Results
As mentioned earlier the test can output its results
in two ways;-
Summarization Database
The Summarization Database output is a simple view
of the result of the test. It shows the data in a tabular format with
a column(s) listing the “summarization” or “key”
fields, one row for different variation of “key”, the number
of records each “key” is found in, and a column for each
numeric field that has been selected for totalling, if any (these
are all displayed in the same table). This type of result is where
any fields of additional data, (those selected by clicking on the
Fields button) are displayed – remember that data in these fields
is collected from the first “occurrence” of a key when a
Quick Field Summarization is used and may not be representative of
every record for the “key”.
Summarization Result
The more flexible and some say, more useful form
of output is the Summarization Result.
The result is found attached to the original database
as an extra tab appearing at the base of the database window (initially
labelled “Quick Field Summarization” or “Key Field
Summarization”, the name can be changed to be more descriptive
by double-clicking on the tab and overtyping the default name). The
result is displayed in a tabular form, each different “key”
value is shown on a different line and along side each value is the
number of records for the “key” or, to put it another way,
the number of times each “key “ appears in the original
database.
A third column is added if any Numeric fields have
been selected as “Fields to Summarize”, this will display
the accumulated value for each “key”. If more than one “Fields
to Summarize” has been selected then the values in the third
column can be switched by clicking on the dropdown box beside the
text field near the top of the view labelled “Field to Summarize”
and then selecting a different field from the list. |
An interesting feature of this type of output is
the ability to “drill-down” to the records for each “key”.
“Drill-down” can be activated in two ways; double-clicking
on the line of interest or by clicking on the line once to highlight
it and then clicking on the “drill-down” icon -
at the top of the view. A preview of the records for the selected
“key” is displayed in a separate window; this can saved
as a new database (click the Save button and you will be prompted
to give a name for the new database), sent to a printer (click Print)
or just viewed and then closed (click Done). As with most dialog boxes
within IDEA there is also a Help button to open the on-line about
the current dialog box. NOTE: Although it is possible to highlight
more than one line in the table only the top-most will be used for
the “drill-down”.
There are other options for further use
of the view;
The Extract icon -
allows the user to create a new database of the records for a “key”
– shortcutting the “drill-down” method.
The results can be converted to be displayed as a
graph -, provided there are not too many different “keys”,
by clicking on the Graph view icon .
There are a variety of different graph types to choose
from and many other options to allow the user to display and distribute
the results. “Drill-down” and “Extract” can also
be used on graphs to get to the records related to a particular “key”.
The view can printed in either tabular or graphical
form by clicking the Print icon ,
(there is also a Print Preview icon
only available in the tabular view)
The final option is to save or export the view into
a different file format , there are
22 to choose from including Excel worksheet and Word document options. |
Examples of Summarization in action
- A database includes information about a number of transactions,
the individual transactions are made up of different elements each
of which occupies a different row of the database. Different types
of transaction have different numbers of elements and each element
has a value related to it. To find out the value of an individual
transaction the database needs to be summarized.
- The “summarization field” or “key” of
the summarization task would the field or combination of fields
that uniquely identify a transaction. This would dictate whether
a Quick Field Summarization or Key Field Summarization is used;
if transactions are uniquely identified by information in more
than one column then the latter should be employed.
- To produce totals of fields containing the values related
to each element of a transaction these fields must be selected
as “Fields to Summarize”. Remember, if necessary,
more than one field can be selected to be totalled.
- Decide on what needs to be done with result and select the
appropriate output option. For instance, if the result is and
end in itself and only needs to be displayed as part of a report
to others then the “Summarization Result” is more
than likely sufficient. However, if the data contains several
fields of values to be totalled and these totals also have to
added together to form a grand total for each key then a “Summarization
Database” is better suited. Should the need arise both
options can be selected.
- This is all that is generally required to perform a summarization,
most of the time no additional data needs to be added by making
selection from the Fields to Include dialog box.
|
- In the next example a customer number, appears on a number of
records, you have been asked to find out the last date each customer
number occurred in the file. The database fortunately includes a
date field otherwise the task would be impossible.
- Firstly, it could be dangerous to assume that the file is
in date order so a “Sorted Database” should be created,
using the date field/ascending as the order to sort the file
upon.
- With the data recreated in the new “Sorted” database
but now in date order a summarization can be performed. As the
file is now in date ascending order the last occurrence of each
different customer number should be last date on which it has
been recorded. To obtain this a “Key Field Summarization”
is required.
- The “key” for the summarization should the field
containing the customer number.
- Any fields containing values to be totalled should selected
from the Fields to Summarize list, although to arrive at the
desired result this step is not strictly necessary.
- Ensure that a Database Result will be created.
- Select the “Use Fields from Last Occurrence” option,
then click on the “Fields” button and select the date
field from the list. Click OK to close the Fields to include
dialog box and lastly click OK to run the test.
- The Database Result will display a list of all the “key”
values or in this example, customer numbers, found in the file
and how many times the key occurs in the file tested. It will
also display a date field containing the date from the last
occurrence of each customer number; this will be the last date
of occurrence.
|
Please let us know if there any areas of IDEA
use you would like to see covered in future articles. |
AuditWare Systems Limited:
The Old Sawmills, Eridge Road,
Eridge Green, Tunbridge Wells, Kent,
TN3 9JR
Telephone: + 44 (0) 1892 512348 / Fax: + 44 (0) 1892 512342 |
|