IT:Transforming grouping in DCS
From Stiki
DCS allows output data to be grouped by certain variables. The output variables that are not part of the grouping set of variables are then grouped together according to the rules specified in the Grouping tab. There are three possible calculation types allowed for any one output variable that is being grouped:
- Average returns the arithmetic mean of the variable values within the group.
- Sum returns the arithmetic sum of the variable values within the group.
- SumLog returns the natural logarithm of the sum of the exponents of the variable values within the group.
These groupings can be further defined by specifying weighting variables, by determining the number of decimal points to show in the output file. The remainder of this article assumes that the weighting settings are not being used.
Any single variable that appears in an output file for a DCS program can only be given one grouping setting. That is to say if you are outputting the variable ANNUAL_PREM in three different output formats, each output format will use the same settings; you cannot define different settings for different output formats (and certainly not different settings for different files that use the same output format).
Contents |
Example requiring different grouping calculations
There may be situations where you need to vary the grouping calculation used. For example your data may represent tranches of assurance business, where one policy may have multiple tranches. You wish to output a single record for each policy. The input data you have shows the same sum assured across each tranche, but a different premium value. Your in force output file therefore groups the records by policy number and sets the SUM_ASSURED variable to group by Average. However your new business file just needs to show the amount of the sum assured relating to any new tranches of business, and is calculated for each tranche based on the premium. You need to group the sum assured by Sum in order to get the correct value if there are more than one new business tranche per policy in a set of data.
Solution 1: one program for in force, one program for new business
One solution is to have two separate DCS programs: one that outputs the in force file using Average SUM_ASSURED and one that outputs the new business file using Sum SUM_ASSURED. This is a poor idea however, since it requires duplicate code which must be maintained in parallel, and will require a longer total run time since many of the same calculations will be performed twice.
Solution 2: an additional program to count the number of records per group
Another solution is to have a separate DCS program that only counts how many records there are for each policy. This should be run before the main program, and outputs the results into an external table. This table can then be read in by the main program.
The main program should have SUM_ASSURED set to group by Sum. For new business, the calculated values can be summed to get the correct total amount per policy. For in force business, the sums assured for each record can be divided by the numer of records per group, so that the sum of these adds back up to the policy's sum assured.
However this solution introduces new risks:
- The additional program may not be run before the main program, which may result in a table of values from a previous run being erroneously used in the current run.
- The additional program must replicate the validation of the main program, so that invalid records are excluded from its count. Unless the validations are very simple, this may mean that large sections of the main program need to be copied to the additional program, and the two sets of code must then be maintained in parallel.
- Writing data to and reading data from an external table is relatively slow (compared with performing calculations), and the entire input data file must be processed twice (once by each program) so there will be an increase to the run time if this solution is used.
Solution 3: transform the values to match the grouping calculation
A still better solution is to transform the calculated data so that the variables, when grouped, return the values that you want. In this case, this would involve transforming the calculated value of SUM_ASSURED from Average into Sum for just the new business output file.
This cannot be done by simply dividing each record's SUM_ASSURED by the number of records in the record's group, since at the time that record 1 for a group is being processed we do not know how many records there will be in total for that group. As such we need to make running adjustments to each record's SUM_ASSURED so that the average across all records in that group is actually the sum of the calculated values.
Transforming the grouping calculation
For each of the six possible transformations below, the following terminology is used in the formulae:
- A(k) is the actual calculated value for the variable, for record k of a single group (k=1, 2, 3, ...)
- G(k) is the transformed value of A(k) so that the grouped value of G(k) gives the required result
G(1) is always equal to A(1). Each formula below then shows how to transform A(k) into G(k) for k>1, using only the values of A(k), k, and an aggregate sum of the preceding values A(1) to A(k-1).
The most common requirement will be to transform between Average and Sum (in either direction). The formulae for SumLog are also described below, although they are necessarily more complex. Do not let any perceived complexity in the mathematical formulae put you off; the sample DCS code for each case is relatively simple to implement.
Transforming Average into Sum
If the variable is usually grouped by Average but you need it to be grouped by Sum, use the following calculation:
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = OUTPUT_VALUE GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for averaging in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = GROUP_COUNT * OUTPUT_VALUE + GROUP_SUM GROUP_SUM = GROUP_SUM + OUTPUT_VALUE OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Transforming Average into SumLog
If the variable is usually grouped by Average but you need it to be grouped by SumLog, use the following calculation:
- Valid only for
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Derivation of valid range for A(k) |
|---|
|
The largest number that DCS can handle is 1.7×10308. Usually this limitation will not bite, but the exponent of a number can be many orders of magnitude greater than the number itself. The formula for this transformation includes the sum of the exponents of each of the A(k). Therefore we require that: Further, since en>0 for all real n: Note that although the formula includes logarithms, the values on which the logarithms are taken are the sums of exponents. Since en>0 for all real n, all of the logarithms are of strictly positive numbers, so there is no requirement for a lower limit to A(k). |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = EXP(OUTPUT_VALUE) GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for averaging in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = GROUP_COUNT * LN(EXP(OUTPUT_VALUE) + GROUP_SUM) - (GROUP_COUNT-1) * LN(GROUP_SUM) GROUP_SUM = GROUP_SUM + EXP(OUTPUT_VALUE) OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Transforming Sum into Average
If the variable is usually grouped by Sum but you need it to be grouped by Average, use the following calculation:
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = OUTPUT_VALUE GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for summing in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = OUTPUT_VALUE / GROUP_COUNT - GROUP_SUM / GROUP_COUNT / (GROUP_COUNT-1) GROUP_SUM = GROUP_SUM + OUTPUT_VALUE OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Transforming Sum into SumLog
If the variable is usually grouped by Sum but you need it to be grouped by SumLog, use the following calculation:
- Valid only for
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Derivation of valid range for A(k) |
|---|
|
The largest number that DCS can handle is 1.7×10308. Usually this limitation will not bite, but the exponent of a number can be many orders of magnitude greater than the number itself. The formula for this transformation includes the sum of the exponents of each of the A(k). Therefore we require that: Further, since en>0 for all real n: Note that although the formula includes a logarithm, the values on which the logarithm is taken is the ratio of exponents. Since en>0 for all real n, the logarithm is of a strictly positive number, so there is no requirement for a lower limit to A(k). |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = EXP(OUTPUT_VALUE) GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for summing in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = LN(1 + EXP(OUTPUT_VALUE) / GROUP_SUM) GROUP_SUM = GROUP_SUM + EXP(OUTPUT_VALUE) OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Transforming SumLog into Average
If the variable is usually grouped by SumLog but you need it to be grouped by Average, use the following calculation:
- Valid only for
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Derivation of valid range for A(k) |
|---|
|
The largest number that DCS can handle is 1.7×10308. Usually this limitation will not bite, but the exponent of a number can be many orders of magnitude greater than the number itself. The formula for this transformation includes an exponent of a function the A(k). Therefore we require that: The formula also include a logarithm. Since DCS does not handle complex numbers, the logarithm must be of a value that is strictly greater than zero. Therefore we require that: Note that this means that each calculated value within a group must be strictly greater than the average of the previous values. In order for this to be true, you may need to change the ordering of your data. There is no limitation on the lower value of the first record in the group (which can, indeed, even be negative). |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = OUTPUT_VALUE GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for use of SumLog in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = LN(EXP( OUTPUT_VALUE/GROUP_COUNT - GROUP_SUM/GROUP_COUNT/(GROUP_COUNT-1) ) - 1) + GROUP_SUM/(GROUP_COUNT-1) GROUP_SUM = GROUP_SUM + OUTPUT_VALUE OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Transforming SumLog into Sum
If the variable is usually grouped by SumLog but you need it to be grouped by Sum, use the following calculation:
- Valid only for
| Derivation and proof of formula |
|---|
|
Assume: We require G(k) such that: Check that this holds for k=2: So the formula does indeed hold for k=2. If the formula holds for k-1 then it also holds for k. Therefore the formula holds for all k>1 as required. |
| Derivation of valid range for A(k) |
|---|
|
The largest number that DCS can handle is 1.7×10308. Usually this limitation will not bite, but the exponent of a number can be many orders of magnitude greater than the number itself. The formula for this transformation includes an exponent of A(k). Therefore we require that: The formula also include a logarithm. Since DCS does not handle complex numbers, the logarithm must be of a value that is strictly greater than zero. Therefore we require that: Note that there is no limitation on the lower value of the first record in the group (which can, indeed, even be negative). |
| Sample DCS code |
|---|
IF FIRST_RECORD THEN GROUP_ID_PREV = "" ENDIF ... IF GROUP_ID <> GROUP_ID_PREV THEN ; This is the first record for this group. Set up the appropriate variables in case of further records: GROUP_COUNT = 1 GROUP_SUM = EXP(OUTPUT_VALUE) GROUP_ID_PREV = GROUP_ID ELSE ; This is not the first record for this group. Rebalance OUTPUT_VALUE to allow for use of SumLog in the output format: GROUP_COUNT = GROUP_COUNT + 1 OUTPUT_VALUE_NEW = LN(EXP(OUTPUT_VALUE) - 1) + GROUP_SUM GROUP_SUM = GROUP_SUM + EXP(OUTPUT_VALUE) OUTPUT_VALUE = OUTPUT_VALUE_NEW ENDIF |
Notes on sample code
Description of variables
-
OUTPUT_VALUEis the variable that needs to be transformed so that its grouped value after transformation (by the grouping defined for that variable on the DCS Grouping tab) equals the required grouped total for the value (by the different grouping calculation as required for that particular value). -
OUTPUT_VALUE_NEWis the value ofOUTPUT_VALUEafter transformation. This is then reassigned toOUTPUT_VALUE(after calculation ofGROUP_SUM) so that it is output correctly. -
GROUP_IDrepresents a variable that uniquely identifies the records belonging to each group. In the example given above this could bePOLICY_NUM, say. -
GROUP_ID_PREVholds theGROUP_IDof the last record considered, so that it can be compared against theGROUP_IDof the current record to see if they are grouped together or not. -
GROUP_COUNTholds a count of how many records have been found so far in the current group. -
GROUP_SUMholds an aggregate sum of the value ofOUTPUT_VALUEfor all of the earlier records in the current group.
Assumptions made in this code
- The grouping calculation for
OUTPUT_VALUEhas no weighting variables. - The records of the input data are sorted by
GROUP_ID. (Complete sorting is not necessary; it is sufficient that for any two records with the same value ofGROUP_ID, there are no records between them with a differentGROUP_IDthat use the section of code performing the transformation. But note that sorting the data is the easiest way to achieve this if the data is not in a suitable order.) - Any calculations that depend on the value of
OUTPUT_VALUEare performed before the transformation code, so that they are not affected by the transformation. - All values of
OUTPUT_VALUE(pre-transformation) are within the valid range for that transformation type. (Note that no explicit validation is given for the transformation between Average and Sum (in either direction) since these just involve arithmetic sums, and so are restricted only by the limit on a number that its absolute value is no greater than 1.7×10308). - The part of the code conditional on
FIRST_RECORDshould not be nested within any other conditions, and should be placed before any validation. This is to ensure that it is always called on the very first record of the input data. Otherwise when the transformation code is first called, the value ofGROUP_ID_PREVwill be undefined and so set to a default value for its variable type. This is an issue if theGROUP_IDfor the first record using the transformation code happens to equal this value, or if the variables have not been declared as types and cause a conflict when compared. -
GROUP_IDis assumed to be a text variable that cannot have a blank value. If this is not the case, setGROUP_ID_PREVto some other appropriate value in the part of the code conditional onFIRST_RECORD(for example 0 or -1 for a numeric variable).

