PEP is a computer program that carries out stochastic propagation of error using simulation techniques (PEP - Program for Error Propagation). Its intended users include people who use demographic forecasts for planning or scientific purposes, for example demographers, statisticians, economists, actuaries etc. PEP produces files with simulated population counts by age and sex for user-specified forecast years. Output files can be read into a statistical program (e.g., Minitab) or a spreadsheet program for graphical or statistical description. At this stage summary information concerning age-groups or the total population can be obtained.
Three versions of PEP exist. The first, PEP, is the single state model presented in this document; the second, M-PEP, is a multi state model; and the third, W-MPEP, is a multi state model without separate age-groups. The multi state versions are described in separate documents. Several subroutines differ between the three versions.
Two types of PEP-users are envisioned:
Before running PEP the forecaster specifies
PEP writes a set of output files into the directory in use. If PEP is run several times, each run should be made in a different directory.
For later reference, PEP produces a log-file that contains the parameters of the current run.
The program and its subroutines are all written in C. To give an idea of the size of PEP as a computer program, we note that the program code of PEP and the utility programs comprise approximately 4500 lines of C code (including comments). The size of the final compiled version is about 155 KB.
Alho and Spencer (1997) contains a detailed introduction to the statistical and demographic concepts needed to understand the logic of PEP. Its application has been illustrated in Alho (1998).
The production of simulated population counts is based on the assumption that point forecasts are available for the relevant vital rates, and the user is able to characterize the expected uncertainty of the forecast in terms of variances and certain simple covariance structures for the error terms. The main parts of the program are the following:
All the parts are described separately in the following sections 3 - 6.
Random variables are generated in PEP using the Box-Muller method that produces independent normally distributed deviates with zero mean and unit variance. For details, see Press et al. (1988). The user gives a seed to initialize the random number generator. The given seed can be any integer between -2147483648 and 2147483647.
4.1. Error Terms for Mortality and Fertility
The forecast errors for mortality and fertility rates are generated as explained in Alho and Spencer (1997, 1999). Error terms X(j,t) are produced for mortality and fertility for each forecast year t, and each age-group j. The X(j,t) are calculated from the increments of error e(j,t) = X(j,t) - X(j,t-1) that are assumed to be of the form e(j,t) = S(j,t)(hj + d(j,t)). The scales S(j,t) > 0 are given by the user. The terms hj and d(j,t) are calculated using a constant correlation model or an AR(1) model.
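The accumulation of increments into error terms can be sketched as follows for a single age-group j. This is an illustration under stated assumptions, not PEP's code: the function name is hypothetical, the terms hj and d(j,t) are taken as already generated, and the error term is assumed to be zero at the jump-off time.

```c
/* Cumulate error increments for one age-group (hypothetical helper).
   scale[t] holds S(j,t), h holds hj, d[t] holds d(j,t), and x[t]
   receives X(j,t) for forecast years t = 0, ..., n_years - 1.
   Assumption: the error term before the first forecast year is zero. */
void cumulate_errors(int n_years, const double *scale, double h,
                     const double *d, double *x)
{
    double prev = 0.0;

    for (int t = 0; t < n_years; t++) {
        /* X(j,t) = X(j,t-1) + S(j,t) * (hj + d(j,t)) */
        prev += scale[t] * (h + d[t]);
        x[t] = prev;
    }
}
```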
In order to prevent the error terms of the vital rates from growing too large the user can give a limit T such that for t ≤ T the error terms are formed according to the method described above, but for t > T the error structure follows an AR(1) process centered at the point forecast, with the same standard deviation as the error term at t = T, and with the first autocorrelation equal to that of the errors at t = T and t = T - 1. For details see Alho and Spencer (1997, 1999).
The error terms are combined with the point forecasts to produce sample paths of the future vital rates.
4.2. Error Terms for Net-Migration
In the single state model migration is handled via absolute numbers of net-migrants N(x,t) at each age x and in each future year t. The point forecast for the net number of migrants given by the user is ^N(x,t), so N(x,t) = ^N(x,t) + ξ(x,t), where ξ(x,t) is the forecast error. The error terms for net-migration are generated using gross-migration information. The user gives for each year t a forecast ^G(t) of total gross-migration. A fixed age distribution ^g(x) is assumed, so that ξ(x,t) = ^G(t)^g(x)e(t), where e(t) is an error term common to all ages in year t. This description applies to one sex. To account for the correlation between males and females we will use e(1,t) and e(2,t) for males and females, respectively. They are assumed to be of the form e(j,t) = S(j,t)(hj + d(j,t)), j = 1,2, that is, they are formally of the same form as in Section 4.1.
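The construction for one sex and one year can be sketched as follows. The sketch reads the relations above as: simulated net-migration equals the point forecast plus an error, and the error at age x equals the gross-migration forecast times the age distribution times a year-specific error term. The function and parameter names are hypothetical.

```c
/* Simulated net-migration by age for one sex and one forecast year
   (hypothetical helper).
   n_hat[x] : point forecast of net-migrants at age x
   g[x]     : fixed age distribution of gross-migration (sums to 1)
   gross    : forecast of total gross-migration for the year
   e        : year-specific error term for this sex
   n_out[x] : receives the simulated net-migration at age x */
void simulate_net_migration(int n_ages, const double *n_hat,
                            const double *g, double gross, double e,
                            double *n_out)
{
    for (int x = 0; x < n_ages; x++) {
        n_out[x] = n_hat[x] + gross * g[x] * e;
    }
}
```

With e = 0 the simulated numbers reduce to the point forecast, which corresponds to a path without simulated errors.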
The future population counts are calculated using a linear (Leslie) growth model. The point forecasts given by the user include the jump-off population counts, and the age- and sex-specific vital rates. The number of different age-groups, forecast years, and the lowest and highest ages of child-bearing are specified by the user before running the program.
In the single state model the user may choose the mortality rates, the mortality hazards or the projective mortality rates to be used. If the mortality rates are chosen to be used, the user may choose either the correction term for the survival probability suggested by Keyfitz or by Reed and Merrell (cf., Keyfitz 1977, pp. 21 - 22). The future sample paths of the vital rates are calculated based on the point forecasts and simulated forecast errors.
6.1. Storing the Simulated Future Population Counts
Suppose the forecast period is n years and N simulation rounds are used. After simulation PEP produces the output files Px_d1.S1, where x = 1,...,N refers to the simulation round, and files Yy_d1.S1, where y = 1,...,n refers to the forecast year. The files Px_d1.S1 contain the sample paths for each simulation round; hence the letter 'P' (= path) in the file name. The sample paths are converted into annual data files Yy_d1.S1 that contain the predictive distribution of the future population by age and sex for each year y; hence the letter 'Y' in the file name. In files Px_d1.S1 the columns correspond to the age-groups and the rows to the forecast years; in files Yy_d1.S1 the columns correspond to the age-groups and the rows to the simulation rounds.
In the output files Px_d1.S1 the first column contains the future population for males at age 0, the second column for males at age 1, etc., and the last column the future population for females at the highest age. In the file name the symbol 'd1' refers to these default groups of a single year of age. In the extension 'S1' the letter 'S' means that males and females are output separately, and the number 1 after 'S' refers to the state; in the single state model it is always 1.
In the output files Px_d1.S1 the first row is the title row for the columns, the second row contains the future population counts of the first forecast year, the third row contains the future population of the second forecast year, etc., and the last row contains the future population of the last forecast year. In files Yy_d1.S1 the first row is the title row, the second row contains the future population counts of the first simulation round, the third row the counts of the second simulation round, etc., and the last row the counts of the last simulation round. The number N of simulation rounds (and, hence, of different sample paths) and the number n of forecast years are specified by the user.
The 0th simulation round produces a sample path without simulated errors (file P0_d1.S1). This is the usual (nonstochastic) point forecast.
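For users who post-process the output with their own programs, the naming scheme for the default single-year files can be reproduced with a small helper. The following is a sketch only; the function name is hypothetical and is not part of PEP.

```c
#include <stdio.h>

/* Build the name of a default single-year output file (hypothetical helper).
   kind  : 'P' for a sample path file, 'Y' for an annual file
   index : simulation round (for 'P') or forecast year (for 'Y')
   Returns 1 on success and 0 if the name did not fit into buf. */
int pep_output_name(char *buf, size_t size, char kind, int index)
{
    int len = snprintf(buf, size, "%c%d_d1.S1", kind, index);

    return len > 0 && (size_t)len < size;
}
```

For example, kind 'P' with index 0 gives P0_d1.S1, the file holding the point forecast.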
An example of an output file follows.
EXAMPLE 6.1.1. The highest age is 99.
M 0 M 1 ... M 99 F 0 F 1 ... F 99
2166454 2102110 ... 12164 2066876 2007981 ... 32864
2544515 2162703 ... 13029 2427382 2065411 ... 30554
... ... ... ... ... ... ... ...
2424312 2132743 ... 1036 2125312 2042385 ... 33112
After making the files described above the user has an opportunity to create files that summarize the results. Aggregated files contain either sample paths, or annual data, or both according to age-groups chosen by the user. The default is 5-year age-groups. It is possible to choose only males or females, combine the sexes, or to produce separate output for each sex. The default output files are named Px_d5.S1 and Yy_d5.S1, where x = 1,...,N refers to the simulation round, and y = 1,...,n refers to the forecast year. The symbols 'd5' refer to default 5-year age-groups, and the extension 'S1' to separate groups for males and females of state 1. If only males or females are chosen to be output, the extension of a file name will be 'M1' or 'F1' instead of 'S1'. If both males and females are chosen to be output combined together, the extension in the file name is 'C1'. If the user-specified groups are chosen the files will be named Px_0.S1 (or .M1 or .F1 or .C1) and Yy_0.S1 (or .M1 or .F1 or .C1). All the information needed for producing these files is obtained from the user. An example of an aggregated file follows.
EXAMPLE 6.1.2. Only females are chosen to be output in 5-year age-groups. The highest age group is 100+.
F 0-4 F 5-9 F 10-14 ... F 90-94 F 95-99 F 100
10289015 9863765 9533459 ... 755531 206381 32864
10652926 10057378 9613837 ... 787005 213016 30554
... ... ... ... ... ... ...
11363178 10448370 9875363 ... 910371 253861 30979
6.2. Storing the Simulated Life Expectancies
Besides future population counts PEP produces simulated life expectancies for males and females. Suppose the forecast period is n years. The simulated life expectancies are stored into annual files LE_Yy.S1, where y = 1,...,n refers to the forecast year. The first column contains the simulated life expectancies for males, and the second column the simulated life expectancies for females. The 0th simulation round (a sample path without simulated errors) produces a file LE_P0.S1, where the first column contains year, the second column life expectancies for males for each forecast year and the third column contains the life expectancies for females respectively.
EXAMPLE 6.2.1. File LE_Y5.S1, that is, the results of the 5th forecast year. Number of rows is the number of simulation rounds.
73.84 81.42
74.85 81.85
74.75 81.20
74.67 81.22
... ...
74.69 81.68
EXAMPLE 6.2.2. File LE_P0.S1, that is, the results of the 0th simulation round (without simulated errors). Number of rows is the number of forecast years. In this example the number of forecast years is 20.
1 73.57 80.75
2 73.80 80.90
3 74.02 81.05
4 74.24 81.19
5 74.47 81.34
... ... ...
20 77.50 83.31
The program will prompt the user for a number of parameters needed to specify the forecasts. The information required in PEP is given in menus. PEP prompts the user to make choices as to how the program is run, for parameter values, and for file names. Default values are favored in the process so if the user responds by anything other than the specified alternative, the default value will be assumed.
The values for the following parameters or choices, and names for following files will be asked from the user before simulation (default values are given as an example). Note that the name of a data file may not exceed 12 characters.
In item "Parameters for simulation rounds, forecast years, lowest and highest age of child-bearing, and highest age" in submenu "Input files and parameters" the following values are required:
In item "Point forecasts" in submenu "Input files and parameters" the following file names are required (values for males and females in the same file; for fertility only females are relevant; the name of a data file may not exceed 12 characters):
In item "Kappas and scales" in submenu "Input files and parameters" the following file names are required:
In item "Mortality" in submenu "Error terms" the following values and choices are required:
In item "Fertility" in submenu "Error terms" the following values and choices are required:
In item "Migration" in submenu "Error terms" the following values are required:
In item "Specification of mortality" the user chooses between
In item "Aggregated files" in submenu "Yes" the following values or choices are required:
In item "Parameters and Options" in submenu "Sex-ratio" the following value is required:
In item "Parameters and Options" in submenu "Correction term (mortality rates)" the following choice is required:
In item "Parameters and Options" in submenu "Separation factors (mortality rates)" the following values are required:
In item "Seed (optional)" in submenu "Optional Parameters and Options" the following value is required:
In item "Testing files (optional)" in submenu "Optional Parameters and Options" the following choice can be given:
If the user answers 'Yes' above, the following dialog appears:
In item "Specification file (optional)" in submenu "Optional Parameters and Options" the following choices can be made:
During a PEP run all given parameter values and file names can be saved into a specification file. If the user wishes to save them, the last option 'Save the current parameter values for possible future use into a specification file with the name:' should be chosen and a file name given. When running PEP in the future (possibly with some corrections), the parameter values and file names can be retrieved by choosing 'Read the parameter values ...' and giving the specification file name. If the user wishes to use the default values for parameters, 'Use default values for parameters' should be chosen. However, the names of the files containing point forecasts, kappas, and scales are still required from the user. This choice is an alternative to 'Read the parameter values and file names from specification file:', that is, the user may choose only one of the two: 'Use default ...' or 'Read the parameter values ...'.
Note! If the names of the data files are not acceptable, or file names are missing, there will be an error message.
After simulation is completed and simulated sample paths and annual files (and possible aggregated files) are made, the user can look at the output files, that is, simulated sample paths and annual files; aggregated sample paths and aggregated annual files; error file made during checking the input files (if user has chosen that); and log-file.
As mentioned above, for later reference PEP produces during the run a log-file, PEP.LOG, that contains the parameters and the data files of the current run. The time and the date of the run are also included in PEP.LOG. In an error situation, for example when there is not enough disc space for output files, the error messages are written in PEP.LOG.
In item "Sample paths" in submenu "Output files" the following value is given by the user:
In item "Annual files" in submenu "Output files" the following value is given by the user:
In item "Aggregated sample paths" in submenu "Output files" the following value is given by the user:
(default 5-year age-groups):
In item "Aggregated sample paths" in submenu "Output files" the following value is given by the user:
(age-groups given by the user):
In item "Aggregated annual files" in submenu "Output files" the following value is given by the user:
(default 5-year age-groups):
In item "Aggregated annual files" in submenu "Output files" the following value is given by the user:
(age-groups given by the user):
In item "Log-file" in submenu "Output files" the log-file PEP.LOG can be seen. No information is required from the user.
In item "Error file" in submenu "Output files" the error file named by the user, or the default file ERROR.TXT, can be seen if file testing has been used. No information is required from the user.
In item "Run" simulation will be started. The following window will be opened on the screen:
Simulation can be stopped any time by positioning the cursor of the mouse on the window shown. After simulation PEP creates files containing the sample paths and the following window will be opened on the screen:
After making the files containing sample paths the annual files will be made in the same way.
If aggregated files are chosen to be output, the aggregated sample paths and/or aggregated annual files are made after annual files.
In item "Exit" program can be closed.
In item "Contents" in submenu "Help" help file about PEP can be read.
For example, item 'Parameters' contains the following:
In item "About PEP" in submenu "Help" the following dialog is opened.
In order to be able to read in the data files given by the user PEP requires the files to satisfy certain formal requirements. These have been designed to permit the verification that the data are in the form intended.
To permit the pre-simulation inspection of the data PEP requires that the beginning of every row has to include age and sex according to the following example (first males then females, M/F written with capitals):
(0) M 29976
(1) M 31069
(2) M 32188
(3) M 33085
(4) M 32968
... ...
(99) M 19
(100) M 24
(0) F 29059
(1) F 29568
(2) F 30912
(3) F 31998
... ...
(99) F 151
(100) F 190
PEP reads in data using free format with spaces as delimiters; '(' is used to identify a new row in a data file, especially when testing files (see 10.3); the last row is the highest female age (above it is 100); and the exact jump-off time is assumed to be January 1 of a given year.
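Reading one row of a jump-off population file in this format can be sketched with the standard sscanf function. This is an illustration, not PEP's reader: the type and function names are hypothetical, and the sketch handles only a complete row given as one string.

```c
#include <stdio.h>

/* One parsed row of a jump-off population file (hypothetical type). */
typedef struct {
    int age;      /* age given in parentheses */
    char sex;     /* 'M' or 'F' */
    double value; /* population count */
} PepRow;

/* Parse a row such as "(0) M 29976" (hypothetical helper).
   Returns 1 on success and 0 if the row does not match the format. */
int parse_jumpoff_row(const char *line, PepRow *row)
{
    int age;
    char sex;
    double value;

    /* '(' must open the row; spaces act as free-format delimiters. */
    if (sscanf(line, " (%d) %c %lf", &age, &sex, &value) != 3) {
        return 0;
    }
    /* Sex must be written with a capital M or F. */
    if (age < 0 || (sex != 'M' && sex != 'F')) {
        return 0;
    }
    row->age = age;
    row->sex = sex;
    row->value = value;
    return 1;
}
```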
8.2. Future Mortality Rates/Mortality Hazards/Projective Mortality Rates
Age and sex must be given in the beginning of each row as for the jump-off population; the number of columns after that must be the same as the number of forecast years given to PEP; the text after the row identifier can be written in several lines, that is, there can be a newline character (<Enter>) between the entries, if the number of forecast years is large. An example follows.
EXAMPLE 8.1.
(0) M 0.003 0.004 0.003 0.003 0.004 ... 0.006
0.005 0.004 0.003 0.002
(1) M 0.002 0.003 0.003 0.004 0.004 ...
Each column corresponds to a future year to be forecasted. If mortality rates are used, the first column contains the mortality rates of the jump-off year and the last column the mortality rates of the year preceding January 1 of the last forecast year.
(0) M 0.0092 0.0089 ... 0.0087
(1) M 0.0071 0.0069 ... 0.0066
... ... ... ... ...
(99) M 0.5387 0.5373 ... 0.5370
(0) F 0.0074 0.0072 ... 0.0069
(1) F 0.0062 0.0065 ... 0.0063
... ... ... ... ...
(99) F 0.5415 0.5430 ... 0.5334
- Kappa file contains only one column that consists of different age-groups (first males then females)
(0) M 0
(1) M 0
... ...
(99) M 0
(0) F 0
(1) F 0
... ...
(99) F 0
- See future mortality rates.
- Every row of given data file has to contain the age-specific fertility rates according to the following example (from lowest age of child-bearing to highest age of child-bearing, FER written with capitals):
(15) FER 20.7 20.8 ... 20.6
(16) FER 38.5 38.3 ... 38.2
... ... ... ... ...
(45) FER 0.6 0.6 ... 0.6
- The number of columns is the number of forecast years.
- Each column corresponds to a future year to be forecasted; the first column contains the fertility rates of the jump-off year and the last column the fertility rates of the year before January 1 of the last forecast year.
(15) FER 0
(16) FER 0
... ...
(45) FER 0
- See future fertility rates.
8.8. Future Net-Migration Numbers
- The number of columns is the number of forecast years.
- Each column corresponds to a forecast year, the first column contains the net-migration numbers of the first forecast year and the last column the net-migration numbers of the last forecast year.
(0) M 105261 103242 ... 102832
(1) M 119418 100324 ... 108432
... ... ... ... ...
(99) M 1380 1342 ... 1289
(0) F 100874 102942 ... 101234
(1) F 100983 100103 ... 100283
... ... ... ... ...
(99) F 1500 1432 ... 1489
- Kappa file contains only two rows, one for each sex, and one column.
() M 0
() F 0
8.10. Scales for Net-Migration
- The number of rows is the number of sexes.
- Each column corresponds to a forecast year, the first column contains the scales for the net-migrants of the first forecast year and the last column the scales for the net-migrants of the last forecast year.
() M 0.5 0.4 ... 0.3
() F 0.5 0.3 ... 0.2
8.11. Age Distribution of Gross-Migration
Age distribution file contains only one column; the number of rows is the number of age-groups multiplied by two (first males then females); and the elements sum up to 1 for both sexes separately.
(0) M 0.000125
(1) M 0.011713
... ...
(99) M 0.000000
(0) F 0.000140
... ...
(99) F 0.000001
A utility program AGEDIST has been written to produce such a file containing a fixed age distribution of net-migrants.
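The requirement that the elements sum to 1 for each sex separately can be verified before a run. The following is a minimal sketch with a hypothetical function name; it is not taken from PEP or AGEDIST. A tolerance is needed because the values are stored with a limited number of decimals.

```c
#include <math.h>

/* Check that an age distribution sums to 1 for each sex separately
   (hypothetical helper).
   g      : 2 * n_ages values, males first, then females
   n_ages : number of age-groups per sex
   tol    : largest accepted deviation of each sum from 1
   Returns 1 if both sums are within tol of 1, otherwise 0. */
int age_distribution_ok(const double *g, int n_ages, double tol)
{
    for (int sex = 0; sex < 2; sex++) {
        double sum = 0.0;

        for (int x = 0; x < n_ages; x++) {
            sum += g[sex * n_ages + x];
        }
        if (fabs(sum - 1.0) > tol) {
            return 0;
        }
    }
    return 1;
}
```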
8.12. Future Gross-Migration Numbers
Every row of data file has to contain the annual gross-migration (in-migration + out-migration) for males and females according to the following example:
(1) YEAR 606741 553270
(2) YEAR 614262 560757
... ... ...
(50) YEAR 601694 548317
The first column contains the gross-migration for males, the second column the gross-migration for females of the year in question; the rows correspond to the forecast years (above the number of forecast years is 50).
8.13. Probabilities of Surviving in Highest Age
- Probabilities of surviving in highest age are needed only if mortality hazards are used.
- Every row of data file has to contain the annual probability for males and females of surviving in highest age according to the following example:
(1) YEAR 0.10 0.12
(2) YEAR 0.11 0.12
... ... ...
(50) YEAR 0.12 0.14
- The first column contains the probability for males, the second column the probability for females of the year in question.
- The rows correspond to the forecast years (above the number of forecast years is 50).
Before simulation the user has an opportunity to check that the files satisfy the criteria given above (number of age-groups, number of forecast years). This can be done by choosing in submenu "Parameters and Options" the item "Testing files". Checking files is optional and running PEP does not require that files are checked. The user may also limit the check to a subset of all files. If the files are not checked, and there are errors in data files, the program terminates before simulation.
During file testing the program checks that the number of rows corresponds to the number of age-groups the user has given to PEP (in fertility files the number of rows is the difference between the highest age of child-bearing and the lowest age of child-bearing plus one; in gross-migration file the number of rows is the number of forecast years; in all other files the number of rows is the highest age plus one multiplied by two), and the number of columns corresponds to the number of forecast years (Note: the number of columns in jump-off population file, kappa files, and in age distribution file is one; in gross-migration file the number of columns is two; in all other files it equals the number of forecast years given by the user). In file testing it is also checked whether a letter is found instead of a number in a column; whether an age-group is missing; whether there are extra rows in the beginning or at the end of file; whether the file is not the right one (e.g., fertility file is given instead of mortality file); and whether there are sufficient rows of males and females, fertility rates, or forecast years in each file.
If an error is found on some row in a data file, an error message is given, and the rest of the row is not checked, that is, all text before the next '(' is skipped. In the error messages the counting of rows and columns starts from 1. Each newline character found in a data file increases the row count by one. An example follows.
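The row and column counts stated above can be written out as a small function. This sketch covers only four file kinds and uses hypothetical type and function names; it restates the rules of the preceding paragraph and is not PEP's own testing code.

```c
/* File kinds covered by this sketch (hypothetical names). */
typedef enum {
    PEP_JUMPOFF,        /* jump-off population */
    PEP_MORTALITY,      /* future mortality rates */
    PEP_FERTILITY,      /* future fertility rates */
    PEP_GROSS_MIGRATION /* future gross-migration numbers */
} PepFileKind;

/* Parameters given by the user before the run (hypothetical type). */
typedef struct {
    int highest_age;
    int lowest_cb_age;  /* lowest age of child-bearing */
    int highest_cb_age; /* highest age of child-bearing */
    int n_years;        /* number of forecast years */
} PepSetup;

typedef struct {
    int rows;
    int cols;
} PepShape;

/* Expected number of data rows and columns for a file kind. */
PepShape expected_shape(PepFileKind kind, const PepSetup *s)
{
    PepShape shape;

    /* Default: one row per age and sex, one column per forecast year. */
    shape.rows = (s->highest_age + 1) * 2;
    shape.cols = s->n_years;

    switch (kind) {
    case PEP_JUMPOFF:
        shape.cols = 1;
        break;
    case PEP_FERTILITY:
        shape.rows = s->highest_cb_age - s->lowest_cb_age + 1;
        break;
    case PEP_GROSS_MIGRATION:
        shape.rows = s->n_years;
        shape.cols = 2;
        break;
    case PEP_MORTALITY:
        break;
    }
    return shape;
}
```

For a highest age of 99, child-bearing ages 15 to 45, and 50 forecast years, the sketch gives 200 rows by 50 columns for a mortality file and 31 rows by 50 columns for a fertility file.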
EXAMPLE 9.1. An error in a mortality file (number of forecast years is 8) is found on the row '(1) M'; the columns of each age are written on two lines.
(0) M 0.32 0.23 0.23 0.25 0.28 <Enter> (row 1)
0.34 0.30 0.28 <Enter> (row 2)
(1) M 0.28 0.27 0.29 0.25 0.27 <Enter> (row 3)
0.23 0.26 <Enter> (row 4)
(2) M 0.28 0.28 ... (row 5)
The following message will be given:
1. ERROR: row 4, age-group (1) M, column 8 missing.
If an age-group is missing, that is, a row is missing, the next age is expected to be found on the next row starting with '('. An example follows.
EXAMPLE 9.2. The highest age is 100. The highest age-group of males is missing.
(0) M 0.2 (row 1)
... ... ...
(98) M 0.3 (row 99)
(99) M 0.3 (row 100)
(0) F 0.4 (row 101)
(1) F 0.2 (row 102)
... ... ...
The following message is given:
row 101, age-group (100) M, missing.
If errors are found in data files, they, or any warnings, are written in the default error file ERROR.TXT, or in a file given by the user. Note that only the errors that are shown on screen during file testing, are written in the error file. If errors are found, PEP will terminate.
The warnings tell the user that there are data (rows remaining) and they do not prevent the reading of the data files. An example follows.
EXAMPLE 9.3. The number of forecast years given to PEP is 30. However, the number of forecast years in a file that contains gross-migration numbers for each year is 40.
... ...
(30) YEAR 4828192
(31) YEAR 4768713
... ...
(40) YEAR 4382917
The following message is given:
WARNING: On row 31, starting with "(31) YEAR", there are still data left.
However, if there are more columns than needed in a data file, warning(s) will not be given. When the file is read in the program the extra columns will be ignored.
PEP runs on the IBM PC family of computers. The processor must be at least a 486 or Pentium, the operating system must be NT, and PEP will run on any 80-column monitor. The memory PEP requires depends on the number of simulation rounds and forecast years. For example, with 500 simulation rounds and 30 forecast years approximately 25 MB are used, and with 1000 simulation rounds and 50 forecast years approximately 80 MB are used. If there is not enough memory in the computer, PEP will give a message and the program will terminate. Besides memory, PEP needs disc space for output files; see Appendix 'Disc Space Requirements of PEP'.
Alho, J.M. and Spencer, B.D. (1997). The Practical Specification of the Expected Error of Population Forecasts. Journal of Official Statistics.
Alho, J.M. and Spencer, B.D. (1999). Statistical Demography and Forecasting. Manuscript.
Alho, J.M. (1998). A Stochastic Forecast of the Population of Finland. Statistics Finland.
Keyfitz, N. (1977). Introduction to the Mathematics of Population with Revisions. Addison-Wesley, Reading MA.
Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T. (1989). Numerical Recipes in C. Cambridge University Press, Cambridge.
Rogers, A. (1986). Parameterized Multistate Population Dynamics and Projections. Journal of the American Statistical Association 81, 48-61.
Sormunen, J., and Alho, J.M. (1999). Technical Report of PEP. Manuscript.