wfs -- Extracting data from World Fertility Survey (WFS) Standard Recode Files at DHS


1. List all available datasets in the data archive: wfs

2. View the data dictionary for a given dataset (e.g. cosr02): wfs using filename [, dhs ]

3. Extract a set of variables from a given dataset: wfs varlist using filename [, dhs clear ]

4. Reshape the data to long or back to wide format using the birth or union histories: wfs reshape long|wide [, births|unions nodrop ]

5. Make a local copy of the data and dictionary for a given dataset (e.g. cosr02): wfs copy using filename [, directory(folder) replace ]

options
---------------------------------------------------------------------------
dhs         Access the data from the DHS data archive rather than locally
clear       Clear memory before loading the data
births      Reshape the birth history. Also abbreviation for all variables in the birth history
unions      Reshape the union history. Also abbreviation for all variables in the union history
nodrop      Keeps empty entries in the dataset when reshaping to long format. Default: drop
directory   Directory in which to save a local copy of the dictionary and dataset
replace     Replace the local copy of a file if it already exists


This command facilitates access to World Fertility Survey (WFS) data in the data archive at The Demographic and Health Surveys Program at ICF.


Syntax 1, wfs with no arguments, will point your web browser to the WFS home page in the DHS data archive, where you can see a list of the 43 surveys available, each with a separate page with key information and a list of the files available.

Syntax 2 lets you view a data dictionary. For example, wfs using cosr02, dhs will show the data dictionary for version 2 of the Colombia Standard Recode file. The dhs option reads the dictionary from the DHS website; otherwise the file is assumed to be available on the local machine.

Data Extraction

Syntax 3 lets you extract data. For example, to extract the date of interview (v007) and the date of birth of the respondent (v008) from COSR02 you use the command wfs v007 v008 using cosr02, dhs clear. The dhs option downloads the data from the DHS website; otherwise a local copy of the file is assumed. The clear option allows overwriting the dataset currently in memory, if any, as usual.

Following Stata conventions, the variable names are all in lowercase, even though the data dictionary lists them in uppercase. The list may include a range of consecutive variables in the dictionary, for example v701-v704. The list also allows the use of the wildcards ? and * to match one or several characters in the names, for example b* m??1 v7*, but wildcards may not be used to specify the start or end of a range of variables.

The varlist also allows for three special keywords: all, births and unions. The keyword all is used to extract all variables from the dataset. The keywords births and unions are described below.

The wfs command will create variable labels and value labels using the information in the dictionary. Sometimes a variable will share value labels with another, but may also modify the labels by adding or redefining a value. The command handles all these situations.

WFS defines two types of missing values: a "not applicable" code, and "special codes", which are any values equal to or higher than a stated value.

For example the age at death of the woman's first child (b015) is coded 88 if the woman has no children or if the child is alive, a "not applicable" code, and 99 if she had a child who died but the age at death is not known, a "special code" representing a missing value.

The wfs command handles "not applicable" values by coding them as missing with the code .n, and leaves "special codes" for the researcher to handle.
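
As a hedged illustration of this division of labour, the following sketch extracts b015 and then converts its special code to an extended missing value with Stata's standard mvdecode command. It assumes that 99 is the only special code for b015 in COSR02, as in the example above. The choice of .a as the target is arbitrary, and the data dictionary should be checked for the threshold that applies to each variable.

```stata
* Extract the age at death of the first child from the DHS archive
wfs b015 using cosr02, dhs clear

* "Not applicable" values should already appear as .n at this point
tabulate b015, missing

* Assumption: 99 is the only special code for b015; recode it to .a
mvdecode b015, mv(99 = .a)
tabulate b015, missing
```

Keeping the special code in a separate extended missing value (.a rather than .n) preserves the distinction between "not applicable" and "unknown" in later tabulations.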

Birth and Union Histories

The WFS standard recode files include a birth history, which has five variables (birth order, date of birth, sex of child, age at death in years, and age at death in months) for each of up to 24 births. The data are stored in wide format. To facilitate referring to the entire history you can use births, which is equivalent to b011-b245, from the first variable of the first birth to the last variable of the most-recent birth.

The files also include a union history, with four variables (type of union, date of union, status of union and date of dissolution) for each of up to 8 unions, also in wide format. A shortcut name for all these variables is unions. For example wfs v101 unions using cosr02, dhs clear will extract the number of unions and the entire union history.

Reshaping the Histories

Syntax 4 of the wfs command requires a WFS dataset in memory and will reshape it from wide to long format (or back from long to wide format) on the basis of the birth or union histories.

For example wfs reshape long, births will reshape the file creating a separate record for each birth, attaching all the mother variables to each child.

Only actual births are included in the long format, i.e. births with a "not applicable" date of birth do not generate a separate record. When a file is reshaped back to wide, only as many sets of variables are created as there are entries in the long file, so if nobody has more than 20 births only 20 sets of variables are generated in wide format. Moreover, women with no births will not be included, because they have no records in the long birth file. In other words, the first reshape is not reversible.

The same applies to unions. For example the cosr02 file has 5378 women with 3868 unions, so wfs reshape long, unions generates 3868 records, one per union. If you were to wfs reshape wide, unions you would get back 3302 records, one for each woman with at least one union. Moreover, only union variables m011 to m074 would be included, as no woman in the dataset had more than 7 unions.

The variable labels usually have an index number in angle brackets, for example "Type of union <1>", "Type of union <2>", and so forth. These numbers are removed in long format and restored if you reshape back to wide.

The nodrop option applies only when reshaping into long format. The standard recodes allow for up to 24 births and 8 unions, but of course most women have fewer than that. The default is to drop empty entries above the actual number of births or unions that a woman has. nodrop will keep those empty entries in the data file so that the original number of entries is preserved in the dataset.
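
The reshaping steps described in this section can be sketched as follows, using only commands documented here plus Stata's count. This is a minimal sketch, not a prescribed workflow: the file is reloaded before the second reshape because the command expects a wide WFS dataset in memory, and the record counts you see should be compared with the figures quoted above for COSR02.

```stata
* Load the number of unions (v101) and the full union history
wfs v101 unions using cosr02, dhs clear
* One record per woman
count

* Reshape to one record per actual union; empty entries are dropped
wfs reshape long, unions
count

* Reload the wide file and keep the empty entries with nodrop
wfs v101 unions using cosr02, dhs clear
wfs reshape long, unions nodrop
count
```

With nodrop every woman keeps her full set of union entries, which is useful if you intend to reshape back to wide without losing women who have no unions.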

Local copies

The wfs command works by reading the data dictionary in ASCII format, either locally or from wfs.dhsprogram.com, and then generating and running a Stata command to read the file, code "not applicable" values, and provide variable and value labels. If you are planning several extracts from the same data, it probably pays to download the data once and then work locally.

Syntax 5 of the wfs command facilitates making a local copy of the dictionary and data for a survey. For example to copy the data and dictionary for COSR02 from the DHS website to a directory called c:\work you use the command wfs copy using cosr02, dir("c:\work").
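
A download-once, extract-many workflow might look like the sketch below. It rests on one assumption that is not stated in this help file: that, without the dhs option, wfs looks for the dictionary and data in the current working directory, which is why the sketch changes to c:\work before extracting.

```stata
* One-time download of the COSR02 dictionary and data
wfs copy using cosr02, dir("c:\work")

* Assumption: without dhs, wfs reads from the working directory
cd "c:\work"

* Subsequent extracts omit the dhs option and read the local copy
wfs v007 v008 using cosr02, clear
wfs v101 unions using cosr02, clear
```

If the files already exist in the target directory, the replace option is needed to overwrite them.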


. wfs
. wfs using cosr02, dhs
. wfs v011 v701-v704 using cosr02, dhs
. wfs b3* m??1 v00* using cosr02, dhs clear
. wfs all using cosr02, dhs clear
. wfs v007 v008 unions using cosr02, dhs clear
. wfs reshape long, unions
. wfs copy using cosr02, dir("c:\work")


. net install wfs, from(http://wfs.dhsprogram.com)


Germán Rodríguez
grodri@princeton.edu
data.princeton.edu

Trevor Croft
trevor.croft@icf.com
wfs.dhsprogram.com