Data project
Continuous Working Life Sample 2004-2012 (MCVL)
Muestra Continua de Vida Laboral 2004-2012 (MCVL)
Summary
The Continuous Sample of Working Lives (CSWL) is a set of anonymised microdata extracted from the administrative records of the Social Security system and the Municipal Register and, depending on the version, some data from the Tax Office. The Sample is updated every year with the variables selected from the Social Security system, going back as far as computerised records are kept, and with complementary information about individuals recorded in other administrative data sources.
The CSWL objectives are (i) to support research (information is preserved over many decades so that in-depth, consistent studies can be carried out, even for small groups) and (ii) to keep the administrative data transparent so that they can be used in social policy exercises. It is a reference source for longitudinal studies and for life-history analysis techniques based on the key concept of the 'life course', as well as for the study of labour market dynamics and the evolution of the social welfare system.
The CSWL design is characterised as follows:
- A single sample, large enough for significant studies even when variables are disaggregated.
- A data aggregation procedure that minimises the resources required.
- Information relating to individuals' working lives.
- Annual update.
- Provision of data along with metadata.
The main weaknesses of the sample relate to the aggregation procedure, to the administrative constraints imposed by each data source, and to the multiplicity of situations in people's working lives. As a result, analyses must be carried out, and results explained, with care in order to get an exact idea of the reference population.
To summarise, following Lapuerta (2010), some inconsistencies arise from the 'matching' process across the various sources of information:
- very few individuals have a duplicated ID number (0.1%);
- not all people with a 'working life' are obliged to hold an ID number (e.g. young adults receiving an orphan's pension);
- in the first years of the sample period, difficulties in matching data with the Municipal Register arose, which accounts for the records that remain unmatched.
Analysing concurrent situations in an individual's working life is also problematic. The individual is the unit of analysis, but much of the information contained in the aggregated files refers to working status (e.g. work, collecting unemployment benefits, pension, etc.). A single person may have experienced several of these over his or her working life, in some cases at the same time. The same applies to the duration of contracts, when different contracts refer to the same person in a given period of time, and to the length of unemployment periods.
Another problem is the analysis of family structure, which is possible using the Municipal Register and, to some extent, the Tax data, but not using the Social Security data.
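The problem of concurrent situations can be illustrated with a minimal sketch. The record layout below (a status label with start and end dates) and the function names are hypothetical and chosen for illustration only; they are not the actual layout of the Sample files. The sketch flags episodes that overlap in time and counts the days covered by at least one episode, so that simultaneous contracts are not counted twice.

```python
from datetime import date

# Hypothetical episodes for one person: (status, start, end), end date inclusive.
spells = [
    ("work", date(2010, 1, 1), date(2010, 6, 30)),
    ("work", date(2010, 4, 1), date(2010, 9, 30)),  # second, concurrent contract
    ("unemployment_benefit", date(2010, 10, 1), date(2010, 12, 31)),
]


def concurrent_pairs(spells):
    """Return index pairs of episodes that share at least one day."""
    pairs = []
    for i in range(len(spells)):
        for j in range(i + 1, len(spells)):
            _, start_i, end_i = spells[i]
            _, start_j, end_j = spells[j]
            if start_i <= end_j and start_j <= end_i:
                pairs.append((i, j))
    return pairs


def days_covered(spells):
    """Count calendar days covered by at least one episode, without double counting."""
    if not spells:
        return 0
    intervals = sorted((start, end) for _, start, end in spells)
    total = 0
    block_start, block_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= block_end:
            # Overlaps the running block: extend it instead of adding days twice.
            block_end = max(block_end, end)
        else:
            total += (block_end - block_start).days + 1
            block_start, block_end = start, end
    total += (block_end - block_start).days + 1
    return total


naive_days = sum((end - start).days + 1 for _, start, end in spells)
print(concurrent_pairs(spells))          # the two work episodes overlap
print(naive_days, days_covered(spells))  # naive sum exceeds the days actually covered
```

With these illustrative episodes, simply adding up episode lengths gives more days than the calendar year contains, which is why durations should be computed per person after merging overlapping episodes.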
Type of data
Data Source
Registry
Type of Study
Other: Administrative data on working life, which is the basis for this sample.
Data gathering method
Registries
Access to data
Conditions of access
Some very general results are published at http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/Algunos_resultados_generales/index.htm. Microdata can be requested using the form available at http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/SolicitarM/index.htm#documentoPDF
A single access agreement must be signed; it identifies the user, his or her affiliation and research interests. A guideline document for users is also provided, which addresses the large volume of data in the Sample (over 59 million data rows in 2012).
Type of available data (e.g. anonymised microdata, aggregated tables, etc.)
Anonymised data are available in txt format for user analysis.
Formats available
Txt format
Coverage
Years of collection, reference years, and sample sizes
Data are available for the period from 2004 to 2011. The sample size varies with the reference population. For example, in 2006, a sample of 1.17 million was drawn from a population of 29 million, which is about one out of every 25 individuals.
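The 2006 figures can be checked with a short calculation. The sketch below uses only the numbers quoted above; the variable names and the helper `estimated_population` are illustrative, and the simple scaling it applies assumes a uniform, unstratified sample.

```python
# Figures quoted in the text for the 2006 issue.
sample_size = 1_170_000
population = 29_000_000

sampling_fraction = sample_size / population   # share of the population sampled
one_in_n = population / sample_size            # people represented by each sampled person
expansion_factor = round(one_in_n)

print(f"{sampling_fraction:.2%}")   # about 4.03%
print(f"1 in {one_in_n:.1f}")       # about 1 in 24.8, i.e. roughly 1 in 25


def estimated_population(sample_count, factor=expansion_factor):
    """Scale a count observed in the sample up to an approximate population count."""
    return sample_count * factor


# Hypothetical example: 1,000 sampled individuals in some category
# correspond to roughly 25,000 people in the reference population.
print(estimated_population(1_000))
```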
First year of collection
The first issue refers to people who had an economic relationship with Social Security in 2004. However, each issue includes data on the entire working and pension life of the person selected, starting in 1980.
Stratification if applicable
No stratification. The sample is 4% of the reference population (around 1.2 million people).
Base used for sampling
Geographical coverage and breakdowns
Data cover the Spanish population at all geographical levels. Towns with fewer than 40,000 residents are identified only by the provincial code.
Age range
The population from which the sample is drawn consists of all individuals who were registered in the Social Security system as workers or who were receiving a pension during the reference year. No specific year is considered, as it depends on the starting point of each working life. Individuals registered in the Social Security system to receive health assistance, and recipients of non-contributory pensions or welfare benefits, are not included in the population. In 2012, 1% of the population covered was under 20 years of age and 1.5% was over 90.
Statistical representativeness
Population representative
Coverage of main and cross-cutting topics
It allows for a detailed study of several topics.
As the information refers to individuals and to changes over their working lives, the sample includes different types of variables drawn from the Social Security files for employees and employers, covering not only the reference year but also the whole period for which individual data are recorded:
- identification (individual's code, SS code, company's code).
- variables related to the work performed: SS regime, registration dates, type of labour contract, working regime, payment, disability conditions (as reported by the employer), contribution bases, changes over the working life.
- data on employers: economic activity, company size, time acting as employer, company location, type of employer.
For those who are already retired, a set of variables describes the pension type (including disability), pension dates, salary used to calculate the pension, complements, monthly payments, extra payments, etc.
Data from the Municipal Register refer to age, sex, birth and place of residence; age and sex of all other people in the same household.
Some data conversion algorithms have been employed to preserve confidentiality and anonymity of ID, address, nationality and other sensitive information about individuals and companies.
Results can be stratified for every classification variable included in the sample, given its very large size.
Linkage
Standardisation
No
Possibility of linkage among databases
Internal links are set up with the data sources from which the sample is drawn.
In general, no linkages with other data sources are allowed due to privacy concerns. A few have been made under very restrictive conditions. A comparison with the Economically Active Population Survey (EPA) shows that both datasets provide quite a similar active population structure in Spain.
Data quality
Entry errors if applicable
A file covering all the people in the reference population, with only their main characteristics (sex, age, region of residence and nationality), is produced in order to make sure the sample is representative.
The explanatory document evaluates possible errors for each variable. For example, the "education level" variable is considered "not reliable for younger people". However, most data are considered more reliable than those from a household survey.
Breaks
No major methodological changes have occurred so far. However, data reported to different administrative offices and included in the sample may not match. For example, a child may be included in a household in the Municipal Population Register, but not reported by the mother to the employer for tax purposes.
Consistency of terminology or coding used during collection
The same as above.
Changes in administrative classifications occur from time to time, for example in types of labour contracts or economic activities, but both the old and the new classifications are usually retained.
Governance
Contact information
General Directorate for the Organization of the Social Security (DGOSS)
Jorge Juan 59
28001 Madrid, Spain
Phone: 91 363 2969
Email: FIPOR.SOCIAL.MTIN(at)seg-social.es
Url: http://www.seg-social.es/Internet_1/Estadistica/Est/Muestra_Continua_de_Vidas_Laborales/index.htm
Timeliness, transparency
Five months elapse between the end of the reference period (December) and the distribution of the microdata to users (May).