Data project

Longitudinal Individual Data Base (LINDA)

Longitudinal individdatabas (LINDA)

Summary

Strengths: Apart from being a panel that is representative of the population, the sampling procedure ensures that the data are representative for each year: starting with a representative sample for a particular year, we sample from the inflow to replace the outflow to obtain the next year’s sample; thus, the data are also cross-sectionally representative. The database will be updated annually. Attached to LINDA is a non-overlapping sample of immigrants created in the same fashion as the general sample. This immigrant sample consists of 20 percent of all individuals born abroad along with information on their families. A general feature of the data is that information becomes richer over time. For the period 1960-1967, there is only census data for 1960 and 1965 along with yearly (rudimentary) information on income from the Social Insurance Agency's pension register. From 1968 onwards, there is yearly information on income and some background characteristics from Statistics Sweden’s income registers. As time passes, the income registers become more detailed, including more components of income (e.g. transfers), especially during the 1980s. In the 1990s, the database is expanded significantly as other register information, such as information on unemployment duration, is added. LINDA provides researchers and public policy analysts with the means of asking new and interesting questions, particularly regarding the reasons for, and consequences of, change, broadly defined.

Weaknesses: The frame population for LINDA is the copy of the tax authorities' civil registers stored in the Total Population Register (TPR). Persons who must be registered under current regulations but are not constitute under-coverage. Conversely, persons who are registered although under current regulations they should not be constitute over-coverage. In particular, deficiencies in the reporting of emigration result in figures that are too high or too low; these errors can be assumed to be negligible.
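The yearly refresh described under Strengths (keep the sampled persons who remain in the population, then sample from the inflow to replace the outflow) can be illustrated with a minimal Python sketch. It is not the procedure actually used by Statistics Sweden: the function name, the toy populations and the choice to sample the inflow at the same fixed fraction as the original sample are all assumptions made for illustration.

```python
import random


def refresh_sample(prev_sample, prev_population, curr_population, fraction, rng):
    """Hypothetical helper: derive next year's sample from this year's.

    Assumption: new entrants are sampled at the same fraction as the
    original sample, so the refreshed sample stays roughly representative
    of the current population.
    """
    # Persons who left the population (outflow) drop out of the sample.
    stayers = prev_sample & curr_population
    # Persons who entered the population since last year (inflow).
    inflow = sorted(curr_population - prev_population)
    n_new = round(fraction * len(inflow))
    newcomers = set(rng.sample(inflow, n_new))
    return stayers | newcomers


rng = random.Random(0)

# Toy population of 1,000 person ids and a 10 percent starting sample.
pop_year1 = set(range(1000))
sample_year1 = set(rng.sample(sorted(pop_year1), 100))

# Next year: ids 0..49 leave (outflow), ids 1000..1059 enter (inflow).
pop_year2 = (pop_year1 - set(range(50))) | set(range(1000, 1060))
sample_year2 = refresh_sample(sample_year1, pop_year1, pop_year2, 0.10, rng)
```

Because the stayers are carried forward unchanged, the result remains a panel; because the inflow is sampled every year, each yearly sample can also be read as a cross-section.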

Type of data

Data Source
Registry

Type of Study
Cohort study

Data gathering method
Registries

Access to data

Conditions of access
The registry is the primary source. On a contract basis, and after an initial assessment, researchers can gain access to anonymised primary material for their own research. Statistics Sweden can also perform special processing on a contract basis. In general, there are certain legal and other constraints on international access to primary data. The registry is distributed to users through the MONA system (Microdata Online Access). The MONA system aims to increase access to microdata while strengthening security and privacy in data management. With MONA, users gain access to databases and can create derived datasets from them online. Physically, the databases are stored at Statistics Sweden (SCB).

Type of available data (e.g. anonymised microdata, aggregated tables, etc.)
The MONA system provides secure access to microdata at Statistics Sweden over an Internet connection. Data are processed and analyzed there through a rich set of applications; see below for a list of included software. Aggregated results are automatically sent to the user’s designated e-mail account. Users can also store intermediate results on Statistics Sweden servers for future use.

Formats available
Excel, SAS, SPSS, Stata and other formats; see http://www.scb.se/Grupp/Tjanster/MONA_produktblad_engelsk.pdf

Coverage

Years of collection, reference years, and sample sizes
The core of the data is the income registers (Inkomst- och taxeringsstatistiken) - available annually from 1968 - and population census data (Folk- och Bostadsräkningen) - available every fifth year from 1960 to 1990. All variables in these registers are included in the database. LINDA contains information on 300,000 individuals annually. For each year, information on all family members of the sampled individuals is added to the data set. Family members are only included in the sample as long as they stay in the family. The definition of a "family" differs between the census and income registers, so for census years both definitions of family are available.
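The rule that family members are added for each year, and remain only as long as they stay in the family, can be sketched as follows. This is an illustrative Python sketch, not the register's actual processing: the helper name, the person ids and the family ids are hypothetical, and a single family definition per year is assumed (for census years the database carries two).

```python
def with_family_members(sampled_ids, family_of):
    """Hypothetical helper: sampled persons plus their family members.

    family_of maps person id -> family id for ONE year, so membership
    is re-evaluated every year.
    """
    # Families that contain at least one sampled person this year.
    sampled_families = {family_of[p] for p in sampled_ids if p in family_of}
    # Everyone who belongs to one of those families this year.
    return {p for p, fam in family_of.items() if fam in sampled_families}


# Toy data: "bo" shares a family with the sampled person "anna" in the
# first year and moves to a family of his own in the second year.
families_year1 = {"anna": "F1", "bo": "F1", "cia": "F2", "dan": "F3"}
families_year2 = {"anna": "F1", "bo": "F4", "cia": "F2", "dan": "F3"}
sampled = {"anna"}

members_year1 = with_family_members(sampled, families_year1)
members_year2 = with_family_members(sampled, families_year2)
```

In the toy data, "bo" is part of the data set in the first year only, because he is not a sampled person himself and has left the sampled person's family by the second year.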

First year of collection
1960

Stratification if applicable
No stratification

Base used for sampling

Geographical coverage and breakdowns
Representative for total population

Age range
Information on family members

Statistical representativeness
Population representative

Coverage of main and cross-cutting topics
The data should be applicable in a wide range of areas, including economics, demography, sociology, economic geography and public policy analysis. The potential of the database is enhanced by the fact that the master database is stored at Statistics Sweden; thus, researchers can use LINDA as a sampling frame and add information from other registers to fit their specific purposes. This is certainly cost-effective, since the expenses associated with constructing a representative sample and obtaining vital background variables have already been incurred. Moreover, matching the data with interesting regional characteristics is an easy task because of the very detailed information on place of residence. Within the field of economics, the data are well suited to questions regarding various aspects of individual mobility, unemployment duration, the economic consequences of immigration, housing economics, income distribution, the effects of the welfare state, and public policy analysis in general.

Linkage

Standardisation
The Swedish Standard Classification of Occupations 1996 (SSYK 96) is a national adaptation of the International Standard Classification of Occupations (ISCO-88) published in 1990 by the International Labour Office, Geneva. The International Standard Classification of Education (ISCED 97) is also used.

Possibility of linkage among databases
The database is intended to be a general research base, a complement to surveys such as LNU (the Level of Living Survey) and HUS (Household Market and Non-market Activities). The core registers consist of the income registers and population censuses. Attached to LINDA is a non-overlapping sample of immigrants created in the same fashion as the general sample. This immigrant sample consists of 20 percent of all individuals born abroad along with information on their families.

Data quality

Entry errors if applicable
LINDA is based on data from the Income Register. Because the income register is based on administrative records, its amounts are recorded with very high precision. The quality of the family composition variables is slightly worse, because cohabiting (unmarried) couples without children are regarded as two single households. The number of families is therefore somewhat higher than the actual number. In this context it should be emphasized that LINDA is essentially a longitudinal individual database. The statistics are based on data reported in tax returns, control data, etc. This means that income and wealth withheld from taxation are left out. The size of this hidden sector of the economy is relatively unknown, but likely to be considerable.

Breaks
No significant break – more of a question of continuous development.

Consistency of terminology or coding used during collection
High level of consistency in terminology and categories. Variables may, however, change names and meanings over time, which is important for scholars to be aware of.

Governance

Contact information
Daniel Kruse / Department for Population/Welfare Statistics
Statistics Sweden - Avdelningen för befolkning och välfärdsstatistik
Ekonomisk välfärd
701 89 Örebro Sweden Phone: 0046-19-176594
Email: inkomststat(at)scb.se
Url: http://www.scb.se/Pages/Product____34441.aspx; http://www.scb.se/LE1900

Timeliness, transparency
Usually one year and two months, but new tables are no longer published for LINDA.