Guides: EBOSS (Epi/Biostats/Occupational Health Student Society) Workshop on Access to Health Data: Definitions

What do we mean by data?

Statistical Information

Statistical information can be described as the added value arising from the interpretation of statistics or data. This information will often be in the form of some sort of analysis, such as an article in Health Reports.

Statistics

Statistics are the numeric facts and figures which have been created from the data. They have been processed and are ready for use, but do not have the same kind of analysis behind them as statistical information does. These can take the form of e-publications, e-tables or databases. These statistics may be:

Produced by users with the help of databases and programs (such as CANSIM or Beyond 20/20) to organize the data based on their own needs. Users can define the level of geography, the characteristics of the population, and so on, and create a customized view of the data.

Produced by Statistics Canada to answer the questions its users pose most frequently. These are known as e-tables. They are static, so users cannot customize them.

Produced as a by-product of a user's research. We tend to see these in analysis bulletins, reports, and other e-publications produced by Statistics Canada. These tables are static.

Data

Data are numeric files created and organized for processing and analysis. There are two types of data: aggregate data and microdata. Both offer the user more control over the variables available for analysis than ready-made statistics do.

Aggregate data

Aggregate data are statistical summaries organized in a specific data file structure which permits further computer analysis (that is, data processing). Aggregate data are produced to provide access to data that cannot be released as microdata, such as the surveys based on the Business Registry in Statistics Canada, and to organize statistics into data tables.
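As a rough illustration of what "statistical summaries organized in a specific data file structure" can look like, the sketch below counts record-level rows into cells of a table. The provinces, age groups and rows are invented for the example and do not come from any Statistics Canada file; `aggregate` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical record-level data: one tuple per respondent
# (province, age group). All values are invented for illustration.
records = [
    ("ON", "15-24"), ("ON", "25-44"), ("ON", "25-44"),
    ("QC", "15-24"), ("QC", "15-24"),
    ("BC", "25-44"),
]

def aggregate(rows):
    """Count the respondents falling in each (province, age group) cell."""
    return Counter(rows)

table = aggregate(records)

# Each line of output is one cell of the aggregate table.
for (province, age_group), count in sorted(table.items()):
    print(province, age_group, count)
```

The resulting table can still be processed further (for example, summed by province), but the individual rows it was built from can no longer be recovered, which is why aggregation can be used when the underlying records cannot be released.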

Microdata

Microdata consist of the data directly observed or collected from a specific unit of observation. That is, a microdata file contains organized raw data in which each line represents a specific unit of observation (usually an individual, household or family) and the information on each line consists of the values of variables.

When Statistics Canada conducts a survey, it collects information from each unit of observation (e.g., individual, household, etc.). It processes these answers by coding them, assigning a specific number to each possible answer. For example, Statistics Canada often uses a "1" to represent males and a "2" to represent females. The microdata file is created by coding and electronically recording each survey respondent's responses to all relevant questions.
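The coding step can be sketched in a few lines of Python. This is a minimal illustration, not Statistics Canada's actual procedure: only the sex codes (1 for males, 2 for females) come from the text above, while the `marital_status` variable, its codes, and the names `CODEBOOK` and `code_response` are assumptions made for the example.

```python
# Hypothetical codebook: maps each text answer to its numeric code.
# Only the sex codes (1 = male, 2 = female) come from the text above;
# the marital-status variable and its codes are invented for illustration.
CODEBOOK = {
    "sex": {"male": 1, "female": 2},
    "marital_status": {"single": 1, "married": 2, "widowed": 3},
}

def code_response(answers):
    """Turn one respondent's text answers into a dict of numeric codes."""
    return {var: CODEBOOK[var][answer] for var, answer in answers.items()}

respondent = {"sex": "female", "marital_status": "married"}
coded = code_response(respondent)
print(coded)
```

Repeating this for every respondent, and recording the coded answers one respondent at a time, is what produces the microdata file.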

A microdata file consists of rows of numbers and letters; each row represents one respondent's responses to the questionnaire. The file contains one logical record per respondent, where the logical record includes all responses made by a single respondent to the questionnaire. Each logical record consists of one or more physical records (lines of data); typically, Statistics Canada files use one physical record per logical record. Because the variables are coded rather than readable as text, the values are not revealing in themselves, and metadata must be used to describe the data file and interpret its contents.
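To show why the metadata is needed, here is a minimal sketch that reads coded lines and turns them into readable values. It assumes a fixed-width file with one physical record per respondent; the layout, column positions, sample lines and the names `LAYOUT` and `decode_record` are hypothetical and are not taken from any real Statistics Canada file. Only the sex codes follow the example given earlier in this guide.

```python
# Hypothetical metadata (record layout): for each variable, the character
# positions it occupies in a line and, where applicable, the label for
# each code. Positions and province codes are chosen for illustration.
LAYOUT = {
    "respondent_id": {"start": 0, "end": 4, "labels": None},
    "sex": {"start": 4, "end": 5, "labels": {"1": "male", "2": "female"}},
    "province": {"start": 5, "end": 7, "labels": {"35": "Ontario", "24": "Quebec"}},
}

def decode_record(line):
    """Split one physical record into variables and apply value labels."""
    record = {}
    for name, spec in LAYOUT.items():
        raw = line[spec["start"]:spec["end"]]
        labels = spec["labels"]
        # Variables without labels (such as an ID) are kept as-is;
        # unknown codes are also left in their raw form.
        record[name] = labels.get(raw, raw) if labels else raw
    return record

# Two invented lines of microdata, one per respondent.
lines = ["0001235", "0002124"]
for line in lines:
    print(decode_record(line))
```

Without `LAYOUT`, the string "0001235" is just seven characters; with it, the same line reads as respondent 0001, female, Ontario.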

It is important to note that certain information collected in the questionnaire is not available in the data file because Statistics Canada places the utmost importance on protecting the anonymity of respondents and the confidentiality of its data (for example, the respondent's name and exact address are never included in the microdata file).