
Integrated Census Microdata (I-CeM)

Access

The I-CeM data collection and resource consists of a number of different yet interrelated elements. It includes the details of c.230 million person records derived from full transcriptions of the English and Welsh Census for all years from 1851 to 1921 (except for 1871) and the Scottish Census for 1851-1901 and 1921.

The key underlying databases are archived and curated by the UK Data Service (UKDS), from which they can be obtained. The data are held in two forms: a 'full' version and an 'anonymised' version without names and addresses.

The bibliographic records and DOIs for these databases are as follows:

  • Schürer, K., Higgs, E. (2024). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. UK Data Service. SN: 7481, DOI: http://doi.org/10.5255/UKDA-SN-7481-3.
  • Schürer, K., Higgs, E. (2020). Integrated Census Microdata (I-CeM) Names and Addresses, 1851-1911: Special Licence Access. [data collection]. 2nd Edition. UK Data Service. SN: 7856, DOI: http://doi.org/10.5255/UKDA-SN-7856-2.
  • Schürer, K., Wakelam, A. (2024). Integrated Census Microdata (I-CeM), England and Wales, 1921. [data collection]. UK Data Service. SN: 9280, DOI: http://doi.org/10.5255/UKDA-SN-9280-1.
  • Schürer, K., Wakelam, A. (2024). Integrated Census Microdata (I-CeM) Names and Addresses, England and Wales, 1921: Special Licence Access. [data collection]. UK Data Service. SN: 9281, DOI: http://doi.org/10.5255/UKDA-SN-9281-1.

Data Downloader

Given the sheer size of the data (even a single census year may be too large for a standard computer to handle), I-CeM has, in collaboration with the UKDS, developed a filtered data download system. This allows users to download either the full data (i.e. including the original strings) or the encoded variables only. Neither version includes personal names or street addresses, in order to protect the publisher's copyright. Researchers who need these variables must apply for a Special Licence via the UKDS.

Users of the former version of I-CeM will notice substantial differences from the old data downloader. The most important is that the limit per download has risen from 1 million rows to 10 million rows of full data (i.e. including string variables) or 50 million rows of encoded data. Users can also now download data from several census years at once, provided the total fits within the row limit. Previously only one year could be downloaded at a time, so a user wanting, for example, all bricklayers in Oldham from 1881 to 1911 had to download at least four separate tranches of data. This restriction arose mainly from inconsistencies in the geographic variables between censuses: even parishes whose administrative boundaries did not change might be named differently from one census to the next.

To make multi-year downloads easier, the downloader selects places by the CONPARID variable rather than the PARISH variable; more on CONPARID can be read in the case studies section of this website. Most CONPAR units are named in the format "PARISH A, PARISH B, and PARISH C", though where many parishes have been joined together a unit may instead carry a heading such as "TOWN AND SURROUNDING" or "PARISHES IN RECLAIMED FENLAND". If a desired parish cannot be identified, every parish in each year and its corresponding CONPARID value can be found in the CONPARID lookup in the metadata section of this website.
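As a rough illustration of searching the CONPARID lookup, the Python sketch below returns every lookup row whose CONPAR name mentions a given parish as a whole word, so a parish folded into a combined unit such as "PARISH A, PARISH B, and PARISH C" is still found in every year. The field names (`year`, `conparid`, `conpar_name`), the function `find_conparids` and the sample rows are all hypothetical; the actual layout of the lookup on this website may differ, and its rows would need to be loaded first.

```python
import re
from typing import Iterable, List, NamedTuple


class ConparEntry(NamedTuple):
    """One row of a CONPARID lookup. Field names are hypothetical."""
    year: int
    conparid: int
    conpar_name: str


def find_conparids(parish: str, lookup: Iterable[ConparEntry]) -> List[ConparEntry]:
    """Return lookup rows whose CONPAR name contains `parish` as a whole word.

    Matching is case-insensitive, and word boundaries stop "ASH"
    from matching inside "ASHTON".
    """
    pattern = re.compile(r"\b" + re.escape(parish) + r"\b", re.IGNORECASE)
    return [row for row in lookup if pattern.search(row.conpar_name)]


# Invented sample rows for illustration only; not real CONPARID values.
lookup = [
    ConparEntry(1881, 101, "PARISH A, PARISH B, and ASHTON"),
    ConparEntry(1891, 101, "PARISH A, PARISH B, and ASHTON"),
    ConparEntry(1881, 202, "ASH"),
    ConparEntry(1881, 303, "PARISHES IN RECLAIMED FENLAND"),
]

for row in find_conparids("ash", lookup):
    print(row.year, row.conparid, row.conpar_name)  # prints: 1881 202 ASH
```

Whole-word matching avoids false hits on longer names, but it cannot locate a parish that sits under a descriptive heading such as "PARISHES IN RECLAIMED FENLAND"; those units still need to be checked by eye.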

Data Format

I-CeM data are distributed as a tab-delimited file. Previous editions of the downloader issued the data as CSV. Although CSV was more familiar to some users, it caused several problems, the most significant being that many string variables contain commas. Software reading the file as CSV treated these commas as delimiters, so the data were occasionally read in incorrectly. The data were created in tab-delimited format, and this format has been kept for public distribution.
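The comma problem can be reproduced in a few lines with Python's standard `csv` module. The record and the column names (`ID`, `OCCUPATION`) are invented for illustration and are not I-CeM variables as distributed.

```python
import csv
import io

# Invented two-column extract; the occupation string contains a comma.
tab_text = "ID\tOCCUPATION\n1\tBricklayer, journeyman\n"

# Reading with a tab delimiter keeps the occupation string intact.
rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
print(rows[1])  # ['1', 'Bricklayer, journeyman']

# The same record written as unquoted CSV: the embedded comma
# is taken as a delimiter, producing three fields instead of two.
csv_line = "1,Bricklayer, journeyman"
print(next(csv.reader([csv_line])))  # ['1', 'Bricklayer', ' journeyman']
```

A real extract would be opened with `open(path, newline="")` and passed to `csv.reader(..., delimiter="\t")` in the same way.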

The tab-delimited data can be opened in a number of software packages.

Files of fewer than 1 million rows can be opened in Excel. When importing, users should set the delimiter to Tab only, checking in particular that Comma is not selected. I-CeM also recommends (though it is not essential) enabling and using the "From Text (Legacy)" option of the "Get Data" tool rather than Office's newer text import tool.
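Before opening an extract in Excel, its size can be checked without loading it. The sketch below streams a tab-delimited file line by line and reports whether it fits on one worksheet. Excel's hard limit is 1,048,576 rows per worksheet, including the header row, slightly above the rounded 1 million guideline given here. The function name `fits_in_excel`, the file name `icem_extract.txt` and the UTF-8 encoding are assumptions.

```python
from typing import Iterable

# Excel (2007 and later) worksheets hold at most 1,048,576 rows, header included.
EXCEL_MAX_ROWS = 1_048_576


def fits_in_excel(lines: Iterable[str]) -> bool:
    """Return True if a tab-delimited extract fits on a single worksheet.

    Lines are consumed one at a time, so the file is never held in memory.
    Blank lines (such as a trailing newline) are not counted.
    """
    total = 0
    for line in lines:
        if line.rstrip("\r\n"):  # a record of empty fields still has tabs
            total += 1
            if total > EXCEL_MAX_ROWS:
                return False  # stop early; no need to read the rest
    return True


# Usage with a downloaded file (name and encoding are assumptions):
# with open("icem_extract.txt", encoding="utf-8", newline="") as f:
#     print(fits_in_excel(f))
```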

Many alternative packages are available, but I-CeM particularly recommends R together with RStudio. Both are free and can handle substantial datasets. First-time users may find base R somewhat opaque, but in general it does not have a steep learning curve, and numerous tutorials are available online. Those looking for tutorials aimed at historical research will likely find those offered by the Programming Historian particularly helpful.