Text Data Mining (TDM)

A research guide to help you identify public and licensed text sources for text data mining, as well as tools for text analysis.

Data Sets from McGill-digitised Collections

A growing collection of discrete data sets from McGill-digitised materials, available for download and text data mining.

  • Canadian Architect and Builder, plain text files containing the full text of the publication Canadian Architect and Builder (1888–1908), the only professional architectural journal published in Canada before World War I

  • The Fur Trade in Canada and the North West Company data set provides access to the full-text XML files of 38 manuscripts collectively known as the Masson Papers, held in McGill University Library Rare Books and Special Collections

  • Gynaecology in Traditional Chinese Medicine, texts on the practice of gynaecology in late imperial China

  • McGill County Atlas Project People Index is an extract from 43 Ontario county atlases produced between 1874 and 1881 that contain indexes of persons residing in each county. The CSV has 172,927 records with the following fields: title (e.g. Mr., Mrs., Prof.), first name, last name, township name, town name, county name, atlas date, URL

  • McGill Electronic Theses and Dissertations contains full text and metadata (file types XML and HTML) of over 38,000 theses and dissertations from 1881 onward, made available for research purposes

  • McGill Library Chapbooks has over nine hundred British and American chapbooks and a TEI XML file for each of the chapbooks, encoded using TEI P5: Guidelines for Electronic Text Encoding and Interchange by the TEI Consortium. Level 4 coding from Best Practices for TEI in Libraries was used to guide the encoding. Headers are minimal and without bibliographic information. The woodcuts in each chapbook were assigned a classification code from the Iconclass thesaurus to describe the subject of the image (zip file)
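
As a rough illustration of working with one of these data sets, the sketch below counts person records per county in a CSV shaped like the County Atlas People Index. The column header names (such as county_name) are assumptions based on the field list above, and the real file may label them differently. The three rows are invented placeholders, not records from the index.

```python
import csv
import io
from collections import Counter


def read_people(handle):
    """Return an iterator of dicts, one per CSV record, keyed by header name."""
    return csv.DictReader(handle)


def count_by_county(rows, county_field="county_name"):
    """Count records per county, skipping rows where the county is blank.

    county_field is an assumed header name; change it to match the real file.
    """
    counts = Counter()
    for row in rows:
        county = (row.get(county_field) or "").strip()
        if county:
            counts[county] += 1
    return counts


# Invented placeholder rows for demonstration only, NOT real atlas records.
SAMPLE = io.StringIO(
    "title,first_name,last_name,township_name,town_name,county_name,atlas_date,url\n"
    "Mr.,Example,Person,Township A,Town A,County X,1878,http://example.org/1\n"
    "Mrs.,Sample,Person,Township B,Town B,County X,1878,http://example.org/2\n"
    "Prof.,Test,Person,Township C,Town C,County Y,1880,http://example.org/3\n"
)

counts = count_by_county(read_people(SAMPLE))
for county, n in counts.most_common():
    print(county, n)
```

To run this against the downloaded file, replace SAMPLE with a handle such as open(path, newline="", encoding="utf-8"), where path is wherever you saved the CSV. The UTF-8 encoding is a guess and may need adjusting.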

Text and Data Mining

Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action." -- GALE, 2017. It is an important new area for academic researchers, largely because these processes can reveal patterns and trends and support new conclusions.

McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
