Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (GALE, 2017). It is an important new area for academic researchers, largely because the output of these processes can reveal patterns and trends and support new conclusions.
McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
See the following related guides for:
Network Analysis & Data Visualization for Humanities and Social Sciences
Creating digital books and exhibits as well as interactive maps and timelines: Digital and Multimedia Publishing guide
GIS software & tools: Maps and Geospatial Data guide
A growing collection of discrete data sets from McGill-digitised materials is available for download and for text and data mining:
Canadian Architect and Builder: plain text files containing the full text of Canadian Architect and Builder (1888–1908), the only professional architectural journal published in Canada before World War I
The Fur Trade in Canada and the North West Company data set provides access to the full-text XML files of 38 manuscripts collectively known as the Masson Papers, held in McGill University Library Rare Books and Special Collections
Gynaecology in Traditional Chinese Medicine: texts on the practice of gynaecology in late imperial China
McGill County Atlas Project People Index is an extract from 43 Ontario county atlases, produced between 1874 and 1881, that contain indexes of persons residing in each county. The CSV has 172,927 records with the following fields: title (e.g. Mr., Mrs., Prof.), first name, last name, township name, town name, county name, atlas date, URL
McGill Electronic Theses and Dissertations contains the full text and metadata (XML and HTML files) of more than 38,000 theses and dissertations dating from 1881 onward, made available for research purposes
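
As a starting point for working with a tabular data set such as the County Atlas Project People Index, the following minimal Python sketch counts records per county. The column header names (`county`, `last_name`, and so on) and the sample rows are invented for illustration; the real CSV may spell its headers differently, so check the first line of the downloaded file and adjust the field name accordingly.

```python
import csv
import io
from collections import Counter

# Invented sample rows. The header names below are assumptions based on the
# field list in this guide, not the actual headers of the McGill file.
SAMPLE = """title,first_name,last_name,township,town,county,atlas_date,url
Mr.,John,Smith,Example Township,Example Town,Example County A,1878,https://example.org/1
Mrs.,Mary,Jones,Example Township,Example Town,Example County A,1878,https://example.org/2
Prof.,Alex,Brown,Other Township,Other Town,Example County B,1880,https://example.org/3
"""


def count_by_field(csv_file, field):
    """Count records per distinct value of `field` in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[field].strip() for row in reader)


counts = count_by_field(io.StringIO(SAMPLE), "county")
for county, n in counts.most_common():
    print(county, n)
```

To run this against a downloaded file, replace `io.StringIO(SAMPLE)` with a file opened via `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV. The UTF-8 encoding is also an assumption and may need to be changed.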