“The critical difference between manual research and the platform is to connect events directly and indirectly to a student. We have uncovered the stories behind the data. It is the full truth that we believe will allow survivors and their families to feel heard.” - Yúusnew̓as Project
DataWalk helped analysts significantly reduce the time spent compiling data and statistics. Queries were created and saved to answer recurring research questions.
From the 1880s until 1997, Indian Residential Schools in Canada forcibly separated Indigenous children from their families and attempted to enfranchise them and immerse them in Canadian culture. The Indian Act, passed in 1876, supported a policy of aggressive civilization. Under this policy, abuse was commonplace. Many children left their families and never returned*.
In 2021, a team led by the Squamish (Sḵwx̱wú7mesh Úxwumixw) Nation initiated the Yúusnew̓as Project and embarked on a research journey not only to find missing children but also to understand the student experience. The project encountered massive analytical challenges along the way, including handwritten records, multiple languages, long-forgotten colloquialisms, incomplete information, the need to work across governmental and community governance structures, and data sovereignty rights, among others. The team continues to work to overcome all barriers.
This case study seeks to spread awareness of the ongoing journey for justice and closure of the Squamish and other Indigenous peoples of Canada. Through this project, DataWalk software enables the Squamish Nation to analyze historical records in depth and build out timelines for the children, including which schools they attended, medical records, and cemetery burials. For the families of children who were lost, this provides confidence that the truth has surfaced and creates the necessary space to continue their spiritual journey.
The Sḵwx̱wú7mesh Úxwumixw (Squamish Nation) is the lead community for the archival and land-based research into the former St. Paul’s Residential School. Children from other communities also attended this school, which was open from 1899 until it was shut down in 1959. The word “Yúusnew̓as” means “Taking care of spirit, taking care of one another, taking care of everything around us” and was proposed by the Elders Advisory Committee as the project name in recognition that they must take care of survivors as well as all who have been impacted by intergenerational harms.
The project proceeds through phases. Phase One, Staging, is now complete. In this first phase, the Project Team planned how to care for their Elders, survivors, and community. This included setting up health and wellness support, establishing a process for recording truths, researching history through the archives, and preparing for land-based surveys. The project is currently in Phase Two: Truth. During this phase, the team is researching the history of ancestors and recording the truths of Indian Residential School (IRS) survivors. The Yúusnew̓as Project has been set up to ensure work is done with Indigenous culture and protocols as the foundation. Knowledge keepers and those with traditional knowledge lead the project and review all work before the delivery of activities, events, or ceremony. The Steering Committee guides the project while the Project Team actively works across multiple areas including health and wellness support, recording stories, archive and land-based research, and cultural practices**.
The goal of the Yúusnew̓as Project is to understand more of what happened at the schools and to identify possible locations where Indigenous children may be buried, in order to provide closure to families. Unfortunately, the only evidence the researchers have is a voluminous set of handwritten documents, some nearly a century old. These documents are written in multiple languages, include long-forgotten slang terms and colloquialisms, and are sometimes difficult to read. The Yúusnew̓as Project chose the DataWalk analytical platform for its ability to analyze historical records and transcripts, find connections between people and locations, create maps of historical locations, and create link charts from the available data sources.
Prior to the use of DataWalk, analysts and historians were personally reading all historical documents, manually noting all links and important information, and building flow charts in drawing software or by hand. The process was tedious and time-consuming, and made it difficult to summarize the data for record verification. Questions like “How many students were discharged before the age of 16?” were incredibly difficult to answer without reading and counting the transcripts.
The project also lacked a central repository to track and store all the documents processed. Spreadsheets were filled with names and years with links to the original document, but this made information difficult to share internally. It also made cooperation difficult, with analysts struggling to divide up the transcription work and delegate tasks among the team.
The unique challenge the Yúusnew̓as Project uncovered was that research would need to happen across Indigenous communities in remote locations in Canada, with little access to data scientists or analysts who understood their unique conditions. In addition, if research were conducted independently within each community, massive duplication of effort would be commonplace, because the children who are the subjects of the research were often sent to more than one school and hospital throughout Canada. To promote more consistent outcomes for every child who attended an Indian Residential School, cooperation, collaboration, and a single-platform approach are required.
The Yúusnew̓as Project evaluated other big data platforms. The deciding factor in favor of DataWalk was its visual functionality, which required no advanced technical skills to perform analyses. Other platforms seemed to require more third-party analysis and technical support, resulting in a perceived higher cost.
Critical to the success of this work was the active participation of the community in the research. It was important that each community have access to the information necessary to collaborate and understand the complete student experience across schools, hospital visits, and community records. The selection process ultimately resulted in a request for a proof-of-concept project. A small, time-limited contract was signed to create a live environment using the types and topics of data known to be available. DataWalk worked for six weeks to create the environment, which would place the child at the center of all information directly and indirectly related to that child and their experience.
This project aimed to aggregate the various historical data records, interview transcripts, and articles into a single platform for analysis. The data included mapped sites of importance, with historical locations that no longer exist overlaid on current map addresses. The data arrived in a variety of formats, including scanned handwritten notes, tables, and typewritten articles. Once the data was transcribed, DataWalk could ingest it via DataWalk Apps that clean, transform, and insert the data into the appropriate datasets and columns. The datawalk-excel-ingestor is a DataWalk app that reads the columns and data types and distributes the data into the appropriate datasets. Future transcribed datasets can then run through the same process automatically, with minimal input from the analyst.
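The column-routing idea described above can be sketched in plain Python. This is not the datawalk-excel-ingestor or any DataWalk API; the `COLUMN_ROUTES` mapping, the column names, and the dataset names are hypothetical, and a CSV export with a made-up placeholder row stands in for a transcribed spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from spreadsheet column to (dataset, field).
COLUMN_ROUTES = {
    "Student Name": ("students", "name"),
    "Date of Birth": ("students", "birth_date"),
    "School": ("school_records", "school"),
    "Admission Date": ("school_records", "admitted"),
}


def route_rows(csv_text):
    """Split each spreadsheet row into one record per target dataset."""
    datasets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        records = defaultdict(dict)
        for column, value in row.items():
            if column not in COLUMN_ROUTES:
                continue  # unmapped columns are left for an analyst to review
            dataset, field = COLUMN_ROUTES[column]
            records[dataset][field] = (value or "").strip()
        for dataset, record in records.items():
            datasets[dataset].append(record)
    return dict(datasets)


# Placeholder row, not project data.
SAMPLE = (
    "Student Name,Date of Birth,School,Admission Date,Notes\n"
    "Example Student,1905-03-01,Example School,1912-09-01,margin note\n"
)

result = route_rows(SAMPLE)
```

Keeping the routing rules in a single mapping is what lets later spreadsheets with the same template pass through unchanged; only a new column requires an analyst's decision.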
Interview transcriptions in .txt format were analyzed via DataWalk Natural Language Processing (NLP). The interview transcripts are separated into lines, and DataWalk then automatically extracts entities (e.g., locations, people, and dates) from those lines. Finally, those entities are inserted into datasets for analysis, where they can be linked to entities from other transcribed records, such as school or cemetery records. This process can be used for other unstructured documents, provided that they are processed by OCR and can be recognized as text by DataWalk. Some manual work was required to create a dictionary/library for entity extraction so that entities not recognized automatically, such as Indigenous names and nations, would be captured.
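A minimal sketch of line-by-line extraction with a custom dictionary follows. It does not reproduce DataWalk NLP; the `CUSTOM_ENTITIES` dictionary, the entity type labels, the year pattern, and the sample transcript lines are all assumptions made for illustration, and the matching is simple substring lookup rather than a trained model.

```python
import re
import unicodedata

# Hypothetical custom dictionary for terms a general-purpose model may miss.
CUSTOM_ENTITIES = {
    "Sḵwx̱wú7mesh Úxwumixw": "NATION",
    "Squamish Nation": "NATION",
    "St. Paul's Residential School": "SCHOOL",
}

# Four-digit years from 1800 to 1999.
YEAR_PATTERN = re.compile(r"\b1[89]\d{2}\b")


def _fold(text):
    """Normalize Unicode form, apostrophes, and case so matching is consistent."""
    text = unicodedata.normalize("NFC", text).replace("\u2019", "'")
    return text.casefold()


def extract_entities(transcript):
    """Return (line_number, entity_type, text) tuples, one transcript line at a time."""
    found = []
    for line_number, line in enumerate(transcript.splitlines(), start=1):
        folded = _fold(line)
        for term, entity_type in CUSTOM_ENTITIES.items():
            if _fold(term) in folded:
                found.append((line_number, entity_type, term))
        for match in YEAR_PATTERN.finditer(line):
            found.append((line_number, "YEAR", match.group()))
    return found


# Made-up sample lines, not project data.
SAMPLE = (
    "She attended St. Paul\u2019s Residential School in 1932.\n"
    "Her family is from the Squamish Nation."
)

entities = extract_entities(SAMPLE)
```

Recording the line number with each entity keeps every extracted value traceable to the sentence it came from, which matters when the result is later checked against school or cemetery records.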
Data transformations can be performed on the data after ingestion into DataWalk. Names and locations can be standardized (e.g., converted to uppercase, with symbols removed for easier linking), and dates can be put into a single format. Standardizing the data makes analysis easier and makes any generated work product more understandable. The cleaned and transformed data is parsed into a data model, a visual representation of the data in a unified view, in which the analyst can easily find patterns and connections across the various documents and sources. DataWalk software enabled creation of a data model specifically designed to meet the needs of the Squamish Nation, and the flexibility of the model ensures that new data and new data sources can easily be integrated.
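The standardization step can be illustrated with a short Python sketch. The function names and the list of accepted date formats are assumptions, not DataWalk transformations. One deliberate choice: letters, digits, and combining marks are kept so that names such as Sḵwx̱wú7mesh are not damaged when punctuation is stripped.

```python
import unicodedata
from datetime import datetime

# Assumed input formats; day-first is assumed for slash-separated dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y")


def standardize_name(raw):
    """Uppercase a name and replace symbols with spaces to build a linking key."""
    text = unicodedata.normalize("NFC", raw).upper()
    kept = [
        ch if unicodedata.category(ch)[0] in "LNM" or ch.isspace() else " "
        for ch in text
    ]
    return " ".join("".join(kept).split())


def standardize_date(raw):
    """Return an ISO 8601 date string, or None when no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


name_key = standardize_name("Smith,  John A.")
iso_long = standardize_date("March 4, 1921")
iso_slash = standardize_date("04/03/1921")
unreadable = standardize_date("illegible")
```

In a workflow like this, the standardized value would normally be stored alongside the original transcription rather than replacing it, so that the source wording remains available for verification.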
The project used DataWalk’s flexible permission schema to show or hide datasets, columns, rows, or fields containing sensitive data. Users were split into permission groups, each allowed to see certain “global” data plus the data pertaining to their own group only. This was done so that if other nations or groups use the same platform and environment, sensitive information can be hidden from other users while important information can still be shared.
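The "global plus own group" rule can be expressed as a small row-level filter. This is a sketch of the concept only, not DataWalk's permission schema; the `Record` type, the `GLOBAL` marker, and the group names are hypothetical.

```python
from dataclasses import dataclass

GLOBAL = "global"  # hypothetical marker for rows every group may see


@dataclass(frozen=True)
class Record:
    record_id: int
    owner_group: str  # GLOBAL or the name of the owning permission group
    summary: str


def visible_records(records, user_group):
    """Return global rows plus the rows owned by the user's own group."""
    return [r for r in records if r.owner_group in (GLOBAL, user_group)]


# Placeholder rows, not project data.
RECORDS = [
    Record(1, GLOBAL, "school opening and closing dates"),
    Record(2, "nation_a", "sensitive record held by nation A"),
    Record(3, "nation_b", "sensitive record held by nation B"),
]

seen_by_a = [r.record_id for r in visible_records(RECORDS, "nation_a")]
seen_by_b = [r.record_id for r in visible_records(RECORDS, "nation_b")]
```

A user whose group owns no rows still sees the global rows and nothing else, which is the default a shared environment needs when a new community joins.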
DataWalk analytical functionality was configured to meet the needs of the Yúusnew̓as Project and was delivered using existing DataWalk analytical tools and processes. The researchers and analysts were trained to perform no-code visual queries on the data model (a knowledge graph known as the DataWalk Universe Viewer). The data was cross-referenced and connected to see which people were mentioned across different records and datasets. Analysts used DataWalk link charts to visualize student stories and fill in gaps of missing information. Features like “glyphs” and dynamic icons, which highlight key fields in the data and make a link chart more understandable and comprehensive, were configured for the “School Records” and “Students” datasets. The DataWalk time series function then allows analysts to visualize a student’s progression through time (grades/levels, medical procedures, admissions, and discharges) and play it out dynamically.
DataWalk Data Model For Squamish Nation.
DataWalk also allowed the project analysts to truly map out a student’s progress from the time of admission to schools, to their final placement. Historical records are easily linked with entity resolution of student names, and each of their life events - from admission, to medical records, to other school records, to discharge and/or cemetery burial - can be mapped on a timeline. This provides the researchers with the ability to truly understand each student’s story.
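As a rough illustration of the timeline idea, the sketch below groups events from different sources under one student and orders them by date. It is not DataWalk's entity resolution: the `link_key` function is a deliberately naive, hypothetical matching key that only ignores case and spacing, whereas real historical records also need handling for spelling variants. The event rows are placeholders.

```python
from collections import defaultdict
from datetime import date


def link_key(name):
    """Hypothetical matching key: insensitive to case and extra whitespace."""
    return " ".join(name.upper().split())


def build_timelines(events):
    """Group (name, date, event_type, source) tuples by student, sorted by date."""
    timelines = defaultdict(list)
    for name, when, event_type, source in events:
        timelines[link_key(name)].append((when, event_type, source))
    return {student: sorted(items) for student, items in timelines.items()}


# Placeholder events, not project data.
EVENTS = [
    ("Example Student", date(1915, 6, 30), "discharge", "school register"),
    ("example  student", date(1912, 9, 1), "admission", "school register"),
    ("EXAMPLE STUDENT", date(1913, 2, 14), "medical visit", "hospital record"),
]

timelines = build_timelines(EVENTS)
ordered_types = [event_type for _, event_type, _ in timelines["EXAMPLE STUDENT"]]
```

Carrying the source of each event through to the timeline lets a researcher see at a glance which archive supports each step in a student's story.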
Through this project, DataWalk worked with the Yúusnew̓as Project to help overcome these challenges. A comprehensive list was compiled of all the archival documents in their various formats. Different transcription services and products were tested to find the one that gave the best results. Data formats were standardized into templates for ingestion, and a flexible data model was created that can be changed and updated as other nations join the project. Finally, a workflow was implemented for the transcription, ingestion, and analysis of the data. The journeys of thousands of Indigenous children, which were previously obscured, can now be well understood and shared with the families who have waited for closure.
This project presented a complex analytical challenge, one that required multiple jurisdictions, communities, skill sets, and governance models to work together. The platform is intended to provide barrier-free access to secure information for lead communities. The project applied best practices for identifying, preparing, and analyzing data from varied sources and formats, practices that will be familiar to intelligence analysts, investigators, and data experts, as this important investigative work continues. For projects and organizations struggling with manual processing of unstructured data, this workflow can be replicated to save substantial time and effort. Such a workflow in DataWalk can significantly increase the speed and accuracy with which connections and analyses are drawn from the data.
To learn more, please visit: https://datawalk.com/intelligence-analysis-software/