Clinical care accounts for only 10-20% of patients’ health outcomes, while socioeconomic, environmental, and behavioral factors may account for the remainder (Hood et al., 2016). The widespread adoption of Electronic Health Records (EHRs) has generated large volumes of clinical data, an enabling resource for developing machine learning (ML) models to study patient outcomes. However, information about social determinants of health (SDoH) is usually recorded in unstructured clinical notes, which makes it difficult to access. As a result, most current clinical research using EHRs focuses heavily on clinical factors and may consequently lead to health inequalities. Despite growing interest in incorporating SDoH into clinical decision-making, these factors are studied in isolation, even though social determinants are often interconnected and should be considered in aggregate to improve health outcomes and reduce disparities. In this research, we leverage a combination of structured and free-text EHR data to develop novel natural language processing and ML models that extract nonclinical factors. We use these determinants to develop and validate context-sensitive, individualized polyrisk scores that prioritize high-risk patients using both clinical factors and interacting social factors. These scores will complement existing EHR data in outcome prediction models and help provide tailored interventions in our health system.
As articulated in the Canadian Institutes of Health Research (CIHR) 10-year strategic plan, biomedical science and knowledge mobilization have a critical role in improving the health of people in Canada and throughout the world (CIHR Strategic Plan 2021-2031). Data access and informatics are integral contributors to this goal. Despite the massive amount of health data (30% of the world’s data), health systems remain information poor and prone to disparities in healthcare access and quality. In Canada, rich provincial and multi-province health data initiatives exist, but each is limited to a specific health outcome, data type (e.g., administrative data), or health care setting (e.g., primary care), which leaves health researchers and stakeholders in the country with incomplete and uncoordinated data about patient health. We will build on the existing foundational work on health data in Canada to develop standard data models that make the existing datasets interoperable. We will also integrate other national environmental, behavioural, and lifestyle data (e.g., social media, digital platforms, wearable devices, CANUE, CanPath) into the health data ecosystem, enabling pan-Canadian research initiatives that use these data sources to improve the health of Canadians.