AJOU Open Repository: A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data

DC Field	Value	Language
dc.contributor.author	Khalid, S	-
dc.contributor.author	Yang, C	-
dc.contributor.author	Blacketer, C	-
dc.contributor.author	Duarte-Salles, T	-
dc.contributor.author	Fernández-Bertolín, S	-
dc.contributor.author	Kim, C	-
dc.contributor.author	Park, RW	-
dc.contributor.author	Park, J	-
dc.contributor.author	Schuemie, MJ	-
dc.contributor.author	Sena, AG	-
dc.contributor.author	Suchard, MA	-
dc.contributor.author	You, SC	-
dc.contributor.author	Rijnbeek, PR	-
dc.contributor.author	Reps, JM	-
dc.date.accessioned	2023-01-10T00:39:16Z	-
dc.date.available	2023-01-10T00:39:16Z	-
dc.date.issued	2021	-
dc.identifier.issn	0169-2607	-
dc.identifier.uri	http://repository.ajou.ac.kr/handle/201003/23926	-
dc.description.abstract	Background and objective: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.	-
dc.language.iso	en	-
dc.subject.MESH	COVID-19	-
dc.subject.MESH	Humans	-
dc.subject.MESH	Logistic Models	-
dc.subject.MESH	Machine Learning	-
dc.subject.MESH	Pandemics	-
dc.subject.MESH	SARS-CoV-2	-
dc.title	A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data	-
dc.type	Article	-
dc.identifier.pmid	34560604	-
dc.identifier.url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8420135/	-
dc.subject.keyword	COVID-19	-
dc.subject.keyword	Data harmonization	-
dc.subject.keyword	Data quality control	-
dc.subject.keyword	Distributed data network	-
dc.subject.keyword	Machine learning	-
dc.subject.keyword	Risk prediction	-
dc.contributor.affiliatedAuthor	Park, RW	-
dc.type.local	Journal Papers	-
dc.identifier.doi	10.1016/j.cmpb.2021.106394	-
dc.citation.title	Computer methods and programs in biomedicine	-
dc.citation.volume	211	-
dc.citation.date	2021	-
dc.citation.startPage	106394	-
dc.citation.endPage	106394	-
dc.identifier.bibliographicCitation	Computer methods and programs in biomedicine, 211: 106394-106394, 2021	-
dc.identifier.eissn	1872-7565	-
dc.relation.journalid	J001692607	-

Appears in Collections: Journal Papers > School of Medicine / Graduate School of Medicine > Biomedical Informatics

Files in This Item: 34560604.pdf

Ajou University Medical Information & Media Center 164 Worldcup-ro Yeongtong-gu Suwon 16499 Korea / TEL : 031-219-5312
Copyright (c) Ajou University Medical Information & Media Center All Rights Reserved.
AJOU Open Repository was built through the National Library of Korea's OAK distribution project.
