Combining advanced machine learning and natural language processing methods to identify patients with non-metastatic castration-resistant prostate cancer from electronic health record data.
By combining machine learning and rule-based natural language processing (NLP), researchers have developed an algorithm that uses electronic health record (EHR) data to identify patients with non-metastatic castration-resistant prostate cancer (nmCRPC).1
Using national EHR data from the Department of Veterans Affairs, researchers identified a final nmCRPC cohort of 13,199 patients from among 654,148 patients with prostate cancer from 2006 to 2020. Of all the prostate cancer patients identified by the algorithm, 26,506 were castration resistant, and 8,297 were excluded from the nmCRPC cohort because of evidence of metastatic disease.
The machine learning algorithm was 86% accurate, and the NLP component classified patients with metastatic disease with 96% accuracy, 99% precision, and 98% sensitivity. In addition, within 3 months of diagnosis, the algorithm predicted with 86% accuracy whether patients would progress to nmCRPC.
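For readers less familiar with these metrics, the following minimal Python sketch shows how accuracy, precision, and sensitivity are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, and sensitivity from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. Denominators are assumed to be nonzero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all calls that were correct
        "precision": tp / (tp + fp),     # share of positive calls that were correct
        "sensitivity": tp / (tp + fn),   # share of true cases that were found
    }


# Made-up counts for illustration only (not taken from the poster).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

High precision with slightly lower sensitivity, as reported for the NLP step, would mean that few patients are wrongly flagged as metastatic while a small share of metastatic cases are missed.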
“It is important to be able to identify complex disease states from increasingly readily available EHR data,” researchers at the Huntsman Cancer Institute at the University of Utah wrote in their research poster. “We combined advanced machine learning and NLP methods to identify [patients with] nmCRPC from EHR data including various elements from multiple sources.”
The researchers used an extreme gradient boosting machine learning method previously trained on a similar cohort of prostate cancer patients identified through the Veterans Affairs Cancer Registry. International Classification of Diseases (ICD)-9 and ICD-10 codes were grouped into 7-day intervals, and the number of ICD codes within each interval was used as a set of predictive features for patients whose disease was progressing.
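The 7-day binning of ICD codes can be sketched as follows. This is an illustration under stated assumptions, not the researchers' code: the function name, the choice of an index date as the anchor for the intervals, and the sample dates are all hypothetical, since the article does not say how the intervals were anchored.

```python
from collections import Counter
from datetime import date


def weekly_code_counts(code_dates, index_date):
    """Count ICD code occurrences per 7-day interval relative to index_date.

    Interval 0 covers days 0-6 after the index date, interval 1 covers
    days 7-13, and so on. Codes before the index date fall in negative
    intervals (interval -1 covers days -7 to -1).
    """
    counts = Counter((d - index_date).days // 7 for d in code_dates)
    return dict(sorted(counts.items()))


# Hypothetical dates on which an ICD code was recorded for one patient.
index_date = date(2020, 1, 1)
code_dates = [date(2020, 1, 1), date(2020, 1, 5), date(2020, 1, 8), date(2020, 1, 22)]

features = weekly_code_counts(code_dates, index_date)
print(features)
```

Each interval count would then become one column in the feature matrix passed to a gradient boosting model.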
This approach also allowed the researchers to exclude patients in the EHR data who might not have had prostate cancer. Training data were fed into the algorithm so that it learned to classify patients. The first decision was whether the patient presented with urinary symptoms. If so, the next was whether the patient had an ICD code for bladder cancer or urinary tract infection; if so again, the patient was designated as not having prostate cancer. Patients with prostate cancer ICD codes were assigned a value of +2, which weighted the model appropriately as it went on to predict patient progression.
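The decision path described above can be written out by hand as a small Python function. This is only a sketch of the described logic, not the learned gradient boosting model itself. The function name, the label strings, the score of 0 for non-matching patients, and the order in which the checks are applied are assumptions made for illustration; only the urinary-symptom path and the +2 value come from the article.

```python
def screen_patient(urinary_symptoms, bladder_cancer_or_uti_code, prostate_cancer_code):
    """Return a (label, score) pair following the decision path in the text."""
    # Urinary symptoms explained by bladder cancer or a urinary tract
    # infection: designated as not having prostate cancer.
    if urinary_symptoms and bladder_cancer_or_uti_code:
        return ("no_prostate_cancer", 0)  # score of 0 is an assumption
    # A prostate cancer ICD code contributes a value of +2.
    if prostate_cancer_code:
        return ("prostate_cancer", 2)
    # The article does not describe this case; the label is hypothetical.
    return ("undetermined", 0)


print(screen_patient(True, True, False))
print(screen_patient(False, False, True))
```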
To further categorize patients, those with prior surgical castration, current androgen deprivation therapy (ADT), or evidence of testosterone levels consistent with medical castration, 50 ng/dL or lower (≤ 2.0 nmol/L), were considered castrate. These patients were then separated from the rest of the cohort. In addition, patients with nmCRPC were defined as those who had a diagnosis of prostate cancer, were castrate, showed castration resistance (defined as 2 consecutive PSA elevations while castrate), and had no evidence of metastatic disease on radiology reports.
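The cohort definition can be expressed as a few small predicates. The sketch below assumes that castrate testosterone means 50 ng/dL or lower (the conventional cutoff), that "2 consecutive PSA elevations" means two successive increases across three readings taken while castrate, and that all of the nmCRPC criteria must hold together. All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence

CASTRATE_TESTOSTERONE_NG_DL = 50.0  # assumed cutoff: at or below is castrate


def is_castrate(surgical_castration: bool, on_adt: bool,
                testosterone_ng_dl: Optional[float]) -> bool:
    """Castrate if any one of the three criteria in the text is met."""
    low_testosterone = (
        testosterone_ng_dl is not None
        and testosterone_ng_dl <= CASTRATE_TESTOSTERONE_NG_DL
    )
    return surgical_castration or on_adt or low_testosterone


def two_consecutive_psa_rises(psa_values: Sequence[float]) -> bool:
    """True if any three successive readings are strictly increasing."""
    rises = [later > earlier for earlier, later in zip(psa_values, psa_values[1:])]
    return any(first and second for first, second in zip(rises, rises[1:]))


def is_nmcrpc(has_prostate_cancer: bool, castrate: bool,
              psa_while_castrate: Sequence[float], metastatic_evidence: bool) -> bool:
    """Assumes every criterion must hold for a patient to count as nmCRPC."""
    return (
        has_prostate_cancer
        and castrate
        and two_consecutive_psa_rises(psa_while_castrate)
        and not metastatic_evidence
    )


# Hypothetical patient: on ADT, PSA rising twice in a row, no metastases reported.
castrate = is_castrate(surgical_castration=False, on_adt=True, testosterone_ng_dl=None)
print(is_nmcrpc(True, castrate, [0.5, 0.9, 1.4], metastatic_evidence=False))
```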
To identify patients with metastatic disease, patient data were run through NLP to find non-negated mentions of metastatic disease in radiology reports. The algorithm then used the Unified Medical Language System (UMLS) to identify metastasis-related vocabulary and patterns of metastatic disease, although human review was still required. Once this was done, these patients were scored to trigger identification in the broader algorithm, which examined thousands of prostate cancer patients.
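A heavily simplified, rule-based version of finding non-negated mentions might look like the sketch below. It is not the researchers' pipeline: the term list and negation cues are hypothetical stand-ins (a real system would draw its vocabulary from the UMLS and use a more careful negation scope), and the rule that a cue anywhere earlier in the same sentence negates the mention is an assumption.

```python
import re

# Hypothetical vocabulary and negation cues, for illustration only.
TERM_PATTERN = re.compile(r"\b(metastasis|metastases|metastatic)\b")
NEGATION_PATTERN = re.compile(r"\b(no|not|without|negative for)\b")


def non_negated_mentions(report: str) -> list:
    """Return sentences that mention metastatic disease without a prior negation cue."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        lowered = sentence.lower()
        for match in TERM_PATTERN.finditer(lowered):
            # Treat the mention as negated if a cue appears earlier in the sentence.
            if not NEGATION_PATTERN.search(lowered[:match.start()]):
                hits.append(sentence)
                break
    return hits


# Made-up radiology text, not real patient data.
report = (
    "No evidence of metastatic disease in the pelvis. "
    "New sclerotic lesion consistent with osseous metastasis."
)
print(non_negated_mentions(report))
```

Simple rules like these misfire on hedged or complex phrasing, which is one reason a human review step remains useful.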
According to the researchers, patients who show no signs of metastatic disease but whose disease progresses despite castrate levels of testosterone have converted to nmCRPC. This usually happens after a patient initially responds to ADT but then becomes resistant to therapy that inhibits androgen binding to the androgen receptor, limiting the potential for treatment. Identifying these patients is important for adjusting treatment and managing their disease progression.
“This method classifies cancer diagnoses and date of diagnosis with reasonable accuracy,” the researchers concluded.
Patil V, Rasmussen K, Morreall D, et al. RWD140 Using machine learning to identify patients with non-metastatic castration-resistant prostate cancer (NMCRPC) from electronic health record data. Value in Health. 2022;25(Suppl 7):S603. doi.org/10.1016/j.jval.2022.04.1663