We recently assessed how accurately Clinical NLP (CNLP) software, available through open source projects or commercial demonstration systems, processes pathology reports. The attached whitepaper discusses the twenty-eight deficiencies we observed in testing five different systems.
Our analysis starts from the need for industrial-strength language engineering, which must cope with a greater variety of real-world problems than research solutions encounter. In a research setting, users can tailor their data and pre-processing to answer a very specific investigative question; in real-world usage there is little or no control over the input data. As a simple example, in a language engineering application the data might be delivered in a standard messaging format, say HL7, and has to be processed no matter what vagaries it embodies. In a research project the same data could be curated to remove the uncertainties created by this delivery mechanism, by stripping the HL7 components before the CNLP processing is invoked. That fix is not available in a standard clinical setting.
An organisation intending to apply a CNLP system to its data should assess the topics discussed in this document for their potential impact on its desired outcomes.
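To make the HL7 example concrete, here is a minimal sketch of the kind of research-style curation described above: pulling the narrative text out of a message before any language processing runs. It assumes an HL7 v2 pipe-delimited message with default delimiters and report text carried in the OBX-5 field; the function name `extract_report_text` and the sample message are hypothetical, and real feeds vary far more than this allows for.

```python
def extract_report_text(hl7_message: str) -> str:
    """Return the free text found in OBX-5 fields of an HL7 v2 message.

    Assumes default delimiters: "|" for fields, "~" for repetitions,
    and the \\.br\\ escape for line breaks in formatted text.
    """
    lines = []
    # Segments are normally separated by carriage returns; tolerate "\n" too.
    for segment in hl7_message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX" or len(fields) <= 5:
            continue  # not an observation segment, or no value field
        for repetition in fields[5].split("~"):
            lines.extend(repetition.split("\\.br\\"))
    return "\n".join(line.strip() for line in lines if line.strip())


# Hypothetical two-segment message used only for illustration.
sample = "\r".join([
    "MSH|^~\\&|LAB|HOSP|CNLP|HOSP|20200101||ORU^R01|1|P|2.4",
    "OBX|1|FT|REPORT^Pathology Report||Specimen: left breast.\\.br\\No evidence of malignancy.",
])
print(extract_report_text(sample))
```

For the sample message this prints the two report sentences on separate lines. A production system cannot rely on such a clean separation, because it has to accept whatever structure and escaping the sending system actually produces.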
The evaluations were based on two key principles:
- There is a primary function to be performed by CNLP, that is, Clinical Entity Recognition (CER).
- There is one secondary function and that is Relationship Identification.
Any other clinical NLP processing will rely on one or both of these functions. For the purposes of this discussion we exclude “text mining”, which uses a “bag-of-words” approach to language analysis and is woefully inadequate in a clinical setting.
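One way to see why a bag-of-words view falls short on clinical text is the short sketch below. It uses a deliberately simple tokeniser and two invented report fragments (both are assumptions for illustration, not data from the evaluation) to show that word counts alone cannot tell which breast the tumour is in.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Two invented fragments with opposite clinical meanings.
report_a = "No tumour in the left breast, tumour in the right breast."
report_b = "Tumour in the left breast, no tumour in the right breast."

# The texts differ, yet their bags of words are identical.
print(report_a != report_b)
print(bag_of_words(report_a) == bag_of_words(report_b))
```

Both comparisons print `True`: the two fragments contain exactly the same words with the same frequencies, so any method that ignores word order and relationships treats them as the same report.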
The systems evaluated included:
- Amazon Comprehend Medical
- Stanford NLP + Metathesaurus
- OpenNLP + Metathesaurus
- GATE + Metathesaurus
The systems exhibit the listed deficiencies to a greater or lesser extent, and no single system has all of them. The deficiencies are grouped into five topics under the following headings:
- Deficiencies in Understanding Document Structure
- Deficiencies in Tokenisation
- Deficiencies in Grammatical Understanding
- Deficiencies in the CER Algorithms
- Deficiencies in Semantics and Interpreting Medical Terminology