Deficiencies in Clinical Natural Language Processing Software – A Review of 5 Systems

We have recently assessed the accuracy of Clinical NLP software available through either open source projects or commercial demonstration systems at processing pathology reports. The attached whitepaper discusses the twenty-eight deficiencies we observed in our testing of five different systems.

Deficiencies in Current Practices of Clinical Natural Language Processing – White Paper

Our analysis is based on the need for industrial-strength language engineering, which must cope with a greater variety of real-world problems than research solutions encounter. In a research setting, users can tailor their data and pre-processing to answer a very specific investigation question, whereas in real-world usage there is little or no control over the input data. As a simple example, in a language engineering application the data could be delivered in a standard messaging format, say HL7, that has to be processed no matter what vagaries it embodies. In a research project that data could be curated to overcome the uncertainties created by this delivery mechanism by removing the HL7 components before the CNLP processing is invoked, a fix not available in a standard clinical setting.
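The HL7 pre-processing described above can be illustrated with a minimal sketch. It assumes an HL7 v2 pipe-delimited message in which the narrative report text is carried in the observation value field (OBX-5); the function name extract_report_text, the sample message and the observation identifier in it are hypothetical, and real feeds vary in where and how they carry the text.

```python
def extract_report_text(hl7_message: str) -> str:
    """Return the narrative text carried in the OBX-5 fields of an HL7 v2 message."""
    lines = []
    # HL7 v2 segments are separated by carriage returns; tolerate newlines too.
    for segment in hl7_message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 5:
            # \.br\ is the HL7 v2 escape sequence for a line break.
            lines.append(fields[5].replace("\\.br\\", "\n"))
    return "\n".join(lines)


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "MSH|^~\\&|LAB|HOSP|REG|CR|202001011200||ORU^R01|1|P|2.5\r"
    "OBX|1|TX|PATHRPT^Pathology report^L||Liver core biopsy.\\.br\\Adenocarcinoma.\r"
    "OBX|2|TX|PATHRPT^Pathology report^L||Margins clear.\r"
)

if __name__ == "__main__":
    print(extract_report_text(SAMPLE))
```

A production system cannot rely on a clean hand-off like this: the text may be split across segments, use other escape sequences, or sit in a different field altogether, which is exactly the kind of vagary a language engineering solution has to absorb.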

When an organisation intends to apply a CNLP system to its data, the topics discussed in this document need to be assessed for their potential impact on the desired outcomes.

The evaluations were based on two key principles:

  • There is a primary function to be performed by CNLP, that is, Clinical Entity Recognition (CER).
  • There is one secondary function and that is Relationship Identification.

Any other clinical NLP processing will rely on one or both of these functions. For the purposes of this discussion we exclude “text mining”, which uses a “bag-of-words” approach to language analysis and is woefully inadequate in a clinical setting.

Assessed Software: 

  • Amazon Comprehend Medical
  • Stanford NLP + Metathesaurus
  • OpenNLP + Metathesaurus
  • GATE + Metathesaurus
  • cTAKES

The systems have the listed deficiencies to a greater or lesser extent; no system has all of these problems. The deficiencies are grouped under the following five headings:

  • Deficiencies in Understanding Document Structure
  • Deficiencies in Tokenisation
  • Deficiencies in Grammatical Understanding
  • Deficiencies in the CER Algorithms
  • Deficiencies in Semantics and Interpreting Medical Terminology

How much document detritus do you have to sift through to find what you need in a Cancer Registry?

Many registries have to report to at least 3 authorities, usually the State Registry, the Commission on Cancer and the CDC, while others may also report to SEER.

While there is mostly common content, the differences can be both in the data required and the rules defining that data. To compound the problem, these complexities can often be dwarfed by two other content filters: a. separating the reportable cancer documents from the other two classes of reports, namely the non-reportable cancer reports and the non-cancer reports, and, b. finding the date of initial diagnosis in a complex patient record.

Dealing with the first of these problems, the separation of document types, is particularly acute for regional and central registries.

The Greater California CR has determined that about 60% of the documents they receive are irrelevant to their needs. In a volume of greater than 600,000 reports per annum that amounts to over 360,000 irrelevant documents, a huge overburden of work before the CTRs can get down to doing the task they have been trained (and paid) for.

As part of our research into the size of this problem, we’d like to understand the extent of the overburden in your registry, so here’s a very short questionnaire:

  1. Is your institution a Hospital, a Regional CR or a Central CR?
  2. How many reports per annum do you receive?
  3. What proportion of those reports are irrelevant to the registry work?

You can answer the questionnaire here. Your help will be gratefully accepted.

Cross-posted to LinkedIn.

The Many Faces of Natural Language Processing

In the last 10 years the NLP label has been forced from the lab into mainstream usage, fuelled principally by the work of Google and others in providing us with tremendously powerful search engines, often masquerading as NLP.

It also happens that Google processes don’t have their origins in NLP but rather in Information Retrieval (IR), a field in which a number of methods are anathema to true NLPers, namely ignoring grammar, truncating words to stems, ignoring structure and generally treating a text as just a big bag of words with little to no interrelationships.

True NLP has a history as deep as computing itself with the very earliest computer usage for language processing occurring in the 1950s. Since then it has emerged as the discipline of Computational Linguistics with serious attention to all the lexical, grammatical and semantic characteristics of language, a far more complicated task than the simple text mining that Google, Amazon and other search engines have taken on.

A richer description of the history of NLP and its rivalry with Information Retrieval is presented in a previous blog but here is a quick and easy guide to understanding the current phrase usage around language processing, ranked in ascending levels of complexity:

  • Text Mining aka IR – rules, regular expressions, bag of words
  • True NLP aka Computational Linguistics – the field of computing the structure of language
  • Statistical NLP (SNLP) – True NLP plus Machine Learning
  • Neural NLP aka Deep Learning – Statistical NLP using particular types of Machine Learning to broaden domain knowledge
  • Language Engineering – Building production grade SNLP solutions

As a quick and pragmatic way to separate IR and NLP consider the result of searching for “liver cancer” in a range of clinical documents. Text Mining will correctly identify any document that contains the literal string “liver cancer” and possibly “cancer of the liver” if it is using grammatical searching as an adjunct to Information Retrieval. NLP will return a superset of documents containing not only the Text Mining results, but “liver core biopsy shows adenocarcinoma”, “hepatocellular carcinoma”, “liver mets”, “liver CA”, “HCC” and many more lexicogrammatical representations of the concept of “liver cancer”.
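The superset relationship described above can be shown with a toy sketch. The hand-written synonym and term tables below are hypothetical stand-ins for what a real NLP system would derive from grammatical analysis and a clinical terminology; the sketch only illustrates the difference in what is returned, not how true NLP works internally, and it makes no attempt to handle negation.

```python
import re

# Hypothetical lookup tables, for illustration only.
DIRECT_SYNONYMS = {
    "liver cancer", "cancer of the liver", "hepatocellular carcinoma",
    "hcc", "liver mets", "liver ca",
}
SITE_TERMS = {"liver", "hepatic"}
MALIGNANCY_TERMS = {"adenocarcinoma", "carcinoma"}


def literal_search(docs, query):
    """Text Mining style: keep documents containing the literal query string."""
    return [d for d in docs if query.lower() in d.lower()]


def concept_search(docs):
    """Keep documents that express the concept 'liver cancer' in any listed form."""
    hits = []
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        joined = " ".join(words)
        direct = any(
            re.search(rf"\b{re.escape(s)}\b", joined) for s in DIRECT_SYNONYMS
        )
        # A site term plus a malignancy term in the same short document.
        combined = bool(SITE_TERMS & set(words)) and bool(MALIGNANCY_TERMS & set(words))
        if direct or combined:
            hits.append(doc)
    return hits


DOCS = [
    "Liver cancer confirmed.",
    "Liver core biopsy shows adenocarcinoma.",
    "Hepatocellular carcinoma, grade 2.",
    "HCC on imaging.",
    "Liver function tests normal.",
]

if __name__ == "__main__":
    print(literal_search(DOCS, "liver cancer"))
    print(concept_search(DOCS))
```

With these sample documents the literal search finds only the first one, while the concept search returns the first four and still leaves out the unrelated liver function report.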

Cross-posted to LinkedIn.

AI-based coding the only way for Registries to handle increasing loads and CTR shortages

I have spoken at a number of conferences in the past few years and attended even more as a delegate. One perennial topic raised in conversations is the worrying shortage of CTRs and its impact on the efficiency of registries and the need to stretch budgets further than ever.

The shortages and shortcomings are mostly due to factors outside the control of registries and have resulted in increasing reports of growing backlogs, staff burnout, and a reduction in registry efficiency.

Recruitment of new staff has proven problematic with the cohort of senior CTRs moving into retirement age, exacerbated by the introduction of more and more coding requirements. Young potential staff have more work and education opportunities available to them, so they find clinical coding a less attractive occupation. The higher mobility of these young staff makes it all the more difficult to retain them even once they are recruited.

Delays in recruiting new staff have their own knock-on effects, increasing backlogs and making it more difficult to provide timely statistics on the distribution and growth of disease cohorts.

The value of Registries has been increasing in that they have been able to provide valuable information for health planning authorities, but the registries themselves are looking for more ways in which they can contribute to the care of cancer patients. As the most comprehensive source of information about a patient they represent a valuable resource for clinical carers when they are making decisions for patients. The efforts made by Registries to extend their capabilities have led to further pressure to make more information available. The expansion in medical investigations, especially in the imaging and pathology technologies, has also increased the demand on registries to collect and analyse even more data.

Large pharmaceutical organisations and other medical research groups increasingly want a larger range of data made available to them, and this puts even further demand on registries.

The question for all registry directors is how to cope with:

  • decreasing staff availability;
  • increased data volumes;
  • increased data requests;
  • faster turnaround requirements; and,
  • diversification of their functions.

The popular answer in today’s medical technology enthusiasm is for Artificial Intelligence (AI), but can it really help? And what can it actually do? Here are some suggestions with their advantages:


  • Sort out reportable cancer reports from non-cancer reports and non-reportables;
  • Extract the five core attributes of {Site, Histology, Grade, Behaviour, Laterality} from the cancer reports;
  • Codify the five core attributes to ICD-O-3.

The advantages of these functions for a cancer registry would be highly significant. The first function would remove a serious and substantial bottleneck to fulfilling the main task of Case Identification and Coding. In some states the proportion of irrelevant reports can reach as high as 60%, so in a state like California this can represent about 600,000 reports that are unwanted and that clutter up the principal work of the registry.

The second function should be designed to process simple reports automatically and move the cryptic reports to a Manual Processing workflow stream. In this way the mundane work would be removed from the registrar’s desktop and they could spend additional time on the more difficult cases.

The third function is important in semi-automating the coding of reports. This requires sophisticated AI as there are over 800 histology codes in ICD-O-3 and many of them are defined by complex combinations of content and rules. This labyrinthine standard is better served by a strong AI solution that makes fewer mistakes, with a mechanism for identifying the most difficult cases to be passed to Manual Processing.
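The routing between automatic and Manual Processing described above might be sketched as follows. This is only an illustration under stated assumptions: the ReportResult structure, the confidence scores, the 0.9 threshold and the route names are all hypothetical, and a real registry would set its own rules for what is safe to code automatically.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CORE_ATTRIBUTES = ("Site", "Histology", "Grade", "Behaviour", "Laterality")


@dataclass
class ReportResult:
    """Hypothetical output of the classification and extraction steps."""
    reportable: bool
    reportable_confidence: float
    attributes: Dict[str, Optional[str]]
    coding_confidence: float


def route(result: ReportResult, threshold: float = 0.9) -> str:
    """Return 'manual', 'set_aside' or 'auto' for one processed report."""
    # Uncertain about reportability: a registrar should decide.
    if result.reportable_confidence < threshold:
        return "manual"
    # Confidently non-cancer or non-reportable: keep it off the coding queue.
    if not result.reportable:
        return "set_aside"
    # Any missing core attribute, or a doubtful code, goes to a registrar.
    missing = [a for a in CORE_ATTRIBUTES if not result.attributes.get(a)]
    if missing or result.coding_confidence < threshold:
        return "manual"
    return "auto"


if __name__ == "__main__":
    complete = {a: "extracted value" for a in CORE_ATTRIBUTES}
    print(route(ReportResult(True, 0.99, complete, 0.95)))
    print(route(ReportResult(True, 0.99, complete, 0.50)))
```

The design choice here is that anything the system is unsure about defaults to the manual stream, so automation removes only the mundane cases from the registrar's desktop.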

AI holds out a promise of increasing the efficiency of cancer registries by releasing staff to do the most difficult work. It is not self-evident that this use of AI will cause disruption to the workforce. It should act as a support mechanism for staff who are already surrounded by a shrinking workforce, increasing demands, and a flooding data lake that they cannot escape.

I would welcome the chance to hear of your particular challenges and explore ways we could work together to enhance the productivity of your registry.

Cross-posted to LinkedIn.

Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

This is Part Four of the 4-part series, Immediate Adaptability (IA).

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

The IA-CIS model is in some ways a mirror image of the CERP methodology. Over time the CERP methodology has diminished the role of requirements gathering and systems analysis to the point where it serves only to direct system configuration of fixed data structures and concomitant code bases. IA-CIS does the opposite: it treats requirements and design as the primary function of creating a system for the specific needs of the user community, and then generates an implementation from the choices defined in the design, creating dynamic storage structures served by an engineered library of adaptabilities.

The value of CERP engineered systems lies in their capacity to massage large volumes of data for repetitive, little-changing processing. The disadvantage is their inability to satisfy the needs of fast-changing and diverse work that has to adapt practices immediately for any number of social, legislative, or professional reasons. Using an IA-CIS for clinical care systems will reduce the maintenance load on the CERP systems so they don’t have to be continually adaptable, and hence will lower the costs of managing them.

Hence the architecture advocated herein is to repurpose CERP systems for back office functions and take them out of the clinical coalface locations where IA-CIS technology can provide better support for work and better efficiency gains for the relative costs of installing them. Customisation of IA-CIS is the most likely pathway for reducing workarounds, but with the more important positive benefits of increasing data collection completeness, improving patient safety, enabling cultures of continuous patient improvement, and, of course, simplifying training.

An important extension to the IA-CIS is that, as a method for creating a single application for one clinical department, it can be repeated for many clinical departments in the one organisation. Although each department designs its own system as an autonomous community, they all use the same design tool and the same instantiation library, hence the technical implementation can house them all in the same software installation. This is equivalent to providing multiple customised best-of-breed systems in the one software installation. This architecture introduces a different type of interoperability, that is, system to system by means of within-system native interoperability. So while users are operating under the belief they are autonomous, they are actually all working on top of the one infrastructure with a single data management process that enables the direct sharing of data (given the appropriate permissions).
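One possible reading of this arrangement is sketched below: each department registers its own design, all records live under a single data management process, and sharing between departments is direct once permission is granted. The Installation class, its method names and the sample departments are hypothetical illustrations, not a description of any actual IA-CIS implementation.

```python
class Installation:
    """One software installation hosting several department-designed systems."""

    def __init__(self):
        self._designs = {}    # department -> set of field names it designed
        self._records = {}    # department -> list of stored records
        self._grants = set()  # (owner, reader) pairs allowed to share data

    def register_department(self, department, fields):
        """Record a department's own design within the shared installation."""
        self._designs[department] = set(fields)
        self._records[department] = []

    def save(self, department, record):
        """Store a record, accepting only fields the department designed."""
        unknown = set(record) - self._designs[department]
        if unknown:
            raise ValueError(f"fields not in {department} design: {sorted(unknown)}")
        self._records[department].append(dict(record))

    def grant(self, owner, reader):
        """Allow one department to read another department's data."""
        self._grants.add((owner, reader))

    def read(self, reader, owner):
        """Return the owner's records directly, with no external interface."""
        if reader != owner and (owner, reader) not in self._grants:
            raise PermissionError(f"{reader} may not read {owner} data")
        return list(self._records[owner])


if __name__ == "__main__":
    site = Installation()
    site.register_department("oncology", ["patient_id", "stage"])
    site.register_department("pharmacy", ["patient_id", "drug"])
    site.save("oncology", {"patient_id": "p1", "stage": "II"})
    site.grant("oncology", "pharmacy")
    print(site.read("pharmacy", "oncology"))
```

Each department sees only its own design, yet the read call needs no message format or translation layer because both departments sit on the same data management process.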

IA-CIS do not of themselves solve the problems of interoperability between different systems supplied by various vendors. Hence, it is unavoidable that a CERP system and an IA-CIS will have to use some external coding standard to share data with each other. Methods for solving this problem are well established, by HL7 or direct procedure calls. But within the IA-CIS paradigm the problem is solved at a highly efficient level.

The IA-CIS also has another significant advantage in that it eliminates silos of data and maintenance and support for multiple systems.

In this architecture it is important not to take a stance that assumes all data needs to be available in the one place. Most data needs to be usable by the people who collect it, and then appropriately selected pieces passed on to those who have secondary use purposes. Just as the results of every research experiment are not required by the back office, so not every action taken by the clinical staff needs to be defined by the back office. Autonomy at the front office with a requirement to deliver the essentials to the back office enhances the efficiency of both communities.

There is an argument in some circles that there needs to be a single source of truth which can only be provided by a CERP. This is a specious need. The extensive dispersion of care into many disciplines with many different technologies has already led to an irreversible distribution of data across multiple information systems, such as radiology, pathology and pharmacy. Advocates for this position, who already operate multiple systems successfully, use this as an argument to exclude evaluating the value of locally optimised systems. The solution proposed here is to ensure that local systems have appropriate interoperability.
