AI-based coding the only way for Registries to handle increasing loads and CTR shortages

I have spoken at a number of conferences in the past few years and attended even more as a delegate. One perennial topic raised in conversations is the worrying shortage of CTRs and its impact on the efficiency of registries and the need to stretch budgets further than ever.

The shortages and shortcomings are mostly due to factors outside the control of registries and have resulted in increasing reports of growing backlogs, staff burnout, and a reduction in registry efficiency.

Recruitment of new staff has proven problematic with the cohort of senior CTRs moving into retirement age, exacerbated by the introduction of more and more coding requirements. Young potential staff have more work and education opportunities available to them, so they find clinical coding a less attractive occupation. The higher mobility of these young staff makes it all the more difficult to retain them even once they are recruited.

Delays in recruiting new staff have their own knock-on effects by increasing backlogs and making it more difficult to provide timely statistics on the distribution and growth of disease cohorts.

The value of Registries has been increasing in that they have been able to provide valuable information for health planning authorities, but the registries themselves are looking for more ways in which they can contribute to the care of cancer patients. As the most comprehensive source of information about a patient they represent a valuable resource for clinical carers when they are making decisions for patients. The efforts made by Registries to extend their capabilities have led to further pressure to make more information available. The expansion in medical investigations, especially in the imaging and pathology technologies, has also increased the demand on registries to collect and analyse even more data.

Large pharmaceutical organisations and other medical research groups increasingly want a larger range of data made available to them, and this puts even further demand on registries.

The question for all registry directors is how to cope with:

  • decreasing staff availability;
  • increased data volumes;
  • increased data requests;
  • faster turnaround requirements; and,
  • diversification of their functions.

The popular answer in today’s medical technology enthusiasm is for Artificial Intelligence (AI), but can it really help? And what can it actually do? Here are some suggestions with their advantages:


  • Sort out reportable cancer reports from non-cancer reports and non-reportables;
  • Extract the five core attributes of {Site, Histology, Grade, Behaviour, Laterality} from the cancer reports;
  • Codify the five core attributes to ICD-O-3.

The advantages of these functions for a cancer registry would be highly significant. The first function would remove a serious and substantial bottleneck to fulfilling the main task of Case Identification and Coding. In some states the proportion of unwanted reports can reach as high as 60%, so in a state like California this can represent about 600,000 reports that are a nuisance and clutter up the principal work of the registry.

The second function should be designed to process simple reports automatically and move the cryptic reports to a Manual processing workflow stream. In this way the mundane work would be removed from the registrar’s desktop and they could spend additional time on the more difficult cases.

The third function is important in semi-automating the coding of reports. This requires sophisticated AI as there are over 800 histology codes in ICD-O-3 and many of them are defined by complex combinations of content and rules. This labyrinthine standard is better served by a strong AI solution that makes fewer mistakes, with a mechanism for identifying the most difficult cases so they can be passed to Manual Processing.
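The division of work between automatic and Manual processing described in the second and third functions can be sketched as a confidence-threshold router. This is only a minimal illustration under stated assumptions: the `CodedReport` structure, the `route` function, the 0.90 threshold, and the sample codes and confidence values are all hypothetical, and the upstream models that would produce the reportability decision and the confidence score are assumed rather than shown.

```python
from dataclasses import dataclass

# Hypothetical threshold: reports coded with lower confidence go to a registrar.
AUTO_THRESHOLD = 0.90


@dataclass
class CodedReport:
    report_id: str
    reportable: bool    # output of an assumed reportability classifier
    codes: dict         # illustrative coded attributes, e.g. site and histology
    confidence: float   # assumed model confidence in its own coding, 0.0 to 1.0


def route(report: CodedReport, threshold: float = AUTO_THRESHOLD) -> str:
    """Return the workflow stream a report should be sent to."""
    if not report.reportable:
        return "discard"        # non-cancer or non-reportable
    if report.confidence >= threshold:
        return "automatic"      # simple report, coded without manual review
    return "manual"             # cryptic report, passed to a registrar


# Made-up reports for illustration only.
reports = [
    CodedReport("r1", False, {}, 0.99),
    CodedReport("r2", True, {"site": "C50.9", "histology": "8500"}, 0.97),
    CodedReport("r3", True, {"site": "C80.9", "histology": "8000"}, 0.55),
]
streams = {r.report_id: route(r) for r in reports}
print(streams)
```

In practice the threshold would be a registry decision: raising it sends more reports to Manual Processing, lowering it automates more at the cost of more coding errors.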

AI holds out a promise of increasing the efficiency of cancer registries by releasing staff to do the most difficult work. It is not self-evident that this use of AI will cause disruption to the workforce. It should act as a support mechanism for staff who are already surrounded by a shrinking workforce, increasing demands, and a flooding data lake that they cannot escape.

I would welcome the chance to hear of your particular challenges and explore ways we could work together to enhance the productivity of your registry.

Cross-posted to LinkedIn.

Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

This is Part Four of the 4-part series, Immediate Adaptability (IA).

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

The IA-CIS model is in some ways a mirror image of the CERP methodology. Over time the CERP methodology has diminished the role of requirements gathering and systems analysis to the point where it serves only to direct system configuration of fixed data structures and concomitant code bases. IA-CIS does the opposite: it treats requirements and design as the primary function of creating a system for the specific needs of the user community, and then generates an implementation from the choices defined in the design, creating dynamic storage structures served by an engineered library of adaptabilities.

The value of CERP-engineered systems lies in their capacity to massage large volumes of data for repetitive, little-changing processing. Their disadvantage is their inability to satisfy the needs of fast-changing and diverse work that has to adapt its practices immediately for any number of social, legislative, or professional reasons. Using an IA-CIS for clinical care systems will reduce the maintenance load on CERP systems, so they don’t have to be continually adaptable, and hence will lower the costs of managing them.

Hence the architecture advocated herein is to repurpose CERP systems for back office functions and take them out of the clinical coalface locations where IA-CIS technology can provide better support for work and better efficiency gains for the relative costs of installing them. Customisation of IA-CIS is the most likely pathway for reducing workarounds, but with the more important positive benefits of increasing data collection completeness, improving patient safety, enabling cultures of continuous patient improvement, and, of course, simplifying training.

An important extension of the IA-CIS is that, just as it is a method for creating a single application for one clinical department, the method can be repeated for many clinical departments in the one organisation. Although each department designs its own system as an autonomous community, they all use the same design tool and the same instantiation library, hence the technical implementation can house them all in the same software installation. This is equivalent to providing multiple customised best-of-breed systems in the one software installation. This architecture introduces a different type of interoperability, that is, system to system by means of within-system native interoperability. So while users are operating under the belief they are autonomous, they are actually all working on top of the one infrastructure with a single data management process that enables the direct sharing of data (given the appropriate permissions).
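The idea of several autonomous departmental systems sharing one installation, with data passed directly between them when permissions allow, can be sketched as follows. This is a toy model rather than the architecture of any real product: the `Installation` class, its methods, and the department and field names are all hypothetical.

```python
class Installation:
    """One software installation hosting several departmental designs (hypothetical)."""

    def __init__(self):
        self._data = {}       # (department, patient_id, field) -> value
        self._grants = set()  # (owner_department, field, reader_department)

    def write(self, department, patient_id, field, value):
        """A department records data in its own part of the shared store."""
        self._data[(department, patient_id, field)] = value

    def grant(self, owner, field, reader):
        """The owning department allows another department to read one field."""
        self._grants.add((owner, field, reader))

    def read(self, reader, owner, patient_id, field):
        """Read directly from the shared store; no message exchange is involved."""
        if reader != owner and (owner, field, reader) not in self._grants:
            raise PermissionError(f"{reader} may not read {owner}.{field}")
        return self._data[(owner, patient_id, field)]


site = Installation()
site.write("oncology", "p1", "ecog_status", 1)
site.grant("oncology", "ecog_status", "radiotherapy")
shared = site.read("radiotherapy", "oncology", "p1", "ecog_status")
print(shared)
```

A department without a grant (say, a hypothetical "cardiology") would receive a `PermissionError` from the same `read` call, which is the sense in which sharing is direct but still governed by permissions.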

IA-CIS do not of themselves solve the problems of interoperability between different systems supplied by various vendors. Hence, it is unavoidable that a CERP system and an IA-CIS will have to use some external coding standard to share data with each other. Methods for solving this problem are well established, using HL7 or direct procedure calls. But within the IA-CIS paradigm the problem is solved at a highly efficient level.

The IA-CIS also has another significant advantage in that it eliminates silos of data and the need to maintain and support multiple systems.

In this architecture it is important not to take a stance that assumes all data needs to be available in the one place. Most data needs to be usable by the people who collect it, and then appropriately selected pieces passed on to those who have secondary use purposes. Just as the results of every research experiment are not required by the back office, so not every action taken by the clinical staff needs to be defined by the back office. Autonomy at the front office with a requirement to deliver the essentials to the back office enhances the efficiency of both communities.

There is an argument in some circles that there needs to be a single source of truth which can only be provided by a CERP. This is a specious need. The extensive dispersion of care into many disciplines with many different technologies has already led to an irreversible distribution of data across multiple information systems, such as radiology, pathology and pharmacy. Advocates for this position, who already operate multiple systems successfully, use this as an argument to exclude evaluating the value of locally optimised systems. The solution proposed here is to ensure that local systems have appropriate interoperability.


Part Three: Functional Specifications of IA Clinical Information Systems

This is Part Three of the 4-part series, Immediate Adaptability (IA).

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

The intrinsic definition of an Immediately Adaptable system is in the name: immediate. We consider this to be a period of hours or days, not weeks, months, or years! However, the requirements as defined by the complaints to the professional lists and elsewhere have a wider-ranging scope.

The first level of problem is the concept of the EMR, which describes a medical record as placed into an electronic storage bin instead of a filing cabinet. Such an EMR fits the CERP model, which is focused on collecting content and storing it on a large scale and then processing the data for highly stable requirements, e.g. billing. Furthermore, the CERP methodology requires decomposition of data into normalised storage structures of permanent definition and storage representation. Because “efficient” storage of data is paramount, the processes of “capturing” the data and then “reusing” the data, that is, moving the data from the context in which it is collected to the contexts in which it is reused, are blighted by the effort and complexity of programming for the internal movement of the data. This involves elaborate methods for putting data into fixed data structures and reading data back out whenever it is called for, with the storage mechanisms tightly coupled to the data capture and display processes. Modern web technologies enable a significant loosening of this coupling, but the CERP developers have been slow to embrace these innovations due to their years of investment in older software engineering methods.

Given their engineering paradigm, ultimately, the greatest limitation of the CERP approach is the effort, cost, and risk associated with changing the structures by which the data is defined and stored when a new data element needs to be inserted into a design, or when the semantic meaning of an existing data item changes. This requires changing the underlying storage design and creating the code to store that data element and to retrieve it at all the points where it is reused, without disrupting any of the existing processing. With due cause, the large vendors, whose systems have thousands of data tables that are beyond the scope of any one person or even small team to comprehend, are aware that their data management is brittle, and that even a single accident in a new design or coding can bring down their house of cards. This is one of the crucial reasons for resisting modifications to CERP systems.

The process of separating the capture of data in one context, storing it in a rigid data structure, and then moving it for reuse into another context is fundamental to moving away from the idea of an EMR towards one of a Clinical Information System (CIS). A CIS is a software technology that is integrated into the processes of the users so as to support their work in the most active manner possible. Crucially it is NOT a system that cements in the processes of data collection and dissemination as found in a CERP EMR system. A CIS matches the users’ requirements for both the flow of data from one context to another, and their movement through the activities of work that the users have to perform. A CIS supports both dataflow and workflow for the user. The third part of the triumvirate of a CIS is screen layout and design. The optimal design of a CIS is a dominant part of clinical usability research, but, due to the nature of the CERP methodology, usability discoveries have, at best, a slow journey in seeping into such systems.

An IA-CIS has to be easily and readily changeable, that is, it has to accept real-time change (or nearly so). An underlying architectural consequence of real-time changeability is that it has to have dynamic data structures along with revision control that does not affect the previous versions of storage organisation or access to previously recorded data, so that real-time use is uninterrupted.
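One way to picture dynamic data structures with revision control is a form whose design is a list of versions, where every stored record remembers the version it was captured under. The sketch below is an assumption-laden illustration, not a description of an actual IA-CIS implementation: the `VersionedForm` class, its methods, and the field names are all hypothetical.

```python
class VersionedForm:
    """A form whose design can change while earlier records stay readable (hypothetical)."""

    def __init__(self, fields):
        self.versions = [tuple(fields)]  # version 0 of the design
        self.records = []                # list of (version_index, data dict)

    @property
    def current(self):
        return len(self.versions) - 1

    def add_field(self, name):
        """Create a new design version; earlier versions are left untouched."""
        self.versions.append(self.versions[-1] + (name,))
        return self.current

    def save(self, **data):
        """Store a record against the current design version."""
        unknown = set(data) - set(self.versions[self.current])
        if unknown:
            raise KeyError(f"fields not in current design: {sorted(unknown)}")
        self.records.append((self.current, dict(data)))

    def read_all(self):
        """Show every record in the current shape; fields added later read as None."""
        fields = self.versions[self.current]
        return [{f: data.get(f) for f in fields} for _, data in self.records]


form = VersionedForm(["weight_kg"])
form.save(weight_kg=70)                # captured under version 0
form.add_field("smoker")               # design changes while the system is in use
form.save(weight_kg=82, smoker=False)  # captured under version 1
rows = form.read_all()
print(rows)
```

Because the old record is never rewritten, the change takes effect immediately and nothing recorded before it is lost or migrated.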

We have named the data flow requirement of IA-CIS: native interoperability. This is the idea that data created or input at one point in a data flow can be referred to by its name wherever it needs to be reused. There should be no need to write code to read tables to transfer such data, but rather it should behave more like a link. Thus, when you invoke the name of the data at a time for its reuse in a new context, it appears at that point of invocation, without needing to do anything else. This introduces interesting questions about the protocols for naming data but stable solutions are available to solve them.
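The link-like behaviour described above can be sketched with a simple name lookup: a value is captured once under a name, and any other context reuses it just by mentioning that name. This is a minimal illustration only; the `{screen.item}` placeholder syntax, the `render` function, and the data name `triage.pulse` are assumptions made for the example, not a naming protocol taken from any real system.

```python
import re

# Placeholders look like {screen.item}; this naming scheme is an assumption.
PLACEHOLDER = re.compile(r"\{([\w.]+)\}")


def render(template, store):
    """Replace each named placeholder with the value currently held under that name."""
    return PLACEHOLDER.sub(lambda match: str(store[match.group(1)]), template)


store = {}
store["triage.pulse"] = 88   # captured once, in the triage context

# A different context reuses the data simply by invoking its name.
ward_view = "Pulse at triage: {triage.pulse} bpm"
first = render(ward_view, store)

store["triage.pulse"] = 92   # corrected at the point of capture
second = render(ward_view, store)

print(first)
print(second)
```

No transfer code is written for the ward view: it holds only the name, so a correction at the point of capture appears wherever the name is invoked.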

IA implies real-time design, which requires a design toolkit for specifying all the requirements of the user, including data definition, screen layout and behaviours, business rules, data flow, and workflow. Underlying these design utilities needs to be a design language universal to all CIS designs that becomes the specification of the operational system. This has an important consequence: the design of the users’ system is independent of the software that manages their data. The benefit is that a design can be changed without affecting software code, and code can be changed without necessarily affecting designs. Software maintenance is done independently of any CIS design processes. This radically simplifies the nature of system maintenance as there is no enmeshment of a given system design and the program code required to implement it.

Furthermore, it opens the door for usability research to be directly incorporated into an operational system. To support usability research the only software engineering requirement is to have a library function that performs according to the usability task being investigated. If the feature to be investigated is not available in the design toolkit then the only software engineering task is to enhance the design tool to carry the function as an element of the design toolkit. To create an executable instantiation of the design as defined in the design language there need to be libraries for all design functions and auto-generation of data structures that are invoked at the point of real-time system generation.
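The separation between a design and the code that executes it can be sketched by treating the design as plain data and generating the running behaviour from a library at the moment it is needed. Everything here is hypothetical and heavily simplified: the dictionary-based "design language", the `LIBRARY` of field behaviours, the `instantiate` function, and the example form are illustrative assumptions, not the design language of any actual IA-CIS.

```python
# Library of reusable behaviours, maintained by software engineers (hypothetical).
LIBRARY = {
    "integer": lambda raw: int(raw),
    "text": lambda raw: str(raw).strip(),
    "yes_no": lambda raw: str(raw).strip().lower() in ("y", "yes", "true"),
}

# A design written by the user community; it contains no program code.
design = {
    "form": "falls_assessment",
    "fields": [
        {"name": "age", "type": "integer"},
        {"name": "uses_walking_aid", "type": "yes_no"},
        {"name": "notes", "type": "text"},
    ],
}


def instantiate(design, library=LIBRARY):
    """Generate a data-entry function from a design at run time."""
    converters = {f["name"]: library[f["type"]] for f in design["fields"]}

    def enter(raw):
        # Apply the library behaviour chosen in the design to each raw input.
        return {name: convert(raw[name]) for name, convert in converters.items()}

    return enter


enter = instantiate(design)
record = enter({"age": "81", "uses_walking_aid": "Yes", "notes": " unsteady gait "})
print(record)
```

Adding a field means editing only `design`; supporting a new kind of behaviour means adding one entry to `LIBRARY`. Neither change touches the other, which is the independence of design and code that the text argues for.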

While not an absolute requirement for an IA-CIS, built-in analytics are needed to meet the user demand for being able to operationalise Continuous Process Improvement. The role of using a CIS for direct operational workflow is fundamental to its conception. However, optimising the CIS over time requires the analysis of the behaviour of the CIS and the users as an integrated entity. This analysis is best achieved by having analytical tools built into the CIS that can actively monitor the CIS and its users to establish the value of changes as they are implemented. Omitting analytics functionality as an intrinsic part of the CIS will severely limit the ability of the user team to identify behaviours of the system (technology + staff) that warrant change and later measure those changes.
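As a rough sketch of what measuring a change could look like, suppose the CIS logs how long a task takes together with the design version in use at the time. The figures below are invented purely for illustration, and the `median_by_version` helper and the shape of the event log are assumptions, not features of any particular system.

```python
from statistics import median

# Invented event log: (design_version, seconds taken to complete the task).
events = [
    (1, 95), (1, 110), (1, 102),
    (2, 70), (2, 64), (2, 81),
]


def median_by_version(events):
    """Group task times by design version and return the median for each."""
    grouped = {}
    for version, seconds in events:
        grouped.setdefault(version, []).append(seconds)
    return {version: median(times) for version, times in grouped.items()}


summary = median_by_version(events)
print(summary)
```

Comparing the medians for consecutive versions gives the user team a simple, built-in way to judge whether a design change helped before deciding on the next one.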


Part Two: Objections to Immediate Adaptability

This is Part Two of the 4-part series, Immediate Adaptability (IA).

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

EMR systems built by large vendors have code development operations similar to those of Enterprise Resource Planning (ERP) ventures such as SAP, arguably the most successful ERP provider globally. So I will label big vendor technology Clinical ERP or CERP. Smaller but older vendors no doubt have similar models. Only recent vendors appearing in the last 10 years are likely to have different approaches.

The problems with IA for CERP are that it ostensibly requires the vendor to:

1. Give control of the design of their CERP to the user community.
2. Have highly qualified programmers on call to respond when users require changes.
3. Have built-in mechanisms to manage automatic version control, including roll back.
4. Have built-in mechanisms to manage data such that data collected before a given change remains available after the change.
5. Change their interoperability functions on-demand to send and receive data from dynamically changing EMRs.
6. Have confidence that their technology can undergo continuous changes.

These criteria would not just increase the cost of maintaining CERP technology, but also raise protests from vendors that maintaining large systems cannot be sustained intellectually, as the systems are too complex to change rapidly without creating unexpected consequences. This protest would seem to be entirely valid. It is this very scale and complexity that inhibits changes to “usability” beyond the minimum, not to mention support for IA. The best-of-breed system vendors have done a better job with usability because they do not suffer the same complexity problem and their aim is to deliver a smaller range of functionality; however, IA would still be a difficult concern for them.

The technical difficulty in delivering IA can be discerned from the process of creating a CERP system in the first place. The process is a sequence of tasks consisting of requirements gathering, systems analysis, data modelling, code writing, systems testing, and deployment. The CERP providers have escaped part of this process by removing the first two steps on the basis that they have built so many systems they know the generalisations of requirements and analysis. Indeed they have built large code repositories relying on these generalisations and are unwilling to change them because changes will affect so many of their customers. Moreover, the code bases are so large that they are unwilling to risk a large number of unexpected consequences of changes.

The CERP approach was state-of-the-art in the general IT industry of the 1980s but is outdated for most modern purposes. The method suits large volume data transactions with stable patterns of work and processing, which may be acceptable for back office work, including in health organisations. It does not suit the needs of dynamic workplaces where workflow is as important as data capture, data volumes are relatively low, local data flow and analytics are crucial for efficiency, and staff need to run continuous process improvement. In fact, imposing immutable CERPs on patient-facing clinical operations blocks processes to create clinical efficiencies and productivity, as is frequently testified in the protests from clinicians in many fora.

The professional lists have many discussions about how multiple systems need more cross-consistency, because as staff move from one site to another, they have an extra cognitive load to learn how to use the many different systems. Training for CERP systems is both high cost and difficult, hence the complaints. A system optimised for IA will be customised for its community of use and so someone working across multiple communities will need to train on different IA systems. Would the same objection apply? Most likely not:

  • CERP “solutions” that fit the local workflow poorly will need significant workarounds in addition to the standard training that still has to be learnt by migratory workers;
  • Claims that the same technology from the same vendor have the same workflow and functions are often spurious – there are cases where two systems ostensibly the same cannot even communicate with each other;
  • Locally designed systems are truly optimal for the local workflow and so training on them is about learning how the local community actually works, surely a necessary criterion for successful health care;
  • Training on locally designed systems has low costs for local users and modest costs for new users;
  • Senior staff responsible for the training of junior staff use the system to train them in the processes of work. It is often the case that a CERP system is training staff in processes that are considered undesirable, whereas an IA system would enable the senior staff to create an ideal training system. Over time this would lead to better standardised work practices where appropriate, and easier adoption of these better practices as they are defined by the professional community, because the IT behaviour is immediately adaptable.


What is Natural Language Processing (NLP) for Clinical Texts?

There are many claims in medical technology circles that software does NLP. Not all of these claims are valid.

NLP has a long history, and in its earliest days it was driven by linguists wanting to automate the grammar rule analyses they do on language corpora. However, from the 1980s onwards linguists have been superseded by computer scientists and the algorithms they invent to manipulate data. At the same time as the web began to expand, the computer scientists could see that they could bypass the horrendously difficult linguistic pathway of analysing the structure of language by throwing it aside and just working on the statistical characteristics of language. This led to two pathways for processing strategies. The first is called Information Retrieval (IR), which treats documents purely as a bag of words, and the second is statistical NLP (SNLP), which identifies the structure in language by its statistical characteristics rather than by grammatical rules.

So far IR has won the day for popularity, as is best manifested by the Google search engine. However, SNLP is making its way very effectively in a slow wind-up to superseding IR.

The basic feature of IR is that it recognises documents that cover a topic of interest. It relies on achieving an exact match to the string of letters the user types into the search engine. It has no understanding of the linguistic or conceptual meaning of those letters, so it can’t tell that the singular and plural forms of a word are related. One technology that has been devised to overcome the inherent limitation of IR is searching using Regular Expressions rather than strings. This enables the user to design a matching string using multiple patterns rather than the literal string in a Google search. Regular expressions extend the variety of strings you can search for, but they don’t escape the inherent limitation that a fixed pattern of strings is searched for.
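The difference between exact matching and regular expressions, and the limitation they share, can be shown with a few lines of Python. The sample sentences and the two helper functions below are made up for illustration; only the standard `re` module is used.

```python
import re

# Invented snippets standing in for clinical report text.
reports = [
    "Biopsy shows a malignant tumour of the left breast.",
    "Multiple tumours identified in the liver.",
    "Tumor margins are clear.",
    "Neoplasm of the sigmoid colon.",
]


def tokens(text):
    """Lower-case a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def exact_match(query, docs):
    """Indexes of documents containing the query as an exact word."""
    return [i for i, doc in enumerate(docs) if query in tokens(doc)]


def regex_match(pattern, docs):
    """Indexes of documents matching a regular expression, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if rx.search(doc)]


exact_hits = exact_match("tumour", reports)
regex_hits = regex_match(r"\btumou?rs?\b", reports)
print(exact_hits)   # only the singular British spelling is found
print(regex_hits)   # plural and American spelling are found as well
```

The pattern widens the net to spelling and plural variants, but the fourth report is still missed because "Neoplasm" was never written into the pattern: the search finds only what has already been defined.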

NLP has a different origin to IR and holds that the structure of the sentence is important to understanding it and that the semantics of the words also affect the meaning. Hence NLP relies on methods for parsing sentences into their grammatical components and then retrieving content based on understanding that structure. SNLP has bolstered NLP by showing that some patterns of language usage are defined by statistical properties that can be exploited to correctly identify the grammatical role of words in a sentence. A good example is the use of statistical patterns to recognise the part of speech of an unknown word in a sentence by the behaviour of the words around it.
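The part-of-speech example can be made concrete with a deliberately tiny statistical tagger. It counts which tag tends to follow which in a hand-tagged toy corpus, and uses those counts to guess the tag of a word it has never seen. The corpus, the tag names, and the `tag_sentence` function are invented for illustration; real SNLP systems use far larger annotated corpora and richer models.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (invented for illustration).
tagged = [
    [("the", "DET"), ("tumour", "NOUN"), ("invades", "VERB"), ("the", "DET"), ("muscle", "NOUN")],
    [("a", "DET"), ("lesion", "NOUN"), ("involves", "VERB"), ("the", "DET"), ("margin", "NOUN")],
    [("the", "DET"), ("mass", "NOUN"), ("compresses", "VERB"), ("a", "DET"), ("nerve", "NOUN")],
]

word_tags = defaultdict(Counter)    # word -> counts of the tags it was seen with
transitions = defaultdict(Counter)  # previous tag -> counts of the tag that followed
for sentence in tagged:
    previous = "<START>"
    for word, tag in sentence:
        word_tags[word][tag] += 1
        transitions[previous][tag] += 1
        previous = tag


def tag_sentence(words):
    """Tag known words by their most frequent tag and unknown words by context."""
    tags, previous = [], "<START>"
    for word in words:
        if word in word_tags:
            tag = word_tags[word].most_common(1)[0][0]
        else:
            # Unknown word: choose the tag that most often follows the previous tag.
            tag = transitions[previous].most_common(1)[0][0]
        tags.append(tag)
        previous = tag
    return tags


# "carcinoma" and "serosa" never appear in the toy corpus.
result = tag_sentence(["the", "carcinoma", "invades", "the", "serosa"])
print(result)
```

Both unseen words are tagged as nouns purely because, in the counts, a noun is what most often follows a determiner. No rule mentioning "carcinoma" or "serosa" was ever written.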

Unfortunately, the claim that a system uses NLP is so widespread that the value of the methods is being obscured.

Some claims to be NLP that are false:

1. Use of strings in rules to find desired content.

2. Use of regular expressions to find desired content.

3. Use of IR to find content.

The reason strings and rules are not classifiable as NLP is crucial to understanding the true values of SNLP.

Why are the differences important?

String and rule-based IR systems have the advantage that they are quick and cheap to build, but their crucial disadvantage is that they can only identify what has already been defined. They also become encumbered once their rule set becomes too large, as the effect of changes can’t be well predicted due to interaction between the rules. They also have a pronounced tendency to over-produce results, yielding many false positives in their search.

Statistical NLP has the key advantage that it can identify content it has never seen, so it has a serious discovery advantage. Furthermore as more knowledge is acquired it can be incorporated into the processing stream ensuring that it has a growing knowledge base. Its disadvantage is that it needs a gold standard annotation text to reach very high accuracies and that building the computational model takes more resources than the IR methods, so you need more extensive knowledge and training in the methods to exploit them effectively. Rule systems that have been in use for many years will serve their restricted objectives well and provide better Precision (correctly identifying the items requested, and not retrieving too many false hits) than SNLP systems initially. Ultimately though they will always be behind in Recall (finding all the items requested) and eventually slip behind on Precision once the SNLP system has had sufficient training.
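Precision and Recall as used above have standard definitions that are easy to compute once you know which items were retrieved and which were actually wanted. The report identifiers and the two result sets below are invented to illustrate the trade-off; they are not measurements of any system.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four reports are truly wanted.
relevant = {"r1", "r2", "r3", "r4"}

rule_hits = {"r1", "r2"}                # no false hits, but half the reports missed
snlp_hits = {"r1", "r2", "r3", "r5"}    # finds more, with one false hit

rule_scores = precision_recall(rule_hits, relevant)
snlp_scores = precision_recall(snlp_hits, relevant)
print(rule_scores)
print(snlp_scores)
```

In this made-up case the rule system scores higher on Precision while the statistical system scores higher on Recall, which is the initial pattern the paragraph describes.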

So the next time you see someone touting their technology as NLP, check whether it is really a rule-based IR technology or truly statistical.