Part Two: Objections to Immediate Adaptability

This is Part Two of the 4-part series, Immediate Adaptability (IA).

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

EMR systems built by large vendors have code development operations similar to those of Enterprise Resource Planning (ERP) ventures such as SAP, arguably the most successful ERP provider globally. So I will label big-vendor technology Clinical ERP, or CERP. Smaller but older vendors no doubt have similar models; only recent vendors, appearing in the last 10 years, are likely to have different approaches.

The problems with IA for CERP are that it ostensibly requires the vendor to:

1. Give control of the design of their CERP to the user community.
2. Have highly qualified programmers on call to respond when users require changes.
3. Have built-in mechanisms to manage automatic version control, including rollback.
4. Have built-in mechanisms to manage data such that data collected before a given change remains available after the change (a sketch of points 3 and 4 follows this list).
5. Change their interoperability functions on-demand to send and receive data from dynamically changing EMRs.
6. Have confidence that their technology can undergo continuous changes.
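
As a rough illustration of points 3 and 4, here is a minimal Python sketch of one possible approach: each locally designed form is stored as an immutable version, every record is tagged with the version in force when it was captured, and rollback simply re-points to an earlier version. The names and structures are hypothetical, not any vendor's actual design.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass(frozen=True)
    class FormVersion:
        """An immutable snapshot of a locally designed form."""
        version: int
        fields: tuple            # e.g. ("pulse", "bp_systolic")

    @dataclass
    class Record:
        form_version: int        # the version this data was captured under
        values: Dict[str, str]

    class FormRegistry:
        def __init__(self):
            self.versions: List[FormVersion] = []
            self.active: int = 0
            self.records: List[Record] = []

        def publish(self, fields) -> int:
            """A local change creates a new version; old versions are never altered."""
            v = FormVersion(version=len(self.versions) + 1, fields=tuple(fields))
            self.versions.append(v)
            self.active = v.version
            return v.version

        def roll_back(self, version: int) -> None:
            """Rollback is just re-pointing to an earlier, still-intact version."""
            assert 1 <= version <= len(self.versions)
            self.active = version

        def capture(self, values: Dict[str, str]) -> None:
            """New data is tagged with the version in force when it was captured."""
            self.records.append(Record(self.active, values))

        def read_all(self):
            """Data captured before a change remains readable under its own version."""
            for r in self.records:
                schema = self.versions[r.form_version - 1]
                yield schema.fields, r.values

    # Usage: add a field mid-stream, then roll back, without losing earlier data.
    reg = FormRegistry()
    reg.publish(["pulse", "bp_systolic"])
    reg.capture({"pulse": "72", "bp_systolic": "120"})
    reg.publish(["pulse", "bp_systolic", "pain_score"])   # local change
    reg.capture({"pulse": "80", "bp_systolic": "130", "pain_score": "3"})
    reg.roll_back(1)                                      # rollback, old data intact
    print(list(reg.read_all()))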

These criteria would not just increase the cost of maintaining CERP technology; they would also raise protests from vendors that maintaining large systems cannot be sustained intellectually, as the systems are too complex to change rapidly without creating unexpected consequences. This protest would seem to be entirely valid. It is this very scale and complexity that inhibits changes to “usability” beyond the minimum, let alone support for IA. The best-of-breed system vendors have done a better job with usability because they do not suffer the same complexity problem and their aim is to deliver a smaller range of functionality; however, IA would still be a difficult concern for them.

The technical difficulty in delivering IA can be discerned from the process of creating a CERP system in the first place. The process is a sequence of tasks: requirements gathering, systems analysis, data modelling, code writing, systems testing, and deployment. The CERP providers have escaped part of this process by removing the first two steps, on the basis that they have built so many systems that they know the generalisations of requirements and analysis. Indeed, they have built large code repositories relying on these generalisations and are unwilling to change them, because changes will affect so many of their customers. Moreover, the code bases are so large that they are unwilling to risk a large number of unexpected consequences of changes.

The CERP approach was state-of-the-art in the general IT industry of the 1980s but is outdated for most modern purposes. The method suits large-volume data transactions with stable patterns of work and processing, which may be acceptable for back-office work, including in health organisations. It does not suit the needs of dynamic workplaces where workflow is as important as data capture, data volumes are relatively low, local data flow and analytics are crucial for efficiency, and staff need to run continuous process improvement. In fact, imposing immutable CERPs on patient-facing clinical operations blocks efforts to create clinical efficiencies and productivity, as clinicians frequently testify in many fora.

The professional lists have many discussions about how multiple systems need more cross-consistency: as staff move from one site to another, they carry an extra cognitive load in learning how to use the many different systems. Training for CERP systems is both high-cost and difficult, hence the complaints. A system optimised for IA will be customised for its community of use, so someone working across multiple communities will need to train on different IA systems. Would the same objection apply? Most likely not:

  • CERP “solutions” that fit the local workflow poorly need significant workarounds on top of the standard training that migratory workers still have to learn;
  • Claims that the same technology from the same vendor has the same workflow and functions are often spurious – there are cases where two systems that are ostensibly the same cannot even communicate with each other;
  • Locally designed systems are truly optimal for the local workflow, so training on them is about learning how the local community actually works, surely a necessary criterion for successful health care;
  • Training on locally designed systems carries little cost for local users and modest costs for new users;
  • Senior staff responsible for the training of junior staff use the system to train them in the processes of work. It is often the case that a CERP system trains staff in processes that are considered undesirable, whereas an IA system would enable senior staff to create an ideal training system. Over time this would lead to better standardised work practices where appropriate, and to easier adoption of those better practices as they are defined by the professional community, because the IT behaviour is immediately adaptable.

Part One: Immediate Adaptability
Part Two: Objections to Immediate Adaptability
Part Three: Functional Specifications of IA Clinical Information Systems
Part Four: A Generic Architecture for IA-CIS – Refactoring the EMR Model

What is Natural Language Processing (NLP) for Clinical Texts?

There are many claims in medical technology circles that software does NLP. Not all of these claims are valid.

NLP has a long history. In its earliest days it was driven by linguists wanting to automate the grammar-rule analyses they perform on language corpora. From the 1980s onwards, however, linguists have been superseded by computer scientists and the algorithms they invent to manipulate data. As the web began to expand, computer scientists saw that they could bypass the horrendously difficult linguistic pathway of analysing the structure of language by throwing it aside and working only on the statistical characteristics of language. This led to two processing strategies. The first is Information Retrieval (IR), which treats documents purely as a bag of words; the second is statistical NLP (SNLP), which identifies the structure in language by its statistical characteristics rather than by grammatical rules.

So far IR has won the day for popularity, as best manifested by the Google search engine. However, SNLP is slowly but very effectively winding up to supersede IR.

The basic feature of IR is that it recognises documents that cover a topic of interest. It relies on achieving an exact match to the string of letters the user types into the search engine. It has no understanding of the linguistic or conceptual meaning of those letters, so it can’t tell the difference between the singular and plural forms of a word. One technology that has been devised to overcome this inherent limitation of IR is searching with regular expressions rather than strings. This lets the user design a matching expression covering multiple patterns rather than the literal string of a Google search. Regular expressions extend the variety of strings you can search for, but they don’t escape the inherent limitation that a fixed pattern of strings is being searched for.
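
To make the distinction concrete, here is a small Python sketch, using only the standard re module, of the difference between a literal string search and a regular-expression search, and of the limitation they share: both only find patterns someone has already thought to write down. The example notes are invented for illustration.

    import re

    notes = [
        "Patient reports severe headaches for two weeks.",
        "Patient reports a severe headache this morning.",
        "Patient reports cephalalgia, worse at night.",      # synonym, never matched below
    ]

    # Literal string search: misses the singular form entirely.
    literal_hits = [n for n in notes if "headaches" in n]

    # Regular expression: one pattern covers singular and plural ...
    pattern = re.compile(r"\bheadaches?\b", re.IGNORECASE)
    regex_hits = [n for n in notes if pattern.search(n)]

    print(len(literal_hits))  # 1 -- plural only
    print(len(regex_hits))    # 2 -- singular and plural
    # ... but neither approach can find "cephalalgia": the fixed pattern
    # has no notion of meaning, only of the characters it was told to match.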

NLP has a different origin from IR and holds that the structure of a sentence is important to understanding it, and that the semantics of the words also affect the meaning. Hence NLP relies on methods for parsing sentences into their grammatical components and then retrieving content based on an understanding of that structure. SNLP has bolstered NLP by showing that some patterns of language usage are defined by statistical properties that can be exploited to correctly identify the grammatical role of words in a sentence. A good example is the use of statistical patterns to recognise the part of speech of an unknown word from the behaviour of the words around it.
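
As a rough illustration of that last point, the toy Python sketch below tags an unseen token purely from tag-transition statistics gathered over a tiny hand-tagged sample. A real SNLP tagger uses far richer features and vastly more data, but the principle is the same: the behaviour of the surrounding words, not a dictionary entry, determines the guess. The sample and the word "flumoxib" are invented.

    from collections import Counter, defaultdict

    # A tiny hand-tagged sample (word, part of speech) standing in for a corpus.
    tagged = [
        ("the", "DET"), ("nurse", "NOUN"), ("prescribed", "VERB"),
        ("the", "DET"), ("tablet", "NOUN"), ("prescribed", "VERB"),
        ("a", "DET"), ("dose", "NOUN"), ("was", "VERB"),
    ]

    word_tags = defaultdict(Counter)     # how each known word is usually tagged
    transitions = defaultdict(Counter)   # which tag tends to follow which tag

    prev = "<START>"
    for word, tag in tagged:
        word_tags[word][tag] += 1
        transitions[prev][tag] += 1
        prev = tag

    def tag_sentence(words):
        """Greedy tagger: known words take their most frequent tag;
        unknown words take the tag that most often follows the previous tag."""
        prev, out = "<START>", []
        for w in words:
            if w in word_tags:
                tag = word_tags[w].most_common(1)[0][0]
            else:
                following = transitions[prev].most_common(1)
                tag = following[0][0] if following else "NOUN"  # crude fallback
            out.append((w, tag))
            prev = tag
        return out

    # "flumoxib" has never been seen, yet it is tagged NOUN because in the
    # sample a determiner ("the", "a") is almost always followed by a noun.
    print(tag_sentence(["the", "flumoxib", "was", "prescribed"]))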

Unfortunately the claim that a system uses NLP is so widespread that the value of the methods is being obscured.

Some technologies falsely claimed to be NLP:

1. Use of strings in rules to find desired content.

2. Use of regular expressions to find desired content.

3. Use of IR to find content.

The reason strings and rules are not classifiable as NLP is crucial to understanding the true value of SNLP.

Why are the differences important?

String- and rule-based IR systems have the advantage that they are quick and cheap to build, but their crucial disadvantage is that they can only identify what has already been defined. They also become encumbered once their rule set grows too large, as the effect of changes can’t be well predicted due to interactions between the rules. They also have a pronounced tendency to over-produce results, yielding many false positives in their searches.
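
The false-positive problem is easy to see in clinical text, where a mention of a condition is very often a negation, a risk statement, or a family-history remark. A small sketch of a hypothetical keyword rule, with invented example notes:

    notes = [
        "Chest X-ray shows right lower lobe pneumonia.",          # true positive
        "No evidence of pneumonia on today's imaging.",           # negated
        "Vaccinated last year to reduce the risk of pneumonia.",  # risk only
        "Family history: mother treated for pneumonia in 2003.",  # not the patient
    ]

    # A typical string rule: flag any note that mentions the term.
    flagged = [n for n in notes if "pneumonia" in n.lower()]

    print(len(flagged))  # 4 -- the rule fires on all of them,
                         # but only the first note reports an actual finding.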

Statistical NLP has the key advantage that it can identify content it has never seen, so it has a serious discovery advantage. Furthermore, as more knowledge is acquired it can be incorporated into the processing stream, ensuring a growing knowledge base. Its disadvantages are that it needs gold-standard annotated text to reach very high accuracies, and that building the computational model takes more resources than the IR methods, so you need more extensive knowledge of and training in the methods to exploit them effectively. Rule systems that have been in use for many years will serve their restricted objectives well and will initially provide better Precision (correctly identifying the items requested, and not retrieving too many false hits) than SNLP systems. Ultimately, though, they will always be behind on Recall (finding all the items requested) and will eventually slip behind on Precision once the SNLP system has had sufficient training.
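
A worked example of the two measures, assuming a hypothetical search over 1,000 clinical notes of which 100 truly contain the condition of interest; the counts are invented for illustration.

    # Hypothetical counts: the system returns 90 notes, of which 72 are correct.
    true_positives = 72    # relevant notes the system returned
    false_positives = 18   # irrelevant notes it returned anyway
    false_negatives = 28   # relevant notes it missed (100 - 72)

    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)

    print(f"Precision: {precision:.2f}")  # 0.80 -- how clean the results are
    print(f"Recall:    {recall:.2f}")     # 0.72 -- how complete the results are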

So the next time you see someone touting their technology as NLP, check whether it is really a rule-based IR technology or truly statistical.