CN112836492A

CN112836492A - Medical project name alignment method

Info

Publication number: CN112836492A
Application number: CN202110132054.3A
Authority: CN
Inventors: 王博; 刘升平; 梁家恩
Original assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date: 2021-01-30
Filing date: 2021-01-30
Publication date: 2021-05-25
Anticipated expiration: 2041-01-30
Also published as: CN112836492B

Abstract

The invention relates to a medical project name alignment method, comprising the following steps: acquiring medical record information that includes a source item in a source text; acquiring one or more target items in an order list corresponding to the medical record information, and calculating the quantity and item frequency of each target item; determining, according to the item frequency, the top-N target items corresponding to the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5; calculating the text distance between the source item and each corresponding target item; obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule; and inputting a source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned.

Description

Medical project name alignment method

Technical Field

The invention relates to the technical field of data processing, in particular to a medical project name alignment method.

Background

The same medical item is often referred to by different names in the different documents of a medical record. For example, an "abdominal magnetic resonance scan" is named "abdominal MR" in the order sheet, "magnetic resonance scan" in the billing invoice, and "1.5 MR (abdominal)" in the examination slip. Effectively identifying and aligning the names of the same medical item across documents plays a key role in systems for medical information management, expense auditing and the like.

Most current schemes rely only on string comparison algorithms, such as edit distance or longest common substring, to solve this problem.

String comparison alone can hardly solve the medical item alignment problem. For example, although "abdomen MR" and "abdomen magnetic resonance flat scan" are the same item, the edit distance between them is very large, larger than the edit distance between "abdomen CT flat scan" and "abdomen magnetic resonance flat scan", which easily leads to misjudgment.
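The misjudgment described above can be reproduced with a minimal sketch. The Levenshtein implementation below is a standard textbook version, not code from the application, and the English item names stand in for the original names, so the exact distances are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


reference = "abdomen magnetic resonance flat scan"
same_item = edit_distance("abdomen MR", reference)
different_item = edit_distance("abdomen CT flat scan", reference)

# The name of the same item is farther from the reference than the name of
# a different item, so ranking by edit distance alone picks the wrong match.
print(same_item, different_item, same_item > different_item)
```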

In addition, string comparison cannot handle one-to-many relationships involving item sets. For example, the "coagulation set" in the medical order corresponds to several items in the charge list, such as "plasma prothrombin time measurement (PT)", "thrombin time measurement (TT)", "activated partial thromboplastin time measurement (APTT)" and "plasma fibrinogen measurement", and the sub-items included in a combination item may differ between hospitals.

Disclosure of Invention

The invention aims to provide a medical project name alignment method that addresses the defects of the prior art.

To this end, in a first aspect, the present invention provides a medical project name alignment method, including:

acquiring medical record information including source items in a source text;

acquiring one or more target items in an order list corresponding to the medical record information, and calculating the quantity and item frequency of each target item;

determining, according to the item frequency, the top-N target items corresponding to the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5;

calculating the text distance between the source item and the corresponding target item;

obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule;

and inputting the source text to be aligned into the trained model to obtain a target item in an order list corresponding to the source text to be aligned.

Preferably, the obtaining of the trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule, and a preset third rule specifically includes:

determining a first target item and a second target item corresponding to the source item according to the target items which are ranked as the top N and correspond to the source item; the item frequency of the first target item is greater than the item frequency of the second target item;

calculating a first text distance between the source item and the first target item and a second text distance between the source item and the second target item;

when the first text distance, the second text distance and the inverse alignment frequency meet a preset first rule, determining that the first target item and the source item are aligned.

Preferably, the first rule is specifically:

the first text distance is greater than a preset first threshold, the first text distance is greater than the product of a preset second threshold and the second text distance, and the inverse alignment frequency of the first target item is greater than a preset third threshold.

Preferably, the method further comprises:

when the first text distance, the second text distance and the inverse alignment frequency do not meet the preset first rule, sorting the target items corresponding to the source item by item frequency, and determining n target items;

calculating the text distance between the source item and each target item in the corresponding n target items to obtain the first to nth text distances from the source item to the n target items;

when the jth target item meets a preset second rule, determining that the jth target item is aligned with the source item; wherein j is less than or equal to n.

Preferably, the second rule is specifically:

the jth text distance between the source item and the jth target item is greater than a fourth threshold, and the item frequency of the jth target item is greater than a fifth threshold.

Preferably, the method further comprises the following steps:

when the jth target item meets a preset second rule, analyzing each target item with item frequency ordered behind the jth target item, and when the item frequency of the kth target item and the item frequency of the jth target item meet a preset third rule, judging that the kth target item is aligned with the source item; wherein k is less than j;

and repeating the execution until the item frequency of the xth target item and the item frequency of the jth target item do not meet a preset third rule.

Preferably, the third rule is specifically:

the difference between the item frequency of the kth target item and the item frequency of the jth target item is greater than a preset sixth threshold.

Preferably, the inputting the source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned specifically includes:

calculating the alignment probability of each target item to be aligned and the source item through the trained model;

and determining whether the target item to be aligned is aligned with the source item according to the alignment probability.

In a second aspect, the invention provides an apparatus comprising a memory for storing a program and a processor for performing the method of any of the first aspects.

In a third aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect.

In a fourth aspect, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects.

By applying the medical item name alignment method provided by the embodiments of the invention, the BERT-based semantic matching scheme allows medical item name alignment to be judged according to semantics rather than string features alone. Based on statistical features, text alignment can be judged without labeled data. Meanwhile, the strategy based on statistical features accommodates the fact that the sub-items of an item set differ between hospitals.

Drawings

Fig. 1 is a flowchart illustrating a medical project name alignment method according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments.

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

According to the medical project name alignment method, once the model has been trained, a target item to be aligned can be input and aligned with a source item according to the alignment probability that the model calculates for it.

Fig. 1 is a flowchart illustrating a medical project name alignment method according to an embodiment of the present invention. In fig. 1, steps 110 to 150 are used to train the model, and step 160 is to align the medical item names by the trained model. The technical solution of the present invention is described in detail below with reference to fig. 1.

Step 110, acquiring medical record information including source items in a source text;

The source text may be a collection of orders obtained from the HIS system of the hospital, i.e., an order list, and the source item may be a specific medical term in an order, such as "abdominal MR".

Step 120, acquiring one or more target items in the order list corresponding to the medical record information, and calculating the quantity and item frequency of each target item;

The medical record information is the medical records corresponding to the source text, and the medical records include target items, such as "magnetic resonance flat scan". The quantity of a target item may be the number of medical records that include the target item. The item frequency of the target item can be calculated by equation (1):

wherein the source item is X_i and the target item is Y_j; the item frequency of the target item is computed from Count(X_i in the source document and Y_j in the target document), the number of medical records that include X_i in the source document and Y_j in the target document.
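The item frequency can be sketched as follows. The surviving text only names the count of medical records containing both X_i and Y_j; dividing that count by the number of medical records containing X_i is an assumption made here, chosen because it yields a proportion such as the 0.98 quoted later in the description. The record layout (`source_items`, `target_items`) is hypothetical.

```python
from collections import Counter


def item_frequencies(records, source_item):
    """Return {target_item: frequency} for one source item.

    Assumed formula: records containing both the source item and the target
    item, divided by records containing the source item.
    """
    matching = [r for r in records if source_item in r["source_items"]]
    if not matching:
        return {}
    counts = Counter()
    for record in matching:
        # A set ensures each target item is counted once per record.
        counts.update(set(record["target_items"]))
    return {target: n / len(matching) for target, n in counts.items()}


# Toy medical records (hypothetical data).
records = [
    {"source_items": {"abdominal MR"},
     "target_items": {"magnetic resonance flat scan", "blood routine"}},
    {"source_items": {"abdominal MR"},
     "target_items": {"magnetic resonance flat scan"}},
    {"source_items": {"abdominal MR", "chest CT"},
     "target_items": {"magnetic resonance flat scan", "CT flat scan"}},
    {"source_items": {"abdominal MR"},
     "target_items": {"blood routine"}},
    {"source_items": {"chest CT"},
     "target_items": {"CT flat scan"}},
]

frequencies = item_frequencies(records, "abdominal MR")
print(frequencies)
```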

Step 130, according to the item frequency, determining the top-N target items corresponding to the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5;

wherein the inverse alignment frequency of target item Y_j is computed from Count(number of X_i items of Y_j), which is the number of source items X_i for which Y_j is among the top N ranked target items.
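A minimal sketch of this step is given below. The exact formula is not recoverable from the text; the reciprocal of the count is assumed here only because the description states that a target item ranked highly for many source items has a smaller inverse alignment frequency. The function names and the toy frequency table are hypothetical.

```python
def top_n_targets(frequencies, n):
    """Names of the n target items with the highest item frequency."""
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [target for target, _ in ranked[:n]]


def inverse_alignment_frequency(freq_table, n=2):
    """Assumed form: 1 / (number of source items whose top-n list contains
    the target item). freq_table maps source item -> {target: frequency}."""
    counts = {}
    for frequencies in freq_table.values():
        for target in top_n_targets(frequencies, n):
            counts[target] = counts.get(target, 0) + 1
    return {target: 1.0 / count for target, count in counts.items()}


# Toy item frequencies (hypothetical data).
freq_table = {
    "abdominal MR": {"magnetic resonance flat scan": 0.98,
                     "blood routine": 0.90, "urinalysis": 0.10},
    "chest CT": {"CT flat scan": 0.95,
                 "blood routine": 0.85, "urinalysis": 0.20},
    "coagulation set": {"PT": 0.97, "blood routine": 0.92, "TT": 0.50},
}

iaf = inverse_alignment_frequency(freq_table, n=2)
# "blood routine" is in the top 2 of every source item, so its value is lowest.
print(iaf)
```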

Step 140, calculating the text distance between the source item and the corresponding target item;

wherein the text distance between source item X_i and target item Y_j is computed from the longest common substring LCS(Y_j, X_i), which can be obtained by a longest-common-substring text comparison algorithm, and from min(length(Y_j), length(X_i)), the smaller of the length of target item Y_j and the length of source item X_i. Both can be calculated by common algorithms, which are not described in detail in the present application.
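The sketch below assumes that the text distance is the length of the longest common substring divided by the length of the shorter name, which is the most direct combination of the two quantities named above; the original formula itself is not reproduced in the text. Under this reading a larger value means more similar names, which agrees with the rules that require the text distance to exceed a threshold.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def text_distance(source_item: str, target_item: str) -> float:
    """Assumed form: len(LCS(Y_j, X_i)) / min(length(Y_j), length(X_i))."""
    shorter = min(len(source_item), len(target_item))
    if shorter == 0:
        return 0.0
    return longest_common_substring_length(target_item, source_item) / shorter


print(text_distance("abdominal MR", "abdominal MR scan"))
print(text_distance("abdominal MR", "magnetic resonance flat scan"))
```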

Step 150, obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule;

specifically, step 150 may include the following steps, so that the model may be trained:

(1) determining a first target item and a second target item corresponding to the source item according to the target items which are ranked as the top N and correspond to the source item; the item frequency of the first target item is greater than the item frequency of the second target item;

calculating a first text distance between the source item and the first target item and a second text distance between the source item and the second target item;

and when the first text distance, the second text distance and the inverse alignment frequency meet a preset first rule, determining that the first target item is aligned with the source item.

Wherein, the first rule is specifically:

the first text distance is greater than a preset first threshold, the first text distance is greater than the product of a preset second threshold and the second text distance, and the inverse alignment frequency of the first target item is greater than a preset third threshold.

(2) When the first text distance, the second text distance and the inverse alignment frequency do not meet the preset first rule, sorting the target items corresponding to the source item by item frequency, and determining n target items;

calculating the text distance between the source item and each of the corresponding n target items to obtain the first to nth text distances from the source item to the n target items;

and when the jth target item meets a preset second rule, determining that the jth target item is aligned with the source item.

Wherein the second rule is specifically: the jth text distance between the source item and the jth target item is larger than a fourth threshold value, and the item frequency of the jth target item is larger than a fifth threshold value.

(3) When the jth target item meets a preset second rule, analyzing each target item with the item frequency ordered behind the jth target item, and when the item frequency of the kth target item and the item frequency of the jth target item meet a preset third rule, judging that the kth target item is aligned with the source item;

Wherein the third rule is specifically: the difference between the item frequency of the kth target item and the item frequency of the jth target item is greater than a preset sixth threshold.

The first threshold to the sixth threshold are empirical values set through a plurality of experiments.
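The three rules of step 150 can be sketched as a single function. This is an illustration under stated assumptions rather than the implementation of the application: the `Candidate` record, the function name and all threshold values are hypothetical, and the difference in the third rule is read as the frequency of the kth item minus the frequency of the jth item. Because the kth item ranks behind the jth item, that difference is not positive, so the sixth threshold is taken to be negative here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    frequency: float  # item frequency relative to the source item
    distance: float   # text distance to the source item
    iaf: float        # inverse alignment frequency of the target item


def align_by_rules(candidates, t):
    """Return the names of target items aligned by the rules.

    t maps 1..6 to the first to sixth thresholds (hypothetical values).
    """
    ranked = sorted(candidates, key=lambda c: c.frequency, reverse=True)

    # First rule: one-to-one alignment with the most frequent target item.
    if len(ranked) >= 2:
        first, second = ranked[0], ranked[1]
        if (first.distance > t[1]
                and first.distance > t[2] * second.distance
                and first.iaf > t[3]):
            return [first.name]

    # Second rule: first item, by descending frequency, that passes both tests.
    for j, anchor in enumerate(ranked):
        if anchor.distance > t[4] and anchor.frequency > t[5]:
            aligned = [anchor.name]
            # Third rule: accept following items until the first failure.
            for later in ranked[j + 1:]:
                if later.frequency - anchor.frequency > t[6]:
                    aligned.append(later.name)
                else:
                    break
            return aligned
    return []


thresholds = {1: 0.5, 2: 2.0, 3: 0.5, 4: 0.3, 5: 0.8, 6: -0.1}

one_to_one = align_by_rules(
    [Candidate("A", 0.98, 0.9, 1.0), Candidate("B", 0.50, 0.1, 0.2)],
    thresholds)
one_to_many = align_by_rules(
    [Candidate("P", 0.97, 0.4, 0.4), Candidate("Q", 0.95, 0.1, 0.4),
     Candidate("R", 0.93, 0.1, 0.4), Candidate("S", 0.40, 0.1, 0.4)],
    thresholds)
print(one_to_one, one_to_many)
```

In the second call the first rule fails because the text distance of "P" does not exceed the first threshold, so the second rule selects "P" and the third rule adds "Q" and "R" before stopping at "S".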

Step 160, inputting the source text to be aligned into the trained model to obtain a target item in the order list corresponding to the source text to be aligned.

Specifically, the alignment probability of each target item to be aligned and the source item is calculated through a trained model; and determining whether the target item to be aligned is aligned with the source item according to the alignment probability.

The medical item alignment method of the present application will be described in detail below with reference to specific examples.

In the present application, alignment is judged directly for source items and target items that have obvious statistical features and a short text distance.

For source items and target items with insignificant statistical features or far text distances, classification can be performed by the following steps.

To find item X in the order list_i"abdomen MR", the alignment name in the billing invoice is an example. First, medical records including "abdominal MR" in all the orders are found. Then, the order list item Y corresponding to the medical records is counted_jThe number of items, calculate the item frequency

For example: the item frequency of the item "magnetic resonance flat scan" with respect to the source item "abdominal MR" is 0.98. After the item frequency related to the items in the medical advice is calculated, each target item Y is counted_jAt which source item X_iThe frequency ordering of the entries in (1) is TOPN, N can be defined as a number between 1 and 5, and the reverse alignment frequency is calculated. For example: "blood routine" as a high frequency examination will be at multiple source items X_iThe frequency ranks in (1) are all higher, and the corresponding frequency is inversely aligned

It will be smaller.

After the above features are obtained, the following rules are formulated:

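First rule: the Top2 target items corresponding to source item X_i are Y_top1 and Y_top2 respectively. If the first text distance, between X_i and Y_top1, is greater than Threshold₁, the first text distance is also greater than the product of Threshold₂ and the second text distance, and the inverse alignment frequency of Y_top1 is greater than Threshold₃, then Y_top1 is determined to be aligned with source item X_i. Here the second text distance is the text distance between X_i and Y_top2.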

Second rule: if the first rule is not satisfied, the target items Y_j corresponding to source item X_i are examined from high to low item frequency to find the first item Y_j whose text distance to X_i is greater than Threshold₄ and whose item frequency is greater than Threshold₅; this Y_j is determined to be aligned with source item X_i.

Third rule: for the target item Y_j that meets the second rule, each item Y_k whose item frequency ranks behind it is analyzed in order of item frequency from high to low. If the difference between the item frequency of Y_k and the item frequency of Y_j is greater than Threshold₆, Y_k is determined to be aligned with source item X_i. The analysis ends at the first Y_k that does not satisfy this condition.

Through the first rule, one-to-one aligned items can be found, and through the second rule and the third rule, one-to-many aligned items can be found.
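For reference, the three rule conditions can be written compactly as below. This is a reconstruction from the wording of the first to third rules in the description, since the formulas themselves are not reproduced in the text. The symbols D (text distance), F (item frequency) and IAF (inverse alignment frequency) are names assumed here, and reading the difference in the third rule as the frequency of Y_k minus the frequency of Y_j is likewise an assumption.

```latex
% D = text distance, F = item frequency, IAF = inverse alignment frequency
% (assumed symbol names; requires amsmath)
\begin{align*}
\text{Rule 1:}\quad & D(X_i, Y_{top1}) > \mathrm{Threshold}_1
  \;\land\; D(X_i, Y_{top1}) > \mathrm{Threshold}_2 \cdot D(X_i, Y_{top2})
  \;\land\; \mathrm{IAF}(Y_{top1}) > \mathrm{Threshold}_3 \\
\text{Rule 2:}\quad & D(X_i, Y_j) > \mathrm{Threshold}_4
  \;\land\; F(Y_j \mid X_i) > \mathrm{Threshold}_5 \\
\text{Rule 3:}\quad & F(Y_k \mid X_i) - F(Y_j \mid X_i) > \mathrm{Threshold}_6
\end{align*}
```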

In the present application, the trained model is obtained through steps 110 to 150. For semantic matching, the model adopts a BERT-based text classification model: the question of whether any two item names in different documents are consistent is converted into a BERT-based binary classification problem, where 0 represents inconsistency and 1 represents consistency. First, training data are generated from the statistical features and the text distance according to the first rule, the second rule and the third rule; a model is then trained to judge whether any two items are aligned. To improve the effect, part of the data that the rules cannot judge can be extracted, manually labeled, and added to the training data to optimize semantic matching.
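How rule output might be turned into labeled pairs for the binary classifier is sketched below. The description does not say how negative examples are chosen; labeling the remaining candidate targets of a rule-aligned source item as 0 is an assumption made only for illustration, and source items the rules cannot judge are skipped so that they can be labeled manually. The function and argument names are hypothetical, and no BERT library call is shown.

```python
def build_training_pairs(rule_alignments, candidates):
    """Build (source_item, target_item, label) triples.

    rule_alignments: {source_item: set of target items aligned by the rules}
    candidates:      {source_item: list of candidate target items}
    Label 1 means consistent, 0 means inconsistent.
    """
    pairs = []
    for source, targets in candidates.items():
        positives = rule_alignments.get(source, set())
        if not positives:
            # The rules could not judge this source item; leave it out so it
            # can be labeled manually instead of being guessed here.
            continue
        for target in targets:
            label = 1 if target in positives else 0  # assumed negative choice
            pairs.append((source, target, label))
    return pairs


pairs = build_training_pairs(
    {"abdominal MR": {"magnetic resonance flat scan"}},
    {"abdominal MR": ["magnetic resonance flat scan", "blood routine"],
     "chest CT": ["CT flat scan"]},
)
print(pairs)
```

Each triple would then be presented to a sequence-pair classifier as two item names and a 0/1 label.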

For source items and target items with obvious statistical features and a short text distance, alignment is judged directly. For a source item X_i whose aligned name cannot be found in this way, because the statistical features are not obvious or the text distance is long, the model trained in steps 110 to 150 is used to calculate an alignment probability for each target item Y_j. If the alignment probability is greater than Threshold₇, Y_j is determined to be aligned with X_i.

Here, Threshold₁ to Threshold₇ denote the first to seventh thresholds in sequence.
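The final decision step can be sketched as a simple filter over alignment probabilities. The trained BERT classifier is represented here by a hypothetical callable `predict_probability(source_item, target_item)`; the toy token-overlap function below merely stands in for it so that the sketch runs, and the threshold value is arbitrary.

```python
def align_with_model(source_item, target_items, predict_probability, threshold_7):
    """Keep target items whose alignment probability exceeds the seventh threshold."""
    aligned = []
    for target in target_items:
        if predict_probability(source_item, target) > threshold_7:
            aligned.append(target)
    return aligned


def toy_probability(source_item, target_item):
    """Stand-in scorer (NOT a BERT model): Jaccard overlap of lower-cased tokens."""
    source_tokens = set(source_item.lower().split())
    target_tokens = set(target_item.lower().split())
    union = source_tokens | target_tokens
    return len(source_tokens & target_tokens) / len(union) if union else 0.0


result = align_with_model(
    "abdominal MR",
    ["abdominal MR scan", "blood routine"],
    toy_probability,
    threshold_7=0.5,
)
print(result)
```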

The second embodiment of the invention provides equipment which comprises a memory and a processor, wherein the memory is used for storing programs, and the memory can be connected with the processor through a bus. The memory may be a non-volatile memory such as a hard disk drive and a flash memory, in which a software program and a device driver are stored. The software program is capable of performing various functions of the above-described methods provided by embodiments of the present invention; the device drivers may be network and interface drivers. The processor is used for executing a software program, and the software program can realize the method provided by the first embodiment of the invention when being executed.

A third embodiment of the present invention provides a computer program product including instructions, which, when the computer program product runs on a computer, causes the computer to execute the method provided in the first embodiment of the present invention.

The fourth embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method provided in the first embodiment of the present invention is implemented.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A medical project name alignment method, characterized by comprising:

acquiring medical record information including source items in a source text;

2. The method according to claim 1, wherein the obtaining the trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule, and a preset third rule specifically comprises:

3. The method according to claim 2, wherein the first rule is specifically:

4. The method of claim 2, further comprising:

5. The method according to claim 4, wherein the second rule is specifically:

6. The method of claim 4, further comprising, after the method:

7. The method according to claim 6, wherein the third rule is specifically:

8. The method according to claim 1, wherein the inputting the source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned specifically comprises:

9. A medical item name alignment apparatus, comprising a memory for storing a program and a processor for performing the method of any one of claims 1-8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method according to any one of claims 1-8.