CN112836492A - Medical project name alignment method - Google Patents
- Publication number
- CN112836492A
- Authority
- CN
- China
- Prior art keywords
- item
- target
- frequency
- source
- target item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a medical item name alignment method, which comprises the following steps: acquiring medical record information that includes a source item in a source text; acquiring one or more target items in an order list corresponding to the medical record information, and calculating the number and item frequency of each target item; determining, according to the item frequency, the target items ranked in the top N for the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5; calculating the text distance between the source item and each corresponding target item; obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule; and inputting a source text to be aligned into the trained model to obtain the target item in the order list that corresponds to the source text to be aligned.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a medical project name alignment method.
Background
The same medical item is often referred to by different names across the documents of a medical record. For example, an "abdominal magnetic resonance scan" is named "abdominal MR" in the order sheet, "magnetic resonance scan" in the billing invoice, and "1.5 MR (abdominal)" in the examination slip. Effectively identifying and aligning the names of the same medical item across different documents plays a key role in systems for medical information management, expense auditing and the like.
Most current schemes rely solely on string comparison algorithms, such as edit distance or longest common substring, to solve this problem.
String comparison alone struggles to align medical items. For example, although "abdomen MR" and "abdomen magnetic resonance flat scan" are the same item, the edit distance between them is very large, larger than the edit distance between "abdomen CT flat scan" and "abdomen magnetic resonance flat scan", which easily leads to misjudgment.
In addition, string comparison cannot handle one-to-many item sets (packages). For example, the "coagulation group set" in the medical order corresponds to several items in the charge list, such as "plasma prothrombin time measurement (PT)", "thrombin time measurement (TT)", "activated partial thromboplastin time measurement (APTT)" and "plasma fibrinogen measurement", and the sub-items included in a combination item may differ between hospitals.
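The edit-distance misjudgment described above can be checked with a small sketch. The function below is a standard Levenshtein distance, not code from the patent, and the English strings are the translated examples from the text rather than the original item names, so the exact numbers are only illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


reference = "abdomen magnetic resonance flat scan"
# Same medical item, very different surface form.
same_item = edit_distance("abdomen MR", reference)
# Different medical item, similar surface form.
different_item = edit_distance("abdomen CT flat scan", reference)
print(same_item, different_item)
```

With these strings the same-item pair ends up farther apart than the different-item pair, which is the failure mode the background section points to.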
Disclosure of Invention
The invention aims to provide a medical item name alignment method that addresses the shortcomings of the prior art described above.
To this end, in a first aspect, the present invention provides a medical item name alignment method, including:
acquiring medical record information including source items in a source text;
acquiring one or more target items in an order list corresponding to the medical record information, and calculating the quantity and item frequency of each target item;
determining, according to the item frequency, the target items which are ranked in the top N and correspond to the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5;
calculating the text distance between the source item and the corresponding target item;
obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule;
and inputting the source text to be aligned into the trained model to obtain a target item in an order list corresponding to the source text to be aligned.
Preferably, the obtaining of the trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule, and a preset third rule specifically includes:
determining a first target item and a second target item corresponding to the source item according to the target items which are ranked as the top N and correspond to the source item; the item frequency of the first target item is greater than the item frequency of the second target item;
calculating a first text distance between the source item and the first target item and a second text distance between the source item and the second target item;
when the first text distance, the second text distance and the inverse alignment frequency meet a preset first rule, determining that the first target item and the source item are aligned.
Preferably, the first rule is specifically:
the first text distance is greater than a preset first threshold, the first text distance is greater than the product of a preset second threshold and the second text distance, and the inverse alignment frequency of the first target item is greater than a preset third threshold.
Preferably, the method further comprises:
when the first text distance, the second text distance and the inverse alignment frequency do not meet the preset first rule, sorting the target items corresponding to the source item by item frequency, and determining n target items;
calculating the text distance between the source item and each target item in the corresponding n target items to obtain the first to nth text distances from the source item to the n target items;
when the jth target item meets a preset second rule, determining that the jth target item is aligned with the source item; wherein j is less than or equal to n.
Preferably, the second rule is specifically:
the jth text distance between the source item and the jth target item is greater than a fourth threshold, and the item frequency of the jth target item is greater than a fifth threshold.
Preferably, the method further comprises the following steps:
when the jth target item meets a preset second rule, analyzing each target item with item frequency ordered behind the jth target item, and when the item frequency of the kth target item and the item frequency of the jth target item meet a preset third rule, judging that the kth target item is aligned with the source item; wherein k is less than j;
and repeating the execution until the item frequency of the xth target item and the item frequency of the jth target item do not meet a preset third rule.
Preferably, the third rule is specifically:
the difference between the item frequency of the kth target item and the item frequency of the jth target item is greater than a preset sixth threshold.
Preferably, the inputting the source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned specifically includes:
calculating the alignment probability of each target item to be aligned and the source item through the trained model;
and determining whether the target item to be aligned is aligned with the source item according to the alignment probability.
In a second aspect, the invention provides an apparatus comprising a memory for storing a program and a processor for performing the method of any of the first aspects.
In a third aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect.
In a fourth aspect, the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects.
By applying the medical item name alignment method provided by the embodiments of the invention, the alignment of medical item names can be judged on the basis of semantics, through the BERT-based semantic matching scheme, rather than on string features alone. Because the rules are based on statistical features, text alignment can be judged without labeled data. In addition, the strategy based on statistical features accommodates the fact that the sub-items of a combination item differ between hospitals.
Drawings
Fig. 1 is a flowchart illustrating a medical project name alignment method according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
In the medical item name alignment method of the present application, once the model has been trained, an input target item can be aligned with a source item according to the alignment probability that the model outputs for that target item.
Fig. 1 is a flowchart illustrating a medical item name alignment method according to an embodiment of the present invention. In Fig. 1, steps 110 to 150 train the model, and step 160 aligns the medical item names using the trained model. The technical solution of the present invention is described in detail below with reference to Fig. 1.
The source text may be a collection of orders obtained from the HIS system of the hospital, i.e., an order list, and a source item may be a specific medical term in an order, such as "abdominal MR".
The medical record information is the medical records corresponding to the source text, and the medical records include target items, such as "magnetic resonance flat scan". The number of a target item may be the number of medical records that include that target item. The item frequency of the target item can be calculated by equation (1):
where $X_i$ is the source item, $Y_j$ is the target item, the computed quantity is the item frequency of the target item, and $\mathrm{Count}(X_i, Y_j)$ is the number of medical records that include $X_i$ in the source document and $Y_j$ in the target document.
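The item-frequency statistic can be sketched as follows. This is a minimal illustration, assuming the frequency is the share of medical records containing the source item that also contain the target item; that denominator is an assumption, since only $\mathrm{Count}(X_i, Y_j)$ is defined in the text. The function name `item_frequencies` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def item_frequencies(records):
    """records: iterable of (source_items, target_items), one pair per medical record.

    Returns {source: {target: frequency}}, where frequency is assumed to be
    Count(X_i, Y_j) / Count(X_i) (the denominator is an assumption).
    """
    source_count = Counter()
    pair_count = defaultdict(Counter)
    for source_items, target_items in records:
        for x in set(source_items):
            source_count[x] += 1
            for y in set(target_items):
                pair_count[x][y] += 1
    return {
        x: {y: c / source_count[x] for y, c in targets.items()}
        for x, targets in pair_count.items()
    }


# Hypothetical records: (items in the order list, items in the billing invoice).
records = [
    (["abdominal MR"], ["magnetic resonance flat scan", "blood routine"]),
    (["abdominal MR"], ["magnetic resonance flat scan"]),
    (["abdominal MR", "abdominal CT"], ["magnetic resonance flat scan", "CT flat scan"]),
    (["abdominal CT"], ["CT flat scan", "blood routine"]),
]
freq = item_frequencies(records)
print(freq["abdominal MR"])
```

Counting each item once per record (via `set`) keeps a frequency from exceeding 1 when an item is listed twice in the same record.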
where the computed quantity is the inverse alignment frequency of the target item $Y_j$, and the count it depends on is the number of source items $X_i$ for which $Y_j$ is among the top N target items ranked by item frequency.
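One possible reading of the inverse alignment frequency is sketched below. The exact formula is not recoverable from the text, so the IDF-style logarithm used here is an assumption; the only property taken from the source is that a target item ranking in the top N for many source items receives a smaller value. The function name and all frequencies other than 0.98 are hypothetical.

```python
import math


def inverse_alignment_frequency(freq, top_n=3):
    """freq: {source: {target: item_frequency}}.

    For each target, count the source items in whose top-N list it appears,
    then return log(number_of_sources / count). The logarithmic form is an
    assumption, not the patent's stated equation.
    """
    top_hits = {}
    for targets in freq.values():
        ranked = sorted(targets, key=targets.get, reverse=True)[:top_n]
        for target in ranked:
            top_hits[target] = top_hits.get(target, 0) + 1
    n_sources = len(freq)
    return {t: math.log(n_sources / hits) for t, hits in top_hits.items()}


# Hypothetical item frequencies for three source items.
freq = {
    "abdominal MR": {"magnetic resonance flat scan": 0.98, "blood routine": 0.60},
    "abdominal CT": {"CT flat scan": 0.97, "blood routine": 0.55},
    "coagulation group set": {"thrombin time measurement (TT)": 0.90, "blood routine": 0.70},
}
iaf = inverse_alignment_frequency(freq, top_n=2)
print(iaf)
```

In this toy data "blood routine" is in the top 2 of every source item, so it receives the smallest value, matching the behaviour the description attributes to high-frequency examinations.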
where the computed quantity is the text distance between source item $X_i$ and target item $Y_j$; the longest common substring $\mathrm{LCS}(Y_j, X_i)$ can be computed by a longest-common-substring text comparison algorithm, and $\min(\mathrm{length}(Y_j), \mathrm{length}(X_i))$, the smaller of the lengths of $X_i$ and $Y_j$, can likewise be computed by a common algorithm; neither is described in detail in the present application.
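The two ingredients named above can be combined as in the following sketch. It assumes the text distance is the length of the longest common substring divided by the length of the shorter name; the ratio is an assumption inferred from the two quantities the text defines. Under this reading a larger value means the names are more similar, which matches the rules' "greater than a threshold" tests. Function names are illustrative.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def text_distance(source: str, target: str) -> float:
    """Assumed form: len(LCS(Y_j, X_i)) / min(length(Y_j), length(X_i))."""
    shortest = min(len(source), len(target))
    if shortest == 0:
        return 0.0
    return longest_common_substring_length(target, source) / shortest


print(text_distance("abdominal MR", "1.5 MR (abdominal)"))
```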
Specifically, step 150 may include the following steps for training the model:
(1) determining a first target item and a second target item corresponding to the source item according to the target items which are ranked as the top N and correspond to the source item; the item frequency of the first target item is greater than the item frequency of the second target item;
calculating a first text distance between the source item and the first target item and a second text distance between the source item and the second target item;
and when the first text distance, the second text distance and the inverse alignment frequency meet a preset first rule, determining that the first target item is aligned with the source item.
Wherein, the first rule is specifically:
the first text distance is greater than a preset first threshold, the first text distance is greater than the product of a preset second threshold and the second text distance, and the inverse alignment frequency of the first target item is greater than a preset third threshold.
(2) When the first text distance, the second text distance and the inverse alignment frequency do not meet the preset first rule, sorting the target items corresponding to the source item by item frequency, and determining n target items;
calculating the text distance between the source item and each of the corresponding n target items to obtain the first to nth text distances from the source item to the n target items;
and when the jth target item meets a preset second rule, determining that the jth target item is aligned with the source item.
Wherein the second rule is specifically: the jth text distance between the source item and the jth target item is larger than a fourth threshold value, and the item frequency of the jth target item is larger than a fifth threshold value.
(3) When the jth target item meets a preset second rule, analyzing each target item with the item frequency ordered behind the jth target item, and when the item frequency of the kth target item and the item frequency of the jth target item meet a preset third rule, judging that the kth target item is aligned with the source item;
and repeating the execution until the item frequency of the xth target item and the item frequency of the jth target item do not meet a preset third rule.
Wherein the third rule is specifically: the difference between the item frequency of the kth target item and the item frequency of the jth target item is greater than a preset sixth threshold.
The first threshold to the sixth threshold are empirical values set through a plurality of experiments.
Step 160: inputting the source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned.
Specifically, the alignment probability of each target item to be aligned and the source item is calculated through a trained model; and determining whether the target item to be aligned is aligned with the source item according to the alignment probability.
The medical item alignment method of the present application will be described in detail below with reference to specific examples.
In the present application, alignment is judged directly for source items and target items that have obvious statistical features and a short text distance.
For source items and target items with insignificant statistical features or far text distances, classification can be performed by the following steps.
Take finding the aligned name in the billing invoice for the order-list item $X_i$ = "abdomen MR" as an example. First, the medical records whose orders include "abdominal MR" are found. Then the target items $Y_j$ corresponding to these medical records are counted, and the item frequency of each is calculated. For example, the item frequency of the item "magnetic resonance flat scan" with respect to the source item "abdominal MR" is 0.98. After the item frequencies related to the order items have been calculated, for each target item $Y_j$ the source items $X_i$ in which its frequency ranks in the top N are counted, where N can be defined as a number between 1 and 5, and the inverse alignment frequency is calculated. For example, "blood routine", as a high-frequency examination, ranks high in frequency for many source items $X_i$, so its inverse alignment frequency is correspondingly small.
After the above features are obtained, the following rules are formulated:
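First rule: let the Top 2 target items corresponding to source item $X_i$ be $Y_{top1}$ and $Y_{top2}$. If the first text distance satisfies $\mathrm{Dist}(X_i, Y_{top1}) > \mathrm{Threshold}_1$ and $\mathrm{Dist}(X_i, Y_{top1}) > \mathrm{Threshold}_2 \times \mathrm{Dist}(X_i, Y_{top2})$, and the inverse alignment frequency of $Y_{top1}$ satisfies $\mathrm{IAF}(Y_{top1}) > \mathrm{Threshold}_3$, then $Y_{top1}$ is judged to be aligned with source item $X_i$. Here $\mathrm{Dist}(X_i, Y_{top2})$ is the second text distance.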
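Second rule: if the first rule is not satisfied, the target items $Y_j$ corresponding to source item $X_i$ are examined in descending order of item frequency to find the first item $Y_j$ whose text distance satisfies $\mathrm{Dist}(X_i, Y_j) > \mathrm{Threshold}_4$ and whose item frequency satisfies $\mathrm{Freq}(X_i, Y_j) > \mathrm{Threshold}_5$; this $Y_j$ is judged to be aligned with source item $X_i$.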
Third rule: for the target item $Y_j$ that satisfies the second rule, each item $Y_k$ whose item frequency ranks behind it is analyzed in descending order of item frequency. If the difference between the item frequency of $Y_k$ and that of $Y_j$ is greater than the sixth threshold, i.e. $\mathrm{Freq}(X_i, Y_k) - \mathrm{Freq}(X_i, Y_j) > \mathrm{Threshold}_6$, then $Y_k$ is judged to be aligned with source item $X_i$. The analysis ends at the first $Y_k$ that does not satisfy this condition.
Through the first rule, one-to-one aligned items can be found, and through the second rule and the third rule, one-to-many aligned items can be found.
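The three rules can be read as a cascade over the candidates of one source item, sketched below. The candidate dictionaries, threshold values and the function name `align_by_rules` are hypothetical. The third rule is implemented literally as "frequency of $Y_k$ minus frequency of $Y_j$ is greater than the sixth threshold", so a negative sixth threshold is assumed here to bound how far the frequency may drop; the patent describes its thresholds only as empirical values.

```python
def align_by_rules(candidates, th):
    """candidates: dicts with 'name', 'freq', 'dist', 'iaf' for one source item.
    th: dict of thresholds 't1'..'t6'. Returns the names judged aligned."""
    ranked = sorted(candidates, key=lambda c: c["freq"], reverse=True)

    # First rule: one-to-one alignment with the most frequent candidate.
    if len(ranked) >= 2:
        top1, top2 = ranked[0], ranked[1]
        if (top1["dist"] > th["t1"]
                and top1["dist"] > th["t2"] * top2["dist"]
                and top1["iaf"] > th["t3"]):
            return [top1["name"]]

    # Second rule: first candidate, by descending frequency, that is both
    # textually close enough and frequent enough.
    for j, cand in enumerate(ranked):
        if cand["dist"] > th["t4"] and cand["freq"] > th["t5"]:
            aligned = [cand["name"]]
            # Third rule: keep adding lower-ranked candidates until the
            # frequency difference fails the sixth-threshold test.
            for later in ranked[j + 1:]:
                if later["freq"] - cand["freq"] > th["t6"]:
                    aligned.append(later["name"])
                else:
                    break
            return aligned
    return []


# Hypothetical thresholds and feature values.
th = {"t1": 0.6, "t2": 2.0, "t3": 0.5, "t4": 0.3, "t5": 0.8, "t6": -0.1}
one_to_one = align_by_rules([
    {"name": "magnetic resonance flat scan", "freq": 0.98, "dist": 0.7, "iaf": 1.1},
    {"name": "blood routine", "freq": 0.60, "dist": 0.0, "iaf": 0.0},
], th)
one_to_many = align_by_rules([
    {"name": "PT", "freq": 0.95, "dist": 0.5, "iaf": 1.2},
    {"name": "TT", "freq": 0.93, "dist": 0.5, "iaf": 1.2},
    {"name": "APTT", "freq": 0.90, "dist": 0.5, "iaf": 1.2},
    {"name": "blood routine", "freq": 0.40, "dist": 0.0, "iaf": 0.0},
], th)
print(one_to_one, one_to_many)
```

An empty result means no rule fired, which corresponds to the pairs the description hands to the trained model instead.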
In the present application, the trained model is obtained through steps 110 to 150. Semantic matching uses a BERT-based text classification model: the question of whether any two item names from different documents are consistent is converted into a BERT-based binary classification problem, in which 0 denotes inconsistent and 1 denotes consistent. First, training data are generated from the statistical features and the text distance according to the first, second and third rules; the model is then trained to judge whether any two items are aligned. To improve the effect, part of the data that the rules cannot judge can be extracted, manually labeled, and added to the training data to optimize semantic matching.
Alignment is judged directly for source items and target items with obvious statistical features and a short text distance. For a source item $X_i$ whose aligned name cannot be found in this way, because the statistical features are not obvious or the text distance is long, the model trained in steps 110 to 150 is used to calculate an alignment probability $P(X_i, Y_j)$ for each target item $Y_j$. If $P(X_i, Y_j) > \mathrm{Threshold}_7$, then $Y_j$ is judged to be aligned with $X_i$.
Here, $\mathrm{Threshold}_1$ to $\mathrm{Threshold}_7$ denote the first to seventh thresholds in sequence.
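The model-based decision can be sketched as a simple threshold test on the alignment probability. The `predict_proba` callable stands in for the trained BERT-based classifier and is purely hypothetical, as are the stub scores and the value chosen for the seventh threshold.

```python
from typing import Callable, Iterable, List


def align_with_model(source: str,
                     targets: Iterable[str],
                     predict_proba: Callable[[str, str], float],
                     threshold7: float) -> List[str]:
    """Return the targets whose alignment probability exceeds the seventh threshold."""
    return [t for t in targets if predict_proba(source, t) > threshold7]


# Stub scores standing in for the classifier's output (hypothetical values).
stub_scores = {
    ("abdomen MR", "magnetic resonance flat scan"): 0.93,
    ("abdomen MR", "CT flat scan"): 0.08,
}


def stub_predict_proba(source: str, target: str) -> float:
    return stub_scores.get((source, target), 0.0)


result = align_with_model(
    "abdomen MR",
    ["magnetic resonance flat scan", "CT flat scan"],
    stub_predict_proba,
    threshold7=0.5,
)
print(result)
```

Passing the classifier in as a callable keeps the decision rule independent of how the probability is produced.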
By applying the medical item name alignment method provided by the embodiments of the invention, the alignment of medical item names can be judged on the basis of semantics, through the BERT-based semantic matching scheme, rather than on string features alone. Because the rules are based on statistical features, text alignment can be judged without labeled data. In addition, the strategy based on statistical features accommodates the fact that the sub-items of a combination item differ between hospitals.
The second embodiment of the invention provides equipment which comprises a memory and a processor, wherein the memory is used for storing programs, and the memory can be connected with the processor through a bus. The memory may be a non-volatile memory such as a hard disk drive and a flash memory, in which a software program and a device driver are stored. The software program is capable of performing various functions of the above-described methods provided by embodiments of the present invention; the device drivers may be network and interface drivers. The processor is used for executing a software program, and the software program can realize the method provided by the first embodiment of the invention when being executed.
A third embodiment of the present invention provides a computer program product including instructions, which, when the computer program product runs on a computer, causes the computer to execute the method provided in the first embodiment of the present invention.
The fourth embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method provided in the first embodiment of the present invention is implemented.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A medical project name alignment method, characterized by comprising:
acquiring medical record information including source items in a source text;
acquiring one or more target items in an order list corresponding to the medical record information, and calculating the quantity and item frequency of each target item;
determining, according to the item frequency, the target items which are ranked in the top N and correspond to the source item, and calculating the inverse alignment frequency of each target item, where N is an integer greater than 1 and less than 5;
calculating the text distance between the source item and the corresponding target item;
obtaining a trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule and a preset third rule;
and inputting the source text to be aligned into the trained model to obtain a target item in an order list corresponding to the source text to be aligned.
2. The method according to claim 1, wherein the obtaining the trained model according to the text distance, the target item frequency, the inverse alignment frequency, a preset first rule, a preset second rule, and a preset third rule specifically comprises:
determining a first target item and a second target item corresponding to the source item according to the target items which are ranked as the top N and correspond to the source item; the item frequency of the first target item is greater than the item frequency of the second target item;
calculating a first text distance between a source item and a first target item and a second text distance between the first source item and the second target item;
when the first text distance, the second text distance and the inverse alignment frequency meet a preset first rule, determining that the first target item and the source item are aligned.
3. The method according to claim 2, wherein the first rule is specifically:
the first text distance is greater than a preset first threshold, the first text distance is greater than the product of a preset second threshold and the second text distance, and the inverse alignment frequency of the first target item is greater than a preset third threshold.
4. The method of claim 2, further comprising:
when the first text distance, the second text distance and the inverse alignment frequency do not meet the preset first rule, sorting the target items corresponding to the source item by item frequency, and determining n target items;
calculating the text distance between the source item and each target item in the corresponding n target items to obtain the first to nth text distances from the source item to the n target items;
when the jth target item meets a preset second rule, determining that the jth target item is aligned with the source item; wherein j is less than or equal to n.
5. The method according to claim 4, wherein the second rule is specifically:
the jth text distance between the source item and the jth target item is greater than a fourth threshold, and the item frequency of the jth target item is greater than a fifth threshold.
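The fallback described in claims 4 and 5 can be sketched as a ranked scan over candidate target items. The sketch makes several assumptions that the claims do not state: the ranking is taken to be in descending order of item frequency, the first target item that satisfies the second rule is returned, and the text distance is passed in as a caller-supplied function because the claims shown here do not specify how it is computed. All identifiers are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A target item is represented as (name, item frequency); this layout is assumed.
Target = Tuple[str, float]


def find_by_second_rule(
    source: str,
    targets: Sequence[Target],
    n: int,
    text_distance: Callable[[str, str], float],
    fourth_threshold: float,
    fifth_threshold: float,
) -> Tuple[List[Target], Optional[int]]:
    """Rank targets by item frequency and look for one meeting the second rule.

    Returns the ranked top-n list and the index j of the first target whose
    text distance exceeds the fourth threshold and whose item frequency
    exceeds the fifth threshold, or None when no target qualifies.
    """
    # Assumed ordering: highest item frequency first, keeping n candidates.
    ranked = sorted(targets, key=lambda t: t[1], reverse=True)[:n]
    for j, (name, frequency) in enumerate(ranked):
        if text_distance(source, name) > fourth_threshold and frequency > fifth_threshold:
            return ranked, j
    return ranked, None
```

The function returns the ranked list together with the index so that a later step can examine the targets ranked after the jth one.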
6. The method of claim 4, further comprising:
when the jth target item meets the preset second rule, analyzing each target item whose item frequency is ranked behind that of the jth target item, and when the item frequency of the kth target item and the item frequency of the jth target item meet a preset third rule, determining that the kth target item is aligned with the source item; wherein k is less than j;
and repeating the execution until the item frequency of the xth target item and the item frequency of the jth target item do not meet a preset third rule.
7. The method according to claim 6, wherein the third rule is specifically:
the difference between the item frequency of the kth target item and the item frequency of the jth target item is greater than a preset sixth threshold.
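The extension step of claims 6 and 7 can be sketched as a loop that stops at the first target item failing the third rule. The claims leave the ordering convention and the sign of the difference open, so this sketch assumes a list of item frequencies already sorted in descending order, walks the positions after j, and computes the difference as the kth frequency minus the jth frequency. Under that assumption the difference is never positive, so the sixth threshold would be a negative tolerance. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def extend_by_third_rule(
    frequencies: Sequence[float],
    j: int,
    sixth_threshold: float,
) -> List[int]:
    """Collect indices after j whose frequency difference meets the third rule.

    `frequencies` is assumed to be sorted in descending order. Scanning stops
    at the first index whose difference from the jth frequency is not greater
    than the sixth threshold.
    """
    aligned: List[int] = []
    for k in range(j + 1, len(frequencies)):
        # Assumed sign convention: kth frequency minus jth frequency.
        if frequencies[k] - frequencies[j] > sixth_threshold:
            aligned.append(k)
        else:
            break  # the claim repeats only until the rule is no longer met
    return aligned
```

With an assumed tolerance of -0.05, target items whose frequency lies within 0.05 of the jth item's frequency are also treated as aligned, and the scan ends at the first item that falls outside that band.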
8. The method according to claim 1, wherein the inputting the source text to be aligned into the trained model to obtain the target item in the order list corresponding to the source text to be aligned specifically comprises:
calculating the alignment probability of each target item to be aligned and the source item through the trained model;
and determining whether the target item to be aligned is aligned with the source item according to the alignment probability.
9. A medical item name alignment apparatus, comprising a memory for storing a program and a processor for performing the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132054.3A CN112836492B (en) | 2021-01-30 | 2021-01-30 | Medical project name alignment method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132054.3A CN112836492B (en) | 2021-01-30 | 2021-01-30 | Medical project name alignment method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836492A (en) | 2021-05-25 |
CN112836492B (en) | 2024-03-08 |
Family ID=75932511
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132054.3A Active CN112836492B (en) | 2021-01-30 | 2021-01-30 | Medical project name alignment method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836492B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110021439A (en) * | 2019-03-07 | 2019-07-16 | 平安科技(深圳)有限公司 | Medical data classification method, device and computer equipment based on machine learning |
CN110717017A (en) * | 2019-10-17 | 2020-01-21 | 腾讯科技(深圳)有限公司 | Method for processing corpus |
CN112085012A (en) * | 2020-09-04 | 2020-12-15 | 泰康保险集团股份有限公司 | Project name and category identification method and device |
CN112149400A (en) * | 2020-09-23 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110021439A (en) * | 2019-03-07 | 2019-07-16 | 平安科技(深圳)有限公司 | Medical data classification method, device and computer equipment based on machine learning |
WO2020177230A1 (en) * | 2019-03-07 | 2020-09-10 | 平安科技(深圳)有限公司 | Medical data classification method and apparatus based on machine learning, and computer device and storage medium |
CN110717017A (en) * | 2019-10-17 | 2020-01-21 | 腾讯科技(深圳)有限公司 | Method for processing corpus |
CN112085012A (en) * | 2020-09-04 | 2020-12-15 | 泰康保险集团股份有限公司 | Project name and category identification method and device |
CN112149400A (en) * | 2020-09-23 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112836492B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Galpern et al. | Allelematch: an R package for identifying unique multilocus genotypes where genotyping error and missing data may be present | |
AU2020200909A1 (en) | Evaluation control | |
US7895235B2 (en) | Extracting semantic relations from query logs | |
US9230009B2 (en) | Routing of questions to appropriately trained question and answer system pipelines using clustering | |
US9305083B2 (en) | Author disambiguation | |
US20210286708A1 (en) | Method and electronic device for recommending crowdsourced tester and crowdsourced testing | |
CN104699730A (en) | Identifying and displaying relationships between candidate answers | |
US8577849B2 (en) | Guided data repair | |
RU2680746C2 (en) | Method and device for developing web page quality model | |
CN108595657B (en) | Data table classification mapping method and device of HIS (hardware-in-the-system) | |
CN110737689B (en) | Data standard compliance detection method, device, system and storage medium | |
CN116881430B (en) | Industrial chain identification method and device, electronic equipment and readable storage medium | |
CN110889024A (en) | Method and device for calculating information-related stock | |
CN109783483A (en) | A kind of method, apparatus of data preparation, computer storage medium and terminal | |
CN112836492A (en) | Medical project name alignment method | |
CN111611781A (en) | Data labeling method, question answering method, device and electronic equipment | |
CN116843150A (en) | Community service method and system based on intelligent Internet of things | |
CN112612815B (en) | Method and device for positioning evaluation mark file and electronic equipment | |
CN115098534A (en) | Data query method, device, equipment and medium based on index weight lifting | |
CN112328779B (en) | Training sample construction method, device, terminal equipment and storage medium | |
JP3602084B2 (en) | Database management device | |
JP5912813B2 (en) | Patent Search Result Evaluation Device, Patent Search Result Evaluation Method, and Program | |
CN114238588B (en) | Data retrieval method, system, readable storage medium and computer device | |
Goremykin | A novel test for absolute fit of evolutionary models provides a means to correctly identify the substitution model and the model tree | |
CN113205801B (en) | Method, device, computer equipment and storage medium for determining malicious voice sample |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |