CN112036172A - Entity identification method and device based on abbreviated data of model and computer equipment - Google Patents


Info

Publication number
CN112036172A
Authority
CN
China
Prior art keywords
medical
alias
abbreviation
concept
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010941630.4A
Other languages
Chinese (zh)
Other versions
CN112036172B (en
Inventor
顾大中
张圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010941630.4A (CN112036172B)
Priority to PCT/CN2020/125144 (WO2021159757A1)
Publication of CN112036172A
Application granted
Publication of CN112036172B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application relates to artificial intelligence and provides a model-based entity recognition method, device and computer equipment for abbreviation data. The method comprises: acquiring a medical text; finding all abbreviation-full name pairs appearing in the medical text; judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a designated medical dictionary; if not, judging whether a first abbreviation in the designated abbreviation-full name pair is a medical alias of a second medical concept in the designated medical dictionary; if so, acquiring all medical aliases contained in the second medical concept; inputting the first full name and a specified medical alias into a twin (Siamese) network model to obtain a specified similarity value between them; judging, according to the specified similarity values, whether any medical alias of the second medical concept has the same meaning as the first full name; and if so, determining that the first abbreviation belongs to a medical entity. The application improves the accuracy of entity recognition for abbreviation data.

Description

Entity identification method and device based on abbreviated data of model and computer equipment
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a model-based entity recognition method, device and computer equipment for abbreviation data.
Background
In recent years, with the rapid development of network and medical information technologies, internet-based medical services have grown steadily and big data in the medical industry has accumulated, so that people have begun to explore how to use this big data to improve management and services in the medical industry. One prerequisite for using, analyzing and mining medical big data is the recognition of medical entities in medical text, which is the foundational work of medical big data applications.
Currently, entity recognition of the abbreviation in an abbreviation-full name pair in a medical text is generally based on a medical dictionary: whether the abbreviation belongs to a medical entity is judged indirectly by judging whether the full name in the same pair belongs to a medical entity. If the full name is identical to a medical alias of some medical concept in the medical dictionary, the full name is judged to belong to a medical entity, and accordingly the corresponding abbreviation is also judged to belong to a medical entity, namely the medical entity corresponding to that medical concept. If the full name differs from every medical alias of every medical concept in the medical dictionary, the full name is directly judged not to belong to a medical entity, and the corresponding abbreviation is therefore also judged not to belong to a medical entity. However, the capacity of a medical dictionary is limited, whereas the forms that full names denoting medical entities can take in medical text are nearly unlimited, so many full names that appear in medical texts and do belong to medical entities are not recorded in the medical dictionary.
As a result, judging whether the abbreviation in an abbreviation-full name pair belongs to a medical entity by using a medical dictionary alone is prone to large recognition errors: if a full name that does not appear in the medical dictionary but does belong to a medical entity is misjudged as a non-medical entity, the corresponding abbreviation in the pair is also misjudged as a non-medical entity. The existing method for entity recognition of abbreviations in abbreviation-full name pairs in medical text therefore suffers from low recognition accuracy.
Disclosure of Invention
The main purpose of the application is to provide a model-based entity recognition method, device, computer equipment and storage medium for abbreviation data, so as to solve the technical problem that the existing method for entity recognition of abbreviations in abbreviation-full name pairs in medical text has low recognition accuracy.
The application provides a model-based entity recognition method for abbreviation data, which comprises the following steps:
acquiring a medical text to be identified;
finding all abbreviation-full name pairs appearing in the medical text through a preset algorithm;
judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
if the first full name is not a medical alias of a first medical concept in the specified medical dictionary, judging whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary;
if the first abbreviation is a medical alias of the second medical concept, acquiring all medical aliases contained in the second medical concept from the specified medical dictionary;
inputting the first full name and a specified medical alias into a preset twin network model, and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model, wherein the specified medical alias is any one of all medical aliases contained in the second medical concept, and the twin network model is generated by training a preset neural network model on pre-acquired labeled sample data;
judging, according to the specified similarity value, whether a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept;
and if a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept, determining that the first abbreviation belongs to a medical entity and that the first abbreviation corresponds to the second medical concept.
Optionally, the twin network model includes a first neural network and a second neural network that are parallel and identical, and the step of inputting the first full name and the specified medical alias into a preset twin network model and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model includes:
inputting the first full name into a first neural network in the twin network model and the specified medical alias into a second neural network in the twin network model;
converting the first full name into a corresponding first vector through the first neural network, and converting the specified medical alias into a corresponding second vector through the second neural network;
calculating a similarity value of the first vector and the second vector;
determining a similarity value of the first vector and the second vector as a specified similarity value of the first full name and the specified medical alias.
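The two-branch computation in the steps above can be illustrated with a minimal sketch. It is not the trained model of the application: the shared encoder here is a hypothetical stand-in that counts character trigrams, whereas the application prefers a trained bidirectional LSTM, and cosine similarity is an assumed choice because the text does not name the similarity measure. The function names encode, cosine and twin_similarity are illustrative only.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Shared encoder used by both branches.

    Assumption: character-trigram counts stand in for the neural network.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def twin_similarity(full_name: str, alias: str) -> float:
    """Run both inputs through the same encoder and compare the vectors."""
    first_vector = encode(full_name)   # first branch
    second_vector = encode(alias)      # second branch, identical parameters
    return cosine(first_vector, second_vector)
```

Because both branches call the same function, the "identical parameters" property of the twin network holds by construction in this sketch.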
Optionally, the step of judging, according to the specified similarity value, whether a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept includes:
acquiring a preset similarity threshold;
judging whether any of the specified similarity values is greater than the similarity threshold;
if any of the specified similarity values is greater than the similarity threshold, determining that a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept;
and if none of the specified similarity values is greater than the similarity threshold, determining that no medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept.
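The threshold test above reduces to a single existence check. The sketch below assumes the similarity values have already been computed, one per medical alias of the second medical concept; the function name and the threshold value used in the usage example are hypothetical, since the text only says the threshold is preset.

```python
from typing import Iterable


def has_same_meaning_alias(similarities: Iterable[float], threshold: float) -> bool:
    """Return True if at least one similarity value is strictly greater
    than the threshold, mirroring the "greater than" wording of the steps."""
    return any(value > threshold for value in similarities)


# Usage with a hypothetical threshold of 0.8:
found = has_same_meaning_alias([0.35, 0.91], threshold=0.8)
```

A value exactly equal to the threshold does not count as a match in this sketch, because the steps require a value larger than the threshold.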
Optionally, before the step of determining whether the first full name in the designated abbreviation-full name pair is a medical alias of the first medical concept in the preset designated medical dictionary, the method includes:
acquiring a medical dictionary;
according to a preset labeling rule, performing labeling processing on medical alias data contained in the medical dictionary to generate a labeled medical dictionary;
and taking the labeled medical dictionary as the specified medical dictionary.
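As a rough illustration of the labeling step, the sketch below applies the rule given later in the description, under which an abbreviation alias is labeled "1" and a full-name alias is labeled "0". How the application decides which aliases are abbreviations is not stated, so the is_abbreviation heuristic (no spaces, at most six characters, all upper case) is purely an assumption, as are the dictionary layout and the function names.

```python
from typing import Dict, List, Tuple


def is_abbreviation(alias: str) -> bool:
    """Hypothetical heuristic: short, single-token, upper-case aliases."""
    return " " not in alias and len(alias) <= 6 and alias.isupper()


def label_dictionary(raw: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Attach label 1 to abbreviation aliases and label 0 to full-name aliases."""
    return {
        concept: [(alias, 1 if is_abbreviation(alias) else 0) for alias in aliases]
        for concept, aliases in raw.items()
    }


labeled = label_dictionary({"heart failure": ["heart failure", "failure of heart", "HF"]})
```

With labels in place, later steps can match a full name only against aliases labeled 0 and an abbreviation only against aliases labeled 1.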
Optionally, the step of determining that the first abbreviation belongs to a medical entity and corresponds to the second medical concept if a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept includes:
screening out candidate abbreviated medical entities and full-name medical entities in the medical text based on the specified medical dictionary;
judging whether a specified candidate abbreviated medical entity is simultaneously a medical alias of a plurality of different medical concepts in the specified medical dictionary, wherein the specified candidate abbreviated medical entity is any one of the candidate abbreviated medical entities;
if it is simultaneously a medical alias of a plurality of different medical concepts in the specified medical dictionary, traversing all the full-name medical entities and judging whether a medical alias of a specified medical concept exists among them, wherein the specified medical concept is any one of the plurality of medical concepts to which the specified candidate abbreviated medical entity belongs;
if a medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity belongs to a medical entity and corresponds to the specified medical concept;
if no medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity does not belong to a medical entity.
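The disambiguation steps above can be sketched as follows. The dictionary layout (concept name mapped to a set of aliases), the case-insensitive comparison, and the handling of an abbreviation that belongs to zero or one concept are assumptions, because the steps only describe the case of several concepts. The example concepts are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def resolve_ambiguous_abbreviation(
    abbreviation: str,
    full_names_in_text: Iterable[str],
    dictionary: Dict[str, Set[str]],
) -> Optional[str]:
    """Pick the concept whose full-name alias also occurs in the text."""
    candidates = [c for c, aliases in dictionary.items() if abbreviation in aliases]
    if len(candidates) <= 1:
        # Not the ambiguous case covered by the steps; returned as-is (assumption).
        return candidates[0] if candidates else None
    seen = {name.lower() for name in full_names_in_text}
    for concept in candidates:
        full_aliases = (a for a in dictionary[concept] if a != abbreviation)
        if any(alias.lower() in seen for alias in full_aliases):
            return concept  # abbreviation corresponds to this concept
    return None  # no supporting full name: not treated as a medical entity


example_dictionary = {
    "heart failure": {"heart failure", "HF"},
    "hydrogen fluoride": {"hydrogen fluoride", "HF"},
}
```

The design relies on context: an ambiguous abbreviation is accepted only when the same text also mentions a full name of one of its candidate concepts.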
Optionally, after the step of judging, according to the specified similarity value, whether a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept, the method includes:
if no medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept, determining that the first abbreviation does not belong to a medical entity;
adding an annotation of a non-medical entity to the first abbreviation.
Optionally, the step of determining that the first abbreviation belongs to a medical entity if a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept includes:
finding the data record location corresponding to the second medical concept in the specified medical dictionary;
determining a fill position at the data record location;
adding the first full name at the fill position.
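The dictionary update above amounts to appending the newly confirmed full name to the record of the second medical concept. In the sketch below the record is a plain Python list and the fill position is the end of that list; both choices, and the function name, are assumptions, since the text does not specify the storage layout.

```python
from typing import Dict, List


def add_full_name(dictionary: Dict[str, List[str]], concept: str, full_name: str) -> None:
    """Append full_name to the alias record of concept, skipping duplicates."""
    record = dictionary[concept]      # data record location of the concept
    if full_name not in record:
        record.append(full_name)      # fill position: end of the record (assumption)


medical_dictionary = {"heart failure": ["heart failure", "HF"]}
add_full_name(medical_dictionary, "heart failure", "heart failures")
```

After the update, a later occurrence of the same full name can be matched directly against the dictionary without calling the twin network model.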
The present application also provides a model-based entity recognition device for abbreviation data, comprising:
the first acquisition module is used for acquiring a medical text to be recognized;
the first searching module is used for finding all abbreviation-full name pairs appearing in the medical text through a preset algorithm;
the first judging module is used for judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
a second judging module, configured to judge whether the first abbreviation is a medical alias of a second medical concept in the designated medical dictionary if the first full name is not a medical alias of the first medical concept in the designated medical dictionary;
a second obtaining module, configured to obtain all medical aliases included in the second medical concept from the designated medical dictionary if the first abbreviation is a medical alias of the second medical concept;
a third obtaining module, configured to input the first full name and the specified medical alias into a preset twin network model, and obtain a specified similarity value between the first full name and the specified medical alias through the twin network model, where the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained based on pre-acquired sample data with a label;
a third judging module, configured to judge, according to the specified similarity value, whether a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept;
and a first determining module, configured to determine that the first abbreviation belongs to a medical entity and corresponds to the second medical concept if a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept.
The present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method.
The model-based entity recognition method, device, computer equipment and storage medium for abbreviation data provided by the application have the following beneficial effects:
After all abbreviation-full name pairs appearing in a medical text are found, if the first full name in a designated abbreviation-full name pair is judged not to be a medical alias of a first medical concept in a preset designated medical dictionary, the first abbreviation in that pair is not directly judged to be a non-medical entity; instead, it is further judged whether the first abbreviation is a medical alias of a second medical concept in the designated medical dictionary. When the first abbreviation is judged to be a medical alias of a second medical concept, a preset twin network model is called to calculate the similarity value between the first full name and each medical alias contained in the second medical concept, and whether any of those medical aliases has the same meaning as the first full name is judged according to the calculated similarity values, thereby determining whether the first full name is medical entity data that is not recorded in the designated medical dictionary. If a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept, the first full name is determined to belong to a medical entity, so the first abbreviation is determined to belong to a medical entity and to correspond to the second medical concept.
Through the cooperation of the designated medical dictionary and the twin network model, the application can quickly and intelligently recognize whether the abbreviation in an abbreviation-full name pair in a medical text belongs to a medical entity, which effectively improves the accuracy of entity recognition for such abbreviations.
Drawings
FIG. 1 is a schematic flow chart of a model-based entity recognition method for abbreviation data according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a model-based entity recognition device for abbreviation data according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to FIG. 1, a model-based entity recognition method for abbreviation data according to an embodiment of the present application includes:
S1: acquiring a medical text to be identified;
S2: finding all abbreviation-full name pairs appearing in the medical text through a preset algorithm;
S3: judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one of the abbreviation-full name pairs appearing in the medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
S4: if the first full name is not a medical alias of a first medical concept in the designated medical dictionary, judging whether the first abbreviation is a medical alias of a second medical concept in the designated medical dictionary;
S5: if the first abbreviation is a medical alias of the second medical concept, acquiring all medical aliases contained in the second medical concept from the designated medical dictionary;
S6: inputting the first full name and a specified medical alias into a preset twin network model, and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model, wherein the specified medical alias is any one of all medical aliases contained in the second medical concept, and the twin network model is generated by training a neural network model on pre-acquired labeled sample data;
S7: judging, according to the specified similarity value, whether a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept;
S8: if a medical alias with the same meaning as the first full name exists among all medical aliases contained in the second medical concept, determining that the first abbreviation belongs to a medical entity and that the first abbreviation corresponds to the second medical concept.
The method embodiment described in steps S1-S8 is executed by a model-based entity recognition device for abbreviation data. In practical applications, the device may be implemented as a virtual device, such as software code, or as a physical device in which the relevant executable code is written or integrated, and it may interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device. The device in this embodiment can quickly, intelligently, and accurately recognize whether abbreviations in a medical text belong to medical entities.
Specifically, a medical text to be recognized is first acquired. The medical text may be an English medical document, and the application needs to recognize, for all abbreviation-full name pairs appearing in the medical text, whether the abbreviation in each pair belongs to a medical entity. For example, if the English medical document contains the sentence "This patient has had heart failure (HF) for five years", which contains the abbreviation-full name pair "heart failure (HF)", it must be judged whether the abbreviation HF belongs to a medical entity (hereinafter, a medical entity may also be referred to simply as an entity).
All abbreviation-full name pairs appearing in the medical text are then found through a preset algorithm. The preset algorithm may be the Schwartz-Hearst algorithm, a mature algorithm that finds abbreviation-full name pairs in a text based on the text itself, where a pair has the format (full name, abbreviation).
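The pair-finding step can be illustrated with a small sketch. It is a simplified approximation in the spirit of the Schwartz-Hearst approach, not a faithful implementation: it only handles the "full name (ABBR)" pattern, uses an assumed regular expression for short forms, and omits the additional validity checks of the published algorithm. The names SHORT_FORM, best_long_form and find_pairs are illustrative.

```python
import re
from typing import List, Optional, Tuple

# Assumed short-form pattern: 2 to 10 letters, digits or hyphens in parentheses.
SHORT_FORM = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def best_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Match short-form characters right to left against the candidate text."""
    s, l = len(short_form) - 1, len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if ch.isalnum():
            # The first short-form character must start a word in the candidate.
            while (l >= 0 and candidate[l].lower() != ch) or (
                s == 0 and l > 0 and candidate[l - 1].isalnum()
            ):
                l -= 1
            if l < 0:
                return None
            l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def find_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (full name, abbreviation) pairs found in the text."""
    pairs = []
    for match in SHORT_FORM.finditer(text):
        short_form = match.group(1)
        words = text[:match.start()].split()
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = best_long_form(short_form, candidate)
        if long_form:
            pairs.append((long_form, short_form))
    return pairs
```

Each pair is returned in the (full name, abbreviation) order used throughout the description.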
For example, the abbreviation-full name pair (New York, NY) can be found in the sentence "New York (NY) is a big city". After the abbreviation-full name pairs are obtained, it is judged whether the first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, where the designated pair is any one of the abbreviation-full name pairs appearing in the medical text and comprises the first full name and the first abbreviation corresponding to it.
The designated medical dictionary is generated by labeling the medical alias data contained in an original medical dictionary according to a preset labeling rule. The full names and abbreviations recorded in the medical dictionary all belong to medical entity data. Under the labeling rule, a medical alias that is an abbreviation is labeled "1", and a medical alias that is not an abbreviation, that is, a full-name medical alias, is labeled "0".
Whether the first full name is a medical alias of a first medical concept may be judged by matching the first full name against the full-name medical aliases of each medical concept in the designated medical dictionary: if the matching succeeds, the first full name is a medical alias of the first medical concept in the designated medical dictionary; if the matching fails, it is not.
If the first full name is a medical alias of a first medical concept in the designated medical dictionary, it is directly determined that the first abbreviation belongs to a medical entity and corresponds to the first medical concept. For any abbreviation-full name pair found by the algorithm, for example (A, a), a is known to be the abbreviation of the full name A, so the medical concept corresponding to A is also the medical concept corresponding to a, and there is no need to consider whether a might also be a medical alias of other medical concepts.
If the first full name is not a medical alias of a first medical concept in the designated medical dictionary, it is judged whether the first abbreviation is a medical alias of a second medical concept. This may be done by matching the first abbreviation against the abbreviated medical aliases of each medical concept in the designated medical dictionary: if the matching succeeds, the first abbreviation is a medical alias of the second medical concept in the designated medical dictionary; if the matching fails, it is not. If the first abbreviation is a medical alias of a second medical concept in the designated medical dictionary, all medical aliases contained in the second medical concept are acquired from the designated medical dictionary.
All medical aliases contained in the second medical concept may refer to both the full-name medical aliases and the abbreviated medical aliases of the second medical concept; preferably, they comprise only the full-name medical aliases. The first full name and a specified medical alias are then input into a preset twin network model, and a specified similarity value between them is acquired through the twin network model, where the specified medical alias is any one of all medical aliases contained in the second medical concept, and the twin network model is generated by training a preset neural network model on pre-acquired labeled sample data. The neural network model may be a CNN, an RNN, an LSTM network, or the like; a bidirectional LSTM network is preferred in this application. The trained twin network model comprises two parallel, identical neural networks (a first neural network and a second neural network) whose parameters are identical.
After the specified similarity values are obtained, whether a medical alias with the same meaning as the first full name exists among the specified medical aliases is judged according to those values, for example by comparing each specified similarity value with a preset similarity threshold.
If, among all medical aliases contained in the second medical concept, there is a medical alias whose specified similarity value is greater than the preset similarity threshold, then a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept. If there is no medical alias whose specified similarity value is greater than the preset similarity threshold, then no medical alias having the same meaning as the first full name exists among them. If a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, it is determined that the first abbreviation belongs to a medical entity and that the first abbreviation corresponds to the second medical concept. For an abbreviation-full-name pair (first full name, first abbreviation) found by the algorithm, for example (A, a), if a is a medical alias of the medical concept α but A is not present in the specified medical dictionary, it is still possible that A is also a medical alias of the medical concept α and is missing only because the medical aliases recorded in the specified medical dictionary are not comprehensive. Generally speaking, a medical concept can be expressed in many ways. Heart failure, for example, may be written as "heart failure", "heart failures", "HF", "failure of heart", and so on, and a dictionary can hardly list them all. Most medical aliases of a concept are close in written form and may differ only in singular or plural form, part of speech, or word order.
If A is close in written form to other medical aliases of the medical concept α, it can essentially be concluded that A is a medical alias of α; and since A and a are related as full name and abbreviation, it can be concluded that a is a medical entity and that a corresponds to the medical concept α. The twin network model is used to calculate the similarity values between A and all medical aliases contained in the medical concept α. If any of these medical aliases has a similarity value greater than the preset similarity threshold, it can be determined that A is a medical alias of the medical concept α and that a belongs to a medical entity. In addition, if the full name A does not appear in the specified medical dictionary, and no medical alias similar in meaning to the full name A exists among all medical aliases contained in the medical concept corresponding to the abbreviation a in the specified medical dictionary (or a does not appear in the dictionary at all), then a is, with high probability, not a medical entity. In this way, the specified medical dictionary and the twin network model work together to identify quickly and intelligently whether the abbreviation in an abbreviation-full-name pair in the medical text belongs to a medical entity, which effectively improves the accuracy of entity identification for such abbreviations.
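The decision flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the function names `find_concept` and `resolve_pair`, the concept `PX0003`, and the use of `difflib.SequenceMatcher` as a stand-in for the trained twin network model are all assumptions made for the example. The sketch also simplifies by using exact string matching and by taking only the first concept that lists the abbreviation.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout: concept ID -> list of (alias, label), label 1 = abbreviation, 0 = full name.
MedicalDictionary = Dict[str, List[Tuple[str, int]]]


def find_concept(dictionary: MedicalDictionary, text: str, label: int) -> Optional[str]:
    """Return the ID of the first concept that lists `text` with the given label."""
    for concept_id, aliases in dictionary.items():
        if (text, label) in aliases:
            return concept_id
    return None


def resolve_pair(full_name: str, abbreviation: str, dictionary: MedicalDictionary,
                 similarity: Callable[[str, str], float],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the concept ID the abbreviation maps to, or None if it is not a medical entity."""
    # Case 1: the full name is already a known alias, so the abbreviation inherits its concept.
    concept = find_concept(dictionary, full_name, label=0)
    if concept is not None:
        return concept
    # Case 2: the abbreviation is a known alias; check the full name against that concept.
    concept = find_concept(dictionary, abbreviation, label=1)
    if concept is None:
        return None
    full_aliases = [alias for alias, label in dictionary[concept] if label == 0]
    if any(similarity(full_name, alias) > threshold for alias in full_aliases):
        return concept
    return None


def string_similarity(a: str, b: str) -> float:
    """Stand-in for the twin network model (an assumption made for this sketch)."""
    return SequenceMatcher(None, a, b).ratio()


dictionary = {
    "PA0001": [("heart failure", 0), ("HF", 1)],
    "PA0002": [("diabetes", 0), ("diabetic", 0)],
}
print(resolve_pair("heart failures", "HF", dictionary, string_similarity))
print(resolve_pair("high frequency", "HF", dictionary, string_similarity))
```

In a real system the `similarity` argument would be the trained twin network model; passing it in as a callable keeps the dictionary logic independent of how the similarity is computed.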
Further, in an embodiment of the present application, the twin network model includes two parallel and identical first and second neural networks, and the step S6 includes:
s600: inputting the first full name into a first neural network in the twin network model and the specified medical alias into a second neural network in the twin network model;
s601: converting the first full name into a corresponding first vector through the first neural network, and converting the specified medical alias into a corresponding second vector through the second neural network;
s602: calculating a similarity value of the first vector and the second vector;
s603: determining a similarity value of the first vector and the second vector as a specified similarity value of the first full name and the specified medical alias.
As described in the above steps S600 to S603, the twin network model includes two parallel and identical neural networks, a first neural network and a second neural network. The step of inputting the first full name and the specified medical alias into a preset twin network model and obtaining the specified similarity value between them through the twin network model may specifically include the following. First, the first full name is input into the first neural network in the twin network model, and the specified medical alias is input into the second neural network in the twin network model. The first full name is then converted into a corresponding first vector by the first neural network, and the specified medical alias is converted into a corresponding second vector by the second neural network. Preferably, only the full-name medical aliases contained in the second medical concept are extracted from the specified medical dictionary and compared with the first full name in the specified abbreviation-full-name pair, which effectively reduces the amount of data to be processed and improves the identification efficiency for the abbreviation. Next, the similarity value of the first vector and the second vector is calculated. Finally, the similarity value of the first vector and the second vector is determined as the specified similarity value of the first full name and the specified medical alias. The twin network model has three layers. From bottom to top, the first layer is an embedding layer, the second layer is a char CNN layer (a character-level convolutional neural network layer), and the third layer is a fully connected layer.
The first layer and the second layer of the twin network model can each be divided into a left part and a right part whose structures are completely the same, and the third layer is shared by the two identical neural networks (the first neural network and the second neural network). First, the embedding layer converts each letter of an input character string into a corresponding vector according to a preset first conversion rule, that is, it converts the character string into a corresponding matrix; the first conversion rule may follow any commonly used character-to-matrix conversion rule. Specifically, through the embedding layer, the first neural network converts the input first full name into a corresponding first matrix, and the second neural network converts the input specified medical alias into a corresponding second matrix. Secondly, the char CNN layer is used to capture the written-form (character-level) information of the input character string. After the input character string has been converted into a matrix by the embedding layer, the char CNN layer receives the matrix as input, converts it into a corresponding vector according to a preset second conversion rule, and outputs the vector. The resulting vector contains the written-form information of the input character string, and the second conversion rule may follow any commonly used matrix-to-vector conversion rule. Specifically, through the char CNN layer, the first neural network converts the first matrix into a corresponding first vector, and the second neural network converts the second matrix into a corresponding second vector.
Thirdly, the fully connected layer takes the two vectors output by the two char CNN layers, namely the first vector and the second vector, as inputs, and calculates and outputs their similarity value through a corresponding formula. The similarity value represents the probability that the two input character strings, namely the first full name and the specified medical alias, have the same meaning, and it is a real number between 0 and 1. Specifically, the formula for calculating the similarity value may be:
[Formula image: Figure BDA0002673841950000131]
where $Y_1$ is the first vector, $Y_2$ is the second vector, and $\mathrm{sim}(Y_1, Y_2)$ is the similarity of the first vector and the second vector. In this embodiment, the trained twin network model makes it possible to calculate the specified similarity value of the input first full name and the specified medical alias intelligently and quickly, which in turn makes it possible to determine accurately, according to the specified similarity value, whether the first abbreviation belongs to a medical entity, so that the identification efficiency for the abbreviation is effectively improved. For example, suppose an abbreviation-full-name pair (A, a) is found by the algorithm, a is a medical alias of the medical concept α, and A does not appear in the specified medical dictionary. Assuming that the medical concept α has medical aliases A1, A2, A3, …, the pairs A and A1, A and A2, A and A3, … are each input into the trained twin network model, which calculates the similarity value between A and each medical alias of the medical concept α. If the twin network model yields a similarity value between A and some medical alias Ax of α that is greater than the similarity threshold, A and Ax have the same meaning, and it can be determined that a in the medical text is a medical entity corresponding to the medical concept α in the specified medical dictionary. If the twin network model indicates that A differs in meaning from every medical alias contained in α, a is determined not to be a medical entity.
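The three-layer structure described above can be illustrated with a small, untrained sketch in plain Python. Everything specific here is an assumption made for the illustration: the layer sizes, the random initial weights, the max-over-time pooling, and in particular the similarity function `exp(-L1 distance)`, which merely stands in for the formula in the figure (not reproduced in this text) because it also yields a value between 0 and 1. Because the weights are untrained, the scores carry no semantic meaning.

```python
import math
import random
from typing import List

EMBED_DIM = 8     # size of each character vector (illustrative)
NUM_FILTERS = 6   # number of convolution filters (illustrative)
KERNEL = 3        # number of characters covered by each filter (illustrative)

_rng = random.Random(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 -"
# Shared parameters: both branches read the same tables, which makes the two networks identical.
EMBEDDING = {ch: [_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for ch in ALPHABET}
UNKNOWN = [0.0] * EMBED_DIM
FILTERS = [[[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(KERNEL)]
           for _ in range(NUM_FILTERS)]


def embed(text: str) -> List[List[float]]:
    """Embedding layer: one vector per character, giving a matrix with one row per character."""
    rows = [EMBEDDING.get(ch, UNKNOWN) for ch in text.lower()]
    while len(rows) < KERNEL:  # pad short strings so at least one window exists
        rows.append(UNKNOWN)
    return rows


def char_cnn(matrix: List[List[float]]) -> List[float]:
    """Char CNN layer: 1-D convolution over character windows, ReLU, max-over-time pooling."""
    vector = []
    for filt in FILTERS:
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(matrix) - KERNEL + 1):
            activation = sum(w * x
                             for k in range(KERNEL)
                             for w, x in zip(filt[k], matrix[start + k]))
            best = max(best, activation)
        vector.append(best)
    return vector


def encode(text: str) -> List[float]:
    """One branch of the twin network: string -> matrix -> vector."""
    return char_cnn(embed(text))


def similarity(first_full_name: str, alias: str) -> float:
    """Top layer: map the two vectors to a score in (0, 1] (assumed formula, see lead-in)."""
    y1, y2 = encode(first_full_name), encode(alias)
    distance = sum(abs(u - v) for u, v in zip(y1, y2))
    return math.exp(-distance)


print(similarity("heart failure", "heart failure"))   # identical inputs give distance 0
print(similarity("heart failures", "heart failure"))
```

Sharing `EMBEDDING` and `FILTERS` between the two calls to `encode` is what keeps the parameters of the first and second neural networks identical.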
Further, before the first full name and the specified medical alias are input into the preset twin network model to obtain the specified similarity value between them, the method also includes a process of training and generating the twin network model. Specifically, the twin network model is trained using character string pairs as training samples. Each training sample comprises two character strings and is labeled: if the two character strings have the same meaning, the label is 1; if they have different meanings, the label is 0. An untrained neural network model is then built and trained on the training data, and the trained neural network model is used as the twin network model. The neural network model may be a CNN neural network, an RNN neural network, an LSTM neural network, or the like; a bidirectional LSTM neural network is preferred in the present invention. The parameters of the twin network model may be trained using any existing training method, which is not specifically limited in this application.
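The labeled training samples described above can be represented as (string, string, label) triples. The text does not say where the pairs come from; the sketch below assumes, purely for illustration, that they are derived from a dictionary of full-name aliases, with two aliases of the same concept labeled 1 and aliases of different concepts labeled 0. The function name `build_pairs` and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Sample = Tuple[str, str, int]  # (first string, second string, label)


def build_pairs(concepts: Dict[str, List[str]]) -> List[Sample]:
    """Build labeled string pairs: 1 for same meaning, 0 for different meanings."""
    samples: List[Sample] = []
    items = list(concepts.items())
    # Positive samples: two aliases of the same concept.
    for _, aliases in items:
        for a, b in combinations(aliases, 2):
            samples.append((a, b, 1))
    # Negative samples: one alias from each of two different concepts.
    for (_, aliases_a), (_, aliases_b) in combinations(items, 2):
        for a in aliases_a:
            for b in aliases_b:
                samples.append((a, b, 0))
    return samples


concepts = {
    "PA0001": ["heart failure", "heart failures", "failure of heart"],
    "PA0002": ["diabetes", "diabetic"],
}
samples = build_pairs(concepts)
for first, second, label in samples:
    print(label, first, "|", second)
```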
Further, after the twin network model is obtained, it can be stored in a blockchain network. Using the blockchain to store and manage the trained twin network model effectively ensures the security and tamper resistance of the twin network model.
Further, in an embodiment of the present application, the step S7 includes:
s700: acquiring a preset similarity threshold;
s701: judging whether a similarity numerical value larger than the similarity threshold exists in all the specified similarity numerical values;
s702: if a similarity value larger than the similarity threshold exists among all the specified similarity values, determining that a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept;
s703: if no similarity value larger than the similarity threshold exists among all the specified similarity values, determining that no medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept.
As described in steps S700 to S703, the step of determining, according to the specified similarity values, whether a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept may specifically include the following. First, a preset similarity threshold is obtained. The similarity threshold is not particularly limited and may be set according to actual requirements; for example, it may be set to 0.5. It is then determined whether any of the specified similarity values is larger than the similarity threshold. If such a similarity value exists, the input first full name and one of the specified medical aliases contained in the second medical concept have the same meaning, and it is determined that a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept. If no such similarity value exists, the input first full name differs in meaning from every specified medical alias contained in the second medical concept, and it is determined that no medical alias having the same meaning as the first full name exists among them. With this embodiment, whether the abbreviation in an abbreviation-full-name pair in the medical text belongs to a medical entity can be determined quickly, intelligently, and accurately, which effectively improves the efficiency and accuracy of entity identification for such abbreviations.
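The threshold comparison in steps S700 to S703 can be sketched as below. The function name `best_match` and the example scores are hypothetical; the only behavior taken from the text is that a similarity value must be strictly greater than the threshold (0.5 is the example value given above).

```python
from typing import Dict, Optional


def best_match(scores: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the alias with the highest score above the threshold, or None if there is none."""
    above = {alias: score for alias, score in scores.items() if score > threshold}
    if not above:
        return None
    return max(above, key=above.get)


# Hypothetical similarity values between a first full name and each alias of a concept.
scores = {"heart failure": 0.93, "failure of heart": 0.61, "cardiac insufficiency": 0.20}
print(best_match(scores))
print(best_match({"diabetes": 0.12}))
```

Returning the matching alias rather than a bare boolean is a design choice of this sketch; it makes the decision easier to inspect but is otherwise equivalent to the yes/no determination described in the steps.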
Further, in an embodiment of the present application, before the step S3, the method includes:
s300: acquiring a medical dictionary;
s301: according to a preset labeling rule, performing labeling processing on medical alias data contained in the medical dictionary to generate a labeled medical dictionary;
s302: and taking the labeled medical dictionary as the specified medical dictionary.
As described in the above steps S300 to S302, before the step of determining whether the first full name in the specified abbreviation-full-name pair is a medical alias of the first medical concept in the preset specified medical dictionary, the method further includes a process of creating the specified medical dictionary. Specifically, a medical dictionary is first acquired. The medical dictionary is a dictionary in which medical concepts and the medical aliases of medical entities have been recorded in a database in advance; it comprises a plurality of medical concepts, each of which has a unique ID and a plurality of medical aliases. A medical dictionary containing 2 medical concepts is taken as an example: [{'id': 'PA0001', 'alias': ['heart failure', 'HF']}, {'id': 'PA0002', 'alias': ['diabetes', 'diabetic']}]. Then, according to a preset labeling rule, the medical alias data contained in the medical dictionary is labeled to generate a labeled medical dictionary. The preset labeling rule assigns the label "1" to medical aliases that are abbreviations and the label "0" to medical aliases that are not abbreviations, i.e., full-name medical aliases. Labeling the example dictionary above according to this rule gives: [{'id': 'PA0001', 'alias': [('heart failure', 0), ('HF', 1)]}, {'id': 'PA0002', 'alias': [('diabetes', 0), ('diabetic', 0)]}]. Finally, the labeled medical dictionary is taken as the specified medical dictionary, so that whether the first full name in the specified abbreviation-full-name pair is a medical alias of the first medical concept can subsequently be determined quickly from the specified medical dictionary.
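The labeling step can be sketched as follows, using the two-concept dictionary from the example above. The text does not state how an alias is recognized as an abbreviation, so the heuristic in `is_abbreviation` (a short, all-uppercase token without spaces) is an assumption made only for this illustration; in practice the labels might be assigned manually or by another rule.

```python
from typing import Dict, List


def is_abbreviation(alias: str) -> bool:
    """Assumed heuristic: a short all-uppercase token with no spaces counts as an abbreviation."""
    return " " not in alias and alias.isupper() and len(alias) <= 6


def label_dictionary(raw: List[Dict]) -> List[Dict]:
    """Attach label 1 to abbreviated aliases and label 0 to full-name aliases."""
    return [
        {"id": entry["id"],
         "alias": [(alias, 1 if is_abbreviation(alias) else 0) for alias in entry["alias"]]}
        for entry in raw
    ]


raw_dictionary = [
    {"id": "PA0001", "alias": ["heart failure", "HF"]},
    {"id": "PA0002", "alias": ["diabetes", "diabetic"]},
]
specified_dictionary = label_dictionary(raw_dictionary)
print(specified_dictionary)
```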
Further, in an embodiment of the present application, after the step S8, the method includes:
s800: screening out candidate abbreviated medical entities and full-name medical entities in the medical text based on the specified medical dictionary;
s801: judging whether a specified candidate abbreviated medical entity is simultaneously a medical alias of a plurality of different medical concepts in the specified medical dictionary, wherein the specified candidate abbreviated medical entity is any one of all the candidate abbreviated medical entities;
s802: if it is simultaneously a medical alias of a plurality of different medical concepts in the specified medical dictionary, traversing all the full-name medical entities and judging whether a medical alias of a specified medical concept exists among the full-name medical entities, wherein the specified medical concept is any one of the plurality of medical concepts to which the specified candidate abbreviated medical entity belongs;
s803: if a medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity belongs to a medical entity and that the specified candidate abbreviated medical entity corresponds to the specified medical concept;
s804: if no medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity does not belong to a medical entity.
As described in steps S800 to S804, in addition to finding the abbreviation-full-name pairs in the medical text to be recognized, candidate medical entity data other than the abbreviation-full-name pairs may also be screened out from the medical text, and entity discrimination may be performed intelligently on the candidate abbreviated medical entities contained in that data. Specifically, after the step of determining that the first abbreviation belongs to a medical entity and corresponds to the second medical concept if a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, the method may further include the following. First, candidate abbreviated medical entities and full-name medical entities in the medical text are screened out based on the specified medical dictionary. The candidate abbreviated medical entities are the abbreviations in the medical text other than those in the abbreviation-full-name pairs, and the full-name medical entities are all full names contained in the medical text, including the full names in the abbreviation-full-name pairs. In addition, the candidate abbreviated medical entities and the full-name medical entities can be marked, for example in bold, so that they are easy to identify.
For example, marking candidate medical entity data in the medical text with the specified medical dictionary may yield candidate abbreviated medical entities a, b, c, … and full-name medical entities A, B, C, …. Here there is no correspondence between a and A and no connection with the previous example: lowercase letters simply denote abbreviated character strings found in the medical text according to the specified medical dictionary, capital letters denote non-abbreviated (full-name) character strings found in the same way, and a is not meant to be the abbreviation of A. It is then judged whether a specified candidate abbreviated medical entity is simultaneously a medical alias of a plurality of different medical concepts in the specified medical dictionary, where the specified candidate abbreviated medical entity is any one of all the candidate abbreviated medical entities. If it is, all the full-name medical entities are traversed, and it is judged whether a medical alias of a specified medical concept exists among them, where the specified medical concept is any one of the plurality of medical concepts to which the specified candidate abbreviated medical entity belongs. If a medical alias of the specified medical concept exists among the full-name medical entities, it is determined that the specified candidate abbreviated medical entity belongs to a medical entity and corresponds to the specified medical concept.
If no medical alias of the specified medical concept exists among the full-name medical entities, it is determined that the specified candidate abbreviated medical entity does not belong to a medical entity. The reasoning is that, for a medical concept α, if an abbreviated medical alias a and a full-name medical alias A of α both appear in a text, a in that text represents the medical concept α in most cases. Specifically, taking the candidate abbreviated medical entity a in the medical text as an example, assume that a is a medical alias of both medical concepts α and β in the specified medical dictionary (the same applies to α, β, γ, …), and traverse the full-name medical entities A, B, C, … in the medical text. If A (or likewise B or C) is a medical alias of α (or likewise β), the candidate abbreviated medical entity a is determined to belong to a medical entity and to correspond to the medical concept α (or likewise β). If none of A, B, C, … is a medical alias of α or β (or likewise of α, β, γ, …), the candidate abbreviated medical entity a is determined not to belong to a medical entity. In this way, whether abbreviations in the medical text other than those in the abbreviation-full-name pairs belong to medical entities can be determined intelligently, quickly, and accurately, which effectively improves the efficiency and accuracy of entity identification for these abbreviations.
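The disambiguation in steps S800 to S804 can be sketched as follows. The dictionary layout, the function name `disambiguate`, and the second concept `PX0003` ("high frequency") are hypothetical and introduced only so that "HF" is ambiguous. The text covers the case where the abbreviation belongs to several concepts; this sketch applies the same check when it belongs to only one, which is a simplifying assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: concept ID -> list of (alias, label), label 1 = abbreviation, 0 = full name.
MedicalDictionary = Dict[str, List[Tuple[str, int]]]


def disambiguate(candidate: str, full_names_in_text: List[str],
                 dictionary: MedicalDictionary) -> Optional[str]:
    """Return the concept supported by a full name found in the text, or None if there is none."""
    # Concepts that list the candidate as an abbreviated alias.
    concepts = [cid for cid, aliases in dictionary.items() if (candidate, 1) in aliases]
    for concept_id in concepts:
        full_aliases = {alias for alias, label in dictionary[concept_id] if label == 0}
        # Traverse the full-name entities found in the text.
        if any(name in full_aliases for name in full_names_in_text):
            return concept_id
    return None  # no supporting full name: treated as not a medical entity


dictionary = {
    "PA0001": [("heart failure", 0), ("HF", 1)],
    "PX0003": [("high frequency", 0), ("HF", 1)],  # hypothetical second concept
}
print(disambiguate("HF", ["heart failure", "diabetes"], dictionary))
print(disambiguate("HF", ["diabetes"], dictionary))
```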
Further, in an embodiment of the present application, after the step S7, the method includes:
s700: if no medical alias having the same meaning as the first full name exists among all the specified medical aliases, determining that the first abbreviation does not belong to a medical entity;
s701: adding an annotation of a non-medical entity to the first abbreviation.
As described in steps S700 to S701 above, after the step of determining, according to the specified similarity values, whether a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, the method may further include the following. If no medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, that is, the first full name does not appear in the specified medical dictionary and none of the medical aliases contained in the second medical concept is similar or identical in meaning to the first full name, it is determined that the first abbreviation does not belong to a medical entity. An annotation of a non-medical entity is then added to the first abbreviation. The specific manner of annotating the first abbreviation as a non-medical entity is not limited; for example, bolding or highlighting may be used. In this embodiment, adding a non-medical-entity annotation to a first abbreviation that does not belong to a medical entity makes it easy to locate such abbreviations quickly in the medical text afterwards, prevents the user from confusing the information, and improves the user experience.
Further, in an embodiment of the present application, after the step S8, the method includes:
s810: finding a data record location corresponding to the second medical concept from the specified medical dictionary;
s811: determining a filling position at the data recording position;
s812: adding the first full name at the filling position.
As described in the above steps S810 to S812, when it is determined that the first abbreviation belongs to a medical entity and corresponds to the second medical concept, the first full name corresponding to the first abbreviation may further be added to the specified medical dictionary, so as to complete the data in the specified medical dictionary. Specifically, after the step of determining that the first abbreviation belongs to a medical entity if a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, the method includes the following. First, the data record position corresponding to the second medical concept is located in the specified medical dictionary. The data record position is the position in the specified medical dictionary where the data related to the second medical concept is recorded. A filling position is then determined at the data record position. The filling position is related to the positions of the medical aliases contained in the second medical concept but is not otherwise limited. For example, the filling position may be the position immediately before the first medical alias contained in the second medical concept, that is, the medical alias placed first; or it may be the position immediately after the last medical alias contained in the second medical concept, that is, the medical alias placed last; and so on. Finally, the first full name is added at the filling position. A non-abbreviation label may also be added to the first full name.
In this embodiment, after it is determined that a medical alias having the same meaning as the first full name exists among all medical aliases contained in the second medical concept, that is, that the first abbreviation belongs to a medical entity, the first full name is added at the corresponding position for the second medical concept in the specified medical dictionary, which effectively improves the completeness and accuracy of the data in the specified medical dictionary.
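Steps S810 to S812 can be sketched as a small update to the labeled dictionary. The function name `add_full_name`, the `at_front` flag, and the duplicate check are assumptions made for this illustration; the two filling positions (before the first alias or after the last alias) and the non-abbreviation label 0 follow the description above.

```python
from typing import Dict, List


def add_full_name(dictionary: List[Dict], concept_id: str, full_name: str,
                  at_front: bool = False) -> bool:
    """Add `full_name` (label 0) to the concept's alias list; return False if already present."""
    for entry in dictionary:                      # S810: locate the data record position
        if entry["id"] != concept_id:
            continue
        if any(alias == full_name for alias, _ in entry["alias"]):
            return False                          # assumed behavior: do not add duplicates
        item = (full_name, 0)                     # 0 marks a non-abbreviated alias
        if at_front:                              # S811: choose the filling position
            entry["alias"].insert(0, item)
        else:
            entry["alias"].append(item)           # S812: add the first full name
        return True
    raise KeyError(concept_id)


specified_dictionary = [
    {"id": "PA0001", "alias": [("heart failure", 0), ("HF", 1)]},
    {"id": "PA0002", "alias": [("diabetes", 0), ("diabetic", 0)]},
]
add_full_name(specified_dictionary, "PA0001", "heart failures")
print(specified_dictionary[0])
```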
The entity identification method based on the abbreviated data of the model of the embodiment of the application can also be applied to the field of block chains, such as storing the twin network model in the block chains.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The blockchain underlying platform can comprise processing modules such as user management, basic service, smart contract, and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, including maintenance of public and private key generation (account management), key management, and maintenance of the correspondence between a user's real identity and blockchain address (authority management); with authorization, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage after consensus is reached; for a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic in a programming language and publish it to the blockchain (contract registration), the contract is triggered by keys or other events and executed according to the logic of the contract terms, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, and cloud adaptation during product release, and for visual output of real-time status during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health.
Referring to fig. 2, an embodiment of the present application further provides an entity identification apparatus based on abbreviated data of a model, including:
the first acquisition module 1 is used for acquiring a medical text to be recognized;
the first searching module 2 is used for searching all abbreviation-full-name pair data appearing in the medical text through a preset algorithm;
the first judging module 3 is used for judging whether a first full name in a specified abbreviation-full-name pair is a medical alias of a first medical concept in a preset specified medical dictionary, wherein the specified abbreviation-full-name pair is any one abbreviation-full-name pair in all abbreviation-full-name pair data appearing in the medical text, and the specified abbreviation-full-name pair comprises the first full name and a first abbreviation corresponding to the first full name;
a second determining module 4, configured to determine whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary if the first full name is not the medical alias of the first medical concept in the specified medical dictionary;
a second obtaining module 5, configured to obtain all medical aliases included in the second medical concept from the designated medical dictionary if the first abbreviation is a medical alias of the second medical concept;
a third obtaining module 6, configured to input the first full name and the specified medical alias into a preset twin network model, and obtain a specified similarity value between the first full name and the specified medical alias through the twin network model, where the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained based on pre-collected sample data with labels;
a third judging module 7, configured to judge, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and a first determination module 8, configured to determine that the first abbreviation belongs to a medical entity and the first abbreviation has a corresponding relationship with the second medical concept if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept.
In this embodiment, the implementation processes of the functions and actions of the first acquisition module, the first searching module, the first judging module, the second determining module, the second obtaining module, the third obtaining module, the third judging module and the first determination module in the entity identification apparatus based on abbreviated data of a model are specifically described in the implementation processes corresponding to steps S1 to S8 in the entity identification method based on abbreviated data of a model, and are not described herein again.
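The module chain above (look up the full name, look up the abbreviation, score the full name against every alias of the matched concept, then decide) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the dictionary layout (concept ID mapped to a list of aliases), the names `find_concept`, `toy_similarity` and `link_abbreviation`, the token-overlap score standing in for the twin network model, and the threshold 0.5 are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout: concept ID -> list of medical aliases.
MedicalDictionary = Dict[str, List[str]]


def find_concept(name: str, dictionary: MedicalDictionary) -> Optional[str]:
    """Return the first concept whose alias list contains `name`, else None."""
    for concept, aliases in dictionary.items():
        if name in aliases:
            return concept
    return None


def toy_similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) score; a stand-in for the twin network model."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def link_abbreviation(
    pair: Tuple[str, str],
    dictionary: MedicalDictionary,
    similarity: Callable[[str, str], float] = toy_similarity,
    threshold: float = 0.5,  # assumed value, not taken from the source
) -> Optional[str]:
    """Return the concept the abbreviation is linked to, or None."""
    abbreviation, full_name = pair
    # The full name is already a known alias: this branch is not covered
    # by the steps sketched here, so nothing is decided.
    if find_concept(full_name, dictionary) is not None:
        return None
    # Is the abbreviation an alias of some (second) medical concept?
    concept = find_concept(abbreviation, dictionary)
    if concept is None:
        return None
    # Score the full name against every alias of that concept.
    scores = [similarity(full_name, alias) for alias in dictionary[concept]]
    # Link only if at least one alias is similar enough.
    return concept if any(score > threshold for score in scores) else None
```

With a dictionary such as `{"C001": ["MI", "myocardial infarction", "heart attack"]}`, the pair `("MI", "acute myocardial infarction")` is linked to `"C001"`, while `("MI", "Michigan")` is not, because no alias of that concept resembles "Michigan".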
Further, in an embodiment of the present application, the twin network model includes two parallel and identical first and second neural networks, and the third obtaining module includes:
an input submodule to input the first full name into a first neural network in the twin network model and the specified medical alias into a second neural network in the twin network model;
a conversion submodule, configured to convert the first full name into a corresponding first vector through the first neural network and to convert the specified medical alias into a corresponding second vector through the second neural network;
the calculating submodule is used for calculating the similarity value of the first vector and the second vector;
a determining submodule, configured to determine a similarity value between the first vector and the second vector as a specified similarity value between the first full name and the specified medical alias.
In this embodiment, the implementation processes of the functions and actions of the input submodule, the conversion submodule, the calculation submodule, and the determination submodule in the entity identification device based on the abbreviated data of the model are specifically described in the implementation processes corresponding to steps S600 to S603 in the entity identification method based on the abbreviated data of the model, and are not described herein again.
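The four submodules above follow the usual twin (Siamese) pattern: both inputs pass through the same encoder, and the two resulting vectors are compared. The sketch below illustrates only that data flow. The character-trigram encoder is an assumption standing in for the trained neural network, and cosine similarity is one possible similarity measure; the source does not state which measure is used.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Stand-in for one branch of the twin network (hypothetical).

    Maps a string to a sparse vector of character-trigram counts.
    A trained neural encoder would replace this function.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def twin_similarity(full_name: str, alias: str) -> float:
    """Encode both inputs with the same encoder, then compare the vectors."""
    first_vector = encode(full_name)   # first branch
    second_vector = encode(alias)      # second branch, identical weights
    return cosine(first_vector, second_vector)
```

Because both branches call the same `encode` function, the score is symmetric in its two arguments, which mirrors the "two parallel and identical" networks described above.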
Further, in an embodiment of the present application, the third judging module includes:
the acquisition submodule is used for acquiring a preset similarity threshold;
the judgment submodule is used for judging whether a similarity numerical value larger than the similarity threshold exists in all the specified similarity numerical values or not;
a first determining submodule, configured to determine that a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept if a similarity value larger than the similarity threshold exists among all the specified similarity values;
and a second determining submodule, configured to determine that no medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept if no similarity value larger than the similarity threshold exists among all the specified similarity values.
In this embodiment, the implementation processes of the functions and actions of the obtaining submodule, the judging submodule, the first determining submodule and the second determining submodule in the entity identification device based on the abbreviated data of the model are specifically described in the implementation processes corresponding to steps S700 to S703 in the entity identification method based on the abbreviated data of the model, and are not described herein again.
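The threshold test performed by these submodules reduces to a single existence check over the similarity values. A minimal sketch, assuming the values are plain floats and using the hypothetical name `has_same_meaning_alias`:

```python
from typing import Iterable


def has_same_meaning_alias(similarity_values: Iterable[float],
                           threshold: float) -> bool:
    """True if at least one similarity value is strictly greater than the threshold."""
    return any(value > threshold for value in similarity_values)
```

The comparison is strict, following the wording "larger than the similarity threshold": a value exactly equal to the threshold does not count as a match.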
Further, in an embodiment of the present application, the entity identification apparatus based on abbreviated data of model includes:
the fourth acquisition module is used for acquiring the medical dictionary;
the labeling module is used for labeling the medical alias data contained in the medical dictionary according to a preset labeling rule to generate a labeled medical dictionary;
a first determining module, configured to use the labeled medical dictionary as the specified medical dictionary.
In this embodiment, the implementation processes of the functions and actions of the fourth obtaining module, the labeling module and the first determining module in the entity identification apparatus based on abbreviated data of a model are specifically described in the implementation processes corresponding to steps S300 to S302 in the entity identification method based on abbreviated data of a model, and are not described herein again.
Further, in an embodiment of the present application, the entity identification apparatus based on abbreviated data of model includes:
the screening module is used for screening out candidate abbreviated medical entities and full-name medical entities in the medical text based on the specified medical dictionary;
a fourth judging module, configured to judge whether a specified candidate abbreviated medical entity belongs to medical aliases of multiple different medical concepts in the specified medical dictionary at the same time, where the specified candidate abbreviated medical entity is any one of the data of all the candidate abbreviated medical entities;
a fifth judging module, configured to, if the specified candidate abbreviated medical entity belongs to medical aliases of multiple different medical concepts in the specified medical dictionary at the same time, traverse all the full-name medical entities and judge whether a medical alias of a specified medical concept exists among the full-name medical entities, where the specified medical concept is any one of the multiple medical concepts to which the specified candidate abbreviated medical entity belongs;
a second judging module, configured to judge that the specified candidate abbreviated medical entity belongs to a medical entity and the specified candidate abbreviated medical entity has a corresponding relationship with the specified medical concept if a medical alias of the specified medical concept exists among the full-name medical entities;
a third determining module, configured to determine that the specified candidate abbreviated medical entity does not belong to a medical entity if no medical alias of the specified medical concept exists among the full-name medical entities.
In this embodiment, the implementation processes of the functions and actions of the screening module, the fourth judging module, the fifth judging module, the second judging module and the third determining module in the entity identification apparatus based on abbreviated data of the model are specifically described in the implementation processes corresponding to steps S800 to S804 in the entity identification method based on abbreviated data of the model, and are not described herein again.
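The disambiguation performed by these modules can be illustrated as follows: when a candidate abbreviation is an alias of several concepts, the full-name entities found in the same text decide which concept, if any, it refers to. This is a sketch under assumptions. The dictionary layout, the function name `disambiguate`, and the handling of an abbreviation that belongs to only one concept (returned as-is) are hypothetical and not taken from the source.

```python
from typing import Dict, List, Optional


def disambiguate(candidate: str,
                 full_name_entities: List[str],
                 dictionary: Dict[str, List[str]]) -> Optional[str]:
    """Pick the concept for an ambiguous abbreviation, or None if unsupported."""
    # All concepts that list the candidate abbreviation as an alias.
    concepts = [c for c, aliases in dictionary.items() if candidate in aliases]
    if len(concepts) < 2:
        # Not ambiguous; this case is outside the described steps (assumption).
        return concepts[0] if concepts else None
    mentioned = set(full_name_entities)
    for concept in concepts:
        # Aliases of this concept other than the abbreviation itself.
        other_aliases = set(dictionary[concept]) - {candidate}
        if other_aliases & mentioned:
            return concept  # a full-name alias of this concept occurs in the text
    return None  # no supporting full name: treated as not a medical entity
```

For example, if "MS" is listed under both a multiple sclerosis concept and a mitral stenosis concept, and the text also mentions "mitral stenosis", the abbreviation is linked to the latter.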
Further, in an embodiment of the present application, the entity identification apparatus based on abbreviated data of model includes:
a fourth determination module, configured to determine that the first abbreviation does not belong to a medical entity if a medical alias having the same meaning as the first full name does not exist in all the designated medical aliases;
a first adding module for adding a label of a non-medical entity to the first abbreviation.
In this embodiment, the implementation processes of the functions and actions of the fourth determination module and the first addition module in the entity identification apparatus based on abbreviated data of the model are specifically described in the implementation processes corresponding to steps S700 to S701 in the entity identification method based on abbreviated data of the model, and are not described herein again.
Further, in an embodiment of the present application, the entity identification apparatus based on abbreviated data of a model further includes:
a second lookup module, configured to look up a data record location corresponding to the second medical concept in the specified medical dictionary;
a second determining module, configured to determine a filling position at the data record location;
a second adding module, configured to add the first full name at the filling position.
In this embodiment, the implementation processes of the functions and actions of the second lookup module, the second determining module and the second adding module in the entity identification apparatus based on the abbreviated data of the model are specifically described in the implementation processes corresponding to steps S700 to S701 in the entity identification method based on the abbreviated data of the model, and are not described herein again.
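The dictionary update performed by these three modules can be sketched as below. The record layout (a list of aliases per concept), the choice of the end of the list as the filling position, and the name `add_full_name` are assumptions made for illustration.

```python
from typing import Dict, List


def add_full_name(dictionary: Dict[str, List[str]],
                  concept_id: str,
                  full_name: str) -> bool:
    """Append a newly confirmed full name to a concept's alias record.

    Returns True if the alias was added, False if it was already present.
    """
    record = dictionary[concept_id]      # data record location for the concept
    if full_name in record:
        return False                     # avoid duplicate aliases
    filling_position = len(record)       # assumed: fill at the end of the record
    record.insert(filling_position, full_name)
    return True
```

Adding confirmed full names in this way lets a later exact dictionary lookup succeed for the same full name without another similarity comparison.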
Referring to fig. 3, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device comprises a processor, a memory, a network interface, a display screen, an input device and a database which are connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing abbreviation-full name pair data, the first full name, the specified medical dictionary, the first abbreviation, the second medical concept, the twin network model, the specified similarity values, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The display screen of the computer device is the image and text output device of the computer and converts digital signals into optical signals so that characters and figures are displayed on the screen. The input device of the computer device is the main device for exchanging information between the computer and the user or other equipment, and is used for transmitting data, instructions, certain flag information and the like to the computer. The computer program is executed by the processor to implement the entity identification method based on abbreviated data of a model.
The processor performs the following steps of the entity identification method based on abbreviated data of a model:
acquiring a medical text to be identified;
finding all abbreviation-full name pair data appearing in the medical text through a preset algorithm;
judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
if the first full name is not a medical alias of a first medical concept in the specified medical dictionary, judging whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary;
if the first abbreviation is a medical alias of the second medical concept, acquiring all medical aliases contained in the second medical concept from the specified medical dictionary;
inputting the first full name and the specified medical alias into a preset twin network model, and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model, wherein the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained on the basis of pre-acquired labeled sample data;
judging, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept, determining that the first abbreviation belongs to a medical entity and that the first abbreviation and the second medical concept have a corresponding relationship.
Those skilled in the art will appreciate that the structure shown in fig. 3 is only a block diagram of a part of the structure related to the present application, and does not constitute a limitation to the apparatus and the computer device to which the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for identifying an entity based on abbreviated data of a model is implemented, specifically:
acquiring a medical text to be identified;
finding all abbreviation-full name pair data appearing in the medical text through a preset algorithm;
judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
if the first full name is not a medical alias of a first medical concept in the specified medical dictionary, judging whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary;
if the first abbreviation is a medical alias of the second medical concept, acquiring all medical aliases contained in the second medical concept from the specified medical dictionary;
inputting the first full name and the specified medical alias into a preset twin network model, and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model, wherein the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained on the basis of pre-acquired labeled sample data;
judging, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept, determining that the first abbreviation belongs to a medical entity and that the first abbreviation and the second medical concept have a corresponding relationship.
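The second step listed above relies on a "preset algorithm" for finding abbreviation-full name pairs, which this passage does not specify. One common heuristic, shown here purely as an assumption, looks for a parenthesised abbreviation and checks whether the initials of the immediately preceding words spell it. The pattern, the length limits, and the name `find_pairs` are hypothetical, and the heuristic only covers English-style initialisms.

```python
import re
from typing import List, Tuple

# A parenthesised token of 2 to 10 letters starting with a capital, e.g. "(AMI)".
ABBREVIATION_PATTERN = re.compile(r"\(([A-Z][A-Za-z]{1,9})\)")


def find_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (abbreviation, full name) pairs found by an initial-letter heuristic."""
    pairs = []
    for match in ABBREVIATION_PATTERN.finditer(text):
        abbreviation = match.group(1)
        preceding_words = text[:match.start()].split()
        # Take as many preceding words as the abbreviation has letters.
        candidate = preceding_words[-len(abbreviation):]
        initials = "".join(word[0] for word in candidate).upper()
        if initials == abbreviation.upper():
            pairs.append((abbreviation, " ".join(candidate)))
    return pairs
```

Each pair returned by such a function would then be passed through the dictionary lookups and the twin network comparison described in the remaining steps.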
In summary, the entity identification method, apparatus, computer device and storage medium based on abbreviated data of a model provided in the embodiments of the present application acquire a medical text to be identified; find all abbreviation-full name pair data appearing in the medical text through a preset algorithm; judge whether a first full name in a specified abbreviation-full name pair is a medical alias of a first medical concept in a preset specified medical dictionary, wherein the specified abbreviation-full name pair is any one of the abbreviation-full name pairs appearing in the medical text and comprises the first full name and a first abbreviation corresponding to the first full name; if the first full name is not a medical alias of a first medical concept in the specified medical dictionary, judge whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary; if the first abbreviation is a medical alias of the second medical concept, acquire all medical aliases included in the second medical concept from the specified medical dictionary; input the first full name and a specified medical alias into a preset twin network model and acquire a specified similarity value between them through the twin network model, wherein the specified medical alias is any one of all medical aliases included in the second medical concept; judge, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept; and, if such a medical alias exists, determine that the first abbreviation belongs to a medical entity and has a corresponding relationship with the second medical concept. By using the specified medical dictionary together with the twin network model, the embodiments of the present application can quickly and intelligently identify whether the abbreviation in an abbreviation-full name pair in the medical text belongs to a medical entity, which effectively improves the accuracy of entity identification for such abbreviations.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for entity recognition based on abbreviated data of a model is characterized by comprising the following steps:
acquiring a medical text to be identified;
finding all abbreviation-full name pair data appearing in the medical text through a preset algorithm;
judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
if the first full name is not a medical alias of a first medical concept in the specified medical dictionary, judging whether the first abbreviation is a medical alias of a second medical concept in the specified medical dictionary;
if the first abbreviation is a medical alias of the second medical concept, acquiring all medical aliases contained in the second medical concept from the specified medical dictionary;
inputting the first full name and the specified medical alias into a preset twin network model, and acquiring a specified similarity value between the first full name and the specified medical alias through the twin network model, wherein the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained on the basis of pre-acquired labeled sample data;
judging, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept, determining that the first abbreviation belongs to a medical entity and that the first abbreviation and the second medical concept have a corresponding relationship.
2. The method for entity identification based on abbreviated data of a model as claimed in claim 1, wherein the twin network model comprises two parallel and identical first and second neural networks, and the step of inputting the first full name and the specified medical alias into the preset twin network model and acquiring the specified similarity value between the first full name and the specified medical alias through the twin network model comprises:
inputting the first full name into a first neural network in the twin network model and the specified medical alias into a second neural network in the twin network model;
converting the first full name into a corresponding first vector through the first neural network, and converting the specified medical alias into a corresponding second vector through the second neural network;
calculating a similarity value of the first vector and the second vector;
determining a similarity value of the first vector and the second vector as a specified similarity value of the first full name and the specified medical alias.
3. The method according to claim 1, wherein the step of judging, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all the medical aliases included in the second medical concept comprises:
acquiring a preset similarity threshold;
judging whether a similarity numerical value larger than the similarity threshold exists in all the specified similarity numerical values;
if a similarity value larger than the similarity threshold exists among all the specified similarity values, determining that a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and if no similarity value larger than the similarity threshold exists among all the specified similarity values, determining that no medical alias having the same meaning as the first full name exists among all the medical aliases included in the second medical concept.
4. The method of claim 1, wherein the step of determining whether the first full name in the designated abbreviation-full name pair is a medical alias of the first medical concept in the predetermined designated medical dictionary is preceded by the step of:
acquiring a medical dictionary;
according to a preset labeling rule, performing labeling processing on medical alias data contained in the medical dictionary to generate a labeled medical dictionary;
and taking the labeled medical dictionary as the specified medical dictionary.
5. The method according to claim 1, wherein the step of determining that the first abbreviation belongs to a medical entity and the first abbreviation has a corresponding relationship with the second medical concept, if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept, comprises:
screening out candidate abbreviated medical entities and full-name medical entities in the medical text based on the specified medical dictionary;
judging whether a designated candidate abbreviated medical entity belongs to medical aliases of a plurality of different medical concepts in the designated medical dictionary at the same time, wherein the designated candidate abbreviated medical entity is any one data of all the candidate abbreviated medical entities;
if the specified candidate abbreviated medical entity belongs to medical aliases of a plurality of different medical concepts in the specified medical dictionary at the same time, traversing all the full-name medical entities, and judging whether a medical alias of a specified medical concept exists among the full-name medical entities, wherein the specified medical concept is any one of the plurality of medical concepts to which the specified candidate abbreviated medical entity belongs;
if a medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity belongs to a medical entity, and that the specified candidate abbreviated medical entity has a corresponding relationship with the specified medical concept;
if no medical alias of the specified medical concept exists among the full-name medical entities, determining that the specified candidate abbreviated medical entity does not belong to a medical entity.
6. The method for entity recognition based on abbreviated data of a model as claimed in claim 1, wherein said step of judging, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept comprises:
if no medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept, determining that the first abbreviation does not belong to a medical entity;
adding an annotation of a non-medical entity to the first abbreviation.
7. The method according to claim 1, wherein the step of determining that the first abbreviation belongs to a medical entity if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept is followed by the step of:
finding a data record location corresponding to the second medical concept from the specified medical dictionary;
determining a filling position at the data record location;
adding the first full name at the filling position.
8. An entity recognition apparatus for model-based abbreviation data, comprising:
the first acquisition module is used for acquiring a medical text to be recognized;
the first searching module is used for finding all abbreviation-full name pair data appearing in the medical text through a preset algorithm;
the first judging module is used for judging whether a first full name in a designated abbreviation-full name pair is a medical alias of a first medical concept in a preset designated medical dictionary, wherein the designated abbreviation-full name pair is any one abbreviation-full name pair in all abbreviation-full name pair data appearing in a medical text, and the designated abbreviation-full name pair comprises the first full name and a first abbreviation corresponding to the first full name;
a second determining module, configured to determine whether the first abbreviation is a medical alias of a second medical concept in the designated medical dictionary if the first full name is not the medical alias of the first medical concept in the designated medical dictionary;
a second obtaining module, configured to obtain all medical aliases included in the second medical concept from the designated medical dictionary if the first abbreviation is a medical alias of the second medical concept;
a third obtaining module, configured to input the first full name and the specified medical alias into a preset twin network model, and obtain a specified similarity value between the first full name and the specified medical alias through the twin network model, where the specified medical alias is any one of all medical aliases included in the second medical concept, and the twin network model is generated after a preset neural network model is trained based on pre-acquired sample data with a label;
a third judging module, configured to judge, according to the specified similarity value, whether a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept;
and a first determination module, configured to determine that the first abbreviation belongs to a medical entity and has a corresponding relationship with the second medical concept if a medical alias having the same meaning as the first full name exists among all medical aliases included in the second medical concept.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010941630.4A 2020-09-09 2020-09-09 Entity identification method and device based on abbreviated data of model and computer equipment Active CN112036172B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010941630.4A CN112036172B (en) 2020-09-09 2020-09-09 Entity identification method and device based on abbreviated data of model and computer equipment
PCT/CN2020/125144 WO2021159757A1 (en) 2020-09-09 2020-10-30 Method and apparatus for entity recognition in abbreviated data based on model, and computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010941630.4A CN112036172B (en) 2020-09-09 2020-09-09 Entity identification method and device based on abbreviated data of model and computer equipment

Publications (2)

Publication Number Publication Date
CN112036172A (en) 2020-12-04
CN112036172B (en) 2022-04-15

Family

ID=73585261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010941630.4A Active CN112036172B (en) 2020-09-09 2020-09-09 Entity identification method and device based on abbreviated data of model and computer equipment

Country Status (2)

Country Link
CN (1) CN112036172B (en)
WO (1) WO2021159757A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204974A (en) * 2021-05-14 2021-08-03 清华大学 Method, device and equipment for generating adversarial text and storage medium
CN114676319A (en) * 2022-03-01 2022-06-28 广州云趣信息科技有限公司 Method and device for acquiring name of merchant and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167354B (en) * 2023-04-19 2023-07-07 北京亚信数据有限公司 Medical term feature extraction model training and standardization method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035918A (en) * 2014-06-12 2014-09-10 华东师范大学 Chinese organization name abbreviation recognition system adopting context feature matching
CN108491385A (en) * 2018-03-16 2018-09-04 广西师范大学 Method and device for automatic ontology generation in the teaching domain based on dependency relations
CN109635285A (en) * 2018-11-26 2019-04-16 平安科技(深圳)有限公司 Enterprise full name and abbreviation matching method, apparatus, computer equipment and storage medium
CN111460175A (en) * 2020-04-08 2020-07-28 福州数据技术研究院有限公司 SNOMED-CT-based medical noun dictionary construction and expansion method
CN111581960A (en) * 2020-05-06 2020-08-25 上海海事大学 Method for obtaining semantic similarity of medical texts

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093478B (en) * 2007-07-25 2010-06-02 中国科学院计算技术研究所 Method and system for identifying Chinese full name based on Chinese shortened form of entity
JP5581861B2 (en) * 2010-07-12 2014-09-03 富士通株式会社 Retrieval device, method and program, and data parsing device having retrieval function
CN104881397B (en) * 2014-02-27 2018-01-30 富士通株式会社 Abbreviation extended method and device
CN108460014B (en) * 2018-02-07 2022-02-25 百度在线网络技术(北京)有限公司 Enterprise entity identification method and device, computer equipment and storage medium
CN108460016A (en) * 2018-02-09 2018-08-28 中云开源数据技术(上海)有限公司 Entity name analysis and recognition method
CN111444717A (en) * 2018-12-28 2020-07-24 天津幸福生命科技有限公司 Method and device for extracting medical entity information, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
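Wang Shuwei: "Research on Entity Recognition and Relation Extraction for Financial Text", China Master's Theses Full-text Database, Information Science and Technology (Monthly) *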

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204974A (en) * 2021-05-14 2021-08-03 清华大学 Method, device and equipment for generating adversarial text and storage medium
CN114676319A (en) * 2022-03-01 2022-06-28 广州云趣信息科技有限公司 Method and device for acquiring name of merchant and readable storage medium
CN114676319B (en) * 2022-03-01 2023-11-24 广州云趣信息科技有限公司 Method and device for acquiring merchant name and readable storage medium

Also Published As

Publication number Publication date
CN112036172B (en) 2022-04-15
WO2021159757A1 (en) 2021-08-19

Similar Documents

Publication Publication Date Title
CN112036172B (en) Entity identification method and device based on abbreviated data of model and computer equipment
WO2021135910A1 (en) Machine reading comprehension-based information extraction method and related device
CN111581229B (en) SQL statement generation method and device, computer equipment and storage medium
CN112612894B (en) Method and device for training intention recognition model, computer equipment and storage medium
CN110032739B (en) Method and system for extracting named entities of Chinese electronic medical record
CN111651992A (en) Named entity labeling method and device, computer equipment and storage medium
CN112347310A (en) Event processing information query method and device, computer equipment and storage medium
CN110688853B (en) Sequence labeling method and device, computer equipment and storage medium
CN109471793A Deep-learning-based defect localization method for automated web page testing
CN109977014A Blockchain-based code error recognition method, device, equipment and storage medium
CN111177345A (en) Intelligent question and answer method and device based on knowledge graph and computer equipment
CN111680634A (en) Document file processing method and device, computer equipment and storage medium
CN111159770A (en) Text data desensitization method, device, medium and electronic equipment
CN112016274B (en) Medical text structuring method, device, computer equipment and storage medium
CN112329865A (en) Data anomaly identification method and device based on self-encoder and computer equipment
CN112908473A (en) Model-based data processing method and device, computer equipment and storage medium
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN112652386A (en) Triage data processing method and device, computer equipment and storage medium
CN112463599A (en) Automatic testing method and device, computer equipment and storage medium
CN113986581A (en) Data aggregation processing method and device, computer equipment and storage medium
CN112637282B (en) Information pushing method and device, computer equipment and storage medium
CN111859862B (en) Text data labeling method and device, storage medium and electronic device
CN113343677A (en) Intention identification method and device, electronic equipment and storage medium
CN113177109A (en) Text weak labeling method, device, equipment and storage medium
CN115455922B (en) Form verification method, form verification device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant