CN112582073B - Medical information acquisition method, device, electronic equipment and medium - Google Patents

Medical information acquisition method, device, electronic equipment and medium

Info

Publication number
CN112582073B
Authority
CN
China
Prior art keywords
entity
node
medical information
sequence
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011642829.3A
Other languages
Chinese (zh)
Other versions
CN112582073A (en)
Inventor
王军涛
艾杰
梅昀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Xinkaixin Life Technology Co ltd
Original Assignee
Tianjin Xinkaixin Life Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Xinkaixin Life Technology Co ltd
Priority to CN202011642829.3A
Publication of CN112582073A
Application granted
Publication of CN112582073B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Abstract

The disclosure provides a medical information acquisition method, a medical information acquisition apparatus, an electronic device, and a computer-readable storage medium, and relates to the technical field of databases. The method comprises: performing an entity recognition operation on unstructured case information to generate an entity sequence; determining a pattern graph adapted to the type of the medical information; upon detecting that the entity sequence includes an entity matching the type, performing a matching operation between the entity sequence and the pattern graph to determine the entity relationships in the entity sequence; converting the unstructured case information into structured case information based on the entity relationships; and acquiring the medical information based on the structured case information. Because the pattern graph can be configured to define relatively complex knowledge rules, the medical information extracted from the structured case information is more accurate.

Description

Medical information acquisition method, device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a medical information obtaining method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
The electronic medical record system of hospitals in China consists of a huge medical data information base. A large amount of valuable case information for diagnosis, treatment, examination, medication, clinical adverse events, etc. is recorded in the patient's electronic medical record. However, since these pieces of information are basically unstructured data (which are readable by humans, have no fixed structure, and have great differences according to writing habits of different doctors), it is difficult to accurately extract some pieces of information that are of great interest to researchers using existing search techniques.
In the related art, knowledge can be extracted from such information using regular expressions or NLP (Natural Language Processing), but both approaches suffer from low accuracy.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a medical information acquisition method, apparatus, electronic device, and computer-readable storage medium, which overcome, at least to some extent, the problem in the related art that the accuracy of knowledge extraction is not high enough.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a medical information acquisition method including: performing an entity recognition operation on unstructured case information to generate an entity sequence; determining a pattern graph adapted to the type of the medical information; upon detecting that the entity sequence includes an entity matching the type, performing a matching operation between the entity sequence and the pattern graph to determine the entity relationships in the entity sequence; converting the unstructured case information into structured case information based on the entity relationships; and acquiring the medical information based on the structured case information.
In one embodiment, determining the pattern graph adapted to the type of the medical information includes: configuring a plurality of nodes in the pattern graph based on the type of the medical information; determining a node relationship between two of the nodes based on triple expressions; and constructing the pattern graph based on the nodes and the node relationships.
In one embodiment, constructing the pattern graph based on the nodes and the node relationships includes: determining a target node among the plurality of nodes based on entity attributes of the type of the medical information, so as to determine the node relationships between the other nodes and the target node; detecting whether the other nodes include a preceding negation word node; upon detecting that the other nodes include a preceding negation word node, determining the preceding negation word node as the start node of the pattern graph; and upon detecting that the other nodes do not include a preceding negation word node, determining the target node as the start node of the pattern graph.
In one embodiment, constructing the pattern graph based on the nodes and the node relationships further includes: determining a corresponding relational operation model based on the triple expressions; and determining, based on the relational operation model, a termination node corresponding to the start node and an edit attribute of the termination node.
In one embodiment, performing the matching operation between the entity sequence and the pattern graph upon detecting that the entity sequence includes an entity matching the type includes: upon detecting that the entity sequence includes an entity matching the type, determining a start node adapted to the entity; performing, from the start node, a matching operation against the pattern graph based on entity context information in the entity sequence and next-node information of the start node; and obtaining the entity relationships in the entity sequence based on the result of the matching operation.
In one embodiment, performing the matching operation against the pattern graph based on the entity context information in the entity sequence and the next-node information of the start node includes: inputting the entity context information and the next-node information into the relational operation model to output corresponding transfer conditions; determining the node relationships in the pattern graph that correspond to the transfer conditions; and constructing a transition matrix based on the node relationships, so as to determine the result of the matching operation based on the transition matrix.
In one embodiment, said obtaining said medical information based on said structured case information comprises: detecting a shortest path of the node relation in the transition matrix based on a preset scene selection model; determining the shortest path as a transfer path; extracting the medical information from the structured case based on the transfer path.
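To make the shortest-path step concrete, the following is a minimal sketch only. It assumes the transition matrix is a 0/1 adjacency matrix over node indices and that every transition has equal cost, so a breadth-first search finds a shortest path; the source does not specify the "preset scene selection model", and the name `shortest_transfer_path` and the example matrix are hypothetical.

```python
from collections import deque
from typing import List, Optional, Set


def shortest_transfer_path(matrix: List[List[int]], start: int,
                           terminals: Set[int]) -> Optional[List[int]]:
    """Breadth-first search for a shortest path from start to any terminal node.

    matrix[i][j] == 1 means a transition from node i to node j is allowed
    (an assumed encoding, not one stated in the source).
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in terminals:
            path = []
            while node is not None:      # walk back to the start node
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, allowed in enumerate(matrix[node]):
            if allowed and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None                          # no terminal node is reachable


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, where node 3 is terminal.
MATRIX = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
path = shortest_transfer_path(MATRIX, start=0, terminals={3})
```

Here the path through node 2 is chosen over the longer route through nodes 1 and 2. If transitions carried different costs, a weighted search would be needed instead.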
In one embodiment, performing the entity recognition operation on the unstructured case information to generate the entity sequence includes: performing a preprocessing operation on the unstructured case information to generate a preprocessed text; and performing the entity recognition operation on the preprocessed text to generate the entity sequence.
In one embodiment, performing the entity recognition operation on the preprocessed text to generate the entity sequence includes: performing the recognition operation on the preprocessed text based on a conditional random field model to generate the entity sequence.
In one embodiment, performing the recognition operation on the preprocessed text based on the conditional random field model to generate the entity sequence includes: inputting the preprocessed text into the conditional random field model to display entity recognition results; and correcting and outputting the entity recognition results to generate the entity sequence.
In one embodiment, the method further includes: optimizing a training text library of the conditional random field model based on the entity sequence; and updating the conditional random field model based on the optimized training text library.
According to another aspect of the present disclosure, there is provided a medical information acquisition apparatus including: an entity recognition module for performing an entity recognition operation on unstructured case information to generate an entity sequence; a determination module for determining a pattern graph adapted to the type of the medical information; a matching module for performing, upon detecting that the entity sequence includes an entity matching the type, a matching operation between the entity sequence and the pattern graph to determine the entity relationships in the entity sequence; a conversion module for converting the unstructured case information into structured case information based on the entity relationships; and an acquisition module for acquiring the medical information based on the structured case information.
According to still another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the medical information acquisition methods described above via execution of executable instructions.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the medical information acquisition method of any one of the above.
According to the medical information acquisition scheme provided by the embodiments of the disclosure, entity recognition is performed on a patient's unstructured case information to obtain an entity sequence, the entity sequence is matched against the pattern graph of the medical information to be extracted, the unstructured case information is converted into structured case information, and the required medical information is extracted from the structured case information. Because the case information is converted from unstructured to structured form, and because the pattern graph can be configured to define relatively complex knowledge rules, the medical information extracted from the structured case information is more accurate.
Furthermore, configuring the pattern graph based on the type of the medical information improves the generalization capability of the pattern graph. The unstructured-to-structured conversion can then be performed with the generalized pattern graph, so that different types of medical information can be extracted while the efficiency of information extraction is also improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
FIG. 1 shows a schematic structural diagram of a medical information acquisition system in an embodiment of the present disclosure;
FIG. 2 shows a flow chart of a medical information acquisition method in an embodiment of the present disclosure;
FIG. 3 shows a flow chart of another medical information acquisition method in an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a pattern graph in an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of another pattern graph in an embodiment of the present disclosure;
FIG. 6 shows a flow chart of yet another medical information acquisition method in an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of a medical information acquisition apparatus in an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
According to the scheme provided by the present application, entity recognition is performed on a patient's unstructured case information to obtain an entity sequence, the entity sequence is matched against the pattern graph of the medical information to be extracted, the unstructured case information is converted into structured case information, and the required medical information is extracted from the structured case information. Because the case information is converted from unstructured to structured form, and because the pattern graph can be configured to define relatively complex rules, the medical information extracted from the structured case information is more accurate.
For ease of understanding, the following first explains several terms referred to in this application.
Interpretability refers to explaining model behavior, with the aim of expressing the behavior of a model as causal relationships that humans can understand.
Generalization ability refers to the ability of a machine learning algorithm to adapt to new samples. The purpose of learning is to discover the rules underlying the data, so that a trained network also gives appropriate output for data outside the training set that follows the same rules; this capability is called generalization ability.
Entities generally refer to names of people, places, organizations, and the like; in the news domain, for example, entities are persons, places, and organizations. More broadly, an entity can be any term of interest to the user. In a product title, for instance, entities are brand words, item words, and item attribute words, and combining such words with sentiment polarity words gives a more detailed picture of a customer's purchase intent.
Entity recognition proceeds in two steps: the first identifies the entity boundaries, i.e., the start and end positions of each entity; the second identifies the entity type, i.e., one of the specific entity types mentioned above, such as person name, place name, or organization name.
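As an illustration of the two steps (boundary, then type), the sketch below decodes a BIO-tagged token sequence into typed entity spans. The BIO scheme, the tag names (`NEG`, `LEVEL`, `EVENT`, `SOURCE`), and the sample sentence are assumptions made for this example; the source does not state which labeling scheme is used.

```python
from typing import List, Tuple

Span = Tuple[str, str, int, int]  # (text, entity type, start index, exclusive end index)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Recover entity boundaries and types from BIO tags."""
    entities: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # the trailing "O" flushes the last span
        continues_span = (tag.startswith("I-") and start is not None
                          and tag[2:] == etype)
        if continues_span:
            continue
        if start is not None:               # boundary found: close the open span
            entities.append((" ".join(tokens[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):            # a new span opens with its type
            start, etype = i, tag[2:]
    return entities


tokens = ["no", "grade", "2", "nausea", "after", "chemotherapy"]
tags = ["B-NEG", "B-LEVEL", "I-LEVEL", "B-EVENT", "O", "B-SOURCE"]
entities = decode_bio(tokens, tags)
```

The resulting list is one possible representation of the "entity sequence" that later steps consume.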
The scheme provided by the embodiment of the application relates to technologies such as data format processing and entity identification, and is specifically explained by the following embodiment.
Fig. 1 shows a schematic structural diagram of a medical information acquisition system in an embodiment of the present disclosure, which includes a plurality of terminals 120 and a server cluster 140.
The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet computer, an e-book reader, smart glasses, an MP4 (MPEG-4 Part 14) player, a smart home device, an AR (Augmented Reality) device, or a VR (Virtual Reality) device, or it may be a personal computer such as a laptop or desktop computer.
An application that provides medical information acquisition may be installed on the terminal 120.
The terminals 120 are connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
The server cluster 140 is a single server, a plurality of servers, a virtualization platform, or a cloud computing service center. The server cluster 140 provides background services for the medical information acquisition application. Optionally, the server cluster 140 undertakes the primary computing work and the terminal 120 undertakes the secondary computing work; alternatively, the server cluster 140 undertakes the secondary computing work and the terminal 120 undertakes the primary computing work; alternatively, the terminal 120 and the server cluster 140 compute cooperatively using a distributed computing architecture.
In some alternative embodiments, the server cluster 140 is used to store medical information acquisition models and the like.
Optionally, the application clients installed on different terminals 120 are the same, or the clients installed on two terminals 120 are clients of the same type of application for different operating system platforms. Depending on the terminal platform, the specific form of the client may also differ; for example, it may be a mobile phone client, a PC client, or a World Wide Web (Web) client.
Those skilled in the art will appreciate that the number of terminals 120 described above may be greater or fewer. For example, the number of the terminals may be only one, or several tens or hundreds of the terminals, or more. The number of terminals and the type of the device are not limited in the embodiments of the present application.
Optionally, the system may further include a management device (not shown in fig. 1), and the management device is connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
Optionally, the wireless or wired networks described above use standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof. In some embodiments, data exchanged over the network is represented using techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
Hereinafter, each step in the medical information acquisition method according to the present exemplary embodiment will be described in more detail with reference to the drawings and examples.
Fig. 2 shows a flowchart of a medical information acquisition method in an embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be performed by any electronic device with computing processing capability, for example, the terminal 120 and/or the server cluster 140 in fig. 1. In the following description, the terminal 120 is taken as an execution subject for illustration.
As shown in fig. 2, the terminal 120 executes a medical information acquisition method including the steps of:
Step S202, an entity recognition operation is performed on the unstructured case information to generate an entity sequence.
The case information includes different types of medical information.
Unstructured case information refers to data that has an irregular or incomplete structure, has no predefined data model, and is inconvenient to represent in a two-dimensional logical table of a database. In the medical field, it can be understood as patient information entered directly into the system by a doctor.
The entity sequence is the entity recognition result formed by sequence labeling of the entities.
Step S204, a pattern graph adapted to the type of the medical information is determined.
The pattern graph is a generalizable graph comprising generalizable nodes and the relationships between those nodes.
In addition, different pattern graphs are configured for different medical information to support the processing of different types of medical information, thereby generalizing the pattern graphs.
The type of medical information includes, but is not limited to, diagnosis, treatment, examination, medication, clinical adverse events, and the like.
Step S206, when it is detected that the entity sequence includes an entity matching the type, a matching operation is performed between the entity sequence and the pattern graph to determine the entity relationships in the entity sequence.
Step S208, the unstructured case information is converted into structured case information based on the entity relationships.
The structured case information is data that is logically expressed and implemented as a two-dimensional table structure, and it is obtained by converting the unstructured case information.
Step S210, medical information is acquired based on the structured case information.
The medical information refers to specific diagnosis information, specific treatment information, specific examination information, specific medication information, specific clinical adverse events, and the like.
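The conversion to a two-dimensional table in step S208 can be pictured with the sketch below, which groups entity relationships by event and writes one row per event. The relation names (`level`, `source`, `negated_by`), the column layout, and the sample relations are hypothetical; the source does not define a table schema.

```python
import csv
import io
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (head entity text, relation name, tail entity text)
COLUMNS = ["event", "negated", "level", "source"]  # assumed columns, for illustration


def to_rows(relations: List[Relation]) -> List[Dict[str, str]]:
    """Group relations by head entity so that each event becomes one table row."""
    rows: Dict[str, Dict[str, str]] = {}
    for head, relation, tail in relations:
        row = rows.setdefault(
            head, {"event": head, "negated": "no", "level": "", "source": ""})
        if relation == "negated_by":
            row["negated"] = "yes"
        elif relation in ("level", "source"):
            row[relation] = tail
    return list(rows.values())


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


relations = [
    ("nausea", "level", "grade 2"),
    ("nausea", "source", "chemotherapy"),
    ("fever", "negated_by", "no"),
]
rows = to_rows(relations)
table = to_csv(rows)
```

Once the information is in this form, step S210 reduces to an ordinary query over the rows, for example selecting events that are not negated.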
In this embodiment, entity recognition is performed on the patient's unstructured case information to obtain an entity sequence, the entity sequence is matched against the pattern graph of the medical information to be extracted, the unstructured case information is converted into structured case information, and the required medical information is extracted from the structured case information. Because the case information is converted from unstructured to structured form, and because the pattern graph can be configured to define relatively complex rules, the medical information extracted from the structured case information is more accurate.
Furthermore, configuring the pattern graph based on the type of the medical information improves the generalization capability of the pattern graph. The unstructured-to-structured conversion can then be performed with the generalized pattern graph, so that different types of medical information can be extracted while the efficiency of information extraction is also improved.
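Steps S202 to S210 can be read as a small pipeline. The sketch below wires them together under strong simplifying assumptions: recognition is a lexicon lookup rather than a trained model, and a pattern graph is reduced to a mapping from relation name to a (head type, tail type) pair. All names (`toy_recognize`, `toy_match`, `acquire`, `GRAPHS`) and the lexicon are hypothetical.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]                   # (text, entity type)
Relation = Tuple[str, str, str]            # (head text, relation name, tail text)
PatternGraph = Dict[str, Tuple[str, str]]  # relation name -> (head type, tail type)

LEXICON = {"nausea": "adverse_event", "severe": "level", "chemotherapy": "source"}
GRAPHS: Dict[str, PatternGraph] = {
    "adverse_event": {
        "has_level": ("adverse_event", "level"),
        "has_source": ("adverse_event", "source"),
    },
}


def toy_recognize(text: str) -> List[Entity]:
    """S202 stand-in: a lexicon lookup instead of a trained recognizer."""
    return [(tok, LEXICON[tok]) for tok in text.lower().split() if tok in LEXICON]


def toy_match(entities: List[Entity], graph: PatternGraph) -> List[Relation]:
    """S206 stand-in: relate every entity pair whose types fit a graph edge."""
    relations = []
    for name, (head_type, tail_type) in graph.items():
        for head, h_type in entities:
            for tail, t_type in entities:
                if h_type == head_type and t_type == tail_type:
                    relations.append((head, name, tail))
    return relations


def acquire(text: str, info_type: str) -> List[Dict[str, str]]:
    entities = toy_recognize(text)                           # S202
    graph = GRAPHS[info_type]                                # S204
    if not any(etype == info_type for _, etype in entities):
        return []                                            # S206 precondition fails
    relations = toy_match(entities, graph)                   # S206
    table: Dict[str, Dict[str, str]] = {}                    # S208: one row per head
    for head, name, tail in relations:
        table.setdefault(head, {info_type: head})[name] = tail
    return list(table.values())                              # S210


records = acquire("Severe nausea after chemotherapy", "adverse_event")
```

The point of the sketch is the ordering of the steps and the early exit when no entity of the requested type is present; each stand-in would be replaced by the corresponding component described in the embodiments.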
In one embodiment, a specific implementation of step S204, determining the pattern graph adapted to the type of the medical information, includes:
configuring a plurality of nodes in the pattern graph based on the type of the medical information.
Each node is an entity node that describes an entity. The pattern graph comprises generalizable entity nodes and generalizable node relationships. A generalizable entity node can be defined by a set of triple expressions, each formed by combining type attributes of the entity with an entity operation model.
Determining the pattern graph adapted to the type of the medical information further includes: determining a node relationship between two nodes based on the triple expressions.
A triple in the RDF (Resource Description Framework) sense is a way of describing resources. Briefly, each description is a short statement consisting of a subject, a predicate, and an object.
Determining the pattern graph adapted to the type of the medical information further includes: constructing the pattern graph based on the nodes and the node relationships.
A generalizable node relationship is a set of triple expressions formed from the preceding and following entities, the text between them, and an entity relationship operation model; it is used to determine whether a specific relationship holds between two entities.
In this embodiment, the nodes of the pattern graph are configured based on the type attributes of the medical information and the entity operation model, which makes the nodes generalizable. The entity relationships are configured based on the context between the entities at the nodes and the corresponding operation model, which makes the node relationships generalizable. The pattern graph built from these entities and entity relationships is therefore itself generalizable, and it enables structured processing of the unstructured information.
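One way to picture nodes and node relationships defined by triple expressions is the data-structure sketch below. It assumes a node is a list of (attribute, operator, value) triples and an edge is a (head node, relation, tail node) triple; the operator names `eq` and `in`, the class names, and the sample attributes are hypothetical and not taken from the source.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Assumed operation models: how an entity attribute is compared with a value.
OPS = {"eq": operator.eq, "in": lambda value, options: value in options}


@dataclass
class Node:
    name: str
    exprs: List[Tuple[str, str, Any]]  # (attribute, operator, value) triples

    def matches(self, entity: Dict[str, Any]) -> bool:
        """An entity fits the node when every triple expression holds."""
        return all(attr in entity and OPS[op](entity[attr], value)
                   for attr, op, value in self.exprs)


@dataclass
class PatternGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, head: str, relation: str, tail: str) -> None:
        """Record a (head, relation, tail) triple between two existing nodes."""
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both nodes must be added before relating them")
        self.edges.append((head, relation, tail))


graph = PatternGraph()
graph.add_node(Node("event", [("type", "eq", "adverse_event")]))
graph.add_node(Node("level", [("type", "in", {"grade", "severity"})]))
graph.relate("event", "adjacent", "level")
```

Because a node is described by attribute conditions rather than by literal words, the same graph matches any entity with the right attributes, which is the sense in which the nodes are generalizable.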
In one embodiment, within the step of determining the pattern graph adapted to the type of the medical information, constructing the pattern graph based on the nodes and the node relationships includes:
determining a target node among the plurality of nodes based on the entity attributes of the type of the medical information, so as to determine the node relationships between the other nodes and the target node.
The target node can be understood as a node that can serve as a label for the medical information.
Detecting whether the other nodes include a preceding negation word node.
When the other nodes are detected to include a preceding negation word node, the preceding negation word node is determined as the start node of the pattern graph.
When the other nodes are detected not to include a preceding negation word node, the target node is determined as the start node of the pattern graph.
In one embodiment, within the step of determining the pattern graph adapted to the type of the medical information, constructing the pattern graph based on the nodes and the node relationships further includes: determining a corresponding relational operation model based on the triple expressions; and determining, based on the relational operation model, a termination node corresponding to the start node and the edit attribute of the termination node.
The edit attribute indicates whether the node content is retained or deleted.
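The start-node rule described above is small enough to sketch directly. The function name `choose_start_node` and the node name `preceding_negation` are hypothetical labels chosen for this example.

```python
from typing import Iterable


def choose_start_node(node_names: Iterable[str], target: str,
                      negation_node: str = "preceding_negation") -> str:
    """Pick the start node: the preceding negation node if present, else the target."""
    names = list(node_names)
    if target not in names:
        raise ValueError(f"target node {target!r} is not in the graph")
    others = [name for name in names if name != target]
    return negation_node if negation_node in others else target


start_with_negation = choose_start_node(
    ["preceding_negation", "event", "level"], "event")
start_without_negation = choose_start_node(["event", "level"], "event")
```

Starting from the negation word when one exists means that a negated mention such as "no nausea" is matched as a whole, rather than being picked up as a plain event.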
Referring to fig. 3 and 4, pattern graphs are created taking clinical adverse events as an example.
S denotes a start node, M denotes an intermediate node, T denotes a termination node, D indicates that the node entity can be deleted, and R indicates that the node entity needs to be retained.
As shown in the pattern graph of event 1 in fig. 3, event 1 is the start node, and the node relationships include at least a1, a2, and a3. In addition to the termination node corresponding to a3, the graph also includes a following negation word at a certain distance from event 1 and an adjacent following negation word, and may further include the level corresponding to event 1, the unit of that level, and the like.
Here, a1 may be a distance between entities, a2 may indicate that two entities are adjacent, a3 may be an event source, and so on.
As shown in the pattern graph of event 2 in fig. 4, event 2 is preceded by a negation word, i.e., a preceding negation word, which has node relationship b with event 2; the preceding negation word is the start node and event 2 is the termination node.
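The S/M/T roles and D/R edit attributes can be encoded as in the sketch below. The figures are not reproduced here, so the assignment of relations a1 to a3 to specific node pairs and of D or R to individual nodes is purely illustrative, as are the node names and the helper `retained`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Role(Enum):
    S = "start"
    M = "intermediate"
    T = "termination"


class Edit(Enum):
    D = "delete"
    R = "retain"


@dataclass(frozen=True)
class GraphNode:
    name: str
    role: Role
    edit: Edit


# Illustrative encoding of the event 1 graph; flags and edge targets are assumed.
EVENT_1_NODES = [
    GraphNode("event_1", Role.S, Edit.R),
    GraphNode("level", Role.M, Edit.R),
    GraphNode("level_unit", Role.M, Edit.D),
    GraphNode("source", Role.T, Edit.R),
]
EVENT_1_EDGES = [("event_1", "a1", "level"),
                 ("level", "a2", "level_unit"),
                 ("event_1", "a3", "source")]

# Event 2 graph: the preceding negation word is the start, event 2 the termination.
EVENT_2_NODES = [
    GraphNode("preceding_negation", Role.S, Edit.R),
    GraphNode("event_2", Role.T, Edit.R),
]
EVENT_2_EDGES = [("preceding_negation", "b", "event_2")]


def retained(matched: Dict[str, str], nodes: List[GraphNode]) -> Dict[str, str]:
    """Apply the edit attribute: keep only entities whose node is marked R."""
    keep = {node.name for node in nodes if node.edit is Edit.R}
    return {name: text for name, text in matched.items() if name in keep}


matched = {"event_1": "nausea", "level": "2", "level_unit": "grade",
           "source": "chemotherapy"}
kept = retained(matched, EVENT_1_NODES)
```

In this encoding the edit attribute is applied only after matching, so a node marked D can still take part in the match without appearing in the structured output.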
As shown in Fig. 5, in one embodiment, a specific implementation of step S206, in which a matching operation is performed on the entity sequence and the pattern graph to determine the entity relationships in the entity sequence when the entity sequence is detected to include an entity matching the type, includes:
Step S502, when it is detected that the entity sequence includes an entity matching the type, determining a starting node adapted to the entity.
When a preceding negative word is included, the preceding negative word node is determined as the starting node; when no preceding negative word is included, the node of the entity matching the type is determined as the starting node.
Starting from the starting node, a matching operation against the pattern graph is performed based on the entity context information in the entity sequence and the next-node information of the starting node, which specifically includes:
Step S504, inputting the entity context information and the next-node information into the relational operation model to output the corresponding transition condition.
Taking the starting node as an example, the transition condition indicates which nodes the starting node can establish a node relationship with.
Step S506, determining the node relationship corresponding to the transition condition in the pattern graph.
Step S508, constructing a transition matrix based on the node relationships, so as to determine a result of the matching operation based on the transition matrix.
The transition matrix can be understood as recording the node relationships that exist between designated nodes, such as the node relationships between the starting node and the other nodes.
Step S510, obtaining the entity relationships in the entity sequence based on the result of the matching operation.
In this embodiment, the inputs to the matching operation are the generalized pattern graph, the entity sequence, and the context information between entities. First, an entity that matches a starting node of the graph is searched for in the entity sequence and taken as the starting node. The transition condition is then calculated from the entity context information and the next entity node to obtain a transition matrix, from which a transition path is calculated. Matching continues along this path until a node carrying an end mark is reached, yielding all entity sequences that conform to the pattern.
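Steps S502 to S510 can be sketched as follows. This is a simplified illustration under stated assumptions: entities are `(label, text)` pairs, a "preceding" negative word is taken to be the entity immediately before the target, and the relational operation model is stood in for by plain Python predicates. The names `find_start` and `match`, and the labels `NEG` and `EVENT`, are hypothetical.

```python
def find_start(entities, target_label, negation_label="NEG"):
    """Return the index of the start entity, or None if no entity matches the type."""
    for i, (label, _text) in enumerate(entities):
        if label == target_label:
            # Prefer a negative word directly in front of the matched entity.
            if i > 0 and entities[i - 1][0] == negation_label:
                return i - 1
            return i
    return None


def match(entities, start_idx, start_node, transitions, terminal_nodes):
    """Walk the pattern from start_node, consuming entities left to right.

    transitions maps a node name to a list of (relation, next_node, condition),
    where condition(prev_entity, next_entity, gap) -> bool is an assumed
    stand-in for the relational operation model.
    Returns the matched (head_text, relation, tail_text) triplets,
    or [] when no termination node is reached.
    """
    relations = []
    node, idx = start_node, start_idx
    while node not in terminal_nodes:
        moved = False
        for j in range(idx + 1, len(entities)):
            for relation, nxt, condition in transitions.get(node, []):
                if condition(entities[idx], entities[j], j - idx):
                    relations.append((entities[idx][1], relation, entities[j][1]))
                    node, idx, moved = nxt, j, True
                    break
            if moved:
                break
        if not moved:
            return []
    return relations


# Pattern of Fig. 4: preceding negative word --b--> event (adjacent entities).
transitions = {
    "preceding_negative_word": [
        ("b", "event", lambda prev, cur, gap: cur[0] == "EVENT" and gap == 1),
    ],
}
entities = [("NEG", "no"), ("EVENT", "nausea")]
start = find_start(entities, "EVENT")
result = match(entities, start, "preceding_negative_word", transitions, {"event"})
```

Here `result` holds the single triplet linking the negative word to the event. A fuller implementation would record every satisfied transition in a matrix rather than following the first one greedily.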
In one embodiment, a specific implementation of step S210, acquiring medical information based on the structured case information, includes: detecting a shortest path of the node relationships in the transition matrix based on a preset scene selection model; determining the shortest path as the transition path; and extracting medical information from the structured case information based on the transition path.
In this embodiment, a result post-processor performs cleaning, selection, recombination, format conversion, and similar functions on the multiple entity relationships obtained. The scene selection model is customized for different output requirements, and the shortest path matching the selection rule is determined, so that the output requirements of different scenes can be met.
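One way to find a shortest path over a transition matrix is a breadth-first search. The sketch below assumes the matrix is a 0/1 adjacency matrix indexed by node number and that "shortest" means the fewest transitions; the disclosure does not specify either point, and the function name `shortest_path` is hypothetical.

```python
from collections import deque


def shortest_path(matrix, start, goal):
    """Return the node indices on a fewest-transitions path, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Rebuild the path by following parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, connected in enumerate(matrix[node]):
            if connected and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Rows are source nodes, columns are destination nodes.
transition_matrix = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
path = shortest_path(transition_matrix, 0, 3)
```

If the scene selection model assigned costs to transitions, a weighted search such as Dijkstra's algorithm would replace the breadth-first search.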
In one embodiment, a specific implementation of step S202, performing an entity recognition operation on the unstructured case information to generate an entity sequence, includes:
performing a preprocessing operation on the unstructured case information to generate a preprocessed text; and
performing an entity recognition operation on the preprocessed text to generate an entity sequence.
In this embodiment, the unstructured case information is preprocessed by a pre-text processor, which resolves some long-tail problems in the original unstructured case information and thereby improves the accuracy and recall of entity recognition.
In one embodiment, performing the entity recognition operation on the preprocessed text to generate the entity sequence includes: performing a recognition operation on the preprocessed text based on a conditional random field model to generate the entity sequence.
Entity recognition is implemented by combining a conditional random field model with feature templates. A feature template may be a manually defined binary feature function used to mine the internal and contextual composition of named entities. For a given position in a sentence, the positions from which features are extracted form a window, i.e. the context positions. Different feature templates may also be combined to form new feature templates. The advantage of the conditional random field model is that, when labeling a position, it can use the labels already assigned, and the optimal label sequence is obtained by Viterbi decoding. When features are extracted at each position in a sentence, a feature that satisfies its condition takes the value 1 and one that does not takes the value 0. The features are input into the conditional random field model, label transitions are modeled in the training stage, and each position of a test sentence is labeled in the inference stage to output the entity sequence.
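The window-based binary features and Viterbi decoding described above can be sketched in plain Python. This is a minimal illustration, not the trained model of the disclosure: the weights, labels, and function names are invented for the example, and a real system would learn the weights during training.

```python
def window_features(tokens, i, window=1):
    """Binary features for position i: every returned key has value 1, absent keys are 0."""
    feats = set()
    for off in range(-window, window + 1):
        j = i + off
        tok = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.add(f"w[{off}]={tok}")
    return feats


def viterbi(tokens, labels, emit_w, trans_w, window=1):
    """Return the highest-scoring label sequence.

    emit_w:  {(feature, label): weight}
    trans_w: {(previous_label, label): weight}
    """
    def emit(i, y):
        return sum(emit_w.get((f, y), 0.0) for f in window_features(tokens, i, window))

    if not tokens:
        return []
    score = [{y: emit(0, y) for y in labels}]
    back = []
    for i in range(1, len(tokens)):
        row, ptr = {}, {}
        for y in labels:
            # Best previous label for reaching label y at position i.
            best = max(labels, key=lambda p: score[-1][p] + trans_w.get((p, y), 0.0))
            row[y] = score[-1][best] + trans_w.get((best, y), 0.0) + emit(i, y)
            ptr[y] = best
        score.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    y = max(labels, key=lambda label: score[-1][label])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


# Toy weights, assumed for illustration only.
labels = ["NEG", "EVENT", "O"]
emit_w = {("w[0]=no", "NEG"): 2.0, ("w[0]=nausea", "EVENT"): 2.0}
trans_w = {("NEG", "EVENT"): 1.0}
tags = viterbi(["no", "nausea"], labels, emit_w, trans_w)
```

With these toy weights the decoder labels "no" as `NEG` and "nausea" as `EVENT`.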
In one embodiment, performing a recognition operation on the preprocessed text based on the conditional random field model to generate the entity sequence includes: inputting the preprocessed text into the conditional random field model to output an entity recognition result; and correcting the entity recognition result and outputting it to generate the entity sequence.
In this embodiment, after the entity sequence is obtained, a user-defined processor performs operations such as re-segmentation and correction on the entity sequence. This addresses the long-tail problem of relationship patterns between entities and makes the corrected entity sequence better suited to subsequent pattern matching.
In one embodiment, the method further includes: optimizing a training text library of the conditional random field model based on the entity sequence; and updating the conditional random field model based on the optimized training text library.
In this embodiment, while a project is being executed, the conditional random field model and the recognition of entity relationships can be adjusted according to the recognition results, and performing these operations iteratively achieves rapid iteration.
As shown in FIG. 6, a medical information acquisition method according to the present disclosure includes:
Step S602, a pre-processing operation is performed on the unstructured case information based on the pre-text processor, and a pre-processed text is generated.
Step S604, performing recognition operation on the preprocessed text based on the conditional random field model generated by the entity dictionary to obtain an entity recognition result.
Step S606, the entity sequence processor processes the entity identification result to generate an entity sequence.
Step S608, performing a matching operation on the entity sequence and the pattern diagram to determine an entity relationship in the entity sequence.
Step S610, generating structured case information based on the entity sequence and the entity relationship.
Step S612, extracting medical information from the structured case based on a preset scene selection model.
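The flow of steps S602 to S612 can be sketched as a chain of stages. The sketch below is a toy illustration: each stage is passed in as a callable, and the stand-in stages (a whitespace normalizer, a dictionary lookup in place of the conditional random field model, and an adjacency rule in place of pattern graph matching) are assumptions chosen only to make the example runnable.

```python
def acquire_medical_information(raw_text, preprocess, recognize, postprocess, match, select):
    text = preprocess(raw_text)          # S602: pre-text processor
    recognized = recognize(text)         # S604: entity recognition
    entities = postprocess(recognized)   # S606: entity sequence processor
    relations = match(entities)          # S608: pattern graph matching
    structured = {"entities": entities, "relations": relations}  # S610
    return select(structured)            # S612: scene selection


# Toy stand-ins for each stage (hypothetical).
lexicon = {"no": "NEG", "nausea": "EVENT"}


def normalize(s):
    return " ".join(s.split()).lower()


def lookup(text):
    return [(lexicon[t], t) for t in text.split() if t in lexicon]


def adjacent_negation(entities):
    return [
        (a[1], "negates", b[1])
        for a, b in zip(entities, entities[1:])
        if a[0] == "NEG" and b[0] == "EVENT"
    ]


def negated_events(structured):
    return {"negated_events": [tail for _, _, tail in structured["relations"]]}


info = acquire_medical_information(
    "  No   nausea reported ", normalize, lookup, list, adjacent_negation, negated_events
)
```

Passing the stages as parameters mirrors the way the disclosure lets the pre-text processor, entity sequence processor, and scene selection model be customized independently.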
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Accordingly, various aspects of the present invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
The medical information acquisition apparatus 700 according to this embodiment of the present invention is described below with reference to fig. 7. The medical information acquisition apparatus 700 shown in fig. 7 is merely an example, and should not impose any limitation on the function and scope of use of the embodiment of the present invention.
The medical information acquisition apparatus 700 is represented in the form of a hardware module. The components of the medical information acquisition device 700 may include, but are not limited to: an entity identification module 702, configured to perform entity identification operation on the unstructured case information, and generate an entity sequence; a determining module 704 for determining a pattern map adapted to the type of medical information; a matching module 706, configured to, when it is detected that the entity sequence includes an entity matching the type, perform a matching operation on the entity sequence and the pattern diagram to determine an entity relationship in the entity sequence; a conversion module 708 to convert the unstructured case information to structured case information based on the entity relationship; an obtaining module 710 for obtaining medical information based on the structured case information.
In one embodiment, the determining module 704 is further configured to: configuring a plurality of nodes in a pattern graph based on the type of medical information; determining a node relationship between two nodes based on the triplet expression; and constructing a pattern graph based on the node and node relation.
In one embodiment, the determining module 704 is further configured to: determining a target node of the plurality of nodes based on the entity attributes of the type of the medical information to determine a node relationship between the other nodes and the target node; detecting whether other nodes comprise preposed negative word nodes or not; when other nodes are detected to comprise preposed negative word nodes, determining the preposed negative word nodes as initial nodes in the pattern graph; and determining the target node as the starting node in the pattern graph when detecting that the other nodes do not comprise the preposed negative word node.
In one embodiment, the determining module 704 is further configured to: determining a corresponding relational operation model according to the triplet expression; and determining a termination node corresponding to the starting node and the editing attribute of the termination node based on the relational operation model.
In one embodiment, the matching module 706 is further configured to: when detecting that the entity sequence comprises entities matched with the types, determining a starting node adapted to the entities; performing matching operation based on the entity context information in the entity sequence and the next node information of the starting node and the pattern diagram from the starting node; and obtaining entity relations in the entity sequence based on the result of the matching operation.
In one embodiment, the matching module 706 is further configured to: inputting the entity context information and the next node information into a relational operation model to output a corresponding transfer condition; determining a node relation corresponding to the transfer condition in the pattern graph; a transition matrix is constructed based on the node relationships to determine a result of the matching operation based on the transition matrix.
In one embodiment, the obtaining module 710 is further configured to: detecting a shortest path of a node relation in a transition matrix based on a preset scene selection model; determining the shortest path as a transfer path; medical information is extracted from the structured case based on the transfer path.
In one embodiment, the entity identification module 702 is further configured to: performing preprocessing operation on unstructured case information to generate a preprocessed text; and executing entity recognition operation on the preprocessed text to generate an entity sequence.
In one embodiment, the entity identification module 702 is further configured to: a recognition operation is performed on the preprocessed text based on the conditional random field model to generate a sequence of entities.
In one embodiment, the entity identification module 702 is further configured to: inputting the preprocessed text into a conditional random field model to output an entity recognition result; and correcting the entity recognition result and outputting it to generate an entity sequence.
In one embodiment, an optimization module 712 is further included for optimizing a training text library of the conditional random field model based on the entity sequence; and updating the conditional random field model based on the optimized training text library.
An electronic device 800 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The storage unit stores program code, which may be executed by the processing unit 810 to cause the processing unit 810 to perform the steps according to various exemplary embodiments of the present invention as described in the "exemplary methods" section above in this specification. For example, the processing unit 810 may perform steps S202 and S204 to S210 as shown in fig. 2, and other steps defined in the medical information acquisition method of the present disclosure.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 860 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 850. As shown, the network adapter 850 communicates with the other modules of the electronic device 800 over a bus 830. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above-mentioned "exemplary methods" section of the present description, when the program product is run on the terminal device.
The program product for implementing the above method may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this respect, and in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external computing devices (e.g., through the internet using an internet service provider).
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (14)

1. A medical information acquisition method characterized by comprising:
executing entity identification operation on the unstructured case information to generate an entity sequence;
determining a pattern graph adapted to the type of the medical information, wherein nodes in the pattern graph are configured based on the type of the medical information and an entity operation model;
when detecting that the entity sequence comprises the entity matched with the type, performing matching operation on the entity sequence and the pattern diagram to determine an entity relation in the entity sequence;
converting the unstructured case information into structured case information based on the entity relationship;
acquiring the medical information based on the structured case information.
2. The medical information acquisition method according to claim 1, wherein the determining a pattern graph adapted to the type of the medical information includes:
configuring a plurality of nodes in the pattern graph based on the type of the medical information;
determining a node relationship between two of the nodes based on the triplet expression;
and constructing the pattern graph based on the node and the node relation.
3. The medical information acquisition method according to claim 2, wherein the constructing the pattern graph based on the node and the node relationship includes:
determining a target node of the plurality of nodes based on entity attributes of the type of medical information to determine the node relationship between other nodes and the target node;
detecting whether the other nodes comprise preposed negative word nodes or not;
determining the preposed negative word node as a starting node in the pattern graph when the other nodes are detected to comprise the preposed negative word node;
determining the target node as a starting node in the pattern graph upon detecting that the preceding negative word node is not included in the other nodes.
4. The medical information acquisition method according to claim 3, wherein the constructing the pattern graph based on the node and the node relationship further comprises:
determining a corresponding relational operation model according to the triplet expression;
and determining a termination node corresponding to the starting node and the editing attribute of the termination node based on the relational operation model.
5. The medical information acquisition method according to claim 4, wherein the performing a matching operation on the entity sequence and the pattern diagram to determine an entity relationship in the entity sequence when detecting that an entity matching the type is included in the entity sequence comprises:
when the entity sequence is detected to comprise the entity matched with the type, determining a starting node matched with the entity;
performing, from the starting node, a matching operation based on entity context information in the entity sequence and next node information of the starting node with the pattern diagram;
and obtaining the entity relation in the entity sequence based on the result of the matching operation.
6. The medical information acquisition method according to claim 5, wherein the performing matching operation based on the entity context information in the entity sequence and the next node information of the start node with the pattern diagram comprises:
inputting the entity context information and the next node information into the relational operation model to output corresponding transfer conditions;
determining the node relationship corresponding to the transition condition in the pattern graph;
constructing a transition matrix based on the node relationships to determine a result of the matching operation based on the transition matrix.
7. The medical information acquisition method according to claim 6, wherein the acquiring the medical information based on the structured case information includes:
detecting a shortest path of the node relation in the transition matrix based on a preset scene selection model;
determining the shortest path as a transfer path;
extracting the medical information from the structured case information based on the transfer path.
8. The medical information acquisition method according to any one of claims 1 to 7, wherein the performing an entity identification operation on the unstructured case information, the generating an entity sequence includes:
performing preprocessing operation on the unstructured case information to generate a preprocessed text;
and executing entity recognition operation on the preprocessed text to generate an entity sequence.
9. The medical information acquisition method according to claim 8, wherein the performing an entity recognition operation on the preprocessed text, generating an entity sequence comprises:
performing the entity recognition operation on the preprocessed text based on a conditional random field model to generate the sequence of entities.
10. The medical information acquisition method of claim 9, wherein the performing the entity recognition operation on the preprocessed text based on the conditional random field model to generate the entity sequence comprises:
inputting the preprocessed text into the conditional random field model to output an entity recognition result;
and correcting and outputting the entity identification result to generate the entity sequence.
11. The medical information acquisition method according to claim 9, characterized by further comprising:
optimizing a training text library of the conditional random field model based on the entity sequence;
and updating the conditional random field model based on the optimized training text library.
12. A medical information acquisition apparatus characterized by comprising:
the entity identification module is used for executing entity identification operation on the unstructured case information to generate an entity sequence;
the determining module is used for determining a pattern graph adapted to the type of the medical information, and nodes in the pattern graph are configured based on the type of the medical information and an entity operation model;
the matching module is used for performing matching operation on the entity sequence and the pattern diagram to determine an entity relationship in the entity sequence when detecting that the entity sequence comprises the entity matched with the type;
a conversion module to convert the unstructured case information into structured case information based on the entity relationship;
an obtaining module for obtaining the medical information based on the structured case information.
13. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the medical information acquisition method of any one of claims 1-11 via execution of the executable instructions.
14. A computer-readable storage medium on which a computer program is stored, the computer program being characterized by implementing the medical information acquisition method according to any one of claims 1 to 11 when executed by a processor.
CN202011642829.3A 2020-12-30 2020-12-30 Medical information acquisition method, device, electronic equipment and medium Active CN112582073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642829.3A CN112582073B (en) 2020-12-30 2020-12-30 Medical information acquisition method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011642829.3A CN112582073B (en) 2020-12-30 2020-12-30 Medical information acquisition method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN112582073A CN112582073A (en) 2021-03-30
CN112582073B true CN112582073B (en) 2022-10-11

Family

ID=75145524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642829.3A Active CN112582073B (en) 2020-12-30 2020-12-30 Medical information acquisition method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112582073B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094477B (en) * 2021-06-09 2021-08-31 腾讯科技(深圳)有限公司 Data structuring method and device, computer equipment and storage medium
WO2024042348A1 (en) * 2022-08-24 2024-02-29 Evyd科技有限公司 English medical text structurization method, apparatus, medium and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609163A (en) * 2017-09-15 2018-01-19 南京深数信息科技有限公司 Generation method, storage medium and the server of medical knowledge collection of illustrative plates
CN109299472A (en) * 2018-11-09 2019-02-01 天津开心生活科技有限公司 Text data processing method, device, electronic equipment and computer-readable medium
CN110019839A (en) * 2018-01-03 2019-07-16 中国科学院计算技术研究所 Medical knowledge map construction method and system based on neural network and remote supervisory
CN111048167A (en) * 2019-10-31 2020-04-21 中电药明数据科技(成都)有限公司 Hierarchical case structuring method and system
CN111177393A (en) * 2020-01-02 2020-05-19 广东博智林机器人有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN112015900A (en) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 Medical attribute knowledge graph construction method, device, equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609163A (en) * 2017-09-15 2018-01-19 南京深数信息科技有限公司 Method for generating a medical knowledge graph, storage medium and server
CN110019839A (en) * 2018-01-03 2019-07-16 中国科学院计算技术研究所 Medical knowledge graph construction method and system based on neural network and distant supervision
CN109299472A (en) * 2018-11-09 2019-02-01 天津开心生活科技有限公司 Text data processing method, device, electronic equipment and computer-readable medium
CN111048167A (en) * 2019-10-31 2020-04-21 中电药明数据科技(成都)有限公司 Hierarchical case structuring method and system
CN111177393A (en) * 2020-01-02 2020-05-19 广东博智林机器人有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN112015900A (en) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 Medical attribute knowledge graph construction method, device, equipment and medium

Also Published As

Publication number Publication date
CN112582073A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
US20230206087A1 (en) Techniques for building a knowledge graph in limited knowledge domains
CN106910501B (en) Text entity extraction method and device
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
CN108628830B (en) Semantic recognition method and device
CN107861954B (en) Information output method and device based on artificial intelligence
CN111159220B (en) Method and apparatus for outputting structured query statement
CN112582073B (en) Medical information acquisition method, device, electronic equipment and medium
US20220414463A1 (en) Automated troubleshooter
WO2021259205A1 (en) Text sequence generation method, apparatus and device, and medium
CN113656590B (en) Industry map construction method and device, electronic equipment and storage medium
CN110275962A (en) Method and apparatus for outputting information
CN116303537A (en) Data query method and device, electronic equipment and storage medium
JP7309811B2 (en) Data annotation method, apparatus, electronics and storage medium
CN114357195A (en) Knowledge graph-based question-answer pair generation method, device, equipment and medium
CN114064923A (en) Data processing method and device, electronic equipment and storage medium
CN115878818B (en) Geographic knowledge graph construction method, device, terminal and storage medium
WO2022073341A1 (en) Disease entity matching method and apparatus based on voice semantics, and computer device
CN112699642B (en) Index extraction method and device for complex medical texts, medium and electronic equipment
CN115269862A (en) Electric power question-answering and visualization system based on knowledge graph
CN108932225A (en) Method and system for converting natural language requirements into semantic modeling language statements
CN114519071A (en) Generation method, matching method, system, device and medium of rule matching model
CN114391151A (en) Enhanced natural language generation platform
CN109739970A (en) Information processing method and device and electronic equipment
CN111104118A (en) AIML-based natural language instruction execution method and system
WO2023050218A1 (en) Instruction file generation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant