CN114613515A

CN114613515A - Medical entity relationship extraction method and device, storage medium and electronic equipment

Info

Publication number: CN114613515A
Application number: CN202210315705.7A
Authority: CN
Inventors: 郝东林
Original assignee: Yidu Cloud Beijing Technology Co Ltd
Current assignee: Yidu Cloud Beijing Technology Co Ltd
Priority date: 2022-03-28
Filing date: 2022-03-28
Publication date: 2022-06-10
Anticipated expiration: 2042-03-28
Also published as: CN114613515B

Abstract

The disclosure belongs to the technical field of natural language processing and relates to a method and a device for extracting relationships between medical entities, a storage medium and electronic equipment. The method comprises the following steps: acquiring a position logic relationship between standard medical entities, and configuring a finite state machine according to the position logic relationship; acquiring a text to be recognized, and performing word segmentation on the text to be recognized to obtain text word segments; and determining the word segmentation labels corresponding to the text word segments according to the standard medical entities, and recognizing the medical relationships in the text to be recognized according to the word segmentation labels and the finite state machine. The method and device require no manual labeling, which saves labor and time costs and makes upgrading and iteration more convenient. They allow the text word segments and the medical relationships to be logically traced, meeting traceability requirements; they improve the intelligence, automation and accuracy of medical relationship recognition; and they meet the extraction requirements of various medical texts, broadening the application scenarios of medical entity relationship extraction.

Description

Medical entity relationship extraction method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for extracting relationships between medical entities, a computer-readable storage medium, and an electronic device.

Background

Medical data is patient-centered data generated while a doctor diagnoses and treats a patient. It has application value in medical research, public health, personal health, remote consultation, medical diagnosis and other areas. Automatically mining knowledge from medical data means automatically identifying, in electronic medical record text, the named entities closely related to the patient's health and the relationships among them.

At present, entity recognition and relationship recognition in medical documents rely mainly on manual labeling or on deep learning technologies such as pre-trained models. However, medical data typically consists of unstructured or semi-structured text, which is difficult to use directly, and massive volumes of medical text clearly cannot all be processed manually. Deep learning technologies such as pre-trained models also require large amounts of manually labeled data, which is costly, and pre-trained models have poor interpretability, so entities and entity relationships that go unrecognized cannot be fixed promptly and effectively.

In view of this, there is a need in the art for a new method and apparatus for extracting relationships between medical entities.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure aims to provide a method for extracting relationships between medical entities, an apparatus for extracting relationships between medical entities, a computer-readable storage medium, and an electronic device, so as to overcome, at least to some extent, the high recognition cost and insufficient accuracy caused by the limitations of the related art.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to an aspect of the present disclosure, there is provided a method for extracting relationships between medical entities, the method comprising: acquiring a position logic relationship between standard medical entities, and configuring a finite state machine according to the position logic relationship;

acquiring a text to be recognized, and performing word segmentation on the text to be recognized to obtain text word segmentation;

and determining word segmentation labels corresponding to the text word segmentation according to the standard medical entities, and recognizing the medical relationship of the text to be recognized according to the word segmentation labels and the finite state machine.

In an exemplary embodiment of the present disclosure, the position logic relationship includes an attribute logic relationship and an entity logic relationship,

the acquiring of the position logic relationship between the standard medical entities comprises:

acquiring a medical word list, and performing word segmentation on the medical word list to obtain the standard medical entities;

acquiring attributes of the standard medical entities, and counting the attribute logic relationship of the attributes;

and counting the entity logic relationship of the standard medical entities.
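One way to picture "counting" a position logic relationship is to tally which state label directly follows which across labeled examples. The sketch below is a minimal illustration under assumptions the source does not state: the standard medical entities are already tagged with hypothetical state labels (APPROACH, DESCRIPTION, ANATOMY, CORE, ORIENTATION), and the position logic relationship is approximated by adjacent-label transition counts.

```python
from collections import Counter
from typing import Iterable, Sequence, Set, Tuple


def count_transitions(sequences: Iterable[Sequence[str]]) -> Counter:
    """Count how often label b directly follows label a across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def allowed_transitions(counts: Counter, min_count: int = 1) -> Set[Tuple[str, str]]:
    """Keep only the transitions observed at least min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}


# Hypothetical label sequences for phrases such as "laparoscopic | broad | uterus | resection"
examples = [
    ["APPROACH", "DESCRIPTION", "ANATOMY", "CORE"],
    ["APPROACH", "ANATOMY", "CORE"],
    ["ORIENTATION", "ANATOMY", "CORE"],
]
counts = count_transitions(examples)
print(counts[("ANATOMY", "CORE")])  # 3
print(sorted(allowed_transitions(counts)))
```

Transitions that survive the `min_count` filter could then serve as the allowed moves of a finite state machine; a higher threshold would discard rare, possibly noisy orderings.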

In an exemplary embodiment of the present disclosure, acquiring the medical word list includes:

acquiring the medical word list by using a statistical model.

In an exemplary embodiment of the present disclosure, determining, according to the standard medical entity, the word segmentation label corresponding to the text word segmentation includes:

acquiring an entity label corresponding to the standard medical entity, and performing similarity calculation on the standard medical entity and the text word segmentation to obtain a semantic similarity;

obtaining a similarity threshold corresponding to the semantic similarity, and comparing the semantic similarity with the similarity threshold to obtain a comparison result;

and if the comparison result shows that the semantic similarity is greater than the similarity threshold, determining the entity label as the word segmentation label corresponding to the text word segmentation.

In an exemplary embodiment of the present disclosure, performing similarity calculation on the standard medical entity and the text word segmentation to obtain the semantic similarity includes:

performing similarity calculation on the standard medical entity and the text word segmentation by using a language representation model to obtain the semantic similarity.
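The labeling rule above (assign the entity label when the semantic similarity exceeds a threshold) can be sketched with cosine similarity. The vectors are hypothetical stand-ins for embeddings that a language representation model such as BERT would produce, the 0.8 threshold is an assumed value, and keeping the most similar entity when several pass the threshold is an added tie-breaking assumption.

```python
import math
from typing import List, Optional, Tuple

StandardEntity = Tuple[str, str, List[float]]  # (standard entity, entity label, embedding)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label_token(token_vec: List[float], standard_entities: List[StandardEntity],
                threshold: float = 0.8) -> Optional[str]:
    """Return the entity label of the most similar standard entity whose
    similarity is greater than the threshold, or None if none qualifies."""
    best_label: Optional[str] = None
    best_sim = threshold
    for _name, label, vec in standard_entities:
        sim = cosine(token_vec, vec)
        if sim > best_sim:  # strictly greater, as in the comparison step
            best_label, best_sim = label, sim
    return best_label


# Hypothetical 3-d embeddings; a real system would obtain them from a language representation model.
standard_entities = [
    ("resection", "CORE", [1.0, 0.0, 0.1]),
    ("uterus", "ANATOMY", [0.0, 1.0, 0.0]),
]
print(label_token([0.9, 0.1, 0.1], standard_entities))  # CORE
print(label_token([0.3, 0.3, 0.9], standard_entities))  # None
```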

In an exemplary embodiment of the present disclosure, the finite state machine includes an entity finite state machine and a relationship finite state machine,

the configuring the finite state machine according to the position logic relationship comprises:

configuring the entity finite state machine according to the attribute logic relationship;

and configuring the relationship finite state machine according to the entity logic relationship.

In an exemplary embodiment of the present disclosure, recognizing the medical relationship of the text to be recognized according to the word segmentation labels and the finite state machine includes:

recognizing a text medical entity in the text to be recognized according to the word segmentation labels and the entity finite state machine;

and recognizing the medical relationship of the text to be recognized according to the text medical entity and the relationship finite state machine.

Alternatively, in an exemplary embodiment of the present disclosure, recognizing the medical relationship of the text to be recognized according to the word segmentation labels and the finite state machine includes:

recognizing a text medical entity and a non-medical entity in the text to be recognized according to the word segmentation labels and the entity finite state machine;

and recognizing the medical relationship of the text to be recognized according to the text medical entity, the non-medical entity and the relationship finite state machine.
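The two-stage recognition described above (an entity finite state machine over word segmentation labels, then a relationship finite state machine over the recognized entities) can be sketched as follows, using the "first CT ... lacunar infarction" example discussed later in the description. Everything concrete here is an assumption: the label names (ORDINAL, EXAM, DISEASE_MOD, DISEASE), greedy left-to-right matching, and the reduction of the relationship machine to adjacent entity pairs. Unlabeled tokens are simply skipped, so the sketch follows the first variant rather than the one that also uses non-medical entities.

```python
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, Optional[str]]  # (word, segmentation label); None = unlabeled
Entity = Tuple[str, str]           # (entity text, entity type)


def _emit(run: List[str], state: Optional[str], accepts: Dict[str, str], out: List[Entity]) -> None:
    """Record the current run as an entity if it stopped in an accepting state."""
    if run and state in accepts:
        out.append((" ".join(run), accepts[state]))


def run_entity_fsm(tokens: List[Token], transitions: Set[Tuple[str, str]],
                   starts: Set[str], accepts: Dict[str, str]) -> List[Entity]:
    """Entity FSM: a run of labeled tokens becomes an entity when it begins in a
    start state, every label-to-label step is an allowed transition, and it ends
    in an accepting state (accepts maps accepting label -> entity type)."""
    entities: List[Entity] = []
    run: List[str] = []
    state: Optional[str] = None
    for word, label in tokens:
        if state is not None and label is not None and (state, label) in transitions:
            run.append(word)  # allowed transition: extend the current entity
            state = label
            continue
        _emit(run, state, accepts, entities)  # no allowed transition: close the run
        if label in starts:
            run, state = [word], label  # this token may start a new entity
        else:
            run, state = [], None
    _emit(run, state, accepts, entities)
    return entities


def run_relation_fsm(entities: List[Entity],
                     relations: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str]]:
    """Relationship FSM reduced to adjacent entity pairs: an allowed (type, type) step emits a relation."""
    found = []
    for (a_text, a_type), (b_text, b_type) in zip(entities, entities[1:]):
        rel = relations.get((a_type, b_type))
        if rel is not None:
            found.append((a_text, rel, b_text))
    return found


tokens = [("first", "ORDINAL"), ("CT", "EXAM"), ("examination", None),
          ("shows", None), ("lacunar", "DISEASE_MOD"), ("infarction", "DISEASE")]
entities = run_entity_fsm(
    tokens,
    transitions={("ORDINAL", "EXAM"), ("DISEASE_MOD", "DISEASE")},
    starts={"ORDINAL", "EXAM", "DISEASE_MOD", "DISEASE"},
    accepts={"EXAM": "EXAM", "DISEASE": "DISEASE"},
)
relations = run_relation_fsm(entities, {("EXAM", "DISEASE"): "confirms"})
print(entities)   # [('first CT', 'EXAM'), ('lacunar infarction', 'DISEASE')]
print(relations)  # [('first CT', 'confirms', 'lacunar infarction')]
```

Because the transitions and relation rules are plain data, an unrecognized case could be fixed by adding an entry rather than retraining a model, which is the maintainability argument the disclosure makes for finite state machines.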

According to an aspect of the present disclosure, there is provided an apparatus for extracting relationships between medical entities, the apparatus including: a relationship configuration module configured to acquire a position logic relationship between standard medical entities and configure a finite state machine according to the position logic relationship;

a text word segmentation module configured to acquire a text to be recognized and perform word segmentation on the text to be recognized to obtain text word segmentation;

and a relationship recognition module configured to determine word segmentation labels corresponding to the text word segmentation according to the standard medical entities and recognize the medical relationship of the text to be recognized according to the word segmentation labels and the finite state machine.

According to an aspect of the present disclosure, there is provided an electronic device including a processor and a memory, wherein the memory stores computer-readable instructions which, when executed by the processor, implement the method for extracting relationships between medical entities of any of the above exemplary embodiments.

According to an aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for extracting relationships between medical entities of any of the above exemplary embodiments.

As can be seen from the foregoing technical solutions, the method for extracting relationships between medical entities, the apparatus for extracting relationships between medical entities, the computer storage medium, and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:

In the method and apparatus provided by the exemplary embodiments of the disclosure, the position logic relationship is managed through the finite state machine, which improves its maintainability. No manual labeling is needed, which saves labor and time costs and makes upgrading and iteration more convenient. Because the word segmentation labels of the text word segmentation are determined from the standard medical entities, the text word segmentation and the medical relationships can be logically traced, meeting traceability requirements. Furthermore, a fast and effective way of recognizing medical relationships is provided, which improves the intelligence, automation and accuracy of medical relationship recognition, meets the extraction requirements of medical texts such as various medical documents, and broadens the application scenarios of medical entity relationship extraction.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

Fig. 1 schematically illustrates a flow chart of a method for extracting relationships between medical entities in an exemplary embodiment of the present disclosure;

Fig. 2 schematically illustrates a flow chart of a method for obtaining a position logic relationship in an exemplary embodiment of the present disclosure;

Fig. 3 schematically illustrates a flow chart of a method for configuring a finite state machine in an exemplary embodiment of the present disclosure;

Fig. 4 schematically illustrates a flow chart of a method for determining a word segmentation label for a text word segmentation in an exemplary embodiment of the present disclosure;

Fig. 5 schematically illustrates a flow chart of a method for determining a medical relationship of a text to be recognized in an exemplary embodiment of the present disclosure;

Fig. 6 schematically illustrates a flow chart of another method for determining a medical relationship of a text to be recognized in an exemplary embodiment of the present disclosure;

Fig. 7 schematically illustrates a structural diagram of an apparatus for extracting relationships between medical entities in an exemplary embodiment of the present disclosure;

Fig. 8 schematically illustrates an electronic device for implementing the method for extracting relationships between medical entities in an exemplary embodiment of the present disclosure;

Fig. 9 schematically illustrates a computer-readable storage medium for implementing the method for extracting relationships between medical entities in an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/parts/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first" and "second", etc. are used merely as labels, and are not limiting on the number of their objects.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.

Medical data is patient-centered data generated while a doctor diagnoses and treats a patient. It may include patient profile data, electronic medical record data, medical image data, medical management data, economic data, medical device and instrument data, and the like. The application value of medical data lies in medical research, public health, personal health, remote consultation and medical diagnosis.

For example, the electronic medical record of a patient may state that "the first CT (Computed Tomography) examination shows lacunar infarction". Here "first CT" is an examination and "lacunar infarction" is a disease; both are called named entities in studies of information extraction from electronic medical records. The relationship between the two entities is that the "first CT" confirmed the occurrence of "lacunar infarction", or equivalently, that "lacunar infarction" was confirmed by the "first CT" examination.

Automatically mining knowledge from medical data means automatically identifying, in electronic medical record text, the named entities closely related to the patient's health and the relationships among them.

However, medical data typically consists of unstructured or semi-structured text, which is difficult to use directly, and massive volumes of medical text clearly cannot be processed manually.

Attempts have also been made to identify entities and relationships with deep learning techniques such as pre-trained models, but pre-trained models also require large amounts of manually labeled data, which is costly.

Pre-training is the process of training a neural network model on a large data set so that it learns general features of the data. Its purpose is to provide good initial model parameters for subsequent training of the neural network model on a specific data set.

In addition, a pre-trained model learns the relative positional relationships among words from large amounts of text, and these relationships are encoded as probabilistic transition parameters. Such parameters are not understandable to humans, so the interpretability of pre-trained models is poor.

Consequently, when an entity or relationship is not recognized, i.e., when an error case (bad case) occurs, a person cannot manually adjust the parameters of the pre-trained model, so the pre-trained model cannot fix recognition errors quickly and effectively.

It is therefore important to discover knowledge in massive texts automatically, so that people can understand and use medical data at low cost.

In order to solve the problems in the related art, the present disclosure provides a method for extracting relationships between medical entities. Fig. 1 shows a flow chart of the method; as shown in Fig. 1, the method comprises at least the following steps:

Step S110: acquiring a position logic relationship between the standard medical entities, and configuring a finite state machine according to the position logic relationship.

Step S120: acquiring a text to be recognized, and performing word segmentation on the text to be recognized to obtain text word segmentation.

Step S130: determining word segmentation labels corresponding to the text word segmentation according to the standard medical entities, and recognizing the medical relationship of the text to be recognized according to the word segmentation labels and the finite state machine.

In the exemplary embodiment of the disclosure, the position logic relationship is managed through the finite state machine, which improves its maintainability. No manual labeling is needed, which saves labor and time costs and makes upgrading and iteration more convenient. Because the word segmentation labels of the text word segmentation are determined from the standard medical entities, the text word segmentation and the medical relationships can be logically traced, meeting traceability requirements. Furthermore, a fast and effective way of recognizing medical relationships is provided, which improves the intelligence, automation and accuracy of medical relationship recognition, meets the extraction requirements of medical texts such as various medical documents, and broadens the application scenarios of medical entity relationship extraction.

The following describes each step of the relationship extraction method for medical entities in detail.

In step S110, the position logic relationship between the standard medical entities is obtained, and the finite state machine is configured according to the position logic relationship.

In an exemplary embodiment of the present disclosure, the standard medical entities may be derived from a medical word list, from which the position logic relationship between the standard medical entities is then obtained.

In an alternative embodiment, the position logic relationship includes an attribute logic relationship and an entity logic relationship. Fig. 2 shows a flow chart of a method for obtaining the position logic relationship; as shown in Fig. 2, the method includes at least the following steps: in step S210, a medical word list is acquired, and word segmentation is performed on the medical word list to obtain the standard medical entities.

The medical word list may be obtained by manually classifying and organizing medical documents, abstracts of medical documents, or other industry-standard medical texts, which is not limited in this exemplary embodiment.

In addition, the medical word list can be obtained through various statistical models.

In an alternative embodiment, the medical vocabulary is obtained using a statistical model.

For example, the statistical model may be an SVM (Support Vector Machine) model or a BERT (Bidirectional Encoder Representations from Transformers) model.

The SVM was first proposed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963, and its current (soft-margin) version was proposed by Corinna Cortes and Vapnik in 1993 and published in 1995. Before the rise of deep learning (around 2012), SVMs were widely regarded as among the most successful and best-performing machine learning algorithms.

The support vector machine is a binary classification model that maps the feature vectors of instances to points in space, such as solid points and hollow points belonging to two different classes. The goal of the SVM is to draw a line (in general, a hyperplane) that separates the two classes of points as well as possible, i.e., with the largest margin, so that new points can later be classified correctly. SVMs are well suited to small and medium-sized, nonlinear and high-dimensional classification problems.
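The idea of drawing the "best" separating line can be sketched without any library. This is a minimal sketch of a linear soft-margin SVM trained with the Pegasos stochastic subgradient method, a training procedure chosen here for brevity (the source does not say how the SVM is trained); the points, the regularization strength `lam` and the epoch count are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(points: Sequence[Tuple[float, float]], labels: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Soft-margin linear SVM trained with Pegasos-style stochastic subgradient
    descent on  lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    Labels must be +1 / -1. A constant 1.0 feature is appended so the bias is
    learned together with (and, for simplicity, regularized like) the weights."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: push the point out of the margin
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Tuple[float, float]) -> int:
    """Return +1 or -1 depending on which side of the learned line x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical "solid" (+1) and "hollow" (-1) points
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print([predict(w, p) for p in points])  # [1, 1, 1, -1, -1, -1]
```

Nonlinear problems, which the paragraph above also mentions, are usually handled with kernels; that extension is not shown here.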

The SVM is a discriminative method; in machine learning it is a supervised learning model commonly used for pattern recognition, classification and regression analysis. For example, each sub-classifier in an initial tree-shaped classifier can be trained with the SVM algorithm and training data to obtain the node parameters of that sub-classifier.

A state-of-the-art pre-trained BERT model can be used to convert natural language into a machine-usable form, i.e., to obtain a general semantic representation.

BERT is a language representation model proposed by Google AI in October 2018 and pre-trained in an unsupervised manner on large amounts of unlabeled text. BERT performed remarkably on SQuAD 1.1, a top-level machine reading comprehension benchmark, surpassing human performance on both of its metrics, and achieved state-of-the-art (SOTA, meaning the best performance currently reported) results on 11 different NLP (Natural Language Processing) tasks. These included raising the GLUE benchmark score (GLUE evaluates general-purpose NLP models, and its leaderboard reflects the state of NLP to some extent) to 80.4% (a 7.6% absolute improvement) and MultiNLI accuracy to 86.7% (a 5.6% absolute improvement). BERT thus became a milestone model in the history of NLP.

The pre-trained BERT model is a general semantic representation model with strong transfer capability. It uses the Transformer as its basic network building block, takes the Masked Language Model and Next Sentence Prediction as training objectives, and obtains general semantic representations through pre-training.

Compared with traditional word vectors such as Word2Vec (a family of models for generating word vectors) and GloVe (Global Vectors for Word Representation, a word representation tool based on global word co-occurrence statistics), BERT follows the now widely used idea of contextual word representation: the same word is represented differently in different contexts. Intuitively, this matches the reality of human language, in which the meaning of the same word can differ from one situation to another.

Specifically, the BERT model uses multiple Transformer layers to learn from text bidirectionally. The Transformer reads the entire input sequence at once rather than word by word, so it can learn the contextual relationships between words more accurately. A bidirectionally trained language model captures context more deeply than a unidirectional one and can therefore extract text features more accurately.

Whether the medical word list is obtained with a machine learning classifier such as an SVM model or with BERT semantic vectors, the underlying model is statistical, so the medical word list can be derived from statistical probabilities.

For example, suppose the sequence "word A, word B, aspirin, word D, word E" appears, and the sequence "word A, word B, word C, word D, word E" also appears. In both, words A and B precede the middle word and words D and E follow it, so word C occupies the same position relative to words A, B, D and E as aspirin does. Word C is therefore considered equivalent to aspirin.
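The aspirin example can be reproduced with a simple context-matching rule. This is a deliberately simplified sketch, assuming that two words are equivalent when they share at least one identical left-and-right context window of two words; a statistical model as described above would score context overlap probabilistically instead of requiring exact matches. The words "A" to "G" and "fever" are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Context = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (words to the left, words to the right)


def context_signatures(sentences: List[List[str]], window: int = 2) -> Dict[str, Set[Context]]:
    """Map every word to the set of (left, right) context windows it occurs in."""
    sigs: Dict[str, Set[Context]] = defaultdict(set)
    for words in sentences:
        for i, word in enumerate(words):
            left = tuple(words[max(0, i - window):i])
            right = tuple(words[i + 1:i + 1 + window])
            sigs[word].add((left, right))
    return sigs


def equivalent_terms(seed: str, sentences: List[List[str]], window: int = 2) -> Set[str]:
    """Words that occur in at least one context identical to a context of the seed term."""
    sigs = context_signatures(sentences, window)
    seed_contexts = sigs.get(seed, set())
    return {w for w, ctx in sigs.items() if w != seed and ctx & seed_contexts}


sentences = [
    ["A", "B", "aspirin", "D", "E"],
    ["A", "B", "C", "D", "E"],
    ["A", "B", "fever", "F", "G"],  # different right context: not equivalent
]
print(equivalent_terms("aspirin", sentences))  # {'C'}
```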

Accordingly, the medical word list may include entries such as: surgical approach state: "laparoscope", "laparoscope-assisted", "under laparoscope"; surgical description state: "broad" and its variant forms; anatomical site state: "uterus", "adnexa", "pelvic lymph node"; core word state: "resection" and its variant forms, "clearance"; orientation words: "single", "double", "upper", "lower", "left", "right", etc.

Specifically, words that fill the same state may be treated as equivalent: in the surgical approach state, "laparoscope", "laparoscope-assisted" and "under laparoscope"; in the surgical description state, "broad" and its variant forms; in the anatomical site state, "uterus", "adnexa" and "pelvic lymph node"; in the core word state, "resection", its variant forms and "clearance"; and among the orientation words, "single", "double", "upper", "lower", "left" and "right", all of which express an orientation.

In this exemplary embodiment, the medical word list can be acquired through various statistical models, which is faster and more accurate, provides an intelligent and automated way of acquiring the medical word list, and saves labor and time costs. Moreover, when an unrecognized medical term is found, the resulting recall failure can be fixed quickly by adding the term to the medical word list, which provides a data basis for upgrading and iterating the method for extracting relationships between medical entities.

After the medical word list is obtained, word segmentation can be performed on it to obtain the corresponding standard medical entities.

Word segmentation of the medical word list can be rule-based or statistics-based.

Rule-based word segmentation pre-constructs a dictionary and segments words by matching against it. The dictionary may be an N-gram dictionary: the medical word list is matched against the pre-constructed N-gram dictionary according to a segmentation strategy to obtain the possible segmentations of each word, and the final standard medical entities are then computed with a shortest-path method based on the N-gram dictionary.
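Dictionary matching plus a shortest-path search can be sketched as dynamic programming over character positions: every dictionary word is an edge, and the path with the fewest edges wins. This is a minimal unweighted sketch with a hypothetical five-entry dictionary; a variant based on N-gram statistics would instead weight each edge by a negative log-probability. Single characters are always allowed so that every text has at least one path.

```python
from typing import List, Set


def shortest_path_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text into the fewest pieces, using dictionary words or single characters."""
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i] = fewest words covering text[:i]
    back = [0] * (n + 1)             # back[i] = start index of the last word on that path
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in dictionary or end - start == 1:  # single-character fallback
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start
    words: List[str] = []
    end = n
    while end > 0:  # walk the back-pointers from the end of the text
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical dictionary: 腹腔镜 laparoscope, 子宫 uterus, 切除 resection;
# 腹腔 (abdominal cavity) and 镜子 (mirror) are ambiguous distractors.
dictionary = {"腹腔", "腹腔镜", "镜子", "子宫", "切除"}
print(shortest_path_segment("腹腔镜子宫切除", dictionary))  # ['腹腔镜', '子宫', '切除']
```

The shortest path avoids the four-piece reading 腹腔 / 镜子 / 宫 / 切除 because the three-piece reading uses fewer words.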

The N-gram model is a language model commonly used in large-vocabulary continuous speech recognition, where it converts phoneme sequences into words, which may be Chinese or English words. Generally, the acoustic model gives the probabilities of phoneme sequences, and the language model rescores them using statistics of the probabilities between words, so that word sequences that better conform to language usage are output.
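The "probabilities between words" used by an N-gram model can be illustrated with a bigram model estimated by maximum likelihood, P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}). This is a minimal sketch on a hypothetical three-sentence corpus, with sentence boundary markers `<s>` and `</s>` and no smoothing, so any unseen bigram gives probability 0.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count history words and adjacent word pairs, with sentence boundary markers."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        histories.update(padded[:-1])  # every word that precedes another word
        bigrams.update(zip(padded, padded[1:]))
    return histories, bigrams


def sentence_prob(sentence: List[str], histories: Counter, bigrams: Counter) -> float:
    """Product of maximum-likelihood bigram probabilities (no smoothing)."""
    padded = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for a, b in zip(padded, padded[1:]):
        if histories[a] == 0:
            return 0.0
        p *= bigrams[(a, b)] / histories[a]
    return p


corpus = [
    ["laparoscopic", "uterus", "resection"],
    ["laparoscopic", "lymph-node", "dissection"],
    ["open", "uterus", "resection"],
]
histories, bigrams = train_bigram(corpus)
print(sentence_prob(["laparoscopic", "uterus", "resection"], histories, bigrams))  # 1/3 (2/3 * 1/2 * 1 * 1)
print(sentence_prob(["uterus", "laparoscopic", "resection"], histories, bigrams))  # 0.0
```

A word order that matches the corpus statistics scores higher than a scrambled one, which is how the language model favors sequences that better conform to language usage.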

Statistics-based word segmentation uses a classifier trained on a labeled corpus. The classifier can be built with machine learning or deep learning algorithms, such as Hidden Markov Models (HMM), Conditional Random Fields (CRF) or deep neural networks.
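HMM and CRF segmenters commonly treat segmentation as character tagging, often with the B/M/E/S scheme (Begin, Middle, End of a multi-character word, or Single-character word); the source does not name a tag set, so this scheme is an assumption. The sketch below shows only the final step, turning a predicted tag sequence into words; the tags in the example are hand-written, not real model output.

```python
from typing import List


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Decode per-character B/M/E/S tags into a list of words."""
    if len(chars) != len(tags):
        raise ValueError("one tag per character is required")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # tolerate an unterminated word by flushing it
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag {tag!r}")
    if current:
        words.append(current)
    return words


# 行 (perform) / 子宫 (uterus) / 切除 (resection)
print(bmes_to_words("行子宫切除", ["S", "B", "E", "B", "E"]))  # ['行', '子宫', '切除']
```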

In addition, several different word segmentation tools can be called directly as segmentation models, each of a different type, to segment the medical word list and obtain the standard medical entities. Here the medical word list may also be called a coarse corpus. Each tool performs an initial segmentation of the medical word list, yielding a set of initial word segments per tool, and these are merged into one initial word segment set. Because this set contains many initial word segments, their number can be reduced by voting on each initial word segment, with the votes counted per segmentation tool. For example, if all three tools segment a given initial word segment out of the original text, it is kept as a word segmentation string. If the segmentation results of the three tools for the initial word segment all disagree, it is discarded directly; if two of the tools agree on it and the third does not, the initial word segment can still be determined to be a standard medical entity.
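The voting rule (keep an initial word segment when at least two of the three tools produce it, discard it otherwise) can be sketched as follows. The tool names and their outputs are hypothetical; they are not real output of LTP, THULAC or NLPIR.

```python
from collections import Counter
from typing import Dict, List, Set


def vote(segmentations: Dict[str, List[str]], min_votes: int = 2) -> Set[str]:
    """Keep initial word segments produced by at least min_votes different tools."""
    votes: Counter = Counter()
    for _tool, words in segmentations.items():
        votes.update(set(words))  # each tool votes at most once per word segment
    return {w for w, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three tools on 腹腔镜子宫切除 (laparoscopic hysterectomy)
segmentations = {
    "tool_a": ["腹腔镜", "子宫", "切除"],
    "tool_b": ["腹腔镜", "子宫", "切除"],
    "tool_c": ["腹腔", "镜子", "宫", "切除"],
}
print(sorted(vote(segmentations)))
```

With three tools, `min_votes=2` covers both the unanimous case and the two-out-of-three case while discarding segments only one tool produced.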

When multiple word segmentation tools are used, voting provides a preliminary reduction in the number of initial segments in the initial segment set and helps ensure the validity of the resulting segmentation strings.
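The voting rule above amounts to a majority vote across tools. The sketch below assumes each tool's output is already available as a list of tokens that concatenate back to the input text, and treats two tools as agreeing on a segment when they produce the same character span. The function names and the example outputs are hypothetical stand-ins for real LTP, THULAC, or NLPIR calls.

```python
from collections import Counter
from typing import Iterable


def to_spans(tokens: Iterable[str]) -> set[tuple[int, int]]:
    """Convert one tool's token list into (start, end) character spans."""
    spans, pos = set(), 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def vote_segments(text: str, tool_outputs: list[list[str]], min_votes: int = 2) -> list[str]:
    """Keep spans produced by at least `min_votes` tools (a majority of three by default).

    Spans proposed by only one tool (all tools disagree) are discarded.
    """
    votes = Counter()
    for tokens in tool_outputs:
        votes.update(to_spans(tokens))
    kept = sorted(span for span, n in votes.items() if n >= min_votes)
    return [text[s:e] for s, e in kept]
```

For instance, if two tools split "中青年型糖尿病" (young and middle-aged diabetes) as ["中青年型", "糖尿病"] and the third as ["中青年", "型糖尿病"], only "中青年型" and "糖尿病" survive the vote. Comparing spans rather than bare strings keeps two identical strings at different positions from being conflated.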

The word segmentation tools may be open-source Chinese word segmentation tools, such as jieba, the HanLP segmenter, the Language Technology Platform (LTP), THULAC (THU Lexical Analyzer for Chinese) developed by the Natural Language Processing and Social Humanities Computing Laboratory of Tsinghua University, the Stanford Word Segmenter, and NLPIR, a shared platform for natural language processing and information retrieval. Each of these tools has its own segmentation characteristics. For example, three tools, LTP, THULAC, and NLPIR, can be called to segment the medical word list.

The LTP word segmentation module is trained and decoded with a CRF model, which models the target (label) sequence conditioned on the observed sequence; its training data are the January to June 1998 People's Daily corpus. The segmentation interface is initialized with the path of the model file and is then called to segment the medical word list, yielding at least two standard medical entities.

The THULAC toolkit comes with a built-in model trained on its original corpus, although obtaining that corpus requires authorization. THULAC's Chinese word segmentation and part-of-speech tagging are capable and highly accurate. After the interface parameters are configured, its segmentation call can be invoked on the medical word list to obtain at least two standard medical entities.

NLPIR is a full-pipeline analysis tool that can also be used to segment the medical word list. During segmentation, a pre-built dictionary is loaded and used for an initial segmentation. Ambiguous segments are then resolved with probabilistic statistics and simple rules, and unknown words are identified using word frequency information; after disambiguation and unknown-word identification, at least two standard medical entities are obtained.

For example, the standard medical entity may include a drug entity, a disease entity, and the like, and the exemplary embodiment is not particularly limited thereto.

In step S220, the attributes of the standard medical entities are obtained, and the attribute logical relationships among these attributes are derived statistically.

After the standard medical entity is obtained through word segmentation processing, the attributes of the standard medical entity can be obtained.

When the standard medical entity is a disease entity, its attributes may include a core word and a descriptor. They may further include orientation words and other attribute words; this exemplary embodiment is not particularly limited thereto.

For example, for the disease entity "young and middle-aged diabetes", "young and middle-aged" is the descriptor and "diabetes" is the core word.

Because the descriptor precedes the core word, the attribute logical relationship between them may be that the descriptor of a disease entity precedes its core word.

The attribute logical relationships may be summarized manually as rules, or derived from the relative positions of attributes obtained with N-gram statistics or similar methods; this exemplary embodiment is not particularly limited thereto.
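One simple way to derive such ordering rules statistically is to count label bigrams (N = 2) over a set of labeled examples. The sketch below is an illustration under stated assumptions rather than the method of this disclosure: the label names, the `min_count` cut-off, and the requirement that an order be more frequent than its reverse are all hypothetical choices.

```python
from collections import Counter


def count_label_bigrams(sequences: list[list[str]]) -> Counter:
    """Count how often label a is immediately followed by label b."""
    counts = Counter()
    for labels in sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def derive_order_rules(sequences: list[list[str]], min_count: int = 2) -> set[tuple[str, str]]:
    """Keep (a, b) as an 'a precedes b' rule if it is frequent enough
    and occurs more often than the reverse order (b, a)."""
    counts = count_label_bigrams(sequences)
    return {
        (a, b)
        for (a, b), n in counts.items()
        if n >= min_count and n > counts.get((b, a), 0)
    }
```

With two examples of descriptor followed by core word and one of the reverse, only ("descriptor", "core_word") is kept as a rule at the default cut-off.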

In step S230, the entity logical relationships of the standard medical entities are derived statistically.

In addition to the attribute logical relationships, the entity logical relationships among standard medical entities can also be obtained statistically.

When a drug entity and a disease entity are mentioned in the same clause, the drug entity precedes the disease entity; the entity logical relationship between them is therefore that the drug entity precedes the disease entity.

Likewise, the entity logical relationships may be summarized manually as rules or derived with N-gram statistics or similar methods; this exemplary embodiment is not particularly limited thereto.

In this exemplary embodiment, the attribute logical relationships and entity logical relationships are obtained by statistics over the standard medical entities and their attributes. This approach is simple and accurate, conforms to usage conventions in the medical field, and closely fits the application scenario.

After the two positional logical relationships, the attribute logical relationship and the entity logical relationship, are obtained statistically, finite state machines can be configured according to them.

A finite-state machine (FSM), also called a finite-state automaton or simply a state machine, is a mathematical model of a finite number of states and of the transitions and actions between them.

A finite state machine is a tool for modeling the behavior of an object; it mainly describes the sequence of states an object passes through during its lifecycle and how the object responds to external events.

Finite state machines are widely used in computer science for modeling application behavior, hardware circuitry design, software engineering, compilers, network protocols, and computing and language research.

A state machine can be summarized by four elements: the present state, the condition, the action, and the next state. The present state and the condition are the cause; the action and the next state are the effect.

The present state is the state the machine is currently in.

A condition, also called an "event", is what triggers an action or a state transition once it is met.

An action is what is executed after the condition is met. After the action is executed, the machine may transition to a new state or remain in its original state. An action is optional: when the condition is met, the machine may transition directly to a new state without executing any action.

The next state is the new state the machine moves to after the condition is met. "Next state" is defined relative to "present state": once the next state is activated, it becomes the new present state.
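The four elements can be captured in a transition table keyed by (present state, condition) whose entries hold an optional action and the next state. This is a minimal illustrative sketch; the door states and conditions are hypothetical and unrelated to the medical rules discussed later.

```python
from typing import Callable, Dict, Optional, Tuple

# (present state, condition) -> (optional action, next state)
Table = Dict[Tuple[str, str], Tuple[Optional[Callable[[], None]], str]]


def step(table: Table, present: str, condition: str) -> str:
    """Fire one transition and return the state the machine ends up in.

    If no entry matches, the machine keeps its present state.
    """
    entry = table.get((present, condition))
    if entry is None:
        return present
    action, next_state = entry
    if action is not None:  # actions are optional
        action()
    return next_state  # the next state becomes the new present state


log = []
door = {
    ("closed", "push"): (lambda: log.append("opening"), "open"),
    ("open", "release"): (None, "closed"),  # a transition without an action
}
state = step(door, "closed", "push")  # runs the action, moves to "open"
```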

Note that finite state machines have clear logic and strong expressive power, and they lend themselves to encapsulating events. The more states and events an object has, the better suited it is to being implemented as a finite state machine.

Programming a finite state machine roughly proceeds as follows: the FSM is described by a state transition diagram; each node in the diagram corresponds to a state object; and each state object either transitions to another state on an input character or remains unchanged.

The process of moving from one state to another on an input character can be called a mapping. In computer programming, a mapping can be represented in two ways: algorithmically, as "executable code", or with a mapping table, as "passive data".

An FSM that implements the mapping as executable code handles different input characters mainly with conditional branches, such as if or switch blocks. Because these branches are largely uniform in form, the same information can instead be stored in a table (passive data), which avoids many function calls in the program.

Each state uses a transition table to represent its mapping, with the input character serving as the table index. Because the transitions between states can all be described by the transition table, there is no need to define each state as its own class; unnecessary inheritance and virtual functions are avoided, and a single State type suffices. The transition table thus replaces virtual functions and simplifies the program design.
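The two representations can be compared on a toy machine that accepts strings made of an "a", any number of "b", and a final "c". The states S0, S1, ACCEPT, and REJECT are hypothetical; the point is only that the branch-based function and the table lookup encode the same mapping.

```python
def next_state_code(state: str, ch: str) -> str:
    """'Executable code' mapping: transitions written as conditional branches."""
    if state == "S0":
        return "S1" if ch == "a" else "REJECT"
    if state == "S1":
        if ch == "b":
            return "S1"
        if ch == "c":
            return "ACCEPT"
    return "REJECT"


# 'Passive data' mapping: the same transitions stored in a table indexed by input character.
TABLE = {
    "S0": {"a": "S1"},
    "S1": {"b": "S1", "c": "ACCEPT"},
}


def next_state_table(state: str, ch: str) -> str:
    # Any (state, character) pair missing from the table goes to REJECT.
    return TABLE.get(state, {}).get(ch, "REJECT")


def run(next_state, text: str, start: str = "S0") -> str:
    """Drive either mapping over the input and return the final state."""
    state = start
    for ch in text:
        state = next_state(state, ch)
    return state
```

Adding a state to the table version only means adding a row of data, whereas the branch version needs new code.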

It is preferable for the FSM class to be able to represent any FSM. To achieve this, the specific configuration performed in the constructor should be generalized into a mechanism for building an arbitrary FSM.

The transition table should be passed to the FSM constructor as a parameter rather than built into the class, so that the table size does not have to be hard-coded in the FSM. Consequently, the memory for the transition table must be allocated dynamically in the constructor.

Alternatively, the transition table need not be provided by the main program. A subclass SpecificFSM derived from FSM can define the specific transition table and pass it to the base class FSM through SpecificFSM's initialization list, so that the main program can simply operate on SpecificFSM.
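The text describes this design in C++ terms (constructors, initialization lists, virtual functions). A rough Python analogue is sketched below, with the caveat that Python has no initialization lists: the subclass hands its table to the base class through `super().__init__`, which plays the same role. The class names follow the text; the table contents, the `feed` method, and the `REJECT` default are hypothetical.

```python
class FSM:
    """Generic table-driven FSM: the transition table is a constructor parameter."""

    def __init__(self, table: dict[str, dict[str, str]], start: str, default: str = "REJECT"):
        # Copy the table at construction time; its size is not hard-coded in the class.
        self._table = {state: dict(row) for state, row in table.items()}
        self.state = start
        self._default = default

    def feed(self, symbol: str) -> str:
        """Look up the next state; unknown (state, symbol) pairs go to the default."""
        self.state = self._table.get(self.state, {}).get(symbol, self._default)
        return self.state


class SpecificFSM(FSM):
    """Subclass that owns one specific table and passes it to the base class."""

    TABLE = {"S0": {"a": "S1"}, "S1": {"b": "S1", "c": "ACCEPT"}}

    def __init__(self):
        super().__init__(self.TABLE, start="S0")
```

The main program then only needs `SpecificFSM()`, while `FSM` itself stays free of any particular machine.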

In an alternative embodiment, the finite state machines include an entity finite state machine and a relationship finite state machine. Fig. 3 is a flow chart of a method for configuring the finite state machines. As shown in fig. 3, the method includes at least the following steps. In step S310, the entity finite state machine is configured according to the attribute logical relationship.

Because the attribute logical relationship between descriptors and core words may be that the descriptor of a disease entity precedes its core word, the entity finite state machine can be configured with rules such as "young and middle-aged" → "diabetes" and "senile" → "diabetes".

The entity finite state machine may further include rules such as "surgical approach state → surgical description state", "surgical description state → orientation word state → anatomical region state", "anatomical region state → core word state", and "core word state → anatomical region state"; this exemplary embodiment is not particularly limited thereto.
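Chain rules of this kind can be compiled into a table of allowed label-to-label transitions, against which a label sequence can then be checked. In this sketch the label names, the requirement that a surgical entity start at the approach state and end at a core word, and the example label sequence are assumptions made for illustration.

```python
def build_transitions(chains: list[list[str]]) -> dict[str, set[str]]:
    """Turn chain rules such as ['descriptor', 'orientation', 'anatomy']
    into the set of allowed next labels for each label."""
    allowed: dict[str, set[str]] = {}
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            allowed.setdefault(a, set()).add(b)
    return allowed


def accepts(allowed: dict[str, set[str]], labels: list[str], start: str, final: str) -> bool:
    """True if the sequence begins at `start`, ends at `final`,
    and every adjacent pair is an allowed transition."""
    if not labels or labels[0] != start or labels[-1] != final:
        return False
    return all(b in allowed.get(a, set()) for a, b in zip(labels, labels[1:]))


# Hypothetical encoding of the surgical chain rules listed above.
SURGERY_CHAINS = [
    ["approach", "descriptor"],
    ["descriptor", "orientation", "anatomy"],
    ["anatomy", "core_word"],
    ["core_word", "anatomy"],
]
```

Because "core_word → anatomy" loops back, one surgical entity can cover several anatomy/procedure pairs, as in a hysterectomy followed by a lymph node dissection.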

In step S320, the relationship finite state machine is configured according to the entity logical relationship.

Because the entity logical relationship between drug and disease entities is that the drug entity precedes the disease entity, the relationship finite state machine can be configured with rules such as "metformin" → "senile diabetes".

In this exemplary embodiment, two finite state machines are configured according to the positional logical relationships, so that the attribute logical relationships and entity logical relationships can be managed with finite state machines. Moreover, positional logical relationships that are not yet covered can be handled quickly by adding rules to the finite state machines, which provides data support and a theoretical basis for iterating and upgrading the medical entity relationship extraction method.

In step S120, a text to be recognized is obtained, and word segmentation is performed on the text to be recognized to obtain text word segmentation.

In an exemplary embodiment of the present disclosure, the text to be recognized may be obtained from a document to be recognized, or may be obtained from other medical texts to be recognized, and this exemplary embodiment is not particularly limited in this respect.

For example, the text to be recognized may be "Study on the efficacy and adverse reactions of metformin and trimethylbiguanide in the treatment of senile diabetes".

Further, word segmentation processing is carried out on the text to be recognized.

The text to be recognized can likewise be segmented with rule-based or statistical word segmentation.

Rule-based word segmentation pre-builds a dictionary and segments text by matching against it. The dictionary can be an N-gram dictionary: the text to be recognized is matched against the pre-built N-gram dictionary according to a segmentation strategy to obtain the candidate segmentations of each word, and a shortest-path method based on the N-gram dictionary is then used to compute the final segmentation.
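The shortest-path step can be sketched with a unigram (N = 1) simplification: every dictionary word found in the text becomes an edge in a word lattice with cost -log P(word), and dynamic programming picks the cheapest path. A full N-gram model would instead weight edges by the probability of a word given its predecessors. The toy frequency dictionary, `max_word_len`, and the pseudo-count for unknown single characters are hypothetical.

```python
import math


def segment(text: str, freq: dict[str, int], max_word_len: int = 6, oov_count: float = 0.1) -> list[str]:
    """Dictionary-based segmentation via shortest path over the word lattice.

    Edge cost is -log P(word) under a unigram model. Single characters missing
    from the dictionary get a small pseudo-count so that a path always exists.
    """
    total = sum(freq.values()) or 1
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: lowest cost to segment text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            count = freq.get(word, oov_count if end - start == 1 else 0)
            if count == 0:
                continue  # multi-character strings must be dictionary words
            cost = best[start] - math.log(count / total)
            if cost < best[end]:
                best[end], back[end] = cost, start
    words, end = [], n
    while end > 0:  # follow back-pointers from the end of the text
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]
```

With counts such as 老年 (senile) = 30, 糖尿病 (diabetes) = 50, 糖尿 = 5, and 病 = 20, the cheapest path through "老年糖尿病" is ["老年", "糖尿病"].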

Statistical word segmentation uses a classifier trained on a labeled corpus. The classifier can be trained with machine learning or deep learning algorithms, such as hidden Markov models, conditional random fields, or deep neural networks.

In addition, several different word segmentation tools can be called directly as word segmentation models, with each type of tool segmenting the text to be recognized. The text to be recognized may also be referred to as a coarse corpus. Each tool performs a preliminary segmentation of the text to be recognized and produces its own initial segments, and these are merged into an initial segment set. Because this set contains a large amount of initial segment data, the number of initial segments can be reduced by voting on each one, with votes tallied across the segmentation tools.

For example, if all three tools segment a given initial segment from the text to be recognized in the same way, that segment is kept as a segmentation string. If the three tools' results for the initial segment all disagree, the initial segment is discarded. If two tools agree and the third disagrees, the initial segment can still be determined to be a text segment.

The word segmentation tools may be open-source Chinese word segmentation tools, such as jieba, the HanLP segmenter, the Language Technology Platform (LTP) of Harbin Institute of Technology, THULAC developed by the Natural Language Processing and Social Humanities Computing Laboratory of Tsinghua University, the Stanford Word Segmenter, and NLPIR, a shared platform for natural language processing and information retrieval. Each of these tools has its own segmentation characteristics. For example, three tools, LTP, THULAC, and NLPIR, can be called to segment the text to be recognized.

The LTP word segmentation module is trained and decoded with a CRF model, which models the target (label) sequence conditioned on the observed sequence; its training data are the January to June 1998 People's Daily corpus. The segmentation interface is initialized with the path of the model file and is then called to segment the text to be recognized, yielding at least two text segments.

The THULAC toolkit comes with a built-in model trained on its original corpus, although obtaining that corpus requires authorization. THULAC's Chinese word segmentation and part-of-speech tagging are capable and highly accurate. After the interface parameters are configured, its segmentation call can be invoked on the text to be recognized to obtain at least two text segments.

NLPIR is a full-pipeline analysis tool that can be used to segment the text to be recognized. During segmentation, a pre-built dictionary is loaded and used for an initial segmentation. Ambiguous segments are then resolved with probabilistic statistics and simple rules, and unknown words are identified using word frequency information; after disambiguation and unknown-word identification, at least two text segments are obtained.

In step S130, word segmentation tags corresponding to the text word segmentation are determined according to the standard medical entity, and the medical relationship of the text to be recognized is recognized according to the word segmentation tags and the finite state machine.

In an exemplary embodiment of the present disclosure, after the standard medical entities and the text segments are obtained, the segmentation labels of the text segments can be determined according to the standard medical entities.

In an alternative embodiment, fig. 4 shows a flow chart of a method for determining the segmentation labels of text segments. As shown in fig. 4, the method includes at least the following steps. In step S410, the entity labels of the standard medical entities are obtained, and the similarity between the standard medical entities and the text segments is computed to obtain their semantic similarity.

After the standard medical entities are obtained, entity labels are manually assigned to some or all of them, so that the segmentation labels of the text segments can be determined with these entity labels as the reference.

In an alternative embodiment, the semantic similarity is obtained by computing the similarity between a standard medical entity and a text segment with a language representation model.

That is, to determine the semantic similarity between standard medical entities and text segments, a language representation model can be used for the similarity calculation.

The language representation model may be a BERT model or another model; this exemplary embodiment is not particularly limited thereto.

BERT achieved striking results on SQuAD 1.1, the top-level machine reading comprehension benchmark, surpassing human performance on both metrics, and set state-of-the-art (SOTA) results on 11 different NLP tasks, including raising the GLUE benchmark score to 80.4% (a 7.6% absolute improvement) and MultiNLI accuracy to 86.7% (a 5.6% absolute improvement), making it a milestone in the history of NLP.

The pre-trained BERT model is a general-purpose semantic representation model with strong transferability. It uses the Transformer as its basic network component, is trained with the masked (bidirectional) language model and next sentence prediction objectives, and obtains general semantic representations through pre-training.

Compared with traditional word embeddings such as Word2Vec and GloVe, BERT follows the idea of contextual word representations that has become popular in recent years: by taking context into account, the same word receives different representations in different contexts. Intuitively, this matches how human natural language works, since the same word often means different things in different situations.

In step S420, a similarity threshold corresponding to the semantic similarity is obtained, and the semantic similarity and the similarity threshold are compared to obtain a comparison result.

After the semantic similarity between a standard medical entity and a text segment is computed, the corresponding similarity threshold can be obtained. The similarity threshold may be set according to actual needs; this exemplary embodiment is not particularly limited thereto.

Further, the semantic similarity is compared with a similarity threshold value to obtain a corresponding comparison result.

In step S430, if the comparison result shows that the semantic similarity is greater than the similarity threshold, the entity label is determined to be the segmentation label of the text segment.

When the comparison result shows that the semantic similarity is greater than the similarity threshold, the text segment can be considered similar to the standard medical entity, so the entity label of that standard medical entity can be determined to be the segmentation label of the text segment.
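Steps S410 to S430 can be sketched as follows, assuming the vectors for standard medical entities and text segments have already been produced by some language representation model (for example, BERT embeddings; computing them is outside this sketch). Cosine similarity, choosing the single most similar entity, and the 0.8 threshold are illustrative assumptions, not values given in the text.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_label(segment_vec: list[float], entities: list[tuple[list[float], str]], threshold: float = 0.8):
    """Return the entity label of the most similar standard entity if its
    similarity is strictly greater than `threshold`, otherwise None.

    `entities` is a list of (entity_vector, entity_label) pairs.
    """
    best_label, best_sim = None, -1.0
    for vec, label in entities:
        sim = cosine(segment_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim > threshold else None
```

A segment that is not close enough to any standard entity gets no label, so the finite state machines later treat it as non-matching input.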

The segmentation labels may include attribute labels such as descriptor and core word, entity labels such as drug entity and disease entity, labels of non-medical entities such as experimental group and control group, or other related labels; this exemplary embodiment is not particularly limited thereto.

When the text to be recognized is "Study on the efficacy and adverse reactions of metformin and trimethylbiguanide in the treatment of senile diabetes", the text segments and their segmentation labels can be obtained through word segmentation and semantic similarity calculation, for example (metformin, drug) (treatment, intervention mode) (senile, medical descriptor) (diabetes, disease core word).

It should be noted that, in a surgical phrase such as "laparoscopic extensive total hysterectomy and pelvic lymph node dissection", the segments are labeled "approach" (laparoscopic), "descriptor", "orientation word", "anatomy" (uterus), "core word" (resection), "anatomy" (pelvic lymph node), and "core word" (dissection). Because each segment carries such a label, the reasons each text medical entity and medical relationship was recognized can be traced logically.

In this exemplary embodiment, the segmentation labels of text segments are determined from the entity labels of standard medical entities, which is simple, accurate, and practical. Moreover, the segmentation labels attached to the text segments later enable logical tracing of entities and medical relationships, meeting the traceability needs of relevant personnel.

After determining the word segmentation labels of the text word segmentation, the medical relationship of the text to be recognized can be determined according to the word segmentation labels and the finite state machine.

In an alternative embodiment, fig. 5 shows a flow chart of a method for determining the medical relationships of the text to be recognized. As shown in fig. 5, the method includes at least the following steps. In step S510, text medical entities in the text to be recognized are recognized according to the segmentation labels and the entity finite state machine.

Because the entity finite state machine contains rules in which the descriptor of a disease entity precedes its core word, such as "young and middle-aged" → "diabetes" and "senile" → "diabetes", these rules can be used to determine that when a text segment labeled as a descriptor precedes a text segment labeled as a core word, the two segments together form a disease entity, that is, a text medical entity.

Conversely, when the text segment "youth type" is labeled as a descriptor but the following text segment "student" is not labeled as a core word, "youth type" and "student" are not recognized as a text medical entity.

In addition, "laparoscopic extensive total hysterectomy and pelvic lymph node dissection" can be recognized as a surgical entity. Therefore, besides disease entities and drug entities, entities of medical means, such as surgical entities, can also be recognized; this exemplary embodiment is not particularly limited thereto.

When the text to be recognized is "Study on the efficacy and adverse reactions of metformin and trimethylbiguanide in the treatment of senile diabetes" and the text segments and their segmentation labels are (metformin, drug) (trimethylbiguanide, drug) (treatment, intervention mode) (senile, medical descriptor) (diabetes, disease core word), the disease entity (senile diabetes, disease) can be recognized with the rule "medical descriptor → disease core word → disease" in the entity finite state machine.
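Recognition with the entity finite state machine can be sketched by compiling label patterns into a trie-shaped FSM and scanning the labeled segments with a longest-match rule. The label names, the single disease rule, and the space used to join merged segments (Chinese text would use an empty string) are hypothetical; the result reproduces the (senile diabetes, disease) example above.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (text segment, segmentation label)


def build_fsm(rules: Dict[Tuple[str, ...], str]):
    """Compile label patterns into a trie-shaped FSM.

    transitions[state][label] gives the next state; accepting[state] gives the
    entity type produced when a match ends there. State 0 is the start state.
    """
    transitions: List[Dict[str, int]] = [{}]
    accepting: Dict[int, str] = {}
    for pattern, entity_type in rules.items():
        state = 0
        for label in pattern:
            if label not in transitions[state]:
                transitions.append({})
                transitions[state][label] = len(transitions) - 1
            state = transitions[state][label]
        accepting[state] = entity_type
    return transitions, accepting


def recognize_entities(tokens: List[Token], rules: Dict[Tuple[str, ...], str], joiner: str = " ") -> List[Token]:
    """Scan labeled segments, merging the longest run accepted by the FSM into one entity."""
    transitions, accepting = build_fsm(rules)
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        state, best = 0, None
        for j in range(i, len(tokens)):
            state = transitions[state].get(tokens[j][1])
            if state is None:  # no transition on this label: stop extending
                break
            if state in accepting:
                best = (j + 1, accepting[state])  # remember the longest accepted match
        if best is not None:
            end, entity_type = best
            out.append((joiner.join(text for text, _ in tokens[i:end]), entity_type))
            i = end
        else:
            out.append(tokens[i])  # segment starts no pattern: pass it through
            i += 1
    return out
```

The "youth type" / "student" case falls out naturally: the descriptor starts a match, but the next label has no transition, so neither segment is merged.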

In step S520, the medical relationship of the text to be recognized is recognized according to the text medical entity and the relationship finite state machine.

After the text medical entity is identified, the medical relation corresponding to the text to be identified can be further identified by using a relation finite state machine.

Because the relationship finite state machine contains rules in which a drug entity precedes a disease entity, such as "metformin" → "senile diabetes", these rules can be used to determine that when a text segment labeled as a drug entity precedes a text segment labeled as a disease entity, the two segments form a treatment relationship, that is, a medical relationship.

Medical relationships may further include treatment relationships between non-drug treatment entities, such as surgery or radiotherapy, and disease entities, as well as indication relationships between an experimental group and a drug or between a control group and a drug; this exemplary embodiment is not particularly limited thereto.

For example, when the text to be recognized is "Study on the efficacy and adverse reactions of metformin and trimethylbiguanide in the treatment of senile diabetes" and the text segments and their segmentation labels are (metformin, drug) (trimethylbiguanide, drug) (treatment, intervention mode) (senile, medical descriptor) (diabetes, disease core word), the disease entity (senile diabetes, disease) is first recognized with the "medical descriptor → disease core word → disease" rule in the entity finite state machine. Since metformin and trimethylbiguanide are known drug entities and treatment is an intervention-mode entity, the treatment relationship is then recognized with the "drug → intervention mode → disease" rule in the relationship finite state machine, which allows the therapeutic effects of metformin and trimethylbiguanide on senile diabetes to be compared.
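The "drug → intervention mode → disease" rule can be sketched as a three-state relationship FSM that runs over the entity-level output of the entity FSM. The state names, label names, and the "treats" relation name are hypothetical, and only this one rule is implemented; a plain "drug → disease" rule would need an extra transition.

```python
def extract_treatments(entities: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Relationship FSM for the rule 'drug -> intervention mode -> disease'.

    States: START (nothing matched), DRUGS (one or more drugs seen),
    INTERVENTION (drugs followed by an intervention mode).
    """
    relations = []
    state, drugs = "START", []
    for text, label in entities:
        if label == "drug":
            if state != "DRUGS":
                drugs = []  # a drug after anything else starts a new match
            drugs.append(text)
            state = "DRUGS"
        elif label == "intervention_mode" and state == "DRUGS":
            state = "INTERVENTION"
        elif label == "disease" and state == "INTERVENTION":
            # Accepting transition: every collected drug treats this disease.
            relations.extend((drug, "treats", text) for drug in drugs)
            state, drugs = "START", []
        else:
            state, drugs = "START", []  # any other input breaks the pattern
    return relations
```

Collecting several consecutive drugs before the intervention mode is what lets both metformin and trimethylbiguanide be linked to senile diabetes.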

In the exemplary embodiment, the medical relationship of the text to be recognized can be gradually recognized through the entity finite state machine and the relationship finite state machine, manual labeling is not needed, and labor cost and time cost are saved.

In an alternative embodiment, fig. 6 shows a flow diagram of another method for determining a medical relationship of a text to be recognized, which, as shown in fig. 6, at least comprises the following steps: in step S610, a text medical entity and a non-medical entity in the text to be recognized are identified according to the word segmentation label and the entity finite state machine.

Because the entity finite state machine contains rules in which the descriptor of a disease entity precedes its core word, such as "young and middle-aged" → "diabetes" and "senile" → "diabetes", these rules can be used to determine that when a text segment labeled as a descriptor precedes a text segment labeled as a core word, the two segments together form a disease entity, that is, a text medical entity.

In addition, because the segmentation labels can also include non-medical entity labels such as experimental group and control group, if a text segment carries a non-medical entity label while text medical entities are being recognized, the non-medical entity related to the text medical entity can additionally be determined.

In step S620, the medical relationship of the text to be recognized is recognized according to the textual medical entity, the non-medical entity, and the relationship finite state machine.

After the text medical entity and the non-medical entity are identified, the medical relation corresponding to the text to be identified can be further identified by using a relation finite state machine.

Because the relationship finite state machine contains rules in which a drug entity precedes a disease entity, such as "metformin" → "senile diabetes", these rules can be used to determine that when a text segment labeled as a drug entity precedes a text segment labeled as a disease entity, the two segments form a treatment relationship, that is, a medical relationship.

Furthermore, because "metformin" is associated with the experimental-group segmentation label and the related "trimethylbiguanide" with the control-group segmentation label, the medical relationships "effect of metformin in the experimental group on treating senile diabetes … …" and "effect of trimethylbiguanide in the control group on treating senile diabetes … …" can be determined.

Because medical texts commonly describe the experimental or control group in terms of the drug it receives, these medical relationships can also be expressed as "effect of the experimental group on treating senile diabetes … …" and "effect of the control group on treating senile diabetes … …".

In this exemplary embodiment, the medical relationships of the text to be recognized are recognized step by step through the entity finite state machine and the relationship finite state machine without manual annotation, saving labor and time. Moreover, the useful non-medical entities that are identified enrich the forms and diversity of the medical relationships that can be expressed.

Once the text medical entities and medical relationships of the text to be recognized are recognized, they can support semantic search over medical texts such as literature. They also make it easier for researchers to statistically analyze the treatments for a disease and compare them side by side to determine their strengths and weaknesses, as well as to statistically analyze drug efficacy, possible adverse reactions, and the like.

The recognized text medical entities and medical relationships are therefore easier to understand and use, provide a more convenient way to obtain data for clinical treatment and medical research, and improve the efficiency of both.

Furthermore, in an exemplary embodiment of the present disclosure, a relationship extraction apparatus of a medical entity is also provided. Fig. 7 shows a schematic structural diagram of a relationship extraction apparatus of a medical entity, and as shown in fig. 7, a relationship extraction apparatus 700 of a medical entity may include: a relationship configuration module 710, a text segmentation module 720, and a relationship identification module 730. Wherein:

a relationship configuration module 710 configured to obtain a position logic relationship between standard medical entities and configure a finite state machine according to the position logic relationship;

the text word segmentation module 720 is configured to acquire a text to be recognized, and perform word segmentation processing on the text to be recognized to obtain text words;

the relation recognition module 730 is configured to determine a segmentation label corresponding to the text segmentation according to the standard medical entity, and recognize the medical relation of the text to be recognized according to the segmentation label and the finite state machine.
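The following rough Python sketch shows one way modules 710, 720, and 730 could fit together. It makes these substitutions, all hypothetical:

- every class and method name is invented for illustration;
- whitespace splitting stands in for a real word segmenter;
- exact dictionary lookup stands in for label determination.

The sketch only illustrates that the finite state machine can be built from configurable position rules rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rule = Tuple[str, str, str]  # (label that comes first, label that follows, relation name)
Transitions = Dict[str, List[Tuple[str, str]]]


@dataclass
class RelationConfigModule:
    """Stand-in for module 710: turns position logical relationships into FSM
    transitions keyed by the label that opens each relation."""
    rules: List[Rule]

    def build_fsm(self) -> Transitions:
        transitions: Transitions = {}
        for first, second, relation in self.rules:
            transitions.setdefault(first, []).append((second, relation))
        return transitions


class TextSegmentationModule:
    """Stand-in for module 720; whitespace splitting replaces a real
    (e.g. Chinese) word segmenter."""

    def segment(self, text: str) -> List[str]:
        return text.split()


@dataclass
class RelationRecognitionModule:
    """Stand-in for module 730: labels words by exact lookup against the
    standard medical entities, then runs the configured FSM."""
    vocabulary: Dict[str, str]  # standard medical entity -> label

    def recognize(self, words: List[str], fsm: Transitions) -> List[Tuple[str, str, str]]:
        open_heads: Dict[str, str] = {}  # opening label -> latest word with that label
        relations: List[Tuple[str, str, str]] = []
        for word in words:
            label = self.vocabulary.get(word, "O")
            # Complete every relation whose opening label has already been seen.
            for head_label, head_word in open_heads.items():
                for second, relation in fsm.get(head_label, []):
                    if label == second:
                        relations.append((head_word, relation, word))
            # Then let this word open new relations if its label starts a rule.
            if label in fsm:
                open_heads[label] = word
        return relations


fsm = RelationConfigModule([("DRUG", "DISEASE", "treats")]).build_fsm()
words = TextSegmentationModule().segment("metformin improves diabetes control")
recognizer = RelationRecognitionModule({"metformin": "DRUG", "diabetes": "DISEASE"})
print(recognizer.recognize(words, fsm))  # prints [('metformin', 'treats', 'diabetes')]
```

Because the rules are kept as data, a new position logical relationship (for example, a disease followed by a symptom) can be added by appending one tuple. This fits the maintainability benefit claimed for managing position logical relationships through a finite state machine.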

and counting the entity logical relationships of the standard medical entities.

and acquiring the medical vocabulary by using the statistical model.
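The fragments above do not say how the entity logical relationships are counted. One plausible reading, offered purely as an assumption, is to count how often one entity label precedes another across labeled sentences and keep the frequent ordered pairs as candidate position logical relationships. The function name and the min_count threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def count_entity_order(
    label_sequences: Iterable[List[str]], min_count: int = 2
) -> Set[Tuple[str, str]]:
    """Count, over many sentences, how often entity label A appears before
    entity label B, and keep the ordered pairs seen at least min_count times
    as candidate position logical relationships."""
    counts: Counter = Counter()
    for labels in label_sequences:
        # combinations() preserves input order, so (a, b) always means "a before b".
        for a, b in combinations(labels, 2):
            if a != b:
                counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


sentences = [
    ["DRUG", "DISEASE"],
    ["GROUP", "DRUG", "DISEASE"],
    ["DISEASE", "SYMPTOM"],
]
print(count_entity_order(sentences))  # prints {('DRUG', 'DISEASE')}
```

Pairs kept this way could then feed the rule list used to configure the finite state machine.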

The specific details of each module of the relationship extraction apparatus 700 for medical entities have been described in detail in the corresponding relationship extraction method for medical entities, and are therefore not repeated here.

It should be noted that although several modules or units of the relationship extraction apparatus 700 for medical entities are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.

In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

An electronic device 800 according to such an embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.

As shown in Fig. 8, the electronic device 800 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.

The storage unit 820 stores program code that can be executed by the processing unit 810, so that the processing unit 810 performs the steps according to the various exemplary embodiments of the present invention described in the "exemplary methods" section of this specification, for example:

counting the entity logical relationships of the standard medical entities;

acquiring the medical vocabulary by using the statistical model;

acquiring an entity label corresponding to the standard medical entity, and performing a similarity calculation between the standard medical entity and the text word segments to obtain a semantic similarity;
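The similarity step can be sketched as follows. The disclosure refers to semantic similarity but does not name a model, so this sketch substitutes the character-based difflib.SequenceMatcher.ratio() as a clearly lexical stand-in. The 0.8 threshold, the "O" fallback label, and the function name are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def label_segment(
    segment: str, standard_entities: Dict[str, str], threshold: float = 0.8
) -> Tuple[str, float]:
    """Give a text word segment the entity label of its most similar standard
    medical entity, or "O" when no similarity reaches the threshold.

    SequenceMatcher.ratio() measures character overlap and is used here only
    as a stand-in for semantic similarity."""
    best_entity: Optional[str] = None
    best_score = 0.0
    for entity in standard_entities:
        score = SequenceMatcher(None, segment, entity).ratio()
        if score > best_score:
            best_entity, best_score = entity, score
    if best_entity is not None and best_score >= threshold:
        return standard_entities[best_entity], best_score
    return "O", best_score


entities = {"metformin": "DRUG", "senile diabetes": "DISEASE"}
print(label_segment("metformin", entities))  # prints ('DRUG', 1.0)
```

A near-miss spelling such as "metformine" still scores above the threshold and receives the DRUG label. An embedding-based similarity could be swapped in at the same point without changing the rest of the function.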

In this way, the position logical relationships are managed through the finite state machine, which makes them easier to maintain. No manual labeling is needed, which saves labor and time costs and makes upgrades and iteration easier. Because the word segmentation labels of the text word segments are determined from the standard medical entities, the logic behind both the text word segments and the medical relationships can be traced, which meets the requirement for tracing their sources. Furthermore, a fast and effective method for identifying medical relationships is provided. It improves the intelligence, automation, and accuracy of medical relationship identification, meets the extraction requirements of medical texts such as various medical documents, and broadens the application scenarios for relationship extraction of medical entities.

The storage unit 820 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 821 and/or a cache storage unit 822, and may further include a read-only storage unit (ROM) 823.

The storage unit 820 may also include a program/utility 824 having a set of (at least one) program modules 825. Such program modules 825 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, may include an implementation of a network environment.

Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 800 may also communicate with one or more external devices 1000 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. The electronic device 800 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of this specification, when the program product is run on the terminal device, for example:

counting the entity logical relationships of the standard medical entities;

acquiring the medical vocabulary by using the statistical model;

Referring to fig. 9, a program product 900 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method of relationship extraction for a medical entity, the method comprising:

acquiring a position logic relationship between standard medical entities, and configuring a finite state machine according to the position logic relationship;

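acquiring a text to be recognized, and performing word segmentation processing on the text to be recognized to obtain text word segmentation; and

determining word segmentation labels corresponding to the text word segmentation according to the standard medical entity, and identifying the medical relation of the text to be recognized according to the word segmentation labels and the finite state machine.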

2. The method of extracting relationship of medical entity as claimed in claim 1, wherein the position logical relationship comprises an attribute logical relationship and an entity logical relationship,

and counting the entity logical relationships of the standard medical entities.

3. The method of claim 2, wherein the obtaining the medical vocabulary comprises:

and acquiring the medical vocabulary by using the statistical model.

4. The method of claim 2, wherein the determining the word segmentation labels corresponding to the text word segmentation according to the standard medical entity comprises:

5. The method of claim 4, wherein the calculating the similarity between the standard medical entity and the text segment to obtain semantic similarity comprises:

6. The method of extracting relationships of medical entities according to claim 2, wherein the finite state machines include an entity finite state machine and a relationship finite state machine,

7. The method of claim 6, wherein the identifying the medical relationship of the text to be identified according to the word segmentation label and the finite state machine comprises:

8. The method of claim 6, wherein the identifying the medical relationship of the text to be identified according to the word segmentation label and the finite state machine comprises:

9. A relationship extraction apparatus for medical entities, comprising:

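the relation configuration module is configured to acquire a position logic relation between standard medical entities and configure a finite state machine according to the position logic relation;

the text word segmentation module is configured to acquire a text to be recognized, and perform word segmentation processing on the text to be recognized to obtain text word segmentation; and

the relation recognition module is configured to determine word segmentation labels corresponding to the text word segmentation according to the standard medical entity, and recognize the medical relation of the text to be recognized according to the word segmentation labels and the finite state machine.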

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the relationship extraction method for a medical entity according to any one of claims 1-8.

11. An electronic device, comprising:

a processor;

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the relationship extraction method for a medical entity according to any one of claims 1-8 by executing the executable instructions.