CN111581972A - Method, device, equipment and medium for identifying corresponding relation between symptom and part in text

Info

Publication number: CN111581972A
Application number: CN202010229316.3A
Authority: CN
Inventors: 孙安国
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2020-08-25

Abstract

The embodiments of the application disclose a method, a device, equipment and a medium for identifying the correspondence between symptoms and parts in a text, and relate to the technical field of text content identification. The method comprises the following steps: receiving a relation identification instruction; acquiring text data; recognizing the text data and extracting a target continuous text; inputting the target continuous text into a long-short term memory model, and outputting, through the long-short term memory model, a target labeling sequence corresponding to the target continuous text; and analyzing the target labeling sequence according to a labeling rule to obtain all text data matching relations. The method does not require continuous manual maintenance of a dictionary database. By setting a specific labeling rule and constructing the long-short term memory model, it improves the accuracy and generalization capability of text recognition: the long-short term memory model converts the text information into corpus labels that conform to the labeling rule, and analyzing those labels conveniently yields the text data matching relations between the symptoms and the parts described in the disease diagnosis result.

Description

Method, device, equipment and medium for identifying corresponding relation between symptom and part in text

Technical Field

The application relates to the technical field of text content identification, in particular to a method, a device, equipment and a medium for identifying corresponding relations between symptoms and parts in a text.

Background

When the correspondence between symptoms and parts is to be extracted from a file or data in which text information describing disease diagnosis results is recorded, the general scheme in the prior art is as follows:

1. A dictionary of the symptoms corresponding to diseases and a dictionary of the corresponding parts are generated.

2. Words representing symptoms and parts are searched for in the text through matching rules.

3. The relationships between the symptoms and the parts are matched through a series of text rules (such as a proximity principle).

However, the current scheme for identifying the correspondence between symptoms and parts in a text has the following problems: the symptom dictionary and the part dictionary must be continuously maintained by hand, which is a heavy workload; rule-based extraction of symptoms or parts is prone to extraction errors; only symptoms or parts that appear in the dictionary library can be identified, so new entities cannot be recognized; and newly appearing sentence patterns are difficult to process, so the generalization capability of the identification is poor.

Disclosure of Invention

The technical problem to be solved by the embodiments of the present application is to provide a method, an apparatus, a device and a medium for recognizing correspondence between a symptom and a part in a text, which can reduce manpower for maintaining a dictionary base and enhance accuracy and generalization capability of text recognition.

In order to solve the above technical problem, an embodiment of the present application provides a method for identifying correspondence between a symptom and a part in a text, which adopts the following technical solutions:

a method for identifying correspondence between symptoms and parts in a text comprises the following steps:

receiving a relation identification instruction;

responding to the relation identification instruction, and acquiring text data pointed by the relation identification instruction;

recognizing the text data, and extracting a target continuous text for describing diagnosis information from the text data;

calling a long-short term memory model, inputting the target continuous text into the long-short term memory model, and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;

reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule, and acquiring all text data matching relations between symptoms and parts represented by the target labeling sequence.

In order to solve the above technical problem, an embodiment of the present application further provides a device for identifying correspondence between a symptom and a part in a text, which adopts the following technical solutions:

a device for recognizing correspondence between symptoms and parts in text comprises:

the instruction receiving module is used for receiving a relation identification instruction;

the data acquisition module is used for responding to the relation identification instruction and acquiring text data pointed by the relation identification instruction;

the text extraction module is used for identifying the text data and extracting a target continuous text for describing the diagnostic information;

the text conversion module is used for calling a long-short term memory model, inputting the target continuous text into the long-short term memory model and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;

and the relationship analysis module is used for reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule and acquiring all text data matching relationships between symptoms and parts represented by the target labeling sequence.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:

a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method for identifying correspondence between symptoms and parts in text according to any one of the above technical solutions.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:

a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the method for identifying correspondence between symptoms and parts in a text according to any one of the above technical solutions.

Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:

The embodiments of the application disclose a method, a device, equipment and a medium for identifying the correspondence between symptoms and parts in a text. The method receives a relation identification instruction; acquires text data in response to the relation identification instruction; extracts from the text data a target continuous text describing diagnosis information; calls the long-short term memory model, inputs the target continuous text into it, and outputs through it a target labeling sequence corresponding to the target continuous text; and then reads a preset labeling rule and analyzes the target labeling sequence according to the labeling rule, so that all text data matching relations between symptoms and parts represented by the target labeling sequence can be obtained. The method does not require continuous manual maintenance of a dictionary database. By setting a specific labeling rule for the disease diagnosis results in the text information and constructing the long-short term memory model, it improves the accuracy and generalization capability of text recognition: the long-short term memory model converts the text information into corpus labels that conform to the labeling rule, and analyzing those labels conveniently yields the text data matching relations between the symptoms and the parts described in the disease diagnosis result.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram of an exemplary system architecture to which embodiments of the present application may be applied;

FIG. 2 is a flowchart of an embodiment of a method for identifying correspondence between a symptom and a part in a text according to an embodiment of the present application;

FIG. 3 is a diagram illustrating the labeling results at times T0-Tn when the conditional random field layer outputs the labeling sequence in the embodiment of the present application;

FIG. 4 is a schematic structural diagram of an embodiment of a device for identifying correspondence between symptoms and parts in a text in an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an embodiment of a computer device in an embodiment of the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It is noted that the terms "comprises," "comprising," and "having" and any variations thereof in the description and claims of this application and the drawings described above are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. In the claims, the description and the drawings of the specification of the present application, relational terms such as "first" and "second", and the like, may be used solely to distinguish one entity/action/object from another entity/action/object without necessarily requiring or implying any actual such relationship or order between such entities/actions/objects.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the relevant drawings in the embodiments of the present application.

As shown in fig. 1, the system architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the first terminal device 101, the second terminal device 102 and the third terminal device 103 to interact with the server 105 through the network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.

The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.

The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.

It should be noted that the identification method of correspondence between symptom and part in text provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the identification apparatus of correspondence between symptom and part in text is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continuing reference to fig. 2, a flowchart of an embodiment of a method for identifying correspondence between symptoms and parts in text according to an embodiment of the present application is shown. The method for identifying the corresponding relation between the symptoms and the parts in the text comprises the following steps:

Step 201: A relationship identification instruction is received.

The files or data referred to in this application are generally medical files or data in which text information describing disease diagnosis results is recorded. If a user who processes such information wants to quickly obtain the medical relationships between disease symptoms and the corresponding parts expressed in the text information, the user sends a relationship identification instruction to the server through an operation page provided by the server. Activating the relationship identification instruction causes the server to automatically identify the text data matching relations between symptoms and parts described in the file or data, where a text data matching relation refers to such a medical relationship in the disease diagnosis result.

In the embodiment of the present application, an electronic device (for example, the server/terminal device shown in FIG. 1) on which the method for identifying correspondence between symptoms and parts in text operates may receive the relationship identification instruction through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.

Step 202: Responding to the relation identification instruction, and acquiring the text data pointed to by the relation identification instruction.

The text data to be recognized, which is pointed to by the relationship recognition instruction, may be a medical document or a complete medical record, etc., in which medical information related to a certain patient is generally recorded. Specifically, the step of receiving the text data by the server may be performed before the step of receiving the relationship identification instruction, or may be performed after the step of receiving the relationship identification instruction.

If the server receives the text data before receiving the relation identification instruction, searching and matching are performed directly in the database of the server according to the content indicated by the relation identification instruction, and the text data pointed to by the instruction is read from the database. If the server receives the text data after receiving the relation identification instruction, the operation page provided by the server jumps to a page for inputting text data, where the user confirms the text data to be identified by selecting a relevant medical file or by directly entering the relevant medical text.

Step 203: Recognizing the text data, and extracting a target continuous text for describing diagnosis information from the text data.

For medical text data, the information recorded in the text data may further include the personal basic information of the patient, the time and place of the visit, the historical medical record, the specific diagnosis description of the current symptoms, and the like. The text information required by the current relation identification instruction is limited to the text describing the diagnosis result of the current disease symptoms, so after the text data is obtained, the required text is further identified and then extracted as the target continuous text.

Text data in the form of medical documents or complete medical records generally follows a fairly standardized storage template into which the relevant medical information of the patient is filled for recording and storage. In this case, the position of the target continuous text in the text data can be conveniently determined by identifying the keyword or field that introduces the text describing the current diagnosis result, so that the target continuous text is extracted.
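The keyword/field lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the field labels ("Name", "Visit time", "History", "Diagnosis"), the `label: value` layout and the function name `extract_target_text` are hypothetical and are not taken from the application; a real record template would define its own fields.

```python
import re

# Hypothetical field labels of a record template (assumption, not from the application).
FIELD_LABELS = ["Name", "Visit time", "History", "Diagnosis"]


def extract_target_text(record: str, target_field: str = "Diagnosis") -> str:
    """Return the text after `target_field:` up to the next known field label or the end."""
    labels = "|".join(re.escape(label) for label in FIELD_LABELS)
    pattern = re.compile(
        rf"{re.escape(target_field)}\s*:\s*(.*?)(?=\s*(?:{labels})\s*:|\Z)",
        re.DOTALL,
    )
    match = pattern.search(record)
    return match.group(1).strip() if match else ""


record = (
    "Name: Patient A\n"
    "Visit time: 2020-03-01\n"
    "History: none\n"
    "Diagnosis: severe burn on the left forearm, swelling of the right ankle\n"
)
print(extract_target_text(record))
```

Anchoring on field labels keeps the extraction independent of the length of the other sections of the record; free-form records without such labels would need a different locating strategy.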

Step 204: Calling a long-short term memory model, inputting the target continuous text into the long-short term memory model, and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model.

The Long Short-Term Memory (LSTM) model is a type of recurrent neural network (RNN). The LSTM model contains an LSTM layer that can retain values over time intervals of indefinite length.

In the application, a label template with a specific format is preset and is used for labeling the disease diagnosis result in the text data. The specific format conforms to a set labeling rule, and by recognizing labels in this format the relation between a symptom and a part can be identified more directly and conveniently. To convert the target continuous text into a target labeling sequence matching the label template format, the target continuous text obtained in the previous step is processed by a preset, trained long-short term memory model, which generates the target labeling sequence corresponding to the target continuous text. The target labeling sequence can be regarded as the disease diagnosis result expressed in the label template format.

In some embodiments of the present application, after the step 203, the method for identifying correspondence between a symptom and a part in a text further includes:

calling a preset word segmentation model based on a dictionary base, and inputting the target continuous text into the word segmentation model;

and matching the target continuous text with entries in a dictionary library through the word segmentation model so as to segment the target continuous text based on a maximum matching principle.

The words representing symptoms and parts in the target continuous text can be called as entities, and a dictionary library about the symptoms words and a dictionary library about the parts words are stored in the server database in advance.

In some possible embodiments, the word segmentation model can be provided separately or integrated into the long-short term memory model. If it is provided separately, the target continuous text is first segmented by the word segmentation model and then input into the long-short term memory model; if it is integrated, the target continuous text input into the long-short term memory model is processed directly by that model. In the embodiment of the application, the target continuous text is first segmented by the word segmentation model, the entities in the target continuous text are identified by matching against the vocabulary in the dictionary library, and the target labeling sequence is then generated based on the labeling rule and the word segmentation result. Specifically, when the word segmentation model segments the target continuous text, the entities are identified based on the maximum matching principle.

When identifying entities, the word segmentation model matches runs of consecutive characters in the text to be identified against the words in the dictionary library from left to right. A run that matches a dictionary word is a candidate entity, but under the maximum matching principle the first match does not necessarily determine the segmentation, because a longer match may follow.

The concept of the maximum matching principle is explained below by way of a detailed example:

Suppose a segment of medical text to be segmented is: "history of hypertension 5+ years". First, the text is divided character by character to obtain: sentence[] = {"high", "blood", "pressure", "disease", "history", "5", "+", "year"}, and the words in the dictionary library include: dict[] = {"hypertension", "history of hypertension", "headache"}. Here "hypertension" corresponds to the first three characters and "history of hypertension" to the first five characters.

(1) Starting from sentence[1] (the first character), when sentence[3] (the third character) is scanned, "hypertension" is found to be in the dictionary library dict[]. However, it cannot be split off yet, because it is not yet known whether the following characters can form a longer word (maximum matching).

(2) Scanning continues to sentence[4], and the first four characters are found not to be a word in dict[]. However, it still cannot be determined that the previously found "hypertension" is the longest word, because the first four characters are a prefix of dict[2].

(3) sentence[5] is scanned, "history of hypertension" is found to be a word in dict[], and scanning continues.

(4) When sentence[6] is scanned, "history of hypertension 5" is found to be neither a word in the dictionary library nor a prefix of any word, so the longest word found so far, "history of hypertension", can be split off.

It will be appreciated that a word is confirmed as the maximum match only when the next scan yields neither a word nor a prefix of a word in the dictionary library.
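The scan in steps (1) to (4) can be sketched in a few lines. The sketch below is an illustration, not the implementation of the application: the function name `max_match_segment` is hypothetical, each character of the original text is represented by its English gloss, dictionary words are written as tuples of such units, and the two-unit split of "headache" is an assumption made only for the example.

```python
def max_match_segment(units, dictionary):
    """Left-to-right segmentation that keeps the longest dictionary word at each position."""
    words = {tuple(word) for word in dictionary}
    # Every leading part of every word, including the word itself.
    prefixes = {word[:i] for word in words for i in range(1, len(word) + 1)}
    segments = []
    start = 0
    while start < len(units):
        longest = start + 1  # fall back to a single unit when nothing matches
        end = start + 1
        while end <= len(units):
            candidate = tuple(units[start:end])
            if candidate in words:
                longest = end  # remember the longest word seen so far
            if candidate not in prefixes:
                break  # neither a word nor a prefix: stop scanning
            end += 1
        segments.append(tuple(units[start:longest]))
        start = longest
    return segments


sentence = ["high", "blood", "pressure", "disease", "history", "5", "+", "year"]
dictionary = [
    ("high", "blood", "pressure"),                        # "hypertension"
    ("high", "blood", "pressure", "disease", "history"),  # "history of hypertension"
    ("head", "ache"),                                     # "headache" (assumed split)
]
print(max_match_segment(sentence, dictionary))
```

On the example sentence the first segment is the five-unit word "history of hypertension", and the remaining units "5", "+" and "year" are emitted one by one because none of them starts a dictionary word.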

In some embodiments of the present application, before the step 204, the method for identifying correspondence between a symptom and a part in a text further includes:

configuring a network structure of the long-short term memory model and acquiring a pre-training corpus;

and training the long-short term memory model based on the pre-training corpus so that the long-short term memory model converts the pre-training corpus into a labeling sequence conforming to a preset labeling rule.

Before the long-short term memory model is called, its network structure is constructed and trained; after training is finished, the model is deployed in the server, which completes the presetting of the model.

During training, some disease diagnosis results can be selected from historical medical records as the pre-training corpus. The long-short term memory model is trained on the pre-training corpus until, when the pre-training corpus is input into the model, the labeling sequence output by the model conforms to the preset labeling rule.

Further, the step of configuring the network structure of the long-short term memory model comprises:

setting an initial model of the long-short term memory model to sequentially comprise an input layer, a long-short term memory layer, a Conditional Random Field (CRF) layer and an output layer, wherein the long-short term memory layer comprises a forgetting gate, an input gate and an output gate;

and setting a global attention mechanism for the forgetting gate of the long-short term memory layer, and adding the hidden layer output at the previous moment to the input gate and the output gate, so as to update the initial model.

The long-short term memory layer in the network structure of the long-short term memory model comprises a forgetting gate (forget gate), an input gate (input gate) and an output gate (output gate). The long and short term memory layer controls the discarding and increasing of information through a gate, and realizes the function of forgetting or learning and memorizing the information. The gate is a structure for selectively passing information, and is composed of an activation function and an arithmetic operation. The concept of the long-short term memory layer is briefly described as follows:

Forget gate: decides which information is forgotten from the historical cell state, by means of an activation function whose output lies between 0 and 1, where 0 represents complete forgetting and 1 represents complete retention.

Input of the gate: the input at the current moment, the output at the previous moment, and the hidden layer outputs at all previous moments, which are concatenated (concat refers to connecting two or more arrays and returning a new array).

Output of the gate: the previously retained information that remains after passing through the forget gate.

Input gate: decides how much new information is added to the cell state, and strengthens or weakens the information at the current moment by means of an activation function whose output lies between -1 and 1.

Input of the gate: the input at the current moment, the output at the previous moment, and the hidden layer output at the previous moment, which are concatenated.

Output of the gate: the information that strengthens or weakens the current input after passing through the input gate.

Output gate: ultimately decides what value to output. The output is based on the cell state, but is a filtered version of it. First, a sigmoid function layer is run to determine which part of the cell state will be output. The cell state is then processed by the tanh function (giving a value between -1 and 1) and multiplied by the output of the sigmoid function layer, so that only the part selected for output is finally output.
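For orientation, one step of a textbook LSTM cell can be sketched as below. This is the standard formulation and not the modified layer of the application: it does not feed the hidden layer outputs of all previous moments into the forget gate, it uses scalar input and state for brevity, and the function name `lstm_step` and the weight layout `w` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a standard LSTM cell with scalar input and scalar state.

    `w` maps a gate name to (weight on x, weight on h_prev, bias); illustrative only.
    """
    def pre(name):
        w_x, w_h, bias = w[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("forget"))        # 0 = forget everything, 1 = keep everything
    i = sigmoid(pre("input"))         # how much new information is admitted
    g = math.tanh(pre("candidate"))   # candidate value between -1 and 1
    o = sigmoid(pre("output"))        # which part of the cell state is exposed
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # new hidden output
    return h, c


zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero)
print(h, c)
```

With all weights set to zero every sigmoid gate evaluates to 0.5 and the candidate value to 0, so the cell state is simply halved; this makes the role of the forget gate easy to see.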

Setting a global attention mechanism for the forget gate of the long-short term memory layer means that each input of the gate carries the hidden layer states at all previous moments (moments 1 to t-1), thereby realizing the global attention mechanism.

In a specific embodiment, the hidden layer output at the previous moment can also be added to the input gate and the output gate, so that global information is fully considered.

Adding the global attention mechanism can effectively improve the generalization capability of the long-short term memory model. For example, after the model has learned that "severe" is a degree word and "burn" is a symptom, it can further learn, by considering global information, that a degree word followed by a symptom combines into a new symptom entity, and can thereby identify entities that do not exist in the dictionary library.

The CRF layer is used for outputting a globally optimal labeling sequence, and its implementation process is as follows:

With further reference to FIG. 3, a schematic diagram of the labeling results at times T0 to Tn when the conditional random field layer outputs the labeling sequence in the embodiment of the present application is shown. Only some of the transition probability lines are drawn in the figure; in practice the labels at adjacent times are fully connected. From T0 to Tn, the possible labeling result at each time is B, M, E, S or O.

The first step: at time T0, the output of the output gate is input into the trained CRF layer to obtain the probability of each label (i.e. the emission probability).

The second step: at time T1, the output of the output gate is input into the trained CRF layer to obtain the probability of each label and the state transition probabilities (i.e. from B to B, from B to M, etc.), and the maximum probability of each label at time T1 (i.e. the maximum probability of being labeled B at time T1, the maximum probability of being labeled M, the maximum probability of being labeled E, etc.) is calculated from the emission probability and the state transition probability.

The third step: at time T2, the emission probability and the state transition probability are obtained in the same way, and the maximum probability of each label at time T2 is calculated from them.

…

The Nth step: at time Tn, the emission probability and the state transition probability are obtained in the same way, and the maximum probability of each label at time Tn is calculated from them.

Assuming that Tn is the last time, the label with the maximum probability at time Tn is selected and the optimal path ending at that label is obtained. The labeling sequence given by the optimal path is the globally optimal labeling sequence.
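The step-by-step computation above corresponds to Viterbi decoding. A minimal sketch follows, under these assumptions: scores are log-probabilities, so they are added rather than multiplied; the emission and transition values are made up for the example; and the function name `viterbi` and the set `ALLOWED` of permitted transitions are hypothetical, not taken from the application.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions: list with one dict (label -> score) per time step.
    transitions: dict mapping (previous label, current label) -> score.
    """
    best = [dict(emissions[0])]  # best score of any path ending in each label
    back = []                    # back-pointers, one dict per step after the first
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


LABELS = ["B", "M", "E", "S", "O"]
# Transitions consistent with the B/M/E/S/O scheme; all others are heavily penalised.
ALLOWED = {
    ("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
    ("E", "B"), ("E", "S"), ("E", "O"),
    ("S", "B"), ("S", "S"), ("S", "O"),
    ("O", "B"), ("O", "S"), ("O", "O"),
}
transitions = {(p, c): (0.0 if (p, c) in ALLOWED else -100.0) for p in LABELS for c in LABELS}
emissions = [
    {"B": -0.2, "M": -3.0, "E": -3.0, "S": -1.5, "O": -2.0},
    {"B": -0.8, "M": -0.9, "E": -1.1, "S": -3.0, "O": -3.0},
    {"B": -3.0, "M": -2.0, "E": -0.3, "S": -2.5, "O": -1.0},
]
print(viterbi(emissions, transitions, LABELS))  # ['B', 'M', 'E']
```

In this example, choosing the best label independently at each time would pick B at the second step and produce the invalid sequence B, B, E; taking the transition scores into account yields B, M, E, which illustrates why the CRF layer decodes globally.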

Step 205: reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule, and acquiring all text data matching relations between symptoms and parts represented by the target labeling sequence.

As can be understood from the description in step 204, the label template is preset by a specific labeling rule, so that the target labeling sequence conforming to the label template format can be analyzed and identified according to the labeling rule to obtain a disease diagnosis result represented by the target labeling sequence. In the present application, the disease diagnosis result includes several matching symptoms and parts described in the target continuous text, and each matching associated symptom and part can be regarded as a set of text data matching relations.

In one embodiment, the target labeling sequence can be parsed by regular expression matching: the filtering logic of the regular expression (i.e. the match) extracts the required parts of the target labeling sequence, thereby obtaining the disease diagnosis result represented by the target labeling sequence.

In some embodiments of the present application, the step 205 comprises:

identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rules;

and matching a second entity for each first entity in sequence, and recording each pair of the correlated first entity and second entity as a group of text data matching relation.

The target labeling sequence is generated by the long-short term memory model, which converts the target continuous text according to the labeling rule; the text content of the target continuous text is thus represented by a string of ordered labels.

The first entity is defined as a part described in the disease diagnosis result represented by the target continuous text, and the second entity is defined as a symptom described in that disease diagnosis result. When the target labeling sequence is analyzed, the first entities and the second entities are identified first, and the corresponding relation between them is then analyzed according to the labeling rule. Once the corresponding relation is determined, each first entity is associated with its corresponding second entity, and each such pair is recorded as a group of text data matching relations, until all text data matching relations described in the target continuous text are obtained.

In one specific embodiment, the following labeling rules are set:

The letter abbreviations are as follows: B: start position; M: middle position; E: end position; S: single position; O: other; SJ: symptom and corresponding part relationship; 1: symptom; 2: part. SJ serves to distinguish these labels from other text labels: a label containing SJ indicates that the labeled text describes an entity in a text data matching relationship, that is, text belonging to the description of a disease diagnosis result.

According to the above labeling rules, the specific labels and their meanings are: B_SJ_1: start position of a symptom entity; M_SJ_1: middle position of a symptom entity; E_SJ_1: end position of a symptom entity; S_SJ_1: symptom entity consisting of a single word; B_SJ_2: start position of a part entity; M_SJ_2: middle position of a part entity; E_SJ_2: end position of a part entity; S_SJ_2: part entity consisting of a single word; O: other.
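The composition of these labels can be illustrated with a small lookup. This is only a sketch: the names `POSITIONS`, `KINDS` and `describe` are hypothetical, and the labels are assumed to be written without spaces, as B_SJ_2.

```python
# Meaning of each label component under the labeling rule above.
POSITIONS = {"B": "start position", "M": "middle position",
             "E": "end position", "S": "single word"}
KINDS = {"1": "symptom", "2": "part"}

def describe(tag):
    """Translate one label such as 'B_SJ_2' into its meaning."""
    if tag == "O":
        return "other"
    # A malformed label fails here with ValueError (wrong number of fields).
    position, marker, kind = tag.split("_")
    if marker != "SJ" or position not in POSITIONS or kind not in KINDS:
        raise ValueError(f"unknown label: {tag}")
    return f"{POSITIONS[position]} of {KINDS[kind]} entity"

print(describe("B_SJ_2"))  # start position of part entity
print(describe("E_SJ_1"))  # end position of symptom entity
```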

The following describes the concept of matching relationship between target continuous text, target labeling sequence and text data by a specific example applying the labeling rule:

The target continuous text representing the disease diagnosis result is set as: "the nasal mucosa is hyperemic and edematous, with secretion, and the pharynx is mildly hyperemic."

Specifically, the target labeling sequence converted from the target continuous text conforms to the following corresponding relationship:

It can be further understood that the first entities include "nasal mucosa" and "pharynx", and the second entities include "hyperemia", "edema" and "mild hyperemia". The identified text data matching relations between the first entities and the second entities therefore comprise three groups: (a) nasal mucosa and hyperemia, (b) nasal mucosa and edema, and (c) pharynx and mild hyperemia.

For example, in the first group of text data matching relations, the label corresponding to "nasal cavity" is "B_SJ_2" and the label corresponding to "mucosa" is "E_SJ_2". It can be understood that "nasal cavity" and "mucosa" belong to the same first entity: "nasal cavity" is the start position of the first entity and "mucosa" is its end position, and the two are combined to form a complete first entity.

In some other specific application scenarios, "there is a secretion" may also be considered a second entity that represents a symptom. At this time, a group of text data matching relations of 'nasal mucosa' and 'secretion' are added.
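The pairing in this example can be sketched as follows. The application does not spell out the exact matching rule, so the sketch assumes one simple rule that is consistent with the example: each symptom is attached to the most recent part that precedes it. The function name `pair_entities` and the tuple layout are hypothetical.

```python
def pair_entities(entities):
    """Pair each symptom with the nearest preceding part.

    entities: ordered list of (kind, text), where kind "2" is a part
    (first entity) and kind "1" is a symptom (second entity).
    The nearest-preceding-part rule is an assumption of this sketch.
    """
    pairs = []
    current_part = None
    for kind, text in entities:
        if kind == "2":
            current_part = text
        elif kind == "1" and current_part is not None:
            pairs.append((current_part, text))
    return pairs

entities = [
    ("2", "nasal mucosa"), ("1", "hyperemia"), ("1", "edema"),
    ("2", "pharynx"), ("1", "mild hyperemia"),
]
for part, symptom in pair_entities(entities):
    print(part, "-", symptom)
```

Under this assumed rule, treating "secretion" as a further symptom entity placed after "edema" would add the pair of "nasal mucosa" and "secretion", in line with the scenario described above.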

In a further specific embodiment, the step of identifying a plurality of first entities representing sites and second entities representing symptoms in the target tagging sequence based on the tagging rules comprises:

determining the number of first entities and the number of sub-entities respectively contained by each first entity, and determining the number of second entities and the number of sub-entities respectively contained by each second entity;

if the number of the sub-entities contained in one first entity or one second entity is more than one, identifying a text parameter corresponding to each sub-entity contained in the first entity or the second entity, and combining the text parameters to represent the text parameters as the first entity or the second entity.

Each first entity and each second entity may be composed of text parameters represented by a plurality of words, each of which is regarded as a sub-entity. Identifying the first entities and the second entities requires first determining their total numbers, which can be determined from the labels representing entities in the target labeling sequence. For example, the total number of first entities is the sum of the numbers of occurrences of "B_SJ_2" and "S_SJ_2" in the labels.
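The counting rule can be sketched as below. The label sequence shown is an assumed labeling of the example sentence, reconstructed only for illustration (the application confirms B_SJ_2 and E_SJ_2 for "nasal cavity" and "mucosa"; the remaining labels are assumptions), and `count_entities` is a hypothetical name.

```python
from collections import Counter

def count_entities(tags):
    """Count entities by their opening labels: every entity starts with B or is a lone S."""
    counts = Counter(tags)
    return {
        "parts": counts["B_SJ_2"] + counts["S_SJ_2"],
        "symptoms": counts["B_SJ_1"] + counts["S_SJ_1"],
    }

# Assumed labels for: nasal cavity / mucosa / hyperemia / , / edema / , /
# has / secretion / , / pharynx / mild / hyperemia / .
tags = ["B_SJ_2", "E_SJ_2", "S_SJ_1", "O", "S_SJ_1", "O", "O",
        "O", "O", "S_SJ_2", "B_SJ_1", "E_SJ_1", "O"]
print(count_entities(tags))  # {'parts': 2, 'symptoms': 3}
```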

In connection with the example in the above embodiment, "nasal mucosa" is a first entity composed of two sub-entities, "pharynx" is a first entity represented by a single sub-entity, "hyperemia" and "edema" are each a second entity represented by a single sub-entity, and "mild hyperemia" is a second entity composed of two sub-entities.

It can be understood that the above identification process includes: identifying that the number of first entities in the first text segment (the text before the first pause mark) is 1, and that this first entity comprises two sub-entities, "nasal cavity" and "mucosa"; identifying that the number of second entities in the last text segment (the text between the comma and the period) is 1, and that this second entity comprises two sub-entities, "mild" and "hyperemia"; and then combining "nasal cavity" and "mucosa" to represent the first entity "nasal mucosa", and combining "mild" and "hyperemia" to represent the second entity "mild hyperemia".
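A minimal sketch of combining sub-entities into entities is given below. The word list and the labels other than those of "nasal cavity" and "mucosa" are assumptions made for illustration, and `merge_sub_entities` is a hypothetical name.

```python
def merge_sub_entities(tokens, tags):
    """Group words into entities; returns a list of (kind, [sub-entities])."""
    entities = []
    buffer, kind = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            buffer, kind = [], None  # an open entity is dropped if interrupted
            continue
        position, _, entity_kind = tag.split("_")
        if position == "S":
            entities.append((entity_kind, [token]))
            buffer, kind = [], None
        elif position == "B":
            buffer, kind = [token], entity_kind
        elif position in ("M", "E") and buffer and kind == entity_kind:
            buffer.append(token)
            if position == "E":
                entities.append((kind, buffer))
                buffer, kind = [], None
        else:
            buffer, kind = [], None  # inconsistent label run: discard
    return entities

tokens = ["nasal cavity", "mucosa", "hyperemia", ",", "edema", ",", "has",
          "secretion", ",", "pharynx", "mild", "hyperemia", "."]
tags = ["B_SJ_2", "E_SJ_2", "S_SJ_1", "O", "S_SJ_1", "O", "O",
        "O", "O", "S_SJ_2", "B_SJ_1", "E_SJ_1", "O"]
for kind, parts in merge_sub_entities(tokens, tags):
    print(kind, parts)
```

Each returned entry keeps its sub-entities as a list, so the number of sub-entities per entity is simply the length of that list.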

With the method for identifying the corresponding relation between symptoms and parts in a text, excessive manpower need not be consumed to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis result in the text information, and a long-short term memory model is built, which improves the accuracy and generalization capability of text recognition. After the text information is converted by the long-short term memory model into corpus labels that conform to the labeling rule, the corpus labels are analyzed, so that the text data matching relation between the symptoms and the parts represented by the disease diagnosis result can be conveniently obtained.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

With further reference to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of the apparatus for identifying correspondence between symptoms and parts in a text according to the embodiment of the present application. As an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for identifying correspondence between symptoms and parts in text, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus can be applied to various electronic devices.

As shown in fig. 4, the apparatus for recognizing correspondence between symptom and part in text according to this embodiment includes:

an instruction receiving module 301, configured to receive a relation identification instruction;

a data acquisition module 302, configured to respond to the relation identification instruction and acquire the text data pointed to by the relation identification instruction;

a text extraction module 303, configured to recognize the text data and extract therefrom a target continuous text describing diagnosis information;

a text conversion module 304, configured to call the long-short term memory model, input the target continuous text into the long-short term memory model, and output a target labeling sequence corresponding to the target continuous text through the long-short term memory model;

a relationship resolution module 305, configured to read a preset labeling rule, analyze the target labeling sequence according to the labeling rule, and acquire all text data matching relations between symptoms and parts represented by the target labeling sequence.

In some embodiments of the present application, the text conversion module 304 further comprises a word segmentation submodule. The word segmentation submodule is used for calling a preset dictionary-based word segmentation model and inputting the target continuous text into the word segmentation model, and for matching the target continuous text against entries in the dictionary library through the word segmentation model, so as to segment the target continuous text based on the maximum matching principle.
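Dictionary-based segmentation under a maximum matching principle can be sketched as follows. The application does not state whether matching runs forward or backward; this sketch assumes forward maximum matching, and the function name `max_match_segment`, the toy dictionary and the unsegmented Latin-letter input are all hypothetical stand-ins for the actual dictionary library and text.

```python
def max_match_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary entry."""
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until an entry matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

dictionary = {"nasal", "mucosa", "hyper", "hyperemia"}
print(max_match_segment("nasalmucosahyperemia", dictionary))
# ['nasal', 'mucosa', 'hyperemia']
```

The longer entry "hyperemia" is preferred over its prefix "hyper", which is the behavior the maximum matching principle is meant to guarantee.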

In some embodiments of the present application, the apparatus for identifying correspondence between a symptom and a part in a text further includes a training module. The training module is used for configuring a network structure of the long-short term memory model and acquiring a pre-training corpus, and for training the long-short term memory model based on the pre-training corpus, so that the long-short term memory model converts the pre-training corpus into a labeling sequence conforming to a preset labeling rule.

Further, the training module is used for setting an initial model of the long-short term memory model so that it sequentially comprises an input layer, a long-short term memory layer, a Conditional Random Field (CRF) layer and an output layer, wherein the long-short term memory layer comprises a forgetting gate, an input gate and an output gate; and for setting a global attention mechanism for the forgetting gate of the long-short term memory layer and adding the hidden layer output of the previous moment into the input gate and the output gate, so as to update the initial model.

In some embodiments of the present application, the relationship resolution module 305 further comprises an entity association submodule. The entity association submodule is used for identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rule, matching a second entity for each first entity in sequence, and recording each pair of associated first entity and second entity as a group of text data matching relations.

In a further specific embodiment, the entity association sub-module is further configured to determine the number of the first entities and the number of the sub-entities included in each of the first entities, and determine the number of the second entities and the number of the sub-entities included in each of the second entities; if the number of the sub-entities contained in one first entity or one second entity is more than one, identifying a text parameter corresponding to each sub-entity contained in the first entity or the second entity, and combining the text parameters to represent the text parameters as the first entity or the second entity.

With the device for identifying the corresponding relation between symptoms and parts in a text, excessive manpower need not be consumed to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis result in the text information, and a long-short term memory model is built, which improves the accuracy and generalization capability of text recognition. After the text information is converted by the long-short term memory model into corpus labels that conform to the labeling rule, the corpus labels are analyzed, so that the text data matching relation between the symptoms and the parts represented by the disease diagnosis result can be conveniently obtained.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 6 comprises a memory 61, a processor 62, and a network interface 63 that are communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as program codes of a method for identifying correspondence between symptoms and parts in text. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run a program code stored in the memory 61 or process data, for example, a program code of a method for identifying correspondence between a symptom and a part in the text.

The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.

With the computer device, when the processor executes the computer program stored in the memory to identify the corresponding relation between symptoms and parts in a text, excessive manpower need not be consumed to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis result in the text information, and a long-short term memory model is built, which improves the accuracy and generalization capability of text recognition. After the text information is converted by the long-short term memory model into corpus labels that conform to the labeling rule, the corpus labels are analyzed, so that the text data matching relation between the symptoms and the parts represented by the disease diagnosis result can be conveniently obtained.

The present application provides another embodiment, which is to provide a computer-readable storage medium storing a textual symptom and part correspondence recognition program, which is executable by at least one processor to cause the at least one processor to perform the steps of the textual symptom and part correspondence recognition method as described above.

With the computer-readable storage medium provided by the embodiment of the application, when the computer program stored therein is executed to identify the corresponding relation between symptoms and parts in a text, excessive manpower need not be consumed to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis result in the text information, and a long-short term memory model is built, which improves the accuracy and generalization capability of text recognition. After the text information is converted by the long-short term memory model into corpus labels that conform to the labeling rule, the corpus labels are analyzed, so that the text data matching relation between the symptoms and the parts represented by the disease diagnosis result can be conveniently obtained.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

In the above embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.

The modules or components may or may not be physically separate, and the components shown as modules or components may or may not be physical modules, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules or components can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The present application is not limited to the above-mentioned embodiments, the above-mentioned embodiments are preferred embodiments of the present application, and the present application is only used for illustrating the present application and not for limiting the scope of the present application, it should be noted that, for a person skilled in the art, it is still possible to make several improvements and modifications to the technical solutions described in the foregoing embodiments or to make equivalent substitutions for some technical features without departing from the principle of the present application. All equivalent structures made by using the contents of the specification and the drawings of the present application can be directly or indirectly applied to other related technical fields, and the same should be considered to be included in the protection scope of the present application.

It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. All other embodiments that can be obtained by a person skilled in the art based on the embodiments in this application without any creative effort, and all equivalent structures made by using the contents of the specification and the drawings of this application, whether applied directly or indirectly in other related technical fields, are within the scope of protection of the present application.

Claims

1. A method for identifying correspondence between symptoms and parts in a text is characterized by comprising the following steps:

receiving a relation identification instruction;

2. The method for recognizing correspondence between symptom and part in text according to claim 1, wherein after the step of recognizing the text data and extracting a target continuous text for describing diagnosis information therefrom, the method further comprises:

3. The method for recognizing correspondence between symptom and part in text according to claim 1, wherein before the step of calling the long-short term memory model, the method further comprises:

4. The method according to claim 3, wherein the step of configuring the network structure of the long-term and short-term memory model comprises:

setting an initial model of the long-short term memory model, and enabling the initial model to sequentially comprise an input layer, a long-short term memory layer, a conditional random field layer and an output layer, wherein the long-short term memory layer comprises a forgetting gate, an input gate and an output gate;

5. The method for identifying the correspondence between the symptoms and the parts in the text according to claim 1, wherein the step of analyzing the target labeling sequence according to the labeling rule to obtain all text data matching relationships between the symptoms and the parts represented by the target labeling sequence comprises:

6. The method according to claim 5, wherein the step of identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rules comprises:

7. A device for recognizing correspondence between symptoms and parts in a text, comprising:

8. The apparatus for recognizing correspondence between symptom and part in text according to claim 7, wherein the text conversion module further includes: a word segmentation submodule;

the word segmentation submodule is used for calling a preset dictionary-based word segmentation model and inputting the target continuous text into the word segmentation model;

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for recognizing correspondence between symptoms and parts in text according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for identifying correspondence between symptoms and parts in text according to any one of claims 1 to 6.