CN111581972A - Method, device, equipment and medium for identifying corresponding relation between symptom and part in text - Google Patents


Publication number
CN111581972A
CN111581972A CN202010229316.3A CN202010229316A CN111581972A CN 111581972 A CN111581972 A CN 111581972A CN 202010229316 A CN202010229316 A CN 202010229316A CN 111581972 A CN111581972 A CN 111581972A
Authority
CN
China
Prior art keywords
text
long
target
term memory
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010229316.3A
Other languages
Chinese (zh)
Inventor
孙安国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010229316.3A priority Critical patent/CN111581972A/en
Publication of CN111581972A publication Critical patent/CN111581972A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a method, a device, equipment and a medium for identifying the correspondence between symptoms and parts in a text, and relate to the technical field of text content identification. The method comprises the following steps: receiving a relation identification instruction; acquiring text data; recognizing the text data and extracting a target continuous text; inputting the target continuous text into a long-short term memory model, and outputting, through the long-short term memory model, a target labeling sequence corresponding to the target continuous text; and analyzing the target labeling sequence according to a labeling rule to obtain all text data matching relations. The method does not require heavy manual effort to continuously maintain a dictionary library. By setting a specific labeling rule and constructing the long-short term memory model, it improves the accuracy and generalization capability of text recognition: once the long-short term memory model has converted the text information into corpus labels that satisfy the labeling rule, those labels are analyzed to obtain the text data matching relation between the symptoms and the parts represented by the disease diagnosis result.

Description

Method, device, equipment and medium for identifying corresponding relation between symptom and part in text
Technical Field
The application relates to the technical field of text content identification, in particular to a method, a device, equipment and a medium for identifying corresponding relations between symptoms and parts in a text.
Background
For a file or data in which text describing disease diagnosis results is recorded, the general prior-art scheme for extracting the correspondence between symptoms and parts is as follows:
1. Generate a dictionary of the symptoms corresponding to the disease and a dictionary of the corresponding parts.
2. Search the text for words representing symptoms and parts by means of matching rules.
3. Match the relationship between symptom and part through a series of text rules (such as the proximity principle).
However, the current scheme for identifying the correspondence between symptoms and parts in a text has the following problems: the symptom dictionary and the part dictionary must be maintained continuously by hand, which is tedious; rule-based extraction of symptoms or parts is prone to extraction errors; only symptoms or parts that appear in the dictionary library can be identified, so new entities cannot be recognized; and newly appearing sentence patterns are difficult to process, so generalization during identification is poor.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present application is to provide a method, an apparatus, a device and a medium for recognizing correspondence between a symptom and a part in a text, which can reduce manpower for maintaining a dictionary base and enhance accuracy and generalization capability of text recognition.
In order to solve the above technical problem, an embodiment of the present application provides a method for identifying correspondence between a symptom and a part in a text, which adopts the following technical solutions:
a method for identifying correspondence between symptoms and parts in a text comprises the following steps:
receiving a relation identification instruction;
responding to the relation identification instruction, and acquiring text data pointed by the relation identification instruction;
recognizing the text data, and extracting a target continuous text for describing diagnosis information from the text data;
calling a long-short term memory model, inputting the target continuous text into the long-short term memory model, and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;
reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule, and acquiring all text data matching relations between symptoms and parts represented by the target labeling sequence.
In order to solve the above technical problem, an embodiment of the present application further provides a device for identifying correspondence between a symptom and a part in a text, which adopts the following technical solutions:
a device for recognizing correspondence between symptoms and parts in text comprises:
the instruction receiving module is used for receiving a relation identification instruction;
the data acquisition module is used for responding to the relation identification instruction and acquiring text data pointed by the relation identification instruction;
the text extraction module is used for identifying the text data and extracting a target continuous text for describing the diagnostic information;
the text conversion module is used for calling a long-short term memory model, inputting the target continuous text into the long-short term memory model and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;
and the relationship analysis module is used for reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule and acquiring all text data matching relationships between symptoms and parts represented by the target labeling sequence.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method for identifying correspondence between symptoms and parts in text according to any one of the above technical solutions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the method for identifying correspondence between symptoms and parts in a text according to any one of the above technical solutions.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
The embodiments of the application disclose a method, a device, equipment and a medium for identifying the correspondence between symptoms and parts in a text. The method receives a relation identification instruction; acquires text data in response to the instruction; extracts from the text data a target continuous text describing diagnosis information; calls the long-short term memory model, inputs the target continuous text into it, and outputs through it a target labeling sequence corresponding to the target continuous text; and then reads a preset labeling rule and analyzes the target labeling sequence according to that rule, thereby obtaining all text data matching relations between symptoms and parts represented by the target labeling sequence. The method does not require heavy manual effort to continuously maintain a dictionary library. By setting a specific labeling rule for the disease diagnosis results in the text information and constructing the long-short term memory model, it improves the accuracy and generalization capability of text recognition: once the model has converted the text information into corpus labels that satisfy the labeling rule, those labels are analyzed to obtain the text data matching relation between the symptoms and parts represented by the disease diagnosis result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a diagram of an exemplary system architecture to which embodiments of the present application may be applied;
FIG. 2 is a flowchart of an embodiment of a method for identifying correspondence between a symptom and a part in a text according to an embodiment of the present application;
FIG. 3 is a diagram illustrating the labeling results at times T0 to Tn when the conditional random field layer outputs the labeling sequence in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a device for identifying correspondence between symptoms and parts in a text in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a computer device in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It is noted that the terms "comprises," "comprising," and "having" and any variations thereof in the description and claims of this application and the drawings described above are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. In the claims, the description and the drawings of the specification of the present application, relational terms such as "first" and "second", and the like, may be used solely to distinguish one entity/action/object from another entity/action/object without necessarily requiring or implying any actual such relationship or order between such entities/actions/objects.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the relevant drawings in the embodiments of the present application.
As shown in fig. 1, the system architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the first terminal device 101, the second terminal device 102 and the third terminal device 103 to interact with the server 105 through the network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
It should be noted that the identification method of correspondence between symptom and part in text provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the identification apparatus of correspondence between symptom and part in text is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, a flowchart of an embodiment of a method for identifying correspondence between symptoms and parts in text according to an embodiment of the present application is shown. The method for identifying the corresponding relation between the symptoms and the parts in the text comprises the following steps:
step 201: a relationship identification instruction is received.
In this application, a file or data generally refers to a medical file or medical data in which text describing disease diagnosis results is recorded. If a user handling such information wants to quickly obtain the medical relationship between disease symptoms and parts expressed in the text, the user sends a relationship identification instruction to the server from an operation page provided by the server. Triggering this instruction causes the server to automatically identify the text data matching relation between the symptoms and parts described in the file or data, where the text data matching relation refers to the medical relationship in the disease diagnosis result.
In the embodiment of the present application, the electronic device on which the method for identifying correspondence between symptoms and parts in text runs (for example, the server/terminal device shown in fig. 1) may receive the relationship identification instruction through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a Wi-Fi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
Step 202: and responding to the relation identification instruction, and acquiring the text data pointed by the relation identification instruction.
The text data to be recognized, which is pointed to by the relationship recognition instruction, may be a medical document or a complete medical record, etc., in which medical information related to a certain patient is generally recorded. Specifically, the step of receiving the text data by the server may be performed before the step of receiving the relationship identification instruction, or may be performed after the step of receiving the relationship identification instruction.
If the server received the text data before the relation identification instruction, it searches and matches directly in its database according to the content indicated by the instruction and reads out the text data that the instruction points to. If the server is to receive the text data after the relation identification instruction, the operation page provided by the server jumps to a page for entering text data, where the user confirms the text data to be identified by selecting a relevant medical file or directly entering the relevant medical text.
Step 203: and recognizing the text data, and extracting a target continuous text for describing diagnosis information from the text data.
For medical text data, the information recorded in the text data may further include: the personal basic information of the patient, the time and the place of the visit, the historical medical record, the specific diagnosis description of the current symptoms and the like. The text information required by the current relation identification instruction is limited to the relevant text information for describing the current disease symptom diagnosis result, so that after the text data is obtained, the required text data is further identified and then extracted as the target continuous text.
Text data in the form of medical documents or complete medical records generally follows a fairly standardized storage template, which is filled in with the patient's medical information so that the information can be recorded and stored. In this case, the position of the target continuous text within the text data can be determined by identifying the keyword or field that introduces the description of the current diagnosis result, and the target continuous text is then extracted from that position.
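The keyword/field lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the field labels in `FIELD_LABELS`, the `label: value` layout, and the function name `extract_target_text` are all assumptions made for the example, since the source does not specify the template.

```python
import re

# Hypothetical field labels; a real storage template would define its own.
FIELD_LABELS = ["Name", "Visit time", "Past history", "Diagnosis"]


def extract_target_text(record: str, target_field: str = "Diagnosis") -> str:
    """Return the text after `target_field:` up to the next known field label."""
    labels = "|".join(re.escape(label) for label in FIELD_LABELS)
    pattern = re.compile(
        rf"{re.escape(target_field)}\s*:\s*(.*?)(?=\s*(?:{labels})\s*:|\Z)",
        re.DOTALL,  # let the captured text span several lines
    )
    match = pattern.search(record)
    return match.group(1).strip() if match else ""


# Illustrative record only; the contents are made up for the example.
record = (
    "Name: patient A\n"
    "Visit time: 2020-03-01\n"
    "Past history: history of hypertension 5+ years\n"
    "Diagnosis: severe burn on the left forearm\n"
)
target = extract_target_text(record)
```

Stopping at the next known label, rather than at the next line break, keeps multi-line diagnosis descriptions intact.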
Step 204: and calling a long-short term memory model, inputting the target continuous text into the long-short term memory model, and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model.
The Long Short-Term Memory (LSTM) model is a recurrent neural network (RNN). It contains an LSTM layer, which can retain values over time intervals of indefinite length.
In the application, a label template with a specific format is preset, and is used for labeling the disease diagnosis result in the text data through the label template. The specific format accords with a set specific marking rule, and the relation between the symptom and the part can be identified more directly and conveniently by identifying the label template containing the specific format. If the target continuous text is to be converted into the target labeling sequence matched with the label template format, the target continuous text obtained in the previous step can be processed by using a preset trained long-short term memory model, so that the target labeling sequence corresponding to the target continuous text can be generated, and the target labeling sequence can be regarded as a disease diagnosis result conforming to the label template format.
In some embodiments of the present application, after the step 203, the method for identifying correspondence between a symptom and a part in a text further includes:
calling a preset word segmentation model based on a dictionary base, and inputting the target continuous text into the word segmentation model;
and matching the target continuous text with entries in a dictionary library through the word segmentation model so as to segment the target continuous text based on a maximum matching principle.
The words representing symptoms and parts in the target continuous text can be called as entities, and a dictionary library about the symptoms words and a dictionary library about the parts words are stored in the server database in advance.
In some possible embodiments, the word segmentation model can be set up separately or integrated into the long-short term memory model. If it is set up separately, the target continuous text is first segmented by the word segmentation model and then input into the long-short term memory model; if it is integrated, the target continuous text input into the long-short term memory model is processed directly by that model. In the embodiment of the application, the target continuous text is first segmented by the word segmentation model, the entities in it are identified by matching against the vocabulary in the dictionary library, and the target labeling sequence is then generated based on the labeling rule and the segmentation result. Specifically, when the word segmentation model segments the target continuous text, it identifies entities based on the maximum matching principle.
When identifying entities in a text, the word segmentation model matches runs of consecutive characters against the words in the dictionary library from left to right. A successful match identifies a candidate entity, but under the maximum matching principle the first match does not necessarily fix the segmentation, because a longer dictionary word may begin with the same characters.
The concept of the maximum matching principle is explained below by way of a detailed example:
Suppose the segment of medical text to be segmented is "history of hypertension 5+ years". Splitting it character by character (each element below glosses one character of the original text) gives: sentence[] = {"high", "blood", "pressure", "disease", "history", "5", "+", "year"}, and the words in the dictionary library are: dict[] = {"hypertension", "history of hypertension", "headache"}.
(1) Starting from sentence[1] (the first character), when sentence[3] (the third character) is scanned, "hypertension" is found to be a word in dict[]. It still cannot be split off, because it is not yet known whether the following characters extend it into a longer word (maximum match).
(2) Scanning continues to sentence[4]. The four characters scanned so far do not form a word in dict[], but it cannot yet be concluded that "hypertension" is the longest word, because they are a prefix of dict[2] ("history of hypertension").
(3) Scanning sentence[5] shows that "history of hypertension" is a word in dict[], and scanning continues.
(4) When sentence[6] is scanned, "history of hypertension 5" is neither a word in the dictionary nor a prefix of one, so the longest word found so far, "history of hypertension", is segmented out.
In other words, a maximum match is confirmed only when the string obtained by scanning one more character is neither a word in the dictionary library nor a prefix of one.
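The scan in the example above can be sketched as a forward maximum-matching routine. This is a minimal illustration under stated assumptions: characters are represented by their English glosses, dictionary words are stored as tuples of those glosses, and the names `build_prefixes` and `segment` are hypothetical. Falling back to a single character when no dictionary word starts at a position is also an assumption; the source does not say how unmatched characters are handled.

```python
def build_prefixes(dictionary):
    """Collect every prefix (including the full word) of every dictionary word."""
    return {word[:i] for word in dictionary for i in range(1, len(word) + 1)}


def segment(chars, dictionary):
    """Forward maximum matching: keep scanning while the candidate is still a
    prefix of some dictionary word, and remember the longest full word seen."""
    prefixes = build_prefixes(dictionary)
    pieces, start = [], 0
    while start < len(chars):
        best_end = None
        for end in range(start + 1, len(chars) + 1):
            candidate = tuple(chars[start:end])
            if candidate in dictionary:
                best_end = end  # a word, but a longer one may still follow
            if candidate not in prefixes:
                break  # neither a word nor a prefix of one: stop scanning
        if best_end is None:
            best_end = start + 1  # assumption: emit unmatched characters one by one
        pieces.append(tuple(chars[start:best_end]))
        start = best_end
    return pieces


# Each token stands for one character of the original text.
sentence = ["high", "blood", "pressure", "disease", "history", "5", "+", "year"]
dictionary = {
    ("high", "blood", "pressure"),                        # "hypertension"
    ("high", "blood", "pressure", "disease", "history"),  # "history of hypertension"
    ("head", "ache"),                                     # "headache"
}
pieces = segment(sentence, dictionary)
```

Here the first piece is the five-character word "history of hypertension" rather than the shorter "hypertension", which is the outcome of steps (1) to (4).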
In some embodiments of the present application, before the step 204, the method for identifying correspondence between a symptom and a part in a text further includes:
configuring a network structure of the long-short term memory model and acquiring a pre-training corpus;
and training the long-short term memory model based on the pre-training corpus so that the long-short term memory model converts the pre-training corpus into a labeling sequence conforming to a preset labeling rule.
Before the long-short term memory model is called, a network structure of the long-short term memory model is constructed and then trained, and after the training is finished, the network structure is deployed in a server to finish the presetting of the model.
During training, a number of disease diagnosis results can be selected from historical medical records as the pre-training corpus. The long-short term memory model is trained on this corpus until, when the pre-training corpus is input, the model outputs a labeling sequence that conforms to the preset labeling rule.
Further, the step of configuring the network structure of the long-short term memory model comprises:
setting an initial model of the long-short term memory model to comprise, in sequence, an input layer, a long-short term memory layer, a Conditional Random Field (CRF) layer and an output layer, wherein the long-short term memory layer comprises a forget gate, an input gate and an output gate;
and setting a global attention mechanism for the forget gate of the long-short term memory layer, and adding the hidden layer output of the previous moment to the input gate and the output gate, so as to update the initial model.
The long-short term memory layer in the network structure of the long-short term memory model comprises a forget gate, an input gate and an output gate. The layer uses these gates to control which information is discarded and which is added, and thereby forgets, or learns and memorizes, information. A gate is a structure that lets information pass selectively, and is composed of an activation function and an arithmetic operation. The gates of the long-short term memory layer are briefly described as follows:
Forget gate: decides which information is forgotten from the historical cell state, implemented by an activation function with a range of 0 to 1.
0 represents complete forgetting; 1 represents complete retention.
Input of the gate: the concatenation (concat, i.e. joining two or more arrays and returning a new array) of the input at the current moment, the output at the previous moment, and the hidden layer outputs at all previous moments.
Output of the gate: the previously stored information that is retained after passing the forget gate.
Input gate: decides how much new information is added to the cell state, strengthening or weakening the information at the current moment through an activation function with a range of -1 to 1.
Input of the gate: the concatenation of the input at the current moment, the output at the previous moment, and the hidden layer output at the previous moment.
Output of the gate: the information, after passing the input gate, that strengthens or weakens the current input.
Output gate: ultimately decides what value to output. This output is based on the cell state, but is a filtered version of it. First, a sigmoid layer is run to determine which part of the cell state will be output. The cell state is then passed through the tanh function (giving a value between -1 and 1) and multiplied by the output of the sigmoid layer, so that only the selected part is output.
Setting a global attention mechanism for the forget gate of the long-short term memory layer means that every input to the gate carries the hidden layer states of all previous moments (moments 1 through t-1), which realizes the global attention mechanism.
In a specific embodiment, the hidden layer output of the previous moment can also be added to the input gate and the output gate, so that global information is fully taken into account.
Adding a global attention mechanism can effectively improve the generalization capability of the long-short term memory model. For example, after the model has learned that "severe" is a degree word and "burn" is a symptom, taking global information into account allows it to further learn that degree word + symptom can combine into a new symptom entity, and thus to identify entities that do not exist in the dictionary library.
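The gate arithmetic described above can be sketched as one step of a standard LSTM cell. This is a hedged illustration only: it uses the commonly published LSTM equations (a sigmoid input gate scaling a tanh candidate), it does not implement the global attention variant of the application, and the names `lstm_step`, `init_params`, and the randomly initialized weights are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (no attention over earlier states)."""
    z = h_prev + x_t  # list concatenation of previous output and current input
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]    # forget gate, 0..1
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]    # input gate, 0..1
    g = [math.tanh(a) for a in affine(params["Wg"], params["bg"], z)]  # candidate, -1..1
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]    # output gate, 0..1
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Output: tanh of the cell state, filtered by the output gate.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def init_params(hidden, inputs, seed=0):
    """Hypothetical random weights; a trained model would supply real ones."""
    rng = random.Random(seed)
    params = {}
    for gate in "figo":
        params["W" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
                              for _ in range(hidden)]
        params["b" + gate] = [0.0] * hidden
    return params


hidden_size, input_size = 4, 3
params = init_params(hidden_size, input_size)
h, c = [0.0] * hidden_size, [0.0] * hidden_size
for x in ([0.1, 0.2, 0.3], [0.0, -0.4, 0.5]):
    h, c = lstm_step(x, h, c, params)
```

In this standard form only the previous hidden state enters each gate; the variant described above would additionally concatenate the hidden states of all earlier moments into the forget gate's input.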
The CRF layer outputs a globally optimal labeling sequence by considering the sequence as a whole. The process is as follows:
Referring further to fig. 3, a schematic diagram of the labeling results at times T0 to Tn when the conditional random field layer outputs the labeling sequence in the embodiment of the present application is shown. Only some of the transition probability lines are drawn in the figure; in practice every label at one time is connected to every label at the next time. From T0 to Tn, the possible labeling result at each time is B, M, E, S or O.
The first step: at time T0, the output of the output gate is input into the trained CRF layer to obtain the probability of each label (i.e. the emission probability).
The second step: at time T1, the output of the output gate is input into the trained CRF layer to obtain the probability of each label and the state transition probabilities (i.e. from B to B, from B to M, etc.), and the maximum probability of each label at time T1 (i.e. the maximum probability of being labeled B at time T1, the maximum probability of being labeled M, the maximum probability of being labeled E, etc.) is calculated from the emission probability and the state transition probability.
The third step: at time T2, the emission probability and the state transition probability are obtained in the same way, and the maximum probability of each label at time T2 is calculated from them.
The N-th step: at time Tn, the emission probability and the state transition probability are obtained in the same way, and the maximum probability of each label at time Tn is calculated from them.
Assuming that Tn is the last time, the label with the maximum probability at time Tn is selected and the optimal path ending in that label is obtained. The labeling sequence given by the optimal path is the globally optimal labeling sequence.
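The steps above correspond to Viterbi decoding, which can be sketched as below. This is a minimal sketch under assumptions: scores are treated as log-probabilities and added (the text speaks of probabilities), the function name and data layout are hypothetical, and the toy scores are invented purely for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions:   one dict per time step, label -> log-score
    transitions: dict (previous_label, current_label) -> log-score
    """
    best = [{label: emissions[0][label] for label in labels}]
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            # Best predecessor for `cur` given everything seen so far.
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = (best[-1][prev] + transitions[(prev, cur)]
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Pick the best final label, then follow the pointers backwards.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with three labels and two time steps (invented scores).
labels = ["B", "E", "O"]
transitions = {(p, c): -1.0 for p in labels for c in labels}
transitions[("B", "E")] = 0.0     # E readily follows B
transitions[("O", "E")] = -100.0  # E practically never follows O
emissions = [
    {"B": -0.5, "E": -3.0, "O": -0.4},
    {"B": -3.0, "E": -0.1, "O": -2.0},
]
path = viterbi(emissions, transitions, labels)
```

In the toy example, choosing the best label at T0 in isolation would pick O, but the transition scores make the path B, E better overall, which illustrates why the labeling sequence is decided globally rather than step by step.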
Step 205: reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule, and acquiring all text data matching relations between symptoms and parts represented by the target labeling sequence.
As can be understood from the description in step 204, the label template is preset by a specific labeling rule, so a target labeling sequence that conforms to the label template format can be parsed according to the labeling rule to obtain the disease diagnosis result it represents. In the present application, the disease diagnosis result includes several matching symptoms and parts described in the target continuous text, and each pair of associated symptom and part is regarded as one group of text data matching relation.
In one embodiment, the target labeling sequence can be parsed by regular expression matching: the filtering logic of the regular expression (also referred to as "matching") extracts the required part of the information from the target labeling sequence, so as to obtain the disease diagnosis result represented by the target labeling sequence.
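One way such regular-expression parsing could look is sketched below. It assumes the B/M/E/S/O labels with the SJ_1 (symptom) and SJ_2 (part) suffixes used in the specific embodiment of this document. The one-character encoding, the pattern and the function name are hypothetical choices made for the sketch, not taken from the source.

```python
import re

# Each label is encoded as one character so that a match position in the
# encoded string is also a position in the label sequence.
# Lower case = symptom (SJ_1), upper case = part (SJ_2), "-" = other.
CODE = {
    "B_SJ_1": "b", "M_SJ_1": "m", "E_SJ_1": "e", "S_SJ_1": "s",
    "B_SJ_2": "B", "M_SJ_2": "M", "E_SJ_2": "E", "S_SJ_2": "S",
    "O": "-",
}

# An entity is either begin, any number of middles, end, or a single label.
ENTITY = re.compile(r"bm*e|s|BM*E|S")


def find_entity_spans(tags):
    """Return (kind, start, end) for every well-formed entity, end exclusive."""
    encoded = "".join(CODE[tag] for tag in tags)
    return [
        ("symptom" if match.group().islower() else "part",
         match.start(), match.end())
        for match in ENTITY.finditer(encoded)
    ]


spans = find_entity_spans(
    ["B_SJ_2", "E_SJ_2", "S_SJ_1", "O", "B_SJ_1", "M_SJ_1", "E_SJ_1"]
)
```

Label runs that do not form a complete entity (for example a begin label with no end label) simply fail to match and are skipped.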
In some embodiments of the present application, the step 205 comprises:
identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rules;
and matching a second entity for each first entity in sequence, and recording each pair of the correlated first entity and second entity as a group of text data matching relation.
The target labeling sequence is generated by the long-short term memory model, which converts the target continuous text according to the labeling rule; the text content of the target continuous text is thereby represented by a string of ordered labels.
The first entity is defined as a part described in the disease diagnosis result represented by the target continuous text, and the second entity as a symptom described in that result. When the target labeling sequence is parsed, the first entities and second entities are identified first, and the correspondence between them is then analyzed according to the labeling rule. Once the correspondence is determined, each first entity is associated with its corresponding second entity, and each pair is recorded as one group of text data matching relation, until all text data matching relations described in the target continuous text are obtained.
In one specific embodiment, the following labeling rules are set:
the letters are abbreviated as follows: B - start position, M - middle position, E - end position, S - single position, O - other, SJ - relationship between a symptom and its corresponding part, 1 - symptom, 2 - part. SJ distinguishes these labels from other text labels: a label containing SJ indicates that the labeled text describes an entity in a text data matching relation, that is, text belonging to the description of a disease diagnosis result.
According to the above labeling rules, the specific labels and their meanings are: B_SJ_1 - start position of a symptom entity, M_SJ_1 - middle position of a symptom entity, E_SJ_1 - end position of a symptom entity, S_SJ_1 - symptom entity consisting of a single word, B_SJ_2 - start position of a part entity, M_SJ_2 - middle position of a part entity, E_SJ_2 - end position of a part entity, S_SJ_2 - part entity consisting of a single word, O - other.
The relationship among the target continuous text, the target labeling sequence and the text data matching relations is illustrated below by a specific example that applies the labeling rule:
the target continuous text representing the disease diagnosis result is set as: nasal mucosa congested, edematous, with secretion, pharynx mildly congested.
Specifically, the target labeling sequence converted from the target continuous text conforms to the following correspondence:
[Table, shown as images BDA0002428752510000141 and BDA0002428752510000151: each word of the target continuous text and its corresponding label]
It can further be understood that the first entities include "nasal mucosa" and "pharynx", and the second entities include "congestion", "edema" and "mild congestion". The text data matching relations identified between the first entities and the second entities comprise three groups: (1) nasal mucosa and congestion, (2) nasal mucosa and edema, and (3) pharynx and mild congestion.
In the first group of text data matching relations, for example, the label corresponding to "nasal cavity" is "B_SJ_2" and the label corresponding to "mucosa" is "E_SJ_2". It can be understood that "nasal cavity" and "mucosa" belong to the same first entity: "nasal cavity" is its start position and "mucosa" is its end position, and the two combine into one complete first entity.
In some other specific application scenarios, "with secretion" may also be regarded as a second entity representing a symptom. In that case, a further group of text data matching relation, "nasal mucosa" and "secretion", is added.
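The pairing in this example can be sketched as follows. The text does not spell out the exact matching rule, so the sketch assumes that each symptom is attached to the nearest part entity that precedes it, which reproduces the three groups listed above. The function name and the (kind, text) tuple layout are hypothetical.

```python
def pair_parts_with_symptoms(entities):
    """Attach every symptom to the most recent part that precedes it.

    entities: list of (kind, text) tuples in text order, where kind is
    "part" (first entity) or "symptom" (second entity).
    """
    pairs, current_part = [], None
    for kind, text in entities:
        if kind == "part":
            current_part = text
        elif kind == "symptom" and current_part is not None:
            pairs.append((current_part, text))
    return pairs


# Entities of the example sentence, already merged from their sub-entities.
entities = [
    ("part", "nasal mucosa"),
    ("symptom", "congestion"),
    ("symptom", "edema"),
    ("part", "pharynx"),
    ("symptom", "mild congestion"),
]
pairs = pair_parts_with_symptoms(entities)
```

A symptom that appears before any part is dropped in this sketch; a real implementation would need an explicit rule for that case.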
In a further specific embodiment, the step of identifying a plurality of first entities representing sites and second entities representing symptoms in the target tagging sequence based on the tagging rules comprises:
determining the number of first entities and the number of sub-entities respectively contained by each first entity, and determining the number of second entities and the number of sub-entities respectively contained by each second entity;
if the number of the sub-entities contained in one first entity or one second entity is more than one, identifying a text parameter corresponding to each sub-entity contained in the first entity or the second entity, and combining the text parameters to represent the text parameters as the first entity or the second entity.
Each first entity or second entity may be composed of the text parameters of several words, each of which is regarded as a sub-entity. To identify the first and second entities, the total number of first entities or second entities is confirmed first. This total can be determined from the labels that represent entities in the target labeling sequence: for example, the total number of first entities is the sum of the numbers of "B_SJ_2" and "S_SJ_2" labels appearing in the sequence.
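The counting rule can be sketched in a few lines. The label sequence below is an assumption: it is reconstructed from the prose description of the example sentence, since the original label table is only available as an image. The function name is hypothetical.

```python
from collections import Counter


def count_entities(tags):
    """Count entities by their opening labels (B_* or S_*)."""
    counts = Counter(tags)
    return {
        "part": counts["B_SJ_2"] + counts["S_SJ_2"],
        "symptom": counts["B_SJ_1"] + counts["S_SJ_1"],
    }


# Assumed labels for: nasal cavity / mucosa / congestion / , / edema / , /
# with / secretion / , / pharynx / mild / congestion / .
tags = ["B_SJ_2", "E_SJ_2", "S_SJ_1", "O", "S_SJ_1", "O", "O", "O", "O",
        "S_SJ_2", "B_SJ_1", "E_SJ_1", "O"]
totals = count_entities(tags)
```

Counting only B and S labels works because every entity opens with exactly one of them, whatever its length.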
With reference to the example in the above embodiment, "nasal mucosa" is a first entity composed of two sub-entities, "pharynx" is a first entity represented by a single sub-entity, "congestion" and "edema" are each a second entity represented by a single sub-entity, and "mild congestion" is a second entity composed of two sub-entities.
It can be understood that the above identification process includes the following: the number of first entities in the first segment of text (the text before the first pause mark) is identified as 1, and this first entity contains two sub-entities, "nasal cavity" and "mucosa"; the number of second entities in the last segment of text (the text between the comma and the period) is identified as 1, and this second entity contains two sub-entities, "mild" and "congestion". "Nasal cavity" and "mucosa" are then combined and represented as the first entity "nasal mucosa", and "mild" and "congestion" are combined and represented as the second entity "mild congestion".
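Combining sub-entities into entities can be sketched as a single pass over the words and their labels. The words and labels below are assumptions reconstructed from the prose, the function name is hypothetical, and English words are joined with a space purely for readability (the original text is Chinese, where the sub-entities would simply be concatenated).

```python
KIND = {"1": "symptom", "2": "part"}


def extract_entities(tokens, tags, joiner=""):
    """Merge B/M/E runs and single S labels into (kind, text) entities."""
    entities, buffer, kind = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            buffer, kind = [], None
            continue
        position, _, code = tag.split("_")
        if position == "S":
            entities.append((KIND[code], token))
            buffer, kind = [], None
        elif position == "B":
            buffer, kind = [token], KIND[code]
        elif position in ("M", "E") and kind == KIND[code]:
            buffer.append(token)
            if position == "E":
                entities.append((kind, joiner.join(buffer)))
                buffer, kind = [], None
        else:
            # Malformed run (e.g. E without a matching B): discard it.
            buffer, kind = [], None
    return entities


tokens = ["nasal cavity", "mucosa", "congestion", ",", "edema", ",",
          "with", "secretion", ",", "pharynx", "mild", "congestion", "."]
tags = ["B_SJ_2", "E_SJ_2", "S_SJ_1", "O", "S_SJ_1", "O", "O", "O", "O",
        "S_SJ_2", "B_SJ_1", "E_SJ_1", "O"]
entities = extract_entities(tokens, tags, joiner=" ")
```

The result keeps the entities in text order, which is what a subsequent part-to-symptom pairing step would need.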
The method for identifying the correspondence between symptoms and parts in text does not require excessive manpower to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis results in the text information, and a long-short term memory model is built to improve the accuracy and generalization capability of text recognition. After the long-short term memory model converts the text information into corpus labels that conform to the labeling rule, the corpus labels are parsed, so that the text data matching relations between the symptoms and parts represented by the disease diagnosis result can be conveniently obtained.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of the apparatus for identifying correspondence between symptoms and parts in a text according to the embodiment of the present application. As an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for identifying correspondence between symptoms and parts in text, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus can be applied to various electronic devices.
As shown in fig. 4, the apparatus for recognizing correspondence between symptom and part in text according to this embodiment includes:
an instruction receiving module 301, configured to receive a relation identification instruction;
a data acquisition module 302, configured to respond to the relation identification instruction and acquire the text data pointed to by the relation identification instruction;
a text extraction module 303, configured to recognize the text data and extract from it a target continuous text for describing diagnosis information;
a text conversion module 304, configured to call a long-short term memory model, input the target continuous text into the long-short term memory model, and output a target labeling sequence corresponding to the target continuous text through the long-short term memory model;
a relationship resolution module 305, configured to read a preset labeling rule, parse the target labeling sequence according to the labeling rule, and acquire all text data matching relations between symptoms and parts represented by the target labeling sequence.
In some embodiments of the present application, the text conversion module 304 further comprises a word segmentation submodule. The word segmentation submodule is configured to call a preset word segmentation model based on a dictionary base and input the target continuous text into the word segmentation model, and to match the target continuous text against the entries in the dictionary base through the word segmentation model, so as to segment the target continuous text based on a maximum matching principle.
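Dictionary-based segmentation under a maximum matching principle can be sketched as below. The text does not say whether matching runs forward or backward, so the forward variant shown here is an assumption. The function name and the toy dictionary are hypothetical, and an unspaced English string stands in for the original Chinese text.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: always take the longest entry.

    Characters that start no dictionary entry are emitted on their own.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until an entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Toy dictionary; "nas" and "con" are shorter entries that must lose
# to the longer matches "nasal" and "congestion".
dictionary = {"nasal", "mucosa", "congestion", "nas", "con"}
words = forward_max_match("nasalmucosacongestion", dictionary)
```

Because the longest candidate is tried first, the shorter entries are never chosen where a longer entry starts at the same position.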
In some embodiments of the present application, the apparatus for identifying correspondence between symptoms and parts in a text further includes a training module. The training module is configured to configure the network structure of the long-short term memory model and acquire a pre-training corpus, and to train the long-short term memory model based on the pre-training corpus so that the long-short term memory model converts the pre-training corpus into a labeling sequence conforming to a preset labeling rule.
Further, the training module is configured to set an initial model of the long-short term memory model that sequentially comprises an input layer, a long-short term memory layer, a Conditional Random Field (CRF) layer and an output layer, wherein the long-short term memory layer comprises a forget gate, an input gate and an output gate; and to set a global attention mechanism for the forget gate of the long-short term memory layer and add the hidden layer output at the previous moment to the input gate and the output gate, so as to update the initial model.
In some embodiments of the present application, the relationship resolution module 305 further comprises an entity association submodule. The entity association submodule is configured to identify, based on the labeling rule, a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence, to match a second entity to each first entity in sequence, and to record each pair of associated first entity and second entity as one group of text data matching relation.
In a further specific embodiment, the entity association sub-module is further configured to determine the number of the first entities and the number of the sub-entities included in each of the first entities, and determine the number of the second entities and the number of the sub-entities included in each of the second entities; if the number of the sub-entities contained in one first entity or one second entity is more than one, identifying a text parameter corresponding to each sub-entity contained in the first entity or the second entity, and combining the text parameters to represent the text parameters as the first entity or the second entity.
The identification device for the corresponding relation between the symptom and the part in the text does not need to consume too much manpower to continuously maintain a dictionary base, sets a specific labeling rule for a disease diagnosis result in text information, and builds a long-short term memory model to improve the accuracy and generalization capability of text identification, so that after the text information is converted into a corpus label according with the labeling rule through the long-short term memory model, the corpus label is analyzed, and the text data matching relation between the symptom and the part represented by the disease diagnosis result can be conveniently obtained.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63 communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as program codes of a method for identifying correspondence between symptoms and parts in text. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run a program code stored in the memory 61 or process data, for example, a program code of a method for identifying correspondence between a symptom and a part in the text.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
When the processor of the computer device executes the computer program stored in the memory to identify the correspondence between symptoms and parts in text, excessive manpower is not required to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis results in the text information, and a long-short term memory model is built to improve the accuracy and generalization capability of text recognition. After the long-short term memory model converts the text information into corpus labels that conform to the labeling rule, the corpus labels are parsed, so that the text data matching relations between the symptoms and parts represented by the disease diagnosis result can be conveniently obtained.
The present application provides another embodiment, which is to provide a computer-readable storage medium storing a textual symptom and part correspondence recognition program, which is executable by at least one processor to cause the at least one processor to perform the steps of the textual symptom and part correspondence recognition method as described above.
With the computer-readable storage medium provided by the embodiment of the present application, when the computer program stored in it is executed to identify the correspondence between symptoms and parts in text, excessive manpower is not required to continuously maintain a dictionary base. A specific labeling rule is set for the disease diagnosis results in the text information, and a long-short term memory model is built to improve the accuracy and generalization capability of text recognition. After the long-short term memory model converts the text information into corpus labels that conform to the labeling rule, the corpus labels are parsed, so that the text data matching relations between the symptoms and parts represented by the disease diagnosis result can be conveniently obtained.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
In the above embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The modules or components may or may not be physically separate, and the components shown as modules or components may or may not be physical modules, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules or components can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The present application is not limited to the above-mentioned embodiments. The above-mentioned embodiments are preferred embodiments of the present application and are only used to illustrate the present application, not to limit its scope. It should be noted that a person skilled in the art can still make improvements and modifications to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features, without departing from the principle of the present application. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the protection scope of the present application.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All other embodiments obtained by a person skilled in the art based on the embodiments in this application without creative effort, and all equivalent structures made by using the contents of the specification and the drawings of this application, whether applied directly or indirectly in other related technical fields, are within the scope of protection of the present application.

Claims (10)

1. A method for identifying correspondence between symptoms and parts in a text is characterized by comprising the following steps:
receiving a relation identification instruction;
responding to the relation identification instruction, and acquiring text data pointed by the relation identification instruction;
recognizing the text data, and extracting a target continuous text for describing diagnosis information from the text data;
calling a long-short term memory model, inputting the target continuous text into the long-short term memory model, and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;
reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule, and acquiring all text data matching relations between symptoms and parts represented by the target labeling sequence.
2. The method for recognizing correspondence between symptom and part in text according to claim 1, wherein after the step of recognizing the text data and extracting a target continuous text for describing diagnosis information therefrom, the method further comprises:
calling a preset word segmentation model based on a dictionary base, and inputting the target continuous text into the word segmentation model;
and matching the target continuous text with entries in a dictionary library through the word segmentation model so as to segment the target continuous text based on a maximum matching principle.
3. The method for recognizing correspondence between symptom and part in text according to claim 1, wherein before the step of calling the long-short term memory model, the method further comprises:
configuring a network structure of a long-term and short-term memory model and acquiring a pre-training corpus;
and training the long-short term memory model based on the pre-training corpus so that the long-short term memory model converts the pre-training corpus into a labeling sequence conforming to a preset labeling rule.
4. The method according to claim 3, wherein the step of configuring the network structure of the long-term and short-term memory model comprises:
setting an initial model of the long-short term memory model, and enabling the initial model to sequentially comprise an input layer, a long-short term memory layer, a conditional random field layer and an output layer, wherein the long-short term memory layer comprises a forgetting gate, an input gate and an output gate;
and setting a global attention mechanism for the forgetting gate of the long and short term memory layer, and adding hidden layer output at the last moment into the input gate and the output gate to update the initial model.
5. The method for identifying the correspondence between the symptoms and the parts in the text according to claim 1, wherein the step of analyzing the target labeling sequence according to the labeling rule to obtain all text data matching relationships between the symptoms and the parts represented by the target labeling sequence comprises:
identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rules;
and matching a second entity for each first entity in sequence, and recording each pair of the correlated first entity and second entity as a group of text data matching relation.
6. The method according to claim 5, wherein the step of identifying a plurality of first entities representing parts and second entities representing symptoms in the target labeling sequence based on the labeling rules comprises:
determining the number of first entities and the number of sub-entities respectively contained by each first entity, and determining the number of second entities and the number of sub-entities respectively contained by each second entity;
if the number of the sub-entities contained in one first entity or one second entity is more than one, identifying a text parameter corresponding to each sub-entity contained in the first entity or the second entity, and combining the text parameters to represent the text parameters as the first entity or the second entity.
7. A device for recognizing correspondence between symptoms and parts in a text, comprising:
the instruction receiving module is used for receiving a relation identification instruction;
the data acquisition module is used for responding to the relation identification instruction and acquiring text data pointed by the relation identification instruction;
the text extraction module is used for identifying the text data and extracting a target continuous text for describing the diagnostic information;
the text conversion module is used for calling a long-short term memory model, inputting the target continuous text into the long-short term memory model and outputting a target labeling sequence corresponding to the target continuous text through the long-short term memory model;
and the relationship analysis module is used for reading a preset labeling rule, analyzing the target labeling sequence according to the labeling rule and acquiring all text data matching relationships between symptoms and parts represented by the target labeling sequence.
8. The apparatus for recognizing correspondence between symptom and part in text according to claim 7, wherein the text conversion module further includes: a word segmentation submodule;
the word segmentation submodule is used for calling a preset word segmentation model based on a dictionary base and inputting the target continuous text into the word segmentation model;
and matching the target continuous text with entries in a dictionary library through the word segmentation model so as to segment the target continuous text based on a maximum matching principle.
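Dictionary-based segmentation under the maximum matching principle can be sketched as follows. This is the standard forward maximum matching procedure, shown under the assumption that the claim refers to the forward variant (it does not say whether matching is forward, backward or bidirectional); the function name `forward_max_match` and the lexicon entries are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by taking the longest dictionary entry at each position."""
    if max_len is None:
        # Longest entry bounds the candidate window; never let it drop below 1.
        max_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical dictionary entries, for illustration only.
lexicon = {"右下腹", "下腹", "疼痛", "头部", "头", "胀痛"}
segments = forward_max_match("右下腹疼痛头部胀痛", lexicon)
```

Because the longest candidate is tried first, "右下腹" is kept whole rather than being split into "右" and "下腹", even though "下腹" is also in the lexicon.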
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for recognizing correspondence between symptoms and parts in text according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for recognizing correspondence between symptoms and parts in text according to any one of claims 1 to 6.
CN202010229316.3A 2020-03-27 2020-03-27 Method, device, equipment and medium for identifying corresponding relation between symptom and part in text Pending CN111581972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010229316.3A CN111581972A (en) 2020-03-27 2020-03-27 Method, device, equipment and medium for identifying corresponding relation between symptom and part in text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010229316.3A CN111581972A (en) 2020-03-27 2020-03-27 Method, device, equipment and medium for identifying corresponding relation between symptom and part in text

Publications (1)

Publication Number Publication Date
CN111581972A true CN111581972A (en) 2020-08-25

Family

ID=72122420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010229316.3A Pending CN111581972A (en) 2020-03-27 2020-03-27 Method, device, equipment and medium for identifying corresponding relation between symptom and part in text

Country Status (1)

Country Link
CN (1) CN111581972A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307766A (en) * 2020-09-22 2021-02-02 北京京东世纪贸易有限公司 Method, apparatus, electronic device and medium for identifying preset category entities
CN112365948A (en) * 2020-10-27 2021-02-12 沈阳东软智能医疗科技研究院有限公司 Cancer stage prediction system
CN113722464A (en) * 2021-09-14 2021-11-30 国泰君安证券股份有限公司 System, method, device, processor and storage medium for realizing named entity recognition processing aiming at security intelligent customer service system
CN113971210A (en) * 2021-12-27 2022-01-25 宇动源(北京)信息技术有限公司 Data dictionary generation method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522546A (en) * 2018-10-12 2019-03-26 浙江大学 Entity recognition method is named based on context-sensitive medicine
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Chinese electronic health record name entity recognition method
CN109871545A (en) * 2019-04-22 2019-06-11 京东方科技集团股份有限公司 Name entity recognition method and device
CN110083824A (en) * 2019-03-18 2019-08-02 昆明理工大学 A kind of Laotian segmenting method based on Multi-Model Combination neural network
CN110444259A (en) * 2019-06-06 2019-11-12 昆明理工大学 Traditional Chinese medical electronic case history entity relationship extracting method based on entity relationship mark strategy
CN110634546A (en) * 2019-08-14 2019-12-31 中国科学院苏州生物医学工程技术研究所 Electronic medical record text standardization detection method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522546A (en) * 2018-10-12 2019-03-26 浙江大学 Entity recognition method is named based on context-sensitive medicine
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Chinese electronic health record name entity recognition method
CN110083824A (en) * 2019-03-18 2019-08-02 昆明理工大学 A kind of Laotian segmenting method based on Multi-Model Combination neural network
CN109871545A (en) * 2019-04-22 2019-06-11 京东方科技集团股份有限公司 Name entity recognition method and device
CN110444259A (en) * 2019-06-06 2019-11-12 昆明理工大学 Traditional Chinese medical electronic case history entity relationship extracting method based on entity relationship mark strategy
CN110634546A (en) * 2019-08-14 2019-12-31 中国科学院苏州生物医学工程技术研究所 Electronic medical record text standardization detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
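杨晓辉;毕雪华;张琳琳;高颖: "Multi-task-based named entity recognition in Chinese electronic medical records", Journal of Northeast Normal University (Natural Science Edition), vol. 52, no. 1, 20 March 2020 (2020-03-20), pages 81 - 87 *
王嘉宁: "Named entity recognition and relation extraction based on deep learning", pages 1 - 27, Retrieved from the Internet <URL:https://blog.csdn.net/qq_36426650/article/details/84668741> *
王若佳;魏思仪;王继民: "Application of the BiLSTM-CRF model to named entity recognition in Chinese electronic medical records", Journal of Library and Data, vol. 1, no. 2, 30 June 2019 (2019-06-30), pages 53 - 66 *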

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307766A (en) * 2020-09-22 2021-02-02 北京京东世纪贸易有限公司 Method, apparatus, electronic device and medium for identifying preset category entities
CN112365948A (en) * 2020-10-27 2021-02-12 沈阳东软智能医疗科技研究院有限公司 Cancer stage prediction system
CN112365948B (en) * 2020-10-27 2023-07-18 沈阳东软智能医疗科技研究院有限公司 Cancer stage prediction system
CN113722464A (en) * 2021-09-14 2021-11-30 国泰君安证券股份有限公司 System, method, device, processor and storage medium for realizing named entity recognition processing aiming at security intelligent customer service system
CN113971210A (en) * 2021-12-27 2022-01-25 宇动源(北京)信息技术有限公司 Data dictionary generation method and device, electronic equipment and storage medium
CN113971210B (en) * 2021-12-27 2022-04-08 宇动源(北京)信息技术有限公司 Data dictionary generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111581972A (en) Method, device, equipment and medium for identifying corresponding relation between symptom and part in text
WO2021135469A1 (en) Machine learning-based information extraction method, apparatus, computer device, and medium
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
US20230169100A1 (en) Method and apparatus for information acquisition, electronic device, and computer-readable storage medium
US20220318286A1 (en) Data updating method and apparatus, electronic device and computer readable storage medium
CN110765754B (en) Text data typesetting method and device, computer equipment and storage medium
US11361002B2 (en) Method and apparatus for recognizing entity word, and storage medium
CN111985229A (en) Sequence labeling method and device and computer equipment
US11599727B2 (en) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN110866107A (en) Method and device for generating material corpus, computer equipment and storage medium
CN111221936B (en) Information matching method and device, electronic equipment and storage medium
CN111353311A (en) Named entity identification method and device, computer equipment and storage medium
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN111523316A (en) Medicine identification method based on machine learning and related equipment
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN114298035A (en) Text recognition desensitization method and system thereof
CN112614559A (en) Medical record text processing method and device, computer equipment and storage medium
CN113627797A (en) Image generation method and device for employee enrollment, computer equipment and storage medium
CN113297852B (en) Medical entity word recognition method and device
CN113868419B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN112199954B (en) Disease entity matching method and device based on voice semantics and computer equipment
CN110705211A (en) Text key content marking method and device, computer equipment and storage medium
CN112015866B (en) Method, device, electronic equipment and storage medium for generating synonymous text
CN110442858B (en) Question entity identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination