CN113919356A - Method, device, storage medium and electronic equipment for identifying medical entity - Google Patents

Method, device, storage medium and electronic equipment for identifying medical entity

Info

Publication number
CN113919356A
CN113919356A
Authority
CN
China
Prior art keywords
entity
medical
sample
model
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111229618.1A
Other languages
Chinese (zh)
Inventor
孙小婉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN202111229618.1A priority Critical patent/CN113919356A/en
Publication of CN113919356A publication Critical patent/CN113919356A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The present disclosure relates to a method, an apparatus, a storage medium, and an electronic device for identifying a medical entity. The method includes: acquiring a medical text to be recognized; inputting the medical text to be recognized into a trained medical entity recognition model to obtain an entity category recognized from the medical text to be recognized and the character starting position and character ending position, in the medical text to be recognized, of the corresponding medical entity under that entity category, wherein the medical entity recognition model includes a coding sub-model and an entity recognition sub-model, the coding sub-model is trained on a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is trained on the feature vector of the sample medical text output by the coding sub-model; and determining the medical entity from the medical text to be recognized according to the starting position and the ending position. The method of the present disclosure can improve the accuracy of medical entity recognition.

Description

Method, device, storage medium and electronic equipment for identifying medical entity
Technical Field
The present disclosure relates to the field of entity identification technologies, and in particular, to a method, an apparatus, a storage medium, and an electronic device for identifying a medical entity.
Background
Medical texts are highly specialized and abstract. In addition to common basic entities, such as short entities like "part", "disease", and "medication", medical texts also contain complex, abstract long entities, such as "surgical approach", "surgical operation", and "surgical margin" in surgical record texts.
In the related art, an LSTM model and a CRF model are used jointly to identify medical named entities; however, this identification method has low accuracy.
Disclosure of Invention
An object of the present disclosure is to provide a method, an apparatus, a storage medium, and an electronic device for identifying a medical entity, so as to solve the problems in the related art.
To achieve the above object, a first aspect of the embodiments of the present disclosure provides a method of identifying a medical entity, the method including:
acquiring a medical text to be identified;
inputting the medical text to be recognized into a trained medical entity recognition model to obtain an entity category recognized from the medical text to be recognized and the character starting position and character ending position, in the medical text to be recognized, of the corresponding medical entity under that entity category, wherein the medical entity recognition model includes a coding sub-model and an entity recognition sub-model, the coding sub-model is trained on a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is trained on the feature vector of the sample medical text output by the coding sub-model;
and determining the medical entity from the medical text to be recognized according to the starting position and the ending position.
Optionally, the entity recognition sub-model is configured to:
after the coding sub-model outputs the feature vector of the medical text to be recognized, determining, for each word vector in the feature vector of the medical text to be recognized, whether the word vector is a starting word vector corresponding to any entity category and whether the word vector is an ending word vector corresponding to any entity category;
for each entity category, determining a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, according to the position of each starting word vector of that entity category in the feature vector of the medical text to be recognized and the position of each ending word vector of that entity category in the feature vector of the medical text to be recognized;
and judging, for each head-tail vector position pair, whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Optionally, in the training process of the medical entity recognition model, the coding sub-model to be trained is used for:
coding the sample medical text representing a single sample medical entity and the sample question text corresponding to the single sample medical entity to obtain a sample feature vector;
and deleting the sample question text feature vector corresponding to the sample question text from the sample feature vector to obtain the feature vector of the sample medical text to be input into the entity recognition sub-model to be trained.
Optionally, the entity recognition sub-model includes a start character multi-classifier, an end character multi-classifier, and an entity prediction sub-model; in the training process of the medical entity recognition model, the entity recognition sub-model to be trained is configured to:
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the start character multi-classifier to be trained to obtain a result representing whether the sample word vector is a starting word vector and, in the case that the sample word vector is a starting word vector, the entity category corresponding to the sample word vector; and
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the end character multi-classifier to be trained to obtain a result representing whether the sample word vector is an ending word vector and, in the case that the sample word vector is an ending word vector, the entity category corresponding to the sample word vector;
for each entity category, determining a head-tail vector position pair set consisting of the position of the start word vector and the position of the end word vector according to the position of each start word vector of the entity category in the feature vector of the sample medical text and the position of each end word vector in the feature vector of the sample medical text;
and inputting each head-tail vector position pair into the entity prediction sub-model to be trained to obtain an output result representing whether the position of the initial word vector and the position of the end word vector in the head-tail vector position pair represent the positions of head-tail characters of the same medical entity under the entity category.
Optionally, in the training process of the medical entity recognition model, the method further includes:
calculating first loss information according to an output result of the start character multi-classifier to be trained and a real label representing whether the sample word vector is the first character of the single sample medical entity;
calculating second loss information according to the output result of the end character multi-classifier to be trained and a real label representing whether the sample word vector is the last character of the single sample medical entity;
calculating third loss information according to an output result of the entity prediction sub-model to be trained and a real label for representing whether the head-tail vector position represents the head-tail character position of the single sample medical entity;
and obtaining the trained medical entity recognition model when the weighted sum of the first loss information, the second loss information and the third loss information is minimum.
Optionally, the coding sub-model to be trained is a pre-trained BERT model.
Optionally, the sample question text is constructed based on noun interpretation of entity categories corresponding to the single sample medical entity.
According to a second aspect of embodiments of the present disclosure there is provided an apparatus for identifying a medical entity, the apparatus comprising:
the acquisition module is used for acquiring a medical text to be recognized;
the input module is used for inputting the medical text to be recognized into a trained medical entity recognition model to obtain an entity category recognized from the medical text to be recognized and the character starting position and character ending position, in the medical text to be recognized, of the corresponding medical entity under that entity category, wherein the medical entity recognition model includes a coding sub-model and an entity recognition sub-model, the coding sub-model is trained on a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is trained on the feature vector of the sample medical text output by the coding sub-model;
and the execution module is used for determining the medical entity from the medical text to be recognized according to the starting position and the ending position.
Optionally, the entity recognition sub-model is configured to:
after the coding sub-model outputs the feature vector of the medical text to be recognized, determining, for each word vector in the feature vector of the medical text to be recognized, whether the word vector is a starting word vector corresponding to any entity category and whether the word vector is an ending word vector corresponding to any entity category; for each entity category, determining a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, according to the position of each starting word vector of that entity category in the feature vector of the medical text to be recognized and the position of each ending word vector of that entity category in the feature vector of the medical text to be recognized; and judging, for each head-tail vector position pair, whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Optionally, in the training process of the medical entity recognition model, the coding sub-model to be trained is used for:
coding the sample medical text representing a single sample medical entity and the sample question text corresponding to the single sample medical entity to obtain a sample feature vector; and deleting the sample question text feature vector corresponding to the sample question text from the sample feature vector to obtain the feature vector of the sample medical text to be input into the entity recognition sub-model to be trained.
Optionally, the entity recognition sub-model includes a start character multi-classifier, an end character multi-classifier, and an entity prediction sub-model; in the training process of the medical entity recognition model, the entity recognition sub-model to be trained is configured to:
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the start character multi-classifier to be trained to obtain a result representing whether the sample word vector is a starting word vector and, in the case that the sample word vector is a starting word vector, the entity category corresponding to the sample word vector; for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the end character multi-classifier to be trained to obtain a result representing whether the sample word vector is an ending word vector and, in the case that the sample word vector is an ending word vector, the entity category corresponding to the sample word vector; for each entity category, determining a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, according to the position of each starting word vector of that entity category in the feature vector of the sample medical text and the position of each ending word vector of that entity category in the feature vector of the sample medical text; and inputting each head-tail vector position pair into the entity prediction sub-model to be trained to obtain an output result representing whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Optionally, the apparatus further includes a calculating module configured, during the training of the medical entity recognition model, to calculate first loss information according to the output result of the start character multi-classifier to be trained and a real label representing whether the sample word vector is the first character of the single sample medical entity; calculate second loss information according to the output result of the end character multi-classifier to be trained and a real label representing whether the sample word vector is the last character of the single sample medical entity; calculate third loss information according to the output result of the entity prediction sub-model to be trained and a real label representing whether the head-tail vector position pair represents the positions of the head and tail characters of the single sample medical entity; and obtain the trained medical entity recognition model when the weighted sum of the first loss information, the second loss information, and the third loss information is minimum.
Optionally, the coding sub-model to be trained is a pre-trained BERT model.
Optionally, the sample question text is constructed based on noun interpretation of entity categories corresponding to the single sample medical entity.
A third aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect.
A fourth aspect of the embodiments of the present disclosure provides an electronic apparatus, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method according to the first aspect.
By adopting the above technical solution, at least the following beneficial technical effects can be achieved:
the medical entity identification method comprises the steps of obtaining a medical text to be identified, inputting the medical text to be identified into a trained medical entity identification model, obtaining an entity category identified from the medical text to be identified and a character starting position and a character ending position of a corresponding medical entity under the entity category in the medical text to be identified, and determining the medical entity from the medical text to be identified according to the starting position and the ending position. The coding sub-model in the medical entity recognition model is obtained by training according to the sample medical text and the sample problem text corresponding to the sample medical text, and the sample problem text can assist the coding sub-model to better understand the semantics of the sample medical text in the training process, so that the coding sub-model can learn the capability of coding the medical entities expressed by the same meaning (the same entity category) and different characters in the sample medical text into the same or similar feature vectors on the basis of understanding the semantics of the sample medical text. Therefore, the coding sub-model obtained by training in the mode can code the medical text to be recognized into more accurate characteristic vectors, and the entity recognition sub-model can decode the character starting position and the character ending position corresponding to the medical long entity/medical short entity more accurately according to the accurate characteristic vectors, so that the accurate medical entity is obtained. Therefore, by adopting the technical scheme disclosed by the invention, the accuracy of medical entity identification can be improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
fig. 1 is a flow chart illustrating a method of identifying a medical entity according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a medical entity recognition model architecture to be trained in accordance with an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram illustrating an apparatus for identifying a medical entity according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating another electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
In the related art, an LSTM model and a CRF model are used jointly to identify medical named entities. The LSTM model is a Long Short-Term Memory model, and the CRF model is a Conditional Random Field model.
Because the LSTM model is trained only on the sample medical text, it may encode the same entity category expressed by different characters into feature vectors with large differences. As a result, the feature vector of a single entity may be split into several sub-vectors by the CRF model during word segmentation, so that a long entity is broken apart and the broken long entity cannot be identified. The accuracy of this entity identification method in the related art is therefore low.
In view of this, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for identifying a medical entity, so as to improve accuracy of medical entity identification.
Fig. 1 is a flow chart illustrating a method of identifying a medical entity according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the method of identifying a medical entity comprises the steps of:
and S11, acquiring the medical text to be recognized.
The medical text to be recognized includes one or more medical entities. In a possible case, the medical entity may not exist in the medical text to be recognized, and in this case, the medical entity recognition model may output a result representing that the medical entity does not exist.
S12, inputting the medical text to be recognized into the trained medical entity recognition model, and obtaining the entity category recognized from the medical text to be recognized and the corresponding character starting position and ending position of the medical entity under the entity category in the medical text to be recognized.
The medical entity recognition model includes a coding sub-model and an entity recognition sub-model. The coding sub-model is trained on a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is trained on the feature vector of the sample medical text output by the coding sub-model.
In one possible embodiment, the coding sub-model to be trained is a pre-trained BERT model.
It is worth explaining that the BERT model is a transfer learning model. The pre-trained BERT model can be obtained by multi-task learning (including a masked language model training task and a next sentence prediction task) on the basis of a bidirectional deep Transformer network. The pre-trained BERT model employed in the present disclosure may be a pre-trained BERT model already disclosed in the related art.
Transfer training is performed on the pre-trained BERT model according to the sample medical text and the sample question text corresponding to the sample medical text, so that the pre-trained BERT model is adapted to the medical text data set. Training the pre-trained BERT model on the sample medical texts and their corresponding sample question texts in this way is an unsupervised training method.
It should be noted that the principle of training the coding sub-model on the sample medical text and the corresponding sample question text is similar to that of machine reading comprehension in the related art. Machine Reading Comprehension (MRC) is a technique that uses algorithms to enable a computer to understand the semantics of a passage and answer related questions; that is, it gives a machine the ability to understand natural language and answer a given question based on a given context.
The coding sub-model is trained on the sample medical text and the sample question text corresponding to the sample medical text, and its coding space is better suited to encoding the set of medical entities than a coding space obtained by training only on the sample medical text.
S13, determining the medical entity from the medical text to be recognized according to the starting position and the ending position.
When the character starting position and character ending position of a medical entity in the medical text to be recognized are known, the text segment representing the medical entity can be determined from the medical text to be recognized according to the starting position and ending position of the characters.
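Step S13 amounts to simple substring extraction once the boundary positions are known. The following minimal sketch assumes inclusive character indices and uses a helper function named here for illustration only; it is not code from the patent.

```python
def extract_entities(text, spans):
    """Recover entity strings from (entity_category, start, end) predictions.

    Assumes `start` and `end` are inclusive character indices into `text`, i.e. the
    character starting position and character ending position described above.
    """
    return [(category, text[start:end + 1]) for category, start, end in spans]

# Hypothetical usage; the category name and indices below are illustrative only.
print(extract_entities("左半肝切除术", [("resection mode", 0, 5)]))
# [('resection mode', '左半肝切除术')]
```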
With the method of the present disclosure, the medical text to be recognized is acquired and input into the trained medical entity recognition model to obtain the entity category recognized from the medical text to be recognized and the character starting position and character ending position, in the medical text to be recognized, of the corresponding medical entity under that entity category, and the medical entity can be determined from the medical text to be recognized according to the starting position and the ending position. The coding sub-model in the medical entity recognition model is trained on a sample medical text and a sample question text corresponding to the sample medical text. During training, the sample question text helps the coding sub-model better understand the semantics of the sample medical text, so that the coding sub-model can learn, on the basis of understanding those semantics, to encode medical entities that express the same meaning (the same entity category) with different characters into identical or similar feature vectors. A coding sub-model trained in this way can therefore encode the medical text to be recognized into more accurate feature vectors, and the entity recognition sub-model can, from these accurate feature vectors, more accurately decode the character starting position and character ending position of a long or short medical entity, thereby obtaining an accurate medical entity. Consequently, the technical solution of the present disclosure can improve the accuracy of medical entity recognition.
Optionally, in the above step S12, the entity recognition sub-model is configured to perform the following steps:
S121, after the coding sub-model outputs the feature vector of the medical text to be recognized, determining, for each word vector in the feature vector of the medical text to be recognized, whether the word vector is a starting word vector corresponding to any entity category and whether the word vector is an ending word vector corresponding to any entity category.
And inputting the medical text to be recognized into the coding sub-model to obtain the feature vector of the medical text to be recognized output by the coding sub-model. It is easy to understand that the feature vector of the medical text to be recognized includes a word vector of each word in the medical text to be recognized.
For each word vector in the feature vector of the medical text to be recognized, the entity recognition sub-model determines whether the word vector is a starting word vector corresponding to any entity category and whether it is an ending word vector corresponding to any entity category.
Here, any entity category refers to any one or more entity categories. Illustratively, if the medical text to be recognized is "left hemihepatectomy", the word vector corresponding to the character "left" may be the starting word vector of an orientation-category entity in "left hemihepatectomy", or the starting word vector of the resection-mode-category entity "left hemihepatectomy".
Specifically, for each word vector in the feature vector of the medical text to be recognized, the entity recognition sub-model determines whether the word vector is a starting word vector and, if so, which one or more entity categories that starting word vector corresponds to. Similarly, for each word vector in the feature vector of the medical text to be recognized, the entity recognition sub-model determines whether the word vector is an ending word vector and, if so, which one or more entity categories that ending word vector corresponds to.
And S122, for each entity category, determining a head-tail vector position pair set consisting of the position of the start word vector and the position of the end word vector according to the position of each start word vector of the entity category in the feature vector of the medical text to be identified and the position of each end word vector in the feature vector of the medical text to be identified.
In specific implementation, for each entity category, a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, is determined according to the position of each starting word vector and the position of each ending word vector of that entity category in the feature vector of the medical text to be recognized; the set may include zero, one, or multiple head-tail vector position pairs. For example, assume that an entity category has starting word vectors A, B, and C and ending word vectors X and Y. The positions of A, B, and C in the feature vector of the medical text to be recognized are a, b, and c, respectively, and the positions of X and Y are x and y, respectively. In the case where the feature vector of the medical text to be recognized is represented as a matrix, a, b, c, x, and y each identify a certain column (or row) of the matrix. From these positions, the head-tail vector position pairs are determined to be (a, x), (a, y), (b, x), (b, y), (c, x), and (c, y).
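The pair enumeration in the example above can be sketched as follows. This is an illustrative reconstruction rather than code from the patent: per the description, every starting position of a category is paired with every ending position of the same category, and the entity prediction sub-model later decides which pairs are real entities.

```python
from itertools import product

def head_tail_pairs(start_positions, end_positions):
    """Form the set of head-tail vector position pairs for one entity category."""
    return list(product(start_positions, end_positions))

# Start positions a, b, c and end positions x, y yield six candidate pairs:
print(head_tail_pairs(["a", "b", "c"], ["x", "y"]))
# [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y'), ('c', 'x'), ('c', 'y')]
```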
S123, judging whether the position of the initial word vector and the position of the end word vector in the head-tail vector position pair represent the positions of head-tail characters of the same medical entity under the entity category or not aiming at each head-tail vector position pair.
And judging whether the position of the initial word vector and the position of the ending word vector in each head-tail vector position pair represent the positions of head-tail characters of the same medical entity under the corresponding entity category or not. It is worth explaining here that the same entity class corresponds to a plurality of different medical entities. For example, in the following embodiments, the entity category in table 2 is "resection mode", and the entity category may correspond to the medical entity "left hemihepatectomy", the medical entity "caudate lobe resection", and the like.
When any head-tail vector position pair is determined to represent the positions of the head and tail characters of the same medical entity, the character starting position of the medical entity, corresponding to the position of the starting word vector in that pair, and the character ending position of the medical entity, corresponding to the position of the ending word vector in that pair, are output.
With the entity recognition sub-model of the present disclosure, it is possible to determine, for each word vector, whether it is a starting word vector and which one or more entity categories it corresponds to, and likewise whether it is an ending word vector and which one or more entity categories it corresponds to. Further, for each head-tail vector position pair under any entity category, it is judged whether the pair marks the head and tail positions of the same medical entity under that category. In this way, all long entities and short entities in the medical text to be recognized can be identified accurately; for example, a long entity and a short entity nested inside it can both be identified, and multiple medical entities sharing the same starting character can be identified.
Optionally, in the training process of the medical entity recognition model, the coding sub-model to be trained is used for:
coding the sample medical text representing a single sample medical entity and the sample question text corresponding to the single sample medical entity to obtain a sample feature vector; and deleting the sample question text feature vector corresponding to the sample question text from the sample feature vector to obtain the feature vector of the sample medical text to be input into the entity recognition sub-model to be trained.
Illustratively, referring to FIG. 2, a sample medical text (x1, x2, …) and a sample question text (q1, q2, …) are input into the coding sub-model to be trained, which encodes the sample medical text and the sample question text to obtain a sample feature vector E as shown in FIG. 2. The sample question text feature vector corresponding to the sample question text is deleted from the sample feature vector E, resulting in the feature vector E' of the sample medical text as shown in FIG. 2.
It should be noted that, when the medical entity recognition model is applied, only the medical text to be recognized is input into the coding sub-model; therefore, during application the coding sub-model does not perform the feature-vector deletion step used in training.
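A rough sketch of the encode-then-delete step is given below, assuming a HuggingFace-style pre-trained Chinese BERT is used as the coding sub-model. The library, the model name, and the use of token_type_ids to locate and drop the question segment are assumptions made for illustration; the patent itself only specifies that the question part of the sample feature vector E is deleted to obtain E'.

```python
import torch
from transformers import BertModel, BertTokenizerFast  # assumed tooling, not specified in the patent

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode_sample(sample_question_text, sample_medical_text):
    """Encode [question; medical text] jointly, then keep only the medical-text vectors."""
    enc = tokenizer(sample_question_text, sample_medical_text, return_tensors="pt")
    E = encoder(**enc).last_hidden_state[0]        # sample feature vector E
    # Segment id 0 covers the question (and its special tokens); 1 covers the medical text.
    keep = enc["token_type_ids"][0] == 1
    E_prime = E[keep]                              # feature vector E' of the sample medical text
    return E_prime                                 # (the trailing [SEP] could also be dropped)
```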
Optionally, the sample question text is constructed based on noun interpretation of entity categories corresponding to the single sample medical entity.
Here, a single sample medical entity means that the sample is a text sample corresponding to one medical entity.
In one possible embodiment, for a single sample medical entity, the corresponding sample question text may be constructed according to the noun interpretation of the entity category (i.e., entity concept/entity name) of the single medical entity. Illustratively, the entity categories and corresponding question texts are shown in table 1 below.
Entity category | Question text
Surgical margin | Find the entity in the text that denotes the margin of the surgically excised tissue.
Resection mode | Find the surgical resection mode in the text.
TABLE 1
In another possible embodiment, for a single sample medical entity, the corresponding sample question text may be constructed jointly from the noun explanation of the entity category (i.e., entity concept/entity name) of the single medical entity and the text expression features of the sample medical text characterizing the single sample medical entity. Illustratively, as shown in table 2 below:
[Table content presented as an image in the original publication: example entity categories and the question texts constructed from their definitions together with the text expression features of the sample medical texts.]
TABLE 2
Optionally, the entity recognition sub-model includes a start character multi-classifier, an end character multi-classifier, and an entity prediction sub-model; in the training process of the medical entity recognition model, the entity recognition sub-model to be trained is configured to perform the following steps:
step one, aiming at each sample word vector in the feature vectors of the sample medical texts, inputting the sample word vector into the initial character multi-classifier to be trained to obtain a result representing whether the sample word vector is an initial word vector or not, and the entity category corresponding to the sample word vector under the condition that the sample word vector is the initial word vector.
For example, as shown in fig. 2, the start character multi-classifier and the end character multi-classifier may both adopt a softmax classifier, and each entity classification category y of the start character multi-classifier and the end character multi-classifier satisfies y ∈ Y, where Y denotes the list of all medical entity categories in the softmax classifier.
In specific implementation, for each sample word vector in the feature vector of the sample medical text, the entity recognition sub-model inputs the sample word vector into the start character multi-classifier to be trained for classification, so as to obtain a classification result indicating that the sample word vector is a starting word vector of one or more entity categories, or a classification result indicating that the sample word vector is not a starting word vector.
Step two, for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the end character multi-classifier to be trained to obtain a result representing whether the sample word vector is an ending word vector and, in the case that the sample word vector is an ending word vector, the entity category corresponding to the sample word vector.
In an example, for each sample word vector in the feature vector of the sample medical text, the entity recognition sub-model inputs the sample word vector into the end character multi-classifier to be trained for classification, so as to obtain a classification result indicating that the sample word vector is an ending word vector of one or more entity categories, or a classification result indicating that the sample word vector is not an ending word vector.
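As one way to picture the two multi-classifiers, the sketch below uses a per-token linear layer followed by softmax over the entity categories plus an extra "not a start (or end)" class, with argmax decoding to recover positions. The layer shape, the extra class, and the decoding rule are assumptions for illustration and are not specified in the patent.

```python
import torch
import torch.nn as nn

class BoundaryClassifier(nn.Module):
    """Per-token multi-classifier, usable as either the start or the end character classifier."""
    def __init__(self, hidden_size, num_categories):
        super().__init__()
        # Class 0 means "not a starting (or ending) word vector"; classes 1..num_categories
        # correspond to the medical entity categories in Y.
        self.linear = nn.Linear(hidden_size, num_categories + 1)

    def forward(self, token_features):                 # (seq_len, hidden_size)
        return torch.softmax(self.linear(token_features), dim=-1)

def positions_for_category(probs, category_index):
    """Positions whose predicted class equals the given entity category."""
    labels = probs.argmax(dim=-1)                      # predicted class per token
    return [i for i, label in enumerate(labels.tolist()) if label == category_index]
```

Where a single character may start (or end) entities of more than one category, a per-category probability threshold could be used instead of the argmax decoding shown here.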
And step three, for each entity category, determining a head-tail vector position pair set consisting of the position of the start word vector and the position of the end word vector according to the position of each start word vector of the entity category in the feature vector of the sample medical text and the position of each end word vector of the entity category in the feature vector of the sample medical text.
For example, assume that an entity category has starting word vectors A, B, and C and ending word vectors X and Y. The positions of A, B, and C in the feature vector of the sample medical text are a, b, and c, respectively, and the positions of X and Y are x and y, respectively. In the case where the feature vector of the sample medical text is represented as a matrix, the positions a, b, c, x, and y each identify a certain column (or row) of the matrix. From these positions, the head-tail vector position pairs are determined to be (a, x), (a, y), (b, x), (b, y), (c, x), and (c, y).
In an implementable embodiment, the position of any starting word vector in the feature vector of the sample medical text is determined by a formula (presented as an image in the original publication) in which one symbol denotes the starting word vector in the i-th row (or column) of the matrix formed by the starting word vectors, and another symbol denotes the position (or position index) of that starting word vector. Similarly, the position of any ending word vector in the feature vector of the sample medical text is determined by a corresponding formula in which one symbol denotes the ending word vector in the j-th row (or column) of the matrix formed by the ending word vectors, and another symbol denotes the position (or position index) of that ending word vector.
Step four, inputting each head-tail vector position pair into the entity prediction sub-model to be trained to obtain an output result representing whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Illustratively, the head-tail vector position pair (a, x) is input into the entity prediction sub-model to be trained to obtain an output result representing whether the position a of the starting word vector and the position x of the ending word vector in the pair (a, x) represent the positions of the head and tail characters of the same medical entity under the corresponding entity category. If they do, the starting word vector A and the ending word vector X corresponding to the pair (a, x) are the encoding vectors of the head and tail characters of the same entity.
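One plausible realization of the entity prediction sub-model is a binary classifier over the concatenated starting and ending word vectors of a candidate pair, as sketched below. The concatenation-plus-linear design is an assumption made for illustration; the patent only states that each head-tail vector position pair is classified as bounding, or not bounding, a single medical entity.

```python
import torch
import torch.nn as nn

class SpanPredictor(nn.Module):
    """Decide whether a (start position, end position) pair bounds one medical entity."""
    def __init__(self, hidden_size):
        super().__init__()
        self.scorer = nn.Linear(2 * hidden_size, 2)   # classes: not an entity span / entity span

    def forward(self, start_vector, end_vector):
        pair = torch.cat([start_vector, end_vector], dim=-1)
        return torch.softmax(self.scorer(pair), dim=-1)

# Hypothetical usage for the pair (a, x) from the example above, with E_prime as the
# feature vector of the text: keep the pair if the "entity span" probability wins.
# probs = span_predictor(E_prime[a], E_prime[x]); is_entity = probs[1] > probs[0]
```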
Optionally, in the training process of the medical entity recognition model, the method further includes the following steps:
And calculating first loss information according to the output result of the start character multi-classifier to be trained and a real label representing whether the sample word vector is the first character of the single sample medical entity.
Illustratively, the formula for calculating the first loss information may be expressed as loss_start = cross_entropy(P_start, Y_start), where P_start = softmax_each_row(E'_n, C_start), E'_n denotes the n-th sample word vector in the feature vector E', C_start denotes the hyper-parameters of the start character multi-classifier, P_start denotes the classification result of the start character multi-classifier on the sample word vectors, and Y_start denotes the real labels of the sample word vectors.
And calculating second loss information according to the output result of the end character multi-classifier to be trained and a real label representing whether the sample word vector is the last character of the single sample medical entity.
Illustratively, the formula for calculating the second loss information may be expressed as loss_end = cross_entropy(P_end, Y_end), where P_end = softmax_each_row(E'_n, C_end), E'_n denotes the n-th sample word vector in the feature vector E', C_end denotes the hyper-parameters of the end character multi-classifier, P_end denotes the classification result of the end character multi-classifier on the sample word vectors, and Y_end denotes the real labels of the sample word vectors.
And calculating third loss information according to an output result of the entity prediction sub-model to be trained and a real label for representing whether the head-tail vector position represents the head-tail character position of the single sample medical entity.
Illustratively, the formula for calculating the third loss information may be expressed as loss_span = cross_entropy(P_start,end, Y_start,end), where P_start,end denotes the output of the entity prediction sub-model computed for each head-tail vector position pair (the explicit formula is presented as an image in the original publication), E'_i_start denotes the position of the i-th starting word vector, E'_j_end denotes the position of the j-th ending word vector, and Y_start,end denotes the real labels of the head-tail vector position pairs.
And obtaining the trained medical entity recognition model when the weighted sum of the first loss information, the second loss information and the third loss information is minimum.
Illustratively, the weighted sum of the first loss information, the second loss information, and the third loss information may be expressed as loss = α·loss_start + β·loss_end + γ·loss_span, where α, β, γ ∈ [0, 1] are hyper-parameters used to control the relative importance of the three loss terms during training. The medical entity recognition model corresponding to the minimum value of this weighted sum during training is the trained medical entity recognition model.
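A compact sketch of the combined objective is given below, assuming the three losses are computed with PyTorch's cross-entropy (which takes raw logits and applies softmax internally, whereas the notation above already applies softmax). The default weights are placeholders; the patent only requires α, β, γ ∈ [0, 1].

```python
import torch.nn.functional as F

def total_loss(start_logits, start_labels, end_logits, end_labels,
               span_logits, span_labels, alpha=0.4, beta=0.4, gamma=0.2):
    """loss = alpha*loss_start + beta*loss_end + gamma*loss_span."""
    loss_start = F.cross_entropy(start_logits, start_labels)  # first loss information
    loss_end = F.cross_entropy(end_logits, end_labels)        # second loss information
    loss_span = F.cross_entropy(span_logits, span_labels)     # third loss information
    return alpha * loss_start + beta * loss_end + gamma * loss_span
```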
Experiments were carried out with the method of the present disclosure and with the related-art LSTM+CRF approach on the same medical text data set to be identified (specifically, a liver cancer surgery record data set), with the following results:

Model | F1
LSTM+CRF | 83.84
Method of the present disclosure | 91.11
TABLE 3

F1 is the harmonic mean of precision and recall. As can be seen from Table 3, the method of identifying a medical entity of the present disclosure performs better than the method in the related art.
Based on the same inventive concept, an embodiment of the present disclosure further provides an apparatus for identifying a medical entity. As shown in fig. 3, the apparatus 300 includes:
an obtaining module 310, configured to obtain a medical text to be recognized;
the input module 320 is configured to input the medical text to be recognized into a trained medical entity recognition model, and obtain an entity category recognized from the medical text to be recognized and a character start position and a character end position of a corresponding medical entity in the medical text to be recognized under the entity category, where the medical entity recognition model includes a coding sub-model and an entity recognition sub-model, the coding sub-model is trained according to a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is trained based on a feature vector of the sample medical text output by the coding sub-model;
an executing module 330, configured to determine the medical entity from the medical text to be recognized according to the starting position and the ending position.
With the apparatus of the present disclosure, the medical text to be recognized is acquired and input into the trained medical entity recognition model to obtain the entity category recognized from the medical text to be recognized and the character starting position and character ending position, in the medical text to be recognized, of the corresponding medical entity under that entity category, and the medical entity can be determined from the medical text to be recognized according to the starting position and the ending position. The coding sub-model in the medical entity recognition model is trained on a sample medical text and a sample question text corresponding to the sample medical text. During training, the sample question text helps the coding sub-model better understand the semantics of the sample medical text, so that the coding sub-model can learn, on the basis of understanding those semantics, to encode medical entities that express the same meaning (the same entity category) with different characters into identical or similar feature vectors. A coding sub-model trained in this way can therefore encode the medical text to be recognized into more accurate feature vectors, and the entity recognition sub-model can, from these accurate feature vectors, more accurately decode the character starting position and character ending position of a long or short medical entity, thereby obtaining an accurate medical entity. Consequently, the technical solution of the present disclosure can improve the accuracy of medical entity recognition.
Optionally, the entity recognition sub-model is configured to:
after the coding sub-model outputs the feature vector of the medical text to be recognized, determining, for each word vector in the feature vector of the medical text to be recognized, whether the word vector is a starting word vector corresponding to any entity category and whether the word vector is an ending word vector corresponding to any entity category; for each entity category, determining a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, according to the position of each starting word vector of that entity category in the feature vector of the medical text to be recognized and the position of each ending word vector of that entity category in the feature vector of the medical text to be recognized; and judging, for each head-tail vector position pair, whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Optionally, in the training process of the medical entity recognition model, the coding sub-model to be trained is used for:
coding the sample medical text representing a single sample medical entity and the sample question text corresponding to the single sample medical entity to obtain a sample feature vector; and deleting the sample question text feature vector corresponding to the sample question text from the sample feature vector to obtain the feature vector of the sample medical text to be input into the entity recognition sub-model to be trained.
Optionally, the entity recognition sub-model includes a start character multi-classifier, an end character multi-classifier, and an entity prediction sub-model; in the training process of the medical entity recognition model, the entity recognition sub-model to be trained is configured to:
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the start character multi-classifier to be trained to obtain a result representing whether the sample word vector is a starting word vector and, in the case that the sample word vector is a starting word vector, the entity category corresponding to the sample word vector; for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the end character multi-classifier to be trained to obtain a result representing whether the sample word vector is an ending word vector and, in the case that the sample word vector is an ending word vector, the entity category corresponding to the sample word vector; for each entity category, determining a set of head-tail vector position pairs, each consisting of the position of a starting word vector and the position of an ending word vector, according to the position of each starting word vector of that entity category in the feature vector of the sample medical text and the position of each ending word vector of that entity category in the feature vector of the sample medical text; and inputting each head-tail vector position pair into the entity prediction sub-model to be trained to obtain an output result representing whether the position of the starting word vector and the position of the ending word vector in the pair represent the positions of the head and tail characters of the same medical entity under that entity category.
Optionally, the apparatus further includes a calculating module configured, during the training of the medical entity recognition model, to calculate first loss information according to the output result of the start character multi-classifier to be trained and a real label representing whether the sample word vector is the first character of the single sample medical entity; calculate second loss information according to the output result of the end character multi-classifier to be trained and a real label representing whether the sample word vector is the last character of the single sample medical entity; calculate third loss information according to the output result of the entity prediction sub-model to be trained and a real label representing whether the head-tail vector position pair represents the positions of the head and tail characters of the single sample medical entity; and obtain the trained medical entity recognition model when the weighted sum of the first loss information, the second loss information, and the third loss information is minimum.
Optionally, the coding sub-model to be trained is a pre-trained BERT model.
Optionally, the sample question text is constructed based on noun interpretation of entity categories corresponding to the single sample medical entity.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The disclosed embodiments also provide a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the method of any of the above embodiments.
Fig. 4 is a block diagram illustrating an electronic device 700 according to an example embodiment. As shown in fig. 4, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700 to complete all or part of the steps of the method for identifying a medical entity. The memory 702 is used to store various types of data to support operation at the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and the like. The Memory 702 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk. The multimedia components 703 may include screen and audio components. Wherein the screen may be, for example, a touch screen and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio assembly also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless Communication, such as Wi-Fi, bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IOT, eMTC, or other 5G, etc., or a combination of one or more of them, which is not limited herein. The corresponding communication component 705 may thus include: Wi-Fi module, Bluetooth module, NFC module, etc.
In an exemplary embodiment, the electronic Device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described method for identifying medical entities.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions which, when executed by a processor, carry out the steps of the above-described method of identifying a medical entity is also provided. For example, the computer readable storage medium may be the memory 702 described above comprising program instructions executable by the processor 701 of the electronic device 700 to perform the method of identifying a medical entity described above.
Fig. 5 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 5, an electronic device 1900 includes a processor 1922, which may be one or more in number, and a memory 1932 for storing computer programs executable by the processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the above-described method of identifying a medical entity.
Additionally, the electronic device 1900 may further include a power component 1926 and a communication component 1950. The power component 1926 may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to enable wired or wireless communication of the electronic device 1900. In addition, the electronic device 1900 may also include an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, and so on.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions which, when executed by a processor, carry out the steps of the above-described method of identifying a medical entity is also provided. For example, the non-transitory computer readable storage medium may be the memory 1932 described above that includes program instructions executable by the processor 1922 of the electronic device 1900 to perform the method of identifying a medical entity described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method of identifying a medical entity when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within the scope of the technical idea of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of the various embodiments of the present disclosure may be made, and such combinations should likewise be regarded as content disclosed by the present disclosure, as long as they do not depart from the spirit of the present disclosure.

Claims (10)

1. A method of identifying a medical entity, the method comprising:
acquiring a medical text to be recognized;
inputting the medical text to be recognized into a trained medical entity recognition model to obtain an entity category recognized from the medical text to be recognized and a character starting position and a character ending position, in the medical text to be recognized, of a corresponding medical entity under the entity category, wherein the medical entity recognition model comprises a coding sub-model and an entity recognition sub-model, the coding sub-model is obtained by training according to a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is obtained by training based on a feature vector of the sample medical text output by the coding sub-model;
and determining the medical entity from the medical text to be recognized according to the starting position and the ending position.
2. The method of claim 1, wherein the entity recognition sub-model is configured to:
after the coding sub-model outputs the feature vector of the medical text to be recognized, for each word vector in the feature vector of the medical text to be recognized, determining whether the word vector is a start word vector corresponding to any entity category and determining whether the word vector is an end word vector corresponding to any entity category;
for each entity category, determining a head-tail vector position pair set consisting of the position of the start word vector and the position of the end word vector according to the position of each start word vector of the entity category in the feature vector of the medical text to be recognized and the position of each end word vector in the feature vector of the medical text to be recognized;
and for each head-tail vector position pair, judging whether the position of the start word vector and the position of the end word vector in the head-tail vector position pair represent the positions of the head and tail characters of the same medical entity under the entity category.
3. The method of claim 1, wherein during the training of the medical entity recognition model, the coding sub-model to be trained is used to:
coding the sample medical text representing a single sample medical entity and the sample question text corresponding to the single sample medical entity to obtain a sample feature vector;
and deleting the sample question text feature vector corresponding to the sample question text from the sample feature vector to obtain the feature vector of the sample medical text for inputting the entity recognition sub-model to be trained.
4. The method according to claim 3, wherein the entity recognition sub-model comprises a start character multi-classifier, an end character multi-classifier, and an entity prediction sub-model, and wherein during the training of the medical entity recognition model, the entity recognition sub-model to be trained is used for:
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the start character multi-classifier to be trained to obtain a result representing whether the sample word vector is a start word vector and, in the case that the sample word vector is a start word vector, the entity category corresponding to the sample word vector; and
for each sample word vector in the feature vector of the sample medical text, inputting the sample word vector into the end character multi-classifier to be trained to obtain a result representing whether the sample word vector is an end word vector and, in the case that the sample word vector is an end word vector, the entity category corresponding to the sample word vector;
for each entity category, determining a head-tail vector position pair set consisting of the position of the start word vector and the position of the end word vector according to the position of each start word vector of the entity category in the feature vector of the sample medical text and the position of each end word vector in the feature vector of the sample medical text;
and inputting each head-tail vector position pair into the entity prediction sub-model to be trained to obtain an output result representing whether the position of the start word vector and the position of the end word vector in the head-tail vector position pair represent the positions of the head and tail characters of the same medical entity under the entity category.
5. The method of claim 4, wherein the training of the medical entity recognition model further comprises:
calculating first loss information according to an output result of the start character multi-classifier to be trained and a real label representing whether the sample word vector is the first character of the single sample medical entity;
calculating second loss information according to an output result of the end character multi-classifier to be trained and a real label representing whether the sample word vector is the last character of the single sample medical entity;
calculating third loss information according to an output result of the entity prediction sub-model to be trained and a real label representing whether the head-tail vector position pair represents the positions of the head and tail characters of the single sample medical entity;
and obtaining the trained medical entity recognition model when the weighted sum of the first loss information, the second loss information and the third loss information is minimized.
6. The method according to any of claims 1-5, characterized in that the coding sub-model to be trained is a pre-trained BERT model.
7. The method of claim 3, wherein the sample question text is constructed based on a noun interpretation of the entity category to which the single sample medical entity corresponds.
8. An apparatus for identifying a medical entity, the apparatus comprising:
the acquisition module is used for acquiring a medical text to be recognized;
the input module is used for inputting the medical text to be recognized into a trained medical entity recognition model to obtain an entity category recognized from the medical text to be recognized and a character starting position and a character ending position, in the medical text to be recognized, of a corresponding medical entity under the entity category, wherein the medical entity recognition model comprises a coding sub-model and an entity recognition sub-model, the coding sub-model is obtained by training according to a sample medical text and a sample question text corresponding to the sample medical text, and the entity recognition sub-model is obtained by training based on a feature vector of the sample medical text output by the coding sub-model;
and the execution module is used for determining the medical entity from the medical text to be recognized according to the starting position and the ending position.
9. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.
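For illustration only, the span-extraction flow recited in claims 2 and 4 above, in which the start and end character multi-classifiers propose candidate boundary positions and the entity prediction sub-model judges each head-tail vector position pair, can be sketched as follows. The classifier interfaces, the convention that category index 0 means "not a boundary character", the pairing rule and the 0.5 threshold are assumptions rather than details taken from the claims.

    from typing import Dict, List, Tuple
    import torch

    def extract_entities(word_vectors: torch.Tensor,        # (seq_len, hidden) from the coding sub-model
                         start_classifier, end_classifier,   # start/end character multi-classifiers
                         span_predictor,                      # entity prediction sub-model
                         categories: List[str],
                         threshold: float = 0.5) -> Dict[str, List[Tuple[int, int]]]:
        # Each word vector is assigned an entity category index by the start and
        # end character multi-classifiers; index 0 means "not a boundary character".
        start_pred = start_classifier(word_vectors).argmax(dim=-1)   # (seq_len,)
        end_pred = end_classifier(word_vectors).argmax(dim=-1)       # (seq_len,)

        results: Dict[str, List[Tuple[int, int]]] = {}
        for c, category in enumerate(categories, start=1):
            start_positions = (start_pred == c).nonzero(as_tuple=True)[0].tolist()
            end_positions = (end_pred == c).nonzero(as_tuple=True)[0].tolist()

            # Head-tail vector position pair set for this entity category.
            pairs = [(s, e) for s in start_positions for e in end_positions if s <= e]

            # The entity prediction sub-model judges whether each pair marks the
            # head and tail characters of the same medical entity.
            spans = []
            for s, e in pairs:
                pair_feature = torch.cat([word_vectors[s], word_vectors[e]], dim=-1)
                if torch.sigmoid(span_predictor(pair_feature)) > threshold:
                    spans.append((s, e))
            results[category] = spans
        return results
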
CN202111229618.1A 2021-10-21 2021-10-21 Method, device, storage medium and electronic equipment for identifying medical entity Pending CN113919356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111229618.1A CN113919356A (en) 2021-10-21 2021-10-21 Method, device, storage medium and electronic equipment for identifying medical entity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111229618.1A CN113919356A (en) 2021-10-21 2021-10-21 Method, device, storage medium and electronic equipment for identifying medical entity

Publications (1)

Publication Number Publication Date
CN113919356A true CN113919356A (en) 2022-01-11

Family

ID=79242290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111229618.1A Pending CN113919356A (en) 2021-10-21 2021-10-21 Method, device, storage medium and electronic equipment for identifying medical entity

Country Status (1)

Country Link
CN (1) CN113919356A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination