CN112685561A - Small sample clinical medical text post-structuring processing method across disease categories - Google Patents

Small sample clinical medical text post-structuring processing method across disease categories

Info

Publication number
CN112685561A
CN112685561A (application CN202011567629.6A)
Authority
CN
China
Prior art keywords
text
information
model
disease
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011567629.6A
Other languages
Chinese (zh)
Inventor
刘翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhihuiyun Technology Co ltd
Original Assignee
Guangzhou Zhihuiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhihuiyun Technology Co ltd filed Critical Guangzhou Zhihuiyun Technology Co ltd
Priority to CN202011567629.6A priority Critical patent/CN112685561A/en
Publication of CN112685561A publication Critical patent/CN112685561A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a small sample clinical medical text post-structuring processing method across disease categories, which comprises the following steps: acquiring small sample text information of disease category A and large sample text information of disease category B, obtaining information to be labeled through perplexity-based text clustering, and labeling the information to be labeled to obtain labeled text information; under the pytorch neural network framework, training an information extraction model for typed questions by using a meta-learning model and an LSTM model to obtain a meta-model; training the meta-model with the labeled text information to obtain a text post-structuring model for small-sample medical records; and identifying the text information of disease category A by using the text post-structuring model. Through this scheme, the method has the advantages of simple logic, a small amount of labeling, comprehensive coverage and high processing efficiency, and has high practical value and popularization value in the fields of Chinese natural language processing and machine learning.

Description

Small sample clinical medical text post-structuring processing method across disease categories
Technical Field
The invention relates to the field of Chinese natural language processing technology and machine learning, in particular to a small sample clinical medical text post-structuring processing method across disease categories.
Background
High-quality clinical medical research cannot do without the support of highly usable language models; however, such language models usually require a large amount of high-quality annotated corpus. Clinical medical researchers therefore spend a great deal of time organizing patient data, marking effective data out of complicated electronic medical record texts through time-consuming and tedious manual annotation, a research method that is extremely inefficient for already busy medical workers. Moreover, traditional machine learning shares no knowledge across tasks, and its models are poorly portable.
Therefore, a cross-disease-category small sample clinical medical text post-structuring processing method with a small amount of labeling, comprehensive coverage and high efficiency is urgently needed.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a small sample clinical medical text post-structuring processing method across disease categories, so as to solve the problems in the prior art that data are difficult to obtain, labeling efficiency is low, coverage is small, and model reuse is difficult. The technical scheme adopted by the present invention is as follows:
a small sample clinical medical text post-structuring processing method across disease categories comprises the following steps:
acquiring small sample text information of disease category A and large sample text information of disease category B, obtaining information to be labeled through perplexity-based text clustering, and labeling the information to be labeled to obtain labeled text information; the labeled text information comprises a standard question list, a target question list and a small-sample labeled corpus;
under the pytorch neural network framework, training an information extraction model for typed questions by using a meta-learning model and an LSTM model to obtain a meta-model;
training the meta-model with the labeled text information to obtain a text post-structuring model for small-sample medical records;
and identifying the text information of disease category A by using the text post-structuring model.
Further, acquiring the small sample text information of disease category A and the large sample text information of disease category B and obtaining the information to be labeled through perplexity-based text clustering comprises the following steps:
respectively acquiring small sample text information of disease category A and large sample text information of disease category B;
standardizing the symbols of the small sample text information of disease category A and the large sample text information of disease category B, and segmenting according to paragraphs, sentences and text types to obtain segmented text data;
converting the segmented text data into binary form to obtain binary data;
training a language model with the binary data for disease category A and disease category B in turn, based on the BERT model, to obtain BERT language models;
computing the perplexity of the small sample text information of disease category A and of the large sample text information of disease category B using the tensorflow framework, and filtering out sentences whose perplexity exceeds a preset threshold to form a difference set;
computing the sentence vector of each sentence in the difference set using the BERT language model;
and clustering the sentence vectors with a hierarchical clustering algorithm to obtain the information to be labeled.
Furthermore, the LSTM model adopts an input gate, a forgetting gate and an output gate which are connected in sequence.
Further, the forgetting gate satisfies the following relationship:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $h_{t-1}$ represents the output of the previous cell, $x_t$ represents the input of the current cell, $\sigma$ represents the activation function, $W_f$ is the weight matrix of the forgetting gate, and $b_f$ is the bias term of the forgetting gate.
Still further, the input gate satisfies the following relationship:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where $f_t$ is the output of the forgetting gate, i.e. the information the model discards from the cell state; $\sigma$ is the activation function; $C_{t-1}$ is the old cell state; $i_t$ is the input gate gating, which controls how much of the content learned at the current moment is kept; and $\tilde{C}_t$ is the content learned at the current moment;

the expression of $i_t$ is:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

where $\sigma$ denotes the activation function, $W_i$ is the weight matrix of the input gate, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell, and $b_i$ is the bias term of the input gate;

the expression of $\tilde{C}_t$ is:

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

where $\tanh$ is the activation function, $W_c$ is the weight matrix for learning new knowledge, and $b_c$ is the bias term for learning new knowledge.
Still further, the output gate satisfies the following relationships:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

where $W_o$ is the weight matrix of the output gate, $b_o$ is the bias term of the output gate, and $o_t$ determines which part of the cell state is output.
Furthermore, labeling the information to be labeled includes labeling the question, the question type and a unique identifier.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention skillfully uses the perplexity between the small sample text information of disease category A and the large sample text information of disease category B to obtain the sentence vectors for clustering; by reusing historical models, the amount of labeling is reduced and working efficiency is improved.
(2) In the invention, under the pytorch neural network framework, an information extraction model for typed questions is trained with the meta-learning and LSTM models to obtain a meta-model. The advantage is that the model can use prior knowledge and experience to guide the learning of a new task, giving the model the ability to learn and improving training efficiency.
(3) The invention arranges an input gate, a forgetting gate and an output gate in the LSTM model. By introducing the concept of cell state, the LSTM network can delete information from and add information to the cell state through these gate structures, thereby solving the long-term dependency problem.
(4) The invention uses Chinese natural language processing and machine learning technology, combined with the writing conventions and experience of clinical medical texts, to automatically extract structured data from small sample clinical medical texts of different disease categories. The invention mainly provides an information processing tool for integrated clinical research, solves the problems in current clinical research of difficult data acquisition, low labeling efficiency, small coverage and difficult model reuse, and improves both the utilization rate of clinical research data and the efficiency of model training.
in conclusion, the method has the advantages of simple logic, less label quantity, comprehensive coverage, high processing efficiency and the like, and has high practical value and popularization value in the fields of Chinese natural language processing technology and machine learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of protection, and it is obvious for those skilled in the art that other related drawings can be obtained according to these drawings without inventive efforts.
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a flow chart of the Transformer encoder of the present invention.
FIG. 3 is a schematic diagram of the BERT model structure of the present invention.
FIG. 4 is a schematic diagram of the input components of the BERT model of the present invention.
FIG. 5 is a diagram of an LSTM network with a single neural network layer in accordance with the present invention.
FIG. 6 is a diagram of an LSTM network with four neural network layers in accordance with the present invention.
FIG. 7 is a schematic diagram of the line notation in the LSTM network of the present invention.
FIG. 8 is a schematic diagram of the structure of cells in the LSTM network of the present invention.
Fig. 9 is a schematic diagram of a gate selectively passing information in the present invention.
Fig. 10 is a schematic diagram of the forgetting gate of the present invention.
FIG. 11 is a schematic view of an input gate of the present invention.
Fig. 12 is a schematic diagram of an output gate of the present invention.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include, but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Examples
As shown in fig. 1 to fig. 12, the present embodiment provides a post-structured processing method of a small sample clinical medical text across disease categories, which mainly includes the following steps:
the method comprises the steps of firstly, obtaining small sample text information of A disease species and large sample text information of B disease species, adopting text clustering of text confusion to obtain information to be labeled, labeling the information to be labeled, and obtaining labeled text information. Specifically, the method comprises the following steps:
(1) text preprocessing: standardizing the symbols of the text of each medical record of the disease category, and segmenting according to paragraphs, sentences and text types;
(2) text binarization: binarizing the text data of the different disease categories according to the google-BERT standard method;
(3) model training: training language models one by one according to disease category by using a standard BERT model. In this embodiment, BERT stands for Bidirectional Encoder Representations from Transformers; it uses the Transformer encoder as its main model structure. The Transformer discards the recurrent network structure of the RNN and models a text segment entirely on the basis of the attention mechanism, whose core idea is to compute, for each word in a sentence, its correlation with all the words in the sentence; these correlations reflect, to some extent, the relevance and relative importance of the different words. BERT is an unsupervised, deep, bidirectional NLP pre-training system. "Unsupervised" means that it can be trained with only a plain-text corpus; however, unlike traditional language models, BERT does not predict the most likely current word from its context, but instead randomly masks some words and predicts them using all the unmasked words, as shown in figs. 2 and 3;
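For intuition, masked-word prediction can be sketched in a few lines, assuming the HuggingFace transformers library and the public bert-base-chinese checkpoint; both are illustrative assumptions and not part of this method:

```python
# A minimal sketch of BERT masked-word prediction, assuming the
# HuggingFace `transformers` library and the public `bert-base-chinese`
# checkpoint (illustrative assumptions, not fixed by this patent).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

# Mask one character of a clinical-style sentence and let the model
# predict it from the unmasked context on both sides.
for candidate in fill_mask("患者主诉头[MASK]三天。")[:3]:
    print(candidate["token_str"], candidate["score"])
```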
the main innovations of the BERT model all lie in its pre-training method, which comprises two tasks: Masked Language Model and Next Sentence Prediction;
the Masked Language Model can be understood as fill-in-the-blank: a fixed proportion of the words in a sentence are randomly masked, and the context is used to predict them;
because QA and NLI tasks in natural language require understanding the relationship between two sentences, Next Sentence Prediction enables the pre-trained model to adapt better to such tasks;
in fig. 4, the input of BERT is a linear sequence: two sentences are divided by a separator, and the symbols [CLS] and [SEP] are added at the beginning and the end, respectively. Each word has three embeddings: Segment Embeddings, Position Embeddings and Token Embeddings;
Token Embeddings represent the word vector of each word;
Segment Embeddings indicate which sentence each word of the sequence belongs to;
Position Embeddings encode the position information of each word;
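As a concrete illustration, combining the three embeddings can be sketched as follows; the sizes are toy values in the style of bert-base-chinese, all names are illustrative assumptions, and real BERT additionally applies layer normalization and dropout after the sum:

```python
# A minimal sketch of summing BERT's three input embeddings; sizes and
# token ids are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 21128, 512, 768
token_emb = nn.Embedding(vocab_size, hidden)    # Token Embeddings
segment_emb = nn.Embedding(2, hidden)           # Segment Embeddings (sentence A/B)
position_emb = nn.Embedding(max_len, hidden)    # Position Embeddings

ids = torch.tensor([[101, 2769, 102]])          # [CLS] <word> [SEP] (illustrative)
segments = torch.zeros_like(ids)                # all tokens from sentence A
positions = torch.arange(ids.size(1)).unsqueeze(0)

inputs = token_emb(ids) + segment_emb(segments) + position_emb(positions)
print(inputs.shape)                             # torch.Size([1, 3, 768])
```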
(4) perplexity: the tensorflow framework is used to compute the text perplexity; the specific steps are as follows:
(41) cross-comparison: the perplexity (PPL) of each sentence of the disease-A small sample text information and of the disease-B large sample text information is computed under the language model of the other disease category;
(42) a threshold is set, and the sentences that differ strongly between disease categories are filtered out; typically, the threshold is set to 0.9.
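For illustration only, a minimal sketch of the cross-perplexity filter, assuming a BERT-style masked language model scored by pseudo-perplexity (mask each token in turn); the model name, helper names and the handling of the threshold (the 0.9 above suggests a normalized score, which is left out here) are illustrative assumptions:

```python
# A minimal sketch of perplexity-based filtering; pseudo-perplexity of a
# masked LM stands in for the patent's unspecified PPL computation.
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total_log_prob, n = 0.0, 0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id   # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total_log_prob += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        n += 1
    return math.exp(-total_log_prob / max(n, 1))

# Keep the sentences of one disease category whose score under the other
# category's model exceeds the preset threshold: the difference set.
def difference_set(sentences, threshold):
    return [s for s in sentences if pseudo_perplexity(s) > threshold]
```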
(5) sentence vectors: the BERT language model of the disease category is called to obtain the BERT sentence vector corresponding to each sentence.
(6) hierarchical clustering: the sentence vectors are clustered by a hierarchical clustering method, so that sentences expressing the same meaning are packed together for labeling by doctors.
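For illustration, a minimal sketch of steps (5) and (6), assuming the [CLS] hidden state serves as the sentence vector and scipy's agglomerative clustering is used; both choices, and all names below, are illustrative assumptions:

```python
# A minimal sketch of sentence vectors + hierarchical clustering,
# assuming the [CLS] hidden state as the sentence vector (illustrative).
import torch
from transformers import AutoTokenizer, AutoModel
from scipy.cluster.hierarchy import linkage, fcluster

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")
encoder.eval()

def sentence_vector(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden[0, 0]                       # the [CLS] vector

sentences = ["入院后完善相关检查。", "完善各项检查后收入院。", "患者既往体健。"]
vectors = torch.stack([sentence_vector(s) for s in sentences]).numpy()

# Agglomerative clustering; sentences sharing a cluster id are packed
# together for the doctors to label.
tree = linkage(vectors, method="average", metric="cosine")
labels = fcluster(tree, t=0.3, criterion="distance")
print(labels)
```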
In this embodiment, the label of an electronic medical record includes the following fields: qid, query_type, context, query, ans, ans_span. Here qid is the unique identifier of each custom question; query_type is the question type, of which there are two: class and text; context is the text to be labeled; query is the question posed about the text to be labeled; ans is the answer of the text to be labeled to the question, where the answer to a class-type question is Boolean and the answer to a text-type question is a string; ans_span gives the coordinates of the answer to a text-type question within the text to be labeled.
In this embodiment, the data to be structured are imported, and the labeled data are stored in excel.
When labeling data, the format requirements are as follows (an example record is sketched after this list):
(1) each sentence must correspond to at least one standard question, comprising a query, a query_type and a qid;
(2) each question corresponds to a unique answer, where "0" means "no" and "1" means "yes";
(3) for text-type questions, the position of the original text corresponding to the answer must be marked, given in square brackets: [start position, end position].
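For concreteness, hypothetical annotation records following these requirements might look as follows; every value (sentences, questions, identifiers, coordinates) is invented for illustration only:

```python
# Hypothetical annotation records in the format above; all values are
# invented for illustration and are not from the patent.
text_record = {
    "qid": "Q0001",                    # unique identifier of the question
    "query_type": "text",              # a span-extraction question
    "context": "患者3天前出现发热，最高体温38.5℃。",
    "query": "最高体温是多少？",
    "ans": "38.5℃",                    # string answer for a text question
    "ans_span": [14, 18],              # [start, end] character offsets,
                                       # inclusive (convention assumed)
}

class_record = {
    "qid": "Q0002",
    "query_type": "class",             # a yes/no question
    "context": "患者3天前出现发热，最高体温38.5℃。",
    "query": "患者是否发热？",
    "ans": "1",                        # Boolean answer: "0" = no, "1" = yes
}
```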
In the second step, an information extraction model for typed questions is trained under the pytorch neural network framework, using a meta-learning model and an LSTM model, to obtain a meta-model.
In this embodiment, LSTM stands for Long Short-Term Memory, a network designed specifically to solve the long-term dependency problem. All RNNs take the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer, as shown in FIG. 5;
LSTM has the same chain structure, but its repeating module is different: instead of a single neural network layer there are four, interacting in a very specific way; see figs. 6 and 7;
in fig. 7, each black line carries an entire vector from the output of one node to the input of another. Circles represent pointwise operations, such as vector addition, while the rectangular boxes are learned neural network layers; lines that merge denote concatenation of vectors, and lines that fork denote content being copied and distributed to different locations;
LSTM core idea
The key to LSTM is the cell state (fig. 8 shows a cell): the horizontal line running through the top of the cell.
The cell state is similar to a conveyor belt. It runs straight along the entire chain, with only a few minor linear interactions, so it is easy for information to flow along it unchanged.
With only this upper horizontal line, there would be no way to add or delete information; this is done instead through structures called gates.
A gate selectively lets information through, mainly by means of a sigmoid neural layer and a pointwise multiplication operation; see fig. 9.
Each element of the sigmoid layer's output (a vector) is a real number between 0 and 1, representing the weight (or proportion) of the corresponding information that is let through. For example, 0 means "let nothing through" and 1 means "let everything through".
LSTM protects and controls information through three such structures, namely the input gate, the forgetting gate and the output gate.
Forgetting gate
The first step in LSTM is to decide what information to discard from the cell state. This decision is made by a so-called forgetting gate layer:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

The gate reads $h_{t-1}$ and $x_t$ and outputs, for each number in the cell state $C_{t-1}$, a value between 0 and 1, where 1 means "retain completely" and 0 means "discard completely".
For example, the cell state may include the gender of the current subject, so that the correct pronouns can be chosen; when a new subject is seen, the old subject should be forgotten, as shown in fig. 10;
here $h_{t-1}$ denotes the output of the previous cell, $x_t$ the input of the current cell, and $\sigma$ the sigmoid function.
Input gate
The next step is to decide how much new information to add to the cell state. This is done in two steps: first, a sigmoid layer called the "input gate layer" decides which information needs to be updated; then a tanh layer generates a vector of candidate content to be added, $\tilde{C}_t$.

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where $f_t$ is the output of the forgetting gate, i.e. the information the model discards from the cell state; $\sigma$ is the activation function; $C_{t-1}$ is the old cell state; $i_t$ is the input gate gating, which controls how much of the content learned at the current moment is kept; and $\tilde{C}_t$ is the content learned at the current moment.

The expression of $i_t$ is:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

where $\sigma$ denotes the activation function, $W_i$ is the weight matrix of the input gate, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell, and $b_i$ is the bias term of the input gate.

The expression of $\tilde{C}_t$ is:

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

where $\tanh$ is the activation function, $W_c$ is the weight matrix for learning new knowledge, and $b_c$ is the bias term for learning new knowledge.

The input gate thus multiplies the old state by $f_t$, discarding the information to be forgotten, and then adds $i_t * \tilde{C}_t$, the new candidate cell state.

In other words, the two parts are combined to update the cell state: the old cell state $C_{t-1}$ is updated to $C_t$. The previous steps have already decided what to do; this step actually carries it out. The old state is multiplied by $f_t$, discarding the information that was decided to be discarded, and $i_t * \tilde{C}_t$ is added; these are the new candidate values, scaled by how much each state value was decided to be updated.

In the example of a language model, this is where the gender information of the old pronoun is actually discarded and the new information is added, based on the targets determined earlier; see fig. 11.
Output gate
Finally, it must be decided what value to output. This output is based on the cell state, but in a filtered form: first, a sigmoid layer is run to decide which part of the cell state will be output; then the cell state is passed through tanh (to obtain a value between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the decided part is output.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

where $W_o$ is the weight matrix of the output gate, $b_o$ is the bias term of the output gate, and $o_t$ determines which part of the cell state is output.
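Putting the three gates together, the cell equations above can be sketched from scratch in PyTorch as follows; this is a minimal sketch with illustrative sizes, and in practice torch.nn.LSTM would be used:

```python
# A minimal from-scratch sketch of the LSTM cell equations above,
# assuming PyTorch; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        concat = input_size + hidden_size           # [h_{t-1}, x_t] concatenated
        self.W_f = nn.Linear(concat, hidden_size)   # forgetting gate (W_f, b_f)
        self.W_i = nn.Linear(concat, hidden_size)   # input gate      (W_i, b_i)
        self.W_c = nn.Linear(concat, hidden_size)   # candidate state (W_c, b_c)
        self.W_o = nn.Linear(concat, hidden_size)   # output gate     (W_o, b_o)

    def forward(self, x_t, h_prev, c_prev):
        z = torch.cat([h_prev, x_t], dim=-1)        # [h_{t-1}, x_t]
        f_t = torch.sigmoid(self.W_f(z))            # f_t = sigma(W_f.[h,x]+b_f)
        i_t = torch.sigmoid(self.W_i(z))            # i_t = sigma(W_i.[h,x]+b_i)
        c_tilde = torch.tanh(self.W_c(z))           # C~_t = tanh(W_c.[h,x]+b_c)
        c_t = f_t * c_prev + i_t * c_tilde          # C_t = f_t*C_{t-1}+i_t*C~_t
        o_t = torch.sigmoid(self.W_o(z))            # o_t = sigma(W_o.[h,x]+b_o)
        h_t = o_t * torch.tanh(c_t)                 # h_t = o_t * tanh(C_t)
        return h_t, c_t

cell = LSTMCellSketch(input_size=768, hidden_size=256)
h, c = torch.zeros(1, 256), torch.zeros(1, 256)
h, c = cell(torch.randn(1, 768), h, c)
```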
In the third step, the meta-model is trained with the labeled text information to obtain the text post-structuring model for small-sample medical records.
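For illustration only, a minimal sketch of this fine-tuning stage, assuming the meta-model exposes a standard PyTorch forward/loss interface over batches of labeled (context, query, answer) data; these interfaces are illustrative assumptions, not fixed by this method:

```python
# A minimal sketch of fine-tuning the meta-model on the labeled small
# sample; the model/dataloader interfaces are illustrative assumptions.
import torch

def finetune(meta_model, dataloader, epochs=3, lr=1e-5):
    optimizer = torch.optim.Adam(meta_model.parameters(), lr=lr)
    meta_model.train()
    for _ in range(epochs):
        for batch in dataloader:
            loss = meta_model(**batch)   # assumed to return the training loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return meta_model                    # the text post-structuring model
```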
In the fourth step, the text information of disease category A is identified by using the text post-structuring model. In this way, training on a small sample suffices to identify the text information of disease category A.
The above-mentioned embodiments are only preferred embodiments of the present invention and do not limit its scope of protection; any modification made according to the principles of the present invention, together with non-inventive improvements based on the above embodiments, shall fall within the scope of protection of the present invention.

Claims (7)

1. A small sample clinical medical text post-structuring processing method across disease categories, characterized by comprising the following steps:
acquiring small sample text information of disease category A and large sample text information of disease category B, obtaining information to be labeled through perplexity-based text clustering, and labeling the information to be labeled to obtain labeled text information; the labeled text information comprises a standard question list, a target question list and a small-sample labeled corpus;
under the pytorch neural network framework, training an information extraction model for typed questions by using a meta-learning model and an LSTM model to obtain a meta-model;
training the meta-model with the labeled text information to obtain a text post-structuring model for small-sample medical records;
and identifying the text information of disease category A by using the text post-structuring model.
2. The method for post-structuring processing of small sample clinical medical text across disease categories according to claim 1, wherein acquiring the small sample text information of disease category A and the large sample text information of disease category B and obtaining the information to be labeled through perplexity-based text clustering comprises the following steps:
respectively acquiring small sample text information of disease category A and large sample text information of disease category B;
standardizing the symbols of the small sample text information of disease category A and the large sample text information of disease category B, and segmenting according to paragraphs, sentences and text types to obtain segmented text data;
converting the segmented text data into binary form to obtain binary data;
training a language model with the binary data for disease category A and disease category B in turn, based on the BERT model, to obtain BERT language models;
computing the perplexity of the small sample text information of disease category A and of the large sample text information of disease category B using the tensorflow framework, and filtering out sentences whose perplexity exceeds a preset threshold to form a difference set;
computing the sentence vector of each sentence in the difference set using the BERT language model;
and clustering the sentence vectors with a hierarchical clustering algorithm to obtain the information to be labeled.
3. The method for post-structuring processing of small sample clinical medical text across disease categories according to claim 1 or 2, wherein the LSTM model employs an input gate, a forgetting gate and an output gate connected in sequence.
4. The method of claim 3, wherein the forgetting gate satisfies the following relationship:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $h_{t-1}$ represents the output of the previous cell, $x_t$ represents the input of the current cell, $\sigma$ represents the activation function, $W_f$ is the weight matrix of the forgetting gate, and $b_f$ is the bias term of the forgetting gate.
5. The method of claim 3, wherein the input gate satisfies the following relationship:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where $f_t$ is the output of the forgetting gate, i.e. the information the model discards from the cell state; $\sigma$ is the activation function; $C_{t-1}$ is the old cell state; $i_t$ is the input gate gating, which controls how much of the content learned at the current moment is kept; and $\tilde{C}_t$ is the content learned at the current moment;

the expression of $i_t$ is:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

where $\sigma$ denotes the activation function, $W_i$ is the weight matrix of the input gate, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell, and $b_i$ is the bias term of the input gate;

the expression of $\tilde{C}_t$ is:

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

where $\tanh$ is the activation function, $W_c$ is the weight matrix for learning new knowledge, and $b_c$ is the bias term for learning new knowledge.
6. The method of claim 4, wherein the output gate satisfies the following relationships:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

where $W_o$ is the weight matrix of the output gate, $b_o$ is the bias term of the output gate, and $o_t$ determines which part of the cell state is output.
7. The method of claim 1, wherein labeling the information to be labeled includes labeling the question, the question type and a unique identifier.
CN202011567629.6A 2020-12-26 2020-12-26 Small sample clinical medical text post-structuring processing method across disease categories Pending CN112685561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011567629.6A CN112685561A (en) 2020-12-26 2020-12-26 Small sample clinical medical text post-structuring processing method across disease categories


Publications (1)

Publication Number Publication Date
CN112685561A true CN112685561A (en) 2021-04-20

Family

ID=75451821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011567629.6A Pending CN112685561A (en) 2020-12-26 2020-12-26 Small sample clinical medical text post-structuring processing method across disease categories

Country Status (1)

Country Link
CN (1) CN112685561A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1462950A1 (en) * 2003-03-27 2004-09-29 Sony International (Europe) GmbH Method of analysis of a text corpus
US20190267113A1 (en) * 2016-10-31 2019-08-29 Preferred Networks, Inc. Disease affection determination device, disease affection determination method, and disease affection determination program
CN109783604A (en) * 2018-12-14 2019-05-21 平安科技(深圳)有限公司 Information extracting method, device and computer equipment based on a small amount of sample
CN109686445A (en) * 2018-12-29 2019-04-26 成都睿码科技有限责任公司 A kind of intelligent hospital guide's algorithm merged based on automated tag and multi-model
CN110175329A (en) * 2019-05-28 2019-08-27 上海优扬新媒信息技术有限公司 A kind of method, apparatus, electronic equipment and storage medium that sample expands
CN111783451A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and apparatus for enhancing text samples
CN112116957A (en) * 2020-08-20 2020-12-22 澳门科技大学 Disease subtype prediction method, system, device and medium based on small sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XI SHERYL ZHANG ET AL: "MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records", 《ARXIV》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357144A (en) * 2022-03-09 2022-04-15 北京大学 Medical numerical extraction and understanding method and device based on small samples
CN114357144B (en) * 2022-03-09 2022-08-09 北京大学 Medical numerical extraction and understanding method and device based on small samples
CN115660871A (en) * 2022-11-08 2023-01-31 上海栈略数据技术有限公司 Medical clinical process unsupervised modeling method, computer device, and storage medium
CN117809792A (en) * 2024-02-28 2024-04-02 神州医疗科技股份有限公司 Method and system for structuring disease seed data during cross-disease seed migration
CN117809792B (en) * 2024-02-28 2024-05-03 神州医疗科技股份有限公司 Method and system for structuring disease seed data during cross-disease seed migration


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210420)