CN112685561A - Small sample clinical medical text post-structuring processing method across disease categories - Google Patents
Small sample clinical medical text post-structuring processing method across disease categories
- Publication number
- CN112685561A (application CN202011567629.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- information
- model
- disease
- text information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 201000010099 disease Diseases 0.000 title claims abstract description 55
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 title claims abstract description 55
- 238000003672 processing method Methods 0.000 title claims abstract description 9
- 238000012549 training Methods 0.000 claims abstract description 18
- 238000000034 method Methods 0.000 claims abstract description 14
- 238000002372 labelling Methods 0.000 claims abstract description 13
- 238000013528 artificial neural network Methods 0.000 claims abstract description 10
- 238000000605 extraction Methods 0.000 claims abstract description 6
- 238000012545 processing Methods 0.000 claims abstract description 4
- 239000013598 vector Substances 0.000 claims description 14
- 230000006870 function Effects 0.000 claims description 12
- 230000004913 activation Effects 0.000 claims description 11
- 239000011159 matrix material Substances 0.000 claims description 11
- 238000001914 filtration Methods 0.000 claims description 3
- 238000010801 machine learning Methods 0.000 abstract description 5
- 238000003058 natural language processing Methods 0.000 abstract description 5
- 238000010586 diagram Methods 0.000 description 7
- 238000011160 research Methods 0.000 description 5
- 230000007246 mechanism Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a cross-disease-category post-structuring processing method for small-sample clinical medical texts, which comprises the following steps: acquiring small-sample text information of disease category A and large-sample text information of disease category B, obtaining the information to be labeled through text clustering based on text perplexity, and labeling it to obtain labeled text information; under the PyTorch neural network framework, training a category-question information extraction model with a meta-learning model and an LSTM model to obtain a meta-model; training the meta-model with the labeled text information to obtain a post-structuring model for small-sample medical record texts; and recognizing the text information of disease category A with the post-structuring model. Through this scheme, the method has the advantages of simple logic, a small labeling workload, comprehensive coverage and high processing efficiency, and has high practical and popularization value in the fields of Chinese natural language processing and machine learning.
Description
Technical Field
The invention relates to the field of Chinese natural language processing technology and machine learning, in particular to a small sample clinical medical text post-structuring processing method across disease categories.
Background
High-quality clinical medical research cannot proceed without the support of highly available language models; however, such models often require large amounts of high-quality annotated corpora. Clinical researchers therefore spend considerable time organizing patient data, marking out usable data from complex electronic medical record texts through time-consuming and tedious manual annotation, an extremely inefficient research workflow for already busy medical workers. In addition, traditional machine learning does not share knowledge across tasks, and its models are poorly portable.
Therefore, there is an urgent need for a cross-disease-category small-sample clinical medical text post-structuring processing method with a small labeling workload, comprehensive coverage and high efficiency.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a cross-disease-category post-structuring processing method for small-sample clinical medical texts, so as to solve the prior-art problems of difficult data acquisition, low labeling efficiency, small coverage and high model-reuse difficulty. The technical scheme adopted by the present invention is as follows:
a small sample clinical medical text post-structuring processing method across disease categories comprises the following steps:
acquiring small-sample text information of disease category A and large-sample text information of disease category B, obtaining the information to be labeled through text clustering based on text perplexity, and labeling it to obtain labeled text information; the labeled text information comprises a standard question list, a target question list and a small-sample labeled corpus;
under the PyTorch neural network framework, training a category-question information extraction model with a meta-learning model and an LSTM model to obtain a meta-model;
training the meta-model with the labeled text information to obtain a post-structuring model for small-sample medical record texts;
and recognizing the text information of disease category A with the post-structuring model.
Further, acquiring the small-sample text information of disease category A and the large-sample text information of disease category B and obtaining the information to be labeled through text clustering based on text perplexity comprises the following steps:
respectively acquiring the small-sample text information of disease category A and the large-sample text information of disease category B;
standardizing the symbols of the disease-category-A and disease-category-B text information, and segmenting by paragraph, sentence and text type to obtain segmented text data;
converting the segmented text data into binary data;
using the BERT model, training on the binary data of disease category A and of disease category B in turn to obtain a BERT language model for each;
computing the perplexity of the disease-category-A and disease-category-B text information using the TensorFlow framework, and filtering out the sentences whose score exceeds a preset threshold to form a difference set (a sketch follows these steps);
computing the sentence vector of each sentence in the difference set with the BERT language model;
and clustering the sentence vectors with a hierarchical clustering algorithm to obtain the information to be labeled.
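To make the perplexity filtering concrete, the following is a minimal sketch assuming the HuggingFace transformers library and a masked-LM pseudo-perplexity score; the patent itself computes perplexity under the TensorFlow framework and uses a normalized threshold of 0.9, so the model name, threshold value and sample sentences below are illustrative assumptions only:

```python
import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Stand-in for the per-disease-category BERT language model.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn, average the negative log-probability
    of the true token, and exponentiate to get a perplexity-like score."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, ids.size(0) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nlls.append(-log_probs[ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))

# Category-A sentences that the category-B model finds "surprising"
# form the difference set to be clustered and labeled.
THRESHOLD = 20.0                                  # illustrative value
candidates = ["患者主诉胸痛三天。", "患者左膝肿胀疼痛。"]
difference_set = [s for s in candidates if pseudo_perplexity(s) > THRESHOLD]
```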
Furthermore, the LSTM model adopts an input gate, a forgetting gate and an output gate which are connected in sequence.
Further, the forgetting gate satisfies the following relationship:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
wherein h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, σ represents the activation function, W_f represents the weight matrix of the forgetting gate, and b_f represents the bias term of the forgetting gate.
Still further, the input gate satisfies the following relationship:
C_t = f_t * C_{t-1} + i_t * C̃_t
wherein f_t is the output of the forgetting gate, i.e. the information the model discards from the cell state, C_{t-1} represents the old cell state, i_t is the input-gate gating, i.e. it controls how much of what was previously learned is kept at the current moment, and C̃_t represents what is learned at the current moment;
the expression for i_t is:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
wherein σ represents the activation function, W_i represents the weight matrix of the input-gate gating, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_i represents the bias term of the input gate;
the expression for C̃_t is:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
wherein tanh represents the activation function, W_C represents the weight matrix used when learning new knowledge, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_C represents the bias term used when learning new knowledge.
Still further, the output gate satisfies the following relationship:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o represents the weight matrix of the output gate, b_o represents the bias term of the output gate, and o_t indicates which parts of the cell state are to be output.
Furthermore, labeling the information to be labeled includes recording the question, the question type and a unique identifier.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention cleverly uses the perplexity between the small-sample text information of disease category A and the large-sample text information of disease category B to obtain the sentence vectors for clustering; by reusing historical models, the labeling workload is reduced and working efficiency is improved.
(2) In the invention, under the PyTorch neural network framework, the category-question information extraction model is trained with meta-learning and LSTM models to obtain the meta-model; the advantage is that the model can use prior knowledge and experience to guide the learning of a new task, giving it the ability to learn and improving training efficiency.
(3) The invention arranges an input gate, a forgetting gate and an output gate in the LSTM model; by introducing the concept of the cell state, the LSTM network can delete and add information to the cell state through these gate structures, thereby solving the long-term dependency problem.
(4) The invention uses Chinese natural language processing and machine learning technology, combined with the writing conventions and practical experience of clinical medical texts, to automatically extract structured data from small-sample clinical medical texts of different disease categories. It mainly provides an information processing tool for the integration of clinical research, solves the problems of difficult data acquisition, low labeling efficiency, small coverage and high model-reuse difficulty in current clinical research, and improves the utilization of clinical research data and the efficiency of model training.
In conclusion, the method has the advantages of simple logic, a small labeling workload, comprehensive coverage and high processing efficiency, and has high practical and popularization value in the fields of Chinese natural language processing and machine learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope of protection; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a flow chart of the Transformer encoder used in the present invention.
FIG. 3 is a schematic diagram of the BERT model structure of the present invention.
FIG. 4 is a schematic diagram of the input components of the BERT model of the present invention.
FIG. 5 is a diagram of a repeating module containing a single neural network layer, as in a standard RNN.
FIG. 6 is a diagram of the LSTM repeating module containing four neural network layers.
FIG. 7 is a schematic diagram of the line notation used in the LSTM network diagrams of the present invention.
FIG. 8 is a schematic diagram of the structure of cells in the LSTM network of the present invention.
Fig. 9 is a schematic view of a gate selectively passing information in the present invention.
Fig. 10 is a schematic view of the forgetting gate of the present invention.
FIG. 11 is a schematic view of an input gate of the present invention.
Fig. 12 is a schematic diagram of an output gate of the present invention.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include, but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Examples
As shown in fig. 1 to fig. 12, the present embodiment provides a post-structured processing method of a small sample clinical medical text across disease categories, which mainly includes the following steps:
the method comprises the steps of firstly, obtaining small sample text information of A disease species and large sample text information of B disease species, adopting text clustering of text confusion to obtain information to be labeled, labeling the information to be labeled, and obtaining labeled text information. Specifically, the method comprises the following steps:
(1) Text preprocessing: standardize the symbols of each medical record text of the disease category, and segment by paragraph, sentence and text type;
(2) Text binarization: binarize the text data of the different disease categories following the google-BERT standard method;
(3) Model training: train a language model for each disease category with the standard BERT model. In this embodiment, BERT stands for Bidirectional Encoder Representations from Transformers; it uses the Transformer encoder as its main model structure, and the Transformer abandons the recurrent network structure of the RNN, modeling a text segment entirely with the attention mechanism. The core idea of the attention mechanism is to compute the correlation between each word in a sentence and all the words in that sentence, which reflects, to some extent, the relevance and relative importance of the different words in the sentence. BERT is an unsupervised, deep, bidirectional NLP pre-training system; "unsupervised" means that it can be trained with only a plain-text corpus. Unlike traditional language models, however, BERT does not predict the most likely current word from its context; instead, it randomly masks some words and predicts them using all the unmasked words, as shown in FIG. 2 and FIG. 3;
The main innovations of the BERT model lie in its pre-training method, which comprises two tasks: the Masked Language Model and Next Sentence Prediction;
The Masked Language Model can be understood as a fill-in-the-blank task: a fixed number of words in a sentence are randomly masked and then predicted from the context;
Because QA and NLI tasks in natural language require understanding the relationship between two sentences, Next Sentence Prediction lets the pre-trained model adapt better to such tasks;
In FIG. 4, the input part of BERT is a linear sequence: two sentences are divided by a separator, with the [CLS] symbol added at the beginning and [SEP] at the end. Each word has three embeddings: Segment Embeddings, Position Embeddings and Token Embeddings;
Token Embeddings represent the word vector of each word;
Segment Embeddings indicate which sentence each word of the sequence belongs to;
Position Embeddings encode the position information of each word;
(4) Perplexity: compute the text perplexity using the TensorFlow framework. The specific labeling steps are as follows:
(41) Cross-compare the small-sample text information of disease category A and the large-sample text information of disease category B by computing the PPL (perplexity) of each sentence under the language model of the other disease category;
(42) Set a threshold and filter out the sentences that differ most between disease categories; typically, the threshold is set to 0.9.
(5) Sentence vectors: call the BERT language model of the disease category to obtain the BERT sentence vector of each sentence.
(6) Hierarchical clustering: cluster the sentence vectors with a hierarchical clustering method, so that sentences expressing the same meaning are packed together for the physicians to label; a sketch of steps (5) and (6) follows.
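A minimal sketch of steps (5) and (6) follows, assuming the HuggingFace transformers library and SciPy; using the [CLS] hidden state as the sentence vector and the clustering cut threshold are assumptions, since the patent does not specify either:

```python
import torch
from transformers import BertTokenizer, BertModel
from scipy.cluster.hierarchy import linkage, fcluster

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
encoder.eval()

def sentence_vector(sentence: str) -> torch.Tensor:
    # BERT sums the token, segment and position embeddings internally;
    # the [CLS] hidden state is taken here as the sentence vector.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**enc)
    return out.last_hidden_state[0, 0]

sentences = ["患者主诉胸痛三天。", "胸痛持续三日。", "患者左膝肿胀。"]
vectors = torch.stack([sentence_vector(s) for s in sentences]).numpy()

# Hierarchical (agglomerative) clustering packs sentences expressing
# the same meaning together for the annotating physicians.
tree = linkage(vectors, method="average", metric="cosine")
clusters = fcluster(tree, t=0.3, criterion="distance")  # cut threshold illustrative
for sent, cid in zip(sentences, clusters):
    print(cid, sent)
```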
In this embodiment, the label of the electronic medical record includes the following fields: qid, query_type, context, query, ans and ans_span, wherein qid represents the unique identifier of each custom question; query_type represents the question type, of which there are two: class and text; context represents the text to be labeled; query represents the question posed about the text to be labeled; ans represents the answer of the text to be labeled to the question, where the answer to a class question is of Boolean type and the answer to a text question is of string type; and ans_span represents the coordinates of the answer to a text question within the text to be labeled.
In this embodiment, the data to be structured is imported, and the labeled data is stored in Excel;
when labeling data, the format requirements are as follows:
(1) Each sentence must correspond to at least one standard question, comprising a query, a query_type and a qid;
(2) Each question corresponds to a unique answer, where "0" means "no" and "1" means "yes";
(3) For text questions, the position in the original text corresponding to the answer must be marked, written in square brackets as [start position, end position]; an example record follows.
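Combining the field schema and the format requirements above, two hypothetical annotation records might look as follows; every concrete value is invented for illustration:

```python
# A class-type record: the answer is Boolean ("0" = no, "1" = yes).
record_class = {
    "qid": "Q001",                      # unique identifier of the question
    "query_type": "class",
    "context": "患者否认吸烟史。",        # text to be labeled
    "query": "Does the patient smoke?",
    "ans": "0",
}

# A text-type record: the answer is a string located in the context.
record_text = {
    "qid": "Q002",
    "query_type": "text",
    "context": "患者主诉胸痛三天。",
    "query": "How long has the chest pain lasted?",
    "ans": "三天",
    "ans_span": [6, 8],                 # [start, end) offsets; convention assumed
}
```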
In the second step, under the PyTorch neural network framework, the category-question information extraction model is trained with a meta-learning model and an LSTM model to obtain the meta-model.
In this embodiment, LSTM stands for Long Short-Term Memory, a design aimed specifically at the long-term dependency problem. All RNNs have the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer, as shown in FIG. 5;
The LSTM has the same chain structure, but its repeating module is different: instead of a single neural network layer there are four, interacting in a very specific way; see FIG. 6 and FIG. 7;
In FIG. 7, each line carries an entire vector from the output of one node to the input of another. Circles represent pointwise operations, such as vector addition, while rectangles are learned neural network layers; lines merging together denote concatenation of vectors, and a line forking denotes that its content is copied and distributed to different locations;
LSTM core idea
The key to the LSTM is the cell state (FIG. 8 shows one cell) and the horizontal line that runs through the cell;
The cell state is similar to a conveyor belt: it runs directly along the entire chain with only a few minor linear interactions, so it is easy for information to flow along it unchanged.
With only that upper horizontal line, however, there is no way to add or delete information; this is done instead through structures called gates.
A gate selectively lets information pass; it consists mainly of a sigmoid neural layer and a pointwise multiplication operation, see FIG. 9.
Each element of the sigmoid layer's output (a vector) is a real number between 0 and 1, representing the weight (or share) of the corresponding information to let through: 0 means "let nothing through" and 1 means "let everything through".
The LSTM protects and controls information through three such structures: the input gate, the forgetting gate and the output gate.
Forgetting gate
The first step in the LSTM is to decide what information to discard from the cell state. This decision is made by the so-called forgetting gate layer:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
The gate reads h_{t-1} and x_t and outputs a value between 0 and 1 for each number in the cell state C_{t-1}: 1 means "keep completely" and 0 means "discard completely".
In the language-model example, the cell state may encode the gender of the current subject so that the correct pronouns can be chosen; when a new subject appears, the old one should be forgotten, as shown in FIG. 10;
wherein h_{t-1} is the output of the previous cell, x_t is the input of the current cell, and σ denotes the sigmoid function.
Input gate
The next step is to decide how much new information to add to the cell state. This is done in two steps: first, a sigmoid layer called the "input gate layer" decides which information to update; then a tanh layer generates a vector of candidate content to be added, C̃_t. The new cell state is:
C_t = f_t * C_{t-1} + i_t * C̃_t
wherein f_t is the output of the forgetting gate, i.e. the information the model discards from the cell state, C_{t-1} is the old cell state, i_t is the input-gate gating, i.e. it controls how much of what was previously learned is kept at the current moment, and C̃_t is what is learned at the current moment;
The expression for i_t is:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
wherein σ represents the activation function, W_i represents the weight matrix of the input-gate gating, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_i represents the bias term of the input gate;
The expression for C̃_t is:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
wherein tanh represents the activation function, W_C represents the weight matrix used when learning new knowledge, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_C represents the bias term used when learning new knowledge.
To update the cell state, the old state C_{t-1} is multiplied by f_t, discarding the information chosen to be forgotten, and i_t * C̃_t is then added; this candidate content is scaled by how much we decided to update each state component. The previous steps decided what to do; this step actually carries it out, updating the old cell state C_{t-1} to the new state C_t.
In the language-model example, this is where the gender information of the old pronoun is actually discarded and the new information is added, according to the targets determined earlier; see FIG. 11.
Output gate
Finally, the output value must be determined. The output is based on the cell state, but in a filtered form: first, a sigmoid layer decides which parts of the cell state will be output; the cell state is then passed through tanh (yielding values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o represents the weight matrix of the output gate, b_o represents the bias term of the output gate, and o_t indicates which parts of the cell state are to be output.
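The gate equations above combine into one cell update per time step. The sketch below transcribes them directly in PyTorch, written out explicitly instead of calling torch.nn.LSTM so that each line matches one formula in the text; the dimensions and random initialization are illustrative only:

```python
import torch

def lstm_cell(x_t, h_prev, c_prev,
              W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM step following the gate equations in the text; each
    weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    z = torch.cat([h_prev, x_t], dim=-1)
    f_t = torch.sigmoid(z @ W_f + b_f)        # forgetting gate
    i_t = torch.sigmoid(z @ W_i + b_i)        # input gate
    c_tilde = torch.tanh(z @ W_c + b_c)       # candidate content C̃_t
    c_t = f_t * c_prev + i_t * c_tilde        # cell-state update
    o_t = torch.sigmoid(z @ W_o + b_o)        # output gate
    h_t = o_t * torch.tanh(c_t)               # new hidden state
    return h_t, c_t

# Quick check with toy dimensions.
hidden, inp = 4, 3
weights = [torch.randn(hidden + inp, hidden) for _ in range(4)]
biases = [torch.zeros(hidden) for _ in range(4)]
h, c = torch.zeros(hidden), torch.zeros(hidden)
x = torch.randn(inp)
h, c = lstm_cell(x, h, c,
                 weights[0], biases[0], weights[1], biases[1],
                 weights[2], biases[2], weights[3], biases[3])
```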
In the third step, the meta-model is trained with the labeled text information to obtain the post-structuring model for small-sample medical records.
In the fourth step, the text information of disease category A is recognized with the post-structuring model. In this way, training on a small sample suffices to recognize the text information of disease category A.
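The patent does not name a specific meta-learning algorithm, so the sketch below uses a Reptile-style update purely as one plausible illustration of how a meta-model could be trained across disease categories under PyTorch; the function names, hyperparameters and task interface are all assumptions:

```python
import copy
import torch

def reptile_meta_train(model, tasks, inner_steps=5, inner_lr=1e-3,
                       meta_lr=0.1, epochs=10):
    """Reptile-style meta-training: adapt a copy of the model to each
    task (e.g. one disease category) and nudge the meta-parameters
    toward the adapted parameters. `tasks` yields data loaders of
    (inputs, targets) batches."""
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for task_loader in tasks:
            learner = copy.deepcopy(model)
            opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                for x, y in task_loader:
                    opt.zero_grad()
                    loss_fn(learner(x), y).backward()
                    opt.step()
            # Meta-update: move meta-parameters toward the task-adapted ones.
            with torch.no_grad():
                for p, q in zip(model.parameters(), learner.parameters()):
                    p += meta_lr * (q - p)
    return model

# The returned meta-model is then fine-tuned by ordinary supervised
# training on the small labeled corpus of disease category A (step 3).
```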
The above embodiments are only preferred embodiments of the present invention and do not limit its scope of protection; all modifications made according to the principles of the present invention, and all non-inventive improvements based on the above embodiments, shall fall within the scope of protection of the present application.
Claims (7)
1. A cross-disease-category post-structuring processing method for small-sample clinical medical texts, characterized by comprising the following steps:
acquiring small-sample text information of disease category A and large-sample text information of disease category B, obtaining the information to be labeled through text clustering based on text perplexity, and labeling it to obtain labeled text information, the labeled text information comprising a standard question list, a target question list and a small-sample labeled corpus;
under the PyTorch neural network framework, training a category-question information extraction model with a meta-learning model and an LSTM model to obtain a meta-model;
training the meta-model with the labeled text information to obtain a post-structuring model for small-sample medical record texts;
and recognizing the text information of disease category A with the post-structuring model.
2. The cross-disease-category small-sample clinical medical text post-structuring processing method according to claim 1, wherein acquiring the small-sample text information of disease category A and the large-sample text information of disease category B and obtaining the information to be labeled through text clustering based on text perplexity comprises the following steps:
respectively acquiring the small-sample text information of disease category A and the large-sample text information of disease category B;
standardizing the symbols of the disease-category-A and disease-category-B text information, and segmenting by paragraph, sentence and text type to obtain segmented text data;
converting the segmented text data into binary data;
using the BERT model, training on the binary data of disease category A and of disease category B in turn to obtain a BERT language model for each;
computing the perplexity of the disease-category-A and disease-category-B text information using the TensorFlow framework, and filtering out the sentences whose score exceeds a preset threshold to form a difference set;
computing the sentence vector of each sentence in the difference set with the BERT language model;
and clustering the sentence vectors with a hierarchical clustering algorithm to obtain the information to be labeled.
3. The method for post-structured processing of small sample clinical medical text across disease categories according to claim 1 or 2, wherein the LSTM model employs an input gate, a forgetting gate and an output gate connected in sequence.
4. The method of claim 3, wherein the forgetting gate satisfies the following relationship:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
wherein h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, σ represents the activation function, W_f represents the weight matrix of the forgetting gate, and b_f represents the bias term of the forgetting gate.
5. The method of claim 3, wherein the input gate satisfies the following relationship:
C_t = f_t * C_{t-1} + i_t * C̃_t
wherein f_t is the output of the forgetting gate, i.e. the information the model discards from the cell state, C_{t-1} represents the old cell state, i_t is the input-gate gating, i.e. it controls how much of what was previously learned is kept at the current moment, and C̃_t represents what is learned at the current moment;
the expression for i_t being:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
wherein σ represents the activation function, W_i represents the weight matrix of the input-gate gating, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_i represents the bias term of the input-gate gating;
and the expression for C̃_t being:
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
wherein tanh represents the activation function, W_C represents the weight matrix used when learning new knowledge, h_{t-1} represents the output of the previous cell, x_t represents the input of the current cell, and b_C represents the bias term used when learning new knowledge.
6. The method of claim 4, wherein the output gate satisfies the following relationship:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o represents the weight matrix of the output gate, b_o represents the bias term of the output gate, and o_t indicates which parts of the cell state are to be output.
7. The method of claim 1, wherein labeling the information to be labeled includes recording the question, the question type and a unique identifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011567629.6A CN112685561A (en) | 2020-12-26 | 2020-12-26 | Small sample clinical medical text post-structuring processing method across disease categories |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011567629.6A CN112685561A (en) | 2020-12-26 | 2020-12-26 | Small sample clinical medical text post-structuring processing method across disease categories |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112685561A (en) | 2021-04-20
Family
ID=75451821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011567629.6A Pending CN112685561A (en) | 2020-12-26 | 2020-12-26 | Small sample clinical medical text post-structuring processing method across disease categories |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112685561A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357144A (en) * | 2022-03-09 | 2022-04-15 | 北京大学 | Medical numerical extraction and understanding method and device based on small samples |
CN115660871A (en) * | 2022-11-08 | 2023-01-31 | 上海栈略数据技术有限公司 | Medical clinical process unsupervised modeling method, computer device, and storage medium |
CN117809792A (en) * | 2024-02-28 | 2024-04-02 | 神州医疗科技股份有限公司 | Method and system for structuring disease seed data during cross-disease seed migration |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1462950A1 (en) * | 2003-03-27 | 2004-09-29 | Sony International (Europe) GmbH | Method of analysis of a text corpus |
US20190267113A1 (en) * | 2016-10-31 | 2019-08-29 | Preferred Networks, Inc. | Disease affection determination device, disease affection determination method, and disease affection determination program |
CN109783604A (en) * | 2018-12-14 | 2019-05-21 | 平安科技(深圳)有限公司 | Information extracting method, device and computer equipment based on a small amount of sample |
CN109686445A (en) * | 2018-12-29 | 2019-04-26 | 成都睿码科技有限责任公司 | A kind of intelligent hospital guide's algorithm merged based on automated tag and multi-model |
CN110175329A (en) * | 2019-05-28 | 2019-08-27 | 上海优扬新媒信息技术有限公司 | A kind of method, apparatus, electronic equipment and storage medium that sample expands |
CN111783451A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Method and apparatus for enhancing text samples |
CN112116957A (en) * | 2020-08-20 | 2020-12-22 | 澳门科技大学 | Disease subtype prediction method, system, device and medium based on small sample |
Non-Patent Citations (1)
Title |
---|
Xi Sheryl Zhang et al., "MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records", arXiv *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357144A (en) * | 2022-03-09 | 2022-04-15 | 北京大学 | Medical numerical extraction and understanding method and device based on small samples |
CN114357144B (en) * | 2022-03-09 | 2022-08-09 | 北京大学 | Medical numerical extraction and understanding method and device based on small samples |
CN115660871A (en) * | 2022-11-08 | 2023-01-31 | 上海栈略数据技术有限公司 | Medical clinical process unsupervised modeling method, computer device, and storage medium |
CN117809792A (en) * | 2024-02-28 | 2024-04-02 | 神州医疗科技股份有限公司 | Method and system for structuring disease seed data during cross-disease seed migration |
CN117809792B (en) * | 2024-02-28 | 2024-05-03 | 神州医疗科技股份有限公司 | Method and system for structuring disease seed data during cross-disease seed migration |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738003B (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN110032648B (en) | Medical record structured analysis method based on medical field entity | |
CN109858041B (en) | Named entity recognition method combining semi-supervised learning with user-defined dictionary | |
KR102008845B1 (en) | Automatic classification method of unstructured data | |
CN112685561A (en) | Small sample clinical medical text post-structuring processing method across disease categories | |
US20050027664A1 (en) | Interactive machine learning system for automated annotation of information in text | |
CN107729309A (en) | A kind of method and device of the Chinese semantic analysis based on deep learning | |
Li et al. | UD_BBC: Named entity recognition in social network combined BERT-BiLSTM-CRF with active learning | |
CN109189862A (en) | A kind of construction of knowledge base method towards scientific and technological information analysis | |
CN112115721A (en) | Named entity identification method and device | |
CN111222318B (en) | Trigger word recognition method based on double-channel bidirectional LSTM-CRF network | |
CN112541337B (en) | Document template automatic generation method and system based on recurrent neural network language model | |
CN115587594B (en) | Unstructured text data extraction model training method and system for network security | |
CN111914556A (en) | Emotion guiding method and system based on emotion semantic transfer map | |
US20220156489A1 (en) | Machine learning techniques for identifying logical sections in unstructured data | |
CN113221569A (en) | Method for extracting text information of damage test | |
Zhang et al. | Effective character-augmented word embedding for machine reading comprehension | |
CN114756681A (en) | Evaluation text fine-grained suggestion mining method based on multi-attention fusion | |
CN111191439A (en) | Natural sentence generation method and device, computer equipment and storage medium | |
CN114091406A (en) | Intelligent text labeling method and system for knowledge extraction | |
AU2019101147A4 (en) | A sentimental analysis system for film review based on deep learning | |
CN116108840A (en) | Text fine granularity emotion analysis method, system, medium and computing device | |
CN115659981A (en) | Named entity recognition method based on neural network model | |
CN114510943A (en) | Incremental named entity identification method based on pseudo sample playback | |
CN112347784A (en) | Cross-document entity identification method combined with multi-task learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210420