CN111611780A - Digestive endoscopy report structuring method and system based on deep learning - Google Patents

Digestive endoscopy report structuring method and system based on deep learning

Info

Publication number
CN111611780A
CN111611780A CN202010413026.4A CN202010413026A CN111611780A CN 111611780 A CN111611780 A CN 111611780A CN 202010413026 A CN202010413026 A CN 202010413026A CN 111611780 A CN111611780 A CN 111611780A
Authority
CN
China
Prior art keywords
report
word
digestive endoscopy
document
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010413026.4A
Other languages
Chinese (zh)
Inventor
崔立真
柏欣雨
鹿旭东
郭伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010413026.4A priority Critical patent/CN111611780A/en
Publication of CN111611780A publication Critical patent/CN111611780A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/169Annotation, e.g. comment data or footnotes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00ICT specially adapted for medical reports, e.g. generation or transmission thereof

Abstract

The invention provides a digestive endoscopy report structuring method and system based on deep learning, which are used for acquiring digestive endoscopy report data and marking the data; performing word vector and document matrix representation on the acquired digestive endoscopy report information; modeling the constructed word expression vector and the document expression matrix by using a bidirectional long-short term memory model in combination with the document context; identifying and labeling report information needing to be structured by using a conditional random field for the word vector based on the context coding; and matching the recognition and extraction result with a pre-constructed structured template, wherein the structured template is obtained by constructing a key-value pair relation based on different disease information and lesion part information in historical data, and obtaining a final structured result according to the matched template. The present disclosure enables the structuring of digestive endoscopy reports.

Description

Digestive endoscopy report structuring method and system based on deep learning
Technical Field
The disclosure belongs to the field of natural language processing, and relates to a digestive endoscopy report structuring method and system based on deep learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
At present, hospital informatization is advancing rapidly. It is changing the traditional management model of many hospitals and is an inevitable trend in the development of modern hospitals. How to make effective use of the electronic medical information stored in hospital information systems has therefore become a topic of great interest to researchers, and many systems and methods for structuring medical information such as electronic medical records have been proposed and put into use.
However, although all of these are medical texts, different kinds of medical text emphasize different content, and the information to be extracted and structured differs greatly between them, so a universal structuring method is difficult to achieve. The digestive endoscopy report is a comprehensive report that contains not only the patient's disease information but also a detailed description of the lesion location and condition. It differs considerably from other medical reports and electronic medical information, and existing structuring methods are difficult to apply to it.
Disclosure of Invention
The invention aims to solve the problems and provides a digestive endoscopy report structuring method and a digestive endoscopy report structuring system based on deep learning.
According to some embodiments, the following technical scheme is adopted in the disclosure:
a digestive endoscopy report structuring method based on deep learning comprises the following steps:
acquiring digestive endoscopy report data and marking the data;
performing word vector and document matrix representation on the acquired digestive endoscopy report information;
modeling the constructed word expression vector and the document expression matrix by using a bidirectional long-short term memory model in combination with the document context;
identifying and labeling report information needing to be structured by using a conditional random field for the word vector based on the context coding;
and matching the recognition and extraction result with a pre-constructed structured template, wherein the structured template is obtained by constructing a key-value pair relation based on different disease information and lesion part information in historical data, and obtaining a final structured result according to the matched template.
As an alternative embodiment, a key-value pair relation is constructed according to the content difference of the report text description of the digestive endoscopy, different templates are constructed according to diseases, lesion parts and the like, and each template is respectively constructed into a data table for storage.
By way of further limitation, the structured template refers to a digestive endoscopy report containing fixed structures and corresponding textual descriptions.
Because diseases differ, endoscopy reports for different diseases need to be described with different report templates. For example, templates can be constructed for common digestive system diseases such as early gastric cancer, advanced gastric cancer, gastric polyp, chronic atrophic gastritis, chronic non-atrophic gastritis and gastric ulcer, according to the writing conventions of the existing semi-structured endoscopy report data and the suggestions of gastroenterology specialists.
Each template comprises basic structures such as lesion site, lesion number, lesion type, lesion size, mucosal condition and boundary condition. Each basic structure has a well-defined set of text descriptions; for example, the lesion site can be described with text values such as cardia, gastric fundus, gastric body, gastric angle and gastric antrum.
In an alternative embodiment, the extracted digestive endoscope data is labeled according to the structured template, and endoscope report information needing to be extracted is labeled.
As an alternative embodiment, the process of annotation comprises:
acquiring digestive endoscopy report data from a database, and extracting semi-structured and unstructured parts in the data;
performing keyword screening on the reports, and labeling more than a set number of reports for the template corresponding to each type of disease.
As an alternative implementation, performing word vector and document matrix representation on the acquired digestive endoscopy report information specifically includes:
segmenting the text with a word segmentation tool, supplemented by a preset stop-word list and a domain-specific dictionary;
training a word2vec model by using the segmented endoscope report text data, wherein the trained word2vec model is used for converting the segmented digestive endoscope report text data into a text vector so as to perform word embedding and calculate a word embedding vector of each word in the digestive endoscope report text;
for each digestive endoscopy report document, each word of which is represented by a vector, each document containing a plurality of words is represented by a matrix, and the representation from the input of the original text to the real-value matrix is completed.
As an alternative embodiment, the constructed word expression vector and the document expression matrix are modeled by using a bidirectional long-short term memory model and combining with the document context, and the specific process comprises the following steps:
after the sentence representation matrix obtained by the input representation layer passes through a forward long-short term memory model, each character position obtains a hidden-layer representation vector $\overrightarrow{h_t}$ fused with the preceding-context information;
after a backward long-short term memory model, each character position obtains a hidden-layer representation vector $\overleftarrow{h_t}$ fused with the following-context information;
finally, the forward and backward hidden-layer vectors are concatenated, so that each character obtains a context-fused representation vector $h_t$.
As an alternative embodiment, when the word vector representation based on report document context coding uses the conditional random field to identify and label the word information needing to be structured, the word vector based on the context coding forms a sequence according to the sequence in the document, and the conditional random field is used to select the word labeling result in the sequence with the highest probability from all possible label sequences as the output.
As an alternative implementation, matching is performed according to the relationship between the words and the labels of the labeling result and the key value relationship in the template, the template with the highest matching degree is taken as the template of the labeling document, and the values in the template are automatically filled according to the relationship between the words and the labels, so that the final structured report is obtained.
A deep learning based digestive endoscopy report structuring system, comprising:
the annotation module is configured to acquire digestive endoscopy report data and annotate the data;
the word representation module is configured to perform word vector and document matrix representation on the acquired digestive endoscopy report information;
a bidirectional long and short term memory model building module configured to model the constructed word representation vector and document representation matrix using a bidirectional long and short term memory model in conjunction with the document context;
and the structuring module is configured to identify and label, using the conditional random field, the report information needing to be structured on the basis of the context-coded word vectors, and to match the recognition and extraction result with a pre-constructed structured template, wherein the structured template is obtained by constructing a key-value pair relation based on different disease information and lesion part information in historical data, and the final structured result is obtained according to the matched template.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a deep learning based digestive endoscopy report structuring method.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the deep learning based digestive endoscope report structuring method.
Compared with the prior art, the beneficial effect of this disclosure is:
(1) the method and the device realize that the existing unstructured digestive endoscopy report is converted into the structured report, and valuable medical information can be more efficiently extracted from the digestive endoscopy report and used for scientific research;
(2) the digestive endoscopy report structuring method extracts original text information, does not influence the existing flow and writing mode, and can help doctors work under the existing habit.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a schematic flowchart of a deep learning-based digestive endoscopy report structuring method and system according to an embodiment;
FIG. 2 is a schematic diagram of a two-way LSTM + conditional random field CRF model according to one embodiment;
FIG. 3 is a diagram of the structure of the LSTM model in the first embodiment.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one:
a digestive endoscopy report structuring method based on deep learning comprises the following steps:
(1) constructing a digestive endoscopy report structured template;
(2) calling the existing unstructured endoscope report data from a hospital digestive endoscope database, and labeling the endoscope report data according to template contents;
(3) performing word vector and document matrix representation on the acquired endoscope report data;
(4) modeling the context with a bidirectional long short-term memory network model, using the word representation vectors and document representation matrix of the endoscopy report text obtained in step (3);
(5) identifying and labeling the word information that needs to be structured, using a conditional random field on the context vector representation of each word obtained in step (4);
(6) matching the labeled result with the structured template, and extracting the labeled result into structured value information according to the matched template to obtain the final structured report.
The following describes the above steps in detail with reference to the flowchart of the method in fig. 1, and specifically includes:
Step one: constructing a structured template of the digestive endoscopy report and storing the template. This specifically comprises:
a. Template construction
A structured template refers to the fixed structures and corresponding textual descriptions that a digestive endoscopy report should contain. Because diseases differ, endoscopy reports for different diseases need to be described with different report templates.
For example, according to the writing conventions of the existing semi-structured endoscopy report data and the suggestions of gastroenterology specialists, templates are constructed for common digestive system diseases such as early gastric cancer, advanced gastric cancer, gastric polyp, chronic atrophic gastritis, chronic non-atrophic gastritis and gastric ulcer. Each template comprises basic structures such as lesion site, lesion number, lesion type, lesion size, mucosal condition and boundary condition. Each basic structure has a well-defined set of text descriptions; for example, the lesion site can be described with text values such as cardia, gastric fundus, gastric body, gastric angle and gastric antrum.
b. Template storage
To facilitate data management, the templates are stored in the same SQL Server 2008 database that holds the original semi-structured digestive endoscopy report data, with each template built as a separate data table. The column names of each data table are the names of the basic structures in the template, and the values stored in the table are the text descriptions corresponding to each basic structure.
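The storage scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the disclosure: it uses Python's built-in sqlite3 module in place of SQL Server 2008, and the table name, column names and stored values are hypothetical.

```python
import sqlite3

# Hypothetical names for the basic structures; they become the column names.
BASIC_STRUCTURES = ["lesion_site", "lesion_count", "lesion_type",
                    "lesion_size", "mucosa_condition", "boundary_condition"]

def create_template_table(conn, table_name):
    """Create one data table per template; columns are the basic structures."""
    columns = ", ".join(f"{name} TEXT" for name in BASIC_STRUCTURES)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({columns})")

def add_allowed_values(conn, table_name, row):
    """Store one row of permitted text descriptions (missing keys become NULL)."""
    placeholders = ", ".join("?" for _ in BASIC_STRUCTURES)
    values = [row.get(name) for name in BASIC_STRUCTURES]
    conn.execute(f"INSERT INTO {table_name} VALUES ({placeholders})", values)

def load_template(conn, table_name):
    """Return the template as {basic structure: set of allowed text values}."""
    cursor = conn.execute(f"SELECT * FROM {table_name}")
    names = [column[0] for column in cursor.description]
    template = {name: set() for name in names}
    for record in cursor:
        for name, value in zip(names, record):
            if value is not None:
                template[name].add(value)
    return template

# Usage with an in-memory database and a hypothetical gastric polyp template.
conn = sqlite3.connect(":memory:")
create_template_table(conn, "gastric_polyp")
add_allowed_values(conn, "gastric_polyp", {"lesion_site": "cardia"})
add_allowed_values(conn, "gastric_polyp",
                   {"lesion_site": "gastric antrum", "lesion_count": "single"})
polyp_template = load_template(conn, "gastric_polyp")
```

Loading a table back into a dictionary of sets gives the key-value form that the later matching step can work with directly.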
Step two: acquiring digestive endoscopy report data and marking the data
a. Acquiring digestive endoscopy report data
The existing digestive endoscopy report data are acquired from the database, and the semi-structured part of the data is extracted for labeling and training in the following steps.
b. Labeling the digestive endoscopy reports
The extracted digestive endoscopy data are labeled according to the templates, marking the endoscopy report information to be extracted. Because the incidence of the diseases covered by the templates differs, the reports to be labeled are screened using keywords provided by digestive endoscopy specialists, ensuring that more than one hundred reports are labeled for the template corresponding to each type of disease, so that every template has enough labeled data for training.
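The keyword screening in this step might look like the sketch below. The keyword lists, template names and report texts are hypothetical, and simple substring matching is an assumption; the disclosure does not specify how the keywords are applied.

```python
from collections import defaultdict

# The embodiment asks for more than one hundred labeled reports per template.
MIN_REPORTS_PER_TEMPLATE = 100

def screen_reports(reports, template_keywords):
    """Group report texts by every template whose keywords they mention."""
    selected = defaultdict(list)
    for text in reports:
        for template, keywords in template_keywords.items():
            if any(keyword in text for keyword in keywords):
                selected[template].append(text)
    return selected

def under_labeled(selected, template_keywords, minimum=MIN_REPORTS_PER_TEMPLATE):
    """List templates that do not yet have more than `minimum` candidate reports."""
    return [name for name in template_keywords
            if len(selected.get(name, [])) <= minimum]

# Usage with hypothetical keywords and synthetic report texts.
keywords = {"gastric_polyp": ["polyp"], "gastric_ulcer": ["ulcer"]}
reports = (["A polyp is seen in the gastric antrum."] * 101
           + ["An ulcer is seen at the gastric angle."] * 5)
selected = screen_reports(reports, keywords)
missing = under_labeled(selected, keywords)
```

Templates returned by `under_labeled` would need further reports to be retrieved before labeling is considered sufficient.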
Step three: vector and matrix representation of acquired digestive endoscopy information
The collected text data are segmented into words with Jieba. Jieba is a Chinese word segmentation tool for Python. Its handling of a long passage of text can be roughly divided into three steps: first, the Chinese paragraph is roughly split into sentences with regular expressions; then each sentence is built into a directed acyclic graph (DAG) and the optimal segmentation is searched for; finally, runs of consecutive single characters are re-segmented with a hidden Markov model (HMM). For example, with the default segmentation mode, the text "1 type IIc lesion is visible on the greater-curvature side of the lower gastric body" is segmented into "stomach", "body", "lower part", "greater", "curvature", "side", "visible", "1", "IIc", "type", "lesion".
Furthermore, a preset stop-word list and a domain-specific dictionary are used to improve the segmentation. For digestive endoscopy report text, the dictionary is built from values that commonly occur in the reports, following the suggestions of digestive endoscopy specialists, which improves segmentation accuracy. For example, with the stop-word list and the domain-specific dictionary, the same text is segmented into "stomach", "lower part", "greater-curvature side", "visible", "1", "position", "IIc", "type", "lesion".
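The effect of a domain-specific dictionary can be illustrated with a small sketch. It does not use Jieba: a plain forward maximum matching segmenter stands in for Jieba's DAG and HMM procedure so that the example has no dependencies. The Chinese sentence is an assumed reconstruction of the translated example above, and both word lists are hypothetical.

```python
def segment(text, dictionary, stop_words=frozenset(), max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        match = text[i]  # fall back to a single character
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        i += len(match)
        if match not in stop_words:  # stop words are dropped from the output
            tokens.append(match)
    return tokens

# Assumed original of the example sentence (not taken from the source text).
TEXT = "胃体下部大弯侧可见1处IIc型病变"
# Hypothetical general-purpose vocabulary and domain-specific additions.
GENERAL_WORDS = {"下部", "可见", "病变", "IIc"}
DOMAIN_WORDS = GENERAL_WORDS | {"胃体", "大弯侧"}

default_tokens = segment(TEXT, GENERAL_WORDS)
domain_tokens = segment(TEXT, DOMAIN_WORDS)
```

With the domain dictionary, anatomical terms such as the gastric body and the greater-curvature side stay together as single tokens instead of being split into individual characters.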
Further, the segmented text data undergo word embedding: a high-dimensional space whose dimension equals the vocabulary size is embedded into a continuous vector space of much lower dimension, and each word or phrase is mapped to a vector over the real numbers, called a word vector. Word vectors have good semantic properties and are a common way of representing word features; the value in each dimension represents a feature with some semantic or grammatical interpretation. In this step, a Word2vec model is first trained on the segmented digestive endoscopy report text data; the trained model then converts the segmented report text into vectors, yielding the word embedding vector $x_t$ of each word in the digestive endoscopy report text.
Further, for each digestive endoscopy report document, each word is represented by its low-dimensional vector $x_t$, so a document containing $m$ words is represented by the matrix $X = (x_1, \ldots, x_t, \ldots, x_m)$, which completes the conversion from the original input text to a real-valued matrix.
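The construction of the document matrix can be sketched as follows. A table of fixed random vectors stands in for a trained Word2vec model, so the vectors carry no semantics; the tokens, the dimension and the zero-vector handling of unknown words are all assumptions made for illustration.

```python
import random

def build_embedding_table(vocabulary, dim, seed=0):
    """Stand-in for a trained word2vec model: one fixed random vector per word."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for word in vocabulary}

def document_matrix(tokens, table, dim):
    """Stack the word vectors x_1..x_m into an m x dim matrix X (list of rows)."""
    unknown = [0.0] * dim  # assumed handling of out-of-vocabulary words
    return [list(table.get(token, unknown)) for token in tokens]

# Usage with hypothetical tokens from one segmented report.
tokens = ["gastric body", "lower part", "greater-curvature side", "visible", "lesion"]
table = build_embedding_table(sorted(set(tokens)), dim=4)
X = document_matrix(tokens + ["unseen-word"], table, dim=4)
```

Each row of `X` corresponds to one word of the document, in document order, which is the form the sequence model in the next step consumes.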
Step four: for the constructed word representation vector and the document representation matrix, the words are modeled using a two-way long-short term memory model in combination with the document context.
The long short-term memory (LSTM) model is a special RNN model that introduces a gating mechanism to control how information is passed along, so that the network can capture long-range sequential dependencies. LSTM is suited to modeling sequential data, and the word vector representations of the digestive endoscopy report text in this embodiment can be regarded as sequential data in a broad sense.
Further, since each word in the digestive endoscopy report is influenced by both its preceding and its following context, this example uses a bidirectional LSTM to model information in both directions, from the preceding text to the following text and from the following text to the preceding text.
Fig. 3 shows the structure of a single LSTM unit, which comprises three parts: an input gate, a forget gate and an output gate. It works as follows:
(1) Forget gate: selects which information to forget. A sigmoid gate over $(x, h)$ controls how much of the previous cell state is discarded; the sigmoid activation function $\sigma$ is often used as a threshold function in neural networks and maps its argument to a value between 0 and 1. The forget gate is calculated as:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$
(2) Input gate: memorizes part of the current information. The current information is activated by tanh over $(x, h)$ (tanh being the hyperbolic tangent function), and a sigmoid gate over $(x, h)$ then controls how much of it is kept. The input gate part is calculated as:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{3}$$
(3) Merging the past and present memories:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{4}$$
(4) Output gate:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$
$$h_t = o_t * \tanh(C_t) \tag{6}$$
The above is the working principle of the LSTM model. $W_f$, $W_i$, $W_C$ and $W_o$ are the weight matrices of the forget gate, the input gate, the candidate cell state and the output gate respectively, and $b_f$, $b_i$, $b_C$ and $b_o$ are the corresponding biases; all of these are parameters to be trained. $h_{t-1}$ is the hidden-layer state of the previous time step, $C_t$ is the cell state, $x_t$ is the digestive endoscopy report word vector input at time $t$, and $h_t$ is the output, which in this embodiment is a representation vector of each word fused with its preceding or following context information.
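A single LSTM time step following formulas (1) to (6) can be written out in plain Python as a check on the arithmetic. This is a didactic sketch only: the parameter names `W_C` and `b_C` for the candidate cell state follow the standard LSTM formulation and are assumptions, and a practical system would use a deep learning framework rather than lists.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W.v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; the numbers refer to formulas (1)-(6)."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], concat, params["b_f"])]        # (1)
    i_t = [sigmoid(z) for z in affine(params["W_i"], concat, params["b_i"])]        # (2)
    c_tilde = [math.tanh(z) for z in affine(params["W_C"], concat, params["b_C"])]  # (3)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]          # (4)
    o_t = [sigmoid(z) for z in affine(params["W_o"], concat, params["b_o"])]        # (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                              # (6)
    return h_t, c_t

def zero_params(hidden, inputs):
    """All-zero parameters, useful only for checking the arithmetic by hand."""
    width = hidden + inputs
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = [[0.0] * width for _ in range(hidden)]
        params[f"b_{gate}"] = [0.0] * hidden
    return params

# With zero parameters every gate equals 0.5 and the candidate state is 0,
# so the new cell state is simply half of the previous one.
params = zero_params(hidden=2, inputs=3)
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```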
Further, after the sentence representation matrix $X$ obtained by the input representation layer passes through the forward LSTM, each character position obtains a hidden-layer representation vector $\overrightarrow{h_t}$ fused with the preceding-context information. After the backward LSTM, each character position obtains a hidden-layer representation vector $\overleftarrow{h_t}$ fused with the following-context information. Finally, the forward and backward hidden-layer vectors are concatenated, namely
$$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$$
so that each character obtains a context-fused representation vector $h_t$.
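The bidirectional pass and the concatenation can be sketched independently of the cell type. In the sketch below a toy recurrent step replaces the LSTM cell, and the same step is used in both directions for brevity, whereas a real bidirectional LSTM has separate parameters for each direction; the function names and input values are hypothetical.

```python
import math

def simple_step(x_t, h_prev):
    """Toy recurrent step used in place of the LSTM cell (an assumption for brevity)."""
    return [math.tanh(0.5 * h + sum(x_t)) for h in h_prev]

def run_direction(X, step, hidden, reverse=False):
    """Run `step` over the sequence and return one hidden vector per position."""
    order = range(len(X) - 1, -1, -1) if reverse else range(len(X))
    h = [0.0] * hidden
    states = [None] * len(X)
    for t in order:
        h = step(X[t], h)
        states[t] = h
    return states

def bidirectional_encode(X, step, hidden):
    """Concatenate the forward and backward hidden vectors at each position."""
    forward = run_direction(X, step, hidden)
    backward = run_direction(X, step, hidden, reverse=True)
    return [f + b for f, b in zip(forward, backward)]

# Usage on a three-word document with two-dimensional word vectors.
X = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
H = bidirectional_encode(X, simple_step, hidden=3)
```

Each row of `H` has twice the hidden size: the first half summarizes the words up to that position, the second half the words from that position onward.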
Step five: for word vector representations based on report document context coding, conditional random fields are used to identify and label word information that needs to be structured.
Each word in the document is coded by context to obtain corresponding vector representation, and the corresponding label can be predicted by decoding. The simplest decoding layer is the linear layer plus Softmax, but this approach ignores the strong dependence between sequence-adjacent tags, and therefore requires an additional Conditional Random Field (CRF) to help find the best tag path during decoding.
This step essentially learns a transition probability matrix between labels. Suppose the document representation obtained by the context coding layer is passed through a further linear layer to obtain an output matrix $P \in \mathbb{R}^{n \times k}$, where $n$ is the length of the document, $k$ is the total number of labels, and the element $P_{i,j}$ is the score of the $i$-th character in the document being predicted as the $j$-th label. A label transition probability matrix $T$ to be learned is introduced, whose element $T_{i,j}$ is the score of a transition from the $i$-th label to the $j$-th label. The structured recognition task can then be formalized as follows: given the input matrix $X$, the model predicts the label sequence $y = (y_1, y_2, \ldots, y_n)$, and the score of a predicted label path is
$$s(X, y) = \sum_{i=0}^{n} T_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \tag{7}$$
where the transition matrix $T \in \mathbb{R}^{(k+2) \times (k+2)}$ is a parameter the model needs to learn. The original number of labels is $k$; two special labels $y_0$ and $y_{n+1}$ are added to represent the beginning and end of the report document, so the transition matrix becomes a square matrix of size $k+2$.
Further, in the model training phase, formula (7) gives the score of one possible label path. The scores of all label paths are normalized with a Softmax function to obtain the probability of a label path, as shown in formula (8), where $Y_X$ denotes the set of all possible label paths for $X$. The probability of the correct label path is then maximized by maximum likelihood estimation, giving the final objective function in formula (9).
$$p(y \mid X) = \frac{\exp(s(X, y))}{\sum_{\tilde{y} \in Y_X} \exp(s(X, \tilde{y}))} \tag{8}$$
$$L = \log(p(y \mid X)) \tag{9}$$
Further, in the model prediction stage, as shown in formula (10), the model selects the most probable of all possible paths as the best path $y^*$; this can be computed efficiently with the Viterbi algorithm.
$$y^* = \arg\max_{\tilde{y} \in Y_X} p(\tilde{y} \mid X) \tag{10}$$
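The path score, the normalized log-probability and Viterbi decoding can be sketched in plain Python. The sketch assumes the standard linear-chain CRF form of the path score (transition scores including the start and end labels, plus per-position label scores); the matrices `P` and `T` below are made-up numbers, and the log-likelihood is computed by brute-force enumeration, which is only feasible for tiny inputs.

```python
import itertools
import math

def path_score(P, T, y, start, end):
    """Path score: transition scores (with start/end labels) plus emission scores."""
    tags = [start] + list(y) + [end]
    transitions = sum(T[a][b] for a, b in zip(tags, tags[1:]))
    emissions = sum(P[i][tag] for i, tag in enumerate(y))
    return transitions + emissions

def log_likelihood(P, T, y, k, start, end):
    """Log of the softmax-normalized path probability, by enumerating all paths."""
    scores = [path_score(P, T, cand, start, end)
              for cand in itertools.product(range(k), repeat=len(P))]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return path_score(P, T, y, start, end) - log_z

def viterbi(P, T, k, start, end):
    """Highest-scoring label path, found by dynamic programming."""
    best = [T[start][j] + P[0][j] for j in range(k)]
    back = []
    for i in range(1, len(P)):
        pointers, scores = [], []
        for j in range(k):
            prev = max(range(k), key=lambda a: best[a] + T[a][j])
            pointers.append(prev)
            scores.append(best[prev] + T[prev][j] + P[i][j])
        back.append(pointers)
        best = scores
    last = max(range(k), key=lambda a: best[a] + T[a][end])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Usage: k = 2 labels; indices 2 and 3 are the added start and end labels.
K, START, END = 2, 2, 3
P = [[1.0, 0.2], [0.1, 0.9], [0.3, 0.4]]      # n x k label scores
T = [[0.5, -0.2, 0.0, 0.1],                   # (k+2) x (k+2) transition scores
     [-0.3, 0.6, 0.0, 0.2],
     [0.2, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
best_path = viterbi(P, T, K, START, END)
```

Because the normalizer is the same for every path, maximizing the path probability is equivalent to maximizing the path score, which is why the decoder works on scores alone.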
Step six: and matching the labeling result with the structured template to obtain a structured report.
And matching the relation between the words and the labels of the labeling result with the key value relation in the template, taking the template with the highest matching degree as the template of the labeling document, and automatically filling values in the template according to the relation between the words and the labels to obtain the final structured report.
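The matching and filling described above can be sketched as follows. The disclosure does not define how the matching degree is computed, so counting the labeled words that are allowed values of the corresponding template key is an assumption, as are the template contents, the label names and the rule of keeping the first matching value for each key.

```python
def match_degree(labeled, template):
    """Count (word, label) pairs whose word is an allowed value for that key."""
    return sum(1 for word, label in labeled
               if word in template.get(label, set()))

def structure_report(labeled, templates):
    """Pick the best-matching template and fill its keys from the labeled words."""
    best_name = max(templates,
                    key=lambda name: match_degree(labeled, templates[name]))
    best = templates[best_name]
    filled = {key: None for key in best}
    for word, label in labeled:
        # Keep the first matching value for each key (an assumed tie rule).
        if word in best.get(label, set()) and filled[label] is None:
            filled[label] = word
    return best_name, filled

# Usage with two hypothetical templates and one labeled report.
templates = {
    "gastric_polyp": {"lesion_site": {"gastric antrum", "cardia"},
                      "lesion_type": {"polyp"}},
    "gastric_ulcer": {"lesion_site": {"gastric angle"},
                      "lesion_type": {"ulcer"}},
}
labeled = [("gastric antrum", "lesion_site"), ("visible", "O"),
           ("polyp", "lesion_type")]
template_name, report = structure_report(labeled, templates)
```

Words carrying a label that is not a key of the chosen template, such as the outside label `O` used here, are simply ignored during filling.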
Example two:
a deep learning based digestive endoscopy report structuring system, comprising:
a template module for constructing the structured template of the digestive endoscopy report;
a data retrieval and labeling module for retrieving and labeling the digestive endoscopy report data;
a module for representing the retrieved report documents as word vectors and document matrices;
a module for modeling word context based on the matrix representation of a document;
a module for identifying and labeling the words to be structured according to the context-aware word vector representation;
and a module for constructing the structured digestive endoscopy report according to the labels of the structured words and the structured template.
Example three:
a computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a deep learning based digestive endoscopy report structuring method.
Example four:
a terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the deep learning based digestive endoscope report structuring method.
Various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a gateway, proxy server, system according to embodiments of the present disclosure. The present disclosure may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present disclosure may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
The computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the deep learning based digestive endoscopy report structuring method of the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general purpose processor may be a microprocessor or any conventional processor.
The computer readable storage medium may include a read-only memory and a random access memory and provide instructions and data to the processor, and a portion of the memory may also include a non-volatile random access memory. For example, the memory may also store device type information.
The steps of a method in connection with one embodiment may be embodied directly in a hardware processor, or in a combination of the hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. Because the method involves deep learning algorithms, a GPU can be used to accelerate their training and prediction. To avoid repetition, the details are not described here.
Those of ordinary skill in the art will appreciate that the illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A deep learning-based digestive endoscopy report structuring method, characterized by comprising the following steps:
acquiring digestive endoscopy report data and marking the data;
performing word vector and document matrix representation on the acquired digestive endoscopy report information;
modeling the constructed word representation vector and document representation matrix by using a bidirectional long short-term memory model in combination with the document context;
using a conditional random field on the context-encoded word vectors to identify and label the report information that needs to be structured;
and matching the recognition and extraction result with a pre-constructed structured template, wherein the structured template is obtained by constructing key-value pair relations based on different disease information and lesion site information in historical data, and obtaining a final structured result according to the matched template.
2. The deep learning-based digestive endoscopy report structuring method according to claim 1, further comprising: constructing key-value pair relations according to differences in the content described by the digestive endoscopy report text, constructing different templates according to disease, lesion site, and the like, and creating a separate data table to store each template.
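As an illustration of the per-template storage described in claim 2, the following is a minimal sketch using Python's standard `sqlite3` module. The template names, keys, table layout, and the helper functions `create_tables` and `save_record` are all hypothetical; the patent does not specify a schema or a database engine.

```python
import sqlite3
from typing import Dict, List, Optional

# Hypothetical templates: template name -> ordered list of keys.
TEMPLATES: Dict[str, List[str]] = {
    "gastric_ulcer": ["site", "lesion", "size"],
    "colon_polyp": ["site", "lesion", "shape", "count"],
}


def create_tables(conn: sqlite3.Connection,
                  templates: Dict[str, List[str]]) -> None:
    """Create one table per template, with one text column per key."""
    for name, keys in templates.items():
        columns = ", ".join(f'"{key}" TEXT' for key in keys)
        # Names come from the trusted constant above, so quoting suffices here.
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{name}" '
            f'(report_id TEXT PRIMARY KEY, {columns})'
        )


def save_record(conn: sqlite3.Connection, name: str, report_id: str,
                record: Dict[str, Optional[str]]) -> None:
    """Insert one structured report into the table of its template."""
    keys = list(record)
    column_list = ", ".join(f'"{key}"' for key in keys)
    placeholders = ", ".join("?" for _ in keys)
    conn.execute(
        f'INSERT INTO "{name}" (report_id, {column_list}) '
        f'VALUES (?, {placeholders})',
        [report_id] + [record[key] for key in keys],
    )


conn = sqlite3.connect(":memory:")
create_tables(conn, TEMPLATES)
save_record(conn, "gastric_ulcer", "r-001",
            {"site": "antrum", "lesion": "ulcer", "size": "0.5cm"})
row = conn.execute('SELECT * FROM "gastric_ulcer"').fetchone()
```

Keeping one table per template means each table's columns mirror that template's keys, at the cost of a schema change whenever a template gains a key.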
3. The deep learning-based digestive endoscopy report structuring method according to claim 1, further comprising: labeling the extracted digestive endoscopy data according to the structured template, marking the endoscopy report information that needs to be extracted;
or further, the labeling process comprises:
acquiring digestive endoscopy report data from a database, and extracting the semi-structured and unstructured parts of the data;
and performing keyword screening on the reports, and labeling more than a set number of templates corresponding to each type of disease.
4. The deep learning-based digestive endoscopy report structuring method according to claim 1, wherein performing word vector and document matrix representation on the acquired digestive endoscopy report information specifically comprises:
segmenting the text with a word segmentation tool to which a preset stop-word lexicon and a domain-specific lexicon have been added;
training a word2vec model on the segmented endoscopy report text data, wherein the trained word2vec model is used to convert the segmented digestive endoscopy report text data into text vectors so as to embed the words, and calculating a word embedding vector for each word in the digestive endoscopy report text;
for each digestive endoscopy report document, representing each of its words by a vector, so that each document containing a plurality of words is represented by a matrix, thereby completing the mapping from the original input text to a real-valued matrix.
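The last step of claim 4, turning a segmented report into a real-valued matrix, can be sketched as follows. This is a toy illustration only: the embedding table `EMBEDDINGS` stands in for a trained word2vec model, and the stop-word set, the zero-vector fallback for unknown words, and the function name `document_matrix` are assumptions rather than details from the patent.

```python
from typing import Dict, List

# Hypothetical toy embedding table standing in for a trained word2vec model.
EMBEDDINGS: Dict[str, List[float]] = {
    "gastric": [0.1, 0.3, -0.2],
    "antrum": [0.4, -0.1, 0.2],
    "ulcer": [-0.3, 0.5, 0.1],
}
STOP_WORDS = {"the", "of", "a"}   # hypothetical stop-word lexicon
UNKNOWN = [0.0, 0.0, 0.0]         # assumed fallback for out-of-vocabulary words


def document_matrix(tokens: List[str]) -> List[List[float]]:
    """Map a segmented report to a (num_words x dim) real-valued matrix."""
    kept = [token for token in tokens if token not in STOP_WORDS]
    # One row per remaining word; each row is that word's embedding vector.
    return [list(EMBEDDINGS.get(token, UNKNOWN)) for token in kept]


matrix = document_matrix(["ulcer", "of", "the", "gastric", "antrum"])
```

Each row of `matrix` is one word vector, so a report of n retained words with d-dimensional embeddings becomes an n × d matrix.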
5. The deep learning-based digestive endoscopy report structuring method according to claim 1, wherein modeling the constructed word representation vector and document representation matrix by using a bidirectional long short-term memory model in combination with the document context specifically comprises:
after the sentence representation matrix obtained by the input representation layer passes through a forward long short-term memory model, each character position obtains a hidden-layer representation vector $\overrightarrow{h_t}$ fused with the preceding-text information;
after it passes through a backward long short-term memory model, each character position obtains a hidden-layer representation vector $\overleftarrow{h_t}$ fused with the following-text information;
finally, the forward and backward hidden-layer vectors are concatenated, so that each character obtains a representation vector $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ fusing the context.
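The forward pass, backward pass, and concatenation of claim 5 can be sketched in plain Python as below. This is a didactic sketch, not the patented implementation: the `LSTMCell` uses fixed, untrained weights generated by an arbitrary formula, omits bias terms, and all class and function names are hypothetical. A real system would use a trained model from a deep learning framework.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Tiny LSTM cell with fixed illustrative weights (untrained, no bias)."""

    def __init__(self, input_dim: int, hidden_dim: int, scale: float) -> None:
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.weights = [
            [[scale * ((g + r + c) % 3 - 1) for c in range(cols)]
             for r in range(hidden_dim)]
            for g in range(4)
        ]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the input with the previous hidden state
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in self.weights]
        i = [sigmoid(v) for v in pre[0]]
        f = [sigmoid(v) for v in pre[1]]
        o = [sigmoid(v) for v in pre[2]]
        g = [math.tanh(v) for v in pre[3]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Run the cell over a sequence and return the hidden state per position."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(xs: List[Vector], fwd: LSTMCell, bwd: LSTMCell) -> List[Vector]:
    """Concatenate forward and backward hidden vectors at each position."""
    forward = run_lstm(fwd, xs)
    backward = run_lstm(bwd, xs[::-1])[::-1]  # re-align to original order
    return [f + b for f, b in zip(forward, backward)]


sentence = [[0.1, 0.3], [0.4, -0.1], [-0.3, 0.5]]
forward_cell = LSTMCell(input_dim=2, hidden_dim=2, scale=0.5)
backward_cell = LSTMCell(input_dim=2, hidden_dim=2, scale=-0.5)
contextual = bilstm(sentence, forward_cell, backward_cell)
```

Each entry of `contextual` has twice the hidden size: the first half summarizes the text up to that position, and the second half summarizes the text from that position to the end.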
6. The deep learning-based digestive endoscopy report structuring method according to claim 1, wherein, when the conditional random field is used on the context-encoded word vectors of the report document to identify and label the word information that needs to be structured, the context-encoded word vectors form a sequence in their order of appearance in the document, and the conditional random field selects, from all possible label sequences, the one with the highest probability and outputs it as the word labeling result.
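Selecting the highest-probability label sequence in a linear-chain conditional random field is commonly done with Viterbi decoding. The sketch below assumes that per-position emission scores (for example, produced from the context-encoded vectors) and label-to-label transition scores are already available; the label set, the scores, and the function name `viterbi` are illustrative assumptions, and the patent does not name a decoding algorithm.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the label sequence with the highest total score."""
    labels = list(emissions[0])
    best = {y: emissions[0][y] for y in labels}   # best score ending in y
    back: List[Dict[str, str]] = []               # back-pointers per step
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for a three-word report fragment with B/I/O labels.
EMISSIONS = [
    {"B": 2.0, "I": 0.0, "O": 1.0},
    {"B": 0.0, "I": 1.5, "O": 1.0},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
TRANSITIONS = {
    "B": {"B": 0.0, "I": 1.0, "O": 0.0},
    "I": {"B": 0.0, "I": 0.0, "O": 0.0},
    "O": {"B": 0.0, "I": -10.0, "O": 0.0},  # discourage O followed by I
}
best_path = viterbi(EMISSIONS, TRANSITIONS)  # ["B", "I", "O"]
```

Because the normalization term of a CRF is the same for every candidate sequence, the sequence with the highest unnormalized score is also the one with the highest probability.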
7. The deep learning-based digestive endoscopy report structuring method according to claim 6, further comprising: matching the word-label relations of the labeling result against the key-value relations in the templates, taking the template with the highest matching degree as the template of the labeled document, and automatically filling values into that template according to the word-label relations to obtain a final structured report.
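The template selection and filling of claim 7 can be sketched as follows. The patent does not define how the matching degree is computed, so this sketch assumes Jaccard overlap between a template's keys and the labels found in the document; the templates, labels, and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical templates: template name -> keys expected for that disease.
TEMPLATES: Dict[str, List[str]] = {
    "gastric_ulcer": ["site", "lesion", "size"],
    "colon_polyp": ["site", "lesion", "shape", "count"],
}


def match_degree(keys: List[str], labels: List[str]) -> float:
    """Jaccard overlap between template keys and labels (an assumed metric)."""
    key_set, label_set = set(keys), set(labels)
    union = key_set | label_set
    return len(key_set & label_set) / len(union) if union else 0.0


def structure(tagged: List[Tuple[str, str]]
              ) -> Tuple[str, Dict[str, Optional[str]]]:
    """Pick the best-matching template and fill it from (word, label) pairs."""
    labels = [label for _, label in tagged if label != "O"]
    name = max(TEMPLATES, key=lambda n: match_degree(TEMPLATES[n], labels))
    record: Dict[str, Optional[str]] = {key: None for key in TEMPLATES[name]}
    for word, label in tagged:
        # Keep the first word seen for each key; later duplicates are ignored.
        if label in record and record[label] is None:
            record[label] = word
    return name, record


tagged_report = [("antrum", "site"), ("ulcer", "lesion"),
                 ("0.5cm", "size"), ("seen", "O")]
template_name, structured = structure(tagged_report)
```

Keys that no labeled word fills stay `None`, which makes missing fields in the structured report easy to spot.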
8. A deep learning-based digestive endoscopy report structuring system, characterized by comprising:
the annotation module is configured to acquire digestive endoscopy report data and annotate the data;
the word representation module is configured to perform word vector and document matrix representation on the acquired digestive endoscopy report information;
a bidirectional long short-term memory modeling module configured to model the constructed word representation vector and document representation matrix using a bidirectional long short-term memory model in combination with the document context;
and the structuring module is configured to use a conditional random field on the context-encoded word vectors to identify and label the report information that needs to be structured, and to match the recognition and extraction result with a pre-constructed structured template, wherein the structured template is obtained by constructing key-value pair relations based on different disease information and lesion site information in historical data, and to obtain the final structured result according to the matched template.
9. A computer-readable storage medium, characterized in that a plurality of instructions are stored therein, the instructions being adapted to be loaded by a processor of a terminal device and to perform the deep learning-based digestive endoscopy report structuring method according to any one of claims 1-7.
10. A terminal device, characterized by comprising a processor and a computer-readable storage medium, wherein the processor is configured to execute instructions, and the computer-readable storage medium is configured to store a plurality of instructions adapted to be loaded by the processor and to perform the deep learning-based digestive endoscopy report structuring method according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010413026.4A CN111611780A (en) 2020-05-15 2020-05-15 Digestive endoscopy report structuring method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010413026.4A CN111611780A (en) 2020-05-15 2020-05-15 Digestive endoscopy report structuring method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN111611780A true CN111611780A (en) 2020-09-01

Family

ID=72205493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010413026.4A Pending CN111611780A (en) 2020-05-15 2020-05-15 Digestive endoscopy report structuring method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111611780A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232149A (en) * 2020-09-28 2021-01-15 北京易道博识科技有限公司 Document multi-mode information and relation extraction method and system
CN113110984A (en) * 2021-04-19 2021-07-13 中国工商银行股份有限公司 Report processing method, report processing device, computer system and readable storage medium
CN113823371A (en) * 2021-09-18 2021-12-21 上海保链科技有限公司 Medical data structured processing method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157638A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110277149A (en) * 2019-06-28 2019-09-24 北京百度网讯科技有限公司 Processing method, device and the equipment of electronic health record

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157638A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110277149A (en) * 2019-06-28 2019-09-24 北京百度网讯科技有限公司 Processing method, device and the equipment of electronic health record

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
王若佳,等: "BiLSTM-CRF 模型在中文电子病历命名实体识别中的应用研究", 《文献与数据学报》 *
许云峰,等: "《大数据技术及行业应用》", 31 August 2016, 北京邮电大学出版社 *
马刚: "《基于语义的Web数据挖掘》", 31 January 2014, 东北财经大学出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232149A (en) * 2020-09-28 2021-01-15 北京易道博识科技有限公司 Document multi-mode information and relation extraction method and system
CN112232149B (en) * 2020-09-28 2024-04-16 北京易道博识科技有限公司 Document multimode information and relation extraction method and system
CN113110984A (en) * 2021-04-19 2021-07-13 中国工商银行股份有限公司 Report processing method, report processing device, computer system and readable storage medium
CN113110984B (en) * 2021-04-19 2024-03-08 中国工商银行股份有限公司 Report processing method, report processing device, computer system and readable storage medium
CN113823371A (en) * 2021-09-18 2021-12-21 上海保链科技有限公司 Medical data structured processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN110297908B (en) Diagnosis and treatment scheme prediction method and device
CN107977361B (en) Chinese clinical medical entity identification method based on deep semantic information representation
Xue et al. Multimodal recurrent model with attention for automated radiology report generation
CN109471895B (en) Electronic medical record phenotype extraction and phenotype name normalization method and system
CN111613339B (en) Similar medical record searching method and system based on deep learning
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109522546B (en) Medical named entity recognition method based on context correlation
CN110210037B (en) Syndrome-oriented medical field category detection method
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
US10949456B2 (en) Method and system for mapping text phrases to a taxonomy
CN110688855A (en) Chinese medical entity identification method and system based on machine learning
CN112002411A (en) Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
JP2021166046A (en) Method for training convolutional neural network for image recognition using image conditional mask language modeling
CN111611780A (en) Digestive endoscopy report structuring method and system based on deep learning
CN112818676B (en) Medical entity relationship joint extraction method
CN112800766B (en) Active learning-based Chinese medical entity identification labeling method and system
WO2023029502A1 (en) Method and apparatus for constructing user portrait on the basis of inquiry session, device, and medium
CN112151183A (en) Entity identification method of Chinese electronic medical record based on Lattice LSTM model
Alsharid et al. Captioning ultrasound images automatically
CN112163429B (en) Sentence correlation obtaining method, system and medium combining cyclic network and BERT
Gao et al. Named entity recognition method of Chinese EMR based on BERT-BiLSTM-CRF
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN110019711A (en) A kind of control method and device of pair of medicine text data structureization processing
CN115688752A (en) Knowledge extraction method based on multi-semantic features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination