CN106446526A - Electronic medical record entity relation extraction method and apparatus - Google Patents

Publication number: CN106446526A
Authority: CN (China)
Prior art keywords: health record; electronic health; sentence; matrix; convolutional neural
Legal status: Granted
Application number: CN201610798932.4A
Other languages: Chinese (zh)
Other versions: CN106446526B (en)
Inventor: 黄亦谦
Current Assignee: Beijing Kilo-Ampere Wise Man Information Technology Co Ltd
Original Assignee: Beijing Kilo-Ampere Wise Man Information Technology Co Ltd
Application filed by: Beijing Kilo-Ampere Wise Man Information Technology Co Ltd
Priority to: CN201610798932.4A
Publication of CN106446526A
Application granted
Publication of CN106446526B

Abstract

The invention discloses an electronic medical record entity relation extraction method and apparatus, belonging to the field of medical data mining. The method comprises: obtaining the matrix to which an electronic medical record natural-language sentence is mapped, by means of a convolutional neural network model and a word-vector representation; inputting the electronic medical record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector; and inputting the feature vector into a trained classifier to extract the entity relation of the sentence under test. The advantages of the convolutional neural network model are thereby used to mine the relations among entities in electronic medical record natural-language sentences, providing a technical approach for automatically learning electronic medical record information.

Description

Electronic medical record entity relation extraction method and apparatus
Technical field
The present disclosure relates to the field of medical data mining, and in particular to an electronic medical record entity relation extraction method and apparatus.
Background
With the explosive growth of data volumes in the information age, clinical medical data has also come to be characterized by large capacity, rapid growth, diverse forms and high potential value. In the clinical domain, electronic medical record data in the form of natural-language text occupies an important position. Against this background, information extraction, that is, using computers to automatically extract structured information from unstructured electronic medical record text, has attracted wide attention and has significant practical value. Entity relation extraction is the core task of information extraction from electronic medical records.
At present, text entity relation extraction relies mainly on supervised methods. These methods treat entity relation extraction as a classification problem: the relation between entities in a sentence is assigned to one of a set of predefined classes, which completes the relation extraction task. There are two mainstream research directions: (a) feature-based methods, which manually extract features such as part of speech, semantic roles and dependency syntax trees, and then classify with a classifier such as a support vector machine or a maximum entropy model; and (b) kernel-based methods, which compute a kernel function over the input strings and predict the relation type according to the kernel similarity. However, because the classification performance of these methods depends largely on basic natural language processing tools such as part-of-speech taggers and syntactic parsers, they have at least the following defects:
(1) these basic tools themselves introduce errors;
(2) the choice of feature set relies on experience and expert knowledge;
(3) some languages lack mature basic processing tools.
Summary of the invention
The purpose of the present disclosure is to provide an electronic medical record entity relation extraction method and apparatus that can mine the relations between entities in electronic medical records.
To achieve this purpose, the present disclosure provides an electronic medical record entity relation extraction method, comprising: obtaining, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic medical record natural-language sentence is mapped; inputting the electronic medical record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector; and inputting the feature vector into a trained classifier to extract the entity relation of the sentence under test.
Optionally, the step of obtaining, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic medical record natural-language sentence is mapped comprises: segmenting each electronic medical record natural-language sentence into words; mapping each word to an m-dimensional vector; and representing each mapped sentence as an n × m matrix, where the column dimension of the matrix is m and the row dimension is the number n of words.
Optionally, before the step of inputting the sentence under test into the trained convolutional neural network model to obtain a feature vector, the method further comprises: sliding a convolution kernel to obtain the results of its convolution with the matrix of the mapped sentence; obtaining the features of the sentence from the convolution results through a max pooling layer; and training the convolutional neural network model using existing electronic medical record training set data and the features, to obtain convolution kernel parameters and classifier parameters.
Optionally, before the step of sliding the convolution kernel to obtain the results of its convolution with the matrix of the mapped sentence, the method further comprises: setting the values of the convolution kernels, whose row dimension spans multiple adjacent words in the sentence, to random values.
Optionally, the step of training the convolutional neural network model using existing electronic medical record training set data and the features, to obtain convolution kernel parameters and classifier parameters, comprises: selecting existing electronic medical record training set data and annotating the entity relations in that data with class labels; and training the convolutional neural network model according to the class annotations and the features obtained through the max pooling layer, to obtain the convolution kernel parameters and the classifier parameters.
In addition, to achieve the above purpose, the present disclosure further provides an electronic medical record entity relation extraction apparatus, comprising: a matrix acquisition module, configured to obtain, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic medical record natural-language sentence is mapped; a computing module, configured to input the electronic medical record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector; and an extraction module, configured to input the feature vector into a trained classifier to extract the entity relation of the sentence under test.
Optionally, the matrix acquisition module comprises: a segmentation submodule, configured to segment each electronic medical record natural-language sentence into words; a mapping submodule, configured to map each word to an m-dimensional vector; and a matrix output submodule, configured to represent each mapped sentence as an n × m matrix, where the column dimension of the matrix is m and the row dimension is the number n of words.
Optionally, the apparatus further comprises: a convolution module, configured to slide a convolution kernel to obtain the results of its convolution with the matrix of the mapped sentence; a feature calculation module, configured to obtain the features of the sentence from the convolution results through a max pooling layer; and a parameter calculation module, configured to train the convolutional neural network model using existing electronic medical record training set data and the features, to obtain convolution kernel parameters and classifier parameters.
Optionally, the apparatus further comprises: a setting module, configured to set the values of the convolution kernels, whose row dimension spans multiple adjacent words in the sentence, to random values.
Optionally, the parameter calculation module comprises: a class annotation submodule, configured to select existing electronic medical record training set data and annotate the entity relations in that data with class labels; and a parameter calculation submodule, configured to train the convolutional neural network model according to the class annotations and the features obtained through the max pooling layer, to obtain the convolution kernel parameters and the classifier parameters.
With the above technical solution, the matrix to which an electronic medical record natural-language sentence is mapped is obtained by means of a convolutional neural network model and a word-vector representation; the sentence under test is input into the trained convolutional neural network model to obtain a feature vector; and the feature vector is input into a trained classifier to extract the entity relation of the sentence under test. The advantages of the convolutional neural network model are thereby used to mine the relations between entities in electronic medical record natural language, providing a technical approach for automatically learning electronic medical record information.
Other features and advantages of the present disclosure are described in detail in the detailed description below.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a schematic flowchart of the electronic medical record entity relation extraction method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of obtaining the matrix to which an electronic medical record natural-language sentence is mapped, provided by an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of the electronic medical record entity relation extraction method provided by another embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of training the convolutional neural network model, provided by an embodiment of the present disclosure;
Fig. 5 is a block diagram of the electronic medical record entity relation extraction apparatus provided by an embodiment of the present disclosure;
Fig. 6 is a block diagram of the matrix acquisition module provided by an embodiment of the present disclosure;
Fig. 7 is a block diagram of the electronic medical record entity relation extraction apparatus provided by another embodiment of the present disclosure;
Fig. 8 is a block diagram of the parameter calculation module provided by an embodiment of the present disclosure.
Detailed description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure and are not intended to limit it.
The electronic medical record entity relation extraction method and apparatus proposed by the present disclosure are based on convolutional neural networks. A convolutional neural network is a special kind of deep neural network and was the first deep network model to be applied successfully. Convolutional neural networks exploit spatial correlation to reduce the number of parameters and have become the core of many computer vision systems, such as image recognition and autonomous driving.
The concept of convolution comes from digital signal processing. Its one-dimensional form is defined as follows:
Physically, formula (1) describes the output of a signal after it passes through a system; mathematically, it computes a weighted average of a function.
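Formula (1) itself does not survive in this text. As a point of reference only, the standard discrete one-dimensional convolution from signal processing can be written as below, where the input signal $x$, the system response $h$ and the output $y$ are assumed names rather than the patent's own notation:

$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k)$$

Each output sample is a sum of input samples weighted by the shifted response, which corresponds to the weighted-average reading given for formula (1).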
The two-dimensional form is defined as follows:
Two-dimensional convolution is commonly used in image processing. In formula (2), f(x, y) is the gray value of a point on the image and w(x, y) is the convolution kernel, also called a filter. The convolution operation is equivalent to filtering the image with the filter. In a convolutional neural network, neurons in adjacent layers are not all directly connected; instead, a "convolution kernel" serves as the intermediary, and the same convolution kernel is shared across the entire image.
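Formula (2) is not reproduced in this text. A common discrete form consistent with the description, using the text's $f(x, y)$ for the image and $w(x, y)$ for the kernel, is sketched below; the output name $g$ and the summation indices $s, t$ are assumptions:

$$g(x, y) = \sum_{s}\sum_{t} w(s, t)\, f(x - s,\ y - t)$$

The sums run over the support of the kernel, so each output pixel is a weighted combination of the pixels around $(x, y)$.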
Every layer of a convolutional neural network consists of a feature extraction layer followed by a computation layer that performs local averaging and secondary extraction. This distinctive two-stage feature extraction structure gives the network a high tolerance to distortion during recognition. Convolutional neural networks have three main advantages: first, weight sharing reduces the number of network parameters; second, the convolution operation is fast; third, the down-sampling mechanism gives the extracted features rotational and translational invariance. Convolutional neural networks cover almost all recognition and detection tasks.
Fig. 1 is a schematic flowchart of the electronic medical record entity relation extraction method provided by an embodiment of the present disclosure. Referring to Fig. 1, the method may comprise the following steps.
In step S110, the matrix to which an electronic medical record natural-language sentence is mapped is obtained by means of a convolutional neural network model and a word-vector representation.
Specifically, in the convolutional neural network model, each electronic medical record natural-language sentence is mapped using word vectors, so that every sentence is represented as a matrix.
For example, a word-vector modeling tool maps each word of a sentence to a 400-dimensional vector, so that every sentence is represented as a matrix whose column dimension is 400 and whose row dimension is the number of words in the sentence.
Fig. 2 is a schematic flowchart of obtaining the matrix to which an electronic medical record natural-language sentence is mapped, provided by an embodiment of the present disclosure. Referring to Fig. 2, the step of obtaining the mapped matrix by means of a convolutional neural network model and a word-vector representation (step S110) may comprise the following steps.
In step S210, each electronic medical record natural-language sentence is segmented into words.
Specifically, every word of each sentence is split out individually, which can be expressed as:
$$W_n = \{w_1, w_2, w_3, \ldots, w_n\} \tag{3}$$
In formula (3), $W_n$ denotes the word sequence of a sentence after segmentation, and n is the number of words in the sentence.
In step S220, each word is mapped to an m-dimensional vector.
Specifically, a word-vector modeling tool maps each word to an m-dimensional vector, which can be expressed as:
Formula (4) gives the word vector of word $w_i$ after mapping by the word-vector modeling tool, where D denotes the dictionary function of the tool.
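Formula (4) does not survive in this text. From the description, it plausibly takes the form below, where the vector symbol $v_i$ is an assumed name and not the patent's notation:

$$v_i = D(w_i), \qquad v_i \in \mathbb{R}^{m}$$

That is, the dictionary function D looks up each word and returns its m-dimensional vector.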
Optionally, the word-vector modeling tool includes at least Word2vec, the open-source word-vector training tool from Google, and GloVe from Stanford University.
For example, m is taken to be 400; that is, each word is mapped to a 400-dimensional vector.
In step S230, each mapped sentence is represented as an n × m matrix, where the column dimension of the matrix is m and the row dimension is the number n of words.
For example, with column dimension m = 400 and row dimension equal to the number n of words, each mapped sentence is represented as the matrix $V_{n \times 400}$.
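Steps S210 to S230 can be illustrated with a minimal, dependency-free sketch. It assumes the sentence has already been segmented (a real Chinese medical record would need a word segmenter), and it uses a randomly filled lookup table as a stand-in for a trained Word2vec or GloVe dictionary. The names `build_embedding_table` and `sentence_to_matrix` and the zero vector for unknown words are hypothetical choices, not part of the patent.

```python
import random

def build_embedding_table(vocab, m, seed=0):
    """Hypothetical stand-in for a trained word-vector dictionary D: word -> m-dim vector."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(m)] for word in sorted(vocab)}

def sentence_to_matrix(words, table, m):
    """Map a segmented sentence (list of n words) to an n x m matrix (list of rows)."""
    unknown = [0.0] * m  # assumption: out-of-vocabulary words map to the zero vector
    return [list(table.get(w, unknown)) for w in words]

# An already segmented example sentence (illustrative English tokens).
words = ["patient", "reports", "chest", "pain", "after", "exertion"]
table = build_embedding_table(set(words), m=400)
V = sentence_to_matrix(words, table, m=400)
print(len(V), len(V[0]))  # row dimension n, column dimension m
```

Each row of `V` is the vector of one word, so the row dimension follows the sentence length while the column dimension stays fixed at m.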
Returning to Fig. 1, in step S120, the electronic medical record natural-language sentence under test is input into the trained convolutional neural network model to obtain a feature vector.
Specifically, the matrix of the mapped sentence passes through the convolutional layer and the max pooling layer and then undergoes a nonlinear mapping to yield features. When a sentence under test is input, the trained convolutional neural network model produces the feature vector of that sentence.
During training, a window of n consecutive words is scored with $f(w_{t-n+1}, \ldots, w_{t-1}, w_t)$; the higher the score, the more natural the word sequence. Under this assumption, the objective function that the convolutional neural network model minimizes is:
In formula (5), χ is the set of all n-grams of consecutive words in the corpus, and D is the dictionary containing all words. The first summation uses every n-gram in the corpus as a positive sample. The second summation obtains negative samples by substituting words from the dictionary. $x^{(w)}$ is the phrase x with its middle word replaced by a random word w. In most cases, replacing a word in a normal phrase with a random word makes the phrase no longer plausible, so $x^{(w)}$ constitutes a negative sample.
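Formula (5) is not reproduced in this text. The description (a sum over all corpus n-grams, a sum over dictionary words, and a corrupted phrase $x^{(w)}$) matches the pairwise ranking objective commonly used to train word representations. A plausible reconstruction, in which the hinge form and the margin of 1 are assumptions, is:

$$\sum_{x \in \chi} \sum_{w \in D} \max\bigl\{0,\ 1 - f(x) + f(x^{(w)})\bigr\}$$

Minimizing this quantity pushes the score of each real phrase $x$ at least one unit above the score of every corrupted phrase $x^{(w)}$.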
In step S130, the feature vector is input into the trained classifier to extract the entity relation of the electronic medical record natural-language sentence under test.
Specifically, the feature vector is input into the trained classifier, and the entity relation of the sentence under test is extracted according to the maximum-probability principle.
Optionally, the classifier may be a Softmax classifier.
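The maximum-probability principle with a Softmax classifier can be sketched as follows. The linear scoring layer, the toy weights and the relation labels are illustrative assumptions; the patent does not list its relation classes or the classifier's exact parameterization.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights, biases, labels):
    """Score each relation class with a linear layer, then pick the most probable one."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs

# Hypothetical relation classes and toy parameters, for illustration only.
labels = ["treatment-improves-disease", "test-reveals-disease", "no-relation"]
feature = [0.2, -1.0, 0.7]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
label, probs = classify(feature, weights, biases, labels)
print(label)
```

With identity weights the class scores equal the feature values, so the third class, which has the largest score, is selected.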
In the electronic medical record entity relation extraction method provided by this embodiment, the matrix to which an electronic medical record natural-language sentence is mapped is obtained by means of a convolutional neural network model and a word-vector representation; the sentence under test is input into the trained convolutional neural network model to obtain a feature vector; and the feature vector is input into a trained classifier to extract the entity relation of the sentence under test. The advantages of the convolutional neural network model are thereby used to mine the relations between entities in electronic medical record natural language, providing a technical approach for automatically learning electronic medical record information.
Fig. 3 is a schematic flowchart of the electronic medical record entity relation extraction method provided by another embodiment of the present disclosure. Referring to Fig. 3, on the basis of Fig. 1, before the step of inputting the sentence under test into the trained convolutional neural network model to obtain a feature vector (step S120), the method further comprises the following steps.
In step S310, a convolution kernel is slid to obtain the results of its convolution with the matrix of the mapped electronic medical record natural-language sentence.
Specifically, the convolution kernel is slid vertically, that is, along the word axis, to obtain the results of its convolution with the mapped sentence matrix $V_{n \times 400}$, which can be expressed as:
$$C = \{c_1, c_2, \ldots, c_{n-h+1}\} \tag{7}$$
In formula (6), $V_{n \times 400}$ denotes the matrix of each mapped electronic medical record natural-language sentence, L denotes the convolution kernel, and C denotes the convolution results. In formula (7), the dimension of C is $n-h+1$, where n is the number of words in the sentence and h is the row dimension of the convolution kernel.
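Formula (6) is not reproduced in this text, so the exact form of each $c_i$ is unknown. A minimal sketch, assuming each $c_i$ is the sum of element-wise products between the h × m kernel L and rows i through i + h − 1 of V (no bias and no activation), shows why the result has n − h + 1 entries. The function name `convolve_rows` is hypothetical.

```python
def convolve_rows(V, L):
    """Slide an h x m kernel L down an n x m matrix V; return the n-h+1 window sums."""
    n, h = len(V), len(L)
    results = []
    for i in range(n - h + 1):          # one window per valid start row
        c = 0.0
        for r in range(h):              # rows covered by the kernel
            c += sum(l * v for l, v in zip(L[r], V[i + r]))
        results.append(c)
    return results

# Toy sentence matrix with n = 4 words and m = 2 dimensions, kernel height h = 2.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
L = [[1.0, 1.0], [1.0, 1.0]]
C = convolve_rows(V, L)
print(C)  # n - h + 1 = 3 values
```

Because the kernel spans the full column dimension m, it can only move along the word axis, which is why the convolution result is a one-dimensional sequence.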
In step S320, the features of the electronic medical record natural-language sentence are obtained from the convolution results through a max pooling layer.
Specifically, the multiple convolution results produced by each convolution kernel pass through the max pooling layer to yield the features of the sentence.
In step S330, the convolutional neural network model is trained using existing electronic medical record training set data and the features, to obtain convolution kernel parameters and classifier parameters.
In an embodiment of the present disclosure, on the basis of Fig. 3, before the step of sliding the convolution kernel to obtain the results of its convolution with the matrix of the mapped sentence (step S310), the method may further comprise: setting the values of the convolution kernels, whose row dimension spans multiple adjacent words in the sentence, to random values.
For example, 100 convolution kernels are selected for each of the row dimensions 3, 4 and 5, that is, spanning 3, 4 and 5 adjacent words of the sentence. The column dimension of every convolution kernel is 400 and the kernel values are random, so the three kinds of convolution kernels are written $L_{3 \times 400}$, $L_{4 \times 400}$ and $L_{5 \times 400}$.
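The kernel configuration and the max pooling step can be sketched together: 300 randomly initialized kernels (100 each of heights 3, 4 and 5, all 400 columns wide) each contribute their largest window response, giving one 300-dimensional feature vector per sentence. The initialization range, the helper names and the omission of bias and nonlinearity are assumptions made for this illustration.

```python
import random

def random_kernel(h, m, rng):
    """An h x m kernel with random initial values (range chosen arbitrarily here)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(h)]

def conv_max(V, L):
    """Largest response of kernel L over all h-row windows of V (max pooling)."""
    h = len(L)
    return max(
        sum(l * v for r in range(h) for l, v in zip(L[r], V[i + r]))
        for i in range(len(V) - h + 1)
    )

def extract_features(V, kernels):
    """One pooled value per kernel, concatenated into the sentence feature vector."""
    return [conv_max(V, L) for L in kernels]

rng = random.Random(0)
m = 400
kernels = [random_kernel(h, m, rng) for h in (3, 4, 5) for _ in range(100)]
# Stand-in for a mapped 12-word sentence; it must have at least 5 rows.
V = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(12)]
features = extract_features(V, kernels)
print(len(features))
```

Max pooling makes the feature length independent of the sentence length, so sentences with different word counts can feed the same classifier.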
Fig. 4 is a schematic flowchart of training the convolutional neural network model, provided by an embodiment of the present disclosure. Referring to Fig. 4, on the basis of Fig. 3, the step of training the convolutional neural network model using existing electronic medical record training set data and the features, to obtain convolution kernel parameters and classifier parameters (step S330), may comprise the following steps.
In step S410, existing electronic medical record training set data is selected, and the entity relations in that data are annotated with class labels.
In step S420, the convolutional neural network model is trained according to the class annotations and the features obtained through the max pooling layer, to obtain the convolution kernel parameters and the classifier parameters.
Specifically, the convolutional neural network model is trained by gradient descent to obtain the convolution kernel parameters and the classifier parameters.
Further, these parameters can be expressed as θ = (F, S), where F denotes the convolution kernel parameters and S denotes the classifier parameters.
Optionally, the classifier is a Softmax classifier.
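Gradient-descent training of θ = (F, S) can be sketched on a single labelled sentence. The patent names only gradient descent, so the cross-entropy loss, the learning rate, the absence of bias and nonlinearity, and the toy sizes below are all assumptions; `forward` and `sgd_step` are hypothetical helper names.

```python
import math
import random

def forward(V, kernels, S):
    """Return pooled features, the winning window per kernel, and class probabilities."""
    feats, argmax = [], []
    for L in kernels:
        h = len(L)
        responses = [sum(l * v for r in range(h) for l, v in zip(L[r], V[i + r]))
                     for i in range(len(V) - h + 1)]
        best = max(range(len(responses)), key=responses.__getitem__)
        feats.append(responses[best])
        argmax.append(best)
    scores = [sum(s * f for s, f in zip(row, feats)) for row in S]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return feats, argmax, [e / total for e in exps]

def sgd_step(V, y, kernels, S, lr=0.05):
    """One gradient-descent update of theta = (F, S); returns the loss before the update."""
    feats, argmax, probs = forward(V, kernels, S)
    d_scores = [p - (1.0 if c == y else 0.0) for c, p in enumerate(probs)]
    # Gradient with respect to the pooled features, computed before S changes.
    d_feats = [sum(d_scores[c] * S[c][k] for c in range(len(S)))
               for k in range(len(kernels))]
    for c, row in enumerate(S):
        for k in range(len(row)):
            row[k] -= lr * d_scores[c] * feats[k]
    # Max pooling routes each kernel's gradient to its winning window only.
    for k, L in enumerate(kernels):
        i = argmax[k]
        for r in range(len(L)):
            for j in range(len(L[r])):
                L[r][j] -= lr * d_feats[k] * V[i + r][j]
    return -math.log(probs[y])

rng = random.Random(0)
m, n_classes = 8, 3  # toy sizes; the text's example uses m = 400
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(h)]
           for h in (3, 4, 5) for _ in range(2)]
S = [[rng.uniform(-0.1, 0.1) for _ in range(len(kernels))] for _ in range(n_classes)]
V = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(7)]
losses = [sgd_step(V, 1, kernels, S) for _ in range(100)]
print(losses[0] > losses[-1])
```

A real training run would loop over many annotated sentences rather than repeating one, but the update rule for F and S has the same structure.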
The electronic medical record entity relation extraction method of this embodiment uses a shallow network. The input layer of the network is the matrix formed by mapping the natural-language sentence with word vectors. This matrix passes through the convolutional layer and the pooling layer to yield features, and a Softmax classifier outputs the class label. The convolutional neural network model is thereby used to mine the relations between entities in electronic medical records, providing a technical approach for automatically learning electronic medical record information.
Fig. 5 is a block diagram of the electronic medical record entity relation extraction apparatus provided by an embodiment of the present disclosure. Referring to Fig. 5, the electronic medical record entity relation extraction apparatus 500 may include a matrix acquisition module 510, a computing module 520 and an extraction module 530.
The matrix acquisition module 510 is configured to obtain, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic medical record natural-language sentence is mapped.
Specifically, in the convolutional neural network model, the matrix acquisition module 510 maps each electronic medical record natural-language sentence using word vectors, so that every sentence is represented as a matrix.
For example, a word-vector modeling tool maps each word of a sentence to a 400-dimensional vector, so that every sentence is represented as a matrix whose column dimension is 400 and whose row dimension is the number of words in the sentence.
Fig. 6 is a block diagram of the matrix acquisition module 510 provided by an embodiment of the present disclosure. Referring to Fig. 6, the matrix acquisition module 510 may include a segmentation submodule 610, a mapping submodule 620 and a matrix output submodule 630.
The segmentation submodule 610 is configured to segment each electronic medical record natural-language sentence into words.
Specifically, the segmentation submodule 610 splits out every word of each sentence individually, which can be expressed as:
$$W_n = \{w_1, w_2, w_3, \ldots, w_n\} \tag{3}$$
In formula (3), $W_n$ denotes the word sequence of a sentence after segmentation, and n is the number of words in the sentence.
The mapping submodule 620 is configured to map each word to an m-dimensional vector.
Specifically, the mapping submodule 620 uses a word-vector modeling tool to map each word to an m-dimensional vector, which can be expressed as:
Formula (4) gives the word vector of word $w_i$ after mapping by the word-vector modeling tool, where D denotes the dictionary function of the tool.
Optionally, the word-vector modeling tool includes at least Word2vec, the open-source word-vector training tool from Google, and GloVe from Stanford University.
For example, m is taken to be 400; that is, each word is mapped to a 400-dimensional vector.
The matrix output submodule 630 is configured to represent each mapped sentence as an n × m matrix, where the column dimension of the matrix is m and the row dimension is the number n of words.
For example, with column dimension m = 400 and row dimension equal to the number n of words, the matrix output submodule 630 represents each mapped sentence as the matrix $V_{n \times 400}$.
Returning to Fig. 5, the computing module 520 is configured to input the electronic medical record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector.
Specifically, the computing module 520 passes the matrix of the mapped sentence through the convolutional layer and the max pooling layer and then applies a nonlinear mapping to yield features. When a sentence under test is input, the computing module 520 uses the trained convolutional neural network model to produce the feature vector of that sentence.
During training, a window of n consecutive words is scored with $f(w_{t-n+1}, \ldots, w_{t-1}, w_t)$; the higher the score, the more natural the word sequence. Under this assumption, the objective function that the convolutional neural network model minimizes is:
In formula (5), χ is the set of all n-grams of consecutive words in the corpus, and D is the dictionary containing all words. The first summation uses every n-gram in the corpus as a positive sample. The second summation obtains negative samples by substituting words from the dictionary. $x^{(w)}$ is the phrase x with its middle word replaced by a random word w. In most cases, replacing a word in a normal phrase with a random word makes the phrase no longer plausible, so $x^{(w)}$ constitutes a negative sample.
The extraction module 530 is configured to input the feature vector into the trained classifier to extract the entity relation of the electronic medical record natural-language sentence under test.
Specifically, the feature vector is input into the trained classifier, and the extraction module 530 extracts the entity relation of the sentence under test according to the maximum-probability principle.
Optionally, the classifier may be a Softmax classifier.
In the electronic medical record entity relation extraction apparatus provided by this embodiment, the matrix acquisition module 510 obtains, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic medical record natural-language sentence is mapped; the computing module 520 inputs the sentence under test into the trained convolutional neural network model to obtain a feature vector; and the extraction module 530 inputs the feature vector into a trained classifier to extract the entity relation of the sentence under test. The advantages of the convolutional neural network model are thereby used to mine the relations between entities in electronic medical record natural language, providing a technical approach for automatically learning electronic medical record information.
Fig. 7 is the block diagram of the electronic health record entity relation extraction device that another embodiment of the disclosure provides.Refer to Fig. 7, On the basis of Fig. 5, described device also includes convolution module 710, feature calculation module 720, parameter calculating module 730.
Convolution module 710 is used for the convolution kernel that slides, and obtains and the matrix of the described electronic health record nature sentence after mapping Convolution results.
Specifically, longitudinal sliding motion convolution kernel, convolution module 710 obtains and the electronic health record nature sentence matrix after mapping Vn×400Convolution results, be represented by:
C={ c1,c2,…,cn-h+1} (7)
In formula (6), Vn×400Represent the matrix of the electronic health record nature sentence after every mapping, L represents convolution kernel, C Represent convolution results.In formula (7), the dimension of C is n-h+1, and n is the number of word in sentence, and h is the row dimension of convolution kernel.
Feature calculation module 720 is used for, according to described convolution results, obtaining described electronic health record certainly through maximum pond layer So feature of sentence.
Specifically, multiple convolution results that feature calculation module 730 obtains according to each convolution kernel, through maximum pond layer Obtain the feature of electronic health record nature sentence.
Parameter calculating module 730 is used for using existing electronic health record training set data and described feature, to described convolution Neural network model is trained, and obtains convolution nuclear parameter and classifier parameters.
Optionally, on the basis of Fig. 7, the device may further include a setting module.
The setting module is configured to set to random values the values of the convolution kernels whose row dimensions span multiple adjacent words in the electronic health record natural-language sentence.
For example, the setting module selects 100 convolution kernels for each of the row dimensions 3, 4, and 5, corresponding to 3, 4, and 5 adjacent words in the electronic health record natural-language sentence. The column dimension of every convolution kernel is 400 and the kernel values are random, so the three kinds of convolution kernel are denoted $L_{3 \times 400}$, $L_{4 \times 400}$, and $L_{5 \times 400}$, respectively.
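The kernel configuration in this example can be sketched as below. The text only says the values are random; the uniform range of [-0.1, 0.1], the fixed seed, and the name `make_kernel_bank` are assumptions made for illustration.

```python
import random


def make_kernel_bank(heights=(3, 4, 5), per_height=100, width=400, seed=0):
    """Create per_height randomly initialised kernels for every row dimension."""
    rng = random.Random(seed)
    bank = []
    for h in heights:
        for _ in range(per_height):
            # One kernel: h rows (adjacent words) by `width` columns (word-vector size).
            kernel = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(h)]
            bank.append(kernel)
    return bank


bank = make_kernel_bank()
print(len(bank))                      # 300 kernels in total
print(len(bank[0]), len(bank[0][0]))  # 3 400
```

If each kernel contributes one max-pooled value, this configuration would yield a 300-dimensional feature vector per sentence.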
Fig. 8 is a block diagram of the parameter calculation module 730 provided by an embodiment of the present disclosure. Referring to Fig. 8, the parameter calculation module 730 may include a classification labeling submodule 810 and a parameter calculation submodule 820.
The classification labeling submodule 810 is configured to select existing electronic health record training set data and to label the entity relations in that training set data by class.
The parameter calculation submodule 820 is configured to train the convolutional neural network model according to the class labels and the features obtained through the max-pooling layer, obtaining the convolution kernel parameters and the classifier parameters.
Specifically, the parameter calculation submodule 820 trains the convolutional neural network model by gradient descent to obtain the convolution kernel parameters and the classifier parameters.
Further, these parameters can be expressed as θ = (F, S), where F denotes the convolution kernel parameters and S denotes the classifier parameters.
Optionally, the classifier is a Softmax classifier.
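As a rough illustration of gradient-descent training with a Softmax classifier, the sketch below updates only the classifier parameters S for a single labeled feature vector. The cross-entropy loss, the learning rate, the absence of a bias term, and all function names are assumptions; the update of the convolution kernel parameters F by backpropagation is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_probs(S, feature):
    """S holds one weight row per relation class; returns class probabilities."""
    scores = [sum(w * x for w, x in zip(row, feature)) for row in S]
    return softmax(scores)


def gradient_step(S, feature, label, lr=0.1):
    """One gradient-descent update of S under an assumed cross-entropy loss."""
    probs = classifier_probs(S, feature)
    new_S = []
    for k, row in enumerate(S):
        # Gradient of cross-entropy w.r.t. row k is (p_k - y_k) * feature.
        error = probs[k] - (1.0 if k == label else 0.0)
        new_S.append([w - lr * error * x for w, x in zip(row, feature)])
    return new_S


feature = [1.5, 0.9, -1.0]           # pooled features of one training sentence
S = [[0.0] * 3 for _ in range(2)]    # two hypothetical relation classes
before = classifier_probs(S, feature)[1]
for _ in range(20):
    S = gradient_step(S, feature, label=1)
after = classifier_probs(S, feature)[1]
print(before < after)  # True: the probability of the labeled class increases
```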
The electronic health record entity relation extraction device of this embodiment uses a shallow network. The input layer of the network is the matrix of word vectors obtained by mapping a natural-language sentence; this matrix passes through the convolutional layer and the pooling layer to yield the features, and a Softmax classifier then outputs the class label. The convolutional neural network model is thus used to mine the relations between entities in electronic health records, providing a technical approach for automatically learning electronic health record information.
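The forward pass of such a shallow network can be sketched end to end as follows. Everything concrete here is hypothetical: the placeholder tokens, the randomly generated word vectors, the toy sizes, and the relation label names. The parameters are untrained, so the predicted label is arbitrary; the sketch only shows how data flows from the sentence matrix to a class label.

```python
import math
import random


def conv_max(matrix, kernel):
    """Convolve one h x m kernel down the n x m matrix, then max-pool to a scalar."""
    h = len(kernel)
    return max(
        sum(x * k
            for row, kernel_row in zip(matrix[i:i + h], kernel)
            for x, k in zip(row, kernel_row))
        for i in range(len(matrix) - h + 1)
    )


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens, word_vectors, kernels, S, labels):
    matrix = [word_vectors[t] for t in tokens]        # input layer: n x m matrix
    feature = [conv_max(matrix, k) for k in kernels]  # convolution + max pooling
    scores = [sum(w * f for w, f in zip(row, feature)) for row in S]
    probs = softmax(scores)                           # Softmax classifier
    return labels[probs.index(max(probs))], probs


rng = random.Random(0)
m = 4  # toy word-vector size (the embodiment uses 400)
tokens = ["w1", "w2", "w3", "w4", "w5", "w6"]  # placeholder words of one sentence
word_vectors = {t: [rng.uniform(-1, 1) for _ in range(m)] for t in tokens}
kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(h)]
           for h in (3, 4, 5) for _ in range(2)]  # 2 kernels per height, not 100
labels = ["relation_A", "relation_B", "no_relation"]  # hypothetical class labels
S = [[rng.uniform(-0.1, 0.1) for _ in range(len(kernels))] for _ in labels]

label, probs = predict(tokens, word_vectors, kernels, S, labels)
print(label, [round(p, 3) for p in probs])
```

The sentence must contain at least as many words as the tallest kernel has rows; otherwise no window exists and `conv_max` raises a `ValueError`.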
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. The present disclosure is not, however, limited to the specific details of those embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to its technical solutions, and these simple modifications all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided they do not contradict one another. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the different embodiments of the present disclosure may also be combined arbitrarily. As long as such a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.

Claims (10)

1. An electronic health record entity relation extraction method, characterized in that the method comprises:
obtaining, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic health record natural-language sentence is mapped;
inputting the electronic health record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector;
inputting the feature vector into a trained classifier to extract the entity relation of the electronic health record natural-language sentence under test.
2. The method according to claim 1, characterized in that the step of obtaining, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic health record natural-language sentence is mapped comprises:
segmenting each electronic health record natural-language sentence into words;
mapping each word to an m-dimensional vector;
representing each mapped electronic health record natural-language sentence as an n × m matrix, wherein the column dimension of the matrix is m and the row dimension is the number n of the words.
3. The method according to claim 1, characterized in that, before the step of inputting the electronic health record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector, the method further comprises:
sliding a convolution kernel to obtain the convolution results with the matrix of the mapped electronic health record natural-language sentence;
obtaining, from the convolution results, the features of the electronic health record natural-language sentence through a max-pooling layer;
training the convolutional neural network model using existing electronic health record training set data and the features, to obtain convolution kernel parameters and classifier parameters.
4. The method according to claim 3, characterized in that, before the step of sliding a convolution kernel to obtain the convolution results with the matrix of the mapped electronic health record natural-language sentence, the method further comprises:
setting to random values the values of the convolution kernels whose row dimensions span multiple adjacent words in the electronic health record natural-language sentence.
5. The method according to claim 3, characterized in that the step of training the convolutional neural network model using existing electronic health record training set data and the features, to obtain convolution kernel parameters and classifier parameters, comprises:
selecting existing electronic health record training set data and labeling the entity relations in the existing electronic health record training set data by class;
training the convolutional neural network model according to the class labels and the features obtained through the max-pooling layer, to obtain the convolution kernel parameters and the classifier parameters.
6. An electronic health record entity relation extraction device, characterized in that the device comprises:
a matrix acquisition module, configured to obtain, by means of a convolutional neural network model and a word-vector representation, the matrix to which an electronic health record natural-language sentence is mapped;
a computing module, configured to input the electronic health record natural-language sentence under test into the trained convolutional neural network model to obtain a feature vector;
an extraction module, configured to input the feature vector into a trained classifier to extract the entity relation of the electronic health record natural-language sentence under test.
7. The device according to claim 6, characterized in that the matrix acquisition module comprises:
a segmentation submodule, configured to segment each electronic health record natural-language sentence into words;
a mapping submodule, configured to map each word to an m-dimensional vector;
a matrix output submodule, configured to represent each mapped electronic health record natural-language sentence as an n × m matrix, wherein the column dimension of the matrix is m and the row dimension is the number n of the words.
8. The device according to claim 6, characterized in that the device further comprises:
a convolution module, configured to slide a convolution kernel to obtain the convolution results with the matrix of the mapped electronic health record natural-language sentence;
a feature calculation module, configured to obtain, from the convolution results, the features of the electronic health record natural-language sentence through a max-pooling layer;
a parameter calculation module, configured to train the convolutional neural network model using existing electronic health record training set data and the features, to obtain convolution kernel parameters and classifier parameters.
9. The device according to claim 8, characterized in that the device further comprises:
a setting module, configured to set to random values the values of the convolution kernels whose row dimensions span multiple adjacent words in the electronic health record natural-language sentence.
10. The device according to claim 8, characterized in that the parameter calculation module comprises:
a classification labeling submodule, configured to select existing electronic health record training set data and to label the entity relations in the existing electronic health record training set data by class;
a parameter calculation submodule, configured to train the convolutional neural network model according to the class labels and the features obtained through the max-pooling layer, to obtain the convolution kernel parameters and the classifier parameters.
CN201610798932.4A 2016-08-31 2016-08-31 Electronic health record entity relation extraction method and device Active CN106446526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610798932.4A CN106446526B (en) 2016-08-31 2016-08-31 Electronic health record entity relation extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610798932.4A CN106446526B (en) 2016-08-31 2016-08-31 Electronic health record entity relation extraction method and device

Publications (2)

Publication Number Publication Date
CN106446526A (en) 2017-02-22
CN106446526B (en) 2019-11-15

Family

ID=58164748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610798932.4A Active CN106446526B (en) 2016-08-31 2016-08-31 Electronic health record entity relation extraction method and device

Country Status (1)

Country Link
CN (1) CN106446526B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090037334A1 (en) * 2007-08-01 2009-02-05 Taipei Medical University Electronic medical record system, method for storing medical record data in the medical record system, and a portable electronic device loading the electronic medical record system therein
US20110251984A1 (en) * 2010-04-09 2011-10-13 Microsoft Corporation Web-scale entity relationship extraction
CN104809176A (en) * 2015-04-13 2015-07-29 中央民族大学 Entity relationship extracting method of Zang language
CN104915448A (en) * 2015-06-30 2015-09-16 中国科学院自动化研究所 Substance and paragraph linking method based on hierarchical convolutional network
CN104965992A (en) * 2015-07-13 2015-10-07 南开大学 Text mining method based on online medical question and answer information
CN105335712A (en) * 2015-10-26 2016-02-17 小米科技有限责任公司 Image recognition method, device and terminal
CN105512209A (en) * 2015-11-28 2016-04-20 大连理工大学 Biomedicine event trigger word identification method based on characteristic automatic learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨锦锋 等: "电子病历命名实体识别和实体关系抽取研究综述", 《自动化学报》 *
芮挺 等: "基于深度卷积神经网络的行人检测", 《计算机工程与应用》 *

Also Published As

Publication number Publication date
CN106446526B (en) 2019-11-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant