CN106383853A - Realization method and system for electronic medical record post-structuring and auxiliary diagnosis - Google Patents
- Publication number
- CN106383853A (application number CN201610787187.3A)
- Authority
- CN
- China
- Prior art keywords
- document
- dictionary
- feature
- featured
- term
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention relates to a method and system for post-structuring of electronic medical records and for auxiliary diagnosis. Several distance measures are combined. The string edit distance is the minimum number of substitution, insertion and deletion operations needed to convert one character string into another. The Jaro-Winkler distance measures the similarity between two character strings and is used to detect duplicate records. The geometric mean of the Chinese-character distance, the pinyin distance and the Wubi (five-stroke input method) distance serves as the composite measure of similarity between feature texts. Feature ranking uses the TF-IDF method, which assesses how important a feature term is to a document in a document collection or corpus: a term's importance is proportional to its frequency in the document and inversely proportional to the number of documents in the corpus in which it occurs. Using the generated feature terms, documents are converted into the file format for PU learning (learning from a positive data set and an unlabeled data set). Through PU learning, the system automatically recommends relevant diagnoses for clinical staff to consult.
Description
Technical field
The present invention relates to an electronic medical record structuring system and its implementation. In particular, it relates to a method and system for post-structuring of electronic medical records and for auxiliary diagnosis.
Background art
Traditional electronic medical record data is recorded as textual descriptions. Medical record structure does follow certain standards, but the clinical domain is complex. Each specialty has its own content, and even identical content is described in different ways, so producing well-structured electronic medical records is difficult.

One approach is to extract structured content from plain-text descriptions using natural language processing (NLP). Another is to structure medical record information at the point of entry.

A fully structured electronic medical record system sometimes cannot capture what the clinician actually means. Full structuring can make clinical data analysis and research more convenient, but it places heavy demands on users. It also requires a high degree of standardization: every description needs a corresponding standard medical term, yet the concept granularity of standard terminology coding systems is not that fine. In practice, the accuracy that standardization brings conflicts with the flexibility of data entry.

Standards addressing this problem have been released internationally, such as SNOMED (the Systematized Nomenclature of Human and Veterinary Medicine), SNOMED CT and ICD-10 (International Classification of Diseases). In actual use, however, they usually need substantial adjustment, and their Chinese localization lags behind. These factors hinder structured entry with standardized medical terminology, and they also hamper the mining of medical data from electronic medical records.

Given these practical conditions, hospital clinical staff find it difficult to use fully structured electronic medical record systems on a large scale. Structured electronic medical records have many advantages, but they are hard to implement and demand a lot of their users. Free-text input, by comparison, is much more flexible and easier to roll out and use.
Mainstream domestic electronic medical record systems are now designed with structuring in mind. Because medicine is complex and variable, however, fully structured records remain difficult to achieve.

Some systems have been designed to support structuring and clinical decision support. In practice, though, data must be entered according to the electronic medical record specification, usually in the units the system provides. This relatively standardized input makes later data extraction and use easier. The prerequisite is that the structured templates actually fit the requirements of medical record structuring.

Structured templates require cooperation from domain experts and involve a heavy workload. Structured nursing records and surgical records, for example, need different templates for different examination sites, anesthesia methods and levels of nursing care. Medical experts have to take part, and the quality and involvement of the staff strongly affect the result. Even so, templates can hardly cover every complex situation.

In addition, the standardization of the medical terminology used in structured electronic medical records still lacks a complete, unified and easy-to-use classification system and related standards with a broad base of practical application.
Standard terminology sets such as SNOMED, SNOMED CT, ICD-10 and ICD-9-CM (International Classification of Diseases, Clinical Modification) exist. However, they are essentially translations from foreign languages, so the Chinese versions of many terms are unsatisfactory, which causes inconvenience in daily work. Because of these shortcomings, structured medical records have been implemented with unsatisfactory results. In particular, normalized patterns such as {body part}{common symptom}{number}{time unit} are far from common.

Reportedly, most electronic medical record systems now used in hospitals did not consider structuring at design time. Even so-called structured XML-based electronic medical records are only partially structured, not structured in the true sense. Sections such as the chief complaint, past medical history and laboratory examinations are largely free text. Yet this information often has the greatest reference value, and the feature elements it contains give important guidance for clinical research.
Some existing articles discuss structuring electronic medical records and identifying features in structured data. They presuppose, however, that the system was designed around a standardized structure from the start: customized templates that meet the requirements, structured data entry, and relatively standardized medical terminology sets. Unfortunately, many electronic medical record systems were not designed this way or are not used this way.

In practice, standardization and free entry are inherently in conflict. Pursuing standardization reduces freedom, and pursuing freedom produces large amounts of non-standard data. That data must then be analyzed in depth, using specific techniques to screen, refine and analyze feature terms. Only when every intermediate step is handled well can the results give meaningful guidance for clinical data analysis.

The shortcomings of structured electronic medical records have hindered the development of EMR structuring in China, so many hospitals still use free-text electronic medical record systems. Such records merely transcribe paper data into electronic form, which hampers in-depth analysis. Many systems also have no established standard or unified specification for input, which is a potential obstacle to later data exchange, integration and analysis.

Standardizing all data in one step is unrealistic. It is therefore worthwhile to structure and standardize the existing unstructured, non-standard data. Only after the data is structured can the required information be extracted from it and analyzed, providing proper support for clinical medical research.
(1) In electronic medical records, named entity recognition covers not only person and place names but also disease, symptom, operation and drug names.

Statistical learning methods are trained on annotated corpora, so annotating the corpus does not require much domain knowledge. This approach is now widely used in natural language processing. Common statistical learning models include support vector machines (SVM), hidden Markov models (HMM), maximum entropy Markov models (MEMM) and conditional random fields (CRF).

The properties of hidden Markov models allow them to be used for automatic Chinese word segmentation and part-of-speech tagging. HMMs also appear in other Chinese segmentation methods. One of these, character-based word formation, has achieved good results. It was proposed by N. Xue et al. and treats word segmentation as a character classification problem. Conventional methods first build a dictionary and then segment text by dictionary lookup. The character-based method instead assigns each Chinese character a position tag (lexeme position) within the word it belongs to. The tags are generally word beginning (B), word inside (I), word end (E) and single-character word (S).

The conditional random field model builds on hidden Markov and maximum entropy models. It is a conditional probability model for labeling and segmenting sequential data, and is a discriminative, undirected probabilistic graphical model. CRFs have been applied successfully in natural language processing (NLP), bioinformatics, web intelligence and other fields.

(2) Some feature terms obtained by entity recognition have similar or even identical meanings, simply because operators entered non-standard terms. For example, "coronary stenting" and "coronary artery stent implantation" denote the same procedure, but the non-standard input causes the system to extract two different feature terms. Features are therefore standardized by computing the similarity between feature terms.
Summary of the invention
To overcome the above deficiencies of the prior art, an object of the present invention is to provide a method and system for post-structuring of electronic medical records and for auxiliary diagnosis that achieve the structuring of medical record information.
The object of the present invention is achieved by the following technical solution:
The present invention provides a method for post-structuring of electronic medical records and for auxiliary diagnosis. The improvement is that the method comprises the following steps:
(1) Structured processing of electronic medical record text:
S11: build the medical dictionaries;
S12: build the medical corpus;
S13: process the medical feature terms;
(2) Auxiliary diagnosis management:
S21: determine the feature term frequencies formed by the feature term set and the electronic medical record documents;
S22: perform PU training and PU learning on the feature term frequencies;
S23: obtain the auxiliary diagnosis result.
Further, in step S11, the medical dictionaries include:
Standard medical dictionaries. The standard is data from ICD-10 (the globally used International Statistical Classification of Diseases and Related Health Problems, 10th Revision), ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification, covering operations and procedures) and SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms).
Clinical application dictionaries. These comprise an internal dictionary and a synonym dictionary (thesaurus). The internal dictionary includes a clinical symptom dictionary and other related dictionaries, such as examination terms.
The thesaurus includes three mappings: non-standard feature terms to standard feature terms, misspelled words to standard feature terms, and a single standard term to multiple standard terms.
Further, in step S12, building the medical corpus comprises the following steps:
S121: extract electronic medical record documents from the electronic medical record database;
S122: apply part-of-speech tagging and lexeme-position tagging to the documents;
S123: integrate the tagged documents;
S124: create feature templates and train the CRF model with them;
S125: generate the feature data and evaluate the performance of the CRF model;
S126: produce the final medical corpus.
Further, in step S122, part-of-speech tagging preprocesses the extracted electronic medical record documents to obtain the part of speech of their text. The result is combined with the lexeme-position tags, converted into the conditional random field (CRF) format, and passed to the CRF algorithm for feature extraction. The automatically tagged documents are then checked manually.
Lexeme-position tagging uses the standard medical dictionary to raise the hit rate on the text of the documents and applies the reverse maximum matching algorithm. Reverse maximum matching scans from the end of the document being processed. At each step it takes the last 2i bytes (an i-character Chinese string) as the matching field. If matching fails, it removes the first character of the field and tries again. Matched medical terms are then tagged automatically with beginning B, inside I and end E.
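The reverse maximum matching and B/I/E tagging of step S122 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It counts in characters rather than bytes and makes a few assumptions: `max_len`, the `O` tag for characters outside dictionary terms, and the two-column output are not specified by the patent. The part-of-speech column is omitted for brevity.

```python
def reverse_max_match(text, dictionary, max_len=6):
    """Reverse maximum matching: scan from the end of `text`, take the last
    `max_len` characters as the matching field and, while the field is not a
    dictionary word, drop its first (leftmost) character. A field shrunk to a
    single character is accepted as-is."""
    tokens, end = [], len(text)
    while end > 0:
        start = max(0, end - max_len)
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1  # remove the foremost character and retry
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]  # tokens were collected right to left


def bie_rows(tokens, dictionary):
    """Tag characters of multi-character dictionary terms B/I/E and all other
    characters O (the O tag is an assumption of this sketch)."""
    rows = []
    for tok in tokens:
        if len(tok) > 1 and tok in dictionary:
            tags = ["B"] + ["I"] * (len(tok) - 2) + ["E"]
        else:
            tags = ["O"] * len(tok)
        rows.extend(zip(tok, tags))
    return rows


def to_crf_lines(rows):
    """One character per line with TAB-separated columns, CRF-style."""
    return "\n".join(f"{ch}\t{tag}" for ch, tag in rows) + "\n"
```

For example, with the dictionary {"胸痛", "冠状动脉", "支架"}, the sentence "患者胸痛伴冠状动脉狭窄" segments into 患 / 者 / 胸痛 / 伴 / 冠状动脉 / 狭 / 窄. Only the two dictionary terms receive B/I/E tags.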
For step S124, after CRF training the model is tested with the command "crf_test -m model test.data > output.txt", which writes the results to output.txt. The expected labels are then compared with the predicted labels.
In the CRF output written to output.txt, the columns are separated by TAB characters. All TABs must be replaced with ordinary spaces, because conlleval.pl expects space-separated columns.
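The following is a minimal sketch of the post-processing described above. It converts TAB separators to spaces and reads back (expected, predicted) label pairs. It assumes the expected tag is the last column of test.data. crf_test appends its predicted tag as a further column, so each output line ends with the expected and predicted tags. The output file name `output_space.txt` is hypothetical.

```python
from pathlib import Path


def tabs_to_spaces(text):
    """crf_test separates columns with TABs; conlleval.pl splits on spaces."""
    return text.replace("\t", " ")


def gold_vs_pred(text):
    """Return (expected, predicted) pairs from the last two columns of each
    non-empty line; blank lines separate sentences and are skipped."""
    pairs = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) >= 2:
            pairs.append((cols[-2], cols[-1]))
    return pairs


def convert_file(src="output.txt", dst="output_space.txt"):
    """Rewrite a crf_test result file with space-separated columns."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(tabs_to_spaces(text), encoding="utf-8")
```

Because `replace` touches only TAB characters, blank lines between sentences, which conlleval.pl relies on, are preserved.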
In step S125, the performance of the CRF model is evaluated with the following criteria:
TP (true positive): positive samples that the model predicts as positive;
FP (false positive): negative samples that the model predicts as positive;
FN (false negative): positive samples that the model predicts as negative;
TN (true negative): negative samples that the model predicts as negative;
Precision: P = TP/(TP + FP);
Recall: R = TP/(TP + FN), i.e., the true positive rate;
F1 (composite classification measure): the harmonic mean of precision and recall, F1 = 2*P*R/(P + R). It lies closer to the smaller of P and R.
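The formulas above translate directly into code. This sketch counts TP, FP, FN and TN per token for a single tag treated as the positive class, which is an assumption. conlleval.pl itself scores whole chunks rather than single tokens.

```python
def confusion(gold, pred, positive):
    """Count TP, FP, FN, TN for one label treated as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def prf(tp, fp, fn):
    """Precision, recall and F1; 0.0 wherever a denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

With TP = 80, FP = 20 and FN = 40, precision is 0.8, recall is 2/3 and F1 is 8/11 (about 0.727). With P = 0.9 and R = 0.1, F1 is only 0.18, which shows how the harmonic mean is pulled toward the smaller value.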
Further, in step S13, medical feature term processing comprises the following steps:
S131: Processing the electronic medical record documents with the CRF model yields a text in which each character of the test-set data is tagged with its position in a word: beginning B, inside I or end E. A program then collects the feature set. Some feature words in this set are original words from the dictionaries. Others appear in none of the related dictionaries; they are feature words produced by training the CRF on manually annotated feature templates, i.e., out-of-vocabulary (unregistered) words.
S132: Feature extraction yields a feature term set containing both standard and non-standard feature terms. Each non-standard feature term is compared, using the thesaurus that maps non-standard feature terms to standard ones, with the non-standard feature terms in the thesaurus. The resulting similarity ranking and similarity values are sorted in descending order.
S133: The similarity threshold is initially set to 50% (similarity of at least 50%). Non-standard feature terms that meet the threshold are recommended to the operator for reference as candidate feature terms, together with their corresponding standard feature terms. The operator decides which standard feature term corresponds to each non-standard term, and that term becomes the final standard feature term. The threshold can be set freely by hand.
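Steps S131 to S133 can be sketched as below. The sketch rebuilds terms from the CRF's B/I/E tags, separates dictionary words from out-of-vocabulary words, and ranks thesaurus candidates that meet the 50% threshold for the operator to confirm. `difflib.SequenceMatcher.ratio()` is a stand-in assumed here for brevity; it is not the composite similarity described in step S133. The function names and the `O`/`S` tag handling are illustrative.

```python
from difflib import SequenceMatcher


def extract_terms(rows):
    """Rebuild terms from (character, tag) pairs: B opens a term, I extends it,
    E closes it, S (if used) is a one-character term; any other tag (e.g. an
    assumed O) discards a partial term."""
    terms, buf = [], []
    for ch, tag in rows:
        if tag == "B":
            buf = [ch]
        elif tag == "I" and buf:
            buf.append(ch)
        elif tag == "E" and buf:
            terms.append("".join(buf + [ch]))
            buf = []
        elif tag == "S":
            terms.append(ch)
            buf = []
        else:
            buf = []
    return list(dict.fromkeys(terms))  # de-duplicate, keep first-seen order


def split_oov(terms, dictionary):
    """Separate dictionary words from out-of-vocabulary (unregistered) words."""
    return ([t for t in terms if t in dictionary],
            [t for t in terms if t not in dictionary])


def candidates(term, thesaurus, threshold=0.5):
    """Score `term` against the non-standard side of the thesaurus
    (non-standard -> standard) and return (non-standard, standard, score)
    tuples with score >= threshold, highest score first."""
    scored = [(nonstd, std, SequenceMatcher(None, term, nonstd).ratio())
              for nonstd, std in thesaurus.items()]
    kept = [c for c in scored if c[2] >= threshold]
    return sorted(kept, key=lambda c: c[2], reverse=True)
```

The operator still makes the final choice. The function only narrows the list, mirroring the manual confirmation in S133.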
Further, in step S132, the weights of each feature term across all electronic medical record documents (computed with the TF-IDF method) are summed. This gives each feature term's average weight over all documents, and the terms are then ranked from largest to smallest.
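The ranking in step S132 can be illustrated with a short TF-IDF sketch. TF-IDF has several variants, and the patent does not say which it uses. This sketch assumes tf = count / document length and the unsmoothed idf = ln(N / df), and it averages over all N documents, so a document that lacks a term contributes 0.

```python
import math
from collections import Counter


def average_tfidf(docs):
    """Rank feature terms by mean TF-IDF weight over all documents.

    docs: list of documents, each a list of feature terms."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            totals[term] += (count / len(doc)) * math.log(n / df[term])
    ranked = [(term, total / n) for term, total in totals.items()]
    ranked.sort(key=lambda kv: kv[1], reverse=True)
    return ranked
```

Take the documents [胸痛, 发热, 胸痛], [发热, 咳嗽] and [发热]. 发热 appears in every document, so its idf and weight are 0. The ranking is 胸痛 (about 0.244), then 咳嗽 (about 0.183), then 发热.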
In step S133, the similarity between feature terms in the corresponding feature term sets is measured with a feature-text similarity measure. The composite similarity is the geometric mean of the Chinese-character distance, the pinyin distance and the Wubi distance.
For each of these three representations, two measures are computed: the Jaro-Winkler string similarity and the string edit distance. Their mean is the distance measure for that representation.
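The following sketch shows one reading of the composite measure in step S133. For each representation (Chinese characters, pinyin, Wubi code), it averages the Jaro-Winkler similarity with an edit-distance similarity and then takes the geometric mean of the three. Two assumptions apply. First, edit distance is turned into a similarity as 1 - d / max(len), which the patent does not specify. Second, the caller supplies the pinyin and Wubi strings, for example from a pinyin converter or a Wubi code table; that conversion is not shown.

```python
import math


def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear out of order.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    t = sum(x != y for x, y in zip(m1, m2)) // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix (up to 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def string_similarity(a, b):
    """Mean of Jaro-Winkler and normalized edit-distance similarity."""
    longest = max(len(a), len(b))
    edit_sim = 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
    return (jaro_winkler(a, b) + edit_sim) / 2


def composite_similarity(x, y):
    """x, y: (hanzi, pinyin, wubi) triples; geometric mean of the three scores."""
    sims = [string_similarity(a, b) for a, b in zip(x, y)]
    return math.prod(sims) ** (1 / len(sims))
```

One consequence of the geometric mean is that a zero score on any single representation makes the composite score zero. An arithmetic mean would not behave this way.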
Further, step S22 includes the following. The feature term frequencies form a test data set consisting of a positive document data set and an unlabeled document data set. A learning framework over the sets P and U learns from these two sets and separates the positive documents in the test data set from the negative ones. This is PU learning: P denotes the positive document data set, and U denotes the unlabeled data set, in which the negative documents are to be found. Without labeling any negative documents, a classifier is learned and used to label the unlabeled document data set, yielding the required documents.
Further, in step S22, medical records with a confirmed diagnosis of a given disease are labeled to form the positive document data set. Unlabeled medical records form the unlabeled document data set, and the two sets together make up the training set. The classifier obtained with the PU learning framework is then used to label subsequent electronic medical record documents, which achieves auxiliary diagnosis.
The present invention also provides a system for post-structuring of electronic medical records and for auxiliary diagnosis. The improvement is that the system comprises:
Medical dictionary management module: manages the standard dictionaries and the clinical application dictionaries. The clinical application dictionaries comprise an internal dictionary and a thesaurus. The internal dictionary includes a clinical symptom dictionary and other related dictionaries, such as examination terms. The thesaurus maps non-standard feature terms to standard feature terms, misspelled words to standard feature terms, and a single standard term to multiple standard terms.
Medical corpus management module: extracts electronic medical record document data; performs part-of-speech tagging and lexeme-position tagging; creates feature templates; and performs feature annotation and feature extraction.
Medical feature term processing module: manages the standardization of feature terms.
Auxiliary diagnosis management module: manages the PU learning framework, PU learning training and testing, and auxiliary diagnosis based on PU learning.
To give a basic understanding of some aspects of the disclosed embodiments, a brief summary follows. It is not an extensive overview. It does not identify key or critical elements, and it does not delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in simplified form as a prelude to the detailed description that follows.
Compared with the closest prior art, the technical solution of the present invention has the following beneficial effects:
The present invention standardizes features by computing the similarity between feature terms. Its feature-text similarity measure combines several distance metrics. The Jaro-Winkler distance, a variant of the Jaro distance, measures the similarity between two character strings and is used to detect duplicate records. The string edit distance is the minimum number of substitutions, insertions and deletions needed to turn one string into another. The final composite similarity measure is the geometric mean of the Chinese-character distance, the pinyin distance and the Wubi distance.

Feature ranking uses TF-IDF (term frequency-inverse document frequency), a statistical method that assesses how important a feature term is to one document within a document collection or corpus. A term's importance rises in proportion to the number of times it appears in the document and falls with the frequency with which it appears across the corpus.

Based on the generated feature terms, documents are converted into the file format for PU learning (learning from a positive data set and an unlabeled data set). Through PU learning, the system automatically recommends relevant diagnoses for clinical staff to consult.
To achieve the foregoing and related ends, the one or more embodiments comprise the features fully described below and particularly pointed out in the claims. The following description and the drawings set out certain illustrative aspects in detail. These indicate only a few of the various ways in which the principles of the embodiments may be employed. Other benefits and novel features will become apparent from the following detailed description when read with the drawings. The disclosed embodiments are intended to include all such aspects and their equivalents.
Brief description of the drawings
Fig. 1 is a structural block diagram of the system for post-structuring of electronic medical records and auxiliary diagnosis according to the first preferred technical solution of the present invention;
Fig. 2 is a structure diagram of the medical dictionaries provided by the present invention;
Fig. 3 is a schematic diagram of the synthesis of multiple standard terms provided by the present invention;
Fig. 4 is a schematic diagram of the process for building the medical-domain corpus provided by the present invention;
Fig. 5 is a schematic diagram of the conditional random field (CRF) data format provided by the present invention;
Fig. 6 is a flow chart of lexeme-position tagging of the corpus provided by the present invention;
Fig. 7 is a schematic diagram of the CRF training file format for character-based word formation provided by the present invention;
Fig. 8 is a schematic diagram of feature template 1 and feature template 2 provided by the present invention;
Fig. 9 is a schematic diagram of the feature term processing flow provided by the present invention;
Fig. 10 is a schematic diagram of the annotation of non-standard feature terms provided by the present invention;
Fig. 11 is a flow chart of the auxiliary diagnosis provided by the present invention;
Fig. 12 is a schematic diagram of PU learning without class labels according to the second preferred technical solution of the present invention;
Fig. 13 is a schematic diagram of PU learning with class labels according to the second preferred technical solution of the present invention;
Fig. 14 is a schematic diagram of the recall of positive documents according to the second preferred technical solution of the present invention;
Fig. 15 is a schematic diagram of the precision of positive documents according to the second preferred technical solution of the present invention;
Fig. 16 is a schematic diagram of the F-value according to the second preferred technical solution of the present invention;
Fig. 17 is a schematic diagram of the overall accuracy according to the second preferred technical solution of the present invention;
Fig. 18 is a schematic diagram of the recall and precision of the composite similarity provided by the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings.
The following description and drawings illustrate specific embodiments of the present invention sufficiently to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. The scope of the embodiments of the present invention encompasses the full scope of the claims and all available equivalents of the claims. Herein, these embodiments may be referred to, individually or collectively, by the term "invention". This is merely for convenience and, if more than one invention is in fact disclosed, is not intended to limit the scope of this application to any single invention or inventive concept.
First preferred technical solution:
Fig. 1 is the structural block diagram of the system for post-structuring of electronic medical records and auxiliary diagnosis according to the first preferred technical solution of the present invention. The present invention provides a method for post-structuring of electronic medical records and for auxiliary diagnosis, comprising the following steps:
(1) Structured processing of electronic medical record text, including:
S11: Building the relevant medical dictionaries:
Word segmentation tools are generally not built for the medical domain, so their built-in dictionaries cannot contain most specialized medical terms. To build the relevant dictionaries quickly, the present invention uses part of the data from ICD-10, ICD-9-CM and SNOMED CT as the standard. This data is combined with the hospital's clinical application dictionaries to form the medical dictionaries, as shown in Fig. 2.
The medical dictionaries include:
Standard medical dictionaries. The standard is data from ICD-10 (the globally used International Statistical Classification of Diseases and Related Health Problems, 10th Revision), ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification, covering operations and procedures) and SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms).
1. Hospital clinical application dictionaries. These comprise an internal dictionary and a synonym dictionary (thesaurus). The internal dictionary includes a clinical symptom dictionary and other related dictionaries, such as examination terms.
(1) Internal dictionary:
Clinical symptom dictionary:
For example: chills, fever, shivering, cough, expectoration, headache (in two written forms), dizziness, nasal obstruction, runny nose, chest tightness, asthma, abdominal pain, abdominal distension, frequent urination, urgent urination, muscle soreness, malaise, weakness, expiratory dyspnea and hemoptysis.
Other related dictionaries:
Various examination terms, such as full chest radiograph and chest CT.
(2) Synonym dictionary (thesaurus). It maps non-standard feature terms to standard feature terms, misspelled words to standard feature terms, and a single standard term to multiple standard terms.
Clinicians who write electronic medical records differ in medical background and in their command of medical knowledge, so their mastery of standard medical terminology also varies. Expecting every doctor to know all standard clinical terms precisely is unrealistic, and clerical errors also occur during entry. The synonym dictionary should therefore contain the following three parts, which can be merged into one dictionary:
Mappings from non-standard feature terms to standard feature terms, as shown in Table 1.
Table 1 Non-standard feature term to standard feature term mapping
Non-standard feature term | Standard feature term |
Crohn disease (variant transliteration) | Crohn disease (regional ileitis) |
Kernig's sign | Kernig sign |
Coughing blood, coughing up phlegm | Hemoptysis, expectoration |
Antibiotic (variant term) | Antibiotic |
Anti-inflammatory treatment | Anti-infective therapy |
Cranial nerve (variant term) | Cranial nerve |
Presbyopic | Presbyopia |
Lymph gland | Lymph node |
Presenile dementia | Alzheimer disease |
Frozen section | Freezing microtome section |
Rale (variant term) | Rale |
Lymphoblast (variant term) | Lymphoblast |
Mold | Fungus |
Mappings from misspelled words to standard feature terms, as shown in Table 2.
Table 2 Misspelled word to standard feature term mapping
Misspelled word | Standard feature term |
"Tang's urine disease", "sugared ornithosis" | Diabetes |
Spontaneous immunity | Autoimmunity |
In practical use, Tables 1 and 2 can also be merged into one dictionary, the synonym dictionary.
A further case is that some terms have several standard expressions, any of which is acceptable. In actual structuring, however, a dictionary should be built following the approach of SNOMED CT, namely a mapping from a single standard term to multiple standard terms. This case can also be regarded as a special case of the mapping from non-standard to standard feature terms, and it too can be merged with Tables 1 and 2 into one synonym dictionary, as shown in Figure 3.
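A minimal sketch of the merged synonym dictionary described above, assuming it is held in memory as a flat Python dict from each variant form to its standard term. The function names `build_thesaurus` and `normalize` are hypothetical, the sample entries come from Tables 1 and 2, and the "Heart attack" entry is a hypothetical illustration of the SNOMED CT-style one-to-many case.

```python
def build_thesaurus(non_standard: dict[str, str],
                    misspellings: dict[str, str],
                    preferred_synonyms: dict[str, list[str]]) -> dict[str, str]:
    """Merge the three mapping types into one variant -> standard-term dict."""
    thesaurus: dict[str, str] = {}
    thesaurus.update(non_standard)      # Table 1: non-standard term -> standard term
    thesaurus.update(misspellings)      # Table 2: misspelled word -> standard term
    # One preferred standard term with several acceptable expressions:
    # each alternative expression is mapped back to the preferred term.
    for preferred, synonyms in preferred_synonyms.items():
        for synonym in synonyms:
            thesaurus[synonym] = preferred
    return thesaurus


def normalize(term: str, thesaurus: dict[str, str]) -> str:
    """Return the standard term for `term`, or `term` itself if it is not mapped."""
    return thesaurus.get(term, term)


thesaurus = build_thesaurus(
    non_standard={"Kernig's sign": "Kernig sign", "Presbyopic": "Presbyopia",
                  "Lymph gland": "Lymph node"},
    misspellings={"Spontaneous immunity": "Autoimmunity"},
    preferred_synonyms={"Myocardial infarction": ["Heart attack"]},  # hypothetical entry
)
print(normalize("Lymph gland", thesaurus))  # Lymph node
print(normalize("Cough", thesaurus))        # Cough (already standard, unchanged)
```

A single flat dict keeps lookup to one hash access per term, which is why merging the three tables into one dictionary is convenient at run time.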
S12: Building the relevant medical corpus: a corpus for the medical domain is built and maintained. As shown in Figure 4, this includes the following steps:
S121: Extract electronic medical record documents from the electronic health record database;
S122: Perform part-of-speech tagging and word-position tagging on the electronic medical record documents;
S123: Integrate the documents after part-of-speech tagging and word-position tagging;
S124: Create feature templates for training with the CRF (conditional random field) algorithm;
S125: Generate the feature data and evaluate the effectiveness of the CRF algorithm;
S126: Form the final medical corpus.
Specifically:
In step S121, extracting the electronic medical record documents includes the following.
Because manually annotating a large-scale corpus is difficult, a human-machine cooperative approach is used to build a small-scale corpus quickly, with the following steps:
1. 887 electronic medical record documents were collected manually, covering patients from departments such as cardiology, oncology and respiratory medicine.
2. A program automatically extracts the text of each patient's chief complaint, history of present illness, past medical history, and laboratory and instrumental examinations as the original files to be processed.
3. Finally, on this basis, the text is annotated automatically with the corresponding tools and the annotations are then reviewed manually, so that a corpus can be built quickly.
In step S122:
1. Part-of-speech tagging of the corpus
The ICTCLAS word segmentation system of the Chinese Academy of Sciences is a Chinese lexical analysis system based on a hierarchical hidden Markov model. Its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition and unknown-word recognition; it supports user-supplied dictionaries and is widely used throughout Chinese information processing.
The present invention builds on ICTCLAS through secondary development to preprocess the text before annotation. The purpose of this module is to obtain the part of speech of the text quickly, so that conditional random fields can then be used for feature extraction. Selected results are shown below:
【Master/a tells/v:/ w cough/v expectoration/expiratory dyspnea/n3/n days/q of n companion/v./ w is existing/t medical history/n:/ w3/n days/q
Before/f patient/n is in hospital in/p our hospital/n breathing/v section/n/v during/f appearances/v cough/v ,/w expectoration/n ,/w independently/v row/v
Phlegm/n difficulty/a ,/w need/v auxiliary/v row/v phlegm/n ,/w is /p is a large amount of/m grey/n mucus/n phlegm/n ,/w not /d is shown in/v phlegm/n
In/f band/v blood/n.During/w/n has/v expiratory dyspnea/n companion/vSPO2/x decline/v (/w minimum/a70%/m)/w ,/w gives/and v turns over
After the body/v bat/v back of the body/v suction/v phlegm/n/f improvement/v.In/w the course of disease/n/f no/v heating/v ,/w no/v nausea/a vomiting/n ,/w
No/v drop in blood pressure/n ,/w no/v spitting of blood/n ,/w no/v is black/a just/n./ w chest/nCT/x shows/v (/w2013-6-15/m)/
w:/ w is slow/and a props up/q changes/v pulmonary emphysema/n companion/v infection/v ,/w two/m pulmonary fibrosis/n ,/w both sides/f pleura/n plumpness/a companion/
V pleural effusion/n ,/w fall/v sustainer/n increasing/v width/a./w】
To meet the file-format requirements for secondary development with CRF++-0.53, a computer program converts the ICTCLAS segmentation results into the required format, as shown in Figure 5.
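The conversion from ICTCLAS output to a CRF++ input file can be sketched as below. This assumes the segmenter emits space-separated `word/tag` tokens, as in the sample above, and that the target is the general CRF++ layout of one token per line with TAB-separated columns and a blank line between sentences; the exact columns of Figure 5 are not reproduced, and `ictclas_to_crf` is a hypothetical name.

```python
def ictclas_to_crf(tagged_sentences: list[str]) -> str:
    """Convert 'word/pos word/pos ...' strings into CRF++ column format."""
    blocks = []
    for sentence in tagged_sentences:
        rows = []
        for token in sentence.split():
            # rpartition splits on the LAST '/', so a word containing '/' stays intact
            word, _, pos = token.rpartition("/")
            rows.append(f"{word}\t{pos}")
        blocks.append("\n".join(rows))
    # CRF++ treats an empty line as a sentence boundary
    return "\n\n".join(blocks) + "\n"


print(ictclas_to_crf(["咳嗽/v 咳痰/n ,/w", "发热/v"]))
```

The word-position labels from the next step would be appended as a further column before training.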
2. Word-position tagging of the corpus
To obtain the training corpus that CRF learning requires, every word in the documents must be tagged with its word position. Manual tagging is clearly inefficient, so fast automatic tagging by computer is used instead. Tagging requires standard dictionaries for the medical field: the system adds the terms of ICD-10, ICD-9-CM, SNOMED, SNOMED CT, the synonym dictionary and so on to the segmentation dictionary to raise its hit rate. Terms for diagnoses, operations and examinations are usually long, so the reverse maximum matching algorithm is used to tag them automatically with prefix (B), middle (I) and suffix (E) labels. Because the dictionary cannot contain every standard medical term, the automatically tagged corpus is checked manually after dictionary matching, as shown in Figure 6. The corresponding CRF training file format resulting from this character-level word-formation tagging is shown in Figure 7.
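A minimal sketch of dictionary-based reverse maximum matching followed by B/I/E tagging. Assumptions: the dictionary is an in-memory set, `max_len` is the longest term length in characters, characters outside multi-character dictionary terms receive a placeholder tag `O` that the text does not specify, and the sample sentence and terms are illustrative only.

```python
def reverse_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Segment `text` by scanning from its end and preferring the longest match."""
    words = []
    end = len(text)
    while end > 0:
        size = min(max_len, end)
        # Shrink the window by dropping its leftmost character until it matches
        # a dictionary term; a single character is always accepted.
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1
        words.append(text[end - size:end])
        end -= size
    words.reverse()
    return words


def bie_tags(words: list[str], dictionary: set[str]) -> list[tuple[str, str]]:
    """Tag each character: B/I/E inside multi-character dictionary terms, else O."""
    tagged = []
    for word in words:
        if word in dictionary and len(word) > 1:
            tags = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        else:
            tags = ["O"] * len(word)  # placeholder tag, an assumption
        tagged.extend(zip(word, tags))
    return tagged


terms = {"肺气肿", "咳嗽"}  # illustrative terms: "emphysema", "cough"
words = reverse_max_match("患者肺气肿伴咳嗽", terms, max_len=4)
print(words)  # ['患', '者', '肺气肿', '伴', '咳嗽']
print(bie_tags(words, terms))
```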
In step S124, feature template 1 and feature template 2 provided by the present invention are shown in Figure 8.
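CRF++ feature templates are written with `%x[row,col]` macros that refer to a cell relative to the current token. The sketch below re-implements that substitution in plain Python to show what a template line turns into; it is an illustration, not CRF++ code. The boundary placeholders `_B-1`/`_B+1` mimic the ones CRF++ uses, and the template lines shown are hypothetical, not feature templates 1 and 2 of Figure 8.

```python
import re

# A CRF++-style macro %x[row,col]: row is an offset from the current token,
# col is a column index within that token's line.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template: str, rows: list[list[str]], i: int) -> str:
    """Expand every %x[row,col] in `template` for the token at position i."""
    def substitute(match) -> str:
        offset, col = int(match.group(1)), int(match.group(2))
        k = i + offset
        if k < 0:
            return f"_B{k}"                   # before the sentence: _B-1, _B-2, ...
        if k >= len(rows):
            return f"_B+{k - len(rows) + 1}"  # after the sentence: _B+1, _B+2, ...
        return rows[k][col]
    return MACRO.sub(substitute, template)


# Hypothetical sentence in column format: character, part of speech
rows = [["胸", "n"], ["部", "n"], ["CT", "x"]]
print(expand_template("U01:%x[-1,0]/%x[0,0]", rows, 0))  # U01:_B-1/胸
print(expand_template("U02:%x[0,0]/%x[0,1]", rows, 2))   # U02:CT/x
print(expand_template("U03:%x[1,0]", rows, 2))           # U03:_B+1
```

A template that combines more neighbouring cells yields more distinct feature strings per token, which is one way a template can capture more useful features.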
In step S125, evaluating the effectiveness of the CRF algorithm includes the following.
The trained model is applied to the test data with: % crf_test -m model test.data > output.txt
The results are written to output.txt. Evaluation compares the gold labels with the predicted labels:
conlleval.pl < output.txt
The .pl suffix denotes a Perl file, so Perl (Practical Extraction and Report Language) must be installed.
Note: the column separators in the CRF++ output file output.txt are TAB characters and must all be replaced with ordinary spaces, because conlleval.pl splits columns on spaces.
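The TAB-to-space replacement in the note above is a simple text transformation; a minimal sketch, with hypothetical file names, is:

```python
from pathlib import Path


def tabs_to_spaces(text: str) -> str:
    """CRF++ separates output columns with TABs; conlleval.pl splits on spaces."""
    return text.replace("\t", " ")


def convert_file(src: str, dst: str) -> None:
    """Write a space-separated copy of a CRF++ output file."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(tabs_to_spaces(text), encoding="utf-8")
```

For example, `convert_file("output.txt", "output_space.txt")` followed by `conlleval.pl < output_space.txt`. Blank lines separating sentences are preserved, since only TAB characters are replaced.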
The evaluation results output by this command for feature template 1 and feature template 2 are compared in Table 3.
Table 3 Template comparison
Evaluation criteria: TP (true positive): a positive sample predicted positive by the model;
FP (false positive): a negative sample predicted positive by the model;
FN (false negative): a positive sample predicted negative by the model;
TN (true negative): a negative sample predicted negative by the model;
Precision: P = TP/(TP+FP);
Recall: R = TP/(TP+FN), i.e., the true positive rate;
F1 (combined score): the harmonic mean of precision and recall, which lies closer to the smaller of P and R: F = 2*P*R/(P+R);
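conlleval.pl scores whole chunks rather than individual tags. A simplified sketch of chunk-level P, R and F1 for the B/I/E scheme used here, assuming a chunk is a B, any number of I, then an E; it is an approximation for illustration, not a port of conlleval.pl.

```python
def chunks(tags: list[str]) -> set[tuple[int, int]]:
    """(start, end) index spans of well-formed B I* E chunks."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.add((start, i))
            start = None
        elif tag != "I":
            start = None  # any other tag breaks an open chunk
    return spans


def chunk_prf(gold_tags: list[str], pred_tags: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching chunk spans."""
    gold, pred = chunks(gold_tags), chunks(pred_tags)
    tp = len(gold & pred)  # predicted chunks with exactly the right span
    fp = len(pred - gold)
    fn = len(gold - pred)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = ["O", "B", "I", "E", "O", "B", "E"]
pred = ["O", "B", "I", "E", "O", "O", "O"]
print(chunk_prf(gold, pred))
```

Here one of two gold chunks is found and nothing spurious is predicted, so P = 1, R = 0.5 and F1 = 2/3, illustrating how F1 sits closer to the smaller value.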
Conclusion: feature template 2 performs better, because it captures more useful features.
S13: Processing of feature terms: the feature terms generated by the preceding steps are further processed to obtain feature terms that meet the requirements of PU learning. As shown in Figure 9, this comprises the following steps:
S131: After CRF processing, a text file is obtained in which the position of each word of the test-set data is labeled as prefix (B), middle (I) or suffix (E). A corresponding program then derives a feature set from it. Some feature words in this set are not original terms in the related dictionaries; they are feature words that the CRF algorithm obtains through training with the manually annotated feature templates, i.e., so-called unregistered (out-of-vocabulary) words.
S132: Feature extraction yields a set of feature terms that may contain both standard and non-standard feature terms. These feature terms are then compared for similarity against the "non-standard feature term" entries of the synonym dictionary (Table 1), and the comparison results are ranked by similarity in descending order.
S133: Initially there is no ready-made standard data set to which the synonym dictionary can refer. To build a synonym dictionary from scratch, the similarity threshold must be set low, tentatively at 50%: any "non-standard feature term" whose similarity is greater than or equal to this threshold, together with its corresponding "standard feature term", is recommended to the operator as a candidate feature term for reference, and the operator decides manually which standard term corresponding to the non-standard feature term is adopted as the final standard term.
Table 4 Non-standard term to standard term mapping
Non-standard feature term | Standard feature term |
"Tang's urine disease", "sugared ornithosis" | Diabetes |
As the number of feature terms in the synonym dictionary grows, the threshold can be raised. The advantage is that only feature terms whose similarity exceeds the threshold are shown as candidates for the operator's reference. If similarity comparison produces no candidate feature term in the candidate list, the operator can confirm manually and change the feature term to the designated standard feature term. Note: the threshold can be set freely by hand in the system, which makes it flexible.
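A minimal sketch of the candidate recommendation in S132 and S133: every non-standard entry of the synonym dictionary is scored against the input term, entries at or above the adjustable threshold (0.5 initially) are returned in descending order of similarity, and the operator confirms the final standard term. `difflib.SequenceMatcher.ratio()` stands in here for the comprehensive similarity described in the text, and the dictionary entries are hypothetical.

```python
from difflib import SequenceMatcher


def recommend(term: str, synonyms: dict[str, str],
              threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (non-standard entry, standard term, similarity), best first."""
    scored = []
    for non_standard, standard in synonyms.items():
        score = SequenceMatcher(None, term, non_standard).ratio()
        if score >= threshold:  # threshold is operator-adjustable
            scored.append((non_standard, standard, score))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Hypothetical synonym dictionary: non-standard entry -> standard term
synonyms = {"唐尿病": "糖尿病", "糖尿并": "糖尿病", "咳血": "咯血"}
print(recommend("唐尿病", synonyms))                  # only the exact entry passes 0.5
print(recommend("唐尿病", synonyms, threshold=0.3))   # a lower threshold admits more
```

Raising `threshold` as the dictionary grows shortens the candidate list, matching the trade-off described above.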
For example, if "Tang's urine disease" is entered, this is a typical non-standard input. As shown in Table 4, the candidate non-standard feature term for "Tang's urine disease" is "Tang's urine disease", and from this candidate the standard feature term "diabetes" can be found; this is the unique final standard feature term. The feature terms before standardization are shown in Table 5:
Table 5 Feature terms before standardization
Panting | Expectoration |
Fever | Hemoptysis |
Weakness | Asthma |
Pulmonary infection | Full chest radiograph |
Hepatitis | Chest radiograph |
Infection | Diabetes |
Hypertension | Tang's urine disease |
Coronary stenting | Aversion to cold |
Coronary artery stent implantation | Atrial fibrillation |
Coronary heart disease | Chest CT |
Chills | Chest tightness |
Cough | Chest pain |
Leukocyte | WBC |
As shown in Table 5, the feature term "Tang's urine disease" matches the non-standard entries "Tang's urine disease" and "sugared ornithosis" in the synonym dictionary (Table 4). Through this mapping, the corresponding standard feature term "diabetes" can be extracted. The system thus recognizes that "Tang's urine disease" differs from the standard term "diabetes" and highlights the non-standard feature term in a conspicuous color, prompting the clinician to correct it, as shown in Figure 10.
Specifically:
In step S133, similarity processing of feature terms includes the following.
Each feature term in the extracted feature term set is compared with the non-standard feature terms in the corresponding synonym dictionary, using a feature-text similarity measure to compute the similarity. Following the literature, the comprehensive similarity is finally taken as the geometric mean of the Chinese-character distance, the pinyin distance and the Wubi (five-stroke input code) distance. Although the underlying similarity algorithms are alike, a considerable share of feature words in practice result from input errors, including homophones (identical or similar pronunciation) and similar-looking characters (for example, characters sharing a radical). This comprehensive similarity therefore improves the recall of the duplicate-detection algorithm while also achieving higher precision, as shown in Figure 18:
For each of the three representations (Chinese characters, pinyin and Wubi codes), similarity is computed with two distances, the Jaro-Winkler distance and the string edit distance, and the mean of the two is taken as the similarity measure for that representation, as shown in Table 6:
Table 6 Comparison of the three similarity measures
The results show that the two feature terms are very close and can be represented by a single standard term. This method can also discover synonymous phrases for standard terms that commonly appear in routine use; these can be added to the synonym dictionary to enrich its vocabulary.
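A minimal sketch of the comprehensive similarity, under these assumptions: edit distance is normalized to a similarity as 1 - distance / longer length, Jaro-Winkler uses the common prefix scale 0.1 over at most 4 characters, each representation's score is the mean of the two, and the overall score is the geometric mean across representations. Pinyin and Wubi conversion need lookup tables that are not shown, so the caller passes each representation in; the example pair is an illustrative homophone misspelling (糖尿病 "diabetes" versus 唐尿病, both pinyin tangniaobing).

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized into [0, 1], where 1 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    jaro = (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def comprehensive_similarity(reprs_a: list[str], reprs_b: list[str]) -> float:
    """Geometric mean over representations of mean(edit, Jaro-Winkler) scores."""
    scores = [(edit_similarity(x, y) + jaro_winkler(x, y)) / 2
              for x, y in zip(reprs_a, reprs_b)]
    return math.prod(scores) ** (1 / len(scores))


# Characters and pinyin only; a Wubi code string would be a third entry.
score = comprehensive_similarity(["糖尿病", "tangniaobing"], ["唐尿病", "tangniaobing"])
print(round(score, 3))  # 0.85
```

The characters alone score about 0.72, but the identical pinyin lifts the combined score, which is how homophone typos stay above a 50% threshold.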
The feature terms after standardization are shown in Table 7 below:
Table 7 Feature terms after standardization
Panting | Expectoration |
Fever | Hemoptysis |
Weakness | Asthma |
Pulmonary infection | Full chest radiograph |
Hepatitis | |
Infection | Diabetes |
Hypertension | |
Coronary stenting | Aversion to cold |
Atrial fibrillation | |
Coronary heart disease | Chest CT |
Chills | Chest tightness |
Cough | Chest pain |
Leukocyte | |
Feature term ranking: the processing above extracts many feature terms, but not every extracted feature is meaningful, so TF-IDF ranking is used to filter out the key features. Because not every document contains every feature word, the weights of each key feature term are accumulated over all documents d in which it occurs, the average over all documents is computed for each feature term, and the terms are ranked from largest to smallest. Here the CRF++ tool extracted 390 feature terms in total; they were ranked by the computed average weight, and after confirmation and screening by domain experts, 68 key feature terms were obtained. Table 8 lists the top 20 feature terms.
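The ranking step can be sketched as follows, assuming tf = count / document length and idf = ln(N / df), which is one common TF-IDF variant (the text does not fix the formula), and averaging each term's accumulated weight over all N documents. The toy documents are hypothetical.

```python
import math
from collections import Counter


def rank_terms(docs: list[list[str]]) -> list[tuple[str, float]]:
    """Rank feature terms by their TF-IDF weight averaged over all documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    totals: Counter = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n / df[term])
            totals[term] += tf * idf  # accumulate over documents containing the term
    averages = {term: totals[term] / n for term in df}
    # Highest average weight first; ties broken alphabetically for stable output
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [["cough", "fever", "cough"], ["cough", "chest CT"], ["fever", "hemoptysis"]]
for term, weight in rank_terms(docs):
    print(f"{term}: {weight:.4f}")
```

Terms that occur in every document get idf = 0 and sink to the bottom, which is the intended filtering effect; the top of the ranked list is what domain experts would then screen.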
Table 8 Feature term ranking
Second preferred technical solution:
Auxiliary diagnosis management:
The generated feature terms are converted into the file format for PU learning (learning from a positive example data set and an unlabeled data set). Through PU learning, the system automatically recommends related diagnoses for clinical staff to consult, as shown in Figure 11:
S21: Determine the feature word frequencies formed by the feature term set and the electronic medical record documents;
S22: Perform PU training and PU learning on the feature word frequencies;
S23: Produce the auxiliary diagnosis result.
Specifically:
Step S22: Application of partially supervised learning:
Partially supervised learning is generally divided into two kinds. The first learning task learns from labeled and unlabeled data, also called LU learning, where L denotes the labeled data set and U the unlabeled data set. The second learning task learns from a positive example data set and an unlabeled data set, i.e., PU learning, where P denotes the positive example set and U the unlabeled set. The aim of the algorithm is to learn an accurate classifier without labeling any negative data.
In practical applications, positive example documents must be distinguished within a mixed document collection that contains both positive example documents and documents of other categories. Documents of the category of interest are called positive example documents; documents of the remaining categories are called counter-example documents. The labeled positive example documents form the positive example set P; the remaining documents, which are unlabeled and consist mostly of counter-example documents, form the unlabeled set U.
The problem is to find a classifier that, using the sets P and U, can distinguish the positive example documents in the test set from the counter-example documents. The method that solves this problem is PU learning.
This learning framework rests on the following fact: with the internet now ubiquitous, people are in most cases interested only in documents or web pages of a certain class and not in other classes. After labeling a small number of documents of interest, a PU learning framework can be used to obtain a classifier that labels unseen documents, thereby retrieving the required documents. For example, if someone is interested in the web pages of dating websites, all other web pages can be regarded as counter-example pages.
Medical research often presents the same situation: a certain disease is difficult to diagnose from a few features, yet clinical staff are particularly interested in it. A small number of medical records with a confirmed diagnosis of this disease are labeled to form the positive example document set; these are then combined with a large number of unlabeled medical records, which form the unlabeled data set, to make up the training set. The classifier obtained with the PU learning framework is used to label future medical record information, thereby achieving auxiliary diagnosis.
The present invention also provides a system for electronic medical record post-structuring and auxiliary diagnosis, comprising:
Medical dictionary management module: manages the standard dictionaries and the clinical medical application dictionaries. The clinical medical application dictionaries include an internal dictionary and a synonym dictionary; the internal dictionary includes a clinical symptom dictionary and other related dictionaries such as examination terms; the synonym dictionary includes mappings from non-standard feature terms to standard feature terms, from misspelled words to standard feature terms, and from a single standard term to multiple standard terms;
Medical corpus management module: extracts electronic medical record document data, performs part-of-speech tagging and word-position tagging, and handles feature template creation, feature annotation and feature extraction;
Medical feature term processing module: performs standardized management of feature terms;
Auxiliary diagnosis management module: manages the PU learning framework, PU learning training and testing, and PU-learning-based auxiliary diagnosis.
I. Experimental framework and results
1. Tools used in the experiment
(1) PU learning tool LPU (http://www.cs.uic.edu/~liub/LPU/lpu.zip).
(2) The SVMlight support vector machine toolkit, downloaded separately.
(3) Experiment command and parameters:
lpu -s1 [option 1] -s2 [option 2] -c [option 3] -f [filestem]
-s1: parameter options for the first stage of PU learning.
-s2: parameter options for the second stage of PU learning.
-c: how the classifier is selected.
For -s1 there are three methods: spy (spy), Rocchio (roc) and naive Bayes (nb). For -s2 there are two methods: support vector machine (svm) and expectation maximization (em). For classifier selection, 1 means selecting the best of the generated classifiers.
2. File format of the experimental data sets
The three original data sets are:
demo.pos: the positive example document set.
demo.unlabel: the unlabeled document set.
Neither of these two files contains class labels, as shown in Figure 12.
demo.test: the test data set. It contains both positive example documents and counter-example documents, together with their class labels: +1 denotes a positive example and -1 a negative example, as shown in Figure 13.
Each row of a data file has the form: label attribute:value ... attribute:value. The label takes the values +1 and -1, denoting a positive example document and a counter-example document respectively. The label and each attribute:value pair are separated by spaces. Each attribute must be numbered with an integer, starting from 1. Each value must be an integer giving the number of times the attribute occurs in the document. Features whose value is 0 are ignored automatically. Attribute numbers must appear in increasing order, for example 5:1 6:1 7:1 8:1 10:4 11:2 12:3 13:1 14:1 15:1 16:6 17:2 23:1 25:2 29:1.
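A sketch of writing one document in this format. Assumptions: the vocabulary maps each feature term to a 1-based integer attribute number, terms outside the vocabulary are dropped, and `build_vocab` and `lpu_line` are hypothetical helper names, not part of LPU.

```python
from collections import Counter


def build_vocab(all_terms) -> dict[str, int]:
    """Assign 1-based attribute numbers to feature terms in sorted order."""
    return {term: i for i, term in enumerate(sorted(set(all_terms)), start=1)}


def lpu_line(doc_terms, vocab: dict[str, int], label=None) -> str:
    """One data row: optional +1/-1 label, then attribute:count pairs."""
    counts = Counter(vocab[t] for t in doc_terms if t in vocab)
    # Ascending attribute numbers; only non-zero counts are ever written
    body = " ".join(f"{attr}:{n}" for attr, n in sorted(counts.items()))
    return body if label is None else f"{label:+d} {body}"


vocab = build_vocab(["fever", "cough", "expectoration", "cough"])
print(lpu_line(["fever", "cough", "cough"], vocab))            # 1:2 3:1  (pos/unlabel files)
print(lpu_line(["fever", "cough", "cough"], vocab, label=1))   # +1 1:2 3:1  (test file)
```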
3. Experimental data
(1) Composition of the experimental data:
A total of 750 valid electronic medical records were collected for this experiment, all concerning respiratory diseases. After feature extraction by the system, the valid diagnoses shown in Table 9 below were obtained:
Table 9 Valid diagnoses
The extracted characteristic attribute values are shown in Table 10 below:
Table 10 Characteristic attributes
Full chest radiograph | Coronary heart disease | Right upper lung infection |
Chest CT | Pulmonary emphysema | Atelectasis |
Asthma | Precordial pain | Bronchiectasis |
Cough | Emaciation | Malignant pleural effusion |
Bronchial asthma | Aversion to cold | Interstitial pneumonia |
Pleural effusion | Dyspnea | Ventilatory dysfunction |
Runny nose | Shortness of breath | Stridor |
Panting | Fever | Obstructive pneumonia |
Chills | Malaise | Left upper lung infection |
Increased markings in both lungs | Left lower lung infection | Enlarged lymph nodes |
Hemoptysis | Respiratory failure | Cholecystolithiasis |
Weakness | Pulmonary tuberculosis | Pericardial effusion |
Chest tightness | Pneumothorax | Sneezing |
Nasal congestion | Expectoration | Hydropneumothorax |
Chest pain | Chronic bronchitis | Hypertension |
Headache | Right lower lung infection | Edema |
Acute bronchitis | Muscle soreness | Lymph node calcification |
Upper respiratory tract infection | Spontaneous pneumothorax | Pleural calcification |
Palpitations | AECB | Diabetes |
Vomiting | COPD | Weight loss |
Nausea | Breathlessness | Bronchial stenosis |
Dizziness | Lower-extremity edema | Pleural effusion |
Sore throat | Pulmonary fibrosis |
4. Grouping of the experimental data
With COPD as the positive class, 151 COPD medical records are chosen to form the positive example document set and generate the file mzf.pos. From the remaining documents, 49 COPD records and 300 records of other disease types are selected to form the file mzf.unlabel. Finally, the remaining 50 COPD records and 200 records of other diseases form the file mzf.test.
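The grouping above can be reproduced with a short script. The counts add up to the 750 records: 250 COPD (151 + 49 + 50) and 500 others (300 + 200). The sketch assumes records are identified by IDs and assigned at random with a fixed seed; the text does not say how the records were chosen, and `split_pu` is a hypothetical name.

```python
import random


def split_pu(copd_ids, other_ids, n_pos=151, n_unl_copd=49, n_unl_other=300, seed=0):
    """Return (positive set, unlabeled set, labeled test set) as in the grouping above."""
    rng = random.Random(seed)  # fixed seed: a reproducible (assumed) random choice
    copd, other = list(copd_ids), list(other_ids)
    rng.shuffle(copd)
    rng.shuffle(other)
    pos = copd[:n_pos]                                                  # mzf.pos
    unlabeled = copd[n_pos:n_pos + n_unl_copd] + other[:n_unl_other]    # mzf.unlabel
    test = ([(rid, +1) for rid in copd[n_pos + n_unl_copd:]] +          # mzf.test
            [(rid, -1) for rid in other[n_unl_other:]])
    return pos, unlabeled, test


pos, unlabeled, test = split_pu(range(250), range(250, 750))
print(len(pos), len(unlabeled), len(test))  # 151 349 250
```

Only the test rows carry labels, matching the file formats above; the 49 COPD records hidden in the unlabeled set are what makes this a genuine PU setting.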
5. Classifier combinations used in the experiment:
Table 11 Classifier combinations
To remain applicable to different application environments, the system provided by the present invention offers multiple classifier combinations, and the optimal classifier is selected for each task.
6. Experimental results and analysis:
In terms of positive-example recall (Figure 14), the recall of Roc-Svm is not the highest, but at 90% it is close to the maximum. In Figure 15, the positive-example precision of Roc-Svm is 82%, close to the maximum of 83%. Considering recall and precision together, Figure 16 shows that the positive-example F-value of Roc-Svm reaches 85.9%, the best of all classifiers. Figure 17 further shows that Roc-Svm achieves an overall accuracy of 94%. Hence, for the data set of this experiment, Roc-Svm is the optimal classifier.
This is mainly because, in PU learning, the unlabeled set U generally has the following characteristics:
1. Positive example documents make up only a small proportion of U, so they have little influence on the centroid vector of the counter-example documents in the algorithm.
2. U usually contains documents of many different categories, so in the vector space they cover a large, relatively dispersed region, whereas the documents in the positive example set P usually belong to a single category and cover a small, relatively concentrated region. Suppose a decision boundary separates positive example documents from counter-example documents, with positive example documents belonging to P and counter-example documents to U, so that the boundary separates the documents in P from those in U. Because the documents in U are more dispersed, many counter-example documents fall on the positive side of the boundary and are misclassified as positive; conversely, the documents that the Rocchio algorithm does classify as negative are very likely to be true counter-examples, which is why it can extract reliable counter-example documents with high precision. After the reliable counter-example document set RN is formed, RN and P form a training set for a support vector machine (SVM), and training is iterated until no further reliable counter-example documents are extracted in an iteration.
Because the Rocchio algorithm misclassifies many counter-example documents as positive, its positive predictions have very low precision; classifying with the SVM corrects this bias of the Rocchio algorithm and produces a more accurate classifier. This is exactly why the Roc-Svm classifier is the optimal classifier in this experiment.
Detailed experimental results are shown in Figures 14 to 17.
Evaluation criteria: TP (true positive): a positive sample predicted positive by the model; FP (false positive): a negative sample predicted positive by the model; FN (false negative): a positive sample predicted negative by the model; TN (true negative): a negative sample predicted negative by the model;
Precision: P = TP/(TP+FP);
Recall: R = TP/(TP+FN), i.e., the true positive rate;
F1 (combined score): the harmonic mean of precision and recall, which lies closer to the smaller of P and R: F = 2*P*R/(P+R);
Accuracy: the classifier's ability to judge the whole sample set correctly, i.e., classifying positives as positive and negatives as negative: A = (TP+TN)/(TP+FN+FP+TN).
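For the +1/-1 labels used in the LPU test file, the four counts and the metrics above can be computed directly; a minimal sketch with made-up labels:

```python
def binary_metrics(gold: list[int], pred: list[int], positive: int = 1) -> dict[str, float]:
    """Precision, recall, F1 and accuracy for +1/-1 labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


print(binary_metrics(gold=[1, 1, 1, -1, -1], pred=[1, 1, -1, 1, -1]))
```

Here TP = 2, FP = 1, FN = 1 and TN = 1, so precision, recall and F1 are all 2/3 while accuracy is 3/5.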
The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify the specific embodiments of the present invention or make equivalent substitutions; any modification or equivalent substitution that does not depart from the spirit and scope of the present invention falls within the scope of protection of the pending claims of the present invention.
Claims (8)
1. A method for implementing electronic medical record post-structuring and auxiliary diagnosis, characterized in that the method comprises the following steps:
(1) Structuring of electronic medical record text:
S11: Build a medical dictionary;
S12: Build a medical corpus;
S13: Process medical feature terms;
(2) Auxiliary diagnosis management:
S21: Determine the feature word frequencies formed by the feature term set and the electronic medical record documents;
S22: Perform PU training and PU learning on the feature word frequencies;
S23: Produce the auxiliary diagnosis result.
2. The method as claimed in claim 1, characterized in that, in step S11, the medical dictionary includes:
a standard medical dictionary, which uses as the standard the data of the globally used International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) for operations and procedures, and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT);
a clinical medical application dictionary, which includes an internal dictionary and a synonym dictionary, the internal dictionary including a clinical symptom dictionary and other related dictionaries such as examination terms, and the synonym dictionary including mappings from non-standard feature terms to standard feature terms, from misspelled words to standard feature terms, and from a single standard term to multiple standard terms.
3. The method as claimed in claim 1, characterized in that, in step S12, building the medical corpus comprises the following steps:
S121: Extract electronic medical record documents from the electronic health record database;
S122: Perform part-of-speech tagging and word-position tagging on the electronic medical record documents;
S123: Integrate the documents after part-of-speech tagging and word-position tagging;
S124: Create feature templates for training with the CRF algorithm;
S125: Generate the feature data and evaluate the effectiveness of the CRF algorithm;
S126: Form the final medical corpus.
4. implementation method as claimed in claim 3 is it is characterised in that in described step S122, described part-of-speech tagging refers to
The electronic medical record document extracted is pre-processed, obtains the part of speech of electronic medical record document Chinese version, and combine lexeme mark, turn
Change condition random field CRF form into, and carry out feature extraction with condition random field CRF algorithm;Marked to automatic by manual type
Electronic medical record document after note is checked;
The word-position tagging uses a standard medical dictionary to increase the hit rate on the Chinese text of the electronic medical record documents and applies the reverse maximum matching algorithm: matching starts from the end of the document being processed, and each time the rightmost 2i characters are taken as the matching field; if matching fails, the leftmost character of the matching field is removed and matching continues. Medical terms are then tagged automatically with the prefix B, the word-internal I, and the suffix E;
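A minimal sketch of the reverse maximum matching and B/I/E tagging described above, assuming a Python set as the standard medical dictionary. The window size `max_len` plays the role of i; reading the source's "2i characters" as i Chinese characters stored in a double-byte encoding is an assumption, as is the O label for single-character words and characters outside dictionary terms. Function names are hypothetical.

```python
from typing import List, Set, Tuple


def reverse_maximum_matching(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Segment text right to left, always trying the longest window first."""
    words: List[str] = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Drop the leftmost character of the window until it matches a dictionary
        # entry; a single character is accepted as a fallback.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


def bie_tags(words: List[str], dictionary: Set[str]) -> List[Tuple[str, str]]:
    """Tag characters of multi-character dictionary terms with B/I/E, others with O."""
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if word in dictionary and len(word) > 1:
            tags = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        else:
            tags = ["O"] * len(word)
        tagged.extend(zip(word, tags))
    return tagged


terms = {"胸闷", "气短"}  # hypothetical dictionary entries
words = reverse_maximum_matching("患者胸闷气短三天", terms, max_len=4)
print(words)  # ['患', '者', '胸闷', '气短', '三', '天']
```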
CRF training is performed in step S124, and the trained model is applied with the command % crf_test -m model test.data > output.txt; the results are written to output.txt, where the gold labels are compared with the predicted labels;
In the output.txt produced by the CRF algorithm, the columns are separated by TAB characters; all of them must be replaced with real spaces, because conlleval.pl recognizes the space as its separator;
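The TAB-to-space conversion is a single string replacement. This sketch assumes the whole crf_test output fits in memory; the function name and the sample lines (token, POS, gold label, predicted label) are hypothetical.

```python
def tabs_to_spaces(crf_output: str) -> str:
    """Replace the TAB column separators in crf_test output with single spaces."""
    return crf_output.replace("\t", " ")


sample = "胸\tn\tB\tB\n闷\tn\tE\tE\n"  # token, POS, gold label, predicted label
print(tabs_to_spaces(sample), end="")
```

The converted text would be written back to output.txt before conlleval.pl is run on it.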
In step S125, the evaluation criteria for the effectiveness of the CRF algorithm are:
TP (True Positive): positive samples predicted as positive by the model;
FP (False Positive): negative samples predicted as positive by the model;
FN (False Negative): positive samples predicted as negative by the model;
TN (True Negative): negative samples predicted as negative by the model;
Precision: P = TP/(TP+FP);
Recall: R = TP/(TP+FN), i.e. the true positive rate;
F1 (comprehensive classification rate): the harmonic mean of precision and recall, which tends toward the smaller of P and R: F = 2*P*R/(P+R).
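The three formulas can be checked numerically with a small helper; TN is not needed for any of them. The function name is hypothetical and the counts are made up to show how F1 is pulled toward the smaller of P and R.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion counts; 0.0 where a ratio is undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical counts: P = 0.9 and R = 0.3 give F of about 0.45,
# well below their arithmetic mean of 0.6.
print(precision_recall_f1(tp=90, fp=10, fn=210))
```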
5. The implementation method according to claim 1, characterized in that, in step S13, medical feature term processing comprises the following steps:
S131: the electronic medical record documents processed by the CRF algorithm yield a text in which the position of each word in the test-set data is marked as prefix B, word-internal I, or suffix E, and a feature set is obtained from this text by a corresponding program. The feature words in the feature set include both original words from the dictionary and feature terms not recorded in the dictionary; the latter are feature words obtained by training the CRF on manually annotated feature templates, i.e. so-called unregistered (out-of-vocabulary) words;
S132: feature extraction yields a feature term set containing both standard and non-standard feature terms. Using the thesaurus that maps non-standard feature terms to standard feature terms, each extracted non-standard feature term is compared for similarity with the non-standard feature terms in the thesaurus; the comparison yields a similarity ranking, with similarity values sorted in descending order;
S133: the similarity threshold is tentatively set to a similarity greater than or equal to 50%. Non-standard feature terms that meet the threshold, together with their corresponding standard feature terms, are recommended to the operator as candidate feature terms for reference, and the operator determines the standard feature term corresponding to each non-standard feature term as the final standard feature term. The threshold can be set freely by hand.
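Steps S132 and S133 amount to scoring each thesaurus entry against an extracted term, keeping those at or above the threshold, and sorting them for the operator. In this sketch the thesaurus is assumed to be a dict from non-standard terms to standard terms, and difflib's `SequenceMatcher` ratio stands in as a placeholder similarity; the comprehensive similarity of claim 6 could be passed in instead. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def default_similarity(a: str, b: str) -> float:
    # Placeholder measure in [0, 1]; not the patent's own similarity.
    return SequenceMatcher(None, a, b).ratio()


def recommend_candidates(
    term: str,
    thesaurus: Dict[str, str],
    similarity: Callable[[str, str], float] = default_similarity,
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Return (thesaurus entry, standard term, score) candidates, best first."""
    candidates: List[Tuple[str, str, float]] = []
    for nonstandard, standard in thesaurus.items():
        score = similarity(term, nonstandard)
        if score >= threshold:  # keep only candidates meeting the threshold
            candidates.append((nonstandard, standard, score))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates
```

The returned list is only a recommendation; as the claim states, the operator makes the final choice of standard term.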
6. The implementation method according to claim 5, characterized in that, in step S132, the TF-IDF method is used to calculate the weight of each feature term in every electronic medical record document in which it occurs; the weights are summed to obtain each feature term's average over all electronic medical record documents, and the terms are ranked from largest to smallest;
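A sketch of the averaged TF-IDF ranking in step S132. The source does not fix a TF-IDF variant, so this assumes term frequency divided by document length and idf = log(N/df); documents are assumed to be pre-tokenized lists of feature terms, and the average divides by the total number of documents, including those where the term is absent. Under this idf, a term present in every document scores 0. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def rank_terms_by_mean_tfidf(docs: List[List[str]]) -> List[Tuple[str, float]]:
    """Rank terms by their TF-IDF weight averaged over all documents."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    totals: Dict[str, float] = {term: 0.0 for term in df}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            totals[term] += tf * idf
    ranking = [(term, total / n_docs) for term, total in totals.items()]
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


docs = [["fever", "cough", "fever"], ["cough", "headache"], ["cough"]]
print([term for term, _ in rank_terms_by_mean_tfidf(docs)])  # ['fever', 'headache', 'cough']
```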
In step S133, a feature-text similarity measure is used to calculate the similarity of the feature terms in the corresponding feature term set. The comprehensive similarity is calculated as the geometric mean of the Chinese character distance, the pinyin distance and the Wubi (input method) distance. For each of these three distances, two measures, a string similarity distance and a string edit distance, are used to calculate similarity, and the mean of the two is taken as the distance metric.
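The comprehensive similarity can be sketched as follows. Assumptions: the "string similarity distance" is taken to be Jaro-Winkler, which the abstract names among the measures; the edit-distance measure is Levenshtein distance normalized to a similarity in [0, 1]; and the pinyin and Wubi forms of each term are supplied by the caller, since converting characters to pinyin or Wubi codes needs an external table or library that is not shown. The `TermForms` type and all function names are hypothetical.

```python
import math
from typing import NamedTuple


class TermForms(NamedTuple):
    chars: str   # the term's Chinese characters
    pinyin: str  # pinyin spelling, supplied by the caller (assumption)
    wubi: str    # Wubi input codes, supplied by the caller (assumption)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    k = transpositions = 0
    for i, c in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            transpositions += c != s2[k]
            k += 1
    t = transpositions / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def component_similarity(a: str, b: str) -> float:
    # Mean of the two string measures for one representation.
    return (jaro_winkler(a, b) + edit_similarity(a, b)) / 2


def comprehensive_similarity(x: TermForms, y: TermForms) -> float:
    """Geometric mean of the character, pinyin and Wubi similarities."""
    parts = [component_similarity(a, b) for a, b in zip(x, y)]
    return math.prod(parts) ** (1 / 3)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Because the mean is geometric, a term that shares nothing in any one representation gets an overall score of 0, however close the other two are.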
7. The implementation method according to claim 1, characterized in that step S22 comprises: the feature word-frequency data form a test data set composed of a positive-example document data set and an unlabeled document data set. Learning from the positive-example document data set and the unlabeled document data set, a learning framework over the sets P and U, i.e. PU learning, is used to distinguish the positive-example documents from the counter-example documents in the test data set, where P denotes the positive-example document data set and U denotes the unlabeled data set, which contains the counter-example documents. Without labeling any counter-example documents, a classifier is learned, and the classifier is used to label the unlabeled document data set to obtain the required documents;
Medical record data in which a given disease has been clearly diagnosed are labeled to form the positive-example document data set; together with the unlabeled medical record data, which form the unlabeled document data set, they make up the training set for learning. The classifier obtained through the PU learning framework is then used to label future electronic medical records, achieving the purpose of auxiliary diagnosis.
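The claim does not name a particular PU algorithm. As one illustration, this sketch uses a common two-step strategy: pick reliable negative documents out of U, then build a classifier from P and those reliable negatives. Nearest-centroid cosine similarity stands in for the classifier, documents are assumed to be lists of standardized feature terms, and the class and function names are hypothetical; nothing here is clinically validated.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def to_vector(tokens: List[str]) -> Vector:
    """L2-normalized term-frequency vector for one document."""
    counts: Dict[str, float] = {}
    for t in tokens:
        counts[t] = counts.get(t, 0.0) + 1.0
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def centroid(vectors: List[Vector]) -> Vector:
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    n = len(vectors) or 1
    return {t: w / n for t, w in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TwoStepPU:
    """Learn from positive (P) and unlabeled (U) documents only."""

    def fit(self, positive: List[List[str]], unlabeled: List[List[str]]) -> "TwoStepPU":
        p_vecs = [to_vector(d) for d in positive]
        u_vecs = [to_vector(d) for d in unlabeled]
        p_c, u_c = centroid(p_vecs), centroid(u_vecs)
        # Step 1: unlabeled documents closer to the U centroid than to the P
        # centroid are treated as reliable negatives (counter-examples).
        self.reliable_negatives = [v for v in u_vecs if cosine(v, u_c) > cosine(v, p_c)]
        # Step 2: nearest-centroid classifier between P and the reliable negatives.
        self.pos_centroid = p_c
        self.neg_centroid = centroid(self.reliable_negatives) if self.reliable_negatives else u_c
        return self

    def predict(self, tokens: List[str]) -> bool:
        """True if the record looks like the positive (confirmed-diagnosis) class."""
        v = to_vector(tokens)
        return cosine(v, self.pos_centroid) > cosine(v, self.neg_centroid)
```

In the setting of the claim, P would hold records with a confirmed diagnosis of the target disease, and `predict` would be called on new records to suggest that diagnosis for clinical staff to review.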
8. A system for electronic medical record post-structuring and auxiliary diagnosis, characterized in that the system comprises:
Medical dictionary management module: for standard dictionary management and clinical medicine application dictionary management. The clinical medicine application dictionary comprises internal dictionaries and a thesaurus; the internal dictionaries include a clinical symptom dictionary and other dictionaries of examination-related terms; the thesaurus includes mappings from non-standard feature terms to standard feature terms, mappings from misspelled words to standard feature terms, and mappings from a single standard term to multiple standard terms;
Medical corpus management module: for extracting electronic medical record document data, part-of-speech tagging and word-position tagging, as well as making feature templates, feature labeling and feature extraction;
Medical feature term processing module: for the standardized management of feature terms;
Auxiliary diagnosis management module: for PU learning framework management, PU learning training and test management, and PU learning auxiliary diagnosis management.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610787187.3A CN106383853A (en) | 2016-08-30 | 2016-08-30 | Realization method and system for electronic medical record post-structuring and auxiliary diagnosis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610787187.3A CN106383853A (en) | 2016-08-30 | 2016-08-30 | Realization method and system for electronic medical record post-structuring and auxiliary diagnosis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106383853A true CN106383853A (en) | 2017-02-08 |
Family ID: 57939471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610787187.3A Pending CN106383853A (en) | 2016-08-30 | 2016-08-30 | Realization method and system for electronic medical record post-structuring and auxiliary diagnosis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106383853A (en) |
Cited By (71)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934038A (en) * | 2017-03-15 | 2017-07-07 | 江苏华生基因数据科技股份有限公司 | A kind of medical data duplicate checking and the method and system for associating |
CN107168946A (en) * | 2017-04-14 | 2017-09-15 | 北京化工大学 | A kind of name entity recognition method of medical text data |
CN107480131A (en) * | 2017-07-25 | 2017-12-15 | 李姣 | Chinese electronic health record symptom semantic extracting method and its system |
CN107705849A (en) * | 2017-11-27 | 2018-02-16 | 泰康保险集团股份有限公司 | Remote medical consultation with specialists opinion integration method and device |
CN107785075A (en) * | 2017-11-01 | 2018-03-09 | 杭州依图医疗技术有限公司 | Fever in children disease deep learning assistant diagnosis system based on text case history |
CN107908621A (en) * | 2017-11-16 | 2018-04-13 | 东华大学 | Tumor of breast risk assessment system based on ultrasonic examination report text data |
CN108009156A (en) * | 2017-12-27 | 2018-05-08 | 成都信息工程大学 | A kind of Chinese generality text dividing method based on partial supervised study |
CN108021553A (en) * | 2017-09-30 | 2018-05-11 | 北京颐圣智能科技有限公司 | Word treatment method, device and the computer equipment of disease term |
CN108170716A (en) * | 2017-12-04 | 2018-06-15 | 昆明理工大学 | A kind of text duplicate checking method based on human visual |
CN108170673A (en) * | 2017-12-26 | 2018-06-15 | 北京百度网讯科技有限公司 | The recognition methods of information style and device based on artificial intelligence |
CN108346474A (en) * | 2018-03-14 | 2018-07-31 | 湖南省蓝蜻蜓网络科技有限公司 | The electronic health record feature selection approach of distribution within class and distribution between class based on word |
CN108491472A (en) * | 2018-03-07 | 2018-09-04 | 新博卓畅技术(北京)有限公司 | A kind of method and system segmenting structure medical characteristics library based on CRF++ |
CN108538395A (en) * | 2018-04-02 | 2018-09-14 | 上海市儿童医院 | A kind of construction method of general medical disease that calls for specialized treatment data system |
CN108564086A (en) * | 2018-03-17 | 2018-09-21 | 深圳市极客思索科技有限公司 | A kind of the identification method of calibration and device of character string |
CN108572954A (en) * | 2017-03-07 | 2018-09-25 | 上海颐为网络科技有限公司 | A kind of approximation entry structure recommendation method and system |
CN108648788A (en) * | 2018-07-04 | 2018-10-12 | 莫毓昌 | A kind of rehabilitation medical process management system of semi-structured electronic health record |
CN108711443A (en) * | 2018-05-07 | 2018-10-26 | 成都智信电子技术有限公司 | The text data analysis method and device of electronic health record |
CN108831560A (en) * | 2018-06-21 | 2018-11-16 | 北京嘉和美康信息技术有限公司 | A kind of method and apparatus of determining medical data attribute data |
CN108962383A (en) * | 2018-06-05 | 2018-12-07 | 南京麦睿智能科技有限公司 | Hospital's intelligence hospital guide's method and apparatus |
CN109033083A (en) * | 2018-07-20 | 2018-12-18 | 吴怡 | A kind of legal advice system based on semantic net |
CN109065157A (en) * | 2018-08-01 | 2018-12-21 | 中国人民解放军第二军医大学 | A kind of Disease Diagnosis Standard coded Recommendation list determines method and system |
CN109166608A (en) * | 2018-09-17 | 2019-01-08 | 新华三大数据技术有限公司 | Electronic health record information extracting method, device and equipment |
CN109192255A (en) * | 2018-07-03 | 2019-01-11 | 北京康夫子科技有限公司 | Case history structural method |
CN109215754A (en) * | 2018-09-10 | 2019-01-15 | 平安科技(深圳)有限公司 | Medical record data processing method, device, computer equipment and storage medium |
CN109243599A (en) * | 2018-03-16 | 2019-01-18 | 申朴信息技术(上海)股份有限公司 | A kind of disease based on various dimensions information retrieval is to code method |
CN109243618A (en) * | 2018-09-12 | 2019-01-18 | 腾讯科技(深圳)有限公司 | Construction method, disease label construction method and the smart machine of medical model |
CN109344250A (en) * | 2018-09-07 | 2019-02-15 | 北京大学 | Single diseases diagnostic message rapid structure method based on medical insurance data |
CN109493977A (en) * | 2018-11-09 | 2019-03-19 | 天津新开心生活科技有限公司 | Text data processing method, device, electronic equipment and computer-readable medium |
CN109524071A (en) * | 2018-11-16 | 2019-03-26 | 郑州大学第附属医院 | A kind of mask method towards the neutralizing analysis of Chinese electronic health record text structure |
CN109545383A (en) * | 2018-11-12 | 2019-03-29 | 北京懿医云科技有限公司 | Actual clinical path mutation detection method and device, storage medium, electronic equipment |
CN109785918A (en) * | 2018-12-29 | 2019-05-21 | 南京海泰医疗信息系统有限公司 | A kind of data collection system and method applied to clinical research |
CN109817330A (en) * | 2019-01-25 | 2019-05-28 | 华院数据技术(上海)有限公司 | A kind of disease forecasting device |
CN110019418A (en) * | 2018-01-02 | 2019-07-16 | 中国移动通信有限公司研究院 | Object factory method and device, mark system, electronic equipment and storage medium |
CN110020005A (en) * | 2019-03-28 | 2019-07-16 | 云知声(上海)智能科技有限公司 | Symptom matching process in main suit and present illness history in a kind of case history |
CN110097975A (en) * | 2019-04-28 | 2019-08-06 | 湖南省蓝蜻蜓网络科技有限公司 | A kind of nosocomial infection intelligent diagnosing method and system based on multi-model fusion |
CN110289058A (en) * | 2019-06-06 | 2019-09-27 | 北京市天元网络技术股份有限公司 | A kind of electronic health record standardization matching process and device |
CN110349639A (en) * | 2019-07-12 | 2019-10-18 | 之江实验室 | A kind of multicenter medical terms standardized system based on common therapy terminology bank |
CN110362829A (en) * | 2019-07-16 | 2019-10-22 | 北京百度网讯科技有限公司 | Method for evaluating quality, device and the equipment of structured patient record data |
CN110442633A (en) * | 2019-08-12 | 2019-11-12 | 南京医渡云医学技术有限公司 | Structural data generation method and device, storage medium and electronic equipment |
CN110534185A (en) * | 2019-08-30 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Labeled data acquisition methods divide and examine method, apparatus, storage medium and equipment |
CN110750626A (en) * | 2018-07-06 | 2020-02-04 | 中国移动通信有限公司研究院 | Scene-based task-driven multi-turn dialogue method and system |
CN110931137A (en) * | 2018-09-19 | 2020-03-27 | 京东方科技集团股份有限公司 | Machine-assisted dialog system, method and device |
CN111159978A (en) * | 2019-12-30 | 2020-05-15 | 北京爱医生智慧医疗科技有限公司 | Method and device for replacing character strings |
CN111274806A (en) * | 2020-01-20 | 2020-06-12 | 医惠科技有限公司 | Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record |
CN111291568A (en) * | 2020-03-06 | 2020-06-16 | 西南交通大学 | Automatic entity relationship labeling method applied to medical texts |
CN111382273A (en) * | 2020-03-09 | 2020-07-07 | 西安理工大学 | Text classification method based on feature selection of attraction factors |
CN111444333A (en) * | 2020-04-25 | 2020-07-24 | 上海健交科技服务有限责任公司 | Insurance medicine and clinical medicine code mapping method |
CN111539194A (en) * | 2020-03-24 | 2020-08-14 | 华东理工大学 | Usability evaluation method of medical text structured algorithm |
CN111625646A (en) * | 2020-05-22 | 2020-09-04 | 泰康保险集团股份有限公司 | Method and device for processing insurance policy, electronic equipment and storage medium |
CN111666414A (en) * | 2020-06-12 | 2020-09-15 | 上海观安信息技术股份有限公司 | Method for detecting cloud service by sensitive data and cloud service platform |
CN111681724A (en) * | 2020-05-07 | 2020-09-18 | 浙江大学医学院附属第四医院(浙江省义乌医院、浙江大学医学院附属第四医院医共体) | Electronic medical record key entity standardized identification method and identification system |
CN111986743A (en) * | 2020-08-07 | 2020-11-24 | 上海神桥医药科技有限公司 | Medical auditing method and application thereof |
CN111986750A (en) * | 2020-07-27 | 2020-11-24 | 北京天健源达科技股份有限公司 | Electronic medical record template structured detection method |
CN112101030A (en) * | 2020-08-24 | 2020-12-18 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for establishing term mapping model and realizing standard word mapping |
CN112101019A (en) * | 2020-08-12 | 2020-12-18 | 南京航空航天大学 | Requirement template conformance checking optimization method based on part-of-speech tagging and chunk analysis |
CN112133390A (en) * | 2020-09-17 | 2020-12-25 | 吾征智能技术(北京)有限公司 | Liver disease cognitive system based on electronic medical record |
CN112204671A (en) * | 2018-05-30 | 2021-01-08 | 国际商业机器公司 | Personalized device recommendation for active health monitoring and management |
CN112270186A (en) * | 2020-11-04 | 2021-01-26 | 吾征智能技术(北京)有限公司 | Hot text information matching system based on entropy model |
CN112434756A (en) * | 2020-12-15 | 2021-03-02 | 杭州依图医疗技术有限公司 | Training method, processing method, device and storage medium of medical data |
CN112507198A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium, and program for processing query text |
CN112562807A (en) * | 2020-12-11 | 2021-03-26 | 北京百度网讯科技有限公司 | Medical data analysis method, apparatus, device, storage medium, and program product |
CN112652393A (en) * | 2020-12-31 | 2021-04-13 | 山东大学齐鲁医院 | ERCP quality control method, system, storage medium and equipment based on deep learning |
CN112667813A (en) * | 2020-12-30 | 2021-04-16 | 北京华宇元典信息服务有限公司 | Method for identifying sensitive identity information of referee document |
CN112687397A (en) * | 2020-12-31 | 2021-04-20 | 四川大学华西医院 | Rare disease knowledge base processing method and device and readable storage medium |
CN112860842A (en) * | 2021-03-05 | 2021-05-28 | 联仁健康医疗大数据科技股份有限公司 | Medical record labeling method and device and storage medium |
CN113011183A (en) * | 2021-03-23 | 2021-06-22 | 北京科东电力控制系统有限责任公司 | Unstructured text data processing method and system in electric power regulation and control field |
CN113611411A (en) * | 2021-10-09 | 2021-11-05 | 浙江大学 | Body examination aid decision-making system based on false negative sample identification |
CN114341838A (en) * | 2019-09-06 | 2022-04-12 | 豪夫迈·罗氏有限公司 | Automatic information extraction and expansion using natural language processing in pathology reports |
CN114585443A (en) * | 2019-10-31 | 2022-06-03 | 美国西门子医学诊断股份有限公司 | Apparatus and method for training a diagnostic analyzer model |
CN115983233A (en) * | 2023-01-04 | 2023-04-18 | 重庆邮电大学 | Electronic medical record duplication rate estimation method based on data stream matching |
CN116312915A (en) * | 2023-05-19 | 2023-06-23 | 之江实验室 | Method and system for standardized association of drug terms in electronic medical records |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120066540A1 (en) * | 2010-03-26 | 2012-03-15 | Fujitsu Limited | Information correction support system and method |
CN103020034A (en) * | 2011-09-26 | 2013-04-03 | 北京大学 | Chinese words segmentation method and device |
CN105468900A (en) * | 2015-11-20 | 2016-04-06 | 邹远强 | Intelligent medical record input platform based on knowledge base |
2016-08-30: application CN201610787187.3A filed in CN (publication CN106383853A); status: Pending
Non-Patent Citations (1)
Title |
---|
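刘凯 (Liu Kai): "基于条件随机场的中医病历命名实体抽取方法研究" (Research on named entity extraction from traditional Chinese medicine medical records based on conditional random fields), 《中国优秀硕士学位论文全文数据库信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |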
Cited By (112)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108572954A (en) * | 2017-03-07 | 2018-09-25 | 上海颐为网络科技有限公司 | A kind of approximation entry structure recommendation method and system |
CN108572954B (en) * | 2017-03-07 | 2023-04-28 | 上海颐为网络科技有限公司 | Method and system for recommending approximate entry structure |
CN106934038B (en) * | 2017-03-15 | 2018-01-05 | 江苏华生基因数据科技股份有限公司 | A kind of medical data duplicate checking and the method and system associated |
CN106934038A (en) * | 2017-03-15 | 2017-07-07 | 江苏华生基因数据科技股份有限公司 | A kind of medical data duplicate checking and the method and system for associating |
CN107168946A (en) * | 2017-04-14 | 2017-09-15 | 北京化工大学 | A kind of name entity recognition method of medical text data |
CN107480131A (en) * | 2017-07-25 | 2017-12-15 | 李姣 | Chinese electronic health record symptom semantic extracting method and its system |
CN108021553A (en) * | 2017-09-30 | 2018-05-11 | 北京颐圣智能科技有限公司 | Word treatment method, device and the computer equipment of disease term |
CN107785075A (en) * | 2017-11-01 | 2018-03-09 | 杭州依图医疗技术有限公司 | Fever in children disease deep learning assistant diagnosis system based on text case history |
CN107908621A (en) * | 2017-11-16 | 2018-04-13 | 东华大学 | Tumor of breast risk assessment system based on ultrasonic examination report text data |
CN107705849A (en) * | 2017-11-27 | 2018-02-16 | 泰康保险集团股份有限公司 | Remote medical consultation with specialists opinion integration method and device |
CN108170716A (en) * | 2017-12-04 | 2018-06-15 | 昆明理工大学 | A kind of text duplicate checking method based on human visual |
CN108170716B (en) * | 2017-12-04 | 2021-12-17 | 昆明理工大学 | Text duplicate checking method based on human vision |
CN108170673A (en) * | 2017-12-26 | 2018-06-15 | 北京百度网讯科技有限公司 | The recognition methods of information style and device based on artificial intelligence |
CN108170673B (en) * | 2017-12-26 | 2021-08-24 | 北京百度网讯科技有限公司 | Information tone identification method and device based on artificial intelligence |
CN108009156B (en) * | 2017-12-27 | 2020-05-19 | 成都信息工程大学 | Chinese generalized text segmentation method based on partial supervised learning |
CN108009156A (en) * | 2017-12-27 | 2018-05-08 | 成都信息工程大学 | A kind of Chinese generality text dividing method based on partial supervised study |
CN110019418B (en) * | 2018-01-02 | 2021-09-14 | 中国移动通信有限公司研究院 | Object description method and device, identification system, electronic equipment and storage medium |
CN110019418A (en) * | 2018-01-02 | 2019-07-16 | 中国移动通信有限公司研究院 | Object factory method and device, mark system, electronic equipment and storage medium |
CN108491472A (en) * | 2018-03-07 | 2018-09-04 | 新博卓畅技术(北京)有限公司 | A kind of method and system segmenting structure medical characteristics library based on CRF++ |
CN108346474A (en) * | 2018-03-14 | 2018-07-31 | 湖南省蓝蜻蜓网络科技有限公司 | The electronic health record feature selection approach of distribution within class and distribution between class based on word |
CN108346474B (en) * | 2018-03-14 | 2021-09-28 | 湖南省蓝蜻蜓网络科技有限公司 | Electronic medical record feature selection method based on word intra-class distribution and inter-class distribution |
CN109243599A (en) * | 2018-03-16 | 2019-01-18 | 申朴信息技术(上海)股份有限公司 | A kind of disease based on various dimensions information retrieval is to code method |
CN108564086B (en) * | 2018-03-17 | 2024-05-10 | 上海柯渡医学科技股份有限公司 | Character string identification and verification method and device |
CN108564086A (en) * | 2018-03-17 | 2018-09-21 | 深圳市极客思索科技有限公司 | A kind of the identification method of calibration and device of character string |
CN108538395A (en) * | 2018-04-02 | 2018-09-14 | 上海市儿童医院 | A kind of construction method of general medical disease that calls for specialized treatment data system |
CN108711443B (en) * | 2018-05-07 | 2021-11-30 | 成都智信电子技术有限公司 | Text data analysis method and device for electronic medical record |
CN108711443A (en) * | 2018-05-07 | 2018-10-26 | 成都智信电子技术有限公司 | The text data analysis method and device of electronic health record |
CN112204671A (en) * | 2018-05-30 | 2021-01-08 | 国际商业机器公司 | Personalized device recommendation for active health monitoring and management |
CN108962383A (en) * | 2018-06-05 | 2018-12-07 | 南京麦睿智能科技有限公司 | Hospital's intelligence hospital guide's method and apparatus |
CN108831560B (en) * | 2018-06-21 | 2020-09-22 | 北京嘉和海森健康科技有限公司 | Method and device for determining medical data attribute data |
CN108831560A (en) * | 2018-06-21 | 2018-11-16 | 北京嘉和美康信息技术有限公司 | A kind of method and apparatus of determining medical data attribute data |
CN109192255A (en) * | 2018-07-03 | 2019-01-11 | 北京康夫子科技有限公司 | Case history structural method |
CN109192255B (en) * | 2018-07-03 | 2022-01-28 | 北京左医科技有限公司 | Medical record structuring method |
CN108648788A (en) * | 2018-07-04 | 2018-10-12 | 莫毓昌 | A kind of rehabilitation medical process management system of semi-structured electronic health record |
CN110750626B (en) * | 2018-07-06 | 2022-05-06 | 中国移动通信有限公司研究院 | Scene-based task-driven multi-turn dialogue method and system |
CN110750626A (en) * | 2018-07-06 | 2020-02-04 | 中国移动通信有限公司研究院 | Scene-based task-driven multi-turn dialogue method and system |
CN109033083A (en) * | 2018-07-20 | 2018-12-18 | 吴怡 | A kind of legal advice system based on semantic net |
CN109065157B (en) * | 2018-08-01 | 2020-11-03 | 中国人民解放军第二军医大学 | Disease diagnosis standardized code recommendation list determination method and system |
CN109065157A (en) * | 2018-08-01 | 2018-12-21 | 中国人民解放军第二军医大学 | A kind of Disease Diagnosis Standard coded Recommendation list determines method and system |
CN109344250B (en) * | 2018-09-07 | 2021-11-19 | 北京大学 | Rapid structuring method of single disease diagnosis information based on medical insurance data |
CN109344250A (en) * | 2018-09-07 | 2019-02-15 | 北京大学 | Single diseases diagnostic message rapid structure method based on medical insurance data |
CN109215754A (en) * | 2018-09-10 | 2019-01-15 | 平安科技(深圳)有限公司 | Medical record data processing method, device, computer equipment and storage medium |
CN109243618A (en) * | 2018-09-12 | 2019-01-18 | 腾讯科技(深圳)有限公司 | Construction method, disease label construction method and the smart machine of medical model |
CN109243618B (en) * | 2018-09-12 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Medical model construction method, disease label construction method and intelligent device |
CN109166608A (en) * | 2018-09-17 | 2019-01-08 | 新华三大数据技术有限公司 | Electronic health record information extracting method, device and equipment |
CN110931137B (en) * | 2018-09-19 | 2023-07-07 | 京东方科技集团股份有限公司 | Machine-assisted dialog systems, methods, and apparatus |
CN110931137A (en) * | 2018-09-19 | 2020-03-27 | 京东方科技集团股份有限公司 | Machine-assisted dialog system, method and device |
CN109493977A (en) * | 2018-11-09 | 2019-03-19 | 天津新开心生活科技有限公司 | Text data processing method, device, electronic equipment and computer-readable medium |
CN109493977B (en) * | 2018-11-09 | 2020-07-31 | 天津新开心生活科技有限公司 | Text data processing method and device, electronic equipment and computer readable medium |
CN109545383A (en) * | 2018-11-12 | 2019-03-29 | 北京懿医云科技有限公司 | Actual clinical path mutation detection method and device, storage medium, electronic equipment |
CN109524071A (en) * | 2018-11-16 | 2019-03-26 | 郑州大学第附属医院 | A kind of mask method towards the neutralizing analysis of Chinese electronic health record text structure |
CN109524071B (en) * | 2018-11-16 | 2021-07-27 | 郑州大学第一附属医院 | Chinese electronic medical record text structured analysis-oriented labeling method |
CN109785918A (en) * | 2018-12-29 | 2019-05-21 | 南京海泰医疗信息系统有限公司 | A kind of data collection system and method applied to clinical research |
CN109785918B (en) * | 2018-12-29 | 2021-10-01 | 南京海泰医疗信息系统有限公司 | Data acquisition system and method applied to clinical scientific research |
CN109817330A (en) * | 2019-01-25 | 2019-05-28 | 华院数据技术(上海)有限公司 | A kind of disease forecasting device |
CN110020005A (en) * | 2019-03-28 | 2019-07-16 | 云知声(上海)智能科技有限公司 | Symptom matching process in main suit and present illness history in a kind of case history |
CN110020005B (en) * | 2019-03-28 | 2021-03-26 | 云知声(上海)智能科技有限公司 | Method for matching main complaints in medical records with symptoms in current medical history |
CN110097975A (en) * | 2019-04-28 | 2019-08-06 | 湖南省蓝蜻蜓网络科技有限公司 | A kind of nosocomial infection intelligent diagnosing method and system based on multi-model fusion |
CN110289058A (en) * | 2019-06-06 | 2019-09-27 | 北京市天元网络技术股份有限公司 | A kind of electronic health record standardization matching process and device |
CN110349639A (en) * | 2019-07-12 | 2019-10-18 | 之江实验室 | A kind of multicenter medical terms standardized system based on common therapy terminology bank |
CN110362829A (en) * | 2019-07-16 | 2019-10-22 | 北京百度网讯科技有限公司 | Method for evaluating quality, device and the equipment of structured patient record data |
CN110362829B (en) * | 2019-07-16 | 2023-01-03 | 北京百度网讯科技有限公司 | Quality evaluation method, device and equipment for structured medical record data |
CN110442633A (en) * | 2019-08-12 | 2019-11-12 | 南京医渡云医学技术有限公司 | Structural data generation method and device, storage medium and electronic equipment |
CN110534185A (en) * | 2019-08-30 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Labeled data acquisition methods divide and examine method, apparatus, storage medium and equipment |
CN114341838A (en) * | 2019-09-06 | 2022-04-12 | 豪夫迈·罗氏有限公司 | Automatic information extraction and expansion using natural language processing in pathology reports |
CN114585443A (en) * | 2019-10-31 | 2022-06-03 | 美国西门子医学诊断股份有限公司 | Apparatus and method for training a diagnostic analyzer model |
CN114585443B (en) * | 2019-10-31 | 2023-11-03 | 美国西门子医学诊断股份有限公司 | Apparatus and method for training diagnostic analyzer model |
CN111159978B (en) * | 2019-12-30 | 2023-07-21 | 北京爱医生智慧医疗科技有限公司 | Character string replacement processing method and device |
CN111159978A (en) * | 2019-12-30 | 2020-05-15 | 北京爱医生智慧医疗科技有限公司 | Method and device for replacing character strings |
CN111274806B (en) * | 2020-01-20 | 2020-11-06 | 医惠科技有限公司 | Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record |
CN111274806A (en) * | 2020-01-20 | 2020-06-12 | 医惠科技有限公司 | Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record |
CN111291568A (en) * | 2020-03-06 | 2020-06-16 | 西南交通大学 | Automatic entity relationship labeling method applied to medical texts |
CN111291568B (en) * | 2020-03-06 | 2023-03-31 | 西南交通大学 | Automatic entity relationship labeling method applied to medical texts |
CN111382273B (en) * | 2020-03-09 | 2023-04-14 | 广州智赢万世市场管理有限公司 | Text classification method based on feature selection of attraction factors |
CN111382273A (en) * | 2020-03-09 | 2020-07-07 | 西安理工大学 | Text classification method based on feature selection of attraction factors |
CN111539194A (en) * | 2020-03-24 | 2020-08-14 | 华东理工大学 | Usability evaluation method of medical text structured algorithm |
CN111539194B (en) * | 2020-03-24 | 2024-08-16 | 华东理工大学 | Availability evaluation method of medical text structuring algorithm |
CN111444333B (en) * | 2020-04-25 | 2023-08-11 | 上海健交科技服务有限责任公司 | Coding mapping method for insurance medicine and clinical medicine |
CN111444333A (en) * | 2020-04-25 | 2020-07-24 | 上海健交科技服务有限责任公司 | Insurance medicine and clinical medicine code mapping method |
CN111681724A (en) * | 2020-05-07 | 2020-09-18 | 浙江大学医学院附属第四医院(浙江省义乌医院、浙江大学医学院附属第四医院医共体) | Electronic medical record key entity standardized identification method and identification system |
CN111625646B (en) * | 2020-05-22 | 2023-04-21 | 泰康保险集团股份有限公司 | Method, device, electronic equipment and storage medium for processing insurance policy |
CN111625646A (en) * | 2020-05-22 | 2020-09-04 | 泰康保险集团股份有限公司 | Method and device for processing insurance policy, electronic equipment and storage medium |
CN111666414B (en) * | 2020-06-12 | 2023-10-17 | 上海观安信息技术股份有限公司 | Method for detecting cloud service by sensitive data and cloud service platform |
CN111666414A (en) * | 2020-06-12 | 2020-09-15 | 上海观安信息技术股份有限公司 | Method for detecting cloud service by sensitive data and cloud service platform |
CN111986750A (en) * | 2020-07-27 | 2020-11-24 | 北京天健源达科技股份有限公司 | Electronic medical record template structured detection method |
CN111986750B (en) * | 2020-07-27 | 2023-12-26 | 北京天健源达科技股份有限公司 | Structural detection method for electronic medical record template |
CN111986743A (en) * | 2020-08-07 | 2020-11-24 | 上海神桥医药科技有限公司 | Medical auditing method and application thereof |
CN112101019A (en) * | 2020-08-12 | 2020-12-18 | 南京航空航天大学 | Requirement template conformance checking optimization method based on part-of-speech tagging and chunk analysis |
CN112101030B (en) * | 2020-08-24 | 2024-01-26 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for establishing term mapping model and realizing standard word mapping |
CN112101030A (en) * | 2020-08-24 | 2020-12-18 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for establishing term mapping model and realizing standard word mapping |
CN112133390B (en) * | 2020-09-17 | 2024-03-22 | 吾征智能技术(北京)有限公司 | Liver disease cognition system based on electronic medical record |
CN112133390A (en) * | 2020-09-17 | 2020-12-25 | 吾征智能技术(北京)有限公司 | Liver disease cognitive system based on electronic medical record |
CN112270186A (en) * | 2020-11-04 | 2021-01-26 | 吾征智能技术(北京)有限公司 | Hot text information matching system based on entropy model |
CN112270186B (en) * | 2020-11-04 | 2024-02-02 | 吾征智能技术(北京)有限公司 | Entropy-model-based text information matching system for peppery taste in the mouth |
CN112562807A (en) * | 2020-12-11 | 2021-03-26 | 北京百度网讯科技有限公司 | Medical data analysis method, apparatus, device, storage medium, and program product |
CN112562807B (en) * | 2020-12-11 | 2024-03-12 | 北京百度网讯科技有限公司 | Medical data analysis method, apparatus, device, storage medium, and program product |
CN112434756A (en) * | 2020-12-15 | 2021-03-02 | 杭州依图医疗技术有限公司 | Training method, processing method, device and storage medium of medical data |
CN112507198A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium, and program for processing query text |
CN112667813B (en) * | 2020-12-30 | 2022-03-01 | 北京华宇元典信息服务有限公司 | Method for identifying sensitive identity information in court judgment documents |
CN112667813A (en) * | 2020-12-30 | 2021-04-16 | 北京华宇元典信息服务有限公司 | Method for identifying sensitive identity information in court judgment documents |
CN112687397A (en) * | 2020-12-31 | 2021-04-20 | 四川大学华西医院 | Rare disease knowledge base processing method and device and readable storage medium |
CN112687397B (en) * | 2020-12-31 | 2023-05-09 | 四川大学华西医院 | Rare disease knowledge base processing method and device and readable storage medium |
CN112652393A (en) * | 2020-12-31 | 2021-04-13 | 山东大学齐鲁医院 | ERCP quality control method, system, storage medium and equipment based on deep learning |
CN112860842A (en) * | 2021-03-05 | 2021-05-28 | 联仁健康医疗大数据科技股份有限公司 | Medical record labeling method and device and storage medium |
CN113011183B (en) * | 2021-03-23 | 2023-09-05 | 北京科东电力控制系统有限责任公司 | Unstructured text data processing method and system in electric power regulation and control field |
CN113011183A (en) * | 2021-03-23 | 2021-06-22 | 北京科东电力控制系统有限责任公司 | Unstructured text data processing method and system in electric power regulation and control field |
CN113611411B (en) * | 2021-10-09 | 2021-12-31 | 浙江大学 | Physical examination decision support system based on false-negative sample identification |
CN113611411A (en) * | 2021-10-09 | 2021-11-05 | 浙江大学 | Physical examination decision support system based on false-negative sample identification |
CN115983233A (en) * | 2023-01-04 | 2023-04-18 | 重庆邮电大学 | Electronic medical record duplication rate estimation method based on data stream matching |
CN115983233B (en) * | 2023-01-04 | 2024-09-20 | 广州大鱼创福科技有限公司 | Electronic medical record duplicate checking rate estimation method based on data stream matching |
CN116312915B (en) * | 2023-05-19 | 2023-09-19 | 之江实验室 | Method and system for standardized association of drug terms in electronic medical records |
CN116312915A (en) * | 2023-05-19 | 2023-06-23 | 之江实验室 | Method and system for standardized association of drug terms in electronic medical records |
Similar Documents
Publication | Title |
---|---|
CN106383853A (en) | Realization method and system for electronic medical record post-structuring and auxiliary diagnosis | |
CN111192680B (en) | Intelligent auxiliary diagnosis method based on deep learning and collective classification | |
CN109460473B (en) | Electronic medical record multi-label classification method based on symptom extraction and feature representation | |
Wei et al. | Task-oriented dialogue system for automatic diagnosis | |
CN110838368B (en) | Active inquiry robot based on a traditional Chinese medicine clinical knowledge graph |
CN109949938B (en) | Method and device for standardizing medical non-standard names | |
CN110033859A (en) | Method, system, program and storage medium for assessing a patient's medical findings |
CN109522551A (en) | Entity linking method, apparatus, storage medium and electronic device |
CN106897568A (en) | Processing method and apparatus for medical record structuring |
CN108319605A (en) | Structuring processing method and system for medical examination data |
CN103020454A (en) | Method and system for extracting key morbidity factors and providing early disease warning |
CN109344250A (en) | Rapid structuring method for single-disease diagnostic information based on medical insurance data |
Wang et al. | A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records | |
Faulconer et al. | An eight-step method for assessing diagnostic data quality in practice: chronic obstructive pulmonary disease as an exemplar. | |
CN106407664B (en) | Domain adaptation device for a breath diagnosis system |
CN109478419A (en) | Automatic identification of significant finding codes in structured and narrative reports |
CN113688255A (en) | Knowledge graph construction method based on Chinese electronic medical record | |
Banerjee et al. | Automatic inference of BI-RADS final assessment categories from narrative mammography report findings | |
US20220108070A1 (en) | Extracting Fine Grain Labels from Medical Imaging Reports | |
Hou et al. | Automatic report generation for chest X-ray images via adversarial reinforcement learning | |
Wang et al. | An answer recommendation algorithm for medical community question answering systems | |
Walker et al. | Evaluation of a semi-automated data extraction tool for public health literature-based reviews: Dextr | |
Atef et al. | AQAD: 17,000+ arabic questions for machine comprehension of text | |
CN110610766A (en) | Apparatus and storage medium for deriving probability of disease based on symptom feature weight | |
CN106354715A (en) | Method and device for medical word processing |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170208 |