CN106383853A - Realization method and system for electronic medical record post-structuring and auxiliary diagnosis - Google Patents

Info

Publication number
CN106383853A
Authority
CN
China
Prior art keywords
document
dictionary
feature
featured
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610787187.3A
Other languages
Chinese (zh)
Inventor
刘勇
琚生根
王俊峰
苏翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610787187.3A
Publication of CN106383853A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a method and system for the post-structuring of electronic medical records and for auxiliary diagnosis. Similarity between feature texts is measured with a combination of several distance metrics: the string edit distance is the minimum number of substitution, insertion, and deletion operations required to convert one character string into another; the Jaro-Winkler distance measures the similarity between two character strings and is used for duplicate-record detection; and the geometric mean of the Chinese character distance and the distances over the Chinese input-method encodings (pinyin and Wubi) serves as the comprehensive similarity measure. Feature ranking uses the TF-IDF method, which assesses the importance of a feature term to a document in a document set or corpus: the importance of a feature term is directly proportional to its frequency in the document and inversely proportional to the number of documents in the corpus in which it occurs. According to the generated feature terms, the documents are converted into the file format used for PU learning (learning from a positive-example data set and an unlabeled data set), and through PU learning the system automatically recommends related diagnoses for clinical staff to consult.

Description

Method and system for the post-structuring of electronic medical records and auxiliary diagnosis
Technical field
The present invention relates to a system for structuring electronic medical records and a method for implementing it, and in particular to a method and system for the post-structuring of electronic medical records and auxiliary diagnosis.
Background art
Traditional electronic medical record data is recorded as descriptive text. Although the structure of a medical record follows certain standards, the clinical domain is complex: each specialty has its own content, and even the same content is described in different ways, so producing well-structured electronic medical records is difficult. One approach is to extract structured content from plain-text descriptions by natural language processing (NLP). Another is to structure medical record information through structured data entry. A fully structured electronic medical record system, however, sometimes cannot fully capture what the clinician actually means, and it places high demands on its users, even though full structuring does make clinical data analysis and research more convenient. This approach requires a high degree of standardization: structuring needs corresponding standard medical terms for description, yet the concepts in standard medical terminology coding systems are not divided so finely, and in practice the accuracy that standardization brings conflicts with the flexibility of data entry. Standards addressing this problem have been released internationally, such as SNOMED (The Systematized Nomenclature of Human and Veterinary Medicine), SNOMED CT, and ICD-10 (International Classification of Diseases), but they usually need substantial adjustment in practical use, and work on Chinese-language versions of these standards lags behind. All of these factors affect the use of standardized medical terminology in structured entry, and they also affect the mining of medical data from electronic medical records. Because of these practical constraints, it is currently difficult for hospital clinical staff to use fully structured electronic medical record systems on a large scale. Structured electronic medical records have many advantages, but they are hard to implement and demanding for users. By comparison, free-text entry is far more flexible and is easy to deploy and use.
The mainstream electronic medical record systems in China are also designed with structuring in mind, but because medicine is complex and variable, fully structured electronic medical records are difficult to realize. Some electronic medical record systems have been designed to support structuring and clinical decision support, but in actual operation data must be entered according to the electronic medical record specification and usually must be completed using the elements the system provides. Because the input is relatively standardized, later data extraction and use are convenient; the precondition, however, is that the structured templates are designed to fit the requirements of medical record structuring. Structured templates require the cooperation of domain professionals, and the workload is heavy. For structured nursing records, operation records, and the like, different templates must be made for different examination sites, anesthesia methods, nursing levels, and so on. This requires the participation of expert-level medical professionals, whose competence and degree of involvement strongly affect the quality of the result, and templates can hardly cover every complex situation. In addition, the standardization of the medical terminology involved in structured electronic medical records still lacks a classification system and related standards that are complete, unified, easy to use, and backed by extensive practical application.
Although standard terminology sets exist, such as SNOMED, SNOMED CT, ICD-10, and ICD-9-CM (International Classification of Diseases, Clinical Modification), they are essentially translated from foreign languages, so the Chinese renderings of standard terms are not always satisfactory and cause some inconvenience in practice. Because of these shortcomings, structured medical records have not been implemented to satisfactory effect; normalized structures such as {body part} {common symptom} {number} {time unit}, in particular, are far from common. Reportedly, most electronic medical record systems currently used in hospitals were not designed with structuring in mind. Even so-called structured electronic medical records based on XML are only partly structured, not structured in the true sense. Sections such as the chief complaint, past medical history, and laboratory examinations are mostly free-text entries, yet the information they contain is often the most valuable for reference, and the feature elements they contain are important guidance for clinical research.
Some existing articles address the structuring of electronic medical records and feature recognition on structured data, but they presuppose that the electronic medical record system was designed to a standardized structure with customized templates that meet the requirements, that data were entered in a structured way, and that the terms used came from a relatively standardized medical terminology set. Unfortunately, many electronic medical record systems were not designed this way or are not used this way. Standardization and free entry are inherently in conflict: standardization restricts freedom, while free entry produces a great deal of non-standard data. This non-standard data must be analyzed in depth, using specific techniques to screen, refine, and analyze feature terms; only when each intermediate step is handled well can the result provide meaningful guidance for clinical data analysis. These shortcomings have hindered the development of structured electronic medical records in China, so many hospitals still use free-text electronic medical record systems. Such records merely transcribe paper data into electronic form and do not support in-depth data analysis. Many electronic medical record systems have neither a complete standard nor a unified specification to follow during data entry, which is a potential obstacle to later data exchange, data integration, and data analysis. Standardizing all data in one step is unrealistic, so structuring and standardizing the existing unstructured, non-standard data is a worthwhile task. Only after structuring can the required data be extracted according to the structured information and analyzed, which helps clinical research proceed smoothly.
(1) Named entity recognition in electronic medical records covers not only person names and place names but also disease names, symptom names, operation names, drug names, and so on. Methods based on statistical learning are trained on annotated corpora, so annotating the corpus does not require much domain knowledge. Such methods are now widely used in natural language processing. Common statistical learning models include the support vector machine (SVM), the hidden Markov model (HMM), the maximum entropy Markov model (MEMM), and the conditional random field (CRF). The hidden Markov model can be used for automatic word segmentation and part-of-speech tagging of Chinese. HMM-style methods are also used in other Chinese word segmentation approaches, one of which, character-based word formation, has achieved good results. Character-based word formation was proposed by N. Xue et al. Its main idea is to treat word segmentation as a character classification problem. Conventional methods first build a dictionary and segment text by dictionary lookup. Character-based word formation differs: each Chinese character occupies a position (lexeme) within the word it forms, generally one of word beginning (B), word middle (I), word end (E), or single-character word (S). The conditional random field model, built on the hidden Markov and maximum entropy models, is a conditional probability model for labeling and segmenting sequential data; it is a discriminative, undirected probabilistic graphical model. CRFs have been applied successfully in natural language processing (NLP), bioinformatics, network intelligence, and other fields.
(2) Some feature terms obtained through entity recognition are similar or close in meaning, or even identical, and differ only because operators entered non-standard terms. For example, coronary stenting and coronary artery stent implantation refer to the same thing. Because the input is non-standard, the system extracts two different feature terms. Feature terms are therefore standardized by computing the degree of similarity between them.
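The character-position (lexeme) scheme used by character-based word formation, described in item (1) above, can be illustrated with a minimal sketch. It assumes the text has already been split into words; the function name `to_lexeme_tags` is hypothetical and does not come from the patent.

```python
def to_lexeme_tags(words):
    """Map every character of every word to a B/I/E/S position tag.

    B = word beginning, I = word middle, E = word end,
    S = single-character word.
    """
    tagged = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"
            elif i == len(word) - 1:
                tag = "E"
            else:
                tag = "I"
            tagged.append((ch, tag))
    return tagged


if __name__ == "__main__":
    # Each character is printed with its position tag, one per line.
    for ch, tag in to_lexeme_tags(["冠状动脉", "狭窄", "无"]):
        print(ch, tag)
```

A sequence labeler such as a CRF is then trained to predict these tags per character, so segmentation becomes a classification problem rather than a dictionary lookup.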
Summary of the invention
To address the above shortcomings of the prior art, the object of the present invention is to provide a method and system for the post-structuring of electronic medical records and auxiliary diagnosis, realizing the structuring of medical record information by way of structured entry.
The object of the present invention is achieved by the following technical solution:
The present invention provides a method for the post-structuring of electronic medical records and auxiliary diagnosis, the improvement being that the method comprises the following steps:
(1) structuring of electronic medical record text:
S11: establish a medical dictionary;
S12: establish a medical corpus;
S13: process medical feature terms;
(2) auxiliary diagnosis management:
S21: determine the feature term frequencies formed from the feature term set and the electronic medical record documents;
S22: perform PU training on the feature term frequencies and carry out PU learning;
S23: obtain the auxiliary diagnosis result.
Further, in step S11, the medical dictionary includes:
a standard medical dictionary, which takes as its standard the data of the globally used International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) for operations and procedures, and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT);
a clinical application dictionary, which includes an internal dictionary and a synonym dictionary, the internal dictionary including a clinical symptom dictionary and other related dictionaries of examination terms;
the synonym dictionary includes: mappings from non-standardized feature terms to standardized feature terms, mappings from misspelled words to standardized feature terms, and mappings from a single standard term to multiple standard terms.
Further, in step S12, establishing the medical corpus comprises the following steps:
S121: extract electronic medical record documents from the electronic medical record database;
S122: perform part-of-speech tagging and lexeme tagging on the electronic medical record documents;
S123: integrate the data of the documents after part-of-speech tagging and lexeme tagging;
S124: create feature templates and use them to train the CRF algorithm;
S125: generate the feature data and evaluate the effect of the CRF algorithm;
S126: finally form the medical corpus.
Further, in step S122, part-of-speech tagging means preprocessing the extracted electronic medical record documents to obtain the part of speech of the text in them, combining this with the lexeme tags, converting the result into conditional random field (CRF) format, and extracting features with the CRF algorithm; the automatically tagged electronic medical record documents are then checked manually;
lexeme tagging uses the standard medical dictionary to increase the hit rate on the text of the electronic medical record documents and applies the reverse maximum matching algorithm (reverse maximum matching scans from the end of the document being processed, each time taking the last 2i characters (an i-character Chinese string) as the matching field; if matching fails, the first character of the matching field is removed and matching continues); medical terms are automatically tagged as word beginning (B), word middle (I), or word end (E);
in step S124, CRF training is carried out. The process is: % crf_test -m model test.data > output.txt, and the result is written to output.txt; the labels to be predicted are then compared with the predicted labels for evaluation;
the fields in the output.txt produced by the CRF algorithm are separated by TAB characters, all of which are replaced with space characters, because conlleval.pl recognizes spaces as separators.
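The reverse maximum matching procedure used for lexeme tagging can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it works on Unicode characters instead of byte counts, and the function name `reverse_max_match`, the `max_len` parameter, and the tiny sample dictionary are assumptions introduced here.

```python
def reverse_max_match(text, dictionary, max_len=5):
    """Segment `text` from right to left, preferring the longest dictionary word.

    At each step the last `max_len` characters form the matching field.
    If the field is not in the dictionary, its first character is dropped
    and matching is retried, until a match is found or one character is left.
    """
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the matching field from the front until it matches
        # or only a single character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


if __name__ == "__main__":
    sample_dictionary = {"冠状动脉", "支架", "植入术"}  # illustrative entries only
    print(reverse_max_match("冠状动脉支架植入术", sample_dictionary))
    # prints ['冠状动脉', '支架', '植入术']
```

The segmented words can then be converted to B/I/E position tags, one tag per character, before they are written out in CRF format.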
In step S125, the evaluation criteria for the CRF algorithm are:
TP (true positive): a positive sample predicted as positive by the model;
FP (false positive): a negative sample predicted as positive by the model;
FN (false negative): a positive sample predicted as negative by the model;
TN (true negative): a negative sample predicted as negative by the model;
Precision: P = TP / (TP + FP);
Recall: R = TP / (TP + FN), i.e. the true positive rate;
F1, the combined classification score: the harmonic mean of precision and recall, which lies close to the smaller of P and R: F = 2 * P * R / (P + R).
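The evaluation formulas above can be checked with a short sketch. The helper names `count_outcomes` and `precision_recall_f1` are hypothetical, and representing gold and predicted entities as sets of (position, term) pairs is an assumption made for illustration.

```python
def count_outcomes(gold, predicted):
    """Count TP, FP and FN from gold-standard and predicted entity sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # predicted and correct
    fp = len(predicted - gold)   # predicted but not in the gold standard
    fn = len(gold - predicted)   # in the gold standard but missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Return (P, R, F1), using 0.0 whenever a denominator would be zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    gold = {(0, "cough"), (5, "fever"), (9, "chest CT")}
    predicted = {(0, "cough"), (5, "fever"), (12, "asthma")}
    print(precision_recall_f1(*count_outcomes(gold, predicted)))
```

TN does not appear in any of the three formulas, which is why the sketch does not count it.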
Further, in step S13, processing medical feature terms comprises the following steps:
S131: the electronic medical record documents processed by the CRF algorithm yield a text in which the position of each character of the annotated test-set data is marked as word beginning (B), word middle (I), or word end (E). A corresponding program then derives the feature set. Some feature words in the set are words that already exist in the dictionaries; others are not in any related dictionary and are feature words obtained after the CRF was trained on data using the manually annotated feature templates, i.e. so-called out-of-vocabulary words;
S132: feature extraction yields a feature term set containing both standard and non-standard feature terms. Using the synonym dictionary that maps non-standardized feature terms to standardized feature terms, the non-standard feature terms are compared for similarity with the non-standard feature terms in the synonym dictionary, and the resulting similarity values are ranked in descending order;
S133: the similarity threshold is initially set to 50% or higher. Non-standard feature terms that meet the threshold condition, together with their corresponding standard feature terms, are recommended to the operator as candidate feature terms for reference, and the operator determines which standard feature term corresponds to each non-standard feature term, as the final standard feature term; the threshold can be set freely by hand.
Further, in step S132, the weights of each feature term in all electronic medical record documents (calculated by the TF-IDF method) are summed, the mean of each feature term over all electronic medical record documents is obtained, and the terms are ranked in descending order;
in step S133, the similarity of the feature terms in the corresponding feature term set is calculated with the feature text similarity measure, and the comprehensive similarity is finally computed as the geometric mean of the Chinese character distance, the pinyin distance, and the Wubi distance;
the Chinese character distance, the pinyin distance, and the Wubi distance are each computed from two distances, the string similarity (Jaro-Winkler) distance and the string edit distance, and the mean of the two is taken as the distance measure for that similarity.
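The comprehensive similarity described above can be sketched as follows, under several stated assumptions. The source does not give exact formulas, so this sketch (a) converts the edit distance into a similarity by dividing by the longer string's length, (b) uses the usual Winkler parameters (prefix of at most 4 characters, scaling factor 0.1), and (c) reads "geometric mean" as the cube root of the product of the three per-representation similarities. The pinyin and Wubi encodings are supplied by the caller; no lookup table is invented here, and all function names are hypothetical.

```python
def edit_distance(s1, s2):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        cur = [i]
        for j, b in enumerate(s2, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def edit_similarity(s1, s2):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1 - edit_distance(s1, s2) / longest


def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    match1, match2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len(s1)):
        if match1[i]:
            while not match2[k]:
                k += 1
            out_of_order += s1[i] != s2[k]
            k += 1
    t = out_of_order / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def pair_similarity(s1, s2):
    """Mean of the edit-based and Jaro-Winkler similarities."""
    return (edit_similarity(s1, s2) + jaro_winkler(s1, s2)) / 2


def comprehensive_similarity(term_a, term_b):
    """Each term is a (characters, pinyin, wubi) triple supplied by the caller."""
    product = 1.0
    for a, b in zip(term_a, term_b):
        product *= pair_similarity(a, b)
    return product ** (1 / 3)
```

A candidate list for the operator could then be built by keeping the dictionary entries whose `comprehensive_similarity` with the extracted term is at or above the chosen threshold (0.5 in the text) and sorting them in descending order.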
Further, step S22 includes: the feature term frequencies form a test data set composed of a positive-example document data set and an unlabeled document data set; learning from the positive-example document data set and the unlabeled document data set, a learning framework over the sets P and U is used to distinguish positive-example documents from counter-example documents in the test data set, i.e. PU learning, where P denotes the set of positive-example documents and U denotes the unlabeled data set, which includes the counter-example documents; a classifier is learned without labeling any counter-example documents, and the classifier is used to label the unlabeled document data set to obtain the required documents.
Further, in step S22, medical records with a confirmed diagnosis of a given disease are marked to form the positive-example document data set, which is combined with the unlabeled medical records (the unlabeled document data set) to form the training set for learning; the classifier obtained with the PU learning framework is used to label future electronic medical record documents, thereby achieving auxiliary diagnosis.
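The text does not specify which PU learning algorithm is used, so the following is only an illustrative sketch of one common two-step approach, chosen here as an assumption: first treat the unlabeled documents least similar to the positive centroid as reliable counter-examples, then classify by the nearer centroid. Documents are represented as term-to-weight dictionaries; the function names and the `negative_fraction` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(docs):
    """Average term-weight vector of a non-empty list of documents."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def pu_learn(positive, unlabeled, negative_fraction=0.5):
    """Return a classifier learned from positive (P) and unlabeled (U) documents.

    Step 1: the unlabeled documents least similar to the positive centroid
            are taken as reliable counter-examples (an assumption of this sketch).
    Step 2: a document is labeled positive if it is closer to the positive
            centroid than to the counter-example centroid.
    """
    pos_centroid = centroid(positive)
    ranked = sorted(unlabeled, key=lambda doc: cosine(doc, pos_centroid))
    n_negative = max(1, int(len(ranked) * negative_fraction))
    neg_centroid = centroid(ranked[:n_negative])

    def classify(doc):
        return cosine(doc, pos_centroid) > cosine(doc, neg_centroid)

    return classify


if __name__ == "__main__":
    P = [{"chest pain": 1.0, "dyspnea": 1.0}, {"chest pain": 1.0}]
    U = [{"chest pain": 1.0, "dyspnea": 1.0}, {"cough": 1.0, "fever": 1.0},
         {"fever": 1.0}, {"cough": 1.0}]
    classify = pu_learn(P, U)
    print(classify({"chest pain": 1.0}), classify({"fever": 1.0}))
```

In the setting described above, P would hold the feature-term weights of records with a confirmed diagnosis, and the returned classifier would be applied to new records to suggest whether that diagnosis is worth considering.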
The present invention further provides a system for the post-structuring of electronic medical records and auxiliary diagnosis, the improvement being that the system includes:
a medical dictionary management module, for managing the standard dictionary and the clinical application dictionary; the clinical application dictionary includes an internal dictionary and a synonym dictionary, the internal dictionary including a clinical symptom dictionary and other related dictionaries of examination terms; the synonym dictionary includes mappings from non-standardized feature terms to standardized feature terms, mappings from misspelled words to standardized feature terms, and mappings from a single standard term to multiple standard terms;
a medical corpus management module, for extracting electronic medical record document data, part-of-speech tagging, and lexeme tagging, and for creating feature templates, feature annotation, and feature extraction;
a medical feature term processing module, for the standardized management of feature terms;
an auxiliary diagnosis management module, for PU learning framework management, PU learning training and test management, and PU learning auxiliary diagnosis management.
To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, and it is not intended to identify key or critical elements or to delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
Compared with the closest prior art, the technical solution provided by the present invention has the following beneficial effects:
The present invention standardizes features by calculating the degree of similarity between feature terms. The feature text similarity measure combines several distance metrics. The Jaro-Winkler distance measures the similarity between two character strings; it is a variant of the Jaro distance metric and is used for duplicate-record detection. The string edit distance is the minimum number of substitution, insertion, and deletion operations needed to turn one character string into another. The geometric mean of the Chinese character distance, the pinyin distance, and the Wubi distance is used as the final comprehensive similarity measure. Feature ranking is implemented with TF-IDF (term frequency-inverse document frequency), a statistical method for assessing how important a feature term is to one document in a document set or corpus: the importance of a feature term is directly proportional to the number of times it occurs in the document and inversely proportional to the frequency with which it occurs in the corpus. According to the generated feature terms, the documents are converted into the file format for PU learning (learning from a positive-example data set and an unlabeled data set); through PU learning, the system automatically recommends related diagnoses for clinical staff to consult.
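The TF-IDF ranking described above can be sketched as follows. The source does not give the exact formula, so this sketch assumes the common formulation tf = count / document length and idf = log(N / document frequency); the function names are hypothetical, and each document is assumed to be a list of already extracted feature terms.

```python
import math
from collections import Counter, defaultdict


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def rank_terms(documents):
    """Average each term's TF-IDF weight over all documents, highest first."""
    totals = defaultdict(float)
    for doc_weights in tf_idf(documents):
        for term, weight in doc_weights.items():
            totals[term] += weight
    averages = [(term, total / len(documents)) for term, total in totals.items()]
    # Sort by descending average weight; ties are broken alphabetically.
    return sorted(averages, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    records = [["cough", "fever"], ["cough", "chest pain"], ["cough"]]
    for term, weight in rank_terms(records):
        print(term, round(weight, 4))
```

A term that appears in every record, such as "cough" in the demo, receives an idf of log(1) = 0 and therefore ranks last, which matches the inverse relationship to corpus frequency stated above.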
To the accomplishment of the foregoing and related ends, one or more embodiments include the features described in detail below and particularly pointed out in the claims. The following description and the accompanying drawings set out certain illustrative aspects in detail; these indicate only some of the various ways in which the principles of the embodiments may be employed. Other benefits and novel features will become apparent from the following detailed description considered together with the drawings, and the disclosed embodiments are intended to include all such aspects and their equivalents.
Brief description of the drawings
Fig. 1 is a structural block diagram of the electronic medical record post-structuring and auxiliary diagnosis system of the first preferred technical solution provided by the present invention;
Fig. 2 is a structure diagram of the medical dictionary provided by the present invention;
Fig. 3 is a schematic diagram of the synthesis of multiple standard terms provided by the present invention;
Fig. 4 is a schematic diagram of the flow for establishing the medical-domain corpus provided by the present invention;
Fig. 5 is a schematic diagram of the conditional random field (CRF) algorithm format provided by the present invention;
Fig. 6 is a flowchart of corpus lexeme tagging provided by the present invention;
Fig. 7 is a schematic diagram of the CRF training file format for character-based word formation provided by the present invention;
Fig. 8 is a schematic diagram of feature template 1 and feature template 2 provided by the present invention;
Fig. 9 is a schematic diagram of the feature term processing flow provided by the present invention;
Fig. 10 is a schematic diagram of non-standard feature term annotation provided by the present invention;
Fig. 11 is a flowchart of auxiliary diagnosis provided by the present invention;
Fig. 12 is a schematic diagram of PU learning without class labels in the second preferred technical solution provided by the present invention;
Fig. 13 is a schematic diagram of PU learning with class labels in the second preferred technical solution provided by the present invention;
Fig. 14 is a schematic diagram of positive-example document recall in the second preferred technical solution provided by the present invention;
Fig. 15 is a schematic diagram of positive-example document precision in the second preferred technical solution provided by the present invention;
Fig. 16 is a schematic diagram of the F-value in the second preferred technical solution provided by the present invention;
Fig. 17 is a schematic diagram of overall accuracy in the second preferred technical solution provided by the present invention;
Fig. 18 is a schematic diagram of the recall and precision of the comprehensive similarity provided by the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The following description and drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The embodiments merely typify possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of others. The scope of the embodiments of the invention encompasses the full scope of the claims and all available equivalents of the claims. These embodiments of the invention may be referred to herein, individually or collectively, by the term "invention" merely for convenience; if more than one invention is in fact disclosed, this is not intended to limit the scope of this application to any single invention or inventive concept.
First preferred technical solution:
Fig. 1 is a structural block diagram of the electronic medical record post-structuring and auxiliary diagnosis system of the first preferred technical solution provided by the present invention. The present invention provides a method for the post-structuring of electronic medical records and auxiliary diagnosis, comprising the following steps:
(1) structuring of electronic medical record text, including:
S11: establishing the relevant medical dictionaries:
Word segmentation tools are generally not aimed at the medical domain, and their built-in dictionaries cannot contain most specialized medical terms. To build the relevant dictionaries quickly, the present invention takes part of the data of ICD-10, ICD-9-CM, and SNOMED CT as the standard and combines it with the hospital clinical application dictionary to form the medical dictionary, as shown in Fig. 2.
The medical dictionary includes:
A standard medical dictionary, which takes as its standard the data of the globally used International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) for operations and procedures, and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT);
A hospital clinical application dictionary, including an internal dictionary and a synonym dictionary, the internal dictionary including a clinical symptom dictionary and other related dictionaries of examination terms;
(1) Internal dictionary:
Clinical symptom dictionary:
For example: chills, fever, shivering, cough, expectoration, headache, dizziness, nasal congestion, runny nose, chest tightness, asthma, abdominal pain, abdominal distension, frequent urination, urgent urination, muscle soreness, malaise, weakness, dyspnea, hemoptysis, etc.
Other related dictionaries:
Various examination terms, such as chest radiograph, chest CT, etc.
(2) Synonym dictionary, including: mappings from non-standardized feature terms to standardized feature terms, mappings from misspelled words to standardized feature terms, and mappings from a single standard term to multiple standard terms.
When writing electronic medical records, clinicians differ in medical background and in their command of medical knowledge, so they also differ in how well they know standard medical terminology. Expecting every doctor to know all standard clinical terms accurately is unrealistic, and clerical errors also occur during entry. The synonym dictionary should therefore contain the following three parts, which can be merged into a single dictionary:
Mapping from non-standard feature terms to standard feature terms, as shown in Table 1.
Table 1 Mapping from non-standard feature terms to standard feature terms
Non-standard feature term         Standard feature term
The sick Crohndisease of clone    Crohn disease (regional ileitis)
Kernig's sign                     Kernig sign
Hemoptysis, cough up phlegm       Spitting of blood, expectoration
Antibiotic                        Antibiotic
Anti-inflammatory treatment       Anti-infective therapy
Cranial nerve                     Cranial nerve
Presbyopic                        Presbyopia
Lymph gland                       Lymph node
Presenium disease is stayed       Alzheimer disease
Frozen section                    Freezing microtome section
Rale                              Sound
Lymphoblast                       Lymphoblast
Mould                             Fungi
Mapping from miswritten words to standard feature terms, as shown in Table 2.
Table 2 Mapping from miswritten words to standard feature terms
Miswritten word                              Standard feature term
Tang's urine disease | sugared ornithosis    Diabetes
Spontaneous immunity                         Autoimmunity
In actual use, Table 1 and Table 2 can also be merged into a single dictionary, i.e. the synonym dictionary.
There is a further case in which a term has several standard expressions, any of which is acceptable. During actual structuring, however, a dictionary should be built following the approach of SNOMED CT, namely a mapping from a single standard term to multiple standard terms. This case can also be regarded as a special case of the mapping from non-standard feature terms to standard feature terms, and it can likewise be merged with Table 1 and Table 2 into one synonym dictionary, as shown in Figure 3.
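The merged synonym dictionary described above can be pictured as a single lookup table in which every variant, whether a non-standard term, a miswritten word, or an alternative standard expression, points to one preferred standard term. The following is a minimal sketch under that assumption; the names SYNONYMS and normalize are hypothetical, and the entries are copied from Tables 1 and 2.

```python
# Hypothetical merged synonym dictionary: each variant maps to one preferred
# standard term. Entries are copied from Tables 1 and 2 of this document.
SYNONYMS = {
    "Lymph gland": "Lymph node",
    "Anti-inflammatory treatment": "Anti-infective therapy",
    "Tang's urine disease": "Diabetes",
    "sugared ornithosis": "Diabetes",
    "Spontaneous immunity": "Autoimmunity",
}


def normalize(term, synonyms=SYNONYMS):
    """Return the preferred standard term, or the term itself if it is unknown."""
    return synonyms.get(term, term)
```

A term that is already standard, or that the dictionary does not know, is returned unchanged, which leaves room for the manual confirmation step described later.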
S12: Establishing the relevant medical corpus: build and maintain a corpus for the medical domain. As shown in Figure 4, this includes the following steps:
S121: Extract electronic medical record documents from the electronic medical record database;
S122: Perform part-of-speech tagging and word-position tagging on the electronic medical record documents;
S123: Integrate the data of the documents after part-of-speech tagging and word-position tagging;
S124: Create feature templates; the feature templates are formed through CRF algorithm training;
S125: Form the feature data and evaluate the performance of the CRF algorithm;
S126: Finally form the medical corpus.
Specifically:
In step S121, the extraction of electronic medical record documents includes:
Because manually annotating a fairly large corpus is difficult, a human-machine cooperative approach is used here to quickly build a small-scale corpus. The specific steps are as follows:
1. 887 electronic medical record documents were collected manually, covering patient data from departments such as cardiology, oncology, and respiratory medicine.
2. A program automatically extracts the text of each patient's chief complaint, history of present illness, past medical history, and laboratory and instrument examinations as the raw files to be processed.
3. Finally, the text is annotated automatically with the corresponding tools and the annotations are then reviewed manually, so that a corpus can be built quickly.
In step S122:
1. Part-of-speech tagging of the corpus:
The ICTCLAS word segmentation system of the Chinese Academy of Sciences is a Chinese lexical analysis system based on a hierarchical hidden Markov model. It offers many functions, chiefly part-of-speech tagging, Chinese word segmentation, named entity recognition, and unknown word recognition; it supports plug-in user dictionaries and is widely used in all areas of Chinese information processing.
The present invention builds on the relevant functions of ICTCLAS through secondary development to preprocess the text before annotation. The purpose of this module is to obtain the parts of speech of the text quickly so that conditional random fields can be used for feature extraction in the next step. A selected portion of the output is shown below:
【Master/a tells/v:/ w cough/v expectoration/expiratory dyspnea/n3/n days/q of n companion/v./ w is existing/t medical history/n:/ w3/n days/q Before/f patient/n is in hospital in/p our hospital/n breathing/v section/n/v during/f appearances/v cough/v ,/w expectoration/n ,/w independently/v row/v Phlegm/n difficulty/a ,/w need/v auxiliary/v row/v phlegm/n ,/w is /p is a large amount of/m grey/n mucus/n phlegm/n ,/w not /d is shown in/v phlegm/n In/f band/v blood/n.During/w/n has/v expiratory dyspnea/n companion/vSPO2/x decline/v (/w minimum/a70%/m)/w ,/w gives/and v turns over After the body/v bat/v back of the body/v suction/v phlegm/n/f improvement/v.In/w the course of disease/n/f no/v heating/v ,/w no/v nausea/a vomiting/n ,/w No/v drop in blood pressure/n ,/w no/v spitting of blood/n ,/w no/v is black/a just/n./ w chest/nCT/x shows/v (/w2013-6-15/m)/ w:/ w is slow/and a props up/q changes/v pulmonary emphysema/n companion/v infection/v ,/w two/m pulmonary fibrosis/n ,/w both sides/f pleura/n plumpness/a companion/ V pleural effusion/n ,/w fall/v sustainer/n increasing/v width/a./w】
To meet the file format requirements of CRF++-0.53, a computer program converts the ICTCLAS word segmentation results into the specified format, as shown in Figure 5.
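The conversion step can be illustrated with a short sketch. The exact layout of Figure 5 is not reproduced in the text, so the target format here is an assumption: one token per line with the word and its part-of-speech tag separated by a TAB, which is the column style CRF++ reads. The function name ictclas_to_columns is hypothetical.

```python
def ictclas_to_columns(tagged_text):
    """Convert 'word/pos word/pos ...' into one 'word<TAB>pos' row per token.

    Assumed target layout; the patent's Figure 5 is not available in the text.
    """
    rows = []
    for token in tagged_text.split():
        # Split on the last '/', so a word that itself contains '/' survives.
        word, sep, pos = token.rpartition("/")
        if not sep or not word:
            continue  # skip fragments that are not word/pos pairs
        rows.append(f"{word}\t{pos}")
    return "\n".join(rows)
```

Splitting on the last slash is a deliberate choice, because tokens such as dates or ratios may contain a slash inside the word part.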
2. Word-position tagging of the corpus
To obtain the training corpus that CRF learning requires, every word in the documents must be tagged with its word position. Manual tagging is clearly inefficient, so fast tagging by computer is used instead. Tagging relies on the standard dictionaries of the relevant medical field: the system adds the terms of ICD-10, ICD-9-CM, SNOMED, SNOMED CT, the synonym dictionary, and so on to the dictionary to raise the hit rate of word segmentation. Medical terms for diagnoses, operations, and examinations are usually long, so the reverse maximum matching algorithm is used to tag each character automatically as word-initial (B), word-middle (I), or word-final (E). Because the dictionary cannot contain every standard medical term, the automatically tagged corpus is checked manually after dictionary matching, as shown in Figure 6. The resulting character-based CRF training file format is shown in Figure 7.
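The dictionary-driven tagging described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's program: the function names and the max_len window are hypothetical, characters outside any dictionary word receive an "O" tag, and a one-character dictionary word is tagged "B"; the source defines only B, I, and E.

```python
def reverse_max_match(text, dictionary, max_len=8):
    """Segment text right-to-left, preferring the longest dictionary word."""
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Drop characters from the front of the window until it matches
        # a dictionary word or shrinks to a single character.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


def tag_text(text, dictionary, max_len=8):
    """Return (character, tag) pairs: B/I/E inside dictionary words, O elsewhere."""
    tagged = []
    for word in reverse_max_match(text, dictionary, max_len):
        if word not in dictionary:
            tags = ["O"] * len(word)      # assumption: non-term characters get O
        elif len(word) == 1:
            tags = ["B"]                  # assumption: single-character terms get B
        else:
            tags = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        tagged.extend(zip(word, tags))
    return tagged


# Usage: with the dictionary {"咳嗽" (cough), "呼吸困难" (dyspnea)}, the text
# "咳嗽伴呼吸困难" (cough with dyspnea) is segmented into 咳嗽 / 伴 / 呼吸困难.
```

The output of tag_text still needs the manual verification step that the text describes, since any term missing from the dictionary is left untagged.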
In step S124, feature template 1 and feature template 2 provided by the present invention are shown in Figure 8.
In step S125, the evaluation of the CRF algorithm includes:
The trained model is applied to the test data with: % crf_test -m model test.data > output.txt
The result is written to output.txt. Evaluation compares the labels to be predicted with the predicted labels.
conlleval.pl < output.txt
The .pl suffix denotes a Perl file, so Perl (Practical Extraction and Report Language) must be installed.
Note: the columns in the output.txt produced by CRF++ are separated by TAB characters, which must all be replaced with real spaces, because conlleval.pl recognizes spaces.
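The TAB-to-space replacement mentioned in the note can be done with any text tool; a minimal sketch is given here. The function name and the UTF-8 encoding are assumptions, and the file paths are supplied by the caller.

```python
def tabs_to_spaces(src_path, dst_path):
    """Copy a CRF++ output file, replacing every TAB with a single space."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.replace("\t", " "))
```

Writing to a separate file keeps the original CRF++ output intact in case the evaluation needs to be repeated.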
The evaluation results that these commands output for feature template 1 and feature template 2 are compared in Table 3.
Table 3 Template comparison
Evaluation criteria: TP (True Positive): positive samples predicted as positive by the model;
FP (False Positive): negative samples predicted as positive by the model;
FN (False Negative): positive samples predicted as negative by the model;
TN (True Negative): negative samples predicted as negative by the model;
Precision: P = TP / (TP + FP);
Recall: R = TP / (TP + FN), i.e. the true positive rate;
F1 (combined classification measure): the harmonic mean of precision and recall, which lies closer to the smaller of P and R: F = 2*P*R / (P + R);
Conclusion: feature template 2 performs better, because it captures more effective features.
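The evaluation criteria above translate directly into code. The sketch below computes the three measures from raw counts; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the source specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P = TP/(TP+FP), R = TP/(TP+FN), F = 2PR/(P+R) from counts."""
    p = tp / (tp + fp) if tp + fp else 0.0   # assumption: 0.0 when undefined
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Because F is a harmonic mean, a template that raises precision at a large cost in recall (or the reverse) scores lower than one that keeps the two balanced.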
Step S13, processing of feature terms: the feature terms generated by the preceding processing are processed further to obtain feature terms that meet the requirements of PU learning. As shown in Figure 9, this includes the following steps:
S131: After processing by the CRF algorithm, a text file is obtained in which every character of the test set data is labeled with its position within a word: word-initial (B), word-middle (I), or word-final (E). A corresponding program then derives a feature set from it. Some feature words in this set are not original feature terms of the relevant dictionaries; they are feature words that the CRF algorithm obtained after training on data with the manually annotated feature templates, i.e. so-called unregistered (out-of-vocabulary) words.
S132: Feature extraction yields a set of feature terms that contains standard feature terms and may also contain non-standard ones. These feature terms are therefore compared for similarity against the "non-standard feature term" column of the synonym dictionary shown in Table 1; the comparison produces a similarity ranking, with similarity values sorted in descending order.
S133: Because no ready-made standard data set is initially available for the synonym dictionary, the similarity threshold has to be set low in order to build a synonym dictionary from scratch. It is provisionally set so that any "non-standard feature term" with a similarity of at least 50%, together with its corresponding "standard feature term", is recommended to the operator as a candidate feature term for reference; the operator then decides manually which standard term corresponding to the selected non-standard feature term becomes the final standard term.
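Step S131 above, deriving a feature set from position-labeled characters, can be sketched as follows. The function names are hypothetical. The middle tag is accepted as either "I" or "M", since the source uses both letters, and any other tag is treated as lying outside a term.

```python
def extract_terms(tagged_chars):
    """Collect every B ... E span from (character, tag) pairs as one term."""
    terms, current = [], []
    for char, tag in tagged_chars:
        if tag == "B":
            current = [char]                   # start a new candidate term
        elif tag in ("I", "M") and current:
            current.append(char)               # continue the open term
        elif tag == "E" and current:
            current.append(char)
            terms.append("".join(current))     # close and store the term
            current = []
        else:
            current = []                       # discard any incomplete span
    return terms


def unregistered_terms(terms, dictionary):
    """Terms found by the tagger that the existing dictionaries do not contain."""
    return [term for term in terms if term not in dictionary]
```

Terms returned by unregistered_terms correspond to the out-of-vocabulary words mentioned in S131 and are natural candidates for manual review.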
Table 4 Mapping from non-standard terms to standard terms
Non-standard feature term                    Standard feature term
Tang's urine disease | sugared ornithosis    Diabetes
As the number of feature terms in the synonym dictionary grows, the threshold can be raised. The advantage is that a candidate is shown in the candidate feature term list for the operator's reference only when its similarity exceeds the threshold. If, after similarity comparison, a feature term has no candidate in the list to choose from, it can be changed to the specified standard feature term by manual confirmation. Note: the threshold can be set freely by hand in the system, which makes it flexible.
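The threshold-based recommendation can be sketched as below. This is an illustration only: the similarity used here is the ratio from Python's standard difflib.SequenceMatcher, a stand-in for the combined similarity the document describes, and the function name recommend is hypothetical.

```python
from difflib import SequenceMatcher


def recommend(term, synonyms, threshold=0.5):
    """List (score, non-standard variant, standard term) candidates for a term.

    synonyms maps a non-standard variant to its standard term. Only variants
    whose similarity to the input reaches the threshold are returned, sorted
    from most to least similar.
    """
    scored = []
    for variant, standard in synonyms.items():
        # Stand-in similarity; not the measure defined in this document.
        score = SequenceMatcher(None, term, variant).ratio()
        if score >= threshold:
            scored.append((score, variant, standard))
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored
```

An empty result corresponds to the case described above in which no candidate appears in the list and the operator assigns the standard term by hand. Raising the threshold argument shortens the candidate list as the dictionary grows.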
If "Tang's urine disease" is entered, the word is a typical non-standard term caused by an input error. As Table 4 shows, the candidate non-standard feature term corresponding to "Tang's urine disease" is "Tang's urine disease"; through this candidate, the final standard feature term "diabetes" is found. The feature terms before normalization are shown in Table 5:
Table 5 Feature terms before normalization
Pant                                  Expectoration
Heating                               Spitting of blood
Weak                                  Asthma
Pulmonary infection                   Full rabat
Hepatitis                             Rabat
Infection                             Diabetes
Hypertension                          Tang's urine disease
Coronary stenting                     Chilly
Coronary artery stent implantation    Auricular fibrillation
Coronary heart disease                Chest CT
Shiver with cold                      Uncomfortable in chest
Cough                                 Pectoralgia
Leucocyte                             WBC
As Table 5 shows, the feature term "Tang's urine disease" corresponds to the entry "Tang's urine disease | sugared ornithosis" among the non-standard feature terms of the synonym dictionary (Table 4). The corresponding standard feature term "diabetes" can be retrieved through this mapping. Since "Tang's urine disease" and "diabetes" differ, the non-standard feature term "Tang's urine disease" is highlighted in a conspicuous color to prompt the clinician to correct it, as shown in Figure 10.
Specifically:
In step S133, the similarity processing of feature terms includes:
Each feature term in the extracted feature term set is compared for similarity with the non-standard feature terms of the synonym dictionary. Specifically, a feature text similarity measure is used to compute the similarity of the features. Following the literature, the combined similarity is taken as the geometric mean of the Chinese character distance, the pinyin distance, and the Wubi (five-stroke) code distance. Although the individual similarity algorithms are alike, a considerable share of the feature words encountered in practice result from input errors, including homophone errors (identical or near-identical pronunciation) and similar-character errors (near-identical glyphs, such as shared radicals). The combined similarity can therefore raise the recall of the near-duplicate detection algorithm while still achieving high precision, as shown in Figure 18:
For each of the three methods (Chinese character distance, pinyin distance, and Wubi distance), similarity is computed with two distances, the Jaro-Winkler distance and the string edit distance, and the mean of the two is taken as the similarity measure, as shown in Table 6:
Table 6 Comparison of the three similarities
This shows that the two feature terms are very close, and it may be considered that only one standard term should represent them. The method can also find synonymous phrases for standard terms that tend to occur in routine use, which may be added to the synonym dictionary to enrich its vocabulary.
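The combined similarity can be sketched as follows, under explicit assumptions. Each term is supplied by the caller as three strings (Chinese characters, pinyin, Wubi code); producing the pinyin and Wubi strings needs external resources that are outside this sketch. For each string pair the Jaro-Winkler similarity and an edit-distance similarity (1 minus the Levenshtein distance divided by the longer length) are averaged, and the three averages are combined by a geometric mean. The normalization of the edit distance and all function names are assumptions, since the source does not give formulas.

```python
def jaro(s, t):
    """Jaro similarity of two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, scale=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


def edit_similarity(s, t):
    """1 - Levenshtein distance / longer length (assumed normalization)."""
    if not s and not t:
        return 1.0
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return 1 - prev[-1] / max(len(s), len(t))


def pair_similarity(s, t):
    """Mean of the Jaro-Winkler and edit-distance similarities."""
    return (jaro_winkler(s, t) + edit_similarity(s, t)) / 2


def combined_similarity(views_a, views_b):
    """Geometric mean over (hanzi, pinyin, wubi) string pairs given by the caller."""
    product = 1.0
    for a, b in zip(views_a, views_b):
        product *= pair_similarity(a, b)
    return product ** (1 / len(views_a))
```

A geometric mean drops sharply when any one of the three views disagrees, so two terms score high only if their characters, pronunciation, and keying are all close.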
The feature terms after normalization are shown in Table 7 below:
Table 7 Feature terms after normalization
Pant                      Expectoration
Heating                   Spitting of blood
Weak                      Asthma
Pulmonary infection       Full rabat
Hepatitis
Infection                 Diabetes
Hypertension
Coronary stenting         Chilly
                          Auricular fibrillation
Coronary heart disease    Chest CT
Shiver with cold          Uncomfortable in chest
Cough                     Pectoralgia
Leucocyte
Feature term ranking: the processing above extracts many feature terms, but not every extracted feature is meaningful. TF-IDF is therefore used to rank the features and filter out the key ones. Because not every document contains every feature word, the ranking of key feature terms is obtained by summing each term's weight over all documents d in which it occurs, taking the term's mean over all documents, and sorting in descending order. Here the CRF++ tool extracted 390 feature terms in total, which were ranked by the computed mean weight; after confirmation and screening by domain experts, 68 key feature terms were finally obtained. Table 8 lists the top 20 feature terms.
Table 8 Feature term ranking
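The ranking by mean TF-IDF weight can be sketched as follows. The source does not state which TF-IDF variant was used, so this sketch assumes term frequency as count divided by document length and inverse document frequency as log(N / df); the function name rank_terms is hypothetical.

```python
import math
from collections import Counter


def rank_terms(documents):
    """Rank terms by their mean TF-IDF weight over all documents.

    documents is a list of term lists, one list per medical record.
    Returns (term, mean weight) pairs sorted from highest to lowest.
    """
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    totals = Counter()
    for doc in documents:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])   # assumed IDF variant
            totals[term] += tf * idf
    means = {term: total / n_docs for term, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

With this variant a term that occurs in every record receives weight zero, which is one reason expert screening of the ranked list, as described above, remains necessary.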
Second preferred technical solution:
Auxiliary diagnosis management:
The generated feature terms are converted into the file format used for PU learning (learning from a positive example set and an unlabeled set). After PU learning, the system automatically recommends relevant diagnoses for clinical staff to consult, as shown in Figure 11:
S21: Determine the feature word frequencies formed from the feature term set and the electronic medical record documents;
S22: Perform PU training and PU learning on the feature word frequencies;
S23: Obtain the auxiliary diagnosis result.
Specifically:
Step S22: application of partially supervised learning:
Partially supervised learning is generally divided into two kinds. The first learns from labeled and unlabeled data and is also called LU learning, where L denotes the labeled data set and U the unlabeled data set. The second learns from a positive example set and an unlabeled set, i.e. PU learning, where P denotes the positive example set and U the unlabeled set; the aim of the algorithm is to learn an accurate classifier without labeling any negative examples.
In practical applications, positive documents must be distinguished within a mixed document collection that contains both positive documents and documents of other classes. Documents of the class of interest are called positive documents; documents of the remaining classes are called negative documents. The labeled positive documents form the positive set P; the remaining, unlabeled documents form the unlabeled set U.
The problem is to find a classifier, built from the sets P and U, that can distinguish positive documents from negative documents in the test set. The method that solves this problem is PU learning.
This learning framework rests on the following observation: with the Internet now pervasive, people are in most cases interested in only one class of documents or web pages and do not care about the others. Once a small number of documents of interest has been labeled, the PU learning framework can be used to obtain a classifier that labels unseen documents, thereby retrieving the documents needed. For example, for someone interested in pages of dating websites, all other pages can be regarded as negative pages.
The same situation often arises in medical research: a disease is hard to diagnose from certain features, yet it is precisely the disease that interests clinical researchers. A small portion of the medical records with a confirmed diagnosis of this disease is labeled to form the positive document set, which is then combined with a large number of unlabeled medical records, the unlabeled set, to form the training set. The classifier obtained with the PU learning framework labels future medical record information, thereby serving the purpose of auxiliary diagnosis.
The present invention also provides an electronic medical record post-structuring and auxiliary diagnosis system, including:
Medical dictionary management module: for managing the standard dictionaries and the clinical medicine application dictionary; the clinical medicine application dictionary includes an internal dictionary and a synonym dictionary; the internal dictionary includes a clinical symptom dictionary and other related dictionaries of examination terms; the synonym dictionary includes mappings from non-standard feature terms to standard feature terms, mappings from miswritten words to standard feature terms, and mappings from a single standard term to multiple standard terms;
Medical corpus management module: for extracting electronic medical record document data, part-of-speech tagging, and word-position tagging, and for creating feature templates, feature annotation, and feature extraction;
Medical feature term processing module: for the normalization management of feature terms;
Auxiliary diagnosis management module: for managing the PU learning framework, PU learning training and test management, and PU learning auxiliary diagnosis management.
I. Experimental framework and results
1. Tools used in the experiments
(1) PU learning tool LPU (http://www.cs.uic.edu/~liub/LPU/lpu.zip).
(2) Support vector machine toolkit: the SVMlight toolkit, which must be downloaded.
(3) Experiment command and parameters
lpu -s1 [option 1] -s2 [option 2] -c [option 3] -f [filestem]
-s1: the parameter option for the first stage of PU learning.
-s2: the parameter option for the second stage of PU learning.
-c: the classifier selection mode.
-s1 offers three methods: spy (spy), Rocchio (roc), and naive Bayes (nb). -s2 offers two methods: support vector machine (svm) and expectation maximization (em). Classifier selection mode: 1 means selecting the best of the generated classifiers.
2. File format of the experimental data sets
The three original data sets are:
demo.pos: the positive document set.
demo.unlabel: the unlabeled document set.
Neither of these two files contains class labels, as shown in Figure 12.
demo.test: the test data set. It contains both positive and negative documents and also includes class labels; positive examples are marked +1 and negative examples -1, as shown in Figure 13.
Format of each line in a data file: label attribute:value ... attribute:value. The label takes the values +1 and -1, denoting positive and negative documents respectively. The label and the attribute:value pairs are separated by spaces. Each attribute must be numbered with an integer, starting from 1. Each attribute value must be an integer giving the number of times the attribute occurs in the document. Features with a value of 0 are ignored automatically. Attribute numbers must appear in increasing order, for example 5:1 6:1 7:1 8:1 10:4 11:2 12:3 13:1 14:1 15:1 16:6 17:2 23:1 25:2 29:1.
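The line format described above can be produced with a small helper. This is a sketch; the function name format_line is hypothetical. It writes attribute numbers in increasing order, omits zero counts, and adds the +1 or -1 label only when one is given, as in the test file.

```python
def format_line(counts, label=None):
    """Format one document as 'label attr:value ... attr:value'.

    counts maps a 1-based integer attribute number to its occurrence count.
    label is None for .pos/.unlabel files, or a positive/negative number
    for the test file.
    """
    pairs = [f"{attr}:{value}"
             for attr, value in sorted(counts.items()) if value > 0]
    prefix = [] if label is None else ["+1" if label > 0 else "-1"]
    return " ".join(prefix + pairs)
```

For example, format_line({10: 4, 5: 1, 7: 0}) gives "5:1 10:4": the zero-valued attribute 7 is dropped and the remaining attributes are sorted.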
3. Experimental data
(1) Composition of the experimental data:
A total of 750 valid electronic medical records were collected for this experiment, all concerning respiratory diseases. The valid diagnoses obtained through the system's feature extraction are shown in Table 9 below:
Table 9 Valid diagnoses
The extracted feature attributes are shown in Table 10 below:
Table 10 Feature attributes
Full rabat                                      Coronary heart disease         Upper right lung infection
Chest CT                                        Pulmonary emphysema            Atelectasis
Asthma                                          Precordialgia                  Bronchiectasis
Cough                                           Become thin                    Malignant pleural effusion
Bronchial asthma                                Chilly                         Interstitial pneumonia
Pleural effusion                                Expiratory dyspnea             Ventilatory dysfunction
Runny nose                                      Shortness of breath            Stridulate
Pant                                            Heating                        Obstructive pneumonia
Shiver with cold                                Malaise                        Upper left lung infection
Two enhanced lung markings                      Lower-left lung infection      Enlargement of lymph nodes
Spitting of blood                               Respiratory failure            Cholecystolithiasis
Weak                                            Pulmonary tuberculosis         Hydropericardium
Uncomfortable in chest                          Pneumothorax                   Sneeze
Nasal obstruction                               Expectoration                  Hydropneumothorax
Pectoralgia                                     Chronic bronchitis             Hypertension
Headache                                        Bottom right lung infection    Oedema
Acute bronchitis                                DOMS                           Calcification of lymph node
The infection of the upper respiratory tract    Spontaneous pneumothorax       Pleural calcification
Palpitation                                     AECB                           Diabetes
Vomiting                                        COPD                           Lose weight
Nausea                                          It is short of breath          Bronchiostenosis
Dizzy                                           Edema of lower extremity       Pleural effusion
Pharyngalgia                                    Pulmonary fibrosis
4. Grouping of the experimental data
With COPD as the positive class, 151 COPD medical records are selected to form the positive document set, from which the mzf.pos file is generated. From the remaining documents, 49 COPD medical records and 300 medical records of other disease types are selected to form the mzf.unlabel file. Finally, the remaining 50 COPD medical records and 200 medical records of other diseases form the mzf.test file.
5. Classifier combinations used in the experiment:
Table 11 Classifier combinations
To ensure applicability to different application environments, the system provided by the present invention offers several classifier combinations in its implementation, so that the best classifier can be selected for each task.
6. Experimental results and analysis:
As the positive document recall in Figure 14 shows, the recall of Roc-Svm is not the highest, but it still reaches 90%, close to the peak. Figure 15 shows that the positive document precision of Roc-Svm is 82%, close to the peak of 83%. When recall and precision are considered together, Figure 16 shows that the positive document F-value of Roc-Svm reaches 85.9%, the best of all classifiers. Figure 17 further shows that Roc-Svm attains an overall accuracy of 94%. Hence, for the data set of this experiment, Roc-Svm is the optimal classifier.
This is mainly because, in PU learning, the unlabeled set U usually has the following characteristics:
1. In the unlabeled set U, the proportion of positive documents is usually small, so they do not greatly affect the center vector of the negative documents in the algorithm.
2. The unlabeled set U usually contains documents of many different classes, so in the vector space they cover a large region, i.e. they are relatively dispersed. Documents in the positive set P generally belong to a single class and are of the same type, so in the vector space they cover a small region, i.e. they are relatively concentrated. Suppose a decision boundary is used to distinguish positive documents from negative documents, where positive documents belong to P and negative documents belong to U, and the boundary is meant to separate the documents in P from those in U. Because the documents in U are more dispersed, many negative documents are misclassified as positive; this is exactly why the Rocchio algorithm can extract reliable negative documents with high precision. Once the reliable negative document set RN has been formed, RN and P can serve as the training set for training a support vector machine (SVM), iterating until no further reliable negative documents are extracted in an iteration. However, because the Rocchio algorithm misclassifies many negative documents as positive, the positive documents have low precision; classifying with the SVM corrects this bias of the Rocchio algorithm and yields a more accurate classifier. This is exactly why Roc-Svm became the optimal classifier in this experiment.
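The first stage described above, using Rocchio prototypes to pull reliable negative documents RN out of U, can be sketched as follows. This is an illustration, not the LPU implementation: documents are sparse dictionaries mapping attribute numbers to counts, the prototype weights alpha = 16 and beta = 4 are values often quoted for Rocchio and are an assumption here, and all function names are hypothetical. The subsequent iterative SVM training is not shown.

```python
import math


def _unit(vec):
    """Scale a sparse vector to unit length (empty if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def _centroid(docs):
    """Mean of the length-normalized document vectors."""
    total = {}
    for doc in docs:
        for k, v in _unit(doc).items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(docs) for k, v in total.items()}


def _prototype(own, other, alpha, beta):
    """Rocchio prototype: alpha * own centroid - beta * other centroid."""
    keys = set(own) | set(other)
    return {k: alpha * own.get(k, 0.0) - beta * other.get(k, 0.0) for k in keys}


def _cosine(a, b):
    a, b = _unit(a), _unit(b)
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def reliable_negatives(positive, unlabeled, alpha=16.0, beta=4.0):
    """Documents of U that are at least as close to the U prototype as to the P prototype."""
    mean_p, mean_u = _centroid(positive), _centroid(unlabeled)
    proto_pos = _prototype(mean_p, mean_u, alpha, beta)
    proto_neg = _prototype(mean_u, mean_p, alpha, beta)
    return [doc for doc in unlabeled
            if _cosine(doc, proto_pos) <= _cosine(doc, proto_neg)]
```

Unlabeled documents that resemble the positive set are left out of RN, which matches the argument above: the Rocchio step errs toward calling documents positive, so whatever it does label negative is reliable.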
The specific experimental results are shown in Figures 14 to 17:
Evaluation criteria: TP (True Positive): positive samples predicted as positive by the model; FP (False Positive): negative samples predicted as positive by the model; FN (False Negative): positive samples predicted as negative by the model; TN (True Negative): negative samples predicted as negative by the model;
Precision: P = TP / (TP + FP);
Recall: R = TP / (TP + FN), i.e. the true positive rate;
F1 (combined classification measure): the harmonic mean of precision and recall, which lies closer to the smaller of P and R: F = 2*P*R / (P + R);
Accuracy: the classifier's ability to judge the whole sample correctly, i.e. to classify positives as positive and negatives as negative: A = (TP + TN) / (TP + FN + FP + TN).
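As a quick check of these formulas, the sketch below computes accuracy from counts and the F-value from precision and recall. The function names are hypothetical, and the counts in the usage note are illustrative values, not figures reported in this document.

```python
def accuracy(tp, tn, fp, fn):
    """A = (TP + TN) / (TP + FN + FP + TN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_value(p, r):
    """F = 2*P*R / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# F-value implied by the rounded Roc-Svm precision (82%) and recall (90%).
roc_svm_f = f_value(0.82, 0.90)
```

The value of roc_svm_f is about 0.858, close to the reported 85.9%; the small gap is plausibly due to rounding of the reported precision and recall. As an illustration with hypothetical counts, accuracy(45, 190, 10, 5) on a 250-document test set equals 0.94.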
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify the specific embodiments of the present invention or replace them with equivalents; any such modification or equivalent replacement that does not depart from the spirit and scope of the present invention falls within the scope of the pending claims of the present invention.

Claims (8)

1. A method for realizing electronic medical record post-structuring and auxiliary diagnosis, characterized in that the method comprises the following steps:
(1) Structuring of the electronic medical record text;
S11: Establish the medical dictionary;
S12: Establish the medical corpus;
S13: Process the medical feature terms;
(2) Auxiliary diagnosis management;
S21: Determine the feature word frequencies formed from the feature term set and the electronic medical record documents;
S22: Perform PU training and PU learning on the feature word frequencies;
S23: Obtain the auxiliary diagnosis result.
2. The method as claimed in claim 1, characterized in that, in step S11, the medical dictionary includes:
a standard medical dictionary, including: the globally used 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10); the International Classification of Diseases, 9th Revision, Clinical Modification, operations and procedures (ICD-9-CM); and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), whose data serve as the standard;
a clinical medicine application dictionary, including an internal dictionary and a synonym dictionary, the internal dictionary including a clinical symptom dictionary and other related dictionaries of examination terms, and the synonym dictionary including mappings from non-standard feature terms to standard feature terms, mappings from miswritten words to standard feature terms, and mappings from a single standard term to multiple standard terms.
3. The method as claimed in claim 1, characterized in that, in step S12, establishing the medical corpus includes the following steps:
S121: Extract electronic medical record documents from the electronic medical record database;
S122: Perform part-of-speech tagging and word-position tagging on the electronic medical record documents;
S123: Integrate the data of the documents after part-of-speech tagging and word-position tagging;
S124: Create feature templates; the feature templates are formed through CRF algorithm training;
S125: Form the feature data and evaluate the performance of the CRF algorithm;
S126: Finally form the medical corpus.
4. The implementation method as claimed in claim 3, characterized in that, in step S122, the part-of-speech tagging comprises: preprocessing the extracted electronic medical record documents to obtain the part of speech of the text in the documents, combining it with the word-position (lexeme) tags, converting the result into the conditional random field (CRF) format, and performing feature extraction with the CRF algorithm; the automatically tagged electronic medical record documents are then checked manually;
The word-position (lexeme) tagging uses the standard medical dictionary to raise the hit rate on the text of the electronic medical record documents and applies the reverse maximum matching algorithm, wherein reverse maximum matching starts scanning from the end of the document being processed and each time takes the last 2i characters as the matching field; if matching fails, the frontmost character of the matching field is removed and matching continues; medical terms are automatically tagged with prefix B, in-word I, and suffix E;
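The reverse maximum matching and B/I/E tagging described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `max_len` window (standing in for the claim's 2i-character matching field), the toy dictionary, and the `O` tag for characters outside any dictionary term are all assumptions made for the example.

```python
def reverse_max_match(text, dictionary, max_len=4):
    """Segment text from its end, preferring the longest dictionary match."""
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # On a failed match, drop the frontmost character of the field and retry;
        # a single remaining character is accepted as a word on its own.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


def bie_tags(words, dictionary):
    """Tag each character: B (prefix), I (in-word), E (suffix), O (assumed: not a term)."""
    tagged = []
    for word in words:
        if word in dictionary and len(word) > 1:
            tags = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        else:
            tags = ["O"] * len(word)
        tagged.extend(zip(word, tags))
    return tagged


# Toy dictionary entries, chosen only for illustration.
toy_dictionary = {"头痛", "发热", "咳嗽"}
segments = reverse_max_match("患者头痛发热", toy_dictionary)
tagged_chars = bie_tags(segments, toy_dictionary)
```

Scanning from the end means that, when two dictionary terms overlap, the one closer to the end of the text wins, which is the usual motivation for the reverse variant in Chinese segmentation.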
In step S124, CRF algorithm training is carried out, the training process being: % crf_test -m model test.data > output.txt, with the result of training written to output.txt; the evaluation compares the labels to be predicted with the predicted labels;
In the output.txt produced by the CRF algorithm the separator is the TAB character, and every TAB is replaced with an actual space character, because conlleval.pl recognizes the space character as the separator;
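The TAB-to-space replacement can be done with a few lines of Python. This is only a sketch: the function names and the destination file path are hypothetical, and it assumes output.txt is UTF-8 encoded.

```python
def tabs_to_spaces(lines):
    """Replace every TAB separator with a single space, line by line."""
    return [line.rstrip("\n").replace("\t", " ") for line in lines]


def convert_file(src_path, dst_path):
    """Rewrite a TAB-separated file as a space-separated one (paths are hypothetical)."""
    with open(src_path, encoding="utf-8") as src:
        converted = tabs_to_spaces(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")


# Blank lines, which separate sentences in the tagged output, are preserved.
sample = tabs_to_spaces(["头\tn\tB\tB\n", "\n"])
```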
In step S125, the evaluation criteria for the performance of the CRF algorithm are:
TP, True Positive: positive samples predicted as positive by the model;
FP, False Positive: negative samples predicted as positive by the model;
FN, False Negative: positive samples predicted as negative by the model;
TN, True Negative: negative samples predicted as negative by the model;
Precision: P = TP / (TP + FP);
Recall: R = TP / (TP + FN), i.e., the true positive rate;
F1, the combined classification score: the harmonic mean of precision and recall, which lies closer to the smaller of P and R: F = 2 * P * R / (P + R).
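The evaluation formulas above can be checked with a short worked example. This is a sketch under simplifying assumptions: labels are reduced to binary values (1 for a term token, 0 otherwise), and the function name and sample labels are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Compute P, R and F from binary gold and predicted labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# TP = 2, FP = 1, FN = 2, so P = 2/3, R = 1/2 and F = 4/7.
gold_labels = [1, 1, 1, 1, 0, 0]
predicted_labels = [1, 1, 0, 0, 1, 0]
p, r, f = precision_recall_f1(gold_labels, predicted_labels)
```

Here F (about 0.571) falls between R (0.5) and P (about 0.667) and sits nearer to the smaller value, as stated for the harmonic mean.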
5. The implementation method as claimed in claim 1, characterized in that, in step S13, medical feature term processing comprises the following steps:
S131: the electronic medical record documents processed by the CRF algorithm yield a text in which the position of each character of the test-set data within its word is marked as prefix B, in-word I, or suffix E; a feature set is obtained from this text by a corresponding program; some of the feature words in the feature set already exist in the dictionary, while others are feature terms not recorded in the dictionary, namely feature words obtained by the CRF after data training with manually annotated feature templates, i.e., the so-called unregistered (out-of-vocabulary) words;
S132: after feature extraction a feature term set is obtained, containing both standard and non-standard feature terms; using the thesaurus that maps non-standardized feature terms to standardized feature terms, each non-standard feature term is compared for similarity with the non-standard feature terms in the thesaurus, and the comparison yields a similarity ranking with the similarity values arranged from largest to smallest;
S133: the similarity threshold is initially set to a similarity greater than or equal to 50%; the non-standard feature terms that satisfy the threshold condition, together with their corresponding standard feature terms, are recommended to the operator as candidate feature terms for reference, and the operator determines the standard feature term corresponding to each non-standard feature term as the final standard feature term; the threshold value can be set freely by hand.
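Steps S132 and S133 amount to scoring a term against the thesaurus, filtering by a threshold, and sorting in descending order. The sketch below illustrates that flow only. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity measure, which is an assumption and not the measure defined in the claims; the function name and the toy thesaurus entries are likewise hypothetical.

```python
from difflib import SequenceMatcher


def recommend_candidates(term, thesaurus, threshold=0.5):
    """Return (non-standard term, standard term, similarity) triples at or above
    the threshold, sorted from most to least similar."""
    scored = []
    for nonstandard, standard in thesaurus.items():
        # Stand-in similarity (assumption): difflib's matching-block ratio.
        similarity = SequenceMatcher(None, term, nonstandard).ratio()
        if similarity >= threshold:
            scored.append((nonstandard, standard, similarity))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Toy thesaurus: non-standard term -> standard term (illustrative entries only).
toy_thesaurus = {"心梗": "心肌梗死", "脑梗": "脑梗死", "感冒": "上呼吸道感染"}
candidates = recommend_candidates("心肌梗", toy_thesaurus)
loose_candidates = recommend_candidates("心肌梗", toy_thesaurus, threshold=0.3)
```

The returned list is what would be shown to the operator, who makes the final choice; lowering `threshold` widens the candidate list.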
6. The implementation method as claimed in claim 5, characterized in that, in step S132, the TF-IDF method is used to calculate the weight of each feature term in every electronic medical record document; the weights are summed and the mean for each feature term over all electronic medical record documents is obtained, and the terms are ranked from largest to smallest;
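The TF-IDF averaging described here can be sketched in plain Python. The claim does not fix a particular TF-IDF variant, so the choices below are assumptions: term frequency is the count divided by document length, inverse document frequency is the unsmoothed log(N / df), and the function name and sample documents are invented.

```python
import math
from collections import Counter


def mean_tfidf_ranking(documents):
    """Rank terms by their TF-IDF weight averaged over all documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # each document counts once per term
    totals = Counter()
    for doc in documents:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])  # unsmoothed IDF (assumption)
            totals[term] += tf * idf
    means = {term: total / n_docs for term, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Each document is a list of already extracted feature terms (toy data).
toy_documents = [
    ["fever", "cough", "fever"],
    ["fever", "headache"],
    ["cough", "chest_pain"],
]
ranking = mean_tfidf_ranking(toy_documents)
```

With the unsmoothed IDF, a term that appears in every document receives weight zero; a smoothed variant would be a reasonable alternative if such terms should still be ranked.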
In step S133, feature text similarity measures are used to calculate the similarity of the feature terms in the corresponding feature term set; the comprehensive similarity is calculated as the geometric mean of (Chinese character distance + pinyin distance + Wubi (five-stroke) distance);
For each of the Chinese character distance, the pinyin distance, and the Wubi distance, two measures, the string similarity distance and the string edit distance, are used to calculate similarity, and the mean of the two is taken as the final distance measure.
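One possible reading of this comprehensive similarity is sketched below. Several points are assumptions rather than statements from the claims: the geometric mean is taken as the cube root of the product of the three per-representation similarities, the "string similarity distance" is approximated with `difflib.SequenceMatcher.ratio()`, the edit distance is normalized into a similarity by dividing by the longer length, and the pinyin and Wubi lookup tables are toy placeholders (the Wubi codes in particular are not real codes).

```python
from difflib import SequenceMatcher


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pair_similarity(a, b):
    """Mean of the two measures: matching ratio and normalized edit similarity."""
    if not a and not b:
        return 1.0
    ratio = SequenceMatcher(None, a, b).ratio()
    edit_sim = 1.0 - edit_distance(a, b) / max(len(a), len(b))
    return (ratio + edit_sim) / 2.0


def comprehensive_similarity(a, b, encoders):
    """Geometric mean of the per-representation similarities."""
    product = 1.0
    for encode in encoders:
        product *= pair_similarity(encode(a), encode(b))
    return product ** (1.0 / len(encoders))


# Toy lookup tables; the Wubi values are placeholders, not real Wubi codes.
PINYIN = {"心": "xin", "肌": "ji", "梗": "geng", "死": "si"}
WUBI_PLACEHOLDER = {"心": "aa", "肌": "bb", "梗": "cc", "死": "dd"}


def to_pinyin(s):
    return "".join(PINYIN.get(ch, ch) for ch in s)


def to_wubi(s):
    return "".join(WUBI_PLACEHOLDER.get(ch, ch) for ch in s)


ENCODERS = [lambda s: s, to_pinyin, to_wubi]
score = comprehensive_similarity("心梗", "心肌梗死", ENCODERS)
```

Because a geometric mean is a product, a zero similarity in any one representation drives the combined score to zero; whether that is desirable depends on how noisy the pinyin and Wubi encodings are in practice.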
7. The implementation method as claimed in claim 1, characterized in that step S22 comprises: the feature word frequency data consist of a positive-example document data set and a test data set made up of an unlabeled document data set; learning is performed on the positive-example document data set and the unlabeled document data set, and the positive-example and counter-example documents in the test data set are distinguished using the learning framework over sets P and U, i.e., PU learning, where P denotes the positive-example document data set and U denotes the unlabeled data set containing the counter-example documents; without labeling any counter-example documents, a classifier is learned, and the classifier is used to label the unlabeled document data set to obtain the required documents;
Medical record data with a confirmed diagnosis of a given disease are labeled to form the positive-example document data set, which is combined with unlabeled medical record data (the unlabeled document data set) to form the training set for learning; the classifier obtained with the PU learning framework is used to label future electronic medical record documents, thereby achieving auxiliary diagnosis.
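The claim does not name a specific PU learning algorithm. As one illustration of the idea, the sketch below uses a simple two-step, centroid-based scheme: first pick "reliable negatives" from U (documents closer to the centroid of U than to the centroid of P), then classify new documents by whichever centroid they are closer to. The scheme, the function names, and the toy term-frequency data are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency mappings."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Summed term frequencies; cosine is scale-invariant, so no division is needed."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return total


def train_pu(positive_docs, unlabeled_docs):
    """Step 1: extract reliable negatives from U. Step 2: build the two centroids."""
    c_pos = centroid(positive_docs)
    c_unl = centroid(unlabeled_docs)
    reliable_neg = [d for d in unlabeled_docs if cosine(d, c_unl) > cosine(d, c_pos)]
    c_neg = centroid(reliable_neg) if reliable_neg else c_unl
    return c_pos, c_neg


def predict(doc, model):
    """True if the document is closer to the positive centroid."""
    c_pos, c_neg = model
    return cosine(doc, c_pos) > cosine(doc, c_neg)


# Toy term-frequency data: P holds confirmed cases, U is unlabeled.
P = [{"chest_pain": 2, "palpitation": 1}, {"chest_pain": 1, "short_breath": 1}]
U = [
    {"cough": 2, "fever": 1},
    {"fever": 2, "headache": 1},
    {"chest_pain": 2, "short_breath": 1},
    {"cough": 1, "headache": 1},
]
model = train_pu(P, U)
```

No counter-example document is labeled by hand at any point; the negatives are inferred from U, which is the defining property of the PU setting described in the claim.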
8. A system for post-structuring and auxiliary diagnosis of electronic medical records, characterized in that the system comprises:
A medical dictionary management module: for management of the standard dictionaries and management of the clinical medicine application dictionaries; the clinical medicine application dictionaries include an internal dictionary and a thesaurus, wherein the internal dictionary includes a clinical symptom dictionary and other dictionaries related to examination terms; the thesaurus includes: mappings from non-standardized feature terms to standardized feature terms, mappings from misspelled words to standardized feature terms, and mappings from a single standard term to multiple standard terms;
A medical corpus management module: for extraction of electronic medical record document data, part-of-speech tagging, and word-position (lexeme) tagging; and for creating feature templates, feature annotation, and feature extraction;
A medical feature term processing module: for standardized management of feature terms;
An auxiliary diagnosis management module: for PU learning framework management, PU learning training and test management, and PU learning auxiliary diagnosis management.
CN201610787187.3A 2016-08-30 2016-08-30 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis Pending CN106383853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610787187.3A CN106383853A (en) 2016-08-30 2016-08-30 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis

Publications (1)

Publication Number Publication Date
CN106383853A true CN106383853A (en) 2017-02-08

Family

ID=57939471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610787187.3A Pending CN106383853A (en) 2016-08-30 2016-08-30 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis

Country Status (1)

Country Link
CN (1) CN106383853A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120066540A1 (en) * 2010-03-26 2012-03-15 Fujitsu Limited Information correction support system and method
CN103020034A (en) * 2011-09-26 2013-04-03 北京大学 Chinese words segmentation method and device
CN105468900A (en) * 2015-11-20 2016-04-06 邹远强 Intelligent medical record input platform based on knowledge base

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘凯: "基于条件随机场的中医病历命名实体抽取方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108572954A (en) * 2017-03-07 2018-09-25 上海颐为网络科技有限公司 A kind of approximation entry structure recommendation method and system
CN108572954B (en) * 2017-03-07 2023-04-28 上海颐为网络科技有限公司 Method and system for recommending approximate entry structure
CN106934038B (en) * 2017-03-15 2018-01-05 江苏华生基因数据科技股份有限公司 A kind of medical data duplicate checking and the method and system associated
CN106934038A (en) * 2017-03-15 2017-07-07 江苏华生基因数据科技股份有限公司 A kind of medical data duplicate checking and the method and system for associating
CN107168946A (en) * 2017-04-14 2017-09-15 北京化工大学 A kind of name entity recognition method of medical text data
CN107480131A (en) * 2017-07-25 2017-12-15 李姣 Chinese electronic health record symptom semantic extracting method and its system
CN108021553A (en) * 2017-09-30 2018-05-11 北京颐圣智能科技有限公司 Word treatment method, device and the computer equipment of disease term
CN107785075A (en) * 2017-11-01 2018-03-09 杭州依图医疗技术有限公司 Fever in children disease deep learning assistant diagnosis system based on text case history
CN107908621A (en) * 2017-11-16 2018-04-13 东华大学 Tumor of breast risk assessment system based on ultrasonic examination report text data
CN107705849A (en) * 2017-11-27 2018-02-16 泰康保险集团股份有限公司 Remote medical consultation with specialists opinion integration method and device
CN108170716A (en) * 2017-12-04 2018-06-15 昆明理工大学 A kind of text duplicate checking method based on human visual
CN108170716B (en) * 2017-12-04 2021-12-17 昆明理工大学 Text duplicate checking method based on human vision
CN108170673A (en) * 2017-12-26 2018-06-15 北京百度网讯科技有限公司 The recognition methods of information style and device based on artificial intelligence
CN108170673B (en) * 2017-12-26 2021-08-24 北京百度网讯科技有限公司 Information tone identification method and device based on artificial intelligence
CN108009156A (en) * 2017-12-27 2018-05-08 成都信息工程大学 A kind of Chinese generality text dividing method based on partial supervised study
CN108009156B (en) * 2017-12-27 2020-05-19 成都信息工程大学 Chinese generalized text segmentation method based on partial supervised learning
CN110019418A (en) * 2018-01-02 2019-07-16 中国移动通信有限公司研究院 Object factory method and device, mark system, electronic equipment and storage medium
CN110019418B (en) * 2018-01-02 2021-09-14 中国移动通信有限公司研究院 Object description method and device, identification system, electronic equipment and storage medium
CN108491472A (en) * 2018-03-07 2018-09-04 新博卓畅技术(北京)有限公司 A kind of method and system segmenting structure medical characteristics library based on CRF++
CN108346474B (en) * 2018-03-14 2021-09-28 湖南省蓝蜻蜓网络科技有限公司 Electronic medical record feature selection method based on word intra-class distribution and inter-class distribution
CN108346474A (en) * 2018-03-14 2018-07-31 湖南省蓝蜻蜓网络科技有限公司 The electronic health record feature selection approach of distribution within class and distribution between class based on word
CN109243599A (en) * 2018-03-16 2019-01-18 申朴信息技术(上海)股份有限公司 A kind of disease based on various dimensions information retrieval is to code method
CN108564086B (en) * 2018-03-17 2024-05-10 上海柯渡医学科技股份有限公司 Character string identification and verification method and device
CN108564086A (en) * 2018-03-17 2018-09-21 深圳市极客思索科技有限公司 A kind of the identification method of calibration and device of character string
CN108538395A (en) * 2018-04-02 2018-09-14 上海市儿童医院 A kind of construction method of general medical disease that calls for specialized treatment data system
CN108711443A (en) * 2018-05-07 2018-10-26 成都智信电子技术有限公司 The text data analysis method and device of electronic health record
CN108711443B (en) * 2018-05-07 2021-11-30 成都智信电子技术有限公司 Text data analysis method and device for electronic medical record
CN112204671A (en) * 2018-05-30 2021-01-08 国际商业机器公司 Personalized device recommendation for active health monitoring and management
CN108962383A (en) * 2018-06-05 2018-12-07 南京麦睿智能科技有限公司 Hospital's intelligence hospital guide's method and apparatus
CN108831560A (en) * 2018-06-21 2018-11-16 北京嘉和美康信息技术有限公司 A kind of method and apparatus of determining medical data attribute data
CN108831560B (en) * 2018-06-21 2020-09-22 北京嘉和海森健康科技有限公司 Method and device for determining medical data attribute data
CN109192255B (en) * 2018-07-03 2022-01-28 北京左医科技有限公司 Medical record structuring method
CN109192255A (en) * 2018-07-03 2019-01-11 北京康夫子科技有限公司 Case history structural method
CN108648788A (en) * 2018-07-04 2018-10-12 莫毓昌 A kind of rehabilitation medical process management system of semi-structured electronic health record
CN110750626B (en) * 2018-07-06 2022-05-06 中国移动通信有限公司研究院 Scene-based task-driven multi-turn dialogue method and system
CN110750626A (en) * 2018-07-06 2020-02-04 中国移动通信有限公司研究院 Scene-based task-driven multi-turn dialogue method and system
CN109033083A (en) * 2018-07-20 2018-12-18 吴怡 A kind of legal advice system based on semantic net
CN109065157A (en) * 2018-08-01 2018-12-21 中国人民解放军第二军医大学 A kind of Disease Diagnosis Standard coded Recommendation list determines method and system
CN109065157B (en) * 2018-08-01 2020-11-03 中国人民解放军第二军医大学 Disease diagnosis standardized code recommendation list determination method and system
CN109344250A (en) * 2018-09-07 2019-02-15 北京大学 Single diseases diagnostic message rapid structure method based on medical insurance data
CN109344250B (en) * 2018-09-07 2021-11-19 北京大学 Rapid structuring method of single disease diagnosis information based on medical insurance data
CN109215754A (en) * 2018-09-10 2019-01-15 平安科技(深圳)有限公司 Medical record data processing method, device, computer equipment and storage medium
CN109243618A (en) * 2018-09-12 2019-01-18 腾讯科技(深圳)有限公司 Construction method, disease label construction method and the smart machine of medical model
CN109243618B (en) * 2018-09-12 2020-06-16 腾讯科技(深圳)有限公司 Medical model construction method, disease label construction method and intelligent device
CN109166608A (en) * 2018-09-17 2019-01-08 新华三大数据技术有限公司 Electronic health record information extracting method, device and equipment
CN110931137A (en) * 2018-09-19 2020-03-27 京东方科技集团股份有限公司 Machine-assisted dialog system, method and device
CN110931137B (en) * 2018-09-19 2023-07-07 京东方科技集团股份有限公司 Machine-assisted dialog systems, methods, and apparatus
CN109493977A (en) * 2018-11-09 2019-03-19 天津新开心生活科技有限公司 Text data processing method, device, electronic equipment and computer-readable medium
CN109493977B (en) * 2018-11-09 2020-07-31 天津新开心生活科技有限公司 Text data processing method and device, electronic equipment and computer readable medium
CN109545383A (en) * 2018-11-12 2019-03-29 北京懿医云科技有限公司 Actual clinical path mutation detection method and device, storage medium, electronic equipment
CN109524071B (en) * 2018-11-16 2021-07-27 郑州大学第一附属医院 Chinese electronic medical record text structured analysis-oriented labeling method
CN109524071A (en) * 2018-11-16 2019-03-26 郑州大学第附属医院 A kind of mask method towards the neutralizing analysis of Chinese electronic health record text structure
CN109785918B (en) * 2018-12-29 2021-10-01 南京海泰医疗信息系统有限公司 Data acquisition system and method applied to clinical scientific research
CN109785918A (en) * 2018-12-29 2019-05-21 南京海泰医疗信息系统有限公司 A kind of data collection system and method applied to clinical research
CN109817330A (en) * 2019-01-25 2019-05-28 华院数据技术(上海)有限公司 A kind of disease forecasting device
CN110020005B (en) * 2019-03-28 2021-03-26 云知声(上海)智能科技有限公司 Method for matching main complaints in medical records with symptoms in current medical history
CN110020005A (en) * 2019-03-28 2019-07-16 云知声(上海)智能科技有限公司 Symptom matching process in main suit and present illness history in a kind of case history
CN110097975A (en) * 2019-04-28 2019-08-06 湖南省蓝蜻蜓网络科技有限公司 A kind of nosocomial infection intelligent diagnosing method and system based on multi-model fusion
CN110289058A (en) * 2019-06-06 2019-09-27 北京市天元网络技术股份有限公司 A kind of electronic health record standardization matching process and device
CN110349639A (en) * 2019-07-12 2019-10-18 之江实验室 A kind of multicenter medical terms standardized system based on common therapy terminology bank
CN110362829A (en) * 2019-07-16 2019-10-22 北京百度网讯科技有限公司 Method for evaluating quality, device and the equipment of structured patient record data
CN110362829B (en) * 2019-07-16 2023-01-03 北京百度网讯科技有限公司 Quality evaluation method, device and equipment for structured medical record data
CN110442633A (en) * 2019-08-12 2019-11-12 南京医渡云医学技术有限公司 Structural data generation method and device, storage medium and electronic equipment
CN110534185A (en) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 Labeled data acquisition methods divide and examine method, apparatus, storage medium and equipment
CN114585443B (en) * 2019-10-31 2023-11-03 美国西门子医学诊断股份有限公司 Apparatus and method for training diagnostic analyzer model
CN114585443A (en) * 2019-10-31 2022-06-03 美国西门子医学诊断股份有限公司 Apparatus and method for training a diagnostic analyzer model
CN111159978B (en) * 2019-12-30 2023-07-21 北京爱医生智慧医疗科技有限公司 Character string replacement processing method and device
CN111159978A (en) * 2019-12-30 2020-05-15 北京爱医生智慧医疗科技有限公司 Method and device for replacing character strings
CN111274806B (en) * 2020-01-20 2020-11-06 医惠科技有限公司 Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN111274806A (en) * 2020-01-20 2020-06-12 医惠科技有限公司 Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN111291568B (en) * 2020-03-06 2023-03-31 西南交通大学 Automatic entity relationship labeling method applied to medical texts
CN111291568A (en) * 2020-03-06 2020-06-16 西南交通大学 Automatic entity relationship labeling method applied to medical texts
CN111382273B (en) * 2020-03-09 2023-04-14 广州智赢万世市场管理有限公司 Text classification method based on feature selection of attraction factors
CN111382273A (en) * 2020-03-09 2020-07-07 西安理工大学 Text classification method based on feature selection of attraction factors
CN111539194A (en) * 2020-03-24 2020-08-14 华东理工大学 Usability evaluation method of medical text structured algorithm
CN111444333B (en) * 2020-04-25 2023-08-11 上海健交科技服务有限责任公司 Coding mapping method for insurance medicine and clinical medicine
CN111444333A (en) * 2020-04-25 2020-07-24 上海健交科技服务有限责任公司 Insurance medicine and clinical medicine code mapping method
CN111681724A (en) * 2020-05-07 2020-09-18 浙江大学医学院附属第四医院(浙江省义乌医院、浙江大学医学院附属第四医院医共体) Electronic medical record key entity standardized identification method and identification system
CN111625646B (en) * 2020-05-22 2023-04-21 泰康保险集团股份有限公司 Method, device, electronic equipment and storage medium for processing insurance policy
CN111625646A (en) * 2020-05-22 2020-09-04 泰康保险集团股份有限公司 Method and device for processing insurance policy, electronic equipment and storage medium
CN111666414B (en) * 2020-06-12 2023-10-17 上海观安信息技术股份有限公司 Method for detecting cloud service by sensitive data and cloud service platform
CN111666414A (en) * 2020-06-12 2020-09-15 上海观安信息技术股份有限公司 Method for detecting cloud service by sensitive data and cloud service platform
CN111986750B (en) * 2020-07-27 2023-12-26 北京天健源达科技股份有限公司 Structural detection method for electronic medical record template
CN111986750A (en) * 2020-07-27 2020-11-24 北京天健源达科技股份有限公司 Electronic medical record template structured detection method
CN111986743A (en) * 2020-08-07 2020-11-24 上海神桥医药科技有限公司 Medical auditing method and application thereof
CN112101019A (en) * 2020-08-12 2020-12-18 南京航空航天大学 Requirement template conformance checking optimization method based on part-of-speech tagging and chunk analysis
CN112101030B (en) * 2020-08-24 2024-01-26 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for establishing term mapping model and realizing standard word mapping
CN112101030A (en) * 2020-08-24 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for establishing term mapping model and realizing standard word mapping
CN112133390B (en) * 2020-09-17 2024-03-22 吾征智能技术(北京)有限公司 Liver disease cognition system based on electronic medical record
CN112133390A (en) * 2020-09-17 2020-12-25 吾征智能技术(北京)有限公司 Liver disease cognitive system based on electronic medical record
CN112270186B (en) * 2020-11-04 2024-02-02 吾征智能技术(北京)有限公司 Mouth based on entropy model peppery text information matching system
CN112270186A (en) * 2020-11-04 2021-01-26 吾征智能技术(北京)有限公司 Hot text information matching system based on entropy model
CN112562807A (en) * 2020-12-11 2021-03-26 北京百度网讯科技有限公司 Medical data analysis method, apparatus, device, storage medium, and program product
CN112562807B (en) * 2020-12-11 2024-03-12 北京百度网讯科技有限公司 Medical data analysis method, apparatus, device, storage medium, and program product
CN112434756A (en) * 2020-12-15 2021-03-02 杭州依图医疗技术有限公司 Training method, processing method, device and storage medium of medical data
CN112507198A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method, apparatus, device, medium, and program for processing query text
CN112667813B (en) * 2020-12-30 2022-03-01 北京华宇元典信息服务有限公司 Method for identifying sensitive identity information of referee document
CN112667813A (en) * 2020-12-30 2021-04-16 北京华宇元典信息服务有限公司 Method for identifying sensitive identity information of referee document
CN112652393A (en) * 2020-12-31 2021-04-13 山东大学齐鲁医院 ERCP quality control method, system, storage medium and equipment based on deep learning
CN112687397B (en) * 2020-12-31 2023-05-09 四川大学华西医院 Rare disease knowledge base processing method and device and readable storage medium
CN112687397A (en) * 2020-12-31 2021-04-20 四川大学华西医院 Rare disease knowledge base processing method and device and readable storage medium
CN112860842A (en) * 2021-03-05 2021-05-28 联仁健康医疗大数据科技股份有限公司 Medical record labeling method and device and storage medium
CN113011183B (en) * 2021-03-23 2023-09-05 北京科东电力控制系统有限责任公司 Unstructured text data processing method and system in electric power regulation and control field
CN113011183A (en) * 2021-03-23 2021-06-22 北京科东电力控制系统有限责任公司 Unstructured text data processing method and system in electric power regulation and control field
CN113611411A (en) * 2021-10-09 2021-11-05 浙江大学 Body examination aid decision-making system based on false negative sample identification
CN113611411B (en) * 2021-10-09 2021-12-31 浙江大学 Body examination aid decision-making system based on false negative sample identification
CN115983233A (en) * 2023-01-04 2023-04-18 重庆邮电大学 Electronic medical record duplication rate estimation method based on data stream matching
CN116312915B (en) * 2023-05-19 2023-09-19 之江实验室 Method and system for standardized association of drug terms in electronic medical records
CN116312915A (en) * 2023-05-19 2023-06-23 之江实验室 Method and system for standardized association of drug terms in electronic medical records

Similar Documents

Publication Publication Date Title
CN106383853A (en) Realization method and system for electronic medical record post-structuring and auxiliary diagnosis
CN111192680B (en) Intelligent auxiliary diagnosis method based on deep learning and collective classification
Wei et al. Task-oriented dialogue system for automatic diagnosis
CN109460473B (en) Electronic medical record multi-label classification method based on symptom extraction and feature representation
CN110838368B (en) Active inquiry robot based on traditional Chinese medicine clinical knowledge map
CN105677873B (en) Intelligent text association clustering and collection processing method based on a domain knowledge model
CN109949938B (en) Method and device for standardizing medical non-standard names
CN110033859A (en) Method, system, program and storage medium for assessing the medical findings of a patient
CN106874643A (en) Method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment
CN111540468A (en) ICD automatic coding method and system for visualization of diagnosis reason
CN109522551A (en) Entity linking method, apparatus, storage medium and electronic equipment
CN108319605A (en) Structured processing method and system for medical examination data
CN109344250A (en) Rapid structuring method for single-disease diagnostic information based on medical insurance data
Wang et al. A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records
CN106407664B (en) Domain-adaptive device for a breath diagnosis system
Faulconer et al. An eight-step method for assessing diagnostic data quality in practice: chronic obstructive pulmonary disease as an exemplar.
CN109478419A (en) Automatic identification of significant finding codes in structured and narrative reports
CN110866121A (en) Knowledge graph construction method for power field
CN113688255A (en) Knowledge graph construction method based on Chinese electronic medical record
Banerjee et al. Automatic inference of BI-RADS final assessment categories from narrative mammography report findings
Wang et al. An answer recommendation algorithm for medical community question answering systems
CN112635013A (en) Medical image information processing method and device, electronic equipment and storage medium
Hou et al. Automatic report generation for chest X-ray images via adversarial reinforcement learning
Atef et al. AQAD: 17,000+ Arabic questions for machine comprehension of text
Chang et al. Data-driven analysis of radiologists’ behavior for diagnosing thyroid nodules

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170208
