CN108091372A - Medical field mapping verification method and device - Google Patents

Info

Publication number
CN108091372A
Authority
CN
China
Prior art keywords
field
referential
verified
similarity
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611027722.1A
Other languages
Chinese (zh)
Other versions
CN108091372B (en)
Inventor
郑号
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Medical Cross Cloud (beijing) Technology Co Ltd
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Medical Cross Cloud (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Medical Cross Cloud (beijing) Technology Co Ltd filed Critical Medical Cross Cloud (beijing) Technology Co Ltd
Priority to CN201611027722.1A priority Critical patent/CN108091372B/en
Publication of CN108091372A publication Critical patent/CN108091372A/en
Application granted granted Critical
Publication of CN108091372B publication Critical patent/CN108091372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The disclosure provides a medical field mapping verification method and device. The method includes: receiving a field to be verified, the field having a field name and field content; segmenting the field content of the field to be verified to obtain multiple first segmented words; representing each first segmented word as a first word vector; calculating the average of the first word vectors to obtain the center vector of the field to be verified; calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple reference fields; determining a target reference field from the multiple reference fields with reference to the similarity, where the target reference field is the reference field most similar to the field to be verified; and comparing the names of the field to be verified and the target reference field, and confirming from the comparison result whether the mapping between the field content and the field name of the field to be verified is correct. The disclosure can improve the accuracy of verification results.

Description

Medical field mapping verification method and device
Technical field
The present disclosure relates to the technical field of medical big data, and in particular to a medical field mapping verification method and a medical field mapping verification device.
Background
At present, medical activity generates a large amount of medical data. This data usually contains a large number of fields, which typically cover a patient's basic information, consultation information, diagnosis records, examination records, pathology records and so on. To manage this data effectively, the differing medical data of each hospital must be mapped onto a unified data platform, so that the field content and the field name of each field in the medical data correspond according to a defined mapping.
However, because the medical data of different hospitals differs in data format and data content, errors can occur while mapping the fields, so that field content is mapped to the wrong field name on the data platform. The medical field mapping therefore needs to be verified, that is, it must be determined whether each field name is consistent with its field content.
In the prior art, verification of medical field mappings mainly identifies similar fields by field length and field format, and then checks the correctness of the mapping result according to whether the field content and the field names of the similar fields are the same. However, many different texts show no marked difference in field length or field format, which makes it difficult to accurately detect an incorrect mapping between field content and field name, that is, a medical field mapping error. The accuracy of the verification results therefore needs to be improved.
It should be noted that the information disclosed in the Background section above is only intended to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the present disclosure is to provide a medical field mapping verification method and a medical field mapping verification device that overcome, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, a medical field mapping verification method is provided, including:
receiving a field to be verified, the field to be verified having a field name and field content;
segmenting the field content of the field to be verified to obtain multiple first segmented words;
representing each first segmented word as a first word vector;
calculating the average of the first word vectors to obtain the center vector of the field to be verified;
calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple reference fields;
determining a target reference field from the multiple reference fields with reference to the similarity, where the target reference field is the reference field most similar to the field to be verified; and
comparing the names of the field to be verified and the target reference field, and confirming, according to the comparison result, whether the mapping between the field content and the field name of the field to be verified is correct.
In an exemplary embodiment of the present disclosure, the method further includes a step of calculating the center vector of the reference field, including:
receiving the reference field, the reference field having a field name and field content;
segmenting the field content of the reference field to obtain multiple second segmented words;
representing each second segmented word as a second word vector; and
calculating the average of the second word vectors to obtain the center vector of the reference field.
In an exemplary embodiment of the present disclosure, representing each second segmented word as a second word vector includes:
segmenting a reference corpus that includes at least the multiple reference fields to obtain multiple third segmented words;
representing each third segmented word as a third word vector, and building a mapping between each third segmented word and its third word vector; and
looking up, in the mapping between the third segmented words and the third word vectors, the third word vector corresponding to the third segmented word that is identical to the second segmented word, and using it as the second word vector of the second segmented word.
In an exemplary embodiment of the present disclosure, representing each first segmented word as a first word vector includes:
looking up, in the mapping between the third segmented words and the third word vectors, the third word vector corresponding to the third segmented word that is identical to the first segmented word, and using it as the first word vector of the first segmented word.
In an exemplary embodiment of the present disclosure, determining the target reference field from the multiple reference fields with reference to the similarity includes:
selecting, from the multiple reference fields, a predetermined number of reference fields with the highest similarity as candidate reference fields;
calculating, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, where the field dispersion is the percentage accounted for by the most frequent word contained in a field;
calculating the weighted score of each candidate reference field according to the similarity, the field average length, the field dispersion and their respective weights; and
selecting the candidate reference field with the highest weighted score as the target reference field.
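The candidate re-ranking described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how the field average length and field dispersion enter the score, so the sketch assumes that each candidate is rewarded for being close to the field to be verified on those two features, and that the three weights are already available (the text obtains them from a predetermined model). The helper names `closeness`, `weighted_score` and `pick_target` are hypothetical.

```python
from collections import Counter


def field_dispersion(words):
    """Share (0..1) of the most frequent word among all words of a field."""
    if not words:
        return 0.0
    top_count = Counter(words).most_common(1)[0][1]
    return top_count / len(words)


def average_length(values):
    """Mean character length of a field's content values."""
    return sum(len(v) for v in values) / len(values) if values else 0.0


def closeness(a, b):
    """Hypothetical helper for non-negative inputs: 1.0 when equal, lower as they diverge."""
    largest = max(abs(a), abs(b))
    return 1.0 if largest == 0 else 1.0 - abs(a - b) / largest


def weighted_score(similarity, checked, candidate, weights):
    """Assumed scoring rule: weighted sum of similarity and feature closeness."""
    return (
        weights["similarity"] * similarity
        + weights["avg_len"] * closeness(checked["avg_len"], candidate["avg_len"])
        + weights["dispersion"] * closeness(checked["dispersion"], candidate["dispersion"])
    )


def pick_target(checked, candidates, weights):
    """candidates: list of (name, similarity, features); returns the best name."""
    best = max(candidates, key=lambda c: weighted_score(c[1], checked, c[2], weights))
    return best[0]
```

With this rule, a candidate whose cosine similarity is marginally higher can still lose to one whose length and dispersion profile matches the field to be verified more closely, which is the point of re-ranking the top candidates.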
In an exemplary embodiment of the present disclosure, the predetermined model is a decision tree model.
In an exemplary embodiment of the present disclosure, the similarity is cosine similarity.
According to another aspect of the present disclosure, a medical field mapping verification device is provided, including:
a receiving unit, configured to receive a field to be verified, the field to be verified having a field name and field content;
a segmentation unit, configured to segment the field content of the field to be verified to obtain multiple first segmented words;
a representation unit, configured to represent each first segmented word as a first word vector;
a first computing unit, configured to calculate the average of the first word vectors to obtain the center vector of the field to be verified;
a second computing unit, configured to calculate the similarity between the center vector of the field to be verified and the center vector of each of multiple reference fields;
a selection unit, configured to determine a target reference field from the multiple reference fields according to the similarity, where the target reference field is the reference field with the highest similarity to the field to be verified; and
a judging unit, configured to compare the names of the field to be verified and the target reference field, and to confirm, according to the comparison result, whether the mapping between the field content and the field name of the field to be verified is correct.
In an exemplary embodiment of the present disclosure, the second computing unit includes:
a receiving module, configured to receive the reference field, the reference field having a field name and field content;
a segmentation module, configured to segment the field content of the reference field to obtain multiple second segmented words;
a representation module, configured to represent each second segmented word as a second word vector; and
a computing module, configured to calculate the average of the second word vectors to obtain the center vector of the reference field.
In an exemplary embodiment of the present disclosure, the selection unit includes:
a selecting module, configured to select, from the multiple reference fields, a predetermined number of reference fields with the highest similarity as candidate reference fields;
a weight computation module, configured to calculate, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, where the field dispersion is the percentage accounted for by the most frequent word contained in a field;
a score calculating module, configured to calculate the weighted score of each candidate reference field according to the similarity, the field average length, the field dispersion and their respective weights; and
an evaluation module, configured to select the candidate reference field with the highest weighted score as the target reference field.
The medical field mapping verification method and device of the present disclosure first determine the center vector of the field to be verified, and then determine the target reference field most similar to the field to be verified by comparing the similarity between that center vector and the center vectors of multiple reference fields. The similarity between fields is thus judged by comparing the similarity of the vectors that represent the fields. Compared with judging similarity directly from field length and field format, a vector representing a field better reflects the characteristics of the field and allows similarity to be compared more accurately, which helps ensure that the target reference field is the reference field most similar to the field to be verified. Using the target reference field as the reference standard therefore improves the accuracy of confirming whether the mapping between the field content and the field name of the field to be verified is correct.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the disclosure. Obviously, the drawings described below show only some embodiments of the disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of the medical field mapping verification method of an example embodiment of the disclosure.
Fig. 2 schematically shows a flowchart of calculating the average of the first word vectors to obtain the center vector of the field to be verified, in an example embodiment of the disclosure.
Fig. 3 schematically shows a flowchart of representing each second segmented word as a second word vector, in an example embodiment of the disclosure.
Fig. 4 schematically shows a flowchart of determining the target reference field from the multiple reference fields with reference to the similarity, in an example embodiment of the disclosure.
Fig. 5 schematically shows a block diagram of the medical field mapping verification device of an example embodiment of the disclosure.
Fig. 6 schematically shows a block diagram of the second computing unit of an example embodiment of the disclosure.
Fig. 7 schematically shows a block diagram of the selection unit of an example embodiment of the disclosure.
Detailed description of embodiments
Example embodiments are now described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced while omitting one or more of the specific details, or that other methods, components, devices, steps and so on may be used. In other cases, well-known solutions are not shown or described in detail, so that they do not distract from and obscure aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, so repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a medical field mapping verification method for verifying the correctness of medical field mappings in medical data. As shown in Fig. 1, the medical field mapping verification method may include the following steps:
Step S110: receiving a field to be verified, the field to be verified having a field name and field content;
Step S120: segmenting the field content of the field to be verified to obtain multiple first segmented words;
Step S130: representing each first segmented word as a first word vector;
Step S140: calculating the average of the first word vectors to obtain the center vector of the field to be verified;
Step S150: calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple reference fields;
Step S160: determining a target reference field from the multiple reference fields with reference to the similarity, where the target reference field is the reference field most similar to the field to be verified; and
Step S170: comparing the names of the field to be verified and the target reference field, and confirming, according to the comparison result, whether the mapping between the field content and the field name of the field to be verified is correct.
The medical field mapping verification method of this example embodiment determines the target reference field most similar to the field to be verified by comparing the similarity between the center vector of the field to be verified and the center vectors of multiple reference fields. The similarity between fields is thus judged by comparing the similarity of the vectors that represent the fields. Compared with judging similarity directly from field length and field format, a vector representing a field better reflects the characteristics of the field and allows similarity to be compared more accurately, which helps ensure that the target reference field is the reference field most similar to the field to be verified. Using the target reference field as the reference standard therefore improves the accuracy of confirming whether the mapping between the field content and the field name of the field to be verified is correct.
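Steps S130 to S170 can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: it assumes the field content has already been segmented (step S120), that a word-to-vector mapping is available as a plain dictionary, that words missing from the mapping are skipped, and that names are compared by exact equality. The function names and toy data shapes are hypothetical.

```python
import math


def center_vector(words, word_vectors):
    """S130 + S140: look up each word's vector and average them element-wise."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError("no known words in field content")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_mapping(field_name, field_words, reference_fields, word_vectors):
    """S150-S170: return (mapping_is_correct, name_of_most_similar_reference_field).

    reference_fields maps a reference field name to its segmented content.
    """
    center = center_vector(field_words, word_vectors)
    best_name = max(
        reference_fields,
        key=lambda name: cosine(
            center, center_vector(reference_fields[name], word_vectors)
        ),
    )
    # Assumption: the mapping counts as correct when the names match exactly.
    return field_name == best_name, best_name
```

A field named "symptom" whose content consists of disease names would come back as `(False, "disease")`, flagging the mapping for review.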
Each step of the medical field mapping verification method in this example embodiment is described further below.
In step S110, a field to be verified is received, the field to be verified having a field name and field content.
In this example embodiment, the field to be verified can be chosen from the medical data of a hospital to be checked. The field can be a text field, a numeric field, a combination of the two, and so on. The field to be verified can contain a patient's basic information, consultation information, diagnosis records, medical expense records, examination records, pathology records, admission records and similar content.
For example, a hospital that needs to be checked can be selected as the hospital to be checked, and a disease field chosen from that hospital's medical data can be received as the field to be verified. Its field name is disease, and its field content may include heart disease, diabetes, pneumonia and so on.
In step S120, the field content of the field to be verified is segmented to obtain multiple first segmented words.
In this example embodiment, many segmentation algorithms can be used, for example: segmentation based on dictionary matching, segmentation based on word frequency statistics, or segmentation based on knowledge understanding. Any other method capable of segmenting the field to be verified can also be used; no limitation is imposed here. A Chinese word segmentation system can be used as the segmentation tool, and that system can be based on any of the above algorithms or on another segmentation algorithm. For example, the system can be the jieba word segmentation system or the NLPIR Chinese word segmentation system; the user can also use any other segmentation system that splits the field to be verified into multiple first segmented words, and no limitation is imposed here.
For example, after the field to be verified is received, the jieba word segmentation system can be used to split the character sequence of the disease field into individual segmented words, producing multiple first segmented words such as heart disease, diabetes and pneumonia.
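As an illustration of the dictionary-matching family of algorithms mentioned above, the following sketch implements greedy forward maximum matching in plain Python. It is not how jieba or NLPIR work internally; the function name, the tiny dictionary and the single-character fallback for unknown text are assumptions made for the example. The Chinese strings stand for heart disease (心脏病), diabetes (糖尿病) and pneumonia (肺炎).

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy segmentation: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


disease_dictionary = {"心脏病", "糖尿病", "肺炎"}
first_segmented_words = forward_max_match("心脏病糖尿病肺炎", disease_dictionary)
```

Real medical text needs a far larger dictionary and better handling of ambiguity, which is why an existing segmentation system is normally preferred.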
In step S130, each first segmented word is represented as a first word vector.
In this example embodiment, a language model such as a word2vec model or a neural network language model can be used as the representation tool. Alternatively, the first word vector corresponding to a first segmented word can be determined by querying a pre-established mapping between segmented words and word vectors, so that each first segmented word is represented as a first word vector; for details, see the method of representing each second segmented word as a second word vector described in steps S1531 to S1533 below. By representing each first segmented word as a first word vector, with the first word vectors in one-to-one correspondence with the first segmented words, the textual features are digitized, which allows similarity to be compared more accurately.
For example, a mapping between segmented words and word vectors can be established in advance, and first segmented words such as heart disease, diabetes and pneumonia are all contained in that mapping; that is, the mapping contains segmented words identical to heart disease, diabetes, pneumonia and so on. By querying the mapping with the first segmented words heart disease, diabetes, pneumonia and so on, the corresponding word vectors are determined and used as the first word vectors, so that the first segmented words are represented as first word vectors.
In step S140, the average of the first word vectors is calculated to obtain the center vector of the field to be verified.
For example, the average of all the first word vectors corresponding to the first segmented words heart disease, diabetes, pneumonia and so on can be calculated; the average of these vectors is the center vector of the disease field serving as the field to be verified. Compared with any single first word vector, the center vector reflects the characteristics of the field to be verified more comprehensively, which helps improve accuracy.
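The averaging in step S140 is an element-wise arithmetic mean. A minimal sketch, assuming the word vectors are plain lists of equal length (the function name is hypothetical):

```python
def center_vector(vectors):
    """Element-wise arithmetic mean of equally sized word vectors."""
    if not vectors:
        raise ValueError("at least one word vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all word vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Three two-dimensional toy vectors standing in for the first word vectors.
center = center_vector([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # [3.0, 4.0]
```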
In step S150, the similarity between the center vector of the field to be verified and the center vector of each of multiple reference fields is calculated.
In this example embodiment, the similarity can be cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, or any other measure that can be used to judge the degree of similarity between vectors.
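Two of the measures named above can be sketched in a few lines. The sketch assumes vectors are plain lists of equal length and returns 0.0 when a vector has zero norm, which is a convention chosen for the example. The Pearson correlation coefficient is computed here as the cosine similarity of the mean-centered vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson coefficient: cosine similarity after subtracting each vector's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )
```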
In this example embodiment, as shown in Fig. 2, the step of calculating the center vector of the referential field can include Step S151 to step S154.
In step S151, referential field is received, the referential field has field name and including field contents.
In this example embodiment, since referential field is the standard that the field to be verified is treated in verification, can be first The preferable hospital of a quality of data is selected as with reference to hospital, in the medical data of the reference hospital, field name and field All correct or accuracy is very high for the mapping relations of content, so as to be conducive to improve the accuracy of subsequent check result.
For example, the preferable hospital of selecting data quality is used as with reference to hospital, from the medical data of the reference hospital It is middle to select the fields such as disease, symptom and drug as referential field.The field name of the disease field be disease, field contents The content of non-disease title is not included including heart disease, diabetes, pneumonia etc.;The field name of symptom field is symptom and does not wrap Content containing non-symptom, field contents include uncomfortable in chest, shortness of breath etc.;The field name of medication field be drug, field contents bag Content containing aspirin, Amoxicillin etc. and not comprising non-drug.
In step S152, the field contents of the referential field are segmented to obtain multiple second segmenting words.
The Words partition system that the referential field can be used employed in step S120 segments, but can also be used other Words partition system is segmented, that is to say, that stammerer Words partition system, NLPIR Chinese word segmentation systems or other Words partition systems can be used.
For example, after the disease field for electing referential field as, symptom field and medication field etc. is received, can be used Stammerer Words partition system opens the words sequence cutting of above three field, generates heart disease, diabetes, pneumonia, uncomfortable in chest, gas Multiple second segmenting words such as short, aspirin and Amoxicillin.
In step S153, each second segmenting word is characterized as the second term vector respectively.
In this example embodiment, the language models conduct such as word2vec models or neutral net language model can be used Each second segmenting word is characterized as the second term vector by characterization tool respectively.
In this example embodiment, as shown in figure 3, it is described by each second segmenting word be characterized as respectively the second word to Amount may include step S1531- steps S1533.
In step S1531, to include at least the multiple referential field segmented to obtain with reference to corpus it is multiple 3rd segmenting word.
In this example embodiment, the principle of participle can refer to the participle that check field is treated in step S120, can adopt It is segmented with stammerer Words partition system or other Words partition systems to described with reference to corpus, generates the multiple 3rd segmenting word. While the reference corpus includes each referential field, other words in the medical data with reference to hospital are may also include Section.Preferably using whole medical datas with reference to hospital as described with reference to corpus.
For example, may be selected to be used as with reference to corpus with reference to the medical data base of hospital, wherein comprising disease, symptom, The fields such as drug segment this with reference to corpus using stammerer Words partition system, obtain heart disease, diabetes, pneumonia, chest Multiple 3rd segmenting words such as bored, shortness of breath, powerless, aspirin, Amoxicillin.
In step S1532, each 3rd segmenting word is characterized as the 3rd term vector respectively, and builds each described 3rd Mapping relations between segmenting word and the 3rd term vector.Wherein, word2vec models or neutral net language mould can be used The language models such as type are as characterization tool.
For example, using word2vec models, by heart disease, diabetes, pneumonia, uncomfortable in chest, shortness of breath, powerless, Ah Si Multiple 3rd segmenting words such as woods, Amoxicillin are characterized as term vector so as to generate multiple 3rd term vectors respectively, wherein, heart 3rd segmenting word such as disease, diabetes, pneumonia, uncomfortable in chest, shortness of breath, powerless, aspirin, Amoxicillin has the corresponding 3rd Term vector, so as to build the 3rd cutting such as heart disease, diabetes, pneumonia, uncomfortable in chest, shortness of breath, powerless, aspirin, Amoxicillin The relation of reflecting of word and each 3rd term vector is penetrated.
In step S1533, in the mapping relations between the 3rd segmenting word and the 3rd term vector search with it is described Corresponding 3rd term vector of identical the 3rd segmenting word of second segmenting word as second segmenting word described Two term vectors.
For example, when need using as the heart disease of the second segmenting word, diabetes, pneumonia, uncomfortable in chest, shortness of breath, powerless, Ah Take charge of a woods, Amoxicillin when segmenting words are characterized as the second term vector, can be searched in the 3rd segmenting word heart disease, diabetes, The segmenting words such as pneumonia, uncomfortable in chest, shortness of breath, powerless, aspirin, Amoxicillin;Then according to above-mentioned mapping relations, inquire about wherein Corresponding 3rd term vector of the segmenting words such as heart disease, diabetes, pneumonia is obtained, the 3rd term vector is as the second segmenting word The segmenting words such as heart disease, diabetes, pneumonia, uncomfortable in chest, shortness of breath, powerless, aspirin, Amoxicillin the second term vector.From And when the second segmenting word is characterized as the second term vector, it avoids, using special language model, thereby simplifying characterization process.
It should be noted that based on the above-mentioned method that each second segmenting word is characterized as the second term vector respectively, In this example embodiment, each first segmenting word is characterized as the first term vector by described in step 130 respectively may include:
It is searched in mapping relations between the 3rd segmenting word and the 3rd term vector identical with first segmenting word The 3rd segmenting word, and using the 3rd term vector corresponding with the 3rd segmenting word as the first term vector of first segmenting word. Wherein, the mapping relations between the 3rd segmenting word and the 3rd term vector be the above-mentioned segmenting word pre-established and word to The mapping relations of amount.So as to simplified characterization process.Above-mentioned steps S1533 is can refer to, details are not described herein.
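The same table lookup serves both the first and the second segmenting words. The sketch below assumes a small hand-written table and an illustrative function name `lookup_vectors`. Skipping words that are missing from the table is an assumption, since the text does not say how such words are handled.

```python
def lookup_vectors(words, vector_table):
    """Return the pre-built term vector for every word found in the table.

    Assumption: words absent from the table are skipped; the source text
    does not describe how out-of-vocabulary words are treated.
    """
    return [vector_table[w] for w in words if w in vector_table]

# Illustrative two-dimensional vectors, not real model output.
vector_table = {
    "heart disease": [0.9, 0.1],
    "diabetes": [0.8, 0.2],
    "aspirin": [0.1, 0.9],
}

# Second segmenting words (from a referential field).
second_vectors = lookup_vectors(["heart disease", "diabetes"], vector_table)
# First segmenting words (from the field to be verified); "gastritis" is
# not in the table and is therefore skipped.
first_vectors = lookup_vectors(["diabetes", "gastritis"], vector_table)
```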
In step S154, the average of the second term vectors is calculated to obtain the center vector of the referential field.
For example, the average of all second term vectors corresponding to the second segmenting words of the disease field, such as heart disease, diabetes and pneumonia, can be calculated; this average is the center vector of the disease field among the referential fields. The center vectors of other referential fields, such as the symptom field and the medication field, can be obtained in the same manner. A center vector reflects the features of a referential field more comprehensively, which helps to improve accuracy.
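As a minimal sketch of the averaging in step S154, the center vector is the element-wise mean of the term vectors. The function name and the sample vectors are illustrative assumptions.

```python
def center_vector(vectors):
    """Element-wise mean of equally sized term vectors."""
    if not vectors:
        raise ValueError("at least one term vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Illustrative second term vectors for three words of a disease field.
disease_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
disease_center = center_vector(disease_vectors)
```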
To summarize the description of step S150, steps S151 to S154 and steps S1531 to S1533 with an example: the angle between the center vector of the disease field serving as the field to be verified and the center vector of the disease field serving as a referential field can be calculated, and the cosine of that angle is the cosine similarity of the two; for instance, when the angle is 0°, the cosine similarity is 1. Similarly, the cosine similarity between the center vector of the symptom field serving as a referential field and the center vector of the disease field serving as the field to be verified can be calculated, as can the cosine similarity between the center vector of other referential fields, such as the medication field, and the center vector of the disease field serving as the field to be verified.
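The cosine similarity between two center vectors can be sketched as follows. The vectors and field names are made-up illustrations; raising an error for a zero vector is a design choice of this sketch, not something the text prescribes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Center vector of the field to be verified (illustrative values).
to_verify_center = [0.85, 0.15]
# Center vectors of the referential fields (illustrative values).
reference_centers = {
    "disease": [0.9, 0.1],
    "symptom": [0.5, 0.5],
    "medication": [0.1, 0.9],
}
similarities = {
    name: cosine_similarity(to_verify_center, center)
    for name, center in reference_centers.items()
}
```

Because only the angle matters, vectors that point in the same direction score 1 regardless of their lengths.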
In step S160, a target referential field is determined from the multiple referential fields according to the similarity, where the target referential field is the referential field with the highest degree of similarity to the field to be verified.
In this example embodiment, the referential fields can be sorted directly by cosine similarity, and the referential field with the highest cosine similarity is chosen as the target referential field.
In this example embodiment, as shown in FIG. 4, determining the target referential field from the multiple referential fields according to the similarity may include steps S161 to S164.
In step S161, a predetermined quantity of referential fields with the highest similarity is chosen from the multiple referential fields as candidate referential fields.
In this example embodiment, the similarity may be cosine similarity. The multiple referential fields can be sorted by cosine similarity, the predetermined quantity of referential fields with the highest cosine similarity selected, and these used as candidate referential fields. The predetermined quantity may be a positive integer not less than 2, such as 3, 4 or 5, and may be set by the user. In addition, to avoid selecting referential fields with low similarity, the predetermined quantity should not be too large.
For example, fields such as the disease field, symptom field and medication field among the referential fields can be sorted by cosine similarity. If the disease field, symptom field and medication field among the referential fields are the three with the highest similarity to the disease field serving as the field to be verified, they can be used as candidate referential fields.
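Step S161 amounts to a top-k selection over the similarity values. In this sketch the function name, the similarity numbers and the extra "surgery" field are illustrative assumptions.

```python
def top_candidates(similarities, k=3):
    """Return the names of the k referential fields with the highest similarity."""
    if k < 2:
        raise ValueError("the predetermined quantity should be at least 2")
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Illustrative cosine similarities to the field to be verified.
sims = {"disease": 0.97, "symptom": 0.62, "medication": 0.35, "surgery": 0.10}
candidates = top_candidates(sims, k=3)
```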
In step S162, based on the reference corpus, the weight of the similarity, the weight of the field average length and the weight of the field dispersion are calculated according to a predetermined model, where the field dispersion is the percentage accounted for by the highest-frequency word contained in a field.
In this example embodiment, the field average length may be the average length of the referential fields in the reference corpus, and the field dispersion is the percentage accounted for by the highest-frequency word in each referential field. The weight of the similarity, the weight of the field average length and the weight of the field dispersion indicate how important the similarity, the field average length and the field dispersion are when judging the degree of similarity between fields. The predetermined model may be a decision tree model or another model usable for the calculation.
For example, based on the reference corpus of the above reference hospital, the decision tree model yields a weight of 1 for the cosine similarity, 0.8 for the field average length and 0.5 for the field dispersion.
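The two auxiliary features can be sketched as below. Both definitions are assumptions made for illustration: the average length is taken as the mean character count of a field's content entries, and the dispersion as the share of all word occurrences taken by the single most frequent word, expressed in percent. The source text does not fix either detail more precisely.

```python
from collections import Counter

def field_average_length(values):
    """Mean character length of a field's content entries (assumed definition)."""
    if not values:
        raise ValueError("at least one content entry is required")
    return sum(len(v) for v in values) / len(values)

def field_dispersion(words):
    """Percentage of word occurrences taken by the most frequent word."""
    if not words:
        raise ValueError("at least one word is required")
    most_common_count = Counter(words).most_common(1)[0][1]
    return 100.0 * most_common_count / len(words)

# Illustrative content of a disease field after segmentation.
disease_words = ["diabetes", "diabetes", "pneumonia", "heart disease"]
dispersion = field_dispersion(disease_words)
average_length = field_average_length(["ab", "abcd"])
```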
In step S163, the weighted score of each candidate referential field is calculated from the similarity, the field average length, the field dispersion and their respective weights.
For example, the weighted scores of fields such as the disease field, symptom field and medication field among the referential fields can each be calculated according to a formula.
The formula is: S = W1 × 1.5 + W2 × 0.8 + W3 × 0.5, where S is the weighted score, W1 is the cosine similarity, W2 is the field average length and W3 is the field dispersion.
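The formula translates directly into code. This sketch uses the coefficients exactly as they appear in the formula as defaults and leaves them overridable; the function name is illustrative, and the text does not say whether W2 and W3 are normalized before weighting, so no normalization is applied here.

```python
def weighted_score(similarity, average_length, dispersion,
                   weights=(1.5, 0.8, 0.5)):
    """S = W1 * w_sim + W2 * w_len + W3 * w_disp, with the default
    coefficients taken from the formula in the text."""
    w_sim, w_len, w_disp = weights
    return similarity * w_sim + average_length * w_len + dispersion * w_disp

# Illustrative feature values for one candidate referential field.
score = weighted_score(similarity=1.0, average_length=1.0, dispersion=1.0)
```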
In step S164, the candidate referential field with the highest weighted score is chosen as the target referential field.
For example, the weighted scores of the disease field, symptom field and medication field serving as candidate referential fields can be compared, with the result: weighted score S1 of the disease field > weighted score S2 of the symptom field > weighted score S3 of the medication field. This indicates that the disease field among the candidate referential fields has the highest degree of similarity to the disease field to be verified. The disease field among the candidate referential fields can then be used as the target referential field.
In summary, the degree of similarity between a referential field and the field to be verified can be measured from three angles, namely similarity, field average length and field dispersion. This helps to select the referential field with the highest degree of similarity as the target referential field and thus to further improve the accuracy of the verification result.
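Choosing the target referential field in step S164 is a maximum over the weighted scores. The scores below are invented so that S1 > S2 > S3, and the function name is an illustrative assumption.

```python
def select_target(weighted_scores):
    """Return the candidate referential field with the highest weighted score."""
    if not weighted_scores:
        raise ValueError("no candidate referential fields were supplied")
    return max(weighted_scores, key=weighted_scores.get)

# Illustrative weighted scores: S1 (disease) > S2 (symptom) > S3 (medication).
scores = {"disease": 3.2, "symptom": 2.7, "medication": 1.9}
target_field = select_target(scores)
```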
In step S170, the name of the field to be verified is compared with the name of the target referential field, and whether the mapping relation between the field contents and the field name of the field to be verified is correct is confirmed according to the comparison result.
In this example embodiment, the mapping relation between the field name and the field contents of the target referential field has a high accuracy rate and can therefore serve as the reference standard, and the target referential field has the highest degree of similarity to the field to be verified. Therefore, when the field name of the field to be verified is identical to the field name of the target referential field, that is, when the comparison result is "identical", the mapping relation between the field contents and the field name of the field to be verified can be determined to be correct.
Conversely, when the field name of the field to be verified differs from the field name of the target referential field, that is, when the comparison result is "different", the mapping relation between the field contents and the field name of the field to be verified can be determined to be incorrect.
For example, the field name of the disease field serving as the target referential field is "disease", and the field name of the disease field serving as the field to be verified is also "disease", so the two field names are identical. Therefore, the mapping relation between the field contents and the field name of the disease field to be verified is correct.
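The final check in step S170 reduces to a name comparison. This sketch assumes an exact string match after trimming surrounding whitespace; the trimming and the function name are assumptions, since the text only requires that the names be identical.

```python
def mapping_is_correct(name_to_verify, target_name):
    """True when the field name to be verified equals the target
    referential field's name (whitespace-trimmed, assumed)."""
    return name_to_verify.strip() == target_name.strip()

correct = mapping_is_correct("disease", "disease")
incorrect = mapping_is_correct("disease", "symptom")
```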
In addition, although the steps of the method of the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
The following are device embodiments of the present invention, which can be used to perform the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments of the present invention.
This example embodiment also provides a medical field mapping verification device. As shown in FIG. 5, the medical field mapping verification device may include a receiving unit 1, a word segmentation unit 2, a characterization unit 3, a first computing unit 4, a second computing unit 5, a selection unit 6 and a judging unit 7, where:
The receiving unit 1 may be used to receive a field to be verified, the field to be verified having a field name and including field contents.
The word segmentation unit 2 may be used to segment the field contents of the field to be verified to obtain multiple first segmenting words.
The characterization unit 3 may be used to characterize each first segmenting word as a first term vector.
The first computing unit 4 may be used to calculate the average of the first term vectors to obtain the center vector of the field to be verified.
The second computing unit 5 may be used to calculate the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields.
The selection unit 6 may be used to determine a target referential field from the multiple referential fields according to the similarity, where the target referential field is the referential field with the highest similarity to the field to be verified.
The judging unit 7 may be used to compare the name of the field to be verified with the name of the target referential field, and to confirm, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
In this example embodiment, when the field name of the field to be verified is identical to the field name of the target referential field, the mapping relation between the field contents and the field name of the field to be verified can be determined to be correct; when the field name of the field to be verified differs from the field name of the target referential field, the mapping relation can be determined to be incorrect. This completes the verification of the mapping relation between the field contents and the field name of the field to be verified.
In this example embodiment, the judging unit 7 may also be used to output the result of confirming whether the mapping relation between the field contents and the field name of the field to be verified is correct.
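How units 1 to 7 fit together can be sketched as a single function. Everything here is an assumption for illustration: comma splitting stands in for a real Chinese word segmenter, the vector table is hand-written, the reference center vectors are given directly, and the direct highest-similarity selection is used instead of the weighted-score variant.

```python
import math

def segment(text):
    """Unit 2 stand-in (assumption): split on commas instead of running
    a real word segmenter."""
    return [w.strip() for w in text.split(",") if w.strip()]

def center(vectors):
    """Unit 4: element-wise mean of the term vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Unit 5 helper: cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def verify(field_name, field_content, reference_centers, table):
    """Unit 1 receives the arguments; the result is True when the mapping
    between field name and field contents is judged correct."""
    words = segment(field_content)                        # unit 2
    vectors = [table[w] for w in words if w in table]     # unit 3
    if not vectors:
        raise ValueError("no known words in the field contents")
    c = center(vectors)                                   # unit 4
    sims = {n: cosine(c, rc) for n, rc in reference_centers.items()}  # unit 5
    target = max(sims, key=sims.get)                      # unit 6
    return field_name == target                           # unit 7

table = {"heart disease": [0.9, 0.1], "diabetes": [0.8, 0.2], "aspirin": [0.1, 0.9]}
reference_centers = {"disease": [0.85, 0.15], "medication": [0.1, 0.9]}
ok = verify("disease", "heart disease, diabetes", reference_centers, table)
bad = verify("medication", "heart disease, diabetes", reference_centers, table)
```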
In this example embodiment, as shown in FIG. 6, the second computing unit 5 may include a receiving module 51, a word segmentation module 52, a characterization module 53 and a computing module 54, where:
The receiving module 51 may be used to receive the referential field, the referential field having a field name and including field contents;
The word segmentation module 52 may be used to segment the field contents of the referential field to obtain multiple second segmenting words;
The characterization module 53 may be used to characterize each second segmenting word as a second term vector;
The computing module 54 may be used to calculate the average of the second term vectors to obtain the center vector of the referential field.
In this example embodiment, as shown in FIG. 7, the selection unit 6 may include a selecting module 61, a weight calculation module 62, a score calculation module 63 and an evaluation module 64, where:
The selecting module 61 may be used to choose, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
The weight calculation module 62 may be used to calculate, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, where the field dispersion is the percentage accounted for by the highest-frequency word contained in a field;
The score calculation module 63 may be used to calculate the weighted score of each candidate referential field from the similarity, the field average length, the field dispersion and their respective weights;
The evaluation module 64 may be used to choose the candidate referential field with the highest weighted score as the target referential field.
The details of each module of the above medical field mapping verification device have already been described in detail in the corresponding medical field mapping verification method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, mobile terminal or network device) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the appended claims.

Claims (10)

1. A medical field mapping verification method, characterized by comprising:
receiving a field to be verified, the field to be verified having a field name and including field contents;
segmenting the field contents of the field to be verified to obtain multiple first segmenting words;
characterizing each first segmenting word as a first term vector;
calculating the average of the first term vectors to obtain the center vector of the field to be verified;
calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields;
determining a target referential field from the multiple referential fields according to the similarity, wherein the target referential field is the referential field with the highest degree of similarity to the field to be verified;
comparing the name of the field to be verified with the name of the target referential field, and confirming, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
2. The medical field mapping verification method according to claim 1, characterized by further comprising a step of calculating the center vector of the referential field, including:
receiving the referential field, the referential field having a field name and including field contents;
segmenting the field contents of the referential field to obtain multiple second segmenting words;
characterizing each second segmenting word as a second term vector;
calculating the average of the second term vectors to obtain the center vector of the referential field.
3. The medical field mapping verification method according to claim 2, characterized in that characterizing each second segmenting word as a second term vector includes:
segmenting a reference corpus that includes at least the multiple referential fields to obtain multiple third segmenting words;
characterizing each third segmenting word as a third term vector, and building a mapping relation between each third segmenting word and its third term vector;
looking up, in the mapping relation between third segmenting words and third term vectors, the third term vector corresponding to the third segmenting word identical to the second segmenting word, and using it as the second term vector of the second segmenting word.
4. The medical field mapping verification method according to claim 3, characterized in that characterizing each first segmenting word as a first term vector includes:
looking up, in the mapping relation between third segmenting words and third term vectors, the third term vector corresponding to the third segmenting word identical to the first segmenting word, and using it as the first term vector of the first segmenting word.
5. The medical field mapping verification method according to claim 4, characterized in that determining the target referential field from the multiple referential fields according to the similarity includes:
choosing, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
calculating, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, wherein the field dispersion is the percentage accounted for by the highest-frequency word contained in a field;
calculating the weighted score of each candidate referential field from the similarity, the field average length, the field dispersion and their respective weights;
choosing the candidate referential field with the highest weighted score as the target referential field.
6. The medical field mapping verification method according to claim 5, characterized in that the predetermined model is a decision tree model.
7. The medical field mapping verification method according to claim 1, characterized in that the similarity is cosine similarity.
8. A medical field mapping verification device, characterized by comprising:
a receiving unit, for receiving a field to be verified, the field to be verified having a field name and including field contents;
a word segmentation unit, for segmenting the field contents of the field to be verified to obtain multiple first segmenting words;
a characterization unit, for characterizing each first segmenting word as a first term vector;
a first computing unit, for calculating the average of the first term vectors to obtain the center vector of the field to be verified;
a second computing unit, for calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields;
a selection unit, for determining a target referential field from the multiple referential fields according to the similarity, wherein the target referential field is the referential field with the highest degree of similarity to the field to be verified;
a judging unit, for comparing the name of the field to be verified with the name of the target referential field, and confirming, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
9. The medical field mapping verification device according to claim 8, characterized in that the second computing unit includes:
a receiving module, for receiving the referential field, the referential field having a field name and including field contents;
a word segmentation module, for segmenting the field contents of the referential field to obtain multiple second segmenting words;
a characterization module, for characterizing each second segmenting word as a second term vector;
a computing module, for calculating the average of the second term vectors to obtain the center vector of the referential field.
10. The medical field mapping verification device according to claim 8, characterized in that the selection unit includes:
a selecting module, for choosing, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
a weight calculation module, for calculating, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, wherein the field dispersion is the percentage accounted for by the highest-frequency word contained in a field;
a score calculation module, for calculating the weighted score of each candidate referential field from the similarity, the field average length, the field dispersion and their respective weights;
an evaluation module, for choosing the candidate referential field with the highest weighted score as the target referential field.
CN201611027722.1A 2016-11-21 2016-11-21 Medical field mapping verification method and device Active CN108091372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611027722.1A CN108091372B (en) 2016-11-21 2016-11-21 Medical field mapping verification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611027722.1A CN108091372B (en) 2016-11-21 2016-11-21 Medical field mapping verification method and device

Publications (2)

Publication Number Publication Date
CN108091372A true CN108091372A (en) 2018-05-29
CN108091372B CN108091372B (en) 2021-06-18

Family

ID=62169614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611027722.1A Active CN108091372B (en) 2016-11-21 2016-11-21 Medical field mapping verification method and device

Country Status (1)

Country Link
CN (1) CN108091372B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178158A1 (en) * 1999-12-21 2002-11-28 Yuji Kanno Vector index preparing method, similar vector searching method, and apparatuses for the methods
US20070136288A1 (en) * 1998-12-24 2007-06-14 Atsuo Shimada Document processor, document classification device, document processing method, document classification method, and computer-readable recording medium for recording programs for executing the methods on a computer
CN102043813A (en) * 2009-10-13 2011-05-04 北京大学 Medical information treatment server and medical information treatment method
CN102831193A (en) * 2012-08-03 2012-12-19 人民搜索网络股份公司 Topic detecting device and topic detecting method based on distributed multistage cluster
CN104156415A (en) * 2014-07-31 2014-11-19 沈阳锐易特软件技术有限公司 Mapping processing system and method for solving problem of standard code control of medical data

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210326995A1 (en) * 2019-01-23 2021-10-21 Ping An Technology (Shenzhen) Co., Ltd. Claim settlement anti-fraud method, apparatus, device, and storage medium based on graph computation technology
CN109871382A (en) * 2019-02-13 2019-06-11 北京明略软件系统有限公司 A kind of implementation method and device of tables of data access java standard library
CN109902083A (en) * 2019-02-26 2019-06-18 北京明略软件系统有限公司 Method, apparatus, computer storage medium and the terminal of a kind of pair of mark processing
CN110309504A (en) * 2019-05-23 2019-10-08 平安科技(深圳)有限公司 Text handling method, device, equipment and storage medium based on participle
CN110309504B (en) * 2019-05-23 2023-10-31 平安科技(深圳)有限公司 Text processing method, device, equipment and storage medium based on word segmentation
CN110457704A (en) * 2019-08-12 2019-11-15 北京明略软件系统有限公司 Determination method, apparatus, storage medium and the electronic device of aiming field
CN110532267A (en) * 2019-08-28 2019-12-03 北京明略软件系统有限公司 Determination method, apparatus, storage medium and the electronic device of field
CN110795482A (en) * 2019-10-16 2020-02-14 浙江大华技术股份有限公司 Data benchmarking method, device and storage device
CN111104481A (en) * 2019-12-17 2020-05-05 东软集团股份有限公司 Method, device and equipment for identifying matching field
CN111104481B (en) * 2019-12-17 2023-10-10 东软集团股份有限公司 Method, device and equipment for identifying matching field
CN111125311A (en) * 2019-12-24 2020-05-08 医渡云(北京)技术有限公司 Method and device for checking information normalization processing, storage medium and electronic equipment
CN111241086A (en) * 2020-01-17 2020-06-05 甘肃省卫生健康统计信息中心(西北人口信息中心) Data quality improvement method and system based on medical big data
CN111737533A (en) * 2020-06-19 2020-10-02 东软集团股份有限公司 Processing method and device for inspection items, storage medium and equipment
CN111737533B (en) * 2020-06-19 2024-02-09 东软集团股份有限公司 Method, device, storage medium and equipment for processing inspection items

Also Published As

Publication number Publication date
CN108091372B (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant