CN108091372A - Medical field mapping verification method and device
- Publication number
- CN108091372A (application number CN201611027722.1A)
- Authority
- CN
- China
- Prior art keywords
- field
- referential
- verified
- similarity
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The disclosure provides a medical field mapping verification method and device. The method includes: receiving a field to be verified, the field to be verified having a field name and field contents; segmenting the field contents of the field to be verified to obtain a plurality of first segmented words; representing each first segmented word as a first word vector; averaging the first word vectors to obtain the center vector of the field to be verified; calculating the similarity between the center vector of the field to be verified and the center vector of each of a plurality of reference fields; determining a target reference field from the plurality of reference fields according to the similarity, the target reference field being the reference field most similar to the field to be verified; and comparing the names of the field to be verified and the target reference field, and confirming from the comparison result whether the mapping between the field contents and the field name of the field to be verified is correct. The disclosure can improve the accuracy of the verification result.
Description
Technical field
The present disclosure relates to the technical field of medical big data, and in particular to a medical field mapping verification method and a medical field mapping verification device.
Background
At present, medical activities generate large amounts of medical data. These data usually contain many fields, which generally cover a patient's basic information, visit information, diagnosis records, examination records, pathology records and so on. To manage these data effectively, the differing medical data of each hospital must be mapped onto a unified data platform, so that the field contents and field name of each field in the medical data correspond according to a certain mapping relationship.
However, because the medical data of different hospitals differ in data format and data content, errors may occur when fields are mapped, so that field contents are mapped to the wrong field name on the data platform. The medical field mapping therefore needs to be verified, that is, it must be judged whether the field name is consistent with the field contents.
In the prior art, verification of medical field mapping mainly identifies similar fields by field length and field format, and then checks the correctness of the mapping result according to whether the field contents and field names of the similar fields are identical. However, many different texts show no marked difference in field length or field format, which makes it difficult to accurately detect an incorrect mapping between field contents and field name, i.e. a medical field mapping error. The accuracy of the verification result therefore has room for improvement.
It should be noted that the information disclosed in the background section above is only intended to aid understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
The purpose of the disclosure is to provide a medical field mapping verification method and a medical field mapping verification device, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a medical field mapping verification method is provided, including:
Receiving a field to be verified, the field to be verified having a field name and field contents;
Segmenting the field contents of the field to be verified to obtain a plurality of first segmented words;
Representing each first segmented word as a first word vector;
Averaging the first word vectors to obtain the center vector of the field to be verified;
Calculating the similarity between the center vector of the field to be verified and the center vector of each of a plurality of reference fields;
Determining a target reference field from the plurality of reference fields according to the similarity, wherein the target reference field is the reference field most similar to the field to be verified;
Comparing the names of the field to be verified and the target reference field, and confirming from the comparison result whether the mapping between the field contents and the field name of the field to be verified is correct.
In an exemplary embodiment of the disclosure, the method further includes a step of calculating the center vector of the reference field, including:
Receiving the reference field, the reference field having a field name and field contents;
Segmenting the field contents of the reference field to obtain a plurality of second segmented words;
Representing each second segmented word as a second word vector;
Averaging the second word vectors to obtain the center vector of the reference field.
In an exemplary embodiment of the disclosure, representing each second segmented word as a second word vector includes:
Segmenting a reference corpus that includes at least the plurality of reference fields to obtain a plurality of third segmented words;
Representing each third segmented word as a third word vector, and building a mapping between each third segmented word and its third word vector;
Looking up, in the mapping between the third segmented words and the third word vectors, the third word vector corresponding to the third segmented word identical to the second segmented word, and using it as the second word vector of that second segmented word.
In an exemplary embodiment of the disclosure, representing each first segmented word as a first word vector includes:
Looking up, in the mapping between the third segmented words and the third word vectors, the third word vector corresponding to the third segmented word identical to the first segmented word, and using it as the first word vector of that first segmented word.
In an exemplary embodiment of the disclosure, determining the target reference field from the plurality of reference fields according to the similarity includes:
Selecting, from the plurality of reference fields, a predetermined number of reference fields with the highest similarity as candidate reference fields;
Calculating, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, where the field dispersion is the percentage of the field accounted for by its most frequent word;
Calculating the weight score of each candidate reference field from the similarity, the field average length, the field dispersion and their respective weights;
Selecting the candidate reference field with the highest weight score as the target reference field.
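The candidate scoring just described might be sketched as follows. This is only an illustration under stated assumptions: the weights are passed in as plain numbers (the disclosure derives them from a predetermined model, which is not reproduced here), the score is assumed to be a simple weighted sum, the field average length is assumed to be the mean character length of the field's values, and all function names are invented for this example.

```python
from collections import Counter

def field_dispersion(words):
    """Share (0..1) of the field's words taken by its most frequent word."""
    counts = Counter(words)
    return counts.most_common(1)[0][1] / len(words)

def average_length(values):
    """Mean character length of the field's values (assumed definition)."""
    return sum(len(v) for v in values) / len(values)

def weight_score(similarity, avg_len, dispersion, weights):
    """Weighted sum of the three features (assumed combination rule)."""
    w_sim, w_len, w_disp = weights
    return w_sim * similarity + w_len * avg_len + w_disp * dispersion

def pick_target(candidates, weights):
    """candidates maps a field name to (similarity, avg_len, dispersion)."""
    return max(candidates, key=lambda name: weight_score(*candidates[name], weights))
```

In practice the three features would probably need to be brought onto comparable scales before weighting; the disclosure does not say how, so the sketch leaves them as given.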
In an exemplary embodiment of the disclosure, the predetermined model is a decision tree model.
In an exemplary embodiment of the disclosure, the similarity is cosine similarity.
According to another aspect of the disclosure, a medical field mapping verification device is provided, including:
A receiving unit, for receiving a field to be verified, the field to be verified having a field name and field contents;
A segmentation unit, for segmenting the field contents of the field to be verified to obtain a plurality of first segmented words;
A representation unit, for representing each first segmented word as a first word vector;
A first computing unit, for averaging the first word vectors to obtain the center vector of the field to be verified;
A second computing unit, for calculating the similarity between the center vector of the field to be verified and the center vector of each of a plurality of reference fields;
A selection unit, for determining a target reference field from the plurality of reference fields according to the similarity, wherein the target reference field is the reference field most similar to the field to be verified;
A judging unit, for comparing the names of the field to be verified and the target reference field, and confirming from the comparison result whether the mapping between the field contents and the field name of the field to be verified is correct.
In an exemplary embodiment of the disclosure, the second computing unit includes:
A receiving module, for receiving the reference field, the reference field having a field name and field contents;
A segmentation module, for segmenting the field contents of the reference field to obtain a plurality of second segmented words;
A representation module, for representing each second segmented word as a second word vector;
A computing module, for averaging the second word vectors to obtain the center vector of the reference field.
In an exemplary embodiment of the disclosure, the selection unit includes:
A selecting module, for selecting, from the plurality of reference fields, a predetermined number of reference fields with the highest similarity as candidate reference fields;
A weight computation module, for calculating, based on the reference corpus and according to a predetermined model, the weight of the similarity, the weight of the field average length and the weight of the field dispersion, where the field dispersion is the percentage of the field accounted for by its most frequent word;
A score calculation module, for calculating the weight score of each candidate reference field from the similarity, the field average length, the field dispersion and their respective weights;
An evaluation module, for selecting the candidate reference field with the highest weight score as the target reference field.
With the medical field mapping verification method and the medical field mapping verification device of the disclosure, the center vector of the field to be verified can first be determined, and the target reference field most similar to the field to be verified can then be determined by comparing the similarity between the center vector of the field to be verified and the center vectors of a plurality of reference fields. The similarity between fields is thus judged from the similarity of the vectors that represent them. Compared with judging the similarity between fields directly from field length and field format, the vectors representing the fields better reflect the features of the fields, so similarity can be compared more accurately, which helps ensure that the target reference field is the reference field most similar to the field to be verified. With the target reference field as the reference standard, the accuracy of confirming whether the mapping between the field contents and the field name of the field to be verified is correct is therefore improved.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.
Description of the drawings
The accompanying drawings, which are incorporated in and form a part of this specification, show embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure. Obviously, the drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of the medical field mapping verification method of an example embodiment of the disclosure.
Fig. 2 schematically shows a flowchart of averaging the first word vectors to obtain the center vector of the field to be verified in an example embodiment of the disclosure.
Fig. 3 schematically shows a flowchart of representing each second segmented word as a second word vector in an example embodiment of the disclosure.
Fig. 4 schematically shows a flowchart of determining the target reference field from the plurality of reference fields according to the similarity in an example embodiment of the disclosure.
Fig. 5 schematically shows a block diagram of the medical field mapping verification device of an example embodiment of the disclosure.
Fig. 6 schematically shows a block diagram of the second computing unit of an example embodiment of the disclosure.
Fig. 7 schematically shows a block diagram of the selection unit of an example embodiment of the disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solution of the disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a medical field mapping verification method for verifying the correctness of medical field mapping in medical data. As shown in Fig. 1, the medical field mapping verification method may include the following steps:
Step S110: receive a field to be verified, the field to be verified having a field name and field contents;
Step S120: segment the field contents of the field to be verified to obtain a plurality of first segmented words;
Step S130: represent each first segmented word as a first word vector;
Step S140: average the first word vectors to obtain the center vector of the field to be verified;
Step S150: calculate the similarity between the center vector of the field to be verified and the center vector of each of a plurality of reference fields;
Step S160: determine a target reference field from the plurality of reference fields according to the similarity, wherein the target reference field is the reference field most similar to the field to be verified; and
Step S170: compare the names of the field to be verified and the target reference field, and confirm from the comparison result whether the mapping between the field contents and the field name of the field to be verified is correct.
With the medical field mapping verification method of this example embodiment, the target reference field most similar to the field to be verified can be determined by comparing the similarity between the center vector of the field to be verified and the center vectors of a plurality of reference fields. The similarity between fields is thus judged from the similarity of the vectors that represent them. Compared with judging the similarity between fields directly from field length and field format, the vectors representing the fields better reflect the features of the fields, so similarity can be compared more accurately, which helps ensure that the target reference field is the reference field most similar to the field to be verified. With the target reference field as the reference standard, the accuracy of confirming whether the mapping between the field contents and the field name of the field to be verified is correct is therefore improved.
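Steps S130 to S170 can be sketched end to end as below. This is a minimal illustration, not the disclosed implementation: it assumes the field contents have already been segmented, that word vectors are available in a plain dictionary, that words missing from the dictionary are skipped, and that names are compared by exact equality. The function names and the two-dimensional toy vectors are invented for the example.

```python
import math

def center_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def verify_field(name, words, word_vectors, reference_centers):
    """Return (mapping_ok, target_name) for one field to be verified.

    words: segmented words of the field contents (output of S120).
    word_vectors: dict mapping a word to its vector (used for S130).
    reference_centers: dict mapping a reference field name to its center vector.
    """
    vectors = [word_vectors[w] for w in words if w in word_vectors]          # S130
    center = center_vector(vectors)                                           # S140
    sims = {ref: cosine(center, c) for ref, c in reference_centers.items()}  # S150
    target = max(sims, key=sims.get)                                          # S160
    return name == target, target                                             # S170

# Toy vectors, invented for illustration only.
word_vectors = {
    "heart disease": [0.9, 0.1], "diabetes": [0.8, 0.2], "pneumonia": [1.0, 0.0],
    "aspirin": [0.1, 0.9], "amoxicillin": [0.0, 1.0],
}
reference_centers = {
    "disease": center_vector([word_vectors[w] for w in ("heart disease", "diabetes", "pneumonia")]),
    "drug": center_vector([word_vectors[w] for w in ("aspirin", "amoxicillin")]),
}
```

With these toy values, a field named "drug" whose contents are disease names is matched to the "disease" reference field, so the name comparison flags the mapping as incorrect.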
Each step of the medical field mapping verification method in this example embodiment is described further below.
In step S110, a field to be verified is received, the field to be verified having a field name and field contents.
In this example embodiment, the field to be verified can be selected from the medical data of a hospital under test. The field can be a text field, a numeric field, a field combining the two, and so on. The field to be verified can contain content such as a patient's basic information, visit information, diagnosis records, medical expense records, examination records, pathology records and admission records.
For example, a hospital that needs to be verified can be selected as the hospital under test, and a disease field selected from the medical data of that hospital is received as the field to be verified. Its field name is "disease", and its field contents may include heart disease, diabetes, pneumonia and so on.
In step S120, the field contents of the field to be verified are segmented to obtain a plurality of first segmented words.
In this example embodiment, many segmentation algorithms are available, for example segmentation based on dictionary matching, segmentation based on word frequency statistics, or segmentation based on knowledge understanding; any other method capable of segmenting the field to be verified can also be used, and no limitation is imposed here. A Chinese word segmentation system can be used as the segmentation tool for the field to be verified. The segmentation system can be based on any of the above algorithms or on other segmentation algorithms; for example, it can be the Jieba word segmentation system or the NLPIR Chinese word segmentation system. Users can also use any other segmentation system that cuts the field to be verified into a plurality of first segmented words, and no limitation is imposed here.
For example, after the field to be verified is received, the Jieba word segmentation system can be used to cut apart the word sequence of the disease field into individual segmented words, that is, to generate a plurality of first segmented words such as heart disease, diabetes and pneumonia.
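As an illustration of the dictionary-matching family of segmentation methods mentioned above, the following sketch implements forward maximum matching, one common variant. It is not the algorithm of any particular segmentation system. The dictionary and the input are invented, and a run-together English string stands in for Chinese text, which has no spaces between words.

```python
def forward_max_match(text, dictionary):
    """Segment text by greedily taking the longest dictionary word at each position."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is emitted as-is when nothing longer matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical dictionary; "heart" is included to show that the longest match wins.
dictionary = {"heart", "heartdisease", "diabetes", "pneumonia"}
segments = forward_max_match("heartdiseasediabetespneumonia", dictionary)
```

Greedy longest-match is simple but can mis-segment ambiguous text, which is one reason statistical segmenters are also listed as options.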
In step S130, each first segmented word is represented as a first word vector.
In this example embodiment, a language model such as a word2vec model or a neural network language model can be used as the representation tool. Alternatively, the first word vector corresponding to a first segmented word can be determined by querying a pre-established mapping between segmented words and word vectors, so that each first segmented word is represented as a first word vector; for details, see the method of representing each second segmented word as a second word vector described in steps S1531 to S1533 below. Representing each first segmented word as a first word vector, with the first word vectors in one-to-one correspondence with the first segmented words, digitizes the text features and makes it possible to compare similarity more accurately.
For example, a mapping between segmented words and word vectors can be established in advance, in which first segmented words such as heart disease, diabetes and pneumonia are all contained; that is, the mapping contains segmented words such as heart disease, diabetes and pneumonia. The mapping is then queried with the first segmented words heart disease, diabetes, pneumonia and so on to determine the corresponding word vectors, which serve as the first word vectors. The first segmented words heart disease, diabetes, pneumonia and so on are thereby represented as first word vectors.
In step S140, the first word vectors are averaged to obtain the center vector of the field to be verified.
For example, the average of all the first word vectors corresponding to the first segmented words heart disease, diabetes, pneumonia and so on can be calculated; this average of the vectors is the center vector of the disease field serving as the field to be verified. Compared with any single first word vector, the center vector of the field to be verified reflects the features of the field to be verified more comprehensively, which helps improve accuracy.
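Written out, the averaging in step S140 amounts to the formula below. The symbols are introduced here for illustration and do not appear in the disclosure: n is the number of first word vectors, v_i is the i-th first word vector, and c is the center vector. The numeric values in the worked line are invented toy vectors.

```latex
\mathbf{c} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{v}_i
\qquad\text{e.g.}\qquad
\frac{1}{3}\bigl[(0.9,\,0.1) + (0.8,\,0.2) + (1.0,\,0.0)\bigr] = (0.9,\,0.1)
```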
In step S150, the similarity between the center vector of the field to be verified and the center vector of each of a plurality of reference fields is calculated.
In this example embodiment, the similarity can be cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, or any other measure that can be used to judge the degree of similarity between vectors.
In this example embodiment, as shown in Fig. 2, the step of calculating the center vector of the reference field can include steps S151 to S154.
In step S151, a reference field is received, the reference field having a field name and field contents.
In this example embodiment, because the reference field is the standard against which the field to be verified is checked, a hospital with good data quality can first be selected as the reference hospital. In the medical data of the reference hospital, the mappings between field names and field contents are all correct or have very high accuracy, which helps improve the accuracy of the subsequent verification result.
For example, a hospital with good data quality is selected as the reference hospital, and fields such as disease, symptom and drug are selected from its medical data as reference fields. The field name of the disease field is "disease", and its field contents include heart disease, diabetes, pneumonia and so on and contain nothing other than disease names; the field name of the symptom field is "symptom", and its field contents include chest tightness, shortness of breath and so on and contain nothing other than symptoms; the field name of the drug field is "drug", and its field contents include aspirin, amoxicillin and so on and contain nothing other than drugs.
In step S152, the field contents of the reference field are segmented to obtain a plurality of second segmented words.
The reference field can be segmented with the segmentation system used in step S120, but another segmentation system can also be used; that is, the Jieba word segmentation system, the NLPIR Chinese word segmentation system or another segmentation system can be used.
For example, after the disease field, symptom field, drug field and so on selected as reference fields are received, the Jieba word segmentation system can be used to cut apart the word sequences of these three fields, generating a plurality of second segmented words such as heart disease, diabetes, pneumonia, chest tightness, shortness of breath, aspirin and amoxicillin.
In step S153, each second segmented word is represented as a second word vector.
In this example embodiment, a language model such as a word2vec model or a neural network language model can be used as the representation tool to represent each second segmented word as a second word vector.
In this example embodiment, as shown in Fig. 3, representing each second segmented word as a second word vector may include steps S1531 to S1533.
In step S1531, a reference corpus that includes at least the plurality of reference fields is segmented to obtain a plurality of third segmented words.
In this example embodiment, the principle of segmentation is the same as for the field to be verified in step S120: the Jieba word segmentation system or another segmentation system can be used to segment the reference corpus and generate the plurality of third segmented words. Besides the reference fields, the reference corpus may also include other fields in the medical data of the reference hospital. Preferably, all of the medical data of the reference hospital is used as the reference corpus.
For example, the medical database of the reference hospital, which contains fields such as disease, symptom and drug, can be selected as the reference corpus. Segmenting this reference corpus with the Jieba word segmentation system yields a plurality of third segmented words such as heart disease, diabetes, pneumonia, chest tightness, shortness of breath, fatigue, aspirin and amoxicillin.
In step S1532, each third segmented word is represented as a third word vector, and a mapping between each third segmented word and its third word vector is built. A language model such as a word2vec model or a neural network language model can be used as the representation tool.
For example, a word2vec model is used to represent the third segmented words heart disease, diabetes, pneumonia, chest tightness, shortness of breath, fatigue, aspirin, amoxicillin and so on as word vectors, generating a plurality of third word vectors. Each of these third segmented words has a corresponding third word vector, so a mapping between the third segmented words and their third word vectors can be built.
In step S1533, the mapping between the third segmented words and the third word vectors is searched for the third word vector corresponding to the third segmented word identical to the second segmented word, and that third word vector is used as the second word vector of the second segmented word.
For example, when the second segmented words heart disease, diabetes, pneumonia, chest tightness, shortness of breath, fatigue, aspirin, amoxicillin and so on need to be represented as second word vectors, the same segmented words can be looked up among the third segmented words. The mapping is then queried to obtain the third word vectors corresponding to these segmented words, and those third word vectors serve as the second word vectors of the second segmented words. Representing the second segmented words as second word vectors in this way avoids using a dedicated language model and thereby simplifies the representation process.
It should be noted that, based on the above method of representing each second segmented word as a second word vector, in this example embodiment representing each first segmented word as a first word vector in step S130 may include:
Searching the mapping between the third segmented words and the third word vectors for the third segmented word identical to the first segmented word, and using the third word vector corresponding to that third segmented word as the first word vector of the first segmented word. Here, the mapping between the third segmented words and the third word vectors is the pre-established mapping between segmented words and word vectors mentioned above. The representation process is thereby simplified. See step S1533 above for details, which are not repeated here.
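The build-once, look-up-many pattern of steps S1532 and S1533 might look like the sketch below. The `embed` argument is a hypothetical stand-in for a trained language model such as word2vec, and `toy_embed` is a meaningless placeholder used only to make the example runnable. Reporting words that are absent from the mapping is an assumption, since the disclosure does not say how such words are handled.

```python
def build_mapping(third_words, embed):
    """Build the segmented-word -> word-vector mapping (step S1532).

    embed is a hypothetical callable standing in for a trained language model.
    """
    return {word: embed(word) for word in set(third_words)}

def lookup_vectors(words, mapping):
    """Return the vectors of the words found in the mapping, plus the misses."""
    found = [mapping[w] for w in words if w in mapping]
    missing = [w for w in words if w not in mapping]
    return found, missing

def toy_embed(word):
    """Placeholder embedding: word length and number of 'a' characters."""
    return [float(len(word)), float(word.count("a"))]

# The same mapping serves both reference fields and fields to be verified.
mapping = build_mapping(["aspirin", "diabetes", "pneumonia"], toy_embed)
found, missing = lookup_vectors(["aspirin", "unknown"], mapping)
```

Because the mapping is computed once from the reference corpus, representing a new field only requires dictionary lookups rather than another run of the language model.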
In step S154, the average of the second term vectors is calculated to obtain the center vector of the referential field.
For example, the average of all second term vectors corresponding to second segmenting words such as heart disease, diabetes, and pneumonia can be calculated; this average is the center vector of the disease field among the above referential fields. In the same manner, the center vectors of other referential fields, such as the symptom field and the medication field, can be obtained. A center vector reflects the features of a referential field more comprehensively, which helps improve accuracy.
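The averaging in step S154 can be illustrated with a short sketch. The function name `center_vector` and the toy 2-dimensional vectors are assumptions made for illustration; the source only states that the center vector is the average of the term vectors.

```python
from typing import List, Sequence


def center_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized term vectors."""
    if not vectors:
        raise ValueError("at least one term vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all term vectors must have the same dimension")
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


# Toy second term vectors for three segmenting words of one referential field.
disease_center = center_vector([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

The same function applies unchanged to the first term vectors of the field to be verified.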
To summarize the description of step S150, steps S151 to S154, and steps S1531 to S1533: for example, the angle between the center vector of the disease field serving as the field to be verified and the center vector of the disease field serving as a referential field can be calculated; the cosine of this angle is the cosine similarity of the two. For example, when the angle is 0°, the cosine similarity is 1. Similarly, the cosine similarity between the center vector of the symptom field serving as a referential field and the center vector of the disease field serving as the field to be verified can be calculated, as can the cosine similarity between the center vector of other referential fields, such as the medication field, and the center vector of the disease field serving as the field to be verified.
In step S160, a target referential field is determined from the multiple referential fields with reference to the similarity; the target referential field is the referential field with the highest degree of similarity to the field to be verified.
In this example embodiment, the referential fields can be ranked directly by cosine similarity, and the referential field with the highest cosine similarity is chosen as the target referential field.
In this example embodiment, as shown in Fig. 4, determining the target referential field from the multiple referential fields with reference to the similarity may include steps S161 to S164.
In step S161, a predetermined quantity of referential fields with the highest similarity are chosen from the multiple referential fields as candidate referential fields.
In this example embodiment, the similarity can be the cosine similarity. The multiple referential fields can be ranked by cosine similarity, and the predetermined quantity of referential fields with the highest cosine similarity are selected as candidate referential fields. The predetermined quantity can be a positive integer not less than 2, such as 3, 4, or 5, and can be set by the user. In addition, to avoid selecting referential fields with low similarity, the predetermined quantity should not be too large.
For example, the disease field, symptom field, medication field, and other referential fields can be ranked by cosine similarity. If the disease field, symptom field, and medication field among the referential fields are the three with the highest similarity to the disease field serving as the field to be verified, they can be used as the candidate referential fields.
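Step S161 amounts to a top-k selection over the similarity values. The sketch below assumes the similarities have already been computed and stored in a dictionary keyed by field name; the names `top_candidates` and `sims` and the numeric values are illustrative only.

```python
from typing import Dict, List, Tuple


def top_candidates(similarities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k referential fields with the highest similarity, best first."""
    if k < 2:
        # The description states the predetermined quantity is not less than 2.
        raise ValueError("k must be a positive integer not less than 2")
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Toy cosine similarities between the field to be verified and each referential field.
sims = {"disease": 0.95, "symptom": 0.60, "medication": 0.40, "patient name": 0.05}
candidates = top_candidates(sims, k=3)
candidate_names = [name for name, _ in candidates]
```

Keeping `k` small, as the description recommends, prevents low-similarity fields such as `"patient name"` from entering the later weighting stage.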
In step S162, based on the reference corpus, the weight of the similarity, the weight of the field average length, and the weight of the field dispersion are calculated according to a pre-determined model; the field dispersion is the percentage accounted for by the highest-frequency words that a field contains.
In this example embodiment, the field average length can be the average length of the referential fields in the reference corpus, and the field dispersion is the percentage accounted for by the highest-frequency words in each referential field. The weights of the similarity, the field average length, and the field dispersion indicate how important each of these three quantities is when judging the degree of similarity between fields. The pre-determined model can be a decision-tree model or another model usable for the calculation.
For example, based on the reference corpus of the above reference hospital, the decision-tree model yields a weight of 1 for the cosine similarity, 0.8 for the field average length, and 0.5 for the field dispersion.
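The two field statistics named in step S162 can be sketched as follows. Several points are assumptions: each field is modeled as a list of entries that are already segmented into words, length is measured in words per entry, and dispersion is taken as the share of the single most frequent word. How the decision-tree model turns these statistics into weights is not described in the source and is not attempted here.

```python
from collections import Counter
from typing import List


def field_average_length(entries: List[List[str]]) -> float:
    """Mean number of segmenting words per entry (assumed length measure)."""
    if not entries:
        raise ValueError("the field has no entries")
    return sum(len(entry) for entry in entries) / len(entries)


def field_dispersion(entries: List[List[str]]) -> float:
    """Share of all words in the field taken by its most frequent word."""
    counts = Counter(word for entry in entries for word in entry)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the field contains no words")
    return counts.most_common(1)[0][1] / total


# Toy disease field with three entries.
entries = [["heart disease"], ["diabetes"], ["heart disease", "pneumonia"]]
avg_len = field_average_length(entries)
dispersion = field_dispersion(entries)
```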
In step S163, the weight score of each candidate referential field is calculated from the similarity, the field average length, the field dispersion, and their respective weights.
For example, the weight scores of the disease field, symptom field, medication field, and other candidate referential fields can each be calculated according to a formula.
The formula is: S = W1 × 1.5 + W2 × 0.8 + W3 × 0.5, where S is the weight score, W1 is the cosine similarity, W2 is the field average length, and W3 is the field dispersion.
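A small sketch of the weight score follows, using the coefficients 1.5, 0.8, and 0.5 from the formula as default weights. The feature values for each candidate are invented for illustration, and treating all three features as values between 0 and 1 is an assumption; the source does not state how the features are scaled.

```python
from typing import Dict, Tuple

Features = Tuple[float, float, float]  # (similarity, average length, dispersion)


def weight_score(features: Features,
                 weights: Tuple[float, float, float] = (1.5, 0.8, 0.5)) -> float:
    """S = W1 * w_sim + W2 * w_len + W3 * w_disp."""
    similarity, avg_length, dispersion = features
    w_sim, w_len, w_disp = weights
    return similarity * w_sim + avg_length * w_len + dispersion * w_disp


# Hypothetical feature values for three candidate referential fields.
candidate_features: Dict[str, Features] = {
    "disease": (0.95, 0.5, 0.5),
    "symptom": (0.60, 0.6, 0.3),
    "medication": (0.40, 0.4, 0.2),
}
scores = {name: weight_score(f) for name, f in candidate_features.items()}

# The candidate with the highest weight score becomes the target referential field.
target = max(scores, key=scores.get)
```

Passing `weights` as a parameter keeps the sketch usable with whatever values the pre-determined model produces for a given reference corpus.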
In step S164, the candidate referential field with the highest weight score is chosen as the target referential field.
For example, the weight scores of the disease field, symptom field, and medication field serving as candidate referential fields can be compared, with the result: weight score S1 of the disease field > weight score S2 of the symptom field > weight score S3 of the medication field. This indicates that the disease field among the candidate referential fields has the highest degree of similarity to the disease field to be verified, so the disease field among the candidate referential fields can be used as the target referential field.
In summary, the degree of similarity between a referential field and the field to be verified can be measured from three angles: similarity, field average length, and field dispersion. This helps select the referential field with the highest degree of similarity as the target referential field and further improves the accuracy of the verification result.
In step S170, the names of the field to be verified and the target referential field are compared, and whether the mapping relation between the field contents and the field name of the field to be verified is correct is confirmed according to the comparison result.
In this example embodiment, the mapping relation between the field name and the field contents of the target referential field has a high accuracy rate and can therefore serve as the reference standard, and the target referential field has the highest degree of similarity to the field to be verified. Therefore, when the field name of the field to be verified is identical to the field name of the target referential field, that is, when the comparison result is "identical", the mapping relation between the field contents and the field name of the field to be verified can be judged correct.
Conversely, when the field name of the field to be verified differs from the field name of the target referential field, that is, when the comparison result is "different", the mapping relation between the field contents and the field name of the field to be verified can be judged incorrect.
For example, the field name of the disease field serving as the target referential field is "disease", and the name of the disease field serving as the field to be verified is also "disease"; the two field names are identical. Therefore, the mapping relation between the field contents and the field name of the disease field to be verified is correct.
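The final comparison in step S170 reduces to an equality check on field names. The sketch below uses exact string equality; whether any normalization (case, whitespace, synonyms) is applied is not stated in the source, so none is assumed.

```python
def mapping_is_correct(name_to_verify: str, target_reference_name: str) -> bool:
    """Judge the mapping correct only when the two field names are identical."""
    return name_to_verify == target_reference_name


# The field to be verified is named "disease" and the target referential field
# chosen by similarity is also named "disease".
result_same = mapping_is_correct("disease", "disease")
# A field named "medication" whose contents best match the "disease" field.
result_diff = mapping_is_correct("medication", "disease")
```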
In addition, although the steps of the method in the disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the steps shown must be performed, to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
The following are device embodiments of the present invention, which can be used to perform the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments of the present invention.
This example embodiment additionally provides a medical field mapping verification device. As shown in Fig. 5, the medical field mapping verification device can include a receiving unit 1, a word segmentation unit 2, a characterization unit 3, a first computing unit 4, a second computing unit 5, a choosing unit 6, and a judging unit 7, wherein:
The receiving unit 1 can be used to receive a field to be verified, the field to be verified having a field name and including field contents.
The word segmentation unit 2 can be used to segment the field contents of the field to be verified to obtain multiple first segmenting words.
The characterization unit 3 can be used to characterize each first segmenting word as a first term vector.
The first computing unit 4 can be used to calculate the average of the first term vectors to obtain the center vector of the field to be verified.
The second computing unit 5 can be used to calculate the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields.
The choosing unit 6 can be used to determine a target referential field from the multiple referential fields according to the similarity, wherein the target referential field is the referential field with the highest similarity to the field to be verified.
The judging unit 7 can be used to compare the names of the field to be verified and the target referential field, and to confirm, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
In this example embodiment, when the field name of the field to be verified is identical to the field name of the target referential field, the mapping relation between the field contents and the field name of the field to be verified can be judged correct; when the field name of the field to be verified differs from the field name of the target referential field, the mapping relation can be judged incorrect. This completes the verification of the mapping relation between the field contents and the field name of the field to be verified.
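How the units could cooperate is sketched below as one function. This is a simplified illustration under several assumptions: the field contents arrive already segmented, the word-to-vector table and the reference center vectors are toy values, and the target referential field is chosen directly by highest cosine similarity rather than by the weighted score. All names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def center(vectors: List[Vector]) -> Vector:
    """Element-wise mean of the term vectors (first computing unit)."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors (second computing unit)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify_field(name: str, words: List[str], table: Dict[str, Vector],
                 reference_centers: Dict[str, Vector]) -> bool:
    """Return True when the field name matches the most similar referential field."""
    vectors = [table[w] for w in words if w in table]        # characterization unit
    if not vectors:
        raise ValueError("no segmenting word of the field is in the table")
    field_center = center(vectors)
    target = max(reference_centers,                           # choosing unit
                 key=lambda ref: cosine(field_center, reference_centers[ref]))
    return name == target                                     # judging unit


# Toy data, assumed for illustration only.
table = {"heart disease": [1.0, 0.0], "diabetes": [0.9, 0.1], "aspirin": [0.0, 1.0]}
reference_centers = {"disease": [1.0, 0.05], "medication": [0.05, 1.0]}

ok = verify_field("disease", ["heart disease", "diabetes"], table, reference_centers)
bad = verify_field("medication", ["heart disease", "diabetes"], table, reference_centers)
```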
In this example embodiment, the judging unit 7 can additionally be used to output the confirmed result of whether the mapping relation between the field contents and the field name of the field to be verified is correct.
In this example embodiment, as shown in Fig. 6, the second computing unit 5 may include a receiving module 51, a word segmentation module 52, a characterization module 53, and a computing module 54, wherein:
The receiving module 51 can be used to receive the referential field, the referential field having a field name and including field contents;
The word segmentation module 52 can be used to segment the field contents of the referential field to obtain multiple second segmenting words;
The characterization module 53 can be used to characterize each second segmenting word as a second term vector;
The computing module 54 can be used to calculate the average of the second term vectors to obtain the center vector of the referential field.
In this example embodiment, as shown in Fig. 7, the choosing unit 6 may include a selecting module 61, a weight computation module 62, a score calculating module 63, and an evaluation module 64, wherein:
The selecting module 61 can be used to choose, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
The weight computation module 62 can be used to calculate, based on the reference corpus and according to a pre-determined model, the weight of the similarity, the weight of the field average length, and the weight of the field dispersion, the field dispersion being the percentage accounted for by the highest-frequency words that a field contains;
The score calculating module 63 can be used to calculate the weight score of each candidate referential field from the similarity, the field average length, the field dispersion, and their respective weights;
The evaluation module 64 can be used to choose the candidate referential field with the highest weight score as the target referential field.
The details of each module in the above medical field mapping verification device have been described in detail in the corresponding medical field mapping verification method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the disclosure can be embodied in the form of a software product. The software product can be stored on a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes instructions that cause a computing device (which can be a personal computer, a server, a mobile terminal, a network device, etc.) to perform the method according to the embodiments of the disclosure.
Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include common knowledge or conventional techniques in the art not disclosed herein. The description and embodiments are to be regarded as illustrative only, and the true scope and spirit of the disclosure are indicated by the appended claims.
Claims (10)
1. A medical field mapping verification method, characterized by comprising:
receiving a field to be verified, the field to be verified having a field name and including field contents;
segmenting the field contents of the field to be verified to obtain multiple first segmenting words;
characterizing each first segmenting word as a first term vector;
calculating the average of the first term vectors to obtain the center vector of the field to be verified;
calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields;
determining a target referential field from the multiple referential fields with reference to the similarity, wherein the target referential field is the referential field with the highest degree of similarity to the field to be verified;
comparing the names of the field to be verified and the target referential field, and confirming, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
2. The medical field mapping verification method according to claim 1, characterized by further comprising a step of calculating the center vector of the referential field, including:
receiving the referential field, the referential field having a field name and including field contents;
segmenting the field contents of the referential field to obtain multiple second segmenting words;
characterizing each second segmenting word as a second term vector;
calculating the average of the second term vectors to obtain the center vector of the referential field.
3. The medical field mapping verification method according to claim 2, characterized in that characterizing each second segmenting word as a second term vector includes:
segmenting a reference corpus that includes at least the multiple referential fields to obtain multiple third segmenting words;
characterizing each third segmenting word as a third term vector, and building mapping relations between each third segmenting word and its third term vector;
searching the mapping relations between the third segmenting words and the third term vectors for the third term vector corresponding to the third segmenting word identical to the second segmenting word, and using it as the second term vector of the second segmenting word.
4. The medical field mapping verification method according to claim 3, characterized in that characterizing each first segmenting word as a first term vector includes:
searching the mapping relations between the third segmenting words and the third term vectors for the third term vector corresponding to the third segmenting word identical to the first segmenting word, and using it as the first term vector of the first segmenting word.
5. The medical field mapping verification method according to claim 4, characterized in that determining the target referential field from the multiple referential fields with reference to the similarity includes:
choosing, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
calculating, based on the reference corpus and according to a pre-determined model, the weight of the similarity, the weight of the field average length, and the weight of the field dispersion, the field dispersion being the percentage accounted for by the highest-frequency words that a field contains;
calculating the weight score of each candidate referential field from the similarity, the field average length, the field dispersion, and their respective weights;
choosing the candidate referential field with the highest weight score as the target referential field.
6. The medical field mapping verification method according to claim 5, characterized in that the pre-determined model is a decision-tree model.
7. The medical field mapping verification method according to claim 1, characterized in that the similarity is the cosine similarity.
8. A medical field mapping verification device, characterized by comprising:
a receiving unit, for receiving a field to be verified, the field to be verified having a field name and including field contents;
a word segmentation unit, for segmenting the field contents of the field to be verified to obtain multiple first segmenting words;
a characterization unit, for characterizing each first segmenting word as a first term vector;
a first computing unit, for calculating the average of the first term vectors to obtain the center vector of the field to be verified;
a second computing unit, for calculating the similarity between the center vector of the field to be verified and the center vector of each of multiple referential fields;
a choosing unit, for determining a target referential field from the multiple referential fields according to the similarity, wherein the target referential field is the referential field with the highest degree of similarity to the field to be verified;
a judging unit, for comparing the names of the field to be verified and the target referential field, and confirming, according to the comparison result, whether the mapping relation between the field contents and the field name of the field to be verified is correct.
9. The medical field mapping verification device according to claim 8, characterized in that the second computing unit includes:
a receiving module, for receiving the referential field, the referential field having a field name and including field contents;
a word segmentation module, for segmenting the field contents of the referential field to obtain multiple second segmenting words;
a characterization module, for characterizing each second segmenting word as a second term vector;
a computing module, for calculating the average of the second term vectors to obtain the center vector of the referential field.
10. The medical field mapping verification device according to claim 8, characterized in that the choosing unit includes:
a selecting module, for choosing, from the multiple referential fields, a predetermined quantity of referential fields with the highest similarity as candidate referential fields;
a weight computation module, for calculating, based on the reference corpus and according to a pre-determined model, the weight of the similarity, the weight of the field average length, and the weight of the field dispersion, the field dispersion being the percentage accounted for by the highest-frequency words that a field contains;
a score calculating module, for calculating the weight score of each candidate referential field from the similarity, the field average length, the field dispersion, and their respective weights;
an evaluation module, for choosing the candidate referential field with the highest weight score as the target referential field.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611027722.1A CN108091372B (en) | 2016-11-21 | 2016-11-21 | Medical field mapping verification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611027722.1A CN108091372B (en) | 2016-11-21 | 2016-11-21 | Medical field mapping verification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108091372A (en) | 2018-05-29 |
CN108091372B (en) | 2021-06-18 |
Family
ID=62169614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611027722.1A Active CN108091372B (en) | 2016-11-21 | 2016-11-21 | Medical field mapping verification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108091372B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070136288A1 (en) * | 1998-12-24 | 2007-06-14 | Atsuo Shimada | Document processor, document classification device, document processing method, document classification method, and computer-readable recording medium for recording programs for executing the methods on a computer |
US20020178158A1 (en) * | 1999-12-21 | 2002-11-28 | Yuji Kanno | Vector index preparing method, similar vector searching method, and apparatuses for the methods |
CN102043813A (en) * | 2009-10-13 | 2011-05-04 | 北京大学 | Medical information treatment server and medical information treatment method |
CN102831193A (en) * | 2012-08-03 | 2012-12-19 | 人民搜索网络股份公司 | Topic detecting device and topic detecting method based on distributed multistage cluster |
CN104156415A (en) * | 2014-07-31 | 2014-11-19 | 沈阳锐易特软件技术有限公司 | Mapping processing system and method for solving problem of standard code control of medical data |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210326995A1 (en) * | 2019-01-23 | 2021-10-21 | Ping An Technology (Shenzhen) Co., Ltd. | Claim settlement anti-fraud method, apparatus, device, and storage medium based on graph computation technology |
CN109871382A (en) * | 2019-02-13 | 2019-06-11 | 北京明略软件系统有限公司 | A kind of implementation method and device of tables of data access java standard library |
CN109902083A (en) * | 2019-02-26 | 2019-06-18 | 北京明略软件系统有限公司 | Method, apparatus, computer storage medium and the terminal of a kind of pair of mark processing |
CN110309504A (en) * | 2019-05-23 | 2019-10-08 | 平安科技(深圳)有限公司 | Text handling method, device, equipment and storage medium based on participle |
CN110309504B (en) * | 2019-05-23 | 2023-10-31 | 平安科技(深圳)有限公司 | Text processing method, device, equipment and storage medium based on word segmentation |
CN110457704A (en) * | 2019-08-12 | 2019-11-15 | 北京明略软件系统有限公司 | Determination method, apparatus, storage medium and the electronic device of aiming field |
CN110532267A (en) * | 2019-08-28 | 2019-12-03 | 北京明略软件系统有限公司 | Determination method, apparatus, storage medium and the electronic device of field |
CN110795482A (en) * | 2019-10-16 | 2020-02-14 | 浙江大华技术股份有限公司 | Data benchmarking method, device and storage device |
CN111104481A (en) * | 2019-12-17 | 2020-05-05 | 东软集团股份有限公司 | Method, device and equipment for identifying matching field |
CN111104481B (en) * | 2019-12-17 | 2023-10-10 | 东软集团股份有限公司 | Method, device and equipment for identifying matching field |
CN111125311A (en) * | 2019-12-24 | 2020-05-08 | 医渡云(北京)技术有限公司 | Method and device for checking information normalization processing, storage medium and electronic equipment |
CN111241086A (en) * | 2020-01-17 | 2020-06-05 | 甘肃省卫生健康统计信息中心(西北人口信息中心) | Data quality improvement method and system based on medical big data |
CN111737533A (en) * | 2020-06-19 | 2020-10-02 | 东软集团股份有限公司 | Processing method and device for inspection items, storage medium and equipment |
CN111737533B (en) * | 2020-06-19 | 2024-02-09 | 东软集团股份有限公司 | Method, device, storage medium and equipment for processing inspection items |
Also Published As
Publication number | Publication date |
---|---|
CN108091372B (en) | 2021-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||