CN111128388B - Value range data matching method and device and related products - Google Patents
- Publication number
- CN111128388B (application CN201911222384.0A)
- Authority
- CN
- China
- Prior art keywords
- name
- matched
- obtaining
- operation name
- surgical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a value range data matching method, a value range data matching device and related products. The method obtains the surgical name to be matched from the value range data to be matched, processes the surgical name to be matched to obtain a feature vector group to be matched, and obtains a matching result by using a pre-trained data matching model and the feature vector group to be matched. The trained data matching model can match a non-national-standard operation name to a national standard operation name, determine where that national standard operation name is classified in the value range classification tree, and express the classification through node index values. The matching result can be used to index the national standard operation name matched by the operation name to be matched and to determine the classification of that national standard operation name in the value range classification tree. Compared with the prior art, manual labor is saved and matching efficiency is improved. In addition, matching becomes more robust to interference, which further improves the matching accuracy of the value range data.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a value range data matching method, device and related product.
Background
In recent years, along with the continuous progress of informatization and industrialization in the medical field, the display form of medical data has also changed greatly. This has profound effects on both the hospital medical information system (Hospital Information System, HIS) and the national health information system. In order to effectively collect, analyze and apply relevant information of the medical fields such as disease onset conditions, disease treatment schemes and the like of the region, a region platform can be established at present.
Medical data comprises a large amount of value range data. Some value range data has few kinds and a simple data organization and can be called small value range data, such as medical insurance category and patient gender. Other value range data has many kinds and a complex data organization and can be called large value range data, such as operation name and disease name.
The area corresponding to the area platform typically includes a plurality of hospitals, each of which creates a database for storing medical data of the hospital, from which the area platform obtains data for analysis and application. The value range data stored in the database of each hospital may have non-standard and non-uniform problems. As an example, the name of "laryngeal resection" in the database of hospital a is "first surgery", and the name of "laryngeal resection" in the database of hospital B is "second surgery". It would be difficult to effectively analyze and apply these value range data without matching the surgical name.
At present, in the medical field, the scheme for matching the value range data of the large value range in the medical data comprises fuzzy query, word segmentation comparison and manual comparison, but the method for matching by utilizing the fuzzy query or the word segmentation comparison is poor in matching effect, and the method for manually comparing consumes a large amount of manpower. Therefore, how to improve the accuracy and the matching efficiency of the value domain data matching has become a technical problem to be solved in the process of establishing and perfecting the medical region platform.
Disclosure of Invention
Based on the problems, the application provides a value range data matching method, a value range data matching device and related products, so that the accuracy and the matching efficiency of the value range data matching are improved.
The embodiment of the application discloses the following technical scheme:
in a first aspect, the present application provides a value range data matching method, including:
obtaining value range data to be matched;
processing the surgical name to be matched in the value range data to be matched to obtain a feature vector group to be matched;
obtaining a matching result by using the data matching model and the feature vector group to be matched; the data matching model is obtained after training by using a labeled sample feature vector set in advance; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and node index values corresponding to all layers of the value domain classification tree of the national standard operation name; the value domain classification tree is a structural tree for classifying national standard operation names according to the parts of human bodies or animal bodies.
Optionally, obtaining the data matching model specifically includes:
classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
obtaining historical operation names from the hospital information system (HIS), and obtaining the corresponding relation between the historical operation names and the national standard operation names;
processing the historical operation name to obtain the sample feature vector group; obtaining the label by using the history operation name, the corresponding relation and the value range classification tree;
training the model to be trained by using the sample feature vector group with the tag, and stopping training and obtaining the data matching model when a preset ending condition is met.
Optionally, processing the historical operation name to obtain the sample feature vector set specifically includes:
splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name; obtaining the feature relation feature value $w_k$ of the kth dimension by using the basic features of the m dimensions, where k = 1, 2, …, m;
Obtaining department information and/or registration information of the historical operation name from a hospital information system HIS, and obtaining a category vector of the historical operation name by using the department information and/or registration information;
obtaining the sample feature vector group by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions and the category vector.
Optionally, obtaining the feature relation feature value $w_k$ of the kth dimension by using the basic features of the m dimensions specifically includes:
obtaining a correlation score between the basic feature of the kth dimension and the basic feature of each other dimension by using the Pearson correlation formula, the Spearman correlation formula, or a chi-square test;
obtaining the feature relation feature value $w_k$ of the kth dimension by using a preset correlation coefficient and the correlation score.
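The two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: it assumes each basic-feature dimension has already been encoded as a numeric column over a set of names, it uses the Pearson option, and it assumes the preset coefficient scales the mean absolute score against the other dimensions. The function names `pearson` and `feature_relation_values` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def feature_relation_values(columns, coefficient=1.0):
    """Return [w_1, ..., w_m] for m numerically encoded feature columns.

    ASSUMPTION: w_k = coefficient * mean(|score(k, j)|) over all j != k.
    The source only states that a preset coefficient and the correlation
    score are combined; the exact combination is not given.
    """
    m = len(columns)
    values = []
    for k in range(m):
        scores = [abs(pearson(columns[k], columns[j])) for j in range(m) if j != k]
        values.append(coefficient * sum(scores) / len(scores) if scores else 0.0)
    return values


# Three hypothetical feature columns observed over four names.
columns = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
]
w = feature_relation_values(columns, coefficient=0.5)
```

The Spearman option could be obtained by replacing each column with its ranks before calling `pearson`; the chi-square option would need categorical counts instead of numeric columns.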
Optionally, splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name, which specifically includes:
splitting the historical operation name to obtain the keyword of the historical operation name, the target word, the words before or after the target word within a preset word window, and the target word together with the words before or after it within the preset word window.
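A minimal sketch of this splitting step is given below. It assumes the name has already been segmented into tokens, that the caller chooses the target word by position, and that the keyword is the first token found in a caller-supplied vocabulary; none of these choices is specified in the source, and the function name `split_basic_features` is hypothetical.

```python
def split_basic_features(tokens, target_index, window=1, keyword_vocab=frozenset()):
    """Split a tokenized name into basic features around a target word.

    ASSUMPTIONS: tokens come from an external word segmenter, the target
    word is selected by index, and the keyword is the first token that
    appears in keyword_vocab (None if no token matches).
    """
    target = tokens[target_index]
    before = tokens[max(0, target_index - window):target_index]
    after = tokens[target_index + 1:target_index + 1 + window]
    keyword = next((t for t in tokens if t in keyword_vocab), None)
    return {
        "keyword": keyword,
        "target": target,
        "before": before,                      # words before the target in the window
        "after": after,                        # words after the target in the window
        "target_with_before": before + [target],
        "target_with_after": [target] + after,
    }


features = split_basic_features(
    ["partial", "laryngeal", "resection"],
    target_index=1,
    window=1,
    keyword_vocab={"resection"},
)
```

Each entry of the returned dictionary corresponds to one basic-feature dimension, so m is 6 in this sketch.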
Optionally, the processing is performed on the surgical name to be matched in the value range data to be matched to obtain the feature vector set to be matched, which specifically includes:
splitting the surgical name to be matched to obtain basic features of m dimensions corresponding to the surgical name to be matched; obtaining the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched, where k = 1, 2, …, m;
obtaining department information and/or registration information of the to-be-matched operation name from the HIS, and obtaining a category vector of the to-be-matched operation name by using the department information and/or registration information of the to-be-matched operation name;
obtaining the feature vector group to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched.
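As an illustration of how the two components might be held together, the sketch below stores the feature relation feature values and the category vector in one container. The class name `FeatureVectorGroup` and the flattening into a single model input are assumptions made for the example; the source does not fix a layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureVectorGroup:
    relation_values: List[float]  # t_1 ... t_m for the name to be matched
    category_vector: List[float]  # derived from department and/or registration info

    def as_model_input(self) -> List[float]:
        # ASSUMPTION: the model consumes one flat vector; other layouts are possible.
        return self.relation_values + self.category_vector


group = FeatureVectorGroup(
    relation_values=[0.8, 0.3, 0.5],
    category_vector=[0.0, 1.0, 0.0],
)
model_input = group.as_model_input()
```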
In a second aspect, the present application provides a value range data matching device, including:
the data acquisition module is used for acquiring the value range data to be matched;
the data processing module is used for processing the surgical name to be matched in the value range data to be matched to obtain a feature vector group to be matched;
the data matching module is used for obtaining a matching result by utilizing the data matching model and the feature vector group to be matched; the data matching model is obtained after training by using a labeled sample feature vector set in advance; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and node index values corresponding to all layers of the value domain classification tree of the national standard operation name; the value domain classification tree is a structural tree for classifying national standard operation names according to the parts of human bodies or animal bodies.
Optionally, the apparatus further comprises: the model training module specifically comprises:
the value domain classification tree acquisition unit is used for classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
a surgical name acquisition unit configured to acquire the history surgical name;
the corresponding relation acquisition unit is used for acquiring the corresponding relation between the historical operation name and the national standard operation name;
the sample feature vector group acquisition unit is used for processing the historical operation name to acquire the sample feature vector group;
the label obtaining unit is used for obtaining the label by using the history operation name, the corresponding relation and the value range classification tree;
and the model training unit is used for training the model to be trained by using the sample feature vector group with the label, and stopping training and obtaining the data matching model when the preset ending condition is met.
Optionally, the sample feature vector group acquisition unit specifically includes:
the basic feature first acquisition subunit is used for splitting the historical operation name to obtain m dimension basic features corresponding to the historical operation name;
the feature relation feature value first acquisition subunit is used for obtaining the feature relation feature value $w_k$ of the kth dimension by using the basic features of the m dimensions, where k = 1, 2, …, m;
the class vector first acquisition subunit is used for acquiring department information and/or registration information of the historical operation names from a Hospital Information System (HIS), and acquiring class vectors of the historical operation names by utilizing the department information and/or registration information;
a sample feature vector group acquisition subunit for obtaining the sample feature vector group by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions and the category vector.
Optionally, the feature relation feature value first acquisition subunit is specifically used for obtaining a correlation score between the basic feature of the kth dimension and the basic feature of each other dimension by using the Pearson correlation formula, the Spearman correlation formula, or a chi-square test, and for obtaining the feature relation feature value $w_k$ of the kth dimension by using a preset correlation coefficient and the correlation score.
Optionally, the basic feature first acquisition subunit is specifically configured to split the historical operation name to obtain the keyword of the historical operation name, the target word, the words before or after the target word within a preset word window, and the target word together with the words before or after it within the preset word window.
Optionally, the data processing module specifically includes:
the basic feature second acquisition subunit is used for splitting the surgical names to be matched and acquiring basic features of m dimensions corresponding to the surgical names to be matched;
a feature relation feature value second acquisition subunit for obtaining the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched, where k = 1, 2, …, m;
a category vector second obtaining subunit, configured to obtain department information and/or registration information of the to-be-matched operation name from the HIS, and obtain a category vector of the to-be-matched operation name by using the department information and/or registration information of the to-be-matched operation name;
a feature vector group to be matched obtaining subunit, configured to obtain the feature vector group to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched.
In a third aspect, the present application provides a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements the value range data matching method as provided in the first aspect.
In a fourth aspect, the present application provides a processor for executing a computer program, which when run performs the value range data matching method as provided in the first aspect.
Compared with the prior art, the application has the following beneficial effects:
The method obtains value range data to be matched, which includes a surgical name to be matched; processes the surgical name to be matched to obtain a feature vector group to be matched; and obtains a matching result by using the pre-trained data matching model and the feature vector group to be matched. The data matching model is obtained by training on a labeled sample feature vector group, and the label includes the name index value of the national standard operation name corresponding to a historical operation name and the node index values of that national standard operation name at each layer of the value range classification tree. The trained data matching model can therefore match a non-national-standard operation name to a national standard operation name, determine where the national standard operation name is classified in the value range classification tree, and express that classification through the node index values. The matching result output by the data matching model may take the same or a similar form as the label, so the matching result can be used to index the national standard operation name matched by the operation name to be matched and to determine the classification of that national standard operation name in the value range classification tree.
In the present application, value range data is matched automatically by the pre-trained data matching model, which saves manual labor and improves matching efficiency compared with manual comparison. In addition, because the value range classification tree is divided according to the parts of the human body or animal body, even if the surgical name to be matched resembles other non-national-standard surgical names, surgical names that concern different parts (that is, that fall under different classifications) can be distinguished by their node index values, which avoids matching errors. Compared with fuzzy query and word segmentation comparison, the method is therefore more robust to interference and further improves the matching accuracy of the value range data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive faculty for a person skilled in the art.
Fig. 1 is a flowchart of a value range data matching method provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a value range classification tree according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for obtaining a data matching model according to an embodiment of the present application;
Fig. 4 is a flowchart of an implementation of obtaining a sample feature vector group according to an embodiment of the present application;
Fig. 5 is a flowchart of an implementation of obtaining a feature vector group to be matched according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a value range data matching device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another value range data matching device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a model training module according to an embodiment of the present application;
Fig. 9 is a hardware configuration diagram of a value range data matching device according to an embodiment of the present application.
Detailed Description
As described above, for the value range data (e.g., operation name, disease name, etc.) of a large value range in the medical field, there is a non-uniform and non-standard problem in various hospitals at present, which makes the subsequent analysis and application of these value range data difficult. The problems can be solved through the value range data matching, but in the realization process of the value range data matching, the inventor discovers that the existing matching technical schemes have the problems of lower efficiency and lower accuracy.
Take surgical names as an example: some surgical names share identical or similar words. For example, the third surgery of the first hospital and the fourth surgery of the second hospital both contain the term YY. If value range data matching is performed by fuzzy query or word segmentation comparison, the term YY interferes with the matching, so the third surgery and the fourth surgery are easily matched together to the same national standard name, the fifth surgery. In practice, however, the first hospital may, owing to its naming habits, call the national standard "nail-lingual resection" the third surgery, while the second hospital, owing to its own naming habits, calls the national standard "lung and bronchotomy" the fourth surgery. The third surgery and the fourth surgery thus correspond to different national standard surgical names, but the overlapping or similar wording causes a mismatch.
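The interference described above can be illustrated with a simple token-overlap score, used here as a stand-in for word segmentation comparison. The two local names are hypothetical placeholders, and the Jaccard measure is an assumption chosen for the example rather than the comparison used by any particular hospital system.

```python
def token_overlap(name_a, name_b):
    """Jaccard overlap between the whitespace-separated tokens of two names."""
    a, b = set(name_a.split()), set(name_b.split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical local names for two different procedures that share the term "YY".
third_surgery = "YY type A resection"
fourth_surgery = "YY type B resection"
score = token_overlap(third_surgery, fourth_surgery)  # 3 shared tokens out of 5
```

A score this high would lead a purely lexical matcher to treat the two names as the same procedure, even though they refer to different body parts.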
In addition, value range data can currently also be matched by manual comparison. However, manual matching requires a lot of manpower and is inefficient. For example, the surgical names of a hospital may be updated after a period of time, which means the manual comparison must be redone, which is time-consuming and laborious. Moreover, the accuracy of manual work is affected by eyesight and fatigue, so the error rate is high and the accuracy of value range data matching suffers.
Based on the above problems, the present application provides a value range data matching method, device and related products. A data matching model is trained in advance. When value range data needs to be matched, the feature vector group to be matched, obtained by processing the surgical name to be matched, is used as the input of the model, and the model computes and outputs the matching result. Because the model is trained in advance, it is convenient to use and matching is efficient. The model both matches surgical names and identifies their classification. The classification is based on a value range classification tree that classifies by body part, so the matching result is more discriminative, interference from similar surgical names is avoided, and the matching accuracy for large value range data is improved. In addition, the method does not consume a large amount of manpower, which saves labor cost.
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Method embodiment
Referring to fig. 1, the figure is a flowchart of a value range data matching method provided in an embodiment of the present application. The value range data matching method provided by the embodiment can be applied to an area platform in the medical field, and the area platform can be realized in a server mode. The server applies this method to match value range data from various hospitals within the area.
As shown in fig. 1, the value range data matching method provided in the embodiment of the present application includes:
Step 101: obtaining the value range data to be matched.
In an application scenario of the method of the embodiment, a region corresponding to the region platform (server) includes a plurality of hospitals, and each hospital adopts a medical information system HIS. The server can realize remote communication with the HISs of the hospitals, and the server can acquire medical data of the hospitals through the database of the HISs of the hospitals. The medical data comprise value range data to be matched.
As an example, the value range data to be matched may include, but is not limited to: the surgical name to be matched, the disease name to be matched, the medical insurance category to be matched, and the patient gender to be matched. It should also be noted that the items of value range data to be matched may be acquired together or separately. For example, the surgical name to be matched and the disease name to be matched may be obtained together first, and the medical insurance category to be matched and the patient gender to be matched obtained later as required. The timing of acquiring the value range data to be matched is therefore not specifically limited here.
Step 102: processing the surgical name to be matched in the value range data to be matched to obtain the feature vector group to be matched.
The method of this embodiment mainly aims to match the surgical name to be matched to its corresponding national standard surgical name. As one possible implementation, the surgical names included in the international disease classification standard may be regarded as national standard surgical names. In this example, the international disease classification standard may be the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). ICD-9-CM is well known to those skilled in the art, so the national standard surgical names it includes are not listed one by one in this application.
Before matching the surgical names, the surgical names to be matched need to be processed, and vector representations of the surgical names to be matched are obtained through processing. In this embodiment, the vector representation of the surgical name to be matched is referred to as a feature vector set to be matched.
As an example implementation, the surgical name to be matched may be split to obtain basic features such as its keyword and target word. A series of feature relation feature values is then obtained by analyzing the association between these basic features. In addition, department information and/or registration information of the surgical name to be matched can be obtained. The department information and/or registration information generally reflects the department corresponding to the surgical name to be matched, and each department is responsible for diagnosing and treating only a limited number of parts of the human or animal body (for example, ophthalmology is responsible for the eyes and gynecology for the uterus). The category information of the surgical name to be matched can therefore be extracted from its department information and/or registration information. The category information may specifically take the form of a category vector. The feature relation feature values and the category vector are the constituent elements of the feature vector group to be matched.
It can be appreciated that the foregoing is merely an example implementation manner of this step, and other implementation manners may be adopted to obtain the feature vector set to be matched according to specific matching requirements or the kind of data stored in the HIS during actual application. Therefore, the implementation of this step is not limited here, nor are constituent elements of the feature vector group to be matched.
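One simple way to turn department information into a category vector is a one-hot encoding, sketched below. The department list and the function name `category_vector` are hypothetical, and one-hot encoding is an assumption; the source only says the category information may take the form of a vector.

```python
# Hypothetical department list; a real HIS would supply its own departments.
DEPARTMENTS = ["ophthalmology", "gynecology", "respiratory medicine"]


def category_vector(department, departments=DEPARTMENTS):
    """One-hot encode a department name; unknown departments map to all zeros."""
    return [1.0 if department == d else 0.0 for d in departments]


vector = category_vector("gynecology")
unknown = category_vector("dermatology")
```

Mapping unknown departments to an all-zero vector keeps the vector length fixed, which is convenient when the vector is combined with the feature relation feature values.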
Step 103: obtain a matching result by using the data matching model and the feature vector group to be matched.
It should be noted that, in this embodiment, the data matching model is obtained by training the sample feature vector set with the tag in advance. The sample feature vector set is a vectorized representation of historical surgical names, which may be obtained from HIS of each hospital. The labels of the sample feature vector set include: the name index value of the national standard operation name corresponding to the historical operation name and the node index value corresponding to the national standard operation name in each layer of the value domain classification tree. The name index value may be used to index the national standard surgical name corresponding to the obtained historical surgical name.
In this embodiment, both the historical surgical names and the surgical name to be matched are surgical names that have not undergone value range matching. The surgical name to be matched is a name that currently needs to be matched using the trained model; a historical surgical name is an unmatched surgical name that was collected before model training was completed. That is, relative to the surgical name to be matched, the historical surgical names in this embodiment are historical data used to form the samples for training the model.
For easy understanding of the form and node index values of the value range classification tree, reference may be made to fig. 2, which is a schematic structural diagram of the value range classification tree provided in the embodiment of the present application.
As shown in fig. 2, the value range classification tree includes multiple layers (layer 1, layer 2, …, layer s from root to leaf, where s is an integer greater than 2), each layer including at least one node. The only node of layer 1 does not refer to a specific national standard surgical name; each node of layers 2 through s represents a different national standard surgical name. The nodes of layer 2 correspond to national standard surgical names classified by part of the human or animal body, such as nervous system surgery, endocrine system surgery, eye surgery, and respiratory system surgery. The national standard surgical name corresponding to each layer-2 node can be further subdivided layer by layer. For example, respiratory system surgery may be divided into laryngeal resection, lung and bronchial resection, and so on.
As can be seen in conjunction with fig. 2, each parent node at layers 1 to s-1 in the value domain classification tree has at least one corresponding child node; each child node of layers 2 through s each has a corresponding parent node. The node index value contained in the label of the sample feature vector group specifically refers to the node index value corresponding to each layer of the value domain classification tree of the national standard operation name corresponding to the historical operation name. For ease of understanding, the following is illustrative.
Assume that the national standard surgical name corresponding to the historical surgical name is laryngeal resection. From the parent-child relationships between nodes it is easy to determine that node 235 represents laryngeal resection and that its parent node 222 represents respiratory system surgery. Thus, the label of the sample feature vector group includes the node index value of node 235 and the node index value of node 222. It can be appreciated that the node index values in the label can be used to determine, by indexing, how the national standard surgical name corresponding to the historical surgical name is classified in the value range classification tree.
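The laryngeal resection example can be traced with a small sketch. It assumes the tree is stored as a child-to-parent table; only the link from node 235 to node 222 comes from the text, while the root index 211, the table layout, and the function name are hypothetical.

```python
# Hypothetical child -> parent table for a fragment of the tree in fig. 2.
# Only the 235 -> 222 link is taken from the text; the root index 211 is assumed.
PARENT = {
    222: 211,  # respiratory system surgery (layer 2) -> root (layer 1)
    235: 222,  # laryngeal resection (layer 3) -> respiratory system surgery
}
ROOT = 211


def node_index_path(node, parent=PARENT, root=ROOT):
    """Return the node index values from layer 2 down to `node`, excluding the root."""
    path = []
    while node != root:
        path.append(node)
        node = parent[node]
    return list(reversed(path))


label_nodes = node_index_path(235)  # node index values for laryngeal resection
```

Walking upward from the leaf keeps the lookup linear in the depth of the tree, and reversing the result lists the layers from the coarsest classification to the finest.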
It can be appreciated that in the model training process, the labels are used as training bases for deep learning. Through deep learning training, the model gradually has the function of outputting labels from the sample feature vector group. Therefore, after the feature vector set to be matched obtained in step 102 is input to the pre-trained data matching model, the data matching model can output the matching result meeting the matching requirement through operation and processing. The specific composition of the matching result is the same as or similar to the label used in the training phase. Namely, the matching result contains the name index value of the national standard operation name corresponding to the operation name to be matched and the node index value corresponding to each layer of the value domain classification tree of the national standard operation name.
The above is the value range data matching method provided by the embodiment of the application. In the method, value domain data to be matched is obtained, wherein the value domain data to be matched comprises surgical names to be matched; processing the surgical name to be matched to obtain a feature vector group to be matched; and obtaining a matching result by utilizing the pre-trained data matching model and the feature vector set to be matched. The data matching model is obtained after training a labeled sample feature vector group, and the label comprises a name index value of a national standard operation name corresponding to a historical operation name and node index values corresponding to all layers of a value domain classification tree of the national standard operation name, so that the data matching model obtained through training has the function of matching the national standard operation name according to the operation name which is not national standard, and can determine the specific classification condition of the national standard operation name in the value domain classification tree and reflect the specific classification condition through the node index values. The matching result obtained by the data matching model can be displayed in the same or similar form as the label, so that the national standard operation name matched with the operation name to be matched can be obtained by indexing by using the matching result, and the specific classification condition of the matched national standard operation name in the value domain classification tree is determined.
In the present application, value range data is matched automatically using the pre-trained data matching model; compared with manual comparison, this saves labor and improves matching efficiency. In addition, because the value range classification tree is divided by part of the human or animal body, even if the surgical name to be matched closely resembles other surgical names, names that concern different parts (that is, different classifications) can be effectively distinguished by their node index values, which avoids matching errors. Therefore, compared with matching schemes based on fuzzy query and word segmentation comparison, the value range data matching method provided by the present application also improves the interference resistance of matching and thus the matching accuracy of value range data.
To facilitate an understanding of the training process of the data matching model, a specific implementation of training the data matching model is described below in conjunction with fig. 3 and the embodiment.
Referring to fig. 3, a flowchart of obtaining a data matching model is provided in an embodiment of the present application. The various steps illustrated in fig. 3 may be performed in particular before steps 101-103 described in the previous embodiments.
As shown in fig. 3, the implementation flow of obtaining the data matching model includes:
Step 301: classify the plurality of national standard surgical names included in the international classification of diseases standard by part of the human or animal body to obtain the value range classification tree.
As an example implementation, the multiple national standard surgical names included in ICD-9-CM may be classified by the corresponding part of the human or animal body (treatment site or disease site), and the relations among all national standard surgical names under each classification (including, but not limited to, membership, non-membership, and complementary relations) may be determined. These relations are then used to deploy all nodes below layer 2 of the value range classification tree. An example form of the value range classification tree is shown in fig. 2.
Step 302: obtain the historical surgical names and the correspondence between the historical surgical names and the national standard surgical names.
For ease of understanding, the hospital information system (HIS) and the value range comparison system are introduced here.
HIS is a system inside a hospital. The value field data which are not subjected to standardized matching in the hospital, such as historical operation names, historical disease names, historical medical insurance types, historical patient sexes and the like, are stored in a database of the HIS. In this step, the server (i.e., the regional platform) may specifically obtain a large number of historical surgical names from the HIS, so as to facilitate subsequent processing to obtain sample data for the training model. For example, 300 different historical surgical names are taken from the HIS.
The value range comparison system is established in advance. It should be noted that the historical surgical names stored in each hospital's HIS database are generally not named arbitrarily; they follow the naming rules of the medical service layer. Although each hospital's naming rules admit many variants (for example, variants arising from the hospital's business idioms or spoken-language habits), every historical surgical name corresponds to a national standard surgical name.
As an example implementation, the correspondence may be obtained when the hospital names its surgeries, and is stored in the value range comparison system in the form of a file. As another example implementation, the correspondence may be obtained by manual comparison after naming, and is likewise stored in the value range comparison system in the form of a file. When this step is executed, the server obtains the correspondence between the historical surgical names and the national standard surgical names from the value range comparison system. For example, 300 historical surgical names are obtained from the HIS and, for ease of reference, abbreviated as A001 to A300 (not shown in the drawings); the correspondence between these 300 historical surgical names A001 to A300 and the national standard surgical names B001 to B300 (not shown in the drawings) then needs to be obtained from the value range comparison system, where A001 corresponds to B001, A002 corresponds to B002, …, and A300 corresponds to B300.
It should be noted that, as regional platforms are established, hospitals and other medical and health institutions have a very large demand for value range data matching. In the method of this embodiment, the correspondence is used to form the sample data for training the model, and the trained data matching model then matches value range data accurately and efficiently. Therefore, even if the correspondence is obtained by manual comparison, as in the example above, and then stored in the value range comparison system, matching efficiency is still greatly improved compared with resorting to manual comparison every time value range data needs to be matched. The labor spent on manually comparing historical surgical names with national standard surgical names is thus very small relative to the overall demand for value range matching.
Step 303: process the historical surgical names to obtain the sample feature vector group.
The set of sample feature vectors is a vectorized representation of the historical surgical name. The set of sample feature vectors serves as sample data for a subsequent training model.
An exemplary implementation of step 303 is described below in connection with fig. 4. Referring to fig. 4, a flowchart of an implementation manner of obtaining a sample feature vector set according to an embodiment of the present application is provided.
Step 3031: splitting the history operation name to obtain the basic characteristics of m dimensions corresponding to the history operation name.
In practical application, the history operation name can be processed by a word segmentation method to obtain basic characteristics of multiple dimensions. Word segmentation is a relatively mature technology in the field, so that some mature word segmentation algorithms can be utilized to split the historical operation names, and the word segmentation algorithms are not particularly limited herein.
m represents the number of dimensions of the resolved basic feature, and m is an integer greater than 1. As an example, the resolved basic features may include at least one of:
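keywords of the historical surgical name, a target character, characters before or after the target character within a preset character window, a target word, and words before or after the target word within a preset word window.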
The keywords may be obtained by removing the parts that different historical surgical names have in common (de-duplication). For example, if the historical surgical name A001 is XYZ surgery and the historical surgical name A008 is XUZ surgery, then after the common parts are removed, "Y" may be used as the keyword of A001 and "U" as the keyword of A008.
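One possible way to carry out this removal of shared parts is sketched below. It is an assumption that a generic sequence diff is an acceptable stand-in; the text does not specify the algorithm, and the function name is hypothetical. The sketch uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def keyword_by_difference(name, other):
    """Return the parts of `name` that it does not share with `other`."""
    matcher = SequenceMatcher(None, name, other, autojunk=False)
    parts = [
        name[i1:i2]
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("replace", "delete")  # spans of `name` absent from `other`
    ]
    return "".join(parts)


keyword_a001 = keyword_by_difference("XYZ surgery", "XUZ surgery")
keyword_a008 = keyword_by_difference("XUZ surgery", "XYZ surgery")
```

With many historical names, a real pipeline would need to decide which names to compare against each other; this sketch only covers the pairwise case from the example.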
The target character and the target word can each be set according to requirements or by the word segmentation algorithm. Take, for example, the historical surgical name "ABCDEFGHIJKLMN", where A, B, C, D, E, F, G, H, I, J, K, L, M, N each represent a Chinese character and AB, CD, EFG, HI, JK, LMN each represent a word. For a target character, the size of the character window may be preset to obtain the characters before or after the target character. Similarly, for a target word, the size of the word window may be preset to obtain the words before or after the target word.
As an example, if the target character is D and the character window size is 2, the characters preceding the target character within the window are, from front to back, B and C, and the characters following it are E and F. As another example, if the target word is HI and the word window size is 2, the words preceding the target word within the window are, from front to back, CD and EFG, and the words following it are JK and LMN.
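The two window examples above (target D and target HI, both with window size 2) can be reproduced with a short sketch. The helper name is hypothetical, and the segmentation into AB, CD, EFG, HI, JK, LMN is taken from the example rather than computed by a segmentation algorithm.

```python
def context_window(units, target, window):
    """Return (before, after): up to `window` units on each side of `target`.

    `units` may be single characters or segmented words.
    """
    i = units.index(target)
    before = units[max(0, i - window):i]
    after = units[i + 1:i + 1 + window]
    return before, after


name = "ABCDEFGHIJKLMN"
characters = list(name)                          # character-level units
words = ["AB", "CD", "EFG", "HI", "JK", "LMN"]   # word-level units from the example

char_context = context_window(characters, "D", 2)
word_context = context_window(words, "HI", 2)
```

The same helper serves both granularities because only the list of units changes; near the start or end of a name the window is simply truncated.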
As a possible implementation, a dictionary may be established in advance, in which each Chinese character corresponds to an independent serial number. The above basic features can then be represented and applied in the form of serial numbers.
Step 3032: obtain the feature relation feature value $w_k$ of the kth dimension by using the basic features of the m dimensions.
The basic feature of the kth dimension may be the basic feature of any one of the m dimensions, i.e. k = 1, 2, …, m. The feature relation feature value of the kth dimension characterizes the feature relation between the basic feature of the kth dimension and the basic features of the other dimensions. Formula (1) shows how the feature relation feature value $w_k$ of the kth dimension of the historical surgical name is obtained.
In formula (1), u is a preset correlation coefficient, and the score term is the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension, where p takes values from 1 to m and p is not equal to k. That is, the basic feature of the kth dimension is different from the basic feature of the pth dimension.
In a specific implementation, the basic feature of each dimension can be expressed in vector form, and the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension is then obtained through the Pearson calculation formula, the Spearman calculation formula, or the chi-square test method. If the Pearson calculation formula is adopted, the resulting Pearson correlation coefficient is referred to as the correlation score in this embodiment; if the Spearman calculation formula is adopted, the resulting Spearman correlation coefficient is referred to as the correlation score in this embodiment.
It will be appreciated that the manner of obtaining the relevance score in practical applications is not limited to the several examples above, and the implementation of obtaining the relevance score in this step is not specifically limited herein.
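As a rough illustration of step 3032, the sketch below computes Pearson correlation scores between feature vectors and combines them into one value per dimension. Formula (1) is not legible in this text, so the aggregation used here, u multiplied by the sum of the scores over all p not equal to k, is purely an assumption and may differ from the patent's formula. The function names and the sample vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors.

    Raises ZeroDivisionError if either vector has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def feature_relation_values(features, u, score=pearson):
    """ASSUMED aggregation: w_k = u * sum of score(feature_k, feature_p) for p != k."""
    m = len(features)
    return [
        u * sum(score(features[k], features[p]) for p in range(m) if p != k)
        for k in range(m)
    ]


# Hypothetical vectorized basic features for m = 3 dimensions.
features = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
w = feature_relation_values(features, u=0.5)
```

Passing the scoring function as a parameter leaves room for a Spearman or chi-square based score without changing the aggregation step.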
Step 3033: department information and/or registration information of the historical operation name are obtained from the hospital information system HIS, and category vectors of the historical operation name are obtained by using the department information and/or registration information.
It should be noted that the server may also obtain department information and/or registration information of the historical surgical name from the HIS. It will be appreciated that the department information and/or registration information reflects, to some extent, the body part to which the surgery named by the doctor corresponds. For example, if the registration information shows that the patient registered with ophthalmology, the disease diagnosis stage will not conclude that the patient has a gastric ulcer or tinea pedis. It can be seen that the department information and/or registration information of a historical surgical name helps to exclude matches with surgical names of unrelated categories, thereby improving matching accuracy.
For convenience of processing, department information and/or registration information of the historical operation name may be expressed in a vector form in this embodiment. For example, a category vector of the history surgery name is formed, in which a first element is used to represent department information of the history surgery name; the second element is used to represent registration information for the historical surgical name.
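A minimal sketch of such a two-element category vector follows. The code tables, the code values, and the use of 0 for missing information are all assumptions made for illustration; real codes would come from the HIS.

```python
# Hypothetical code tables; the entries and numbers are illustrative only.
DEPARTMENT_CODE = {"ophthalmology": 1, "gynaecology": 2, "respiratory medicine": 3}
REGISTRATION_CODE = {"ophthalmology clinic": 1, "gynaecology clinic": 2}
UNKNOWN = 0  # assumed placeholder when a piece of information is missing


def category_vector(department=None, registration=None):
    """Return [department element, registration element] for one surgical name."""
    return [
        DEPARTMENT_CODE.get(department, UNKNOWN),
        REGISTRATION_CODE.get(registration, UNKNOWN),
    ]


c_full = category_vector("ophthalmology", "ophthalmology clinic")
c_partial = category_vector("gynaecology")  # registration information unavailable
```

Keeping a fixed position for each element lets the vector cover the "and/or" case: either element can be absent without changing the vector's length.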
Step 3034: obtain the sample feature vector group by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions and the category vector.
See equation (2), which shows an example form of a sample feature vector set:
$R = \{W, C\}$ (2)
In formula (2), C is the category vector of the historical surgical name and W is the feature relation feature vector of the historical surgical name. W is expressed as follows:
$W = \{w_1, w_2, \ldots, w_m\}$ (3)
As can be seen from steps 3031-3034, in this embodiment each sample feature vector group R contains two kinds of information about a historical surgical name: one is represented by the feature relation feature vector W of the historical surgical name, and the other by its category vector C.
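Formulas (2) and (3) can be mirrored by a small container type. This is only a sketch: the class name, the tuple representation, and the `flatten` helper (one assumed way to feed R to a model as a single vector) are not specified in the text.

```python
from typing import NamedTuple, Tuple


class FeatureVectorGroup(NamedTuple):
    """R = {W, C}: feature relation feature vector W plus category vector C."""
    W: Tuple[float, ...]
    C: Tuple[int, ...]


def build_group(w_values, category):
    """Assemble R from the m feature relation values and the category vector."""
    return FeatureVectorGroup(tuple(w_values), tuple(category))


def flatten(group):
    """Assumed model input layout: W followed by C in one flat list."""
    return list(group.W) + list(group.C)


R = build_group([0.2, -0.1, 0.4], [1, 1])  # illustrative values, m = 3
```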
Step 304: obtain the labels of the sample feature vector group by using the historical surgical names, the correspondence between the historical surgical names and the national standard surgical names, and the value range classification tree.
It should be noted that, in this embodiment, the label of the sample feature vector group includes at least the following two parts: the name index value of the national standard surgical name corresponding to the historical surgical name, and the node index values corresponding to that national standard surgical name at each layer of the value range classification tree. The name index value and the node index values included in the label are described below in turn.
In practical application, the national standard operation names corresponding to the historical operation names can be determined by utilizing the historical operation names and the corresponding relations. Because the value field classification tree is established after a plurality of national standard operation names are classified according to the parts of the human body or the animal body, the value field classification tree necessarily comprises nodes representing the national standard operation names corresponding to the historical operation names. In practical application, each node has a name index value, and the name index value can be used for indexing and obtaining the national standard operation name represented by the node.
In this embodiment, the nodes corresponding to the national standard surgical names at each layer of the value domain classification tree may be determined by using the national standard surgical names corresponding to the historical surgical names and the value domain classification tree. It should be noted that, the nodes corresponding to the national standard operation names at each level of the value domain classification tree include the node representing the national standard operation name and each ancestor node (except the 1 st level root node) of the node.
For ease of understanding, reference is made herein to the value range classification tree described with respect to fig. 2. Assume node 243 represents a national standard surgical name corresponding to a historical surgical name. Ancestor nodes of node 243 include node 236 and node 223. It will be appreciated that the parent-child relationship between nodes reveals the classification relationship between the national standard surgical names represented by the nodes.
In practical applications, each node has a node index value. The node index values of the nodes with father-son relations have corresponding relations, so that the node index values can be used for indexing and determining the classification condition of the national standard operation names corresponding to the historical operation names in the value domain classification tree.
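Step 304 as a whole can be sketched as a chain of lookups. The ancestor chain 243, 236, 223 follows the fig. 2 example; everything else is assumed for illustration, including the pairing of B001 with node 243, the name index value 1001, the root index 211, and all table and function names.

```python
from typing import NamedTuple, Tuple


class Label(NamedTuple):
    name_index: int                 # indexes the national standard surgical name
    node_indices: Tuple[int, ...]   # one node index value per layer below the root


# Hypothetical lookup tables.
CORRESPONDENCE = {"A001": "B001"}         # historical name -> national standard name
NODE_OF_NAME = {"B001": 243}              # national standard name -> tree node (assumed)
NAME_INDEX_OF_NODE = {243: 1001}          # node -> name index value (assumed)
PARENT = {223: 211, 236: 223, 243: 236}   # child -> parent; root index 211 assumed
ROOT = 211


def build_label(historical_name):
    """Build the training label for one historical surgical name."""
    node = NODE_OF_NAME[CORRESPONDENCE[historical_name]]
    name_index = NAME_INDEX_OF_NODE[node]
    path = []
    current = node
    while current != ROOT:          # collect the node and its ancestors, skipping the root
        path.append(current)
        current = PARENT[current]
    return Label(name_index, tuple(reversed(path)))


label_a001 = build_label("A001")
```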
By performing the above steps 301-304, a sample feature vector set for training a model is obtained step by step, and labels of the sample feature vector set are obtained. And then training the model by using the characteristic vector group with the label, so that the model has the function of accurately matching the value range data in the training process.
Step 305: train the model to be trained by using the labeled sample feature vector group and judge whether a preset end condition is met; if so, execute step 306; if not, repeat step 305.
In practical applications, the end condition for model training may be set. As an example, the preset end condition may be that the number of training iterations reaches a preset number. As another example, the preset end condition may also be that the value of the objective function reaches below a preset threshold.
It can be understood that if the preset end condition is met, the accuracy of matching the value range data by the model is indicated to meet the actual requirement of matching the value range data. And the value domain data matching can be actually performed by applying the data matching model obtained through training. Otherwise, the matching accuracy of the value range data still does not meet the actual requirement, and training needs to be continued.
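The loop between steps 305 and 306 with the two example end conditions might look like the sketch below. The function names are hypothetical, and the toy objective (x - 3)^2 minimized by gradient descent merely stands in for the real model and its objective function.

```python
def train(step, max_iterations, loss_threshold):
    """Repeat `step()` until an end condition holds (max_iterations must be >= 1).

    `step()` performs one training iteration and returns the objective value.
    Returns (iterations run, last objective value, which condition ended training).
    """
    for iteration in range(1, max_iterations + 1):
        loss = step()
        if loss < loss_threshold:          # objective fell below the preset threshold
            return iteration, loss, "threshold"
    return max_iterations, loss, "max_iterations"  # preset iteration count reached


def make_toy_step(learning_rate=0.1):
    """Toy stand-in for one training iteration: gradient descent on (x - 3)^2."""
    state = {"x": 0.0}

    def step():
        gradient = 2.0 * (state["x"] - 3.0)
        state["x"] -= learning_rate * gradient
        return (state["x"] - 3.0) ** 2

    return step


iterations, final_loss, reason = train(make_toy_step(), max_iterations=100, loss_threshold=1e-3)
```

Checking both conditions in one loop guarantees termination even when the objective never reaches the threshold.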
Step 306: model training is stopped and a data matching model is obtained.
It will be appreciated that the parameters inside the model at the moment training stops are critical to the model's data matching. These parameters can serve as the model's internal parameters when it is later put to actual use. They may be stored, and then recalled and loaded when the model is actually applied.
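Storing and reloading the parameters could be as simple as the following sketch, which assumes the parameters fit in a JSON-serializable dictionary. The storage format, the file name, and the function names are assumptions; a deep learning framework would normally supply its own checkpoint mechanism.

```python
import json


def save_parameters(params, path):
    """Write the trained model parameters to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


def load_parameters(path):
    """Read previously stored model parameters back from `path`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Illustrative parameter values only.
trained_params = {"weights": [0.1, -0.2, 0.3], "bias": 0.05}
```

A typical use would call `save_parameters(trained_params, "model_params.json")` once training stops and `load_parameters("model_params.json")` before matching.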
From the above description, after the model is trained, the model can be used for actually performing value range data matching. That is, the model training process described in steps 301-306 above occurs before steps 101-103.
In the previous embodiments, it was described that, during model application, what is actually input to the model is not the surgical name to be matched itself but its vectorized representation, i.e. the feature vector group to be matched. See step 102 for details.
In order to ensure that the trained model matches its inputs in practice, the feature vector group to be matched is obtained in a manner similar to that used for the sample feature vector group in the foregoing embodiment. An exemplary implementation of step 102 is described below in conjunction with fig. 5.
Referring to fig. 5, a flowchart of an implementation manner of obtaining a feature vector set to be matched according to an embodiment of the present application is provided.
Step 1021: splitting the surgical names to be matched, and obtaining the basic characteristics of m dimensions corresponding to the surgical names to be matched.
The surgical name to be matched described in this step specifically refers to the surgical name to be matched in the value range data to be matched obtained in step 101. The m-dimensional basic features obtained by splitting in this step may include, but are not limited to, the following basic features:
keywords of the surgical name to be matched, a target character, characters before or after the target character within a preset character window, a target word, and words before or after the target word within a preset word window.
Step 1022: obtain the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched.
Note that k = 1, 2, …, m. In a specific implementation of this step, the Pearson calculation formula, the Spearman calculation formula, or the chi-square test method can be used to obtain the correlation scores between the basic feature of the kth dimension of the surgical name to be matched and the basic features of the other dimensions; the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched is then obtained from the preset correlation coefficient and the correlation scores.
Formula (4) shows how the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched is obtained.
In formula (4), u is a preset correlation coefficient, and the score term is the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension of the surgical name to be matched, where p takes values from 1 to m and p is not equal to k. That is, the basic feature of the kth dimension of the surgical name to be matched is different from its basic feature of the pth dimension.
Step 1023: obtain department information and/or registration information of the surgical name to be matched from the HIS, and obtain the category vector of the surgical name to be matched by using this department information and/or registration information.
The implementation of obtaining the category vector of the surgical name to be matched is substantially the same as that of obtaining the category vector of a historical surgical name; for details, refer to the description of step 3033.
Step 1024: obtain the feature vector group to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched.
The implementation of this step is substantially the same as the implementation of obtaining the sample feature vector set in step 3034, and for the implementation of this step, reference may be made to the description of step 3034, which is not repeated here.
The above is an example implementation description of obtaining the feature vector set to be matched. By executing steps 1021-1024, the consistency of the acquisition modes of the feature vector group to be matched and the sample feature vector group is ensured, so that the trained data matching model is ensured to have better adaptability to the feature vector group to be matched input into the model during actual application, and the model is further convenient to output a matching result close to the label quality of the sample feature vector group.
The present application addresses the problem that surgical names in a large value range are too varied to be matched all at once: the surgical names are divided into different classes by the value range classification tree (structure tree), and a supervised data matching model then matches within a small range. This effectively prevents highly similar surgical names from interfering with the selection of the correct surgical name and greatly improves matching accuracy.
Unifying and standardizing the value range data (namely surgical names) of a large value range is the prominent problem of large value range matching, and its defining characteristic is the huge variety of national standard surgical names. The above embodiments match surgical names by fully combining two means. First, the keywords, words, and positional relations of the value range data, together with the association relations among them, provide an effective vectorization scheme for deep learning. Second, the surgical names are split by part of the human or animal body to form the value range classification tree, and value range matching is performed layer by layer until a final matching result with higher accuracy is formed. It can be understood that, because this embodiment ties each historical surgical name or surgical name to be matched to every layer of the value range classification tree when the samples are obtained, the accuracy of the matching result is greatly improved.
Based on the value range data matching method provided by the foregoing embodiment, the present application further provides a value range data matching device. The following description is made with reference to the examples and the accompanying drawings.
Device embodiment
Referring to fig. 6, the structure of a value range data matching device provided in the embodiment of the present application is shown. As shown in fig. 6, the value range data matching device provided in this embodiment includes:
a data acquisition module 601, configured to obtain value range data to be matched;
the data processing module 602 is configured to process the surgical name to be matched in the value range data to be matched to obtain a feature vector set to be matched;
the data matching module 603 is configured to obtain a matching result by using the data matching model and the feature vector set to be matched.
The data matching model described in the embodiment is obtained after training by using labeled sample feature vector sets in advance; the sample feature vector set is a vectorized representation of historical surgical names; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and node index values corresponding to all layers of the value domain classification tree of the national standard operation name; the value domain classification tree is a structural tree for classifying national standard operation names according to the parts of human bodies or animal bodies.
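The division of the device into modules 601 to 603 can be pictured as three pluggable callables wired in sequence. This is a structural sketch only: the class name, the callable signatures, the `surgical_name` key, and the stub modules are all hypothetical and do not reflect an actual implementation of the device.

```python
from typing import Any, Callable, Dict, List


class ValueRangeMatchingDevice:
    """Chains data acquisition (601), data processing (602), and data matching (603)."""

    def __init__(
        self,
        acquire: Callable[[], Dict[str, Any]],
        process: Callable[[str], List[float]],
        match: Callable[[List[float]], Dict[str, Any]],
    ) -> None:
        self.acquire = acquire
        self.process = process
        self.match = match

    def run(self) -> Dict[str, Any]:
        data = self.acquire()                           # value range data to be matched
        features = self.process(data["surgical_name"])  # feature vector group to be matched
        return self.match(features)                     # matching result


# Stub modules for illustration; a real device would plug in the trained model.
device = ValueRangeMatchingDevice(
    acquire=lambda: {"surgical_name": "XYZ surgery"},
    process=lambda name: [float(len(name))],
    match=lambda features: {"name_index": 1001, "node_indices": [222, 235]},
)
result = device.run()
```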
The data matching model is obtained after training a labeled sample feature vector group, and the label comprises a name index value of a national standard operation name corresponding to a historical operation name and node index values corresponding to all layers of a value domain classification tree of the national standard operation name, so that the data matching model obtained through training has the function of matching the national standard operation name according to the operation name which is not national standard, and can determine the specific classification condition of the national standard operation name in the value domain classification tree and reflect the specific classification condition through the node index values. The matching result obtained by the data matching model can be displayed in the same or similar form as the label, so that the national standard operation name matched with the operation name to be matched can be obtained by indexing by using the matching result, and the specific classification condition of the matched national standard operation name in the value domain classification tree is determined.
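To make the label structure concrete, the sketch below shows one possible way to hold a label and to use a matching result of the same shape as an index. It is an illustration under stated assumptions: the `Label` class, the two lookup tables, and the example names in them are hypothetical and are not taken from the national standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Label:
    name_index: int                # index of the national standard surgical name
    node_indices: Tuple[int, ...]  # one node index per layer of the classification tree


# Hypothetical lookup tables; real ones would come from the standard name
# list and from the value range classification tree.
STANDARD_NAMES = ["partial hepatectomy", "partial gastrectomy"]
LAYER_NODES = [
    ["digestive system"],   # layer 1
    ["liver", "stomach"],   # layer 2
]


def decode(label: Label) -> Tuple[str, List[str]]:
    """Turn index values back into a standard name and its path in the tree."""
    name = STANDARD_NAMES[label.name_index]
    path = [LAYER_NODES[layer][idx] for layer, idx in enumerate(label.node_indices)]
    return name, path


name, path = decode(Label(name_index=1, node_indices=(0, 1)))
```

Here `name` is the matched standard name and `path` is its classification from the coarsest layer to the finest.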
In the application, the automatic matching of the value domain data is carried out by utilizing the pre-trained data matching model, and compared with a manual comparison matching mode, the manual labor is effectively saved, and the matching efficiency is improved. In addition, because the value domain classification tree is divided according to the parts of the human body or the animal body, even if the surgical name to be matched is similar to other non-national standard surgical names, the surgical names with different names (namely different classifications) aiming at the parts can be effectively distinguished according to the node index value, so that the matching error is avoided. Therefore, compared with a matching scheme of fuzzy query and word segmentation comparison, the matching anti-interference performance can be improved, and the matching accuracy of the value domain data is further improved.
Optionally, in an actual application, the data matching device provided in this embodiment may further include a model training module, which gives the device a model training function. Fig. 7 is a schematic structural diagram of another value range data matching device; as fig. 7 shows, a model training module 701 is added to the device structure shown in fig. 6.
Referring to fig. 8, the structure of a model training module according to an embodiment of the present application is shown.
As shown in fig. 8, the model training module 701 may specifically include:
a value range classification tree obtaining unit 7011, configured to classify a plurality of national standard surgical names included in an international disease classification standard according to a part of a human body or an animal body, to obtain a value range classification tree; each layer in the value domain classification tree at least comprises one node;
a surgical name acquisition unit 7012, configured to acquire the historical surgical name;
a correspondence acquiring unit 7013, configured to obtain a correspondence between the historical surgical name and a national standard surgical name;
a sample feature vector set obtaining unit 7014, configured to process the historical surgical name to obtain the sample feature vector set;
a label obtaining unit 7015, configured to obtain the label by using the historical surgical name, the correspondence, and the value range classification tree;
a model training unit 7016, configured to train the model to be trained by using the labeled sample feature vector set, and to stop training and obtain the data matching model when a preset end condition is satisfied.
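The roles of units 7011, 7013, and 7015 can be sketched as follows. This is a minimal illustration, assuming each national standard name is already annotated with a body-part path from coarse to fine; the dictionary contents, the historical names, and the function names `build_layers` and `make_label` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: each national standard name with its body-part path.
STANDARD_PATHS: Dict[str, Tuple[str, ...]] = {
    "partial hepatectomy": ("digestive system", "liver"),
    "partial gastrectomy": ("digestive system", "stomach"),
    "lobectomy of lung": ("respiratory system", "lung"),
}


def build_layers(paths: Dict[str, Tuple[str, ...]]) -> List[List[str]]:
    """Collect the distinct nodes of each layer in first-seen order (role of unit 7011)."""
    depth = max(len(p) for p in paths.values())
    layers: List[List[str]] = [[] for _ in range(depth)]
    for path in paths.values():
        for layer, node in enumerate(path):
            if node not in layers[layer]:
                layers[layer].append(node)
    return layers


def make_label(history_name: str,
               correspondence: Dict[str, str],
               names: List[str],
               paths: Dict[str, Tuple[str, ...]],
               layers: List[List[str]]) -> Tuple[int, Tuple[int, ...]]:
    """Build (name index, node index per layer) for one historical name (role of unit 7015)."""
    standard = correspondence[history_name]
    name_index = names.index(standard)
    node_indices = tuple(layers[i].index(node) for i, node in enumerate(paths[standard]))
    return name_index, node_indices


names = list(STANDARD_PATHS)
layers = build_layers(STANDARD_PATHS)
# Hypothetical correspondence of the kind unit 7013 would supply.
correspondence = {
    "liver part removal": "partial hepatectomy",
    "lung lobe removal": "lobectomy of lung",
}
label = make_label("lung lobe removal", correspondence, names, STANDARD_PATHS, layers)
```

The sketch flattens the tree into one node list per layer, so two nodes with the same name under different parents would share an index; a fuller implementation would key nodes by their whole path.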
Optionally, the sample feature vector set obtaining unit 7014 may specifically include:
a basic feature first acquisition subunit, configured to split the historical surgical name to obtain basic features of m dimensions corresponding to the historical surgical name;
a feature relation feature value first obtaining subunit, configured to obtain the feature relation feature value $w_k$ of the kth dimension by using the basic features of the m dimensions, where k = 1, 2, …, m;
a category vector first obtaining subunit, configured to obtain department information and/or registration information of the historical surgical name from a hospital information system (HIS), and to obtain a category vector of the historical surgical name by using the department information and/or registration information;
a sample feature vector set obtaining subunit, configured to obtain the sample feature vector set by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions and the category vector.
Optionally, the feature relation feature value first obtaining subunit is specifically configured to obtain a correlation score between the basic feature of the kth dimension and the basic feature of each other dimension by using the Pearson calculation formula, the Spearman calculation formula, or a chi-square test method, and to obtain the feature relation feature value $w_k$ of the kth dimension by using a preset correlation coefficient and the correlation score.
Optionally, the basic feature first acquisition subunit is specifically configured to split the historical surgical name to obtain a keyword of the historical surgical name, a target word, a word before or after the target word in a preset word window, a target word, and a word before or after the target word in a preset word window.
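The correlation step can be illustrated with the Pearson formula. The sketch below rests on two assumptions that the text does not specify: each basic-feature dimension has a numeric encoding observed over a batch of surgical names, and $w_k$ is the preset coefficient multiplied by the mean of the scores between dimension k and every other dimension. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant feature as uncorrelated
    return cov / (sx * sy)


def feature_relation_values(columns: Sequence[Sequence[float]],
                            coeff: float = 1.0) -> List[float]:
    """Assumed aggregation: w_k = coeff * mean of scores between dimension k and the others."""
    m = len(columns)
    if m < 2:
        raise ValueError("at least two feature dimensions are needed")
    values = []
    for k in range(m):
        scores = [pearson(columns[k], columns[j]) for j in range(m) if j != k]
        values.append(coeff * sum(scores) / len(scores))
    return values


# Each inner list is one basic-feature dimension encoded over four names (made-up numbers).
w = feature_relation_values([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
```

Swapping `pearson` for a Spearman or chi-square score would leave the aggregation unchanged, which is why the scoring function is kept separate.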
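The window-based features can be sketched as follows, assuming the surgical name has already been segmented into tokens and the position of the target word is known. The function name `window_features`, the token list, and the window size are hypothetical; keyword selection is outside the scope of the sketch.

```python
from typing import Dict, List


def window_features(tokens: List[str], target_index: int, window: int = 1) -> Dict[str, object]:
    """Return the target word and the words before and after it within the window."""
    before = tokens[max(0, target_index - window):target_index]
    after = tokens[target_index + 1:target_index + 1 + window]
    return {"target": tokens[target_index], "before": before, "after": after}


# Made-up, pre-segmented surgical name.
tokens = ["laparoscopic", "partial", "liver", "resection"]
feats = window_features(tokens, target_index=2, window=1)
```

Clamping the left slice at zero keeps the window valid when the target word is near the start of the name.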
The present application addresses the problem that surgical names in a large value range cannot be matched in a single pass because of their many types: the value range classification tree (a structure tree) divides the surgical names into different classes, and a supervised data matching model performs matching within a small range. This effectively prevents closely similar surgical names from interfering with the correct match and greatly improves matching accuracy.
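The idea of matching within a small range can be illustrated by restricting the candidates to one branch of the tree before scoring them. The sketch below is not the supervised model described here: it swaps in plain string similarity from Python's `difflib` as a stand-in scorer, and the candidate table and the function name `match_within_branch` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Hypothetical standard names with their body-part paths.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "partial hepatectomy": ("digestive system", "liver"),
    "partial gastrectomy": ("digestive system", "stomach"),
    "partial nephrectomy": ("urinary system", "kidney"),
}


def match_within_branch(query: str, node_path: Tuple[str, ...]) -> str:
    """Pick the best candidate among the names whose path starts with node_path."""
    in_branch = [n for n, p in CANDIDATES.items() if p[:len(node_path)] == node_path]
    if not in_branch:
        raise LookupError(f"no standard names under {node_path!r}")
    # Stand-in scorer: string similarity instead of the trained model.
    return max(in_branch, key=lambda n: SequenceMatcher(None, query, n).ratio())


best = match_within_branch("partial resection", ("digestive system", "liver"))
```

Because the three candidate names differ by only a few characters, scoring them all at once invites confusion; filtering by branch first removes the look-alikes that belong to other body parts.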
Optionally, the data processing module 602 specifically includes:
a basic feature second acquisition subunit, configured to split the surgical name to be matched to obtain basic features of m dimensions corresponding to the surgical name to be matched;
a feature relation feature value second obtaining subunit, configured to obtain the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched, where k = 1, 2, …, m;
a category vector second obtaining subunit, configured to obtain department information and/or registration information of the surgical name to be matched from the HIS, and to obtain a category vector of the surgical name to be matched by using the department information and/or registration information of the surgical name to be matched;
a feature vector set to be matched obtaining subunit, configured to obtain the feature vector set to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched.
As described above, the device obtains the feature vector set to be matched in the same way as the sample feature vector set. This ensures that, in actual application, the trained data matching model adapts well to the feature vector set to be matched that is input to it, and helps the model output matching results whose quality approaches that of the labels of the sample feature vector sets.
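One simple way to keep the two acquisition paths consistent is to route both through the same function. The sketch below assumes that the category vector is a one-hot encoding of the department and that the feature vector is the concatenation of the feature relation values with that encoding; neither choice is stated in the text, and the department list, the numeric values, and the function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical department vocabulary, as might be read from the HIS.
DEPARTMENTS = ["general surgery", "thoracic surgery", "urology"]


def category_vector(department: str) -> List[float]:
    """Assumed encoding: one-hot over the department vocabulary."""
    return [1.0 if department == d else 0.0 for d in DEPARTMENTS]


def build_feature_vector(relation_values: Sequence[float], department: str) -> List[float]:
    """Concatenate the m feature relation values with the category vector."""
    return list(relation_values) + category_vector(department)


# The same function serves training samples (w values) and queries (t values).
sample_vec = build_feature_vector([0.2, -0.1, 0.7], "urology")
query_vec = build_feature_vector([0.1, 0.0, 0.5], "general surgery")
```

Sharing one function means any change to the encoding applies to training and matching together, so the two vector layouts cannot drift apart.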
Based on the value range data matching method and device provided by the foregoing embodiment, the embodiment of the application further provides a computer readable storage medium.
The storage medium stores a program which, when executed by a processor, implements some or all of the steps in the value range data matching method protected by the foregoing method embodiments of the present application.
The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Based on the value range data matching method, the device and the storage medium provided by the foregoing embodiments, the embodiment of the application provides a processor. The processor is configured to execute a program, where when the program is executed, part or all of the steps in the value range data matching method protected by the foregoing method embodiment are executed.
Based on the storage medium and the processor provided in the foregoing embodiments, the present application further provides a value range data matching device.
Referring to fig. 9, the hardware structure diagram of the value range data matching device provided in this embodiment is shown.
As shown in fig. 9, the value range data matching device includes: memory 901, processor 902, communication bus 903, and communication interface 904.
The memory 901 stores a program that can run on the processor; when the program is executed, some or all of the steps of the value range data matching method provided in the foregoing method embodiments of the present application are implemented. The memory 901 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
In this device, the processor 902 and the memory 901 exchange signaling, logic instructions, and the like through the communication bus 903. The device can communicate with other devices via the communication interface 904.
The trained data matching model can match a non-standard surgical name to a national standard surgical name, and can determine where that national standard surgical name is classified in the value range classification tree, expressing the classification through the node index values. Therefore, when the program executes the method, the matching result obtained with the data matching model can be presented in the same or a similar form as the label, so the matching result can be used as an index to retrieve the national standard surgical name matched to the surgical name to be matched and to determine its classification in the value range classification tree.
It should be noted that the embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted. The device and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A value range data matching method, comprising:
obtaining value range data to be matched;
processing the surgical name to be matched in the value range data to be matched to obtain a feature vector group to be matched;
obtaining a matching result by using the data matching model and the feature vector group to be matched; the data matching model is obtained after training by using a sample feature vector group with a label in advance; the sample feature vector group is a vectorization representation of a historical operation name, and the label comprises a name index value of a national standard operation name corresponding to the historical operation name and node index values corresponding to all layers of a value domain classification tree of the national standard operation name; the value domain classification tree is a structural tree for classifying national standard operation names according to the parts of human bodies or animal bodies;
the processing of the surgical name to be matched in the value domain data to be matched to obtain the feature vector group to be matched specifically comprises the following steps:
splitting the surgical name to be matched to obtain basic features of m dimensions corresponding to the surgical name to be matched; obtaining the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched, wherein k = 1, 2, …, m;
obtaining department information and/or registration information of the surgical name to be matched from a hospital information system (HIS), and obtaining a category vector of the surgical name to be matched by using the department information and/or registration information of the surgical name to be matched;
obtaining the feature vector group to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched;
the data matching model is obtained, which concretely comprises the following steps:
classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
acquiring the historical operation name and the corresponding relation between the historical operation name and the national standard operation name from a hospital information system HIS;
processing the historical operation name to obtain the sample feature vector group; obtaining the label by using the history operation name, the corresponding relation and the value range classification tree;
training a model to be trained by using the sample feature vector group with the tag, and stopping training and obtaining the data matching model when a preset ending condition is met;
the processing of the historical operation name to obtain the sample feature vector group specifically comprises:
splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name; obtaining the feature relation feature value $w_k$ of the kth dimension of the historical operation name by using the basic features of the m dimensions corresponding to the historical operation name;
obtaining department information and/or registration information of the historical operation name from a hospital information system (HIS), and obtaining a category vector of the historical operation name by using the department information and/or registration information of the historical operation name;
obtaining the sample feature vector group by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions of the historical operation name and the category vector of the historical operation name.
2. The method according to claim 1, wherein obtaining the feature relation feature value $w_k$ of the kth dimension of the historical operation name by using the basic features of the m dimensions corresponding to the historical operation name specifically comprises:
obtaining a correlation score between the basic feature of the kth dimension of the historical operation name and the basic feature of each other dimension by using a Pearson calculation formula, a Spearman calculation formula, or a chi-square test method;
obtaining the feature relation feature value $w_k$ of the kth dimension of the historical operation name by using a preset correlation coefficient and the correlation score.
3. The method according to claim 1, wherein the splitting of the historical operation name to obtain the basic features of m dimensions corresponding to the historical operation name specifically comprises:
splitting the historical operation name to obtain a keyword of the historical operation name, a target word, a word before or after the target word in a preset word window, a target word, and a word before or after the target word in a preset word window.
4. A value range data matching device, comprising:
the data acquisition module is used for acquiring the value range data to be matched;
the data processing module is used for processing the surgical name to be matched in the value range data to be matched to obtain a feature vector group to be matched;
the data matching module is used for obtaining a matching result by utilizing the data matching model and the feature vector group to be matched; the data matching model is obtained after training by using a sample feature vector group with a label in advance; the sample feature vector group is a vectorization representation of a historical operation name, and the label comprises a name index value of a national standard operation name corresponding to the historical operation name and node index values corresponding to all layers of a value domain classification tree of the national standard operation name; the value domain classification tree is a structural tree for classifying national standard operation names according to the parts of human bodies or animal bodies;
the data processing module specifically comprises:
a basic feature second acquisition subunit, configured to split the surgical name to be matched to obtain basic features of m dimensions corresponding to the surgical name to be matched;
a feature relation feature value second obtaining subunit, configured to obtain the feature relation feature value $t_k$ of the kth dimension of the surgical name to be matched by using the basic features of the m dimensions corresponding to the surgical name to be matched, wherein k = 1, 2, …, m;
a category vector second obtaining subunit, configured to obtain department information and/or registration information of the surgical name to be matched from a hospital information system (HIS), and to obtain a category vector of the surgical name to be matched by using the department information and/or registration information of the surgical name to be matched;
a feature vector group to be matched obtaining subunit, configured to obtain the feature vector group to be matched by using the feature relation feature values $t_1, t_2, \ldots, t_m$ of the m dimensions of the surgical name to be matched and the category vector of the surgical name to be matched;
the device further comprises a model training module, which specifically comprises:
a value domain classification tree acquisition unit, configured to classify a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree comprises at least one node;
a surgical name acquisition unit, configured to acquire the historical operation name;
the corresponding relation acquisition unit is used for acquiring the corresponding relation between the historical operation name and the national standard operation name;
the sample feature vector group acquisition unit is used for processing the historical operation name to acquire the sample feature vector group;
the label obtaining unit is used for obtaining the label by using the history operation name, the corresponding relation and the value range classification tree;
the model training unit is used for training the model to be trained by using the sample feature vector group with the label, and stopping training and obtaining the data matching model when a preset end condition is met;
the sample feature vector group acquisition unit specifically includes:
the basic feature first acquisition subunit is used for splitting the historical operation name to obtain m dimension basic features corresponding to the historical operation name;
a feature relation feature value first obtaining subunit, configured to obtain the feature relation feature value $w_k$ of the kth dimension of the historical operation name by using the basic features of the m dimensions corresponding to the historical operation name;
a category vector first obtaining subunit, configured to obtain department information and/or registration information of the historical operation name from a hospital information system (HIS), and to obtain a category vector of the historical operation name by using the department information and/or registration information of the historical operation name;
a sample feature vector group acquisition subunit, configured to obtain the sample feature vector group by using the feature relation feature values $w_1, w_2, \ldots, w_m$ of the m dimensions of the historical operation name and the category vector of the historical operation name.
5. The device according to claim 4, wherein the feature relation feature value first obtaining subunit is specifically configured to obtain a correlation score between the basic feature of the kth dimension of the historical operation name and the basic feature of each other dimension by using a Pearson calculation formula, a Spearman calculation formula, or a chi-square test method, and to obtain the feature relation feature value $w_k$ of the kth dimension of the historical operation name by using a preset correlation coefficient and the correlation score.
6. The device according to claim 4, wherein the basic feature first acquisition subunit is specifically configured to split the historical operation name to obtain a keyword of the historical operation name, a target word, a word before or after the target word in a preset word window, a target word, and a word before or after the target word in a preset word window.
7. A computer readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the value range data matching method as claimed in any one of claims 1 to 3.
8. A processor configured to run a computer program, the computer program when run performing the value range data matching method according to any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911222384.0A CN111128388B (en) | 2019-12-03 | 2019-12-03 | Value range data matching method and device and related products |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111128388A (en) | 2020-05-08 |
CN111128388B (en) | 2024-02-27 |
Family
ID=70497399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911222384.0A Active CN111128388B (en) | 2019-12-03 | 2019-12-03 | Value range data matching method and device and related products |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111128388B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818085B (en) * | 2021-01-28 | 2024-06-18 | 东软集团股份有限公司 | Value range data matching method and device, storage medium and electronic equipment |
CN113656467B (en) * | 2021-08-20 | 2023-07-25 | 北京百度网讯科技有限公司 | Method and device for sorting search results and electronic equipment |
CN113925607B (en) * | 2021-11-12 | 2024-02-27 | 上海微创医疗机器人(集团)股份有限公司 | Operation robot operation training method, device, system, medium and equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05165803A (en) * | 1991-12-16 | 1993-07-02 | Hitachi Ltd | Management system for data item designation |
CN104156415A (en) * | 2014-07-31 | 2014-11-19 | 沈阳锐易特软件技术有限公司 | Mapping processing system and method for solving problem of standard code control of medical data |
CN105069123A (en) * | 2015-08-13 | 2015-11-18 | 易保互联医疗信息科技(北京)有限公司 | Automatic coding method and system for Chinese surgical operation information |
CN105787282A (en) * | 2016-03-24 | 2016-07-20 | 国家卫生计生委统计信息中心 | Automatic standardization method and system for medical data dictionaries |
CN108182207A (en) * | 2017-12-15 | 2018-06-19 | 上海长江科技发展有限公司 | The intelligent coding method and system of Chinese surgical procedure based on participle network |
CN109542965A (en) * | 2018-11-07 | 2019-03-29 | 平安医疗健康管理股份有限公司 | A kind of data processing method, electronic equipment and storage medium |
CN110246592A (en) * | 2019-06-25 | 2019-09-17 | 山东健康医疗大数据有限公司 | Realize the mapping method and system of medical institutions' isomeric data codomain code standardization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||