CN111128388A - Value domain data matching method and device and related products - Google Patents

Value domain data matching method and device and related products

Info

Publication number
CN111128388A
Authority
CN
China
Prior art keywords
matched
name
operation name
value
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911222384.0A
Other languages
Chinese (zh)
Other versions
CN111128388B (en)
Inventor
冯仓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201911222384.0A priority Critical patent/CN111128388B/en
Publication of CN111128388A publication Critical patent/CN111128388A/en
Application granted granted Critical
Publication of CN111128388B publication Critical patent/CN111128388B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a value domain data matching method and device and a related product. The method obtains the operation name to be matched from the value domain data to be matched, processes the operation name to be matched to obtain a feature vector group to be matched, and obtains a matching result by using a pre-trained data matching model and the feature vector group to be matched. The trained data matching model can match national standard operation names to non-national-standard operation names, can determine how a national standard operation name is classified in the value domain classification tree, and expresses that classification through node index values. The matching result can therefore be used to obtain the national standard operation name matched with the operation name to be matched and to determine the classification of that name in the value domain classification tree. Compared with the prior art, this saves manual labor and improves matching efficiency. In addition, it improves the robustness of matching against interference and thereby the matching accuracy of value domain data.

Description

Value domain data matching method and device and related products
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a value domain data matching method, device, and related product.
Background
In recent years, with the progress of informatization in the medical field, the form in which medical data is presented has changed greatly. This has had a profound effect on the hospital information system (HIS) and the national health information system. In order to effectively collect, analyze and apply medical information such as regional disease incidence and treatment schemes, a regional platform can be established.
The medical data comprises a large amount of value range data. Some value range data have few types and a simple data organization and can be called small value range data, such as medical insurance types and patient sex; other value range data have many types and a complex data organization and can be called large value range data, such as operation names and disease names.
The area corresponding to the area platform usually includes a plurality of hospitals. Each hospital establishes its own database for storing its medical data, and the area platform obtains data from these databases and analyzes and applies it. However, the value range data stored in the hospitals' databases may be non-standard and non-uniform. As an example, hospital A names "laryngectomy" the "first surgery" in its database, while hospital B names "laryngectomy" the "second surgery" in its database. Without matching the operation names, it would be difficult to analyze and apply these value range data efficiently.
At present, in the medical field, schemes for matching value range data of a large value range in medical data include fuzzy query, word segmentation comparison and manual comparison, but the matching effect of the method using the fuzzy query or the word segmentation comparison is poor, and the manual comparison consumes a lot of manpower. Therefore, how to improve the accuracy and the matching efficiency of value range data matching becomes a technical problem which needs to be solved urgently for establishing and perfecting a medical area platform.
Disclosure of Invention
Based on the above problems, the application provides a value domain data matching method, a value domain data matching device and a related product, so as to improve the accuracy and the matching efficiency of value domain data matching.
The embodiment of the application discloses the following technical scheme:
in a first aspect, the present application provides a value domain data matching method, including:
obtaining value domain data to be matched;
processing the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched;
obtaining a matching result by utilizing a data matching model and the feature vector group to be matched; the data matching model is obtained by utilizing a labeled sample feature vector set to train in advance; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and a node index value corresponding to the national standard operation name in each layer of a value domain classification tree; the value domain classification tree is a structure tree for classifying national standard operation names according to the parts of the human body or the animal body.
Optionally, obtaining the data matching model specifically includes:
classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
obtaining historical operation names from a hospital information system (HIS), and obtaining the correspondence between the historical operation names and national standard operation names;
processing the historical operation name to obtain the sample feature vector group; obtaining the label by using the historical operation name, the corresponding relation and the value range classification tree;
and training the model to be trained by utilizing the sample characteristic vector group with the label, and stopping training and obtaining the data matching model when a preset finishing condition is met.
Optionally, processing the historical procedure name to obtain the sample feature vector group specifically includes:
splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name; obtaining a feature relationship feature value $w_k$ of the k-th dimension by using the m-dimensional basic features, where $k = 1, 2, \dots, m$;
acquiring department information and/or registration information of the historical operation name from a hospital information system HIS, and acquiring a category vector of the historical operation name by using the department information and/or registration information;
obtaining the sample feature vector group by using the feature relationship feature values $w_1, w_2, \dots, w_m$ of the m dimensions and the category vector.
Optionally, obtaining the feature relationship feature value $w_k$ of the k-th dimension by using the m-dimensional basic features specifically includes:
obtaining a correlation score between the basic feature of the k-th dimension and the basic features of the other dimensions by using the Pearson correlation formula, the Spearman correlation formula, or a chi-square test;
obtaining the feature relationship feature value $w_k$ of the k-th dimension by using a preset correlation coefficient and the correlation score.
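The correlation step above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: it assumes each basic feature has already been encoded as a numeric column over a set of operation names, it uses the Pearson coefficient, it takes the correlation score of dimension k to be the mean absolute correlation with the other dimensions, and it multiplies that score by a preset coefficient. The function names, the averaging rule and the coefficient value are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def feature_relation_values(columns, coefficient=1.0):
    """Return [w_1, ..., w_m] for m numeric feature columns.

    Assumption: the correlation score of dimension k is the mean absolute
    Pearson correlation between column k and every other column, and
    w_k = coefficient * score. The source does not fix this combination rule.
    """
    m = len(columns)
    values = []
    for k in range(m):
        others = [abs(pearson(columns[k], columns[j])) for j in range(m) if j != k]
        score = sum(others) / len(others) if others else 0.0
        values.append(coefficient * score)
    return values


# Hypothetical numeric encodings of m = 3 basic features over 4 operation names.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first column
    [1.0, 1.0, 1.0, 1.0],  # constant, so its correlations are taken as 0
]
w = feature_relation_values(columns, coefficient=0.5)
```

Swapping `pearson` for a rank-based (Spearman) coefficient or a chi-square statistic would follow the same pattern; only the scoring function changes.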
Optionally, the splitting the historical operation name to obtain m-dimensional basic features corresponding to the historical operation name specifically includes:
splitting the historical operation name to obtain a keyword of the historical operation name, a target word, a word before or after the target word within a preset word window, and the target word together with a word before or after it within the preset word window.
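The splitting described above can be sketched as follows, assuming the operation name has already been segmented into words and that the keyword and the position of the target word are supplied by the caller. The function name, the dictionary keys and the English tokens are hypothetical; the source does not say how words are segmented or how the keyword is chosen.

```python
def split_basic_features(tokens, keyword, target_index, window=1):
    """Collect basic features around one target word of a segmented name.

    `window` is the preset word window: how many words before and after
    the target word are taken as context.
    """
    target = tokens[target_index]
    before = tokens[max(0, target_index - window):target_index]
    after = tokens[target_index + 1:target_index + 1 + window]
    return {
        "keyword": keyword,
        "target": target,
        "before": before,
        "after": after,
        # the target word combined with each context word
        "target_with_before": [(b, target) for b in before],
        "target_with_after": [(target, a) for a in after],
    }


# Hypothetical segmented operation name.
tokens = ["partial", "laryngeal", "resection"]
features = split_basic_features(tokens, keyword="resection", target_index=1, window=1)
```

Each entry of the returned dictionary corresponds to one kind of basic feature; a real pipeline would still need to encode them numerically before any correlation analysis.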
Optionally, the processing the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched specifically includes:
splitting the operation name to be matched to obtain basic features of m dimensions corresponding to the operation name to be matched; obtaining a feature relationship feature value $t_k$ of the k-th dimension of the operation name to be matched by using the m-dimensional basic features corresponding to the operation name to be matched, where $k = 1, 2, \dots, m$;
acquiring department information and/or registration information of the operation name to be matched from the HIS, and acquiring a category vector of the operation name to be matched by utilizing the department information and/or registration information of the operation name to be matched;
obtaining the feature vector group to be matched by using the feature relationship feature values $t_1, t_2, \dots, t_m$ of the m dimensions of the operation name to be matched and the category vector of the operation name to be matched.
In a second aspect, the present application provides a value domain data matching apparatus, including:
the data acquisition module is used for acquiring value domain data to be matched;
the data processing module is used for processing the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched;
the data matching module is used for obtaining a matching result by utilizing a data matching model and the feature vector group to be matched; the data matching model is obtained by utilizing a labeled sample feature vector set to train in advance; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and a node index value corresponding to the national standard operation name in each layer of a value domain classification tree; the value domain classification tree is a structure tree for classifying national standard operation names according to the parts of the human body or the animal body.
Optionally, the apparatus further comprises: the model training module specifically comprises:
a value domain classification tree obtaining unit, configured to classify a plurality of national standard operation names included in the international disease classification standard according to a part of a human or animal body, so as to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
a surgical name acquisition unit for acquiring the historical surgical name;
the corresponding relation obtaining unit is used for obtaining the corresponding relation between the historical operation name and the national standard operation name;
the sample characteristic vector group acquisition unit is used for processing the historical operation name to acquire a sample characteristic vector group;
a label obtaining unit, configured to obtain the label by using the historical procedure name, the correspondence, and the value range classification tree;
and the model training unit is used for training the model to be trained by utilizing the sample characteristic vector group with the label, and stopping training and obtaining the data matching model when a preset finishing condition is met.
Optionally, the sample feature vector group obtaining unit may specifically include:
the first basic feature obtaining subunit is configured to split the historical surgical name, and obtain m-dimensional basic features corresponding to the historical surgical name;
a feature relationship feature value first obtaining subunit, configured to obtain a feature relationship feature value $w_k$ of the k-th dimension by using the m-dimensional basic features, where $k = 1, 2, \dots, m$;
the category vector first acquisition subunit is used for acquiring department information and/or registration information of the historical operation name from a hospital information system HIS, and acquiring a category vector of the historical operation name by utilizing the department information and/or registration information;
a sample feature vector group obtaining subunit, configured to obtain the sample feature vector group by using the feature relationship feature values $w_1, w_2, \dots, w_m$ of the m dimensions and the category vector.
Optionally, the feature relationship feature value first obtaining subunit is specifically configured to obtain a correlation score between the basic feature of the k-th dimension and the basic features of the other dimensions by using the Pearson correlation formula, the Spearman correlation formula, or a chi-square test, and to obtain the feature relationship feature value $w_k$ of the k-th dimension by using a preset correlation coefficient and the correlation score.
Optionally, the basic feature first obtaining subunit is specifically configured to split the historical operation name and obtain a keyword of the historical operation name, a target word, a word before or after the target word within a preset word window, and the target word together with a word before or after it within the preset word window.
Optionally, the data processing module specifically includes:
a basic feature second obtaining subunit, configured to split the operation name to be matched, and obtain m-dimensional basic features corresponding to the operation name to be matched;
a feature relationship feature value second obtaining subunit, configured to obtain a feature relationship feature value $t_k$ of the k-th dimension of the operation name to be matched by using the m-dimensional basic features corresponding to the operation name to be matched, where $k = 1, 2, \dots, m$;
a category vector second obtaining subunit, configured to acquire department information and/or registration information of the operation name to be matched from the HIS, and to obtain a category vector of the operation name to be matched by using the department information and/or registration information of the operation name to be matched;
a to-be-matched feature vector group obtaining subunit, configured to obtain the feature vector group to be matched by using the feature relationship feature values $t_1, t_2, \dots, t_m$ of the m dimensions of the operation name to be matched and the category vector of the operation name to be matched.
In a third aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, implements the value range data matching method as provided in the first aspect.
In a fourth aspect, the present application provides a processor for executing a computer program, which when executed performs the value range data matching method as provided in the first aspect.
Compared with the prior art, the method has the following beneficial effects:
The method obtains value domain data to be matched, which includes the operation name to be matched; processes the operation name to be matched to obtain a feature vector group to be matched; and obtains a matching result by using a pre-trained data matching model and the feature vector group to be matched. The data matching model is obtained by training on labeled sample feature vector groups, and each label includes a name index value of the national standard operation name corresponding to a historical operation name and the node index value corresponding to that national standard operation name in each layer of the value domain classification tree. The trained data matching model can therefore match a national standard operation name to a non-national-standard operation name, determine how the national standard operation name is classified in the value domain classification tree, and express that classification through the node index values. The matching result obtained with the data matching model can be presented in the same form as the label, or a similar one, so the national standard operation name matched with the operation name to be matched can be obtained from the matching result by indexing, and the classification of the matched national standard operation name in the value domain classification tree can be determined.
In this application, a pre-trained data matching model performs automatic matching of value range data, which saves manual labor and improves matching efficiency compared with manual comparison. In addition, because the value domain classification tree is divided according to the parts of the human or animal body, even if the operation name to be matched is similar to other non-national-standard operation names, it can be effectively distinguished, according to the node index values, from operation names that concern a different part (that is, a different classification) but have a similar name, which avoids matching errors. Therefore, compared with matching schemes based on fuzzy query or word segmentation comparison, the present application improves the robustness of matching against interference and further improves the matching accuracy of value range data.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a value domain data matching method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a value domain classification tree according to an embodiment of the present application;
Fig. 3 is a flowchart of obtaining a data matching model according to an embodiment of the present application;
Fig. 4 is a flowchart of an implementation manner for obtaining a sample feature vector group according to an embodiment of the present application;
Fig. 5 is a flowchart of an implementation manner for obtaining a feature vector group to be matched according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a value domain data matching apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another value domain data matching apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a model training module according to an embodiment of the present application;
Fig. 9 is a hardware structure diagram of a value domain data matching device according to an embodiment of the present application.
Detailed Description
As described above, large value range data in the medical field (such as operation names and disease names) are currently non-uniform and non-standard across hospitals, which makes subsequent analysis and application of the data difficult. The problem can be solved by value domain data matching, but in implementing value domain data matching, the inventor found that some existing matching schemes suffer from low efficiency and low accuracy.
Taking operation names as an example, some operation names may contain identical or similar words. For example, the third operation of the first hospital and the fourth operation of the second hospital share a repeated word YY. If value domain data matching is performed by fuzzy query or word segmentation comparison, the word YY interferes with the accuracy of the matching, so the third operation and the fourth operation are easily matched together to the national standard fifth operation. In fact, however, the first hospital may use the third operation as its customary name for the national standard "thyrohyoidectomy", while the second hospital uses the fourth operation as its customary name for the national standard "lung and bronchiectomy". The third and fourth operations thus correspond to different national standard operation names, but are mismatched because of the identical or similar words.
In addition, value range data can currently be matched by manual comparison. However, manual comparison requires a lot of manpower and is inefficient. For example, a hospital's operation names may be updated after some time, which means that the manual comparison needs to be performed again, which is time-consuming and laborious. In addition, the accuracy of manual work is affected by eyesight and fatigue, so the error rate is high, which reduces the accuracy of value range data matching.
Based on the above problems, the present application provides a value domain data matching method, device and related product. With a pre-trained data matching model, when value domain data matching is required, the feature vector group to be matched, obtained by processing the operation name to be matched, is used as the input of the model, and the matching result output by the model is obtained after the model processes it. Because the model is trained in advance, it is convenient to use and matching is efficient. The model both matches operation names and identifies the classification to which they belong. The classification is based on the value domain classification tree, which classifies by body part, so the matching result is more exclusive, interference from similar operation names is avoided, and the matching accuracy for large value range data is improved. In addition, the present application does not require a large amount of manpower, so it matches efficiently and saves labor cost.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Method embodiment
Referring to fig. 1, the figure is a flowchart of a value domain data matching method provided in an embodiment of the present application. The value range data matching method provided by the embodiment can be applied to a regional platform in the medical field, and the regional platform can be specifically realized in the form of a server. The server applies the method to match the value range data from each hospital in the area.
As shown in fig. 1, a value range data matching method provided in the embodiment of the present application includes:
Step 101: Obtain value domain data to be matched.
In an application scenario of the method of the embodiment, an area corresponding to an area platform (server) includes a plurality of hospitals, and each hospital adopts a medical information system HIS. The server can realize remote communication with the HIS of each hospital, and the server can acquire medical data of the hospital through the database of the HIS of each hospital. The medical data includes value range data to be matched.
As an example, the value range data to be matched may include, but is not limited to: the operation name to be matched, the disease name to be matched, the medical insurance type to be matched, the patient sex to be matched, and the like. It should be noted that the value range data to be matched may be acquired simultaneously or separately. For example, the operation name to be matched and the disease name to be matched are acquired at the same time, and the medical insurance type to be matched and the patient sex to be matched are then acquired according to actual requirements. Therefore, the order in which the value range data to be matched is acquired is not specifically limited here.
Step 102: Process the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched.
The method mainly aims to match the operation name to be matched to its corresponding national standard operation name. As a possible implementation, the operation names included in the international disease classification standard may be taken as the national standard operation names. In this example, the content of the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) may be used as the international disease classification standard. ICD-9-CM is well known to those skilled in the art, so the national standard operation names included in ICD-9-CM are not listed one by one in this application.
Before matching the operation names, the operation names to be matched need to be processed, and vector representation of the operation names to be matched is obtained through processing. In this embodiment, the vector representation of the operation name to be matched is referred to as a feature vector group to be matched.
As an example implementation manner, the operation name to be matched may be split, and basic features of the operation name to be matched, such as keywords and target words, may be obtained. And obtaining a series of characteristic relation characteristic values by analyzing the correlation among the basic characteristics. In addition, department information and/or registration information of the operation name to be matched can be acquired. Since the department information and/or registration information usually reflects the department corresponding to the name of the operation to be matched, and there are limited parts of the human body or animal body (such as eyes in charge of ophthalmology and uterus in charge of gynecology) in charge of diagnosis and treatment in each department, the category information of the name of the operation to be matched can be extracted by using the department information and/or registration information of the name of the operation to be matched. The category information may specifically be in the form of a category vector. And taking the characteristic relation characteristic value and the category vector as constituent elements of a characteristic vector group to be matched.
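As a rough sketch of the last two sentences above, the category vector can be built from the department and combined with the feature relationship feature values. The one-hot encoding, the department list, the dictionary layout and the numeric values are assumptions made for illustration; the source only states that the category information may take the form of a category vector.

```python
# Hypothetical list of departments known to the regional platform.
DEPARTMENTS = ["ophthalmology", "gynecology", "respiratory"]


def category_vector(department, departments=DEPARTMENTS):
    """One-hot encode a department; an unknown department gives all zeros."""
    return [1.0 if department == d else 0.0 for d in departments]


def build_feature_vector_group(relation_values, department):
    """Combine relation values t_1..t_m with the category vector."""
    return {
        "relation_values": list(relation_values),
        "category_vector": category_vector(department),
    }


# Hypothetical feature relationship feature values for one operation name.
group = build_feature_vector_group([0.25, 0.25, 0.0], "ophthalmology")
```

Keeping the two parts separate rather than concatenating them is a design choice of this sketch; a model that expects a single flat vector could simply join the two lists.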
It can be understood that the above is only an example implementation manner of this step, and in practical application, other implementation manners may be adopted to obtain the feature vector group to be matched according to a specific matching requirement or a data type stored in the HIS. Therefore, the implementation of this step is not limited here, and the constituent elements of the feature vector group to be matched are not limited.
Step 103: Obtain a matching result by using the data matching model and the feature vector group to be matched.
It should be noted that, in the present embodiment, the data matching model is obtained by training in advance using the sample feature vector set with the label. Wherein the sample feature vector group is a vectorized representation of historical procedure names, which may be obtained from HIS of various hospitals. The labels for the sample feature vector groups include: name index values of national standard operation names corresponding to historical operation names and node index values corresponding to the national standard operation names in each layer of the value domain classification tree. The name index value can be used for indexing and obtaining the national standard operation name corresponding to the historical operation name.
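The label described above can be pictured as a small record holding one name index value and one node index value per tree layer. The sketch below is illustrative only: the class name, the index table and the concrete index values are hypothetical, and the root layer is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name_index: int      # index of the national standard operation name
    node_indexes: tuple  # node index values, upper layer first


# Hypothetical index table of national standard operation names.
STANDARD_NAMES = {
    0: "laryngectomy",
    1: "lung and bronchiectomy",
}


def standard_name(label, table=STANDARD_NAMES):
    """Look up the national standard operation name by its name index value."""
    return table[label.name_index]


# A label for a historical operation name that corresponds to laryngectomy.
label = Label(name_index=0, node_indexes=(222, 235))
name = standard_name(label)
```

Because a matching result can take the same form as the label, the same lookup would apply to the model output.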
In this embodiment, the historical operation name and the operation name to be matched both refer to operation names for which value range matching has not been performed. The operation name to be matched refers to an operation name which needs to be subjected to value range matching by using the trained model at present after the trained model is obtained; and the historical operation name refers to the operation name which is used before the model training is completed and has not been subjected to value range matching. That is, the historical operation name is specifically historical data in the embodiment, and is used to form a sample used for training the model, relative to the operation name to be matched.
To facilitate understanding of the form of the value range classification tree and the node index value, reference may be made to fig. 2, which is a schematic structural diagram of a value range classification tree provided in an embodiment of the present application.
As shown in fig. 2, the value range classification tree includes a plurality of layers (layer 1, layer 2, …, layer s from the root to the leaves, where s is an integer greater than 2), each of which includes at least one node. Layer 1 contains a single node, which does not refer to any specific national standard operation name; each node in layers 2 to s represents a different national standard operation name. The nodes of layer 2 correspond to national standard operation names classified according to the parts of the human or animal body, such as nervous system surgery, endocrine system surgery, eye surgery, and respiratory system surgery. The national standard operation name corresponding to each layer-2 node can in turn be subdivided more finely. For example, respiratory system surgery may be divided into laryngectomy, excision of lung and bronchus, and so on.
As can be seen from fig. 2, each parent node of the 1 st to s-1 st layers in the value range classification tree has at least one corresponding child node; each child node of layers 2 to s has a corresponding parent node. The node index values included in the labels of the sample feature vector group specifically refer to the node index values of the national standard operation names corresponding to the historical operation names in each layer of the value domain classification tree. For ease of understanding, the following is exemplified.
Assume that the national standard operation name corresponding to the historical operation name is laryngectomy. From the parent-child relationships among the nodes it is easy to determine that node 235 represents laryngectomy and that its parent node 222 represents respiratory system surgery. Thus, the label of the sample feature vector group includes the node index value of node 235 and the node index value of node 222. It can be understood that the node index values in the label can be used to look up how the national standard operation name corresponding to the historical operation name is classified in the value domain classification tree.
It can be understood that, in the model training process, the label serves as the training target for deep learning. Through deep learning training, the model gradually acquires the ability to output the label for a given sample feature vector group. Therefore, after the feature vector group to be matched obtained in step 102 is input into the pre-trained data matching model, the model can compute and output a matching result that meets the matching requirement. The matching result has the same or a similar form as the label used in the training stage. That is, the matching result includes the name index value of the national standard operation name corresponding to the operation name to be matched, and the node index values corresponding to that national standard operation name in each layer of the value domain classification tree.
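The parent-child lookup in this example can be sketched in a few lines of Python. This is only an illustrative sketch: node numbers 235 and 222 come from the example above, while the `ROOT` identifier, the `PARENT` map, and the function name are assumptions introduced here, and node numbers are simply reused as node index values.

```python
# Hypothetical fragment of the value range classification tree, stored as a
# child -> parent map. 235 and 222 follow the example in the text; ROOT is an
# assumed identifier for the single layer-1 node.
ROOT = 0
PARENT = {222: ROOT, 235: 222}


def path_from_layer2(node, parent=PARENT, root=ROOT):
    """Return the node index values from layer 2 down to `node` (root excluded)."""
    path = []
    while node != root:
        path.append(node)
        node = parent[node]
    path.reverse()  # collected leaf-to-root, so flip to root-to-leaf order
    return path


print(path_from_layer2(235))  # [222, 235]
```

Storing only parent pointers is enough here because the label needs the ancestors of one node, not a traversal of the whole tree.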
The above is the value range data matching method provided in the embodiment of the present application. In the method, value domain data to be matched is obtained, where the value domain data includes an operation name to be matched; the operation name to be matched is processed to obtain a feature vector group to be matched; and a matching result is obtained by using a pre-trained data matching model and the feature vector group to be matched. The data matching model is trained on labeled sample feature vector groups, and the label includes the name index value of the national standard operation name corresponding to a historical operation name and the node index values corresponding to that national standard operation name in each layer of the value domain classification tree. The trained data matching model can therefore match a national standard operation name to a non-national-standard operation name, determine how that national standard operation name is classified in the value domain classification tree, and express the classification through the node index values. Because the matching result obtained with the data matching model takes the same or a similar form as the label, the matching result can be used to look up the national standard operation name matched to the operation name to be matched and to determine how the matched national standard operation name is classified in the value domain classification tree.
In this application, value range data is matched automatically by a pre-trained data matching model, which saves manual labor and improves matching efficiency compared with manual comparison. In addition, because the value domain classification tree is divided according to the parts of the human or animal body, even if the operation name to be matched resembles other non-national-standard operation names, the node index values allow it to be distinguished from operation names that concern a different body part (that is, a different classification) but have a similar name, avoiding matching errors. Therefore, compared with matching schemes based on fuzzy query and word segmentation comparison, the value range data matching method provided by the application also makes matching more robust to interference and thereby improves the matching accuracy of value range data.
To facilitate understanding of the training process of the data matching model, a specific implementation of training the data matching model is described below with reference to fig. 3 and the embodiment.
Referring to fig. 3, it is a flowchart for obtaining a data matching model according to an embodiment of the present application. The steps illustrated in fig. 3 may be specifically executed before steps 101-103 described in the foregoing embodiment.
As shown in fig. 3, the implementation process of obtaining the data matching model includes:
Step 301: classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human or animal body to obtain a value domain classification tree.
As an example implementation, a plurality of national standard operation names included by ICD-9-CM may be classified according to corresponding parts of the human or animal body (treatment parts or disease parts), and relationships (including but not limited to membership, non-membership, and supplementary relationships) of all the national standard operation names under each classification may be determined. And deploying nodes below the 2 nd layer of the value domain classification tree by using the relation of all national standard operation names under each classification. An example form of a value range classification tree may be found in fig. 2.
Step 302: obtaining the historical operation names and the correspondence between the historical operation names and the national standard operation names.
For ease of understanding, a hospital information system HIS and a value range comparison system are described herein.
The HIS is a system inside a hospital. The HIS database stores the hospital's value range data that has not been matched to a standard, such as historical operation names, historical disease names, historical medical insurance types, and historical patient gender. In this step, the server (i.e., the regional platform) can obtain a large number of historical operation names from the HIS, which are then processed to obtain the sample data used to train the model. For example, 300 different historical operation names are taken from the HIS.
The value range comparison system is established in advance. It should be noted that the historical operation names stored in the HIS database of each hospital are not named arbitrarily; they follow the naming rules of the medical business layer. Although the naming rules of each hospital may vary in many ways (e.g., variations arising from the hospital's business idioms or spoken-language habits), every historical operation name has a corresponding national standard operation name.
As an example implementation, the correspondence may be obtained when the hospital names its operations, and stored in the value range comparison system in the form of a file. As another example implementation, the correspondence may be obtained by manual comparison after naming, and likewise stored in the value range comparison system in the form of a file. When executing this step, the server obtains the correspondence between the historical operation names and the national standard operation names from the value range comparison system. For example, 300 historical operation names, denoted A001 to A300 (not shown in the drawings) for convenience, are obtained from the HIS; the correspondence between these historical operation names A001 to A300 and the national standard operation names B001 to B300 (not shown in the drawings) then needs to be obtained from the value range comparison system: A001 corresponds to B001, A002 corresponds to B002, …, and A300 corresponds to B300.
It should be noted that, as the construction of regional platforms advances, health care institutions such as hospitals have a very large demand for value domain data matching. In the method of this embodiment, the correspondence is used to form the sample data for training, and the resulting data matching model then performs accurate and efficient value range data matching. Therefore, even if the correspondence is obtained by manual comparison and then stored in the value range comparison system, as in the example above, matching efficiency is still greatly improved compared with comparing manually every time value range data needs to be matched. The manual workload of obtaining the correspondence between the historical operation names and the national standard operation names is very small relative to the overall demand for value range matching.
Step 303: processing the historical operation name to obtain the sample feature vector group.
The set of sample feature vectors is a vectorized representation of the historical procedure name. And the sample feature vector group is used as sample data for subsequent training of the model.
One exemplary implementation of step 303 is described below in conjunction with fig. 4. Referring to fig. 4, this figure is a flowchart of an implementation manner for obtaining a sample feature vector set according to an embodiment of the present application.
Step 3031: splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name.
In practical application, historical operation names can be processed through a word segmentation method, and basic features of multiple dimensions are obtained. The word segmentation is a relatively mature technology in the field, so that historical operation names can be split by utilizing a plurality of mature word segmentation algorithms, and the word segmentation algorithms are not particularly limited.
m represents the number of dimensions of the basic feature after splitting, and m is an integer greater than 1. As an example, the split basic features may include at least one of:
keywords of the historical operation name, target characters, characters before or after a target character within a preset character window, target words, and words before or after a target word within a preset word window.
The keywords may be obtained by removing the parts that different historical operation names have in common (de-duplication). For example, if the historical operation name A001 is "XYZ procedure" and the historical operation name A008 is "XUZ procedure", then after the common parts are removed, "Y" may be used as the keyword of A001 and "U" as the keyword of A008.
The target characters and target words can be set according to requirements or according to the word segmentation algorithm. Take the historical operation name "ABCDEFGHIJKLMN" as an example, where A, B, C, D, E, F, G, H, I, J, K, L, M, N each represent a Chinese character and AB, CD, EFG, HI, JK, LMN each represent a word. For a target character, the size of the character window may be preset so as to obtain the characters before or after the target character. Similarly, for a target word, the size of the word window may be preset so as to obtain the words before or after the target word.
As an example, if the target character is D and the character window size is 2, the characters before the target character within the character window are, from front to back, B and C, and the characters after it are E and F. As another example, if the target word is HI and the word window size is 2, the words before the target word within the word window are, from front to back, CD and EFG, and the words after it are JK and LMN.
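The window examples above can be reproduced with a small Python sketch. The function name `window` and its signature are assumptions made for illustration; the sample name and its segmentation are taken from the text.

```python
def window(tokens, target, size):
    """Return (before, after): up to `size` tokens on each side of `target`.

    Uses the first occurrence of `target`; a real implementation would work
    with positions rather than values when tokens can repeat.
    """
    i = tokens.index(target)
    before = tokens[max(0, i - size):i]
    after = tokens[i + 1:i + 1 + size]
    return before, after


chars = list("ABCDEFGHIJKLMN")                  # each letter stands for one Chinese character
words = ["AB", "CD", "EFG", "HI", "JK", "LMN"]  # segmentation given in the text

print(window(chars, "D", 2))   # (['B', 'C'], ['E', 'F'])
print(window(words, "HI", 2))  # (['CD', 'EFG'], ['JK', 'LMN'])
```

The same function serves both the character window and the word window, since the only difference is the unit of tokenization.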
As a possible implementation, a dictionary may be established in advance, in which each Chinese character corresponds to its own serial number. The above basic features may then be expressed and used in the form of serial numbers.
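One possible way to build such a dictionary is sketched below in Python. The function names, the choice to number characters in order of first appearance, and the use of 0 for characters not in the dictionary are all assumptions for illustration, not details given in the text.

```python
def build_dictionary(names):
    """Assign each distinct character a serial number, in order of first appearance."""
    dictionary = {}
    for name in names:
        for ch in name:
            # Serial numbers start at 1; 0 is kept free for unknown characters (assumption).
            dictionary.setdefault(ch, len(dictionary) + 1)
    return dictionary


def encode(text, dictionary, unknown=0):
    """Express a basic feature (a string of characters) as a list of serial numbers."""
    return [dictionary.get(ch, unknown) for ch in text]


dictionary = build_dictionary(["XYZ", "XUZ"])
print(encode("XUZ", dictionary))
```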
Step 3032: obtaining the feature relation feature value w_k of the kth dimension by using the basic features of the m dimensions.
The basic feature of the kth dimension may refer to the basic feature of any one of the m dimensions, that is, k = 1, 2, …, m. The feature relation feature value of the kth dimension characterizes the relationship between the basic feature of the kth dimension and the basic features of the other dimensions. See formula (1), which shows how the feature relation feature value w_k of the kth dimension of the historical operation name is obtained.
[Formula (1): image BDA0002301196660000141, not reproduced in this text]
In formula (1), u is a preset correlation coefficient, and the symbol shown in image BDA0002301196660000142 denotes the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension, where p takes values from 1 to m and p is not equal to k. That is, the basic feature of the kth dimension and the basic feature of the pth dimension are different features.
In a specific implementation, the basic features of each dimension can be expressed in vector form, and the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension (the symbol shown in image BDA0002301196660000143) can then be obtained by the Pearson formula, the Spearman formula, or the chi-square test.
If the Pearson formula is adopted, the correlation score in this embodiment is the Pearson correlation coefficient it yields; if the Spearman formula is adopted, the correlation score in this embodiment is the Spearman correlation coefficient it yields.
It is to be understood that the manner of obtaining the relevancy score in practical applications is not limited to the above examples, and the implementation manner of obtaining the relevancy score in this step is not particularly limited herein.
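As a rough illustration of step 3032, the Python sketch below computes Pearson correlation scores between feature vectors and combines them into one value per dimension. Two points are assumptions and should be read as such: formula (1) is not reproduced in this text, so the aggregation used here (u times the sum of the scores over all p not equal to k) is only a stand-in, and the handling of constant vectors is a convenience choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (sd_x * sd_y)


def relation_values(features, u=1.0):
    """ASSUMED aggregation: w_k = u * sum of score(k, p) over all p != k."""
    m = len(features)
    return [
        u * sum(pearson(features[k], features[p]) for p in range(m) if p != k)
        for k in range(m)
    ]


# Three toy basic features, already expressed as numeric vectors.
features = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
W = relation_values(features, u=0.5)
```

Swapping `pearson` for a Spearman or chi-square based score would leave `relation_values` unchanged, which mirrors the text's point that the scoring method is not limited.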
Step 3033: department information and/or registration information of historical operation names are obtained from a hospital information system HIS, and category vectors of the historical operation names are obtained by utilizing the department information and/or registration information.
It should be noted that the server may also obtain the department information and/or registration information associated with a historical operation name from the HIS. It can be understood that department information and/or registration information reflects, to a certain extent, the body part examined by the doctor and hence the part to which the historical operation name relates. For example, if the registered department in the registration information is ophthalmology, the disease screening stage cannot conclude that the patient has a gastric ulcer or tinea pedis. Therefore, the department information and/or registration information of a historical operation name helps rule out matches with operation names of unrelated categories and improves matching accuracy.
For convenience of processing, the department information and/or registration information of the historical operation names may be expressed in a vector form in the present embodiment. For example, a category vector of the historical operation names is formed, and in the category vector, a first element is used for representing department information of the historical operation names; the second element is used to represent registration information for historical procedure names.
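A two-element category vector of this kind could look like the following Python sketch. The code tables, their entries, and the use of 0 for missing information are hypothetical; the text only specifies that the first element represents department information and the second represents registration information.

```python
# Hypothetical code tables; a real system would derive these from the HIS.
DEPARTMENT_CODES = {"ophthalmology": 1, "respiratory medicine": 2}
REGISTRATION_CODES = {"ophthalmology outpatient": 1, "respiratory outpatient": 2}


def category_vector(department=None, registration=None):
    """Return [department code, registration code]; 0 marks missing or unknown info."""
    return [
        DEPARTMENT_CODES.get(department, 0),
        REGISTRATION_CODES.get(registration, 0),
    ]


print(category_vector("ophthalmology", "ophthalmology outpatient"))  # [1, 1]
```

Allowing either argument to be absent reflects the "and/or" in the text: a record with only department information still yields a usable vector.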
Step 3034: obtaining the sample feature vector group by using the feature relation feature values w_1, w_2, …, w_m of the m dimensions and the category vector.
See equation (2), which shows an example form of a sample feature vector set:
R={W,C} (2)
in the formula (2), C is a category vector of the historical operation name, W is a feature relation feature vector of the historical operation name, and the expression of W is as follows:
W={w_1,w_2,...,w_m} (3)
With reference to steps 3031 to 3034, in this embodiment each sample feature vector group R carries two pieces of information about a historical operation name: one is represented by the feature relation feature vector W of the historical operation name, and the other by its category vector C.
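The structure R = {W, C} can be modeled as a small record type. The Python sketch below is only one way to hold the two parts; the class name and the `flatten` helper (concatenating W and C into a single model input) are assumptions, since the text does not say how R is fed to the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureVectorGroup:
    """R = {W, C}: feature relation feature vector plus category vector."""
    W: List[float]  # w_1 ... w_m
    C: List[int]    # [department info, registration info]

    def flatten(self) -> List[float]:
        """Assumed helper: concatenate W and C into one flat input vector."""
        return list(self.W) + [float(c) for c in self.C]


R = FeatureVectorGroup(W=[0.25, -0.5], C=[1, 2])
print(R.flatten())  # [0.25, -0.5, 1.0, 2.0]
```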
Step 304: obtaining the label of the sample feature vector group by using the historical operation name, the correspondence between the historical operation name and the national standard operation name, and the value domain classification tree.
It should be noted that, in this embodiment, the label of the sample feature vector group at least includes the following two parts: the name index value of the national standard operation name corresponding to the historical operation name and the node index value corresponding to the national standard operation name in each layer of the value domain classification tree. The name index value and the node index value included in the tag will be described below, respectively.
In practical application, the national standard operation name corresponding to the historical operation name can be determined by using the historical operation name and the correspondence. Because the value domain classification tree is built by classifying a plurality of national standard operation names according to the parts of the human or animal body, it necessarily includes a node representing the national standard operation name corresponding to the historical operation name. In practical application, each node has a name index value, which can be used to look up the national standard operation name represented by the node.
In this embodiment, the nodes corresponding to the national standard operation name in each layer of the value domain classification tree may be determined by using the national standard operation name corresponding to the historical operation name and the value domain classification tree. It should be noted that these nodes include the node representing the national standard operation name itself as well as each ancestor node of that node (except the root node in layer 1).
For ease of understanding, reference may be made herein to the value range classification tree described with reference to FIG. 2. Assume that node 243 represents a national standard surgical name corresponding to a historical surgical name. The ancestor nodes of node 243 include node 236 and node 223. It will be appreciated that parent-child relationships between nodes reveal categorical relationships between the national standard surgical names represented by the nodes.
In practical applications, each node has a node index value. The node index values of the nodes with parent-child relationship with each other have corresponding association, so that the node index values can be used for indexing and determining the classification condition of the national standard operation name corresponding to the historical operation name in the value domain classification tree.
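Putting step 304 together, a label could be assembled as in the Python sketch below. Node numbers 243, 236, and 223 follow the example above. Everything else is hypothetical: the `ROOT` identifier, the pairing of A001/B001 with node 243, the name index value 1, the dictionary layout of the label, and the reuse of node numbers as node index values.

```python
ROOT = 0                                    # assumed id of the layer-1 node
PARENT = {223: ROOT, 236: 223, 243: 236}    # child -> parent, per the example
CORRESPONDENCE = {"A001": "B001"}           # historical name -> national standard name
NODE_OF_NAME = {"B001": 243}                # illustrative pairing, not from the text
NAME_INDEX = {"B001": 1}                    # hypothetical name index value


def build_label(historical_name):
    """Return the name index value and the per-layer node index values (layer 2 downward)."""
    standard_name = CORRESPONDENCE[historical_name]
    node = NODE_OF_NAME[standard_name]
    node_indices = []
    while node != ROOT:
        node_indices.insert(0, node)  # prepend so the list runs from layer 2 to the node
        node = PARENT[node]
    return {"name_index": NAME_INDEX[standard_name], "node_indices": node_indices}


print(build_label("A001"))  # {'name_index': 1, 'node_indices': [223, 236, 243]}
```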
By performing the above steps 301-304, the sample feature vector set used for training the model is obtained step by step, and the label of the sample feature vector set is obtained. And then, training the model by utilizing the characteristic vector group with the label, thereby gradually enabling the model to have the function of accurately matching value range data in the training process.
Step 305: training a model to be trained by using the labeled sample feature vector groups, and judging whether a preset end condition is met; if so, executing step 306; if not, repeating step 305.
In practical application, an end condition for model training may be set. As an example, the preset end condition may be that the number of training iterations reaches a preset number. As another example, the preset end condition may also be that the value of the objective function reaches below a preset threshold.
It can be understood that if the preset end condition is satisfied, the model's matching accuracy on value range data is taken to meet the actual requirement of value range data matching, and the trained data matching model can be applied to actual value range data matching. Otherwise, the matching accuracy does not yet meet the actual requirement, and training needs to continue.
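The loop formed by steps 305 and 306 with the two example end conditions can be sketched as follows in Python. The `train` function, its parameter names, and the toy objective are assumptions made for illustration; the actual model and objective function are not specified in the text.

```python
def train(step, max_iterations=1000, threshold=1e-6):
    """Repeat `step()` until the objective drops below `threshold`
    or `max_iterations` is reached. `step` runs one training iteration
    and returns the current objective value."""
    iteration, loss = 0, float("inf")
    while iteration < max_iterations:
        loss = step()
        iteration += 1
        if loss < threshold:  # end condition: objective below preset threshold
            break
    return iteration, loss    # step 306: stop training


# Toy stand-in for a model: minimize (x - 3)^2 by gradient descent.
state = {"x": 0.0}


def toy_step(learning_rate=0.1):
    gradient = 2 * (state["x"] - 3.0)
    state["x"] -= learning_rate * gradient
    return (state["x"] - 3.0) ** 2


iterations, final_loss = train(toy_step)
```

Checking both conditions in one loop guarantees termination even when the objective never reaches the threshold.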
Step 306: stopping model training and obtaining a data matching model.
It can be understood that when the model training is stopped, the parameters inside the model are the key to ensure the matching effect of the model data. Therefore, the parameters at this time can be used as the internal parameters of the subsequent model in actual use. These parameters may be stored and retrieved and loaded when the model requires actual application.
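Storing and reloading the parameters might look like the Python sketch below. JSON is used purely as an example format, and the function names and the shape of the parameter dictionary are assumptions; the text does not specify a storage format.

```python
import json


def save_parameters(params, path):
    """Write model parameters (a JSON-serializable dict) to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_parameters(path):
    """Read back parameters previously written by save_parameters."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A deep learning framework would normally provide its own checkpoint format; the point here is only that the parameters fixed at the end of training are what gets loaded at application time.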
As can be seen from the above description, after the model is trained, it can be used to actually perform value range data matching. That is, the model training process described in steps 301 to 306 above occurs before steps 101 to 103.
In the previous embodiment, it was described that, during model application, what is actually input into the model is not the operation name to be matched itself but its vectorized representation, i.e., the feature vector group to be matched. See step 102 for details.
To ensure that the inputs seen in practice are consistent with those the model was trained on, the feature vector group to be matched is obtained in a manner similar to that of the sample feature vector group. One exemplary implementation of step 102 is described below in conjunction with fig. 5.
Referring to fig. 5, this figure is a flowchart of an implementation manner for obtaining a feature vector group to be matched according to an embodiment of the present application.
Step 1021: splitting the operation name to be matched to obtain basic features of m dimensions corresponding to the operation name to be matched.
The operation name to be matched described in this step specifically refers to the operation name to be matched in the value range data to be matched obtained in step 101. The m-dimensional basic features obtained by splitting in this step may include, but are not limited to, the following basic features in several dimensions:
keywords of the operation name to be matched, target characters, characters before or after a target character within a preset character window, target words, and words before or after a target word within a preset word window.
Step 1022: obtaining the feature relation feature value t_k of the kth dimension of the operation name to be matched by using the basic features of the m dimensions corresponding to the operation name to be matched.
Here, k = 1, 2, …, m. In a specific implementation, the Pearson formula, the Spearman formula, or the chi-square test can be used to obtain the correlation scores between the basic feature of the kth dimension of the operation name to be matched and the basic features of the other dimensions; the feature relation feature value t_k of the kth dimension of the operation name to be matched is then obtained from a preset correlation coefficient and these correlation scores.
See formula (4), which shows how the feature relation feature value t_k of the kth dimension of the operation name to be matched is obtained:
[Formula (4): image BDA0002301196660000181, not reproduced in this text]
In formula (4), u is a preset correlation coefficient, and the symbol shown in image BDA0002301196660000182 denotes the correlation score between the basic feature of the kth dimension and the basic feature of the pth dimension of the operation name to be matched, where p takes values from 1 to m and p is not equal to k. That is, the basic feature of the kth dimension and the basic feature of the pth dimension of the operation name to be matched are different features.
Step 1023: department information and/or registration information of the operation name to be matched is obtained from the HIS, and the category vector of the operation name to be matched is obtained by utilizing the department information and/or registration information of the operation name to be matched.
The category vector of the operation name to be matched is obtained in basically the same way as the category vector of the historical operation name; for a description of this step, refer to step 3033.
Step 1024: obtaining the feature vector group to be matched by using the feature relation feature values t_1, t_2, …, t_m of the m dimensions of the operation name to be matched and the category vector of the operation name to be matched.
The implementation manner of this step is substantially the same as the implementation manner of obtaining the sample feature vector group in step 3034, and for the implementation of this step, reference may be made to the description of step 3034, and details are not described herein again.
The above is an example implementation of obtaining a set of feature vectors to be matched. By executing the step 1021-.
To address the problem that operation names in a large value range are too varied to be matched in a single pass, the embodiment of the application divides them into categories at different levels through the value range classification tree (a structure tree) and matches within a small range using a supervised data matching model. This effectively prevents highly similar operation names from interfering with the correct match and greatly improves matching accuracy.
Unifying and standardizing the value range data (namely operation names) of a large value range is a prominent problem in large value range matching, characterized by the huge number of national standard operation names. The above embodiment combines two means to match operation names. First, the keywords, characters, words, and positional relationships of the value range data, together with the correlations among them, provide an effective vectorization scheme for deep learning. Second, the operation names are structurally split according to the parts of the human or animal body to form a value range classification tree, and value range matching is performed at each layer until a highly accurate matching result is finally formed. It can be understood that, in the embodiment, the historical operation name (when samples are obtained) or the operation name to be matched is tied to each layer of the value domain classification tree, which greatly improves the accuracy of the matching result.
Based on the value domain data matching method provided by the foregoing embodiment, the present application also provides a value domain data matching device. The following description is made with reference to the embodiments and the accompanying drawings.
Device embodiment
Referring to fig. 6, the figure is a schematic structural diagram of a value domain data matching apparatus according to an embodiment of the present application. As shown in fig. 6, the value range data matching apparatus provided in this embodiment includes:
a data obtaining module 601, configured to obtain value domain data to be matched;
the data processing module 602 is configured to process the operation name to be matched in the value domain data to be matched, so as to obtain a feature vector group to be matched;
and the data matching module 603 is configured to obtain a matching result by using the data matching model and the feature vector group to be matched.
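The three modules listed above can be pictured as a simple pipeline. In the Python sketch below, the class name, the callables passed in, and the `"operation_name"` key are all assumptions for illustration; the stubs stand in for the HIS access, the feature extraction of step 102, and the trained model.

```python
class ValueRangeMatcher:
    """Wires together the roles of modules 601, 602, and 603."""

    def __init__(self, obtain, process, model):
        self.obtain = obtain    # role of data obtaining module 601
        self.process = process  # role of data processing module 602
        self.model = model      # trained model used by data matching module 603

    def match(self, source):
        data = self.obtain(source)                       # value domain data to be matched
        features = self.process(data["operation_name"])  # feature vector group to be matched
        return self.model(features)                      # matching result


# Stub components, for demonstration only.
matcher = ValueRangeMatcher(
    obtain=lambda source: {"operation_name": source},
    process=lambda name: [float(len(name))],
    model=lambda features: {"name_index": int(features[0]), "node_indices": []},
)
print(matcher.match("XYZ"))  # {'name_index': 3, 'node_indices': []}
```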
The data matching model described in the embodiment is obtained by training in advance by using a labeled sample feature vector set; the sample feature vector group is vectorized representation of historical operation names; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and a node index value corresponding to the national standard operation name in each layer of a value domain classification tree; the value domain classification tree is a structure tree for classifying national standard operation names according to the parts of the human body or the animal body.
The data matching model is obtained after training of the labeled sample feature vector group, and the label comprises a name index value of a national standard operation name corresponding to a historical operation name and a node index value of the national standard operation name corresponding to each layer of the value domain classification tree, so that the trained data matching model has the function of matching the national standard operation name according to the non-national standard operation name, can determine the specific classification condition of the national standard operation name in the value domain classification tree, and reflects the specific classification condition through the node index value. According to the method and the device, the matching result obtained by using the data matching model can be displayed in the same or similar form as the label, so that the international operation name matched by the operation name to be matched can be obtained by using the matching result in an indexing manner, and the specific classification condition of the matched international operation name in the value domain classification tree is determined.
In the present application, value domain data is matched automatically with a pre-trained data matching model, which saves labor and improves matching efficiency compared with manual matching. In addition, because the value domain classification tree is divided according to parts of the human or animal body, even if the operation name to be matched resembles other non-national-standard operation names, the node index values effectively distinguish it from operation names that are similar in wording but refer to a different part (that is, belong to a different classification), thereby avoiding matching errors. Therefore, compared with matching schemes based on fuzzy query and word segmentation comparison, the present application improves the interference resistance of matching and thus the matching accuracy of value domain data.
Optionally, in practical applications, the data matching apparatus provided in this embodiment may further include a model training module, which gives the apparatus the ability to train the model. Referring to fig. 7, which is a schematic structural diagram of another value range data matching apparatus, a model training module 701 is added to the apparatus structure shown in fig. 6.
Referring to fig. 8, the figure is a schematic structural diagram of a model training module provided in the embodiment of the present application.
As shown in fig. 8, the model training module 701 may specifically include:
a value range classification tree obtaining unit 7011, configured to classify, according to a part of a human or animal body, a plurality of national standard operation names included in the international disease classification standard, so as to obtain a value range classification tree; each layer in the value domain classification tree at least comprises one node;
an operation name obtaining unit 7012, configured to obtain the historical operation name;
a correspondence obtaining unit 7013, configured to obtain a correspondence between the historical operation name and a national standard operation name;
a sample feature vector group obtaining unit 7014, configured to process the historical operation name to obtain the sample feature vector group;
a label obtaining unit 7015, configured to obtain the label by using the historical operation name, the correspondence, and the value domain classification tree;
and a model training unit 7016, configured to train a model to be trained by using the labeled sample feature vector group, and, when a preset termination condition is satisfied, to stop training and obtain the data matching model.
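The training step with a preset termination condition can be sketched as below. The application does not name a model type or a termination condition, so the multi-class perceptron, the "zero errors in one pass" condition, and the `max_epochs` cap are all stand-in assumptions. For brevity the sketch trains against the name index only; the node index values in the label would be further targets.

```python
from typing import List, Sequence, Tuple


def predict(weights: List[List[float]], features: Sequence[float]) -> int:
    """Return the class whose weight row gives the highest score."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return scores.index(max(scores))


def train(samples: Sequence[Tuple[List[float], int]], num_classes: int,
          max_epochs: int = 100) -> Tuple[List[List[float]], int]:
    """Train a multi-class perceptron; stop on zero errors or after max_epochs."""
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        epochs_run = epoch
        errors = 0
        for features, label in samples:
            predicted = predict(weights, features)
            if predicted != label:
                errors += 1
                for i, value in enumerate(features):
                    weights[label][i] += value      # pull the true class closer
                    weights[predicted][i] -= value  # push the wrong class away
        if errors == 0:  # assumed preset termination condition
            break
    return weights, epochs_run


# Hypothetical labeled sample feature vectors: (features, name index).
SAMPLES = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
WEIGHTS, EPOCHS = train(SAMPLES, num_classes=3)
```

A loss threshold or a validation-accuracy plateau would serve equally well as the termination condition; the text leaves that choice open.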
Optionally, the sample feature vector group obtaining unit 7014 may specifically include:
a first basic feature obtaining subunit, configured to split the historical operation name and obtain basic features of m dimensions corresponding to the historical operation name;
a first feature relationship feature value obtaining subunit, configured to obtain a feature relationship feature value w_k of the kth dimension by using the basic features of the m dimensions, where k = 1, 2, …, m;
a first category vector obtaining subunit, configured to obtain department information and/or registration information for the historical operation name from a hospital information system (HIS), and to obtain a category vector of the historical operation name by using the department information and/or registration information;
a sample feature vector group obtaining subunit, configured to obtain the sample feature vector group by using the feature relationship feature values w_1, w_2, …, w_m of the m dimensions and the category vector.
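How the feature relationship feature values and the category vector might be combined can be sketched as follows. The one-hot encoding of department information, the department vocabulary, and the simple concatenation are assumptions made for illustration; the application only states that both inputs are used to obtain the sample feature vector group.

```python
from typing import List, Sequence

# Hypothetical department vocabulary; a real HIS would supply its own codes.
DEPARTMENTS = ["general surgery", "urology", "orthopedics"]


def category_vector(department: str,
                    vocabulary: Sequence[str] = DEPARTMENTS) -> List[float]:
    """One-hot encode the department information (an assumed encoding)."""
    return [1.0 if department == entry else 0.0 for entry in vocabulary]


def sample_feature_vector(w: Sequence[float], department: str) -> List[float]:
    """Concatenate w_1..w_m with the category vector."""
    return list(w) + category_vector(department)


VECTOR = sample_feature_vector([0.42, 0.10, 0.77], "urology")
print(VECTOR)  # [0.42, 0.1, 0.77, 0.0, 1.0, 0.0]
```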
Optionally, the first feature relationship feature value obtaining subunit is specifically configured to obtain a correlation score between the basic feature of the kth dimension and the basic features of the other dimensions by using the Pearson correlation formula, the Spearman correlation formula, or the chi-square test, and to obtain the feature relationship feature value w_k of the kth dimension by using a preset correlation coefficient and the correlation score.
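A minimal sketch of the Pearson variant is given below. It assumes that each basic feature has already been encoded as a numeric sequence of equal length, that the correlation score for dimension k is the mean Pearson coefficient against all other dimensions, and that the preset correlation coefficient is applied as a multiplier. None of these three choices is stated in the application, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def relation_values(features: Sequence[Sequence[float]],
                    coefficient: float) -> List[float]:
    """w_k = coefficient * mean Pearson score between feature k and the others."""
    m = len(features)  # requires m >= 2
    values = []
    for k in range(m):
        scores = [pearson(features[k], features[j]) for j in range(m) if j != k]
        values.append(coefficient * sum(scores) / len(scores))
    return values


# Hypothetical numeric encodings of three basic features.
W = relation_values([[1, 2, 3], [2, 4, 6], [3, 2, 1]], coefficient=0.5)
```

Swapping `pearson` for a rank-based Spearman function would leave `relation_values` unchanged.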
Optionally, the first basic feature obtaining subunit is specifically configured to split the historical operation name and obtain, for the historical operation name, a keyword, a target character, the characters before or after the target character within a preset character window, a target word, and the words before or after the target word within a preset word window.
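The splitting step can be sketched as below. The sketch uses one context window over characters and one over words; that granularity, the whitespace tokenization (a Chinese operation name would need a word segmenter instead), the keyword set, and the helper names are assumptions for illustration only.

```python
from typing import Dict, List, Set


def window(items: List[str], position: int, size: int) -> List[str]:
    """Items within `size` positions before or after `position`, excluding it."""
    start = max(0, position - size)
    return items[start:position] + items[position + 1:position + 1 + size]


def basic_features(name: str, keywords: Set[str], target_word: str,
                   target_char: str, size: int = 1) -> Dict[str, List[str]]:
    """Split an operation name into keyword, target, and context features."""
    words = name.split()
    chars = list(name.replace(" ", ""))
    w_pos = words.index(target_word)   # first occurrence of the target word
    c_pos = chars.index(target_char)   # first occurrence of the target character
    return {
        "keywords": [w for w in words if w in keywords],
        "target_char": [target_char],
        "char_context": window(chars, c_pos, size),
        "target_word": [target_word],
        "word_context": window(words, w_pos, size),
    }


FEATURES = basic_features("laparoscopic partial nephrectomy",
                          keywords={"nephrectomy"},
                          target_word="partial", target_char="p")
```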
To address the problem that operation names in a large value domain cannot be matched in a single pass because of their great variety, the embodiments of the present application divide the names into categories at different levels through the value domain classification tree (a structure tree) and perform matching within a small range using a supervised data matching model. This effectively prevents highly similar operation names from interfering with the correct match and greatly improves matching accuracy.
Unifying and standardizing the value domain data (that is, operation names) of a large value domain is a prominent problem in large value domain matching; its defining characteristic is the very large number of national standard operation names. The above embodiments combine two means to match operation names. First, the keywords, characters, words, and positional relations of the value domain data, together with the correlations among them, provide an effective vectorization scheme for deep learning. Second, the operation names are structurally divided according to parts of the human or animal body to form the value domain classification tree, and value domain matching is carried out at each layer until a highly accurate final matching result is obtained. It can be understood that, because the historical operation name or the operation name to be matched is mapped to each layer of the value domain classification tree when samples are obtained, the accuracy of the matching result is greatly improved.
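The layer-by-layer narrowing can be illustrated with the following sketch. It assumes that a node has already been predicted for each layer and only shows how those predictions shrink the candidate set, so that similarly worded names from other body parts drop out. The candidate names, the tree paths, and `narrow_by_layers` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical standard names with their tree paths (layer 1, layer 2).
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "partial nephrectomy": ("urinary system", "kidney"),
    "partial hepatectomy": ("digestive system", "liver"),
    "partial gastrectomy": ("digestive system", "stomach"),
}


def narrow_by_layers(candidates: Dict[str, Tuple[str, ...]],
                     predicted_nodes: Sequence[str]) -> List[str]:
    """Keep only names whose path agrees with the predicted node in each layer."""
    remaining = list(candidates)
    for level, node in enumerate(predicted_nodes):
        remaining = [name for name in remaining
                     if candidates[name][level] == node]
    return remaining


REMAINING = narrow_by_layers(CANDIDATES, ["digestive system", "liver"])
print(REMAINING)  # ['partial hepatectomy']
```

Here all three names share the word "partial", yet the predicted nodes alone leave a single candidate.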
Optionally, the data processing module 602 specifically includes:
a second basic feature obtaining subunit, configured to split the operation name to be matched and obtain basic features of m dimensions corresponding to the operation name to be matched;
a second feature relationship feature value obtaining subunit, configured to obtain a feature relationship feature value t_k of the kth dimension of the operation name to be matched by using the basic features of the m dimensions corresponding to the operation name to be matched, where k = 1, 2, …, m;
a second category vector obtaining subunit, configured to obtain department information and/or registration information for the operation name to be matched from the HIS, and to obtain a category vector of the operation name to be matched by using that department information and/or registration information;
a to-be-matched feature vector group obtaining subunit, configured to obtain the feature vector group to be matched by using the feature relationship feature values t_1, t_2, …, t_m of the m dimensions of the operation name to be matched and the category vector of the operation name to be matched.
As described above, the apparatus obtains the feature vector group to be matched in the same way as the sample feature vector group, so the trained data matching model adapts well to the feature vector groups input in practical applications and can output matching results whose quality is close to that of the labels of the sample feature vector group.
Based on the value range data matching method and device provided by the foregoing embodiments, the embodiments of the present application further provide a computer-readable storage medium.
The storage medium stores a program, and the program, when executed by a processor, implements some or all of the steps in the value range data matching method as claimed in the foregoing method embodiments of the present application.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Based on the value range data matching method, device and storage medium provided by the foregoing embodiments, the embodiments of the present application provide a processor. The processor is configured to run a program, and the program, when run, executes some or all of the steps of the value range data matching method described in the foregoing method embodiments.
Based on the storage medium and the processor provided by the foregoing embodiments, the present application also provides a value domain data matching device.
Referring to fig. 9, the diagram is a hardware structure diagram of the value range data matching device provided in this embodiment.
As shown in fig. 9, the value range data matching apparatus includes: memory 901, processor 902, communication bus 903, and communication interface 904.
The memory 901 stores a program that can be executed on the processor, and when the program is executed, some or all of the steps in the value range data matching method provided in the foregoing method embodiments of the present application are implemented. The memory 901 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
In this device, the processor 902 and the memory 901 exchange signaling, logic instructions, and the like through the communication bus 903. The device can communicate and interact with other devices via the communication interface 904.
The trained data matching model can match a non-national-standard operation name to a national standard operation name, determine where that national standard operation name is classified in the value domain classification tree, and express that classification through the node index values. Therefore, when the above method is executed by a program, the matching result obtained with the data matching model can be presented in the same or a similar form as the label, so that the national standard operation name matched by the operation name to be matched can be looked up by index from the matching result, and the classification of the matched national standard operation name in the value domain classification tree can be determined.
In the present application, value domain data is matched automatically with a pre-trained data matching model, which saves labor and improves matching efficiency compared with manual matching. In addition, because the value domain classification tree is divided according to parts of the human or animal body, even if the operation name to be matched resembles other non-national-standard operation names, the node index values effectively distinguish it from operation names that are similar in wording but refer to a different part (that is, belong to a different classification), thereby avoiding matching errors. Therefore, compared with matching schemes based on fuzzy query and word segmentation comparison, the present application improves the interference resistance of matching and thus the matching accuracy of value domain data.
It should be noted that the embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiment descriptions may be consulted. The apparatus and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement the solution without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A value domain data matching method, comprising:
obtaining value domain data to be matched;
processing the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched;
obtaining a matching result by using a data matching model and the feature vector group to be matched; the data matching model is obtained by training in advance with a labeled sample feature vector group; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and a node index value corresponding to the national standard operation name in each layer of a value domain classification tree; the value domain classification tree is a structure tree that classifies national standard operation names according to parts of the human or animal body.
2. The method according to claim 1, wherein obtaining the data matching model specifically comprises:
classifying a plurality of national standard operation names included in the international disease classification standard according to the parts of the human body or the animal body to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
obtaining, from a hospital information system (HIS), the historical operation names and the correspondence between the historical operation names and national standard operation names;
processing the historical operation name to obtain the sample feature vector group; obtaining the label by using the historical operation name, the correspondence, and the value domain classification tree;
and training a model to be trained by using the labeled sample feature vector group, and, when a preset termination condition is met, stopping training and obtaining the data matching model.
3. The method according to claim 2, wherein the processing the historical operation name to obtain the sample feature vector group comprises:
splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name; obtaining a feature relationship feature value w_k of the kth dimension by using the basic features of the m dimensions, where k = 1, 2, …, m;
obtaining department information and/or registration information for the historical operation name from a hospital information system (HIS), and obtaining a category vector of the historical operation name by using the department information and/or registration information;
obtaining the sample feature vector group by using the feature relationship feature values w_1, w_2, …, w_m of the m dimensions and the category vector.
4. The method according to claim 3, wherein the obtaining a feature relationship feature value w_k of the kth dimension by using the basic features of the m dimensions specifically comprises:
obtaining a correlation score between the basic feature of the kth dimension and the basic features of the other dimensions by using the Pearson correlation formula, the Spearman correlation formula, or the chi-square test;
obtaining the feature relationship feature value w_k of the kth dimension by using a preset correlation coefficient and the correlation score.
5. The method according to claim 3, wherein the splitting the historical operation name to obtain basic features of m dimensions corresponding to the historical operation name specifically comprises:
splitting the historical operation name to obtain, for the historical operation name, a keyword, a target character, the characters before or after the target character within a preset character window, a target word, and the words before or after the target word within a preset word window.
6. The method according to any one of claims 3 to 5, wherein the processing the operation name to be matched in the value domain data to be matched to obtain the feature vector group to be matched specifically includes:
splitting the operation name to be matched to obtain basic features of m dimensions corresponding to the operation name to be matched; obtaining a feature relationship feature value t_k of the kth dimension of the operation name to be matched by using the basic features of the m dimensions corresponding to the operation name to be matched, where k = 1, 2, …, m;
obtaining department information and/or registration information for the operation name to be matched from the HIS, and obtaining a category vector of the operation name to be matched by using that department information and/or registration information;
obtaining the feature vector group to be matched by using the feature relationship feature values t_1, t_2, …, t_m of the m dimensions of the operation name to be matched and the category vector of the operation name to be matched.
7. A value domain data matching apparatus, comprising:
the data acquisition module is used for acquiring value domain data to be matched;
the data processing module is used for processing the operation name to be matched in the value domain data to be matched to obtain a feature vector group to be matched;
the data matching module is used for obtaining a matching result by using a data matching model and the feature vector group to be matched; the data matching model is obtained by training in advance with a labeled sample feature vector group; the label comprises a name index value of a national standard operation name corresponding to the historical operation name and a node index value corresponding to the national standard operation name in each layer of a value domain classification tree; the value domain classification tree is a structure tree that classifies national standard operation names according to parts of the human or animal body.
8. The apparatus of claim 7, further comprising a model training module, wherein the model training module specifically comprises:
a value domain classification tree obtaining unit, configured to classify a plurality of national standard operation names included in the international disease classification standard according to a part of a human or animal body, so as to obtain a value domain classification tree; each layer in the value domain classification tree at least comprises one node;
an operation name obtaining unit, configured to obtain the historical operation name;
a correspondence obtaining unit, configured to obtain the correspondence between the historical operation name and the national standard operation name;
a sample feature vector group obtaining unit, configured to process the historical operation name to obtain the sample feature vector group;
a label obtaining unit, configured to obtain the label by using the historical operation name, the correspondence, and the value domain classification tree;
and a model training unit, configured to train a model to be trained by using the labeled sample feature vector group, and, when a preset termination condition is met, to stop training and obtain the data matching model.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a value range data matching method according to any one of claims 1 to 6.
10. A processor for running a computer program, which when running performs the value range data matching method of any one of claims 1 to 6.
CN201911222384.0A 2019-12-03 2019-12-03 Value range data matching method and device and related products Active CN111128388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911222384.0A CN111128388B (en) 2019-12-03 2019-12-03 Value range data matching method and device and related products

Publications (2)

Publication Number Publication Date
CN111128388A true CN111128388A (en) 2020-05-08
CN111128388B CN111128388B (en) 2024-02-27

Family

ID=70497399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911222384.0A Active CN111128388B (en) 2019-12-03 2019-12-03 Value range data matching method and device and related products

Country Status (1)

Country Link
CN (1) CN111128388B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818085A (en) * 2021-01-28 2021-05-18 东软集团股份有限公司 Value range data matching method and device, storage medium and electronic equipment
CN113656467A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Search result sorting method and device and electronic equipment
CN113656467B (en) * 2021-08-20 2023-07-25 北京百度网讯科技有限公司 Method and device for sorting search results and electronic equipment
CN113925607A (en) * 2021-11-12 2022-01-14 上海微创医疗机器人(集团)股份有限公司 Operation training method, device, system, medium and equipment for surgical robot
CN113925607B (en) * 2021-11-12 2024-02-27 上海微创医疗机器人(集团)股份有限公司 Operation robot operation training method, device, system, medium and equipment

Also Published As

Publication number Publication date
CN111128388B (en) 2024-02-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant