CN113742348B - Patient data matching method in CDR system, main index establishing method and device - Google Patents


Info

Publication number
CN113742348B
Authority
CN
China
Legal status
Active
Application number
CN202111045885.3A
Other languages
Chinese (zh)
Other versions
CN113742348A (en)
Inventor
刘新辉
张勇斌
Current Assignee
Shanghai Clinbrain Information Technology Co Ltd
Original Assignee
Shanghai Clinbrain Information Technology Co Ltd
Application filed by Shanghai Clinbrain Information Technology Co Ltd
Priority to CN202111045885.3A
Publication of CN113742348A
Application granted
Publication of CN113742348B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a patient data matching method in a CDR system, a main index establishing method, and a main index establishing device. The patient data matching method in the CDR system comprises the following steps: acquiring data to be matched and confirmed data; sequentially acquiring at least two similarities for each piece of confirmed data based on at least two combinations of the matching fields; and judging, based on all the similarities, whether the matching is successful, and obtaining the confirmed data that matches the data to be matched. Based on this matching method, a patient main index can be constructed, through which all historical visit records of a patient can be obtained to assist disease diagnosis and medical research. This solves the prior-art problem that the business systems of a hospital lack a uniform patient identifier. In addition, because matching is performed over multiple rounds, the matching results are more reliable and complex data conditions can be handled.

Description

Patient data matching method in CDR system, main index establishing method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method for matching patient data in a CDR system, a method for establishing a main index, and a corresponding device.
Background
At present, hospitals run many information systems whose patient identifiers are inconsistent with one another. Records therefore cannot be associated or cross-indexed to retrieve related information, information silos form easily, medical data resources cannot be fully utilized, and the consistency and integrity of patient information across the systems are poor.
In summary, the prior art suffers from the problem that there is no uniform patient identifier across the business systems.
Disclosure of Invention
The invention aims to provide a patient data matching method, a main index establishing method, and a device in a CDR system, so as to solve the problem in the prior art that the business systems of a hospital lack a uniform patient identifier.
To solve the above technical problem, a first aspect of the present invention provides a patient data matching method in a CDR system, including:
acquiring data to be matched and confirmed data, wherein both the data to be matched and the confirmed data comprise matching fields;
based on the ith combination of the matching fields, sequentially acquiring the ith similarity between the data to be matched and each piece of confirmed data;
judging, based on all the ith similarities, whether the matching is successful, and if so, obtaining, based on all the ith similarities, one piece of confirmed data that matches the data to be matched;
wherein i takes every integer value from 1 to n, and n is an integer greater than 1.
Optionally, the step of judging whether the matching is successful based on all the ith similarities includes: if, for every piece of confirmed data, the ith similarity is less than or equal to an ith threshold for every i, the matching fails; otherwise, the matching is successful;
Or alternatively,
the step of judging whether the matching is successful based on all the ith similarities includes: if, for every piece of confirmed data, the sum of its ith similarities over all i is less than a preset threshold, the matching fails; otherwise, the matching is successful.
Optionally, the step of obtaining one piece of confirmed data that matches the data to be matched based on all the ith similarities includes: selecting the confirmed data with the largest sum of all its ith similarities.
Optionally, the step of obtaining one piece of confirmed data that matches the data to be matched based on all the ith similarities includes:
if the ith set contains at least one piece of confirmed data whose ith similarity is greater than an ith threshold, and i is less than n, the confirmed data in the ith set whose ith similarity is greater than the ith threshold form the (i+1)th set, and the judgment is repeated for the (i+1)th set;
otherwise, selecting the confirmed data in the ith set with the largest sum of all its ith similarities, or selecting the confirmed data in the ith set with the largest ith similarity;
wherein the 1st set consists of all the confirmed data.
Optionally, the step of obtaining the ith similarity between the data to be matched and the confirmed data includes:
sequentially obtaining the similarity value corresponding to each matching field in the ith combination, and taking the weighted average of these similarity values, based on the ith weighting parameters, as the ith similarity.
Optionally, each matching field in the data to be matched stores only one attribute value, while each matching field in the confirmed data stores one or more attribute values; the step of obtaining the similarity value corresponding to a matching field includes:
calculating the similarity between the attribute value in the data to be matched and each attribute value stored in the corresponding matching field of the confirmed data, and taking the weighted average of the results as the similarity value.
Optionally, the 1st combination includes a name field, a gender field, and an ID card number field, where the 1st weighting parameter corresponding to the ID card number field is greater than 0.5.
Optionally, the matching fields include a name field, and the similarity value corresponding to the name field is calculated according to the following formula:
$$\mathrm{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
where similarity denotes the similarity value, ED_AB denotes the edit distance between A and B, max() denotes the maximum operation, L_A and L_B denote the string lengths of A and B, A denotes the attribute value stored in the name field of the data to be matched, and B denotes the attribute value stored in the name field of the confirmed data.
To solve the above technical problem, a second aspect of the present invention provides a patient main index establishing method in a CDR system, including:
acquiring original data from at least two service systems, wherein the original data comprises matching fields;
classifying the original data into first data and second data based on a cleaning rule;
generating confirmed data from the first data based on a merging rule, wherein the confirmed data comprises the matching fields and a main index field;
configuring the second data as data to be matched, and obtaining a matching result for the data to be matched by the above patient data matching method in the CDR system;
if the matching is successful, merging the current data to be matched into the matched confirmed data;
if the matching fails, generating temporary index data from the current data to be matched.
Optionally, the matching fields include a name field and an ID card field, and the merging rule includes:
judging whether the ID card field of the current first data is equal to that of a piece of confirmed data, and whether the name field of the current first data is equal to that of the same confirmed data;
if both fields are equal, merging the current first data into the current confirmed data;
otherwise, converting the current first data on its own into a new piece of confirmed data.
Optionally, each matching field in the first data stores only one attribute value, and each matching field in the confirmed data stores one or more attribute values; the step of judging whether a matching field of the first data is equal to that of the confirmed data includes:
if the attribute value of the matching field of the first data is null, the result is judged to be unequal;
if the attribute value of the matching field of the first data is not null and is not equal to any attribute value of the matching field of the confirmed data, the result is judged to be unequal;
if the attribute value of the matching field of the first data is not null and is equal to one of the attribute values of the matching field of the confirmed data, the result is judged to be equal.
Optionally, the step of merging data to be merged into the confirmed data includes:
examining each matching field of the data to be merged and the confirmed data in turn; if the attribute value of the matching field of the data to be merged is not null and is not equal to any attribute value of the corresponding matching field of the confirmed data, storing that attribute value into the matching field of the confirmed data;
wherein the data to be merged includes the first data and the data to be matched.
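The equality rule, the merging rule, and the field-merging step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names `name` and `id_card`, the dictionary record layout, and the helper names `field_equal`, `merge_into`, and `apply_merge_rule` are all hypothetical, and the main index field is left out because its generation rule is not specified.

```python
from typing import Optional

# Hypothetical layouts: first data holds one value (or None) per field,
# confirmed data holds a list of one or more values per field.
SourceRecord = dict[str, Optional[str]]
ConfirmedRecord = dict[str, list[str]]


def field_equal(source: SourceRecord, confirmed: ConfirmedRecord, field: str) -> bool:
    """Equality rule: a null value is never equal; otherwise equal if it
    matches any one of the values stored in the confirmed record."""
    value = source.get(field)
    if value is None:
        return False
    return value in confirmed.get(field, [])


def merge_into(source: SourceRecord, confirmed: ConfirmedRecord) -> None:
    """Field merging: store each non-null value the confirmed record lacks."""
    for field, value in source.items():
        if value is None:
            continue
        stored = confirmed.setdefault(field, [])
        if value not in stored:
            stored.append(value)


def apply_merge_rule(source: SourceRecord, confirmed_list: list[ConfirmedRecord]) -> ConfirmedRecord:
    """Merging rule: merge into the first confirmed record equal on both the
    ID card field and the name field; otherwise create a new confirmed record."""
    for confirmed in confirmed_list:
        if field_equal(source, confirmed, "id_card") and field_equal(source, confirmed, "name"):
            merge_into(source, confirmed)
            return confirmed
    new_record: ConfirmedRecord = {f: [v] for f, v in source.items() if v is not None}
    confirmed_list.append(new_record)
    return new_record
```

Because a null value never compares equal, first data with an empty ID card field always becomes a new piece of confirmed data under this rule.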
To solve the above technical problem, a third aspect of the present invention provides a patient main index establishing device, which includes a matching module configured to execute the above patient data matching method.
Compared with the prior art, the patient data matching method, the main index establishing method, and the device in the CDR system of the present invention offer the following advantages. The patient data matching method acquires data to be matched and confirmed data; sequentially acquires at least two similarities for each piece of confirmed data based on at least two combinations of the matching fields; judges, based on all the similarities, whether the matching is successful; and obtains the confirmed data that matches the data to be matched. On this basis, a patient main index can be constructed, through which all historical visit records of a patient can be obtained to assist disease diagnosis and medical research, solving the prior-art problem that the business systems of a hospital lack a uniform patient identifier. In addition, multi-round matching makes the matching results more reliable and allows complex data conditions to be handled.
Drawings
Those of ordinary skill in the art will appreciate that the figures are provided for a better understanding of the present invention and do not limit its scope. In the figures:
FIG. 1 is a flow chart of a patient data matching method in a CDR system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for establishing a patient primary index in a CDR system according to an embodiment of the present invention.
Detailed Description
To make the objects, advantages, and features of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that the drawings are highly simplified and not drawn to scale; they serve only to describe the embodiments of the invention conveniently and clearly. Furthermore, the structures shown in the drawings are often only part of the actual structures, and different drawings may place their emphasis differently in order to illustrate the various embodiments.
As used in this disclosure, the singular forms "a," "an," and "the" include plural referents; the term "or" is generally used in the sense of "and/or"; the term "several" is generally used in the sense of "at least one"; and the term "at least two" is generally used in the sense of "two or more." The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of the indicated features; thus, a feature defined with "first," "second," or "third" may explicitly or implicitly include one or at least two such features. The terms "one end," "another end," "proximal end," and "distal end" generally refer to the corresponding two portions, including not only the endpoints. The terms "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. Furthermore, as used in this disclosure, an element disposed on another element generally means only that there is a connection, coupling, cooperation, or transmission between the two elements, which may be direct or indirect through intermediate elements; it should not be construed as indicating or implying any spatial relationship between the two elements, i.e., one element may be in any orientation relative to the other, such as inside, outside, above, below, or on one side of it, unless the context clearly indicates otherwise.
The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention provides a patient data matching method, a main index establishing method, and a device in a CDR (Clinical Data Repository, i.e. clinical data center) system, which are used to solve the problem in the prior art that the business systems of a hospital lack a uniform patient identifier.
The following description refers to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a flowchart of a patient data matching method in a CDR system according to an embodiment of the present invention, and FIG. 2 is a flowchart of a method for establishing a patient primary index in a CDR system according to an embodiment of the present invention.
As shown in FIG. 1, this embodiment provides a patient data matching method in a CDR system, including:
S100, acquiring data to be matched and confirmed data, wherein both the data to be matched and the confirmed data comprise matching fields;
S200, based on the ith combination of the matching fields, sequentially acquiring the ith similarity between the data to be matched and each piece of confirmed data;
S300, judging, based on all the ith similarities, whether the matching is successful, and if so, obtaining, based on all the ith similarities, one piece of confirmed data that matches the data to be matched;
wherein i is an integer from 1 to n, and n is an integer greater than 1.
In step S100, "comprising a matching field" should be understood as follows. Suppose one of the matching fields is named "F", and the attribute value of the "F" field in a certain piece of data (the data to be matched or the confirmed data, referred to here simply as data) is null. Depending on the specification or standard used, the piece of data may contain a string such as "F: NULL" or "F: ;", or it may contain no string representing the "F" field at all. Nevertheless, as long as any piece of data processed by the method contains a string representing the "F" field, or the method used to parse, read, or post-process the data refers to an "F" field, every piece of data is understood to comprise the "F" field.
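As a minimal sketch of this convention, assuming records arrive as Python dictionaries, a reader function can treat an absent key and a textual null in exactly the same way, so every record behaves as if it comprised the "F" field. The token set `NULL_TOKENS` and the function `read_field` are hypothetical; real source systems may use other null spellings.

```python
from typing import Optional

# Hypothetical spellings of an empty value; the text only mentions forms
# such as "F: NULL" and "F: ;".
NULL_TOKENS = {"", "NULL", ";"}


def read_field(record: dict, field: str) -> Optional[str]:
    """Return the field's value, or None for a missing key or any null spelling."""
    raw = record.get(field)
    if raw is None:
        return None
    text = str(raw).strip()
    return None if text.upper() in NULL_TOKENS else text


print(read_field({"F": "NULL"}, "F"), read_field({}, "F"))  # None None
```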
In step S200, n calculations are performed in total: the ith calculation uses the ith combination and yields the ith similarity. The k1th calculation and the k2th calculation differ in that the k1th and k2th combinations are different and/or the calculation methods are different, where k1 ≠ k2 and both k1 and k2 range from 1 to n.
In step S300, the schemes for judging whether the matching is successful and for obtaining the matched data are explained in the following description of this embodiment; those skilled in the art can also understand the protection scope of the technical solution of the present invention by modifying the specific schemes provided here.
With this configuration, multiple rounds of fuzzy matching allow the data to be matched to find the closest matching data, which facilitates the subsequent establishment of a global index value. This is the core method for solving the problem that the business systems of a hospital lack a uniform patient identifier.
In one embodiment, the step of judging whether the matching is successful based on all the ith similarities includes: if the ith similarity of every piece of confirmed data is less than or equal to the ith threshold (this condition must hold for every i from 1 to n), the matching fails; otherwise, the matching is successful.
Assume there are 3 pieces of confirmed data, numbered 1, 2, and 3, that n is 3, and that the 1st, 2nd, and 3rd thresholds are all 0.9. The similarities between the data to be matched and the confirmed data are shown in Table 1.
Table 1 Similarities of the matching data
Confirmed data No.   1st similarity   2nd similarity   3rd similarity
1                    0.8              0.7              0.6
2                    0.6              0.7              0.8
3                    0.3              0.2              0.3
Since every similarity of every piece of confirmed data in Table 1 is less than the corresponding threshold of 0.9, the matching is considered to have failed.
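The per-round failure test can be checked against Table 1 with a short sketch. The function name and the list-of-rows layout (one row per piece of confirmed data, one column per round) are illustrative assumptions.

```python
def match_succeeds_by_round(sims: list[list[float]], thresholds: list[float]) -> bool:
    """sims[r][k] is the (k+1)th similarity of confirmed record r.

    The match fails only when every similarity of every record is <= the
    threshold of its round; a single value above its threshold means success.
    """
    return any(s > t for row in sims for s, t in zip(row, thresholds))


table1 = [[0.8, 0.7, 0.6], [0.6, 0.7, 0.8], [0.3, 0.2, 0.3]]
print(match_succeeds_by_round(table1, [0.9, 0.9, 0.9]))  # False: matching fails
```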
In another embodiment, the step of judging whether the matching is successful based on all the ith similarities includes: if, for every piece of confirmed data, the sum of its ith similarities over all i is less than a preset threshold, the matching fails; otherwise, the matching is successful.
Assume there are 3 pieces of confirmed data, numbered 1, 2, and 3, that n is 3, and that the preset threshold is 2.7. The similarities between the data to be matched and the confirmed data are shown in Table 2.
Table 2 Similarities of the matching data
Confirmed data No.   1st similarity   2nd similarity   3rd similarity
1                    0.8              0.7              0.6
2                    0.6              0.7              0.8
3                    0.3              0.2              0.3
Since the similarity sums of the confirmed data in Table 2 (2.1, 2.1, and 0.8) are all less than 2.7, the matching is considered to have failed.
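A matching sketch for the sum-based test, again with a hypothetical function name and one row of similarities per piece of confirmed data:

```python
def match_succeeds_by_sum(sims: list[list[float]], preset: float) -> bool:
    """Fails only when every record's similarity sum is below the preset threshold."""
    return any(sum(row) >= preset for row in sims)


table2 = [[0.8, 0.7, 0.6], [0.6, 0.7, 0.8], [0.3, 0.2, 0.3]]
print(match_succeeds_by_sum(table2, 2.7))  # False: sums are about 2.1, 2.1 and 0.8
```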
In one embodiment, the step of obtaining one piece of confirmed data that matches the data to be matched based on all the ith similarities includes: selecting the confirmed data with the largest sum of all its ith similarities.
Assume there are 3 pieces of confirmed data, numbered 1, 2, and 3, and that n is 3. The similarities between the data to be matched and the confirmed data are shown in Table 3.
Table 3 Similarities of the matching data
Confirmed data No.   1st similarity   2nd similarity   3rd similarity
1                    0.8              0.7              0.8
2                    0.6              0.7              0.8
3                    0.3              0.2              0.3
Since the similarity sum of confirmed data No. 1 in Table 3 is 2.3 (0.8 + 0.7 + 0.8), which is the largest, confirmed data No. 1 is selected as the matching data.
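The max-sum selection can be reproduced on Table 3 as follows. This is a sketch with a hypothetical function name; records are keyed by their confirmed data number.

```python
def select_max_sum(sims: dict[int, list[float]]) -> int:
    """Return the number of the confirmed record whose similarities sum highest.

    On an exact tie, max() keeps the first record in iteration order.
    """
    return max(sims, key=lambda rid: sum(sims[rid]))


table3 = {1: [0.8, 0.7, 0.8], 2: [0.6, 0.7, 0.8], 3: [0.3, 0.2, 0.3]}
print(select_max_sum(table3))    # 1
print(round(sum(table3[1]), 2))  # 2.3
```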
It should be understood that, although the sum for data No. 1 in Table 3 is less than the preset threshold of 2.7 used in the previous example, this embodiment does not prescribe the condition under which the matching is judged to fail. For example, this embodiment may treat a sum greater than 1 as a successful match, may set the 1st to 3rd thresholds all to 0.5, or may use other schemes. The example here only illustrates the criterion for selecting the matching data, not the criterion for judging whether the matching is successful.
It should be understood that when at least two pieces of data satisfy the condition (for example, two pieces of data share the same maximum value), one of them is selected according to an additional rule: for example, at random, by choosing the piece with the earliest creation time, or by some other comprehensive judgment. In most cases, no two pieces of confirmed data will have exactly the same largest sum of all ith similarities; the additional rule exists only to prevent program errors, so a simple rule suffices. Similar situations elsewhere in this description should be understood in the same way.
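One of the tie-breaking options listed above, earliest creation time, might look like the sketch below. The creation dates are invented for illustration and the function name is hypothetical; the text allows other tie-breaking rules equally.

```python
import math
from datetime import date


def select_with_tiebreak(sims: dict[int, list[float]], created: dict[int, date]) -> int:
    """Pick the largest similarity sum; break ties by earliest creation date."""
    totals = {rid: sum(row) for rid, row in sims.items()}
    best = max(totals.values())
    # Sums of the same numbers in a different order can differ in the last
    # floating-point bit, so compare with a tolerance instead of ==.
    tied = [rid for rid, total in totals.items() if math.isclose(total, best)]
    return min(tied, key=lambda rid: created[rid])


table2 = {1: [0.8, 0.7, 0.6], 2: [0.6, 0.7, 0.8], 3: [0.3, 0.2, 0.3]}
created = {1: date(2021, 3, 1), 2: date(2020, 6, 1), 3: date(2019, 1, 1)}
print(select_with_tiebreak(table2, created))  # 2: tied with No. 1 but created earlier
```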
In another embodiment, the step of obtaining one piece of confirmed data that matches the data to be matched based on all the ith similarities includes:
S301, if the ith set contains at least one piece of confirmed data whose ith similarity is greater than the ith threshold, and i is less than n, the confirmed data in the ith set whose ith similarity is greater than the ith threshold form the (i+1)th set, and the judgment is repeated for the (i+1)th set;
S302, otherwise, selecting the confirmed data in the ith set with the largest sum of all its ith similarities;
wherein the 1st set consists of all the confirmed data.
Assume there are 5 pieces of confirmed data, numbered 1, 2, 3, 4, and 5, that n is 3, and that the 1st, 2nd, and 3rd thresholds are 0.6, 0.75, and 0.65, respectively. The similarities between the data to be matched and the confirmed data are shown in Table 4.
Table 4 Similarities of the matching data
In round 1, the 1st similarity of confirmed data No. 3 is 0.6, which is not greater than the 1st threshold, so the 2nd set consists of confirmed data Nos. 1, 2, 4, and 5; after round 2, the 3rd set consists of confirmed data Nos. 4 and 5. When the 3rd set is judged, i equals n, so data No. 4 is selected by the rule "select the confirmed data in the ith set with the largest sum of all its ith similarities". As this example shows, although data No. 3 has the largest sum of all similarities, it is not the data that is finally matched.
In another example, the 2nd threshold is 0.8 and all other conditions are the same as in the previous example. In this case, no data in the 2nd set has a 2nd similarity greater than the 2nd threshold, so the data in the 2nd set with the largest sum of all similarities, namely data No. 1, is selected.
The core idea of this logic is a selection mechanism similar to a knockout tournament: if a piece of confirmed data has a low similarity in a given round, it is eliminated from the candidate list.
In still another embodiment, the step of obtaining one piece of confirmed data that matches the data to be matched based on all the ith similarities includes:
S301, if the ith set contains at least one piece of confirmed data whose ith similarity is greater than the ith threshold, and i is less than n, the confirmed data in the ith set whose ith similarity is greater than the ith threshold form the (i+1)th set, and the judgment is repeated for the (i+1)th set;
S302, otherwise, selecting the confirmed data in the ith set with the largest ith similarity;
wherein the 1st set consists of all the confirmed data.
The main idea of this embodiment is essentially the same as that of the previous one, except that the data with the largest ith similarity is selected at the end; the specific implementation can be understood with reference to the previous embodiment.
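The knockout selection of S301/S302, in both variants, can be sketched as follows. Two assumptions are worth flagging: "the sum of all the ith similarities" is read as the sum over all n rounds, and the similarity values below are made up (they are not the values of Table 4) but are chosen so that the run reproduces the outcomes described in the text. The function name `knockout_select` and the `final` flag are hypothetical.

```python
def knockout_select(sims: dict[int, list[float]], thresholds: list[float],
                    final: str = "sum") -> int:
    """Multi-round elimination (S301/S302).

    final="sum": pick the largest sum of all n similarities (first variant).
    final="ith": pick the largest similarity of the current round (second variant).
    """
    n = len(thresholds)
    current = list(sims)  # the 1st set: all confirmed records
    i = 0                 # zero-based round index, i.e. round i + 1
    while True:
        survivors = [rid for rid in current if sims[rid][i] > thresholds[i]]
        if survivors and i < n - 1:  # S301: survivors form the next set
            current, i = survivors, i + 1
            continue
        # S302: choose from the current set itself, not from its survivors
        if final == "sum":
            return max(current, key=lambda rid: sum(sims[rid]))
        return max(current, key=lambda rid: sims[rid][i])


# Made-up similarities; No. 3 has the largest total but fails round 1.
sims = {
    1: [0.9, 0.7, 0.9],
    2: [0.7, 0.6, 0.5],
    3: [0.6, 1.0, 1.0],
    4: [0.8, 0.8, 0.7],
    5: [0.7, 0.78, 0.6],
}
print(knockout_select(sims, [0.6, 0.75, 0.65]))  # 4
print(knockout_select(sims, [0.6, 0.8, 0.65]))   # 1
```

With the 2nd threshold raised to 0.8, nobody survives round 2, so the choice falls back to the whole 2nd set; the `final="ith"` variant would pick No. 4 there instead, because it has the largest 2nd similarity.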
Further, the step of obtaining the ith similarity between the data to be matched and the confirmed data includes:
sequentially obtaining the similarity value corresponding to each matching field in the ith combination, and taking the weighted average of these values, based on the ith weighting parameters, as the ith similarity.
For example, the ith combination includes the matching fields "C", "D", and "E", whose ith weighting parameters are 0.2, 0.3, and 0.5, respectively, and whose similarity values are 0.7, 0.5, and 0.8, respectively. The final similarity is then 0.2×0.7 + 0.3×0.5 + 0.5×0.8 = 0.69.
It should be understood that the weighting parameter of the same matching field may differ for different values of i.
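The weighted average and the resulting vector of 1st to nth similarities can be sketched as below. For simplicity the sketch assumes every combination reuses the same per-field similarity values, although the text allows the calculation method to differ between rounds; the second weight set is hypothetical.

```python
def combination_similarity(field_sims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-field similarity values for one combination."""
    total = sum(weights.values())
    return sum(w * field_sims[field] for field, w in weights.items()) / total


def similarity_vector(field_sims: dict[str, float],
                      combinations: list[dict[str, float]]) -> list[float]:
    """The 1st..nth similarities; each combination carries its own weights."""
    return [combination_similarity(field_sims, w) for w in combinations]


field_sims = {"C": 0.7, "D": 0.5, "E": 0.8}
ith_weights = {"C": 0.2, "D": 0.3, "E": 0.5}
print(round(combination_similarity(field_sims, ith_weights), 2))  # 0.69
```

Dividing by the weight total keeps the result a true average even if a configuration's weights do not add up to exactly 1.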
Further, each matching field in the data to be matched stores only one attribute value, while each matching field in the confirmed data stores one or more attribute values; the step of obtaining the similarity value corresponding to a matching field includes:
calculating the similarity between the attribute value in the data to be matched and each attribute value stored in the corresponding matching field of the confirmed data, and taking the weighted average of the results as the similarity value.
When the matching fields are "C", "D", and "E", one possible form of the data to be matched is shown in Table 5.
Table 5 Example form of the data to be matched
Field name        C   D   E
Attribute value   3   8   6
One possible form of the confirmed data is shown in Table 6.
Table 6 Example form of the confirmed data
In the piece of confirmed data shown in Table 6, the "C" field stores the attribute values 3, 4, and 7.
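A sketch of the per-field similarity for a multi-valued field follows, using the "C" field values given above (3 in the data to be matched; 3, 4 and 7 in the confirmed data). The exact-match comparison `exact`, the equal default weights, and the zero result for an empty list are all assumptions; the text does not specify the per-value comparison or the weights.

```python
from typing import Callable, Optional, Sequence


def exact(a: str, b: str) -> float:
    """Placeholder per-value comparison: 1.0 on equality, else 0.0."""
    return 1.0 if a == b else 0.0


def field_similarity(query: str, stored: Sequence[str],
                     compare: Callable[[str, str], float] = exact,
                     weights: Optional[Sequence[float]] = None) -> float:
    """Compare the single query value with every stored value and average."""
    if not stored:
        return 0.0  # assumption: nothing stored gives no evidence of a match
    scores = [compare(query, value) for value in stored]
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: equal weights
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


print(field_similarity("3", ["3", "4", "7"]))                              # about 0.333
print(field_similarity("3", ["3", "4", "7"], weights=[0.5, 0.25, 0.25]))  # 0.5
```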
It should be understood that, in actual business use, the data to be matched and the confirmed data also contain other business-related fields; this disclosure does not limit how the non-matching fields are stored.
In a preferred embodiment, the 1st combination includes a name field, a gender field, and an ID card number field, and the 1st weighting parameter of the ID card number field is greater than 0.5. For example, the 1st weighting parameters of the name field, the gender field, and the ID card number field are 0.1, 0.1, and 0.8, respectively. This configuration makes the calculated 1st similarity more discriminative.
In some embodiments, the 2nd combination may include a contact phone field whose 2nd weighting parameter is greater than 0.5, and the 3rd combination may include a home address field whose 3rd weighting parameter is greater than 0.5; the other fields of the 2nd and 3rd combinations may be set according to different requirements.
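The three combinations described above could be held as plain configuration and checked before use. In this sketch, the 1st combination uses the example weights from the text, while the field names and the weights of the 2nd and 3rd combinations are hypothetical values that only respect the "greater than 0.5" constraint; the requirement that weights sum to 1 is likewise an assumption, not part of the text.

```python
# 1st combination: example weights from the text. 2nd and 3rd: hypothetical.
COMBINATIONS = [
    {"name": 0.1, "gender": 0.1, "id_card": 0.8},
    {"phone": 0.6, "name": 0.2, "gender": 0.2},
    {"address": 0.6, "name": 0.2, "gender": 0.2},
]
KEY_FIELDS = ["id_card", "phone", "address"]


def validate(combinations: list[dict[str, float]], key_fields: list[str]) -> None:
    """Raise ValueError unless each combination's weights sum to 1 and its
    key field carries a weight greater than 0.5."""
    for i, (weights, key) in enumerate(zip(combinations, key_fields), start=1):
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError(f"combination {i}: weights must sum to 1")
        if weights.get(key, 0.0) <= 0.5:
            raise ValueError(f"combination {i}: weight of {key!r} must exceed 0.5")


validate(COMBINATIONS, KEY_FIELDS)  # passes silently
```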
The matching fields include a name field, and the similarity value corresponding to the name field is calculated according to the following formula:
$$\mathrm{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
where similarity denotes the similarity value, ED_AB denotes the edit distance between A and B, max() denotes the maximum operation, L_A and L_B denote the string lengths of A and B, A denotes the attribute value stored in the name field of the data to be matched, and B denotes the attribute value stored in the name field of the confirmed data.
The edit distance, also called the Levenshtein distance, is the minimum number of editing operations required to convert one character string into another. The permitted editing operations are replacing one character with another, inserting a character, and deleting a character. The edit distance was first proposed by the Russian scientist Vladimir Levenshtein. This configuration solves the problem of calculating the similarity between name strings, and when two names are completely different, the result is 0, as expected.
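A minimal sketch of the name similarity follows. It assumes the formula is similarity = 1 - ED_AB / max(L_A, L_B), the form implied by the symbols defined above and by the statement that two completely different names score 0. The edit distance uses the standard two-row dynamic programming recurrence, and returning 1.0 for two empty names is an added assumption the text does not cover.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1 - ED(a, b) / max(len(a), len(b)); 1.0 for two empty names (assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


print(name_similarity("张三", "李四"))    # 0.0: completely different names
print(name_similarity("张三", "张三丰"))  # about 0.667: one insertion over length 3
```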
This embodiment provides a method for establishing a patient primary index in a CDR system. Referring to Fig. 2, the method includes:
S10, acquiring original data from at least two service systems, wherein the original data include matching fields;
S20, classifying the original data into first data and second data based on a cleaning rule;
S31, generating confirmed data from the first data based on a merging rule, wherein the confirmed data include the matching fields and a main index field;
S41, configuring the second data as data to be matched, and obtaining a matching result for the data to be matched using the patient data matching method described above;
S42, if the matching succeeds, merging the current data to be matched into the matched confirmed data;
S43, if the matching fails, generating temporary index data from the current data to be matched.
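Steps S10 to S43 can be sketched as a single Python function. This is an illustration under stated assumptions, not the patented implementation: the cleaning rule, the exact merging rule, the fuzzy matcher, and the merge operation are passed in as callables, and the `MPI000001`-style and `TMP000001`-style index formats are hypothetical, since the text leaves the main index generation rule open. Passing an existing `confirmed` list models incremental data; leaving it empty models the first run over stock data.

```python
from itertools import count
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Finder = Callable[[Record, List[Record]], Optional[Record]]


def build_master_index(raw: List[Record],
                       is_first_data: Callable[[Record], bool],
                       find_exact: Finder,
                       fuzzy_match: Finder,
                       merge: Callable[[Record, Record], None],
                       confirmed: Optional[List[Record]] = None,
                       ) -> Tuple[List[Record], List[Record]]:
    """Apply steps S20-S43 to raw records already gathered in step S10."""
    confirmed = [] if confirmed is None else confirmed  # empty on the first run
    temporary: List[Record] = []
    master_ids = count(len(confirmed) + 1)  # hypothetical sequential main index
    temp_ids = count(1)                     # hypothetical temporary index

    # S20: split the raw records using the cleaning rule.
    first = [r for r in raw if is_first_data(r)]
    second = [r for r in raw if not is_first_data(r)]

    # S31: merge first data by the exact rule, else copy it and add a main index.
    for rec in first:
        target = find_exact(rec, confirmed)
        if target is not None:
            merge(rec, target)
        else:
            confirmed.append({**rec, "master_index": f"MPI{next(master_ids):06d}"})

    # S41-S43: fuzzy-match second data against the confirmed data.
    for rec in second:
        target = fuzzy_match(rec, confirmed)
        if target is not None:
            merge(rec, target)  # S42
        else:
            temporary.append({**rec, "temp_index": f"TMP{next(temp_ids):06d}"})  # S43
    return confirmed, temporary
```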
In Fig. 2, "precise data" denotes the first data, "fuzzy data" denotes the second data, and "fuzzy matching" denotes the data matching method. "CDR system data" in Fig. 2 corresponds to step S10, i.e., data originating from at least two service systems. "Merging into precise data" corresponds to step S42; the precise data here are the confirmed data, which means that a master index has already been generated for them.
Stock (existing) data and incremental data are processed by essentially the same flow. The only difference is that when the stock data are processed (that is, when the patient primary index establishing method in the CDR system is run for the first time), the confirmed data are initially empty, whereas when incremental data are processed, some confirmed data already exist.
The generation rule of the main index field can be set according to actual requirements and is not described herein. In step S20, the cleaning rule may likewise be set according to actual needs. In one embodiment, records whose name field and ID card field are both null are classified as second data, and all remaining records are classified as first data. Other rules may be used in other embodiments.
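As an example of the cleaning rule in step S20, the following sketch classifies records whose name and ID card fields are both null as second data and everything else as first data. The field names `name` and `id_card`, and treating blank strings as null, are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def is_null(value: Optional[str]) -> bool:
    """Treat None and blank strings as null (assumption)."""
    return value is None or not value.strip()


def classify(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Example cleaning rule: records whose name AND ID card fields are both
    null become second data; all other records become first data."""
    first: List[Record] = []
    second: List[Record] = []
    for rec in records:
        if is_null(rec.get("name")) and is_null(rec.get("id_card")):
            second.append(rec)
        else:
            first.append(rec)
    return first, second
```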
Further, the matching fields include a name field and an identity card field, and the merging rule includes:
Judging whether the identity card field of the current first data is equal to that of a piece of confirmed data, and whether the name field of the current first data is equal to that of the same confirmed data;
If both the identity card field and the name field are equal, merging the current first data into the current confirmed data;
Otherwise, converting the current first data independently into a new piece of confirmed data.
That is, equal data are merged, and unequal data are independently converted into a new piece of confirmed data. The conversion may consist of copying the entire content of the first data and adding the main index field; it may further include other steps required by the business logic, which a person skilled in the art can set based on common general knowledge and which are not described herein. In theory, two pieces of data that are not merged under this rule may actually belong to the same patient. In practice, however, this rule causes few errors, and such errors can be corrected manually after they occur, so this embodiment adopts the above logic.
Each matching field in the first data stores only one attribute value, whereas each matching field in the confirmed data stores one or more attribute values; this can also be understood with reference to the foregoing description of Tables 5 and 6. The step of determining whether a matching field of the first data is equal to that of the confirmed data comprises:
If the attribute value of the matching field of the first data is null, the result is unequal;
If the attribute value of the matching field of the first data is not null and differs from every attribute value in the matching field of the confirmed data, the result is unequal;
If the attribute value of the matching field of the first data is not null and is equal to one of the attribute values in the matching field of the confirmed data, the result is equal.
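The merging rule and the field-equality test above can be sketched together in Python. The sketch assumes that first data are dictionaries with one value per matching field, that confirmed data are dictionaries mapping each matching field to a list of values, and that `id_card` and `name` are the field names (hypothetical); `None` and empty strings are treated as null.

```python
from typing import Dict, List, Optional

FirstRecord = Dict[str, Optional[str]]  # one attribute value per matching field
ConfirmedRecord = Dict[str, List[str]]  # one or more values per matching field


def field_equal(first: FirstRecord, confirmed: ConfirmedRecord, field: str) -> bool:
    """Equality test for one matching field between first and confirmed data."""
    value = first.get(field)
    if not value:  # null value in the first data: always unequal
        return False
    return value in confirmed.get(field, [])  # equal if it matches any stored value


def find_merge_target(first: FirstRecord,
                      confirmed_list: List[ConfirmedRecord]) -> Optional[ConfirmedRecord]:
    """Merging rule: ID card and name must both be equal. None means the first
    data should be converted into a new piece of confirmed data."""
    for confirmed in confirmed_list:
        if field_equal(first, confirmed, "id_card") and field_equal(first, confirmed, "name"):
            return confirmed
    return None
```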
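The weights used in the weighted average, i.e., when a single attribute value is compared against the multiple attribute values stored in a matching field of the confirmed data, may be set according to the number of times each attribute value appears in the historical data, or in other ways.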
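A minimal sketch of the multi-valued field similarity with frequency-based weights. It assumes a hypothetical `history` counter recording how often each attribute value appears in historical data and a caller-supplied per-value similarity function. Falling back to an unweighted mean when none of the stored values appears in the history is an assumption; the text does not specify this case.

```python
from collections import Counter
from typing import Callable, List


def multi_value_similarity(value: str, stored: List[str], history: Counter,
                           sim: Callable[[str, str], float]) -> float:
    """Compare one value with every stored attribute value and take a weighted
    average, weighting each stored value by its frequency in the history."""
    if not stored:
        return 0.0
    weights = [history[v] for v in stored]  # Counter returns 0 for unseen values
    if sum(weights) == 0:
        weights = [1] * len(stored)         # assumption: fall back to a plain mean
    total = sum(weights)
    return sum(w * sim(value, v) for w, v in zip(weights, stored)) / total
```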
The step of merging the data to be merged into the confirmed data comprises:
Judging the data to be merged against the confirmed data for each matching field in turn: if the attribute value of the matching field of the data to be merged is not null and differs from every attribute value in the matching field of the confirmed data, storing the current attribute value of the data to be merged in that matching field of the confirmed data;
wherein the data to be merged include the first data and the data to be matched.
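The merging step can be sketched as follows, using the same assumed representation (one value per field in the data to be merged, a list of values per field in the confirmed data) and hypothetical field names. Non-matching fields are deliberately left out, since the text leaves their merging policy open.

```python
from typing import Dict, Iterable, List, Optional


def merge_into(to_merge: Dict[str, Optional[str]],
               confirmed: Dict[str, List[str]],
               matching_fields: Iterable[str]) -> None:
    """Add each non-null attribute value of the data to be merged to the matching
    field of the confirmed data, unless that value is already stored there."""
    for field in matching_fields:
        value = to_merge.get(field)
        if not value:
            continue  # null values are never merged
        stored = confirmed.setdefault(field, [])
        if value not in stored:
            stored.append(value)
```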
For example, if the data to be merged are as shown in Table 5 and the confirmed data are as shown in Table 6, the merged confirmed data are as shown in Table 7.
Table 7 Example of merged confirmed data
The way in which the other, non-matching fields of the data to be merged are merged into the confirmed data may be set according to actual needs and is not described herein.
Based on this method, a CDR system can be developed that integrates the medical data of all of a hospital's systems, with an EMPI (Enterprise Master Patient Index) system establishing patient primary indexes for the CDR system so that the medical data can be managed uniformly. The accuracy of the patient primary index depends on the accuracy of the patient information matching algorithm. The EMPI system provides patient primary index generation and query functions. Using a patient's primary index in the EMPI system, doctors and related personnel can quickly find all of the patient's historical visit records in the CDR system, which assists clinical diagnosis and medical research.
This embodiment also provides a patient main index establishing device, which includes a matching module configured to execute the patient data matching method in the CDR system.
Optionally, the patient main index establishing device further includes:
An acquisition module, configured to acquire original data from at least two service systems, wherein the original data include matching fields;
A classification module, configured to classify the original data into first data and second data based on a cleaning rule;
A merging module, configured to generate confirmed data from the first data based on a merging rule, wherein the confirmed data include the matching fields and a main index field;
An input module, configured to configure the second data as data to be matched and to input the data to be matched into the matching module; and
A processing module, configured to process data based on the matching result of the matching module: if the matching succeeds, the current data to be matched are merged into the matched confirmed data; if the matching fails, temporary index data are generated from the current data to be matched.
The specific workflow of the above-described apparatus may be understood with reference to the description herein regarding the method of patient primary index establishment in a CDR system.
The patient main index establishing device addresses the prior-art problem that there is no unified patient identifier across the various service systems.
Compared with the prior art, in the patient data matching method in the CDR system, the main index establishing method, and the device provided herein, the patient data matching method comprises: acquiring data to be matched and confirmed data; sequentially acquiring at least two similarities for each piece of confirmed data based on at least two combinations of the matching fields; and judging, based on all the similarities, whether the matching is successful, thereby obtaining the confirmed data that match the data to be matched. Based on this method, a patient main index can be constructed and all of a patient's historical visit records can be retrieved, which assists clinical diagnosis and medical research and solves the prior-art problem that the service systems of a hospital share no unified patient identifier. In addition, because matching is performed over multiple rounds, the matching results are more reliable and complex data conditions can be handled.
The foregoing description is only illustrative of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention in any way, and any changes and modifications made by those skilled in the art in light of the foregoing disclosure will be deemed to fall within the scope and spirit of the present invention.

Claims (12)

1. A method of matching patient data in a CDR system, comprising:
acquiring data to be matched and confirmed data, wherein the data to be matched comprises a matching field, and the confirmed data comprises the matching field;
Based on the ith combination of the matching fields, sequentially acquiring the ith similarity between the data to be matched and each piece of confirmed data, wherein the step of obtaining the ith similarity between the data to be matched and a piece of confirmed data comprises: sequentially obtaining a similarity value for each matching field in the ith combination, and obtaining the ith similarity as the weighted average of these similarity values based on the ith weighting parameters;
Judging whether the matching is successful based on all the ith similarities, and if so, obtaining, based on all the ith similarities, one piece of confirmed data that matches the data to be matched;
wherein i takes every integer value from 1 to n, and n is an integer greater than 1.
2. The method of claim 1, wherein the step of judging whether the matching is successful based on all the ith similarities comprises: if the ith similarity of every piece of confirmed data is smaller than or equal to an ith threshold, the matching fails; otherwise, the matching succeeds;
Or alternatively
The step of judging whether the matching is successful based on all the ith similarities comprises: if the sum of all the ith similarities of every piece of confirmed data is smaller than a preset threshold, the matching fails; otherwise, the matching succeeds.
3. The method for matching patient data in the CDR system according to claim 2, wherein the step of obtaining, based on all the ith similarities, one piece of confirmed data that matches the data to be matched comprises: selecting the confirmed data with the largest sum of all the ith similarities.
4. The method for matching patient data in the CDR system according to claim 2, wherein the step of obtaining, based on all the ith similarities, one piece of confirmed data that matches the data to be matched comprises:
If the ith similarity of at least one piece of confirmed data in the ith set is greater than the ith threshold and i is smaller than n, the confirmed data in the ith set whose ith similarity is greater than the ith threshold form the (i+1)th set, which is judged again;
Otherwise, selecting the confirmed data with the largest sum of all the ith similarities in the ith set, or selecting the confirmed data with the largest ith similarity in the ith set;
wherein the 1st set consists of all the confirmed data.
5. The method for matching patient data in the CDR system according to claim 1, wherein each matching field in the data to be matched stores only one attribute value, and each matching field in the confirmed data stores one or more attribute values; the step of obtaining the similarity value corresponding to a matching field includes:
Calculating the similarity between the attribute value in the data to be matched and each attribute value in the corresponding matching field of the confirmed data, and obtaining the similarity value by weighted averaging of the calculation results.
6. The method of claim 1, wherein the 1st combination includes a name field, a gender field, and an identification number field, and the 1st weighting parameter corresponding to the identification number field is greater than 0.5.
7. The method for matching patient data in the CDR system according to claim 1, wherein the matching fields include a name field, and the similarity value corresponding to the name field is calculated according to the following formula:
$$\mathrm{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
where similarity represents the similarity value, ED_AB represents the edit distance between A and B, max() represents the maximum operation, L_A and L_B represent the string lengths of A and B respectively, A represents the attribute value stored in the name field of the data to be matched, and B represents the attribute value stored in the name field of the confirmed data.
8. A method for establishing a patient primary index in a CDR system, comprising:
acquiring original data from at least two service systems, wherein the original data comprises a matching field;
The original data is classified into first data and second data based on a cleaning rule;
The first data generates confirmed data based on a merging rule, wherein the confirmed data comprises a matching field and a main index field;
The second data is configured as data to be matched, which obtains a matching result based on the patient data matching method in the CDR system according to any one of claims 1 to 7;
if the matching is successful, combining the current data to be matched with the matched confirmed data;
if the matching fails, the current data to be matched generates temporary index data.
9. The method for establishing a patient primary index in a CDR system according to claim 8, wherein the matching fields comprise a name field and an identity card field, and the merging rule comprises:
Judging whether the identity card field of the current first data is equal to that of a piece of confirmed data, and whether the name field of the current first data is equal to that of the same confirmed data;
If both the identity card field and the name field are equal, merging the current first data into the current confirmed data;
Otherwise, converting the current first data independently into a new piece of confirmed data.
10. The method of claim 9, wherein each matching field in the first data stores only one attribute value, each matching field in the confirmed data stores one or more attribute values, and the step of determining whether a matching field of the first data is equal to that of the confirmed data comprises:
If the attribute value of the matching field of the first data is null, the result is unequal;
If the attribute value of the matching field of the first data is not null and differs from every attribute value in the matching field of the confirmed data, the result is unequal;
If the attribute value of the matching field of the first data is not null and is equal to one of the attribute values in the matching field of the confirmed data, the result is equal.
11. The method of claim 10, wherein each matching field in the data to be matched stores only one attribute value, and the step of merging the data to be merged into the confirmed data comprises:
Judging the data to be merged against the confirmed data for each matching field in turn: if the attribute value of the matching field of the data to be merged is not null and differs from every attribute value in the matching field of the confirmed data, storing the current attribute value of the data to be merged in that matching field of the confirmed data;
wherein the data to be merged include the first data and the data to be matched.
12. A patient primary index setup device comprising a matching module for performing a patient data matching method in a CDR system according to any one of claims 1 to 7.
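Claims 2 to 4 describe how the similarities are turned into a match decision. The Python sketch below shows one reading of them under stated assumptions: `scores[k][i]` holds the (i+1)th similarity of confirmed record k, with all n similarities precomputed as in claim 1; success is decided with the second alternative of claim 2 (the largest sum of similarities must reach a preset threshold); and the candidate is then chosen with the cascaded filtering of claim 4, taking the largest sum within the final set. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_match(scores: Sequence[Sequence[float]],
                 thresholds: Sequence[float],
                 total_threshold: float) -> Optional[int]:
    """Return the index of the matched confirmed record, or None if matching fails.

    scores[k][i] is the (i+1)th similarity of confirmed record k, and
    thresholds[i] is the corresponding (i+1)th threshold.
    """
    if not scores:
        return None
    totals = [sum(row) for row in scores]
    # Claim 2, second alternative: fail if every record's sum is below the threshold.
    if max(totals) < total_threshold:
        return None
    n = len(thresholds)
    current: List[int] = list(range(len(scores)))  # 1st set: all confirmed data
    for i in range(n):
        above = [k for k in current if scores[k][i] > thresholds[i]]
        if above and i < n - 1:
            current = above  # the (i+1)th set, judged again in the next round
        else:
            break            # stop and select within the current set
    # Claim 4: pick the record with the largest sum of similarities in the final set.
    return max(current, key=lambda k: totals[k])
```

For example, with scores `[[0.9, 0.2], [0.9, 0.8], [0.1, 1.0]]`, thresholds `[0.5, 0.5]`, and a total threshold of 1.0, the first round keeps records 0 and 1, and the final selection returns record 1, which has the largest sum (about 1.7).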
CN202111045885.3A 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device Active CN113742348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111045885.3A CN113742348B (en) 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device

Publications (2)

Publication Number Publication Date
CN113742348A (en) 2021-12-03
CN113742348B (en) 2024-05-17

Family

ID=78736661

Country Status (1)

Country Link
CN (1) CN113742348B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269613B (en) * 2022-09-27 2023-01-13 四川互慧软件有限公司 Patient main index construction method, system, equipment and storage medium
CN116072303B (en) * 2023-04-03 2023-06-02 南京吾爱网络技术有限公司 Medical information card data identification system and method for hospital information department

Citations (14)

Publication number Priority date Publication date Assignee Title
CN101727535A (en) * 2008-10-30 2010-06-09 北大方正集团有限公司 Cross indexing method for patients crossing system and system thereof
CN103870668A (en) * 2012-12-17 2014-06-18 上海联影医疗科技有限公司 Method and device for establishing master patient index oriented to regional medical treatment
WO2015169597A1 (en) * 2014-05-07 2015-11-12 Cytolon Ag Methods and systems for predicting alloreactivity in transplantation
CN106650259A (en) * 2016-12-22 2017-05-10 深圳中兴网信科技有限公司 Patient information management method and management system
CN109739862A (en) * 2019-01-07 2019-05-10 深圳中兴网信科技有限公司 Main index of patients weight method for building up, Main index of patients weight establish system
CN110197724A (en) * 2019-03-12 2019-09-03 平安科技(深圳)有限公司 Predict the method, apparatus and computer equipment in diabetes illness stage
KR102055309B1 (en) * 2018-10-30 2019-12-13 재단법인 아산사회복지재단 Method and system for identifying patient
CN111739634A (en) * 2020-05-14 2020-10-02 平安科技(深圳)有限公司 Method, device and equipment for intelligently grouping similar patients and storage medium
CN111768821A (en) * 2020-05-29 2020-10-13 上海森亿医疗科技有限公司 Distributed patient record matching method, system and terminal
CN111785341A (en) * 2020-06-30 2020-10-16 平安国际智慧城市科技股份有限公司 Patient main index data merging method and device based on similarity
CN112286912A (en) * 2020-08-12 2021-01-29 上海柯林布瑞信息技术有限公司 Medical data quality checking method and device, terminal and storage medium
CN112863626A (en) * 2021-03-08 2021-05-28 北京冠新医卫软件科技有限公司 Multi-platform similar medical data removing method, device and equipment
CN112967799A (en) * 2021-03-30 2021-06-15 广州启生信息技术有限公司 Doctor data processing method and platform
CN113130038A (en) * 2021-04-30 2021-07-16 康键信息技术(深圳)有限公司 Medicine data matching method, device, equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7668820B2 (en) * 2004-07-28 2010-02-23 Ims Software Services, Ltd. Method for linking de-identified patients using encrypted and unencrypted demographic and healthcare information from multiple data sources
US20210166795A1 (en) * 2018-11-08 2021-06-03 Express Scripts Strategic Development, Inc. Systems and methods for patient record matching

Non-Patent Citations (3)

Title
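Combination of K-Means and Profile Matching for Drag Substitution; S. Paembonan et al.; 2018 2nd East Indonesia Conference on Computer and Information Technology (EIConCIT); 2019-10-24; Vol. 2018; pp. 180-183 *
Application research on constructing an MPI system for a clinical data center based on the IHE PIX/PDQ framework; 吴艳艳 et al.; 中国数字医学 (China Digital Medicine); 2015-02-15; Vol. 10, No. 2; pp. 25-28 *
Research and implementation of a multi-source medical data analysis model; 陈震涛; 中国优秀硕士学位论文全文数据库 (China Master's Theses Full-text Database), Medicine and Health Sciences series; 2021-04-15; Vol. 2021, No. 4; E054-24 *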

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant