Method for establishing patient identity source cross index
Technical Field
The invention relates to a medical information technology, in particular to a method for establishing a patient identity source cross index.
Background
With the rapid development of information technology and the deepening reform of the medical system, the informatization of China's medical industry is steadily accelerating. Medical information systems with specific functions, such as Hospital Information Systems (HIS), Picture Archiving and Communication Systems (PACS), and Radiology Information Systems (RIS), are becoming essential technical support and infrastructure for the operation of modern healthcare facilities. However, this informatization has lacked systematic planning and continuity: each medical information system serves only its own scope of business, so the systems form isolated information islands, and the medical information that a patient generates in heterogeneous medical information systems is difficult to exchange and share effectively.
To enable the cross-regional transmission and sharing of medical information and the integration of medical service resources, the Radiological Society of North America (RSNA) and the Healthcare Information and Management Systems Society (HIMSS), together with equipment manufacturers, established the Integrating the Healthcare Enterprise (IHE) initiative. Within IHE, the Patient Identifier Cross-referencing (PIX) technical framework provides guidance for integrating heterogeneous medical information systems. The IHE PIX technical framework defines three roles: a PIX manager, a patient identity source, and a PIX user. The PIX manager is responsible for creating a globally unique identifier for each patient, namely the master patient index, and for establishing a mapping between the master patient index and the local identifiers used by the different patient identity sources. Once the PIX manager has established a cross index between heterogeneous information systems, a PIX user can obtain a complete view of a patient's information.
Patient identification is the cornerstone of cross-indexing heterogeneous information systems. Only with a reliable mechanism for uniquely identifying patients can the medical records of the same patient in different patient identity sources be found and the local identifiers of those sources be associated. At present, the mechanisms used in the medical field for uniquely identifying patients are usually based on a deterministic matching strategy, whose basic principle is as follows: the patient attributes used for identification are defined and the weight of each attribute is specified manually; the similarity between a reference patient and a patient to be matched is then calculated attribute by attribute, and whether to establish a cross index between the two is decided from the weighted average of the similarities. The deterministic matching method has the following disadvantages: patients in the heterogeneous information systems are matched pairwise, so the time complexity of the algorithm is quadratic, which seriously reduces the efficiency of establishing the cross index; the weights of all attributes must sum to 1, so when an attribute of a patient stored in a medical information system is missing, the weights must be reassigned across the attributes, which makes the method hard to operate in practice; and both the attribute weights and the decision boundary on the weighted average similarity are specified manually, so the accuracy and reliability of the cross index are easily affected by subjective factors.
Disclosure of Invention
The problem solved by the invention is to provide a method for establishing a patient identity source cross index that improves the efficiency of establishing the cross index, can establish the cross index accurately even when some identity attributes of a patient are missing, and does not require a weight to be assigned to each identity attribute, thereby simplifying the procedure and improving practicability.
In order to solve the above problems, the present invention provides a method for establishing a patient identity source cross-index, comprising:
(1) defining one or more identity attributes of the patient as blocking attributes;
(2) extracting the blocking attribute of the ith patient in the patient identity source A, matching the blocking attribute with the blocking attribute of the patient in the patient identity source B, and recording the matched patient in the patient identity source B as a set Q to be matched;
(3) matching the identity attribute of the ith patient in the patient identity source A with the identity attribute of each patient in the set Q to obtain a matching ratio;
(4) according to the matching ratio, establishing a cross index between the ith patient in the patient identity source A and the patient identity source B;
(5) repeating steps (2) to (4) until cross indexes have been established for all patients in the patient identity source A and the patient identity source B.
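The five steps above can be read as a simple nested loop. The following is a minimal sketch only: the record type `Patient` and the three callables `is_block_match`, `match_ratio` and `decide` are hypothetical placeholders introduced for illustration, standing in for the blocking comparison, the matching ratio calculation and the cross index decision respectively.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Patient = Dict[str, str]  # hypothetical record type: attribute name -> value


def cross_index(
    source_a: Iterable[Patient],
    source_b: List[Patient],
    is_block_match: Callable[[Patient, Patient], bool],  # placeholder for step (2)
    match_ratio: Callable[[Patient, Patient], float],    # placeholder for step (3)
    decide: Callable[[float], str],                      # placeholder for step (4)
) -> List[Tuple[Patient, Patient, str]]:
    """Run steps (2) to (4) for every patient of source A, as in step (5)."""
    results = []
    for a in source_a:
        # Step (2): only patients of B that share the blocking attribute enter Q.
        q = [b for b in source_b if is_block_match(a, b)]
        for b in q:
            ratio = match_ratio(a, b)              # step (3)
            results.append((a, b, decide(ratio)))  # step (4)
    return results
```

Because the detailed comparison runs only over the set Q rather than over all of source B, the number of attribute-level comparisons depends on how selective the blocking attribute is.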
In the above method for establishing a patient identity source cross index, obtaining the set Q to be matched comprises:
(1) determining the blocking attribute information key_i of the ith patient in the patient identity source A, determining the blocking attribute information key_j of the jth patient in the patient identity source B, and calculating the similarity value of key_i and key_j;
(2) if the similarity value is greater than a specified threshold, putting the jth patient in the patient identity source B into the set Q to be matched, and repeating until the patient identity source B has been traversed.
In the above method for establishing a patient identity source cross index, obtaining the matching ratio comprises:
(1) setting N identity attributes of the patient as the attributes for identifying the patient's identity (the identification attributes), where N ≥ 1;
(2) comparing, for each identification attribute, the ith patient in the patient identity source A with each patient in the set Q to be matched to obtain similarity values of the identification attributes;
(3) creating a corresponding binary space vector C_ij = {C_ij(1), C_ij(2), …, C_ij(N)} according to the similarity values: if the similarity value of the kth identification attribute between the ith patient in the patient identity source A and the jth patient in the set Q to be matched is greater than a specified threshold, then C_ij(k) = 1; otherwise C_ij(k) = 0, where 1 ≤ k ≤ N;
(4) calculating a probability distribution of the binary space vector from Freq_l, where Freq_l is the frequency with which the binary space vector appears in the l-th state and 1 ≤ l ≤ 2^N;
(5) calculating, according to the probability distribution, the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched.
In the above method for establishing a patient identity source cross index, the identification attributes are an identity card number, a driver's license number, a social security number, a date of birth, a telephone number and/or a contact address.
In the above method for establishing a patient identity source cross index, obtaining the similarity values of the identification attributes comprises:
(1) creating an identification attribute vector S_i for the ith patient in the patient identity source A;
(2) creating an identification attribute vector T_j for the jth patient in the set Q to be matched;
(3) taking the kth elements S_i^k and T_j^k from the vectors S_i and T_j, respectively, and calculating the similarity value of S_i^k and T_j^k.
The above method for establishing a patient identity source cross index, wherein the formula for calculating the similarity value is as follows:
$$\mathrm{similarity}(S_i^k, T_j^k) = \theta(S_i^k, T_j^k)$$
where similarity(S_i^k, T_j^k) is the similarity value; θ is a character string matching function; and 1 ≤ k ≤ N, N being the number of identification attributes.
In the above method for establishing a patient identity source cross index, calculating the matching ratio according to the probability distribution comprises:
(1) obtaining, according to the probability distribution and by means of an expectation-maximization (EM) algorithm, a matching threshold vector m = {m_1, m_2, …, m_N} and a mismatch threshold vector u = {u_1, u_2, …, u_N}, where N is the number of identification attributes;
(2) initializing the values of the matching probability and the mismatching probability;
(3) iteratively computing the matching probability and the mismatching probability from the value of each element of the binary space vector and the initialized values;
(4) calculating the matching ratio from the matching probability and the mismatching probability.
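The text does not spell out the EM iteration, so the following is only a sketch of the formulation commonly used with the Fellegi-Sunter model. It assumes that m_k and u_k are read as the probabilities that attribute k agrees among true matches and among non-matches, and that attributes are conditionally independent. The function name `estimate_m_u`, the starting values `m0`, `u0`, `p0`, and the clamping constant are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def _clamp(x: float, eps: float = 1e-6) -> float:
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1.0 - eps)


def estimate_m_u(
    vectors: Sequence[Sequence[int]],
    n: int,
    iterations: int = 50,
    m0: float = 0.9,  # hypothetical starting value for every m_k
    u0: float = 0.1,  # hypothetical starting value for every u_k
    p0: float = 0.1,  # hypothetical starting share of true matches
) -> Tuple[List[float], List[float], float]:
    """Estimate m, u and the match proportion p from binary space vectors."""
    if not vectors:
        raise ValueError("at least one binary space vector is required")
    m, u, p = [m0] * n, [u0] * n, p0
    for _ in range(iterations):
        # E-step: posterior probability that each compared pair is a match.
        g = []
        for c in vectors:
            pm, pu = p, 1.0 - p
            for k in range(n):
                pm *= m[k] if c[k] else 1.0 - m[k]
                pu *= u[k] if c[k] else 1.0 - u[k]
            g.append(_clamp(pm / (pm + pu)))
        # M-step: re-estimate m_k, u_k and p from the weighted counts.
        g_sum = sum(g)
        h_sum = len(vectors) - g_sum
        for k in range(n):
            m[k] = _clamp(sum(gi * c[k] for gi, c in zip(g, vectors)) / g_sum)
            u[k] = _clamp(sum((1.0 - gi) * c[k] for gi, c in zip(g, vectors)) / h_sum)
        p = _clamp(g_sum / len(vectors))
    return m, u, p
```

Passing the vectors one by one is equivalent to weighting each of the 2^N states by its frequency Freq_l, which is how the probability distribution enters the estimate in this sketch.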
In the above method for establishing a patient identity source cross index, the matching ratio Ratio_ij is calculated from the matching probability and the mismatching probability.
In the above method for establishing a patient identity source cross index, establishing the cross index according to the matching ratio comprises:
(1) calculating, according to a statistical model, an upper bound T_up and a lower bound T_low of the matching ratio Ratio_ij;
(2) if the matching ratio Ratio_ij of the jth patient in the set Q is greater than the upper bound T_up, establishing a cross index between the jth patient and the ith patient in the patient identity source A; if the matching ratio Ratio_ij of the jth patient in the set Q is greater than or equal to the lower bound T_low and less than or equal to the upper bound T_up, performing manual processing; if the matching ratio Ratio_ij of the jth patient in the set Q is less than the lower bound T_low, performing no processing.
Compared with the prior art, the invention introduces blocking attributes, which greatly reduces the number of comparisons and improves the efficiency of establishing the cross index.
Further, compared with the existing method of calculating a weighted similarity over identity attributes, the technical solution of the invention can establish the cross index accurately even when identity attributes of a patient are missing, and does not require weights to be assigned manually to each attribute, which simplifies practical operation and avoids interference from subjective factors.
Furthermore, the invention introduces a false rejection probability and a false acceptance probability, which improves the objectivity, reliability and practical value of the cross index.
Drawings
FIG. 1 is a schematic flow chart of establishing a patient identity source cross index according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of obtaining the matching ratio according to the embodiment of the present invention;
FIG. 3 is a schematic flow chart of obtaining the similarity values of the identification attributes according to the embodiment of the present invention;
FIG. 4 is a schematic flow chart of calculating the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched according to the embodiment of the present invention;
FIG. 5 is a schematic flow chart of establishing a cross index between the ith patient in the patient identity source A and the patients in the patient identity source B according to the embodiment of the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
The invention is described in detail below with reference to schematic diagrams. These diagrams are examples given only for convenience of description and should not limit the scope of the invention.
The present invention is described in detail below with reference to the drawings and an embodiment. As shown in FIG. 1, step S1 is executed first to define one or more identity attributes of a patient as blocking attributes. In this embodiment, the patient's name is used as the blocking attribute, and the blocking attribute information is the pinyin of the name. Alternatively, the patient's name and identity card number may be used together as blocking attributes.
Next, step S2 is executed: the blocking attribute of the ith patient in the patient identity source A is extracted and matched against the blocking attributes of the patients in the patient identity source B, and the matching patients in the patient identity source B are recorded as the set Q to be matched. Specifically, the blocking attribute information key_i of the ith patient in the patient identity source A is determined; since step S1 defines the patient's name as the blocking attribute, key_i is the pinyin of the patient's name. Then the blocking attribute information key_j of the jth patient in the patient identity source B, that is, the pinyin of that patient's name, is determined, and the similarity value similarity_name(key_i, key_j) of key_i and key_j is calculated by a character string similarity function. If the similarity value is greater than a specified threshold v_0, the jth patient in the patient identity source B is put into the set Q to be matched.
In this embodiment, taking the ith patient in the patient identity source A to be the 1st patient in the patient identity source A, the blocking attribute information of that patient is determined first, then the blocking attribute information of the 1st patient in the patient identity source B is determined, and the similarity value of the two is calculated and compared with the specified threshold v_0. If the similarity value is greater than v_0, the 1st patient in the patient identity source B is put into the set Q to be matched. Next, the blocking attribute information of the 2nd patient in the patient identity source B is determined, its similarity value with the blocking attribute information of the 1st patient in the patient identity source A is calculated and likewise compared with v_0, and if the value is greater than v_0 the 2nd patient in the patient identity source B is put into the set Q. This is repeated until all patients in the patient identity source B have been traversed, at which point the set Q contains every patient in the patient identity source B that matches the 1st patient in the patient identity source A. In this embodiment, the specified threshold v_0 is 0.8.
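Step S2 can be sketched as follows. The embodiment does not name the character string similarity function, so `difflib.SequenceMatcher` from the Python standard library is used here purely as a stand-in; the field name `name_pinyin` and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

V0 = 0.8  # blocking threshold v_0 used in this embodiment


def key_similarity(key_i: str, key_j: str) -> float:
    """Stand-in string similarity in [0, 1] (an assumption, not the patent's function)."""
    return SequenceMatcher(None, key_i, key_j).ratio()


def build_candidate_set(
    patient_a: Dict[str, str],
    source_b: List[Dict[str, str]],
    v0: float = V0,
) -> List[Dict[str, str]]:
    """Return the set Q: patients of B whose blocking key is similar to patient_a's."""
    key_i = patient_a["name_pinyin"]
    # Traverse all of source B and keep the patients above the threshold.
    return [b for b in source_b if key_similarity(key_i, b["name_pinyin"]) > v0]
```

A looser threshold admits more spelling variants into Q at the cost of more detailed comparisons in step S3.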
Next, step S3 is executed to match the identity attributes of the ith patient in the patient identity source A against the identity attributes of each patient in the set Q, giving a matching ratio. Specifically, as shown in FIG. 2, step S201 is executed first: N identity attributes of the patient are set as the attributes for identifying the patient's identity (the identification attributes), where N ≥ 1. The identification attributes may be an identity card number, a driver's license number, a social security number, a date of birth, a telephone number and/or a contact address, and may also include the name. In this embodiment, three identity attributes (identity card number, social security number, and date of birth) are set as the identification attributes.
Then, step S202 is executed: for each identification attribute, the ith patient in the patient identity source A is compared with each patient in the set Q to be matched to obtain the similarity values of the identification attributes. Specifically, as shown in FIG. 3, step S301 is executed first to create the identification attribute vector S_i of the ith patient in the patient identity source A; in this embodiment, the identification attribute vector S_1 of the 1st patient in the patient identity source A is created. Then, step S302 is executed to create the identification attribute vector T_j of the jth patient in the set Q to be matched; in this embodiment, the identification attribute vector T_1 of the 1st patient in the set Q is created. Then, step S303 is executed to take the kth elements S_i^k and T_j^k from S_i and T_j, respectively, and to calculate their similarity value. The formula for calculating the similarity value is as follows:
$$\mathrm{similarity}(S_i^k, T_j^k) = \theta(S_i^k, T_j^k) \tag{1}$$
where similarity(S_i^k, T_j^k) is the similarity value; θ is a character string matching function; and 1 ≤ k ≤ N, N being the number of identification attributes. In this embodiment, since step S201 sets three identification attributes, the vectors S_1 and T_1 each have three elements: the identity card number, the social security number, and the date of birth. First, the 1st elements S_1^1 and T_1^1, which are identity card numbers, are taken from S_1 and T_1, and their similarity value is calculated by formula (1); this is the similarity between the identity card numbers of the 1st patient in the patient identity source A and the 1st patient in the set Q. Next, the 2nd elements (social security numbers) are taken and their similarity value is obtained in the same way, and so on, until a similarity value has been obtained for every identification attribute defined in step S201. In this embodiment, the similarity values calculated for the three attributes are 0.9, 0.85, and 0.8, respectively. Steps S302 and S303 are then repeated: the identification attribute vector T_2 of the 2nd patient in the set Q is created and the similarity values between that patient and the 1st patient in the patient identity source A are calculated, and so on until the whole set Q to be matched has been traversed, so that similarity values of the identification attributes have been calculated between the 1st patient in the patient identity source A and every patient in the set Q.
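Steps S301 to S303 amount to an element-wise comparison of two attribute vectors. In the sketch below, θ is again approximated with `difflib.SequenceMatcher` as an assumption, since the text only says that θ is a character string matching function. Returning 0.0 for a missing attribute is a convention chosen here so that a missing value can never exceed a threshold; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence


def theta(s: Optional[str], t: Optional[str]) -> float:
    """Stand-in for the string matching function in formula (1) (an assumption)."""
    if not s or not t:
        return 0.0  # convention: a missing attribute contributes no similarity
    return SequenceMatcher(None, s, t).ratio()


def similarity_vector(
    s_i: Sequence[Optional[str]],
    t_j: Sequence[Optional[str]],
) -> List[float]:
    """Apply formula (1) to the kth elements of S_i and T_j for k = 1..N."""
    if len(s_i) != len(t_j):
        raise ValueError("S_i and T_j must have the same number of attributes")
    return [theta(s, t) for s, t in zip(s_i, t_j)]
```

For the embodiment, `s_i` and `t_j` would each hold three values in a fixed order: identity card number, social security number, and date of birth.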
Then, step S203 is executed to create the corresponding binary space vector C_ij = {C_ij(1), C_ij(2), …, C_ij(N)} according to the similarity values. If the similarity value of the kth identification attribute between the ith patient in the patient identity source A and the jth patient in the set Q to be matched is greater than a specified threshold, then C_ij(k) = 1; otherwise C_ij(k) = 0, where 1 ≤ k ≤ N. In particular, when the kth identification attribute of either the ith patient in the patient identity source A or the jth patient in the set Q to be matched is missing, C_ij(k) = 0. Specifically, a specified threshold vector v = {v_1, v_2, …, v_N} is determined first; in this embodiment it is (0.85, 0.84, 0.81). From step S303, the similarity values of the identification attributes between the 1st patient in the patient identity source A and the 1st patient in the set Q to be matched are (0.9, 0.85, 0.8), so the corresponding binary space vector is (1, 1, 0). In the same way, the binary space vector for the 1st patient in the patient identity source A and each patient in the set Q to be matched is obtained from the similarity values calculated in step S303.
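The thresholding in step S203 can be checked against the numbers of this embodiment with a short sketch; the function name `comparison_vector` is hypothetical.

```python
from typing import Sequence, Tuple


def comparison_vector(
    similarities: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[int, ...]:
    """Set C_ij(k) = 1 when the kth similarity exceeds v_k, otherwise 0."""
    if len(similarities) != len(thresholds):
        raise ValueError("one threshold is required per identification attribute")
    return tuple(1 if s > v else 0 for s, v in zip(similarities, thresholds))


# Values from this embodiment: similarities (0.9, 0.85, 0.8), thresholds (0.85, 0.84, 0.81).
c_11 = comparison_vector((0.9, 0.85, 0.8), (0.85, 0.84, 0.81))
print(c_11)  # (1, 1, 0)
```

The third element is 0 because 0.8 does not exceed its threshold of 0.81, which matches the vector stated in the text.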
Then, step S204 is executed to calculate the probability distribution of the binary space vector from Freq_l, where Freq_l is the frequency with which the binary space vector appears in the l-th state and 1 ≤ l ≤ 2^N. Specifically, step S203 yields a series of binary space vectors; the probability with which the vector (1, 1, 0) occurs in this series is calculated, as are the probabilities with which the other states, such as (0, 0, 1) and (0, 1, 0), occur, thereby giving the probability distribution of the binary space vector.
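A minimal sketch of step S204 follows. It assumes that the probability of a state is its relative frequency, that is, Freq_l divided by the total number of vectors; the exact expression is not given in the text, so this normalisation is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple


def pattern_distribution(
    vectors: Iterable[Sequence[int]],
    n: int,
) -> Dict[Tuple[int, ...], float]:
    """Relative frequency of each of the 2^N states of the binary space vector."""
    counts = Counter(tuple(v) for v in vectors)  # Freq_l for each observed state
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one binary space vector is required")
    # Enumerate all 2^N states so that unobserved states get probability 0.
    return {state: counts.get(state, 0) / total for state in product((0, 1), repeat=n)}
```

With N = 3 the result has eight entries, one for each state from (0, 0, 0) to (1, 1, 1).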
Then, step S205 is executed to calculate the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched. Specifically, as shown in FIG. 4, step S401 is executed first: according to the probability distribution, a matching threshold vector m = {m_1, m_2, …, m_N} and a mismatch threshold vector u = {u_1, u_2, …, u_N} are obtained by an expectation-maximization (EM) algorithm, where N is the number of identification attributes. Next, step S402 is executed to initialize the values of the matching probability and the mismatching probability: for the jth patient in the set Q to be matched, both are initialized to 1, that is, the initial matching probability and mismatching probability corresponding to each patient in the set Q are both 1. Then, step S403 is executed: the matching probability and the mismatching probability are obtained by iterative computation from the value of each element of the binary space vector and the initialized values. Specifically, for each k, the matching probability is updated by one expression when C_ij(k) = 0 and by another when C_ij(k) = 1, and the mismatching probability is updated in the same case-wise manner. In this embodiment, the matching probability and mismatching probability between the 1st patient in the patient identity source A and the 1st patient in the set Q to be matched are calculated first. As described above, the corresponding binary space vector is (1, 1, 0), and applying the update for each identification attribute in turn gives the matching probability and the mismatching probability. By analogy, the matching probability and mismatching probability between the 1st patient in the patient identity source A and every patient in the set Q to be matched are obtained.
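The case-wise update of step S403 and the ratio of step S404 can be sketched as below. The exact expressions are not given in the text, so this sketch assumes the usual Fellegi-Sunter form: starting from 1, the matching probability is multiplied by m_k when C_ij(k) = 1 and by (1 − m_k) when C_ij(k) = 0, the mismatching probability is treated likewise with u_k, and the matching ratio is their quotient. The function names and the sample values of m and u are hypothetical.

```python
from typing import Sequence, Tuple


def match_probabilities(
    c: Sequence[int],
    m: Sequence[float],
    u: Sequence[float],
) -> Tuple[float, float]:
    """Iterate over the attributes, starting from the initial value 1 for both."""
    p_match, p_mismatch = 1.0, 1.0
    for c_k, m_k, u_k in zip(c, m, u):
        if c_k == 1:  # attribute k agrees
            p_match *= m_k
            p_mismatch *= u_k
        else:         # attribute k disagrees or is missing
            p_match *= 1.0 - m_k
            p_mismatch *= 1.0 - u_k
    return p_match, p_mismatch


def match_ratio(c: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Assumed form of Ratio_ij: matching probability over mismatching probability."""
    p_match, p_mismatch = match_probabilities(c, m, u)
    return float("inf") if p_mismatch == 0.0 else p_match / p_mismatch


# Hypothetical m and u values, applied to the vector (1, 1, 0) of this embodiment.
ratio_11 = match_ratio((1, 1, 0), (0.9, 0.8, 0.7), (0.1, 0.2, 0.3))
```

Under this assumed form, no per-attribute weights have to be assigned by hand, and a missing attribute simply takes the C_ij(k) = 0 branch.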
Then, step S404 is executed to calculate the matching ratio Ratio_ij from the matching probability and the mismatching probability.
Next, step S4 is executed to establish a cross index between the ith patient in the patient identity source A and the patients in the patient identity source B according to the matching ratio. Specifically, as shown in FIG. 5, step S501 is executed first: the patients in the set Q to be matched are sorted in ascending order of the matching ratio calculated in step S404, giving a data set Q′. Then, step S502 is executed to calculate, according to a statistical model, the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij. Specifically, a false rejection probability α and a false acceptance probability β are specified, where 0 < α ≤ 0.1 and 0 < β ≤ 0.1. If the xth patient in the data set Q′ satisfies the upper-bound condition of the model, the upper bound of the matching ratio is T_up = Ratio_ix, where 0 < x < M, M is the number of elements in the data set Q′, and i denotes the ith patient in the patient identity source A. If the yth patient in the data set Q′ satisfies the lower-bound condition of the model, the lower bound of the matching ratio is T_low = Ratio_iy, where 0 < y < x. In this embodiment, the statistical model used is the Fellegi-Sunter model.
Next, step S503 is executed to evaluate the matching ratio Ratio_ij of the jth patient in the data set Q′. If Ratio_ij > T_up, step S504 is executed to establish a cross index between the jth patient and the ith patient in the patient identity source A; if T_low ≤ Ratio_ij ≤ T_up, step S505 is executed to perform manual processing; if Ratio_ij < T_low, step S506 is executed and no processing is performed. Specifically, the matching ratio Ratio_11 of the first patient in the data set Q′ is compared with the upper and lower bounds first, then the matching ratio Ratio_12 of the second patient, and so on until the entire data set Q′ has been traversed.
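The three-way decision of steps S503 to S506 can be sketched as a small function. The return labels and the function name are hypothetical; the boundary handling follows the rule that a ratio equal to either bound goes to manual processing.

```python
def classify(ratio: float, t_up: float, t_low: float) -> str:
    """Map a matching ratio to one of the three outcomes of steps S504 to S506."""
    if t_low > t_up:
        raise ValueError("the lower bound must not exceed the upper bound")
    if ratio > t_up:
        return "link"           # S504: establish the cross index
    if ratio >= t_low:
        return "manual_review"  # S505: T_low <= ratio <= T_up
    return "no_action"          # S506: ratio < T_low
```

For example, with T_up = 5 and T_low = 1, a ratio of 10 is linked automatically, a ratio of 5 or 1 is sent for manual processing, and a ratio of 0.5 is left unprocessed.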
Then, step S5 is executed: steps S2 to S4 are repeated until cross indexes have been established for all patients in the patient identity source A and the patient identity source B. Specifically, the cross index between the 2nd patient in the patient identity source A and the patient identity source B is established by the same method as for the 1st patient, and so on until cross indexes have been established for all patients in the patient identity source A and the patient identity source B.
Although the present invention has been described above with reference to preferred embodiments, they are not intended to limit it. Those skilled in the art may, using the methods and technical content disclosed above, make variations and modifications to the invention without departing from its spirit and scope.