CN104063567A - Establishment method of patient identity source cross reference

Info

Publication number: CN104063567A
Application number: CN201310089099.2A
Authority: CN
Inventors: 陈文娟
Original assignee: Shanghai United Imaging Healthcare Co Ltd
Current assignee: Shanghai United Imaging Healthcare Co Ltd; Wuhan United Imaging Healthcare Co Ltd
Priority date: 2013-03-20
Filing date: 2013-03-20
Publication date: 2014-09-24
Anticipated expiration: 2033-03-20
Also published as: CN104063567B

Abstract

The invention provides a method for establishing a patient identity source cross reference. The method comprises the following steps: one or more identity attributes of a patient are defined as blocking attributes; the blocking attribute of the ith patient in patient identity source A is extracted and matched against the blocking attributes of the patients in patient identity source B, and the matching patients in patient identity source B are recorded as a to-be-matched set Q; one or more patient identity attributes are defined as recognition attributes, the matching ratio between the ith patient in patient identity source A and each patient in set Q is calculated according to the recognition attributes, and the cross reference between patient identity source A and patient identity source B is established by comparing the matching ratio with a designed threshold value. With this method, the cross reference can be established accurately even when patient identity attributes are missing, no weight needs to be allocated to each identity attribute, the process is simplified, and both the efficiency of establishing the cross reference and its practicability are improved.

Description

Method for establishing patient identity source cross index

Technical Field

The invention relates to medical information technology, and in particular to a method for establishing a patient identity source cross index.

Background

With the rapid development of information technology and the deepening reform of the medical system, the informatization of the medical industry in China is gradually accelerating. Medical information systems with specific functions, such as Hospital Information Systems (HIS), Picture Archiving and Communication Systems (PACS), and Radiology Information Systems (RIS), are becoming essential technical support and infrastructure for the operation of modern healthcare facilities. However, this informatization lacks systematic planning and continuity: each medical information system serves only its own independent scope, forming isolated information islands, so that patient medical information is difficult to exchange and share effectively between heterogeneous medical information systems.

In order to realize the transmission and sharing of cross-regional medical information and the integration of medical service resources, the Radiological Society of North America (RSNA) and the Healthcare Information and Management Systems Society (HIMSS), together with equipment manufacturers, established the Integrating the Healthcare Enterprise (IHE) initiative. Within IHE, the Patient Identifier Cross-referencing (PIX) technical framework provides guidance for integrating heterogeneous medical information systems. The IHE PIX technical framework defines three roles: a PIX manager, a patient identity source, and a PIX user. The PIX manager is responsible for creating a globally unique identifier for each patient, namely the master patient index, and for establishing the mapping between the master patient index and the local identifiers used by the different patient identity sources. Once the PIX manager has established a cross index between heterogeneous information systems, a PIX user can obtain a complete view of patient information.

Patient identification is the cornerstone of cross-indexing between heterogeneous information systems. Only with a reliable mechanism for uniquely recognizing patient identity can the diagnostic information of the same patient in different patient identity sources be found, so that the local identifiers of the patient identity sources can be associated. Currently, the patient identity recognition mechanism adopted in the medical field is usually based on a deterministic matching strategy. Its basic principle is as follows: patient attributes used for identity recognition are defined, and the weight of each attribute is specified manually; the similarity between the reference patient and the patient to be matched is then calculated attribute by attribute, and whether to establish a cross index for the two patients is decided according to the weighted average of the similarities. The deterministic matching method has the following disadvantages: patients in the heterogeneous information systems are matched pairwise, so the time complexity of the algorithm is quadratic, which seriously reduces the efficiency of establishing the cross index; the weights of all attributes must sum to 1, so when an attribute of a patient stored in the medical information system is missing, the weights must be redistributed over the remaining attributes, which makes the method hard to operate in practice; and both the attribute weights and the decision boundary on the weighted average similarity are specified manually, so the accuracy and reliability of the cross index are easily affected by subjective factors.
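The weighted-average decision described above can be illustrated with a minimal sketch. The function name `weighted_similarity` and the example weights are hypothetical; the renormalization over present attributes is one assumed way of handling the redistribution of weights that the deterministic strategy requires when an attribute is missing.

```python
from typing import Optional, Sequence


def weighted_similarity(
    similarities: Sequence[Optional[float]],
    weights: Sequence[float],
) -> float:
    """Weighted average of per-attribute similarities.

    A similarity of None marks a missing attribute; its weight is dropped and
    the remaining weights are renormalized so that they again sum to 1
    (an assumed redistribution rule, for illustration only).
    """
    present = [(s, w) for s, w in zip(similarities, weights) if s is not None]
    total_weight = sum(w for _, w in present)
    if total_weight == 0:
        raise ValueError("no attribute available for comparison")
    return sum(s * w for s, w in present) / total_weight
```

With weights (0.6, 0.4), a pair with similarities (1.0, 0.5) scores 0.8, while the same pair with the second attribute missing scores 1.0, which shows how strongly the manual weights and the redistribution rule influence the result.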

Disclosure of Invention

The problem solved by the invention is to provide a method for establishing a patient identity source cross index which improves the efficiency of establishing the cross index, can establish the cross index accurately even when identity attributes of a patient are missing, requires no weight to be assigned to each identity attribute, simplifies the procedure, and improves practicability.

In order to solve the above problems, the present invention provides a method for establishing a patient identity source cross-index, comprising:

(1) defining one or more identity attributes of the patient as blocking attributes;

(2) extracting the blocking attribute of the ith patient in the patient identity source A, matching the blocking attribute with the blocking attribute of the patient in the patient identity source B, and recording the matched patient in the patient identity source B as a set Q to be matched;

(3) matching the identity attribute of the ith patient in the patient identity source A with the identity attribute of each patient in the set Q to obtain a matching ratio;

(4) according to the matching ratio, establishing a cross index between the ith patient in the patient identity source A and the patient identity source B;

(5) repeating steps (2) to (4) until cross indexes have been established for all patients in the patient identity source A and the patient identity source B.

In the above method for establishing a patient identity source cross index, the set Q to be matched is determined by the following steps:

(1) determining the blocking attribute information key_i of the ith patient in the patient identity source A; determining the blocking attribute information key_j of the jth patient in the patient identity source B; and calculating the similarity value of key_i and key_j;

(2) if the similarity value is greater than a specified threshold, putting the jth patient in the patient identity source B into the set Q to be matched; and repeating until the patient identity source B has been traversed.

In the above method for establishing a patient identity source cross index, the matching ratio is obtained by the following steps:

(1) setting N identity attributes of the patient as attributes for identifying the identity of the patient, wherein N ≥ 1;

(2) according to the attributes for identifying the identity of the patient, comparing the ith patient in the patient identity source A with each patient in the set Q to be matched to obtain the similarity values of the identifying attributes;

(3) creating a corresponding binary space vector C_ij = {C_ij(1), C_ij(2), …, C_ij(N)} according to the similarity values: if the similarity value of the kth identifying attribute between the ith patient in the patient identity source A and the jth patient in the set Q to be matched is greater than a specified threshold, C_ij(k) = 1; otherwise, C_ij(k) = 0; wherein 1 ≤ k ≤ N;

(4) calculating the probability distribution of the binary space vector, wherein Freq_l is the frequency with which the binary space vector appears in the l-th state, and 1 ≤ l ≤ 2^N;

(5) calculating, according to the probability distribution, the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched.

In the above method for establishing a patient identity source cross index, the attribute for identifying the identity of the patient is an identification number, a driver license number, a social security number, a date of birth, a telephone number and/or a contact address.

In the above method for establishing a patient identity source cross index, the similarity value of the identifying attributes is obtained by the following steps:

(1) creating an identifying attribute vector S_i of the ith patient in the patient identity source A;

(2) creating an identifying attribute vector T_j of the jth patient in the set Q to be matched;

(3) taking the k-th elements S_i^k and T_j^k from the identifying attribute vectors S_i and T_j respectively, and calculating the similarity value of S_i^k and T_j^k.

The above method for establishing a patient identity source cross index, wherein the formula for calculating the similarity value is as follows:

\mathrm{similarity}(S_{i}^{k}, T_{j}^{k}) = \theta(S_{i}^{k}, T_{j}^{k})

wherein similarity(S_i^k, T_j^k) is the similarity value; θ is a string matching function; 1 ≤ k ≤ N, and N is the number of identifying attributes.

In the above method for establishing a patient identity source cross index, the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched is calculated by the following steps:

(1) obtaining, through an expectation-maximization (EM) algorithm according to the probability distribution, a matching threshold vector m = {m₁, m₂, …, m_N} and a mismatch threshold vector u = {u₁, u₂, …, u_N}, wherein N is the number of identifying attributes;

(2) initializing the values of the match probability and the mismatch probability;

(3) iteratively calculating the match probability and the mismatch probability according to the value of each element in the binary space vector and the initialized values;

(4) calculating the matching ratio according to the match probability and the mismatch probability.
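The text does not spell out the expectation-maximization step. A minimal sketch is given below, assuming the standard Fellegi-Sunter formulation with conditionally independent attributes, in which m_k is read as the probability that attribute k agrees for a true match and u_k as the probability that it agrees for a non-match. The function name `estimate_m_u`, the starting values, and the prior `p` are illustrative assumptions, not values taken from this document.

```python
from typing import List, Sequence, Tuple


def estimate_m_u(
    vectors: Sequence[Sequence[int]],
    n_iter: int = 50,
    p: float = 0.1,    # assumed starting share of true matches
    m0: float = 0.9,   # assumed starting value for every m_k
    u0: float = 0.1,   # assumed starting value for every u_k
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Estimate the vectors m and u from binary comparison vectors with EM."""

    def clamp(x: float) -> float:
        # Keep probabilities away from 0 and 1 so later products stay positive.
        return min(max(x, eps), 1.0 - eps)

    n_attrs = len(vectors[0])
    m = [m0] * n_attrs
    u = [u0] * n_attrs
    for _ in range(n_iter):
        # E-step: posterior probability that each vector stems from a true match.
        weights = []
        for c in vectors:
            pm, pu = p, 1.0 - p
            for k in range(n_attrs):
                pm *= m[k] if c[k] else 1.0 - m[k]
                pu *= u[k] if c[k] else 1.0 - u[k]
            weights.append(pm / (pm + pu))
        # M-step: re-estimate m, u and the match share from the weighted counts.
        total_match = max(sum(weights), eps)
        total_non = max(len(vectors) - sum(weights), eps)
        m = [clamp(sum(w * c[k] for w, c in zip(weights, vectors)) / total_match)
             for k in range(n_attrs)]
        u = [clamp(sum((1.0 - w) * c[k] for w, c in zip(weights, vectors)) / total_non)
             for k in range(n_attrs)]
        p = clamp(sum(weights) / len(vectors))
    return m, u
```

Because the starting values satisfy m0 > u0, the class with high agreement rates is the one labelled as matches; other initializations may need a check for swapped labels.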

The method for establishing the patient identity source cross index, wherein the formula for calculating the matching ratio is as follows:

{Ratio}_{ij} = \log_{2} (p_{1}^{j} / p_{2}^{j})

wherein Ratio_ij is the matching ratio; p_1^j is the match probability; p_2^j is the mismatch probability.

In the above method for establishing a patient identity source cross index, the cross index between the ith patient in the patient identity source A and the patients in the patient identity source B is established by the following steps:

(1) calculating the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij according to a statistical model;

(2) if the matching ratio Ratio_ij of the jth patient in the data set Q is greater than the upper bound T_up, establishing a cross index between the jth patient and the ith patient in the identity source A; if the matching ratio Ratio_ij of the jth patient in the data set Q is greater than or equal to the lower bound T_low and less than or equal to the upper bound T_up, performing manual processing; if the matching ratio Ratio_ij of the jth patient in the data set Q is less than the lower bound T_low, performing no processing.

Compared with the prior art, the method introduces the blocking attribute, greatly reduces the matching times and improves the efficiency of establishing the cross index.

Further, compared with the existing method of calculating a weighted similarity over identity attributes, the technical scheme of the invention can establish the cross index accurately even when identity attributes of a patient are missing, and does not require weights to be assigned manually to each attribute, thereby simplifying the actual operation and avoiding the interference of subjective factors.

Furthermore, the invention introduces a false-rejection probability and a false-acceptance probability, which improves the objectivity, reliability and practical value of the cross index.

Drawings

FIG. 1 is a schematic flow chart illustrating cross-indexing of patient identity sources according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart illustrating the matching ratio obtaining according to the embodiment of the present invention;

FIG. 3 is a schematic flow chart illustrating a process of obtaining a similarity value of an identity attribute according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart illustrating the calculation of the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched according to the embodiment of the present invention;

FIG. 5 is a schematic flow chart illustrating the process of establishing the cross index between the ith patient in the patient identity source A and the patients in the patient identity source B according to the embodiment of the present invention.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.

Next, the present invention is described in detail by using schematic diagrams, and when the embodiments of the present invention are described in detail, the schematic diagrams are only examples for convenience of description, and the scope of the present invention should not be limited herein.

The present invention will be described in detail below with reference to the drawings and examples. As shown in FIG. 1, step S1 is executed first to define one or more identity attributes of a patient as blocking attributes. Specifically, in this embodiment the name of the patient is used as the blocking attribute, and the blocking attribute information is the pinyin of the name; the name together with the identification number of the patient may also be used as the blocking attributes.

Next, step S2 is executed to extract the blocking attribute of the ith patient in the patient identity source A, match it against the blocking attributes of the patients in the patient identity source B, and record the matching patients in the patient identity source B as the set Q to be matched. Specifically, the blocking attribute information key_i of the ith patient in the patient identity source A is parsed; since the patient name is used as the blocking attribute in step S1, key_i is the pinyin of the patient's name. Then the blocking attribute information key_j of the jth patient in the patient identity source B, i.e. the pinyin of that patient's name, is parsed, and the similarity value similarity_name(key_i, key_j) of key_i and key_j is calculated by a string similarity function. If the similarity value is greater than a specified threshold v₀, the jth patient in the patient identity source B is put into the set Q to be matched.
In this embodiment, taking the ith patient in the patient identity source A to be the 1st patient, the blocking attribute information of that patient is parsed first; the blocking attribute information of the 1st patient in the patient identity source B is then parsed, the similarity value between the two is calculated and compared with the specified threshold v₀, and if it is greater than v₀, the 1st patient in the patient identity source B is put into the set Q to be matched. Next, the blocking attribute information of the 2nd patient in the patient identity source B is parsed, the similarity value between the blocking attribute information of the 1st patient in the patient identity source A and that of the 2nd patient in the patient identity source B is calculated and likewise compared with v₀, and if it is greater than v₀, the 2nd patient in the patient identity source B is put into the set Q. This is repeated until all patients in the patient identity source B have been traversed, at which point all patients in the patient identity source B that match the 1st patient in the patient identity source A have been put into the set Q. In this embodiment, the value of the specified threshold v₀ is 0.8.
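The blocking step can be sketched as follows. The text only requires "a character string similarity calculation function"; using `difflib.SequenceMatcher` for `similarity_name` is an assumption made here for illustration, as are the record layout and the field name `name_pinyin`. The threshold 0.8 is the value of v₀ given in this embodiment.

```python
from difflib import SequenceMatcher
from typing import Dict, List

V0 = 0.8  # blocking threshold v0 of this embodiment


def similarity_name(key_i: str, key_j: str) -> float:
    """String similarity in [0, 1]; SequenceMatcher is an assumed stand-in."""
    return SequenceMatcher(None, key_i, key_j).ratio()


def build_candidate_set(
    patient_a: Dict[str, str],
    source_b: List[Dict[str, str]],
    v0: float = V0,
) -> List[Dict[str, str]]:
    """Return the set Q: patients of source B whose blocking key is similar enough."""
    key_i = patient_a["name_pinyin"]  # hypothetical field holding the pinyin of the name
    return [
        patient_b
        for patient_b in source_b
        if similarity_name(key_i, patient_b["name_pinyin"]) > v0
    ]
```

Only the patients returned in Q go on to the attribute-by-attribute comparison, which is where the reduction in the number of comparisons comes from.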

Next, step S3 is executed to match the identity attributes of the ith patient in the patient identity source A with the identity attributes of each patient in the set Q, so as to obtain a matching ratio. Specifically, as shown in FIG. 2, step S201 is executed first, and N identity attributes of the patient are set as attributes for identifying the identity of the patient, where N ≥ 1. It should be noted that an attribute for identifying the patient identity may be an identification number, a driver license number, a social security number, a date of birth, a telephone number and/or a contact address, or may be a name. In this embodiment, 3 identity attributes of the patient (identification number, social security number, and date of birth) are set as the attributes for identifying the patient identity.

Then, step S202 is executed: according to the attributes for identifying the patient identity, the ith patient in the patient identity source A is compared with each patient in the set Q to be matched, so as to obtain the similarity values of the identifying attributes. Specifically, as shown in FIG. 3, step S301 is executed first to create the identifying attribute vector S_i of the ith patient in the patient identity source A. In this embodiment, the identifying attribute vector S₁ of the 1st patient in the patient identity source A is created. Then, step S302 is executed to create the identifying attribute vector T_j of the jth patient in the set Q to be matched. In this embodiment, the identifying attribute vector T₁ is created for the 1st patient in the set Q. Then, step S303 is executed to take the k-th elements S_i^k and T_j^k from the identifying attribute vectors S_i and T_j respectively, and to calculate the similarity value of S_i^k and T_j^k. The formula for calculating the similarity is as follows:

\mathrm{similarity}(S_{i}^{k}, T_{j}^{k}) = \theta(S_{i}^{k}, T_{j}^{k}) \qquad (1)

wherein similarity(S_i^k, T_j^k) is the similarity value; θ is a string matching function; 1 ≤ k ≤ N, and N is the number of identifying attributes. In this embodiment, since 3 identifying attributes were set in step S201, the vectors S₁ and T₁ each have three elements: the identification number, the social security number and the date of birth. The 1st elements S₁¹ and T₁¹, which are identification numbers, are taken from the vectors S₁ and T₁ respectively, and the similarity value of S₁¹ and T₁¹ is calculated by formula (1), i.e. the similarity between the identification number of the 1st patient in the patient identity source A and that of the 1st patient in the set Q. Next, the 2nd element (social security number) is taken and its similarity value is obtained in the same way as for the 1st element, and so on, until the similarity values of all identifying attributes defined in step S201 have been obtained. In this embodiment, the similarity values calculated for the three attributes are 0.9, 0.85, and 0.8, respectively. Next, steps S302 and S303 are repeated: the identifying attribute vector T₂ of the 2nd patient in the set Q is created, and the similarity values of the identifying attributes between that patient and the 1st patient in the patient identity source A are calculated. This is repeated until the whole set Q to be matched has been traversed, i.e. until the similarity values of the identifying attributes between the 1st patient in the patient identity source A and all patients in the set Q have been calculated.
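Formula (1) applied element by element can be sketched as below. The string matching function θ is not specified in the text, so `difflib.SequenceMatcher` is used as an assumed stand-in; the function names and the use of `None` for a missing attribute are likewise illustrative choices.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence


def theta(s: str, t: str) -> float:
    """Assumed string matching function, returning a value in [0, 1]."""
    return SequenceMatcher(None, s, t).ratio()


def attribute_similarities(
    s_i: Sequence[Optional[str]],
    t_j: Sequence[Optional[str]],
) -> List[Optional[float]]:
    """Compute similarity(S_i^k, T_j^k) for k = 1..N.

    None in either vector marks a missing attribute and yields None,
    so that the caller can treat it as a non-agreement later on.
    """
    return [
        None if s is None or t is None else theta(s, t)
        for s, t in zip(s_i, t_j)
    ]
```

For the three attributes of this embodiment, `s_i` and `t_j` would each hold the identification number, the social security number and the date of birth, in that order.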

Then, step S203 is executed to create a corresponding binary space vector C_ij = {C_ij(1), C_ij(2), …, C_ij(N)} according to the similarity values: if the similarity value of the kth identifying attribute between the ith patient in the patient identity source A and the jth patient in the set Q to be matched is greater than a specified threshold, C_ij(k) = 1; otherwise, C_ij(k) = 0; wherein 1 ≤ k ≤ N. In particular, when the kth identifying attribute of the ith patient in the patient identity source A or of the jth patient in the set Q to be matched is missing, C_ij(k) = 0. Specifically, a threshold vector v = {v₁, v₂, …, v_N} is specified first. In this embodiment, the threshold vector is specified as (0.85, 0.84, 0.81). From step S303, the similarity values of the identifying attributes between the 1st patient in the patient identity source A and the 1st patient in the set Q to be matched are (0.9, 0.85, 0.8), so the corresponding binary space vector is (1, 1, 0). In the same way, from all the similarity values calculated in step S303, the binary space vector corresponding to the 1st patient in the patient identity source A and each patient in the set Q to be matched is obtained.
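The thresholding of step S203 can be sketched in a few lines. The function name `binary_vector` and the use of `None` for a missing attribute are illustrative assumptions; the numbers in the usage note are the ones given in this embodiment.

```python
from typing import List, Optional, Sequence


def binary_vector(
    similarities: Sequence[Optional[float]],
    thresholds: Sequence[float],
) -> List[int]:
    """Build C_ij: 1 where the similarity exceeds its threshold, else 0.

    A missing attribute (None) always maps to 0, as stated for step S203.
    """
    return [
        1 if s is not None and s > v else 0
        for s, v in zip(similarities, thresholds)
    ]
```

Calling `binary_vector((0.9, 0.85, 0.8), (0.85, 0.84, 0.81))` gives `[1, 1, 0]`, the vector of this embodiment: the first two similarities exceed their thresholds and the third does not.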

Then, step S204 is executed to calculate the probability distribution of the binary space vector, where Freq_l is the frequency with which the binary space vector appears in the l-th state, and 1 ≤ l ≤ 2^N. Specifically, a series of binary space vectors is known from step S203; the probability with which the vector (1, 1, 0) occurs in this series, and the probabilities with which the other states of the binary vector, such as (0, 0, 1) and (0, 1, 0), occur, are calculated, thereby obtaining the probability distribution of the binary space vector.
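The formula for the distribution is not legible in this text. A minimal sketch, assuming that the probability of each state is simply its relative frequency Freq_l divided by the total number of vectors, could look like this; the function name `state_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def state_distribution(
    vectors: Iterable[Sequence[int]],
) -> Dict[Tuple[int, ...], float]:
    """Relative frequency of each observed state of the binary space vector.

    Assumption: P(state l) = Freq_l / (sum of all Freq_l). States that never
    occur are simply absent from the result, i.e. they have probability 0.
    """
    counts = Counter(tuple(v) for v in vectors)
    total = sum(counts.values())
    return {state: freq / total for state, freq in counts.items()}
```

For four vectors (1, 1, 0), (1, 1, 0), (0, 0, 1) and (0, 1, 0), the state (1, 1, 0) receives probability 0.5 and the other two observed states 0.25 each.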

Then, step S205 is executed to calculate the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched. Specifically, as shown in FIG. 4, step S401 is executed first: according to the probability distribution, a matching threshold vector m = {m₁, m₂, …, m_N} and a mismatch threshold vector u = {u₁, u₂, …, u_N} are obtained by an expectation-maximization (EM) algorithm, where N is the number of identifying attributes. Next, step S402 is executed to initialize the values of the match probability and the mismatch probability. Specifically, the match probability is initialized to p_1^j = 1 and the mismatch probability to p_2^j = 1, where j denotes the jth patient in the set Q to be matched; that is, the initial match probability and mismatch probability corresponding to each patient in the set Q are both 1. Then, step S403 is executed: the match probability and the mismatch probability are obtained by iterative calculation from the value of each element in the binary space vector and the initialized values. Specifically, for each k the match probability p_1^j is updated in one way if C_ij(k) = 0 and in another if C_ij(k) = 1, and the mismatch probability p_2^j is updated likewise (the update expressions are missing from this text). In this embodiment, the match probability and the mismatch probability between the 1st patient in the patient identity source A and the 1st patient in the set Q to be matched are calculated first; as shown above, the corresponding binary space vector is (1, 1, 0), and the two probabilities are obtained by applying the update for each identifying attribute in turn. By analogy, the match probability and the mismatch probability between the 1st patient in the patient identity source A and each patient in the set Q to be matched can be obtained.
Then, step S404 is executed to calculate the matching ratio according to the match probability and the mismatch probability. The calculation formula is:

{Ratio}_{ij} = \log_{2} (p_{1}^{j} / p_{2}^{j}) .
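Steps S402 to S404 can be sketched as follows. The update rule is an assumption: it is the usual Fellegi-Sunter product, in which an agreeing attribute multiplies p_1^j by m_k and p_2^j by u_k, and a non-agreeing attribute multiplies them by (1 − m_k) and (1 − u_k). The function name `match_ratio` and the values of m and u in the usage note are hypothetical.

```python
import math
from typing import Sequence


def match_ratio(
    c_ij: Sequence[int],
    m: Sequence[float],
    u: Sequence[float],
) -> float:
    """Compute Ratio_ij = log2(p1 / p2) for one binary space vector."""
    p1 = 1.0  # match probability, initialized to 1 as in step S402
    p2 = 1.0  # mismatch probability, initialized to 1 as in step S402
    for c, m_k, u_k in zip(c_ij, m, u):
        # Assumed update: use m_k / u_k on agreement, their complements otherwise.
        p1 *= m_k if c else 1.0 - m_k
        p2 *= u_k if c else 1.0 - u_k
    return math.log2(p1 / p2)
```

With the hypothetical values m = (0.9, 0.9, 0.9) and u = (0.1, 0.1, 0.1), the vector (1, 1, 0) of this embodiment gives p_1 = 0.081 and p_2 = 0.009, so the ratio is log2(9), roughly 3.17. Every m_k and u_k must lie strictly between 0 and 1, otherwise the logarithm is undefined.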

Next, step S4 is executed to establish the cross index between the ith patient in the patient identity source A and the patient identity source B according to the matching ratio. Specifically, as shown in FIG. 5, step S501 is executed first: the patients in the set Q to be matched are sorted in ascending order of the matching ratio calculated in step S404, giving a data set Q'. Then, step S502 is executed to calculate the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij according to a statistical model. Specifically, a false-rejection probability α and a false-acceptance probability β are specified, where 0 < α ≤ 0.1 and 0 < β ≤ 0.1. If the x-th patient in the data set Q' satisfies the upper-bound condition (the condition itself is missing from this text), the upper bound of the matching ratio is T_up = Ratio_ix, where 0 < x < M, M is the number of elements in the data set Q', and i denotes the ith patient in the patient identity source A. If the y-th patient in the data set Q' satisfies the lower-bound condition (likewise missing from this text), the lower bound of the matching ratio is T_low = Ratio_iy, where 0 < y < x. In this embodiment, the statistical model used is the Fellegi-Sunter model.

Next, step S503 is executed to evaluate the matching ratio Ratio_ij of the jth patient in the data set Q'. If Ratio_ij > T_up, step S504 is executed to establish a cross index between the jth patient and the ith patient in the identity source A; if T_low ≤ Ratio_ij ≤ T_up, step S505 is executed to perform manual processing; if Ratio_ij < T_low, step S506 is executed and no processing is performed. Specifically, the matching ratio Ratio₁₁ of the first patient in the data set Q' is first compared with the upper and lower bounds, then the matching ratio Ratio₁₂ of the second patient, and so on until the entire data set Q' has been traversed.
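The three-way decision of steps S503 to S506 can be sketched as below. The function name `classify`, the decision labels, and the bound values used in the usage note are hypothetical; in the method itself T_up and T_low come from the statistical model of step S502.

```python
from typing import Literal

Decision = Literal["link", "manual_review", "no_action"]


def classify(ratio: float, t_low: float, t_up: float) -> Decision:
    """Map a matching ratio to the action taken for that candidate patient."""
    if ratio > t_up:
        return "link"            # step S504: establish the cross index
    if ratio >= t_low:
        return "manual_review"   # step S505: T_low <= ratio <= T_up
    return "no_action"           # step S506: ratio below the lower bound
```

For example, with the assumed bounds T_low = -2 and T_up = 3, a ratio of 3.17 leads to a cross index, a ratio of exactly 3 goes to manual processing, and a ratio of -5 is ignored.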

Then, step S5 is executed: steps S2 to S4 are repeated until cross indexes have been established for all patients in the patient identity source A and the patient identity source B. Specifically, the cross index between the 2nd patient in the patient identity source A and the patient identity source B is established in the same way as for the 1st patient, and so on, until the cross index has been established for all patients in the patient identity source A and the patient identity source B.

Although the present invention has been described with reference to the preferred embodiments, it is not intended to limit the present invention, and those skilled in the art can make variations and modifications of the present invention without departing from the spirit and scope of the present invention by using the methods and technical contents disclosed above.

Claims

1. A method for establishing a patient identity source cross index is characterized by comprising the following steps:

2. The method of claim 1, wherein the step of determining the set Q to be matched comprises:

3. The method of claim 1, wherein the matching ratio is obtained by:

(4) calculating a probability distribution of the binary space vector, wherein Freq_l is the frequency with which the binary space vector appears in the l-th state, and 1 ≤ l ≤ 2^N;

4. The method of claim 3, wherein the patient identification attribute is an identification number, a driver license number, a social security number, a date of birth, a telephone number, and/or a contact address.

5. The method of claim 3, wherein the identity attribute similarity value is obtained by:

6. The method of claim 5, wherein the similarity value is calculated by the formula:

7. The method of claim 3, wherein the step of calculating the matching ratio between the ith patient in the patient identity source A and each patient in the set Q to be matched comprises:

(2) initializing values of the match probability and the mismatch probability;

8. The method of claim 7, wherein said matching ratio is calculated by the formula:

{Ratio}_{ij} = \log_{2} (p_{1}^{j} / p_{2}^{j})

9. The method of claim 1, wherein the i-th patient in the patient identity source A and the patient in the patient identity source B establish the cross-index by the following steps:

(1) calculating the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij;

(2) if the matching ratio Ratio_ij of the jth patient in the data set Q is greater than the upper bound T_up, establishing a cross index between the jth patient and the ith patient in the identity source A; if the matching ratio Ratio_ij of the jth patient in the data set Q is greater than or equal to the lower bound T_low and less than or equal to the upper bound T_up, performing manual processing; if the matching ratio Ratio_ij of the jth patient in the data set Q is less than or equal to the lower bound T_low, performing no processing.