A method for establishing a patient identity source cross-index
Technical field
The present invention relates to medical information technology, and more particularly to a method for establishing a patient identity source cross-index.
Background art
With the rapid development of information technology and the deepening reform of the medical system, the informatization of China's medical industry is gradually accelerating. Medical information systems with specific functions, such as the hospital information system (HIS), the picture archiving and communication system (PACS), and the radiology information system (RIS), are increasingly becoming the necessary technical support and infrastructure for the operation of modern medical institutions. However, the informatization of China's medical industry lacks systematic planning and continuity. Each medical information system works only within the scope of its own business, forming isolated information islands, so that the medical information a patient generates in heterogeneous medical information systems is difficult to exchange and share effectively.
To enable cross-regional transmission and sharing of medical information and to integrate medical service resources, the Radiological Society of North America (RSNA) and the Healthcare Information and Management Systems Society (HIMSS) organized relevant institutions and equipment vendors to jointly establish the Integrating the Healthcare Enterprise (IHE) concept. Within IHE, the patient identifier cross-referencing (PIX) technical framework provides guidance for integrating heterogeneous medical information systems. The IHE PIX technical framework defines three roles: the PIX manager, the patient identity source, and the PIX consumer. A patient identity source assigns each patient an identifier that is unique within the local scope of its own information system. The PIX manager is responsible for creating an identifier for the patient that is unique at global scope, namely the master patient index, and for mapping the master patient index to the local identifiers from the different patient identity sources. Once the PIX manager has established the cross-index between the heterogeneous information systems, a PIX consumer can obtain a full view of the patient's information.
Patient identification is the cornerstone of establishing a cross-index between heterogeneous information systems. Only with a reliable mechanism for uniquely identifying patients can the medical records of the same patient in different patient identity sources be found, so that the local identifiers in each patient identity source can be associated with one another. At present, the unique patient identification mechanisms used in the medical field are usually based on a deterministic matching strategy. The basic principle of deterministic matching is as follows: define the patient attributes used for identification and manually specify a weight for each attribute; then compute, attribute by attribute, the similarity between a reference patient and a patient to be matched, and decide from the weighted average of the similarities whether to establish a cross-index between the two. Deterministic matching has the following drawbacks: patients in the heterogeneous information systems are matched pairwise, so the time complexity of the algorithm is quadratic, which seriously reduces the efficiency of establishing the cross-index; the weights of the attributes must sum to 1, so when patient attributes stored in a medical information system are missing, weights must be reassigned to every attribute, which is impractical; and both the attribute weights and the threshold on the weighted average similarity are specified manually, so the accuracy and reliability of the cross-index are easily affected by subjective factors.
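For illustration only, the deterministic strategy described above can be sketched in Python as follows. The attribute names, weights, and similarity values are hypothetical; the source does not specify them. The check on the weight sum shows why a missing attribute forces all weights to be reassigned.

```python
def weighted_similarity(similarities, weights):
    """Weighted average of per-attribute similarities (deterministic matching)."""
    # The weights must sum to 1; dropping an attribute therefore means
    # every remaining weight has to be reassigned by hand.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 1")
    return sum(weights[name] * similarities[name] for name in weights)


# Hypothetical attributes with manually chosen weights.
weights = {"name": 0.5, "birth_date": 0.3, "phone": 0.2}
similarities = {"name": 1.0, "birth_date": 1.0, "phone": 0.0}
score = weighted_similarity(similarities, weights)
print(round(score, 3))
```

The resulting score would then be compared with a manually chosen threshold, which is the subjective step the invention aims to avoid.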
Summary of the invention
The problem solved by the present invention is to provide a method for establishing a patient identity source cross-index that not only improves the efficiency of establishing the cross-index, but can also establish the cross-index accurately when patient identity attributes are missing, without assigning weights to the individual identity attributes, thereby simplifying the process and improving practicality.
To solve the above problem, the present invention provides a method for establishing a patient identity source cross-index, comprising:
(1) defining one or more identity attributes of a patient as blocking attributes;
(2) extracting the blocking attribute of the i-th patient in patient identity source A, matching it against the blocking attributes of the patients in patient identity source B, and recording the matching patients in patient identity source B as the set Q to be matched;
(3) matching the identity attributes of the i-th patient in patient identity source A against the identity attributes of each patient in the set Q to obtain matching ratios;
(4) establishing, according to the matching ratios, a cross-index between the i-th patient in patient identity source A and patient identity source B;
(5) repeating steps (2) to (4) until cross-indexes have been established between all patients in patient identity source A and patient identity source B.
In the above method for establishing a patient identity source cross-index, the set Q to be matched is determined as follows:
(1) determining the blocking attribute information key_i of the i-th patient in patient identity source A; determining the blocking attribute information key_j of the j-th patient in patient identity source B; and calculating the similarity value of key_i and key_j;
(2) if the similarity value is greater than a specified threshold, placing the j-th patient in patient identity source B into the set Q to be matched, and continuing until patient identity source B has been traversed.
In the above method for establishing a patient identity source cross-index, the matching ratio is obtained as follows:
(1) setting N identity attributes of the patient as the attributes used to identify patient identity, where N ≥ 1;
(2) comparing, on these identification attributes, the i-th patient in patient identity source A with each patient in the set Q to be matched, to obtain the similarity values of the identification attributes;
(3) creating, according to the similarity values, the corresponding binary vector C_ij = {C_ij(1), C_ij(2), ..., C_ij(N)}: if the similarity value of the k-th identity attribute of the i-th patient in patient identity source A and the j-th patient in the set Q to be matched is greater than a specified threshold, then C_ij(k) = 1; otherwise C_ij(k) = 0; where 1 ≤ k ≤ N;
(4) calculating the probability distribution of the binary vector [formula omitted in the source text], where Freq_l is the frequency with which the binary vector occurs in the l-th state, 1 ≤ l ≤ 2^N;
(5) calculating, according to the probability distribution, the matching ratio between the i-th patient in patient identity source A and each patient in the set Q to be matched.
In the above method for establishing a patient identity source cross-index, the attributes used to identify patient identity are the ID card number, license number, social insurance number, date of birth, telephone number, and/or contact address.
In the above method for establishing a patient identity source cross-index, the similarity values of the identification attributes are obtained as follows:
(1) creating the identification attribute vector S_i of the i-th patient in patient identity source A;
(2) creating the identification attribute vector T_j of the j-th patient in the set Q to be matched;
(3) taking the k-th elements S_i^k and T_j^k from the identification attribute vectors S_i and T_j respectively, and calculating the similarity value of S_i^k and T_j^k.
In the above method for establishing a patient identity source cross-index, the similarity value is calculated by the formula:
[formula omitted in the source text]
where the result is the similarity value of S_i^k and T_j^k; θ is a string matching function; 1 ≤ k ≤ N; and N is the number of identification attributes.
In the above method for establishing a patient identity source cross-index, the matching ratio between the i-th patient in patient identity source A and each patient in the set Q to be matched is calculated as follows:
(1) obtaining, according to the probability distribution and by means of the EM algorithm, the matching threshold vector m = {m_1, m_2, ..., m_N} and the mismatch threshold vector u = {u_1, u_2, ..., u_N}, where N is the number of identification attributes;
(2) initializing the values of the matching probability and the mismatch probability;
(3) iteratively calculating the matching probability and the mismatch probability from the value of each element of the binary vector and the initial values;
(4) calculating the matching ratio from the matching probability and the mismatch probability.
In the above method for establishing a patient identity source cross-index, the matching ratio is calculated by the formula:
[formula omitted in the source text]
where Ratio_ij is the matching ratio and the other two terms are the matching probability and the mismatch probability.
In the above method for establishing a patient identity source cross-index, the cross-index between the i-th patient in patient identity source A and the patients in patient identity source B is established as follows:
(1) calculating, according to a statistical model, the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij;
(3) if the matching ratio Ratio_ij of the j-th patient in the data set Q is greater than the upper bound T_up, establishing a cross-index between that j-th patient and the i-th patient in patient identity source A; if the matching ratio Ratio_ij of the j-th patient in the data set Q is greater than or equal to the lower bound T_low and less than or equal to the upper bound T_up, handing the pair over for manual processing; if the matching ratio Ratio_ij of the j-th patient in the data set Q is less than the lower bound T_low, performing no processing.
Compared with the prior art, the present invention introduces blocking attributes, which greatly reduces the number of comparisons and improves the efficiency of establishing the cross-index.
Further, compared with the existing approach of calculating a weighted similarity of identity attributes, the technical solution of the present invention can not only establish the cross-index accurately when patient identity attributes are missing, but also requires no manual assignment of weights to the attributes, which simplifies the practical workflow and avoids the influence of subjective factors.
Further, the present invention introduces a false-rejection probability and a false-acceptance probability, which improves the objectivity, reliability, and practical value of the cross-index.
Brief description of the drawings
Fig. 1 is a flow diagram of establishing a patient identity source cross-index according to an embodiment of the present invention;
Fig. 2 is a flow diagram of obtaining the matching ratio according to an embodiment of the present invention;
Fig. 3 is a flow diagram of obtaining the similarity values of the identification attributes according to an embodiment of the present invention;
Fig. 4 is a flow diagram of calculating the matching ratio between the i-th patient in patient identity source A and each patient in the set Q to be matched according to an embodiment of the present invention;
Fig. 5 is a flow diagram of establishing a cross-index between the i-th patient in patient identity source A and the patients in patient identity source B according to an embodiment of the present invention.
Detailed description of the embodiments
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the present invention; the present invention is therefore not limited to the specific embodiments disclosed below.
In addition, the present invention is described in detail with reference to schematic diagrams. When the embodiments are described, the diagrams are examples given for ease of explanation and should not limit the scope of protection of the present invention.
The present invention is described in detail below with reference to the accompanying drawings and embodiments. The method for establishing a patient identity source cross-index according to the present invention is shown in Fig. 1. First, step S1 is executed: one or more identity attributes of a patient are defined as blocking attributes. Specifically, in this embodiment the patient's name is used as the blocking attribute, and the blocking information of that attribute is the pinyin of the name. The patient's name and ID card number may also be used together as blocking attributes.
Next, step S2 is executed: the blocking attribute of the i-th patient in patient identity source A is extracted and matched against the blocking attributes of the patients in patient identity source B, and the matching patients in patient identity source B are recorded as the set Q to be matched. Specifically, the blocking attribute information key_i of the i-th patient in patient identity source A is parsed. Since step S1 uses the patient's name as the blocking attribute, key_i is the pinyin of that patient's name. Then the blocking attribute information key_j of the j-th patient in patient identity source B, that is, the pinyin of that patient's name, is parsed, and a string similarity function is used to calculate the similarity value similarity_name(key_i, key_j) of key_i and key_j. If the similarity value is greater than the specified threshold v_0, the j-th patient in patient identity source B is placed into the set Q to be matched. In this embodiment, the i-th patient in patient identity source A is the 1st patient in patient identity source A. The blocking attribute information of this patient is parsed, then the blocking attribute information of the 1st patient in patient identity source B is parsed, and the similarity value between the two is calculated and compared with the specified threshold v_0; if it is greater than v_0, the 1st patient in patient identity source B is placed into the set Q. Then the blocking attribute information of the 2nd patient in patient identity source B is parsed, and its similarity to the blocking attribute information of the 1st patient in patient identity source A is calculated and likewise compared with v_0; if it is greater than v_0, the 2nd patient in patient identity source B is placed into the set Q. This continues until all patients in patient identity source B have been traversed, so that every patient in patient identity source B that matches the 1st patient in patient identity source A has been placed into the set Q. In this embodiment, the value of the specified threshold v_0 is 0.8.
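The blocking step can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the source does not name its string similarity function, so `difflib.SequenceMatcher` stands in for similarity_name, and the record layout and sample pinyin keys are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_name(key_i, key_j):
    """Stand-in string similarity in [0, 1] (assumption, not from the source)."""
    return SequenceMatcher(None, key_i, key_j).ratio()


def candidate_set(key_i, source_b, v0=0.8):
    """Return the patients of source B whose blocking key is similar to key_i."""
    # A patient enters Q only if the similarity is strictly greater than v0.
    return [p for p in source_b if similarity_name(key_i, p["key"]) > v0]


# Hypothetical records: "key" holds the pinyin of the patient's name.
source_b = [
    {"id": "B1", "key": "zhangwei"},
    {"id": "B2", "key": "zhangwie"},  # transposition typo
    {"id": "B3", "key": "liming"},
]
Q = candidate_set("zhangwei", source_b)
print([p["id"] for p in Q])
```

Only the patients in Q go on to the more expensive attribute-by-attribute comparison, which is where the reduction in the number of comparisons comes from.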
Next, step S3 is executed: the identity attributes of the i-th patient in patient identity source A are matched against the identity attributes of each patient in the set Q to obtain matching ratios. Specifically, as shown in Fig. 2, step S201 is executed first: N identity attributes of the patient are set as the attributes used to identify patient identity, where N ≥ 1. It should be noted that the identification attributes are the ID card number, license number, social insurance number, date of birth, telephone number, and/or contact address, and may also include the name and the like. In this embodiment, three identity attributes of the patient (ID card number, social insurance number, and date of birth) are set as the identification attributes.
Next, step S202 is executed: on the identification attributes, the i-th patient in patient identity source A is compared with each patient in the set Q to be matched, to obtain the similarity values of the identification attributes. Specifically, as shown in Fig. 3, step S301 is executed first: the identification attribute vector S_i of the i-th patient in patient identity source A is created. In this embodiment, this is the identification attribute vector S_1 of the 1st patient in patient identity source A. Then step S302 is executed: the identification attribute vector T_j of the j-th patient in the set Q to be matched is created. In this embodiment, this is the identification attribute vector T_1 of the first patient in the set Q. Step S303 is then executed: the k-th elements S_i^k and T_j^k are taken from the identification attribute vectors S_i and T_j respectively, and the similarity value of S_i^k and T_j^k is calculated. The similarity formula is:
[formula (1) omitted in the source text]
where the result is the similarity value of S_i^k and T_j^k; θ is a string matching function; 1 ≤ k ≤ N; and N is the number of identification attributes. In this embodiment, step S201 defines three identification attributes, so the vectors S_1 and T_1 each contain three elements: the ID card number, the social insurance number, and the date of birth. The 1st elements S_1^1 and T_1^1, which are the ID card numbers, are taken from S_1 and T_1 respectively, and their similarity value is calculated by formula (1); this gives the similarity of the ID card numbers of the 1st patient in patient identity source A and the first patient in the set Q. Then the 2nd element (the social insurance number) is taken and its similarity value is obtained in the same way as for the 1st element, and so on, until the similarity values of all identification attributes defined in step S201 have been obtained. In this embodiment, the similarity values calculated for the three attributes are 0.9, 0.85, and 0.8 respectively. Steps S302 and S303 are then repeated: the identification attribute vector T_2 of the 2nd patient in the set Q is created, and the similarity values of the identification attributes of that patient and the 1st patient in patient identity source A are calculated, and so on, until the entire set Q to be matched has been traversed. This yields the similarity values of the identification attributes between the 1st patient in patient identity source A and every patient in the set Q to be matched.
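As a minimal sketch of steps S301 to S303, the following Python compares two identification attribute vectors element by element. The string matching function θ is not specified in the source, so `difflib.SequenceMatcher` is used as a stand-in, and the sample attribute values are hypothetical. Returning 0.0 for a missing attribute is an assumption chosen so that a missing attribute can never exceed a positive threshold later on.

```python
from difflib import SequenceMatcher


def theta(a, b):
    """Stand-in string matching function (assumption); 0.0 if a value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def similarity_vector(s_i, t_j):
    """Similarity of the k-th elements of S_i and T_j, for k = 1..N."""
    return [theta(a, b) for a, b in zip(s_i, t_j)]


# Hypothetical vectors: ID card number, social insurance number, date of birth.
S1 = ["110101199001011234", "SI-000123", "1990-01-01"]
T1 = ["110101199001011234", "SI-000128", None]
print(similarity_vector(S1, T1))
```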
Next, step S203 is executed: according to the similarity values, the corresponding binary vector C_ij = {C_ij(1), C_ij(2), ..., C_ij(N)} is created. If the similarity value of the k-th identity attribute of the i-th patient in patient identity source A and the j-th patient in the set Q to be matched is greater than the specified threshold, then C_ij(k) = 1; otherwise C_ij(k) = 0; where 1 ≤ k ≤ N. In particular, when the k-th identification attribute of the i-th patient in patient identity source A or of the j-th patient in the set Q to be matched is missing, C_ij(k) = 0. Specifically, the specified threshold vector v = {v_1, v_2, ..., v_N} is determined first. In this embodiment, the specified threshold vector is (0.85, 0.84, 0.81). From step S303, the similarity values of the identification attributes of the 1st patient in patient identity source A and the 1st patient in the set Q to be matched are (0.9, 0.85, 0.8), so the corresponding binary vector is (1, 1, 0). From all the identification attribute similarity values calculated in step S303, the binary vector corresponding to the 1st patient in patient identity source A and each patient in the set Q to be matched can be obtained in the same way.
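The thresholding in step S203 can be illustrated with the numbers of this embodiment. The function name is hypothetical, and representing a missing attribute as None is an assumption of this sketch.

```python
def binary_vector(similarities, thresholds):
    """C_ij(k) = 1 if the k-th similarity exceeds v_k, else 0 (also 0 if missing)."""
    return tuple(
        1 if s is not None and s > v else 0
        for s, v in zip(similarities, thresholds)
    )


v = (0.85, 0.84, 0.81)      # specified threshold vector of the embodiment
sims = (0.9, 0.85, 0.8)     # similarity values from step S303
print(binary_vector(sims, v))  # (1, 1, 0)
```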
Next, step S204 is executed: the probability distribution of the binary vector is calculated [formula omitted in the source text], where Freq_l is the frequency with which the binary vector occurs in the l-th state, 1 ≤ l ≤ 2^N. Specifically, step S203 yields a series of binary vectors; the probability with which the vector (1, 1, 0) occurs in this series is calculated, as are the probabilities with which the binary vector occurs in the other states, such as (0, 0, 1) and (0, 1, 0), giving the probability distribution of the binary vector.
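Because the formula itself is missing from the source text, the following Python sketch assumes the natural reading: the probability of each state is its frequency Freq_l divided by the total number of observed binary vectors. The sample vectors are hypothetical.

```python
from collections import Counter


def state_distribution(vectors):
    """Relative frequency of each observed binary-vector state (assumed formula)."""
    freq = Counter(vectors)          # Freq_l for every state l that occurs
    total = sum(freq.values())
    return {state: count / total for state, count in freq.items()}


# Hypothetical binary vectors produced by step S203 for N = 3.
vectors = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
dist = state_distribution(vectors)
print(dist[(1, 1, 0)])  # 0.5
```

States that never occur among the 2^N possibilities are simply absent from the dictionary, which corresponds to a probability of zero.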
Next, step S205 is executed: the matching ratio between the i-th patient in patient identity source A and each patient in the set Q to be matched is calculated. Specifically, as shown in Fig. 4, step S401 is executed first: according to the probability distribution, the matching threshold vector m = {m_1, m_2, ..., m_N} and the mismatch threshold vector u = {u_1, u_2, ..., u_N} are obtained by the EM algorithm, where N is the number of identification attributes. Then step S402 is executed: the values of the matching probability and the mismatch probability are initialized. Specifically, for the j-th patient in the set Q to be matched, the matching probability and the mismatch probability are both initialized to 1 [initialization expressions omitted in the source text]. Then step S403 is executed: the matching probability and the mismatch probability are calculated iteratively from the value of each element of the binary vector and the initial values. Specifically, the matching probability is updated by one expression if C_ij(k) = 0 and by another if C_ij(k) = 1, and the mismatch probability is updated likewise, by one expression if C_ij(k) = 0 and by another if C_ij(k) = 1 [update formulas omitted in the source text]. In this embodiment, the matching probability and the mismatch probability of the 1st patient in patient identity source A and the 1st patient in the set Q to be matched are calculated first. As shown above, the corresponding binary vector is (1, 1, 0), so the matching probability and the mismatch probability are obtained by applying the update for each of the corresponding identification attributes. In the same way, the matching probability and the mismatch probability of the 1st patient in patient identity source A and each patient in the set Q to be matched can be obtained. Then step S404 is executed: the matching ratio is calculated from the matching probability and the mismatch probability. The calculation formula is: [formula omitted in the source text].
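Since the update and ratio formulas are not reproduced in the source text, the sketch below substitutes the conventional Fellegi-Sunter form as an assumption: starting from 1, the matching probability is multiplied by m_k when C_ij(k) = 1 and by (1 - m_k) when C_ij(k) = 0, the mismatch probability is updated the same way with u_k, and the matching ratio is their quotient. The m and u values are hypothetical placeholders for the output of the EM step.

```python
def match_probabilities(c, m, u):
    """Iteratively accumulate matching and mismatch probabilities (assumed form)."""
    p_match, p_mismatch = 1.0, 1.0      # initial values, as in step S402
    for c_k, m_k, u_k in zip(c, m, u):
        if c_k == 1:                    # attribute k agrees
            p_match *= m_k
            p_mismatch *= u_k
        else:                           # attribute k disagrees or is missing
            p_match *= 1.0 - m_k
            p_mismatch *= 1.0 - u_k
    return p_match, p_mismatch


def matching_ratio(c, m, u):
    """Ratio of matching to mismatch probability (assumed form of Ratio_ij)."""
    p_match, p_mismatch = match_probabilities(c, m, u)
    # Assumes 0 < u_k < 1 for all k, so the denominator is never zero.
    return p_match / p_mismatch


m = (0.9, 0.9, 0.9)   # hypothetical values standing in for the EM estimates
u = (0.1, 0.1, 0.1)
ratio = matching_ratio((1, 1, 0), m, u)
print(round(ratio, 6))
```

Implementations often work with the logarithm of this ratio to avoid underflow when N is large; whether the embodiment does so cannot be determined from the source.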
Next, step S4 is executed: according to the matching ratios, a cross-index is established between the i-th patient in patient identity source A and patient identity source B. Specifically, as shown in Fig. 5, step S501 is executed first: the set Q to be matched is sorted in ascending order of matching ratio to obtain the data set Q′; that is, the patients in the set Q are arranged in ascending order of the matching ratios calculated in step S404. Then step S502 is executed: according to a statistical model, the upper bound T_up and the lower bound T_low of the matching ratio Ratio_ij are calculated. Specifically, the false-rejection probability is specified as α and the false-acceptance probability as β, where 0 < α ≤ 0.1 and 0 < β ≤ 0.1. If there is an x-th patient in the data set Q′ satisfying the upper-bound condition [condition omitted in the source text], then the upper bound of the matching ratio is T_up = Ratio_ix, where 0 < x < M, M is the number of elements in the data set Q′, and i denotes the i-th patient in patient identity source A. If there is a y-th patient in the data set Q′ satisfying the lower-bound condition [condition omitted in the source text], then the lower bound of the matching ratio is T_low = Ratio_iy, where 0 < y < x. In this embodiment, the statistical model used is the Fellegi-Sunter model.
Next, step S503 is executed: the matching ratio Ratio_ij of the j-th patient in the data set Q is evaluated. If Ratio_ij > T_up, step S504 is executed and a cross-index is established between that j-th patient and the i-th patient in patient identity source A. If T_low ≤ Ratio_ij ≤ T_up, step S505 is executed and the pair is handed over for manual processing. If Ratio_ij < T_low, step S506 is executed and no processing is performed. Specifically, the matching ratio Ratio_11 of the first patient in the data set Q is compared with the upper and lower bounds first, then the matching ratio Ratio_12 of the second patient is evaluated, and so on, until the entire data set Q has been traversed.
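The three-way decision of steps S503 to S506 can be sketched as follows. The function name, the returned labels, and the bound values in the example are hypothetical; only the comparison rules come from the description above.

```python
def classify(ratio, t_up, t_low):
    """Decide what to do with one candidate, given its matching ratio."""
    if ratio > t_up:
        return "link"            # S504: establish the cross-index
    if ratio >= t_low:
        return "manual_review"   # S505: T_low <= ratio <= T_up
    return "no_action"           # S506: ratio < T_low


# Hypothetical bounds and ratios for the patients in the sorted set.
t_up, t_low = 5.0, 0.5
decisions = [classify(r, t_up, t_low) for r in (0.1, 0.5, 5.0, 9.0)]
print(decisions)
```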
Finally, step S5 is executed: steps S2 to S4 are repeated until cross-indexes have been established between all patients in patient identity source A and patient identity source B. Specifically, in the same way as the cross-index between the 1st patient in patient identity source A and patient identity source B was established, the cross-index between the 2nd patient in patient identity source A and patient identity source B is established, and so on, until cross-indexes have been established between all patients in patient identity source A and patient identity source B.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the scope of protection of the technical solution of the present invention.