WO2024028635A1 - Medical data consolidation method and apparatus, computer storage medium, and electronic device - Google Patents

Medical data consolidation method and apparatus, computer storage medium, and electronic device

Info

Publication number
WO2024028635A1
WO2024028635A1 PCT/IB2022/057149 IB2022057149W WO2024028635A1 WO 2024028635 A1 WO2024028635 A1 WO 2024028635A1 IB 2022057149 W IB2022057149 W IB 2022057149W WO 2024028635 A1 WO2024028635 A1 WO 2024028635A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
user
integrated
similarity value
medical
Prior art date
Application number
PCT/IB2022/057149
Other languages
French (fr)
Chinese (zh)
Inventor
巴婕菈
阿齐姆
陈嘉源
褚兆玮
郭锋
Original Assignee
Evyd科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Evyd科技有限公司 filed Critical Evyd科技有限公司
Priority to PCT/IB2022/057149 priority Critical patent/WO2024028635A1/en
Publication of WO2024028635A1 publication Critical patent/WO2024028635A1/en

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Definitions

  • the present disclosure relates to the field of data processing, and in particular, to a medical data integration method and device, computer-readable storage media, and electronic equipment.
  • a user may have multiple records in multiple different systems.
  • the multiple records of the user in multiple different systems need to be verified.
  • the verification requirement will be transferred to the relevant personnel of the data group for processing.
  • the relevant personnel of the data group will perform manual analysis and manually merge the multiple records existing in the different systems to obtain the merged user data.
  • a medical data integration method includes: acquiring data to be integrated from multiple different systems, and determining, from the plurality of data to be integrated, multiple first data with the same user birth date; determining multiple user names corresponding one-to-one to the multiple first data, and performing similarity calculation on the multiple user names to obtain a similarity value; if the similarity value is greater than or equal to the preset similarity value, determining other user data corresponding one-to-one to the plurality of first data, so that the plurality of first data that are consistent in the other user data are integrated to obtain target data.
  • obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from multiple different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields; wherein the specific data fields include the unique data identifier, the user's date of birth, the user name, and other user data fields, and the other user data fields include the ID number, birth certificate number, passport number and patient number.
  • determining user names that correspond one-to-one to a plurality of first data includes: determining the values of the unique data identifiers that correspond one-to-one to the plurality of first data; if the values of the multiple unique data identifiers are different, determining the multiple user names corresponding one-to-one to the plurality of first data.
  • performing similarity calculation on multiple user names to obtain a similarity value includes: determining the multiple words contained in the multiple user names; determining a high-dimensional vector corresponding to each user name based on the frequency with which the words occur in that user name; and, based on the high-dimensional vector corresponding to each user name, calculating the cosine distance between the multiple user names to determine the similarity value between the multiple user names.
  • determining other user data corresponding one-to-one to a plurality of the first data includes: if the similarity value is greater than or equal to the preset similarity value, determining the character lengths corresponding to the multiple user names; if the character lengths of the multiple user names are greater than or equal to the preset character length, determining the other user data corresponding one-to-one to the multiple first data.
  • the step of integrating multiple first data that are consistent with the other user data to obtain target data includes: if multiple other user data corresponding to other user data fields exist in the multiple first data, comparing the multiple other user data fields; if the other user data fields corresponding to the multiple first data are consistent, and the other user data corresponding to those fields are also consistent, integrating the plurality of first data to obtain the target data.
  • integrating a plurality of the first data to obtain target data includes: determining the character lengths of the user names that correspond one-to-one to the plurality of first data, and comparing the plurality of character lengths to obtain a character comparison result; establishing a mapping relationship between the first user name and the second user name according to the character comparison result, so as to integrate the plurality of first data to obtain the target data; wherein the character length of the first user name is greater than the character length of the second user name.
  • integrating multiple first data to obtain target data includes: if specific characters exist in the multiple first data, removing the specific characters to obtain a plurality of first data that do not include the specific characters; wherein the specific characters include all characters other than those of the American Standard Code for Information Interchange (ASCII); and integrating the plurality of first data that do not include the specific characters to obtain the target data.
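  • The character-removal step above can be sketched as follows. This is a minimal illustration only: the function name `strip_non_ascii` and the sample record values are hypothetical, and the source does not say how the removal is implemented.

```python
def strip_non_ascii(value: str) -> str:
    """Drop every character outside the 7-bit ASCII range."""
    return "".join(ch for ch in value if ord(ch) < 128)


# Hypothetical record; the values are made up for illustration.
record = {"user_name": "Jane Doé", "patient_id": "BN·0016"}

# Clean every field before the records are compared and integrated.
cleaned = {field: strip_non_ascii(text) for field, text in record.items()}
print(cleaned)
```

  • Removing characters rather than transliterating them keeps the sketch close to the wording of the step; whether transliteration would be preferable is not addressed in the source.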
  • the method further includes: storing the target data according to a specific data format.
  • a medical data integration device is provided.
  • the device includes: a determination module configured to obtain data to be integrated from multiple different systems, and to determine, from the multiple data to be integrated, multiple first data with the same user birth date; a similarity calculation module configured to determine the user names corresponding to the multiple first data, and to perform similarity calculation on the multiple user names to obtain a similarity value; and an integration module configured to, if the similarity value is greater than or equal to the preset similarity value, determine other user data that corresponds one-to-one to the plurality of first data, so as to integrate the plurality of first data that are consistent in the other user data to obtain target data.
  • an electronic device including: a processor and a memory; wherein computer-readable instructions are stored on the memory, and when the computer-readable instructions are executed by the processor, the medical data integration method in any of the above exemplary embodiments is implemented.
  • a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the medical data integration method in any of the above exemplary embodiments is implemented.
  • Figure 1 schematically shows a flow chart of the medical data integration method in an embodiment of the present disclosure.
  • Figure 2 schematically shows the data to be integrated in the medical data integration method in an embodiment of the present disclosure.
  • Figure 3 schematically shows a flow chart of determining user names corresponding one-to-one to multiple first data in the medical data integration method in an embodiment of the present disclosure.
  • Figure 4 schematically shows a flow chart of the similarity calculation of multiple user names in the medical data integration method in an embodiment of the present disclosure.
  • Figure 5 schematically shows a flow chart of determining other user data that corresponds to multiple first data in the medical data integration method in an embodiment of the present disclosure.
  • Figure 6 schematically shows a flow chart of integrating multiple first data that are consistent with other user data to obtain target data in the medical data integration method in an embodiment of the present disclosure.
  • Figure 7 schematically shows a flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure.
  • Figure 8 schematically shows a further flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure.
  • Figure 9 schematically shows a structural diagram of a medical data integration device in an embodiment of the present disclosure.
  • Figure 10 schematically shows an electronic device used for the medical data integration method in an embodiment of the present disclosure.
  • Figure 11 schematically shows a computer-readable storage medium used for the medical data integration method in an embodiment of the present disclosure.
  • Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. may be adopted. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the disclosure.
  • The medical data integration method at least includes the following steps:
  • Step S110: Obtain the data to be integrated from multiple different systems, and determine, from the multiple data to be integrated, a plurality of first data with the same user birth date.
  • Step S120: Determine the user names corresponding to the plurality of first data, and perform similarity calculation on the multiple user names to obtain a similarity value.
  • Step S130: If the similarity value is greater than or equal to the preset similarity value, determine other user data that corresponds one-to-one to the multiple first data, so as to integrate the multiple first data that are consistent in the other user data to obtain target data.
  • multiple first data that are consistent with other user data are integrated to obtain target data, and the first data is data with the same birth date of the user among the data to be integrated from multiple different systems, thereby realizing the integration of different systems.
  • step S110 data to be integrated from multiple different systems is obtained, and multiple first data with the same date of birth of the user are determined from the multiple data to be integrated.
  • the data to be integrated refers to data from different systems. Specifically, it can be data from the immigration data system, data from the birth database, or data from the hospital database, or it can come from any other system in which data to be integrated may exist, which is not particularly limited in this exemplary embodiment.
  • Each data to be integrated corresponds to a user. These users may be the same user or different users.
  • the user's birth date refers to the data corresponding to the user's birth date in the data to be integrated.
  • the first data refers to the data with the same user birth date among the data to be integrated.
  • the birth date of the user corresponding to the data A to be integrated is 2012-02-02
  • the birth date of the user corresponding to the data B to be integrated is 2000-10-19.
  • the birth date of the user corresponding to the data C to be integrated is 2012-02-02
  • the birth date of the user corresponding to the data D to be integrated is 2012-02-05
  • the birth date of the user corresponding to the data E to be integrated is 1998-06-23
  • the birth date of the user corresponding to the data F to be integrated is 1998-06-23.
  • the first data with the same birth date of the user is the data to be integrated A, the data to be integrated C, the data to be integrated E and the data to be integrated F
  • the birth dates of the users of the data A to be integrated and the data C to be integrated are the same
  • the birth dates of the users of the data E and the data F to be integrated are the same.
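  • The grouping in step S110 can be illustrated with the six example records above. This is a sketch under assumptions: the dictionary layout and the function name `group_first_data` are hypothetical, and only the labels and birth dates come from the example.

```python
from collections import defaultdict

# The six records from the example; only the label and birth date are modelled.
records = [
    {"label": "A", "birth_date": "2012-02-02"},
    {"label": "B", "birth_date": "2000-10-19"},
    {"label": "C", "birth_date": "2012-02-02"},
    {"label": "D", "birth_date": "2012-02-05"},
    {"label": "E", "birth_date": "1998-06-23"},
    {"label": "F", "birth_date": "1998-06-23"},
]


def group_first_data(rows):
    """Group records by birth date and keep only dates shared by 2+ records."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["birth_date"]].append(row["label"])
    return {date: labels for date, labels in groups.items() if len(labels) > 1}


first_data = group_first_data(records)
print(first_data)  # A and C share one date, E and F share another
```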
  • obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from multiple different systems, and extracting the data to be integrated corresponding to specific data fields from the data tables to be integrated; wherein the specific data fields include the unique data identifier, the user's date of birth, the user name and other user data fields, and the other user data fields include the ID number, birth certificate number, passport number and patient number.
  • the data to be integrated is stored in the data table to be integrated. It is worth mentioning that in addition to the data to be integrated, the data table to be integrated also stores other data that does not need to be integrated.
  • the data to be integrated needs to be extracted from the data table to be integrated, which corresponds to the specific data field.
  • the specific data field refers to the field corresponding to the data to be integrated.
  • the specific data field includes a unique data identifier, which corresponds to different systems.
  • the specific data field also includes the user's date of birth.
  • the data to be integrated corresponding to the specific data field of the user's date of birth can be 1988-07-23.
  • the specific data field also includes the user name.
  • the data to be integrated corresponding to the specific data field of the user name can be "Jane Doe".
  • the specific data field also includes other user data fields. Specifically, the other user data fields include the ID card number.
  • the data to be integrated corresponding to the ID card number field can be "610235XXXXXXX2771". Other user data fields also include the birth certificate number, and the data to be integrated corresponding to the birth certificate number field can be "2XX0410". Other user data fields also include the passport number.
  • the data to be integrated corresponding to the other user data field of the passport can be "C0XXXXX2".
  • Other user data fields also include the patient number.
  • the data to be integrated corresponding to the other user data field of the patient number can be "BNXXXX6".
  • Figure 2 schematically shows a schematic diagram of the data to be integrated.
  • field 210 is the line number field line
  • value 211 is the value corresponding to field 210, used to represent the row where the data to be integrated is located
  • the specific data field 220 is the unique data identifier unique_id
  • the identifier 221 is the field value corresponding to the specific data field 220
  • the specific data field 230 is the patient number patient_id
  • the number 231 is the patient number corresponding to the specific data field 230
  • the specific data field 240 is the user name user_name
  • the name 241 is the user name corresponding to the specific data field 240
  • the specific data field 250 is the user's birth date birth_date
  • the date 251 is the date corresponding to the specific data field 250
  • the field 260 is the gender field sex
  • data 261 corresponds to field 260
  • specific data field 270 is ID card number id_number
  • number 271 corresponds to specific data field 270
  • specific data field 280 is birth certificate number birth certificate number
  • number 281 corresponds to specific data field 280
  • the specific data field 290 is the passport number
  • the number 291 corresponds to the specific data field 290.
  • the data to be integrated is data corresponding to specific fields extracted from the data table to be integrated, which avoids obtaining other data in the data table that does not need to be integrated and improves the efficiency of subsequently integrating the data to be integrated.
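  • Extracting only the specific data fields from a table row might look like the sketch below. The field names unique_id, patient_id, user_name, birth_date, id_number and passport follow the text; `birth_cert_number`, the function name and the sample row values are assumptions, since the source does not give an identifier for the birth certificate field.

```python
# Specific data fields to keep; "birth_cert_number" is a hypothetical name.
SPECIFIC_FIELDS = (
    "unique_id",
    "patient_id",
    "user_name",
    "birth_date",
    "id_number",
    "birth_cert_number",
    "passport",
)


def extract_to_be_integrated(row: dict) -> dict:
    """Keep only the specific data fields; missing fields become None."""
    return {field: row.get(field) for field in SPECIFIC_FIELDS}


# Hypothetical table row that also carries columns not needed for integration.
row = {
    "line": 1,
    "unique_id": "XX1",
    "patient_id": "BNXXXX6",
    "user_name": "Jane Doe",
    "birth_date": "1988-07-23",
    "sex": "F",
    "passport": "C0XXXXX2",
}

data_to_integrate = extract_to_be_integrated(row)
print(data_to_integrate)
```

  • The line number and gender columns are dropped because they are not among the specific data fields listed in the text.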
  • user names corresponding one-to-one to the plurality of first data are determined, and similarity calculations are performed on the plurality of user names to obtain similarity values.
  • the user name refers to the user name corresponding to the first data. The first data are the data with the same user birth date among the multiple data to be integrated. Even when the birth dates are the same, the same user may be recorded under different user names in different systems; therefore, to determine whether multiple first data with the same birth date belong to the same user, similarity calculation is performed on the multiple user names to obtain the similarity value between them.
  • the first data includes data A to be integrated and data C to be integrated with the same date of birth of the user. Based on this, it is determined that the user name corresponding to the data A to be integrated is "George Herbert Walker Bush", and that the user name corresponding to the data C to be integrated is "George Walker Bush".
  • Figure 3 shows a schematic flow chart of determining user names corresponding to multiple first data in the medical data integration method. As shown in Figure 3, the method at least includes the following steps:
  • In step S310, the values of the unique data identifiers corresponding one-to-one to the multiple first data are determined. The value of the unique data identifier is related to the system from which the first data comes, and different values of the unique data identifier correspond to different systems. Determining the values of the identifiers ensures that first data from different systems can be integrated later.
  • the plurality of first data include data A to be integrated and data C to be integrated with the same date of birth of the user. Based on this, it is determined that the value of the unique data identifier corresponding to the data A to be integrated is XX1, and the value of the unique data identifier corresponding to the data C to be integrated is XX2.
  • step S320 if the values of the multiple unique data identifiers are different, multiple user names corresponding one-to-one to the multiple first data are determined.
  • the plurality of first data include data A to be integrated and data C to be integrated with the same date of birth of the user. The unique data identifier corresponding to the data A to be integrated is XX1, and the unique data identifier corresponding to the data C to be integrated is XX2. The unique data identifiers are not the same, so the user name "George Herbert Walker Bush" corresponding to the data A to be integrated and the user name "George Walker Bush" corresponding to the data C to be integrated are determined.
  • multiple user names respectively corresponding to the multiple first data are determined to ensure that the objects of similarity calculation are users with the same birth date from different systems.
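  • Steps S310 and S320 can be sketched as a filter that only pairs user names whose records come from different systems. The function name and record layout are hypothetical, and the identifier values are placeholders taken from the example.

```python
from itertools import combinations


def name_pairs_across_systems(first_data):
    """Return user-name pairs whose records carry different unique_id values."""
    pairs = []
    for rec_a, rec_b in combinations(first_data, 2):
        # Records from the same system are not candidates for comparison.
        if rec_a["unique_id"] != rec_b["unique_id"]:
            pairs.append((rec_a["user_name"], rec_b["user_name"]))
    return pairs


# First data A and C from the example (same birth date, different systems).
first_data = [
    {"unique_id": "XX1", "user_name": "George Herbert Walker Bush"},
    {"unique_id": "XX2", "user_name": "George Walker Bush"},
]

pairs = name_pairs_across_systems(first_data)
print(pairs)
```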
  • FIG. 4 shows a schematic flowchart of performing similarity calculation on multiple user names to obtain similarity values in the medical data integration method. As shown in FIG. 4, the method at least includes the following steps:
  • In step S410, the multiple words contained in the multiple user names are determined. Here, the multiple words are the words that make up each user name.
  • In step S420, the high-dimensional vector corresponding to each user name is determined based on the frequency with which the words appear in that user name.
  • the multiple user names are "George Herbert Walker Bush" and "George Walker Bush".
  • the frequency of the words in each user name refers to the number of times each word appears in the corresponding user name; for example, the words "George", "Herbert", "Walker" and "Bush" each appear once in the user name "George Herbert Walker Bush".
  • the dimension of the high-dimensional vector is related to the number of distinct words appearing in the user names corresponding to the multiple first data; for example, a total of four distinct words appear in the above two user names, namely "George", "Herbert", "Walker" and "Bush", so the high-dimensional vector corresponding to each user name can be a four-dimensional vector.
  • step S430 based on the high-dimensional vector corresponding to each user name, the cosine distance between multiple user names is calculated to determine the similarity value between the multiple user names.
  • here, the cosine distance refers to the cosine of the angle between the high-dimensional vectors, and this cosine value is used as the similarity value between the corresponding user names.
  • a calculation formula may be used to calculate the cosine distance between multiple high-dimensional vectors, or an algorithm may be used to calculate the cosine distance between multiple high-dimensional vectors, which is not specifically limited in this exemplary embodiment.
  • for example, the high-dimensional vectors are [1 1 1 1] and [1 0 1 1], and formula (1) can then be used to calculate the cosine distance between them, yielding the similarity value between the user names corresponding to the high-dimensional vectors.
  • In this exemplary embodiment, a method of calculating the similarity between multiple user names is provided, which helps to subsequently determine, based on the similarity value, whether multiple first data are the data of the same user, and thus ensures the accuracy of subsequent data integration.
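  • Steps S410 to S430 can be sketched with plain word counts and the standard cosine of the angle between two vectors, cos θ = (u · v) / (|u| |v|). This is an assumption about what formula (1) denotes, since the formula itself is not reproduced in the text; the function names are hypothetical, and no weighting beyond raw word counts is applied.

```python
import math
from collections import Counter


def name_vectors(name_a: str, name_b: str):
    """Build word-count vectors over the distinct words of both user names."""
    words_a, words_b = name_a.split(), name_b.split()
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    # dict.fromkeys keeps first-seen order while removing duplicate words.
    vocabulary = list(dict.fromkeys(words_a + words_b))
    return ([counts_a[w] for w in vocabulary],
            [counts_b[w] for w in vocabulary])


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


u, v = name_vectors("George Herbert Walker Bush", "George Walker Bush")
similarity = cosine_similarity(u, v)
print(u, v, round(similarity, 3))
```

  • For the two example vectors the dot product is 3 and the norms are 2 and √3, so this count-based sketch gives 3 / (2√3) ≈ 0.866. A different word weighting would give a different value, and the source does not specify one.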
  • step S130 if the similarity value is greater than or equal to the preset similarity value, other user data corresponding to the plurality of first data are determined to integrate the plurality of first data consistent with the other user data to obtain the target data.
  • the preset similarity value refers to a threshold that is compared with the similarity value to determine whether the multiple user names respectively corresponding to the multiple first data are similar. If the similarity value is greater than or equal to the preset similarity value, the user birth dates of the multiple first data are the same and their user names are sufficiently similar, and it is then necessary to determine the other user data corresponding to the first data, so that first data that is also consistent in other user data can be integrated to obtain the target data.
  • If the similarity value is less than the preset similarity value, the plurality of first data are considered not to be the data of the same user, and there is no need to integrate the plurality of first data.
  • the first data includes data A to be integrated and data C to be integrated, and the calculated similarity value of the user name of the data A to be integrated and the user name of the data C to be integrated is 0.92. Since the preset similarity value is 0.9, the similarity value is greater than the preset similarity value, and other user data A-1 corresponding to the data A to be integrated and other user data C-1 corresponding to the data C to be integrated are then determined.
  • Figure 5 shows a schematic flowchart of determining other user data that corresponds one-to-one to multiple first data in the medical data integration method. As shown in Figure 5, the method at least includes the following steps:
  • In step S510, if the similarity value is greater than or equal to the preset similarity value, the character lengths corresponding to the multiple user names are determined. For example, the character length corresponding to a user name can be 8.
  • The character lengths are determined because a similarity value greater than or equal to the preset similarity value cannot by itself guarantee that the multiple user names are consistent. For two user names with short character lengths, such as "Gone" and "Goie", judging by the similarity value alone may lead to the conclusion that the two user names are similar, which is obviously inconsistent with the facts. Checking the character lengths as well helps ensure that the multiple user names really belong to the same user.
  • For example, the similarity value is 0.92 and the preset similarity value is 0.9, so the similarity value is greater than the preset similarity value. Based on this, it is determined that the character length corresponding to the user name XX1 is 26 and the character length corresponding to the user name XX2 is 23.
  • In step S520, if the multiple character lengths are greater than or equal to the preset character length, other user data corresponding one-to-one to the multiple first data is determined. The preset character length is a character length threshold used to further judge whether the user names are consistent. If the multiple character lengths are greater than or equal to the preset character length, the user names are considered consistent, and the other user data included in the first data corresponding to the multiple user names is then determined, so that it can subsequently be judged whether the other user data is consistent and thus whether the first data can be integrated.
  • For example, the character length corresponding to the user name XX1 is 26, the character length corresponding to the user name XX2 is 23, and the preset character length is 20. Both character lengths are greater than 20, so it is determined that the other user data included in the first data corresponding to the user name XX1 is A-1, and that the other user data included in the first data corresponding to the user name XX2 is C-1.
  • In this exemplary embodiment, when the similarity value is greater than or equal to the preset similarity value and the multiple character lengths are greater than or equal to the preset character length, other user data corresponding to the plurality of first data is determined.
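  • The two-stage check of steps S510 and S520 can be sketched as below. The function name and parameter names are hypothetical; the default thresholds 0.9 and 20 are the values used in the example, not fixed values of the method.

```python
def passes_name_checks(similarity, name_lengths,
                       min_similarity=0.9, min_length=20):
    """Return True only if the similarity and every name length reach their thresholds."""
    if similarity < min_similarity:
        return False
    return all(length >= min_length for length in name_lengths)


# Example from the text: similarity 0.92, character lengths 26 and 23.
print(passes_name_checks(0.92, [26, 23]))

# Short names such as "Gone" and "Goie" are rejected by the length check.
print(passes_name_checks(0.92, [len("Gone"), len("Goie")]))
```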
  • Figure 6 shows a schematic flow chart of integrating multiple first data that are consistent with other user data to obtain target data in the medical data integration method. As shown in Figure 6, the method at least includes the following steps:
  • In step S610, if multiple other user data corresponding to other user data fields exist in the multiple first data, the multiple other user data fields are compared. Other user data fields correspond to other user data; for example, the other user data corresponding to the other user data field id_number is the ID card number.
  • The other user data includes the ID card number, birth certificate number, passport number and patient number. Accordingly, in addition to the other user data field corresponding to the ID card number, there are other user data fields corresponding to the birth certificate number, the passport number and the patient number. The ID card number, birth certificate number, passport number and patient number need to be compared to ensure that the multiple first data are data that need to be integrated.
  • The first data may include all four other user data corresponding to the ID number, birth certificate number, passport number and patient number respectively, or only any one, any two or any three of them. In any of these cases, it is first determined whether other user data corresponding to other user data fields exists in the first data; if so, it is then determined whether the multiple other user data fields are consistent.
  • For example, the first data are data A to be integrated and data C to be integrated. In the data A to be integrated there is an ID number 6213004 corresponding to the other user data field id_number, and in the data C to be integrated there is also an ID number 6213004 corresponding to the other user data field id_number. Other user data corresponding to other user data fields therefore exists in both first data, and the other user data fields are both id_number.
  • As another example, the first data are the data E to be integrated and the data F to be integrated. Both contain the ID number 12456 corresponding to the other user data field id_number, and the other user data field passport holds the passport number 0023 in one of them and 2256 in the other. Other user data corresponding to other user data fields therefore exists in both first data, and the other user data fields are consistent.
  • In step S620, if the other user data fields corresponding one-to-one to the plurality of first data are consistent, and the other user data corresponding to those fields are also consistent, the plurality of first data are integrated to obtain the target data.
  • That is, the multiple first data are integrated only when both the other user data fields and the other user data under those fields are consistent.
  • For example, the first data are data A to be integrated and data C to be integrated.
  • In both, the ID number corresponding to the other user data field id_number is 6213004.
  • The other user data fields are both id_number, and the other user data corresponding to the field id_number are both 6213004, so data A to be integrated and data C to be integrated are integrated to obtain the target data.
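  • As another example, the first data are data E to be integrated and data F to be integrated.
  • In data E to be integrated, there is an ID number 12456 corresponding to the other user data field id_number.
  • In data F to be integrated, there is also an ID number 12456 corresponding to the other user data field id_number.
  • In both, the ID number corresponding to the other user data field id_number is 12456.
  • However, the passport number corresponding to the other user data field passport is 0023 in data E and 2256 in data F. The other user data fields are consistent, but the passport numbers are not, so data E to be integrated and data F to be integrated are not integrated.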
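The check described in steps S610 and S620 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the field names `birth_certificate` and `patient_id`, the dictionary layout, and the rule that both records must populate exactly the same set of fields are assumptions made for the example.

```python
from typing import Dict, Optional, Set

# Assumed field names: id_number and passport appear in the text,
# birth_certificate and patient_id are hypothetical labels.
OTHER_FIELDS = ("id_number", "birth_certificate", "passport", "patient_id")


def present_fields(record: Dict[str, Optional[str]]) -> Set[str]:
    """Return the other-user-data fields that actually hold a value."""
    return {field for field in OTHER_FIELDS if record.get(field)}


def can_integrate(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> bool:
    """S610: both records must populate the same fields.
    S620: every such field must hold the same value in both records."""
    fields_a, fields_b = present_fields(a), present_fields(b)
    if not fields_a or fields_a != fields_b:
        return False
    return all(a[field] == b[field] for field in fields_a)


# The two examples from the text above.
record_a = {"id_number": "6213004"}
record_c = {"id_number": "6213004"}
record_e = {"id_number": "12456", "passport": "0023"}
record_f = {"id_number": "12456", "passport": "2256"}
```

Under these assumptions, records A and C pass the check, while E and F fail it because their passport numbers differ.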
  • FIG. 7 shows a schematic flowchart of integrating multiple first data to obtain target data in the medical data integration method. As shown in FIG. 7, the method includes at least the following steps: In step S710, the character lengths of the multiple user names corresponding one-to-one to the multiple first data are determined, and the character lengths are compared to obtain a character comparison result.
  • The character length of a user name is the number of characters that make up the user name.
  • The character comparison result is the result obtained by comparing the multiple character lengths.
  • The multiple character lengths are the character lengths of the user names respectively corresponding to the multiple first data.
  • For example, the first data include data A to be integrated and data C to be integrated.
  • The user name corresponding to data A to be integrated is "Jone".
  • The user name corresponding to data C to be integrated is "Jone Doe".
  • In step S720, a mapping relationship between the first user name and the second user name is established based on the character comparison result, so as to integrate the multiple first data and obtain the target data; the character length of the first user name is greater than that of the second user name.
  • It should be noted that the character length of the first user name must be greater than the character length of the second user name.
  • For example, the first data include data A to be integrated and data C to be integrated.
  • The user name corresponding to data A to be integrated is "Jone".
  • The user name corresponding to data C to be integrated is "Jone Doe".
  • The character length of the user name "Jone Doe" is greater than the character length of the user name "Jone".
  • The first user name is therefore "Jone Doe".
  • The second user name is "Jone".
  • The mapping relationship between the first user name and the second user name can be established as a key-value pair, for example "Jone Doe": "Jone", which indicates that the user corresponding to the first user name "Jone Doe" and the user corresponding to the second user name "Jone" are the same user.
  • It is worth noting that, in subsequent data verification, the target data is queried by key. Because the key is the user name with the longer character length, querying by key helps ensure the accuracy of the retrieved target data and thus accurate verification of the data.
  • In this exemplary embodiment, a mapping relationship between the first user name and the second user name is established based on the character comparison result. This ensures that, when the data is subsequently verified, the corresponding target data can be found using the first user name, which has the longer character length.
  • This avoids the target data query errors, and the resulting verification failures, that could arise from looking up the target data with the shorter second user name.
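Steps S710 and S720 can be sketched as a small helper that compares two character lengths and returns the key-value mapping. The function name `map_user_names` and the choice to raise an error when the lengths are equal are assumptions for illustration; the text only states that the first user name must be the longer one.

```python
from typing import Dict


def map_user_names(name_1: str, name_2: str) -> Dict[str, str]:
    """Return {first_user_name: second_user_name}, where the first user name
    is the one with the greater character length (steps S710 and S720)."""
    if len(name_1) == len(name_2):
        # Assumption: the text does not cover equal lengths, so refuse to guess.
        raise ValueError("user names have equal character length")
    if len(name_1) > len(name_2):
        return {name_1: name_2}
    return {name_2: name_1}


# The example from the text: the longer name becomes the key.
mapping = map_user_names("Jone", "Jone Doe")
```

Keeping the longer name as the key matches the stated rationale that later look-ups use the more specific user name.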
  • Figure 8 shows a schematic flowchart of integrating multiple first data to obtain target data in the medical data integration method.
  • The method includes at least the following steps: In step S810, if specific characters are present in the plurality of first data, the specific characters are removed to obtain a plurality of first data that do not include the specific characters; the specific characters include all characters other than those of the American Standard Code for Information Interchange.
  • The American Standard Code for Information Interchange refers to the ASCII code.
  • The specific characters are all characters other than ASCII characters; that is, the specific characters are the non-ASCII codes.
  • The specific characters need to be removed because the non-ASCII codes in the plurality of first data cannot be processed.
  • The non-ASCII codes in the plurality of first data are therefore removed, yielding a plurality of first data that contain no non-ASCII codes and can be processed.
  • For example, the first data are data A to be integrated and data C to be integrated.
  • Data A to be integrated contains the non-ASCII code XXX, that is, specific characters are present in the first data; XXX is therefore removed from data A to be integrated to obtain data A to be integrated that does not include specific characters.
  • In step S820, the plurality of first data that do not include the specific characters are integrated to obtain the target data. That is, after the non-ASCII codes are removed, the plurality of first data without non-ASCII codes are integrated to obtain the target data.
  • For example, the first data are data A to be integrated and data C to be integrated.
  • Data A to be integrated includes the specific character XXX, that is, a non-ASCII code is present in the first data. XXX is removed from data A to be integrated to obtain data A to be integrated without non-ASCII codes, and data C to be integrated is then integrated with that cleaned data A to obtain the target data.
  • In this exemplary embodiment, the non-ASCII codes present in the first data are removed, which avoids a later failure to process them and improves the accuracy and efficiency of the target data obtained by integrating the multiple first data.
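The removal in step S810 can be sketched with a simple code-point filter. The helper names `strip_non_ascii` and `clean_record`, and the idea that a record is a dictionary of field values, are assumptions for this example.

```python
from typing import Any, Dict


def strip_non_ascii(value: str) -> str:
    """Drop every character whose code point lies outside the ASCII range 0-127."""
    return "".join(ch for ch in value if ord(ch) < 128)


def clean_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply strip_non_ascii to every string field; leave other values untouched."""
    return {
        key: strip_non_ascii(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


# "\u00e9" is a non-ASCII character (e with acute accent) used as the "specific character".
cleaned = clean_record({"user_name": "Jone Do\u00e9", "line": 3})
```

Note that this sketch deletes the offending characters outright, as the text describes; it does not attempt transliteration.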
  • the method further includes: storing the target data according to a specific data format.
  • the target data can also be stored according to a specific data format.
  • The specific data format can be the JSON (JavaScript Object Notation) format.
  • The specific data format can also be a table format.
  • The specific data format can also be a format corresponding to a particular database, or any other data format; this exemplary embodiment places no special limitation on it.
  • For example, after the multiple first data are integrated, the obtained target data is stored in JSON format.
  • target data is stored in a specific data format to facilitate subsequent identification of the target data, thereby improving the efficiency of verification when data verification is required for the user.
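Storing the target data as JSON can be sketched with Python's standard `json` module. The layout of the `target` dictionary, including the `aliases` key and the sample birth date, is hypothetical and only illustrates one way the integrated record might look.

```python
import json
from typing import Any, Dict


def serialize_target(target: Dict[str, Any]) -> str:
    """Encode the target data as a JSON string with stable key order."""
    return json.dumps(target, sort_keys=True, ensure_ascii=True)


# Hypothetical integrated record built from the examples in the text.
target = {
    "user_name": "Jone Doe",
    "aliases": ["Jone"],
    "birth_date": "2012-02-02",
    "id_number": "6213004",
}
encoded = serialize_target(target)
```

Sorting the keys keeps the stored text deterministic, which can make later comparison of stored records easier.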
  • In the method and apparatus provided by the exemplary embodiments of the present disclosure, target data is obtained by integrating multiple first data whose other user data are consistent, where the first data are the data, among the data to be integrated from multiple different systems, that share the same user birth date. This realizes the integration of data to be integrated across different systems. On the one hand, it avoids the verification failures that occur in the prior art when the data to be integrated are verified, improving verification efficiency; on the other hand, it avoids the situation in the prior art in which not all data to be integrated can be integrated, thereby improving the accuracy and efficiency of integration.
  • the medical data integration method in the embodiment of the present disclosure will be described in detail below in conjunction with an application scenario.
  • Data A to be integrated is obtained from system 1, data B from system 2, data C from system 3, and data D from system 4. The user birth date corresponding to data A is the same as the user birth date corresponding to data B; therefore, the plurality of first data are determined to be data A to be integrated and data B to be integrated.
  • The user name corresponding to data A to be integrated is determined to be "Jone Doe".
  • The user name corresponding to data B to be integrated is "Jone".
  • The similarity calculation on these two user names yields a similarity value of 1.2, which is greater than the preset similarity value of 0.9.
  • Other user data A-1 corresponding to data A to be integrated and other user data B-1 corresponding to data B to be integrated are therefore determined. Since other user data A-1 is consistent with other user data B-1, data A to be integrated and data B to be integrated are integrated to obtain the target data.
  • a medical data integration device is also provided.
  • Figure 9 shows a schematic structural diagram of a medical data integration device.
  • The medical data integration device 900 may include a determination module 910, a similarity calculation module 920, and an integration module 930.
  • The determination module 910 is configured to obtain data to be integrated from multiple different systems and to determine, among the multiple data to be integrated, multiple first data with the same user birth date.
  • The similarity calculation module 920 is configured to determine the user names corresponding one-to-one to the multiple first data and to perform a similarity calculation on the multiple user names to obtain a similarity value.
  • The integration module 930 is configured to, if the similarity value is greater than or equal to the preset similarity value, determine the multiple other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent and obtain the target data.
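The division of work among the three modules can be sketched as three plain functions. This is an illustration under assumptions, not the claimed apparatus: the record layout, the function names, and the word-count vectors used for the cosine calculation are choices made for the example (the description elsewhere builds vectors from word frequencies in each user name and takes the cosine of the angle between them).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def group_by_birth_date(records: List[dict]) -> Dict[str, List[dict]]:
    """Determination module: keep only groups of records sharing a birth date."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record["birth_date"]].append(record)
    return {date: rs for date, rs in groups.items() if len(rs) > 1}


def name_similarity(name_1: str, name_2: str) -> float:
    """Similarity module: cosine of the angle between word-count vectors."""
    counts_1, counts_2 = Counter(name_1.split()), Counter(name_2.split())
    vocabulary = sorted(set(counts_1) | set(counts_2))
    v1 = [counts_1[word] for word in vocabulary]
    v2 = [counts_2[word] for word in vocabulary]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


def should_compare_other_data(similarity: float, preset: float = 0.9) -> bool:
    """Integration module gate: proceed only at or above the preset value."""
    return similarity >= preset


# Hypothetical records; unique_id stands in for the source system identifier.
records = [
    {"unique_id": "A", "birth_date": "2012-02-02", "user_name": "George Herbert Walker Bush"},
    {"unique_id": "B", "birth_date": "2000-10-19", "user_name": "Jane Doe"},
    {"unique_id": "C", "birth_date": "2012-02-02", "user_name": "George Walker Bush"},
]
first_data = group_by_birth_date(records)
```

Only when `should_compare_other_data` returns true would the integration step go on to compare the other user data of the grouped records.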
  • The electronic device 1000 shown in FIG. 10 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
  • electronic device 1000 is embodied in the form of a general computing device.
  • The components of the electronic device 1000 may include, but are not limited to: the above-mentioned at least one processing unit 1010, the above-mentioned at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.
  • The storage unit stores program code that can be executed by the processing unit 1010, so that the processing unit 1010 performs the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.
  • The storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 1021 and/or a cache storage unit 1022, and may further include a read-only storage unit (ROM) 1023.
  • The storage unit 1020 may also include a program/utility 1024 having a set of (at least one) program modules 1025.
  • Such program modules 1025 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., a router, a modem) that enables the electronic device 1000 to communicate with one or more other computing devices. This communication may occur through the input/output (I/O) interface 1050.
  • The electronic device 1000 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 via the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the technical solution according to the embodiment of the present disclosure can be embodied in the form of a software product.
  • The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a portable hard disk, etc.) or on a network, and includes several instructions that cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiment of the present disclosure.
  • a computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored.
  • various aspects of the present invention can also be implemented in the form of a program product, which includes program code.
  • A program product 1100 for implementing the above method according to an embodiment of the present invention is described. It can take the form of a portable compact disk read-only memory (CD-ROM) that includes program code, and it can run on a terminal device such as a personal computer.
  • the program product of the present invention is not limited thereto.
  • a readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus or device.
  • the program product may take the form of any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a readable signal medium may also be any readable medium other than a readable storage medium that may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
  • Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • The remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., over the Internet via an Internet service provider).

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present disclosure relates to the field of data processing, and to a medical data consolidation method and apparatus, a storage medium, and an electronic device. The method comprises: obtaining data to be consolidated from a plurality of different systems, and determining, among the plurality of pieces of data to be consolidated, a plurality of pieces of first data of users having the same date of birth; determining a plurality of user names in one-to-one correspondence to the plurality of pieces of first data, and performing similarity calculation on the plurality of user names to obtain a similarity value; and if the similarity value is greater than or equal to a preset similarity value, determining a plurality of pieces of other user data in one-to-one correspondence to the plurality of pieces of first data, so as to consolidate the plurality of pieces of first data consistent with the plurality of pieces of other user data to obtain target data. In the present disclosure, the first data is the data of the users having the same date of birth among the data to be consolidated from the plurality of different systems, so that the consolidation of the data to be consolidated in the different systems is realized, and the situation in the prior art that data to be consolidated needs to be manually consolidated is avoided, thereby improving the consolidation accuracy and efficiency.

Description

医疗数据整合方法及装置 、 计算机存储介质、 电子设备 技术领域 本公开涉及数据处理领域 , 尤其涉及一种医疗数据整合方法与医疗数据整合置、 计算 机可读存储介质及电子设备。 背景技术 随着计算机技术 的发展, 一个用户可能在多个不同的系统中存在多条记录, 进而当需 要对用户的数据进行验证时, 需要对用户在多个不同系统中存在的多条记录进行验证。 在相关技术 中, 当接收到针对于用户的数据验证请求时, 会将该验证需求转交给数据 组相关人员进行处理, 针对于该验证需求, 数据组相关人员会进行人工分析, 并手动对不 同系统中存在的多条记录进行合并, 以得到合并后的用户数据, 显然, 在该种情况下, 随 着后续不同系统中记录的增加, 需要再次对用户数据进行合并, 这降低了用户数据合并的 准确度以及效率, 进而导致用户数据验证失败。 鉴于此 , 本领域亟需开发一种新的医疗数据整合方法及装置。 需要说 明的是, 在上述背景技术部分公开的信息仅用于加强对本公开的背景的理解, 因此可以包括不构成对本领域普通技术人员已知的现有技术的信息。 发 明内容 本公开的 目的在于提供一种 医疗数据整合方 法、 医疗数据整合装置、 计算机可 读存储介 质及电子设备 , 进而至少在一定程度上克服 由于相关技术导 致的用户数据 合并准确 度以及效率低的 问题。 本公开的其他 特性和优点将通 过下面的详细 描述变得显然 , 或部分地通过本公 开的实践 而习得。 根据本发明实 施例的第一个方面 ,提供一种医疗数据整合方 法,所述方法包括: 获取来 自于多个不同系 统的待整合数据 , 并在多个所述待整合数据 中确定出用户出 生 日期相同的多个第一 数据; 确定与多个所述第一数据 一一对应的多 个用户名, 并 对多个所 述用户名进行 相似度计算得到 相似度值; 若所述相似度值大 于或等于预设 相似度值 , 确定与多个所述第一数据一 一对应的其他用 户数据, 以对所述其他用户 数据一致 的多个所述第一 数据进行整合得 到目标数据。 在本发明的一 种示例性实施例 中,所述获取来自于多个 不同系统的待整合 数据, 包括: 获取来自于多个不同系统的待整 合数据表, 并从所述待整合数 据表中提取与 特定数据 字段对应的待整 合数据; 其中, 所述特定数据字段包括 唯一数据标识、 所 述用户 出生日期、 所述用户名以及其他 用户数据字段 , 所述其他用户数据字段包括 身份证编 号、 出生证编号、 护照号以及病人编号。 在本发明的一 种示例性实施例 中, 所述确定与多个所述第一 数据一一对应 的用 户名, 包括: 确定与多个所述第一数据 一一对应的所述 唯一数据标识 的值; 若多个 所述唯 一数据标识的值 不同, 则确定与多个所述第一数 据一一分别对 应的多个用户 名。 在本发明的一 种示例性实施例 中, 所述对多个所述用户名进 行相似度计算 得到 相似度值 , 包括: 确定多个所述用户名中包含的多个词 汇; 根据所述多个词汇在每 一个所述 用户名中出现 的频率, 确定与每一所述用户名 对应的高维 向量; 根据与每 一所述 用户名对应的所 述高维向量, 计算多个所述用户 名之间的余弦 距离, 以确定 多个所述 用户名之间的相 似度值。 在本发明的一 种示例性实施例 中, 所述若所述相似度值大 于或等于预设相 似度 值, 确定与多个所述第 一数据一一对应 的其他用户数据 , 包括: 若所述相似度值大 于或等 于预设相似度值 , 确定与多个所述用户名一一对 应的字符长度 ; 若多个所述 字符长度 大于或等于预 设字符长度, 确定与多个所述第 一数据一一对 应的多个其他 用户数据 。 在本发明的一 种示例性实施例 中, 所述对所述其他用户数据 一致的多个所 述第 一数据进 行整合得到 目标数据, 包括: 若在多个所述第一数据中存在 与所述其他用 户数据 字段对应的多个 所述其他用户数 据, 则对多个所述其他用户数 据字段进行判 断; 若与多个所述第一 数据一一对应 的所述其他用户数 据字段一致 , 且与所述其他 用户数据 字段对应的所 述其他用户数据 一致, 则对多个所述第一数据 进行整合得到 目标数据 。 在本发明的一 种示例性实施例 中, 所述对多个所述第一数据 进行整合得到 目标 数据, 包括: 确定与多个所述第一数据 一一对应的用户 名的字符长度 , 并对多个所 述字符 长度进行比较得 到字符比较结果 ; 根据所述字符比较结果, 建立第一用户名 与第二 用户名之间的映 射关系, 以对多个所述第一数据 进行整合得到 目标数据; 其 中, 所述第一用户名的所 述字符长度大于 所述第二用户名 的所述字符长度 。 在本发明的一 种示例性实施例 中, 所述对多个所述第一数据 进行整合得到 
目标 数据, 包括: 若在多个所述第一数据 中存在特定字符 , 则将所述特定字符移除, 以 得到不包 括所述特定字 符的多个所述第 一数据; 其中, 所述特定字符中包括除了美 国信息 交换代码之外 的所有字符; 对不包括所述特定数 据的多个所述 第一数据进行 整合得到 目标数据。 在本发明的一 种示例性实施例 中, 所述对所述其他用户数据 一致的多个所 述第 一数据进 行整合得到 目标数据之后, 所述方法还包括 : 按照特定数据格式存储所述 目标数据 。 根据本发明实 施例的第二个方面 ,提供一种医疗数据整合装 置,所述装置包括: 确定模块 , 被配置为获取来自于多个不 同系统的待整合 数据, 并在多个所述待整合 数据 中确定出用户出生 日期相同的多个 第一数据; 相似度计算模块, 被配置为确定 与多个所 述第一数据一 一对应的用户名 , 并对多个所述用户名进行相 似度计算得到 相似度值 ; 整合模块, 被配置为若所述相似度值大于 或等于预设相似 度值, 确定与 多个所述 第一数据一一 对应的其他用户 数据, 以对所述其他用户数据 一致的多个所 述第一数 据进行整合得到 目标数据。 根据本发明实 施例的第三个方面 ,提供一种电子设备 ,包括:处理器和存储器; 其中, 存储器上存储有 计算机可读指令 , 所述计算机可读指令被所述 处理器执行时 实现上述 任意示例性实施 例的医疗数据整 合方法。 根据本发明实 施例的第四个方 面, 提供一种计算机可读存储 介质, 其上存储有 计算机程 序, 所述计算机程序被处理器 执行时实现上述 任意示例性实 施例中的医疗 数据整合 方法。 应当理解的是 , 以上的一般描述和后文的细 节描述仅是示例 性和解释性 的, 并 不能限制 本公开。 附图说明 此处的附图被 并入说明书 中并构成本说明书 的一部分, 示出了符合本公开 的实 施例, 并与说明书一起用于解释本公开 的原理。 显而易见地, 下面描述中的附图仅 仅是本 公开的一些实施 例, 对于本领域普通技术人员 来讲, 在不付出创造性劳动的 前提下 , 还可以根据这些附图获得其他 的附图。 图 1示意性示出本 公开实施例中医疗 数据整合方法 的流程示意图; 图 2示意性示出本 公开实施例中医疗 数据整合方法 中待整合数据的示 意图; 图 3示意性示 出本公开实施例 中医疗数据整合 方法中确定与多个 第一数据一一 对应的 多个用户名的流程 示意图; 图 4示意性示 出本公开实施例 中医疗数据整合 方法中对多个用 户名进行相似度 计算得到 相似度值的流程 示意图; 图 5示意性示出本公开实施例 中医疗数据整合 方法中确定与多个 第一数据一一 对应的其 他用户数据的流 程示意图; 图 6示意性示 出本公开实施例 中医疗数据整合 方法中对其他用 户数据一致的多 个第一数 据进行整合得到 目标数据的流程 示意图; 图 7示意性示 出本公开实施例 中医疗数据整合 方法中对多个第 一数据进行整合 得到 目标数据的流程示 意图; 图 8示意性示出本公开实施例 中医疗数据整合 方法中对多个第 一数据进行整合 得到 目标数据的流程示 意图; 图 9示意性示出本 公开实施例中一种 医疗数据整合装 置的结构示意 图; 图 10示意性示出本公 开实施例中一种 用于医疗数据整合 方法的电子设 备; 图 11 示意性示出本公开实施例中一种 用于医疗数据 整合方法的计 算机可读存 储介质 。 具体 实施方式 现在将参考 附图更全面地描 述示例实施方式 。 然而, 示例实施方式能够以多种 形式实 施, 且不应被理解为限于在此 阐述的范例; 相反, 提供这些实施方式使得本 公开将 更加全面和完整 ,并将示例实施方式的构 思全面地传达给本 领域的技术人员 。 所描述 的特征、 结构或特性可以以任 何合适的方式结合 在一个或更多 实施方式中。 在下面 的描述中, 提供许多具体细节 从而给出对本公 开的实施方式 的充分理解。 然 而, 本领域技术人员将 意识到, 可以实践本公开的技 术方案而省略所 述特定细节 中 的一个 或更多, 或者可以采用其它的方法 、 组元、 装置、 步骤等。 在其它情况下, 不 详细示 出或描述公知技 术方案以避免喧 宾夺主而使得本公 开的各方面变得 模糊。 本说明书中使 用用语 “一个”、 “一”、 “该 ”和 “所述 ”用以表示存在一个或多个要素 /组成部分 /等; 用语 “包括 ”和 “具有 ”用以表示开放式的包括在内的意思并且是指除了 列出 的要素 /组成部分 /等之外还可存在另外的要素 /组成部分 /等;用语 “第一 ”和 “第二 等仅作 为标记使用, 不是对其对象的数量 限制。 此外, 附图仅为本公开的示 意性图解, 并非一定是按比例 绘制。 
图中相同的附 图标记 表示相同或类似 的部分, 因而将省略对它们 的重复描述。 附图中所示的一些 方框图是 功能实体, 不一定必须与物理或 逻辑上独立的实 体相对应。 针对相关技术 中存在的问题 , 本公开提出了一种医疗数据整合 方法。 图 1示出 了医疗 数据整合方法的 流程示意图, 如图 1所示, 医疗数据整合方法至 少包括以下 步骤 : 步骤 S110. 获取来自于多个不同系统的待整合数据, 并在多个待整合数 据中确 定出用 户出生日期相 同的多个第一数据 。 步骤 S120. 确定与多个第一数据一一对应的用户名, 并对多个用户名进行 相似 度计算 得到相似度值 。 步骤 S130. 若相似度值大于或等于预设相似度值, 确定与多个第一数据一 一对 应的其 他用户数据, 以对其他用户数据一致的多个 第一数据进行整 合得到目标数据 。 在本公开的 示例性实施例提 供的方法及装置 中, 对其他用户数据一致的 多个第 一数据 进行整合得到 目标数据, 并且, 第一数据是来自于多个不 同系统的待整合数 据中用 户出生日期相 同的数据, 进而实现了对不同系 统中待整合数据 的整合, 一方 面, 避免了现有技术 中对待整合数据 进行验证时出现 验证失败的情 况发生, 提高了 验证 的效率; 另一方面, 避免了现有技术中无法对所 有待整合数据进 行整合的情况 发生 , 进而提高了待整合数据的整合准确 度以及效率。 下面对医疗数 据整合方法的各 个步骤进行详细 说明。 在步骤 S110中, 获取来自于多个不同系统的待整合数据, 并在多个待整合数据 中确定 出用户出生日期相 同的多个第一数 据。 在本公开的 示例性实施例 中, 待整合数据指的是来自于不 同系统的数据 , 具体 地, 可以是来自于移 民数据系统中的 数据, 可以是来自于出生数据 库中的数据, 也 可以是 来自于医院数据 库中的数据, 还可以是来自于待整合数据可 能存在的任何一 个系统 中的数据, 本示例性实施例对此不 作特殊限定。 每一个待整 合数据对应于一 个用户, 这些用户可能是相 同的用户, 也可能是不 同的用 户, 用户出生日期指的是待整 合数据中与用户 的出生日期对应 的数据, 基于 此, 第一数据指的是待整 合数据中用户 出生日期相同的数 据。 举例而言, 存在 6个待整合数据 , 这 6个待整合数据来自于不同 的系统, 具体 地, 6个待整合数据包 括待整合数据 A、 待整合数据 B、 待整合数据 C、 待这个数据 D 、 待整合数据 E以及待整合数据 F, 其中, 与待整合数据 A对应的用户 出生日期 为 2012-02-02, 与待整合数据 B对应的用户出生日期为 2000-10-19, 与待整合数据 C 对应的用户 出生日期为 2012-02-02,与待整合数据 D对应的用户出生日期为 2012- 02-05 , 与待整合数据 E对应的用户出生日期为 1998-06-23 , 与待整合数据 F对应的 用户 出生日期为 1998-06-23 o 基于此,确定出用户 出生日期相同的第 一数据为待整合数 据 A、待整合数据 C、 待整合 数据 E以及待整合数 据 F, 其中, 待整合数据 A与待整合数据 C的用户出生 日期相 同, 待整合数据 E与待整合数据 F的用户出生日期相 同。 在可选的实 施例中, 获取来自于多个不同系 统的待整合数 据, 包括: 获取来自 于多个 不同系统的待整 合数据表, 并从待整合数据表 中提取与特定数 据字段对应 的 待整合 数据; 其中, 特定数据字段包括唯一数据标识 、 用户出生日期、 用户名以及 其他用 户数据字段, 其他用户数据字 段包括身份证编 号、 出生证编号、 护照号以及 病人编 号。 其中, 通常在不同的系统 中, 待整合数据被存储在存在待 整合数据表中 , 值得 说明 的是, 在待整合数据表中除了存 储着待整合数据 之外, 也存储着其他不需要被 整合 的数据, 进而, 需要从待整合数据表中提取出与 特定数据字段对 应的待整合数 据, 特定的数据字段指 的就是与待整合数 据对应的字段 。 特定数据字 段包括唯一数据 标识, 该唯一数据标识与不 同的系统对应, 特定数 据字段 中还包括用户 出生日期, 与用户出生日期这一 特定数据字段对 应的待整合数 据可 以为 1988-07-23 , 特定数据字段中还包括用户名, 与用户名这一特定数据字段 对应 的待整合数据可 以 “ Jane Doe” , 特定数据字段中还包括其他用户数据字段, 具体地 , 其他用户数据字段包括身份 证编号, 与身份证编号这一其他 用户数据字段 对应 的待整合数据可 以为 “ 610235XXXXXXXX2771 ”, 
其他用户数据字段还包括出 生证编 号,与出生证编号这一 其他数据字段对应 的待整合数据可 以为“ 2XX0410 ”, 其他用户 数据字段还包 括护照号, 与护照这一其他用户 数据字段对应 的待整合数据 可以为 “ C0XXXXX2 ” , 其他用户数据字段还包括病人编号, 与病人编号这一其他 数据字段 对应的待整合数 据可以为 “ BNXXXX6 ” o 举例而言, 图 2示意性示出了待整合数据的示意 图, 如图 2所示, 字段 210为 行号字段 line, 值 211为与字段 210对应的值, 用于表示待整合数据所处于的行, 特定数据 字段 220为唯一数据标识 unique_id, 标识 221为与特定数据字段 220对应 的字段值 ,特定数据字段 230为病人编号 patient_id,编号 231为与特定数据字段 230 对应的病 人编号, 特定数据字段 240为用户名 user_name, 名称 241为与特定数据字 段 240对应的用户名, 特定数据字段 250为用户出生日期 birth_date, 日期 251为与 特定数据 字段 250对应的日期, 字段 260为性别 sex, 数据 261与字段 260对应, 特 定数据字 段 270为身份证编号 id_number, 编号 271与特定数据字段 270对应, 特定 数据字段 280为出生证编号出生证 编号, 编号 281与特定数据字段 280对应, 特定 数据字段 290为护照号 passport, 编号 291与特定数据字段 290对应。 在本示例性实 施例中, 待整合数据是从待整 合数据表中提 取出的与特定字 段对 应的数据 , 避免了获取到待整合数据表 中的其他无需整 合的数据, 提升了后续待整 合数据 的整合准确度以及 效率。 在步骤 S120中, 确定与多个第一数据一一对应的用户名, 并对多个用户名进行 相似度计 算得到相似度值 。 在本公开的示 例性实施例 中, 用户名指的是与第一数据对应 的用户名称 , 值得 说明的是 , 由于第一数据是在多个待整合数据中确 定出的用户出生 日期相同的数据 , 尽管用户 出生日期相 同, 但是由于同一个用户在不 同的系统中可能被 记录为不同的 用户名 , 为了确定用户出生日期相同 的多个第一数据是 否为同一个用 户的数据, 需 要对多个 用户名进行相似 度计算以得到多 个用户名之间的 相似度值。 举例而言,第一数据包括 用户出生日期相 同的待整合数据 A以及待整合数据 C, 基于此 , 确定出与待整合数据 A对应的用 户名为 “ George Herbert Walker Bush” , 确定 出与待整合数 据 C 对应的用户名为 “ George Walker Bush ” , 进而对用户名 “ George Herbert Walker Bush” 以及用户名 “ George Walker Bush”进行相似度计算 得到相似 度值 0.85 o 在可选的实施 例中, 图 3示出了医疗数据整合 方法中确定与多个 第一数据一一 对应的用 户名的流程示 意图, 如图 3所示, 该方法至少包括以下步骤: 在步骤 S310 中, 确定与多个第一数据 一一对应的唯一 数据标识的值。 其中, 唯一数据标识的值与第 一数据所来 自于的系统相关 , 不同的唯一数据标 识的值对 应于不同的系 统, 因此, 需要先确定出与多个第一数据分别 对应的多个唯 一数据标 识的值, 进而确保后续可以对来 自于不同系统的第 一数据进行整合 。 举例而言, 多个第一数据包括 用户出生日期相 同的待整合数据 A以及待整合数 据 C, 基于此, 确定出与待整合数据 A对应的唯一数据 标识的值为 XXI , 与待整合 数据 C 对应的唯一数据标识的值 为 XX2。 在步骤 S320中, 若多个唯一数据标识的值不同, 则确定与多个第一数据一一对 应的多 个用户名。 其中,当多个唯一数据标 识的值不同时 ,证明多个第一数据来自于不 同的系统, 此时才 需要确定出与多个 第一数据对应 的用户名。 当多个唯一 数据标识的值相 同, 证明多个第一数据来 自于同一个系统, 因为, 在同一 个系统当中, 对于数据的存储 具有统一的标准 , 因此, 同一个系统中的第一 数据都 是经过数据整合 处理后得到的数据 ,因此,当多个唯一数据标识的值相 同时, 无需再 对多个第一数据进 行整合。 举例而言, 多个第一数据包括用户出生日期相 同的待整合数据 A以及待整合数 据 C, 基于此, 确定出与待整合数据 A对应的唯一数据 标识为 XXI , 与待整合数据 C 对应的唯一数据标识 为 XX2。 显然, 此时多个唯一数据标识 并不相同, 进而需要确定出与待整 
合数据 A对应 的用户 名 “ George Herbert Walker Bush ” , 还需要确定出与待整合数据 C对应的用 户名 " George Walker Bush” 。 在本示例性 实施例中, 若多个第一数据标识 不同, 确定出与多个第一数据 分别 对应 的多个用户名, 以保证相似度计算的对象是来 自于不同系统的 用户出生日期相 同的待 整合数据的用户 名, 从用户名的维度实现了数 据的整合, 提升了整合待整合 数据 的效率。 在可选的实施 例中, 图 4示出了医疗数据整合 方法中对多个用 户名进行相似度 计算得 到相似度值的流 程示意图, 如图 4所示, 该方法至少包括以下步骤 : 在步骤 S410中 , 确定多个用户名中包含的多个词汇 。 其中, 多个词汇指的是用户名 中包括的词汇 。 举例而言, 多个用户名包括“ George Herbert Walker Bush ,,以及“ George Walker Bush” , 则确定出的多个用户名中包含 的词汇为 " George”、 " Herbert”、 "Walker” 以及 “ Bush” o 在步骤 S420中, 根据多个词汇在每一个用户名中出现的频率 , 确定与每一用户 名对应 的高维向量。 其中, 确定出与多个 第一数据分 别对应的 多个用户名 , 例如该多个用户名为 " George Herbert Walker Bush” 和 " George Walker Bush” , 多个词汇在每一个用户 名中 出现的频率指的 就是用户名中的 每个词汇在对应 的用户名中 出现的次数, 例如 词“ George"、词" Herbert"、词" Walker”以及词" Bush”在用户名 " George Herbert Walker Bush,, 中分别出现了一次。 高维向量的 维度与多个第一 数据对应的用户 名中出现的词 汇的个数相关 , 例如 在上述 两个用户名中共 出现了四个词, 分别为 “ George”、 “Herbert”、 “Walker” 以及 “ Bush” , 进而可以与用户名对应的高维向量为四维向量。 举例而言, 与第一数据分别对应的多 个用户名为 “ George Herbert Walker Bush” 和 “ George Walker Bush” , 进而词汇 “ George”、 词汇 " Herbert”、 词汇 "Walker” 以及词 汇 " Bush”在用户名 " George Herbert Walker Bush”中分别出现了一次, 词汇 “ George”、 词汇 "Herbert”、 词汇 "Walker” 以及词汇 " Bush”在用户名 " George Walker Bush” 中分别出现了一次, 词汇 “ Herbert”在用户名 " George Walker Bush” 中出现 了 0次。 基于此, 可以得到与用户名 “ George Herbert Walker Bush”对应的一个四维向量 [1 1 1 1], 还可以得到与用户名 “ George Walker Bush”对应的四维度向量[1 0 1 l]o 在步骤 S430中, 根据与每一个用户名对应的高维向量, 计算多个用户名之间的 余弦距 离, 以确定多个用户名之间的相似 度值。 其中, 余弦距离指的是多个 高维向量的夹角 的余弦值, 该余弦值就是对 多个高 维向量 之间的余弦距离进 行计算所得到 的相似度值。 具体地, 可以利用计算公式 计算多个高维 向量之间的余弦 距离, 也可以利用算 法计算 多个高维向量之 间的余弦距离, 本示例性实施例对 此不做特殊限定 。 举例而言, 多个高维向量具体包括[1 1 1 1]和[1 0 1 1], 进而可以利用公式 (1) 计算多 个高维向量之 间的余弦距离, 以得到与多个高维向量对应的 用户名之间的相
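$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{1}$$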
在本示例性 实施例中, 提供了一种计算多个 用户名之间 的相似度的方式 , 进而 有助于 后续根据相似度 值确定出多个第 一数据是否为 同一个用户的数 据, 进而保证 后续数 据整合的准确度 。 在步骤 S130中, 若相似度值大于或等于预设相似度值, 确定与多个第一数据一 一对应 的其他用户数据 , 以对其他用户数据一致的多 个第一数据进 行整合得到 目标 数据 。 在本公开的 示例性实施例 中, 预设相似度值指的是与相似 度值进行比较 , 用于 判断与 多个第一数据分 别对应的多个 用户名是否相似 的阈值, 若相似度值大于预设 相似度 值, 则证明多个第一数据的用 户出生日期以及 用户名相同, 进而还需要确定 出与第 一数据对应的其 他用户数据, 以将其他用户数据也一致的第 一数据整合, 得 到 目标数据。 若相似度值 小于预设相似度 阈值, 则证明多个第一数据并 不是同一个用户 的数 据, 进而无需对多个第 一数据进行整合 。 举例而言, 第一数据包括待整合 数据 A以及 待整合数据 C, 并且, 计算得到的 待整合数 据 A的用户名与 待整合数据 C的用户名的相似度 值为 0.92, 由于预设相似 度值为 0.9, 因此, 此时相似度值大于预设相似度值, 进而确定出与待整合数据 A对 应的其他 用户数据 A-1 以及与待整合数据 C对应的其他用户数 据 C-1 , 若其他用户 数据 A-1和其他用户数据 C-1一致, 则证明待整合数据 A和待 整合数据 C 为同一 个用户 的数据, 进而对待整合数据 A和待整合数据 B进行整合得到目标数 据。 在可选的实施 例中, 图 5示出了医疗数据整合 方法中确定与多个 第一数据一一 对应的其 他用户数据的流 程示意图, 如图 5所示, 该方法至少包括以下步 骤: 在步 骤 S510中, 若相似度值大于或等于预设相似度值, 确定与多个用户名一一对应 的字 符长度 。 其中, 在相似度大于或等于预 设相似度值时 , 还需要确定与用户名对应的字 符 长度, 假设用户名为 “ Jone Doe ” , 则与该用户名对应的字符长度为 8, 之所以需要 确定与用 户名对应的字符 长度, 是因为当相似度值大于 或等于预设相 似度时, 还不 能完全确 保多个用户名 一定是一致的 , 还需要后续判断与用户名对应 的字符长度, 才可 以保证多个用户名 实时同一个用户 的用户名, 这是因为若用户名 为两个字符长 度较短 的用户名, 例如 “ Gone ”和 “ Goie” , 仅仅使用相似度值去判断, 有可能会 得到上述 两个用户名相似 的结论, 这显然与事实不符。 举例而言, 相似度值为 0.92, 预设相似度值为 0.9, 显然, 此时相似度值大于预 设相似度 值, 基于此, 确定出与用户名 XXI对应的 字符长度为 26, 与用户名 XX2 对应的字 符长度为 23 o 在步骤 S520中, 若多个字符长度大于或等于预设字符长度, 确定与多个第一数 据一一对 应的其他用户数 据。 其中, 预设字符长度为进一 步判断与字符长 度对应的用户 名是否一致的字 符长 度阈值 , 若多个字符长度大于或等于预 设长度, 才可以证明与字符长 度对应的用户 名一致 , 进而才需要确定出与多个用户 名对应的第一数 据中包括的其 他用户数据, 以保证 后续可以对其他 用户数据是否一 致进行判断, 进而确定第一数 据是否可以被 整合。 举例而言, 确定出与用户名 XXI对应的字符 长度为 26, 与用户名 XX2对应 的 字符长度 为 23 , 预设字符长度为 20, 显然, 上述两个用户名的字符长度都大于 20, 进而确定 出与用户名 XXI对应的第一数据 中包括的其他用 户数据为 A-1 , 确定出与 用户名 XX2对应的第一 数据中包括的其 他用户数据为 C-1。 在本示例性实 施例中, 若相似度值大于预设 相似度值, 且多个字符长度大 于预 设字符 长度时, 确定与多个第一数据对 应的其他用户数 据, 一方面, 从相似度值和 字符长 度两个维度去判 断用户名是否一 致, 完善了判断用户名是否一 致的逻辑; 另 一方面 , 保证了是在用户名一致的情况 下, 才确定第一数据的其他用 户数据, 提升 了数据 整合的效率。 在可选的实施 例中, 图 6示出了医疗数据整合 方法中对其他用 户数据一致的多 个第一 数据进行整合得 到目标数据的流程 示意图, 如图 6所示, 该方法至少包括以 下步骤 : 在步骤 S610中, 若在多个第一数据中存在与其他用户数据字段对应 的多个 其他用 户数据, 则对多个其他用户数据字 段进行判断。 其中, 其他用户数据字段 与其他 用户数据 对应, 例如其他 用户数据 字段为 
id_number时, 与其他用户数据字段对应的其他 用户数据为身份证 编号, 具体地, 其 他用户 数据中包括了身 份证编号、 出生证编号、 护照号以及病人编号, 对应的, 除 了存在 与身份证编号对 应的其他用户 数据字段之外 , 还存在与出生证编号对应的其 他用户 数据字段, 与护照号对应的其 他用户数据字段 , 与病人编号对应的其他用户 数据字 段,进而需要对身份证编 号、出生证编号、护照号以及病 人编号都进行 比对, 才可 以保证多个第一数据 为需要被整合 的数据。 然而, 在第一数据中可能存 在与身份证编 号、 出生证编号、 护照号以及病人编 号分别 对应的四个其他 用户数据, 也可能只包括上述 四个其他用户数 据中任意的一 个, 也可能只包括上述 四个其他用户 数据中的任意两 个, 也可能只包括上述四个其 他用户 数据中的任意三 个。 基于此, 不论多个第一数据 中包括上述四个 其他用户数据 中的任意几个 , 可以 先判断 第一数据中是 否存在与其他用 户数据字段对应 的其他用户数据 , 若在第一数 据中存 在与其他用户数 据字段对应的 多个其他用户数 据, 则判断这多个其他用户数 据字段 是否一致。 举例而言, 第一数据为待整合 数据 A以及待 整合数据 C, 并且, 在待整合数据 A 中存在与其 他用户数据字段 id_nunber对应的身份证编号 6213004, 在待整合数据 C 中也存在与其 他用户数据字段 id_nunber对应的身份证编号 6213004, 显然, 此时 在上述 两个第一数据 中都存在与其他 用户数据字段对 应的其他用户数 据, 并且, 其 他用户 数据字段都是 id numbero 举例而言, 第一数据为待整合 数据 E以及 待整合数据 F, 并且, 在待整合数据 E 中存在与其他 用户数据字段 id_nunber对应的身份证编号 12456, 在待整合数据 F 中也存 在与其他用户数据 字段 id_nunber对应的身份证编号 12456, 除此之外, 在待 整合数 据 E中还存在与其他 用户数据字段 passport对应的护照号 0023 , 在待整合数 据 F中还存在与其他用户数 据字段 passport对应的护照号 2256。 显然, 此时在上述两个第一 数据中都存在与 其他用户数据 字段对应的其他 用户 数据 , 并且, 其他用户数据字段是一致的。 在步骤 S620中, 若与多个第一数据一一对应的其他用户数据字 段一致, 且与其 他用户 数据字段对应 的其他用户数据 一致, 则对多个第一数据进行整 合得到目标数 据。 其中, 若多个其他用户数据 字段一致, 且与多个其他用户 数据字段对应 的其他 用户数 据一致, 则对多个第一数据进行整 合。 举例而言, 第一数据为待整合 数据 A以及待 整合数据 C, 并且, 在待整合数据 A 中存在与其 他用户数据字段 id_nunber对应的身份证编号 6213004, 在待整合数据 C 中也存在与其 他用户数据字段 id_nunber对应的身份证编号 6213004。 显然, 此时在上述两个第一 数据中都存在与 其他用户数据 字段对应的其他 用户 数据 , 并且, 其他用户数据字段都是 id_number, 与其他用户数据字段 id_number对 应的其 他用户数据都是 6213004, 则对待整合数据 A和待整合数据 C进行整合得到 目标数 据。 举例而言, 第一数据为待整合 数据 E以及 待整合数据 F, 并且, 在待整合数据 E 中存在与其他 用户数据字段 id_nunber对应的身份证编号 12456, 在待整合数据 F 中也存 在与其他用户数据 字段 id_nunber对应的身份证编号 12456, 除此之外, 在待 整合数 据 E中还存在与其他 用户数据字段 passport对应的护照号 0023 , 在待整合数 据 F中还存在与其他用户数 据字段 passport对应的护照号 2256。 显然, 此时在上述两个第一 数据中都存在与 其他用户数据 字段对应的其他 用户 数据 , 并且, 其他用户数据字段是一致的, 然而, 与其他用户数据字段 passport对应 的护照 号不一致, 则不对待整合数据 E和待整合数据 F进行整合。 在本示例性 实施例中, 若多个第一数据中存 在与其他用户 数据字段对应 的多个 其他用 户数据, 且多个其他用户数据 字段和其他用户 数据均一致, 则对多个第一数 据进行 整合得到目标数 据, 完善了整合多个第一数据 的逻辑, 提升了数据整合的准 确度。 在可选的实施 例中, 图 7示出了医疗数据整合 方法中对多个第 一数据进行整合 得到 目标数据的流程示 意图, 如图 7所示, 
该方法至少包括以下步骤: 在步骤 S710 中, 确定与多个第一数 据一一对应的 多个用户名的字 符长度, 并对多个字符长度进 行比较 得到字符比较结 果。 其中, 用户名的字符长度指 的是组成用户名 所使用到的字 符的个数, 字符比较 结果为 对多个字符长度 进行比较后得 到的结果, 并且, 多个字符长度指的是与多个 第一数 据分别对应的用户 名的字符长度 。 举例而言, 第一数据包括待整 合数据 A和待 整合数据 C, 其中, 与待整合数据 A 对应的用户 名为 “ Jone” , 与待整合数据 C对应的用户名为 “ Jone Doe” , 显然, 用户名 “ Jone Doe” 的字符长度大于用户名 “ Jone” 的字符长度。 在步骤 S720中, 根据字符比较结果, 建立第一用户名与第二用户名之间的映射 关系 , 以对多个第一数据进行整合得 到目标数据; 其中, 第一用户名的字符长度大 于第二 用户名的字符长度 。 其中, 基于字符比较结果 , 建立第一用户名与第二用户名 之间的映射关系 , 值 得说 明的是, 第一用户名的字符长度必须 大于第二用户名 的字符长度。 举例而言, 第一数据包括待整 合数据 A和待 整合数据 C, 其中, 与待整合数据 A 对应的用户 名为 “ Jone” , 与待整合数据 C对应的用户名为 “ Jone Doe” , 显然, 用户名 “ Jone Doe” 的字符长度大于用户名 “ Jone” 的字符长度。 基于此, 第一用户名为 “ Jone Doe” , 第二用户名为 “ Jone” , 具体地, 可以按 照键值对 的形式建立第一 用户名与第二用 户名之间的映射 关系,例如为 “ Jone Doe”: “ Jone” , 以此表示与第一用户名 “ Jone Doe”对应的用户和与第二用户名 “ Jone” 对应的 用户为同一个用户 。 值得说明的 是, 在后续数据验证的过程中 , 是按照键去查询目标数据的 , 由于 键是字 符长度较长的用 户名, 进而基于键去查询可 以保证查询到的 目标数据的准确 性, 进而实现对数据的准 确验证。 在本示例性 实施例中, 根据字符比较结果 , 建立第一用户名与第二用户名 之间 的映射 关系, 进而保证后续对数据进 行验证时, 可以基于字符长度更 长的第一用户 查找到对 应的目标数据 ,避免了基于字符长度更短 的第二用户查找 对应的目标数据 , 所造成 的目标数据查询错 误的情况发生 , 进而导致后续验证失败。 在可选的实施 例中, 图 8示出了医疗数据整合 方法中对多个第 一数据进行整合 得到 目标数据的流程示 意图, 如图 8所示, 该方法至少包括以下步骤: 在步骤 S810 中, 若在多个第一数据 中存在特定字符 , 则将特定字符移除, 以得到不包括特定字 符的多个 第一数据;其中,特定字符中包括除 了美国信息交换代 码之外的所有字符 。 其中, 美国信息交换标准代码 指的是 ASCII码, 特定字符指的是除了 ASCII码 之外的所 有字符,即特定字符指的是 非 ASCII码,之所以需要移除特定字符是 因为, 无法对 多个第一数据 中的非 ASCII 码进行处理, 进而需要将多个第一数 据中的非 ASCII码移 除, 以的待可以被处理的不包括 非 ASCII码的多个第一数据 。 举例而言, 第一数据为待整合 数据 A和待整 合数据 C, 其中, 在待整合数据 A 中非 ASCII码 XXX, 即第一数据中存特定 字符, 进而将 XXX从待整合 数据 A中移 除, 以得到不包括特定字符的待整合数据 A。 在步骤 S820中, 对不包括特定字符的多个第一数据进行整 合得到目标数据 。 其中, 将非 ASCII码移除后, 对不包括非 ASCII码的多个第一数据进行 整合得 到 目标数据。 举例而言, 第一数据为待整合 数据 A和待整 合数据 C, 其中, 在待整合数据 A 中包括特 定字符 XXX, 即第一数据中存在非 ASCII码, 进而将 XXX从待整合数据 A 中移除, 以得到不包括非 ASCII码的待整合数据 A, 进而对待整合数据 C和不包 括非 ASCII码的待整合数据 A进行整 合得到目标数据 。 在本示例性实 施例中, 将第一数据中存在的非 ASCII码移除, 避免了后续对第 一数据 中的非 ASCII码无法处理 的情况发生, 提高了整合多个第一 数据后得到的 目 标数据 的准确度以及效率 。 在可选的实施 例中, 对其他用户数据一致 的多个第一数据 进行整合得到 目标数 据之后 , 方法还包括: 按照特定数据格式存储目标数据 。 其中, 在得到目标数据之后 , 还可以按照特定数据格式对 目标数据进行存 储, 具体地 , 
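the specific data format may be JSON (JavaScript Object Notation), a table format, a format corresponding to a particular database, or any other data format, which is not specifically limited in this exemplary embodiment.

For example, after the multiple first data are integrated, the resulting target data are stored in JSON format.

In this exemplary embodiment, storing the target data in a specific data format makes the target data easier to identify later, so that verification is more efficient when a user's data needs to be verified.

In the method and device provided in the exemplary embodiments of the present disclosure, multiple first data whose other user data are consistent are integrated to obtain target data, and the first data are the data with the same user birth date among the data to be integrated from multiple different systems, so that data to be integrated in different systems are integrated. On the one hand, this avoids the verification failures that occur in the existing technology when data to be integrated are verified, and improves verification efficiency; on the other hand, it avoids the situation in the existing technology in which not all data to be integrated can be integrated, and thus improves the accuracy and efficiency of integration.

The medical data integration method in the embodiments of the present disclosure is described in detail below in connection with an application scenario.

Data A to be integrated from system 1, data B to be integrated from system 2, data C to be integrated from system 3, and data D to be integrated from system 4 are obtained. Because the user birth date corresponding to data A is the same as the user birth date corresponding to data B, the multiple first data are determined to be data A and data B.

The user name corresponding to data A is determined to be "Jone Doe" and the user name corresponding to data B is determined to be "Jone". A similarity calculation on these two user names gives a similarity value of 1.2. Because the similarity value is greater than the preset similarity value of 0.9, other user data A-1 corresponding to data A and other user data B-1 corresponding to data B are determined. Because other user data A-1 is consistent with other user data B-1, data A and data B are integrated to obtain the target data.

In this application scenario, multiple first data whose other user data are consistent are integrated to obtain target data, and the first data are the data with the same user birth date among the data to be integrated from multiple different systems, so that data to be integrated in different systems are integrated. On the one hand, this avoids the verification failures that occur in the existing technology when data to be integrated are verified, and improves verification efficiency; on the other hand, it avoids the situation in the existing technology in which not all data to be integrated can be integrated, and thus improves the accuracy and efficiency of integration.

In addition, an exemplary embodiment of the present disclosure also provides a medical data integration device. FIG. 9 shows a schematic structural diagram of the medical data integration device. As shown in FIG. 9, the medical data integration device 900 may include a determination module 910, a similarity calculation module 920, and an integration module 930, where:

the determination module 910 is configured to obtain data to be integrated from multiple different systems and to determine, among the multiple data to be integrated, multiple first data with the same user birth date; the similarity calculation module 920 is configured to determine the user names corresponding one-to-one to the multiple first data and to perform similarity calculation on the multiple user names to obtain a similarity value; and the integration module 930 is configured to, if the similarity value is greater than or equal to a preset similarity value, determine the multiple other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent to obtain target data.

The specific details of the medical data integration device 900 have been described in detail in the corresponding medical data integration method and are not repeated here.

It should be noted that, although several modules or units of the medical data integration device 900 are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

In addition, an exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above method.

An electronic device 1000 according to this embodiment of the present invention is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.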
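The storage unit stores program code that can be executed by the processing unit 1010, causing the processing unit 1010 to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.

The storage unit 1020 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 1021 and/or a cache storage unit 1022, and may further include a read-only memory (ROM) unit 1023.

The storage unit 1020 may also include a program/utility 1024 having a set of (at least one) program modules 1025. Such program modules 1025 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 1000 may also communicate with one or more external devices 1070 (for example, a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (for example, a router or a modem) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 1050. The electronic device 1000 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 through the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solutions according to the embodiments of the present disclosure may therefore be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the embodiments of the present disclosure.

An exemplary embodiment of the present disclosure also provides a computer-readable storage medium on which is stored a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the present invention may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.

Referring to FIG. 11, a program product 1100 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.

The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Program code contained on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical cable, RF, and so on,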
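or any suitable combination of the above.

Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.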
Medical Data Integration Method and Device, Computer Storage Medium, and Electronic Equipment

Technical Field

The present disclosure relates to the field of data processing, and in particular to a medical data integration method and device, a computer-readable storage medium, and an electronic device.

BACKGROUND OF THE INVENTION

With the development of computer technology, a user may have multiple records in multiple different systems. When that user's data needs to be verified, the user's records in all of those systems need to be verified.

In related technologies, when a data verification request for a user is received, the verification requirement is passed to the relevant personnel of the data group. In response, these personnel manually analyze and merge the multiple records that exist in the different systems to obtain merged user data. In this case, as the records in the different systems subsequently increase, the user data has to be merged again, which reduces the accuracy and efficiency of user data merging and leads to user data verification failures.

In view of this, there is an urgent need in this field for a new medical data integration method and device.

It should be noted that the information disclosed in the above background section is only used to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.

SUMMARY OF THE INVENTION

The purpose of the present disclosure is to provide a medical data integration method, a medical data integration device, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to a certain extent, the problem of low user data integration accuracy and efficiency in related technologies.
Additional features and advantages of the disclosure will be apparent from the following detailed description, or may be learned in part by practice of the disclosure.

According to a first aspect of an embodiment of the present invention, a medical data integration method is provided. The method includes: acquiring data to be integrated from multiple different systems, and determining, among the multiple data to be integrated, multiple first data with the same user birth date; determining multiple user names corresponding one-to-one to the multiple first data, and performing similarity calculation on the multiple user names to obtain a similarity value; and, if the similarity value is greater than or equal to a preset similarity value, determining other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent to obtain target data.

In an exemplary embodiment of the present invention, obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from multiple different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields, where the specific data fields include a unique data identifier, the user's date of birth, the user name, and other user data fields, and the other user data fields include an ID card number, a birth certificate number, a passport number, and a patient number.

In an exemplary embodiment of the present invention, determining user names corresponding one-to-one to the multiple first data includes: determining the values of the unique data identifiers corresponding one-to-one to the multiple first data; and, if the values of the multiple unique data identifiers are different, determining the multiple user names corresponding one-to-one to the multiple first data.
In an exemplary embodiment of the present invention, performing similarity calculation on multiple user names to obtain a similarity value includes: determining the multiple words contained in the multiple user names; determining a high-dimensional vector corresponding to each user name according to the frequency with which the multiple words appear in that user name; and, based on the high-dimensional vector corresponding to each user name, calculating the cosine distance between the multiple user names to determine the similarity value between the multiple user names.

In an exemplary embodiment of the present invention, determining, if the similarity value is greater than or equal to a preset similarity value, other user data corresponding one-to-one to the multiple first data includes: if the similarity value is greater than or equal to the preset similarity value, determining the character lengths of the multiple user names; and, if the character lengths of the multiple user names are greater than or equal to a preset character length, determining the multiple other user data corresponding one-to-one to the multiple first data.

In an exemplary embodiment of the present invention, integrating the multiple first data whose other user data are consistent to obtain target data includes: if the multiple first data contain multiple other user data corresponding to other user data fields, determining the multiple other user data fields; and, if the other user data fields corresponding one-to-one to the multiple first data are consistent, and the other user data corresponding to those other user data fields are consistent, integrating the multiple first data to obtain the target data.
In an exemplary embodiment of the present invention, integrating the multiple first data to obtain target data includes: determining the character lengths of the user names corresponding one-to-one to the multiple first data, and comparing the multiple character lengths to obtain a character comparison result; and establishing a mapping relationship between a first user name and a second user name according to the character comparison result, so as to integrate the multiple first data to obtain the target data, where the character length of the first user name is greater than the character length of the second user name.

In an exemplary embodiment of the present invention, integrating the multiple first data to obtain target data includes: if specific characters exist in the multiple first data, removing the specific characters to obtain multiple first data that do not include the specific characters, where the specific characters include all characters other than those of the American Standard Code for Information Interchange (ASCII); and integrating the multiple first data that do not include the specific characters to obtain the target data.

In an exemplary embodiment of the present invention, after the multiple first data whose other user data are consistent are integrated to obtain the target data, the method further includes: storing the target data in a specific data format.

According to a second aspect of an embodiment of the present invention, a medical data integration device is provided. The device includes: a determination module configured to obtain data to be integrated from multiple different systems and to determine, among the multiple data to be integrated, multiple first data with the same user birth date; a similarity calculation module configured to determine the user names corresponding one-to-one to the multiple first data and to perform similarity calculation on the multiple user names to obtain a similarity value; and an integration module configured to, if the similarity value is greater than or equal to a preset similarity value, determine other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent to obtain target data.

According to a third aspect of an embodiment of the present invention, an electronic device is provided, including a processor and a memory, where computer-readable instructions are stored in the memory and, when executed by the processor, implement the medical data integration method of any of the above exemplary embodiments.

According to a fourth aspect of an embodiment of the present invention, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the medical data integration method of any of the above exemplary embodiments is implemented.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure. The drawings in the following description show only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 schematically shows a flow chart of the medical data integration method in an embodiment of the present disclosure;
Figure 2 schematically shows the data to be integrated in the medical data integration method in an embodiment of the present disclosure;
Figure 3 schematically shows a flow chart of determining multiple user names corresponding to multiple first data in the medical data integration method in an embodiment of the present disclosure;
Figure 4 schematically shows a flow chart of performing similarity calculation on multiple user names to obtain a similarity value in the medical data integration method in an embodiment of the present disclosure;
Figure 5 schematically shows a flow chart of determining other user data corresponding to multiple first data in the medical data integration method in an embodiment of the present disclosure;
Figure 6 schematically shows a flow chart of integrating multiple first data whose other user data are consistent to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 7 schematically shows a flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 8 schematically shows a flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 9 schematically shows a structural diagram of the medical data integration device in an embodiment of the present disclosure;
Figure 10 schematically shows an electronic device used for the medical data integration method in an embodiment of the present disclosure;
Figure 11 schematically shows a computer-readable storage medium used for the medical data integration method in an embodiment of the present disclosure.
DETAILED DESCRIPTION

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps, and so on. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the disclosure.

The terms "a", "an", "the" and "said" are used in this specification to indicate the existence of one or more elements/components/etc.; the terms "include" and "have" are used in an open-ended, inclusive sense and mean that there may be additional elements/components/etc. in addition to those listed; the terms "first", "second", etc. are used as labels only and do not limit the number of their objects.

In addition, the accompanying drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings represent the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities.
In response to the problems existing in related technologies, the present disclosure proposes a medical data integration method. Figure 1 shows a flow chart of the medical data integration method. As shown in Figure 1, the medical data integration method includes at least the following steps:

Step S110. Obtain data to be integrated from multiple different systems, and determine, among the multiple data to be integrated, multiple first data with the same user birth date.

Step S120. Determine the user names corresponding one-to-one to the multiple first data, and perform similarity calculation on the multiple user names to obtain a similarity value.

Step S130. If the similarity value is greater than or equal to a preset similarity value, determine other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent to obtain target data.

In the method and device provided in the exemplary embodiments of the present disclosure, multiple first data whose other user data are consistent are integrated to obtain target data, and the first data are the data with the same user birth date among the data to be integrated from multiple different systems, so that data to be integrated in different systems are integrated. On the one hand, this avoids the verification failures that occur in the existing technology when the data to be integrated are verified, and improves verification efficiency; on the other hand, it avoids the situation in the existing technology in which not all data to be integrated can be integrated, and thus improves the accuracy and efficiency of integration.

Each step of the medical data integration method is described in detail below.
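As a minimal sketch of the selection in step S110, and not the applicant's implementation, the records can be grouped by birth date and only the groups with more than one record kept as candidate first data. The record layout, the `label` key, and the function name `select_first_data` are assumptions made for illustration; the six sample records are illustrative.

```python
from collections import defaultdict

def select_first_data(records):
    """Group records by birth date and keep only dates shared by 2+ records.

    `records` is a list of dicts with a "birth_date" key (assumed layout).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["birth_date"]].append(rec)
    # A record whose birth date is unique cannot match anything, so drop it.
    return {date: recs for date, recs in groups.items() if len(recs) > 1}

# Illustrative records, one per source system ("label" is a hypothetical key).
records = [
    {"label": "A", "birth_date": "2012-02-02"},
    {"label": "B", "birth_date": "2000-10-19"},
    {"label": "C", "birth_date": "2012-02-02"},
    {"label": "D", "birth_date": "2012-02-05"},
    {"label": "E", "birth_date": "1998-06-23"},
    {"label": "F", "birth_date": "1998-06-23"},
]
first_data = select_first_data(records)
```

Grouping first keeps the later, more expensive name comparison restricted to records that could plausibly belong to the same user.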
In step S110, data to be integrated from multiple different systems are obtained, and multiple first data with the same user birth date are determined among the multiple data to be integrated.

In the exemplary embodiment of the present disclosure, the data to be integrated are data from different systems. Specifically, they may come from an immigration data system, a birth database, a hospital database, or any other system in which data to be integrated may exist, which is not particularly limited in this exemplary embodiment.

Each data to be integrated corresponds to a user; these users may be the same user or different users. The user's birth date is the data in the data to be integrated that records the user's date of birth. Based on this, the first data are those data to be integrated that share the same user birth date.

For example, there are 6 data to be integrated, which come from different systems: data A to be integrated, data B to be integrated, data C to be integrated, data D to be integrated, data E to be integrated, and data F to be integrated. The birth date of the user corresponding to data A is 2012-02-02, and the birth date of the user corresponding to data B is 2000-10-19.
The birth date of the user corresponding to data C to be integrated is 2012-02-02, the birth date of the user corresponding to data D is 2012-02-05, the birth date of the user corresponding to data E is 1998-06-23, and the birth date of the user corresponding to data F is 1998-06-23.

Based on this, the first data with the same user birth date are determined to be data A, data C, data E, and data F, where data A and data C share one user birth date, and data E and data F share another.

In an optional embodiment, obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from multiple different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields, where the specific data fields include a unique data identifier, the user's date of birth, the user name, and other user data fields, and the other user data fields include an ID card number, a birth certificate number, a passport number, and a patient number.

In the different systems, the data to be integrated are usually stored in a data table to be integrated. It is worth mentioning that, besides the data to be integrated, this table also stores other data that do not need to be integrated, so the data to be integrated, which correspond to the specific data fields, need to be extracted from the table.

The specific data fields are the fields that correspond to the data to be integrated. They include a unique data identifier, whose value corresponds to the source system. The specific data fields also include the user's date of birth.
The data to be integrated corresponding to the user date of birth field may be, for example, 1988-07-23. The specific data fields also include the user name; the data to be integrated corresponding to the user name field may be "Jane Doe". The specific data fields also include other user data fields. Specifically, the other user data fields include the ID card number, for which the corresponding data to be integrated may be "610235XXXXXXXX2771"; the birth certificate number, for which the corresponding data may be "2XX0410"; the passport number, for which the corresponding data may be "C0XXXXX2"; and the patient number, for which the corresponding data may be "BNXXXX6".

For example, Figure 2 schematically shows the data to be integrated. As shown in Figure 2, field 210 is the line number field line, and value 211 is the value corresponding to field 210, indicating the row in which the data to be integrated is located. The specific data field 220 is the unique data identifier unique_id, and identifier 221 is the field value corresponding to the specific data field 220. The specific data field 230 is the patient number patient_id, and number 231 is the patient number corresponding to the specific data field 230. The specific data field 240 is the user name user_name, and name 241 is the user name corresponding to the specific data field 240. The specific data field 250 is the user's birth date birth_date, and date 251 is the date corresponding to the specific data field 250. Field 260 is the gender field sex, and data 261 corresponds to field 260. The specific data field 270 is the ID card number id_number, and number 271 corresponds to the specific data field 270. The specific data field 280 is the birth certificate number, and number 281 corresponds to the specific data field 280. The specific data field 290 is the passport number, and number 291 corresponds to the specific data field 290.

In this exemplary embodiment, the data to be integrated are the data corresponding to specific fields extracted from the data table to be integrated, which avoids obtaining other data in the table that do not need to be integrated and improves the accuracy and efficiency of the subsequent integration.

In step S120, the user names corresponding one-to-one to the multiple first data are determined, and similarity calculation is performed on the multiple user names to obtain a similarity value.

In the exemplary embodiment of the present disclosure, the user name is the user name corresponding to the first data. It is worth mentioning that the first data are the data with the same user birth date among the multiple data to be integrated. Although the birth dates are the same, the same user may be recorded under different user names in different systems. Therefore, to determine whether multiple first data with the same user birth date belong to the same user, similarity calculation needs to be performed on the multiple user names to obtain the similarity value between them.
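Returning to the field extraction described for Figure 2, the following is a minimal sketch, under stated assumptions, of keeping only the specific data fields of one table row. The field names `unique_id`, `patient_id`, `user_name`, `birth_date`, `sex`, `id_number`, and `line` come from the Figure 2 description; `birth_certificate_number` and `passport` as column names, the function name, and the sample value for `sex` are hypothetical.

```python
# Specific data fields to keep (column names partly assumed, see lead-in).
SPECIFIC_FIELDS = (
    "unique_id",
    "user_name",
    "birth_date",
    "id_number",
    "birth_certificate_number",  # hypothetical column name
    "passport",                  # hypothetical column name
    "patient_id",
)

def extract_data_to_be_integrated(row):
    """Return only the specific data fields of a table row.

    Columns that are not specific data fields (for example the row number
    or gender) are dropped; missing fields are kept as None.
    """
    return {field: row.get(field) for field in SPECIFIC_FIELDS}

row = {
    "line": 1,
    "unique_id": "XX1",
    "patient_id": "BNXXXX6",
    "user_name": "Jane Doe",
    "birth_date": "1988-07-23",
    "sex": "F",  # made-up value, only to show that it is dropped
    "id_number": "610235XXXXXXXX2771",
    "birth_certificate_number": "2XX0410",
    "passport": "C0XXXXX2",
}
extracted = extract_data_to_be_integrated(row)
```

Keeping missing identifiers as `None` rather than omitting them makes it explicit later which of the four other user data a record actually carries.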
For example, the first data include data A to be integrated and data C to be integrated, which have the same user birth date. Based on this, the user name corresponding to data A is determined to be "George Herbert Walker Bush" and the user name corresponding to data C is determined to be "George Walker Bush". Similarity calculation is then performed on the user name "George Herbert Walker Bush" and the user name "George Walker Bush" to obtain a similarity value of 0.85.

In an optional embodiment, Figure 3 shows a schematic flow chart of determining the user names corresponding to multiple first data in the medical data integration method. As shown in Figure 3, the method includes at least the following steps:

In step S310, the values of the unique data identifiers corresponding one-to-one to the multiple first data are determined.

The value of the unique data identifier is related to the system from which the first data come, and different values correspond to different systems. Therefore, the values of the unique data identifiers corresponding to the multiple first data are determined first, to ensure that what is integrated later are first data from different systems.

For example, the multiple first data include data A to be integrated and data C to be integrated, which have the same user birth date. Based on this, the value of the unique data identifier corresponding to data A is determined to be XX1, and the value of the unique data identifier corresponding to data C is determined to be XX2.

In step S320, if the values of the multiple unique data identifiers are different, the multiple user names corresponding one-to-one to the multiple first data are determined.
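Steps S310 and S320 can be sketched as follows. This is an illustrative reading rather than the patented implementation: it treats "the values are different" as "all values are pairwise distinct", and the function name is hypothetical.

```python
def user_names_if_cross_system(first_data):
    """Return the user names only when every record has a distinct unique_id.

    Records sharing a unique_id are taken to come from the same system,
    in which case no name comparison is performed and None is returned.
    """
    ids = [rec["unique_id"] for rec in first_data]
    if len(set(ids)) < len(ids):
        return None
    return [rec["user_name"] for rec in first_data]

data_a = {"unique_id": "XX1", "user_name": "George Herbert Walker Bush"}
data_c = {"unique_id": "XX2", "user_name": "George Walker Bush"}
names = user_names_if_cross_system([data_a, data_c])
```

Returning `None` for same-system records makes the "no integration needed" branch explicit to the caller.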
When the values of the multiple unique data identifiers are different, the multiple first data come from different systems, and only then do the user names corresponding to the multiple first data need to be determined. When the values of the multiple unique data identifiers are the same, the multiple first data come from the same system. Because data within one system are stored according to a unified standard, the first data in the same system have already been through data integration processing, so there is no need to integrate them.

For example, the multiple first data include data A to be integrated and data C to be integrated, which have the same user birth date. Based on this, the unique data identifier corresponding to data A is determined to be XX1, and the unique data identifier corresponding to data C is determined to be XX2. The unique data identifiers differ, so the user name "George Herbert Walker Bush" corresponding to data A and the user name "George Walker Bush" corresponding to data C need to be determined.

In this exemplary embodiment, if the unique data identifiers of the multiple first data are different, the user names corresponding to the multiple first data are determined. This ensures that the similarity calculation is applied to the user names of data to be integrated that share a birth date but come from different systems, so that data are integrated along the user name dimension and the integration becomes more efficient.

In an optional embodiment, FIG. 4 shows a schematic flowchart of performing similarity calculation on multiple user names to obtain a similarity value in the medical data integration method. As shown in FIG. 4, the method includes at least the following steps:

In step S410, the multiple words contained in the multiple user names are determined.

Here, the multiple words are the words included in the user names. For example, if the multiple user names are "George Herbert Walker Bush" and "George Walker Bush", the words contained in the multiple user names are "George", "Herbert", "Walker", and "Bush".

In step S420, the high-dimensional vector corresponding to each user name is determined according to the frequency with which the multiple words appear in that user name.

The multiple user names corresponding to the multiple first data are determined; for example, they are "George Herbert Walker Bush" and "George Walker Bush". The frequency with which the words appear in a user name is the number of times each word occurs in that user name. For example, the words "George", "Herbert", "Walker", and "Bush" each appear once in the user name "George Herbert Walker Bush".

The dimension of the high-dimensional vector is related to the number of distinct words appearing in the user names corresponding to the multiple first data. For example, four words in total appear in the above two user names, namely "George", "Herbert", "Walker", and "Bush", so the high-dimensional vector corresponding to each user name can be a four-dimensional vector.
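Steps S410 and S420 amount to building word-count vectors over a shared vocabulary. The sketch below is one simple way to do that, assuming names are split on whitespace and compared case-sensitively (both are assumptions; the source does not specify tokenization). The function name `name_vectors` is hypothetical.

```python
from collections import Counter

def name_vectors(names):
    """Build one word-count vector per name over the shared vocabulary.

    The vocabulary lists each distinct word once, in order of first
    appearance, so every vector has the same dimension.
    """
    vocab = []
    for name in names:
        for word in name.split():
            if word not in vocab:
                vocab.append(word)
    vectors = []
    for name in names:
        counts = Counter(name.split())
        # Counter returns 0 for words that do not occur in this name.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = name_vectors(
    ["George Herbert Walker Bush", "George Walker Bush"]
)
```

For these two names the vocabulary has four words, so each name is represented by a four-dimensional vector.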
For example, the multiple user names corresponding to the first data are "George Herbert Walker Bush" and "George Walker Bush". The words "George", "Herbert", "Walker" and "Bush" each appear once in the user name "George Herbert Walker Bush"; the words "George", "Walker" and "Bush" each appear once in the user name "George Walker Bush", and the word "Herbert" appears 0 times in it. Based on this, the four-dimensional vector [1 1 1 1] corresponding to the user name "George Herbert Walker Bush" and the four-dimensional vector [1 0 1 1] corresponding to the user name "George Walker Bush" are obtained.
In step S430, based on the high-dimensional vector corresponding to each user name, the cosine distance between the multiple user names is calculated to determine the similarity value between the multiple user names.
Here, the cosine distance refers to the cosine of the angle between the high-dimensional vectors, and this cosine value is the similarity value. Specifically, a calculation formula may be used to calculate the cosine distance between the high-dimensional vectors, or an algorithm may be used, which is not specifically limited in this exemplary embodiment.
For example, the high-dimensional vectors are [1 1 1 1] and [1 0 1 1], and formula (1) can be used to calculate the cosine distance between them, which gives the similarity value between the user names corresponding to the high-dimensional vectors:
$$\cos\theta = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{1}$$
where A and B are the high-dimensional vectors of the two user names and n is their dimension.
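As a hedged illustration of step S430, the sketch below computes the cosine of the angle between two word-count vectors using only the Python standard library. The function name `cosine_similarity` and the handling of an all-zero vector are assumptions made for this example; the text leaves the concrete formula or algorithm open.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty name is treated as dissimilar
    return dot / (norm_a * norm_b)


# The two example vectors from the text.
similarity = cosine_similarity([1, 1, 1, 1], [1, 0, 1, 1])
print(round(similarity, 3))
```

With plain word counts, these two vectors evaluate to 3 / (2·√3) ≈ 0.866. The similarity values quoted in the threshold examples of the text are its own illustrative figures and need not come from this exact vectorization.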
In this exemplary embodiment, a method of calculating the similarity between multiple user names is provided, which helps to subsequently determine, based on the similarity value, whether the multiple first data belong to the same user, and thus ensures the accuracy of subsequent data integration.
In step S130, if the similarity value is greater than or equal to the preset similarity value, other user data corresponding one-to-one to the multiple first data are determined, so as to integrate the multiple first data whose other user data are consistent and obtain the target data.
In an exemplary embodiment of the present disclosure, the preset similarity value is a threshold that is compared with the similarity value to determine whether the multiple user names respectively corresponding to the multiple first data are similar. If the similarity value is greater than or equal to the preset similarity value, the user birth dates and user names of the multiple first data are considered the same, and it is then necessary to determine the other user data corresponding to the first data, so as to integrate the first data whose other user data are also consistent and obtain the target data. If the similarity value is less than the preset similarity value, the multiple first data are not the data of the same user, and there is no need to integrate them.
For example, the first data include data A to be integrated and data C to be integrated, and the calculated similarity value between the user name of data A to be integrated and the user name of data C to be integrated is 0.92. Since the preset similarity value is 0.9, the similarity value is greater than the preset similarity value, so other user data A-1 corresponding to data A to be integrated and other user data C-1 corresponding to data C to be integrated are determined.
If other user data A-1 is consistent with other user data C-1, data A to be integrated and data C to be integrated are the data of the same user, and data A to be integrated and data C to be integrated are then integrated to obtain the target data.
In an optional embodiment, FIG. 5 shows a schematic flowchart of determining other user data corresponding one-to-one to the multiple first data in the medical data integration method. As shown in FIG. 5, the method at least includes the following steps:
In step S510, if the similarity value is greater than or equal to the preset similarity value, the character lengths corresponding to the multiple user names are determined.
Here, when the similarity value is greater than or equal to the preset similarity value, the character length corresponding to each user name also needs to be determined. Assuming that the user name is "Jone Doe", the corresponding character length is 8. The character length is determined because a similarity value greater than or equal to the preset similarity value cannot by itself fully guarantee that the multiple user names are consistent; additionally checking the character length helps ensure that the multiple user names really are the user names of the same user. For two user names with short character lengths, such as "Gone" and "Goie", judging by the similarity value alone may lead to the conclusion that the two user names are similar, which is obviously inconsistent with the facts.
For example, the similarity value is 0.92 and the preset similarity value is 0.9, so the similarity value is greater than the preset similarity value. Based on this, the character length corresponding to user name XX1 is determined to be 26, and the character length corresponding to user name XX2 is determined to be 23.
In step S520, if the multiple character lengths are greater than or equal to the preset character length, the other user data corresponding one-to-one to the multiple first data are determined.
Here, the preset character length is a character-length threshold used to further judge whether the user names are consistent. If the multiple character lengths are all greater than or equal to the preset character length, the user names corresponding to those character lengths are considered consistent, and it is then necessary to determine the other user data included in the first data corresponding to the multiple user names, so that it can subsequently be judged whether the other user data are consistent and, in turn, whether the first data can be integrated.
For example, the character length corresponding to user name XX1 is 26, the character length corresponding to user name XX2 is 23, and the preset character length is 20. Both character lengths are greater than 20, so the other user data included in the first data corresponding to user name XX1 is determined to be A-1, and the other user data included in the first data corresponding to user name XX2 is determined to be C-1.
In this exemplary embodiment, if the similarity value is greater than or equal to the preset similarity value and the multiple character lengths are greater than or equal to the preset character length, the other user data corresponding to the multiple first data are determined.
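The two-stage screening of steps S510 and S520 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `names_pass_screening` is hypothetical, and the default thresholds 0.9 and 20 are simply the example values used in the text.

```python
def names_pass_screening(similarity, user_names,
                         preset_similarity=0.9, preset_length=20):
    """Return True when the other user data should be looked up.

    Step S510: the similarity value must reach the preset similarity value.
    Step S520: every user name must reach the preset character length.
    """
    if similarity < preset_similarity:
        return False
    return all(len(name) >= preset_length for name in user_names)


# Short names are rejected even when the similarity value is high.
print(names_pass_screening(0.92, ["Gone", "Goie"]))
```

Checking the similarity value first means the character lengths are only examined for candidate pairs that already look alike.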
Determining whether the user names are consistent along two dimensions, the similarity value and the character length, on the one hand makes that determination more rigorous; on the other hand, it ensures that the other user data of the first data are determined only when the user names are consistent, which improves the efficiency of data integration.
In an optional embodiment, FIG. 6 shows a schematic flowchart of integrating multiple first data whose other user data are consistent to obtain target data in the medical data integration method. As shown in FIG. 6, the method at least includes the following steps:
In step S610, if other user data corresponding to other user data fields exist in the multiple first data, the multiple other user data fields are judged.
Here, the other user data fields correspond to the other user data. For example, when the other user data field is id_number, the other user data corresponding to that field is the ID card number. Specifically, the other user data include the ID card number, the birth certificate number, the passport number and the patient number. Correspondingly, besides the other user data field corresponding to the ID card number, there are also other user data fields corresponding to the birth certificate number, the passport number and the patient number. The ID card number, birth certificate number, passport number and patient number need to be compared to ensure that the multiple first data are data that need to be integrated.
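The comparison of these identifier fields can be sketched in Python as below. Several points are assumptions made for illustration: only `id_number` and `passport` appear as field names in the text, so `birth_certificate_number` and `patient_number` are hypothetical; records are modelled as dictionaries; and two records are treated as comparable on the fields that both of them contain.

```python
# Field names other than id_number and passport are assumed for illustration.
OTHER_USER_DATA_FIELDS = ("id_number", "birth_certificate_number",
                          "passport", "patient_number")


def other_user_data_consistent(record_a, record_b,
                               fields=OTHER_USER_DATA_FIELDS):
    """Compare the identifier fields that both records actually contain."""
    shared = [f for f in fields
              if record_a.get(f) is not None and record_b.get(f) is not None]
    if not shared:
        return False  # assumption: nothing to compare means no integration
    return all(record_a[f] == record_b[f] for f in shared)


# Example values taken from the text.
data_a = {"id_number": "6213004"}
data_c = {"id_number": "6213004"}
data_e = {"id_number": "12456", "passport": "0023"}
data_f = {"id_number": "12456", "passport": "2256"}

print(other_user_data_consistent(data_a, data_c))
print(other_user_data_consistent(data_e, data_f))
```

Under these assumptions, a single mismatching identifier is enough to block integration, which matches the passport example in the text.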
The first data may include all four kinds of other user data, namely the ID card number, birth certificate number, passport number and patient number, or may include only any one, any two or any three of them. Therefore, regardless of which of the four kinds of other user data the multiple first data include, it can first be determined whether other user data corresponding to other user data fields exist in the first data. If they do, it is then determined whether the multiple other user data fields are consistent.
For example, the first data are data A to be integrated and data C to be integrated. In data A to be integrated, the ID card number 6213004 corresponds to the other user data field id_number, and in data C to be integrated, the ID card number 6213004 also corresponds to the other user data field id_number. In this case, other user data corresponding to other user data fields exist in both first data, and the other user data fields are both id_number.
For example, the first data are data E to be integrated and data F to be integrated. In data E to be integrated, the ID card number 12456 corresponds to the other user data field id_number, and in data F to be integrated, the ID card number 12456 also corresponds to the other user data field id_number. In addition, in data E to be integrated, the passport number 0023 corresponds to the other user data field passport.
In data F to be integrated, the passport number 2256 corresponds to the other user data field passport. In this case, other user data corresponding to other user data fields exist in both first data, and the other user data fields are consistent.
In step S620, if the other user data fields corresponding to the multiple first data are consistent and the other user data corresponding to those fields are also consistent, the multiple first data are integrated to obtain the target data.
Here, the multiple first data are integrated only if the multiple other user data fields are consistent and the other user data corresponding to those fields are also consistent.
For example, the first data are data A to be integrated and data C to be integrated. In data A to be integrated, the ID card number 6213004 corresponds to the other user data field id_number, and in data C to be integrated, the ID card number corresponding to the other user data field id_number is also 6213004. In this case, other user data corresponding to other user data fields exist in both first data, the other user data fields are both id_number, and the other user data corresponding to the field id_number are both 6213004, so data A to be integrated and data C to be integrated are integrated to obtain the target data.
For example, the first data are data E to be integrated and data F to be integrated. In data E to be integrated, the ID card number 12456 corresponds to the other user data field id_number, and in data F to be integrated, the ID card number corresponding to the other user data field id_number is also 12456.
In addition, in data E to be integrated, the passport number 0023 corresponds to the other user data field passport, whereas in data F to be integrated, the passport number corresponding to the other user data field passport is 2256. In this case, other user data corresponding to other user data fields exist in both first data, and the other user data fields are consistent. However, the passport numbers corresponding to the other user data field passport are inconsistent, so data E to be integrated and data F to be integrated are not integrated.
In this exemplary embodiment, if other user data corresponding to other user data fields exist in the multiple first data, and both the other user data fields and the other user data are consistent, the multiple first data are integrated to obtain the target data, which makes the integration logic more rigorous and improves the accuracy of data integration.
In an optional embodiment, FIG. 7 shows a schematic flowchart of integrating multiple first data to obtain target data in the medical data integration method. As shown in FIG. 7, the method at least includes the following steps:
In step S710, the character lengths of the multiple user names corresponding one-to-one to the multiple first data are determined, and the multiple character lengths are compared to obtain a character comparison result.
Here, the character length of a user name refers to the number of characters that make up the user name, the multiple character lengths are the character lengths of the user names respectively corresponding to the multiple first data, and the character comparison result is the result obtained by comparing the multiple character lengths.
For example, the first data include data A to be integrated and data C to be integrated.
The user name corresponding to data A to be integrated is "Jone", and the user name corresponding to data C to be integrated is "Jone Doe". The character length of the user name "Jone Doe" is greater than that of the user name "Jone".
In step S720, according to the character comparison result, a mapping relationship between a first user name and a second user name is established so as to integrate the multiple first data and obtain the target data, wherein the character length of the first user name is greater than the character length of the second user name.
Here, the mapping relationship between the first user name and the second user name is established based on the character comparison result. It should be noted that the character length of the first user name must be greater than the character length of the second user name.
For example, the first data include data A to be integrated and data C to be integrated. The user name corresponding to data A to be integrated is "Jone", and the user name corresponding to data C to be integrated is "Jone Doe", whose character length is greater. Based on this, the first user name is "Jone Doe" and the second user name is "Jone". Specifically, the mapping relationship between the first user name and the second user name can be established in the form of a key-value pair, for example "Jone Doe": "Jone", which indicates that the user corresponding to the first user name "Jone Doe" and the user corresponding to the second user name "Jone" are the same user.
It is worth mentioning that, in the subsequent data verification process, the target data are queried according to the key. Since the key is the user name with the longer character length, querying by key helps ensure the accuracy of the queried target data and thus accurate verification of the data.
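The key-value mapping of steps S710 and S720 can be sketched as follows. The function name `build_name_mapping` is hypothetical, and raising an error for equal character lengths is an assumption, since the text only covers the case in which one user name is longer.

```python
def build_name_mapping(name_a, name_b):
    """Map the longer user name (key) to the shorter one (value)."""
    if len(name_a) == len(name_b):
        # Assumption: the text does not say how to handle equal lengths.
        raise ValueError("user names have the same character length")
    first, second = sorted((name_a, name_b), key=len, reverse=True)
    return {first: second}


mapping = build_name_mapping("Jone", "Jone Doe")
print(mapping)
```

Because the longer user name is always the key, the argument order does not matter, and later lookups by key use the more specific name.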
In this exemplary embodiment, a mapping relationship between the first user name and the second user name is established based on the character comparison result. This ensures that, when the data are subsequently verified, the target data corresponding to the first user name, which has the longer character length, can be found, and avoids target data query errors caused by the shorter second user name, which could lead to subsequent verification failure.
In an optional embodiment, FIG. 8 shows a schematic flowchart of integrating multiple first data to obtain target data in the medical data integration method. As shown in FIG. 8, the method at least includes the following steps:
In step S810, if specific characters exist in the multiple first data, the specific characters are removed to obtain multiple first data that do not include the specific characters, wherein the specific characters include all characters other than those of the American Standard Code for Information Interchange.
Here, the American Standard Code for Information Interchange refers to ASCII, and the specific characters are all characters outside ASCII, that is, non-ASCII characters. The specific characters need to be removed because the non-ASCII characters in the multiple first data cannot be processed; once they are removed, the multiple first data that no longer include non-ASCII characters can be processed.
For example, the first data are data A to be integrated and data C to be integrated. Data A to be integrated contains the non-ASCII characters XXX, that is, specific characters exist in the first data, so XXX is removed from data A to be integrated to obtain data A to be integrated that does not include the specific characters.
In step S820, the multiple first data that do not include the specific characters are integrated to obtain the target data.
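The character removal of step S810 can be sketched as follows, assuming that each first data record is a dictionary whose string values are cleaned one by one. The function names `strip_non_ascii` and `clean_record` are hypothetical.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range (code points 0-127)."""
    return "".join(ch for ch in text if ord(ch) < 128)


def clean_record(record):
    """Apply strip_non_ascii to every string value of one record."""
    return {key: strip_non_ascii(value) if isinstance(value, str) else value
            for key, value in record.items()}


print(strip_non_ascii("Jone\u5f20 Doe"))
```

Removal is lossy: a name such as "José" becomes "Jos". The text specifies removal rather than transliteration, so the sketch follows that choice.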
After the non-ASCII characters are removed, the multiple first data that no longer include non-ASCII characters are integrated to obtain the target data.
For example, the first data are data A to be integrated and data C to be integrated. Data A to be integrated contains the specific characters XXX, that is, non-ASCII characters exist in the first data, so XXX is removed from data A to be integrated to obtain data A to be integrated that does not include non-ASCII characters; data C to be integrated and the data A to be integrated that does not include non-ASCII characters are then integrated to obtain the target data.
In this exemplary embodiment, the non-ASCII characters present in the first data are removed, which avoids the subsequent inability to process them and improves the accuracy and efficiency with which the target data are obtained by integrating the multiple first data.
In an optional embodiment, after the multiple first data whose other user data are consistent are integrated to obtain the target data, the method further includes: storing the target data according to a specific data format.
Here, after the target data are obtained, they can also be stored according to a specific data format. Specifically, the specific data format can be the JSON (JavaScript Object Notation) format, a table format, a format corresponding to a particular database, or any other data format, which is not specifically limited in this exemplary embodiment.
For example, after the multiple first data are integrated, the obtained target data are stored in JSON format.
In this exemplary embodiment, the target data are stored in a specific data format to facilitate subsequent identification of the target data, thereby improving verification efficiency when data verification is required for the user.
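Storing the target data in JSON format can be sketched with Python's standard `json` module. The layout of the `target_data` record below is hypothetical, since the text does not define the structure of the target data; only the example values are taken from the text.

```python
import json


def serialize_target_data(target_data):
    """Serialize the integrated target data as a JSON string."""
    return json.dumps(target_data, indent=2, sort_keys=True)


# Hypothetical record layout, used only for illustration.
target_data = {
    "user_name": "Jone Doe",
    "mapped_user_name": "Jone",
    "id_number": "6213004",
}
serialized = serialize_target_data(target_data)
print(serialized)
```

The resulting string can be written to a file or a database column, and `json.loads` restores the same dictionary when the data are later verified.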
In the methods and devices provided by exemplary embodiments of the present disclosure, the target data are obtained by integrating multiple first data whose other user data are consistent, and the first data are the data with the same user birth date among the data to be integrated from multiple different systems, so that data to be integrated from different systems are integrated. On the one hand, this avoids the verification failures that occur in the existing technology when the data to be integrated are verified, and improves verification efficiency; on the other hand, it avoids the situation in the existing technology in which not all of the data to be integrated can be integrated, thereby improving the accuracy and efficiency of integrating the data to be integrated.
The medical data integration method in the embodiments of the present disclosure is described in detail below in conjunction with an application scenario.
Data A to be integrated is obtained from system 1, data B to be integrated from system 2, data C to be integrated from system 3, and data D to be integrated from system 4. Since the birth date of the user corresponding to data A to be integrated is the same as the birth date of the user corresponding to data B to be integrated, the multiple first data are determined to be data A to be integrated and data B to be integrated.
The user name corresponding to data A to be integrated is determined to be "Jone Doe", and the user name corresponding to data B to be integrated is determined to be "Jone". Similarity calculation on these two user names yields a similarity value that is greater than the preset similarity value of 0.9, so other user data A-1 corresponding to data A to be integrated and other user data B-1 corresponding to data B to be integrated are determined.
Since other user data A-1 is consistent with other user data B-1, data A to be integrated and data B to be integrated are integrated to obtain the target data.
In this application scenario, multiple first data whose other user data are consistent are integrated to obtain the target data, and the first data are the data with the same user birth date among the data to be integrated from multiple different systems, so that data to be integrated from different systems are integrated. On the one hand, this avoids the verification failures that occur in the existing technology when the data to be integrated are verified, and improves verification efficiency; on the other hand, it avoids the situation in the existing technology in which not all of the data to be integrated can be integrated, thereby improving the accuracy and efficiency of integrating the data to be integrated.
Furthermore, in an exemplary embodiment of the present disclosure, a medical data integration device is also provided. FIG. 9 shows a schematic structural diagram of the medical data integration device. As shown in FIG. 9, the medical data integration device 900 may include a determination module 910, a similarity calculation module 920 and an integration module 930.
Specifically, the determination module 910 is configured to obtain data to be integrated from multiple different systems and to determine, from the multiple data to be integrated, multiple first data with the same user birth date; the similarity calculation module 920 is configured to determine the user names corresponding one-to-one to the multiple first data and to perform similarity calculation on the multiple user names to obtain a similarity value; and the integration module 930 is configured to, if the similarity value is greater than or equal to the preset similarity value, determine other user data corresponding one-to-one to the multiple first data, so as to integrate the multiple first data whose other user data are consistent and obtain the target data.
The specific details of the medical data integration device 900 have been described in detail in the corresponding medical data integration method and are therefore not repeated here.
It should be noted that although several modules or units of the medical data integration device 900 are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
The electronic device 1000 according to this embodiment of the present invention is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit stores program code that can be executed by the processing unit 1010, so that the processing unit 1010 performs the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.
The storage unit 1020 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 1021 and/or a cache storage unit 1022, and may further include a read-only storage unit (ROM) 1023.
The storage unit 1020 may also include a program/utility 1024 having a set of (at least one) program modules 1025. Such program modules 1025 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., a router, a modem) that enables the electronic device 1000 to communicate with one or more other computing devices. This communication may occur through the input/output (I/O) interface 1050. Moreover, the electronic device 1000 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 via the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
Through the above description of the embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device or the like) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored. In some possible embodiments, various aspects of the present invention can also be implemented in the form of a program product that includes program code.
When the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section above in this specification.
Referring to FIG. 11, a program product 1100 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and it may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus or device.
The program product may take the form of any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries readable program code. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
The program code contained on the readable medium can be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
Other embodiments of the disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the disclosure that follow the general principles of the disclosure and include common knowledge or customary technical means in the technical field that are not disclosed in the disclosure.
It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A medical data integration method, characterized in that the method comprises: acquiring data to be integrated from a plurality of different systems, and determining, among the plurality of data to be integrated, a plurality of first data having the same user birth date; determining user names in one-to-one correspondence with the plurality of first data, and performing similarity calculation on the plurality of user names to obtain a similarity value; and if the similarity value is greater than or equal to a preset similarity value, determining other user data in one-to-one correspondence with the plurality of first data, so as to integrate the plurality of first data whose other user data are consistent to obtain target data.
2. The medical data integration method according to claim 1, characterized in that acquiring the data to be integrated from the plurality of different systems comprises: acquiring data tables to be integrated from the plurality of different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields; wherein the specific data fields comprise a unique data identifier, the user birth date, the user name, and other user data fields, and the other user data fields comprise an identity card number, a birth certificate number, a passport number, and a patient number.
3. The medical data integration method according to claim 2, characterized in that determining the user names in one-to-one correspondence with the plurality of first data comprises: determining values of the unique data identifier in one-to-one correspondence with the plurality of first data; and if the values of the plurality of unique data identifiers differ, determining the plurality of user names in one-to-one correspondence with the plurality of first data.
4. The medical data integration method according to claim 1, characterized in that performing similarity calculation on the plurality of user names to obtain the similarity value comprises: determining a plurality of words contained in the plurality of user names; determining, according to the frequency with which the plurality of words occur in each user name, a high-dimensional vector corresponding to each user name; and calculating, according to the high-dimensional vector corresponding to each user name, the cosine distance between the plurality of user names, so as to determine the similarity value between the plurality of user names.
5. The medical data integration method according to claim 1, characterized in that, if the similarity value is greater than or equal to the preset similarity value, determining the other user data in one-to-one correspondence with the plurality of first data comprises: if the similarity value is greater than or equal to the preset similarity value, determining character lengths in one-to-one correspondence with the plurality of user names; and if the plurality of character lengths are greater than or equal to a preset character length, determining the other user data in one-to-one correspondence with the plurality of first data.
6. The medical data integration method according to claim 2, characterized in that integrating the plurality of first data whose other user data are consistent to obtain the target data comprises: if a plurality of the other user data corresponding to the other user data fields exist in the plurality of first data, evaluating the plurality of other user data fields; and if the other user data fields in one-to-one correspondence with the plurality of first data are consistent, and the other user data corresponding to the other user data fields are consistent, integrating the plurality of first data to obtain the target data.
7. The medical data integration method according to claim 1, characterized in that integrating the plurality of first data to obtain the target data comprises: determining character lengths of the user names in one-to-one correspondence with the plurality of first data, and comparing the plurality of character lengths to obtain a character comparison result; and establishing, according to the character comparison result, a mapping relationship between a first user name and a second user name, so as to integrate the plurality of first data to obtain the target data; wherein the character length of the first user name is greater than the character length of the second user name.
8. The medical data integration method according to claim 7, characterized in that integrating the plurality of first data to obtain the target data comprises: if a specific character exists in the plurality of first data, removing the specific character to obtain the plurality of first data not including the specific character; wherein the specific characters comprise all characters other than American Standard Code for Information Interchange (ASCII) characters; and integrating the plurality of first data not including the specific data to obtain the target data.
9. The medical data integration method according to any one of claims 1-8, characterized in that, after integrating the plurality of first data whose other user data are consistent to obtain the target data, the method further comprises: storing the target data in a specific data format.
10. A medical data integration apparatus, characterized by comprising: a determination module configured to acquire data to be integrated from a plurality of different systems, and to determine, among the plurality of data to be integrated, a plurality of first data having the same user birth date; a similarity calculation module configured to determine user names in one-to-one correspondence with the plurality of first data, and to perform similarity calculation on the plurality of user names to obtain a similarity value; and an integration module configured to, if the similarity value is greater than or equal to a preset similarity value, determine other user data in one-to-one correspondence with the plurality of first data, so as to integrate the plurality of first data whose other user data are consistent to obtain target data.
11. An electronic device, characterized by comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the medical data integration method according to any one of claims 1-9 by executing the executable instructions.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the medical data integration method according to any one of claims 1-9.
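
The matching flow recited in claims 1, 3 to 5, 7 and 8 can be illustrated with a short Python sketch. It is not the applicant's implementation. The record keys (`uid`, `birth_date`, `name` and the entries of `ID_FIELDS`), the function names, the thresholds 0.5 and 3, and the reading of "consistent" as "at least one shared identifier field, and all shared fields equal" are assumptions made here for illustration. The sketch also computes cosine similarity rather than cosine distance, because the claims compare the result against a minimum threshold.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical key names; the claims only name the kinds of fields involved.
ID_FIELDS = ("id_number", "birth_cert_number", "passport_number", "patient_number")


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range (claim 8)."""
    return "".join(ch for ch in text if ord(ch) < 128)


def name_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-frequency vectors built from two names (claim 4)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def other_data_consistent(r1: dict, r2: dict) -> bool:
    """Assumed rule: at least one shared identifier field, and all shared fields agree."""
    shared = [f for f in ID_FIELDS if r1.get(f) and r2.get(f)]
    return bool(shared) and all(r1[f] == r2[f] for f in shared)


def consolidate(records, threshold=0.5, min_len=3):
    """Return (longer-name uid, shorter-name uid) pairs judged to be one person."""
    by_dob = defaultdict(list)
    for rec in records:  # claim 1: group records that share a birth date
        by_dob[rec["birth_date"]].append(rec)

    links = []
    for group in by_dob.values():
        for r1, r2 in combinations(group, 2):
            if r1["uid"] == r2["uid"]:  # claim 3: only compare distinct identifiers
                continue
            n1, n2 = strip_non_ascii(r1["name"]), strip_non_ascii(r2["name"])
            if min(len(n1), len(n2)) < min_len:  # claim 5: minimum name length
                continue
            if name_similarity(n1, n2) < threshold:  # claim 1: similarity threshold
                continue
            if not other_data_consistent(r1, r2):  # claim 1: other user data agree
                continue
            # claim 7: map the longer ("first") name to the shorter ("second") name
            first, second = (r1, r2) if len(n1) >= len(n2) else (r2, r1)
            links.append((first["uid"], second["uid"]))
    return links
```

For example, two records with birth date 1990-01-01, names "Siti Nur Aisyah" and "Siti Aisyah", and the same passport number score about 0.82 and are linked, while a third record named "John Tan" with the same birth date shares no words with either and is left alone. Pairwise comparison inside each birth-date group keeps the sketch simple; it is quadratic in the group size, which is usually acceptable because a birth-date group is small.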
PCT/IB2022/057149 2022-08-02 2022-08-02 Medical data consolidation method and apparatus, computer storage medium, and electronic device WO2024028635A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2022/057149 WO2024028635A1 (en) 2022-08-02 2022-08-02 Medical data consolidation method and apparatus, computer storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2022/057149 WO2024028635A1 (en) 2022-08-02 2022-08-02 Medical data consolidation method and apparatus, computer storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2024028635A1 (en) 2024-02-08

Family

ID=89848613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2022/057149 WO2024028635A1 (en) 2022-08-02 2022-08-02 Medical data consolidation method and apparatus, computer storage medium, and electronic device

Country Status (1)

Country Link
WO (1) WO2024028635A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200013491A1 (en) * 2017-03-13 2020-01-09 Chartspan Medical Technologies, Inc. Interoperable Record Matching Process
CN111223541A (en) * 2020-01-10 2020-06-02 王利 Newborn information matching method and device and terminal equipment
CN112863672A (en) * 2021-03-09 2021-05-28 中电健康云科技有限公司 Patient identity matching method based on PSO algorithm optimization
CN114490642A (en) * 2021-12-31 2022-05-13 上海柯林布瑞信息技术有限公司 Patient master index generation method, apparatus and medium

Similar Documents

Publication Publication Date Title
CN109471863B (en) Information query method and device based on distributed database and electronic equipment
WO2021135910A1 (en) Machine reading comprehension-based information extraction method and related device
WO2019095586A1 (en) Meeting minutes generation method, application server, and computer readable storage medium
WO2022121221A1 (en) Token-based application access method and apparatus, computer device, and medium
JP2018081297A (en) Method and device for processing voice data
CN109670297A (en) Activating method, device, storage medium and the electronic equipment of service authority
US11561972B2 (en) Query conversion for querying disparate data sources
CN111709527A (en) Operation and maintenance knowledge map library establishing method, device, equipment and storage medium
CN110555072A (en) Data access method, device, equipment and medium
CN112559865B (en) Information processing system, computer-readable storage medium, and electronic device
WO2021196935A1 (en) Data checking method and apparatus, electronic device, and storage medium
CN114528044B (en) Interface calling method, device, equipment and medium
WO2019210698A1 (en) Authentication method
WO2021159669A1 (en) Secure system login method and apparatus, computer device, and storage medium
CN111694866A (en) Data searching and storing method, data searching system, data searching device, data searching equipment and data searching medium
CN111935078B (en) Handle-based open authentication method, device and system
CN110598007B (en) Bill file processing method, device, medium and electronic equipment
CN111863178A (en) Method, device, medium and electronic device for issuing medical report
CN115526425A (en) Financial data prediction system and method based on block chain and big data
US10917381B2 (en) Device control system, device, and computer-readable non-transitory storage medium
WO2019071907A1 (en) Method for identifying help information based on operation page, and application server
WO2024028635A1 (en) Medical data consolidation method and apparatus, computer storage medium, and electronic device
CN112966304A (en) Method and device for preventing process document from being tampered, computer equipment and medium
EP4099628A2 (en) Method and apparatus of deploying a certificate, electronic device, and storage medium
CN116913494A (en) Pre-consultation method and system for re-consultation of patients in hospital

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22953902; Country of ref document: EP; Kind code of ref document: A1