WO2024028635A1 - Medical data consolidation method and apparatus, computer storage medium, and electronic device

Info

Publication number: WO2024028635A1
Application number: PCT/IB2022/057149
Authority: WO
Inventors: 巴婕菈; 阿齐姆; 陈嘉源; 褚兆玮; 郭锋
Original assignee: Evyd Technology Co., Ltd.
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2024-02-08

Abstract

The present disclosure relates to the field of data processing, and to a medical data consolidation method and apparatus, a storage medium, and an electronic device. The method comprises: obtaining data to be consolidated from a plurality of different systems, and determining, among the plurality of pieces of data to be consolidated, a plurality of pieces of first data of users having the same date of birth; determining a plurality of user names in one-to-one correspondence to the plurality of pieces of first data, and performing similarity calculation on the plurality of user names to obtain a similarity value; and if the similarity value is greater than or equal to a preset similarity value, determining a plurality of pieces of other user data in one-to-one correspondence to the plurality of pieces of first data, so as to consolidate the plurality of pieces of first data consistent with the plurality of pieces of other user data to obtain target data. In the present disclosure, the first data is the data of the users having the same date of birth among the data to be consolidated from the plurality of different systems, so that the consolidation of the data to be consolidated in the different systems is realized, and the situation in the prior art that data to be consolidated needs to be manually consolidated is avoided, thereby improving the consolidation accuracy and efficiency.

Description

Medical Data Integration Method and Device, Computer Storage Medium, and Electronic Equipment
Technical Field
The present disclosure relates to the field of data processing, and in particular to a medical data integration method and device, a computer-readable storage medium, and an electronic device.
BACKGROUND OF THE INVENTION
With the development of computer technology, a user may have multiple records in multiple different systems. Consequently, when the user's data needs to be verified, the user's records in all of those systems need to be verified. In related technologies, when a data verification request for a user is received, the request is transferred to the relevant personnel of the data group, who manually analyze and merge the multiple records existing in the different systems to obtain the merged user data. In this case, as the records in the different systems continue to increase, the user data has to be merged again, which reduces the accuracy and efficiency of user data merging and can lead to user data verification failure. In view of this, there is an urgent need in this field for a new medical data integration method and device. It should be noted that the information disclosed in the above background section is only used to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
SUMMARY OF THE INVENTION
The purpose of the present disclosure is to provide a medical data integration method, a medical data integration device, a computer-readable storage medium and an electronic device, thereby overcoming, at least to a certain extent, the problem of low user data integration accuracy and efficiency in related technologies.
Additional features and advantages of the disclosure will be apparent from the following detailed description, or may be learned in part by practice of the disclosure.
According to a first aspect of an embodiment of the present invention, a medical data integration method is provided. The method includes: acquiring data to be integrated from multiple different systems, and determining, among the multiple data to be integrated, multiple first data having the same user date of birth; determining multiple user names in one-to-one correspondence with the multiple first data, and performing similarity calculation on the multiple user names to obtain a similarity value; and, if the similarity value is greater than or equal to a preset similarity value, determining other user data in one-to-one correspondence with the multiple first data, so as to integrate the multiple first data that are consistent with the other user data to obtain target data.
In an exemplary embodiment of the present invention, obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from the multiple different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields; wherein the specific data fields include a unique data identifier, the user's date of birth, the user name, and other user data fields, and the other user data fields include ID number, birth certificate number, passport number and patient number.
In an exemplary embodiment of the present invention, determining the user names in one-to-one correspondence with the multiple first data includes: determining the values of the unique data identifiers in one-to-one correspondence with the multiple first data; and, if the values of the multiple unique data identifiers are different, determining the multiple user names in one-to-one correspondence with the multiple first data.
In an exemplary embodiment of the present invention, performing similarity calculation on the multiple user names to obtain a similarity value includes: determining the multiple words contained in the multiple user names; determining a high-dimensional vector corresponding to each user name according to the frequency with which the multiple words occur in each user name; and calculating, based on the high-dimensional vector corresponding to each user name, the cosine distance between the multiple user names to determine the similarity value between the multiple user names.
In an exemplary embodiment of the present invention, if the similarity value is greater than or equal to a preset similarity value, determining other user data in one-to-one correspondence with the multiple first data includes: if the similarity value is greater than or equal to the preset similarity value, determining the character lengths corresponding to the multiple user names; and, if the multiple character lengths are greater than or equal to a preset character length, determining the multiple other user data in one-to-one correspondence with the multiple first data.
In an exemplary embodiment of the present invention, integrating the multiple first data that are consistent with the other user data to obtain target data includes: determining the other user data fields corresponding to the multiple other user data; and, if the other user data fields corresponding to the multiple first data are consistent and the other user data corresponding to those other user data fields are also consistent, integrating the multiple first data to obtain the target data.
In an exemplary embodiment of the present invention, integrating the multiple first data to obtain target data includes: determining the character lengths of the user names in one-to-one correspondence with the multiple first data, and comparing the multiple character lengths to obtain a character comparison result; and establishing, according to the character comparison result, a mapping relationship between a first user name and a second user name, so as to integrate the multiple first data to obtain the target data; wherein the character length of the first user name is greater than the character length of the second user name.
In an exemplary embodiment of the present invention, integrating the multiple first data to obtain target data includes: if specific characters exist in the multiple first data, removing the specific characters to obtain multiple first data that do not include the specific characters, wherein the specific characters include all characters other than those of the American Standard Code for Information Interchange (ASCII); and integrating the multiple first data that do not include the specific characters to obtain the target data.
In an exemplary embodiment of the present invention, after the multiple first data that are consistent with the other user data are integrated to obtain the target data, the method further includes: storing the target data according to a specific data format.
According to a second aspect of the embodiment of the present invention, a medical data integration device is provided. The device includes: a determination module configured to obtain data to be integrated from multiple different systems and to determine, among the multiple data to be integrated, multiple first data having the same user date of birth; a similarity calculation module configured to determine user names in one-to-one correspondence with the multiple first data and to perform similarity calculation on the multiple user names to obtain a similarity value; and an integration module configured to, if the similarity value is greater than or equal to the preset similarity value, determine other user data in one-to-one correspondence with the multiple first data, so as to integrate the multiple first data that are consistent with the other user data to obtain target data.
According to a third aspect of the embodiment of the present invention, an electronic device is provided, including a processor and a memory, wherein computer-readable instructions are stored on the memory, and when the computer-readable instructions are executed by the processor, the medical data integration method of any of the above exemplary embodiments is implemented.
According to a fourth aspect of an embodiment of the present invention, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the medical data integration method in any of the above exemplary embodiments is implemented.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.
Figure 1 schematically shows a flow chart of the medical data integration method in an embodiment of the present disclosure;
Figure 2 schematically shows the data to be integrated in the medical data integration method in an embodiment of the present disclosure;
Figure 3 schematically shows a flow chart of determining multiple user names corresponding to multiple first data in the medical data integration method in an embodiment of the present disclosure;
Figure 4 schematically shows a flow chart of performing similarity calculation on multiple user names to obtain a similarity value in the medical data integration method in an embodiment of the present disclosure;
Figure 5 schematically shows a flow chart of determining other user data corresponding to multiple first data in the medical data integration method in an embodiment of the present disclosure;
Figure 6 schematically shows a flow chart of integrating multiple first data that are consistent with other user data to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 7 schematically shows a flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 8 schematically shows a flow chart of integrating multiple first data to obtain target data in the medical data integration method in an embodiment of the present disclosure;
Figure 9 schematically shows a structural diagram of a medical data integration device in an embodiment of the present disclosure;
Figure 10 schematically shows an electronic device used for the medical data integration method in an embodiment of the present disclosure;
Figure 11 schematically shows a computer-readable storage medium used for the medical data integration method in an embodiment of the present disclosure.
DETAILED DESCRIPTION
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. may be adopted. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The terms "a", "an", "the" and "said" are used in this specification to indicate the existence of one or more elements/components/etc.; the terms "include" and "have" are used in an open-ended, inclusive sense and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first", "second", etc. are used as labels only and do not limit the number of the objects they refer to.
In addition, the accompanying drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings represent the same or similar parts, and thus their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities.
In response to the problems existing in related technologies, the present disclosure proposes a medical data integration method. Figure 1 shows a flow chart of the medical data integration method. As shown in Figure 1, the medical data integration method includes at least the following steps:
Step S110. Obtain data to be integrated from multiple different systems, and determine, among the multiple data to be integrated, multiple first data having the same user date of birth.
Step S120. Determine the user names in one-to-one correspondence with the multiple first data, and perform similarity calculation on the multiple user names to obtain a similarity value.
Step S130. If the similarity value is greater than or equal to the preset similarity value, determine other user data in one-to-one correspondence with the multiple first data, so as to integrate the multiple first data that are consistent with the other user data to obtain target data.
In the method and device provided in the exemplary embodiments of the present disclosure, multiple first data that are consistent with other user data are integrated to obtain target data, and the first data are the data having the same user date of birth among the data to be integrated from multiple different systems, thereby realizing the integration of the data to be integrated across different systems. On the one hand, this avoids the verification failure that occurs in the existing technology when the data to be integrated is verified, and improves the efficiency of verification; on the other hand, it avoids the manual integration of the data to be integrated that is required in the existing technology, thereby improving the integration accuracy and efficiency. Each step of the medical data integration method is described in detail below.
In step S110, data to be integrated from multiple different systems is obtained, and multiple first data having the same user date of birth are determined among the multiple data to be integrated.
In the exemplary embodiment of the present disclosure, the data to be integrated refers to data from different systems. Specifically, it may be data from an immigration data system, data from a birth database, data from a hospital database, or data from any other system in which data to be integrated may exist, which is not particularly limited in this exemplary embodiment. Each piece of data to be integrated corresponds to a user, and these users may be the same user or different users. The user's date of birth refers to the value corresponding to the user's birth date in the data to be integrated. Accordingly, the first data refers to the data to be integrated whose user dates of birth are the same.
For example, there are six pieces of data to be integrated, which come from different systems: data A to be integrated, data B to be integrated, data C to be integrated, data D to be integrated, data E to be integrated and data F to be integrated. The birth date of the user corresponding to data A to be integrated is 2012-02-02, and the birth date of the user corresponding to data B to be integrated is 2000-10-19.
The birth date of the user corresponding to data C to be integrated is 2012-02-02, the birth date of the user corresponding to data D to be integrated is 2012-02-05, the birth date of the user corresponding to data E to be integrated is 1998-06-23, and the birth date of the user corresponding to data F to be integrated is 1998-06-23. Based on this, the first data having the same user date of birth are determined to be data A, data C, data E and data F to be integrated, where data A and data C share one birth date and data E and data F share another.
In an optional embodiment, obtaining data to be integrated from multiple different systems includes: obtaining data tables to be integrated from the multiple different systems, and extracting, from the data tables to be integrated, the data to be integrated corresponding to specific data fields; wherein the specific data fields include a unique data identifier, the user's date of birth, the user name and other user data fields, and the other user data fields include ID number, birth certificate number, passport number and patient number.
In different systems, the data to be integrated is usually stored in a data table to be integrated. It is worth mentioning that, in addition to the data to be integrated, the data table to be integrated also stores other data that does not need to be integrated, so the data to be integrated, which corresponds to the specific data fields, needs to be extracted from the data table to be integrated. The specific data fields refer to the fields corresponding to the data to be integrated. The specific data fields include a unique data identifier, which corresponds to the different systems. The specific data fields also include the user's date of birth.
The data to be integrated corresponding to the user's date of birth field may be, for example, 1988-07-23. The specific data fields also include the user name, and the corresponding data to be integrated may be "Jane Doe". The specific data fields also include other user data fields. Specifically, the other user data fields include the ID card number, for which the corresponding data to be integrated may be "610235XXXXXXXX2771"; the birth certificate number, for which it may be "2XX0410"; the passport number, for which it may be "C0XXXXX2"; and the patient number, for which it may be "BNXXXX6".
For example, Figure 2 schematically shows the data to be integrated. As shown in Figure 2, field 210 is the line number field line, and value 211 is the value corresponding to field 210, which indicates the row in which the data to be integrated is located; specific data field 220 is the unique data identifier unique_id, and identifier 221 is the field value corresponding to specific data field 220; specific data field 230 is the patient number patient_id, and number 231 is the patient number corresponding to specific data field 230; specific data field 240 is the user name user_name, and name 241 is the user name corresponding to specific data field 240; specific data field 250 is the user's birth date birth_date, and date 251 is the date corresponding to specific data field 250; field 260 is gender sex, and data 261 corresponds to field 260; specific data field 270 is the ID card number id_number, and number 271 corresponds to specific data field 270; specific data field 280 is the birth certificate number, and number 281 corresponds to specific data field 280; specific data field 290 is the passport number, and number 291 corresponds to specific data field 290.
In this exemplary embodiment, the data to be integrated is the data corresponding to the specific data fields extracted from the data table to be integrated, which avoids obtaining other data in the table that does not need to be integrated, and improves the accuracy and efficiency of the subsequent integration.
In step S120, user names in one-to-one correspondence with the multiple first data are determined, and similarity calculation is performed on the multiple user names to obtain a similarity value.
In the exemplary embodiment of the present disclosure, the user name refers to the user name corresponding to the first data. It is worth mentioning that the first data are the data having the same user date of birth determined among the multiple data to be integrated. Although the dates of birth are the same, the same user may be recorded under different user names in different systems. Therefore, to determine whether multiple first data with the same user date of birth are the data of the same user, similarity calculation needs to be performed on the multiple user names to obtain the similarity value between them.
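The field extraction and birth-date grouping of step S110 described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names, the dictionary-based row format, and the field spellings `birth_certificate_number` and `passport_number` are assumptions, and the letters A to F are used as hypothetical identifier values only to mirror the six-record example.

```python
from collections import defaultdict

# Field names follow Figure 2 where the text gives them (unique_id, patient_id,
# user_name, birth_date, id_number); the last two names are assumed spellings.
SPECIFIC_FIELDS = (
    "unique_id", "patient_id", "user_name", "birth_date",
    "id_number", "birth_certificate_number", "passport_number",
)


def extract_specific_fields(row):
    """Keep only the specific data fields of one table row (missing -> None)."""
    return {field: row.get(field) for field in SPECIFIC_FIELDS}


def group_first_data(records):
    """Group records by birth date; keep only dates shared by 2+ records."""
    groups = defaultdict(list)
    for record in records:
        if record["birth_date"] is None:
            continue  # a record without a birth date cannot be matched here
        groups[record["birth_date"]].append(record)
    return {date: group for date, group in groups.items() if len(group) > 1}


# Hypothetical rows mirroring data A to F in the example; "sex" is a column
# that does not need to be integrated and is dropped during extraction.
rows = [
    {"unique_id": "A", "birth_date": "2012-02-02", "sex": "F"},
    {"unique_id": "B", "birth_date": "2000-10-19", "sex": "M"},
    {"unique_id": "C", "birth_date": "2012-02-02", "sex": "F"},
    {"unique_id": "D", "birth_date": "2012-02-05", "sex": "M"},
    {"unique_id": "E", "birth_date": "1998-06-23", "sex": "F"},
    {"unique_id": "F", "birth_date": "1998-06-23", "sex": "F"},
]
records = [extract_specific_fields(row) for row in rows]
first_data = group_first_data(records)
for date, group in first_data.items():
    print(date, [record["unique_id"] for record in group])
```

With these rows, only the pairs A/C and E/F survive the grouping, which matches the first data identified in the example.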
For example, the first data include data A to be integrated and data C to be integrated, which have the same user date of birth. Based on this, the user name corresponding to data A to be integrated is determined to be "George Herbert Walker Bush", and the user name corresponding to data C to be integrated is determined to be "George Walker Bush". Similarity calculation is then performed on the user name "George Herbert Walker Bush" and the user name "George Walker Bush" to obtain a similarity value of 0.85.
In an optional embodiment, Figure 3 shows a flow chart of determining the user names corresponding to multiple first data in the medical data integration method. As shown in Figure 3, the method includes at least the following steps. In step S310, the values of the unique data identifiers in one-to-one correspondence with the multiple first data are determined.
The value of the unique data identifier is related to the system from which the first data comes, and different values of the unique data identifier correspond to different systems. Therefore, the values of the multiple unique data identifiers corresponding to the multiple first data need to be determined first, to ensure that first data from different systems can be integrated later.
For example, the multiple first data include data A to be integrated and data C to be integrated, which have the same user date of birth. Based on this, the value of the unique data identifier corresponding to data A to be integrated is determined to be XX1, and the value of the unique data identifier corresponding to data C to be integrated is determined to be XX2.
In step S320, if the values of the multiple unique data identifiers are different, multiple user names in one-to-one correspondence with the multiple first data are determined.
When the values of the multiple unique data identifiers are different, the multiple first data come from different systems, and only then do the user names corresponding to the multiple first data need to be determined. When the values of the multiple unique data identifiers are the same, the multiple first data come from the same system. Because data storage within one system follows a unified standard, the first data in the same system are already data obtained after data integration processing. Therefore, when the values of the multiple unique data identifiers are the same, there is no need to integrate the multiple first data.
For example, the multiple first data include data A to be integrated and data C to be integrated, which have the same user date of birth. Based on this, the unique data identifier corresponding to data A to be integrated is determined to be XX1, and the unique data identifier corresponding to data C to be integrated is determined to be XX2. Since the unique data identifiers are not the same, the user name "George Herbert Walker Bush" corresponding to data A to be integrated and the user name "George Walker Bush" corresponding to data C to be integrated need to be determined.
In this exemplary embodiment, if the values of the multiple unique data identifiers are different, the multiple user names respectively corresponding to the multiple first data are determined, which ensures that the objects of the similarity calculation are the user names of data to be integrated that come from different systems and have the same user date of birth. This realizes data integration from the user name dimension and improves the efficiency of integrating the data to be integrated.
In an optional embodiment, Figure 4 shows a flow chart of performing similarity calculation on multiple user names to obtain a similarity value in the medical data integration method. As shown in Figure 4, the method includes at least the following steps. In step S410, the multiple words contained in the multiple user names are determined.
Here, the multiple words refer to the words included in the user names. For example, if the multiple user names include "George Herbert Walker Bush" and "George Walker Bush", the words contained in the multiple user names are determined to be "George", "Herbert", "Walker" and "Bush".
In step S420, the high-dimensional vector corresponding to each user name is determined based on the frequency with which the multiple words appear in each user name.
First, the multiple user names corresponding to the multiple first data are determined, for example "George Herbert Walker Bush" and "George Walker Bush". The frequency with which the multiple words appear in each user name refers to the number of times each word occurs in the corresponding user name; for example, the words "George", "Herbert", "Walker" and "Bush" each appear once in the user name "George Herbert Walker Bush". The dimension of the high-dimensional vector is related to the number of distinct words appearing in the user names corresponding to the multiple first data. For example, a total of four words appear in the above two user names, namely "George", "Herbert", "Walker" and "Bush", so the high-dimensional vector corresponding to each user name can be a four-dimensional vector.
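Steps S410 and S420 can be illustrated with a short sketch that builds one word-count vector per user name. The function name is hypothetical, and two details are assumptions not stated in the text: words are obtained by splitting on whitespace, and the comparison is case-sensitive.

```python
from collections import Counter


def word_frequency_vectors(user_names):
    """Return (vocabulary, vectors): one word-count vector per user name."""
    # Assumption: words are separated by whitespace and compared as written.
    tokenized = [name.split() for name in user_names]

    # Vocabulary in order of first appearance; its size is the vector dimension.
    vocabulary = []
    for words in tokenized:
        for word in words:
            if word not in vocabulary:
                vocabulary.append(word)

    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = word_frequency_vectors(
    ["George Herbert Walker Bush", "George Walker Bush"]
)
print(vocabulary)  # ['George', 'Herbert', 'Walker', 'Bush']
print(vectors)     # [[1, 1, 1, 1], [1, 0, 1, 1]]
```

The vector dimension equals the number of distinct words across all compared names, so both names are represented in the same four-dimensional space.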
For example, the user names corresponding to the multiple first data are "George Herbert Walker Bush" and "George Walker Bush". The words "George", "Herbert", "Walker" and "Bush" each appear once in the user name "George Herbert Walker Bush"; the words "George", "Walker" and "Bush" each appear once in the user name "George Walker Bush", and the word "Herbert" appears 0 times in the user name "George Walker Bush". Based on this, the four-dimensional vector [1 1 1 1] corresponding to the user name "George Herbert Walker Bush" and the four-dimensional vector [1 0 1 1] corresponding to the user name "George Walker Bush" are obtained.
In step S430, based on the high-dimensional vector corresponding to each user name, the cosine distance between the multiple user names is calculated to determine the similarity value between the multiple user names.
Here, the cosine distance refers to the cosine of the angle between the high-dimensional vectors, and this cosine value is the similarity value obtained by calculating the cosine distance between the high-dimensional vectors. Specifically, a calculation formula may be used to calculate the cosine distance between the high-dimensional vectors, or an algorithm may be used, which is not specifically limited in this exemplary embodiment.
For example, the high-dimensional vectors are [1 1 1 1] and [1 0 1 1], and formula (1) can be used to calculate the cosine distance between them, thereby obtaining the similarity value between the user names corresponding to the high-dimensional vectors:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{1}$$
where $A$ and $B$ denote the two high-dimensional vectors, $n$ is their dimension, and $\theta$ is the angle between them.
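The cosine calculation of step S430 can be sketched as below. This is a minimal standard-library illustration of the cosine of the angle between two vectors; the function name and the choice to return 0.0 for an all-zero vector are assumptions rather than details from the disclosure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty name is treated as dissimilar
    return dot / (norm_a * norm_b)


# Word-count vectors of "George Herbert Walker Bush" and "George Walker Bush".
similarity = cosine_similarity([1, 1, 1, 1], [1, 0, 1, 1])
print(round(similarity, 3))  # 0.866
```

Because word counts are never negative, the result always lies between 0 and 1, with 1 meaning the two names contain the same words in the same proportions.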

In this exemplary embodiment, a method of calculating the similarity between multiple user names is provided, which helps to subsequently determine, based on the similarity value, whether multiple first data are the data of the same user, and thus ensures the accuracy of subsequent data integration.
In step S130, if the similarity value is greater than or equal to the preset similarity value, other user data in one-to-one correspondence with the multiple first data are determined, so as to integrate the multiple first data that are consistent with the other user data to obtain the target data.
In an exemplary embodiment of the present disclosure, the preset similarity value refers to a threshold that is compared with the similarity value and used to determine whether the multiple user names respectively corresponding to the multiple first data are similar. If the similarity value is greater than or equal to the preset similarity value, the user birth dates and user names of the multiple first data are regarded as the same, and the other user data corresponding to the first data then need to be determined, so as to integrate the first data whose other user data are also consistent and obtain the target data. If the similarity value is less than the preset similarity value, the multiple first data are not the data of the same user, and there is no need to integrate them.
For example, the first data include data A to be integrated and data C to be integrated, and the calculated similarity value between the user name of data A to be integrated and the user name of data C to be integrated is 0.92. Since the preset similarity value is 0.9, the similarity value is greater than the preset similarity value, and other user data A-1 corresponding to data A to be integrated and other user data C-1 corresponding to data C to be integrated are then determined.
If other user data A-1 is consistent with other user data C-1, data A to be integrated and data C to be integrated are data of the same user, and data A and data C are then integrated to obtain the target data.

In an optional embodiment, Figure 5 shows a schematic flowchart of determining other user data in one-to-one correspondence with the plurality of first data in the medical data integration method. As shown in Figure 5, the method includes at least the following steps:

In step S510, if the similarity value is greater than or equal to the preset similarity value, the character lengths corresponding to the plurality of user names are determined. Assuming that the user name is "Jone Doe", the corresponding character length is 8. The character length is determined because a similarity value greater than or equal to the preset similarity value cannot fully guarantee that the user names are consistent; checking the character length as well helps ensure that the user names really belong to the same user. For two short user names such as "Gone" and "Goie", judging by the similarity value alone could lead to the conclusion that they are similar, which obviously does not match the facts. For example, the similarity value is 0.92 and the preset similarity value is 0.9, so the similarity value is greater than the preset similarity value. Based on this, the character length corresponding to user name XX1 is determined to be 26, and the character length corresponding to user name XX2 is determined to be 23.

In step S520, if the plurality of character lengths are all greater than or equal to the preset character length, other user data in one-to-one correspondence with the plurality of first data are determined. The preset character length is a threshold used to further judge whether the user names are consistent. If the character lengths are greater than or equal to the preset character length, the user names are considered consistent, and the other user data included in the first data corresponding to those user names are then determined, so that it can subsequently be judged whether the other user data are consistent and therefore whether the first data can be integrated. For example, the character length corresponding to user name XX1 is 26, the character length corresponding to user name XX2 is 23, and the preset character length is 20. Both character lengths are greater than 20, so the other user data included in the first data corresponding to user name XX1 is determined to be A-1, and the other user data included in the first data corresponding to user name XX2 is determined to be C-1.

In this exemplary embodiment, if the similarity value is greater than or equal to the preset similarity value and the plurality of character lengths are greater than or equal to the preset character length, other user data corresponding to the plurality of first data are determined.
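The two-stage check of steps S510 and S520 can be sketched as a single predicate. The function name `passes_name_checks` is hypothetical, the default thresholds 0.9 and 20 are simply the values used in the examples above, and counting spaces as characters follows the "Jone Doe" example (length 8); none of this is prescribed by the disclosure.

```python
def passes_name_checks(similarity, user_names,
                       preset_similarity=0.9, preset_length=20):
    """Return True only if the names pass both the similarity and length checks."""
    # Step S510 precondition: similarity must reach the preset similarity value.
    if similarity < preset_similarity:
        return False
    # Step S520: every user name must reach the preset character length.
    # len() counts spaces, matching the "Jone Doe" example (length 8).
    return all(len(name) >= preset_length for name in user_names)


# Placeholder names with the character lengths 26 and 23 from the example.
long_names = ["x" * 26, "y" * 23]
proceed = passes_name_checks(0.92, long_names)             # True
too_short = passes_name_checks(0.92, ["Gone", "Goie"])     # False
```

Only when the predicate returns True would the other user data (for example A-1 and C-1) be looked up for comparison.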
On the one hand, whether the user names are consistent is judged from two dimensions, the similarity value and the character length, which makes the judgment more rigorous; on the other hand, the other user data of the first data are determined only when the user names are consistent, which improves the efficiency of data integration.

In an optional embodiment, Figure 6 shows a schematic flowchart of integrating a plurality of first data whose other user data are consistent to obtain target data in the medical data integration method. As shown in Figure 6, the method includes at least the following steps:

In step S610, if the plurality of first data contain a plurality of other user data corresponding to other user data fields, the plurality of other user data fields are judged. Each other user data field corresponds to one kind of other user data. For example, when the other user data field is id_number, the corresponding other user data is the ID card number. Specifically, the other user data include the ID card number, the birth certificate number, the passport number and the patient number. Accordingly, in addition to the other user data field corresponding to the ID card number, there are other user data fields corresponding to the birth certificate number, the passport number and the patient number. The ID card number, birth certificate number, passport number and patient number all need to be compared to ensure that the plurality of first data are data that need to be integrated.
However, the first data may include all four kinds of other user data (ID card number, birth certificate number, passport number and patient number), or only any one, any two or any three of them. Based on this, regardless of which of the four kinds of other user data the plurality of first data include, it can first be determined whether the first data contain other user data corresponding to other user data fields. If so, it is then determined whether the plurality of other user data fields are consistent.

For example, the first data are data A to be integrated and data C to be integrated. Data A contains the ID card number 6213004 corresponding to the other user data field id_number, and data C also contains the ID card number 6213004 corresponding to the other user data field id_number. In this case, both first data contain other user data corresponding to an other user data field, and the other user data field is id_number in both.

As another example, the first data are data E to be integrated and data F to be integrated. Data E contains the ID card number 12456 corresponding to the other user data field id_number, and data F also contains the ID card number 12456 corresponding to the other user data field id_number. In addition, data E contains the passport number 0023 corresponding to the other user data field passport, and data F contains the passport number 2256 corresponding to the other user data field passport. In this case, both first data contain other user data corresponding to other user data fields, and the other user data fields are consistent.

In step S620, if the other user data fields corresponding to the plurality of first data are consistent and the other user data corresponding to those fields are also consistent, the plurality of first data are integrated to obtain the target data.

For example, the first data are data A to be integrated and data C to be integrated. Data A contains the ID card number 6213004 corresponding to the other user data field id_number, and data C also contains the ID card number 6213004 corresponding to the other user data field id_number. In this case, both first data contain other user data corresponding to an other user data field, the other user data field is id_number in both, and the other user data corresponding to id_number is 6213004 in both, so data A and data C are integrated to obtain the target data.

As another example, the first data are data E to be integrated and data F to be integrated. Data E contains the ID card number 12456 corresponding to the other user data field id_number, and data F also contains the ID card number 12456 corresponding to the other user data field id_number.
In addition, data E to be integrated contains the passport number 0023 corresponding to the other user data field passport, and data F to be integrated contains the passport number 2256 corresponding to the other user data field passport. In this case, both first data contain other user data corresponding to other user data fields, and the other user data fields are consistent. However, the passport numbers corresponding to the other user data field passport are inconsistent, so data E and data F are not integrated.

In this exemplary embodiment, the plurality of first data are integrated to obtain the target data only if they contain a plurality of other user data corresponding to other user data fields and both the other user data fields and the other user data are consistent, which makes the integration logic more rigorous and improves the accuracy of data integration.

In an optional embodiment, FIG. 7 shows a schematic flowchart of integrating a plurality of first data to obtain target data in the medical data integration method. As shown in FIG. 7, the method includes at least the following steps:

In step S710, the character lengths of the user names in one-to-one correspondence with the plurality of first data are determined, and the character lengths are compared to obtain a character comparison result. The character length of a user name is the number of characters that make up the user name, the plurality of character lengths are the character lengths of the user names respectively corresponding to the plurality of first data, and the character comparison result is the result of comparing these character lengths. For example, the first data include data A to be integrated and data C to be integrated.
The user name corresponding to data A to be integrated is "Jone", and the user name corresponding to data C to be integrated is "Jone Doe". Obviously, the character length of the user name "Jone Doe" is greater than that of the user name "Jone".

In step S720, according to the character comparison result, a mapping relationship between a first user name and a second user name is established so as to integrate the plurality of first data and obtain the target data, wherein the character length of the first user name is greater than the character length of the second user name. It should be noted that the character length of the first user name must be greater than that of the second user name.

For example, the first data include data A to be integrated and data C to be integrated. The user name corresponding to data A is "Jone", and the user name corresponding to data C is "Jone Doe". The character length of the user name "Jone Doe" is greater than that of the user name "Jone". Based on this, the first user name is "Jone Doe" and the second user name is "Jone". Specifically, the mapping relationship between the first user name and the second user name can be established in the form of a key-value pair, for example "Jone Doe": "Jone", which indicates that the user corresponding to the first user name "Jone Doe" and the user corresponding to the second user name "Jone" are the same user.

It is worth mentioning that in the subsequent data verification process the target data is queried by key. Since the key is the user name with the longer character length, querying by key helps ensure that the correct target data is retrieved, so that the data can be verified accurately.
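The consistency check of steps S610 and S620 and the name mapping of steps S710 and S720 can be sketched together as follows. This is one possible reading, not the disclosed implementation. The field names `id_number` and `passport` appear in the examples above, while `birth_certificate_number`, `patient_number` and all function names are hypothetical. Treating "consistent fields" as "the same set of populated fields" and refusing to merge when no identifier is present are assumptions.

```python
# id_number and passport come from the text; the other two names are hypothetical.
IDENTIFIER_FIELDS = ("id_number", "birth_certificate_number",
                     "passport", "patient_number")


def populated_identifiers(record):
    """Return the identifier fields that are actually present in a record."""
    return {field: record[field] for field in IDENTIFIER_FIELDS if record.get(field)}


def identifiers_consistent(records):
    """Steps S610/S620: same populated fields and same values in every record."""
    first = populated_identifiers(records[0])
    if not first:
        return False  # assumption: nothing to compare means no integration
    return all(populated_identifiers(r) == first for r in records[1:])


def build_name_mapping(name_a, name_b):
    """Steps S710/S720: key is the longer user name, value is the shorter one."""
    # sorted() is stable, so on a tie name_a stays the key (an assumption).
    first_name, second_name = sorted((name_a, name_b), key=len, reverse=True)
    return {first_name: second_name}


data_a = {"user_name": "Jone", "id_number": "6213004"}
data_c = {"user_name": "Jone Doe", "id_number": "6213004"}
data_e = {"id_number": "12456", "passport": "0023"}
data_f = {"id_number": "12456", "passport": "2256"}

merge_a_c = identifiers_consistent([data_a, data_c])  # True
merge_e_f = identifiers_consistent([data_e, data_f])  # False: passports differ
mapping = build_name_mapping(data_a["user_name"], data_c["user_name"])
# mapping == {"Jone Doe": "Jone"}
```

Keeping the longer name as the key mirrors the key-value example "Jone Doe": "Jone", so that later lookups use the more specific name.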
In this exemplary embodiment, a mapping relationship between the first user name and the second user name is established based on the character comparison result, which ensures that, when the data is subsequently verified, the target data can be found through the first user name with the longer character length. This avoids target data query errors caused by the second user name with the shorter character length, which could lead to subsequent verification failure.

In an optional embodiment, Figure 8 shows a schematic flowchart of integrating a plurality of first data to obtain target data in the medical data integration method. As shown in Figure 8, the method includes at least the following steps:

In step S810, if specific characters exist in the plurality of first data, the specific characters are removed to obtain a plurality of first data that do not include the specific characters, wherein the specific characters include all characters other than those of the American Standard Code for Information Interchange. The American Standard Code for Information Interchange is ASCII, so the specific characters are the non-ASCII characters. The specific characters need to be removed because non-ASCII characters in the plurality of first data cannot be processed; once they are removed, the plurality of first data can be processed. For example, the first data are data A to be integrated and data C to be integrated. Data A contains the non-ASCII characters XXX, that is, specific characters exist in the first data, and XXX is removed from data A to obtain data A that does not include the specific characters.

In step S820, the plurality of first data that do not include the specific characters are integrated to obtain the target data.
After the non-ASCII characters are removed, the plurality of first data that do not include non-ASCII characters are integrated to obtain the target data. For example, the first data are data A to be integrated and data C to be integrated. Data A includes the specific characters XXX, that is, non-ASCII characters exist in the first data. XXX is removed from data A to obtain data A that does not include non-ASCII characters, and data C and the cleaned data A are then integrated to obtain the target data.

In this exemplary embodiment, the non-ASCII characters present in the first data are removed, which avoids the subsequent inability to process them and improves the accuracy and efficiency of the target data obtained by integrating the plurality of first data.

In an optional embodiment, after the plurality of first data whose other user data are consistent are integrated to obtain the target data, the method further includes: storing the target data in a specific data format. Specifically, the specific data format may be the JSON (JavaScript Object Notation) format, a table format, a format corresponding to a particular database, or any other data format, which is not specifically limited in this exemplary embodiment. For example, after the plurality of first data are integrated, the obtained target data is stored in JSON format.

In this exemplary embodiment, the target data is stored in a specific data format to facilitate subsequent identification of the target data, thereby improving verification efficiency when data verification is required for the user.
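The character removal of steps S810 and S820 and the JSON storage option can be sketched as follows. The disclosure does not specify the layout of the target data, so the `sources` key, the record field names and the function names here are hypothetical; the sketch only shows one way to drop non-ASCII characters and serialize the result.

```python
import json


def strip_non_ascii(text):
    """Step S810: drop every character outside the ASCII range (code points 0 to 127)."""
    return "".join(ch for ch in text if ord(ch) < 128)


def clean_record(record):
    """Apply the removal to every string value of a record; leave other values alone."""
    return {key: strip_non_ascii(value) if isinstance(value, str) else value
            for key, value in record.items()}


def integrate_to_json(records):
    """Step S820 plus storage: collect the cleaned records and serialize as JSON."""
    # Hypothetical layout: the target data simply lists its cleaned source records.
    target = {"sources": [clean_record(record) for record in records]}
    return json.dumps(target, sort_keys=True)


data_a = {"user_name": "Jone Doe\u00e9\u4e2d", "id_number": "6213004"}
data_c = {"user_name": "Jone Doe", "id_number": "6213004"}
stored = integrate_to_json([data_a, data_c])
```

A table format or a database-specific format would replace only the final serialization step; the cleaning step stays the same.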
In the method and apparatus provided by the exemplary embodiments of the present disclosure, target data is obtained by integrating a plurality of first data whose other user data are consistent, and the first data are the data, among the data to be integrated from multiple different systems, whose users have the same birth date. This realizes the integration of data to be integrated from different systems. On the one hand, it avoids the prior-art verification failures that occur when the data to be integrated is verified and improves verification efficiency; on the other hand, it avoids the prior-art situation in which the data to be integrated cannot all be integrated, thereby improving the accuracy and efficiency of integrating the data.

The medical data integration method in the embodiments of the present disclosure is described in detail below in conjunction with an application scenario.

Data A to be integrated is obtained from system 1, data B to be integrated from system 2, data C to be integrated from system 3, and data D to be integrated from system 4. Since the birth date of the user corresponding to data A is the same as the birth date of the user corresponding to data B, the plurality of first data are determined to be data A and data B. The user name corresponding to data A is determined to be "Jone Doe", and the user name corresponding to data B is "Jone". Similarity calculation on these two user names yields a similarity value of 1.2. Since the similarity value is greater than the preset similarity value of 0.9, other user data A-1 corresponding to data A and other user data B-1 corresponding to data B are determined. Since other user data A-1 is consistent with other user data B-1, data A and data B are integrated to obtain the target data.

In this application scenario, a plurality of first data whose other user data are consistent are integrated to obtain the target data, and the first data are the data, among the data to be integrated from multiple different systems, whose users have the same birth date, thereby realizing the integration of data to be integrated from different systems. On the one hand, this avoids the prior-art verification failures that occur when the data to be integrated is verified and improves verification efficiency; on the other hand, it avoids the prior-art situation in which the data to be integrated cannot all be integrated, thereby improving the accuracy and efficiency of integrating the data.

Furthermore, in an exemplary embodiment of the present disclosure, a medical data integration device is also provided. Figure 9 shows a schematic structural diagram of the medical data integration device. As shown in Figure 9, the medical data integration device 900 may include a determination module 910, a similarity calculation module 920 and an integration module 930.
Specifically, the determination module 910 is configured to obtain data to be integrated from multiple different systems and to determine, among the plurality of data to be integrated, a plurality of first data whose users have the same birth date; the similarity calculation module 920 is configured to determine the user names in one-to-one correspondence with the plurality of first data and to perform similarity calculation on the plurality of user names to obtain a similarity value; and the integration module 930 is configured to, if the similarity value is greater than or equal to the preset similarity value, determine a plurality of other user data in one-to-one correspondence with the plurality of first data, so as to integrate the plurality of first data whose other user data are consistent and obtain the target data.

The specific details of the above medical data integration device 900 have been described in detail in the corresponding medical data integration method and are therefore not repeated here.

It should be noted that although several modules or units of the medical data integration device 900 are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

The electronic device 1000 according to this embodiment of the present invention is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Figure 10, the electronic device 1000 is embodied in the form of a general-purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.

The storage unit stores program code, which can be executed by the processing unit 1010 so that the processing unit 1010 performs the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.

The storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 1021 and/or a cache storage unit 1022, and may further include a read-only storage unit (ROM) 1023.

The storage unit 1020 may also include a program/utility 1024 having a set of (at least one) program modules 1025. Such program modules 1025 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. This communication may occur through an input/output (I/O) interface 1050. Moreover, the electronic device 1000 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 via the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Through the above description of the embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a portable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored. In some possible embodiments, various aspects of the present invention can also be implemented in the form of a program product, which includes program code.
When the program product is run on a terminal device, the program code is used to cause the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section above in this specification.

Referring to FIG. 11, a program product 1100 for implementing the above method according to an embodiment of the present invention is described. It can take the form of a portable compact disk read-only memory (CD-ROM) that includes program code, and can be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus or device.

The program product may take the form of any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.

The program code contained on the readable medium can be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.

Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

Other embodiments of the disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the disclosure that follow the general principles of the disclosure and include common knowledge or customary technical means in the technical field that are not disclosed in the disclosure.
It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims


1. A medical data integration method, characterized in that the method includes: acquiring data to be integrated from multiple different systems, and determining, among the plurality of data to be integrated, a plurality of first data whose users have the same birth date; determining user names in one-to-one correspondence with the plurality of first data, and performing similarity calculation on the plurality of user names to obtain a similarity value; and if the similarity value is greater than or equal to a preset similarity value, determining other user data in one-to-one correspondence with the plurality of first data, so as to integrate the plurality of first data whose other user data are consistent to obtain target data.

2. The medical data integration method according to claim 1, wherein the acquiring data to be integrated from multiple different systems includes: acquiring data tables to be integrated from multiple different systems, and extracting, from the data tables to be integrated, data to be integrated corresponding to specific data fields; wherein the specific data fields include a unique data identifier, the user's birth date, the user name and other user data fields, and the other user data fields include an ID card number, a birth certificate number, a passport number and a patient number.

3. The medical data integration method according to claim 2, wherein the determining user names in one-to-one correspondence with the plurality of first data includes: determining values of the unique data identifiers in one-to-one correspondence with the plurality of first data; and if the values of the plurality of unique data identifiers are different, determining a plurality of user names in one-to-one correspondence with the plurality of first data.

4. The medical data integration method according to claim 1, wherein the performing similarity calculation on the plurality of user names to obtain a similarity value includes: determining a plurality of words included in the plurality of user names; determining, according to the frequency of occurrence of the plurality of words in each user name, a high-dimensional vector corresponding to each user name; and calculating, according to the high-dimensional vector corresponding to each user name, cosine distances between the plurality of user names to determine the similarity value between the plurality of user names.

5. The medical data integration method according to claim 1, wherein, if the similarity value is greater than or equal to the preset similarity value, determining the plurality of other user data in one-to-one correspondence with the plurality of first data includes: if the similarity value is greater than or equal to the preset similarity value, determining character lengths corresponding to the plurality of user names; and if the character lengths of the plurality of user names are greater than or equal to a preset character length, determining the plurality of other user data in one-to-one correspondence with the plurality of first data.

6. The medical data integration method according to claim 2, wherein integrating the plurality of first data that are consistent with the other user data to obtain target data includes: if a plurality of other user data corresponding to the other user data fields exist in the plurality of first data, judging the plurality of other user data fields; and if the other user data fields corresponding to the plurality of first data are consistent, and the other user data corresponding to those other user data fields are also consistent, integrating the plurality of first data to obtain the target data.

7. The medical data integration method according to claim 1, wherein integrating the plurality of first data to obtain target data includes: determining character lengths of the user names in one-to-one correspondence with the plurality of first data, and comparing the plurality of character lengths to obtain a character comparison result; and establishing, according to the character comparison result, a mapping relationship between a first user name and a second user name, so as to integrate the plurality of first data to obtain the target data; wherein the character length of the first user name is greater than the character length of the second user name.

8. The medical data integration method according to claim 7, wherein integrating the plurality of first data to obtain target data includes: if specific characters exist in the plurality of first data, removing the specific characters to obtain a plurality of first data that do not include the specific characters; wherein the specific characters include all characters other than American Standard Code for Information Interchange (ASCII) characters; and integrating the plurality of first data that do not include the specific characters to obtain the target data.

9. The medical data integration method according to any one of claims 1 to 8, characterized in that, after integrating the plurality of first data that are consistent with the other user data to obtain the target data, the method further includes: storing the target data in a specific data format.

10. A medical data integration device, characterized in that it includes: a determination module, configured to acquire data to be integrated from multiple different systems, and to determine, among the multiple data to be integrated, a plurality of first data of users having the same birth date; a similarity calculation module, configured to determine a plurality of user names in one-to-one correspondence with the plurality of first data, and to perform similarity calculation on the plurality of user names to obtain a similarity value; and an integration module, configured to determine, if the similarity value is greater than or equal to a preset similarity value, a plurality of other user data in one-to-one correspondence with the plurality of first data, so as to integrate the plurality of first data whose other user data are consistent to obtain target data.

11. An electronic device, characterized in that it includes: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform, by executing the executable instructions, the medical data integration method according to any one of claims 1 to 9.

12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the medical data integration method described in any one of claims 1-9 is implemented.
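
The steps recited in claims 1 to 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The record keys (`uid`, `name`, `birth_date`, and the names in `ID_FIELDS`), the threshold values, and all function names are hypothetical choices made for illustration. The sketch also computes cosine similarity (one minus the cosine distance) on whitespace-separated word counts, and treats "consistent" as "at least one identifier field is shared and every shared field matches"; both are assumptions about how the claims might be realized.

```python
import math
from collections import Counter, defaultdict

SIM_THRESHOLD = 0.8  # hypothetical preset similarity value
MIN_NAME_LEN = 5     # hypothetical preset character length
# Hypothetical keys for the "other user data fields" of claim 2.
ID_FIELDS = ("id_number", "birth_cert_no", "passport_no", "patient_no")


def name_vector(name):
    """Word-frequency vector of a user name (lower-cased, whitespace-split)."""
    return Counter(name.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the word-frequency vectors of two names."""
    va, vb = name_vector(a), name_vector(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def strip_non_ascii(text):
    """Drop every character outside ASCII (the removal step of claim 8)."""
    return text.encode("ascii", "ignore").decode("ascii")


def ids_consistent(r1, r2):
    """True if the records share an identifier field and agree on all shared ones."""
    shared = [f for f in ID_FIELDS if r1.get(f) and r2.get(f)]
    return bool(shared) and all(r1[f] == r2[f] for f in shared)


def find_merges(records):
    """Return (uid_of_longer_name, uid_of_shorter_name) pairs to integrate."""
    by_birth_date = defaultdict(list)
    for rec in records:  # group candidate "first data" by birth date
        by_birth_date[rec["birth_date"]].append(rec)

    merges = []
    for group in by_birth_date.values():
        for i, r1 in enumerate(group):
            for r2 in group[i + 1:]:
                if r1["uid"] == r2["uid"]:
                    continue  # same unique identifier: nothing to merge
                n1, n2 = strip_non_ascii(r1["name"]), strip_non_ascii(r2["name"])
                if cosine_similarity(n1, n2) < SIM_THRESHOLD:
                    continue
                if min(len(n1), len(n2)) < MIN_NAME_LEN:
                    continue
                if not ids_consistent(r1, r2):
                    continue
                # Map the longer ("first") user name to the shorter ("second") one.
                longer, shorter = (r1, r2) if len(n1) >= len(n2) else (r2, r1)
                merges.append((longer["uid"], shorter["uid"]))
    return merges


records = [
    {"uid": "a1", "name": "Siti Nur Aisyah binti Ahmad",
     "birth_date": "1990-01-01", "passport_no": "P123"},
    {"uid": "b7", "name": "Siti Nur Aisyah Ahmad",
     "birth_date": "1990-01-01", "passport_no": "P123"},
    {"uid": "c3", "name": "John Tan",
     "birth_date": "1990-01-01", "passport_no": "P999"},
]
merges = find_merges(records)
```

In this sketch the pairwise comparison inside each birth-date group is quadratic in the group size, which is acceptable only because grouping by birth date keeps the groups small; a production system would likely need a different strategy for large groups.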