CN115497633B

CN115497633B - Data processing method, device, equipment and storage medium

Info

Publication number: CN115497633B
Application number: CN202211291571.6A
Authority: CN
Inventors: 黄皓; 李天一; 朱靖源
Original assignee: Lianren Healthcare Big Data Technology Co Ltd
Current assignee: Lianren Healthcare Big Data Technology Co Ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2024-01-30
Anticipated expiration: 2042-10-19
Also published as: CN115497633A

Abstract

The invention discloses a data processing method, a device, equipment and a storage medium. The method includes: receiving data to be processed, where the data to be processed comprises health data and user basic information of two users; inputting the health data of each user into a corresponding health data twin network model to obtain a first vector corresponding to each user's health data, and determining the similarity between the two first vectors; inputting the user basic information of the two users and the similarity into a pre-trained discriminant model to determine the comprehensive similarity between the two users; and determining, based on the comprehensive similarity, whether to combine the data to be processed of the two users. According to the embodiments of the invention, a user main index matching system with stronger applicability is constructed, the degree of data association in the health big data center is improved, similar user information across systems is integrated, and subsequent information retrieval and use are made more convenient.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

As medical informatization advances, building a medical and health big data center requires integrating user health data stored across a plurality of medical institutions.

The traditional data processing method compares the basic information of users, trains a similarity evaluation model based on manually set rules, and integrates the users' data accordingly.

This method considers only the basic information of the user. Because it relies on a single dimension, the accuracy of its results is low and it adapts poorly to different users.

Disclosure of Invention

The invention provides a data processing method, a device, equipment and a storage medium, which realize the organic integration of health data generated by different institutions, different businesses and different times through the comparison analysis of the comprehensive similarity of the basic information and the health data of two suspected matched users, and improve the accuracy of data integration.

In a first aspect, an embodiment of the present invention provides a data processing method, including:

receiving data to be processed;

the data to be processed comprises health data of two users and user basic information;

inputting the health data of each user into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors;

inputting user basic information and similarity of two users into a discriminant model obtained by training in advance, and determining the comprehensive similarity between the two users;

based on the comprehensive similarity, whether to combine the data to be processed of the two users is determined.

In a second aspect, an embodiment of the present invention further provides a data processing apparatus, which is applied to data processing, where the data processing apparatus includes:

a data receiving module, configured to receive two groups of data to be processed;

a similarity calculation module, configured to input the health data of each user into the corresponding health data twin network model, obtain a first vector corresponding to each user's health data, and determine the similarity between the two first vectors;

a comprehensive similarity calculation module, configured to input the user basic information and the similarity of the two users into a pre-trained discriminant model to determine the comprehensive similarity between the two users; and

a decision module, configured to determine, based on the comprehensive similarity, whether to combine the data to be processed of the two users.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method according to any one of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where computer instructions are stored, where the computer instructions are configured to cause a processor to execute the data processing method according to any one of the embodiments of the present invention.

According to the technical scheme, the data to be processed are received; inputting the health data of each user into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors; inputting user basic information and similarity of two users into a discriminant model obtained by training in advance, and determining the comprehensive similarity between the two users; based on the comprehensive similarity, whether to combine the data to be processed of the two users is determined. By comprehensively processing and analyzing the basic information and the health information of the user, a user main index matching system with stronger applicability is constructed, the integration of similar user information in each system is realized, and convenience is provided for the subsequent information retrieval and use.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention;

FIG. 2 is a flow chart of a data processing method according to a second embodiment of the present invention;

FIG. 3 is a flow chart of a data processing method according to a third embodiment of the present invention;

FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device implementing a data processing method according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention, where the method may be performed by a data processing device, the data processing device may be implemented in hardware and/or software, and the device may be configured in a computer. As shown in fig. 1, the method includes:

S110, receiving data to be processed.

The data to be processed comprises health data of two users and user basic information.

The health data refers to data related to the health of the user, such as related data of previous diseases, previous operations, previous medication and the like. The basic information may be basic information used to characterize the identity of the user, such as name, gender, address, etc. The dimension of the information specifically included in the health data and the basic information is not limited herein, and may be selected according to actual requirements.

Specifically, when it must be determined whether two users are similar users and/or user information needs to be integrated, staff can edit, retrieve and upload the data to be processed in the system so that the data can subsequently be received.

S120, inputting the health data of each user into a corresponding health data twin network model, obtaining first vectors corresponding to each health data, and determining the similarity between the two first vectors.

The health data twin network model can be obtained by training on user data whose association information is known. The model may be a DNN whose parameters have all been trained in advance. Based on the health data twin network model, feature engineering can be performed on the data to be processed to obtain feature vectors that match the user's health data. Optionally, the dimension of the user features is reduced by part of the DNN to obtain the first vector corresponding to the user's health data. The first vector is a feature vector that contains the features of the user's health data and that a computer can process and analyze. The similarity is a probability value representing how similar the health data of the two users are: the closer it is to 1, the more similar the two users are, and the closer it is to 0, the less similar they are.

Specifically, information such as past diseases, past operations and past medication in the health data of the two users is input into the corresponding health data twin network model, and two first vectors that respectively represent the features of the two users' health data are obtained through the model's preprocessing, feature selection, feature construction, feature dimension reduction and similar operations. The similarity of the two first vectors is then determined. To determine the similarity, a suitable algorithm, such as a cosine similarity algorithm, may be used to calculate a distance value between the two first vectors, and the similarity is determined from that distance value. Optionally, a smaller distance value indicates a higher user similarity, and a larger distance value indicates a lower user similarity.

It should be noted that the health data twin network model consists of two neural network models with the same model structure, which process the health data of the two users in one-to-one correspondence to obtain the corresponding first vectors.
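
Step S120 can be illustrated with a minimal sketch. It assumes that a tiny fixed-weight encoder stands in for the pre-trained health data twin network model; the weight values, the layer size and the names `encode` and `health_similarity` are hypothetical and are not specified by the source. Rescaling the cosine value to the range 0 to 1 with (1 + cos) / 2 is likewise only one possible choice.

```python
import math

# Hypothetical pre-trained weights shared by both branches of the twin network.
# Each row maps the 4 input health features to one of 2 output dimensions.
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1, 0.7],
    [-0.3, 0.8, 0.4, 0.1],
]


def encode(health_features):
    """Reduce a health feature list to a low-dimensional 'first vector'."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, health_features)))
        for row in SHARED_WEIGHTS
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def health_similarity(features_a, features_b):
    """Encode both users with the same weights and rescale cosine to [0, 1]."""
    cos = cosine_similarity(encode(features_a), encode(features_b))
    return (1.0 + cos) / 2.0
```

Because both users pass through the same `encode` function, the two branches share structure and parameters, which is the property the twin arrangement relies on.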

S130, inputting user basic information and similarity of the two users into a pre-trained discrimination model, and determining the comprehensive similarity between the two users.

The discrimination model is obtained through pre-training, and optionally, the adopted model structure can be a linear model, a support vector machine model, a tree model, a deep network model and other common machine learning models. The comprehensive similarity is a probability value obtained by combining the basic information and the health data similarity of the two users and is used for representing the similarity degree of the comprehensive information of the users. The closer the integrated similarity is to 1, the higher the integrated similarity of the two users is, and the closer the integrated similarity is to 0, the lower the integrated similarity of the two users is.

Specifically, basic information similar features of two users are obtained through feature engineering, the obtained similarity of the two first vectors and the basic information similar features are spliced to obtain similar comprehensive feature vectors of the two users, and the comprehensive similarity is obtained after the processing of a discrimination model.

S140, determining whether to combine the data to be processed of the two users based on the comprehensive similarity.

Specifically, a comprehensive similarity threshold value can be preset, and when the determined comprehensive similarity is within the range of the comprehensive similarity threshold value, the information of the two users can be considered to be combined; otherwise, the information combination processing is not performed.

According to the embodiment of the invention, the data to be processed is received; the data to be processed comprises health data of two users and user basic information; inputting the health data of each user into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors; inputting user basic information and similarity of two users into a discriminant model obtained by training in advance, and determining the comprehensive similarity between the two users; based on the comprehensive similarity, whether to combine the data to be processed of the two users is determined. According to the technical scheme, the problem that only the user basic information is considered and the dimension is single is solved by comprehensively processing and analyzing the user basic information and the health data. The method and the system realize the integration of similar user information in each system, provide convenience for the subsequent information retrieval and use, improve the accuracy of data integration and have higher adaptation to users.

Example 2

Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. On the basis of the foregoing embodiment, corresponding training samples may first be obtained so that a discriminant model is trained on them, and the comprehensive similarity between two users is then determined based on that discriminant model.

As shown in fig. 2, the method includes:

S210, determining a training sample set.

The training sample set comprises a plurality of training samples, wherein the training samples comprise positive samples, corresponding positive labels, negative samples and corresponding negative labels.

The positive sample is the user feature data of two users manually determined to be similar, and its label is the positive label, which is set to 1. The negative sample is the user feature data of two users manually determined to be dissimilar, and its label is the negative label, which is set to 0.

For example, comparing name features, gender features and age features of the users 1 and 2 manually, if the feature data are similar, considering that the two users are similar users, taking the feature data of the two users as positive samples, and setting the label of the positive samples as 1; if the feature data are dissimilar, the two users are considered to be dissimilar users, the feature data of the two users are taken as a negative sample, and the label of the negative sample is set to 0.

Specifically, the training sample set includes basic information data features of a plurality of users, such as names, ages, addresses, and the like. If the basic information of the two users can be determined to be similar users through manual examination, the characteristic data of the two users are taken as positive samples, and the labels of the two users are set as positive labels. If the basic information of the two users can be determined to be dissimilar users through manual verification, taking the data characteristics of the two users as negative samples, and setting the labels of the two users as negative labels.
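
One way to represent such a training sample set is sketched below. It assumes each sample holds the feature data of two users together with the manually assigned label (1 for positive, 0 for negative); the names `PairSample` and `build_sample_set` and the example field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

POSITIVE_LABEL = 1  # two users judged similar by manual review
NEGATIVE_LABEL = 0  # two users judged dissimilar by manual review


@dataclass(frozen=True)
class PairSample:
    """One training sample: the feature data of two users plus a label."""
    user_a: Dict[str, object]
    user_b: Dict[str, object]
    label: int


def build_sample_set(
    reviewed_pairs: List[Tuple[Dict[str, object], Dict[str, object], bool]]
) -> List[PairSample]:
    """Turn manually reviewed (user_a, user_b, is_similar) triples into samples."""
    return [
        PairSample(a, b, POSITIVE_LABEL if is_similar else NEGATIVE_LABEL)
        for a, b, is_similar in reviewed_pairs
    ]
```

A call such as `build_sample_set([(u1, u2, True), (u1, u3, False)])` would yield one positive and one negative sample.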

S220, for each training sample, inputting the current training sample into the to-be-trained discrimination model to obtain a corresponding actual output similarity value.

Illustratively, the feature data of user a and user b are input into the to-be-trained discriminant model, and the model processes the feature data of the two users to compute their actual output similarity value.

Because the discriminant model processes every training sample in the same way, this embodiment describes one training sample as an example. The training sample currently being described may be referred to as the current training sample.

The model parameters in the to-be-trained discriminant model are default values, and because the to-be-trained discriminant model is not a trained discriminant model, the content output after the to-be-trained discriminant model processes the current training sample may not be consistent with the tag content in the current training sample, and the content output by the to-be-trained discriminant model can be used as an actual output similarity value.

S230, determining a loss value based on the actual output similarity value and the label of the current training sample, and correcting the model parameters in the to-be-trained judging model based on the loss value.

S240, taking convergence of the loss function in the to-be-trained discriminant model as the training target, and obtaining the discriminant model.

Specifically, the training error of the loss function in the to-be-trained discriminant model, that is, the loss parameter, is used as the condition for detecting whether the loss function has converged, for example, whether the training error is smaller than a preset error, whether the error trend has stabilized, or whether the current number of iterations equals a preset number. If the convergence condition is met, for example, the training error of the loss function is smaller than the preset error or the error trend has stabilized, this indicates that training of the to-be-trained discriminant model is complete, and iterative training can be stopped. If the convergence condition is not yet met, further sample data can be acquired to continue training the model until the training error of the loss function falls within the preset range.

It can be understood that when the training error of the loss function reaches convergence, a trained discriminant model can be obtained. At this time, after the user health data and/or the basic information are input into the model, more accurate similarity can be obtained.
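
Steps S220 to S240 can be sketched as a small training loop. The sketch assumes a linear model with a logistic output trained by gradient descent on cross-entropy loss, which is only one of the model types the source allows; the learning rate, the preset error, the stability tolerance, the iteration limit and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Actual output similarity value for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_discriminant(samples, labels, lr=0.5, preset_error=0.05,
                       stable_delta=1e-6, max_iterations=5000):
    """Gradient descent that stops on any of the three convergence checks."""
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0  # default initial parameters
    previous_loss = None
    for iteration in range(1, max_iterations + 1):
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for features, label in zip(samples, labels):
            p = predict(weights, bias, features)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            loss -= label * math.log(p_safe) + (1 - label) * math.log(1.0 - p_safe)
            for i, x in enumerate(features):
                grad_w[i] += (p - label) * x
            grad_b += p - label
        loss /= len(samples)
        # Correct the model parameters based on the gradient of the loss.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
        if loss < preset_error:
            return weights, bias, "error below preset", iteration
        if previous_loss is not None and abs(previous_loss - loss) < stable_delta:
            return weights, bias, "error stable", iteration
        previous_loss = loss
    return weights, bias, "iteration limit", max_iterations
```

The returned string reports which of the three stopping conditions ended training, mirroring the checks on preset error, error trend and iteration count described above.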

S250, receiving data to be processed.

S260, inputting the health data of one user into the first health data twin model to obtain a first vector to be processed, and inputting the health data of the other user into the second health data twin model to obtain a second vector to be processed.

The health data may include, among other things, whether a disease is present, whether a blood pressure value is above a normal level, whether a leukocyte level is above or below a normal level, and the like. If yes, the corresponding identification bit is represented by 1, and if not, the corresponding identification bit is represented by 0.

Illustratively, the health data of the user a is input into a first health data twin model to obtain a first vector to be processed. And inputting the health data of the user b into a second health data twin model to obtain a second vector to be processed. The analysis processing logic of the first health data twin model and the second health data twin model on the data is the same.
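
The 1/0 identification bits described for the health data can be sketched as follows. The field names and their order are hypothetical, and treating a missing indicator as 0 is an assumption that the source does not state.

```python
# Hypothetical, fixed ordering of health indicators; the source only gives
# disease presence, blood pressure and leukocyte level as examples.
HEALTH_FIELDS = ("has_disease", "blood_pressure_high", "leukocyte_abnormal")


def encode_health_flags(record):
    """Map a dict of yes/no health indicators to a list of 1/0 flags."""
    # Assumption: an indicator that is absent from the record counts as 0.
    return [1 if record.get(field, False) else 0 for field in HEALTH_FIELDS]
```

The resulting flag list is the kind of input that each health data twin model would receive before producing its vector to be processed.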

S270, determining the similarity between the first vector to be processed and the second vector to be processed based on a preset similarity algorithm.

The preset similarity algorithm may be a cosine similarity algorithm, a Euclidean distance algorithm, or the like. Optionally, in this embodiment, a cosine similarity algorithm is used to calculate a distance value between the first vector to be processed and the second vector to be processed, where a smaller distance value indicates a higher user similarity and a larger distance value indicates a lower user similarity.
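
The relation between distance value and similarity can be sketched for the two algorithms named above. The mapping 1 / (1 + distance) is a hypothetical choice used only to show that a smaller distance yields a higher similarity; the source does not prescribe a conversion.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0.0:
        return 1.0  # assumption: treat an all-zero vector as unrelated
    return 1.0 - dot / norms


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity_from_distance(distance):
    """Monotone map from a non-negative distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)
```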

S280, inputting the user basic information and the similarity of the two users into a pre-trained discrimination model, and determining the comprehensive similarity between the two users.

S290, determining whether to combine the data to be processed of the two users based on the comprehensive similarity.

According to the technical scheme, the discrimination model to be trained is trained, and then the data to be processed is received; the data to be processed comprises health data of two users and user basic information; inputting the health data of each user into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors; inputting user basic information and similarity of two users into a discriminant model obtained by training in advance, and determining the comprehensive similarity between the two users; based on the comprehensive similarity, whether to combine the data to be processed of the two users is determined. According to the technical scheme, the problem that only the user basic information is considered and the dimension is single is solved by comprehensively processing and analyzing the user basic information and the health data. The method and the system realize the integration of similar user information in each system, provide convenience for the subsequent information retrieval and use, improve the accuracy of data integration and have higher adaptation to users.

Example 3

Fig. 3 is a flowchart of a data processing method according to a third embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment refines the step of inputting the user basic information and the similarity of the two users into the pre-trained discriminant model to determine the comprehensive similarity between the two users; for the specific implementation, refer to the detailed description of this embodiment. Technical terms that are the same as or correspond to those in the above embodiments are not repeated here. As shown in fig. 3, the method includes:

S310, receiving data to be processed.

S320, inputting the health data of each user into a corresponding health data twin network model, obtaining first vectors corresponding to each health data, and determining the similarity between the two first vectors.

S330, performing feature matching processing on the basic information of the two users to obtain feature matching vectors.

Feature matching compares each dimension of the basic information of the two users. For example, three dimensions a, b and c of the basic information are compared: the basic information of user a is [a1, b1, c1] and that of user b is [a2, b2, c2]. Taking one dimension as an example, if feature matching of a1 and a2 yields an output value of 0.8, then 0.8 is written to the corresponding identification bit of the comparison result; repeating this for each dimension yields the feature matching vector of the two users. The feature matching vector is the feature vector obtained by comparing the users' basic information, and it represents how similar that basic information is.

Optionally, the user basic information includes field contents corresponding to a plurality of fields, and feature matching processing is performed on the basic information of the two users to obtain feature matching vectors, including: obtaining matching characteristics corresponding to the corresponding fields through matching the field contents corresponding to the same field; and determining a feature matching vector based on the matching features corresponding to the fields.

Wherein, the field can be the user basic information such as name, age, etc.

Specifically, every field is handled in the same way, so one field is taken as an example. When feature matching is performed on the name fields of two users, if the names are identical, or are homophones written with different characters, then 1 or the name similarity is written to the name identification bit of the matching features. After the same operation has been performed on each field, the feature matching vector is obtained.
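
Field-by-field matching can be sketched as below. The field list, the rule that a missing value counts as no match, and the use of `difflib.SequenceMatcher` from the Python standard library as a stand-in for the name similarity are all assumptions; homophone detection is not modeled here.

```python
from difflib import SequenceMatcher

# Hypothetical field order; it fixes the position of each matching feature.
FIELDS = ("name", "gender", "age")


def match_field(field, value_a, value_b):
    """Return a matching feature in [0, 1] for one field of two users."""
    if value_a is None or value_b is None:
        return 0.0  # assumption: a missing value counts as no match
    if value_a == value_b:
        return 1.0
    if field == "name":
        # String similarity stands in for the name similarity score.
        return SequenceMatcher(None, str(value_a), str(value_b)).ratio()
    return 0.0


def feature_matching_vector(info_a, info_b):
    """Compare the same field of both users, one position per field."""
    return [match_field(f, info_a.get(f), info_b.get(f)) for f in FIELDS]
```

Each position of the returned list plays the role of one identification bit, so the vector length equals the number of compared fields.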

S340, obtaining a target vector through splicing the feature matching vector and the similarity.

The target vector is a feature vector obtained by splicing the similarity on the basis of the feature matching vector.

S350, inputting the target vector into the discrimination model to obtain the comprehensive similarity between the two users.

Specifically, the target vector is input into the trained discriminant model, and after the model analyzes and processes the vector, the comprehensive similarity between the two users is output.
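
Steps S340 and S350 can be sketched as a splice followed by a model call. The sketch assumes a linear discriminant model with a logistic output; the weight and bias values are made up for illustration and do not come from the source, and the function names are hypothetical.

```python
import math

# Hypothetical parameters of an already-trained linear discriminant model:
# one weight per feature-matching position plus one for the health similarity.
WEIGHTS = [2.0, 1.0, 1.5, 3.0]
BIAS = -4.0


def build_target_vector(feature_matching_vector, health_similarity):
    """Splice the health-data similarity onto the feature matching vector."""
    return list(feature_matching_vector) + [health_similarity]


def comprehensive_similarity(target_vector):
    """Logistic output in (0, 1) of the assumed linear discriminant model."""
    if len(target_vector) != len(WEIGHTS):
        raise ValueError("target vector length does not match the model")
    z = sum(w * x for w, x in zip(WEIGHTS, target_vector)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))
```

Appending the similarity as the last element keeps the feature matching positions unchanged, so the same field order can be used at training and inference time.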

S360, determining whether to combine the data to be processed of the two users based on the comprehensive similarity.

In this embodiment, it may be: if the comprehensive similarity is higher than a first preset similarity threshold, merging the data to be processed of the two users; if the comprehensive similarity is smaller than a second preset similarity threshold, refusing to combine the data to be processed of the two users; if the comprehensive similarity is larger than a second preset similarity threshold and smaller than the first preset similarity threshold, sending the data to be processed of the two users to the target equipment, so that the auditing users corresponding to the target equipment can audit the data to be processed.

The preset similarity threshold is a similarity threshold set according to actual conditions, and whether the health data and the basic information of the two users are combined or forwarded to the manual auditing system is determined by comparing the preset similarity threshold with the comprehensive similarity.

Illustratively, when the first preset similarity threshold is 95% and the second preset similarity threshold is 60%:

if the comprehensive similarity is higher than 95%, the two users can be considered to be similar users through analysis of the discrimination model, and the health data and the basic information of the two users are combined; the method has the advantage that similar user information can be accurately integrated so as to facilitate the subsequent retrieval and use of the information.

If the comprehensive similarity is smaller than 60%, the two users can be considered as dissimilar users, and the combination of the health data and the basic information of the two users is refused;

if the comprehensive similarity is between 60% and 95%, the data is transferred to a manual auditing system, and whether the information of the two users is combined is finally determined by the manual auditing result.
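
The three-way decision of S360 can be sketched with the illustrative thresholds of 95% and 60% given for the first and second preset similarity thresholds. The function name `decide` and the returned labels are hypothetical, and sending values exactly equal to a threshold to manual review is an assumption, since the source does not specify the boundary cases.

```python
MERGE_THRESHOLD = 0.95   # first preset similarity threshold
REJECT_THRESHOLD = 0.60  # second preset similarity threshold


def decide(comprehensive_similarity,
           merge_threshold=MERGE_THRESHOLD,
           reject_threshold=REJECT_THRESHOLD):
    """Return 'merge', 'reject' or 'manual_review' for one user pair."""
    if reject_threshold > merge_threshold:
        raise ValueError("reject threshold must not exceed merge threshold")
    if comprehensive_similarity > merge_threshold:
        return "merge"
    if comprehensive_similarity < reject_threshold:
        return "reject"
    # Assumption: values equal to a threshold also go to manual review,
    # since the source leaves the boundary cases unspecified.
    return "manual_review"
```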

In this embodiment, in order to process data using the discrimination model and the health data twin network model obtained by the latest training, measures may be taken as follows: periodically acquiring corresponding training samples, and respectively updating model parameters in the discrimination model and the health data twin network model to process data based on the updated discrimination model and the updated health data twin network model.

Wherein the period may be one day, one week or one month. And training the related model by periodically utilizing the newly acquired user data, and updating the model parameters.

Specifically, periodically updating the related models through online learning includes: periodically re-dividing the model data set into a training set and a test set; updating the health data twin network model; updating the discriminant model; and recalculating the preset similarity threshold for automatic matching according to a preset confidence.

Wherein, the update may be triggered at a fixed time or interval, for example every hour or at 9 o'clock every other day. The related models are updated regularly, and the model parameters are continuously adjusted as the data changes. This meets the new requirements that current data convergence and integration place on the data processing method and enables flexible analysis and processing of the data.
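
Two of the periodic update steps, re-dividing the data set and recalculating the automatic matching threshold, can be sketched as follows. The source does not define how the preset confidence maps to a threshold; this sketch assumes it means the lowest score at which the share of true matches among pairs scoring at or above it still reaches the preset confidence. The function names, the test ratio and the seed are hypothetical, and the scheduling itself is not shown.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and split it into train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def recalculate_merge_threshold(scored_pairs, preset_confidence=0.95):
    """Lowest score whose at-or-above pairs reach the preset precision.

    scored_pairs: iterable of (comprehensive_similarity, label), label 1 or 0.
    Returns None if no threshold reaches the preset confidence.
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    best, matches = None, 0
    for count, (score, label) in enumerate(ranked, start=1):
        matches += label
        # Only evaluate at the last pair of a run of equal scores.
        is_last_of_tie = count == len(ranked) or ranked[count][0] != score
        if is_last_of_tie and matches / count >= preset_confidence:
            best = score
    return best
```

In a deployment along these lines, the scores passed to `recalculate_merge_threshold` would come from running the updated models on the freshly split test set.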

According to the technical scheme provided by the embodiment, the data to be processed are received; and inputting the health data of the two users into the health data twin model to obtain a first vector to be processed and a second vector to be processed. Determining the similarity between the first vector to be processed and the second vector to be processed based on a preset similarity algorithm; performing feature matching processing on the basic information of the two users to obtain feature matching vectors; the target vector is obtained through the feature matching vector and similarity splicing processing; and inputting the target vector into the discrimination model to obtain the comprehensive similarity between the two users. Based on the comprehensive similarity, whether to combine the data to be processed of the two users is determined. Through comprehensive processing analysis of user health data and basic information and regular updating of related models, a user main index matching system with stronger applicability is constructed, flexible response of the system to data change and organic integration of similar user information in each system are realized, and convenience is provided for subsequent information retrieval and use.

Example 4

Fig. 4 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention. As shown in fig. 4, the apparatus includes:

the data receiving module 410, configured to receive two sets of data to be processed, where the data to be processed comprises health data of two users and user basic information;

the similarity calculation module 420, configured to input the health data of each user into a corresponding health data twin network model, obtain first vectors corresponding to each health data, and determine the similarity between the two first vectors;

the comprehensive similarity calculation module 430, configured to input the user basic information and the similarity of the two users into a pre-trained discriminant model, and determine the comprehensive similarity between the two users;

the decision module 440, configured to determine, based on the comprehensive similarity, whether to combine the data to be processed of the two users or to pass the data to be processed to a manual auditing system for processing.

On the basis of the above technical solution, the similarity calculation module includes:

a vector processing unit, configured to input the health data of one user into the first health data twin model to obtain a first vector to be processed, and to input the health data of the other user into the second health data twin model to obtain a second vector to be processed; and a similarity calculation unit, configured to determine the similarity between the first vector to be processed and the second vector to be processed based on a preset similarity algorithm.

The first health data twin model and the second health data twin model have the same model structure.
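
The two-branch arrangement described above can be sketched as follows. This is an illustrative toy, not the disclosed network: the single-layer `TwinEncoder`, its weights, the sample feature values, the choice of Euclidean distance, and the mapping `1 / (1 + distance)` from distance to similarity are all assumptions. Both branches are built from the same weights here, which is one simple way to give them the same model structure.

```python
import math
from typing import List, Sequence


class TwinEncoder:
    """Toy encoder: one linear layer followed by tanh (assumed architecture)."""

    def __init__(self, weights: Sequence[Sequence[float]]) -> None:
        self.weights = [list(row) for row in weights]

    def encode(self, features: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * x for w, x in zip(row, features)))
                for row in self.weights]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance: float) -> float:
    # Assumed mapping: zero distance gives 1.0, large distances approach 0.
    return 1.0 / (1.0 + distance)


# Hypothetical weights shared by both branches.
shared_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
first_model = TwinEncoder(shared_weights)
second_model = TwinEncoder(shared_weights)

health_a = [0.7, 1.2, 0.3]  # made-up numeric health features for one user
health_b = [0.7, 1.2, 0.3]  # identical record held for the other user

first_vector = first_model.encode(health_a)
second_vector = second_model.encode(health_b)
similarity = distance_to_similarity(
    euclidean_distance(first_vector, second_vector))
```

Because both branches apply the same transformation, identical health records map to identical vectors, so their distance is zero and the similarity is at its maximum.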

On the basis of the above technical solution, the comprehensive similarity calculation module further includes:

a feature matching vector calculation unit, configured to perform feature matching processing on the basic information of the two users to obtain a feature matching vector;

a target vector calculation unit, configured to obtain a target vector by splicing the feature matching vector and the similarity;

and a comprehensive similarity calculation unit, configured to input the target vector into the discrimination model to obtain the comprehensive similarity between the two users.
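
The three units above can be sketched as follows. The field list, the exact-match rule, the sample records, and the logistic scorer standing in for the discrimination model (with hand-picked weights) are hypothetical choices made for illustration; the embodiment does not prescribe them.

```python
import math
from typing import Dict, List, Optional, Sequence

FIELDS = ["name", "gender", "birth_date", "phone"]  # hypothetical field list


def match_field(a: Optional[str], b: Optional[str]) -> float:
    """1.0 when both values are present and equal after normalization, else 0.0."""
    if a is None or b is None:
        return 0.0
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def feature_matching_vector(info_a: Dict[str, str],
                            info_b: Dict[str, str]) -> List[float]:
    # One matching feature per field, in a fixed field order.
    return [match_field(info_a.get(f), info_b.get(f)) for f in FIELDS]


def build_target_vector(matching: Sequence[float],
                        similarity: float) -> List[float]:
    # Splicing: append the health data similarity to the matching features.
    return list(matching) + [similarity]


def comprehensive_similarity(target: Sequence[float],
                             weights: Sequence[float],
                             bias: float) -> float:
    """Logistic scorer used here as a stand-in for the discrimination model."""
    if len(target) != len(weights):
        raise ValueError("target and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, target))
    return 1.0 / (1.0 + math.exp(-z))


user_a = {"name": "Zhang San", "gender": "M",
          "birth_date": "1980-01-01", "phone": "13800000000"}
user_b = {"name": "zhang san ", "gender": "M", "birth_date": "1980-01-01"}

matching = feature_matching_vector(user_a, user_b)
target = build_target_vector(matching, similarity=0.92)
score = comprehensive_similarity(
    target, weights=[1.5, 0.5, 2.0, 1.0, 3.0], bias=-4.0)
```

In this sketch a missing field counts as a non-match; a real system might instead encode "missing" as its own feature value so that the discrimination model can treat it differently from a true mismatch.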

On the basis of the above technical solutions, the data processing apparatus in the embodiments of the present invention further includes a discrimination model training module.

The discrimination model training module includes:

a training sample set determining unit, configured to determine the training sample set required for training the discrimination model, where the training sample set comprises a plurality of training samples, and the training samples comprise positive samples with corresponding positive labels and negative samples with corresponding negative labels;

an actual output similarity value calculation unit, configured to, for each training sample, input the current training sample into the discrimination model to be trained to obtain a corresponding actual output similarity value;

a loss value determining unit, configured to determine a loss value based on the actual output similarity value and the label of the current training sample, so as to correct model parameters in the discrimination model to be trained based on the loss value;

and a discrimination model determining unit, configured to take convergence of the loss function of the discrimination model to be trained as the training target, to obtain the discrimination model.
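
The training procedure above can be sketched as follows, assuming for illustration that the discrimination model is a logistic regression trained with binary cross-entropy and per-sample gradient descent. The model type, learning rate, convergence tolerance, and the four toy samples are assumptions and are not specified by the embodiment.

```python
import math
from typing import List, Sequence, Tuple

# (target vector, label): label 1 = same user (positive), 0 = different (negative)
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Actual output similarity value for one target vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def mean_loss(weights: Sequence[float], bias: float,
              samples: Sequence[Sample]) -> float:
    """Mean binary cross-entropy over the sample set."""
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def train(samples: Sequence[Sample], lr: float = 0.5, tol: float = 1e-6,
          max_epochs: int = 5000) -> Tuple[List[float], float]:
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    previous = mean_loss(weights, bias, samples)
    for _ in range(max_epochs):
        for x, y in samples:
            error = predict(weights, bias, x) - y  # loss gradient w.r.t. the logit
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
        current = mean_loss(weights, bias, samples)
        if abs(previous - current) < tol:  # treat a flat loss as convergence
            break
        previous = current
    return weights, bias


# Toy training set: three matching features plus a health data similarity.
training_set: List[Sample] = [
    ([1.0, 1.0, 1.0, 0.95], 1),
    ([1.0, 0.0, 1.0, 0.90], 1),
    ([0.0, 1.0, 0.0, 0.20], 0),
    ([0.0, 0.0, 0.0, 0.10], 0),
]
trained_weights, trained_bias = train(training_set)
```

The stopping rule treats a change in mean loss smaller than `tol` between epochs as convergence, with `max_epochs` as a safety bound in case the loss keeps decreasing slowly.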

On the basis of the above technical solutions, the data processing apparatus in the embodiments of the present invention further includes a model updating module,

configured to periodically acquire corresponding training samples to update the model parameters in the discrimination model and in the health data twin network model respectively, so that data is processed based on the updated discrimination model and health data twin network model.
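
A periodic update of this kind might be driven by a small scheduler such as the sketch below. The class name `PeriodicModelUpdater`, the callback signatures, and the choice to pass one sample list to both update callbacks are assumptions for illustration; in practice each model would likely receive its own corresponding training samples.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional


class PeriodicModelUpdater:
    """Updates both models once the configured interval has elapsed."""

    def __init__(self,
                 interval: timedelta,
                 fetch_samples: Callable[[], List[dict]],
                 update_twin_model: Callable[[List[dict]], None],
                 update_discrimination_model: Callable[[List[dict]], None]) -> None:
        self.interval = interval
        self.fetch_samples = fetch_samples
        self.update_twin_model = update_twin_model
        self.update_discrimination_model = update_discrimination_model
        self.last_update: Optional[datetime] = None

    def maybe_update(self, now: datetime) -> bool:
        """Return True if the models were updated at time `now`."""
        if self.last_update is not None and now - self.last_update < self.interval:
            return False  # not due yet
        samples = self.fetch_samples()
        if not samples:
            return False  # no new samples to learn from
        self.update_twin_model(samples)
        self.update_discrimination_model(samples)
        self.last_update = now
        return True
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the schedule easy to test and lets any external timer or job runner trigger the check.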

According to this embodiment of the invention, data to be processed is received, the data to be processed comprising health data and user basic information of two users; the health data of each user is input into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and the similarity between the two first vectors is determined; the user basic information of the two users and the similarity are input into a pre-trained discrimination model to determine the comprehensive similarity between the two users; and based on the comprehensive similarity, it is determined whether to merge the data to be processed of the two users. By comprehensively processing and analyzing the users' basic information and health data, this technical solution integrates medical information held by multiple institutions, improves the accuracy of data integration, and adapts better to individual users.

The data processing apparatus provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.

Example V

Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.

As shown in Fig. 5, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12 and a Random Access Memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the methods and processes described above, such as the data processing method in the present embodiment.

In some embodiments, the data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data processing method in this embodiment by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.

The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of data processing, comprising:

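receiving data to be processed; the data to be processed comprises health data of two users and user basic information, wherein the user basic information comprises field contents corresponding to a plurality of fields;

inputting the health data of each user into a corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors;

inputting the user basic information of the two users and the similarity into a pre-trained discrimination model, and determining the comprehensive similarity between the two users;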

determining whether to combine the data to be processed of the two users based on the comprehensive similarity;

wherein the inputting the health data of each user into the corresponding health data twin network model to obtain first vectors corresponding to each health data, and determining the similarity between the two first vectors, comprises:

inputting the health data of one user into a first health data twin model to obtain a first vector to be processed, and inputting the health data of the other user into a second health data twin model to obtain a second vector to be processed;

determining the similarity between the first vector to be processed and the second vector to be processed based on a preset similarity algorithm, wherein the preset similarity algorithm is an algorithm for calculating a distance value between the first vector to be processed and the second vector to be processed;

the first health data twin model and the second health data twin model have the same model structure;

and wherein the inputting the user basic information of the two users and the similarity into the pre-trained discrimination model, and determining the comprehensive similarity between the two users, comprises:

obtaining a matching feature corresponding to each field by matching the field contents of the two users that correspond to the same field;

determining a feature matching vector based on the matching features corresponding to the fields;

splicing the feature matching vector and the similarity to obtain a target vector;

and inputting the target vector into the discrimination model to obtain the comprehensive similarity between the two users.

2. The method as recited in claim 1, further comprising:

determining a training sample set, wherein the training sample set comprises a plurality of training samples, and the training samples comprise positive samples, corresponding positive labels, negative samples and corresponding negative labels;

for each training sample, inputting the current training sample into a to-be-trained discrimination model to obtain a corresponding actual output similarity value;

determining a loss value based on the actual output similarity value and the label of the current training sample, so as to correct model parameters in the discrimination model to be trained based on the loss value;

and taking convergence of the loss function of the discrimination model to be trained as the training target, to obtain the discrimination model.

3. The method of claim 2, wherein the determining whether to combine the data to be processed of the two users based on the comprehensive similarity comprises:

if the comprehensive similarity is higher than a first preset similarity threshold, merging the data to be processed of the two users;

if the comprehensive similarity is smaller than a second preset similarity threshold, refusing to combine the data to be processed of the two users;

and if the comprehensive similarity is larger than the second preset similarity threshold and smaller than the first preset similarity threshold, sending the data to be processed of the two users to a target device, so that an auditing user corresponding to the target device audits the data to be processed.

4. The method as recited in claim 1, further comprising:

periodically acquiring corresponding training samples to update the model parameters in the discrimination model and in the health data twin network model respectively, so as to process data based on the updated discrimination model and health data twin network model.

5. A data processing apparatus, the apparatus comprising:

a data receiving module, configured to receive two sets of data to be processed;

the data to be processed comprises health data of two users and user basic information, wherein the user basic information comprises field contents corresponding to a plurality of fields;

a similarity calculation module, configured to input the health data of each user into a corresponding health data twin network model, obtain first vectors corresponding to each health data, and determine the similarity between the two first vectors;

a comprehensive similarity calculation module, configured to input the user basic information of the two users and the similarity into a pre-trained discrimination model, and determine the comprehensive similarity between the two users;

a decision module, configured to determine, based on the comprehensive similarity, whether to combine the data to be processed of the two users;

wherein, the similarity calculation module includes:

the vector processing unit is used for inputting the health data of one user into the first health data twin model to obtain a first vector to be processed, and inputting the health data of the other user into the second health data twin model to obtain a second vector to be processed;

a similarity calculation unit, configured to determine the similarity between the first vector to be processed and the second vector to be processed based on a preset similarity algorithm, wherein the preset similarity algorithm is an algorithm for calculating a distance value between the first vector to be processed and the second vector to be processed;

the comprehensive similarity calculation module is specifically configured to:

the target vector calculation unit is configured to obtain a matching feature corresponding to each field by matching the field contents corresponding to the same field;

6. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement the data processing method of any one of claims 1-4 when executed.