CN113723524A

CN113723524A - Data processing method based on prediction model, related equipment and medium

Info

Publication number: CN113723524A
Application number: CN202111017641.4A
Authority: CN
Inventors: 钟明峰
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-11-30
Anticipated expiration: 2041-08-31
Also published as: CN113723524B

Abstract

The embodiments of the application disclose a data processing method based on a prediction model, related equipment and a medium, applied to the technical field of artificial intelligence. The method comprises: obtaining a user data set of a target user; generating a data mapping image of the target user according to the user data set; determining N first dimension characteristics of the target user according to the mapping points of each group of user data; determining M second dimension characteristics of the target user according to the mapping points of index data of the same type in the N groups of user data; and inputting the N first dimension characteristics and the M second dimension characteristics into a prediction model to obtain a prediction result for the user data of the target user, wherein the prediction result indicates the data true probability of the user data set of the target user. The embodiments make it possible to judge the authenticity of user data. The application also relates to blockchain technology; for example, the data true probability of the target user's data set can be written into a blockchain.

Description

Data processing method based on prediction model, related equipment and medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a data processing method based on a prediction model, related equipment and media.

Background

Many scenarios currently require data analysis, but the data to be analyzed may have been falsified, and its authenticity often cannot be determined reliably. For example, in a data sampling scenario with a large amount of data, a small portion may be extracted by random sampling and its analysis result taken as the result for all the data. A regulatory platform, for instance, may randomly draw target patients from institutions in different regions and analyze their medical data to find medical data suspected of being problematic. Because the data to be analyzed may contain false data, indiscriminate data extraction or analysis can leave the analyzed data insufficiently representative and the analysis results unreliable. How to judge the authenticity of data, and thereby improve the reliability of its analysis results, is therefore a problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a data processing method based on a prediction model, related equipment and a medium, which can judge the authenticity of data and improve the reliability of an analysis result aiming at the data.

In one aspect, an embodiment of the present application provides a data processing method based on a prediction model, where the method includes:

acquiring a user data set of a target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

generating a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

determining first dimension characteristics of the target user according to mapping points of each group of user data in the data mapping image respectively to obtain N first dimension characteristics;

determining second dimension characteristics of the target user according to mapping points of index data with the same type in the N groups of user data in the data mapping image respectively to obtain M second dimension characteristics;

inputting the N first dimension characteristics and the M second dimension characteristics into a prediction model to obtain a prediction result of user data of the target user; the prediction result is used for indicating the data true probability of the user data set aiming at the target user.

In one possible embodiment, the generating a data mapping image of the target user according to the user data set of the target user includes:

constructing a mapping image area corresponding to each index data in the M index data; one index data corresponds to one mapping image area;

according to the mapping relation between each index data and the mapping image area, mapping each index data in each group of user data into the corresponding mapping image area respectively to obtain M mapping image areas;

and determining the data mapping image according to the M mapping image areas after mapping.

In a possible implementation manner, the N groups of user data include an ith group of user data, where i is a positive integer less than or equal to N;

the determining a first dimension characteristic of the target user according to the mapping points of each set of user data in the data mapping image respectively includes:

determining a connection area formed by the mapping points, in the data mapping image, of the M index data in the ith group of user data as the data mapping area corresponding to the ith group of user data;

acquiring a reasonable data mapping area for a target user;

determining a first dimension characteristic represented by the ith group of user data according to a data mapping area corresponding to the ith group of user data and the reasonable data mapping area; the first dimension characteristic represented by the ith group of user data is the ith first dimension characteristic in the N first dimension characteristics.

In a possible implementation manner, the determining, according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region, the first dimension characteristic represented by the ith group of user data includes:

comparing the data mapping area corresponding to the ith group of user data with the reasonable data mapping area, and determining the area overlapping characteristic of the data mapping area corresponding to the ith group of user data and the reasonable data mapping area;

and determining the area overlapping characteristic of the data mapping area corresponding to the ith group of user data and the reasonable data mapping area as the first dimension characteristic represented by the ith group of user data.

In a possible implementation manner, the N groups of user data include M groups of index data with the same type, the M groups of index data with the same type include a jth group of index data with the same type, and j is a positive integer less than or equal to M;

the determining the second dimension characteristics of the target user according to the mapping points of the index data with the same kind in the N groups of user data in the data mapping image respectively includes:

determining the data fluctuation characteristics of the j group index data with the same type according to the connection distance of the mapping points of the j group index data with the same type in the data mapping image;

determining data co-occurrence characteristics of the j-th group of index data with the same type according to the mapping positions of the mapping points of the j-th group of index data with the same type in the data mapping image;

determining the data fluctuation characteristics and the data co-occurrence characteristics of the jth group of index data with the same kind as the second dimension characteristics represented by the jth group of index data with the same kind; and the second dimensional features represented by the j-th group of index data with the same type are j-th second dimensional features in the M second dimensional features.

In one possible embodiment, the prediction model is a gradient boosting tree model;

inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result for the user data of the target user, including:

inputting the N first dimension features and the M second dimension features into the gradient boosting tree model, performing feature division on the N first dimension features and the M second dimension features by each decision tree included in the gradient boosting tree model, and determining the leaf node into which the N first dimension features and the M second dimension features are divided in each decision tree;

and determining a prediction result of the user data for the target user according to the values of the leaf nodes reached.
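The tree-ensemble prediction described above can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the applicant's implementation: the trees are hypothetical hand-written depth-1 trees rather than trained ones, the names `Stump` and `predict_true_probability` are invented for this example, and the use of a sigmoid to turn the summed leaf values into a probability is an assumption (it is the usual choice for boosted trees trained on a binary log-loss objective). The feature vector `x` stands for the N first dimension features followed by the M second dimension features.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """Hypothetical depth-1 decision tree: one split and two leaf values."""
    feature: int         # index into the concatenated feature vector
    threshold: float
    left_value: float    # leaf value when x[feature] <= threshold
    right_value: float   # leaf value otherwise

    def leaf_value(self, x):
        # "Feature division": route the sample to one of the two leaves.
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def predict_true_probability(trees, x, base_score=0.0):
    """Sum the leaf values reached in every tree, then apply a sigmoid (assumed link)."""
    raw = base_score + sum(tree.leaf_value(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Illustrative, untrained trees over a 3-element feature vector.
trees = [Stump(0, 1.0, 0.8, -0.8), Stump(2, 0.5, 0.4, -0.6)]
p_plausible = predict_true_probability(trees, [0.2, 0.0, 0.1])
p_suspicious = predict_true_probability(trees, [2.0, 0.0, 0.9])
```

In a real system the trees, thresholds and leaf values would come from training on labelled user data sets; only the routing-and-summing structure is shown here.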

In one possible embodiment, there are multiple target users; the method further comprises:

determining, according to the data true probability of the user data set of each of the multiple target users, the target users whose data true probability falls within a preset probability interval;

extracting a target number of target users from the target users within the preset probability interval;

and sending the user data sets of the target number of target users to a supervision platform, so that the supervision platform performs anomaly analysis on those user data sets.

In one aspect, an embodiment of the present application provides a data processing apparatus based on a prediction model, where the apparatus includes:

the acquisition module is used for acquiring a user data set of a target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

the generation module is used for generating a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

a determining module, configured to determine first dimension features of the target user according to mapping points of each group of user data in the data mapping image, respectively, to obtain N first dimension features;

the determining module is configured to determine second dimensional features of the target user according to mapping points of index data of the same type in the N groups of user data in the data mapping image, respectively, to obtain M second dimensional features;

the input module is used for inputting the N first dimension characteristics and the M second dimension characteristics into a prediction model to obtain a prediction result of the user data of the target user; the prediction result is used for indicating the data true probability of the user data set aiming at the target user.

In one aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute some or all of the steps in the method.

In one aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, are used to perform some or all of the steps of the above method.

In the embodiment of the application, a user data set of a target user is obtained, a data mapping image of the target user is generated according to the user data set of the target user, N first dimension features of the target user are determined according to each group of user data mapping points, M second dimension features of the target user are determined according to index data mapping points of the same type in N groups of user data, the N first dimension features and the M second dimension features are input into a prediction model, a prediction result for the user data of the target user is obtained, and the prediction result is used for indicating the data true probability of the user data set for the target user. By implementing the method provided by the embodiment of the application, the data mapping image can be generated based on the user data set, the user data distribution condition of the target user can be determined through the data mapping image, the data true probability of the user data set aiming at the target user can be obtained based on the mapping point and the prediction model in the data mapping image so as to judge the authenticity of the data, and the target user can be sampled based on the data true probability so as to improve the reliability of the analysis result aiming at the user data.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a data processing method based on a prediction model according to an embodiment of the present application;

Fig. 2a is a schematic diagram of a data mapping image according to an embodiment of the present application;

Fig. 2b is a schematic diagram of a data mapping image according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a data processing method based on a prediction model according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a data mapping area according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a data processing apparatus based on a prediction model according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The data processing method based on a prediction model provided by the embodiments of the application is implemented in an electronic device, which may be a terminal device or a server. The terminal device may be a smartphone, a tablet computer, a notebook computer, a desktop computer and the like. The server may be an independent server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms, but is not limited thereto. The application also relates to blockchain technology: the electronic device may write related data, such as the prediction result for the user data of a target user or the user data itself (for example, the medical data of a target patient), into a blockchain, so that required information, such as the data true probability of the target user's data set, can later be retrieved from the blockchain.

In some embodiments, the electronic device may execute the data processing method based on a prediction model according to actual business requirements in order to judge data authenticity. The technical solution of the application can be applied to any data analysis scenario, such as sampling data and using the extracted data to draw conclusions about all the data, or analyzing the data of a population to identify suspicious individuals. For example, in a scenario of sampling target patients, the electronic device can generate a data mapping image for a target patient from the medical data set of that patient, predict the data true probability of the medical data set from the mapping points in the data mapping image, and subsequently sample target patients according to the data true probability. This reduces the chance of missing falsified medical data, makes medical data suspected of problems easier to find, and improves the reliability of the data analysis results.

It should be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.

The scheme provided by the embodiment of the application relates to an artificial intelligence technology, and is specifically explained by the following embodiment:

Based on the above description, the present application provides a data processing method based on a prediction model, which may be executed by the electronic device mentioned above. As shown in fig. 1, the flow of the data processing method based on a prediction model according to an embodiment of the present application may include the following steps:

S101, acquiring a user data set of a target user; the user data set comprises N groups of user data, and each group of user data in the N groups of user data comprises M index data.

In some embodiments, the target user may be any user whose data needs to be analyzed, such as any user to be sampled. The user data set includes multiple types of index data that characterize the target user. The N groups of user data in the user data set may represent the characteristics of the target user at N different time nodes. The M index data within one group are of M different types, and every group contains the same M types. N and M are both positive integers.

For example, when the target user is a target patient, the user data set is a medical data set of the target patient, which may consist of index data characterizing the patient's blood pressure. The N groups of user data may then represent the patient's blood pressure at N different time nodes. Each group of medical data may include index data such as a first systolic pressure, a first diastolic pressure, a second systolic pressure, a second diastolic pressure, an average systolic pressure, and an average diastolic pressure from one blood pressure measurement. The first group of medical data represents the data obtained from the blood pressure measurement started at the first time node, and the Nth group represents the data obtained from the measurement started at the Nth time node.

S102, generating a data mapping image of the target user according to the user data set of the target user.

The data mapping image comprises data mapping points corresponding to M index data in each group of user data.

In a possible implementation manner, the electronic device may generate the data mapping image of the target user as follows: construct a mapping image region for each of the M kinds of index data; map each index data in each group of user data into its corresponding mapping image region according to the mapping relationship between the index data and the region, obtaining M mapped mapping image regions; and determine the data mapping image from the M mapped regions. One index data corresponds to one mapping image region, and the regions corresponding to different index data do not overlap. The data mapping image visually represents the distribution of each index data in the user data set, so that features of several dimensions can be extracted from it and used to determine the probability that the user data set is real data. Because different index data represent different characteristics of the target user, they lie in different feature spaces. Mapping each index data through its own mapping relationship places all of them in the same relation space, so that one common standard can be used to judge whether the index data are real, which can improve the accuracy of the predicted data true probability.

In some embodiments, each index data has a corresponding mapping point in the data mapping image. Mapping points of index data of the same type lie in the same mapped image region, and if two index data of the same type also have the same index value, their mapping points overlap. It follows that one group of user data has one mapping point in each of the M mapped image regions, and each mapped image region contains the mapping points of its index type from every group of user data, that is, N mapping points.

For example, fig. 2a and fig. 2b are schematic diagrams of a data mapping image provided in an embodiment of the present application. Suppose the user data set of a target user has 3 groups of user data, each containing 6 index data. The electronic device constructs a mapping image region for each index data and obtains an initial data mapping image from the 6 regions, as shown in fig. 2a; each index data has a mapping relationship with its region. The index data in each group of user data are then mapped into the corresponding regions according to that mapping relationship, giving 6 mapped regions from which the data mapping image is obtained, as shown in fig. 2b. The data mapping image contains the mapping point of every index data, and each region contains the mapping points of all index data of the same kind in the user data set.
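The construction above can be sketched in code as follows. This is only an illustration: the application does not specify the shape of a mapping image region or the form of the mapping relationship, so the sketch assumes that each index type gets its own non-overlapping vertical strip of unit width and that an index value is scaled linearly between hypothetical lower and upper bounds. The names `IndexRegion`, `map_value` and `build_mapping_image`, and the numeric bounds, are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRegion:
    """Hypothetical mapping image region for one index type (a unit-width strip)."""
    x_offset: float   # left edge of the strip; strips of different types do not overlap
    low: float        # assumed smallest expected index value
    high: float       # assumed largest expected index value

    def map_value(self, value):
        # Linear scaling into [0, 1], clamped so the point stays inside the region.
        t = (value - self.low) / (self.high - self.low)
        t = min(1.0, max(0.0, t))
        # The mapping point sits on the vertical centre line of the strip.
        return (self.x_offset + 0.5, t)


def build_mapping_image(dataset, bounds):
    """Map N groups of M index values to N x M mapping points.

    dataset: list of N groups, each a list of M index values.
    bounds:  list of M (low, high) pairs, one per index type (assumed known).
    Returns points[i][j], the mapping point of index type j in group i.
    """
    regions = [IndexRegion(float(j), lo, hi) for j, (lo, hi) in enumerate(bounds)]
    return [[regions[j].map_value(v) for j, v in enumerate(group)] for group in dataset]


# Three groups of two index values each (for instance systolic and diastolic pressure).
points = build_mapping_image(
    dataset=[[120, 80], [130, 85], [120, 90]],
    bounds=[(80, 180), (40, 120)],
)
```

With this layout, equal values of the same index type (the two readings of 120 above) land on the same mapping point, which matches the overlap behaviour described for the data mapping image.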

In some embodiments, each mapping point in the data mapping image may have a node attribute indicating which group of user data the mapping point belongs to, which index data it represents, and the specific index value. Optionally, if two or more mapping points overlap, the node attribute may further indicate that the mapping point is an overlapping one, which groups of user data the overlapping mapping points belong to, and their specific index values.

S103, determining the first dimension characteristics of the target user according to the mapping points of each group of user data in the data mapping image respectively to obtain N first dimension characteristics.

The electronic device may determine the first dimension characteristic of a target user according to the mapping point of each group of user data, and the N groups of user data correspond to the N first dimension characteristics. The first dimension characteristic represents a data characteristic within a set of user data.

In one possible embodiment, the electronic device determines the first dimension characteristic in the same way for every group of user data, so one group (referred to as the target group of user data) is taken as an example. The electronic device may determine a standard value for each index data, map each standard value into the data mapping image according to the mapping relationship between the index data and its mapping image region to obtain a standard mapping point for each index data, and determine the first dimension characteristic of the target group of user data from the positions of the group's mapping points and of the corresponding standard mapping points in the data mapping image.

In some embodiments, the electronic device determines the first dimension characteristic of the target group of user data from the position distance, in the data mapping image, between the mapping point of each index data in the group and the corresponding standard mapping point. Each index data yields one position distance, so the group yields M position distances. The first dimension characteristic may be the sum of the M position distances, their average, a weighted sum of them, their variance, and so on. The standard value of each index data may be set by business personnel based on experience, or may be learned in advance by the electronic device from a large amount of sample user data using big data and machine learning techniques.
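The aggregation options listed above can be sketched as below. The sketch assumes that mapping points are 2-D coordinates and that "position distance" means Euclidean distance, neither of which is fixed by the text; the function names, the `mode` parameter and the `weights` argument are hypothetical.

```python
import math
from statistics import fmean, pvariance


def position_distances(points, standard_points):
    """Distance between each of the M mapping points and its standard mapping point."""
    return [math.dist(p, s) for p, s in zip(points, standard_points)]


def first_dimension_feature(points, standard_points, mode="sum", weights=None):
    """Collapse the M position distances of one group into a single feature."""
    d = position_distances(points, standard_points)
    if mode == "sum":
        return sum(d)
    if mode == "mean":
        return fmean(d)
    if mode == "weighted":
        # weights: hypothetical per-index importances, one per index type
        return sum(w * x for w, x in zip(weights, d))
    if mode == "variance":
        # population variance of the M distances
        return pvariance(d)
    raise ValueError(f"unknown mode: {mode}")
```

Calling `first_dimension_feature` once per group of user data gives the N first dimension features; which aggregation works best is an empirical question that the text leaves open.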

S104, determining second dimension characteristics of the target user according to mapping points of the index data with the same type in the N groups of user data in the data mapping image respectively to obtain M second dimension characteristics.

The electronic device may determine one second dimension characteristic of the target user from each of the M groups of same-type index data, so the M groups of same-type index data correspond to M second dimension characteristics. A second dimension characteristic represents a data characteristic across the groups of user data.

In one possible implementation, the electronic device determines the second dimension characteristic in the same way for every group of same-type index data, so one such group (referred to as the target group of index data) is taken as an example. The electronic device may determine the data fluctuation degree of the target group of index data from the connection distance, in the data mapping image, between the mapping points of the index data in that group, and use the data fluctuation degree as a second dimension characteristic of the target user.
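One possible reading of "connection distance" is the total length of the line that joins the N mapping points of one index type in time order. The sketch below implements that reading; it is an assumption made for illustration, and the function name `fluctuation_degree` is hypothetical.

```python
import math


def fluctuation_degree(points):
    """Total length of the polyline through the mapping points of one index type.

    points: the N mapping points of that index type, ordered by time node.
    Identical readings produce overlapping points and therefore a length of 0.
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Under this reading, a series of identical measurements has zero fluctuation, while readings that swing between time nodes accumulate a longer connection distance. Whether an unusually low or an unusually high value is suspicious is left to the prediction model.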

S105, inputting the N first dimension characteristics and the M second dimension characteristics into a prediction model to obtain a prediction result of user data for a target user; the prediction result is used to indicate a data true probability for the user data set of the target user.

The data true probability represents the authenticity of the target user's data set: the greater the data true probability, the more likely the data is real and the less likely it is falsified; conversely, the smaller the data true probability, the less likely the data is real and the more likely it is falsified.

In one possible embodiment, the prediction model may be a classification model, such as a sigmoid neural network model or a logistic regression model, which classifies the user data set of the target user as real or falsified. The electronic device may input the N first dimension features and the M second dimension features into the classification model, which outputs a prediction result based on them. The prediction result may represent the classification result, that is, whether the data is real, and the data true probability of the target user's data set can be determined from it. The larger the data true probability, the less likely it is that the target user's data set has been falsified; conversely, the smaller the data true probability, the more likely it is that the data set has been falsified.

In some embodiments, after obtaining the data true probability of the target user's data set, the electronic device may use it when sampling target users so that the sampled users are more representative, which in turn makes the results of analyzing the sampled users' data sets more accurate. For example, target users may be grouped into probability intervals according to their data true probability, and users may then be drawn at random from each interval.
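The interval-based sampling just described might look like the sketch below. The interval boundaries, the number of users drawn per interval, the half-open interval convention and the function name `sample_by_probability_interval` are all assumptions chosen for illustration.

```python
import random


def sample_by_probability_interval(probabilities, intervals, per_interval, seed=None):
    """Draw users at random from each probability interval.

    probabilities: dict mapping user id -> data true probability.
    intervals:     list of (low, high) pairs, treated as half-open [low, high);
                   use an upper bound slightly above 1.0 to include p == 1.0.
    per_interval:  number of users to draw from each interval (fewer if the
                   interval holds fewer users).
    Returns a dict mapping each interval to the list of sampled user ids.
    """
    rng = random.Random(seed)
    sampled = {}
    for low, high in intervals:
        # Sort so that a fixed seed gives a reproducible draw.
        bucket = sorted(u for u, p in probabilities.items() if low <= p < high)
        sampled[(low, high)] = rng.sample(bucket, min(per_interval, len(bucket)))
    return sampled
```

For example, drawing more users from a low-probability interval than from a high-probability one would concentrate the supervision platform's attention on data sets that are more likely to be falsified.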

In the embodiment of the application, the electronic device may obtain a user data set of a target user, where the user data set includes N groups of user data, where each group of user data in the N groups of user data includes M kinds of index data, generate a data mapping image of the target user according to the user data set of the target user, determine a first dimension characteristic of the target user according to mapping points of each group of user data in the data mapping image, respectively, obtain N first dimension characteristics, determine a second dimension characteristic of the target user according to mapping points of index data of the same kind in the N groups of user data in the data mapping image, respectively, obtain M second dimension characteristics, input the N first dimension characteristics and the M second dimension characteristics into a prediction model, obtain a prediction result of the user data for the target user, where the prediction result is used to indicate a data true probability of the user data set for the target user, and sampling the target user according to the data true probability of the user data set aiming at the target user. By implementing the method provided by the embodiment of the application, the data mapping image can be generated based on the user data set, the user data distribution condition of the target user can be determined through the data mapping image, the data true probability of the user data set aiming at the target user can be obtained based on the mapping point and the prediction model in the data mapping image so as to judge the authenticity of the data, and the target user can be sampled based on the data true probability so as to improve the reliability of the analysis result aiming at the user data.

Referring to fig. 3, fig. 3 is a flowchart illustrating a data processing method based on a prediction model according to an embodiment of the present application, where the method may be executed by the above-mentioned electronic device. As shown in fig. 3, a flow of the data processing method based on the prediction model in the embodiment of the present application may include the following:

S301, acquiring a user data set of a target user; the user data set comprises N groups of user data, and each group of user data in the N groups of user data comprises M kinds of index data.

S302, generating a data mapping image of the target user according to the user data set of the target user. For specific implementation of steps S301 to S302, refer to the related description of steps S101 to S102.

S303, determining the first dimension characteristics of the target user according to the mapping points of each group of user data in the data mapping image respectively to obtain N first dimension characteristics.

In one possible embodiment, the N groups of user data include an ith group of user data, where i is a positive integer less than or equal to N. To determine the first dimension characteristics of the target user according to the mapping points of each group of user data in the data mapping image and obtain the N first dimension characteristics, the electronic device may specifically: determine the connection region formed by the mapping points, in the data mapping image, of the M index data in the ith group of user data as the data mapping region corresponding to the ith group of user data; obtain a reasonable data mapping region for the target user; and determine the first dimension characteristic represented by the ith group of user data according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region. The first dimension characteristic represented by the ith group of user data is the ith of the N first dimension characteristics. That is, the ith group of user data has M mapping points in the data mapping image, and connecting these M mapping points yields a connection region, which is the data mapping region corresponding to the ith group of user data, as shown in fig. 4.
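As a rough illustration of the connection region described above, the following sketch connects the M mapping points of one group, in index order, into a closed polygon and computes its area with the shoelace formula. The function name `mapping_region_area` and the example coordinates are hypothetical; the application does not prescribe how the region is represented or how the mapping points are laid out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mapping_region_area(points: List[Point]) -> float:
    """Area of the closed polygon obtained by connecting the mapping points in order."""
    if len(points) < 3:
        return 0.0  # fewer than three points do not enclose a region
    twice_area = 0.0
    for k in range(len(points)):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % len(points)]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical mapping points for the M = 4 index data of the ith group.
group_i_points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
region_area = mapping_region_area(group_i_points)
```

The polygon itself (the ordered point list) is what a later comparison against the reasonable data mapping region would operate on; the area is only one simple summary of it.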

In a possible implementation, to determine the first dimension feature represented by the ith group of user data according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region, the electronic device may compare the two regions, determine their region coincidence feature, and take this region coincidence feature as the first dimension feature represented by the ith group of user data. The region coincidence feature represents the degree to which the data mapping region corresponding to the ith group of user data coincides with the reasonable data mapping region: the greater the degree of coincidence, the higher the data authenticity of the ith group of user data; conversely, the smaller the degree of coincidence, the lower the data authenticity. The reasonable data mapping region may be formed by connecting the mapping points, in the data mapping image, of the standard values corresponding to the respective index data. The reasonable data mapping regions for different target users may be the same.
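The region comparison above can be sketched as follows. This is only one possible reading: the degree of coincidence is assumed here to be the intersection-over-union of the two polygons, approximated by sampling a regular grid over their joint bounding box. The names `point_in_polygon` and `region_overlap_feature`, the `resolution` parameter and the example squares are all hypothetical, and the application does not fix a particular overlap measure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray starting at (x, y)."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_overlap_feature(region: List[Point],
                           reasonable_region: List[Point],
                           resolution: int = 200) -> float:
    """Approximate intersection-over-union of two polygons by grid sampling."""
    xs = [p[0] for p in region + reasonable_region]
    ys = [p[1] for p in region + reasonable_region]
    min_x, min_y = min(xs), min(ys)
    step_x = (max(xs) - min_x) / resolution
    step_y = (max(ys) - min_y) / resolution
    both = either = 0
    for i in range(resolution):
        x = min_x + (i + 0.5) * step_x  # sample at cell centers
        for j in range(resolution):
            y = min_y + (j + 0.5) * step_y
            in_a = point_in_polygon(x, y, region)
            in_b = point_in_polygon(x, y, reasonable_region)
            if in_a or in_b:
                either += 1
                if in_a and in_b:
                    both += 1
    return both / either if either else 0.0


# Hypothetical regions: the ith group's region and the reasonable region.
group_region = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
reasonable = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
overlap = region_overlap_feature(group_region, reasonable)
```

For these two squares the exact intersection-over-union is 1/7; the grid approximation approaches that value as `resolution` grows. An exact polygon-clipping routine could replace the sampling if precision matters.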

S304, determining second dimension characteristics of the target user according to mapping points of the index data with the same type in the N groups of user data in the data mapping image respectively to obtain M second dimension characteristics.

In one possible embodiment, the N groups of user data include M groups of index data of the same kind, the M groups include a jth group of index data of the same kind, and j is a positive integer less than or equal to M. To determine the second dimension characteristics of the target user according to the mapping points, in the data mapping image, of the index data of the same kind in the N groups of user data, the electronic device may specifically: determine the data fluctuation characteristic of the jth group of index data of the same kind according to the connection distance of the mapping points of that group in the data mapping image; determine the data co-occurrence characteristic of the jth group according to the mapping positions of that group in the data mapping image; and take the data fluctuation characteristic and the data co-occurrence characteristic of the jth group as the second dimension characteristic represented by the jth group. The second dimension characteristic represented by the jth group of index data of the same kind is the jth of the M second dimension characteristics.

Optionally, the data fluctuation feature represents the degree of data fluctuation of the index data of the same kind. A fluctuation that is either too small or too large tends to indicate lower data authenticity for the jth group of index data of the same kind; that is, index data of the same kind should stay within a reasonable fluctuation range, which means the connection distance of the corresponding mapping points should stay within a reasonable distance. If the connection distance is greater than the reasonable distance, the data fluctuation is too large and the data authenticity is lower. If the connection distance is less than the reasonable distance, the data fluctuation is too small, the data authenticity is lower, and the data may have been falsified; for example, the second group of user data was not actually collected but was copied directly from the first group of user data.
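A minimal sketch of the fluctuation check, assuming the connection distance is the total length of the polyline that joins the N mapping points of one kind of index data in group order, and assuming the "reasonable distance" is given as a lower and an upper bound. The names `connection_distance` and `fluctuation_is_reasonable`, the bounds and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def connection_distance(points: List[Point]) -> float:
    """Total length of the polyline joining consecutive mapping points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def fluctuation_is_reasonable(points: List[Point],
                              min_distance: float,
                              max_distance: float) -> bool:
    """True when the connection distance lies inside the assumed reasonable range."""
    return min_distance <= connection_distance(points) <= max_distance


# Hypothetical mapping points of the jth kind of index data across N = 3 groups.
measured = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
copied = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]  # every group repeats the first one

measured_ok = fluctuation_is_reasonable(measured, min_distance=1.0, max_distance=10.0)
copied_ok = fluctuation_is_reasonable(copied, min_distance=1.0, max_distance=10.0)
```

Here the copied series has a connection distance of zero and falls below the lower bound, which matches the falsification case described in the text.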

Optionally, the data co-occurrence feature represents the number of data co-occurrences within the index data of the same kind, that is, whether index data with the same index value exist, or equivalently whether overlapping mapping points exist. If the data co-occurrence feature indicates that the number of data co-occurrences is greater than or equal to a preset threshold, the data is unreasonable, and the greater the number of co-occurrences, the lower the data authenticity. The preset threshold may be set by the relevant service personnel based on experience. Since the first dimension characteristics and the second dimension characteristics both measure data authenticity, they can be used to predict the data authenticity of the user data set of the target user.
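The co-occurrence count can be sketched as below, assuming it is defined as the number of mapping points that coincide with an earlier mapping point of the same kind of index data. That definition, the names `co_occurrence_count` and `co_occurrence_is_unreasonable`, and the threshold value are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def co_occurrence_count(points: List[Point]) -> int:
    """Number of mapping points that duplicate another point already seen."""
    counts = Counter(points)
    return sum(c - 1 for c in counts.values())


def co_occurrence_is_unreasonable(points: List[Point], threshold: int) -> bool:
    """True when the co-occurrence count reaches the preset threshold."""
    return co_occurrence_count(points) >= threshold


# Hypothetical mapping points of one kind of index data across N = 4 groups.
same_kind_points = [(1.0, 2.0), (1.0, 2.0), (1.5, 2.0), (1.0, 2.0)]
repeats = co_occurrence_count(same_kind_points)
flagged = co_occurrence_is_unreasonable(same_kind_points, threshold=2)
```

Exact equality is used to detect overlapping points; with real-valued measurements, rounding the coordinates before counting may be more appropriate.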

S305, inputting the N first dimension characteristics and the M second dimension characteristics into a prediction model to obtain a prediction result of user data for a target user; the prediction result is used to indicate a data true probability for the user data set of the target user.

In a possible implementation, the prediction model may be a gradient-boosted tree model. To input the N first dimension features and the M second dimension features into the prediction model and obtain the prediction result of the user data for the target user, the electronic device may specifically: input the N first dimension features and the M second dimension features into the gradient-boosted tree model; perform feature division on these features with each decision tree included in the gradient-boosted tree model; determine the leaf node into which the features are divided in each decision tree; and determine the prediction result of the user data for the target user according to the values of those leaf nodes.

Optionally, the electronic device may take the average of the values of the leaf nodes reached as the prediction result of the user data for the target user, that is, the average of these leaf node values is used as the data true probability. In addition, a sample set can be constructed and used to train a gradient-boosted tree model to be trained, yielding the gradient-boosted tree model used for prediction. The sample set may include the N first dimension sample features and M second dimension sample features corresponding to a sample user data set, together with the label (data-true label or data-false label) corresponding to that sample user data set. The training process may construct K decision trees (K is a positive integer), each including a plurality of leaf nodes, and train the gradient-boosted tree model to be trained with the sample set to obtain a trained gradient-boosted tree model, in which the leaf nodes of the K decision trees carry trained values.

For example, the trained gradient-boosted tree model includes two decision trees, decision tree 1 and decision tree 2. In decision tree 1, feature division assigns the N first dimension features and the M second dimension features to node A, whose value is a; in decision tree 2, the features are assigned to node B, whose value is b. The data true probability indicated by the prediction result is then y = (a + b)/2.
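The two-tree example can be traced with a small sketch. It assumes each trained decision tree is stored as a nested dictionary with hypothetical keys `feature`, `threshold`, `left` and `right`, with plain numbers as leaf values; the tree contents and the feature vector are made up for illustration. The averaging step follows the description in this embodiment, whereas common gradient boosting libraries typically sum the tree outputs and apply a link function instead.

```python
from typing import Dict, List, Union

Tree = Union[float, Dict]


def leaf_value(tree: Tree, features: List[float]) -> float:
    """Walk one decision tree and return the value of the leaf the features fall into."""
    node = tree
    while isinstance(node, dict):
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return float(node)


def data_true_probability(trees: List[Tree], features: List[float]) -> float:
    """Average of the leaf values reached in every tree, as described in the text."""
    values = [leaf_value(tree, features) for tree in trees]
    return sum(values) / len(values)


# Hypothetical trained trees; leaf values stand in for the trained node values.
tree_1 = {"feature": 0, "threshold": 0.5, "left": 0.25, "right": 0.75}
tree_2 = {"feature": 2, "threshold": 3.0, "left": 0.5, "right": 0.125}

# N = 2 first dimension features followed by M = 1 second dimension feature.
features = [0.8, 0.6, 1.0]

a = leaf_value(tree_1, features)  # node A in decision tree 1
b = leaf_value(tree_2, features)  # node B in decision tree 2
y = data_true_probability([tree_1, tree_2], features)  # (a + b) / 2
```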

S306, when a plurality of target users exist, extracting a target number of target users from the plurality of target users according to the data true probability of the target user data set of each target user in the plurality of target users.

In a possible embodiment, when there are a plurality of target users, to extract a target number of target users from them according to the data true probability of the user data set of each target user, the electronic device may specifically determine, according to those data true probabilities, the target users whose data true probability falls within a preset probability interval, and extract the target number of target users from the target users within the preset probability interval. There may be one or more preset probability intervals, and the preset probability intervals and the target number may be set by the relevant service personnel based on experience. Optionally, the electronic device may extract the target number of target users from the target users within a preset probability interval by random sampling. By determining the target users within each preset probability interval and extracting randomly from them, target users at different levels are sampled, which realizes stratified sampling and improves the representativeness of the extracted target users.

For example, there are 3 preset probability intervals, namely [X < X1], [X1 < X < X2] and [X2 < X < X3], where X represents the data true probability. The target users in each preset probability interval are determined according to the data true probability of the user data set of each of the plurality of target users, and 5 target users are randomly extracted from the target users in each preset probability interval, giving 15 extracted target users in total.
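The stratified extraction in this example can be sketched as follows. The function name `stratified_sample`, the user identifiers, the interval bounds and the half-open treatment of interval boundaries are assumptions for illustration; the application leaves the intervals and the target number to the service personnel.

```python
import random
from typing import Dict, List, Optional, Tuple


def stratified_sample(probabilities: Dict[str, float],
                      intervals: List[Tuple[float, float]],
                      per_interval: int,
                      seed: Optional[int] = None) -> List[str]:
    """Randomly draw up to `per_interval` users from each probability interval."""
    rng = random.Random(seed)
    sampled: List[str] = []
    for low, high in intervals:
        # Assumption: each interval is treated as half-open, low <= p < high.
        members = sorted(u for u, p in probabilities.items() if low <= p < high)
        sampled.extend(rng.sample(members, min(per_interval, len(members))))
    return sampled


# Hypothetical data true probabilities for 30 target users.
probabilities = {f"user_{i:02d}": i / 30 for i in range(30)}
intervals = [(0.0, 0.3), (0.3, 0.6), (0.6, 1.01)]  # 3 preset probability intervals

extracted = stratified_sample(probabilities, intervals, per_interval=5, seed=0)
```

With 5 users drawn from each of the 3 intervals, `extracted` holds 15 distinct users, mirroring the example above. If an interval contains fewer users than requested, all of them are taken.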

S307, the user data sets of the target users with the target number are sent to a supervision platform, so that the supervision platform conducts anomaly analysis on the user data sets of the target users with the target number.

In some embodiments, the electronic device may send the extracted target number of target users and their user data sets to a supervision platform (such as a relevant medical supervision platform), and the supervision platform performs an anomaly analysis on the user data sets to obtain a sampling analysis result, which may indicate whether the user data sets of the target users have problems.

In the embodiment of the application, the electronic device may obtain a user data set of a target user, where the user data set includes N groups of user data and each group of user data includes M kinds of index data, and generate a data mapping image of the target user according to the user data set. The electronic device determines a first dimension characteristic of the target user according to the mapping points of each group of user data in the data mapping image, obtaining N first dimension characteristics, and determines a second dimension characteristic of the target user according to the mapping points, in the data mapping image, of the index data of the same kind in the N groups of user data, obtaining M second dimension characteristics. The N first dimension characteristics and the M second dimension characteristics are input into a prediction model to obtain a prediction result of the user data for the target user, where the prediction result indicates the data true probability of the user data set for the target user. When there are a plurality of target users, a target number of target users are extracted from them according to the data true probability of the user data set of each target user, and the user data sets of the extracted target users are sent to the supervision platform, so that the supervision platform performs anomaly analysis on these user data sets. By implementing the method provided by the embodiment of the application, a data mapping image can be generated based on the user data set, the distribution of the target user's data can be determined through the data mapping image, the data true probability of the user data set for the target user can be obtained based on the mapping points in the data mapping image and the prediction model so as to judge the authenticity of the data, and the target users can be sampled based on the data true probability so as to improve the reliability of the analysis result for the user data.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a data processing apparatus based on a prediction model according to the present application. It should be noted that the data processing apparatus based on the prediction model shown in fig. 5 is used to execute the methods of the embodiments shown in fig. 1 and fig. 3 of the present application. For convenience of description, only the portions related to the embodiment of the present application are shown; for specific technical details that are not disclosed, refer to the embodiments shown in fig. 1 and fig. 3 of the present application. The prediction model-based data processing apparatus 500 may include an obtaining module 501, a generating module 502, a determining module 503 and an input module 504, wherein:

an obtaining module 501, configured to obtain a user data set of a target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

a generating module 502, configured to generate a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

a determining module 503, configured to determine first dimension features of the target user according to mapping points of each group of user data in the data mapping image, respectively, to obtain N first dimension features;

the determining module 503 is configured to determine second dimensional features of the target user according to mapping points of index data of the same type in the N groups of user data in the data mapping image, respectively, to obtain M second dimensional features;

an input module 504, configured to input the N first dimensional features and the M second dimensional features into a prediction model, so as to obtain a prediction result of user data for the target user; the prediction result is used for indicating the data true probability of the user data set aiming at the target user.
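The division of work among modules 501 to 504 can be pictured with the following sketch, in which each module is a plain callable wired into one pipeline. The class name `PredictionApparatus`, its field names and all stub functions are hypothetical placeholders; in particular, the stubs do not implement the data mapping image, the dimension features or the prediction model of the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

UserDataSet = Sequence[Sequence[float]]  # N groups, each with M index values


@dataclass
class PredictionApparatus:
    generate_image: Callable[[UserDataSet], object]             # generating module 502
    first_dimension_features: Callable[[object], List[float]]   # determining module 503
    second_dimension_features: Callable[[object], List[float]]  # determining module 503
    predict: Callable[[List[float]], float]                     # input module 504

    def process(self, user_data_set: UserDataSet) -> float:
        """Run the pipeline on a data set supplied by the obtaining module 501."""
        image = self.generate_image(user_data_set)
        features = (self.first_dimension_features(image)
                    + self.second_dimension_features(image))
        return self.predict(features)


# Placeholder stubs, for wiring only.
def stub_image(data: UserDataSet) -> List[List[float]]:
    return [list(group) for group in data]


def stub_first(image: List[List[float]]) -> List[float]:
    return [sum(group) / len(group) for group in image]  # one value per group (N)


def stub_second(image: List[List[float]]) -> List[float]:
    return [max(col) - min(col) for col in zip(*image)]  # one value per index kind (M)


def stub_predict(features: List[float]) -> float:
    return 0.5  # constant placeholder for a data true probability


apparatus = PredictionApparatus(stub_image, stub_first, stub_second, stub_predict)
result = apparatus.process([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```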

In a possible implementation, the generating module 502, when configured to generate the data mapping image of the target user according to the user data set of the target user, is specifically configured to:

In a possible embodiment, the N groups of user data include an ith group of user data, where i is a positive integer less than or equal to N;

the determining module 503, when configured to determine the first dimension characteristic of the target user according to the mapping points of each group of user data in the data mapping image, is specifically configured to:

determining a connection area of the mapping points, in the data mapping image, of the M index data in the ith group of user data as the data mapping area corresponding to the ith group of user data;

acquiring a reasonable data mapping area for a target user;

In a possible embodiment, the determining module 503, when configured to determine the first dimension characteristic represented by the ith group of user data according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region, is specifically configured to:

In a possible embodiment, the N groups of user data include M groups of index data with the same type, the M groups of index data with the same type include a jth group of index data with the same type, and j is a positive integer less than or equal to M;

the determining module 503, when configured to determine the second dimension characteristic of the target user according to mapping points of the same kind of index data in the N groups of user data in the data mapping image, is specifically configured to:

In one possible embodiment, the predictive model is a gradient-boosted tree model;

the input module 504 is specifically configured to, when the input module is configured to input the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of the user data for the target user:

In one possible embodiment, the target user is a plurality of users; the input module 504 is further configured to:

In the embodiment of the application, the obtaining module obtains a user data set of a target user, where the user data set comprises N groups of user data, each group of user data comprises M index data, and both N and M are positive integers. The generating module generates a data mapping image of the target user according to the user data set of the target user, where the data mapping image comprises data mapping points corresponding to the M index data in each group of user data. The determining module determines first dimension characteristics of the target user according to the mapping points of each group of user data in the data mapping image to obtain N first dimension characteristics, and determines second dimension characteristics of the target user according to the mapping points, in the data mapping image, of the index data of the same kind in the N groups of user data to obtain M second dimension characteristics. The input module inputs the N first dimension characteristics and the M second dimension characteristics into the prediction model to obtain a prediction result of the user data for the target user; the prediction result indicates the data true probability of the user data set for the target user. By implementing the apparatus provided by the embodiment of the application, a data mapping image can be generated based on a user data set, the distribution of the target user's data can be determined through the data mapping image, the data true probability of the user data set for the target user can be obtained based on the mapping points in the data mapping image and the prediction model so as to judge the authenticity of the data, and the target user can then be sampled based on the data true probability so as to improve the reliability of the analysis result for the user data.

Each functional module in the embodiments of the present application may be integrated into one module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module may be implemented in a form of hardware, or may be implemented in a form of software functional module, which is not limited in this application.

Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 6, the electronic device 600 includes at least one processor 601 and a memory 602. Optionally, the electronic device may further include a network interface. Data can be exchanged among the processor 601, the memory 602 and the network interface. The network interface is controlled by the processor 601 to send and receive messages, the memory 602 is used to store a computer program comprising program instructions, and the processor 601 is used to execute the program instructions stored in the memory 602. The processor 601 is configured to call the program instructions to perform the above method.

The memory 602 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory 602 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 602 may also comprise a combination of memories of the kind described above.

The processor 601 may be a Central Processing Unit (CPU). In one embodiment, the processor 601 may also be a Graphics Processing Unit (GPU). The processor 601 may also be a combination of a CPU and a GPU.

In one possible embodiment, the memory 602 is used for storing program instructions, and the processor 601 can call the program instructions to execute the following steps:

In a possible implementation, the processor 601, when configured to generate a data mapping image of a target user according to the user data set of the target user, is specifically configured to:

when the processor 601 is configured to determine the first dimension feature of the target user according to the mapping points of each group of user data in the data mapping image, specifically:

acquiring a reasonable data mapping area for a target user;

In a possible embodiment, when the processor 601 is configured to determine the first dimension characteristic represented by the ith group of user data according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region, specifically:

when the processor 601 is configured to determine the second dimension feature of the target user according to mapping points of index data of the same kind in the N sets of user data in the data mapping image, specifically:

when the processor 601 is configured to input the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result for the user data of the target user, specifically:

In one possible embodiment, the target user is a plurality of users; the processor 601 is further configured to:

In a specific implementation, the apparatus, the processor 601, the memory 602, and the like described in the embodiments of the present application may perform the implementation described in the above method embodiments, and may also perform the implementation described in the embodiments of the present application, which is not described herein again.

Also provided in embodiments of the present application is a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform some or all of the steps performed in the above-described method embodiments. Optionally, the computer storage medium may be volatile or nonvolatile. The computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
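The cryptographic linking of data blocks can be illustrated with a toy hash chain in which every block stores the SHA-256 hash of its predecessor. This sketch is an assumption-laden simplification: it has no network, consensus mechanism or distributed storage, and the block fields, function names and stored records (such as `data_true_probability`) are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(block: Dict) -> str:
    """SHA-256 over the block's index, data and predecessor hash."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "data", "prev_hash")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], data: Dict) -> Dict:
    """Create a block linked to the current tail of the chain and append it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain: List[Dict]) -> bool:
    """Check every stored hash and every link to the preceding block."""
    for k, block in enumerate(chain):
        expected_prev = chain[k - 1]["hash"] if k else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return True


# Hypothetical records: data true probabilities written to the chain.
chain: List[Dict] = []
append_block(chain, {"user": "u001", "data_true_probability": 0.625})
append_block(chain, {"user": "u002", "data_true_probability": 0.31})
valid_before = chain_is_valid(chain)
```

Changing the data of an earlier block afterwards makes `chain_is_valid` return `False`, because the stored hash no longer matches the block contents.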

Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer storage medium; the computer storage medium may be a computer-readable storage medium, and when executed, the program may carry out the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the present disclosure has been described with reference to particular embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure.

Claims

1. A method of data processing based on a predictive model, the method comprising:

2. The method of claim 1, wherein generating a data mapping image of a target user from the user data set of the target user comprises:

3. The method of claim 1, wherein the N sets of user data include an ith set of user data, i being a positive integer less than or equal to N;

acquiring a reasonable data mapping area for a target user;

4. The method according to claim 3, wherein the determining the first dimension characteristic represented by the ith group of user data according to the data mapping region corresponding to the ith group of user data and the reasonable data mapping region comprises:

5. The method according to claim 1, wherein the N groups of user data include M groups of index data of the same kind, the M groups of index data of the same kind include a jth group of index data of the same kind, and j is a positive integer less than or equal to M;

6. The method of claim 1, wherein the predictive model is a gradient-boosted tree model;

7. The method of claim 1, wherein the target user is a plurality; the method further comprises the following steps:

8. A predictive model-based data processing apparatus, the apparatus comprising:

the determining module is further configured to determine second dimensional features of the target user according to mapping points of index data of the same type in the N groups of user data in the data mapping image, respectively, to obtain M second dimensional features;

9. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and wherein the processor is configured to invoke the program instructions to perform the method of any of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.