CN113723524B

CN113723524B - Data processing method based on prediction model, related equipment and medium

Info

Publication number: CN113723524B
Application number: CN202111017641.4A
Authority: CN
Inventors: 钟明峰
Original assignee: Shenzhen Ping An Smart Healthcare Technology Co ltd
Current assignee: Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2024-05-17
Anticipated expiration: 2041-08-31
Also published as: CN113723524A

Abstract

The embodiment of the application discloses a data processing method based on a prediction model, related equipment and a medium, which are applied to the technical field of artificial intelligence. The method comprises the following steps: acquiring a user data set of a target user; generating a data mapping image of the target user according to the user data set; determining N first dimension features of the target user according to the mapping points of each group of user data; determining M second dimension features of the target user according to the mapping points of index data of the same type in the N groups of user data; and inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of the user data of the target user, wherein the prediction result indicates the data true probability of the user data set of the target user, that is, the probability that the data is genuine. By adopting the embodiment of the application, the authenticity of the user data can be judged. The application also involves blockchain technology; for example, the data true probability of the user data set of the target user may be written to a blockchain.

Description

Data processing method based on prediction model, related equipment and medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a data processing method based on a prediction model, related equipment and a medium.

Background

Currently, many scenarios require data analysis. However, the data to be analyzed may be falsified, and its authenticity often cannot be reliably determined. For example, in a data sampling scenario with a large amount of data, a small part of the data can be extracted by random sampling, and the analysis result of that small part is used as the analysis result of all the data. For instance, a regulatory platform may randomly extract target patients from institutions in different regions and analyze the medical data of the extracted patients to find medical data that is suspected to be problematic. If data extraction or data analysis is performed indiscriminately, the data to be analyzed may not be sufficiently representative, and the analysis result obtained later may be unreliable. Therefore, how to determine the authenticity of data, and thereby improve the reliability of the analysis result for the data, is an urgent problem to be solved.

Disclosure of Invention

The embodiment of the application provides a data processing method, related equipment and medium based on a prediction model, which can judge the authenticity of data and improve the reliability of an analysis result aiming at the data.

In one aspect, an embodiment of the present application provides a data processing method based on a prediction model, where the method includes:

acquiring a user data set of a target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

Generating a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

Determining first dimension features of the target user according to mapping points of each group of user data in the data mapping image respectively to obtain N first dimension features;

Determining second dimension features of the target user according to mapping points of index data with the same type in the N groups of user data in the data mapping image respectively to obtain M second dimension features;

Inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data aiming at the target user; the prediction result is used to indicate a data true probability of a user data set for the target user.

In a possible implementation manner, the generating the data mapping image of the target user according to the user data set of the target user includes:

Constructing a mapping image area corresponding to each index data in the M index data; each index data corresponds to one mapping image area;

according to the mapping relation between each index data and the mapping image area, each index data in each group of user data is mapped into the corresponding mapping image area to obtain M mapped mapping image areas;

and determining the data mapping image according to the M mapped mapping image areas.

In a possible implementation manner, the N groups of user data include an ith group of user data, where i is a positive integer less than or equal to N;

the determining the first dimension characteristic of the target user according to the mapping points of each group of user data in the data mapping image respectively comprises the following steps:

determining the region obtained by connecting the mapping points, in the data mapping image, of the M index data in the ith group of user data as the data mapping area corresponding to the ith group of user data;

Acquiring a reasonable data mapping area aiming at a target user;

Determining a first dimension characteristic represented by the user data of the ith group according to a data mapping area corresponding to the user data of the ith group and the reasonable data mapping area; the first dimension characteristic characterized by the ith set of user data is the ith one of the N first dimension characteristics.

In a possible implementation manner, the determining the first dimension characteristic characterized by the ith group of user data according to the data mapping area corresponding to the ith group of user data and the reasonable data mapping area includes:

Comparing the data mapping area corresponding to the ith group of user data with the reasonable data mapping area, and determining the region coincidence feature of the data mapping area corresponding to the ith group of user data and the reasonable data mapping area;

And determining the region coincidence feature of the data mapping area corresponding to the ith group of user data and the reasonable data mapping area as the first dimension feature represented by the ith group of user data.

In a possible implementation manner, the N groups of user data include M groups of index data of the same type, where the M groups of index data of the same type include a jth group of index data of the same type, and j is a positive integer less than or equal to M;

the determining the second dimension characteristic of the target user according to the mapping points of the index data with the same kind in the N groups of user data in the data mapping image comprises the following steps:

determining the data fluctuation feature of the jth group of index data of the same type according to the connection distance of the mapping points of the jth group of index data of the same type in the data mapping image;

determining the data co-occurrence feature of the jth group of index data of the same type according to the mapping positions of the mapping points of the jth group of index data of the same type in the data mapping image;

Determining the data fluctuation feature and the data co-occurrence feature of the jth group of index data of the same type as the second dimension feature represented by the jth group of index data of the same type; the second dimension feature represented by the jth group of index data of the same type is the jth second dimension feature in the M second dimension features.
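The text does not define how the data co-occurrence feature is computed from the mapping positions. As a minimal sketch, assuming it is taken to be the share of mapping points of one index type that coincide with at least one other mapping point of that type, it could look as follows. The function name `co_occurrence_ratio` and the rounding precision are hypothetical.

```python
from collections import Counter


def co_occurrence_ratio(points, precision=6):
    """Share of mapping points that share their position with another point.

    `points` holds the (x, y) mapping points of one index type across the
    N groups of user data. Rounding to `precision` decimals is an assumed
    way of deciding that two positions coincide.
    """
    if not points:
        return 0.0
    counts = Counter((round(x, precision), round(y, precision)) for x, y in points)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(points)
```

Under this assumption, a value near 1 means that almost every measurement of this index repeats an earlier value, which is one possible signal of copied data.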

In one possible implementation, the prediction model is a gradient boosting tree model;

Inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data for the target user, including:

Inputting the N first dimension features and the M second dimension features into the gradient boosting tree model, performing feature division on the N first dimension features and the M second dimension features by each decision tree included in the gradient boosting tree model, and determining the leaf node into which the N first dimension features and the M second dimension features are divided in each decision tree;

and determining a prediction result of the user data aiming at the target user according to the numerical values of the leaf nodes into which the features are divided.
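The tree-based prediction described above can be sketched as follows. This is an illustration only: the dictionary layout of a tree, the function names, and the choice to sum the leaf values and pass the sum through a sigmoid (the usual convention for binary gradient boosting) are assumptions, not details given in the text.

```python
import math

# Hypothetical tree layout: a leaf is {"leaf": value}; an internal node is
# {"feature": index, "threshold": t, "left": subtree, "right": subtree}.


def leaf_value(tree, features):
    """Route the feature vector through one decision tree and return the leaf value."""
    node = tree
    while "leaf" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def predict_probability(trees, features, base_score=0.0, learning_rate=1.0):
    """Sum the leaf values of all trees and squash the sum into a probability."""
    raw = base_score + learning_rate * sum(leaf_value(t, features) for t in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Usage: features = N first dimension features followed by M second dimension features.
example_trees = [
    {"feature": 0, "threshold": 10.0, "left": {"leaf": 1.0}, "right": {"leaf": -1.0}},
    {"feature": 1, "threshold": 5.0, "left": {"leaf": 0.5}, "right": {"leaf": -0.5}},
]
example_probability = predict_probability(example_trees, [3.0, 8.0])
```

In practice the trees and leaf values would come from training on labelled sample data rather than being written by hand.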

In one possible implementation, there are a plurality of target users; the method further comprises the steps of:

Determining target users with data true probabilities within a preset probability interval according to the data true probabilities of the user data sets of each target user in the plurality of target users;

extracting target number of target users from the target users in the preset probability interval;

And sending the user data sets of the target number of target users to a supervision platform so that the supervision platform performs anomaly analysis on the user data sets of the target number of target users.

In one aspect, an embodiment of the present application provides a data processing apparatus based on a prediction model, including:

The acquisition module is used for acquiring a user data set of the target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

The generation module is used for generating a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

The determining module is used for determining first dimension characteristics of the target user according to the mapping points of each group of user data in the data mapping image respectively to obtain N first dimension characteristics;

The determining module is further used for determining second dimension features of the target user according to mapping points of index data with the same type in the N groups of user data in the data mapping image respectively to obtain M second dimension features;

The input module is used for inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data aiming at the target user; the prediction result is used to indicate a data true probability of a user data set for the target user.

In one aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory is configured to store a computer program including program instructions, and the processor is configured to invoke the program instructions to perform some or all of the steps in the above method.

In one aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions for performing part or all of the steps of the above method when executed by a processor.

In the embodiment of the application, a user data set of a target user is acquired, and a data mapping image of the target user is generated according to the user data set. N first dimension features of the target user are determined according to the mapping points of each group of user data, and M second dimension features are determined according to the mapping points of index data of the same type in the N groups of user data. The N first dimension features and the M second dimension features are input into a prediction model to obtain a prediction result of the user data of the target user, where the prediction result indicates the data true probability of the user data set of the target user. By implementing the method provided by the embodiment of the application, a data mapping image can be generated based on the user data set, and the distribution of the target user's data can be determined from it; the data true probability of the user data set can be obtained based on the mapping points in the data mapping image and the prediction model, so as to judge the authenticity of the data; and target users can be sampled based on the data true probability, so as to improve the reliability of the analysis result of the user data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a data processing method based on a prediction model according to an embodiment of the present application;

FIG. 2a is a schematic diagram of a data mapping image according to an embodiment of the present application;

FIG. 2b is a schematic diagram of a data mapping image according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of a data processing method based on a prediction model according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a data mapping area according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a data processing apparatus based on a prediction model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.

The data processing method based on the prediction model provided by the embodiment of the application is implemented in an electronic device, which can be a terminal device or a server. The terminal device can be a smart phone, a tablet computer, a notebook computer, a desktop computer and the like. The server may be an independent server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms, but is not limited thereto. The present application also relates to blockchain technology: the electronic device can write related data, such as the prediction result of the user data for the target user or the user data itself (for example, medical data of a target patient), into a blockchain, so that the electronic device can later obtain required information from the blockchain, such as the data true probability of the user data set for the target user.

In some embodiments, the electronic device may execute the data processing method based on the prediction model according to actual service requirements, so as to determine the authenticity of the data. The technical scheme of the application can be applied to any data analysis scenario, such as a scenario in which data is sampled and the analysis result of the extracted data is used to infer the result for all data, or a scenario in which personnel travel data is analyzed to identify suspicious persons. For example, the technical scheme can be applied to sampling target patients. The electronic device can generate a data mapping image for a target patient according to the medical data set of the target patient to be sampled, and predict the data true probability of that medical data set according to the mapping points in the data mapping image. The electronic device can then sample the target patients in combination with the data true probability, which reduces the chance of missing falsified medical data, makes suspected problematic medical data easier to find, and improves the reliability of the data analysis result.

It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided by the embodiment of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.

The scheme provided by the embodiment of the application relates to an artificial intelligence technology, and is specifically described by the following embodiments:

Based on the above description, the embodiments of the present application provide a data processing method based on a prediction model, which can be performed by the above-mentioned electronic device. As shown in fig. 1, the flow of the data processing method based on the prediction model according to the embodiment of the present application may include the following steps:

S101, acquiring a user data set of a target user; the set of user data includes N sets of user data, each set of user data in the N sets of user data including M index data.

In some embodiments, the target user may be any user whose data needs to be analyzed, for example any user to be sampled. The user data set includes a plurality of index data that can be used to characterize the target user. The N groups of user data in the user data set may represent features of the target user at N different time nodes. The M index data within one group are of different types, while different groups contain the same M types of index data. N and M are positive integers.

For example, when the target user is a target patient, the user data set is the medical data set of the target patient. The medical data set may consist of a plurality of index data characterizing the blood pressure of the target patient, and the N groups of user data may represent the blood pressure characteristics of the target patient at N different time nodes. Each group of medical data may include index data obtained in one blood pressure measurement, such as a first systolic pressure, a first diastolic pressure, a second systolic pressure, a second diastolic pressure, an average systolic pressure and an average diastolic pressure. The first group of medical data may represent the data obtained by the blood pressure measurement started at the first time node, and the Nth group of medical data may represent the data obtained by the blood pressure measurement started at the Nth time node.

S102, generating a data mapping image of the target user according to the user data set of the target user.

The data mapping image comprises data mapping points corresponding to M index data in each group of user data.

In a possible implementation manner, the electronic device may specifically construct a mapping image area corresponding to each index data in the M index data according to the user data set of the target user, map each index data in each group of user data into a corresponding mapping image area according to a mapping relationship between each index data and the mapping image area, obtain M mapped mapping image areas, and determine a data mapping image according to the M mapped mapping image areas. Wherein, one index data corresponds to one mapping image area, and the mapping image areas corresponding to different index data are not overlapped. The distribution condition of each index data in the user data set of the target user can be visually represented through the data mapping image, and then various dimension characteristics can be analyzed from the data mapping image to be used for determining the probability that the user data set of the target user is real data. And because different index data represent different characteristics of the target user, the different index data are in different characteristic spaces, the different index data are mapped through the mapping relations respectively corresponding to the different index data, and a data mapping image is generated, so that the plurality of index data are in the same relation space, further, the same measurement standard can be adopted to judge whether the plurality of index data are real, and the prediction accuracy of the real probability of the data can be improved.

In some embodiments, each index data has one corresponding mapping point in the data mapping image. Index data of the same type have their mapping points in the same mapped mapping image area, and if index data of the same type also have the same index value, their mapping points overlap. It can be understood that one group of user data has one mapping point in each of the M mapped mapping image areas, and each mapped mapping image area contains, for every group of user data, the mapping point of the index data that has a mapping relationship with that area, that is, N mapping points.

For example, FIG. 2a and FIG. 2b are schematic diagrams of a data mapping image provided by an embodiment of the present application. Suppose the user data set of a target user has 3 groups of user data, and each group includes 6 index data. The electronic device constructs a mapping image area for each index data and obtains an initial data mapping image from the 6 mapping image areas, as shown in FIG. 2a, where each index data has a mapping relationship with its mapping image area. The electronic device then maps the index data in each group of user data to the corresponding mapping image area according to the mapping relationship, obtains 6 mapped mapping image areas, and obtains the data mapping image from them, as shown in FIG. 2b. The data mapping image contains the mapping point of every index data, and each mapped mapping image area contains the mapping points of all index data of the same type in the user data set.
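The construction of the data mapping image can be illustrated with a short sketch. The text does not specify the shape of a mapping image area or the form of the mapping relationship, so the following assumes that each area is a vertical strip and that an index value is scaled linearly into the strip height. `Region`, `build_regions`, `map_point` and `build_mapping_image` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One mapping image area: a vertical strip reserved for one index type."""
    x_left: float
    x_right: float
    value_min: float  # index value mapped to the bottom of the strip
    value_max: float  # index value mapped to the top of the strip
    height: float = 100.0


def build_regions(value_ranges, strip_width=10.0):
    """Create one non-overlapping strip per index type, laid out left to right."""
    return [
        Region(j * strip_width, (j + 1) * strip_width, lo, hi)
        for j, (lo, hi) in enumerate(value_ranges)
    ]


def map_point(region, value):
    """Map one index value to an (x, y) point inside its region (linear, clamped)."""
    ratio = (value - region.value_min) / (region.value_max - region.value_min)
    ratio = min(1.0, max(0.0, ratio))
    x = (region.x_left + region.x_right) / 2.0
    return (x, ratio * region.height)


def build_mapping_image(user_data_set, regions):
    """Return points[i][j]: the mapping point of index j in group i."""
    return [
        [map_point(regions[j], value) for j, value in enumerate(group)]
        for group in user_data_set
    ]
```

With this layout, index data of the same type always land in the same strip, and equal index values produce overlapping mapping points, which matches the behaviour described above.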

In some embodiments, each mapping point in the data mapping image may have a node attribute, which indicates the group of user data to which the mapping point belongs and the specific index value. Optionally, if two or more mapping points overlap, the node attribute may further indicate that the mapping point overlaps with other mapping points, the group of user data to which each overlapping mapping point belongs, and the specific index value of each.
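One way to hold such node attributes is sketched below. The class `MappingNode` and the helper `build_nodes` are hypothetical; the sketch simply merges mapping points that share a position and records which groups and index values they came from.

```python
from dataclasses import dataclass, field


@dataclass
class MappingNode:
    """A mapping point with its node attribute."""
    position: tuple
    index_name: str
    # One (group_number, index_value) entry per mapping point at this position.
    members: list = field(default_factory=list)

    @property
    def overlapped(self):
        """True when two or more mapping points share this position."""
        return len(self.members) > 1


def build_nodes(index_name, positions, values):
    """Merge the mapping points of one index type that fall on the same position."""
    nodes = {}
    for group_number, (pos, value) in enumerate(zip(positions, values), start=1):
        node = nodes.setdefault(pos, MappingNode(pos, index_name))
        node.members.append((group_number, value))
    return list(nodes.values())
```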

S103, determining first dimension features of the target user according to mapping points of each group of user data in the data mapping image, and obtaining N first dimension features.

The electronic device may determine one first dimension feature of the target user according to the mapping points of each group of user data, so that the N groups of user data correspond to N first dimension features. A first dimension feature represents a data feature within one group of user data.

In a possible implementation, the electronic device determines the first dimension feature in the same way for every group of user data, so one group (referred to as the target group of user data) is taken as an example. The electronic device can determine a standard value corresponding to each index data, and map each standard value into the data mapping image according to the mapping relation between the index data and the mapping image area, obtaining a standard mapping point corresponding to each index data. It then determines the first dimension feature corresponding to the target group of user data according to the position information, in the data mapping image, of the mapping points of the target group and the corresponding standard mapping points.

In some embodiments, the electronic device may determine the first dimension feature corresponding to the target group of user data according to the position distance, in the data mapping image, between the mapping point of each index data in the target group and the corresponding standard mapping point. Since each index data yields one position distance, M position distances are obtained for the target group. The first dimension feature may then be the sum of the M position distances, their average, their weighted sum, their variance, and so on. In addition, the standard value corresponding to each index data may be set by relevant business personnel according to empirical values, or may be generated by the electronic device in advance by learning from a large number of sample user data using big data and machine learning techniques.
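The distance-based variants listed above can be sketched as follows. This assumes that the position distance is the Euclidean distance between two points of the data mapping image; the function names and the `mode` parameter are hypothetical.

```python
import math
from statistics import mean, pvariance


def position_distances(group_points, standard_points):
    """Distance between each mapping point and its standard mapping point."""
    return [math.dist(p, s) for p, s in zip(group_points, standard_points)]


def first_dimension_feature(group_points, standard_points, mode="sum", weights=None):
    """Aggregate the M position distances of one group into a single feature."""
    distances = position_distances(group_points, standard_points)
    if mode == "sum":
        return sum(distances)
    if mode == "mean":
        return mean(distances)
    if mode == "weighted":
        if weights is None or len(weights) != len(distances):
            raise ValueError("one weight per index data is required")
        return sum(w * d for w, d in zip(weights, distances))
    if mode == "variance":
        return pvariance(distances)
    raise ValueError(f"unknown mode: {mode}")
```

Calling `first_dimension_feature` once per group of user data yields the N first dimension features.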

S104, determining second dimension features of the target user according to mapping points of index data with the same category in the N groups of user data in the data mapping image, and obtaining M second dimension features.

The electronic device may determine one second dimension feature of the target user according to each of the M groups of index data of the same type, so that the M groups of index data of the same type correspond to M second dimension features. A second dimension feature represents a data feature across the groups of user data in the user data set.

In one possible implementation, the electronic device determines the second dimension feature in the same way for every group of index data of the same type, so one such group (referred to as the target group of index data) is taken as an example. The electronic device may determine the data fluctuation degree of the target group of index data according to the connection distance, in the data mapping image, of the mapping points corresponding to the index data in the target group, and use the data fluctuation degree as a second dimension feature of the target user.
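As an illustration, assuming the connection distance is the length of the polyline that joins the mapping points of one index type in time-node order, the M second dimension features could be computed as follows. The names and the `points[i][j]` layout (group i, index j) are hypothetical.

```python
import math


def connection_distance(points):
    """Length of the polyline joining consecutive mapping points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def second_dimension_features(mapping_image):
    """One fluctuation value per index type.

    `mapping_image[i][j]` is the mapping point of index j in group i, with
    groups ordered by time node.
    """
    m = len(mapping_image[0])
    return [
        connection_distance([group[j] for group in mapping_image])
        for j in range(m)
    ]
```

An index whose value never changes gives a connection distance of 0, while large swings between time nodes give a long polyline.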

S105, inputting N first dimension features and M second dimension features into a prediction model to obtain a prediction result of user data aiming at a target user; the prediction result is used to indicate the data true probability of the user data set for the target user.

The data true probability represents the authenticity of the user data set of the target user. The greater the data true probability, the more likely the data is genuine and the less likely it is falsified; conversely, the smaller the data true probability, the less likely the data is genuine and the more likely it is falsified.

In one possible implementation, the prediction model may be a classification model, such as a sigmoid neural network model or a logistic regression (Logistic Regression) model, which classifies the user data set of the target user as real data or unreal data. The electronic device may input the N first dimension features and the M second dimension features into the classification model; the classification model makes a prediction according to these features and outputs a prediction result. The prediction result may represent the classification result, that is, whether the data is real or not, and the data true probability of the user data set for the target user is determined from the prediction result. The larger the data true probability, the smaller the probability that the user data set of the target user is falsified; conversely, the smaller the data true probability, the larger the probability that the user data set of the target user is falsified.
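For the logistic regression case, the scoring step can be sketched as below. The weights and bias are placeholders that would normally be learned from labelled sample data; they and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def data_true_probability(first_features, second_features, weights, bias):
    """Score the N + M features with a logistic regression model."""
    features = list(first_features) + list(second_features)
    if len(weights) != len(features):
        raise ValueError("one weight per feature is required")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Turn the probability into the two classes named in the text."""
    return "data real" if probability >= threshold else "data unreal"
```

With negative weights on distance-style features, a user data set that lies far from the standard mapping points receives a lower data true probability.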

In some embodiments, after obtaining the data true probability of the user data set for the target user, the electronic device may sample the target users in combination with the data true probability to improve the representativeness of the sampled target users, so that the analysis based on the user data sets of the sampled target users is more accurate. For example, target users in different probability intervals may be determined according to the data true probability, and random extraction may be performed separately within each probability interval, and so on.
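Sampling by probability interval can be sketched as follows. The interval boundaries, the number of users drawn per interval and the function name are hypothetical; the text only states that random extraction is performed within different probability intervals.

```python
import random


def sample_by_interval(probabilities, intervals, per_interval, seed=None):
    """Randomly draw target users from each probability interval.

    `probabilities` maps a user id to the data true probability of that
    user's data set. Each interval is (low, high) and is treated as
    half-open [low, high), except that a probability of exactly 1.0 is
    included in an interval whose upper bound is 1.0.
    """
    rng = random.Random(seed)
    sampled = {}
    for low, high in intervals:
        candidates = sorted(
            uid for uid, p in probabilities.items()
            if low <= p < high or (p == high == 1.0)
        )
        k = min(per_interval, len(candidates))
        sampled[(low, high)] = rng.sample(candidates, k)
    return sampled
```

Drawing more users from the low-probability interval than from the high-probability one is one way to make suspected falsified data less likely to be missed.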

In the embodiment of the application, the electronic device can acquire a user data set of a target user, wherein the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, a data mapping image of the target user is generated according to the user data set of the target user, first dimension features of the target user are respectively determined according to mapping points of each group of user data in the data mapping image, N first dimension features are obtained, second dimension features of the target user are respectively determined according to mapping points of index data with the same type in the N groups of user data in the data mapping image, M second dimension features are obtained, N first dimension features and M second dimension features are input into a prediction model, a prediction result of the user data of the target user is obtained, the prediction result is used for indicating the data true probability of the user data set of the target user, and the target user is sampled according to the data true probability of the user data set of the target user. By implementing the method provided by the embodiment of the application, the data mapping image can be generated based on the user data set, the user data distribution condition of the target user can be determined through the data mapping image, the data real probability of the user data set for the target user can be obtained based on the mapping points in the data mapping image and the prediction model so as to judge the authenticity of the data, and the target user can be sampled based on the data real probability so as to improve the reliability of the analysis result of the user data.

Referring to fig. 3, fig. 3 is a flowchart of a data processing method based on a prediction model according to an embodiment of the present application, where the method may be performed by the above-mentioned electronic device. As shown in fig. 3, the flow of the data processing method based on the prediction model in the embodiment of the present application may include the following steps:

S301, acquiring a user data set of a target user; the set of user data includes N sets of user data, each set of user data in the N sets of user data including M index data.

S302, generating a data mapping image of the target user according to the user data set of the target user. For specific embodiments of steps S301 to S302, refer to the relevant descriptions of steps S101 to S102.

S303, determining first dimension features of the target user according to mapping points of each group of user data in the data mapping image, and obtaining N first dimension features.

In one possible embodiment, the N groups of user data include an i-th group of user data, where i is a positive integer less than or equal to N. To determine the N first dimension features from the mapping points of each group of user data in the data mapping image, the electronic device may: determine the connection area of the mapping points, in the data mapping image, of the M index data in the i-th group of user data as the data mapping area corresponding to the i-th group of user data; acquire a reasonable data mapping area for the target user; and determine the first dimension feature characterized by the i-th group of user data according to the data mapping area corresponding to the i-th group of user data and the reasonable data mapping area. The first dimension feature characterized by the i-th group of user data is the i-th of the N first dimension features. The i-th group of user data has M mapping points in the data mapping image, and connecting these M mapping points yields a connection area, which is the data mapping area corresponding to the i-th group of user data, as shown in fig. 4.

In one possible implementation manner, to determine the first dimension feature characterized by the i-th group of user data, the electronic device may compare the data mapping area corresponding to the i-th group of user data with the reasonable data mapping area, determine the area overlap feature between the two, and take that area overlap feature as the first dimension feature characterized by the i-th group of user data. The area overlap feature represents the degree of overlap between the data mapping area corresponding to the i-th group of user data and the reasonable data mapping area: the greater the degree of overlap, the higher the data authenticity of the i-th group of user data; conversely, the smaller the degree of overlap, the lower the data authenticity of the i-th group of user data. The reasonable data mapping area may be formed by connecting the mapping points, in the data mapping image, of the standard values corresponding to each index data. The reasonable data mapping areas of different target users may be the same.
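The comparison described above can be illustrated with a minimal sketch. The text does not say how the degree of overlap is computed, so the sketch assumes, purely for illustration, that mapping points are 2-D coordinates, that each area is the polygon obtained by joining its points in order, and that overlap is measured as intersection over union, approximated on a sampling grid. The names `point_in_polygon`, `region_overlap` and `steps` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_overlap(region: List[Point], reasonable: List[Point], steps: int = 200) -> float:
    """Approximate intersection-over-union of two polygons (assumed overlap measure)."""
    xs = [x for x, _ in region + reasonable]
    ys = [y for _, y in region + reasonable]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    inter = union = 0
    for a in range(steps):
        for b in range(steps):
            # Sample the centre of each cell of the bounding-box grid.
            p = (x_min + (a + 0.5) * (x_max - x_min) / steps,
                 y_min + (b + 0.5) * (y_max - y_min) / steps)
            in_region = point_in_polygon(p, region)
            in_reasonable = point_in_polygon(p, reasonable)
            inter += in_region and in_reasonable
            union += in_region or in_reasonable
    return inter / union if union else 0.0
```

Under these assumptions, calling `region_overlap` once per group of user data against the same reasonable polygon would give one value per group, that is, N first dimension features. A value near 1 corresponds to a high degree of overlap and a value near 0 to a low one.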

S304, determining second dimension features of the target user according to mapping points of index data with the same category in the N groups of user data in the data mapping image, and obtaining M second dimension features.

In one possible embodiment, the N groups of user data include M groups of index data of the same category, among them a j-th group of index data of the same category, where j is a positive integer less than or equal to M. To determine the second dimension features of the target user from the mapping points, in the data mapping image, of index data of the same category across the N groups of user data, the electronic device may: determine the data fluctuation feature of the j-th group of index data of the same category according to the connection distance of its mapping points in the data mapping image; determine the data co-occurrence feature of the j-th group of index data of the same category according to the mapping positions of its mapping points in the data mapping image; and take the data fluctuation feature and the data co-occurrence feature together as the second dimension feature characterized by the j-th group of index data of the same category. The second dimension feature characterized by the j-th group of index data of the same category is the j-th of the M second dimension features.

Optionally, the data fluctuation feature indicates the degree of data fluctuation of index data of the same category. Both too small and too large a degree of fluctuation tend to lower the data authenticity of the j-th group of index data of the same category. Index data of the same category should lie within a reasonable fluctuation range, that is, the connection distance of the corresponding mapping points should be within a reasonable distance. If the connection distance is greater than the reasonable distance, the fluctuation is too large and the data authenticity is lower. If the connection distance is smaller than the reasonable distance, the fluctuation is too small and the data authenticity is also lower; this may indicate data falsification, for example where the second group of user data was not actually collected but was copied directly from the first group.

Optionally, the data co-occurrence feature indicates the number of data co-occurrences in index data of the same category, that is, whether index data with the same index value exist, or equivalently whether overlapping mapping points exist. If the data co-occurrence feature indicates that the number of data co-occurrences is greater than or equal to a preset threshold, the data is unreasonable, and the more co-occurrences there are, the lower the data authenticity. The preset threshold may be set by the relevant business personnel based on empirical values. The first dimension features and the second dimension features both measure data authenticity, so the data authenticity of the user data set of the target user can be predicted from them.
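A minimal sketch of the two components of a second dimension feature follows. It assumes, for illustration only, that the N mapping points of one index category are 2-D coordinates ordered by group, that the connection distance is the total length of the polyline through them, and that the "reasonable distance" is expressed as a range `[low, high]`. The function names and the parameters `low`, `high` and `threshold` are hypothetical and do not come from the application.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Point = Tuple[float, float]


def connection_distance(points: List[Point]) -> float:
    """Total length of the polyline joining consecutive mapping points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def co_occurrence_count(points: List[Point]) -> int:
    """Number of mapping points that coincide with an earlier mapping point."""
    return sum(count - 1 for count in Counter(points).values())


def second_dimension_feature(
    points: List[Point], low: float, high: float, threshold: int
) -> Dict[str, Union[float, int, bool]]:
    """Fluctuation and co-occurrence components for one index category."""
    distance = connection_distance(points)
    repeats = co_occurrence_count(points)
    return {
        "connection_distance": distance,
        # Assumed reading: fluctuation is reasonable only inside [low, high].
        "fluctuation_reasonable": low <= distance <= high,
        "co_occurrence": repeats,
        # Per the text: unreasonable once the count reaches the preset threshold.
        "co_occurrence_unreasonable": repeats >= threshold,
    }
```

A copied group of user data would show up here in two ways: the repeated point adds nothing to the connection distance, which pushes it toward the lower bound, and it raises the co-occurrence count by one.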

S305, inputting N first dimension features and M second dimension features into a prediction model to obtain a prediction result of user data aiming at a target user; the prediction result is used to indicate the data true probability of the user data set for the target user.

In one possible implementation manner, the prediction model may be a gradient boosting tree model. To obtain the prediction result for the user data of the target user, the electronic device may input the N first dimension features and the M second dimension features into the gradient boosting tree model, classify the N first dimension features and the M second dimension features with each decision tree included in the gradient boosting tree model, determine the leaf node into which the features are classified in each decision tree, and determine the prediction result for the user data of the target user according to the values of those leaf nodes.

Optionally, the electronic device may take the average of the values of the leaf nodes reached as the prediction result for the user data of the target user, that is, the average of the values of the leaf nodes reached is the data true probability. In addition, a sample set can be constructed and used to train the gradient boosting tree model to be trained, so as to obtain the gradient boosting tree model used for prediction. The sample set may include the N first dimension sample features and M second dimension sample features corresponding to a sample user data set, together with the label (data-true label or data-not-true label) corresponding to that sample user data set. The training process may be as follows: K decision trees (K is a positive integer) are constructed, each comprising a plurality of leaf nodes; the gradient boosting tree model to be trained is trained with the sample set to obtain a trained gradient boosting tree model, in which the leaf nodes of the K decision trees have trained values.

For example, suppose the trained gradient boosting tree model comprises two decision trees, decision tree 1 and decision tree 2. In decision tree 1, feature splitting assigns the N first dimension features and the M second dimension features to leaf node a, whose value is A. In decision tree 2, the N first dimension features and the M second dimension features are assigned to leaf node b, whose value is B. The data true probability indicated by the prediction result is then y = (A + B) / 2.
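The two-tree example above can be sketched as follows. The tree encoding (nested dictionaries with `feature`, `threshold`, `left` and `right` keys), the split thresholds and the leaf values are all made up for illustration; only the final averaging step follows the text. Note that conventional gradient boosting implementations usually sum the scaled outputs of the trees rather than averaging them, so this sketch should be read as an instance of the averaging rule described here, not of any particular library.

```python
from typing import Dict, List, Union

# A tree is either a leaf value (float) or a split node (dict); hypothetical encoding.
Tree = Union[float, Dict]


def leaf_value(tree: Tree, features: List[float]) -> float:
    """Walk one decision tree and return the value of the leaf that is reached."""
    while isinstance(tree, dict):
        go_left = features[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


def predict_true_probability(trees: List[Tree], features: List[float]) -> float:
    """Average the leaf values reached in every tree, as in y = (A + B) / 2."""
    values = [leaf_value(tree, features) for tree in trees]
    return sum(values) / len(values)


# Illustrative trees with made-up splits and leaf values.
tree_1 = {"feature": 0, "threshold": 0.5, "left": 0.2, "right": 0.9}
tree_2 = {"feature": 2, "threshold": 1.0, "left": 0.7, "right": 0.3}

# Feature vector: the N first dimension features followed by the M second dimension features.
features = [0.8, 0.1, 0.4]
y = predict_true_probability([tree_1, tree_2], features)
```

With these made-up values, tree 1 reaches the leaf with value A = 0.9 and tree 2 the leaf with value B = 0.7, so `y` is their mean.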

S306, when there are a plurality of target users, extracting a target number of target users from the plurality of target users according to the data true probability of the user data set of each target user in the plurality of target users.

In one possible implementation manner, when there are a plurality of target users, the electronic device may extract the target number of target users as follows: determine, according to the data true probability of the user data set of each target user in the plurality of target users, the target users whose data true probability falls within a preset probability interval, and extract the target number of target users from the target users within the preset probability interval. There may be one or more preset probability intervals, and the preset probability intervals and the target number may be set by the relevant business personnel based on empirical values. Optionally, the electronic device may extract the target number of target users from the target users within the preset probability interval by random sampling. By determining the target users within each preset probability interval and extracting among them at random, sampling is carried out across different strata of target users, which realizes stratified sampling and improves the representativeness of the extracted target users.

For example, suppose there are 3 preset probability intervals, namely X < X1, X1 ≤ X < X2, and X2 ≤ X < X3, where X denotes the data true probability. The target users in each preset probability interval are determined according to the data true probability of the user data set of each target user in the plurality of target users, and 5 target users are randomly extracted from the target users in each preset probability interval, giving 15 extracted target users in total.
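The stratified extraction in this example might look like the sketch below. The function name `stratified_sample`, the half-open interval representation `(low, high)`, the fixed random seed, and the rule of taking every member when an interval holds fewer users than requested are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_sample(
    probabilities: Dict[str, float],
    intervals: Sequence[Tuple[float, float]],
    per_interval: int,
    seed: int = 0,
) -> List[str]:
    """Randomly draw up to per_interval users from each probability interval."""
    rng = random.Random(seed)
    chosen: List[str] = []
    for low, high in intervals:
        # Users whose data true probability X satisfies low <= X < high.
        members = sorted(user for user, p in probabilities.items() if low <= p < high)
        # Assumed fallback: take every member if the interval is too small.
        chosen.extend(rng.sample(members, min(per_interval, len(members))))
    return chosen
```

For instance, with three intervals such as `[(0.0, 0.3), (0.3, 0.6), (0.6, 1.0)]` and `per_interval=5`, the function returns 15 users whenever each interval contains at least 5 target users, matching the count in the example above.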

S307, the user data sets of the target number of target users are sent to the supervision platform, so that the supervision platform performs anomaly analysis on the user data sets of the target number of target users.

In some embodiments, the electronic device may send the extracted target number of target users and the user data sets of the target number of target users to a supervisory platform (such as a relevant medical supervisory platform), and the supervisory platform performs anomaly analysis on the user data sets to obtain a sampling analysis result, where the sampling analysis result may indicate whether a problem exists in the user data sets of the target users.

In the embodiment of the application, the electronic device can acquire a user data set of a target user, where the user data set comprises N groups of user data and each group of user data comprises M index data. A data mapping image of the target user is generated according to the user data set of the target user. First dimension features of the target user are determined according to the mapping points of each group of user data in the data mapping image, giving N first dimension features. Second dimension features of the target user are determined according to the mapping points, in the data mapping image, of index data of the same category across the N groups of user data, giving M second dimension features. The N first dimension features and the M second dimension features are input into a prediction model to obtain a prediction result for the user data of the target user; the prediction result indicates the data true probability of the user data set of the target user. When there are a plurality of target users, a target number of target users are extracted from the plurality of target users according to the data true probability of the user data set of each target user, and the user data sets of the target number of target users are sent to a supervision platform so that the supervision platform performs anomaly analysis on them.
By implementing the method provided by the embodiment of the application, a data mapping image can be generated from the user data set, the distribution of the target user's data can be determined from the data mapping image, the data true probability of the user data set can be obtained from the mapping points in the data mapping image and the prediction model so as to judge the authenticity of the data, and target users can be sampled based on the data true probability so as to improve the reliability of the analysis result for the user data.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a data processing apparatus based on a prediction model according to the present application. It should be noted that the prediction model-based data processing apparatus shown in fig. 5 is used to execute the methods of the embodiments shown in figs. 1 and 3. For convenience of explanation, only the portions relevant to the embodiment of the present application are shown; for specific technical details not disclosed here, refer to the embodiments shown in figs. 1 and 3 of the present application. The prediction model-based data processing apparatus 500 may include: an acquisition module 501, a generation module 502, a determination module 503, and an input module 504. Wherein:

An acquisition module 501, configured to acquire a user data set of a target user; the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers;

a generating module 502, configured to generate a data mapping image of the target user according to the user data set of the target user; the data mapping image comprises data mapping points corresponding to M index data in each group of user data respectively;

A determining module 503, configured to determine first dimension features of the target user according to mapping points of each group of user data in the data mapping image, so as to obtain N first dimension features;

the determining module 503 is configured to determine second dimension features of the target user according to mapping points of index data with the same category in the N sets of user data in the data mapping image, so as to obtain M second dimension features;

An input module 504, configured to input the N first dimension features and the M second dimension features into a prediction model, to obtain a prediction result of user data for the target user; the prediction result is used to indicate a data true probability of a user data set for the target user.

In one possible implementation, the generating module 502 is specifically configured to, when configured to generate the data mapping image of the target user according to the user data set of the target user:

In one possible implementation manner, the N groups of user data include an ith group of user data, where i is a positive integer less than or equal to N;

The determining module 503 is specifically configured to, when determining the first dimension characteristic of the target user according to the mapping points of each set of user data in the data mapping image, respectively:

Acquiring a reasonable data mapping area aiming at a target user;

In a possible implementation manner, the determining module 503 is specifically configured to, when determining the first dimension feature characterized by the ith group of user data according to the data mapping area corresponding to the ith group of user data and the reasonable data mapping area:

In one possible implementation manner, the N groups of user data include M groups of index data of the same category, among them a j-th group of index data of the same category, where j is a positive integer less than or equal to M;

the determining module 503 is specifically configured to, when determining the second dimension characteristic of the target user according to the mapping points of the index data with the same category in the N sets of user data in the data mapping image, respectively:

The input module 504 is specifically configured to, when being configured to input the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data for the target user:

In one possible implementation, there are a plurality of target users; the input module 504 is further configured to:

In the embodiment of the application, the acquisition module acquires a user data set of a target user, where the user data set comprises N groups of user data, each group of user data in the N groups of user data comprises M index data, and N and M are positive integers. The generation module generates a data mapping image of the target user according to the user data set of the target user, where the data mapping image comprises the data mapping points respectively corresponding to the M index data in each group of user data. The determination module determines first dimension features of the target user according to the mapping points of each group of user data in the data mapping image, giving N first dimension features, and determines second dimension features of the target user according to the mapping points, in the data mapping image, of index data of the same category across the N groups of user data, giving M second dimension features. The input module inputs the N first dimension features and the M second dimension features into the prediction model to obtain a prediction result for the user data of the target user; the prediction result indicates the data true probability of the user data set of the target user. By implementing the apparatus provided by the embodiment of the application, a data mapping image can be generated from the user data set, the distribution of the target user's data can be determined from the data mapping image, the data true probability of the user data set can be obtained from the mapping points in the data mapping image and the prediction model so as to judge the authenticity of the data, and target users can be sampled based on the data true probability so as to improve the reliability of the analysis result for the user data.

The functional modules in the embodiments of the present application may be integrated into one module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules, which is not limited by the present application.

Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 600 includes at least one processor 601 and a memory 602. Optionally, the electronic device may further comprise a network interface. The processor 601, the memory 602, and the network interface may exchange data with one another. The network interface is controlled by the processor 601 to send and receive messages; the memory 602 is used to store a computer program comprising program instructions; and the processor 601 is used to execute the program instructions stored in the memory 602. The processor 601 is configured to invoke the program instructions to perform the above method.

The memory 602 may include volatile memory, such as random-access memory (RAM); the memory 602 may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); the memory 602 may also include a combination of the types of memory described above.

The processor 601 may be a central processing unit (CPU). In one embodiment, the processor 601 may also be a graphics processing unit (GPU). The processor 601 may also be a combination of a CPU and a GPU.

In a possible implementation manner, the memory 602 is configured to store program instructions, and the processor 601 may call the program instructions to perform the following steps:

In a possible implementation manner, the processor 601 is specifically configured to, when configured to generate a data mapping image of the target user according to the user data set of the target user:

The processor 601 is specifically configured to, when determining the first dimension characteristics of the target user according to the mapping points of each set of user data in the data mapping image, respectively:

Acquiring a reasonable data mapping area aiming at a target user;

In a possible implementation manner, the processor 601 is specifically configured to, when determining the first dimension characteristic represented by the ith group of user data according to the data mapping area corresponding to the ith group of user data and the reasonable data mapping area:

The processor 601 is specifically configured to, when determining the second dimension features of the target user according to the mapping points of index data of the same category in the N groups of user data in the data mapping image, respectively:

The processor 601 is specifically configured to, when inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data for the target user:

In one possible implementation, there are a plurality of target users; the processor 601 is further configured to:

In a specific implementation, the apparatus, the processor 601, the memory 602, etc. described in the embodiments of the present application may perform the implementation described in the foregoing method embodiments, and may also perform the implementation described in the embodiments of the present application, which is not described herein again.

In an embodiment of the present application, there is also provided a computer (readable) storage medium storing a computer program, where the computer program includes program instructions, where the program instructions when executed by a processor cause the processor to perform some or all of the steps performed in the foregoing method embodiments. The computer storage medium may be volatile or nonvolatile. The computer readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

References herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored in a computer storage medium, which may be a computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like.

The above disclosure describes only some embodiments of the present application and is not intended to limit its scope. Those skilled in the art will understand that implementations of all or part of the processes of the above embodiments, and equivalent changes made in accordance with the claims of the present application, still fall within the scope of the application.

Claims

1. A method of data processing based on a predictive model, the method comprising:

inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data aiming at the target user; the prediction result is used for indicating the data true probability of the user data set aiming at the target user;

The N groups of user data comprise ith group of user data, wherein i is a positive integer less than or equal to N; the determining the first dimension characteristic of the target user according to the mapping points of each group of user data in the data mapping image respectively comprises the following steps:

acquiring a reasonable data mapping area aiming at the target user;

2. The method of claim 1, wherein the generating the data-mapped image of the target user from the set of user data of the target user comprises:

According to the mapping relation between each index data and the mapping image area, mapping each index data in each group of user data into a corresponding mapping image area to obtain M mapped mapping image areas;

3. The method of claim 1, wherein said determining a first dimension characteristic characterized by the ith set of user data from a data mapping area corresponding to the ith set of user data and the reasonable data mapping area comprises:

4. The method according to claim 1, wherein the N sets of user data include M sets of index data with the same category, the M sets of index data with the same category include j sets of index data with the same category, j being a positive integer less than or equal to M;

5. The method of claim 1, wherein the predictive model is a gradient boosting tree model;

6. The method of claim 1, wherein there are a plurality of target users; the method further comprises:

Determining target users with the data true probability within a preset probability interval according to the data true probability of the user data set of each target user in the plurality of target users;

extracting target number of target users from target users in the preset probability interval;

7. A data processing apparatus based on a predictive model, the apparatus comprising:

The determining module is further configured to determine second dimension features of the target user according to mapping points of index data with the same category in the N groups of user data in the data mapping image, so as to obtain M second dimension features;

The input module is used for inputting the N first dimension features and the M second dimension features into a prediction model to obtain a prediction result of user data aiming at the target user; the prediction result is used for indicating the data true probability of the user data set aiming at the target user;

acquiring a reasonable data mapping area aiming at the target user;

8. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-6.

9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-6.