CN111325565A

CN111325565A - Data processing method and device, computer storage medium and electronic equipment

Info

Publication number: CN111325565A
Application number: CN201811531160.3A
Authority: CN
Inventors: 潘朋
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-12-14
Filing date: 2018-12-14
Publication date: 2020-06-23

Abstract

The present disclosure provides a data processing method, an apparatus, a computer readable medium and an electronic device, the method comprising: acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, wherein the number of the first target data is the same as the target dimension number; obtaining multi-dimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number; acquiring second target data according to the first characteristic value, wherein the type of the second target data is the same as that of the first target data; and matching the first target data with the second target data, and predicting the sharing behavior of the user according to the matching result. The method can save a large amount of manpower and statistic time; and the sharing condition of the user can be predicted according to the matching result, so that accurate operation is realized.

Description

Data processing method and device, computer storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computers, and in particular, to a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device.

Background

With the development of science and technology, the traditional commodity transaction mode is gradually replaced by an electronic commerce mode, and electronic commerce is a business activity taking information network technology as means and taking commodity exchange as a center, so that a novel business operation mode of online shopping of consumers, online transaction and online electronic payment among merchants and various business activities, transaction activities, financial activities and related comprehensive service activities is realized. Social electronic commerce is one type of electronic commerce, which is a business operation for selling goods or services based on an interpersonal relationship network by using an internet social tool.

For the operator of a social e-commerce platform, the most important indicator is how users share and spread content and what spreading effect results. At present, social e-commerce platforms compile statistics on e-commerce data, and an operator then manually analyzes the statistical results and outputs a statistical report. However, this statistical method consumes a great deal of manpower and time and depends on the predictions of product operators, so the users' willingness to share and the effect after sharing cannot be accurately analyzed; that is, accurate operation cannot be achieved.

In view of the above, there is a need in the art to develop a new data processing method and apparatus.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure is directed to a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device, which at least save manpower and statistics time to a certain extent, achieve targeted accurate operation, and can predict sharing intentions and sharing effects of all types of users.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to an aspect of the present disclosure, there is provided a data processing method, the method including:

acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, wherein the number of the first target data is the same as the target dimension number;

obtaining multi-dimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number, wherein the feature data is data which has a dominant effect on a sharing behavior of a user in the original data;

acquiring second target data according to the first characteristic value, wherein the type of the second target data is the same as that of the first target data;

and matching the first target data with the second target data, and predicting the sharing behavior of the user according to the matching result.

In an exemplary embodiment of the present disclosure, the first target data includes a first expectation and a first variance;

acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, includes:

clustering the user sharing data through the first algorithm model to obtain a plurality of groups of user sharing subdata which accord with Gaussian distribution;

iteratively obtaining a first expectation and a first variance corresponding to each group of user shared subdata by adopting a maximum expectation algorithm;

and determining the target dimension number according to the first variance.

In an exemplary embodiment of the present disclosure, determining the target dimension number according to the first variance includes:

determining the standard deviation corresponding to each group of the user sharing subdata according to the first variance;

summing the standard deviations corresponding to the first N groups of the user sharing subdata to obtain a first standard deviation sum, and summing the standard deviations corresponding to the first N +1 groups of the user sharing subdata to obtain a second standard deviation sum, wherein N is greater than 1 and is a positive integer;

and comparing the sum of the first standard deviations with the sum of the second standard deviations, and determining the target dimension number according to the comparison result.

In an exemplary embodiment of the present disclosure, comparing the sum of the first standard deviations with the sum of the second standard deviations, and determining the target dimensional number according to the comparison result includes:

subtracting the sum of the first standard deviations from the sum of the second standard deviations to obtain an increment of the sum of the first standard deviations relative to the sum of the second standard deviations;

comparing the increment with a preset threshold value, and judging whether the increment is smaller than the preset threshold value;

and if the increment is smaller than the preset threshold, the number of the groups of the user shared subdata corresponding to the sum of the first standard deviation is the target dimension number.

In an exemplary embodiment of the present disclosure, acquiring multidimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number includes:

extracting the feature data from the original data according to a preset condition;

and removing redundant data in the characteristic data through the second algorithm model to obtain a first characteristic value with the target dimension number.

In an exemplary embodiment of the present disclosure, the target dimension number is multidimensional, the second target data includes a second expectation and a second variance;

acquiring second target data according to the first characteristic value, wherein the second target data and the first target data are the same in type, and the method comprises the following steps:

calculating the expectation and the variance of each first characteristic value in the first characteristic values with the target dimension number, and taking the expectation and the variance as the second expectation and the second variance respectively; or

calculating the expectation and the variance of a combined characteristic value formed by combining the first characteristic values with the target dimension number, and using the expectation and the variance as the second expectation and the second variance, respectively.

In an exemplary embodiment of the present disclosure, matching the first target data with the second target data, and predicting a sharing behavior of a user according to a matching result includes:

matching a first expectation and a first variance in the first target data with a second expectation and a second variance in the second target data, respectively;

and if the first expectation is matched with the second expectation and the first variance is matched with the second variance, taking the first expectation as the sharing probability of the user, and predicting the sharing behavior of the user according to the sharing probability.

if the first expectation does not match the second expectation and/or the first variance does not match the second variance, performing dimensionality reduction on the feature data again to obtain a second feature value with the target dimensionality;

acquiring third target data according to the second characteristic value, wherein the third target data comprises a third expectation and a third variance;

matching the third expectation and the third variance with the first expectation and the first variance, respectively;

if the first expectation does not match the third expectation and/or the first variance does not match the third variance, repeating the above steps until a target expectation and a target variance matching the first expectation and the first variance are obtained.
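The matching and retry steps above can be sketched as a simple loop. This is only an illustration under stated assumptions: the disclosure does not define the matching criterion, so an absolute tolerance is assumed here, and the names `is_match`, `predict_sharing_probability`, and `candidate_targets` (the expectation and variance pairs produced by successive rounds of dimension reduction) are hypothetical.

```python
def is_match(reference, candidate, tolerance):
    """Assumed criterion: two values match when they differ by at most `tolerance`."""
    return abs(reference - candidate) <= tolerance


def predict_sharing_probability(first_target, candidate_targets, tolerance=0.01):
    """Return the first expectation as the sharing probability once a candidate matches.

    `candidate_targets` yields (expectation, variance) pairs: the second target
    data first, then the third, and so on, one pair per dimension-reduction round.
    """
    first_expectation, first_variance = first_target
    for expectation, variance in candidate_targets:
        if is_match(first_expectation, expectation, tolerance) and is_match(
            first_variance, variance, tolerance
        ):
            return first_expectation
    return None  # no round produced a matching expectation and variance


# Hypothetical values: the first round does not match, the second one does.
first_target = (0.15, 0.002)
candidate_targets = [(0.40, 0.050), (0.152, 0.003)]
probability = predict_sharing_probability(first_target, candidate_targets)
```

In this sketch the second candidate falls within the assumed tolerance for both the expectation and the variance, so the first expectation is returned as the sharing probability.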

According to an aspect of the present disclosure, there is provided a data processing apparatus including:

the first data processing module is used for acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, wherein the number of the first target data is the same as the target dimension number;

the second data processing module is used for acquiring multi-dimensional feature data, and performing dimensionality reduction processing on the feature data through a second algorithm model to acquire a first feature value with the target dimensionality, wherein the feature data is data which plays a leading role in sharing behaviors of users in the original data;

the third data processing module is used for acquiring second target data according to the first characteristic value, and the type of the second target data is the same as that of the first target data;

and the matching module is used for matching the first target data with the second target data and predicting the sharing behavior of the user according to the matching result.

According to an aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described above.

According to an aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the data processing method as described above via execution of the executable instructions.

As can be seen from the foregoing technical solutions, the data processing method and apparatus, the computer-readable storage medium, and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:

the method comprises the steps that user sharing data are processed through a first algorithm model to obtain first target data and target dimension numbers, wherein the user sharing data are obtained according to multi-dimensional original data; simultaneously, performing dimensionality reduction processing on multi-dimensional feature data extracted from the original data through a second algorithm model to obtain a first feature value with a target dimensionality; then, second target data with the same type as the first target data are obtained according to the first characteristic value; and finally, matching the first target data with the second target data, and predicting the sharing behavior of the user according to the matching result. On one hand, the data processing method can automatically analyze data and predict the sharing condition of the user, so that manpower and statistical time are saved, and the cost is reduced; on the other hand, the data of all dimensions can be analyzed, the accuracy of an analysis result is improved, and accurate guidance can be conducted on product operation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 shows a flow diagram of a data processing method in an exemplary embodiment of the present disclosure;

FIG. 2 is a diagram illustrating an example of an application scenario of a data processing method in an exemplary embodiment of the present disclosure;

FIG. 3 illustrates a flow chart for determining a target dimensional number in an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of probability density functions of a Gaussian mixture model in an exemplary embodiment of the disclosure;

FIG. 5 illustrates a flow chart for determining a target dimensional number based on a first variance in an exemplary embodiment of the present disclosure;

FIG. 6 illustrates a flow chart for determining a target dimensional number based on a sum of a first standard deviation and a sum of a second standard deviation in an exemplary embodiment of the present disclosure;

FIG. 7 is a schematic flow chart illustrating matching of first target data with second target data in an exemplary embodiment of the present disclosure;

FIG. 8 shows a schematic structural diagram of a data processing apparatus in an exemplary embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of a structure of a computer storage medium in an exemplary embodiment of the disclosure;

FIG. 10 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/parts/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first" and "second", etc. are used merely as labels, and are not limiting on the number of their objects.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.

In the related art, taking social e-commerce as an example, evaluating how users share and spread content and the resulting spreading effect requires compiling statistics on e-commerce data, after which an operator manually analyzes the statistical results and outputs a statistical report. However, this statistical process has the following problems: (1) the e-commerce data has too many dimensions and dimension combinations, so manual statistics consume a large amount of manpower and time from operations, product, and technical staff; (2) manual statistics depend on the predictions of product operators, and because such predictions are very difficult to make, the sharing willingness and post-sharing effect of users in each dimension cannot be accurately analyzed, so targeted, accurate operation is not possible; (3) the statistics-based approach is effective only for existing users and cannot predict the sharing behavior of new users.

Based on the problems in the related art, a data processing method is provided in an embodiment of the present disclosure to optimize the above problems. Referring specifically to fig. 1, the data processing method may be executed by a server, and at least includes the following steps:

step S110: acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, wherein the number of the first target data is the same as the target dimension number;

step S120: obtaining multi-dimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number, wherein the feature data is data which has a dominant effect on a sharing behavior of a user in the original data;

step S130: acquiring second target data according to the first characteristic value, wherein the type of the second target data is the same as that of the first target data;

step S140: and matching the first target data with the second target data, and predicting the sharing behavior of the user according to the matching result.

On the one hand, the data processing method in the embodiment of the disclosure processes the user sharing data and the feature data through the first algorithm model and the second algorithm model respectively, thereby avoiding manual statistics, saving a large amount of manpower and statistics time, and reducing cost; on the other hand, the first target data and the second target data are matched, and the sharing behavior of the user is predicted according to the matching result, so that the prediction no longer depends on the judgment of product operators.

In order to make the technical solution of the present disclosure clearer, the following describes each step of the data processing method in the present disclosure in detail by taking the sharing prediction of the shopping group as an example and combining the structure shown in fig. 2.

In step S110, user sharing data is obtained according to multi-dimensional raw data, and the user sharing data is processed through a first algorithm model to obtain first target data and a target dimensionality, where the number of the first target data is the same as the target dimensionality.

In the exemplary embodiment of the present disclosure, a user makes a group purchase through an e-commerce platform on the terminal device 201. The user may send the link of the commodity to be group-purchased to friends through chat tools such as WeChat and QQ; those friends may then join the group purchase and may in turn send the commodity link to their own friends to attract more participants. The server 202 can obtain the sharing-related original data of all users who make group purchases on the e-commerce platform, such as commodity categories, the number of times commodity links are shared, the number of times commodity pages are browsed, the number of participants in a group purchase, user names, user ages, user occupations, and the like. In the embodiment of the present disclosure, the user sharing data may be obtained from the multi-dimensional original data obtained by the server 202. For example, if the user sharing data is the sharing rate of commodities, the sharing rate may be obtained from the number of times commodity links are shared and the number of times commodity pages are browsed in the original data; this sharing rate is specific to a commodity and does not consider the influence of factors such as users and scenes. After the user sharing data is obtained, it may be processed through the first algorithm model to obtain the first target data and the target dimension number.
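As a minimal sketch of how a per-commodity sharing rate might be derived from the original data, the following assumes that the rate is the number of link shares divided by the number of page views; the disclosure names these two inputs but does not state the exact formula. The field names `product_id`, `link_shares`, and `page_views` are hypothetical.

```python
def sharing_rates(raw_rows):
    """Return {product_id: link_shares / page_views} for commodities that were browsed."""
    rates = {}
    for row in raw_rows:
        views = row["page_views"]
        if views > 0:  # skip commodities nobody browsed to avoid dividing by zero
            rates[row["product_id"]] = row["link_shares"] / views
    return rates


# Hypothetical rows of original data, one per commodity.
raw_rows = [
    {"product_id": "A", "link_shares": 30, "page_views": 200},
    {"product_id": "B", "link_shares": 5, "page_views": 100},
    {"product_id": "C", "link_shares": 0, "page_views": 0},
]
rates = sharing_rates(raw_rows)
```

The resulting collection of rates plays the role of the user sharing data that is passed to the first algorithm model.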

In an exemplary embodiment of the present disclosure, the first target data includes a first expectation and a first variance. FIG. 3 shows a schematic flow chart of determining the target dimension number. As shown in FIG. 3, in step S301, the user sharing data is clustered through the first algorithm model to obtain multiple groups of user sharing subdata conforming to Gaussian distribution. In group purchasing, the sharing rate of commodities differs across user types, scenes, and other conditions, and the sharing behavior of users with certain characteristics in a certain scene conforms to a Gaussian distribution, so the user sharing data can be clustered through the first algorithm model to obtain multiple groups of user sharing subdata conforming to Gaussian distribution. In step S302, an expectation-maximization algorithm is used to iteratively obtain the first expectation and the first variance corresponding to each group of user sharing subdata. The expectation-maximization algorithm (EM algorithm for short) is an iterative algorithm used in statistics to find maximum likelihood estimates of parameters in probability models that depend on unobservable hidden variables. In step S303, the target dimension number is determined according to the first variance. If the variances of the Gaussian distributions do not change significantly when the number K of Gaussian distributions contained in the Gaussian mixture model is increased by 1, the target dimension number is considered to have been found; therefore, in the present disclosure, the target dimension number can be determined according to the first variance corresponding to each group of user sharing subdata.

In an exemplary embodiment of the present disclosure, since the user sharing data is formed by a plurality of groups of user sharing sub-data conforming to gaussian distribution, the sharing rate of the commodity conforms to a gaussian mixture model, that is, the distribution of the user sharing data conforms to the gaussian mixture model. A Gaussian Mixture Model (GMM), which may also be referred to as MOG for short, is a Model that accurately quantizes objects by using a Gaussian probability density function (normal distribution curve), and decomposes one object into a plurality of objects formed based on the Gaussian probability density function (normal distribution curve).

In an exemplary embodiment of the present disclosure, the first algorithm model may specifically be a gaussian mixture model for clustering the user shared data.

Fig. 4 is a schematic diagram showing a probability density function of a gaussian mixture model, as shown in fig. 4, the gaussian mixture model is formed by linearly overlapping three gaussian distribution curves (normal distribution curves), wherein a dotted line represents the three gaussian distribution curves, and a solid line is formed by fitting the three gaussian distribution curves. The user sharing data obtained in step S110 of the data processing method of the present disclosure may be regarded as sample data shown in a solid line part in fig. 4, the user sharing data may be clustered through a gaussian mixture model, a plurality of groups of user sharing sub-data (i.e., a dotted line part in fig. 4) conforming to gaussian distribution are obtained, and a first expectation and a first variance corresponding to each group of user sharing sub-data are obtained through a maximum expectation algorithm iteration. It should be noted that the gaussian mixture model includes, but is not limited to, the above three gaussian distributions, which may include a plurality of gaussian distributions, and the disclosure is not repeated herein.

Further, the Gaussian mixture model considers that the data is generated from several single Gaussian distribution models, and the corresponding expression is shown as formula (1):

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$

where $\pi_k$ is a weight factor, $\mu_k$ is the first expectation, $\Sigma_k$ is the first variance, and $K$ is the number of Gaussian distributions contained in the Gaussian mixture model.

The Gaussian mixture model is a clustering algorithm in which each Gaussian distribution is a cluster center. When only the sample points are known and the sample classification is unknown, the model parameters (π_k, μ_k, Σ_k) can be calculated iteratively by the EM algorithm.
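The iterative estimation of the mixture parameters can be illustrated with a small one-dimensional EM loop. This is a sketch only, not the implementation of the disclosure: it assumes scalar sharing rates, a quantile-based initialization, a fixed number of iterations, and a variance floor, and the function names `gaussian_pdf` and `fit_gmm_1d` as well as the synthetic data are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iterations=100):
    """Fit a K-component 1-D Gaussian mixture by EM; return (weights, means, variances)."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: means at evenly spaced quantiles, shared overall variance.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = sum((x - overall_mean) ** 2 for x in ordered) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in ordered:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against an all-zero density
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, expectation, and variance of each component.
        for j in range(k):
            nk = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, ordered)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, ordered)) / nk
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


# Synthetic sharing rates drawn from two hypothetical user groups.
random.seed(0)
data = [random.gauss(0.05, 0.01) for _ in range(300)]
data += [random.gauss(0.30, 0.03) for _ in range(300)]
weights, means, variances = fit_gmm_1d(data, k=2)
```

Each returned mean and variance corresponds to the first expectation and first variance of one group of user sharing subdata; the weights correspond to the weight factors in the mixture.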

In an exemplary embodiment of the disclosure, FIG. 5 shows a flowchart for determining the target dimension number according to the first variance. As shown in FIG. 5, in step S501, the standard deviation corresponding to each group of user sharing subdata is determined according to the first variance. In step S502, the standard deviations corresponding to the first N groups of user sharing subdata are summed to obtain a first standard deviation sum, and the standard deviations corresponding to the first N+1 groups of user sharing subdata are summed to obtain a second standard deviation sum, where N > 1 and N is a positive integer. The sum of the standard deviations keeps decreasing as K increases, but once the data is relatively cohesive, further increasing K reduces the sum much more slowly, so the target dimension number can be determined according to the sum of the standard deviations. In step S503, the first standard deviation sum is compared with the second standard deviation sum, and the target dimension number is determined according to the comparison result.

Further, FIG. 6 shows a schematic flow chart of determining the target dimension number according to the first standard deviation sum and the second standard deviation sum. As shown in FIG. 6, in step S601, the difference between the first standard deviation sum and the second standard deviation sum is taken to obtain the increment between the two sums. In step S602, the increment is compared with a preset threshold to determine whether the increment is smaller than the preset threshold; a smaller preset threshold is preferable, that is, the closer the first standard deviation sum is to the second standard deviation sum, the better. In step S603, if the increment is smaller than the preset threshold, the number of groups of user sharing subdata corresponding to the first standard deviation sum is the target dimension number.
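The threshold test of steps S601 to S603 can be sketched as follows. The sketch assumes one reading of the text: a standard deviation sum is available for each candidate N starting at N = 2, and the increment is taken as the absolute difference between consecutive sums. The function name `choose_target_dimension` and the sample values are hypothetical.

```python
def choose_target_dimension(std_sums, threshold, start_n=2):
    """Return the first N whose standard deviation sum barely changes at N + 1.

    std_sums[i] is the standard deviation sum for N = start_n + i groups.
    Returns None when no consecutive pair differs by less than `threshold`.
    """
    for i in range(len(std_sums) - 1):
        increment = abs(std_sums[i + 1] - std_sums[i])  # S601: difference of the two sums
        if increment < threshold:                        # S602: compare with the threshold
            return start_n + i                           # S603: N is the target dimension number
    return None


# Hypothetical standard deviation sums for N = 2, 3, 4, 5, 6.
std_sums = [0.40, 0.22, 0.12, 0.115, 0.113]
target_dims = choose_target_dimension(std_sums, threshold=0.01)
```

With these sample values the sum stops changing appreciably between N = 4 and N = 5, so the sketch selects 4 as the target dimension number.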

In step S120, multi-dimensional feature data is obtained, and dimension reduction processing is performed on the feature data through a second algorithm model to obtain a first feature value with the target dimension number, where the feature data is data that has a dominant effect on a sharing behavior of a user in the original data.

In an exemplary embodiment of the disclosure, part of the multi-dimensional raw data may be extracted as the feature data according to a preset condition. The preset condition may be a set of dimensions specified by a user; for example, data of dimensions such as promotion scene, commodity category, user gender, user age, and the number of times commodity links are shared may be extracted from the raw data as the feature data. Further, the preset condition may be data dimensions that, according to user experience, have a dominant effect on the sharing behavior of users. For example, if experience indicates that user age, user occupation, commodity category, and the number of times commodity links are shared have an important effect on sharing during group purchases, the preset condition may be set to these four dimensions, and the data corresponding to user age, user occupation, commodity category, and the number of times commodity links are shared is then extracted from the multi-dimensional raw data as the feature data.

In the exemplary embodiment of the disclosure, the acquired multi-dimensional feature data is large in volume, and the number of combinations of its dimensions is correspondingly very large, so processing all the data would consume a large amount of time. To improve processing efficiency while ensuring the accuracy of the prediction result, the feature data may be subjected to dimension reduction through the second algorithm model, which extracts the principal components of the feature data to remove relatively unimportant factors and extracts the independent variables to remove dependent variables. Through this dimension reduction, a first feature value whose number of dimensions equals the target dimension number determined in step S110 can be obtained.

In an exemplary embodiment of the present disclosure, the dimension number of the feature data is greater than the target dimension number, and the dimension number of the feature data is gradually reduced to the target dimension number by establishing a covariance matrix with respect to the feature data, thereby obtaining a first feature value having the target dimension number.

In an exemplary embodiment of the present disclosure, the second algorithm model may specifically be a Principal Component Analysis (PCA), which is a statistical method that converts a set of variables that may have correlation into a set of linearly uncorrelated variables through orthogonal transformation, and the converted set of variables is called principal components.
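The covariance-matrix route to principal components can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: it uses only the Python standard library, finds eigenvectors by power iteration with deflation (a technique chosen here for self-containment; a numerical library's eigendecomposition would normally be used), and the names `covariance_matrix`, `dominant_eigenpair`, and `pca_reduce` are hypothetical.

```python
import math


def covariance_matrix(centered):
    """Sample covariance matrix of rows that are already mean-centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    d = len(matrix)
    # Non-uniform start vector (an arbitrary choice) to reduce the chance of
    # starting exactly orthogonal to the dominant eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca_reduce(rows, target_dims):
    """Project rows onto the top `target_dims` principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = covariance_matrix(centered)
    components = []
    for _ in range(target_dims):
        eigenvalue, v = dominant_eigenpair(cov)
        components.append(v)
        # Deflation: subtract the component just found so the next pass
        # converges to the next-largest eigenvector.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

The returned columns are linearly uncorrelated and ordered by decreasing variance, which corresponds to the property of principal components described above.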

In step S130, second target data is obtained according to the first feature value, and the type of the second target data is the same as that of the first target data.

In an exemplary embodiment of the present disclosure, after obtaining the first feature value, second target data may be obtained according to the first feature value, where the type of the second target data is the same as that of the first target data, so as to facilitate subsequent matching of the two.

In an exemplary embodiment of the present disclosure, the first target data and the second target data may each include an expectation and a variance, i.e., the first target data includes a first expectation and a first variance, and the second target data includes a second expectation and a second variance.

In an exemplary embodiment of the present disclosure, the expectation and the variance may be found for the first feature value of each dimension among the first feature values having the target dimension number, or the expectation and the variance may be found for a combined feature value formed by combining the first feature values having the target dimension number; the obtained expectation and variance are taken as the second expectation and the second variance.
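Both options can be sketched in a few lines. The sketch assumes the first feature values are stored as a list of rows with one column per reduced dimension, and it uses the population variance. The disclosure does not state how the feature values are combined, so the per-row sum in `combined_moments` is an assumption made only for illustration; both function names are hypothetical.

```python
from statistics import fmean, pvariance


def per_dimension_moments(feature_values):
    """Expectation and variance of each reduced dimension (one pair per column)."""
    columns = list(zip(*feature_values))
    return [(fmean(column), pvariance(column)) for column in columns]


def combined_moments(feature_values):
    """Expectation and variance of a combined feature value.

    ASSUMPTION: the combination is a per-row sum; the source leaves the
    combination rule unspecified.
    """
    combined = [sum(row) for row in feature_values]
    return fmean(combined), pvariance(combined)
```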

In step S140, the first target data and the second target data are matched, and an output result is determined according to the matching result.

In the exemplary embodiment of the disclosure, after the first target data and the second target data are obtained, the first target data and the second target data may be matched, whether the user's purchase sharing is dominated by the corresponding feature value is judged, and the sharing behavior of the user is predicted according to the matching result.

In an exemplary embodiment of the disclosure, fig. 7 shows a flowchart of matching the first target data with the second target data. As shown in fig. 7:

in step S701, the first expectation and the first variance in the first target data are matched with the second expectation and the second variance in the second target data, respectively;

in step S702, if the first expectation matches the second expectation and the first variance matches the second variance, the first expectation may be used as the sharing probability of the user, the sharing probability is stored in a database, and the sharing probability is output for operators to refer to so as to predict the sharing behavior of the user. For example, if matching shows that the purchase sharing rate of a certain electronic product among young men aged 25 to 30 obeys a standard normal distribution, then when a new user is also a young man aged 25 to 30, the sharing of the same electronic product by that user in the promotion may be predicted to follow the standard normal distribution;

in step S703, if the first expectation does not match the second expectation and/or the first variance does not match the second variance, dimensionality reduction processing is performed on the feature data again to obtain a second feature value with the target dimension number;

in step S704, third target data is obtained according to the second feature value, the third target data including a third expectation and a third variance;

in step S705, the third expectation and the third variance are matched with the first expectation and the first variance, respectively;

in step S706, if the first expectation does not match the third expectation and/or the first variance does not match the third variance, steps S703-S705 are repeated until a target expectation and a target variance matching the first expectation and the first variance are obtained.
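The matching loop of steps S701 to S706 can be sketched as below. The disclosure does not define when two values are "matched", so the absolute tolerance `tol` is an assumption, as is modeling the repeated dimensionality reduction as an iterable that yields one (expectation, variance) pair per attempt. The names `moments_match` and `predict_sharing_probability` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Moments = Tuple[float, float]  # (expectation, variance)


def moments_match(first: Moments, candidate: Moments, tol: float = 0.05) -> bool:
    """ASSUMPTION: two pairs match when both values differ by at most `tol`."""
    return (abs(first[0] - candidate[0]) <= tol
            and abs(first[1] - candidate[1]) <= tol)


def predict_sharing_probability(first: Moments,
                                candidates: Iterable[Moments],
                                tol: float = 0.05) -> Optional[float]:
    """Try each candidate in turn (second target data, third target data, ...).

    Returns the first expectation as the sharing probability once a candidate
    matches, or None if the candidates are exhausted without a match.
    """
    for candidate in candidates:
        if moments_match(first, candidate, tol):
            return first[0]
    return None
```

Passing a generator as `candidates` lets each new dimensionality reduction run only when the previous attempt failed to match, which mirrors the repeat condition in step S706.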

In an exemplary embodiment of the present disclosure, according to a matching result of first target data and second target data, a sharing probability of a user under a characteristic condition corresponding to the characteristic data may be predicted, and when the first target data is matched with the second target data, a first expectation (a second expectation) is the sharing probability of the user; when the first target data is not matched with the second target data, the feature data needs to be subjected to multiple dimensionality reduction processing to obtain a proper feature value, and further a target expectation and a target variance which are matched with a first expectation and a first variance in the first target data are obtained, so that the first expectation (target expectation) is the sharing probability of the user.

The data processing method can be applied to predicting user sharing. The sharing data conforms to a Gaussian distribution, and multiple groups of mixed sample data are obtained in actual operation, so the sharing data is processed through a GMM algorithm to obtain the expectation and the variance of each of the multiple groups of sharing data; the sum of the standard deviations corresponding to the multiple groups of sharing data is calculated from the variances, and the target dimension number is determined according to the variation trend of the sum of the standard deviations. In addition, feature data that has a dominant effect on the sharing behavior of the user can be extracted from the user sharing data according to experience or previous statistical analysis results; the feature data is then subjected to dimensionality reduction through a PCA algorithm model to obtain feature values with the target dimension number, and the expectation and the variance are calculated for the data corresponding to the feature values. Finally, the expectation and the variance of the multiple groups of sharing data are matched with the expectation and the variance of the feature values, respectively. If they match, the sharing behavior of the user is dominated by the feature values, and the expectation can be used as the prediction of the sharing probability of the user. If they do not match, the sharing behavior of the user is not dominated by the feature values, so dimension reduction processing needs to be performed on the feature data again to obtain feature values that have the target dimension number and differ from the previous feature values, and matching is performed again. These steps are repeated until an expectation and a variance matching those of the multi-dimensional sharing data are obtained, and the sharing probability of the user is predicted according to that expectation.

The data processing method spares product operators from spending a large amount of manpower and time counting the sharing situation of each dimension, can analyze the sharing willingness of users in all dimensions, and enables targeted, accurate operation.

The following describes embodiments of the apparatus of the present disclosure, which may be used to perform the above-mentioned data processing method of the present disclosure. For details that are not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the data processing method described above in the present disclosure.

Fig. 8 schematically shows a block diagram of a data processing device according to an embodiment of the present disclosure. As shown in fig. 8, the data processing apparatus 800 includes at least a first data processing module 801, a second data processing module 802, a third data processing module 803, and a matching module 804, specifically:

the first data processing module 801 is configured to obtain user sharing data according to multi-dimensional original data, and process the user sharing data through a first algorithm model to obtain first target data and a target dimensionality number, where the number of the first target data is the same as the target dimensionality number;

the second data processing module 802 is configured to obtain multi-dimensional feature data, and perform dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number, where the feature data is data that has a dominant effect on a sharing behavior of a user in the original data;

a third data processing module 803, configured to obtain second target data according to the first feature value, where the second target data and the first target data are of the same type;

the matching module 804 is configured to match the first target data with the second target data, and predict a sharing behavior of the user according to a matching result.

In an exemplary embodiment of the present disclosure, the first target data includes a first expectation and a first variance; the first data processing module 801 includes a clustering unit, a calculating unit, and a first dimension determining unit, specifically:

the clustering unit is used for clustering the user sharing data through the first algorithm model so as to obtain a plurality of groups of user sharing subdata which accord with Gaussian distribution;

the computing unit is used for iteratively obtaining a first expectation and a first variance corresponding to each group of the user sharing subdata by adopting a maximum expectation algorithm;

a first dimension determining unit, configured to determine the target dimension number according to the first variance.
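The work of the clustering unit and the computing unit can be illustrated with a small expectation-maximization (EM) fit of a Gaussian mixture. This sketch makes several assumptions not found in the disclosure: the sharing data is one-dimensional, the number of groups `k` is given, the initial means are spread evenly over the data range, and a variance floor `min_var` guards against collapse. The names `gaussian_pdf` and `fit_gmm_1d` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture; return (expectation, variance) pairs."""
    n = len(data)
    lo, hi = min(data), max(data)
    # ASSUMPTION: evenly spaced initial means, shared initial variance.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # avoid division by zero on underflow
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weight, expectation, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return list(zip(means, variances))
```

Each returned pair plays the role of a first expectation and a first variance for one group of user sharing subdata.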

In an exemplary embodiment of the present disclosure, the first dimension determining unit includes a standard deviation determining unit, a standard deviation summing unit, and a comparing unit, specifically:

the standard deviation determining unit is used for determining the standard deviation corresponding to each group of the user sharing subdata according to the first variance;

a standard deviation summing unit, configured to sum the standard deviations corresponding to the first N groups of the user sharing subdata to obtain a first standard deviation sum, and sum the standard deviations corresponding to the first N+1 groups of the user sharing subdata to obtain a second standard deviation sum, where N is a positive integer greater than 1;

and the comparison unit is used for comparing the sum of the first standard deviations with the sum of the second standard deviations and determining the target dimension number according to a comparison result.

In an exemplary embodiment of the present disclosure, the comparing unit includes an increment obtaining unit, a judging unit, and a second dimension determining unit, specifically:

an increment acquisition unit, configured to subtract the sum of the first standard deviations from the sum of the second standard deviations to acquire the increment of the sum of the second standard deviations relative to the sum of the first standard deviations;

the judging unit is used for comparing the increment with a preset threshold value and judging whether the increment is smaller than the preset threshold value or not;

and the second dimension determining unit is used for judging that the dimension number corresponding to the sum of the first standard deviations is the target dimension number when the increment is smaller than the preset threshold.
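The three units above amount to a simple stopping rule, sketched below. The function name `target_dimension_number` is hypothetical, and returning the total number of groups when no increment falls below the threshold is an assumption, since the disclosure does not say what happens in that case.

```python
import math


def target_dimension_number(variances, threshold):
    """Pick N such that adding group N+1 raises the standard deviation sum
    by less than `threshold`.

    variances: first variances of the groups, in the order they are summed.
    """
    stds = [math.sqrt(v) for v in variances]
    # N is a positive integer greater than 1, as stated in the text.
    for n in range(2, len(stds)):
        first_sum = sum(stds[:n])        # first N groups
        second_sum = sum(stds[:n + 1])   # first N+1 groups
        increment = second_sum - first_sum
        if increment < threshold:
            return n
    # ASSUMPTION: fall back to all groups if the sum never levels off.
    return len(stds)
```

With variances `[4.0, 1.0, 0.25, 0.0001, 0.0001]` and a threshold of 0.1, the standard deviations are 2, 1, 0.5, 0.01, and 0.01, so the sum levels off after three groups and the function returns 3.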

In an exemplary embodiment of the present disclosure, the second data processing module 802 includes a data extraction unit and a data removal unit, specifically:

the data extraction unit is used for extracting the feature data from the original data according to preset conditions;

and the data removing unit is used for removing redundant data in the characteristic data through the second algorithm model so as to obtain a first characteristic value with the target dimension number.

In an exemplary embodiment of the present disclosure, the target dimension number is multidimensional, the second target data includes a second expectation and a second variance; the third data processing module 803 includes:

a first feature value processing unit, configured to find an expectation and a variance of each of the first feature values having the target dimension number, and take the expectation and the variance as the second expectation and the second variance, respectively; or

to find an expectation and a variance of a combined feature value formed by combining the first feature values having the target dimension number, and take the expectation and the variance as the second expectation and the second variance, respectively.

In an exemplary embodiment of the present disclosure, the matching module 804 comprises a first matching unit and a probability determination unit, specifically:

a first matching unit for matching a first expectation and a first variance in the first target data with a second expectation and a second variance in the second target data, respectively;

and the probability determining unit is used for taking the first expectation as the sharing probability of the user and predicting the sharing behavior of the user according to the sharing probability when the first expectation is matched with the second expectation and the first variance is matched with the second variance.

In an exemplary embodiment of the present disclosure, the matching module 804 further includes a second feature value obtaining unit, a third target data obtaining unit, a second matching unit, and a target data determining unit, specifically:

a second feature value obtaining unit, configured to perform dimensionality reduction processing on the feature data again to obtain a second feature value with the target dimension number when the first expectation does not match the second expectation and/or the first variance does not match the second variance;

a third target data obtaining unit configured to obtain third target data according to the second feature value, where the third target data includes a third expectation and a third variance;

a second matching unit for matching the third expectation and the third variance with the first expectation and the first variance, respectively;

a target data determination unit, configured to, when the first expectation does not match the third expectation and/or the first variance does not match the third variance, repeat the above steps until a target expectation and a target variance matching the first expectation and the first variance are obtained.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."

An electronic device 900 according to this embodiment of the disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.

As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, and a bus 930 that couples various system components including the storage unit 920 and the processing unit 910.

The storage unit 920 stores program code that can be executed by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present disclosure described in the above section "detailed description" of the present specification. For example, the processing unit 910 may execute the steps shown in fig. 1. Step S110: acquiring user sharing data according to multi-dimensional original data, and processing the user sharing data through a first algorithm model to acquire first target data and a target dimension number, wherein the number of the first target data is the same as the target dimension number; step S120: obtaining multi-dimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number, wherein the feature data is data in the original data that has a dominant effect on the sharing behavior of a user; step S130: acquiring second target data according to the first feature value, wherein the type of the second target data is the same as that of the first target data; step S140: matching the first target data with the second target data, and predicting the sharing behavior of the user according to the matching result.

The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 9201 and/or a cache memory unit 9202, and may further include a read only memory unit (ROM) 9203.

Storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 900 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.

Referring to fig. 10, a program product 1000 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method of data processing, the method comprising:

2. The data processing method of claim 1, wherein the first target data comprises a first expectation and a first variance;

3. The data processing method of claim 2, wherein determining the target dimension number from the first variance comprises:

4. The data processing method of claim 3, wherein comparing the sum of the first standard deviations with the sum of the second standard deviations, and determining the target dimension number according to the comparison result comprises:

and if the increment is smaller than the preset threshold, the dimension number corresponding to the sum of the first standard deviations is the target dimension number.

5. The data processing method of claim 1, wherein obtaining multi-dimensional feature data, and performing dimension reduction processing on the feature data through a second algorithm model to obtain a first feature value with the target dimension number comprises:

6. The data processing method of claim 5, wherein the target dimension number is multidimensional, and the second target data comprises a second expectation and a second variance;

7. The data processing method according to any one of claims 1 to 6, wherein matching the first target data with the second target data and predicting sharing behavior of a user according to a matching result comprises:

8. The data processing method according to claim 7, wherein matching the first target data with the second target data and predicting sharing behavior of the user according to a matching result comprises:

9. A data processing apparatus, comprising:

10. A computer storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the data processing method of any one of claims 1 to 8.

11. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the data processing method of any one of claims 1 to 8 via execution of the executable instructions.