WO2018040561A1

WO2018040561A1 - Data processing method, device and system

Info

Publication number: WO2018040561A1
Application number: PCT/CN2017/079791
Authority: WO
Inventors: 刘冬
Original assignee: 华为技术有限公司
Priority date: 2016-08-31
Filing date: 2017-04-07
Publication date: 2018-03-08
Also published as: CN107784363A; CN107784363B

Abstract

A data processing method, device and system, relating to the technical field of computers. The method comprises: obtaining data to be processed, a group of data parameters of the data to be processed being a target parameter group (205); substituting the target parameter group into a preset algorithm model to determine a target algorithm corresponding to the target parameter group (206), the target algorithm being the algorithm corresponding to an optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm; and determining, according to the target algorithm corresponding to the target parameter group, an attribute of the data to be processed. The method is used for data processing and addresses the problem of poor data processing effect, thereby improving the data processing effect.

Description

Data processing method, device and system

The present application claims priority to Chinese Patent Application No. 201610797641.3, entitled "Data Processing Method, Apparatus and System", filed on August 31, 2016, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, and system.

Background technique

With the rapid development of social networks and the growing number of network users, more and more user data (hundreds or more) is generated on the network side. Operators can process user data to determine user attributes (such as the user's gender, age, or hobbies) and make business decisions based on those attributes.

Generally, an operator can manually process user data generated on the network side, but because the amount of data to be processed is large, manual processing is inefficient. Therefore, in the related art, a feature selection algorithm and a machine learning algorithm are used to process multiple pieces of user data, to determine whether each piece of user data has a preset feature, and thereby to determine whether the user corresponding to each piece of user data has a preset attribute. For example, when multiple users of a communication carrier (such as China Mobile) communicate over the network provided by that carrier, the network side generates a large amount of user data, such as the user's fees (which can reflect the user's consumption level) and the user's bills (which can reflect the user's use of China Mobile's services). The carrier may substitute the user data generated on the network side into a feature selection algorithm (such as a feature space algorithm) to determine a feature set, and then substitute the feature set into a machine learning algorithm to identify, among the multiple pieces of user data, first user data that has the preset feature (the service most frequently used by the user being a preset service) and second user data that does not have the preset feature. Offer information related to the preset service is then sent to the users corresponding to the first user data.

In the related art, different scenarios generate different user data; for example, user data generated by users of China Mobile differs from user data generated by users of China Telecom (another communication carrier). However, the related art uses the same feature selection algorithm and the same machine learning algorithm when processing the user data generated in every scenario. Because the same machine learning algorithm cannot suit user data in all scenarios, the user data attributes obtained by processing have low accuracy. Therefore, the accuracy of data processing is low, and the data processing effect is poor.

Summary of the invention

In order to solve the problem that the effect of data processing is poor, the present application provides a data processing method, device and system. The technical solution is as follows:

In a first aspect, a data processing method is provided, the method comprising:

Data to be processed is obtained, where a set of data parameters of the data to be processed is a target parameter group. After the data to be processed is obtained, the target parameter group may be substituted into a preset algorithm model to determine a target algorithm corresponding to the target parameter group. It should be noted that the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. After the target algorithm corresponding to the target parameter group is determined, the data to be processed may be processed according to that target algorithm to determine an attribute of the data to be processed. Optionally, a data parameter is used to describe a feature of the data, and the target parameter group is used to describe a set of features of the data to be processed.

In the present application, after the data to be processed is obtained, the target algorithm corresponding to the target parameter group can be determined directly according to the preset algorithm model. The target algorithm indicated by the preset algorithm model is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, when the data to be processed is processed according to the target algorithm corresponding to the target parameter group, the determined attribute of the data to be processed is the most accurate, which improves the accuracy of the determined attribute.
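The lookup step described above can be sketched as follows. This is a minimal illustration only, assuming the preset algorithm model can be represented as a plain table from parameter groups to algorithm names; the application does not specify the form of the model. The names `PRESET_ALGORITHM_MODEL` and `determine_target_algorithm` and the example parameter groups are hypothetical.

```python
from typing import Dict, Tuple

# A parameter group is modelled here as a tuple of metadata values (assumption).
ParameterGroup = Tuple[str, ...]
# A target algorithm is a (feature selection, machine learning) pair of names.
TargetAlgorithm = Tuple[str, str]

# Hypothetical preset algorithm model, built in advance from sample sets.
PRESET_ALGORITHM_MODEL: Dict[ParameterGroup, TargetAlgorithm] = {
    ("carrier_a", "billing"): ("information_entropy", "random_forest"),
    ("carrier_b", "billing"): ("feature_correlation", "logistic_regression"),
}


def determine_target_algorithm(target_parameter_group: ParameterGroup) -> TargetAlgorithm:
    """Return the pre-evaluated best algorithm pair for a parameter group."""
    return PRESET_ALGORITHM_MODEL[target_parameter_group]


selected = determine_target_algorithm(("carrier_a", "billing"))
```

Because the evaluation was done when the model was built, processing new data only needs this lookup rather than a fresh comparison of algorithms.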

Optionally, the target algorithm may include a target feature selection algorithm and a target machine learning algorithm. Before the target parameter group is substituted into the preset algorithm model, n sample sets may also be acquired. Each of the n sample sets has a set of data parameters, so the n sample sets have n sets of data parameters in total; the n sets of data parameters may include the target parameter group, and n may be an integer greater than or equal to 1. A target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters are then determined. For example, each time a sample set is acquired, the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of that sample set may be determined. After the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model is determined according to them.

That is, before the data to be processed is acquired, n sample sets need to be acquired in advance, the target algorithm corresponding to each sample set is determined, and the preset algorithm model is derived from the target algorithms of the sample sets. The preset algorithm model can then be used to determine the target algorithm corresponding to at least one set of data parameters, so that when the data to be processed is processed, its target algorithm can be determined quickly according to the preset algorithm model, which improves the speed and efficiency of data processing.

Optionally, the first sample set is any one of the n sample sets, and at least one feature selection algorithm and at least one machine learning algorithm corresponding to the first sample set may be used to process the first sample set. Determining the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters may include the following. First, the first sample set is substituted into at least one feature selection algorithm (that is, the at least one feature selection algorithm corresponding to the first sample set) to obtain at least one feature set, and the obtained at least one feature set is determined as the at least one feature set corresponding to the set of data parameters of the first sample set. Then, the at least one feature set corresponding to the set of data parameters of the first sample set may be substituted into at least one machine learning algorithm to obtain at least one processing model, and the at least one processing model is determined as the at least one processing model corresponding to the set of data parameters of the first sample set. Finally, an evaluation value corresponding to each of the at least one processing model is determined according to a preset evaluation algorithm, and the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value are used as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set. It should be noted that, because the first sample set is any one of the n sample sets, the process of determining the target feature selection algorithm and the target machine learning algorithm for each of the n sample sets may follow the process described above for the first sample set.
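The selection procedure for one sample set can be sketched as an exhaustive comparison of every feature selection and machine learning pair. This is a hedged illustration, not the application's implementation: it assumes that a higher evaluation value is better, and the toy selectors, learners, and the accuracy-based evaluation (`keep_a`, `keep_b`, `threshold_learner`, `always_zero_learner`, `accuracy`) are hypothetical stand-ins for real algorithms.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
SampleSet = List[Sample]
Model = Callable[[Sample], int]


def select_target_algorithms(
    sample_set: SampleSet,
    feature_selectors: Dict[str, Callable[[SampleSet], List[str]]],
    learners: Dict[str, Callable[[SampleSet, List[str]], Model]],
    evaluate: Callable[[Model, SampleSet], float],
) -> Tuple[str, str]:
    """Return the (feature selection, machine learning) pair with the best score."""
    best_score = None
    best_pair = None
    for (fs_name, select), (ml_name, learn) in product(
        feature_selectors.items(), learners.items()
    ):
        feature_set = select(sample_set)          # one feature set per selector
        model = learn(sample_set, feature_set)    # one processing model per pair
        score = evaluate(model, sample_set)       # preset evaluation algorithm
        if best_score is None or score > best_score:
            best_score, best_pair = score, (fs_name, ml_name)
    if best_pair is None:
        raise ValueError("at least one selector and one learner are required")
    return best_pair


# Toy stand-ins (assumptions): each selector keeps one column, each learner
# returns a simple rule, and the evaluation value is accuracy on the sample set.
def keep_a(samples: SampleSet) -> List[str]:
    return ["a"]


def keep_b(samples: SampleSet) -> List[str]:
    return ["b"]


def threshold_learner(samples: SampleSet, features: List[str]) -> Model:
    return lambda s: 1 if sum(s[f] for f in features) > 0.5 else 0


def always_zero_learner(samples: SampleSet, features: List[str]) -> Model:
    return lambda s: 0


def accuracy(model: Model, samples: SampleSet) -> float:
    return sum(model(s) == s["y"] for s in samples) / len(samples)


first_sample_set = [
    {"a": 1.0, "b": 0.0, "y": 1},
    {"a": 0.0, "b": 1.0, "y": 0},
    {"a": 1.0, "b": 1.0, "y": 1},
    {"a": 0.0, "b": 0.0, "y": 0},
]
target_pair = select_target_algorithms(
    first_sample_set,
    {"keep_a": keep_a, "keep_b": keep_b},
    {"threshold": threshold_learner, "always_zero": always_zero_learner},
    accuracy,
)
```

In this toy data the label equals column `a`, so the pair that keeps `a` and applies the threshold rule scores highest.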

Since the preset algorithm model is determined in advance, when the data to be processed is processed, the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group of the data to be processed can be determined directly according to the preset algorithm model. The whole process takes less time, which improves the speed and efficiency of data processing.

Optionally, the target algorithm may include a target feature selection algorithm and a target machine learning algorithm, and determining the attribute of the data to be processed according to the target algorithm corresponding to the target parameter group includes the following. First, the data to be processed is substituted into the target feature selection algorithm corresponding to the target parameter group to obtain a feature set, and the obtained feature set is determined as a target feature set. The target feature set includes p features, and each of the p features has a set of feature parameters, so the p features have p sets of feature parameters; p is an integer greater than or equal to 1, and each feature in the target feature set has a weight. Then, the p sets of feature parameters of the p features may be substituted into a preset weight change model to determine the weight change value corresponding to each of the p sets of feature parameters. It should be noted that the preset weight change model can determine the weight change value corresponding to each of q sets of feature parameters, where the q sets of feature parameters include the p sets of feature parameters and q ≥ p. After the weight change value corresponding to each of the p sets of feature parameters is determined, the weight of each feature in the target feature set may be updated according to the determined weight change values; that is, the sum of a feature's weight and the weight change value corresponding to that feature's set of feature parameters is used as the feature's updated weight. Finally, the attribute of the data to be processed is determined according to the target feature set with updated weights and the target machine learning algorithm corresponding to the target parameter group.

For example, the preset weight change model may be established in advance according to the empirical values of staff. Since the preset weight change model is determined in advance, after the target feature set is obtained automatically by the feature selection algorithm, the weights of the features in the target feature set can also be updated with reference to the staff's empirical values, so that the processing model obtained by substituting the updated target feature set into the machine learning algorithm has a better processing effect.
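The weight update can be illustrated with a short sketch. It assumes the updated weight is the sum of the original weight and the weight change value, and that the preset weight change model is a table keyed by a feature's parameter set; `update_weights` and all feature names, parameter tuples, and numbers are hypothetical.

```python
from typing import Dict, Tuple

FeatureParams = Tuple[str, ...]


def update_weights(
    target_feature_set: Dict[str, float],
    feature_params: Dict[str, FeatureParams],
    weight_change_model: Dict[FeatureParams, float],
) -> Dict[str, float]:
    """Add each feature's weight change value to its current weight."""
    updated = {}
    for feature, weight in target_feature_set.items():
        # The model covers q parameter sets that include all p sets used here.
        change = weight_change_model[feature_params[feature]]
        updated[feature] = weight + change
    return updated


# p = 2 features in the target feature set, q = 3 entries in the model.
weights = {"monthly_fee": 0.5, "call_minutes": 0.25}
params = {
    "monthly_fee": ("numeric", "billing"),
    "call_minutes": ("numeric", "usage"),
}
model = {
    ("numeric", "billing"): 0.25,
    ("numeric", "usage"): -0.125,
    ("categorical", "profile"): 0.0,
}
updated_weights = update_weights(weights, params, model)
```

The reweighted feature set would then be passed to the target machine learning algorithm in place of the original one.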

Optionally, before the attribute of the data to be processed is determined according to the target algorithm corresponding to the target parameter group, the method may further include the following. First, m sample sets are acquired, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1; m may or may not be equal to n. After the m sample sets are acquired, the target feature selection algorithm corresponding to each of the m sets of data parameters may be determined. Then, an initial feature set is determined. The initial feature set may include the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set. That is, each sample set is substituted into its corresponding target feature selection algorithm to obtain one group of features, the m sample sets yield m groups of features in total, and all the distinct features in the m groups of features make up the initial feature set. Next, a reference feature set is determined; the reference feature set includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm. Finally, the reference feature set may be compared with the initial feature set; that is, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set is determined according to the reference feature set, and the preset weight change model is determined according to the weight change value corresponding to the set of feature parameters of each feature.

That is, before the data to be processed is acquired, m sample sets need to be acquired in advance, the target feature selection algorithm corresponding to each sample set is determined, and the preset weight change model is derived from the target feature selection algorithm of each sample set and the reference feature selection algorithm. The preset weight change model can then be used to determine the weight change value corresponding to at least one set of feature parameters. When the data to be processed is processed, the weight change value corresponding to each feature in the feature set of the data to be processed can be determined quickly according to the preset weight change model, and the data to be processed is processed according to the feature set with updated weights, which improves the speed and efficiency of data processing.
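Assembling the initial feature set from m sample sets amounts to taking the distinct features across the m per-sample-set results. The sketch below assumes each target feature selection algorithm is a callable returning feature names; `build_initial_feature_set`, the parameter groups, and the selectors are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

ParameterGroup = Tuple[str, ...]
SampleSet = List[Dict[str, float]]
Selector = Callable[[SampleSet], List[str]]


def build_initial_feature_set(
    sample_sets: List[SampleSet],
    parameter_groups: List[ParameterGroup],
    target_selectors: Dict[ParameterGroup, Selector],
) -> List[str]:
    """Collect the distinct features selected across all sample sets, in order."""
    initial_feature_set: List[str] = []
    for sample_set, group in zip(sample_sets, parameter_groups):
        select = target_selectors[group]  # target selector for this parameter group
        for feature in select(sample_set):
            if feature not in initial_feature_set:
                initial_feature_set.append(feature)
    return initial_feature_set


# Two sample sets (m = 2) whose selectors return overlapping features.
groups = [("carrier_a", "billing"), ("carrier_b", "billing")]
selectors = {
    groups[0]: lambda samples: ["fee", "minutes"],
    groups[1]: lambda samples: ["minutes", "data_volume"],
}
initial_features = build_initial_feature_set([[], []], groups, selectors)
```

A reference feature set could be built the same way by passing the reference feature selection algorithm for every parameter group.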

Optionally, determining, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set includes the following. The initial feature set is substituted into a preset machine learning algorithm to determine a first processing model, and the reference feature set is substituted into the preset machine learning algorithm to determine a second processing model. The first processing model is evaluated according to the preset evaluation algorithm to determine a first evaluation value, and the second processing model is evaluated according to the preset evaluation algorithm to determine a second evaluation value. After the first evaluation value and the second evaluation value are obtained, it is determined whether the second evaluation value is greater than the first evaluation value. If the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature in the initial feature set, it may be determined that the reference feature selection algorithm has a better processing effect than the target feature selection algorithm corresponding to the set of data parameters of the first sample set, and the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is the weight change value corresponding to the set of feature parameters of the first feature. Optionally, if the second evaluation value is greater than the first evaluation value and the reference feature set does not include the first feature in the initial feature set, a preset weight change value is used as the weight change value corresponding to the first feature; that is, when the reference feature selection algorithm has a better processing effect than the target feature selection algorithm corresponding to the set of data parameters of the first sample set but the reference feature set does not include the first feature, a preset empirical value is used as the weight change value corresponding to the first feature. If the second evaluation value is not greater than the first evaluation value, it may be determined that the processing effect of the target feature selection algorithm corresponding to the set of data parameters of the first sample set is at least as good as that of the reference feature selection algorithm; in this case, the weight change value corresponding to the first feature may be determined to be zero.
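The three cases for a single feature can be written as a small decision function. This is a sketch under the stated reading of the text (higher evaluation value is better); `weight_change_for_feature` and its argument names are hypothetical.

```python
from typing import Dict


def weight_change_for_feature(
    feature: str,
    initial_weights: Dict[str, float],
    reference_weights: Dict[str, float],
    first_evaluation: float,
    second_evaluation: float,
    preset_change: float,
) -> float:
    """Weight change value for one feature of the initial feature set."""
    if second_evaluation > first_evaluation:
        if feature in reference_weights:
            # Reference set performs better and contains the feature:
            # use the difference between the two weights.
            return reference_weights[feature] - initial_weights[feature]
        # Reference set performs better but lacks the feature:
        # fall back to a preset (empirical) change value.
        return preset_change
    # Initial feature set is at least as good: leave the weight unchanged.
    return 0.0


initial = {"fee": 0.5, "minutes": 0.25}
reference = {"fee": 0.75}
```

With these example weights, `fee` would receive a change of 0.25 when the reference model scores higher, `minutes` would receive the preset value, and both would receive zero otherwise.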

In the present application, the processing model obtained by using the target feature selection algorithm and the processing model obtained by using the reference feature selection algorithm are evaluated separately. If the first evaluation value is greater than or equal to the second evaluation value, it may be determined that processing the target sample with the target feature selection algorithm gives a processing effect that is better than, or the same as, processing the target sample with the reference feature selection algorithm; in this case, the staff's empirical values need not be consulted. If the first evaluation value is smaller than the second evaluation value, it may be determined that processing the target sample with the reference feature selection algorithm gives a better processing effect than processing it with the target feature selection algorithm; in this case, the weights of the features in the initial feature set are updated with reference to the empirical values, so that the processing model obtained by substituting the updated initial feature set into the machine learning algorithm has a better processing effect on the data to be processed.

Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine a first machine learning algorithm and the target feature selection algorithm corresponding to each of at least one set of data parameters. In this case, substituting the target parameter group into the preset algorithm model and determining the target algorithm corresponding to the target parameter group includes: determining that the first machine learning algorithm is the target machine learning algorithm corresponding to the target parameter group; and substituting the target parameter group and the first machine learning algorithm into the preset algorithm model to determine the target feature selection algorithm corresponding to the target parameter group and the first machine learning algorithm.

In the present application, after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, a preset machine learning algorithm, and the target feature selection algorithm corresponding to each of the at least one set of data parameters, may be determined from them, thereby obtaining the preset algorithm model. The target feature selection algorithm corresponding to the target parameter group and the preset machine learning algorithm is then determined according to the preset machine learning algorithm, the target parameter group, and the preset algorithm model.

Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of at least one set of data parameters. In this case, substituting the target parameter group into the preset algorithm model and determining the target algorithm corresponding to the target parameter group includes: substituting the target parameter group into the preset algorithm model, and determining the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.

In the present application, after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, the preset algorithm model is obtained from them, and the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group are obtained according to the target parameter group and the preset algorithm model.

Optionally, the target feature selection algorithm corresponding to the target parameter group may include a feature selection algorithm based on information entropy or a feature selection algorithm based on inter-feature correlation; the target machine learning algorithm corresponding to the target parameter group includes a random forest (RF) machine learning algorithm, a logistic regression (LR) machine learning algorithm, or a support vector machine (SVM) machine learning algorithm.

Optionally, a set of data parameters of the data is composed of a set of metadata of the data, and a set of feature parameters of each feature is composed of a set of metadata of the feature.

Optionally, the target algorithm includes at least one of a target feature selection algorithm or a target machine learning algorithm. That is, the target algorithm corresponding to the determined target parameter group may be: a target feature selection algorithm corresponding to the target parameter group; or a target machine learning algorithm corresponding to the target parameter group; or both a target feature selection algorithm and a target machine learning algorithm corresponding to the target parameter group.

In a second aspect, a data processing apparatus is provided. The data processing apparatus includes a first obtaining module, a first determining module, and a second determining module. The first obtaining module is configured to obtain data to be processed, where a set of data parameters of the data to be processed is a target parameter group. The first determining module may be configured to substitute the target parameter group into a preset algorithm model and determine a target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. The second determining module may be configured to determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.

Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the data processing apparatus further includes a second obtaining module, a third determining module, and a fourth determining module. The second obtaining module may be configured to obtain n sample sets, where the n sets of data parameters of the n sample sets include the target parameter group and n is an integer greater than or equal to 1. The third determining module may be configured to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters. The fourth determining module may be configured to determine the preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters.

Optionally, the first sample set is any one of the n sample sets, and the third determining module is further configured to: substitute the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to the set of data parameters of the first sample set; substitute the at least one feature set corresponding to the set of data parameters of the first sample set into at least one machine learning algorithm to determine at least one processing model corresponding to the set of data parameters of the first sample set; and determine, according to a preset evaluation algorithm, an evaluation value corresponding to each of the at least one processing model, and use the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.

Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the second determining module includes a first determining unit, a second determining unit, an updating unit, and a third determining unit. The first determining unit may be configured to substitute the data to be processed into the target feature selection algorithm corresponding to the target parameter group and determine a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the target feature set has a weight. The second determining unit may be configured to substitute the p sets of feature parameters of the p features into the preset weight change model and determine the weight change value corresponding to each of the p sets of feature parameters, where the preset weight change model can determine the weight change value corresponding to each of q sets of feature parameters, the q sets of feature parameters include the p sets of feature parameters, and q ≥ p. The updating unit may be configured to update the weight corresponding to each feature in the target feature set according to the determined weight change values. The third determining unit may be configured to determine the attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.

Optionally, the data processing apparatus further includes a third obtaining module, a fifth determining module, a sixth determining module, a seventh determining module, an eighth determining module, and a ninth determining module. The third obtaining module is configured to obtain m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1. The fifth determining module may be configured to determine the target feature selection algorithm corresponding to each of the m sets of data parameters. The sixth determining module may be configured to determine an initial feature set, where the initial feature set includes the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set. The seventh determining module may be configured to determine a reference feature set, where the reference feature set includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm. The eighth determining module may be configured to determine, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set. The ninth determining module may be configured to determine the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.

Optionally, the eighth determining module is further configured to: substitute the initial feature set into a preset machine learning algorithm to determine a first processing model; substitute the reference feature set into the preset machine learning algorithm to determine a second processing model; evaluate the first processing model according to the preset evaluation algorithm to determine a first evaluation value; evaluate the second processing model according to the preset evaluation algorithm to determine a second evaluation value; determine whether the second evaluation value is greater than the first evaluation value; and, if the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature in the initial feature set, determine that the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is the weight change value corresponding to the set of feature parameters of the first feature.

Optionally, the target algorithm includes: a target feature selection algorithm or a target machine learning algorithm.

In a third aspect, a data processing system is provided, the data processing system comprising the data processing apparatus of the second aspect.

According to a fourth aspect, a data processing apparatus is provided. The data processing apparatus includes at least one processor, at least one network interface, a memory, and at least one bus, where the memory and the network interface are each connected to the processor through the bus. The processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the data processing method provided by the first aspect or any possible implementation of the first aspect.

In a fifth aspect, a data processing system is provided, the data processing system comprising the data processing apparatus of the fourth aspect.

The technical effects obtained by the above second to fifth aspects are similar to those obtained by the corresponding technical means in the above first aspect, and the present application will not be repeated herein.

In summary, the present application provides a data processing method, apparatus, and system. In the data processing method, after the data to be processed is acquired, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed) is determined directly according to the preset algorithm model. The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. That is, among the algorithms corresponding to the target parameter group, the target algorithm determines the attribute of the data to be processed most accurately, so the attribute of the data to be processed determined according to the target algorithm has higher accuracy.

The above general description and the following detailed description are intended to be illustrative and not restrictive.

DRAWINGS

FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention;

FIG. 3-1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 3-2 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present invention;

FIG. 3-3 is a schematic structural diagram of a second determining module according to an embodiment of the present invention;

FIG. 3-4 is a schematic structural diagram of still another data processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of still another data processing apparatus according to an embodiment of the present invention.

Detailed description

In order to make the objects, technical solutions and advantages of the present application more clear, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the terminals used by user A, user B, user C, and user D all access the network, so all four users are network users. User A and user B are users of a first communication carrier (such as China Mobile), that is, both access the network provided by the first communication carrier; the service that user A uses most is a first service provided by the first communication carrier, and the service that user B uses most is a second service provided by the first communication carrier. User C and user D are users of a second communication carrier (such as China Telecom), that is, both access the network provided by the second communication carrier; the service that user C uses most is a third service provided by the second communication carrier, and the service that user D uses most is a fourth service provided by the second communication carrier. While user A communicates using the network provided by the first communication carrier, the network side generates user data 1; while user B communicates using the network provided by the first communication carrier, the network side generates user data 2; while user C communicates using the network provided by the second communication carrier, the network side generates user data 3; and while user D communicates using the network provided by the second communication carrier, the network side generates user data 4.

In the related art, when the first communication carrier processes the user data generated on the network side, it can acquire two pieces of user data (user data 1 and user data 2) and substitute them into a feature selection algorithm to determine the feature set corresponding to the two pieces of user data. Specifically, sample data may be collected from the two pieces of user data and substituted into the feature selection algorithm to obtain a feature set (the obtained feature set is usually a subset of the feature set of the sample data, so it may also be referred to as a feature subset). The feature set is then substituted into a machine learning algorithm to obtain a processing model. Finally, the sample data can be divided into multiple parts, the attribute of each part is determined according to the processing model, and the attributes are substituted into a preset evaluation algorithm (such as an evaluation method based on a multi-fold cross-validation mechanism) to obtain an evaluation value corresponding to the attributes of the multiple parts of sample data (that is, the evaluation value corresponding to the processing model). If the evaluation value is greater than an evaluation threshold, the currently obtained feature set is determined to be the feature set corresponding to the two pieces of user data; if the evaluation value is less than or equal to the evaluation threshold, the feature selection algorithm needs to be run again to obtain another feature set, until the obtained evaluation value is greater than the evaluation threshold.

Then, the determined feature set is substituted into a machine learning algorithm to determine a processing model. Finally, according to the processing model, it is determined that user data 1 has a preset feature (that is, user data 1 indicates that the service user A uses most frequently is the first service) and that user data 2 does not have the preset feature (that is, user data 2 indicates that the service user B uses most frequently is not the first service). The first communication carrier can then send preferential information related to the first service to the terminal used by user A.

User data generated using the network provided by the first communication carrier (user data 1 and user data 2) and user data generated using the network provided by the second communication carrier (user data 3 and user data 4) are data generated in different scenarios, and the same machine learning algorithm cannot be applied to user data generated in different scenarios. If the second communication carrier, when processing the user data generated on its network side (user data 3 and user data 4), still uses the same feature selection algorithm and machine learning algorithm as the first communication carrier, the attributes of user data 3 and user data 4 that it determines may deviate from the actual attributes, so the user data attributes obtained by the processing are less accurate.

As shown in FIG. 2, an embodiment of the present invention provides another data processing method, where the data processing method may include:

Step 201: Acquire multiple sample sets.

For example, before data processing is performed, a plurality of sample sets must first be acquired from user data generated in the network, and a set of data parameters of each sample set must be determined. It should be noted that each sample set in the plurality of sample sets may be data generated in one scenario, the plurality of sample sets may include a target sample set, and a set of data parameters of the target sample set may be the target parameter group. Specifically, data parameters reflect the characteristics of data: each data parameter of a sample set reflects one feature of the sample set, and a set of data parameters of a sample set reflects multiple features of the sample set. For example, a set of data parameters of a sample set may consist of a set of metadata (including at least one piece of metadata) of the sample set. If two sample sets are different, the two sets of metadata of the two sample sets are different. Optionally, a set of data parameters of a sample set may include the mean of the sample set, the variance of the sample set, the maximum value of the sample set, the minimum value of the sample set, and the like, which is not limited by the embodiment of the present invention.
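
As a minimal sketch of how such a set of data parameters might be computed, assume a sample set is a flat list of numeric values. The function name `data_parameters`, the dictionary keys, and the use of population variance are illustrative assumptions and are not specified by the embodiment.

```python
from statistics import fmean, pvariance

def data_parameters(sample_set):
    """Return a hypothetical set of data parameters (metadata) for one sample set."""
    return {
        "mean": fmean(sample_set),
        "variance": pvariance(sample_set),  # population variance: an assumption
        "max": max(sample_set),
        "min": min(sample_set),
    }

# Illustrative sample set; the values are made up for the sketch.
params = data_parameters([2.0, 4.0, 4.0, 6.0])
```

Two sample sets that yield different dictionaries would, in the terms used above, have different sets of data parameters.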

For example, as shown in Table 1, a set of data parameters of sample set 1 may include: the 1st metadata, the 2nd metadata, ..., the X-th metadata; a set of data parameters of sample set 2 may include: the (X+1)-th metadata, the (X+2)-th metadata, ..., the Y-th metadata; a set of data parameters of sample set 3 may include: the (Y+1)-th metadata, the (Y+2)-th metadata, ..., the Z-th metadata; and a set of data parameters of sample set 4 may include: the (Z+1)-th metadata, the (Z+2)-th metadata, ..., the W-th metadata. It should be noted that any two pieces of metadata in Table 1 may be the same or different, but the sets of data parameters of any two sample sets are different. It should also be noted that the embodiment of the present invention merely takes 4 sample sets as an example; in actual applications, the number of sample sets acquired in step 201 is in the hundreds (or more).

Table 1

Sample set	Metadata
1	1st metadata, 2nd metadata, ..., X-th metadata
2	(X+1)-th metadata, (X+2)-th metadata, ..., Y-th metadata
3	(Y+1)-th metadata, (Y+2)-th metadata, ..., Z-th metadata
4	(Z+1)-th metadata, (Z+2)-th metadata, ..., W-th metadata

Step 202: Determine a target feature selection algorithm and a target machine learning algorithm corresponding to a set of data parameters of each sample set in the plurality of sample sets.

It should be noted that a set of data parameters may correspond to multiple feature selection algorithms and multiple machine learning algorithms (that is, when data having the set of data parameters is processed, any one of the multiple feature selection algorithms may be used, and any one of the multiple machine learning algorithms may be used). Selecting one feature selection algorithm from the multiple feature selection algorithms corresponding to a set of data parameters, and one machine learning algorithm from the multiple machine learning algorithms corresponding to the set of data parameters, forms one algorithm corresponding to the set of data parameters; the set of data parameters can therefore correspond to multiple algorithms. By evaluating the multiple algorithms corresponding to the set of data parameters according to the preset evaluation algorithm, multiple evaluation values can be determined. The algorithm corresponding to the optimal evaluation value among the multiple evaluation values is the target algorithm corresponding to the set of data parameters, and the feature selection algorithm and machine learning algorithm that make up the target algorithm are the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters.

Specifically, by processing a sample set with a feature selection algorithm and a machine learning algorithm, it can be determined whether the sample set has a preset feature, and thereby the attribute of the sample set is determined; that is, the attribute of the sample set is either: has the preset feature, or does not have the preset feature. For example, it is determined that the user corresponding to the sample set is female, or is not female.

The preset evaluation algorithm can evaluate parameters such as the accuracy or error rate of the process of "determining the attribute of the sample set by using a certain feature selection algorithm and a certain machine learning algorithm" and express the result as a numerical value, which may be called the evaluation value of the preset evaluation algorithm. The better the evaluation value, the more accurate the determined attribute of the sample set. Specifically, when the preset evaluation algorithm evaluates accuracy, a larger evaluation value means a more accurate determined attribute, and the optimal evaluation value is the maximum evaluation value; when the preset evaluation algorithm evaluates error rate, a smaller evaluation value means a more accurate determined attribute, and the optimal evaluation value is the minimum evaluation value. For example, the preset evaluation algorithm may be an evaluation method based on a multi-fold cross-validation mechanism, or it may be another evaluation algorithm, which is not limited by the embodiment of the present invention.
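
The rule for choosing the optimal evaluation value can be sketched as follows. This is only an illustration: the function name `optimal_evaluation` and the metric labels `"accuracy"` and `"error_rate"` are assumptions introduced here, not identifiers from the embodiment.

```python
def optimal_evaluation(values, metric="accuracy"):
    """Return the optimal evaluation value for the given kind of metric.

    For an accuracy-type metric the optimum is the maximum value;
    for an error-rate-type metric the optimum is the minimum value.
    """
    if metric == "accuracy":
        return max(values)
    if metric == "error_rate":
        return min(values)
    raise ValueError(f"unknown metric: {metric}")

best_accuracy = optimal_evaluation([0.7, 0.9, 0.8], metric="accuracy")
best_error = optimal_evaluation([0.3, 0.1, 0.2], metric="error_rate")
```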

Since the process of determining the target feature selection algorithm and target machine learning algorithm is similar for each set of data parameters, the embodiment of the present invention takes only the determination of the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group as an example. For the specific steps of determining the target feature selection algorithm and target machine learning algorithm corresponding to the other sets of data parameters, reference may be made to the specific steps for the target parameter group, and details are not described again in the embodiment of the present invention. For example, determining the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group may include:

First, the target sample set is substituted into at least one feature selection algorithm to determine at least one feature set corresponding to the target parameter group. Specifically, the at least one feature selection algorithm may include a feature selection algorithm based on information entropy or a feature selection algorithm based on inter-feature correlation. It should be noted that the at least one feature selection algorithm may further include other feature selection algorithms, which are not listed one by one in the embodiment of the present invention. Then, the at least one feature set corresponding to the target parameter group may be substituted into at least one machine learning algorithm to determine at least one processing model corresponding to the target parameter group. For example, if the target parameter group corresponds to A feature sets, the A feature sets are each substituted into B machine learning algorithms, and A×B processing models are determined. Finally, the evaluation value corresponding to each of the at least one processing model may be determined according to the preset evaluation algorithm, and the feature selection algorithm and machine learning algorithm corresponding to the processing model with the optimal evaluation value are used as the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group. For example, if A×B is equal to 6 and the evaluation values corresponding to the six processing models are 10, 20, 30, 40, 50, and 60 respectively, the feature selection algorithm and machine learning algorithm corresponding to the processing model with the evaluation value of 60 may be selected as the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group. Optionally, the target feature selection algorithm corresponding to the target parameter group may include: a feature selection algorithm based on information entropy, or a feature selection algorithm based on inter-feature correlation; the target machine learning algorithm corresponding to the target parameter group may include: a random forest (Random Forest, RF) machine learning algorithm, a logistic regression (Logistic Regression, LR) machine learning algorithm, or a support vector machine (Support Vector Machine, SVM) machine learning algorithm.
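
The A×B selection described above can be sketched as an exhaustive search over algorithm pairs. In this sketch the callable `evaluate` stands in for "build the processing model and apply the preset evaluation algorithm", the algorithm labels are placeholders, and the assignment of the six example evaluation values to particular pairs is an assumption made only for illustration; larger values are assumed to be better.

```python
from itertools import product

def select_target_algorithms(feature_selectors, learners, evaluate):
    """Evaluate every (feature selection, machine learning) pair and return the best.

    `evaluate(fs, ml)` is a hypothetical callable returning an evaluation value
    for the processing model built from that pair (larger is better here).
    """
    scored = {
        (fs, ml): evaluate(fs, ml)
        for fs, ml in product(feature_selectors, learners)
    }
    best_pair = max(scored, key=scored.get)
    return best_pair, scored[best_pair]

# A = 2 feature selection algorithms, B = 3 machine learning algorithms,
# giving the six evaluation values 10..60 from the example in the text.
example_scores = {
    ("information_entropy", "RF"): 10,
    ("information_entropy", "LR"): 20,
    ("information_entropy", "SVM"): 30,
    ("feature_correlation", "RF"): 40,
    ("feature_correlation", "LR"): 50,
    ("feature_correlation", "SVM"): 60,
}
best_pair, best_value = select_target_algorithms(
    ["information_entropy", "feature_correlation"],
    ["RF", "LR", "SVM"],
    lambda fs, ml: example_scores[(fs, ml)],
)
```

If the preset evaluation algorithm measured an error rate instead, `max` would be replaced by `min` in this sketch.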

For example, a list of the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters may be created, as shown in Table 2. The data parameters 1st metadata, 2nd metadata, ..., X-th metadata (a set of data parameters of sample set 1) correspond to target feature selection algorithm 2 and target machine learning algorithm 3; the data parameters (X+1)-th metadata, (X+2)-th metadata, ..., Y-th metadata (a set of data parameters of sample set 2) correspond to target feature selection algorithm 2 and target machine learning algorithm 2; the data parameters (Y+1)-th metadata, (Y+2)-th metadata, ..., Z-th metadata (a set of data parameters of sample set 3) correspond to target feature selection algorithm 1 and target machine learning algorithm 2; and the data parameters (Z+1)-th metadata, (Z+2)-th metadata, ..., W-th metadata (a set of data parameters of sample set 4) correspond to target feature selection algorithm 1 and target machine learning algorithm 3. It should be noted that the list may record only the identifier of the target feature selection algorithm and the identifier of the target machine learning algorithm.

Table 2

Data parameters	Target feature selection algorithm	Target machine learning algorithm
1st metadata, 2nd metadata, ..., X-th metadata	2	3
(X+1)-th metadata, (X+2)-th metadata, ..., Y-th metadata	2	2
(Y+1)-th metadata, (Y+2)-th metadata, ..., Z-th metadata	1	2
(Z+1)-th metadata, (Z+2)-th metadata, ..., W-th metadata	1	3

Step 203: Determine a preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters.

Specifically, sample sets can be acquired continuously in step 201, and each time a sample set is acquired in step 201, the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of that sample set are determined in step 202. When the number of sample sets acquired in step 201 reaches n, step 203 can be performed, where n may be an integer greater than or equal to 1 and the n sample sets have n sets of data parameters. After the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model may be determined according to the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters. Specifically, from the list established in step 202 (Table 2), a preset algorithm model may be derived that can determine the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters in at least one set of data parameters.

The preset algorithm model may be a correspondence record table that records at least one set of data parameters together with the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters in the at least one set of data parameters. That is, according to the correspondence record table (the preset algorithm model), the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters can be determined. Optionally, the preset algorithm model need not be a correspondence record table. For example, the preset algorithm model may be a curve in three-dimensional coordinates, in which the x variable is the data parameter group, the y variable is the target feature selection algorithm, and the z variable is the target machine learning algorithm; the curve can cover at least one set of data parameters. It should be noted that the preset algorithm model may also be expressed in other forms, which is not limited by the embodiment of the present invention.
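
One possible form of the correspondence record table is a dictionary keyed by a set of data parameters. The sketch below is an assumption about representation only: the metadata names are placeholders for the unspecified metadata of Table 2, while the algorithm identifiers follow the values recorded in Table 2.

```python
def build_preset_algorithm_model(records):
    """Map each set of data parameters to (feature selection id, learner id)."""
    return {tuple(params): (fs_id, ml_id) for params, fs_id, ml_id in records}

# Placeholder metadata names; algorithm identifiers follow Table 2.
records = [
    (["metadata_1", "metadata_2", "metadata_X"], 2, 3),      # sample set 1
    (["metadata_X+1", "metadata_X+2", "metadata_Y"], 2, 2),  # sample set 2
    (["metadata_Y+1", "metadata_Y+2", "metadata_Z"], 1, 2),  # sample set 3
    (["metadata_Z+1", "metadata_Z+2", "metadata_W"], 1, 3),  # sample set 4
]
preset_algorithm_model = build_preset_algorithm_model(records)

# Looking up one set of data parameters returns its target algorithm identifiers.
fs_id, ml_id = preset_algorithm_model[("metadata_1", "metadata_2", "metadata_X")]
```

An exact-match table like this only answers for parameter groups it has already seen; a curve or other fitted form, as mentioned above, would be needed to generalize beyond them.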

On the one hand, if the n sets of data parameters are all different, the target algorithm corresponding to each of the n sets of data parameters can be determined according to the preset algorithm model determined in step 203. On the other hand, if at least two of the n sets of data parameters are identical, the target algorithm corresponding to each of L sets of data parameters can be determined according to the preset algorithm model determined in step 203, where L is an integer less than n.

Optionally, if it is specified that a first machine learning algorithm is to be used when processing data, then after the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model may be determined according to the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters together with the first machine learning algorithm. According to such a preset algorithm model, the target feature selection algorithm corresponding to the first machine learning algorithm and to each set of data parameters in the at least one set of data parameters can be determined.

Step 204: Determine a preset weight change model according to a target feature selection algorithm corresponding to each set of data parameters.

For example, sample sets may be acquired continuously in step 201, and each time a sample set is acquired in step 201, the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of that sample set are determined in step 202. When the number of sample sets acquired in step 201 reaches m, step 204 can be performed, where m may be an integer greater than or equal to 1 and the m sample sets have m sets of data parameters. The m in step 204 may be the same as or different from the n in step 203, which is not limited by the embodiment of the present invention. After the target feature selection algorithm corresponding to each of the m sets of data parameters is determined, the preset weight change model may be determined according to the target feature selection algorithm corresponding to each set of data parameters.

Specifically, each of the m sample sets may be substituted into the target feature selection algorithm corresponding to the set of data parameters of that sample set to obtain m feature sets, and the initial feature set is determined according to the obtained m feature sets; the initial feature set may include all features (q features) in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 4), and (feature 1, feature 2, feature 5), the initial feature set may be determined to be (feature 1, feature 2, feature 3, feature 4, feature 5). It should be noted that, after the initial feature set is determined, the features in the initial feature set may be sorted according to a preset sorting algorithm, and each feature in the initial feature set is given a weight. For example, the weight of feature 1 may be 5, the weight of feature 2 may be 3, the weight of feature 3 may be 2.5, the weight of feature 4 may be 1, and the weight of feature 5 may be 0.5.
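
Forming the initial feature set from the m feature sets amounts to taking their union. A minimal sketch follows; the function name `merge_feature_sets` is hypothetical, and keeping first-seen order is a choice made for readability, since the embodiment leaves the ordering to the preset sorting algorithm.

```python
def merge_feature_sets(feature_sets):
    """Union the features of all feature sets, keeping first-seen order."""
    merged = []
    for feature_set in feature_sets:
        for feature in feature_set:
            if feature not in merged:
                merged.append(feature)
    return merged

# The three feature sets from the example in the text.
initial_feature_set = merge_feature_sets([
    ["feature 1", "feature 2", "feature 3"],
    ["feature 1", "feature 3", "feature 4"],
    ["feature 1", "feature 2", "feature 5"],
])
```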

Then, the m sample sets may be substituted into a reference feature selection algorithm to obtain m feature sets, and the reference feature set may be determined according to the obtained m feature sets; the reference feature set may include all features in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 6), and (feature 1, feature 2, feature 5), the reference feature set may be determined to be (feature 1, feature 2, feature 3, feature 5, feature 6). It should be noted that, after the reference feature set is determined, the features in the reference feature set may be sorted according to a preset sorting algorithm, and each feature in the reference feature set is given a weight. For example, the weight of feature 1 may be 5, the weight of feature 2 may be 2.5, the weight of feature 3 may be 1, the weight of feature 5 may be 0.9, and the weight of feature 6 may be 0.6. The reference feature selection algorithm may be a manual feature selection algorithm; that is, staff analyze and judge each sample based on experience to determine the reference feature set, and may further sort all features in the reference feature set based on experience so that each feature in the reference feature set is given a weight.

Finally, according to the obtained reference feature set, the weight change value corresponding to each feature in the initial feature set may be determined, and the weight change value of each feature is determined as the weight change value corresponding to the set of feature parameters of that feature. Specifically, the initial feature set may be substituted into a preset machine learning algorithm to determine a first processing model, and the reference feature set may be substituted into the preset machine learning algorithm to determine a second processing model. The first processing model is evaluated according to the preset evaluation algorithm to determine a first evaluation value, and the second processing model is evaluated according to the preset evaluation algorithm to determine a second evaluation value. Then, it is determined whether the second evaluation value is greater than the first evaluation value, that is, whether processing the target sample with the reference feature selection algorithm or with the target feature selection algorithm corresponding to the target parameter group gives the better processing effect. If the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature in the initial feature set, the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is greater than the first evaluation value and the reference feature set does not include the first feature in the initial feature set, a preset weight change value is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is not greater than the first evaluation value, the weight change value corresponding to the set of feature parameters of the first feature is determined to be zero.
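
The three cases above can be sketched as a single function. The function and parameter names are hypothetical, the default `preset_change=-0.2` is only an illustrative preset weight change value, and the weights and evaluation values in the usage lines are illustrative.

```python
def weight_change_values(initial_weights, reference_weights,
                         first_evaluation, second_evaluation,
                         preset_change=-0.2):
    """Compute a weight change value for every feature of the initial feature set.

    initial_weights / reference_weights map feature name -> weight.
    """
    changes = {}
    for feature, initial_weight in initial_weights.items():
        if second_evaluation <= first_evaluation:
            # The reference feature set did not do better: no change.
            changes[feature] = 0.0
        elif feature in reference_weights:
            # Difference: weight in reference set minus weight in initial set.
            changes[feature] = reference_weights[feature] - initial_weight
        else:
            # Feature absent from the reference set: use the preset value.
            changes[feature] = preset_change
    return changes

initial = {"feature 1": 5, "feature 2": 3, "feature 4": 1}
reference = {"feature 1": 5, "feature 2": 2.5}
changes = weight_change_values(initial, reference,
                               first_evaluation=0.8, second_evaluation=0.9)
unchanged = weight_change_values(initial, reference,
                                 first_evaluation=0.9, second_evaluation=0.8)
```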

For example, if the second evaluation value is less than or equal to the first evaluation value, the weight change values corresponding to features 1, 2, 3, 4, and 5 may all be determined to be 0. If the second evaluation value is greater than the first evaluation value, then: for feature 1 in the initial feature set, the reference feature set includes feature 1, so the difference 0 between the weight 5 of feature 1 in the reference feature set and the weight 5 of feature 1 in the initial feature set can be used as the weight change value corresponding to a set of feature parameters of feature 1 (1st metadata, 2nd metadata, ..., C-th metadata). For feature 2 in the initial feature set, the reference feature set includes feature 2, so the difference between the weight 2.5 of feature 2 in the reference feature set and the weight 3 of feature 2 in the initial feature set can be used as the weight change value corresponding to a set of feature parameters of feature 2 ((C+1)-th metadata, (C+2)-th metadata, ..., D-th metadata). For feature 3 in the initial feature set, the reference feature set includes feature 3, so the difference between the weight 0.9 of feature 3 in the reference feature set and the weight 2.5 of feature 3 in the initial feature set can be used as the weight change value corresponding to a set of feature parameters of feature 3 ((D+1)-th metadata, (D+2)-th metadata, ..., E-th metadata). For feature 4 in the initial feature set, the reference feature set does not include feature 4, so the preset weight change value (such as -0.2) can be used as the weight change value corresponding to a set of feature parameters of feature 4 ((E+1)-th metadata, (E+2)-th metadata, ..., F-th metadata). For feature 5 in the initial feature set, the reference feature set includes feature 5, so the difference between the weight 1 of feature 5 in the reference feature set and the weight 0.5 of feature 5 in the initial feature set can be used as the weight change value corresponding to a set of feature parameters of feature 5 ((F+1)-th metadata, (F+2)-th metadata, ..., G-th metadata). Optionally, if the reference feature set does not include multiple features of the initial feature set, a simple descent algorithm may be used to divide a weight sum of 1 among these features, that is, each of the multiple features is assigned a weight change value such that the sum of the weight change values of the multiple features is 1.

After the weight change value corresponding to a set of feature parameters of each feature in the initial feature set is determined, a list may be used to record these weight change values. By way of example, Table 3 records the weight change value corresponding to a set of feature parameters of each feature in the initial feature set. It should be noted that the embodiment of the present invention merely takes an initial feature set with 5 features as an example; in practical applications, the number of features in the initial feature set may be other than 5.

Table 3

Feature parameters of features in the initial feature set	Weight change value
1st metadata, 2nd metadata, ..., C-th metadata	5
(C+1)-th metadata, (C+2)-th metadata, ..., D-th metadata	2.5
(D+1)-th metadata, (D+2)-th metadata, ..., E-th metadata	1
(E+1)-th metadata, (E+2)-th metadata, ..., F-th metadata	0.9
(F+1)-th metadata, (F+2)-th metadata, ..., G-th metadata	0.6

After determining a weight change value corresponding to a set of feature parameters of each feature in the initial feature set, the preset weight change model may be determined according to the weight change value corresponding to each set of feature parameters, that is, the preset weight may be derived according to Table 3. Change model.

Step 205: Acquire data to be processed, and a set of data parameters of the data to be processed is a target parameter group.

Before step 205, the preset algorithm model and the preset weight change model have been determined. In step 205, the data to be processed may be data whose set of data parameters is any set of data parameters for which a target algorithm can be determined according to the preset algorithm model. Taking the embodiment shown in FIG. 1 as an example, on the one hand, the data to be processed acquired in step 205 may include: user data 1 generated by the network side while user A in FIG. 1 communicates using the network provided by the first communication carrier, and user data 2 generated by the network side while user B communicates using the network provided by the first communication carrier. On the other hand, the data to be processed acquired in step 205 may include: user data 3 generated by the network side while user C communicates using the network provided by the second communication carrier, and user data 4 generated by the network side while user D communicates using the network provided by the second communication carrier.

It should be noted that the set of data parameters of the data to be processed may be the target parameter group. The embodiment of the present invention is explained in detail by taking, as an example, the processing of data to be processed whose data parameters are the target parameter group. For data to be processed whose data parameters are another parameter group that can be determined according to the preset algorithm model, the processing may refer to the process described for the target parameter group and is not described here again.

Step 206: Substituting the target parameter group into a preset algorithm model, and determining a target algorithm corresponding to the target parameter group.

For example, the target algorithm determined in step 206 may include at least one of a target feature selection algorithm and a target machine learning algorithm. That is, the target algorithm corresponding to the target parameter group may be: the target feature selection algorithm corresponding to the target parameter group; or the target machine learning algorithm corresponding to the target parameter group; or both the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. The embodiment of the present invention takes as an example the case in which the target algorithm includes both a target feature selection algorithm and a target machine learning algorithm.

On the one hand, when step 206 is performed, if it is specified that a first machine learning algorithm must be used in processing the data to be processed, the first machine learning algorithm and the target parameter group may be substituted into the preset algorithm model to obtain the target feature selection algorithm corresponding to the first machine learning algorithm and the target parameter group. The obtained target feature selection algorithm and the first machine learning algorithm are then used as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. On the other hand, when step 206 is performed, if no particular machine learning algorithm is specified for processing the data to be processed, the target parameter group may be substituted directly into the preset algorithm model to obtain the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
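For illustration only, the two cases of step 206 can be sketched as a lookup. The specification does not disclose the internal form of the preset algorithm model, so the table layout, the parameter group name `group_A`, the algorithm names, the evaluation values, and the assumption that a larger evaluation value is better are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each entry: (feature selection algorithm, machine learning algorithm, evaluation value).
# Hypothetical contents; a larger evaluation value is assumed to be better.
Entry = Tuple[str, str, float]

PRESET_ALGORITHM_MODEL: Dict[str, List[Entry]] = {
    "group_A": [
        ("information_gain", "decision_tree", 0.91),
        ("chi_square", "logistic_regression", 0.88),
        ("chi_square", "decision_tree", 0.84),
    ],
}


def determine_target_algorithm(
    parameter_group: str, required_ml: Optional[str] = None
) -> Tuple[str, str]:
    """Return (target feature selection algorithm, target machine learning algorithm)."""
    entries = PRESET_ALGORITHM_MODEL[parameter_group]
    if required_ml is not None:
        # Case 1: a machine learning algorithm is mandated, so only the
        # feature selection algorithm remains to be chosen.
        entries = [e for e in entries if e[1] == required_ml]
        if not entries:
            raise ValueError(f"no entry for {required_ml!r} in {parameter_group!r}")
    # Case 2 (or after filtering): take the entry with the best evaluation value.
    feature_selection, machine_learning, _ = max(entries, key=lambda e: e[2])
    return feature_selection, machine_learning


unconstrained = determine_target_algorithm("group_A")
constrained = determine_target_algorithm("group_A", required_ml="logistic_regression")
```

Under these assumptions, the unconstrained call returns the overall best pair, while the constrained call returns the best feature selection algorithm paired with the mandated machine learning algorithm.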

It should be noted that, if only the target feature selection algorithm corresponding to the target parameter group is determined in step 206, a machine learning algorithm may be determined according to the related art as the target machine learning algorithm corresponding to the target parameter group. If only the target machine learning algorithm corresponding to the target parameter group is determined in step 206, a feature selection algorithm may be determined according to the related technology as the target feature selection algorithm corresponding to the target parameter group.

Step 207: Determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group and the preset weight change model.

For example, since the set of data parameters of the data to be processed is the target parameter group, the data to be processed may be substituted into the target feature selection algorithm corresponding to the target parameter group to determine a target feature set. Specifically, the initial feature set in step 204 may include the target feature set, that is, each feature in the target feature set belongs to the initial feature set. After the target feature set is determined, the features in the target feature set may also be sorted by using a preset sorting algorithm to determine the weight of each feature in the target feature set. For example, if the features in the target feature set are feature 1, feature 2, feature 3, feature 4, and feature 5, the weight of feature 1 may be 5, the weight of feature 2 may be 3, the weight of feature 3 may be 2.5, the weight of feature 4 may be 1, and the weight of feature 5 may be 0.5. Sorted by weight, the features in the target feature set are then: feature 1, feature 2, feature 3, feature 4, and feature 5.

After the target feature set is determined, the weight change value corresponding to the set of feature parameters of each feature in the target feature set may be determined according to the preset weight change model determined in step 204. Specifically, the five sets of feature parameters of feature 1, feature 2, feature 3, feature 4, and feature 5 may be substituted into the preset weight change model to determine the weight change value corresponding to each set of feature parameters. After the weight change value corresponding to each set of feature parameters is determined, the weight of each feature in the target feature set may be updated according to that weight change value. Specifically, the sum of the weight of a feature and the weight change value corresponding to the set of feature parameters of that feature may be used as the updated weight of the feature. For example, if the weight of feature 1 in the target feature set is 5 and the weight change value corresponding to the set of feature parameters of feature 1 is 0, the updated weight of feature 1 is 5. If the weight of feature 2 is 3 and the corresponding weight change value is -0.5, the updated weight of feature 2 is 2.5. If the weight of feature 3 is 2.5 and the corresponding weight change value is -1.6, the updated weight of feature 3 is 0.9. If the weight of feature 4 is 1 and the corresponding weight change value is -0.2, the updated weight of feature 4 is 0.8. If the weight of feature 5 is 0.5 and the corresponding weight change value is 0.5, the updated weight of feature 5 is 1. Therefore, sorted by updated weight, the features of the updated target feature set are: feature 1, feature 2, feature 5, feature 3, and feature 4.
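The weight update in the example above can be reproduced with a short sketch. The function name `update_weights` and the dictionary representation are illustrative assumptions; only the numeric values come from the example, and rounding is added solely to suppress floating-point noise.

```python
from typing import Dict


def update_weights(weights: Dict[str, float], changes: Dict[str, float]) -> Dict[str, float]:
    """Updated weight = original weight + weight change value (0 if none is given)."""
    # round() only removes floating-point noise such as 2.5 - 1.6 = 0.8999999999999999.
    return {name: round(w + changes.get(name, 0.0), 6) for name, w in weights.items()}


# Weights of the target feature set and weight change values from the example.
weights = {"feature 1": 5.0, "feature 2": 3.0, "feature 3": 2.5, "feature 4": 1.0, "feature 5": 0.5}
changes = {"feature 1": 0.0, "feature 2": -0.5, "feature 3": -1.6, "feature 4": -0.2, "feature 5": 0.5}

updated = update_weights(weights, changes)
# Sort feature names by updated weight, largest first.
ranking = sorted(updated, key=updated.get, reverse=True)
print(updated)
print(ranking)
```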

After the target feature set with updated weights is obtained, the attribute of the data to be processed may be determined according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group. Specifically, the updated target feature set may be substituted into the target machine learning algorithm corresponding to the target parameter group to obtain a processing model, and the data to be processed is then substituted into the processing model to determine the attribute of the data to be processed.

In the related art, when the first communication carrier processes the user data generated by the network side, the two pieces of user data may be substituted into a feature selection algorithm to obtain an initial feature set, and a plurality of feature selection weak classifiers are then constructed according to the initial feature set. The plurality of feature selection weak classifiers are iterated repeatedly based on a Boosting algorithm (an algorithm for improving the accuracy of weak classification algorithms). In each iteration, a machine learning algorithm is used to verify the accuracy of the attributes of the two pieces of user data obtained by the current feature selection weak classifier. If the attributes obtained by the current feature selection weak classifier are inaccurate, the current feature selection weak classifier needs to be replaced with another feature selection weak classifier, and the parameters of that other classifier are adjusted. If the attributes obtained by the current feature selection weak classifier are accurate, the current feature selection weak classifier is used as the feature selection strong classifier, and the attributes of the two pieces of user data are determined according to the feature selection strong classifier and the machine learning algorithm. However, repeatedly iterating the plurality of feature selection weak classifiers based on the Boosting algorithm takes a long time, so data processing is slow and inefficient. In the embodiment of the present invention, because the preset algorithm model is determined in advance, the target feature selection algorithm and the target machine learning algorithm corresponding to the data to be processed can be determined directly according to the preset algorithm model when data processing is performed. The whole process takes less time, which increases the speed and efficiency of data processing.

In the related art, the data to be processed may be substituted into an automatic feature selection algorithm (such as an information-gain-based or correlation-based feature selection algorithm) to determine a target feature set. However, an automatic feature selection algorithm is essentially an algorithm based on mathematical statistics. It determines, from the values in the data to be processed, which features of that data best discriminate a given label, but such a feature is not necessarily the best distinguishing feature in a practical sense; identity (English: identification; ID for short) features are one example. In this case, the processing model obtained by substituting the selected feature set into a machine learning algorithm processes the data poorly. The features that a worker selects based on experience with the feature values of the data to be processed may differ from the features determined by the automatic feature selection algorithm, yet the processing model obtained by substituting the worker-selected features into a machine learning algorithm processes the data better. In the embodiment of the present invention, a preset weight change model is established in advance, so that after the feature set is obtained by the automatic feature selection algorithm, the weights of the features in the feature set can be updated with reference to the worker's empirical values. As a result, the processing model obtained by substituting the updated feature set into the machine learning algorithm processes the data better.

In summary, in the data processing method provided by the embodiment of the present invention, after the data to be processed is acquired, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed) can be determined directly according to the preset algorithm model. The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, among those algorithms, the target algorithm determines the attribute of the data to be processed most accurately, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.

It should be noted that the order of the steps of the data processing method provided by the embodiment of the present invention may be adjusted appropriately, and steps may be added or removed as required. Any variation of the method that a person skilled in the art can readily conceive within the technical scope disclosed in the present application falls within the protection scope of the present application and is therefore not described again.

As shown in FIG. 3-1, the embodiment of the present invention provides a data processing device 30, which may include:

a first acquiring module 301, configured to acquire data to be processed, where a set of data parameters of the data to be processed is a target parameter group;

The first determining module 302 is configured to substitute the target parameter group into the preset algorithm model and determine a target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm;

The second determining module 303 is configured to determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.

As described above, in the data processing apparatus provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed). The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, the attribute of the data to be processed that the second determining module determines according to the target algorithm is the most accurate among those algorithms, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.

Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm. As shown in FIG. 3-2, the embodiment of the present invention provides another data processing device 30. On the basis of FIG. 3-1, the data processing device 30 further includes:

a second obtaining module 304, configured to acquire n sample sets, where n sets of data parameters of the n sample sets include a target parameter set, where n is an integer greater than or equal to 1;

a third determining module 305, configured to determine a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of the n sets of data parameters;

The fourth determining module 306 is configured to determine a preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of the n sets of data parameters;

The first sample set is any sample set in n sample sets, and the third determining module 305 can also be used to:

Substituting the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to a set of data parameters of the first sample set;

Substituting at least one feature set corresponding to a set of data parameters of the first sample set into at least one machine learning algorithm, and determining at least one processing model corresponding to a set of data parameters of the first sample set;

Determining, according to a preset evaluation algorithm, an evaluation value corresponding to each processing model in the at least one processing model, and using the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
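The three operations of the third determining module 305 amount to an exhaustive search over pairs of feature selection algorithm and machine learning algorithm. The following is a minimal sketch under stated assumptions: the toy selectors, the toy learners, accuracy as the preset evaluation algorithm, evaluation on the sample set itself, and all names are hypothetical and are not taken from the specification.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature values, label)
Model = Callable[[Tuple[float, ...]], int]


def train_threshold(samples: Sequence[Sample], feature_set: List[int]) -> Model:
    # Toy learner: predict 1 when the mean of the selected features exceeds 0.5.
    def predict(x: Tuple[float, ...]) -> int:
        return 1 if sum(x[i] for i in feature_set) / len(feature_set) > 0.5 else 0
    return predict


def train_constant(samples: Sequence[Sample], feature_set: List[int]) -> Model:
    # Toy learner: always predict 0, regardless of the features.
    return lambda x: 0


def accuracy(model: Model, samples: Sequence[Sample]) -> float:
    # Stand-in for the preset evaluation algorithm; larger is better.
    return sum(model(x) == y for x, y in samples) / len(samples)


def select_target_algorithms(samples, selectors, learners, evaluate):
    """Try every (feature selection, machine learning) pair and keep the best one."""
    best = None
    for (fs_name, select), (ml_name, train) in product(selectors.items(), learners.items()):
        feature_set = select(samples)        # feature set for this sample set
        model = train(samples, feature_set)  # processing model
        score = evaluate(model, samples)     # evaluation value
        if best is None or score > best[2]:
            best = (fs_name, ml_name, score)
    return best


samples = [((0.9, 0.1), 1), ((0.8, 0.7), 1), ((0.2, 0.9), 0), ((0.1, 0.3), 0)]
selectors: Dict[str, Callable] = {"first_feature": lambda s: [0], "second_feature": lambda s: [1]}
learners: Dict[str, Callable] = {"threshold": train_threshold, "constant": train_constant}

target = select_target_algorithms(samples, selectors, learners, accuracy)
print(target)
```

Repeating this search for each of the n sample sets and recording the winning pair per parameter group would give one possible way to populate the preset algorithm model. A real system would normally evaluate on held-out data rather than on the sample set used for training.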

Optionally, the target algorithm includes: a target feature selection algorithm and a target machine learning algorithm. As shown in FIG. 3-3, the second determining module 303 may include:

The first determining unit 3031 is configured to substitute the data to be processed into a target feature selection algorithm corresponding to the target parameter group, and determine a target feature set, where the target feature set includes p features, and each of the p features has a set of feature parameters. p is an integer greater than or equal to 1, and the feature in the feature set has a weight;

The second determining unit 3032 is configured to substitute the p group feature parameters of the p features into the preset weight change model, and determine a weight change value corresponding to each set of the feature parameters of the p group feature parameters;

The updating unit 3033 is configured to update, according to the determined weight change value, a weight corresponding to each feature in the target feature set;

The third determining unit 3034 is configured to determine an attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter set.

As shown in FIG. 3-4, the embodiment of the present invention provides another data processing apparatus 30. The data processing apparatus 30 may further include:

a third obtaining module 307, configured to acquire m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group, and m is an integer greater than or equal to 1;

a fifth determining module 308, configured to determine a target feature selection algorithm corresponding to each group of data parameters in the m group data parameters;

The sixth determining module 309 is configured to determine an initial feature set, where the initial feature set includes the feature set obtained by substituting each sample set in the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;

The seventh determining module 310 is configured to determine a reference feature set, where the reference feature set includes: substituting each sample set in the m sample sets into a feature set obtained by the reference feature selection algorithm;

The eighth determining module 311 is configured to determine, according to the reference feature set, a weight change value corresponding to a set of feature parameters of each feature in the initial feature set;

The ninth determining module 312 is configured to determine a preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.

Optionally, the eighth determining module 311 is further configured to:

Substituting the initial feature set into a preset machine learning algorithm to determine the first processing model;

Substituting the reference feature set into a preset machine learning algorithm to determine a second processing model;

The first processing model is evaluated according to a preset evaluation algorithm to determine a first evaluation value;

The second processing model is evaluated according to a preset evaluation algorithm to determine a second evaluation value;

Determining whether the second evaluation value is greater than the first evaluation value;

If the second evaluation value is greater than the first evaluation value, and the reference feature set includes a first feature in the initial feature set, using the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
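The rule applied by the eighth determining module 311 can be sketched as follows. The function name, the dictionary representation, and the sample weights and evaluation values are hypothetical. This passage does not say what happens when the second evaluation value is not greater than the first, or when a feature is absent from the reference feature set; the sketch simply produces no weight change value in those cases, which is an assumption.

```python
from typing import Dict


def weight_change_values(
    initial_weights: Dict[str, float],
    reference_weights: Dict[str, float],
    first_evaluation: float,
    second_evaluation: float,
) -> Dict[str, float]:
    """Weight change = weight in reference set - weight in initial set.

    Applied only when the model built from the reference feature set
    evaluates better, and only to features present in both sets.
    """
    if second_evaluation <= first_evaluation:
        return {}  # assumption: no change values are derived in this case
    return {
        feature: reference_weights[feature] - weight
        for feature, weight in initial_weights.items()
        if feature in reference_weights
    }


# Hypothetical weights and evaluation values for illustration.
initial = {"f1": 5.0, "f2": 3.0, "f3": 2.0}
reference = {"f1": 5.0, "f2": 2.5, "f4": 1.0}

better = weight_change_values(initial, reference, first_evaluation=0.80, second_evaluation=0.85)
worse = weight_change_values(initial, reference, first_evaluation=0.85, second_evaluation=0.80)
print(better, worse)
```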

Optionally, the target algorithm comprises: a target feature selection algorithm or a target machine learning algorithm.

As described above, in the data processing apparatus provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed). The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, the attribute of the data to be processed that the second determining module determines according to the target algorithm is the most accurate among those algorithms, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.

As shown in FIG. 4, an embodiment of the present invention provides another data processing device, which may include at least one processor 401 (such as a CPU), at least one network interface 402 or other communication interface, a memory 403, and at least one communication bus 404. The communication bus 404 is used to implement connection and communication between these components. The processor 401 is configured to execute an executable module stored in the memory 403, such as a computer program. The memory 403 may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk storage. The communication connection between the data processing device and at least one other network element is implemented through the at least one network interface 402 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.

In some embodiments, the memory 403 stores a program 4031, the program 4031 can be executed by the processor 401, and the data processing method shown in FIG. 2 can be implemented by the processor 401 executing the program 4031.

In summary, in the data processing apparatus provided by the embodiment of the present invention, after acquiring the data to be processed, the processor directly determines, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed). The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, among those algorithms, the target algorithm determines the attribute of the data to be processed most accurately, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.

The embodiment of the invention provides a data processing system, which may include the data processing device shown in FIG. 3-1, FIG. 3-2, FIG. 3-4 or FIG.

In summary, in the data processing apparatus of the data processing system provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed). The target algorithm determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, the attribute of the data to be processed that the second determining module determines according to the target algorithm is the most accurate among those algorithms, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.

A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the data processing apparatus and the data processing system described above may refer to the corresponding processes in the foregoing data processing method embodiment, and are not described here again.

All the foregoing optional technical solutions may be used in any combination to form an optional embodiment of the present application, and details are not described herein again.

The above description is only an optional embodiment of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

A data processing method, the method comprising:

Obtaining data to be processed, where a set of data parameters of the data to be processed is a target parameter group;

Substituting the target parameter group into a preset algorithm model, and determining a target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm;

Determining an attribute of the to-be-processed data according to a target algorithm corresponding to the target parameter group.
The method according to claim 1, wherein the target algorithm comprises: a target feature selection algorithm and a target machine learning algorithm, and before the step of substituting the target parameter group into the preset algorithm model, the method further comprises:

Obtaining n sample sets, the n sets of data parameters of the n sample sets include the target parameter set, and the n is an integer greater than or equal to 1;

Determining a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters;

Determining the preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters.
The method according to claim 2, wherein the first sample set is any sample set in the n sample sets, and the determining a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of the n sets of data parameters comprises:

Substituting the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to a set of data parameters of the first sample set;

Substituting at least one feature set corresponding to a set of data parameters of the first sample set into at least one machine learning algorithm, and determining at least one processing model corresponding to a set of data parameters of the first sample set;

Determining, according to a preset evaluation algorithm, an evaluation value corresponding to each processing model in the at least one processing model, and using the feature selection algorithm and the machine learning algorithm corresponding to the processing model having the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
The method according to claim 1 or 2, wherein the target algorithm comprises: a target feature selection algorithm and a target machine learning algorithm, and the determining an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group comprises:

Substituting the to-be-processed data into the target feature selection algorithm corresponding to the target parameter group, and determining a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the target feature set has a weight;

Substituting the p-group feature parameters of the p features into a preset weight change model, and determining a weight change value corresponding to each set of the p-group feature parameters;

Updating a weight corresponding to each feature in the target feature set according to the determined weight change value;

Determining the attribute of the to-be-processed data according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
The method according to claim 4, wherein before determining the attribute of the data to be processed according to the target algorithm corresponding to the target parameter group, the method further includes:

Obtaining m sample sets, where the m sets of data parameters of the m sample sets include the target parameter set, where m is an integer greater than or equal to 1;

Determining a target feature selection algorithm corresponding to each group of data parameters of the m group data parameters;

Determining an initial feature set, the initial feature set comprising: a feature set obtained by substituting each sample set in the m sample sets into a target feature selection algorithm corresponding to a set of data parameters of the sample set;

Determining a reference feature set, the reference feature set comprising: substituting each of the m sample sets into a feature set obtained by a reference feature selection algorithm;

Determining, according to the reference feature set, a weight change value corresponding to a set of feature parameters of each feature in the initial feature set;

And determining the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
The method according to claim 5, wherein the determining a weight change value corresponding to a set of feature parameters of each feature in the initial feature set according to the reference feature set comprises:

Substituting the initial feature set into a preset machine learning algorithm to determine a first processing model;

Substituting the reference feature set into a preset machine learning algorithm to determine a second processing model;

Evaluating the first processing model according to the preset evaluation algorithm to determine a first evaluation value;

Evaluating the second processing model according to the preset evaluation algorithm to determine a second evaluation value;

Determining whether the second evaluation value is greater than the first evaluation value;

If the second evaluation value is greater than the first evaluation value, and the reference feature set includes a first feature in the initial feature set, using the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
The method of claim 1 wherein the target algorithm comprises a target feature selection algorithm or a target machine learning algorithm.
A data processing apparatus, characterized in that the data processing apparatus comprises:

a first acquiring module, configured to acquire data to be processed, where a set of data parameters of the to-be-processed data is a target parameter group;

a first determining module, configured to substitute the target parameter group into a preset algorithm model, and determine a target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm that corresponds to the optimal evaluation value, determined by evaluating, according to a preset evaluation algorithm, at least one algorithm corresponding to the target parameter group;

And a second determining module, configured to determine an attribute of the to-be-processed data according to a target algorithm corresponding to the target parameter group.
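The three modules of the preceding claim can be sketched as one small pipeline. The claims do not fix the form of the preset algorithm model, so this sketch assumes it is a plain lookup table from a parameter group to an algorithm. The parameter group encoding (`"small"` or `"large"`), the two toy algorithms, and all function names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, Dict, Sequence, Tuple

ParameterGroup = Tuple[str, ...]              # hypothetical encoding of a set of data parameters
Algorithm = Callable[[Sequence[float]], str]  # maps data to an attribute label


def describe(data: Sequence[float]) -> ParameterGroup:
    """First acquiring module: derive the target parameter group (toy rule)."""
    return ("small",) if len(data) < 5 else ("large",)


def by_mean(data: Sequence[float]) -> str:
    return "high" if mean(data) >= 10 else "low"


def by_median(data: Sequence[float]) -> str:
    return "high" if median(data) >= 10 else "low"


# Assumption: the preset algorithm model was built earlier from sample sets and
# stores, for each parameter group, the algorithm with the best evaluation value.
PRESET_ALGORITHM_MODEL: Dict[ParameterGroup, Algorithm] = {
    ("small",): by_median,
    ("large",): by_mean,
}


def process_data(data: Sequence[float], model: Dict[ParameterGroup, Algorithm]) -> str:
    group = describe(data)           # first acquiring module
    target_algorithm = model[group]  # first determining module
    return target_algorithm(data)    # second determining module
```

A lookup table is only one possible form of the model; a trained classifier that maps parameter groups to algorithms would fit the same interface.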
The data processing apparatus according to claim 8, wherein the target algorithm comprises: a target feature selection algorithm and a target machine learning algorithm, the data processing apparatus further comprising:

a second acquiring module, configured to acquire n sample sets, where the n sets of data parameters of the n sample sets include the target parameter group, and n is an integer greater than or equal to 1;

a third determining module, configured to determine a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters;

And a fourth determining module, configured to determine the preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters.
The data processing apparatus according to claim 9, wherein the first sample set is any sample set in the n sample sets, and the third determining module is further configured to:

Substituting the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to a set of data parameters of the first sample set;

Substituting at least one feature set corresponding to a set of data parameters of the first sample set into at least one machine learning algorithm, and determining at least one processing model corresponding to a set of data parameters of the first sample set;

Determining, according to a preset evaluation algorithm, an evaluation value corresponding to each processing model in the at least one processing model, and selecting the feature selection algorithm and the machine learning algorithm corresponding to the processing model having the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
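The selection procedure in the preceding claim amounts to an exhaustive search over pairs of feature selection and machine learning algorithms. The following sketch assumes that a higher evaluation value is better and that accuracy on the sample set serves as the preset evaluation algorithm. The toy selectors, learners, and sample data are invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Sample = Tuple[Dict[str, int], int]  # (feature values, label)
Model = Callable[[Dict[str, int]], int]


def select_target_algorithms(sample_set, feature_selectors, learners, evaluate):
    """Try every (feature selection, machine learning) pair and keep the best."""
    best = None
    for (fs_name, select), (ml_name, train) in product(
        feature_selectors.items(), learners.items()
    ):
        feature_set = select(sample_set)              # one feature set per selector
        processing_model = train(sample_set, feature_set)
        value = evaluate(processing_model, sample_set)
        if best is None or value > best[2]:           # assumption: higher is better
            best = (fs_name, ml_name, value)
    return best


def accuracy(model: Model, sample_set: List[Sample]) -> float:
    return sum(model(x) == y for x, y in sample_set) / len(sample_set)


def train_threshold(sample_set, feature_set) -> Model:
    # Toy learner: predict 1 when any selected feature is set.
    return lambda x: int(sum(x[f] for f in feature_set) >= 1)


def train_always_zero(sample_set, feature_set) -> Model:
    return lambda x: 0


samples = [({"a": 1, "b": 0}, 1), ({"a": 0, "b": 0}, 0),
           ({"a": 1, "b": 1}, 1), ({"a": 0, "b": 1}, 0)]
selectors = {"only_a": lambda s: ["a"], "only_b": lambda s: ["b"]}
learners = {"threshold": train_threshold, "always_zero": train_always_zero}

best = select_target_algorithms(samples, selectors, learners, accuracy)
```

In practice the evaluation would use held-out data rather than the training samples; the sketch reuses the sample set only to stay short.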
The data processing apparatus according to claim 8 or 9, wherein the target algorithm comprises: a target feature selection algorithm and a target machine learning algorithm, and the second determining module comprises:

a first determining unit, configured to substitute the to-be-processed data into the target feature selection algorithm corresponding to the target parameter group, and determine a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the target feature set has a weight;

a second determining unit, configured to substitute the p sets of feature parameters of the p features into a preset weight change model, and determine a weight change value corresponding to each of the p sets of feature parameters;

an updating unit, configured to update, according to the determined weight change values, the weight corresponding to each feature in the target feature set;

And a third determining unit, configured to determine an attribute of the to-be-processed data according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
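The four units of the preceding claim can be sketched as a short pipeline. This sketch assumes the weight update is additive (new weight equals old weight plus the weight change value), which is consistent with the weight change value being defined as a difference of weights but is not stated in this claim. The `Feature` type and the toy selector, weight change model, and learner are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    params: Tuple[float, ...]  # the feature's set of feature parameters
    weight: float


def update_weights(
    features: List[Feature],
    weight_change_model: Callable[[Tuple[float, ...]], float],
) -> List[Feature]:
    """Second determining unit plus updating unit (assumed additive update)."""
    return [replace(f, weight=f.weight + weight_change_model(f.params)) for f in features]


def determine_attribute(data, select_features, weight_change_model, learn):
    target_features = select_features(data)                         # first determining unit
    updated = update_weights(target_features, weight_change_model)  # second unit and updating unit
    return learn(data, updated)                                     # third determining unit


# Toy stand-ins, invented for illustration only.
data: Dict[str, float] = {"age": 30, "clicks": 4}


def toy_selector(d):
    return [Feature("age", (0.1,), 0.5), Feature("clicks", (0.9,), 0.5)]


def toy_weight_change_model(params):
    return 0.25 if params[0] > 0.5 else -0.25


def toy_learner(d, features):
    score = sum(f.weight * d[f.name] for f in features)
    return "high" if score >= 10 else "low"


attribute = determine_attribute(data, toy_selector, toy_weight_change_model, toy_learner)
```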
The data processing apparatus according to claim 11, wherein the data processing apparatus further comprises:

a third acquiring module, configured to acquire m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group, and m is an integer greater than or equal to 1;

a fifth determining module, configured to determine a target feature selection algorithm corresponding to each set of data parameters in the m sets of data parameters;

a sixth determining module, configured to determine an initial feature set, where the initial feature set includes: a feature set obtained by substituting each sample set in the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;

a seventh determining module, configured to determine a reference feature set, where the reference feature set includes: a feature set obtained by substituting each sample set in the m sample sets into a reference feature selection algorithm;

an eighth determining module, configured to determine, according to the reference feature set, a weight change value corresponding to a set of feature parameters of each feature in the initial feature set;

And a ninth determining module, configured to determine the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
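One simple way to turn per-feature weight change values into a weight change model is sketched below. The claims do not specify the form of the model, so this sketch assumes a table that averages the observed weight change values for each set of feature parameters and returns 0.0 for unseen parameter sets. The function name and the observation format are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

FeatureParams = Tuple[float, ...]


def fit_weight_change_model(
    observations: Iterable[Tuple[FeatureParams, float]],
) -> Callable[[FeatureParams], float]:
    """Build a lookup model from (feature parameters, weight change value) pairs."""
    grouped: Dict[FeatureParams, List[float]] = defaultdict(list)
    for params, change in observations:
        grouped[params].append(change)
    table = {params: mean(changes) for params, changes in grouped.items()}

    def model(params: FeatureParams) -> float:
        # Assumption: parameter sets never seen in the samples get no change.
        return table.get(params, 0.0)

    return model


# Toy observations, invented for illustration only.
observations = [((1.0, 0.0), 0.2), ((1.0, 0.0), 0.4), ((0.0, 1.0), -0.1)]
weight_change_model = fit_weight_change_model(observations)
```

A regression model fitted on the same pairs would generalize to unseen parameter sets, at the cost of a more involved training step.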
The data processing apparatus according to claim 12, wherein the eighth determining module is further configured to:

Substituting the initial feature set into a preset machine learning algorithm to determine a first processing model;

Substituting the reference feature set into a preset machine learning algorithm to determine a second processing model;

Evaluating the first processing model according to the preset evaluation algorithm to determine a first evaluation value;

Evaluating the second processing model according to the preset evaluation algorithm to determine a second evaluation value;

Determining whether the second evaluation value is greater than the first evaluation value;

And if the second evaluation value is greater than the first evaluation value, and the reference feature set includes a first feature of the initial feature set, determining that the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is the weight change value corresponding to the set of feature parameters of the first feature.
The data processing apparatus according to claim 8, wherein said target algorithm comprises: a target feature selection algorithm or a target machine learning algorithm.
A data processing system, characterized in that the data processing system comprises the data processing apparatus of any one of claims 8 to 14.