WO2018040561A1 - Data processing method, device and system - Google Patents

Data processing method, device and system

Info

Publication number
WO2018040561A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
target
algorithm
data
parameters
Prior art date
Application number
PCT/CN2017/079791
Other languages
French (fr)
Chinese (zh)
Inventor
刘冬
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2018040561A1 publication Critical patent/WO2018040561A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, and system.
  • An operator can manually process user data generated on the network side, but because of the large amount of data to be processed, manual processing is inefficient. Therefore, in the related art, a plurality of user data is processed according to a feature selection algorithm and a machine learning algorithm to determine whether each of the plurality of user data has a preset feature, and thereby whether the user corresponding to each user data has a preset attribute. For example, when multiple users of a communication carrier (such as China Mobile) communicate using the network provided by the carrier, the network side generates a large amount of user data, such as the user's fees (which can reflect the user's consumption level) and the user's bills (which can reflect the user's use of China Mobile's services).
  • The communication carrier may substitute the multiple user data generated on the network side into a feature selection algorithm (such as a feature space algorithm) to determine a feature set, and then substitute the feature set into a machine learning algorithm to determine, among the multiple user data, first user data that has the preset feature (the service the user uses most frequently is the preset service) and second user data that does not have the preset feature. Offer information related to the preset service is then sent to the user corresponding to the first user data.
  • the present application provides a data processing method, device and system.
  • the technical solution is as follows:
  • a data processing method comprising:
  • Data to be processed is acquired, where the set of data parameters of the data to be processed is a target parameter group. After the data to be processed is acquired, the target parameter group may be substituted into a preset algorithm model to determine the target algorithm corresponding to the target parameter group. It should be noted that the target algorithm is the algorithm with the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. After the target algorithm corresponding to the target parameter group is determined, the data to be processed may be processed according to that target algorithm to determine the attributes of the data to be processed.
  • the data parameter is used to describe a feature of the data
  • the target parameter group is used to describe a set of features of the to-be-processed data.
  • The target algorithm corresponding to the target parameter group may be determined directly according to the preset algorithm model. The target algorithm indicated by the preset algorithm model is the algorithm with the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, when the data to be processed is processed according to the target algorithm corresponding to the target parameter group, the determined attributes of the data to be processed are the most accurate, which improves the accuracy of the determined attributes.
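  • As a non-limiting sketch, the lookup described above can be pictured as a mapping from a parameter group to a pre-evaluated algorithm. The tuple representation of a parameter group, the dictionary `PRESET_ALGORITHM_MODEL`, the example keys and algorithm names, and the function `determine_target_algorithm` are all hypothetical and chosen only for illustration; the application does not prescribe a data structure for the preset algorithm model.

```python
from typing import Dict, Tuple

# Hypothetical representation: a parameter group is a tuple of metadata values,
# and the preset algorithm model is a plain mapping built in advance.
ParameterGroup = Tuple[str, ...]
AlgorithmPair = Tuple[str, str]  # (feature selection algorithm, machine learning algorithm)

# Illustrative entries only; each value stands for the algorithm pair that
# obtained the optimal evaluation value for that parameter group.
PRESET_ALGORITHM_MODEL: Dict[ParameterGroup, AlgorithmPair] = {
    ("telecom_billing", "tabular"): ("information_entropy", "random_forest"),
    ("telecom_usage", "tabular"): ("inter_feature_correlation", "logistic_regression"),
}


def determine_target_algorithm(target_parameter_group: ParameterGroup) -> AlgorithmPair:
    """Return the pre-evaluated algorithm pair recorded for a parameter group."""
    try:
        return PRESET_ALGORITHM_MODEL[target_parameter_group]
    except KeyError:
        raise LookupError(
            f"no target algorithm recorded for {target_parameter_group!r}"
        )
```

  • Because the evaluation was done when the model was built, processing new data under this assumption costs only one dictionary lookup.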
  • The target algorithm may include a target feature selection algorithm and a target machine learning algorithm. Before the target parameter group is substituted into the preset algorithm model, n sample sets may also be acquired. Each of the n sample sets may have a set of data parameters, so the n sample sets have n sets of data parameters; the n sets of data parameters may include the target parameter group, and n may be an integer greater than or equal to 1. A target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters are then determined. For example, each time a sample set is acquired, the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of that sample set may be determined. After the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model is determined according to them.
  • the algorithm model can determine the target algorithm corresponding to at least one set of data parameters, and can quickly determine the target algorithm corresponding to the data to be processed according to the preset algorithm model when processing the data to be processed, thereby improving the speed and efficiency of data processing.
  • The first sample set is any one of the n sample sets, and the first sample set may be processed using the at least one feature selection algorithm and the at least one machine learning algorithm corresponding to the first sample set.
  • Determining the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters may include: substituting the first sample set into at least one feature selection algorithm (that is, the at least one feature selection algorithm corresponding to the first sample set) to obtain at least one feature set, and determining the obtained at least one feature set as the at least one feature set corresponding to the set of data parameters of the first sample set; then, substituting the at least one feature set corresponding to the set of data parameters of the first sample set respectively into at least one machine learning algorithm to obtain at least one processing model, and determining the at least one processing model as the at least one processing model corresponding to the set of data parameters of the first sample set; finally, determining, according to a preset evaluation algorithm, an evaluation value corresponding to each processing model in the at least one processing model, and taking the feature selection algorithm and machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of the first sample set.
  • The first sample set is any one of the n sample sets; that is, the process of determining the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sample sets may follow the process described above for the first sample set.
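  • The selection procedure for one sample set can be sketched as an exhaustive search over pairs of candidate algorithms. The sketch below is only an illustration under stated assumptions: the callable types, the function name `select_target_algorithms`, and the convention that a higher evaluation value is better are all assumptions not fixed by the application.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]
# Hypothetical interfaces: a selector maps a sample set to a feature set,
# a learner maps (sample set, feature set) to a processing model,
# and an evaluator maps a processing model to an evaluation value.
FeatureSelector = Callable[[Sequence[Sample]], List[str]]
Learner = Callable[[Sequence[Sample], List[str]], object]
Evaluator = Callable[[object], float]


def select_target_algorithms(
    sample_set: Sequence[Sample],
    selectors: Dict[str, FeatureSelector],
    learners: Dict[str, Learner],
    evaluate: Evaluator,
) -> Tuple[str, str, float]:
    """Try every (selector, learner) pair and keep the best-scoring one.

    Assumes that a higher evaluation value is better.
    """
    best = None
    for (s_name, selector), (l_name, learner) in product(
        selectors.items(), learners.items()
    ):
        feature_set = selector(sample_set)        # one feature set per selector
        model = learner(sample_set, feature_set)  # one processing model per pair
        score = evaluate(model)                   # evaluation value of the model
        if best is None or score > best[2]:
            best = (s_name, l_name, score)
    if best is None:
        raise ValueError("at least one selector and one learner are required")
    return best
```

  • Repeating this search for each of the n sample sets and recording the winning pair per set of data parameters would yield the entries of the preset algorithm model.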
  • The target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group of the data to be processed may be determined directly according to the preset algorithm model. The whole process takes less time and therefore improves the speed and efficiency of data processing.
  • The target algorithm may include a target feature selection algorithm and a target machine learning algorithm, in which case determining the attributes of the data to be processed according to the target algorithm corresponding to the target parameter group includes the following. First, the data to be processed is substituted into the target feature selection algorithm corresponding to the target parameter group to obtain a feature set, and the obtained feature set is determined as the target feature set. The target feature set includes p features, each of the p features has a set of feature parameters (so the p features have p sets of feature parameters), p is an integer greater than or equal to 1, and each feature in the feature set has a weight.
  • Then, the p sets of feature parameters of the p features are respectively substituted into the preset weight change model to determine the weight change value corresponding to each of the p sets of feature parameters.
  • Next, the weight corresponding to each feature in the target feature set is updated according to the determined weight change values; that is, the sum of a feature's weight and the weight change value corresponding to the set of feature parameters of that feature is taken as the updated weight of the feature.
  • Finally, the attributes of the data to be processed are determined according to the target feature set with updated weights and the target machine learning algorithm corresponding to the target parameter group.
  • The preset weight change model may be established in advance according to the empirical values of staff. Because the preset weight change model is determined in advance, after the target feature set is obtained using the automatic feature selection algorithm, the weights of the features in the target feature set can also be updated with reference to those empirical values, so that the processing model obtained by substituting the updated target feature set into the machine learning algorithm has a better processing effect.
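  • The weight update step can be illustrated with a minimal sketch, assuming the weight change model is exposed as a function from a set of feature parameters to a weight change value. The names `update_weights`, `feature_parameters` and `weight_change_model` are hypothetical.

```python
from typing import Callable, Dict, Tuple

FeatureParameters = Tuple[str, ...]  # hypothetical encoding of a feature's metadata


def update_weights(
    weights: Dict[str, float],
    feature_parameters: Dict[str, FeatureParameters],
    weight_change_model: Callable[[FeatureParameters], float],
) -> Dict[str, float]:
    """Return new weights: old weight plus the change value for that feature."""
    return {
        name: weight + weight_change_model(feature_parameters[name])
        for name, weight in weights.items()
    }
```

  • The function returns a new dictionary rather than mutating its input, so the original weights of the target feature set remain available for comparison.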
  • The method may further include acquiring m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1. For example, m may or may not be equal to n. After the m sample sets are acquired, the target feature selection algorithm corresponding to each of the m sets of data parameters may be determined.
  • An initial feature set is then determined: each of the m sample sets is substituted into the target feature selection algorithm corresponding to its set of data parameters to obtain a set of features for that sample set, so the m sample sets yield m sets of features in total, and all the distinct features in the m sets of features make up the initial feature set. A reference feature set is further determined, which includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm. Finally, the reference feature set may be compared with the initial feature set; that is, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set is determined according to the reference feature set, and the preset weight change model is determined according to those weight change values.
  • Once the preset weight change model is established, the weight change value corresponding to at least one set of feature parameters can be determined according to it. When data to be processed is processed, the weight change value corresponding to each feature in the feature set of the data to be processed can be determined quickly according to the preset weight change model, and the data is processed according to the feature set with updated weights, which improves the speed and efficiency of data processing.
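  • As a small illustration of how the initial feature set collects all distinct features from the m per-sample-set feature sets, the following sketch merges them while preserving first-seen order. The function name `build_initial_feature_set` and the use of strings as feature identifiers are assumptions.

```python
from typing import Iterable, List


def build_initial_feature_set(per_sample_set_features: Iterable[Iterable[str]]) -> List[str]:
    """Merge m feature sets into one list of distinct features (first-seen order)."""
    seen = set()
    initial_feature_set: List[str] = []
    for features in per_sample_set_features:
        for feature in features:
            if feature not in seen:
                seen.add(feature)
                initial_feature_set.append(feature)
    return initial_feature_set
```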
  • Determining, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set includes: substituting the initial feature set into a preset machine learning algorithm to determine a first processing model; substituting the reference feature set into the preset machine learning algorithm to determine a second processing model; evaluating the first processing model according to the preset evaluation algorithm to determine a first evaluation value; and evaluating the second processing model according to the preset evaluation algorithm to determine a second evaluation value. After the first evaluation value and the second evaluation value are obtained, it is determined whether the second evaluation value is greater than the first evaluation value.
  • If the second evaluation value is greater than the first evaluation value, it may be determined that the reference feature selection algorithm has a better processing effect than the target feature selection algorithm corresponding to the set of data parameters of the first sample set. In that case, if the reference feature set includes the first feature in the initial feature set, the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is the weight change value corresponding to the set of feature parameters of the first feature; if the reference feature set does not include the first feature, a preset weight change value (an empirical value) is used as the weight change value corresponding to the first feature.
  • If the second evaluation value is not greater than the first evaluation value, it may be determined that the processing effect of the target feature selection algorithm corresponding to the set of data parameters of the first sample set is at least as good as that of the reference feature selection algorithm. In that case, the weight change value corresponding to the first feature may be determined to be zero.
  • The processing model obtained with the target feature selection algorithm and the processing model obtained with the reference feature selection algorithm are evaluated separately. If the first evaluation value is greater than or equal to the second evaluation value, it may be determined that processing the target sample with the target feature selection algorithm is better than, or the same as, processing it with the reference feature selection algorithm, and there is no need to refer to the empirical values of staff. If the first evaluation value is smaller than the second evaluation value, it may be determined that processing the target sample with the reference feature selection algorithm is better than processing it with the target feature selection algorithm. In that case, the weights of the features in the initial feature set are updated with reference to the empirical values, so that the processing model obtained by substituting the updated initial feature set into the machine learning algorithm has a better processing effect on the data to be processed.
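  • The three cases described above for the weight change value of a single feature can be written as one small function. This is a sketch under assumptions: the function name `weight_change_value`, the dictionary representation of feature weights, and the parameter `preset_change` (standing for the preset empirical value) are hypothetical.

```python
from typing import Dict


def weight_change_value(
    feature: str,
    initial_weights: Dict[str, float],
    reference_weights: Dict[str, float],
    first_evaluation: float,
    second_evaluation: float,
    preset_change: float,
) -> float:
    """Weight change value for one feature of the initial feature set."""
    # The reference algorithm did not do better: leave the weight unchanged.
    if second_evaluation <= first_evaluation:
        return 0.0
    # The reference algorithm did better and also selected this feature:
    # move the weight toward the reference weight.
    if feature in reference_weights:
        return reference_weights[feature] - initial_weights[feature]
    # The reference algorithm did better but did not select this feature:
    # fall back to the preset empirical value.
    return preset_change
```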
  • The target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine a preset machine learning algorithm and the target feature selection algorithm corresponding to each of at least one set of data parameters. That is, after the target feature selection algorithm and target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, a preset machine learning algorithm and the target feature selection algorithm corresponding to each set of data parameters may be determined from them, thereby obtaining the preset algorithm model. A target feature set corresponding to the target parameter group and the preset machine learning algorithm is then determined according to the preset machine learning algorithm, the target parameter group, and the preset algorithm model.
  • The target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine the target feature selection algorithm and target machine learning algorithm corresponding to each of at least one set of data parameters. That is, after the target feature selection algorithm and target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, the preset algorithm model is obtained from them, and the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group are obtained according to the target parameter group and the preset algorithm model.
  • The target feature selection algorithm corresponding to the target parameter group may include a feature selection algorithm based on information entropy or a feature selection algorithm based on inter-feature correlation. The target machine learning algorithm corresponding to the target parameter group includes a random forest (RF) machine learning algorithm, a logistic regression (LR) machine learning algorithm, or a support vector machine (SVM) machine learning algorithm.
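  • One common instance of feature selection based on information entropy is ranking features by information gain. The application does not state which entropy-based criterion is intended, so the sketch below is an assumption offered for illustration; the function names and the representation of records as dictionaries of categorical values are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(
    rows: Sequence[Dict[str, Hashable]], labels: Sequence[Hashable], feature: str
) -> float:
    """Reduction in label entropy obtained by splitting on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


def select_features(
    rows: Sequence[Dict[str, Hashable]], labels: Sequence[Hashable], k: int
) -> List[str]:
    """Return the k features with the highest information gain."""
    features = list(rows[0].keys())
    ranked = sorted(
        features, key=lambda f: information_gain(rows, labels, f), reverse=True
    )
    return ranked[:k]
```

  • The gain values could also serve as initial feature weights, although the application does not say how weights are assigned.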
  • a set of data parameters of the data is composed of a set of metadata of the data
  • a set of feature parameters of each feature is composed of a set of metadata of the feature
  • the target algorithm includes at least one of a target feature selection algorithm or a target machine learning algorithm. That is, the target algorithm corresponding to the determined target parameter group may be: a target feature selection algorithm corresponding to the target parameter group; or a target machine learning algorithm corresponding to the target parameter group; or a target feature selection algorithm corresponding to the target parameter group and Target machine learning algorithm.
  • A data processing apparatus includes a first acquiring module, a first determining module, and a second determining module. The first acquiring module is configured to acquire data to be processed, where the set of data parameters of the data to be processed is a target parameter group. The first determining module may be configured to substitute the target parameter group into a preset algorithm model and determine a target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm with the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. The second determining module may be configured to determine the attributes of the data to be processed according to the target algorithm corresponding to the target parameter group.
  • The target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the data processing apparatus further includes a second acquiring module, a third determining module, and a fourth determining module. The second acquiring module may be configured to acquire n sample sets, where the n sets of data parameters of the n sample sets include the target parameter group and n is an integer greater than or equal to 1. The third determining module may be configured to determine the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters. The fourth determining module may be configured to determine the preset algorithm model according to the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters.
  • The first sample set is any one of the n sample sets, and the third determining module is further configured to: substitute the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to the set of data parameters of the first sample set; substitute the at least one feature set corresponding to the set of data parameters of the first sample set respectively into at least one machine learning algorithm to determine at least one processing model corresponding to the set of data parameters of the first sample set; and determine, according to a preset evaluation algorithm, an evaluation value corresponding to each processing model in the at least one processing model, and take the feature selection algorithm and machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of the first sample set.
  • The target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the second determining module includes a first determining unit, a second determining unit, an updating unit, and a third determining unit. The first determining unit may be configured to substitute the data to be processed into the target feature selection algorithm corresponding to the target parameter group and determine a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the feature set has a weight. The second determining unit may be configured to substitute the p sets of feature parameters of the p features respectively into the preset weight change model to determine the weight change value corresponding to each set of feature parameters. The updating unit may be configured to update the weight corresponding to each feature in the target feature set according to the determined weight change values. The third determining unit may be configured to determine the attributes of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
  • The data processing apparatus further includes a third acquiring module, a fifth determining module, a sixth determining module, a seventh determining module, an eighth determining module, and a ninth determining module, wherein the third acquiring module is configured to acquire m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1; the fifth determining module may be configured to determine the target feature selection algorithm corresponding to each of the m sets of data parameters; the sixth determining module may be configured to determine an initial feature set, the initial feature set including the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set; the seventh determining module may be configured to determine a reference feature set, the reference feature set including the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm; the eighth determining module may be configured to determine a third
  • The eighth determining module is further configured to: substitute the initial feature set into a preset machine learning algorithm to determine a first processing model; substitute the reference feature set into the preset machine learning algorithm to determine a second processing model; evaluate the first processing model according to the preset evaluation algorithm to determine a first evaluation value; evaluate the second processing model according to the preset evaluation algorithm to determine a second evaluation value; determine whether the second evaluation value is greater than the first evaluation value; and, if the second evaluation value is greater than the first evaluation value and the reference feature set includes the first feature in the initial feature set, determine that the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is the weight change value corresponding to the set of feature parameters of the first feature.
  • the target algorithm includes: a target feature selection algorithm or a target machine learning algorithm.
  • a data processing system comprising the data processing apparatus of the second aspect.
  • a data processing apparatus comprising: at least one processor, at least one network interface, a memory, and at least one bus, wherein the memory and the network interface are respectively connected to the processor through a bus; the processor is The instructions are configured to execute the instructions stored in the memory; the processor implements the data processing method provided by any of the possible implementations of the first aspect or the first aspect by executing the instructions.
  • a data processing system comprising the data processing apparatus of the fourth aspect.
  • the present application provides a data processing method, apparatus, and system.
  • The target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) is determined directly according to the preset algorithm model. The target algorithm determined according to the preset algorithm model is the algorithm with the optimal evaluation value, obtained by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. That is, the attributes of the data to be processed determined according to the target algorithm corresponding to the target parameter group are the most accurate, so the determined attributes have higher accuracy.
  • FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention
  • FIG. 3-1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 3-2 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present invention.
  • FIG. 3-3 is a schematic structural diagram of a second determining module according to an embodiment of the present invention.
  • FIG. 3-4 is a schematic structural diagram of still another data processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of still another data processing apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention.
  • Terminals used by user A, user B, user C, and user D all access the network, so all four users are network users. User A and user B are users of a first communication carrier (such as China Mobile); that is, both user A and user B access the network provided by the first communication carrier. The service user A uses most is the first service provided by the first communication carrier, and the service user B uses most is the second service provided by the first communication carrier. User C and user D are users of a second communication carrier (such as China Telecom); that is, both user C and user D access the network provided by the second communication carrier. The service user C uses most is the third service provided by the second communication carrier, and the service user D uses most is the fourth service provided by the second communication carrier.
  • When user A communicates using the network provided by the first communication carrier, the network side generates user data 1; when user B communicates using the network provided by the first communication carrier, the network side generates user data 2; when user C communicates using the network provided by the second communication carrier, the network side generates user data 3; and when user D communicates using the network provided by the second communication carrier, the network side generates user data 4.
  • Two user data (user data 1 and user data 2) can be acquired and substituted into a feature selection algorithm to determine the feature set corresponding to the two user data. Specifically, sample data may be collected from the two user data and substituted into the feature space algorithm to obtain a feature set (the obtained feature set is usually a subset of the feature set of the sample data, so it may also be referred to as a feature subset). The feature set is substituted into a machine learning algorithm to obtain a processing model.
  • The sample data can then be divided into multiple parts, the attributes of each part of the sample data are determined according to the processing model, and these attributes are substituted into a preset evaluation algorithm (such as an evaluation method based on a multiple cross-validation mechanism) to obtain an evaluation value corresponding to the attributes of the multiple parts of sample data, that is, the evaluation value corresponding to the processing model. If the evaluation value is greater than the evaluation threshold, the feature set is taken as the determined feature set; if the evaluation value is less than or equal to the evaluation threshold, the feature space algorithm needs to be run again to obtain another feature set, until the obtained evaluation value is greater than the evaluation threshold.
  • The determined feature set is substituted into a machine learning algorithm to determine a processing model.
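  • The related-art loop described above (split the samples into parts, score the processing model on each part, and retry with another feature set until the evaluation value exceeds a threshold) can be sketched as follows. This is a simplification and an assumption: it scores an already-built model by mean per-part accuracy rather than retraining per fold as a full cross-validation would, and all names (`split_into_parts`, `evaluation_value`, `choose_feature_set`) are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Sample = Tuple[dict, int]        # (record, known attribute)
Model = Callable[[dict], int]    # processing model: record -> predicted attribute


def split_into_parts(samples: Sequence[Sample], parts: int) -> List[List[Sample]]:
    """Divide the sample data into `parts` non-empty parts (round-robin)."""
    if not 1 <= parts <= len(samples):
        raise ValueError("parts must be between 1 and the number of samples")
    return [list(samples[i::parts]) for i in range(parts)]


def evaluation_value(model: Model, samples: Sequence[Sample], parts: int) -> float:
    """Mean of the per-part accuracies of the model."""
    accuracies = [
        mean(1.0 if model(record) == attribute else 0.0 for record, attribute in part)
        for part in split_into_parts(samples, parts)
    ]
    return mean(accuracies)


def choose_feature_set(
    candidates: Iterable[Tuple[List[str], Model]],
    samples: Sequence[Sample],
    parts: int,
    threshold: float,
) -> Optional[List[str]]:
    """Return the first candidate feature set whose model beats the threshold."""
    for feature_set, model in candidates:
        if evaluation_value(model, samples, parts) > threshold:
            return feature_set
    return None  # no candidate exceeded the threshold
```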
  • If, according to the processing model, it is determined that user data 1 of the two pieces of user data has a preset feature (that is, user data 1 indicates that the service most frequently used by user A is the first service) and that user data 2 does not have the preset feature (that is, user data 2 indicates that the service most frequently used by user B is not the first service), the first communication carrier can send preferential information related to the first service to the terminal used by user A.
  • The user data generated using the network provided by the first communication carrier (user data 1 and user data 2) and the user data generated using the network provided by the second communication carrier (user data 3 and user data 4) are data generated in different scenarios, and the same machine learning algorithm cannot be applied to user data generated in different scenarios. If the operator of the second communication carrier, when processing the user data generated on the network side (user data 3 and user data 4), still uses the same feature selection algorithm and machine learning algorithm as the first communication carrier, the attributes of user data 3 and user data 4 determined by the second communication carrier may deviate, and the accuracy of the determined user data attributes is low.
  • an embodiment of the present invention provides another data processing method, where the data processing method may include:
  • Step 201 Acquire multiple sample sets.
  • each sample set in the plurality of sample sets may be data generated in a scenario, and the plurality of sample sets may include a target sample set, and a set of data parameters of the target sample set may be a target parameter set.
  • The data parameters of data reflect the characteristics of that data: each data parameter of a sample set reflects one feature of the sample set, and the set of data parameters of a sample set as a whole reflects the sample set.
  • a set of data parameters of a sample set may be composed of a set of metadata (including at least one metadata) of the sample set. If the two sample sets are different, the two sets of metadata of the two sample sets are different.
  • a set of data parameters of a sample set may include a mean of the sample set, a variance of the sample set, a maximum value of the sample set, a minimum value of the sample set, and the like, which are not limited by the embodiment of the present invention.
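  • As a minimal sketch of such a set of data parameters, the following computes the four statistics named above for a numeric sample set. The function name `data_parameters` is hypothetical, and the use of the population variance is an assumption, since the text does not say which variance is meant.

```python
from statistics import fmean, pvariance


def data_parameters(sample_set):
    """Return (mean, variance, maximum, minimum) of a numeric sample set.

    Assumption: population variance is used; the text does not specify.
    """
    values = list(sample_set)
    return (fmean(values), pvariance(values), max(values), min(values))


params = data_parameters([2.0, 4.0, 4.0, 6.0])
print(params)
```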
  • Sample set | Metadata
    1 | 1st metadata, 2nd metadata, ..., Xth metadata
    2 | (X+1)th metadata, (X+2)th metadata, ..., Yth metadata
    3 | (Y+1)th metadata, (Y+2)th metadata, ..., Zth metadata
    4 | (Z+1)th metadata, (Z+2)th metadata, ..., Wth metadata
  • Step 202 Determine a target feature selection algorithm and a target machine learning algorithm corresponding to a set of data parameters of each sample set in the plurality of sample sets.
  • A set of data parameters may correspond to multiple feature selection algorithms and multiple machine learning algorithms (that is, when processing a set of data parameters, any one of the multiple feature selection algorithms and any one of the multiple machine learning algorithms may be used). Selecting one feature selection algorithm from the multiple feature selection algorithms corresponding to a set of data parameters, and one machine learning algorithm from the multiple machine learning algorithms corresponding to the set of data parameters, forms one algorithm corresponding to the set of data parameters. Therefore, the set of data parameters can correspond to multiple algorithms.
  • Among the plurality of evaluation values obtained for these algorithms, the algorithm corresponding to the optimal evaluation value is the target algorithm corresponding to the set of data parameters, and the feature selection algorithm and the machine learning algorithm that make up the target algorithm are the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters.
  • When a feature selection algorithm and a machine learning algorithm are used to process a sample set, it can be determined whether the sample set has a preset feature, thereby determining the attribute of the sample set; that is, the attribute of the sample set is either "has the preset feature" or "does not have the preset feature". For example, it is determined that the user corresponding to the sample set is female, or is not female.
  • The preset evaluation algorithm can evaluate parameters such as the accuracy or error rate of the process of "determining the attribute of the sample set by using a certain feature selection algorithm and a certain machine learning algorithm" and express the result as a numerical value, which can be called the evaluation value of the preset evaluation algorithm.
  • the preset evaluation algorithm may be an evaluation method based on the multiple cross-validation mechanism, and the preset evaluation algorithm may also be other evaluation algorithms, which is not limited by the embodiment of the present invention.
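  • One possible reading of an evaluation method based on multiple cross-validation is k-fold cross-validation with accuracy as the evaluation value. The sketch below assumes that reading; `k_fold_accuracy` and the toy learner `fit_threshold` are hypothetical and are not taken from the embodiment.

```python
def k_fold_accuracy(samples, fit, k=3):
    """Average held-out accuracy over k folds.

    samples: list of (feature_vector, label) pairs, with at least k entries.
    fit: callable that takes training samples and returns a predict function.
    """
    folds = [samples[i::k] for i in range(k)]  # simple interleaved split
    scores = []
    for i, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = fit(training)
        correct = sum(1 for x, y in held_out if predict(x) == y)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def fit_threshold(training):
    """Toy learner: predict 1 when the first feature exceeds the training mean."""
    cut = sum(x[0] for x, _ in training) / len(training)
    return lambda x: int(x[0] > cut)


samples = [([0.0], 0), ([1.0], 0), ([2.0], 0), ([8.0], 1), ([9.0], 1), ([10.0], 1)]
score = k_fold_accuracy(samples, fit_threshold, k=3)
print(score)
```

  • Here a higher value means a better processing model; if an error rate were used instead, the optimal evaluation value would be the lowest one.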
  • The embodiment of the present invention only describes, as an example, determining the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. For the specific steps of determining the target feature selection algorithm and the target machine learning algorithm corresponding to the other sets of data parameters, refer to the specific steps for the target parameter group; they are not repeated in the embodiment of the present invention.
  • determining a target feature selection algorithm and a target machine learning algorithm corresponding to the target parameter group may include:
  • the target sample set is substituted into at least one feature selection algorithm to determine at least one feature set corresponding to the target parameter set.
  • The at least one feature selection algorithm may include a feature selection algorithm based on information entropy or a feature selection algorithm based on inter-feature correlation. It should be noted that the at least one feature selection algorithm may further include other feature selection algorithms, which are not listed one by one in the embodiment of the present invention.
  • The at least one feature set corresponding to the target parameter group may be substituted into at least one machine learning algorithm to determine at least one processing model corresponding to the target parameter group. For example, if the target parameter group corresponds to A feature sets, the A feature sets are each substituted into B machine learning algorithms, and A × B processing models are determined.
  • The evaluation value corresponding to each processing model in the at least one processing model may be determined according to a preset evaluation algorithm, and the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value are used as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. For example, if A × B is equal to 6 and the evaluation values corresponding to the six processing models are 10, 20, 30, 40, 50, and 60, respectively, the feature selection algorithm and the machine learning algorithm corresponding to the processing model with an evaluation value of 60 may be selected as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
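  • The A × B search described above can be sketched as follows, using the six example evaluation values with A = 2 and B = 3. All identifiers (`select_target_algorithm`, `fs1`, `ml1`, and so on) are hypothetical, the selectors and learners are toy stand-ins, and the sketch assumes that a larger evaluation value is better.

```python
from itertools import product


def select_target_algorithm(sample_set, feature_selectors, learners, evaluate):
    """Try every (feature selection, machine learning) pair and return
    (evaluation value, feature selection id, machine learning id) of the best one."""
    best = None
    for (fs_id, select), (ml_id, learn) in product(
        feature_selectors.items(), learners.items()
    ):
        feature_set = select(sample_set)   # one of the A feature sets
        model = learn(feature_set)         # one of the A x B processing models
        value = evaluate(model)            # preset evaluation algorithm
        if best is None or value > best[0]:
            best = (value, fs_id, ml_id)
    return best


# Toy stand-ins: each "model" is just the pair of identifiers that produced it,
# and its evaluation value is read from the example values in the text.
feature_selectors = {"fs1": lambda s: "fs1", "fs2": lambda s: "fs2"}
learners = {name: (lambda f, name=name: (f, name)) for name in ("ml1", "ml2", "ml3")}
scores = {
    ("fs1", "ml1"): 10, ("fs1", "ml2"): 20, ("fs1", "ml3"): 30,
    ("fs2", "ml1"): 40, ("fs2", "ml2"): 50, ("fs2", "ml3"): 60,
}

best = select_target_algorithm([], feature_selectors, learners, scores.get)
print(best)
```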
  • the target feature selection algorithm corresponding to the target parameter group may include: a feature selection algorithm based on information entropy, or a feature selection algorithm based on correlation between features;
  • the target machine learning algorithm corresponding to the target parameter group may include: a random forest (RF) machine learning algorithm, a logistic regression (LR) machine learning algorithm, or a support vector machine (SVM) machine learning algorithm.
  • a list of target feature selection algorithms and target machine learning algorithms corresponding to each set of data parameters may be created.
  • The list may be as shown in Table 2. The data parameters 1st metadata, 2nd metadata, ..., Xth metadata (the set of data parameters of sample set 1) correspond to target feature selection algorithm 2 and target machine learning algorithm 3; the data parameters (X+1)th metadata, (X+2)th metadata, ..., Yth metadata (the set of data parameters of sample set 2) correspond to target feature selection algorithm 2 and target machine learning algorithm 2; the data parameters (Y+1)th metadata, (Y+2)th metadata, ..., Zth metadata (the set of data parameters of sample set 3) correspond to target feature selection algorithm 1 and target machine learning algorithm 2; and the data parameters (Z+1)th metadata, (Z+2)th metadata, ..., Wth metadata (the set of data parameters of sample set 4) correspond to target feature selection algorithm 1 and target machine learning algorithm 3. It should be noted that only the identifier of the target feature selection algorithm and the identifier of the target machine learning algorithm may be recorded in the list.
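  • Data parameters | Target feature selection algorithm | Target machine learning algorithm
    1st metadata, 2nd metadata, ..., Xth metadata | 2 | 3
    (X+1)th metadata, (X+2)th metadata, ..., Yth metadata | 2 | 2
    (Y+1)th metadata, (Y+2)th metadata, ..., Zth metadata | 1 | 2
    (Z+1)th metadata, (Z+2)th metadata, ..., Wth metadata | 1 | 3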
  • Step 203 Determine a preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters.
  • Sample sets can be acquired continuously. After each sample set is acquired in step 201, the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of that sample set are determined in step 202. When the number of sample sets acquired in step 201 reaches n, step 203 can be performed, where n is an integer greater than or equal to 1 and the n sample sets have n sets of data parameters.
  • the preset algorithm model may be determined according to the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters.
  • a preset algorithm model capable of determining a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of at least one set of data parameters may be derived.
  • the preset algorithm model may be a correspondence relationship record table, wherein the correspondence relationship record table records at least one set of data parameters, and a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters in the at least one set of data parameters, That is, according to the correspondence relationship record table (preset algorithm model), the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters can be determined.
  • The preset algorithm model need not be a correspondence record table. For example, the preset algorithm model may also be a three-dimensional coordinate curve, in which the x variable is the data parameter group, the y variable is the target feature selection algorithm, and the z variable is the target machine learning algorithm, and the three-dimensional coordinate curve can correspond to at least one set of data parameters. It should be noted that the preset algorithm model may also be expressed in other forms, which is not limited by the embodiment of the present invention.
  • On the one hand, if the n sets of data parameters are all different from one another, the target algorithm corresponding to each of the n sets of data parameters can be determined according to the preset algorithm model determined in step 203; on the other hand, if at least two of the n sets of data parameters are identical, the target algorithm corresponding to each of L sets of data parameters can be determined according to the preset algorithm model determined in step 203, where L is an integer less than n.
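  • When the preset algorithm model takes the form of a correspondence record table, it could be sketched as a dictionary keyed by the set of data parameters. The sketch below is illustrative only: `build_preset_algorithm_model` is a hypothetical name, the numeric parameter tuples are invented placeholders, and the algorithm identifiers follow the values listed for Table 2. A fifth record repeats the first set of parameters to show that n records can collapse to L < n distinct entries.

```python
def build_preset_algorithm_model(records):
    """Map each distinct set of data parameters to the identifiers of its
    (target feature selection algorithm, target machine learning algorithm)."""
    model = {}
    for params, fs_id, ml_id in records:
        model[tuple(params)] = (fs_id, ml_id)
    return model


# Hypothetical (mean, variance) tuples stand in for the sets of data parameters.
records = [
    ((4.0, 2.0), 2, 3),   # sample set 1
    ((7.5, 0.5), 2, 2),   # sample set 2
    ((1.0, 9.0), 1, 2),   # sample set 3
    ((3.2, 1.1), 1, 3),   # sample set 4
    ((4.0, 2.0), 2, 3),   # same data parameters as sample set 1
]

model = build_preset_algorithm_model(records)
print(len(records), len(model))   # n records, L distinct sets of data parameters
print(model[(4.0, 2.0)])
```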
  • If it is specified that the first machine learning algorithm must be used when processing data, then after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model may be determined according to the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters together with the first machine learning algorithm, so that the target feature selection algorithm corresponding to the first machine learning algorithm and to each set of data parameters in the at least one set of data parameters can be determined according to the preset algorithm model.
  • Step 204 Determine a preset weight change model according to a target feature selection algorithm corresponding to each set of data parameters.
  • Sample sets may be acquired continuously. After each sample set is acquired in step 201, the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of that sample set are determined in step 202. When the number of sample sets acquired in step 201 reaches m, step 204 may be performed, where m is an integer greater than or equal to 1 and the m sample sets have m sets of data parameters. The m in step 204 may be the same as or different from the n in step 203, which is not limited by the embodiment of the present invention.
  • the preset weight change model may be determined according to the target feature selection algorithm corresponding to each set of data parameters.
  • Each of the m sample sets may be substituted into the target feature selection algorithm corresponding to the set of data parameters of that sample set to obtain m feature sets, and the initial feature set is determined according to the obtained m feature sets; the initial feature set may include all the features (q features) in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 4), and (feature 1, feature 2, feature 5), the initial feature set may be determined as (feature 1, feature 2, feature 3, feature 4, feature 5).
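  • Forming the initial feature set amounts to taking the union of the m feature sets. A minimal sketch using the example above follows; `initial_feature_set` is a hypothetical name, and keeping first-seen order is a presentation choice rather than something the embodiment requires.

```python
def initial_feature_set(feature_sets):
    """Union of all feature sets, keeping the order in which features first appear."""
    seen = {}
    for feature_set in feature_sets:
        for feature in feature_set:
            seen.setdefault(feature, None)  # dicts preserve insertion order
    return list(seen)


groups = [
    ["feature 1", "feature 2", "feature 3"],
    ["feature 1", "feature 3", "feature 4"],
    ["feature 1", "feature 2", "feature 5"],
]

initial = initial_feature_set(groups)
print(initial)
```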
  • The features in the initial feature set may be sorted according to a preset sorting algorithm, and each feature in the initial feature set is given a weight. For example, the weight of feature 1 may be 5, the weight of feature 2 may be 3, the weight of feature 3 may be 2.5, the weight of feature 4 may be 1, and the weight of feature 5 may be 0.5.
  • The m sample sets may be substituted into the reference feature selection algorithm to obtain m feature sets, and the reference feature set may be determined according to the obtained m feature sets; the reference feature set may include all the features in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 6), and (feature 1, feature 2, feature 5), the reference feature set may be determined as (feature 1, feature 2, feature 3, feature 5, feature 6). It should be noted that, after the reference feature set is determined, the features in the reference feature set may be sorted according to a preset sorting algorithm, and each feature in the reference feature set is given a weight.
  • For example, the weight of feature 1 may be 5, the weight of feature 2 may be 2.5, the weight of feature 3 may be 0.9, the weight of feature 5 may be 1, and the weight of feature 6 may be 0.6.
  • The reference feature selection algorithm may be a manual feature selection algorithm; that is, staff analyze and judge each sample according to their empirical values to determine the reference feature set, and may then sort all the features in the reference feature set according to their empirical values so as to give each feature in the reference feature set a weight.
  • the weight change value corresponding to each feature in the initial feature set may be determined, and the weight change value of each feature is determined as the weight change value corresponding to the set of feature parameters of the feature.
  • The initial feature set may be substituted into a preset machine learning algorithm to determine the first processing model, and the reference feature set may be substituted into the preset machine learning algorithm to determine the second processing model. The first processing model is then evaluated according to the preset evaluation algorithm to determine the first evaluation value, and the second processing model is evaluated according to the preset evaluation algorithm to determine the second evaluation value.
  • It can then be determined whether the second evaluation value is greater than the first evaluation value, that is, whether processing the target sample with the reference feature selection algorithm gives the better result, or processing the target sample with the target feature selection algorithm corresponding to the target parameter group does. If the second evaluation value is greater than the first evaluation value and the reference feature set includes the first feature in the initial feature set, the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is greater than the first evaluation value and the reference feature set does not include the first feature, the preset weight change value is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is not greater than the first evaluation value, the weight change value corresponding to the set of feature parameters of the first feature is determined to be zero.
  • For example, if the second evaluation value is less than or equal to the first evaluation value, it may be determined that the weight change values corresponding to features 1, 2, 3, 4, and 5 are all 0. If the second evaluation value is greater than the first evaluation value, then:
  • For feature 1 in the initial feature set, the reference feature set includes feature 1, so the difference (0) between the weight 5 of feature 1 in the reference feature set and the weight 5 of feature 1 in the initial feature set can be used as the weight change value corresponding to the set of feature parameters of feature 1 (1st metadata, 2nd metadata, ..., Cth metadata).
  • For feature 2 in the initial feature set, the reference feature set includes feature 2, so the difference (-0.5) between the weight 2.5 of feature 2 in the reference feature set and the weight 3 of feature 2 in the initial feature set can be used as the weight change value corresponding to the set of feature parameters of feature 2 ((C+1)th metadata, (C+2)th metadata, ..., Dth metadata).
  • For feature 3 in the initial feature set, the reference feature set includes feature 3, so the difference (-1.6) between the weight 0.9 of feature 3 in the reference feature set and the weight 2.5 of feature 3 in the initial feature set can be used as the weight change value corresponding to the set of feature parameters of feature 3 ((D+1)th metadata, (D+2)th metadata, ..., Eth metadata).
  • For feature 4 in the initial feature set, the reference feature set does not include feature 4, so the preset weight change value (such as -0.2) can be used as the weight change value corresponding to the set of feature parameters of feature 4 ((E+1)th metadata, (E+2)th metadata, ..., Fth metadata).
  • For feature 5 in the initial feature set, the reference feature set includes feature 5, so the difference (0.5) between the weight 1 of feature 5 in the reference feature set and the weight 0.5 of feature 5 in the initial feature set can be used as the weight change value corresponding to the set of feature parameters of feature 5 ((F+1)th metadata, (F+2)th metadata, ..., Gth metadata).
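  • The three rules for the weight change value can be sketched as follows, using the weights from the worked example above (feature 3 at 0.9 and feature 5 at 1 in the reference feature set). The function name `weight_changes` and the two evaluation values 0.7 and 0.8 are hypothetical; only their ordering matters here.

```python
def weight_changes(initial, reference, first_eval, second_eval, preset_change):
    """Weight change value for every feature in the initial feature set.

    initial, reference: dicts mapping a feature to its weight.
    """
    if second_eval <= first_eval:
        # The reference feature set did not perform better: no change at all.
        return {feature: 0.0 for feature in initial}
    changes = {}
    for feature, weight in initial.items():
        if feature in reference:
            changes[feature] = reference[feature] - weight
        else:
            changes[feature] = preset_change
    return changes


initial = {1: 5.0, 2: 3.0, 3: 2.5, 4: 1.0, 5: 0.5}
reference = {1: 5.0, 2: 2.5, 3: 0.9, 5: 1.0, 6: 0.6}

changes = weight_changes(
    initial, reference, first_eval=0.7, second_eval=0.8, preset_change=-0.2
)
# Round only for display, to hide binary floating-point noise.
rounded = {feature: round(change, 2) for feature, change in changes.items()}
print(rounded)
```

  • Feature 6 appears only in the reference feature set, so it receives no entry; the rules above are stated only for features of the initial feature set.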
  • A simple descent algorithm may be used to divide the weight sum "1" among the features, that is, a weight change value is assigned to each of the multiple features such that the sum of the weight change values of the multiple features is 1.
  • a list may be used to record the weight change values corresponding to a set of feature parameters of each feature in the initial feature set.
  • Table 3 records the weight change value corresponding to the set of feature parameters of each feature in the initial feature set. It should be noted that the embodiment of the present invention only takes the case where the initial feature set contains 5 features as an example. In practical applications, the number of features in the initial feature set need not be 5.
  • The preset weight change model may be determined according to the weight change value corresponding to each set of feature parameters; that is, the preset weight change model may be derived according to Table 3.
  • Step 205 Acquire data to be processed, and a set of data parameters of the data to be processed is a target parameter group.
  • Data to be processed whose set of data parameters is any set of data parameters that can be determined according to the preset algorithm model can be processed.
  • For example, the data to be processed obtained in step 205 may include: user data 1, generated by the network side while user A in FIG. communicates using the network provided by the first communication carrier, and user data 2, generated by the network side while user B communicates using the network provided by the first communication carrier. Alternatively, the data to be processed obtained in step 205 may include: user data 3, generated by the network side while user C communicates using the network provided by the second communication carrier, and user data 4, generated by the network side while user D communicates using the network provided by the second communication carrier.
  • The set of data parameters of the data to be processed may be the target parameter group. The embodiment of the present invention takes the processing of data to be processed whose data parameters are the target parameter group as an example for detailed explanation. For the processing of data to be processed whose data parameters are any other set of data parameters that can be determined according to the preset algorithm model, refer to the processing of the data to be processed whose data parameters are the target parameter group; it is not described here.
  • Step 206 Substituting the target parameter group into a preset algorithm model, and determining a target algorithm corresponding to the target parameter group.
  • The target algorithm determined in step 206 may include at least one of a target feature selection algorithm and a target machine learning algorithm; that is, the determined target algorithm corresponding to the target parameter group may be: the target feature selection algorithm corresponding to the target parameter group; or the target machine learning algorithm corresponding to the target parameter group; or the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
  • The case where the target algorithm includes both a target feature selection algorithm and a target machine learning algorithm is taken as an example.
  • When step 206 is performed, if it is specified that the first machine learning algorithm must be used in processing the data to be processed, the first machine learning algorithm and the target parameter group may be substituted into the preset algorithm model to obtain the target feature selection algorithm corresponding to the first machine learning algorithm and the target parameter group, and the obtained target feature selection algorithm and the first machine learning algorithm are used as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
  • When step 206 is performed, if it is not explicitly specified that a certain machine learning algorithm must be used in processing the data to be processed, the target parameter group can be substituted directly into the preset algorithm model to obtain the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
  • If only the target feature selection algorithm corresponding to the target parameter group is determined in step 206, a machine learning algorithm may be determined according to the related art as the target machine learning algorithm corresponding to the target parameter group. If only the target machine learning algorithm corresponding to the target parameter group is determined in step 206, a feature selection algorithm may be determined according to the related art as the target feature selection algorithm corresponding to the target parameter group.
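  • The two lookup cases of step 206 can be sketched as below, assuming the preset algorithm model is held as two hypothetical tables: `pair_table`, which maps a parameter group to its algorithm pair, and `fs_by_learner`, which maps a parameter group plus a mandated machine learning algorithm to a feature selection algorithm. These names, keys, and identifiers are illustrative and are not taken from the embodiment.

```python
def target_algorithm(params, pair_table, fs_by_learner, required_ml=None):
    """Return (feature selection id, machine learning id) for a parameter group.

    Raises KeyError if the parameter group is not covered by the model.
    """
    key = tuple(params)
    if required_ml is not None:
        # A specific machine learning algorithm is mandated: look up the
        # feature selection algorithm recorded for (parameters, algorithm).
        return fs_by_learner[(key, required_ml)], required_ml
    # No constraint: the model yields both algorithms directly.
    return pair_table[key]


# Hypothetical model contents for one target parameter group.
pair_table = {(4.0, 2.0): ("fs2", "ml3")}
fs_by_learner = {((4.0, 2.0), "ml1"): "fs1", ((4.0, 2.0), "ml3"): "fs2"}

unconstrained = target_algorithm((4.0, 2.0), pair_table, fs_by_learner)
constrained = target_algorithm((4.0, 2.0), pair_table, fs_by_learner, required_ml="ml1")
print(unconstrained, constrained)
```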
  • Step 207 Determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group and the preset weight change model.
  • the data to be processed may be substituted into a target feature selection algorithm corresponding to the target parameter group to determine a target feature set.
  • the initial feature set in step 204 may include a target feature set, that is, each feature in the target feature set belongs to the initial feature set.
  • each feature in the target feature set may also be sorted by using a preset sorting algorithm to determine the weight of each feature in the target feature set.
  • For example, the features in the target feature set are feature 1, feature 2, feature 3, feature 4, and feature 5; the weight of feature 1 may be 5, the weight of feature 2 may be 3, the weight of feature 3 may be 2.5, the weight of feature 4 may be 1, and the weight of feature 5 may be 0.5. Sorted by weight, the features in the target feature set are: feature 1, feature 2, feature 3, feature 4, and feature 5.
  • the weight change value corresponding to a set of feature parameters of each feature in the target feature set may be determined according to the preset weight change model determined in step 204.
  • For example, the five sets of feature parameters of feature 1, feature 2, feature 3, feature 4, and feature 5 may be substituted into the preset weight change model to determine the weight change value corresponding to each set of feature parameters.
  • The weight corresponding to each feature in the target feature set may be updated according to the weight change value corresponding to each set of feature parameters. Specifically, the sum of the weight of each feature and the weight change value corresponding to the set of feature parameters of that feature may be used as the updated weight of the feature.
  • For example, if the weight of feature 1 in the target feature set is 5 and the weight change value corresponding to the set of feature parameters of feature 1 is 0, the updated weight of feature 1 is 5; if the weight of feature 2 is 3 and its weight change value is -0.5, the updated weight of feature 2 is 2.5; if the weight of feature 3 is 2.5 and its weight change value is -1.6, the updated weight of feature 3 is 0.9; if the weight of feature 4 is 1 and its weight change value is -0.2, the updated weight of feature 4 is 0.8; and if the weight of feature 5 is 0.5 and its weight change value is 0.5, the updated weight of feature 5 is 1. Therefore, sorted by weight, the features of the updated target feature set are: feature 1, feature 2, feature 5, feature 3, and feature 4.
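  • The update and re-sorting in this example can be reproduced with a few lines. `update_weights` is a hypothetical name; treating a feature without a recorded weight change value as unchanged is an assumption of the sketch.

```python
def update_weights(weights, changes):
    """Add each feature's weight change value to its weight and re-rank
    the features from the highest to the lowest updated weight."""
    updated = {
        feature: weight + changes.get(feature, 0.0)  # assumption: no entry means no change
        for feature, weight in weights.items()
    }
    ranking = sorted(updated, key=updated.get, reverse=True)
    return updated, ranking


weights = {"feature 1": 5.0, "feature 2": 3.0, "feature 3": 2.5,
           "feature 4": 1.0, "feature 5": 0.5}
changes = {"feature 1": 0.0, "feature 2": -0.5, "feature 3": -1.6,
           "feature 4": -0.2, "feature 5": 0.5}

updated, ranking = update_weights(weights, changes)
print(ranking)
```

  • The updated weights are approximately 5, 2.5, 0.9, 0.8, and 1 (subject to binary floating-point rounding), which gives the order stated in the example.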
  • The attribute of the data to be processed may be determined according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group. Specifically, the updated target feature set may be substituted into the target machine learning algorithm corresponding to the target parameter group to obtain a processing model, and the data to be processed is substituted into the processing model to determine the attribute of the data to be processed.
  • In the related art, the two pieces of user data may be substituted into a feature selection algorithm to obtain an initial feature set, and then, based on the initial feature set and a Boosting algorithm (an algorithm for improving the accuracy of weak classification algorithms), multiple feature selection weak classifiers are iterated: if the attributes of the two pieces of user data obtained with the current feature selection weak classifier are not accurate, the current feature selection weak classifier is replaced with another feature selection weak classifier, and the parameters of that other classifier are adjusted. If the attributes of the two pieces of user data obtained with the current feature selection weak classifier are accurate, the current feature selection weak classifier is used as the feature selection strong classifier, and the attributes of the two pieces of user data are determined by the feature selection strong classifier and a machine learning algorithm. However, repeatedly iterating the multiple feature selection weak classifiers based on the Boosting algorithm takes a long time, so the data processing speed is slow and the data processing efficiency is low.
  • The target feature selection algorithm and the target machine learning algorithm corresponding to the data to be processed may be determined directly according to the preset algorithm model, and the whole process takes less time, which increases the speed and efficiency of data processing.
  • the data to be processed may be substituted into an automatic feature selection algorithm (such as an information gain based or correlation-based feature selection algorithm) to determine a target feature set.
  • The automatic feature selection algorithm is essentially an algorithm based on mathematical statistics theory; that is, according to the values in the data to be processed, it can determine the feature that best discriminates a certain label among the features of the data to be processed. In a practical sense, however, that feature is not necessarily the best discriminating feature, such as an identity (ID) class feature (for example, an International Mobile Identification ID). In this case, the processing model obtained by substituting the selected feature set into a machine learning algorithm may process the data to be processed poorly.
  • The features selected by staff from the data to be processed based on their empirical values may differ from the features determined by the automatic feature selection algorithm, but the processing model obtained by substituting the features selected by staff into a certain machine learning algorithm processes the data to be processed better.
  • A preset weight change model is established in advance, so that after the automatic feature selection algorithm obtains the feature set, the weights of the feature set can be updated with reference to the empirical values of staff, and the processing model obtained by substituting the updated feature set into the machine learning algorithm processes the data to be processed better.
  • the target parameter group (a set of data parameters of the data to be processed) can be determined according to the preset algorithm model.
  • the algorithm, and the target algorithm corresponding to the target parameter group determined according to the preset algorithm model is an algorithm for evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm, and determining the optimal evaluation value corresponding to the algorithm, that is,
  • the attribute of the data to be processed is determined to be the most accurate according to the target algorithm corresponding to the target parameter group, so that the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has higher accuracy.
  • the embodiment of the present invention provides a data processing device 30, which may include:
  • a first acquiring module 301 configured to acquire data to be processed, where a set of data parameters of the data to be processed is a target parameter group;
  • the first determining module 302 is configured to substitute the target parameter group into the preset algorithm model, and determine a target algorithm corresponding to the target parameter group.
  • the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm;
  • the second determining module 303 is configured to determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.
  • the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed), and that target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm.
  • that is, the attribute of the data to be processed that the second determining module determines according to the target algorithm corresponding to the target parameter group is the most accurate, so the determined attribute of the data to be processed has higher accuracy.
  • the target algorithm includes: a target feature selection algorithm and a target machine learning algorithm.
  • the embodiment of the present invention provides another data processing device 30. On the basis of FIG. 3-1, the data processing device 30 further includes:
  • a second obtaining module 304 configured to acquire n sample sets, where n sets of data parameters of the n sample sets include a target parameter set, where n is an integer greater than or equal to 1;
  • a third determining module 305 configured to determine a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of the n sets of data parameters;
  • the fourth determining module 306 is configured to determine a preset algorithm model according to a target feature selection algorithm and a target machine learning algorithm corresponding to each set of data parameters of the n sets of data parameters;
  • the first sample set is any sample set in n sample sets, and the third determining module 305 can also be used to:
  • the feature selection algorithm and the machine learning algorithm corresponding to the optimal processing model are used as the target feature selection algorithm and the target machine learning algorithm corresponding to a set of data parameters of the first sample set.
  • the target algorithm includes: a target feature selection algorithm and a target machine learning algorithm.
  • the second determining module 303 may include:
  • the first determining unit 3031 is configured to substitute the data to be processed into a target feature selection algorithm corresponding to the target parameter group, and determine a target feature set, where the target feature set includes p features, and each of the p features has a set of feature parameters.
  • p is an integer greater than or equal to 1, and the feature in the feature set has a weight;
  • the second determining unit 3032 is configured to substitute the p group feature parameters of the p features into the preset weight change model, and determine a weight change value corresponding to each set of the feature parameters of the p group feature parameters;
  • the updating unit 3033 is configured to update, according to the determined weight change value, a weight corresponding to each feature in the target feature set;
  • the third determining unit 3034 is configured to determine an attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter set.
  • the data processing apparatus 30 may further include:
  • a third obtaining module 307 configured to acquire m sample sets, where the m data parameters of the m sample sets include a target parameter group, where m is an integer greater than or equal to 1;
  • a fifth determining module 308, configured to determine a target feature selection algorithm corresponding to each group of data parameters in the m group data parameters
  • the sixth determining module 309 is configured to determine an initial feature set, where the initial feature set includes: the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;
  • the seventh determining module 310 is configured to determine a reference feature set, where the reference feature set includes: the features in the feature sets obtained by substituting each of the m sample sets into the reference feature selection algorithm;
  • the eighth determining module 311 is configured to determine, according to the reference feature set, a weight change value corresponding to a set of feature parameters of each feature in the initial feature set;
  • the ninth determining module 312 is configured to determine a preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
  • the eighth determining module 311 is further configured to:
  • the first processing model is evaluated according to a preset evaluation algorithm to determine a first evaluation value
  • the second processing model is evaluated according to a preset evaluation algorithm to determine a second evaluation value
  • the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is used as a weight change value corresponding to a set of feature parameters of the first feature.
  • the target algorithm comprises: a target feature selection algorithm or a target machine learning algorithm.
  • the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed).
  • the target algorithm corresponding to the target parameter group determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm.
  • that is, the attribute of the data to be processed that the second determining module determines according to the target algorithm corresponding to the target parameter group is the most accurate, so the determined attribute of the data to be processed has high accuracy.
  • an embodiment of the present invention provides another network adjustment apparatus, which may include at least one processor 401 (such as a CPU), at least one network interface 402 or other communication interface, a memory 403, and at least one communication bus 404.
  • the communication bus 404 is used to implement connection and communication between these devices.
  • the processor 401 is configured to execute an executable module stored in the memory 403, such as a computer program. The memory 403 may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk storage.
  • the communication connection between the network adjustment device and the at least one other network element is implemented by at least one network interface 402 (which may be wired or wireless), and may use an Internet, a wide area network, a local network, a metropolitan area network, or the like.
  • the memory 403 stores a program 4031
  • the program 4031 can be executed by the processor 401
  • the data processing method shown in FIG. 2 can be implemented by the processor 401 executing the program 4031.
  • after acquiring the data to be processed, the processor directly determines, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed).
  • the target algorithm corresponding to the target parameter group determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. That is, the attribute of the data to be processed determined according to that target algorithm is the most accurate, so the determined attribute of the data to be processed has higher accuracy.
  • the embodiment of the invention provides a data processing system, which may include the data processing device shown in FIG. 3-1, FIG. 3-2, FIG. 3-4 or FIG.
  • the first determining module can directly determine, according to the preset algorithm model, the target algorithm corresponding to the target parameter group (a set of data parameters of the data to be processed).
  • the target algorithm corresponding to the target parameter group determined according to the preset algorithm model is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm.
  • that is, the attribute of the data to be processed that the second determining module determines according to the target algorithm corresponding to the target parameter group is the most accurate, so the determined attribute of the data to be processed is more accurate.

Abstract

A data processing method, device and system, relating to the technical field of computers. The method comprises: obtaining data to be processed, a group of data parameters of the data to be processed being a target parameter group (205); substituting the target parameter group into a preset algorithm model to determine a target algorithm corresponding to the target parameter group (206), the target algorithm being: evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm to determine an algorithm corresponding to an optimal evaluation value; and determining, according to the target algorithm corresponding to the target parameter group, an attribute of the data to be processed. The method is used for data processing, and solves the problem of poor data processing effect, thereby improving the data processing effect.

Description

Data processing method, device and system
This application claims priority to Chinese Patent Application No. 201610797641.3, entitled "Data Processing Method, Apparatus and System", filed with the Chinese Patent Office on August 31, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, and system.
Background
With the rapid development of social networks and the continuing growth in the number of network users, ever more user data (hundreds or thousands of records, or more) is generated on the network side. An operator can process the user data to determine attributes of a user (such as the user's gender, age, or hobbies) and make business decisions according to those attributes.
Generally, an operator can process the user data generated on the network side manually, but because the amount of data to be processed is large, manual processing is inefficient. Therefore, in the related art, multiple pieces of user data are processed according to one feature selection algorithm and one machine learning algorithm to determine whether each piece of user data has a preset feature, and in turn whether the user corresponding to each piece of user data has a preset attribute. For example, when multiple users of a communication operator (such as China Mobile) communicate over the network provided by that operator, the network side generates a large amount of user data, such as the user's fees (which reflect the user's consumption level) and the user's bills (which reflect the user's use of the services provided by China Mobile). The operator may substitute the multiple pieces of user data generated on the network side into a feature selection algorithm (such as a feature space algorithm) to determine a feature set, and then substitute the feature set into a machine learning algorithm to determine, among the multiple pieces of user data, first user data that has the preset feature (the service the user uses most frequently is a preset service) and second user data that does not have the preset feature. The operator then sends promotional information related to the preset service to the users corresponding to the first user data.
In the related art, different scenarios produce different user data; for example, user data generated by China Mobile users differs from user data generated by users of China Telecom (another communication operator). Yet the related art uses the same feature selection algorithm and the same machine learning algorithm to process the user data of every scenario. Because a single machine learning algorithm cannot suit user data in all scenarios, the user data attributes obtained by processing are not accurate, so the precision of data processing is low and the processing effect is poor.
Summary of the invention
To solve the problem of poor data processing effect, the present application provides a data processing method, device and system. The technical solutions are as follows:
In a first aspect, a data processing method is provided, the method comprising:
Acquiring data to be processed, where a set of data parameters of the data to be processed is a target parameter group. After the data to be processed is acquired, the target parameter group may be substituted into a preset algorithm model to determine a target algorithm corresponding to the target parameter group. It should be noted that the target algorithm is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to a preset evaluation algorithm. After the target algorithm corresponding to the target parameter group is determined, the data to be processed can be processed according to that target algorithm to determine an attribute of the data to be processed. Optionally, the data parameters describe features of data, and the target parameter group describes a set of features of the data to be processed.
In the present application, once the data to be processed is acquired, the target algorithm corresponding to the target parameter group can be determined directly from the preset algorithm model. The target algorithm indicated by the preset algorithm model is the algorithm corresponding to the optimal evaluation value, determined by evaluating at least one algorithm corresponding to the target parameter group according to the preset evaluation algorithm. Processing the data to be processed according to that target algorithm therefore yields the most accurate attribute, which improves the accuracy of the determined attribute of the data to be processed.
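The first-aspect method can be sketched as a lookup followed by a call. The sketch below is only an illustration under stated assumptions: the choice of data parameters (a size bucket and a field count), the dictionary used as the preset algorithm model, and the two toy algorithms are all hypothetical and are not specified by the application.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
ParamGroup = Tuple[int, int]  # hypothetical: (size bucket, field count)
Algorithm = Callable[[List[Record]], List[str]]


def target_parameter_group(data: List[Record]) -> ParamGroup:
    """Summarise the data by a small group of data parameters (illustrative choice)."""
    size_bucket = 0 if len(data) < 1000 else 1
    field_count = len(data[0]) if data else 0
    return (size_bucket, field_count)


def label_by_fee(data: List[Record]) -> List[str]:
    """Toy algorithm: label each record by a fee threshold."""
    return ["high" if record["fee"] >= 100 else "low" for record in data]


def label_all_low(data: List[Record]) -> List[str]:
    """Toy algorithm: label every record the same way."""
    return ["low" for _ in data]


# Hypothetical preset algorithm model: parameter group -> algorithm that
# received the best evaluation value for that group in an offline step.
PRESET_ALGORITHM_MODEL: Dict[ParamGroup, Algorithm] = {
    (0, 2): label_by_fee,
    (1, 2): label_all_low,
}


def determine_attributes(data: List[Record]) -> List[str]:
    """Look up the target algorithm for the data's parameter group and apply it."""
    group = target_parameter_group(data)
    target_algorithm = PRESET_ALGORITHM_MODEL[group]
    return target_algorithm(data)


users = [{"fee": 120.0, "calls": 30.0}, {"fee": 40.0, "calls": 5.0}]
attributes = determine_attributes(users)
print(attributes)
```

The point of the sketch is that no algorithm search happens at processing time; the search cost is paid once, when the model is built.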
Optionally, the target algorithm may include a target feature selection algorithm and a target machine learning algorithm. Before the target parameter group is substituted into the preset algorithm model, n sample sets may be acquired. Each of the n sample sets may have one set of data parameters, so the n sample sets have n sets of data parameters; the n sets of data parameters may include the target parameter group, and n may be an integer greater than or equal to 1. A target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters are then determined; for example, each time a sample set is acquired, the target feature selection algorithm and target machine learning algorithm corresponding to that sample set's set of data parameters may be determined. After that, the preset algorithm model may be determined according to the target feature selection algorithm and target machine learning algorithm corresponding to each of the n sets of data parameters.
That is, before the data to be processed is acquired, n sample sets need to be acquired in advance, the target algorithm corresponding to each sample set determined, and the preset algorithm model derived from the target algorithms of the sample sets, so that the target algorithm corresponding to at least one set of data parameters can be determined from the preset algorithm model. When the data to be processed is processed, the corresponding target algorithm can then be determined quickly from the preset algorithm model, which improves the speed and efficiency of data processing.
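One way to picture the preset algorithm model is as a table built from the n sample sets. The sketch below assumes, purely for illustration, that a set of data parameters is a numeric tuple and that an unseen parameter group is answered with the entry of the nearest known group (Euclidean distance). The application does not state how the model generalises, and the algorithm names are placeholder labels.

```python
from math import dist
from typing import Dict, List, Tuple

ParamGroup = Tuple[float, ...]   # hypothetical numeric data parameters
AlgorithmPair = Tuple[str, str]  # (feature selection algorithm, machine learning algorithm)


def build_preset_model(
    results: List[Tuple[ParamGroup, AlgorithmPair]],
) -> Dict[ParamGroup, AlgorithmPair]:
    """Record the target algorithms already determined for each sample set."""
    return dict(results)


def lookup(model: Dict[ParamGroup, AlgorithmPair], group: ParamGroup) -> AlgorithmPair:
    """Return the algorithms of the known parameter group closest to `group`."""
    nearest = min(model, key=lambda known: dist(known, group))
    return model[nearest]


# Two sample sets, described here by (record count, field count).
model = build_preset_model([
    ((1000.0, 5.0), ("information_gain", "decision_tree")),
    ((50000.0, 40.0), ("correlation", "logistic_regression")),
])

print(lookup(model, (1200.0, 6.0)))
```

A nearest-neighbour lookup is the simplest choice that lets the model answer for parameter groups it has not seen; a trained classifier over the data parameters would be another option.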
Optionally, a first sample set is any one of the n sample sets, and the first sample set can generally be processed with at least one feature selection algorithm and at least one machine learning algorithm corresponding to it. Determining the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters may include: substituting the first sample set into at least one feature selection algorithm (that is, the at least one feature selection algorithm corresponding to the first sample set) to obtain at least one feature set, and determining the obtained feature sets as the at least one feature set corresponding to the set of data parameters of the first sample set; then substituting each of those feature sets into at least one machine learning algorithm to obtain at least one processing model, and determining the obtained models as the at least one processing model corresponding to the set of data parameters of the first sample set; and finally determining, according to a preset evaluation algorithm, an evaluation value for each of the processing models, and taking the feature selection algorithm and machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and target machine learning algorithm corresponding to the set of data parameters of the first sample set. It should be noted that, because the first sample set is any one of the n sample sets, the target feature selection algorithm and target machine learning algorithm for every one of the n sample sets can be determined by the same process.
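The selection procedure described above amounts to an exhaustive search over (feature selection algorithm, machine learning algorithm) pairs. The following sketch is a toy instance, not the application's implementation: the two selectors, the two learners, and the use of training accuracy as the preset evaluation algorithm are all assumptions made for illustration.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[Dict[str, float], int]  # (feature values, label 0 or 1)
Model = Callable[[Dict[str, float]], int]


def select_all(samples: List[Sample]) -> List[str]:
    """Feature selection that keeps every feature."""
    return sorted(samples[0][0])


def select_largest_gap(samples: List[Sample]) -> List[str]:
    """Feature selection that keeps the one feature whose class means differ most."""
    def gap(name: str) -> float:
        pos = mean(x[name] for x, y in samples if y == 1)
        neg = mean(x[name] for x, y in samples if y == 0)
        return abs(pos - neg)
    return [max(sorted(samples[0][0]), key=gap)]


def learn_threshold(samples: List[Sample], features: List[str]) -> Model:
    """Learner that thresholds the sum of the selected features."""
    def score(x: Dict[str, float]) -> float:
        return sum(x[f] for f in features)
    pos = mean(score(x) for x, y in samples if y == 1)
    neg = mean(score(x) for x, y in samples if y == 0)
    cut = (pos + neg) / 2
    sign = 1 if pos >= neg else -1
    return lambda x: 1 if sign * (score(x) - cut) > 0 else 0


def learn_majority(samples: List[Sample], features: List[str]) -> Model:
    """Learner that always predicts the most common label."""
    ones = sum(y for _, y in samples)
    label = 1 if 2 * ones >= len(samples) else 0
    return lambda x: label


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Stand-in for the preset evaluation algorithm (training accuracy)."""
    return sum(model(x) == y for x, y in samples) / len(samples)


def choose_target_algorithms(samples, selectors, learners):
    """Evaluate every selector/learner pair and return the best-scoring one."""
    best = None
    for (s_name, select), (l_name, learn) in product(selectors.items(), learners.items()):
        value = accuracy(learn(samples, select(samples)), samples)
        if best is None or value > best[2]:
            best = (s_name, l_name, value)
    return best


SELECTORS = {"all": select_all, "largest_gap": select_largest_gap}
LEARNERS = {"threshold": learn_threshold, "majority": learn_majority}
samples = [
    ({"fee": 10.0, "noise": 100.0}, 0),
    ({"fee": 20.0, "noise": 0.0}, 0),
    ({"fee": 80.0, "noise": 0.0}, 1),
    ({"fee": 90.0, "noise": 10.0}, 1),
]
print(choose_target_algorithms(samples, SELECTORS, LEARNERS))
```

In practice the evaluation value would be computed on held-out data rather than on the training samples; training accuracy is used here only to keep the sketch short.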
Because the preset algorithm model is determined in advance, when the data to be processed is processed, the target feature selection algorithm and target machine learning algorithm corresponding to the target parameter group of the data to be processed can be determined directly from the preset algorithm model. The whole process takes little time, which improves the speed and efficiency of data processing.
Optionally, the target algorithm may include a target feature selection algorithm and a target machine learning algorithm, and determining the attribute of the data to be processed according to the target algorithm corresponding to the target parameter group includes the following. First, the data to be processed is substituted into the target feature selection algorithm corresponding to the target parameter group to obtain a feature set, which is determined as the target feature set. The target feature set includes p features, each of the p features has one set of feature parameters (so the p features have p sets of feature parameters), p is an integer greater than or equal to 1, and each feature in the feature set has a weight. Then, the p sets of feature parameters are each substituted into a preset weight change model to determine the weight change value corresponding to each of the p sets of feature parameters. It should be noted that the preset weight change model can determine the weight change value corresponding to each of q sets of feature parameters, where the q sets include the p sets and q ≥ p. Next, the weight of each feature in the target feature set is updated according to the determined weight change values; that is, the sum of a feature's original weight and the weight change value corresponding to that feature's set of feature parameters is taken as the feature's updated weight. Finally, the attribute of the data to be processed is determined according to the target feature set with updated weights and the target machine learning algorithm corresponding to the target parameter group.
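The weight update itself is a per-feature addition. In the sketch below, the feature parameter (`distinct_ratio`) and the rule inside `id_like_penalty` are hypothetical stand-ins for the preset weight change model; only the rule "updated weight = original weight + weight change value" comes from the text.

```python
from typing import Callable, Dict

FeatureParams = Dict[str, float]  # hypothetical per-feature parameters


def id_like_penalty(params: FeatureParams) -> float:
    """Hypothetical weight change model: demote features that look like identifiers."""
    return -0.5 if params["distinct_ratio"] >= 0.99 else 0.0


def update_weights(
    weights: Dict[str, float],
    feature_params: Dict[str, FeatureParams],
    weight_change_model: Callable[[FeatureParams], float],
) -> Dict[str, float]:
    """Updated weight = original weight + weight change value for that feature."""
    return {
        name: weight + weight_change_model(feature_params[name])
        for name, weight in weights.items()
    }


target_feature_set = {"user_id": 0.75, "monthly_fee": 0.5}
params = {
    "user_id": {"distinct_ratio": 1.0},
    "monthly_fee": {"distinct_ratio": 0.25},
}
updated = update_weights(target_feature_set, params, id_like_penalty)
print(updated)
```

The example mirrors the ID-class case mentioned elsewhere in the document: an identifier can look highly discriminative statistically, and the weight change model is the place where that score is corrected.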
For example, the preset weight change model may be established in advance according to the empirical values of staff. Because the preset weight change model is determined in advance, after the target feature set is obtained with the automatic feature selection algorithm, the weights of the features in the target feature set can be updated with reference to the staff's empirical values, so that the processing model obtained by substituting the updated target feature set into the machine learning algorithm processes data better.
Optionally, before the attribute of the data to be processed is determined according to the target algorithm corresponding to the target parameter group, the method may further include: acquiring m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1 (for example, m may or may not be equal to n); after the m sample sets are acquired, determining the target feature selection algorithm corresponding to each of the m sets of data parameters; and then determining an initial feature set, which may include the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to that sample set's set of data parameters. That is, each sample set is substituted into its corresponding target feature selection algorithm to obtain one group of features, the m sample sets yield m groups of features in total, and all the distinct features in the m groups make up the initial feature set. Further, a reference feature set needs to be determined, which includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm. Finally, the reference feature set may be compared with the initial feature set; that is, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set is determined according to the reference feature set, and the preset weight change model is determined according to those weight change values.
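Building the initial and reference feature sets is a union over the m sample sets. The sketch below assumes each sample set carries a key identifying its parameter group and that selectors return feature names; both selectors and the keys are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
SampleSet = Tuple[str, List[Row]]  # (hypothetical parameter-group key, rows)
Selector = Callable[[List[Row]], List[str]]


def first_column(rows: List[Row]) -> List[str]:
    """Toy selector: keep the alphabetically first feature."""
    return sorted(rows[0])[:1]


def last_column(rows: List[Row]) -> List[str]:
    """Toy selector: keep the alphabetically last feature."""
    return sorted(rows[0])[-1:]


def all_columns(rows: List[Row]) -> List[str]:
    """Toy reference selector: keep every feature."""
    return sorted(rows[0])


def collect_features(
    sample_sets: List[SampleSet],
    pick_selector: Callable[[str], Selector],
) -> List[str]:
    """Union of the features selected from every sample set, in first-seen order."""
    collected: List[str] = []
    for group_key, rows in sample_sets:
        for feature in pick_selector(group_key)(rows):
            if feature not in collected:
                collected.append(feature)
    return collected


# Target feature selection algorithm per parameter group (assumed already determined).
TARGET_SELECTORS: Dict[str, Selector] = {"small": first_column, "large": last_column}

sample_sets = [
    ("small", [{"fee": 1.0, "calls": 2.0}]),
    ("large", [{"fee": 3.0, "sms": 4.0}]),
]

initial_feature_set = collect_features(sample_sets, TARGET_SELECTORS.__getitem__)
reference_feature_set = collect_features(sample_sets, lambda _key: all_columns)
print(initial_feature_set, reference_feature_set)
```

The only difference between the two sets is which selector is applied: the per-group target selector for the initial set, and one fixed reference selector for the reference set.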
That is, before the data to be processed is acquired, m sample sets need to be acquired in advance, the target feature selection algorithm corresponding to each sample set determined, and the preset weight change model derived from the target feature selection algorithm of each sample set and the reference feature selection algorithm, so that the weight change value corresponding to at least one set of feature parameters can be determined from the preset weight change model. When the data to be processed is processed, the weight change value for each feature in its feature set can be determined quickly from the preset weight change model, and the data is then processed according to the feature set with updated weights, which improves the speed and efficiency of data processing.
Optionally, determining, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set includes: substituting the initial feature set into a preset machine learning algorithm to determine a first processing model; substituting the reference feature set into the preset machine learning algorithm to determine a second processing model; evaluating the first processing model according to the preset evaluation algorithm to determine a first evaluation value; and evaluating the second processing model according to the preset evaluation algorithm to determine a second evaluation value. After the first and second evaluation values are obtained, it is determined whether the second evaluation value is greater than the first evaluation value. If the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature of the initial feature set, the reference feature selection algorithm is determined to perform better than the target feature selection algorithm corresponding to the set of data parameters of the first sample set, and the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set is used as the weight change value corresponding to the set of feature parameters of the first feature.
Optionally, if the second evaluation value is greater than the first evaluation value and the reference feature set does not include the first feature of the initial feature set, a preset weight change value is used as the weight change value corresponding to the first feature. That is, when the reference feature selection algorithm performs better than the target feature selection algorithm corresponding to the set of data parameters of the first sample set and the reference feature set does not include the first feature, only a preset empirical value is used as the weight change value corresponding to the first feature. If the second evaluation value is not greater than the first evaluation value, the target feature selection algorithm corresponding to the set of data parameters of the first sample set is determined to perform at least as well as the reference feature selection algorithm, and the weight change value corresponding to the first feature is determined to be zero.
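The three cases of the weight change rule can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the function name `weight_change`, the dictionary representation of a feature set (feature name mapped to weight), and the default `preset_change` value are all assumptions, and a larger evaluation value is assumed to be better.

```python
from typing import Mapping


def weight_change(feature: str,
                  initial_weights: Mapping[str, float],
                  reference_weights: Mapping[str, float],
                  first_eval: float,
                  second_eval: float,
                  preset_change: float = 0.1) -> float:
    """Weight change value for one feature of the initial feature set.

    first_eval  : evaluation value of the model built from the initial feature set
    second_eval : evaluation value of the model built from the reference feature set
    preset_change is a hypothetical empirical value chosen for illustration.
    """
    if second_eval <= first_eval:
        # The target feature selection algorithm is at least as good: no change.
        return 0.0
    if feature in reference_weights:
        # The reference set is better and contains the feature: use the difference.
        return reference_weights[feature] - initial_weights[feature]
    # The reference set is better but lacks the feature: fall back to the preset value.
    return preset_change
```

Calling `weight_change("f1", {"f1": 0.5}, {"f1": 0.75}, 0.7, 0.9)` falls into the second case and returns the difference of the two weights.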
In this application, the processing model obtained with the target feature selection algorithm and the processing model obtained with the reference feature selection algorithm are evaluated separately. If the first evaluation value is greater than or equal to the second evaluation value, processing the target sample with the target feature selection algorithm is determined to be better than, or as good as, processing it with the reference feature selection algorithm, and the staff's empirical values need not be consulted. If the first evaluation value is less than the second evaluation value, processing the target sample with the reference feature selection algorithm is determined to be better than processing it with the target feature selection algorithm. In that case the staff's empirical values are consulted and the weights of the features in the initial feature set are updated, so that the processing model obtained by substituting the updated initial feature set into the machine learning algorithm processes the data to be processed more effectively.
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine a first machine learning algorithm and the target feature selection algorithm corresponding to each of at least one set of data parameters. Substituting the target parameter group into the preset algorithm model to determine the target algorithm corresponding to the target parameter group includes: determining that the first machine learning algorithm is the target machine learning algorithm corresponding to the target parameter group; and substituting the target parameter group and the first machine learning algorithm into the preset algorithm model to determine the target feature selection algorithm corresponding to the target parameter group and the first machine learning algorithm.
In this application, after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, the preset machine learning algorithm and the target feature selection algorithm corresponding to each set of data parameters can be determined from them, which yields the preset algorithm model. The target feature selection algorithm corresponding to the target parameter group and the preset machine learning algorithm is then determined from the preset machine learning algorithm, the target parameter group, and the preset algorithm model.
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the preset algorithm model can be used to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of at least one set of data parameters. Substituting the target parameter group into the preset algorithm model to determine the target algorithm corresponding to the target parameter group includes: substituting the target parameter group into the preset algorithm model to determine the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
In this application, after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the at least one set of data parameters are determined, the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters are used to obtain the preset algorithm model, and the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group are obtained from the target parameter group and the preset algorithm model.
Optionally, the target feature selection algorithm corresponding to the target parameter group may include a feature selection algorithm based on information entropy or a feature selection algorithm based on inter-feature correlation; the target machine learning algorithm corresponding to the target parameter group includes a random forest (RF) machine learning algorithm, a logistic regression (LR) machine learning algorithm, or a support vector machine (SVM) machine learning algorithm.
Optionally, the set of data parameters of data consists of a set of metadata of the data, and the set of feature parameters of each feature consists of a set of metadata of the feature.
Optionally, the target algorithm includes at least one of a target feature selection algorithm and a target machine learning algorithm. That is, the target algorithm determined for the target parameter group may be the target feature selection algorithm corresponding to the target parameter group, the target machine learning algorithm corresponding to the target parameter group, or both.
In a second aspect, a data processing apparatus is provided. The apparatus includes a first acquiring module, a first determining module, and a second determining module. The first acquiring module may be configured to acquire data to be processed, where the set of data parameters of the data to be processed is a target parameter group. The first determining module may be configured to substitute the target parameter group into a preset algorithm model to determine the target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm with the optimal evaluation value when at least one algorithm corresponding to the target parameter group is evaluated according to a preset evaluation algorithm. The second determining module may be configured to determine an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the apparatus further includes a second acquiring module, a third determining module, and a fourth determining module. The second acquiring module may be configured to acquire n sample sets, where the n sets of data parameters of the n sample sets include the target parameter group and n is an integer greater than or equal to 1. The third determining module may be configured to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters. The fourth determining module may be configured to determine the preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters.
Optionally, a first sample set is any one of the n sample sets, and the third determining module is further configured to: substitute the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to the set of data parameters of the first sample set; substitute each of the at least one feature set into at least one machine learning algorithm to determine at least one processing model corresponding to the set of data parameters of the first sample set; and determine, according to a preset evaluation algorithm, the evaluation value corresponding to each of the at least one processing model, and use the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm, and the second determining module includes a first determining unit, a second determining unit, an updating unit, and a third determining unit. The first determining unit may be configured to substitute the data to be processed into the target feature selection algorithm corresponding to the target parameter group to determine a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in a feature set has a weight. The second determining unit may be configured to substitute the p sets of feature parameters of the p features into a preset weight change model to determine the weight change value corresponding to each of the p sets of feature parameters, where the preset weight change model can be used to determine the weight change value corresponding to each of q sets of feature parameters, the q sets of feature parameters include the p sets of feature parameters, and q ≥ p. The updating unit may be configured to update the weight of each feature in the target feature set according to the determined weight change values. The third determining unit is configured to determine the attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
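The work of the second determining unit and the updating unit can be sketched as below. This is only an illustration under stated assumptions: the preset weight change model is represented as a plain dictionary from a set of feature parameters to a weight change value, the update is assumed to be additive (consistent with the weight change value being a difference of weights), and the names `update_feature_weights` and `FeatureParams` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: one set of feature parameters as a tuple of numbers.
FeatureParams = Tuple[float, ...]


def update_feature_weights(weights: Dict[str, float],
                           params: Dict[str, FeatureParams],
                           weight_change_model: Dict[FeatureParams, float]
                           ) -> Dict[str, float]:
    """Return the target feature set with updated weights.

    weights             : the p features of the target feature set and their weights
    params              : the set of feature parameters of each of the p features
    weight_change_model : weight change values for q >= p sets of feature parameters
    """
    updated = {}
    for name, weight in weights.items():
        # Look up the weight change value for this feature's parameter set.
        change = weight_change_model[params[name]]
        # Assumption: the change is added to the current weight.
        updated[name] = weight + change
    return updated
```

The updated dictionary would then be passed, together with the data to be processed, to the target machine learning algorithm corresponding to the target parameter group.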
Optionally, the apparatus further includes a third acquiring module, a fifth determining module, a sixth determining module, a seventh determining module, an eighth determining module, and a ninth determining module. The third acquiring module may be configured to acquire m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group and m is an integer greater than or equal to 1. The fifth determining module may be configured to determine the target feature selection algorithm corresponding to each of the m sets of data parameters. The sixth determining module may be configured to determine an initial feature set, where the initial feature set includes the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set. The seventh determining module may be configured to determine a reference feature set, where the reference feature set includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm. The eighth determining module may be configured to determine, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set. The ninth determining module may be configured to determine the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
Optionally, the eighth determining module is further configured to: substitute the initial feature set into a preset machine learning algorithm to determine a first processing model; substitute the reference feature set into the preset machine learning algorithm to determine a second processing model; evaluate the first processing model according to the preset evaluation algorithm to determine a first evaluation value; evaluate the second processing model according to the preset evaluation algorithm to determine a second evaluation value; determine whether the second evaluation value is greater than the first evaluation value; and, if the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature of the initial feature set, use the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
Optionally, the target algorithm includes a target feature selection algorithm or a target machine learning algorithm.
In a third aspect, a data processing system is provided. The data processing system includes the data processing apparatus of the second aspect.
In a fourth aspect, a data processing apparatus is provided. The apparatus includes at least one processor, at least one network interface, a memory, and at least one bus; the memory and the network interface are each connected to the processor through the bus. The processor is configured to execute instructions stored in the memory, and by executing the instructions it implements the data processing method provided by the first aspect or by any possible implementation of the first aspect.
In a fifth aspect, a data processing system is provided. The data processing system includes the data processing apparatus of the fourth aspect.
The technical effects obtained by the second to fifth aspects are similar to those obtained by the corresponding technical means of the first aspect and are not described again here.
In summary, this application provides a data processing method, apparatus, and system. In the data processing method, after the data to be processed is acquired, the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) can be determined directly from the preset algorithm model. That target algorithm is the algorithm with the optimal evaluation value when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, among the evaluated algorithms it determines the attribute of the data to be processed most accurately, so the attribute determined according to the target algorithm corresponding to the target parameter group has high accuracy.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit this application.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 3-1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 3-2 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present invention;
FIG. 3-3 is a schematic structural diagram of a second determining module according to an embodiment of the present invention;
FIG. 3-4 is a schematic structural diagram of yet another data processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of still another data processing apparatus according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the terminals used by user A, user B, user C, and user D all access a network, so all four are network users. User A and user B are users of a first communication carrier (such as China Mobile); that is, both access the network provided by the first communication carrier. The service user A uses most is a first service provided by the first communication carrier, and the service user B uses most is a second service provided by the first communication carrier. User C and user D are users of a second communication carrier (such as China Telecom); that is, both access the network provided by the second communication carrier. The service user C uses most is a third service provided by the second communication carrier, and the service user D uses most is a fourth service provided by the second communication carrier. When user A communicates over the network provided by the first communication carrier, the network side generates user data 1; when user B communicates over the network provided by the first communication carrier, the network side generates user data 2; when user C communicates over the network provided by the second communication carrier, the network side generates user data 3; and when user D communicates over the network provided by the second communication carrier, the network side generates user data 4.
In the related art, when an operator at the first communication carrier processes the user data generated on the network side, the operator may acquire two pieces of user data (user data 1 and user data 2) and substitute them into one feature selection algorithm to determine the feature set corresponding to the two pieces of user data. Specifically, sample data is collected from the two pieces of user data and substituted into the feature selection algorithm to obtain a feature set (the obtained feature set is usually a subset of the feature set of the sample data, so it may also be called a feature subset). The feature set is substituted into a machine learning algorithm to obtain a processing model. The sample data is then divided into multiple parts, the attribute of each part is determined according to the processing model, and the attributes of all parts are substituted into a preset evaluation algorithm (such as an evaluation method based on a multi-fold cross-validation mechanism) to obtain one evaluation value for the attributes of the parts (that is, the evaluation value corresponding to the processing model). If the evaluation value is greater than an evaluation threshold, the currently obtained feature set is determined to be the feature set corresponding to the two pieces of user data. If the evaluation value is less than or equal to the evaluation threshold, the feature selection algorithm is applied again to obtain another feature set, and this is repeated until the obtained evaluation value is greater than the evaluation threshold.
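The related-art search loop can be sketched as follows, assuming the candidate feature subsets and the evaluation step are supplied by the caller. The function name `first_acceptable_subset` and the `evaluate` callable (standing in for model training plus the preset evaluation algorithm) are hypothetical and used only for illustration.

```python
from typing import Callable, Iterable, Optional, Sequence


def first_acceptable_subset(candidates: Iterable[Sequence[str]],
                            evaluate: Callable[[Sequence[str]], float],
                            threshold: float) -> Optional[Sequence[str]]:
    """Return the first feature subset whose evaluation value exceeds the threshold.

    candidates : feature subsets produced by repeated runs of one feature
                 selection algorithm
    evaluate   : trains a processing model on the subset and returns its
                 evaluation value (for example, a cross-validated accuracy)
    """
    for subset in candidates:
        # Accept only when the evaluation value is strictly greater than the threshold.
        if evaluate(subset) > threshold:
            return subset
    # No candidate passed; the related art would keep generating subsets.
    return None
```

Because only one feature selection algorithm and one machine learning algorithm are involved, the loop can only vary the subset, which is the limitation the following paragraphs discuss.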
Then the determined feature set is substituted into a machine learning algorithm to determine a processing model. Finally, according to the processing model, it is determined that user data 1 of the two pieces of user data has the preset feature (that is, user data 1 indicates that the service user A uses most frequently is the first service) and that user data 2 does not have the preset feature (that is, user data 2 indicates that the service user B uses most frequently is not the first service). The first communication carrier can then send offers related to the first service to the terminal used by user A.
The user data generated over the network provided by the first communication carrier (user data 1 and user data 2) and the user data generated over the network provided by the second communication carrier (user data 3 and user data 4) are produced in two different scenarios, and the same machine learning algorithm is not suited to user data produced in different scenarios. If an operator at the second communication carrier still uses the same feature selection algorithm and machine learning algorithm as the first communication carrier when processing the user data generated on the network side (user data 3 and user data 4), the attributes determined for user data 3 and user data 4 will be biased, and the accuracy of the resulting user data attributes is low.
As shown in FIG. 2, an embodiment of the present invention provides another data processing method, which may include:
Step 201: Acquire multiple sample sets.
For example, before data processing, multiple sample sets are first acquired from the user data generated in the network, and the set of data parameters of each sample set is determined. Each of the multiple sample sets may be data generated in one scenario. The multiple sample sets may include a target sample set, whose set of data parameters may be the target parameter group. Specifically, the data parameters of data reflect the characteristics of the data: each data parameter in the set of data parameters of a sample set reflects one characteristic of that sample set, and the set as a whole reflects multiple characteristics of the sample set. For example, the set of data parameters of a sample set may consist of a set of metadata (including at least one piece of metadata) of that sample set; if two sample sets differ, their two sets of metadata differ. Optionally, the set of data parameters of a sample set may include the mean, the variance, the maximum, and the minimum of the sample set, among others, which is not limited in the embodiments of the present invention.
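As a minimal sketch of how a set of data parameters might be computed, the function below returns the four statistics named above for a sample set of numeric values. The function name `data_parameters`, the tuple layout, and the choice of population variance are assumptions; the application does not specify them.

```python
import statistics
from typing import Sequence, Tuple


def data_parameters(sample_set: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean, variance, maximum, minimum) of a numeric sample set.

    Population variance is an assumption made for this sketch; the sample
    variance (statistics.variance) would be an equally plausible choice.
    """
    return (
        statistics.fmean(sample_set),      # mean of the sample set
        statistics.pvariance(sample_set),  # variance of the sample set
        max(sample_set),                   # maximum of the sample set
        min(sample_set),                   # minimum of the sample set
    )
```

Two sample sets that differ in any of these values would then have different sets of data parameters and could be mapped to different target algorithms.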
For example, as shown in Table 1, the set of data parameters of sample set 1 may include the 1st metadata, the 2nd metadata, ..., and the X-th metadata; the set of data parameters of sample set 2 may include the (X+1)-th metadata, the (X+2)-th metadata, ..., and the Y-th metadata; the set of data parameters of sample set 3 may include the (Y+1)-th metadata, the (Y+2)-th metadata, ..., and the Z-th metadata; and the set of data parameters of sample set 4 may include the (Z+1)-th metadata, the (Z+2)-th metadata, ..., and the W-th metadata. Any two pieces of metadata in Table 1 may be the same or different, but the sets of data parameters of any two sample sets are different. The embodiments of the present invention use four acquired sample sets only as an example; in practice, the number of sample sets acquired in step 201 may be in the hundreds or thousands (or more).
Table 1
Sample set | Metadata
1 | 1st metadata, 2nd metadata, ..., X-th metadata
2 | (X+1)-th metadata, (X+2)-th metadata, ..., Y-th metadata
3 | (Y+1)-th metadata, (Y+2)-th metadata, ..., Z-th metadata
4 | (Z+1)-th metadata, (Z+2)-th metadata, ..., W-th metadata
步骤202、确定多个样本集中每个样本集的一组数据参数对应的目标特征选择算法和目标机器学习算法。Step 202: Determine a target feature selection algorithm and a target machine learning algorithm corresponding to a set of data parameters of each sample set in the plurality of sample sets.
需要说明的是,一组数据参数可以对应多种特征选择算法和多种机器学习算法(也即,对一组数据参数进行处理时,可以采用多种特征选择算法中的任意一种特征选择算法,也 可以采用多种机器学习算法中的任意一种机器学习算法)。从一组数据参数对应的多种特征选择算法中选择一个特征选择算法,与从该组数据参数对应的多种机器学习算法中选择一个机器学习算法,可以组成该组数据参数对应的一种算法,因此,该组数据参数可以对应多种算法。且根据预设评估算法对该组数据参数对应的多种算法进行评估能够确定多个评估值,该多个评估值中的最优评估值对应的算法为该组数据参数对应的目标算法,组成该目标算法的特征选择算法和机器学习算法为该组数据参数对应的目标特征选择算法和目标机器学习算法。It should be noted that a set of data parameters may correspond to multiple feature selection algorithms and multiple machine learning algorithms (that is, when processing a set of data parameters, any one of a plurality of feature selection algorithms may be used. ,and also Any of a variety of machine learning algorithms can be employed). Selecting a feature selection algorithm from a plurality of feature selection algorithms corresponding to a set of data parameters, and selecting a machine learning algorithm from a plurality of machine learning algorithms corresponding to the set of data parameters, may form an algorithm corresponding to the set of data parameters Therefore, the set of data parameters can correspond to a variety of algorithms. And evaluating, according to the preset evaluation algorithm, the plurality of algorithms corresponding to the set of data parameters, the plurality of evaluation values may be determined, and the algorithm corresponding to the optimal evaluation value of the plurality of evaluation values is a target algorithm corresponding to the set of data parameters, and is composed of The feature selection algorithm and the machine learning algorithm of the target algorithm are target feature selection algorithms and target machine learning algorithms corresponding to the set of data parameters.
Specifically, processing a sample set with one feature selection algorithm and one machine learning algorithm determines whether the sample set has a preset feature, and thereby determines the attribute of the sample set: either it has the preset feature or it does not. For example, the user corresponding to the sample set is determined to be female, or not female.
The preset evaluation algorithm evaluates a parameter, such as the accuracy or the error rate, of the process of "determining the attribute of a sample set by using a certain feature selection algorithm and a certain machine learning algorithm", and expresses the result as a number. This number may be called the evaluation value of the preset evaluation algorithm; the better the evaluation value, the more accurate the determined attribute of the sample set. Specifically, when the preset evaluation algorithm evaluates accuracy, a larger evaluation value means a more accurate attribute, and the optimal evaluation value is the maximum evaluation value. When the preset evaluation algorithm evaluates the error rate, a smaller evaluation value means a more accurate attribute, and the optimal evaluation value is the minimum evaluation value. For example, the preset evaluation algorithm may be an evaluation method based on a multi-fold cross-validation mechanism, or it may be another evaluation algorithm; this is not limited in the embodiments of the present invention.
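As an illustration only, the following Python sketch shows one way an evaluation value could be obtained by multi-fold cross-validation, and how the "optimal" value depends on whether accuracy or error rate is measured. The function names, the toy threshold learner, and the sample data are assumptions made for this sketch; the embodiment does not prescribe any of them.

```python
def k_fold_accuracy(samples, labels, train, k=3):
    """Mean accuracy over k folds; `train(xs, ys)` must return a predict(x) function."""
    n = len(samples)
    fold_scores = []
    for fold in range(k):
        # Every k-th sample is held out for testing; the rest are used for training.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = train([samples[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        hits = sum(predict(samples[i]) == labels[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / k


def optimal_value(values, metric):
    """Accuracy is maximised; error rate is minimised."""
    return max(values) if metric == "accuracy" else min(values)


def train_threshold(xs, ys):
    """Toy learner (assumption): split halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    threshold = (mean0 + mean1) / 2
    return lambda x: int(x >= threshold)


# Hypothetical one-dimensional samples with binary labels.
accuracy = k_fold_accuracy([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], train_threshold)
error_rate = 1 - accuracy
```

Either `accuracy` or `error_rate` could serve as the evaluation value, provided the same metric is used for every candidate algorithm and `optimal_value` is called with the matching `metric` argument.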
Because the process of determining the target feature selection algorithm and the target machine learning algorithm is similar for every set of data parameters, the embodiments of the present invention explain it only for the target parameter group. For the other sets of data parameters, refer to the steps described for the target parameter group; they are not repeated here. For example, determining the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group may include the following.
First, the target sample set is substituted into at least one feature selection algorithm to determine at least one feature set corresponding to the target parameter group. Specifically, the at least one feature selection algorithm may include a feature selection algorithm based on information entropy, or a feature selection algorithm based on the correlation between features. It should be noted that the at least one feature selection algorithm may also include other feature selection algorithms, which are not listed one by one here. Then, the at least one feature set corresponding to the target parameter group is substituted into each of at least one machine learning algorithm to determine at least one processing model corresponding to the target parameter group. For example, if the target parameter group corresponds to A feature sets and the A feature sets are each substituted into B machine learning algorithms, A×B processing models are determined. Finally, the evaluation value of each of the at least one processing model is determined according to the preset evaluation algorithm, and the feature selection algorithm and machine learning algorithm of the processing model with the optimal evaluation value are taken as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. For example, if A×B equals 6 and the evaluation values of the six processing models are 10, 20, 30, 40, 50, and 60, the feature selection algorithm and machine learning algorithm of the processing model whose evaluation value is 60 may be taken as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. Optionally, the target feature selection algorithm corresponding to the target parameter group may include a feature selection algorithm based on information entropy, or a feature selection algorithm based on the correlation between features. The target machine learning algorithm corresponding to the target parameter group may include a random forest (RF) machine learning algorithm, a logistic regression (LR) machine learning algorithm, or a support vector machine machine learning algorithm.
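The selection procedure above can be sketched as an exhaustive search over every pairing of a feature selection algorithm with a machine learning algorithm. The sketch below is a minimal illustration under stated assumptions: the algorithm labels are placeholders, the `evaluate` callback stands in for the preset evaluation algorithm, and the assignment of the six example evaluation values (10 to 60) to particular pairs is invented, since the text does not say which pair obtains which value.

```python
from itertools import product


def pick_target_algorithms(fs_algorithms, ml_algorithms, evaluate, higher_is_better=True):
    """Evaluate all A x B pairs and return the best pair plus the full score table."""
    scores = {
        (fs, ml): evaluate(fs, ml)
        for fs, ml in product(fs_algorithms, ml_algorithms)
    }
    choose = max if higher_is_better else min
    best_pair = choose(scores, key=scores.get)
    return best_pair, scores


# A = 2 feature selection algorithms, B = 3 machine learning algorithms.
# The mapping of values to pairs is hypothetical.
example_scores = {
    ("entropy", "RF"): 10, ("entropy", "LR"): 20, ("entropy", "SVM"): 30,
    ("correlation", "RF"): 40, ("correlation", "LR"): 50, ("correlation", "SVM"): 60,
}

best_pair, score_table = pick_target_algorithms(
    ["entropy", "correlation"],
    ["RF", "LR", "SVM"],
    evaluate=lambda fs, ml: example_scores[(fs, ml)],
)
print(best_pair)  # ('correlation', 'SVM'), the pair whose evaluation value is 60
```

When the evaluation value is an error rate rather than an accuracy, passing `higher_is_better=False` selects the minimum instead.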
For example, a list may be created to record the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters, as shown in Table 2. The data parameters 1st metadata, 2nd metadata, …, X-th metadata (the set of data parameters of sample set 1) correspond to target feature selection algorithm 2 and target machine learning algorithm 3. The data parameters (X+1)-th metadata, (X+2)-th metadata, …, Y-th metadata (the set of data parameters of sample set 2) correspond to target feature selection algorithm 2 and target machine learning algorithm 2. The data parameters (Y+1)-th metadata, (Y+2)-th metadata, …, Z-th metadata (the set of data parameters of sample set 3) correspond to target feature selection algorithm 1 and target machine learning algorithm 2. The data parameters (Z+1)-th metadata, (Z+2)-th metadata, …, W-th metadata (the set of data parameters of sample set 4) correspond to target feature selection algorithm 1 and target machine learning algorithm 3. It should be noted that the list may record only the identifier of the target feature selection algorithm and the identifier of the target machine learning algorithm.
Table 2
Data parameters    Target feature selection algorithm    Target machine learning algorithm
1st metadata, 2nd metadata, …, X-th metadata    2    3
(X+1)-th metadata, (X+2)-th metadata, …, Y-th metadata    2    2
(Y+1)-th metadata, (Y+2)-th metadata, …, Z-th metadata    1    2
(Z+1)-th metadata, (Z+2)-th metadata, …, W-th metadata    1    3
Step 203: Determine a preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters.
Specifically, sample sets may be acquired continuously in step 201. Each time a sample set is acquired in step 201, step 202 is performed to determine the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of that sample set. Once the number of sample sets acquired in step 201 reaches n, step 203 can be performed, where n is an integer greater than or equal to 1 and the n sample sets have n sets of data parameters. After the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model may be determined from them. Specifically, from the list created in step 202 (Table 2), a preset algorithm model can be derived that is able to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of at least one set of data parameters.
The preset algorithm model may be a correspondence table that records at least one set of data parameters together with the target feature selection algorithm and the target machine learning algorithm corresponding to each of those sets. In other words, the target feature selection algorithm and the target machine learning algorithm corresponding to each set of data parameters can be determined from the correspondence table (the preset algorithm model). Optionally, the preset algorithm model need not be a correspondence table. For example, it may be a curve in three-dimensional coordinates, where the x variable is the data parameter group, the y variable is the target feature selection algorithm, and the z variable is the target machine learning algorithm; the curve may correspond to at least one set of data parameters. It should be noted that the preset algorithm model may also take other forms, which is not limited in the embodiments of the present invention.
On the one hand, if the n sets of data parameters are all different, the preset algorithm model determined in step 203 can determine the target algorithm corresponding to each of the n sets of data parameters. On the other hand, if at least two of the n sets of data parameters are identical, the preset algorithm model determined in step 203 can determine the target algorithm corresponding to each of L sets of data parameters, where L is an integer less than n.
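When the preset algorithm model takes the form of a correspondence table, it behaves like a lookup keyed by the data parameter group. The sketch below is a hypothetical illustration: the group labels are placeholders for the metadata lists in Table 2, the algorithm identifiers are the ones recorded in Table 2, and the fifth record (a repeat of the first group) is added only to show why identical parameter groups leave L < n entries. Letting a later record overwrite an earlier one for the same group is an assumption of this sketch.

```python
def build_algorithm_model(records):
    """Build a correspondence table: parameter group -> (feature selection id, ML id)."""
    model = {}
    for param_group, fs_id, ml_id in records:
        # Identical parameter groups share one entry (assumption: the latest record wins).
        model[param_group] = (fs_id, ml_id)
    return model


# n = 5 records; the last one repeats the first parameter group.
records = [
    ("metadata 1..X", 2, 3),
    ("metadata X+1..Y", 2, 2),
    ("metadata Y+1..Z", 1, 2),
    ("metadata Z+1..W", 1, 3),
    ("metadata 1..X", 2, 3),
]

algorithm_model = build_algorithm_model(records)
n = len(records)          # number of sample sets processed
L = len(algorithm_model)  # number of distinct parameter groups in the model
```

Here `n` is 5 and `L` is 4, matching the case in which at least two sets of data parameters are identical.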
Optionally, if a first machine learning algorithm is specified for processing the data, then after the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters are determined, the preset algorithm model may be determined according to the target feature selection algorithm and target machine learning algorithm corresponding to each set of data parameters together with the first machine learning algorithm. From this preset algorithm model, the target feature selection algorithm corresponding to the first machine learning algorithm and each of at least one set of data parameters can be determined.
Step 204: Determine a preset weight change model according to the target feature selection algorithm corresponding to each set of data parameters.
For example, sample sets may be acquired continuously in step 201. Each time a sample set is acquired in step 201, step 202 is performed to determine the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of that sample set. Once the number of sample sets acquired in step 201 reaches m, step 204 can be performed, where m is an integer greater than or equal to 1 and the m sample sets have m sets of data parameters. The value of m in step 204 may be the same as or different from the value of n in step 203, which is not limited in the embodiments of the present invention. After the target feature selection algorithm corresponding to each of the m sets of data parameters is determined, the preset weight change model may be determined according to the target feature selection algorithm corresponding to each set of data parameters.
Specifically, each of the m sample sets may be substituted into the target feature selection algorithm corresponding to that sample set's data parameters to obtain m feature sets, and an initial feature set is determined from the m feature sets. The initial feature set may include all features (q features) in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 4), and (feature 1, feature 2, feature 5), the initial feature set may be (feature 1, feature 2, feature 3, feature 4, feature 5). It should be noted that, after the initial feature set is determined, its features may be sorted according to a preset sorting algorithm so that each feature is given a weight. For example, the weight of feature 1 may be 5, the weight of feature 2 may be 3, the weight of feature 3 may be 2.5, the weight of feature 4 may be 1, and the weight of feature 5 may be 0.5.
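The construction of the initial feature set amounts to taking the union of the m feature sets. The sketch below illustrates this with the three feature sets from the example; keeping features in first-seen order is an assumption of the sketch, and the weights are simply the example values quoted above, because the preset sorting algorithm itself is not specified.

```python
def merge_feature_sets(feature_sets):
    """Union of several feature sets, keeping each feature once in first-seen order."""
    return list(dict.fromkeys(f for feature_set in feature_sets for f in feature_set))


# The three feature sets from the example (features identified by number).
feature_sets = [(1, 2, 3), (1, 3, 4), (1, 2, 5)]
initial_feature_set = merge_feature_sets(feature_sets)
print(initial_feature_set)  # [1, 2, 3, 4, 5]

# Example weights taken from the text; a real preset sorting algorithm would compute them.
initial_weights = dict(zip(initial_feature_set, [5, 3, 2.5, 1, 0.5]))
```

The number of features q is then `len(initial_feature_set)`, which is 5 in this example.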
Then, each of the m sample sets may be substituted into a reference feature selection algorithm to obtain m feature sets, and a reference feature set is determined from these m feature sets. The reference feature set may include all features in the m feature sets. For example, if the m feature sets are (feature 1, feature 2, feature 3), (feature 1, feature 3, feature 6), and (feature 1, feature 2, feature 5), the reference feature set may be (feature 1, feature 2, feature 3, feature 5, feature 6). It should be noted that, after the reference feature set is determined, its features may be sorted according to the preset sorting algorithm so that each feature is given a weight. For example, the weight of feature 1 may be 5, the weight of feature 2 may be 2.5, the weight of feature 3 may be 0.9, the weight of feature 5 may be 1, and the weight of feature 6 may be 0.6. The reference feature selection algorithm may be a manual feature selection algorithm; that is, staff analyze and judge each sample based on experience to determine the reference feature set, and may likewise sort all features in the reference feature set based on experience and give each feature a weight.
Finally, the weight change value of each feature in the initial feature set may be determined according to the reference feature set, and the weight change value of each feature is taken as the weight change value corresponding to that feature's set of feature parameters. Specifically, the initial feature set is substituted into a preset machine learning algorithm to determine a first processing model, and the reference feature set is substituted into the preset machine learning algorithm to determine a second processing model. The first processing model is evaluated according to the preset evaluation algorithm to determine a first evaluation value, and the second processing model is evaluated according to the preset evaluation algorithm to determine a second evaluation value. It is then judged whether the second evaluation value is greater than the first evaluation value, that is, whether processing the target sample with the reference feature selection algorithm works better than processing it with the target feature selection algorithm corresponding to the target parameter group. If the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature of the initial feature set, the difference between the weight of the first feature in the reference feature set and its weight in the initial feature set is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is greater than the first evaluation value and the reference feature set does not include the first feature of the initial feature set, a preset weight change value is used as the weight change value corresponding to the set of feature parameters of the first feature. If the second evaluation value is not greater than the first evaluation value, the weight change value corresponding to the set of feature parameters of the first feature is determined to be zero.
If the second evaluation value is less than or equal to the first evaluation value, the weight change values corresponding to features 1, 2, 3, 4, and 5 are all 0. If the second evaluation value is greater than the first evaluation value, the weight change values are determined feature by feature. For feature 1 in the initial feature set, the reference feature set contains feature 1, so the difference between the weight of feature 1 in the reference feature set (5) and its weight in the initial feature set (5), namely 0, is used as the weight change value corresponding to the set of feature parameters of feature 1 (1st metadata, 2nd metadata, …, C-th metadata). For feature 2, the reference feature set contains feature 2, so the difference between the weight of feature 2 in the reference feature set (2.5) and its weight in the initial feature set (3), namely -0.5, is used as the weight change value corresponding to the set of feature parameters of feature 2 ((C+1)-th metadata, (C+2)-th metadata, …, D-th metadata). For feature 3, the reference feature set contains feature 3, so the difference between the weight of feature 3 in the reference feature set (0.9) and its weight in the initial feature set (2.5), namely -1.6, is used as the weight change value corresponding to the set of feature parameters of feature 3 ((D+1)-th metadata, (D+2)-th metadata, …, E-th metadata). For feature 4, the reference feature set does not contain feature 4, so the preset weight change value (for example, -0.2) is used as the weight change value corresponding to the set of feature parameters of feature 4 ((E+1)-th metadata, (E+2)-th metadata, …, F-th metadata). For feature 5, the reference feature set contains feature 5, so the difference between the weight of feature 5 in the reference feature set (1) and its weight in the initial feature set (0.5), namely 0.5, is used as the weight change value corresponding to the set of feature parameters of feature 5 ((F+1)-th metadata, (F+2)-th metadata, …, G-th metadata). Optionally, if the reference feature set lacks several features of the initial feature set, a simple descent algorithm may be used to divide a total weight of "1" among those features; that is, each of those features is assigned a weight change value such that the weight change values of those features sum to 1.
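The three rules above can be sketched as a single function. The sketch uses the weights quoted in the worked example (reference weights of 5, 2.5, 0.9, and 1 for features 1, 2, 3, and 5, and a preset weight change value of -0.2). The two evaluation values are invented for illustration, and rounding to six decimal places is an assumption added only to keep floating-point noise out of the result.

```python
def weight_changes(initial, reference, first_eval, second_eval, preset_change):
    """Weight change per feature of the initial feature set.

    initial / reference map feature -> weight; first_eval and second_eval are the
    evaluation values of the models built from the initial and reference feature sets.
    """
    if second_eval <= first_eval:
        # The reference selection is not better: no feature weight is changed.
        return {feature: 0.0 for feature in initial}
    changes = {}
    for feature, weight in initial.items():
        if feature in reference:
            changes[feature] = round(reference[feature] - weight, 6)
        else:
            changes[feature] = preset_change
    return changes


initial = {1: 5.0, 2: 3.0, 3: 2.5, 4: 1.0, 5: 0.5}
reference = {1: 5.0, 2: 2.5, 3: 0.9, 5: 1.0, 6: 0.6}

# Hypothetical evaluation values; only their order matters here.
changes = weight_changes(initial, reference, first_eval=0.80, second_eval=0.85,
                         preset_change=-0.2)
print(changes)  # {1: 0.0, 2: -0.5, 3: -1.6, 4: -0.2, 5: 0.5}
```

Swapping the two evaluation values makes the function return 0 for every feature, which corresponds to the first case described above.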
After the weight change value corresponding to the set of feature parameters of each feature in the initial feature set is determined, a list may be used to record these values. For example, Table 3 records the weight change value corresponding to the set of feature parameters of each feature in the initial feature set. It should be noted that the embodiments of the present invention use an initial feature set of 5 features only as an example; in practice, the number of features in the initial feature set may differ from 5.
Table 3
Feature parameters of a feature in the initial feature set    Weight change value
1st metadata, 2nd metadata, …, C-th metadata    0
(C+1)-th metadata, (C+2)-th metadata, …, D-th metadata    -0.5
(D+1)-th metadata, (D+2)-th metadata, …, E-th metadata    -1.6
(E+1)-th metadata, (E+2)-th metadata, …, F-th metadata    -0.2
(F+1)-th metadata, (F+2)-th metadata, …, G-th metadata    0.5
After the weight change value corresponding to the set of feature parameters of each feature in the initial feature set is determined, the preset weight change model may be determined according to the weight change value corresponding to each set of feature parameters; that is, the preset weight change model may be derived from Table 3.
Step 205: Acquire data to be processed, where the set of data parameters of the data to be processed is the target parameter group.
Before step 205, the preset algorithm model and the preset weight change model have already been determined. In step 205, any data may be processed whose data parameters are a set of data parameters that the preset algorithm model can determine. Taking the embodiment shown in FIG. 1 as an example, on the one hand, the data to be processed acquired in step 205 may include user data 1, generated on the network side while user A in FIG. 1 communicates over the network provided by the first communication carrier, and user data 2, generated on the network side while user B communicates over the network provided by the first communication carrier. On the other hand, the data to be processed acquired in step 205 may include user data 3, generated on the network side while user C communicates over the network provided by the second communication carrier, and user data 4, generated on the network side while user D communicates over the network provided by the second communication carrier.
It should be noted that the set of data parameters of the data to be processed may be the target parameter group. The embodiments of the present invention describe in detail only the processing of data whose data parameters are the target parameter group. For data whose data parameters are another set that the preset algorithm model can determine, refer to the processing described for the target parameter group; it is not repeated here.
Step 206: Substitute the target parameter group into the preset algorithm model to determine the target algorithm corresponding to the target parameter group.
For example, the target algorithm determined in step 206 may include at least one of a target feature selection algorithm and a target machine learning algorithm. That is, the target algorithm corresponding to the target parameter group may be the target feature selection algorithm corresponding to the target parameter group, or the target machine learning algorithm corresponding to the target parameter group, or both. In the embodiments of the present invention, the case in which the target algorithm includes both the target feature selection algorithm and the target machine learning algorithm is used as an example.
On the one hand, when step 206 is performed, if it is specified that the first machine learning algorithm must be used to process the data to be processed, the first machine learning algorithm and the target parameter group may be substituted into the preset algorithm model to obtain the target feature selection algorithm corresponding to the first machine learning algorithm and the target parameter group. The obtained target feature selection algorithm and the first machine learning algorithm are then used as the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group. On the other hand, when step 206 is performed, if no particular machine learning algorithm is required for processing the data to be processed, the target parameter group may be substituted directly into the preset algorithm model to obtain the target feature selection algorithm and the target machine learning algorithm corresponding to the target parameter group.
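The two lookup paths of step 206 can be sketched as follows. This is a hypothetical illustration only: the dictionary layout of the model (an "unconstrained" table and a "by_ml" table), the key names, and the algorithm labels are all assumptions, because the text does not fix how the preset algorithm model stores the case with a required machine learning algorithm.

```python
def lookup_target_algorithms(model, param_group, required_ml=None):
    """Return (feature selection algorithm, machine learning algorithm) for a parameter group."""
    if required_ml is None:
        # No machine learning algorithm is prescribed: the model supplies both algorithms.
        return model["unconstrained"][param_group]
    # A first machine learning algorithm is prescribed: the model supplies only
    # the feature selection algorithm that goes with it.
    fs_algorithm = model["by_ml"][(param_group, required_ml)]
    return fs_algorithm, required_ml


# Hypothetical preset algorithm model for one target parameter group.
preset_model = {
    "unconstrained": {"target group": ("entropy", "RF")},
    "by_ml": {
        ("target group", "RF"): "entropy",
        ("target group", "LR"): "correlation",
    },
}

free_choice = lookup_target_algorithms(preset_model, "target group")
pinned_choice = lookup_target_algorithms(preset_model, "target group", required_ml="LR")
```

In this sketch `free_choice` is `("entropy", "RF")` and `pinned_choice` is `("correlation", "LR")`; in the second case the machine learning algorithm is always the prescribed one.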
It should be noted that, if only the target feature selection algorithm corresponding to the target parameter group is determined in step 206, a machine learning algorithm may be determined according to the related art and used as the target machine learning algorithm corresponding to the target parameter group. If only the target machine learning algorithm corresponding to the target parameter group is determined in step 206, a feature selection algorithm may be determined according to the related art and used as the target feature selection algorithm corresponding to the target parameter group.
Step 207: Determine the attribute of the data to be processed according to the target algorithm corresponding to the target parameter group and the preset weight change model.
For example, because the set of data parameters of the data to be processed is the target parameter group, the data to be processed may be substituted into the target feature selection algorithm corresponding to the target parameter group to determine a target feature set. Specifically, the initial feature set in step 204 may include the target feature set; that is, every feature in the target feature set belongs to the initial feature set. After the target feature set is determined, its features may be sorted with the preset sorting algorithm to determine the weight of each feature. For example, if the features in the target feature set are feature 1, feature 2, feature 3, feature 4, and feature 5, with weights of 5, 3, 2.5, 1, and 0.5 respectively, then the features sorted by weight are: feature 1, feature 2, feature 3, feature 4, feature 5.
After the target feature set is determined, the weight change value corresponding to the set of feature parameters of each feature in the target feature set may be determined according to the preset weight change model determined in step 204. Specifically, the five sets of feature parameters of feature 1, feature 2, feature 3, feature 4, and feature 5 may be substituted into the preset weight change model to determine the weight change value corresponding to each set of feature parameters. The weight corresponding to each feature in the target feature set may then be updated according to these weight change values: the sum of a feature's weight and the weight change value corresponding to that feature's set of feature parameters is used as the updated weight of the feature. For example, if the weight of feature 1 in the target feature set is 5 and the weight change value corresponding to its set of feature parameters is 0, the updated weight of feature 1 is 5; if the weight of feature 2 is 3 and its weight change value is -0.5, the updated weight of feature 2 is 2.5; if the weight of feature 3 is 2.5 and its weight change value is -1.6, the updated weight of feature 3 is 0.9; if the weight of feature 4 is 1 and its weight change value is -0.2, the updated weight of feature 4 is 0.8; and if the weight of feature 5 is 0.5 and its weight change value is 0.5, the updated weight of feature 5 is 1. Therefore, the features in the updated target feature set, sorted by weight, are: feature 1, feature 2, feature 5, feature 3, feature 4.
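The weight update described above can be illustrated with a minimal Python sketch. The weights and weight change values are the ones given in the example; the function names `update_weights` and `rank_features` are hypothetical and are not part of the embodiment, and the weight change values are simply assumed to have been output by the preset weight change model.

```python
# Weights assigned by the preset sorting algorithm (values from the example).
initial_weights = {
    "feature 1": 5.0,
    "feature 2": 3.0,
    "feature 3": 2.5,
    "feature 4": 1.0,
    "feature 5": 0.5,
}

# Weight change values, assumed here to be the output of the preset
# weight change model for each feature's set of feature parameters.
weight_changes = {
    "feature 1": 0.0,
    "feature 2": -0.5,
    "feature 3": -1.6,
    "feature 4": -0.2,
    "feature 5": 0.5,
}


def update_weights(weights, changes):
    """Updated weight = original weight + weight change value.

    Rounding only hides binary floating-point noise (2.5 - 1.6 -> 0.9).
    """
    return {name: round(w + changes.get(name, 0.0), 10)
            for name, w in weights.items()}


def rank_features(weights):
    """Return feature names sorted by weight, largest first."""
    return sorted(weights, key=weights.get, reverse=True)


updated = update_weights(initial_weights, weight_changes)
ranking = rank_features(updated)
print(updated)
print(ranking)
```

With the example values, feature 5 moves ahead of feature 3 and feature 4 after the update, which matches the ordering stated in the text.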
After the target feature set with updated weights is obtained, the attribute of the data to be processed may be determined according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group. Specifically, the updated target feature set may be substituted into the target machine learning algorithm corresponding to the target parameter group to obtain a processing model, and the data to be processed may be substituted into the processing model to determine the attribute of the data to be processed.
In the related art, when processing the user data generated on the network side, the first communication carrier may also substitute the two pieces of user data into a feature selection algorithm to obtain an initial feature set, construct a plurality of feature selection weak classifiers from the initial feature set, and iterate over these weak classifiers repeatedly based on a Boosting algorithm (an algorithm for improving the accuracy of weak classification algorithms). In each iteration, a machine learning algorithm may be used to verify the accuracy of the attributes of the two pieces of user data obtained by the current feature selection weak classifier. If those attributes are inaccurate, the current feature selection weak classifier needs to be replaced with another feature selection weak classifier, and the parameters of that other classifier need to be adjusted. If those attributes are accurate, the current feature selection weak classifier is used as the feature selection strong classifier, and the feature selection strong classifier and the machine learning algorithm are used to determine the attributes of the two pieces of user data. However, repeatedly iterating over the plurality of feature selection weak classifiers based on the Boosting algorithm takes a long time, so data processing is slow and inefficient. In the embodiment of the present invention, because the preset algorithm model is determined in advance, the target feature selection algorithm and the target machine learning algorithm corresponding to the data to be processed can be determined directly from the preset algorithm model during data processing. The whole process takes less time, which improves the speed and efficiency of data processing.
In the related art, the data to be processed may be substituted into an automatic feature selection algorithm (such as a feature selection algorithm based on information gain or on correlation) to determine a target feature set. However, an automatic feature selection algorithm is essentially based on mathematical statistics: from the values in the data to be processed, it determines which features of the data best discriminate a given label, but such a feature is not necessarily the most discriminative one in a practical sense. An identification (ID) type feature is one example. In this case, the processing model obtained by substituting the selected feature set into a machine learning algorithm processes the data to be processed poorly. The features that a worker selects from the features of the data to be processed based on experience may differ from the features determined by the automatic feature selection algorithm, yet the processing model obtained by substituting the worker-selected features into a machine learning algorithm processes the data to be processed well. In the embodiment of the present invention, a preset weight change model is established in advance, so that after a feature set is obtained with an automatic feature selection algorithm, the weights of the features in the feature set can be updated with reference to the workers' experience, and the processing model obtained by substituting the updated feature set into the machine learning algorithm processes the data to be processed well.
In summary, in the data processing method provided by the embodiment of the present invention, after the data to be processed is acquired, the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) can be determined directly from the preset algorithm model. The target algorithm determined from the preset algorithm model is the algorithm corresponding to the optimal evaluation value obtained when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, the attribute of the data to be processed that is determined with this target algorithm is the most accurate, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has relatively high accuracy.
It should be noted that the order of the steps of the data processing method provided by the embodiment of the present invention may be adjusted appropriately, and steps may be added or removed as required. Any variation of the method that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application, and is therefore not described again.
As shown in FIG. 3-1, an embodiment of the present invention provides a data processing apparatus 30, which may include:
a first acquiring module 301, configured to acquire data to be processed, where a set of data parameters of the data to be processed is a target parameter group;
a first determining module 302, configured to substitute the target parameter group into a preset algorithm model and determine the target algorithm corresponding to the target parameter group, where the target algorithm is the algorithm corresponding to the optimal evaluation value determined by evaluating, according to a preset evaluation algorithm, at least one algorithm corresponding to the target parameter group; and
a second determining module 303, configured to determine the attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.
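The cooperation of the three modules can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the preset algorithm model is represented as a plain lookup table from parameter group to algorithm, the data parameters are taken to be the row count and column count of the data, and the class and function names (`DataProcessingApparatus`, `parameter_group_of`) are hypothetical. The embodiment does not prescribe any of these choices.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data parameters: (number of records, number of fields).
ParameterGroup = Tuple[int, int]
Algorithm = Callable[[List[List[float]]], str]


def parameter_group_of(data: List[List[float]]) -> ParameterGroup:
    """Assumed way of deriving a set of data parameters from the data."""
    return (len(data), len(data[0]) if data else 0)


class DataProcessingApparatus:
    def __init__(self, preset_algorithm_model: Dict[ParameterGroup, Algorithm]):
        # Assumption: the preset algorithm model is a lookup table that was
        # built in advance from evaluated sample sets.
        self.preset_algorithm_model = preset_algorithm_model

    def acquire(self, data):
        """Role of the first acquiring module 301."""
        return data, parameter_group_of(data)

    def determine_target_algorithm(self, group: ParameterGroup) -> Algorithm:
        """Role of the first determining module 302."""
        return self.preset_algorithm_model[group]

    def determine_attribute(self, data, algorithm: Algorithm) -> str:
        """Role of the second determining module 303."""
        return algorithm(data)

    def process(self, data) -> str:
        data, group = self.acquire(data)
        algorithm = self.determine_target_algorithm(group)
        return self.determine_attribute(data, algorithm)


# Toy target algorithm for data with 2 records and 2 fields: label the user
# by the total of the first field (a made-up "fee" column).
model = {(2, 2): lambda rows: "high" if sum(r[0] for r in rows) > 100 else "low"}
apparatus = DataProcessingApparatus(model)
result = apparatus.process([[80, 1], [40, 2]])
print(result)
```

The point of the sketch is that no iterative search happens at processing time: the target algorithm is looked up from the model prepared in advance and applied once.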
In summary, in the data processing apparatus provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can determine the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) directly from the preset algorithm model. The target algorithm determined from the preset algorithm model is the algorithm corresponding to the optimal evaluation value obtained when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, the attribute of the data to be processed that the second determining module determines with this target algorithm is the most accurate, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has relatively high accuracy.
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm. As shown in FIG. 3-2, an embodiment of the present invention provides another data processing apparatus 30. On the basis of FIG. 3-1, the data processing apparatus 30 further includes:
a second acquiring module 304, configured to acquire n sample sets, where the n sets of data parameters of the n sample sets include the target parameter group, and n is an integer greater than or equal to 1;
a third determining module 305, configured to determine the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters; and
a fourth determining module 306, configured to determine the preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters.
A first sample set is any one of the n sample sets, and the third determining module 305 may be further configured to:
substitute the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to the set of data parameters of the first sample set;
substitute the at least one feature set corresponding to the set of data parameters of the first sample set into at least one machine learning algorithm respectively, to determine at least one processing model corresponding to the set of data parameters of the first sample set; and
determine, according to the preset evaluation algorithm, the evaluation value corresponding to each of the at least one processing model, and use the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
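The procedure performed by the third determining module 305 amounts to evaluating every combination of feature selection algorithm and machine learning algorithm and keeping the best one. The following Python sketch illustrates this under stated assumptions: the two selectors, the two learners, and the use of accuracy on the sample set itself as the preset evaluation algorithm are toy stand-ins chosen for illustration, and all function names are hypothetical. In practice the evaluation would more plausibly use held-out data.

```python
from itertools import product


def select_all(samples, labels):
    """Toy feature selector: keep every feature index."""
    return list(range(len(samples[0])))


def select_widest_range(samples, labels):
    """Toy feature selector: keep the one feature with the largest value range."""
    def value_range(j):
        column = [s[j] for s in samples]
        return max(column) - min(column)
    return [max(range(len(samples[0])), key=value_range)]


def train_threshold(samples, labels, features):
    """Toy learner: threshold the mean of the selected features halfway
    between the two class means (both classes must be present)."""
    def score(s):
        return sum(s[j] for j in features) / len(features)
    pos = [score(s) for s, y in zip(samples, labels) if y == 1]
    neg = [score(s) for s, y in zip(samples, labels) if y == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    if pos_mean >= neg_mean:
        return lambda s: 1 if score(s) > cut else 0
    return lambda s: 1 if score(s) < cut else 0


def train_majority(samples, labels, features):
    """Toy learner: always predict the majority label."""
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    return lambda s: majority


def accuracy(model, samples, labels):
    """Assumed preset evaluation algorithm: fraction of correct predictions."""
    return sum(model(s) == y for s, y in zip(samples, labels)) / len(labels)


def pick_target_algorithms(samples, labels, selectors, learners, evaluate):
    """Evaluate every (selector, learner) pair and return the best
    (evaluation value, selector name, learner name); ties keep the first."""
    best = None
    for (sel_name, select), (learner_name, train) in product(
            selectors.items(), learners.items()):
        features = select(samples, labels)        # feature set
        model = train(samples, labels, features)  # processing model
        value = evaluate(model, samples, labels)  # evaluation value
        if best is None or value > best[0]:
            best = (value, sel_name, learner_name)
    return best


samples = [[1, 10], [2, 20], [3, 90], [4, 100]]
labels = [0, 0, 1, 1]
best = pick_target_algorithms(
    samples, labels,
    {"all": select_all, "widest_range": select_widest_range},
    {"threshold": train_threshold, "majority": train_majority},
    accuracy,
)
print(best)
```

Repeating this selection for each of the n sample sets yields one (parameter group, target algorithms) pair per sample set, which is the material from which the fourth determining module 306 builds the preset algorithm model.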
Optionally, the target algorithm includes a target feature selection algorithm and a target machine learning algorithm. As shown in FIG. 3-3, the second determining module 303 may include:
a first determining unit 3031, configured to substitute the data to be processed into the target feature selection algorithm corresponding to the target parameter group and determine a target feature set, where the target feature set includes p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the feature set has a weight;
a second determining unit 3032, configured to substitute the p sets of feature parameters of the p features into the preset weight change model respectively, and determine the weight change value corresponding to each of the p sets of feature parameters;
an updating unit 3033, configured to update the weight corresponding to each feature in the target feature set according to the determined weight change values; and
a third determining unit 3034, configured to determine the attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
As shown in FIG. 3-4, an embodiment of the present invention provides yet another data processing apparatus 30. On the basis of FIG. 3-1, the data processing apparatus 30 may further include:
a third acquiring module 307, configured to acquire m sample sets, where the m sets of data parameters of the m sample sets include the target parameter group, and m is an integer greater than or equal to 1;
a fifth determining module 308, configured to determine the target feature selection algorithm corresponding to each of the m sets of data parameters;
a sixth determining module 309, configured to determine an initial feature set, where the initial feature set includes the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;
a seventh determining module 310, configured to determine a reference feature set, where the reference feature set includes the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm;
an eighth determining module 311, configured to determine, according to the reference feature set, the weight change value corresponding to the set of feature parameters of each feature in the initial feature set; and
a ninth determining module 312, configured to determine the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
Optionally, the eighth determining module 311 may be further configured to:
substitute the initial feature set into a preset machine learning algorithm to determine a first processing model;
substitute the reference feature set into the preset machine learning algorithm to determine a second processing model;
evaluate the first processing model according to the preset evaluation algorithm to determine a first evaluation value;
evaluate the second processing model according to the preset evaluation algorithm to determine a second evaluation value;
determine whether the second evaluation value is greater than the first evaluation value; and
if the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature in the initial feature set, use the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
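The rule applied by the eighth determining module 311 can be sketched in Python as follows. The function name `weight_change_values` and the example numbers are hypothetical. The text above only specifies the case in which the second evaluation value is greater and the feature appears in both feature sets; returning no weight change values in every other case is an assumption made for this sketch.

```python
def weight_change_values(initial_weights, reference_weights,
                         first_evaluation, second_evaluation):
    """Weight change value of a feature = its weight in the reference
    feature set minus its weight in the initial feature set.

    Only computed when the second processing model (built from the
    reference feature set) evaluates better than the first one.
    """
    if second_evaluation <= first_evaluation:
        # Assumption: no weight change values are derived in this case.
        return {}
    return {
        name: reference_weights[name] - weight
        for name, weight in initial_weights.items()
        if name in reference_weights  # feature present in both sets
    }


# Hypothetical weights and evaluation values, for illustration only.
initial = {"feature 1": 5.0, "feature 2": 3.0, "feature 3": 2.5}
reference = {"feature 2": 2.5, "feature 3": 1.0, "feature 9": 4.0}

changes = weight_change_values(initial, reference,
                               first_evaluation=0.80,
                               second_evaluation=0.85)
print(changes)
```

Here feature 1 receives no weight change value because it does not appear in the reference feature set, and feature 9 is ignored because it is not in the initial feature set.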
Optionally, the target algorithm includes a target feature selection algorithm or a target machine learning algorithm.
In summary, in the data processing apparatus provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can determine the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) directly from the preset algorithm model. The target algorithm determined from the preset algorithm model is the algorithm corresponding to the optimal evaluation value obtained when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, the attribute of the data to be processed that the second determining module determines with this target algorithm is the most accurate, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has relatively high accuracy.
As shown in FIG. 4, an embodiment of the present invention provides yet another network adjustment apparatus, which may include at least one processor 401 (for example, a CPU), at least one network interface 402 or other communication interface, a memory 403, and at least one communication bus 404 used to implement connection and communication among these components. The processor 401 is configured to execute executable modules, such as computer programs, stored in the memory 403. The memory 403 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between the network adjustment apparatus and at least one other network element is implemented through the at least one network interface 402 (which may be wired or wireless), over the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
In some implementations, the memory 403 stores a program 4031 that can be executed by the processor 401, and the data processing method shown in FIG. 2 can be implemented by the processor 401 executing the program 4031.
In summary, in the data processing apparatus provided by the embodiment of the present invention, after acquiring the data to be processed, the processor can determine the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) directly from the preset algorithm model. The target algorithm determined from the preset algorithm model is the algorithm corresponding to the optimal evaluation value obtained when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, the attribute of the data to be processed that is determined with this target algorithm is the most accurate, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has relatively high accuracy.
An embodiment of the present invention provides a data processing system, which may include the data processing apparatus shown in FIG. 3-1, FIG. 3-2, FIG. 3-4, or FIG. 4.
In summary, in the data processing apparatus of the data processing system provided by the embodiment of the present invention, after the first acquiring module acquires the data to be processed, the first determining module can determine the target algorithm corresponding to the target parameter group (the set of data parameters of the data to be processed) directly from the preset algorithm model. The target algorithm determined from the preset algorithm model is the algorithm corresponding to the optimal evaluation value obtained when at least one algorithm corresponding to the target parameter group is evaluated according to the preset evaluation algorithm. In other words, the attribute of the data to be processed that the second determining module determines with this target algorithm is the most accurate, so the attribute of the data to be processed determined according to the target algorithm corresponding to the target parameter group has relatively high accuracy.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the data processing apparatus and the data processing system described above, reference may be made to the corresponding processes in the foregoing data processing method embodiment, and details are not described herein again.
All the foregoing optional technical solutions may be combined in any manner to form optional embodiments of the present application, and details are not described herein one by one.
The foregoing descriptions are merely optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (15)

  1. A data processing method, wherein the method comprises:
    acquiring data to be processed, wherein a set of data parameters of the data to be processed is a target parameter group;
    substituting the target parameter group into a preset algorithm model, and determining a target algorithm corresponding to the target parameter group, wherein the target algorithm is the algorithm corresponding to the optimal evaluation value determined by evaluating, according to a preset evaluation algorithm, at least one algorithm corresponding to the target parameter group; and
    determining an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group.
  2. The method according to claim 1, wherein the target algorithm comprises a target feature selection algorithm and a target machine learning algorithm, and before the substituting the target parameter group into the preset algorithm model, the method further comprises:
    acquiring n sample sets, wherein n sets of data parameters of the n sample sets comprise the target parameter group, and n is an integer greater than or equal to 1;
    determining a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters; and
    determining the preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters.
  3. The method according to claim 2, wherein a first sample set is any one of the n sample sets, and the determining a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters comprises:
    substituting the first sample set into at least one feature selection algorithm, and determining at least one feature set corresponding to the set of data parameters of the first sample set;
    substituting the at least one feature set corresponding to the set of data parameters of the first sample set into at least one machine learning algorithm respectively, and determining at least one processing model corresponding to the set of data parameters of the first sample set; and
    determining, according to the preset evaluation algorithm, an evaluation value corresponding to each of the at least one processing model, and using the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
  4. The method according to claim 1 or 2, wherein the target algorithm comprises a target feature selection algorithm and a target machine learning algorithm, and the determining an attribute of the data to be processed according to the target algorithm corresponding to the target parameter group comprises:
    substituting the data to be processed into the target feature selection algorithm corresponding to the target parameter group, and determining a target feature set, wherein the target feature set comprises p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in the feature set has a weight;
    substituting the p sets of feature parameters of the p features into a preset weight change model respectively, and determining a weight change value corresponding to each of the p sets of feature parameters;
    updating the weight corresponding to each feature in the target feature set according to the determined weight change values; and
    determining the attribute of the data to be processed according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
  5. The method according to claim 4, wherein before the determining the attribute of the to-be-processed data according to the target algorithm corresponding to the target parameter group, the method further comprises:
    obtaining m sample sets, wherein the m sets of data parameters of the m sample sets include the target parameter group, and m is an integer greater than or equal to 1;
    determining a target feature selection algorithm corresponding to each of the m sets of data parameters;
    determining an initial feature set, wherein the initial feature set comprises the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;
    determining a reference feature set, wherein the reference feature set comprises the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm;
    determining, according to the reference feature set, a weight change value corresponding to the set of feature parameters of each feature in the initial feature set; and
    determining the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
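The last step of claim 5 turns pairs of (feature parameters, weight change value) into a model that can later be queried with the feature parameters of a new feature. The claim does not fix the form of that model, so the following is only one possible sketch: it assumes feature parameters are numeric tuples of equal length and uses a nearest-neighbour lookup. The function name `fit_weight_change_model` and the sample pairs are hypothetical.

```python
import math


def fit_weight_change_model(training_pairs):
    """Build a weight change model from (feature_params, weight_change) pairs.

    Assumption: a query returns the change value of the closest known
    parameter tuple (Euclidean distance); the claim does not prescribe this.
    """
    if not training_pairs:
        raise ValueError("at least one (feature_params, weight_change) pair is required")

    def model(params):
        nearest = min(training_pairs, key=lambda pair: math.dist(pair[0], params))
        return nearest[1]

    return model


# Hypothetical pairs gathered from the initial and reference feature sets.
pairs = [((0.9, 0.1), 0.2), ((0.2, 0.8), -0.1)]
weight_change_model = fit_weight_change_model(pairs)
predicted = weight_change_model((0.8, 0.2))
```

Here the query `(0.8, 0.2)` is closest to `(0.9, 0.1)`, so `predicted` is `0.2`. A regression model fitted on the same pairs would be an equally plausible reading of the claim.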
  6. The method according to claim 5, wherein the determining, according to the reference feature set, a weight change value corresponding to the set of feature parameters of each feature in the initial feature set comprises:
    substituting the initial feature set into a preset machine learning algorithm to determine a first processing model;
    substituting the reference feature set into the preset machine learning algorithm to determine a second processing model;
    evaluating the first processing model according to the preset evaluation algorithm to determine a first evaluation value;
    evaluating the second processing model according to the preset evaluation algorithm to determine a second evaluation value;
    determining whether the second evaluation value is greater than the first evaluation value; and
    if the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature of the initial feature set, using the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
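The comparison and subtraction in claim 6 can be written in a few lines. This sketch assumes each feature set is represented as a mapping from feature name to weight and that the two evaluation values have already been computed. The claim is silent on the case where the second evaluation value is not greater than the first, so returning no change values in that case is an assumption, as is the function name `weight_change_values`.

```python
def weight_change_values(initial_weights, reference_weights, first_eval, second_eval):
    """Return {feature name: weight change value} following the condition in claim 6.

    initial_weights / reference_weights: dicts mapping feature name to weight.
    Assumption: when the reference model is not better, no change values are produced.
    """
    if second_eval <= first_eval:
        return {}
    return {
        name: reference_weights[name] - weight
        for name, weight in initial_weights.items()
        if name in reference_weights  # only features present in both sets
    }


# Hypothetical weights and evaluation values.
initial = {"fee": 0.5, "bill": 0.25}
reference = {"fee": 0.75, "age": 0.5}
changes = weight_change_values(initial, reference, first_eval=0.8, second_eval=0.9)
```

In this example only `fee` appears in both sets and the reference model scores higher, so `changes` is `{"fee": 0.25}`.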
  7. The method according to claim 1, wherein the target algorithm comprises a target feature selection algorithm or a target machine learning algorithm.
  8. A data processing apparatus, comprising:
    a first obtaining module, configured to obtain to-be-processed data, wherein a set of data parameters of the to-be-processed data is a target parameter group;
    a first determining module, configured to substitute the target parameter group into a preset algorithm model to determine a target algorithm corresponding to the target parameter group, wherein the target algorithm is the algorithm corresponding to the optimal evaluation value determined by evaluating, according to a preset evaluation algorithm, at least one algorithm corresponding to the target parameter group; and
    a second determining module, configured to determine an attribute of the to-be-processed data according to the target algorithm corresponding to the target parameter group.
  9. The data processing apparatus according to claim 8, wherein the target algorithm comprises a target feature selection algorithm and a target machine learning algorithm, and the data processing apparatus further comprises:
    a second obtaining module, configured to obtain n sample sets, wherein the n sets of data parameters of the n sample sets include the target parameter group, and n is an integer greater than or equal to 1;
    a third determining module, configured to determine a target feature selection algorithm and a target machine learning algorithm corresponding to each of the n sets of data parameters; and
    a fourth determining module, configured to determine the preset algorithm model according to the target feature selection algorithm and the target machine learning algorithm corresponding to each of the n sets of data parameters.
  10. The data processing apparatus according to claim 9, wherein a first sample set is any one of the n sample sets, and the third determining module is further configured to:
    substitute the first sample set into at least one feature selection algorithm to determine at least one feature set corresponding to the set of data parameters of the first sample set;
    substitute the at least one feature set corresponding to the set of data parameters of the first sample set into at least one machine learning algorithm to determine at least one processing model corresponding to the set of data parameters of the first sample set; and
    determine, according to a preset evaluation algorithm, an evaluation value corresponding to each of the at least one processing model, and use the feature selection algorithm and the machine learning algorithm corresponding to the processing model with the optimal evaluation value as the target feature selection algorithm and the target machine learning algorithm corresponding to the set of data parameters of the first sample set.
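The selection procedure in claim 10 amounts to trying every pairing of a feature selection algorithm with a machine learning algorithm on a sample set and keeping the pairing whose processing model evaluates best. The sketch below assumes that a higher evaluation value is better and that at least one algorithm of each kind is supplied. The function `choose_target_algorithms`, the toy selectors and learners, and accuracy as the evaluation algorithm are all hypothetical choices for illustration.

```python
from itertools import product


def choose_target_algorithms(sample_set, feature_selectors, learners, evaluate):
    """Return the (feature selector name, learner name) pair with the best evaluation value.

    Assumption: "optimal" means the highest value; ties keep the first pairing tried.
    """
    best = None
    for (fs_name, select), (ml_name, train) in product(
        feature_selectors.items(), learners.items()
    ):
        feature_set = select(sample_set)          # feature set for this selector
        model = train(sample_set, feature_set)    # processing model for this pairing
        score = evaluate(model, sample_set)       # evaluation value of the model
        if best is None or score > best[0]:
            best = (score, fs_name, ml_name)
    return best[1], best[2]


# Toy sample set (hypothetical): (record, label) pairs.
samples = [
    ({"fee": 3, "noise": 1}, 1),
    ({"fee": 1, "noise": 1}, 0),
    ({"fee": 4, "noise": 0}, 1),
    ({"fee": 0, "noise": 0}, 0),
]

feature_selectors = {
    "keep_fee": lambda sample_set: ["fee"],
    "keep_noise": lambda sample_set: ["noise"],
}

learners = {
    # Predict 1 when the sum of the selected features reaches 2.
    "threshold": lambda sample_set, features: (
        lambda record: 1 if sum(record[f] for f in features) >= 2 else 0
    ),
    # Ignore the features and always predict 1.
    "always_one": lambda sample_set, features: (lambda record: 1),
}


def accuracy(model, sample_set):
    correct = sum(1 for record, label in sample_set if model(record) == label)
    return correct / len(sample_set)


target_pair = choose_target_algorithms(samples, feature_selectors, learners, accuracy)
```

On this toy data the pairing of `keep_fee` with `threshold` classifies all four samples correctly, so `target_pair` is `("keep_fee", "threshold")`. Repeating the search for each of the n sample sets gives the per-parameter-group algorithm choices from which the preset algorithm model of claim 9 is determined.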
  11. The data processing apparatus according to claim 8 or 9, wherein the target algorithm comprises a target feature selection algorithm and a target machine learning algorithm, and the second determining module comprises:
    a first determining unit, configured to substitute the to-be-processed data into the target feature selection algorithm corresponding to the target parameter group to determine a target feature set, wherein the target feature set comprises p features, each of the p features has a set of feature parameters, p is an integer greater than or equal to 1, and each feature in a feature set has a weight;
    a second determining unit, configured to substitute the p sets of feature parameters of the p features into a preset weight change model to determine a weight change value corresponding to each of the p sets of feature parameters;
    an updating unit, configured to update, according to the determined weight change values, the weight corresponding to each feature in the target feature set; and
    a third determining unit, configured to determine the attribute of the to-be-processed data according to the updated target feature set and the target machine learning algorithm corresponding to the target parameter group.
  12. The data processing apparatus according to claim 11, wherein the data processing apparatus further comprises:
    a third obtaining module, configured to obtain m sample sets, wherein the m sets of data parameters of the m sample sets include the target parameter group, and m is an integer greater than or equal to 1;
    a fifth determining module, configured to determine a target feature selection algorithm corresponding to each of the m sets of data parameters;
    a sixth determining module, configured to determine an initial feature set, wherein the initial feature set comprises the features in the feature sets obtained by substituting each of the m sample sets into the target feature selection algorithm corresponding to the set of data parameters of that sample set;
    a seventh determining module, configured to determine a reference feature set, wherein the reference feature set comprises the features in the feature sets obtained by substituting each of the m sample sets into a reference feature selection algorithm;
    an eighth determining module, configured to determine, according to the reference feature set, a weight change value corresponding to the set of feature parameters of each feature in the initial feature set; and
    a ninth determining module, configured to determine the preset weight change model according to the weight change value corresponding to the set of feature parameters of each feature.
  13. The data processing apparatus according to claim 12, wherein the eighth determining module is further configured to:
    substitute the initial feature set into a preset machine learning algorithm to determine a first processing model;
    substitute the reference feature set into the preset machine learning algorithm to determine a second processing model;
    evaluate the first processing model according to the preset evaluation algorithm to determine a first evaluation value;
    evaluate the second processing model according to the preset evaluation algorithm to determine a second evaluation value;
    determine whether the second evaluation value is greater than the first evaluation value; and
    if the second evaluation value is greater than the first evaluation value and the reference feature set includes a first feature of the initial feature set, use the difference between the weight of the first feature in the reference feature set and the weight of the first feature in the initial feature set as the weight change value corresponding to the set of feature parameters of the first feature.
  14. The data processing apparatus according to claim 8, wherein the target algorithm comprises a target feature selection algorithm or a target machine learning algorithm.
  15. A data processing system, comprising the data processing apparatus according to any one of claims 8 to 14.
PCT/CN2017/079791 2016-08-31 2017-04-07 Data processing method, device and system WO2018040561A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610797641.3A CN107784363B (en) 2016-08-31 2016-08-31 Data processing method, device and system
CN201610797641.3 2016-08-31

Publications (1)

Publication Number Publication Date
WO2018040561A1 (en) 2018-03-08

Family

ID=61299990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079791 WO2018040561A1 (en) 2016-08-31 2017-04-07 Data processing method, device and system

Country Status (2)

Country Link
CN (1) CN107784363B (en)
WO (1) WO2018040561A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615144B (en) * 2018-12-20 2022-11-01 中华全国供销合作总社郑州棉麻工程技术设计研究所 Method, device, equipment and storage medium for setting target value of moisture regain of cotton
CN112036569B (en) * 2020-07-30 2021-07-23 第四范式(北京)技术有限公司 Knowledge content labeling method and device, computer device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782976A (en) * 2010-01-15 2010-07-21 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
CN103761426A (en) * 2014-01-02 2014-04-30 中国科学院数学与系统科学研究院 Method and system for quickly recognizing feature combinations in high-dimensional data
US20140310208A1 (en) * 2013-04-10 2014-10-16 Machine Perception Technologies Inc. Facilitating Operation of a Machine Learning Environment
CN104200087A (en) * 2014-06-05 2014-12-10 清华大学 Parameter optimization and feature tuning method and system for machine learning
CN105389639A (en) * 2015-12-15 2016-03-09 上海汽车集团股份有限公司 Logistics transportation route planning method, device and system based on machine learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872347B (en) * 2009-04-22 2012-09-26 富士通株式会社 Method and device for judging type of webpage
CN103123649B (en) * 2013-01-29 2016-04-20 广州一找网络科技有限公司 A kind of message searching method based on microblog and system
CN104239351B (en) * 2013-06-20 2017-12-19 阿里巴巴集团控股有限公司 A kind of training method and device of the machine learning model of user behavior
CN103778913A (en) * 2014-01-22 2014-05-07 苏州大学 Pathologic voice recognizing method
CN104462487A (en) * 2014-12-19 2015-03-25 南开大学 Individualized online news comment mood forecast method capable of fusing multiple information sources
CN104573741A (en) * 2014-12-24 2015-04-29 杭州华为数字技术有限公司 Feature selection method and device

Also Published As

Publication number Publication date
CN107784363A (en) 2018-03-09
CN107784363B (en) 2021-02-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17844869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17844869

Country of ref document: EP

Kind code of ref document: A1