CN113344626A - Data feature optimization method and device based on advertisement push - Google Patents
- Publication number
- CN113344626A CN202110620238.4A
- Authority
- CN
- China
- Prior art keywords
- binning
- result
- characteristic
- target
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The application discloses a data feature optimization method and device based on advertisement push. By applying data feature optimization to the binning results, continuous features with weak linear strength can be crossed to generate new features, making the crossed features more accurate. The continuous features are thereby reduced in dimension, while feature crossing is kept to a minimum on the premise of retaining as much useful information as possible, which avoids information loss during feature combination. When applied to the field of advertisement push, the method can reduce the complexity of the related data while preserving its feature discriminability. When a linear regression model is used to process the data features, the performance and effect of the linear regression model can be ensured, which improves the accuracy of advertisement push and reduces the resource waste caused by invalid advertisement push.
Description
Technical Field
The application relates to the technical field of business data processing, in particular to a data feature optimization method and device based on advertisement push.
Background
In advertisement push services, an advertisement push model is usually used to perform the relevant push processing. The model used is generally a linear regression model (LR). However, the linear regression model performs poorly in application because of its inherent drawbacks. To improve its application effect, and thereby improve the accuracy of advertisement push and reduce the resource waste caused by invalid advertisement push, feature optimization needs to be performed on the advertisement service data. The related feature optimization techniques, however, still have some shortcomings.
Disclosure of Invention
In order to solve the technical problems in the background art, the present disclosure provides a data feature optimization method and device based on advertisement push.
The application provides a data feature optimization method based on advertisement push, which is applied to computer equipment and comprises the following steps:
acquiring a data set to be processed; the data set to be processed is advertisement service data;
extracting feature values from the continuous numerical features in the data set to be processed to obtain a plurality of feature values, and binning the plurality of feature values to obtain a final binning result of each continuous numerical feature;
crossing the final binning results pairwise to obtain a plurality of target binning features;
performing one-hot encoding on the plurality of target binning features to obtain target coding features; wherein the target coding features are used for advertisement push processing.
Preferably, the binning the plurality of feature values to obtain a final binning result of each continuous numerical feature includes:
performing equal-frequency binning on the plurality of characteristic values to obtain a first binning result;
performing chi-square binning on the plurality of characteristic values to obtain a second binning result;
performing best-ks binning on the plurality of characteristic values to obtain a third binning result;
and combining the first binning result, the second binning result and the third binning result to obtain a final binning result of each continuous numerical characteristic.
Preferably, the merging the first binning result, the second binning result, and the third binning result to obtain a final binning result of each continuous numerical feature includes:
merging the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result;
and taking the minimum intersection in the merged results as the final binning result of each continuous numerical characteristic.
Preferably, pairwise crossing the final binning result to obtain a plurality of target binning characteristics includes:
obtaining a binning feature sequence consisting of a plurality of target binning features according to the binning features in the final binning result, wherein each target binning feature comprises a plurality of values;
combining every two target binning features in the binning feature sequence to obtain at least one feature combination;
and, for each feature combination, combining the plurality of values of one target binning feature in the feature combination pairwise with the plurality of values of the other target binning feature to obtain a plurality of target combination data corresponding to the feature combination.
Preferably, after the performing the one-hot coding on the plurality of target binning features to obtain the target coding features, the method further includes:
inputting the target coding features into a model.
The application also provides a data feature optimization device based on advertisement push, which is applied to computer equipment, the device including:
the data acquisition module is used for acquiring a data set to be processed; the data set to be processed is advertisement service data;
the characteristic binning module is used for extracting characteristic values of the continuous numerical characteristics in the data set to be processed to obtain a plurality of characteristic values, and binning the plurality of characteristic values to obtain a final binning result of each continuous numerical characteristic;
the result crossing module is used for crossing the final binning results pairwise to obtain a plurality of target binning features;
the one-hot encoding module is used for performing one-hot encoding on the plurality of target binning features to obtain target coding features; wherein the target coding features are used for advertisement push processing.
Preferably, the feature binning module is specifically configured to:
performing equal-frequency binning on the plurality of characteristic values to obtain a first binning result;
performing chi-square binning on the plurality of characteristic values to obtain a second binning result;
performing best-ks binning on the plurality of characteristic values to obtain a third binning result;
and combining the first binning result, the second binning result and the third binning result to obtain a final binning result of each continuous numerical characteristic.
Preferably, the feature binning module is specifically configured to:
merging the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result;
and taking the minimum intersection in the merged results as the final binning result of each continuous numerical characteristic.
Preferably, the result crossing module is specifically configured to:
obtaining a binning feature sequence consisting of a plurality of target binning features according to the binning features in the final binning result, wherein each target binning feature comprises a plurality of values;
combining every two target binning features in the binning feature sequence to obtain at least one feature combination;
and, for each feature combination, combining the plurality of values of one target binning feature in the feature combination pairwise with the plurality of values of the other target binning feature to obtain a plurality of target combination data corresponding to the feature combination.
Preferably, the one-hot encoding module is specifically configured to:
inputting the target coding features into a model.
The technical solution provided by the embodiments disclosed in the present application may include the following advantageous effects.
In the data feature optimization method and device based on advertisement push, feature values are extracted from the continuous numerical features in the data set to be processed to obtain a plurality of feature values; the feature values are binned to obtain a final binning result of each continuous numerical feature; the final binning results are crossed pairwise to obtain a plurality of target binning features; and the target binning features are one-hot encoded to obtain target coding features. By applying data feature optimization to the binning results, continuous features with weak linear strength can be crossed to generate new features, making the crossed features more accurate. The continuous features are reduced in dimension, while feature crossing is kept to a minimum on the premise of retaining as much useful information as possible, which avoids information loss during feature combination. When applied to the field of advertisement push, the method can reduce the complexity of the related data while preserving its feature discriminability. When a linear regression model is used to process the data features, the performance and effect of the linear regression model can be ensured, which improves the accuracy of advertisement push and reduces the resource waste caused by invalid advertisement push.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a data feature optimization method based on advertisement push according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a data feature optimization device based on advertisement push according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Referring to Fig. 1, which is a flowchart of a data feature optimization method based on advertisement push according to an embodiment of the present invention, the method may specifically include steps S21 to S24 described below.
Step S21, acquiring a data set to be processed; wherein, the data set to be processed is advertisement service data.
For example, the data set to be processed may be obtained from the advertisement domain, the e-commerce domain, or another domain (for instance, in the advertisement domain a type of advertisement may be pushed to customers, and the resulting sample set may record whether the advertisement was clicked, with "not clicked" as one class and "clicked" as the other). The data set to be processed may include samples with continuous numerical features and binary class labels.
Step S22, extracting feature values from the continuous numerical features in the data set to be processed to obtain a plurality of feature values, and binning the plurality of feature values to obtain a final binning result of each continuous numerical feature.
Illustratively, the feature values represent discrete coefficients, and the binning methods include equal-frequency binning, chi-square binning, best-ks binning, and IV minimum-loss binning. Equal-frequency binning divides the continuous values into a plurality of intervals with roughly equal sample counts; the desired number of bins generally has to be given, and after the initial binning, bins with repeated boundary values are merged to obtain the final bins. Chi-square binning is a bottom-up merging method: when there are many distinct values, a simple equal-frequency binning is performed first, and then the two adjacent bins with the smallest chi-square value are merged repeatedly until the target number of bins is reached or the smallest chi-square value among adjacent bins exceeds a given threshold. Best-ks binning sorts the feature values and computes the KS value at each point; the value with the largest KS value is selected as the split point that divides the feature values into two intervals, and the same operation is repeated on the left and right intervals. Splitting stops when the number of samples in a bin falls below a set threshold, when a bin containing samples of only one class appears, or when a further split would break the monotonically increasing or decreasing trend of the class proportion across consecutive bins. IV (Information Value) is an index that measures the linear predictive power of a feature. IV minimum-loss binning is a bottom-up method that merges adjacent bins; its strategy is to select, each time, the pair of adjacent bins whose merging leaves the IV of the variable as large as possible.
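As a rough illustration of the equal-frequency step only, the following Python sketch computes cut points at evenly spaced rank positions and merges repeated cut points. The function name `equal_frequency_cut_points` and the sample data are hypothetical and do not come from the application; the rank-based quantile rule is one simple choice among several.

```python
def equal_frequency_cut_points(values, n_bins):
    """Return sorted, de-duplicated cut points that split `values` into
    groups of roughly equal size (hypothetical helper, not from the patent)."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th candidate cut point is the value at rank position k*n/n_bins.
    raw = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # Repeated cut points would produce empty bins, so they are merged.
    return sorted(set(raw))


# Assumed sample data: many repeated 2s force two candidate cut points to coincide.
values = [1, 2, 2, 2, 2, 3, 5, 8, 13, 21]
cuts = equal_frequency_cut_points(values, 5)
print(cuts)
```

Here five bins were requested, but only three distinct cut points survive, which mirrors the merging of repeated bins described above.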
Further, the final binning result is obtained by merging the three binning results and taking the smallest intersection regions as the final bins (for example, the three binning methods sequentially divide the feature into 5, and 4 segments, that is, new features with 5, and 4 values are generated; the value ranges of the three binning results are then overlaid and the smallest intersections among them are taken, which forms seven segmentation regions in total, and the feature binning corresponding to these seven regions is taken as the final binning result of the single feature).
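One way to read the "smallest intersection" rule is as an overlay of the three partitions: every cut point from any of the three binnings is kept, so each final interval is the intersection of one interval from each method. The sketch below follows that reading. It is an assumption about the intended procedure, and the helper names and cut points are made up for illustration.

```python
def merge_cut_points(*cut_point_lists):
    """Overlay several binnings by taking the union of their cut points."""
    merged = set()
    for cuts in cut_point_lists:
        merged.update(cuts)
    return sorted(merged)


def intervals_from_cuts(cuts):
    """Turn sorted cut points into (lower, upper) interval pairs."""
    edges = [float("-inf")] + list(cuts) + [float("inf")]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical cut points from three binning methods for one feature.
equal_freq_cuts = [10, 20, 30]
chi_square_cuts = [15, 30]
best_ks_cuts = [20, 40]

final_cuts = merge_cut_points(equal_freq_cuts, chi_square_cuts, best_ks_cuts)
final_bins = intervals_from_cuts(final_cuts)
print(final_cuts, len(final_bins))
```

With these assumed inputs the overlay keeps five distinct cut points and therefore six final intervals; shared cut points (20 and 30 here) are counted once.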
For example, after the continuous values are sorted, they are segmented according to a certain rule, and all values in the same interval are treated as the same value. On the one hand, this reduces the dimensionality of the data; on the other hand, it avoids unstable model behavior caused by extreme values or numerical fluctuation.
Step S23, crossing the final binning results pairwise to obtain a plurality of target binning features.
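A minimal sketch of mapping raw values to bin indices, assuming the cut points are already known and sorted. The names `assign_bins`, `cut_points` and `ages` are illustrative only. It uses `bisect_right` from the Python standard library, so a value equal to a cut point falls into the bin to its right.

```python
from bisect import bisect_right


def assign_bins(values, cut_points):
    """Map each value to the index of the interval it falls into.

    `cut_points` must be sorted; bin 0 is everything below the first cut point.
    """
    return [bisect_right(cut_points, v) for v in values]


cut_points = [10, 20, 30]
ages = [3, 10, 19.5, 20, 250]   # 250 is an extreme value
bin_ids = assign_bins(ages, cut_points)
print(bin_ids)
```

The extreme value 250 simply lands in the last bin, which illustrates why binning dampens the influence of outliers.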
Illustratively, pairwise crossing means that the features in the final binning result are first combined in pairs, and then the feature values of each pair are combined in pairs. For example, if there are n features, n new binning features are obtained after binning. These n features are then crossed pairwise, and the crossed features are fed into the binning and merging process again. Suppose the new binning features are Va, Vb, ..., Vn, and that Va and Vb are to be crossed, where Va has x unique values and Vb has y unique values. Combining the values of the two features pairwise gives x × y combinations in total, each of which corresponds to one new crossed feature value. Crossing every feature with every other feature gives $n(n-1)/2$ groups of crossing results, that is, $n(n-1)/2$ new crossed features, and the total number of new crossed feature values is $\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_i y_j$, where $x_i$ and $y_j$ denote the numbers of unique values of the i-th and j-th features.
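The counting argument can be checked with a short Python sketch built on `itertools.combinations` and `itertools.product`. The feature names and bin values are invented for illustration, and `cross_features` is a hypothetical helper rather than anything defined in the application.

```python
from itertools import combinations, product


def cross_features(binned):
    """Cross every pair of binned features.

    `binned` maps a feature name to its list of distinct bin values.
    Returns a dict mapping each feature pair to all value combinations.
    """
    crossed = {}
    for a, b in combinations(binned, 2):
        crossed[(a, b)] = list(product(binned[a], binned[b]))
    return crossed


# Assumed example: three binned features with 3, 2 and 4 distinct values.
binned = {"Va": [0, 1, 2], "Vb": [0, 1], "Vc": [0, 1, 2, 3]}
crossed = cross_features(binned)

n = len(binned)
total_values = sum(len(pairs) for pairs in crossed.values())
print(len(crossed), total_values)
```

For n = 3 this yields 3 crossed features, matching n(n-1)/2, and 3·2 + 3·4 + 2·4 = 26 crossed feature values, matching the double sum.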
Furthermore, all the crossed features are binned again. Only a single binning method is needed here, and any one of equal-frequency binning, chi-square binning and best-ks binning may be used; new features are obtained after binning in the same way. This binning serves to reduce the number of feature values.
Step S24, performing one-hot encoding on the plurality of target binning features to obtain target coding features; wherein the target coding features are used for advertisement push processing.
For example, one-hot encoding is an encoding method that stores the z values of a feature as z states. If the gender field can only take the values male and female, then each gender is one state; when a sample matches a state, the corresponding state bit is 1, and otherwise it is 0. The value state of male is "1-0", and the value state of female is "0-1". A single feature is thus expanded into two features: "male" corresponds to one feature, whose values 0 and 1 indicate not male and male respectively, and "female" corresponds to another feature, whose values 0 and 1 indicate not female and female respectively. Because the features were binned before encoding, the encoded dimensionality is far lower than it would be if the raw continuous values were encoded directly, which reduces the data volume, relieves memory pressure on the computer, and speeds up modeling.
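The gender example can be reproduced with a few lines of plain Python. This is only a sketch: `one_hot` is a hypothetical helper, and the column order (first appearance in the data) is an arbitrary choice made here so that the output lines up with the "1-0" and "0-1" states described above.

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Columns follow the order in which categories first appear.
    """
    categories = list(dict.fromkeys(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1   # exactly one state bit is set per sample
        rows.append(row)
    return categories, rows


categories, encoded = one_hot(["male", "female", "female"])
print(categories, encoded)
```

A feature with z distinct values always produces z columns, so the number of bins chosen earlier directly determines the width of the encoded data.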
It can be understood that, when steps S21 to S24 above are executed, feature values are extracted from the continuous numerical features in the data set to be processed to obtain a plurality of feature values; the feature values are binned to obtain a final binning result of each continuous numerical feature; the final binning results are crossed pairwise to obtain a plurality of target binning features; and the target binning features are one-hot encoded to obtain target coding features. By applying data feature optimization to the binning results, continuous features with weak linear strength can be crossed to generate new features, making the crossed features more accurate. The continuous features are reduced in dimension, while feature crossing is kept to a minimum on the premise of retaining as much useful information as possible, which avoids information loss during feature combination. When applied to the field of advertisement push, the method can reduce the complexity of the related data while preserving its feature discriminability. When a linear regression model is used to process the data features, the performance and effect of the linear regression model can be ensured, which improves the accuracy of advertisement push and reduces the resource waste caused by invalid advertisement push.
In an alternative embodiment, the inventor found that when the plurality of feature values are binned, there is a problem of confusion of binning, so that it is difficult to accurately obtain the final binning result of each of the continuous numerical features, and in order to avoid the above technical problem, the step of binning the plurality of feature values to obtain the final binning result of each of the continuous numerical features described in step S22 may further include the following steps S221 to S224.
Step S221, performing equal frequency binning on the plurality of characteristic values to obtain a first binning result.
Step S222, performing chi-square binning on the plurality of feature values to obtain a second binning result.
Step S223, performing best-ks binning on the plurality of feature values to obtain a third binning result.
Step S224, merging the first binning result, the second binning result, and the third binning result to obtain a final binning result of each continuous numerical characteristic.
It can be understood that, when the contents described in the above steps S221 to S224 are performed, when the plurality of feature values are binned, the problem of binning confusion is avoided, so that the final binning result of each of the continuous numerical type features can be accurately obtained.
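To make the best-ks step more concrete, the sketch below finds a single split point by maximizing the gap between the cumulative distributions of the two classes. It covers only the first split: the recursion on the left and right intervals and the stopping rules are omitted. The function name `best_ks_split` and the data are assumptions for illustration, with labels taken to be 0 or 1.

```python
def best_ks_split(values, labels):
    """Return (split_value, ks) where ks = max |F_pos(v) - F_neg(v)|.

    F_pos and F_neg are the cumulative shares of positive (label 1) and
    negative (label 0) samples with value <= v.
    """
    pairs = sorted(zip(values, labels))
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    if total_pos == 0 or total_neg == 0:
        return None, 0.0   # a single-class sample cannot be split by KS
    best_value, best_ks = None, 0.0
    cum_pos = cum_neg = 0
    for i, (v, y) in enumerate(pairs):
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Evaluate KS only after the last sample of each distinct value.
        if i + 1 < len(pairs) and pairs[i + 1][0] == v:
            continue
        ks = abs(cum_pos / total_pos - cum_neg / total_neg)
        if ks > best_ks:
            best_value, best_ks = v, ks
    return best_value, best_ks


# Assumed toy data: clicks (1) occur only for the larger feature values.
split, ks = best_ks_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(split, ks)
```

In this toy case the classes are perfectly separated at the value 3, so the KS value reaches its maximum of 1.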
In an alternative embodiment, the inventor found that merging the first binning result, the second binning result, and the third binning result can be confused, making it difficult to accurately obtain the final binning result of each continuous numerical feature. To address this technical problem, the step of merging the first binning result, the second binning result, and the third binning result to obtain the final binning result of each continuous numerical feature, described in step S224, may specifically include the following steps A1 and A2.
Step A1, merging the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result.
Step A2, using the minimum intersection in the merged results as the final binning result of each continuous numerical feature.
It can be understood that, when steps A1 and A2 above are performed, confusion is avoided when merging the first, second, and third binning results, so that the final binning result of each continuous numerical feature can be accurately obtained.
In an alternative embodiment, the inventor finds that, when the final binning result is pairwise intersected, there is a technical problem of confusion of intersection, so that it is difficult to accurately obtain a plurality of target binning characteristics, and in order to improve the technical problem, the step of pairwise intersecting the final binning result to obtain a plurality of target binning characteristics described in step S23 may specifically include the following steps S231-S233.
Step S231, obtaining a binning feature sequence composed of a plurality of target binning features according to the plurality of binning features in the final binning result, where each target binning feature includes a plurality of values.
Step S232, combining every two target binning features in the binning feature sequence to obtain at least one feature combination.
Step S233, for each feature combination, pairwise combining a plurality of values of one of the target binning features in the feature combination with a plurality of values of another target binning feature, respectively, to obtain a plurality of target combination data corresponding to the feature combination.
It can be understood that, when performing the contents described in the above steps S231 to S233, and performing pairwise intersection on the final binning results, the technical problem of confusion of intersection is avoided, so that a plurality of target binning characteristics can be accurately obtained.
On this basis, after performing one-hot encoding on the plurality of target binning features to obtain the target coding features, the method further includes the following step.
Inputting the target coding features into a model.
Based on the same inventive concept, and referring to Fig. 2, a functional block diagram of the data feature optimization device 20 based on advertisement push is also provided. The device 20 is described in detail as follows.
An advertisement push-based data feature optimization device 20 applied to a computer device, the device 20 comprising:
a data obtaining module 21, configured to obtain a data set to be processed; the data set to be processed is advertisement service data;
the feature binning module 22 is configured to perform feature value extraction on the continuous numerical features in the data set to be processed to obtain a plurality of feature values, bin the plurality of feature values, and obtain a final binning result of each continuous numerical feature;
a result crossing module 23, configured to cross the final binning results two by two to obtain multiple target binning characteristics;
the one-hot encoding module 24 is used for performing one-hot encoding on the plurality of target binning features to obtain target coding features; wherein the target coding features are used for advertisement push processing.
Further, the feature binning module 22 is specifically configured to:
performing equal-frequency binning on the plurality of characteristic values to obtain a first binning result;
performing chi-square binning on the plurality of characteristic values to obtain a second binning result;
performing best-ks binning on the plurality of characteristic values to obtain a third binning result;
and combining the first binning result, the second binning result and the third binning result to obtain a final binning result of each continuous numerical characteristic.
Further, the feature binning module 22 is specifically configured to:
merging the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result;
and taking the minimum intersection in the merged results as the final binning result of each continuous numerical characteristic.
Further, the result crossing module 23 is specifically configured to:
obtaining a binning feature sequence consisting of a plurality of target binning features according to the binning features in the final binning result, wherein each target binning feature comprises a plurality of values;
combining every two target binning features in the binning feature sequence to obtain at least one feature combination;
and, for each feature combination, combining the plurality of values of one target binning feature in the feature combination pairwise with the plurality of values of the other target binning feature to obtain a plurality of target combination data corresponding to the feature combination.
Further, the one-hot encoding module 24 is specifically configured to:
inputting the target coding features into a model.
To sum up, in the data feature optimization method and device based on advertisement push provided by the embodiments of the present invention, feature values are extracted from the continuous numerical features in the data set to be processed to obtain a plurality of feature values; the feature values are binned to obtain a final binning result of each continuous numerical feature; the final binning results are crossed pairwise to obtain a plurality of target binning features; and the target binning features are one-hot encoded to obtain target coding features. By applying data feature optimization to the binning results, continuous features with weak linear strength can be crossed to generate new features, making the crossed features more accurate. The continuous features are reduced in dimension, while feature crossing is kept to a minimum on the premise of retaining as much useful information as possible, which avoids information loss during feature combination. When applied to the field of advertisement push, the method can reduce the complexity of the related data while preserving its feature discriminability. When a linear regression model is used to process the data features, the performance and effect of the linear regression model can be ensured, which improves the accuracy of advertisement push and reduces the resource waste caused by invalid advertisement push.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A data feature optimization method based on advertisement push, applied to a computer device, the method comprising:
acquiring a data set to be processed, wherein the data set to be processed is advertisement service data;
extracting feature values of the continuous numerical features in the data set to be processed to obtain a plurality of feature values, and binning the plurality of feature values to obtain a final binning result for each continuous numerical feature;
crossing the final binning results pairwise to obtain a plurality of target binning features; and
performing one-hot encoding on the plurality of target binning features to obtain target coding features, wherein the target coding features are used for advertisement push processing.
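The one-hot encoding step of claim 1 can be illustrated with a minimal Python sketch. The function name `one_hot_encode` and the sample data are hypothetical and are not taken from the patent; each crossed value is assumed to be a pair of bin indices.

```python
def one_hot_encode(values):
    """One-hot encode a list of categorical values.

    Returns (categories, rows): the sorted distinct categories and one
    0/1 vector per input value, with a single 1 at that value's category.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical crossed feature: (bin of feature A, bin of feature B) per sample.
crossed = [(0, 1), (1, 0), (0, 1), (1, 1)]
categories, encoded = one_hot_encode(crossed)
```

Each distinct crossed value becomes one column, so the width of the encoded vectors equals the number of distinct bin pairs that occur in the data.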
2. The method of claim 1, wherein binning the plurality of feature values to obtain a final binning result for each continuous numerical feature comprises:
performing equal-frequency binning on the plurality of feature values to obtain a first binning result;
performing chi-square binning on the plurality of feature values to obtain a second binning result;
performing best-KS binning on the plurality of feature values to obtain a third binning result;
and combining the first binning result, the second binning result and the third binning result to obtain the final binning result for each continuous numerical feature.
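Two of the binning methods named in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, binary 0/1 labels are assumed for the supervised method, `best_ks_cut` finds only a single split (best-KS binning is usually applied recursively), and chi-square binning is not shown.

```python
def equal_frequency_cuts(values, n_bins):
    """Interior cut points so that each bin holds about len(values)/n_bins values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        if not cuts or cut > cuts[-1]:  # skip repeated cuts caused by tied values
            cuts.append(cut)
    return cuts


def best_ks_cut(values, labels):
    """Single cut point maximising the KS statistic.

    KS here is |cumulative positive rate - cumulative negative rate|.
    Assumes labels are 0/1 and that both classes are present.
    """
    pairs = sorted(zip(values, labels))
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    best_cut, best_ks = None, -1.0
    pos = neg = 0
    for i, (v, y) in enumerate(pairs[:-1]):
        pos += y
        neg += 1 - y
        if pairs[i + 1][0] == v:  # only cut between distinct values
            continue
        ks = abs(pos / total_pos - neg / total_neg)
        if ks > best_ks:
            best_ks, best_cut = ks, pairs[i + 1][0]
    return best_cut


values = list(range(1, 13))      # hypothetical feature values 1..12
labels = [0] * 6 + [1] * 6       # hypothetical binary labels
ef_cuts = equal_frequency_cuts(values, 3)
ks_cut = best_ks_cut(values, labels)
```

In this sketch a value belongs to the bin whose index equals the number of cut points less than or equal to it, so values at a cut point fall into the bin on its right.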
3. The method of claim 2, wherein combining the first binning result, the second binning result and the third binning result to obtain a final binning result for each continuous numerical feature comprises:
merging the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result;
and taking the minimum intersection in the merged result as the final binning result for each continuous numerical feature.
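The claim does not define "minimum intersection" in this section. One possible reading, assumed here purely for illustration, is that the three sets of bins are overlaid and the smallest intervals formed by intersecting one bin from each method are kept, which is equivalent to taking the union of all cut points. The function names and cut-point values below are hypothetical.

```python
from bisect import bisect_right


def merge_cut_points(*cut_lists):
    """Merge cut-point lists in the given (preset) order, dropping repeats.

    The order only decides which of two equal cut points is seen first;
    the merged list is sorted before it is returned.
    """
    merged = []
    for cuts in cut_lists:
        for c in cuts:
            if c not in merged:
                merged.append(c)
    return sorted(merged)


def assign_bin(value, cuts):
    """Index of the bin containing value: the number of cut points <= value."""
    return bisect_right(cuts, value)


# Hypothetical cut points from the three binning methods.
equal_freq_cuts = [5, 9]
chi_square_cuts = [5, 8]
best_ks_cuts = [7]
final_cuts = merge_cut_points(equal_freq_cuts, chi_square_cuts, best_ks_cuts)
bins = [assign_bin(v, final_cuts) for v in (4, 7, 9)]
```

Under this reading the final bins are at least as fine as those of any single method, because every cut point from every method is preserved.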
4. The method of claim 1, wherein crossing the final binning results pairwise to obtain a plurality of target binning features comprises:
obtaining, from the binning features in the final binning result, a binning feature sequence consisting of a plurality of target binning features, wherein each target binning feature comprises a plurality of values;
combining the target binning features in the binning feature sequence two by two to obtain at least one feature combination;
and, for each feature combination, combining each of the plurality of values of one target binning feature in the feature combination with each of the plurality of values of the other target binning feature, to obtain a plurality of target combination data corresponding to the feature combination.
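The pairwise crossing described in claim 4 can be sketched with the standard library. The feature names (`age`, `clicks`, `spend`) and bin values are hypothetical examples; the function name is likewise an assumption.

```python
from itertools import combinations, product


def cross_binned_features(binned):
    """Cross every pair of binned features.

    binned maps a feature name to the list of that feature's bin values.
    Returns {(name_a, name_b): [(value_a, value_b), ...]}, where each list
    holds every combination of one value from each feature in the pair.
    """
    crossed = {}
    for (name_a, vals_a), (name_b, vals_b) in combinations(binned.items(), 2):
        crossed[(name_a, name_b)] = list(product(vals_a, vals_b))
    return crossed


# Hypothetical binning feature sequence: three features with 2, 3 and 2 bins.
binned = {"age": [0, 1], "clicks": [0, 1, 2], "spend": [0, 1]}
crossed = cross_binned_features(binned)
```

With n binned features this yields n(n-1)/2 feature combinations, and a pair whose features have a and b bins contributes a × b combined values, which is why keeping the number of bins small limits the width of the later one-hot encoding.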
5. The method of claim 1, wherein the step of performing one-hot encoding on the plurality of target binning features to obtain target encoding features further comprises:
inputting the target coding features into a model.
6. A data feature optimization device based on advertisement push, applied to a computer device, the device comprising:
a data acquisition module, configured to acquire a data set to be processed, wherein the data set to be processed is advertisement service data;
a feature binning module, configured to extract feature values of the continuous numerical features in the data set to be processed to obtain a plurality of feature values, and to bin the plurality of feature values to obtain a final binning result for each continuous numerical feature;
a result crossing module, configured to cross the final binning results pairwise to obtain a plurality of target binning features; and
a one-hot encoding module, configured to perform one-hot encoding on the plurality of target binning features to obtain target coding features, wherein the target coding features are used for advertisement push processing.
7. The device of claim 6, wherein the feature binning module is specifically configured to:
perform equal-frequency binning on the plurality of feature values to obtain a first binning result;
perform chi-square binning on the plurality of feature values to obtain a second binning result;
perform best-KS binning on the plurality of feature values to obtain a third binning result;
and combine the first binning result, the second binning result and the third binning result to obtain the final binning result for each continuous numerical feature.
8. The device of claim 7, wherein the feature binning module is specifically configured to:
merge the first binning result, the second binning result and the third binning result according to a preset sequence to obtain a merged result;
and take the minimum intersection in the merged result as the final binning result for each continuous numerical feature.
9. The device of claim 6, wherein the result crossing module is specifically configured to:
obtain, from the binning features in the final binning result, a binning feature sequence consisting of a plurality of target binning features, wherein each target binning feature comprises a plurality of values;
combine the target binning features in the binning feature sequence two by two to obtain at least one feature combination;
and, for each feature combination, combine each of the plurality of values of one target binning feature in the feature combination with each of the plurality of values of the other target binning feature, to obtain a plurality of target combination data corresponding to the feature combination.
10. The device of claim 6, wherein the one-hot encoding module is specifically configured to:
input the target coding features into a model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620238.4A CN113344626A (en) | 2021-06-03 | 2021-06-03 | Data feature optimization method and device based on advertisement push |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113344626A (en) | 2021-09-03 |
Family
ID=77475236
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620238.4A Pending CN113344626A (en) | 2021-06-03 | 2021-06-03 | Data feature optimization method and device based on advertisement push |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344626A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114329127A (en) * | 2021-12-30 | 2022-04-12 | 北京瑞莱智慧科技有限公司 | Characteristic box dividing method, device and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786860A (en) * | 2014-12-23 | 2016-07-20 | 华为技术有限公司 | Data processing method and device in data modeling |
CN108733631A (en) * | 2018-04-09 | 2018-11-02 | 中国平安人寿保险股份有限公司 | A kind of data assessment method, apparatus, terminal device and storage medium |
CN108764273A (en) * | 2018-04-09 | 2018-11-06 | 中国平安人寿保险股份有限公司 | A kind of method, apparatus of data processing, terminal device and storage medium |
CN111507831A (en) * | 2020-05-29 | 2020-08-07 | 长安汽车金融有限公司 | Credit risk automatic assessment method and device |
CN111626832A (en) * | 2020-06-05 | 2020-09-04 | 中国银行股份有限公司 | Product recommendation method and device and computer equipment |
CN111861706A (en) * | 2020-07-10 | 2020-10-30 | 深圳无域科技技术有限公司 | Data discretization regulation and control method and system and risk control model establishing method and system |
CN111950585A (en) * | 2020-06-29 | 2020-11-17 | 广东技术师范大学 | XGboost-based underground comprehensive pipe gallery safety condition assessment method |
CN112085565A (en) * | 2020-09-07 | 2020-12-15 | 中国平安财产保险股份有限公司 | Deep learning-based information recommendation method, device, equipment and storage medium |
CN112328657A (en) * | 2020-11-03 | 2021-02-05 | 中国平安人寿保险股份有限公司 | Feature derivation method, feature derivation device, computer equipment and medium |
WO2021027362A1 (en) * | 2019-08-13 | 2021-02-18 | 平安科技(深圳)有限公司 | Information pushing method and apparatus based on data analysis, computer device, and storage medium |
CN112580825A (en) * | 2021-02-22 | 2021-03-30 | 上海冰鉴信息科技有限公司 | Unsupervised data binning method and unsupervised data binning device |
CN112632045A (en) * | 2021-03-10 | 2021-04-09 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and computer readable storage medium |
CN112633414A (en) * | 2021-01-06 | 2021-04-09 | 深圳前海微众银行股份有限公司 | Feature selection optimization method, device and readable storage medium |
Non-Patent Citations (1)
Title |
---|
王青天: "《Python金融大数据风控建模实战》", 31 May 2020, 机械工业出版社, pages: 89 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114329127A (en) * | 2021-12-30 | 2022-04-12 | 北京瑞莱智慧科技有限公司 | Characteristic box dividing method, device and storage medium |
CN114329127B (en) * | 2021-12-30 | 2023-06-20 | 北京瑞莱智慧科技有限公司 | Feature binning method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||