CN110909085A

CN110909085A - Data processing method, device, equipment and storage medium

Info

Publication number: CN110909085A
Application number: CN201911177388.1A
Authority: CN
Inventors: 陈瑞钦; 黄启军; 李诗琦; 唐兴兴; 林冰垠
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-11-25
Filing date: 2019-11-25
Publication date: 2020-03-24

Abstract

The invention relates to the field of financial technology and discloses a data processing method, apparatus, device and storage medium. The data processing method comprises: acquiring the binning quantile points of each feature bin, and grouping the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks; if a binning adjustment instruction is detected, determining, from the feature bins, a bin to be adjusted and the feature data block to be adjusted of that bin according to the binning adjustment instruction and the correspondence; and adjusting the bin to be adjusted and the feature data block to be adjusted, and outputting an adjustment result. The invention addresses the technical problem that conventional binning adjustment methods respond too slowly when faced with massive data, resulting in low data processing efficiency.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of financial technology, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). At the same time, the security and real-time requirements of the financial industry place higher demands on these technologies.

Feature binning is a data preprocessing technique used to reduce the effect of minor observation errors; it groups many continuous values into a smaller number of "bins". In practice, a user may adjust the binning result according to business experience. Such an adjustment changes the binning quantile points, which in turn changes the statistical information within the bins, so the statistics must be computed again. When massive data is involved, recomputing the statistics over all of the data is heavy and time-consuming: the response of the feature binning slows down considerably, running performance degrades, and the data processing efficiency of the system falls.

Disclosure of Invention

The main object of the invention is to provide a data processing method, apparatus, device and storage medium, aiming to solve the technical problem that conventional binning adjustment methods respond too slowly when faced with massive data, resulting in low data processing efficiency.

In order to achieve the above object, an embodiment of the present invention provides a data processing method, where the data processing method includes:

acquiring the binning quantile points of each feature bin, and grouping the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks;

if a binning adjustment instruction is detected, determining, from the feature bins, a bin to be adjusted and the feature data block to be adjusted of that bin according to the binning adjustment instruction and the correspondence;

and adjusting the bin to be adjusted and the feature data block to be adjusted, and outputting an adjustment result.

Optionally, the grouping of the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks includes:

caching the feature data blocks of each feature bin, and grouping the feature data blocks of each feature bin according to the binning quantile points to generate the correspondence between each feature bin and its feature data blocks;

the adjusting of the bin to be adjusted and the feature data block to be adjusted and the outputting of an adjustment result includes:

adjusting, in the cache, the bin to be adjusted and the feature data block to be adjusted, and outputting an adjustment result.

Optionally, the adjusting, in the cache, of the bin to be adjusted and the feature data block to be adjusted includes:

acquiring, in the cache, the quantile point to be adjusted of the feature data block to be adjusted, and acquiring the instruction type of the binning adjustment instruction;

and performing cache adjustment processing on the bin to be adjusted and the feature data block to be adjusted according to the instruction type, the quantile point to be adjusted and the binning quantile point.

Optionally, the performing of cache adjustment processing on the bin to be adjusted and the feature data block to be adjusted according to the instruction type, the quantile point to be adjusted and the binning quantile point includes:

if the instruction type is a bin splitting type, splitting the bin to be adjusted and the feature data block to be adjusted according to the quantile point to be adjusted and the binning quantile point to obtain a plurality of target split bins and the target split data block corresponding to each target split bin;

acquiring first statistical information of each target split data block, and generating a cache adjustment result according to each target split bin, the target split data block corresponding to each target split bin, and the first statistical information corresponding to each target split data block;

if the instruction type is a bin merging type, merging the bins to be adjusted and the feature data blocks to be adjusted according to the quantile point to be adjusted and the binning quantile point to obtain a target merged bin and the target merged data block corresponding to the target merged bin;

acquiring second statistical information of the bins to be adjusted, and summing the second statistical information to generate target statistical information;

and generating a cache adjustment result according to the target merged data block and the target statistical information.

Optionally, after adjusting the bin to be adjusted and the feature data block to be adjusted and outputting the adjustment result, the method further includes:

counting the information value of each feature bin in the adjustment result;

if the information value is greater than or equal to a preset value, determining that the adjustment effect is qualified;

and if the information value is less than the preset value, determining that the adjustment effect is unqualified.

Optionally, the counting of the information value of each feature bin in the adjustment result includes:

counting the event value and the non-event value of each feature bin in the adjustment result to obtain a WOE (weight of evidence) value;

and obtaining the information value according to the event value, the non-event value and the WOE value.

The present invention also provides a data processing apparatus, comprising:

a relation module, configured to acquire the binning quantile points of each feature bin, and to group the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks;

a determining module, configured to, if a binning adjustment instruction is detected, determine from the feature bins a bin to be adjusted and the feature data block to be adjusted of that bin according to the binning adjustment instruction and the correspondence;

and an adjusting module, configured to adjust the bin to be adjusted and the feature data block to be adjusted, and to output an adjustment result.

Optionally, the relationship module comprises:

a cache processing unit, configured to cache the feature data blocks of each feature bin and to group the feature data blocks of each feature bin according to the binning quantile points to generate the correspondence between each feature bin and its feature data blocks;

the adjusting module includes:

a cache adjusting unit, configured to adjust, in the cache, the bin to be adjusted and the feature data block to be adjusted, and to output an adjustment result.

Optionally, the cache adjusting unit includes:

an instruction type subunit, configured to acquire, in the cache, the quantile point to be adjusted of the feature data block to be adjusted, and to acquire the instruction type of the binning adjustment instruction;

and a cache adjusting subunit, configured to perform cache adjustment processing on the bin to be adjusted and the feature data block to be adjusted according to the instruction type, the quantile point to be adjusted and the binning quantile point.

Optionally, the cache adjusting subunit is configured to:

Optionally, the data processing apparatus further includes:

a statistics module, configured to count the information value of each feature bin in the adjustment result;

a qualification module, configured to determine that the adjustment effect is qualified if the information value is greater than or equal to a preset value;

and a disqualification module, configured to determine that the adjustment effect is unqualified if the information value is less than the preset value.

Optionally, the statistics module includes:

a statistics unit, configured to count the event value and the non-event value of each feature bin in the adjustment result to obtain a WOE value;

and an information value unit, configured to obtain the information value according to the event value, the non-event value and the WOE value.

Further, to achieve the above object, the present invention also provides an apparatus comprising: a memory, a processor, and a data processing program stored on the memory and executable on the processor, wherein:

the data processing program, when executed by the processor, implements the steps of the data processing method as described above.

In addition, to achieve the above object, the present invention also provides a computer storage medium;

the computer storage medium has stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method as described above.

According to the method, the binning quantile points of each feature bin are acquired, and the feature data blocks of each feature bin are grouped according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks; if a binning adjustment instruction is detected, a bin to be adjusted and the feature data block to be adjusted of that bin are determined from the feature bins according to the binning adjustment instruction and the correspondence; and the bin to be adjusted and the feature data block to be adjusted are adjusted, and an adjustment result is output. The method can be applied to interactive feature binning of massive data in a big data environment. The feature data to be adjusted is adjusted directly, and no operation is performed on data blocks that do not need adjustment. This removes the statistics steps for a large number of irrelevant data blocks and reduces the time spent on statistics, so the response of the feature binning becomes much faster. The statistics process is simplified and the efficiency of statistics over massive data is improved; running performance and response speed improve markedly while the results remain accurate, the interactive binning experience is optimized, and data processing efficiency is greatly improved.

Drawings

FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a data processing method according to an embodiment of the present invention;

FIG. 3 is a block diagram of the binned data in the data processing method of the present invention;

FIG. 4 is a block diagram illustrating binning of data blocks in the data processing method of the present invention;

FIG. 5 is a schematic diagram illustrating splitting of binned data blocks in the data processing method of the present invention.

The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.

The device of the embodiment of the invention can be a PC or a server device.

As shown in FIG. 1, the apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the apparatus, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a data processing program.

In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a data processing program stored in the memory 1005 and perform operations in various embodiments of the data processing method described below.

The main idea of the embodiments of the invention is as follows: acquire the binning quantile points of each feature bin, and group the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks; if a binning adjustment instruction is detected, determine from the feature bins a bin to be adjusted and the feature data block to be adjusted of that bin according to the binning adjustment instruction and the correspondence; and adjust the bin to be adjusted and the feature data block to be adjusted, and output an adjustment result. The method can be applied to interactive feature binning of massive data in a big data environment. The feature data to be adjusted is adjusted directly, and no operation is performed on data blocks that do not need adjustment. This removes the statistics steps for a large number of irrelevant data blocks and reduces the time spent on statistics, so the response of the feature binning becomes much faster. The statistics process is simplified and the efficiency of statistics over massive data is improved; running performance and response speed improve markedly while the results remain accurate, the interactive binning experience is optimized, and data processing efficiency is greatly improved.

The embodiments of the invention take into account that, in the prior art, a user may adjust the binning result according to business experience; the adjustment changes the binning quantile points, which changes the statistical information within the bins, so the statistics must be computed again. When massive data is involved, recomputing the statistics over all of the data is heavy, consumes too many system resources and responds slowly, which greatly reduces the data processing efficiency of the system.

The invention provides a solution that can be applied to interactive feature binning of massive data in a big data environment. The feature data to be adjusted is adjusted directly, and no operation is performed on data blocks that do not need adjustment. This removes the statistics steps for a large number of irrelevant data blocks and reduces the time spent on statistics, so the response of the feature binning becomes much faster. The statistics process is simplified and the efficiency of statistics over massive data is improved; running performance and response speed improve markedly while the results remain accurate, the interactive binning experience is optimized, and data processing efficiency is greatly improved.

Based on the above hardware structure, the embodiment of the data processing method of the present invention is provided.

The invention belongs to the field of financial technology (Fintech) and provides a data processing method. In an embodiment of the data processing method, referring to FIG. 2, the data processing method includes:

step S10, acquiring the binning quantile points of each feature bin, and grouping the feature data blocks of each feature bin according to the binning quantile points to generate a correspondence between each feature bin and its feature data blocks;

step S20, if a binning adjustment instruction is detected, determining, from the feature bins, a bin to be adjusted and the feature data block to be adjusted of that bin according to the binning adjustment instruction and the correspondence;

and step S30, adjusting the bin to be adjusted and the feature data block to be adjusted, and outputting an adjustment result.

The data processing method can be applied to a device. The specific steps are as follows:

Each feature bin has its own binning quantile points, which mark the data boundaries of that bin. For example, suppose there is a set of feature bins for an age feature: feature bin A (0-10 years old), feature bin B (10-20 years old), feature bin C (20-30 years old) and feature bin D (30-40 years old). Each feature bin stores the feature data blocks of the age feature that fall within it. The device acquires all of the feature bins and groups the feature data blocks according to the binning quantile points. Referring to FIG. 3, the left part of FIG. 3 shows bin 1. Data of the same group may be stored on one computing node or across several computing nodes, and several groups may also be stored on the same computing node and marked to distinguish them. The grouping establishes a correspondence between each bin and the data within its range, so that the data is matched in advance and can be read quickly. For example, all feature data blocks in feature bin A are mapped to the bin 1 data block, all feature data blocks in feature bin B are mapped to the bin 2 data block, and so on, which yields the bin 1 data block, the bin 2 data block, and so on. Feature bin n and the bin n data block are thus in a mapping correspondence: the bin n data block is the cached data block of feature bin n and contains all feature data blocks of feature bin n.
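The grouping step can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `group_by_quantile_points`, the sample ages, and the convention that a value equal to a quantile point belongs to the next bin are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed upper bounds of feature bins A-D from the age example.
QUANTILE_POINTS = [10, 20, 30, 40]

def group_by_quantile_points(values, quantile_points):
    """Return a mapping: bin number (1-based) -> values that fall in that bin."""
    blocks = defaultdict(list)
    for value in values:
        # bisect_right counts the quantile points <= value, so a value equal
        # to a quantile point is placed in the next bin (an assumed convention).
        index = bisect_right(quantile_points, value)
        if value < 0 or index >= len(quantile_points):
            raise ValueError(f"{value} is outside every bin")
        blocks[index + 1].append(value)
    return dict(blocks)

ages = [3, 9, 10, 15, 25, 27, 39]  # hypothetical sample data
correspondence = group_by_quantile_points(ages, QUANTILE_POINTS)
print(correspondence)
```

Here the dictionary plays the role of the correspondence between feature bin n and the bin n data block; in a distributed setting each value of the dictionary would instead be a reference to a block held on one or more computing nodes.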

When a binning adjustment instruction is detected, a business adjustment requirement currently exists for the feature bins. The requirement corresponds to a bin to be adjusted, and the bin to be adjusted and its feature data block to be adjusted can be located among the feature bins according to the correspondence. For example, if the binning adjustment instruction requires splitting feature bin B (10-20 years old), feature bin B is obtained as the bin to be adjusted together with its feature data block to be adjusted.

In this embodiment, the bin to be adjusted may be determined from the binning adjustment instruction. For example, the feature bins include feature bin A (0-10 years old), feature bin B (10-20 years old), feature bin C (20-30 years old) and feature bin D (30-40 years old). If the binning adjustment instruction targets the feature data for age 25, the bin to be adjusted, feature bin C (20-30 years old), and its feature data block to be adjusted can be located from the quantile point 25 and the feature bin that the correspondence associates with it.
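As a rough sketch of this lookup, the following Python fragment locates the bin and cached block for a given age. The names `CORRESPONDENCE` and `locate_bin_to_adjust` and the cached values are hypothetical, and the half-open bin boundaries are an assumption.

```python
from bisect import bisect_right

QUANTILE_POINTS = [10, 20, 30, 40]          # upper bounds of bins A-D
BIN_NAMES = ["A", "B", "C", "D"]
# Hypothetical cached correspondence: feature bin -> feature data block.
CORRESPONDENCE = {"A": [3, 9], "B": [10, 15], "C": [25, 27], "D": [39]}

def locate_bin_to_adjust(value):
    """Return (bin name, cached block) for the bin whose range contains value."""
    if value < 0 or value >= QUANTILE_POINTS[-1]:
        raise ValueError(f"{value} is outside every feature bin")
    name = BIN_NAMES[bisect_right(QUANTILE_POINTS, value)]
    return name, CORRESPONDENCE[name]

bin_name, block = locate_bin_to_adjust(25)
print(bin_name, block)
```

Only the located block is handed to the adjustment step; the blocks of the other bins are never read.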

The adjustment may be a bin merge or a bin split, depending on the actual situation. Specifically, the grouping of the feature data blocks of each feature bin according to the binning quantile points to generate the correspondence between each feature bin and its feature data blocks is performed on cached data, as follows.

This embodiment adopts a cache grouping mechanism so that the data in the bins is adjusted in the cache, which avoids excessive consumption of system resources when computing statistics over massive data and improves the response speed and data processing efficiency. Specifically, when the feature bins need to be adjusted according to a business adjustment requirement, the device acquires all of the feature bins and maps them into the cache.

Further, the adjusting, in the cache, of the bin to be adjusted and the feature data block to be adjusted includes:

step A1, acquiring, in the cache, the quantile point to be adjusted of the feature data block to be adjusted, and acquiring the instruction type of the binning adjustment instruction;

The quantile point to be adjusted of the feature data to be adjusted is acquired, where the feature data is represented in the form of data blocks. For example, suppose there are currently four data blocks: data block A (0-10 years old), data block B (10-20 years old), data block C (20-30 years old) and data block D (30-40 years old). The quantile point list obtained from these data blocks is [10,20,30,40], so the quantile point to be adjusted of the feature data to be adjusted can be determined from the quantile point list in the cache. For example, if the feature data to be adjusted is data block D (30-40 years old), the corresponding quantile point to be adjusted is 40.
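The derivation of the quantile point list can be sketched as follows. This is an illustrative assumption rather than the patent's implementation: each cached block is described only by its lower and upper bound, and the quantile point of a block is taken to be its upper bound, which matches the example of data block D and the value 40.

```python
# Hypothetical cached blocks: name -> (lower bound, upper bound) in years.
BLOCK_RANGES = {"A": (0, 10), "B": (10, 20), "C": (20, 30), "D": (30, 40)}

def quantile_point_list(block_ranges):
    """Collect the upper bound of every cached block in ascending order."""
    return sorted(upper for _, upper in block_ranges.values())

def quantile_point_to_adjust(block_ranges, block_name):
    """Return the quantile point that closes the named block."""
    return block_ranges[block_name][1]

points = quantile_point_list(BLOCK_RANGES)
target_point = quantile_point_to_adjust(BLOCK_RANGES, "D")
print(points, target_point)
```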

It can be understood that binning adjustment instructions are of two types, a bin splitting type and a bin merging type. To avoid errors during adjustment, the type of the binning adjustment instruction must be identified to obtain the instruction type.

Step A2, performing cache adjustment processing on the bin to be adjusted and the feature data block to be adjusted according to the instruction type, the quantile point to be adjusted and the binning quantile point.

Different instruction types correspond to different adjustment flows, and the quantile point to be adjusted together with the binning quantile point (the target quantile point) locates the object to be adjusted. The feature data to be adjusted can therefore be adjusted according to the instruction type, the quantile point to be adjusted and the target quantile point to obtain the target feature data block.
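The distinction between the two instruction types can be sketched at the level of the quantile point list alone. The enum `InstructionType` and the function `adjust_quantile_points` are hypothetical names; the sketch assumes that a split adds one quantile point and a merge removes the quantile point shared by two adjacent bins.

```python
from bisect import insort
from enum import Enum

class InstructionType(Enum):
    SPLIT = "split"
    MERGE = "merge"

def adjust_quantile_points(points, instruction, point):
    """Return a new quantile point list after a split or merge at `point`."""
    adjusted = list(points)  # leave the caller's list untouched
    if instruction is InstructionType.SPLIT:
        if point in adjusted or point >= adjusted[-1]:
            raise ValueError("a split point must be new and inside an existing bin")
        insort(adjusted, point)
    elif instruction is InstructionType.MERGE:
        if point not in adjusted[:-1]:
            raise ValueError("a merge must remove a boundary between two bins")
        adjusted.remove(point)
    else:
        raise ValueError(f"unknown instruction type: {instruction!r}")
    return adjusted

original = [10, 20, 30, 40]
merged_points = adjust_quantile_points(original, InstructionType.MERGE, 20)
split_points = adjust_quantile_points(original, InstructionType.SPLIT, 25)
print(merged_points, split_points)
```

Rejecting any other instruction type up front reflects the point made above that the type must be identified before the adjustment flow is chosen.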

Further, the performing of cache adjustment processing on the bin to be adjusted and the feature data block to be adjusted according to the instruction type, the quantile point to be adjusted and the binning quantile point includes:

step A21, if the instruction type is a bin splitting type, splitting the bin to be adjusted and the feature data block to be adjusted according to the quantile point to be adjusted and the binning quantile point to obtain a plurality of target split bins and the target split data block corresponding to each target split bin;

If the instruction type is a bin splitting type, this indicates that the current binning adjustment instruction splits a specific bin among the feature bins, and the feature data block to be adjusted is the object to be split.

Splitting a bin means splitting the bin to be adjusted into two bins and splitting the feature data block to be adjusted into two data blocks; that is, two bins and their corresponding feature data blocks are newly generated. The number of every bin whose original number is greater than k (that is, k+1 and above) is then increased by one, and the number of the corresponding feature data block is likewise increased by one. Referring to FIG. 4, assume that the bin k data block is the feature data block to be adjusted, k is the quantile point to be adjusted, and k+1 is the target quantile point. According to k and k+1, the bin k data block can then be split into a bin k data block and a bin k+1 data block (the target split data blocks). Because the bin k data block maps to feature bin k, this means that feature bin k is split into feature bin k and feature bin k+1. In other words, splitting touches only the feature data block currently being adjusted; no operation is needed on any other data.

Step A22, acquiring first statistical information of each target split data block, and generating a cache adjustment result according to each target split bin, the target split data block corresponding to each target split bin, and the first statistical information corresponding to each target split data block.

Each target split data block stores its own statistical information. For example, the feature data in the original feature data block to be adjusted is distributed among the target split data blocks, and the statistical information of the original block (such as event and non-event information) is redistributed accordingly, which yields the statistical information of each target split data block. A cache adjustment result can then be generated from the target split data blocks and their statistical information.
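A bin split with redistributed statistics might look like the following sketch. The `BinBlock` structure, the `(value, is_event)` record layout and the half-open bin ranges are assumptions introduced for illustration; the patent does not specify them.

```python
from dataclasses import dataclass

@dataclass
class BinBlock:
    lower: float      # inclusive lower bound (assumed convention)
    upper: float      # exclusive upper bound (assumed convention)
    records: list     # cached (value, is_event) pairs of this bin
    events: int       # cached event count
    non_events: int   # cached non-event count

def make_block(lower, upper, records):
    """Build a block and count its events and non-events once."""
    events = sum(1 for _, is_event in records if is_event)
    return BinBlock(lower, upper, list(records), events, len(records) - events)

def split_bin(blocks, k, point):
    """Split the k-th block (1-based) at `point`, scanning only that block."""
    old = blocks[k - 1]
    if not old.lower < point < old.upper:
        raise ValueError("the split point must lie strictly inside bin k")
    left = make_block(old.lower, point, [r for r in old.records if r[0] < point])
    right = make_block(point, old.upper, [r for r in old.records if r[0] >= point])
    # All other blocks are reused unchanged; their position in the new list
    # gives the new bin number, so later bins shift up by one.
    return blocks[:k - 1] + [left, right] + blocks[k:]

cached = [
    make_block(0, 10, [(3, True), (9, False)]),
    make_block(10, 20, [(12, True), (14, False), (17, False), (19, True)]),
    make_block(20, 30, [(25, False)]),
]
after_split = split_bin(cached, 2, 15)
print([(b.lower, b.upper, b.events, b.non_events) for b in after_split])
```

Only the records of bin k are scanned; the cached counts of the untouched blocks are carried over as they are.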

It should be noted that the essence of a binning adjustment is to generate new binning quantile point information; for example, the original binning quantile points are [10,20,30,40] and the adjusted binning quantile points are [10,30,40]. Because the statistical information in the affected bins changes when the binning quantile points change, a conventional approach has to traverse all of the data to recalculate the statistics of every bin.

step A23, if the instruction type is a bin merging type, merging the bins to be adjusted and the feature data blocks to be adjusted according to the quantile point to be adjusted and the binning quantile point to obtain a target merged bin and the target merged data block corresponding to the target merged bin;

step A24, acquiring second statistical information of the bins to be adjusted, and summing the second statistical information to generate target statistical information;

step A25, generating a cache adjustment result according to the target merged data block and the target statistical information.

If the instruction type is a bin merging type, this indicates that the current binning adjustment instruction merges specific bins among the feature bins, so several feature bins are involved. In this case the feature data to be adjusted is the object to be merged, and there are several feature data blocks to be adjusted.

Merging bins means combining the corresponding data blocks into one data block, that is, two or more data blocks are merged. The number of every bin whose original number is greater than k+1 is then decreased by one, and the number of the corresponding data block is likewise decreased by one. Referring to FIG. 5, assume that the bin k data block and the bin k+1 data block are the feature data blocks to be adjusted, k and k+1 are the quantile points to be adjusted, and k is the binning quantile point. According to k and k+1 (the quantile points to be adjusted) and k (the binning quantile point), the bin k data block and the bin k+1 data block can be merged into a bin k data block (the target merged data block). Because the merged blocks map to feature bin k and feature bin k+1, this means that feature bin k and feature bin k+1 are merged into feature bin k. In other words, merging in the cache touches only the feature data blocks to be adjusted; no operation is needed on any other data block.

The statistical information of all the feature data blocks to be adjusted is acquired and then summed to obtain the target statistical information. That is, the original statistical information of each feature data block to be adjusted is obtained, and because the blocks are merged, their statistical information must be merged as well. For example, given the statistical information of data block A and the statistical information of data block B, when data blocks A and B are merged their statistical information is also merged, which produces the target statistical information.
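The summing of statistics during a merge can be sketched as follows. As before, the `BinBlock` structure and its field names are illustrative assumptions, not the patent's data model.

```python
from dataclasses import dataclass

@dataclass
class BinBlock:
    lower: float      # lower bound of the bin
    upper: float      # upper bound (the bin's quantile point)
    records: list     # cached (value, is_event) pairs
    events: int       # cached event count
    non_events: int   # cached non-event count

def merge_bins(blocks, k):
    """Merge the k-th and (k+1)-th blocks (1-based) by adding cached counts."""
    if not 1 <= k < len(blocks):
        raise ValueError("bin k must have a following bin to merge with")
    first, second = blocks[k - 1], blocks[k]
    merged = BinBlock(
        lower=first.lower,
        upper=second.upper,
        records=first.records + second.records,
        events=first.events + second.events,              # no rescan of records
        non_events=first.non_events + second.non_events,
    )
    # Blocks after k+1 are reused unchanged and move down by one position.
    return blocks[:k - 1] + [merged] + blocks[k + 1:]

cached = [
    BinBlock(0, 10, [(3, True)], 1, 0),
    BinBlock(10, 20, [(12, True), (14, False)], 1, 1),
    BinBlock(20, 30, [(25, False)], 0, 1),
    BinBlock(30, 40, [(35, True)], 1, 0),
]
after_merge = merge_bins(cached, 2)
print([b.upper for b in after_merge])
```

Because the event and non-event counts are additive, the merged statistics are obtained from two cached numbers per block rather than from the underlying records.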

After the data blocks are merged, every bin whose original number is greater than k+1 has its number reduced by one, and the corresponding data block numbers are reduced by one as well. The merging process is defined by the formula:

After the target merged data block and the target statistical information are obtained, they are combined to generate the target feature data block.

Further, based on the first embodiment, a second embodiment of the data processing method of the present invention is provided. In this embodiment, after the bin to be adjusted and the feature data block to be adjusted are adjusted and the adjustment processing result is output, the method further includes:

Step a: computing the information value of each feature bin in the adjustment processing result;

The information value is the IV value (IV stands for Information Value). It measures the predictive power of a variable: the larger the IV value, the better the binning effect it indicates. In this embodiment, the information value of each feature bin in the adjustment processing result is computed.

Specifically, computing the information value of each feature bin in the adjustment processing result comprises:

Step a1: counting the event value and the non-event value of each feature bin in the adjustment processing result to obtain the WOE (weight of evidence) value;

Step a2: obtaining the information value from the event value, the non-event value and the WOE value.

In particular, the following algorithm can be referred to:

Feature: $X$;

Number of bins: $n$, the number of segments into which the sorted feature $X$ is divided;

Samples: $x_i$, $1 \le i \le N$, each representing one data item of the sorted feature $X$;

Binning split points: $S$, comprising $n-1$ distinct values $S_i$, $1 \le i \le n-1$, with $S_i < S_{i+1}$;

Number of events per bin: $T^{event}$, comprising $n$ values $T_i^{event}$, $1 \le i \le n$;

Number of non-events per bin: $T^{non\text{-}event}$, comprising $n$ values $T_i^{non\text{-}event}$, $1 \le i \le n$;

Total number of events: $N^{event} = \sum_{i=1}^{n} T_i^{event}$;

Total number of non-events: $N^{non\text{-}event} = \sum_{i=1}^{n} T_i^{non\text{-}event}$;

WOE value of each bin: $woe_i$, $1 \le i \le n$;

IV value of each bin: $iv_i$, $1 \le i \le n$;

Finally, the binning effect evaluation index, the IV value: $IV = \sum_{i=1}^{n} iv_i$.

From the event values and non-event values in the algorithm above, the WOE value of each target feature bin can be calculated, and the IV value of the target feature bins is then obtained from the WOE values. This IV value is the information value.

Step b: if the information value is greater than or equal to a preset value, determining that the adjustment processing effect is qualified;

Step c: if the information value is less than the preset value, determining that the adjustment processing effect is unqualified.

In this embodiment, the reference standard for the information value is preset and can be set according to actual service requirements. If the information value is greater than or equal to the preset value, the binning effect of the current adjustment processing result is qualified; if it is less than the preset value, the binning effect is unqualified. For example, let the preset value be a and the information value be b. If b is greater than or equal to a, the current binning adjustment shows a clear trend effect, and the system device confirms that the binning effect of the adjustment processing result is qualified; if b is less than a, the trend effect is not obvious, and the system device confirms that the binning effect of the adjustment processing result is unqualified.
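Steps a to c can be sketched as follows. The sketch assumes the commonly used definitions $woe_i = \ln\big((T_i^{event}/N^{event}) / (T_i^{non\text{-}event}/N^{non\text{-}event})\big)$ and $iv_i = (T_i^{event}/N^{event} - T_i^{non\text{-}event}/N^{non\text{-}event}) \cdot woe_i$, which may differ in sign convention from the formulas intended here (the total IV is the same under either convention). The function names, the sample counts and the preset value 0.1 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def woe_iv(events: Sequence[int],
           non_events: Sequence[int]) -> Tuple[List[float], List[float], float]:
    """Return per-bin WOE values, per-bin IV values and the total IV."""
    n_event = sum(events)          # total number of events
    n_non_event = sum(non_events)  # total number of non-events
    woe: List[float] = []
    iv: List[float] = []
    for t_event, t_non_event in zip(events, non_events):
        if t_event == 0 or t_non_event == 0:
            # ln(0) and division by zero are undefined; real systems often smooth here
            raise ValueError("each bin needs at least one event and one non-event")
        p_event = t_event / n_event
        p_non_event = t_non_event / n_non_event
        w = math.log(p_event / p_non_event)
        woe.append(w)
        iv.append((p_event - p_non_event) * w)
    return woe, iv, sum(iv)


def is_qualified(information_value: float, preset_value: float) -> bool:
    """Steps b and c: qualified when the IV reaches the preset value."""
    return information_value >= preset_value


events = [10, 20, 70]       # hypothetical event counts per bin
non_events = [60, 30, 10]   # hypothetical non-event counts per bin
woe, iv, total_iv = woe_iv(events, non_events)
qualified = is_qualified(total_iv, preset_value=0.1)  # 0.1 is an assumed threshold
print(round(total_iv, 4), qualified)
```

Each per-bin IV term is non-negative because the difference of the two proportions and the logarithm of their ratio always share the same sign, so merging or splitting bins can only change how much each bin contributes to the total.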

In addition, an embodiment of the present invention further provides a data processing apparatus, where the data processing apparatus includes:

Optionally, the relationship module comprises:

The adjustment module includes:

Optionally, the cache adjusting unit includes:

Optionally, the cache adjusting subunit is configured to:

Optionally, the data processing apparatus further includes:

Optionally, the statistics module includes:

In addition, an embodiment of the present invention further provides an apparatus, where the apparatus includes: a memory 109, a processor 110 and a data processing program stored on the memory 109 and executable on the processor 110, the data processing program implementing the steps of the embodiments of the data processing method described above when executed by the processor 110.

Furthermore, the present invention also provides a computer storage medium storing one or more programs, which can be further executed by one or more processors for implementing the steps of the embodiments of the data processing method described above.

The specific implementation of the device and the storage medium (i.e., the computer storage medium) of the present invention is basically the same as the embodiments of the data processing method described above, and will not be described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be substantially or partially embodied in the form of a software product, which is stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a device (e.g. mobile phone, computer, server, or network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A data processing method, characterized in that the data processing method comprises:

2. The data processing method according to claim 1, wherein the grouping the feature data blocks of the feature bins according to the binning split points to generate a correspondence between the feature bins and the feature data blocks of the feature bins comprises:

3. The data processing method according to claim 2, wherein the adjusting the to-be-adjusted binning and the to-be-adjusted feature data block in the cache comprises:

4. The data processing method according to claim 3, wherein the performing the cache adjustment processing on the to-be-adjusted binning and the to-be-adjusted feature data block according to the instruction type, the to-be-adjusted binning point, and the binning point comprises:

5. The data processing method according to claim 3, wherein the performing the cache adjustment processing on the to-be-adjusted binning and the to-be-adjusted feature data block according to the instruction type, the to-be-adjusted binning point, and the binning point comprises:

6. The data processing method according to claim 1, wherein after adjusting the to-be-adjusted binning and the to-be-adjusted feature data block and outputting an adjustment processing result, the method further comprises:

7. The data processing method according to claim 6, wherein the counting the information value of each feature bin in the adjustment processing result comprises:

8. A data processing apparatus, characterized in that the data processing apparatus comprises:

9. An apparatus, characterized in that the apparatus comprises: memory, processor and data processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data processing method according to any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium has stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.