CN111144505A - Variable classification method, device, equipment and medium based on dimension slice - Google Patents

Info

Publication number
CN111144505A
Authority
CN
China
Prior art keywords
target
variable
dimension
target variable
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911395277.8A
Other languages
Chinese (zh)
Other versions
CN111144505B (en
Inventor
勾爱利
董超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201911395277.8A priority Critical patent/CN111144505B/en
Publication of CN111144505A publication Critical patent/CN111144505A/en
Application granted granted Critical
Publication of CN111144505B publication Critical patent/CN111144505B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities

Abstract

The application discloses a variable classification method, device, equipment and medium based on dimension slices, and belongs to the field of computers. The method comprises the following steps: selecting a first dimension feature vector and a second dimension feature vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event; determining the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table, each containing m×n grids, wherein the grids of the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence; calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table; determining, from the second target variable distribution table, a target area with the largest area formed by grids whose target variable values meet a preset threshold; and dividing the target area into k areas, so that the user variables are divided into k categories.

Description

Variable classification method, device, equipment and medium based on dimension slice
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, a device, and a medium for variable classification based on dimension slicing.
Background
In the operation scenes of various services, the personnel or policy analysts responsible for operation need to perform differentiated operation for target users, so as to ensure that the operated products can meet the requirements of different types of users to the greatest extent.
Taking a financial business as an example, the policy analyst needs to formulate a risk operation policy. While ensuring that the overdue rate reaches the target value, the number of users who pass the risk operation policy should be as large as possible; the users who pass are then classified, and different rules are formulated for users of different classes. For example, users who pass the risk policy are classified into four categories and a different loan amount is set for each category; schematically, a user with a higher overdue rate is given a lower loan amount. In the related art, users who pass the risk operation policy are classified as follows: the variables that influence the overdue rate are sliced, the sliced variables are marked with a color scale, the largest area of similar colors is found by eye, and the users corresponding to the most similarly colored blocks within that area are then grouped into classes.
Based on the above, classifying users manually yields classification results of low accuracy and poor rationality.
Disclosure of Invention
The embodiment of the application provides a variable classification method, a variable classification device, variable classification equipment and a variable classification medium based on dimension slices, and the problem that the accuracy and the reasonableness of a method for classifying users in a manual mode in the related art are low can be solved. The technical scheme is as follows:
according to an aspect of the present application, there is provided a dimension slice-based variable classification method, the method including:
selecting a first dimension characteristic vector and a second dimension characteristic vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for performing differentiated operation on a user;
determining the number mxn of dimension slices according to the first dimension characteristic vector and the second dimension characteristic vector, and generating a first target variable distribution table and a second target variable distribution table containing mxn grids, wherein the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence, and m and n are positive integers;
calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table;
determining a target area with the largest area, which is formed by lattices corresponding to target variable values meeting a preset threshold value, from the second target variable distribution table;
dividing the target area into k areas, wherein the k areas are used for dividing user variables corresponding to the target area into k categories, and k is a positive integer.
In some embodiments of the present application, said dividing the target region into k regions comprises:
determining a user class number k, wherein the user class number k represents k classes of the user variable, and the user class number k is used for dividing the target area into the k areas;
acquiring P permutation and combination modes for dividing the target area into the k areas, wherein P is a positive integer;
and determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target area into the k areas according to the target permutation and combination mode.
In some embodiments of the present application, said determining a target permutation and combination manner from the P permutation and combination manners, and dividing the target region into the k regions according to the target permutation and combination manner includes:
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
determining the permutation and combination mode with the minimum variance as the target permutation and combination mode;
and dividing the target area into the k areas according to the target permutation and combination mode.
In some embodiments of the present application, the variances comprise intra-class variances, which are variances corresponding between the target variable values belonging to the same class, and inter-class variances, which are variances corresponding between the target variable values belonging to different classes;
wherein the calculating of the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes comprises:
calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the sum of the intra-class variance and the inter-class variance.
In some embodiments of the present application, the method further comprises:
dividing the target area into the k areas according to the target permutation and combination mode;
and obtaining a classification result of the user variable according to the k regions, wherein the classification result comprises the user variable and a target variable value corresponding to the user variable.
In some embodiments of the present application, the method further comprises:
adjusting the shape of the target area according to the classification result;
and re-dividing the target area according to the user class number k, based on the adjusted shape of the target area.
In some embodiments of the present application, the selecting a first dimension feature vector and a second dimension feature vector having a correlation with a target variable includes:
determining the target variable;
calculating an information value corresponding to a feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
and selecting the first dimension characteristic vector and the second dimension characteristic vector which have correlation with the target variable according to the information value corresponding to the characteristic vector.
In some embodiments of the present application, the determining the number of dimension slices, mxn, from the first-dimension feature vector and the second-dimension feature vector comprises:
selecting the dimension slicing mode, wherein the dimension slicing mode comprises at least one of an equal frequency slicing mode and a chi-square slicing mode;
and slicing the feature vectors according to the dimension slicing mode.
In some embodiments of the present application, the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
According to another aspect of the present application, there is provided a dimension slice-based variable classification apparatus, the apparatus including:
a selection module, configured to select a first dimension characteristic vector and a second dimension characteristic vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for performing differentiated operation on a user;
a generating module, configured to determine the number mxn of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generate a first target variable distribution table and a second target variable distribution table that contain mxn lattices, where the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence, and m and n are positive integers;
a calculating module, configured to calculate a user variable corresponding to each lattice in the first target variable distribution table and a target variable value corresponding to each lattice in the second target variable distribution table;
the processing module is used for determining a target area with the largest area formed by lattices corresponding to target variable values meeting a preset threshold from the second target variable distribution table;
and the classification module is used for dividing the target area into k areas, the k areas are used for dividing the user variables corresponding to the target area into k categories, and k is a positive integer.
In some embodiments of the present application, the processing module is configured to determine a user class number k, where the user class number k characterizes k classes of the user variable, and the user class number k is used to divide the target region into the k regions;
the selection module is used for acquiring P permutation and combination modes for dividing the target area into k areas, wherein P is a positive integer;
and the processing module is used for determining a target permutation and combination mode from the P permutation and combination modes and dividing the target area into k areas according to the target permutation and combination mode.
In some embodiments of the present application, the calculating module is configured to calculate a variance of a target variable value corresponding to each permutation and combination manner of the P permutation and combination manners;
the processing module is used for determining the permutation and combination mode with the minimum variance as the target permutation and combination mode; and dividing the target area into k areas according to a target arrangement and combination mode.
In some embodiments of the present application, the variances comprise intra-class variances, which are variances corresponding between the target variable values belonging to the same class, and inter-class variances, which are variances corresponding between the target variable values belonging to different classes;
the calculation module is used for calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes; calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes; calculating the sum of the intra-class variance and the inter-class variance.
In some embodiments of the present application, the processing module is configured to divide the target region into the k regions according to the target permutation and combination manner;
and the selection module is used for obtaining the classification result of the user variable according to the k areas, wherein the classification result comprises the user variable and a target variable value corresponding to the user variable.
In some embodiments of the present application, the processing module is configured to adjust a shape of the target region according to the classification result; and dividing the target area again according to the user class number k according to the shape of the target area.
In some embodiments of the present application, the processing module is configured to determine the target variable;
the calculation module is used for calculating an information value corresponding to a feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
the selecting module is configured to select the first-dimension feature vector and the second-dimension feature vector having a correlation with the target variable according to an information value corresponding to the feature vector.
In some embodiments of the present application, the processing module is configured to select a dimension slicing mode, where the dimension slicing mode includes at least one of an equal frequency slicing mode and a chi-square slicing mode; and slicing the feature vectors according to the dimension slicing mode.
In some embodiments of the present application, the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement a dimension slice based variable classification method as described above.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by a processor to implement a dimension-slice based variable classification method as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
the method comprises the steps of determining the number mxn of dimension slices according to a selected first dimension characteristic vector and a selected second dimension characteristic vector, generating a first target variable distribution table and a second target variable distribution table which contain mxn grids, wherein each grid in the first target variable distribution table corresponds to a user variable, each grid in the second target variable distribution table corresponds to a target variable value, determining a region with the largest area formed by the grids corresponding to the target variable values meeting a preset threshold value as a target region, dividing the target region into k regions, and correspondingly dividing the user variable into k categories by the k regions, wherein m, n and k are positive integers. The purpose of classifying the user variable is achieved by carrying out region division on the delineated target region, so that the classification result is more reasonable and accurate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a dimension slice based variable classification method provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a second target variable distribution table provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a dimension slice based variable classification method provided by another exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a first target variable distribution table provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic illustration of a second target variable distribution table provided by another exemplary embodiment of the present application;
FIG. 6 is a diagram of a second target variable distribution table for determining a target area provided by an exemplary embodiment of the present application;
FIG. 7 is a diagram illustrating a manner of permutation and combination of divided target regions according to an exemplary embodiment of the present application;
FIG. 8 is a flowchart of a dimension slice based variable classification method in connection with information pushed business events provided by an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a dimension slice based variable classification apparatus provided in an exemplary embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are described:
Slicing: dividing a feature vector into smaller vectors or into multiple intervals. Slicing methods include equidistant slicing, equal frequency slicing, and chi-square slicing. Equidistant slicing uses a fixed interval width; for example, slicing the numbers from 1 to 100 equidistantly into 10 groups gives intervals such as [1, 10), [10, 20), [20, 30), [30, 40), and so on. Equal frequency slicing means that every interval contains the same number of items; for example, 50 users correspond to the interval [a1, a3), 50 users to the interval [a3, a5), and 50 users to the interval [a5, a7). Chi-square slicing divides a feature vector according to its chi-square value.
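The difference between equidistant and equal frequency slicing can be illustrated with a minimal Python sketch. The function names and the sample income list below are hypothetical and are not taken from the application; chi-square slicing is omitted.

```python
def equal_width_edges(lo, hi, bins):
    """Return bins + 1 edges that split [lo, hi) into intervals of equal width."""
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]


def equal_frequency_cuts(values, bins):
    """Return bins - 1 interior cut points so each interval holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def assign_bin(x, cuts):
    """Index of the half-open interval containing x, given interior cut points."""
    return sum(1 for c in cuts if x >= c)


# Equidistant slicing: ten intervals of width 10 over [0, 100).
width_edges = equal_width_edges(0, 100, 10)

# Equal frequency slicing: twelve hypothetical incomes split into three slices.
incomes = [2100, 2500, 3000, 4000, 5200, 6100, 7000, 8500, 9000, 10500, 3300, 4700]
freq_cuts = equal_frequency_cuts(incomes, 3)
counts = [0, 0, 0]
for v in incomes:
    counts[assign_bin(v, freq_cuts)] += 1
```

With equal frequency slicing the interval widths adapt to the data, so each slice ends up with the same number of users, whereas equidistant slicing keeps the widths fixed and lets the counts vary.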
Information Value (IV): also called information amount, used to measure how strongly a feature vector influences the target. For example, when a classification model is built with methods such as logistic regression or decision trees, the feature vectors need to be screened so that those able to predict the target accurately are selected. This selection can be made by IV value: the greater the IV value of a feature vector, the higher its correlation with the target and the stronger its ability to predict the target.
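The application does not give a formula for the IV. A commonly used definition sums, over the slices of a feature, the difference between the share of non-event ("good") users and the share of event ("bad") users, multiplied by the logarithm of their ratio. The sketch below assumes that definition; the function name, the smoothing constant and the sample counts are hypothetical.

```python
import math


def information_value(bins):
    """bins: list of (good_count, bad_count) pairs, one pair per slice of a feature."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    k = len(bins)
    iv = 0.0
    for good, bad in bins:
        # Additive smoothing of 0.5 per slice (an assumption) avoids log(0) for empty slices.
        p_good = (good + 0.5) / (total_good + 0.5 * k)
        p_bad = (bad + 0.5) / (total_bad + 0.5 * k)
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


# A feature whose slices separate overdue users well, and one that does not.
strong_feature = information_value([(90, 1), (10, 9)])
weak_feature = information_value([(50, 5), (50, 5)])
```

Under this definition every term is non-negative, so a feature whose slices all have the same overdue share scores zero, and features can be ranked by IV to pick the two dimensions.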
Target variable: a variable corresponding to a business event, where the business event is an event of a business that requires differentiated operation on users. For example, in a financial risk business event, different operation strategies are applied to users with different credit levels; operation here means planning, organizing, implementing and controlling the promotion of a product. In one example, a bank grants different loan amounts to different categories of users, illustratively with the overdue rate as the target variable. Provided that the overdue rate does not exceed the target value, the bank needs to let as many users as possible pass risk assessment, classify the users who pass, and configure different loan strategies for the different classes of users. The embodiments of the application are described using financial risk business events, commodity sales business events and information push business events as examples of business events.
In the refined operation of an enterprise or organization, the operations staff or policy analysts need to judge which users are target users and perform differentiated operation on them. In the related art, the operations staff or policy analysts formulate rules either by analysis in a data-processing application (e.g., the spreadsheet program Excel) or by empirical judgment, and then perform differentiated operation on user groups according to those rules. For example, a policy analyst divides a feature vector of at least one dimension equidistantly, enters the corresponding data into the data-processing application, and has the application mark the data with a color scale, that is, the grids in the spreadsheet whose data meet the target value are marked with the same color or the same family of colors. The largest block of similar colors is then found by eye, and the users in the several similarly colored parts of that block are classified, where a color block consists of at least one grid in the spreadsheet.
When a policy analyst classifies users in this way, it is difficult to guarantee by eye that the user groups corresponding to similarly colored grids really differ, and it is difficult to guarantee that the dimension division is reasonable. As a result, the number of users per grid in the spreadsheet is uneven, and the target value of each user group is not accurate enough.
The embodiment of the application provides a variable classification method based on dimension slices, which can automatically classify users and has reasonable classification results.
Fig. 1 illustrates a dimension slice-based variable classification method provided in an exemplary embodiment of the present application, which includes the following steps:
Step 101, selecting a first dimension feature vector and a second dimension feature vector having correlation with a target variable, where the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for performing differentiated operation on a user.
Optionally, the business event comprises at least one of a financial risk business event, a merchandise sales business event, and an information push business event. Illustratively, the financial risk business event includes at least one of a banking business event, a securities business event, a trust business event, and an insurance business event. Illustratively, the information push service event includes at least one of an information push service event of a social application program, an information push service event of a shopping application program, an information push service event of a music application program, an information push service event of a video application program, and an information push service event of a predetermined application program. Optionally, the target variables include overdue rate, credit, promotional activity, time of information push, frequency of information push. Illustratively, when the business event is a financial risk business event, the objective variable includes at least one of overdue rate, credit, indemnity, risk level, and commitment rate.
In one example, the target variable is overdue rate, and the first-dimension feature vector and the second-dimension feature vector having a correlation with the target variable are income of the loan user and credit investigation of the loan user, respectively.
Step 102, determining the number mxn of dimension slices according to the first dimension eigenvector and the second dimension eigenvector, and generating a first target variable distribution table and a second target variable distribution table containing mxn grids, wherein the first target variable distribution table corresponds to the second target variable distribution table one by one, and m and n are both positive integers.
Illustratively, m is 3 and n is 5. The number of dimension slices corresponding to the first dimension feature vector is 3, and the number of dimension slices corresponding to the second dimension feature vector is 5, so that the generated first target variable distribution table and second target variable distribution table have 3 rows and 5 columns of grids, or 5 rows and 3 columns of grids. Optionally, the first target variable distribution table is a distribution table corresponding to a user variable, and the second target variable distribution table is a distribution table having a target variable value corresponding to the user variable. The user variable refers to the distribution of the number of users in an interval with a certain dimension of feature vectors, the target variable value refers to the value of a target variable corresponding to the user variable, if the target variable is an overdue rate, the user variable refers to the distribution of the number of users when the overdue rate is between 0.001 and 0.006, and the target variable value refers to the distribution of the number of overdue users when the overdue rate is between 0.001 and 0.006.
In one example, if the target variable is an overdue rate, the first-dimension feature vector is an income condition of the loan user, the second-dimension feature vector is a credit condition of the loan user, the number of dimension slices corresponding to the first-dimension feature vector is 3, and the number of dimension slices corresponding to the second-dimension feature vector is 5, a first target variable distribution table of 3 × 5 grids and a second target variable distribution table of 3 × 5 grids are generated. Alternatively, the first target variable distribution table and the second target variable distribution table are each a 3-row 5-column table. The first target variable distribution table comprises the number of loan users with different incomes, and the second target variable distribution table is the overdue rate of the loan users with different credit investigation conditions.
Step 103, calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table.
Illustratively, the user variable is the number of loan users in each interval, and the target variable value is the ratio of the number of overdue users in an interval to the total number of users in that interval.
The first target variable distribution table is shown in table one, and the second target variable distribution table is shown in table two.
Table 1
Income \ Credit score   [100,300)   [300,500)   [500,700)   [700,900)   [900,1100)
[2000,5000)             500         400         6000        3000        5000
[5000,8000)             510         350         200         150         99
[8000,11000)            800         500         350         230         20
Table 2
Income \ Credit score   [100,300)   [300,500)   [500,700)   [700,900)   [900,1100)
[2000,5000)             0.006       0.0025      0.001       0.001       0.05
[5000,8000)             0.0059      0.0003      0.025       0.1         0.1
[8000,11000)            0.005       0.002       0.0028      0.0043      0.05
It should be noted that each row in Table 1 and Table 2 is divided according to the income of the loan user, and each column is divided according to the credit score of the loan user. The value 500 in Table 1 indicates that 500 loan users have an income in [2000, 5000) (unit: yuan) and a credit score in [100, 300) (unit: points). The value 0.006 in Table 2 indicates that, among the loan users with an income in [2000, 5000) (unit: yuan) and a credit score in [100, 300) (unit: points), overdue users account for a proportion of 0.006 of the total number of loan users in that interval.
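Steps 102 and 103 can be sketched as follows, assuming that each user record carries an income, a credit score and an overdue flag. The record layout, the names `build_tables` and `cell_index`, and the sample records are hypothetical; the slice edges are the ones used in the two tables above.

```python
from bisect import bisect_right

INCOME_EDGES = [2000, 5000, 8000, 11000]         # m = 3 income slices
SCORE_EDGES = [100, 300, 500, 700, 900, 1100]    # n = 5 credit score slices


def cell_index(value, edges):
    """Index i such that edges[i] <= value < edges[i + 1], or None if out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def build_tables(records):
    """records: iterable of (income, credit_score, is_overdue) tuples."""
    m, n = len(INCOME_EDGES) - 1, len(SCORE_EDGES) - 1
    users = [[0] * n for _ in range(m)]      # first table: user count per grid
    overdue = [[0] * n for _ in range(m)]    # helper: overdue user count per grid
    for income, score, is_overdue in records:
        i = cell_index(income, INCOME_EDGES)
        j = cell_index(score, SCORE_EDGES)
        if i is None or j is None:
            continue  # record falls outside the sliced ranges
        users[i][j] += 1
        overdue[i][j] += int(is_overdue)
    # Second table: overdue rate per grid; None marks grids without users.
    rates = [
        [overdue[i][j] / users[i][j] if users[i][j] else None for j in range(n)]
        for i in range(m)
    ]
    return users, rates


sample = [(2500, 150, False), (2500, 150, True), (6000, 350, False), (9000, 1000, True)]
user_table, rate_table = build_tables(sample)
```

Both tables share the same m×n layout, which is what makes the grids of the first and second target variable distribution tables correspond one to one.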
Step 104, determining a target area with the largest area formed by grids corresponding to the target variable values meeting the preset threshold from the second target variable distribution table.
In one example, the preset threshold is 0.006. The grids whose target variable values meet the threshold are identified in Table two, and the target area with the largest area formed by these grids is determined, as shown in fig. 2 (the area formed by gray grids in the table).
The target area is composed of grids in which the income of the loan user is in the interval [2000, 11000) and the credit rating score of the loan user is in the interval [100, 900), and its shape is any one of a rectangle, a square, a polygon and an irregular figure. In the embodiments of the present application, the target area is illustrated as a rectangle: it consists of the grids in which the income of the loan user is in the interval [2000, 11000) and the credit rating score of the loan user is in the interval [100, 500).
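For the rectangular case, finding the target area can be sketched as a search for the largest rectangle whose grids all meet the threshold. The sketch below is an assumption-laden illustration rather than the claimed implementation: it treats "meets the threshold" as "value less than or equal to the threshold" and uses a brute-force search, and the function name is hypothetical. The values are copied from Table two.

```python
# Target variable values copied from Table two
# (rows: income slices, columns: credit rating score slices).
TABLE_TWO = [
    [0.006, 0.0025, 0.001, 0.001, 0.05],
    [0.0059, 0.0003, 0.025, 0.1, 0.1],
    [0.005, 0.002, 0.0028, 0.0043, 0.05],
]


def largest_rectangle(table, threshold):
    """Return (area, top, left, bottom, right) of the largest all-passing rectangle.

    A grid passes when its value <= threshold (assumed reading of the threshold test).
    Brute force over all rectangles, which is adequate for small m x n tables.
    """
    rows, cols = len(table), len(table[0])
    ok = [[table[r][c] <= threshold for c in range(cols)] for r in range(rows)]
    best = (0, -1, -1, -1, -1)
    for top in range(rows):
        for left in range(cols):
            for bottom in range(top, rows):
                for right in range(left, cols):
                    if all(ok[r][c] for r in range(top, bottom + 1)
                           for c in range(left, right + 1)):
                        area = (bottom - top + 1) * (right - left + 1)
                        if area > best[0]:
                            best = (area, top, left, bottom, right)
    return best


result = largest_rectangle(TABLE_TWO, 0.006)
```

Here `result` is `(6, 0, 0, 2, 1)`: rows 0 to 2 and columns 0 to 1, that is, income in [2000, 11000) and credit rating score in [100, 500), which matches the rectangle described above.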
Step 105, dividing the target area into k areas, wherein the k areas are used for dividing the user variables corresponding to the target area into k categories, and k is a positive integer.
In one example, k is 4, and the target area determined in step 104 is divided into 4 areas, which are used to divide the user variables corresponding to the target area into 4 categories. Optionally, there are multiple ways to divide the target area into 4 areas. Illustratively, the area composed of the grids in which the income of the loan user is in the interval [2000, 5000) and the credit rating score of the loan user is in the interval [100, 300) is the first area; the area composed of the grids in which the income is in the interval [5000, 8000) and the credit rating score is in the interval [100, 500) is the second area; the area composed of the grids in which the income is in the interval [8000, 11000) and the credit rating score is in the interval [100, 300) is the third area; and the area composed of the grids in which the income is in the interval [8000, 11000) and the credit rating score is in the interval [300, 500) is the fourth area.
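Once the areas are fixed, assigning a loan user to a category is a simple interval lookup. The following sketch encodes the four areas exactly as listed in the example; the names `REGIONS` and `classify` are hypothetical, and a user not covered by any listed area is simply reported as `None`.

```python
# The four areas from the example, as (income_low, income_high, score_low, score_high).
# Intervals are left-closed, right-open; category numbers 1 to 4 follow the text.
REGIONS = {
    1: (2000, 5000, 100, 300),
    2: (5000, 8000, 100, 500),
    3: (8000, 11000, 100, 300),
    4: (8000, 11000, 300, 500),
}


def classify(income, score):
    """Return the category of a loan user, or None if no listed area covers the user."""
    for category, (i_lo, i_hi, s_lo, s_hi) in REGIONS.items():
        if i_lo <= income < i_hi and s_lo <= score < s_hi:
            return category
    return None
```

For instance, a user with income 6000 and credit rating score 350 falls into the second area, so `classify(6000, 350)` returns 2.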
In summary, in the method provided in this embodiment, the number m × n of dimension slices is determined according to the selected first-dimension feature vector and the selected second-dimension feature vector, and a first target variable distribution table and a second target variable distribution table each containing m × n grids are generated, where each grid in the first target variable distribution table corresponds to a user variable and each grid in the second target variable distribution table corresponds to a target variable value. The region with the largest area formed by the grids corresponding to the target variable values meeting a preset threshold is determined as the target region, the target region is divided into k regions, and the k regions correspondingly divide the user variable into k categories, where m, n, and k are positive integers. The user variable is classified by dividing the delineated target region into regions, so that the classification result is more reasonable and accurate.
Fig. 3 shows a flowchart of a dimension slice-based variable classification method according to another exemplary embodiment of the present application. The method comprises the following steps:
optionally, the business event comprises at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
Optionally, selecting the first dimension feature vector and the second dimension feature vector having a correlation with the target variable includes the following steps:
Step 301, determining a target variable.
Illustratively, the business event is a merchandise sales business event, and the target variable is a condition of the user participating in a merchant sales promotion. Optionally, the feature vector having a correlation with the target variable comprises at least one of the following vectors: whether the user has purchased the last month, the amount the user has purchased the last time, the type of goods the user has purchased the last time, and whether the user is a member of the merchant.
Step 302, calculating an information value corresponding to the feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable.
The calculation formula of the information value is as follows:
$$\mathrm{WOE}_i = \ln\frac{py_i}{pn_i} = \ln\frac{\#y_i / \#y_T}{\#n_i / \#n_T}$$
$$\mathrm{IV}_i = (py_i - pn_i) \times \mathrm{WOE}_i$$
$$\mathrm{IV} = \sum_{i=1}^{n} \mathrm{IV}_i$$
where WOE is the weight of evidence; py_i is the proportion of the responding users in group i among all responding users (e.g., in a merchandise sales business event, the users in group i who participate in the merchant's sales promotion); pn_i is the proportion of the non-responding users in group i among all non-responding users (e.g., in a merchandise sales business event, the users in group i who do not participate in the merchant's sales promotion); #y_i is the number of responding users in group i (e.g., the number of users in group i who participate in the merchant's sales promotion); #n_i is the number of non-responding users in group i (e.g., the number of users in group i who do not participate in the merchant's sales promotion); #y_T is the number of all responding users (e.g., the number of all users who participate in the merchant's sales promotion); #n_T is the number of all non-responding users (e.g., the number of all users who do not participate in the merchant's sales promotion); and n is the number of variable groups. IV_i is the information value of the i-th group, and IV is the information value of the whole feature vector. The IV_i of each group is calculated from its WOE value, and the IV_i values of all groups are then summed to obtain the IV value of the feature vector.
The larger the IV value of a feature vector, the higher the correlation of the feature vector with the target variable.
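A minimal Python sketch of the information value computation follows. It assumes the standard weight-of-evidence definitions, namely WOE_i = ln(py_i / pn_i) with py_i = #y_i / #y_T and pn_i = #n_i / #n_T, IV_i = (py_i − pn_i) × WOE_i, and IV equal to the sum of the IV_i. The function name and the counts in the usage lines are hypothetical.

```python
import math


def information_value(groups):
    """groups: list of (responders, non_responders) counts, one pair per group.

    Assumes every count is positive so that the logarithm is defined.
    """
    y_total = sum(y for y, _ in groups)
    n_total = sum(n for _, n in groups)
    iv = 0.0
    for y, n in groups:
        py = y / y_total          # share of all responders that fall in this group
        pn = n / n_total          # share of all non-responders that fall in this group
        woe = math.log(py / pn)   # weight of evidence of the group
        iv += (py - pn) * woe     # IV_i is never negative
    return iv


iv_member = information_value([(10, 90), (30, 70)])   # hypothetical counts
iv_flat = information_value([(10, 20), (30, 60)])     # same distribution in both classes
```

A feature whose groups have the same share of responders and non-responders, as in `iv_flat`, carries no information about the target variable and yields an IV of 0, while `iv_member` is about 0.42.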
Step 303, selecting a first-dimension feature vector and a second-dimension feature vector which have correlation with the target variable according to the information value corresponding to the feature vector.
Illustratively, the IV value corresponding to whether the user is a member of the merchant is greater than the IV value corresponding to the category of goods most recently purchased by the user, which is greater than the IV value corresponding to the amount of the user's most recent purchase, which in turn is greater than the IV value corresponding to whether the user made a purchase in the last month. Whether the user is a member of the merchant is therefore selected as the first-dimension feature vector, and the category of goods most recently purchased by the user is selected as the second-dimension feature vector.
Step 304, determining the number m × n of dimension slices according to the first-dimension feature vector and the second-dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table each containing m × n grids, wherein the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence, and m and n are both positive integers.
Optionally, the dimension slicing of the feature vector further comprises the steps of:
Step 3041, selecting a dimension slicing mode, where the dimension slicing mode includes at least one of an equal frequency slicing mode and a chi-square slicing mode.
Step 3042, slice the feature vectors according to a dimension slicing manner.
Taking the business event as an example of a commodity sales event, the target variable is whether the user participates in the sales promotion of the merchant, the first-dimension feature vector is whether the user is a member of the merchant, and the second-dimension feature vector is a commodity which is purchased by the user last time. Illustratively, chi-square slices are selected as the dimension slicing mode. And respectively calculating chi-square values of the first-dimension characteristic vector and the second-dimension characteristic vector, and carrying out dimension slicing on the first-dimension characteristic vector and the second-dimension characteristic vector according to the chi-square values.
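The example above selects chi-square slicing. For comparison, the other mode named in step 3041, equal frequency slicing, can be sketched in a few lines. This is only an illustration under simple assumptions (numeric feature values, cut points taken directly from the sorted values); the function names and the sample amounts are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, slices):
    """Return slices-1 cut points so that each slice holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[i * len(ordered) // slices] for i in range(1, slices)]


def slice_index(value, cuts):
    """Index of the slice a value falls into (slices are left-closed, right-open)."""
    return bisect_right(cuts, value)


amounts = list(range(1, 13))              # hypothetical purchase amounts 1..12
cuts = equal_frequency_cuts(amounts, 3)   # two cut points for three slices
sizes = [0, 0, 0]
for a in amounts:
    sizes[slice_index(a, cuts)] += 1
```

With twelve distinct amounts and three slices, the cut points are 5 and 9, and each slice receives four values.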
Step 305, calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table.
Taking the commodity sales business event as an example, as shown in fig. 4 and 5, each grid in the first target variable distribution table corresponds to the number of users participating in the sales promotion of the merchant in each section, and each grid in the second target variable distribution table corresponds to the ratio of the number of users participating in the sales promotion of the merchant in each section to the total number of users.
Step 306, determining a target area with the largest area formed by lattices corresponding to the target variable values meeting the preset threshold from the second target variable distribution table.
Illustratively, as shown in fig. 6, the preset threshold is 0.006, and the grids whose target variable values are smaller than 0.006 are identified (the grids marked with dots in fig. 6). The target region is the region composed of the grids corresponding to the interval [a1, a8) of feature vector a and the interval [b1, b5) of feature vector b.
Step 307, determining a user class number k, where the user class number k represents k classes of the user variable, and the user class number k is used to divide the target region into k regions.
Step 308, P permutation and combination modes for dividing the target region into k regions are obtained, where P is a positive integer.
Optionally, there are various forms of dividing the target region into k regions. Fig. 7 (a) and fig. 7 (b) each show one of these forms, in which grids with the same shading belong to the same class.
Step 309, calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes.
Illustratively, the variance of the value of the target variable in the classification form as shown in fig. 7 (a) is calculated, and the variance of the value of the target variable in the classification form as shown in fig. 7 (b) is calculated.
In step 310, the permutation and combination method with the minimum variance is determined as the target permutation and combination method.
Illustratively, the variance of the values of the target variables in the classification form shown in fig. 7 (a) is smaller than the variance of the values of the target variables in the classification form shown in fig. 7 (b), and therefore the classification form shown in fig. 7 (a) is the target permutation and combination.
Step 311, the target region is divided into k regions according to the target permutation and combination mode.
The target region is divided into k regions in the manner as shown in fig. 7 (a).
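Steps 309 to 311 can be sketched as scoring each candidate division by the variance of the target variable values inside its regions and keeping the candidate with the smallest score. The sketch below is an assumption: it sums the population variance of each region, and it borrows the 3 × 2 target area of the loan example (values from Table two) with two hypothetical candidate divisions for k = 2, because the values behind fig. 7 are not reproduced in the text.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def partition_score(table, partition):
    """Sum of the variances of the target variable values inside each region."""
    return sum(variance([table[r][c] for r, c in region]) for region in partition)


def pick_partition(table, candidates):
    """Return the index of the candidate partition with the smallest score."""
    scores = [partition_score(table, p) for p in candidates]
    return scores.index(min(scores))


# Target area of the loan example: 3 income slices x 2 credit rating score slices.
region_values = [[0.006, 0.0025], [0.0059, 0.0003], [0.005, 0.002]]
# Two hypothetical ways of dividing the area into k = 2 regions, as lists of (row, column).
by_column = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]
by_row = [[(0, 0), (0, 1)], [(1, 0), (1, 1), (2, 0), (2, 1)]]
best = pick_partition(region_values, [by_column, by_row])
```

Splitting by column groups similar overdue rates together, so `best` is 0 (the `by_column` candidate).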
In summary, in the method provided in this embodiment, the first dimension feature vector and the second dimension feature vector are determined by calculating the information value corresponding to the feature vector, and the first target variable distribution table and the second target variable distribution table corresponding to m × n of the number of dimension slices are generated according to the first dimension feature vector and the second dimension feature vector. And determining a region with the largest area formed by grids corresponding to the target variable values meeting the preset threshold value from the second target variable distribution table as a target region, and dividing the target region into k regions according to the number k of the user classes, namely k user classes. The target area is divided into k areas, P permutation and combination modes are provided (m, n, k and P are positive integers), the target permutation and combination mode is determined according to the minimum variance of target variable values of each permutation and combination mode, and the target area is divided according to the target permutation and combination mode. The method provided by the embodiment automatically determines the target area according to the feature vector, and divides the target area into k areas corresponding to the user class number k according to the minimum variance of the target variable value, so that the classification of the user variable is more reasonable and accurate.
The variable classification method based on the dimension slice is described below with reference to an information push service event. FIG. 8 is a flowchart illustrating a dimension slice-based variable classification method in financial risk business events according to an exemplary embodiment of the present application. The method comprises the following steps:
Step 801, select a target variable.
Illustratively, the target variable is information push frequency, and in the application program, the information push frequency is counted by the amount of information pushed every day.
Step 802, determining two dimensions for maximally distinguishing users according to the information values.
Optionally, two feature vectors with the largest information value are selected as the first-dimension feature vector and the second-dimension feature vector from among the feature vectors. Illustratively, the first dimension feature vector is the age of the user using the application, and the second dimension feature vector is the frequency with which the user uses the application (e.g., the time the user spends on the application each day).
Step 803, select the dimension slice number m × n.
Step 804, a slicing mode is selected.
Here m and n are positive integers. Illustratively, the slicing mode is equidistant slicing, the number of slices corresponding to the first-dimension feature vector is m, and the number of slices corresponding to the second-dimension feature vector is n.
Step 805, generate an m × n table.
The m × n table has m × n cells corresponding to m × n user groups.
Step 806a, calculating the user group target variable in the m × n table.
The m × n table includes the intervals of the first-dimension feature vector and the second-dimension feature vector after dimension slicing, and the user group target variable in each of the m × n grids is calculated from these intervals, as shown in fig. 5. Illustratively, the range of the first-dimension feature vector is [8, 80) (unit: years) and the range of the second-dimension feature vector is [0, 24) (unit: hours). The number in each grid of the table shown in fig. 5 is the ratio of the number of users in the interval who do not use the application program to the total number of users in the interval. For example, if the obtained user information shows 500 users whose age is in [8, 18) and whose usage frequency is in [0, 2), of whom 2 do not use the application program, the value of the user group target variable for that grid is 2/500 = 0.004.
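With equidistant slicing, locating the grid of a user reduces to integer arithmetic on the value ranges given above. The sketch below assumes equal-width slices over age [8, 80) and daily usage [0, 24); the slice counts m = 9 and n = 12 and the function names are hypothetical choices made only for illustration.

```python
def equidistant_index(value, low, high, slices):
    """Index of the equal-width slice of [low, high) containing value, or None if outside."""
    if not low <= value < high:
        return None
    # min() guards against floating-point rounding at the upper edge
    return min(int((value - low) * slices / (high - low)), slices - 1)


def user_cell(age, hours, m, n):
    """Map a user to a (row, column) grid of the m x n table."""
    row = equidistant_index(age, 8, 80, m)      # age range [8, 80) years
    col = equidistant_index(hours, 0, 24, n)    # daily usage range [0, 24) hours
    return None if row is None or col is None else (row, col)


cell = user_cell(age=17, hours=1.5, m=9, n=12)   # m and n are hypothetical
```

With these choices each age slice is 8 years wide and each usage slice is 2 hours wide, so a 17-year-old user with 1.5 hours of daily use lands in grid `(1, 0)`.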
Step 806b, calculate the user amount of the user population in the m × n table.
And calculating the user quantity of the user group in the m multiplied by n table according to the corresponding intervals of the first dimension characteristic vector and the second dimension characteristic vector. Optionally, the user amount may be obtained through a questionnaire or through an account of an application program registered by the user, and the manner of obtaining the user amount is not limited in the present application.
In step 807, all the cells within the m × n population that meet the target threshold are determined based on the set target threshold.
Illustratively, if the target threshold (i.e., the preset threshold) is 0.006, then the grid with the target variable value smaller than 0.006 is identified, as shown in fig. 6 (the grid with dotted shading in the table). Optionally, the target threshold is adjusted according to the user pass and fail fractions.
Step 808, outputting the largest rectangle, namely the rectangle containing the largest number of grids that satisfy the target threshold.
In one example, the largest rectangle is the area formed by the grids corresponding to the interval [a1, a8) and the interval [b1, b5).
It should be noted that the grids satisfying the target threshold are often scattered across the m × n grids, whereas a regular rectangle needs to be delineated so that the rules remain configurable. Therefore, the grids corresponding to the intervals [a1, a3) and [b3, b5) are also included in the largest rectangle.
In the actual process, the size of the grid can be adjusted by dividing the fault tolerance rate of the grid, and the formula of the fault tolerance rate is as follows:
Figure BDA0002346123140000151
where ftr is the fault tolerance; cnty_i is the user amount in each grid in which the i-th target value passes (i.e., the number of users in the grids with dotted shading); cntn_i is the user amount in each grid in which the i-th target value does not pass (i.e., the number of users in the redundant grids without dotted shading that are included when the large rectangle is delineated, such as the grid corresponding to "0.007" shown in fig. 6); gy_i is the number of the i-th passing grids (i.e., the number of grids with dotted shading); and gn_i is the number of the i-th non-passing grids (i.e., the number of redundant grids without dotted shading that are included when the large rectangle is delineated).
In the actual operation of a business event, if the number of non-passing users brought in by the redundant grids does not need to be considered, the following weight may be omitted:
Figure BDA0002346123140000152
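The quantities that enter the fault tolerance (passing and non-passing user amounts, passing and non-passing grid counts) can be tallied for any candidate rectangle as sketched below. The sketch only computes these inputs as totals over the rectangle and does not attempt to reproduce the fault tolerance formula itself. It reuses the loan example (Table one and Table two) as stand-in data, treats a grid as passing when its value is less than or equal to the threshold, and uses a hypothetical function name; all three are assumptions.

```python
# Values copied from Table one (user counts) and Table two (overdue rates).
COUNTS = [
    [500, 400, 6000, 3000, 5000],
    [510, 350, 200, 150, 99],
    [800, 500, 350, 230, 20],
]
RATES = [
    [0.006, 0.0025, 0.001, 0.001, 0.05],
    [0.0059, 0.0003, 0.025, 0.1, 0.1],
    [0.005, 0.002, 0.0028, 0.0043, 0.05],
]


def tally(counts, rates, threshold, top, left, bottom, right):
    """Return (passing users, non-passing users, passing grids, non-passing grids)
    for the rectangle with inclusive corner indices (top, left) and (bottom, right)."""
    cnt_y = cnt_n = g_y = g_n = 0
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if rates[r][c] <= threshold:   # assumed: "passes" means value <= threshold
                cnt_y += counts[r][c]
                g_y += 1
            else:                          # redundant grid pulled in by the rectangle
                cnt_n += counts[r][c]
                g_n += 1
    return cnt_y, cnt_n, g_y, g_n


totals = tally(COUNTS, RATES, 0.006, 0, 0, 2, 3)
```

Widening the rectangle to credit rating scores in [100, 900) brings in 2 redundant grids with 350 users alongside 10 passing grids with 12640 users, which is the kind of trade-off the fault tolerance is meant to control.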
Step 809, input the class number k of the user cluster.
Illustratively, k is 4. Optionally, after the target variables of the user group are classified into 4 types, each grid identifier may be classified according to a threshold of each type.
Step 810, outputting a plurality of clustered regular rectangles with different color identifications.
Alternatively, the output rectangular area may be color coded or have a shading pattern. According to the clustering number (namely the user class number) of the users, for example, the space of the circled rectangle is divided into 4 rectangles, and the differentiation strategy rule is executed for each rectangle.
Optionally, the target region is divided into k rectangles in P permutation and combination modes, where k and P are both positive integers. Selecting the target permutation and combination mode from the P modes requires calculating the variance of the target variable values corresponding to each mode. Optionally, the variance includes an intra-class variance and an inter-class variance, and the method includes the following steps:
S1, calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode of the P permutation and combination modes.
S2, calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes.
S3, calculating the sum of the intra-class variance and the inter-class variance.
To serve the purpose of the differentiation strategy, the user magnitude of each cluster also needs to be taken into account, that is, a user-magnitude factor is added so that the variance of the user magnitude within each class is also minimized. The calculation formula is as follows:
Figure BDA0002346123140000161
where VF is a function for judging the quality of a clustering result, and the smaller its value, the better; Var_cnt_i is the variance of the ratio of the user amount of the i-th class to the user amount of the whole group; Var_rate_i is the variance of the target value (e.g., overdue rate) of the users of the i-th class; λ is a user-defined weight for Var_cnt_i, which can be set to 0 if it is acceptable for the user amounts of the groups in the clustering result to be uneven, so that the proportions of the various user groups differ greatly; and n is a positive integer.
S4, determining the permutation and combination mode with the minimum variance sum as the target permutation and combination mode.
S5, dividing the target area into k areas according to the target permutation and combination mode, wherein k is a positive integer.
S6, obtaining the classification result of the user variable according to the k areas, wherein the classification result comprises the user variable and the target variable value corresponding to the user variable.
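Steps S1 to S3 can be sketched as follows for a single permutation and combination mode. The definitions used here are assumptions: the intra-class variance is taken as the sum of the population variances of the target variable values within each class, and the inter-class variance as the population variance of the class means. The user-magnitude factor and the VF function are deliberately left out, and the function names and toy values are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_sum(classes):
    """classes: list of lists of target variable values, one list per class.

    Returns (intra, inter, intra + inter) under the assumed definitions:
    intra = sum of per-class variances, inter = variance of the class means.
    """
    intra = sum(variance(values) for values in classes)
    class_means = [sum(values) / len(values) for values in classes]
    inter = variance(class_means)
    return intra, inter, intra + inter


intra, inter, total = variance_sum([[1.0, 3.0], [5.0, 7.0]])   # toy values
```

For the toy input, each class has variance 1, so the intra-class part is 2; the class means are 2 and 6, so the inter-class part is 4 and the sum is 6. Repeating this for every one of the P modes gives the values compared in step S4.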
Optionally, the foregoing steps further include: 1. adjusting the shape of the target area according to the classification result; 2. dividing the target area again according to the user class number k and the adjusted shape of the target area. The rectangular distribution is adjusted according to whether each user cluster is required to have a certain user magnitude.
Step 811, outputting the user statistics and target variable reports which pass or fail the rule, the clustering strategy rules of different users and the user target variable reports in the rules.
In one example, the policy analyst determines to push 6 pieces of information per day to users aged 20-25, 4 pieces of information per day to users aged 25-30, and 2 pieces of information per day to users aged 40-50.
In summary, in the method provided in this embodiment, a maximum rectangle is defined from the lattices corresponding to the target variable values that meet the target threshold, and the maximum rectangle is divided into a plurality of clusters by inputting the number of the clusters of the user, that is, the user group is divided.
The following are embodiments of the apparatus of the present application, and for details that are not described in detail in the embodiments of the apparatus, reference may be made to corresponding descriptions in the above method embodiments, and details are not described herein again.
Fig. 9 shows a schematic structural diagram of a variable classification apparatus based on dimension slices according to an exemplary embodiment of the present application. The apparatus can be implemented as all or a part of a terminal by software, hardware or a combination of both, and includes:
a selecting module 910, configured to select a first-dimension feature vector and a second-dimension feature vector having a correlation with a target variable, where the target variable is a variable corresponding to a service event, and the service event is an event corresponding to a service for performing differentiated operation on a user;
a generating module 920, configured to determine the number m × n of the dimension slices according to the first-dimension feature vector and the second-dimension feature vector, and generate a first target variable distribution table and a second target variable distribution table each containing m × n grids, where the first target variable distribution table corresponds to the second target variable distribution table one to one, and m and n are positive integers;
a calculating module 930, configured to calculate a user variable corresponding to each lattice in the first target variable distribution table and a target variable value corresponding to each lattice in the second target variable distribution table;
a processing module 940, configured to determine, from the second target variable distribution table, a target region with a largest area, where the target region is composed of lattices corresponding to target variable values that satisfy a preset threshold;
the classification module 950 is configured to divide the target region into k regions, where the k regions are used to divide the user variables corresponding to the target region into k categories, where k is a positive integer.
In an optional embodiment, the processing module 940 is configured to determine a user class number k, where the user class number k represents k categories of the user variable, and the user class number k is used to divide the target region into k regions;
the selecting module 910 is configured to obtain P permutation and combination manners for dividing the target region into k regions, where P is a positive integer;
the processing module 940 is configured to determine a target permutation and combination manner from the P permutation and combination manners, and divide the target region into k regions according to the target permutation and combination manner.
In an optional embodiment, the calculating module 930 is configured to calculate a variance of a target variable value corresponding to each permutation and combination manner of the P permutation and combination manners;
the processing module 940 is configured to determine the permutation and combination method with the minimum variance as the target permutation and combination method; and dividing the target area into k areas according to a target arrangement and combination mode.
In an alternative embodiment, the variances include intra-class variances, which are variances corresponding between target variable values belonging to the same class, and inter-class variances, which are variances corresponding between target variable values belonging to different classes;
the calculating module 930 is configured to calculate an intra-class variance of the target variable value corresponding to each permutation and combination manner of the P permutation and combination manners; calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes; the sum of the intra-class variance and the inter-class variance is calculated.
In an optional embodiment, the processing module 940 is configured to divide the target region into k regions according to the target permutation and combination manner;
the selecting module 910 is configured to obtain a classification result of the user variable according to the k regions, where the classification result includes the user variable and a target variable value corresponding to the user variable.
In an alternative embodiment, the processing module 940 is configured to adjust the shape of the target region according to the classification result; and dividing the target area again according to the user class number k according to the shape of the target area.
In an alternative embodiment, the processing module 940 is configured to determine a target variable;
the calculating module 930 is configured to calculate an information value corresponding to the feature vector according to the target variable, where the information value is used to represent a correlation between the feature vector and the target variable;
the selecting module 910 is configured to select a first-dimension feature vector and a second-dimension feature vector having a correlation with a target variable according to an information value corresponding to the feature vector.
In an optional embodiment, the processing module 940 is configured to select a dimension slicing mode, where the dimension slicing mode includes at least one of an equal frequency slicing mode and a chi-square slicing mode; and slicing the feature vectors according to a dimension slicing mode.
In an alternative embodiment, the business events include: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
Referring to fig. 10, a block diagram of a computer device 1000 according to an exemplary embodiment of the present application is shown. The computer device 1000 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player. The computer device 1000 may also be referred to by other names such as user equipment, portable terminal, etc.
Generally, the computer device 1000 includes: a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be tangible and non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement the dimension slice based variable classification methods provided herein.
In some embodiments, the computer device 1000 may further optionally include: a peripheral interface 1003 and at least one peripheral. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, touch screen display 1005, camera 1006, audio circuitry 1007, positioning components 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The touch display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display screen 1005 can also capture touch signals on or above its surface. The touch signal may be input to the processor 1001 as a control signal for processing. The touch display screen 1005 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display screen 1005, disposed on the front panel of the computer device 1000; in other embodiments, there may be at least two touch display screens 1005, respectively disposed on different surfaces of the computer device 1000 or in a folded design; in still other embodiments, the touch display screen 1005 may be a flexible display, disposed on a curved surface or a folded surface of the computer device 1000. The touch display screen 1005 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The touch display screen 1005 may be made of a material such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for shooting pictures or videos. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, and a wide-angle camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize a panoramic shooting function and a VR (Virtual Reality) shooting function. In some embodiments, the camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 1007 is used to provide an audio interface between a user and the computer device 1000. The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves from the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 1001 for processing or to the radio frequency circuit 1004 for voice communication. For stereo sound acquisition or noise reduction, there may be multiple microphones disposed at different locations of the computer device 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into a sound wave audible to humans, or into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1007 may also include a headphone jack.
The positioning component 1008 is used to locate the current geographic location of the computer device 1000 for navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1009 is used to supply power to the various components in the computer device 1000. The power supply 1009 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charge technology.
In some embodiments, the computer device 1000 also includes one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
The acceleration sensor 1011 detects the magnitude of acceleration on the three coordinate axes of a coordinate system established with respect to the computer device 1000. For example, the acceleration sensor 1011 is configured to detect the components of the gravitational acceleration on the three coordinate axes. The processor 1001 may control the touch display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used to acquire motion data of a game or a user.
The gyro sensor 1012 may detect the body direction and rotation angle of the computer device 1000, and may cooperate with the acceleration sensor 1011 to collect the user's 3D motion with respect to the computer device 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization during photographing, game control, and inertial navigation.
The pressure sensor 1013 may be disposed on a side frame of the computer device 1000 and/or on a lower layer of the touch display screen 1005. When the pressure sensor 1013 is disposed on a side frame of the computer device 1000, it can detect the user's holding signal on the computer device 1000, and left-right hand recognition or shortcut operations can be performed based on the holding signal. When the pressure sensor 1013 is disposed on a lower layer of the touch display screen 1005, operable controls on the UI can be controlled according to the user's pressure operation on the touch display screen 1005. The operable controls comprise at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1014 is used for collecting a fingerprint of a user in order to identify the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1014 may be provided on the front, back, or side of the computer device 1000. When a physical key or vendor logo is provided on the computer device 1000, the fingerprint sensor 1014 may be integrated with the physical key or vendor logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the touch display screen 1005 according to the intensity of the ambient light collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1005 is turned down. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.
The proximity sensor 1016, also known as a distance sensor, is typically provided on the front of the computer device 1000. The proximity sensor 1016 is used to capture the distance between the user and the front of the computer device 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front of the computer device 1000 gradually decreases, the processor 1001 controls the touch display screen 1005 to switch from the bright screen state to the dark screen state; when the proximity sensor 1016 detects that the distance between the user and the front of the computer device 1000 gradually increases, the processor 1001 controls the touch display screen 1005 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in FIG. 10 does not limit the computer device 1000; the computer device 1000 may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
The present application further provides a computer device, comprising: a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the dimension slice-based variable classification method provided by the above-described method embodiments.
The present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the dimension slice-based variable classification method provided by the above-mentioned method embodiments.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; e.g., "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method for variable classification based on dimension slices, the method comprising:
selecting a first dimension characteristic vector and a second dimension characteristic vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for performing differentiated operation on a user;
determining the number m×n of dimension slices according to the first dimension characteristic vector and the second dimension characteristic vector, and generating a first target variable distribution table and a second target variable distribution table containing m×n grids, wherein the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence, and m and n are positive integers;
calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table;
determining a target area with the largest area, which is formed by lattices corresponding to target variable values meeting a preset threshold value, from the second target variable distribution table;
dividing the target area into k areas, wherein the k areas are used for dividing user variables corresponding to the target area into k categories, and k is a positive integer.
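The table-building and target-area steps of claim 1 can be illustrated with a minimal Python sketch. Several points here are assumptions made only for illustration and are not fixed by the claim: the "user variable" of a grid cell is taken to be the number of users in it, the "target variable value" is the mean of a 0/1 target, "meeting a preset threshold" is read as "less than or equal to the threshold", and the "largest area" is read as the largest 4-connected group of qualifying cells. The names `build_tables` and `largest_region` are hypothetical.

```python
from bisect import bisect_right
from collections import deque


def build_tables(rows, edges_x, edges_y):
    """Build two m x n tables from (x, y, target) rows, target in {0, 1}.

    edges_x / edges_y are the slice boundaries of the two dimensions.
    Returns (counts, rates): users per cell and mean target per cell.
    """
    m, n = len(edges_x) + 1, len(edges_y) + 1
    counts = [[0] * n for _ in range(m)]
    hits = [[0] * n for _ in range(m)]
    for x, y, target in rows:
        i, j = bisect_right(edges_x, x), bisect_right(edges_y, y)
        counts[i][j] += 1
        hits[i][j] += target
    # Empty cells get None so they never qualify for the target area.
    rates = [[hits[i][j] / counts[i][j] if counts[i][j] else None
              for j in range(n)] for i in range(m)]
    return counts, rates


def largest_region(rates, threshold):
    """Largest 4-connected set of cells whose rate is <= threshold (assumed reading)."""
    m, n = len(rates), len(rates[0])

    def qualifies(i, j):
        return rates[i][j] is not None and rates[i][j] <= threshold

    seen, best = set(), []
    for i in range(m):
        for j in range(n):
            if (i, j) in seen or not qualifies(i, j):
                continue
            region, queue = [], deque([(i, j)])
            seen.add((i, j))
            while queue:  # breadth-first flood fill from (i, j)
                a, b = queue.popleft()
                region.append((a, b))
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= c < m and 0 <= d < n and (c, d) not in seen and qualifies(c, d):
                        seen.add((c, d))
                        queue.append((c, d))
            if len(region) > len(best):
                best = region
    return sorted(best)
```

Keeping the two tables the same shape is what makes the "one-to-one correspondence" of the claim trivial to use: the cell indices returned by `largest_region` can be looked up directly in `counts`.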
2. The method of claim 1, wherein the dividing the target region into k regions comprises:
determining a user class number k, wherein the user class number k represents k classes of the user variable, and the user class number k is used for dividing the target area into the k areas;
acquiring P permutation and combination modes for dividing the target area into the k areas, wherein P is a positive integer;
and determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target area into the k areas according to the target permutation and combination mode.
3. The method according to claim 2, wherein the determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target region into the k regions according to the target permutation and combination mode comprises:
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
determining the permutation and combination mode with the minimum variance as the target permutation and combination mode;
and dividing the target area into the k areas according to the target permutation and combination mode.
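As a hedged illustration of claims 2 and 3, the sketch below enumerates candidate ways of dividing the target variable values of the target area into k groups and keeps the one with the smallest variance. It simplifies the problem under stated assumptions: the cells are represented only by their target variable values, candidates are restricted to contiguous groups of the sorted values (so P equals the number of ways to place k-1 cut points), and "variance" is taken as the sum of per-group population variances. The function names are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def candidate_splits(values, k):
    """Yield every way to cut the sorted values into k contiguous, non-empty groups."""
    ordered = sorted(values)
    for cuts in combinations(range(1, len(ordered)), k - 1):
        bounds = (0, *cuts, len(ordered))
        yield [ordered[a:b] for a, b in zip(bounds, bounds[1:])]


def best_split(values, k):
    """Return the candidate whose summed per-group variance is smallest."""
    return min(candidate_splits(values, k),
               key=lambda groups: sum(pvariance(g) for g in groups))
```

Exhaustive enumeration is only practical for small target areas; for larger ones a dynamic-programming or heuristic search would be a natural substitute, which the claims neither require nor exclude.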
4. The method of claim 3, wherein the variances comprise intra-class variances which are variances corresponding between the target variable values belonging to the same class and inter-class variances which are variances corresponding between the target variable values belonging to different classes;
wherein the calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes comprises:
calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the sum of the intra-class variance and the inter-class variance.
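The three calculations of claim 4 can be sketched as follows. The claim does not fix the exact formulas in this passage, so the definitions below are assumptions: the intra-class variance is taken as the unweighted sum of the population variances of each class, and the inter-class variance as the population variance of the class means. The function names are hypothetical.

```python
from statistics import mean, pvariance


def intra_class_variance(groups):
    """Assumed definition: sum of the population variance inside each class."""
    return sum(pvariance(g) for g in groups)


def inter_class_variance(groups):
    """Assumed definition: population variance of the class means."""
    return pvariance([mean(g) for g in groups])


def split_score(groups):
    """Sum of the intra-class and inter-class terms, as recited in claim 4."""
    return intra_class_variance(groups) + inter_class_variance(groups)
```

For example, `split_score([[1, 3], [5, 7]])` combines an intra-class term of 2 (each class has variance 1) with an inter-class term of 4 (class means 2 and 6). Other weightings, such as weighting each class by its size, would give different scores.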
5. The method of any of claims 1 to 4, further comprising:
dividing the target area into the k areas according to the target permutation and combination mode;
and obtaining a classification result of the user variable according to the k regions, wherein the classification result comprises the user variable and a target variable value corresponding to the user variable.
6. The method of claim 5, further comprising:
adjusting the shape of the target area according to the classification result;
and dividing the target area again into the k areas according to the adjusted shape of the target area and the user class number k.
7. The method according to any one of claims 1 to 4, wherein the selecting the first dimension feature vector and the second dimension feature vector having a correlation with the target variable comprises:
determining the target variable;
calculating an information value corresponding to a feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
and selecting the first dimension characteristic vector and the second dimension characteristic vector which have correlation with the target variable according to the information value corresponding to the characteristic vector.
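The selection step of claim 7 can be illustrated with the commonly used weight-of-evidence form of the information value, IV = Σ (g_i/G − b_i/B) · ln((g_i/G)/(b_i/B)), where g_i and b_i are the counts of the two target outcomes in bin i and G and B are their totals. Whether the patent uses exactly this form is not stated in this passage, so treat it as an assumption; the function names, the `eps` floor for empty bins, and the "pick the two highest" rule are likewise illustrative.

```python
from math import log


def information_value(bins, eps=1e-6):
    """bins: list of (good_count, bad_count) per bin of one feature.

    Both outcomes must occur at least once overall; eps avoids log(0)
    for bins in which one outcome is absent.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        p_good = max(good / total_good, eps)
        p_bad = max(bad / total_bad, eps)
        iv += (p_good - p_bad) * log(p_good / p_bad)
    return iv


def top_two_features(feature_bins):
    """feature_bins: {feature_name: bins}. Return the two names with the highest IV."""
    ranked = sorted(feature_bins,
                    key=lambda name: information_value(feature_bins[name]),
                    reverse=True)
    return ranked[:2]
```

A feature whose bins all have the same outcome mix has an information value of zero, so it would never be chosen ahead of a feature that separates the outcomes.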
8. The method of claim 1, wherein determining the number of dimension slices, m×n, from the first-dimension feature vector and the second-dimension feature vector comprises:
selecting the dimension slicing mode, wherein the dimension slicing mode comprises at least one of an equal frequency slicing mode and a chi-square slicing mode;
and slicing the feature vectors according to the dimension slicing mode.
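Of the two slicing modes named in claim 8, equal-frequency slicing is the simpler to sketch: boundaries are placed so that each slice holds roughly the same number of users. The helper names below are hypothetical, and the handling of ties (duplicate boundaries are dropped, so fewer slices may result) is an assumption. Chi-square slicing is not shown.

```python
from bisect import bisect_right


def equal_frequency_edges(values, bins):
    """Return up to bins-1 boundaries that split values into equally sized slices."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for q in range(1, bins):
        edge = ordered[(q * n) // bins]
        # Skip repeated boundaries caused by many identical values.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def slice_index(value, edges):
    """Index of the slice a value falls into; a boundary value opens the next slice."""
    return bisect_right(edges, value)
```

Applying this to each of the two selected features yields m and n as the resulting numbers of slices, and `slice_index` gives the row or column of a user in the m×n tables.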
9. The method according to any of claims 1 to 4, wherein the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
10. A dimension slice-based variable classification apparatus, the apparatus comprising:
a selection module, configured to select a first dimension characteristic vector and a second dimension characteristic vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for performing differentiated operation on a user;
a generating module, configured to determine the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generate a first target variable distribution table and a second target variable distribution table that contain m×n lattices, where the first target variable distribution table and the second target variable distribution table are in one-to-one correspondence, and m and n are positive integers;
a calculating module, configured to calculate a user variable corresponding to each lattice in the first target variable distribution table and a target variable value corresponding to each lattice in the second target variable distribution table;
a processing module, configured to determine, from the second target variable distribution table, a target area with the largest area formed by lattices corresponding to target variable values meeting a preset threshold;
and a classification module, configured to divide the target area into k areas, wherein the k areas are used for dividing the user variables corresponding to the target area into k categories, and k is a positive integer.
11. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the dimension slice based variable classification method of any of claims 1 to 9.
12. A computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the dimension slice based variable classification method according to any one of claims 1 to 9.
CN201911395277.8A 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice Active CN111144505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395277.8A CN111144505B (en) 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911395277.8A CN111144505B (en) 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice

Publications (2)

Publication Number Publication Date
CN111144505A true CN111144505A (en) 2020-05-12
CN111144505B CN111144505B (en) 2023-09-01

Family

ID=70521912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395277.8A Active CN111144505B (en) 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice

Country Status (1)

Country Link
CN (1) CN111144505B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269815A (en) * 2020-10-29 2021-01-26 维沃移动通信有限公司 Structured data processing method and device and electronic equipment
CN112308466A (en) * 2020-11-26 2021-02-02 东莞市盟大塑化科技有限公司 Enterprise qualification auditing method and device, computer equipment and storage medium

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014127224A1 (en) * 2013-02-14 2014-08-21 The Research Foundation For The State University Of New York Method for adaptive computer-aided detection of pulmonary nodules in thoracic computed tomography images using hierarchical vector quantization and apparatus for same
CN105574538A (en) * 2015-12-10 2016-05-11 小米科技有限责任公司 Classification model training method and apparatus
CN106874687A (en) * 2017-03-03 2017-06-20 深圳大学 Pathological section image intelligent sorting technique and device
US20170262733A1 (en) * 2016-03-10 2017-09-14 Siemens Healthcare Gmbh Method and System for Machine Learning Based Classification of Vascular Branches
CN108182452A (en) * 2017-12-29 2018-06-19 哈尔滨工业大学(威海) Aero-engine fault detection method and system based on grouping convolution self-encoding encoder
US20180242905A1 (en) * 2017-02-27 2018-08-30 Case Western Reserve University Predicting immunotherapy response in non-small cell lung cancer patients with quantitative vessel tortuosity
WO2018157381A1 (en) * 2017-03-03 2018-09-07 深圳大学 Method and apparatus for intelligently classifying pathological slice image
CN109271460A (en) * 2018-09-29 2019-01-25 阿里巴巴集团控股有限公司 The method and apparatus classified to the trade company in e-platform
CN109285075A (en) * 2017-07-19 2019-01-29 腾讯科技(深圳)有限公司 A kind of Claims Resolution methods of risk assessment, device and server
CN109636530A (en) * 2018-12-14 2019-04-16 拉扎斯网络科技(上海)有限公司 Product determines method, apparatus, electronic equipment and computer readable storage medium
CN109815987A (en) * 2018-12-27 2019-05-28 北京卓思天成数据咨询股份有限公司 A kind of listener clustering method and categorizing system
CN109840542A (en) * 2018-12-06 2019-06-04 北京化工大学 Adaptive dimension Decision-Tree Method based on polarization characteristic
CN110135509A (en) * 2019-05-21 2019-08-16 重庆斐耐科技有限公司 A kind of intelligent finance credit-graded approach neural network based
CN110187334A (en) * 2019-05-28 2019-08-30 深圳大学 A kind of target monitoring method, apparatus and computer readable storage medium
CN110276552A (en) * 2019-06-21 2019-09-24 深圳前海微众银行股份有限公司 Risk analysis method, device, equipment and readable storage medium storing program for executing before borrowing
CN110288038A (en) * 2019-06-28 2019-09-27 深圳前海微众银行股份有限公司 A kind of classification method and device of enterprise
CN110503344A (en) * 2019-08-28 2019-11-26 国网经济技术研究院有限公司 A kind of full category item overall process differentiation various dimensions Classification Management strategy
CN110555627A (en) * 2019-09-10 2019-12-10 拉扎斯网络科技(上海)有限公司 Entity display method, entity display device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111144505B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
US20220188840A1 (en) Target account detection method and apparatus, electronic device, and storage medium
CN110585726A (en) User recall method, device, server and computer readable storage medium
CN112578971B (en) Page content display method and device, computer equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN111144505B (en) Variable classification method, device, equipment and medium based on dimension slice
CN110246110B (en) Image evaluation method, device and storage medium
CN111080371A (en) Method, device and storage medium for issuing resources to user account
CN111126925A (en) Method and device for determining replenishment quantity of front bin, computer equipment and storage medium
CN112398819A (en) Method and device for recognizing abnormality
CN112000264B (en) Dish information display method and device, computer equipment and storage medium
CN111028071B (en) Bill processing method and device, electronic equipment and storage medium
CN112819103A (en) Feature recognition method and device based on graph neural network, storage medium and terminal
CN112765470B (en) Training method of content recommendation model, content recommendation method, device and equipment
CN111599417B (en) Training data acquisition method and device of solubility prediction model
CN112230822B (en) Comment information display method and device, terminal and storage medium
CN112232890A (en) Data processing method, device, equipment and storage medium
CN113742430A (en) Method and system for determining number of triangle structures formed by nodes in graph data
CN112907702A (en) Image processing method, image processing device, computer equipment and storage medium
CN112560903A (en) Method, device and equipment for determining image aesthetic information and storage medium
CN110928913A (en) User display method, device, computer equipment and computer readable storage medium
CN110134303B (en) Operation control display method, device, terminal and storage medium
CN112907939B (en) Traffic control subarea dividing method and device
CN113591958B (en) Method, device and equipment for fusing internet of things data and information network data
CN112579661B (en) Method and device for determining specific target pair, computer equipment and storage medium
CN115018532A (en) Method, device, equipment, storage medium and product for training resource distribution model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant