CN111144505B - Variable classification method, device, equipment and medium based on dimension slice - Google Patents

Publication number
CN111144505B
Authority
CN
China
Prior art keywords
target
variable
target variable
user
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911395277.8A
Other languages
Chinese (zh)
Other versions
CN111144505A (en)
Inventor
勾爱利
董超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201911395277.8A
Publication of CN111144505A
Application granted
Publication of CN111144505B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The application discloses a variable classification method, device, equipment and medium based on dimension slices, and belongs to the field of computers. The method comprises the following steps: selecting a first dimension feature vector and a second dimension feature vector that are correlated with a target variable, wherein the target variable is a variable corresponding to a business event; determining the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table that each contain m×n grids, wherein the grids of the first target variable distribution table correspond one-to-one to the grids of the second target variable distribution table; calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table; determining, from the second target variable distribution table, the target area with the largest area formed by grids whose target variable values meet a preset threshold; and dividing the target area into k areas, that is, dividing the user variable into k categories.

Description

Variable classification method, device, equipment and medium based on dimension slice
Technical Field
The present application relates to the field of computers, and in particular, to a variable classification method, device, equipment, and medium based on dimension slicing.
Background
In the operation scenarios of various businesses, operations staff or policy analysts need to operate target users in a differentiated manner, so that the operated products meet the needs of different types of users to the greatest extent.
Taking a financial business as an example, a policy analyst needs to formulate a risk operation policy: under the condition that the overdue rate reaches the target value, the number of users who pass the risk operation policy is maximized, the users who pass are classified, and different rules are formulated for users in different classes. For example, users who pass the risk policy are classified into four types and a different loan amount is set for each type; illustratively, users with a higher overdue rate receive a lower loan amount. In the related art, users who pass the risk operation policy are classified as follows: the variables that influence the overdue rate are sliced, the sliced variables are marked with a color scale, the largest area of similar color is found by eye, and the users corresponding to the sub-areas of closest color within that area are then grouped into classes.
As a result, this manual way of classifying users has low accuracy and rationality.
Disclosure of Invention
The embodiment of the application provides a variable classification method, device, equipment and medium based on dimension slices, which can solve the problem of lower accuracy and rationality of a method for classifying users manually in the related technology. The technical scheme is as follows:
according to one aspect of the present application, there is provided a variable classification method based on dimensional slicing, the method comprising:
selecting a first dimension feature vector and a second dimension feature vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for differentially operating a user;
determining the number m multiplied by n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table which contain m multiplied by n grids, wherein the first target variable distribution table corresponds to the second target variable distribution table one by one, and m and n are positive integers;
calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table;
determining a target area with the largest area formed by grids corresponding to target variable values meeting a preset threshold value from the second target variable distribution table;
dividing the target area into k areas, wherein the k areas are used for dividing the user variable corresponding to the target area into k categories, and k is a positive integer.
In some embodiments of the present application, the dividing the target region into k regions includes:
determining a user class number k, wherein the user class number k characterizes k classes of the user variable, and the user class number k is used for dividing the target area into k areas;
obtaining P permutation and combination modes for dividing the target area into k areas, wherein P is a positive integer;
and determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target region into k regions according to the target permutation and combination mode.
In some embodiments of the present application, the determining a target permutation and combination manner from the P permutation and combination manners, dividing the target region into the k regions according to the target permutation and combination manner includes:
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
determining the permutation and combination mode with the smallest variance as the target permutation and combination mode;
dividing the target area into k areas according to the target arrangement and combination mode.
In some embodiments of the application, the variance includes an intra-class variance, which is the variance among the target variable values belonging to the same class, and an inter-class variance, which is the variance among the target variable values belonging to different classes;
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes comprises the following steps:
calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
and calculating the sum of the intra-class variance and the inter-class variance.
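The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the exact variance formulas are not given in the text, so the sketch assumes that the intra-class variance is the sum of the population variances inside each class and that the inter-class variance is the population variance of the class means. The function names `variance_score` and `pick_partition` and the candidate values are hypothetical.

```python
from statistics import mean, pvariance

def variance_score(classes):
    """classes: list of lists of target variable values, one list per category."""
    # Assumption: intra-class variance = sum of population variances inside each class.
    intra = sum(pvariance(values) for values in classes)
    # Assumption: inter-class variance = population variance of the class means.
    inter = pvariance([mean(values) for values in classes])
    return intra + inter

def pick_partition(candidates):
    """Return the candidate whose intra-class + inter-class variance sum is smallest."""
    return min(candidates, key=variance_score)

# Two hypothetical ways (P = 2) of grouping six grid values into k = 3 classes.
candidates = [
    [[0.006, 0.0059], [0.005, 0.0025], [0.002, 0.0003]],
    [[0.006, 0.0003], [0.0059, 0.002], [0.005, 0.0025]],
]
best = pick_partition(candidates)
```

Under these assumed definitions the first candidate wins, because it groups grids with similar target variable values together.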
In some embodiments of the application, the method further comprises:
dividing the target area into k areas according to the target arrangement and combination mode;
and obtaining a classification result of the user variable according to the k areas, wherein the classification result comprises the user variable and a target variable value corresponding to the user variable.
In some embodiments of the application, the method further comprises:
adjusting the shape of the target area according to the classification result;
and dividing the target area again according to the shape of the target area and the user class number k.
In some embodiments of the present application, the selecting the first dimension feature vector and the second dimension feature vector having a correlation with the target variable includes:
determining the target variable;
calculating an information value corresponding to the feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
and selecting the first-dimension feature vector and the second-dimension feature vector which have correlation with the target variable according to the information value corresponding to the feature vector.
In some embodiments of the application, the determining the number m×n of dimension slices from the first dimension feature vector and the second dimension feature vector comprises:
selecting the dimension slicing mode, wherein the dimension slicing mode comprises at least one of an equal-frequency slicing mode and a chi-square slicing mode;
and slicing the feature vector according to the dimension slicing mode.
In some embodiments of the application, the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
According to another aspect of the present application, there is provided a variable classification device based on dimensional slicing, the device comprising:
the selection module is used for selecting a first dimension feature vector and a second dimension feature vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business which differentially operates a user;
the generation module is used for determining the number m multiplied by n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table which contain m multiplied by n grids, wherein the first target variable distribution table corresponds to the second target variable distribution table one by one, and m and n are positive integers;
the calculation module is used for calculating the user variable corresponding to each grid in the first target variable distribution table and the target variable value corresponding to each grid in the second target variable distribution table;
the processing module is used for determining a target area with the largest area formed by grids corresponding to the target variable values meeting a preset threshold value from the second target variable distribution table;
the classification module is used for dividing the target area into k areas, wherein the k areas are used for dividing the user variable corresponding to the target area into k categories, and k is a positive integer.
In some embodiments of the application, the processing module is configured to determine a user class number k, where the user class number k characterizes k categories of the user variable, and the user class number k is configured to divide the target region into the k regions;
the selecting module is used for acquiring P arrangement and combination modes for dividing the target area into k areas, wherein P is a positive integer;
the processing module is used for determining a target permutation and combination mode from the P permutation and combination modes and dividing the target region into k regions according to the target permutation and combination mode.
In some embodiments of the present application, the calculating module is configured to calculate a variance of a target variable value corresponding to each of the P permutation and combination manners;
the processing module is used for determining the permutation and combination mode with the minimum variance as the target permutation and combination mode; dividing the target area into k areas according to a target arrangement and combination mode.
In some embodiments of the application, the variance includes an intra-class variance, which is the variance among the target variable values belonging to the same class, and an inter-class variance, which is the variance among the target variable values belonging to different classes;
the calculation module is used for calculating the intra-class variance of the target variable value corresponding to each of the P permutation and combination modes; calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes; and calculating the sum of the intra-class variance and the inter-class variance.
In some embodiments of the present application, the processing module is configured to divide the target area into the k areas according to the target permutation and combination manner;
the selecting module is configured to obtain a classification result of the user variable according to the k regions, where the classification result includes the user variable and a target variable value corresponding to the user variable.
In some embodiments of the present application, the processing module is configured to adjust a shape of the target area according to the classification result; and dividing the target area again according to the shape of the target area and the user class number k.
In some embodiments of the application, the processing module is configured to determine the target variable;
the calculation module is used for calculating an information value corresponding to the feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
the selection module is used for selecting the first-dimension feature vector and the second-dimension feature vector which have correlation with the target variable according to the information value corresponding to the feature vector.
In some embodiments of the present application, the processing module is configured to select a dimension slice mode, where the dimension slice mode includes at least one of an equal frequency slice mode and a chi-square slice mode; and slicing the feature vector according to the dimension slicing mode.
In some embodiments of the application, the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set or instruction set loaded and executed by the processor to implement the dimensional slice-based variable classification method as described in the above aspect.
According to another aspect of the present application, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes or a set of instructions loaded and executed by a processor to implement the dimensional slice based variable classification method as described in the above aspect.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
determining the number m multiplied by n of dimension slices according to the selected first dimension feature vector and second dimension feature vector, generating a first target variable distribution table and a second target variable distribution table which contain m multiplied by n grids, wherein each grid in the first target variable distribution table corresponds to a user variable, each grid in the second target variable distribution table corresponds to a target variable value, determining a region with the largest area formed by grids corresponding to the target variable values meeting a preset threshold value as a target region, dividing the target region into k regions, and dividing the user variable into k categories corresponding to the k regions, wherein m, n and k are positive integers. The aim of classifying the user variables is fulfilled by dividing the delineated target area into areas, so that the classification result is more reasonable and accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a variable classification method based on dimension slices provided by an exemplary embodiment of the application;
FIG. 2 is a schematic diagram of a second target variable distribution table provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a variable classification method based on dimension slices provided by another exemplary embodiment of the application;
FIG. 4 is a schematic diagram of a first target variable distribution table provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a second target variable distribution table provided by another exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a second target variable distribution table for determining target areas provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram illustrating an arrangement and combination of dividing target areas according to an exemplary embodiment of the present application;
FIG. 8 is a flow chart of a variable classification method based on dimension slices, described in conjunction with an information push business event, provided by an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a dimensional slice-based variable classification apparatus provided in accordance with an exemplary embodiment of the present application;
FIG. 10 is a schematic structural view of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, the terms involved in the embodiments of the present application will be described:
Slicing: dividing a feature vector into smaller vectors or into more intervals. Slicing methods include equidistant slicing, equal-frequency slicing, and chi-square slicing. Equidistant slicing divides a range at fixed intervals; for example, slicing the numbers 1 to 100 equidistantly into 10 parts yields intervals such as [1, 10), [10, 20), [20, 30), [30, 40), and so on. Equal-frequency slicing makes the number of items in each interval the same; for example, the interval [a1, a3) corresponds to 50 users, the interval [a3, a5) corresponds to 50 users, and the interval [a5, a7) corresponds to 50 users. Chi-square slicing divides a feature vector according to its chi-square values.
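As an illustration of the first two slicing methods, the following Python sketch computes interval edges and assigns values to half-open intervals. The function names are hypothetical, and the sample of 150 users is invented for the example. The range 0 to 100 is used for the equidistant case so that all ten intervals have the same width. Chi-square slicing is not shown.

```python
from bisect import bisect_right

def equidistant_edges(low, high, parts):
    """Return parts + 1 edges that split [low, high) into equal-width intervals."""
    width = (high - low) / parts
    return [low + i * width for i in range(parts + 1)]

def equal_frequency_edges(values, parts):
    """Return edges so that each interval holds about len(values) / parts items."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points sit where the sorted sample splits into equal shares.
    cuts = [ordered[(i * n) // parts] for i in range(1, parts)]
    # The last edge lies just above the maximum so the half-open interval includes it.
    return [ordered[0]] + cuts + [ordered[-1] + 1]

def slice_index(value, edges):
    """Index i of the half-open interval [edges[i], edges[i + 1]) containing value."""
    return bisect_right(edges, value) - 1

edges = equidistant_edges(0, 100, 10)          # 0, 10, 20, ..., 100
scores = list(range(1, 151))                   # 150 hypothetical users
freq_edges = equal_frequency_edges(scores, 3)  # three intervals of 50 users each
counts = [0, 0, 0]
for s in scores:
    counts[slice_index(s, freq_edges)] += 1
```

With this sample, each of the three equal-frequency intervals receives 50 users, which mirrors the example in the definition above.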
Information value (Information Value, IV): also called information quantity; it measures the degree of influence of a feature vector on the target. For example, when a classification model is built with methods such as logistic regression or decision trees, feature vectors need to be screened so that those able to predict the target accurately are kept. This screening can be done with IV: the larger the IV of a feature vector, the higher its correlation with the target and the stronger its ability to predict the target.
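The text does not spell out how IV is computed. The sketch below uses the commonly used weight-of-evidence definition, IV = Σ (g_i − b_i) · ln(g_i / b_i), where g_i and b_i are the shares of non-event and event users that fall into slice i. Treating this as the formula intended here is an assumption, and the function name and counts are hypothetical.

```python
import math

def information_value(bins):
    """bins: list of (non_event_count, event_count) per slice of one feature."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        # Skip slices with an empty side to avoid log(0); smoothing is another option.
        if good == 0 or bad == 0:
            continue
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)  # weight of evidence of the slice
        iv += (good_share - bad_share) * woe
    return iv

# Hypothetical (not overdue, overdue) user counts per slice for two features.
income_bins = [(900, 100), (950, 50), (990, 10)]
noise_bins = [(947, 53), (946, 54), (947, 53)]
iv_income = information_value(income_bins)
iv_noise = information_value(noise_bins)
```

A feature whose slices separate overdue from non-overdue users, such as the hypothetical income feature, obtains a larger IV than a feature whose slices all look alike, so it would be preferred as a dimension feature vector.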
Target variable: a variable corresponding to a business event. A business event is an event corresponding to a business that needs to operate users in a differentiated manner; for example, in a financial risk business event, different operation strategies are applied to users with different credibility. Operation refers to planning, organizing, implementing, and controlling the promotion of a product during the operation process. In one example, a bank grants different loan amounts to different categories of users, with the overdue rate as the target variable. Under the condition that the overdue rate meets the target value, the bank needs to ensure that as many users as possible pass the risk assessment, classify the users who pass, and configure different loan strategies for different types of users. The embodiments of the application are described taking as an example business events that include a financial risk business event, a commodity sales business event, and an information push business event.
In the refined operation of an enterprise or organization, operations staff or policy analysts need to determine which users are target users and operate them in a differentiated manner. In the related art, they analyze data with an application that supports data processing (such as the spreadsheet program Excel), or make rules based on experience, and then operate user groups differentially according to those rules. For example, a policy analyst divides the feature vectors of at least one dimension equidistantly, inputs the corresponding data into the application, and marks the data with a color scale; that is, the grids of the spreadsheet whose data meet the target value are marked with the same color or the same series of colors. The analyst then finds the largest block of similar color by eye and classifies users according to the several sub-blocks of similar color within it, where a color block consists of at least one grid of the spreadsheet.
When a policy analyst classifies users in this way, it is difficult to guarantee by eye that the user groups corresponding to grids of similar color actually differ, and it is difficult to guarantee that the dimension division is reasonable. As a result, the number of users corresponding to each grid of the spreadsheet is inconsistent, and the target value of each user group is inaccurate.
The embodiment of the application provides a variable classification method based on dimension slices, which can automatically classify users and has reasonable classification results.
FIG. 1 illustrates a dimensional slice-based variable classification method provided by an exemplary embodiment of the present application, the method comprising the steps of:
step 101, selecting a first dimension feature vector and a second dimension feature vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for differentially operating a user.
Optionally, the business event comprises at least one of a financial risk business event, a merchandise sales business event, and an information push business event. Illustratively, the financial risk business event includes at least one of a banking business event, a securities business event, a trust business event, and an insurance business event. Illustratively, the information push business event includes at least one of an information push business event of a social application, an information push business event of a shopping application, an information push business event of a music application, an information push business event of a video application, and an information push business event of a predetermined application. Optionally, the target variables include overdue rate, credit, promotional program, message push time, and message push frequency. Illustratively, when the business event is a financial risk business event, the target variable includes at least one of overdue rate, credit, odds, risk level, and commission rate.
In one example, the target variable is the overdue rate, and the first and second dimension feature vectors correlated with the target variable are the income of the borrowing user and the credit rating of the borrowing user, respectively.
Step 102, determining the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table which contain m×n lattices, wherein the first target variable distribution table corresponds to the second target variable distribution table one by one, and m and n are positive integers.
Illustratively, m is 3 and n is 5. If the number of dimension slices of the first dimension feature vector is 3 and that of the second dimension feature vector is 5, the generated first and second target variable distribution tables each have 3 rows and 5 columns of grids, or 5 rows and 3 columns of grids. Optionally, the first target variable distribution table is the distribution table of the user variable, and the second target variable distribution table is the distribution table of the target variable values corresponding to the user variable. The user variable refers to the distribution of the number of users within the intervals of a feature vector of a given dimension, and the target variable value refers to the value of the target variable corresponding to the user variable. For example, when the target variable is the overdue rate, the user variable is the distribution of the number of users whose overdue rate is between 0.001 and 0.006, and the target variable value is the distribution of the number of overdue users corresponding to an overdue rate between 0.001 and 0.006.
In one example, the target variable is the overdue rate, the first dimension feature vector is the income of the loan user, and the second dimension feature vector is the credit rating of the loan user. The number of dimension slices of the first dimension feature vector is 3 and that of the second dimension feature vector is 5, so a first target variable distribution table of 3×5 grids and a second target variable distribution table of 3×5 grids are generated. Optionally, the first and second target variable distribution tables each have 3 rows and 5 columns. The first target variable distribution table contains the number of loan users with different incomes, and the second target variable distribution table contains the overdue rate of loan users with different credit ratings.
Step 103, calculating the user variable corresponding to each grid in the first target variable distribution table and the target variable value corresponding to each grid in the second target variable distribution table.
Illustratively, the user variable is the number of loan users in different intervals, and the target variable value is the ratio of the number of overdue users in different intervals to the total number of users in the interval.
The first target variable distribution table is shown in Table One, and the second target variable distribution table is shown in Table Two.
Table One
Income \ Credit rating score   [100,300)   [300,500)   [500,700)   [700,900)   [900,1100)
[2000,5000)                    500         400         6000        3000        5000
[5000,8000)                    510         350         200         150         99
[8000,11000)                   800         500         350         230         20
Table Two
Income \ Credit rating score   [100,300)   [300,500)   [500,700)   [700,900)   [900,1100)
[2000,5000)                    0.006       0.0025      0.001       0.001       0.05
[5000,8000)                    0.0059      0.0003      0.025       0.1         0.1
[8000,11000)                   0.005       0.002       0.0028      0.0043      0.05
It should be noted that each row in Table One and Table Two is divided according to the income of the loan user, and each column is divided according to the credit rating score of the loan user. The value 500 in Table One indicates that 500 loan users have an income in [2000, 5000) (unit: yuan) and a credit rating score in [100, 300) (unit: points). The value 0.006 in Table Two indicates that, for an income in [2000, 5000) (unit: yuan) and a credit rating score in [100, 300) (unit: points), the ratio of overdue loan users to the total number of loan users in that interval is 0.006.
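Steps 102 and 103 can be sketched in Python as follows, using the row and column intervals of the two tables. This is a minimal illustration under stated assumptions: the record format `(income, credit_score, is_overdue)`, the function name `build_tables`, and the sample records are hypothetical and do not reproduce the data behind Table One and Table Two.

```python
from bisect import bisect_right

INCOME_EDGES = [2000, 5000, 8000, 11000]       # m = 3 row slices
SCORE_EDGES = [100, 300, 500, 700, 900, 1100]  # n = 5 column slices

def build_tables(records):
    """records: iterable of (income, credit_score, is_overdue) tuples."""
    m, n = len(INCOME_EDGES) - 1, len(SCORE_EDGES) - 1
    users = [[0] * n for _ in range(m)]    # first table: user count per grid
    overdue = [[0] * n for _ in range(m)]  # overdue user count per grid
    for income, score, is_overdue in records:
        r = bisect_right(INCOME_EDGES, income) - 1
        c = bisect_right(SCORE_EDGES, score) - 1
        if not (0 <= r < m and 0 <= c < n):
            continue  # the value falls outside every slice
        users[r][c] += 1
        overdue[r][c] += int(is_overdue)
    # Second table: overdue users / all users in the grid (None for an empty grid).
    rates = [[overdue[r][c] / users[r][c] if users[r][c] else None
              for c in range(n)] for r in range(m)]
    return users, rates

records = [(2500, 150, False), (3000, 250, True), (4000, 120, False),
           (4999, 299, False), (6000, 450, False), (12000, 150, True)]
users, rates = build_tables(records)
```

Both tables share the same m×n layout, so the grid at a given row and column of the first table corresponds to the grid at the same position of the second table.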
Step 104, determining, from the second target variable distribution table, a target area with the largest area formed by grids corresponding to target variable values that meet the preset threshold.
In one example, if the preset threshold is 0.006, the grids whose target variable value is less than 0.006 are determined from Table Two, and the target area with the largest area formed by these grids is shown in FIG. 2 (the area composed of the gray grids in the table).
The target area includes an area composed of a grid of incomes of the loan user between the intervals [2000, 11000) and a grid of credit rating scores of the loan user between the intervals [100, 900), and is optionally any one of a rectangle, a square, a polygon, and an irregular figure. The embodiment of the present application is described taking the example that the target area is a rectangle, the target area includes an area composed of a grid of incomes of loan users between the sections [2000, 11000 ] and a grid of credit rating scores of the loan users between the sections [100, 500).
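For a rectangular target area, step 104 amounts to finding the largest all-true rectangle in a boolean mask of table two. The sketch below is one possible way to do this, not necessarily the approach used in the application. The function name `largest_rectangle` is hypothetical, and treating a value equal to the threshold as satisfying it is an assumption; with that choice the sketch yields the rectangle described above (income [2000, 11000), credit rating score [100, 500)).

```python
TABLE_TWO = [
    [0.006, 0.0025, 0.001, 0.001, 0.05],
    [0.0059, 0.0003, 0.025, 0.1, 0.1],
    [0.005, 0.002, 0.0028, 0.0043, 0.05],
]

def largest_rectangle(mask):
    """Return (area, (top, left, bottom, right)) of the largest all-True rectangle.

    Indices are inclusive; (0, None) is returned when no grid is True.
    """
    cols = len(mask[0])
    heights = [0] * cols  # consecutive True cells ending at the current row
    best = (0, None)
    for r, row in enumerate(mask):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        for left in range(cols):
            h = heights[left]
            for right in range(left, cols):
                h = min(h, heights[right])
                if h == 0:
                    break
                area = h * (right - left + 1)
                if area > best[0]:
                    best = (area, (r - h + 1, left, r, right))
    return best

THRESHOLD = 0.006
# Assumption: a value equal to the threshold counts as satisfying it.
mask = [[value <= THRESHOLD for value in row] for row in TABLE_TWO]
area, bounds = largest_rectangle(mask)
```

With the data of table two, `bounds` covers all three income rows and the first two credit rating score columns.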
Step 105, dividing the target area into k areas, where k areas are used to divide the user variables corresponding to the target area into k categories, and k is a positive integer.
In one example, k is 4, and the target area determined in step 104 is divided into 4 areas, which are used to divide the user variables corresponding to the target area into 4 categories. Optionally, there are various ways of dividing the target area into 4 areas. Illustratively, the first region consists of the grids in which the income of the loan user is in the interval [2000, 5000) and the credit rating score is in the interval [100, 500); the second region consists of the grids in which the income is in the interval [5000, 8000) and the credit rating score is in the interval [100, 500); the third region consists of the grids in which the income is in the interval [8000, 11000) and the credit rating score is in the interval [100, 300); and the fourth region consists of the grids in which the income is in the interval [8000, 11000) and the credit rating score is in the interval [300, 500).
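Once the k areas are fixed, classifying a user reduces to a lookup of the area that contains the user's income and credit rating score. The sketch below is illustrative only: the names `REGIONS` and `classify` are hypothetical, and it assumes that the first region spans credit rating scores [100, 500), so that the four regions together cover the whole rectangular target area.

```python
# (category, income interval, credit rating score interval), all half-open.
# Assumption: the first region spans scores [100, 500) so the regions tile the target area.
REGIONS = [
    (1, (2000, 5000), (100, 500)),
    (2, (5000, 8000), (100, 500)),
    (3, (8000, 11000), (100, 300)),
    (4, (8000, 11000), (300, 500)),
]

def classify(income, score):
    """Return the category of the region containing (income, score), or None."""
    for category, (inc_lo, inc_hi), (sc_lo, sc_hi) in REGIONS:
        if inc_lo <= income < inc_hi and sc_lo <= score < sc_hi:
            return category
    return None  # the user lies outside the target area
```

A user with income 9000 and score 450 falls in the fourth category, while a user with score 600 lies outside the target area and receives no category.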
In summary, in the method provided in this embodiment, the number m×n of dimension slices is determined according to the selected first dimension feature vector and second dimension feature vector, and a first target variable distribution table and a second target variable distribution table containing m×n lattices are generated, each lattice in the first target variable distribution table corresponds to a user variable, each lattice in the second target variable distribution table corresponds to a target variable value, a region with a maximum area formed by lattices corresponding to target variable values satisfying a preset threshold is determined as a target region, the target region is divided into k regions, and the k regions correspond to k categories, where m, n and k are positive integers. The aim of classifying the user variables is fulfilled by dividing the delineated target area into areas, so that the classification result is more reasonable and accurate.
FIG. 3 illustrates a flow chart of a variable classification method based on dimension slices provided by another exemplary embodiment of the application. The method comprises the following steps:
optionally, the business event comprises at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
Optionally, selecting the first dimension feature vector and the second dimension feature vector having a correlation with the target variable includes the steps of:
Step 301, determining a target variable.
Illustratively, the business event is a merchandise sales business event and the target variable is the user's participation in a merchant promotional program. Optionally, the feature vector having a correlation with the target variable includes at least one of the following vectors: whether the user has purchased the last month, the amount the user has purchased the last time, the category of the goods the user has purchased the last time, whether the user is a member of the merchant.
Step 302, calculating an information value corresponding to the feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable.
The information value is calculated as follows:

$$\mathrm{WOE}_i = \ln\frac{py_i}{pn_i} = \ln\frac{\#y_i / \#y_T}{\#n_i / \#n_T}$$

$$\mathrm{IV}_i = (py_i - pn_i) \times \mathrm{WOE}_i$$

$$\mathrm{IV} = \sum_{i=1}^{n} \mathrm{IV}_i$$

where WOE is the weight of evidence; $py_i$ is the proportion of responding users in the i-th group among all responding users (e.g., in a merchandise sales business event, the users in the i-th group who participate in the merchant promotional program); $pn_i$ is the proportion of non-responding users in the i-th group among all non-responding users (e.g., in a merchandise sales business event, the users in the i-th group who do not participate in the merchant promotional program); $\#y_i$ is the number of responding users in the i-th group (e.g., the number of users in the i-th group who participate in the merchant promotional program); $\#n_i$ is the number of non-responding users in the i-th group (e.g., the number of users in the i-th group who do not participate in the merchant promotional program); $\#y_T$ is the number of all responding users (e.g., the number of all users who participate in the merchant promotional program); $\#n_T$ is the number of all non-responding users (e.g., the number of all users who do not participate in the merchant promotional program); and n is the number of variable groups. $\mathrm{IV}_i$ is the information value of the i-th group, calculated from its WOE value; summing the $\mathrm{IV}_i$ of all groups gives the IV value of the feature vector.
The larger the IV value of a feature vector, the higher the correlation of the feature vector with the target variable.
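The information value of one feature vector can be computed with a few lines of code. The sketch below uses the commonly used weight-of-evidence definition, WOE_i = ln(py_i / pn_i) and IV = Σ (py_i − pn_i) · WOE_i. The function name `information_value` and the input layout are assumptions, and every group is assumed to contain at least one responding and one non-responding user so that the logarithm is defined.

```python
import math

def information_value(groups):
    """Compute (IV, list of WOE values) for one feature.

    groups: list of (responding_count, non_responding_count), one pair per group.
    Assumption: all counts are positive.
    """
    y_total = sum(y for y, _ in groups)
    n_total = sum(n for _, n in groups)
    iv = 0.0
    woes = []
    for y, n in groups:
        py = y / y_total  # share of all responding users in this group
        pn = n / n_total  # share of all non-responding users in this group
        woe = math.log(py / pn)
        woes.append(woe)
        iv += (py - pn) * woe
    return iv, woes

# Hypothetical counts: two groups with opposite response behaviour.
iv_strong, _ = information_value([(80, 20), (20, 80)])
# Two groups that behave identically carry no information.
iv_none, _ = information_value([(50, 50), (50, 50)])
```

A feature whose groups respond identically yields an IV of 0, while a feature that separates responding from non-responding users yields a larger IV.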
Step 303, selecting a first dimension feature vector and a second dimension feature vector which have a correlation with the target variable according to the information values corresponding to the feature vectors.
Illustratively, the IV values are ordered as follows: the IV value of whether the user is a member of the merchant > the IV value of the category of the commodity the user last purchased > the IV value of the amount of the user's last purchase > the IV value of whether the user made a purchase in the last month. Whether the user is a member of the merchant is therefore selected as the first dimension feature vector, and the category of the commodity the user last purchased is selected as the second dimension feature vector.
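The selection in step 303 can be sketched as ranking the feature vectors by IV and keeping the top two. The feature names and IV numbers below are hypothetical values chosen only to reproduce the ordering in the example; `pick_two_dimensions` is likewise an illustrative name.

```python
def pick_two_dimensions(iv_by_feature):
    """Return the two feature names with the largest information values."""
    if len(iv_by_feature) < 2:
        raise ValueError("at least two candidate feature vectors are required")
    ranked = sorted(iv_by_feature, key=iv_by_feature.get, reverse=True)
    return ranked[0], ranked[1]

# Hypothetical IV values, for illustration only.
iv_by_feature = {
    "bought_last_month": 0.07,
    "last_purchase_amount": 0.18,
    "last_purchase_category": 0.31,
    "is_member": 0.42,
}
first_dim, second_dim = pick_two_dimensions(iv_by_feature)
```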
Step 304, determining the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table each containing m×n grids, wherein the grids of the first target variable distribution table correspond one to one to the grids of the second target variable distribution table, and m and n are positive integers.
Optionally, slicing the feature vector in dimensions further comprises the steps of:
Step 3041, selecting a dimension slicing mode, wherein the dimension slicing mode includes at least one of an equal-frequency slicing mode and a chi-square slicing mode.
Step 3042, slicing the feature vector according to the dimension slicing mode.
Taking the example that the business event is a commodity sales event as an illustration, the target variable is whether the user participates in the sales promotion of the merchant, the first dimension feature vector is whether the user is a member of the merchant, and the second dimension feature vector is the commodity purchased by the user last time. Illustratively, chi-square slices are selected as the dimension slice approach. And respectively calculating chi-square values of the first dimension feature vector and the second dimension feature vector, and carrying out dimension slicing on the first dimension feature vector and the second dimension feature vector according to the chi-square values.
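Of the two slicing modes named in step 3041, equal-frequency slicing is the simpler one to illustrate; chi-square slicing is not shown. The sketch below chooses the slice boundaries so that each slice holds roughly the same number of samples. The function names and the sample values are assumptions.

```python
from bisect import bisect_right

def equal_frequency_edges(values, slices):
    """Return the slices - 1 inner boundaries that split values into equal-count slices."""
    ordered = sorted(values)
    size = len(ordered)
    return [ordered[(i * size) // slices] for i in range(1, slices)]

def slice_index(value, edges):
    """Return the index of the slice containing value (lower boundary inclusive)."""
    return bisect_right(edges, value)

# Hypothetical feature values 1..12 split into 3 slices of 4 samples each.
values = list(range(1, 13))
edges = equal_frequency_edges(values, 3)
slice_counts = [0, 0, 0]
for v in values:
    slice_counts[slice_index(v, edges)] += 1
```

Heavily repeated values can make the slices uneven, because equal values always land in the same slice.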
Step 305, calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table.
Taking the commodity sales event as an example, as shown in fig. 4 and fig. 5, each grid in the first target variable distribution table corresponds to the number of users participating in the sales promotion of the merchant in each section, and each grid in the second target variable distribution table corresponds to the ratio of the number of users participating in the sales promotion of the merchant in each section to the total number of users.
Step 306, determining, from the second target variable distribution table, the target area with the largest area formed by the grids whose target variable values satisfy a preset threshold.
Illustratively, as shown in fig. 6, the preset threshold is 0.006, and the grids whose target variable values are smaller than 0.006 are identified (the grids marked with dot-shaped shading in fig. 6). The target area is the area formed by the grids corresponding to the interval [a1, a8) of the feature vector a and the interval [b1, b5) of the feature vector b.
Step 307, determining a user class number k, wherein the user class number k characterizes k classes of the user variable, and the user class number k is used for dividing the target area into k areas.
Step 308, obtaining P permutation and combination modes for dividing the target area into k areas, where P is a positive integer.
Optionally, the target area can be divided into k areas in various forms. Fig. 7 (a) and fig. 7 (b) show two of these forms, in which grids with the same shading belong to the same class.
Step 309, calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes.
Illustratively, the variance of the target variable value in the classification form shown in fig. 7 (a) is calculated, and the variance of the target variable value in the classification form shown in fig. 7 (b) is calculated.
Step 310, determining the permutation and combination mode with the smallest variance as the target permutation and combination mode.
Illustratively, the variance of the target variable values in the classification form shown in fig. 7 (a) is smaller than the variance of the target variable values in the classification form shown in fig. 7 (b), and thus the classification form shown in fig. 7 (a) is used as the target permutation and combination.
Step 311, dividing the target area into k areas according to the target permutation and combination mode.
The target area is divided into k areas in the manner shown in fig. 7 (a).
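Steps 309 to 311 can be sketched as scoring each candidate division and keeping the one with the smallest variance. The sketch assumes that the variance of a division is the sum of the population variances of the target variable values inside each area; the text does not fix this definition. The candidate divisions are supplied by hand rather than enumerated, and all names are hypothetical. The values are the six grids of the rectangular target area taken from table two.

```python
from statistics import pvariance

def partition_variance(values, labels):
    """Sum of within-area population variances (assumed definition of the variance)."""
    groups = {}
    for value_row, label_row in zip(values, labels):
        for value, label in zip(value_row, label_row):
            groups.setdefault(label, []).append(value)
    return sum(pvariance(group) for group in groups.values())

def best_partition(values, candidates):
    """Return the name of the candidate division with the smallest variance."""
    return min(candidates, key=lambda name: partition_variance(values, candidates[name]))

# Target-area values from table two (3 income rows x 2 credit rating score columns).
values = [[0.006, 0.0025], [0.0059, 0.0003], [0.005, 0.002]]
# Two hand-written candidate divisions with k = 2; the labels mark the area of each grid.
candidates = {
    "by_column": [[0, 1], [0, 1], [0, 1]],
    "by_row": [[0, 0], [0, 0], [1, 1]],
}
target = best_partition(values, candidates)
```

For these values the column-wise division groups similar overdue rates together and is therefore selected.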
In summary, in the method provided in this embodiment, the first dimension feature vector and the second dimension feature vector are determined by calculating the information values corresponding to the feature vectors, and a first target variable distribution table and a second target variable distribution table corresponding to the number m×n of dimension slices are generated according to the first dimension feature vector and the second dimension feature vector. The area with the largest area that can be formed by the grids corresponding to the target variable values satisfying the preset threshold is determined from the second target variable distribution table as the target area, and the target area is divided into k areas, namely k user categories, according to the user class number k. There are P permutation and combination modes for dividing the target area into k areas (m, n, k and P are all positive integers); the mode with the smallest variance of target variable values is determined as the target permutation and combination mode, and the target area is divided according to it. According to the method provided by this embodiment, the target area is determined automatically from the feature vectors and is divided into k areas corresponding to the user class number k according to the minimum variance of the target variable values, so that the classification of the user variables is more reasonable and accurate.
The variable classification method based on the dimension slice is described below in connection with an information push business event. FIG. 8 illustrates a flow chart of a variable classification method based on dimension slices in an information push business event, according to an exemplary embodiment of the application. The method comprises the following steps:
Step 801, selecting a target variable.
Illustratively, the target variable is the information push frequency, which is counted in the application as the amount of information pushed every day.
Step 802, determining two dimensions that maximize discrimination between users based on information values.
Optionally, among the feature vectors, the two feature vectors having the largest information values are selected as the first dimension feature vector and the second dimension feature vector. Illustratively, the first dimension feature vector is the age of the user using the application program, and the second dimension feature vector is the frequency with which the user uses the application program (e.g., the time the user spends on the application program per day).
Step 803, a dimension slice number m×n is selected.
Step 804, selecting a slicing mode.
m and n are positive integers. Illustratively, the slicing mode is equidistant slicing, the number of slices corresponding to the first dimension feature vector is m, and the number of slices corresponding to the second dimension feature vector is n.
Step 805, generating an m×n table.
There are m×n lattices in the m×n table, and the m×n lattices correspond to m×n user groups.
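Equidistant slicing maps each user to one of the m×n grids by dividing each feature range into slices of equal width. The sketch below uses the ranges given in this embodiment (age in [8, 80), daily usage in [0, 24) hours); the slice counts m = 8 and n = 12 and the function names are assumptions chosen for illustration.

```python
def equidistant_index(value, low, high, slices):
    """Return the index of the equal-width slice of [low, high) that contains value."""
    if not low <= value < high:
        raise ValueError(f"{value} is outside [{low}, {high})")
    width = (high - low) / slices
    # min() guards against floating-point rounding at the upper boundary.
    return min(int((value - low) / width), slices - 1)

def cell_of(age, hours, m=8, n=12):
    """Return the (row, column) of the grid for a user; m and n are assumed slice counts."""
    return equidistant_index(age, 8, 80, m), equidistant_index(hours, 0, 24, n)
```

With these assumed slice counts, each age slice spans 9 years and each usage slice spans 2 hours.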
Step 806a, calculating the user group target variables within the m×n table.
The m×n table contains the intervals of the first dimension feature vector and the second dimension feature vector, and the user group target variable in each of the m×n grids is calculated according to these intervals, as shown in fig. 5. Illustratively, the interval corresponding to the first dimension feature vector is [8, 80) (unit: years of age), the interval corresponding to the second dimension feature vector is [0, 24) (unit: hours), and the number in each grid of the table shown in fig. 5 is the ratio of the number of users in the interval who do not use the application program to the number of users in the interval who use it. For example, if, in the acquired user information, 500 users are aged [8, 18) and use the application program with a frequency in the interval [0, 2), and 2 users in the corresponding interval do not use it, the value of the user group target variable is 2/500 = 0.004.
Step 806b, calculating the user quantity of the user population in the m×n table.
And calculating the user quantity of the user group in the m multiplied by n table according to the intervals corresponding to the first dimension feature vector and the second dimension feature vector. Alternatively, the user quantity may be obtained through a questionnaire or through an account of an application program registered by the user, and the manner of obtaining the user quantity is not limited by the present application.
Step 807, determining all grids in the m×n table that meet the target threshold according to the set target threshold.
Illustratively, if the target threshold (i.e., the preset threshold) is 0.006, the grids with target variable values less than 0.006 are identified as shown in fig. 6 (the grids with dot-shaped shading in the table). Optionally, the target threshold is adjusted according to the proportions of passing and non-passing users.
Step 808, outputting a maximum rectangle that maximizes the number of lattices that meet the target threshold.
In one example, the maximum rectangle is the area composed of the grids corresponding to the interval [a1, a8) and the interval [b1, b5).
The grids satisfying the target threshold are necessarily scattered across different positions of the m×n grids. Because a regular rectangle is delineated so that the rules remain regular and easy to handle, the grids corresponding to the intervals [a1, a3) and [b3, b5) are also included in the maximum rectangle.
In the actual process, the size of the grid can be adjusted by dividing the fault tolerance of the grid, and the formula of the fault tolerance is as follows:
where ftr is the fault tolerance; cnt_yi is the number of users in the i-th grid whose target value passes (i.e., the number of users in a grid with dot-shaped shading); cnt_ni is the number of users in the i-th grid whose target value does not pass (i.e., the number of users in a grid without dot-shaped shading that is included redundantly when the large rectangle is delineated, such as the grid containing "0.007" in fig. 6); g_yi is the number of the i-th passing grids (i.e., the number of grids with dot-shaped shading); and g_ni is the number of the i-th non-passing grids (i.e., the number of grids without dot-shaped shading that are included redundantly when the large rectangle is delineated).
In the course of operating the business event, if the number of non-passing users does not need to be considered, the weight may be omitted.
Step 809, input class number k of user clusters.
Illustratively, k is 4. Optionally, after the target variables of the user group are divided into 4 classes, each grid may be marked with its class according to the threshold of each class.
Step 810, outputting a plurality of regular rectangles of clusters of different color identifications.
Optionally, the output rectangular areas may be color-coded or embossed. According to the number of user clusters (i.e., the user class number), the delineated rectangular space is divided, for example, into 4 rectangles, and a differentiated strategy rule is executed for each rectangle.
Optionally, dividing the target area into k rectangles has P permutation and combination modes, where k and P are positive integers. Selecting a target permutation from the P permutations entails calculating variances of target variable values corresponding to each permutation and combination, optionally the variances include intra-class variances and inter-class variances, the method comprising the steps of:
S1, calculating the intra-class variance of the target variable value corresponding to each of the P permutation and combination modes.
S2, calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes.
S3, calculating the sum of the intra-class variance and the inter-class variance.
To support differentiated strategies, the user quantity of each cluster must also be taken into account, so a user quantity factor is added: the variance of the user quantity within each class should also be minimal. The calculation formula is as follows:
where VF is a function for judging the quality of the clustering result, and the smaller its value, the better; Var_cnt_i is the variance of the ratio of the user quantity of the i-th class to the total user quantity of the grouping; Var_rate_i is the variance of the target value (e.g., the overdue rate) of the users in the i-th class; λ is a user-defined weight for Var_cnt_i: if the number of users in each group of the grouping result is not required to be very uniform, that is, the user quantity ratios may differ greatly, λ may be set to 0; and n is a positive integer.
S4, determining the permutation and combination mode with the minimum variance sum as a target permutation and combination mode.
S5, dividing the target area into k areas according to the target arrangement and combination mode, wherein k is a positive integer.
S6, obtaining classification results of the user variables according to the k areas, wherein the classification results comprise the user variables and target variable values corresponding to the user variables.
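A scoring function in the spirit of VF can be sketched from the symbol definitions given above. The additive form VF = Σ_i (Var_rate_i + λ · Var_cnt_i) used here is an assumption made purely for illustration, as are the function name `vf_score` and the input layout; only the roles of Var_rate_i, Var_cnt_i and λ are taken from the description. The sample grids pair the overdue rates of table two with the user counts of table one.

```python
from statistics import pvariance

def vf_score(classes, lam=1.0):
    """Score a clustering; smaller is better.

    classes: list of classes, each a list of (target_rate, user_count) grids.
    Assumed form: sum over classes of Var_rate_i + lam * Var_cnt_i.
    """
    total_users = sum(count for grids in classes for _, count in grids)
    score = 0.0
    for grids in classes:
        var_rate = pvariance([rate for rate, _ in grids])
        # Variance of each grid's share of the total user quantity.
        var_cnt = pvariance([count / total_users for _, count in grids])
        score += var_rate + lam * var_cnt
    return score

# Grids of the rectangular target area: (overdue rate, number of users).
by_column = [[(0.006, 500), (0.0059, 510), (0.005, 800)],
             [(0.0025, 400), (0.0003, 350), (0.002, 500)]]
by_row = [[(0.006, 500), (0.0025, 400), (0.0059, 510), (0.0003, 350)],
          [(0.005, 800), (0.002, 500)]]
score_column = vf_score(by_column, lam=0)
score_row = vf_score(by_row, lam=0)
```

Setting `lam=0` ignores the uniformity of user quantities, which corresponds to the case described above in which the groups need not be of similar size.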
Optionally, the steps further include: 1. adjusting the shape of the target area according to the classification result; 2. dividing the target area again according to the shape of the target area and the user class number k. The rectangular distribution is adjusted according to whether each user cluster has a user quantity requirement.
Step 811, outputting a report of the user statistics and target variables for the users who pass and do not pass the rules, together with the user target variable reports under the strategy rules of the different user clusters.
In one example, the policy analyst determines to push 6 pieces of information per day to users aged 20 to 25 years old, 4 pieces of information per day to users aged 25 to 30 years old, and 2 pieces of information per day to users aged 40 to 50 years old.
In summary, in the method provided in this embodiment, the maximum rectangle is defined from the lattices corresponding to the target variable values meeting the target threshold, and the maximum rectangle can be divided into a plurality of rectangles by inputting the class number of the user clusters, that is, the user groups are divided.
The following is a device embodiment of the present application. For details not described in the device embodiment, reference may be made to the corresponding descriptions in the method embodiments above, which are not repeated here.
Fig. 9 shows a schematic structural diagram of a variable classification device based on dimension slices according to an exemplary embodiment of the application. The apparatus may be implemented as all or part of a terminal by software, hardware or a combination of both, the apparatus comprising:
a selection module 910, configured to select a first dimension feature vector and a second dimension feature vector that have a correlation with a target variable, where the target variable is a variable corresponding to a service event, and the service event is an event corresponding to a service that performs differentiated operation on a user;
the generating module 920 is configured to determine the number m×n of dimension slices according to the first dimension feature vector and the second dimension feature vector, generate a first target variable distribution table and a second target variable distribution table that contain m×n lattices, where the first target variable distribution table corresponds to the second target variable distribution table one by one, and m and n are both positive integers;
a calculating module 930, configured to calculate a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table;
a processing module 940, configured to determine, from the second target variable distribution table, a target area with a largest area formed by lattices corresponding to target variable values that satisfy a preset threshold;
The classification module 950 is configured to divide the target area into k areas, where k areas are used to divide the user variable corresponding to the target area into k categories, and k is a positive integer.
In an alternative embodiment, the processing module 940 is configured to determine a number k of user classes, where the number k of user classes characterizes k categories of the user variable, and the number k of user classes is used to divide the target area into k areas;
the selecting module 910 is configured to obtain P permutation and combination modes for dividing the target area into k areas, where P is a positive integer;
the processing module 940 is configured to determine a target permutation and combination manner from the P permutation and combination manners, and divide the target region into k regions according to the target permutation and combination manner.
In an optional embodiment, the calculating module 930 is configured to calculate a variance of the target variable value corresponding to each of the P permutation and combination manners;
the processing module 940 is configured to determine, as the target permutation and combination mode, the permutation and combination mode with the smallest variance; dividing the target area into k areas according to the target arrangement and combination mode.
In an alternative embodiment, the variances include an intra-class variance, which is a variance corresponding between target variable values belonging to the same class, and an inter-class variance, which is a variance corresponding between target variable values belonging to different classes;
The calculating module 930 is configured to calculate an intra-class variance of the target variable value corresponding to each of the P permutation and combination manners; calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes; the sum of the intra-class variance and the inter-class variance is calculated.
In an alternative embodiment, the processing module 940 is configured to divide the target area into k areas according to the target permutation and combination mode;
the selection module 910 is configured to obtain, according to the k regions, a classification result of the user variable, where the classification result includes the user variable and a target variable value corresponding to the user variable.
In an alternative embodiment, the processing module 940 is configured to adjust the shape of the target area according to the classification result; and dividing the target area again according to the shape of the target area and the user class number k.
In an alternative embodiment, the processing module 940 is configured to determine a target variable;
the calculating module 930 is configured to calculate, according to the target variable, an information value corresponding to the feature vector, where the information value is used to characterize a correlation between the feature vector and the target variable;
the selecting module 910 is configured to select a first dimension feature vector and a second dimension feature vector that have a correlation with the target variable according to the information value corresponding to the feature vector.
In an optional embodiment, the processing module 940 is configured to select a dimension slicing mode, where the dimension slicing mode includes at least one of an equal frequency slicing mode and a chi-square slicing mode; and slicing the feature vector according to the dimension slicing mode.
In an alternative embodiment, the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
Referring to FIG. 10, a block diagram of a computer device 1000 according to an exemplary embodiment of the application is shown. The computer device 1000 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player. The computer device 1000 may also be referred to by other names, such as user equipment or portable terminal.
In general, the computer device 1000 includes: a processor 1001 and a memory 1002.
The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be tangible and non-transitory. Memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement the dimensional slice-based variable classification method provided in the present application.
In some embodiments, the computer device 1000 may further optionally include: a peripheral interface 1003, and at least one peripheral. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, touch display 1005, camera 1006, audio circuitry 1007, positioning component 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may also include NFC (Near Field Communication) related circuitry, which is not limited by the present application.
The touch display 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display 1005 also has the ability to capture touch signals at or above the surface of the touch display 1005. The touch signal may be input to the processor 1001 as a control signal for processing. Touch display 1005 is used to provide virtual buttons and/or virtual keyboards, also known as soft buttons and/or soft keyboards. In some embodiments, the touch display 1005 may be one, providing a front panel of the computer device 1000; in other embodiments, the touch display 1005 may be at least two, respectively disposed on different surfaces of the computer device 1000 or in a folded design; in still other embodiments, touch display 1005 may be a flexible display disposed on a curved surface or a folded surface of computer device 1000. Even more, the touch display 1005 may be arranged in an irregular pattern other than rectangular, i.e., a shaped screen. The touch display 1005 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. In general, the front camera is used for video calls or self-portraits, and the rear camera is used for taking pictures or videos. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, and a wide-angle camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions. In some embodiments, the camera assembly 1006 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm light flash and a cold light flash, and can be used for light compensation at different color temperatures.
The audio circuit 1007 is used to provide an audio interface between the user and the computer device 1000. The audio circuit 1007 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 1001 for processing or to the radio frequency circuit 1004 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, each disposed at a different location of the computer device 1000. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert the electrical signal not only into sound waves audible to humans, but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1007 may also include a headphone jack.
The positioning component 1008 is used to determine the current geographic location of the computer device 1000 to enable navigation or LBS (Location Based Service). The positioning component 1008 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1009 is used to power the various components in the computer device 1000. The power supply 1009 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.
In some embodiments, the computer device 1000 also includes one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
The acceleration sensor 1011 can detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with respect to the computer device 1000. For example, the acceleration sensor 1011 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1001 may control the touch display 1005 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used to collect motion data of a game or a user.
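As a rough illustration of the orientation decision described above, the following Python sketch picks a view from two gravity components. The function name `choose_orientation` and the axis convention (x along the short edge of the screen, y along the long edge) are assumptions made for this example and are not taken from the application.

```python
def choose_orientation(gravity_x: float, gravity_y: float) -> str:
    """Pick a view from the gravity components along two screen axes.

    Assumed convention: x runs along the short edge of the screen and
    y along the long edge, so an upright device sees gravity mostly on y.
    """
    if abs(gravity_x) > abs(gravity_y):
        return "landscape"
    return "portrait"


# Device held upright: gravity falls almost entirely on the y axis.
upright = choose_orientation(0.3, 9.7)
# Device turned on its side: gravity falls almost entirely on the x axis.
sideways = choose_orientation(9.6, 0.5)
```

A real device would typically also filter the signal and add hysteresis so that the view does not flip when the two components are nearly equal.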
The gyro sensor 1012 may detect a body direction and a rotation angle of the computer device 1000, and the gyro sensor 1012 may collect a 3D motion of the user on the computer device 1000 together with the acceleration sensor 1011. The processor 1001 may implement the following functions according to the data collected by the gyro sensor 1012: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1013 may be disposed on a side frame of the computer device 1000 and/or on a lower layer of the touch display 1005. When the pressure sensor 1013 is disposed on a side frame of the computer device 1000, it can detect the user's grip signal on the computer device 1000, and left/right-hand recognition or shortcut operations can be performed according to the grip signal. When the pressure sensor 1013 is disposed on the lower layer of the touch display 1005, operable controls on the UI can be controlled according to the user's pressure operation on the touch display 1005. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1014 is used to collect the user's fingerprint so that the user's identity can be recognized from the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1014 may be disposed on the front, back, or side of the computer device 1000. When a physical key or a vendor logo is provided on the computer device 1000, the fingerprint sensor 1014 may be integrated with the physical key or the vendor logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the touch display 1005 based on the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the touch display 1005 is turned up; when the ambient light intensity is low, the display brightness of the touch display 1005 is turned down. In another embodiment, the processor 1001 may dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
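The brightness rule above (brighter surroundings, brighter screen) can be sketched as a clamped linear mapping. The function name `display_brightness`, the lux ceiling, and the brightness range are illustrative assumptions only; the application does not specify a mapping.

```python
def display_brightness(ambient_lux: float,
                       max_lux: float = 1000.0,
                       min_level: float = 0.1,
                       max_level: float = 1.0) -> float:
    """Map ambient light intensity to a display brightness level.

    All default values are hypothetical: readings at or above max_lux
    give max_level, and darkness gives min_level.
    """
    ratio = min(max(ambient_lux / max_lux, 0.0), 1.0)  # clamp to [0, 1]
    return min_level + ratio * (max_level - min_level)


dim_room = display_brightness(50.0)
daylight = display_brightness(5000.0)
```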
The proximity sensor 1016, also referred to as a distance sensor, is typically disposed on the front of the computer device 1000. The proximity sensor 1016 is used to measure the distance between the user and the front of the computer device 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front of the computer device 1000 gradually decreases, the processor 1001 controls the touch display 1005 to switch from the screen-on state to the screen-off state; when the proximity sensor 1016 detects that the distance between the user and the front of the computer device 1000 gradually increases, the processor 1001 controls the touch display 1005 to switch from the screen-off state to the screen-on state.
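The screen switching described above amounts to a two-state machine driven by the trend of the distance reading. The sketch below is a simplification under assumptions: the function name `next_screen_state` is hypothetical, and it compares only two consecutive readings, whereas a real device would smooth the signal over several samples.

```python
def next_screen_state(state: str, previous_distance: float,
                      current_distance: float) -> str:
    """Return the next screen state ("on" or "off").

    A decreasing distance switches the screen off, an increasing
    distance switches it on, and an unchanged distance keeps the state.
    """
    if current_distance < previous_distance:
        return "off"
    if current_distance > previous_distance:
        return "on"
    return state


state = "on"
readings = [10.0, 6.0, 2.0, 2.0, 5.0]  # e.g. phone raised to the ear, then lowered
history = []
for previous, current in zip(readings, readings[1:]):
    state = next_screen_state(state, previous, current)
    history.append(state)
```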
Those skilled in the art will appreciate that the structure shown in FIG. 10 does not limit the computer device 1000, which may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
The present application also provides a computer device comprising a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the variable classification method based on dimension slice provided by the above method embodiments.
The present application also provides a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set loaded and executed by a processor to implement the dimensional slice-based variable classification method provided by the above method embodiments.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application, but rather, the application is to be construed as limited to the appended claims.

Claims (9)

1. A variable classification method based on dimension slicing, the method comprising:
selecting a first dimension feature vector and a second dimension feature vector which have correlation with a target variable, wherein the target variable is a variable corresponding to a business event, and the business event is an event corresponding to a business for differentially operating a user;
determining the number m × n of dimension slices according to the first dimension feature vector and the second dimension feature vector, and generating a first target variable distribution table and a second target variable distribution table each containing m × n grids, wherein the grids of the first target variable distribution table correspond one-to-one to the grids of the second target variable distribution table, and m and n are positive integers;
calculating a user variable corresponding to each grid in the first target variable distribution table and a target variable value corresponding to each grid in the second target variable distribution table;
determining a target area with the largest area formed by grids corresponding to target variable values meeting a preset threshold value from the second target variable distribution table;
dividing the target area into k areas, wherein the k areas are used for dividing user variables corresponding to the target area into k categories, and k is a positive integer;
the dividing the target region into k regions includes:
determining a user class number k, wherein the user class number k characterizes k classes of the user variable, and the user class number k is used for dividing the target area into k areas;
obtaining P permutation and combination modes for dividing the target area into k areas, wherein P is a positive integer;
determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target region into k regions according to the target permutation and combination mode;
the determining a target permutation and combination mode from the P permutation and combination modes, and dividing the target region into the k regions according to the target permutation and combination mode, comprises:
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
determining the permutation and combination mode with the smallest variance as the target permutation and combination mode;
dividing the target area into k areas according to the target arrangement and combination mode.
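The steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the records are assumed to be already sliced into grid indices; the user variable of a grid is taken to be its user count and the target variable value its mean 0/1 outcome; "meeting the threshold" is read as "less than or equal to"; the target area is taken to be the largest 4-connected group of qualifying grids; the P modes are enumerated exhaustively; and the variance of a mode is the sum of per-class population variances. All function names are hypothetical.

```python
from itertools import product
from statistics import pvariance


def build_tables(records, m, n):
    """Return (user counts, target rates) as two m x n tables."""
    counts = [[0] * n for _ in range(m)]
    events = [[0] * n for _ in range(m)]
    for i, j, y in records:  # i, j: slice indices; y: 0/1 target variable
        counts[i][j] += 1
        events[i][j] += y
    rates = [[events[i][j] / counts[i][j] if counts[i][j] else None
              for j in range(n)] for i in range(m)]
    return counts, rates


def largest_region(rates, threshold):
    """Largest 4-connected set of grids whose rate is <= threshold (assumed rule)."""
    m, n = len(rates), len(rates[0])

    def ok(i, j):
        return rates[i][j] is not None and rates[i][j] <= threshold

    seen, best = set(), []
    for i in range(m):
        for j in range(n):
            if (i, j) in seen or not ok(i, j):
                continue
            seen.add((i, j))
            stack, region = [(i, j)], []
            while stack:  # depth-first flood fill
                a, b = stack.pop()
                region.append((a, b))
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= c < m and 0 <= d < n and (c, d) not in seen and ok(c, d):
                        seen.add((c, d))
                        stack.append((c, d))
            if len(region) > len(best):
                best = region
    return sorted(best)


def best_partition(cells, rates, k):
    """Try every labelling of the cells with k classes; keep the smallest variance."""
    if len(cells) < k:
        raise ValueError("the target area has fewer grids than classes")
    best_labels, best_var = None, float("inf")
    for labels in product(range(k), repeat=len(cells)):
        if len(set(labels)) < k:  # every class must be used
            continue
        groups = [[rates[i][j] for (i, j), lab in zip(cells, labels) if lab == g]
                  for g in range(k)]
        var = sum(pvariance(g) for g in groups)
        if var < best_var:
            best_labels, best_var = labels, var
    return dict(zip(cells, best_labels)), best_var


# Toy data on a 2 x 3 grid: (first-dimension slice, second-dimension slice, outcome).
records = (
    [(0, 0, 0)] * 2 + [(0, 1, 0)] * 2 + [(0, 2, 1), (0, 2, 0)]
    + [(1, 0, 1)] + [(1, 0, 0)] * 3 + [(1, 1, 1)] + [(1, 1, 0)] * 3 + [(1, 2, 1)]
)
counts, rates = build_tables(records, m=2, n=3)
region = largest_region(rates, threshold=0.3)
classes, variance = best_partition(region, rates, k=2)
```

Exhaustive enumeration grows as k to the power of the number of grids, so it is only practical for small target areas; a larger table would need a pruned or heuristic search.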
2. The method of claim 1, wherein the variance includes an intra-class variance, which is the variance among the target variable values belonging to the same class, and an inter-class variance, which is the variance among the target variable values belonging to different classes;
calculating the variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes comprises the following steps:
calculating the intra-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
calculating the inter-class variance of the target variable value corresponding to each permutation and combination mode in the P permutation and combination modes;
and calculating the sum of the intra-class variance and the inter-class variance.
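One possible reading of the calculation in claim 2 is sketched below. The claim does not give formulas, so the definitions here are assumptions: the intra-class variance is the sum of the population variances within each class, and the inter-class variance is the population variance of the class means. The terms are left unweighted on purpose: if both were weighted by class size, their sum would equal the overall variance of all values and would be identical for every permutation and combination mode. The name `variance_terms` and the candidate modes are hypothetical.

```python
from statistics import mean, pvariance


def variance_terms(groups):
    """Return (intra, inter, total) for one way of grouping target variable values."""
    intra = sum(pvariance(values) for values in groups)      # within each class
    inter = pvariance([mean(values) for values in groups])   # between class means
    return intra, inter, intra + inter


# Two candidate modes for splitting four grid values into k = 2 classes.
candidates = {
    "A": [[0.0, 0.0], [0.25, 0.25]],
    "B": [[0.0, 0.0, 0.25], [0.25]],
}
scores = {name: variance_terms(groups)[2] for name, groups in candidates.items()}
target = min(scores, key=scores.get)  # the mode with the smallest summed variance
```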
3. The method according to claim 1 or 2, characterized in that the method further comprises:
dividing the target area into k areas according to the target arrangement and combination mode;
and obtaining a classification result of the user variable according to the k areas, wherein the classification result comprises the user variable and a target variable value corresponding to the user variable.
4. A method according to claim 3, characterized in that the method further comprises:
adjusting the shape of the target area according to the classification result;
and dividing the target area again according to the shape of the target area and the user class number k.
5. The method according to claim 1 or 2, wherein the selecting a first dimension feature vector and a second dimension feature vector having a correlation with the target variable comprises:
determining the target variable;
calculating an information value corresponding to the feature vector according to the target variable, wherein the information value is used for representing the correlation between the feature vector and the target variable;
and selecting the first-dimension feature vector and the second-dimension feature vector which have correlation with the target variable according to the information value corresponding to the feature vector.
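The selection in claim 5 can be illustrated with the information value (IV) formula commonly used in credit scoring, which sums (share of non-events minus share of events) times the natural log of their ratio over the bins of a feature. Whether the application uses exactly this formula is an assumption, as are the feature names, the toy counts, and the choice to skip bins in which either share is zero.

```python
import math


def information_value(bins):
    """bins: list of (non_event_count, event_count) pairs, one per feature bin."""
    total_non_events = sum(non_events for non_events, _ in bins)
    total_events = sum(events for _, events in bins)
    iv = 0.0
    for non_events, events in bins:
        p_non = non_events / total_non_events
        p_evt = events / total_events
        if p_non == 0 or p_evt == 0:
            continue  # assumption: skip bins where the log ratio is undefined
        iv += (p_non - p_evt) * math.log(p_non / p_evt)
    return iv


# Hypothetical candidate features with made-up bin counts.
features = {
    "order_count": [(80, 20), (20, 80)],
    "city_tier": [(50, 50), (50, 50)],
    "age_band": [(60, 40), (40, 60)],
}
ranked = sorted(features, key=lambda name: information_value(features[name]),
                reverse=True)
first_dimension, second_dimension = ranked[:2]
```

Here the two features with the highest information value are taken as the first-dimension and second-dimension feature vectors; other selection rules are possible.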
6. The method of claim 1, wherein the determining the number m × n of dimension slices from the first dimension feature vector and the second dimension feature vector comprises:
selecting a dimension slicing mode, wherein the dimension slicing mode comprises at least one of an equal-frequency slicing mode and a chi-square slicing mode;
and slicing the feature vector according to the dimension slicing mode.
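Of the two slicing modes named in claim 6, equal-frequency slicing is the simpler to illustrate: the cut points are chosen so that each slice receives roughly the same number of values. The sketch below is one straightforward way to do this; the function names and the sample values are hypothetical, and chi-square slicing is not shown.

```python
from bisect import bisect_right
from collections import Counter


def equal_frequency_edges(values, slices):
    """Return slices - 1 cut points that split the sorted values evenly."""
    ordered = sorted(values)
    count = len(ordered)
    return [ordered[(count * q) // slices] for q in range(1, slices)]


def assign_slice(value, edges):
    """Index of the slice a value falls into (a cut point starts a new slice)."""
    return bisect_right(edges, value)


values = [7, 1, 12, 4, 9, 2, 11, 5, 3, 10, 6, 8]
edges = equal_frequency_edges(values, slices=3)
sizes = Counter(assign_slice(v, edges) for v in values)
```

With many repeated values the slices can end up unequal, because identical values always fall into the same slice.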
7. The method according to claim 1 or 2, wherein the business event comprises: at least one of a financial risk business event, a merchandise sales business event, and an information push business event.
8. A computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement the dimensional slice-based variable classification method of any of claims 1-7.
9. A computer storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the dimensional slice-based variable classification method of any of claims 1 to 7.
CN201911395277.8A 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice Active CN111144505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395277.8A CN111144505B (en) 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice

Publications (2)

Publication Number Publication Date
CN111144505A CN111144505A (en) 2020-05-12
CN111144505B true CN111144505B (en) 2023-09-01

Family

ID=70521912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395277.8A Active CN111144505B (en) 2019-12-30 2019-12-30 Variable classification method, device, equipment and medium based on dimension slice

Country Status (1)

Country Link
CN (1) CN111144505B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269815A (en) * 2020-10-29 2021-01-26 维沃移动通信有限公司 Structured data processing method and device and electronic equipment
CN112308466A (en) * 2020-11-26 2021-02-02 东莞市盟大塑化科技有限公司 Enterprise qualification auditing method and device, computer equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014127224A1 (en) * 2013-02-14 2014-08-21 The Research Foundation For The State University Of New York Method for adaptive computer-aided detection of pulmonary nodules in thoracic computed tomography images using hierarchical vector quantization and apparatus for same
CN105574538A (en) * 2015-12-10 2016-05-11 小米科技有限责任公司 Classification model training method and apparatus
CN106874687A (en) * 2017-03-03 2017-06-20 深圳大学 Pathological section image intelligent sorting technique and device
CN108182452A (en) * 2017-12-29 2018-06-19 哈尔滨工业大学(威海) Aero-engine fault detection method and system based on grouping convolution self-encoding encoder
WO2018157381A1 (en) * 2017-03-03 2018-09-07 深圳大学 Method and apparatus for intelligently classifying pathological slice image
CN109271460A (en) * 2018-09-29 2019-01-25 阿里巴巴集团控股有限公司 The method and apparatus classified to the trade company in e-platform
CN109285075A (en) * 2017-07-19 2019-01-29 腾讯科技(深圳)有限公司 A kind of Claims Resolution methods of risk assessment, device and server
CN109636530A (en) * 2018-12-14 2019-04-16 拉扎斯网络科技(上海)有限公司 Product determines method, apparatus, electronic equipment and computer readable storage medium
CN109815987A (en) * 2018-12-27 2019-05-28 北京卓思天成数据咨询股份有限公司 A kind of listener clustering method and categorizing system
CN109840542A (en) * 2018-12-06 2019-06-04 北京化工大学 Adaptive dimension Decision-Tree Method based on polarization characteristic
CN110135509A (en) * 2019-05-21 2019-08-16 重庆斐耐科技有限公司 A kind of intelligent finance credit-graded approach neural network based
CN110187334A (en) * 2019-05-28 2019-08-30 深圳大学 A kind of target monitoring method, apparatus and computer readable storage medium
CN110276552A (en) * 2019-06-21 2019-09-24 深圳前海微众银行股份有限公司 Risk analysis method, device, equipment and readable storage medium storing program for executing before borrowing
CN110288038A (en) * 2019-06-28 2019-09-27 深圳前海微众银行股份有限公司 A kind of classification method and device of enterprise
CN110503344A (en) * 2019-08-28 2019-11-26 国网经济技术研究院有限公司 A kind of full category item overall process differentiation various dimensions Classification Management strategy
CN110555627A (en) * 2019-09-10 2019-12-10 拉扎斯网络科技(上海)有限公司 Entity display method, entity display device, storage medium and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10115039B2 (en) * 2016-03-10 2018-10-30 Siemens Healthcare Gmbh Method and system for machine learning based classification of vascular branches
US10492723B2 (en) * 2017-02-27 2019-12-03 Case Western Reserve University Predicting immunotherapy response in non-small cell lung cancer patients with quantitative vessel tortuosity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant