CN109101562B - Method, device, computer equipment and storage medium for searching target group - Google Patents

Method, device, computer equipment and storage medium for searching target group

Info

Publication number
CN109101562B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810771080.9A
Other languages
Chinese (zh)
Other versions
CN109101562A (en)
Inventor
周南光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201810771080.9A priority Critical patent/CN109101562B/en
Publication of CN109101562A publication Critical patent/CN109101562A/en
Application granted granted Critical
Publication of CN109101562B publication Critical patent/CN109101562B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method for searching for a target group, comprising the following steps: acquiring a plurality of pre-selected samples; acquiring, from a plurality of features included in the plurality of pre-selected samples, a first feature with the greatest influence on the information quantity of the plurality of pre-selected samples; dividing the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first feature; screening, from the first samples, target first samples that meet a first preset condition; acquiring, from a plurality of features included in a target first sample, a second feature with the greatest influence on the information quantity of the target first sample; dividing the target first sample into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second feature; judging whether a target second sample meeting a second preset condition exists among the second specified number of second samples; and if so, determining the target second sample to be the corresponding target group.

Description

Method, device, computer equipment and storage medium for searching target group
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, a computer device, and a storage medium for searching a target group.
Background
Existing client data takes the form of big data, in which it is difficult to find a required specific group. In existing applications, however, a target group that meets the requirements has to be screened out of a large database so that work can be directed at that group directly and effectively, which improves working efficiency, makes the working target more specific and makes the results more evident. A method for accurately searching for a target group in big data therefore has practical application value.
Disclosure of Invention
The main purpose of the application is to provide a method for searching target groups, which aims to solve the technical problem that the required specific groups are difficult to find in big data.
The application provides a method for searching a target group, which comprises the following steps:
acquiring a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to a plurality of characteristics of a user respectively;
acquiring a first characteristic with the greatest influence on the information quantity of the plurality of pre-selected samples from a plurality of characteristics included in the plurality of pre-selected samples;
dividing the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first feature;
screening target first samples meeting a first preset condition from each first sample, wherein the number of the target first samples is one or more;
acquiring a second feature with the greatest influence on the information quantity of the target first sample from a plurality of features included in the target first sample, wherein the second feature is different from the first feature;
dividing the target first samples into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second features;
judging whether target second samples meeting a second preset condition exist in the second specified number of second samples;
if yes, stopping dividing the target second samples, and judging that the target second samples meeting the second preset condition are corresponding target groups.
Preferably, the step of acquiring the first feature having the greatest influence on the information amount of the plurality of pre-selected samples from the plurality of features included in the plurality of pre-selected samples includes:
calculating the total information quantity of the preselected sample;
acquiring the influence value of each feature on the total information quantity;
the features are arranged in a descending order according to the magnitude of the influence value;
and setting the feature corresponding to the first influence value at the forefront of the arrangement sequence in the descending order as the first feature.
Preferably, before the step of dividing the plurality of pre-selected samples into the first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature, the method further includes:
acquiring the attribute of the first feature;
and determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature.
Preferably, the attribute of the first feature is a category type, and the step of determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes:
and dividing the plurality of pre-selected samples into a first specified number of first samples corresponding to the category types according to the category types of the first features, wherein the first specified number is the category type number of the first features.
Preferably, the attribute of the first feature is a numerical type, and the step of determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes:
dividing the pre-selected samples into a first specified number of first samples corresponding to discrete intervals, according to the discrete intervals into which the continuous data representing the first feature is discretized, wherein the first specified number is the number of discrete intervals corresponding to the continuous data of the first feature.
Preferably, the step of determining whether a target second sample satisfying a second preset condition exists in the second specified number of second samples includes:
obtaining a designated second sample corresponding to the maximum purchase rate by comparing the purchase rates corresponding to the second samples respectively;
judging whether the maximum purchase rate corresponding to the designated second sample reaches the purchase rate corresponding to the second preset condition;
if so, determining that a target second sample meeting a second preset condition exists.
Preferably, the step of determining whether a target second sample satisfying a second preset condition exists in the second specified number of second samples includes:
obtaining a designated second sample corresponding to the maximum purchase rate by comparing the purchase rates corresponding to the second samples respectively;
judging whether the maximum purchase rate corresponding to the designated second sample reaches the purchase rate corresponding to the second preset condition;
if yes, judging whether the total data amount of the designated second sample is larger than a preset amount;
if the total data amount is larger than the preset amount, determining that a target second sample meeting the second preset condition exists.
The application also provides a device for searching the target group, which comprises:
a first acquisition module, configured to acquire a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to a plurality of features of a user respectively;
a second acquisition module, configured to acquire, from a plurality of features included in the plurality of pre-selected samples, a first feature having the greatest influence on the information quantity of the plurality of pre-selected samples;
a first dividing module, configured to divide the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first feature;
a screening module, configured to screen, from the first samples, target first samples meeting a first preset condition, wherein the number of target first samples is one or more;
a third acquisition module, configured to acquire, from a plurality of features included in the target first sample, a second feature having the greatest influence on the information quantity of the target first sample, wherein the second feature is different from the first feature;
a second dividing module, configured to divide the target first samples into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second feature;
a judging module, configured to judge whether target second samples meeting a second preset condition exist among the second specified number of second samples;
and a determination module, configured to stop dividing the target second samples if such samples exist, and to determine the target second samples meeting the second preset condition to be the corresponding target group.
The present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the above-mentioned method.
The present application uses a decision tree model to find the features whose importance coefficients have the greatest influence on the target group, which improves both the efficiency and the accuracy of finding the target group. By refining and dividing the pre-selected samples according to the feature with the greatest influence on the information quantity, it searches for the target group step by step and enables the target group to be used and managed effectively. By collecting the features of the target group into a feature set, it forms a user portrait of the target group labelled with that feature set, which makes it easier to develop potential customers who carry the same feature set as a label.
Drawings
FIG. 1 is a flow chart of a method for searching a target group according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for searching a target group according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a second obtaining module according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for searching for a target group according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of a third partition module according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a third partition module according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a judging module according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a judging module according to another embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for searching for a target group according to still another embodiment of the present application;
FIG. 10 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, a method for searching a target group according to an embodiment of the present application includes:
S1a: a plurality of pre-selected samples are obtained, wherein each pre-selected sample comprises user data respectively corresponding to a plurality of features of a user.
The pre-selected samples of this embodiment contain a large amount of user data. For example, the pre-selected samples cover 10 million users, each with a plurality of features. The user with the most features among the 10 million is identified, and that user's features and feature count serve as the basis: if user A has the most features in the pre-selected samples, covering 100 dimensions, then the same 100 feature dimensions are taken for every user in the pre-selected samples, for example name, age, gender, region, height, weight, product purchase frequency, purchase preference and the like.
S1: and acquiring a first characteristic with the greatest influence on the information quantity of the plurality of pre-selected samples from the plurality of characteristics included in the plurality of pre-selected samples.
In this embodiment, the first feature with the greatest influence on the pre-selected samples is the feature that causes the greatest fluctuation in the information quantity of the pre-selected samples. Take the influence on the purchase rate of the pre-selected samples as an example. First, following the decision-tree calculation method, the 100 feature dimensions in the pre-selected samples are arranged with one feature per column and the corresponding data listed under it. The purchase-rate value corresponding to each feature is then calculated from that feature's data, the 100 features are sorted in descending order of this value, and the 10 features at the front of the descending order are selected for dividing the pre-selected samples. The feature ranked first in the descending order is the first feature with the greatest influence on the information quantity of the pre-selected samples.
S2: and dividing the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first features.
For example, each feature is divided in advance into a specified number of classification partitions, so the first feature corresponds to a first specified number of classification partitions. The pre-selected samples are divided into first samples according to the classification partitions of the first feature, and the first specified number equals the number of classification partitions of the first feature, with one first sample per partition. For example, if the first feature is gender, which has two classification partitions, male and female, the first specified number is two and the pre-selected samples can be divided into two first samples: a female first sample and a male first sample. As another example, if the first feature is age, which is discretized in advance into the five classification partitions [0,20), [20,40), [40,60), [60,80) and [80,100], the first specified number is five and the pre-selected samples can be divided into five first samples.
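By way of a non-limiting illustration, the division in step S2 can be sketched in Python as follows. The record layout, the `AGE_BINS` table and the helper names `age_partition` and `divide` are assumptions made for this sketch and do not appear in the application.

```python
from collections import defaultdict

# Hypothetical age partitions from the example: [0,20), [20,40), [40,60), [60,80), [80,100].
AGE_BINS = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]


def age_partition(age):
    """Return the label of the age interval containing `age` (the last interval is closed)."""
    for low, high in AGE_BINS:
        last = (low, high) == AGE_BINS[-1]
        if low <= age < high or (last and age == high):
            closing = "]" if last else ")"
            return f"[{low},{high}{closing}"
    raise ValueError(f"age {age} is outside 0..100")


def divide(samples, feature, partition_of=lambda value: value):
    """Group user records by the classification partition of `feature`."""
    groups = defaultdict(list)
    for record in samples:
        groups[partition_of(record[feature])].append(record)
    return dict(groups)


users = [
    {"gender": "female", "age": 25},
    {"gender": "male", "age": 61},
    {"gender": "female", "age": 100},
]
by_gender = divide(users, "gender")            # two first samples: female, male
by_age = divide(users, "age", age_partition)   # one first sample per occupied interval
```

Only partitions that actually contain users appear in the result, so the number of returned groups can be smaller than the first specified number on a small data set.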
S3: screening target first samples meeting a first preset condition from each first sample, wherein the number of the target first samples is one or more.
For example, the first preset condition in this embodiment may be that the average purchase rate of a first sample is greater than or equal to a preset threshold, referred to here as the first average purchase rate, for example 50%; every first sample that reaches the first average purchase rate is a target first sample. For example, if both the female first sample and the male first sample reach the first average purchase rate, both are target first samples and enter the second round of division.
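As a minimal sketch of the screening in step S3, assuming each user record carries a hypothetical boolean `purchased` flag and that the first preset condition is a 50% average purchase rate, the target first samples could be selected like this:

```python
def purchase_rate(sample):
    """Fraction of users in `sample` whose hypothetical `purchased` flag is set."""
    if not sample:
        return 0.0
    return sum(1 for user in sample if user["purchased"]) / len(sample)


def screen_targets(first_samples, threshold=0.5):
    """Keep every first sample whose average purchase rate reaches `threshold`."""
    return {
        label: sample
        for label, sample in first_samples.items()
        if purchase_rate(sample) >= threshold
    }


first_samples = {
    "female": [{"purchased": True}, {"purchased": True}, {"purchased": False}],
    "male": [{"purchased": True}, {"purchased": False}, {"purchased": False}],
}
targets = screen_targets(first_samples)  # only the female first sample passes here
```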
S4: and acquiring a second characteristic with the greatest influence on the information quantity of the target first sample from a plurality of characteristics included in the target first sample, wherein the second characteristic is different from the first characteristic.
This embodiment takes one selected target first sample as an example to explain how a target first sample is divided; the other target first samples that enter the second round of division are processed in the same way. Taking the female first sample as an example, the second feature with the greatest influence on the information quantity of the target first sample, for example age, is found according to the decision-tree calculation method. Because every user in a target first sample obtained by dividing on gender has the same gender, gender is excluded when feature importance is ranked again. For example, if age is ranked first in the new descending ranking, the second feature is age.
S5: and dividing the target first samples into second samples with a second specified number according to the second specified number of classification partitions corresponding to the second features.
For example, this embodiment discretizes age into the five intervals [0,20), [20,40), [40,60), [60,80) and [80,100] and divides the target first sample into five second samples, one per interval. This further refines the pre-selected samples in order to find a target group with a higher purchase rate.
S6: and judging whether target second samples meeting a second preset condition exist in the second specified number of second samples.
The preset conditions of this embodiment can be set according to the requirements for the target group, for example a purchase rate of 90% or more. The second preset condition in this embodiment is the purchase rate required after the samples have been divided by a plurality of features, and it differs from the first preset condition, which applies to samples divided by a single feature. The purchase rate corresponding to the first preset condition is less than or equal to the purchase rate corresponding to the second preset condition, so that a target group meeting the purchase rate required by the second preset condition is found by gradually narrowing the sample range.
S7: if yes, stopping dividing the target second samples, and judging the target second samples meeting the second preset condition as corresponding target groups.
If the purchase rate of a second sample reaches the purchase rate corresponding to the second preset condition, for example 90% or more, the target group being sought has been found. If no such second sample exists, a third preset condition is established for screening the second samples that enter a third round of division. The third preset condition is stricter than the first preset condition: the third average purchase rate corresponding to the third preset condition is greater than the first average purchase rate corresponding to the first preset condition, for example 60% against 50%, so that the search converges quickly on a target group that meets the requirements.
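The overall flow of steps S1 to S7 can be sketched, under simplifying assumptions, as a recursive division. In this sketch all features are already discretized, each record carries a hypothetical `purchased` flag, information gain is used as the influence value, and the screening threshold rises by a hypothetical `screen_step` at each round (50%, then 60%). Unlike the embodiment, the sketch also checks the target rate in the first round. None of the function or parameter names come from the application.

```python
import math
from collections import Counter, defaultdict


def entropy(sample):
    """Entropy (in bits) of the hypothetical `purchased` flag over a sample."""
    total = len(sample)
    counts = Counter(user["purchased"] for user in sample)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def divide(sample, feature):
    """Split a sample into one sub-sample per classification partition of `feature`."""
    groups = defaultdict(list)
    for user in sample:
        groups[user[feature]].append(user)
    return groups


def best_feature(sample, features):
    """Return the feature with the largest information gain on `sample`."""
    def gain(feature):
        parts = divide(sample, feature).values()
        return entropy(sample) - sum(len(p) / len(sample) * entropy(p) for p in parts)
    return max(features, key=gain)


def purchase_rate(sample):
    return sum(user["purchased"] for user in sample) / len(sample)


def find_target_groups(sample, features, target_rate=0.9, screen_rate=0.5,
                       screen_step=0.1, path=()):
    """Divide feature by feature until a sub-sample reaches `target_rate`."""
    if not features:
        return []
    feature = best_feature(sample, features)
    remaining = [f for f in features if f != feature]   # a used feature is not reused
    next_screen = min(screen_rate + screen_step, target_rate)
    found = []
    for value, part in divide(sample, feature).items():
        rate = purchase_rate(part)
        here = path + ((feature, value),)
        if rate >= target_rate:
            found.append((here, rate))                  # target group: stop dividing
        elif rate >= screen_rate:
            found.extend(find_target_groups(part, remaining, target_rate,
                                            next_screen, screen_step, here))
    return found


def make(gender, age, bought, count):
    return [{"gender": gender, "age": age, "purchased": bought} for _ in range(count)]


users = (make("female", "[20,40)", True, 9) + make("female", "[20,40)", False, 1)
         + make("female", "[40,60)", True, 3) + make("female", "[40,60)", False, 3)
         + make("male", "[20,40)", True, 1) + make("male", "[20,40)", False, 4)
         + make("male", "[40,60)", False, 5))
groups = find_target_groups(users, ["gender", "age"])
```

On this toy data, gender has the larger information gain, so it is used first; the female sample passes the 50% screen and is then divided by age, where the [20,40) sub-sample reaches the 90% target.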
Further, step S1 of the present embodiment includes:
S10: calculating the total information quantity of the pre-selected samples.
This embodiment obtains the total information quantity by calculating the entropy of the pre-selected samples. The calculation is as follows: $H(x) = -\sum_{i} p_i \log p_i$, where $p_i$ represents the proportion of the specific population with purchase data in the pre-selected samples, i.e. the ratio of the purchasing population to the entire population, and $H(x)$ denotes the overall entropy. The entropy of the pre-selected samples of this embodiment is denoted $H(D)$.
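For illustration only, the entropy calculation of step S10 can be sketched as below. The base-2 logarithm and the function name `entropy` are assumptions of this sketch; the application does not state the logarithm base.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits of a discrete distribution given as proportions p_i."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# Hypothetical pre-selected samples in which 25% of users purchased.
buyers = 0.25
h = entropy([buyers, 1 - buyers])
```

The entropy is largest (1 bit) when buyers and non-buyers are equally frequent and falls to zero when the sample is pure, which is why a drop in entropy after division indicates a more homogeneous sub-sample.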
S11: and acquiring influence values of the features on the total information quantity respectively.
In this embodiment, the influence value of each feature on the total information quantity is obtained through an information gain algorithm: each feature is added to the calculation on its own, and its influence value is the amount by which it changes the overall entropy. The information gain is calculated as $g(D, A) = H(D) - H(D \mid A)$, where $g(D, A)$ represents the magnitude of the influence of feature $A$ on the overall entropy, $H(D)$ represents the entropy of the pre-selected samples, and $H(D \mid A)$ represents the entropy of the samples after division according to feature $A$.
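The information gain of step S11 can be sketched as follows, assuming records stored as dictionaries with a hypothetical `purchased` label and taking H(D|A) as the size-weighted average entropy of the sub-samples produced by feature A. The names `entropy` and `information_gain` are chosen for this sketch.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, label="purchased"):
    """g(D, A) = H(D) - H(D|A) for feature A over the records in `rows`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - conditional


rows = [
    {"gender": "female", "region": "north", "purchased": True},
    {"gender": "female", "region": "south", "purchased": True},
    {"gender": "male", "region": "north", "purchased": False},
    {"gender": "male", "region": "south", "purchased": False},
]
gain_gender = information_gain(rows, "gender")  # gender separates buyers perfectly
gain_region = information_gain(rows, "region")  # region carries no information here
```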
Other embodiments of the present application may obtain the influence value of each feature on the total information quantity through the information gain ratio, which corrects the information gain with a penalty parameter in order to reduce the bias caused by the small entropy values of small samples, i.e. information gain ratio = penalty parameter × information gain.
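The application does not define the penalty parameter. One common choice, used in the C4.5 decision tree algorithm and assumed here purely for illustration, is the reciprocal of the entropy of the feature's own value distribution (the split information), so that features which fragment the data into many small samples are penalized.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, label="purchased"):
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - conditional


def gain_ratio(rows, feature, label="purchased"):
    """Information gain scaled by an assumed penalty of 1 / split information."""
    split_info = entropy([row[feature] for row in rows])
    if split_info == 0:          # the feature has a single value and cannot divide
        return 0.0
    return information_gain(rows, feature, label) / split_info


rows = [
    {"user_id": 1, "gender": "female", "purchased": True},
    {"user_id": 2, "gender": "female", "purchased": True},
    {"user_id": 3, "gender": "male", "purchased": False},
    {"user_id": 4, "gender": "male", "purchased": False},
]
ratio_id = gain_ratio(rows, "user_id")     # one user per partition: heavily penalized
ratio_gender = gain_ratio(rows, "gender")
```

Both features have the same information gain on this toy data, but the ratio prefers gender because `user_id` only achieves its gain by splitting the data into single-user samples.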
S12: and arranging the features in a descending order according to the magnitude of the influence values.
The larger an influence value in this embodiment, the greater the feature's influence on the whole, the stronger its predictive capability, and the more important it is for dividing samples and searching for the target group. Arranging the features in descending order of influence value makes it quicker and more intuitive to pick the first feature for dividing the pre-selected samples.
S13: and setting the feature corresponding to the first influence value at the forefront of the arrangement sequence in the descending order as the first feature.
In this embodiment, the feature whose influence value is at the front of the descending order is directly selected as the first feature. This determines the first feature accurately and divides the pre-selected samples accurately, which ensures the reliability of the target group that is finally found.
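Steps S12 and S13 amount to a descending sort followed by taking the first element. A minimal sketch, with made-up influence values and a hypothetical helper name `rank_features`:

```python
def rank_features(influence):
    """Sort feature names by influence value, largest first."""
    return sorted(influence, key=influence.get, reverse=True)


# Hypothetical influence values (for example, information gains) per feature.
influence = {"age": 0.21, "gender": 0.32, "region": 0.05}
ranking = rank_features(influence)
first_feature = ranking[0]
```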
Further, before step S2 of the present embodiment, the method further includes:
S20: acquiring the attribute of the first feature.
In this embodiment, the first feature has one of two attributes: category type or numerical type. For example, gender is a category-type feature and age is a numerical-type feature.
S21: and determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature.
In this embodiment, first features with different attributes use different criteria for dividing the pre-selected samples and are processed differently before division. A category-type feature can only divide samples by the categories it contains, so the number of categories determines the number of classification partitions. A numerical-type feature is first discretized, as required, into a plurality of consecutive data intervals, and the samples are then divided by those intervals, so the number of data intervals determines the number of classification partitions.
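The dependence of the partition count on the attribute type (step S21) can be sketched as below. The attribute names `"category"` and `"numerical"`, the `edges` argument and the function name are assumptions of this sketch, and the interval edges are taken as given rather than computed.

```python
def classification_partitions(values, attribute, edges=None):
    """Return the classification partitions of a feature according to its attribute."""
    if attribute == "category":
        # One partition per distinct category found in the data.
        return sorted(set(values))
    if attribute == "numerical":
        # One partition per pair of consecutive interval edges.
        if not edges or len(edges) < 2:
            raise ValueError("numerical features need at least two interval edges")
        return list(zip(edges[:-1], edges[1:]))
    raise ValueError(f"unknown attribute: {attribute}")


gender_parts = classification_partitions(["male", "female", "female"], "category")
age_parts = classification_partitions([25, 61, 43], "numerical",
                                      edges=[0, 20, 40, 60, 80, 100])
first_specified_number = len(age_parts)
```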
Further, the attribute of the first feature of the present embodiment is a category type, and step S21 includes:
S210: the plurality of pre-selected samples are divided into a first specified number of first samples corresponding to the categories of the first feature according to the categories of the first feature, wherein the first specified number is the number of categories of the first feature.
This embodiment takes a category-type feature as an example and describes how the pre-selected samples are divided by it: if the category-type feature contains several categories, the pre-selected samples are divided into the same number of first samples. First samples, second samples and other samples are divided by category-type features with the same process and principle as the pre-selected samples.
Further, the attribute of the first feature of another embodiment of the present application is a numerical type, and step S21 includes:
S211: dividing the pre-selected samples into a first specified number of first samples corresponding to discrete intervals, according to the discrete intervals into which the continuous data representing the first feature is discretized, wherein the first specified number is the number of discrete intervals corresponding to the continuous data of the first feature.
This embodiment takes a numerical-type feature as an example and describes how the pre-selected samples are divided by it. The numerical feature is first discretized by a discretization algorithm into a plurality of consecutive discrete intervals, and the pre-selected samples are then divided into the same number of first samples, one per discrete interval. First samples, second samples and other samples are divided by numerical features with the same process and principle as the pre-selected samples.
In this embodiment, the value range of the numerical feature is obtained first, that is, its maximum and minimum values. Quantiles are then calculated according to the input discretization parameter num. For example, with num=5 and age as the numerical feature with a value range of 0 to 100, the values at the 20%, 40%, 60% and 80% positions of the sorted continuous data are calculated, giving the five consecutive interval ranges [0,20), [20,40), [40,60), [60,80) and [80,100]. The interval range then replaces each specific value in the original pre-selected samples, which converts the numerical feature from point values to discrete intervals; the five interval ranges correspond to five discrete intervals. For example, a user aged 25 falls in the discrete interval [20,40). Discretization avoids the fitting deviation that outliers would otherwise introduce into the overall distribution. For example, if 99% of the data in the pre-selected samples lies between 0 and 100 but 1% has the value 1000, the large jump in value makes the algorithm pay too much attention to the abnormal data during identification, which biases the fitting result considerably. Discretized features are also easier to interpret: a numerical feature can take infinitely many values, so the standing of a specific value within the pre-selected samples cannot be determined directly, whereas the share of the population in each discrete interval is easy to calculate after discretization.
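The quantile-based discretization described above can be sketched with the Python standard library. The helper names and the choice of the `inclusive` quantile method are assumptions of this sketch; the application only specifies that the cut points sit at the 20%, 40%, 60% and 80% positions of the sorted data when num=5.

```python
import bisect
import statistics


def quantile_cuts(values, num):
    """Cut points at the 1/num, 2/num, ... positions of the sorted data."""
    return statistics.quantiles(values, n=num, method="inclusive")


def interval_index(value, cuts):
    """Index of the left-closed interval that `value` falls into."""
    return bisect.bisect_right(cuts, value)


def interval_labels(values, cuts):
    """Readable labels such as '[20,40)'; the last interval is closed on the right."""
    bounds = [min(values), *cuts, max(values)]
    labels = [f"[{low:g},{high:g})" for low, high in zip(bounds[:-1], bounds[1:])]
    labels[-1] = labels[-1][:-1] + "]"
    return labels


ages = list(range(0, 101))               # hypothetical ages spread evenly over 0..100
cuts = quantile_cuts(ages, num=5)
labels = interval_labels(ages, cuts)
age_25 = labels[interval_index(25, cuts)]
outlier_bin = interval_index(1000, cuts)  # an outlier simply lands in the last interval
```

Because an extreme value such as 1000 is mapped to the last interval like any other large value, it no longer dominates the calculation, which reflects the robustness argument made above.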
Further, step S6 of the present embodiment includes:
S60: comparing the purchase rates corresponding to the second samples to obtain the designated second sample corresponding to the maximum purchase rate.
Taking the search for a target group with a designated purchase rate as an example, the second preset condition that terminates further division of the samples in this embodiment is that the purchase rate of a divided small sample reaches the purchase rate corresponding to the second preset condition. In this embodiment there are several divided small samples; the purchase rates of the small samples are compared to obtain the small sample with the largest purchase rate, and whether the target group has been found is then determined by checking whether that largest purchase rate reaches the purchase rate corresponding to the second preset condition.
S61: and judging whether the maximum purchase rate corresponding to the appointed second sample meets the purchase rate corresponding to the second preset condition.
In this embodiment, the first features are screened for the pre-selected samples, the pre-selected samples are divided into first samples according to the first features, then the second features corresponding to the first samples are screened for the second samples, the second samples are divided into the second samples according to the second features corresponding to the second features, and the samples are circularly divided until the final purchase rate of one or a plurality of small samples reaches the purchase rate corresponding to the second preset condition.
S62: if so, determining that a target second sample meeting a second preset condition exists.
Further, step S6 of another embodiment of the present application includes:
S63: Comparing the purchase rates corresponding to the respective second samples to obtain the specified second sample corresponding to the maximum purchase rate.
In this embodiment, after each first sample is divided into a plurality of corresponding second samples, a second sample whose purchase rate reaches the purchase rate corresponding to the second preset condition may be found, but it is still necessary to analyze whether the data amount of that second sample has practical reference value. If the data amount of the second sample is small, for example only a few dozen records, its reference value is considered limited.
S64: Judging whether the maximum purchase rate corresponding to the specified second sample meets the preset purchase rate.
To keep the computation of repeated division from growing too large, the samples generally need to be divided in sequence by only 6 features, or at most 10 features, before the purchase rate corresponding to the second preset condition is reached and the small sample corresponding to the target group is found.
S65: If so, judging whether the total data amount of the specified second sample is greater than a preset amount.
The target second sample in this embodiment must not only reach the expected purchase rate but also reach the expected data amount, that is, the number of users in the target group must reach the expected value. This avoids the situation in which the target group reaching the expected purchase rate contains too few users, so that summarizing the features of the target group loses its practical value.
S66: If it is greater than the preset amount, determining that a target second sample meeting the second preset condition exists.
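Steps S63 to S66 (and, with the size check disabled, steps S60 to S62) can be sketched as follows. The record layout, a list of dicts with a boolean `purchased` key, and the names `purchase_rate`, `find_target_second_sample`, `target_rate` and `min_size` are assumptions made for illustration only.

```python
def purchase_rate(sample):
    """Fraction of users in the sample whose 'purchased' flag is true."""
    if not sample:
        return 0.0
    return sum(1 for user in sample if user["purchased"]) / len(sample)

def find_target_second_sample(second_samples, target_rate, min_size=0):
    """Return the second sample with the maximum purchase rate if it meets
    target_rate and holds more than min_size users; otherwise None."""
    if not second_samples:
        return None
    best = max(second_samples, key=purchase_rate)   # S63: maximum purchase rate
    if purchase_rate(best) < target_rate:           # S64: rate check
        return None
    if len(best) <= min_size:                       # S65: data amount check
        return None
    return best                                     # S66: target second sample exists

# Hypothetical second samples: a tiny perfect group and a large strong group.
small = [{"purchased": True} for _ in range(3)]
large = ([{"purchased": True} for _ in range(95)]
         + [{"purchased": False} for _ in range(5)])

print(find_target_second_sample([small, large], 0.9) is small)    # True
print(find_target_second_sample([small, large], 0.9, 50) is None) # True
```

As in the steps above, only the sample with the maximum purchase rate is examined, so a group that is too small yields no target in that round even if a larger group also meets the rate.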
Further, after step S7 of the present embodiment, it includes:
S8: Summarizing the first feature and the second feature used in finding the target group to form a feature combination.
The first feature and the second feature used to divide the pre-selected samples over multiple rounds are combined into a feature combination, which serves as the identity label of the small sample corresponding to the target group. In other embodiments of the present application, if no target group is found after each first sample has been divided into its corresponding second samples, the second samples continue to be divided into a plurality of third samples corresponding to each second sample, and so on, until an Nth sample corresponding to the target group is found. The first feature, the second feature, and so on up to the Nth feature used to divide the pre-selected samples then form the feature combination, which serves as the identity label of the small sample corresponding to the target group.
S9: Taking the feature combination as the user portrait of the target group.
By forming a user portrait of the target group, this embodiment makes the target group easier to identify and makes it more convenient to acquire new users with the same features as clients according to the user portrait.
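Steps S8 and S9 can be illustrated with a short sketch. It assumes that each division round is recorded as a (feature, partition) pair; the names `build_portrait` and `matches_portrait` and the sample candidates are hypothetical.

```python
def build_portrait(division_path):
    """Combine the (feature, partition) pairs used in each division round
    into one feature combination that labels the target group."""
    return dict(division_path)

def matches_portrait(user, portrait):
    """True if the user falls into every partition named in the portrait."""
    return all(user.get(feature) == partition
               for feature, partition in portrait.items())

# Hypothetical division path that led to the target group.
path = [("gender", "female"), ("age_band", "[20,40)")]
portrait = build_portrait(path)

candidates = [
    {"name": "u1", "gender": "female", "age_band": "[20,40)"},
    {"name": "u2", "gender": "male", "age_band": "[20,40)"},
]
prospects = [u["name"] for u in candidates if matches_portrait(u, portrait)]
print(prospects)  # ['u1']
```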
In this embodiment, taking a user group with a specified purchase rate as an example, the pre-selected samples come from the database of a product purchase platform. In another embodiment of the present application, the pre-selected samples are feature data of cases of a disease such as diabetes, and the specific population with a high incidence of the disease can be found according to the above process and principle, so that the incidence of the disease can be effectively controlled.
In another embodiment, the pre-selected sample is a feature database of the lending crowd, and the specific crowd with lending risk can be searched according to the above process and principle, so as to effectively control the lending risk.
According to this embodiment, the feature whose importance coefficient has the greatest influence on the target group is found through a decision tree model, which improves both the efficiency and the accuracy of finding the target group. The pre-selected samples are refined and divided according to the most influential features, so that the target group is located step by step and can be effectively used and managed. The features of the target group are collected into a feature set, forming a user portrait of the target group labeled by that feature set, which makes it convenient to develop potential clients who carry the same feature set.
Referring to fig. 2, an apparatus for searching a target group according to an embodiment of the present application includes:
a first obtaining module 1a, configured to obtain a plurality of pre-selected samples, where each pre-selected sample includes user data corresponding to a plurality of features of a user.
The pre-selected samples of this embodiment contain a large amount of user data. For example, the pre-selected samples cover 10 million users, each with a plurality of features. The user with the most features among the 10 million users is selected, and that user's features and feature count are taken as the reference. For example, if user A has the most features in the pre-selected samples, with features in 100 dimensions, then the same 100 dimensions are selected for every user in the pre-selected samples, for example name, age, gender, region, height, weight, product purchase frequency, purchase preference, and the like.
A second obtaining module 1, configured to obtain, from a plurality of features included in the plurality of pre-selected samples, a first feature that has the greatest influence on the information amount of the plurality of pre-selected samples.
The first feature of this embodiment, which has the greatest influence on the pre-selected samples, is the feature that causes the greatest fluctuation in the information amount of the pre-selected samples. Taking the purchase rate of the pre-selected samples as an example, the features of the pre-selected samples, for example 100 dimensions per user, are first arranged according to a decision tree calculation method with one feature per column of data; the purchase rate value corresponding to each feature is calculated from that feature's data column; the 100 features are sorted in descending order of the purchase rate value; and the 10 top-ranked features are selected for dividing the pre-selected samples. The feature ranked first in the descending order is the first feature with the greatest influence on the information amount of the pre-selected samples.
The first dividing module 2 is configured to divide the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first feature.
For example, each feature is divided in advance into a specified number of classification partitions, so the first feature likewise corresponds to a first specified number of classification partitions, and the pre-selected samples are divided into first samples according to the classification partitions of the first feature, the first specified number being equal to the number of classification partitions of the first feature. For example, if the first feature is gender, which has two classification partitions, male and female, the first specified number is two, and the pre-selected samples are divided into two first samples: a female first sample and a male first sample. As another example, if the first feature is age, which is discretized in advance into the five classification partitions [0,20), [20,40), [40,60), [60,80) and [80,100], the first specified number is five, and the pre-selected samples are divided into five first samples.
And a screening module 3, configured to screen target first samples that meet a first preset condition from each of the first samples, where the target first samples are one or more.
For example, the first preset condition in this embodiment may be that the average purchase rate of a first sample is greater than or equal to a preset threshold, referred to as the first average purchase rate, for example 50%; every first sample that reaches the first average purchase rate is a target first sample. For example, if both the female first sample and the male first sample reach the first average purchase rate, both are target first samples and enter the second round of division.
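The division into first samples and the subsequent screening can be sketched as below. This is only an illustration under assumed names (`divide_by_feature`, `screen_first_samples`, `min_rate`) and with made-up data; user records are assumed to be dicts with a boolean `purchased` key.

```python
from collections import defaultdict

def divide_by_feature(samples, feature):
    """Group users by the classification partition of the given feature."""
    groups = defaultdict(list)
    for user in samples:
        groups[user[feature]].append(user)
    return dict(groups)

def purchase_rate(sample):
    """Average purchase rate of a non-empty sample."""
    return sum(1 for user in sample if user["purchased"]) / len(sample)

def screen_first_samples(first_samples, min_rate):
    """Keep only the first samples whose average purchase rate is at
    least min_rate (the first preset condition)."""
    return {partition: sample
            for partition, sample in first_samples.items()
            if purchase_rate(sample) >= min_rate}

# Hypothetical pre-selected samples: 60% of women and 20% of men purchased.
preselected = (
    [{"gender": "female", "purchased": True} for _ in range(6)]
    + [{"gender": "female", "purchased": False} for _ in range(4)]
    + [{"gender": "male", "purchased": True} for _ in range(2)]
    + [{"gender": "male", "purchased": False} for _ in range(8)]
)
first_samples = divide_by_feature(preselected, "gender")
targets = screen_first_samples(first_samples, min_rate=0.5)
print(sorted(first_samples))  # ['female', 'male']
print(sorted(targets))        # ['female']
```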
A third obtaining module 4, configured to obtain, from a plurality of features included in the target first sample, a second feature that has the greatest influence on the information amount of the target first sample, where the second feature is different from the first feature.
In this embodiment, the division of one selected target first sample is explained as an example; the other target first samples entering the second round of division are processed in the same way. Taking the female first sample as an example, the second feature with the greatest influence on the information amount of the target first sample, for example age, is found according to the decision tree calculation method. Because every user in a target first sample obtained by dividing on gender has the same gender, gender is excluded when feature importance is ranked again. For example, if age is ranked first in the new descending order, the second feature is age.
And the second dividing module 5 is configured to divide the target first sample into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second feature.
For example, this embodiment discretizes age into the five intervals [0,20), [20,40), [40,60), [60,80) and [80,100], and divides the target first sample into five second samples corresponding to the five intervals. This further refines the pre-selected samples in order to find a target crowd with a higher purchase rate.
A judging module 6, configured to judge whether a target second sample meeting the second preset condition exists among the second specified number of second samples.
The second preset condition of this embodiment can be set according to the requirements for the target crowd, for example that the purchase rate reaches 90% or more. The second preset condition refers to the preset purchase rate to be reached after division by a plurality of features, and differs from the first preset condition, which applies to samples divided by a single feature. It can be understood that the purchase rate corresponding to the first preset condition is less than or equal to the purchase rate corresponding to the second preset condition, so that a target group meeting the purchase rate of the second preset condition is found by gradually narrowing the sample range.
A determining module 7, configured to stop dividing the target second sample if it exists, and to determine the target second sample meeting the second preset condition as the corresponding target group.
If the purchase rate of a certain second sample reaches the purchase rate corresponding to the second preset condition, for example 90% or more, the target crowd has been found. If no such sample exists, a third preset condition is established for screening the second samples that enter the third round of division. The requirement of the third preset condition is stricter than that of the first preset condition: the third average purchase rate corresponding to the third preset condition is greater than the first average purchase rate corresponding to the first preset condition, for example 60% versus 50%, so that the search converges quickly on a target crowd meeting the requirement.
Referring to fig. 3, the second acquisition module 1 of the present embodiment includes:
a calculation unit 10 for calculating the total information amount of the pre-selected samples.
This embodiment obtains the total information amount by calculating the entropy of the pre-selected samples. The calculation is as follows: $H(x) = -\sum_{i} p_i \log p_i$, where $p_i$ represents the proportion of a specific population, such as the population with purchase data, in the pre-selected samples, that is, the ratio of the purchasing population to the entire population, and H(x) denotes the overall entropy. The entropy of the pre-selected samples in this embodiment is denoted H(D).
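As a small numerical illustration of the entropy H(D), the sketch below computes it from the counts of purchasers and non-purchasers. The function name `entropy_from_counts` and the use of a base-2 logarithm are assumptions; the text does not state the logarithm base.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy (base 2, an assumption) of a population split into
    classes with the given head counts; empty classes contribute nothing."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Hypothetical pre-selected samples of 1000 users.
print(entropy_from_counts([500, 500]))   # 1.0  (half purchased: maximum uncertainty)
print(round(entropy_from_counts([300, 700]), 4))
```

The entropy is largest when purchasers and non-purchasers are evenly mixed and falls to zero when every user belongs to the same class.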
A first acquiring unit 11, configured to acquire influence values of the features on the total information amount, respectively.
The influence value of each feature on the total information amount is obtained through an information gain algorithm: each feature is added to the calculation on its own, and the amplitude of its effect on the overall entropy gives its influence value. The information gain is calculated as $g(D, A) = H(D) - H(D \mid A)$, where g(D, A) represents the magnitude of the influence of feature A on the overall entropy, H(D) represents the entropy of the pre-selected samples, and H(D|A) represents the entropy of the samples after division according to feature A.
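The information gain can be worked through on a toy data set as follows. The sketch assumes base-2 logarithms, user records stored as dicts, and a boolean `purchased` label; the function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """H(D): Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, feature, label="purchased"):
    """H(D|A): entropy of each partition of the feature, weighted by size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def information_gain(rows, feature, label="purchased"):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy([row[label] for row in rows]) - conditional_entropy(rows, feature, label)

# Hypothetical data: 4 of 5 women purchased, 1 of 5 men purchased.
rows = (
    [{"gender": "female", "purchased": True} for _ in range(4)]
    + [{"gender": "female", "purchased": False}]
    + [{"gender": "male", "purchased": True}]
    + [{"gender": "male", "purchased": False} for _ in range(4)]
)
h_d = entropy([row["purchased"] for row in rows])
gain = information_gain(rows, "gender")
print(h_d)             # 1.0
print(round(gain, 4))  # 0.2781
```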
Other embodiments of the present application may obtain the influence value of each feature on the total information amount through the information gain ratio, which corrects the information gain with a penalty parameter so as to reduce the bias caused by the smaller entropy values of small samples, that is, information gain ratio = penalty parameter × information gain.
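The text does not define the penalty parameter. One common choice, used in the C4.5 decision tree algorithm and assumed here purely for illustration, is the reciprocal of the entropy of the feature's own partition sizes (the split information), so that features with many small partitions are penalized. The function names and data below are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, label="purchased"):
    """g(D, A) = H(D) - H(D|A)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - conditional

def gain_ratio(rows, feature, label="purchased"):
    """Information gain scaled by the assumed penalty 1 / split information."""
    split_info = entropy([row[feature] for row in rows])
    if split_info == 0:          # feature has a single partition: no split
        return 0.0
    return information_gain(rows, feature, label) / split_info

rows = [
    {"user_id": 1, "gender": "female", "purchased": True},
    {"user_id": 2, "gender": "female", "purchased": True},
    {"user_id": 3, "gender": "male", "purchased": False},
    {"user_id": 4, "gender": "male", "purchased": False},
]
# Both features have the same raw gain, but user_id splits into four
# one-user partitions and is penalized.
print(information_gain(rows, "user_id"), information_gain(rows, "gender"))  # 1.0 1.0
print(gain_ratio(rows, "user_id"), gain_ratio(rows, "gender"))              # 0.5 1.0
```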
An arrangement unit 12 for arranging the features in descending order according to the magnitude of each of the influence values.
In this embodiment, the larger the influence value, the greater the feature's influence on the whole, the stronger its predictive capability, and the more important it is for dividing the samples and finding the target group. Arranging the features in descending order of influence value makes it faster and more intuitive to screen out the first feature used to divide the pre-selected samples.
A setting unit 13, configured to set the feature corresponding to the influence value ranked first in the descending order as the first feature.
In this embodiment, the feature corresponding to the influence value ranked first in the descending order is directly selected as the first feature, so that the first feature is determined accurately and the pre-selected samples are divided accurately, ensuring the reliability of the target group finally found.
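The work of the calculation, acquiring, arrangement and setting units can be sketched together: compute an influence value per feature, sort in descending order, and take the top entry as the first feature. Information gain with base-2 logarithms is assumed as the influence value, and `rank_features` and the data are illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, label="purchased"):
    """Influence value of one feature: H(D) - H(D|A)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - conditional

def rank_features(rows, features, label="purchased"):
    """Return (feature, influence value) pairs in descending order of value."""
    scored = [(f, information_gain(rows, f, label)) for f in features]
    return sorted(scored, key=lambda item: item[1], reverse=True)

rows = [
    {"gender": "female", "region": "north", "purchased": True},
    {"gender": "female", "region": "south", "purchased": True},
    {"gender": "male", "region": "north", "purchased": False},
    {"gender": "male", "region": "south", "purchased": False},
]
ranking = rank_features(rows, ["region", "gender"])
first_feature = ranking[0][0]
print(ranking)        # [('gender', 1.0), ('region', 0.0)]
print(first_feature)  # gender
```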
Referring to fig. 4, an apparatus for searching a target group according to another embodiment of the present application includes:
a fourth obtaining module 20, configured to obtain an attribute of the first feature.
The attributes of the first feature of the present embodiment include two attributes of a category type feature and a numerical type feature. For example, gender is a category type feature and age is a numerical type feature.
And a third dividing module 21, configured to determine a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature.
In this embodiment, first features with different attributes use different criteria and different processing methods for dividing the pre-selected samples. For example, a category-type feature can only divide the samples according to the categories it contains, so the number of categories determines the number of classification partitions; a numerical-type feature can first be discretized, as required, into a plurality of contiguous data intervals, and the samples are then divided according to those intervals, so the number of data intervals determines the number of classification partitions.
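A minimal sketch of choosing the classification partitions from the attribute of a feature is given below. The attribute strings `"category"` and `"numerical"`, the function name, and the nearest-rank quantile rule are assumptions made for illustration.

```python
def classification_partitions(values, attribute, num=5):
    """Return the partitions of a feature.

    category  -> one partition per distinct category
    numerical -> num contiguous (low, high) intervals cut at quantile positions
    """
    if attribute == "category":
        return sorted(set(values))
    if attribute == "numerical":
        ordered = sorted(values)
        n = len(ordered)
        inner = [ordered[min(n - 1, (k * n) // num)] for k in range(1, num)]
        edges = [ordered[0]] + inner + [ordered[-1]]
        return list(zip(edges[:-1], edges[1:]))
    raise ValueError(f"unknown attribute: {attribute}")

genders = ["male", "female", "female", "male"]
ages = list(range(0, 101))  # hypothetical ages 0..100

print(classification_partitions(genders, "category"))
# ['female', 'male']
print(classification_partitions(ages, "numerical", num=5))
# [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
```

The number of returned partitions is the first specified number in either case: two for gender, five for age with num = 5.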
Referring to fig. 5, the attribute of the first feature of the present embodiment is a category type, and the third division module 21 includes:
a first dividing unit 210, configured to divide the plurality of pre-selected samples into a first sample with a first designated number corresponding to the category type according to the category type of the first feature, where the first designated number is the number of category types of the first feature.
This embodiment takes a category-type feature as an example to describe in detail how the pre-selected samples are divided by such a feature. If the category-type feature contains a certain number of categories, the pre-selected samples are divided into that number of first samples. Other samples, such as the first samples and the second samples, are divided by category-type features according to the same process and principle as the pre-selected samples.
Referring to fig. 6, the attribute of the first feature of another embodiment of the present application is a numerical value, and the third dividing module 21 includes:
A second dividing unit 211, configured to divide the pre-selected samples into a first specified number of first samples corresponding to discrete intervals, according to the discrete intervals corresponding to the continuous data representing the first feature, where the first specified number is the number of discrete intervals corresponding to the continuous data of the first feature.
This embodiment takes a numerical-type feature as an example to describe in detail how the pre-selected samples are divided by such a feature. The numerical feature is first discretized by a discretization algorithm into a plurality of discrete intervals arranged in sequence, and the pre-selected samples are then divided into a plurality of first samples corresponding to the discrete intervals. Other samples, such as the first samples and the second samples, are divided by numerical-type features according to the same process and principle as the pre-selected samples.
In this embodiment, the value range of the numerical feature is obtained first, that is, its maximum and minimum values. Then, according to the input discretization parameter num, a plurality of quantiles are calculated. For example, if num = 5 and the numerical feature is age with a value range of 0 to 100, the values at the 20%, 40%, 60% and 80% positions of the sorted continuous data are calculated, yielding five interval ranges arranged in sequence: [0,20), [20,40), [40,60), [60,80) and [80,100]. The interval range then replaces the specific numerical value in the original pre-selected sample, converting the numerical feature from a point value into a discrete interval feature; the five interval ranges correspond to five discrete intervals. For example, a user aged 25 falls into the discrete interval [20,40). Discretization avoids the fitting deviation that outliers would otherwise impose on the overall distribution. For example, if 99% of the data in the pre-selected sample lies in the interval 0 to 100 but 1% of the data has the value 1000, the algorithm would pay excessive attention to the abnormal data because of the large numerical variation, which would bias the fitting result. Discretized features are also easier to interpret: a numerical feature can take infinitely many values, so the level of a specific value within the pre-selected sample cannot be read off directly, whereas after discretization the share of the population in each discrete interval is easy to calculate.
Referring to fig. 7, the judging module 6 of the present embodiment includes:
A first obtaining unit 60, configured to obtain the specified second sample corresponding to the maximum purchase rate by comparing the purchase rates corresponding to the respective second samples.
Taking the search for a target crowd with a specified purchase rate as an example, the second preset condition for terminating further division of the samples in this embodiment is that the purchase rate of a divided small sample reaches the purchase rate corresponding to the second preset condition. Because the division produces a plurality of small samples, the purchase rates of the small samples are compared to obtain the small sample with the maximum purchase rate, and whether the target group has been found is determined by checking whether that maximum purchase rate reaches the purchase rate corresponding to the second preset condition.
A first judging unit 61, configured to judge whether the maximum purchase rate corresponding to the specified second sample meets the purchase rate corresponding to the second preset condition.
In this embodiment, a first feature is screened for the pre-selected samples and the pre-selected samples are divided into first samples according to the first feature; a second feature is then screened for each first sample and the first sample is divided into second samples according to that second feature; the samples are divided cyclically in this way until the purchase rate of one or more small samples reaches the purchase rate corresponding to the second preset condition.
A first determining unit 62, configured to determine that a target second sample meeting the second preset condition exists if the maximum purchase rate meets the purchase rate corresponding to the second preset condition.
Referring to fig. 8, a judging module 6 according to another embodiment of the present application includes:
A second obtaining unit 63, configured to obtain the specified second sample corresponding to the maximum purchase rate by comparing the purchase rates corresponding to the respective second samples.
In this embodiment, after each first sample is divided into a plurality of corresponding second samples, a second sample whose purchase rate reaches the purchase rate corresponding to the second preset condition may be found, but it is still necessary to analyze whether the data amount of that second sample has practical reference value. If the data amount of the second sample is small, for example only a few dozen records, its reference value is considered limited.
A second judging unit 64, configured to judge whether the maximum purchase rate corresponding to the specified second sample meets a preset purchase rate.
To keep the computation of repeated division from growing too large, the samples generally need to be divided in sequence by only 6 features, or at most 10 features, before the purchase rate corresponding to the second preset condition is reached and the small sample corresponding to the target group is found.
A third judging unit 65, configured to judge, if the maximum purchase rate meets the preset purchase rate, whether the total data amount of the specified second sample is greater than a preset amount.
The target second sample in this embodiment must not only reach the expected purchase rate but also reach the expected data amount, that is, the number of users in the target group must reach the expected value. This avoids the situation in which the target group reaching the expected purchase rate contains too few users, so that summarizing the features of the target group loses its practical value.
A second determining unit 66, configured to determine that a target second sample meeting the second preset condition exists if the total data amount is greater than the preset amount.
Referring to fig. 9, an apparatus for searching a target group according to still another embodiment of the present application includes:
A summarizing module 8, configured to summarize the first feature and the second feature used in finding the target group to form a feature combination.
The first feature and the second feature used to divide the pre-selected samples over multiple rounds are combined into a feature combination, which serves as the identity label of the small sample corresponding to the target group. In other embodiments of the present application, if no target group is found after each first sample has been divided into its corresponding second samples, the second samples continue to be divided into a plurality of third samples corresponding to each second sample, and so on, until an Nth sample corresponding to the target group is found. The first feature, the second feature, and so on up to the Nth feature used to divide the pre-selected samples then form the feature combination, which serves as the identity label of the small sample corresponding to the target group.
A module 9, configured to take the feature combination as the user portrait of the target group.
By forming a user portrait of the target group, this embodiment makes the target group easier to identify and makes it more convenient to acquire new users with the same features as clients according to the user portrait.
In this embodiment, taking a user group with a specified purchase rate as an example, the pre-selected samples come from the database of a product purchase platform. In another embodiment of the present application, the pre-selected samples are feature data of cases of a disease such as diabetes, and the specific population with a high incidence of the disease can be found according to the above process and principle, so that the incidence of the disease can be effectively controlled.
In another embodiment, the pre-selected sample is a feature database of the lending crowd, and the specific crowd with lending risk can be searched according to the above process and principle, so as to effectively control the lending risk.
Referring to fig. 10, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all data needed in the process of finding the target group. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements the method of finding a target group.
The processor executes the method for searching the target group, which comprises the following steps: acquiring a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to a plurality of characteristics of a user respectively; acquiring a first characteristic with the greatest influence on the information quantity of the plurality of pre-selected samples from a plurality of characteristics included in the plurality of pre-selected samples; dividing the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first features; screening target first samples meeting a first preset condition from each first sample, wherein the number of the target first samples is one or more; acquiring a second feature with the greatest influence on the information quantity of the target first sample from a plurality of features included in the target first sample, wherein the second feature is different from the first feature; dividing the target first samples into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second features; judging whether target second samples meeting preset conditions exist in the second specified number of second samples or not; if yes, stopping dividing the target second samples, and judging that the target second samples meeting the preset conditions are corresponding target groups; otherwise, the target second sample is subdivided.
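The overall loop executed by the processor can be sketched as a recursive search. This is a simplified illustration rather than the claimed method: it assumes information gain (base 2) as the influence measure, checks the stopping condition at every round rather than only from the second round on, keeps a single fixed screening threshold instead of raising it in later rounds, and omits the data amount check. All names (`find_target_group`, `target_rate`, `screen_rate`, `make`) and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def rate(sample, label="purchased"):
    """Purchase rate of a sample (0.0 for an empty sample)."""
    return sum(1 for u in sample if u[label]) / len(sample) if sample else 0.0

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split(sample, feature):
    """Divide a sample into sub-samples, one per partition of the feature."""
    groups = defaultdict(list)
    for u in sample:
        groups[u[feature]].append(u)
    return groups

def best_feature(sample, features, label="purchased"):
    """Feature with the greatest information gain on this sample."""
    base = entropy([u[label] for u in sample])
    def gain(f):
        return base - sum(len(g) / len(sample) * entropy([u[label] for u in g])
                          for g in split(sample, f).values())
    return max(features, key=gain)

def find_target_group(sample, features, target_rate, screen_rate, path=()):
    """Return (feature path, users) of the first sub-sample whose purchase
    rate reaches target_rate, or None if no such sub-sample is found."""
    if not sample or not features:
        return None
    feature = best_feature(sample, features)
    remaining = [f for f in features if f != feature]  # never reuse a feature
    groups = split(sample, feature)
    value, group = max(groups.items(), key=lambda kv: rate(kv[1]))
    if rate(group) >= target_rate:                     # stop: target group found
        return path + ((feature, value),), group
    for value, group in groups.items():                # otherwise divide again
        if rate(group) >= screen_rate:                 # screening condition
            found = find_target_group(group, remaining, target_rate,
                                      screen_rate, path + ((feature, value),))
            if found:
                return found
    return None

def make(gender, band, bought, not_bought):
    """Build hypothetical users for one (gender, age band) cell."""
    base = {"gender": gender, "age_band": band}
    return ([dict(base, purchased=True) for _ in range(bought)]
            + [dict(base, purchased=False) for _ in range(not_bought)])

users = (make("female", "[20,40)", 10, 0) + make("female", "[40,60)", 5, 5)
         + make("male", "[20,40)", 2, 8) + make("male", "[40,60)", 0, 10))
result = find_target_group(users, ["gender", "age_band"],
                           target_rate=0.9, screen_rate=0.5)
print(result[0])       # (('gender', 'female'), ('age_band', '[20,40)'))
print(len(result[1]))  # 10
```

The returned feature path is exactly the feature combination that the later steps use as the user portrait of the target group.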
With this computer device, the feature whose importance coefficient has the greatest influence on the target group is found through a decision tree model, which improves both the efficiency and the accuracy of finding the target group. The pre-selected samples are refined and divided according to the features found to have the most influential importance coefficients, so that the target group is located step by step and can be effectively used and managed. The features of the target group are collected into a feature set, forming a user portrait of the target group labeled by that feature set, which makes it convenient to develop potential customers who carry the same feature set.
In one embodiment, the step, performed by the processor, of acquiring, from the plurality of features included in the plurality of pre-selected samples, the first feature having the greatest influence on the information amount of the plurality of pre-selected samples includes: calculating the total information amount of the pre-selected samples; acquiring the influence value of each feature on the total information amount; arranging the features in descending order of influence value; and setting the feature whose influence value ranks first in the descending order as the first feature.
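The ranking steps in this embodiment can be illustrated as follows. The text does not give a formula for the "total information amount" or the "influence value", so this sketch assumes Shannon entropy of a purchase label and information gain respectively, which are common choices in decision tree methods; the function names and the `purchased` field are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of labels; assumed 'information amount'."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def influence_values(samples, features, label_key="purchased"):
    """Information gain of each feature; assumed 'influence value'."""
    total_information = entropy([s[label_key] for s in samples])
    gains = {}
    for feature in features:
        groups = defaultdict(list)
        for s in samples:
            groups[s[feature]].append(s[label_key])
        remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
        gains[feature] = total_information - remainder
    return gains


def first_feature(samples, features, label_key="purchased"):
    """Sort features by influence value, descending, and return the top one."""
    gains = influence_values(samples, features, label_key)
    ranked = sorted(features, key=lambda f: gains[f], reverse=True)
    return ranked[0], gains
```

A feature that fully separates purchasers from non-purchasers receives a gain equal to the total entropy, while a feature unrelated to purchasing receives a gain of zero.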
In one embodiment, before the step of dividing the plurality of pre-selected samples into the first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature, the method executed by the processor further includes: acquiring the attribute of the first feature; and determining, according to the attribute of the first feature, the first specified number of classification partitions corresponding to the first feature.
In one embodiment, the attribute of the first feature is a category type, and the step of determining the first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes: dividing the plurality of pre-selected samples, according to the categories of the first feature, into a first specified number of first samples corresponding to those categories, wherein the first specified number is the number of categories of the first feature.
In one embodiment, the attribute of the first feature is a numerical type, and the step of determining the first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes: dividing the plurality of pre-selected samples, according to the discrete intervals into which the continuous data representing the first feature is divided, into a first specified number of first samples corresponding to those discrete intervals, wherein the first specified number is the number of discrete intervals of the continuous data of the first feature.
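The two partitioning embodiments (category type and numerical type) can be sketched in a single helper. This is a minimal illustration under assumptions: the interval boundaries are supplied by the caller, each interval is taken as closed on the left and open on the right, and the name `partition_samples` and its `boundaries` parameter are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_samples(samples, feature, boundaries=None):
    """Split samples into groups according to one feature.

    boundaries is None: category type, one group per distinct value.
    boundaries is a sorted list [b0, b1, ...]: numerical type, giving
    len(boundaries) + 1 intervals (-inf, b0), [b0, b1), ..., [bk, +inf).
    """
    if boundaries is None:
        groups = defaultdict(list)
        for s in samples:
            groups[s[feature]].append(s)
        return dict(groups)
    # Create every interval up front so the group count equals the interval count.
    groups = {i: [] for i in range(len(boundaries) + 1)}
    for s in samples:
        groups[bisect_right(boundaries, s[feature])].append(s)
    return groups
```

For a category-type feature the number of groups equals the number of distinct categories; for a numerical feature it equals the number of discrete intervals, including any interval that happens to be empty.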
In one embodiment, the step, performed by the processor, of judging whether target second samples meeting the second preset condition exist among the second specified number of second samples includes: comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate; judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition; and if so, determining that a target second sample meeting the second preset condition exists.
In one embodiment, the step, performed by the processor, of judging whether target second samples meeting the second preset condition exist among the second specified number of second samples includes: comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate; judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition; if so, judging whether the total data amount of the designated second sample is greater than a preset amount; and if it is greater than the preset amount, determining that a target second sample meeting the second preset condition exists.
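The two stopping checks described in the preceding embodiments can be sketched together. This is a hedged illustration: "meets the purchase rate" is assumed to mean greater than or equal to a threshold, the data amount is assumed to be the number of records in the group, and `find_target_second_sample`, `min_rate`, `min_count` and the `purchased` field are hypothetical names.

```python
def purchase_rate(samples, label_key="purchased"):
    """Fraction of samples whose purchase label is truthy (0.0 if empty)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s[label_key]) / len(samples)


def find_target_second_sample(second_samples, min_rate, min_count=0,
                              label_key="purchased"):
    """Pick the second sample with the maximum purchase rate and test it.

    second_samples maps a partition key to its list of records.
    Returns (key, records) if the thresholds are met, otherwise None.
    """
    if not second_samples:
        return None
    # Designated second sample: the one with the maximum purchase rate.
    best_key = max(second_samples,
                   key=lambda k: purchase_rate(second_samples[k], label_key))
    best = second_samples[best_key]
    rate_ok = purchase_rate(best, label_key) >= min_rate  # assumed ">=" comparison
    size_ok = len(best) > min_count                       # data amount above preset
    return (best_key, best) if rate_ok and size_ok else None
```

Leaving `min_count` at its default corresponds to the rate-only embodiment; a positive `min_count` adds the data-amount check, which guards against declaring a target group on the basis of only a handful of records.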
Those skilled in the art will appreciate that the structure shown in FIG. 10 is merely a block diagram of the part of the structure relevant to the present application, and does not limit the computer device to which the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for searching for a target group, comprising: acquiring a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to each of a plurality of features of a user; acquiring, from the plurality of features included in the plurality of pre-selected samples, a first feature having the greatest influence on the information amount of the plurality of pre-selected samples; dividing the plurality of pre-selected samples into a first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature; screening, from the first samples, target first samples meeting a first preset condition, wherein the number of target first samples is one or more; acquiring, from the plurality of features included in the target first samples, a second feature having the greatest influence on the information amount of the target first samples, wherein the second feature is different from the first feature; dividing the target first samples into a second specified number of second samples according to the second specified number of classification partitions corresponding to the second feature; judging whether target second samples meeting a second preset condition exist among the second specified number of second samples; if so, stopping the division and determining the target second samples meeting the second preset condition to be the corresponding target group; otherwise, continuing the subdivision.
With the above computer-readable storage medium, the feature whose importance coefficient has the greatest influence on the target group is found through a decision tree model, which improves both the efficiency and the accuracy of searching for the target group. The pre-selected samples are progressively refined and divided according to the feature found to have the greatest influence, so that the target group is located step by step and can be effectively utilized and managed. The features of the target group are collected into a feature set, forming a user portrait of the target group labeled with that feature set, which makes it easier to develop potential customers who carry the same feature set as a label.
In one embodiment, the step, performed by the processor, of acquiring, from the plurality of features included in the plurality of pre-selected samples, the first feature having the greatest influence on the information amount of the plurality of pre-selected samples includes: calculating the total information amount of the pre-selected samples; acquiring the influence value of each feature on the total information amount; arranging the features in descending order of influence value; and setting the feature whose influence value ranks first in the descending order as the first feature.
In one embodiment, before the step of dividing the plurality of pre-selected samples into the first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature, the method executed by the processor further includes: acquiring the attribute of the first feature; and determining, according to the attribute of the first feature, the first specified number of classification partitions corresponding to the first feature.
In one embodiment, the attribute of the first feature is a category type, and the step of determining the first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes: dividing the plurality of pre-selected samples, according to the categories of the first feature, into a first specified number of first samples corresponding to those categories, wherein the first specified number is the number of categories of the first feature.
In one embodiment, the attribute of the first feature is a numerical type, and the step of determining the first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes: dividing the plurality of pre-selected samples, according to the discrete intervals into which the continuous data representing the first feature is divided, into a first specified number of first samples corresponding to those discrete intervals, wherein the first specified number is the number of discrete intervals of the continuous data of the first feature.
In one embodiment, the step, performed by the processor, of judging whether target second samples meeting the second preset condition exist among the second specified number of second samples includes: comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate; judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition; and if so, determining that a target second sample meeting the second preset condition exists.
In one embodiment, the step, performed by the processor, of judging whether target second samples meeting the second preset condition exist among the second specified number of second samples includes: comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate; judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition; if so, judging whether the total data amount of the designated second sample is greater than a preset amount; and if it is greater than the preset amount, determining that a target second sample meeting the second preset condition exists.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that comprises the element.
The foregoing describes only the preferred embodiments of the present application and does not limit the scope of the claims. Any equivalent structure or equivalent process derived from the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (10)

1. A method for searching for a target group, comprising:
acquiring a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to a plurality of characteristics of a user respectively;
acquiring, from the plurality of features included in the plurality of pre-selected samples, a first feature having the greatest influence on the information amount of the plurality of pre-selected samples;
dividing the plurality of pre-selected samples into a first specified number of first samples according to a first specified number of classification partitions corresponding to the first features;
screening target first samples meeting a first preset condition from each first sample, wherein the number of the target first samples is one or more;
acquiring a second feature with the greatest influence on the information quantity of the target first sample from a plurality of features included in the target first sample, wherein the second feature is different from the first feature;
dividing the target first samples into a second specified number of second samples according to a second specified number of classification partitions corresponding to the second features;
judging whether target second samples meeting a second preset condition exist in the second specified number of second samples;
if so, stopping the division of the target second samples, and determining the target second samples meeting the second preset condition to be the corresponding target group;
wherein the acquiring, from the plurality of features included in the plurality of pre-selected samples, of the first feature having the greatest influence on the information amount of the plurality of pre-selected samples includes:
where the influence on the purchase rate of the pre-selected samples is considered, arranging the 100 feature dimensions of the pre-selected samples according to a decision tree calculation method, with each column of feature data set against a column of data indicating whether the product was purchased; calculating, from the data arrangement corresponding to each feature, the purchase rate value corresponding to that feature; sorting the 100 features in descending order of purchase rate value; and selecting the 10 features ranked at the front of the descending order to divide the pre-selected samples, wherein the feature ranked first in the descending order is the first feature having the greatest influence on the information amount of the pre-selected samples.
2. The method for searching for a target group according to claim 1, wherein the step of acquiring, from the plurality of features included in the plurality of pre-selected samples, the first feature having the greatest influence on the information amount of the plurality of pre-selected samples comprises:
calculating the total information amount of the pre-selected samples;
acquiring the influence value of each feature on the total information amount;
arranging the features in descending order of influence value;
and setting the feature whose influence value ranks first in the descending order as the first feature.
3. The method of claim 1, wherein, before the step of dividing the plurality of pre-selected samples into a first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature, the method further comprises:
acquiring the attribute of the first feature;
and determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature.
4. The method for finding a target group according to claim 3, wherein the attribute of the first feature is a category type, and the step of determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes:
dividing the plurality of pre-selected samples, according to the categories of the first feature, into a first specified number of first samples corresponding to those categories, wherein the first specified number is the number of categories of the first feature.
5. The method for finding a target group according to claim 3, wherein the attribute of the first feature is a numerical type, and the step of determining a first specified number of classification partitions corresponding to the first feature according to the attribute of the first feature includes:
dividing the plurality of pre-selected samples, according to the discrete intervals into which the continuous data representing the first feature is divided, into a first specified number of first samples corresponding to those discrete intervals, wherein the first specified number is the number of discrete intervals of the continuous data of the first feature.
6. The method of claim 1, wherein the step of determining whether there are target second samples satisfying a second preset condition in the second specified number of second samples comprises:
comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate;
judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition;
and if so, determining that a target second sample meeting the second preset condition exists.
7. The method of claim 1, wherein the step of determining whether there are target second samples satisfying a second preset condition in the second specified number of second samples comprises:
comparing the purchase rates of the second samples to obtain the designated second sample having the maximum purchase rate;
judging whether the maximum purchase rate of the designated second sample meets the purchase rate required by the second preset condition;
if so, judging whether the total data amount of the designated second sample is greater than a preset amount;
and if it is greater than the preset amount, determining that a target second sample meeting the second preset condition exists.
8. An apparatus for searching for a target group, for performing the method for searching for a target group according to any one of claims 1 to 7, comprising:
a first acquisition module, configured to acquire a plurality of pre-selected samples, wherein each pre-selected sample comprises user data corresponding to each of a plurality of features of a user;
a second acquisition module, configured to acquire, from the plurality of features included in the plurality of pre-selected samples, a first feature having the greatest influence on the information amount of the plurality of pre-selected samples;
a first dividing module, configured to divide the plurality of pre-selected samples into a first specified number of first samples according to the first specified number of classification partitions corresponding to the first feature;
a screening module, configured to screen, from the first samples, target first samples meeting a first preset condition, wherein the number of target first samples is one or more;
a third acquisition module, configured to acquire, from the plurality of features included in the target first samples, a second feature having the greatest influence on the information amount of the target first samples, wherein the second feature is different from the first feature;
a second dividing module, configured to divide the target first samples into a second specified number of second samples according to the second specified number of classification partitions corresponding to the second feature;
a judging module, configured to judge whether target second samples meeting a second preset condition exist among the second specified number of second samples;
and a determining module, configured to stop dividing the target second samples if such target second samples exist, and to determine the target second samples meeting the second preset condition to be the corresponding target group.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN201810771080.9A 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group Active CN109101562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810771080.9A CN109101562B (en) 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810771080.9A CN109101562B (en) 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group

Publications (2)

Publication Number Publication Date
CN109101562A CN109101562A (en) 2018-12-28
CN109101562B CN109101562B (en) 2023-07-21

Family

ID=64846410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810771080.9A Active CN109101562B (en) 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group

Country Status (1)

Country Link
CN (1) CN109101562B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992699B (en) * 2019-02-28 2023-08-11 平安科技(深圳)有限公司 User group optimization method and device, storage medium and computer equipment
CN110009012B (en) * 2019-03-20 2023-06-16 创新先进技术有限公司 Risk sample identification method and device and electronic equipment
US20200410369A1 (en) * 2019-06-28 2020-12-31 Microsoft Technology Licensing, Llc Data-driven cross feature generation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718490A (en) * 2014-12-04 2016-06-29 阿里巴巴集团控股有限公司 Method and device for updating classifying model
CN105868847A (en) * 2016-03-24 2016-08-17 车智互联(北京)科技有限公司 Shopping behavior prediction method and device
CN106227743A (en) * 2016-07-12 2016-12-14 精硕世纪科技(北京)有限公司 Advertisement target group touches and reaches ratio estimation method and device
CN107785058A (en) * 2017-07-24 2018-03-09 平安科技(深圳)有限公司 Anti- fraud recognition methods, storage medium and the server for carrying safety brain
CN107818482A (en) * 2017-11-22 2018-03-20 用友金融信息技术股份有限公司 Computational methods, system and the computer equipment of the notable feature of target group
CN107944481A (en) * 2017-11-16 2018-04-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108153824A (en) * 2017-12-06 2018-06-12 阿里巴巴集团控股有限公司 The determining method and device of targeted user population

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7203343B2 (en) * 2001-09-21 2007-04-10 Hewlett-Packard Development Company, L.P. System and method for determining likely identity in a biometric database
CN105956122A (en) * 2016-05-03 2016-09-21 无锡雅座在线科技发展有限公司 Object attribute determining method and device

Also Published As

Publication number Publication date
CN109101562A (en) 2018-12-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant