CN109101562A - Method, apparatus, computer device and storage medium for finding a target group

Info

Publication number
CN109101562A
Authority
CN
China
Legal status
Granted
Application number
CN201810771080.9A
Other languages
Chinese (zh)
Other versions
CN109101562B (en)
Inventor
周南光
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201810771080.9A (claimed by CN109101562B)
Publication of CN109101562A
Application granted
Publication of CN109101562B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

This application discloses a method for finding a target group, comprising: obtaining multiple pre-selection samples; obtaining, from the multiple features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples; dividing the pre-selection samples into a first specified quantity of first samples according to the first specified quantity of classification zones corresponding to the first feature; screening, from the first samples, target first samples that meet a first preset condition; obtaining, from the multiple features included in the target first samples, the second feature that has the greatest influence on the information content of the target first samples; dividing the target first samples into a second specified quantity of second samples according to the second specified quantity of classification zones corresponding to the second feature; judging whether a target second sample that meets a second preset condition exists among the second specified quantity of second samples; and, if so, determining that the target second sample is the corresponding target group.

Description

Method, apparatus, computer device and storage medium for finding a target group
Technical field
This application relates to the field of big data, and in particular to a method, apparatus, computer device and storage medium for finding a target group.
Background
Existing customer data exists in the form of big data, and finding a required special group within big data is still relatively difficult. In practice, however, target groups that meet a given need must be screened from large databases so that related work can be carried out for them more directly and effectively. This not only improves work efficiency but also makes the work more targeted and its effect more evident. Accurately finding target groups in big data therefore has practical application value.
Summary of the invention
The main purpose of this application is to provide a method for finding a target group, aiming to solve the technical problem that finding a required special group in big data is relatively difficult.
This application proposes a method for finding a target group, comprising:
obtaining multiple pre-selection samples, wherein each pre-selection sample includes user data corresponding to multiple features of users;
obtaining, from the multiple features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples;
dividing the pre-selection samples into a first specified quantity of first samples according to the first specified quantity of classification zones corresponding to the first feature;
screening, from the first samples, target first samples that meet a first preset condition, wherein there are one or more target first samples;
obtaining, from the multiple features included in the target first samples, the second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature;
dividing the target first samples into a second specified quantity of second samples according to the second specified quantity of classification zones corresponding to the second feature;
judging whether a target second sample that meets a second preset condition exists among the second specified quantity of second samples;
if so, stopping division of the target second sample and determining that the target second sample meeting the second preset condition is the corresponding target group.
Preferably, the step of obtaining, from the multiple features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples comprises:
calculating the overall information content of the pre-selection samples;
obtaining the influence value of each feature on the overall information content;
sorting the features in descending order of influence value;
setting the feature corresponding to the first-ranked influence value in the descending order as the first feature.
Preferably, before the step of dividing the pre-selection samples into a first specified quantity of first samples according to the first specified quantity of classification zones corresponding to the first feature, the method further comprises:
obtaining the attribute of the first feature;
determining, according to the attribute of the first feature, the first specified quantity of classification zones corresponding to the first feature.
Preferably, the attribute of the first feature is categorical, and the step of determining, according to the attribute of the first feature, the first specified quantity of classification zones corresponding to the first feature comprises:
dividing the pre-selection samples, according to the categories of the first feature, into a first specified quantity of first samples corresponding to those categories, wherein the first specified quantity is the number of categories of the first feature.
Preferably, the attribute of the first feature is numeric, and the step of determining, according to the attribute of the first feature, the first specified quantity of classification zones corresponding to the first feature comprises:
dividing the pre-selection samples, according to the discrete intervals corresponding to the continuous data that characterize the first feature, into a first specified quantity of first samples corresponding to those intervals, wherein the first specified quantity is the number of discrete intervals corresponding to the continuous data of the first feature.
Preferably, the step of judging whether a target second sample that meets the second preset condition exists among the second specified quantity of second samples comprises:
comparing the buying rates of the second samples to obtain the specified second sample with the maximum buying rate;
judging whether the maximum buying rate of the specified second sample meets the buying rate required by the second preset condition;
if so, determining that a target second sample meeting the second preset condition exists.
Preferably, the step of judging whether a target second sample that meets the second preset condition exists among the second specified quantity of second samples comprises:
comparing the buying rates of the second samples to obtain the specified second sample with the maximum buying rate;
judging whether the maximum buying rate of the specified second sample meets the buying rate required by the second preset condition;
if so, judging whether the total amount of data in the specified second sample is greater than a preset quantity;
if it is greater than the preset quantity, determining that a target second sample meeting the second preset condition exists.
This application also provides an apparatus for finding a target group, comprising:
a first obtaining module, configured to obtain multiple pre-selection samples, wherein each pre-selection sample includes user data corresponding to multiple features of users;
a second obtaining module, configured to obtain, from the multiple features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples;
a first division module, configured to divide the pre-selection samples into a first specified quantity of first samples according to the first specified quantity of classification zones corresponding to the first feature;
a screening module, configured to screen, from the first samples, target first samples that meet a first preset condition, wherein there are one or more target first samples;
a third obtaining module, configured to obtain, from the multiple features included in the target first samples, the second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature;
a second division module, configured to divide the target first samples into a second specified quantity of second samples according to the second specified quantity of classification zones corresponding to the second feature;
a judgment module, configured to judge whether a target second sample that meets a second preset condition exists among the second specified quantity of second samples;
a determination module, configured to, if such a sample exists, stop dividing the target second sample and determine that the target second sample meeting the second preset condition is the corresponding target group.
This application further provides a computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
This application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above method.
This application uses a decision-tree model to find the feature with the greatest influence coefficient on the target group, which speeds up finding the target group and improves accuracy. Based on the features found to have the greatest influence on information content, the application refines and divides the pre-selection samples step by step to locate the target group, enabling effective use and management of the target group. By aggregating the features used to find the target group into a feature set, the application forms a user portrait of the target group labeled with that feature set, which helps develop potential customers who carry the same feature-set label.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for finding a target group according to an embodiment of this application;
Fig. 2 is a schematic structural diagram of the apparatus for finding a target group according to an embodiment of this application;
Fig. 3 is a schematic structural diagram of the second obtaining module according to an embodiment of this application;
Fig. 4 is a schematic structural diagram of the apparatus for finding a target group according to another embodiment of this application;
Fig. 5 is a schematic structural diagram of the third division module according to an embodiment of this application;
Fig. 6 is a schematic structural diagram of the third division module according to another embodiment of this application;
Fig. 7 is a schematic structural diagram of the judgment module according to an embodiment of this application;
Fig. 8 is a schematic structural diagram of the judgment module according to another embodiment of this application;
Fig. 9 is a schematic structural diagram of the apparatus for finding a target group according to yet another embodiment of this application;
Fig. 10 is a schematic diagram of the internal structure of the computer device according to an embodiment of this application.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Referring to Fig. 1, a method for finding a target group according to an embodiment of this application comprises:
S1a: obtaining multiple pre-selection samples, wherein each pre-selection sample includes user data corresponding to multiple features of users.
The pre-selection samples in this embodiment contain a large amount of user data. For example, the pre-selection samples contain data for 10 million users, each of whom has multiple features. The user with the most features among the 10 million is selected, and that user's features and feature count serve as the reference. For instance, if user A has the most features in the pre-selection samples, covering 100 dimensions, then those same 100 feature dimensions are selected for every user in the pre-selection samples, such as name, age, gender, region, height, weight, product purchase frequency and purchase preferences.
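As a minimal sketch of this alignment step, assuming each user record is a Python dict that maps feature names to values (the function name `align_to_richest_schema` is hypothetical), the feature set of the user with the most features can serve as the common schema. How missing values are filled is not stated in the source, so `None` here is an assumption:

```python
from typing import Any


def align_to_richest_schema(users: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Project every user onto the feature set of the user with the most features."""
    # The reference user is the one with the most features; ties go to the first such user.
    reference = max(users, key=len)
    schema = list(reference)
    # Assumption: features a user lacks are filled with None (the source does not say how gaps are handled).
    return [{name: user.get(name) for name in schema} for user in users]
```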
S1: obtaining, from the multiple features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples.
In this embodiment, the first feature with the greatest influence on the pre-selection samples is the feature that causes the largest fluctuation in their information content. Taking the buying rate of the pre-selection samples as an example, the 100 feature dimensions in the pre-selection samples are processed according to the decision-tree calculation method: for each feature, the data are arranged with one column holding the feature and one column holding whether the product was purchased, and a buying-rate value is calculated for each feature from its data. The 100 features are then sorted in descending order of this value, and the top 10 features in the ranking are selected for dividing the pre-selection samples. The feature ranked first is the first feature with the greatest influence on the information content of the pre-selection samples.
S2: dividing the pre-selection samples into a first specified quantity of first samples according to the first specified quantity of classification zones corresponding to the first feature.
For example, each feature has been divided in advance into a specified number of classification zones, so the first feature likewise has a first specified quantity of classification zones. The pre-selection samples are divided according to the classification zones of the first feature, and the first specified quantity corresponds one-to-one to the number of classification zones of the first feature. For instance, if the first feature is gender, which has two classification zones (male and female), the first specified quantity is two and the pre-selection samples can be divided into two first samples: a female first sample and a male first sample. As another example, if the first feature is age, which has been discretized in advance into the five classification zones [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100], the first specified quantity is five and the pre-selection samples can be divided into five first samples.
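The division in S2 amounts to grouping users by the zone their first-feature value falls into. The sketch below is an illustration under assumptions: users are dicts, a categorical feature such as gender acts as its own zone, and age uses the fixed zone edges from the example above. The names `partition`, `age_zone` and `AGE_EDGES` are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Any, Callable, Optional

# Zone edges from the example: [0, 20), [20, 40), [40, 60), [60, 80), [80, 100]
AGE_EDGES = [0, 20, 40, 60, 80, 100]


def age_zone(age: float) -> int:
    """Return the index of the age zone; the last zone is closed on the right."""
    if not AGE_EDGES[0] <= age <= AGE_EDGES[-1]:
        raise ValueError(f"age {age} is outside [{AGE_EDGES[0]}, {AGE_EDGES[-1]}]")
    # bisect_right finds the first edge strictly greater than age.
    index = bisect.bisect_right(AGE_EDGES, age) - 1
    # age == 100 would land one past the last zone, so clamp it into [80, 100].
    return min(index, len(AGE_EDGES) - 2)


def partition(
    samples: list[dict[str, Any]],
    feature: str,
    zone_of: Optional[Callable[[Any], Any]] = None,
) -> dict[Any, list[dict[str, Any]]]:
    """Split samples into one group per classification zone of `feature`."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for user in samples:
        value = user[feature]
        # A categorical feature uses the value itself as the zone.
        groups[zone_of(value) if zone_of else value].append(user)
    return dict(groups)
```

For example, `partition(users, "gender")` yields a female and a male first sample, and `partition(users, "age", age_zone)` yields up to five first samples keyed 0 to 4. Only zones that actually contain users appear as keys.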
S3: screening, from the first samples, target first samples that meet a first preset condition, wherein there are one or more target first samples.
For example, the first preset condition in this embodiment may be that the first average buying rate of a first sample is greater than or equal to a preset threshold. If the threshold for the first average buying rate is 50%, every first sample that reaches it is a target first sample. For instance, if both the female first sample and the male first sample meet the first average buying rate requirement, both are target first samples and enter the echelon for the second division.
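Screening in S3 can be sketched as a filter over the first samples. This assumes each user dict carries a hypothetical 0/1 `purchased` field and that the first preset condition is "average buying rate at least the threshold", as in the 50% example; `buying_rate` and `screen` are illustrative names.

```python
from typing import Any


def buying_rate(sample: list[dict[str, Any]]) -> float:
    """Share of users in the sample who bought the product (purchased == 1)."""
    return sum(user["purchased"] for user in sample) / len(sample)


def screen(groups: dict[Any, list[dict[str, Any]]], threshold: float) -> dict[Any, list[dict[str, Any]]]:
    """Keep the non-empty groups whose buying rate meets the preset threshold."""
    return {zone: sample for zone, sample in groups.items() if sample and buying_rate(sample) >= threshold}
```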
S4: obtaining, from the multiple features included in the target first samples, the second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature.
This embodiment explains the division process using one selected target first sample as an example; every other target first sample that enters the second division is processed in the same way. Taking the female first sample as an example, the second feature with the greatest influence on its information content, say age, is found according to the decision-tree calculation method. Because the samples have already been divided using gender as the first feature, all users within a given target first sample have the same gender, so gender no longer takes part when feature importance is ranked again. If "age" ranks first in the new descending order, the second feature is "age".
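The exclusion of gender in the second round can be made explicit as below. This is a sketch, assuming `score(sample, feature)` returns a feature's influence value (for example the information gain described under S11). Features already used on the current path, or constant within the subsample, are skipped. The function name is hypothetical.

```python
from typing import Any, Callable, Iterable, Optional


def next_split_feature(
    sample: list[dict[str, Any]],
    features: Iterable[str],
    used: set[str],
    score: Callable[[list[dict[str, Any]], str], float],
) -> Optional[str]:
    """Pick the highest-scoring feature that is unused and still varies in the sample."""
    candidates = [
        name
        for name in features
        # A feature with a single value in this subsample (e.g. gender after splitting on gender) cannot split it.
        if name not in used and len({user[name] for user in sample}) > 1
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: score(sample, name))
```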
S5: dividing the target first samples into a second specified quantity of second samples according to the second specified quantity of classification zones corresponding to the second feature.
For example, this embodiment discretizes age into the five intervals [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100], and divides the target first sample into the five corresponding second samples. This further refines the pre-selection samples in order to find a target group with a higher buying rate.
S6: judging whether a target second sample that meets the second preset condition exists among the second specified quantity of second samples.
The preset conditions in this embodiment can be set according to the requirements for finding the target group; for example, a preset condition may be a buying rate of 90% or more. The second preset condition here is the preset buying rate required after division by multiple features, as distinct from the first preset condition used when the samples are divided by a single feature. The buying rate of the first preset condition is therefore less than or equal to that of the second preset condition, so that the target group meeting the buying-rate requirement of the second preset condition is found by progressively narrowing the sample range.
S7: if such a sample exists, stopping division of the target second sample and determining that the target second sample meeting the second preset condition is the corresponding target group.
If the buying rate of some second sample reaches the buying rate of the second preset condition, for example 90% or more, the target group being sought has been found. If no such second sample exists, a third preset condition is set up to screen the second samples that enter a third division. The third preset condition is stricter than the first preset condition: for example, its third average buying rate of 60% is greater than the 50% first average buying rate of the first preset condition. Converging in this way, a target group that meets the requirement can be found quickly.
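Steps S1 to S7, together with the escalating screening thresholds just described, can be sketched as a recursive search. This is an illustrative reading, not the patented implementation. It assumes users are dicts with categorical (already discretized) feature values and a 0/1 `purchased` field, and it chooses each split feature by information gain as in S10 to S13. `screen_rates` such as `[0.5, 0.6]` stand for the first and third preset conditions, and `target_rate` such as `0.9` stands for the second. The minimum group size follows the variant in S63 to S66. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Any

Sample = list[dict[str, Any]]
Path = tuple[tuple[str, Any], ...]


def entropy(labels: list[int]) -> float:
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(sample: Sample, feature: str) -> float:
    groups: dict[Any, list[int]] = defaultdict(list)
    for user in sample:
        groups[user[feature]].append(user["purchased"])
    conditional = sum(len(g) / len(sample) * entropy(g) for g in groups.values())
    return entropy([user["purchased"] for user in sample]) - conditional


def buying_rate(sample: Sample) -> float:
    return sum(user["purchased"] for user in sample) / len(sample)


def find_target_groups(
    sample: Sample, features: list[str], screen_rates: list[float],
    target_rate: float, min_size: int, path: Path = (),
) -> list[tuple[Path, Sample]]:
    """Return (feature path, users) pairs for the target groups found below `sample`."""
    depth = len(path)
    if depth >= len(screen_rates):
        return []  # no screening condition is defined for a deeper level
    used = {name for name, _ in path}
    candidates = [f for f in features if f not in used and len({u[f] for u in sample}) > 1]
    if not candidates:
        return []
    best = max(candidates, key=lambda f: information_gain(sample, f))
    children: dict[Any, Sample] = defaultdict(list)
    for user in sample:
        children[user[best]].append(user)
    found, survivors = [], []
    for value, group in children.items():
        child_path = path + ((best, value),)
        rate = buying_rate(group)
        # From the second division on, check the final (second) preset condition first (S6/S7).
        if depth >= 1 and rate >= target_rate and len(group) > min_size:
            found.append((child_path, group))
        # Otherwise apply this level's screening condition (first, third, ... preset condition).
        elif rate >= screen_rates[depth]:
            survivors.append((child_path, group))
    if found:
        return found  # stop dividing this branch once a target group exists
    results = []
    for child_path, group in survivors:
        results.extend(find_target_groups(group, features, screen_rates, target_rate, min_size, child_path))
    return results
```

Each returned path, for example `(("gender", "F"), ("age", "20-40"))`, records the features and zones used on the way down, which is the feature combination summarized in S8.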
Further, step S1 of this embodiment comprises:
S10: calculating the overall information content of the pre-selection samples.
In this embodiment, the overall information content is obtained by calculating the entropy of the pre-selection samples as follows: $H(D) = -\sum_{i=1}^{n} P_i \log P_i$, where $P_i$ is the proportion of the pre-selection samples accounted for by group $i$. In this embodiment, $P_i$ is the proportion of the whole group accounted for by the purchasing group (and, for the complementary term, by the non-purchasing group). $H(X)$ denotes the entropy of a population $X$; the entropy of the pre-selection samples is written $H(D)$.
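For a concrete check of the formula, the sketch below computes $H(D)$ from a list of 0/1 purchase labels. It assumes base-2 logarithms (entropy in bits), which the text does not specify, and the function name is illustrative. With 30% purchasers, $H(D) = -(0.3\log_2 0.3 + 0.7\log_2 0.7) \approx 0.881$ bits.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Entropy H(D) in bits of a label list, e.g. 1 = purchased, 0 = not purchased."""
    n = len(labels)
    # P_i = count_i / n; the term -P_i * log2(P_i) is written as P_i * log2(1 / P_i).
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())
```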
S11: obtaining the influence value of each feature on the overall information content.
In this embodiment, the influence value of each feature on the overall information content is obtained by the information gain algorithm: each feature is added to the calculation separately, and the resulting change in the overall entropy is taken as that feature's influence value. Information gain is calculated as $g(D, A) = H(D) - H(D \mid A)$, where $g(D, A)$ is the influence of feature $A$ on the overall entropy, $H(D)$ is the entropy of the pre-selection samples, and $H(D \mid A)$ is the entropy of the samples after they are divided according to feature $A$.
In other embodiments of this application, the influence value of each feature on the overall information content can instead be obtained from the information gain ratio. A penalty parameter that corrects the information gain is introduced to reduce the effect of the smaller entropy of small subsamples, i.e. information gain ratio = penalty parameter × information gain.
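The two influence measures can be compared in a short sketch. The text leaves the penalty parameter unspecified; this example assumes the C4.5-style choice, penalty = 1 / split information, where split information is the entropy of the feature's own value distribution. Users are assumed to be dicts with a 0/1 `purchased` field, and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Any


def entropy(labels: list[Any]) -> float:
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(sample: list[dict[str, Any]], feature: str) -> float:
    """g(D, A) = H(D) - H(D | A), with H(D | A) the size-weighted entropy of each subgroup."""
    groups: dict[Any, list[int]] = defaultdict(list)
    for user in sample:
        groups[user[feature]].append(user["purchased"])
    conditional = sum(len(g) / len(sample) * entropy(g) for g in groups.values())
    return entropy([user["purchased"] for user in sample]) - conditional


def gain_ratio(sample: list[dict[str, Any]], feature: str) -> float:
    """Information gain ratio = penalty * information gain, assuming penalty = 1 / split information."""
    split_information = entropy([user[feature] for user in sample])
    if split_information == 0:
        return 0.0  # a constant feature cannot divide the sample
    return information_gain(sample, feature) / split_information
```

On four users split evenly into two age zones that perfectly separate buyers from non-buyers, both `age` and a unique `user_id` reach an information gain of 1 bit. The gain ratio, however, halves the score of `user_id` (0.5 versus 1.0), because splitting into single-user groups of zero entropy is penalized.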
S12: sorting the features in descending order of influence value.
In this embodiment, a larger influence value indicates a greater influence on the whole, a stronger predictive ability of the corresponding feature, and a more important role for that feature in dividing the samples and finding the target group. Sorting the features in descending order of influence value makes selecting the first feature for dividing the pre-selection samples more intuitive and faster.
S13: setting the feature corresponding to the first-ranked influence value in the descending order as the first feature.
In this embodiment, the feature corresponding to the first-ranked influence value in the descending order is selected directly as the first feature. This determines the first feature accurately, so that the pre-selection samples are divided accurately and the target group eventually found is reliable.
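S12 and S13 reduce to a descending sort of the influence values. A minimal sketch, assuming the influence values are already computed (for example by information gain) and stored in a dict keyed by feature name. The `top_k` parameter mirrors the "top 10 features" example in S1 and is otherwise an assumption.

```python
from typing import Optional


def rank_features(influence: dict[str, float], top_k: Optional[int] = None) -> list[str]:
    """Sort feature names by influence value, largest first; ties keep their input order."""
    ranked = sorted(influence, key=influence.__getitem__, reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

The first element of the ranking is the first feature of S13.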
Further, before step S2, this embodiment further comprises:
S20: obtaining the attribute of the first feature.
In this embodiment, the attribute of the first feature is one of two types: categorical or numeric. For example, gender is a categorical feature and age is a numeric feature.
S21: determining, according to the attribute of the first feature, the first specified quantity of classification zones corresponding to the first feature.
In this embodiment, first features with different attributes use different criteria and processing methods for dividing the pre-selection samples. A categorical feature can only divide samples according to the categories it contains, so the number of categories determines the number of classification zones. A numeric feature can first be discretized as needed into multiple contiguous data intervals, and the samples are then divided according to those intervals, so the number of intervals determines the number of classification zones.
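How the attribute decides the number of zones can be sketched as follows. Both the rule for detecting the attribute (every non-missing value is a real number, with booleans treated as categories) and the default of five intervals for numeric features are assumptions for illustration; the source fixes neither. The function names are hypothetical.

```python
from numbers import Real
from typing import Any, Sequence


def feature_attribute(values: Sequence[Any]) -> str:
    """Classify a feature as 'numeric' or 'categorical' from its observed values."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int, so it is excluded explicitly and treated as categorical.
    if present and all(isinstance(v, Real) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


def first_specified_quantity(values: Sequence[Any], num_intervals: int = 5) -> int:
    """Number of classification zones: categories for categorical, intervals for numeric."""
    if feature_attribute(values) == "categorical":
        return len({v for v in values if v is not None})
    return num_intervals
```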
Further, in this embodiment the attribute of the first feature is categorical, and step S21 comprises:
S210: dividing the pre-selection samples, according to the categories of the first feature, into a first specified quantity of first samples corresponding to those categories, wherein the first specified quantity is the number of categories of the first feature.
This embodiment uses a categorical feature as an example to illustrate how the pre-selection samples are divided by a categorical feature. First, the number of categories the feature contains is determined; the pre-selection samples are then divided into that many first samples. Other samples, such as first samples and second samples, are divided by categorical features following the same process and principle as the pre-selection samples.
Further, in another embodiment of this application the attribute of the first feature is numeric, and step S21 comprises:
S211: dividing the pre-selection samples, according to the discrete intervals corresponding to the continuous data that characterize the first feature, into a first specified quantity of first samples corresponding to those intervals, wherein the first specified quantity is the number of discrete intervals corresponding to the continuous data of the first feature.
This embodiment uses a numeric feature as an example to illustrate how the pre-selection samples are divided by a numeric feature. Through discretization, the numeric feature is first split into several contiguous discrete intervals, and the pre-selection samples are then divided into multiple first samples, one per interval. Other samples, such as first samples and second samples, are divided by numeric features following the same process and principle as the pre-selection samples.
In this embodiment, the value range of the numeric feature is obtained first, i.e. its maximum and minimum values. Multiple quantiles are then calculated according to the input discretization parameter num. For example, with num = 5 and age as the numeric feature ranging from 0 to 100, the values at the 20%, 40%, 60% and 80% positions of the sorted continuous data are calculated, giving the five interval ranges [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100]. The interval information then replaces the specific numeric values in the original pre-selection samples, converting the feature from point values into discrete intervals, namely the five discrete intervals corresponding to the five interval ranges above. For example, a user aged 25 falls into the discrete interval [20, 40). Discretization prevents outliers (abnormal values) from biasing the fit of the overall distribution. For example, if 99% of the data in the pre-selection samples lie between 0 and 100 but 1% take the value 1000, the large jump in value would make the algorithm pay too much attention to the abnormal data during identification and introduce a large deviation into the fitting result. Moreover, discretized features are more interpretable: a numeric feature can take unlimited values, so it is hard to tell where a specific value sits within the pre-selection samples, whereas after discretization quantities such as the proportion of the population in each interval are easy to calculate.
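The quantile-based discretization can be sketched as below. Several details are assumptions for illustration: cut points are taken at the k/num positions of the sorted values (a simple nearest-rank rule, one of several quantile conventions), intervals are half-open except the last, and the function names are hypothetical. With ages spread evenly over 0 to 100 and num = 5, the edges reproduce the five intervals in the example. A single extreme value such as 1000 only widens the last interval instead of stretching the others.

```python
import bisect


def quantile_edges(values: list[float], num: int = 5) -> list[float]:
    """Interval edges: the minimum, the cut points at the k/num positions, and the maximum."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[min(n - 1, k * n // num)] for k in range(1, num)]
    return [ordered[0], *cuts, ordered[-1]]


def discretize(value: float, edges: list[float]) -> int:
    """Index of the interval [edges[i], edges[i + 1]) holding value; the last interval is closed."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    index = bisect.bisect_right(edges, value) - 1
    return min(index, len(edges) - 2)
```

With heavily tied data, several cut points can coincide and leave some intervals empty; a production version might merge repeated edges.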
Further, step S6 of this embodiment comprises:
S60: comparing the buying rates of the second samples to obtain the specified second sample with the maximum buying rate.
This embodiment aims to find a target group with a specified buying rate. The second preset condition for ending further division of the samples is that the buying rate of a small sample produced by division reaches the buying rate of the second preset condition. Division produces multiple small samples in this embodiment; comparing their buying rates yields the small sample with the maximum buying rate, and checking whether that maximum reaches the buying rate of the second preset condition shows whether the target group has been found.
S61: judging whether the maximum buying rate of the specified second sample meets the buying rate of the second preset condition.
In this embodiment, the first feature is screened from the pre-selection samples and used to divide them into first samples; the corresponding second feature is then screened for each first sample, which is divided into second samples accordingly. Division continues in this cycle until the buying rate of one or several final small samples reaches the buying rate of the second preset condition.
S62: if so, determining that a target second sample meeting the second preset condition exists.
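Steps S60 to S62 can be sketched as "take the best zone, then test it". This assumes the second samples are held in a dict keyed by zone and that users carry a hypothetical 0/1 `purchased` field; the function names are illustrative.

```python
from typing import Any, Optional


def buying_rate(sample: list[dict[str, Any]]) -> float:
    return sum(user["purchased"] for user in sample) / len(sample)


def find_target_zone(groups: dict[Any, list[dict[str, Any]]], required_rate: float) -> Optional[Any]:
    """Return the zone with the maximum buying rate if it meets the second preset condition, else None."""
    rates = {zone: buying_rate(sample) for zone, sample in groups.items() if sample}
    if not rates:
        return None
    best = max(rates, key=rates.__getitem__)  # S60: specified second sample with the maximum buying rate
    return best if rates[best] >= required_rate else None  # S61/S62
```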
Further, in another embodiment of the present application, step S6 comprises:
S63: comparing the purchase rates of the second samples to obtain the designated second sample with the maximum purchase rate.
In this embodiment, after each first sample has been divided into its second samples and a second sample meeting the purchase rate of the second preset condition has been found, it must be further analyzed whether the amount of data in that second sample has practical reference value. If the second sample contains little data, for example only a few or a few dozen records, its reference value is considered small.
S64: judging whether the maximum purchase rate of the designated second sample meets the preset purchase rate.
To keep the computation of repeated divisions from becoming excessive, it is generally only necessary to divide the samples successively by 6, or at most 10, features to reach the purchase rate of the second preset condition and find the small sample corresponding to the target group.
S65: if it is met, judging whether the total amount of data in the designated second sample is greater than a preset quantity.
The target second sample of this embodiment must not only reach the expected purchase rate but also contain enough data, i.e., the number of users in the target group must reach the expected level. If a target group that reaches the expected purchase rate contains too few users, summarizing that group by its features loses practical value.
S66: if it is greater than the preset quantity, determining that a target second sample meeting the second preset condition exists.
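A sketch of the stricter variant in steps S63 to S66, which additionally requires the best second sample to contain more than a preset number of records. As before, the `purchased` field and the function names are hypothetical; following the text, only the sample with the maximum purchase rate is examined.

```python
def purchase_rate(sample, label="purchased"):
    """Share of records whose 0/1 purchase field is 1 (0.0 for an empty sample)."""
    return sum(r[label] for r in sample) / len(sample) if sample else 0.0

def find_target_second_sample(second_samples, target_rate, min_size):
    """S63-S66: return the key of the max-rate second sample if it reaches target_rate
    and holds more than min_size records; otherwise return None."""
    best_key = max(second_samples, key=lambda k: purchase_rate(second_samples[k]))
    best = second_samples[best_key]
    if purchase_rate(best) >= target_rate and len(best) > min_size:
        return best_key
    return None
```

A sample of three records with a 100% purchase rate is rejected when `min_size` is 5, which is the "too few users" case the text warns about.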
Further, after step S7, this embodiment comprises:
S8: collecting the first feature and the second feature used when finding the target group into a feature combination.
In this embodiment, the first feature and second feature used in the repeated division of the pre-selection samples form a feature combination that serves as the identity label of the small sample corresponding to the target group. In other embodiments of the present application, if no target group has been found after each first sample is divided into its second samples, each second sample is divided again into third samples, and so on, until the n-th sample corresponding to the target group is found; the first feature, second feature, ..., n-th feature used in the repeated division of the pre-selection samples then form the feature combination that serves as the identity label of the small sample corresponding to the target group.
S9: using the feature combination as the user portrait of the target group.
By forming a user portrait of the target group, this embodiment identifies the target group better and makes it easier to use the portrait to acquire new users with the same characteristics as customers.
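The feature combination of steps S8 and S9 can be represented, for illustration, as a mapping from each feature used in a division to the partition the target group fell into. The sketch below assumes that users' features have already been discretized into the same partition labels; `build_portrait` and `matches_portrait` are hypothetical names.

```python
def build_portrait(split_path):
    """S8: merge the (feature, partition) pairs used at each division level."""
    return dict(split_path)

def matches_portrait(user, portrait):
    """S9 usage: True if a (pre-discretized) user record carries every feature of the portrait."""
    return all(user.get(feature) == part for feature, part in portrait.items())
```

For example, a path of `[("gender", "F"), ("age", "[20, 40)")]` becomes the portrait `{"gender": "F", "age": "[20, 40)"}`, and any new user with those two partition values matches it regardless of their other features.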
This embodiment is used to find a user group with a specified purchase rate, and the pre-selection samples come from the database of a product purchase platform. In another embodiment of the present application, the pre-selection samples are feature data of cases of other conditions, such as diabetes; the above process and principle can then be used to find the specific population prone to a certain high-incidence disease, so that the incidence of the disease can be managed effectively.
In yet another embodiment of the present application, the pre-selection samples come from a feature database of borrowers; the above process and principle can then be used to find the specific population carrying lending risk, so that lending risk can be managed effectively.
This embodiment uses a decision-tree model to find the feature with the greatest influence (importance coefficient) on the target group, which speeds up finding the target group and improves accuracy. Based on the most influential features found, the pre-selection samples are progressively refined and divided so that the target group is located step by step, enabling effective use and control of the target group. By aggregating the features used to find the target group into a feature set, this embodiment forms a user portrait of the target group labeled by that feature set, which makes it easier to develop potential customers carrying the same feature-set label.
Referring to Fig. 2, a device for finding a target group according to an embodiment of the present application comprises:
First obtaining module 1a, configured to obtain a plurality of pre-selection samples, where each pre-selection sample includes user data corresponding to a plurality of features of a user.
The pre-selection samples of this embodiment contain a large amount of user data. For example, the pre-selection samples include the data of 10 million users, each having multiple features. The user with the most features among the 10 million users is selected, and that user's features and number of features are taken as the reference. For example, if user A has the most features in the pre-selection samples, covering 100 dimensions, the same 100 feature dimensions are selected for every user in the pre-selection samples, such as name, age, gender, region, height, weight, product purchase frequency and purchase preferences.
Second obtaining module 1, configured to obtain, among the features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples.
As above, the pre-selection samples may contain the data of, for example, 10 million users, each described by the same 100 feature dimensions (name, age, gender, region, height, weight, product purchase frequency, purchase preferences, etc.). In this embodiment, the first feature with the greatest influence on the pre-selection samples is the feature that causes the largest fluctuation in their information content. Taking the influence on the purchase rate of the pre-selection samples as an example, each of the 100 feature dimensions is evaluated with the decision-tree calculation method, using one column for the feature and another for the data assignment of whether the product was purchased. The purchase-rate value corresponding to each feature is calculated from its data assignment, the 100 features are sorted in descending order of this value, and the 10 features at the head of the descending order are selected for dividing the pre-selection samples; the feature ranked first is the first feature with the greatest influence on the information content of the pre-selection samples.
First division module 2, configured to divide the pre-selection samples into a first specified number of first samples according to the first specified number of classification partitions of the first feature.
For example, each feature is divided in advance into a specified number of classification partitions, so the first feature likewise has a first specified number of partitions. The pre-selection samples are divided according to the partitions of the first feature, so the first specified number equals the number of partitions of the first feature. For example, if the first feature is gender, which has two partitions, male and female, the first specified number is two and the pre-selection samples are divided into two first samples: a female first sample and a male first sample. As another example, if the first feature is age, discretized in advance into the five partitions [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100], the first specified number is five and the pre-selection samples are divided into five first samples.
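Dividing a sample by the partitions of a feature amounts to grouping records by their partition label. A minimal sketch, assuming records are dicts whose feature values are already partition labels (e.g. "F"/"M", or an age-interval string); `divide_by_feature` is a hypothetical name.

```python
from collections import defaultdict

def divide_by_feature(samples, feature):
    """Return {partition label: [records]}, one sub-sample per partition present."""
    groups = defaultdict(list)
    for record in samples:
        groups[record[feature]].append(record)
    return dict(groups)
```

With gender as the feature this yields a female and a male sub-sample; partitions that no record falls into simply do not appear in the result.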
Screening module 3, configured to screen, from the first samples, the target first samples meeting a first preset condition, where there are one or more target first samples.
For example, the first preset condition of this embodiment may be that the average purchase rate of a first sample is greater than or equal to a preset threshold (the first average purchase rate), such as 50%; the first samples meeting the first average purchase rate are target first samples. For example, if both the female first sample and the male first sample meet the first average purchase rate, both are target first samples and enter the second round of division.
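The screening by the first preset condition can be sketched as a filter on each first sample's average purchase rate. The 0.5 default mirrors the 50% example in the text; the `purchased` field and the function names are assumptions of this sketch.

```python
def purchase_rate(sample, label="purchased"):
    """Share of records whose 0/1 purchase field is 1 (0.0 for an empty sample)."""
    return sum(r[label] for r in sample) / len(sample) if sample else 0.0

def screen_first_samples(first_samples, threshold=0.5):
    """Keep first samples whose average purchase rate is >= threshold (first preset condition)."""
    return {key: s for key, s in first_samples.items() if purchase_rate(s) >= threshold}
```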
Third obtaining module 4, configured to obtain, among the features included in the target first sample, the second feature that has the greatest influence on the information content of the target first sample, the second feature being different from the first feature.
This embodiment explains the division of target first samples using one selected target first sample as an example; the other target first samples admitted to the second round of division are processed in the same way. Taking the female first sample as an example, the second feature with the greatest influence on the information content of the target first sample, say age, is found with the decision-tree calculation method. Because the samples were divided using gender as the first feature, all records in the same target first sample have the same gender, so when feature importance is ranked again, gender no longer takes part in the ranking. If, for example, age now ranks first in the descending order of features, the second feature is age.
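The rule that an already-used feature (such as gender) drops out of the ranking can be sketched as follows. The scoring function is passed in as a parameter (for example, an information-gain function). Also excluding features that are constant within the sample is an assumption added here, reflecting the observation that gender no longer varies inside the female first sample.

```python
def next_split_feature(sample, features, used, score):
    """Highest-scoring feature that is unused on this branch and still varies in `sample`.

    Returns None when no candidate feature is left.
    """
    candidates = [f for f in features
                  if f not in used and len({r[f] for r in sample}) > 1]
    return max(candidates, key=lambda f: score(sample, f), default=None)
```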
Second division module 5, configured to divide the target first sample into a second specified number of second samples according to the second specified number of classification partitions of the second feature.
For example, with age discretized into the five intervals [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100], this embodiment divides the target first sample into five second samples, one per interval. This further refines the pre-selection samples in order to find a target group with a higher purchase rate.
Judgment module 6, configured to judge whether a target second sample meeting the preset condition exists among the second specified number of second samples.
The preset condition of this embodiment can be set according to the requirements for the target group, for example a purchase rate of 90% or more. The second preset condition of this embodiment is the preset purchase rate after division by multiple features, as distinct from the first preset condition used when samples are divided by a single feature. The purchase rate of the first preset condition should be understood to be less than or equal to that of the second preset condition, so that, by progressively narrowing the sample range, a target group meeting the purchase-rate requirement of the second preset condition is found.
Determination module 7, configured to, if such a target second sample exists, stop dividing it and determine that the target second sample meeting the second preset condition is the corresponding target group.
If the purchase rate of some second sample reaches the purchase rate of the second preset condition, for example 90% or more, the target group to be found has been found. If no such sample exists, a third preset condition is set for screening the target second samples that enter the third round of division. The screening level of the third preset condition is higher than that of the first preset condition; for example, the third average purchase rate of the third preset condition is higher than the first average purchase rate of the first preset condition, such as 60% versus 50%, so that the target group meeting the requirements is found quickly by progressive convergence.
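Putting the steps together, the progressive division with rising screening thresholds (50% at the first level, 60% at the next) can be sketched as a recursive search. This is a simplified illustration, not the patent's implementation: it uses information gain to choose each split feature, returns every sub-sample that reaches the target rate rather than only the maximum, caps the depth at the number of screening thresholds supplied, and assumes records are dicts with pre-discretized features and a hypothetical 0/1 `purchased` field.

```python
import math
from collections import Counter, defaultdict

LABEL = "purchased"  # hypothetical 0/1 purchase field

def purchase_rate(sample):
    return sum(r[LABEL] for r in sample) / len(sample) if sample else 0.0

def entropy(sample):
    n = len(sample)
    counts = Counter(r[LABEL] for r in sample)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def partition(sample, feature):
    groups = defaultdict(list)
    for r in sample:
        groups[r[feature]].append(r)
    return dict(groups)

def info_gain(sample, feature):
    n = len(sample)
    parts = partition(sample, feature).values()
    return entropy(sample) - sum(len(p) / n * entropy(p) for p in parts)

def find_target_groups(sample, features, screen_rates, target_rate, path=()):
    """Return (split path, purchase rate, size) for every sub-sample reaching target_rate.

    screen_rates[d] is the screening threshold applied at depth d (e.g. 0.5, then 0.6);
    its length also caps how many features may be used on one branch.
    """
    depth = len(path)
    used = {f for f, _ in path}
    candidates = [f for f in features
                  if f not in used and len({r[f] for r in sample}) > 1]
    if not candidates or depth >= len(screen_rates):
        return []
    best = max(candidates, key=lambda f: info_gain(sample, f))
    found = []
    for value, sub in partition(sample, best).items():
        sub_path = path + ((best, value),)
        rate = purchase_rate(sub)
        if rate >= target_rate:
            found.append((sub_path, rate, len(sub)))  # stop dividing this branch
        elif rate >= screen_rates[depth]:
            found.extend(find_target_groups(sub, features, screen_rates,
                                            target_rate, sub_path))
    return found
```

On six toy records where women aged [20, 40) always buy, the search splits on gender first (higher information gain), keeps the female branch at 75% (above the 50% screen), splits it on age, and returns the female, [20, 40) branch with a 100% purchase rate; the female, [40, 60) branch at 50% fails the 60% screen and is dropped.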
Referring to Fig. 3, the second obtaining module 1 of this embodiment comprises:
Computing unit 10, configured to calculate the overall information content of the pre-selection samples.
This embodiment obtains the overall information content by calculating the entropy of the pre-selection samples, as follows:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
where $p_i$ denotes the proportion of the pre-selection samples accounted for by a particular group; in this embodiment, $p_i$ is the proportion of the whole population accounted for by the purchasing group. $H(\cdot)$ denotes the entropy, and the entropy of the pre-selection samples in this embodiment is written $H(D)$.
First acquisition unit 11, configured to obtain the influence value of each feature on the overall information content.
In this embodiment, the influence value of each feature on the overall information content is obtained with the information gain algorithm: the change in overall entropy after each feature is individually added to the calculation gives that feature's influence value. Information gain is calculated as $g(D, A) = H(D) - H(D \mid A)$, where $g(D, A)$ is the influence of feature A on the overall entropy, $H(D)$ is the entropy of the pre-selection samples, and $H(D \mid A)$ is the entropy of the samples after division by feature A.
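The entropy and information-gain formulas can be computed directly from label counts. A minimal sketch, assuming base-2 logarithms and a class label stored in a hypothetical `purchased` field:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """H(D) = -sum_i p_i * log2(p_i), with p_i the share of each class in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, feature, label="purchased"):
    """g(D, A) = H(D) - H(D|A), where H(D|A) is the size-weighted entropy of the subsets."""
    subsets = defaultdict(list)
    for r in records:
        subsets[r[feature]].append(r[label])
    n = len(records)
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy([r[label] for r in records]) - conditional
```

A feature that perfectly separates buyers from non-buyers in a 50/50 sample has a gain of 1 bit; a feature with a single value has a gain of 0.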
In other embodiments of the present application, the influence value of each feature on the overall information content may instead be obtained from the information gain ratio, which introduces a penalty parameter that corrects the information gain and reduces the bias introduced by small samples with low entropy, i.e., information gain ratio = penalty parameter × information gain.
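The text does not specify the penalty parameter. A common choice, used by the C4.5 gain ratio, is the reciprocal of the split information H_A(D), i.e., the entropy of the feature's own value distribution; the sketch below assumes that choice and is illustrative only.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(records, feature, label="purchased"):
    """Penalty parameter x information gain, with penalty = 1 / H_A(D) (C4.5 choice, assumed)."""
    subsets = defaultdict(list)
    for r in records:
        subsets[r[feature]].append(r[label])
    n = len(records)
    gain = entropy([r[label] for r in records]) - sum(
        len(s) / n * entropy(s) for s in subsets.values())
    split_info = entropy([r[feature] for r in records])  # H_A(D)
    if split_info == 0:
        return 0.0  # feature takes a single value: it cannot divide the sample
    penalty = 1.0 / split_info
    return penalty * gain
```

With this penalty, a user-ID-like feature that splits four records into four singletons gets half the score of a binary feature that separates them equally well, because its split information is 2 bits instead of 1.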
Arrangement unit 12, configured to sort the features in descending order of their influence values.
In this embodiment, the larger the influence value, the greater the feature's influence on the whole, the stronger its predictive power, and the more important it is for dividing the samples and finding the target group. By sorting the features in descending order of influence value, this embodiment selects the first feature for dividing the pre-selection samples more intuitively and quickly.
Setting unit 13, configured to set the feature corresponding to the top-ranked influence value in the descending order as the first feature.
By directly selecting the feature corresponding to the top-ranked influence value in the descending order as the first feature, this embodiment determines the first feature accurately and divides the pre-selection samples accurately, ensuring the reliability of the target group eventually found.
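The descending arrangement and the choice of the first feature reduce to a sort on the influence values. In the sketch below the influence values are hypothetical inputs (for example, information gains), and `top_k=10` mirrors the top-10 selection mentioned earlier for the 100-dimension example.

```python
def rank_features(influence, top_k=10):
    """Sort features by influence value, largest first, and keep the top_k."""
    return sorted(influence, key=influence.get, reverse=True)[:top_k]

def pick_first_feature(influence):
    """The feature at the head of the descending arrangement."""
    return rank_features(influence, top_k=1)[0]
```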
Referring to Fig. 4, the device for finding a target group according to another embodiment of the present application comprises:
Fourth obtaining module 20, configured to obtain the attribute of the first feature.
The attribute of the first feature in this embodiment is one of two types: categorical or numeric. For example, gender is a categorical feature and age is a numeric feature.
Third division module 21, configured to determine, according to the attribute of the first feature, the first specified number of classification partitions of the first feature.
In this embodiment, first features with different attributes use different criteria and methods for dividing the pre-selection samples. A categorical feature can only divide samples by the categories it contains, so the number of categories determines the number of partitions. A numeric feature can first be discretized, as needed, into several contiguous data intervals, and the samples are then divided by these intervals, so the number of intervals determines the number of partitions.
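The attribute-dependent choice of partitions can be sketched as a small dispatch: a categorical value is its own partition key, while a numeric value is mapped to the index of its discrete interval. The attribute names `"categorical"`/`"numeric"` and the pre-computed, sorted cut points are assumptions of this sketch.

```python
from bisect import bisect_right
from collections import defaultdict

def partition_key(value, attribute, cuts=None):
    """Categorical: the category itself. Numeric: index of the interval given by sorted cut points."""
    if attribute == "categorical":
        return value
    if attribute == "numeric":
        return bisect_right(cuts, value)
    raise ValueError(f"unknown attribute: {attribute!r}")

def divide(samples, feature, attribute, cuts=None):
    """Group records by partition key; the number of groups is the number of partitions present."""
    groups = defaultdict(list)
    for record in samples:
        groups[partition_key(record[feature], attribute, cuts)].append(record)
    return dict(groups)
```

With cut points [20, 40, 60, 80], ages 25 and 30 share interval index 1 ([20, 40)) and age 70 falls into index 3 ([60, 80)).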
Referring to Fig. 5, when the attribute of the first feature in this embodiment is categorical, the third division module 21 comprises:
First division unit 210, configured to divide the pre-selection samples, according to the categories of the first feature, into a first specified number of first samples corresponding to the categories, where the first specified number is the number of categories of the first feature.
This embodiment uses a categorical feature to illustrate how the pre-selection samples are divided by a categorical feature. The number of categories the feature contains is determined first, and the pre-selection samples are divided into that many first samples. Other samples, such as first and second samples, are divided by a categorical feature following the same process and principle as the pre-selection samples.
Referring to Fig. 6, when the attribute of the first feature in another embodiment of the present application is numeric, the third division module 21 comprises:
Second division unit 211, configured to divide the pre-selection samples, according to the discrete intervals of the continuous data characterizing the first feature, into a first specified number of first samples corresponding to the discrete intervals, where the first specified number is the number of discrete intervals of the continuous data of the first feature.
This embodiment uses a numeric feature to illustrate how the pre-selection samples are divided by a numeric feature. The numeric feature is first discretized into several contiguous discrete intervals, and the pre-selection samples are then divided into first samples, one per interval. Other samples, such as first and second samples, are divided by a numeric feature following the same process and principle as the pre-selection samples.
This embodiment first obtains the value range of the numeric feature, i.e., its maximum and minimum values. Then, according to the input discretization parameter num, it calculates several quantiles. For example, with num = 5 and age as the numeric feature, whose value range is 0 to 100, the values at the 20%, 40%, 60% and 80% positions of the sorted continuous data are calculated. If these values are 20, 40, 60 and 80, the five interval ranges [0, 20), [20, 40), [40, 60), [60, 80) and [80, 100] are obtained in turn. The interval information then replaces the exact values in the original pre-selection samples, converting the numeric feature from a point-valued feature into a discrete interval feature, i.e., into the five discrete intervals corresponding to the five interval ranges above. For example, a user aged 25 falls into the discrete interval [20, 40). Discretization prevents outliers (abnormal values) from biasing the fit to the overall distribution. For example, if 99% of the values in the pre-selection samples lie between 0 and 100 but 1% of the values are 1000, the algorithm, because of the large numeric jump, would pay too much attention to the abnormal data during identification and introduce a large deviation into the fitting result. Moreover, discretized features are more interpretable: a numeric feature can take infinitely many values, so it is hard to tell where a particular value lies within the pre-selection samples, whereas after discretization it is easy to calculate statistics such as the share of the population falling in each discrete interval.
Referring to Fig. 7, the judgment module 6 of this embodiment comprises:
First obtaining unit 60, configured to compare the purchase rates of the second samples to obtain the designated second sample with the maximum purchase rate.
This embodiment aims to find a target group with a specified purchase rate. The second preset condition, which ends further division of the samples, is that the purchase rate of a small sample obtained by division reaches the purchase rate specified by the second preset condition. Division in this embodiment yields multiple small samples; their purchase rates are compared to obtain the small sample with the maximum purchase rate, and whether the target group has been found is judged by whether this maximum purchase rate reaches the purchase rate specified by the second preset condition.
First judging unit 61, configured to judge whether the maximum purchase rate of the designated second sample meets the purchase rate specified by the second preset condition.
In this embodiment, the first feature is selected from the pre-selection samples and the pre-selection samples are divided into first samples according to the first feature; the corresponding second feature is then selected for each first sample, and each first sample is further divided into second samples according to its second feature. Division continues in this loop until the purchase rate of one or several of the final small samples reaches the purchase rate specified by the second preset condition.
First determination unit 62, configured to, if it is met, determine that a target second sample meeting the second preset condition exists.
Referring to Fig. 8, the judgment module 6 of another embodiment of the present application comprises:
Second obtaining unit 63, configured to compare the purchase rates of the second samples to obtain the designated second sample with the maximum purchase rate.
In this embodiment, after each first sample has been divided into its second samples and a second sample meeting the purchase rate of the second preset condition has been found, it must be further analyzed whether the amount of data in that second sample has practical reference value. If the second sample contains little data, for example only a few or a few dozen records, its reference value is considered small.
Second judging unit 64, configured to judge whether the maximum purchase rate of the designated second sample meets the preset purchase rate.
To keep the computation of repeated divisions from becoming excessive, it is generally only necessary to divide the samples successively by 6, or at most 10, features to reach the purchase rate of the second preset condition and find the small sample corresponding to the target group.
Third judging unit 65, configured to, if it is met, judge whether the total amount of data in the designated second sample is greater than a preset quantity.
The target second sample of this embodiment must not only reach the expected purchase rate but also contain enough data, i.e., the number of users in the target group must reach the expected level. If a target group that reaches the expected purchase rate contains too few users, summarizing that group by its features loses practical value.
Second determination unit 66, configured to, if it is greater than the preset quantity, determine that a target second sample meeting the second preset condition exists.
Referring to Fig. 9, the device for finding a target group according to yet another embodiment of the present application comprises:
Summarizing module 8, configured to collect the first feature and the second feature used when finding the target group into a feature combination.
In this embodiment, the first feature and second feature used in the repeated division of the pre-selection samples form a feature combination that serves as the identity label of the small sample corresponding to the target group. In other embodiments of the present application, if no target group has been found after each first sample is divided into its second samples, each second sample is divided again into third samples, and so on, until the n-th sample corresponding to the target group is found; the first feature, second feature, ..., n-th feature used in the repeated division of the pre-selection samples then form the feature combination that serves as the identity label of the small sample corresponding to the target group.
Designating module 9, configured to designate the feature combination as the user portrait of the target group.
By forming a user portrait of the target group, this embodiment identifies the target group better and makes it easier to use the portrait to acquire new users with the same characteristics as customers.
This embodiment is used to find a user group with a specified purchase rate, and the pre-selection samples come from the database of a product purchase platform. In another embodiment of the present application, the pre-selection samples are feature data of cases of other conditions, such as diabetes; the above process and principle can then be used to find the specific population prone to a certain high-incidence disease, so that the incidence of the disease can be managed effectively.
In yet another embodiment of the present application, the pre-selection samples come from a feature database of borrowers; the above process and principle can then be used to find the specific population carrying lending risk, so that lending risk can be managed effectively.
Referring to Fig. 10, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all the data needed in the process of finding the target group. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements the method for finding a target group.
The method for finding a target group executed by the above processor comprises: obtaining a plurality of pre-selection samples, where each pre-selection sample includes user data corresponding to a plurality of features of a user; obtaining, among the features included in the pre-selection samples, the first feature that has the greatest influence on the information content of the pre-selection samples; dividing the pre-selection samples into a first specified number of first samples according to the first specified number of classification partitions of the first feature; screening, from the first samples, the target first samples meeting a first preset condition, where there are one or more target first samples; obtaining, among the features included in the target first sample, the second feature that has the greatest influence on the information content of the target first sample, the second feature being different from the first feature; dividing the target first sample into a second specified number of second samples according to the second specified number of classification partitions of the second feature; judging whether a target second sample meeting the preset condition exists among the second specified number of second samples; if it exists, stopping division of the target second sample and determining that the target second sample meeting the preset condition is the corresponding target group; otherwise, dividing the target second sample again.
The above computer device uses a decision-tree model to find the feature with the greatest influence (importance coefficient) on the target group, which speeds up finding the target group and improves accuracy. Based on the features with the greatest importance coefficients found, the present application progressively refines and divides the pre-selection samples so that the target group is located step by step, enabling effective use and control of the target group. By aggregating the features used to find the target group into a feature set, the present application forms a user portrait of the target group labeled by that feature set, which makes it easier to develop potential customers carrying the same feature-set label.
In one embodiment, the step in which the processor obtains, among the features included in the pre-selection samples, the first feature with the greatest influence on the information content of the pre-selection samples comprises: calculating the overall information content of the pre-selection samples; obtaining the influence value of each feature on the overall information content; sorting the features in descending order of their influence values; and setting the feature corresponding to the top-ranked influence value in the descending order as the first feature.
In one embodiment, before the step of dividing the pre-selection samples into a first specified number of first samples according to the first specified number of classification partitions of the first feature, the processor further performs: obtaining the attribute of the first feature; and determining, according to the attribute of the first feature, the first specified number of classification partitions of the first feature.
In one embodiment, the attribute of the first feature is categorical, and the step of determining the first specified number of classification partitions according to the attribute of the first feature comprises: dividing the pre-selection samples, according to the categories of the first feature, into a first specified number of first samples corresponding to the categories, where the first specified number is the number of categories of the first feature.
In one embodiment, the attribute of the first feature is numeric, and the step of determining the first specified number of classification partitions according to the attribute of the first feature comprises: dividing the pre-selection samples, according to the discrete intervals of the continuous data characterizing the first feature, into a first specified number of first samples corresponding to the discrete intervals, where the first specified number is the number of discrete intervals of the continuous data of the first feature.
In one embodiment, the step, performed by the processor, of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises: comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate; judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition; and, if it does, determining that a target second sample satisfying the second preset condition exists.
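This check can be sketched as follows. The sketch assumes the buying rate is the fraction of users in a second sample who purchased, and models "satisfies the buying rate of the second preset condition" as reaching a hypothetical threshold `min_rate`; both names are illustrative, not taken from the patent.

```python
def buying_rate(samples, label_key="purchased"):
    """Fraction of samples whose purchase label is truthy."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s[label_key]) / len(samples)


def find_target_sample(partitions, min_rate, label_key="purchased"):
    """Return (key, samples) for the partition with the highest buying rate,
    or None if that rate is below the hypothetical threshold min_rate."""
    if not partitions:
        return None
    key, best = max(partitions.items(), key=lambda kv: buying_rate(kv[1], label_key))
    if buying_rate(best, label_key) >= min_rate:
        return key, best
    return None
```

Only the highest-rate partition is tested, mirroring the embodiment: if it fails the threshold, no other partition can pass it.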
In one embodiment, the step, performed by the processor, of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises: comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate; judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition; if it does, judging whether the total data volume of the specified second sample is greater than a preset quantity; and, if it is, determining that a target second sample satisfying the second preset condition exists.
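The additional size check guards against declaring a tiny partition a target group just because a handful of users happened to buy. A minimal sketch, assuming "total data volume" means the number of user records and using hypothetical thresholds `min_rate` and `min_count`:

```python
def buying_rate(samples, label_key="purchased"):
    """Fraction of samples whose purchase label is truthy."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s[label_key]) / len(samples)


def find_target_sample_with_size(partitions, min_rate, min_count, label_key="purchased"):
    """Return (key, samples) for the highest-rate partition if its rate reaches
    min_rate AND it holds more than min_count records; otherwise None."""
    if not partitions:
        return None
    key, best = max(partitions.items(), key=lambda kv: buying_rate(kv[1], label_key))
    if buying_rate(best, label_key) < min_rate:
        return None
    if len(best) <= min_count:  # must be strictly greater than the preset quantity
        return None
    return key, best
```

For example, a single-user partition with a 100% buying rate passes the rate check but is rejected whenever `min_count` is 1 or more.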
Those skilled in the art will understand that the structure shown in Figure 10 is only a block diagram of the part of the structure relevant to the solution of this application, and does not limit the computer device to which the solution of this application is applied.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for finding a target group, comprising: obtaining multiple pre-selected samples, where each pre-selected sample includes user data corresponding to each of multiple features of a user; obtaining, from the multiple features included in the multiple pre-selected samples, the first feature that has the greatest influence on the information content of the multiple pre-selected samples; dividing the multiple pre-selected samples into a first specified number of first samples according to the classification partitions of the first specified number corresponding to the first feature; screening, from the first samples, target first samples that satisfy a first preset condition, where there are one or more target first samples; obtaining, from the multiple features included in the target first samples, the second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature; dividing the target first samples into a second specified number of second samples according to the classification partitions of the second specified number corresponding to the second feature; judging whether a target second sample satisfying the preset condition exists among the second specified number of second samples; if it exists, stopping division of the target second sample and determining that the target second sample satisfying the preset condition is the corresponding target group; otherwise, dividing the target second sample again.
With the above computer-readable storage medium, a decision-tree model is used to find, for the target group, the feature with the largest significance coefficient, which speeds up the search for the target group and improves its accuracy. By repeatedly refining and dividing the pre-selected samples according to the feature with the largest significance coefficient found at each step, this application gradually narrows down to the target group, enabling effective use and control of that group. The features used to find the target group are aggregated into a feature set, forming a user portrait of the target group labelled by that feature set, which helps identify potential customers who carry the same feature-set labels.
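The overall procedure (split, screen, split again, stop once a qualifying group appears, otherwise divide again) can be sketched as a recursive search. This is a simplified illustration under stated assumptions, not the patented implementation: information content is taken as label entropy and influence as information gain; all features are assumed categorical (numeric ones would be binned first); the first preset condition is modelled as a hypothetical minimum buying rate `screen_rate`, and the second as a hypothetical minimum rate `target_rate` plus a minimum size `min_count`. Unlike the claims, the sketch also applies the stopping check at the first level. The returned `path` collects the (feature, value) split at each level, i.e. the feature set that the description aggregates into a user portrait.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(samples, feature, label_key):
    """Entropy reduction from splitting samples on a categorical feature."""
    groups = {}
    for s in samples:
        groups.setdefault(s[feature], []).append(s[label_key])
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([s[label_key] for s in samples]) - remainder


def buying_rate(samples, label_key):
    """Fraction of samples with a truthy purchase label."""
    return sum(1 for s in samples if s[label_key]) / len(samples)


def find_target_group(samples, features, target_rate, min_count, screen_rate,
                      label_key="purchased", path=()):
    """Depth-first search for a target group.

    Returns (path, group), where path is the tuple of (feature, value) splits
    defining the group, or None if no group qualifies.
    """
    if not samples or not features:
        return None
    # Split on the unused feature with the largest information gain.
    best = max(features, key=lambda f: info_gain(samples, f, label_key))
    remaining = [f for f in features if f != best]
    parts = {}
    for s in samples:
        parts.setdefault(s[best], []).append(s)
    # Test only the highest-buying-rate partition against the target condition.
    value, top = max(parts.items(), key=lambda kv: buying_rate(kv[1], label_key))
    if buying_rate(top, label_key) >= target_rate and len(top) > min_count:
        return path + ((best, value),), top  # stop dividing: target group found
    # Otherwise keep partitions that pass the screening threshold and divide them again.
    for value, part in parts.items():
        if buying_rate(part, label_key) >= screen_rate:
            result = find_target_group(part, remaining, target_rate, min_count,
                                       screen_rate, label_key, path + ((best, value),))
            if result is not None:
                return result
    return None
```

On a toy set of eight users where young women always buy, the search first splits on `age_band` (the larger information gain), finds the young partition promising but not yet pure enough, and then splits it on `gender`, returning the path `(("age_band", "young"), ("gender", "F"))` as the portrait of the target group.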
In one embodiment, the step, performed by the processor, of obtaining, from the multiple features included in the multiple pre-selected samples, the first feature that has the greatest influence on the information content of the multiple pre-selected samples comprises: calculating the overall information content of the pre-selected samples; obtaining the influence value of each feature on the overall information content; sorting the features in descending order of influence value; and taking the feature whose influence value ranks first in the descending order as the first feature.
In one embodiment, before the step, performed by the processor, of dividing the multiple pre-selected samples into a first specified number of first samples according to the classification partitions of the first specified number corresponding to the first feature, the method comprises: obtaining the attribute of the first feature; and determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature.
In one embodiment, the attribute of the first feature is a categorical type, and the step of determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature comprises: dividing the multiple pre-selected samples, according to the categories of the first feature, into a first specified number of first samples corresponding to those categories, where the first specified number is the number of categories of the first feature.
In one embodiment, the attribute of the first feature is a numeric type, and the step of determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature comprises: dividing the pre-selected samples, according to the discrete intervals into which the continuous data characterizing the first feature are divided, into a first specified number of first samples corresponding to those intervals, where the first specified number is the number of discrete intervals for the continuous data of the first feature.
In one embodiment, the step, performed by the processor, of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises: comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate; judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition; and, if it does, determining that a target second sample satisfying the second preset condition exists.
In one embodiment, the step, performed by the processor, of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises: comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate; judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition; if it does, judging whether the total data volume of the specified second sample is greater than a preset quantity; and, if it is, determining that a target second sample satisfying the second preset condition exists.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes that element.
The above are merely preferred embodiments of this application and do not limit the scope of its patent protection. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (10)

1. A method for finding a target group, characterized by comprising:
Obtaining multiple pre-selected samples, wherein each pre-selected sample includes user data corresponding to each of multiple features of a user;
Obtaining, from the multiple features included in the multiple pre-selected samples, a first feature that has the greatest influence on the information content of the multiple pre-selected samples;
Dividing the multiple pre-selected samples into a first specified number of first samples according to classification partitions of the first specified number corresponding to the first feature;
Screening, from the first samples, target first samples that satisfy a first preset condition, wherein there are one or more target first samples;
Obtaining, from the multiple features included in the target first samples, a second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature;
Dividing the target first samples into a second specified number of second samples according to classification partitions of the second specified number corresponding to the second feature;
Judging whether a target second sample satisfying a second preset condition exists among the second specified number of second samples;
If it exists, stopping division of the target second sample and determining that the target second sample satisfying the second preset condition is the corresponding target group.
2. The method for finding a target group according to claim 1, characterized in that the step of obtaining, from the multiple features included in the multiple pre-selected samples, the first feature that has the greatest influence on the information content of the multiple pre-selected samples comprises:
Calculating the overall information content of the pre-selected samples;
Obtaining the influence value of each feature on the overall information content;
Sorting the features in descending order of influence value;
Taking the feature whose influence value ranks first in the descending order as the first feature.
3. The method for finding a target group according to claim 1, characterized in that, before the step of dividing the multiple pre-selected samples into a first specified number of first samples according to the classification partitions of the first specified number corresponding to the first feature, the method further comprises:
Obtaining the attribute of the first feature;
Determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature.
4. The method for finding a target group according to claim 3, characterized in that the attribute of the first feature is a categorical type, and the step of determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature comprises:
Dividing the multiple pre-selected samples, according to the categories of the first feature, into a first specified number of first samples corresponding to those categories, wherein the first specified number is the number of categories of the first feature.
5. The method for finding a target group according to claim 3, characterized in that the attribute of the first feature is a numeric type, and the step of determining, according to the attribute of the first feature, the classification partitions of the first specified number corresponding to the first feature comprises:
Dividing the pre-selected samples, according to the discrete intervals into which the continuous data characterizing the first feature are divided, into a first specified number of first samples corresponding to those intervals, wherein the first specified number is the number of discrete intervals for the continuous data of the first feature.
6. The method for finding a target group according to claim 1, characterized in that the step of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises:
Comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate;
Judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition;
If it does, determining that a target second sample satisfying the second preset condition exists.
7. The method for finding a target group according to claim 1, characterized in that the step of judging whether a target second sample satisfying the second preset condition exists among the second specified number of second samples comprises:
Comparing the buying rates of the second samples to obtain the specified second sample with the highest buying rate;
Judging whether the highest buying rate of the specified second sample satisfies the buying rate required by the second preset condition;
If it does, judging whether the total data volume of the specified second sample is greater than a preset quantity;
If it is greater than the preset quantity, determining that a target second sample satisfying the second preset condition exists.
8. A device for finding a target group, characterized by comprising:
A first obtaining module, configured to obtain multiple pre-selected samples, wherein each pre-selected sample includes user data corresponding to each of multiple features of a user;
A second obtaining module, configured to obtain, from the multiple features included in the multiple pre-selected samples, a first feature that has the greatest influence on the information content of the multiple pre-selected samples;
A first division module, configured to divide the multiple pre-selected samples into a first specified number of first samples according to classification partitions of the first specified number corresponding to the first feature;
A screening module, configured to screen, from the first samples, target first samples that satisfy a first preset condition, wherein there are one or more target first samples;
A third obtaining module, configured to obtain, from the multiple features included in the target first samples, a second feature that has the greatest influence on the information content of the target first samples, the second feature being different from the first feature;
A second division module, configured to divide the target first samples into a second specified number of second samples according to classification partitions of the second specified number corresponding to the second feature;
A judgment module, configured to judge whether a target second sample satisfying a second preset condition exists among the second specified number of second samples;
A determination module, configured to, if such a target second sample exists, stop dividing the target second sample and determine that the target second sample satisfying the second preset condition is the corresponding target group.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201810771080.9A 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group Active CN109101562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810771080.9A CN109101562B (en) 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group

Publications (2)

Publication Number Publication Date
CN109101562A true CN109101562A (en) 2018-12-28
CN109101562B CN109101562B (en) 2023-07-21

Family

ID=64846410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810771080.9A Active CN109101562B (en) 2018-07-13 2018-07-13 Method, device, computer equipment and storage medium for searching target group

Country Status (1)

Country Link
CN (1) CN109101562B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061233A1 (en) * 2001-09-21 2003-03-27 Manasse Mark S. System and method for determining likely identity in a biometric database
CN105718490A (en) * 2014-12-04 2016-06-29 阿里巴巴集团控股有限公司 Method and device for updating classifying model
CN105868847A (en) * 2016-03-24 2016-08-17 车智互联(北京)科技有限公司 Shopping behavior prediction method and device
CN105956122A (en) * 2016-05-03 2016-09-21 无锡雅座在线科技发展有限公司 Object attribute determining method and device
CN106227743A (en) * 2016-07-12 2016-12-14 精硕世纪科技(北京)有限公司 Advertisement target group touches and reaches ratio estimation method and device
CN107785058A (en) * 2017-07-24 2018-03-09 平安科技(深圳)有限公司 Anti- fraud recognition methods, storage medium and the server for carrying safety brain
CN107818482A (en) * 2017-11-22 2018-03-20 用友金融信息技术股份有限公司 Computational methods, system and the computer equipment of the notable feature of target group
CN107944481A (en) * 2017-11-16 2018-04-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108153824A (en) * 2017-12-06 2018-06-12 阿里巴巴集团控股有限公司 The determining method and device of targeted user population

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992699A (en) * 2019-02-28 2019-07-09 平安科技(深圳)有限公司 Optimization method and device, storage medium, the computer equipment of user group
CN109992699B (en) * 2019-02-28 2023-08-11 平安科技(深圳)有限公司 User group optimization method and device, storage medium and computer equipment
CN110009012A (en) * 2019-03-20 2019-07-12 阿里巴巴集团控股有限公司 A kind of risk specimen discerning method, apparatus and electronic equipment
WO2020263440A1 (en) * 2019-06-28 2020-12-30 Microsoft Technology Licensing, Llc Data-driven cross feature generation

Also Published As

Publication number Publication date
CN109101562B (en) 2023-07-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant