CN110135916A - A kind of similar crowd recognition method and system - Google Patents

Publication number
CN110135916A
Authority
CN
China
Prior art keywords
user group
attribute
similarity
seed
potential target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910433863.0A
Other languages
Chinese (zh)
Inventor
刘亚红
郝冬林
王新胜
李适季
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Best Network To Help Gang Information Technology Co Ltd
Original Assignee
Beijing Best Network To Help Gang Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0241: Advertisements
    • G06Q30/0251: Targeted advertisements
    • G06Q30/0277: Online advertisement

Abstract

The invention discloses a similar crowd recognition method and system. The method comprises: based on the lateral attributes of a seed user group, determining from a user group to be identified a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition; and, based on the longitudinal attributes of the potential target user group, determining from the potential target user group a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition. Using the feature attribute information of the seed crowd, the invention can effectively mine potential target customers, save advertising cost, improve advertising effectiveness and increase the effective audience.

Description

Similar crowd recognition method and system
Technical field
The present invention relates to the technical field of data processing, and in particular to a similar crowd recognition method and system.
Background
With the rapid development of Internet technology, Internet advertising has become a mainstream advertising medium welcomed by both advertisers and users. At present, the forms and delivery channels of Internet advertising are varied. Common advertising mechanisms generally apply data extraction rules, formulated from advertisement keywords, users' browsing history and similar information, to match and filter the collected data, and finally deliver the advertisements that suit a user to that user. However, this mechanism clearly suffers from lag in both data extraction and advertisement delivery, so the data it relies on is poorly timed.
Therefore, to solve the above problems and to improve the display rate of delivered advertisements, the precision of advertisement delivery and the user experience, an advertisement delivery scheme with higher precision and better suitability is urgently needed.
Summary of the invention
In view of this, the present invention provides a similar crowd recognition method that can effectively mine potential target customers from the feature attribute information of the seed crowd, saving advertising cost, improving advertising effectiveness and increasing the effective audience.
The present invention provides a similar crowd recognition method, comprising:
based on the lateral attributes of a seed user group, determining from a user group to be identified a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
based on the longitudinal attributes of the potential target user group, determining from the potential target user group a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
Preferably, determining, based on the longitudinal attributes of the potential target user group, the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition comprises:
based on the longitudinal attribute information of the potential target user group, calculating the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using a cosine similarity function, and taking the group in the potential target user group whose longitudinal-attribute similarity meets the second condition as the target user group.
Preferably, determining, based on the lateral attributes of the seed user group, the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition comprises:
based on the lateral attributes of the seed user group, calculating the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and taking the group in the user group to be identified whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, calculating the lateral-attribute similarity using the firefly algorithm and determining the potential target user group comprises:
initializing the algorithm parameters;
calculating class center values from the lateral attributes of the seed user group;
calculating the initial brightness of each firefly based on the class center values;
during algorithm iteration, updating the position and brightness of each firefly, and taking the group in the user group to be identified whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, initializing the algorithm parameters comprises:
setting the optimization population size, the initial attractiveness, the light absorption coefficient, the step factor and the maximum number of iterations according to the sizes of the seed user group and the user group to be identified.
A similar crowd recognition system, comprising:
a first determining module, configured to determine from a user group to be identified, based on the lateral attributes of a seed user group, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
a second determining module, configured to determine from the potential target user group, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
Preferably, when determining the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition, the second determining module is specifically configured to:
based on the longitudinal attribute information of the potential target user group, calculate the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using a cosine similarity function, and take the group in the potential target user group whose longitudinal-attribute similarity meets the second condition as the target user group.
Preferably, when determining the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition, the first determining module is specifically configured to:
based on the lateral attributes of the seed user group, calculate the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and take the group in the user group to be identified whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, when calculating the lateral-attribute similarity using the firefly algorithm and determining the potential target user group, the first determining module is specifically configured to:
initialize the algorithm parameters;
calculate class center values from the lateral attributes of the seed user group;
calculate the initial brightness of each firefly based on the class center values;
during algorithm iteration, update the position and brightness of each firefly, and take the group in the user group to be identified whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, when initializing the algorithm parameters, the first determining module is specifically configured to:
set the optimization population size, the initial attractiveness, the light absorption coefficient, the step factor and the maximum number of iterations according to the sizes of the seed user group and the user group to be identified.
In summary, the invention discloses a similar crowd recognition method. When the crowd similar to a seed user group needs to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition is first determined from the user group to be identified, based on the lateral attributes of the seed user group. Then, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition is determined from the potential target user group. Using the feature attribute information of the seed crowd, the invention can effectively mine potential target customers, save advertising cost, improve advertising effectiveness and increase the effective audience.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of Embodiment 1 of the similar crowd recognition method disclosed by the invention;
Fig. 2 is a flowchart of Embodiment 2 of the similar crowd recognition method disclosed by the invention;
Fig. 3 is a structural diagram of Embodiment 1 of the similar crowd recognition system disclosed by the invention;
Fig. 4 is a structural diagram of Embodiment 2 of the similar crowd recognition system disclosed by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of Embodiment 1 of the similar crowd recognition method disclosed by the invention. The method may comprise the following steps:
S101: based on the lateral attributes of the seed user group, determine from the user group to be identified a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
When a crowd similar to the seed user group needs to be identified from the user group to be identified, the user feature attributes are divided into lateral attributes and longitudinal attributes so that they can be used effectively. Lateral attributes are basic demographic attributes such as gender, age, personal income and education. Longitudinal attributes are interest preferences reflecting what a user follows day to day, such as entertainment, finance, sports and current affairs.
Then the Euclidean distance between the lateral attribute features of users in the seed user group and in the user group to be identified is calculated to express their degree of similarity. The specific calculation formula is as follows:
$$d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \tag{1}$$
where $X = (x_1, \dots, x_m)$ and $Y = (y_1, \dots, y_m)$ are the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, m is the number of lateral attributes, and $x_i$, $y_i$ are the individual attribute elements of the vectors, such as gender, age and education.
According to the calculated lateral-attribute similarity, the potential target user group whose lateral-attribute similarity meets the first condition is determined from the user group to be identified.
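Step S101 can be illustrated with a minimal Python sketch. It assumes that lateral attributes have already been encoded as numeric vectors of equal length and that the "first condition" is a distance threshold; the function names, the threshold `max_distance` and the sample values are illustrative assumptions, not taken from the patent.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric attribute vectors."""
    if len(x) != len(y):
        raise ValueError("attribute vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def select_potential_targets(seed_vectors, candidates, max_distance):
    """Keep candidates whose nearest seed user lies within max_distance.

    Assumption: the 'first condition' is modelled as a distance threshold.
    Returns a dict of user id -> distance to the nearest seed user.
    """
    selected = {}
    for user_id, vec in candidates.items():
        nearest = min(euclidean_distance(vec, s) for s in seed_vectors)
        if nearest <= max_distance:
            selected[user_id] = nearest
    return selected

# Hypothetical encoding: [gender, scaled age, scaled education level]
seed_vectors = [[1.0, 0.30, 0.5], [0.0, 0.35, 0.75]]
candidates = {"u1": [1.0, 0.32, 0.5], "u2": [0.0, 0.90, 0.0]}
potential = select_potential_targets(seed_vectors, candidates, max_distance=0.2)
```

In practice the attributes would need to be scaled to comparable ranges first, since an unscaled attribute such as income would otherwise dominate the distance.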
S102: based on the longitudinal attributes of the potential target user group, determine from the potential target user group a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
In feature similarity measurement, a similarity function can effectively calculate the difference between two vectors across the dimensions of the space. After the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition has been determined from the user group to be identified, the cosine similarity function is further used to measure the longitudinal-attribute similarity between users of the seed user group and users of the potential target user group. The specific calculation formula is as follows:
$$\cos(X, Y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}} \tag{2}$$
The larger the function value, the smaller the angle between the two vectors and the higher the feature similarity they represent.
Here X and Y are the longitudinal attribute vectors of a user in the seed user group and a user in the potential target user group, and $x_i$, $y_i$ are the individual attribute elements of the vectors, such as the interests a user follows day to day: entertainment, finance and sports.
Then, according to the calculated longitudinal-attribute similarity, the target user group whose longitudinal-attribute similarity meets the second condition is determined from the potential target user group. The target user group so determined is the crowd similar to the seed user group.
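The cosine measure used in step S102 can be sketched in a few lines of Python. The interest vectors below are hypothetical scores for entertainment, finance and sports; returning 0.0 for an all-zero vector is a design choice of this sketch, not something the patent specifies.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: a user with no recorded interests matches nobody
    return dot / (norm_x * norm_y)

seed_interest = [0.8, 0.1, 0.6]     # hypothetical seed user
same_taste = [0.4, 0.05, 0.3]       # same direction, lower overall activity
other_taste = [0.0, 1.0, 0.0]       # follows finance only

sim_same = cosine_similarity(seed_interest, same_taste)
sim_other = cosine_similarity(seed_interest, other_taste)
```

Because the measure depends only on direction, a lightly active user with the same interest mix as a seed user still scores close to 1, which is why it suits interest preferences better than a raw distance.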
In conclusion in the above-described embodiments, when needing the similar crowd to seed user group to identify, first Lateral attribute based on seed user group is determined to belong to the transverse direction of the seed user group from user group to be identified The similarity of property meets the potential target user group of first condition, is then based on longitudinal attribute of potential target user group, Determine that the similarity with longitudinal attribute of seed user group meets the target of second condition from potential target user group User group.The present invention can effectively excavate potential target customer crowd according to the characteristic attribute information of seed crowd, section Advertising cost has been saved, advertising results are improved, has increased effective audient.
Fig. 2 is a flowchart of Embodiment 2 of the similar crowd recognition method disclosed by the invention. The method may comprise the following steps:
S201: based on the lateral attributes of the seed user group, calculate the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and take the group in the user group to be identified whose lateral-attribute similarity meets the first condition as the potential target user group;
The firefly algorithm is a swarm-intelligence random search algorithm that performs optimization by simulating the mutual attraction between individual fireflies in nature. Its mechanism is simple, it is easy to implement, and it is widely applicable. The core idea is that a firefly is attracted by fireflies of greater absolute brightness and updates its own position by formula: if the absolute brightness of firefly i is greater than that of firefly j, firefly i attracts firefly j to move toward it. The absolute brightness is defined through a fitness function designed for the problem at hand, so that the absolute brightness of a firefly is proportional to the value of the problem's fitness function.
The algorithm makes the following assumption: the attraction between fireflies depends only on their brightness and their mutual distance and is not affected by other factors. Attractiveness is proportional to brightness, so a dimmer firefly is attracted by a brighter one; attractiveness decreases with distance, so the farther apart two fireflies are, the lower the attractiveness.
Assume the absolute brightness of firefly i is greater than that of firefly j, so that firefly i attracts firefly j to move toward i. The attractiveness of firefly i to firefly j is β_ij, calculated as follows:
$$\beta_{ij} = \beta_0 \, e^{-\gamma r_{ij}^2} \tag{3}$$
where β0 is the initial maximum attractiveness, usually set to 1; γ is the light absorption coefficient, γ ∈ [0.01, 100]; and r_ij is the distance from firefly i to firefly j. In the present invention, r_ij represents the degree of difference between the lateral attributes of the seed-group user and the to-be-identified user that fireflies i and j respectively represent, given by formula (4):
$$r_{ij} = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2} \tag{4}$$
where $X = (x_1, \dots, x_m)$ and $Y = (y_1, \dots, y_m)$ are the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, m is the number of lateral attributes, and $x_k$, $y_k$ are the individual attribute elements of the vectors, such as gender, age and education.
Firefly j moves toward firefly i under its attraction. The position update formula for firefly j is as follows:
$$x_j(t+1) = x_j(t) + \beta_{ij}\,\bigl(x_i(t) - x_j(t)\bigr) + \alpha\,\varepsilon_j \tag{5}$$
where t is the iteration number; β_ij is the attractiveness of firefly i to firefly j; α is the step factor, an arbitrary number in the interval [0, 1]; and ε_j is a uniformly distributed random number.
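The attraction and position update described above can be sketched as follows. This is a minimal illustration that assumes the standard firefly-algorithm attractiveness β0·exp(−γ·r²) and assumes that each component of ε_j is drawn uniformly from [−0.5, 0.5]; the patent only states that ε_j is uniformly distributed, so that range and the function names are assumptions.

```python
import math
import random

def attractiveness(beta0, gamma, r):
    """Attractiveness at distance r: largest at r = 0 and decaying with distance."""
    return beta0 * math.exp(-gamma * r * r)

def move_towards(x_j, x_i, beta0, gamma, alpha, rng):
    """Return the new position of firefly j after being attracted by brighter firefly i."""
    r_ij = math.dist(x_i, x_j)                 # Euclidean distance between positions
    beta_ij = attractiveness(beta0, gamma, r_ij)
    return [
        xj + beta_ij * (xi - xj) + alpha * rng.uniform(-0.5, 0.5)  # assumed noise range
        for xi, xj in zip(x_i, x_j)
    ]

rng = random.Random(0)
new_position = move_towards([0.2, 0.4], [0.6, 0.5], beta0=1.0, gamma=1.0, alpha=0.1, rng=rng)
```

With γ = 0 and α = 0 the attractiveness equals β0 at every distance and the noise term vanishes, so a firefly with β0 = 1 jumps exactly onto the brighter one; larger γ makes distant fireflies nearly invisible to each other.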
Specifically, the above firefly algorithm comprises the following steps:
Step (1): Initialize the algorithm parameters. Set the optimization population size N according to the sizes of the seed user group set S and the user group to be identified, and set the initial attractiveness β0, the light absorption coefficient γ, the step factor α and the maximum number of iterations MaxGen.
Step (2): Read in the lateral attribute information of each user in the seed user group, assign each sample to the corresponding attribute class according to the available information as the initial class division, and calculate the class center value cen_k from the attribute values of the samples in the data.
Step (3): During algorithm iteration, the sum of squared distances between the lateral attributes of each user in the user group to be identified and the class center of the seed user group is used as the criterion for judging how similar that user is; this is the fitness function of the iterative process, in which n denotes the number of users in the user group to be identified and m denotes the number of lateral user attributes considered in the similar-crowd division.
Step (4): The fitness value calculated in step (3) is the firefly's brightness. The larger the fitness value, the brighter the firefly, the closer its lateral attributes are to those of the seed user group, and the more strongly it attracts other fireflies to move toward it.
Step (5): During algorithm iteration, iterative optimization is performed in turn according to the firefly attractiveness and position update formulas (3), (4) and (5), and the potential target user group T1 that best matches the lateral attributes of the seed user group is finally mined from the user group to be identified.
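Steps (1) to (5) can be put together in a compact Python sketch. Several points are assumptions rather than the patent's definitions: a single class center is computed as the mean of the seed vectors, brightness is taken as 1 / (1 + squared distance to the center) so that a larger value means a closer match as step (4) requires, and the noise term is uniform on [−0.5, 0.5]. How the converged swarm is mapped back to the user set T1 is not spelled out in the text and is left out here.

```python
import math
import random

def class_center(seed_vectors):
    """Step (2), simplified: one class center as the per-attribute mean of the seed users."""
    count = len(seed_vectors)
    return [sum(v[k] for v in seed_vectors) / count for k in range(len(seed_vectors[0]))]

def brightness(position, center):
    """Steps (3)-(4), assumed form: decreasing in squared distance to the class center."""
    squared = sum((p - c) ** 2 for p, c in zip(position, center))
    return 1.0 / (1.0 + squared)

def firefly_search(seed_vectors, start_positions, beta0=1.0, gamma=1.0,
                   alpha=0.05, max_gen=30, seed=0):
    """Steps (1) and (5): move dimmer fireflies toward brighter ones for max_gen rounds."""
    rng = random.Random(seed)
    center = class_center(seed_vectors)
    swarm = [list(p) for p in start_positions]
    for _ in range(max_gen):
        light = [brightness(p, center) for p in swarm]
        for j in range(len(swarm)):
            for i in range(len(swarm)):
                if light[i] > light[j]:            # only a brighter firefly attracts
                    r = math.dist(swarm[i], swarm[j])
                    beta = beta0 * math.exp(-gamma * r * r)
                    swarm[j] = [xj + beta * (xi - xj) + alpha * rng.uniform(-0.5, 0.5)
                                for xi, xj in zip(swarm[i], swarm[j])]
                    light[j] = brightness(swarm[j], center)
    return center, swarm

seeds = [[0.0, 0.0], [2.0, 4.0]]
starts = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
center, swarm = firefly_search(seeds, starts, alpha=0.0)
```

Since a firefly moves only when another is strictly brighter, the brightest firefly stays put, so with α = 0 the best brightness in the swarm never decreases from one round to the next.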
S202: based on the longitudinal attribute information of the potential target user group, calculate the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using a cosine similarity function, and take the group in the potential target user group whose longitudinal-attribute similarity meets the second condition as the target user group.
The first stage filters out from the user group to be identified the potential target user group T1 whose lateral attributes match those of the seed user group well. The second stage uses the longitudinal attribute information of the seed user group to further classify and match the users in T1, finally obtaining the target user group T2 that is most similar to the seed user group.
Specifically, in the second stage the present invention uses the cosine similarity function to measure the longitudinal-attribute similarity between the seed user group set S and the users in the potential target user group T1. The calculation follows formula (2): the larger the function value, the smaller the angle between the two vectors and the higher the feature similarity they represent.
The calculated cosine values are then sorted, and a subset of users is selected from the potential target user group T1 according to a preset ratio to form the final target user group T2, which is the target user group similar to the seed user group.
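The ranking and ratio-based cut of the second stage can be sketched as below. It assumes that the seed group's longitudinal attributes are summarised as a single profile vector (for example a mean over seed users), which the patent does not specify; `select_top_ratio`, `ratio` and the sample data are illustrative names and values.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms if norms else 0.0

def select_top_ratio(seed_profile, potential_users, ratio):
    """Sort users of T1 by cosine similarity (descending) and keep the top share."""
    ranked = sorted(
        potential_users,
        key=lambda uid: cosine_similarity(seed_profile, potential_users[uid]),
        reverse=True,
    )
    keep = math.ceil(len(ranked) * ratio)   # preset ratio, rounded up
    return ranked[:keep]

seed_profile = [1.0, 0.0]                    # assumed aggregate of seed interests
t1 = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
t2 = select_top_ratio(seed_profile, t1, ratio=0.5)
```

A ratio-based cut keeps the size of T2 predictable for campaign budgeting, whereas a fixed similarity threshold would let the audience size vary with the data.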
In summary, the present invention divides the user attributes involved into two broad classes, lateral attributes and longitudinal attributes, according to their category and content, and proceeds in two stages. In the first stage, an iterative firefly algorithm is used: based on the lateral attributes of the seed user group, and following the designed fitness function and the firefly update mechanism, the user group to be identified is given a preliminary filtering, yielding a potential target user group that matches the seed user group well. In the second stage, building on the result of the first stage and using the longitudinal attributes of the seed user group, the cosine similarity function serves as the measure of similarity between users of the seed user group and users of the potential target user group; according to the calculated function values, a subset of users is selected from the potential target user group by a preset ratio to form the final group of potential users. Starting from the feature attribute information of the seed user group, the invention takes full account of the differences between attribute categories and mines and screens the user group to be identified stage by stage, finally obtaining a target user group that matches the seed user group satisfactorily. This effectively raises the conversion rate of Internet advertising, ensuring delivery accuracy without harming the user experience.
Fig. 3 is a structural diagram of Embodiment 1 of the similar crowd recognition system disclosed by the invention. The system may comprise:
a first determining module 301, configured to determine from the user group to be identified, based on the lateral attributes of the seed user group, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
When a crowd similar to the seed user group needs to be identified from the user group to be identified, the user feature attributes are divided into lateral attributes and longitudinal attributes so that they can be used effectively. Lateral attributes are basic demographic attributes such as gender, age, personal income and education. Longitudinal attributes are interest preferences reflecting what a user follows day to day, such as entertainment, finance, sports and current affairs.
Then the Euclidean distance between the lateral attribute features of users in the seed user group and in the user group to be identified is calculated to express their degree of similarity. The specific calculation formula is as follows:
$$d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where $X = (x_1, \dots, x_m)$ and $Y = (y_1, \dots, y_m)$ are the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, m is the number of lateral attributes, and $x_i$, $y_i$ are the individual attribute elements of the vectors, such as gender, age and education.
According to the calculated lateral-attribute similarity, the potential target user group whose lateral-attribute similarity meets the first condition is determined from the user group to be identified.
a second determining module 302, configured to determine from the potential target user group, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
In feature similarity measurement, a similarity function can effectively calculate the difference between two vectors across the dimensions of the space. After the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition has been determined from the user group to be identified, the cosine similarity function is further used to measure the longitudinal-attribute similarity between users of the seed user group and users of the potential target user group. The specific calculation formula is as follows:
$$\cos(X, Y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$$
The larger the function value, the smaller the angle between the two vectors and the higher the feature similarity they represent.
Here X and Y are the longitudinal attribute vectors of a user in the seed user group and a user in the potential target user group, and $x_i$, $y_i$ are the individual attribute elements of the vectors, such as the interests a user follows day to day: entertainment, finance and sports.
Then, according to the calculated longitudinal-attribute similarity, the target user group whose longitudinal-attribute similarity meets the second condition is determined from the potential target user group. The target user group so determined is the crowd similar to the seed user group.
In conclusion in the above-described embodiments, when needing the similar crowd to seed user group to identify, first Lateral attribute based on seed user group is determined to belong to the transverse direction of the seed user group from user group to be identified The similarity of property meets the potential target user group of first condition, is then based on longitudinal attribute of potential target user group, Determine that the similarity with longitudinal attribute of seed user group meets the target of second condition from potential target user group User group.The present invention can effectively excavate potential target customer crowd according to the characteristic attribute information of seed crowd, section Advertising cost has been saved, advertising results are improved, has increased effective audient.
As shown in figure 4, be a kind of structural schematic diagram of similar crowd recognition system embodiment 2 disclosed by the invention, it is described System may include:
First determining module 401 calculates seed using glowworm swarm algorithm for the lateral attribute based on seed user group The similarity of the lateral attribute of user group and user group to be identified, determines lateral attribute from user group to be identified Similarity meets the group of first condition as potential target user group;
Glowworm swarm algorithm be it is a kind of by simulation nature firefly individual between the mechanism of attracting each other reach optimizing The colony intelligence random search algorithm of purpose, algorithm mechanism is simple, is easily achieved, versatile.Algorithm core concept is firefly It can be attracted by the bigger firefly of absolute brightness, and carry out self-position update using formula, i.e., if firefly i's is absolute bright Degree is greater than the absolute brightness of firefly j, then firefly i attracts firefly j to move to it;Specific absolute brightness then combines accordingly The problem of design fitness function, i.e. the absolute brightness of firefly is directly proportional to problem fitness function value.
The algorithm realize during assumed as follows: the mutual Attraction Degree of firefly only with its own brightness and Mutual distance is related, is not influenced by other factors.Wherein, firefly Attraction Degree is directly proportional to its brightness, shines weak Firefly by than its luminance firefly attract;Firefly Attraction Degree is inversely proportional with its distance, and distance is remoter, and Attraction Degree is got over It is low.
Assume that the absolute brightness of firefly i is greater than that of firefly j, so that firefly i attracts firefly j to move toward i. The attraction of firefly i on firefly j is β_ij, calculated as follows:
Here, β_0 is the initial maximum attraction, usually set to 1; γ is the light absorption coefficient, γ ∈ [0.01, 100]; and r_ij is the distance from firefly i to firefly j. In the present invention, r_ij denotes the degree of difference between the lateral attributes of the seed user group and of the user group to be identified that fireflies i and j respectively represent, expressed by formula (4) as follows:
Here, the two vectors are the lateral attribute vectors of different users in the seed user group and in the user group to be identified, and x_i and y_j denote specific attribute elements of those vectors, such as gender, age, and education.
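The attraction and distance calculations can be illustrated with a minimal sketch. It assumes that the attraction takes the standard firefly-algorithm form β_ij = β_0 · exp(−γ · r_ij²) and that formula (4) is the Euclidean distance between two numerically encoded lateral attribute vectors; both forms are assumptions, since the formulas themselves are not shown in this text. The function names and the sample encoding of gender, age, and education are hypothetical.

```python
import math


def attribute_distance(x, y):
    """Euclidean distance between two equal-length attribute vectors (assumed form of r_ij)."""
    if len(x) != len(y):
        raise ValueError("attribute vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def attraction(r_ij, beta0=1.0, gamma=1.0):
    """Assumed attraction of firefly i on firefly j: beta0 * exp(-gamma * r_ij^2)."""
    return beta0 * math.exp(-gamma * r_ij ** 2)


# Hypothetical encoded lateral attributes: gender, scaled age, scaled education.
seed_user = [1.0, 0.30, 0.50]
candidate_user = [1.0, 0.35, 0.50]

r = attribute_distance(seed_user, candidate_user)
beta = attraction(r)  # close to beta0 because the two users are nearly identical
```

With β_0 = 1, the attraction equals 1 for identical attribute vectors and falls toward 0 as the attribute difference grows, which matches the stated assumption that attraction weakens with distance.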
Under the attraction of firefly i, firefly j moves toward it. The position of firefly j is updated as follows:
$$x_j(t+1) = x_j(t) + \beta_{ij}\,\bigl(x_i(t) - x_j(t)\bigr) + \alpha\,\varepsilon_j \tag{5}$$
Here, t is the iteration number; β_ij is the attraction of firefly i on firefly j; α is the step factor, an arbitrary number in the interval [0, 1]; and ε_j is a uniformly distributed random number.
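A minimal sketch of the position update in formula (5), applied component by component to a position vector. The text only says that ε_j is uniformly distributed; drawing it from [−0.5, 0.5] is an assumption made here, and the function name `move_firefly` and its `rng` parameter are hypothetical.

```python
import random


def move_firefly(x_j, x_i, beta_ij, alpha=0.2, rng=random):
    """Move firefly j toward firefly i following formula (5).

    Each component gets its own uniform random perturbation scaled by alpha.
    The range [-0.5, 0.5] for the random term is an assumption.
    """
    return [
        xj + beta_ij * (xi - xj) + alpha * rng.uniform(-0.5, 0.5)
        for xj, xi in zip(x_j, x_i)
    ]


# Hypothetical positions: firefly i is brighter, so firefly j moves toward it.
x_i = [1.0, 0.30, 0.50]
x_j = [0.0, 0.60, 0.25]
new_x_j = move_firefly(x_j, x_i, beta_ij=0.5, alpha=0.0)  # alpha=0 removes the random step
```

Setting α to 0 makes the move deterministic, which is convenient for checking the attraction term in isolation; a nonzero α adds the random exploration step.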
Specifically, the above glowworm swarm algorithm includes the following steps:
Step (1): Initialize the algorithm parameters. Set the population size N for the search according to the scale of the seed user group set S and of the user group to be identified, and set the initial attraction β_0, the light absorption coefficient γ, the step factor α, and the maximum number of iterations MaxGen.
Step (2): Read in the lateral attribute information of each user in the seed user group, assign each sample to its attribute class according to the existing information as the initial class division, and calculate the class center value cen_k from the attribute values of the samples in the data.
Step (3): During the iterations, the sum of squared distances between the lateral attributes of each user in the user group to be identified and the class center of the seed user group serves as the criterion for judging similarity; that is, the fitness function in the iterative process is as follows:
Here, n is the number of users in the user group to be identified, and m is the number of user lateral attributes considered in the similar-crowd division.
Step (4): The fitness value calculated in step (3) is taken as the firefly brightness. A larger fitness value indicates a brighter firefly, one whose lateral attributes are closer to those of the seed user group and which more strongly attracts other fireflies to move toward it.
Step (5): In each iteration, perform iterative optimization using the attraction and position update formulas (3), (4), and (5) in turn, and finally mine from the user group to be identified the potential target user group T_1 that best matches the lateral attributes of the seed user group.
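Steps (2) to (4) can be sketched as a class-center computation followed by distance-based brightness scoring. This is only an illustration under stated assumptions, not the patented procedure: the iterative movement of step (5) is omitted, the class center is taken as the per-attribute mean of the seed users, brightness is mapped to 1 / (1 + squared distance) so that users closer to the class center are brighter, and the `min_brightness` threshold standing in for the "first condition" is hypothetical.

```python
def class_center(seed_users):
    """Per-attribute mean over the seed users (assumed definition of cen_k)."""
    n = len(seed_users)
    return [sum(column) / n for column in zip(*seed_users)]


def squared_distance(user, center):
    """Sum of squared attribute differences between a user and the class center."""
    return sum((u - c) ** 2 for u, c in zip(user, center))


def brightness(user, center):
    """Assumed mapping: a smaller distance to the class center gives a brighter firefly."""
    return 1.0 / (1.0 + squared_distance(user, center))


def screen_candidates(candidates, seed_users, min_brightness=0.9):
    """Keep candidates whose brightness reaches a hypothetical 'first condition' threshold."""
    center = class_center(seed_users)
    return [u for u in candidates if brightness(u, center) >= min_brightness]


# Hypothetical two-attribute encoding (for example gender and scaled age).
seed_users = [[1.0, 0.2], [1.0, 0.4]]
candidates = [[1.0, 0.3], [0.0, 0.9], [1.0, 0.5]]
potential_targets = screen_candidates(candidates, seed_users)
```

In this sketch the first and third candidates pass the threshold and the second does not, because it differs from the seed class center on both attributes.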
A second determining module 402, configured to calculate, based on the longitudinal attribute information of the potential target user group and using the cosine similarity function, the similarity between the longitudinal attributes of the seed user group and those of the potential target user group, and to determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
The first stage screens out, from the user group to be identified, the potential target user group T_1 that matches the lateral attributes of the seed user group well. The second stage then uses the longitudinal attribute information of the seed user group to further classify and match the users in T_1, finally obtaining the target user group T_2 that is most similar to the seed user group.
Specifically, in the second stage the invention uses the cosine similarity function to measure the similarity between the longitudinal attributes of the seed user group set S and those of the users in the potential target user group T_1. The calculation is given by formula (2): the larger the function value, the smaller the angle between the two vectors and the higher the similarity of the features they represent.
After the calculation, the cosine values are sorted, and a subset of users is selected from the potential target user group T_1 according to a preset ratio to form the final target user group T_2, which is the target user group similar to the seed user group.
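The second stage can be sketched as cosine-similarity ranking followed by selection of a preset share of users. The sketch assumes that the seed user group is summarized by a single longitudinal attribute vector (`seed_profile`, for instance a per-attribute mean), which the text does not specify; the function names, user identifiers, and the 0.5 ratio are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def select_top_ratio(seed_profile, potential_targets, ratio=0.5):
    """Rank users by similarity to the seed profile and keep the top share."""
    ranked = sorted(
        potential_targets.items(),
        key=lambda item: cosine_similarity(seed_profile, item[1]),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * ratio)) if ranked else 0
    return [user_id for user_id, _ in ranked[:keep]]


# Hypothetical longitudinal attributes, e.g. activity counts per interest category.
seed_profile = [3.0, 1.0, 0.0]
potential_targets = {
    "u1": [6.0, 2.0, 0.0],
    "u2": [0.0, 1.0, 5.0],
    "u3": [2.0, 2.0, 0.0],
    "u4": [0.0, 0.0, 1.0],
}
target_users = select_top_ratio(seed_profile, potential_targets, ratio=0.5)
```

Because cosine similarity depends only on the angle between vectors, user `u1` ranks first here even though its attribute values are twice those of the seed profile.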
In summary, the invention divides the user attributes involved into two major classes, lateral attributes and longitudinal attributes, according to attribute category and content. The method proceeds in two stages. In the first stage, the firefly iterative algorithm is applied on the basis of the lateral attributes of the seed user group: using the designed fitness function and the iterative update mechanism of the glowworm swarm algorithm, the user group to be identified is preliminarily filtered to obtain a potential target user group that matches the seed user group well. In the second stage, building on the first-stage result and the longitudinal attributes of the seed user group, the cosine similarity function serves as the measure of similarity between users in the seed user group and users in the potential target user group, and a subset of users is selected from the potential target user group according to a preset ratio, based on the calculated function values, to form the final target user group. The invention takes full account of the differences between user attribute categories and mines and screens the user group to be identified stage by stage according to the characteristic attribute information of the seed user group, finally obtaining a target user group that matches the seed user group satisfactorily. This effectively improves the conversion rate of Internet advertising, ensuring accurate ad delivery without harming the user experience.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; see the description of the method for the relevant details.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A similar crowd recognition method, characterized by comprising:
Determining, based on the lateral attributes of a seed user group, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
Determining, based on the longitudinal attributes of the potential target user group, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
2. The method according to claim 1, characterized in that determining, based on the longitudinal attributes of the potential target user group, from the potential target user group, the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition comprises:
Calculating, based on the longitudinal attribute information of the potential target user group and using a cosine similarity function, the similarity between the longitudinal attributes of the seed user group and those of the potential target user group, and determining, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
3. The method according to claim 1, characterized in that determining, based on the lateral attributes of the seed user group, from the user group to be identified, the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition comprises:
Calculating, based on the lateral attributes of the seed user group and using a glowworm swarm algorithm, the similarity between the lateral attributes of the seed user group and those of the user group to be identified, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
4. The method according to claim 3, characterized in that calculating, based on the lateral attributes of the seed user group and using the glowworm swarm algorithm, the similarity between the lateral attributes of the seed user group and those of the user group to be identified, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group comprises:
Initializing algorithm parameters;
Calculating a class center value from the lateral attributes of the seed user group;
Calculating the initial brightness of each firefly based on the class center value;
During the algorithm iterations, calculating and updating the position information and the brightness of each firefly, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
5. The method according to claim 4, characterized in that initializing the algorithm parameters comprises:
Setting the population size for the search, the initial attraction, the light absorption coefficient, the step factor, and the maximum number of iterations according to the scale of the seed user group and of the user group to be identified.
6. A similar crowd recognition system, characterized by comprising:
A first determining module, configured to determine, based on the lateral attributes of a seed user group, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
A second determining module, configured to determine, based on the longitudinal attributes of the potential target user group, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
7. The system according to claim 6, characterized in that the second determining module, when determining, based on the longitudinal attributes of the potential target user group, from the potential target user group, the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition, is specifically configured to:
Calculate, based on the longitudinal attribute information of the potential target user group and using a cosine similarity function, the similarity between the longitudinal attributes of the seed user group and those of the potential target user group, and determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
8. The system according to claim 6, characterized in that the first determining module, when determining, based on the lateral attributes of the seed user group, from the user group to be identified, the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition, is specifically configured to:
Calculate, based on the lateral attributes of the seed user group and using a glowworm swarm algorithm, the similarity between the lateral attributes of the seed user group and those of the user group to be identified, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
9. The system according to claim 8, characterized in that the first determining module, when calculating, based on the lateral attributes of the seed user group and using the glowworm swarm algorithm, the similarity between the lateral attributes of the seed user group and those of the user group to be identified, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group, is specifically configured to:
Initialize algorithm parameters;
Calculate a class center value from the lateral attributes of the seed user group;
Calculate the initial brightness of each firefly based on the class center value;
During the algorithm iterations, calculate and update the position information and the brightness of each firefly, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
10. The system according to claim 9, characterized in that the first determining module, when initializing the algorithm parameters, is specifically configured to:
Set the population size for the search, the initial attraction, the light absorption coefficient, the step factor, and the maximum number of iterations according to the scale of the seed user group and of the user group to be identified.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910433863.0A CN110135916A (en) 2019-05-23 2019-05-23 A kind of similar crowd recognition method and system

Publications (1)

Publication Number Publication Date
CN110135916A true CN110135916A (en) 2019-08-16

Family

ID=67572827

Country Status (1)

Country Link
CN (1) CN110135916A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816