CN110135916A - Similar crowd recognition method and system
- Publication number
- CN110135916A (CN201910433863.0A)
- Authority
- CN
- China
- Prior art keywords
- user group
- attribute
- similarity
- seed
- potential target
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06Q30/0201: Market modelling; Market analysis; Collecting market data
- G06Q30/0251: Targeted advertisements
- G06Q30/0277: Online advertisement
Abstract
The invention discloses a similar crowd recognition method and system. The method comprises: based on the lateral attributes of a seed user group, determining, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition; and, based on the longitudinal attributes of the potential target user group, determining, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition. Using the characteristic attribute information of the seed crowd, the invention can effectively mine potential target customers, reduce advertising cost, improve advertising effectiveness and increase the effective audience.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a similar crowd recognition method and system.
Background
With the rapid development of Internet technology, Internet advertising has become a mainstream advertising medium welcomed by both advertisers and users. Internet advertising currently takes many forms and comes from many delivery sources. A common advertising mechanism applies data extraction rules, formulated from advertisement keywords, users' browsing history and similar information, to match and filter the collected data, and finally delivers suitable advertisements to users. However, this mechanism clearly suffers from lag in both data extraction and advertisement delivery, so the data is poorly timed.
Therefore, to solve the above problems and to improve the display rate of delivered advertisements, the precision of advertisement delivery and the user experience, an advertisement delivery scheme with higher precision and better suitability is urgently needed.
Summary of the invention
In view of this, the present invention provides a similar crowd recognition method that can, based on the characteristic attribute information of a seed crowd, effectively mine potential target customers, reduce advertising cost, improve advertising effectiveness and increase the effective audience.
The present invention provides a similar crowd recognition method, comprising:
based on the lateral attributes of a seed user group, determining, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
based on the longitudinal attributes of the potential target user group, determining, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
Preferably, determining the target user group based on the longitudinal attributes of the potential target user group comprises:
based on the longitudinal attribute information of the potential target user group, calculating the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using a cosine similarity function, and determining, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
Preferably, determining the potential target user group based on the lateral attributes of the seed user group comprises:
based on the lateral attributes of the seed user group, calculating the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, calculating the lateral-attribute similarity using the firefly algorithm and determining the potential target user group comprises:
initializing the algorithm parameters;
calculating a class center value from the lateral attributes of the seed user group;
calculating the initial brightness of each firefly based on the class center value;
during algorithm iteration, updating the position information and the brightness of each firefly, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, initializing the algorithm parameters comprises:
setting the search population size, the initial attractiveness, the light absorption coefficient, the step factor and the maximum number of iterations according to the sizes of the seed user group and the user group to be identified.
A similar crowd recognition system, comprising:
a first determining module, configured to determine, based on the lateral attributes of a seed user group and from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition;
a second determining module, configured to determine, based on the longitudinal attributes of the potential target user group and from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
Preferably, when determining the target user group based on the longitudinal attributes of the potential target user group, the second determining module is specifically configured to:
based on the longitudinal attribute information of the potential target user group, calculate the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using a cosine similarity function, and determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
Preferably, when determining the potential target user group based on the lateral attributes of the seed user group, the first determining module is specifically configured to:
based on the lateral attributes of the seed user group, calculate the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, when calculating the lateral-attribute similarity using the firefly algorithm and determining the potential target user group, the first determining module is specifically configured to:
initialize the algorithm parameters;
calculate a class center value from the lateral attributes of the seed user group;
calculate the initial brightness of each firefly based on the class center value;
during algorithm iteration, update the position information and the brightness of each firefly, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
Preferably, when initializing the algorithm parameters, the first determining module is specifically configured to:
set the search population size, the initial attractiveness, the light absorption coefficient, the step factor and the maximum number of iterations according to the sizes of the seed user group and the user group to be identified.
In summary, the invention discloses a similar crowd recognition method. When the crowd similar to a seed user group needs to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition is first determined from the user group to be identified, based on the lateral attributes of the seed user group. Then, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition is determined from the potential target user group. Using the characteristic attribute information of the seed crowd, the invention can effectively mine potential target customers, reduce advertising cost, improve advertising effectiveness and increase the effective audience.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of embodiment 1 of a similar crowd recognition method disclosed by the invention;
Fig. 2 is a flowchart of embodiment 2 of a similar crowd recognition method disclosed by the invention;
Fig. 3 is a structural schematic diagram of embodiment 1 of a similar crowd recognition system disclosed by the invention;
Fig. 4 is a structural schematic diagram of embodiment 2 of a similar crowd recognition system disclosed by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of embodiment 1 of a similar crowd recognition method disclosed by the invention. The method may comprise the following steps:
S101: based on the lateral attributes of a seed user group, determine, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition.
When a crowd similar to the seed user group needs to be identified from the user group to be identified, the user characteristic attributes are divided into lateral attributes and longitudinal attributes so that they can be used effectively. Lateral attributes are basic demographic attributes such as gender, age, personal income and education. Longitudinal attributes are the interest preferences a user follows day to day, such as entertainment, finance, sports and current affairs.
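The attribute split described above can be illustrated with a minimal Python sketch. The `UserProfile` type, the `LATERAL_KEYS` tuple, the `split_attributes` helper and the field names are hypothetical and are not taken from the patent; they only show one way of separating demographic fields from interest fields.

```python
from dataclasses import dataclass, field

# Hypothetical names for the basic demographic (lateral) attributes.
LATERAL_KEYS = ("gender", "age", "personal_income", "education")


@dataclass
class UserProfile:
    user_id: str
    lateral: dict = field(default_factory=dict)       # demographic attributes
    longitudinal: dict = field(default_factory=dict)  # interest preferences


def split_attributes(user_id, raw):
    """Split a flat attribute record into lateral and longitudinal parts."""
    lateral = {k: v for k, v in raw.items() if k in LATERAL_KEYS}
    longitudinal = {k: v for k, v in raw.items() if k not in LATERAL_KEYS}
    return UserProfile(user_id, lateral, longitudinal)


profile = split_attributes(
    "u1",
    {"gender": "F", "age": 29, "education": "bachelor",
     "finance": 0.7, "sports": 0.2},
)
```

Treating every key outside `LATERAL_KEYS` as an interest preference is a simplification made for this sketch.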
Then, the Euclidean distance between the lateral attribute features of users in the seed user group and in the user group to be identified is calculated to express their degree of similarity. The calculation formula is as follows:
$$d(\vec{x}, \vec{y}) = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2} \quad (1)$$
where $\vec{x}$ and $\vec{y}$ denote the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, respectively; $x_k$ and $y_k$ denote specific elements of these vectors, such as gender, age and education; and $m$ is the number of lateral attributes.
According to the calculated lateral-attribute similarity, the potential target user group whose lateral-attribute similarity meets the first condition is determined from the user group to be identified.
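As a rough illustration of this step, the following Python sketch computes the Euclidean distance between lateral attribute vectors and keeps the users that are close to the seed group. The numeric encoding of the attributes, the nearest-seed rule and the `max_distance` threshold standing in for the "first condition" are all assumptions; the patent does not specify them.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length lateral attribute vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def filter_by_lateral_distance(seed_vectors, candidates, max_distance):
    """Keep candidates whose nearest seed user lies within max_distance.

    `candidates` maps a user id to its encoded lateral attribute vector.
    The nearest-seed rule and the threshold are assumptions of this sketch.
    """
    selected = {}
    for user_id, vec in candidates.items():
        nearest = min(euclidean_distance(vec, s) for s in seed_vectors)
        if nearest <= max_distance:
            selected[user_id] = nearest
    return selected


# Hypothetical encoding: [gender, age / 100, income band, education level]
seeds = [[1.0, 0.30, 2.0, 3.0], [0.0, 0.28, 2.0, 3.0]]
candidates = {
    "u1": [1.0, 0.31, 2.0, 3.0],
    "u2": [0.0, 0.65, 0.0, 1.0],
}
potential = filter_by_lateral_distance(seeds, candidates, max_distance=0.5)
```

In practice the attributes would need to be scaled to comparable ranges before the distance is meaningful; the scaling used here is only illustrative.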
S102: based on the longitudinal attributes of the potential target user group, determine, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
In feature similarity measurement, a similarity function can effectively quantify the difference between two vectors across the dimensions of the space. After the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition has been determined from the user group to be identified, the cosine similarity function is further used as the measure of longitudinal-attribute similarity between users of the seed user group and users of the potential target user group. The calculation formula is as follows:
$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^2}\,\sqrt{\sum_{k} y_k^2}} \quad (2)$$
The larger the function value, the smaller the angle between the two vectors and the higher the characteristic similarity they represent.
In formula (2), $\vec{x}$ and $\vec{y}$ denote the longitudinal attribute vectors of a user in the seed user group and a user in the potential target user group, respectively, and $x_k$ and $y_k$ denote specific elements of these vectors, such as the interests the user follows day to day: entertainment, finance and sports.
Then, according to the calculated longitudinal-attribute similarity, the target user group whose longitudinal-attribute similarity meets the second condition is determined from the potential target user group. The target user group so determined is the crowd similar to the seed user group.
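The cosine similarity measure can be sketched in Python as follows. The interest weights and the handling of an all-zero vector are assumptions made for illustration and are not described in the patent.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two longitudinal attribute vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: a user with no recorded interests matches nothing.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical interest weights: [entertainment, finance, sports, current affairs]
seed_interest = [0.8, 0.1, 0.6, 0.0]
close_user = [0.7, 0.2, 0.5, 0.1]
far_user = [0.0, 0.9, 0.0, 0.8]

close_score = cosine_similarity(seed_interest, close_user)
far_score = cosine_similarity(seed_interest, far_user)
```

Because cosine similarity depends only on direction, a user who follows the same topics in the same proportions scores highly even when the overall activity level differs.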
In summary, in the above embodiment, when the crowd similar to a seed user group needs to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition is first determined from the user group to be identified, based on the lateral attributes of the seed user group. Then, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition is determined from the potential target user group. Using the characteristic attribute information of the seed crowd, the invention can effectively mine potential target customers, reduce advertising cost, improve advertising effectiveness and increase the effective audience.
Fig. 2 is a flowchart of embodiment 2 of a similar crowd recognition method disclosed by the invention. The method may comprise the following steps:
S201: based on the lateral attributes of the seed user group, calculate the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
The firefly algorithm is a swarm-intelligence random search algorithm that performs optimization by simulating the mutual attraction between individual fireflies in nature. Its mechanism is simple, easy to implement and broadly applicable. The core idea is that a firefly is attracted by any firefly with greater absolute brightness and updates its own position according to a formula: if the absolute brightness of firefly i is greater than that of firefly j, firefly i attracts firefly j to move toward it. The absolute brightness is defined through a fitness function designed for the problem at hand; that is, a firefly's absolute brightness is proportional to the value of the problem's fitness function.
The algorithm makes the following assumption: the attractiveness between fireflies depends only on their brightness and the distance between them, and is unaffected by other factors. A firefly's attractiveness is proportional to its brightness, so a dimmer firefly is attracted by a brighter one. Attractiveness decreases with distance: the farther apart two fireflies are, the lower the attractiveness.
Suppose the absolute brightness of firefly i is greater than that of firefly j, so that firefly i attracts firefly j to move toward it. The attractiveness of firefly i to firefly j is $\beta_{ij}$, calculated as follows:
$$\beta_{ij} = \beta_0 e^{-\gamma r_{ij}^2} \quad (3)$$
where $\beta_0$ is the initial (maximum) attractiveness, usually set to 1; $\gamma$ is the light absorption coefficient, $\gamma \in [0.01, 100]$; and $r_{ij}$ is the distance from firefly i to firefly j. In the present invention, $r_{ij}$ expresses the degree of difference between the lateral attributes of the seed user group and of the user group to be identified, as represented by firefly i and firefly j respectively, and is given by formula (4):
$$r_{ij} = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2} \quad (4)$$
where $\vec{x}$ and $\vec{y}$ denote the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, respectively; $x_k$ and $y_k$ denote specific elements of these vectors, such as gender, age and education; and $m$ is the number of lateral attributes.
Firefly j moves toward firefly i under its attraction, and the position of firefly j is updated as follows:
$$x_j(t+1) = x_j(t) + \beta_{ij}\,(x_i(t) - x_j(t)) + \alpha\,\varepsilon_j \quad (5)$$
where $t$ is the iteration number; $\beta_{ij}$ is the attractiveness of firefly i to firefly j; $\alpha$ is the step factor, an arbitrary number in the interval [0, 1]; and $\varepsilon_j$ is a uniformly distributed random number.
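The attraction and position update can be sketched in Python as below. The sketch assumes the standard firefly-algorithm form of the attractiveness, $\beta_0 e^{-\gamma r^2}$, and draws the random term uniformly from [-0.5, 0.5] in each dimension; the patent text only states that the term is uniformly distributed, so the range is an assumption. The function names are hypothetical.

```python
import math
import random


def attractiveness(r_ij, beta0=1.0, gamma=1.0):
    """Attractiveness of a brighter firefly at distance r_ij (assumed form)."""
    return beta0 * math.exp(-gamma * r_ij ** 2)


def move_towards(x_j, x_i, alpha=0.2, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly j toward the brighter firefly i, following formula (5)."""
    r_ij = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    beta = attractiveness(r_ij, beta0, gamma)
    # Assumption: uniform noise in [-0.5, 0.5], scaled by the step factor alpha.
    return [xj + beta * (xi - xj) + alpha * (rng.random() - 0.5)
            for xi, xj in zip(x_i, x_j)]


# With gamma = 0 the attractiveness stays at beta0, so firefly j jumps
# to firefly i's position up to the random perturbation.
rng = random.Random(0)
moved = move_towards([0.0], [1.0], alpha=0.2, gamma=0.0, rng=rng)
```

A larger $\gamma$ makes attraction fall off faster with distance, so distant fireflies barely influence each other and the search becomes more local.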
Specifically, the firefly algorithm comprises the following steps:
Step (1): Initialize the algorithm parameters. Set the search population size N according to the sizes of the seed user group set S and the user group to be identified, and set the initial attractiveness β0, the light absorption coefficient γ, the step factor α and the maximum number of iterations MaxGen.
Step (2): Read in the lateral attribute information of each user in the seed user group, assign each sample to its corresponding attribute category according to the existing information as the initial category division, and calculate the class center value cen_k from the attribute values of the samples in the data.
Step (3): During algorithm iteration, the sum of squared distances between the lateral attributes of each user in the user group to be identified and the class center of the seed user group serves as the criterion for judging similarity; that is, the fitness function used during iteration is as follows:
[Formula image missing from the extracted text]
where n denotes the number of users in the user group to be identified, and m denotes the number of lateral attributes considered when dividing the similar crowd.
Step (4): The fitness value calculated in step (3) is the firefly brightness. The larger the fitness value, the brighter the firefly, the closer its lateral attributes are to those of the seed user group, and the more strongly it attracts other fireflies to move toward it.
Step (5): During algorithm iteration, optimization proceeds iteratively according to the firefly attractiveness and position update formulas (3), (4) and (5), finally mining from the user group to be identified the potential target user group T1 that best matches the lateral attributes of the seed user group.
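Steps (2) to (4) can be illustrated with a small Python sketch that computes a class center from the seed group and ranks candidate users by brightness. Two points are assumptions rather than the patent's method: the class center is taken as the per-attribute mean of the seed vectors, and brightness is defined as the negated squared distance to that center so that a larger value means a closer match. The exact fitness function is not available in the text.

```python
def class_center(seed_vectors):
    """Per-attribute mean of the seed users' lateral attribute vectors."""
    m = len(seed_vectors[0])
    count = len(seed_vectors)
    return [sum(v[k] for v in seed_vectors) / count for k in range(m)]


def squared_distance(x, center):
    """Sum of squared differences between a user vector and the class center."""
    return sum((a - c) ** 2 for a, c in zip(x, center))


def brightness(x, center):
    # Assumption: negate the squared distance so that "brighter" means "closer".
    return -squared_distance(x, center)


# Hypothetical two-attribute encoding of lateral attributes.
seeds = [[1.0, 0.2], [0.0, 0.4]]
center = class_center(seeds)
users = {"u1": [0.5, 0.3], "u2": [1.0, 1.0], "u3": [0.4, 0.1]}

# Brightest (closest to the seed class center) first.
ranked = sorted(users, key=lambda u: brightness(users[u], center), reverse=True)
```

In a full run, these brightness values would decide which firefly attracts which during the position updates of step (5).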
S202: based on the longitudinal attribute information of the potential target user group, calculate the similarity between the longitudinal attributes of the seed user group and those of the potential target user group using the cosine similarity function, and determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
Through the first-stage solution, the potential target user group T1, whose lateral attributes match those of the seed user group well, is screened out of the user group to be identified. The second stage uses the longitudinal attribute information of the seed user group to further classify and match the users in T1, finally obtaining the target user group T2 most similar to the seed user group.
Specifically, in the second stage the present invention uses the cosine similarity function as the measure of longitudinal-attribute similarity between the seed user group set S and the users in the potential target user group T1. The calculation follows formula (2): the larger the function value, the smaller the angle between the two vectors and the higher the characteristic similarity they represent.
The calculated cosine values are sorted, and a subset of users is selected from T1 according to a preset ratio to form the final target user group T2, which is the target user group similar to the seed user group.
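The ranking and ratio-based selection of the second stage might look like the following Python sketch. It assumes the seed group is summarized by a single longitudinal profile vector and that the preset ratio is rounded down with at least one user kept; neither detail is stated in the patent, and the function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    denom = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def select_top_ratio(seed_profile, potential_users, ratio):
    """Sort users in T1 by cosine similarity and keep the top `ratio` share."""
    scored = sorted(
        potential_users.items(),
        key=lambda item: cosine_similarity(seed_profile, item[1]),
        reverse=True,
    )
    # Assumption: round down, but keep at least one user when T1 is non-empty.
    keep = max(1, math.floor(len(scored) * ratio)) if scored else 0
    return [user_id for user_id, _ in scored[:keep]]


# Hypothetical longitudinal vectors for the users in T1.
t1 = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
t2 = select_top_ratio([1.0, 0.0], t1, ratio=0.5)
```

Selecting by ratio rather than by a fixed similarity threshold keeps the size of T2 predictable, which may matter when the advertising budget is fixed.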
In summary, the present invention divides the user attributes involved into two major classes, lateral attributes and longitudinal attributes, according to their category and content, and proceeds in two stages. In the first stage, based on the lateral attributes of the seed user group, the firefly iterative algorithm performs preliminary filtering of the user group to be identified, using the designed fitness function and the firefly update mechanism, to obtain a potential target user group that matches the seed user group well. In the second stage, based on the first-stage result and the longitudinal attributes of the seed user group, the cosine similarity function is used as the measure of user similarity between the seed user group and the potential target user group, and a subset of users is selected from the potential target user group according to a preset ratio and the calculated function values to form the final target user group. The invention thus takes full account of the differences between user attribute categories: guided by the characteristic attribute information of the seed user group, it mines and screens the user group to be identified stage by stage and finally obtains a target user group that matches the seed user group satisfactorily. This effectively improves the conversion rate of Internet advertising, ensuring delivery accuracy without harming the user experience.
Fig. 3 is a structural schematic diagram of embodiment 1 of a similar crowd recognition system disclosed by the invention. The system may comprise:
a first determining module 301, configured to determine, based on the lateral attributes of a seed user group and from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition.
When a crowd similar to the seed user group needs to be identified from the user group to be identified, the user characteristic attributes are divided into lateral attributes and longitudinal attributes so that they can be used effectively. Lateral attributes are basic demographic attributes such as gender, age, personal income and education. Longitudinal attributes are the interest preferences a user follows day to day, such as entertainment, finance, sports and current affairs.
Then, the Euclidean distance between the lateral attribute features of users in the seed user group and in the user group to be identified is calculated to express their degree of similarity. The calculation formula is as follows:
$$d(\vec{x}, \vec{y}) = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2} \quad (1)$$
where $\vec{x}$ and $\vec{y}$ denote the lateral attribute vectors of a user in the seed user group and a user in the user group to be identified, respectively; $x_k$ and $y_k$ denote specific elements of these vectors, such as gender, age and education; and $m$ is the number of lateral attributes.
According to the calculated lateral-attribute similarity, the potential target user group whose lateral-attribute similarity meets the first condition is determined from the user group to be identified.
a second determining module 302, configured to determine, based on the longitudinal attributes of the potential target user group and from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
In feature similarity measurement, a similarity function can effectively quantify the difference between two vectors across the dimensions of the space. After the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition has been determined from the user group to be identified, the cosine similarity function is further used as the measure of longitudinal-attribute similarity between users of the seed user group and users of the potential target user group. The calculation formula is as follows:
$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^2}\,\sqrt{\sum_{k} y_k^2}} \quad (2)$$
The larger the function value, the smaller the angle between the two vectors and the higher the characteristic similarity they represent.
In formula (2), $\vec{x}$ and $\vec{y}$ denote the longitudinal attribute vectors of a user in the seed user group and a user in the potential target user group, respectively, and $x_k$ and $y_k$ denote specific elements of these vectors, such as the interests the user follows day to day: entertainment, finance and sports.
Then, according to the calculated longitudinal-attribute similarity, the target user group whose longitudinal-attribute similarity meets the second condition is determined from the potential target user group. The target user group so determined is the crowd similar to the seed user group.
In summary, in the above embodiment, when the crowd similar to a seed user group needs to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition is first determined from the user group to be identified, based on the lateral attributes of the seed user group. Then, based on the longitudinal attributes of the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition is determined from the potential target user group. Using the characteristic attribute information of the seed crowd, the invention can effectively mine potential target customers, reduce advertising cost, improve advertising effectiveness and increase the effective audience.
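The two-module structure of this system embodiment can be sketched in Python as two small classes chained together. The class names mirror the module names in the text, but the constructor parameters, the nearest-seed and best-seed matching rules, and the thresholds standing in for the first and second conditions are assumptions made for illustration.

```python
import math


class FirstDeterminingModule:
    """Stage 1: keep users whose lateral attributes are close to a seed user."""

    def __init__(self, max_distance):
        self.max_distance = max_distance  # assumed form of the first condition

    def determine(self, seed_lateral, candidates):
        return [uid for uid, (lateral, _) in candidates.items()
                if min(math.dist(lateral, s) for s in seed_lateral)
                <= self.max_distance]


class SecondDeterminingModule:
    """Stage 2: keep users whose longitudinal attributes align with a seed user."""

    def __init__(self, min_similarity):
        self.min_similarity = min_similarity  # assumed form of the second condition

    @staticmethod
    def _cosine(x, y):
        denom = math.hypot(*x) * math.hypot(*y)
        return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0

    def determine(self, seed_longitudinal, candidates, potential_ids):
        return [uid for uid in potential_ids
                if max(self._cosine(candidates[uid][1], s)
                       for s in seed_longitudinal) >= self.min_similarity]


# Hypothetical data: user id -> (lateral vector, longitudinal vector)
seed_lateral = [[1.0, 0.3]]
seed_longitudinal = [[0.9, 0.1]]
candidates = {
    "u1": ([1.0, 0.32], [0.8, 0.2]),
    "u2": ([1.0, 0.29], [0.1, 0.9]),
    "u3": ([0.0, 0.70], [0.9, 0.1]),
}
potential = FirstDeterminingModule(0.1).determine(seed_lateral, candidates)
target = SecondDeterminingModule(0.9).determine(
    seed_longitudinal, candidates, potential)
```

Here `u3` is dropped in the first stage despite matching interests, and `u2` is dropped in the second stage despite matching demographics, which is the behaviour the two-stage design is meant to produce.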
Fig. 4 is a structural schematic diagram of embodiment 2 of a similar crowd recognition system disclosed by the invention. The system may comprise:
a first determining module 401, configured to calculate, based on the lateral attributes of the seed user group, the similarity between the lateral attributes of the seed user group and those of the user group to be identified using a firefly algorithm, and to determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
The firefly algorithm is a swarm-intelligence random search algorithm that performs optimization by simulating the mutual attraction between individual fireflies in nature. Its mechanism is simple, easy to implement and broadly applicable. The core idea is that a firefly is attracted by any firefly with greater absolute brightness and updates its own position according to a formula: if the absolute brightness of firefly i is greater than that of firefly j, firefly i attracts firefly j to move toward it. The absolute brightness is defined through a fitness function designed for the problem at hand; that is, a firefly's absolute brightness is proportional to the value of the problem's fitness function.
The algorithm makes the following assumption: the attractiveness between fireflies depends only on their brightness and the distance between them, and is unaffected by other factors. A firefly's attractiveness is proportional to its brightness, so a dimmer firefly is attracted by a brighter one. Attractiveness decreases with distance: the farther apart two fireflies are, the lower the attractiveness.
Assuming that the absolute brightness of firefly i is greater than the absolute brightness of firefly j, then firefly i attracts firefly j to move to i
It is dynamic.Firefly i is β to the Attraction Degree of jij, calculation formula is as follows:
Wherein, β0For initial maximum attraction, usual value is 1;γ is the absorption coefficient of light, γ ∈ [0.01,100], rij
For firefly i to the distance of firefly j, in the present invention, rijIndicate that the seed that firefly i and firefly j is respectively represented is used
The lateral attribute difference degree of family group and user group to be identified, specific formula is expressed as formula (4), i.e., as follows:
where the two vectors in formula (4) are the lateral-attribute vectors of a user in the seed user group and a user in the user group to be identified, and $x_i$, $y_j$ denote specific attribute elements of those vectors, such as gender, age, and education level.
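As a minimal sketch of formulas (3) and (4), the attraction and distance can be computed as below. It assumes the standard firefly attraction form $\beta_0 e^{-\gamma r^2}$ and a Euclidean distance over numerically encoded lateral attributes. The encoding, the Euclidean choice, and the function names are illustrative assumptions rather than details taken from this document.

```python
import math


def attribute_distance(x, y):
    """Euclidean distance between two equal-length lateral-attribute vectors.

    Assumed form of r_ij; attributes are assumed to be encoded as numbers.
    """
    if len(x) != len(y):
        raise ValueError("attribute vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def attraction(r, beta0=1.0, gamma=1.0):
    """Attraction beta_ij = beta0 * exp(-gamma * r^2) for a distance r."""
    return beta0 * math.exp(-gamma * r * r)
```

With this form, attraction equals `beta0` at zero distance and falls off smoothly as the two attribute vectors diverge.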
Under the attraction of firefly i, firefly j moves toward it. The position of firefly j is updated as follows:
$$x_j(t+1) = x_j(t) + \beta_{ij}\,\bigl(x_i(t) - x_j(t)\bigr) + \alpha\,\varepsilon_j \tag{5}$$
where t is the iteration number; $\beta_{ij}$ is the attraction of firefly i on firefly j; $\alpha$ is the step factor, a number in the interval [0, 1]; and $\varepsilon_j$ is a uniformly distributed random number.
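The position update of formula (5) can be sketched as follows. The range of the uniform random term is not given in the text, so drawing it per coordinate from [-0.5, 0.5] is an assumption made here, and `update_position` is a hypothetical helper name.

```python
import random


def update_position(x_j, x_i, beta_ij, alpha, rng=random):
    """Return the new position of firefly j after it moves toward firefly i.

    Implements x_j + beta_ij * (x_i - x_j) + alpha * eps per coordinate,
    with eps drawn uniformly from [-0.5, 0.5] (an assumed range).
    """
    return [
        xj + beta_ij * (xi - xj) + alpha * rng.uniform(-0.5, 0.5)
        for xj, xi in zip(x_j, x_i)
    ]
```

Setting `alpha` to 0 removes the random perturbation, which makes the move purely a step along the line from firefly j to firefly i.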
Specifically, the firefly algorithm described above comprises the following steps:
Step (1): Initialize the algorithm parameters. Set the population size N according to the seed user group set S and the size of the user group to be identified, and set the initial attraction $\beta_0$, the light absorption coefficient $\gamma$, the step factor $\alpha$, and the maximum number of iterations MaxGen.
Step (2): Read in the lateral attribute information of each user in the seed user group. Assign each sample to the corresponding attribute category according to the existing information, take this as the initial category division, and calculate the class center value $cen_k$ from the attribute values of the samples in each class.
Step (3): During iteration, the sum of squared distances between the lateral attributes of each user in the user group to be identified and the class centers of the seed user group is used as the criterion for judging similarity. That is, the fitness function in the iterative process is as follows:
where n denotes the number of users in the user group to be identified, and m denotes the number of lateral attributes considered in the similar-crowd division.
Step (4): The fitness value calculated in step (3) is taken as the firefly's brightness. The larger the fitness value, the brighter the firefly, the closer it is to the lateral attributes of the seed user group, and the more strongly it attracts other fireflies to move toward it.
Step (5): During iteration, the optimization proceeds by applying the attraction formula (3), the distance formula (4), and the position update formula (5) in turn, finally mining from the user group to be identified the potential target user group $T_1$ that best matches the lateral attributes of the seed user group.
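Steps (1) to (5) can be sketched roughly as follows. This is a minimal illustration under several assumptions not stated in the text: a single class center computed as the per-attribute mean of the seed users, brightness defined as 1 / (1 + squared distance to that center) so that a larger value means more similar, Euclidean distance between fireflies, and random perturbations drawn from [-0.5, 0.5]. All function names are hypothetical. How the final firefly positions are mapped back to the subset $T_1$ is not specified in this section, so the sketch stops at the iteration itself.

```python
import math
import random


def class_center(samples):
    """Per-attribute mean of the seed users' lateral-attribute vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def brightness(position, center):
    """Assumed brightness: larger when closer to the seed class center."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(position, center))
    return 1.0 / (1.0 + sq_dist)


def firefly_search(positions, center, beta0=1.0, gamma=1.0, alpha=0.1,
                   max_gen=50, seed=0):
    """Run the firefly iteration and return positions, brightest first."""
    rng = random.Random(seed)
    swarm = [list(p) for p in positions]
    for _ in range(max_gen):
        light = [brightness(p, center) for p in swarm]
        for j in range(len(swarm)):
            for i in range(len(swarm)):
                if light[i] > light[j]:
                    # Squared distance between the two fireflies.
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move the dimmer firefly j toward the brighter firefly i.
                    swarm[j] = [
                        xj + beta * (xi - xj) + alpha * rng.uniform(-0.5, 0.5)
                        for xj, xi in zip(swarm[j], swarm[i])
                    ]
                    light[j] = brightness(swarm[j], center)
    return sorted(swarm, key=lambda p: brightness(p, center), reverse=True)
```

The brightest firefly never moves, because no other firefly outshines it, and the dimmer ones drift toward it at a rate controlled by `gamma` and `alpha`.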
The second determining module 402 is configured to calculate, based on the longitudinal attribute information of the potential target user group and using a cosine similarity function, the similarity between the longitudinal attributes of the seed user group and the potential target user group, and to determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
The first stage screens out, from the user group to be identified, the potential target user group $T_1$ that matches the lateral attributes of the seed user group well. The second stage uses the longitudinal attribute information of the seed user group to further classify and match the users in $T_1$, finally obtaining the target user group $T_2$ most similar to the seed user group.
Specifically, in the second stage the present invention uses the cosine similarity function to measure the similarity between the longitudinal attributes of the users in the seed user group set S and those in the potential target user group $T_1$. The calculation is given by formula (2): the larger the function value, the smaller the angle between the two vectors and the higher the similarity of the characteristics they represent.
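A cosine similarity function of the kind referred to here can be sketched as below, assuming the longitudinal attributes are encoded as equal-length numeric vectors. The function name and the convention of returning 0 for an all-zero vector are choices made for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

Because the value depends only on the angle between the vectors, two users whose attribute vectors differ only by a positive scale factor score close to 1.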
After calculation, the cosine function values are sorted, and a subset of users is selected from the potential target user group $T_1$ by a preset ratio to form the final target user group $T_2$, which is the target user group similar to the seed user group.
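The ranking and ratio-based selection can be sketched as follows. The sketch assumes the similarity scores have already been computed and are held in a dictionary keyed by user id. Rounding the cut-off up and keeping at least one user are assumptions of this illustration, as is the name `select_top_ratio`.

```python
import math


def select_top_ratio(scores, ratio):
    """Return the user ids in the top `ratio` fraction, highest score first."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * ratio)) if ranked else 0
    return ranked[:keep]
```

For example, with four scored users and a ratio of 0.5, the two users with the highest cosine values are kept.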
In summary, the present invention divides the user attributes involved into two major classes, lateral attributes and longitudinal attributes, according to attribute category and content. The method proceeds in two stages. In the first stage, based on the lateral attributes of the seed user group, the firefly iterative algorithm performs a preliminary filtering of the user group to be identified according to the designed fitness function and the algorithm's iterative update mechanism, yielding a potential target user group that matches the seed user group well. In the second stage, based on the result of the first stage and the longitudinal attributes of the seed user group, the cosine similarity function is used as the measure of similarity between users in the seed user group and users in the potential target user group; according to the computed function values, a subset of users is selected from the potential target user group by a preset ratio to form the final target user group. The present invention fully considers the differences between categories of user attributes and mines and screens the user group to be identified in stages according to the characteristic attribute information of the seed user group, finally obtaining a target user group that matches the seed user group well. This effectively improves the conversion rate of Internet advertising, ensuring the accuracy of advertisement delivery without harming the user experience.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A similar crowd recognition method, characterized by comprising:
based on lateral attributes of a seed user group, determining, from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition; and
based on longitudinal attributes of the potential target user group, determining, from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
2. The method according to claim 1, characterized in that determining, based on the longitudinal attributes of the potential target user group and from the potential target user group, the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition comprises:
based on longitudinal attribute information of the potential target user group, calculating the similarity between the longitudinal attributes of the seed user group and the potential target user group using a cosine similarity function, and determining, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
3. The method according to claim 1, characterized in that determining, based on the lateral attributes of the seed user group and from the user group to be identified, the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition comprises:
based on the lateral attributes of the seed user group, calculating the similarity between the lateral attributes of the seed user group and the user group to be identified using the firefly algorithm, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
4. The method according to claim 3, characterized in that calculating, based on the lateral attributes of the seed user group and using the firefly algorithm, the similarity between the lateral attributes of the seed user group and the user group to be identified, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group comprises:
initializing algorithm parameters;
calculating class center values from the lateral attributes of the seed user group;
calculating the initial brightness of each firefly based on the class center values; and
during algorithm iteration, calculating and updating the position information and brightness of each firefly, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
5. The method according to claim 4, characterized in that initializing the algorithm parameters comprises:
setting the population size, initial attraction, light absorption coefficient, step factor, and maximum number of iterations according to the seed user group and the size of the user group to be identified.
6. A similar crowd recognition system, characterized by comprising:
a first determining module, configured to determine, based on lateral attributes of a seed user group and from a user group to be identified, a potential target user group whose lateral-attribute similarity to the seed user group meets a first condition; and
a second determining module, configured to determine, based on longitudinal attributes of the potential target user group and from the potential target user group, a target user group whose longitudinal-attribute similarity to the seed user group meets a second condition.
7. The system according to claim 6, characterized in that the second determining module, when determining, based on the longitudinal attributes of the potential target user group and from the potential target user group, the target user group whose longitudinal-attribute similarity to the seed user group meets the second condition, is specifically configured to:
calculate, based on longitudinal attribute information of the potential target user group and using a cosine similarity function, the similarity between the longitudinal attributes of the seed user group and the potential target user group, and determine, from the potential target user group, the group whose longitudinal-attribute similarity meets the second condition as the target user group.
8. The system according to claim 6, characterized in that the first determining module, when determining, based on the lateral attributes of the seed user group and from the user group to be identified, the potential target user group whose lateral-attribute similarity to the seed user group meets the first condition, is specifically configured to:
calculate, based on the lateral attributes of the seed user group and using the firefly algorithm, the similarity between the lateral attributes of the seed user group and the user group to be identified, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
9. The system according to claim 8, characterized in that the first determining module, when calculating, based on the lateral attributes of the seed user group and using the firefly algorithm, the similarity between the lateral attributes of the seed user group and the user group to be identified, and determining, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group, is specifically configured to:
initialize algorithm parameters;
calculate class center values from the lateral attributes of the seed user group;
calculate the initial brightness of each firefly based on the class center values; and
during algorithm iteration, calculate and update the position information and brightness of each firefly, and determine, from the user group to be identified, the group whose lateral-attribute similarity meets the first condition as the potential target user group.
10. The system according to claim 9, characterized in that the first determining module, when initializing the algorithm parameters, is specifically configured to:
set the population size, initial attraction, light absorption coefficient, step factor, and maximum number of iterations according to the seed user group and the size of the user group to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910433863.0A CN110135916A (en) | 2019-05-23 | 2019-05-23 | A kind of similar crowd recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135916A true CN110135916A (en) | 2019-08-16 |
Family
ID=67572827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910433863.0A Pending CN110135916A (en) | 2019-05-23 | 2019-05-23 | A kind of similar crowd recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135916A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |