CN110019759A

CN110019759A - User grouping processing method, apparatus, computer device and storage medium

Info

Publication number: CN110019759A
Application number: CN201711027618.7A
Authority: CN
Inventors: 唐红艳; 赵铭; 范欣; 张伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2019-07-16

Abstract

The present invention relates to a user grouping processing method, apparatus, computer device and storage medium. The method comprises: obtaining, for each user identifier in a training set, a click sequence that records the information identifiers the user has clicked; treating the click sequences and the information identifiers in them as the documents and words of a topic model, respectively, and training the topic model to obtain the topic distribution corresponding to each click sequence; determining, from the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence; and determining, for each user identifier, the group to which it belongs according to its group distribution. The scheme of the present application improves the accuracy of user grouping.

Description

User grouping processing method, apparatus, computer device and storage medium

Technical field

The present invention relates to the field of computer technology, and in particular to a user grouping processing method, apparatus, computer device and storage medium.

Background

With the rapid development of the Internet, the number of Internet users has become enormous, and dividing this mass of users into reasonable groups is extremely important.

Current methods group users according to basic user attributes, such as a user's basic information (for example gender and age) and application state information (for example network conditions and application version). However, basic user attributes reflect only fixed characteristics of a user and cannot capture the user's individual characteristics, so grouping users on the basis of these attributes often yields results that are not accurate enough.

Summary of the invention

In view of this, and to address the problem that grouping users on the basis of basic user attributes is not accurate enough, it is necessary to provide a user grouping processing method, apparatus, computer device and storage medium.

A user grouping processing method, the method comprising:

obtaining, for each user identifier in a training set, a click sequence that records the clicked information identifiers;

treating the click sequences and the information identifiers in the click sequences as the documents and words of a topic model, respectively, and performing topic model training to obtain the topic distribution corresponding to each click sequence;

determining, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence; and

determining, for each user identifier, the group to which it belongs according to the corresponding group distribution.

A user grouping processing apparatus, the apparatus comprising:

a click sequence obtaining module, configured to obtain, for each user identifier in a training set, a click sequence that records the clicked information identifiers;

a topic distribution determining module, configured to treat the click sequences and the information identifiers in the click sequences as the documents and words of a topic model, respectively, and to perform topic model training to obtain the topic distribution corresponding to each click sequence;

a group distribution determining module, configured to determine, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence; and

a group determining module, configured to determine, for each user identifier, the group to which it belongs according to the corresponding group distribution.

A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the user grouping processing method described above.

A storage medium storing a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps of the user grouping processing method described above.

In the above user grouping processing method, apparatus, computer device and storage medium, topic model training is performed on the click sequences that record the clicked information identifiers for each user identifier, yielding the topic distribution corresponding to each click sequence. A click sequence captures a user's click behavior across a range of information, and that click behavior reflects, to some extent, the user's interest in the information; the resulting topic distribution therefore summarizes, in abstract form, the user's interest in information as expressed by the click behavior. The group distribution of the user identifier corresponding to a click sequence is determined from the topic distribution, so the group distribution also reflects the user's interest preferences. The group to which a user identifier belongs can thus be determined accurately from its group distribution, which improves the accuracy of user grouping.

Brief description of the drawings

Fig. 1 is a diagram of an application scenario of the user grouping processing method in one embodiment;

Fig. 2 is a schematic flowchart of the user grouping processing method in one embodiment;

Fig. 3 is a schematic diagram of the graphical model of the document generation process in the LDA topic model in one embodiment;

Fig. 4 is a schematic flowchart of the user information set obtaining step in one embodiment;

Fig. 5 is a schematic overview of the process of the user grouping processing method in one embodiment;

Fig. 6 is a data flow diagram of the user grouping processing method in one embodiment;

Fig. 7 is an architecture diagram of the user grouping processing method in one embodiment;

Fig. 8 is a schematic flowchart of the user grouping processing method in another embodiment;

Fig. 9 is a block diagram of the user grouping processing apparatus in one embodiment;

Fig. 10 is a block diagram of the user grouping processing apparatus in another embodiment;

Fig. 11 is a block diagram of the user grouping processing apparatus in yet another embodiment;

Fig. 12 is a schematic diagram of the internal structure of a computer device in one embodiment.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.

Fig. 1 is a diagram of an application scenario of the user grouping processing method in one embodiment. Referring to Fig. 1, the application scenario includes a terminal 110 and a server 120 connected through a network. The terminal 110 may be a personal computer or a mobile electronic device; the mobile electronic device includes at least one of a mobile phone, a tablet computer, a personal digital assistant, a wearable device and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple physical servers.

The user corresponding to a user identifier may click information displayed on the terminal 110, and the terminal 110 obtains the information identifier of the clicked information. The server 120 may obtain the information identifiers acquired by the terminal 110 and, from them, obtain the click sequence that records the clicked information identifiers for each user identifier in the training set. It can be understood that this description is intended only to help explain where the underlying data of a click sequence comes from; it does not restrict the server 120 to obtaining click sequences by receiving clicked information identifiers from the terminal. The server 120 may also directly obtain the click sequences that record the clicked information identifiers for each user identifier in the training set.

The server 120 may treat the click sequences and the information identifiers in them as the documents and words of a topic model, respectively, and perform topic model training to obtain the topic distribution corresponding to each click sequence. The server 120 may then determine, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence, and determine, for each user identifier, the group to which it belongs according to the corresponding group distribution.

Fig. 2 is a schematic flowchart of the user grouping processing method in one embodiment. This embodiment is described mainly by taking the application of the method to a computer device as an example; the computer device may be the server 120 in Fig. 1. Referring to Fig. 2, the method specifically includes the following steps:

S202: obtain, for each user identifier in the training set, the click sequence that records the clicked information identifiers.

A user identifier uniquely identifies the corresponding user. It may be an account, a mobile phone number, or the like. The training set is the set of user information used for topic model training; it includes multiple user identifiers and their corresponding click sequences.

In one embodiment, the training set is obtained by partitioning a user information set and is therefore a part of the user information set. Note that the user information set includes user identifiers and their corresponding click sequences.

An information identifier uniquely identifies the corresponding piece of information. The information includes at least one of articles, pictures, animations, videos, audio, commodities and the like. An article may be text only or of mixed type, for example mixing pictures and text, or mixing audio, video and text. A video may be a short video, a film, or the like.

A click sequence is a sequence that records the information identifiers clicked by the user corresponding to a user identifier.

It can be understood that when the user corresponding to a user identifier clicks a piece of information, the computer device may record the information identifier of the clicked information, and may generate the click sequence corresponding to the user identifier from the information identifiers clicked by that user. A click sequence includes information identifiers; it may also include, in addition to the information identifiers, the user identifier to which they correspond. It can be understood that each user identifier has its own corresponding click sequence.

For example, suppose the user identifier of a user is a and the user has clicked 3 pieces of information whose information identifiers are ID_001, ID_002 and ID_003. The click sequence corresponding to user identifier a may then be (ID_001, ID_002, ID_003), or it may be a sequence that includes both the user identifier and the information identifiers, such as (a, ID_001, ID_002, ID_003).
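The construction of click sequences described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_click_sequences` and the record layout (pairs of user identifier and information identifier, already in click order) are assumptions made for the example.

```python
from collections import defaultdict

def build_click_sequences(click_records):
    """Group clicked information identifiers by user, preserving click order."""
    sequences = defaultdict(list)
    for user_id, info_id in click_records:
        sequences[user_id].append(info_id)
    return dict(sequences)

# Hypothetical click log: (user identifier, information identifier) pairs.
records = [("a", "ID_001"), ("b", "ID_007"), ("a", "ID_002"), ("a", "ID_003")]
click_sequences = build_click_sequences(records)
```

With this input, user identifier a is mapped to the sequence of its three clicked information identifiers, matching the example in the text.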

S204: treat the click sequences and the information identifiers in the click sequences as the documents and words of a topic model, respectively, and perform topic model training to obtain the topic distribution corresponding to each click sequence.

In natural language processing, a topic model assumes that each word in a document is produced by a process of selecting a topic with a certain probability and then selecting a word from the selected topic with a certain probability. In natural language, a topic is a concept expressed by several words; mathematically, it is represented as a conditional probability distribution over the words of a vocabulary. Topic models are generally used for natural language processing; this embodiment innovatively applies them to the grouping of user identifiers.

Specifically, the computer device may treat each click sequence as a document in the topic model and each information identifier in a click sequence as a word in the topic model, and perform topic model training. It can be understood that in conventional topic model training a document is composed of words, and in this embodiment a click sequence is likewise a sequence composed of information identifiers; the information identifiers can therefore serve as the words of the topic model and the click sequences as its documents.

Topic model training is based on the following formula (1):

$$p(\mathrm{ID} \mid \mathrm{sequence}) = \sum_{\mathrm{topic}} p(\mathrm{ID} \mid \mathrm{topic}) \, p(\mathrm{topic} \mid \mathrm{sequence}) \tag{1}$$

where ID denotes an information identifier, sequence denotes a click sequence, and topic denotes a topic; p(ID | sequence) denotes the distribution of information identifiers in a click sequence (that is, the distribution probability of each information identifier in the click sequence); p(ID | topic) denotes the probability distribution of topics and information identifiers (that is, the distribution probability of the information identifiers in each topic); and p(topic | sequence) denotes the topic distribution corresponding to each click sequence (that is, the distribution probability of each topic in each click sequence).

In this embodiment, p(ID | sequence) on the left side of formula (1) is known: it can be obtained directly by counting the occurrences of each information identifier in the click sequence. p(ID | topic) and p(topic | sequence) are unknown. The topic model uses a large number of known p(ID | sequence) values and, through a series of training steps, infers p(ID | topic) and p(topic | sequence). The resulting p(topic | sequence) is the topic distribution corresponding to the click sequence.
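The known quantity p(ID | sequence) can be obtained by simple counting, as the following sketch shows. The function name `empirical_id_distribution` is hypothetical and chosen only for this illustration.

```python
from collections import Counter

def empirical_id_distribution(click_sequence):
    """Estimate p(ID | sequence) as the relative frequency of each identifier."""
    counts = Counter(click_sequence)
    total = len(click_sequence)
    return {info_id: n / total for info_id, n in counts.items()}

dist = empirical_id_distribution(["ID_001", "ID_002", "ID_001", "ID_003"])
```

Here ID_001 appears in two of the four clicks, so its estimated probability is one half.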

The topic model may be pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). Training and inference for the pLSA topic model mainly use the EM (expectation-maximization) algorithm; training and inference for the LDA topic model use Gibbs sampling.

In one embodiment, during topic model training, the computer device may assign a randomly initialized topic to each information identifier in each click sequence, initialize the click sequence-topic distribution and the topic-information identifier distribution from these initial topics, and then train on the click sequences in the training set and the information identifiers they contain to optimize the two distributions. The click sequence-topic distribution is the topic distribution corresponding to a click sequence and describes the probability that the click sequence belongs to each topic. The topic-information identifier distribution is the probability distribution of topics and information identifiers and gives the probability of each information identifier under each topic.

In one embodiment, the computer device may select a topic for each information identifier with a probability determined by a probability formula, and update the click sequence-topic distribution and the topic-information identifier distribution accordingly. This step is repeated until the model converges, giving the final topic distribution corresponding to each click sequence. It can be understood that topic model training also yields the probability distribution of topics and information identifiers.
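The training loop described above resembles collapsed Gibbs sampling for LDA, which can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the patent's own procedure: the text does not give the probability formula, so the standard collapsed-Gibbs full conditional is used; a fixed number of iterations stands in for a convergence test; and the function name `lda_gibbs` and the hyperparameter values are hypothetical.

```python
import random

def lda_gibbs(sequences, num_topics, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Collapsed Gibbs sampling for LDA over click sequences (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({info_id for seq in sequences for info_id in seq})
    index = {info_id: v for v, info_id in enumerate(vocab)}
    docs = [[index[info_id] for info_id in seq] for seq in sequences]
    K, V = num_topics, len(vocab)

    n_mk = [[0] * K for _ in docs]       # topic counts per click sequence
    n_kv = [[0] * V for _ in range(K)]   # identifier counts per topic
    n_k = [0] * K                        # total identifiers assigned to each topic

    # Assign a randomly initialized topic to every information identifier.
    z = []
    for m, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for v, k in zip(doc, z[m]):
            n_mk[m][k] += 1
            n_kv[k][v] += 1
            n_k[k] += 1

    for _ in range(iterations):  # assumption: fixed sweeps instead of a convergence check
        for m, doc in enumerate(docs):
            for n, v in enumerate(doc):
                k = z[m][n]
                # Remove the current assignment from the counts.
                n_mk[m][k] -= 1
                n_kv[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [
                    (n_mk[m][j] + alpha) * (n_kv[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1
                n_kv[k][v] += 1
                n_k[k] += 1

    # Click sequence-topic distribution and topic-information identifier distribution.
    theta = [
        [(n_mk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for m, doc in enumerate(docs)
    ]
    phi = [
        {vocab[v]: (n_kv[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return theta, phi

sequences = [
    ["ID_001", "ID_002", "ID_001"],
    ["ID_003", "ID_004", "ID_004"],
    ["ID_001", "ID_002", "ID_002"],
]
theta, phi = lda_gibbs(sequences, num_topics=2)
```

Each row of `theta` is the topic distribution of one click sequence, and each entry of `phi` gives the probability of every information identifier under one topic. For production-scale data, an established LDA library would normally be preferred over a hand-written sampler.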

It can be understood that each click sequence has one corresponding topic distribution, in which the click sequence corresponds to at least one topic; the topic distribution includes the distribution probability with which the click sequence belongs to each corresponding topic. The distribution probability of a topic is the probability assigned to that topic in the topic distribution. For example, suppose click sequence S corresponds to 3 topics, T1, T2 and T3, in its topic distribution; in the topic distribution of click sequence S, the distribution probability of topic T1 is 20%, that of topic T2 is 40%, and that of topic T3 is 60%.

For each text in a training corpus, the LDA topic model defines the following generative process. Given a training corpus of M texts, each text in the corpus corresponds to a multinomial distribution over K topics (K is fixed in advance, for example by repeated trials); this multinomial distribution is denoted θ. Each topic in turn corresponds to a multinomial distribution over the V words of the vocabulary; this multinomial distribution is denoted φ. θ and φ have Dirichlet prior distributions with hyperparameters α and β, respectively. For each word w in a text m, a topic z is drawn from the multinomial distribution θ corresponding to text m, and a word w is then drawn from the multinomial distribution φ corresponding to topic z. Repeating this process N_m times generates text m, where N_m is the total number of words in text m. This generative process can be represented by the graphical model shown in Fig. 3, where m ∈ [1, M] and k ∈ [1, K].
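The generative process above can be simulated with the standard library alone. The sketch below makes several assumptions for illustration: every text has the same length, a Dirichlet draw is produced by normalizing Gamma variates, and the names `sample_dirichlet` and `generate_corpus` and the hyperparameter values are hypothetical.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma(concentration, 1) variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_texts, num_topics, vocab, length, alpha=0.5, beta=0.5, seed=0):
    """Generate texts following the LDA generative process (fixed length per text)."""
    rng = random.Random(seed)
    # One word distribution phi_k per topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_texts):
        # One topic distribution theta_m per text, drawn from Dirichlet(alpha).
        theta = sample_dirichlet(rng, alpha, num_topics)
        text = []
        for _ in range(length):
            z = rng.choices(range(num_topics), weights=theta)[0]  # draw a topic
            w = rng.choices(vocab, weights=phi[z])[0]             # draw a word from it
            text.append(w)
        corpus.append(text)
    return corpus

corpus = generate_corpus(3, 2, ["ID_001", "ID_002", "ID_003"], length=5)
```

In the click-sequence setting, the vocabulary entries play the role of information identifiers and each generated text plays the role of a click sequence.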

In this embodiment, the server is based on the LDA topic model and inputs the M click sequences into the LDA topic model as the corpus. Each information identifier in a click sequence m is treated as a word w in the LDA topic model, and the click sequence m is treated as a document in the LDA topic model. The topic distribution (over topics z) that the LDA topic model generates for a click sequence is equivalent to the distribution of groups to which the user belongs. In this way, click sequences are innovatively mapped onto the document-term model of the LDA topic model.

In one embodiment, for every topic k ∈ [1, K], a "topic-information identifier" distribution is generated, that is, the distribution of the preset number of topics over information identifiers, φ_k ~ Dirichlet(β), where φ_k ~ Dirichlet(β) indicates that φ_k follows a Dirichlet distribution with hyperparameter β.

For every click sequence m ∈ [1, M], the "click sequence-topic" distribution corresponding to click sequence m is generated, that is, the distribution of the click sequence over topics, θ_m ~ Dirichlet(α), where θ_m ~ Dirichlet(α) indicates that θ_m follows a Dirichlet distribution with hyperparameter α.

Further, the length N_m of the current click sequence m is obtained; it is the number of information identifiers in the current click sequence m. For every information identifier n ∈ [1, N_m] of the current click sequence m, the topic to which the current information identifier n belongs is generated from the "click sequence-topic" distribution θ_m: z_{m,n} ~ θ_m. The current information identifier is then generated from the "topic-information identifier" distribution of that topic: w_{m,n} ~ φ_{z_{m,n}}. It follows that the probability that the n-th information identifier in the m-th click sequence is t can be expressed by the following formula (2):

$$p(w_{m,n} = t \mid \theta_m, \Phi) = \sum_{k=1}^{K} p(w_{m,n} = t \mid \varphi_k) \, p(z_{m,n} = k \mid \theta_m) \tag{2}$$

where m denotes the m-th click sequence, n denotes the n-th information identifier in the click sequence, t denotes the value of information identifier n in click sequence m, k denotes a topic index, and K is the preset number of topics; w_{m,n} denotes information identifier n in click sequence m, and z_{m,n} denotes the topic of the n-th information identifier in click sequence m; p(w_{m,n} = t | θ_m, Φ) denotes the probability that information identifier w_{m,n} is t, p(z_{m,n} = k | θ_m) denotes the probability, given θ_m, that the topic z_{m,n} of the current information identifier is k, and p(w_{m,n} = t | φ_k) denotes the probability, given φ_k, that the current information identifier w_{m,n} is t. Here Φ denotes the set of all φ_k.

The left side of formula (2), p(w_{m,n} = t | θ_m, Φ), equals the distribution probability of the current information identifier t in click sequence m. Combining this with formula (1), the topic distribution corresponding to each click sequence and the probability distribution of topics and information identifiers can be obtained. It can be understood that both the topic distribution corresponding to a click sequence and the probability distribution of topics and information identifiers are multinomial distributions.
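The mixture in formula (2) is a sum over topics of the topic probability times the identifier probability under that topic. The following sketch evaluates it for small, made-up distributions; the function name `identifier_probability` and the numeric values are assumptions for illustration only.

```python
def identifier_probability(theta_m, phi, info_id):
    """Probability that an identifier in sequence m equals info_id, summed over topics."""
    return sum(theta_m[k] * phi[k].get(info_id, 0.0) for k in range(len(theta_m)))

# Hypothetical distributions for one click sequence and two topics.
theta_m = [0.25, 0.75]
phi = [
    {"ID_001": 0.5, "ID_002": 0.5},
    {"ID_001": 0.0, "ID_002": 1.0},
]
p_id_001 = identifier_probability(theta_m, phi, "ID_001")
p_id_002 = identifier_probability(theta_m, phi, "ID_002")
```

Because each row of `phi` and `theta_m` itself sums to one, the resulting probabilities over all information identifiers also sum to one.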

S206: determine, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence.

A group is a set composed of the user identifiers of users who share the same or similar relationships or preferences. The group distribution of a user identifier is the probability distribution with which the user identifier belongs to each group.

It can be understood that because a user identifier corresponds to a click sequence, and the topic distribution summarizes in abstract form the user's interests in and preferences for information, the computer device can determine the group distribution of the user identifier corresponding to a click sequence from the topic distribution corresponding to that click sequence.

Specifically, the computer device may take the topic distribution corresponding to each click sequence as the group distribution of the user identifier corresponding to that click sequence. Alternatively, the computer device may select some of the topics to which a click sequence corresponds in its topic distribution, and determine the group distribution of the user identifier corresponding to the click sequence according to the distribution probabilities of the selected topics in the topic distribution.
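Both options described above can be sketched in a few lines. The text does not say how topics are selected or how their probabilities are turned into a group distribution, so the sketch assumes the highest-probability topics are kept and their probabilities renormalized; the function name `group_distribution` and the parameter `top_n` are hypothetical.

```python
def group_distribution(topic_distribution, top_n=None):
    """Derive a group distribution from a topic distribution.

    With top_n=None the topic distribution is used directly; otherwise the
    top_n most probable topics are kept and renormalized (an assumption).
    """
    if top_n is None:
        return dict(topic_distribution)
    selected = sorted(topic_distribution.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(p for _, p in selected)
    return {topic: p / total for topic, p in selected}

topic_dist = {"T1": 0.2, "T2": 0.5, "T3": 0.3}
full = group_distribution(topic_dist)
partial = group_distribution(topic_dist, top_n=2)
```

Keeping only the strongest topics trades some detail for a shorter, easier to interpret group distribution.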

S208: determine, for each user identifier, the group to which it belongs according to the corresponding group distribution.

It can be understood that the computer device may determine the group to which a user identifier belongs according to the group distribution corresponding to that user identifier. For example, when the computer device needs to determine the group to which user A belongs, it can do so according to the group distribution corresponding to user A. A user identifier may belong to one or more groups.

Specifically, the computer device may directly take the groups to which a user identifier corresponds in its group distribution as the groups to which the user identifier belongs. Alternatively, the computer device may select some of the groups to which the user identifier corresponds in its group distribution as the groups to which the user identifier belongs.
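A minimal sketch of this step follows. The selection rule is not specified in the text, so the sketch assumes a probability threshold with a fallback to the single most probable group; `assign_groups` and `threshold` are hypothetical names.

```python
def assign_groups(group_dist, threshold=None):
    """Return the groups a user identifier belongs to.

    With threshold=None every group in the distribution is returned; otherwise
    only groups whose probability reaches the threshold are kept, falling back
    to the most probable group if none qualifies (an assumption).
    """
    if threshold is None:
        return sorted(group_dist)
    selected = [g for g, p in group_dist.items() if p >= threshold]
    if not selected:
        selected = [max(group_dist, key=group_dist.get)]
    return sorted(selected)

group_dist = {"G1": 0.2, "G2": 0.5, "G3": 0.3}
all_groups = assign_groups(group_dist)
main_groups = assign_groups(group_dist, threshold=0.3)
```

The fallback guarantees that every user identifier is assigned at least one group, even when the threshold is set high.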

In the above user grouping processing method, topic model training is performed on the click sequences that record the clicked information identifiers for each user identifier, yielding the topic distribution corresponding to each click sequence. A click sequence captures a user's click behavior across a range of information, and that click behavior reflects, to some extent, the user's interest in the information; the resulting topic distribution therefore summarizes, in abstract form, the user's interest in information as expressed by the click behavior. The group distribution of the user identifier corresponding to a click sequence is determined from the topic distribution, so the group distribution also reflects the user's interest preferences. The group to which a user identifier belongs can thus be determined accurately from its group distribution, which improves the accuracy of user grouping.

In one embodiment, the method further includes: obtaining a user information set; dividing the user information set into a training set and an inference set; training the topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set; and determining, according to the topic model corresponding to the inference set, the groups to which the user identifiers in the inference set belong.

The user information set includes user identifiers and their corresponding click sequences.

Specifically, the computer device may divide the user information set into a training set and an inference set according to a certain ratio. It can be understood that the training set includes multiple user identifiers and their corresponding click sequences, and the inference set includes at least one user identifier and its corresponding click sequence.
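The division by ratio can be sketched as follows. The text does not state the ratio or how users are chosen, so the 80/20 ratio, the seeded random shuffle, and the name `split_user_info` are assumptions for illustration.

```python
import random

def split_user_info(user_info, train_ratio=0.8, seed=0):
    """Split {user identifier: click sequence} into a training set and an inference set."""
    user_ids = sorted(user_info)
    random.Random(seed).shuffle(user_ids)  # assumption: users are assigned at random
    cut = int(len(user_ids) * train_ratio)
    train = {u: user_info[u] for u in user_ids[:cut]}
    infer = {u: user_info[u] for u in user_ids[cut:]}
    return train, infer

user_info = {
    "u1": ["ID_001", "ID_002"],
    "u2": ["ID_003"],
    "u3": ["ID_001", "ID_004"],
    "u4": ["ID_002"],
    "u5": ["ID_005", "ID_001"],
}
train_set, inference_set = split_user_info(user_info)
```

Sorting the user identifiers before shuffling makes the split reproducible for a given seed.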

The computer device may obtain the parameters of the topic model trained on the training set and, according to these parameters and the inference set, train the topic model corresponding to the inference set.

In one embodiment, the computer device may determine, according to the topic model corresponding to the inference set, the topic distribution of the click sequence corresponding to each user identifier in the inference set, and then determine, according to each of these topic distributions, the group distribution of the user identifier corresponding to each click sequence. The computer device may determine, for each user identifier in the inference set, the group to which it belongs according to the corresponding group distribution.
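One common way to reuse trained parameters on unseen click sequences is to hold the topic-information identifier distribution fixed and estimate only the topic distribution of each new sequence. The sketch below does this with a simple EM-style "fold-in" update; this technique is swapped in for illustration and is not stated in the text, and the name `infer_topic_distribution` and the values in `phi` are hypothetical.

```python
def infer_topic_distribution(click_sequence, phi, iterations=50):
    """Estimate a sequence's topic distribution with phi held fixed (EM fold-in)."""
    K = len(phi)
    theta = [1.0 / K] * K
    for _ in range(iterations):
        counts = [0.0] * K
        for info_id in click_sequence:
            weights = [theta[k] * phi[k].get(info_id, 0.0) for k in range(K)]
            total = sum(weights)
            if total == 0.0:
                continue  # identifier unseen during training: skip it
            for k in range(K):
                counts[k] += weights[k] / total  # responsibility of topic k
        norm = sum(counts)
        if norm == 0.0:
            return theta  # nothing usable in the sequence: keep the current estimate
        theta = [c / norm for c in counts]
    return theta

# Hypothetical trained topic-information identifier distributions.
phi = [
    {"ID_001": 0.9, "ID_002": 0.1},
    {"ID_001": 0.1, "ID_002": 0.9},
]
theta_new = infer_topic_distribution(["ID_001", "ID_001", "ID_001"], phi)
```

A sequence that contains only ID_001 is pulled strongly toward the first topic, under which ID_001 is far more probable.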

In the above embodiment, the user information set is divided into a training set and an inference set. Performing topic model training on the training set alone reduces the amount of data used for topic model training and lowers the complexity of model training. Training the topic model corresponding to the inference set on the basis of the parameters of the topic model trained on the training set reduces the difficulty of topic model training and improves the efficiency of model training.

As shown in Fig. 4, in one embodiment, obtaining the user information set (the user information set obtaining step for short) specifically includes the following steps:

S402: obtain the clicked information identifiers corresponding to each candidate user identifier.

A candidate user identifier is a user identifier that is a candidate for generating the user information set.

The clicked information identifiers corresponding to a user identifier are the information identifiers of the information clicked by the user corresponding to that user identifier. It can be understood that each candidate user identifier has corresponding clicked information identifiers.

In one embodiment, the computer device may obtain an information identifier click list corresponding to the candidate user identifiers. Each entry in the information identifier click list is one click record and includes the clicked information identifier. In one embodiment, each entry may also include the user identifier that performed the click and/or the time of the click. The computer device may obtain, from the information identifier click list, the clicked information identifiers corresponding to each candidate user identifier.

It can be understood that each user identifier may have its own information identifier click list, or the information identifiers clicked by all candidate user identifiers may be kept in a single information identifier click list.

In other embodiments, the clicked information identifiers corresponding to a user identifier may also take a non-list form, for example a sequence including multiple clicked information identifiers.

S404: obtain the operation behavior records, corresponding to the candidate user identifiers, that are recorded for the information identifiers.

An operation behavior is an act of performing an operation. Operation behaviors include reading, liking, recommending, sharing, favoriting, following, commenting and the like. An operation behavior record records the operation behavior data generated while an operation is performed.

The operation behavior record corresponding to an information identifier records the operation behavior data generated by performing operations on the information corresponding to that information identifier. The operation behavior record, corresponding to a user identifier, that is recorded for an information identifier is the record of the operation behaviors generated when the user corresponding to the user identifier operates on the information corresponding to the information identifier.

In one embodiment, the computer device may obtain an operation behavior record list. Each entry in the list is one operation behavior record and includes the information identifier of the operated information and the operation behavior data generated by the corresponding operation. In one embodiment, an operation behavior record may also include the user identifier that performed the operation and/or the time of the operation.

S406: filter out the information identifiers of invalid clicks from the obtained information identifiers according to the operation behavior records.

An invalid click is a click behavior that is invalid. The information identifier of an invalid click is the identifier of information whose received click behavior is invalid.

In one embodiment, an invalid click judgment condition is preset in the computer device. The computer device may identify the operation behavior records that satisfy the invalid click judgment condition and take the information identifiers corresponding to those records as the information identifiers of invalid clicks.

The invalid click judgment condition is the condition used to identify the information identifiers of invalid clicks. It may be a judgment condition set in the operation behavior dimension. It can be understood that, with a judgment condition set in the operation behavior dimension, the information identifiers of invalid clicks can be identified by judging whether an operation behavior record satisfies the invalid click judgment condition.

It can be understood that filtering out means removing: the computer device may remove the information identifiers of invalid clicks from the clicked information identifiers corresponding to each candidate user identifier.

In one embodiment, step S406 includes: to obtain the operation row for corresponding to message identification in operation behavior record For data；Screening meets the operation behavior data of invalid clicks Rule of judgment in the operation behavior data of acquisition；From acquisition In message identification, message identification corresponding to the operation behavior data filtered out is filtered out.

It wherein, include operation behavior data in operation behavior record.Operation behavior data are corresponding to message identification The data for being used to embody operation behavior that information generates during being operated.When operation behavior data include information reading At least one of length, information evaluation data, recommendation operation data, sharing operation data and collection operation data etc..Wherein, believe Breath evaluation data include comment information and by triggering express happiness dislike control generate evaluation information, such as thumb up (like) or Point praises (dislike).

The invalid-click judgment condition is the condition used to identify the information identifiers of invalid clicks. It can be a condition set in the operation behavior dimension. It can be understood that, with a condition set in the operation behavior dimension, the information identifiers of invalid clicks are identified by checking whether the operation behavior data meet the condition.

Specifically, the computer device can check the obtained operation behavior data against the invalid-click judgment condition and screen out the operation behavior data that meet it. The computer device can then determine, among the clicked information identifiers corresponding to each candidate user identifier, the information identifiers corresponding to the screened-out operation behavior data, and filter out the determined information identifiers.

In one embodiment, the invalid-click judgment condition includes: the information reading duration is less than or equal to a preset duration threshold; and/or the information evaluation data are negative evaluation data.
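
The invalid-click filtering described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the `BehaviorRecord` type, the string encoding of the evaluation, and the value of `MIN_READING_SECONDS` are assumptions made for the example, and the "and/or" of the condition is implemented here as a logical "or".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BehaviorRecord:
    """Hypothetical operation behavior record for one clicked item."""
    info_id: str
    reading_seconds: float
    evaluation: Optional[str] = None  # "like", "dislike", or None (assumed encoding)


MIN_READING_SECONDS = 5.0  # hypothetical preset duration threshold


def is_invalid_click(record: BehaviorRecord) -> bool:
    """Invalid if reading time is at or below the threshold, or the evaluation is negative."""
    too_short = record.reading_seconds <= MIN_READING_SECONDS
    negative = record.evaluation == "dislike"
    return too_short or negative


def filter_invalid_clicks(clicked_ids, records):
    """Remove identifiers whose behavior records meet the invalid-click condition."""
    invalid = {r.info_id for r in records if is_invalid_click(r)}
    return [info_id for info_id in clicked_ids if info_id not in invalid]
```

In this sketch, identifiers without any behavior record are kept, since nothing marks them as invalid; a stricter policy would be equally compatible with the text.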

S408: generate a corresponding click sequence according to the information identifiers remaining after filtering for each candidate user identifier.

S410: obtain the user information set according to the click sequences and the corresponding user identifiers.

Specifically, the computer device can obtain the click sequence corresponding to each user identifier according to the information identifiers remaining after filtering for each candidate user identifier.

Specifically, the computer device can combine the information identifiers remaining after filtering to obtain, for each candidate user identifier, the click sequence that records the clicked information identifiers. In one embodiment, the computer device can also combine each candidate user identifier with the information identifiers remaining after filtering for that user identifier, to obtain the click sequence that records the clicked information identifiers for each candidate user identifier.
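
As a rough sketch of this combining step, the function below groups filtered clicks into one click sequence per user identifier. The input shape, an iterable of `(user_id, info_id)` pairs, and the choice to preserve input order are assumptions for illustration; the text does not prescribe a data layout or an ordering.

```python
def build_click_sequences(filtered_clicks):
    """Group (user_id, info_id) pairs into one click sequence per user identifier.

    The order of identifiers within each sequence follows the input order.
    """
    sequences = {}
    for user_id, info_id in filtered_clicks:
        sequences.setdefault(user_id, []).append(info_id)
    return sequences
```

Each resulting sequence later plays the role of a document in topic model training, with its information identifiers as the words.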

In one embodiment, the computer device can also perform secondary filtering on the information identifiers remaining after filtering, to filter out the information identifiers belonging to the popular information category and/or the unpopular information category. It then combines the information identifiers remaining after the secondary filtering to obtain, for each candidate user identifier, the click sequence that records the clicked information identifiers; alternatively, it combines each candidate user identifier with the information identifiers remaining after the secondary filtering for that user identifier to obtain that click sequence.

Here, popular information is information with a high degree of attention, and unpopular information is information with a low degree of attention. The degree of attention can be determined from the number of clicks on the information or from its click frequency. It can be understood that the secondary filtering further refines the filtering result and further reduces the data volume and the complexity of model training.

In the above embodiment, the information identifiers of invalid clicks are filtered out by means of the operation behavior records, corresponding to the user identifiers, that record the operations on the information identifiers, so the information identifiers of invalid clicks can be determined accurately at the operation behavior level. In addition, filtering out the information identifiers of invalid clicks both reduces the negative influence of invalid click data on model training and reduces the data volume and the complexity of model training.

In one embodiment, obtaining the user information set includes: obtaining the clicked information identifiers corresponding to each candidate user identifier; filtering out, from the obtained information identifiers, the information identifiers belonging to the popular information category and/or the unpopular information category; generating a corresponding click sequence according to the information identifiers remaining after filtering for each candidate user identifier; and obtaining the user information set according to each click sequence and the corresponding user identifier.

Here, popular information is information with a high degree of attention, and unpopular information is information with a low degree of attention. The popular information category is the category indicating that the information corresponding to an information identifier is popular information. The unpopular information category is the category indicating that the information corresponding to an information identifier is unpopular information.

Specifically, the computer device can obtain the clicked information identifiers corresponding to each candidate user identifier, determine, according to the number of clicks or the click frequency corresponding to each information identifier, which information identifiers belong to the popular information category and/or the unpopular information category, and filter out those information identifiers from the obtained clicked information identifiers corresponding to each candidate user identifier.

In one embodiment, the method further includes: querying the number of clicks or the click frequency corresponding to each obtained information identifier; classifying the information identifiers whose number of clicks or click frequency is greater than or equal to a high-frequency click threshold into the popular information category; and classifying the information identifiers whose number of clicks or click frequency is less than or equal to a low-frequency click threshold into the unpopular information category.

Here, the high-frequency click threshold and the low-frequency click threshold define the ranges of click count or click frequency used to distinguish information. The high-frequency click threshold is used to filter out popular information, and the low-frequency click threshold is used to filter out unpopular information.
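
The threshold-based classification can be illustrated with a short sketch. It assumes that the click count of an identifier is simply the number of times it appears across all candidate users' clicked lists, and that the high-frequency threshold is larger than the low-frequency threshold; both function names and both assumptions are illustrative only.

```python
from collections import Counter


def classify_by_click_count(click_counts, high_threshold, low_threshold):
    """Return (popular, unpopular) identifier sets based on the two thresholds."""
    popular = {i for i, n in click_counts.items() if n >= high_threshold}
    unpopular = {i for i, n in click_counts.items() if n <= low_threshold}
    return popular, unpopular


def filter_popular_and_unpopular(clicked_by_user, high_threshold, low_threshold):
    """Drop popular and unpopular identifiers from every user's clicked list."""
    counts = Counter(i for ids in clicked_by_user.values() for i in ids)
    popular, unpopular = classify_by_click_count(counts, high_threshold, low_threshold)
    excluded = popular | unpopular
    return {
        user: [i for i in ids if i not in excluded]
        for user, ids in clicked_by_user.items()
    }
```

Click frequency (clicks per unit time) could replace the raw count in `click_counts` without changing the structure of the sketch.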

The computer device can combine the information identifiers remaining after filtering to obtain, for each candidate user identifier, the click sequence that records the clicked information identifiers. In one embodiment, the computer device can also combine each candidate user identifier with the information identifiers remaining after filtering for that user identifier, to obtain the click sequence that records the clicked information identifiers for each candidate user identifier.

In one embodiment, the computer device can also perform secondary filtering on the information identifiers remaining after filtering, to filter out the information identifiers of invalid clicks. It then combines the information identifiers remaining after the secondary filtering to obtain, for each candidate user identifier, the click sequence that records the clicked information identifiers; alternatively, it combines each candidate user identifier with the information identifiers remaining after the secondary filtering for that user identifier to obtain that click sequence. Here, an invalid click means that the click behavior is invalid, and the information identifier of an invalid click is the identifier of information whose received click behavior is invalid. It can be understood that the secondary filtering further refines the filtering result and further reduces the data volume and the complexity of model training.

Further, the computer device can obtain the user information set according to the click sequences and the corresponding user identifiers.

In the above embodiment, filtering out the information identifiers belonging to the popular information category and/or the unpopular information category both reduces the negative influence of invalid click data on model training and reduces the data volume and the complexity of model training.

In one embodiment, the parameter of the topic model is the probability distribution of topics and information identifiers obtained by training on the training set.

In this embodiment, training the topic model corresponding to the inference set according to the inference set and the parameter of the topic model obtained by training on the training set includes: obtaining the click sequence corresponding to each user identifier in the inference set; and, according to the probability distribution of topics and information identifiers, taking the click sequence corresponding to each user identifier in the inference set and the information identifiers in that click sequence as the documents and words in the topic model, respectively, and performing topic model training.

Here, the inference set is the set of user identifiers on which topic model training is to be performed based on the parameter of the topic model obtained by topic model training on the training set. The parameter of the topic model is the probability distribution of topics and information identifiers, which represents the probability distribution of the information identifiers under each topic. It can be understood that the probability distribution of topics and information identifiers is the topic-information identifier distribution.

It can be understood that the inference set is obtained by dividing the user information set: the user information set can be divided into a training set and an inference set.

The click sequence corresponding to each user identifier in the inference set is the click sequence that records the information identifiers clicked under that user identifier.

Specifically, the computer device can keep unchanged the probability distribution of topics and information identifiers obtained by topic model training on the training set, take the click sequence corresponding to each user identifier in the inference set and the information identifiers in that click sequence as the documents and words in the topic model, respectively, and perform topic model training.
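
To illustrate what holding the topic-information identifier distribution fixed can look like, the sketch below estimates the topic distribution of one click sequence with a simple expectation-maximization "fold-in" of the kind used with PLSA-style models. The text does not specify the inference algorithm, so this is a substituted technique for illustration only; the parameter name `topic_word` and the iteration count are assumptions.

```python
def infer_topic_distribution(click_sequence, topic_word, iterations=50):
    """Estimate a topic distribution for one click sequence.

    topic_word[k][info_id] is P(info_id | topic k) from the training set
    and is never modified here.
    """
    num_topics = len(topic_word)
    theta = [1.0 / num_topics] * num_topics  # start from a uniform distribution
    for _ in range(iterations):
        totals = [0.0] * num_topics
        for info_id in click_sequence:
            # E-step: responsibility of each topic for this identifier.
            weights = [theta[k] * topic_word[k].get(info_id, 0.0) for k in range(num_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # identifier unseen in training; it carries no evidence
            for k in range(num_topics):
                totals[k] += weights[k] / norm
        total = sum(totals)
        if total == 0.0:
            break  # nothing in the sequence was seen in training
        # M-step: re-estimate the topic distribution only.
        theta = [t / total for t in totals]
    return theta
```

Because only `theta` is updated, each inference-set user can be processed independently of the others, which is what makes reusing the trained parameter cheaper than retraining the full model.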

In one embodiment, during the topic model training process, the computer device iteratively trains the click sequence-topic distribution corresponding to each user identifier in the inference set, and obtains the final topic distribution of the click sequence corresponding to each user identifier in the inference set. The computer device can determine, according to the topic distribution of the click sequence corresponding to each user identifier in the inference set, the group distribution of the user identifier corresponding to each click sequence, and determine the group to which each user identifier in the inference set belongs according to the corresponding group distribution.

In the above embodiment, training the topic model corresponding to the inference set based on the parameter of the topic model obtained by training on the training set can reduce the difficulty of topic model training on the training set and improve the efficiency of model training.

In one embodiment, determining the group to which each user identifier belongs according to the corresponding group distribution includes: determining the distribution probability of each group corresponding to each user identifier in the corresponding group distribution; and screening out, from the groups corresponding to a user identifier, a preset quantity of groups in descending order of distribution probability, as the groups to which that user identifier belongs.

Here, the group distribution of a user identifier represents the probability distribution of the user identifier belonging to each group. It can be understood that a user identifier corresponds to at least one group in its group distribution, and the group distribution of the user identifier includes the distribution probability of the user identifier belonging to each corresponding group.

Specifically, the computer device can screen out, from the groups corresponding to a user identifier, a preset quantity of groups in descending order of distribution probability, and take the screened-out groups as the groups to which the user identifier belongs.

For example, suppose a user identifier corresponds to 5 groups in its group distribution, with the following distribution probabilities: group a (5%), group b (10%), group c (50%), group d (90%), and group e (90%), and the preset quantity for screening is 3. The computer device can then screen out 3 groups in descending order of distribution probability, namely group d (90%), group e (90%), and group c (50%), and take these 3 groups as the groups to which the user identifier belongs.
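
The example above can be reproduced with a few lines of code. The function name `top_groups` and the dictionary representation of a group distribution are assumptions for illustration.

```python
def top_groups(group_distribution, preset_quantity):
    """Return the preset quantity of groups with the highest distribution probability."""
    ranked = sorted(group_distribution.items(), key=lambda item: item[1], reverse=True)
    return [group for group, _ in ranked[:preset_quantity]]


# The worked example from the text: five groups, preset quantity 3.
example = {"a": 0.05, "b": 0.10, "c": 0.50, "d": 0.90, "e": 0.90}
selected = top_groups(example, 3)
```

Here `selected` contains groups d, e, and c. How ties are broken (groups d and e both have 90%) is not specified in the text; this sketch keeps the input order for equal probabilities.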

In one embodiment, the computer device can also perform secondary screening on the preset quantity of groups screened out in descending order of distribution probability: from the screened-out groups, it filters out the groups whose distribution probability is lower than a preset grouping confidence threshold, and takes the groups remaining after filtering as the groups to which the corresponding user identifier belongs.

In the above embodiment, a preset quantity of groups is screened out in descending order of distribution probability as the groups to which the corresponding user identifier belongs, so that the finally determined groups of the user identifier are more accurate.

In one embodiment, screening out, from the groups corresponding to a user identifier, a preset quantity of groups in descending order of distribution probability as the groups to which that user identifier belongs includes: comparing the distribution probability of each group corresponding to each user identifier with the grouping confidence threshold; filtering out, from the groups corresponding to each user identifier, the groups whose distribution probability is lower than the grouping confidence threshold; and selecting, from the groups remaining after filtering, groups up to the preset quantity in descending order of distribution probability, as the groups to which the corresponding user identifier belongs.

Here, confidence, also called reliability, confidence level, or confidence coefficient, is the probability that an estimated value lies within a given permitted error range of the population parameter. The grouping confidence is the distribution probability of a group to which a user identifier belongs, and reflects the reliability of the grouping. The grouping confidence threshold is used to screen out the groups with higher reliability.

Specifically, the computer device can compare the distribution probability of each group corresponding to each user identifier with the grouping confidence threshold, determine the groups whose distribution probability is lower than the grouping confidence threshold, and filter those groups out of the groups corresponding to each user identifier. The computer device can then select, from the groups remaining after filtering, the preset quantity of groups in descending order of distribution probability, as the groups to which the corresponding user identifier belongs.
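
A minimal sketch of this two-step selection, threshold first and then ranking, is shown below. The function name and the threshold value used in the example are assumptions; groups with a probability exactly equal to the threshold are kept, since the text filters only those below it.

```python
def select_groups(group_distribution, confidence_threshold, preset_quantity):
    """Filter out low-confidence groups, then keep the most probable ones."""
    # Step 1: drop groups whose probability is below the grouping confidence threshold.
    confident = {g: p for g, p in group_distribution.items() if p >= confidence_threshold}
    # Step 2: rank the remaining groups in descending order of probability.
    ranked = sorted(confident, key=confident.get, reverse=True)
    return ranked[:preset_quantity]


example = {"a": 0.05, "b": 0.10, "c": 0.50, "d": 0.90, "e": 0.90}
confident_groups = select_groups(example, confidence_threshold=0.6, preset_quantity=3)
```

With a threshold of 0.6, only groups d and e survive, so fewer than the preset quantity of groups may be returned.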

In the above embodiment, the groups whose distribution probability is lower than the grouping confidence threshold are filtered out of the groups corresponding to each user identifier, and the preset quantity of groups is then selected from the remaining groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs, so that the finally determined groups of the user identifier are more accurate.

In one embodiment, the method further includes: pushing information according to the group to which each user identifier belongs.

Specifically, the computer device can push information to a user identifier according to the group to which it belongs. The computer device can take the user identifier to which information is to be pushed as the target user identifier, take the user identifiers other than the target user identifier in the group to which the target user identifier belongs as non-target user identifiers, and push information to the target user identifier according to the information identifiers that the non-target user identifiers have clicked.

The computer device can obtain, from a database, the information corresponding to the information identifiers that the non-target user identifiers have clicked, and push the obtained information to the terminal corresponding to the target user identifier. The computer device can also screen out part of the information corresponding to the information identifiers that the non-target user identifiers have clicked, and push that part to the terminal corresponding to the target user identifier.
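
Collecting push candidates from the other members of the target user's group can be sketched as follows. The function and parameter names are hypothetical, and the removal of duplicate identifiers is an added assumption that the text does not state.

```python
def candidate_info_ids(target_user, group_members, clicked_by_user):
    """Collect identifiers clicked by non-target users in the target user's group."""
    seen = set()
    candidates = []
    for user in group_members:
        if user == target_user:
            continue  # only non-target user identifiers contribute candidates
        for info_id in clicked_by_user.get(user, []):
            if info_id not in seen:  # de-duplication is an assumption of this sketch
                seen.add(info_id)
                candidates.append(info_id)
    return candidates
```

The returned identifiers would then be looked up in the database and pushed, in whole or in part, to the terminal of the target user identifier.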

It should be noted that, when information is pushed, the complete content of the information can be pushed, or the information can be pushed and displayed in a simplified form such as a thumbnail or a title; in the latter case, the complete content of the information is pushed to the terminal when a read request from the terminal for the information pushed in simplified form is received.

In the above embodiment, the groups of the user identifiers determined by the user grouping method of this application are more accurate; therefore, the information pushed based on the group to which each user identifier belongs is also more accurate and better meets user needs.

In one embodiment, pushing information according to the group to which each user identifier belongs includes: obtaining the non-target user identifiers in the group to which the target user identifier belongs; obtaining the information identifiers clicked under the non-target user identifiers within a preset time range; obtaining the operation behavior records, corresponding to the non-target user identifiers, that record the operations on the information identifiers; screening the obtained information identifiers according to the operation behavior records; and pushing the information corresponding to the screened-out information identifiers to the terminal corresponding to the target user identifier.

Here, the target user identifier is the user identifier to which information is to be pushed. A non-target user identifier is a user identifier, in the group to which the target user identifier belongs, other than the target user identifier.

The preset time range can be a range expressed in absolute time (for example, October 1, 2017 to October 8, 2017), or a time range expressed relative to a reference time (for example, the past month or the past week).

The operation behavior record corresponding to an information identifier is a record of the operation behavior data generated by performing related operations on the information corresponding to that information identifier. The operation behavior record, corresponding to a non-target user identifier, that records the operations on an information identifier is the operation behavior record generated when the non-target user represented by the non-target user identifier performs related operations on the information corresponding to the information identifier.

Specifically, the computer device can screen the obtained information identifiers according to the operation behavior records, and push the information corresponding to the screened-out information identifiers to the terminal corresponding to the target user identifier. It can be understood that the computer device can push the information corresponding to all of the screened-out information identifiers to the terminal corresponding to the target user identifier, or can screen the screened-out information identifiers again and push the information corresponding to the re-screened information identifiers to the terminal corresponding to the target user identifier.

In one embodiment, the computer device can obtain, from a database, the information corresponding to the screened-out information identifiers, and push the obtained information to the terminal corresponding to the target user identifier.

In one embodiment, the computer device can determine a weight for each information identifier according to the operation behavior data in the operation behavior records, and screen the information identifiers according to the weights. The computer device can screen out a preset number of information identifiers from the obtained information identifiers in descending order of weight. The computer device can also screen out, from the obtained information identifiers, the information identifiers whose weight is greater than or equal to a preset weight threshold.
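
The weight-based screening can be sketched as below. The text does not define how a weight is derived from operation behavior data, so the formula in `behavior_weight` (a capped reading-time term plus fixed bonuses) is purely a hypothetical example; only the two selection strategies follow the text directly.

```python
def behavior_weight(reading_seconds=0.0, liked=False, shared=False, collected=False):
    """Hypothetical weight: capped reading-time term plus bonuses for positive actions."""
    weight = min(reading_seconds / 60.0, 1.0)  # reading contribution capped at 1.0
    if liked:
        weight += 1.0
    if shared:
        weight += 1.5
    if collected:
        weight += 1.0
    return weight


def select_top(weights, count):
    """Keep a preset number of identifiers in descending order of weight."""
    return sorted(weights, key=weights.get, reverse=True)[:count]


def select_at_least(weights, threshold):
    """Keep identifiers whose weight is greater than or equal to the threshold."""
    return [info_id for info_id, w in weights.items() if w >= threshold]
```

The sorted order produced by `select_top` can also serve as the push order when the information is sorted by weight.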

Here, operation behavior data include at least one of information reading duration, information evaluation data, recommendation operation data, sharing operation data, collection operation data, and the like. Information evaluation data include comment information and evaluation information generated by triggering a control that expresses like or dislike, such as a thumbs-up (like) or a thumbs-down (dislike).

It can be understood that, when pushing the obtained information, the computer device can sort the pieces of information according to their corresponding weights.

In one embodiment, pushing the information corresponding to the screened-out information identifiers to the terminal corresponding to the target user identifier includes: querying the user attributes corresponding to the target user identifier; querying the information features corresponding to the screened-out information identifiers; scoring, according to the user attributes of the target user identifier, the information features, and the preset association degree between user attributes and information features, to obtain a score value for each screened-out information identifier; screening the screened-out information identifiers again according to the corresponding score values; and pushing the information corresponding to the re-screened information identifiers to the terminal corresponding to the target user identifier.

Here, user attributes include at least one attribute such as gender, age, education, and location. Information features are data that reflect the characteristics of the information, and include at least one feature such as the category and the tags of the information.

Specifically, the computer device can query, from a database or locally, the user attributes corresponding to the target user identifier, and query the information features corresponding to the screened-out information identifiers.

A preset association degree between user attributes and information features is set in advance in the computer device. The association degree reflects the strength of the association between a user attribute and an information feature: the higher the association degree, the stronger the association between the user attribute and the information feature; the lower the association degree, the weaker the association.

The computer device can score the screened-out information identifiers according to the user attributes of the target user identifier, the information features corresponding to the screened-out information identifiers, and the preset association degree between user attributes and information features, to obtain the score value of each screened-out information identifier.

In one embodiment, the computer device can determine, according to the preset association degree between user attributes and information features, the association degree between the user attributes of the target user identifier and the information features corresponding to the screened-out information identifiers, and score the screened-out information identifiers according to the determined association degree, to obtain the score value of each screened-out information identifier. The score value is positively correlated with the association degree: the stronger the association, the higher the score value; the weaker the association, the lower the score value.

In one embodiment, the computer device can screen the screened-out information identifiers again to select the information identifiers whose score value is greater than or equal to a preset scoring threshold. In another embodiment, the computer device can also screen the screened-out information identifiers again to select the information identifiers whose score values rank within a preset number of top places, for example, the information identifiers whose score values rank in the top 5.
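
The scoring and re-screening steps can be sketched as follows. The text only requires that the score value be positively correlated with the association degree; summing the preset association degree over all attribute-feature pairs is one simple choice made here for illustration. The attribute and feature strings and the function names are hypothetical.

```python
def association_score(user_attributes, info_features, association):
    """Sum the preset association degree over every (attribute, feature) pair."""
    return sum(
        association.get((attribute, feature), 0.0)
        for attribute in user_attributes
        for feature in info_features
    )


def rescreen(candidates, user_attributes, features_by_info, association,
             min_score=None, top_n=None):
    """Rank candidates by score, then apply a score threshold and/or a top-N cut."""
    scores = {
        info_id: association_score(
            user_attributes, features_by_info.get(info_id, ()), association
        )
        for info_id in candidates
    }
    ranked = sorted(candidates, key=lambda info_id: scores[info_id], reverse=True)
    if min_score is not None:
        ranked = [info_id for info_id in ranked if scores[info_id] >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked
```

Passing only `min_score` corresponds to the scoring-threshold embodiment, and passing only `top_n` corresponds to the top-ranked embodiment.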

It can be understood that the above embodiment takes into account the two dimensions of user attributes and information features: the obtained information identifiers are scored according to the association between user attributes and information features, so the information to be pushed that is screened out according to the score values is more accurate and better meets user needs.

Further, the computer device can push the information corresponding to the re-screened information identifiers to the terminal corresponding to the target user identifier.

It can be understood that the computer device can, after screening the information identifiers again, obtain from a database the information corresponding to the re-screened information identifiers and push the obtained information to the terminal corresponding to the target user identifier. Alternatively, the computer device can obtain from the database the information corresponding to the screened information identifiers right after screening the obtained information identifiers according to the operation behavior records; then, after screening those information identifiers again, it selects from the already obtained information the information corresponding to the re-screened information identifiers and pushes the selected information to the terminal corresponding to the target user identifier.

In the above embodiment, the obtained information identifiers are screened according to the operation behavior records of the non-target users, so the information identifiers that better match the preferences of the non-target users can be screened out, and the screened-out information pushed to the terminal corresponding to the target user identifier is accordingly more accurate.

It can be understood that, after information is pushed to a user identifier according to the group to which it belongs, the corresponding user may click the pushed information. The computer device can obtain the click sequences generated from the click behavior of the user corresponding to each user identifier, perform topic model training again according to the obtained click sequences, and re-determine the group to which each user identifier belongs, thereby improving the accuracy of the grouping result.

Fig. 5 is a schematic overview of the flow of the user grouping processing method in one embodiment. Referring to Fig. 5, the computer device can obtain an information identifier click list and an operation behavior record list produced by the clicks and operations that the user corresponding to each user identifier performs on information. Each entry in the information identifier click list is a clicked information identifier corresponding to a user identifier, and each entry in the operation behavior record list is an operation behavior record. The computer device can generate the click sequence corresponding to each user identifier from the information identifier click list according to the operation behavior records in the operation behavior record list. The computer device can perform grouping processing according to the click sequences and send the grouping result to the information transmission system. The information transmission system performs information recall according to the grouping result, screens and sorts the recalled information, and generates an information recommendation list from the screened and sorted information. It can be understood that information recall means re-obtaining the information that has been clicked.

Fig. 6 is a data flow diagram of the user grouping processing method in one embodiment. Referring to Fig. 6, the user corresponding to a user identifier clicks and operates on the information in an information list (such as a news list), and the computer device can obtain the information identifier (i.e., information ID) click list and the operation behavior record list produced by the user's click behavior. The computer device can filter the information identifiers recorded in the information identifier click list according to the corresponding operation behavior records in the operation behavior record list, to obtain the click sequences. The computer device can perform grouping processing according to the click sequences to obtain the grouping result. In the grouping result, each user identifier (i.e., user ID) corresponds to at least one group. In the grouping result shown in Fig. 6, the score after each group ID is the distribution probability of that group. Based on the grouping result, the computer device can recall the information identifiers of interest that other user identifiers in the group to which a user identifier belongs have clicked. It can be understood that recalling information identifiers means re-obtaining the information identifiers that have been clicked. The computer device can screen and sort the recalled information identifiers, generate an information list from the information corresponding to the screened and sorted information identifiers, and push the information list to the terminal corresponding to the user identifier.

Fig. 7 is an architecture diagram of the tenant group processing method in one embodiment. Each of the architecture components described below with reference to Fig. 7 may be deployed in the computer device.

Referring to Fig. 7, the database stores the operation behavior records and the clicked information identifiers corresponding to each user identifier. The computer device may obtain from the database the operation behavior records and the clicked information identifiers corresponding to each candidate user identifier.

The invalid click filter may, according to the operation behavior records corresponding to each user identifier, filter out the information identifiers of invalid clicks from the clicked information identifiers corresponding to that user identifier. The invalid click filter may send the information identifiers remaining after filtering to the click frequency counter.

The click frequency counter calculates the number of times each information identifier has been clicked, filters the remaining information identifiers again according to that number, and sends the information identifiers that remain for each user identifier after this second filtering to the click sequence extractor.

The click sequence extractor generates the click sequence corresponding to each user identifier from the information identifiers remaining after the second filtering, obtaining the user information set.

The sampler may split the user information set into a training set and an inference set according to the sampling ratio preset in the parameter configurator.
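As a minimal sketch of this split, assume the user information set is a dictionary from user identifier to click sequence and that the sampling ratio is the fraction of users assigned to the training set. The function name, the shuffling and the fixed seed are assumptions of this sketch; the embodiment does not specify how users are sampled.

```python
import random


def split_user_info_set(user_sequences, sampling_ratio, seed=0):
    """Split {user_id: click_sequence} into a training set and an inference set.

    sampling_ratio is interpreted here as the share of users placed in the
    training set (an assumption; the source only says "preset sampling ratio").
    """
    if not 0.0 < sampling_ratio < 1.0:
        raise ValueError("sampling_ratio must be strictly between 0 and 1")
    user_ids = sorted(user_sequences)      # deterministic starting order
    random.Random(seed).shuffle(user_ids)  # reproducible shuffle
    n_train = int(round(len(user_ids) * sampling_ratio))
    training_set = {u: user_sequences[u] for u in user_ids[:n_train]}
    inference_set = {u: user_sequences[u] for u in user_ids[n_train:]}
    return training_set, inference_set
```

Splitting by user identifier rather than by individual click keeps each click sequence whole, which matches the treatment of a click sequence as one document in the topic model.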

The topic model trainer may perform topic model training according to the click sequences corresponding to the user identifiers in the training set and the model training parameters set in the parameter configurator, obtaining the topic distribution corresponding to each click sequence and the probability distribution of topics and information identifiers.

The topic model inferencer performs topic model training on the click sequences corresponding to the user identifiers in the inference set, using the probability distribution of topics and information identifiers obtained by training on the training set, and obtains the topic distribution corresponding to the click sequence of each user identifier in the inference set.

The grouping result filter may take the topic distribution obtained for each click sequence in the training set as the group distribution of the corresponding user identifier in the training set, and take the topic distribution obtained for each click sequence in the inference set as the group distribution of the corresponding user identifier in the inference set. The grouping result filter may then, according to the group distribution of each user identifier, filter the groups that correspond to that user identifier in its group distribution to obtain the grouping result, that is, the group to which each user identifier belongs.

The recommender system may push information based on the grouping result. The users corresponding to the user identifiers that receive the push may click on and operate on the pushed information, which in turn yields new operation behavior records and clicked information identifiers for each user identifier. This forms a closed loop: grouping can be performed again for each user identifier according to the new operation behavior records and clicked information identifiers, improving the accuracy of the grouping result.

It can be understood that when the invalid click filter, the click sequence extractor, the sampler, the topic model trainer, the grouping result filter and the like need configured parameters during processing, they obtain them from the parameter configurator.

As shown in Fig. 8, in one embodiment, another tenant group processing method is provided. The method includes the following steps:

S802: obtain the clicked information identifiers corresponding to each candidate user identifier; obtain the operation behavior records that correspond to each candidate user identifier and record operations on those information identifiers.

S804: obtain, from the operation behavior records, the operation behavior data corresponding to the information identifiers; screen the obtained operation behavior data for operation behavior data that meets the invalid click judgment condition.

The operation behavior data includes information reading duration and/or information evaluation data.

In one embodiment, the invalid click judgment condition includes: the information reading duration is less than or equal to a preset duration threshold; and/or the information evaluation data is negative evaluation data.

S806: filter out, from the obtained information identifiers, the information identifiers corresponding to the operation behavior data screened out.

In one embodiment, before step S808 the method further includes: querying the number of clicks or click frequency corresponding to each information identifier remaining after filtering; classifying information identifiers whose number of clicks or click frequency is greater than or equal to a high-frequency click threshold into a popular information class; classifying information identifiers whose number of clicks or click frequency is less than or equal to a low-frequency click threshold into an unpopular information class; and filtering out, from the information identifiers remaining after filtering, the information identifiers that belong to the popular information class and/or the unpopular information class.

S808: generate the corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier.

S810: obtain the user information set from the click sequences and the corresponding user identifiers; divide the user information set into a training set and an inference set. Obtain the click sequence that corresponds to each user identifier in the training set and records the clicked information identifiers.

S812: take the obtained click sequences and the information identifiers in the click sequences as the documents and words in a topic model respectively, and perform topic model training to obtain the topic distribution corresponding to each click sequence.

S814: determine, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence; determine the distribution probability of each group corresponding to each user identifier in the corresponding group distribution.

S816: compare the distribution probability of each group corresponding to each user identifier with the grouping confidence threshold; among the groups corresponding to each user identifier, filter out the groups whose distribution probability is lower than the grouping confidence threshold.

S818: from the groups remaining after filtering, select a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs.

S820: obtain the probability distribution of topics and information identifiers obtained by training on the training set; obtain the click sequence corresponding to each user identifier in the inference set.

S822: perform topic model training according to the probability distribution of topics and information identifiers and the inference set, obtaining the topic distribution of the click sequence corresponding to each user identifier in the inference set.

In one embodiment, the computer device may keep the probability distribution of topics and information identifiers unchanged, take the click sequence corresponding to each user identifier in the inference set and the information identifiers in the click sequence as the documents and words in the topic model respectively, and perform topic model training.

S824: determine, according to the topic distribution of the click sequence corresponding to each user identifier in the inference set, the group distribution of the user identifier corresponding to each click sequence; determine the group to which each user identifier in the inference set belongs according to the corresponding group distribution.

S826: obtain the non-target user identifiers in the group to which a target user identifier belongs; obtain the information identifiers that correspond to the non-target user identifiers and were clicked within a preset time range.

S828: obtain the operation behavior records that correspond to the non-target user identifiers and record operations on the information identifiers; screen the obtained information identifiers according to the operation behavior records.

S830: query the user attributes corresponding to the target user identifier; query the information features corresponding to the information identifiers screened out.

S832: score the information identifiers screened out according to the user attributes of the target user identifier, the information features, and the preset degree of association between user attributes and information features, obtaining a score value for each information identifier screened out.

S834: screen the information identifiers again according to their score values; push the information corresponding to the information identifiers that pass this second screening to the terminal corresponding to the target user identifier.
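Steps S830 to S834 can be sketched as follows. The embodiment does not state how the preset degree of association is combined into a score, so this sketch assumes an additive rule over all (user attribute, information feature) pairs; the function names, the dictionary layout and the top-n cut-off are likewise hypothetical.

```python
def score_info(user_attrs, info_features, association):
    """Sum the preset association degree over every (attribute, feature) pair.

    association: dict mapping (user_attribute, info_feature) -> degree.
    Pairs missing from the table contribute 0 (an assumption of this sketch).
    """
    return sum(association.get((a, f), 0.0) for a in user_attrs for f in info_features)


def screen_by_score(user_attrs, candidates, association, top_n):
    """Rank candidate information IDs by score and keep the top_n.

    candidates: dict mapping information ID -> list of information features.
    Ties are broken by information ID so the result is deterministic.
    """
    scored = [
        (score_info(user_attrs, feats, association), info_id)
        for info_id, feats in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [info_id for _, info_id in scored[:top_n]]
```

A score threshold could replace the top-n cut-off; the source only states that the information identifiers are screened again according to their score values.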

Collect secondly, user information collection to be divided into training set and speculate, topic model training is carried out by training set, it can The data volume of topic model training is reduced, model training complexity is reduced.And obtained topic model is trained based on training set Parameter, training, which speculates, collects corresponding topic model, can reduce the difficulty of the topic model training of training set, improve model training Efficiency.

Then, filtering out the information identifiers of invalid clicks by means of the operation behavior records that correspond to the user identifiers and record operations on the information identifiers makes it possible to determine invalid clicks accurately at the level of operation behavior. In addition, filtering out the information identifiers of invalid clicks both reduces the negative effect of invalid click data on model training and reduces the amount of data used for model training, lowering the complexity of model training. Furthermore, the second filtering further optimizes the filtering result and further reduces the amount of data and the complexity of model training.
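The two filtering stages can be sketched together. The record fields (`user_id`, `info_id`, `read_seconds`, `rating`) and the function names are hypothetical, and the sketch assumes that click counts are computed over the clicks that survive the invalid click filter and that both the popular and the unpopular class are removed.

```python
from collections import Counter


def is_invalid_click(record, min_read_seconds):
    """Invalid if the reading duration is at or below the threshold, or the rating is negative."""
    return record["read_seconds"] <= min_read_seconds or record.get("rating") == "negative"


def build_click_sequences(records, min_read_seconds, low_freq, high_freq):
    """Apply the invalid click filter, then the popular/unpopular filter, and build sequences.

    records: list of dicts in click order. Returns {user_id: [info_id, ...]}.
    """
    valid = [r for r in records if not is_invalid_click(r, min_read_seconds)]
    counts = Counter(r["info_id"] for r in valid)  # click counts after the first filter
    sequences = {}
    for r in valid:
        n = counts[r["info_id"]]
        if n >= high_freq or n <= low_freq:
            continue  # popular or unpopular information class: drop it
        sequences.setdefault(r["user_id"], []).append(r["info_id"])
    return sequences
```

Users whose clicks are all filtered out do not appear in the result, so they contribute no document to the topic model.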

Next, among the groups corresponding to each user identifier, the groups whose distribution probability is lower than the grouping confidence threshold are filtered out, and from the groups remaining after filtering a preset number of groups are selected in descending order of distribution probability as the groups to which the corresponding user identifier belongs, so that the group finally determined for each user identifier is more accurate.
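The threshold filter followed by top-k selection can be sketched in a few lines. The function name and the dictionary representation of a group distribution are assumptions; the logic follows steps S816 and S818.

```python
def select_groups(group_distribution, confidence_threshold, max_groups):
    """Pick the groups a user identifier belongs to.

    group_distribution: dict mapping group ID -> distribution probability.
    Groups below the grouping confidence threshold are dropped, then up to
    max_groups are kept in descending order of probability.
    """
    kept = [(g, p) for g, p in group_distribution.items() if p >= confidence_threshold]
    kept.sort(key=lambda gp: (-gp[1], gp[0]))  # ties broken by group ID
    return kept[:max_groups]
```

For example, with the distribution `{"g1": 0.55, "g2": 0.30, "g3": 0.10, "g4": 0.05}`, a threshold of 0.1 and a preset number of 2, the user identifier is assigned to `g1` and `g2`.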

Finally, screening the obtained information identifiers according to the operation behavior records of the non-target users yields information identifiers that better match the preferences of the non-target users, so the information pushed to the terminal corresponding to the target user identifier is correspondingly more accurate.

As shown in Fig. 9, in one embodiment, a tenant group processing device 900 is provided. The device includes a click sequence obtaining module 904, a topic distribution determining module 906, a group distribution determining module 908 and a group determining module 910, in which:

The click sequence obtaining module 904 is configured to obtain the click sequence that corresponds to each user identifier in the training set and records the clicked information identifiers.

The topic distribution determining module 906 is configured to take the click sequences and the information identifiers in the click sequences as the documents and words in a topic model respectively, and to perform topic model training to obtain the topic distribution corresponding to each click sequence.

The group distribution determining module 908 is configured to determine, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence.

The group determining module 910 is configured to determine, for each user identifier, the group to which it belongs according to the corresponding group distribution.

As shown in Fig. 10, in one embodiment, the device 900 further includes:

An obtaining module 902, configured to obtain the user information set.

A division module 903, configured to divide the user information set into a training set and an inference set.

A topic model training module 912, configured to train the topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set.

The group determining module 910 is further configured to determine the group to which each user identifier in the inference set belongs according to the topic model corresponding to the inference set.

In one embodiment, the obtaining module 902 is further configured to: obtain the clicked information identifiers corresponding to each candidate user identifier; obtain the operation behavior records that correspond to each candidate user identifier and record operations on the information identifiers; filter out, from the obtained information identifiers, the information identifiers of invalid clicks according to the operation behavior records; generate the corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier; and obtain the user information set from the click sequences and the corresponding user identifiers.

In one embodiment, the obtaining module 902 is further configured to: obtain, from the operation behavior records, the operation behavior data corresponding to the information identifiers; screen the obtained operation behavior data for operation behavior data that meets the invalid click judgment condition; and filter out, from the obtained information identifiers, the information identifiers corresponding to the operation behavior data screened out.

In one embodiment, the operation behavior data includes information reading duration and/or information evaluation data.

The invalid click judgment condition includes: the information reading duration is less than or equal to a preset duration threshold; and/or the information evaluation data is negative evaluation data.

In one embodiment, the obtaining module 902 is further configured to: obtain the clicked information identifiers corresponding to each candidate user identifier; filter out, from the obtained information identifiers, the information identifiers that belong to the popular information class and/or the unpopular information class; generate the corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier; and obtain the user information set from the click sequences and the corresponding user identifiers.

In one embodiment, the obtaining module 902 is further configured to: query the number of clicks or click frequency corresponding to each obtained information identifier; classify information identifiers whose number of clicks or click frequency is greater than or equal to the high-frequency click threshold into the popular information class; and classify information identifiers whose number of clicks or click frequency is less than or equal to the low-frequency click threshold into the unpopular information class.

In this embodiment, the topic model training module 912 is further configured to: obtain the click sequence corresponding to each user identifier in the inference set; and, according to the probability distribution of topics and information identifiers, take the click sequence corresponding to each user identifier in the inference set and the information identifiers in the click sequence as the documents and words in the topic model respectively, and perform topic model training.

In one embodiment, the group determining module 910 is further configured to: determine the distribution probability of each group corresponding to each user identifier in the corresponding group distribution; and select, from the groups corresponding to a user identifier, a preset number of groups in descending order of distribution probability as the groups to which that user identifier belongs.

In one embodiment, the group determining module 910 is further configured to: compare the distribution probability of each group corresponding to each user identifier with the grouping confidence threshold; filter out, among the groups corresponding to each user identifier, the groups whose distribution probability is lower than the grouping confidence threshold; and select, from the groups remaining after filtering, a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs.

As shown in Fig. 11, in one embodiment, the device 900 further includes:

An information push module 914, configured to push information according to the group to which each user identifier belongs.

In one embodiment, the information push module 914 is further configured to: obtain the non-target user identifiers in the group to which a target user identifier belongs; obtain the information identifiers that correspond to the non-target user identifiers and were clicked within a preset time range; obtain the operation behavior records that correspond to the non-target user identifiers and record operations on the information identifiers; screen the obtained information identifiers according to the operation behavior records; and push the information corresponding to the information identifiers screened out to the terminal corresponding to the target user identifier.

In one embodiment, the information push module 914 is further configured to: query the user attributes corresponding to the target user identifier; query the information features corresponding to the information identifiers screened out; score the information identifiers screened out according to the user attributes of the target user identifier, the information features, and the preset degree of association between user attributes and information features, obtaining a score value for each; screen the information identifiers again according to their score values; and push the information corresponding to the information identifiers that pass this second screening to the terminal corresponding to the target user identifier.

Fig. 12 is a schematic diagram of the internal structure of the computer device in one embodiment. Referring to Fig. 12, the computer device may be the server 120 shown in Fig. 1. The computer device includes a processor, a memory and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device may store an operating system and a computer program. When executed, the computer program may cause the processor to perform a tenant group processing method. The processor of the computer device provides computing and control capability and supports the operation of the entire computer device. The internal memory may also store a computer program which, when executed by the processor, may cause the processor to perform a tenant group processing method. The network interface of the computer device is used for network communication.

Those skilled in the art will understand that the structure shown in Fig. 12 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, the tenant group processing device provided by this application may be implemented in the form of a computer program that can run on the computer device shown in Fig. 12. The non-volatile storage medium of the computer device may store the program modules that make up the tenant group processing device, for example the click sequence obtaining module 904, the topic distribution determining module 906, the group distribution determining module 908 and the group determining module 910 shown in Fig. 9. The computer program formed by these program modules causes the computer device to perform the steps of the tenant group processing method of the embodiments of this application described in this specification. For example, the computer device may, through the click sequence obtaining module 904 in the tenant group processing device 900 shown in Fig. 9, obtain the click sequence that corresponds to each user identifier in the training set and records the clicked information identifiers, and, through the topic distribution determining module 906, take the click sequences and the information identifiers in the click sequences as the documents and words in a topic model respectively and perform topic model training to obtain the topic distribution corresponding to each click sequence. The computer device may, through the group distribution determining module 908, determine the group distribution of the user identifier corresponding to each click sequence according to the topic distribution corresponding to that click sequence, and, through the group determining module 910, determine the group to which each user identifier belongs according to the corresponding group distribution.

In one embodiment, a storage medium storing a computer program is provided. When the computer program is executed by one or more processors, the one or more processors perform the following steps: obtaining the click sequence that corresponds to each user identifier in the training set and records the clicked information identifiers; taking the click sequences and the information identifiers in the click sequences as the documents and words in a topic model respectively, and performing topic model training to obtain the topic distribution corresponding to each click sequence; determining, according to the topic distribution corresponding to each click sequence, the group distribution of the user identifier corresponding to that click sequence; and determining, for each user identifier, the group to which it belongs according to the corresponding group distribution.

In one embodiment, the computer program further causes the processor to perform the following steps: obtaining the user information set; dividing the user information set into a training set and an inference set; training the topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set; and determining the group to which each user identifier in the inference set belongs according to the topic model corresponding to the inference set.

In one embodiment, obtaining the user information set includes: obtaining the operation behavior records that correspond to each candidate user identifier and record operations on the information identifiers; filtering out, from the obtained information identifiers, the information identifiers of invalid clicks according to the operation behavior records; generating the corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier; and obtaining the user information set from the click sequences and the corresponding user identifiers.

In one embodiment, filtering out, from the obtained information identifiers, the information identifiers of invalid clicks according to the operation behavior records includes: obtaining, from the operation behavior records, the operation behavior data corresponding to the information identifiers; screening the obtained operation behavior data for operation behavior data that meets the invalid click judgment condition; and filtering out, from the obtained information identifiers, the information identifiers corresponding to the operation behavior data screened out.

In one embodiment, the operation behavior data includes information reading duration and/or information evaluation data; the invalid click judgment condition includes: the information reading duration is less than or equal to a preset duration threshold; and/or the information evaluation data is negative evaluation data.

In one embodiment, obtaining the user information set includes: obtaining the clicked information identifiers corresponding to each candidate user identifier; filtering out, from the obtained information identifiers, the information identifiers that belong to the popular information class and/or the unpopular information class; generating the corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier; and obtaining the user information set from the click sequences and the corresponding user identifiers.

In one embodiment, the computer program further causes the processor to perform the following steps: querying the number of clicks or click frequency corresponding to each obtained information identifier; classifying information identifiers whose number of clicks or click frequency is greater than or equal to the high-frequency click threshold into the popular information class; and classifying information identifiers whose number of clicks or click frequency is less than or equal to the low-frequency click threshold into the unpopular information class.

In one embodiment, the parameters of the topic model are the probability distribution of topics and information identifiers obtained by training on the training set. Training the topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set includes: obtaining the click sequence corresponding to each user identifier in the inference set; and, according to the probability distribution of topics and information identifiers, taking the click sequence corresponding to each user identifier in the inference set and the information identifiers in the click sequence as the documents and words in the topic model respectively, and performing topic model training.

In one embodiment, selecting, from the groups corresponding to a user identifier, a preset number of groups in descending order of distribution probability as the groups to which that user identifier belongs includes: comparing the distribution probability of each group corresponding to each user identifier with the grouping confidence threshold; filtering out, among the groups corresponding to each user identifier, the groups whose distribution probability is lower than the grouping confidence threshold; and selecting, from the groups remaining after filtering, a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs.

In one embodiment, the computer program further causes the processor to perform the following step: pushing information according to the group to which each user identifier belongs.

In one embodiment, pushing the information corresponding to the information identifiers screened out to the terminal corresponding to the target user identifier includes: querying the user attributes corresponding to the target user identifier; querying the information features corresponding to the information identifiers screened out; scoring the information identifiers screened out according to the user attributes of the target user identifier, the information features, and the preset degree of association between user attributes and information features, obtaining a score value for each; screening the information identifiers again according to their score values; and pushing the information corresponding to the information identifiers that pass this second screening to the terminal corresponding to the target user identifier.

It should be understood that the steps in the embodiments of this application are not necessarily executed in the order indicated by the step numbers. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

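1. A user grouping processing method, the method comprising:

obtaining, for each user identifier in a training set, a click sequence recording the clicked information identifiers;

taking each click sequence and the information identifiers in the click sequence as a document and words, respectively, in a topic model, and performing topic model training to obtain a topic distribution corresponding to each click sequence;

determining, according to the topic distribution corresponding to each click sequence, a group distribution of the user identifier corresponding to that click sequence; and

determining, for each user identifier, the group to which it belongs according to the corresponding group distribution.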
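The steps of claim 1 can be illustrated with a minimal sketch. It assumes latent Dirichlet allocation (LDA) trained by collapsed Gibbs sampling as the topic model; the claim does not name a particular topic model or training algorithm. The function name `train_topic_model`, the hyperparameters `alpha` and `beta`, the number of topics and the sample data are all hypothetical, and the mapping of one topic to one group is an assumption made only for illustration.

```python
import random


def train_topic_model(click_sequences, num_topics, iterations=200,
                      alpha=0.1, beta=0.01, seed=0):
    """Train LDA by collapsed Gibbs sampling (an assumed choice of topic model).

    click_sequences maps a user identifier to its list of clicked information
    identifiers: each sequence is a document, each identifier a word.
    Returns (theta, phi): theta[user] is the topic distribution of that user's
    click sequence; phi[k] maps each identifier to its probability in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for seq in click_sequences.values() for w in seq})
    doc_topic = {u: [0] * num_topics for u in click_sequences}
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignment = {}

    # Give every click a random initial topic and record the counts.
    for user, seq in click_sequences.items():
        assignment[user] = [rng.randrange(num_topics) for _ in seq]
        for word, k in zip(seq, assignment[user]):
            doc_topic[user][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for user, seq in click_sequences.items():
            for i, word in enumerate(seq):
                k = assignment[user][i]
                # Remove the current assignment from the counts.
                doc_topic[user][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[user][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + len(vocab) * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignment[user][i] = k
                doc_topic[user][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    theta = {
        user: [(doc_topic[user][k] + alpha) / (len(seq) + num_topics * alpha)
               for k in range(num_topics)]
        for user, seq in click_sequences.items()
    }
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + len(vocab) * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical training set: user identifier -> click sequence.
training_set = {
    "u1": ["news_1", "news_2", "news_3", "news_1"],
    "u2": ["news_2", "news_3", "news_1"],
    "u3": ["game_1", "game_2", "game_3", "game_1"],
    "u4": ["game_2", "game_3", "game_1"],
}
theta, phi = train_topic_model(training_set, num_topics=2)
# Assumption: one group per topic, so theta doubles as the group distribution.
groups = {user: max(range(2), key=dist.__getitem__)
          for user, dist in theta.items()}
```

Here each user is assigned only the single most probable group; selecting several groups per user is the subject of claims 8 and 9.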

2. The method according to claim 1, characterized in that the method further comprises:

obtaining a user information set;

dividing the user information set into the training set and an inference set;

training a topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set; and

determining, according to the topic model corresponding to the inference set, the group to which each user identifier in the inference set belongs.

3. The method according to claim 2, characterized in that obtaining the user information set comprises:

obtaining the clicked information identifiers corresponding to each candidate user identifier;

obtaining the operation behavior records, recorded for each candidate user identifier, that correspond to the information identifiers;

filtering out, from the obtained information identifiers, the information identifiers of invalid clicks according to the operation behavior records;

generating a corresponding click sequence from the information identifiers remaining after filtering for each candidate user identifier; and

obtaining the user information set from the click sequences and the corresponding user identifiers.

4. The method according to claim 3, characterized in that filtering out, from the obtained information identifiers, the information identifiers of invalid clicks according to the operation behavior records comprises:

obtaining, from the operation behavior records, the operation behavior data corresponding to the obtained information identifiers;

screening, from the obtained operation behavior data, the operation behavior data that satisfies an invalid-click judgment condition; and

filtering out, from the obtained information identifiers, the information identifiers corresponding to the screened operation behavior data.
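The invalid-click filtering of claims 3 and 4 might look like the following sketch. The claims do not define the invalid-click judgment condition; a dwell time below `min_dwell_seconds` is used here purely as an assumed example, and the record fields `info_id` and `dwell_seconds` are hypothetical names.

```python
def filter_invalid_clicks(clicked_ids, behavior_records, min_dwell_seconds=3.0):
    """Return the click sequence with invalid clicks removed.

    clicked_ids: information identifiers clicked by one user, in click order.
    behavior_records: list of dicts with hypothetical keys 'info_id' and
    'dwell_seconds' describing the user's operation behavior.
    """
    wanted = set(clicked_ids)
    # Step 1: keep only the behavior data for the obtained identifiers.
    relevant = [r for r in behavior_records if r["info_id"] in wanted]
    # Step 2: screen the data that meets the (assumed) invalid-click condition.
    invalid = {r["info_id"] for r in relevant
               if r["dwell_seconds"] < min_dwell_seconds}
    # Step 3: drop the corresponding identifiers, preserving click order.
    return [info_id for info_id in clicked_ids if info_id not in invalid]


# Hypothetical data for one candidate user identifier.
clicked = ["a", "b", "c"]
records = [
    {"info_id": "a", "dwell_seconds": 45.0},
    {"info_id": "b", "dwell_seconds": 0.8},   # closed almost immediately
    {"info_id": "c", "dwell_seconds": 12.0},
    {"info_id": "x", "dwell_seconds": 0.1},   # not among the clicked identifiers
]
click_sequence = filter_invalid_clicks(clicked, records)
```

The remaining identifiers form the click sequence that, together with the user identifier, enters the user information set.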

5. The method according to claim 2, characterized in that obtaining the user information set comprises:

filtering out, from the obtained information identifiers, the information identifiers belonging to a popular information class and/or an unpopular information class;

6. The method according to claim 5, characterized in that the method further comprises:

querying the click count or click frequency corresponding to each obtained information identifier;

classifying into the popular information class the information identifiers whose click count or click frequency is greater than or equal to a high-frequency click threshold; and

classifying into the unpopular information class the information identifiers whose click count or click frequency is less than or equal to a low-frequency click threshold.
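The classification in claim 6 and the filtering in claim 5 can be sketched as follows. The threshold values and the sample click counts are hypothetical; the claims only require that a high-frequency and a low-frequency click threshold exist.

```python
def classify_by_click_count(click_counts, high_threshold, low_threshold):
    """Split identifiers into popular and unpopular classes by click count."""
    popular = {i for i, n in click_counts.items() if n >= high_threshold}
    unpopular = {i for i, n in click_counts.items() if n <= low_threshold}
    return popular, unpopular


def filter_popular_and_unpopular(clicked_ids, popular, unpopular):
    """Drop identifiers in either class, preserving click order."""
    return [i for i in clicked_ids if i not in popular and i not in unpopular]


# Hypothetical click counts per information identifier.
click_counts = {"hot": 1000, "mid_a": 40, "mid_b": 25, "rare": 1}
popular, unpopular = classify_by_click_count(
    click_counts, high_threshold=500, low_threshold=2)
filtered = filter_popular_and_unpopular(
    ["hot", "mid_a", "rare", "mid_b"], popular, unpopular)
```

Identifiers clicked by nearly everyone or by almost no one carry little information about which group a user belongs to, which is a plausible reason for removing them before topic model training.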

7. The method according to claim 2, characterized in that the parameters of the topic model are the probability distribution of topics and information identifiers obtained by training on the training set;

training the topic model corresponding to the inference set according to the inference set and the parameters of the topic model obtained by training on the training set comprises:

obtaining the click sequence corresponding to each user identifier in the inference set; and

performing topic model training according to the probability distribution of topics and information identifiers, taking the click sequence corresponding to each user identifier in the inference set and the information identifiers in that click sequence as a document and words, respectively, in the topic model.
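One way to read claim 7 is that the topic-to-identifier probabilities learned from the training set are held fixed while a topic distribution is estimated for each click sequence in the inference set. The sketch below does this with a simple expectation-maximization style fold-in; this procedure, the function name `infer_topic_distribution`, the prior `alpha` and the sample probabilities are assumptions, not the method stated in the claim.

```python
def infer_topic_distribution(click_sequence, phi, iterations=50, alpha=0.1):
    """Estimate a topic distribution for one click sequence with phi fixed.

    phi[k] maps an information identifier to its probability under topic k,
    as obtained from the training set. Identifiers unknown to every topic
    are ignored.
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iterations):
        counts = [alpha] * num_topics  # smoothing keeps every topic non-zero
        for word in click_sequence:
            # Responsibility of each topic for this click.
            resp = [theta[k] * phi[k].get(word, 0.0) for k in range(num_topics)]
            total = sum(resp)
            if total == 0.0:
                continue
            for k in range(num_topics):
                counts[k] += resp[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


# Hypothetical topic-to-identifier probabilities from the training set.
phi = [
    {"news_1": 0.5, "news_2": 0.4, "game_1": 0.05, "game_2": 0.05},
    {"news_1": 0.05, "news_2": 0.05, "game_1": 0.5, "game_2": 0.4},
]
inferred = infer_topic_distribution(["news_1", "news_2", "news_1"], phi)
```

Because the training-set parameters are reused, users in the inference set are described in terms of the same topics, and therefore the same groups, as users in the training set.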

8. The method according to claim 1, characterized in that determining, for each user identifier, the group to which it belongs according to the corresponding group distribution comprises:

determining the distribution probability of each group corresponding to each user identifier in the corresponding group distribution; and

selecting, from the groups corresponding to the user identifier, a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs.

9. The method according to claim 8, characterized in that selecting, from the groups corresponding to the user identifier, a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs comprises:

comparing the distribution probability of each group corresponding to each user identifier with a grouping confidence threshold;

filtering out, from the groups corresponding to each user identifier, the groups whose distribution probability is lower than the grouping confidence threshold; and

selecting, from the groups remaining after filtering, a preset number of groups in descending order of distribution probability as the groups to which the corresponding user identifier belongs.
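The selection described in claims 8 and 9 reduces to a threshold filter followed by a top-k cut. A minimal sketch, with a hypothetical function name and illustrative values for the confidence threshold and the preset number of groups:

```python
def select_groups(group_distribution, confidence_threshold, max_groups):
    """Pick the groups a user belongs to from that user's group distribution.

    group_distribution: list where index g holds the probability of group g.
    Groups below the confidence threshold are dropped; of the rest, at most
    max_groups are kept, in descending order of probability.
    """
    kept = [(g, p) for g, p in enumerate(group_distribution)
            if p >= confidence_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [g for g, _ in kept[:max_groups]]


# Hypothetical group distribution for one user identifier.
distribution = [0.10, 0.55, 0.05, 0.30]
selected = select_groups(distribution, confidence_threshold=0.2, max_groups=2)
strict = select_groups(distribution, confidence_threshold=0.4, max_groups=2)
```

With a higher confidence threshold a user may end up in fewer groups than the preset number, or in none at all.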

10. The method according to any one of claims 1 to 9, characterized in that the method further comprises:

pushing information according to the group to which each user identifier belongs.

11. The method according to claim 10, characterized in that pushing information according to the group to which each user identifier belongs comprises:

obtaining non-target user identifiers in the group to which a target user identifier belongs;

obtaining the information identifiers clicked within a preset time range that correspond to the non-target user identifiers;

obtaining the operation behavior records, recorded for the non-target user identifiers, that correspond to the information identifiers;

screening the obtained information identifiers according to the operation behavior records; and

pushing, to the terminal corresponding to the target user identifier, the information corresponding to the screened information identifiers.
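The candidate collection in claim 11 can be sketched as below. The claim does not say how the operation behavior records are used for screening; a minimum dwell time is assumed here, and the record fields `info_id`, `timestamp` and `dwell_seconds`, like the function name, are hypothetical.

```python
def collect_push_candidates(target_user, group_members, clicks, now,
                            window_seconds, min_dwell_seconds=3.0):
    """Collect identifiers recently and validly clicked by other group members.

    clicks maps a user identifier to a list of dicts with hypothetical keys
    'info_id', 'timestamp' (seconds) and 'dwell_seconds'.
    """
    candidates, seen = [], set()
    for user in group_members:
        if user == target_user:
            continue  # only non-target user identifiers contribute
        for click in clicks.get(user, []):
            recent = now - click["timestamp"] <= window_seconds
            valid = click["dwell_seconds"] >= min_dwell_seconds  # assumed rule
            if recent and valid and click["info_id"] not in seen:
                seen.add(click["info_id"])
                candidates.append(click["info_id"])
    return candidates


# Hypothetical group containing the target user u1 and two other members.
clicks = {
    "u1": [{"info_id": "z", "timestamp": 990, "dwell_seconds": 60}],
    "u2": [
        {"info_id": "a", "timestamp": 900, "dwell_seconds": 30},
        {"info_id": "b", "timestamp": 100, "dwell_seconds": 30},  # too old
        {"info_id": "c", "timestamp": 950, "dwell_seconds": 1},   # too short
    ],
    "u3": [
        {"info_id": "a", "timestamp": 900, "dwell_seconds": 20},  # duplicate
        {"info_id": "d", "timestamp": 990, "dwell_seconds": 12},
    ],
}
candidates = collect_push_candidates(
    "u1", ["u1", "u2", "u3"], clicks, now=1000, window_seconds=600)
```

The resulting identifiers are the ones whose information would be pushed to the terminal of the target user identifier.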

12. The method according to claim 11, characterized in that pushing, to the terminal corresponding to the target user identifier, the information corresponding to the screened information identifiers comprises:

querying the user attributes corresponding to the target user identifier;

querying the information features corresponding to the screened information identifiers;

scoring, according to the user attributes of the target user identifier, the information features, and preset degrees of association between user attributes and information features, to obtain a score corresponding to each screened information identifier;

screening the screened information identifiers again according to the corresponding scores; and

pushing, to the terminal corresponding to the target user identifier, the information corresponding to the information identifiers obtained by the second screening.
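The scoring in claim 12 can be sketched as a lookup in a table of preset association degrees. Summing the degrees over all attribute and feature pairs, and keeping the `top_n` highest scores, are assumptions; the claim only requires that the score depend on the user attributes, the information features and their preset association. All names and values below are hypothetical.

```python
def score_information(user_attributes, info_features, association):
    """Sum the preset association degrees over attribute/feature pairs."""
    return sum(association.get((attr, feat), 0.0)
               for attr in user_attributes for feat in info_features)


def rescreen(candidates, user_attributes, features_by_id, association, top_n):
    """Keep the top_n candidate identifiers by score, highest first."""
    scored = [(score_information(user_attributes, features_by_id[i],
                                 association), i) for i in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [info_id for _, info_id in scored[:top_n]]


# Hypothetical preset association degrees between attributes and features.
association = {
    ("interest:sports", "tag:football"): 2.0,
    ("age:18-24", "tag:football"): 0.5,
    ("age:18-24", "tag:games"): 1.5,
}
user_attributes = ["interest:sports", "age:18-24"]
features_by_id = {
    "i1": ["tag:football"],
    "i2": ["tag:finance"],
    "i3": ["tag:games"],
}
to_push = rescreen(["i1", "i2", "i3"], user_attributes, features_by_id,
                   association, top_n=2)
```

This second screening personalizes the push within a group: two users in the same group may receive different items if their attributes differ.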

13. A user grouping processing apparatus, characterized in that the apparatus comprises:

a click sequence acquisition module, configured to obtain, for each user identifier in a training set, a click sequence recording the clicked information identifiers;

a topic distribution determining module, configured to take each click sequence and the information identifiers in the click sequence as a document and words, respectively, in a topic model, and to perform topic model training to obtain a topic distribution corresponding to each click sequence;

a group distribution determining module, configured to determine, according to the topic distribution corresponding to each click sequence, a group distribution of the user identifier corresponding to that click sequence; and

a group determining module, configured to determine, for each user identifier, the group to which it belongs according to the corresponding group distribution.

14. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 12.

15. A storage medium storing a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps of the method according to any one of claims 1 to 12.