CN110516033A - Method and apparatus for calculating user preference - Google Patents

Method and apparatus for calculating user preference

Info

Publication number
CN110516033A
CN110516033A
Authority
CN
China
Prior art keywords
score
text
article classification
article
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810420441.5A
Other languages
Chinese (zh)
Inventor
刘继宇
邵荣防
郝晖
谢群群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201810420441.5A priority Critical patent/CN110516033A/en
Publication of CN110516033A publication Critical patent/CN110516033A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a method and apparatus for calculating user preference, and relates to the field of computer technology. One embodiment of the method includes: obtaining log data generated while a user searches for items, and extracting feature text from the log data; processing the feature text with a language model to obtain text feature data, and classifying the text feature data to obtain a semantic similarity score for each item category; constructing, from the feature text, a keyword vector for each item category, and processing the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category; computing a weighted sum of the evaluation scores of each item category as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and determining the user preference based on the user preference scores. This embodiment improves the accuracy of the computed user preference and better meets user needs.

Description

Method and apparatus for calculating user preference
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for calculating user preference.
Background
In the Internet industry, a user's interest preferences are generally estimated from the log data the user generates while searching, so that recommendations meeting the user's needs can be provided and the user experience improved. Most existing methods that implement this process rely on prior knowledge and mine the user's interests with rule-based approaches.
In the course of implementing the present invention, the inventors found at least the following problems in the prior art:
The prior art usually focuses only on the preferences a user explicitly shows for particular items and fails to capture the user's preference for item categories. As a result, the recommendation results are not comprehensive or accurate enough, and it is difficult to meet the user's needs.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for calculating user preference, which can improve the accuracy of the computed user preference and better meet user needs.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for calculating user preference is provided, comprising:
obtaining log data generated while a user searches for items, and extracting feature text from the log data;
processing the feature text with a language model to obtain text feature data, and classifying the text feature data to obtain a semantic similarity score for each item category;
constructing, from the feature text, a keyword vector for each item category, and processing the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
computing a weighted sum of the evaluation scores of each item category and taking the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
determining the user preference based on the user preference scores.
Optionally, the log data includes click log data, and the evaluation scores further include a click preference score; before the step of computing a weighted sum of the evaluation scores of each item category to obtain that category's user preference score, the method further includes:
determining the number of clicks on each item title according to the click log data;
determining the number of clicks on each item category according to the number of clicks on each item title and the correspondence between item titles and item categories; and
dividing the number of clicks on each item category by the total number of clicks across all item categories to obtain the click preference score of each item category.
Optionally, the step of processing the feature text with a language model to obtain text feature data and classifying the text feature data to obtain the semantic similarity score of each item category includes:
processing the feature text with at least two language models to obtain at least two sets of text feature data; inputting each set of text feature data into the pre-trained similarity classification model corresponding to its language model, to obtain each language-model score of each item category; and fusing the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category.
Optionally, before the step of inputting each set of text feature data into the pre-trained similarity classification model corresponding to its language model to obtain each language-model score of each item category, the method further includes:
obtaining log data and click log data generated by multiple other users while searching for items;
performing word segmentation on the log data and click log data of the multiple other users to obtain training text;
processing the training text with one of the language models to obtain training feature data;
training a classification model on the training feature data with the item categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
performing the above steps for each language model, to obtain the pre-trained similarity classification model corresponding to each language model.
Optionally, the step of fusing the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category includes:
multiplying the language-model scores of one item category pairwise to obtain nonlinear transformation feature values;
inputting the language-model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that item category; and
performing the above steps for each item category to obtain the semantic similarity score of each item category.
Optionally, the language models include:
GloVe, whose text feature data are the word vectors of the keywords in the feature text;
N-Gram, whose text feature data are the context information of the keywords in the feature text;
Doc2vec, whose text feature data are the document vector of the feature text.
Optionally, the step of constructing a keyword vector for each item category from the feature text includes:
counting the number of occurrences of the keywords under one item category;
taking the N keywords with the highest number of occurrences and using them as elements to construct the N-dimensional keyword vector of that item category; and
performing the above process for each item category to obtain the keyword vector of each item category.
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for calculating user preference is provided, comprising:
a feature extraction module, configured to obtain log data generated while a user searches for items and extract feature text from the log data;
a similarity score calculation module, configured to process the feature text with a language model to obtain text feature data, and classify the text feature data to obtain a semantic similarity score for each item category;
a relevance score calculation module, configured to construct, from the feature text, a keyword vector for each item category, and process the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
a comprehensive calculation module, configured to compute a weighted sum of the evaluation scores of each item category and take the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
a preference determination module, configured to determine the user preference based on the user preference scores.
Optionally, the log data includes click log data, and the evaluation scores further include a click preference score; the apparatus further includes:
a click preference score calculation module, configured to determine the number of clicks on each item title according to the click log data; determine the number of clicks on each item category according to the number of clicks on each item title and the correspondence between item titles and item categories; and divide the number of clicks on each item category by the total number of clicks across all item categories to obtain the click preference score of each item category.
Optionally, the similarity score calculation module is further configured to:
process the feature text with at least two language models to obtain at least two sets of text feature data; input each set of text feature data into the pre-trained similarity classification model corresponding to its language model, to obtain each language-model score of each item category; and fuse the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category.
Optionally, the similarity score calculation module is further configured to:
obtain log data and click log data generated by multiple other users while searching for items;
perform word segmentation on the log data and click log data of the multiple other users to obtain training text;
process the training text with one of the language models to obtain training feature data;
train a classification model on the training feature data with the item categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
perform the above steps for each language model, to obtain the pre-trained similarity classification model corresponding to each language model.
Optionally, the similarity score calculation module is further configured to:
multiply the language-model scores of one item category pairwise to obtain nonlinear transformation feature values;
input the language-model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that item category; and
perform the above steps for each item category to obtain the semantic similarity score of each item category.
Optionally, the language models include:
GloVe, whose text feature data are the word vectors of the keywords in the feature text;
N-Gram, whose text feature data are the context information of the keywords in the feature text;
Doc2vec, whose text feature data are the document vector of the feature text.
Optionally, the relevance score calculation module is further configured to:
count the number of occurrences of the keywords under one item category;
take the N keywords with the highest number of occurrences and use them as elements to construct the N-dimensional keyword vector of that item category; and
perform the above process for each item category to obtain the keyword vector of each item category.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device for calculating user preference is provided, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors at least implement:
obtaining log data generated while a user searches for items, and extracting feature text from the log data;
processing the feature text with a language model to obtain text feature data, and classifying the text feature data to obtain a semantic similarity score for each item category;
constructing, from the feature text, a keyword vector for each item category, and processing the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
computing a weighted sum of the evaluation scores of each item category and taking the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
determining the user preference based on the user preference scores.
To achieve the above object, according to still another aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored, and the program, when executed by a processor, at least implements:
obtaining log data generated while a user searches for items, and extracting feature text from the log data;
processing the feature text with a language model to obtain text feature data, and classifying the text feature data to obtain a semantic similarity score for each item category;
constructing, from the feature text, a keyword vector for each item category, and processing the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
computing a weighted sum of the evaluation scores of each item category and taking the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
determining the user preference based on the user preference scores.
One of the above embodiments has the following advantage or beneficial effect: based on big data and machine learning techniques, and taking the user's search logs and click logs as the basis, the similarity between the text semantics and the user preference, the association between the keywords contained in the text and the user preference, and other factors are comprehensively quantified to obtain comparable evaluation scores, which are then combined by weighted summation into the user preference score. Compared with the prior art, this method can fully mine the user's preference for certain categories of goods, and because multiple factors are combined, the calculation result is more accurate and closer to the user's real intention.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on the present invention. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a method for calculating user preference according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main modules of an apparatus for calculating user preference according to an embodiment of the present invention;
Fig. 3 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments that assist understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Fig. 1 is a schematic diagram of the main steps of a method for calculating user preference according to an embodiment of the present invention.
As shown in Fig. 1, the method for calculating user preference provided by the embodiment of the present invention comprises:
S100: obtain log data generated while the user searches for items, and extract feature text from the log data. The log data are the words, terms, or phrases that appear while the user interacts during an item search; for example, the log data may include search keywords entered by the user, and may also include search results selected by the user through clicking or similar actions. After the log data are obtained, the text in the log data is segmented with a word segmentation tool such as jieba to obtain the feature text. Before segmentation, the obtained data can be cleaned, for example by removing data without a user name (i.e., data generated by anonymous users, which can be considered of lower reference value than the data of logged-in users), data whose source cannot be determined (such as incomplete data caused by data loss), and blacklisted data (for example, data generated from a physical address or Internet Protocol address that has been added to a blacklist).
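For illustration, a minimal sketch of this cleaning and segmentation step in Python follows; it assumes log records carrying user_id, ip, and text fields and uses the jieba segmenter mentioned above. The field names, blacklist, and stop-word list are assumptions made for the example only, not part of the disclosure.

# -*- coding: utf-8 -*-
import jieba

BLACKLISTED_IPS = {"10.0.0.99"}          # hypothetical blacklist
STOP_WORDS = {"的", "了", "和"}           # hypothetical stop-word list

def clean_logs(log_records):
    """Keep only records with a user id, a known source, and a non-blacklisted IP."""
    valid = []
    for rec in log_records:
        if not rec.get("user_id"):               # drop data from anonymous users
            continue
        if not rec.get("text"):                  # drop incomplete records
            continue
        if rec.get("ip") in BLACKLISTED_IPS:     # drop blacklisted addresses
            continue
        valid.append(rec)
    return valid

def extract_feature_text(log_records):
    """Segment the log text with jieba and drop stop words, yielding keyword lists."""
    feature_text = []
    for rec in clean_logs(log_records):
        words = [w for w in jieba.lcut(rec["text"]) if w.strip() and w not in STOP_WORDS]
        feature_text.append(words)
    return feature_text

if __name__ == "__main__":
    logs = [{"user_id": "u1", "ip": "1.2.3.4", "text": "婴儿奶粉二段"},
            {"user_id": None, "ip": "1.2.3.5", "text": "手机"}]
    print(extract_feature_text(logs))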
S101: process the feature text with a language model to obtain text feature data, and classify the text feature data to obtain a semantic similarity score for each item category. Before the feature text is processed with the language model, stop-word processing can also be applied to it, removing words such as adverbs, prepositions, conjunctions, and interjections that contribute little information to the text; the words remaining in the feature text after stop-word processing are called keywords. A language model is an abstract mathematical model of language built on linguistic facts; it can broadly be regarded as a kind of correspondence, with different language models establishing the correspondence between text content and mathematical representations from different angles. For example, the word2vec model mainly predicts the words surrounding each keyword and tends to represent the semantic information of a word within the whole passage; the GloVe (Global Vectors for Word Representation) model represents semantic information at the keyword level; and N-Gram represents the context information of a keyword, i.e., the semantics of the words before and after it. These language models can be used individually or in combination to reflect the similarity between semantics and user preference more comprehensively, as explained in the subsequent embodiments.
In this step, the feature text is processed with a language model to obtain text feature data representing the semantics of the feature text, and the text feature data are then classified to obtain the correspondence between each item category and the user preference (the semantic similarity score expresses the user's preference for each category). In addition, the "semantic similarity" in the "semantic similarity score" appearing in this step refers to the degree of similarity between the user preference and the semantics of the feature text.
S102: construct, from the feature text, a keyword vector for each item category, and process the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category. A keyword vector is a vector whose elements are keywords; the "semantic relevance" in this step refers to the degree of association between the semantics carried by the keywords and the user preference. For example, a keyword vector can be constructed as follows: count the number of occurrences of the keywords under one item category; take the N keywords with the highest number of occurrences (N is a natural number, e.g., 50, 70, or 100) and use them as elements to construct the N-dimensional keyword vector of that item category; and perform the above process for each item category to obtain the keyword vector of each item category.
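A minimal sketch of this keyword-vector construction for a single item category follows; padding short vectors to length N is an assumption added for the example.

from collections import Counter

def build_keyword_vector(category_keywords, n=100):
    """Return the N most frequent keywords of a category as an N-dimensional keyword vector."""
    counts = Counter(category_keywords)
    top_n = [word for word, _ in counts.most_common(n)]
    top_n += [""] * (n - len(top_n))          # pad so every category vector has length N
    return top_n

if __name__ == "__main__":
    keywords = ["奶粉", "奶瓶", "奶粉", "辅食", "奶粉", "玩具"]
    print(build_keyword_vector(keywords, n=3))   # ['奶粉', '奶瓶', '辅食']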
S103: compute a weighted sum of the evaluation scores of each item category, and take the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score. The specific weight of each evaluation score can be determined according to the actual working environment.
S104: determine the user preference based on the user preference scores. Typically, the item categories can be ranked by user preference score from high to low, and the user is considered to prefer the goods in the higher-ranked categories.
Several pre-trained models appear in this embodiment and in the subsequent embodiments. In practice, a pre-trained model can be obtained by collecting the log data of multiple other users and training the corresponding model on those log data. The training process of some key models is described in detail in the subsequent embodiments.
In summary, the method provided by this embodiment is based on big data and machine learning techniques. Taking the user's search logs and click logs as the basis, it comprehensively quantifies the similarity between the text semantics and the user preference, the association between the keywords contained in the text and the user preference, and other factors, to obtain comparable evaluation scores, which are then combined by weighted summation into the user preference score. The method of this embodiment calculates the user's preference at the level of item categories: for example, if a user searches for milk powder, a prior-art method would push milk powder of various brands, whereas the method of this embodiment can also push infant daily necessities such as baby toys, nursing products, and supplementary food. It can be seen that, compared with the prior art, this method can fully mine the user's preference for certain categories of goods, and because multiple factors are combined, the calculation result is more accurate and closer to the user's real intention.
In some alternative embodiments, the log data includes click log data, and the evaluation scores further include a click preference score. In S103, before the step of computing a weighted sum of the evaluation scores of each item category to obtain that category's user preference score, the method further includes:
determining the number of clicks on each item title according to the click log data; determining the number of clicks on each item category according to the number of clicks on each item title and the correspondence between item titles and item categories; and dividing the number of clicks on each item category by the total number of clicks across all item categories to obtain the click preference score of each item category.
This step provides the calculation method of another evaluation score, the click preference score, which describes which kinds of search results the user tends to click and thereby reflects the user's preference.
In some alternative embodiments, the step S101 of processing the feature text with a language model to obtain text feature data and classifying the text feature data to obtain the semantic similarity score of each item category includes:
processing the feature text with at least two language models to obtain at least two sets of text feature data; inputting each set of text feature data into the pre-trained similarity classification model corresponding to its language model, to obtain each language-model score of each item category; and fusing the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category. The text feature data obtained differ somewhat depending on the language model used: with the GloVe model, the text feature data are the word vectors of the keywords in the feature text; with the N-Gram model, the text feature data are the context information of the keywords in the feature text; and with the Doc2vec model, the text feature data are the document vector of the feature text.
In this embodiment, before the step of inputting each set of text feature data into the pre-trained similarity classification model corresponding to its language model to obtain each language-model score of each item category, the method may further include:
obtaining log data and click log data generated by multiple other users while searching for items;
performing word segmentation on the log data and click log data of the multiple other users to obtain training text;
processing the training text with one of the language models to obtain training feature data;
training a classification model on the training feature data with the item categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
performing the above steps for each language model, to obtain the pre-trained similarity classification model corresponding to each language model.
The above steps describe how the similarity classification models are trained (a sketch is given below). The training does not need to be executed while calculating the preference of a single user; it can be executed periodically when the server load is low, so that an up-to-date classification model is obtained in time from the analysis of a large number of users' data.
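The following sketch illustrates this periodic training step, under the assumption that each language model is represented by a callable turning segmented text into a fixed-length numeric vector; the toy featurizers, corpus, and labels are placeholders, and GBDT is used as the classifier, as in the e-commerce embodiment described later.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_similarity_classifiers(language_models, training_texts, category_labels):
    """Train one GBDT similarity classifier per language model, labelled by item category."""
    classifiers = {}
    for name, vectorize in language_models.items():
        X = np.vstack([vectorize(text) for text in training_texts])
        clf = GradientBoostingClassifier()
        clf.fit(X, category_labels)
        classifiers[name] = clf
    return classifiers

if __name__ == "__main__":
    # toy stand-ins for real language-model featurizers (deterministic pseudo-vectors)
    seed = lambda words: sum(map(ord, "".join(words))) % 1000
    fake_glove = lambda words: np.random.RandomState(seed(words)).rand(8)
    fake_doc2vec = lambda words: np.random.RandomState(seed(words) + 1).rand(8)
    texts = [["奶粉", "奶瓶"], ["手机", "耳机"], ["奶粉", "辅食"], ["手机", "充电器"]]
    labels = ["baby", "electronics", "baby", "electronics"]
    models = train_similarity_classifiers(
        {"glove": fake_glove, "doc2vec": fake_doc2vec}, texts, labels)
    print(sorted(models.keys()))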
In this embodiment, the step of fusing the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category includes:
multiplying the language-model scores of one item category pairwise to obtain nonlinear transformation feature values;
inputting the language-model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that item category; and
performing the above steps for each item category to obtain the semantic similarity score of each item category.
In the above steps, before the fusion is computed, the language-model scores are first multiplied pairwise to obtain nonlinear transformation feature values, and the nonlinear transformation feature values and the language-model scores are then input together into the fusion model for classification. This addresses collinearity in the classification process: the variables in this embodiment are correlated to some extent, and strong collinearity between them (which effectively inflates the weight of some variables) would degrade the precision of the classification results. This embodiment therefore applies a nonlinear transformation to the language-model scores, so that the final semantic similarity score is more accurate.
The application of the above method is further described below through an embodiment applied to an e-commerce scenario. The steps of this embodiment mainly include the following parts:
1. Data cleaning
a) Input data: log data
The log data generated by users using the search engine on the e-commerce website and on each client, for example search log data containing the search keywords entered by users and click log data containing the names of products clicked by users.
b) Output data: valid log data
After this step cleans the log data, data without a user ID, data whose source cannot be determined, and data from blacklisted IP addresses are removed; the remaining data are regarded as valid log data.
2. Category preference calculation
This step mainly includes four sub-steps:
(1) Click preference score calculation
The user's click preference score Score_i is calculated by the following formula:
Score_i = click_times_i / (click_times_1 + click_times_2 + ... + click_times_m)
where click_times_i is the number of times the user clicked the i-th category, m is the total number of categories, the denominator is the user's total number of clicks, and Score_i indicates the user's preference for category i.
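A minimal sketch of this calculation follows; the category names are illustrative.

from collections import Counter

def click_preference_scores(clicked_categories):
    """Map each item category to click_times_i / total clicks for this user."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    return {category: times / total for category, times in counts.items()}

if __name__ == "__main__":
    clicks = ["baby", "baby", "electronics", "baby", "books"]
    print(click_preference_scores(clicks))   # {'baby': 0.6, 'electronics': 0.2, 'books': 0.2}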
(2) Semantic similarity score calculation
The valid log data are segmented (using the jieba segmenter) and stop words are removed (auxiliary words, punctuation marks, and the like are filtered out) to obtain the feature text; the feature text consists of keywords.
a) Score calculation based on the GloVe model
A GloVe model is trained on feature text obtained and processed from a large number of other users, yielding a trained GloVe model. The feature text of these other users is then input into the trained GloVe model, which outputs word vectors that constitute the training data. A GBDT (Gradient Boosting Decision Tree) model is trained on the training data, yielding a trained classification model corresponding to the GloVe model.
The feature text of the current user is input into the trained GloVe model, which outputs word vectors that constitute the text feature data. The text feature data are input into the trained classification model corresponding to the GloVe model, which outputs the GloVe model score Score_glove|j.
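A sketch of this sub-step follows. It assumes pre-trained GloVe vectors are available as a plain word-to-vector mapping and that the feature text is represented by the average of its keywords' vectors before being fed to the GBDT classifier; the averaging and the toy data are assumptions made for the example, since the disclosure does not fix how the word vectors are aggregated.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def average_glove_vector(keywords, glove_vectors, dim):
    """Represent the feature text by the mean of its keywords' GloVe vectors."""
    vecs = [glove_vectors[w] for w in keywords if w in glove_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def glove_scores(keywords, glove_vectors, dim, glove_classifier):
    """Return Score_glove|j for every item category j via the classifier's probabilities."""
    x = average_glove_vector(keywords, glove_vectors, dim).reshape(1, -1)
    probs = glove_classifier.predict_proba(x)[0]
    return dict(zip(glove_classifier.classes_, probs))

if __name__ == "__main__":
    dim = 4
    rng = np.random.RandomState(0)
    glove = {w: rng.rand(dim) for w in ["奶粉", "奶瓶", "手机", "耳机"]}   # toy vectors
    train_docs = [["奶粉", "奶瓶"], ["手机", "耳机"], ["奶粉"], ["耳机"]]
    train_labels = ["baby", "electronics", "baby", "electronics"]
    train_X = np.vstack([average_glove_vector(t, glove, dim) for t in train_docs])
    clf = GradientBoostingClassifier().fit(train_X, train_labels)
    print(glove_scores(["奶粉"], glove, dim, clf))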
b) Score calculation based on the N-Gram model
Feature text is obtained and processed from a large number of other users, and its Bi-Gram features are extracted (Bi-Gram is the case of N-Gram in which the occurrence of a word depends only on the word immediately preceding it), constituting the training data. A GBDT model is trained on the training data, yielding a trained classification model corresponding to the N-Gram model.
The Bi-Gram features of the current user's feature text are extracted, constituting the text feature data. The text feature data are input into the trained classification model corresponding to the N-Gram model, which outputs the N-Gram model score Score_n-gram|j.
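A sketch of this sub-step follows, using scikit-learn's CountVectorizer restricted to bigrams as the Bi-Gram feature extractor; the corpus and category labels are toy assumptions, and the disclosure does not prescribe a particular Bi-Gram implementation.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# segmented feature text joined with spaces, one string per user
train_texts = ["奶粉 奶瓶 辅食", "手机 耳机 充电器", "奶粉 玩具 辅食", "手机 平板 耳机"]
train_labels = ["baby", "electronics", "baby", "electronics"]

bigram = CountVectorizer(ngram_range=(2, 2), token_pattern=r"\S+")
X_train = bigram.fit_transform(train_texts)
clf = GradientBoostingClassifier().fit(X_train.toarray(), train_labels)

user_text = ["奶粉 辅食 奶瓶"]
X_user = bigram.transform(user_text).toarray()
# Score_n-gram|j for each category j, read from the classifier's class probabilities
print(dict(zip(clf.classes_, clf.predict_proba(X_user)[0])))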
c) Score calculation based on the Doc2vec model
A Doc2vec model is trained on feature text obtained and processed from a large number of other users, yielding a trained Doc2vec model. The feature text of these other users is then input into the trained Doc2vec model, which outputs document vectors that constitute the training data. A GBDT model is trained on the training data, yielding a trained classification model corresponding to the Doc2vec model.
The feature text of the current user is input into the trained Doc2vec model, which outputs a document vector that constitutes the text feature data. The text feature data are input into the trained classification model corresponding to the Doc2vec model, which outputs the Doc2vec model score Score_doc2vec|j.
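A sketch of this sub-step follows, using gensim's Doc2Vec for the document vectors and a GBDT classifier labelled by item category; the corpus, labels, and hyper-parameters are illustrative assumptions.

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import GradientBoostingClassifier

corpus = [["奶粉", "奶瓶", "辅食"], ["手机", "耳机", "充电器"],
          ["奶粉", "玩具"], ["手机", "平板"]]
labels = ["baby", "electronics", "baby", "electronics"]

# train Doc2vec on the other users' feature text
tagged = [TaggedDocument(words, [i]) for i, words in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=40)

# document vectors form the training data for the Doc2vec similarity classifier
X_train = np.vstack([d2v.infer_vector(words) for words in corpus])
clf = GradientBoostingClassifier().fit(X_train, labels)

# infer the current user's document vector and obtain Score_doc2vec|j per category
user_vec = d2v.infer_vector(["奶粉", "辅食"]).reshape(1, -1)
print(dict(zip(clf.classes_, clf.predict_proba(user_vec)[0])))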
d) Model fusion
The model scores output by the GloVe, N-Gram, and Doc2vec models above, together with the pairwise products of these model scores, are used as features and input into the fusion model, which outputs the final score Score_j. The fusion model is based on an SVM (Support Vector Machine) model or an LR (Logistic Regression) model and is trained on the corresponding data of a large number of other users.
In the above models, where a label is required for training and classification, the commodity category information is used as the label.
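A sketch of the fusion follows, using logistic regression as the fusion model (an SVM could be substituted) over the three language-model scores and their pairwise products; the training scores and relevance labels are random stand-ins, not data from the disclosure.

import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fusion_features(scores):
    """Concatenate language-model scores with their pairwise products (nonlinear terms)."""
    scores = np.asarray(scores, dtype=float)
    products = [scores[i] * scores[j] for i, j in combinations(range(len(scores)), 2)]
    return np.concatenate([scores, products])

# hypothetical training data: rows of (GloVe, N-Gram, Doc2vec) scores for one category,
# with 1 = the category matches the user's preference, 0 = it does not
rng = np.random.RandomState(0)
raw = rng.rand(200, 3)
y = (raw.mean(axis=1) > 0.5).astype(int)
X = np.vstack([fusion_features(row) for row in raw])

fusion = LogisticRegression().fit(X, y)

# final semantic similarity score Score_j for a category with the given three model scores
x_new = fusion_features([0.8, 0.6, 0.7]).reshape(1, -1)
print(fusion.predict_proba(x_new)[0, 1])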
(3) Semantic relevance score calculation
The valid log data are segmented (using the jieba segmenter) and stop words are removed (auxiliary words, punctuation marks, and the like are filtered out) to obtain the feature text; the feature text consists of keywords.
The keywords in the feature text obtained from a large number of other users are analysed for association (for example, a correspondence between keywords and categories can be saved in advance). For each category, the 100 keywords with the highest occurrence frequency under that category are obtained (100 is merely illustrative; the specific number can be determined according to the actual scenario), and these keywords are used as the elements of a vector, constituting the keyword vector of each category, which serves as the training data. A GBDT classification model is trained on the training data, yielding the trained relevance classification model.
The keywords in the current user's feature text are analysed for association in the same way, constituting the user's keyword vector for each category; these are input into the trained relevance classification model, which outputs the user's semantic relevance score score_k for each category.
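The disclosure does not spell out how the keyword vectors are encoded numerically for the GBDT classifier, so the sketch below makes the assumption of counting how often each of a category's top keywords appears in a user's feature text; the training users and relevance labels are likewise illustrative.

from collections import Counter
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def keyword_count_vector(user_keywords, category_top_keywords):
    """Counts of the category's top keywords inside this user's feature text."""
    counts = Counter(user_keywords)
    return np.array([counts[w] for w in category_top_keywords], dtype=float)

category_top = ["奶粉", "奶瓶", "辅食", "玩具"]       # hypothetical top keywords of one category

# toy training data from other users: count vectors and whether they showed interest
train_users = [["奶粉", "奶瓶", "奶粉"], ["手机", "耳机"], ["辅食", "玩具"], ["手机", "充电器"]]
train_labels = [1, 0, 1, 0]
X = np.vstack([keyword_count_vector(u, category_top) for u in train_users])
relevance_clf = GradientBoostingClassifier().fit(X, train_labels)

# score_k for this user and this category
x_user = keyword_count_vector(["奶粉", "辅食"], category_top).reshape(1, -1)
print(relevance_clf.predict_proba(x_user)[0, 1])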
(4) User preference score calculation
Finally, the user's overall score S is output:
S = 0.4*score_i + 0.4*score_j + 0.2*score_k
where score_i is the user's click-behaviour category preference, score_j is the user's semantic similarity, and score_k is the user's category relevance. The weights 0.4, 0.4, and 0.2 are chosen empirically and were verified experimentally to be optimal for this scenario (the e-commerce scenario); they can be redefined as needed in actual use.
After the user's preference score for each category has been obtained, the categories can be ranked by score, and the goods in the commodity categories the user cares about most can be recommended to the user according to the ranking.
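A minimal sketch of this weighted combination and ranking follows, using the 0.4/0.4/0.2 weights given above; the per-category score values are illustrative.

def user_preference_scores(click, similarity, relevance, weights=(0.4, 0.4, 0.2)):
    """S = 0.4*score_i + 0.4*score_j + 0.2*score_k for every item category."""
    w_click, w_sim, w_rel = weights
    categories = set(click) | set(similarity) | set(relevance)
    return {c: w_click * click.get(c, 0.0)
               + w_sim * similarity.get(c, 0.0)
               + w_rel * relevance.get(c, 0.0)
            for c in categories}

if __name__ == "__main__":
    scores = user_preference_scores(
        click={"baby": 0.6, "electronics": 0.2, "books": 0.2},
        similarity={"baby": 0.7, "electronics": 0.4, "books": 0.1},
        relevance={"baby": 0.8, "electronics": 0.3, "books": 0.2})
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    print(ranked)   # categories ranked for recommendation, highest preference first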
Fig. 2 is a schematic diagram of the main modules of an apparatus for calculating user preference according to an embodiment of the present invention.
As shown in Fig. 2, another embodiment of the present invention provides an apparatus 200 for calculating user preference, comprising:
a feature extraction module 201, configured to obtain log data generated while a user searches for items and extract feature text from the log data;
a similarity score calculation module 202, configured to process the feature text with a language model to obtain text feature data, and classify the text feature data to obtain a semantic similarity score for each item category;
a relevance score calculation module 203, configured to construct, from the feature text, a keyword vector for each item category, and process the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
a comprehensive calculation module 204, configured to compute a weighted sum of the evaluation scores of each item category and take the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
a preference determination module 205, configured to determine the user preference based on the user preference scores.
In some alternative embodiments, the log data includes click log data, and the evaluation scores further include a click preference score; the apparatus 200 further includes:
a click preference score calculation module 206, configured to determine the number of clicks on each item title according to the click log data; determine the number of clicks on each item category according to the number of clicks on each item title and the correspondence between item titles and item categories; and divide the number of clicks on each item category by the total number of clicks across all item categories to obtain the click preference score of each item category.
In some alternative embodiments, the similarity score calculation module 202 is further configured to:
process the feature text with at least two language models to obtain at least two sets of text feature data; input each set of text feature data into the pre-trained similarity classification model corresponding to its language model, to obtain each language-model score of each item category; and fuse the language-model scores of each item category with a pre-trained fusion model to obtain the semantic similarity score of each item category.
In some alternative embodiments, the similarity score calculation module 202 is further configured to:
obtain log data and click log data generated by multiple other users while searching for items;
perform word segmentation on the log data and click log data of the multiple other users to obtain training text;
process the training text with one of the language models to obtain training feature data;
train a classification model on the training feature data with the item categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
perform the above steps for each language model, to obtain the pre-trained similarity classification model corresponding to each language model.
In some alternative embodiments, the similarity score calculation module 202 is further configured to:
multiply the language-model scores of one item category pairwise to obtain nonlinear transformation feature values;
input the language-model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that item category; and
perform the above steps for each item category to obtain the semantic similarity score of each item category.
In some alternative embodiments, the language models include:
GloVe, whose text feature data are the word vectors of the keywords in the feature text;
N-Gram, whose text feature data are the context information of the keywords in the feature text;
Doc2vec, whose text feature data are the document vector of the feature text.
In some alternative embodiments, the relevance score calculation module 203 is further configured to:
count the number of occurrences of the keywords under one item category;
take the N keywords with the highest number of occurrences and use them as elements to construct the N-dimensional keyword vector of that item category; and
perform the above process for each item category to obtain the keyword vector of each item category.
In summary, the apparatus provided by this embodiment is based on big data and machine learning techniques. Taking the user's search logs and click logs as the basis, it comprehensively quantifies the similarity between the text semantics and the user preference, the association between the keywords contained in the text and the user preference, and other factors, to obtain comparable evaluation scores, which are then combined by weighted summation into the user preference score. Compared with the prior art, the apparatus can fully mine the user's preference for certain categories of goods, and because multiple factors are combined, the calculation result is more accurate and closer to the user's real intention.
Fig. 3 shows an exemplary system architecture 300 to which the method for calculating user preference or the apparatus for calculating user preference of the embodiments of the present invention can be applied.
As shown in Fig. 3, the system architecture 300 may include terminal devices 301, 302, and 303, a network 304, and a server 305. The network 304 is the medium providing communication links between the terminal devices 301, 302, 303 and the server 305. The network 304 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 301, 302, 303 to interact with the server 305 through the network 304, to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 301, 302, 303, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 301, 302, 303 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 305 may be a server providing various services, for example a back-end management server that supports the shopping websites browsed by users on the terminal devices 301, 302, 303. The back-end management server may receive and save the log data generated while users search for goods.
It should be noted that the method for calculating user preference provided by the embodiments of the present invention is generally executed by the server 305, and accordingly the apparatus for calculating user preference is generally arranged in the server 305.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 3 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
According to embodiments of the present invention, the present invention further provides an electronic device and a readable storage medium.
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Referring to Fig. 4, a schematic structural diagram of a computer system 400 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in Fig. 4 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the system 400. The CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a loudspeaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to embodiments of the present invention, the process described above with reference to the schematic diagram of the main steps may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the schematic diagram of the main steps. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409 and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above-described functions defined in the system of the present invention are executed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and a combination of boxes in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be arranged in a processor, which may, for example, be described as: a processor comprising a feature extraction module, a similarity score calculation module, a relevance score calculation module, a comprehensive calculation module, and a preference determination module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the feature extraction module may also be described as "a module for obtaining log data generated while a user searches for items and extracting feature text from the log data".
As another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device:
obtains log data generated while a user searches for items, and extracts feature text from the log data;
processes the feature text with a language model to obtain text feature data, and classifies the text feature data to obtain a semantic similarity score for each item category;
constructs, from the feature text, a keyword vector for each item category, and processes the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
computes a weighted sum of the evaluation scores of each item category and takes the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
determines the user preference based on the user preference scores.
The technical solution of the embodiments of the present invention is based on big data and machine learning techniques. Taking the user's search logs and click logs as the basis, it comprehensively quantifies the similarity between the text semantics and the user preference, the association between the keywords contained in the text and the user preference, and other factors, to obtain comparable evaluation scores, which are then combined by weighted summation into the user preference score. Compared with the prior art, this solution can fully mine the user's preference for certain categories of goods, and because multiple factors are combined, the calculation result is more accurate and closer to the user's real intention.
The above specific embodiments do not constitute a limitation on the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (16)

1. A method for calculating user preference, characterized by comprising:
obtaining log data generated while a user searches for items, and extracting feature text from the log data;
processing the feature text with a language model to obtain text feature data, and classifying the text feature data to obtain a semantic similarity score for each item category;
constructing, from the feature text, a keyword vector for each item category, and processing the keyword vectors with a pre-trained relevance classification model to obtain a semantic relevance score for each item category;
computing a weighted sum of the evaluation scores of each item category and taking the result as that category's user preference score, wherein the evaluation scores include the semantic similarity score and the semantic relevance score; and
determining the user preference based on the user preference scores.
2. The method according to claim 1, characterized in that the log data comprises click log data and the evaluation scores further comprise a click preference score; and before the step of performing a weighted summation of the evaluation scores corresponding to each article category to obtain the user preference score corresponding to each article category, the method further comprises:
determining the number of clicks on each article title according to the click log data;
determining the number of clicks of each article category according to the number of clicks on each article title and the correspondence between article titles and article categories; and
dividing the number of clicks of each article category by the total number of clicks across all article categories to obtain the click preference score of each article category.
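A minimal sketch of the click preference score of claim 2, assuming a hypothetical list of clicked titles and a hypothetical title-to-category mapping; neither the data nor the function name appears in the disclosure.

```python
from collections import Counter

def click_preference_scores(clicked_titles, title_to_category):
    """clicked_titles: one entry per click, each entry an article title.
    title_to_category: mapping from article title to article category.
    Returns {category: clicks in that category / total clicks}."""
    category_clicks = Counter(title_to_category[title] for title in clicked_titles)
    total_clicks = sum(category_clicks.values())
    return {cat: n / total_clicks for cat, n in category_clicks.items()}

# Illustrative click-log data only.
clicks = ["iPhone 8 case", "iPhone 8 case", "ThinkPad X1"]
mapping = {"iPhone 8 case": "phone accessories", "ThinkPad X1": "laptops"}
print(click_preference_scores(clicks, mapping))
# {'phone accessories': 0.666..., 'laptops': 0.333...}
```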
3. The method according to claim 1, characterized in that the step of processing the feature text with a language model to obtain text feature data and classifying the text feature data to obtain the semantic similarity score of each article category comprises:
processing the feature text with at least two language models respectively to obtain at least two sets of text feature data;
inputting each set of text feature data into the pre-trained similarity classification model corresponding to the respective language model to obtain the language model scores of each article category; and
fusing the language model scores of each article category with a pre-trained fusion model to obtain the semantic similarity score of each article category.
4. The method according to claim 3, characterized in that, before the step of inputting each set of text feature data into the pre-trained similarity classification model corresponding to the respective language model to obtain the language model scores of each article category, the method further comprises:
obtaining log data and click log data generated by a plurality of other users during searches for articles;
performing word segmentation on the log data and click log data of the plurality of other users to obtain training text;
processing the training text with one of the language models to obtain training feature data;
training a classification model with the training feature data, using article categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
performing the above steps for each language model to obtain the pre-trained similarity classification model corresponding to each language model.
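For illustration, a combined sketch of claims 3 and 4: scikit-learn vectorizers stand in for the at least two language models, and logistic-regression classifiers trained on category-labelled text stand in for the pre-trained similarity classification models. The library choice, the toy training data and all names are assumptions rather than part of the disclosure; the fusion step of claim 3 is sketched separately under claim 5.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Category-labelled training text, e.g. segmented search/click logs of other
# users (toy data).
train_texts = ["iphone screen protector", "gaming laptop 16gb ram",
               "android phone case", "ultrabook ssd laptop"]
train_labels = ["phones", "laptops", "phones", "laptops"]

# Two stand-ins for the "at least two language models".
featurizers = {
    "unigram_tfidf": TfidfVectorizer(),
    "bigram_counts": CountVectorizer(ngram_range=(1, 2)),
}

# Train one similarity classification model per language model (claim 4).
similarity_models = {}
for name, vectorizer in featurizers.items():
    features = vectorizer.fit_transform(train_texts)
    classifier = LogisticRegression().fit(features, train_labels)
    similarity_models[name] = (vectorizer, classifier)

# Score a user's feature text: one score per (language model, category) pair
# (claim 3); these scores would then be fused into one similarity score.
feature_text = "cheap phone case"
for name, (vectorizer, classifier) in similarity_models.items():
    probs = classifier.predict_proba(vectorizer.transform([feature_text]))[0]
    print(name, dict(zip(classifier.classes_, probs)))
```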
5. The method according to claim 3, characterized in that the step of fusing the language model scores of each article category with the pre-trained fusion model to obtain the semantic similarity score of each article category comprises:
multiplying the language model scores of one article category pairwise to obtain nonlinear transformation feature values;
inputting the language model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that article category; and
performing the above steps for each article category to obtain the semantic similarity score of each article category.
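A minimal sketch of the fusion step of claim 5, assuming three language-model scores for one article category and a stand-in fusion model whose weights are made-up numbers rather than learned parameters from the disclosure.

```python
from itertools import combinations
from math import exp

def fusion_features(model_scores):
    """model_scores: one score per language model for a single article category.
    Returns the original scores plus their pairwise products (the nonlinear
    transformation feature values)."""
    pairwise_products = [a * b for a, b in combinations(model_scores, 2)]
    return list(model_scores) + pairwise_products

def fused_similarity(model_scores, weights, bias=0.0):
    # Stand-in fusion model: a logistic combination of the fusion features.
    features = fusion_features(model_scores)
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))  # squash to a (0, 1) similarity score

# Three language-model scores for one category and six made-up fusion weights
# (three original scores + three pairwise products).
scores = [0.8, 0.6, 0.7]
weights = [0.5, 0.4, 0.3, 0.2, 0.2, 0.1]
print(fused_similarity(scores, weights))
```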
6. The method according to claim 3, characterized in that the language models comprise:
GloVe, whose corresponding text feature data are the word vectors of the keywords in the feature text;
N-gram, whose corresponding text feature data are the contextual information of the keywords in the feature text; and
Doc2vec, whose corresponding text feature data are the document vector of the feature text.
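For illustration, a sketch of the three kinds of text feature data listed in claim 6, using common open-source tools as stand-ins since the disclosure does not name specific libraries: toy GloVe-style word vectors, unigram/bigram counts from scikit-learn as a rough stand-in for the n-gram contextual information, and a gensim Doc2Vec document vector. All data and names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

feature_text = "cheap android phone case"
tokens = feature_text.split()

# (1) GloVe: one pre-trained vector per keyword. A real system would load a
#     file of pre-trained vectors; the two toy vectors below are made up.
glove_lines = ["phone 0.1 0.3 -0.2", "case 0.0 0.5 0.4"]
glove = {}
for line in glove_lines:
    word, *values = line.split()
    glove[word] = [float(v) for v in values]
keyword_vectors = [glove[t] for t in tokens if t in glove]

# (2) N-gram: contextual information of the keywords, approximated here by
#     unigram/bigram counts.
ngram_vectorizer = CountVectorizer(ngram_range=(1, 2))
ngram_features = ngram_vectorizer.fit_transform([feature_text])

# (3) Doc2vec: a single document vector for the whole feature text.
corpus = [TaggedDocument(words=tokens, tags=["doc0"])]
doc2vec_model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=20)
document_vector = doc2vec_model.infer_vector(tokens)

print(len(keyword_vectors), ngram_features.shape, document_vector.shape)
```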
7. The method according to claim 1, characterized in that the step of constructing, from the feature text, the keyword vector corresponding to each article category comprises:
counting the occurrence frequency of the keywords appearing under one article category;
obtaining the N keywords with the highest occurrence frequency, and using the obtained keywords as elements to construct the N-dimensional keyword vector corresponding to that article category; and
performing the above process for each article category to obtain the keyword vector corresponding to each article category.
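A minimal sketch of the keyword vector construction of claim 7, assuming a hypothetical list of keywords observed under one article category and N=3.

```python
from collections import Counter

def category_keyword_vector(keywords, n):
    """keywords: all keywords observed under one article category.
    Returns the n most frequent keywords, used as the elements of that
    category's n-dimensional keyword vector."""
    return [word for word, _ in Counter(keywords).most_common(n)]

# Illustrative keywords for one category.
observed = ["screen", "case", "battery", "case", "charger", "case", "screen"]
print(category_keyword_vector(observed, n=3))  # ['case', 'screen', 'battery']
```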
8. A device for calculating user preference, characterized by comprising:
a feature extraction module, configured to obtain log data generated by a user during a search for articles and to extract feature text from the log data;
a similarity score calculation module, configured to process the feature text with a language model to obtain text feature data, and to classify the text feature data to obtain a semantic similarity score for each article category;
an association degree score calculation module, configured to construct, from the feature text, a keyword vector corresponding to each article category, and to process the keyword vectors with a pre-trained association-degree classification model to obtain a semantic association degree score for each article category;
a comprehensive calculation module, configured to perform a weighted summation of the evaluation scores corresponding to each article category and to take the result as the user preference score corresponding to that article category, wherein the evaluation scores comprise the semantic similarity score and the semantic association degree score; and
a preference determination module, configured to determine a user preference based on the user preference scores.
9. The device according to claim 8, characterized in that the log data comprises click log data and the evaluation scores further comprise a click preference score; and the device further comprises:
a click preference score calculation module, configured to determine the number of clicks on each article title according to the click log data; to determine the number of clicks of each article category according to the number of clicks on each article title and the correspondence between article titles and article categories; and to divide the number of clicks of each article category by the total number of clicks across all article categories to obtain the click preference score of each article category.
10. The device according to claim 8, characterized in that the similarity score calculation module is further configured to:
process the feature text with at least two language models respectively to obtain at least two sets of text feature data; input each set of text feature data into the pre-trained similarity classification model corresponding to the respective language model to obtain the language model scores of each article category; and fuse the language model scores of each article category with a pre-trained fusion model to obtain the semantic similarity score of each article category.
11. The device according to claim 10, characterized in that the similarity score calculation module is further configured to:
obtain log data and click log data generated by a plurality of other users during searches for articles;
perform word segmentation on the log data and click log data of the plurality of other users to obtain training text;
process the training text with one of the language models to obtain training feature data;
train a classification model with the training feature data, using article categories as classification labels, to obtain the similarity classification model corresponding to that language model; and
perform the above steps for each language model to obtain the pre-trained similarity classification model corresponding to each language model.
12. The device according to claim 10, characterized in that the similarity score calculation module is further configured to:
multiply the language model scores of one article category pairwise to obtain nonlinear transformation feature values;
input the language model scores and the nonlinear transformation feature values into the pre-trained fusion model to obtain the semantic similarity score of that article category; and
perform the above steps for each article category to obtain the semantic similarity score of each article category.
13. The device according to claim 10, characterized in that the language models comprise:
GloVe, whose corresponding text feature data are the word vectors of the keywords in the feature text;
N-gram, whose corresponding text feature data are the contextual information of the keywords in the feature text; and
Doc2vec, whose corresponding text feature data are the document vector of the feature text.
14. The device according to claim 8, characterized in that the association degree score calculation module is further configured to:
count the occurrence frequency of the keywords appearing under one article category;
obtain the N keywords with the highest occurrence frequency, and use the obtained keywords as elements to construct the N-dimensional keyword vector corresponding to that article category; and
perform the above process for each article category to obtain the keyword vector corresponding to each article category.
15. An electronic device for calculating user preference, characterized by comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 7.
16. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201810420441.5A 2018-05-04 2018-05-04 A kind of method and apparatus calculating user preference Pending CN110516033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810420441.5A CN110516033A (en) 2018-05-04 2018-05-04 A kind of method and apparatus calculating user preference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810420441.5A CN110516033A (en) 2018-05-04 2018-05-04 A kind of method and apparatus calculating user preference

Publications (1)

Publication Number Publication Date
CN110516033A true CN110516033A (en) 2019-11-29

Family

ID=68621467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810420441.5A Pending CN110516033A (en) 2018-05-04 2018-05-04 A kind of method and apparatus calculating user preference

Country Status (1)

Country Link
CN (1) CN110516033A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110047161A1 (en) * 2009-03-26 2011-02-24 Sung Hyon Myaeng Query/Document Topic Category Transition Analysis System and Method and Query Expansion-Based Information Retrieval System and Method
CN102509233A (en) * 2011-11-29 2012-06-20 汕头大学 User online action information-based recommendation method
CN104636402A (en) * 2013-11-13 2015-05-20 阿里巴巴集团控股有限公司 Classification, search and push methods and systems of service objects
US20150310862A1 (en) * 2014-04-24 2015-10-29 Microsoft Corporation Deep learning for semantic parsing including semantic utterance classification
CN105808541A (en) * 2014-12-29 2016-07-27 阿里巴巴集团控股有限公司 Information matching processing method and apparatus
US20180060305A1 (en) * 2016-08-25 2018-03-01 International Business Machines Corporation Semantic hierarchical grouping of text fragments
US20180067945A1 (en) * 2016-09-08 2018-03-08 Facebook, Inc. Categorizing Objects for Queries on Online Social Networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GU Rong et al.: "A Query Expansion Algorithm Based on Latent Semantic Analysis", Computer Engineering and Applications, no. 18 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143510A (en) * 2019-12-10 2020-05-12 广东电网有限责任公司 Searching method based on latent semantic analysis model
CN111460169A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Semantic expression generation method, device and equipment
CN111984788A (en) * 2020-08-20 2020-11-24 广东电网有限责任公司清远供电局 Electric power system violation management method and device and electric power equipment
CN111984788B (en) * 2020-08-20 2021-10-22 广东电网有限责任公司清远供电局 Electric power system violation management method and device and electric power equipment

Similar Documents

Publication Publication Date Title
US10282737B2 (en) Analyzing sentiment in product reviews
Ma et al. Recommender systems with social regularization
US9934293B2 (en) Generating search results
CA2897886C (en) Methods and apparatus for identifying concepts corresponding to input information
CN107730346A (en) The method and apparatus of article cluster
CN110147425A (en) A kind of keyword extracting method, device, computer equipment and storage medium
US11436446B2 (en) Image analysis enhanced related item decision
TW201401088A (en) Search method and apparatus
CN108984554A (en) Method and apparatus for determining keyword
CN109325121A (en) Method and apparatus for determining the keyword of text
CN109727047A (en) A kind of method and apparatus, data recommendation method and the device of determining data correlation degree
CN110276065A (en) A kind of method and apparatus handling goods review
CN110516033A (en) A kind of method and apparatus calculating user preference
CN109190123A (en) Method and apparatus for output information
CN110309293A (en) Text recommended method and device
CN110362662A (en) Data processing method, device and computer readable storage medium
Eldin et al. An enhanced opinion retrieval approach via implicit feature identification
CN111737607B (en) Data processing method, device, electronic equipment and storage medium
CN116823410A (en) Data processing method, object processing method, recommending method and computing device
CN115329207B (en) Intelligent sales information recommendation method and system
Suryana et al. Dynamic convolutional neural network for eliminating item sparse data on recommender system.
CN113722487A (en) User emotion analysis method, device and equipment and storage medium
Le et al. Machine learning for food review and recommendation
Patil et al. Sentimental Analysis on Amazon Reviews Using Machine Learning
Arora et al. Evaluation Of Product Reviews Using Deep Learning Classifier Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination