CN110334283A - Information recommendation method, device, server and storage medium - Google Patents

Info

Publication number
CN110334283A
Authority
CN
China
Prior art keywords
information
recommendation
correlation
vector
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810980235.XA
Other languages
Chinese (zh)
Inventor
王振飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Beijing Co Ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Beijing Co Ltd
Priority to CN201810980235.XA
Publication of CN110334283A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses an information recommendation method, device, server and storage medium, belonging to the field of information technology. The method includes: obtaining a first information set and a second information set, where the information in the two sets has different presentation forms; for each piece of first information in the first information set, calculating a first feature vector of the first information according to keywords of the first information; for each piece of second information in the second information set, calculating a second feature vector of the second information according to keywords of the second information; calculating an information correlation between the first information and the second information according to the first feature vector and the second feature vector; and, if the information correlation is greater than a correlation threshold, determining that the first information and the second information are recommendation information for each other. By vectorizing information and recommending information of a different presentation form to the user according to the correlation between vectors, the application solves the problem that the related art performs poorly when recommending information of different presentation forms.

Description

Information recommendation method, device, server and storage medium
Technical field
The embodiments of this application relate to the field of information technology, and in particular to an information recommendation method, device, server and storage medium.
Background
Information recommendation is a technology that recommends specific information to users through methods such as data filtering and information retrieval, and can filter relevant information out of the massive amount of information on the Internet. For example, when a user reads with a news client, other news related to the current news is displayed at the end of the news item being read, making it easier for the user to choose what to read next.
In the related art, information recommendation uses a collaborative filtering algorithm based on historical user behavior data. For example, if most users who click article A also click article B, article A and article B have a high degree of correlation. The correlation between article A and article B can therefore be calculated from the number of times article A is clicked, the number of times article B is clicked, and the number of times both are clicked together, and articles are subsequently recommended according to the correlation between them.
Collaborative filtering works well for recommending information of the same presentation form, such as recommending related image-text items for image-text information or related videos for video information. However, information of different presentation forms rarely co-occurs in historical user behavior data; for example, a user seldom reads image-text news and video news within the same short period. As a result, collaborative filtering performs poorly when recommending information of a different presentation form.
Summary of the invention
The embodiments of this application provide an information recommendation method, device, server and storage medium that can solve the problem that the related art performs poorly when recommending information of different presentation forms. The technical solution is as follows:
In one aspect, an information recommendation method is provided, the method comprising:
obtaining a first information set and a second information set, where the information in the first information set and the information in the second information set have different presentation forms;
for each piece of first information in the first information set, calculating a first feature vector of the first information according to keywords of the first information;
for each piece of second information in the second information set, calculating a second feature vector of the second information according to keywords of the second information;
calculating an information correlation between the first information and the second information according to the first feature vector and the second feature vector;
if the information correlation is greater than a correlation threshold, determining that the first information and the second information are recommendation information for each other.
In another aspect, an information recommendation device is provided, the device comprising:
an information obtaining module, configured to obtain a first information set and a second information set, where the information in the first information set and the information in the second information set have different presentation forms;
a first calculation module, configured to, for each piece of first information in the first information set, calculate a first feature vector of the first information according to keywords of the first information;
a second calculation module, configured to, for each piece of second information in the second information set, calculate a second feature vector of the second information according to keywords of the second information;
a third calculation module, configured to calculate the information correlation between the first information and the second information according to the first feature vector and the second feature vector;
a first determining module, configured to determine, when the information correlation is greater than a correlation threshold, that the first information and the second information are recommendation information for each other.
In another aspect, a server is provided. The server includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is executed by the processor to implement the information recommendation method described in the above aspect.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is executed by a processor to implement the information recommendation method described in the above aspect.
The beneficial effects of the technical solutions provided by the embodiments of this application include at least the following:
When information is recommended with the method provided by the embodiments of this application, the feature vector of each piece of information is determined according to the keywords of the information, and the information correlation between pieces of information is determined based on the feature vectors of information of different presentation forms, so that recommendation information is determined according to the information correlation for subsequent recommendation. Unlike the related art, which relies on historical user behavior data to determine the correlation between pieces of information, the embodiments of this application map information into a vector space and determine the correlation between pieces of information from the correlation of their feature vectors, which solves the problem that collaborative filtering performs poorly when recommending information of different presentation forms. In addition, generating the feature vectors from keywords that express the characteristics of the information can improve the accuracy of the related information that is determined.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the implementation environment provided by an embodiment of this application;
Fig. 2 is a flowchart of the information recommendation method provided by an embodiment of this application;
Fig. 3 is a schematic diagram of the information recommendation method provided by an embodiment of this application;
Fig. 4 is a flow diagram of the information recommendation method provided by an embodiment of this application;
Fig. 5 is a flowchart of the information recommendation method provided by another embodiment of this application;
Fig. 6 is a network structure diagram of the CBOW model provided by an embodiment of this application;
Fig. 7 is a network structure diagram of the Skip-gram model provided by an embodiment of this application;
Fig. 8 is a flowchart of the vector calculation in the information recommendation method provided by an embodiment of this application;
Fig. 9 is a block diagram of the vector mean calculation provided by an embodiment of this application;
Fig. 10 is a flowchart of the online use of the information recommendation method provided by an embodiment of this application;
Fig. 11 is an interface diagram of the news recommendation scenario provided by an embodiment of this application;
Fig. 12 is a block diagram of the information recommendation device provided by an embodiment of this application;
Fig. 13 is a schematic structural diagram of the server provided by an embodiment of this application.
Detailed description
To make the objectives, technical solutions and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.
Referring to Fig. 1, which shows a schematic diagram of the implementation environment provided by an embodiment of this application. The implementation environment includes a terminal 110 and a server 120.
The terminal 110 is an electronic device with Internet access, such as a smartphone, a tablet computer or a personal computer. In Fig. 1, the terminal 110 is a smartphone.
Optionally, the terminal 110 has an application with a news reading function installed, or follows a social account (such as an official account) that provides a news reading service. The user can read news after opening the application with the news reading function on the terminal 110, or after entering the official account that provides the news reading service.
The terminal 110 and the server 120 are connected through a wired or wireless network.
The server 120 is a single server, a server cluster composed of several servers, or a cloud computing center. In a possible implementation, the server 120 is the background server of the application (with the news reading function) on the terminal, or the server of the social account (providing the news reading service).
Optionally, the server 120 obtains information sets of various presentation forms in advance, generates a recommendation list for each piece of information by calculating the correlation between information of different presentation forms, and stores the lists for subsequent recommendation. The recommendation list of each piece of information contains heterogeneous information whose presentation form differs from that of the information.
In a possible application scenario, as shown in Fig. 1, when the user uses a news reading application on the terminal 110 and clicks the title of image-text news A to read it, the terminal 110 sends a recommendation request to the server 120, and the recommendation request contains the information identifier of image-text news A. After receiving the recommendation request, the server 120 looks up the recommendation list 122 corresponding to image-text news A in the recommendation database 121 according to the information identifier of image-text news A, and returns the related video news in the recommendation list 122 to the terminal 110, so that the user can choose to read the related video news after reading image-text news A.
Optionally, the wireless or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to any combination of a local area network (Local Area Network, LAN), a metropolitan area network (Metropolitan Area Network, MAN), a wide area network (Wide Area Network, WAN), a mobile, wired or wireless network, a private network or a virtual private network. In some embodiments, data exchanged over the network is represented using technologies and/or formats including HyperText Markup Language (HTML) and Extensible Markup Language (XML). In addition, all or some links can be encrypted using conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN) and Internet Protocol Security (IPsec). In other embodiments, customized and/or dedicated data communication technologies can be used in place of, or in addition to, the above data communication technologies.
In the related art, when information recommendation is implemented with collaborative filtering based on historical user behavior data, the click records of different users on the information (that is, the historical user behavior data) are obtained first, and the correlation between pieces of information is then calculated from those click records. For example, to calculate the correlation between article A and article B, the clicks of different users on article A and article B are counted first, and the correlation between article A and article B is then calculated by the following formula:
where N(A) is the number of users who clicked article A, N(B) is the number of users who clicked article B, and N(A) ∩ N(B) is the number of users who clicked both article A and article B.
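The formula itself is not reproduced in this text. As an assumption only, one commonly used co-occurrence measure built from the same three counts is |N(A) ∩ N(B)| / sqrt(|N(A)| · |N(B)|). The sketch below computes that assumed form from sets of user identifiers; the function name and the sample users are hypothetical.

```python
import math

def co_click_correlation(clicks_a, clicks_b):
    """Co-click correlation of two articles.

    clicks_a / clicks_b are sets of user ids that clicked each article.
    Assumed form (not taken from the patent text):
        |N(A) & N(B)| / sqrt(|N(A)| * |N(B)|)
    """
    if not clicks_a or not clicks_b:
        return 0.0  # no click data for one article, so no evidence of correlation
    both = len(clicks_a & clicks_b)
    return both / math.sqrt(len(clicks_a) * len(clicks_b))

# Hypothetical click logs: two users clicked both articles.
users_a = {"u1", "u2", "u3", "u4"}
users_b = {"u3", "u4", "u5", "u6"}
score = co_click_correlation(users_a, users_b)
```

When the two articles have different presentation forms, the intersection is usually empty or tiny, so a measure of this kind returns values near zero regardless of how related the content is.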
However, when this technique is applied to recommending information of a different presentation form, for example recommending video news related to image-text news, information of different presentation forms rarely co-occurs in historical user behavior data, that is, very few users click both information A and information B. The correlation calculation therefore has insufficient coverage, and this approach performs poorly when recommending information of different presentation forms. In particular, in the initial period after information is published, little historical user behavior data has been collected, so recommendation based on historical user behavior data performs poorly.
By contrast, when information is recommended with the method provided by the embodiments of this application, the server determines the feature vector of each piece of information according to its keywords, obtains the correlation between pieces of information by calculating the correlation between feature vectors, and determines recommendation information according to that correlation. This method does not rely on historical user behavior data but on the correlation of the information content itself. Even if information of different presentation forms rarely co-occurs in historical user behavior data, the method provided by the embodiments of this application can still measure the correlation between information of different presentation forms, which avoids the insufficient coverage of correlation calculation in the related art.
The information recommendation method provided in the embodiments of this application can be used in scenarios such as news reading and news sharing. Different application scenarios are described below.
News reading scenario
When applied to the news reading scenario, the information recommendation method can be used in a news reading client. While the user is using the news reading client and clicks the title of an image-text news item to read it, the terminal requests the detailed content of the image-text news from the server and, at the same time, sends a recommendation request containing the information identifier of the image-text news. The server pushes the corresponding recommended video news to the terminal according to the information identifier, so that the user can go on to browse the recommended video news after finishing the image-text news. Similarly, when the user watches video news, the server can push the corresponding recommended image-text news to the user.
News sharing scenario
When applied to the news sharing scenario, when a user shares image-text information with another terminal user through the terminal, the terminal sends a sharing request to the server. After receiving the sharing request, the server looks up the recommended video information related to the image-text information according to the information identifier of the image-text information in the sharing request, and sends the shared image-text information together with the recommended video information to the other terminal, so that the other terminal user can go on to browse the recommended video information after reading the shared image-text information.
The above information recommendation method can also be used in other application scenarios that require information recommendation. The embodiments of this application use the two scenarios above only as illustrative examples, which do not constitute a limitation.
Referring to Fig. 2, which shows a flowchart of the information recommendation method provided by an embodiment of this application. This embodiment is described with the information recommendation method applied to the server 120 shown in Fig. 1. The method may include the following steps:
Step 201, obtain a first information set and a second information set.
The information in the first information set and the information in the second information set have different presentation forms. Different presentation forms mean that the carriers of the information differ; the carriers include a text carrier, a video carrier and an audio carrier, and correspondingly the presentation forms include an image-text form, a video form and an audio form.
For example, the information in the first information set is image-text information and the information in the second information set is video information; or the information in the first information set is video information and the information in the second information set is audio information. The embodiments of this application do not limit the specific presentation forms of the information in the first information set and the second information set.
Optionally, to ensure the timeliness of the information in the first information set and the second information set, the information in the information sets obtained by the server is information within a specified time range, so that timely and relevant information is recommended later and recommendation quality improves.
Optionally, because the correlation between information of different types is usually low (for example, entertainment information and political information are only weakly correlated), the information in the obtained first information set and second information set belongs to the same type, which reduces the amount of computation when recommendation information is subsequently determined. For example, the information in both sets is sports information, or military information.
Step 202, for each piece of first information in the first information set, calculate a first feature vector of the first information according to keywords of the first information.
A keyword is a word or phrase that reflects the topic of the information. Optionally, the keywords of the first information include, but are not limited to, title keywords and body keywords of the first information. For example, for news information, the keywords include headline keywords and body-text keywords.
Calculating the feature vector of information means converting natural-language information into mathematical information in vector form. Put simply, the information is mapped into a vector space, that is, vectorized. Currently, the most common vector models are the one-hot representation model and the distributed representation model. A one-hot representation is a sparse word vector whose length equals the size of the dictionary; exactly one dimension of the vector is 1 and all other dimensions are 0. Although this model is simple, the encoding of each word is arbitrary, so it cannot accurately capture the correlation between words.
A distributed representation is a dense vector: the vector of a word is determined from the semantic features of the word, and every dimension carries a specific meaning, so the vectors of related words lie closer together and the correlation between words can be measured. Optionally, to measure the correlation between pieces of information more accurately, the first feature vector is a distributed representation vector.
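The difference between the two representations can be illustrated with a minimal sketch. The three-word vocabulary and the two-dimensional dense vectors below are made-up values chosen only for illustration; real distributed representations would be learned from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vocab = ["football", "soccer", "election"]  # hypothetical dictionary

def one_hot(word):
    """Sparse vector as long as the dictionary, with a single 1."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical dense vectors; related words are given nearby coordinates.
dense = {
    "football": [0.9, 0.1],
    "soccer": [0.8, 0.2],
    "election": [0.1, 0.9],
}

one_hot_sim = cosine(one_hot("football"), one_hot("soccer"))
dense_related = cosine(dense["football"], dense["soccer"])
dense_unrelated = cosine(dense["football"], dense["election"])
```

Any two distinct one-hot vectors are orthogonal, so their similarity is always zero, while the dense vectors separate the related pair from the unrelated pair.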
In the embodiments of this application, the server obtains the keywords corresponding to each piece of first information in the first information set, and then calculates the first feature vector of each piece of first information according to those keywords. Each piece of first information has at least one keyword, and different pieces of first information may have the same or different numbers of keywords.
Step 203, for each piece of second information in the second information set, calculate a second feature vector of the second information according to keywords of the second information.
For the implementation of this step, refer to step 202 above; details are not repeated here.
The embodiments of this application do not limit the execution order of step 202 and step 203. In a possible implementation, step 202 and step 203 are performed at the same time.
Step 204, calculate the information correlation between the first information and the second information according to the first feature vector and the second feature vector.
Correlation is the measure of how related pieces of information are. The related art calculates the correlation between pieces of information based on historical user behavior, which performs poorly when recommending information of different presentation forms.
In the embodiments of this application, each dimension of a feature vector reflects one aspect of the information, that is, the feature vector reflects the characteristics of the information. The correlation between the first feature vector and the second feature vector can therefore measure the correlation between the first information and the second information. Currently, common methods for calculating the correlation between vectors include Euclidean distance, Manhattan distance and cosine distance.
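As a minimal sketch of the three measures named above, the following functions operate on plain Python lists. The function names and the sample vectors are illustrative assumptions; cosine distance is taken here as one minus cosine similarity, which is a common convention rather than something the text specifies.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cosine_distance(u, v):
    """Assumed convention: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Hypothetical feature vectors pointing in the same direction.
first = [1.0, 2.0, 2.0]
second = [2.0, 4.0, 4.0]
```

For these two vectors the Euclidean and Manhattan distances are nonzero because the magnitudes differ, while the cosine distance is zero because the directions coincide, which is one reason cosine-based measures are often preferred for comparing text vectors.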
Step 205, if the information correlation is greater than a correlation threshold, determine that the first information and the second information are recommendation information for each other.
To control the number of recommended items and the correlation between them, the server presets a correlation threshold. Second information whose correlation with the first information is greater than the correlation threshold is determined as recommendation information for the first information, and first information whose correlation with the second information is greater than the correlation threshold is determined as recommendation information for the second information. Information with low correlation is thereby excluded and the quality of the recommendation information improves. For example, the correlation threshold is 90%.
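The thresholding in step 205 can be sketched as follows, assuming the pairwise correlations have already been computed and are held in a dictionary keyed by (first id, second id). The function name, the identifiers and the scores are hypothetical; the default threshold of 0.9 mirrors the 90% example above.

```python
from collections import defaultdict

def mutual_recommendations(correlations, threshold=0.9):
    """Split scored pairs into two recommendation maps.

    correlations maps (first_id, second_id) -> correlation score.
    A pair is kept only if its score is strictly greater than the threshold,
    and then each item is recorded as a recommendation for the other.
    """
    for_first = defaultdict(list)
    for_second = defaultdict(list)
    for (first_id, second_id), score in correlations.items():
        if score > threshold:
            for_first[first_id].append(second_id)
            for_second[second_id].append(first_id)
    return dict(for_first), dict(for_second)

# Hypothetical scores between image-text articles and videos.
correlations = {
    ("article_1", "video_1"): 0.95,
    ("article_1", "video_2"): 0.40,
    ("article_2", "video_1"): 0.92,
}
for_first, for_second = mutual_recommendations(correlations)
```

Here `video_2` is dropped for `article_1` because its score falls below the threshold, and `video_1` ends up with both articles as its recommendations.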
Illustratively, as shown in Fig. 3, when the information recommendation method provided by the embodiments of this application is used for mutual recommendation between video information and image-text information, the server vectorizes the videos in the video set 31 and the image-text items in the image-text set 32 in advance to obtain video feature vectors 33 and image-text feature vectors 34, and calculates the correlation 35 between the video feature vectors 33 and the image-text feature vectors 34, thereby generating a recommended image-text list 36 for the video information and a recommended video list 37 for the image-text information. When the user watches video information, the terminal sends the video information identifier 38 to the server; after receiving it, the server queries the corresponding recommended image-text list 36 and pushes the recommended image-text items found to the terminal. When the user reads image-text information, the terminal sends the image-text information identifier 39 to the server; after receiving it, the server queries the corresponding recommended video list 37 and pushes the recommended videos found to the terminal. This method does not depend on historical user behavior data, so the scarcity of co-occurrence between information of different presentation forms is not an issue, which avoids the poor recommendation performance of the related art.
In summary, when information is recommended with the method provided by the embodiments of this application, the feature vector of each piece of information is determined according to the keywords of the information, and the information correlation between pieces of information is determined based on the feature vectors of information of different presentation forms, so that recommendation information is determined according to the information correlation for subsequent recommendation. Unlike the related art, which relies on historical user behavior data to determine the correlation between pieces of information, the embodiments of this application map information into a vector space and determine the correlation between pieces of information from the correlation of their feature vectors, which solves the problem that collaborative filtering performs poorly when recommending information of different presentation forms. In addition, generating the feature vectors from keywords that express the characteristics of the information can improve the accuracy of the related information that is determined.
In a possible implementation, referring to Fig. 4, after obtaining the first information set 41, the server obtains the first keyword set 43 of each piece of first information and vectorizes the keywords to obtain the first word vector set 45, and then obtains the first feature vector 47 of the first information from the first word vector set 45. Similarly, after obtaining the second information set 42, the server obtains the second keyword set 44 of each piece of second information, vectorizes the keywords to obtain the second word vector set 46, and obtains the second feature vector 48 of the second information from the second word vector set 46. The server then calculates the correlation 49 between the first feature vector 47 and the second feature vector 48, sorts by correlation (410), and outputs the recommendation list 411 of the first information and the recommendation list 412 of the second information for subsequent recommendation. An illustrative embodiment is described below.
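The flow of Fig. 4 can be sketched end to end under several assumptions: the word vectors are supplied as a small hypothetical lookup table, a feature vector is taken as the mean of the keyword vectors (suggested by the vector mean calculation named for Fig. 9, but not confirmed by this passage), and correlation is measured with cosine similarity. All names and values are illustrative.

```python
import math

# Hypothetical 2-dimensional word vectors; a real system would load trained ones.
WORD_VECTORS = {
    "match": [0.9, 0.1],
    "goal": [0.8, 0.3],
    "vote": [0.1, 0.9],
    "senate": [0.2, 0.8],
}

def feature_vector(keywords):
    """Mean of the known keyword vectors, or None if no keyword is known."""
    vectors = [WORD_VECTORS[k] for k in keywords if k in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ranked_recommendations(first_items, second_items):
    """Map each first-item id to second-item ids sorted by descending similarity."""
    second_vecs = {sid: feature_vector(kw) for sid, kw in second_items.items()}
    ranking = {}
    for fid, keywords in first_items.items():
        fvec = feature_vector(keywords)
        scored = [
            (cosine(fvec, svec), sid)
            for sid, svec in second_vecs.items()
            if fvec is not None and svec is not None
        ]
        scored.sort(reverse=True)  # highest correlation first
        ranking[fid] = [sid for _, sid in scored]
    return ranking

articles = {"article_sport": ["match", "goal"]}
videos = {"video_politics": ["vote", "senate"], "video_sport": ["goal"]}
ranking = ranked_recommendations(articles, videos)
```

Swapping the two arguments yields the list in the other direction, corresponding to the two output lists 411 and 412.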
Referring to Fig. 5, it shows a flowchart of an information recommendation method provided by another embodiment of the present application. This embodiment is described using the example of the method being applied to the server 120 shown in Fig. 1. The method may include the following steps.
Step 501: obtain a first information set and a second information set.
For the implementation of this step, refer to step 201 above; details are not repeated here.
Step 502: for each piece of first information in the first information set, obtain at least one keyword of the first information.
In a possible implementation, the server extracts the keywords of the first information using the term frequency-inverse document frequency (TF-IDF) algorithm. TF-IDF is calculated as follows:
TF-IDF = TF * IDF
The larger the TF-IDF value of a word, the more important that word is to the first information, that is, the more likely the word is to be a keyword of the first information.
Here, term frequency (TF) is the frequency with which a word occurs in a piece of information: TF = (number of times the word occurs in the piece of information) / (total number of words in the piece of information). Some common words, such as "these" and "we", occur frequently but contribute little to the topic of the information, so the inverse document frequency (IDF) is used to filter out such common words and retain the words that better reflect the topic. IDF measures the general importance of a word, and the more common a word is, the smaller its IDF: the fewer pieces of information contain word A, the less likely word A is to be a common word and the larger its IDF. IDF = log((total number of pieces of information in the information set) / (number of pieces of information containing word A + 1)).
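As a non-limiting illustration, the TF and IDF formulas above may be sketched in Python as follows. The function names `tf_idf` and `top_keywords`, the sample corpus, and the assumption that each piece of information has already been split into words are hypothetical and are not part of the embodiments; Chinese text, for instance, would first need word segmentation.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus_tokens):
    """Return {word: TF-IDF} for one tokenized piece of information.

    TF  = occurrences of the word / total words in the piece of information
    IDF = log(total pieces of information / (pieces containing the word + 1))
    """
    counts = Counter(doc_tokens)
    total_words = len(doc_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / total_words
        containing = sum(1 for tokens in corpus_tokens if word in tokens)
        idf = math.log(n_docs / (containing + 1))
        scores[word] = tf * idf
    return scores


def top_keywords(doc_tokens, corpus_tokens, k=3):
    """Return the k words with the highest TF-IDF (ties broken alphabetically)."""
    scores = tf_idf(doc_tokens, corpus_tokens)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical, already-tokenized information set.
corpus = [
    ["we", "watch", "football", "football", "match"],
    ["we", "read", "these", "finance", "reports"],
    ["we", "like", "these", "films"],
    ["these", "recipes", "we", "cook"],
]
scores = tf_idf(corpus[0], corpus)
keywords = top_keywords(corpus[0], corpus, k=2)
```

With the "+1" in the denominator, a word that occurs in every piece of information (such as "we" in this sample) receives a negative IDF and therefore ranks below the topical words.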
Optionally, if the first information is video information, its keywords are extracted by first using the TF-IDF algorithm to extract candidate keywords from content such as the title and description of the video, and then determining the final keywords of the video information through manual review.
The embodiments of the present application describe obtaining keywords with the TF-IDF algorithm only as an example; this does not limit the manner in which keywords are obtained.
For information such as video, audio, and pictures, relatively little text describes the information, so the extracted keywords tend to be less accurate. To improve the accuracy of keyword extraction, in a possible implementation, if the first information is video information, the server may use image recognition to recognize the image frames in the video, obtain the content features of each frame, and generate keywords for the video information from those content features; if the first information is audio information, the server may use speech recognition to convert the audio into text and then extract keywords with the TF-IDF algorithm; if the first information is picture information, the server uses image recognition to obtain the image features of the picture and generates keywords for the picture information from those image features.
Step 503: obtain the word vector corresponding to each keyword.
Optionally, the server generates the word vector corresponding to each keyword using a word vector model (word to vector, Word2vec). Word2vec is a shallow neural network that maps each word in a piece of information to a vector. Word2vec includes two different network architectures: the continuous bag-of-words model (CBOW) and the continuous skip-gram model (Skip-gram).
Fig. 6 shows the network structure of the CBOW model. The CBOW model infers the target word from its context. For example, consider five adjacent words in a piece of information, W(t-2), W(t-1), W(t), W(t+1), and W(t+2), where W(t) is the target word, that is, the word whose word vector is to be output. W(t) denotes the t-th word of the information, where t >= 2 and t is an integer. The vectors of the four words other than the target word are input into the CBOW model, and after the mapping layer the model outputs the vector of the target word.
The Skip-gram model works in the opposite way to the CBOW model: its input is the word vector of a specific word, and its output is the word vectors of the words in the context of that word. Fig. 7 shows the network structure of the Skip-gram model. For five adjacent words in a piece of information, W(t-2), W(t-1), W(t), W(t+1), and W(t+2), the vector of W(t) is input into the Skip-gram model, and after the mapping layer the model outputs the word vectors of W(t-2), W(t-1), W(t+1), and W(t+2).
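As a non-limiting illustration of the difference between the two models, the following Python sketch lists the training examples that each model would see for a window of two words on either side of the target. It shows only how inputs and outputs are paired, not the neural network itself; the function names and the placeholder words are hypothetical.

```python
def cbow_examples(words, window=2):
    """(context words, target word) pairs: CBOW predicts the target from its context."""
    examples = []
    for t, target in enumerate(words):
        # Up to `window` words before and after position t form the context.
        context = words[max(0, t - window):t] + words[t + 1:t + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(words, window=2):
    """(input word, context word) pairs: Skip-gram predicts each context word from one word."""
    pairs = []
    for context, target in cbow_examples(words, window):
        pairs.extend((target, c) for c in context)
    return pairs


# Five adjacent placeholder words standing in for W(t-2) ... W(t+2).
words = ["w0", "w1", "w2", "w3", "w4"]
cbow = cbow_examples(words)
skip = skipgram_examples(words)
```

For the middle word `w2`, the CBOW example is the four surrounding words paired with `w2`, while Skip-gram yields four separate pairs with `w2` as the input.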
In a possible implementation, the server inputs the first information into Word2vec and, after training, outputs the word vectors of all the words contained in the first information; it then selects from them the word vector corresponding to each keyword of the first information.
Step 504: calculate the first feature vector from the at least one word vector.
Because a single keyword cannot fully represent the features of a piece of information, the server needs to combine the word vectors of all the keywords of the information when calculating its feature vector. As for how the first feature vector is calculated, in a possible implementation, on the basis of Fig. 5 and as shown in Fig. 8, this step may include the following steps:
Step 504A: calculate the vector average of the at least one word vector.
Because each keyword of the first information reflects the topic and features of the information to some extent, the server averages the at least one word vector corresponding to the first information to obtain the vector average.
For example, as shown in Fig. 9, if the first information has four keywords, namely keyword 1, keyword 2, keyword 3, and keyword 4, and the above steps yield their word vectors, namely word vector 1, word vector 2, word vector 3, and word vector 4, then the vector average equals the sum of the four word vectors divided by 4.
Step 504B: determine the vector average as the first feature vector.
Further, the server uses the resulting vector average as the first feature vector of the first information. The first feature vector reflects the topic and features of the first information along different dimensions (corresponding to different keywords), which improves the accuracy of the correlation later calculated between pieces of information.
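As a non-limiting illustration, steps 504A and 504B may be sketched in Python as follows. The function name `mean_vector` and the three-dimensional sample vectors are hypothetical; word vectors produced by a real model would typically have many more dimensions.

```python
def mean_vector(word_vectors):
    """Element-wise mean of equal-length word vectors."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]


# Hypothetical word vectors for keyword 1 to keyword 4.
word_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [0.0, 2.0, 2.0],
    [0.0, 4.0, 0.0],
]
# Sum of the four word vectors divided by 4.
first_feature_vector = mean_vector(word_vectors)
```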
However, the keywords of the first information differ in how important they are to the first information, whereas the calculation above treats every keyword as equally important, so the resulting first feature vector cannot accurately reflect the emphasis of the topic and features of the first information. Therefore, to improve the accuracy of the calculated feature vector, the server may instead determine the feature vector by calculating a weighted average of the keyword word vectors. On the basis of Fig. 5 and as shown in Fig. 8, this step may include the following steps:
Step 504C: determine the word vector weight of the word vector corresponding to each keyword according to the number of times the keyword occurs in the first information.
A weight is an index of how important a factor is to something. In the embodiments of the present application, the weight refers to the importance of each keyword of the first information to the first information.
Optionally, the server uses the number of occurrences of a keyword in the first information as the word vector weight of the corresponding word vector, or uses the normalized number of occurrences of each keyword in the first information as the word vector weight.
For example, if the first information has four keywords, namely keyword 1, keyword 2, keyword 3, and keyword 4, which occur 10, 20, 30, and 40 times respectively, then after normalization the server uses 0.1, 0.2, 0.3, and 0.4 as the word vector weights of keyword 1, keyword 2, keyword 3, and keyword 4 respectively.
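As a non-limiting illustration, the normalization in the example above may be sketched in Python as follows. Dividing each count by the total count is inferred from the example values (10, 20, 30, 40 becoming 0.1, 0.2, 0.3, 0.4); the embodiments do not name a specific normalization, and the function name `normalized_weights` is hypothetical.

```python
def normalized_weights(occurrences):
    """Divide each keyword's occurrence count by the total so the weights sum to 1."""
    total = sum(occurrences.values())
    if total == 0:
        raise ValueError("at least one keyword must occur in the information")
    return {keyword: count / total for keyword, count in occurrences.items()}


weights = normalized_weights(
    {"keyword 1": 10, "keyword 2": 20, "keyword 3": 30, "keyword 4": 40}
)
```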
Step 504D, according at least one term vector and term vector weight, calculate the weighting of at least one term vector to Measure average value.
Each keyword is different the importance degree of the first information in the first information, and importance degree is higher Keyword more can reflect the feature of the first information, therefore, in order to preferably reflect the content to be transmitted of the first information, service Device calculates the weighing vector average value that each keyword corresponds to term vector.
Wherein, the calculation formula of weighing vector average value be each term vector and its term vector weight the sum of products divided by Total weight.For example, corresponding term vector is respectively A, B and C, the corresponding weight of term vector if the first information includes 3 keywords Respectively a, b and c, then weighing vector average value is (A*a+B*b+C*c)/(a+b+c).
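As a non-limiting illustration, the formula (A*a + B*b + C*c) / (a + b + c) may be sketched in Python as follows. The function name `weighted_mean_vector` and the sample vectors and weights are hypothetical.

```python
def weighted_mean_vector(word_vectors, weights):
    """Sum of weight * word vector, divided by the total weight."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector and at least one vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("total weight must be non-zero")
    dim = len(word_vectors[0])
    return [
        sum(w * v[i] for v, w in zip(word_vectors, weights)) / total
        for i in range(dim)
    ]


# Hypothetical word vectors A, B, C and weights a, b, c.
A, B, C = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
a, b, c = 1, 1, 2
feature_vector = weighted_mean_vector([A, B, C], [a, b, c])
```

Because the sum is divided by the total weight, raw occurrence counts and normalized weights give the same result.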
Step 504E: determine the weighted vector average as the first feature vector.
Further, the server uses the weighted vector average as the first feature vector of the first information, so that the first feature vector better reflects the topic and features of the first information and the correlation between pieces of information can later be calculated more accurately.
Step 505: for each piece of second information in the second information set, obtain at least one keyword of the second information.
Step 506: obtain the word vector corresponding to each keyword.
Step 507: calculate the second feature vector from the at least one word vector.
The process of calculating the second feature vector is similar to that of calculating the first feature vector. For the implementation of steps 505 to 507, refer to steps 502 to 504; details are not repeated here.
Step 508: calculate the cosine distance between the first feature vector and the second feature vector.
Each dimension of a feature vector reflects a feature of the information to some extent, and the dimensions together determine the direction of the feature vector, so the difference in direction between feature vectors can measure the difference between pieces of information. Because the cosine distance reflects the difference in direction between two vectors, the embodiments of the present application use the cosine distance to measure the correlation between pieces of information.
The first feature vector and the second feature vector form an angle, and the cosine of that angle is the cosine distance between the first feature vector and the second feature vector. The cosine distance characterizes the difference in direction between the two vectors and thus measures the difference between the pieces of information; its value lies in the range [-1, 1].
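As a non-limiting illustration, the quantity used in step 508 may be sketched in Python as the cosine of the angle between two vectors, that is, their dot product divided by the product of their lengths. The function name `cosine` is hypothetical, and raising an error for a zero vector is an assumption made here because the cosine is undefined in that case.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


same_direction = cosine([1.0, 2.0], [2.0, 4.0])
perpendicular = cosine([1.0, 0.0], [0.0, 1.0])
opposite = cosine([1.0, 0.0], [-1.0, 0.0])
```

Only direction matters: scaling either vector leaves the result unchanged, which is why `[1.0, 2.0]` and `[2.0, 4.0]` are treated as pointing the same way.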
Step 509: determine the information correlation according to the cosine distance.
The cosine distance is positively correlated with the information correlation: the larger the cosine distance, the higher the correlation between the pieces of information, and a cosine distance of -1 indicates that the pieces of information have no correlation. Optionally, because the cosine distance between two feature vectors may be negative while a correlation is usually positive, the server may map the cosine distance in the range [-1, 1] to an information correlation in the range [0, 1] to simplify subsequent data processing.
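As a non-limiting illustration, one mapping from [-1, 1] to [0, 1] that preserves the positive correlation is the linear mapping (x + 1) / 2. The embodiments do not specify which mapping is used, so the linear form, the clamping of rounding errors, and the function name `to_correlation` are all assumptions.

```python
def to_correlation(cosine_value):
    """Linearly map a cosine value in [-1, 1] to a correlation in [0, 1]."""
    # Floating-point rounding can push a cosine slightly outside [-1, 1]; clamp it.
    clamped = max(-1.0, min(1.0, cosine_value))
    return (clamped + 1.0) / 2.0


lowest = to_correlation(-1.0)
middle = to_correlation(0.0)
highest = to_correlation(1.0)
```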
Step 510: if the information correlation is greater than a correlation threshold, determine that the first information and the second information are recommendation information for each other.
If the information correlation is greater than the correlation threshold, the pieces of information are highly similar and can be recommended to the user as recommendation information.
For the implementation of this step, refer to step 205 above; details are not repeated here.
Step 511: generate, in descending order of information correlation, a first recommendation list corresponding to the first information and a second recommendation list corresponding to the second information.
The first recommendation list contains at least one piece of second information, and the second recommendation list contains at least one piece of first information. The correlation between each piece of second information in the first recommendation list and the first information is greater than the correlation threshold, and the correlation between each piece of first information in the second recommendation list and the second information is greater than the correlation threshold.
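As a non-limiting illustration, steps 510 and 511 may be sketched in Python as follows. The function name `build_recommendation_lists`, the identifiers, the correlation values, and the threshold of 0.7 are all hypothetical.

```python
from collections import defaultdict


def build_recommendation_lists(correlations, threshold):
    """Build both sets of recommendation lists from pairwise correlations.

    correlations maps (first_id, second_id) to the information correlation.
    Returns (first_lists, second_lists); each list holds (other_id, correlation)
    tuples above the threshold, sorted in descending order of correlation.
    """
    first_lists = defaultdict(list)
    second_lists = defaultdict(list)
    for (first_id, second_id), score in correlations.items():
        if score > threshold:
            # The two pieces of information are recommendation information for each other.
            first_lists[first_id].append((second_id, score))
            second_lists[second_id].append((first_id, score))
    for lists in (first_lists, second_lists):
        for items in lists.values():
            items.sort(key=lambda item: item[1], reverse=True)
    return dict(first_lists), dict(second_lists)


# Hypothetical correlations between articles (first information) and videos (second information).
correlations = {
    ("article A", "video 1"): 0.95,
    ("article A", "video 2"): 0.80,
    ("article A", "video 3"): 0.40,
    ("article B", "video 2"): 0.85,
}
first_lists, second_lists = build_recommendation_lists(correlations, threshold=0.7)
```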
The server stores the first recommendation list and the second recommendation list for use in subsequent information recommendation. Optionally, the server stores each generated recommendation list in a Remote Dictionary Server (Redis). Illustratively, the recommendation list corresponding to picture-and-text news A is shown in Table 1.
Table 1
Recommendation information | Correlation
Video news 1 | 0.956
Video news 2 | 0.937
Video news 3 | 0.901
...... | ......
In this embodiment, the server extracts at least one keyword of the information using the TF-IDF algorithm, which avoids the problem that a single keyword cannot adequately reflect the content of the information. In addition, the server calculates the vector average or weighted vector average of the word vectors of the at least one keyword and uses it as the feature vector of the information, so that the feature vector accurately reflects the content of the information. Furthermore, the server determines the correlation between pieces of information according to the cosine distance between vectors, which allows the correlation between pieces of information in different forms of expression to be measured accurately and thus improves the accuracy of the related information recommended to the user.
The above embodiment describes the offline processing performed by the server. When the server needs to recommend information, the online recommendation process is as follows.
In a possible implementation, on the basis of Fig. 5 and as shown in Fig. 10, the following steps are further included after step 511:
Step 1001: receive a recommendation request sent by a terminal, the recommendation request including the information identifier of target information.
In one possible application scenario, when a user reads news with a news reading client and clicks the headline of a picture-and-text news item, the terminal, while requesting the detailed content of that item from the server, also sends the server a recommendation request that includes the information identifier of the picture-and-text news item (that is, the target information). In another possible application scenario, when a user uses a terminal to share picture-and-text information with a terminal used by another user, the terminal sends the server a recommendation request asking the server to add related recommendation information to the shared picture-and-text information; the recommendation request includes the information identifier of the picture-and-text information (that is, the target information).
Step 1002: determine, according to the information identifier, the target recommendation list corresponding to the target information.
The server reads the information identifier in the recommendation request and queries, according to the information identifier, whether a target recommendation list corresponding to the target information exists. If it does not exist, the server returns an empty result to the terminal; if it exists, the server proceeds to step 1003 below.
Step 1003: determine the first n pieces of recommendation information in the target recommendation list as the information to be recommended, where n >= 1 and n is an integer.
Because the target recommendation list may contain a large number of pieces of recommendation information, the server determines the first n pieces of recommendation information in the target recommendation list as the information to be recommended. Optionally, n is 2. It should be noted that if the target recommendation list contains fewer than n pieces of recommendation information, the server determines all the recommendation information in the list as the information to be recommended.
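As a non-limiting illustration, steps 1002 and 1003 may be sketched in Python as follows. An in-memory dictionary stands in for wherever the recommendation lists are actually stored; the function name `information_to_recommend`, the identifiers, and the stored lists are hypothetical.

```python
def information_to_recommend(recommendation_lists, information_id, n=2):
    """Return the first n entries of the target recommendation list.

    Returns an empty list when no list exists for the identifier, and the whole
    list when it holds fewer than n entries.
    """
    if n < 1:
        raise ValueError("n must be an integer of at least 1")
    target_list = recommendation_lists.get(information_id)
    if target_list is None:
        return []
    # Slicing already handles lists shorter than n.
    return target_list[:n]


# Hypothetical stored lists, already sorted in descending order of correlation.
stored = {
    "news-4": ["video-1", "video-2", "video-3"],
    "news-5": ["video-9"],
}
top_two = information_to_recommend(stored, "news-4", n=2)
short_list = information_to_recommend(stored, "news-5", n=2)
missing = information_to_recommend(stored, "news-6", n=2)
```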
Step 1004: push the information to be recommended to the terminal.
In one possible application scenario, when a user reads news with a news reading client and clicks the headline of a picture-and-text news item, the terminal sends a recommendation request to the server, and the server, according to the recommendation request, pushes the detailed content of the picture-and-text news item to the terminal together with the information to be recommended that it found.
Illustratively, as shown in Fig. 11, after the user clicks the headline of picture-and-text news 4, the terminal sends the server a recommendation request containing the information identifier of picture-and-text news 4. According to the recommendation request, the server finds video news 1 and video news 2, which are related to picture-and-text news 4, and pushes the detailed content of picture-and-text news 4 together with video news 1 and video news 2 to the terminal. After receiving the data returned by the server, the terminal displays the titles of video news 1 and video news 2 below the content of picture-and-text news 4 for the user to click and view.
In another possible application scenario, when a user uses a terminal to share a picture-and-text news item with another terminal, the terminal sends a sharing request to the server. After receiving the sharing request, the server sends the shared picture-and-text news item together with the video news related to it to the other terminal, so that after viewing the shared item, the user of the other terminal can go on to view the recommended related video information.
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Referring to Fig. 12, it shows a block diagram of an information recommendation apparatus provided by an embodiment of the present application. The apparatus has the functions of implementing the above method examples; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include:
an information obtaining module 1210, configured to obtain a first information set and a second information set, where the information in the first information set and the information in the second information set differ in form of expression;
a first calculation module 1220, configured to, for each piece of first information in the first information set, calculate a first feature vector of the first information according to keywords of the first information;
a second calculation module 1230, configured to, for each piece of second information in the second information set, calculate a second feature vector of the second information according to keywords of the second information;
a third calculation module 1240, configured to calculate the information correlation between the first information and the second information according to the first feature vector and the second feature vector;
a first determining module 1250, configured to determine, when the information correlation is greater than a correlation threshold, that the first information and the second information are recommendation information for each other.
Optionally, the first calculation module 1220 includes:
a keyword obtaining unit, configured to obtain at least one keyword of the first information;
a word vector obtaining unit, configured to obtain the word vector corresponding to each keyword;
a first calculation unit, configured to calculate the first feature vector from the at least one word vector.
Optionally, the first calculation unit is configured to:
calculate the vector average of the at least one word vector;
determine the vector average as the first feature vector.
Optionally, the first calculation unit is configured to:
determine the word vector weight of the word vector corresponding to each keyword according to the number of times the keyword occurs in the first information;
calculate the weighted vector average of the at least one word vector from the at least one word vector and the word vector weights;
determine the weighted vector average as the first feature vector.
Optionally, the third calculation module includes:
a second calculation unit, configured to calculate the cosine distance between the first feature vector and the second feature vector;
a determining unit, configured to determine the information correlation according to the cosine distance, where the cosine distance is positively correlated with the information correlation.
Optionally, the apparatus further includes:
a generation module, configured to generate, in descending order of the information correlation, a first recommendation list corresponding to the first information and a second recommendation list corresponding to the second information, where the first recommendation list contains at least one piece of the second information and the second recommendation list contains at least one piece of the first information.
Optionally, the apparatus further includes:
a receiving module, configured to receive a recommendation request sent by a terminal, the recommendation request including the information identifier of target information;
a second determining module, configured to determine, according to the information identifier, the target recommendation list corresponding to the target information;
a third determining module, configured to determine the first n pieces of recommendation information in the target recommendation list as the information to be recommended, where n >= 1 and n is an integer;
a pushing module, configured to push the information to be recommended to the terminal.
Referring to Fig. 13, it shows a schematic structural diagram of a server provided by an embodiment of the present application. The server is used to implement the information recommendation method provided by the above embodiments. Specifically:
The server 1800 includes a central processing unit (CPU) 1801, a system memory 1804 including a random access memory (RAM) 1802 and a read-only memory (ROM) 1803, and a system bus 1805 connecting the system memory 1804 and the central processing unit 1801. The server 1800 further includes a basic input/output system (I/O system) 1806 that helps transfer information between the devices within the computer, and a mass storage device 1807 for storing an operating system 1813, application programs 1814, and other program modules 1815.
The basic input/output system 1806 includes a display 1808 for displaying information and an input device 1809, such as a mouse or keyboard, for the user to input information. The display 1808 and the input device 1809 are both connected to the central processing unit 1801 through an input/output controller 1810 connected to the system bus 1805. The basic input/output system 1806 may further include the input/output controller 1810 for receiving and processing input from a keyboard, a mouse, an electronic stylus, or other devices. Similarly, the input/output controller 1810 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1807 is connected to the central processing unit 1801 through a mass storage controller (not shown) connected to the system bus 1805. The mass storage device 1807 and its associated computer-readable medium provide non-volatile storage for the server 1800. That is, the mass storage device 1807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, a person skilled in the art will appreciate that computer storage media are not limited to the above. The system memory 1804 and the mass storage device 1807 described above may be collectively referred to as the memory.
According to various embodiments of the present application, the server 1800 may also run by being connected, through a network such as the Internet, to a remote computer on the network. That is, the server 1800 may be connected to a network 1812 through a network interface unit 1811 connected to the system bus 1805; in other words, the network interface unit 1811 may also be used to connect to other types of networks or remote computer systems.
The memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is configured to be executed by one or more processors to implement the functions of the steps of the above information recommendation method.
The embodiments of the present application further provide a computer-readable storage medium. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the information recommendation method provided by the above embodiments.
Optionally, the computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a solid-state drive (SSD), an optical disc, or the like. The random access memory may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM). The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

Claims (10)

1. An information recommendation method, characterized in that the method comprises:
obtaining a first information set and a second information set, wherein the information in the first information set and the information in the second information set differ in form of expression;
for each piece of first information in the first information set, calculating a first feature vector of the first information according to keywords of the first information;
for each piece of second information in the second information set, calculating a second feature vector of the second information according to keywords of the second information;
calculating an information correlation between the first information and the second information according to the first feature vector and the second feature vector;
if the information correlation is greater than a correlation threshold, determining that the first information and the second information are recommendation information for each other.
2. The method according to claim 1, characterized in that the calculating, for each piece of first information in the first information set, a first feature vector of the first information according to keywords of the first information comprises:
obtaining at least one keyword of the first information;
obtaining the word vector corresponding to each keyword;
calculating the first feature vector from the at least one word vector.
3. The method according to claim 2, characterized in that the calculating the first feature vector from the at least one word vector comprises:
calculating the vector average of the at least one word vector;
determining the vector average as the first feature vector.
4. The method according to claim 2, characterized in that the calculating the first feature vector from the at least one word vector comprises:
determining the word vector weight of the word vector corresponding to each keyword according to the number of times the keyword occurs in the first information;
calculating the weighted vector average of the at least one word vector from the at least one word vector and the word vector weights;
determining the weighted vector average as the first feature vector.
5. The method according to any one of claims 1 to 4, characterized in that the calculating an information correlation between the first information and the second information according to the first feature vector and the second feature vector comprises:
calculating the cosine distance between the first feature vector and the second feature vector;
determining the information correlation according to the cosine distance, wherein the cosine distance is positively correlated with the information correlation.
6. The method according to any one of claims 1 to 4, characterized in that after the determining, if the information correlation is greater than a correlation threshold, that the first information and the second information are recommendation information for each other, the method further comprises:
generating, in descending order of the information correlation, a first recommendation list corresponding to the first information and a second recommendation list corresponding to the second information, wherein the first recommendation list contains at least one piece of the second information and the second recommendation list contains at least one piece of the first information.
7. The method according to claim 6, characterized in that after the generating, in descending order of the information correlation, a first recommendation list corresponding to the first information and a second recommendation list corresponding to the second information, the method further comprises:
receiving a recommendation request sent by a terminal, the recommendation request including the information identifier of target information;
determining, according to the information identifier, the target recommendation list corresponding to the target information;
determining the first n pieces of recommendation information in the target recommendation list as the information to be recommended, wherein n >= 1 and n is an integer;
pushing the information to be recommended to the terminal.
8. An information recommendation device, characterized in that the device comprises:
an information obtaining module, configured to obtain a first information set and a second information set, wherein the information in the first information set and the information in the second information set differ in form of presentation;
a first computing module, configured to calculate, for each piece of first information in the first information set, a first feature vector of the first information according to keywords of the first information;
a second computing module, configured to calculate, for each piece of second information in the second information set, a second feature vector of the second information according to keywords of the second information;
a third computing module, configured to calculate the information correlation between the first information and the second information according to the first feature vector and the second feature vector;
a first determining module, configured to determine that the first information and the second information are recommendation information for each other when the information correlation is greater than the correlation threshold.
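The module pipeline of claim 8 can be illustrated end to end with a small sketch. The claim does not state here how a feature vector is derived from keywords or how correlation is measured, so two assumptions are swapped in: each keyword has a vector in a hypothetical lookup table `embeddings`, a feature vector is the plain average of its keyword vectors, and correlation is cosine similarity. All names are illustrative.

```python
import math


def feature_vector(keywords, embeddings):
    """Average the vectors of the known keywords (assumed aggregation)."""
    vectors = [embeddings[k] for k in keywords if k in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity, used here as the assumed correlation measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_recommendations(first_set, second_set, embeddings, threshold):
    """Return (first_id, second_id, score) for pairs above the threshold.

    first_set / second_set: dicts mapping an identifier to its keyword list.
    """
    # Compute each second feature vector once, then compare against every first one.
    second_vectors = {
        sid: feature_vector(kw, embeddings) for sid, kw in second_set.items()
    }
    pairs = []
    for fid, keywords in first_set.items():
        fvec = feature_vector(keywords, embeddings)
        if fvec is None:
            continue
        for sid, svec in second_vectors.items():
            if svec is None:
                continue
            score = cosine(fvec, svec)
            if score > threshold:
                pairs.append((fid, sid, score))
    return pairs
```

With a toy table in which "football" and "goal" point in nearly the same direction and "recipe" is orthogonal to them, an article tagged with the first two keywords would pair with a video tagged "goal" but not with one tagged "recipe".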
9. A server, characterized in that the server comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set being executed by the processor to implement the information recommendation method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is executed by the processor to implement the information recommendation method according to any one of claims 1 to 7.
CN201810980235.XA 2018-08-27 2018-08-27 Information recommendation method, device, server and storage medium Pending CN110334283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810980235.XA CN110334283A (en) 2018-08-27 2018-08-27 Information recommendation method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810980235.XA CN110334283A (en) 2018-08-27 2018-08-27 Information recommendation method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN110334283A (en) 2019-10-15

Family

ID=68140058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810980235.XA Pending CN110334283A (en) 2018-08-27 2018-08-27 Information recommendation method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN110334283A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400546A (en) * 2020-03-18 2020-07-10 腾讯科技(深圳)有限公司 Video recall method and video recommendation method and device
CN112711716A (en) * 2021-01-25 2021-04-27 广东工业大学 Knowledge graph-based marine industry news pushing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038226A (en) * 2017-03-31 2017-08-11 努比亚技术有限公司 A kind of information recommendation method and the network equipment
CN107807940A (en) * 2016-09-09 2018-03-16 腾讯科技(深圳)有限公司 Information recommendation method and device
CN108197211A (en) * 2017-12-28 2018-06-22 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, server and storage medium
CN108287916A (en) * 2018-02-11 2018-07-17 北京方正阿帕比技术有限公司 A kind of resource recommendation method

Similar Documents

Publication Publication Date Title
US9704185B2 (en) Product recommendation using sentiment and semantic analysis
CN109492772B (en) Method and device for generating information
US10747771B2 (en) Method and apparatus for determining hot event
US20230267348A1 (en) Computer-based systems configured for entity resolution and indexing of entity activity
CN109033408B (en) Information pushing method and device, computer readable storage medium and electronic equipment
CN107871166B (en) Feature processing method and feature processing system for machine learning
US20140089322A1 (en) System And Method for Ranking Creator Endorsements
Yan et al. A unified video recommendation by cross-network user modeling
CN110334283A (en) Information recommendation method, device, server and storage medium
Duan et al. A hybrid intelligent service recommendation by latent semantics and explicit ratings
WO2017028791A1 (en) Public number recommendation method and system
CN107809410B (en) Information filtering method and device
CN112818213A (en) Multimedia service data pushing method, device, equipment and storage medium
EP4116884A2 (en) Method and apparatus for training tag recommendation model, and method and apparatus for obtaining tag
CN117751368A (en) Privacy sensitive neural network training
WO2023284516A1 (en) Information recommendation method and apparatus based on knowledge graph, and device, medium, and product
Lu et al. Exploiting user and business attributes for personalized business recommendation
US20210241040A1 (en) Systems and Methods for Ground Truth Dataset Curation
CN113592315A (en) Method and device for processing dispute order
US11144599B2 (en) Method of and system for clustering documents
He Research on personalized search based on ElasticSearch
Sun et al. QoS prediction for Web service in Mobile Internet environment
JP2008146610A (en) Method of recommendation to user on network, recommendation server, and program
US11989506B2 (en) Systems for database searching and database schemas management and methods of use thereof
US20240054391A1 (en) Privacy-enhanced training and deployment of machine learning models using client-side and server-side data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination