CN114880572B

CN114880572B - Intelligent news client recommendation system

Info

Publication number: CN114880572B
Application number: CN202210564514.4A
Authority: CN
Inventors: 郑创伟; 符捷雯; 陈义飞; 金勇�; 谢志成; 王泳; 陈少彬; 刑谷涛; 罗佩珊
Original assignee: Shenzhen Creative Intelligence Port Technology Co ltd
Current assignee: Shenzhen Creative Intelligence Port Technology Co ltd
Priority date: 2022-05-23
Filing date: 2022-05-23
Publication date: 2023-03-03
Anticipated expiration: 2042-05-23
Also published as: CN114880572A

Abstract

The invention relates to the technical field of content recommendation, and in particular to an intelligent news client recommendation system. The system comprises a local end and a server end. The local end comprises a user interest bubble building unit, which builds interest bubbles of a user based on preset configuration information. Each interest bubble corresponds to one primary classification, which is a defined user interest category; each primary category comprises a plurality of different secondary categories; and each interest bubble comprises an interest center and a plurality of interest category sets. By establishing the interest bubbles, driving their motion through the behavior of the user to update the interest analysis of the user, and generating recommended content through retrieval features, the invention makes content recommendation intelligent and does not depend on the behavior of the user. Because feature retrieval is organized around the interest bubbles, primary and secondary content can be distinguished more clearly during retrieval, and retrieval efficiency is improved.

Description

Intelligent news client recommendation system

Technical Field

The invention belongs to the technical field of content recommendation, and particularly relates to an intelligent verification system for internet news content data.

Background

The personalized recommendation system is a tool for helping users to quickly find useful information, and can provide personalized services for different users so as to meet specific interests and requirements of the users. Unlike search engines, recommendation systems do not require users to provide explicit needs, but rather model the interests of users by analyzing their historical behavior and proactively recommend to users information that can satisfy their interests and needs based on this.

Personalized recommendation systems are used on many kinds of internet websites, including e-commerce, movies and videos, music, and social networks. Sites such as Taobao and Amazon apply personalized recommendation models such as collaborative filtering to predict the commodities a user is likely to be interested in and recommend them. Collaborative filtering (CF) recommends items or information of interest to a user by using the preferences of a community with shared interests and common experience.

The personalized news recommendation system is a recommendation system that recommends news of interest to a user according to the interest characteristics and behaviors of the user. Personalized news recommendation is an extended application of personalized recommendation in the field of news processing: news is automatically recommended to interested users through a recommendation system, which benefits both news websites and their users. Such a system helps a user easily acquire interesting news from the massive information on the Internet and uncovers content the user may be interested in.

At present, the most widely applied collaborative filtering recommendation technology has two modes: user-based collaborative filtering and item-based collaborative filtering. The former mainly comprises three steps: representing the user behavior data; finding the users most similar to the target user with a user similarity calculation method; and predicting the target user's behavior towards items from the behavior of the similar users towards those items, and recommending accordingly. The latter also comprises three steps: representing the item behavior data; calculating the similarity between items with an item similarity calculation method; and recommending to the user the items most similar to the items on which the user has acted.
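The three steps of user-based collaborative filtering described above can be sketched as follows. This is a minimal illustration only: the cosine similarity measure, the `ratings` data, and the function names are assumptions chosen for the example and are not taken from the patent.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse behaviour dicts {item: score}."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, k=2, n=3):
    """Step 2: find the k most similar users. Step 3: score unseen items."""
    neighbours = sorted(
        ((cosine(ratings[target], ratings[u]), u) for u in ratings if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        for item, score in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Step 1: user behaviour represented as {user: {item: implicit score}}.
ratings = {
    "alice": {"sports": 1.0, "tech": 1.0},
    "bob": {"sports": 1.0, "tech": 1.0, "finance": 1.0},
    "carol": {"travel": 1.0},
}
suggestions = recommend("alice", ratings)
```

Here `bob` is the closest neighbour of `alice`, so his unseen item `finance` ranks first in `suggestions`.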

These methods always rely on user similarity or item similarity, and errors in the similarity calculation easily affect the final recommendation result. In addition, building a portrait of the target user requires collecting and calling a large amount of user data, which on the one hand is inefficient and on the other hand requires acquiring more user permissions.

Disclosure of Invention

In view of this, the main object of the present invention is to provide an intelligent news client recommendation system in which interest bubbles are created, the motion of the interest bubbles is driven by the behavior of the user to update the interest analysis of the user, and recommended content is generated through retrieval features. Content recommendation is thereby made intelligent and does not rely on the behavior of the user; during feature retrieval, primary and secondary content are better distinguished on the basis of the interest bubbles, and retrieval efficiency is improved.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

A news client intelligent recommendation system comprises a local end and a server end.

The local end comprises:

a user interest bubble constructing unit, configured to construct interest bubbles of a user based on preset configuration information, wherein each interest bubble corresponds to one primary classification, the primary classification is a defined user interest category, each primary category comprises a plurality of different secondary categories, each interest bubble comprises an interest center and a plurality of interest category sets, the interest category sets surround the interest center as floating interest sets, and the Euclidean distances between the interest centers are equal set values;

a user interest path establishing unit, configured to acquire a complete behavior path of the user within a set time range, the complete behavior path being defined as the starting point, the intermediate point and the end point of the content browsed by the user within the set time range;

a first-level user interest map building unit, configured to perform primary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is primary, find the interest bubbles corresponding to them, count the number of starting points, intermediate points and end points belonging to the same category and their positions in the path, calculate first weight values of the starting point, the intermediate point and the end point using a preset first interest weight calculation model, and push the interest bubbles towards the interest center based on the calculated weight values; and

a secondary user interest map building unit, configured to perform secondary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is secondary, assign them to the interest bubbles of the primary categories to which they belong, and, based on their secondary categories, use an interest retrieval feature generation model to generate the retrieval features corresponding to each interest bubble.

The server end comprises:

a content database, configured to store content;

a retrieval unit, configured to call the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, perform feature retrieval in the content database, and find the content matching the retrieval features; and

a content presentation unit, configured to send the retrieved matching content to the client for presentation.

Further, the first interest weight calculation model is represented by the following formula:

[Formula image not preserved in this text.]

wherein the formula relates the following quantities: the weight value; the number of starting points, intermediate points or end points belonging to the same category of a primary classification; the total number of starting points, intermediate points and end points; the separation distance between a starting point, intermediate point or end point belonging to one category and the starting points, intermediate points or end points of other categories, the separation distance being defined as the number of points lying between the point and the points of other categories; and the weight initial value, which is a set value in the range 100 to 300.
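Since the formula itself is not reproduced in this text, the following sketch only gathers the inputs that the weight model is described as using: the per-category counts, the total number of points, and the separation distances. The path representation and function names are hypothetical, and reading "separation distance" as the number of points lying strictly between two path positions is an assumption.

```python
from collections import Counter

def category_counts(path):
    """Number of path points per primary category, plus the total point count."""
    counts = Counter(category for _, category in path)
    return counts, len(path)

def separation_distances(path, index):
    """Points lying strictly between path[index] and each point of another category."""
    _, own = path[index]
    return [abs(index - j) - 1 for j, (_, cat) in enumerate(path) if cat != own]

# A complete behaviour path: (position label, primary category) per browsed item.
path = [
    ("start", "sports"),
    ("middle", "tech"),
    ("middle", "sports"),
    ("end", "finance"),
]
counts, total = category_counts(path)
start_separations = separation_distances(path, 0)
```

For this path, `counts["sports"]` is 2, `total` is 4, and the starting point has separation distances `[0, 2]` to the `tech` and `finance` points.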

Further, the method for generating the search feature by the interest search feature generation model comprises the following steps: extracting category keywords corresponding to each secondary category; the category keywords are label keywords added during generation of the secondary category; preprocessing each tag keyword in the category keywords and converting the preprocessed tag keywords into word sequences; determining a word vector of each word, and calculating a tag keyword vector of each tag keyword; clustering the label keyword vectors, and dividing the category keywords into a plurality of label keyword subsets; and extracting retrieval features according to the divided tag keyword subsets.

Further, preprocessing each tag keyword in the category keywords and converting it into a word sequence includes: for an English tag keyword, judging whether spaces exist between its words and, if so, splitting the keyword at the spaces into individual words and adding them to the sequence; for a Chinese tag keyword, converting it into a word sequence through word segmentation and/or word pause.

Further, determining a word vector of each word and calculating a tag keyword vector of each tag keyword includes: determining the word vector of each word; and calculating the tag keyword vector of each tag keyword according to the word vectors of the words it contains.

Further, the method for the retrieval unit to sequentially call the retrieval features from near to far according to the distance between the interest bubble and the interest center to perform feature retrieval in the content database and find the content matched with the retrieval comprises the following steps: acquiring retrieval characteristics; extracting core features of the retrieval features from the retrieval features by using a convolutional neural network model, wherein the convolutional neural network model is obtained by training based on historical retrieval features and a training set of historical retrieval data; and retrieving target content of which the core features are matched with the core features of the retrieval features based on the extracted core features of the retrieval features.

Further, the retrieving, based on the extracted core features of the retrieval features, target content whose core features match with the core features of the retrieval features includes: determining a hash bucket mapped by the core feature of the retrieval feature through a hash function; determining the content corresponding to each element existing in the hash bucket as the target content; the existing elements in the hash bucket are obtained by mapping the core features of each content through the hash function in advance, and the core features of each content are extracted from each content through the convolutional neural network model.

Further, the extracting core features of the search features from the search features by using a convolutional neural network model includes: and performing dimensionality reduction on the core features extracted from the retrieval features by using the convolutional neural network model, and taking the core features obtained after dimensionality reduction as the core features of the retrieval features.

Further, the retrieving the target content whose core features are matched with the core features of the retrieval features based on the extracted core features of the retrieval features specifically includes: retrieving target content of which the core features are matched with the core features of the retrieval features from a content retrieval database based on the extracted core features of the retrieval features; the content retrieval database establishes indexes for core features in a mode of combining a locality sensitive hashing algorithm and a distributed system.

Further, when extracting the category keywords corresponding to each secondary category, extracting the category keywords according to the order from near to far of the distance from the primary category to which each secondary category belongs to the interest center.

The intelligent news client recommendation system has the following beneficial effects:

When pushing content, the invention does not construct a user portrait for each user. Instead, it constructs interest bubbles from a single behavior session of the user, which differs from the behavior-based construction used in the prior art. Horizontal and vertical classification are performed on one complete behavior chain of the user, so that content is pushed more accurately and both the efficiency and the accuracy of content pushing are improved.

Drawings

Fig. 1 is a schematic system structure diagram of an intelligent news client recommendation system according to an embodiment of the present invention;

Fig. 2 is a schematic diagram illustrating an interest bubble and an interest center of an intelligent news client recommendation system according to an embodiment of the present invention;

Fig. 3 is a diagram of a news client intelligent recommendation system according to an embodiment of the present invention.

Detailed Description

The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.

Example 1

As shown in Fig. 1, the intelligent news client recommendation system includes a local end and a server end.

The local end comprises:

a user interest bubble constructing unit, configured to construct interest bubbles of a user based on preset configuration information, wherein each interest bubble corresponds to one primary classification, the primary classification is a defined user interest category, each primary category comprises a plurality of different secondary categories, each interest bubble comprises an interest center and a plurality of interest category sets, the interest category sets surround the interest center as floating interest sets, and the Euclidean distances between the interest centers are equal set values;

a user interest path establishing unit, configured to acquire a complete behavior path of the user within a set time range, the complete behavior path being defined as the starting point, the intermediate point and the end point of the content browsed by the user within the set time range;

a first-level user interest map building unit, configured to perform primary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is primary, find the interest bubbles corresponding to them, count the number of starting points, intermediate points and end points belonging to the same category and their positions in the path, calculate first weight values of the starting point, the intermediate point and the end point using a preset first interest weight calculation model, and push the interest bubbles towards the interest center based on the calculated weight values; and

a secondary user interest map building unit, configured to perform secondary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is secondary, assign them to the interest bubbles of the primary categories to which they belong, and, based on their secondary categories, use an interest retrieval feature generation model to generate the retrieval features corresponding to each interest bubble.

The server end comprises:

a content database, configured to store content;

a retrieval unit, configured to call the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, perform feature retrieval in the content database, and find the content matching the retrieval features; and

a content presentation unit, configured to send the retrieved matching content to the client for presentation.
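One way to picture the interest bubbles built from preset configuration and pushed towards the interest center is the sketch below. The `InterestBubble` class, the configuration layout, and especially the linear update rule in `push_towards_centre` are hypothetical; the text does not specify how a weight value translates into movement.

```python
from dataclasses import dataclass, field

@dataclass
class InterestBubble:
    primary_category: str
    distance: float  # current distance from the bubble to the interest centre
    secondary_categories: list = field(default_factory=list)

def build_bubbles(config, initial_distance):
    """Create one bubble per primary category, all at the same set distance."""
    return {
        name: InterestBubble(name, initial_distance, list(subs))
        for name, subs in config.items()
    }

def push_towards_centre(bubble, weight, step=0.01):
    """Assumed update rule: move closer in proportion to the weight, never past 0."""
    bubble.distance = max(0.0, bubble.distance - step * weight)

# Preset configuration: primary category -> secondary categories (illustrative).
config = {"sports": ["football", "tennis"], "tech": ["smartphone"]}
bubbles = build_bubbles(config, initial_distance=10.0)
push_towards_centre(bubbles["sports"], weight=200)
```

After the update the `sports` bubble sits closer to the centre than the untouched `tech` bubble, so its retrieval features would be used earlier.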

Specifically, taking a recommendation method based on click rate estimation as an example, a deep network model is set in the server. For each pair of 'user-content' combinations in the candidate content set, predicting the clicking probability of the user on the content by the deep network model according to the historical clicking behaviors of the user, the semantic features and the context features of the content; then, for the content to be recommended of a certain user, recommending the content ranked at the top n as an information stream to the user according to the sequence from high click probability to low click probability.
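The final ranking step of the click-rate-based method described above can be sketched as follows. The probabilities are placeholder values standing in for the output of the deep network model, which is not shown; `top_n` is a hypothetical helper name.

```python
def top_n(click_probability, n):
    """Return the n content ids with the highest predicted click probability."""
    ranked = sorted(click_probability.items(), key=lambda kv: kv[1], reverse=True)
    return [content_id for content_id, _ in ranked[:n]]

# Placeholder predictions for one user: content id -> click probability.
predicted = {"a": 0.12, "b": 0.57, "c": 0.33, "d": 0.05}
feed = top_n(predicted, 2)
```

With these values `feed` is `["b", "c"]`, the two items most likely to be clicked.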

In the related art, a recommendation algorithm usually selects the information to push according to the interests of the target user, judging the user's degree of interest in a piece of information by analyzing its content. Such methods neglect the user's need to follow current hot events, to read high-quality niche content, and so on, and therefore often suffer from low accuracy.

Example 2

On the basis of the above embodiment, the first interest weight calculation model is represented by the following formula:

[Formula image not preserved in this text.]

wherein the formula relates the following quantities: the weight value; the total number of starting points, intermediate points and end points; and the separation distance between a starting point, intermediate point or end point belonging to one category and the starting points, intermediate points or end points of other categories, the separation distance being defined as the number of points lying between the point and the points of other categories.

Referring to Figs. 2 and 3, the letter symbols in Fig. 2 denote a plurality of interest bubbles. The distance between an interest bubble and the interest center may be positive or negative; when it is negative, its absolute value is used.

FIG. 3 shows the jump chains at each point in the secondary classification.

Example 3

On the basis of the previous embodiment, the method for generating the search feature by the interest search feature generation model comprises the following steps: extracting category keywords corresponding to each secondary category; the category keywords are label keywords added during generation of the secondary category; preprocessing each label keyword in the category keywords and converting the preprocessed label keywords into a word sequence; determining a word vector of each word, and calculating a tag keyword vector of each tag keyword; clustering the label keyword vectors, and dividing the category keywords into a plurality of label keyword subsets; and extracting retrieval features according to the divided tag keyword subsets.
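The clustering step, which divides category keywords into tag keyword subsets, can be illustrated with a small k-means sketch. The choice of k-means, the two-dimensional vectors, and the keyword names are assumptions for the example; the text does not name a clustering algorithm.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if its cluster is empty.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels

# Illustrative tag keyword vectors (made-up values).
keyword_vectors = {
    "football": [0.9, 0.1],
    "smartphone": [0.1, 0.9],
    "basketball": [0.8, 0.2],
    "laptop": [0.2, 0.8],
}
names = list(keyword_vectors)
labels = kmeans([keyword_vectors[name] for name in names], k=2)
subsets = {}
for name, label in zip(names, labels):
    subsets.setdefault(label, []).append(name)
```

The resulting `subsets` group `football` with `basketball` and `smartphone` with `laptop`; retrieval features would then be extracted per subset.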

In particular, conventional methods typically include the steps of text localization, preprocessing (typically including normalization, enhancement and binarization), and OCR character recognition. Each step involves many other complex methods, and each affects the accuracy of the final recognition result. Chen's paper "Automatic detection and recognition of signs from natural scenes" proposes a method for detecting and recognizing signs in images of natural scenes. The method detects text using LoG (Laplacian of Gaussian) edge detection, color modeling, layout analysis and affine rectification, then normalizes the text, and finally recognizes it using grayscale-based OCR (optical character recognition). Koga's paper "Camera-based Kanji OCR for mobile-phones: practical issues" proposes a camera-based Kanji recognition method for mobile phones. The first part of the method comprises four steps: pre-binarization, rough layout analysis, line direction detection and line segmentation. The latter part also comprises four steps: fine binarization, pre-segmentation, character recognition and post-processing. For such OCR-based methods, recognition accuracy is closely tied to text localization and to the quality of the enhanced image.

Example 4

On the basis of the previous embodiment, preprocessing each tag keyword in the category keywords and converting it into a word sequence includes: for an English tag keyword, judging whether spaces exist between its words and, if so, splitting the keyword at the spaces into individual words and adding them to the sequence; for a Chinese tag keyword, converting it into a word sequence through word segmentation and/or word pause.
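A minimal sketch of this preprocessing step is shown below. The function name, the ASCII test used to tell English from Chinese keywords, and the character-level fallback for Chinese are assumptions; a real system would pass in a proper Chinese word segmenter through the hypothetical `segment_chinese` parameter.

```python
def to_word_sequence(tag_keyword, segment_chinese=None):
    """Convert one tag keyword into a word sequence."""
    if " " in tag_keyword:
        # English keyword with spaces between words: split at the spaces.
        return tag_keyword.split()
    if tag_keyword.isascii():
        # Single English word: the sequence has one element.
        return [tag_keyword]
    # Chinese keyword: use the supplied segmenter, else fall back to characters.
    if segment_chinese is not None:
        return list(segment_chinese(tag_keyword))
    return list(tag_keyword)

english = to_word_sequence("machine learning")
single = to_word_sequence("football")
chinese = to_word_sequence("人工智能", segment_chinese=lambda s: [s[:2], s[2:]])
```

The lambda passed for the Chinese keyword is a stand-in segmenter that simply cuts the four-character keyword in half.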

Example 5

On the basis of the previous embodiment, determining a word vector of each word and calculating a tag keyword vector of each tag keyword includes: determining the word vector of each word; and calculating the tag keyword vector of each tag keyword according to the word vectors of the words it contains.
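The text does not say how the word vectors are combined into a tag keyword vector. A common choice, assumed here purely for illustration, is to average them; the `word_vectors` table and its values are made up.

```python
def tag_keyword_vector(words, word_vectors):
    """Average the vectors of the words that have a known word vector."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None  # no word of this keyword has a vector
    return [sum(col) / len(known) for col in zip(*known)]

# Illustrative word vectors (made-up values).
word_vectors = {"machine": [1.0, 0.0], "learning": [0.0, 1.0]}
vector = tag_keyword_vector(["machine", "learning"], word_vectors)
```

Here `vector` is `[0.5, 0.5]`, the component-wise mean of the two word vectors.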

Example 6

On the basis of the previous embodiment, the method for the retrieval unit to sequentially call the retrieval features from near to far according to the distance between the interest bubble and the interest center to perform feature retrieval in the content database, and finding the content matched with the retrieval comprises the following steps: acquiring retrieval characteristics; extracting core features of the retrieval features from the retrieval features by using a convolutional neural network model, wherein the convolutional neural network model is obtained by training based on historical retrieval features and a training set of historical retrieval data; and retrieving target content of which the core features are matched with the core features of the retrieval features based on the extracted core features of the retrieval features.
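The near-to-far calling order can be sketched by sorting the bubbles on their distance to the interest center before handing their retrieval features to the retrieval step. The dictionary layout is hypothetical; taking the absolute value of the distance follows the statement in Example 2 that negative distances are treated by their absolute value.

```python
def retrieval_order(bubbles):
    """Yield retrieval features, nearest bubble to the interest centre first."""
    for bubble in sorted(bubbles, key=lambda b: abs(b["distance"])):
        yield from bubble["features"]

# Illustrative bubbles with their distances and retrieval features.
bubbles = [
    {"category": "finance", "distance": 3.0, "features": ["stocks"]},
    {"category": "sports", "distance": -1.0, "features": ["football", "tennis"]},
    {"category": "tech", "distance": 2.0, "features": ["smartphone"]},
]
ordered = list(retrieval_order(bubbles))
```

The `sports` bubble is nearest (absolute distance 1.0), so its features are retrieved first, followed by `tech` and then `finance`.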

Example 7

On the basis of the above embodiment, the retrieving, based on the extracted core features of the retrieval features, target content whose core features match with the core features of the retrieval features includes: determining a hash bucket mapped by the core feature of the retrieval feature through a hash function; determining the content corresponding to each element existing in the hash bucket as the target content; the existing elements in the hash bucket are obtained by mapping the core features of each content through the hash function in advance, and the core features of each content are extracted from each content through the convolutional neural network model.
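The hash-bucket lookup can be sketched as below. The text does not specify the hash function; sign random projection is assumed here as one possible choice, and the feature vectors stand in for core features that the convolutional neural network would produce. All names are hypothetical.

```python
import random
from collections import defaultdict

def make_hash(dim, bits, seed=0):
    """Return a sign-random-projection hash: vector -> tuple of 0/1 bits."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def hash_fn(vector):
        return tuple(
            int(sum(p * x for p, x in zip(plane, vector)) >= 0) for plane in planes
        )

    return hash_fn

def build_buckets(content_features, hash_fn):
    """Map every content's core feature into its hash bucket in advance."""
    buckets = defaultdict(list)
    for content_id, feature in content_features.items():
        buckets[hash_fn(feature)].append(content_id)
    return buckets

def lookup(query_feature, buckets, hash_fn):
    """Return the content ids sharing the query's bucket."""
    return buckets.get(hash_fn(query_feature), [])

content_features = {"n1": [1.0, 2.0, 3.0], "n2": [-1.0, -2.0, -3.0]}
hash_fn = make_hash(dim=3, bits=4)
buckets = build_buckets(content_features, hash_fn)
matches = lookup([2.0, 4.0, 6.0], buckets, hash_fn)
```

Because the sign of a projection does not change when a vector is scaled by a positive factor, the query lands in the same bucket as `n1`, while the opposite-pointing `n2` falls in a different bucket.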

Example 8

On the basis of the above embodiment, the extracting core features of the search features from the search features by using a convolutional neural network model includes: and performing dimension reduction on the core features extracted from the retrieval features by using the convolutional neural network model, and taking the core features obtained after the dimension reduction as the core features of the retrieval features.
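The dimensionality reduction method is not named in the text. As one possible stand-in, the sketch below uses a Gaussian random projection; the function name, dimensions, and seed are assumptions, and other techniques (for example PCA) would fit the description equally well.

```python
import random

def random_projection(dim_in, dim_out, seed=0):
    """Return a function that projects dim_in-vectors down to dim_out dimensions."""
    rng = random.Random(seed)
    scale = dim_out ** -0.5  # keeps expected squared length roughly unchanged
    matrix = [
        [rng.gauss(0, 1) * scale for _ in range(dim_in)] for _ in range(dim_out)
    ]

    def reduce(vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    return reduce

reduce = random_projection(dim_in=8, dim_out=3)
reduced = reduce([1.0] * 8)
```

The 8-dimensional input is mapped to a 3-dimensional vector, which would then serve as the core feature used for hashing and matching.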

Example 9

On the basis of the previous embodiment, the retrieving, based on the extracted core feature of the retrieval feature, target content of which the core feature is matched with the core feature of the retrieval feature specifically includes: based on the extracted core features of the retrieval features, retrieving target content of which the core features are matched with the core features of the retrieval features from a content retrieval database; the content retrieval database establishes indexes for core features in a mode of combining a locality sensitive hashing algorithm and a distributed system.
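Combining locality sensitive hashing with a distributed system could, for instance, mean spreading the hash buckets over several index nodes. The sketch below simulates this in a single process: `ShardedIndex` and `shard_for` are hypothetical names, each shard stands in for one node, and the bucket keys are assumed to be bit tuples produced by a locality sensitive hash.

```python
import hashlib

def shard_for(bucket_key, num_shards):
    """Pick a shard from a SHA-256 digest so the mapping is identical on every node."""
    digest = hashlib.sha256(repr(bucket_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedIndex:
    """Bucket index split across shards; each shard stands in for one node."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, bucket_key, content_id):
        shard = self.shards[shard_for(bucket_key, len(self.shards))]
        shard.setdefault(bucket_key, []).append(content_id)

    def query(self, bucket_key):
        shard = self.shards[shard_for(bucket_key, len(self.shards))]
        return shard.get(bucket_key, [])

index = ShardedIndex(num_shards=4)
index.add((1, 0, 1), "n1")
index.add((1, 0, 1), "n2")
index.add((0, 0, 1), "n3")
hits = index.query((1, 0, 1))
```

A query only touches the one shard that owns its bucket, so `hits` is `["n1", "n2"]` without scanning the other shards.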

Example 10

On the basis of the previous embodiment, when extracting the category keywords corresponding to each secondary category, the extraction is performed according to the order from near to far of the distance from the primary category to which each secondary category belongs to the interest center.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

It should be noted that, the system provided in the foregoing embodiment is only illustrated by dividing the functional units, and in practical applications, the functions may be distributed by different functional units according to needs, that is, the units or steps in the embodiments of the present invention are further decomposed or combined, for example, the units in the foregoing embodiment may be combined into one unit, or may be further separated into multiple sub-units, so as to complete the functions of the whole unit or the unit described above. The names of the units and steps involved in the embodiments of the present invention are only for distinguishing the units or steps, and are not to be construed as unduly limiting the present invention.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage module and the processing module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Those of skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the elements and method steps may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/module that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/module.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions after such changes or substitutions fall within the protection scope of the invention.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims

1. A news client intelligent recommendation system, the system comprising a local end and a server end, characterized in that the local end comprises:

a user interest bubble building unit, configured to build interest bubbles of a user based on preset configuration information, wherein each interest bubble corresponds to one primary classification, the primary classification is a defined user interest category, each primary category comprises a plurality of different secondary categories, each interest bubble comprises an interest center and a plurality of interest category sets, the interest category sets surround the interest center as floating interest sets, and the Euclidean distances between the interest centers are equal set values;

a user interest path establishing unit, configured to acquire a complete behavior path of the user within a set time range, the complete behavior path being defined as the starting point, the intermediate point and the end point of the content browsed by the user within the set time range;

a first-level user interest map building unit, configured to perform primary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is primary, find the interest bubbles corresponding to them, count the number of starting points, intermediate points and end points belonging to the same category and their positions in the path, calculate first weight values of the starting point, the intermediate point and the end point using a preset first interest weight calculation model, and push the interest bubbles towards the interest center based on the calculated weight values; and

a secondary user interest map building unit, configured to perform secondary classification on the starting point, the intermediate point and the end point in the complete behavior path, find the starting point, the intermediate point and the end point whose classification level is secondary, assign them to the interest bubbles of the primary categories to which they belong, and, based on their secondary categories, use an interest retrieval feature generation model to generate the retrieval features corresponding to each interest bubble;

and the server end comprises:

a content database, configured to store content;

a retrieval unit, configured to call the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, perform feature retrieval in the content database, and find the content matching the retrieval features; and

a content presentation unit, configured to send the retrieved matching content to the client for presentation.

2. The system of claim 1, wherein the first interest weight calculation model is represented using the formula:

[formula image not reproduced in the source text]

wherein the symbols of the formula (likewise not reproduced) denote, in order: the weight value; the number of starting points, intermediate points or end points that belong to the same category of a primary classification; and the total number of starting points, intermediate points and end points.

3. The system of claim 1, wherein the method by which the interest retrieval feature generation model generates retrieval features comprises: extracting the category keywords corresponding to each secondary category, the category keywords being the tag keywords added when the secondary category was generated; preprocessing each tag keyword in the category keywords and converting it into a word sequence; determining a word vector of each word, and calculating a tag keyword vector of each tag keyword; clustering the tag keyword vectors, and dividing the category keywords into a plurality of tag keyword subsets; and extracting retrieval features according to the divided tag keyword subsets.

4. The system of claim 3, wherein preprocessing each tag keyword in the category keywords and converting it into a word sequence comprises: for an English tag keyword, judging whether spaces exist between its words and, if so, segmenting the keyword into words and adding them to the sequence; for a Chinese tag keyword, converting the keyword into a word sequence through word segmentation and/or stop-word removal.

5. The system of claim 4, wherein determining a word vector of each word and calculating a tag keyword vector of each tag keyword comprises: determining the word vector of each word; and calculating the tag keyword vector of each tag keyword according to the word vectors of its words.

6. The system of claim 1, wherein the method by which the retrieval unit sequentially calls the retrieval features, from near to far according to the distance between the interest bubble and the interest center, to perform feature retrieval in the content database and find the content matching the retrieval comprises: acquiring the retrieval features; extracting core features of the retrieval features from the retrieval features by using a convolutional neural network model, wherein the convolutional neural network model is obtained by training on a training set of historical retrieval features and historical retrieval data; and retrieving, based on the extracted core features of the retrieval features, target content whose core features match the core features of the retrieval features.

7. The system of claim 6, wherein retrieving, based on the extracted core features of the retrieval features, target content whose core features match the core features of the retrieval features comprises: determining, through a hash function, the hash bucket to which the core features of the retrieval features are mapped; and determining the content corresponding to each element existing in the hash bucket as the target content; wherein the elements existing in the hash bucket are obtained in advance by mapping the core features of each content through the hash function, and the core features of each content are extracted from that content by the convolutional neural network model.

8. The system of claim 7, wherein extracting core features of the retrieval features from the retrieval features using a convolutional neural network model comprises: performing dimensionality reduction on the core features extracted from the retrieval features by the convolutional neural network model, and taking the core features obtained after dimensionality reduction as the core features of the retrieval features.

9. The system of claim 8, wherein retrieving, based on the extracted core features of the retrieval features, target content whose core features match the core features of the retrieval features specifically comprises: retrieving the target content from a content retrieval database based on the extracted core features of the retrieval features; wherein the content retrieval database builds its index of core features by combining a locality-sensitive hashing algorithm with a distributed system.

10. The system of claim 3, wherein the category keywords corresponding to each secondary category are extracted in order, from nearest to farthest, of the distance from the interest center to the primary category to which each secondary category belongs.
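
As an illustration only, the near-to-far ordering recited in claims 1, 6 and 10 can be sketched as follows. The class `InterestBubble`, the functions `push_toward_center` and `retrieval_order`, the linear push rule and the `step` parameter are all hypothetical names and assumptions introduced for this sketch; the claims do not specify how a weight value translates into movement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InterestBubble:
    """One bubble per primary category (hypothetical structure)."""
    primary_category: str
    distance: float  # current distance from the interest center
    retrieval_features: List[str] = field(default_factory=list)


def push_toward_center(bubble: InterestBubble, weight: float, step: float = 1.0) -> None:
    """Move a bubble closer to the center.

    Assumption: the displacement is step * weight, clamped at zero.
    The source does not define this rule.
    """
    bubble.distance = max(0.0, bubble.distance - step * weight)


def retrieval_order(bubbles: List[InterestBubble]) -> List[Tuple[str, List[str]]]:
    """Return (category, features) pairs from the nearest bubble to the farthest."""
    ordered = sorted(bubbles, key=lambda b: b.distance)
    return [(b.primary_category, b.retrieval_features) for b in ordered]


# All bubbles start at the same preset distance (here an arbitrary 10.0).
bubbles = [
    InterestBubble("sports", 10.0, ["football"]),
    InterestBubble("finance", 10.0, ["stocks"]),
    InterestBubble("technology", 10.0, ["chips"]),
]
push_toward_center(bubbles[1], weight=0.5, step=5.0)   # finance moves to 7.5
push_toward_center(bubbles[2], weight=0.25, step=5.0)  # technology moves to 8.75
for category, features in retrieval_order(bubbles):
    print(category, features)
```

Sorting by current distance means the retrieval unit consults the bubbles the user's behavior has pushed closest to the center first, which is one plausible reading of "from near to far".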
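
The keyword pipeline of claims 3 to 5 (word sequence, word vectors, tag keyword vector, clustering into subsets) might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the toy table `WORD_VECTORS`, averaging as the way to combine word vectors, greedy cosine-threshold clustering, and the per-character split that stands in for a real Chinese word segmenter. The claims name none of these choices.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "football": [1.0, 0.0],
    "league": [0.9, 0.1],
    "match": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}


def to_word_sequence(tag):
    """Convert a tag keyword into a word sequence."""
    if " " in tag:
        return tag.lower().split()  # space-separated (English) keyword
    if tag.isascii():
        return [tag.lower()]        # single English word
    return list(tag)                # placeholder for a real Chinese segmenter


def tag_vector(tag):
    """Average the vectors of the known words in a tag (averaging is assumed)."""
    vectors = [WORD_VECTORS[w] for w in to_word_sequence(tag) if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def cluster_tags(tags, threshold=0.8):
    """Greedy clustering: a tag joins the first subset whose seed is similar enough."""
    subsets = []  # list of (seed_vector, member_tags)
    for tag in tags:
        vec = tag_vector(tag)
        if vec is None:
            continue  # no known words, so no vector to cluster
        for seed, members in subsets:
            if cosine(seed, vec) >= threshold:
                members.append(tag)
                break
        else:
            subsets.append((vec, [tag]))
    return [members for _, members in subsets]


print(cluster_tags(["football league", "stock market", "football match", "stock"]))
```

Each resulting subset of tag keywords would then be the input from which retrieval features are extracted; how that extraction is done is not specified in the claims and is not attempted here.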
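
The hash-bucket lookup of claim 7, with the locality-sensitive hashing mentioned in claim 9, can be illustrated with a small in-memory index. This is a minimal sketch under stated assumptions: random-hyperplane hashing is chosen here as one common LSH family, the class `HashBucketIndex` and its methods are hypothetical, plain lists of floats stand in for the core features that the claims obtain from a convolutional neural network, and the distributed aspect of claim 9 is not modeled.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes in `dim` dimensions (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def bucket_key(vector, hyperplanes):
    """Hash a vector to a bucket: one bit per hyperplane, set by the side it falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


class HashBucketIndex:
    """Hypothetical index mapping core-feature vectors to buckets of content ids."""

    def __init__(self, dim, bits=8, seed=0):
        self.hyperplanes = make_hyperplanes(dim, bits, seed)
        self.buckets = defaultdict(list)

    def add(self, content_id, core_feature):
        """Pre-map a content item's core feature into its bucket."""
        self.buckets[bucket_key(core_feature, self.hyperplanes)].append(content_id)

    def query(self, core_feature):
        """Return every content id stored in the bucket the query maps to."""
        key = bucket_key(core_feature, self.hyperplanes)
        return list(self.buckets.get(key, []))


index = HashBucketIndex(dim=3)
index.add("article-1", [1.0, 2.0, 3.0])
# A positively scaled copy of a vector lies on the same side of every hyperplane.
print(index.query([2.0, 4.0, 6.0]))
```

Because the bucket key depends only on which side of each hyperplane a vector falls, vectors pointing in similar directions tend to share a bucket, so a query inspects one bucket instead of scanning the whole content database.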