Disclosure of Invention
In view of this, the main object of the present invention is to provide an intelligent news client recommendation system in which interest bubbles are created, the user's behaviors drive the motion of the interest bubbles and thereby change the analysis of the user's interests, and the recommended content is generated by feature retrieval. Content recommendation thus becomes intelligent and does not rely on user behavior; during feature retrieval, the interest bubbles allow primary and secondary interests to be better distinguished, which improves retrieval efficiency.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A news client intelligent recommendation system, the system comprising a local end and a server end.
The local end comprises:
a user interest bubble constructing unit, configured to construct user interest bubbles based on preset configuration information, wherein each interest bubble corresponds to a primary category, the primary category is a defined user interest category, each primary category comprises a plurality of different secondary categories, each interest bubble comprises an interest center and a plurality of interest category sets, the interest category sets surround the interest center as floating interest sets, and the Euclidean distance between each interest category set and the interest center is the same set value;
a user interest path establishing unit, configured to acquire a complete behavior path of a user within a set time range, the complete behavior path of the user being defined as the starting point, intermediate points and end point of the content browsed by the user within the set time range;
a primary user interest map building unit, configured to perform primary classification on the starting point, intermediate points and end point in the complete behavior path of the user, identify the starting point, intermediate points and end point whose classification level is primary, find the interest bubbles corresponding to them, count the number of starting points, intermediate points and end points belonging to the same category and their positions in the path, calculate first weight values of the starting point, intermediate points and end point using a preset first interest weight calculation model, and push the interest bubbles toward the interest center based on the calculated weight values; and
a secondary user interest map building unit, configured to perform secondary classification on the starting point, intermediate points and end point in the complete behavior path of the user, identify the starting point, intermediate points and end point whose classification level is secondary, assign them to the interest bubbles of the primary categories to which they belong, and, based on their secondary categories, use an interest retrieval feature generation model to generate the retrieval features corresponding to each interest bubble.
The server end comprises:
a content database, configured to store content;
a retrieval unit, configured to call the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, to perform feature retrieval in the content database and find the matching content; and
a content presentation unit, configured to send the retrieved matching content to the client for presentation.
Further, the first interest weight calculation model is represented by a formula [formula not reproduced in the source text], wherein the quantities involved are:
the weight value;
the number of starting points, intermediate points or end points belonging to the same primary category;
the total number of starting points, intermediate points and end points;
the separation distance between a starting point, intermediate point or end point belonging to one category and the starting points, intermediate points or end points of other categories, the separation distance being defined as the number of points lying between that point and the points of different categories; and
the initial weight value, which is a set value in the range of 100 to 300.
Further, the method by which the interest retrieval feature generation model generates the retrieval features comprises the following steps: extracting the category keywords corresponding to each secondary category, the category keywords being the tag keywords added when the secondary category was generated; preprocessing each tag keyword in the category keywords and converting it into a word sequence; determining a word vector of each word and calculating a tag keyword vector of each tag keyword; clustering the tag keyword vectors to divide the category keywords into a plurality of tag keyword subsets; and extracting the retrieval features from the divided tag keyword subsets.
Further, preprocessing each tag keyword in the category keywords and converting it into a word sequence comprises: for an English tag keyword, judging whether there are spaces between its words and, if so, splitting it into words and adding them to the sequence; for a Chinese tag keyword, converting it into a word sequence through word segmentation and/or stop-word removal.
Further, determining a word vector of each word and calculating a tag keyword vector of each tag keyword comprises: determining the word vector of each word; and calculating the tag keyword vector of each tag keyword from the word vectors of its words.
Further, the method by which the retrieval unit calls the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, to perform feature retrieval in the content database and find the matching content comprises the following steps: acquiring the retrieval features; extracting core features of the retrieval features using a convolutional neural network model, wherein the convolutional neural network model is obtained by training on a training set of historical retrieval features and historical retrieval data; and, based on the extracted core features of the retrieval features, retrieving the target content whose core features match them.
Further, the retrieving, based on the extracted core features of the retrieval features, target content whose core features match with the core features of the retrieval features includes: determining a hash bucket mapped by the core feature of the retrieval feature through a hash function; determining the content corresponding to each element existing in the hash bucket as the target content; the existing elements in the hash bucket are obtained by mapping the core features of each content through the hash function in advance, and the core features of each content are extracted from each content through the convolutional neural network model.
Further, the extracting core features of the search features from the search features by using a convolutional neural network model includes: and performing dimensionality reduction on the core features extracted from the retrieval features by using the convolutional neural network model, and taking the core features obtained after dimensionality reduction as the core features of the retrieval features.
Further, the retrieving the target content whose core features are matched with the core features of the retrieval features based on the extracted core features of the retrieval features specifically includes: retrieving target content of which the core features are matched with the core features of the retrieval features from a content retrieval database based on the extracted core features of the retrieval features; the content retrieval database establishes indexes for core features in a mode of combining a locality sensitive hashing algorithm and a distributed system.
Further, when extracting the category keywords corresponding to each secondary category, extracting the category keywords according to the order from near to far of the distance from the primary category to which each secondary category belongs to the interest center.
The intelligent news client recommendation system has the following beneficial effects:
When pushing content, the present invention does not construct a user portrait for each user; instead, it constructs interest bubbles from a single session of user behavior. This behavior-based construction differs from the prior art: horizontal and vertical classification are performed over one complete behavior chain of the user, so that content is pushed more accurately, improving both the efficiency and the accuracy of content pushing.
Detailed Description
The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.
Example 1
As shown in FIG. 1, the intelligent news client recommendation system comprises a local end and a server end.
The local end comprises:
a user interest bubble constructing unit, configured to construct user interest bubbles based on preset configuration information, wherein each interest bubble corresponds to a primary category, the primary category is a defined user interest category, each primary category comprises a plurality of different secondary categories, each interest bubble comprises an interest center and a plurality of interest category sets, the interest category sets surround the interest center as floating interest sets, and the Euclidean distance between each interest category set and the interest center is the same set value;
a user interest path establishing unit, configured to acquire a complete behavior path of a user within a set time range, the complete behavior path of the user being defined as the starting point, intermediate points and end point of the content browsed by the user within the set time range;
a primary user interest map building unit, configured to perform primary classification on the starting point, intermediate points and end point in the complete behavior path of the user, identify the starting point, intermediate points and end point whose classification level is primary, find the interest bubbles corresponding to them, count the number of starting points, intermediate points and end points belonging to the same category and their positions in the path, calculate first weight values of the starting point, intermediate points and end point using a preset first interest weight calculation model, and push the interest bubbles toward the interest center based on the calculated weight values; and
a secondary user interest map building unit, configured to perform secondary classification on the starting point, intermediate points and end point in the complete behavior path of the user, identify the starting point, intermediate points and end point whose classification level is secondary, assign them to the interest bubbles of the primary categories to which they belong, and, based on their secondary categories, use an interest retrieval feature generation model to generate the retrieval features corresponding to each interest bubble.
The server end comprises:
a content database, configured to store content;
a retrieval unit, configured to call the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, to perform feature retrieval in the content database and find the matching content; and
a content presentation unit, configured to send the retrieved matching content to the client for presentation.
Specifically, taking a recommendation method based on click-through rate estimation as an example, a deep network model is deployed on the server. For each "user-content" pair in the candidate content set, the deep network model predicts the probability that the user clicks on the content from the user's historical click behavior and the semantic and context features of the content. Then, for the content to be recommended to a given user, the top n items, ranked from highest to lowest click probability, are recommended to the user as an information stream.
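The ranking step described above can be sketched as follows. This is a minimal illustration only: the function name `recommend_top_n`, the toy predictor `toy_predict` and the fixed probabilities in `TOY_PROBS` are hypothetical stand-ins for the deep network model, which the text does not specify.

```python
from typing import Callable, List

def recommend_top_n(user_id: str,
                    candidates: List[str],
                    predict_click_prob: Callable[[str, str], float],
                    n: int) -> List[str]:
    """Score every user-content pair and return the n items with the highest click probability."""
    scored = [(content_id, predict_click_prob(user_id, content_id))
              for content_id in candidates]
    # Sort from high to low predicted click probability.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [content_id for content_id, _ in scored[:n]]

# Hypothetical stand-in for the deep network model: fixed probabilities per item.
TOY_PROBS = {"a": 0.10, "b": 0.80, "c": 0.45}

def toy_predict(user_id: str, content_id: str) -> float:
    return TOY_PROBS[content_id]

top_two = recommend_top_n("u1", ["a", "b", "c"], toy_predict, 2)
```

In a real deployment the predictor would also take the semantic and context features mentioned above; here it is reduced to a lookup so that only the ranking logic is shown.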
In the related art, a recommendation algorithm usually selects the information to push according to the interests of the target user and judges the user's degree of interest in the information by analyzing its content. Such a recommendation method, however, neglects the user's need to follow current hot events, read high-quality niche content and the like, and usually suffers from low accuracy.
Example 2
On the basis of the above embodiment, the first interest weight calculation model is represented by a formula [formula not reproduced in the source text], wherein the quantities involved are:
the weight value;
the number of starting points, intermediate points or end points belonging to the same primary category;
the total number of starting points, intermediate points and end points;
the separation distance between a starting point, intermediate point or end point belonging to one category and the starting points, intermediate points or end points of other categories, the separation distance being defined as the number of points lying between that point and the points of different categories; and
the initial weight value, which is a set value in the range of 100 to 300.
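The inputs of the weight model listed above (per-category count, total number of points, and separation distance) can be computed from a behavior path as in the sketch below. It does not implement the weight formula itself, which is not reproduced in this text. The function names are hypothetical, the path is reduced to a list of primary category labels, and the separation distance is taken, as an assumption, to be the number of points lying strictly between a point and the nearest point of a different category.

```python
from collections import Counter
from typing import List, Optional

def category_counts(path: List[str]) -> Counter:
    """Number of path points (start, intermediate, end) per primary category."""
    return Counter(path)

def separation_distance(path: List[str], index: int) -> Optional[int]:
    """Points lying strictly between path[index] and the nearest point of another category.

    Returns None when every point in the path has the same category.
    """
    own_category = path[index]
    best = None
    for j, category in enumerate(path):
        if category != own_category:
            gap = abs(j - index) - 1  # points strictly between the two positions
            if best is None or gap < best:
                best = gap
    return best

# Hypothetical behavior path: primary category of each browsed item, in order.
path = ["sports", "sports", "finance", "sports", "tech"]
counts = category_counts(path)   # per-category number of points
total = len(path)                # total number of points in the path
```

For the path above, the first "sports" point is separated from the nearest point of another category ("finance") by one intermediate point, while the second "sports" point is adjacent to it.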
Referring to FIGS. 2 and 3, the letter symbols in FIG. 2 denote a plurality of interest bubbles. The distance between an interest bubble and the interest center may be positive or negative; when the distance is negative, its absolute value is taken.
FIG. 3 shows the jump chains at each point in the secondary classification.
Example 3
On the basis of the previous embodiment, the method by which the interest retrieval feature generation model generates the retrieval features comprises the following steps: extracting the category keywords corresponding to each secondary category, the category keywords being the tag keywords added when the secondary category was generated; preprocessing each tag keyword in the category keywords and converting it into a word sequence; determining a word vector of each word and calculating a tag keyword vector of each tag keyword; clustering the tag keyword vectors to divide the category keywords into a plurality of tag keyword subsets; and extracting the retrieval features from the divided tag keyword subsets.
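The clustering step above can be illustrated with the following sketch. The text does not name a clustering algorithm, so a simple greedy grouping by cosine similarity is used here as a stand-in; the function names, the similarity threshold of 0.8 and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_keywords(vectors: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> List[List[str]]:
    """Greedy clustering: a keyword joins the first subset whose representative is similar enough."""
    clusters: List[Tuple[Sequence[float], List[str]]] = []
    for keyword, vector in vectors.items():
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(keyword)
                break
        else:
            # No sufficiently similar subset exists yet, so start a new one.
            clusters.append((vector, [keyword]))
    return [members for _, members in clusters]

# Hypothetical tag keyword vectors (real ones would come from a word-embedding model).
toy_vectors = {
    "football": (1.0, 0.1),
    "basketball": (0.9, 0.2),
    "stocks": (0.1, 1.0),
    "bonds": (0.2, 0.9),
}
subsets = cluster_keywords(toy_vectors)
```

Each resulting subset groups tag keywords that point in a similar direction in the embedding space; the retrieval features would then be extracted per subset.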
In particular, conventional methods typically include the steps of text localization, preprocessing (typically normalization, enhancement and binarization), and OCR character recognition. Each step involves many other complex methods, each of which affects the accuracy of the final recognition result. Chen's paper "Automatic detection and recognition of signs from natural scenes" proposes a method for detecting and recognizing signs in images of natural scenes. The method detects text using LoG (Laplacian of Gaussian) edge detection, color modeling, layout analysis and affine rectification, then normalizes the text, and finally recognizes it using gray-level-based OCR (optical character recognition). Koga's paper "Camera-based Kanji OCR for mobile-phones: practical issues" proposes a camera-based Kanji (Chinese character) recognition method for mobile phones. The first part of the method comprises four steps: pre-binarization, rough layout analysis, line direction detection and line segmentation. The latter part also comprises four steps: fine binarization, pre-segmentation, character recognition and post-processing. In such OCR-based methods, the recognition accuracy depends closely on the text localization and on the quality of the enhanced image.
Example 4
On the basis of the previous embodiment, preprocessing each tag keyword in the category keywords and converting it into a word sequence comprises: for an English tag keyword, judging whether there are spaces between its words and, if so, splitting it into words and adding them to the sequence; for a Chinese tag keyword, converting it into a word sequence through word segmentation and/or stop-word removal.
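A minimal sketch of this preprocessing step follows. The function names are hypothetical. Chinese word segmentation normally requires a dedicated segmenter, which the text does not name, so the sketch accepts one as an optional callable and otherwise falls back to a per-character split purely as a placeholder.

```python
from typing import Callable, List, Optional

def contains_cjk(text: str) -> bool:
    """True if the text contains a character from the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def to_word_sequence(tag_keyword: str,
                     segment_chinese: Optional[Callable[[str], List[str]]] = None
                     ) -> List[str]:
    """Convert one tag keyword into a word sequence."""
    if contains_cjk(tag_keyword):
        if segment_chinese is not None:
            return list(segment_chinese(tag_keyword))
        # Placeholder only: one "word" per character when no segmenter is supplied.
        return [ch for ch in tag_keyword if not ch.isspace()]
    # English keyword: split on spaces; a keyword without spaces stays a single word.
    return tag_keyword.split()

english_sequence = to_word_sequence("machine learning")
single_word_sequence = to_word_sequence("football")
```

A real segmenter can be plugged in through `segment_chinese` without changing the English branch.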
Example 5
On the basis of the previous embodiment, determining a word vector of each word and calculating a tag keyword vector of each tag keyword comprises: determining the word vector of each word; and calculating the tag keyword vector of each tag keyword from the word vectors of its words.
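One common way to derive a tag keyword vector from its word vectors is to average them, as sketched below. The text only states that the keyword vector is calculated from the word vectors, so averaging is an assumption, and the function name and the toy embedding table are hypothetical.

```python
from typing import Dict, List, Sequence

def tag_keyword_vector(words: List[str],
                       word_vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Average the vectors of the words that have an entry in the embedding table."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        raise ValueError("no word vector available for any word of the tag keyword")
    dimension = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dimension)]

# Hypothetical two-dimensional embedding table.
toy_word_vectors = {"machine": [1.0, 0.0], "learning": [0.0, 1.0]}
keyword_vector = tag_keyword_vector(["machine", "learning"], toy_word_vectors)
```

Words missing from the table are skipped rather than treated as zero vectors, so they do not pull the average toward the origin.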
Example 6
On the basis of the previous embodiment, the method by which the retrieval unit calls the retrieval features sequentially, from near to far according to the distance between each interest bubble and the interest center, to perform feature retrieval in the content database and find the matching content comprises the following steps: acquiring the retrieval features; extracting core features of the retrieval features using a convolutional neural network model, wherein the convolutional neural network model is obtained by training on a training set of historical retrieval features and historical retrieval data; and, based on the extracted core features of the retrieval features, retrieving the target content whose core features match them.
Example 7
On the basis of the above embodiment, the retrieving, based on the extracted core features of the retrieval features, target content whose core features match with the core features of the retrieval features includes: determining a hash bucket mapped by the core feature of the retrieval feature through a hash function; determining the content corresponding to each element existing in the hash bucket as the target content; the existing elements in the hash bucket are obtained by mapping the core features of each content through the hash function in advance, and the core features of each content are extracted from each content through the convolutional neural network model.
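The hash-bucket lookup described above can be sketched as follows. The text does not specify the hash function, so a sign-of-projection hash over a few fixed hyperplanes is used as an illustrative choice; the hyperplanes, function names and two-dimensional feature vectors are hypothetical, and the core features are assumed to have been extracted already.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fixed hyperplanes; a real system would choose or learn these.
HYPERPLANES = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]

def hash_bucket(feature: Sequence[float]) -> Tuple[int, ...]:
    """Map a core feature to a bucket key: one bit per hyperplane (which side the feature lies on)."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, feature)) >= 0 else 0
        for plane in HYPERPLANES
    )

def build_index(content_features: Dict[str, Sequence[float]]
                ) -> Dict[Tuple[int, ...], List[str]]:
    """Pre-map the core feature of every content item into its hash bucket."""
    index: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for content_id, feature in content_features.items():
        index[hash_bucket(feature)].append(content_id)
    return index

def retrieve(index: Dict[Tuple[int, ...], List[str]],
             query_feature: Sequence[float]) -> List[str]:
    """Return the content whose elements share the bucket of the query's core feature."""
    return list(index.get(hash_bucket(query_feature), []))

toy_content = {
    "article_1": (0.9, 0.2),
    "article_2": (0.8, 0.1),
    "article_3": (-0.5, 0.7),
}
toy_index = build_index(toy_content)
matches = retrieve(toy_index, (0.7, 0.3))
```

Because the content features are hashed in advance, a query costs one hash computation and one dictionary lookup instead of a comparison against every stored item.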
Example 8
On the basis of the above embodiment, the extracting core features of the search features from the search features by using a convolutional neural network model includes: and performing dimension reduction on the core features extracted from the retrieval features by using the convolutional neural network model, and taking the core features obtained after the dimension reduction as the core features of the retrieval features.
Example 9
On the basis of the previous embodiment, the retrieving, based on the extracted core feature of the retrieval feature, target content of which the core feature is matched with the core feature of the retrieval feature specifically includes: based on the extracted core features of the retrieval features, retrieving target content of which the core features are matched with the core features of the retrieval features from a content retrieval database; the content retrieval database establishes indexes for core features in a mode of combining a locality sensitive hashing algorithm and a distributed system.
Example 10
On the basis of the previous embodiment, when extracting the category keywords corresponding to each secondary category, the extraction is performed according to the order from near to far of the distance from the primary category to which each secondary category belongs to the interest center.
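The near-to-far ordering can be sketched as below. The function names and sample data are hypothetical; distances are signed values whose absolute value is used for ordering, in line with the earlier remark that a negative distance is replaced by its absolute value.

```python
from typing import Dict, List

def order_bubbles(distances: Dict[str, float]) -> List[str]:
    """Primary categories (interest bubbles) sorted from nearest to farthest from the interest center."""
    return sorted(distances, key=lambda name: abs(distances[name]))

def secondary_in_order(primary_to_secondary: Dict[str, List[str]],
                       distances: Dict[str, float]) -> List[str]:
    """Secondary categories listed in the order of their primary category's distance."""
    ordered: List[str] = []
    for primary in order_bubbles(distances):
        ordered.extend(primary_to_secondary.get(primary, []))
    return ordered

# Hypothetical signed distances between each interest bubble and the interest center.
toy_distances = {"sports": -1.5, "finance": 0.5, "tech": 3.0}
toy_hierarchy = {
    "sports": ["football", "tennis"],
    "finance": ["stocks"],
    "tech": ["phones"],
}
extraction_order = secondary_in_order(toy_hierarchy, toy_distances)
```

Category keywords would then be extracted for the secondary categories in `extraction_order`, so that the bubbles closest to the interest center are processed first.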
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that the system provided in the foregoing embodiments is illustrated only by way of the above division into functional units. In practical applications, the functions may be allocated to different functional units as needed; that is, the units or steps in the embodiments of the present invention may be further decomposed or combined. For example, the units in the foregoing embodiments may be combined into one unit, or may be further split into multiple sub-units, so as to complete all or part of the functions described above. The names of the units and steps involved in the embodiments of the present invention serve only to distinguish the units or steps and are not to be construed as unduly limiting the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage module and the processing module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the units and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/module that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/module.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions after such changes or substitutions fall within the protection scope of the invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.