CN116049529A

CN116049529A - Content recommendation method, device, medium and electronic equipment

Info

Publication number: CN116049529A
Application number: CN202111256472.XA
Authority: CN
Inventors: 杨飞; 洪进栋; 邵佳帅
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2021-10-27
Filing date: 2021-10-27
Publication date: 2023-05-02

Abstract

The disclosure relates to a content recommendation method, a device, a medium and an electronic device, wherein the method comprises the following steps: acquiring a plurality of contents to be recommended corresponding to a target user; determining a content feature vector of each content to be recommended based on a content matching model, and determining a candidate content group corresponding to each content to be recommended according to the content feature vector; determining a user feature vector of the target user based on the content matching model, and determining a target user group to which the target user belongs according to the user feature vector; determining whether the content to be recommended is target recommended content matched with the target user according to the target user group, the candidate content group and a content matching relation, wherein the content matching relation comprises a plurality of groups of content difference parameters of combination pairs formed by the user group and the content group, and the content difference parameters are used for representing the degree of matching deviation between the users in the user group and the contents in the content group; and recommending content for the target user according to the target recommended content.

Description

Content recommendation method, device, medium and electronic equipment

Technical Field

The disclosure relates to the technical field of computers, and in particular relates to a content recommendation method, a content recommendation device, a content recommendation medium and electronic equipment.

Background

With the development of computer technology, various applications for reading content have made it much more convenient for users to obtain information. To further meet users' needs, content is usually recommended to the user, so that the user can obtain the content he or she wants to read without searching.

In the related art, when recommending content to a user, content that may interest the user is generally determined according to the historical content read by the user or the interest tags selected by the user. For some users, however, few relevant user features are available for reference, so the determined recommended content may deviate greatly from the user's actual reading needs, which degrades the user's reading experience.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a content recommendation method, the method comprising:

acquiring a plurality of contents to be recommended corresponding to a target user;

determining a content feature vector of each content to be recommended based on a content matching model, and determining a candidate content group corresponding to each content to be recommended according to the content feature vector;

determining a user characteristic vector of the target user based on the content matching model, and determining a target user group to which the target user belongs according to the user characteristic vector;

determining whether the content to be recommended is target recommended content matched with the target user according to the target user group, the candidate content group and a content matching relation, wherein the content matching relation comprises a plurality of groups of content difference parameters of combination pairs formed by the user group and the content group, and the content difference parameters are used for representing the degree of matching deviation between the user in the user group and the content in the content group;

and recommending the content for the target user according to the target recommended content.

In a second aspect, the present disclosure provides a content recommendation apparatus, the apparatus comprising:

the acquisition module is used for acquiring a plurality of contents to be recommended corresponding to the target user;

the first determining module is used for determining a content characteristic vector of each content to be recommended based on a content matching model, and determining a candidate content group corresponding to each content to be recommended according to the content characteristic vector;

the second determining module is used for determining a user characteristic vector of the target user based on the content matching model and determining a target user group to which the target user belongs according to the user characteristic vector;

a third determining module, configured to determine, according to the target user group, the candidate content group, and a content matching relationship, whether the content to be recommended is a target recommended content that matches the target user, where the content matching relationship includes a plurality of sets of content difference parameters of a combination pair formed by the user group and the content group, where the content difference parameters are used to characterize a degree of deviation of a degree of matching between a user in the user group and a content in the content group;

and the recommending module is used for recommending the content for the target user according to the target recommended content.

In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which when executed by a processing device performs the steps of the method of the first aspect.

In a fourth aspect, the present disclosure provides an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing said computer program in said storage means to carry out the steps of the method of the first aspect.

In the above technical solution, after a plurality of contents to be recommended corresponding to a target user are obtained, a content feature vector of each content to be recommended is determined based on a content matching model, and a candidate content group corresponding to each content to be recommended is determined according to the content feature vector. A user feature vector of the target user is likewise determined based on the content matching model, and a target user group to which the target user belongs is determined according to the user feature vector. Whether the content to be recommended is target recommended content matching the target user is then determined according to the target user group, the candidate content group and a content matching relation, and content is recommended to the target user according to the target recommended content. Thus, after the content to be recommended has been determined based on the user features of the user, whether each content to be recommended suits the user is determined based on the degree of matching deviation of the user group to which the user belongs with respect to the candidate content group to which that content belongs. On the one hand, the determined content to be recommended can be further screened so that content with a lower matching degree is not recommended to the user; because the screening uses the degree of matching deviation between the user group and the content group, it helps ensure that the determined recommended content better matches the user, and individual deviation caused by screening a single content is avoided. On the other hand, recommended content that better matches the user can be determined by drawing on the user features within the user group to which the user belongs, so that accurate recommendation is achieved even when the user has few user features. The user can therefore obtain content of interest without repeated searching or checking, which meets the user's actual reading needs, saves user operations and improves the user experience.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:

FIG. 1 is a flow chart of a content recommendation method provided in accordance with an embodiment of the present disclosure;

FIG. 2 is a flow chart of a training process for a content matching model provided in accordance with an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the structure of a content matching model provided in accordance with an embodiment of the present disclosure;

FIG. 4 is an illustration of a plurality of content groups provided in accordance with an embodiment of the present disclosure;

FIG. 5 is a schematic diagram comparing the content difference parameters of a user group for the content groups with the difference parameters determined by an actual questionnaire, provided in accordance with an embodiment of the present disclosure;

FIG. 6 is a block diagram of a content recommendation device provided in accordance with an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "a", "an" and "a plurality of" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

Fig. 1 is a flowchart of a content recommendation method according to an embodiment of the present disclosure, and as shown in fig. 1, the method includes:

In step 11, a plurality of contents to be recommended corresponding to the target user are obtained.

For example, with the user's authorization, the plurality of contents to be recommended corresponding to the user may be determined according to the user features of the user or the interest tags of the user. For instance, the contents to be recommended may be determined based on a content recommendation algorithm commonly used in the art, which is not limited in the present disclosure.

In step 12, a content feature vector of each content to be recommended is determined based on the content matching model, and a candidate content group corresponding to each content to be recommended is determined according to the content feature vector.

The content groups may be obtained by clustering a plurality of historical content data, and the historical content data belonging to the same content group have similar content features. Accordingly, in this step a content feature vector of each content to be recommended may be determined, where the feature dimensions of the content feature vector are the same as those used when grouping the historical content data to obtain the content groups, so that once the content feature vector is determined, the candidate content group corresponding to the content to be recommended can be determined directly from it.
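As a rough illustration of the group assignment in step 12, the sketch below assumes that each content group is summarized by a centroid in the same feature space as the content feature vector, and assigns a vector to the group with the nearest centroid by Euclidean distance. The centroids, the distance metric and the name `assign_group` are assumptions made for illustration; the disclosure only states that the groups are obtained by clustering.

```python
import math

def assign_group(vector, centroids):
    """Return the id of the group whose centroid is closest to `vector`.

    `centroids` maps a group id to a centroid with the same dimension
    as `vector` (a hypothetical representation of a clustered group).
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(centroids, key=lambda gid: distance(vector, centroids[gid]))

# Hypothetical centroids for three content groups in a 2-D feature space.
content_centroids = {"c0": [0.0, 0.0], "c1": [1.0, 1.0], "c2": [0.0, 2.0]}
candidate_group = assign_group([0.9, 0.8], content_centroids)
```

Under the same assumption, the same helper could be applied to a user feature vector and a set of user-group centroids to obtain the target user group.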

In step 13, a user feature vector of the target user is determined based on the content matching model, and a target user group to which the target user belongs is determined according to the user feature vector.

Likewise, a plurality of users may be grouped in advance to obtain a plurality of user groups. The users belonging to the same user group have consistent reading preferences for content, that is, they perceive satisfaction with content in a consistent way. Accordingly, the user features of the user can be vector-encoded based on the content matching model to obtain the user feature vector of the user, so that the target user group corresponding to the user can be determined directly from the user feature vector.

For example, the content matching model is a model based on DeepFM. During training of the DeepFM model, the features related to the output target (i.e., the content difference parameter) are updated, so that vector representations of the user features and content features that fit the satisfaction-perception scenario can be obtained. Features unrelated to the output target generally receive few updates during parameter adjustment of the DeepFM model, so their vector representations may stay close to the zero vector. In this way, accurate content feature vectors and user feature vectors suited to the application scenario can be obtained.

In step 14, according to the target user group, the candidate content group and the content matching relation, whether the content to be recommended is target recommended content matched with the target user is determined, wherein the content matching relation comprises a plurality of groups of content difference parameters of combination pairs formed by the user group and the content group, and the content difference parameters are used for representing the degree of matching deviation between the users in the user group and the contents in the content group.

In the embodiments of the disclosure, users belonging to the same user group have consistent reading satisfaction with content, and contents with similar content features can be formed into a content group from the perspective of how users perceive satisfaction with content, so that the degree of matching deviation between a user group and a content group can be determined. The smaller the content difference parameter, the smaller the matching deviation between the users in the user group and the contents in the content group, the better the contents in the content group match the users in the user group, and the higher the satisfaction of those users when such content is recommended to them. Conversely, the larger the content difference parameter, the larger the matching deviation, the worse the contents in the content group match the users in the user group, and the lower the satisfaction of those users when such content is recommended to them.

Therefore, after the target user group to which the user belongs and the candidate content group corresponding to the content to be recommended are determined, whether the content to be recommended matches the user can be further determined based on the content matching relation between the user group and the content group. In this way, when the current user has no preference settings or only weak preference settings, content can be recommended to the user based on the characteristics of the target user group to which the user belongs, which improves the accuracy and the application range of content recommendation.
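A minimal sketch of the screening in step 14 follows, assuming the content matching relation is stored as a lookup table keyed by (user group, content group) and that a content is treated as matching when its difference parameter falls below a cut-off. The table values, the `threshold` parameter and the function name are hypothetical; this passage does not specify how the difference parameter is turned into a decision.

```python
# Hypothetical content matching relation:
# (user group, content group) -> content difference parameter.
matching_relation = {
    ("u0", "c0"): 0.10,
    ("u0", "c1"): 0.65,
    ("u1", "c0"): 0.40,
    ("u1", "c1"): 0.05,
}

def is_target_content(user_group, content_group, relation, threshold=0.3):
    """Treat content as a match when the group-level difference parameter
    is below `threshold` (an assumed cut-off, not taken from the source).
    Pairs missing from the relation are treated as non-matching."""
    diff = relation.get((user_group, content_group))
    return diff is not None and diff < threshold

# Each candidate is (content id, candidate content group).
candidates = [("article_a", "c0"), ("article_b", "c1")]
targets = [cid for cid, group in candidates
           if is_target_content("u0", group, matching_relation)]
```

Because the decision depends only on the two group ids, it still works for a user with few individual features, which is the situation the paragraph above describes.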

In step 15, content recommendation is performed for the target user according to the target recommended content.

As an example, the target recommended contents may be displayed to the target user in descending order of the matching degree between each target recommended content and the target user, so as to recommend content to the target user. The matching degree corresponding to a target recommended content is the recommendation parameter that was assigned to that content when the contents to be recommended corresponding to the target user were determined. As another example, after sorting, the top N target recommended contents may be selected and displayed to the target user, where the value of N may be set according to the actual application scenario, which is not limited in this disclosure.
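The ordering described above can be sketched as follows, assuming each target recommended content is carried as a (content id, matching degree) pair. The pair layout and the name `top_n` are illustrative only.

```python
def top_n(target_contents, n):
    """Sort (content_id, matching_degree) pairs from high to low matching
    degree and keep the first `n` content ids."""
    ranked = sorted(target_contents, key=lambda item: item[1], reverse=True)
    return [content_id for content_id, _ in ranked[:n]]

scored = [("a", 0.42), ("b", 0.91), ("c", 0.77)]
display_order = top_n(scored, 2)
```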

In the above technical solution, after a plurality of contents to be recommended corresponding to a target user are obtained, a content feature vector of each content to be recommended is determined based on a content matching model, and a candidate content group corresponding to each content to be recommended is determined according to the content feature vector. A user feature vector of the target user is likewise determined based on the content matching model, and a target user group to which the target user belongs is determined according to the user feature vector. Whether the content to be recommended is target recommended content matching the target user is then determined according to the target user group, the candidate content group and a content matching relation, and content is recommended to the target user according to the target recommended content. Thus, after the content to be recommended has been determined based on the user features of the user, whether each content to be recommended suits the user is determined based on the degree of matching deviation of the user group to which the user belongs with respect to the candidate content group to which that content belongs. On the one hand, the determined content to be recommended can be further screened so that content with which the user would be less satisfied is not recommended; because the screening uses the degree of matching deviation between the user group and the content group, it helps ensure that the determined recommended content better matches the user, and individual deviation caused by screening a single content is avoided. On the other hand, recommended content that better matches the user can be determined by drawing on the user features within the user group to which the user belongs, so that accurate recommendation is achieved even when the user has few user features. The user can therefore obtain content of interest without repeated searching or checking, which meets the user's actual reading needs, saves user operations and improves the user experience.

In one possible embodiment, the content matching model may be obtained by the following steps, as shown in fig. 2, specifically including:

In step 21, training sample data is obtained, wherein each training sample data comprises a user sample, a content sample, and a labeling difference parameter between the user sample and the content sample.

With the user's authorization, user data and the corresponding content data may be acquired, and the labeling difference parameter may be determined from the user's satisfaction questionnaire for the content. For example, when recommended content was displayed to a user during historical reading, a satisfaction questionnaire may have been issued to the user at the same time, and, in response to the user's input operation, a difference parameter between the user and the content is determined according to the score entered by the user. If the user enters a value in the satisfaction input box, the value is normalized against a standard measurement value to obtain the satisfaction, and the difference obtained by subtracting the satisfaction from 1 is used as the difference parameter. If the user enters a value in the dissatisfaction input box, the value is normalized against the standard measurement value, and the normalized value is used as the difference parameter. Then, from the acquired user data and content data, the data for which a difference parameter exists may be used as user samples and content samples respectively, with the corresponding difference parameter used as the labeling difference parameter between the user sample and the content sample.
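The labeling rule above can be sketched as follows. The sketch assumes that normalization means dividing the entered value by the standard measurement value and clamping the result to [0, 1]; the source does not give the normalization formula, so both the division and the clamping are assumptions, as is the function name.

```python
def labeling_difference(score, standard, satisfied):
    """Derive a labeling difference parameter from a questionnaire score.

    `score` is the value the user entered, `standard` the standard
    measurement value used for normalization, and `satisfied` tells
    which input box was used. Division and clamping are assumptions.
    """
    normalized = min(max(score / standard, 0.0), 1.0)
    # Satisfaction box: difference = 1 - satisfaction.
    # Dissatisfaction box: difference = normalized value itself.
    return 1.0 - normalized if satisfied else normalized

label_satisfied = labeling_difference(8, 10, satisfied=True)
label_unsatisfied = labeling_difference(3, 10, satisfied=False)
```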

In step 22, an embedding vector is determined for each feature dimension in the user characteristics of the user sample and the content characteristics of the content sample.

The user features may be feature dimensions that represent the interest preferences of the user, obtained with the user's authorization. The content features may be feature dimensions that represent the layout structure and reading of the content, and may include a plurality of the following: the degree of association between the title and the content, the content length, the proportion of non-stop words, the content word repetition degree, the degree of association among the parts of the content, and the reading ratio corresponding to the content.

For the feature value representing the degree of association between the title and the content, the feature value is larger if the title contains the topic of the content, and smaller if the title exaggerates or hides key information in the content. For the feature value representing the content length, the feature value is larger when the content length falls within a preset length range and smaller when it deviates from that range. For the feature value representing the proportion of non-stop words, the larger the proportion of non-stop words, the larger the feature value; the smaller the proportion, the less information the content carries and the smaller the feature value. For the feature value representing the content word repetition degree, the higher the repetition degree, the fewer topics the content contains, that is, the content is developed around a single topic, and the larger the feature value; the lower the repetition degree, the more topics the content contains, that is, the content is spread across several topics, and the smaller the feature value. For the feature value representing the degree of association among the parts of the content, the stronger the association among the parts, the stronger the topical correlation of the parts and the larger the feature value; the weaker the association among the parts, the weaker the topical correlation of the parts and the smaller the feature value. For the feature value representing the reading ratio corresponding to the content, the reading ratio is the ratio of the number of reads of the content to the number of times it is displayed, and this ratio may be used as the feature value of that feature dimension.
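Two of the content features above have a direct arithmetic form, sketched here under simple assumptions: tokens are whitespace-separated words, and the stop-word list is a toy set chosen for the example. The function names and the stop-word list are hypothetical; the other features (for example title-content association) would need a model or heuristic that the source does not specify.

```python
def reading_ratio(read_count, display_count):
    """Ratio of the number of reads to the number of displays."""
    return read_count / display_count if display_count else 0.0

def non_stop_word_ratio(tokens, stop_words):
    """Share of tokens that are not stop words."""
    if not tokens:
        return 0.0
    kept = [t for t in tokens if t.lower() not in stop_words]
    return len(kept) / len(tokens)

STOP_WORDS = {"the", "a", "of", "is"}  # toy list for illustration only
tokens = "The price of oil is rising".split()
ratio = non_stop_word_ratio(tokens, STOP_WORDS)
```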

Thus, the user features can be represented by the feature values of a plurality of feature dimensions, and similarly the content can be represented in a multi-dimensional manner by the feature values of a plurality of feature dimensions of the content, so that richer and more comprehensive feature representations of the user samples and content samples are obtained.

The content matching model may be implemented based on a DeepFM model, and FIG. 3 is a schematic structural diagram of the content matching model. As shown in FIG. 3, the embedding layer in the content matching model may maintain an embedding matrix and a mapping table, where the mapping table stores the mapping between the feature values of an input feature dimension and the embedded vectors in the embedding matrix. Taking the feature dimension of content length as an example, the subscript corresponding to a feature value of the content length is determined from the mapping table, and the corresponding embedded vector is then determined from the embedding matrix based on that subscript. Further, an unknown feature value may be added to each feature dimension, so that it can be filled in when a sample lacks a feature value for that feature dimension. The embedding matrix may be initialized as a zero matrix or randomly, and is then optimized by gradient descent during training of the content matching model, so that an accurate representation of each embedded vector can be obtained from the embedding matrix.
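The mapping table and embedding matrix described above can be sketched for a single feature dimension as follows. The class name, the `<unknown>` placeholder, and the bucketed content-length values ("short", "medium", "long") are assumptions for illustration; training updates to the matrix are omitted.

```python
import random

class FeatureEmbedding:
    """Embedding matrix plus mapping table for one feature dimension."""

    UNKNOWN = "<unknown>"  # hypothetical placeholder for a missing value

    def __init__(self, values, dim, seed=0):
        rng = random.Random(seed)
        # Mapping table: feature value -> row index (subscript) in the matrix.
        self.index = {v: i for i, v in enumerate([self.UNKNOWN, *values])}
        # Embedding matrix, randomly initialized; training would update it.
        self.matrix = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in self.index]

    def lookup(self, value):
        """Return the embedding row for `value`, or the unknown row."""
        row = self.index.get(value, self.index[self.UNKNOWN])
        return self.matrix[row]

length_emb = FeatureEmbedding(["short", "medium", "long"], dim=4)
vec = length_emb.lookup("medium")
missing = length_emb.lookup(None)  # sample lacks this feature value
```

Reserving row 0 for the unknown value keeps the lookup total: any value absent from the mapping table falls back to a row that can still be learned.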

In step 23, each embedded vector is spliced to obtain a feature vector corresponding to the training sample data.

For example, the plurality of embedded vectors may be spliced in a preset order using a splicing function commonly used in the art, which is not described in detail in this disclosure. When splicing, the preset order is kept consistent across all training sample data and in the subsequent application of the model, so as to ensure the accuracy of the feature vector.

In step 24, a prediction difference parameter corresponding to the training sample data is determined based on the feature vector.

As an example, a binary satisfaction classification may be predicted directly from the feature vector, that is, whether the user sample is predicted to be satisfied or unsatisfied with the content sample in the training sample data is determined based on the feature vector. For example, the feature vector may be fed into the output layer, where softmax processing yields the probabilities of the satisfied and unsatisfied classes, and the probability of the unsatisfied class is determined as the prediction difference parameter.
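The output-layer step can be sketched as follows, assuming the output layer has already produced two logits ordered as [satisfied, unsatisfied]. The logit ordering and the function names are assumptions; the linear projection that produces the logits from the feature vector is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_difference(logits):
    """`logits` = [satisfied, unsatisfied]; return P(unsatisfied)."""
    return softmax(logits)[1]

p = prediction_difference([2.0, -1.0])  # leans toward "satisfied"
```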

As another example, an exemplary implementation of the determining the prediction difference parameter corresponding to the training sample data based on the feature vector may include:

Performing feature extraction on the feature vector with the FM (factorization machine) layer to obtain a first feature vector.

The FM (Factorization Machine) layer may combine first-order and second-order features of the feature vector. In the FM layer, rather than assigning a separate weight directly to each second-order feature combination, each feature dimension is given a feature weight vector when different feature dimensions interact, and the inner product of the two feature weight vectors is used as the weight of the interaction. Thus, as long as a feature dimension appears in the training sample data, the model can learn and update its feature weight vector accordingly, which improves the generalization capability of the model to some extent.
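To make the inner-product weighting concrete, the sketch below computes the classic scalar factorization machine output for a dense input, once with the usual sum-of-squares rearrangement and once pair by pair as a check. It illustrates the general FM technique, not the exact FM layer of the disclosure, which is described as producing a first feature vector; all parameter values are made up.

```python
def fm_score(x, w0, w, v):
    """Factorization machine output for a dense input `x`.

    `w0` is the bias, `w[i]` the first-order weight of feature i, and
    `v[i]` the latent weight vector of feature i. The weight of the
    pair (i, j) is the inner product of v[i] and v[j].
    """
    first_order = sum(wi * xi for wi, xi in zip(w, x))
    second_order = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        # 0.5 * ((sum)^2 - sum of squares) equals the sum over pairs i < j.
        second_order += 0.5 * (s * s - s_sq)
    return w0 + first_order + second_order

def fm_score_naive(x, w0, w, v):
    """Same quantity computed pair by pair, for checking."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total

x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, 0.3, -0.1]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, -0.6]]
score = fm_score(x, w0, w, v)
```

Because the pair weight is an inner product of per-feature vectors, a feature's vector is updated whenever that feature appears at all, even if a particular pair was never observed together.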

Performing feature extraction on the feature vector with the MLP (multi-layer perceptron) layer to obtain a second feature vector.

The MLP (Multi-Layer Perceptron) layer is a multi-layer neural network that can realize high-order combinations of the feature vector and learn new feature combinations, further enriching the features represented in the second feature vector and thereby broadening the application range of the model.
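A forward pass through a small multi-layer perceptron can be sketched as follows. The ReLU activation, the layer sizes and the weight values are assumptions chosen for the example; the source does not name an activation function or a network shape.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, bias):
    """One fully connected layer: weights[k] holds the weights of output k."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(inputs, layers):
    """Apply each (weights, bias) layer followed by ReLU."""
    hidden = inputs
    for weights, bias in layers:
        hidden = relu(dense(hidden, weights, bias))
    return hidden

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # 2 inputs -> 2 hidden units
    ([[1.0, 1.0]], [-0.5]),                   # 2 hidden units -> 1 output
]
second_feature_vector = mlp_forward([2.0, 1.0], layers)
```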

And then, splicing the first feature vector and the second feature vector to obtain a target feature vector.

The first feature vector and the second feature vector can be spliced based on a common splicing function in the art, so that the spliced target feature vector contains first-order features, second-order features and higher-order features, the accuracy and the comprehensiveness of the target feature vector are improved, and accurate data support is provided for the subsequent prediction of difference parameters.

And then processing the target feature vector to obtain the prediction difference parameter.

Likewise, the probability corresponding to the unsatisfactory classification obtained by softmax processing the target feature vector may be taken as the prediction difference parameter.

Therefore, through the technical scheme, after the feature vector containing the user features and the content features is determined, further feature extraction is performed on the feature vector, so that more comprehensive and accurate features can be obtained, the accuracy of the determined prediction difference parameter is improved, and reliable data support is provided for the subsequent determination of recommended content based on the content matching model.

Turning back to fig. 2, in step 25, parameters of the preset model are adjusted according to the relationship between the labeling difference parameter and the prediction difference parameter to obtain a content matching model.

The loss of the model may be determined from the labeling difference parameter and the prediction difference parameter, and training is stopped when the loss is less than a loss threshold. Illustratively, the preset model is implemented based on DeepFM, so that the parameters of the model can be adjusted by a gradient descent method to obtain the trained content matching model.
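Illustratively, adjusting parameters by gradient descent until the loss falls below a loss threshold may be sketched as follows. For brevity the sketch substitutes a simple logistic model for the DeepFM-based preset model and uses binary cross-entropy as the loss; both choices, as well as the names `train`, `bce_loss`, `lr` and `loss_threshold` and the sample data, are assumptions made only for illustration.

```python
import math


def bce_loss(labels, preds, eps=1e-12):
    """Mean binary cross-entropy between labeling and prediction difference parameters."""
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for t, p in zip(labels, preds)
    ) / len(labels)


def train(samples, labels, lr=0.5, loss_threshold=0.1, max_epochs=10000):
    """Gradient descent on a logistic stand-in model; stops once loss < loss_threshold."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        preds = [
            1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in samples
        ]
        loss = bce_loss(labels, preds)
        if loss < loss_threshold:
            break  # training is stopped when the loss is below the threshold
        n = len(samples)
        # Gradient of the mean cross-entropy for a logistic model.
        for j in range(dim):
            grad_j = sum((p - t) * x[j] for p, t, x in zip(preds, labels, samples)) / n
            w[j] -= lr * grad_j
        b -= lr * sum(p - t for p, t in zip(preds, labels)) / n
    return w, b, loss


# Hypothetical training samples: the label equals the first feature.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]
w, b, final_loss = train(samples, labels)
```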

Therefore, through the above technical scheme, the content matching model can be trained based on the degree of satisfaction difference between users and their historical reading content, so that the vector representations of users and content are determined from the scene of user satisfaction with content. The embedded vectors are learned through training, which ensures their accuracy. In addition, because training is based on the satisfaction scene, the feature dimensions that are more effective for that scene can be determined during training, providing reliable data support for the subsequent prediction of difference parameters and the screening of target recommended content.

In one possible embodiment, the content difference parameter of the combination pair formed by the user group and the content group may be determined by:

user data and content data are acquired.

The user data may be, for example, feature data of a plurality of users. The user data of the plurality of users and the content data read by those users may be acquired where the users' authorization has been obtained.

And determining a content characteristic vector corresponding to the content data and a user characteristic vector corresponding to the user data according to the content matching model.

The specific implementation of determining the content feature vector and the user feature vector is described in detail above, and will not be described herein.

Then, clustering the user feature vectors to obtain a plurality of user groups corresponding to the user data; and clustering the content feature vectors to obtain a plurality of content groups corresponding to the content data.

The user data may be clustered by applying K-means to the user feature vectors, and similarly the content data may be clustered by applying K-means to the content feature vectors. K-means is a common clustering algorithm in the art and will not be described here.
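Illustratively, K-means clustering of feature vectors may be sketched as follows. The sketch uses a deliberately simple deterministic initialisation (the first k vectors) and a fixed number of iterations; the names `kmeans` and `squared_distance` and the sample vectors are assumptions for illustration, and a production system would more likely rely on an existing library implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Cluster vectors into k groups; returns (centers, assignments)."""
    # Simplistic initialisation: take the first k vectors as initial centers.
    centers = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins the group with the nearest center.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments


# Hypothetical user feature vectors forming two obvious groups.
user_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centers, groups = kmeans(user_vectors, k=2)
```

The resulting centers play the role of the center vectors of the user groups; the same routine can be applied to content feature vectors to obtain content groups.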

As described above, the content matching model is implemented based on the DeepFM model. Therefore, the user feature vectors and content feature vectors determined based on the content matching model are vector representations suited to the satisfaction recognition scene. When the user feature vectors are clustered, the vector components corresponding to features irrelevant to the satisfaction recognition scene contribute less to the calculated clustering distance, and the clustering result is determined mainly by the features relevant to the satisfaction recognition scene. This effectively avoids the problem of users being clustered into different user groups merely because they differ in features irrelevant to satisfaction recognition, and ensures the accuracy of user clustering. Similarly, the accuracy and effectiveness of the content groups determined by clustering the content data can be improved.

And aiming at the combination formed by any user group and any content group, obtaining the content difference parameters corresponding to the user group and the content group according to the central user vector corresponding to the user group, the central content vector corresponding to the content group and the content matching model.

Illustratively, the user groups obtained by K-means clustering are denoted as U1-Un, where n is the number of user groups, and the content groups obtained by K-means clustering are denoted as L1-Lm, where m is the number of content groups. Illustratively, fig. 4 shows the description information of a plurality of content groups.

Taking the user group U1 and the content group L1 as an example, the center vector X1 corresponding to U1 may be used as the user feature vector, and the center vector Y1 corresponding to L1 may be used as the content feature vector. X1 and Y1 are input into the content matching model, and the output prediction difference parameter is used as the content difference parameter corresponding to the user group U1 and the content group L1. The specific implementation is described in detail above and will not be repeated here. Fig. 5 is a schematic diagram comparing the content difference parameters determined for the user groups and the content groups with the difference parameters determined by actual questionnaires. As the comparison shows, the prediction accuracy of the content matching model can meet actual use requirements.
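Illustratively, building the content matching relation for every combination of user group and content group may be sketched as follows. The callable `predict` is a hypothetical stand-in for the trained content matching model, and the function name `build_matching_relation`, the dictionary layout and the sample center vectors are likewise assumptions for illustration.

```python
def build_matching_relation(user_centers, content_centers, predict):
    """Map each (user group, content group) pair to its content difference parameter.

    user_centers    : center vectors X1..Xn of user groups U1..Un
    content_centers : center vectors Y1..Ym of content groups L1..Lm
    predict         : hypothetical callable (user_vec, content_vec) -> difference
    """
    relation = {}
    for i, x in enumerate(user_centers, start=1):
        for j, y in enumerate(content_centers, start=1):
            relation[(f"U{i}", f"L{j}")] = predict(x, y)
    return relation


# Stub model used only to make the sketch runnable; it is not the real model.
def stub_predict(user_vec, content_vec):
    return abs(user_vec[0] - content_vec[0])


relation = build_matching_relation(
    [[0.2], [0.9]],          # two hypothetical user group centers
    [[0.1], [0.5], [0.8]],   # three hypothetical content group centers
    stub_predict,
)
```

Precomputing the table in this way means that, at recommendation time, only a lookup keyed by the two group identifiers is needed rather than a model call per user and content pair.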

Therefore, through the above technical scheme, users can be clustered to obtain user groups and content can be clustered to obtain content groups, so that the matching degree deviation between the users in a user group and the content in a content group can be determined in the satisfaction recognition scene. Because the determination uses historical data of multiple users, the determined content difference parameters fit the actual application scene and can meet the use requirements of users. The content to be recommended can then be further screened based on users' satisfaction with content, improving the accuracy and rationality of content recommendation.

In a possible embodiment, an exemplary implementation manner of the determining a content feature vector of each content to be recommended is as follows, and the step may include:

for each content to be recommended, extracting a feature value corresponding to the content to be recommended according to preset feature dimensions. The preset feature dimensions are one or more of the following: the degree of association between the title and the content, the content length, the non-stop word ratio, the content word repetition degree, the degree of association between the parts of the content, and the reading proportion of the content. These feature dimensions are described in detail above, and a feature value can be extracted for each of them.

If the feature dimension is a continuous feature, determining a discrete range corresponding to the feature value of the feature dimension, and determining the value corresponding to the discrete range as the feature value corresponding to the feature dimension.

For example, some feature dimensions are continuous features, that is, the feature value corresponding to the feature dimension is a continuous value, and the feature value of such a feature dimension may be discretized. For example, the feature value of the non-stop word ratio is continuous data. The 25%, 50% and 75% quantiles of the feature values of this feature dimension may be calculated; a feature value below the 25% quantile is "low", a feature value between the 25% and 50% quantiles is "relatively low", a feature value between the 50% and 75% quantiles is "relatively high", and a feature value above the 75% quantile is "high", so that the feature value of the feature dimension is discretized based on these discrete ranges. Feature values of other continuous features are processed in a similar manner, which is not described again here.
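Illustratively, quantile-based discretization of a continuous feature may be sketched as follows. The four bucket labels, the function names `fit_cut_points` and `discretize`, and the sample values are assumptions for illustration; the cut points are computed with the standard-library `statistics.quantiles` function.

```python
import bisect
import statistics

# Illustrative labels for the four discrete ranges, from lowest to highest.
LABELS = ["low", "relatively low", "relatively high", "high"]


def fit_cut_points(values):
    """Return the 25%, 50% and 75% quantiles of the observed feature values."""
    return statistics.quantiles(values, n=4)


def discretize(value, cut_points):
    """Map a continuous feature value to the label of its discrete range."""
    return LABELS[bisect.bisect_right(cut_points, value)]


# Hypothetical non-stop word ratios observed across a corpus: 0.01 .. 0.99
observed = [i / 100 for i in range(1, 100)]
cuts = fit_cut_points(observed)
label = discretize(0.9, cuts)
```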

And then, determining an embedded vector corresponding to the characteristic value according to the characteristic value corresponding to each characteristic dimension and a mapping table, wherein the mapping table is used for mapping the characteristic value corresponding to each characteristic dimension and the vector in the embedded matrix.

Likewise, a corresponding index may be determined from the mapping table based on the feature value of a feature dimension, and the embedded vector at that index in the embedded matrix is taken as the embedded vector corresponding to the feature value. For example, if the determined non-stop word ratio is 0.9, the feature value of the non-stop word ratio is determined to be "high", and the corresponding index i is determined from the mapping table based on the feature dimension "non-stop word ratio" and the feature value "high". If the determined degree of association between the title and the content is 0.8, the feature value of that feature dimension is determined to be "high", and the corresponding index j is determined from the mapping table based on the feature dimension "degree of association between the title and the content" and the feature value "high". The corresponding embedded vectors are then queried based on the index i and the index j respectively.

And splicing the embedded vectors corresponding to each feature dimension to obtain the content feature vector of the content to be recommended.

For example, the embedded vectors may be spliced in the order of the degree of association between the title and the content, the content length, the non-stop word ratio, the content word repetition degree, the degree of association between the parts of the content, and the reading proportion of the content, so as to obtain the content feature vector. The present disclosure does not limit this order; it is only required that the order be consistent with the splicing order used in the training process of the content matching model, which ensures the accuracy of the content feature vector. In this way, the continuous features of the content to be recommended can be processed, a plurality of features can be managed uniformly based on the mapping table, and the vector representation of each feature dimension can be determined conveniently, improving the convenience of the content recommendation method.
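Illustratively, the mapping-table lookup and the ordered splicing may be sketched as follows. Only two feature dimensions and two discrete values are shown; the identifiers `FEATURE_ORDER`, `MAPPING_TABLE`, `EMBEDDING_MATRIX` and `content_feature_vector`, together with all numeric values, are hypothetical and chosen only for illustration.

```python
# Fixed splicing order; it must match the order used during training.
FEATURE_ORDER = ["title_relevance", "non_stop_word_ratio"]

# Mapping table: (feature dimension, discrete feature value) -> row index.
MAPPING_TABLE = {
    ("title_relevance", "high"): 0,
    ("title_relevance", "low"): 1,
    ("non_stop_word_ratio", "high"): 2,
    ("non_stop_word_ratio", "low"): 3,
}

# Embedded matrix: one embedded vector per row (values are made up).
EMBEDDING_MATRIX = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]


def content_feature_vector(feature_values):
    """Look up each dimension's embedded vector and splice them in FEATURE_ORDER."""
    vector = []
    for dim in FEATURE_ORDER:
        index = MAPPING_TABLE[(dim, feature_values[dim])]
        vector.extend(EMBEDDING_MATRIX[index])
    return vector


# The dictionary's own key order does not matter; FEATURE_ORDER decides.
vec = content_feature_vector({"non_stop_word_ratio": "high", "title_relevance": "high"})
```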

In a possible embodiment, an exemplary implementation manner of determining whether the content to be recommended is the target recommended content matched with the target user according to the target user group, the candidate content group and the content matching relationship is as follows, and the step may include:

for each content to be recommended, inquiring the content matching relation according to the first identifier of the target user group and the second identifier of the candidate content group to which the content to be recommended belongs;

and determining the queried content difference parameters corresponding to the first identifier and the second identifier as target difference parameters corresponding to the content to be recommended.

The first identifier of the target user group may be the number of the determined target user group, or may be a center vector of the target user group, and the second identifier of the candidate content group may be the number of the determined candidate content group, or may be a center vector of the candidate content group. Accordingly, the content matching relationship can be queried based on the determination of the first identifier and the second identifier, and the queried content difference parameter corresponding to the first identifier and the second identifier is determined to be the target difference parameter corresponding to the content to be recommended, so that the matching degree deviation of the user in the target user group and the content in the candidate content group is obtained.

And determining each content to be recommended except the content to be recommended, of which the target difference parameter is larger than a preset threshold value, as the target recommended content.

The preset threshold may be set according to an actual application scenario, which is not limited in the present disclosure, for example, may be set to 0.6, so as to exclude contents with low satisfaction of the target user as far as possible. As an example, if the target difference parameter corresponding to the content to be recommended is greater than the preset threshold, which indicates that the matching degree deviation between the user in the target user group to which the target user belongs and the content in the candidate content group to which the content to be recommended belongs is greater, the satisfaction degree of the target user may be lower when the content to be recommended is recommended to the target user, so that the recommendation of the content to be recommended to the target user should be avoided.
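Illustratively, querying the content matching relation and excluding content whose target difference parameter exceeds the preset threshold may be sketched as follows. The function name `select_target_content`, the tuple layout of the candidates, the dictionary-based relation and the sample values are assumptions for illustration.

```python
def select_target_content(candidates, user_group_id, relation, threshold=0.6):
    """Keep the contents whose target difference parameter is not above the threshold.

    candidates    : list of (content_id, content_group_id) pairs
    user_group_id : first identifier, i.e. the target user group
    relation      : dict mapping (user group id, content group id) -> difference
    """
    selected = []
    for content_id, content_group_id in candidates:
        # Query the content matching relation with the two identifiers.
        target_difference = relation[(user_group_id, content_group_id)]
        if target_difference > threshold:
            continue  # large matching deviation: avoid recommending this content
        selected.append(content_id)
    return selected


# Hypothetical relation and candidates
example_relation = {("U1", "L1"): 0.2, ("U1", "L2"): 0.8}
example_candidates = [("a", "L1"), ("b", "L2"), ("c", "L1")]
targets = select_target_content(example_candidates, "U1", example_relation)
```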

Therefore, through the technical scheme, the content with too low user satisfaction can be eliminated from the content to be recommended based on the determined content difference parameters by inquiring the content matching relation corresponding to the user group and the content group, so that the content is prevented from being recommended to the user, the comprehensiveness of content recommendation is improved, and the use experience of the user is effectively improved.

The present disclosure also provides a content recommendation apparatus, as shown in fig. 6, the apparatus 10 includes:

the acquiring module 100 is configured to acquire a plurality of to-be-recommended contents corresponding to a target user;

the first determining module 200 is configured to determine a content feature vector of each content to be recommended based on a content matching model, and determine a candidate content group corresponding to each content to be recommended according to the content feature vector;

a second determining module 300, configured to determine, based on the content matching model, a user feature vector of the target user, and determine, according to the user feature vector, a target user group to which the target user belongs;

a third determining module 400, configured to determine, according to the target user group, the candidate content group, and a content matching relationship, whether the content to be recommended is a target recommended content that matches the target user, where the content matching relationship includes a plurality of sets of content difference parameters of a combination pair formed by the user group and the content group, where the content difference parameters are used to characterize a degree of deviation of matching degree between a user in the user group and a content in the content group;

and the recommending module 500 is used for recommending the content for the target user according to the target recommended content.

Optionally, the content difference parameter of the combination pair formed by the user group and the content group is determined by:

acquiring user data and content data;

according to the content matching model, determining a content characteristic vector corresponding to the content data and a user characteristic vector corresponding to the user data;

clustering the user feature vectors to obtain a plurality of user groups corresponding to the user data;

clustering the content feature vectors to obtain a plurality of content groups corresponding to the content data;

Optionally, the content matching model is obtained by a training module comprising:

the acquisition sub-module is used for acquiring training sample data, wherein each training sample data comprises a user sample and a content sample, and a labeling difference parameter between the user sample and the content sample;

a first determining submodule, configured to determine an embedded vector corresponding to each feature dimension in a user feature of the user sample and a content feature of the content sample;

the first splicing sub-module is used for splicing the embedded vectors to obtain the feature vectors corresponding to the training sample data;

the second determining submodule is used for determining a prediction difference parameter corresponding to the training sample data based on the feature vector;

and the adjustment sub-module is used for adjusting the parameters of a preset model according to the relation between the annotation difference parameters and the prediction difference parameters so as to obtain the content matching model.

Optionally, the second determining submodule includes:

the first extraction submodule is used for extracting the characteristics of the characteristic vector according to the FM factor decomposition layer to obtain a first characteristic vector;

the second extraction submodule is used for carrying out feature extraction on the feature vectors according to the MLP multi-layer perception layer to obtain second feature vectors;

the second splicing sub-module is used for splicing the first characteristic vector and the second characteristic vector to obtain a target characteristic vector;

and the processing sub-module is used for processing the target feature vector to obtain the prediction difference parameter.

Optionally, the first determining module includes:

the third extraction sub-module is used for extracting the characteristic value corresponding to each piece of content to be recommended according to the preset characteristic dimension;

a third determining submodule, configured to determine a discrete range corresponding to a feature value of the feature dimension if the feature dimension is a continuous feature, and determine a value corresponding to the discrete range as a feature value corresponding to the feature dimension;

a fourth determining submodule, configured to determine an embedded vector corresponding to the feature value according to the feature value corresponding to each feature dimension and a mapping table, where the mapping table is a mapping relationship between the feature value corresponding to each feature dimension and a vector in an embedded matrix;

and the third splicing sub-module is used for splicing the embedded vectors corresponding to each characteristic dimension to obtain the content characteristic vector of the content to be recommended.

Optionally, the preset feature dimension is one or more of the following: the degree of association between the title and the content, the content length, the non-stop word ratio, the content word repetition degree, the degree of association between the parts of the content, and the reading proportion of the content.

Optionally, the third determining module includes:

the inquiring sub-module is used for inquiring the content matching relation according to the first identifier of the target user group and the second identifier of the candidate content group to which the content to be recommended belongs for each content to be recommended;

a fifth determining submodule, configured to determine, as a target difference parameter corresponding to the content to be recommended, the queried content difference parameter corresponding to the first identifier and the second identifier;

and a sixth determining sub-module, configured to determine each content to be recommended except for the content to be recommended whose target difference parameter is greater than a preset threshold value as the target recommended content.

Referring now to fig. 7, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 7 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.

As shown in fig. 7, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphic processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 7 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a plurality of contents to be recommended corresponding to a target user; determining a content feature vector of each content to be recommended based on a content matching model, and determining a candidate content group corresponding to each content to be recommended according to the content feature vector; determining a user characteristic vector of the target user based on the content matching model, and determining a target user group to which the target user belongs according to the user characteristic vector; determining whether the content to be recommended is target recommended content matched with the target user according to the target user group, the candidate content group and a content matching relation, wherein the content matching relation comprises a plurality of groups of content difference parameters of combination pairs formed by the user group and the content group, and the content difference parameters are used for representing the degree of matching deviation between the user in the user group and the content in the content group; and recommending the content for the target user according to the target recommended content.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the acquisition module may also be described as "a module for acquiring a plurality of contents to be recommended corresponding to the target user".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, example 1 provides a content recommendation method, wherein the method includes:

According to one or more embodiments of the present disclosure, example 2 provides the method of example 1, wherein the content difference parameter of the combination pair formed by the user group and the content group is determined by:

Acquiring user data and content data;

According to one or more embodiments of the present disclosure, example 3 provides the method of example 1, wherein the content matching model is obtained by:

obtaining training sample data, wherein each piece of training sample data comprises a user sample and a content sample, and a labeling difference parameter between the user sample and the content sample;

determining embedding vectors respectively corresponding to each feature dimension in the user features of the user samples and the content features of the content samples;

Splicing the embedded vectors to obtain feature vectors corresponding to the training sample data;

determining a prediction difference parameter corresponding to the training sample data based on the feature vector;

and adjusting parameters of a preset model according to the relation between the labeling difference parameters and the prediction difference parameters so as to obtain the content matching model.
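The training steps of example 3 can be sketched as follows. This is only a minimal illustration under stated assumptions: the disclosure does not fix the form of the preset model, the loss, or the optimizer, so the linear read-out, the squared-error loss, plain stochastic gradient descent, the embedding width `EMBED_DIM`, and all function names are hypothetical choices made for the sketch.

```python
import random

EMBED_DIM = 4  # hypothetical embedding width per feature dimension


def init_model(vocab_sizes, seed=0):
    """Create one embedding table per feature dimension plus a linear head."""
    rng = random.Random(seed)
    tables = [
        [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in range(size)]
        for size in vocab_sizes
    ]
    weights = [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM * len(vocab_sizes))]
    return {"tables": tables, "weights": weights, "bias": 0.0}


def feature_vector(model, feature_ids):
    """Look up the embedding of each feature dimension and concatenate them."""
    vec = []
    for table, fid in zip(model["tables"], feature_ids):
        vec.extend(table[fid])
    return vec


def predict(model, feature_ids):
    """Predicted difference parameter (assumed here to be a linear read-out)."""
    vec = feature_vector(model, feature_ids)
    return sum(w * x for w, x in zip(model["weights"], vec)) + model["bias"]


def train(model, samples, epochs=300, lr=0.05):
    """Adjust embeddings and head by SGD on squared error (an assumed loss)."""
    for _ in range(epochs):
        for feature_ids, labeled_diff in samples:
            err = predict(model, feature_ids) - labeled_diff
            for i, fid in enumerate(feature_ids):
                row = model["tables"][i][fid]
                for j in range(EMBED_DIM):
                    k = i * EMBED_DIM + j
                    # Gradients of 0.5 * err**2, computed before either update.
                    grad_w, grad_x = err * row[j], err * model["weights"][k]
                    model["weights"][k] -= lr * grad_w
                    row[j] -= lr * grad_x
            model["bias"] -= lr * err
    return model


def mean_squared_error(model, samples):
    """Average squared gap between predicted and labeled difference parameters."""
    return sum((predict(model, f) - y) ** 2 for f, y in samples) / len(samples)


# Illustrative samples: ((user feature id, content feature id), labeled difference)
SAMPLES = [((0, 0), 0.5), ((1, 1), -0.5), ((0, 1), 0.1), ((1, 0), -0.1)]
```

Calling `train(init_model([2, 2]), SAMPLES)` adjusts the parameters in place; comparing `mean_squared_error` before and after training shows whether the predicted difference parameters have moved toward the labeled ones.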

According to one or more embodiments of the present disclosure, example 4 provides the method of example 3, wherein the determining, based on the feature vector, a prediction difference parameter corresponding to the training sample data includes:

performing feature extraction on the feature vector using a factorization machine (FM) layer to obtain a first feature vector;

performing feature extraction on the feature vector using a multi-layer perceptron (MLP) layer to obtain a second feature vector;

concatenating the first feature vector and the second feature vector to obtain a target feature vector;

and processing the target feature vector to obtain the prediction difference parameter.
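The two-branch structure of example 4 can be sketched as below. The sketch assumes the FM layer computes second-order interactions across the per-dimension embeddings that make up the feature vector, that the MLP layer is a single dense layer with ReLU, and that the final processing step is a linear output layer; none of these details is specified in the disclosure, and the function and parameter names are hypothetical.

```python
def fm_layer(feature_vector, num_fields):
    """Second-order FM interactions over the per-field embedding chunks.

    Uses the identity sum_{a<b} v_a * v_b = 0.5 * ((sum v)^2 - sum v^2),
    applied element-wise, so the output has the embedding width.
    """
    dim = len(feature_vector) // num_fields
    fields = [feature_vector[i * dim:(i + 1) * dim] for i in range(num_fields)]
    out = []
    for j in range(dim):
        total = sum(f[j] for f in fields)
        total_sq = sum(f[j] ** 2 for f in fields)
        out.append(0.5 * (total * total - total_sq))
    return out


def dense(vec, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def mlp_layer(feature_vector, weights, biases):
    """One hidden layer with ReLU activation (an assumed MLP depth)."""
    return [max(0.0, x) for x in dense(feature_vector, weights, biases)]


def predict_difference(feature_vector, num_fields, params):
    """Concatenate both branches and map the result to a scalar prediction."""
    first = fm_layer(feature_vector, num_fields)
    second = mlp_layer(feature_vector, params["mlp_w"], params["mlp_b"])
    target = first + second  # list concatenation of the two feature vectors
    return dense(target, [params["out_w"]], [params["out_b"]])[0]


# Illustrative parameters; a real model would learn these during training.
PARAMS = {
    "mlp_w": [[1, 0, 0, 0], [0, -1, 0, 0]],
    "mlp_b": [0, 0],
    "out_w": [1, 1, 1, 1],
    "out_b": 0.5,
}
```

For a feature vector `[1, 2, 3, 4]` built from two embeddings of width 2, `fm_layer` returns the element-wise product of the two embeddings, which is the only pairwise interaction available with two fields.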

According to one or more embodiments of the present disclosure, example 5 provides the method of example 1, wherein said determining a content feature vector for each of said content to be recommended comprises:

extracting a feature value corresponding to each content to be recommended according to a preset feature dimension;

if the feature dimension is a continuous feature, determining the discrete range into which the feature value of the feature dimension falls, and determining the value corresponding to the discrete range as the feature value corresponding to the feature dimension;

determining an embedding vector corresponding to each feature value according to the feature value corresponding to each feature dimension and a mapping table, wherein the mapping table maps the feature value corresponding to each feature dimension to a vector in an embedding matrix;
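The discretization and lookup steps above can be sketched as follows. The bucket boundaries, the dimension names, the mapping table contents, and the embedding values are all made up for illustration; the disclosure only states that continuous values are mapped to discrete ranges and that a mapping table links feature values to vectors in an embedding matrix.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for continuous feature dimensions.
BUCKET_EDGES = {"content_length": [100, 500, 2000]}

# Hypothetical mapping table: (feature dimension, feature value) -> matrix row.
MAPPING_TABLE = {
    ("content_length", 0): 0,
    ("content_length", 1): 1,
    ("content_length", 2): 2,
    ("content_length", 3): 3,
    ("category", "news"): 4,
    ("category", "sports"): 5,
}

# Hypothetical embedding matrix with one row per mapped feature value.
EMBEDDING_MATRIX = [
    [0.1, 0.0], [0.2, 0.1], [0.3, 0.2], [0.4, 0.3],
    [0.5, 0.4], [0.6, 0.5],
]


def discretize(dimension, value):
    """Replace a continuous value with the index of the range it falls into."""
    edges = BUCKET_EDGES.get(dimension)
    if edges is None:
        return value  # dimension is already discrete
    return bisect_right(edges, value)


def embedding_for(dimension, value):
    """Discretize if needed, then look up the embedding via the mapping table."""
    feature_value = discretize(dimension, value)
    return EMBEDDING_MATRIX[MAPPING_TABLE[(dimension, feature_value)]]


def embeddings_for_content(features):
    """Return one embedding vector per feature dimension of a content item."""
    return {dim: embedding_for(dim, val) for dim, val in features.items()}
```

With these tables, a content length of 350 falls into the second range (index 1) and is mapped to the second row of the embedding matrix, while the discrete value `"news"` is looked up directly.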

According to one or more embodiments of the present disclosure, example 6 provides the method of example 5, wherein the preset feature dimension is one or more of: the degree of association between the title and the content, the content length, the proportion of non-stop words, the degree of word repetition in the content, the degree of association among the parts of the content, and the reading proportion corresponding to the content.

According to one or more embodiments of the present disclosure, example 7 provides the method of example 1, wherein the determining whether the content to be recommended is the target recommended content that matches the target user according to the target user group, the candidate content group, and the content matching relationship includes:

determining the queried content difference parameters corresponding to the first identifier and the second identifier as target difference parameters corresponding to the content to be recommended;
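The query step can be pictured as a keyed lookup. The sketch below assumes that the first identifier names the target user group and the second identifier names the candidate content group, and that the content matching relation is held as a dictionary keyed by that pair; the group names and parameter values are invented for illustration, and the rule for deciding whether the content is target recommended content is not shown.

```python
# Hypothetical content matching relation:
# (user group identifier, content group identifier) -> content difference parameter.
CONTENT_MATCHING_RELATION = {
    ("user_group_0", "content_group_0"): 0.12,
    ("user_group_0", "content_group_1"): -0.30,
    ("user_group_1", "content_group_0"): 0.05,
}


def target_difference_parameter(relation, first_identifier, second_identifier):
    """Return the stored parameter for the pair, or None if the pair is absent."""
    return relation.get((first_identifier, second_identifier))
```

Returning `None` for an unknown pair is a design choice of this sketch; a caller could instead fall back to a default value or skip the content.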

According to one or more embodiments of the present disclosure, example 8 provides a content recommendation apparatus, wherein the apparatus includes:

According to one or more embodiments of the present disclosure, example 9 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of any of examples 1-7.

According to one or more embodiments of the present disclosure, example 10 provides an electronic device, including:

a storage device having a computer program stored thereon;

a processing device for executing the computer program in the storage device to implement the steps of the method of any one of examples 1-7.

The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the present disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. One example is an embodiment formed by replacing the features described above with technical features of similar function disclosed in (but not limited to) the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail herein.

Claims

1. A content recommendation method, the method comprising:

2. The method of claim 1, wherein the content difference parameter of the combination pair formed by the user group and the content group is determined by:

acquiring user data and content data;

3. The method of claim 1, wherein the content matching model is obtained by:

4. The method of claim 3, wherein said determining a prediction difference parameter corresponding to the training sample data based on the feature vector comprises:

5. The method of claim 1, wherein said determining a content feature vector for each of said content to be recommended comprises:

6. The method of claim 5, wherein the preset feature dimension is one or more of: the degree of association between the title and the content, the content length, the proportion of non-stop words, the degree of word repetition in the content, the degree of association among the parts of the content, and the reading proportion corresponding to the content.

7. The method of claim 1, wherein the determining whether the content to be recommended is the target recommended content matching the target user according to the target user group, the candidate content group, and the content matching relationship comprises:

8. A content recommendation device, the device comprising:

9. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1-7.

10. An electronic device, comprising:

a storage device having a computer program stored thereon;

a processing device for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1-7.