WO2023020160A1 - Recommendation method, training method, apparatus, device, and recommendation system - Google Patents

Recommendation method, training method, apparatus, device, and recommendation system

Info

Publication number
WO2023020160A1
WO2023020160A1 (PCT/CN2022/105075, priority CN2022105075W)
Authority
WO
WIPO (PCT)
Prior art keywords
user
feature vector
sample
image
candidate
Prior art date
Application number
PCT/CN2022/105075
Other languages
English (en)
French (fr)
Inventor
朱杰明 (Zhu Jieming)
赵洲 (Zhao Zhou)
张圣宇 (Zhang Shengyu)
何秀强 (He Xiuqiang)
钱莉 (Qian Li)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP22857473.7A priority Critical patent/EP4379574A4/en
Publication of WO2023020160A1 publication Critical patent/WO2023020160A1/zh
Priority to US18/441,389 priority patent/US20240184837A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G06F 16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • The embodiments of the present application relate to the technical field of recommendation, and in particular to a recommendation method, a training method, an apparatus, a device, and a recommendation system.
  • Current news recommendation systems only mine news content that users are interested in and ignore the influence that the news interface used to present that content has on users, so the click-through rate of news cannot be improved further.
  • The embodiments of the present application provide a recommendation method, a training method, an apparatus, a device, and a recommendation system, which use the influence of the news interface on the user to increase the user's click-through rate on news.
  • In a first aspect, an embodiment of the present application provides a recommendation method, including: acquiring multiple images, where each image contains a candidate interface and a candidate content presented through that candidate interface; that is, an image can be understood as a rendering of the candidate content as presented through the candidate interface. The candidate content is not limited to news content; it may also be other content such as short videos or product information, and correspondingly the candidate interface may be a news interface, a short-video interface, or a product-information interface. The method further includes acquiring image feature data of each image. The image feature data may include global visual impression feature data and/or local visual impression feature data, where global visual impression feature data can be understood as feature data extracted from the entire image, and local visual impression feature data can be understood as feature data extracted from local regions of the image. Based on the user feature data of a target user and the image feature data, the target user's degree of preference for each image is predicted through a prediction model whose input is determined from the user feature data and the image feature data; the user feature data includes, for example, the user's age and the city where the user is located.
  • Because the prediction model is trained on the image feature data of the images, it considers the influence of both the candidate content and the candidate interface on the user and can therefore accurately predict the user's degree of preference for each image. This makes it possible to recommend content of interest through a candidate interface the user likes, thereby improving the user's click-through rate on the recommended content.
  • each image includes multiple regions.
  • The image can be divided in various ways to obtain the multiple regions. For example, based on the foregoing description, a piece of news can include the title of the news, the author of the news, and the category of the news, and can also include a picture part; therefore, the region coordinates of these parts can be obtained according to the news layout, and the image can then be divided into multiple regions according to those coordinates.
  • The image feature data of each image includes multiple local feature vectors, each of which represents one region.
  • By dividing the image into multiple regions and using the local feature vector of each region as the image feature data, the local features of the image can be extracted better, which improves the accuracy of predicting the user's preference for the image.
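The layout-based region split described above can be sketched as simple array cropping. The layout boxes below (title, author, category, picture) and all coordinates are hypothetical, not taken from the application.

```python
import numpy as np

def crop_regions(image, boxes):
    """Split an image (H x W x C array) into regions given layout
    coordinates. Each box is (x0, y0, x1, y1) in pixels."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]

# Hypothetical 120x200 RGB news image and layout boxes.
image = np.zeros((120, 200, 3), dtype=np.uint8)
layout = {
    "title":    (0,   0, 200, 30),
    "author":   (0,  30, 100, 50),
    "category": (100, 30, 200, 50),
    "picture":  (0,  50, 200, 120),
}
regions = crop_regions(image, list(layout.values()))
```

Each cropped region would then be passed through the picture characterizer to obtain its local feature vector.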
  • Predicting the target user's degree of preference for each image based on the target user's user feature data and the image feature data through the prediction model includes: for each image, obtaining N word vectors based on the candidate content in the image, where each word vector represents a word in the candidate content and N is a positive integer. The candidate content includes N words, and for each word a word vector can be generated by a text characterizer. Like the picture characterizer, the text characterizer can be understood as a pre-trained model, of which there are many types; for example, it can be a BERT model. Since the title of a news item best reflects its main information, when the candidate content is news content, word segmentation can be performed on the title to obtain the N words, and the N word vectors representing them are then obtained through the text characterizer. For each word vector, based on that word vector and the multiple local feature vectors, the respective attention weights of the local feature vectors are calculated through a model of the attention mechanism.
  • The attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector when reading the word represented by the word vector. The attention mechanism dynamically controls how much attention a neural network pays to each part of its input by computing an attention weight for each part and merging the parts into an attention vector.
  • Based on the respective attention weights of the multiple local feature vectors, each word vector is fused with the multiple local feature vectors to obtain a first fusion feature vector, one per word vector.
  • Specifically, the multiple local feature vectors can be weighted by their respective attention weights, and the weighted result is then added to the word vector to obtain the first fusion feature vector.
  • The input of the prediction model is determined based on the user feature vector, which characterizes the user feature data of the target user, and the N first fusion feature vectors.
  • Because the attention weight indicates the degree to which the target user attends to the region represented by a local feature vector while reading the word represented by a word vector, fusing each word vector with the local feature vectors according to these weights yields first fusion feature vectors that reflect the impression the words and regions of the image leave on the user. Predicting the degree of preference from the first fusion feature vectors therefore improves the accuracy of the predicted preference.
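A minimal sketch of the word-to-region fusion just described: attention weights over the regions are computed for one word vector, the regions are weighted and summed, and the result is added to the word vector. The application does not fix the score function, so the unscaled dot product and all shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_word_with_regions(word_vec, region_vecs):
    """For one word vector, compute attention weights over the image
    regions and add the weighted region sum to the word vector, giving
    the 'first fusion feature vector'."""
    scores = region_vecs @ word_vec   # one score per region, shape (R,)
    weights = softmax(scores)         # attention weights, sum to 1
    context = weights @ region_vecs   # attention-weighted region sum
    return word_vec + context         # first fusion feature vector

# Hypothetical sizes: 4 regions, embedding dimension 8.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))
word = rng.normal(size=8)
fused = fuse_word_with_regions(word, regions)
```

Applying this to all N word vectors yields the N first fusion feature vectors used below.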
  • For each image, a model of the self-attention mechanism processes the N first fusion feature vectors corresponding to the N word vectors to obtain N semantically enhanced feature vectors, one per first fusion feature vector.
  • The self-attention mechanism is an improvement on the attention mechanism that reduces the dependence on external information and is better at capturing the internal correlations of data or features. Based on the user feature vector and the N semantically enhanced feature vectors, the target user's degree of preference for each image is predicted through the prediction model, whose input is determined from the user feature vector and the N semantically enhanced feature vectors.
  • Because the self-attention mechanism is better at capturing internal correlations, the semantically enhanced feature vectors reflect the correlations among the first fusion feature vectors and therefore represent more accurately the impression the image leaves on the user; predicting the degree of preference from them improves the accuracy of the predicted preference.
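The semantic-enhancement step can be illustrated with a bare single-head self-attention pass over the N first fusion vectors. Learned projection matrices (W_q, W_k, W_v) are omitted for brevity, so this is a sketch of the mechanism, not the application's exact model.

```python
import numpy as np

def self_attention(X):
    """Plain single-head self-attention over the rows of X (the N first
    fusion feature vectors), producing N semantically enhanced vectors.
    Each output row is a similarity-weighted mixture of all input rows,
    which is how internal correlations between the vectors are captured."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                     # scaled pairwise similarity
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)        # row-wise softmax
    return weights @ X

rng = np.random.default_rng(1)
fused = rng.normal(size=(5, 8))   # hypothetical N=5 first fusion vectors
enhanced = self_attention(fused)
```

Each row of `enhanced` corresponds to one first fusion feature vector, as stated in the text.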
  • Predicting the target user's degree of preference for each image based on the user feature vector and the N semantically enhanced feature vectors includes: for each image, fusing the N semantically enhanced feature vectors through a model of the additive attention mechanism to obtain a second fusion feature vector, and then predicting the target user's degree of preference for the image through the prediction model, whose input is determined from the user feature vector and the second fusion feature vector.
  • The model of the additive attention mechanism fuses the N semantically enhanced feature vectors, and predicting the degree of preference from the fused second fusion feature vector improves the accuracy of the predicted preference.
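One common reading of "the model of the additive attention mechanism" is Bahdanau-style attention pooling: each of the N vectors is scored by a small learned network, the scores are softmaxed, and the weighted sum is the second fusion feature vector. The parameters `w`, `b`, `q` stand in for learned weights and are randomly initialized here for illustration.

```python
import numpy as np

def additive_attention_pool(X, w, b, q):
    """Additive attention pooling over the N semantically enhanced
    vectors (rows of X): score each row with tanh(X w + b) . q,
    softmax the scores, and return the weighted sum."""
    scores = np.tanh(X @ w + b) @ q        # one score per row, shape (N,)
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                    # attention weights, sum to 1
    return alpha @ X                       # second fusion feature vector

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))                        # N=5 enhanced vectors
w = rng.normal(size=(8, 8))
b = rng.normal(size=8)
q = rng.normal(size=8)
pooled = additive_attention_pool(X, w, b, q)
```

Because the weights are a convex combination, the pooled vector lies within the per-dimension range of the inputs.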
  • The image feature data of each image includes a global feature vector that represents the whole image; in this case the image feature data can also be called global visual impression feature data. To obtain the global feature vector, the image can be input into a picture characterizer, which converts the image into the global feature vector.
  • Using a global feature vector that characterizes the entire image as its image feature data allows the global features of the image to be extracted better, improving the accuracy of predicting the user's preference for the image.
  • Predicting the target user's degree of preference for each image based on the user feature data and the image feature data through the prediction model includes: for each image, obtaining a content feature vector that represents the candidate content. Since the title of a news item best reflects its main information, when the candidate content is news content the title can be converted into the content feature vector. Based on the content feature vector and the global feature vector, their respective weights are determined; because users may differ in their sensitivity to visual impression information and to text semantics, one achievable way is to adaptively control the two weights through a gated (threshold) addition network. Based on these weights, the content feature vector and the global feature vector are fused to obtain a third fusion feature vector, and the target user's degree of preference for each image is then predicted through the prediction model, whose input is determined from the user feature vector and the third fusion feature vector.
  • The third fusion feature vector represents, from a global perspective, the impression the image leaves on the user; using it to predict the target user's degree of preference for each image therefore improves the accuracy of the prediction.
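One way to realize the gated (threshold) addition network is a sigmoid gate computed from the concatenated vectors, trading off the content feature vector against the global visual feature vector per dimension. The gate parameterization below is an assumption for illustration, not the application's specified network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(content_vec, global_vec, W_g, b_g):
    """Sketch of a gated addition network: a learned gate in (0, 1)
    decides, per dimension, the weight of the content feature vector
    versus the global feature vector; their gated sum is the
    'third fusion feature vector'."""
    gate = sigmoid(W_g @ np.concatenate([content_vec, global_vec]) + b_g)
    return gate * content_vec + (1.0 - gate) * global_vec

rng = np.random.default_rng(3)
content = rng.normal(size=8)          # e.g. title feature vector
glob = rng.normal(size=8)             # global visual feature vector
W_g = rng.normal(size=(8, 16))        # hypothetical learned gate weights
b_g = rng.normal(size=8)
third = gated_fusion(content, glob, W_g, b_g)
```

Since the gate is in (0, 1), the fused vector always lies between the two inputs in every dimension, which is what lets the network adapt to users who weigh visual impressions and text semantics differently.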
  • Selecting the candidate content and/or the candidate interface for recommendation from the candidate interfaces and candidate contents contained in the multiple images based on the degree of preference includes: selecting one candidate content as the target candidate content from the candidate contents based on the degree of preference, and then selecting one candidate interface as the target candidate interface, based on the degree of preference, from the candidate interfaces of the images that contain the target candidate content, so that the target candidate content is recommended through the target candidate interface.
  • Recommending the target candidate content through the target candidate interface means the user's preferred content is recommended through the user's preferred interface, which increases the probability that the user clicks on the recommended content.
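The two-stage selection above can be sketched as follows. The `(content_id, interface_id)` representation and the average-score rule for picking the target content are hypothetical choices, since the text only says the selection is "based on the degree of preference".

```python
def select_content_then_interface(images, scores):
    """Stage 1: pick the target candidate content (here: the content
    whose images have the highest mean predicted preference, one
    plausible reading). Stage 2: among images presenting that content,
    pick the interface of the highest-scoring image."""
    contents = {c for c, _ in images}

    def mean_score(c):
        s = [scores[i] for i, (ci, _) in enumerate(images) if ci == c]
        return sum(s) / len(s)

    target_content = max(contents, key=mean_score)
    idx = [i for i, (c, _) in enumerate(images) if c == target_content]
    best = max(idx, key=lambda i: scores[i])
    return target_content, images[best][1]

# Three rendered images: news1 under two interfaces, news2 under one.
images = [("news1", "uiA"), ("news1", "uiB"), ("news2", "uiA")]
scores = [0.6, 0.9, 0.7]
content, interface = select_content_then_interface(images, scores)
# content == "news1" (mean 0.75 beats 0.7), interface == "uiB"
```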
  • The method further includes: sending the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through it; the metadata includes the various configuration data of the target candidate interface.
  • In a second aspect, an embodiment of the present application provides a training method, including: acquiring a plurality of sample images, each containing a sample candidate interface and a sample candidate content presented through that interface; acquiring image feature data of each sample image; predicting a sample user's degree of preference for each sample image based on the sample user's user feature data and the image feature data through the prediction model, whose input is determined from the user feature data and the image feature data; and adjusting the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content. The historical click data may include whether the sample user clicked on the sample candidate content and how many times; specifically, the weights of the prediction model can be adjusted, and its structure can also be adjusted.
  • A prediction model trained on the image feature data of the sample images considers the influence of both the candidate content and the candidate interface on the user and can therefore output the user's preference for images accurately, which makes it possible to recommend content of interest through an interface the user likes and so improve the user's click-through rate on the recommended content.
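The adjustment step can be illustrated with binary cross-entropy against historical click labels and plain gradient descent. The minimal logistic model below stands in for the full prediction model; the loss choice and setup are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(pred, clicked):
    """Binary cross-entropy between the predicted preference (treated
    as a click probability) and the click label (1 = clicked)."""
    eps = 1e-12
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(clicked * np.log(pred) + (1 - clicked) * np.log(1 - pred))

def sgd_step(w, X, clicked, lr=0.1):
    """One gradient step for the logistic stand-in p = sigmoid(X @ w)."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - clicked) / len(clicked)  # BCE gradient for this model
    return w - lr * grad

rng = np.random.default_rng(4)
X = rng.normal(size=(16, 8))                    # fused sample-image features
clicked = (rng.random(16) > 0.5).astype(float)  # historical click labels
w = np.zeros(8)
before = bce_loss(sigmoid(X @ w), clicked)
for _ in range(50):
    w = sgd_step(w, X, clicked)
after = bce_loss(sigmoid(X @ w), clicked)       # loss decreases over training
```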
  • each sample image includes multiple regions; image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
  • Predicting the sample user's degree of preference for each sample image based on the sample user's user feature data and the image feature data through the prediction model includes: for each sample image, obtaining N word vectors based on the sample candidate content, where each word vector represents a word in the sample candidate content and N is a positive integer; and for each word vector, calculating, based on that word vector and the multiple local feature vectors, the respective attention weights of the local feature vectors through the model of the attention mechanism.
  • The attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by the word vector.
  • Based on the respective attention weights, each word vector is fused with the multiple local feature vectors to obtain a first fusion feature vector, one per word vector; based on the user feature vector and the N first fusion feature vectors corresponding to the N word vectors, the sample user's degree of preference for each sample image is predicted through the prediction model.
  • The input of the prediction model is determined based on the user feature vector and the N first fusion feature vectors.
  • The user feature vector is used to represent the user feature data of the sample user.
  • Predicting the sample user's degree of preference for each sample image based on the user feature vector and the N first fusion feature vectors includes: for each sample image, processing the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, one per first fusion feature vector; and then predicting the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector and the N semantically enhanced feature vectors.
  • Predicting based on the user feature vector and the N semantically enhanced feature vectors includes: for each sample image, fusing the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector, and then predicting the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector and the second fusion feature vector.
  • The image feature data of each sample image includes a global feature vector that characterizes the sample image.
  • Predicting the sample user's degree of preference for each sample image based on the user feature data and the image feature data includes: for each sample image, obtaining a content feature vector that represents the sample candidate content; determining the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fusing them according to those weights to obtain a third fusion feature vector; and predicting the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector, which represents the user feature data of the sample user, and the third fusion feature vector.
  • An embodiment of the present application provides a recommendation apparatus, including: a first image acquisition unit configured to acquire multiple images, each containing a candidate interface and a candidate content presented through that interface; a first feature data acquisition unit configured to acquire the image feature data of each image; a first prediction unit configured to predict a target user's degree of preference for each image based on the target user's user feature data and the image feature data through a prediction model whose input is determined from the user feature data and the image feature data; and a recommendation unit configured to select candidate content and/or a candidate interface for recommendation from the candidate interfaces and candidate contents contained in the multiple images based on the degree of preference.
  • each image includes multiple regions; the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
  • The first prediction unit is configured to, for each image, obtain N word vectors based on the candidate content in the image, each representing a word in the candidate content, where N is a positive integer, and, for each word vector, calculate the respective attention weights of the multiple local feature vectors through the model of the attention mechanism based on that word vector and the local feature vectors.
  • The attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector when reading the word represented by the word vector. Based on the respective attention weights, each word vector is fused with the multiple local feature vectors to obtain a first fusion feature vector, one per word vector; the target user's degree of preference for each image is then predicted through the prediction model, whose input is determined from the user feature vector, which represents the user feature data of the target user, and the N first fusion feature vectors.
  • The first prediction unit is configured to, for each image, process the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, one per first fusion feature vector, and to predict the target user's degree of preference for each image through the prediction model, whose input is determined from the user feature vector and the N semantically enhanced feature vectors.
  • The first prediction unit is configured to predict the target user's degree of preference for each image based on the user feature vector and the N semantically enhanced feature vectors by: for each image, fusing the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector, and predicting the degree of preference through the prediction model, whose input is determined from the user feature vector and the second fusion feature vector.
  • the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
  • The first prediction unit is configured to, for each image, obtain a content feature vector representing the candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fuse them according to those weights to obtain a third fusion feature vector; and predict the target user's degree of preference for each image through the prediction model, whose input is determined from the user feature vector, which represents the user feature data of the target user, and the third fusion feature vector.
  • The recommendation unit is configured to select one candidate content as the target candidate content from the candidate contents contained in the multiple images based on the degree of preference, and to select one candidate interface as the target candidate interface, based on the degree of preference, from the candidate interfaces of the images containing the target candidate content, so as to recommend the target candidate content through the target candidate interface.
  • The apparatus further includes a sending unit configured to send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through it.
  • An embodiment of the present application provides a training apparatus, including: a second image acquisition unit configured to acquire a plurality of sample images, each containing a sample candidate interface and a sample candidate content presented through that interface; a second feature data acquisition unit configured to acquire the image feature data of each sample image; a second prediction unit configured to predict a sample user's degree of preference for each sample image based on the sample user's user feature data and the image feature data through a prediction model whose input is determined from the user feature data and the image feature data; and an adjustment unit configured to adjust the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content.
  • each sample image includes multiple regions; image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
  • The second prediction unit is configured to, for each sample image, obtain N word vectors based on the sample candidate content, each representing a word in the sample candidate content, where N is a positive integer; for each word vector, calculate the respective attention weights of the multiple local feature vectors through the model of the attention mechanism based on that word vector and the local feature vectors, where the attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by the word vector; fuse each word vector with the multiple local feature vectors according to those weights to obtain a first fusion feature vector, one per word vector; and predict the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector, which represents the user feature data of the sample user, and the N first fusion feature vectors.
  • The second prediction unit is configured to, for each sample image, process the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, one per first fusion feature vector, and to predict the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector and the N semantically enhanced feature vectors.
  • The second prediction unit is configured to, for each sample image, fuse the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector, and to predict the sample user's degree of preference for each sample image through the prediction model, whose input is determined from the user feature vector and the second fusion feature vector.
  • the image feature data of each sample image includes a global feature vector, and the global feature vector is used to characterize the sample image.
  • The second prediction unit is configured to, for each sample image, obtain a content feature vector characterizing the sample candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fuse them according to those weights to obtain a third fusion feature vector; and predict the sample user's degree of preference for each sample image through the prediction model.
  • The input of the prediction model is determined based on the user feature vector and the third fusion feature vector, and the user feature vector is used to represent the user feature data of the sample user.
• an embodiment of the present application provides a computer device, including: one or more processors and a memory, where computer-readable instructions are stored in the memory; when the one or more processors read the computer-readable instructions, the computer device implements the method in any one of the implementation manners of the first aspect.
• an embodiment of the present application provides a training device, including: one or more processors and a memory, where computer-readable instructions are stored in the memory; when the one or more processors read the computer-readable instructions, the training device implements the method in any implementation manner of the second aspect.
• the embodiment of the present application provides a computer-readable storage medium, including computer-readable instructions; when the computer-readable instructions are run on a computer, the computer executes the method in any implementation manner of the first aspect or the second aspect.
• the embodiment of the present application provides a chip, including one or more processors, part or all of which are used to read and execute a computer program stored in a memory, so as to execute the method in any possible implementation manner of the first aspect or the second aspect above.
• the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • some of the one or more processors can also implement some steps in the above method through dedicated hardware.
• the processing related to the neural network model can be performed by a dedicated neural network processor or graphics processor.
  • the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
• the embodiment of the present application provides a computer program product; the computer program product includes computer software instructions, and the computer software instructions can be loaded by a processor to implement the method in any implementation manner of the first aspect or the second aspect above.
  • the embodiment of the present application provides a recommendation system, including a terminal device and a server;
• the server is used to execute the method in any one of the implementation manners of the first aspect;
• the terminal device is configured to receive the metadata of the target candidate interface and the target candidate content from the server;
• the target candidate interface is displayed based on the metadata, and the target candidate content is recommended to the target user through the target candidate interface.
• Fig. 1 is a schematic structural diagram of a news recommendation system provided by an embodiment of the present application;
• Fig. 2 is a schematic diagram of an embodiment of news;
• Fig. 3 is a schematic diagram of the working process of the news recommendation system;
• Fig. 4 is a schematic diagram of an embodiment of a training method provided by an embodiment of the present application;
• Fig. 5 is a schematic diagram of the regions of a sample image in an embodiment of the present application;
• Fig. 6 is a schematic diagram of a first embodiment of predicting the sample user's degree of preference for each sample image in an embodiment of the present application;
• Fig. 7 is a schematic diagram of a second embodiment of predicting the sample user's degree of preference for each sample image in an embodiment of the present application;
• Fig. 8 is a schematic diagram of the process of obtaining the second fused feature vector in an embodiment of the present application;
• Fig. 9 is a schematic diagram of a third embodiment of predicting the sample user's degree of preference for each sample image in an embodiment of the present application;
• Fig. 10 is a schematic diagram of the process of obtaining the third fused feature vector in an embodiment of the present application;
• Fig. 11 is a schematic diagram of an embodiment of a recommendation method provided by an embodiment of the present application;
• Fig. 12 is a schematic diagram of a first embodiment of predicting the target user's degree of preference for each image in an embodiment of the present application;
• Fig. 13 is a schematic diagram of a second embodiment of predicting the target user's degree of preference for each image in an embodiment of the present application;
• Fig. 14 is a schematic diagram of a third embodiment of predicting the target user's degree of preference for each image in an embodiment of the present application;
• Fig. 15 is a schematic diagram of an embodiment of predicting a user's preference for news in an embodiment of the present application;
• Fig. 16 is a schematic diagram of an embodiment of obtaining the best user interface configuration in an embodiment of the present application;
• Fig. 17 is a schematic diagram of an embodiment of a training device provided by an embodiment of the present application;
• Fig. 18 is a schematic diagram of an embodiment of a recommendation device provided by an embodiment of the present application;
• Fig. 19 is a schematic diagram of an embodiment of a computer device provided by an embodiment of the present application.
• in this application, "plural" means two or more.
• the term "and/or" or the character "/" in this application merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B (or A/B) may indicate that A exists alone, that both A and B exist, or that B exists alone.
  • the embodiment of the present application can be applied to the news recommendation system shown in FIG. 1 .
  • the news recommendation system includes a terminal device and a server, and the terminal device is connected to the server by communication.
  • terminal devices may include mobile phones, tablet computers, desktop computers, vehicle-mounted devices, and other devices that can deploy news applications; hereinafter, terminal devices are referred to as terminals for short.
  • the server can be an ordinary server or a cloud server.
  • a news application is deployed in a terminal, and a recommendation service is deployed in a server.
• when the user accesses the news application in the terminal, the terminal sends a request to the server to request the recommendation service; after receiving the request, the server starts the recommendation service and selects, from a large amount of news content, the news content that the user is interested in as the recommended news content; the server then sends the recommended news content to the terminal, and the terminal displays the recommended news content to the user.
  • the embodiment of the present application does not specifically limit the news content; for example, as shown in FIG. 2 , the news content may include the title of the news, the author of the news, and the category of the news. In addition, the news content may also include the text of the news.
  • the news interface presenting the news content will also affect the click-through rate of the news.
• the layout of the graphics and text in the news interface (including the position of the title and the relative position between the title and the picture), whether there is a picture and the size of the picture, the color of the picture, the clarity of the picture, the font, and the font size all leave different visual impressions on users, affect the user's browsing experience, and thus affect the user's click behavior on news.
  • the information that leaves a visual impression on the user in the news interface is called visual impression information.
• the visual impression information can be understood as the multi-modal news information displayed on the news interface from the user's perspective; specifically, it may include the aforementioned graphic-and-text layout, whether there is a picture and the size of the picture, the color of the picture, the clarity of the picture, the font, the font size, and other information.
• the embodiment of this application provides a recommendation method: multiple images are obtained, each image containing a candidate interface and candidate content; then, according to the user feature data of the target user and the image feature data of the images, the target user's degree of preference for each image is predicted through a prediction model; finally, candidate content and/or a candidate interface is selected from the multiple images according to the degree of preference for recommendation. The candidate interface can be a news interface, and the candidate content can be news content.
• the recommendation method can realize the recommendation of news; moreover, in the process of recommending news with this method, not only the influence of the news content on the target user is considered, but also the influence of the news interface on the target user, so news of interest (including both the news content and the news interface) can be recommended to the target user to further increase the click-through rate of the news.
• the candidate content can be not only news content but also other content such as short videos and product information; correspondingly, the candidate interface can be not only a news interface but also an interface for presenting short videos or a product-information interface.
  • the method provided in the embodiment of the present application is introduced below by taking the candidate content as news content and the candidate interface as a news interface as an example.
• the server can also select the news interface that the user is interested in and send the metadata of that news interface to the terminal; the terminal then displays the news interface based on the metadata and uses the news interface to display the recommended news content to the user.
• the working process of the news recommendation system shown in FIG. 1 can be as shown in FIG. 3.
  • the server extracts news-related data from the user's behavior log (specifically, it may include browsing news data or clicking on news data), uses news-related data to construct training data, and then performs offline training based on the training data to obtain a prediction model;
• when the server receives a request for the recommendation service, it performs online prediction through the prediction model to obtain the user's degree of preference for multiple news images, and then selects the news content and the news interface according to the degree of preference; finally, the terminal displays the news content to the user through the news interface.
  • the embodiment of the present application provides an embodiment of a training method, which is usually applied to a server, specifically, this embodiment includes:
• Step 101: acquire a plurality of sample images, each sample image including a sample candidate interface and sample candidate content presented through the sample candidate interface.
  • the sample image can be understood as an image that presents the sample candidate content through the sample candidate interface, wherein, the sample candidate interface and the sample candidate content can be understood by referring to the relevant descriptions of the candidate interface and the candidate content above.
• the composition of the plurality of sample images may fall into the following three cases.
  • the first case is: multiple sample images include one sample candidate interface and multiple sample candidate contents, that is, the sample candidate interfaces in all sample images are the same.
  • the second case is that the multiple sample images include multiple sample candidate interfaces and one type of sample candidate content, that is, the sample candidate content in all the sample images is the same.
  • the third case is that multiple sample images include multiple sample candidate interfaces and multiple sample candidate contents.
• in the third case, all sample images containing the same sample candidate content may contain multiple sample candidate interfaces; for example, there are 10,000 sample images that include 100 sample candidate contents, and all sample images containing the same sample candidate content include 100 sample candidate interfaces; that is, each sample candidate content can be presented through 100 sample candidate interfaces.
• Step 102: acquire the image feature data of each sample image.
• the image feature data may include only global visual impression feature data, only local visual impression feature data, or both global and local visual impression feature data.
• each sample image includes multiple regions, and correspondingly, the image feature data of each sample image includes multiple local feature vectors, each local feature vector being used to characterize one region; in this case, the image feature data may also be referred to as local visual impression feature data.
• the sample image can be divided into multiple regions by various methods; for example, based on the foregoing description, a piece of news can include the title of the news, the author of the news, the category of the news, and possibly a picture part; therefore, the region coordinates of these parts can be obtained according to the news layout, and the sample image can then be divided into multiple regions according to the region coordinates.
• using the above method, the sample image in FIG. 5 can be divided into three regions: the news title, the news category, and the news picture.
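The region division described above can be sketched in a few lines. The pixel coordinates below are hypothetical stand-ins; in practice they would be obtained from the news layout:

```python
import numpy as np

# Hypothetical (top, bottom, left, right) pixel coordinates for each part of
# the news layout; real coordinates would come from the interface layout data.
layout = {
    "title":    (0, 40, 0, 300),
    "category": (40, 60, 0, 120),
    "picture":  (0, 60, 300, 400),
}

image = np.zeros((60, 400, 3))  # stand-in for a rendered sample image (H x W x C)

# Divide the sample image into regions according to the region coordinates.
regions = {name: image[top:bottom, left:right]
           for name, (top, bottom, left, right) in layout.items()}
```

Each region can then be fed to the picture characterizer to obtain its local feature vector.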
• the method for obtaining the local feature vectors may specifically include: inputting the images of the multiple regions into a picture characterizer, so as to convert the multiple regions into multiple local feature vectors through the picture characterizer; the picture characterizer can be understood as a pre-trained model, and it may be of many types, for example, ResNet101.
  • the image feature data of each sample image includes a global feature vector, which is used to characterize the sample image; at this time, the image feature data may also be called global visual impression feature data.
• the method for obtaining the global feature vector may specifically include: inputting the sample image into a picture characterizer, so as to convert the sample image into a global feature vector through the picture characterizer; since the picture characterizer is described above, it will not be described in detail here.
• Step 103: acquire the user feature data of the sample user.
  • the embodiment of the present application does not specifically limit the type of feature data of the sample user.
• the feature data of the sample user includes the age information of the sample user, the city where the sample user is located, and the sample user's news-related historical data; the news-related historical data may specifically include the types of news the sample user browses, the types of news the sample user clicks on, the time when the sample user clicks on news, the location when the sample user clicks on news, and the like.
  • the historical data related to the news of the sample user can be obtained from the behavior log of the sample user.
• Step 104: based on the user feature data of the sample user and the image feature data, predict the sample user's degree of preference for each sample image through a prediction model, where the input of the prediction model is determined based on the user feature data and the image feature data.
• the user feature data and image feature data can also be combined with specific environmental information (such as time, date, whether it is a weekend, whether it is a holiday, etc.) to predict the sample user's degree of preference for each sample image through the prediction model.
• the sample user feature data and the image feature data can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the image feature data can first be processed to obtain intermediate feature data, and then the sample user feature data and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
• Step 105: adjust the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content.
  • the historical click data of the sample user on the sample candidate content may include whether the sample user clicks on the sample candidate content, and the number of times the sample user clicks on the sample candidate content.
• the sample label can be set according to the sample user's historical click data on the sample candidate content; for example, for a sample image, if the sample user has clicked on the sample candidate content in the sample image, the sample label of the degree of preference is set to 1, and if the sample user has not clicked on the sample candidate content in the sample image, the sample label of the degree of preference can be set to 0.
• alternatively, if the number of times the sample user clicks on the sample candidate content in the sample image is greater than or equal to a first threshold, the sample label of the degree of preference can be set to 1; if the number of times is less than the first threshold and greater than or equal to a second threshold, the sample label of the degree of preference can be set to 0.5; and if the number of times is less than the second threshold, or the sample user has not clicked on the sample candidate content in the sample image, the sample label of the degree of preference can be set to 0.
• the loss function can be calculated according to the sample user's degree of preference for the sample image output by the prediction model and the sample label; through backpropagation of the loss function, the weights of the prediction model can be updated, or the structure of the prediction model can be adjusted, so that the degree of preference output by the prediction model approaches the sample label.
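The label assignment and loss computation above can be sketched as follows. The threshold values `t1` and `t2` are hypothetical, and binary cross-entropy is used as one plausible choice of loss; the source does not name a specific loss function:

```python
import numpy as np

def preference_label(clicks, t1=5, t2=1):
    """Map the sample user's click count on the sample candidate content to a
    preference sample label; t1 and t2 are hypothetical first/second thresholds."""
    if clicks >= t1:
        return 1.0
    if clicks >= t2:
        return 0.5
    return 0.0

def bce_loss(predicted, label, eps=1e-12):
    """Binary cross-entropy between the predicted preference degree and the label."""
    p = np.clip(predicted, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))
```

Backpropagating this loss (e.g. with a deep-learning framework) would then update the weights of the prediction model.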
• the prediction model trained based on the image feature data of the sample images can consider the influence of both the candidate content and the candidate interface on the user, and can thus accurately output the user's degree of preference for an image; this is conducive to recommending content of interest to the user through an interface the user is interested in, so as to improve the user's click-through rate on the recommended content.
• when the image feature data includes local visual impression feature data, step 104 includes:
• Step 201: for each sample image, obtain N word vectors based on the sample candidate content in the sample image, each word vector representing a word in the sample candidate content, where N is a positive integer.
• the sample candidate content includes N words; corresponding to each word, a word vector can be generated by using a text characterizer; similar to the picture characterizer, the text characterizer can also be understood as a pre-trained model, and it may be of many types; for example, it can be a BERT model.
• when the sample candidate content is news content, the title of the news content can better reflect the main information of the news content; therefore, word segmentation can be performed on the title of the news content to obtain N words, and then N word vectors representing the N words are obtained through the text characterizer.
• Step 202: for each word vector, calculate the respective attention weights of the multiple local feature vectors based on the word vector and the multiple local feature vectors, through a model of the attention mechanism.
• the attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by the word vector.
• the attention mechanism is a mechanism that, by calculating the attention weight of each part in a neural network model and merging the parts into an attention vector, dynamically controls the degree of attention the model pays to each part or to a certain part.
• attention can be divided into two types: one is top-down focused attention, which has a predetermined purpose, depends on the task, and actively and consciously focuses on an object; the other is bottom-up unconscious attention, called saliency-based attention.
  • the attention mechanism also includes the following variants: multi-head attention mechanism, hard attention mechanism, key-value pair attention mechanism and structured attention mechanism.
  • the multi-head attention mechanism uses multiple queries to calculate and select multiple information from the input information in parallel, and each attention focuses on different parts of the input information.
• using o_j to represent the j-th local feature vector and w_i to represent the i-th word vector, the attention weights of the multiple local feature vectors for the word vector w_i can be calculated with the formula

α_{i,j} = exp(q_m(w_i)^T · k_m(o_j)) / Σ_{j'=1}^{K1} exp(q_m(w_i)^T · k_m(o_{j'}))

where α_{i,j} indicates the attention weight, q_m(·) and k_m(·) represent linear transformations with a bias term, and K1 represents the number of local feature vectors.
  • the sample image in Fig. 5 is divided into three regions: the title of the news, the category of the news, and the picture of the news.
• the local feature vectors representing these three regions can be obtained; taking the word "states" as an example, for the word vector representing the word "states", the attention weights of the local feature vectors of the three regions respectively represent the degrees to which the sample user pays attention to the three regions when paying attention to the word "states".
• Step 203: based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector, each word vector corresponding to one first fused feature vector.
• the multiple local feature vectors may be weighted by their respective attention weights, and the weighted result is then added to the word vector to obtain the first fused feature vector.
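Steps 202 and 203 can be sketched as below. This is an illustrative sketch only: it assumes the word vectors and local feature vectors share the same dimension (so the weighted region sum can be added to the word vector), and uses randomly initialised stand-ins for the learned transformations q_m(·) and k_m(·):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_word_with_regions(w_i, O, Wq, bq, Wk, bk):
    """For one word vector w_i and region matrix O (K1 x d): compute the
    attention weight of every local feature vector (step 202), then add the
    attention-weighted sum of the regions to the word vector (step 203)."""
    q = Wq @ w_i + bq            # q_m(w_i): linear transform with bias
    K = O @ Wk.T + bk            # k_m(o_j) for every region j
    alpha = softmax(K @ q)       # attention weight of each region for w_i
    return w_i + alpha @ O, alpha

# Toy usage with random stand-ins for the learned parameters.
rng = np.random.default_rng(0)
d = 8                            # shared dimension (an assumption)
w_i = rng.normal(size=d)
O = rng.normal(size=(3, d))      # three regions: title, category, picture
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
bq, bk = rng.normal(size=d), rng.normal(size=d)
fused, alpha = fuse_word_with_regions(w_i, O, Wq, bq, Wk, bk)
```

Running this once per word vector yields the N first fused feature vectors.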
• Step 204: based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, predict the sample user's degree of preference for each sample image through the prediction model, where the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector is used to characterize the user feature data of the sample user.
• the user feature vector and the N first fused feature vectors can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the first fused feature vectors can first be processed to obtain intermediate feature data, and then the user feature vector and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
  • step 204 includes:
• Step 301: for each sample image, process the N first fused feature vectors corresponding to the N word vectors through a model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector.
  • the self-attention mechanism is a mechanism improved from the attention mechanism, which reduces the dependence on external information and is better at capturing the internal correlation of data or features.
• the self-attention mechanism can therefore be used to better analyze the correlations among the N first fused feature vectors; correspondingly, the attention mechanism is used to capture correlations external to the data. In the embodiment of the present application, the attention mechanism is used to process the word vectors and the multiple local feature vectors: compared to the word represented by a word vector, the image region represented by a local feature vector is external, so the attention mechanism is used to capture the correlation between the word represented by the word vector and the image region represented by the local feature vector.
  • the self-attention mechanism includes a single-head self-attention mechanism and a multi-head self-attention mechanism.
• the N first fused feature vectors are obtained from the N word vectors, and since there is a semantic relationship among the N word vectors, there is correspondingly also a semantic relationship among the N first fused feature vectors; therefore, in this embodiment, semantic enhancement processing is performed on the N first fused feature vectors through the model of the self-attention mechanism.
• the process of processing the N first fused feature vectors through the model of the self-attention mechanism may include: processing the N first fused feature vectors with the formulas

β_{i,j} = exp(q(v_i)^T · k(v_j)) / Σ_{j'=1}^{K2} exp(q(v_i)^T · k(v_{j'}))  and  v̂_i = Σ_{j=1}^{K2} β_{i,j} · v_j

where v_i denotes the i-th first fused feature vector, q(·) and k(·) represent linear transformations, β_{i,j} indicates the degree to which the j-th first fused feature vector semantically enhances the i-th first fused feature vector, and K2 represents the number of first fused feature vectors.
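The semantic enhancement of step 301 can be sketched as a single-head self-attention pass. The projection matrices below are hypothetical stand-ins for the learned transformations q(·) and k(·):

```python
import numpy as np

def self_attention_enhance(V, Wq, Wk):
    """V holds the N first fused feature vectors (N x d); beta[i, j] is how
    much vector j semantically enhances vector i, and each output row is the
    beta-weighted sum of all rows of V. Wq and Wk stand in for the learned
    linear transformations q(.) and k(.)."""
    scores = (V @ Wq.T) @ (V @ Wk.T).T             # q(v_i)^T k(v_j)
    scores = scores - scores.max(axis=1, keepdims=True)
    beta = np.exp(scores)
    beta = beta / beta.sum(axis=1, keepdims=True)  # row-wise softmax over j
    return beta @ V                                # N semantically enhanced vectors

# Toy usage with N = 4 first fused feature vectors of dimension 8.
rng = np.random.default_rng(1)
V = rng.normal(size=(4, 8))
enhanced = self_attention_enhance(V, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```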
• Step 302: based on the user feature vector and the N semantically enhanced feature vectors, predict the sample user's degree of preference for each sample image through the prediction model, where the input of the prediction model is determined based on the user feature vector and the N semantically enhanced feature vectors.
• the user feature vector and the N semantically enhanced feature vectors can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the semantically enhanced feature vectors can first be processed to obtain intermediate feature data, and then the user feature vector and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
  • step 302 includes:
  • the N semantically enhanced feature vectors are fused by the model of the additive attention mechanism to obtain the second fused feature vector;
• based on the user feature vector and the second fused feature vector, the sample user's degree of preference for each sample image is predicted through the prediction model, where the input of the prediction model is determined based on the user feature vector and the second fused feature vector.
• fusing the N semantically enhanced feature vectors through the model of the additive attention mechanism may include: processing the N semantically enhanced feature vectors with the formulas

γ_i = exp(q_a^T · tanh(k_a · v̂_i)) / Σ_{j=1}^{K3} exp(q_a^T · tanh(k_a · v̂_j))  and  e1 = Σ_{i=1}^{K3} γ_i · v̂_i

where k_a is used to convert v̂_i into a hidden-space vector, q_a is used to calculate the attention weight in the fusion process, γ_i indicates the attention weight of the i-th semantically enhanced feature vector, e1 indicates the second fused feature vector, and K3 represents the number of semantically enhanced feature vectors.
• the process of obtaining the second fused feature vector can be summarized as follows: the word vectors and local feature vectors are taken as input, and the attention mechanism, the self-attention mechanism, and the additive attention mechanism are applied in turn to output the second fused feature vector.
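The additive-attention pooling that closes this pipeline can be sketched as below; the projection parameters (including the bias term) are illustrative stand-ins for the learned k_a and q_a:

```python
import numpy as np

def additive_attention_pool(V_hat, Ka, ba, qa):
    """Fuses the N semantically enhanced feature vectors (rows of V_hat) into
    the single second fused feature vector e1: Ka/ba project each vector into
    a hidden space, qa scores it, and a softmax over the scores gives the
    weight gamma_i of each vector. All parameters are illustrative stand-ins."""
    h = np.tanh(V_hat @ Ka.T + ba)    # hidden-space projection of each vector
    scores = h @ qa
    gamma = np.exp(scores - scores.max())
    gamma = gamma / gamma.sum()       # attention weight per enhanced vector
    return gamma @ V_hat              # e1, the second fused feature vector

# Toy usage: N = 4 enhanced vectors of dimension 8, hidden space of size 16.
rng = np.random.default_rng(2)
V_hat = rng.normal(size=(4, 8))
e1 = additive_attention_pool(V_hat, rng.normal(size=(16, 8)),
                             rng.normal(size=16), rng.normal(size=16))
```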
• the situation in which the image feature data includes local visual impression feature data has been introduced above; the situation in which the image feature data includes global visual impression feature data is introduced below.
  • step 104 includes:
• Step 401: for each sample image, obtain a content feature vector based on the sample candidate content in the sample image, the content feature vector being used to characterize the sample candidate content.
  • text characterizers can also be used to convert sample candidate content into content feature vectors.
  • the title of the news content can better reflect the main information of the news content; therefore, when the sample candidate content is news content, the title of the news content can be converted into a title feature vector, and use the title feature vector as a content feature vector representing the content of the sample candidate.
• Step 402: based on the content feature vector and the global feature vector, determine the weight of the content feature vector and the weight of the global feature vector.
• a threshold addition network can be used to adaptively control the respective weights of the content feature vector and the global feature vector.
• Step 403: based on the weight of the content feature vector and the weight of the global feature vector, fuse the content feature vector and the global feature vector to obtain a third fused feature vector.
• the process of obtaining the third fused feature vector can be summarized as follows: the content feature vector and the global feature vector are taken as input, and the threshold addition network outputs the third fused feature vector.
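One common form of such a gated (threshold) addition network is sketched below: a sigmoid gate computed from both vectors weights the content feature vector, and its complement weights the global feature vector. The gate parameters are hypothetical; the source does not specify the exact network form:

```python
import numpy as np

def gated_fusion(c, g, Wg, bg):
    """Threshold (gated) addition network sketch for steps 402-403: a sigmoid
    gate computed from the content feature vector c and the global feature
    vector g gives the per-dimension weight of c; the complement weights g.
    Wg and bg are hypothetical learned gate parameters."""
    gate = 1.0 / (1.0 + np.exp(-(Wg @ np.concatenate([c, g]) + bg)))
    return gate * c + (1.0 - gate) * g   # third fused feature vector

# Toy usage with 8-dimensional content and global feature vectors.
rng = np.random.default_rng(3)
c, g = rng.normal(size=8), rng.normal(size=8)
e3 = gated_fusion(c, g, rng.normal(size=(8, 16)), rng.normal(size=8))
```

Because the gate lies in (0, 1), each dimension of the output is a convex combination of the corresponding dimensions of the two inputs.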
• Step 404: based on the user feature vector and the third fused feature vector, predict the sample user's degree of preference for each sample image through the prediction model, where the input of the prediction model is determined based on the user feature vector and the third fused feature vector, and the user feature vector is used to characterize the user feature data of the sample user.
• the user feature vector and the third fused feature vector can be directly input into the prediction model, so as to predict the degree of preference for each sample image.
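A minimal sketch of such a prediction model is given below, assuming (as an illustration, since the source does not fix the architecture) a one-hidden-layer MLP over the concatenated vectors with a sigmoid output:

```python
import numpy as np

def predict_preference(u, e3, W1, b1, w2, b2):
    """Minimal sketch of the prediction model: the user feature vector u and
    the third fused feature vector e3 are concatenated and passed through a
    one-hidden-layer MLP with a sigmoid output, giving a preference degree in
    (0, 1). Layer sizes and parameters are illustrative assumptions."""
    x = np.concatenate([u, e3])
    h = np.maximum(0.0, W1 @ x + b1)            # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2))) # sigmoid -> preference degree

# Toy usage with 8-dimensional user and fused feature vectors.
rng = np.random.default_rng(4)
u, e3 = rng.normal(size=8), rng.normal(size=8)
p = predict_preference(u, e3, rng.normal(size=(16, 16)), rng.normal(size=16),
                       rng.normal(size=16), 0.0)
```

During training, the output p would be compared against the sample label and the parameters updated by backpropagation.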
  • the embodiment of the present application provides an embodiment of a recommendation method, which can be applied to a server or a terminal. Specifically, this embodiment includes:
• Step 501: acquire a plurality of images, each image including a candidate interface and candidate content presented through the candidate interface.
• Step 502: acquire the image feature data of each image.
  • each image includes multiple regions, and correspondingly, the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to represent a region.
  • the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
• Step 503: acquire the user feature data of the target user.
• Step 504: based on the user feature data of the target user and the image feature data, predict the target user's degree of preference for each image through a prediction model, where the input of the prediction model is determined based on the user feature data and the image feature data.
  • When the image feature data of each image includes multiple local feature vectors, step 504 includes:
  • Step 601: for each image, obtain N word vectors based on the candidate content in the image, each word vector representing one word in the candidate content, where N is a positive integer;
  • Step 602: for each word vector, compute the respective attention weights of the multiple local feature vectors, based on the word vector and the multiple local feature vectors, through a model with an attention mechanism;
  • the attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector while reading the word represented by the word vector;
  • Step 603: based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector; each word vector corresponds to one first fused feature vector;
  • Step 604: based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector represents the user feature data of the target user.
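Steps 602–603 can be sketched as scaled dot-product attention of each word vector over the region vectors, followed by adding the attention-weighted region sum back to the word vector. This NumPy illustration uses assumed dimensions and omits any learned projections, so it is a sketch of the mechanism, not the patent's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # embedding dimension (illustrative)
N = 4  # number of word vectors
R = 5  # number of image regions

word_vecs = rng.normal(size=(N, d))   # one vector per word in the candidate content
local_vecs = rng.normal(size=(R, d))  # one local feature vector per region

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_word_with_regions(word_vec, regions):
    """Steps 602-603: score each region against one word vector, normalize
    the scores into attention weights over the regions, then add the
    attention-weighted sum of region vectors to the word vector."""
    scores = regions @ word_vec / np.sqrt(d)  # scaled dot-product scores
    weights = softmax(scores)                 # attention weights (sum to 1)
    return word_vec + weights @ regions       # one first fused feature vector

# One first fused feature vector per word vector, as step 603 requires.
first_fused = np.stack([fuse_word_with_regions(w, local_vecs) for w in word_vecs])
```

Each row of `first_fused` mixes one word with the regions that word attends to, which is the per-word "visual impression" signal fed forward to step 604.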
  • Step 604 includes:
  • Step 701: for each image, process the N first fused feature vectors corresponding to the N word vectors through a model with a self-attention mechanism to obtain N semantically enhanced feature vectors; each first fused feature vector corresponds to one semantically enhanced feature vector;
  • Step 702: based on the user feature vector and the N semantically enhanced feature vectors, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N semantically enhanced feature vectors.
  • Step 702 includes:
  • the N semantically enhanced feature vectors are fused through a model with an additive attention mechanism to obtain the second fused feature vector;
  • the input of the prediction model is determined based on the user feature vector and the second fused feature vector.
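Steps 701–702 — self-attention over the N first fused feature vectors, then additive-attention pooling into a single second fused feature vector — can be sketched as follows. Query/key/value projections are omitted for brevity and the query vector `v` is a hypothetical learned parameter; everything here is an illustrative assumption rather than the patent's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 4, 8
fused = rng.normal(size=(N, d))  # N first fused feature vectors (step 603 output)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Step 701: single-head self-attention (identity projections for brevity).
# Each output row mixes all N inputs, capturing their internal correlations.
attn = softmax(fused @ fused.T / np.sqrt(d), axis=-1)
semantic = attn @ fused  # N semantically enhanced feature vectors

# Step 702 (fusion half): additive attention pools the N vectors into one.
v = rng.normal(size=d)            # hypothetical learned query vector
scores = np.tanh(semantic) @ v    # one score per semantically enhanced vector
alpha = softmax(scores)           # attention weights over the N vectors
second_fused = alpha @ semantic   # the single second fused feature vector
```

`second_fused` is then paired with the user feature vector as the input of the prediction model.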
  • When the image feature data of each image includes a global feature vector, step 504 includes:
  • Step 801: for each image, obtain a content feature vector based on the candidate content in the image; the content feature vector is used to represent the candidate content;
  • Step 802: based on the content feature vector and the global feature vector, determine the weight of the content feature vector and the weight of the global feature vector;
  • Step 803: based on the weight of the content feature vector and the weight of the global feature vector, fuse the content feature vector and the global feature vector to obtain a third fused feature vector;
  • Step 804: based on the user feature vector and the third fused feature vector, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the third fused vector, and the user feature vector is used to represent the user feature data of the target user.
  • Steps 501 to 504 are similar to steps 101 to 104; for details, refer to the relevant descriptions of steps 101 to 104 above.
  • Step 505: based on the degree of preference, select candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, for recommendation.
  • Based on the degree of preference, only candidate content in the multiple images may be selected for recommendation, only candidate interfaces in the multiple images may be selected for recommendation, or candidate content and a candidate interface may be selected from the multiple images at the same time for recommendation, as described in detail below.
  • User logs are used to obtain the user click history; news material and the news interface are used to obtain the news visual impressions.
  • These inputs are processed by the data preprocessing module, the local impression module, the global impression module, and the model prediction module.
  • The degree of preference here specifically refers to the user's degree of preference for the news content in the image (i.e. the candidate content); finally, the multiple images are sorted from high to low by degree of preference, and the news content of the top M images is selected and recommended to target users.
  • The data preprocessing module is used to perform step 502 and step 503; the local impression module is used to perform the fusion operations in step 603, step 701, and step 702 to obtain the second fused feature vector; the global impression module is used to perform step 802 and step 803; and the model prediction module is used to perform the prediction operation in step 702 and the prediction operation in step 804.
  • The current user's user-side features (that is, the user feature data) are acquired.
  • News materials and news interfaces are used to obtain multiple candidate news-interface combinations (that is, the multiple images mentioned above).
  • These are processed by the data preprocessing module, local impression module, global impression module, model prediction module, and interface generation module to obtain the user's degree of preference; here the degree of preference specifically refers to the user's degree of preference for the user interface in the image (i.e. the candidate interface). Finally, the multiple images are sorted from high to low by degree of preference, the user interface in the image with the highest degree of preference (i.e. the best user interface) is selected, and the best user-interface configuration is generated; after that, the best user interface can be displayed based on this configuration, and various content can be recommended to the current user with the best user interface.
  • The data preprocessing module is used to perform step 502 and step 503; the local impression module is used to perform the fusion operations in step 603, step 701, and step 702 to obtain the second fused feature vector; the global impression module is used to perform step 802 and step 803; the model prediction module is used to perform the prediction operation in step 702 and the prediction operation in step 804; and the interface generation module is used to generate the best user interface according to the result predicted by the model prediction module.
  • Step 505 includes:
  • based on the degree of preference, a candidate interface is selected as the target candidate interface from the candidate interfaces of the images containing the target candidate content, so as to recommend the target candidate content through the target candidate interface.
  • Various kinds of candidate content may be selected and recommended to the target user, and the target candidate content is one of the selected kinds of candidate content.
  • Suppose the number of images is 4: the first image contains candidate content A and candidate interface A, the second image contains candidate content A and candidate interface B, the third image contains candidate content B and candidate interface A, and the fourth image contains candidate content B and candidate interface B. The target user's degree of preference for these four images, from high to low, is: the first image, the second image, the fourth image, the third image.
  • If candidate content A is selected as the target candidate content, the target candidate interface is selected from the candidate interfaces of the first image and the second image;
  • since the target user's degree of preference for the first image is higher than that for the second image, candidate interface A in the first image is selected as the target candidate interface, and candidate content A is then recommended to the target user through candidate interface A.
  • If candidate content B is selected as the target candidate content, the target candidate interface is selected from the candidate interfaces of the fourth image and the third image; and since the target user's degree of preference for the fourth image is higher than that for the third image, candidate interface B in the fourth image is selected as the target candidate interface, and candidate content B is then recommended to the target user through candidate interface B.
  • Depending on which candidate content is selected as the target candidate content, the resulting target candidate interface may differ.
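The selection logic of step 505, as applied to the four-image example above, can be sketched as plain Python. The numeric scores are made-up illustrative values consistent with the stated preference order (first > second > fourth > third); only the two-stage selection is taken from the text.

```python
# Each image pairs one kind of candidate content with one candidate interface;
# "pref" is the predicted degree of preference (higher = more preferred).
images = [
    {"content": "A", "interface": "A", "pref": 0.9},  # first image
    {"content": "A", "interface": "B", "pref": 0.7},  # second image
    {"content": "B", "interface": "B", "pref": 0.5},  # fourth image
    {"content": "B", "interface": "A", "pref": 0.2},  # third image
]

def select(images):
    # Stage 1: the content of the highest-scoring image is the target content.
    best = max(images, key=lambda im: im["pref"])
    target_content = best["content"]
    # Stage 2: among images carrying that content, pick the best interface.
    target_interface = max(
        (im for im in images if im["content"] == target_content),
        key=lambda im: im["pref"],
    )["interface"]
    return target_content, target_interface

target_content, target_interface = select(images)
```

With the scores above, candidate content A is selected and is recommended through candidate interface A, matching the worked example in the text.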
  • Step 506: send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
  • When the above method is executed by a server, the server sends the metadata of the target candidate interface and the target candidate content to the terminal device; correspondingly, the terminal device receives the metadata and the target candidate content, then displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
  • An embodiment of the present application provides a recommendation apparatus, including: a first image acquisition unit 601, configured to acquire multiple images, each of which contains a candidate interface and a kind of candidate content presented through the candidate interface; a first feature data acquisition unit 602, configured to acquire the image feature data of each image; a first prediction unit 603, configured to predict, based on the target user's user feature data and the image feature data and through a prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature data and the image feature data; and a recommendation unit 604, configured to select, based on the degree of preference, candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, for recommendation.
  • each image includes multiple regions; the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
  • The first prediction unit 603 is configured to: for each image, obtain N word vectors based on the candidate content in the image, each word vector representing one word in the candidate content, where N is a positive integer; for each word vector, compute the respective attention weights of the multiple local feature vectors, based on the word vector and the multiple local feature vectors, through a model with an attention mechanism, the attention weight indicating the degree to which the target user pays attention to the region represented by a local feature vector while reading the word represented by the word vector; based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector, each word vector corresponding to one first fused feature vector; and predict, based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors and through the prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the N first fused feature vectors, the user feature vector representing the user feature data of the target user.
  • The first prediction unit 603 is configured to: for each image, process the N first fused feature vectors corresponding to the N word vectors through a model with a self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector; and predict, based on the user feature vector and the N semantically enhanced feature vectors and through the prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the N semantically enhanced feature vectors.
  • The first prediction unit 603 is configured to predict the target user's degree of preference for each image based on the user feature vector and the N semantically enhanced feature vectors through the prediction model, including: for each image, fusing the N semantically enhanced feature vectors through a model with an additive attention mechanism to obtain a second fused feature vector; and predicting, based on the user feature vector and the second fused feature vector and through the prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the second fused feature vector.
  • the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
  • The first prediction unit 603 is configured to: for each image, obtain a content feature vector based on the candidate content in the image, the content feature vector being used to represent the candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fuse the content feature vector and the global feature vector based on their weights to obtain a third fused feature vector; and predict, based on the user feature vector and the third fused feature vector and through the prediction model, the target user's degree of preference for each image.
  • The input of the prediction model is determined based on the user feature vector and the third fused vector, and the user feature vector is used to represent the user feature data of the target user.
  • The recommendation unit 604 is configured to: based on the degree of preference, select a kind of candidate content from the candidate content contained in the multiple images as the target candidate content; and, based on the degree of preference, select a candidate interface from the candidate interfaces of the images containing the target candidate content as the target candidate interface, so as to recommend the target candidate content through the target candidate interface.
  • The apparatus further includes a sending unit 605, configured to send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
  • An embodiment of the present application provides a training apparatus, including: a second image acquisition unit 701, configured to acquire a plurality of sample images, each of which contains a sample candidate interface and a kind of sample candidate content presented through the sample candidate interface; a second feature data acquisition unit 702, configured to acquire the image feature data of each sample image; a second prediction unit 703, configured to predict, based on the sample user's user feature data and the image feature data and through the prediction model, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature data and the image feature data; and an adjustment unit 704, configured to adjust the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content.
  • each sample image includes multiple regions; image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
  • The second prediction unit 703 is configured to: for each sample image, obtain N word vectors based on the sample candidate content in the sample image, each word vector representing one word in the sample candidate content, where N is a positive integer; for each word vector, compute the respective attention weights of the multiple local feature vectors, based on the word vector and the multiple local feature vectors, through a model with an attention mechanism, the attention weight indicating the degree to which the sample user pays attention to the region represented by a local feature vector while reading the word represented by the word vector; based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector, each word vector corresponding to one first fused feature vector; and predict, based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors and through the prediction model, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the N first fused feature vectors.
  • The second prediction unit 703 is configured to: for each sample image, process the N first fused feature vectors corresponding to the N word vectors through a model with a self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector; and predict, based on the user feature vector and the N semantically enhanced feature vectors and through the prediction model, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the N semantically enhanced feature vectors.
  • The second prediction unit 703 is configured to: for each sample image, fuse the N semantically enhanced feature vectors through a model with an additive attention mechanism to obtain a second fused feature vector; and predict, based on the user feature vector and the second fused feature vector and through the prediction model, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the second fused feature vector.
  • the image feature data of each sample image includes a global feature vector, and the global feature vector is used to characterize the sample image.
  • The second prediction unit 703 is configured to: for each sample image, obtain a content feature vector based on the sample candidate content in the sample image, the content feature vector being used to represent the sample candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fuse the content feature vector and the global feature vector based on their weights to obtain a third fused feature vector; and predict, based on the user feature vector and the third fused feature vector and through the prediction model, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the third fused vector, the user feature vector being used to represent the user feature data of the sample user.
  • the embodiment of the present application also provides an embodiment of a computer device.
  • the computer device may be a terminal or a server.
  • the computer device may be used as a training device.
  • FIG. 19 is a schematic structural diagram of a computer device provided by an embodiment of the present application, which is used to realize the function of the recommendation device in the embodiment corresponding to FIG. 17 or the function of the training device in the embodiment corresponding to FIG. 18.
  • The computer device 1800 is implemented by one or more servers, and may vary considerably with configuration or performance; it may include one or more central processing units (CPU) 1822 (for example, one or more processors), memory 1832, and one or more storage media 1830 (such as one or more mass storage devices) for storing application programs 1842 or data 1844.
  • the memory 1832 and the storage medium 1830 may be temporary storage or persistent storage.
  • the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device. Furthermore, the central processing unit 1822 may be configured to communicate with the storage medium 1830 , and execute a series of instruction operations in the storage medium 1830 on the computer device 1800 .
  • The computer device 1800 can also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input/output interfaces 1858, and/or one or more operating systems 1841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • The central processing unit 1822 may be used to execute the recommendation method performed by the recommendation apparatus in the embodiment corresponding to FIG. 17 .
  • the central processing unit 1822 can be used for:
  • each image contains a candidate interface and a candidate content presented through the candidate interface
  • the input of the prediction model is determined based on the user characteristic data and image characteristic data;
  • the candidate content and/or the candidate interface are selected from the candidate interfaces and candidate contents included in the multiple images for recommendation.
  • the central processing unit 1822 may be used to execute the model training method executed by the training device in the embodiment corresponding to FIG. 18 .
  • the central processing unit 1822 can be used for:
  • each sample image includes a sample candidate interface and a sample candidate content presented through the sample candidate interface;
  • the input of the prediction model is determined based on the user characteristic data and image characteristic data;
  • the prediction model is adjusted.
  • An embodiment of the present application also provides a chip, including one or more processors. Some or all of the processors are used to read and execute the computer program stored in the memory, so as to execute the methods of the foregoing embodiments.
  • Optionally, the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip also includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • Some of the one or more processors may implement some of the steps in the above method through dedicated hardware; for example, processing related to the neural network model may be performed by a dedicated neural network processor or graphics processor.
  • the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
  • An embodiment of the present application also provides a computer storage medium for storing the computer software instructions used by the above-mentioned computer device, including a program designed for the computer device.
  • the computer equipment may be the recommending device in the embodiment corresponding to FIG. 17 or the training device in the embodiment corresponding to FIG. 18 .
  • the embodiment of the present application also provides a computer program product, the computer program product includes computer software instructions, and the computer software instructions can be loaded by a processor to implement the procedures in the methods shown in the foregoing embodiments.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application disclose a recommendation method, a training method, an apparatus, a device, and a recommendation system, which exploit the influence of the news interface on the user to increase the user's click-through rate on news. The method in the embodiments of this application includes: acquiring multiple images, each image containing one candidate interface and one kind of candidate content presented through the candidate interface; acquiring the image feature data of each image; determining the input of a prediction model based on the target user's user feature data and the image feature data, and then predicting, through the prediction model, the target user's degree of preference for each image; and finally, based on the degree of preference, selecting candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, and then making recommendations to the user through the selected candidate content or candidate interface.

Description

A recommendation method, training method, apparatus, device, and recommendation system
This application claims priority to Chinese patent application No. 202110963660.X, filed with the China National Intellectual Property Administration on August 20, 2021 and entitled "Recommendation method, training method, apparatus, device, and recommendation system", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of recommendation technology, and in particular to a recommendation method, training method, apparatus, device, and recommendation system.
Background
Today, news mobile applications have changed the traditional way people read news. The major news platforms generate massive amounts of news around the clock. As a result, users of these news applications are recommended all kinds of news content; if the recommended news content does not interest the user, the click-through rate of the news drops. To improve the click-through rate of news, personalized news recommendation systems have emerged. Such a system mines the user's points of interest through machine learning in order to recommend news content the user finds more interesting, thereby improving the click-through rate of news.
However, current news recommendation systems are only used to mine the news content that interests users, ignoring the influence on users of the news interface used to recommend the news content, so the click-through rate of news cannot be improved further.
Summary
The embodiments of this application provide a recommendation method, training method, apparatus, device, and recommendation system, which exploit the influence of the news interface on the user to increase the user's click-through rate on news.
In a first aspect, an embodiment of this application provides a recommendation method, including: acquiring multiple images, each image containing one candidate interface and one kind of candidate content presented through the candidate interface, where an image can be understood as an image in which candidate content is presented through a candidate interface; the candidate content may be not only news content but also other content such as short videos or product information; correspondingly, the candidate interface may be not only a news interface but also an interface for presenting short videos or an interface for presenting product information; acquiring the image feature data of each image, where the image feature data may include global visual impression feature data and/or local visual impression feature data, the global visual impression feature data being feature data extracted from the whole image and the local visual impression feature data being feature data extracted from local regions of the image; predicting, based on the target user's user feature data and the image feature data and through a prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature data and the image feature data, where the user's feature data includes the user's age information, the city where the user is located, and the user's news-related historical data; the user's news-related historical data may specifically include the types of news the user has browsed, the types of news the user has clicked, the times at which the user clicked news, the places where the user was when clicking news, and so on; and selecting, based on the degree of preference, candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, for recommendation; specifically, based on the degree of preference, only candidate content in the multiple images may be selected for recommendation, or only candidate interfaces in the multiple images may be selected for recommendation, or candidate content and a candidate interface may be selected from the multiple images at the same time for recommendation.
Since an image contains both the candidate content and the candidate interface, a prediction model trained on the image feature data of such images can accurately predict the user's degree of preference for the image while taking into account the influence of both the candidate content and the candidate interface on the user. This makes it possible to recommend content of interest to the user through a candidate interface the user is interested in, thereby improving the user's click-through rate on the recommended content.
As an implementable manner, each image includes multiple regions. Specifically, the image can be divided in various ways to obtain the regions; for example, as explained above, a news item may include parts such as the title of the news, the author of the news, and the category of the news, and may also include an accompanying picture; therefore, the region coordinates of these parts can be obtained according to the news layout, and the image can then be divided into multiple regions according to the region coordinates. The image feature data of each image includes multiple local feature vectors, each local feature vector representing one region.
In this implementation, the image is divided into multiple regions, and the local feature vectors representing each region are used as the image feature data of the image, so that the local features of the image can be extracted well, improving the accuracy of the prediction of the user's degree of preference for the image.
As an implementable manner, predicting the target user's degree of preference for each image based on the target user's user feature data and the image feature data and through the prediction model includes: for each image, obtaining N word vectors based on the candidate content in the image, each word vector representing one word in the candidate content, where N is a positive integer; the candidate content includes N words, and for each word a word vector can be generated with a text encoder; similar to the image encoder, the text encoder can be understood as a model obtained through pre-training, and there are many possible kinds of such models, for example the BERT model; since the title of a news item reflects the main information of the news content well, when the candidate content is news content, the title of the news content can be segmented into N words, and N word vectors representing the N words can then be obtained with the text encoder; for each word vector, computing the respective attention weights of the multiple local feature vectors, based on the word vector and the multiple local feature vectors, through a model with an attention mechanism, the attention weight indicating the degree to which the target user pays attention to the region represented by a local feature vector while reading the word represented by the word vector; the attention mechanism is a mechanism that dynamically controls the degree of attention paid to each part, or to a certain part, of a neural network model by computing the attention weight of each part and merging the weights into an attention vector; fusing each word vector with the multiple local feature vectors based on their respective attention weights to obtain a first fused feature vector, one first fused feature vector per word vector; specifically, the multiple local feature vectors can be weighted by their respective attention weights, and the weighted result can then be added to the word vector to obtain the first fused feature vector; and predicting, based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors and through the prediction model, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the N first fused feature vectors, the user feature vector representing the user feature data of the target user.
In this implementation, the respective attention weights of the multiple local feature vectors are computed through a model with an attention mechanism. Since the attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector while reading the word represented by each word vector, the first fused feature vector obtained by fusing each word vector with the multiple local feature vectors based on these attention weights can reflect the impression that the words and regions of the image leave on the user. Predicting the degree of preference with the first fused feature vectors therefore improves the accuracy of the predicted degree of preference for the image.
作为一种可实现的方式,基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度包括:对于每张图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量,其中,自注意力机制(self-attention mechanism)是对注意力机制改进得到的一种机制,其减少了对外部信息的依赖,更擅长捕捉数据或特征的内部相关性;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
语义增强特征向量是通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理得到的,由于自注意力机制更擅长捕捉数据或特征的内部相关性,所以得到的语义增强特征向量能够反映出第一融合特征向量间的相关性,从而可以更准确地反映出图像给用户留下的印象特征信息;这样,利用语义增强特征向量预测偏好程度,能够提高用户对图像的偏好程度的准确率。
作为一种可实现的方式,基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度包括:对于每张图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
通过加法注意力机制的模型实现了对N个语义增强特征向量的融合,利用融合后的第二融合特征向量预测偏好程度,能够提高用户对图像的偏好程度预测的准确率。
作为一种可实现的方式,每张图像的图像特征数据包括全局特征向量,全局特征向量用于表征图像;此时,该图像特征数据也可以称为全局视觉印象特征数据;获取全局特征向量的方法可以具体包括:将图像输入到图片表征器中,以通过图片表征器将图像转化为全局特征向量。
在该实现方式中,将表征图像的全局特征向量作为图像的图像特征数据,从而可以较好地提取图像的全局特征,以提高用户对图像的偏好程度的预测的准确率。
作为一种可实现的方式,基于目标用户的用户特征数据和图像特征数据,并通过预测模型预测目标用户对每张图像的偏好程度包括:对于每张图像,基于每张图像中的候选内容获取内容特征向量,内容特征向量用于表征候选内容;由于新闻内容的标题能够较好地体现新闻内容的主要信息;因此,当候选内容为新闻内容时,可以将新闻内容的标题转化成标题特征向量;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;由于用户可能对视觉印象信息和文本语义具有不同的敏感度,因此作为一种可实现的方式,可以采用通过门限加法网络自适应地控制内容特征向量和全局特征向量各自的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征目标用户的用户特征数据。
基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重,然后基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合得到的第三融合特征向量,可以从全局的角度表征图像给用户留下的印象特征信息;因此,利用第三融合特征向量预测目标用户对每张图像的偏好程度,可以提高用户对图像的偏好程度预测的准确率。
作为一种可实现的方式,基于偏好程度从多张图像包含的候选界面和候选内容中,选择候选内容和/或候选界面,以进行推荐包括:基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,以通过目标候选界面推荐目标候选内容。
基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,并通过目标候选界面推荐目标候选内容,实现了通过用户偏好的候选界面为用户推荐用户偏好的候选内容,从而可以提高用户点击推荐内容的概率。
作为一种可实现的方式,在基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面之后,方法还包括:向终端设备发送目标候选界面的元数据和目标候选内容,以使得终端设备基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容;其中,该元数据包含目标候选界面的各种配置数据。
向终端设备发送目标候选界面的元数据和目标候选内容,使得终端设备基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容,从而可以提高用户点击推荐内容的概率。
第二方面,本申请实施例提供了一种训练方法,包括:获取多个样本图像,每个样本图像包含一个样本候选界面和通过样本候选界面呈现的一种样本候选内容;获取每个样本图像的图像特征数据;基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;基于偏好程度和样本用户对样本候选内容的历史点击数据,对预测模型进行调整,其中,样本用户对样本候选内容的历史点击数据可以包括,样本用户是否点击样本候选内容,以及样本用户点击样本候选内容次数;具体地,可以调整预测模型的权重,也可以调整预测模型的结构。
由于样本图像既包含了样本候选内容,又包含了样本候选界面,所以基于样本图像的图像特征数据训练得到的预测模型,能够在同时考虑候选内容和候选界面对用户的影响的情况下,准确地输出用户对图像的偏好程度,从而有利于通过用户感兴趣的界面为用户推荐感兴趣的内容,以提高用户对推荐内容的点击率。
作为一种可实现的方式,每个样本图像包括多个区域;每个样本图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
作为一种可实现的方式,基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度包括:对于每个样本图像,基于每个样本图像中的样本候选内容获取N个词向量,每个词向量表征样本候选内容中的一个词语,其中,N为正整数;对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示样本用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征样本用户的用户特征数据。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
作为一种可实现的方式,基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度包括:对于每个样本图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
作为一种可实现的方式,基于用户特征向量和N个语义增强特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度包括:对于每个样本图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
作为一种可实现的方式,每个样本图像的图像特征数据包括全局特征向量,全局特征向量用于表征样本图像。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
作为一种可实现的方式,基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度包括:对于每个样本图像,基于每个样本图像中的样本候选内容获取内容特征向量,内容特征向量用于表征样本候选内容;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征样本用户的用户特征数据。
其中,以上的相关说明以及技术效果请参考本申请实施例第一方面的描述。
第三方面,本申请实施例提供了一种推荐装置,包括:第一图像获取单元,用于获取多张图像,每张图像包含一个候选界面和通过候选界面呈现的一种候选内容;第一特征数据获取单元,用于获取每张图像的图像特征数据;第一预测单元,用于基于目标用户的用户特征数据和图像特征数据,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;推荐单元,用于基于偏好程度从多张图像包含的候选界面和候选内容中,选择候选内容和/或候选界面,以进行推荐。
作为一种可实现的方式,每张图像包括多个区域;每张图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
作为一种可实现的方式,第一预测单元,用于对于每张图像,基于每张图像中的候选内容获取N个词向量,每个词向量表征候选内容中的一个词语,其中,N为正整数;对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示目标用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,第一预测单元,用于对于每张图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
作为一种可实现的方式,第一预测单元,用于基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度包括:对于每张图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
作为一种可实现的方式,每张图像的图像特征数据包括全局特征向量,全局特征向量用于表征图像。
作为一种可实现的方式,第一预测单元,用于对于每张图像,基于每张图像中的候选内容获取内容特征向量,内容特征向量用于表征候选内容;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,推荐单元,用于基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,以通过目标候选界面推荐目标候选内容。
作为一种可实现的方式,装置还包括发送单元,用于向终端设备发送目标候选界面的元数据和目标候选内容,以使得终端设备基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容。
其中,以上各单元的具体实现、相关说明以及技术效果请参考本申请实施例第一方面的描述。
第四方面,本申请实施例提供了一种训练装置,包括:第二图像获取单元,用于获取多个样本图像,每个样本图像包含一个样本候选界面和通过样本候选界面呈现的一种样本候选内容;第二特征数据获取单元,用于获取每个样本图像的图像特征数据;第二预测单元,用于基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;调整单元,用于基于偏好程度和样本用户对样本候选内容的历史点击数据,对预测模型进行调整。
作为一种可实现的方式,每个样本图像包括多个区域;每个样本图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
作为一种可实现的方式,第二预测单元,用于对于每个样本图像,基于每个样本图像中的样本候选内容获取N个词向量,每个词向量表征样本候选内容中的一个词语,其中,N为正整数;对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示样本用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征样本用户的用户特征数据。
作为一种可实现的方式,第二预测单元,用于对于每个样本图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
作为一种可实现的方式,第二预测单元,用于对于每个样本图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
作为一种可实现的方式,每个样本图像的图像特征数据包括全局特征向量,全局特征向量用于表征样本图像。
作为一种可实现的方式,第二预测单元,用于对于每个样本图像,基于每个样本图像中的样本候选内容获取内容特征向量,内容特征向量用于表征样本候选内容;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征样本用户的用户特征数据。
其中,以上各单元的具体实现、相关说明以及技术效果请参考本申请实施例第二方面的描述。
第五方面,本申请实施例提供了一种计算机设备,包括:一个或多个处理器和存储器;其中,存储器中存储有计算机可读指令;一个或多个处理器读取计算机可读指令,以使计算机设备实现如第一方面任一实现方式的方法。
第六方面,本申请实施例提供了一种训练设备,包括:一个或多个处理器和存储器;其中,存储器中存储有计算机可读指令;一个或多个处理器读取计算机可读指令,以使训练设备实现如第二方面任一实现方式的方法。
第七方面,本申请实施例提供了一种计算机可读存储介质,包括计算机可读指令,当计算机可读指令在计算机上运行时,使得计算机执行如第一方面或第二方面任一实现方式的方法。
第八方面,本申请实施例提供了一种芯片,包括一个或多个处理器。处理器中的部分或全部用于读取并执行存储器中存储的计算机程序,以执行上述第一方面或第二方面任意可能的实现方式中的方法。
可选地,该芯片还包括存储器,该存储器与该处理器通过电路或电线连接。进一步可选地,该芯片还包括通信接口,处理器与该通信接口连接。通信接口用于接收需要处理的数据和/或信息,处理器从该通信接口获取该数据和/或信息,并对该数据和/或信息进行处理,并通过该通信接口输出处理结果。该通信接口可以是输入输出接口。
在一些实现方式中,一个或多个处理器中还可以有部分处理器是通过专用硬件的方式来实现以上方法中的部分步骤,例如涉及神经网络模型的处理可以由专用神经网络处理器或图形处理器来实现。
本申请实施例提供的方法可以由一个芯片实现,也可以由多个芯片协同实现。
第九方面,本申请实施例提供了一种计算机程序产品,该计算机程序产品包括计算机软件指令,该计算机软件指令可通过处理器进行加载来实现上述第一方面或第二方面中任意一种实现方式的方法。
第十方面,本申请实施例提供了一种推荐系统,包括终端设备和服务器;
服务器用于执行如第一方面中任意一种实现方式的方法;
终端设备用于接收来自服务器的目标候选界面的元数据和目标候选内容;
基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容。
附图说明
图1为本申请实施例提供的新闻推荐系统的架构示意图;
图2为新闻的一个实施例示意图;
图3为新闻推荐系统的工作过程的示意图;
图4为本申请实施例提供了一种训练方法的一个实施例的示意图;
图5为本申请实施例中样本图像的区域示意图;
图6为本申请实施例中预测样本用户对每个样本图像的偏好程度的第一实施例示意图;
图7为本申请实施例中预测样本用户对每个样本图像的偏好程度的第二实施例示意图;
图8为本申请实施例中第二融合特征向量的过程的示意图;
图9为本申请实施例中预测样本用户对每个样本图像的偏好程度的第三实施例示意图;
图10为本申请实施例中得到第三融合特征向量的过程的示意图;
图11为本申请实施例提供了一种推荐方法的一个实施例的示意图;
图12为本申请实施例中预测目标用户对每个图像的偏好程度的第一实施例示意图;
图13为本申请实施例中预测目标用户对每个图像的偏好程度的第二实施例示意图;
图14为本申请实施例中预测目标用户对每个图像的偏好程度的第三实施例示意图;
图15为本申请实施例中预测用户对新闻的偏好程度的实施例示意图;
图16为本申请实施例中获取最佳用户界面配置的实施例示意图;
图17为本申请实施例提供了一种训练装置的一个实施例的示意图;
图18为本申请实施例提供了一种推荐装置的一个实施例的示意图;
图19为本申请实施例提供的计算机设备的实施例示意图。
具体实施方式
下面结合附图,对本申请的实施例进行描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或模块的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或模块,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或模块。在本申请中出现的对步骤进行的命名或者编号,并不意味着必须按照命名或者编号所指示的时间/逻辑先后顺序执行方法流程中的步骤,已经命名或者编号的流程步骤可以根据要实现的技术目的变更执行次序,只要能达到相同或者相类似的技术效果即可。
另外,在本发明的描述中,除非另有说明,“多个”的含义是两个或两个以上。本申请中的术语“和/或”或字符“/”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,或A/B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
本申请实施例可以应用于图1所示的新闻推荐系统中,如图1所示,该新闻推荐系统包括终端设备和服务器,终端设备与服务器通信连接。
本申请实施例对终端设备的种类不做具体限定,例如,终端设备可以包括手机、平板电脑、台式电脑、车载设备等任意可以部署新闻应用的设备;下文将终端设备简称为终端。
服务器可以是普通服务器,也可以是云服务器。
如图1所示,终端中部署有新闻应用,服务器中部署有推荐服务。
当用户访问终端中的新闻应用时,终端会向服务器发送请求,以请求服务器中的推荐服务;服务器在接收到请求后,会启动推荐服务,然后从大量的新闻内容选择用户感兴趣的新闻内容作为推荐的新闻内容;然后服务器将推荐的新闻内容发送至终端,然后由终端将推荐的新闻内容展示给用户。
本申请实施例对新闻内容不做具体限定;例如,如图2所示,新闻内容可以包括新闻的标题、新闻的作者和新闻的类别,除此之外,新闻内容还可以包括新闻的正文。
然而,对于一则新闻来说,不仅新闻内容会影响新闻的点击率,呈现新闻内容的新闻界面也会影响新闻的点击率。具体地,新闻界面中图文的布局(包括标题的位置、标题与配图的相对位置)、是否配图及配图的大小、配图的颜色、配图的清晰度、字体、字体的大小都会给用户留下不同的视觉印象,影响用户的浏览体验,从而影响用户对新闻的点击行为。
本申请实施例将新闻界面中给用户留下视觉印象的信息称为视觉印象信息,该视觉印象信息可以理解为用户视角下新闻界面展示的新闻多模态信息,具体可以包括前述的图文的布局、是否配图及配图的大小、配图的颜色、配图的清晰度、字体、字体的大小等信息。
以图2为例,图2中的第一则新闻与第二则新闻相比,图文的布局、配图的大小都不同;对于喜欢较大配图的用户来说,点击第一则新闻的概率较大。
基于此,为了提高新闻的点击率,本申请实施例提供了一种推荐方法,该方法是获取多张图像,每张图像包含一个候选界面和一种候选内容,然后根据目标用户的用户特征数据和图像的图像特征数据,并利用预测模型预测目标用户对每张图像的偏好程度,最终根据偏好程度从多张图像中选择候选内容和/或候选界面,以进行推荐;在上述推荐方法中,候选界面可以为新闻界面,候选内容可以为新闻内容,这样,该推荐方法便可以实现新闻的推荐,并且,在使用该推荐方法推荐新闻的过程中,不仅考虑了新闻内容对目标用户的影响,还考虑了新闻界面对目标用户的影响,从而可以向目标用户推荐感兴趣的新闻(包括新闻内容和新闻界面),以进一步提高新闻的点击率。
需要说明的是,候选内容不仅可以为新闻内容,也可以为短视频、商品信息等其他内容;相应地,候选界面不仅可以为新闻界面,也可以为用于呈现短视频的界面、用于呈现商品信息的界面。下文以候选内容为新闻内容、候选界面为新闻界面为例,对本申请实施例的提供的方法进行介绍。
因此,在图1所示的新闻推荐系统中,服务器还可以选择用户感兴趣的新闻界面,然后将新闻界面的元数据发送至终端,然后由终端基于元数据显示新闻界面,并通过新闻界面将推荐的新闻内容展示给用户。
基于前文说明可知,在新闻推荐过程中,需要用到预测模型,因此需要预先训练得到该预测模型。
具体地,图1所示的新闻推荐系统的工作过程可以如图3所示。
服务器从用户的行为日志中提取与新闻相关的数据(具体可以包括浏览新闻的数据或点击新闻的数据),利用与新闻相关的数据构建训练数据,再基于训练数据进行离线训练以得到预测模型;服务器在接收到对于推荐服务的请求后,通过预测模型进行在线预测,以得到用户对多个新闻图像的偏好程度,再根据该偏好程度选择新闻内容和新闻界面;最终由终端通过新闻界面向用户展示新闻内容。
下面先对预测模型的离线训练过程进行介绍。
如图4所示,本申请实施例提供了一种训练方法的一个实施例,该实施例通常应用于服务器,具体地,该实施例包括:
步骤101,获取多个样本图像,每个样本图像包含一个样本候选界面和通过样本候选界面呈现的一种样本候选内容。
样本图像可以理解为通过样本候选界面呈现样本候选内容的图像,其中,可参照前文中候选界面和候选内容的相关说明,对样本候选界面和样本候选内容进行理解。
多个样本图像的情况可以包括以下三种。
第一种情况为:多个样本图像包括一个样本候选界面和多种样本候选内容,即所有样本图像中的样本候选界面都相同。
第二种情况为,多个样本图像包括多个样本候选界面和一种样本候选内容,即所有样本图像中的样本候选内容都相同。
第三种情况为,多个样本图像包括多个样本候选界面和多种样本候选内容,此时,包含同一种样本候选内容的所有样本图像,可以包含多个样本候选界面;例如,样本图像为10000个,10000个样本图像包括100种样本候选内容,包含同一种样本候选内容的所有样本图像包含100个样本候选界面,即每种样本候选内容都可以通过100个样本候选界面呈现。
步骤102,获取每个样本图像的图像特征数据。
获取样本图像的图像特征数据的方法有多种,本申请实施例对此不做具体限定。
基于获取方法的不同,本申请实施例将图像特征数据大致分为两类,具体为全局视觉印象特征数据和局部视觉印象特征数据;需要说明的是,图像特征数据可以仅包括全局视觉印象特征数据,可以仅包括局部视觉印象特征数据,也可以同时包括全局视觉印象特征数据和局部视觉印象特征数据。
作为一种可实现的方式,每个样本图像包括多个区域,相应地,每个样本图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域;此时,该图像特征数据也可以称为局部视觉印象特征数据。
需要说明的是,可以通过多种方法对样本图像进行划分,从而得到多个区域;例如,基于前述说明可知,一条新闻可以包括新闻的标题、新闻的作者和新闻的类别等部分,除此之外,新闻还可以包括配图部分;因此,可以按照新闻排版板式获得上述各个部分的区域坐标,然后根据区域坐标将样本图像划分为多个区域。
例如,以图5为例,利用上述方法可以将图5中的样本图像划分为新闻的标题、新闻的类别和新闻的配图三个区域。
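上述按区域坐标划分图像的过程可以用如下代码示意(仅为示意:假设区域坐标以 (y0, y1, x0, x1) 的形式由排版板式给出,函数名与示例坐标均为本文假设的命名,并非真实新闻排版数据):

```python
import numpy as np

def split_regions(image, region_coords):
    """按区域坐标 (y0, y1, x0, x1) 将图像数组划分为多个区域子图。"""
    regions = []
    for (y0, y1, x0, x1) in region_coords:
        # numpy 数组切片得到该区域的像素块
        regions.append(image[y0:y1, x0:x1])
    return regions

# 示例:一张 8x8 的灰度"样本图像",划分为标题区与配图区两个区域
image = np.arange(64).reshape(8, 8)
coords = [(0, 2, 0, 8),   # 标题区:上方两行
          (2, 8, 0, 8)]   # 配图区:其余部分
title, picture = split_regions(image, coords)
```

划分得到的每个区域子图,即可分别送入图片表征器转化为局部特征向量。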
获取局部特征向量的方法可以具体包括:将多个区域的图像分别输入到图片表征器中,以通过图片表征器将多个区域转化为多个局部特征向量;其中,图片表征器可以理解为通过预先训练得到的一个模型,该模型的种类可以有很多,例如,该模型可以为ResNet101。
作为一种可实现的方式,每个样本图像的图像特征数据包括全局特征向量,全局特征向量用于表征样本图像;此时,该图像特征数据也可以称为全局视觉印象特征数据。
获取全局特征向量的方法可以具体包括:将样本图像输入到图片表征器中,以通过图片表征器将样本图像转化为全局特征向量;由于前文对图片表征器进行了说明,故在此不做详述。
步骤103,获取样本用户的用户特征数据。
本申请实施例对样本用户的特征数据的种类不做具体限定,例如,样本用户的特征数据包括样本用户的年龄信息、样本用户所在的城市以及样本用户与新闻相关的历史数据;其中,样本用户与新闻相关的历史数据具体可以包括样本用户浏览的新闻的类型、样本用户点击新闻的类型、样本用户点击新闻的时间、样本用户点击新闻时的地点等。
样本用户与新闻相关的历史数据可以从样本用户的行为日志中获取。
步骤104,基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的。
需要说明的是,还可以将用户特征数据和图像特征数据,与特定环境信息(如时间、日期、是否周末、是否假期等)结合,并通过预测模型预测样本用户对每个样本图像的偏好程度。
对于一个样本图像来说,可以将样本用户特征数据和图像特征数据直接输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度;也可以先对图像特征数据进行处理,以得到中间特征数据,然后将样本用户特征数据和该中间特征数据输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度。
下文会对通过预测模型预测样本用户对每个样本图像的偏好程度的过程进行具体说明。
步骤105,基于偏好程度和样本用户对样本候选内容的历史点击数据,对预测模型进行调整。
其中,样本用户对样本候选内容的历史点击数据可以包括,样本用户是否点击样本候选内容,以及样本用户点击样本候选内容的次数。
具体地,可以根据样本用户对样本候选内容的历史点击数据,设定样本标签;例如,对于一张样本图像来说,若样本用户点击过样本图像中的样本候选内容,则可以将偏好程度的样本标签设置为1,若样本用户未点击过样本图像中的样本候选内容,则可以将偏好程度的样本标签设置为0。
再例如,对于一张样本图像来说,若样本用户点击样本图像中的样本候选内容的次数大于或等于第一阈值,则可以将偏好程度的样本标签设置为1;若样本用户点击样本图像中的样本候选内容的次数小于第一阈值,且大于或等于第二阈值,则可以将偏好程度的样本标签设置为0.5;若样本用户点击样本图像中的样本候选内容的次数小于第二阈值,或样本用户未点击过样本图像中的样本候选内容,则可以将偏好程度的样本标签设置为0。
基于此,根据预测模型输出的样本用户对样本图像的偏好程度,以及样本标签可以计算损失函数,通过损失函数的反向传播更新预测模型的权重,或调整预测模型的结构,以使得预测模型输出的偏好程度接近于样本标签。
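上述按点击次数设定样本标签并计算损失的过程可以写成如下示意代码(阈值 t1、t2 的取值以及采用二元交叉熵作为损失函数,均为本文假设的一种实现方式,并非对预测模型损失的限定):

```python
import math

def click_label(clicks, t1=5, t2=2):
    """按点击次数设定偏好程度的样本标签(阈值 t1、t2 为示例值)。"""
    if clicks >= t1:
        return 1.0
    if clicks >= t2:
        return 0.5
    return 0.0

def bce_loss(pred, label, eps=1e-7):
    """二元交叉熵损失:预测的偏好程度越接近样本标签,损失越小。"""
    pred = min(max(pred, eps), 1 - eps)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))
```

训练时即可通过该损失的反向传播更新预测模型的权重,使输出的偏好程度接近样本标签。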
在本申请实施例中,由于样本图像既包含了样本候选内容,又包含了样本候选界面,所以基于样本图像的图像特征数据训练得到的预测模型,能够在同时考虑候选内容和候选界面对用户的影响的情况下,准确地输出用户对图像的偏好程度,从而有利于通过用户感兴趣的界面为用户推荐感兴趣的内容,以提高用户对推荐内容的点击率。
下面对通过预测模型预测样本用户对每个样本图像的偏好程度的过程进行说明。
首先,介绍图像特征数据包括局部视觉印象特征数据的情况。
作为一种可实现的方式,如图6所示,步骤104包括:
步骤201,对于每个样本图像,基于每个样本图像中的样本候选内容获取N个词向量,每个词向量表征样本候选内容中的一个词语,其中,N为正整数。
样本候选内容包括N个词语,对应每个词语,可以利用文本表征器生成一个词向量;与图片表征器类似,文本表征器也可以理解为通过预先训练获取的模型,该模型的种类可以有多种,例如,该模型可以为Bert模型。
可以理解的是,当样本候选内容为新闻内容时,通常新闻内容的标题能够较好地体现新闻内容的主要信息;因此,当样本候选内容为新闻内容时,可以对新闻内容的标题进行分词处理,以得到N个词语,然后通过文本表征器获取表征N个词语的N个词向量。
步骤202,对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示样本用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度。
注意力机制为一种通过计算神经网络模型中的各个部分的注意力权重、并合并成注意力向量,从而在神经网络模型中动态控制对神经网络模型中各个部分或某一部分的关注度的机制。
注意力机制包括多种,一般情况下,注意力机制包括两种:一种是自上而下的有意识的注意力,称为聚焦式(focus)注意力。聚焦式注意力是指有预定目的、依赖任务的、主动有意识地聚焦于某一对象的注意力;另一种是自下而上的无意识的注意力,称为基于显著性(saliency-based)的注意力。
除此之外,注意力机制还包括以下几种变体:多头注意力(multi-head attention)机制、硬性注意力机制、键值对注意力机制和结构化注意力机制。
其中,多头注意力(multi-head attention)机制是利用多个查询,来平行地计算从输入信息中选取多个信息,每个注意力关注输入信息的不同部分。
需要说明的是,上述注意力机制以及注意力机制的变体在该实施例中都适用。
下面通过具体的示例对上述过程进行说明。
例如,采用 $o_j$ 表示第 $j$ 个局部特征向量,采用 $w_i$ 表示第 $i$ 个词向量;基于此,可以采用公式

$$\alpha_{i,j}=\frac{\exp\left(q_m(w_i)^{\top}k_m(o_j)\right)}{\sum_{k1}\exp\left(q_m(w_i)^{\top}k_m(o_{k1})\right)}$$

计算对于词向量 $w_i$,多个局部特征向量各自的注意力权重,其中,$\alpha_{i,j}$ 表示注意力权重,$q_m(\cdot)$ 和 $k_m(\cdot)$ 表示带有偏差项的线性变换,$k1$ 表示局部特征向量的数量编号(即第 $k1$ 个)。
以图5为例,图5中的样本图像被划分为新闻的标题、新闻的类别和新闻的配图三个区域,相应地,可以得到表征这三个区域的局部特征向量;以单词“states”为例,对于表征单词“states”的词向量,三个区域的局部特征向量的注意力权重分别表示样本用户在关注单词“states”时,关注三个区域的程度。
步骤203,基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量。
具体地,可以通过多个局部特征向量各自的注意力权重对多个局部特征向量进行加权处理,然后将加权处理的结果与词向量相加得到第一融合特征向量。
上述过程可以通过公式
Figure PCTCN2022105075-appb-000003
实现,其中,v m(·)表示带有偏差项的线性变换,
Figure PCTCN2022105075-appb-000004
表示第一融合特征向量。
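上述注意力权重与第一融合特征向量的计算,可用 numpy 概括如下(其中线性变换 $q_m$、$k_m$、$v_m$ 以随机矩阵示意,维度与数值均为假设,不代表训练后的真实参数):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                              # 特征维度(示例值)
w_i = rng.normal(size=d)           # 一个词向量 w_i
O = rng.normal(size=(3, d))        # 3 个局部特征向量 o_j
Qm, Km, Vm = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# 注意力权重:词向量与各局部特征向量经线性变换后的相似度,做 softmax 归一化
scores = (Qm @ w_i) @ (O @ Km.T).T
alpha = softmax(scores)

# 第一融合特征向量:加权后的局部特征与词向量相加
w_fused = w_i + alpha @ (O @ Vm.T)
```

alpha 中的每个分量即对应一个局部区域的注意力权重,各分量之和为 1。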
步骤204,基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征样本用户的用户特征数据。
例如,对于一个样本图像来说,可以将用户特征向量和N个第一融合特征向量直接输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度;也可以先对N个第一融合特征向量进行处理,以得到中间特征数据,然后将用户特征向量和该中间特征数据输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度。
作为一种可实现的方式,如图7所示,步骤204包括:
步骤301,对于每个样本图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量。
自注意力机制(self-attention mechanism)是对注意力机制改进得到的一种机制,其减少了对外部信息的依赖,更擅长捕捉数据或特征的内部相关性。
例如,在本申请实施例中,由于N个第一融合特征向量是由N个词向量得到的,而N个词向量所表征的词语来自于同一候选内容,所以通过自注意力机制能够更好地分析N个第一融合特征向量之间的相关性;对应地,注意力机制是用于捕捉数据外部的相关性,基于前文说明可知,注意力机制用于处理词向量和多个局部特征向量,相对于词向量所表征的词语来说,局部特征向量所表征的图像区域是外部的,所以本申请实施例通过注意力机制捕捉词向量表征的词语与局部特征向量所表征的图像区域之间的相关性。
其中,自注意力机制包括单头自注意力机制和多头自注意力机制。
可以理解的是,由于N个第一融合特征向量是由N个词向量得到的,而N个词向量之间存在语义关系,所以相应地,N个第一融合特征向量之间也存在语义关系;为此,在该实施例中,通过自注意力机制的模型对N个第一融合特征向量进行语义增强处理。
具体地,通过自注意力机制的模型对N个第一融合特征向量进行处理的过程可以包括:采用公式

$$\beta_{i,j}=\frac{\exp\left(q(\hat{w}_i)^{\top}k(\hat{w}_j)\right)}{\sum_{k2}\exp\left(q(\hat{w}_i)^{\top}k(\hat{w}_{k2})\right)}$$

$$g_i=\sum_{j}\beta_{i,j}\,\hat{w}_j$$

对N个第一融合特征向量进行处理,其中,$q(\cdot)$ 和 $k(\cdot)$ 表示线性变换,$\beta_{i,j}$ 表示第 $j$ 个第一融合特征向量 $\hat{w}_j$ 对第 $i$ 个第一融合特征向量 $\hat{w}_i$ 的语义增强的程度,$g_i$ 表示第 $i$ 个语义增强特征向量,$k2$ 表示第一融合特征向量的数量编号(即第 $k2$ 个)。
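对N个第一融合特征向量做语义增强的自注意力过程,可用如下 numpy 代码示意(线性变换 $q$、$k$ 同样以随机矩阵代替,仅为概念演示):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 4
W = rng.normal(size=(N, d))        # N 个第一融合特征向量
Q, K = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# beta[i, j]:第 j 个向量对第 i 个向量的语义增强程度,按行归一化
beta = softmax((W @ Q.T) @ (W @ K.T).T, axis=-1)

# 语义增强特征向量:对全部第一融合特征向量按权重加权求和
G = beta @ W
```

由于权重矩阵 beta 是在 N 个第一融合特征向量内部计算的,因此得到的 G 反映了这些向量之间的内部相关性。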
步骤302,基于用户特征向量和N个语义增强特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
例如,对于一个样本图像来说,可以将用户特征向量和N个语义增强特征向量直接输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度;也可以先对N个语义增强特征向量进行处理,以得到中间特征数据,然后将用户特征向量和该中间特征数据输入到预测模型,从而得到预测模型输出的样本用户对该样本图像的偏好程度。
作为一种可实现的方式,步骤302包括:
对于每个样本图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;
基于用户特征向量和第二融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
通过加法注意力机制的模型将N个语义增强特征向量融合包括:采用公式

$$\gamma_i=\frac{\exp\left(q_a^{\top}\tanh\left(k_a(g_i)\right)\right)}{\sum_{k3}\exp\left(q_a^{\top}\tanh\left(k_a(g_{k3})\right)\right)}$$

$$e1=\sum_{i}\gamma_i\,g_i$$

对N个语义增强特征向量进行处理,其中,$k_a$ 用于将 $g_i$ 转化为隐空间向量,$q_a$ 用于计算融合过程中的注意力权重,$\gamma_i$ 表示第 $i$ 个语义增强特征向量的注意力权重,$e1$ 表示第二融合特征向量,$k3$ 表示语义增强特征向量的数量编号(即第 $k3$ 个)。
基于上述说明,作为一种可实现的方式,如图8所示,得到第二融合特征向量的过程可以概括如下:将词向量和局部特征向量作为输入,依次利用注意力机制、自注意力机制以及加法注意力机制,输出第二融合特征向量。
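加法注意力的融合步骤可示意如下($k_a$、$q_a$ 以随机参数代替,采用 tanh 作为隐空间激活是加法注意力的常见做法,属于本文示例性的假设):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, h = 5, 4, 3
G = rng.normal(size=(N, d))        # N 个语义增强特征向量
Ka = rng.normal(size=(h, d))       # k_a:将向量映射到隐空间
qa = rng.normal(size=h)            # q_a:用于计算注意力权重的查询向量

scores = np.tanh(G @ Ka.T) @ qa    # 每个语义增强特征向量的注意力得分
gamma = np.exp(scores - scores.max())
gamma = gamma / gamma.sum()        # 归一化的注意力权重

e1 = gamma @ G                     # 第二融合特征向量
```

得到的 e1 与用户特征向量一起,即可作为预测模型的输入。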
上面对图像特征数据包括局部视觉印象特征数据的情况进行了介绍,下面介绍图像特征数据包括全局视觉印象特征数据的情况。
作为一种可实现的方式,如图9所示,步骤104包括:
步骤401,对于每个样本图像,基于每个样本图像中的样本候选内容获取内容特征向量,内容特征向量用于表征样本候选内容。
与词向量的获取过程类似,也可以利用文本表征器将样本候选内容转化成内容特征向量。
可以理解的是,当样本候选内容为新闻内容时,通常新闻内容的标题能够较好地体现新闻内容的主要信息;因此,当样本候选内容为新闻内容时,可以将新闻内容的标题转化成标题特征向量,并将该标题特征向量作为表征样本候选内容的内容特征向量。
步骤402,基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重。
需要说明的是,确定内容特征向量的权重和全局特征向量的权重的方法有多种,本申 请实施例对此不做具体限定。
由于用户可能对视觉印象信息和文本语义具有不同的敏感度,因此作为一种可实现的方式,可以通过门限加法网络自适应地控制内容特征向量和全局特征向量各自的权重。
具体地,通过门限加法网络控制内容特征向量和全局特征向量各自的权重的过程包括:通过公式 $a=\sigma(g(o_*,e2))$ 计算内容特征向量的权重,全局特征向量的权重为 $(1-a)$,其中,$g(\cdot)$ 表示线性变换,$\sigma$ 表示sigmoid函数,$e2$ 表示内容特征向量,$o_*$ 表示全局特征向量;从该公式可以看出,$a$ 是由 $e2$ 和 $o_*$ 共同决定的,可以自适应地调整。
步骤403,基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量。
其中,上述过程可以采用公式 $e_*=a\cdot e2+(1-a)\cdot o_*$ 实现。
基于上述说明可知,如图10所示,得到第三融合特征向量的过程可以概括如下:将内容特征向量和全局特征向量作为输入,利用门限加法网络,输出第三融合特征向量。
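门限加法网络对内容特征向量与全局特征向量的自适应融合,可以写成如下示意代码(线性变换 $g$ 以随机参数代替,此处将 $g$ 实现为对两个向量拼接后的线性映射,属于本文假设的一种实现方式):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
e2 = rng.normal(size=d)            # 内容特征向量 e2
o_star = rng.normal(size=d)        # 全局特征向量 o_*
Wg = rng.normal(size=2 * d)        # 线性变换 g 的参数(示例)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 门限权重 a 由 e2 与 o_* 共同决定,可随输入自适应调整
a = sigmoid(Wg @ np.concatenate([e2, o_star]))

# 第三融合特征向量:两路特征按门限权重相加
e_star = a * e2 + (1 - a) * o_star
```

当用户对文本语义更敏感时,训练后的参数会使 a 偏大,内容特征在融合结果中占更高比重;反之则全局视觉印象特征占比更高。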
步骤404,基于用户特征向量和第三融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征样本用户的用户特征数据。
在该实施例中,可以将用户特征向量和第三融合特征向量直接输入到预测模型中,以实现对每个样本图像的偏好程度的预测。
下面对本申请实施例提供的推荐方法进行介绍。
如图11所示,本申请实施例提供了一种推荐方法的一个实施例,该实施例可以应用于服务器,也可以应用于终端。具体地,该实施例包括:
步骤501,获取多张图像,每张图像包含一个候选界面和通过候选界面呈现的一种候选内容。
步骤502,获取每张图像的图像特征数据。
作为一种可实现的方式,每张图像包括多个区域,相应地,每张图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
作为一种可实现的方式,每张图像的图像特征数据包括全局特征向量,全局特征向量用于表征图像。
步骤503,获取目标用户的用户特征数据。
步骤504,基于目标用户的用户特征数据和图像特征数据,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的。
作为一种可实现的方式,如图12所示,当每张图像的图像特征数据包括多个局部特征向量时,步骤504包括:
步骤601,对于每张图像,基于每张图像中的候选内容获取N个词向量,每个词向量表征候选内容中的一个词语,其中,N为正整数;
步骤602,对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示目标用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;
步骤603,基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;
步骤604,基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,如图13所示,步骤604包括:
步骤701,对于每张图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;
步骤702,基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
作为一种可实现的方式,步骤702包括:
对于每张图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;
基于用户特征向量和第二融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
作为一种可实现的方式,如图14所示,当每张图像的图像特征数据包括全局特征向量时,步骤504包括:
步骤801,对于每张图像,基于每张图像中的候选内容获取内容特征向量,内容特征向量用于表征候选内容;
步骤802,基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;
步骤803,基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;
步骤804,基于用户特征向量和第三融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,步骤804包括:
基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;
基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,以通过目标候选界面推荐目标候选内容。
需要说明的是,步骤501至步骤504与步骤101至步骤104类似,具体可参阅前文中步骤101和步骤103的相关说明进行理解。
步骤505,基于偏好程度从多张图像包含的候选界面和候选内容中,选择候选内容和/或候选界面,以进行推荐。
需要说明的是,基于偏好程度可以仅选择多张图像中的候选内容进行推荐,也可以仅选择多张图像中的候选界面进行推荐,还可以同时从多张图像选择候选内容和候选界面进行推荐,下面对此进行具体介绍。
例如,如图15所示,利用用户日志得到用户点击历史,利用新闻素材和新闻界面得到新闻视觉印象,然后经过数据预处理模块、局部印象模块、全局印象模块以及模型预测模块的处理,得到用户对新闻的偏好程度,该偏好程度具体是指用户对图像中的新闻内容(即候选内容)的偏好程度;最后按照偏好程度对多张图像由高到低进行排序,然后选择排序在前的M张图像的新闻内容,并将其推荐给目标用户。
其中,数据预处理模块用于执行步骤502和步骤503,局部印象模块用于执行步骤603、步骤701以及步骤702中的融合操作,以得到第二融合特征向量;全局印象模块用于执行步骤802和步骤803,模型预测模块用于执行步骤702中的预测操作和步骤804中的预测操作。
再例如,如图16所示,获取当前用户的用户侧特征(即用户特征数据),利用新闻素材和新闻界面得到多种新闻界面组合候选(即前文中的多张图像),然后经过数据预处理模块、局部印象模块、全局印象模块、模型预测模块和界面生成模块的处理,得到用户对新闻的偏好程度,该偏好程度具体是指用户对图像中的用户界面(即候选界面)的偏好程度;最后按照偏好程度对多张图像由高到低进行排序,然后选择偏好程度最高的一张图像中的用户界面(即最佳用户界面),然后生成最佳用户界面配置;之后,便可以根据最佳用户界面配置显示最佳用户界面,并通过最佳用户界面为当前用户推荐各种内容。
其中,数据预处理模块用于执行步骤502和步骤503,局部印象模块用于执行步骤603、步骤701以及步骤702中的融合操作,以得到第二融合特征向量;全局印象模块用于执行步骤802和步骤803,模型预测模块用于执行步骤702中的预测操作和步骤804中的预测操作,界面生成模块用于根据模型预测模块预测的结果生成最佳用户界面。
除此之外,作为一种可实现方式,步骤505包括:
基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;
基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,以通过目标候选界面推荐目标候选内容。
需要说明的是,基于偏好程度可以选择多种候选内容向目标用户推荐,而目标候选内容是选择出的多种候选内容的一种。
下面通过具体的示例对上述过程进行说明。
例如,图像的数量为4张,第一张图像包含候选内容A、候选界面A,第二张图像包含候选内容A、候选界面B,第三张图像包含候选内容B、候选界面A,第四张图像包含候选内容B、候选界面B;目标用户对这4张图像的偏好程度由高到低依次为:第一张图像、第二张图像、第四张图像、第三张图像。
若目标候选内容为候选内容A,由于第一张图像和第二张图像包含候选内容A,所以从第一张图像和第二张图像的候选界面中选择目标候选界面;又由于目标用户对第一张图像的偏好程度高于对第二张图像的偏好程度,所以选择第一张图像中的候选界面A作为目标候选界面,然后通过候选界面A向目标用户推荐候选内容A。
同样地,若目标候选内容为候选内容B,由于第三张图像和第四张图像包含候选内容B,所以从第四张图像和第三张图像的候选界面中选择目标候选界面;又由于目标用户对第四张图像的偏好程度高于对第三张图像的偏好程度,所以选择第四张图像中的候选界面B作为目标候选界面,然后通过候选界面B向目标用户推荐候选内容B。
由此可见,对于不同的目标候选内容,得到的目标候选界面可能不同。
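上述"先选目标候选内容、再选目标候选界面"的过程,可用纯 Python 示意如下(偏好程度的具体数值为与上例偏好排序一致的假设值):

```python
# 每张图像:(候选内容, 候选界面, 偏好程度),排序与正文示例一致
images = [("A", "界面A", 0.9),
          ("A", "界面B", 0.8),
          ("B", "界面A", 0.3),
          ("B", "界面B", 0.5)]

def pick_interface(images, target_content):
    """在包含目标候选内容的图像中,选偏好程度最高者的候选界面。"""
    candidates = [img for img in images if img[0] == target_content]
    return max(candidates, key=lambda img: img[2])[1]
```

按此逻辑,候选内容A将通过界面A推荐,候选内容B将通过界面B推荐,与正文示例的结论一致。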
步骤506,向终端设备发送目标候选界面的元数据和目标候选内容,以使得终端设备基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容。
可以理解的是,当上述方法由服务器执行时,服务器会将目标候选界面的元数据和目标候选内容发送至终端设备;相应地,终端设备便会接收到目标候选界面的元数据和目标候选内容,然后基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容。
请参阅图17,本申请实施例提供了一种推荐装置的一个实施例,包括:第一图像获取单元601,用于获取多张图像,每张图像包含一个候选界面和通过候选界面呈现的一种候选内容;第一特征数据获取单元602,用于获取每张图像的图像特征数据;第一预测单元603,用于基于目标用户的用户特征数据和图像特征数据,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;推荐单元604,用于基于偏好程度从多张图像包含的候选界面和候选内容中,选择候选内容和/或候选界面,以进行推荐。
作为一种可实现的方式,每张图像包括多个区域;每张图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
作为一种可实现的方式,第一预测单元603,用于对于每张图像,基于每张图像中的候选内容获取N个词向量,每个词向量表征候选内容中的一个词语,其中,N为正整数;对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示目标用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,第一预测单元603,用于对于每张图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
作为一种可实现的方式,第一预测单元603,用于基于用户特征向量和N个语义增强特征向量,并通过预测模型预测目标用户对每张图像的偏好程度包括:对于每张图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度, 预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
作为一种可实现的方式,每张图像的图像特征数据包括全局特征向量,全局特征向量用于表征图像。
作为一种可实现的方式,第一预测单元603,用于对于每张图像,基于每张图像中的候选内容获取内容特征向量,内容特征向量用于表征候选内容;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征目标用户的用户特征数据。
作为一种可实现的方式,推荐单元604,用于基于偏好程度从多张图像包含的候选内容中选择一种候选内容作为目标候选内容;基于偏好程度从包含目标候选内容的图像的候选界面中,选择一种候选界面作为目标候选界面,以通过目标候选界面推荐目标候选内容。
作为一种可实现的方式,装置还包括发送单元605,用于向终端设备发送目标候选界面的元数据和目标候选内容,以使得终端设备基于元数据显示目标候选界面,并通过目标候选界面向目标用户推荐目标候选内容。
其中,以上各单元的具体实现、相关说明以及技术效果请参考本申请实施例方法部分的描述。
请参阅图18,本申请实施例提供了一种训练装置的一个实施例,包括:第二图像获取单元701,用于获取多个样本图像,每个样本图像包含一个样本候选界面和通过样本候选界面呈现的一种样本候选内容;第二特征数据获取单元702,用于获取每个样本图像的图像特征数据;第二预测单元703,用于基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;调整单元704,用于基于偏好程度和样本用户对样本候选内容的历史点击数据,对预测模型进行调整。
作为一种可实现的方式,每个样本图像包括多个区域;每个样本图像的图像特征数据包括多个局部特征向量,每个局部特征向量用于表征一个区域。
作为一种可实现的方式,第二预测单元703,用于对于每个样本图像,基于每个样本图像中的样本候选内容获取N个词向量,每个词向量表征样本候选内容中的一个词语,其中,N为正整数;对于每个词向量,基于每个词向量和多个局部特征向量,并通过注意力机制的模型计算多个局部特征向量各自的注意力权重,注意力权重表示样本用户在阅读每个词向量表征的词语时,关注局部特征向量表征的区域的程度;基于多个局部特征向量各自的注意力权重,将每个词向量和多个局部特征向量融合,以得到第一融合特征向量,每个词向量对应得到一个第一融合特征向量;基于用户特征向量和N个词向量对应的N个第一融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个第一融合特征向量确定的,用户特征向量用于表征样本用户的用户特征数据。
作为一种可实现的方式,第二预测单元703,用于对于每个样本图像,通过自注意力机制的模型对N个词向量对应的N个第一融合特征向量进行处理,以得到N个语义增强特征向量,每个第一融合特征向量对应一个语义增强特征向量;基于用户特征向量和N个语义增强特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和N个语义增强特征向量确定的。
作为一种可实现的方式,第二预测单元703,用于对于每个样本图像,通过加法注意力机制的模型将N个语义增强特征向量融合,以得到第二融合特征向量;基于用户特征向量和第二融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第二融合特征向量确定的。
作为一种可实现的方式,每个样本图像的图像特征数据包括全局特征向量,全局特征向量用于表征样本图像。
作为一种可实现的方式,第二预测单元703,用于对于每个样本图像,基于每个样本图像中的样本候选内容获取内容特征向量,内容特征向量用于表征样本候选内容;基于内容特征向量和全局特征向量,确定内容特征向量的权重和全局特征向量的权重;基于内容特征向量的权重和全局特征向量的权重,将内容特征向量和全局特征向量融合,以得到第三融合特征向量;基于用户特征向量和第三融合特征向量,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征向量和第三融合向量确定的,用户特征向量用于表征样本用户的用户特征数据。
其中,以上各单元的具体实现、相关说明以及技术效果请参考本申请实施例方法部分的描述。
本申请实施例还提供了一种计算机设备的实施例,该计算机设备可以是终端,也可以服务器,当计算机设备为服务器时,该计算机设备可以作为训练设备。
请参阅图19,图19是本申请实施例提供的计算机设备的一种结构示意图,用于实现图17对应实施例中推荐装置的功能或图18对应实施例中训练装置的功能,具体的,计算机设备1800由一个或多个服务器实现,计算机设备1800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1822(例如,一个或一个以上处理器)和存储器1832,一个或一个以上存储应用程序1842或数据1844的存储介质1830(例如一个或一个以上海量存储设备)。其中,存储器1832和存储介质1830可以是短暂存储或持久存储。存储在存储介质1830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对计算机设备中的一系列指令操作。更进一步地,中央处理器1822可以设置为与存储介质1830通信,在计算机设备1800上执行存储介质1830中的一系列指令操作。
计算机设备1800还可以包括一个或一个以上电源1826,一个或一个以上有线或无线网络接口1850,一个或一个以上输入输出接口1858,和/或,一个或一个以上操作系统1841,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器1822,可以用于执行图17对应实施例中推荐装置执行的推荐方法。具体的,中央处理器1822,可以用于:
获取多张图像,每张图像包含一个候选界面和通过候选界面呈现的一种候选内容;
获取每张图像的图像特征数据;
基于目标用户的用户特征数据和图像特征数据,并通过预测模型预测目标用户对每张图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;
基于偏好程度从多张图像包含的候选界面和候选内容中,选择候选内容和/或候选界面,以进行推荐。
本申请实施例中,中央处理器1822,可以用于执行图18对应实施例中训练装置执行的模型训练方法。具体的,中央处理器1822,可以用于:
获取多个样本图像,每个样本图像包含一个样本候选界面和通过样本候选界面呈现的一种样本候选内容;
获取每个样本图像的图像特征数据;
基于样本用户的用户特征数据和图像特征数据,并通过预测模型预测样本用户对每个样本图像的偏好程度,预测模型的输入是基于用户特征数据和图像特征数据确定的;
基于偏好程度和样本用户对样本候选内容的历史点击数据,对预测模型进行调整。
本申请实施例还提供一种芯片,包括一个或多个处理器。所述处理器中的部分或全部用于读取并执行存储器中存储的计算机程序,以执行前述各实施例的方法。
可选地,该芯片还包括存储器,该存储器与该处理器通过电路或电线连接。进一步可选地,该芯片还包括通信接口,处理器与该通信接口连接。通信接口用于接收需要处理的数据和/或信息,处理器从该通信接口获取该数据和/或信息,并对该数据和/或信息进行处理,并通过该通信接口输出处理结果。该通信接口可以是输入输出接口。
在一些实现方式中,所述一个或多个处理器中还可以有部分处理器是通过专用硬件的方式来实现以上方法中的部分步骤,例如涉及神经网络模型的处理可以由专用神经网络处理器或图形处理器来实现。
本申请实施例提供的方法可以由一个芯片实现,也可以由多个芯片协同实现。
本申请实施例还提供了一种计算机存储介质,该计算机存储介质用于储存为上述计算机设备所用的计算机软件指令,其包括用于执行为计算机设备所设计的程序。
该计算机设备可以如前述图17对应实施例中推荐装置或图18对应实施例中训练装置。
本申请实施例还提供了一种计算机程序产品,该计算机程序产品包括计算机软件指令,该计算机软件指令可通过处理器进行加载来实现前述各个实施例所示的方法中的流程。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征数据可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (23)

  1. 一种推荐方法,其特征在于,包括:
    获取多张图像,每张所述图像包含一个候选界面和通过所述候选界面呈现的一种候选内容;
    获取每张所述图像的图像特征数据;
    基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征数据和所述图像特征数据确定的;
    基于所述偏好程度从所述多张图像包含的所述候选界面和所述候选内容中,选择候选内容和/或候选界面,以进行推荐。
  2. 根据权利要求1所述的方法,其特征在于,每张所述图像包括多个区域;
    每张所述图像的图像特征数据包括多个局部特征向量,每个所述局部特征向量用于表征一个所述区域。
  3. 根据权利要求2所述的方法,其特征在于,所述基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:
    对于每张所述图像,基于每张所述图像中的所述候选内容获取N个词向量,每个所述词向量表征所述候选内容中的一个词语,其中,N为正整数;
    对于每个所述词向量,基于每个所述词向量和所述多个局部特征向量,并通过注意力机制的模型计算所述多个局部特征向量各自的注意力权重,所述注意力权重表示所述目标用户在阅读每个所述词向量表征的词语时,关注所述局部特征向量表征的区域的程度;
    基于所述多个局部特征向量各自的注意力权重,将每个所述词向量和所述多个局部特征向量融合,以得到第一融合特征向量,每个所述词向量对应得到一个所述第一融合特征向量;
    基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和N个所述第一融合特征向量确定的,所述用户特征向量用于表征目标用户的用户特征数据。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:
    对于每张所述图像,通过自注意力机制的模型对所述N个词向量对应的N个所述第一融合特征向量进行处理,以得到N个语义增强特征向量,每个所述第一融合特征向量对应一个语义增强特征向量;
    基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述N个语义增强特征向量确定的。
  5. 根据权利要求4所述的方法,其特征在于,所述基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:
    对于每张所述图像,通过加法注意力机制的模型将所述N个语义增强特征向量融合,以得到第二融合特征向量;
    基于所述用户特征向量和所述第二融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第二融合特征向量确定的。
  6. 根据权利要求1所述的方法,其特征在于,每张所述图像的图像特征数据包括全局特征向量,所述全局特征向量用于表征所述图像。
  7. 根据权利要求6所述的方法,其特征在于,所述基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:
    对于每张所述图像,基于每张所述图像中的所述候选内容获取内容特征向量,所述内容特征向量用于表征所述候选内容;
    基于所述内容特征向量和所述全局特征向量,确定所述内容特征向量的权重和所述全局特征向量的权重;
    基于所述内容特征向量的权重和所述全局特征向量的权重,将所述内容特征向量和所述全局特征向量融合,以得到第三融合特征向量;
    基于所述用户特征向量和所述第三融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第三融合向量确定的,所述用户特征向量用于表征目标用户的用户特征数据。
  8. 根据权利要求1至7中任意一项所述的方法,其特征在于,所述基于所述偏好程度从所述多张图像包含的所述候选界面和所述候选内容中,选择候选内容和/或候选界面,以进行推荐包括:
    基于所述偏好程度从所述多张图像包含的所述候选内容中选择一种候选内容作为目标候选内容;
    基于所述偏好程度从包含所述目标候选内容的所述图像的所述候选界面中,选择一种候选界面作为目标候选界面,以通过所述目标候选界面推荐所述目标候选内容。
  9. 根据权利要求8所述的方法,其特征在于,在所述基于所述偏好程度从包含所述目标候选内容的所述图像的所述候选界面中,选择一种候选界面作为目标候选界面之后,所述方法还包括:
    向终端设备发送所述目标候选界面的元数据和所述目标候选内容,以使得所述终端设备基于所述元数据显示所述目标候选界面,并通过所述目标候选界面向所述目标用户推荐所述目标候选内容。
  10. 一种训练方法,其特征在于,包括:
    获取多个样本图像,每个所述样本图像包含一个样本候选界面和通过所述样本候选界面呈现的一种样本候选内容;
    获取每个所述样本图像的图像特征数据;
    基于样本用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征数据和所述图像特征数据确定的;
    基于所述偏好程度和所述样本用户对所述样本候选内容的历史点击数据,对所述预测模型进行调整。
  11. 根据权利要求10所述的方法,其特征在于,每个所述样本图像包括多个区域;
    每个所述样本图像的图像特征数据包括多个局部特征向量,每个所述局部特征向量用于表征一个所述区域。
  12. 根据权利要求11所述的方法,其特征在于,所述基于样本用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度包括:
    对于每个所述样本图像,基于每个所述样本图像中的所述样本候选内容获取N个词向量,每个所述词向量表征所述样本候选内容中的一个词语,其中,N为正整数;
    对于每个所述词向量,基于每个所述词向量和所述多个局部特征向量,并通过注意力机制的模型计算所述多个局部特征向量各自的注意力权重,所述注意力权重表示所述样本用户在阅读每个所述词向量表征的词语时,关注所述局部特征向量表征的区域的程度;
    基于所述多个局部特征向量各自的注意力权重,将每个所述词向量和所述多个局部特征向量融合,以得到第一融合特征向量,每个所述词向量对应得到一个所述第一融合特征向量;
    基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和N个所述第一融合特征向量确定的,所述用户特征向量用于表征样本用户的用户特征数据。
  13. 根据权利要求12所述的方法,其特征在于,所述基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度包括:
    对于每个所述样本图像,通过自注意力机制的模型对所述N个词向量对应的N个所述第一融合特征向量进行处理,以得到N个语义增强特征向量,每个所述第一融合特征向量对应一个语义增强特征向量;
    基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述N个语义增强特征向量确定的。
  14. 根据权利要求13所述的方法,其特征在于,所述基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度包括:
    对于每个所述样本图像,通过加法注意力机制的模型将所述N个语义增强特征向量融合,以得到第二融合特征向量;
    基于所述用户特征向量和所述第二融合特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第二融合特征向量确定的。
  15. 根据权利要求10所述的方法,其特征在于,每个所述样本图像的图像特征数据包括全局特征向量,所述全局特征向量用于表征所述样本图像。
  16. 根据权利要求15所述的方法,其特征在于,所述基于样本用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度包括:
    对于每个所述样本图像,基于每个所述样本图像中的所述样本候选内容获取内容特征向量,所述内容特征向量用于表征所述样本候选内容;
    基于所述内容特征向量和所述全局特征向量,确定所述内容特征向量的权重和所述全局特征向量的权重;
    基于所述内容特征向量的权重和所述全局特征向量的权重,将所述内容特征向量和所述全局特征向量融合,以得到第三融合特征向量;
    基于所述用户特征向量和所述第三融合特征向量,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第三融合向量确定的,所述用户特征向量用于表征样本用户的用户特征数据。
  17. 一种推荐装置,其特征在于,包括:
    第一图像获取单元,用于获取多张图像,每张所述图像包含一个候选界面和通过所述候选界面呈现的一种候选内容;
    第一特征数据获取单元,用于获取每张所述图像的图像特征数据;
    第一预测单元,用于基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征数据和所述图像特征数据确定的;
    推荐单元,用于基于所述偏好程度从所述多张图像包含的所述候选界面和所述候选内容中,选择候选内容和/或候选界面,以进行推荐。
  18. 一种训练装置,其特征在于,包括:
    第二图像获取单元,用于获取多个样本图像,每个所述样本图像包含一个样本候选界面和通过所述样本候选界面呈现的一种样本候选内容;
    第二特征数据获取单元,用于获取每个所述样本图像的图像特征数据;
    第二预测单元,用于基于样本用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述样本用户对每个所述样本图像的偏好程度,所述预测模型的输入是基于所述用户特征数据和所述图像特征数据确定的;
    调整单元,用于基于所述偏好程度和所述样本用户对所述样本候选内容的历史点击数据,对所述预测模型进行调整。
  19. 一种计算机设备,其特征在于,包括存储器和处理器,其中,所述存储器用于存储计算机可读指令;所述处理器用于读取所述计算机可读指令并实现如权利要求1-9任意一项所述的方法。
  20. 一种训练设备,其特征在于,包括存储器和处理器,其中,所述存储器用于存储计算机可读指令;所述处理器用于读取所述计算机可读指令并实现如权利要求10-16任意一项所述的方法。
  21. 一种计算机存储介质,其特征在于,存储有计算机可读指令,且所述计算机可读指令在被处理器执行时实现如权利要求1-16任意一项所述的方法。
  22. 一种计算机程序产品,其特征在于,所述计算机程序产品中包含计算机可读指令,当该计算机可读指令被处理器执行时实现如权利要求1-16任意一项所述的方法。
  23. 一种推荐系统,其特征在于,包括终端设备和服务器;
    所述服务器用于执行如权利要求9所述的方法;
    所述终端设备用于接收来自所述服务器的目标候选界面的元数据和目标候选内容;
    基于所述元数据显示所述目标候选界面,并通过所述目标候选界面向所述目标用户推荐所述目标候选内容。
PCT/CN2022/105075 2021-08-20 2022-07-12 一种推荐方法、训练方法、装置、设备及推荐系统 WO2023020160A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22857473.7A EP4379574A4 (en) 2021-08-20 2022-07-12 RECOMMENDATION METHOD AND APPARATUS, LEARNING METHOD AND APPARATUS, RECOMMENDATION DEVICE, AND SYSTEM
US18/441,389 US20240184837A1 (en) 2021-08-20 2024-02-14 Recommendation method and apparatus, training method and apparatus, device, and recommendation system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110963660.X 2021-08-20
CN202110963660.XA CN113806631A (zh) 2021-08-20 2021-08-20 一种推荐方法、训练方法、装置、设备及新闻推荐系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/441,389 Continuation US20240184837A1 (en) 2021-08-20 2024-02-14 Recommendation method and apparatus, training method and apparatus, device, and recommendation system

Publications (1)

Publication Number Publication Date
WO2023020160A1 true WO2023020160A1 (zh) 2023-02-23

Family

ID=78893897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105075 WO2023020160A1 (zh) 2021-08-20 2022-07-12 Recommendation method, training method, apparatus, device, and recommendation system

Country Status (4)

Country Link
US (1) US20240184837A1 (zh)
EP (1) EP4379574A4 (zh)
CN (1) CN113806631A (zh)
WO (1) WO2023020160A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806631A (zh) * 2021-08-20 2021-12-17 Huawei Technologies Co., Ltd. Recommendation method, training method, apparatus, device, and news recommendation system
CN114741608A (zh) * 2022-05-10 2022-07-12 Ping An Property & Casualty Insurance Company of China, Ltd. News recommendation method, apparatus, device, and storage medium based on user portrait

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170116543A1 (en) * 2015-10-23 2017-04-27 Sap Se Self-adaptive display layout system
CN107895024A (zh) * 2017-09-13 2018-04-10 Tongji University User model construction method and recommendation method for web page news classification and recommendation
CN109740068A (zh) * 2019-01-29 2019-05-10 Tencent Technology (Beijing) Co., Ltd. Media data recommendation method, apparatus, and storage medium
CN109947510A (zh) * 2019-03-15 2019-06-28 Beijing SenseTime Technology Development Co., Ltd. Interface recommendation method and apparatus, and computer device
CN111461175A (zh) * 2020-03-06 2020-07-28 Northwest University Label recommendation model construction method and apparatus based on self-attention and collaborative attention mechanisms
CN113806631A (zh) * 2021-08-20 2021-12-17 Huawei Technologies Co., Ltd. Recommendation method, training method, apparatus, device, and news recommendation system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819584B (zh) * 2012-07-26 2015-07-08 Beijing Qihoo Technology Co., Ltd. Interface file display method and system
CN104166668B (zh) * 2014-06-09 2018-02-23 Nanjing University of Posts and Telecommunications News recommendation system and method based on the FoLFM model
US10817749B2 (en) * 2018-01-18 2020-10-27 Accenture Global Solutions Limited Dynamically identifying object attributes via image analysis
CN109903314A (zh) * 2019-03-13 2019-06-18 Tencent Technology (Shenzhen) Co., Ltd. Image region localization method, model training method, and related apparatus
CN112100504B (zh) * 2020-11-03 2021-09-10 Beijing Dajia Internet Information Technology Co., Ltd. Content recommendation method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4379574A4

Also Published As

Publication number Publication date
EP4379574A1 (en) 2024-06-05
EP4379574A4 (en) 2024-10-16
CN113806631A (zh) 2021-12-17
US20240184837A1 (en) 2024-06-06

Similar Documents

Publication Publication Date Title
CN109844767B (zh) Visual search based on image analysis and prediction
US11361018B2 (en) Automatically curated image searching
US10043109B1 (en) Attribute similarity-based search
CN112733042B (zh) Recommendation information generation method, related apparatus, and computer program product
WO2021238722A1 (zh) Resource pushing method, apparatus, device, and storage medium
WO2023020160A1 (zh) Recommendation method, training method, apparatus, device, and recommendation system
KR20190095333A (ko) Anchored search
US20150339348A1 (en) Search method and device
CN110737783A (zh) Method, apparatus, and computing device for recommending multimedia content
US20210166014A1 (en) Generating document summary
WO2020057145A1 (en) Method and device for generating painting display sequence, and computer storage medium
WO2024051609A1 (zh) Advertising creative data selection method and apparatus, model training method and apparatus, device, and storage medium
US12079572B2 (en) Rule-based machine learning classifier creation and tracking platform for feedback text analysis
CN111177467A (zh) Object recommendation method and apparatus, computer-readable storage medium, and electronic device
US20230089574A1 (en) Modifying a document content section of a document object of a graphical user interface (gui)
CN111144974A (zh) Information display method and apparatus
US10643142B2 (en) Search term prediction
US20230401250A1 (en) Systems and methods for generating interactable elements in text strings relating to media assets
CN117909560A (zh) Search method, model training method, apparatus, device, medium, and program product
US20240248901A1 (en) Method and system of using domain specific knowledge in retrieving multimodal assets
US11768867B2 (en) Systems and methods for generating interactable elements in text strings relating to media assets
CN113486260B (zh) Interaction information generation method and apparatus, computer device, and storage medium
CN116030375A (zh) Video feature extraction and model training method, apparatus, device, and storage medium
CN113674043B (zh) Commodity recommendation method and apparatus, computer-readable storage medium, and electronic device
CN113032614A (zh) Cross-modal information retrieval method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22857473

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022857473

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022857473

Country of ref document: EP

Effective date: 20240226

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 11202400985T

Country of ref document: SG