WO2023020160A1 - Recommendation method, training method, apparatus, device, and recommendation system - Google Patents
- Publication number
- WO2023020160A1 (PCT/CN2022/105075)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- feature vector
- sample
- image
- candidate
- Prior art date
Classifications
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06F16/535—Filtering based on additional data, e.g. user or group profiles
- G06F16/583—Retrieval characterised by using metadata automatically derived from the content
- G06N3/02—Neural networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06V10/40—Extraction of image or video features
Definitions
- The embodiments of the present application relate to the technical field of recommendation, and in particular to a recommendation method, a training method, an apparatus, a device, and a recommendation system.
- Current news recommendation systems only mine the news content that users are interested in, ignoring the impact that the news interface used to present that content has on users; as a result, the click-through rate of news cannot be further improved.
- The embodiments of the present application provide a recommendation method, a training method, an apparatus, a device, and a recommendation system, which exploit the influence of the news interface on the user to increase the user's click-through rate on news.
- In a first aspect, an embodiment of the present application provides a recommendation method, including: acquiring multiple images, where each image contains a candidate interface and candidate content presented through that interface; an image can be understood as a rendering of the candidate content as presented through the candidate interface. The candidate content is not limited to news content and may also be other content such as short videos or product information; correspondingly, the candidate interface may be a news interface, a short-video interface, or a product-information interface. The method further includes acquiring the image feature data of each image. Image feature data may include global visual-impression feature data and/or local visual-impression feature data, where global visual-impression feature data can be understood as feature data extracted from the entire image, and local visual-impression feature data as feature data extracted from local regions of the image. Based on the user feature data of a target user and the image feature data, the target user's degree of preference for each image is predicted through a prediction model whose input is determined from the user feature data and the image feature data; the user feature data includes, among other things, the user's age information and the city where the user is located.
- Because the prediction model is trained on the image feature data of the images, it takes the influence of both the candidate content and the candidate interface on the user into account and can accurately predict the user's degree of preference for each image. This makes it possible to recommend content of interest through a candidate interface the user is also interested in, thereby improving the user's click-through rate on the recommended content.
- Each image includes multiple regions. The image can be divided by various methods to obtain the multiple regions. For example, as described above, a piece of news may include the news title, the news author, and the news category, and may also include a picture part; the region coordinates of these parts can therefore be obtained from the news layout, and the image can then be divided into multiple regions according to those coordinates.
- The image feature data of each image includes multiple local feature vectors, and each local feature vector is used to represent one region. Dividing the image into multiple regions and using a local feature vector to represent each region as the image feature data allows the local features of the image to be better extracted, improving the accuracy of predicting the user's preference for the image.
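As an illustrative sketch (not the patent's actual implementation), dividing a rendered image into layout regions by coordinates can look like the following; the image size and the layout boxes are hypothetical values chosen for illustration:

```python
import numpy as np

def split_into_regions(image, boxes):
    """Crop an H x W x C image array into one patch per (top, bottom, left, right) box."""
    return [image[top:bottom, left:right] for (top, bottom, left, right) in boxes]

# Hypothetical layout for a 64 x 128 news rendering: a title strip, a picture
# area, and an author/category line (coordinates are illustrative only).
img = np.zeros((64, 128, 3), dtype=np.uint8)
boxes = [(0, 16, 0, 128),    # title strip
         (16, 56, 0, 64),    # picture area
         (56, 64, 0, 128)]   # author / category line
regions = split_into_regions(img, boxes)
print([r.shape for r in regions])
```

Each cropped patch would then be passed to a picture characterizer to obtain the local feature vector for that region.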
- Predicting the target user's degree of preference for each image based on the target user's user feature data and the image feature data includes: for each image, obtaining N word vectors based on the candidate content in that image, each word vector representing a word in the candidate content, where N is a positive integer. The candidate content includes N words, and for each word a word vector can be generated by a text characterizer. Like the picture characterizer, the text characterizer can be understood as a pre-trained model; many model types are possible, for example a BERT model. Since the title of a news item best reflects its main information, when the candidate content is news content, word segmentation can be performed on the title to obtain the N words, and the text characterizer then produces the N word vectors representing them. For each word vector, the respective attention weights of the multiple local feature vectors are calculated through a model of the attention mechanism, based on that word vector and the local feature vectors.
- The attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector when reading the word represented by the word vector. The attention mechanism is a method that computes an attention weight for each part of a neural network model and merges the weights into an attention vector, thereby dynamically controlling how much attention the model pays to each part or to a certain part.
- Based on the respective attention weights of the multiple local feature vectors, each word vector is fused with the multiple local feature vectors to obtain a first fusion feature vector, and each word vector corresponds to one first fusion feature vector. Specifically, the multiple local feature vectors can be weighted by their respective attention weights, and the weighted result is then added to the word vector to obtain the first fusion feature vector. Based on the user feature vector and the N first fusion feature vectors, the target user's preference for each image is predicted through the prediction model, whose input is determined from the user feature vector and the N first fusion feature vectors; the user feature vector is used to characterize the user feature data of the target user.
- Because the attention weight indicates the degree to which the target user attends to the region represented by each local feature vector while reading the word represented by a word vector, fusing each word vector with the multiple local feature vectors according to those weights yields first fusion feature vectors that reflect the impression left on the user by both the words in the image and each of its regions. Using the first fusion feature vectors to predict the degree of preference therefore improves the accuracy of the predicted preference.
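The fusion step described above (weight the region vectors by their attention weights, then add the weighted sum to the word vector) can be sketched as follows. The dot-product scoring function is an assumption for illustration; the patent does not fix a particular attention score:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_word_with_regions(word_vec, local_vecs):
    """One word vector attends over the region (local feature) vectors; the
    attention-weighted sum of the regions is added to the word vector,
    giving the 'first fusion feature vector'."""
    scores = local_vecs @ word_vec   # one score per region (dot product, assumed)
    weights = softmax(scores)        # attention weight per region, sums to 1
    context = weights @ local_vecs   # weighted sum of the region vectors
    return word_vec + context, weights

rng = np.random.default_rng(0)
word = rng.standard_normal(8)               # one word vector
region_vecs = rng.standard_normal((4, 8))   # 4 region vectors, same dimension
fused, w = fuse_word_with_regions(word, region_vecs)
```

In the method itself this runs once per word vector, producing N first fusion feature vectors per image.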
- A model of the self-attention mechanism processes the N first fusion feature vectors corresponding to the N word vectors to obtain N semantically enhanced feature vectors, each first fusion feature vector corresponding to one semantically enhanced feature vector. The self-attention mechanism is an improvement of the attention mechanism that reduces the dependence on external information and is better at capturing the internal correlations of data or features. Based on the user feature vector and the N semantically enhanced feature vectors, the target user's preference for each image is predicted through the prediction model, whose input is determined from the user feature vector and the N semantically enhanced feature vectors.
- Because the self-attention mechanism is better at capturing the internal correlations of data or features, the semantically enhanced feature vectors obtained by processing the N first fusion feature vectors reflect the correlations among those vectors and therefore more accurately reflect the impression feature information the image leaves on the user. Using the semantically enhanced feature vectors to predict the degree of preference improves the accuracy of the predicted preference.
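A minimal sketch of the self-attention step: plain scaled dot-product self-attention over the N first fusion feature vectors. The learned query/key/value projections of a full self-attention layer are omitted here as a simplification, so this is only a shape-level illustration, not the patent's model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over the rows of X (the N first
    fusion feature vectors). Each output row is the 'semantically enhanced'
    counterpart of the same input row: a mixture of all rows, weighted by
    their pairwise similarity."""
    d = X.shape[-1]
    A = softmax(X @ X.T / np.sqrt(d), axis=-1)  # N x N attention map
    return A @ X

rng = np.random.default_rng(1)
fused_vectors = rng.standard_normal((5, 8))     # N = 5 first fusion vectors
enhanced = self_attention(fused_vectors)
```

Because every output row mixes information from all N inputs, correlations among the first fusion feature vectors are reflected in each semantically enhanced vector.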
- Predicting the target user's preference for each image based on the user feature vector and the N semantically enhanced feature vectors includes: for each image, fusing the N semantically enhanced feature vectors through a model of the additive attention mechanism to obtain a second fusion feature vector; and predicting the target user's preference for each image through the prediction model based on the user feature vector and the second fusion feature vector, the input of the prediction model being determined from the user feature vector and the second fusion feature vector.
- The model of the additive attention mechanism fuses the N semantically enhanced feature vectors into one vector, and using the resulting second fusion feature vector to predict the degree of preference improves the accuracy of the user's predicted preference for the image.
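The additive-attention pooling step can be sketched as follows, using the standard Bahdanau-style score v · tanh(Wx + b); the specific parameterization is an assumption, since the patent names only "additive attention". In practice W, b, and v would be learned:

```python
import numpy as np

def additive_attention_pool(X, v, W, b):
    """Score each of the N semantically enhanced vectors (rows of X) with
    v . tanh(W x + b), softmax the scores, and return the weighted sum:
    the 'second fusion feature vector'."""
    scores = np.tanh(X @ W.T + b) @ v        # one scalar score per row
    e = np.exp(scores - scores.max())
    weights = e / e.sum()                    # attention weights, sum to 1
    return weights @ X                       # pooled vector

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 8))              # N = 5 semantically enhanced vectors
W = rng.standard_normal((8, 8))              # hypothetical learned parameters
b = rng.standard_normal(8)
v = rng.standard_normal(8)
pooled = additive_attention_pool(X, v, W, b)
```

The pooled vector is a convex combination of the inputs, so it stays within their coordinate-wise range while emphasizing the most informative vectors.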
- The image feature data of each image includes a global feature vector, and the global feature vector is used to represent the whole image; in this case the image feature data may also be called global visual-impression feature data. The global feature vector may be obtained by inputting the image into a picture characterizer, which converts the image into the global feature vector. Using the global feature vector that characterizes the image as its image feature data allows the global features of the image to be better extracted, improving the accuracy of predicting the user's preference for the image.
- Predicting the target user's degree of preference for each image based on the target user's user feature data and the image feature data includes: for each image, obtaining a content feature vector based on the candidate content in that image, the content feature vector being used to represent the candidate content; since the title of a news item best reflects its main information, when the candidate content is news content the title can be converted into a title feature vector. Based on the content feature vector and the global feature vector, the weight of each is determined; since users may differ in their sensitivity to visual-impression information versus textual semantics, one realizable way is to adaptively control the respective weights of the content feature vector and the global feature vector through a gated (threshold) addition network. Based on those weights, the content feature vector and the global feature vector are fused to obtain a third fusion feature vector, and the target user's preference for each image is then predicted through the prediction model based on the user feature vector and the third fusion feature vector.
- The third fusion feature vector can represent, from a global perspective, the impression feature information the image leaves on the user; using it to predict the target user's preference for each image therefore improves the accuracy of the predicted preference.
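The patent does not specify the internals of the gated (threshold) addition network; a common realization, presented here purely as an assumption, is a sigmoid gate computed from both vectors that mixes them per dimension:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(content_vec, global_vec, W_g, b_g):
    """Infer a per-dimension gate g in (0, 1) from both vectors, then mix:
    g * content + (1 - g) * global, giving the 'third fusion feature vector'.
    W_g and b_g would be learned in practice."""
    g = sigmoid(np.concatenate([content_vec, global_vec]) @ W_g + b_g)
    return g * content_vec + (1.0 - g) * global_vec

rng = np.random.default_rng(3)
c = rng.standard_normal(8)           # content (e.g. title) feature vector
v = rng.standard_normal(8)           # global visual feature vector
W_g = rng.standard_normal((16, 8))   # hypothetical gate parameters
b_g = rng.standard_normal(8)
fused = gated_fusion(c, v, W_g, b_g)
```

Because the gate depends on both inputs, the relative weight given to textual semantics versus visual impression adapts per example, matching the motivation that users differ in their sensitivity to the two.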
- Selecting candidate content and/or a candidate interface for recommendation, based on the degree of preference, from the candidate interfaces and candidate content contained in the multiple images includes: selecting one piece of candidate content from the candidate content contained in the images as the target candidate content based on the degree of preference; and selecting one candidate interface from the candidate interfaces of the images containing the target candidate content as the target candidate interface based on the degree of preference, so as to recommend the target candidate content through the target candidate interface.
- Recommending the target candidate content through the selected candidate interface realizes recommending the user-preferred candidate content through the user-preferred candidate interface, thereby increasing the probability that the user clicks on the recommended content.
- The method further includes: sending the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface; the metadata includes the various configuration data of the target candidate interface.
- An embodiment of the present application provides a training method, including: acquiring a plurality of sample images, each sample image including a sample candidate interface and sample candidate content presented through that interface; acquiring the image feature data of each sample image; predicting the sample user's preference for each sample image through the prediction model based on the sample user's user feature data and the image feature data, the input of the prediction model being determined from the user feature data and the image feature data; and adjusting the prediction model based on the predicted preference and the sample user's historical click data on the sample candidate content. The historical click data may include whether the sample user clicked on the sample candidate content and how many times the sample user clicked on it; specifically, the weights of the prediction model can be adjusted, and the structure of the prediction model can also be adjusted.
- A prediction model trained on the image feature data of the sample images takes the influence of both the candidate content and the candidate interface on the user into account and can accurately output the user's preference for images. This helps recommend content of interest through an interface the user is interested in, improving the user's click-through rate on the recommended content.
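To make the training step concrete, here is a deliberately minimal sketch: a logistic click predictor over concatenated user and image feature vectors, with one gradient step per historical click label under binary cross-entropy. The actual prediction model is a neural network with the attention components described above; this stand-in only illustrates "adjust the model weights based on preference and historical click data":

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(w, user_vec, image_vec, clicked, lr=0.1):
    """One gradient step of a minimal click predictor. The preference score
    is sigmoid(w . [user; image]); the update follows the binary
    cross-entropy loss against the historical click label (1 = clicked)."""
    x = np.concatenate([user_vec, image_vec])
    pred = sigmoid(w @ x)            # predicted degree of preference
    grad = (pred - clicked) * x      # d(BCE)/dw for a single example
    return w - lr * grad, pred

rng = np.random.default_rng(4)
w = np.zeros(16)                     # model weights, initialized to zero
u = rng.standard_normal(8)           # user feature vector
im = rng.standard_normal(8)          # image feature vector
for _ in range(50):                  # repeated adjustment on one clicked sample
    w, p = train_step(w, u, im, clicked=1.0)
```

After repeated updates on a clicked sample, the predicted preference for that user-image pair rises above 0.5, mirroring how historical clicks pull the model toward the sample user's preferences.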
- Each sample image includes multiple regions; the image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize one region.
- Predicting the sample user's degree of preference for each sample image based on the sample user's user feature data and the image feature data includes: for each sample image, obtaining N word vectors based on the sample candidate content in that image, each word vector representing a word in the sample candidate content, where N is a positive integer; for each word vector, calculating the respective attention weights of the multiple local feature vectors through a model of the attention mechanism, based on that word vector and the local feature vectors, where the attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by the word vector; fusing each word vector with the multiple local feature vectors according to those attention weights to obtain a first fusion feature vector, each word vector corresponding to one first fusion feature vector; and predicting the sample user's preference for each sample image through the prediction model based on the user feature vector and the N first fusion feature vectors, the input of the prediction model being determined from the user feature vector and the N first fusion feature vectors, where the user feature vector is used to characterize the user feature data of the sample user.
- Predicting the sample user's preference for each sample image based on the user feature vector and the N first fusion feature vectors includes: for each sample image, processing the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fusion feature vector corresponding to one semantically enhanced feature vector; and predicting the sample user's preference for each sample image through the prediction model based on the user feature vector and the N semantically enhanced feature vectors, the input of the prediction model being determined from them.
- Predicting the sample user's preference for each sample image based on the user feature vector and the N semantically enhanced feature vectors includes: for each sample image, fusing the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector; and predicting the sample user's preference for each sample image through the prediction model based on the user feature vector and the second fusion feature vector, the input of the prediction model being determined from these two vectors.
- the image feature data of each sample image includes a global feature vector, and the global feature vector is used to characterize the sample image.
- Predicting the sample user's degree of preference for each sample image based on the sample user's user feature data and the image feature data includes: for each sample image, obtaining a content feature vector based on the sample candidate content in that image, the content feature vector being used to represent the sample candidate content; determining the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fusing the content feature vector and the global feature vector according to those weights to obtain a third fusion feature vector; and predicting the sample user's preference for each sample image through the prediction model based on the user feature vector and the third fusion feature vector, the input of the prediction model being determined from the user feature vector and the third fusion feature vector, where the user feature vector is used to characterize the user feature data of the sample user.
- An embodiment of the present application provides a recommendation apparatus, including: a first image acquisition unit configured to acquire multiple images, each image containing a candidate interface and candidate content presented through that interface; a first feature data acquisition unit configured to acquire the image feature data of each image; a first prediction unit configured to predict, through a prediction model, the target user's preference for each image based on the target user's user feature data and the image feature data, the input of the prediction model being determined from the user feature data and the image feature data; and a recommendation unit configured to select candidate content and/or a candidate interface for recommendation, based on the degree of preference, from the candidate interfaces and candidate content contained in the multiple images.
- each image includes multiple regions; the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
- The first prediction unit is configured to: for each image, obtain N word vectors based on the candidate content in that image, each word vector representing a word in the candidate content, where N is a positive integer; for each word vector, calculate the respective attention weights of the multiple local feature vectors through the model of the attention mechanism, based on that word vector and the local feature vectors, where the attention weight indicates the degree to which the target user pays attention to the region represented by a local feature vector when reading the word represented by the word vector; fuse each word vector with the multiple local feature vectors according to those attention weights to obtain a first fusion feature vector, each word vector corresponding to one first fusion feature vector; and predict the target user's preference for each image through the prediction model based on the user feature vector and the N first fusion feature vectors, the input of the prediction model being determined from them, where the user feature vector is used to represent the user feature data of the target user.
- The first prediction unit is configured to: for each image, process the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fusion feature vector corresponding to one semantically enhanced feature vector; and predict the target user's preference for each image through the prediction model based on the user feature vector and the N semantically enhanced feature vectors, the input of the prediction model being determined from them.
- The first prediction unit is configured to: for each image, fuse the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector; and predict the target user's preference for each image through the prediction model based on the user feature vector and the second fusion feature vector, the input of the prediction model being determined from these two vectors.
- the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
- The first prediction unit is configured to: for each image, obtain a content feature vector based on the candidate content in that image, the content feature vector being used to represent the candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the two vectors; fuse the content feature vector and the global feature vector according to those weights to obtain a third fusion feature vector; and predict the target user's preference for each image through the prediction model based on the user feature vector and the third fusion feature vector, the input of the prediction model being determined from them, where the user feature vector is used to represent the user feature data of the target user.
- The recommendation unit is configured to: select one piece of candidate content from the candidate content contained in the multiple images as the target candidate content based on the degree of preference; and select one candidate interface from the candidate interfaces of the images containing the target candidate content as the target candidate interface based on the degree of preference, so as to recommend the target candidate content through the target candidate interface.
- The apparatus further includes a sending unit configured to send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
- An embodiment of the present application provides a training apparatus, including: a second image acquisition unit configured to acquire a plurality of sample images, each sample image including a sample candidate interface and sample candidate content presented through that interface; a second feature data acquisition unit configured to acquire the image feature data of each sample image; a second prediction unit configured to predict, through a prediction model, the sample user's preference for each sample image based on the sample user's user feature data and the image feature data, the input of the prediction model being determined from the user feature data and the image feature data; and an adjustment unit configured to adjust the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content.
- each sample image includes multiple regions; image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
- The second prediction unit is configured to: for each sample image, obtain N word vectors based on the sample candidate content in that image, each word vector representing a word in the sample candidate content, where N is a positive integer; for each word vector, calculate the respective attention weights of the multiple local feature vectors through the model of the attention mechanism, based on that word vector and the local feature vectors, where the attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by the word vector; fuse each word vector with the multiple local feature vectors according to those attention weights to obtain a first fusion feature vector, each word vector corresponding to one first fusion feature vector; and predict the sample user's preference for each sample image through the prediction model based on the user feature vector and the N first fusion feature vectors, the input of the prediction model being determined from them, where the user feature vector is used to characterize the user feature data of the sample user.
- The second prediction unit is configured to: for each sample image, process the N first fusion feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fusion feature vector corresponding to one semantically enhanced feature vector; and predict the sample user's preference for each sample image through the prediction model based on the user feature vector and the N semantically enhanced feature vectors, the input of the prediction model being determined from them.
- the second prediction unit is used to, for each sample image, fuse the N semantically enhanced feature vectors through the model of the additive attention mechanism to obtain a second fusion feature vector; and predict the sample user's degree of preference for each sample image through the prediction model based on the user feature vector and the second fusion feature vector, where the input of the prediction model is determined based on the user feature vector and the second fusion feature vector.
- the image feature data of each sample image includes a global feature vector, and the global feature vector is used to characterize the sample image.
- the second prediction unit is configured to, for each sample image: obtain a content feature vector based on the sample candidate content in each sample image, the content feature vector being used to characterize the sample candidate content; determine the weight of the content feature vector and the weight of the global feature vector based on the content feature vector and the global feature vector; fuse the content feature vector and the global feature vector based on their respective weights to obtain a third fusion feature vector; and predict the sample user's degree of preference for each sample image through the prediction model based on the user feature vector and the third fusion feature vector.
- the input of the prediction model is determined based on the user feature vector and the third fusion feature vector, and the user feature vector is used to characterize the user feature data of the sample user.
- an embodiment of the present application provides a computer device, including: one or more processors and a memory; wherein, computer-readable instructions are stored in the memory; one or more processors read the computer-readable instructions,
- so that the computer device implements the method in any one of the implementation manners of the first aspect.
- an embodiment of the present application provides a training device, including: one or more processors and a memory; wherein, computer-readable instructions are stored in the memory; one or more processors read the computer-readable instructions,
- so that the training device implements the method in any implementation manner of the second aspect.
- the embodiment of the present application provides a computer-readable storage medium, including computer-readable instructions; when the computer-readable instructions are run on a computer, the computer executes the method in any implementation manner of the first aspect or the second aspect.
- the embodiment of the present application provides a chip, including one or more processors. Some or all of the processors are used to read and execute the computer program stored in the memory, so as to execute the method in any possible implementation manner of the first aspect or the second aspect above.
- the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface.
- the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
- the communication interface may be an input-output interface.
- some of the one or more processors can also implement some steps in the above method through dedicated hardware.
- the processing related to the neural network model can be performed by a dedicated neural network processor or graphics processor.
- the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
- the embodiment of the present application provides a computer program product, the computer program product includes computer software instructions, and the computer software instructions can be loaded by a processor to implement the method in any implementation manner of the first aspect or the second aspect above.
- the embodiment of the present application provides a recommendation system, including a terminal device and a server;
- the server is used to execute the method in any one of the implementation manners in the first aspect
- the terminal device is configured to receive the metadata of the target candidate interface and the target candidate content from the server, display the target candidate interface based on the metadata, and recommend the target candidate content to the target user through the target candidate interface.
- Fig. 1 is a schematic diagram of the structure of the news recommendation system provided by the embodiment of the present application.
- Fig. 2 is a schematic diagram of an embodiment of news
- Fig. 3 is a schematic diagram of the working process of the news recommendation system
- Fig. 4 is a schematic diagram of an embodiment of a training method provided by the embodiment of the present application;
- Fig. 5 is a schematic diagram of the region of the sample image in the embodiment of the present application.
- FIG. 6 is a schematic diagram of a first embodiment of predicting the degree of preference of sample users to each sample image in the embodiment of the present application;
- FIG. 7 is a schematic diagram of a second embodiment of predicting the degree of preference of sample users to each sample image in the embodiment of the present application.
- FIG. 8 is a schematic diagram of the process of obtaining the second fusion feature vector in an embodiment of the present application.
- FIG. 9 is a schematic diagram of a third embodiment of predicting the degree of preference of sample users to each sample image in the embodiment of the present application.
- Fig. 10 is a schematic diagram of the process of obtaining the third fusion feature vector in the embodiment of the present application.
- Fig. 11 is a schematic diagram of an embodiment of a recommendation method provided by the embodiment of the present application.
- Fig. 12 is a schematic diagram of the first embodiment of predicting the degree of preference of target users for each image in the embodiment of the present application;
- FIG. 13 is a schematic diagram of a second embodiment of predicting the degree of preference of target users for each image in the embodiment of the present application.
- FIG. 14 is a schematic diagram of a third embodiment of predicting the degree of preference of target users for each image in the embodiment of the present application.
- FIG. 15 is a schematic diagram of an embodiment of predicting a user's preference for news in an embodiment of the present application.
- FIG. 16 is a schematic diagram of an embodiment of obtaining the best user interface configuration in the embodiment of the present application.
- Fig. 17 is a schematic diagram of an embodiment of a training device provided by the embodiment of the present application.
- Fig. 18 is a schematic diagram of an embodiment of a recommendation device provided by the embodiment of the present application.
- FIG. 19 is a schematic diagram of an embodiment of a computer device provided in an embodiment of the present application.
- plural means two or more.
- the term “and/or” or the character “/” in this application is just an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B, or A/B, which may indicate: A alone exists, both A and B exist, and B exists alone.
- the embodiment of the present application can be applied to the news recommendation system shown in FIG. 1 .
- the news recommendation system includes a terminal device and a server, and the terminal device is connected to the server by communication.
- terminal devices may include mobile phones, tablet computers, desktop computers, vehicle-mounted devices, and other devices that can deploy news applications; hereinafter, terminal devices are referred to as terminals for short.
- the server can be an ordinary server or a cloud server.
- a news application is deployed in a terminal, and a recommendation service is deployed in a server.
- when the user accesses the news application in the terminal, the terminal sends a request to the server to request the recommendation service in the server; after receiving the request, the server starts the recommendation service and selects the news content that the user is interested in from a large amount of news content as the recommended news content; the server then sends the recommended news content to the terminal, and the terminal displays the recommended news content to the user.
- the embodiment of the present application does not specifically limit the news content; for example, as shown in FIG. 2 , the news content may include the title of the news, the author of the news, and the category of the news. In addition, the news content may also include the text of the news.
- the news interface presenting the news content will also affect the click-through rate of the news.
- the layout of the graphics and text in the news interface (including the position of the title and the relative position between the title and the picture), whether there is a picture and the size of the picture, the color of the picture, the clarity of the picture, and the font and its size all leave different visual impressions on users, affect the user's browsing experience, and thus affect the user's click behavior on news.
- the information that leaves a visual impression on the user in the news interface is called visual impression information.
- the visual impression information can be understood as the multi-modal news information displayed on the news interface from the user's perspective. Specifically, it may include the aforementioned graphic and text layout, whether there is a picture and the size of the picture, the color of the picture, the clarity of the picture, the font, the size of the font, and other information.
- the embodiment of this application provides a recommendation method: obtain multiple images, each image containing a candidate interface and candidate content; then, according to the user feature data of the target user and the image feature data of the images, predict the target user's degree of preference for each image through a prediction model; finally, select candidate content and/or a candidate interface from the multiple images according to the degree of preference for recommendation;
- the candidate interface can be a news interface
- the candidate content can be news content.
- the recommendation method can realize the recommendation of news; moreover, in the process of using the recommendation method to recommend news, not only the influence of the news content on the target user is considered, but also the impact of the news interface on the target user, so news of interest (including the news content and the news interface) can be recommended to the target user to further increase the click-through rate of the news.
- the candidate content can be not only news content, but also other content such as short videos and product information; correspondingly, the candidate interface can be not only a news interface, but also an interface for presenting short videos or a product information interface.
- the method provided in the embodiment of the present application is introduced below by taking the candidate content as news content and the candidate interface as a news interface as an example.
- the server can also select the news interface that the user is interested in and send the metadata of the news interface to the terminal; the terminal then displays the news interface based on the metadata, and displays the recommended news content to the user through the news interface.
- the working process of the news recommendation system shown in FIG. 1 can be shown in FIG. 3.
- the server extracts news-related data from the user's behavior log (specifically, it may include browsing news data or clicking on news data), uses news-related data to construct training data, and then performs offline training based on the training data to obtain a prediction model;
- after the server receives the request for the recommendation service, it performs online prediction through the prediction model to obtain the user's degree of preference for multiple news images, and then selects the news content and news interface according to the degree of preference; finally, the terminal displays the news content to the user through the news interface.
- the embodiment of the present application provides an embodiment of a training method, which is usually applied to a server, specifically, this embodiment includes:
- step 101 a plurality of sample images are acquired, and each sample image includes a sample candidate interface and a sample candidate content presented through the sample candidate interface.
- the sample image can be understood as an image that presents the sample candidate content through the sample candidate interface, wherein, the sample candidate interface and the sample candidate content can be understood by referring to the relevant descriptions of the candidate interface and the candidate content above.
- the cases of the plurality of sample images may include the following three.
- the first case is: multiple sample images include one sample candidate interface and multiple sample candidate contents, that is, the sample candidate interfaces in all sample images are the same.
- the second case is that the multiple sample images include multiple sample candidate interfaces and one type of sample candidate content, that is, the sample candidate content in all the sample images is the same.
- the third case is that multiple sample images include multiple sample candidate interfaces and multiple sample candidate contents.
- the sample images containing the same sample candidate content may contain multiple sample candidate interfaces; for example, if there are 10,000 sample images and the 10,000 sample images include 100 sample candidate contents, then the sample images containing the same sample candidate content may include 100 sample candidate interfaces, that is, each sample candidate content can be presented through 100 sample candidate interfaces.
- Step 102 acquiring image feature data of each sample image.
- image feature data may include only global visual impression feature data, may include only local visual impression feature data, or may include both global visual impression feature data and local visual impression feature data.
- each sample image includes multiple regions
- the image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region; in this case, the image feature data may also be referred to as local visual impression feature data.
- sample image can be divided by various methods to obtain multiple regions; for example, based on the foregoing description, a piece of news can include the title of the news, the author of the news, and the category of the news.
- the news can also include a picture part; therefore, the regional coordinates of the above-mentioned parts can be obtained according to the news layout, and then the sample image can be divided into multiple regions according to the regional coordinates.
- the sample image in FIG. 5 can be divided into three areas of news title, news category and news picture by using the above method.
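The division into regions by layout coordinates can be sketched as follows; this is a minimal illustration, and the 64x64 image and the three boxes (title, category, picture) are hypothetical values, not taken from the embodiment:

```python
import numpy as np

def split_into_regions(image, region_coords):
    """Crop an image into regions given (top, left, height, width) boxes.

    `region_coords` is assumed to come from the news layout, e.g. the
    title, category and picture areas; the boxes used here are illustrative.
    """
    return [image[t:t + h, l:l + w] for (t, l, h, w) in region_coords]

# A hypothetical 64x64 RGB sample image and three layout boxes:
# a title strip, a category strip, and a picture area.
image = np.zeros((64, 64, 3), dtype=np.uint8)
coords = [(0, 0, 16, 64), (16, 0, 8, 64), (24, 0, 40, 64)]
regions = split_into_regions(image, coords)
print([r.shape for r in regions])
```

Each cropped region would then be passed to the picture characterizer to obtain its local feature vector.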
- the method for obtaining local feature vectors may specifically include: inputting the images of the multiple regions into a picture characterizer, so as to convert the multiple regions into multiple local feature vectors through the picture characterizer; the picture characterizer can be understood as a model obtained through pre-training, which may be of many types, for example, ResNet101.
- the image feature data of each sample image includes a global feature vector, which is used to characterize the sample image; at this time, the image feature data may also be called global visual impression feature data.
- the method for obtaining the global feature vector may specifically include: inputting the sample image into a picture characterizer, so as to convert the sample image into a global feature vector through the picture characterizer; since the picture characterizer is described above, it will not be described in detail here.
- Step 103 acquiring user feature data of the sample user.
- the embodiment of the present application does not specifically limit the type of feature data of the sample user.
- the feature data of the sample user includes the age information of the sample user, the city where the sample user is located, and the sample user's historical data related to news; the historical data related to news may specifically include the type of news that the sample user browses, the type of news that the sample user clicks on, the time when the sample user clicks on the news, the location when the sample user clicks on the news, and the like.
- the historical data related to the news of the sample user can be obtained from the behavior log of the sample user.
- Step 104 based on the user characteristic data and image characteristic data of the sample user, predict the preference degree of the sample user to each sample image through a prediction model, and the input of the prediction model is determined based on the user characteristic data and image characteristic data.
- the user feature data and image feature data can also be combined with specific environmental information (such as time, date, whether it is a weekend, whether it is a holiday, etc.) to predict the degree of preference of sample users for each sample image through a prediction model.
- the sample user feature data and image feature data can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the image feature data can be processed first to obtain intermediate feature data, and then the sample user feature data and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
- Step 105 adjust the prediction model based on the degree of preference and historical click data of the sample candidate content by the sample user.
- the historical click data of the sample user on the sample candidate content may include whether the sample user clicks on the sample candidate content, and the number of times the sample user clicks on the sample candidate content.
- the sample label can be set according to the historical click data of the sample user on the sample candidate content; for example, for a sample image, if the sample user has clicked on the sample candidate content in the sample image, the sample label of the degree of preference is set to 1, and if the sample user has not clicked on the sample candidate content in the sample image, the sample label of the degree of preference can be set to 0.
- alternatively, if the number of times the sample user clicks on the sample candidate content in the sample image is greater than or equal to a first threshold, the sample label of the degree of preference can be set to 1; if the number of times the sample user clicks on the sample candidate content is less than the first threshold and greater than or equal to a second threshold, the sample label of the degree of preference can be set to 0.5; and if the number of times the sample user clicks on the sample candidate content in the sample image is less than the second threshold, or the sample user has not clicked on the sample candidate content in the sample image, the sample label of the degree of preference can be set to 0.
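The label-setting rule above can be sketched as a small function; the concrete threshold values used here are illustrative assumptions (the embodiment only requires the first threshold to be larger than the second):

```python
def preference_label(click_count, clicked, first_threshold=5, second_threshold=1):
    """Map a sample user's historical click data to a preference label.

    Thresholds are illustrative stand-ins; any pair with
    first_threshold > second_threshold fits the scheme described above.
    """
    if not clicked:
        return 0.0                      # never clicked the candidate content
    if click_count >= first_threshold:
        return 1.0                      # frequent clicks
    if click_count >= second_threshold:
        return 0.5                      # occasional clicks
    return 0.0

print(preference_label(7, True))        # frequent clicker
print(preference_label(2, True))        # occasional clicker
print(preference_label(0, False))       # never clicked
```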
- the loss function can be calculated according to the sample user's degree of preference for the sample image output by the prediction model and the sample label; the weights of the prediction model can then be updated through backpropagation of the loss function, or the structure of the prediction model can be adjusted, so that the degree of preference output by the prediction model is close to the sample label.
- the prediction model trained based on the image feature data of the sample images can consider the impact of the candidate content and the candidate interface on the user at the same time and accurately output the user's degree of preference for the image; this is conducive to recommending content of interest to the user through an interface that the user is interested in, so as to improve the user's click rate on the recommended content.
- the image feature data includes local visual impression feature data
- step 104 includes:
- Step 201 for each sample image, obtain N word vectors based on the sample candidate content in each sample image, each word vector represents a word in the sample candidate content, where N is a positive integer.
- the sample candidate content includes N words; corresponding to each word, a word vector can be generated by using the text characterizer; similar to the picture characterizer, the text characterizer can also be understood as a model obtained through pre-training, and the model can be of many types; for example, the model can be a BERT model.
- sample candidate content is news content
- the title of the news content can better reflect the main information of the news content; therefore, when the sample candidate content is news content, word segmentation can be performed on the title of the news content to obtain N words, and then N word vectors representing the N words are obtained through the text characterizer.
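As a rough illustration of turning a title into N word vectors: a real implementation would use a pre-trained text characterizer such as a BERT model, while the hash-seeded lookup below is only a hypothetical stand-in to show the shapes involved:

```python
import numpy as np

def title_to_word_vectors(title, embedding_table, dim=4):
    """Segment a news title into words and look up one vector per word.

    The embedding table here is filled with deterministic random vectors
    keyed by the word; this is an illustrative stand-in for a pre-trained
    text characterizer, not the embodiment's actual model.
    """
    words = title.lower().split()
    vectors = []
    for word in words:
        if word not in embedding_table:
            rng = np.random.default_rng(abs(hash(word)) % (2 ** 32))
            embedding_table[word] = rng.normal(size=dim)
        vectors.append(embedding_table[word])
    return words, np.stack(vectors)

words, vecs = title_to_word_vectors("United States election news", {})
print(len(words), vecs.shape)
```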
- Step 202 for each word vector, based on each word vector and multiple local feature vectors, and through the model of the attention mechanism, calculate the respective attention weights of multiple local feature vectors.
- the attention weight indicates the degree to which the sample user pays attention to the region represented by a local feature vector when reading the word represented by each word vector.
- the attention mechanism is a mechanism that dynamically controls the degree of attention paid to each part, or a certain part, of a neural network model by calculating the attention weight of each part and merging them into an attention vector.
- focused attention refers to attention that has a predetermined purpose, depends on the task, and actively and consciously focuses on an object; the other kind is bottom-up unconscious attention, called saliency-based attention.
- the attention mechanism also includes the following variants: multi-head attention mechanism, hard attention mechanism, key-value pair attention mechanism and structured attention mechanism.
- the multi-head attention mechanism uses multiple queries to calculate and select multiple information from the input information in parallel, and each attention focuses on different parts of the input information.
- using $o_j$ to represent the j-th local feature vector and $w_i$ to represent the i-th word vector, the attention weights of the multiple local feature vectors for the word vector $w_i$ can be calculated by the formula $\alpha_{i,j} = \frac{\exp\left(q_m(w_i)^{\top} k_m(o_j)\right)}{\sum_{j'=1}^{K_1} \exp\left(q_m(w_i)^{\top} k_m(o_{j'})\right)}$, where $\alpha_{i,j}$ indicates the attention weight, $q_m(\cdot)$ and $k_m(\cdot)$ represent linear transformations with a bias term, and $K_1$ represents the number of local feature vectors.
- the sample image in Fig. 5 is divided into three regions: the title of the news, the category of the news, and the picture of the news.
- the local feature vectors representing these three regions can be obtained; taking the word "states" as an example, for the word vector representing the word "states", the attention weights of the local feature vectors of the three regions respectively represent the degree to which the sample user pays attention to the three regions when paying attention to the word "states".
- Step 203 based on the respective attention weights of the multiple local feature vectors, each word vector is fused with multiple local feature vectors to obtain a first fused feature vector, and each word vector corresponds to a first fused feature vector.
- the multiple local feature vectors may be weighted by their respective attention weights, and then the weighted result is added to the word vector to obtain the first fusion feature vector.
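Steps 202 and 203 together can be sketched with single-head dot-product attention; `Wq`/`bq` and `Wk`/`bk` are illustrative stand-ins for the biased linear maps q_m(·) and k_m(·), and all dimensions and values are made up for the sketch:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def fuse_word_with_regions(w, local_vecs, Wq, bq, Wk, bk):
    """Step 202: attention weights of the local feature vectors for one
    word vector; Step 203: weighted sum of the local vectors added to the
    word vector, giving the first fusion feature vector."""
    q = Wq @ w + bq                     # project the word vector (query)
    keys = local_vecs @ Wk.T + bk       # project each local vector (keys)
    attn = softmax(keys @ q)            # one attention weight per region
    fused = w + attn @ local_vecs       # first fusion feature vector
    return attn, fused

rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)                  # one word vector
local_vecs = rng.normal(size=(3, d))    # three region (local) vectors
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
bq, bk = rng.normal(size=d), rng.normal(size=d)
attn, fused = fuse_word_with_regions(w, local_vecs, Wq, bq, Wk, bk)
print(attn.shape, fused.shape)
```

Repeating this for each of the N word vectors yields the N first fusion feature vectors.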
- Step 204 based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, predict the sample user's degree of preference for each sample image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector is used to characterize the user feature data of the sample user.
- the user feature vector and the N first fusion feature vectors can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the first fused feature vectors can be processed first to obtain intermediate feature data, and then the user feature vector and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
- step 204 includes:
- Step 301 for each sample image, process the N first fused feature vectors corresponding to the N word vectors through the model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to a semantically enhanced feature vector.
- the self-attention mechanism is a mechanism improved from the attention mechanism, which reduces the dependence on external information and is better at capturing the internal correlation of data or features.
- the self-attention mechanism can be used to better analyze the correlation between the N first fused feature vectors; correspondingly, the attention mechanism is used to capture correlation outside the data.
- since the attention mechanism is used to process word vectors and multiple local feature vectors, and the image region represented by a local feature vector is external relative to the word represented by a word vector, the embodiment of the present application uses the attention mechanism to capture the correlation between the word represented by the word vector and the image region represented by the local feature vector.
- the self-attention mechanism includes a single-head self-attention mechanism and a multi-head self-attention mechanism.
- the N first fused feature vectors are obtained from the N word vectors, and there is a semantic relationship between the N word vectors; correspondingly, there is also a semantic relationship between the N first fused feature vectors; therefore, in this embodiment, semantic enhancement processing is performed on the N first fused feature vectors through the model of the self-attention mechanism.
- the process of processing the N first fusion feature vectors through the model of the self-attention mechanism may include: using the formulas $\beta_{i,j} = \frac{\exp\left(q(f_i)^{\top} k(f_j)\right)}{\sum_{j'=1}^{K_2} \exp\left(q(f_i)^{\top} k(f_{j'})\right)}$ and $g_i = \sum_{j=1}^{K_2} \beta_{i,j}\, f_j$ to process the N first fused feature vectors $f_1, \ldots, f_N$, where $q(\cdot)$ and $k(\cdot)$ represent linear transformations, $\beta_{i,j}$ indicates the degree of semantic enhancement of the j-th first fused feature vector $f_j$ for the i-th first fused feature vector $f_i$, and $K_2$ represents the number of first fused feature vectors.
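The semantic enhancement step can be sketched as single-head self-attention over the N first fused feature vectors; `Wq` and `Wk` below are illustrative stand-ins for the linear maps q(·) and k(·):

```python
import numpy as np

def row_softmax(x):
    # Numerically stable softmax over the last axis (one row per vector).
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def semantic_enhance(fused_vecs, Wq, Wk):
    """Single-head self-attention: pairwise relevance between the N first
    fused feature vectors, then an attention-weighted sum per row gives
    the N semantically enhanced feature vectors."""
    scores = (fused_vecs @ Wq.T) @ (fused_vecs @ Wk.T).T
    weights = row_softmax(scores)
    return weights @ fused_vecs

rng = np.random.default_rng(1)
N, d = 5, 4
fused_vecs = rng.normal(size=(N, d))    # N first fused feature vectors
Wq = rng.normal(size=(d, d))            # stand-in for q(.)
Wk = rng.normal(size=(d, d))            # stand-in for k(.)
enhanced = semantic_enhance(fused_vecs, Wq, Wk)
print(enhanced.shape)
```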
- Step 302 based on the user feature vector and the N semantic enhancement feature vectors, predict the sample user's degree of preference for each sample image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N semantic enhancement feature vectors.
- the user feature vector and the N semantic enhancement feature vectors can be directly input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model; alternatively, the semantic enhancement feature vectors can be processed first to obtain intermediate feature data, and then the user feature vector and the intermediate feature data are input into the prediction model to obtain the sample user's degree of preference for the sample image output by the prediction model.
- step 302 includes:
- the N semantically enhanced feature vectors are fused by the model of the additive attention mechanism to obtain the second fused feature vector;
- the preference degree of the sample user to each sample image is predicted through the prediction model, and the input of the prediction model is determined based on the user feature vector and the second fusion feature vector.
- the fusion of the N semantically enhanced feature vectors through the model of the additive attention mechanism includes: using the formulas $\gamma_i = \frac{\exp\left(q_a^{\top} \tanh(k_a(g_i))\right)}{\sum_{i'=1}^{K_3} \exp\left(q_a^{\top} \tanh(k_a(g_{i'}))\right)}$ and $e_1 = \sum_{i=1}^{K_3} \gamma_i\, g_i$ to fuse the N semantically enhanced feature vectors $g_1, \ldots, g_N$, where $k_a$ is used to convert $g_i$ into a hidden space vector, $q_a$ is used to calculate the attention weight in the fusion process, $\gamma_i$ indicates the attention weight of the i-th semantically enhanced feature vector, $e_1$ indicates the second fusion feature vector, and $K_3$ represents the number of semantically enhanced feature vectors.
- the process of obtaining the second fusion feature vector can be summarized as follows: take the word vectors and local feature vectors as input, and apply the attention mechanism, the self-attention mechanism and the additive attention mechanism in turn to output the second fusion feature vector.
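The additive-attention fusion can be sketched as follows; `Wa`/`ba` stand in for the hidden-space map k_a and `qa` for the query vector q_a, with all shapes chosen arbitrarily for illustration:

```python
import numpy as np

def additive_pool(enhanced_vecs, Wa, ba, qa):
    """Additive attention: map each semantically enhanced vector into a
    hidden space (k_a), score it against a query vector (q_a), and take
    the attention-weighted sum as the second fusion feature vector."""
    hidden = np.tanh(enhanced_vecs @ Wa.T + ba)  # hidden space vectors
    scores = hidden @ qa                          # one score per vector
    scores = scores - scores.max()                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ enhanced_vecs                # second fusion feature vector

rng = np.random.default_rng(2)
N, d, h = 5, 4, 8
enhanced_vecs = rng.normal(size=(N, d))  # N semantically enhanced vectors
Wa = rng.normal(size=(h, d))             # stand-in for k_a
ba = rng.normal(size=h)
qa = rng.normal(size=h)                  # stand-in for q_a
e1 = additive_pool(enhanced_vecs, Wa, ba, qa)
print(e1.shape)
```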
- the situation that the image characteristic data includes the local visual impression characteristic data is introduced above, and the situation that the image characteristic data includes the global visual impression characteristic data is introduced below.
- step 104 includes:
- Step 401 for each sample image, obtain a content feature vector based on the sample candidate content in each sample image, and the content feature vector is used to characterize the sample candidate content.
- text characterizers can also be used to convert sample candidate content into content feature vectors.
- the title of the news content can better reflect the main information of the news content; therefore, when the sample candidate content is news content, the title of the news content can be converted into a title feature vector, and use the title feature vector as a content feature vector representing the content of the sample candidate.
- Step 402 based on the content feature vector and the global feature vector, determine the weight of the content feature vector and the weight of the global feature vector.
- a threshold addition network can be used to adaptively control the respective weights of content feature vectors and global feature vectors.
- Step 403 based on the weight of the content feature vector and the weight of the global feature vector, the content feature vector and the global feature vector are fused to obtain a third fused feature vector.
- the process of obtaining the third fusion feature vector can be summarized as follows: the content feature vector and the global feature vector are used as input, and the threshold addition network is used to output the third fusion feature vector.
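One plausible reading of the threshold addition network is a sigmoid gate computed from the concatenated vectors; this is an assumption rather than the embodiment's exact network, and `Wg`/`bg` are hypothetical gate parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(content_vec, global_vec, Wg, bg):
    """Assumed sketch of the threshold addition network: a gate derived
    from both inputs weights the content feature vector, the global
    feature vector receives the complementary weight, and the weighted
    sum is the third fusion feature vector."""
    gate = sigmoid(Wg @ np.concatenate([content_vec, global_vec]) + bg)
    return gate * content_vec + (1.0 - gate) * global_vec

rng = np.random.default_rng(3)
d = 4
content_vec = rng.normal(size=d)         # characterizes the candidate content
global_vec = rng.normal(size=d)          # characterizes the whole image
Wg = rng.normal(size=(d, 2 * d))         # hypothetical gate parameters
bg = rng.normal(size=d)
e3 = gated_fusion(content_vec, global_vec, Wg, bg)
print(e3.shape)
```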
- Step 404 based on the user feature vector and the third fusion feature vector, predict the sample user's degree of preference for each sample image through the prediction model; the input of the prediction model is determined based on the user feature vector and the third fusion feature vector, and the user feature vector is used to characterize the user feature data of the sample user.
- the user feature vector and the third fusion feature vector can be directly input into the prediction model, so as to realize the prediction of the degree of preference of each sample image.
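A minimal sketch of step 404: the patent does not fix the prediction model's architecture, so the two-layer scorer below (concatenated inputs, ReLU hidden layer, sigmoid output) is purely an illustrative stand-in with assumed sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
user_vec = rng.normal(size=d)      # user feature vector
third_fused = rng.normal(size=d)   # third fused feature vector from the gated fusion

# Illustrative two-layer MLP scorer (architecture and sizes are assumptions).
W1 = rng.normal(size=(16, 2 * d)) * 0.1
W2 = rng.normal(size=16) * 0.1

h = np.maximum(0.0, W1 @ np.concatenate([user_vec, third_fused]))  # ReLU hidden layer
score = float(W2 @ h)                          # unnormalized preference score
preference = 1.0 / (1.0 + np.exp(-score))      # degree of preference in (0, 1)
print(type(preference))
```

During training (the adjustment in the later training embodiment), such a score would be compared against historical click data to update the parameters.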
- An embodiment of the present application provides a recommendation method, which can be applied to a server or a terminal. Specifically, this embodiment includes:
- Step 501: acquire multiple images, each image containing a candidate interface and a candidate content presented through the candidate interface.
- Step 502: acquire image feature data of each image.
- Each image includes multiple regions; correspondingly, the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to characterize one region.
- Alternatively, the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
- Step 503: acquire user feature data of the target user.
- Step 504: based on the user feature data of the target user and the image feature data, predict the target user's degree of preference for each image through a prediction model; the input of the prediction model is determined based on the user feature data and the image feature data.
- When the image feature data of each image includes multiple local feature vectors, step 504 includes:
- Step 601: for each image, obtain N word vectors based on the candidate content in the image, where each word vector represents a word in the candidate content and N is a positive integer.
- Step 602: for each word vector, based on the word vector and the multiple local feature vectors, calculate the respective attention weights of the multiple local feature vectors through a model of the attention mechanism.
- The attention weight indicates the extent to which the target user attends to the region represented by a local feature vector when reading the word represented by the word vector.
- Step 603: based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector; each word vector yields one first fused feature vector.
- Step 604: based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector is used to characterize the user feature data of the target user.
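Steps 601 to 603 can be sketched as attention from each word vector over the region (local) feature vectors. The scaled dot-product scoring and the additive fusion below are illustrative assumptions; the application leaves the exact attention model and fusion operation open:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d, N, R = 8, 5, 4                      # dim, N word vectors, R region feature vectors
word_vecs = rng.normal(size=(N, d))    # one vector per word of the candidate content
local_vecs = rng.normal(size=(R, d))   # one vector per image region

# Step 602 analogue: for each word, score every region and normalize the scores.
scores = word_vecs @ local_vecs.T / np.sqrt(d)    # (N, R)
attn = np.apply_along_axis(softmax, 1, scores)    # each row sums to 1

# Step 603 analogue: fuse each word vector with its attention-weighted regions;
# simple addition is one possible fusion, used here for illustration.
first_fused = word_vecs + attn @ local_vecs       # (N, d), one per word vector
print(first_fused.shape)
```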
- step 604 includes:
- Step 701: for each image, process the N first fused feature vectors corresponding to the N word vectors through a model of the self-attention mechanism to obtain N semantically enhanced feature vectors; each first fused feature vector corresponds to one semantically enhanced feature vector.
- Step 702: based on the user feature vector and the N semantically enhanced feature vectors, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the N semantically enhanced feature vectors.
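The self-attention processing in step 701 can be sketched as a single attention head over the N first fused feature vectors. The projection matrices and single-head form are illustrative assumptions rather than the application's fixed design:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
d, N = 8, 5
first_fused = rng.normal(size=(N, d))   # N first fused feature vectors

# Single-head self-attention with learned projections (sizes assumed).
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Q, K, V = first_fused @ Wq, first_fused @ Wk, first_fused @ Wv

attn = softmax_rows(Q @ K.T / np.sqrt(d))   # (N, N) pairwise attention weights
enhanced = attn @ V                          # N semantically enhanced feature vectors
print(enhanced.shape)
```

Each enhanced vector mixes information from all N fused vectors, which is what "semantic enhancement" refers to here.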
- step 702 includes:
- For each image, the N semantically enhanced feature vectors are fused through a model of the additive attention mechanism to obtain a second fused feature vector;
- the input of the prediction model is then determined based on the user feature vector and the second fused feature vector.
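The additive attention fusion can be sketched as Bahdanau-style pooling: a learned query scores each semantically enhanced vector, and the softmax-weighted sum yields the second fused feature vector. The parameterization below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 8, 5
enhanced = rng.normal(size=(N, d))   # N semantically enhanced feature vectors

# Additive attention parameters (assumed sizes, randomly initialized here).
W = rng.normal(size=(d, d)) * 0.1
q = rng.normal(size=d)               # learned query vector

scores = np.tanh(enhanced @ W.T) @ q      # one additive-attention score per vector
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # normalized attention weights

second_fused = weights @ enhanced         # (d,) single pooled vector
print(second_fused.shape)
```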
- When the image feature data of each image includes a global feature vector, step 504 includes:
- Step 801: for each image, obtain a content feature vector based on the candidate content in the image; the content feature vector is used to characterize the candidate content.
- Step 802: based on the content feature vector and the global feature vector, determine the weight of the content feature vector and the weight of the global feature vector.
- Step 803: based on the weight of the content feature vector and the weight of the global feature vector, fuse the content feature vector and the global feature vector to obtain a third fused feature vector.
- Step 804: based on the user feature vector and the third fused feature vector, predict the target user's degree of preference for each image through the prediction model; the input of the prediction model is determined based on the user feature vector and the third fused vector, and the user feature vector is used to characterize the user feature data of the target user.
- Steps 501 to 504 are similar to steps 101 to 104; for details, refer to the relevant descriptions of steps 101 to 104 above.
- Step 505: based on the degree of preference, select candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, for recommendation.
- Based on the degree of preference, it is possible to select only candidate content from the multiple images for recommendation, to select only a candidate interface from the multiple images for recommendation, or to select both candidate content and a candidate interface from the multiple images at the same time for recommendation; each case is described in detail below.
- [Figure: recommendation pipeline — user logs are used to obtain the user click history; news material and the news interface are used to obtain news visual impressions; the data then flows through a data preprocessing module, a local impression module, a global impression module, and a model prediction module.]
- Here the degree of preference specifically refers to the user's degree of preference for the news content in an image (i.e., the candidate content). Finally, the multiple images are sorted from high to low by degree of preference, and the news content of the top M images is selected and recommended to the target user.
- The data preprocessing module is used to perform steps 502 and 503; the local impression module is used to perform the fusion operations in steps 603, 701, and 702 to obtain the second fused feature vector; the global impression module is used to perform steps 802 and 803; and the model prediction module is used to perform the prediction operations in steps 702 and 804.
- The current user's user-side features (i.e., the user feature data), together with the news materials and news interfaces that are used to obtain multiple news-interface combination candidates (i.e., the multiple images described above), are processed by the data preprocessing module, local impression module, global impression module, model prediction module, and interface generation module to obtain the user's degree of preference; here the degree of preference specifically refers to the user's degree of preference for the user interface in an image (i.e., the candidate interface). Finally, the multiple images are sorted from high to low by degree of preference, the user interface of the image with the highest degree of preference (i.e., the best user interface) is selected, and the best user-interface configuration is generated; after that, the best user interface can be displayed based on this configuration, and various kinds of content can be recommended to the current user through it.
- The data preprocessing module is used to perform steps 502 and 503; the local impression module is used to perform the fusion operations in steps 603, 701, and 702 to obtain the second fused feature vector; the global impression module is used to perform steps 802 and 803; the model prediction module is used to perform the prediction operations in steps 702 and 804; and the interface generation module is used to generate the best user interface according to the results predicted by the model prediction module.
- step 505 includes:
- Based on the degree of preference, a candidate interface is selected as the target candidate interface from the candidate interfaces of the images containing the target candidate content, so that the target candidate content is recommended through the target candidate interface.
- Multiple kinds of candidate content may be selected and recommended to the target user; the target candidate content is one of the selected candidate contents.
- For example, suppose the number of images is four: the first image contains candidate content A and candidate interface A, the second contains candidate content A and candidate interface B, the third contains candidate content B and candidate interface A, and the fourth contains candidate content B and candidate interface B. The target user's preference for these four images, from high to low, is: the first image, the second image, the fourth image, the third image.
- If candidate content A is the target candidate content, the target candidate interface is selected from the candidate interfaces of the first image and the second image; since the preference for the first image is higher than that for the second image, candidate interface A in the first image is selected as the target candidate interface, and candidate content A is then recommended to the target user through candidate interface A.
- Similarly, if candidate content B is the target candidate content, the target candidate interface is selected from the candidate interfaces of the fourth image and the third image; since the target user's preference for the fourth image is higher than that for the third image, candidate interface B in the fourth image is selected as the target candidate interface, and candidate content B is then recommended to the target user through candidate interface B.
- For different target candidate contents, the selected target candidate interfaces may therefore differ.
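The two-stage selection in the four-image example above can be sketched directly; the preference scores below are made up for illustration, chosen so the ranking matches the example (first > second > fourth > third):

```python
# Each image pairs one candidate content with one candidate interface
# and carries a predicted degree of preference (illustrative numbers).
images = [
    {"content": "A", "interface": "A", "pref": 0.9},  # first image
    {"content": "A", "interface": "B", "pref": 0.7},  # second image
    {"content": "B", "interface": "A", "pref": 0.2},  # third image
    {"content": "B", "interface": "B", "pref": 0.5},  # fourth image
]

# Stage 1: pick the target candidate content from the best-scoring image.
best_image = max(images, key=lambda im: im["pref"])
target_content = best_image["content"]

# Stage 2: among images carrying that content, pick the best interface.
target_interface = max(
    (im for im in images if im["content"] == target_content),
    key=lambda im: im["pref"],
)["interface"]

print(target_content, target_interface)   # A A
```

With these scores, content A wins stage 1, and interface A wins stage 2 among the two images that carry content A, matching the example in the text.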
- Step 506: send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
- When the above method is executed by a server, the server sends the metadata of the target candidate interface and the target candidate content to the terminal device; correspondingly, the terminal device receives the metadata and the target candidate content, displays the target candidate interface based on the metadata, and recommends the target candidate content to the target user through the target candidate interface.
- An embodiment of the present application provides a recommendation apparatus, including: a first image acquisition unit 601, configured to acquire multiple images, each image containing a candidate interface and candidate content presented through the candidate interface; a first feature data acquisition unit 602, configured to acquire image feature data of each image; a first prediction unit 603, configured to predict, through a prediction model and based on the user feature data of the target user and the image feature data, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature data and the image feature data; and a recommending unit 604, configured to select, based on the degree of preference, candidate content and/or a candidate interface from the candidate interfaces and candidate content contained in the multiple images, for recommendation.
- each image includes multiple regions; the image feature data of each image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
- The first prediction unit 603 is configured to: for each image, obtain N word vectors based on the candidate content in the image, each word vector representing a word in the candidate content, where N is a positive integer; and, for each word vector, calculate the respective attention weights of the multiple local feature vectors through a model of the attention mechanism, based on the word vector and the multiple local feature vectors.
- The attention weight indicates the extent to which the target user attends to the region represented by a local feature vector when reading the word represented by the word vector. Based on the respective attention weights of the multiple local feature vectors, each word vector is fused with the multiple local feature vectors to obtain a first fused feature vector, each word vector corresponding to one first fused feature vector. Based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, the target user's degree of preference for each image is predicted through the prediction model; the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector is used to characterize the user feature data of the target user.
- The first prediction unit 603 is configured to: for each image, process the N first fused feature vectors corresponding to the N word vectors through a model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector; and predict, through the prediction model and based on the user feature vector and the N semantically enhanced feature vectors, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the N semantically enhanced feature vectors.
- The first prediction unit 603 being configured to predict, through a prediction model and based on the user feature vector and the N semantically enhanced feature vectors, the target user's degree of preference for each image includes: for each image, fusing the N semantically enhanced feature vectors through a model of the additive attention mechanism to obtain a second fused feature vector; and predicting, through the prediction model and based on the user feature vector and the second fused feature vector, the target user's degree of preference for each image, the input of the prediction model being determined based on the user feature vector and the second fused feature vector.
- the image feature data of each image includes a global feature vector, and the global feature vector is used to characterize the image.
- The first prediction unit 603 is configured to: for each image, obtain a content feature vector based on the candidate content in the image, the content feature vector being used to characterize the candidate content; determine, based on the content feature vector and the global feature vector, the weight of the content feature vector and the weight of the global feature vector; fuse the content feature vector and the global feature vector based on those weights to obtain a third fused feature vector; and predict, through the prediction model and based on the user feature vector and the third fused feature vector, the target user's degree of preference for each image.
- The input of the prediction model is determined based on the user feature vector and the third fused vector, and the user feature vector is used to characterize the user feature data of the target user.
- The recommending unit 604 is configured to select, based on the degree of preference, one candidate content as the target candidate content from the candidate content contained in the multiple images, and to select one candidate interface as the target candidate interface from the candidate interfaces of the images containing the target candidate content, so as to recommend the target candidate content through the target candidate interface.
- The apparatus further includes a sending unit 605, configured to send the metadata of the target candidate interface and the target candidate content to the terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
- An embodiment of the present application provides a training apparatus, including: a second image acquisition unit 701, configured to acquire multiple sample images, each sample image containing a sample candidate interface and sample candidate content presented through the sample candidate interface; a second feature data acquisition unit 702, configured to acquire image feature data of each sample image; a second prediction unit 703, configured to predict, through a prediction model and based on the user feature data of the sample user and the image feature data, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature data and the image feature data; and an adjustment unit 704, configured to adjust the prediction model based on the degree of preference and the sample user's historical click data on the sample candidate content.
- each sample image includes multiple regions; image feature data of each sample image includes multiple local feature vectors, and each local feature vector is used to characterize a region.
- The second prediction unit 703 is configured to: for each sample image, obtain N word vectors based on the sample candidate content in the sample image, each word vector representing a word in the sample candidate content, where N is a positive integer; for each word vector, calculate the respective attention weights of the multiple local feature vectors through a model of the attention mechanism, based on the word vector and the multiple local feature vectors, the attention weight indicating the extent to which the sample user attends to the region represented by a local feature vector when reading the word represented by the word vector; based on the respective attention weights of the multiple local feature vectors, fuse each word vector with the multiple local feature vectors to obtain a first fused feature vector, each word vector corresponding to one first fused feature vector; and predict, through the prediction model and based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the N first fused feature vectors.
- The second prediction unit 703 is configured to: for each sample image, process the N first fused feature vectors corresponding to the N word vectors through a model of the self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector; and predict, through the prediction model and based on the user feature vector and the N semantically enhanced feature vectors, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the N semantically enhanced feature vectors.
- The second prediction unit 703 is configured to: for each sample image, fuse the N semantically enhanced feature vectors through a model of the additive attention mechanism to obtain a second fused feature vector; and predict, through the prediction model and based on the user feature vector and the second fused feature vector, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the second fused feature vector.
- the image feature data of each sample image includes a global feature vector, and the global feature vector is used to characterize the sample image.
- The second prediction unit 703 is configured to: for each sample image, obtain a content feature vector based on the sample candidate content in the sample image, the content feature vector being used to characterize the sample candidate content; determine, based on the content feature vector and the global feature vector, the weight of the content feature vector and the weight of the global feature vector; fuse the content feature vector and the global feature vector based on those weights to obtain a third fused feature vector; and predict, through the prediction model and based on the user feature vector and the third fused feature vector, the sample user's degree of preference for each sample image, the input of the prediction model being determined based on the user feature vector and the third fused vector, the user feature vector being used to characterize the user feature data of the sample user.
- the embodiment of the present application also provides an embodiment of a computer device.
- the computer device may be a terminal or a server.
- the computer device may be used as a training device.
- FIG. 19 is a schematic structural diagram of a computer device provided by an embodiment of the present application, which is used to realize the function of the recommendation device in the embodiment corresponding to FIG. 17 or the function of the training device in the embodiment corresponding to FIG. 18.
- The computer device 1800 is implemented by one or more servers. The computer device 1800 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1822 (for example, one or more processors), memory 1832, and one or more storage media 1830 (such as one or more mass storage devices) storing application programs 1842 or data 1844.
- the memory 1832 and the storage medium 1830 may be temporary storage or persistent storage.
- the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device. Furthermore, the central processing unit 1822 may be configured to communicate with the storage medium 1830 , and execute a series of instruction operations in the storage medium 1830 on the computer device 1800 .
- Computer device 1800 can also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input and output interfaces 1858, and/or, one or more operating systems 1841, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
- The central processing unit 1822 may be used to execute the recommendation method performed by the recommendation apparatus in the embodiment corresponding to FIG. 17.
- the central processing unit 1822 can be used for:
- each image contains a candidate interface and a candidate content presented through the candidate interface
- the input of the prediction model is determined based on the user characteristic data and image characteristic data;
- the candidate content and/or the candidate interface are selected from the candidate interfaces and candidate contents included in the multiple images for recommendation.
- the central processing unit 1822 may be used to execute the model training method executed by the training device in the embodiment corresponding to FIG. 18 .
- the central processing unit 1822 can be used for:
- each sample image includes a sample candidate interface and a sample candidate content presented through the sample candidate interface;
- the input of the prediction model is determined based on the user characteristic data and image characteristic data;
- the prediction model is adjusted.
- the embodiment of the present application also provides a chip, including one or more processors. Part or all of the processor is used to read and execute the computer program stored in the memory, so as to execute the methods of the foregoing embodiments.
- Optionally, the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface.
- the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
- the communication interface may be an input-output interface.
- Some of the one or more processors may implement some of the steps of the above methods through dedicated hardware; for example, processing related to the neural network model may be performed by a dedicated neural network processor or graphics processor.
- the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
- An embodiment of the present application also provides a computer storage medium, used to store the computer software instructions for the above computer device, including a program designed for execution by the computer device.
- the computer equipment may be the recommending device in the embodiment corresponding to FIG. 17 or the training device in the embodiment corresponding to FIG. 18 .
- the embodiment of the present application also provides a computer program product, the computer program product includes computer software instructions, and the computer software instructions can be loaded by a processor to implement the procedures in the methods shown in the foregoing embodiments.
- the disclosed system, device and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- The division of the units is only a logical functional division; in actual implementation, there may be other division methods.
- Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
- Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (23)
- 一种推荐方法,其特征在于,包括:获取多张图像,每张所述图像包含一个候选界面和通过所述候选界面呈现的一种候选内容;获取每张所述图像的图像特征数据;基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征数据和所述图像特征数据确定的;基于所述偏好程度从所述多张图像包含的所述候选界面和所述候选内容中,选择候选内容和/或候选界面,以进行推荐。
- 根据权利要求1所述的方法,其特征在于,每张所述图像包括多个区域;每张所述图像的图像特征数据包括多个局部特征向量,每个所述局部特征向量用于表征一个所述区域。
- 根据权利要求2所述的方法,其特征在于,所述基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:对于每张所述图像,基于每张所述图像中的所述候选内容获取N个词向量,每个所述词向量表征所述候选内容中的一个词语,其中,N为正整数;对于每个所述词向量,基于每个所述词向量和所述多个局部特征向量,并通过注意力机制的模型计算所述多个局部特征向量各自的注意力权重,所述注意力权重表示所述目标用户在阅读每个所述词向量表征的词语时,关注所述局部特征向量表征的区域的程度;基于所述多个局部特征向量各自的注意力权重,将每个所述词向量和所述多个局部特征向量融合,以得到第一融合特征向量,每个所述词向量对应得到一个所述第一融合特征向量;基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和N个所述第一融合特征向量确定的,所述用户特征向量用于表征目标用户的用户特征数据。
- 根据权利要求3所述的方法,其特征在于,所述基于所述用户特征向量和所述N个词向量对应的N个所述第一融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:对于每张所述图像,通过自注意力机制的模型对所述N个词向量对应的N个所述第一融合特征向量进行处理,以得到N个语义增强特征向量,每个所述第一融合特征向量对应一个语义增强特征向量;基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述目标 用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述N个语义增强特征向量确定的。
- 根据权利要求4所述的方法,其特征在于,所述基于所述用户特征向量和所述N个语义增强特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:对于每张所述图像,通过加法注意力机制的模型将所述N个语义增强特征向量融合,以得到第二融合特征向量;基于所述用户特征向量和所述第二融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第二融合特征向量确定的。
- 根据权利要求1所述的方法,其特征在于,每张所述图像的图像特征数据包括全局特征向量,所述全局特征向量用于表征所述图像。
- 根据权利要求6所述的方法,其特征在于,所述基于目标用户的用户特征数据和所述图像特征数据,并通过预测模型预测所述目标用户对每张所述图像的偏好程度包括:对于每张所述图像,基于每张所述图像中的所述候选内容获取内容特征向量,所述内容特征向量用于表征所述候选内容;基于所述内容特征向量和所述全局特征向量,确定所述内容特征向量的权重和所述全局特征向量的权重;基于所述内容特征向量的权重和所述全局特征向量的权重,将所述内容特征向量和所述全局特征向量融合,以得到第三融合特征向量;基于所述用户特征向量和所述第三融合特征向量,并通过预测模型预测所述目标用户对每张所述图像的偏好程度,所述预测模型的输入是基于所述用户特征向量和所述第三融合向量确定的,所述用户特征向量用于表征目标用户的用户特征数据。
- The method according to any one of claims 1 to 7, characterized in that the selecting candidate content and/or a candidate interface, based on the degree of preference, from the candidate interfaces and the candidate content contained in the plurality of images, for recommendation comprises: selecting, based on the degree of preference, one type of candidate content from the candidate content contained in the plurality of images as target candidate content; and selecting, based on the degree of preference, one candidate interface as a target candidate interface from the candidate interfaces of the images containing the target candidate content, so as to recommend the target candidate content through the target candidate interface.
- The method according to claim 8, characterized in that, after the selecting one candidate interface as a target candidate interface, based on the degree of preference, from the candidate interfaces of the images containing the target candidate content, the method further comprises: sending metadata of the target candidate interface and the target candidate content to a terminal device, so that the terminal device displays the target candidate interface based on the metadata and recommends the target candidate content to the target user through the target candidate interface.
- A training method, characterized by comprising: obtaining a plurality of sample images, each sample image containing one sample candidate interface and one type of sample candidate content presented through the sample candidate interface; obtaining image feature data of each sample image; predicting, based on user feature data of a sample user and the image feature data, the sample user's degree of preference for each sample image through a prediction model, wherein an input of the prediction model is determined based on the user feature data and the image feature data; and adjusting the prediction model based on the degree of preference and historical click data of the sample user on the sample candidate content.
- The method according to claim 10, characterized in that each sample image comprises a plurality of regions, and the image feature data of each sample image comprises a plurality of local feature vectors, each local feature vector representing one of the regions.
- The method according to claim 11, characterized in that the predicting, based on user feature data of a sample user and the image feature data, the sample user's degree of preference for each sample image through a prediction model comprises: for each sample image, obtaining N word vectors based on the sample candidate content in the sample image, each word vector representing one word in the sample candidate content, where N is a positive integer; for each word vector, computing respective attention weights of the plurality of local feature vectors, based on the word vector and the plurality of local feature vectors, through a model with an attention mechanism, wherein an attention weight indicates the degree to which the sample user focuses on the region represented by a local feature vector while reading the word represented by the word vector; fusing, based on the respective attention weights of the plurality of local feature vectors, each word vector with the plurality of local feature vectors to obtain a first fused feature vector, each word vector yielding one corresponding first fused feature vector; and predicting, based on a user feature vector and the N first fused feature vectors corresponding to the N word vectors, the sample user's degree of preference for each sample image through the prediction model, wherein the input of the prediction model is determined based on the user feature vector and the N first fused feature vectors, and the user feature vector represents the user feature data of the sample user.
- The method according to claim 12, characterized in that the predicting, based on the user feature vector and the N first fused feature vectors corresponding to the N word vectors, the sample user's degree of preference for each sample image through the prediction model comprises: for each sample image, processing the N first fused feature vectors corresponding to the N word vectors through a model with a self-attention mechanism to obtain N semantically enhanced feature vectors, each first fused feature vector corresponding to one semantically enhanced feature vector; and predicting, based on the user feature vector and the N semantically enhanced feature vectors, the sample user's degree of preference for each sample image through the prediction model, wherein the input of the prediction model is determined based on the user feature vector and the N semantically enhanced feature vectors.
- The method according to claim 13, characterized in that the predicting, based on the user feature vector and the N semantically enhanced feature vectors, the sample user's degree of preference for each sample image through the prediction model comprises: for each sample image, fusing the N semantically enhanced feature vectors through a model with an additive attention mechanism to obtain a second fused feature vector; and predicting, based on the user feature vector and the second fused feature vector, the sample user's degree of preference for each sample image through the prediction model, wherein the input of the prediction model is determined based on the user feature vector and the second fused feature vector.
- The method according to claim 10, characterized in that the image feature data of each sample image comprises a global feature vector, the global feature vector representing the sample image.
- The method according to claim 15, characterized in that the predicting, based on user feature data of a sample user and the image feature data, the sample user's degree of preference for each sample image through a prediction model comprises: for each sample image, obtaining a content feature vector based on the sample candidate content in the sample image, the content feature vector representing the sample candidate content; determining a weight of the content feature vector and a weight of the global feature vector based on the content feature vector and the global feature vector; fusing the content feature vector and the global feature vector, based on the weight of the content feature vector and the weight of the global feature vector, to obtain a third fused feature vector; and predicting, based on a user feature vector and the third fused feature vector, the sample user's degree of preference for each sample image through the prediction model, wherein the input of the prediction model is determined based on the user feature vector and the third fused feature vector, and the user feature vector represents the user feature data of the sample user.
- A recommendation apparatus, characterized by comprising: a first image obtaining unit, configured to obtain a plurality of images, each image containing one candidate interface and one type of candidate content presented through the candidate interface; a first feature data obtaining unit, configured to obtain image feature data of each image; a first prediction unit, configured to predict, based on user feature data of a target user and the image feature data, the target user's degree of preference for each image through a prediction model, wherein an input of the prediction model is determined based on the user feature data and the image feature data; and a recommendation unit, configured to select candidate content and/or a candidate interface, based on the degree of preference, from the candidate interfaces and the candidate content contained in the plurality of images, for recommendation.
- A training apparatus, characterized by comprising: a second image obtaining unit, configured to obtain a plurality of sample images, each sample image containing one sample candidate interface and one type of sample candidate content presented through the sample candidate interface; a second feature data obtaining unit, configured to obtain image feature data of each sample image; a second prediction unit, configured to predict, based on user feature data of a sample user and the image feature data, the sample user's degree of preference for each sample image through a prediction model, wherein an input of the prediction model is determined based on the user feature data and the image feature data; and an adjusting unit, configured to adjust the prediction model based on the degree of preference and historical click data of the sample user on the sample candidate content.
- A computer device, characterized by comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to read the computer-readable instructions and implement the method according to any one of claims 1 to 9.
- A training device, characterized by comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to read the computer-readable instructions and implement the method according to any one of claims 10 to 16.
- A computer storage medium, characterized by storing computer-readable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 16.
- A computer program product, characterized by containing computer-readable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 16.
- A recommendation system, characterized by comprising a terminal device and a server, wherein the server is configured to perform the method according to claim 9, and the terminal device is configured to receive metadata of a target candidate interface and target candidate content from the server, display the target candidate interface based on the metadata, and recommend the target candidate content to the target user through the target candidate interface.
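Claims 3 to 5 (mirrored for training in claims 12 to 14) describe a three-stage fusion pipeline: per-word attention over the image's local region feature vectors (first fused feature vectors), self-attention over those vectors (semantically enhanced feature vectors), and additive-attention pooling into a single second fused feature vector that is combined with the user feature vector. The following is a minimal NumPy sketch of one possible reading; the scaled dot-product scoring, the residual word-region fusion, the dot-product preference score, and the learned query `q` are illustrative assumptions, as the claims do not fix these functional forms.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_regions(word_vecs, region_vecs):
    # Attention weights of each local (region) feature vector for each word
    # vector, then fusion into the first fused feature vectors (claims 3/12).
    d = word_vecs.shape[-1]
    scores = word_vecs @ region_vecs.T / np.sqrt(d)  # (N, R) word-region scores
    weights = softmax(scores, axis=-1)               # each row sums to 1
    context = weights @ region_vecs                  # weighted region summary per word
    return word_vecs + context                       # residual fusion (assumed form)

def self_attention(x):
    # Semantic enhancement of the N first fused vectors (claims 4/13).
    d = x.shape[-1]
    return softmax(x @ x.T / np.sqrt(d), axis=-1) @ x

def additive_attention_pool(x, q):
    # Additive-attention pooling into one second fused vector (claims 5/14);
    # q plays the role of a learned query and is a hypothetical parameter.
    weights = softmax(np.tanh(x) @ q, axis=0)        # (N,) weights over vectors
    return weights @ x

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 16))    # N = 6 word vectors of the candidate content
regions = rng.normal(size=(9, 16))  # 9 local feature vectors of the image
fused = attend_regions(words, regions)            # first fused feature vectors
enhanced = self_attention(fused)                  # semantically enhanced vectors
pooled = additive_attention_pool(enhanced, rng.normal(size=16))  # second fused vector
user = rng.normal(size=16)                        # user feature vector
preference = float(user @ pooled)                 # preference score for this image
```

In the claims, the final preference prediction is produced by the prediction model from the user feature vector and the second fused feature vector; the plain dot product above only stands in for that model.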
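Claims 7 and 16 determine the weights of the content feature vector and the global image feature vector from the two vectors themselves, then fuse them into the third fused feature vector. One common realization of such data-dependent weighting is a scalar gate; the sketch below is an illustration under that assumption, where the gate parameter `w` and the sigmoid form are hypothetical and not specified by the claims.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(content_vec, global_vec, w):
    # Derive a scalar weight g from both vectors (claims 7/16), then fuse:
    # g weights the content feature vector, (1 - g) the global feature vector.
    # The gate parameter w is a hypothetical learned vector.
    g = sigmoid(np.concatenate([content_vec, global_vec]) @ w)
    return g * content_vec + (1.0 - g) * global_vec  # third fused feature vector

content = np.array([0.5, -1.0, 2.0, 0.0])  # content feature vector (toy values)
glob = np.array([1.5, 1.0, -2.0, 4.0])     # global feature vector (toy values)
w = np.zeros(8)                            # zero gate parameter gives g = 0.5
fused = gated_fusion(content, glob, w)     # elementwise average in this case
```

With a trained, nonzero `w`, the gate shifts toward whichever of the two vectors the model has learned to trust more for the given image.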
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22857473.7A EP4379574A4 (en) | 2021-08-20 | 2022-07-12 | RECOMMENDATION METHOD AND APPARATUS, LEARNING METHOD AND APPARATUS, RECOMMENDATION DEVICE, AND SYSTEM |
US18/441,389 US20240184837A1 (en) | 2021-08-20 | 2024-02-14 | Recommendation method and apparatus, training method and apparatus, device, and recommendation system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110963660.X | 2021-08-20 | ||
CN202110963660.XA CN113806631A (zh) | 2021-08-20 | 2021-08-20 | Recommendation method, training method, apparatus, device, and news recommendation system
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/441,389 Continuation US20240184837A1 (en) | 2021-08-20 | 2024-02-14 | Recommendation method and apparatus, training method and apparatus, device, and recommendation system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023020160A1 true WO2023020160A1 (zh) | 2023-02-23 |
Family
ID=78893897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/105075 WO2023020160A1 (zh) | 2021-08-20 | 2022-07-12 | 一种推荐方法、训练方法、装置、设备及推荐系统 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240184837A1 (zh) |
EP (1) | EP4379574A4 (zh) |
CN (1) | CN113806631A (zh) |
WO (1) | WO2023020160A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113806631A (zh) * | 2021-08-20 | 2021-12-17 | Huawei Technologies Co., Ltd. | Recommendation method, training method, apparatus, device, and news recommendation system |
CN114741608A (zh) * | 2022-05-10 | 2022-07-12 | China Ping An Property Insurance Co., Ltd. | News recommendation method, apparatus, device, and storage medium based on user profiles |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170116543A1 (en) * | 2015-10-23 | 2017-04-27 | Sap Se | Self-adaptive display layout system |
CN107895024A (zh) * | 2017-09-13 | 2018-04-10 | Tongji University | User model construction method and recommendation method for classified web news recommendation |
CN109740068A (zh) * | 2019-01-29 | 2019-05-10 | Tencent Technology (Beijing) Co., Ltd. | Media data recommendation method, apparatus, and storage medium |
CN109947510A (zh) * | 2019-03-15 | 2019-06-28 | Beijing SenseTime Technology Development Co., Ltd. | Interface recommendation method and apparatus, and computer device |
CN111461175A (zh) * | 2020-03-06 | 2020-07-28 | Northwest University | Tag recommendation model construction method and apparatus based on self-attention and co-attention mechanisms |
CN113806631A (zh) * | 2021-08-20 | 2021-12-17 | Huawei Technologies Co., Ltd. | Recommendation method, training method, apparatus, device, and news recommendation system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819584B (zh) * | 2012-07-26 | 2015-07-08 | Beijing Qihoo Technology Co., Ltd. | Interface file display method and system |
CN104166668B (zh) * | 2014-06-09 | 2018-02-23 | Nanjing University of Posts and Telecommunications | News recommendation system and method based on the FOLFM model |
US10817749B2 (en) * | 2018-01-18 | 2020-10-27 | Accenture Global Solutions Limited | Dynamically identifying object attributes via image analysis |
CN109903314A (zh) * | 2019-03-13 | 2019-06-18 | Tencent Technology (Shenzhen) Co., Ltd. | Image region locating method, model training method, and related apparatus |
CN112100504B (zh) * | 2020-11-03 | 2021-09-10 | Beijing Dajia Internet Information Technology Co., Ltd. | Content recommendation method, apparatus, electronic device, and storage medium |
- 2021-08-20 CN CN202110963660.XA patent/CN113806631A/zh active Pending
- 2022-07-12 WO PCT/CN2022/105075 patent/WO2023020160A1/zh active Application Filing
- 2022-07-12 EP EP22857473.7A patent/EP4379574A4/en active Pending
- 2024-02-14 US US18/441,389 patent/US20240184837A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4379574A4 |
Also Published As
Publication number | Publication date |
---|---|
EP4379574A1 (en) | 2024-06-05 |
EP4379574A4 (en) | 2024-10-16 |
CN113806631A (zh) | 2021-12-17 |
US20240184837A1 (en) | 2024-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109844767B (zh) | Visual search based on image analysis and prediction | |
US11361018B2 (en) | Automatically curated image searching | |
US10043109B1 (en) | Attribute similarity-based search | |
CN112733042B (zh) | Recommendation information generation method, related apparatus, and computer program product | |
WO2021238722A1 (zh) | Resource pushing method, apparatus, device, and storage medium | |
WO2023020160A1 (zh) | Recommendation method, training method, apparatus, device, and recommendation system | |
KR20190095333A (ko) | Anchored search | |
US20150339348A1 (en) | Search method and device | |
CN110737783A (zh) | Method, apparatus, and computing device for recommending multimedia content | |
US20210166014A1 (en) | Generating document summary | |
WO2020057145A1 (en) | Method and device for generating painting display sequence, and computer storage medium | |
WO2024051609A1 (zh) | Advertising creative data selection method and apparatus, model training method and apparatus, device, and storage medium | |
US12079572B2 (en) | Rule-based machine learning classifier creation and tracking platform for feedback text analysis | |
CN111177467A (zh) | Object recommendation method and apparatus, computer-readable storage medium, and electronic device | |
US20230089574A1 (en) | Modifying a document content section of a document object of a graphical user interface (gui) | |
CN111144974A (zh) | Information display method and apparatus | |
US10643142B2 (en) | Search term prediction | |
US20230401250A1 (en) | Systems and methods for generating interactable elements in text strings relating to media assets | |
CN117909560A (zh) | Search method, model training method, apparatus, device, medium, and program product | |
US20240248901A1 (en) | Method and system of using domain specific knowledge in retrieving multimodal assets | |
US11768867B2 (en) | Systems and methods for generating interactable elements in text strings relating to media assets | |
CN113486260B (zh) | Interaction information generation method and apparatus, computer device, and storage medium | |
CN116030375A (zh) | Video feature extraction and model training method, apparatus, device, and storage medium | |
CN113674043B (zh) | Product recommendation method and apparatus, computer-readable storage medium, and electronic device | |
CN113032614A (zh) | Cross-modal information retrieval method and apparatus |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22857473; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 2022857473; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2022857473; Country of ref document: EP; Effective date: 20240226 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 11202400985T; Country of ref document: SG |