WO2024061073A1 - Multimedia information generation method and apparatus, and computer-readable storage medium


Info

Publication number
WO2024061073A1
Authority
WO
WIPO (PCT)
Prior art keywords
features
information
content
item
target
Prior art date
Application number
PCT/CN2023/118512
Other languages
French (fr)
Chinese (zh)
Inventor
张政
刘银星
阮涛
吕晶晶
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2024061073A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 - Advertisements
    • G06Q30/0251 - Targeted advertisements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/06 - Buying, selling or leasing transactions
    • G06Q30/0601 - Electronic shopping [e-shopping]

Definitions

  • the present invention relates to the field of computer vision, and in particular, to a method and device for generating multimedia information, and a computer-readable storage medium.
  • interest product recall is usually at product granularity.
  • the system will select a product candidate set for the current user based on the user's historical browsing, searching, purchasing, adding to the shopping cart, etc., and select the optimal product based on the advertising system ranking model.
  • relevant advertisements will be generated.
  • advertisements are usually generated using templates: the main image of the product is inserted into the template, and the corresponding product advertisement is rendered and generated.
  • the generated advertisement recommendations are relatively simple.
  • Embodiments of the present invention provide a method and device for generating multimedia information, and a computer-readable storage medium, which can generate corresponding target multimedia information based on item information and content information, has diversity, and has good recommendation effects.
  • Embodiments of the present invention provide a method for generating multimedia information.
  • the method includes:
  • Feature extraction is performed based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and the item features and content features are collaborated and fused to obtain multiple sets of fusion features; each group of fusion features represents the fusion between different content modal combinations and different items;
  • the plurality of groups of fusion features are estimated by a preset recommendation model, and target item information and target content information corresponding to a group of fusion features with the highest estimated value are selected; the preset recommendation model is used to screen the fusion features;
  • Target multimedia information is generated based on the target item information and the target content information.
  • feature extraction is performed based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, including:
  • the content multi-modal type includes at least two modalities among text information, image information and image sequence information;
  • Feature extraction is performed on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimensions.
  • the feature extraction of the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension includes: if the content multi-modal type is a text type, feature extraction is performed on the text information through the first encoding method to obtain text features;
  • the content multi-modal type is an image type or an image sequence type
  • feature extraction is performed on the image information and the image sequence information respectively through the second encoding method to obtain image features and behavioral features;
  • the content characteristics corresponding to the content dimension are determined according to at least one of the text characteristics, the image characteristics and the behavioral characteristics.
  • the content multi-modal type is a text type
  • feature extraction is performed on the text information through the first encoding method to obtain text features, including:
  • the content multi-modal type is a text type
  • feature extraction is performed on the text information to obtain text initial features
  • the text initial features include semantic expression information and word information
  • the text initial features are encoded to obtain the text features.
  • the content multi-modal type is an image type or an image sequence type
  • feature extraction is performed on the image information and the image sequence information through the second encoding method to obtain image features and behavioral features.
  • the content multi-modal type is an image type
  • the initial image features include scene information, content information and style information
  • the content multi-modal type is an image sequence type
  • feature extraction is performed on the image sequence information to obtain initial behavioral features;
  • the initial behavioral features include subject target information and key frame information;
  • the image initial features and the behavior initial features are respectively encoded to obtain the image features and the behavior features.
  • the item characteristics and the content characteristics are coordinated and fused to obtain multiple sets of fusion characteristics, including:
  • the item feature and the content feature are collaboratively processed to obtain a first item feature and a first content feature of the same probability distribution;
  • the first item feature includes a plurality of first sub-item features;
  • the first content feature includes a plurality of first sub-content features;
  • the plurality of first sub-item features are randomly combined to obtain multiple item combination features; the plurality of first sub-content features are randomly combined to obtain multiple content combination features; the content combination features include content features corresponding to at least two content multi-modal types;
  • the plurality of item combination features and the plurality of content combination features are fused to obtain the plurality of sets of fusion features.
  • the multiple sets of fusion features are estimated through a preset recommendation model, and the target item information and target content information corresponding to the set of fusion features with the highest estimated value are selected, including:
  • the set of fused features is decoded to obtain the target item information and the target content information.
  • generating target multimedia information based on the target item information and the target content information includes:
  • layout generation is performed on the target item information and the target content information through a preset layout generation model to obtain multiple layouts; the preset layout generation model represents adjusting the layout through items and content;
  • the multiple layouts are evaluated through the evaluation model and candidate layouts are determined; the evaluation model is used to evaluate and screen the layouts;
  • the target multimedia information is generated based on the optimal layout, the target item information and the target content information.
  • the target items and the target content are laid out through a preset layout generation model, and multiple layouts are obtained, including:
  • the preset layout generation model includes the stacking order of image layers and the text size range constraints in the text information
  • the initialization layout is adjusted through adjustment rules and the multiple layouts are determined; the adjustment rules are obtained through continuous training using the object's preference as an incentive.
  • before the multiple layouts are evaluated by the evaluation model to determine the candidate layouts, the method further includes:
  • the historical layout includes positive sample data and negative sample data
  • the initial evaluation model is trained by using the positive sample data and the negative sample data to determine the evaluation model.
  • the evaluation model is used to evaluate the multiple layouts and determine candidate layouts, including:
  • the corresponding layout is used as the candidate layout.
  • An embodiment of the present invention provides a device for generating multimedia information.
  • the device for generating multimedia information includes an acquisition part, a selection part and a generation part; wherein,
  • the acquisition part is configured to recall item information and content information in response to a received browsing request; perform feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension,
  • the item features and the content features are collaborated and fused to obtain multiple sets of fusion features; each set of fusion features represents the fusion between different content modal combinations and different items;
  • the selection part is configured to estimate the multiple sets of fusion features through a preset recommendation model, and select the target item information and target content information corresponding to the set of fusion features with the highest estimated value; the preset recommendation model is used to screen the fusion features;
  • the generating part is configured to generate target multimedia information based on the target item information and the target content information.
  • An embodiment of the present invention provides a device for generating multimedia information, the device for generating multimedia information comprising:
  • a memory for storing executable instructions
  • a processor configured to execute executable instructions stored in the memory. When the executable instructions are executed, the processor executes the method for generating multimedia information.
  • An embodiment of the present invention provides a computer-readable storage medium, characterized in that executable instructions are stored therein, and when the executable instructions are executed by one or more processors, the processors execute the method for generating multimedia information.
  • Embodiments of the present invention provide a method and device for generating multimedia information, and a computer-readable storage medium.
  • the method includes: in response to a received browsing request, recalling item information and content information; performing feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborating and fusing the item features and the content features to obtain multiple sets of fusion features; each set of fusion features represents the fusion between different content modal combinations and different items.
  • the server vectorizes the item information and content information to obtain the item features corresponding to the item and the content features corresponding to the content; it converts the item features and content features located in different spaces into vectors in the same space and fuses them to obtain multiple sets of fusion features; since the fusion features carry both dimensions, the obtained fusion features are diverse.
  • the server estimates the multiple fusion features based on the preset recommendation model and obtains multiple estimated values. The higher the estimated value, the better the diversity of the fusion features, so the target item information and target content information corresponding to the set of fusion features with the highest estimated value have good diversity, and the target multimedia information generated from the target item information and target content information is therefore diverse. Finally, based on this method of generating multimedia information, the diversity of the target multimedia information can be improved, thereby ensuring that personalized recommendations are provided to users and improving the recommendation effect.
  • Figure 1 is an optional flowchart 1 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 2 is an optional flowchart 2 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 3 is an optional flowchart 3 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 4 is an optional flowchart 4 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 5 is an optional flowchart 5 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 6 is an optional flowchart 6 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 7 is an optional flowchart 7 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 8 is an optional flowchart 8 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 9 is an optional flowchart 9 of a method for generating multimedia information provided by an embodiment of the present invention;
  • Figure 10 is a schematic structural diagram 1 of a device for generating multimedia information provided by an embodiment of the present invention;
  • Figure 11 is a schematic structural diagram 2 of a device for generating multimedia information provided by an embodiment of the present invention.
  • the recall of items of interest is usually at SKU granularity.
  • the system will select a candidate set of items for the current user based on the user's historical browsing, search, purchase, additional purchase and other behaviors, and select the optimal one based on the advertising system ranking model.
  • related creatives are generated as advertising carriers, such as pictures, videos, copywriting and other creative content.
  • Item recall is an interest matching problem. First, the user's historical interest information is used to effectively mine items that were searched, browsed, clicked, and added to the shopping cart. Item recall must consider both the user's long-term preferences and real-time needs, and conducts multi-dimensional interest mining over this combination of short-term and long-term behaviors. At the same time, for promotional activities and hot information, related items in the same store and of the same product can be expanded, and information on hot items and similar items can be added for further exploration.
  • Another idea is to consider the similarity of the crowd, which is often called collaborative filtering: related users are effectively clustered through user portraits, and the interests and preferences of the same type of users (such as similar consumption habits and consistent brand preferences) are migrated, which can increase the ability to mine novel items of interest.
  • Model sorting is called "precise sorting" in the advertising system. Through model estimation, the best product is selected for display. The goal of sorting is usually to maximize revenue.
  • the core here is pCTR estimation, which selects the product with the best estimated effect (the highest click-through rate).
  • the current ranking model is usually based on the convolutional neural network (Convolutional Neural Network, CNN) model, which combines rich data features with deep learning, trains the model based on the posterior click-through rate of the data, and uses the learned parameters for online estimation of item and user characteristics.
  • Creatives, as advertising carriers, present the content of items.
  • the current method lacks the knowledge of related content.
  • it also lacks a unified expression and modeling method among them, which leads to poor product recall results.
  • In the model sorting stage, the current sorting method only considers the preference for a single product and lacks the ability to make combined predictions for multiple products. It also lacks corresponding modeling capabilities for different levels of product attributes, and model predictions are based only on product and user characteristics.
  • It lacks creative-dimension information, especially multi-modal prediction capabilities that use copywriting, pictures, videos and other content as features, resulting in poor recommendation results.
  • the current creative generation is after recall and sorting, and the generation method is mostly template nesting, which does not reflect the user's preference for creative elements.
  • Using templates as a carrier limits the diversity of creatives. At the same time, it also limits users' diverse needs for creatives and lacks personalized expression of user interests.
  • an embodiment of the present invention proposes a method for generating multimedia information.
  • intelligent creative generation is evolved into an element combination problem.
  • a unified expression of products and creative elements is established, breaking the underlying logic of traditional e-commerce advertisements that directly sell goods.
  • a creative-driven user conversion maximization idea is established; in the model sorting stage, the optimization of a single product is transformed into a combination optimization of multiple products, and multimodal information is integrated into model estimation; in the creative generation stage, real-time personalized creative generation is performed in the form of creative elements and product combinations to express user interests in a personalized way.
  • Figure 1 is an optional flowchart 1 of a method for generating multimedia information provided by an embodiment of the present invention. The steps will be explained in conjunction with Figure 1.
  • the item information is all items to be recommended by the terminal.
  • Item information contains multiple sub-item information; content information contains multiple sub-content information.
  • a browsing request refers to a request formed by the user entering the item information to be browsed in the search box on the application software browsing page or on the application web page browsing page. For example, after a user enters "photo frame" in the search box on the browsing page of a shopping platform, a request for browsing photo frames will be generated.
  • the server receives the browsing request sent by the terminal, responds to the browsing request, and recalls item information and content information from the item library and content library according to the object's historical browsing information.
  • the browsed item information is input into the sorting recommendation model (Deep & Cross Network, DCN) for extraction to obtain multiple pieces of item information;
  • the browsed image is input into the convolutional neural network (Convolutional Neural Network, CNN) for extraction to obtain the content information carrying the item information
  • the item information is removed from the content information carrying the item information to obtain the remaining content (equivalent to the content information).
  • each set of fusion features represents the fusion between different content modal combinations and different items.
  • Collaboration means processing multiple vectors located in different vector spaces so that they are mapped to the same vector space and satisfy the same probability distribution; fusion means forming fusion vectors from different combinations of multiple vectors located in the same space;
  • the vector can be the item characteristics and content characteristics in the present invention.
  • Collaboration must be carried out before fusion, and fusion can only be carried out after collaborative processing.
  • Item features are the displayed characteristics of a certain item, and content features are features obtained by extracting features from the picture description, video description, and text description of a certain item.
  • the item information can be the attribute characteristics of the item;
  • the content information can be images and copywriting excluding the item attributes, that is, some creative content to promote the item.
  • the server can perform feature extraction on the item information to obtain item features corresponding to the item dimension; identify the content information to obtain content multi-modal types; and perform feature extraction on the content information corresponding to the content multi-modal types to obtain content features corresponding to the content dimension.
  • the item features and content features are collaboratively processed to obtain the first item feature and the first content feature of the same probability distribution; the first item feature and the first content feature are fused to obtain multiple sets of fusion features.
  • Figure 3 is an optional flowchart 3 of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 3, performing feature extraction based on item information and content information to obtain the item features corresponding to the item dimension and the content features corresponding to the content dimension can be realized through S1021-S1023, as follows:
  • the server can perform feature extraction on the item information, convert the item information into features in vector form, and obtain item features corresponding to the item dimensions.
  • For example, the item features may be a 1024-dimensional floating-point array.
  • S1022 Identify the content information to obtain content information corresponding to the content multimodal type.
  • the server can identify the content information through a neural network model to obtain content information corresponding to the content multi-modal type, where the content multi-modal type includes text information, image information and image sequence information.
  • the neural network (NN) model is a complex network system formed by a large number of simple processing units (called neurons) that are widely connected to each other. It reflects many basic characteristics of human brain function and is a highly complex nonlinear dynamic learning system. Neural networks have large-scale parallelism, distributed storage and processing, self-organization, self-adaptation and self-learning capabilities, and are particularly suitable for imprecise and fuzzy information processing problems that require many factors and conditions to be considered simultaneously.
  • the server can perform corresponding feature extraction according to the multi-modal type of the content. If the content multi-modal type is a text type, feature extraction processing is performed on the text information through the first encoding method to obtain text features. If the content multi-modal type is an image type or an image sequence type, feature extraction processing is performed on the image information and image sequence information respectively through the second encoding method to obtain image features and behavioral features. Based on text features, image features and behavioral features, the content features corresponding to the content dimensions are determined.
  • the first encoding method mainly targets text information; the second encoding method mainly targets image information and image sequence information.
  • the image information may be an image, and the image sequence information may be a video.
  • the server extracts features from item information, vectorizes the item information, and obtains item features corresponding to the item; identifies content information to obtain content multimodal types; extracts features from content information corresponding to content multimodal types to obtain content features corresponding to content dimensions. Since item features and content features belong to features in different dimensions, the server obtains multidimensional features. When the target multimedia information is subsequently generated based on the multidimensional features, the target multimedia information has multidimensional information, thereby making the target multimedia information diverse.
  • Figure 4 is an optional flow diagram 4 of a method for generating multimedia information provided by an embodiment of the present invention.
  • S1023 can be implemented through S201-S203, as follows:
  • the content multi-modal type is a text type
  • the server extracts features from text information based on the content multimodal type being text type to obtain initial text features; and encodes the initial text features using a first encoding method to obtain text features.
  • text features are text initial features in vector form.
  • S201 can be implemented through S2011-S2012, as follows:
  • the content multi-modal type is a text type
  • the text initial features include semantic expression information and word information.
  • the server performs feature extraction on the text information based on the multi-modal type of the content being text type to obtain semantic expression information and word information.
  • Semantic expression information and word information are both initial features of text.
  • Figure 5 is an optional flow diagram 5 of a method for generating multimedia information provided by an embodiment of the present invention.
  • the server performs feature extraction on the copywriting information to obtain the semantic expression (equivalent to semantic expression information) and the word segmentation (equivalent to word information). Specifically, the semantic expression is obtained through the BERT method.
  • the server encodes the initial text features through the first encoding method to obtain vectorized text features.
  • the first encoding method is ConCat
  • the server uses ConCat to encode the semantic expression (equivalent to semantic expression information) and the word segmentation (equivalent to word information) to obtain a feature vector (equivalent to the text features).
  • the server performs feature extraction and encoding on the text information to obtain text features. During this process, the server converts text information into vectorized text features to facilitate subsequent collaboration and integration of item features and content features.
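  • For illustration only, the first encoding step can be sketched as concatenating a semantic embedding with a word-segmentation embedding; the helper functions, dimensions and example copywriting below are assumptions standing in for the BERT-based extraction described above, not part of the disclosed embodiment:

```python
import numpy as np

def bert_semantic_embedding(text: str) -> np.ndarray:
    """Stand-in for a BERT-style encoder producing the semantic expression information."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(768)

def word_segment_embedding(text: str) -> np.ndarray:
    """Stand-in for an averaged embedding of the word-segmentation result (word information)."""
    rng = np.random.default_rng(len(text.split()))
    return rng.standard_normal(128)

def encode_text(copywriting: str) -> np.ndarray:
    """First encoding method: concatenate (ConCat) the two initial text features."""
    semantic = bert_semantic_embedding(copywriting)
    words = word_segment_embedding(copywriting)
    return np.concatenate([semantic, words])  # vectorized text feature

text_feature = encode_text("Wooden photo frame, minimalist desktop decor")
print(text_feature.shape)  # (896,)
```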
  • the content multi-modal type is an image type or an image sequence type
  • the server performs feature extraction on the image information according to the content multi-modal type being the image type to obtain initial image features.
  • the content multi-modal type being an image sequence type
  • feature extraction is performed on the image sequence information to obtain initial behavioral features.
  • the initial features of the image and the initial features of the behavior are respectively coded to obtain the image features and the behavior features.
  • S202 can be implemented through S2021-S2023, as follows:
  • the initial image features include scene information, content information and style information.
  • the server can extract features from the image information according to the content multimodal type being the image type, and obtain scene information, content information and style information.
  • the scene information, content information and style information are all initial features of the image.
  • the image information can be a promotional picture of an item display; the server extracts features from the image information to obtain the scene (equivalent to scene information), the content and main body (the content and main body are equivalent to content information), and the color, style and layout (the color, style and layout are equivalent to style information).
  • Scene, content, subject, color, style and layout are all initial characteristics of an image.
  • the content multi-modal type is an image sequence type
  • the behavior initial features include subject target information and key frame information.
  • the server can perform feature extraction on the image sequence information based on the content multi-modal type being the image sequence type to obtain the subject target information and key frame information.
  • Subject target information and key frame information are both behavior initial features.
  • the server performs feature extraction on the image sequence information to obtain the key frames and highlight points (the key frames and highlight points are equivalent to key frame information), and the summary and subject target behavior actions (the summary and subject target behavior actions are equivalent to subject target information).
  • Key frames, highlight points, summaries, and subject target behavior actions are all behavior initial features.
  • the initial image features and the initial behavioral features are respectively encoded to obtain the image features and behavioral features.
  • the server encodes the initial image features through the second encoding method to obtain vectorized image features; it encodes the initial behavioral features to obtain vectorized behavioral features.
  • the second encoding method is One Hot.
  • the server performs feature encoding on the scene, content, subject, color, style and layout through One Hot to obtain a feature vector (equivalent to image features).
  • the server uses One Hot to perform feature encoding on key frames, highlights, summaries, and subject target behaviors to obtain feature vectors (equivalent to behavioral features).
  • the server performs feature extraction and encoding on the image information and image sequence information to obtain image features and behavioral features.
  • the server can convert image information and image sequence information into vectorized image features and vectorized behavioral features respectively, thereby obtaining multi-modal content features, making the content features diverse.
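  • A hedged sketch of the second encoding method (One Hot) applied to the categorical image initial features follows; the attribute vocabularies and example values are illustrative assumptions:

```python
import numpy as np

# Illustrative vocabularies for the image initial features (scene / content / style).
SCENES = ["indoor", "outdoor", "studio"]
SUBJECTS = ["product", "person", "text_overlay"]
STYLES = ["minimal", "festive", "luxury"]

def one_hot(value: str, vocab: list) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

def encode_image(scene: str, subject: str, style: str) -> np.ndarray:
    """Second encoding method: One-Hot encode each categorical initial feature and concatenate."""
    return np.concatenate([
        one_hot(scene, SCENES),
        one_hot(subject, SUBJECTS),
        one_hot(style, STYLES),
    ])

image_feature = encode_image("studio", "product", "minimal")
print(image_feature)  # [0. 0. 1. 1. 0. 0. 1. 0. 0.]
```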
  • S203 Determine content features corresponding to the content dimension based on at least one of text features, image features, and behavioral features.
  • the server uses at least one of text features, image features, and behavioral features as content features corresponding to the content dimension.
  • the server can determine text features as content features corresponding to the content dimension; or, the server can determine image features as content features corresponding to the content dimension; or, the server can determine behavioral features as content features corresponding to the content dimension; or, the server can determine text features and image features as content features corresponding to the content dimension; or, the server can determine text features and behavioral features as content features corresponding to the content dimension; or, the server can determine image features and behavioral features as content features corresponding to the content dimension; or, the server can determine text features, image features, and behavioral features as content features corresponding to the content dimension.
  • the server can identify and extract features of the content information to obtain text features, image features, and behavior features.
  • the server can determine the content features corresponding to the content dimension based on one of the text features, image features, and behavior features; or, the server can determine the content features corresponding to the content dimension based on two of the text features, image features, and behavior features; or, the server can determine the content features corresponding to the content dimension based on three of the text features, image features, and behavior features. Since the content features have one or more multimodal features, the content features are diverse.
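  • As a small illustrative sketch, the content features of the content dimension can be assembled from whichever of the three modal features are available; the function name and dimensions below are assumptions:

```python
import numpy as np

def build_content_features(text_feat=None, image_feat=None, behavior_feat=None):
    """Content features of the content dimension = at least one of text / image / behavior features."""
    parts = [f for f in (text_feat, image_feat, behavior_feat) if f is not None]
    if not parts:
        raise ValueError("at least one modal feature is required")
    return np.concatenate(parts)

# e.g. only text and image features are available (no image-sequence / behavior feature)
content_feature = build_content_features(text_feat=np.ones(8), image_feat=np.zeros(4))
print(content_feature.shape)  # (12,)
```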
  • collaboration and fusion of item features and content features to obtain multiple sets of fusion features can be achieved through S301-S303, as follows:
  • the first item feature includes a plurality of first sub-item features; the first content feature includes a plurality of first sub-content features.
  • the server performs collaborative learning processing on the item features and the content features based on differences in their feature domains, maps the item features and content features to the same vector space, and obtains the first item feature and the first content feature of the same probability distribution.
  • collaborative processing is to process multiple vectors located in different vector spaces so that they are mapped to the same vector space and satisfy the same probability distribution; the technical means of collaborative processing and collaboration are consistent.
  • the server can randomly combine multiple first sub-item features to obtain multiple different item combination features.
  • the server randomly combines 12 first sub-item features (the 12 first sub-item features are different) to obtain 5 item combination features; the 5 item combination features respectively include 6, 8, 3, 5 and 9 first sub-item features. It should be noted that the five item combination features may contain the same first sub-item features or different first sub-item features.
  • the content combination features include content features corresponding to at least two content multi-modal types.
  • the server can randomly combine multiple first sub-content features to obtain multiple different content combination features.
  • the server randomly combines six first sub-content features (the six first sub-content features are different; specifically, either the content multi-modal types they contain are different or the content features themselves are different) to obtain two content combination features.
  • one content combination feature contains content features corresponding to three content multimodal types, including two text features, three image features and one behavior feature; the other content combination feature contains content features corresponding to two content multimodal types, including two text features and one image feature.
  • the server can fuse multiple item combination features and multiple content combination features to obtain multiple sets of fusion features; one set of fusion features includes at least one item combination feature and at least one content combination feature.
  • the server fuses the 5 item combination features and the 2 content combination features to obtain 3 groups of fusion features, namely the 1st group, the 2nd group and the 3rd group; among them, the 1st group of fusion features includes 3 first sub-item features; the 2nd group of fusion features includes 8 first sub-item features and two content multi-modal types, with 2 text features and 1 image feature; the 3rd group includes 13 first sub-item features and three content multi-modal types, with 4 text features, 4 image features and 1 behavior feature.
  • the server processes the item features and content features located in different vector spaces and maps them to the same vector space so that they satisfy the same probability distribution; in this way the two kinds of features lie in the same vector space, which facilitates the subsequent fusion of the two features.
  • the server randomly combines multiple first sub-item features to obtain multiple item feature combinations. Since each item combination contains multiple first sub-item features, the item feature combinations are diverse.
  • the server randomly combines multiple first sub-content features to obtain multiple content feature combinations. Since each content combination contains multiple first sub-content features, the content feature combinations are diverse.
  • the server randomly fuses the item feature combinations and the content feature combinations to obtain multiple sets of fusion features. Since the fusion features include item feature combinations and content feature combinations, the fusion features are diverse.
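  • A minimal sketch of the collaborate-then-fuse step follows, assuming linear projections into a shared vector space, mean-pooled random combinations, and concatenation-style fusion; the matrices, combination sizes and pooling choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def collaborate(features, projection):
    """Map features from their own space into the shared vector space and normalize per row."""
    z = features @ projection
    return (z - z.mean(axis=1, keepdims=True)) / (z.std(axis=1, keepdims=True) + 1e-8)

item_feats = rng.standard_normal((12, 64))    # 12 sub-item features
content_feats = rng.standard_normal((6, 96))  # 6 sub-content features (mixed modalities)

first_item = collaborate(item_feats, rng.standard_normal((64, 32)))
first_content = collaborate(content_feats, rng.standard_normal((96, 32)))

def random_combination(feats, size):
    """Randomly pick `size` sub-features and pool them into one combination feature."""
    idx = rng.choice(len(feats), size=size, replace=False)
    return feats[idx].mean(axis=0)

item_combos = [random_combination(first_item, k) for k in (6, 8, 3, 5, 9)]
content_combos = [random_combination(first_content, k) for k in (4, 3)]

# Fuse every item combination with every content combination.
fusion_features = [np.concatenate([i, c]) for i in item_combos for c in content_combos]
print(len(fusion_features), fusion_features[0].shape)  # 10 (64,)
```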
  • the server can input multiple sets of fusion features into a preset recommendation model for prediction, and obtain first estimated values corresponding to each of the multiple sets of fusion features. Based on multiple first estimated values, a set of fused features with the highest estimated value is selected from multiple sets of fused features. Decode a set of fused features to obtain target item information and target content information.
  • For example, the number of items is set to From1, and the range of From1 is (0, M); the number of creatives (i.e., content) is set to From2, and the range of From2 is (0, N). The server performs traversal exploration to obtain multiple items; for the multiple items, the vectors of the multiple items are fused; for the multiple creatives, the multi-modal features of the recall stage are fused into feature vectors; at the same time, the fused creative vectors (i.e., fusion features) are input into the estimation model (i.e., the preset recommendation model) to obtain CTR estimation values; the combination with the highest pCTR estimation value is selected for output as the overall estimation result (i.e., the target item information and target content information corresponding to the set of fusion features with the highest estimated value).
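  • For illustration, the estimation stage can be sketched as scoring each fused feature with a stand-in pCTR model and keeping the highest-scoring combination; the logistic scorer below is a placeholder, not the disclosed recommendation model:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(64)  # stand-in pCTR model: a fixed logistic scorer

def estimate_pctr(fused: np.ndarray) -> float:
    return float(1.0 / (1.0 + np.exp(-fused @ w)))

fusion_features = [rng.standard_normal(64) for _ in range(10)]  # output of the fusion step
scores = [estimate_pctr(f) for f in fusion_features]

best_idx = int(np.argmax(scores))
best_combination = fusion_features[best_idx]  # later decoded into target item / content information
print(best_idx, round(scores[best_idx], 3))
```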
  • S103 can be implemented through S1031, S1032 and S1033, as follows:
  • the server estimates the multiple groups of fused features through a preset recommendation model to obtain first estimated values corresponding to each of the multiple groups of fused features.
  • For example, the server estimates three sets of fusion features through the preset recommendation model, and obtains the first estimated values corresponding to the three sets of fusion features, namely 0.7, 0.85 and 0.62.
  • the server selects a set of fusion features with the highest estimated value from multiple sets of fusion features based on multiple first estimated values.
  • the server selects the fusion feature with an estimated value of 0.85 from the first estimated values 0.7, 0.85 and 0.62 corresponding to the three sets of fusion features respectively.
  • the server can decode a set of fused features to convert the fused features into target item information and target content information.
  • the server decodes the selected set of fused features.
  • the target item information includes 3 items
  • the target content information includes three content multi-modal types, including 2 types of text, 3 types of images, and 1 type of image sequence.
  • the server estimates the multiple fusion features based on the preset recommendation model and obtains multiple estimated values. The higher the estimated value, the better the diversity of the fusion features, so the target item information and target content information corresponding to the set of fusion features with the highest estimated value have good diversity, and the target multimedia information generated from the target item information and target content information is therefore diverse.
  • S104 Generate target multimedia information based on the target item information and target content information.
  • the server can perform layout generation on target item information and target content information through a preset layout generation model to obtain multiple layouts.
  • Through the evaluation model, the multiple layouts are evaluated and candidate layouts are determined.
  • Through the layout optimization model, the optimal layout is selected from the candidate layouts.
  • Target multimedia information is generated based on the optimal layout, target item information and target content information.
  • the target multimedia information is sent to the terminal, so that the terminal displays the browsing page based on the target multimedia information.
  • Figure 6 is an optional flow diagram 6 of a multimedia information generation method provided by an embodiment of the present invention.
  • the traditional multimedia information generation process is: a user request (equivalent to a browsing request) is received, and the server recalls products (equivalent to item recall) to obtain product information; the product information is sorted by the model, and the Top-1 product information of the model ranking is selected as the recommended product information; template creatives are generated and the product information is fused to obtain multimedia information.
  • Figure 7 is an optional flow diagram 7 of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 7, data A/B (equivalent to target item information and target content information) are input to the server.
  • an initial layout is generated through a preset layout generation model (not shown in the figure). The text size, element positions, colors, and contrast of the initial layout are adjusted through adjustment rules to obtain multiple layouts. The multiple layouts are evaluated through the evaluation model (represented by +++ in Figure 7) to obtain evaluation results, which include pass and fail. If the evaluation result is "passed", the layout plan (equivalent to the candidate layouts) is output; the layout plan includes four layouts, namely 1, 2, 3 and 4. The layout plan is optimized through the layout optimization model to obtain the optimal style (the optimal layout); the optimal style includes copywriting, pictures, videos or middle pages. The target multimedia information is generated through a real-time multimedia information generation engine.
  • the server vectorizes the item information and content information to obtain item features corresponding to the items and content features corresponding to the content.
  • the server converts item features and content features in different spaces into vectors in the same space and fuses them to obtain multiple sets of fusion features. Since the fused feature is a feature with two dimensions, the fused feature is diverse.
  • the server estimates the multiple fusion features based on the preset recommendation model and obtains multiple estimated values. The higher the estimated value, the better the diversity of the fusion features, so the target item information and target content information corresponding to the selected set of fusion features with the highest estimated value have good diversity, and the target multimedia information generated based on the target item information and target content information is therefore diverse.
  • the diversity of target multimedia information can be improved, thereby ensuring that personalized recommendations are provided to users and improving the recommendation effect.
  • Figure 8 is an optional flow diagram 8 of a method for generating multimedia information provided by an embodiment of the present invention.
  • S104 can be implemented through S1041-S1045, as follows:
  • the preset layout generation model includes the stacking order of image layers and text size range constraints in text information.
  • the server can generate an initialization layout corresponding to the target item information and the target content information through a preset layout generation model, adjust the initialization layout through adjustment rules, and determine multiple layouts.
  • S1041 may be implemented by S401 and S402 as follows:
  • the server can input the target item information and the target content information into a preset layout generation model, and generate an initialization layout corresponding to the target item information and the target content information.
  • the initial layout refers to the arrangement and combination of the positions of target item information and target content information.
  • the adjustment rules are obtained through continuous training using the object's preference as an incentive. Specifically, they are based on reinforcement learning and use the object's preference as the incentive: if the click-through rate is higher after an adjustment, it is a positive incentive; if the click-through rate becomes lower, it is a negative incentive. The rules are obtained through repeated adjustment and learning.
  • the server adjusts the initial layout by adjusting rules to obtain multiple layouts.
  • the server can generate an initial layout corresponding to the target item information and target content information through the preset layout generation model, adjust the initial layout through adjustment rules, and determine multiple layouts of the target item information and target content information; by adjusting the initial layout through the adjustment rules and correcting unreasonable layout methods, the rationality of the layouts can be improved.
  • the adjusted layout of the server can still include multiple layouts, so that the adjusted layout still has diversity.
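  • A simplified sketch of the reward-driven adjustment idea follows, using click-through-rate feedback as the incentive; the layout attributes, candidate adjustments and reward signal are illustrative assumptions:

```python
import random

random.seed(0)

layout = {"font_size": 18, "image_x": 0, "contrast": 1.0}  # initialization layout (illustrative)

def adjust(layout):
    """Apply one candidate adjustment rule to produce a new layout."""
    new = dict(layout)
    rule = random.choice(["font_size", "image_x", "contrast"])
    if rule == "font_size":
        new["font_size"] = max(12, min(36, new["font_size"] + random.choice([-2, 2])))
    elif rule == "image_x":
        new["image_x"] += random.choice([-10, 10])
    else:
        new["contrast"] = round(new["contrast"] + random.choice([-0.1, 0.1]), 2)
    return new

def observed_ctr(layout) -> float:
    """Placeholder for the click-through rate observed after showing this layout."""
    return random.random()

# Positive incentive if CTR improves after an adjustment, negative otherwise.
layouts = [layout]
baseline = observed_ctr(layout)
for _ in range(5):
    candidate = adjust(layouts[-1])
    ctr = observed_ctr(candidate)
    if ctr > baseline:          # positive incentive: keep the adjusted layout
        layouts.append(candidate)
        baseline = ctr
print(len(layouts), layouts[-1])
```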
  • the evaluation model is used to evaluate and filter layouts.
  • the server can evaluate multiple layouts through an evaluation model and obtain evaluation results corresponding to the multiple layouts. If the evaluation result is characterized as successful, the corresponding layout will be used as a candidate layout. If the evaluation result is characterized as failure, the corresponding layout will be deleted.
  • S1042 can be implemented through S501 and S502, as follows:
  • the server can use the evaluation model to evaluate the rationality of multiple layouts and obtain corresponding evaluation results of the multiple layouts. Evaluation results include success and failure.
  • if the evaluation result of a layout indicates success, the layout passes and is used as a candidate layout.
  • the server can evaluate multiple layouts through the evaluation model and obtain evaluation results corresponding to the multiple layouts.
  • the evaluation results represent local rationality.
  • the server filters the layouts, removes unreasonable layouts, and determines candidate layouts. Since the candidate layout is the filtered result after removing unreasonable layouts, the server selects the candidate layout as a more reasonable layout.
  • S601, S602 and S603 are also implemented before S1042, as follows:
  • the server may obtain historical target multimedia information.
  • the historical layout includes positive sample data and negative sample data.
  • the server can identify and analyze the historical target multimedia information to obtain the historical layout corresponding to the historical target multimedia information.
  • the server trains the initial evaluation model through the positive sample data and negative sample data of the historical layout until the evaluation result output by the model meets the preset threshold, saves the model, and obtains the evaluation model.
  • the server trains the initial evaluation model through historical target multimedia information to determine the evaluation model, which can ensure the evaluation accuracy of the evaluation model.
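  • A minimal sketch of training the evaluation model as a binary classifier over historical layouts follows, where positive samples come from historical target multimedia information and negative samples from rejected layouts; the synthetic features and plain logistic-regression loop are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic layout feature vectors: positive samples around +1, negative samples around -1.
X = np.vstack([rng.normal(+1.0, 1.0, (50, 8)), rng.normal(-1.0, 1.0, (50, 8))])
y = np.array([1] * 50 + [0] * 50)

w, b = np.zeros(8), 0.0
for _ in range(300):  # plain logistic-regression training loop
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y) / len(y))
    b -= 0.1 * float(np.mean(p - y))

def evaluate_layout(feat: np.ndarray) -> bool:
    """Evaluation result: True = pass (candidate layout), False = fail (discard)."""
    return 1.0 / (1.0 + np.exp(-(feat @ w + b))) > 0.5

print(evaluate_layout(rng.normal(+1.0, 1.0, 8)))  # likely True for a positive-like layout
```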
  • the server may input candidate layouts into the layout optimization model, and select an optimal layout from the candidate layouts.
  • the optimal layout is obtained by using the layout optimization model to evaluate the multiple candidate layouts and obtain the index evaluation values corresponding to the multiple candidate layouts; from the multiple index evaluation values, the candidate layout with the highest index evaluation value is selected as the optimal layout.
  • the three candidate layouts are respectively evaluated with indexes through the layout optimization model, and the index evaluation values corresponding to the three candidate layouts are obtained.
  • the index evaluation value of the first candidate layout is 0.5
  • the index evaluation value of the second candidate layout is 0.7
  • the index evaluation value of the third candidate layout is 0.8
  • the third candidate layout with an index evaluation value of 0.8 is regarded as the optimal layout.
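  • A small sketch of the layout-optimization selection step, mirroring the 0.5 / 0.7 / 0.8 example above (the layout names are placeholders):

```python
candidate_layouts = ["layout_1", "layout_2", "layout_3"]
index_values = [0.5, 0.7, 0.8]  # index evaluation values from the layout optimization model

optimal_layout = candidate_layouts[index_values.index(max(index_values))]
print(optimal_layout)  # layout_3
```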
  • the server can arrange the target item information and the target content information according to the optimal layout to generate the target multimedia information.
  • S1045 Send the target multimedia information to the terminal, so that the terminal can display a browsing page based on the target multimedia information.
  • the server sends the target multimedia information to the terminal.
  • the terminal can display a browsing page based on target multimedia information.
  • the server can generate multiple layouts of target item information and target content information according to the preset layout generation model and adjustment rules, and filter the multiple layouts through the evaluation model and layout optimization model to determine the optimal layout.
  • the optimal layout is determined after removing unreasonable layouts, then the target multimedia information obtained through the optimal layout is more reasonable and accurate, thus improving the accuracy of the target multimedia information.
  • When the server recommends the target multimedia information, the target multimedia information will be more in line with the user's needs, can provide the user with personalized recommendations, and achieves a good recommendation effect.
  • Figure 9 is an optional flow diagram 9 that provides a method for generating multimedia information according to an embodiment of the present invention.
  • the server receives a user request (equivalent to a browsing request), and carries out interest product recall (equivalent to item information recall) and creative element recall (equivalent to content recall) to obtain item information and content information.
  • Multi-product selection (equivalent to fusion feature selection) is performed through cross-modal CTR estimation to obtain the optimal product content combination; among them, the multiple modalities include text (equivalent to text features), style, pictures (equivalent to image features), and video (equivalent to behavioral features).
  • Through the preset layout generation model (not shown in the figure) and the adjustment rules, element planning is performed on the product content combination to generate the layout in real time, and the final target multimedia information is determined and sent to the user (equivalent to the terminal).
  • the server can vectorize the item information and content information to obtain the item characteristics corresponding to the item and the content characteristics corresponding to the content.
  • the server converts item features and content features in different spaces into vectors in the same space and fuses them to obtain multiple fusion features. Since the fused feature is a feature with two dimensions, the fused feature has diversity.
  • the server estimates multiple fusion features based on the preset recommendation model and obtains multiple estimated values. Among them, the higher the estimated value, the better the diversity of the fusion features, and the better the diversity of the optimal product content combination corresponding to the set of fusion features with the highest estimated value.
  • the server uses the preset layout generation model and adjustment rules to perform element planning on the optimal product content combination to generate the layout in real time to determine the final target multimedia information. Since the optimal product content combination is diverse, the target multimedia information generated based on the optimal product content combination is also diverse.
  • the embodiment of the present invention also provides a multimedia information generation device, as shown in Figure 10.
  • Figure 10 is a schematic structural diagram 1 of a multimedia information generation device provided by an embodiment of the present invention. As shown in Figure 10, the multimedia information generation device 10 includes: an acquisition part 1001, a selection part 1002 and a generation part 1003; wherein,
  • the acquisition part 1001 is configured to recall item information and content information in response to a received browsing request; perform feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension , and collaborate and fuse the item features and the content features to obtain multiple sets of fusion features; each set of fusion features represents the fusion between different content modal combinations and different items;
  • the selection part 1002 is configured to estimate the multiple sets of fusion features through a preset recommendation model, and select the target item information and target content information corresponding to the set of fusion features with the highest estimated value; the preset recommendation model is used to screen the fusion features;
  • the generating part 1003 is configured to generate target multimedia information based on the target item information and the target content information.
  • the acquisition part 1001 is configured to perform feature extraction on the item information to obtain the item features corresponding to the item dimension; identify the content information to obtain content information corresponding to a content multi-modal type, the content multi-modal type including at least two modalities among text information, image information and image sequence information; and perform feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension.
  • the device for generating multimedia information further includes a determining part 1004; wherein,
  • the acquisition part 1001 is configured to perform feature extraction on the text information through the first encoding method to obtain text features if the content multi-modal type is a text type; if the content multi-modal type is an image type or image sequence type, then perform feature extraction on the image information and the image sequence information respectively through the second encoding method to obtain image features and behavioral features;
  • the determining part 1004 is configured to determine the content characteristics corresponding to the content dimension according to at least one of the text characteristics, the image characteristics and the behavioral characteristics.
  • the acquisition part 1001 is configured to, if the content multi-modal type is a text type, perform feature extraction on the text information to obtain text initial features, the text initial features including semantic expression information and word information; and encode the text initial features through the first encoding method to obtain the text features.
  • the acquisition part 1001 is configured to, if the content multimodal type is an image type, perform feature extraction on the image information to obtain image initial features; the image initial features include scene information, content information and style information; if the content multimodal type is an image sequence type, perform feature extraction on the image sequence information to obtain behavior initial features; the behavior initial features include subject target information and key frame information; and through the second encoding method, encode the image initial features and the behavior initial features respectively to obtain the image features and the behavior features.
  • the acquisition part 1001 is configured to perform collaborative processing on the item characteristics and the content characteristics to obtain the first item characteristics and the first content characteristics of the same probability distribution;
  • the first item feature includes a plurality of first sub-item features;
  • the first content feature includes a plurality of first sub-content features;
  • the multiple first sub-item features are randomly combined to obtain multiple item combination features;
  • the plurality of first sub-content features are randomly combined to obtain multiple content combination features;
  • the content combination features include content features corresponding to at least two content multi-modal types; and the multiple item combination features and the multiple content combination features are fused to obtain the multiple sets of fusion features.
  • the acquisition part 1001 is configured to input the multiple sets of fusion features into the preset recommendation model for estimation to obtain a first estimated value corresponding to each set of fusion features; select, based on the multiple first estimated values, the set of fusion features with the highest estimated value from the multiple sets of fusion features; and decode that set of fusion features to obtain the target item information and the target content information.
  • the acquisition part 1001 is configured to perform layout generation on the target item information and the target content information through a preset layout generation model to obtain multiple layouts;
  • the preset layout generation model represents adjusting the layout according to the items and the content;
  • the determination part 1004 is configured to evaluate the multiple layouts and determine candidate layouts through an evaluation model; the evaluation model is configured to evaluate and screen layouts;
  • the selection part 1002 is configured to select an optimal layout from the candidate layouts through a layout optimization model;
  • the generating part 1003 is configured to generate the target multimedia information based on the optimal layout, the target item information and the target content information.
  • the generation part 1003 is configured to generate an initialization layout corresponding to the target item information and the target content information through a preset layout generation model; the preset layout generation model includes the stacking order of image layers and constraints on the text size range in the text information;
  • the determination part 1004 is configured to adjust the initial layout and determine the multiple layouts through adjustment rules; the adjustment rules are obtained through continuous training using the object's preference as an incentive.
  • the acquisition part 1001 is configured to, before the multiple layouts are evaluated through the evaluation model and candidate layouts are determined, obtain historical target multimedia information and identify it to obtain historical layouts; the historical layouts include positive sample data and negative sample data;
  • the determination part 1004 is configured to train an initial evaluation model through the positive sample data and the negative sample data, and determine the evaluation model.
  • the acquisition part 1001 is configured to evaluate the multiple layouts through the evaluation model and obtain the evaluation results corresponding to the multiple layouts;
  • the determining part 1004 is configured to use the corresponding layout as the candidate layout if the evaluation result is characterized as successful.
  • the server can vectorize the item information and the content information to obtain the item features corresponding to the items and the content features corresponding to the content.
  • the server converts the item features and content features, which lie in different spaces, into vectors in the same space and fuses them to obtain multiple sets of fusion features. Because each fusion feature carries both dimensions, the fusion features are diverse. Based on this diversity, the server estimates the multiple sets of fusion features with the preset recommendation model and obtains multiple estimated values; a higher estimated value indicates better diversity of the fusion features, so the optimal product-content combination corresponding to the selected set of fusion features with the highest estimated value has better diversity.
  • the server uses the preset layout generation model and the adjustment rules to perform element planning on the optimal product-content combination and generate the layout in real time, thereby determining the final target multimedia information. Because the optimal product-content combination is diverse, the target multimedia information generated from it is also diverse.
  • the processing described above can be allocated to different program modules as needed, that is, the internal structure of the device can be divided into different program modules to complete all or part of the processing described above.
  • the multimedia information generation device provided by the above embodiments and the multimedia information generation method embodiments belong to the same concept. The specific implementation process and beneficial effects can be found in the method embodiments and will not be described again here. For technical details not disclosed in the device embodiment, please refer to the description of the method embodiment of the present invention for understanding.
  • the embodiment of the present invention also provides a multimedia information generation device, as shown in Figure 11.
  • Figure 11 is a schematic structural diagram of a multimedia information generation device provided by the embodiment of the present invention.
  • the multimedia information generation device 11 includes a processor 1101 and a memory 1102; the memory 1102 stores one or more programs executable by the processor, and when the one or more programs are executed, the processor 1101 performs any of the multimedia information generation methods of the above embodiments.
  • embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, etc.) embodying computer-usable program code therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flowchart processes and/or one or more block diagram blocks.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, causing a series of operational steps to be performed on the computer or other programmable apparatus to produce computer-implemented processing, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flowchart processes and/or one or more block diagram blocks.
  • Embodiments of the present invention provide a method and device for generating multimedia information, and a computer-readable storage medium.
  • the method includes: recalling item information and content information in response to a received browsing request; performing feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborating and fusing the item features and content features to obtain multiple sets of fusion features; estimating the multiple sets of fusion features through a preset recommendation model and selecting the target item information and target content information corresponding to the set of fusion features with the highest estimated value; and generating target multimedia information based on the target item information and target content information.
  • the above scheme extracts and combines features of item information and content information to obtain multiple fusion features.
  • the generated target multimedia information is diverse and has good recommendation effects.
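As an illustration only, the recall–fuse–estimate–select flow summarized above can be sketched in a few lines of Python. The toy hash-based embeddings and the linear scorer below merely stand in for the preset recommendation model described in the embodiments; all names, dimensions and data are assumptions, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(text, dim=8):
    """Toy deterministic embedding: hash the string into a fixed-size vector."""
    state = np.random.default_rng(abs(hash(text)) % (2**32))
    return state.normal(size=dim)

def fuse(item_vec, content_vec):
    """Fuse one item feature and one content feature into a joint vector."""
    return np.concatenate([item_vec, content_vec])

def estimate(fused, weights):
    """Stand-in for the preset recommendation model: a linear score squashed to (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-fused @ weights)))

items = ["item_A", "item_B"]                  # recalled item information
contents = ["copy_1", "image_2", "video_3"]   # recalled content information

weights = rng.normal(size=16)                 # 8-dim item part + 8-dim content part
candidates = [(i, c, fuse(embed(i), embed(c))) for i in items for c in contents]
scored = [(i, c, estimate(f, weights)) for i, c, f in candidates]

best_item, best_content, best_score = max(scored, key=lambda t: t[2])
print(f"selected combination: {best_item} + {best_content} (estimated value {best_score:.3f})")
```

In this toy version, the combination with the highest estimated value plays the role of the optimal product-content combination that is then passed to layout generation.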

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a multimedia information generation method and apparatus, and a computer-readable storage medium. The method comprises: recalling article information and content information in response to a received browsing request; performing feature extraction on the basis of the article information and the content information to obtain article features corresponding to article dimensions and content features corresponding to content dimensions, and performing collaboration and fusion on the article features and the content features to obtain a plurality of groups of fused features; estimating the plurality of groups of fused features by means of a preset recommendation model, and selecting target article information and target content information corresponding to a group of fused features having the highest estimated value; and generating target multimedia information on the basis of the target article information and the target content information. According to the solution, feature extraction and combination are performed on article information and content information to obtain a plurality of fused features, such that generated target multimedia information is diversified, and the recommendation effect is good.

Description

Multimedia information generation method and apparatus, and computer-readable storage medium
Cross-reference to related applications
The present invention is based on, and claims priority to, Chinese patent application No. 202211139046.2 filed on September 19, 2022, the entire content of which is hereby incorporated into the present invention by reference.
Technical field
The present invention relates to the field of computer vision, and in particular, to a method and device for generating multimedia information, and a computer-readable storage medium.
Background
In current e-commerce advertising systems, interest product recall is usually performed at product granularity. The system selects a product candidate set for the current user based on the user's historical browsing, searching, purchasing and add-to-cart behavior, selects the optimal product with the advertising system's ranking model, and generates the related advertisement only after the product has been determined. Advertisements are currently generated mostly from templates: the product's main image is inserted into a template and rendered into the corresponding product advertisement. Although this is automated, the generated advertisement recommendations are rather uniform.
Summary of the invention
Embodiments of the present invention provide a method and device for generating multimedia information, and a computer-readable storage medium, which can generate corresponding target multimedia information based on item information and content information, with good diversity and a good recommendation effect.
The technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a method for generating multimedia information. The method includes:
recalling item information and content information in response to a received browsing request;
performing feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborating and fusing the item features and the content features to obtain multiple sets of fusion features, each set of fusion features representing the fusion between a different combination of content modalities and a different item;
estimating the multiple sets of fusion features through a preset recommendation model, and selecting the target item information and target content information corresponding to the set of fusion features with the highest estimated value, the preset recommendation model being configured to screen the fusion features;
generating target multimedia information based on the target item information and the target content information.
In the above solution, performing feature extraction based on the item information and content information to obtain the item features corresponding to the item dimension and the content features corresponding to the content dimension includes:
performing feature extraction on the item information to obtain the item features corresponding to the item dimension;
identifying the content information to obtain content information corresponding to a content multi-modal type, the content multi-modal type including at least two modalities among text information, image information and image sequence information;
performing feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension.
In the above solution, performing feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension includes: if the content multi-modal type is a text type, performing feature extraction on the text information through a first encoding method to obtain text features;
if the content multi-modal type is an image type or an image sequence type, performing feature extraction on the image information and the image sequence information respectively through a second encoding method to obtain image features and behavioral features;
determining the content features corresponding to the content dimension according to at least one of the text features, the image features and the behavioral features.
In the above solution, if the content multi-modal type is a text type, performing feature extraction on the text information through the first encoding method to obtain text features includes:
if the content multi-modal type is a text type, performing feature extraction on the text information to obtain text initial features, the text initial features including semantic expression information and word information;
encoding the text initial features through the first encoding method to obtain the text features.
In the above solution, if the content multi-modal type is an image type or an image sequence type, performing feature extraction on the image information and the image sequence information respectively through the second encoding method to obtain image features and behavioral features includes:
if the content multi-modal type is an image type, performing feature extraction on the image information to obtain image initial features, the image initial features including scene information, content information and style information;
if the content multi-modal type is an image sequence type, performing feature extraction on the image sequence information to obtain behavior initial features, the behavior initial features including subject target information and key frame information;
encoding the image initial features and the behavior initial features respectively through the second encoding method to obtain the image features and the behavioral features.
In the above solution, collaborating and fusing the item features and the content features to obtain multiple sets of fusion features includes:
performing collaborative processing on the item features and the content features to obtain first item features and first content features following the same probability distribution, the first item features including multiple first sub-item features and the first content features including multiple first sub-content features;
randomly combining the multiple first sub-item features to obtain multiple item combination features;
randomly combining the multiple first sub-content features to obtain multiple content combination features, the content combination features containing content features corresponding to at least two content multi-modal types;
fusing the multiple item combination features and the multiple content combination features to obtain the multiple sets of fusion features.
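As a rough illustration of this collaboration-then-fusion step, the following sketch standardizes item and content features into a shared space, draws random sub-feature combinations, and concatenates each item combination with each content combination. The normalization, mean-pooling and concatenation choices are assumptions made for the example only, not the specific collaborative processing of the embodiments.

```python
import numpy as np

rng = np.random.default_rng(42)

def collaborate(features):
    """Map features into a shared space with roughly the same distribution (zero mean, unit variance)."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True) + 1e-8
    return (features - mean) / std

item_feats = collaborate(rng.normal(loc=5.0, scale=3.0, size=(4, 8)))      # first sub-item features
content_feats = collaborate(rng.normal(loc=-2.0, scale=0.5, size=(6, 8)))  # first sub-content features

# randomly combine sub-features (mean-pooled here), then fuse every item combination
# with every content combination by concatenation
item_combos = [item_feats[rng.choice(4, size=2, replace=False)].mean(axis=0) for _ in range(3)]
content_combos = [content_feats[rng.choice(6, size=3, replace=False)].mean(axis=0) for _ in range(4)]

fusion_groups = [np.concatenate([ic, cc]) for ic in item_combos for cc in content_combos]
print(f"{len(fusion_groups)} fusion feature groups, each of dimension {fusion_groups[0].shape[0]}")
```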
In the above solution, estimating the multiple sets of fusion features through the preset recommendation model and selecting the target item information and target content information corresponding to the set of fusion features with the highest estimated value includes:
inputting the multiple sets of fusion features into the preset recommendation model for estimation to obtain a first estimated value corresponding to each set of fusion features;
selecting, based on the multiple first estimated values, the set of fusion features with the highest estimated value from the multiple sets of fusion features;
decoding that set of fusion features to obtain the target item information and the target content information.
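A minimal sketch of this estimate-and-select step is given below, with a logistic scorer standing in for the preset recommendation model and a lookup of the (item, content) pair standing in for the decoding step; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy fusion feature groups, each tagged with the (item, content) pair it encodes
fusion_groups = {("item_A", "copy_1"): rng.normal(size=16),
                 ("item_A", "video_2"): rng.normal(size=16),
                 ("item_B", "copy_1"): rng.normal(size=16)}
weights = rng.normal(size=16)               # stand-in for the trained recommendation model

def first_estimate(fused):
    """Estimated value (a pCTR-style score) for one set of fusion features."""
    return float(1.0 / (1.0 + np.exp(-fused @ weights)))

scores = {pair: first_estimate(vec) for pair, vec in fusion_groups.items()}
best_pair = max(scores, key=scores.get)     # set of fusion features with the highest estimate
target_item, target_content = best_pair     # "decoding" back to item and content information
print(target_item, target_content, round(scores[best_pair], 3))
```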
In the above solution, generating target multimedia information based on the target item information and the target content information includes:
performing layout generation on the target item information and the target content information through a preset layout generation model to obtain multiple layouts, the preset layout generation model representing adjusting the layout according to the items and the content;
evaluating the multiple layouts through an evaluation model to determine candidate layouts, the evaluation model being used to evaluate and screen layouts;
selecting an optimal layout from the candidate layouts through a layout optimization model;
generating the target multimedia information based on the optimal layout, the target item information and the target content information.
In the above solution, performing layout generation on the target item and the target content through the preset layout generation model to obtain multiple layouts includes:
generating, through the preset layout generation model, an initialization layout corresponding to the target item information and the target content information, the preset layout generation model including the stacking order of image layers and constraints on the text size range in the text information;
adjusting the initialization layout through adjustment rules to determine the multiple layouts, the adjustment rules being obtained through continuous training with the object's degree of preference as the incentive.
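To make the layout stage concrete, here is a small sketch in which a random proposal generator stands in for the preset layout generation model, a font-size rule stands in for the evaluation model, and a simple preference score stands in for the layout optimization model; all three, and the layout fields used, are assumptions for illustration only.

```python
import random

random.seed(7)

def generate_layouts(item, content, n=5):
    """Stand-in for the preset layout generation model: propose n candidate layouts."""
    return [{"item": item, "content": content,
             "image_xy": (random.randint(0, 100), random.randint(0, 100)),
             "font_size": random.randint(12, 48)} for _ in range(n)]

def evaluate(layout):
    """Stand-in for the evaluation model: keep layouts whose text size stays in a readable range."""
    return 16 <= layout["font_size"] <= 36

def prefer(layout):
    """Stand-in for the layout optimization model: score candidates, higher is better."""
    return -abs(layout["font_size"] - 24)

layouts = generate_layouts("item_A", "copy_1")
candidates = [layout for layout in layouts if evaluate(layout)]
optimal = max(candidates, key=prefer) if candidates else layouts[0]
print("optimal layout:", optimal)
```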
In the above solution, before the multiple layouts are evaluated through the evaluation model and candidate layouts are determined, the method further includes:
obtaining historical target multimedia information;
identifying the historical target multimedia information to obtain historical layouts, the historical layouts including positive sample data and negative sample data;
training an initial evaluation model with the positive sample data and the negative sample data to determine the evaluation model.
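The evaluation model can be read as a binary classifier trained on positive and negative historical layouts. The sketch below trains a plain logistic-regression scorer on two made-up layout features (a font size and an overlap ratio); the features, data and training loop are illustrative assumptions, not the model used in the embodiments.

```python
import numpy as np

rng = np.random.default_rng(1)

# historical layouts reduced to two toy features: [font size, element overlap ratio]
# label 1 = positive sample (layout of served multimedia information), 0 = negative sample
X = np.vstack([rng.normal([24.0, 0.10], [4.0, 0.05], size=(50, 2)),   # positives
               rng.normal([40.0, 0.60], [6.0, 0.20], size=(50, 2))])  # negatives
y = np.concatenate([np.ones(50), np.zeros(50)])

mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma                                # standardize before training

w, b = np.zeros(2), 0.0
for _ in range(500):                                 # plain logistic-regression training loop
    p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))
    w -= 0.1 * (Xn.T @ (p - y)) / len(y)
    b -= 0.1 * float((p - y).mean())

new_layout = (np.array([22.0, 0.15]) - mu) / sigma
score = 1.0 / (1.0 + np.exp(-(new_layout @ w + b)))
print(f"evaluation score for the new layout: {score:.2f}")  # near 1.0 -> keep as candidate
```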
In the above solution, evaluating the multiple layouts through the evaluation model to determine candidate layouts includes:
evaluating the multiple layouts through the evaluation model to obtain an evaluation result corresponding to each of the multiple layouts;
if an evaluation result is characterized as successful, taking the corresponding layout as a candidate layout.
An embodiment of the present invention provides a device for generating multimedia information. The device includes an acquisition part, a selection part and a generation part; wherein,
the acquisition part is configured to recall item information and content information in response to a received browsing request; perform feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension; and collaborate and fuse the item features and the content features to obtain multiple sets of fusion features, each set of fusion features representing the fusion between a different combination of content modalities and a different item;
the selection part is configured to estimate the multiple sets of fusion features through a preset recommendation model and select the target item information and target content information corresponding to the set of fusion features with the highest estimated value, the preset recommendation model being configured to screen the fusion features;
the generation part is configured to generate target multimedia information based on the target item information and the target content information.
An embodiment of the present invention provides a device for generating multimedia information. The device includes:
a memory for storing executable instructions;
a processor for executing the executable instructions stored in the memory, wherein when the executable instructions are executed, the processor performs the method for generating multimedia information.
An embodiment of the present invention provides a computer-readable storage medium storing executable instructions, wherein when the executable instructions are executed by one or more processors, the processors perform the method for generating multimedia information.
Embodiments of the present invention provide a method and device for generating multimedia information, and a computer-readable storage medium. The method includes: recalling item information and content information in response to a received browsing request; performing feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborating and fusing the item features and the content features to obtain multiple sets of fusion features, each set of fusion features representing the fusion between a different combination of content modalities and a different item; estimating the multiple sets of fusion features through a preset recommendation model and selecting the target item information and target content information corresponding to the set of fusion features with the highest estimated value, the preset recommendation model being configured to screen the fusion features; and generating target multimedia information based on the target item information and the target content information. In the above solution, the server first vectorizes the item information and the content information to obtain the item features corresponding to the items and the content features corresponding to the content, converts the item features and content features, which lie in different spaces, into vectors in the same space, and fuses them to obtain multiple sets of fusion features; because each fusion feature carries both dimensions, the fusion features are diverse. The server then estimates the multiple fusion features with the preset recommendation model and obtains multiple estimated values; a higher estimated value indicates better diversity of the fusion features, so the target item information and target content information corresponding to the set of fusion features with the highest estimated value are also more diverse, and the target multimedia information generated from them is diverse as well. Finally, this method of generating multimedia information improves the diversity of the target multimedia information, thereby ensuring that personalized recommendations are provided to users and improving the recommendation effect.
Brief description of the drawings
Figure 1 is a first optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 2 is a second optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 3 is a third optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 4 is a fourth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 5 is a fifth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 6 is a sixth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 7 is a seventh optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 8 is an eighth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 9 is a ninth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention;
Figure 10 is a first schematic structural diagram of a device for generating multimedia information provided by an embodiment of the present invention;
Figure 11 is a second schematic structural diagram of a device for generating multimedia information provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In order to enable those skilled in the art to better understand the solution of the present invention, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
In current e-commerce advertising systems, the recall of items of interest is usually at SKU granularity. The system selects an item candidate set for the current user based on the user's historical browsing, searching, purchasing and add-to-cart behavior, and selects the optimal item with the advertising system's ranking model. Only after the item has been determined is the related creative generated (the advertising carrier: pictures, videos, copywriting and other creative content). The process includes several parts:
(1) Item recall
Item recall is an interest matching problem. On the one hand, the user's historical interest information is used to effectively mine related search, browsing, click and add-to-cart items. Item recall must consider both the user's long-term preferences and real-time needs, and this combination of short-term and long-term behavior supports multi-dimensional interest mining. At the same time, for promotional activities and trending information, related items can be extended to same-store and same-category items, and hot items and similar items can be added for further mining.
Another idea is to consider the similarity of crowds, commonly known as collaborative filtering: related users are effectively clustered through user profiles, and the interest preferences of users of the same type (for example, with similar consumption habits and consistent brand preferences) are transferred, which can increase the ability to mine novel items of interest.
(2) Model ranking
Model ranking is called "fine ranking" in the advertising system. Through model estimation, the single best product is selected for display. The goal of ranking is usually to maximize revenue, and the core here is pCTR estimation, that is, estimating which product will perform best (have the highest click-through rate).
Current ranking models are usually based on convolutional neural networks (Convolutional Neural Network, CNN), combining rich data features with deep learning. The model is trained on the posterior click-through rate of the data, and the learned parameters are used for online estimation. One basis for this is that e-commerce scenarios have a large amount of user access data, and the offline data and the online data follow the same normal distribution; based on this inference, it is only necessary to ensure that the model itself can take as input and learn the key item and user features.
In e-commerce advertising scenarios, a relatively mature feature system has already been established, such as the user's PIN and device information, context information, and item categories and attributes. This information has been widely used as basic features in various scenarios.
(3) Creative generation
The creative, as the advertising carrier, presents the item content. There are currently two usual approaches: one is to display creatives predefined by the merchant; the other is to generate creatives from creative element content (such as item pictures, templates, benefit points and selling points).
As the key outlet of advertising content, creative generation now has fully automatic generation capability, and creatives produced by merchants themselves can also be uniformly integrated and optimized.
Taking display pictures as an example, creatives are currently generated mostly from templates: the product's main image is inserted into a template and rendered into the corresponding item creative. Although this is automated, AI capabilities are not fully integrated or reflected, and there is much room for improving the effect.
It should be noted that the items here may be commodities.
In the interest product recall stage, product recall only targets the product itself; it does not consider the user's preference for creative forms and creative types, nor the synergistic relationship between the product itself and the creative content. The current approach lacks a unified expression and modeling of the related content, which leads to poor product recall results. In the model ranking stage, the current ranking only considers the preference for a single product, lacks the ability to estimate combinations of multiple products, lacks modeling of product attributes at different levels, and takes only product and user features as input, missing the creative dimension, especially the ability to make multi-modal estimates with copywriting, pictures and videos as features, which leads to poor recommendation results. In the creative generation stage, creative generation currently happens after recall and ranking, and the generation method is mostly template nesting, which does not reflect the user's preference for creative elements. Using templates as the carrier limits the diversity of the creatives, limits users' need for individually tailored creatives, and lacks personalized expression of user interests.
In view of the above problems, an embodiment of the present invention proposes a method for generating multimedia information. By adding the recall of creative elements in the product recall stage, intelligent creative generation evolves into an element combination problem; based on the idea of vectorized collaborative matching, a unified expression of products and creative elements is established, breaking the underlying logic of traditional e-commerce advertising that sells goods directly and, relying on the combination of the advertising ecosystem and the information-flow ecosystem, establishing a creative-driven approach to maximizing user conversion. In the model ranking stage, the selection of a single product becomes the combined selection of multiple products, and multi-modal information is fused into the model estimation. In the creative generation stage, real-time personalized creatives are generated from creative elements and product combinations, giving user interests a personalized expression.
Figure 1 is a first optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention, and the description proceeds with the steps shown in Figure 1.
S101. Recall item information and content information in response to a received browsing request.
In some embodiments of the present invention, the item information is all items to be recommended to the terminal. The item information contains multiple pieces of sub-item information, and the content information contains multiple pieces of sub-content information. A browsing request is a request formed when the user enters the information of the items to be browsed into the search box of an application browsing page or a web browsing page. For example, after a user enters "photo frame" in the search box on the browsing page of a shopping platform, a request for browsing photo frames is formed.
In some embodiments of the present invention, the server receives the browsing request sent by the terminal and, in response to the browsing request, recalls item information and content information from the item library and the content library according to the object's historical browsing information.
Exemplarily, as shown in Figure 2, based on the object's historical browsing information (equivalent to the actor), multiple commodities (equivalent to items) in the item library are input into the ranking recommendation model (Deep & Cross Network, DCN) for extraction to obtain multiple pieces of item information; the browsed images are input into a convolutional neural network (Convolutional Neural Network, CNN) for extraction to obtain content information carrying item information; the item information is then removed from the content information carrying item information to obtain Click (equivalent to the content information).
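As a rough illustration of this recall step, the sketch below retrieves the nearest items and creative contents to a user-interest vector by cosine similarity. The random embeddings merely stand in for the outputs of the DCN item model and the CNN content encoder mentioned above; the library sizes, dimensions and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy embeddings standing in for the outputs of the DCN item model and the CNN content encoder
item_library = {f"item_{i}": rng.normal(size=16) for i in range(100)}
content_library = {f"creative_{i}": rng.normal(size=16) for i in range(200)}
user_interest = rng.normal(size=16)   # aggregated from the object's historical browsing behaviour

def recall(library, query, k=5):
    """Nearest-neighbour recall by cosine similarity."""
    def cosine(vec):
        return float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec) + 1e-8))
    return sorted(library, key=lambda name: cosine(library[name]), reverse=True)[:k]

recalled_items = recall(item_library, user_interest)
recalled_contents = recall(content_library, user_interest)
print(recalled_items)
print(recalled_contents)
```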
S102. Perform item and content feature extraction based on the item information and content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborate and fuse the item features and content features to obtain multiple sets of fusion features.
In some embodiments of the present invention, each set of fusion features represents the fusion between a different combination of content modalities and a different item. Collaboration processes multiple vectors located in different vector spaces so that they are mapped into the same vector space and follow the same probability distribution; fusion forms fused vectors by combining, in different ways, multiple vectors located in the same space. The vectors may be the item features and content features of the present invention. Collaboration must be carried out before fusion; only after collaborative processing can fusion be performed. Item features are the displayed characteristics of an item, and content features are features obtained by extracting features from the picture descriptions, video descriptions and text descriptions of an item.
Exemplarily, the item information may be the attribute characteristics of the item, and the content information may be the images and copywriting other than the item attributes, that is, the creative content used to promote the item.
In some embodiments of the present invention, the server may perform feature extraction on the item information to obtain the item features corresponding to the item dimension; identify the content information to obtain the content multi-modal type; and perform feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension. The server performs collaborative processing on the item features and content features to obtain first item features and first content features following the same probability distribution, and fuses the first item features and the first content features to obtain multiple sets of fusion features.
In some embodiments of the present invention, Figure 3 is a third optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 3, performing feature extraction based on the item information and content information to obtain the item features corresponding to the item dimension and the content features corresponding to the content dimension can be implemented through S1021-S1023, as follows:
S1021. Perform feature extraction on the item information to obtain the item features corresponding to the item dimension.
In some embodiments of the present invention, the server may perform feature extraction on the item information and convert the item information into features in vector form to obtain the item features corresponding to the item dimension. An item feature is a 1024-dimensional floating-point array.
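One simple way to picture turning item information into a fixed-length vector is a hashing-trick encoder, sketched below with the 1024-dimensional size mentioned above. The attribute names and the hashing scheme are assumptions for illustration, not the encoder used in the embodiments.

```python
import numpy as np

DIM = 1024  # matches the 1024-dimensional floating-point array mentioned above

def vectorize_item(attributes, dim=DIM):
    """Hashing-trick vectorization of item attributes into a fixed-length float array."""
    vec = np.zeros(dim, dtype=np.float32)
    for key, value in attributes.items():
        index = hash(f"{key}={value}") % dim
        vec[index] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

item = {"category": "photo frame", "brand": "acme", "color": "walnut", "price_band": "mid"}
item_feature = vectorize_item(item)
print(item_feature.shape, item_feature.dtype)   # (1024,) float32
```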
S1022. Identify the content information to obtain the content information corresponding to the content multi-modal type.
In some embodiments of the present invention, the server may identify the content information through a neural network model to obtain the content information corresponding to the content multi-modal type, where the content multi-modal type includes at least two modalities among text information, image information and image sequence information. A neural network (Neural Networks, NN) model is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It reflects many basic characteristics of human brain function and is a highly complex nonlinear dynamic learning system. Neural networks have large-scale parallelism, distributed storage and processing, self-organization, self-adaptation and self-learning capabilities, and are particularly suitable for imprecise and fuzzy information processing problems that require many factors and conditions to be considered simultaneously.
S1023. Perform feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension.
In some embodiments of the present invention, the server may perform the corresponding feature extraction according to the content multi-modal type. If the content multi-modal type is a text type, feature extraction is performed on the text information through a first encoding method to obtain text features. If the content multi-modal type is an image type or an image sequence type, feature extraction is performed on the image information and the image sequence information respectively through a second encoding method to obtain image features and behavioral features. The content features corresponding to the content dimension are determined according to the text features, the image features and the behavioral features. The first encoding method mainly operates on text information; the second encoding method mainly operates on image information and image sequence information. For example, the image information may be an image and the image sequence information may be a video.
It can be understood that the server performs feature extraction on the item information and vectorizes it to obtain the item features corresponding to the items; identifies the content information to obtain the content multi-modal type; and performs feature extraction on the content information corresponding to the content multi-modal type to obtain the content features corresponding to the content dimension. Since item features and content features belong to different dimensions, the server obtains multi-dimensional features. When the target multimedia information is subsequently generated based on these multi-dimensional features, the target multimedia information carries multi-dimensional information and is therefore diverse.
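The modality-dependent branch described above can be pictured as a small dispatch function: text goes through one encoder, images and frame sequences through another. The two placeholder encoders and the dictionary format below are assumptions made for illustration only.

```python
import numpy as np

def first_encoding(text):
    """Placeholder for the first encoding method applied to text information."""
    return np.array([len(text), text.count(" ") + 1, 0.0, 1.0], dtype=float)

def second_encoding(pixels):
    """Placeholder for the second encoding method applied to images or image sequences."""
    return np.array([pixels.mean(), pixels.std(), pixels.min(), pixels.max()], dtype=float)

def encode_content(content):
    """Dispatch by content multi-modal type: text vs. image vs. image sequence."""
    if content["type"] == "text":
        return first_encoding(content["data"])
    if content["type"] in ("image", "image_sequence"):
        return second_encoding(np.asarray(content["data"], dtype=float))
    raise ValueError(f"unsupported modality: {content['type']}")

print(encode_content({"type": "text", "data": "limited time offer"}))
print(encode_content({"type": "image", "data": [[0, 255], [128, 64]]}))
```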
In some embodiments of the present invention, Figure 4 is a fourth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 4, S1023 can be implemented through S201-S203, as follows:
S201. If the content multi-modal type is a text type, perform feature extraction on the text information through the first encoding method to obtain text features.
In some embodiments of the present invention, when the content multi-modal type is a text type, the server performs feature extraction on the text information to obtain text initial features, and encodes the text initial features through the first encoding method to obtain the text features.
It should be noted that the text features are the text initial features in vector form.
In some embodiments of the present invention, S201 can be implemented through S2011-S2012, as follows:
S2011. If the content multi-modal type is a text type, perform feature extraction on the text information to obtain text initial features.
In some embodiments of the present invention, the text initial features include semantic expression information and word information.
In some embodiments of the present invention, when the content multi-modal type is a text type, the server performs feature extraction on the text information to obtain semantic expression information and word information. Both the semantic expression information and the word information are text initial features.
Exemplarily, Figure 5 is a fifth optional flow diagram of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 5, the server performs feature extraction on the copywriting information to obtain the semantic expression (equivalent to the semantic expression information) and the word segmentation (equivalent to the word information). Specifically, the semantic expression is obtained by means of Bert.
S2012. Encode the text initial features through the first encoding method to obtain the text features.
In some embodiments of the present invention, the server encodes the text initial features through the first encoding method to obtain vectorized text features.
Exemplarily, as shown in Figure 5, the first encoding method is ConCat: the server uses ConCat to encode the semantic expression (equivalent to the semantic expression information) and the word segmentation (equivalent to the word information) to obtain a feature vector (equivalent to the text features).
It can be understood that the server performs feature extraction and encoding on the text information to obtain the text features. In this process, the server converts the text information into vectorized text features, which facilitates the subsequent collaboration and fusion of the item features and content features.
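A minimal sketch of this "concatenate semantic and word features" step is given below. The placeholder embedding functions only mimic the shape of a Bert-style sentence vector and averaged word vectors; a real system would obtain these from its own models, and the dimensions and tokenization used here are assumptions.

```python
import numpy as np

def semantic_embedding(text, dim=8):
    """Placeholder for a Bert-style sentence embedding (a real system would call the model here)."""
    state = np.random.default_rng(abs(hash(text)) % (2**32))
    return state.normal(size=dim)

def word_embedding(tokens, dim=8):
    """Placeholder for averaged word-segmentation features."""
    return np.mean([semantic_embedding(token, dim) for token in tokens], axis=0)

copy_text = "limited time offer on walnut photo frames"
tokens = copy_text.split()                            # stand-in for word segmentation
text_feature = np.concatenate([semantic_embedding(copy_text), word_embedding(tokens)])  # the ConCat step
print(text_feature.shape)   # (16,)
```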
S202、若内容多模态类型为图像类型或图像序列类型,则通过第二编码方式分别对图像信息和图像序列信息进行特征提取,得到图像特征和行为特征。S202. If the content multi-modal type is an image type or an image sequence type, perform feature extraction on the image information and image sequence information respectively through the second encoding method to obtain image features and behavioral features.
在本发明的一些实施例中,服务器根据内容多模态类型为图像类型,则对图像信息进行特征提取,得到图像初始特征。根据内容多模态类型为图像序列类型,则对图像序列信息进行特征提取,得到行为初始特征。通过第二编码方式,对图像初始特征和行为初始特征分别进行编码处理,得到图像特征和行为特征。In some embodiments of the present invention, the server performs feature extraction on the image information according to the content multi-modal type being the image type to obtain initial image features. According to the content multi-modal type being an image sequence type, feature extraction is performed on the image sequence information to obtain initial behavioral features. Through the second coding method, the initial features of the image and the initial features of the behavior are respectively coded to obtain the image features and the behavior features.
在本发明的一些实施例中,S202可以通过S2021-S2023实现,如下:In some embodiments of the present invention, S202 can be implemented through S2021-S2023, as follows:
S2021、若内容多模态类型为图像类型,则对图像信息进行特征提取,得到图像初始特征。S2021. If the content multi-modal type is an image type, perform feature extraction on the image information to obtain initial image features.
在本发明的一些实施例中,图像初始特征包括场景信息、内容信息和风格信息。In some embodiments of the present invention, the initial image features include scene information, content information and style information.
在本发明的一些实施例中,服务器可以根据内容多模态类型为图像类型,对图像信息进行特征提取,得到场景信息、内容信息和风格信息。场景信息、内容信息和风格信息都是图像初始特征。In some embodiments of the present invention, the server can extract features from the image information according to the content multimodal type being the image type, and obtain scene information, content information and style information. The scene information, content information and style information are all initial features of the image.
Exemplarily, as shown in Figure 5, the image information may be a promotional picture displaying an item; the server performs feature extraction on the image information to obtain the scene (equivalent to the scene information), the content and the subject (the content and the subject are equivalent to the content information), and the color, style and layout (the color, style and layout are equivalent to the style information). The scene, content, subject, color, style and layout are all initial image features.
S2022、若内容多模态类型为图像序列类型,则对图像序列信息进行特征提取,得到行为初始特征。S2022. If the content multi-modal type is an image sequence type, perform feature extraction on the image sequence information to obtain initial behavioral features.
在本发明的一些实施例中,行为初始特征包括主体目标信息和关键帧信息。In some embodiments of the present invention, the behavior initial features include subject target information and key frame information.
In some embodiments of the present invention, the server may perform feature extraction on the image sequence information, based on the content multi-modal type being the image sequence type, to obtain the subject target information and the key frame information. The subject target information and the key frame information are both initial behavioral features.
Exemplarily, as shown in Figure 5, the server performs feature extraction on the image sequence information to obtain key frames and highlight points (the key frames and highlight points are equivalent to the key frame information), and summaries and subject target behavior actions (the summaries and subject target behavior actions are equivalent to the subject target information). The key frames, highlight points, summaries and subject target behavior actions are all initial behavioral features, and all of them belong to the content list.
S2023、通过第二编码方式,对图像初始特征和行为初始特征分别进行编码处理,得到图像特征和行为特征。S2023. Through the second encoding method, the initial image features and the initial behavioral features are respectively encoded to obtain the image features and behavioral features.
在本发明的一些实施例中,服务器通过第二编码方式,对图像初始特征进行编码处理,得到向量化后的图像特征;对行为初始特征进行编码处理,得到向量化后的行为特征。In some embodiments of the present invention, the server encodes the initial image features through the second encoding method to obtain vectorized image features; it encodes the initial behavioral features to obtain vectorized behavioral features.
示例性的,如图5所示,第二编码方式是One Hot,服务器通过One Hot对场景、内容、主体、颜色、风格和布局进行特征编码,得到特征向量(相当于图像特征)。服务器通过One Hot对关键帧、精彩点、摘要、主体目标行为动作进行特征编码,得到特征向量(相当于行为特征)。For example, as shown in Figure 5, the second encoding method is One Hot. The server performs feature encoding on the scene, content, subject, color, style and layout through One Hot to obtain a feature vector (equivalent to image features). The server uses One Hot to perform feature encoding on key frames, highlights, summaries, and subject target behaviors to obtain feature vectors (equivalent to behavioral features).
It can be understood that the server performs feature extraction and encoding on the image information and the image sequence information to obtain the image features and the behavioral features. The server can convert the image information and the image sequence information into vectorized image features and vectorized behavioral features respectively, thereby obtaining multi-modal content features and making the content features diverse.
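As a small illustration of the One Hot encoding step in S2023, the sketch below one-hot encodes two categorical image attributes and concatenates them into an image feature vector. The category vocabularies (`SCENES`, `STYLES`) are assumptions for illustration; the real attribute sets are not disclosed in the text.

```python
import numpy as np

SCENES = ["indoor", "outdoor", "studio"]   # hypothetical scene categories
STYLES = ["minimal", "vivid", "retro"]     # hypothetical style categories

def one_hot(value: str, categories: list[str]) -> np.ndarray:
    vec = np.zeros(len(categories))
    if value in categories:
        vec[categories.index(value)] = 1.0
    return vec

def encode_image_feature(scene: str, style: str) -> np.ndarray:
    # Each categorical attribute (scene, content, subject, color, style, layout, ...)
    # is one-hot encoded, and the pieces are concatenated into one image feature vector.
    return np.concatenate([one_hot(scene, SCENES), one_hot(style, STYLES)])

print(encode_image_feature("outdoor", "vivid"))   # e.g. [0. 1. 0. 0. 1. 0.]
```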
S203、根据文本特征、图像特征和行为特征中的至少一种,确定内容维度对应的内容特征。S203. Determine content features corresponding to the content dimension based on at least one of text features, image features, and behavioral features.
在本发明的一些实施例中,服务器将文本特征、图像特征和行为特征中的至少一种作为内容维度对应的内容特征。In some embodiments of the present invention, the server uses at least one of text features, image features, and behavioral features as content features corresponding to the content dimension.
Exemplarily, the server may determine the text features as the content features corresponding to the content dimension; or the server may determine the image features as the content features corresponding to the content dimension; or the server may determine the behavioral features as the content features corresponding to the content dimension; or the server may determine the text features and the image features as the content features corresponding to the content dimension; or the server may determine the text features and the behavioral features as the content features corresponding to the content dimension; or the server may determine the image features and the behavioral features as the content features corresponding to the content dimension; or the server may determine the text features, the image features and the behavioral features as the content features corresponding to the content dimension.
可以理解的是,服务器可以对内容信息进行识别和特征提取,得到文本特征、图像特征和行为特征。服务器可以根据文本特征、图像特征和行为特征中的一种特征确定内容维度对应的内容特征;或者,根据文本特征、图像特征和行为特征中的两种特征确定内容维度对应的内容特征;或者,服务器可以根据文本特征、图像特征和行为特征中的三种特征确定内容维度对应的内容特征。由于,内容特征具有一种或者多种多模态特征,因此,内容特征具有多样性。It is understandable that the server can identify and extract features of the content information to obtain text features, image features, and behavior features. The server can determine the content features corresponding to the content dimension based on one of the text features, image features, and behavior features; or, the server can determine the content features corresponding to the content dimension based on two of the text features, image features, and behavior features; or, the server can determine the content features corresponding to the content dimension based on three of the text features, image features, and behavior features. Since the content features have one or more multimodal features, the content features are diverse.
在本发明的一些实施例中,将物品特征和内容特征进行协同和融合,得到多组融合特征可以通过S301-S303实现,如下:In some embodiments of the present invention, collaboration and fusion of item features and content features to obtain multiple sets of fusion features can be achieved through S301-S303, as follows:
S301、对物品特征和内容特征,进行协同处理,得到同一概率分布的第一物品特征和第一内容特征。S301. Coordinately process item features and content features to obtain first item features and first content features with the same probability distribution.
在本发明的一些实施例中,第一物品特征包括多个第一子物品特征;第一内容特征包括多个第一子内容特征。In some embodiments of the present invention, the first item feature includes a plurality of first sub-item features; the first content feature includes a plurality of first sub-content features.
In some embodiments of the present invention, to account for the differences between feature domains, the server performs collaborative learning on the item features and the content features, and maps the item features and the content features into the same vector space to obtain the first item features and the first content features that follow the same probability distribution.
需要说明的是,协同处理是对多个位于不同向量空间的向量进行处理,使其映射到同一个向量空间,使之满足相同的概率分布;协同处理与协同的技术手段一致。It should be noted that collaborative processing is to process multiple vectors located in different vector spaces so that they are mapped to the same vector space and satisfy the same probability distribution; the technical means of collaborative processing and collaboration are consistent.
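The sketch below is one minimal way to realize this collaborative processing: both feature matrices are projected into a shared space and standardized so they follow the same zero-mean, unit-variance distribution. The random projection matrices stand in for learned projection layers; nothing about the actual model architecture is disclosed here.

```python
import numpy as np

def collaborate(item_feats: np.ndarray, content_feats: np.ndarray,
                dim: int = 64, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Map both feature sets into one shared `dim`-dimensional space and
    standardize them so they share the same probability distribution."""
    rng = np.random.default_rng(seed)
    w_item = rng.standard_normal((item_feats.shape[1], dim))       # stand-in projection
    w_content = rng.standard_normal((content_feats.shape[1], dim)) # stand-in projection
    both = np.vstack([item_feats @ w_item, content_feats @ w_content])
    both = (both - both.mean(axis=0)) / (both.std(axis=0) + 1e-8)  # same distribution
    n_item = item_feats.shape[0]
    return both[:n_item], both[n_item:]   # first item features, first content features

first_item, first_content = collaborate(np.random.rand(12, 32), np.random.rand(6, 20))
```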
S302、对多个第一子物品特征进行随机组合,得到多个物品组合特征。S302. Randomly combine multiple first sub-item features to obtain multiple item combination features.
在本发明的一些实施例中,服务器可以对多个第一子物品特征进行随机组合,得到多个不一样的物品组合特征。In some embodiments of the present invention, the server can randomly combine multiple first sub-item features to obtain multiple different item combination features.
Exemplarily, the server randomly combines 12 first sub-item features (the 12 first sub-item features are different from one another) to obtain 5 item combination features, which respectively contain 6 first sub-item features, 8 first sub-item features, 3 first sub-item features, 5 first sub-item features and 9 first sub-item features. It should be noted that the 5 item combination features may contain the same first sub-item features or different first sub-item features.
S303、对多个第一子内容特征进行随机组合,得到多个内容组合特征。S303. Randomly combine multiple first sub-content features to obtain multiple content combination features.
在本发明的一些实施例中,内容组合特征包含至少两种内容多模态类型对应的内容特征。In some embodiments of the present invention, the content combination features include content features corresponding to at least two content multi-modal types.
在本发明的一些实施例中,服务器可以对多个第一子内容特征进行随机组合,得到多个不一样的内容组合特征。In some embodiments of the present invention, the server can randomly combine multiple first sub-content features to obtain multiple different content combination features.
Exemplarily, the server randomly combines 6 first sub-content features (the 6 first sub-content features are different from one another, either in the content multi-modal types they contain or in the content features themselves) to obtain 2 content combination features. One of the content combination features contains content features corresponding to three content multi-modal types, namely 2 text features, 3 image features and 1 behavioral feature; the other content combination feature contains content features corresponding to two content multi-modal types, namely 2 text features and 1 image feature.
S304、对多个物品组合特征和多个内容组合特征进行融合,得到多组融合特征。S304. Fusion of multiple item combination features and multiple content combination features to obtain multiple sets of fusion features.
在本发明的一些实施例中,服务器可以对多个物品组合特征和多个内容组合特征进行融合,得到多组融合特征;一组融合特征包括至少一个物品组合特征和至少一个内容组合特征。In some embodiments of the present invention, the server can fuse multiple item combination features and multiple content combination features to obtain multiple sets of fusion features; one set of fusion features includes at least one item combination feature and at least one content combination feature.
Exemplarily, the server fuses the 5 item combination features and the 2 content combination features to obtain 3 groups of fused features, namely the first group, the second group and the third group. The first group of fused features includes 3 first sub-item features and three content multi-modal types, with 2 text features, 3 image features and 1 behavioral feature; the second group of fused features includes 8 first sub-item features and two content multi-modal types, with 2 text features and 1 image feature; the third group includes 13 first sub-item features and three content multi-modal types, with 4 text features, 4 image features and 1 behavioral feature.
It can be understood that the server processes the item features and the content features located in different vector spaces so that they are mapped into the same vector space and follow the same probability distribution; in this way the two kinds of features lie in the same vector space, which facilitates the subsequent fusion of the two kinds of features. The server randomly combines the multiple first sub-item features to obtain multiple item feature combinations; since each item combination contains multiple first sub-item features, the item feature combinations are diverse. The server randomly combines the multiple first sub-content features to obtain multiple content feature combinations; since each content combination contains multiple first sub-content features, the content feature combinations are diverse. The server fuses the item feature combinations and the content feature combinations to obtain multiple groups of fused features; since the fused features include both the item feature combinations and the content feature combinations, the fused features are diverse.
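The following sketch ties S302-S304 together: random subsets of sub-features are drawn, and each item group is fused with each content group. Mean pooling followed by concatenation is an assumed fusion operator chosen only for illustration; the application does not specify the fusion function.

```python
import random
import numpy as np

rng_np = np.random.default_rng(0)
first_sub_item_feats = [rng_np.standard_normal(64) for _ in range(12)]     # dummy data
first_sub_content_feats = [rng_np.standard_normal(64) for _ in range(6)]   # dummy data

def random_combinations(features, n_groups, seed=0):
    """Draw `n_groups` random subsets of the sub-features (subset sizes chosen at random)."""
    rng = random.Random(seed)
    return [rng.sample(features, rng.randint(1, len(features))) for _ in range(n_groups)]

def fuse(item_group, content_group):
    """One group of fused features: mean-pooled item vector concatenated with mean-pooled content vector."""
    return np.concatenate([np.mean(item_group, axis=0), np.mean(content_group, axis=0)])

item_groups = random_combinations(first_sub_item_feats, n_groups=5)
content_groups = random_combinations(first_sub_content_feats, n_groups=2, seed=1)
fused_groups = [fuse(ig, cg) for ig in item_groups for cg in content_groups]
```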
S103、通过预设的推荐模型,对多组融合特征进行预估,选择预估值最高的一组融合特征对应的目标物品信息和目标内容信息。S103. Estimate multiple groups of fusion features through a preset recommendation model, and select the target item information and target content information corresponding to a group of fusion features with the highest estimated value.
在本发明的一些实施例中,服务器可以将多组融合特征输入预设的推荐模型中进行预估,得到多组融合特征各自对应的第一预估值。基于多个第一预估值,从多组融合特征中,选择预估值最高的一组融合特征。对一组融合特征进行解码处理,得到目标物品信息和目标内容信息。In some embodiments of the present invention, the server can input multiple sets of fusion features into a preset recommendation model for prediction, and obtain first estimated values corresponding to each of the multiple sets of fusion features. Based on multiple first estimated values, a set of fused features with the highest estimated value is selected from multiple sets of fused features. Decode a set of fused features to obtain target item information and target content information.
Exemplarily, the number of items From1 is set, where From1 ranges over (0, M), and the number of creatives (i.e., contents) From2 is set, where From2 ranges over (0, N). The server performs a traversal exploration to obtain multiple items; for the multiple items, the vectors of the multiple items are fused; for the multiple creatives, the multi-modal features from the recall stage are fused into feature vectors; at the same time, the fused creative vectors (i.e., the fused features) are input into the estimation model (i.e., the preset recommendation model) to obtain CTR estimates; the combination with the highest pCTR estimate is selected for output as the overall estimation result (i.e., the target item information and the target content information corresponding to the group of fused features with the highest estimated value).
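The traversal described above can be sketched as the exhaustive search below: item subsets of size up to From1 and creative subsets of size up to From2 are enumerated, each pair is fused and scored, and the highest-scoring combination is kept. `predict_ctr` is a stand-in for the preset recommendation model, and mean pooling is an assumed fusion operator; both are illustrative, not disclosed implementation details.

```python
import itertools
import numpy as np

def best_combination(item_vecs, creative_vecs, predict_ctr, m=2, n=2):
    """Enumerate item/creative subsets, fuse each pair, and keep the highest pCTR combination."""
    best, best_score = None, -1.0
    for k1 in range(1, m + 1):
        for items in itertools.combinations(range(len(item_vecs)), k1):
            item_vec = np.mean([item_vecs[i] for i in items], axis=0)
            for k2 in range(1, n + 1):
                for creatives in itertools.combinations(range(len(creative_vecs)), k2):
                    creative_vec = np.mean([creative_vecs[j] for j in creatives], axis=0)
                    score = predict_ctr(np.concatenate([item_vec, creative_vec]))
                    if score > best_score:
                        best, best_score = (items, creatives), score
    return best, best_score

item_vecs = [np.random.rand(16) for _ in range(4)]
creative_vecs = [np.random.rand(16) for _ in range(3)]
w = np.random.rand(32)
best, score = best_combination(item_vecs, creative_vecs, lambda v: 1 / (1 + np.exp(-v @ w)))
```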
在本发明的一些实施例中,S103可以通过S1031、S1032和S1033实现,如下:In some embodiments of the present invention, S103 can be implemented through S1031, S1032 and S1033, as follows:
S1031、将多组融合特征输入预设的推荐模型中进行预估,得到多组融合特征各自对应的第一预估值。S1031. Input multiple sets of fusion features into the preset recommendation model for prediction, and obtain the first estimated values corresponding to each of the multiple sets of fusion features.
在本发明的一些实施例中,服务器通过预设的推荐模型对多组融合特征进行预估,得到多组融合特征各自对应的第一预估值。In some embodiments of the present invention, the server estimates the multiple groups of fused features through a preset recommendation model to obtain first estimated values corresponding to each of the multiple groups of fused features.
Exemplarily, the server estimates 3 groups of fused features through the preset recommendation model, and obtains first estimated values of 0.7, 0.85 and 0.62 corresponding to the 3 groups of fused features respectively.
S1032、基于多个第一预估值,从多组融合特征中,选择预估值最高的一组融合特征。S1032. Based on multiple first estimated values, select a set of fusion features with the highest estimated value from multiple sets of fusion features.
在本发明的一些实施例中,服务器通过多个第一预估值,从多组融合特征中,选择预估值最高的一组融合特征。In some embodiments of the present invention, the server selects a set of fusion features with the highest estimated value from multiple sets of fusion features based on multiple first estimated values.
示例性的,服务器从3组融合特征分别对应的第一预估值0.7、0.85和0.62中,选择预估值0.85的融合特征。For example, the server selects the fusion feature with an estimated value of 0.85 from the first estimated values 0.7, 0.85 and 0.62 corresponding to the three sets of fusion features respectively.
S1033、对一组融合特征进行解码处理,得到目标物品信息和目标内容信息。S1033. Decode a set of fusion features to obtain target item information and target content information.
在本发明的一些实施例中,服务器可以通过对一组融合特征进行解码处理,将融合特征转化为目标物品信息和目标内容信息。In some embodiments of the present invention, the server can decode a set of fused features to convert the fused features into target item information and target content information.
Exemplarily, the server decodes the first group of fused features to obtain the target item information and the target content information; the target item information includes 3 items, and the target content information includes three content multi-modal types, with 2 pieces of text, 3 images and 1 image sequence.
It can be understood that the server estimates the multiple fused features according to the preset recommendation model to obtain multiple estimated values. A higher estimated value indicates better diversity of the fused features, so the target item information and the target content information corresponding to the group of fused features with the highest estimated value have better diversity, and the target multimedia information generated according to the target item information and the target content information is therefore diverse.
S104、基于目标物品信息和目标内容信息,生成目标多媒体信息。S104. Generate target multimedia information based on the target item information and target content information.
在本发明的一些实施例中,服务器可以通过预设的布局生成模型,对目标物品信息和目标内容信息进行布局生成,得到多个布局。通过评价模型,对多个布局进行评估,确定候选布局。通过布局优选模型,从候选布局中,选择最优布局。根据最优布局、目标物品信息和目标内容信息,生成目标多媒体信息。将目标多媒体信息发送至终端,供终端基于目标多媒体信息进行浏览页面的展示。In some embodiments of the present invention, the server can perform layout generation on target item information and target content information through a preset layout generation model to obtain multiple layouts. Through the evaluation model, multiple layouts are evaluated and candidate layouts are determined. Through the layout optimization model, the optimal layout is selected from the candidate layouts. Target multimedia information is generated based on the optimal layout, target item information and target content information. The target multimedia information is sent to the terminal, so that the terminal displays the browsing page based on the target multimedia information.
Exemplarily, Figure 6 is an optional flow diagram 6 of a multimedia information generation method provided by an embodiment of the present invention. As shown in Figure 6, the traditional multimedia information generation process is as follows: a user request (equivalent to a browsing request) is received, and the server recalls products (equivalent to item recall) to obtain product information; the product information is ranked by a model, and the product information corresponding to the Top1 model is selected as the recommended product information; templated creatives are generated, and the product information is fused to obtain the multimedia information. Figure 7 is an optional flow diagram 7 of a multimedia information generation method provided by an embodiment of the present invention. As shown in Figure 7, data A/B (equivalent to the target item information and the target content information) are input into the online learning module of the server, and an initialized layout (not shown in the figure) is generated through the preset layout generation model. The initial layout is adjusted through the adjustment rules in terms of text size, element position, color and contrast to obtain multiple layouts. The multiple layouts are evaluated through the evaluation model (indicated by +++ in Figure 7) to obtain evaluation results, which include pass and fail; if the evaluation result is "pass", the layout plan (equivalent to the candidate layouts) is output, where the layout plan includes four layouts, namely 1, 2, 3 and 4. The layout plan is optimized through the layout optimization model to obtain the optimal style (the optimal layout), which may be copy, a picture, a video or an intermediate page. The target multimedia information is generated through the real-time multimedia information generation engine.
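The Figure 7 layout stage can be summarized by the structural sketch below: generate an initialized layout, apply adjustment rules, keep the layouts that pass evaluation, and select the preferred one. Every callable (`generate`, `adjust_rules`, `evaluate`, `prefer`) is a toy stand-in for the corresponding model; the concrete models are not disclosed in the text.

```python
def build_layout(items, contents, generate, adjust_rules, evaluate, prefer):
    initial = generate(items, contents)                       # initialized layout
    layouts = [rule(initial) for rule in adjust_rules]        # text size / position / color / contrast tweaks
    candidates = [l for l in layouts if evaluate(l) == "pass"]
    return max(candidates, key=prefer)                        # layout optimization model picks the best

# Toy stand-ins, only to show the call shape:
best = build_layout(
    ["item"], ["copy"],
    generate=lambda i, c: {"elements": i + c, "font": 20},
    adjust_rules=[lambda l, d=d: {**l, "font": l["font"] + d} for d in (-4, 0, 4)],
    evaluate=lambda l: "pass" if l["font"] >= 16 else "fail",
    prefer=lambda l: l["font"],
)
```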
It can be understood that the server vectorizes the item information and the content information to obtain the item features corresponding to the items and the content features corresponding to the contents. The server converts the item features and the content features in different spaces into vectors in the same space and fuses them to obtain multiple groups of fused features. Since a fused feature carries both dimensions, the fused features are diverse. The server estimates the multiple fused features according to the preset recommendation model to obtain multiple estimated values. A higher estimated value indicates better diversity of the fused features, so the target item information and the target content information corresponding to the group of fused features with the highest estimated value have better diversity, and the target multimedia information generated according to the target item information and the target content information is therefore diverse. Finally, the multimedia information generation method can improve the diversity of the target multimedia information, thereby ensuring that personalized recommendations are provided to users and improving the recommendation effect.
在本发明的一些实施例中,图8为本发明实施例提供一种多媒体信息的生成方法的一个可选的流程示意图八,如图8所示,S104可以通过S1041-S1045实现,如下:In some embodiments of the present invention, Figure 8 is an optional flow diagram 8 of a method for generating multimedia information provided by an embodiment of the present invention. As shown in Figure 8, S104 can be implemented through S1041-S1045, as follows:
S1041、通过预设的布局生成模型,对目标物品信息和目标内容信息进行布局生成,得到多个布局。S1041. Use a preset layout generation model to generate layouts for target item information and target content information to obtain multiple layouts.
在本发明的一些实施例中,预设的布局生成模型包括图像层的先后叠放顺序和文本信息中文字大小范围约束。In some embodiments of the present invention, the preset layout generation model includes the stacking order of image layers and text size range constraints in text information.
在本发明的一些实施例中,服务器可以通过预设的布局生成模型,生成目标物品信息和目标内容信息对应的初始化布局。通过调整规则,对初始化布局进行调整,确定多个布局。In some embodiments of the present invention, the server can generate an initialization layout corresponding to the target item information and the target content information through a preset layout generation model. By adjusting the rules, adjust the initial layout and determine multiple layouts.
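As a small illustration of the constraints named for the preset layout generation model (image-layer stacking order and a text size range), the check below validates a candidate layout. The field names and numeric bounds are assumptions introduced only for this sketch.

```python
def layout_is_valid(image_layers, text_elements, min_pt=12, max_pt=48):
    """Check that image layers keep their stacking order and text sizes stay inside a range."""
    order_ok = all(a["z"] <= b["z"] for a, b in zip(image_layers, image_layers[1:]))
    size_ok = all(min_pt <= t["font_pt"] <= max_pt for t in text_elements)
    return order_ok and size_ok

print(layout_is_valid([{"z": 0}, {"z": 1}], [{"font_pt": 18}, {"font_pt": 30}]))  # True
```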
在本发明的一些实施例中,S1041可以通过S401和S402实现,如下:In some embodiments of the present invention, S1041 may be implemented by S401 and S402 as follows:
S401、通过预设的布局生成模型,生成目标物品信息和目标内容信息对应的初始化布局。S401. Generate an initialization layout corresponding to the target item information and target content information through a preset layout generation model.
在本发明的一些实施例中,服务器可以将目标物品信息和目标内容信息输入预设的布局生成模型中,生成目标物品信息和目标内容信息对应的初始化布局。初始化布局是指对目标物品信息和目标内容信息的位置进行排列组合得到的。In some embodiments of the present invention, the server can input the target item information and the target content information into a preset layout generation model, and generate an initialization layout corresponding to the target item information and the target content information. The initial layout refers to the arrangement and combination of the positions of target item information and target content information.
S402、通过调整规则,对初始化布局进行调整,确定多个布局。S402. Adjust the initial layout by adjusting rules to determine multiple layouts.
In some embodiments of the present invention, the adjustment rules are obtained through continuous training with the preference degree of the object as the incentive. Specifically, based on reinforcement learning, the preference degree of the object serves as the incentive: if the click-through rate is higher after an adjustment, the incentive is positive; if the click-through rate becomes lower, the incentive is negative. The rules are obtained through repeated adjustment and learning.
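A toy sketch of this reinforcement signal, assuming each adjustment rule carries a scalar weight that is nudged up when the click-through rate improves and down when it drops; the weight, learning rate and update form are illustrative assumptions, not the disclosed training procedure.

```python
def update_rule_weight(weight: float, ctr_after: float, ctr_before: float, lr: float = 0.1) -> float:
    """Positive reward when the adjustment raises CTR, negative reward otherwise."""
    reward = 1.0 if ctr_after > ctr_before else -1.0
    return weight + lr * reward

w = 0.5
w = update_rule_weight(w, ctr_after=0.032, ctr_before=0.028)   # w -> 0.6
```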
在本发明的一些实施例中,服务器通过调整规则,对初始化布局进行调整,得到多个布局。In some embodiments of the present invention, the server adjusts the initial layout by adjusting rules to obtain multiple layouts.
It can be understood that the server can generate the initialized layout corresponding to the target item information and the target content information through the preset layout generation model, adjust the initialized layout through the adjustment rules, and determine multiple layouts of the target item information and the target content information. Adjusting the initialized layout through the adjustment rules corrects unreasonable layout patterns and can improve the rationality of the layouts. The layouts after adjustment can still include multiple layouts, so the adjusted layouts remain diverse.
S1042、通过评价模型,对多个布局进行评估,确定候选布局。S1042. Evaluate multiple layouts through an evaluation model to determine candidate layouts.
在本发明的一些实施例中,评价模型用于对布局进行评价筛选。In some embodiments of the present invention, the evaluation model is used to evaluate and filter layouts.
在本发明的一些实施例中,服务器可以通过评价模型,对多个布局进行评估,得到多个布局各自对应的评估结果。若评估结果表征为成功,则将其对应的布局作为候选布局,若评估结果表征为失败,则将其对应的布局删除。In some embodiments of the present invention, the server can evaluate multiple layouts through an evaluation model and obtain evaluation results corresponding to the multiple layouts. If the evaluation result is characterized as successful, the corresponding layout will be used as a candidate layout. If the evaluation result is characterized as failure, the corresponding layout will be deleted.
在本发明的一些实施例中,S1042可以通过S501和S502实现,如下:In some embodiments of the present invention, S1042 can be implemented through S501 and S502, as follows:
S501、通过评价模型,对多个布局进行评估,得到多个布局各自对应的评估结果。S501. Use the evaluation model to evaluate multiple layouts and obtain evaluation results corresponding to the multiple layouts.
在本发明的一些实施例中,服务器可以通过评价模型,对多个布局进行合理性评估,得到多个布局各自对应的评估结果。评价结果包括成功和失败。In some embodiments of the present invention, the server can use the evaluation model to evaluate the rationality of multiple layouts and obtain corresponding evaluation results of the multiple layouts. Evaluation results include success and failure.
S502、若评估结果表征为成功,则将其对应的布局作为候选布局。S502. If the evaluation result indicates success, use the corresponding layout as a candidate layout.
In some embodiments of the present invention, when the evaluation result of a layout indicates success, meaning that the layout passes, the server may take that layout as a candidate layout.
It can be understood that the server can evaluate the multiple layouts through the evaluation model and obtain the evaluation results corresponding to the multiple layouts, and the evaluation results represent local rationality. According to the evaluation results, the server filters the layouts, removes unreasonable layouts and determines the candidate layouts. Since the candidate layouts are the result of filtering out unreasonable layouts, the candidate layouts selected by the server are layouts of higher rationality.
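A minimal sketch of this S501-S502 filtering step: layouts whose evaluation result is "success" become candidates and the rest are dropped. `evaluate` is a stand-in for the evaluation model.

```python
def filter_candidates(layouts, evaluate):
    """Keep only layouts whose evaluation result is success; delete the rest."""
    return [layout for layout in layouts if evaluate(layout) == "success"]

candidates = filter_candidates([{"id": 1}, {"id": 2}], lambda l: "success" if l["id"] == 1 else "failure")
```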
在本发明的一些实施例中,在S1042之前还执行S601、S602和S603实现,如下:In some embodiments of the present invention, S601, S602 and S603 are also implemented before S1042, as follows:
S601、获取历史目标多媒体信息。S601. Obtain historical target multimedia information.
在本发明的一些实施例中,服务器可以获取历史目标多媒体信息。In some embodiments of the present invention, the server may obtain historical target multimedia information.
S602、对历史目标多媒体信息进行识别,得到历史布局。S602. Identify the historical target multimedia information and obtain the historical layout.
在本发明的一些实施例中,历史布局包括正样本数据和负样本数据。In some embodiments of the invention, the historical layout includes positive sample data and negative sample data.
在本发明的一些实施例中,服务器可以对历史目标多媒体信息进行识别解析,得到历史目标多媒体信息对应的历史布局。In some embodiments of the present invention, the server can identify and analyze the historical target multimedia information to obtain the historical layout corresponding to the historical target multimedia information.
S603、通过正样本数据和负样本数据对初始评价模型进行训练,确定评价模型。S603. Train the initial evaluation model through positive sample data and negative sample data to determine the evaluation model.
在本发明的一些实施例中,服务器通过历史布局的正样本数据和负样本数据对初始评价模型进行训练,直到模型输出的评估结果满足预设阈值,保存模型,得到评价模型。In some embodiments of the present invention, the server trains the initial evaluation model through the positive sample data and negative sample data of the historical layout until the evaluation result output by the model meets the preset threshold, saves the model, and obtains the evaluation model.
可以理解的是,服务器通过历史目标多媒体信息对初始评价模型进行训练,确定评价模型,可以保证评价模型的评估准确性。It is understandable that the server trains the initial evaluation model through historical target multimedia information to determine the evaluation model, which can ensure the evaluation accuracy of the evaluation model.
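The training step in S601-S603 can be sketched as an ordinary binary classifier fitted on vectorized historical layouts, with positive samples labeled 1 and negative samples labeled 0. Logistic regression, the random features and the 0.5 threshold are illustrative assumptions; the actual model and training criterion are not disclosed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is a vectorized historical layout.
X = np.random.rand(200, 16)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)        # 1 = positive sample, 0 = negative sample

evaluation_model = LogisticRegression().fit(X, y)

# A layout "passes" when its predicted success probability clears a preset threshold.
passes = evaluation_model.predict_proba(X[:1])[0, 1] >= 0.5
```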
S1043、通过布局优选模型,从候选布局中,选择最优布局。S1043. Select the optimal layout from the candidate layouts through the layout optimization model.
In some embodiments of the present invention, the server may input the candidate layouts into the layout optimization model and select the optimal layout from the candidate layouts. The optimal layout is determined by evaluating the multiple candidate layouts with the layout optimization model to obtain the index evaluation values corresponding to the multiple candidate layouts, and selecting, from the multiple index evaluation values, the candidate layout with the highest index evaluation value as the optimal layout.
示例性的,有3个候选布局,通过布局优选模型对3个侯选布局分别进行指标评价,得到3个候选布局对应的指标评价值。第一候选布局的指标评价值为0.5、第二候选布局的指标评价值为0.7、第三候选布局的指标评价值为0.8;将指标评价值为0.8的第三侯选布局作为最优布局。For example, there are three candidate layouts, and the three candidate layouts are respectively evaluated with indexes through the layout optimization model, and the index evaluation values corresponding to the three candidate layouts are obtained. The index evaluation value of the first candidate layout is 0.5, the index evaluation value of the second candidate layout is 0.7, and the index evaluation value of the third candidate layout is 0.8; the third candidate layout with an index evaluation value of 0.8 is regarded as the optimal layout.
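A one-function sketch of this preference step, where `index_score` stands in for the layout optimization model (assumed, not disclosed) and the candidate with the highest index evaluation value is returned.

```python
def choose_optimal_layout(candidates, index_score):
    scores = [index_score(c) for c in candidates]        # e.g. 0.5, 0.7, 0.8
    return candidates[scores.index(max(scores))]         # highest index evaluation value wins

optimal = choose_optimal_layout(["layout_a", "layout_b", "layout_c"], {"layout_a": 0.5, "layout_b": 0.7, "layout_c": 0.8}.get)
```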
S1044、基于最优布局、目标物品信息和目标内容信息,生成目标多媒体信息。S1044. Generate target multimedia information based on the optimal layout, target item information and target content information.
在本发明的一些实施例中,服务器可以将目标物品信息和目标内容信息按照最优布局进行排列,生成目标多媒体信息。In some embodiments of the present invention, the server can arrange the target item information and the target content information according to the optimal layout to generate the target multimedia information.
S1045、将目标多媒体信息发送至终端,供终端基于目标多媒体信息进行浏览页面的展示。S1045: Send the target multimedia information to the terminal, so that the terminal can display a browsing page based on the target multimedia information.
在本发明的一些实施例中,服务器将目标多媒体信息发送至终端。终端可以展示基于目标多媒体信息进行浏览页面。In some embodiments of the present invention, the server sends the target multimedia information to the terminal. The terminal can display a browsing page based on target multimedia information.
可以理解的是,服务器可以根据预设的布局生成模型和调整规则生成目标物品信息和目标内容信息的多个布局,通过评价模型和布局优选模型,对多个布局进行筛选,确定最优布局。其中,最优布局为去除不合理的布局后确定的,那么通过最优布局得到的目标多媒体信息就比较合理和准确,因此,提高目标多媒体信息的准确性。那么服务器将目标多媒体信息进行推荐时,目标多媒体信息会更符合用户的需求,可以提供给用户个性化的推荐,推荐效果好。 It can be understood that the server can generate multiple layouts of target item information and target content information according to the preset layout generation model and adjustment rules, and filter the multiple layouts through the evaluation model and layout optimization model to determine the optimal layout. Among them, the optimal layout is determined after removing unreasonable layouts, then the target multimedia information obtained through the optimal layout is more reasonable and accurate, thus improving the accuracy of the target multimedia information. Then when the server recommends the target multimedia information, the target multimedia information will be more in line with the user's needs, and can provide the user with personalized recommendations, and the recommendation effect is good.
In some embodiments of the present invention, Figure 9 is an optional flow diagram 9 of a multimedia information generation method provided by an embodiment of the present invention. As shown in Figure 9, the server receives a user request (equivalent to a browsing request); it performs interest product recall (equivalent to item information recall) and creative element recall (equivalent to content recall) to obtain the item information and the content information. Vectorized collaborative modeling is performed on the item information and the content information to obtain the item features and the content features. The item features and the content features are fused to obtain the fused features (not shown in Figure 9; they are obtained before being input into the cross-modal CTR estimation). Multi-product preference (equivalent to fused feature preference) is performed through cross-modal CTR estimation to obtain the optimal product-content combination, where the multiple modalities include text (equivalent to the text features), style, pictures (equivalent to the image features) and video (equivalent to the behavioral features). Through the preset layout generation model (not shown in the figure) and the adjustment rules, element planning is performed on the product-content combination to generate the layout in real time, and the final target multimedia information is determined and sent to the user (equivalent to the terminal).
可以理解的是,首先,服务器可以将物品信息和内容信息进行向量化表示,得到物品对应物品特征和内容对应的内容特征。服务器将不同空间下的物品特征和内容特征转化为同一空间下的向量进行融合,得到多个融合特征。由于融合特征是具有两个维度的特征,因此,融合特征具有多样性。基于融合特征具有多样性的特性,服务器根据预设的推荐模型对多个融合特征进行预估,得到多个预估值。其中,预估值越高代表融合特征的多样性越好,对应选择预估值最高的一组融合特征对应的最优商品内容组合的多样性就越好。其次,服务器通过预设的布局生成模型和调整规则,对最优商品内容组合进行元素规划实时生成布局,确定最终目标多媒体信息。由于,最优商品内容组合具有多样性,根据最优商品内容组合生成的目标多媒体信息也具有多样性。It can be understood that, first, the server can vectorize the item information and content information to obtain the item characteristics corresponding to the item and the content characteristics corresponding to the content. The server converts item features and content features in different spaces into vectors in the same space and fuses them to obtain multiple fusion features. Since the fused feature is a feature with two dimensions, the fused feature has diversity. Based on the diversity of fusion features, the server estimates multiple fusion features based on the preset recommendation model and obtains multiple estimated values. Among them, the higher the estimated value, the better the diversity of the fusion features, and the better the diversity of the optimal product content combination corresponding to the set of fusion features with the highest estimated value. Secondly, the server uses the preset layout generation model and adjustment rules to perform element planning on the optimal product content combination to generate the layout in real time to determine the final target multimedia information. Since the optimal product content combination is diverse, the target multimedia information generated based on the optimal product content combination is also diverse.
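Putting the stages of Figure 9 together, the structural sketch below shows one possible orchestration of the flow described above. Every argument after `request` is a stand-in callable for the corresponding stage; none of these names or signatures comes from the application itself.

```python
def generate_multimedia(request, recall_items, recall_creatives, extract, collaborate,
                        fuse_groups, predict_ctr, plan_layout):
    items, creatives = recall_items(request), recall_creatives(request)          # recall stage
    item_feats, content_feats = collaborate(extract(items), extract(creatives))  # vectorized collaborative modeling
    groups = fuse_groups(item_feats, content_feats)                              # groups of fused features
    best = max(groups, key=predict_ctr)                                          # cross-modal CTR preference
    return plan_layout(best)                                                     # element planning / real-time layout
```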
Based on the multimedia information generation method of the above embodiments, an embodiment of the present invention further provides a multimedia information generation apparatus. As shown in Figure 10, Figure 10 is a first structural schematic diagram of a multimedia information generation apparatus provided by an embodiment of the present invention. The multimedia information generation apparatus 10 includes an acquisition part 1001, a selection part 1002 and a generation part 1003, wherein:
the acquisition part 1001 is configured to recall item information and content information in response to a received browsing request; perform feature extraction based on the item information and the content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension; and collaborate and fuse the item features and the content features to obtain multiple groups of fused features, where each group of fused features represents a fusion between a different combination of content modalities and different items;
the selection part 1002 is configured to estimate the multiple groups of fused features through a preset recommendation model, and select the target item information and the target content information corresponding to the group of fused features with the highest estimated value, where the preset recommendation model represents performing preference selection on the fused features;
所述生成部分1003,被配置为基于所述目标物品信息和所述目标内容信息,生成目标多媒体信息。The generating part 1003 is configured to generate target multimedia information based on the target item information and the target content information.
In some embodiments of the present invention, the acquisition part 1001 is configured to perform feature extraction on the item information to obtain the item features corresponding to the item dimension; identify the content information to obtain content information corresponding to content multi-modal types, where the content multi-modal types include at least two modalities among text information, image information and image sequence information; and perform feature extraction on the content information corresponding to the content multi-modal types to obtain the content features corresponding to the content dimension.
在本发明的一些实施例中,所述多媒体信息的生成装置还包括确定部分1004;其中,In some embodiments of the present invention, the device for generating multimedia information further includes a determining part 1004; wherein,
所述获取部分1001,被配置为若所述内容多模态类型为文本类型,则通过第一编码方式对所述文本信息进行特征提取,得到文本特征;若所述内容多模态类型为图像类型或图像序列类型,则通过第二编码方式分别对所述图像信息和所述图像序列信息进行特征提取,得到图像特征和行为特征; The acquisition part 1001 is configured to perform feature extraction on the text information through the first encoding method to obtain text features if the content multi-modal type is a text type; if the content multi-modal type is an image type or image sequence type, then perform feature extraction on the image information and the image sequence information respectively through the second encoding method to obtain image features and behavioral features;
所述确定部分1004,被配置为根据所述文本特征、所述图像特征和所述行为特征中的至少一种,确定内容维度对应的所述内容特征。The determining part 1004 is configured to determine the content characteristics corresponding to the content dimension according to at least one of the text characteristics, the image characteristics and the behavioral characteristics.
在本发明的一些实施例中,所述获取部分1001,被配置为若所述内容多模态类型为文本类型,则对所述文本信息进行特征提取,得到文本初始特征;所述文本初始特征包括语义表达信息和词语信息;通过所述第一编码方式,对所述文本初始特征进行编码处理,得到所述文本特征。In some embodiments of the present invention, the acquisition part 1001 is configured to perform feature extraction on the text information to obtain initial text features if the content multi-modal type is a text type; the text initial features It includes semantic expression information and word information; through the first encoding method, the text initial features are encoded to obtain the text features.
在本发明的一些实施例中,所述获取部分1001,被配置为若所述内容多模态类型为图像类型,则对所述图像信息进行特征提取,得到图像初始特征;所述图像初始特征包括场景信息、内容信息和风格信息;若所述内容多模态类型为图像序列类型,则对所述图像序列信息进行特征提取,得到行为初始特征;所述行为初始特征包括主体目标信息和关键帧信息;通过所述第二编码方式,对所述图像初始特征和所述行为初始特征分别进行编码处理,得到所述图像特征和所述行为特征。In some embodiments of the present invention, the acquisition part 1001 is configured to, if the content multimodal type is an image type, perform feature extraction on the image information to obtain image initial features; the image initial features include scene information, content information and style information; if the content multimodal type is an image sequence type, perform feature extraction on the image sequence information to obtain behavior initial features; the behavior initial features include subject target information and key frame information; and through the second encoding method, encode the image initial features and the behavior initial features respectively to obtain the image features and the behavior features.
In some embodiments of the present invention, the acquisition part 1001 is configured to coordinately process the item features and the content features to obtain first item features and first content features that follow the same probability distribution, where the first item features include multiple first sub-item features and the first content features include multiple first sub-content features; randomly combine the multiple first sub-item features to obtain multiple item combination features; randomly combine the multiple first sub-content features to obtain multiple content combination features, where each content combination feature contains content features corresponding to at least two content multi-modal types; and fuse the multiple item combination features and the multiple content combination features to obtain the multiple groups of fused features.
In some embodiments of the present invention, the acquisition part 1001 is configured to input the multiple groups of fused features into the preset recommendation model for estimation to obtain the first estimated values corresponding to the multiple groups of fused features; select, based on the multiple first estimated values, the group of fused features with the highest estimated value from the multiple groups of fused features; and decode the group of fused features to obtain the target item information and the target content information.
In some embodiments of the present invention, the acquisition part 1001 is configured to perform layout generation on the target item information and the target content information through a preset layout generation model to obtain multiple layouts, where the preset layout generation model represents adjusting the layout according to the items and the contents;
所述确定部分1004,被配置为通过评价模型,对所述多个布局进行评估,确定候选布局;所述评价模型被配置为对布局进行评价筛选;The determination part 1004 is configured to evaluate the multiple layouts and determine candidate layouts through an evaluation model; the evaluation model is configured to evaluate and screen layouts;
所述选择部分1002,被配置为通过布局优选模型,从所述候选布局中,选择最优布局;The selection part 1002 is configured to select an optimal layout from the candidate layouts through a layout optimization model;
所述生成部分1003,被配置为基于所述最优布局、所述目标物品信息和所述目标内容信息,生成所述目标多媒体信息。The generating part 1003 is configured to generate the target multimedia information based on the optimal layout, the target item information and the target content information.
In some embodiments of the present invention, the generation part 1003 is configured to generate the initialized layout corresponding to the target item information and the target content information through the preset layout generation model, where the preset layout generation model includes constraints on the stacking order of image layers and on the range of text sizes in the text information;
所述确定部分1004,被配置为通过调整规则,对所述初始化布局进行调整,确定所述多个布局;所述调整规则是将对象的偏好程度作为激励,通过不断训练得到的。The determination part 1004 is configured to adjust the initial layout and determine the multiple layouts through adjustment rules; the adjustment rules are obtained through continuous training using the object's preference as an incentive.
In some embodiments of the present invention, before the multiple layouts are evaluated through the evaluation model and the candidate layouts are determined, the acquisition part 1001 is configured to obtain historical target multimedia information, and identify the historical target multimedia information to obtain historical layouts, where the historical layouts include positive sample data and negative sample data;
所述确定部分1004,被配置为通过所述正样本数据和所述负样本数据对初始评价模型进行训练,确定所述评价模型。The determination part 1004 is configured to train an initial evaluation model through the positive sample data and the negative sample data, and determine the evaluation model.
在本发明的一些实施例中,所述获取部分1001,被配置为通过所述评价模型,对所述多个布局进行评估,得到所述多个布局各自对应的评估结果;In some embodiments of the present invention, the acquisition part 1001 is configured to evaluate the multiple layouts through the evaluation model and obtain the evaluation results corresponding to the multiple layouts;
所述确定部分1004,被配置为若所述评估结果表征为成功,则将其对应的布局作为所述候选布局。The determining part 1004 is configured to use the corresponding layout as the candidate layout if the evaluation result is characterized as successful.
可以理解的是,首先,服务器可以将物品信息和内容信息进行向量化表示,得到物品对应物品特征和内容对应的内容特征。服务器将不同空间下的物品特征和内容特征转化为同一空间下的向量进行融合,得到多个融合特征。由于融合特征是具有两个维度的特征,因此,融合特征具有多样性。基于融合特征具有多样性的特性,服务器根据预设的推荐模型对多个融合特征进行预估,得到多个预估值;其中,预估值越高代表融合特征的多样性越好,对应选择预估值最高的一组融合特征对应的最优商品内容组合的多样性就越好。其次,服务器通过预设的布局生成模型和调整规则,对最优商品内容组合进行元素规划实时生成布局,确定最终目标多媒体信息。由于,最优商品内容组合具有多样性,根据最优商品内容组合生成的目标多媒体信息也具有多样性。It can be understood that, first, the server can vectorize the item information and content information to obtain the item characteristics corresponding to the item and the content characteristics corresponding to the content. The server converts item features and content features in different spaces into vectors in the same space and fuses them to obtain multiple fusion features. Since the fused feature is a feature with two dimensions, the fused feature has diversity. Based on the diversity of fusion features, the server estimates multiple fusion features based on the preset recommendation model and obtains multiple estimates; among them, the higher the estimate, the better the diversity of the fusion features, and the corresponding selection The set of fusion features with the highest estimated value corresponds to a better diversity of optimal product content combinations. Secondly, the server uses the preset layout generation model and adjustment rules to perform element planning on the optimal product content combination to generate the layout in real time to determine the final target multimedia information. Since the optimal product content combination is diverse, the target multimedia information generated based on the optimal product content combination is also diverse.
It should be noted that, when multimedia information is generated, the division into the above program modules is used only as an example for description. In practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the multimedia information generation apparatus provided by the above embodiments and the embodiments of the multimedia information generation method belong to the same concept; for the specific implementation process and beneficial effects, refer to the method embodiments, which are not repeated here. For technical details not disclosed in this apparatus embodiment, refer to the description of the method embodiments of the present invention.
Based on the multimedia information generation method of the above embodiments, an embodiment of the present invention further provides a multimedia information generation apparatus. As shown in Figure 11, Figure 11 is a second structural schematic diagram of a multimedia information generation apparatus provided by an embodiment of the present invention. The multimedia information generation apparatus 11 includes a processor 1101 and a memory 1102; the memory 1102 stores one or more programs executable by the processor, and when the one or more programs are executed, the processor 1101 performs any one of the multimedia information generation methods of the foregoing embodiments.
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用硬件实施例、软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, etc.) embodying computer-usable program code therein.
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a use A device for realizing the functions specified in one process or multiple processes of the flowchart and/or one block or multiple blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
以上所述,仅为本发明的较佳实施例而已,并非用于限定本发明的保护范围。The above descriptions are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention.
Industrial Applicability
An embodiment of the present invention provides a multimedia information generation method and apparatus, and a computer-readable storage medium. The method includes: recalling item information and content information in response to a received browsing request; performing feature extraction based on the item information and the content information to obtain item features corresponding to the item dimension and content features corresponding to the content dimension, and collaborating and fusing the item features and the content features to obtain multiple groups of fused features; estimating the multiple groups of fused features through a preset recommendation model, and selecting the target item information and target content information corresponding to the group of fused features with the highest estimated value; and generating target multimedia information based on the target item information and the target content information. With the above scheme, feature extraction and combination are performed on the item information and the content information to obtain multiple fused features, so the generated target multimedia information is diverse and the recommendation effect is good.

Claims (24)

  1. A method for generating multimedia information, comprising:
    recalling item information and content information in response to a received browsing request;
    performing feature extraction based on the item information and the content information to obtain item features corresponding to an item dimension and content features corresponding to a content dimension, and collaborating and fusing the item features and the content features to obtain multiple groups of fused features, wherein each group of fused features represents a fusion between a different combination of content modalities and a different item;
    estimating the multiple groups of fused features through a preset recommendation model, and selecting target item information and target content information corresponding to a group of fused features with the highest estimated value, wherein the preset recommendation model is used to screen the fused features;
    generating target multimedia information based on the target item information and the target content information.
  2. The method according to claim 1, wherein performing feature extraction based on the item information and the content information to obtain the item features corresponding to the item dimension and the content features corresponding to the content dimension comprises:
    performing feature extraction on the item information to obtain the item features corresponding to the item dimension;
    identifying the content information to obtain content information corresponding to content multi-modal types, wherein the content multi-modal types include at least two modalities among text information, image information and image sequence information;
    performing feature extraction on the content information corresponding to the content multi-modal types to obtain the content features corresponding to the content dimension.
  3. The method according to claim 2, wherein performing feature extraction on the content information corresponding to the content multi-modal types to obtain the content features corresponding to the content dimension comprises:
    in a case where the content multi-modal type is a text type, performing feature extraction on the text information through a first encoding method to obtain text features;
    in a case where the content multi-modal type is an image type or an image sequence type, performing feature extraction on the image information and the image sequence information respectively through a second encoding method to obtain image features and behavioral features;
    determining the content features corresponding to the content dimension according to at least one of the text features, the image features and the behavioral features.
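A minimal sketch of the modality routing described in claim 3 above, assuming simple hashed placeholder encoders: text goes through a "first" encoding path, while single images and image sequences go through a "second" path that also yields behavioral features. The encoder functions, dictionary keys and feature dimensions are illustrative assumptions, not the encoders of the embodiments.

```python
# Hypothetical illustration: route each content modality to its own encoder.
from typing import Dict, List

def encode_text(text: str, dim: int = 8) -> List[float]:
    # Placeholder "first encoding method": character-level hashed bag-of-words.
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec

def encode_image(pixels: List[int], dim: int = 8) -> List[float]:
    # Placeholder "second encoding method" for a single image.
    vec = [0.0] * dim
    for i, p in enumerate(pixels):
        vec[i % dim] += float(p)
    return vec

def encode_sequence(frames: List[List[int]], dim: int = 8) -> List[float]:
    # Behavioral features: average the per-frame encodings of an image sequence.
    acc = [0.0] * dim
    for f in frames:
        for i, v in enumerate(encode_image(f, dim)):
            acc[i] += v
    return [v / max(len(frames), 1) for v in acc]

def content_features(content: Dict) -> Dict[str, List[float]]:
    feats = {}
    if "text" in content:
        feats["text"] = encode_text(content["text"])
    if "image" in content:
        feats["image"] = encode_image(content["image"])
    if "image_sequence" in content:
        feats["behavior"] = encode_sequence(content["image_sequence"])
    return feats

print(content_features({"text": "red sneakers", "image": [1, 2, 3, 4],
                        "image_sequence": [[1, 2], [3, 4]]}))
```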
  4. The method according to claim 3, wherein, in the case where the content multi-modal type is the text type, performing feature extraction on the text information through the first encoding method to obtain the text features comprises:
    in the case where the content multi-modal type is the text type, performing feature extraction on the text information to obtain initial text features, wherein the initial text features include semantic expression information and word information;
    encoding the initial text features through the first encoding method to obtain the text features.
  5. The method according to claim 3, wherein, in the case where the content multi-modal type is the image type or the image sequence type, performing feature extraction on the image information and the image sequence information respectively through the second encoding method to obtain the image features and the behavioral features comprises:
    in the case where the content multi-modal type is the image type, performing feature extraction on the image information to obtain initial image features, wherein the initial image features include scene information, content information and style information;
    in the case where the content multi-modal type is the image sequence type, performing feature extraction on the image sequence information to obtain initial behavioral features, wherein the initial behavioral features include subject target information and key frame information;
    encoding the initial image features and the initial behavioral features respectively through the second encoding method to obtain the image features and the behavioral features.
  6. The method according to claim 1, wherein collaborating and fusing the item features and the content features to obtain the multiple groups of fused features comprises:
    performing collaborative processing on the item features and the content features to obtain first item features and first content features of the same probability distribution, wherein the first item features include multiple first sub-item features, and the first content features include multiple first sub-content features;
    randomly combining the multiple first sub-item features to obtain multiple item combination features;
    randomly combining the multiple first sub-content features to obtain multiple content combination features, wherein the content combination features include content features corresponding to at least two content multi-modal types;
    fusing the multiple item combination features and the multiple content combination features to obtain the multiple groups of fused features.
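The collaboration-and-fusion step of claim 6 above can be illustrated as follows, under the assumption that "the same probability distribution" is approximated by a softmax normalization and that fusion is plain concatenation; both are stand-ins chosen only for the example.

```python
# Sketch: normalize item and content sub-features, form random combinations on
# each side, then fuse every item combination with every content combination.
import math
import random
from itertools import product
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def random_combinations(sub_feats: List[List[float]], k: int) -> List[List[float]]:
    # Draw k random subsets and concatenate each subset into one combined vector.
    combos = []
    for _ in range(k):
        chosen = random.sample(sub_feats, random.randint(1, len(sub_feats)))
        combos.append([v for feat in chosen for v in feat])
    return combos

def fuse_groups(item_subs: List[List[float]], content_subs: List[List[float]],
                k: int = 2) -> List[List[float]]:
    item_subs = [softmax(f) for f in item_subs]        # same probability distribution
    content_subs = [softmax(f) for f in content_subs]
    item_combos = random_combinations(item_subs, k)
    content_combos = random_combinations(content_subs, k)
    # Fuse by concatenating each (item combination, content combination) pair.
    return [ic + cc for ic, cc in product(item_combos, content_combos)]

groups = fuse_groups([[1.0, 2.0], [0.5, 0.5]], [[3.0, 1.0], [2.0, 2.0]])
print(len(groups), "fused feature groups")
```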
  7. The method according to any one of claims 1 to 6, wherein estimating the multiple groups of fused features through the preset recommendation model and selecting the target item information and the target content information corresponding to the group of fused features with the highest estimated value comprises:
    inputting the multiple groups of fused features into the preset recommendation model for estimation to obtain a first estimated value corresponding to each of the multiple groups of fused features;
    selecting, based on the multiple first estimated values, the group of fused features with the highest estimated value from the multiple groups of fused features;
    decoding the group of fused features to obtain the target item information and the target content information.
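A small sketch of the selection step in claim 7 above, assuming a fixed linear scorer in place of the preset recommendation model and carrying the originating item/content identifiers alongside each fused group so that "decoding" can be shown as a simple lookup; all names are illustrative.

```python
# Sketch: score every fused group, keep the highest-scoring one, and map it
# back to the item/content identifiers it was built from.
from typing import Dict, List, Tuple

def recommendation_model(fused: List[float], weights: List[float]) -> float:
    # Placeholder model: a fixed linear scorer standing in for the preset model.
    return sum(w * x for w, x in zip(weights, fused))

def select_best(groups: List[Tuple[List[float], Dict]],
                weights: List[float]) -> Dict:
    scored = [(recommendation_model(feat, weights), meta) for feat, meta in groups]
    best_score, best_meta = max(scored, key=lambda p: p[0])
    # "Decoding" here simply returns the metadata that produced the winning group.
    return {"score": best_score, **best_meta}

groups = [([0.2, 0.9, 0.1], {"item_id": "sku-1", "content_id": "c-7"}),
          ([0.8, 0.3, 0.5], {"item_id": "sku-2", "content_id": "c-3"})]
print(select_best(groups, weights=[1.0, 0.5, 0.25]))
```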
  8. The method according to any one of claims 1 to 6, wherein generating the target multimedia information based on the target item information and the target content information comprises:
    performing layout generation on the target item information and the target content information through a preset layout generation model to obtain multiple layouts, wherein the preset layout generation model is used to adjust the layout according to the item and the content;
    evaluating the multiple layouts through an evaluation model to determine candidate layouts, wherein the evaluation model is used to evaluate and screen layouts;
    selecting an optimal layout from the candidate layouts through a layout optimization model;
    generating the target multimedia information based on the optimal layout, the target item information and the target content information.
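The layout stage of claim 8 above, sketched with placeholder models: a generator that varies font size and layer order, an evaluation model reduced to a readability rule, and a preference score standing in for the layout optimization model. None of these stand-ins is the disclosed implementation.

```python
# Sketch: generate several layouts, screen them with an evaluation model,
# then pick the best surviving candidate with a separate preference score.
import random
from typing import Dict, List

def generate_layouts(item: Dict, content: Dict, n: int = 4) -> List[Dict]:
    # Placeholder layout generation: vary font size and image layer order.
    return [{"font_size": random.randint(12, 32),
             "layers": random.sample(["image", "text", "logo"], 3)}
            for _ in range(n)]

def evaluation_model(layout: Dict) -> bool:
    # Placeholder screening rule: reject unreadable font sizes.
    return 14 <= layout["font_size"] <= 28

def preference_score(layout: Dict) -> float:
    # Placeholder "layout optimization" score: prefer text above the image layer.
    return 1.0 if layout["layers"].index("text") < layout["layers"].index("image") else 0.0

def build_multimedia(item: Dict, content: Dict) -> Dict:
    layouts = generate_layouts(item, content)
    candidates = [l for l in layouts if evaluation_model(l)]
    best = max(candidates or layouts, key=preference_score)
    return {"layout": best, "item": item, "content": content}

print(build_multimedia({"sku": 1}, {"text": "Autumn sale", "image": "banner.jpg"}))
```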
  9. The method according to claim 8, wherein performing layout generation on the target item and the target content through the preset layout generation model to obtain the multiple layouts comprises:
    generating an initialization layout corresponding to the target item information and the target content information through the preset layout generation model, wherein the preset layout generation model includes constraints on the stacking order of image layers and on the text size range in the text information;
    adjusting the initialization layout through adjustment rules to determine the multiple layouts, wherein the adjustment rules are obtained through continuous training with an object's preference degree serving as an incentive.
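One possible, simplified reading of the adjustment rules in claim 9 above is greedy hill-climbing against a learned preference signal; the sketch below uses a hand-written preference function as that signal, which is an assumption made purely for illustration and not the trained rules of the embodiments.

```python
# Sketch: start from an initialization layout and keep adjustments that the
# preference signal (the "incentive") rewards.
import random
from typing import Dict, List

def initial_layout() -> Dict:
    # Initialization respecting the constraints named in the claim:
    # an image-layer stacking order and a bounded text size range.
    return {"font_size": 12, "layers": ["image", "text", "logo"]}

def preference(layout: Dict) -> float:
    # Placeholder preference signal (would come from object feedback in practice).
    return -abs(layout["font_size"] - 20)

def adjust(layout: Dict, rounds: int = 10) -> List[Dict]:
    layouts, current = [dict(layout)], dict(layout)
    for _ in range(rounds):
        proposal = dict(current)
        proposal["font_size"] = max(10, min(36, proposal["font_size"] + random.choice([-2, 2])))
        if preference(proposal) >= preference(current):   # keep changes the incentive rewards
            current = proposal
        layouts.append(dict(current))
    return layouts

print(adjust(initial_layout())[-1])
```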
  10. The method according to claim 8, wherein, before evaluating the multiple layouts through the evaluation model to determine the candidate layouts, the method further comprises:
    obtaining historical target multimedia information;
    identifying the historical target multimedia information to obtain historical layouts, wherein the historical layouts include positive sample data and negative sample data;
    training an initial evaluation model with the positive sample data and the negative sample data to determine the evaluation model.
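The training step of claim 10 above can be illustrated with a tiny logistic-regression trainer over positive and negative historical layout samples; the model family and the two-dimensional layout features are assumptions made for the example only.

```python
# Sketch: fit a small binary classifier on historical layouts labelled as
# positive (1) or negative (0) samples, then use it to evaluate new layouts.
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_eval_model(samples: List[Tuple[List[float], int]],
                     lr: float = 0.1, epochs: int = 200) -> List[float]:
    # samples: (layout feature vector, label); last weight acts as the bias.
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in samples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = pred - y
            for i in range(dim):
                w[i] -= lr * err * x[i]
            w[-1] -= lr * err
    return w

def evaluate(w: List[float], x: List[float]) -> bool:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) > 0.5

history = [([0.9, 0.2], 1), ([0.8, 0.3], 1), ([0.1, 0.9], 0), ([0.2, 0.8], 0)]
model = train_eval_model(history)
print(evaluate(model, [0.85, 0.25]))   # expected: True (resembles the positive samples)
```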
  11. The method according to claim 8, wherein evaluating the multiple layouts through the evaluation model to determine the candidate layouts comprises:
    evaluating the multiple layouts through the evaluation model to obtain an evaluation result corresponding to each of the multiple layouts;
    in a case where an evaluation result is characterized as successful, taking the layout corresponding thereto as a candidate layout.
  12. An apparatus for generating multimedia information, comprising an acquisition part, a selection part and a generation part, wherein:
    the acquisition part is configured to recall item information and content information in response to a received browsing request; perform feature extraction based on the item information and the content information to obtain item features corresponding to an item dimension and content features corresponding to a content dimension; and collaborate and fuse the item features and the content features to obtain multiple groups of fused features, wherein each group of fused features represents a fusion between a different combination of content modalities and a different item;
    the selection part is configured to estimate the multiple groups of fused features through a preset recommendation model, and select target item information and target content information corresponding to a group of fused features with the highest estimated value, wherein the preset recommendation model is used to select preferred fused features;
    the generation part is configured to generate target multimedia information based on the target item information and the target content information.
  13. The apparatus according to claim 12, wherein the acquisition part is further configured to: perform feature extraction on the item information to obtain the item features corresponding to the item dimension; identify the content information to obtain content information corresponding to content multi-modal types, wherein the content multi-modal types include at least two modalities among text information, image information and image sequence information; and perform feature extraction on the content information corresponding to the content multi-modal types to obtain the content features corresponding to the content dimension.
  14. The apparatus according to claim 13, wherein the apparatus for generating multimedia information further comprises a determination part;
    the acquisition part is further configured to: in a case where the content multi-modal type is a text type, perform feature extraction on the text information through a first encoding method to obtain text features; and in a case where the content multi-modal type is an image type or an image sequence type, perform feature extraction on the image information and the image sequence information respectively through a second encoding method to obtain image features and behavioral features;
    the determination part is configured to determine the content features corresponding to the content dimension according to at least one of the text features, the image features and the behavioral features.
  15. The apparatus according to claim 14, wherein the acquisition part is further configured to: in the case where the content multi-modal type is the text type, perform feature extraction on the text information to obtain initial text features, wherein the initial text features include semantic expression information and word information; and encode the initial text features through the first encoding method to obtain the text features.
  16. The apparatus according to claim 14, wherein the acquisition part is further configured to: in the case where the content multi-modal type is the image type, perform feature extraction on the image information to obtain initial image features, wherein the initial image features include scene information, content information and style information; in the case where the content multi-modal type is the image sequence type, perform feature extraction on the image sequence information to obtain initial behavioral features, wherein the initial behavioral features include subject target information and key frame information; and encode the initial image features and the initial behavioral features respectively through the second encoding method to obtain the image features and the behavioral features.
  17. The apparatus according to claim 12, wherein the acquisition part is further configured to: perform collaborative processing on the item features and the content features to obtain first item features and first content features of the same probability distribution, wherein the first item features include multiple first sub-item features and the first content features include multiple first sub-content features; randomly combine the multiple first sub-item features to obtain multiple item combination features; randomly combine the multiple first sub-content features to obtain multiple content combination features, wherein the content combination features include content features corresponding to at least two content multi-modal types; and fuse the multiple item combination features and the multiple content combination features to obtain the multiple groups of fused features.
  18. The apparatus according to any one of claims 12 to 17, wherein the acquisition part is further configured to input the multiple groups of fused features into the preset recommendation model for estimation to obtain a first estimated value corresponding to each of the multiple groups of fused features;
    the selection part is further configured to select, based on the multiple first estimated values, a group of fused features with the highest estimated value from the multiple groups of fused features;
    the acquisition part is further configured to decode the group of fused features to obtain the target item information and the target content information.
  19. The apparatus according to any one of claims 12 to 17, wherein the acquisition part is further configured to perform layout generation on the target item information and the target content information through a preset layout generation model to obtain multiple layouts, wherein the preset layout generation model is used to adjust the layout according to the item and the content;
    the determination part is further configured to evaluate the multiple layouts through an evaluation model to determine candidate layouts, wherein the evaluation model is used to evaluate and screen layouts;
    the selection part is further configured to select an optimal layout from the candidate layouts through a layout optimization model;
    the generation part is further configured to generate the target multimedia information based on the optimal layout, the target item information and the target content information.
  20. The apparatus according to claim 19, wherein the generation part is further configured to generate an initialization layout corresponding to the target item information and the target content information through the preset layout generation model, wherein the preset layout generation model includes constraints on the stacking order of image layers and on the text size range in the text information;
    the determination part is further configured to adjust the initialization layout through adjustment rules to determine the multiple layouts, wherein the adjustment rules are obtained through continuous training with an object's preference degree serving as an incentive.
  21. The apparatus according to claim 19, wherein the acquisition part is further configured to, before the multiple layouts are evaluated through the evaluation model to determine the candidate layouts, obtain historical target multimedia information, and identify the historical target multimedia information to obtain historical layouts, wherein the historical layouts include positive sample data and negative sample data;
    the determination part is further configured to train an initial evaluation model with the positive sample data and the negative sample data to determine the evaluation model.
  22. The apparatus according to claim 19, wherein the acquisition part is further configured to evaluate the multiple layouts through the evaluation model to obtain an evaluation result corresponding to each of the multiple layouts;
    the determination part is further configured to, in a case where an evaluation result is characterized as successful, take the layout corresponding thereto as a candidate layout.
  23. An apparatus for generating multimedia information, comprising:
    a memory configured to store executable instructions;
    a processor configured to implement, when executing the executable instructions stored in the memory, the method for generating multimedia information according to any one of claims 1 to 11.
  24. A computer-readable storage medium storing executable instructions which, when executed, are used to cause a processor to perform the method for generating multimedia information according to any one of claims 1 to 11.
PCT/CN2023/118512 2022-09-19 2023-09-13 Multimedia information generation method and apparatus, and computer-readable storage medium WO2024061073A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211139046.2 2022-09-19
CN202211139046.2A CN117786193A (en) 2022-09-19 2022-09-19 Method and device for generating multimedia information and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2024061073A1 true WO2024061073A1 (en) 2024-03-28

Family

ID=90383903

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118512 WO2024061073A1 (en) 2022-09-19 2023-09-13 Multimedia information generation method and apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117786193A (en)
WO (1) WO2024061073A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147366A1 (en) * 2017-11-13 2019-05-16 International Business Machines Corporation Intelligent Recommendations Implemented by Modelling User Profile Through Deep Learning of Multimodal User Data
CN110489582A (en) * 2019-08-19 2019-11-22 腾讯科技(深圳)有限公司 Personalization shows the generation method and device, electronic equipment of image
CN112131848A (en) * 2019-06-25 2020-12-25 北京沃东天骏信息技术有限公司 Method and device for generating document information, storage medium and electronic equipment
CN113570416A (en) * 2021-07-30 2021-10-29 北京达佳互联信息技术有限公司 Method and device for determining delivered content, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117786193A (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN111382309B (en) Short video recommendation method based on graph model, intelligent terminal and storage medium
TW201915790A (en) Generating document for a point of interest
CN110554782B (en) Expression input image synthesis method and system
WO2021139415A1 (en) Data processing method and apparatus, computer readable storage medium, and electronic device
CN109783539A (en) Usage mining and its model building method, device and computer equipment
CN116468460B (en) Consumer finance customer image recognition system and method based on artificial intelligence
CN113051468B (en) Movie recommendation method and system based on knowledge graph and reinforcement learning
Wang A survey of online advertising click-through rate prediction models
Wu et al. Product design award prediction modeling: Design visual aesthetic quality assessment via DCNNs
Wang et al. Multifunctional product marketing using social media based on the variable-scale clustering
Ahamed et al. A recommender system based on deep neural network and matrix factorization for collaborative filtering
CN115238191A (en) Object recommendation method and device
CN115329215A (en) Recommendation method and system based on self-adaptive dynamic knowledge graph in heterogeneous network
CN114862506A (en) Financial product recommendation method based on deep reinforcement learning
Gelli et al. Learning subjective attributes of images from auxiliary sources
CN113610610A (en) Session recommendation method and system based on graph neural network and comment similarity
CN113344648A (en) Advertisement recommendation method and system based on machine learning
CN116823321B (en) Method and system for analyzing economic management data of electric business
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN117251622A (en) Method, device, computer equipment and storage medium for recommending objects
WO2024061073A1 (en) Multimedia information generation method and apparatus, and computer-readable storage medium
Jabeen The use of AI in marketing: Its impact and future
Hanafi et al. Word Sequential Using Deep LSTM and Matrix Factorization to Handle Rating Sparse Data for E‐Commerce Recommender System
Low et al. Recent developments in recommender systems
CN114090848A (en) Data recommendation and classification method, feature fusion model and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23867362

Country of ref document: EP

Kind code of ref document: A1