WO2024104286A1 - A video processing method, apparatus, electronic device, and storage medium - Google Patents

A video processing method, apparatus, electronic device, and storage medium

Info

Publication number
WO2024104286A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
implantable
source video
area
Prior art date
Application number
PCT/CN2023/131208
Other languages
English (en)
French (fr)
Inventor
温佳伟
郭亨凯
夏吉喆
朱圣楠
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024104286A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81: Monomedia components thereof
    • H04N 21/812: Monomedia components thereof involving advertisement data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241: Advertisements

Definitions

  • the present invention relates to the field of information processing, and in particular to a video processing method, device, electronic equipment and storage medium.
  • Video is one of the most important means of information dissemination on the Internet.
  • Information related to the video content can be embedded in the video to explain and illustrate the video content.
  • embedding information in a video in education and training scenarios can enhance the learning effect;
  • jump links can also be embedded, such as embedding commodity transaction links into real-time videos in live broadcast scenarios to enable transactions while watching the video;
  • advertisements can also be embedded, such as embedding advertising information in various video resources to promote brands and products. At present, all of these require manual processing of videos to embed information.
  • the present invention proposes a video processing method, device, electronic device, storage medium and computer program.
  • a video processing method comprising:
  • in response to a received first request for a target object, determining at least one material and description information of the material for the target object, wherein the first request is a request for information dissemination for the target object, the target object has attribute information, the material has description information, and the description information is used to characterize features of the material;
  • determining, based on a first matching rule, at least one source video for the target object, the at least one source video including at least one target implantable area, the at least one target implantable area matching at least one material of the target object;
  • generating a composite video by implanting the material of the target object into the target implantable area of the source video that matches it.
  • a video processing method comprising:
  • in response to a received processing request for a target source video, determining an implantable area in the target source video and a feature tag corresponding to the implantable area, wherein the processing request is a request for object implantation for the target source video, and the feature tag is used to characterize the features of the implantable area;
  • determining, based on a second matching rule, at least one implantable object for the target source video, the implantable object comprising at least one target material, the target material matching at least one implantable area in the target source video;
  • generating a composite video by implanting the material of the target object into the target implantable area of the source video that matches it.
  • a video processing device comprising:
  • a first determination module is configured to determine at least one material and description information of the material for the target object in response to a received first request for the target object, wherein the first request is a request for information dissemination for the target object, the target object has attribute information, the material has description information, and the description information is used to characterize the characteristics of the material;
  • a first matching module configured to determine at least one source video for a target object based on a first matching rule, wherein the at least one source video includes at least one target implantable area, and the at least one target implantable area matches at least one material of the target object;
  • the synthesis module is used to generate a synthetic video by implanting the material of the target object into a target implantable area of the source video that matches it.
  • a video processing device comprising:
  • a second determination module is used to determine, in response to a received processing request for a target source video, an implantable area in the target source video and a feature tag corresponding to the implantable area; wherein the processing request is a request for object implantation for the target source video; and the feature tag is used to characterize a feature of the implantable area;
  • a second matching module configured to determine at least one implantable object for the target source video based on a second matching rule, wherein the implantable object includes at least one target material, and the target material matches at least one implantable area in the target source video;
  • the synthesis module is used to generate a synthetic video by implanting the material of the target object into a target implantable area of the source video that matches it.
  • an electronic device comprising:
  • at least one processor; and a memory for storing instructions executable by the at least one processor;
  • the at least one processor is configured to execute the instructions to implement the method as described in any one of the foregoing.
  • a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, any of the methods described above is implemented.
  • a computer program product comprising a computer program, wherein when the computer program is executed by a processor, the computer program implements any of the methods described above.
  • the technical solution provided in the embodiments of the present application can automatically identify the implantable areas of a source video and match the target material of an implantable object based on the feature tags of those areas; it can likewise automatically identify the material of a target object to be implanted into a source video and match that material to the target implantable areas of the source video. By matching source videos and objects in both directions, a composite video is obtained and can be recommended to users based on their video interests. This not only realizes automatic implantation into source videos, reducing labor costs and improving processing efficiency, but also enables video recommendation based on user interests and yields commercial benefits.
  • FIG1 is a system architecture diagram provided by an exemplary embodiment of the present invention.
  • FIG2 is a schematic diagram of an application scenario provided by an exemplary embodiment of the present invention.
  • FIG3 is a schematic block diagram of a video processing platform provided by an exemplary embodiment of the present invention.
  • FIG4 is a flow chart of a video processing method provided by an exemplary embodiment of the present invention.
  • FIG5 is a flow chart of a video processing method provided by an exemplary embodiment of the present invention.
  • FIG6 is a flow chart of a video preprocessing method provided by an exemplary embodiment of the present invention.
  • FIG7 is a flow chart of a video recommendation method provided by an exemplary embodiment of the present invention.
  • FIGS. 8A and 8B are schematic block diagrams of functional modules of video processing devices provided by two exemplary embodiments of the present invention, respectively;
  • FIG9 is a structural block diagram of an electronic device provided by an exemplary embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a computer system provided by an exemplary embodiment of the present invention.
  • the basic structure of video is a hierarchical structure consisting of frames, shots, scenes and video programs.
  • a frame is a static image and is the smallest logical unit of a video.
  • a dynamic video is formed by playing a temporally continuous sequence of frames at equal intervals.
  • a shot is a sequence of frames shot continuously by a camera from power on to power off, depicting a part of an event or a scene. It has no or weak semantic information and emphasizes the similarity of the visual content that constitutes the frames.
  • a scene is a sequence of semantically related shots. It can be the same object shot from different angles and with different techniques, or it can be a combination of shots with the same subject and event, emphasizing the semantic relevance.
  • a video program contains a complete event or story. As the highest-level video content structure, it includes the composition relationship of the video as well as the summary, semantics and general description of the video.
  • Semantic Segmentation assigns semantic labels to each pixel in the image and identifies objects of different categories.
  • Instance Segmentation first determines the location area of the object in the image, and then identifies the category of the object.
  • Panoptic Segmentation detects and segments all objects in the image, including the background.
  • Visual SLAM (Simultaneous Localization and Mapping) estimates the camera pose while building a map of the scene from the video.
  • 3D scene analysis determines the scene in the video and combines plane recognition technology to analyze the area in the scene that is suitable for placing 3D materials.
  • FIG1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present invention can be applied.
  • the system architecture may include a terminal (one or more of a smart phone 101, a tablet computer 102, and a portable computer 103 as shown in FIG1 , and of course, a desktop computer, etc.), a network 104, and a server 105.
  • the network 104 is used to provide a medium for a communication link between the terminal device and the server 105.
  • the network 104 may include various connection types, such as a wired communication link, a wireless communication link, etc.
  • the number of terminals, networks and servers in FIG1 is only illustrative. According to actual needs, there may be any number of terminals, networks and servers.
  • the server 105 may be a server cluster composed of multiple servers.
  • the terminal sends a request to the server 105 to provide a video
  • the server 105 responds to the request by sending the corresponding video or an interactive interface through which the terminal can obtain the video based on a preset video providing strategy.
  • FIG. 2 shows an application scenario diagram of the technical solution of an embodiment of the present invention.
  • the video service system includes a video provider 201, an object provider 202, a video processing platform 203, and a terminal 204.
  • the video provider provides various source videos to the video processing platform, such as offline videos or real-time videos, such as movies, animations, documentaries, popular science knowledge, short videos, live videos, etc.
  • the video provider can be the original creator of the source video, such as the author, or the authorized party of the source video, such as the video platform; the object provider provides the video processing platform with the communication needs for the target object, such as brand promotion needs, product promotion needs, or information release needs, etc.
  • the object provider can be a brand, a product supplier, a seller, the media, etc.; the video processing platform performs video processing based on the source video obtained from the video provider and the target object to be communicated obtained from the object provider, obtains a video that integrates the source video and the target object, and provides the integrated video to the terminal based on the video request of the terminal.
  • FIG. 3 shows a schematic block diagram of a video processing platform according to an embodiment of the present invention.
  • the video processing platform processes the received source video and forms a source video resource library. For each source video, at least one implantable area is obtained, and a feature label is formed for each implantable area.
  • the source video resource library stores the source video, the implantable area, the feature label and their corresponding relationship for each source video.
  • the platform maintains an object material resource library for the specified object. For each object specified by the object provider, the object's attribute information, corresponding materials and description information of the materials are configured.
  • the attribute information may include the commodity name (such as "Tiantian Coke"), multiple category information of the commodity (such as fast-moving consumer goods, beverages, non-alcoholic, low-sugar, etc.) and other attribute information, such as shelf life, etc.
  • the commodity object is configured with multiple materials, each of which has corresponding description information.
  • the description information may include the content, form of expression, 2D or 3D, etc. of the material; the video processing platform has a matching module, a synthesis module and a recommendation module, and is correspondingly configured with matching rules and recommendation strategies.
  • the matching module is used to match the source video and its area with the specified object and its material according to the matching rules to obtain the basic material for video synthesis, and perform video synthesis through the synthesis module.
  • the synthesized video is stored in the synthetic video resource library.
  • the recommendation module is used to recommend at least one synthetic video based on the recommendation strategy.
  • the material stored in the object material resource library can be the material itself, as shown in Figure 3, or it can be the material's identifier or a link to the material; the material can come from the object provider, or it can be produced and generated based on the specified object according to the needs of the object provider, or it can be obtained from other channels.
  • the source video and implantable area stored in the source video resource library can be the source video or the implantable area itself, or it can be the corresponding identifier or the corresponding link; the video stored in the synthetic video resource library can be the synthetic video itself, or it can be the corresponding identifier or the corresponding link.
  • the video processing platform may be a centralized server architecture or a separate server system architecture, which shall not be a limitation to the present invention.
  • FIG. 4 shows a flow chart of a video processing method provided by an exemplary embodiment of the present invention.
  • the method includes:
  • S401 in response to a received first request for a target object, determining at least one material and description information of the material for the target object, wherein the first request is a request for information dissemination for the target object, the target object has attribute information, and the material has description information, and the description information is used to characterize characteristics of the material.
  • the first request may include the material of the target object, or the first request may include material requirement information of the target object, and the corresponding material may be determined for the target object through the material requirement information.
  • the target object, the attributes of the target object, the corresponding material and the description information of the material and the mapping relationship between them can be saved in the object material library.
  • the object material resource library is as shown in Figure 3; the target object has at least one attribute information, at least one material, and each material has corresponding description information for characterizing the characteristics of the material.
  • the attribute information of the target object may include information of multiple dimensions, including but not limited to the name, category, ingredients/materials, functions/efficacy, appearance, structure, usage, introduction, etc. of the target object.
  • the description information of the material may include information of multiple dimensions, including but not limited to the content, form, 2D/3D, image/video, scene, plot, etc. of the material.
  • the material may include the brand of the beverage, the product image of the beverage, a 2D poster of the beverage with a party as the theme, a static 3D product image of the beverage, a 3D animated image of the beverage (such as a bottle twisting animation, etc.), etc.
  • the materials may include the name and brand of the enterprise, the services provided by the enterprise, the enterprise's promotional short videos, 3D materials of the enterprise's buildings, etc.
  • the target object, attribute information, material and description information of the material as well as the mapping relationship between them can be stored in the object material resource library.
  • the attribute information of the target object and the description information of the material can be obtained in a variety of ways, for example, they can be directly provided by the target object provider or the material provider, or they can be manually or automatically extracted by the video processing platform.
  • different types of attribute information or description information can be set for manual input, or they can be extracted automatically based on a machine model; for example, the target object or material can be evaluated by a decision tree from the root node down to the leaf nodes, with each leaf node corresponding to an attribute value or a piece of description information, or the attribute information and description information of the target object can be extracted automatically based on a semantic algorithm.
  • the attribute information and description information of the target object can be extracted based on a pre-trained semantic model.
  • Semantic algorithms and decision trees are commonly used artificial intelligence algorithms in this field and will not be described in detail here.
  • S402 Based on a first matching rule, determine at least one source video for a target object, wherein the at least one source video includes at least one target implantable area, and the at least one target implantable area matches at least one material of the target object.
  • the source video, the implantable area of the source video, the feature labels of the implantable area and the mapping relationship between them can be saved in the source video resource library.
  • the source video resource library is shown in Figure 3; the source video has at least one implantable area, and each implantable area has a corresponding feature label for characterizing the characteristics of the implantable area.
  • the source video can be an offline video or a real-time video; the implantable area of the video can be an area with spatial significance in the video, such as the sky, ground, etc. in the video, or an area with surface significance (such as a plane or a curved surface), such as a building facade, a billboard, a screen, the body of a coffee cup, etc. in the video.
  • Feature tags can include information of multiple dimensions, such as video classification information, implantable area names (such as sky, ground, desktop, billboard, etc.), scene information (such as party, sports, etc.), location information (such as cafe, airport, bedroom, etc.), and multiple image feature information of implantable areas (such as confidence, clarity, size, etc.).
  • the feature tag includes a video feature tag corresponding to the source video and a region feature tag corresponding to the implantable region.
  • for a source video of a variety show, there are two implantable regions: one is a starry sky region, whose feature tags may include variety show, highlights, night, starry sky, etc.; the other is a desktop region, whose feature tags may include variety show, highlights, competition, desktop, etc.; the tags of both implantable regions include the video feature tags of the source video, variety show and highlights.
  • feature tags can be obtained in many ways; for example, they can be provided directly by the video provider, or manually or automatically extracted by the video processing platform. Different types of feature tags can be set for manual configuration, or they can be extracted automatically based on machine models.
  • the source video can be judged from the root node of the decision tree through a decision tree, and each leaf node corresponds to a feature tag.
  • the source video can also be semantically analyzed based on a semantic algorithm to automatically extract feature tags.
  • the first matching rule is related to the feature label of the implantable area, the attribute information of the object and the description information of the material.
  • the first matching rule may be a preset mapping relationship, for example, the feature tag "party" has a corresponding relationship with the object attribute "beverage", and the feature tag "bar" has a corresponding relationship with the descriptive information "product appearance" of the object attribute "beverage"; based on this matching rule, when the target object is a beverage, the implantable area in the source video whose feature tags include "party" and "bar" can be determined as the target implantable area.
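  • As a minimal illustration (not part of the patent), such a preset mapping relationship could be represented as simple lookup tables that connect feature tags to object attributes, and tag/attribute pairs to preferred material descriptions; all names and mappings below are hypothetical.

```python
# Minimal sketch of a preset mapping-style first matching rule (illustrative names only).
TAG_TO_ATTRIBUTE = {
    "party": {"beverage", "snack"},
    "bar": {"beverage"},
}
TAG_ATTRIBUTE_TO_DESCRIPTION = {
    ("bar", "beverage"): {"product appearance"},
}

def find_target_areas(source_video_areas, target_object):
    """Return (area_id, material_id) pairs whose feature tags map to the target object.

    source_video_areas: list of dicts like {"area_id": ..., "tags": [...]}
    target_object: dict like {"attributes": [...], "materials": [{"id": ..., "description": [...]}]}
    """
    matches = []
    for area in source_video_areas:
        for tag in area["tags"]:
            mapped_attrs = TAG_TO_ATTRIBUTE.get(tag, set())
            # The area is a candidate if any object attribute is mapped from one of its tags.
            hit_attrs = mapped_attrs & set(target_object["attributes"])
            if not hit_attrs:
                continue
            # Optionally narrow down to materials whose description is mapped for this tag/attribute.
            for attr in hit_attrs:
                wanted_desc = TAG_ATTRIBUTE_TO_DESCRIPTION.get((tag, attr), set())
                for material in target_object["materials"]:
                    if not wanted_desc or wanted_desc & set(material["description"]):
                        matches.append((area["area_id"], material["id"]))
    return matches

# Example: a beverage object against areas tagged "party" / "bar".
areas = [{"area_id": "a1", "tags": ["party", "night"]},
         {"area_id": "a2", "tags": ["bar"]}]
obj = {"attributes": ["beverage", "low-sugar"],
       "materials": [{"id": "m1", "description": ["product appearance", "2D"]}]}
print(find_target_areas(areas, obj))  # [('a1', 'm1'), ('a2', 'm1')]
```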
  • step S402 includes:
  • Step S4021 calculating a first matching degree between each feature tag of the implantable area and each attribute information of the target object.
  • the first matching degree between a feature tag and an item of attribute information is calculated by obtaining the feature vector corresponding to the feature tag and the feature vector corresponding to the attribute information, and computing the similarity between the two feature vectors; the similarity can be computed in a variety of ways, such as Pearson correlation coefficient, Euclidean distance, cosine similarity, or dot-product similarity; the first matching degree is obtained by averaging the computed similarity values.
  • Step S4022 calculating the second matching degree between each feature tag of the implantable area and each description information of the material.
  • the second matching degree between a feature tag and an item of description information is calculated by obtaining the feature vector corresponding to the feature tag and the feature vector corresponding to the description information, and computing the similarity between the two feature vectors; the similarity can be computed in a variety of ways, such as Pearson correlation coefficient, Euclidean distance, cosine similarity, or dot-product similarity; the second matching degree is obtained by averaging the computed similarity values.
  • Step S4023 determining a source video and a target implantable area based on the first matching degree and the second matching degree.
  • step S4023 may include: determining a source video based on a first matching degree, and determining a target implantable area from the implantable area in the determined source video based on a second matching degree.
  • the method of determining a source video or a target implantable area based on the matching degree may include sorting the matching degree values and selecting the source videos or implantable areas whose matching degrees rank above a preset position, or setting a preset threshold and selecting the source videos or implantable areas whose matching degrees exceed the preset threshold.
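  • The sketch below illustrates steps S4021-S4023 under the assumption that feature tags, attribute information, and description information have already been turned into feature vectors (for example, by a text-embedding model); cosine similarity is used as one of the similarity options named above, and the selection helper shows both the top-ranking and threshold variants. The vectors and values are placeholders.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def matching_degree(tag_vectors, info_vectors):
    """Average pairwise similarity between an area's feature-tag vectors and
    an object's attribute (or material-description) vectors."""
    sims = [cosine(t, a) for t in tag_vectors for a in info_vectors]
    return float(np.mean(sims))

def select(candidates, degrees, top_k=None, threshold=None):
    """Select candidates either by top-k ranking or by a minimum matching degree."""
    order = sorted(zip(candidates, degrees), key=lambda x: x[1], reverse=True)
    if threshold is not None:
        order = [(c, d) for c, d in order if d >= threshold]
    if top_k is not None:
        order = order[:top_k]
    return [c for c, _ in order]

# Example with hypothetical embeddings: one implantable area vs. one target object.
area_tag_vecs = [np.random.rand(16) for _ in range(3)]      # feature tags of an implantable area
attribute_vecs = [np.random.rand(16) for _ in range(2)]     # attribute info of the target object
description_vecs = [np.random.rand(16) for _ in range(2)]   # description info of one material

first_degree = matching_degree(area_tag_vecs, attribute_vecs)     # S4021
second_degree = matching_degree(area_tag_vecs, description_vecs)  # S4022
print(first_degree, second_degree)

# S4023: rank candidate areas by their matching degrees and keep the best ones.
areas = ["area_1", "area_2", "area_3"]
combined = [0.8, 0.42, 0.65]  # e.g. first and second degrees combined per area
print(select(areas, combined, top_k=2))        # ['area_1', 'area_3']
print(select(areas, combined, threshold=0.5))  # ['area_1', 'area_3']
```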
  • the first matching rule further includes determining the target implantable area according to the 2D and 3D classification of the material.
  • when the material is of a 2D type, the target implantable area has a label that characterizes it as a surface, such as a desktop, a building plane, a large screen, a billboard, a mirror, a glass plane, a cup body, etc.; when the material is of a 3D type, the target implantable area has a label that characterizes it as a space, such as the sky, the ground, the starry sky, a canyon, etc.
  • S403 Generate a composite video by implanting the target object's material into a target implantable area of the source video that matches it.
  • when the material of the target object is a 2D material, the 2D material is implanted into a first target implantable area that matches it, and the first target implantable area has a label that characterizes the area as a surface; when the material of the target object is a 3D material, the 3D material is implanted into a second target implantable area that matches it, and the second target implantable area has a label that characterizes the area as a space, such as the sky, the ground, the room, the canyon, etc.
  • plane image recognition technology can be used to make the 3D model appear at the corresponding position in the video according to the initial position; and motion tracking technology can be used to make the 3D model present the content of the corresponding perspective according to the change of the perspective of the video content.
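  • The patent does not spell out the implementation of the motion-tracking step; as a rough sketch, once a per-frame camera pose is available (for example, from visual SLAM), a 3D anchor placed on a recognized plane can be re-projected into every frame with a standard pinhole camera model so that the 3D material follows the changing perspective. The intrinsics, poses, and anchor below are hypothetical.

```python
import numpy as np

def project_point(point_3d, R, t, K):
    """Project a 3D world point into pixel coordinates with a pinhole camera model.

    R, t: camera rotation (3x3) and translation (3,) for the current frame,
          e.g. estimated by SLAM / motion tracking.
    K:    camera intrinsic matrix (3x3).
    """
    cam = R @ point_3d + t   # world -> camera coordinates
    uvw = K @ cam            # camera -> image plane
    return uvw[:2] / uvw[2]  # normalize by depth to get pixel (u, v)

# Hypothetical values: an anchor on a detected ground plane and two frame poses.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
anchor = np.array([0.0, 0.0, 5.0])   # 3D position where the material is placed

pose_frame_1 = (np.eye(3), np.zeros(3))
theta = np.deg2rad(5.0)              # the camera pans slightly in the next frame
pose_frame_2 = (np.array([[np.cos(theta), 0.0, np.sin(theta)],
                          [0.0, 1.0, 0.0],
                          [-np.sin(theta), 0.0, np.cos(theta)]]), np.zeros(3))

for R, t in (pose_frame_1, pose_frame_2):
    print(project_point(anchor, R, t, K))  # the anchor shifts in the image as the view changes
```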
  • step S403 further includes a step of rendering the synthesized video, wherein the rendering includes but is not limited to rasterization rendering, ray casting, ray tracing, etc.
  • when the material is a 3D material, rendering may also include Neural Radiance Field (NeRF) rendering.
  • after step S403, the method further includes: S404, recommending at least one synthesized video based on a preset recommendation strategy, where the recommendation strategy is related to user interests or historical data of the synthesized video.
  • the video processing platform's recommendation of synthesized videos can be based on user requests. For example, when receiving a direct or indirect video acquisition request from a user, the system recommends at least one synthesized video to the user according to the recommendation strategy. It can also be based on the cold start of a preset program or interface. For example, when a user opens a program or enters a page, the system can automatically recommend videos to the user.
  • a user can issue a video acquisition request through the interactive interface on some video platforms. For example, by clicking on "Funny Video" in the interactive interface, the video acquisition request issued by the user can be directly sent to the video processing platform of the present invention or received by other video platforms, and a request for obtaining a synthesized video can be sent from other video platforms to the video processing platform of the present invention.
  • the target object included in the synthesized video is a commodity object from an advertiser
  • recommending the synthesized video to the user for playback can bring some commercial benefits to the video processing platform, the video source provider, or the platform that recommends the video to the user. Therefore, other video platforms may obtain the synthesized video from the video processing platform for commercial interests.
  • FIG. 5 shows a flow chart of a video processing method provided by an exemplary embodiment of the present invention.
  • the method includes:
  • S501 In response to a received processing request for a target source video, determine an implantable area in the target source video and a feature tag corresponding to the implantable area, wherein the processing request is a request for object implantation in the target source video, and the feature tag is used to characterize the features of the implantable area.
  • the source video, the implantable area of the source video, the feature labels of the implantable area and the mapping relationship between them can be saved in the source video resource library.
  • the source video resource library is shown in Figure 3; the source video has at least one implantable area, and each implantable area has a corresponding feature label for characterizing the characteristics of the implantable area.
  • the source video can be an offline video or a real-time video; the implantable area of the video can be a background area in the video, such as the sky, ground, etc. in the video, or an object area in the video, such as a building facade, billboard, screen, or the body of a coffee cup, etc.
  • Feature tags can include information of multiple dimensions, such as video classification information, implantable area names (such as sky, ground, desktop, billboard, etc.), scene information (such as party, sports, etc.), location information (such as cafe, airport, bedroom, etc.), and multiple image feature information of implantable areas (such as confidence, clarity, size, etc.).
  • the feature tag includes two parts: one is the feature tag corresponding to the source video, and the other is the feature tag corresponding to the implantable area. For example, for a source video of a variety show with two implantable areas, one is a starry sky area whose feature tags may include variety show, highlights, night, starry sky, etc.; the other is a desktop area whose feature tags may include variety show, highlights, competition, desktop, etc.; the tags of both implantable areas include the feature tags of the source video, variety show and highlights.
  • Feature labels can be obtained in a variety of ways, for example, they can be provided directly by the video provider, or they can be extracted manually or automatically by the video processing platform. Different types of feature labels can be set for manual configuration, or they can be automatically extracted based on a machine model. For example, the source video can be judged starting from the root node of the decision tree through a decision tree, and each leaf node corresponds to a feature label. Semantic analysis can also be performed on the source video based on a semantic algorithm to automatically extract feature labels.
  • S502 Determine at least one implantable object for the target source video based on a second matching rule, where the implantable object includes at least one target material, and the target material matches at least one implantable area in the target source video.
  • the target object, the attributes of the target object, the corresponding material and the description information of the material and the mapping relationship between them can be saved in the object material library.
  • the object material resource library is as shown in Figure 3; the target object has at least one attribute information, at least one material, and each material has corresponding description information for characterizing the characteristics of the material.
  • the attribute information of an object may include information of multiple dimensions, including but not limited to the name, category, composition/material, function/effect, appearance, structure, usage, introduction, etc. of the target object.
  • the description information of the material may include information of multiple dimensions, including but not limited to the content, form, 2D/3D, image/video, scene, plot, etc. of the material.
  • the materials may include the brand of the beverage, the product image of the beverage, a 2D poster of the beverage with a party as the theme, a static 3D product image of the beverage, a 3D animated image of the beverage (such as a bottle twisting animation, etc.), etc.
  • the materials may include the name and brand of the enterprise, the services provided by the enterprise, the enterprise's promotional short videos, 3D materials of the enterprise's buildings, etc.
  • the target object, attribute information, material and description information of the material as well as the mapping relationship between them can be stored in the object material resource library.
  • the attribute information of the target object and the description information of the material can be obtained in a variety of ways, for example, they can be provided directly by the target object provider or the material provider, or they can be extracted manually or automatically by the video processing platform. Different types of attribute information can be set for manual filling, or they can be automatically extracted based on a machine model. For example, the target object can be judged step by step through a decision tree, and each leaf node corresponds to a description. It is also possible to perform big data analysis on the descriptions related to the target object based on semantic algorithms, thereby automatically extracting description information.
  • Semantic algorithms and decision trees are commonly used artificial intelligence algorithms in this field and will not be described in detail here.
  • the second matching rule is related to the feature label of the implantable area, the attribute of the object and the description information of the material.
  • the second matching rule may be a preset mapping relationship, for example, the feature tag "self-driving" corresponds to the object attributes "vehicle", "tire", "satellite navigation", "refresh", etc., and the feature tag "night sky" corresponds to the object attribute "satellite navigation" with material whose descriptive information is "3D"; based on this matching rule, when the target source video is a video of self-driving in the wild at night, a satellite navigation escort service can be determined as the target object for the source video, and the 3D material it includes is suitable for implanting into the night sky area of the self-driving video.
  • At least one implantable object is determined for the target source video, where the implantable object includes at least one target material, and the target material matches at least one implantable area in the target source video.
  • the second matching rule may be a preset semantic model, including:
  • Step S5021 calculating the third matching degree between each feature tag of the implantable area of the target source video and each attribute information of the implantable object.
  • the third matching degree between a feature tag and an item of attribute information is calculated by obtaining the feature vector corresponding to the feature tag and the feature vector corresponding to the attribute information, and computing the similarity between the two feature vectors; the similarity can be computed in a variety of ways, such as Pearson correlation coefficient, Euclidean distance, cosine similarity, or dot-product similarity; the third matching degree is obtained by averaging the computed similarity values.
  • Step S5022 Calculate a fourth matching degree between each feature tag of the implantable area of the target source video and each description information of the material of the implantable object.
  • the fourth matching degree between a feature tag and an item of description information is calculated by obtaining the feature vector corresponding to the feature tag and the feature vector corresponding to the description information, and computing the similarity between the two feature vectors; the similarity can be computed in a variety of ways, such as Pearson correlation coefficient, Euclidean distance, cosine similarity, or dot-product similarity; the fourth matching degree is obtained by averaging the computed similarity values.
  • Step S5023 Determine an implantable object and a target material based on the third matching degree and the fourth matching degree.
  • step S5023 may include: determining an implantable object based on the third matching degree, and determining a target material from the materials of the determined implantable object based on the fourth matching degree.
  • the method of determining the implantable object or target material based on the matching degree may include sorting the matching degree values and selecting the implantable objects or target materials whose matching degrees rank above a preset position, or setting a preset threshold and selecting the implantable objects or target materials whose matching degree values exceed the preset threshold.
  • the second matching rule also includes: when the label of the implantable area indicates that the target implantable area is a surface area, such as a desktop, a building plane, a large screen, a billboard, a mirror, a glass plane, a cup body, etc., a 2D material is selected from the implantable object; when the label of the implantable area indicates that the area is a space area, a 3D material is selected from the implantable object; in other words, the target implantable area can be determined according to the 2D or 3D classification of the material.
  • when the material is of a 2D type, the target implantable area has a label indicating that the target implantable area is a surface; when the material is of a 3D type, the target implantable area has a label indicating that the target implantable area is a space, such as the sky, the ground, the starry sky, a canyon, etc.
  • the second matching rule may be a matching algorithm, based on which a matching value between a feature label of a source video, an attribute of a target object, and a description of a material is calculated, and based on the matching value, a target implantable area of the source video is determined; for example, the matching values may be sorted, and an implantable area ranked before a preset ranking may be selected as the target implantable area, or a preset threshold may be set to determine an implantable area with a matching degree higher than the preset threshold as the target implantable area.
  • the second matching rule also includes determining the target implantable area based on the 2D and 3D classification of the material.
  • when the material is of a 2D type, the target implantable area has a label that characterizes it as a plane, such as a desktop, a building plane, a large screen, a billboard, a mirror, a glass plane, etc.; when the material is of a 3D type, the target implantable area has a label that characterizes it as a space, such as the sky, the ground, the starry sky, a canyon, etc.
  • the target object, the attribute information of the target object, the configured material and the description information of the material, and the mapping relationship between them are stored in the object material resource library.
  • S503 Generate a composite video by implanting the target object's material into a target implantable area of the source video that matches it.
  • when the material of the target object is a 2D material, the 2D material is implanted into a first target implantable area that matches it, and the first target implantable area has a label indicating that the area is a surface; when the material of the target object is a 3D material, the 3D material is implanted into a second target implantable area that matches it, and the second target implantable area has a label indicating that the area is a space, such as the sky, the ground, a room, a canyon, and the like.
  • step S503 further includes a step of rendering the synthesized video, and video rendering includes but is not limited to rasterization rendering, ray casting, ray tracing, etc.
  • when the material is a 3D material, rendering may also include Neural Radiance Field (NeRF) rendering.
  • after step S503, the method further includes:
  • S504 Recommend at least one synthetic video based on a preset recommendation strategy, where the recommendation strategy is related to user interests or historical data of the synthetic video.
  • the video processing platform's recommendation of synthesized videos can be based on user requests. For example, when receiving a direct or indirect video acquisition request from a user, the system recommends at least one synthesized video to the user according to the recommendation strategy. It can also be based on the cold start of a preset program or interface. For example, when a user opens a program or enters a page, the system can automatically recommend videos to the user.
  • a user can issue a video acquisition request through the interactive interface on some video platforms. For example, by clicking on "Funny Video" in the interactive interface, the video acquisition request issued by the user can be directly sent to the video processing platform of the present invention or received by other video platforms, and a request for obtaining a synthesized video can be sent from other video platforms to the video processing platform of the present invention.
  • the target object included in the synthesized video is a commodity object from an advertiser
  • recommending the synthesized video to the user for playback can bring some commercial benefits to the video processing platform, the video source provider, or the platform that recommends the video to the user. Therefore, other video platforms may obtain the synthesized video from the video processing platform for commercial interests.
  • FIG. 6 is a flowchart of a video preprocessing method provided by an exemplary embodiment of the present invention.
  • the source video is also preprocessed to obtain at least one implantable area and its corresponding feature label.
  • the source video preprocessing method includes:
  • S601 segment a source video to obtain a plurality of video segments.
  • a video includes many continuous frames.
  • the source video can be segmented to obtain video clips.
  • the video segmentation method includes but is not limited to shot segmentation and similarity segmentation; wherein, shot segmentation uses the shot as a processing unit, that is, each shot is regarded as a video segment; similarity segmentation calculates the similarity between adjacent frames, and segments the video based on preset similarity conditions to obtain different video segments.
  • when preprocessing the video, if the source video is an offline video, the source video is segmented based on shots or on similarity; if the source video is a real-time video, the source video is segmented based on shots. A sketch of the similarity-based variant is given below.
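  • A minimal sketch of similarity-based segmentation, assuming decoded frames are available as arrays: adjacent frames are compared with a histogram-intersection score, and the video is cut wherever the score falls below a preset similarity condition. The threshold and synthetic frames are illustrative.

```python
import numpy as np

def frame_histogram(frame, bins=16):
    """Per-channel color histogram used as a cheap frame signature."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    return h / (h.sum() + 1e-12)

def segment_by_similarity(frames, threshold=0.6):
    """Cut the video wherever adjacent-frame similarity drops below the threshold.

    frames: sequence of HxWx3 uint8 arrays; returns a list of (start, end) index pairs.
    """
    segments, start = [], 0
    prev_hist = None
    for i, frame in enumerate(frames):
        hist = frame_histogram(frame)
        if prev_hist is not None:
            similarity = np.minimum(hist, prev_hist).sum()  # histogram intersection in [0, 1]
            if similarity < threshold:                      # likely scene change -> close segment
                segments.append((start, i - 1))
                start = i
        prev_hist = hist
    segments.append((start, len(frames) - 1))
    return segments

# Example with synthetic frames: 10 dark frames followed by 10 bright frames.
dark = [np.full((72, 128, 3), 30, dtype=np.uint8) for _ in range(10)]
bright = [np.full((72, 128, 3), 220, dtype=np.uint8) for _ in range(10)]
print(segment_by_similarity(dark + bright))  # [(0, 9), (10, 19)]
```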
  • S602 segment the video clip to obtain a plurality of candidate regions and their feature labels.
  • a target frame may be selected from a video clip and segmented; the segmentation method includes but is not limited to semantic segmentation, instance segmentation, panoptic segmentation, and any combination thereof.
  • the candidate regions of the target frame and label information of each candidate region can be obtained through instance segmentation, and the label information may include image classification information corresponding to the region, confidence, etc.
  • a corresponding instance segmentation model can be used to implement instance segmentation, and the image frame is used as a training sample to train the instance segmentation model.
  • the candidate regions of the target frame and the scene labels of each candidate region can be obtained by panoptic segmentation, and the scene labels are determined based on the candidate regions of the target frame and the association relationship between the candidate regions in the target frame. For example, when the candidate regions of the target frame are sky, ocean, beach, and awning, the scene labels can be vacation, seaside, and beach.
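  • As a small illustration, scene labels could be derived from the association between candidate regions with hand-written rules over the set of region labels detected in the target frame; the rules below mirror the sky/ocean/beach example above and are purely illustrative, not the patent's specific method.

```python
# Minimal sketch: derive scene labels from the set of region labels in a target frame.
# The rules are illustrative; in practice they could be configured or learned.
SCENE_RULES = [
    ({"sky", "ocean", "beach"}, ["vacation", "seaside", "beach"]),
    ({"desk", "screen", "keyboard"}, ["office", "work"]),
    ({"table", "bottle", "balloon"}, ["party", "indoor"]),
]

def infer_scene_labels(region_labels):
    """Return scene labels whose required region labels are all present in the frame."""
    present = set(region_labels)
    scene_labels = []
    for required, labels in SCENE_RULES:
        if required <= present:  # all required regions were detected in the target frame
            scene_labels.extend(labels)
    return scene_labels

print(infer_scene_labels(["sky", "ocean", "beach", "awning"]))  # ['vacation', 'seaside', 'beach']
```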
  • S603 Determine an implantable region and its feature label based on the candidate region.
  • candidate regions of the target frame are clustered to determine implantable regions in the source video; for example, regions with confidence exceeding a preset threshold are selected as implantable regions, or implantable regions are determined by clustering according to area values or regional connectivity.
  • the implantable region of the source video is determined by performing a maximum rectangle search on the candidate region of the target frame. For example, a core region with the largest area and a blank area is selected as the implantable region. For example, a plane region and a space region are selected, such as the counter and facade of a cash register, the seat surface of a bench, the running belt of a treadmill, etc.
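  • The maximum rectangle search mentioned above can be sketched, under the assumption that the candidate region is available as a binary mask, with the classic largest-rectangle-in-histogram technique: for each row, column-wise run lengths of candidate pixels form a histogram, and the largest rectangle found over all rows is taken as the implantable region. The mask below is synthetic.

```python
import numpy as np

def largest_rectangle_in_histogram(heights):
    """Largest rectangle under a histogram; returns (area, left, right, height)."""
    stack, best = [], (0, 0, 0, 0)
    extended = list(heights) + [0]  # sentinel to flush the stack at the end
    for i, h in enumerate(extended):
        start = i
        while stack and stack[-1][1] > h:
            idx, height = stack.pop()
            area = height * (i - idx)
            if area > best[0]:
                best = (area, idx, i - 1, height)
            start = idx
        stack.append((start, h))
    return best

def largest_implantable_rect(mask):
    """Largest axis-aligned rectangle of 1s in a binary mask.

    mask: 2D array where 1 marks pixels of a candidate region (e.g. sky or a blank wall).
    Returns (row_top, row_bottom, col_left, col_right), or None if the mask is empty.
    """
    mask = np.asarray(mask, dtype=np.int64)
    heights = np.zeros(mask.shape[1], dtype=np.int64)
    best_area, best_rect = 0, None
    for row in range(mask.shape[0]):
        # Column-wise run length of consecutive candidate pixels ending at this row.
        heights = (heights + 1) * mask[row]
        area, left, right, height = largest_rectangle_in_histogram(heights)
        if area > best_area:
            best_area = area
            best_rect = (row - height + 1, row, left, right)
    return best_rect

# Example: a 6x8 mask with a 3x4 block of candidate pixels.
mask = np.zeros((6, 8), dtype=int)
mask[2:5, 3:7] = 1
print(largest_implantable_rect(mask))  # (2, 4, 3, 6)
```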
  • Steps S601-S603 can be used as a specific implementation method of determining the implantable area in the target source video and the feature label corresponding to the implantable area in step S501; it can be completed before step S402 to determine the implantable area of the source video, so as to determine the target implantable area matching the material in step S402.
  • FIG. 7 is a flow chart of a video recommendation method provided by an exemplary embodiment of the present invention.
  • step S404 and step S504 in the above methods may further include:
  • S701 determining the user's video interest based on video history data and search history data.
  • the user's video history data and search history data are obtained.
  • the video history data includes information such as the source, type, duration and frequency of the videos watched by the user.
  • the search history data includes the user's video-related search data, such as searched video keywords, click and viewing information, etc.
  • the determination method may be, for example, a deep learning model, which is trained based on a large amount of video historical data and search history data samples, and the trained learning model is used to analyze the above historical data to determine the video interest of the user.
  • S702 Recommend at least one composite video based on the video interest and a preset recommendation strategy.
  • the synthesized video, video tags and the mapping relationship between them can be saved in the synthesized video resource library.
  • the synthesized video resource library is shown in FIG3 ; the synthesized video has at least one video tag, and the video tag is used to characterize the characteristics of the synthesized video.
  • Video tags may include information of multiple dimensions, including but not limited to the name of the synthesized video (for example, the name of the source video may be used), video classification (for example, theme classification, ancient/modern, reality/science fiction, etc.), video introduction, rating, etc.
  • Video tags can be obtained in a variety of ways. You can use the tags of the source video or regenerate them. They can be provided directly by the video provider or manually or automatically extracted by the video processing platform. Different types of video tags can be set for manual filling, or they can be automatically extracted based on machine models. For example, a decision tree can be used to judge the video step by step, with each leaf node corresponding to a video tag. Video tags can also be automatically extracted based on semantic algorithms.
  • At least one composite video is selected for recommendation, and the relationship between the video tag of the composite video and the video interest satisfies the recommendation strategy.
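  • A minimal sketch of such a recommendation strategy, assuming the user's video interest is summarized as tag weights derived from video history and search history (steps S701-S702): composite videos are scored by how strongly their video tags overlap the interest profile, and the top-ranked ones are recommended. All tags and identifiers are hypothetical.

```python
from collections import Counter

def build_interest_profile(watched_tags, searched_keywords):
    """Aggregate a user's video history and search history into tag weights."""
    profile = Counter()
    for tags in watched_tags:          # tags of each video the user watched
        profile.update(tags)
    profile.update(searched_keywords)  # searched keywords also count as interest signals
    return profile

def recommend(composite_videos, profile, top_k=2, min_score=1):
    """Rank composite videos by how well their video tags match the interest profile."""
    scored = []
    for video in composite_videos:
        score = sum(profile.get(tag, 0) for tag in video["tags"])
        if score >= min_score:
            scored.append((score, video["id"]))
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:top_k]]

# Hypothetical history and composite-video tags.
profile = build_interest_profile(
    watched_tags=[["variety show", "highlights"], ["sports", "highlights"]],
    searched_keywords=["variety show", "funny"],
)
videos = [
    {"id": "cv_1", "tags": ["variety show", "night", "starry sky"]},
    {"id": "cv_2", "tags": ["documentary", "history"]},
    {"id": "cv_3", "tags": ["sports", "highlights"]},
]
print(recommend(videos, profile))  # ['cv_3', 'cv_1']
```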
  • FIG8A is a schematic block diagram of the functional modules of a video processing device provided by an exemplary embodiment of the present invention. As shown in FIG8A , the device 800 includes:
  • a first determination module 801 is configured to determine at least one material and description information of the material for the target object in response to a received first request for the target object, wherein the first request is a request for information dissemination for the target object, the target object has attribute information, the material has description information, and the description information is used to characterize the characteristics of the material;
  • a first matching module 802 configured to determine at least one source video for a target object based on a first matching rule, wherein the at least one source video includes at least one target implantable area, and the at least one target implantable area matches at least one material of the target object;
  • the synthesis module 803 is used to generate a synthesized video by implanting the material of the target object into the target implantable area of the source video that matches it.
  • FIG8B is a schematic block diagram of functional modules of a video processing device provided by an exemplary embodiment of the present invention. As shown in FIG8B , the device 800′ includes:
  • the second determination module 801' is used to determine the implantable area in the target source video and the feature label corresponding to the implantable area in response to the received processing request for the target source video; wherein the processing request is a request for object implantation in the target source video; and the feature label is used to characterize the feature of the implantable area;
  • a second matching module 802' is used to determine at least one implantable object for the target source video based on a second matching rule, wherein the implantable object includes at least one target material, and the target material matches at least one implantable area in the target source video;
  • the synthesis module 803' is used to generate a synthetic video by implanting the material of the target object into the target implantable area of the source video that matches it.
  • An embodiment of the present invention also provides an electronic device, comprising: at least one processor; a memory for storing instructions executable by the at least one processor; wherein the at least one processor is configured to execute the instructions to implement the above method invented by the embodiment of the present invention.
  • Fig. 9 is a schematic diagram of the structure of an electronic device provided by an exemplary embodiment of the present invention.
  • the electronic device 1800 includes at least one processor 1801 and a memory 1802 coupled to the processor 1801, and the processor 1801 can execute the corresponding steps in the above method invented in the embodiment of the present invention.
  • the processor 1801 may also be referred to as a central processing unit (CPU), which may be an integrated circuit chip having signal processing capabilities. Each step in the method described above in the embodiment of the present invention may be completed by hardware integrated logic circuits or software instructions in the processor 1801.
  • the processor 1801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA field-programmable gate array
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method described above in conjunction with the embodiment of the present invention may be directly embodied as being executed by a hardware decoding processor, or may be executed by a combination of hardware and software modules in a decoding processor.
  • the software module may be located in a memory 1802, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, or other mature storage media in the art.
  • the processor 1801 reads the information in the memory 1802 and completes the steps of the above method in combination with its hardware.
  • FIG. 10 is a block diagram of a computer system provided by an exemplary embodiment of the present invention.
  • Computer system 1900 is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present invention described and/or claimed herein.
  • the computer system 1900 includes a computing unit 1901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1902 or a computer program loaded from a storage unit 1908 into a random access memory (RAM) 1903.
  • the RAM 1903 may also store various programs and data required for the operation of the computer system 1900.
  • the computing unit 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904.
  • An input/output (I/O) interface 1905 is also connected to the bus 1904.
  • a plurality of components in the computer system 1900 are connected to the I/O interface 1905, including: an input unit 1906, an output unit 1907, a storage unit 1908, and a communication unit 1909.
  • the input unit 1906 may be any type of device capable of inputting information to the computer system 1900; it may receive inputted numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device.
  • the output unit 1907 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer.
  • the storage unit 1908 may include, but is not limited to, a magnetic disk and an optical disk.
  • the communication unit 1909 allows the computer system 1900 to exchange information/data with other devices over a network such as the Internet, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a BluetoothTM device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
  • the computing unit 1901 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the computing unit 1901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, etc.
  • the computing unit 1901 performs the various methods and processes described above. For example, in some embodiments, the above-mentioned method of the embodiment of the present invention may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as a storage unit 1908.
  • part or all of the computer program may be loaded and/or installed on the electronic device 1900 via the ROM 1902 and/or the communication unit 1909.
  • the computing unit 1901 may be configured to perform the above-mentioned method of the embodiment of the present invention by any other appropriate means (e.g., by means of firmware).
  • An embodiment of the present invention further provides a computer-readable storage medium, wherein when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the above-described method provided by the embodiments of the present invention.
  • the computer-readable storage medium in the embodiments of the present invention may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or apparatus.
  • the computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specifically, the computer-readable storage medium may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • An embodiment of the present invention further provides a computer program product, including a computer program, wherein the computer program, when executed by a processor, implements the above-described method provided by the embodiments of the present invention.
  • computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof, the programming languages including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may also occur in an order different from that noted in the accompanying drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the modules, components or units involved in the embodiments of the present invention may be implemented by software or hardware, wherein the names of the modules, components or units do not, in some cases, limit the modules, components or units themselves.
  • exemplary hardware logic components include, without limitation, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a video processing method. An object material resource library is built to store objects, attribute information, materials, description information, and the correspondences among them, and a source video resource library is built to store source videos, implantable areas, feature labels, and the correspondences among them. Based on matching rules, a target object is matched with source videos suitable for implantation and their target implantable areas, or a target source video is matched with objects suitable for implantation and their materials, and the material of an object is implanted into an implantable area of a source video to obtain a synthesized video. The present invention achieves automatic implantation of matched objects into source videos, reducing labor costs and improving video processing efficiency.

Description

一种视频处理方法、装置、电子设备和存储介质
相关申请的交叉引用
本申请是以申请号为202211426333.1,申请日为2022年11月14日,题为“一种视频处理方法、装置、电子设备和存储介质”的中国申请为基础,并主张其优先权,该中国申请的公开内容在此作为整体引入本申请中。
技术领域
本发明涉及信息处理领域,尤其涉及一种视频处理方法、装置、电子设备和存储介质。
背景技术
视频是互联网最重要的传播信息之一,通过在视频信息中植入一些信息,可以实现不同应用场景下的功能,视频中可以植入与视频内容有关联的信息,对视频内容进行解释和说明,例如教育培训场景下的视频植入信息可以增强学习效果;也可以植入一些跳转链接,例如在一些直播场景下的实时视频中植入商品交易链接可以在观看视频过程中实现交易;也可以植入广告,例如在各种视频资源中植入广告信息可以实现品牌和商品推广。目前,这些都需要对视频进行人工处理来进行信息植入。
发明内容
为了更高效的、智能化的实现视频中的信息植入,本发明提出一种视频处理方法、装置、电子设备、存储介质和计算机程序。
根据本发明的一方面,提供了一种视频处理方法,包括:
响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征;
基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配;
通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
根据本发明的另一方面,提供了一种视频处理方法,包括:
响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入区域和 可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植入的请求;所述特征标签用于表征所述可植入区域的特征;
基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配;
通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
根据本发明的另一方面,提供了一种视频处理装置,包括:
第一确定模块,用于响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征;
第一匹配模块,用于基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配;
合成模块,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
根据本发明的另一方面,提供了一种视频处理装置,包括:
第二确定模块,用于响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入区域和可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植入的请求;所述特征标签用于表征所述可植入区域的特征;
第二匹配模块,用于基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配;
合成模块,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
根据本发明的另一面,提供了一种电子设备,包括:
至少一个处理器;
用于存储所述至少一个处理器可执行指令的存储器;
其中,所述至少一个处理器被配置为执行所述指令,以实现如前述中任一项所述的方法。
根据本发明的另一面,提供了一种计算机可读存储介质,其上存储有计算机程序,其特征在于,当所述计算机程序由处理器执行时实现如前述任一项所述的方法。
根据本发明的另一面,提供了一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器执行时实现如前述任一项所述方法。
本申请实施例中提供的技术方案，既可以自动识别源视频中的可植入区域，并基于可植入区域的特征标签匹配可植入对象的目标素材；也可以为目标对象的素材自动识别可植入的源视频，并为目标对象的素材匹配源视频中的目标可植入区域；通过双向地对源视频和对象进行匹配获得合成视频，并基于用户的视频兴趣向用户推荐合成视频。既实现了对源视频的自动植入，减少了人工成本、提升了处理效能，又能根据用户的兴趣实现视频推荐并获得商业收益。
附图说明
在下面结合附图对示例性实施例的描述中，本发明的更多细节、特征和优点被公开，在附图中：
图1为本发明一示例性实施例提供的系统架构图;
图2为本发明一示例性实施例提供的应用场景示意图;
图3为本发明一示例性实施例提供的视频处理平台的示意性框图;
图4为本发明一示例性实施例提供的视频处理方法的流程图;
图5为本发明一示例性实施例提供的视频处理方法的流程图;
图6为本发明一示例性实施例提供的视频预处理方法的流程图;
图7为本发明一示例性实施例提供的视频推荐方法的流程图;
图8A和8B分别为本发明两示例性实施例提供的视频处理装置的功能模块示意性框图;
图9为本发明一示例性实施例提供的电子设备的结构框图;
图10为本发明一示例性实施例提供的计算机系统的结构框图。
具体实施方式
下面将参照附图更详细地描述本发明的实施例。虽然附图中显示了本发明的某些实施例,然而应当理解的是,本发明可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本发明。应当理解的是,本发明的附图及实施例仅用于示例性作用,并非用于限制本发明的保护范围。
应当理解,本发明的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本发明 的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。需要注意,本发明中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本发明中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本发明实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
附图中所示的方框图仅仅是功能实体,不一定必须与物理上独立的实体相对应。即,可以采用软件形式来实现这些功能实体,或在一个或多个硬件模块或集成电路中实现这些功能实体,或在不同网络和/或处理器装置和/或微控制器装置中实现这些功能实体。
附图中所示的流程图仅是示例性说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解,而有的操作/步骤可以合并或部分合并,因此实际执行的顺序有可能根据实际情况改变。
在介绍本发明实施例之前首先对本发明实施例中涉及到的相关名词作如下释义:
视频,基本结构是由帧、镜头、场景和视频节目构成的层次结构,其中帧是一幅静态图像,是组成视频的最小逻辑单元,将时间上连续的帧序列按等间隔连续播放,便形成动态视频。
镜头,是一台摄像机从开机到关机连续拍摄的帧序列,描绘一个事件或一个场面的一部分,不具有或具有较弱的语义信息,强调构成帧的视觉内容相似性。
场景,是语义相关的连续镜头,可以是相同对象的不同角度、不同技法拍摄,也可以是具有相同主体和事件的镜头组合,强调语义的相关性。
视频节目包含一个完整的事件或故事,作为最高层的视频内容结构,它包括视频的组成关系以及对视频的摘要、语义和一般性描述等。
语义分割(Semantic Segmentation),对图像中的每个像素赋予语义标签,识别不同类别的物体。
实例分割(Instance Segmentation),首先在图像中确定存在对象的位置区域,然后识别对象的类别。
全景分割(Panoramic Segmentation),对图像中所有对象包括背景都进行检测和分割。
视觉同时定位与地图重建(Simultaneous Localization and Mapping,SLAM)技术，视觉SLAM试图解决利用视觉传感器获得的视觉信息实现定位和地图重建，即观测本体的运动轨迹并重建环境地图。
3D场景分析,对视频中的场景进行判断,并结合平面识别技术,分析该场景中适合放置3D素材的区域。
以下参照附图描述本发明的方案,具体如下:
图1示出了可以应用本发明实施例的技术方案的示例性系统架构的示意图。
如图1所示,系统架构可以包括终端(如图1中所示智能手机101、平板电脑102和便携式计算机103中的一种或多种,当然也可以是台式计算机等等)、网络104和服务器105。网络104用以在终端设备和服务器105之间提供通信链路的介质。网络104可以包括各种连接类型,例如有线通信链路、无线通信链路等等。
应该理解,图1中的终端、网络和服务器的数目仅仅是示意性的。根据实际需要,可以具有任意数目的终端、网络和服务器。比如服务器105可以是多个服务器组成的服务器集群等。
在示例性的场景中,终端向服务器105发出提供视频的请求,服务器105响应于该请求基于预设的视频提供策略向终端发送相应的视频或终端可获取视频的交互界面。
图2示出了应用本发明实施例的技术方案的应用场景图。
如图2所示,视频服务系统包括视频提供方201、对象提供方202、视频处理平台203和终端204。视频提供方向视频处理平台提供各种源视频,例如可以是离线视频或实时视频,例如可以是影视、动画、纪录片、科普知识、短视频、直播视频等,视频提供方可以是源视频的原创方,例如作者,也可以是源视频的授权方,例如视频平台;对象提供方向视频处理平台提供针对目标对象的传播需求,例如品牌宣传需求、商品推广需求、或者信息发布需求等,对象提供方可以是品牌商、商品供应商、销售商、媒体等;视频处理平台基于从视频提供方获得的源视频以及从对象提供方获得的待传播的目标对象,进行视频处理,获得融合了源视频和目标对象的视频,并基于终端的视频请求将融合后的视频提供给终端。
图3示出了本发明实施例的视频处理平台的示意性框图。
如图3所示,视频处理平台对接收的源视频进行处理,并形成源视频资源库,针对每一个源视频,获得至少一个可植入区域,针对每一个可植入区域形成特征标签,所述源视频资源库对应于每个源视频保存源视频、可植入区域、特征标签及其对应关系;视频处理 平台针对指定的对象维护对象素材资源库,针对每一个由对象提供方指定的对象,配置有对象的属性信息、相应的素材以及素材的描述信息,例如如图3所示,对于对象ID为00001的商品对象,属性信息可以包括商品名称(例如“天天可乐”)、商品的多个类别信息(例如快消、饮料、无酒精、低糖等)以及其他属性信息,例如保质期等,该商品对象配置有多个素材,每个素材具有相应的描述信息,描述信息例如可以包括素材的内容、表现形式、2D或3D等;视频处理平台具有匹配模块、合成模块和推荐模块,并相应配置有匹配规则和推荐策略,匹配模块用于根据匹配规则实现对源视频及其区域与指定对象及其素材进行匹配,以获得用于进行视频合成的基础材料,并通过合成模块进行视频合成,合成视频保存在合成视频资源库中,推荐模块,用于基于推荐策略对至少一个合成视频进行推荐。
对象素材资源库中保存的素材可以是素材本身,如图3所示,也可以是素材的标识或指向素材的链接;素材可以来自于对象提供方,也可以根据对象提供方的需求,基于指定的对象进行制作生成,也可以从其他渠道获取。相似的,源视频资源库中保存的源视频和可植入区域可以是源视频或者可植入区域本身也可以是相应的标识或相应的链接;合成视频资源库中保存的视频可以是合成的视频本身也可以是相应的标识或相应的链接。
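As a minimal illustration of the two resource libraries described above, the records and their correspondences can be sketched as plain data structures; this is only an assumed data model for readability, and the field names (`material_id`, `feature_labels`, etc.) are not part of the original disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Material:
    material_id: str
    description: Dict[str, str]      # e.g. {"content": "...", "form": "poster", "dimension": "2D"}
    uri: str = ""                    # the material itself, an identifier, or a link to it

@dataclass
class TargetObject:
    object_id: str                   # e.g. "00001"
    attributes: Dict[str, str]       # e.g. {"name": "天天可乐", "category": "饮料"}
    materials: List[Material] = field(default_factory=list)

@dataclass
class ImplantableArea:
    area_id: str
    feature_labels: List[str]        # e.g. ["综艺", "花絮", "夜晚", "星空"]

@dataclass
class SourceVideo:
    video_id: str
    uri: str
    areas: List[ImplantableArea] = field(default_factory=list)

# the object material library and the source video library are then plain mappings
object_library: Dict[str, TargetObject] = {}
source_video_library: Dict[str, SourceVideo] = {}
```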
所述视频处理平台可以是集中式服务器的架构也可以是分离式服务器系统的架构,这些并不应成为对本发明的限制。
图4示出了本发明一示例性实施例提供的视频处理方法的流程图。
如图4所示,所述方法包括:
S401,响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征。
第一请求中可以包括所述目标对象的素材,或者第一请求中可以包括所述目标对象的素材需求信息,通过所述素材需求信息可以为目标对象确定对应的素材。
可以在对象素材库中保存目标对象、目标对象的属性、对应的素材和素材的描述信息及其之间的映射关系,在一些实施例中,对象素材资源库如图3所示;目标对象具有至少一个属性信息、至少一个素材、每个素材具有对应的描述信息,用于表征所述素材的特征。
目标对象的属性信息可以包括多种维度的信息,包括但不限于目标对象的名称、类别、成分/材料、功能/功效、外观、结构、使用方法、简介等等。
素材的描述信息可以包括多种维度的信息,包括但不限于素材的内容、形式、2D/3D、图像/视频、场景、情节等。
以目标对象是某瓶装饮料为例,素材可以包括该饮料的品牌、该饮料的商品图、包含 以聚会为场景主题的该饮料的2D海报、该饮料的静态3D商品图、该饮料的3D动画图(例如瓶身扭转动画等)等。
以目标对象是某企业为例,素材可以包括该企业的名称及品牌、该企业提供的业务、该企业的宣传短视频、该企业楼宇场所的3D素材等。
可以将目标对象、属性信息、素材和素材的描述信息以及它们之间的映射关系保存在对象素材资源库中。
目标对象的属性信息和素材的描述信息可以通过多种方式获得,例如可以直接由目标对象提供方或素材提供方提供,也可以由视频处理平台人工或自动提取,可以设定不同类型的属性信息或者不同类型的描述信息以便于人工输入,也可以基于机器模型进行自动提取,例如通过决策树的方式对目标对象或素材自决策树的根节点处进行判断,逐层到每个叶子节点处,每个叶子节点对应于一个属性值或描述信息,也可以基于语义算法对与所述目标对象进行自动提取属性信息和描述信息。在一些实施例中,可以基于事先训练好的语义模型对所述目标对象提取属性信息和描述信息。
语义算法和决策树属于本领域常用的人工智能算法,在此不进行展开描述。
S402,基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配。
可以在源视频资源库中保存源视频、源视频的可植入区域和可植入区域的特征标签及其之间的映射关系,在一些实施例中,源视频资源库如图3所示;源视频具有至少一个可植入区域,每个可植入区域具有对应的特征标签,用于表征所述可植入区域的特征。
源视频可以是离线的视频,也可以是实时视频;视频的可植入区域可以是视频中的一个具有空间意义的区域,例如视频中的天空、地面等,也可以是具有面(例如平面或曲面)意义的区域,例如视频中的楼宇立面、广告牌、屏幕、咖啡杯的杯身等。
特征标签可以包括多种维度的信息,例如视频分类信息、可植入区域名称(例如天空、地面、桌面、广告牌等)、场景信息(例如聚会、运动等)、地点信息(例如咖啡厅、机场、卧室等)、可植入区域的多种图像特征信息(例如置信度、清晰度、尺寸等)。
在一些实施例中,特征标签包括与源视频对应的视频特征标签和与可植入区域对应的区域特征标签。例如在一条综艺花絮的源视频中,包括两个可植入区域,一个是星空区域,特征标签可以包括综艺、花絮、夜晚、星空等;一个是桌面区域,特征标签可以包括综艺、花絮、比赛、桌面等;两个可植入区域的标签中都包括源视频的视频特征标签综艺和花絮。
特征标签可以通过多种方式获得,例如可以直接由视频提供方提供,也可以由视频处 理平台人工或自动提取,可以设定不同类型的特征标签进行人工配置,也可以基于机器模型进行自动提取,例如通过决策树的方式对源视频从决策树的根节点开始判断,每个叶子节点对应于一个特征标签,也可以基于语义算法对与所述源视频进行语义分析,从而自动提取特征标签。
第一匹配规则与可植入区域的特征标签、对象的属性信息和素材的描述信息有关。
在一些实施例中,第一匹配规则可以是预设的映射关系,例如特征标签“聚会”与对象属性“饮料”具有对应关系,特征标签“吧台”与对象属性为“饮料”的描述信息“商品外观”具有对应关系;基于这种匹配规则,当目标对象为饮料时,可以确定源视频中特征标签包含“聚会”和“吧台”的可植入区域为目标可植入区域。
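A hedged sketch of the preset-mapping form of the first matching rule: the mapping table and the helper `area_matches_object` below are illustrative assumptions, not the actual rule set used by the platform.

```python
# hypothetical preset mapping: feature label -> object attributes it is compatible with
LABEL_TO_ATTRIBUTES = {
    "聚会": {"饮料", "快消"},
    "吧台": {"饮料"},
    "自驾": {"车辆", "轮胎", "卫星导航"},
}

def area_matches_object(area_labels, object_attributes):
    """Return True if any feature label of the area maps to any attribute of the object."""
    for label in area_labels:
        if LABEL_TO_ATTRIBUTES.get(label, set()) & set(object_attributes):
            return True
    return False

# example: an area labelled ["聚会", "吧台"] matches an object whose attributes include "饮料"
print(area_matches_object(["聚会", "吧台"], ["饮料", "无酒精"]))  # True
```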
在一些实施例中,步骤S402包括:
步骤S4021,计算可植入区域的各特征标签与目标对象的各属性信息之间的第一匹配度。
其中,特征标签与属性信息之间的匹配度计算,分别采用特征标签对应的特征向量和属性信息对应的特征向量,计算两两特征向量之间的相似度;相似度计算可以采用多种方式,例如皮尔逊相关系数、欧式距离、余弦相似度、点积相似度;对计算得到的各相似度值取平均值获得第一匹配度。
步骤S4022,计算可植入区域的各特征标签与素材的各描述信息之间的第二匹配度。
其中,特征标签与描述信息之间的匹配度计算,分别采用特征标签对应的特征向量和描述信息对应的特征向量,计算两两特征向量之间的相似度;相似度计算可以采用多种方式,例如皮尔逊相关系数、欧式距离、余弦相似度、点积相似度;对计算得到的各相似度值取平均值获得第二匹配度。
步骤S4023,基于所述第一匹配度和所述第二匹配度确定源视频以及目标可植入区域。
在一些实施例中,步骤S4023可以包括:基于第一匹配度确定源视频,并基于第二匹配度从确定的源视频中的可植入区域确定目标可植入区域。基于匹配度确定源视频或目标可植入区域的方式可以包括对匹配度值进行排序,选择排序为预设名次之前的匹配度对应的源视频或可植入区域;也可以设置一预设阈值,确定匹配度值高于预设阈值的源视频或可植入区域。
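The first and second matching degrees of steps S4021–S4023 can be sketched as averaged pairwise cosine similarities followed by top-k or threshold selection. The sketch below assumes that feature labels, attribute information, and description information have already been embedded as vectors; the function and parameter names are illustrative only.

```python
import numpy as np

def avg_pairwise_similarity(label_vecs, info_vecs):
    """Average cosine similarity over all (feature label, attribute/description) vector pairs."""
    sims = []
    for a in label_vecs:
        for b in info_vecs:
            sims.append(float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)))
    return sum(sims) / len(sims) if sims else 0.0

def select_source_videos(area_vecs_by_video, attr_vecs, desc_vecs, top_k=3, threshold=None):
    """area_vecs_by_video: {video_id: {area_id: [label vectors]}}.
    Returns (selected video ids, {video_id: best matching area_id})."""
    first_degree = {}   # video-level score against the object's attribute information
    best_area = {}      # area chosen by the second matching degree against material descriptions
    for vid, areas in area_vecs_by_video.items():
        all_label_vecs = [v for vecs in areas.values() for v in vecs]
        first_degree[vid] = avg_pairwise_similarity(all_label_vecs, attr_vecs)
        best_area[vid] = max(areas, key=lambda aid: avg_pairwise_similarity(areas[aid], desc_vecs))
    ranked = sorted(first_degree, key=first_degree.get, reverse=True)
    if threshold is not None:
        ranked = [v for v in ranked if first_degree[v] >= threshold]
    selected = ranked[:top_k]
    return selected, {v: best_area[v] for v in selected}
```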
在一些实施例中,第一匹配规则还包括根据素材的2D和3D分类确定目标可植入区域,当素材为2D类型时,目标可植入区域具有能够表征该目标可植入区域为面的标签,例如桌面、楼宇平面、大屏、广告牌、镜面、玻璃平面、杯身等;当素材为3D类型时, 目标可植入区域具有能够表征该目标可植入区域为空间的标签,例如天空、地面、星空、峡谷等。
S403,通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
在一些实施例中,当所述目标对象的素材为2D素材时,将所述2D素材植入与之相匹配的第一目标可植入区域,该第一目标可植入区域具有表征该区域为面的标签;当所述目标对象的素材为3D素材时,将所述3D素材植入与之相匹配的第二目标可植入区域,该第二目标可植入区域具有表征该区域为空间的标签,例如天空、地面、房间、峡谷等。
在一些实施例中,当目标对象的素材为3D素材时,可以利用平面图像识别技术,使3D模型根据初始位置呈现在视频中对应的位置;以及可以利用运动追踪技术,使得3D模型会根据视频内容视角的变化,呈现对应视角的内容。
在一些实施例中,步骤S403还包括对合成视频进行渲染的步骤,所述渲染包括但不限于光栅化渲染、光线投射、光线跟踪等方式。在一些实施例中,当所述素材为3D素材时,还包括神经辐射场(NeRF)渲染。
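As one possible, assumed realization of implanting a 2D material into a planar target implantable area of a single frame, the sketch below warps the material onto the four corners of the area with OpenCV; corner tracking across frames, 3D placement, and the rendering options listed above are outside this sketch.

```python
import cv2
import numpy as np

def implant_2d_material(frame, material, region_quad):
    """Warp a 2D material onto a planar implantable area of one frame.
    region_quad: four (x, y) corners of the area in the frame, clockwise from top-left."""
    h, w = material.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(region_quad)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(material, H, (frame.shape[1], frame.shape[0]))
    mask = cv2.warpPerspective(np.ones((h, w), np.uint8) * 255, H,
                               (frame.shape[1], frame.shape[0]))
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]   # composite the warped material over the area
    return out
```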
在一些实施例中,在步骤S403之后还包括:
S404,基于预设的推荐策略,对至少一条合成视频进行推荐,所述推荐策略与用户的视频历史数据和搜索历史数据有关。
视频处理平台对合成视频的推荐可以基于用户的请求，例如收到来自于用户的直接或间接的视频获取请求时，系统根据推荐策略向用户推荐至少一条合成视频；也可以基于预设程序或界面的冷启动，例如用户打开某个程序或者进入某个页面时，可以自动对用户进行视频推荐。
用户可以通过一些视频平台上的交互界面来发出视频获取请求,例如点击交互界面中的“搞笑视频”,用户发出的视频获取请求可以直接被发送给本发明的视频处理平台或由其他视频平台接收,并从其他视频平台向本发明的视频处理平台发送获取合成视频的请求。
很容易理解,如果合成的视频中包括的目标对象是来自于广告商的商品对象时,将合成视频推荐给用户进行播放可以为视频处理平台、视频源提供方或者向用户推荐该视频的平台带来一些商业利益,因此其他视频平台可能会出于商业利益的诉求从视频处理平台获取该合成视频。
图5示出了本发明一示例性实施例提供的视频处理方法的流程图。
如图5所示,所述方法包括:
S501,响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入 区域和可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植入的请求;所述特征标签用于表征所述可植入区域的特征。
可以在源视频资源库中保存源视频、源视频的可植入区域和可植入区域的特征标签及其之间的映射关系,在一些实施例中,源视频资源库如图3所示;源视频具有至少一个可植入区域,每个可植入区域具有对应的特征标签,用于表征所述可植入区域的特征。
源视频可以是离线的视频,也可以是实时视频;视频的可植入区域可以是视频中的一个背景区域,例如视频中的天空、地面等,也可以是视频中的一个对象区域,例如视频中的楼宇立面、广告牌、屏幕、咖啡杯的杯身等。
特征标签可以包括多种维度的信息,例如视频分类信息、可植入区域名称(例如天空、地面、桌面、广告牌等)、场景信息(例如聚会、运动等)、地点信息(例如咖啡厅、机场、卧室等)、可植入区域的多种图像特征信息(例如置信度、清晰度、尺寸等)。
在一些实施例中,特征标签包括两部分,一部分是与源视频对应的特征标签,一部分是与可植入区域对应的特征标签。例如在一条综艺花絮的源视频中,包括两个可植入区域,一个是星空区域,特征标签可以包括综艺、花絮、夜晚、星空等;一个是桌面区域,特征标签可以包括综艺、花絮、比赛、桌面等;两个可植入区域的标签中都包括源视频的特征标签综艺和花絮。
特征标签可以通过多种方式获得,例如可以直接由视频提供方提供,也可以由视频处理平台人工或自动提取,可以设定不同类型的特征标签进行人工配置,也可以基于机器模型进行自动提取,例如通过决策树的方式对源视频从决策树的根节点开始判断,每个叶子节点对应于一个特征标签,也可以基于语义算法对与所述源视频进行语义分析,从而自动提取特征标签。
S502,基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配。
可以在对象素材库中保存目标对象、目标对象的属性、对应的素材和素材的描述信息及其之间的映射关系,在一些实施例中,对象素材资源库如图3所示;目标对象具有至少一个属性信息、至少一个素材、每个素材具有对应的描述信息,用于表征所述素材的特征。
对象的属性信息可以包括多种维度的信息,包括但不限于目标对象的名称、类别、成分/材料、功能/功效、外观、结构、使用方法、简介等等。
素材的描述信息可以包括多种维度的信息,包括但不限于素材的内容、形式、2D/3D、图像/视频、场景、情节等。
以目标对象是某瓶装饮料为例,素材可以包括该饮料的品牌、该饮料的商品图、包含以聚会为场景主题的该饮料的2D海报、该饮料的静态3D商品图、该饮料的3D动画图(例如瓶身扭转动画等)等。
以目标对象是某企业为例,素材可以包括该企业的名称及品牌、该企业提供的业务、该企业的宣传短视频、该企业楼宇场所的3D素材等。
可以将目标对象、属性信息、素材和素材的描述信息以及它们之间的映射关系保存在对象素材资源库中。
目标对象的属性信息和素材的描述信息可以通过多种方式获得,例如可以直接由目标对象提供方或素材提供方提供,也可以由视频处理平台人工或自动提取,可以设定不同类型的属性信息进行人工填写,也可以基于机器模型进行自动提取,例如通过决策树的方式对目标对象进行逐级判断,每个叶子节点对应于一个描述,也可以基于语义算法对与所述目标对象有关的描述进行大数据分析,从而自动提取描述信息。
语义算法和决策树属于本领域常用的人工智能算法,在此不进行展开描述。
第二匹配规则与可植入区域的特征标签、对象的属性和素材的描述信息有关。
在一些实施例中，第二匹配规则可以是预设的映射关系，例如特征标签"自驾"与对象属性"车辆"、"轮胎"、"卫星导航"、"提神"等具有对应关系，特征标签"夜空"与对象属性为"卫星导航"的、描述信息为"3D"的素材具有对应关系；基于这种匹配规则，当目标源视频为一段夜晚野外自驾的视频时，可以为源视频确定目标对象为某卫星野外护航服务，其包括的3D素材适于植入自驾视频中的夜空区域。
基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配。
在一些实施例中,第二匹配规则可以是预设的语义模型,包括:
步骤S5021,计算目标源视频区域的各特征标签与可植入对象的各属性信息之间的第三匹配度。
其中,特征标签与属性信息之间的匹配度计算,分别采用特征标签对应的特征向量和属性信息对应的特征向量,计算两两特征向量之间的相似度;相似度计算可以采用多种方式,例如皮尔逊相关系数、欧式距离、余弦相似度、点积相似度;对计算得到的各相似度值取平均值获得第三匹配度。
步骤S5022,计算目标源视频的可植入区域的各特征标签与可植入对象的素材的各描述信息之间的第四匹配度。
其中,特征标签与描述信息之间的匹配度计算,分别采用特征标签对应的特征向量和 描述信息对应的特征向量,计算两两特征向量之间的相似度;相似度计算可以采用多种方式,例如皮尔逊相关系数、欧式距离、余弦相似度、点积相似度;对计算得到的各相似度值取平均值获得第四匹配度。
步骤S5023,基于所述第三匹配度和所述第四匹配度确定可植入对象以及目标素材。
在一些实施例中，步骤S5023可以包括：基于第三匹配度确定可植入对象，并基于第四匹配度从确定的可植入对象的素材中确定目标素材。基于匹配度确定可植入对象或目标素材的方式可以包括对匹配度值进行排序，选择排序为预设名次之前的匹配度对应的可植入对象或目标素材；也可以设置一预设阈值，确定匹配度值高于预设阈值的可植入对象或目标素材。
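For this reverse direction (steps S5021–S5023), a similarly hedged sketch: score candidate objects against the target source video's feature-label vectors (third matching degree) and pick each object's best material (fourth matching degree). `scikit-learn` is assumed only for the cosine-similarity helper, and all names are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def pick_objects_and_materials(video_label_vecs, candidate_objects, top_k=2):
    """candidate_objects: {object_id: {"attr_vecs": 2-D array, "materials": {material_id: 1-D array}}}.
    Third degree  = mean similarity(video feature labels, object attributes)   -> rank objects.
    Fourth degree = mean similarity(video feature labels, material description) -> pick one material."""
    L = np.asarray(video_label_vecs)
    scored = []
    for oid, obj in candidate_objects.items():
        third = cosine_similarity(L, obj["attr_vecs"]).mean()
        mid, fourth = max(
            ((m, cosine_similarity(L, d.reshape(1, -1)).mean()) for m, d in obj["materials"].items()),
            key=lambda t: t[1],
        )
        scored.append((third, oid, mid, fourth))
    scored.sort(reverse=True)
    return [(oid, mid) for _, oid, mid, _ in scored[:top_k]]
```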
在一些实施例中，第二匹配规则还包括：当可植入区域的特征标签表征该可植入区域为面区域时，例如桌面、楼宇平面、大屏、广告牌、镜面、玻璃平面、杯身等，从可植入对象的素材中选择2D素材；当可植入区域的特征标签表征该可植入区域为空间区域时，例如天空、地面、星空、峡谷等，从可植入对象的素材中选择3D素材。
在一些实施例中,第二匹配规则可以是匹配度算法,基于该匹配度算法计算源视频的特征标签、目标对象的属性、以及素材的描述之间的匹配度值,基于该匹配度值确定源视频的目标可植入区域;例如可以对匹配度值进行排序,选择排序为预设名次之前的可植入区域为目标可植入区域,也可以设置一预设阈值,确定匹配度高于预设阈值的可植入区域为目标可植入区域。
在一些实施例中,第二匹配规则还包括根据素材的2D和3D分类确定目标可植入区域,当素材为2D类型时,目标可植入区域具有能够表征该目标可植入区域为平面的标签,例如桌面、楼宇平面、大屏、广告牌、镜面、玻璃平面等;当素材为3D类型时,目标可植入区域具有能够表征该目标可植入区域为空间的标签,例如天空、地面、星空、峡谷等。
将目标对象、目标对象的属性信息、配置的素材和素材的描述信息以及它们之间的映射关系保存在对象素材资源库中。
S503,通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
在一些实施例中,当所述目标对象的素材为2D素材时,将所述2D素材植入与之相匹配的第一目标可植入区域,该第一目标可植入区域具有表征该区域为面的标签;当所述 目标对象的素材为3D素材时,将所述3D素材植入与之相匹配的第二目标可植入区域,该第二目标可植入区域具有表征该区域为空间的标签,例如天空、地面、房间、峡谷等。
在一些实施例中,步骤S503还包括对合成视频进行渲染的步骤,视频渲染包括但不限于光栅化渲染、光线投射、光线跟踪等方式。在一些实施例中,当所述素材为3D素材时,还包括神经辐射场(NeRF)渲染。
在一些实施例中,在步骤S503之后还包括:
S504,基于预设的推荐策略,对至少一条合成视频进行推荐,所述推荐策略与用户兴趣或所述合成视频的历史数据有关。
视频处理平台对合成视频的推荐可以基于用户的请求，例如收到来自于用户的直接或间接的视频获取请求时，系统根据推荐策略向用户推荐至少一条合成视频；也可以基于预设程序或界面的冷启动，例如用户打开某个程序或者进入某个页面时，可以自动对用户进行视频推荐。
用户可以通过一些视频平台上的交互界面来发出视频获取请求,例如点击交互界面中的“搞笑视频”,用户发出的视频获取请求可以直接被发送给本发明的视频处理平台或由其他视频平台接收,并从其他视频平台向本发明的视频处理平台发送获取合成视频的请求。
很容易理解,如果合成的视频中包括的目标对象是来自于广告商的商品对象时,将合成视频推荐给用户进行播放可以为视频处理平台、视频源提供方或者向用户推荐该视频的平台带来一些商业利益,因此其他视频平台可能会出于商业利益的诉求从视频处理平台获取该合成视频。
图6为本发明一示例性实施例提供的视频预处理方法的流程图。
在上述方法中,还包括对源视频的预处理,以获得至少一个可植入区域及其对应的特征标签。如图6所示,源视频的预处理方法包括:
S601,对源视频进行视频分片以获得多个视频片段。
一段视频中包括连续的很多帧,为了有效识别可植入区域,可以对源视频进行视频分片以获得视频片段。
视频分片方法包括但不限于镜头切分和相似度切分;其中,镜头切分是以镜头为处理单元,即将每个镜头作为一个视频片段;相似度切分是对相邻帧中进行相似度计算,基于预设相似度条件对视频进行切分获得不同的视频片段。
在一些实施例中,基于源视频为视频离线视频和实时视频,对于视频进行预处理,包括:
当所述源视频为离线视频时,对源视频基于镜头切分或相似度切分;
当所述源视频为实时视频时,对源视频基于镜头切分。
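A minimal sketch of similarity-based slicing, assuming OpenCV: adjacent frames are compared with an HSV colour-histogram correlation, and a new segment starts wherever the similarity falls below a preset threshold. The threshold value and histogram settings are assumptions, not values from the original.

```python
import cv2

def split_into_segments(video_path, sim_threshold=0.7):
    """Cut a video into segments wherever the HSV-histogram similarity of adjacent frames
    drops below sim_threshold (a simple shot-boundary heuristic)."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < sim_threshold:
                boundaries.append(idx)   # a new segment starts at this frame
        prev_hist, idx = hist, idx + 1
    cap.release()
    boundaries.append(idx)
    # return (start, end) frame ranges of each segment
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]
```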
S602,对视频片段进行分割以获得多个候选区域及其特征标签。
可以从视频片段中选择目标帧,对目标帧进行分割处理;分割的方式包括但不限于语义分割、实例分割、全景分割以及任意组合。
在一些实施例中,通过实例分割可以获得目标帧的候选区域以及各候选区域的标签信息,标签信息可以包括区域对应的图像分类信息、置信度等。可以采用相应的实例分割模型来实现实例分割,并采用图像帧作为训练样本,对实例分割模型进行训练。
在一些实施例中,通过全景分割可以获得目标帧的候选区域以及各候选区域的场景标签,场景标签基于目标帧的候选区域以及各候选区域在该目标帧之间的关联关系确定。例如当目标帧的候选区域为天空、海洋、沙滩、遮阳棚时,该场景标签可以为度假、海边、沙滩。
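A hedged sketch of obtaining candidate regions of a target frame by instance segmentation, assuming a recent `torchvision` with a COCO-pretrained Mask R-CNN as a stand-in for the platform's own trained segmentation model; the score threshold and return format are illustrative.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# COCO-pretrained instance segmentation model; in practice the platform would train or
# fine-tune its own model on annotated video frames as described above.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def candidate_regions(frame_rgb, score_thresh=0.6):
    """Return [(class_id, confidence, binary mask)] for one target frame (an RGB numpy array)."""
    with torch.no_grad():
        out = model([to_tensor(frame_rgb)])[0]
    regions = []
    for label, score, mask in zip(out["labels"], out["scores"], out["masks"]):
        if float(score) >= score_thresh:
            regions.append((int(label), float(score), (mask[0] > 0.5).numpy()))
    return regions
```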
S603,基于候选区域确定可植入区域及其特征标签。
在一些实施例中,通过对目标帧的候选区域进行聚类,以确定源视频中的可植入区间。例如选择置信度超过预设阈值的为可植入区域。例如按照面积值或区域可连通性进行聚类确定可植入区域。
在一些实施例中,通过对目标帧的候选区域进行最大矩形搜索以确定源视频的可植入区域。例如选面积最大、区域空白的核心区域为可植入区域。例如选择平面类的区域和空间类的区域,例如收银台的台面和立面,长椅的椅面,跑步机的跑带等。
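The maximum-rectangle search mentioned above can be sketched as the classic "largest rectangle of free cells in a binary mask" computation, solved row by row with the largest-rectangle-in-histogram technique; the function name and mask convention are assumptions.

```python
import numpy as np

def largest_free_rectangle(free_mask):
    """Largest axis-aligned rectangle of True cells in a binary mask
    (e.g. the blank part of a detected plane), returned as (row, col, height, width)."""
    best = (0, 0, 0, 0)   # (top, left, height, width) with maximal area
    heights = np.zeros(free_mask.shape[1], dtype=int)
    for r, row in enumerate(free_mask):
        heights = np.where(row, heights + 1, 0)      # consecutive free cells above each column
        stack = []                                    # (start column, height), heights increasing
        for c, h in enumerate(list(heights) + [0]):   # trailing 0 flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                start, sh = stack.pop()
                if sh * (c - start) > best[2] * best[3]:
                    best = (r - sh + 1, start, sh, c - start)
            stack.append((start, h))
    return best
```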
步骤S601-S603可以作为步骤S501中的确定所述目标源视频中的可植入区域和可植入区域对应的特征标签的具体实现方式;可以在步骤S402之前完成,以确定源视频的可植入区域,以便于步骤S402中从中确定与素材相匹配的目标可植入区域。
图7为本发明一示例性实施例提供的视频推荐方法的流程图。
如图7所示,前述方法中步骤S404和步骤S504,进一步包括:
S701,基于视频历史数据和搜索历史数据确定用户的视频兴趣。
经用户授权,获取用户的视频历史数据和搜索历史数据,所述视频历史数据包括用户观看的视频来源、类型、时长和频次等信息,所述搜索历史数据包括用户发生的与视频有关的搜索数据,例如包括搜索的视频关键词、点击和观看的信息等。
基于上述历史数据,确定用户的视频兴趣。确定方式例如可以是深度学习模型,基于大量视频历史数据和搜索历史数据样本来训练该深度学习模型,并利用训练后的学习模型分析上述历史数据,确定给用户的视频兴趣。
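As a small stand-in for the learned model described above (purely an assumption for illustration), a user's video interest can be approximated by a normalized tag-weight profile built from watch and search history; the record fields below are hypothetical.

```python
from collections import Counter

def build_interest_profile(watch_history, search_history):
    """Weight the tags of watched videos by duration and play count, then add search keywords."""
    counts = Counter()
    for record in watch_history:             # e.g. {"tags": ["综艺", "花絮"], "duration": 620, "plays": 2}
        weight = record.get("duration", 0) / 60.0 + record.get("plays", 0)
        for tag in record.get("tags", []):
            counts[tag] += weight
    for keyword in search_history:            # e.g. ["搞笑视频", "综艺"]
        counts[keyword] += 1.0
    total = sum(counts.values()) or 1.0
    return {tag: w / total for tag, w in counts.items()}   # normalized interest profile
```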
S702,基于所述视频兴趣和预设的推荐策略,推荐至少一条合成视频。
可以在合成视频资源库中保存合成的视频、视频标签及其之间的映射关系,在一些实施例中,合成视频资源库如图3所示;合成视频具有至少一个视频标签、视频标签用于表征所述合成视频的特征。
视频标签可以包括多种维度的信息,包括但不限于合成视频的名称(例如可以使用源视频的名称)、视频分类(例如主题分类、古代/现代、现实/科幻等等多种分类方式)、视频简介、评分等等。
视频标签可以通过多种方式获得,可以使用源视频的标签,也可以重新生成,可以直接由视频提供方提供,也可以由视频处理平台人工或自动对视频进行自动提取,可以设定不同类型的视频标签进行人工填写,也可以基于机器模型进行自动提取,例如通过决策树的方式对视频进行逐级判断,每个叶子节点对应于一个视频标签,也可以基于语义算法自动提取视频标签。
基于所述视频兴趣和所述推荐策略,选择至少一条合成视频进行推荐,所述合成视频的视频标签与所述视频兴趣之间的关系满足所述推荐策略。
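Continuing the sketch above, one assumed recommendation strategy simply scores each synthesized video's labels against the interest profile and returns the highest-scoring ones; `top_n` and `min_score` are illustrative parameters, not part of the original disclosure.

```python
def recommend_synthesized_videos(interest, synthesized_videos, top_n=5, min_score=0.0):
    """interest: {tag: weight};  synthesized_videos: {video_id: [video labels]}.
    Score each synthesized video by the total interest weight of its labels and take the top."""
    scored = {
        vid: sum(interest.get(label, 0.0) for label in labels)
        for vid, labels in synthesized_videos.items()
    }
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [vid for vid in ranked if scored[vid] > min_score][:top_n]
```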
在采用对应各个功能划分各个功能模块的情况下,本发明实施例提供了一种视频处理装置,该装置可以为服务器或应用于服务器的芯片。图8A为本发明一示例性实施例提供的视频处理装置的功能模块示意性框图。如图8A所示,该装置800包括:
第一确定模块801,用于响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征;
第一匹配模块802,用于基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配;
合成模块803,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
图8B为本发明一示例性实施例提供的视频处理装置的功能模块示意性框图。如图8B所示,该装置800’包括:
第二确定模块801’,用于响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入区域和可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植入的请求;所述特征标签用于表征所述可植入区域的特征;
第二匹配模块802’,用于基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配;
合成模块803’,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
本发明实施例还提供一种电子设备，包括：至少一个处理器；用于存储所述至少一个处理器可执行指令的存储器；其中，所述至少一个处理器被配置为执行所述指令，以实现本发明实施例提供的上述方法。
图9为本发明一示例性实施例提供的电子设备的结构示意图。如图9所示，该电子设备1800包括至少一个处理器1801以及耦接至处理器1801的存储器1802，该处理器1801可以执行本发明实施例提供的上述方法中的相应步骤。
上述处理器1801还可以称为中央处理单元(central processing unit,CPU),其可以是一种集成电路芯片,具有信号的处理能力。本发明实施例发明的上述方法中的各步骤可以通过处理器1801中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1801可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本发明实施例所发明的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于存储器1802中,例如随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质。处理器1801读取存储器1802中的信息,结合其硬件完成上述方法的步骤。
另外,根据本发明的各种操作/处理在通过软件和/或固件实现的情况下,可从存储介质或网络向具有专用硬件结构的计算机系统,例如图10所示的计算机系统1900安装构成该软件的程序,该计算机系统在安装有各种程序时,能够执行各种功能,包括诸如前文所述的功能等等。图10为本发明一示例性实施例提供的计算机系统的结构框图。
计算机系统1900旨在表示各种形式的数字电子的计算机设备,诸如,膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本发明的实现。
如图10所示,计算机系统1900包括计算单元1901,该计算单元1901可以根据存储在只读存储器(ROM)1902中的计算机程序或者从存储单元1908加载到随机存取存储器(RAM)1903中的计算机程序,来执行各种适当的动作和处理。在RAM 1903中,还可存储计算机系统1900操作所需的各种程序和数据。计算单元1901、ROM 1902以及RAM 1903通过总线1904彼此相连。输入/输出(I/O)接口1905也连接至总线1904。
计算机系统1900中的多个部件连接至I/O接口1905,包括:输入单元1906、输出单元1907、存储单元1908以及通信单元1909。输入单元1906可以是能向计算机系统1900输入信息的任何类型的设备,输入单元1906可以接收输入的数字或字符信息,以及产生与电子设备的用户设置和/或功能控制有关的键信号输入。输出单元1907可以是能呈现信息的任何类型的设备,并且可以包括但不限于显示器、扬声器、视频/音频输出终端、振动器和/或打印机。存储单元1908可以包括但不限于磁盘、光盘。通信单元1909允许计算机系统1900通过网络诸如因特网的与其他设备交换信息/数据,并且可以包括但不限于调制解调器、网卡、红外通信设备、无线通信收发机和/或芯片组,例如蓝牙TM设备、WiFi设备、WiMax设备、蜂窝通信设备和/或类似物。
计算单元1901可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元1901的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元1901执行上文所描述的各个方法和处理。例如,在一些实施例中,本发明实施例发明的上述方法可被实现为计算机软件程序,其被有形地包含于机器可读介质,例如存储单元1908。在一些实施例中,计算机程序的部分或者全部可以经由ROM 1902和/或通信单元1909而被载入和/或安装到电子设备1900上。在一些实施例中,计算单元1901可以通过其他任何适当的方式(例如,借助于固件)而被配置为执行本发明实施例发明的上述方法。
本发明实施例还提供一种计算机可读存储介质，其中，当所述计算机可读存储介质中的指令由电子设备的处理器执行时，使得所述电子设备能够执行本发明实施例提供的上述方法。
本发明实施例中的计算机可读存储介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。上述计算机可读存储介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。更具体的,上述计算机可读存储介质可以包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器 (RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
本发明实施例还提供一种计算机程序产品，包括计算机程序，其中，所述计算机程序被处理器执行时实现本发明实施例提供的上述方法。
在本发明的实施例中,可以以一种或多种程序设计语言或其组合来编写用于执行本发明的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言,诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络(包括局域网(LAN)或广域网(WAN))连接到用户计算机,或者,可以连接到外部计算机。
附图中的流程图和框图,图示了按照本发明各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本发明实施例中所涉及到的模块、部件或单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块、部件或单元的名称在某种情况下并不构成对该模块、部件或单元本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示例性的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
以上描述仅为本发明的一些实施例以及对所运用技术原理的说明。本领域技术人员应当理解，本发明中所涉及的发明范围，并不限于上述技术特征的特定组合而成的技术方案，同时也应涵盖在不脱离上述发明构思的情况下，由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本发明中公开的（但不限于）具有类似功能的技术特征进行互相替换而形成的技术方案。
虽然已经通过示例对本发明的一些特定实施例进行了详细说明,但是本领域的技术人员应该理解,以上示例仅是为了进行说明,而不是为了限制本发明的范围。本领域的技术人员应该理解,可在不脱离本发明的范围和精神的情况下,对以上实施例进行修改。本发明的范围由所附权利要求来限定。

Claims (11)

  1. 一种视频处理方法,包括:
    响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征;
    基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配;
    通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
  2. 如权利要求1所述的方法,在所述通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频之后,还包括:
    基于预设的推荐策略,对至少一条合成视频进行推荐,所述推荐策略与用户的视频历史数据和搜索历史数据有关。
  3. 如权利要求1所述的方法,所述通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频还包括:
    对所述合成视频进行渲染的步骤,所述渲染包括以下至少之一:光栅化渲染、光线投射、光线跟踪和神经辐射场渲染。
  4. 如权利要求1-3之一所述的方法,在所述基于第一匹配规则,针对目标对象确定至少一个源视频之前,包括:
    对源视频进行视频分片以获得多个视频片段;
    对所述视频片段进行分割以获得多个候选区域及其特征标签;
    基于所述候选区域确定可植入区域及其特征标签。
  5. 如权利要求4所述的方法,所述基于第一匹配规则,针对目标对象确定至少一个源视频,包括:
    计算可植入区域的各特征标签与目标对象的各属性信息之间的第一匹配度;
    计算可植入区域的各特征标签与素材的各描述信息之间的第二匹配度;
    基于所述第一匹配度和所述第二匹配度确定至少一个源视频以及目标可植入区域。
  6. 一种视频处理方法,包括:
    响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入区域和可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植 入的请求;所述特征标签用于表征所述可植入区域的特征;
    基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配;
    通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
  7. 一种视频处理装置,包括:
    第一确定模块,用于响应于接收到的针对目标对象的第一请求,为所述目标对象确定至少一个素材和素材的描述信息,其中,所述第一请求是关于针对所述目标对象进行信息传播的请求,目标对象具有属性信息,素材具有描述信息,描述信息用于表征所述素材的特征;
    第一匹配模块,用于基于第一匹配规则,针对目标对象确定至少一个源视频,所述至少一个源视频包括至少一个目标可植入区域,所述至少一个目标可植入区域与所述目标对象的至少一个素材相匹配;
    合成模块,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
  8. 一种视频处理装置,包括:
    第二确定模块,用于响应于接收到的针对目标源视频的处理请求,确定所述目标源视频中的可植入区域和可植入区域对应的特征标签;其中,所述处理请求是关于针对所述目标源视频进行对象植入的请求;所述特征标签用于表征所述可植入区域的特征;
    第二匹配模块,用于基于第二匹配规则,针对目标源视频确定至少一个可植入对象,所述可植入对象包括至少一个目标素材,所述目标素材与所述目标源视频中的至少一个可植入区域相匹配;
    合成模块,用于通过将目标对象的素材植入与之相匹配的源视频的目标可植入区域中生成合成视频。
  9. 一种电子设备,包括:
    至少一个处理器;
    用于存储所述至少一个处理器可执行指令的存储器;
    其中,所述至少一个处理器被配置为执行所述指令,以实现如权利要求1-6中任一项所述的方法。
  10. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,当所述计算机程序由处理器执行时实现如权利要求1-6中任一项所述的方法。
  11. 一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器 执行时实现如权利要求1-6中任一项所述方法。
PCT/CN2023/131208 2022-11-14 2023-11-13 一种视频处理方法、装置、电子设备和存储介质 WO2024104286A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211426333.1 2022-11-14
CN202211426333.1A CN118042217A (zh) 2022-11-14 2022-11-14 一种视频处理方法、装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2024104286A1 true WO2024104286A1 (zh) 2024-05-23

Family

ID=91003008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/131208 WO2024104286A1 (zh) 2022-11-14 2023-11-13 一种视频处理方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN118042217A (zh)
WO (1) WO2024104286A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112153483A (zh) * 2019-06-28 2020-12-29 腾讯科技(深圳)有限公司 信息植入区域的检测方法、装置及电子设备
CN112312195A (zh) * 2019-07-25 2021-02-02 腾讯科技(深圳)有限公司 视频中植入多媒体信息的方法、装置、计算机设备及存储介质
CN111182338A (zh) * 2020-01-13 2020-05-19 上海极链网络科技有限公司 一种视频处理方法、装置、存储介质及电子设备
CN111988657A (zh) * 2020-08-05 2020-11-24 网宿科技股份有限公司 一种广告插入方法及装置
CN112927024A (zh) * 2021-03-29 2021-06-08 北京奇艺世纪科技有限公司 广告投放方法、系统、装置、电子设备与可读存储介质

Also Published As

Publication number Publication date
CN118042217A (zh) 2024-05-14

Similar Documents

Publication Publication Date Title
US10776970B2 (en) Method and apparatus for processing video image and computer readable medium
US20220309762A1 (en) Generating scene graphs from digital images using external knowledge and image reconstruction
EP3267362B1 (en) Machine learning image processing
CN108629224B (zh) 信息呈现方法和装置
CN110489582B (zh) 个性化展示图像的生成方法及装置、电子设备
US10902262B2 (en) Vision intelligence management for electronic devices
US9324006B2 (en) System and method for displaying contextual supplemental content based on image content
CN102334118B (zh) 基于用户兴趣学习的个性化广告推送方法与系统
Cheng et al. Multimedia features for click prediction of new ads in display advertising
US20220405607A1 (en) Method for obtaining user portrait and related apparatus
CN110390033A (zh) 图像分类模型的训练方法、装置、电子设备及存储介质
Mei et al. ImageSense: Towards contextual image advertising
CN111491187B (zh) 视频的推荐方法、装置、设备及存储介质
CN113760158A (zh) 目标对象展示方法、对象关联方法、装置、介质及设备
US9449231B2 (en) Computerized systems and methods for generating models for identifying thumbnail images to promote videos
CN101668176A (zh) 一种基于人际社交图的多媒体内容点播与分享方法
CN111967924A (zh) 商品推荐方法、商品推荐装置、计算机设备和介质
CN115964560B (zh) 基于多模态预训练模型的资讯推荐方法及设备
US11915724B2 (en) Generating videos
CN114390368B (zh) 直播视频数据的处理方法及装置、设备、可读介质
CN113570416B (zh) 投放内容确定方法、装置、电子设备及存储介质
KR102522989B1 (ko) 멀티미디어 콘텐츠 내 상품 정보 제공 장치 및 방법
CN112446214A (zh) 广告关键词的生成方法、装置、设备及存储介质
WO2024104286A1 (zh) 一种视频处理方法、装置、电子设备和存储介质
CN113011919B (zh) 识别兴趣对象的方法及装置、推荐方法、介质、电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890720

Country of ref document: EP

Kind code of ref document: A1