WO2022063124A1 - Video fusion method and device - Google Patents

Video fusion method and device

Info

Publication number
WO2022063124A1
WO2022063124A1, PCT/CN2021/119606, CN2021119606W
Authority
WO
WIPO (PCT)
Prior art keywords
video
server
push template
editable
fusion
Prior art date
Application number
PCT/CN2021/119606
Other languages
English (en)
French (fr)
Inventor
杨晖 (Yang Hui)
Original Assignee
连尚(北京)网络科技有限公司 (Lianshang (Beijing) Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 连尚(北京)网络科技有限公司
Publication of WO2022063124A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/239 - Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N 21/2393 - Interfacing the upstream path of the transmission network involving handling client requests
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N 21/26291 - Content or additional data distribution scheduling for providing content or additional data updates, e.g. updating software modules, stored at the client
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4668 - Learning process for intelligent management for recommending content, e.g. movies

Definitions

  • the embodiments of the present application relate to the field of computer technologies, and in particular, to a video fusion method and device.
  • video files can only be produced based on the user's own inspiration and material.
  • the video content is therefore limited by the user's own skill and cannot well meet the needs of information interaction in the current Internet age.
  • the embodiments of the present application propose a video fusion method and device.
  • an embodiment of the present application provides a video fusion method, including: acquiring a source video uploaded by a terminal; detecting whether a predetermined editable feature exists in a frame image of the source video; in response to determining that at least one editable feature exists in the frame image, sending to the terminal a push template set corresponding to the editable feature existing in the frame image and tag information, where the tag information includes at least one of the editable feature and the frame image; and in response to receiving from the terminal the selection information of a target push template in the push template set, fusing the target push template into the source video to generate a fused video.
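A minimal sketch of this claimed server-side flow is given below; every type, field, and helper name is an illustrative assumption, since the claim does not prescribe a concrete API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EditableFeature:
    name: str          # e.g. a predetermined feature such as "beverage_bottle"
    frame_index: int   # the frame image in which the feature was detected

@dataclass
class PushTemplate:
    template_id: str
    content: bytes     # the content to be fused into the frame image

def detect_editable_features(frames: List[bytes]) -> List[EditableFeature]:
    """Stub for the detection step; a real detector returns matches per frame."""
    return []

def fuse(frames: List[bytes], template: PushTemplate,
         features: List[EditableFeature]) -> List[bytes]:
    """Stub for the fusion step; a real implementation edits the frame images."""
    return frames

def handle_source_video(frames: List[bytes],
                        templates: List[PushTemplate],
                        selected_id: str) -> Optional[List[bytes]]:
    features = detect_editable_features(frames)
    if not features:
        return None  # no editable feature, nothing to push
    # Tag information: at least one of the editable feature and the frame image.
    tags = [(f.name, f.frame_index) for f in features]
    # ...the push template set and `tags` would be sent to the terminal here...
    target = next((t for t in templates if t.template_id == selected_id), None)
    return fuse(frames, target, features) if target else None
```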
  • fusing the target push template into the source video to generate the fused video includes: using artificial intelligence image fusion technology to fuse the target push template into the frame image corresponding to the source video.
  • the step of using artificial intelligence image fusion technology to fuse the push template into the frame image corresponding to the source video includes: acquiring the frame image corresponding to the source video; using a semantic segmentation neural network to process the frame image corresponding to the source video, determining the image area that includes the editable feature in the frame image to obtain the target fusion area; and replacing and adding the content in the target push template to the target fusion area.
  • detecting whether a predetermined editable feature exists in the frame image of the source video includes: acquiring different types of push template sets, determining corresponding matching editable features according to the types of the push template sets, and detecting whether the matching editable features exist in the frame images of the source video.
  • sending the push template set corresponding to the editable feature existing in the frame image to the terminal includes: in response to determining that at least one matching editable feature exists in the frame image, obtaining a matching push template set corresponding to the matching editable feature, and sending the matching push template set to the terminal.
  • the selection information of the target push template includes: the selection information of the matching push template obtained according to the set of matching push templates; and fusing the target push template into the source video to generate the fused video includes: fusing the matching push template into the source video to generate the fused video.
  • before sending to the terminal the push template set corresponding to the editable feature existing in the frame image and the tag information in response to determining that at least one editable feature exists in the frame image, the method further includes: in response to receiving an editable feature set acquisition request sent by the terminal, sending an editable feature set to the terminal, where the editable feature set includes one or more editable features; and receiving the selection information about the editable feature set sent by the terminal, the selection information being used to indicate at least one editable feature selected by the terminal from the one or more editable features; and determining that at least one editable feature exists in the frame image includes: determining, according to the selection information, that at least one editable feature exists in the frame image.
  • the method further includes: in response to receiving a push template set update request from the terminal, re-determining the push template set corresponding to the editable feature to obtain an updated push template set, and sending the updated push template set to the terminal.
  • the method is applied to a first server and further includes: sending the fused video to the terminal so that the terminal can display the fused video to the user; and in response to receiving a confirmation message directed to the fused video sent by the terminal, where the confirmation message includes the identification information of the user, adding the identification information of the user and the use mark corresponding to the target push template to the fused video.
  • the method is applied to the first server, and further includes: receiving at least one push template set sent by the second server.
  • the method is applied to a first server, and further includes: sending the merged video to a second server; receiving usage permission information sent by the second server; and sending the usage permission information to the terminal.
  • the method is applied to the second server, and further includes: sending the merged video to the terminal.
  • an embodiment of the present application provides a video fusion method applied to a terminal, including: sending a source video selected by a user to a first server or a second server; receiving the push template set and tag information sent by the first server or the second server, where the tag information includes at least one of editable features and frame image information; presenting the push template set and the tag information to the user; and in response to receiving the selection information of a target push template, sending the selection information of the target push template to the first server or the second server.
  • the method further includes: in response to receiving the fused video sent by the first server, presenting the fused video to the user; and in response to receiving a qualified signal directed to the fused video, acquiring the identification information of the user to generate a confirmation message, and sending the confirmation message to the first server.
  • the method further includes: in response to receiving the fused video sent by the second server, presenting the fused video to the user; and in response to receiving a qualified signal directed to the fused video, acquiring the identification information of the user, adding the identification information of the user and the use mark corresponding to the target push template to the fused video to generate a confirmed fused video, and sending the confirmed fused video to the first server.
  • the push template set includes: a matching push template set acquired from the first server or the second server; presenting the push template set and the tag information to the user includes: presenting the matching push template set and the tag information to the user; and the selection information of the target push template includes: the selection information of the matching push template obtained according to the push template set.
  • the method further includes: sending a request for obtaining the editable feature set to the first server or the second server; receiving the editable feature set sent by the first server or the second server, where the editable feature set includes one or more editable features; presenting the editable feature set to the user; receiving selection information of the editable feature set, where the selection information is used to indicate at least one editable feature selected by the terminal from the one or more editable features; and sending the selection information of the editable feature set to the first server or the second server.
  • the method further includes: in response to receiving an update push template instruction, generating a push template update request; sending the push template update request to the first server or the second server; and receiving the updated push template set sent by the first server or the second server; and presenting the push template set and the tag information to the user includes: presenting the updated push template set and the tag information to the user.
  • an embodiment of the present application provides a video fusion apparatus, including: a source video acquisition unit configured to acquire a source video uploaded by a terminal; a source video detection unit configured to detect whether a predetermined editable feature exists in a frame image of the source video; a push template sending unit configured to, in response to determining that at least one editable feature exists in the frame image, send to the terminal a push template set corresponding to the editable feature existing in the frame image and tag information, where the tag information includes at least one of the editable feature and the frame image; and a fusion video generation unit configured to, in response to receiving from the terminal the selection information of a target push template in the push template set, fuse the target push template into the source video to generate a fused video.
  • the fusion video generation unit is further configured to: use artificial intelligence image fusion technology to fuse the target push template into the frame image corresponding to the source video.
  • the step of using artificial intelligence image fusion technology in the fusion video generation unit to fuse the push template into the frame image corresponding to the source video includes: acquiring the frame image corresponding to the source video; using a semantic segmentation neural network to process the frame image, determining the image area including the editable feature in the frame image to obtain the target fusion area; and replacing and adding the content in the target push template to the target fusion area.
  • the source video detection unit is further configured to: obtain different types of push template sets, determine corresponding matching editable features according to the types of the push template sets, and detect whether the matching editable features exist in the frame images of the source video.
  • the push template sending unit is further configured to: in response to determining that at least one matching editable feature exists in the frame image, obtain a matching push template set corresponding to the matching editable feature, and send the matching push template set to the terminal.
  • the selection information of the target push template in the fusion video generation unit includes: the selection information of the matching push template obtained according to the set of matching push templates; and the fusion video generation unit is further configured to fuse the matching push template into the source video to generate a fused video.
  • the apparatus further includes: an editable feature sending unit configured to, in response to receiving an editable feature set acquisition request sent by the terminal, send an editable feature set to the terminal, where the editable feature set includes one or more editable features; and an edit feature selection information receiving unit configured to receive the selection information about the editable feature set sent by the terminal, the selection information being used to indicate at least one editable feature selected by the terminal from the one or more editable features; and the push template sending unit is further configured to determine, according to the selection information, that at least one editable feature exists in the frame image.
  • the push template update unit is configured to, in response to receiving a push template set update request from the terminal, re-determine the push template set corresponding to the editable feature to obtain an updated push template set, and send the updated push template set to the terminal.
  • the apparatus is provided on the first server and further includes: a first fused video sending unit configured to send the fused video to the terminal so that the terminal can display the fused video to the user; and a use mark adding unit configured to, in response to receiving a confirmation message directed to the fused video sent by the terminal, where the confirmation message includes the user's identification information, add the user's identification information and the use mark corresponding to the target push template to the fused video.
  • the apparatus is provided on the first server and further includes: a push template receiving unit configured to receive at least one push template set sent by the second server.
  • the apparatus is provided on the first server, the first fused video sending unit is further configured to send the fused video to the second server, and a license information forwarding unit is configured to receive the use permission information sent by the second server and send the use permission information to the terminal.
  • the apparatus is provided on the second server and further includes: a second fused video sending unit configured to send the fused video to the terminal.
  • an embodiment of the present application provides a video fusion apparatus provided in a terminal, including: a source video sending unit configured to send a user-selected source video to a first server or a second server; a template obtaining unit configured to receive the push template set and the tag information sent by the first server or the second server, where the tag information includes at least one of editable features and frame image information; a template presentation unit configured to present the push template set and the tag information to the user; and a selection information sending unit configured to, in response to receiving the selection information of a target push template, send the selection information of the target push template to the first server or the second server.
  • the apparatus further includes: a fusion video receiving unit configured to, in response to receiving the fused video sent by the first server, present the fused video to the user; and a confirmation information sending unit configured to, in response to receiving a qualified signal directed to the fused video, acquire the identification information of the user to generate a confirmation message, and send the confirmation message to the first server.
  • the apparatus further includes: the fusion video receiving unit is further configured to, in response to receiving the fused video sent by the second server, present the fused video to the user; an identification information adding unit configured to, in response to receiving a qualified signal directed to the fused video, acquire the identification information of the user, add the identification information of the user and the use mark corresponding to the target push template to the fused video, and generate a confirmed fused video; and a confirmed fused video sending unit configured to send the confirmed fused video to the first server.
  • the template obtaining unit is further configured to obtain the matching push template set sent by the first server or the second server;
  • the template presentation unit is further configured to present the matching push template set and the tag information to the user;
  • the selection information sending unit is further configured to send the selection information of the matching push template obtained according to the push template set to the first server or the second server.
  • the apparatus further includes: an editing feature requesting unit configured to send a request for obtaining the editable feature set to the first server or the second server; an editing feature receiving unit configured to receive the editable feature set sent by the first server or the second server, where the editable feature set includes one or more editable features; an editable feature presentation unit configured to present the editable feature set to the user and receive the selection information of the editable feature set, where the selection information is used to indicate at least one editable feature selected by the terminal from the one or more editable features; and an editing feature selection information sending unit configured to send the selection information of the editable feature set to the first server or the second server.
  • the apparatus further includes: a push template update request unit configured to generate a push template update request in response to receiving an update push template instruction, and send the push template update request to the first server or the second server; and an updated push template receiving unit configured to receive the updated push template set sent by the first server or the second server; and the template presentation unit is further configured to present the updated push template set and the tag information to the user.
  • an embodiment of the present application provides a computer device, including: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any implementation of the first aspect, or the method described in any implementation of the second aspect.
  • an embodiment of the present application provides a computer-readable medium on which a computer program is stored; when the computer program is executed by a processor, the method described in any implementation of the first aspect, or the method described in any implementation of the second aspect, is implemented.
  • With the video fusion method and device, after the source video uploaded by the terminal is acquired, it is detected whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists in a frame image, the push template set corresponding to the editable feature and the tag information are sent to the terminal, where the tag information includes at least one of the editable feature and the frame image; and in response to receiving from the terminal the selection information of a target push template in the push template set, the corresponding target push template is fused into the source video to generate a fused video.
  • In this way, the source video can be re-edited in combination with template information provided by the uploading user and other users to enrich its content, so as to improve the quality of the source video and explore more of its value.
  • FIG. 1 is an exemplary system architecture to which some embodiments of the present application may be applied;
  • FIG. 2 is a flowchart of a first embodiment of the video fusion method according to the present application;
  • FIG. 3 is a flowchart of an implementation of the video fusion method according to the present application;
  • FIG. 5 is a flowchart of a second embodiment of the video fusion method according to the present application;
  • FIG. 6 is a flowchart of an application scenario of the video fusion method according to the present application.
  • FIG. 7 is a flowchart of another application scenario of the video fusion method according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the computer equipment of some embodiments of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the video fusion method of the present application may be applied.
  • the system architecture 100 may include devices 101 , 102 , 103 , 104 and a network 105 .
  • the network 105 is the medium used to provide communication links between the devices 101 , 102 , 103 , 104 .
  • the network 105 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.
  • the devices 101, 102, 103, 104 may be hardware devices or software that support network connections to provide various network services.
  • When the device is hardware, it can be any of a variety of electronic devices, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, servers, and the like.
  • When implemented as a hardware device, it can be a distributed device group composed of multiple devices, or a single device.
  • When the device is software, it can be installed in the electronic devices listed above.
  • When implemented as software, it may be implemented as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module; no specific limitation is made here.
  • The device can provide corresponding network services by installing a corresponding client application or server application.
  • After a client application is installed on a device, the device can be embodied as a client in network communication.
  • After a server application is installed, the device can be embodied as a server in network communication.
  • the devices 101, 102 are embodied as terminals, the device 103 is embodied as a first server, and the device 104 is embodied as a second server.
  • devices 101 and 102 can be clients with video applications installed;
  • device 103 can be a background server that provides services for the video applications;
  • device 104 can be a background server that provides services for the video applications, or a server that supports template-uploading clients.
  • the video fusion method provided in the embodiment of the present application may be executed by the devices 101 , 102 , 103 , and 104 .
  • the video fusion method may include the following steps:
  • Step 201 Obtain the source video uploaded by the terminal.
  • in this embodiment, the terminal may send the source video to the first server (for example, the device 103 shown in FIG. 1) or the second server (for example, the device 104 shown in FIG. 1).
  • the first server usually refers to the server used by the video playback platform side, which provides video playback services for user terminal devices with video applications installed;
  • the second server usually refers to the server used by the template provider.
  • the terminal usually represents a user terminal device with a video application installed, and the user who produces the video has registered a video account on the application.
  • the source video uploaded by the terminal is the source video to be played to other users through the first server.
  • the source video may contain various kinds of user-created content: it may be content the user shoots in real life or an animation video synthesized with a tool, and the user may also perform secondary processing on the captured content to generate the above-mentioned source video, which is not limited in this application.
  • Step 202 Detect whether a predetermined editable feature exists in the frame image of the source video.
  • the execution body of the first server or the second server that executes the video fusion method (abbreviated as the fusion execution body) starts processing the source video by extracting its frame images.
  • all frame images in the source video can be extracted, or frames can be extracted according to certain rules.
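Frame extraction "according to certain rules" could, for instance, be a fixed sampling stride. The sketch below assumes OpenCV and an arbitrary stride of 30; neither is specified by the application.

```python
import cv2  # assumed dependency; the application does not name a library

def extract_frames(video_path: str, stride: int = 30):
    """Yield (index, frame) for every `stride`-th frame image of the source video."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            yield index, frame  # frame is a BGR ndarray, one candidate frame image
        index += 1
    cap.release()
```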
  • when detecting the frame images in the source video, the range of frame images carrying editable marks can be determined first, and only the frame images within that range are detected.
  • the editable mark can be added by the user when making the source video, the source video can be marked during the uploading process, or remarks in various forms can be sent to the fusion execution body, such as a mark in the file code or a separate identification field. By adding editable marks, the user can set the range of frame images that the fusion execution body is allowed to extract, so as to mark the range of frame images that the user wants and/or does not want to be expanded, which is closer to the user's needs.
  • the content of the frame images is then examined to detect whether a predetermined editable feature exists in them.
  • the editable features include, but are not limited to, text, images, animations, sounds, videos, and combinations thereof.
  • when the fusion execution body detects an editable feature, it can determine that the frame image is editable, that is, other text, images, animations, sounds, and the like can be inserted into the frame image.
  • the editable feature is determined in advance by the fusion execution body, so that the frame images of the source video can be screened according to the content corresponding to the editable feature, and the frame images that can be used for editing can be determined.
  • the editable feature is usually determined based on a push template or a set of push templates. In the determination process, the fusion execution body may pre-determine common template types, determine basic editable features, and add corresponding template information to these editable features; it may also generate corresponding editable features according to a push template or the category information of a template set after obtaining them, so that a lookup correspondence exists between these editable features and the push template or the push template set.
  • the manner of determining the editable features includes: acquiring different types of push template sets, and determining corresponding matching editable features according to the types of the push template sets.
  • the template type may be related to the content of the push template, to the content to be inserted or replaced in the push template, or to the function of the push template. For example, when it is determined that the push template set can be divided into carbonated beverages, juice beverages, functional beverages, and so on, the editable feature can be determined to be the beverage bottle image or the marked text in the video frame.
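The correspondence between push template set types and matching editable features can be pictured as a simple lookup, following the beverage and mobile phone examples in the text; the keys and feature names below are illustrative assumptions.

```python
# Hypothetical mapping from push template set type to matching editable features.
TEMPLATE_TYPE_TO_FEATURES = {
    "carbonated_beverage": ["beverage_bottle_image", "beverage_label_text"],
    "juice_beverage":      ["beverage_bottle_image", "beverage_label_text"],
    "functional_beverage": ["beverage_bottle_image", "beverage_label_text"],
    "mobile_phone":        ["mobile_phone_image"],
}

def matching_editable_features(template_set_type: str) -> list:
    """Determine the matching editable features for a push template set type."""
    return TEMPLATE_TYPE_TO_FEATURES.get(template_set_type, [])
```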
  • Step 203 in response to determining that at least one editable feature exists in the frame image, send a push template set and tag information corresponding to the editable feature existing in the frame image to the terminal.
  • the corresponding push template set and tag information are determined according to the editable feature and then sent to the terminal used by the user who uploaded the source video, so that the user can determine the desired push template according to the push template set and the tag information, and the push template can then be fused into the source video to generate a fused video.
  • the corresponding tag information is sent so that the user can know the position and content of the video frames in which editable features exist, or know what kind of content is to be expanded; it can therefore be understood that at least one of the editable features and frame image information will be included in the tag information, so as to achieve the above purpose.
  • Step 204 in response to receiving from the terminal the selection information of a target push template in the push template set, fuse the target push template into the source video to generate a fused video.
  • after receiving the selection information, the fusion execution body determines, according to its content, the target push template to be fused into the source video, and fuses the target push template into the source video.
  • the video fusion method further includes: in response to receiving a push template set update request from the terminal, re-determining the push template set corresponding to the editable feature to obtain an updated push template set, and sending the updated push template set to the terminal.
  • when the fusion execution body receives the push template set update request, it responds to the request, re-generates the push template set, and sends it to the terminal.
  • the push template set is updated so that the terminal can select an appropriate push template from the updated set, expanding the range of push templates available to the user.
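How the server "re-determines" the push template set is left open; a minimal sketch, assuming the update is a fresh random sample from the templates registered for the editable feature:

```python
import random

def update_push_templates(templates_by_feature: dict, feature: str, k: int = 5) -> list:
    """Re-determine a push template set for `feature` on an update request."""
    candidates = templates_by_feature.get(feature, [])
    return random.sample(candidates, min(k, len(candidates)))
```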
  • different fusion methods may be determined according to different forms of push templates.
  • image fusion may be performed by methods such as artificial intelligence fusion, texture replacement, or pixel replacement.
  • the fusion of the push template into the source video to generate the fused video includes: using artificial intelligence (AI) image fusion technology to fuse the target push template into the frame image corresponding to the source video.
  • image semantic soft segmentation aims to accurately represent the soft transitions between different areas of an image, similar to the magnetic lasso and magic wand functions. Because artificial intelligence can automatically extract the features and content in an image and fuse them according to the image's deep-level features, it provides a high-efficiency, high-quality way of image fusion and saves labor cost.
  • the step of using artificial intelligence image fusion technology to fuse the target push template into the frame image corresponding to the source video includes: acquiring the frame image corresponding to the source video; using a semantic segmentation neural network to process the frame image, determining the image area that includes the editable feature in the frame image corresponding to the source video to obtain the target fusion area; and replacing and adding the content in the target push template to the target fusion area.
  • FIG. 3 shows a process 300 of an implementation manner of using artificial intelligence image fusion technology to fuse the push template into the image corresponding to the source video, which specifically includes:
  • Step 301 Obtain a frame image corresponding to the source video.
  • Step 302 use a semantic segmentation neural network to process the frame image corresponding to the source video, determine an image area including the editable feature in the frame image corresponding to the source video, and obtain a target fusion area.
  • a semantic segmentation neural network refers to a convolutional neural network that distinguishes different contents in an image based on the classification of its pixels, such as fully convolutional networks (FCN), the U-Net semantic segmentation network, and the SegNet convolutional neural network.
  • in the semantic soft segmentation neural network, low-level affinity terms are first constructed to represent wide-range correlations between pixels based on color; high-level semantic affinity terms are then constructed so that pixels belonging to the same scene object are as close as possible and pixels of different scene objects are far apart. The Laplacian matrix is then decomposed, feature vectors are extracted, and a two-step sparsification is applied to the feature vectors to create image layers, thereby achieving image segmentation and determining the image area of the editable feature, that is, the target fusion area.
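As a hedged illustration of obtaining the target fusion area, the sketch below runs an off-the-shelf segmentation model and keeps the pixels of one class as the mask. The text names FCN, U-Net, and SegNet; torchvision's DeepLabV3 is used here only for brevity, and the class id is an assumption.

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def target_fusion_mask(frame_rgb, class_id: int):
    """Return an (H, W) boolean mask of pixels predicted as `class_id`,
    i.e. the image area containing the editable feature (the target fusion area).
    `frame_rgb` is a PIL image or an HWC uint8 RGB array."""
    x = preprocess(frame_rgb).unsqueeze(0)      # shape [1, 3, H, W]
    with torch.no_grad():
        logits = model(x)["out"]                # shape [1, num_classes, H, W]
    return (logits.argmax(dim=1)[0] == class_id).numpy()
```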
  • Step 303 replace and add the content in the target push template to the target fusion area.
  • the content in the target push template can replace the content in the target fusion area based on feature alignment, size alignment, and the like, so as to achieve the purpose of replacing and adding the content of the target push template into the target fusion area.
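A minimal sketch of this replacement step under simple size alignment: the template image is resized to the fusion area's bounding box and pasted under the mask. OpenCV/NumPy, three-channel images, and the bounding-box strategy are all assumptions here.

```python
import cv2
import numpy as np

def replace_fusion_area(frame: np.ndarray, mask: np.ndarray,
                        template: np.ndarray) -> np.ndarray:
    """Replace the masked target fusion area of `frame` with `template` content.
    `frame` and `template` are HWC BGR arrays; `mask` is an (H, W) bool array."""
    ys, xs = np.where(mask)
    if len(ys) == 0:
        return frame  # no target fusion area in this frame image
    top, bottom, left, right = ys.min(), ys.max(), xs.min(), xs.max()
    # Size alignment: scale the template content to the fusion area's bounding box.
    patch = cv2.resize(template, (right - left + 1, bottom - top + 1))
    out = frame.copy()
    region = out[top:bottom + 1, left:right + 1]
    local = mask[top:bottom + 1, left:right + 1]
    region[local] = patch[local]  # pixel replacement inside the mask only
    return out
```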
  • in this way, the semantic segmentation neural network solves the problem of module division from the perspective of spectral segmentation, taking into account the texture and color features of the image and using the higher-level semantic information generated by a deep neural network over the graph structure, so that the content of the push template can be added to the frame images of the source video and the fusion effect of the push template and the frame images in the fused video is improved.
  • With the video fusion method, after the source video uploaded by the terminal is acquired, it is detected whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists in a frame image, the push template set corresponding to the editable feature and the tag information are sent to the terminal, where the tag information includes at least one of the editable feature and the frame image; and in response to receiving from the terminal the selection information of a target push template in the push template set, the corresponding target push template is fused into the source video to generate a fused video.
  • In this way, the source video can be re-edited in combination with template information provided by the uploading user and other users to enrich its content, so as to improve the quality of the source video and explore more of its value.
  • FIG. 4 shows an implementation of the video fusion method according to the present application.
  • the process 400 specifically includes the following steps:
  • Step 401 Acquire different types of push template sets, and determine corresponding matching editable features according to the types of the push template sets.
  • the fusion execution body can obtain multiple push templates in advance from local or non-local devices, classify these push templates, and determine different types of push template sets; corresponding editable features are then matched according to the determined types. For example, if the obtained push templates are mobile phones of different brands and models, the type of the push template set can be determined as the mobile phone type, and the mobile phone image is automatically matched as the corresponding editable feature. Determining the matching editable features based on the push templates ensures that the determined matching editable features have enough matching push templates to correspond to, and improves the quality of the editable features.
  • the push template set can be received from the second server, so as to understand the specific needs of the users of the second server and improve the quality of the obtained push template set.
  • Step 402 detecting whether the matching editable feature exists in the frame image of the source video.
  • the acquired frame images of the source video can be examined with an image similarity algorithm or a deep learning method, to detect whether the image content in a frame image is the same as or similar to the editable feature.
  • if the image content is the same as or similar to the editable feature, it is considered that the editable feature exists in the frame image, that is, a corresponding push template can be selected according to the editable feature to edit the frame image; the frames with editable features can be extracted, or marked and recorded within the frame sequence, so that they can be found later.
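One concrete instance of such an image similarity check is normalized template matching; the sketch below assumes OpenCV, grayscale inputs, and a hand-tuned threshold, none of which is fixed by the application.

```python
import cv2
import numpy as np

def feature_present(frame_gray: np.ndarray, feature_gray: np.ndarray,
                    threshold: float = 0.8) -> bool:
    """Return True if the editable feature image is found in the frame image
    with a normalized cross-correlation score above `threshold`."""
    scores = cv2.matchTemplate(frame_gray, feature_gray, cv2.TM_CCOEFF_NORMED)
    return float(scores.max()) >= threshold
```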
  • Step 403 in response to determining that there is at least one matching editable feature in the frame image, obtain a matching push template set corresponding to the matching editable feature.
  • the corresponding matching push template set is determined based on the detected matching editable feature; for example, when a mobile phone image is detected in the frame image, the mobile-phone-type push template set is determined as the matching push template set. Because the matching editable feature has a definite matching push template set, the corresponding matching push template set can be determined quickly, improving the efficiency of confirming the push template.
  • Step 404 Send the matching push template set to the terminal.
  • Step 405 in response to receiving, from the terminal, the selection information for a matching push template in the set of matching push templates, fuse the matching push template into the source video to generate a fused video.
  • in this implementation, the push template set is determined according to the type and content of the obtained push templates, and the matching editable features, that is, the editable features actively matched by the fusion execution body, can be determined based on the type information of the push templates.
  • after the editable features are detected in the source video, matching is performed according to them, so as to automatically detect the source video and send the push template set.
  • determining the corresponding editable features according to the push template set not only improves the efficiency of determining the editable features, but also makes it convenient for the user to select appropriate extended content according to the matching result of the fusion execution body.
  • when the fusion execution body is the above-mentioned first server, the video fusion method further includes: sending the fused video to the terminal so that the terminal displays the fused video to the user; and in response to receiving a confirmation message directed to the fused video sent by the terminal, where the confirmation message includes the user's identification information, adding the user's identification information and the use mark corresponding to the target push template to the fused video.
  • when the fusion execution body is the above-mentioned first server, it sends the fused video to the terminal for confirmation. After the fusion execution body receives a confirmation message directed to the fused video that includes the user's identification information sent by the terminal, the user can be considered to agree to use the fused video, and the user's identification information and the use mark of the target push template are added to the fused video. While presenting the fusion effect to the user, more consideration can thus be given to the user's production opinions, and the template used and the generation of the fused video can subsequently be determined from the target push template's use mark.
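The application does not fix how the identification information and use mark are attached; a sketch assuming they are kept as sidecar metadata of the fused video:

```python
def add_use_mark(fused_video_meta: dict, user_id: str, template_id: str) -> dict:
    """Attach the user's identification information and the target push
    template's use mark to the fused video's (assumed) metadata record."""
    meta = dict(fused_video_meta)
    meta["uploader_id"] = user_id                    # identification information
    meta["use_mark"] = {"template_id": template_id}  # use mark of the template
    return meta
```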
  • when the fusion execution body is the above-mentioned first server, the video fusion method further includes: sending the fused video to the second server; receiving the use permission information sent by the second server; and sending the use permission information to the terminal.
  • the fused video can also be sent to the second server; when the use permission information sent by the second server is received, it can be considered that the user of the second server permits use of the fused video, that is, it is confirmed that the content of the fused video generated based on the target push template can meet the needs of the user of the second server. The permission information is then sent to the terminal that uploaded the source video, realizing an information exchange between the user of the terminal and the user of the second server, in order to balance the needs of both parties and improve the quality of the fused video.
  • when the fusion execution body is the above-mentioned second server, the method further includes: sending the fused video to the terminal.
  • the fused video is sent to the terminal so that, when the user of the terminal considers that it meets the requirements, the user can use it directly, avoiding the resource waste caused by repeated transmission of the fused video.
  • before sending to the terminal the push template set corresponding to the editable feature existing in the frame image and the tag information in response to determining that at least one editable feature exists in the frame image, the method further includes: in response to receiving an editable feature set acquisition request sent by the terminal, sending an editable feature set to the terminal, where the editable feature set includes one or more editable features; and receiving the selection information about the editable feature set sent by the terminal, the selection information being used to indicate at least one editable feature selected by the terminal from the one or more editable features; and determining that at least one editable feature exists in the frame image includes: determining, according to the selection information, that at least one editable feature exists in the frame image.
  • when a request for obtaining an editable feature set sent by the terminal is received, an editable feature set including one or more editable features is sent to the terminal, and the selection information about the editable feature set sent by the terminal is then received.
  • the editable features specified by the user of the terminal are read from the selection information, and the push template set is subsequently determined according to these features. Because the editable features are presented to the user in advance, the user can select appropriate editable features according to their own needs and obtain a corresponding push template set, so as to better meet the user's needs.
  • the video fusion method may include the following steps:
  • Step 501 Send the source video to the first server or the second server.
  • in this embodiment, the terminal may send the source video to the first server (for example, the server 103 shown in FIG. 1) or the second server (for example, the server 104 shown in FIG. 1).
  • since the first server or the second server serves user terminal devices with a video application installed, it usually appears as a background server of the video application, and the corresponding terminal in the communication usually appears as a user terminal device with the video application installed.
  • the user who produces the video has registered a video account on the social application.
  • the source video uploaded by the terminal is the source video to be played to other users through the server.
  • the source video may contain various kinds of user-created content: it may be content shot by the user in real life or an animation video synthesized with a tool, and the user may also perform secondary processing on the captured content to generate the above-mentioned source video, which is not limited in this application.
  • the user sends the source video to the first server or the second server through the execution body of the video fusion method on the terminal (referred to as the user execution body).
  • the user can also add editable marks to the sent source video: for example, add them when making the source video, mark the source video during the uploading process, or send remarks in various forms to the fusion execution body, such as a mark in the file code or a separate identification field. By adding editable marks, the user can set the range of frame images that the fusion execution body is allowed to extract, so as to mark the range of frame images the user wants and/or does not want to be expanded, which is closer to the user's needs.
  • Step 502 Receive the push template set and the tag information sent by the first server or the second server.
  • the tag information includes at least one of editable features and frame image information, so that the user can know what content is to be replaced at the editable feature.
  • Step 503 Present the push template set and the tag information to the user.
  • the user execution body can present the push template set and the tag information to the user through a local display device, so that the user can determine the push template expected to be selected according to the editable features and/or frame image information in the tag information and the push templates in the displayed push template set.
  • Step 504 In response to receiving the selection information of the target push template, send the selection information of the target push template to the first server or the second server.
  • the user execution body determines the selection information of the target push template selected by the user, and sends the selection information of the target push template to the first server or the second server that sent the push template set.
  • the selection information may also include the number of frames to which the user expects the push template to be added, so that the fusion execution body can better understand the user's expectations and add the content of the push template accordingly.
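The wire format of the selection information is likewise unspecified; a JSON payload such as the following illustrates what it might carry, with every field name being an assumption.

```python
import json

selection_info = {
    "target_template_id": "B-07",  # the target push template chosen by the user
    "frames_to_add": 6,            # frames the user expects the template content in
}
payload = json.dumps(selection_info)
# The terminal would then send `payload` to the first server or the second server.
```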
  • With the video fusion method provided by this embodiment of the present application, after the source video selected by the user is sent to the first server or the second server, the push template set and the tag information sent by the first server or the second server are received, where the tag information includes at least one of editable features and frame image information; the push template set and the tag information are presented to the user, and in response to receiving the selection information of the target push template, the selection information of the target push template is sent to the first server or the second server.
  • In this way, secondary editing of the source video content can be implemented through the first server or the second server, and the content of the source video can be enriched, so as to improve the quality of the source video and explore more of its value.
  • the method further includes: in response to receiving the fused video sent by the first server, presenting the fused video to the user; and in response to receiving a qualified signal directed to the fused video, acquiring the identification information of the user to generate a confirmation message, and sending the confirmation message to the first server.
  • the fused video is presented to the user in order to feed back the fused video generated after the target push template is fused into the source video. If the user agrees to use the fused video, a qualified signal indicating that the fused video can be used is sent to the user execution body, after which the user execution body generates the corresponding confirmation message according to the user's identification information and sends it to the first server. From the confirmation message, the first server learns that the fused video can be used and can mark the fused video according to the user ID in the message, so as to establish a connection between the fused video and the user and subsequently provide other users with the information of the user who uploaded the source video. This protects the user's copyright while exploring more potential value.
  • the method further includes: in response to receiving the fused video sent by the second server, presenting the fused video to the user; and in response to receiving a qualified signal directed to the fused video, acquiring the identification information of the user, adding the identification information of the user and the use mark corresponding to the target push template to the fused video to generate a confirmed fused video, and sending the confirmed fused video to the first server.
  • the fused video is presented to the user in order to feed back the fused video generated after the target push template is fused into the source video. If the user agrees to use the fused video, a qualified signal indicating that the fused video can be used is sent to the user execution body, after which the user execution body generates the corresponding confirmation information according to the user's identification information, adds it directly to the fused video, and sends the fused video to the first server. In this implementation, the first server learns from the confirmation information that the fused video can be used and marks the fused video according to the user ID in it, facilitating the establishment of the connection between the fused video and the user; the information of the user who uploaded the source video can then be provided to other users according to the fused video. While protecting the user's copyright and exploring more potential value, this also avoids repeatedly sending the fused video to the second server, saving transmission resources.
  • obtaining the push template set includes: obtaining the matching push template set sent by the first server or the second server.
  • the method of determining the matching push template set, and the subsequent method of obtaining the selection information of the matching push template according to the set, are similar to the implementation shown in FIG. 4: the set is obtained based on the editable features derived from the classification information of the push template set.
  • obtaining a push template selected based on the push template set, obtaining the selection information of the corresponding push template, and sending the selection information to the first server or the second server includes: in response to receiving an instruction to obtain an editable feature set, sending a request for obtaining the editable feature set to the first server or the second server, where the editable feature set includes at least one editable feature; in response to receiving the editable feature set sent by the first server or the second server, obtaining a self-selected push template determined by the user based on the editable features; and sending the self-selected push template to the first server or the second server.
  • the method further comprises: sending a request for the editable feature set to the first server or the second server; receiving the editable feature set sent by the first server or the second server, where the editable feature set includes one or more editable features; presenting the editable feature set to the user; receiving selection information for the editable feature set, where the selection information indicates at least one editable feature selected by the terminal from the one or more editable features; and sending the selection information for the editable feature set to the first server or the second server.
  • after the user execution body receives an instruction from the user who uploaded the source video to obtain editable features, it can send a request for the editable feature set to whichever of the first server and the second server received the source video, and then receive the editable feature set returned by that server; the editable feature set includes one or more editable features. The editable feature set is then presented to the user, and once the user has decided on the editable features, the user sends the selection information for the editable feature set to the user execution body; the selection information indicates at least one editable feature selected by the terminal from the one or more editable features.
  • in response to receiving the selection information, the user execution body sends the selection information for the editable feature set to the first server or second server that received the source video, so that the server can subsequently determine the corresponding push template set from that selection information.
  • in this way, after editable features are offered to the user, the push template set that is sent corresponds to the features the user actually chose, which fits the user's real needs and improves both the efficiency of determining the target push template and the quality of the determined target push template.
  • the method further includes: in response to receiving an update push template instruction, generating a push template update request; and sending the push template update request to the first server or the second server.
  • after the second execution body receives the push template set sent by the first server or the second server, if the push templates in the set cannot meet the user's needs, the user can send the second execution body a push template update instruction. On receiving that instruction, the second execution body can generate a template update request and send it to the first server or the second server to obtain a new push template set. Updating the push template set in this way better serves the user and improves the quality of the target push template finally obtained. A minimal sketch of this terminal-side message flow follows.
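The following sketch illustrates, in Python, the terminal-side selection and update messages described above. All class, field, and method names (including the `server.send` transport call) are assumptions introduced purely for illustration; the embodiments do not prescribe any particular message format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PushTemplateSelection:
    source_video_id: str
    template_ids: List[str]                       # chosen target push template(s)
    frame_ranges: List[Tuple[int, int]] = field(default_factory=list)  # frames to edit

@dataclass
class PushTemplateUpdateRequest:
    source_video_id: str
    editable_feature_id: str                      # feature whose template set should be refreshed

def send_selection(server, selection: PushTemplateSelection) -> None:
    # `server` is whichever of the first and second server received the source
    # video; `send` stands in for the actual transport layer.
    server.send("template_selection", selection)

def send_update_request(server, video_id: str, feature_id: str) -> None:
    # Generated in response to an update push template instruction from the user.
    server.send("template_update_request",
                PushTemplateUpdateRequest(video_id, feature_id))
```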
  • the intelligent mobile terminal D1 is the terminal with which the user uploads the source video, and a video application can be installed on it;
  • the server S1 is the first server, embodied as the background server of the video application;
  • the server S2 is the second server, embodied as the push template provider side.
  • the user U1 uses the intelligent mobile terminal D1 to upload the source video A1 to the server S1.
  • the server S1 has obtained the push template sets B and C from the server S2 in advance, and also holds the locally saved push template E.
  • the user U1 uses the intelligent mobile terminal D1 to upload the source video A1 to the server S1; the server S1 parses the source video A1, determines that editable features A11 and A12 exist in frames 30-35 and frames 40-45 respectively, generates the corresponding mark information, and sends the push template sets B and C determined from A11 and A12, together with the mark information, to the intelligent mobile terminal D1 for the user U1 to choose from.
  • after the user U1 receives this information on the intelligent mobile terminal D1, the user allows the push template B11 in the push template set corresponding to A11 to be used to edit the image frames of frames 30-35, but does not allow the push templates in push template set C to be used to edit A12; the user U1 then uses the intelligent mobile terminal D1 to send this selection information to the server S1, together with an update push template request, so as to obtain the update push template set E for the editable feature A12.
  • after receiving the update push template set E, the user U1 allows E11 in the update push template set E to be used to edit the image frames of frames 40-45, and uses the intelligent mobile terminal D1 to send this selection information to the server S1.
  • the server S1 processes the images of frames 30-35 and frames 40-45 with the semantic segmentation neural network to determine the target fusion areas in the images, then fuses template B11 into frames 30-35 and template E11 into frames 40-45, and generates the fusion video R1.
  • after the server S1 sends the fusion video R1 to the intelligent mobile terminal D1, the intelligent mobile terminal D1 displays the fusion video R1 to the user U1; the user U1 confirms the fusion video and allows the fusion video R1 to be used, and then uses the intelligent mobile terminal D1 to send confirmation information including the identification information of the user U1 to the server.
  • after receiving the confirmation information sent by the intelligent mobile terminal D1, the server S1 adds the usage marks corresponding to the used templates B11 and E11 to the fusion video R1 and sends it to the server S2 for confirmation.
  • upon receiving the usage permission information sent by the server S2, the server S1 finally completes the video fusion work and saves the generated fusion video R1 locally. An illustrative sketch of the fusion step follows.
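A rough sketch of the fusion step in this scenario is given below, assuming the fusion is done frame by frame with a soft mask. The `segment_fn` argument stands in for the semantic segmentation neural network, since the embodiments do not prescribe a particular model or blending rule.

```python
import cv2
import numpy as np

def fuse_template(frame: np.ndarray, template: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    """Blend `template` into the masked (target fusion) region of `frame`.
    `mask` is a soft mask in [0, 1] with the same height/width as `frame`."""
    template = cv2.resize(template, (frame.shape[1], frame.shape[0]))
    mask3 = np.dstack([mask] * 3).astype(np.float32)
    return (frame * (1.0 - mask3) + template * mask3).astype(np.uint8)

def fuse_range(frames, template, segment_fn, start, end):
    # `segment_fn` stands in for the semantic segmentation network that returns
    # the soft mask of the editable-feature region for one frame.
    for i in range(start, end + 1):
        frames[i] = fuse_template(frames[i], template, segment_fn(frames[i]))
    return frames

# In the scenario above (0-based indices): fuse B11 into frames 30-35 and
# E11 into frames 40-45 of A1, e.g.
#   fuse_range(frames_a1, template_b11, segment_fn, 29, 34)
#   fuse_range(frames_a1, template_e11, segment_fn, 39, 44)
```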
  • the intelligent mobile terminal D2 is the terminal with which the user U2 uploads the source video, and a video application may be installed on it.
  • the server S1 is the first server, embodied as the video application background server, and the server S2 is the second server, embodied as the template provider side.
  • the user U2 uses the intelligent mobile terminal D2 to upload the source video A2 to the server S2.
  • the user U2 uses the intelligent mobile terminal D2 to upload the source video A2 to the server S2; the server S2 parses the source video A2, determines that an editable feature A21 exists in frames 10-15, generates the corresponding mark information, and sends the push template set F determined from A21, together with the mark information, to the intelligent mobile terminal D2 for the user U2 to choose from.
  • after the user U2 receives this information on the intelligent mobile terminal D2, the user allows the push template F11 in the push template set corresponding to A21 to be used to edit the image frames of frames 10-15, and uses D2 to send this selection information to the server S2.
  • the server S2 processes the images of frames 10-15 with the semantic segmentation neural network to determine the target fusion area in the images, fuses the push template F11 into frames 10-15, and generates the fusion video R2.
  • after the server S2 sends the fusion video R2 to the intelligent mobile terminal D2, the intelligent mobile terminal D2 displays the fusion video R2 to the user U2; the user U2 confirms the fusion video and allows it to be used.
  • the intelligent mobile terminal D2 then obtains the identification information of the user U2, adds the identification information of the user U2 and the usage mark of the used push template F11 to the fusion video R2, and sends the fusion video R2 with these additions to the server S1 to be saved locally on the server S1. A sketch of the attached confirmation metadata follows.
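The confirmation step of this scenario might be represented as in the following sketch. The field names are illustrative assumptions, since the embodiments leave the concrete format of the identification information and usage marks open.

```python
def build_confirmed_fusion_metadata(fusion_video_id: str, user_id: str,
                                    used_template_ids: list) -> dict:
    """Metadata the terminal attaches before forwarding the confirmed fusion video."""
    return {
        "fusion_video_id": fusion_video_id,
        "user_id": user_id,                               # identification information
        "usage_marks": [{"template_id": t} for t in used_template_ids],
    }

# For the scenario above, roughly:
#   build_confirmed_fusion_metadata("R2", "U2", ["F11"])
```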
  • referring to FIG. 8, it shows a schematic structural diagram of a computer system 800 suitable for implementing the computer devices (e.g., the devices 101, 102, 103, and 104 shown in FIG. 1) of the embodiments of the present application.
  • the computer device shown in FIG. 8 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
  • the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803.
  • the RAM 803 also stores various programs and data required for the operation of the system 800.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804.
  • the following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication section 809, and/or installed from the removable medium 811; when the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented in software or in hardware.
  • the described units may also be provided in a processor; for example, a processor may be described as including a source video acquisition unit, a source video detection unit, a push template sending unit, and a fusion video generation unit. The names of these units do not limit the units themselves; for example, the source video acquisition unit may also be described as "a unit that obtains the source video uploaded by the terminal".
  • likewise, a processor may be described as including a source video sending unit, a template obtaining unit, a template presenting unit, and a selection information sending unit. The names of these units do not limit the units themselves; for example, the source video sending unit may also be described as "a unit that sends a user-selected source video to the first server or the second server".
  • the present application also provides a computer-readable medium.
  • the computer-readable medium may be included in the computer device described in the above embodiments, or it may exist independently without being assembled into the computer device.
  • the above computer-readable medium carries one or more programs, and when the one or more programs are executed by the computer device, the computer device: obtains the source video uploaded by the terminal, and detects whether a predetermined editable feature exists in the frame images of the source video;
  • in response to determining that at least one editable feature exists in the frame images, sends to the terminal a push template set corresponding to the editable features existing in the frame images, together with mark information, wherein the mark information includes at least one of the editable feature and the frame image;
  • in response to receiving from the terminal selection information of a target push template in the push template set, fuses the corresponding target push template into the source video to generate a fusion video. On the terminal side, after the user-selected source video is sent to the first server or the second server, the computer device receives the push template set and the mark information sent by the first server or the second server, wherein the mark information includes at least one of the editable feature and the frame image information; presents the push template set and the mark information to the user; and, in response to receiving the selection information of a target push template, sends the selection information of the target push template to the first server or the second server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the present application disclose a video fusion method and device. A specific implementation of the method includes: obtaining a source video uploaded by a terminal; detecting whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists in the frame images, sending to the terminal a push template set corresponding to the editable features existing in the frame images, together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and, in response to receiving from the terminal selection information of a target push template in the push template set, fusing the corresponding target push template into the source video to generate a fusion video. This implementation can use template information provided by the uploading user and other users to re-edit the source video and enrich its content, so as to improve the quality of the source video and unlock more of its value.

Description

Video fusion method and device
This application is based on, and claims priority from, the CN application with application number 202011025894.1 filed on 2020.09.25; the disclosure of that CN application is hereby incorporated into this application in its entirety.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a video fusion method and device.
Background
As society enters the Internet era, more and more video websites and self-media are emerging, and users can upload videos they have made to video websites or self-media platforms to share them with other users.
At present, video files can only be produced on the basis of the user's own inspiration and content; the video content is limited by the user's own ability and cannot well meet the demands of information exchange in the current Internet era.
Summary
The embodiments of the present application propose a video fusion method and device.
In a first aspect, an embodiment of the present application provides a video fusion method, including: obtaining a source video uploaded by a terminal; detecting whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists in the frame images, sending to the terminal a push template set corresponding to the editable features existing in the frame images, together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and, in response to receiving from the terminal selection information of a target push template in the push template set, fusing the target push template into the source video to generate a fusion video.
In some embodiments, fusing the push template into the source video to generate the fusion video includes: fusing the target push template into the corresponding frame images of the source video using artificial intelligence image fusion technology.
In some embodiments, the step of fusing the push template into the corresponding frame images of the source video using artificial intelligence image fusion technology includes: obtaining the frame images corresponding to the source video; processing the frame images with a semantic segmentation neural network to determine the image regions of the frame images that include the editable feature, obtaining target fusion regions; and replacing the content of the target fusion regions with the content of the target push template.
In some embodiments, detecting whether a predetermined editable feature exists in the frame images of the source video includes: obtaining push template sets of different types, and determining corresponding matching editable features according to the types of the push template sets; and detecting whether the matching editable features exist in the frame images of the source video.
In some embodiments, in response to determining that at least one editable feature exists in the frame images, sending to the terminal the push template set corresponding to the editable features existing in the frame images includes: in response to determining that at least one matching editable feature exists in the frame images, obtaining the matching push template set corresponding to the matching editable feature; and sending the matching push template set to the terminal.
In some embodiments, the selection information of the target push template includes: selection information of a matching push template obtained from the matching push template set; and fusing the target push template into the source video to generate the fusion video includes: fusing the matching push template into the source video to generate the fusion video.
In some embodiments, before sending to the terminal the push template set corresponding to the editable features existing in the frame images and the mark information in response to determining that at least one editable feature exists in the frame images, the method further includes: in response to receiving an editable feature set acquisition request sent by the terminal, sending an editable feature set to the terminal, wherein the editable feature set includes one or more editable features; and receiving selection information about the editable feature set sent by the terminal, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and determining that at least one editable feature exists in the frame images includes: determining, according to the selection information, that at least one editable feature exists in the frame images.
In some embodiments, the method further includes: in response to receiving a push template set update request from the terminal, re-determining the push template set corresponding to the editable feature to obtain an updated push template set; and sending the updated push template set to the terminal.
In some embodiments, the method is applied to a first server and further includes: sending the fusion video to the terminal so that the terminal displays the fusion video to the user; receiving a confirmation message sent by the terminal and directed at the fusion video, the confirmation message including identification information of the user; and adding the user's identification information and a usage mark corresponding to the target push template to the fusion video.
In some embodiments, the method is applied to the first server and further includes: receiving at least one push template set sent by a second server.
In some embodiments, the method is applied to the first server and further includes: sending the fusion video to the second server; receiving usage permission information sent by the second server; and sending the usage permission information to the terminal.
In some embodiments, the method is applied to a second server and further includes: sending the fusion video to the terminal.
In a second aspect, an embodiment of the present application provides a video fusion method applied to a terminal, including: sending a user-selected source video to a first server or a second server; receiving a push template set and mark information sent by the first server or the second server, wherein the mark information includes at least one of an editable feature and frame image information; presenting the push template set and the mark information to the user; and, in response to receiving selection information of a target push template, sending the selection information of the target push template to the first server or the second server.
In some embodiments, the method further includes: receiving a fusion video sent by the first server and presenting the fusion video to the user; in response to receiving a qualified signal directed at the fusion video, obtaining the user's identification information and generating a confirmation message; and sending the confirmation message to the first server.
In some embodiments, the method further includes: receiving a fusion video sent by the second server and presenting the fusion video to the user; in response to receiving a qualified signal directed at the fusion video, obtaining the user's identification information, adding the user's identification information and a usage mark corresponding to the target push template to the fusion video to generate a confirmed fusion video; and sending the confirmed fusion video to the first server.
In some embodiments, the push template set includes a matching push template set obtained from the first server or the second server; presenting the push template set and the mark information to the user includes presenting the matching push template set and the mark information to the user; and the selection information of the target push template includes selection information of a matching push template obtained from the push template set.
In some embodiments, the method further includes: sending a request for an editable feature set to the first server or the second server; receiving the editable feature set sent by the first server or the second server, wherein the editable feature set includes one or more editable features; presenting the editable feature set to the user; receiving selection information for the editable feature set, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and sending the selection information for the editable feature set to the first server or the second server.
In some embodiments, the method further includes: in response to receiving an update push template instruction, generating a push template update request; sending the push template update request to the first server or the second server; and receiving an updated push template set sent by the first server or the second server; and presenting the push template set and the mark information to the user includes presenting the updated push template set and the mark information to the user.
In a third aspect, an embodiment of the present application provides a video fusion apparatus, including: a source video acquisition unit configured to obtain a source video uploaded by a terminal; a source video detection unit configured to detect whether a predetermined editable feature exists in the frame images of the source video; a push template sending unit configured to, in response to determining that at least one editable feature exists in the frame images, send to the terminal a push template set corresponding to the editable features existing in the frame images, together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and a fusion video generation unit configured to, in response to receiving from the terminal selection information of a target push template in the push template set, fuse the target push template into the source video to generate a fusion video.
In some embodiments, the fusion video generation unit is further configured to fuse the target push template into the corresponding frame images of the source video using artificial intelligence image fusion technology.
In some embodiments, the step, in the fusion video generation unit, of fusing the push template into the corresponding frame images of the source video using artificial intelligence image fusion technology includes: obtaining the frame images corresponding to the source video; processing the frame images with a semantic segmentation neural network to determine the image regions of the frame images that include the editable feature, obtaining target fusion regions; and replacing the content of the target fusion regions with the content of the target push template.
In some embodiments, the source video detection unit is further configured to: obtain push template sets of different types, determine corresponding matching editable features according to the types of the push template sets, and detect whether the matching editable features exist in the frame images of the source video.
In some embodiments, the push template sending unit is further configured to: in response to determining that at least one matching editable feature exists in the frame images, obtain the matching push template set corresponding to the matching editable feature, and send the matching push template set to the terminal.
In some embodiments, the selection information of the target push template in the fusion video generation unit includes selection information of a matching push template obtained from the matching push template set, and the fusion video generation unit is further configured to fuse the matching push template into the source video to generate the fusion video.
In some embodiments, the apparatus further includes: an editable feature sending unit configured to, in response to receiving an editable feature set acquisition request sent by the terminal, send an editable feature set to the terminal, wherein the editable feature set includes one or more editable features; and an editing feature selection information receiving unit configured to receive selection information about the editable feature set sent by the terminal, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and the push template sending unit is further configured to determine, according to the selection information, that at least one editable feature exists in the frame images.
In some embodiments, a push template update unit is configured to, in response to receiving a push template set update request from the terminal, re-determine the push template set corresponding to the editable feature to obtain an updated push template set, and send the updated push template set to the terminal.
In some embodiments, the apparatus is provided on a first server and further includes: a first fusion video sending unit configured to send the fusion video to the terminal so that the terminal displays the fusion video to the user; and a usage mark adding unit configured to, in response to receiving a confirmation message sent by the terminal and directed at the fusion video, the confirmation message including identification information of the user, add the user's identification information and a usage mark corresponding to the target push template to the fusion video.
In some embodiments, the apparatus is provided on the first server and further includes a push template receiving unit configured to receive at least one push template set sent by a second server.
In some embodiments, the apparatus is provided on the first server; the first fusion video sending unit is further configured to send the fusion video to the second server, and a permission information forwarding unit is configured to receive usage permission information sent by the second server and send the usage permission information to the terminal.
In some embodiments, the apparatus is provided on a second server and further includes a second fusion video sending unit configured to send the fusion video to the terminal.
In a fourth aspect, an embodiment of the present application provides a video fusion apparatus provided on a terminal, including: a source video sending unit configured to send a user-selected source video to a first server or a second server; a template obtaining unit configured to receive a push template set and mark information sent by the first server or the second server, wherein the mark information includes at least one of an editable feature and frame image information; a template presenting unit configured to present the push template set and the mark information to the user; and a selection information sending unit configured to, in response to receiving selection information of a target push template, send the selection information of the target push template to the first server or the second server.
In some embodiments, the apparatus further includes: a fusion video receiving unit configured to receive a fusion video sent by the first server and present the fusion video to the user; and a confirmation information sending unit configured to, in response to receiving a qualified signal directed at the fusion video, obtain the user's identification information to generate a confirmation message, and send the confirmation message to the first server.
In some embodiments, the fusion video receiving unit is further configured to receive a fusion video sent by the second server and present the fusion video to the user; an identification information adding unit is configured to, in response to receiving a qualified signal directed at the fusion video, obtain the user's identification information and add the user's identification information and a usage mark corresponding to the target push template to the fusion video to generate a confirmed fusion video; and the unit may further be configured to send the confirmed fusion video to the first server.
In some embodiments, the template obtaining unit is further configured to obtain a matching push template set sent by the first server or the second server; the template presenting unit is further configured to present the matching push template set and the mark information to the user; and the selection information sending unit is further configured to send to the first server or the second server the selection information of a matching push template obtained from the push template set.
In some embodiments, the apparatus further includes: an editing feature request unit configured to send a request for an editable feature set to the first server or the second server; an editing feature receiving unit configured to receive the editable feature set sent by the first server or the second server, wherein the editable feature set includes one or more editable features; an editing feature presenting unit configured to present the editable feature set to the user and receive selection information for the editable feature set, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and an editing feature selection information receiving unit configured to send the selection information for the editable feature set to the first server or the second server.
In some embodiments, the apparatus further includes: a push template update request unit configured to, in response to receiving an update push template instruction, generate a push template update request and send the push template update request to the first server or the second server; and an updated push template receiving unit configured to receive an updated push template set sent by the first server or the second server; and the template presenting unit is further configured to present the updated push template set and the mark information to the user.
In a fifth aspect, an embodiment of the present application provides a computer device, including: one or more processors; and a storage apparatus storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect, or the method described in any implementation of the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect, or the method described in any implementation of the second aspect.
According to the video fusion method and device provided by the embodiments of the present application, after the source video uploaded by the terminal is obtained, it is detected whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists, the push template set corresponding to the editable features existing in the frame images is sent to the terminal together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and, in response to receiving from the terminal selection information of a target push template in the push template set, the corresponding target push template is fused into the source video to generate a fusion video. This implementation can use template information provided by the uploading user and other users to re-edit the source video and enrich its content, so as to improve the quality of the source video and unlock more of its value.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture to which some embodiments of the present application can be applied;
FIG. 2 is a flowchart of a first embodiment of the video fusion method according to the present application;
FIG. 3 is a flowchart of one implementation of the video fusion method according to the present application;
FIG. 4 is a flowchart of another implementation of the video fusion method according to the present application;
FIG. 5 is a flowchart of a second embodiment of the video fusion method according to the present application;
FIG. 6 is a flowchart of one application scenario of the video fusion method according to the present application;
FIG. 7 is a flowchart of another application scenario of the video fusion method according to the present application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the computer devices of some embodiments of the present application.
Detailed Description
The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the video fusion method of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include devices 101, 102, 103, 104 and a network 105. The network 105 is the medium providing communication links between the devices 101, 102, 103, 104, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The devices 101, 102, 103, 104 may be hardware devices or software that support network connections to provide various network services. When a device is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablets, laptop computers, desktop computers, and servers; in this case, as a hardware device, it may be implemented as a distributed device group composed of multiple devices, or as a single device. When a device is software, it may be installed in the electronic devices listed above; it may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is made here.
In practice, a device can provide the corresponding network service by installing a corresponding client application or server application. After installing a client application, a device acts as a client in network communication; correspondingly, after installing a server application, it acts as a server.
As an example, in FIG. 1, the devices 101 and 102 are embodied as terminals, the device 103 is embodied as a first server, and the device 104 is embodied as a second server. Specifically, the devices 101 and 102 may be clients with a video application installed, the device 103 may be a background server providing services for the video application, and the device 104 may be a background server providing services for the video application or a client supporting template uploading.
It should be noted that the video fusion method provided by the embodiments of the present application may be executed by the devices 101, 102, 103, 104.
It should be understood that the numbers of networks and devices in FIG. 1 are merely illustrative; there may be any number of networks and devices as required by the implementation.
With continued reference to FIG. 2, it shows a flow 200 of the first embodiment of the video fusion method according to the present application. Applied to a first server or a second server, the video fusion method may include the following steps:
Step 201: obtain a source video uploaded by a terminal.
In this embodiment, the terminal (e.g., the devices 101, 102 shown in FIG. 1) may send the source video to the first server (e.g., the server 103 shown in FIG. 1) or the second server (e.g., the server 104 shown in FIG. 1).
In practice, although the first server may be a terminal device on which the user has installed a video application, the first server usually refers to the server used by the video playback platform that provides the video playback service; the second server usually refers to the device used by the template provider that can implement the video fusion method of the present application, or the device used by the template provider for uploading push templates; and the terminal is usually a user terminal device with the video application installed. The user who produces the video has a video account registered on the social application.
Generally, the source video uploaded by the terminal is a source video intended to be played to other users through the first server. The source video contains all kinds of user-created content, regardless of whether the user filmed real-life content or synthesized an animation with tools; the user may also rework filmed content to generate the source video. The present application does not limit this.
Step 202: detect whether a predetermined editable feature exists in the frame images of the source video.
In this embodiment, after the source video uploaded by the terminal is obtained, the execution body that performs the video fusion method on the first server or the second server (the fusion execution body for short) starts extracting frame images from the source video. During extraction, all frame images in the source video may be extracted, or frames may be extracted according to certain rules.
For example, when the fusion execution body extracts frame images from the source video, it examines the frame images, determines the range of frame images carrying editable marks, and inspects the frame images within that range.
The editable marks may be added by the user when producing the source video, or the source video may be marked during uploading, or remarks in various forms may be sent to the fusion execution body, for example by marking the file code or sending a separate identification field. By adding editable marks, the user sets the range of frame images that the fusion execution body is allowed to extract, so as to mark the range of frame images the user does and/or does not want extended, which fits the user's needs more closely.
After the range of editable frame images of the source video is determined, their content is examined to detect whether a predetermined editable feature exists in the frame images.
The editable features include, but are not limited to, text, images, animation, sound, video, and combinations thereof. When the fusion execution body detects an editable feature, it can determine that the frame image is editable, and other text, images, animation, sound, and similar content can be inserted into the frame image. The editable features are determined in advance by the fusion execution body, so that the frame images of the source video can be screened according to the content corresponding to the identified features to determine which frame images can be edited.
It should be understood that editable features are usually determined on the basis of push templates or sets of push templates. In the determination process, the fusion execution body may first determine common template types, then determine the basic editable features and add the corresponding template information for them; or, after obtaining certain push templates or the category information of template sets, it may generate corresponding editable features from the push templates or category information, so that a lookup relationship exists between these editable features and the push templates or push template sets.
In some embodiments, the editable features are determined as follows: obtain push template sets of different types, and determine corresponding matching editable features according to the types of the push template sets.
Specifically, push template sets of different types are obtained in advance, and different matching features are determined on the basis of the types of the push template sets. The template type may relate to the content of the push templates, to the content the push templates are to be inserted into or replace, or to the purpose of the push templates. For example, when it is determined that the push template sets fall into a carbonated drink category, a juice drink category, a functional drink category, and so on, the editable feature may be determined to be a drink bottle image in a video frame or the text mark "drink". In this way, suitable editable features can be determined from the push template sets obtained in advance, that is, from the specific information about the video content to be extended, and content is expanded or replaced when these features are found to exist. This not only improves the relevance and quality of the expanded or replaced content, but also improves editing efficiency. A minimal sketch of such a mapping follows.
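The following is a minimal sketch, in Python, of deriving matching editable features from push template set types, following the beverage example above. The category names and feature labels are illustrative assumptions, not part of the embodiments.

```python
# Illustrative mapping from push template set types to matching editable features.
TEMPLATE_TYPE_TO_FEATURES = {
    "carbonated_drink": {"drink_bottle_image", "text:drink"},
    "juice_drink":      {"drink_bottle_image", "text:drink"},
    "functional_drink": {"drink_bottle_image", "text:drink"},
    "mobile_phone":     {"phone_image"},
}

def matching_editable_features(template_set_types):
    """Union of the matching editable features for the given template set types."""
    features = set()
    for set_type in template_set_types:
        features |= TEMPLATE_TYPE_TO_FEATURES.get(set_type, set())
    return features

# e.g. matching_editable_features(["carbonated_drink", "juice_drink"])
# -> {"drink_bottle_image", "text:drink"}
```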
Step 203: in response to determining that at least one editable feature exists in the frame image, send to the terminal the push template set corresponding to the editable features existing in the frame image, together with mark information.
In this embodiment, after the editable feature determined in step 202 is detected, the corresponding push template set and mark information are determined from the editable feature, and this information is sent to the terminal the user used to upload the source video, so that the user of the terminal can determine the desired push template from the push template set and the mark information, and the push template can then be fused into the source video to generate the fusion video.
When the push template set for the editable features is sent to the terminal, the corresponding mark information is sent so that the user can know the position and content of the video frames in which the editable features exist, or know what content the desired additions will extend. It will therefore be understood that the mark information includes at least one of the editable feature and the frame image information, in order to achieve the above purpose.
Step 204: in response to receiving from the terminal selection information of a target push template in the push template set, fuse the target push template into the source video to generate a fusion video.
In this embodiment, after receiving the selection information returned by the terminal on the basis of the target push template set sent in step 203, the fusion execution body determines, from the content of the selection information, the target push template to be fused into the source video, and fuses the target template into the source video.
In some embodiments, the video fusion method further includes: in response to receiving a push template set update request from the terminal, re-determining the push template set corresponding to the editable feature to obtain an updated push template set; and sending the updated push template set to the terminal.
Specifically, when the fusion execution body receives a push template set update request, it responds to the request by regenerating the push template set and sending it to the terminal. When the user is not satisfied with the content of the current push template set, the push template set is updated so that the terminal can select a suitable push template from the updated set, expanding the range of push templates available to the user.
It should be understood that different fusion methods can be determined for push templates of different forms; for example, when a push template is in image form, image fusion can be performed by means such as artificial intelligence fusion, overlaying, or pixel replacement.
In some embodiments, fusing the push template into the source video to generate the fusion video includes: fusing the target push template into the corresponding frame images of the source video using artificial intelligence image fusion technology.
Specifically, artificial intelligence (AI) image fusion technology refers to semantic segmentation realized through deep learning algorithms for semantic soft segmentation of images, which aims to accurately represent the soft transitions between different regions of an image, similar in function to the magnetic lasso and the magic wand. Because the artificial intelligence approach can automatically extract features and content from images and fuse them according to the deep-level features of the images, it provides an efficient, high-quality way of performing image fusion and saves labor costs.
In some embodiments, the step of fusing the target push template into the corresponding frame images of the source video using artificial intelligence image fusion technology includes: obtaining the frame images corresponding to the source video; processing the frame images with a semantic segmentation neural network to determine the image regions of the frame images that include the editable feature, obtaining target fusion regions; and replacing the content of the target fusion regions with the content of the target push template.
Specifically, referring to FIG. 3, it shows a flow 300 of one implementation of fusing a push template into the images corresponding to a source video using artificial intelligence image fusion technology, which includes:
Step 301: obtain the frame images corresponding to the source video.
Step 302: process the frame images corresponding to the source video with a semantic segmentation neural network, and determine the image regions of the frame images that include the editable feature, obtaining the target fusion regions.
Specifically, a semantic segmentation neural network usually refers to a graph convolutional neural network that distinguishes different content in an image on the basis of the classification of its pixels, such as fully convolutional networks (FCN), the U-net semantic segmentation network, and the SegNet convolutional neural network.
In a typical semantic soft segmentation network, low-level affinity terms are first constructed to represent larger-range color-based relations between pixels; high-level semantic affinity terms are then constructed so that pixels belonging to objects of the same scene are as close as possible while relations between pixels of different scene objects are kept distant. The Laplacian matrix is then eigendecomposed to extract eigenvectors, and a two-step sparsification is applied to the eigenvectors to create the image layers. Finally, image segmentation is performed on the basis of the eigenvectors, and the image regions of the editable features, i.e., the target fusion regions, are determined.
Step 303: replace the content of the target fusion region with the content of the target push template.
Specifically, after the content of the target push template is extracted, the content of the target push template can replace the content in the target fusion region by means of feature alignment, size alignment, and the like, so as to achieve the purpose of replacing the content of the target fusion region with the content of the target push template.
In this implementation, the semantic segmentation neural network solves the partitioning problem from a spectral segmentation perspective while taking the texture and color features of the image into account, and uses the higher-level semantic information generated by a deep neural network on the graph structure to extract the content of the push template and add the extracted content to the corresponding frame images of the source video, improving the fusion effect between the push template and the frame images in the fusion video. A toy sketch of the eigendecomposition step follows.
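The eigendecomposition step can be illustrated with the following toy sketch, which builds a graph Laplacian from a pixel affinity matrix and takes its low-order eigenvectors as soft-segmentation cues. The deep semantic affinities and the two-step sparsification described above are omitted, so this is only a schematic fragment under stated assumptions, not the full semantic soft segmentation method.

```python
import numpy as np

def soft_segmentation_layers(W: np.ndarray, n_layers: int = 4) -> np.ndarray:
    """Given a symmetric pixel-affinity matrix W (toy-sized, dense), return the
    eigenvectors of the graph Laplacian with the smallest eigenvalues, which
    serve as the soft layer cues referred to above."""
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, :n_layers]
```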
According to the video fusion method provided by this embodiment of the present application, after the source video uploaded by the terminal is obtained, it is detected whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists, the push template set corresponding to the editable features existing in the frame images is sent to the terminal together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and, in response to receiving from the terminal selection information of a target push template in the push template set, the corresponding target push template is fused into the source video to generate a fusion video. This implementation can use template information provided by the uploading user and other users to re-edit the source video and enrich its content, so as to improve the quality of the source video and unlock more of its value.
Specifically, in order to better explain how matching editable features are determined and how push templates are subsequently determined from them, continue to refer to FIG. 4, which shows a flow 400 of one implementation of the video fusion method according to the present application, including the following steps:
Step 401: obtain push template sets of different types, and determine corresponding matching editable features according to the types of the push template sets.
Specifically, the fusion execution body may obtain multiple push templates in advance from local or non-local devices, classify these push templates to determine push template sets of different types, and then select suitable editable features to correspond to the determined types of push template sets. For example, if the obtained push templates are mobile phones of different brands and models, the type of the push template set may be determined to be the mobile phone type, the mobile phone image is automatically matched as the corresponding editable feature, and that matching editable feature is determined. Determining matching editable features on the basis of the push templates ensures that every determined matching editable feature has enough matching push templates to correspond to, improving the quality of the editable features.
In some embodiments, when the fusion execution body is the first server, push template sets may be received from the second server, so as to learn the specific needs of the users of the second server and improve the quality of the obtained push template sets.
Step 402: detect whether the matching editable feature exists in the frame images of the source video.
Specifically, the obtained frame images of the source video may be examined with an image similarity algorithm or by deep learning to detect whether a frame image contains image content identical or similar to the editable feature. When image content identical or similar to the editable feature exists in a frame image, the frame image is considered to contain the editable feature, i.e., a corresponding push template can subsequently be selected according to the editable feature to edit the frame image. Frames containing editable features are extracted, or the sequence numbers of the frame images containing editable features are marked and recorded, so that those frame images can be looked up later. A sketch of such a per-frame check follows.
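As one possible illustration of this per-frame check, the sketch below scores each frame against a small image patch of the editable feature using normalized template matching and records the indices of frames that clear a threshold. The threshold value and the choice of template matching (rather than a deep model) are assumptions made for illustration.

```python
import cv2

def frames_with_feature(frames, feature_patch, threshold=0.8):
    """Return indices of BGR frames whose best match against `feature_patch`
    (a small BGR image of the editable feature) reaches `threshold`."""
    patch = cv2.cvtColor(feature_patch, cv2.COLOR_BGR2GRAY)
    hits = []
    for idx, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        score = cv2.matchTemplate(gray, patch, cv2.TM_CCOEFF_NORMED).max()
        if score >= threshold:
            hits.append(idx)   # record the frame number for later lookup
    return hits
```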
Step 403: in response to determining that at least one matching editable feature exists in the frame image, obtain the matching push template set corresponding to the matching editable feature.
Specifically, when at least one matching editable feature is detected in a frame image, the corresponding matching push template set is determined from the detected matching editable feature. For example, when a mobile phone image is detected in a frame image, the set of mobile-phone-type templates to be pushed is determined as the matching push template set. Because a matching editable feature has a definite matching push template set, the corresponding matching push set can be determined quickly through the matching editable feature, improving the efficiency of determining the push templates.
Step 404: send the matching push template set to the terminal.
Step 405: in response to receiving from the terminal selection information of a matching push template in the matching push template set, fuse the matching push template into the source video to generate a fusion video.
Through this implementation, it can be seen that after the fusion execution body obtains push templates, it determines push template sets according to the types and content of the obtained push templates; after the push template sets are determined, the matching editable features are determined on the basis of the template type information, i.e., editable features actively matched by the fusion execution body. When the frame images of the source video are subsequently examined, matching is performed according to the editable features, realizing automatic detection of the source video and automatic sending of push template sets, so that the corresponding editable features are determined from the push template sets. This improves the efficiency of determining editable features and also makes it easy for the user to select suitable extension content from the fusion execution body's matching results.
In some embodiments, when the fusion execution body is the above first server, the video fusion method further includes: sending the fusion video to the terminal so that the terminal displays the fusion video to the user; receiving a confirmation message sent by the terminal and directed at the fusion video, the confirmation message including the user's identification information; and adding the user's identification information and a usage mark corresponding to the target push template to the fusion video.
Specifically, when the fusion execution body is the first server, the fusion video is sent to the terminal for confirmation. After the fusion execution body receives the confirmation message directed at the fusion video and including the user's identification information, the user can be considered to agree to the use of the fusion video, so the user's identification information and the usage mark of the target push template are added to the fusion video. On top of presenting the fusion effect to the user, this takes the user's production opinions into greater account, and the templates used can later be determined from the usage marks of the target push templates, giving insight into how the fusion video was generated.
In some embodiments, when the fusion execution body is the first server, the video fusion method further includes: sending the fusion video to the second server; receiving usage permission information sent by the second server; and sending the usage permission information to the terminal.
Specifically, when the fusion execution body is the first server, the fusion video can likewise be sent to the second server. When the usage permission information sent by the second server is received, the user of the second server can be considered to allow the use of the fusion video, i.e., to confirm that the content of the fusion video generated from the target push template can meet the needs of the user of the second server. The permission information is then sent to the terminal that uploaded the source video, so as to achieve information exchange between the user of the terminal and the user of the second server, balance the needs of both parties, and improve the quality of the fusion video.
In some embodiments, when the fusion execution body is the second server, the method further includes: sending the fusion video to the terminal.
Specifically, when the fusion execution body is the second server, the fusion video is sent to the terminal after it is generated, so that if the user of the terminal later considers that the generated fusion video meets the requirements, the fusion video can be used directly, avoiding the waste of resources caused by repeatedly transmitting the fusion video.
In some embodiments, before sending to the terminal the push template set corresponding to the editable features existing in the frame images and the mark information in response to determining that at least one editable feature exists in the frame images, the method further includes: in response to receiving an editable feature set acquisition request sent by the terminal, sending an editable feature set to the terminal, wherein the editable feature set includes one or more editable features; receiving selection information about the editable feature set sent by the terminal, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and determining that at least one editable feature exists in the frame images includes: determining, according to the selection information, that at least one editable feature exists in the frame images.
Specifically, before the push templates and mark information are sent to the terminal, an editable feature set acquisition request sent by the terminal is received, and an editable feature set including one or more editable features is sent to the terminal; selection information determined by the terminal on the basis of the editable feature set is then received, the editable features specified by the user of the terminal are read from the selection information, and the push template set is subsequently determined from the user-specified editable features. By presenting the editable features to the user in advance, the user can select suitable editable features according to the user's own needs and obtain the corresponding push template set, better meeting the user's needs.
With continued reference to FIG. 5, it shows a flow 500 of the second embodiment of the video fusion method according to the present application. Applied to a terminal, the video fusion method may include the following steps:
Step 501: send a source video to a first server or a second server.
In this embodiment, the terminal (e.g., the devices 101, 102 shown in FIG. 1) may send the source video to the first server (e.g., the server 103 shown in FIG. 1) or the second server (e.g., the server 104 shown in FIG. 1).
In practice, although the first server or the second server may be a terminal device with the video application installed, it is usually the background server of the video application; the terminal, correspondingly, is usually a user terminal device with the video application installed. The user who produces the video has a video account registered on the social application.
Generally, the source video uploaded by the terminal is a source video intended to be played to other users through the server. The source video contains all kinds of user-created content, regardless of whether the user filmed real-life content or synthesized an animation with tools; the user may also rework filmed content to generate the source video. The present application does not limit this.
The user uses the execution body of the video fusion method on the terminal (the user execution body for short) to send the source video to the first server or the second server.
The user may also add editable marks to the source video being sent, for example when producing the source video, or by marking the source video during uploading, or by sending remarks in various forms to the fusion execution body, such as marking the file code or sending a separate identification field. By adding editable marks, the user sets the range of frame images the fusion execution body is allowed to extract, so as to mark the range of frame images the user does and/or does not want extended, which fits the user's needs more closely.
Step 502: receive the push template set and mark information sent by the first server or the second server.
In this embodiment, the push template set contains one or more push templates, and the mark information includes at least one of the editable feature and the frame image information. A push template provides the content that can replace the editable feature in the frame images in which the editable feature exists.
Step 503: present the push template set and the mark information to the user.
In this embodiment, after obtaining the push template set and the mark information, the user execution body can present them to the user through a local display device, so that the user can determine the desired push template from the editable features and/or frame image information indicated by the mark information and from the push templates in the displayed push template set.
Step 504: in response to receiving selection information of a target push template, send the selection information of the target push template to the first server or the second server.
In this embodiment, after the user has determined the desired push template, the user instructs the user execution body, informing it of the chosen push template in the form of an electrical signal or the like, which determines the selection information of the target push template. Therefore, after receiving this signal, the user execution body determines the selection information of the user-selected target push template and sends it to the first server or the second server that sent the push template set.
The selection information may also include the frame numbers at which the user expects the push template to be added, so that the fusion execution body can better understand the user's expectations and add the content of the push template accordingly.
According to the video fusion method provided by this embodiment of the present application, after the user-selected source video is sent to the first server or the second server, the push template set and mark information sent by the first server or the second server are received, wherein the mark information includes at least one of the editable feature and the frame image information; the push template set and the mark information are presented to the user; and, in response to receiving selection information of a target push template, the selection information of the target push template is sent to the first server or the second server. This implementation enables secondary editing of the source video content through the first server or the second server, enriching the content of the source video, so as to improve the quality of the source video and unlock more of its value.
In some embodiments, the method further includes: receiving the fusion video sent by the first server and presenting the fusion video to the user; in response to receiving a qualified signal directed at the fusion video, obtaining the user's identification information and generating a confirmation message; and sending the confirmation message to the first server.
Specifically, after the fusion video sent by the first server is received, the fusion video is presented to the user, so that the user can review the fusion video generated after the target push template has been fused into the source video. If the user agrees to use the fusion video, the user sends the user execution body a qualified signal indicating that the fusion video may be used; the user execution body then generates the corresponding confirmation information from the user's identification information and sends it to the first server, so that the first server learns from the confirmation information that the fusion video may be used and marks the fusion video according to the user ID it carries, establishing the connection between the fusion video and the user. The user information of the uploader of the source video can then be provided to other users on the basis of the fusion video, protecting the user's copyright while exploring more potential value.
In some embodiments, the method further includes: receiving the fusion video sent by the second server and presenting the fusion video to the user; in response to receiving a qualified signal directed at the fusion video, obtaining the user's identification information, adding the user's identification information and the usage mark corresponding to the target push template to the fusion video to generate a confirmed fusion video; and sending the confirmed fusion video to the first server.
Specifically, after the fusion video sent by the second server is received, the fusion video is presented to the user for review. If the user agrees to use the fusion video, the user sends the user execution body a qualified signal indicating that the fusion video may be used; the user execution body then generates the corresponding confirmation information from the user's identification information, adds it directly to the fusion video, and sends the fusion video to the first server for display. In this implementation, the first server learns from the confirmation information that the fusion video may be used and marks the fusion video according to the user ID it carries, establishing the connection between the fusion video and the user; the user information of the uploader of the source video can then be provided to other users on the basis of the fusion video. Besides protecting the user's copyright and exploring more potential value, this also avoids repeatedly sending the fusion video back to the second server for uploading, saving transmission resources.
It should be understood that, because multiple editable features and their corresponding push template sets may be received at the same time, the selection information may select multiple push templates.
In some embodiments, obtaining the push template set in the push template selection request includes: obtaining the matching push template set sent by the first server or the second server.
Specifically, the way the matching push template set is determined, and the subsequent way selection information for a matching push template is obtained from that set, are similar to the implementation shown in FIG. 4 above, and the repeated content is not described again. Because the matching push template set is obtained from editable features derived from the classification information of the push template sets, sending a matching push template set improves the quality of the push template set and the efficiency with which the user determines the target push template (matching push template).
In some embodiments, obtaining a push template selected from the push template set, obtaining the selection information of the corresponding push template, and sending the selection information to the first server or the second server includes: in response to receiving an instruction to obtain an editable feature set, sending a request for the editable feature set to the first server or the second server, wherein the editable feature set includes at least one editable feature; in response to receiving the editable feature set sent by the first server or the second server, obtaining a self-selected push template determined by the user on the basis of the editable features; and sending the self-selected push template to the first server or the second server.
In some embodiments, the method further includes: sending a request for the editable feature set to the first server or the second server; receiving the editable feature set sent by the first server or the second server, wherein the editable feature set includes one or more editable features; receiving selection information for the editable feature set, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; presenting the editable feature set to the user; and sending the selection information for the editable feature set to the first server or the second server.
Specifically, after the user execution body receives an instruction from the user who uploaded the source video to obtain editable features, it can also send a request for the editable feature set to whichever of the first server and the second server received the source video, and then receive the editable feature set returned by that server on the basis of the request, the editable feature set including one or more editable features. The editable feature set is then presented to the user; once the user has decided on the editable features, the user sends the selection information for the editable feature set to the user execution body, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features. In response to receiving the selection information, the user execution body sends the selection information for the editable feature set to the first server or second server that received the source video, so that the server can subsequently determine the corresponding push template set from that selection information. In this way, after editable features are offered to the user, the push template set that is sent corresponds to the content of the features the user chose, fitting the user's real needs and improving both the efficiency of determining the target push template and the quality of the determined target push template.
In some embodiments, the method further includes: in response to receiving an update push template instruction, generating a push template update request; and sending the push template update request to the first server or the second server.
Specifically, after the second execution body receives the push template set sent by the first server or the second server, if the push templates in the set cannot meet the user's needs, the user can send the second execution body a push template update instruction. On receiving that instruction, the second execution body can generate a template update request on the basis of the instruction and send it to the first server or the second server to obtain a new push template set, better serving the user; updating the push template set in this way better meets the user's needs and improves the quality of the target push template obtained.
For ease of understanding, an application scenario of the video fusion method is provided below. In this scenario, the intelligent mobile terminal D1 is the terminal with which the user uploads the source video, and a video application may be installed on it; the server S1 is the first server, embodied as the background server of the video application; the server S2 is the second server, embodied as the push template provider side; and the user U1 uses the intelligent mobile terminal D1 to upload the source video A1 to the server S1.
Specifically, referring to FIG. 6, the server S1 has obtained the push template sets B and C from the server S2 in advance, and also holds the locally saved push template E.
The user U1 uses the intelligent mobile terminal D1 to upload the source video A1 to the server S1. The server S1 parses the source video A1, determines that editable features A11 and A12 exist in frames 30-35 and frames 40-45, generates the corresponding mark information, and sends the push template sets B and C determined from A11 and A12, together with the mark information, to the intelligent mobile terminal D1 for the user U1 to choose from.
After the user U1 receives this information on the intelligent mobile terminal D1, the user allows the push template B11 in the push template set corresponding to A11 to be used to edit the image frames of frames 30-35, but does not allow the push templates in push template set C to be used to edit A12; the user U1 then uses the intelligent mobile terminal D1 to send this selection information to the server S1, together with an update push template request, so as to obtain the update push template set E for the editable feature A12.
After receiving the update push template set E, the user U1 allows E11 in the update push template set E to be used to edit the image frames of frames 40-45, and uses the intelligent mobile terminal D1 to send this selection information to the server S1.
The server S1 processes the images of frames 30-35 and frames 40-45 with the semantic segmentation neural network to determine the target fusion areas in the images, then fuses template B11 into frames 30-35 and template E11 into frames 40-45, and generates the fusion video R1.
After the server S1 sends the fusion video R1 to the intelligent mobile terminal D1, the intelligent mobile terminal D1 displays the fusion video R1 to the user U1; the user U1 confirms the fusion video and allows the fusion video R1 to be used, and then uses the intelligent mobile terminal D1 to send confirmation information including the identification information of the user U1 to the server.
After receiving the confirmation information sent by the intelligent mobile terminal D1, the server S1 adds the usage marks corresponding to the used templates B11 and E11 to the fusion video R1 and sends it to the server S2 for confirmation.
Upon receiving the usage permission information sent by the server S2, the server S1 finally completes the video fusion work and saves the generated fusion video R1 locally.
For ease of understanding, another application scenario of the video fusion method is provided below. In this scenario, the intelligent mobile terminal D2 is the terminal with which the user U2 uploads the source video, and a video application may be installed on it. The server S1 is the first server, embodied as the video application background; the server S2 is the second server, embodied as the template provider side. The user U2 uses the intelligent mobile terminal D2 to upload the source video A2 to the server S2.
Specifically, referring to FIG. 7, the user U2 uses the intelligent mobile terminal D2 to upload the source video A2 to the server S2. The server S2 parses the source video A2, determines that an editable feature A21 exists in frames 10-15, generates the corresponding mark information, and sends the push template set F determined from A21, together with the mark information, to the intelligent mobile terminal D2 for the user U2 to choose from.
After the user U2 receives this information on the intelligent mobile terminal D2, the user allows the push template F11 in the push template set corresponding to A21 to be used to edit the image frames of frames 10-15, and uses D2 to send this selection information to the server S2.
The server S2 processes the images of frames 10-15 with the semantic segmentation neural network to determine the target fusion area in the images, fuses the push template F11 into frames 10-15, and generates the fusion video R2.
After the server S2 sends the fusion video R2 to the intelligent mobile terminal D2, the intelligent mobile terminal D2 displays the fusion video R2 to the user U2; the user U2 confirms the fusion video and allows it to be used. The intelligent mobile terminal D2 then obtains the identification information of the user U2, adds the identification information of the user U2 and the usage mark of the used push template F11 to the fusion video R2, and sends the fusion video R2 with these additions to the server S1 to be saved locally on the server S1.
Referring now to FIG. 8, it shows a schematic structural diagram of a computer system 800 suitable for implementing the computer devices (e.g., the devices 101, 102, 103, 104 shown in FIG. 1) of the embodiments of the present application. The computer device shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, and the like, or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a source video acquisition unit, a source video detection unit, a push template sending unit, and a fusion video generation unit, where the names of these units do not limit the units themselves; for example, the source video acquisition unit may also be described as "a unit that obtains the source video uploaded by the terminal". As another example, a processor may be described as including a source video sending unit, a template obtaining unit, a template presenting unit, and a selection information sending unit, where the names of these units likewise do not limit the units themselves; for example, the source video sending unit may also be described as "a unit that sends a user-selected source video to the first server or the second server".
As another aspect, the present application also provides a computer-readable medium, which may be included in the computer device described in the above embodiments, or may exist independently without being assembled into the computer device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the computer device, the computer device: obtains the source video uploaded by the terminal, and detects whether a predetermined editable feature exists in the frame images of the source video; in response to determining that at least one editable feature exists in the frame images, sends to the terminal the push template set corresponding to the editable features existing in the frame images together with mark information, wherein the mark information includes at least one of the editable feature and the frame image; and, in response to receiving from the terminal selection information of a target push template in the push template set, fuses the corresponding target push template into the source video to generate a fusion video. Also, after sending the user-selected source video to the first server or the second server, the computer device receives the push template set and mark information sent by the first server or the second server, wherein the mark information includes at least one of the editable feature and the frame image information; presents the push template set and the mark information to the user; and, in response to receiving selection information of a target push template, sends the selection information of the target push template to the first server or the second server.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features of similar functions disclosed in (but not limited to) the present application.

Claims (20)

  1. A video fusion method, comprising:
    obtaining a source video uploaded by a terminal;
    detecting whether a predetermined editable feature exists in frame images of the source video;
    in response to determining that at least one editable feature exists in the frame images, sending to the terminal a push template set corresponding to the editable features existing in the frame images and mark information, wherein the mark information comprises at least one of the editable feature and the frame image;
    in response to receiving, from the terminal, selection information of a target push template in the push template set, fusing the target push template into the source video to generate a fusion video.
  2. The method according to claim 1, wherein fusing the push template into the source video to generate the fusion video comprises:
    fusing the target push template into the frame images corresponding to the source video using artificial intelligence image fusion technology.
  3. The method according to claim 2, wherein the step of fusing the target push template into the frame images corresponding to the source video using artificial intelligence image fusion technology comprises:
    obtaining the frame images corresponding to the source video;
    processing the frame images corresponding to the source video with a semantic segmentation neural network, and determining the image regions of the frame images that comprise the editable feature, to obtain target fusion regions;
    replacing the content of the target fusion regions with the content of the target push template.
  4. The method according to claim 1, wherein detecting whether a predetermined editable feature exists in the frame images of the source video comprises:
    obtaining push template sets of different types, and determining corresponding matching editable features according to the types of the push template sets;
    detecting whether the matching editable features exist in the frame images of the source video.
  5. The method according to claim 4, wherein, in response to determining that at least one editable feature exists in the frame images, sending to the terminal the push template set corresponding to the editable features existing in the frame images comprises:
    in response to determining that at least one matching editable feature exists in the frame images, obtaining the matching push template set corresponding to the matching editable feature;
    sending the matching push template set to the terminal.
  6. The method according to claim 5, wherein the selection information of the target push template comprises:
    selection information of a matching push template obtained from the matching push template set; and
    fusing the target push template into the source video to generate the fusion video comprises:
    fusing the matching push template into the source video to generate the fusion video.
  7. The method according to claim 1, wherein, before sending to the terminal the push template set corresponding to the editable features existing in the frame images and the mark information in response to determining that at least one editable feature exists in the frame images, the method further comprises:
    in response to receiving an editable feature set acquisition request sent by the terminal, sending an editable feature set to the terminal, wherein the editable feature set comprises one or more editable features; receiving selection information about the editable feature set sent by the terminal, the selection information indicating at least one editable feature selected by the terminal from the one or more editable features; and
    determining that at least one editable feature exists in the frame images comprises:
    determining, according to the selection information, that at least one editable feature exists in the frame images.
  8. The method according to claim 1, further comprising:
    in response to receiving a push template set update request from the terminal, re-determining the push template set corresponding to the editable feature to obtain an updated push template set;
    sending the updated push template set to the terminal.
  9. The method according to any one of claims 1 to 8, wherein the method is applied to a first server, and further comprises:
    sending the fusion video to the terminal, so that the terminal displays the fusion video to a user;
    in response to receiving a confirmation message sent by the terminal and directed at the fusion video, the confirmation message comprising identification information of the user;
    adding the user's identification information and a usage mark corresponding to the target push template to the fusion video.
  10. The method according to claim 9, further comprising:
    receiving at least one push template set sent by a second server.
  11. The method according to claim 9 or 10, further comprising:
    sending the fusion video to a second server;
    receiving usage permission information sent by the second server;
    sending the usage permission information to the terminal.
  12. The method according to any one of claims 1 to 8, wherein, when the method is applied to a second server, the method further comprises:
    sending the fusion video to the terminal.
  13. A video fusion method, applied to a terminal, comprising:
    sending a user-selected source video to a first server or a second server;
    receiving a push template set and mark information sent by the first server or the second server, wherein the mark information comprises at least one of an editable feature and frame image information;
    presenting the push template set and the mark information to the user;
    in response to receiving selection information of a target push template, sending the selection information of the target push template to the first server or the second server.
  14. The method according to claim 13, further comprising:
    receiving a fusion video sent by the first server, and presenting the fusion video to the user;
    in response to receiving a qualified signal directed at the fusion video, obtaining identification information of the user and generating a confirmation message;
    sending the confirmation message to the first server.
  15. The method according to claim 13, further comprising:
    receiving a fusion video sent by the second server, and presenting the fusion video to the user;
    in response to receiving a qualified signal directed at the fusion video, obtaining identification information of the user, adding the user's identification information and a usage mark corresponding to the target push template to the fusion video to generate a confirmed fusion video; and sending the confirmed fusion video to the first server.
  16. The method according to claim 13, wherein the push template set comprises:
    a matching push template set obtained from the first server or the second server; and
    presenting the push template set and the mark information to the user comprises:
    presenting the matching push template set and the mark information to the user; and
    the selection information of the target push template comprises:
    selection information of a matching push template obtained from the push template set.
  17. The method according to claim 13, further comprising:
    sending a request for an editable feature set to the first server or the second server;
    receiving the editable feature set sent by the first server or the second server, wherein the editable feature set comprises one or more editable features;
    presenting the editable feature set to the user;
    receiving selection information for the editable feature set, wherein the selection information indicates at least one editable feature selected by the terminal from the one or more editable features;
    sending the selection information for the editable feature set to the first server or the second server.
  18. The method according to claim 13, further comprising:
    in response to receiving an update push template instruction, generating a push template update request;
    sending the push template update request to the first server or the second server;
    receiving an updated push template set sent by the first server or the second server;
    and
    presenting the push template set and the mark information to the user comprises:
    presenting the updated push template set and the mark information to the user.
  19. A computer device, comprising:
    one or more processors;
    a storage apparatus storing one or more programs;
    when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-12, or implement the method according to any one of claims 13-18.
  20. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method according to any one of claims 1-12 or the method according to any one of claims 13-18 is implemented.
PCT/CN2021/119606 2020-09-25 2021-09-22 Video fusion method and device WO2022063124A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011025894.1A CN112153422B (zh) 2020-09-25 2020-09-25 Video fusion method and device
CN202011025894.1 2020-09-25

Publications (1)

Publication Number Publication Date
WO2022063124A1 true WO2022063124A1 (zh) 2022-03-31

Family

ID=73897280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119606 WO2022063124A1 (zh) 2020-09-25 2021-09-22 视频融合方法和设备

Country Status (2)

Country Link
CN (1) CN112153422B (zh)
WO (1) WO2022063124A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116471429A (zh) * 2023-06-20 2023-07-21 上海云梯信息科技有限公司 Image information pushing method based on behavioral feedback and real-time video transmission system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112153422B (zh) * 2020-09-25 2023-03-31 连尚(北京)网络科技有限公司 Video fusion method and device
CN115952315B (zh) * 2022-09-30 2023-08-18 北京宏扬迅腾科技发展有限公司 Campus surveillance video storage method, apparatus, device, medium, and program product
CN117765362A (zh) * 2023-12-29 2024-03-26 浙江威星电子系统软件股份有限公司 Video fusion method and system for large sports venues

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272679A1 (en) * 2012-04-12 2013-10-17 Mario Luis Gomes Cavalcanti Video Generator System
CN104735468A (zh) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Method and system for synthesizing images into a new video based on semantic analysis
CN109769141A (zh) * 2019-01-31 2019-05-17 北京字节跳动网络技术有限公司 Video generation method and apparatus, electronic device, and storage medium
CN111541936A (zh) * 2020-04-02 2020-08-14 腾讯科技(深圳)有限公司 Video and image processing method and apparatus, electronic device, and storage medium
CN112153422A (zh) * 2020-09-25 2020-12-29 连尚(北京)网络科技有限公司 Video fusion method and device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110100B (zh) * 2006-07-17 2012-05-02 松下电器产业株式会社 Method and device for detecting a shape containing an arbitrary combination of line segments
US20100257551A1 (en) * 2009-04-01 2010-10-07 Embarq Holdings Company, Llc Dynamic video content
US9001252B2 (en) * 2009-11-02 2015-04-07 Empire Technology Development Llc Image matching to augment reality
CN103426003B (zh) * 2012-05-22 2016-09-28 腾讯科技(深圳)有限公司 Method and system for implementing augmented reality interaction
JP6336406B2 (ja) * 2015-03-11 2018-06-06 富士フイルム株式会社 Image composition device, image composition method, image composition program, and recording medium storing the program
US10785530B2 (en) * 2015-12-16 2020-09-22 Gracenote, Inc. Dynamic video overlays
US20180330756A1 (en) * 2016-11-19 2018-11-15 James MacDonald Method and apparatus for creating and automating new video works
JP6723909B2 (ja) * 2016-12-09 2020-07-15 キヤノン株式会社 Image processing method, image processing apparatus, and program
US20180300046A1 (en) * 2017-04-12 2018-10-18 International Business Machines Corporation Image section navigation from multiple images
US10810779B2 (en) * 2017-12-07 2020-10-20 Facebook, Inc. Methods and systems for identifying target images for a media effect
CN110163640B (zh) * 2018-02-12 2023-12-08 华为技术有限公司 Method and computer device for placing an advertisement in a video
CN108846377A (zh) * 2018-06-29 2018-11-20 百度在线网络技术(北京)有限公司 Method and apparatus for capturing images
KR101972918B1 (ko) * 2018-12-20 2019-08-20 주식회사 로민 Video masking apparatus and video masking method
US20200213644A1 (en) * 2019-01-02 2020-07-02 International Business Machines Corporation Advertisement insertion in videos
CN109801347B (zh) * 2019-01-25 2022-10-25 北京字节跳动网络技术有限公司 Method, apparatus, device, and medium for generating an editable image template
US20200304713A1 (en) * 2019-03-18 2020-09-24 Microsoft Technology Licensing, Llc Intelligent Video Presentation System
CN110472558B (zh) * 2019-08-13 2023-08-15 上海掌门科技有限公司 Image processing method and apparatus
CN111147766A (zh) * 2019-11-21 2020-05-12 深圳壹账通智能科技有限公司 Special-effect video synthesis method and apparatus, computer device, and storage medium
CN111640166B (zh) * 2020-06-08 2024-03-26 上海商汤智能科技有限公司 AR group-photo method and apparatus, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272679A1 (en) * 2012-04-12 2013-10-17 Mario Luis Gomes Cavalcanti Video Generator System
CN104735468A (zh) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Method and system for synthesizing images into a new video based on semantic analysis
CN109769141A (zh) * 2019-01-31 2019-05-17 北京字节跳动网络技术有限公司 Video generation method and apparatus, electronic device, and storage medium
CN111541936A (zh) * 2020-04-02 2020-08-14 腾讯科技(深圳)有限公司 Video and image processing method and apparatus, electronic device, and storage medium
CN112153422A (zh) * 2020-09-25 2020-12-29 连尚(北京)网络科技有限公司 Video fusion method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116471429A (zh) * 2023-06-20 2023-07-21 上海云梯信息科技有限公司 Image information pushing method based on behavioral feedback and real-time video transmission system
CN116471429B (zh) * 2023-06-20 2023-08-25 上海云梯信息科技有限公司 Image information pushing method based on behavioral feedback and real-time video transmission system

Also Published As

Publication number Publication date
CN112153422B (zh) 2023-03-31
CN112153422A (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
WO2022063124A1 (zh) Video fusion method and device
US8667016B2 (en) Sharing of presets for visual effects or other computer-implemented effects
JP5966622B2 (ja) System, method, and program for capturing and organizing annotated content on a mobile device
US10075399B2 (en) Method and system for sharing media content between several users
CN109255035B (zh) Method and apparatus for constructing a knowledge graph
CN109271557B (zh) Method and apparatus for outputting information
CN112073307B (zh) Mail processing method and apparatus, electronic device, and computer-readable medium
WO2019227429A1 (zh) Multimedia content generation method, apparatus, and device/terminal/server
WO2021227919A1 (zh) Image data encoding method and apparatus, display method and apparatus, and electronic device
WO2020087878A1 (zh) Privacy information management method, apparatus, and system
KR20180111981A (ko) Real-time content editing with limited interaction
CN109241344B (zh) Method and apparatus for processing information
CN103685209A (zh) Traceability processing method for Internet media files, server, and communication system
US20220312059A1 (en) Systems and methods for media verification, organization, search, and exchange
WO2023179308A1 (zh) Image description generation method, apparatus, device, medium, and product
CN114339447A (zh) Method, apparatus, device, and storage medium for converting pictures into video
CN109947526B (zh) Method and apparatus for outputting information
KR20200061784A (ko) Server and terminal for providing an online shopping service
CN114239501A (zh) Contract generation method, apparatus, device, and medium
KR20220079029A (ko) Method for providing an automatic document-based multimedia content production service
CN112306976A (zh) Information processing method and apparatus, and electronic device
CN112669000A (zh) Government affairs processing method and apparatus, electronic device, and storage medium
CN115952315B (zh) Campus surveillance video storage method, apparatus, device, medium, and program product
KR102312481B1 (ko) Method for providing a video production service using an intuitive interface
US20230396857A1 (en) Video generation method and apparatus, and electronic device and computer-readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21871494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21871494

Country of ref document: EP

Kind code of ref document: A1