WO2023036160A1 - Video processing method and apparatus, computer-readable storage medium, and computer device

Video processing method and apparatus, computer-readable storage medium, and computer device

Info

Publication number: WO2023036160A1
Authority: WIPO (PCT)
Prior art keywords: rendering, frame, target, video frame, target video
Application number: PCT/CN2022/117420
Other languages: English (en), French (fr)
Inventors: 陶然, 赵代平, 杨瑞健
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-09-07 (Chinese patent application No. 202111044847.6)
Application filed by 上海商汤智能科技有限公司
Publication of WO2023036160A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/44004 Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer

Definitions

  • the present disclosure relates to the technical field of video processing, and in particular, to a video processing method, device, computer-readable storage medium, and computer equipment.
  • an embodiment of the present disclosure provides a video processing method.
  • the method includes: when the target object is obtained from the target video frame, copying the target object to obtain at least one rendering material; rendering each rendering material into an intermediate cache frame;
  • the intermediate cache frame is synthesized with the target video frame to obtain a synthesized frame, wherein the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.
  • the target object in the target video frame is copied and rendered into an intermediate buffer frame, and the intermediate buffer frame is then synthesized with the target video frame to obtain a synthesized frame, so that the synthesized frame can simultaneously display a "body" and a preset number of "avatars", with the movements and postures of the avatars kept synchronized with those of the body. The body refers to the target object that is copied, and an avatar refers to a rendering material obtained by copying.
  • the video processing method disclosed in the present disclosure can improve the interest in the special effect rendering process and the diversity of special effect rendering results.
  • an embodiment of the present disclosure provides a video processing device, the device comprising: a copying module, configured to copy the target object to obtain at least one rendering material when the target object is obtained from the target video frame; a rendering module, configured to render each rendering material into an intermediate buffer frame; and a synthesis module, configured to synthesize the intermediate buffer frame and the target video frame to obtain a synthesized frame, where the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment is implemented.
  • an embodiment of the present disclosure provides a computer device, including a memory, a processor, and a computer program stored on the memory and operable on the processor.
  • when the processor executes the program, it implements the method described in any of the embodiments.
  • FIG. 1A and FIG. 1B are schematic diagrams of video processing methods in the related art.
  • FIG. 2 is a flowchart of a video processing method according to an embodiment of the present disclosure.
  • FIG. 3A-1 , FIG. 3A-2 , FIG. 3B-1 and FIG. 3B-2 are schematic diagrams of setting the number and position of avatars in an embodiment of the present disclosure, respectively.
  • FIG. 4 is a flow chart of an avatar rendering process according to an embodiment of the present disclosure.
  • FIG. 5 , FIG. 6 , FIG. 7A and FIG. 7B are schematic diagrams of rendering associated objects according to embodiments of the present disclosure.
  • FIG. 8A and FIG. 8B are schematic diagrams of video processing effects of the embodiments of the present disclosure.
  • FIG. 9A is a schematic diagram of background segmentation according to an embodiment of the present disclosure.
  • FIG. 9B is a schematic diagram of masking processing according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of multiple synthesized second video frames according to an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of a video processing device of an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • a video frame 101 includes a main body area 101a and a background area 101b.
  • the main body area 101a refers to an area including a target object, and the target object may be a person, an animal, or another specific object, or may be a part of a person or an animal, for example, a human face.
  • the main body area 102a of the video frame 102 is the same as the main body area 101a of the video frame 101, and the background area 102b of the video frame 102 is different from the background area 101b of the video frame 101.
  • the above-mentioned processing methods are relatively monotonous, and are often difficult to meet the user's creative needs.
  • an embodiment of the present disclosure provides a video processing method, as shown in FIG. 2 , the method may include:
  • Step 201: When the target object is obtained from the target video frame, copy the target object to obtain at least one rendering material;
  • Step 202: Render each rendering material into an intermediate cache frame;
  • Step 203: Synthesize the intermediate cache frame and the target video frame to obtain a synthesized frame, wherein the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.
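  • The following is a minimal sketch of steps 201-203, assuming a binary segmentation mask of the target object is already available from an upstream detector; the helper names, array shapes, and fixed offsets are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np

def copy_target_object(frame, mask, n_copies):
    """Step 201: cut the target object out of the frame and duplicate it."""
    material = np.where(mask[..., None], frame, 0)        # keep only the target pixels
    return [material.copy() for _ in range(n_copies)]     # N rendering materials

def render_to_intermediate(materials, masks, offsets, frame_shape):
    """Step 202: draw every rendering material into one intermediate buffer frame."""
    buffer = np.zeros(frame_shape, dtype=np.uint8)        # blank intermediate frame
    buffer_mask = np.zeros(frame_shape[:2], dtype=bool)
    for material, mask, (dy, dx) in zip(materials, masks, offsets):
        shifted = np.roll(np.roll(material, dy, axis=0), dx, axis=1)
        shifted_mask = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
        buffer[shifted_mask] = shifted[shifted_mask]
        buffer_mask |= shifted_mask
    return buffer, buffer_mask

def composite(target_frame, buffer, buffer_mask):
    """Step 203: overlay the intermediate buffer frame onto the target video frame."""
    out = target_frame.copy()
    out[buffer_mask] = buffer[buffer_mask]                # avatars drawn over the frame
    return out

# Toy example: a 120x160 frame with a bright square standing in for the target object.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
frame[40:80, 60:100] = 255
mask = frame[..., 0] > 0

materials = copy_target_object(frame, mask, n_copies=2)
buffer, buffer_mask = render_to_intermediate(
    materials, [mask, mask], offsets=[(0, -50), (0, 50)], frame_shape=frame.shape)
synthesized = composite(frame, buffer, buffer_mask)
print(synthesized.shape)  # (120, 160, 3): one body plus two avatars
```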
  • the methods of the embodiments of the present disclosure can be used to process video frames collected in real time, and can also be used to process video frames collected and cached in advance.
  • the video frames in the video may include continuously collected video frames, or may include discontinuous video frames.
  • the discontinuity may be caused by video clipping, video splicing and other processing.
  • the method can be applied to a terminal device or a server with video processing capability.
  • the terminal device may include but not limited to mobile phone, tablet computer, personal computer (Personal Computer, PC) and other devices.
  • the server may be a single server device, or a server cluster composed of multiple server devices.
  • software products such as an application program (APP), a small program, or a web client for video processing may be installed on the terminal device, and these software products execute the methods of the embodiments of the present disclosure.
  • the above application program may be a live broadcast application program.
  • the anchor user can install a client (referred to as the anchor client) on the mobile phone, execute the method of the embodiment of the present disclosure through the anchor client to obtain the synthesized second video frame, and upload the second video frame to the live server.
  • the live broadcast server sends the synthesized second video frame to the client of the user watching the live broadcast (referred to as the audience client).
  • the above software product may be a beauty software product.
  • a user can install a client on a mobile phone, and the client calls the camera of the mobile phone to collect video frames, executes the methods of the embodiments of the present disclosure on the video frames through the client to obtain synthesized frames, and outputs the synthesized frames.
  • the target object in the target video frame is copied and then rendered into an intermediate buffer frame, and the intermediate buffer frame is then synthesized with the target video frame to obtain a synthesized frame, so that a "body" and a preset number of "avatars" can be displayed simultaneously in the synthesized frame, with the movements and postures of the avatars kept synchronized with those of the body. The body refers to the target object that is copied, and an avatar refers to a rendering material obtained by copying.
  • the intermediate buffer frame may be a preset blank frame or a non-blank frame, or may be a video frame generated in response to obtaining a target object from a video frame or in response to obtaining at least one rendering material, which is not limited in this disclosure.
  • the video processing method disclosed in the present disclosure can improve the interest in the special effect rendering process and the diversity of special effect rendering results.
  • the target video frame may be any frame in the video that includes the target object.
  • the video processing method of the embodiment of the present disclosure may be executed for part or all of the multiple video frames.
  • Object detection may be performed on part or all of the video frames in the video to determine whether the detected video frames include the target object.
  • one target video frame includes one or more target objects
  • the same operation may be performed on each of the target objects.
  • for ease of description, the solution of the embodiments of the present disclosure will be described below by taking as an example the case where the number of target objects in the target video frame is 1.
  • If a target object (for example, a person) is detected, the target object (hereinafter also referred to as the body) is copied to obtain at least one rendering material (hereinafter also referred to as an avatar).
  • N clones can be obtained by copying, and N is a positive integer.
  • the value of N can be predetermined. After the value of N is determined, the value of N can be written into the configuration file, and read from the configuration file when rendering material needs to be obtained.
  • a default value of N may also be stored in the configuration file. In this way, if the value of N is not successfully set through any of the following methods, the default value in the configuration file can be used as the value of N.
  • of course, the above method is not mandatory; that is, the initial value of N in the configuration file may also be empty or equal to 0. After the value of N is successfully set in one of the following ways, the value of N in the configuration file can be changed to the set value.
  • the value of N may be directly input by the user.
  • a setting instruction input by a user may be received, and the setting instruction carries a value of N, and then the value of N is set based on the setting instruction.
  • This embodiment can acquire a certain number of rendering materials, so as to display a certain number of avatars on the synthesized frame.
  • the determined quantity can be input and stored in advance by the user, and can be read directly when the target object needs to be copied, thereby improving rendering efficiency, and is suitable for scenes with high real-time requirements.
  • the value of N may also be randomly generated.
  • the numerical value of N may be randomly generated from a predetermined numerical range based on a predetermined distribution function (eg, uniform distribution, normal distribution, etc.). Assuming that the distribution function is uniformly distributed and the value range is 1-10, an integer can be randomly selected from the ten integers of 1-10 as the value of N with equal probability.
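  • A minimal sketch of randomly generating the value of N from a predetermined range under a uniform distribution; the range 1-10 follows the example above, and the use of Python's random module is an assumption.

```python
import random

def random_material_count(low=1, high=10):
    # Each integer in [low, high] is chosen with equal probability.
    return random.randint(low, high)

N = random_material_count()
print(f"number of avatars N = {N}")
```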
  • the information in the target video frame may be detected or identified, and the value of N is automatically set based on the detection or identification result.
  • specified objects may be identified from target video frames, and the value of N is automatically set based on the number of specified objects.
  • the value of N may be set to be equal to the number of specified objects, that is, each specified object corresponds to one avatar.
  • the value of N may be set as an integer multiple of the number of specified objects, that is, each specified object corresponds to multiple avatars.
  • the number of avatars corresponding to different specified objects can also be different; for example, specified object A corresponds to 1 avatar, specified object B corresponds to 2 avatars, and specified object C corresponds to 0 avatars.
  • This disclosure does not give examples one by one.
  • This embodiment can automatically set the number of rendering materials based on the number of specified objects. Since the number of specified objects may differ between target video frames, the number of avatars in the synthesized frames generated from different target video frames is not fixed, which improves the diversity of the generated synthesized frames.
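  • The following is a small sketch, under assumed detection outputs, of setting N from the number of specified objects identified in the target video frame; the category names and detection format are illustrative.

```python
# Assumed upstream detections, given as (category, bounding_box) tuples.
detections = [("leaf", (10, 20, 30, 40)), ("leaf", (50, 60, 70, 80)), ("person", (0, 0, 20, 40))]

specified_category = "leaf"
specified = [d for d in detections if d[0] == specified_category]

# One avatar per specified object; N could also be an integer multiple of this
# count, or differ per specified object, as described above.
N = len(specified)
print(f"{len(specified)} specified objects -> N = {N}")
```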
  • the specified object may be a certain type of object, for example, it may be a table, a leaf, a pillow, and the like.
  • the category can be preset by the user, or can be obtained through automatic identification. For example, a first object whose distance from the target object in the target video frame is less than a first preset distance threshold may be obtained, and if it is recognized that the target video frame includes a second object of the same category as the first object, then the second object is determined as the designated object.
  • the specified object may also be an object in the target video frame that satisfies a preset quantity condition.
  • the preset quantity condition may be that the quantity exceeds a preset quantity threshold. For example, in a target video frame including a row of streetlights, if the number of streetlights exceeds the preset quantity threshold, the streetlight may be determined as the specified object. For another example, if the number of vehicles exceeds a preset number threshold in the target video frame captured in the scene of a road or a parking lot, the vehicle may be determined as the specified object.
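  • Below is a minimal sketch of selecting the specified object category by a preset quantity condition (count exceeding a threshold); the labels and the threshold value are illustrative assumptions.

```python
from collections import Counter

# Assumed per-frame detection labels.
labels = ["streetlight"] * 6 + ["car"] * 2 + ["person"]
count_threshold = 4

counts = Counter(labels)
specified_categories = [c for c, n in counts.items() if n > count_threshold]
print(specified_categories)  # ['streetlight']
```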
  • the preset quantity condition may also be that the quantity is equal to a preset value.
  • For example, assuming the preset value is 3 and a target video frame includes 1 tree and 3 stools, since the number of stools equals the preset value (i.e., 3), the stool can be determined as the specified object.
  • the preset quantity conditions may also be other quantity conditions, which can be set according to actual needs, and will not be listed here.
  • the specified object may also be an object at a preset position in the target video frame, or an object whose pixel number in the target video frame exceeds a certain threshold, or an object whose pixel value is within a preset range, etc.
  • the specified object may also be determined in other manners, which will not be given one by one in the present disclosure.
  • the avatar and the main body in the synthesized frame can be placed around objects of the same category, so as to display the effect that multiple target objects with synchronized poses repeatedly appear around objects of the same category in the synthesized frame.
  • each rendering material may be rendered into an intermediate cache frame.
  • the rendering materials obtained from different target video frames can be drawn into different intermediate buffer frames, or can be drawn into the same intermediate buffer frame .
  • some or all existing rendering materials in the intermediate buffer frame may be cleared first, and then the rendering materials obtained from the target video frame may be drawn.
  • the rendering material obtained from the target video frame may also be drawn while retaining all existing rendering materials in the intermediate buffer frame.
  • Rendering materials corresponding to different target objects extracted from the same target video frame may be drawn into different intermediate buffer frames, or may be drawn into the same intermediate buffer frame. Multiple rendering materials copied from the same target object can be drawn into different intermediate buffer frames, or drawn into the same intermediate buffer frame.
  • the rendering location may be directly input by the user, for example, inputting the coordinates of the rendering location.
  • a setting instruction input by a user may be received, and the setting instruction carries the rendering position, and then the rendering position is set based on the setting instruction.
  • the value of N and the rendering position mentioned above can be carried in the same setting command at the same time, or the value of N and the rendering position can be set separately through different commands, which is not limited in the present disclosure.
  • the rendering position may also be randomly generated.
  • the coordinates of the rendering position may be randomly generated within the coordinate range corresponding to the target video frame in the intermediate cache frame.
  • For example, assuming that the distribution function is a uniform distribution and the coordinate range of the target video frame is (x0, y0) to (x1, y1), one or more coordinates can be selected with equal probability from the range (x0, y0) to (x1, y1) in the intermediate buffer frame as rendering positions.
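  • A minimal sketch of randomly generating rendering positions within the coordinate range of the target video frame, assuming a uniform distribution; the frame dimensions are illustrative.

```python
import random

def random_render_position(x0, y0, x1, y1):
    # Every point in the rectangle (x0, y0)-(x1, y1) is equally likely.
    return random.uniform(x0, x1), random.uniform(y0, y1)

positions = [random_render_position(0, 0, 1280, 720) for _ in range(3)]
print(positions)
```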
  • the rendering position may also be automatically set by detecting or identifying information in the target video frame and using the detection or identification result. For example, a specified object may be identified from the target video frame, and the rendering position in the intermediate cache frame is automatically set based on the position of the specified object in the target video frame.
  • the rendering position may be included within a range of positions corresponding to the position of the specified object in the target video frame. For example, when the specified object is a table, the rendering position may be a position point within a position range corresponding to the position of the desktop.
  • the manner of determining the designated object may be the same as the manner of determining the designated object when determining the value of N described above, and will not be further described here.
  • the specified object used to determine the value of N and the specified object used to determine the rendering position may be the same object. This embodiment can automatically set the position of the rendering material based on the position of the specified object.
  • the positions of the specified objects may be different in different target video frames, so that the position of each avatar in the synthesized frames generated based on different target video frames is not fixed, and the diversity of the generated synthesized frames is improved.
  • For example, if the coordinates of the target position in the target video frame are (x2, y2), the position with the coordinates (x2, y2) in the intermediate buffer frame may be determined as the rendering position. Determining the second relative positional relationship between the rendering material and the specified object in the manner of this embodiment can make the relationship between the avatar and the specified object more coordinated.
  • a first relative positional relationship between the target object and the first object may be determined, and based on the first relative positional relationship, the size of the first object, and the size of the designated object, determine A second relative positional relationship between the rendering material and the designated object.
  • the first relative positional relationship may include a first direction and a first distance of the target object relative to the first object
  • the second relative positional relationship may include a second direction and a second distance of the rendering material relative to the specified object.
  • a scaling ratio may be determined based on the size of the first object and the size of the designated object, and scaling may be performed on the first distance based on the scaling ratio to obtain the second distance, so that the ratio of the first distance to the second distance is equal to the ratio of the size of the first object to the size of the designated object.
  • a second direction of the rendering material relative to the specified object may also be determined based on the first direction, and the first direction and the second direction may be the same, opposite or perpendicular to each other, or form a specified angle. Then, the target location is determined based on the second direction, the second distance, and the location of the designated object.
  • the rendering material may also be scaled based on the scaling ratio, so that the ratio of the size of the target object to the size of the rendering material equals the ratio of the size of the first object to the size of the designated object.
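  • The following sketch illustrates one way to compute the second relative positional relationship and the scaling ratio described above; representing object sizes by bounding-box heights and the specific numbers are illustrative assumptions.

```python
import math

def second_relative_position(first_direction, first_distance,
                             first_size, designated_size,
                             designated_pos, same_direction=True):
    scale = designated_size / first_size                 # scaling ratio
    second_distance = first_distance * scale             # keeps distance/size ratios equal
    angle = first_direction if same_direction else first_direction + math.pi
    dx = second_distance * math.cos(angle)
    dy = second_distance * math.sin(angle)
    target_x = designated_pos[0] + dx                    # target position for the avatar
    target_y = designated_pos[1] + dy
    return (target_x, target_y), scale

# First object: 200 px tall, target object sits 80 px to its right.
# Designated object: 50 px tall, located at (400, 300).
target_pos, scale = second_relative_position(
    first_direction=0.0, first_distance=80.0,
    first_size=200.0, designated_size=50.0, designated_pos=(400.0, 300.0))
print(target_pos, scale)  # avatar placed 20 px to the right, scaled to 0.25x
```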
  • a body 3011 and specified objects of a preset category, for example leaves 3012 and 3013, are detected in the target video frame 301 before synthesis.
  • In this case, it can be determined that the number N of specified objects is 2, and the specified positions are respectively a point within the range where the leaf 3012 is located and a point within the range where the leaf 3013 is located. Since the leaves 3012 and 3013 face different directions, as shown in FIG. 3A-2, one avatar 3015 of the body 3011 can be flipped so that the two avatars 3014 and 3015 of the body 3011 face different directions. In addition, since the leaves 3012 and 3013 differ in size, the two avatars 3014 and 3015 can be scaled with different scaling ratios to match the sizes of the leaves 3012 and 3013 respectively, thereby obtaining the synthesized frame 302.
  • As shown in FIG. 3B-1, a body 3031 and a street lamp 3032 near the body are detected in the target video frame 303 before synthesis as the first object; the second objects 3033, which also belong to the street lamp category, can then be determined as the designated objects, and avatars 3034 are set around each street lamp 3033 to obtain the synthesized frame 304 as shown in FIG. 3B-2.
  • preprocessing may be performed on each of the rendering materials to obtain preprocessed rendering materials, where the attributes of the preprocessed rendering materials are different from the attributes of the target object; each preprocessed rendering material is then rendered to the intermediate buffer frame.
  • the attributes may include, but are not limited to, at least one of the target object's position, size, color, transparency, shadow, angle, orientation, motion, and the like.
  • the preprocessing includes but is not limited to at least one of the following: displacement, rotation, flipping, scaling, color processing, transparency processing, and shadow processing.
  • Different attributes can be set for different clones through preprocessing, so as to distinguish different clones.
  • the avatars can display different display effects, so that the avatars can have more diversified display effects.
  • the preprocessed target attribute of each rendering material can be randomly determined. For example, taking color as the attribute, assuming that the candidate color space includes the three colors red, green, and blue and the number of rendering materials is 1, one of the three colors can be randomly selected as the target color of the rendering material after preprocessing (assumed to be red), and the color of the rendering material can be changed to the target color (red) through preprocessing.
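  • A minimal sketch of the random color preprocessing described above, assuming the rendering material is an RGB image with a foreground mask; the tinting scheme is an illustrative assumption.

```python
import random
import numpy as np

candidate_colors = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

def recolor_material(material, mask):
    name, rgb = random.choice(list(candidate_colors.items()))
    tinted = material.copy()
    tinted[mask] = rgb              # replace the avatar's pixels with the target color
    return name, tinted

material = np.zeros((64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
color_name, tinted = recolor_material(material, mask)
print("target color:", color_name)
```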
  • the preprocessed target attribute can also be input by the user, and the attribute of the rendered material can be changed to the target attribute input by the user through preprocessing.
  • the rendering material may be scaled based on the size of the third object.
  • the size of the scaled rendering material can be matched with the size of the third object, reducing visual incongruity caused by size mismatch.
  • the rendering material may be flipped based on the direction of the third object.
  • the flipped rendering material and the third object may have the same direction, opposite direction, vertical direction or other direction relations.
  • the rendering material may be rotated based on the angle of the third object.
  • the rotated rendering material and the third object can form a preset angle, for example, on the same straight line, or on two sides of an equilateral triangle.
  • the target object may be copied and preprocessed by applying parameters such as the rendering position and scaling ratio of each rendering material.
  • the rendering position and zoom ratio may be set through a setting instruction sent by a user, or may be automatically set in other ways.
  • intermediate cache frames may be output.
  • the output intermediate buffer frame can be used for compositing with the target video frame.
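  • The following is a rough sketch of the loop in FIG. 4, in which materials are copied, preprocessed, and drawn into the intermediate buffer frame one by one until a counter reaches N; the data structures and helper parameters are illustrative assumptions.

```python
def render_avatars(target_object, params_list):
    intermediate = []                  # stand-in for the intermediate buffer frame
    rendered = 0                       # counter of already-rendered materials
    N = len(params_list)
    while rendered < N:                # step 404: have enough avatars been rendered?
        params = params_list[rendered]
        avatar = {"object": target_object, **params}  # steps 401-402: copy + preprocess
        intermediate.append(avatar)                   # step 403: draw into the buffer frame
        rendered += 1
    return intermediate                # step 405: output the intermediate buffer frame

params_list = [{"position": (100, 200), "scale": 0.5},
               {"position": (300, 200), "scale": 0.8}]
print(render_avatars("person_pixels", params_list))
```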
  • the intermediate buffer frame and the target video frame may be synthesized to obtain a synthesized frame. Since the avatar and the main body come from the same target video frame, the actions of the avatar and the main body are synchronized.
  • one target video frame may be combined with one or more intermediate buffer frames.
  • the target video frame can be synthesized with multiple intermediate buffer frames to obtain one composite frame, and the multiple target objects in the synthesized frame each have corresponding avatars.
  • the video frame can be synthesized with multiple intermediate buffer frames to obtain a synthesized frame.
  • one target object in the synthesized frame has multiple avatars.
  • an associated object may be rendered for the rendered material and displayed in the composite frame.
  • an associated object may be rendered for the rendering material based on the action category of the target object.
  • the associated object may be a prop related to the action of the target object.
  • a football prop 503 may be added to the rendering material 502 .
  • an associated object (not shown in the figure) may also be added to the target object 501 . It is also possible to randomly render different associated objects for the target object and each rendering material, so as to distinguish the target object and each rendering material.
  • the fan 704 as an associated object can be rendered for the rendering material 703 corresponding to the target object 702 according to the subtitle information shown in FIG. 7A , so that the composite frame Fc includes the target object 702 , the rendering material 703 and the fan 704 .
  • the target object in the target video frame, the rendering material in the intermediate buffer frame, and the background area in the target video frame can be drawn on different transparent layers respectively, and the drawn transparent layers are then synthesized to obtain the synthesized frame; different transparent layers have different display priorities, and pixels on a transparent layer with a higher display priority can cover pixels on a transparent layer with a lower display priority.
  • different occlusion effects such as the avatar blocking the main body or the main body occluding the avatar can be realized through the solutions of the embodiments of the present disclosure.
  • the target object 801 in the target video frame F1, the rendering material 802 in the intermediate buffer frame (not shown in the figure), and the background area 803 of the target video frame F1 are rendered on the layer L1, the layer L2, and the layer L3 respectively, and the display priorities of the layer L1, the layer L2, and the layer L3 decrease in that order.
  • the target object 801 on the layer L1 can cover the rendering material 802 on the layer L2
  • the rendering material 802 on the layer L2 can cover the background area 803 on the layer L3.
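  • Below is a minimal sketch of compositing transparent layers by display priority, where pixels on a higher-priority layer cover pixels on lower-priority layers; the layer contents are illustrative.

```python
import numpy as np

def composite_layers(layers):
    """layers: list of (rgb, opaque_mask) ordered from highest to lowest display priority."""
    h, w, _ = layers[0][0].shape
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for rgb, mask in reversed(layers):   # paint low priority first, high priority last
        canvas[mask] = rgb[mask]
    return canvas

h, w = 90, 120
background = np.full((h, w, 3), 60, dtype=np.uint8)       # layer L3: background area
bg_mask = np.ones((h, w), dtype=bool)
avatar = np.zeros((h, w, 3), dtype=np.uint8); avatar[30:60, 10:50] = (0, 200, 0)
avatar_mask = avatar.any(axis=-1)                          # layer L2: rendering material
body = np.zeros((h, w, 3), dtype=np.uint8); body[30:60, 40:80] = (200, 0, 0)
body_mask = body.any(axis=-1)                              # layer L1: target object

frame = composite_layers([(body, body_mask), (avatar, avatar_mask), (background, bg_mask)])
print(frame.shape)  # the body covers the avatar where they overlap, as in FIG. 8A
```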
  • Target detection can be performed on the target video frame to obtain a detection result, and the target object is acquired from the target video frame based on the detection result. Specifically, background segmentation can be performed on the target video frame based on the detection result to obtain a mask of the target object in the target video frame; based on the mask of the target object, mask processing is performed on the target video frame, and the target object is segmented from the target video frame based on the mask processing result. As shown in FIG. 9A, the target object 901 in the target video frame F1 corresponds to the mask 902. The mask processing result is shown in FIG. 9B.
  • the mask 902 of the target object 901 is used to extract the target object 901 from the target video frame F1, and generally has the same size and shape as the target object 901 .
  • when performing the mask processing, a layer can be overlaid on the target video frame F1; the layer includes a transparent area and an opaque area, where the area corresponding to the mask 902 can be set as the transparent area and the area other than the mask 902 can be set as the opaque area.
  • the target object 901 can be obtained by cropping out the transparent area.
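  • A minimal sketch of the mask processing described above, assuming the segmentation mask is available as a boolean array; representing the result as an RGBA image is an illustrative choice.

```python
import numpy as np

def extract_target(frame_rgb, mask):
    rgba = np.zeros((*frame_rgb.shape[:2], 4), dtype=np.uint8)
    rgba[..., :3] = frame_rgb
    rgba[..., 3] = np.where(mask, 255, 0)  # opaque inside the mask, transparent outside
    return rgba

frame = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
mask = np.zeros((100, 100), dtype=bool)
mask[20:80, 30:70] = True                  # mask 902 of target object 901
target_object = extract_target(frame, mask)
print(target_object.shape)                 # (100, 100, 4)
```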
  • background segmentation in the related art generally requires setting up a green screen and segmenting the foreground (the target object) and the background of the video frame based on the color of each pixel.
  • This background segmentation method is prone to segmentation errors caused by green pixels on the target object itself, has low segmentation accuracy, and does not allow video special effects to be produced anytime and anywhere.
  • the embodiment of the present disclosure realizes the segmentation of the foreground and the background through target detection without setting up a green screen, which improves the accuracy of the background segmentation, and at the same time facilitates the user to realize video processing anytime and anywhere through a terminal device such as a mobile phone.
  • the information of the target object in the video frame can be obtained through mask processing. Then, according to the requirements and parameters of different effects, various operations can be performed on the obtained target object, such as displacement, scaling, caching, etc., and the processed results are rendered into the intermediate cache frame.
  • the specific operation content of this step will vary depending on the special effects. However, the main idea is to operate on the intercepted main character image (also called the target object) to achieve different effects. Those skilled in the art can also create more different rendering special effects by modifying these operations.
  • the present disclosure relates to the field of augmented reality.
  • By acquiring image information of a target object in a real environment and detecting or identifying relevant features, states, and attributes of the target object from the acquired image information with various vision-related algorithms, an AR effect that combines virtuality and reality and matches the specific application can be obtained.
  • the target object may involve faces, limbs, gestures, actions, etc. related to the human body, or markers and markers related to objects, or sand tables, display areas or display items related to venues or places.
  • Vision-related algorithms can involve visual positioning, SLAM, 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc.
  • Specific applications can involve not only interactive scenes related to real scenes or objects, such as guided tours, navigation, explanation, reconstruction, and virtual effect overlay and display, but also special effect processing related to people, such as makeup beautification, body beautification, special effect display, and virtual model display.
  • the relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network.
  • the above-mentioned convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
  • the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • Synthesizing module 1103, configured to synthesize the intermediate buffer frame and the target video frame to obtain a synthesized frame, the synthesized frame including the rendering material in the intermediate buffer frame and the target object in the target video frame, where the action of each rendering material is synchronized with the action of the target object.
  • the device further includes an identification module configured to identify specified objects in the target video frame; the specified objects include objects satisfying a preset number of conditions and/or objects of a specified category.
  • a determination module configured to determine the quantity of the rendering materials based on the number of the specified objects in the target video frame, and/or, based on the position of the specified object in the target video frame, determine the number of the rendering materials in the target video frame The rendering position in the intermediate cache frame described above.
  • This embodiment can automatically set the number of rendering materials based on the number of specified objects, and automatically set the position of the rendering materials based on the position of the specified object.
  • the number and/or position of the specified object may differ between target video frames, so that the position and/or number of the avatars in the composite frames generated based on different target video frames is not fixed, which improves the diversity of the generated composite frames.
  • the rendering module is configured to: determine the target position of the rendering material in the target video frame based on the position of the specified object in the target video frame, determine based on the target position The rendering position of the rendering material in the intermediate buffer frame.
  • the rendering module is configured to: determine a first relative positional relationship between the target object and the first object in the target video frame; based on the first relative positional relationship, the first The size of the object and the size of the specified object determine a second relative positional relationship between the rendering material and the specified object; determine the target position based on the second relative positional relationship and the position of the specified object. Determining the second relative positional relationship between the rendering material and the specified object through the method of this embodiment can make the mutual relationship between the avatar and the specified object more coordinated.
  • the position and number of rendering materials can be pre-written into the configuration file and read directly from the configuration file when rendering is required. In this way, the reading and writing efficiency is high, thereby improving the rendering efficiency of the rendered material.
  • the number of rendering materials is greater than 1, and each rendering material is rendered to different rendering positions in the intermediate buffer frame; the rendering module is used to: obtain a position sequence, and the position sequence includes multiple position information; according to the sequence of each position information in the position sequence, sequentially render multiple rendering materials to positions corresponding to each position information in the intermediate cache frame.
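  • A small sketch of rendering multiple materials in the order given by a position sequence; the sequence contents and the dictionary used to stand in for the intermediate cache frame are illustrative assumptions.

```python
position_sequence = [(120, 80), (320, 80), (520, 80)]
materials = ["avatar_0", "avatar_1", "avatar_2"]

intermediate_frame = {}
for material, position in zip(materials, position_sequence):  # follow the sequence order
    intermediate_frame[position] = material
print(intermediate_frame)
```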
  • the rendering module includes: a position determining unit, configured to determine a target position of the rendering material on the target video frame; an object determining unit, configured to determine a third object whose distance from the target position is less than a second preset distance threshold; and a preprocessing unit, configured to preprocess the rendering material based on the attribute information of the third object.
  • the preprocessing unit is configured to: scale the rendering material based on the size of the third object; and/or perform flipping processing on the rendering material based on the direction of the third object ; and/or perform rotation processing on the rendering material based on the angle of the third object; and/or perform color processing on the rendering material based on the color of the third object.
  • the avatars can have different display effects, so that the avatars can have more diversified display effects.
  • the apparatus further includes: an associated object rendering module, configured to render an associated object for the rendering material, and display the associated object in the synthesized frame.
  • the associated object rendering module is configured to: render the associated object for the rendering material based on the subtitle information in the target video frame; or render the associated object for the rendering material based on the action category of the target object Render the associated object; or randomly render the associated object for the rendered material.
  • the device further includes: a detection module, configured to perform target detection on the target video frame to obtain a detection result; and a target object acquisition module, configured to acquire the target object from the target video frame based on the detection result.
  • the target object acquisition module is configured to: perform background segmentation on the target video frame based on the detection result to obtain a mask of the target object in the target video frame; perform mask processing on the target video frame based on the mask of the target object; and segment the target object from the target video frame based on the mask processing result.
  • the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for their specific implementation, reference may be made to the description of the method embodiments above, and for brevity, details are not repeated here.
  • the processor 1201 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
  • the processor 1201 may also include a graphics card, and the graphics card may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 1202 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, and the like.
  • the memory 1202 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1202 and invoked by the processor 1201 for execution.
  • the input/output interface 1203 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1204 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), and can also realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1205 includes a path for transferring information between the various components of the device (eg, processor 1201, memory 1202, input/output interface 1203, and communication interface 1204).
  • although the above device only shows the processor 1201, the memory 1202, the input/output interface 1203, the communication interface 1204, and the bus 1205, the device may also include other components in a specific implementation.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media including both volatile and non-volatile, removable and non-removable media, can be implemented by any method or technology for storage of information.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of storage media for computers include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.
  • each embodiment in this specification is described in a progressive manner, the same or similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
  • the device embodiments described above are only illustrative, and the modules described as separate components may or may not be physically separated, and the functions of each module may be integrated in the same or multiple software and/or hardware implementations. Part or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment. It can be understood and implemented by those skilled in the art without creative effort.


Abstract

Embodiments of the present disclosure provide a video processing method and apparatus, a computer-readable storage medium, and a computer device. According to an example of the method, when a target object is obtained from a target video frame, the target object is copied to obtain at least one rendering material. Each rendering material can then be rendered into an intermediate buffer frame, and the intermediate buffer frame is synthesized with the target video frame to obtain a synthesized frame. In this way, the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.

Description

Video processing method and apparatus, computer-readable storage medium, and computer device
Cross-reference to related application
This application claims priority to Chinese patent application No. 202111044847.6, filed on September 7, 2021, the entire disclosure of which is incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of video processing, and in particular to a video processing method and apparatus, a computer-readable storage medium, and a computer device.
Background
In previous special effect rendering, background segmentation is usually used to separate the target object from the background area in a video frame and to replace the background area of the video frame. However, this processing method is relatively monotonous, and the display effect of the processing result is rather limited. It is therefore necessary to improve the way video special effects are rendered.
Summary
In a first aspect, an embodiment of the present disclosure provides a video processing method. The method includes: when a target object is obtained from a target video frame, copying the target object to obtain at least one rendering material; rendering each rendering material into an intermediate buffer frame; and synthesizing the intermediate buffer frame with the target video frame to obtain a synthesized frame, where the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.
Some embodiments of the present disclosure copy the target object in the target video frame, render the copies into an intermediate buffer frame, and then synthesize the intermediate buffer frame with the target video frame to obtain a synthesized frame, so that a "body" and a preset number of "avatars" can be displayed simultaneously in the synthesized frame, with the movements and postures of the avatars kept synchronized with those of the body. The body refers to the copied target object, and an avatar refers to a rendering material obtained by copying. Compared with simply replacing the background area as in the related art, the video processing approach of the present disclosure can make the special effect rendering process more interesting and the special effect rendering results more diverse.
In a second aspect, an embodiment of the present disclosure provides a video processing apparatus. The apparatus includes: a copying module, configured to copy the target object to obtain at least one rendering material when the target object is obtained from a target video frame; a rendering module, configured to render each rendering material into an intermediate buffer frame; and a synthesizing module, configured to synthesize the intermediate buffer frame with the target video frame to obtain a synthesized frame, where the synthesized frame includes the rendering material and the target object, and the action of each rendering material is synchronized with the action of the target object.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method described in any of the embodiments is implemented.
In a fourth aspect, an embodiment of the present disclosure provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the method described in any of the embodiments is implemented.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1A and FIG. 1B are schematic diagrams of video processing methods in the related art.
FIG. 2 is a flowchart of a video processing method according to an embodiment of the present disclosure.
FIG. 3A-1, FIG. 3A-2, FIG. 3B-1 and FIG. 3B-2 are schematic diagrams of setting the number and positions of avatars according to embodiments of the present disclosure.
FIG. 4 is a flowchart of an avatar rendering process according to an embodiment of the present disclosure.
FIG. 5, FIG. 6, FIG. 7A and FIG. 7B are schematic diagrams of rendering associated objects according to embodiments of the present disclosure.
FIG. 8A and FIG. 8B are schematic diagrams of video processing effects according to embodiments of the present disclosure.
FIG. 9A is a schematic diagram of background segmentation according to an embodiment of the present disclosure.
FIG. 9B is a schematic diagram of mask processing according to an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of multiple synthesized second video frames according to an embodiment of the present disclosure.
FIG. 11 is a block diagram of a video processing apparatus according to an embodiment of the present disclosure.
FIG. 12 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合。
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
为了使本技术领域的人员更好的理解本公开实施例中的技术方案,并使本公开实施例的上述目的、特征和优点能够更加明显易懂,下面结合附图对本公开实施例中的技术方案作进一步详细的说明。
在相关技术中,一般通过背景分割来将视频帧中的目标对象和背景区域分开,并对视频帧中的背景区域进行替换。如图1A所示,视频帧101中包括主体区域101a和背景区域101b,主体区域101a是指包括目标对象的区域,所述目标对象可以是人物、动物或者其他特定对象,也可以是人物或动物的某一部分,例如,人脸。通过对视频帧101中的背景区域进行替换,可以得到如图1B所示的视频帧102。其中,视频帧102的主体区域102a与视频帧101的主体区域101a相同,视频帧102的背景区域102b与视频帧101的背景区域101b不同。然而,上述处理方式较为单调,常常难以满足用户的创作需求。
基于此,本公开实施例提供一种视频处理方法,如图2,所述方法可包括:
步骤201:在从目标视频帧中获取到目标对象的情况下,对所述目标对象进行复制,得到至少一个渲染素材;
步骤202:将每个所述渲染素材渲染到中间缓存帧中;
步骤203:对所述中间缓存帧与所述目标视频帧进行合成,得到合成帧,其中,所述合成帧中包括所述渲染素材以及所述目标对象,每个所述渲染素材的动作均与所述目标对象的动作同步。
本公开实施例的方法既可以用于对实时采集的视频帧进行处理,也可以用于对预先采集并缓存的视频帧进行处理。所述视频中的视频帧可以包括连续采集得到的视频帧,也可以包括不连续的视频帧。所述不连续可以是由视频裁剪、视频拼接等处理而导致的。该方法可应用于具有视频处理能力的终端设备或者服务器。所述终端设备可以包括但不限于手机、平板电脑、个人电脑(Personal Computer,PC)等设备。所述服务器可以是单台服务器设备,或者是由多台服务器设备构成的服务器集群。在一些实施例中,可以在终端设备上安装用于进行视频处理的应用程序(APP)、小程序或者web客户端等软件产品,并由这些软件产品执行本公开实施例的方法。
在一种可能的应用场景下,上述应用程序可以是直播类应用程序。主播用户可以在手机上安装客户端(称为主播客户端),通过主播客户端执行本公开实施例的方法以得到合成后的第二视频帧,并将第二视频帧上传至直播服务器,由直播服务器将合成后的第二视频帧发送至观看直播的用户的客户端(称为观众客户端)。在另一种可能的应用场景下,上述软件产品可以是美颜类软件产品。用户可以在手机上安装客户端,客户端调用手机摄像头采集视频帧,通过客户端对视频帧执行本公开实施例的方法以得到合成帧,并输出合成帧。本领域技术人员可以理解,上述应用场景仅为示例性说明,并非用于限制本公开。
本公开实施例通过对目标视频帧中的目标对象进行复制后渲染到中间缓存帧,再将所述中间缓存帧与所述目标视频帧进行合成,得到合成帧,从而能够在合成帧中同时显示一个“本体”以及预设数量个“分身”,并使分身与本体的动作和姿态保持同步。其中,本体是指被复制的目标对象,分身是指复制得到的渲染素材。中间缓存帧可以是预先设定的空白帧或非空白帧,也可以是响应于从视频帧中获取到目标对象,或响应于得到至少一个渲染素材而生成的视频帧,本公开对此不做限定。相比于相关技术中简单替换背景区域的方式,本公开的视频处理方式能够提高特效渲染过程中的趣味性和特效渲染结果的多样性。
在步骤201中,所述目标视频帧可以是视频中任意一帧包括目标对象的视频帧。在视频的连续多帧视频帧中均包括目标对象的情况下,可以针对所述多帧视频帧中的部分或全部执行本公开实施例的视频处理方法。可以通过对视频中的部分或全部视频帧进行目标检测,以确定被检测的视频帧中是否包括目标对象。在一个目标视频帧中包括一个或多个目标对象的情况下,可以对其中的每个目标对象执行相同的操作。为了便于说明,下文以目标视频帧中目标对象数量为1的情况对本公开实施例的方案进行说明。
如果检测到目标对象(例如,人物),则对目标对象(以下也可称为本体)进行复制,得到至少一个渲染素材(以下也可称为分身)。通过复制可以得到N个分身,N为正整数。N的数值可以预先确定。在确定N的数值之后,可以将N的数值写入配置文件,并在需要获取渲染素材时从配置文件中读取。
下面对确定N的数值的几种方式进行举例说明。本领域技术人员可以理解,下面 的举例仅为示例性说明,并非用于限制本公开。除了下面的例子中所示出的方式之外,也可以采用其他方式确定渲染素材的数量N。在一些实施例中,还可以在配置文件中存储N的默认数值。这样,如果未通过下面任何一种方式成功设置N的数值,则可以采用配置文件中的默认数值作为N的数值。当然,上述方式并非是必须的,即,配置文件中N的初始数值也可以为空,或者等于0。在通过下面的某种方式成功设置N的数值之后,可以将上述配置文件中N的数值更改为设置后的数值。
在一些实施例中,可以直接由用户输入N的数值。具体来说,可以接收用户输入的设置指令,所述设置指令中携带N的数值,然后基于设置指令对N的数值进行设置。本实施例能够获取确定数量个渲染素材,从而在合成帧上显示确定数量个分身。确定数量可以由用户预先输入并存储,并在需要复制目标对象时直接读取,从而提高渲染效率,适用于实时性要求较高的场景。
在另一些实施例中,N的数值也可以随机生成。例如,可以基于预先确定的分布函数(例如,均匀分布、正态分布等),从预先确定的数值范围内随机生成N的数值。假设分布函数为均匀分布,数值范围为1-10,则可以以相等的概率随机从1-10这十个整数中选取一个整数作为N的数值。
在另一些实施例中,可以对目标视频帧中的信息进行检测或识别,并基于检测或识别的结果自动设置N的数值。例如,可以从目标视频帧中识别指定对象,并基于指定对象的数量自动设置N的数值。N的数值可以设置为与指定对象的数量相等,即,每个指定对象对应一个分身。或者,N的数值可以设置为指定对象数量的整数倍,即,每个指定对象对应多个分身。N的数值与指定对象的数量之间也可以是其他数量关系,例如,不同的指定对象所对应的分身的数量也可以不同,例如,指定对象A对应1个分身,指定对象B对应2个分身,指定对象C对应0个分身。本公开对此不再一一举例。本实施例能够基于指定对象的数量自动设置渲染素材的数量,在不同的目标视频帧中指定对象的数量可能不同,从而使得基于不同的目标视频帧生成的合成帧中各个分身的数量不是固定的,提高了生成的合成帧的多样性。
在上述实施例中,所述指定对象可以是某一类别的对象,例如,可以是桌子、树叶、枕头等。所述类别可以由用户预先设置,也可以通过自动识别得到。例如,可以获取所述目标视频帧中与目标对象的距离小于第一预设距离阈值的第一对象,若识别到所述目标视频帧中包括与所述第一对象类别相同的第二对象,则将所述第二对象确定为指定对象。
所述指定对象也可以是目标视频帧中满足预设数量条件的对象。所述预设数量条件可以是数量超过预设数量阈值,例如,在包括一排路灯的目标视频帧中,路灯的数量超过预设数量阈值,则可以将路灯确定为指定对象。又例如,在马路或停车场场景下拍摄得到的目标视频帧中,车辆的数量超过预设数量阈值,则可以将车辆确定为指定对象。所述预设数量条件还可以是数量等于预设的数值。例如,假设预设的数值为3,一帧目标视频帧中包括1棵树和3张凳子,由于凳子的数量等于预设的数值(即,3),则可以将凳子确定为指定对象。所述预设数量条件还可以是其他的数量条件,具体可根据实际需要设置,此处不再一一列举。
所述指定对象还可以是目标视频帧中预设位置处的对象,或者是在目标视频帧中所占像素数量超过一定数量阈值的对象,或者是像素值在预设范围内的对象等。还可以采用其他方式确定指定对象,本公开对此不再一一举例。本实施例能够使合成帧中的分身与本体处于同一类别的对象周围,从而在合成帧中展示出多个姿态同步的目标对象重复出现在同一类别的对象周围的效果。
在步骤202中,可以将各个渲染素材渲染到中间缓存帧中。在视频中包括多个需要 进行合成的目标视频帧的情况下,从不同的目标视频帧中获取的渲染素材可以被绘制到不同的中间缓存帧中,也可以被绘制到相同的中间缓存帧中。在绘制到相同的中间缓存帧中的情况下,针对每帧目标视频帧,可以先清除中间缓存帧中已有的部分或全部渲染素材,再绘制从所述目标视频帧中获取的渲染素材。当然,也可以在保留中间缓存帧中已有的全部渲染素材的情况下,绘制从所述目标视频帧中获取的渲染素材。从同一帧目标视频帧中提取的与不同目标对象对应的渲染素材可以被绘制到不同的中间缓存帧中,也可以被绘制到相同的中间缓存帧中。同一目标对象复制得到的多个渲染素材可以被绘制到不同的中间缓存帧中,也可以被绘制到相同的中间缓存帧中。
各个渲染素材在中间缓存帧中可以互不重叠,也可以存在一定的重叠量。各个渲染素材在中间缓存帧中的渲染位置可以预先确定。在确定渲染位置之后,可以将渲染位置写入配置文件,并在需要将渲染素材渲染到中间缓存帧时从配置文件中读取。其中,用于存储渲染位置以及前述N的数值的配置文件可以是同一个配置文件,也可以是不同的配置文件。下面对确定渲染位置的几种方式进行举例说明。本领域技术人员可以理解,下面的举例仅为示例性说明,并非用于限制本公开。除了下面的例子中所示出的方式之外,也可以采用其他方式确定渲染位置。
在一些实施例中,可以在配置文件中存储默认渲染位置。这样,如果未通过下面任何一种方式成功设置渲染位置,则可以采用配置文件中的默认渲染位置作为渲染位置。当然,上述方式并非必须的,即,配置文件中渲染位置的初始数值也可以为空,在通过下面的某种方式成功设置之后,可以将配置文件中的渲染位置更改为设置后的渲染位置。
在一些实施例中,可以先确定渲染素材的数量N的数值,再分别确定每个渲染素材的渲染位置。在另一些实施例中,也可以先确定渲染素材的渲染位置,再将渲染位置的数量确定为渲染素材的数量N的数值。在确定渲染位置之后,可以基于确定的各所述渲染素材在所述中间缓存帧中的渲染位置,将每个所述渲染素材渲染到所述中间缓存帧的与所述渲染位置对应的位置处。渲染素材的位置和数量可以预先写入配置文件,并在需要渲染时直接从配置文件中读取,读写效率较高,从而提高了渲染素材的渲染效率。
在一些实施例中,可以直接由用户输入所述渲染位置,例如,输入渲染位置的坐标。具体来说,可以接收用户输入的设置指令,所述设置指令中携带所述渲染位置,然后基于设置指令对所述渲染位置进行设置。可以在同一条设置指令中同时携带上述N的数值以及渲染位置,也可以通过不同的指令分别设置N的数值和渲染位置,本公开对此不做限制。
在另一些实施例中,渲染位置也可以随机生成。例如,可以基于预先确定的分布函数(例如,均匀分布、正态分布等),在中间缓存帧中与目标视频帧对应的坐标范围内随机生成渲染位置的坐标。假设分布函数为均匀分布,目标视频帧的坐标范围为(x0,y0)到(x1,y1),则可以以相等的概率从中间缓存帧中(x0,y0)到(x1,y1)这一坐标范围内选取一个或多个坐标作为渲染位置。
在另一些实施例中,可以通过对目标视频帧中的信息进行检测或识别,并基于检测或识别的结果自动设置所述渲染位置。例如,可以从目标视频帧中识别指定对象,并基于目标视频中指定对象的位置自动设置中间缓存帧中的渲染位置。所述渲染位置可以被包括在与所述目标视频帧中指定对象的位置对应的位置范围内。例如,指定对象是桌子时,则渲染位置可以为与桌面所在位置对应的位置范围内的一个位置点。所述指定对象的确定方式可与前述确定N的数值时确定指定对象的方式相同,此处不再展开说明。在一些实施例中,用于确定N的数值的指定对象与用于确定所述渲染位置的 指定对象可以为同一对象。本实施例能够基于指定对象的位置自动设置渲染素材的位置。在不同的目标视频帧中指定对象的位置可能不同,从而使得基于不同的目标视频帧生成的合成帧中各个分身的位置不是固定的,提高了生成的合成帧的多样性。
假设将所述目标视频帧中与所述目标对象的距离小于第一预设距离阈值的对象作为第一对象,在指定对象为所述目标视频帧中与所述第一对象类别相同的第二对象的情况下,可以基于所述指定对象在所述目标视频帧中的位置,确定所述渲染素材在所述目标视频帧中的目标位置,并基于所述目标位置确定所述渲染素材在所述中间缓存帧中对应的渲染位置。例如,可以将指定对象所在位置范围内的一个位置确定为所述目标位置,也可以将指定对象所在位置范围的邻域范围内的一个位置确定为所述目标位置。可以将目标位置直接确定为对应的渲染位置。例如,目标位置在目标视频帧中的坐标为(x2,y2),则可以将中间缓存帧中坐标为(x2,y2)的位置确定为渲染位置。通过本实施例的方式确定渲染素材与指定对象的第二相对位置关系,能够使分身与指定对象之间的相互关系更加协调。
在一些实施例中,可以确定所述目标对象与所述第一对象的第一相对位置关系,基于所述第一相对位置关系、所述第一对象的尺寸以及所述指定对象的尺寸,确定所述渲染素材与所述指定对象的第二相对位置关系。其中,所述第一相对位置关系可包括所述目标对象相对于所述第一对象的第一方向以及第一距离,所述第二相对位置关系可包括所述渲染素材相对于所述指定对象的第二方向以及第二距离。可以基于所述第一对象的尺寸以及所述指定对象的尺寸确定缩放比例,基于所述缩放比例对所述第一距离进行缩放处理,得到第二距离,以使所述第一距离与所述第二距离的比值等于所述第一对象的尺寸与所述指定对象的尺寸的比值。还可以基于所述第一方向确定渲染素材相对于所述指定对象的第二方向,所述第一方向与所述第二方向可以相同、相反或者相互垂直,或者呈指定夹角。然后,基于所述第二方向、第二距离以及所述指定对象的位置确定所述目标位置。还可以基于所述缩放比例对所述渲染素材进行缩放处理,以使目标对象与渲染素材的尺寸之比等于所述第一对象与所述指定对象的尺寸之比。通过本实施例的方案,能够使分身的尺寸与视频帧中其他对象的尺寸相匹配,减少因分身尺寸过大或过小导致的分身看起来不够协调的情况。
如图3A-1所示,在合成前的目标视频帧301中检测到本体3011,以及预设类别的指定对象,例如树叶3012和3013。这种情况下,则可以确定所述指定对象的数量N的数值为2,指定位置分别为树叶3012所在位置范围内的一个点以及树叶3013所在位置范围内的一个点。由于树叶3012和3013的方向不同,因此,如图3A-2所示,可以对本体3011的一个分身3015进行翻转,以使本体3011的两个分身3014和3015朝向不同的方向。此外,由于树叶3012和3013的大小不同,可以对两个分身3014和3015采用不同的缩放比例进行缩放,以使这两个分身3014和3015分别匹配树叶3012和3013的大小,从而得到合成帧302。
如图3B-1所示,在合成前的目标视频帧303中检测到本体3031以及本体附近的路灯3032作为第一对象,则可将同为路灯类别的第二对象3033确定为指定对象,并在各个路灯3033周围设置分身3034,从而得到如图3B-2所示的合成帧304。
在一些实施例中,可以分别对各个所述渲染素材进行预处理,得到预处理后的渲染素材,预处理后的所述渲染素材的属性不同于所述目标对象的属性;分别将预处理后的各个所述渲染素材渲染到所述中间缓存帧上。所述属性可包括但不限于目标对象的位置、尺寸、色彩、透明度、阴影、角度、朝向、动作等中的至少一者。所述预处理包括但不限于以下至少一者:位移、旋转、翻转、缩放、色彩处理、透明度处理、阴影处理。可以通过预处理为不同的分身设置不同的属性,以便区分不同的分身。本实 施例通过进行不同的预处理,能够使分身呈现出不同的显示效果,使得分身的显示效果更加多样化。
在一些实施例中,可以随机确定每个渲染素材预处理后的目标属性,例如,以属性包括色彩为例,假设候选色彩空间中包括红、绿、蓝三种色彩,渲染素材的数量为1,则可以随机从红、绿、蓝三种色彩中选择一种色彩作为渲染素材预处理后的目标色彩(假设为红色),并通过预处理将渲染素材的色彩更改为目标色彩(红色)。在一些实施例中,也可以由用户输入预处理后的目标属性,并通过预处理将渲染素材的属性更改为用户输入的目标属性。
在另一些实施例中,可以确定所述渲染素材在所述目标视频帧上的目标位置;确定与所述目标位置的距离小于第二预设距离阈值的第三对象;基于所述第三对象的属性信息对所述渲染素材进行预处理。
例如,可以基于所述第三对象的尺寸对所述渲染素材进行缩放处理。通过缩放处理,可以使缩放后的渲染素材的尺寸与第三对象的尺寸相匹配,减少因尺寸不匹配带来的视觉上的不协调感。
例如,可以基于所述第三对象的方向对所述渲染素材进行翻转处理。通过翻转处理,可以使翻转后的渲染素材与第三对象的方向相同、相反、垂直或呈现其他的方向关系。
例如,可以基于所述第三对象的角度对所述渲染素材进行旋转处理。通过旋转处理,可以使旋转后的渲染素材与第三对象构成预设角度,例如,在同一直线上,或者在等边三角形的两条边上等。
例如,还可以基于所述第三对象的色彩对所述渲染素材进行色彩处理。通过色彩处理,可以使处理后的渲染素材与第三对象的色彩满足一定的色彩条件,例如,所述色彩条件为色彩差异大于预设差异值,从而便于在合成帧中查看分身。
在一些实施例中,可以每次渲染一个或多个渲染素材到中间缓存帧。以每次渲染一个渲染素材为例,每将一个渲染素材渲染到中间缓存帧中,可以确定所述中间缓存帧上已渲染的渲染素材的数量是否达到所述渲染素材的总数(即前述实施例中的N)。若未达到,返回至将每个所述渲染素材渲染到中间缓存帧中的步骤。在本实施例中,可以采用一个计数器对已渲染到中间缓存帧中的渲染素材进行计数,每渲染一个渲染素材,将计数器的数值加1。在计数值达到N的情况下,确定所述中间缓存帧上已渲染的渲染素材的数量已达到所述渲染素材的总数。
在渲染素材的数量大于1的情况下,可以将多个位置信息写入位置序列,每个位置信息对应所述中间缓存帧中的一个渲染位置,并按照各个位置信息在位置序列中的顺序,依次将多个所述渲染素材分别渲染到所述中间缓存帧中对应于各个位置信息的渲染位置处。本实施例能够基于渲染位置将渲染素材渲染到中间缓存帧,在各个渲染位置都不同的情况下,能够将各个渲染素材渲染到中间缓存帧中不同的位置处。
本公开实施例的渲染过程的流程如图4所示。在步骤401中,可以输入包括目标对象(例如,人物主体)的目标视频帧。
在步骤402中,可以应用各个渲染素材的渲染位置和缩放比例等参数,对目标对象进行复制和预处理。所述渲染位置和缩放比例可以通过用户发送的设置指令进行设置,也可以通过其他方式自动设置。
在步骤403中,可以将预处理后的目标对象作为分身渲染到中间缓存帧。
在步骤404中,可以判断渲染到中间缓存帧的分身数量是否足够。如果是,则执行步骤405,否则返回步骤402。
在步骤405中,可以输出中间缓存帧。输出的中间缓存帧可用于与所述目标视频帧进行合成。
在步骤203中,可以对所述中间缓存帧与所述目标视频帧进行合成,得到合成帧。由于分身与本体来自同一目标视频帧,因此,分身与本体的动作是同步的。其中,一帧目标视频帧可以与一帧或多帧中间缓存帧进行合成。例如,在将从同一目标视频帧中提取的不同目标对象对应的渲染素材绘制到不同的中间缓存帧中的情况下,可以将该目标视频帧与多个中间缓存帧进行合成,得到一帧合成帧,该合成帧中的多个目标对象都具有对应的分身。又例如,在将对同一目标对象复制得到的多个渲染素材绘制到不同的中间缓存帧中的情况下,可以将该视频帧与多个中间缓存帧进行合成,得到一帧合成帧,该合成帧中的一个目标对象具有多个分身。
在一些实施例中,可以为所述渲染素材渲染关联对象,并在所述合成帧中显示所述关联对象。可选地,可以基于所述目标对象的动作类别,为所述渲染素材渲染关联对象。所述关联对象可以是与目标对象的动作相关的道具。如图5所示,在识别到目标对象501执行的动作为踢球动作的情况下,可以为渲染素材502添加足球道具503。当然,还可以为目标对象501添加关联对象(图中未示出)。也可以随机地为目标对象和各个渲染素材渲染不同的关联对象,以便对目标对象和各个渲染素材进行区分。所述关联对象还可以是服饰,包括但不限于帽子、衣服、耳环、手镯、鞋子、胡须、眼镜等中的一种或多种。如图6所示,可以为目标对象和各个渲染素材分别添加帽子601、眼镜602、胡须603等关联对象。
In some embodiments, since the subtitles of a video are often related to information about the target object in the video, an associated object may also be rendered for the rendering material based on subtitle information in the target video frame. Specifically, keywords in the subtitle information may be recognized, where the keywords may be words added to a keyword library in advance. An association between keywords and associated objects may be established and stored, so that when a keyword is recognized, the corresponding associated object is looked up based on the association. As shown in Fig. 7A, the target video frame F1 includes a target object 702, and the subtitle information 701 recognized in the target video frame F1 includes the keyword "fan". As shown in Fig. 7B, a fan 704 may then be rendered as the associated object for the rendering material 703 corresponding to the target object 702 according to the subtitle information in Fig. 7A, so that the composite frame Fc includes the target object 702, the rendering material 703, and the fan 704.
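The keyword lookup can be as simple as the sketch below; the keyword library contents and asset paths are assumptions for illustration only.

```python
# Hypothetical keyword library mapping keywords to prop assets.
KEYWORD_TO_PROP = {"fan": "assets/fan.png", "football": "assets/football.png"}

def props_for_subtitle(subtitle_text):
    # Return the prop assets whose keywords appear in the recognized subtitle text.
    return [asset for keyword, asset in KEYWORD_TO_PROP.items() if keyword in subtitle_text]
```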
In the above embodiments that render an associated object, the associated object may also be rendered into the intermediate cache frame, and the intermediate cache frame carrying the associated object is composited with the target video frame during frame composition. Alternatively, the associated object may be rendered into the composite frame after an intermediate cache frame without the associated object has been composited with the target video frame; or the associated object may be rendered into an association cache frame other than the intermediate cache frame, and the intermediate cache frame, the association cache frame carrying the associated object, and the target video frame are composited together. Rendering associated objects allows the composite frame to display richer special effects, further improving the diversity and entertainment value of the video processing results.
In some embodiments, the target object in the target video frame, the rendering materials in the intermediate cache frame, and the background region of the target video frame may be drawn onto different transparent layers respectively, and the drawn transparent layers are composited to obtain the composite frame; different transparent layers have different display priorities, and pixels on a transparent layer with a higher display priority can cover pixels on a transparent layer with a lower display priority. When the clones, the original, and the background region partially overlap, the solution of this embodiment of the present disclosure can achieve different occlusion effects, such as the clones occluding the original or the original occluding the clones. By setting different display priorities for the transparent layers, this embodiment allows the clones, the original, and the background region in the composite frame to present different coverage effects: for example, the clones cover the original and the original covers the background region, or the original covers the clones and the clones cover the background region.
For example, in the embodiment shown in Fig. 8A, the target object 801 in the target video frame F1, the rendering material 802 in the intermediate cache frame (not shown), and the background region 803 of the target video frame F1 are rendered onto layers L1, L2, and L3 respectively, with the display priorities of layers L1, L2, and L3 decreasing in that order. It can be seen that in the composite frame Fc the target object 801 on layer L1 covers the rendering material 802 on layer L2, and the rendering material 802 on layer L2 covers the background region 803 on layer L3. In the embodiment shown in Fig. 8B, the target object 801 in the target video frame F1, the background region 803 of the target video frame F1, and the rendering material 802 in the intermediate cache frame (not shown) are rendered onto layers L1, L2, and L3 respectively, with the display priorities of layers L1, L2, and L3 again decreasing in that order. It can be seen that in the composite frame Fc the target object 801 on layer L1 covers the background region 803 on layer L2, and the background region 803 on layer L2 covers the rendering material 802 on layer L3.
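Generalizing the earlier blend to an arbitrary number of layers, a priority-ordered composition could be sketched as follows; this is an illustrative assumption about the layer representation (RGBA images paired with an integer priority), not the disclosed implementation.

```python
import numpy as np

def composite_layers(layers):
    """Blend RGBA transparent layers given as (display_priority, image) pairs:
    layers are drawn from lowest to highest priority, so pixels on a
    higher-priority layer cover pixels on lower-priority layers."""
    ordered = sorted(layers, key=lambda item: item[0])
    h, w = ordered[0][1].shape[:2]
    canvas = np.zeros((h, w, 3), dtype=np.float32)
    for _, layer in ordered:
        alpha = layer[..., 3:4].astype(np.float32) / 255.0
        canvas = layer[..., :3].astype(np.float32) * alpha + canvas * (1.0 - alpha)
    return canvas.astype(np.uint8)
```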
The overall flow of the embodiments of the present disclosure is described below with reference to the drawings.
First, background segmentation is performed. Object detection may be performed on the target video frame to obtain a detection result, and the target object is obtained from the target video frame based on the detection result. Specifically, background segmentation may be performed on the target video frame based on the detection result to obtain a mask of the target object in the target video frame; masking is then applied to the target video frame based on the mask of the target object, and the target object is segmented from the target video frame based on the masking result. As shown in Fig. 9A, the target object 901 in the target video frame F1 corresponds to the mask 902. The masking result is shown in Fig. 9B. The mask 902 of the target object 901 is used to extract the target object 901 from the target video frame F1 and generally has the same size and shape as the target object 901. When masking is performed, a layer may be overlaid on the target video frame F1; the layer includes a transparent region and an opaque region, where the region corresponding to the mask 902 is set to be transparent and the region outside the mask 902 is set to be opaque. Cropping out the transparent region yields the target object 901.
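As a rough illustration of this mask-based extraction, the sketch below builds an RGBA patch whose background pixels are fully transparent and crops it tightly around the subject; it stands in for the transparent/opaque layer described above and makes no claim about the actual segmentation model used.

```python
import numpy as np

def extract_target(frame_rgb, mask):
    """Cut the target object out of the frame with its binary mask."""
    alpha = np.where(mask > 0, 255, 0).astype(np.uint8)   # opaque inside the mask only
    rgba = np.dstack([frame_rgb, alpha])
    ys, xs = np.nonzero(mask)
    return rgba[ys.min():ys.max() + 1, xs.min():xs.max() + 1]   # tight crop around the subject
```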
Background segmentation in the related art generally requires a green screen and separates the foreground (the target object) from the background based on the color of each pixel in the video frame. This approach is prone to segmentation errors when the target object itself contains green pixels, has low segmentation accuracy, and does not allow video special effects to be produced anytime and anywhere. The embodiments of the present disclosure separate the foreground from the background via object detection, without a green screen, which improves the accuracy of background segmentation and makes it convenient for users to process video anytime and anywhere with terminal devices such as mobile phones.
The masking process yields the information of the target object in the video frame. The obtained target object can then be manipulated in various ways according to the requirements and parameters of different effects, such as translation, scaling, and caching, and the results are rendered into the intermediate cache frame. The specific operations of this step vary with the effect, but the main idea is to operate on the cut-out subject image (also called the target object) to achieve different effects. By modifying these operations, those skilled in the art can also create more rendering effects.
The intermediate cache frame and the original video frame (also called the target video frame herein) are composited to obtain the final clone-effect result. The composite frame displays one "original" and N "clones" simultaneously, and the actions and postures of the clones stay synchronized with those of the original. The composition effect is shown in Fig. 10: f1 to f4 are the pre-composition target video frames, and F1 to F4 are the composite frames corresponding to f1 to f4. 1001, 1002, 1003, and 1004 are the originals in the corresponding target video frames f1 to f4, and c1, c2, c3, and c4 are the clones corresponding to 1001, 1002, 1003, and 1004 respectively. It can be seen that in each composite frame the clones keep the same action and posture as the original.
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in the real environment and detecting or recognizing the relevant features, states, and attributes of the target object from the acquired image information with various vision-related algorithms, an AR effect combining the virtual and the real that matches a specific application is obtained. By way of example, the target object may involve faces, limbs, gestures, and actions related to a human body, or markers and signs related to an object, or sand tables, display areas, or display items related to a venue or a site. The vision-related algorithms may involve visual localization, SLAM, 3D reconstruction, image registration, background segmentation, key-point extraction and tracking of objects, pose or depth detection of objects, and so on. The specific applications may involve not only interactive scenarios such as guided tours, navigation, explanation, reconstruction, and overlaid display of virtual effects related to real scenes or items, but also interactive scenarios related to people, such as makeup beautification, body beautification, special-effect display, and virtual model display. Detection or recognition of the relevant features, states, and attributes of the target object can be implemented with a convolutional neural network, which is a network model obtained by model training based on a deep learning framework.
Those skilled in the art can understand that, in the above methods of the specific implementations, the written order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in Fig. 11, an embodiment of the present disclosure further provides a video processing apparatus, the apparatus including:
a copying module 1101, configured to copy a target object, when the target object is obtained from a target video frame, to obtain at least one rendering material;
a rendering module 1102, configured to render each rendering material into an intermediate cache frame;
a compositing module 1103, configured to composite the intermediate cache frame with the target video frame to obtain a composite frame, the composite frame including the rendering materials in the intermediate cache frame and the target object in the target video frame, the action of each rendering material being synchronized with the action of the target object.
In the embodiments of the present disclosure, the target object in the target video frame is copied and rendered into the intermediate cache frame, and the intermediate cache frame is then composited with the target video frame to obtain a composite frame, so that one "original" and a preset number of "clones" are displayed simultaneously in the composite frame, with the actions and postures of the clones synchronized with those of the original. The original refers to the target object that the target video frame contains before composition, and a clone refers to a rendering material contained in the intermediate cache frame. Compared with simply replacing the background region as in the related art, the video processing approach of the present disclosure improves the entertainment value of the special-effect rendering process and the diversity of the rendering results.
In some embodiments, the apparatus further includes a recognition module, configured to recognize a designated object in the target video frame, where the designated object includes an object satisfying a preset quantity condition and/or an object of a designated category; and a determination module, configured to determine the number of rendering materials based on the number of designated objects in the target video frame, and/or determine the rendering positions of the rendering materials in the intermediate cache frame based on the positions of the designated objects in the target video frame. This embodiment can set the number of rendering materials automatically based on the number of designated objects, and set the positions of the rendering materials automatically based on the positions of the designated objects. Since the number and/or positions of designated objects may differ across target video frames, the positions and/or number of clones in composite frames generated from different target video frames are not fixed, which increases the diversity of the generated composite frames.
In some embodiments, the copying module is configured to copy the target object based on the determined number of rendering materials to obtain that number of rendering materials. This embodiment can obtain a determined number of rendering materials and thus display a determined number of clones in the composite frame. The determined number may be input and stored by the user in advance and read directly when the target object needs to be copied, which improves rendering efficiency and suits scenarios with high real-time requirements.
In some embodiments, the rendering module is configured to render each rendering material to its corresponding rendering position in the intermediate cache frame based on the determined rendering positions of the rendering materials in the intermediate cache frame. This embodiment renders the rendering materials into the intermediate cache frame based on the rendering positions; when the rendering positions all differ, each rendering material can be rendered at a different position in the intermediate cache frame.
In some embodiments, the recognition module is configured to: obtain a first object in the target video frame whose distance from the target object is less than a first preset distance threshold; and determine a second object in the target video frame of the same category as the first object as the designated object. This embodiment places the clones in the composite frame around objects of the same category as the object near the original, thereby presenting in the composite frame the effect of multiple posture-synchronized target objects appearing repeatedly around objects of the same category.
In some embodiments, the rendering module is configured to: determine the target position of the rendering material in the target video frame based on the position of the designated object in the target video frame, and determine the rendering position of the rendering material in the intermediate cache frame based on the target position.
In some embodiments, the rendering module is configured to: determine a first relative positional relationship between the target object and the first object in the target video frame; determine a second relative positional relationship between the rendering material and the designated object based on the first relative positional relationship, the size of the first object, and the size of the designated object; and determine the target position based on the second relative positional relationship and the position of the designated object. Determining the second relative positional relationship between the rendering material and the designated object in this way makes the relationship between the clone and the designated object more harmonious.
In some embodiments, the apparatus further includes a reading module, configured to read the number of rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame from a pre-stored configuration file.
In some embodiments, the apparatus further includes: a receiving module, configured to receive a setting instruction carrying the number of rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame; and a writing module, configured to write, in response to the setting instruction, the number of rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame into the configuration file.
The positions and number of rendering materials may be written into the configuration file in advance and read directly from it when rendering is needed. This provides high read/write efficiency and thus improves the rendering efficiency of the rendering materials.
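Such a configuration file could, for example, be a small JSON document like the sketch below; the file name and keys are assumptions used only to illustrate the read/write round trip described above.

```python
import json

CONFIG_PATH = "clone_effect_config.json"    # hypothetical file name

def write_config(material_count, render_positions):
    # Persist the number of rendering materials and their rendering positions
    # in response to a setting instruction.
    with open(CONFIG_PATH, "w") as f:
        json.dump({"material_count": material_count,
                   "render_positions": render_positions}, f)

def read_config():
    # Read the stored values back when rendering starts.
    with open(CONFIG_PATH) as f:
        cfg = json.load(f)
    return cfg["material_count"], cfg["render_positions"]
```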
In some embodiments, the rendering module is configured to: each time a rendering material is rendered into the intermediate cache frame, determine whether the number of rendering materials already rendered into the intermediate cache frame has reached the total number of rendering materials; and if not, return to the step of rendering each rendering material into the intermediate cache frame.
In some embodiments, the number of rendering materials is greater than 1, and the rendering materials are rendered to different rendering positions in the intermediate cache frame; the rendering module is configured to: obtain a position sequence containing multiple pieces of position information; and render the rendering materials one by one into the intermediate cache frame at the positions corresponding to the pieces of position information, in the order of those pieces in the position sequence.
In some embodiments, the rendering module is configured to: preprocess each rendering material based on attribute information of a third object to obtain preprocessed rendering materials, where the distance between the third object and the target position of the rendering material in the target video frame is less than a second preset distance threshold; and render each preprocessed rendering material into the intermediate cache frame.
In some embodiments, the rendering module includes: a position determination unit, configured to determine the target position of the rendering material in the target video frame; an object determination unit, configured to determine a third object whose distance from the target position is less than a second preset distance threshold; and a preprocessing unit, configured to preprocess the rendering material based on attribute information of the third object.
In some embodiments, the preprocessing unit is configured to: scale the rendering material based on the size of the third object; and/or flip the rendering material based on the direction of the third object; and/or rotate the rendering material based on the angle of the third object; and/or perform color processing on the rendering material based on the color of the third object. By applying different preprocessing, this embodiment allows the clones to present different display effects, making the display of clones more diverse.
In some embodiments, the apparatus further includes an associated-object rendering module, configured to render an associated object for the rendering material and display the associated object in the composite frame. Rendering associated objects allows the composited video frame (also called the composite frame herein) to display richer special effects, further improving the diversity and entertainment value of the video processing results.
In some embodiments, the associated-object rendering module is configured to: render the associated object for the rendering material based on subtitle information in the target video frame; or render the associated object for the rendering material based on the action category of the target object; or render the associated object for the rendering material randomly.
In some embodiments, the compositing module is configured to: draw the target object in the target video frame, the rendering materials in the intermediate cache frame, and the background region of the target video frame onto different transparent layers respectively, and composite the drawn transparent layers to obtain the composite frame, where different transparent layers have different display priorities, and pixels on a transparent layer with a higher display priority cover pixels on a transparent layer with a lower display priority. By setting different display priorities for the transparent layers, this embodiment allows the clones, the original, and the background region in the composite frame to present different coverage effects: for example, the clones cover the original and the original covers the background region, or the original covers the clones and the clones cover the background region.
In some embodiments, the apparatus further includes: a detection module, configured to perform object detection on the target video frame to obtain a detection result; and a target-object obtaining module, configured to obtain the target object from the target video frame based on the detection result. This embodiment separates the foreground (the region where the target object is located) from the background region of the target video frame via object detection, without a green screen, which improves the accuracy of background segmentation and makes it convenient for users to process video anytime and anywhere with terminal devices such as mobile phones.
In some embodiments, the target-object obtaining module is configured to: perform background segmentation on the target video frame based on the detection result to obtain a mask of the target object in the target video frame; apply masking to the target video frame based on the mask of the target object; and segment the target object from the target video frame based on the masking result.
In some embodiments, the functions of or modules included in the apparatus provided by the embodiments of the present disclosure may be used to perform the methods described in the foregoing method embodiments; for their specific implementation, reference may be made to the description of the foregoing method embodiments, which is not repeated here for brevity.
An embodiment of this specification further provides a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of any of the foregoing embodiments when executing the program.
Fig. 12 is a schematic diagram of a more specific hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 1201, a memory 1202, an input/output interface 1203, a communication interface 1204, and a bus 1205. The processor 1201, the memory 1202, the input/output interface 1203, and the communication interface 1204 are communicatively connected to one another within the device via the bus 1205.
The processor 1201 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of this specification. The processor 1201 may also include a graphics card, which may be, for example, an Nvidia Titan X card or a 1080Ti card.
The memory 1202 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1202 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1202 and invoked and executed by the processor 1201.
The input/output interface 1203 is used to connect input/output modules for information input and output. The input/output modules may be configured as components within the device (not shown) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touch screens, microphones, and various sensors, and output devices may include displays, speakers, vibrators, and indicator lights.
The communication interface 1204 is used to connect a communication module (not shown) for communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or network cable) or in a wireless manner (for example, mobile network, WiFi, or Bluetooth).
The bus 1205 includes a path that transfers information between the components of the device (for example, the processor 1201, the memory 1202, the input/output interface 1203, and the communication interface 1204).
It should be noted that although only the processor 1201, the memory 1202, the input/output interface 1203, the communication interface 1204, and the bus 1205 are shown for the above device, in specific implementation the device may further include other components necessary for normal operation. In addition, those skilled in the art can understand that the above device may also include only the components necessary for implementing the solutions of the embodiments of this specification, without necessarily including all the components shown in the figure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this specification or in certain parts thereof.
The systems, apparatuses, modules, or units described in the above embodiments may be specifically implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purposes of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be pointed out that those of ordinary skill in the art can also make several improvements and modifications without departing from the principles of the embodiments of this specification, and these improvements and modifications should also be regarded as falling within the protection scope of the present disclosure.

Claims (15)

  1. A video processing method, characterized in that the method comprises:
    when a target object is obtained from a target video frame, copying the target object to obtain at least one rendering material;
    rendering each of the rendering materials into an intermediate cache frame;
    compositing the intermediate cache frame with the target video frame to obtain a composite frame, wherein the composite frame comprises the rendering materials and the target object, and an action of each of the rendering materials is synchronized with an action of the target object.
  2. The method according to claim 1, characterized in that the copying the target object to obtain at least one rendering material comprises:
    copying the target object based on a determined number of rendering materials to obtain the determined number of the rendering materials;
    and the rendering each of the rendering materials into an intermediate cache frame comprises:
    rendering, based on determined rendering positions of the rendering materials in the intermediate cache frame, each of the rendering materials to a position in the intermediate cache frame corresponding to its rendering position.
  3. The method according to claim 1, characterized in that the method further comprises at least one of the following:
    reading the number of the rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame from a pre-stored configuration file;
    in response to a received setting instruction, writing the number of the rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame into the configuration file, wherein the setting instruction comprises the number of the rendering materials and/or the rendering positions of the rendering materials in the intermediate cache frame;
    determining the number of the rendering materials based on the number of designated objects in the target video frame, wherein the designated objects comprise objects in the target video frame that satisfy a preset quantity condition and/or objects of a designated category in the target video frame; and determining the rendering positions of the rendering materials in the intermediate cache frame based on the positions of the designated objects in the target video frame.
  4. The method according to claim 3, characterized in that the designated object in the target video frame is identified by:
    obtaining a first object in the target video frame whose distance from the target object is less than a first preset distance threshold;
    determining a second object in the target video frame of the same category as the first object as the designated object.
  5. The method according to claim 3, characterized in that the determining the rendering positions of the rendering materials in the intermediate cache frame based on the positions of the designated objects in the target video frame comprises:
    determining a target position of the rendering material in the target video frame based on the position of the designated object in the target video frame, and determining the rendering position of the rendering material in the intermediate cache frame based on the target position.
  6. The method according to claim 5, characterized in that the determining a target position of the rendering material in the target video frame based on the position of the designated object in the target video frame comprises:
    determining a first relative positional relationship between the target object and the designated object in the target video frame;
    determining a second relative positional relationship between the rendering material and the designated object based on the first relative positional relationship, a size of the first object, and a size of the designated object;
    determining the target position based on the second relative positional relationship and the position of the designated object.
  7. The method according to any one of claims 1 to 6, characterized in that the rendering each of the rendering materials into an intermediate cache frame comprises:
    obtaining a position sequence, the position sequence comprising multiple pieces of position information;
    rendering, in the order of the pieces of position information in the position sequence, the rendering materials one by one into the intermediate cache frame at positions corresponding to the pieces of position information.
  8. The method according to any one of claims 1 to 6, characterized in that the rendering each of the rendering materials into an intermediate cache frame comprises:
    preprocessing each of the rendering materials based on attribute information of a third object to obtain preprocessed rendering materials, wherein a distance between the third object and a target position of the rendering material in the target video frame is less than a second preset distance threshold;
    rendering each of the preprocessed rendering materials into the intermediate cache frame.
  9. The method according to claim 8, characterized in that the preprocessing each of the rendering materials based on attribute information of a third object comprises at least one of the following:
    scaling the rendering material based on a size of the third object;
    flipping the rendering material based on a direction of the third object;
    rotating the rendering material based on an angle of the third object;
    performing color processing on the rendering material based on a color of the third object.
  10. The method according to any one of claims 1 to 9, characterized in that the method further comprises:
    rendering an associated object for the rendering material, and
    displaying the associated object in the composite frame.
  11. The method according to claim 10, characterized in that the rendering an associated object for the rendering material comprises at least one of the following:
    rendering the associated object for the rendering material based on subtitle information in the target video frame;
    rendering the associated object for the rendering material based on an action category of the target object; or
    rendering the associated object for the rendering material randomly.
  12. The method according to any one of claims 1 to 11, characterized in that the compositing the intermediate cache frame with the target video frame to obtain a composite frame comprises:
    drawing the target object in the target video frame, the rendering materials in the intermediate cache frame, and a background region of the target video frame onto different transparent layers respectively, and compositing the drawn transparent layers to obtain the composite frame;
    wherein the different transparent layers have different display priorities, and pixels on a transparent layer with a higher display priority cover pixels on a transparent layer with a lower display priority.
  13. A video processing apparatus, characterized in that the apparatus comprises:
    a copying module, configured to copy a target object, when the target object is obtained from a target video frame, to obtain at least one rendering material;
    a rendering module, configured to render each of the rendering materials into an intermediate cache frame;
    a compositing module, configured to composite the intermediate cache frame with the target video frame to obtain a composite frame, the composite frame comprising the rendering materials and the target object, an action of each of the rendering materials being synchronized with an action of the target object.
  14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 12.
  15. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 12 when executing the program.
PCT/CN2022/117420 2021-09-07 2022-09-07 Video processing method and apparatus, computer-readable storage medium, and computer device WO2023036160A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111044847.6A CN113490050B (zh) 2021-09-07 2021-09-07 Video processing method and apparatus, computer-readable storage medium, and computer device
CN202111044847.6 2021-09-07

Publications (1)

Publication Number Publication Date
WO2023036160A1 true WO2023036160A1 (zh) 2023-03-16

Family

ID=77946519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117420 WO2023036160A1 (zh) 2021-09-07 2022-09-07 Video processing method and apparatus, computer-readable storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN113490050B (zh)
WO (1) WO2023036160A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113841112A (zh) * 2020-08-06 2021-12-24 深圳市大疆创新科技有限公司 Image processing method, camera, and mobile terminal
CN113490050B (zh) * 2021-09-07 2021-12-17 北京市商汤科技开发有限公司 Video processing method and apparatus, computer-readable storage medium, and computer device
CN114202617A (zh) * 2021-12-13 2022-03-18 北京字跳网络技术有限公司 Video image processing method and apparatus, electronic device, and storage medium
CN114302229B (zh) * 2021-12-30 2024-04-12 重庆杰夫与友文化创意有限公司 Method, system, and storage medium for converting scene material into video
CN114881901A (zh) * 2022-04-29 2022-08-09 北京字跳网络技术有限公司 Video synthesis method, apparatus, device, medium, and product
CN115190362B (zh) * 2022-09-08 2022-12-27 北京达佳互联信息技术有限公司 Data processing method and apparatus, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170094193A1 (en) * 2015-09-28 2017-03-30 Gopro, Inc. Automatic composition of video with dynamic background and composite frames selected based on foreground object criteria
CN108737852A (zh) * 2018-04-26 2018-11-02 深圳天珑无线科技有限公司 Video processing method, terminal, and apparatus with storage function
CN110012352A (zh) * 2019-04-17 2019-07-12 广州华多网络科技有限公司 Image special-effect processing method and apparatus, and live video streaming terminal
CN110290425A (zh) * 2019-07-29 2019-09-27 腾讯科技(深圳)有限公司 Video processing method, apparatus, and storage medium
CN111832539A (zh) * 2020-07-28 2020-10-27 北京小米松果电子有限公司 Video processing method and apparatus, and storage medium
CN112218108A (zh) * 2020-09-18 2021-01-12 广州虎牙科技有限公司 Live-streaming rendering method and apparatus, electronic device, and storage medium
CN113490050A (zh) * 2021-09-07 2021-10-08 北京市商汤科技开发有限公司 Video processing method and apparatus, computer-readable storage medium, and computer device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011176748A (ja) * 2010-02-25 2011-09-08 Sony Corp Image processing apparatus and method, and program
KR101960305B1 (ko) * 2012-07-04 2019-03-20 엘지전자 주식회사 Display device including a touch screen and control method therefor
CN103997687B (zh) * 2013-02-20 2017-07-28 英特尔公司 Method and apparatus for adding interactive features to video
CN106817538A (zh) * 2016-12-11 2017-06-09 乐视控股(北京)有限公司 Electronic device, and picture capturing method and apparatus
CN107295265A (zh) * 2017-08-01 2017-10-24 珠海市魅族科技有限公司 Photographing method and apparatus, computer apparatus, and computer-readable storage medium
CN110035321B (zh) * 2019-04-11 2022-02-11 北京大生在线科技有限公司 Decoration method and system for online real-time video
CN112150580A (zh) * 2019-06-28 2020-12-29 腾讯科技(深圳)有限公司 Image processing method and apparatus, intelligent terminal, and storage medium

Also Published As

Publication number Publication date
CN113490050A (zh) 2021-10-08
CN113490050B (zh) 2021-12-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866623

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22866623

Country of ref document: EP

Kind code of ref document: A1