CN111862275A - Video editing method, device and equipment based on 3D reconstruction technology

Video editing method, device and equipment based on 3D reconstruction technology

Info

Publication number: CN111862275A
Application number: CN202010725481.8A
Authority: CN (China)
Prior art keywords: video, model, edited, frame, image
Other languages: Chinese (zh)
Other versions: CN111862275B (granted)
Inventors: 吴善思源, 龚秋棠, 吴方灿, 林奇
Current Assignee: Xiamen Zhenjing Technology Co ltd
Original Assignee: Xiamen Zhenjing Technology Co ltd
Application filed 2020-07-24 by Xiamen Zhenjing Technology Co ltd
Priority to CN202010725481.8A
Legal status: Granted; Active

Classifications

    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06N 3/08 - Computing arrangements based on biological models; neural networks; learning methods
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V 20/40 - Scenes; scene-specific elements in video content
    • G06T 2219/2016 - Indexing scheme for editing of 3D models: rotation, translation, scaling
    • G06V 2201/07 - Indexing scheme relating to image or video recognition or understanding: target detection

Abstract

The invention discloses a video editing method based on 3D reconstruction technology, which comprises the following steps: acquiring a video to be edited; detecting identifiable objects in each frame of the video to be edited; reconstructing a first 3D model corresponding to each of the objects using a neural network; selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model; and performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited. With the scheme provided by the invention, an object edited in a single frame of a video is automatically updated on the same object across all video frames, which improves the user's video-editing efficiency and experience.

Description

Video editing method, device and equipment based on 3D reconstruction technology
Technical Field
The present invention relates to the field of video processing technologies, and in particular to a video editing method, apparatus and device based on 3D reconstruction technology.
Background
With the development of 5G and short-video applications, users are gradually moving from editing pictures to editing videos. Video editing software at the present stage mostly operates on the whole video timeline, for example deleting useless segments or adding music. If a user wants to edit a particular object in a video, such as changing the color of a piece of furniture or modifying the pattern on a person's clothes, the object has to be modified frame by frame; a 5-minute video means roughly 7200 frame images to edit, an enormous workload. There is no way to edit an object once and have the edit synchronized to the subsequent video frames, so the user's video-editing experience is poor.
Disclosure of Invention
In view of this, the present invention provides a video editing method, apparatus and device based on 3D reconstruction technology, in which an object edited in a single frame of a video is automatically updated on the same object across all video frames, improving the user's video-editing efficiency and experience.
In order to achieve the above object, the present invention provides a video editing method based on 3D reconstruction technology, the method comprising:
acquiring a video to be edited;
detecting identifiable objects in each frame of the video to be edited;
reconstructing a first 3D model corresponding to each of the objects using a neural network;
selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model;
and performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited.
Preferably, detecting identifiable objects in each frame of the video to be edited includes:
detecting identifiable objects in each frame of the video to be edited using a generic object detection technique.
Preferably, reconstructing the first 3D model corresponding to each of the objects using a neural network includes:
reconstructing, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
Preferably, performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited includes:
cropping the object out of each frame image according to its coordinates, and inputting it into the second 3D model;
outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
In order to achieve the above object, the present invention further provides a video editing apparatus based on 3D reconstruction technology, the apparatus comprising:
an acquisition unit for acquiring a video to be edited;
a detection unit for detecting identifiable objects in each frame of the video to be edited;
a reconstruction unit for reconstructing a first 3D model corresponding to each of the objects using a neural network;
an editing unit for selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model;
and a rendering unit for performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited.
Preferably, the detection unit is further configured to:
detect identifiable objects in each frame of the video to be edited using a generic object detection technique.
Preferably, the editing unit is further configured to:
reconstruct, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
Preferably, the rendering unit further includes:
an input unit for cropping the object out of each frame image according to its coordinates and inputting it into the second 3D model;
an output unit for outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and a driving unit for driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
In order to achieve the above object, the present invention further proposes a video editing device based on 3D reconstruction technology, comprising a processor, a memory, and a computer program stored in the memory, wherein the computer program, when executed by the processor, implements the video editing method based on 3D reconstruction technology according to any one of the above.
In order to achieve the above object, the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to perform the video editing method based on 3D reconstruction technology according to any one of the above.
It can be seen from the above scheme that a video to be edited is acquired; identifiable objects are detected in each frame of the video; a first 3D model corresponding to each object is reconstructed using a neural network; the current frame of the object is selected, the selected object is edited, and the edited content is applied to the first 3D model to generate a second 3D model; real-time pose estimation of the object is then performed for each frame image based on the second 3D model, the second 3D model is driven according to the estimated pose to generate a replacement image, and the replacement image is rendered onto all frames of the same object in the video. An object edited in a single frame of the video is thus automatically updated on the same object across all video frames, which improves the user's video-editing efficiency and experience.
Furthermore, the above scheme detects the identifiable objects in each frame of the video to be edited using a generic object detection technique, which has the advantage of accurately recognizing multiple objects in the video across many object categories.
Furthermore, the above scheme uses an auto-encoder to reconstruct the first 3D model corresponding to each object as a composition of the object's voxels, so that an object edited on a single frame is automatically updated across the whole video, removing the need for frame-by-frame editing.
Furthermore, the above scheme crops the object out of each frame image according to its coordinates and inputs it into the second 3D model; outputs the coordinates of the object in each frame image and its three-dimensional pose parameters; and drives the second 3D model, according to the coordinates and pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object and replacing the pixels in those frames to complete the rendering. An object edited in a single frame of the video is thus automatically updated across all video frames, improving the user's video-editing efficiency and experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a video editing method based on 3D reconstruction technology according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a video editing apparatus based on 3D reconstruction technology according to another embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and embodiments. Note that the following embodiments are only illustrative of the present invention and do not limit its scope. Likewise, the following embodiments are only some, not all, embodiments of the present invention; all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
The invention provides a video editing method based on 3D reconstruction technology, in which an object edited in a single frame of a video is automatically updated on the same object across all video frames, improving the user's video-editing efficiency and experience.
Fig. 1 is a schematic flowchart of the video editing method based on 3D reconstruction technology according to an embodiment of the present invention. The method comprises the following steps:
S1, acquiring the video to be edited.
S2, detecting identifiable objects in each frame of the video to be edited.
Here, detecting identifiable objects in each frame of the video to be edited comprises: detecting identifiable objects in each frame of the video to be edited using a generic object detection technique.
In this embodiment, the entire video is traversed and the identifiable objects appearing in it are found using a generic object detection technique; these objects include items, people, animals and the like that the user can select and edit.
In generic object detection, a neural network model is trained on a large amount of labeled data; given an image, it can then detect the objects the image contains, for example cats, dogs, people, beds and quilts, together with the bounding boxes where those objects are located in the image.
Since a video is essentially a sequence of frame images (one second of video generally contains 30 frames), detecting identifiable objects in a video means feeding each frame image into the generic object detection network, which returns the objects contained in that frame. All per-frame detection results are then aggregated, the n objects (for example, 5) with the highest occurrence frequency are selected as the video-level detection result, and the positions of these objects in the video are recorded.
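By way of illustration only, the following Python sketch shows one way this per-frame detection and aggregation loop could look. The detect_objects function is a hypothetical stand-in for whatever generic object detector is used; only the shape of its output (a label and a bounding box per detection) is assumed here, and any off-the-shelf detector producing such pairs could be substituted without changing the aggregation logic.

```python
from collections import Counter

import cv2  # OpenCV, used here only to decode the video frame by frame


def detect_objects(frame):
    """Hypothetical generic object detector: returns a list of
    (label, bounding_box) pairs for one frame image."""
    raise NotImplementedError


def detect_video_objects(video_path, top_n=5):
    """Run detection on every frame, then keep the top_n labels by
    occurrence frequency and record where each one appears."""
    counts = Counter()
    positions = {}  # label -> list of (frame_index, bounding_box)

    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for label, box in detect_objects(frame):
            counts[label] += 1
            positions.setdefault(label, []).append((frame_idx, box))
        frame_idx += 1
    cap.release()

    # Aggregate all per-frame results: the most frequent labels are
    # taken as the video-level detection result.
    keep = [label for label, _ in counts.most_common(top_n)]
    return {label: positions[label] for label in keep}
```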
S3, reconstructing a first 3D model corresponding to each of the objects using the neural network.
In this embodiment, depending on the time and accuracy requirements of the actual application scenario, when the user selects an object, a neural network reconstructs the corresponding 3D model from either a single frame or multiple frames. When the time requirement is strict, a single frame can be used; when the accuracy requirement is strict, multiple frames can be used.
Here, reconstructing the first 3D model corresponding to each of the objects using a neural network comprises: reconstructing, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
Specifically, a self-encoding network (auto-encoder) takes an image as input and outputs the reconstructed 3D model composed of the object's voxels. The input image may be an object detected by the generic object detection step, cropped out of the frame according to the detected position.
Furthermore, to trade off time against accuracy, 3D reconstruction offers two modes. The first is fast: only one image is input. The second is high-precision: n frame images from the video (for example, 5) are each passed through the same network as in the first mode, producing n 3D models whose voxel values are averaged position by position to obtain the final high-precision 3D model.
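As a concrete illustration of the kind of image-to-voxel auto-encoder and the two reconstruction modes described above, here is a minimal PyTorch sketch. The layer sizes, the 64x64 input crop and the 32x32x32 voxel grid are assumptions chosen for the example, not values given by the patent.

```python
import torch
import torch.nn as nn


class ImageToVoxelAE(nn.Module):
    """Image-to-voxel auto-encoder sketch: a 2D convolutional encoder
    compresses the cropped object image to a latent code, and a 3D
    transposed-convolutional decoder expands the code into a voxel
    occupancy grid. All sizes here are illustrative."""

    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 4 * 4 * 4),
            nn.Unflatten(1, (128, 4, 4, 4)),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),  # 4 -> 8
            nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1),   # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),    # 16 -> 32
            nn.Sigmoid(),  # per-voxel occupancy in [0, 1]
        )

    def forward(self, image):  # image: (B, 3, 64, 64)
        return self.decoder(self.encoder(image))  # (B, 1, 32, 32, 32)


def reconstruct(model, crops):
    """Fast mode: one crop. High-precision mode: n crops of the same
    object, reconstructed independently and averaged voxel by voxel."""
    with torch.no_grad():
        voxel_grids = [model(c.unsqueeze(0)) for c in crops]
    return torch.mean(torch.stack(voxel_grids), dim=0)
```

In the fast mode, crops contains a single image; in the high-precision mode it contains n crops of the same object taken from different frames.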
S4, selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model.
In this embodiment, when the user edits the object in the image, for example changing its color or shape, the change is recorded on the reconstructed 3D model, yielding the modified 3D model (the second 3D model).
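One plausible way to record such an edit is sketched below, assuming the voxel grid carries a per-voxel RGB attribute alongside occupancy; that attribute, and the names used here, are illustrative assumptions rather than something the patent specifies.

```python
import torch


def apply_color_edit(voxel_colors, mask, new_rgb):
    """Record a user's recoloring edit on the reconstructed model.

    voxel_colors: (3, D, H, W) tensor, RGB attribute per voxel.
    mask: boolean (D, H, W) tensor selecting the voxels the user edited.
    new_rgb: three floats in [0, 1].
    Returns the color attribute of the second (edited) 3D model."""
    edited = voxel_colors.clone()
    for channel, value in enumerate(new_rgb):
        edited[channel][mask] = value
    return edited
```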
And S5, performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited.
Specifically, this step includes:
S5-1, cropping the object out of each frame image according to its coordinates, and inputting it into the second 3D model;
S5-2, outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and S5-3, driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
In this embodiment, a neural network model is trained for each object; its input is an image of the object, and its output is the coordinates (x, y) of the object's center in the image together with the object's three-dimensional pose (the three rotation angles yaw, pitch and roll).
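A minimal sketch of such a 5-parameter pose regressor follows; only the output convention (x, y, yaw, pitch, roll) comes from the description, while the backbone layers are illustrative assumptions.

```python
import torch.nn as nn


class PoseEstimator(nn.Module):
    """Per-object pose regressor sketch: from a cropped object image it
    predicts the 5 parameters named in the description, i.e. the centre
    coordinates (x, y) and the rotation angles (yaw, pitch, roll)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling: any crop size works
            nn.Flatten(),
        )
        self.head = nn.Linear(64, 5)  # -> (x, y, yaw, pitch, roll)

    def forward(self, crop):  # crop: (B, 3, H, W)
        return self.head(self.backbone(crop))
```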
According to the object selected by the user, the 3D model of the corresponding object is called up. For each frame of the video, the object is cropped out of the image using the bounding box and coordinates produced by the generic object detection step, the crop is fed to the pose-estimation network, and the network outputs the 5 pose parameters x, y, yaw, pitch and roll for subsequent use.
Using the 3D model and the 5 output pose parameters, the 3D model is driven to rotate and translate to the position where the object appears in the corresponding frame image; the user's edits to the 3D model are projected directly onto the 2-dimensional image, the pixel points in the frame image are replaced, and the rendering is finished.
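The driving-and-projection step might look like the following sketch, under stated assumptions: an orthographic projection, a Z-Y-X Euler-angle convention, and a point-cloud view of the edited model's occupied voxels. None of these specifics are fixed by the patent; any convention consistent with the pose network would serve.

```python
import numpy as np


def euler_to_matrix(yaw, pitch, roll):
    """Compose a rotation matrix from the three pose angles (radians),
    using a Z-Y-X intrinsic convention (an assumption of this sketch)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return rz @ ry @ rx


def render_edit(frame, voxel_points, voxel_rgb, pose, scale=1.0):
    """Drive the edited model into one frame: rotate the occupied voxel
    centres with the estimated pose, translate them to the detected
    centre (x, y), project orthographically onto the image plane, and
    overwrite the pixels they land on. A real renderer would also handle
    depth ordering and interpolation; this keeps only the core idea.

    frame: (H, W, 3) uint8 image; voxel_points: (N, 3) float centres;
    voxel_rgb: (N, 3) uint8 colors; pose: (x, y, yaw, pitch, roll)."""
    x, y, yaw, pitch, roll = pose
    rotated = voxel_points @ euler_to_matrix(yaw, pitch, roll).T
    cols = (rotated[:, 0] * scale + x).astype(int)
    rows = (rotated[:, 1] * scale + y).astype(int)
    h, w = frame.shape[:2]
    ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    frame[rows[ok], cols[ok]] = voxel_rgb[ok]  # replace the pixel points
    return frame
```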
For example, in a showcase video of a home environment, the user selects a quilt detected by the generic object detection step; the neural network reconstructs a 3D model of the quilt; the user changes the color of the quilt on the bed with a color picker; and once the edit is confirmed, the quilt's color is modified throughout the video.
As another example, in a piece of self-shot video, the people and clothes in the scene are detected by the generic object detection step; the user selects a person's clothes; the neural network reconstructs a 3D model of the clothes; the user edits the pattern of the clothes; and once the edit is confirmed, the clothes' pattern is modified throughout the video.
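Tying the steps together, the sketch below shows how the illustrative pieces above could compose into the edit-once, propagate-everywhere workflow of these examples. Every helper it calls (detect_video_objects, render_edit, the preprocess callback, and the pose model) is one of the hypothetical names defined in the earlier sketches, not a real API.

```python
import cv2


def edit_object_in_video(video_path, output_path, label, pose_model,
                         voxel_points, edited_rgb, preprocess):
    """Propagate a single edit through the whole video. voxel_points and
    edited_rgb describe the already-edited (second) 3D model; preprocess
    is a hypothetical callback turning a BGR crop into the pose model's
    input tensor."""
    by_frame = dict(detect_video_objects(video_path)[label])

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(output_path,
                          cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        box = by_frame.get(idx)
        if box is not None:
            x0, y0, x1, y1 = box
            crop = preprocess(frame[y0:y1, x0:x1])
            pose = pose_model(crop)[0].tolist()  # (x, y, yaw, pitch, roll)
            frame = render_edit(frame, voxel_points, edited_rgb, pose)
        out.write(frame)
        idx += 1
    cap.release()
    out.release()
```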
Fig. 2 is a schematic structural diagram of the video editing apparatus based on 3D reconstruction technology according to another embodiment of the present invention. The apparatus 10 comprises:
an acquisition unit 11, configured to acquire a video to be edited;
a detection unit 12, configured to detect identifiable objects in each frame of the video to be edited;
a reconstruction unit 13, configured to reconstruct a first 3D model corresponding to each of the objects using a neural network;
an editing unit 14, configured to select the current frame of the object in the video to be edited, edit the selected object, apply the edited content to the first 3D model, and generate a second 3D model;
and a rendering unit 15, configured to perform real-time pose estimation of the object in each frame image based on the second 3D model, drive the second 3D model according to the estimated pose to generate a replacement image, and render the replacement image onto all frames of the same object in the video to be edited.
Optionally, the detection unit 12 is further configured to:
detect identifiable objects in each frame of the video to be edited using a generic object detection technique.
Optionally, the editing unit 14 is further configured to:
reconstruct, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
Optionally, the rendering unit 15 further includes:
an input unit (not labeled in the figure) for cropping the object out of each frame image according to its coordinates and inputting it into the second 3D model;
an output unit (not labeled in the figure) for outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and a driving unit (not labeled in the figure) for driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
The functions and operation steps implemented by each unit of the above video editing apparatus based on 3D reconstruction technology are substantially the same as those of the foregoing method embodiments and are not repeated here.
An embodiment of the present invention further provides a video editing device based on 3D reconstruction technology, which includes a processor, a memory, and a computer program stored in the memory, where the computer program is executable by the processor to implement the video editing method based on 3D reconstruction technology described in the foregoing embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program; when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the video editing method based on 3D reconstruction technology described in the foregoing embodiments.
Illustratively, the computer program may be divided into one or more units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program in the video editing device based on 3D reconstruction technology.
The video editing device based on 3D reconstruction technology may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the schematic diagram is merely an example of a video editing device based on 3D reconstruction technology and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine certain components, or use different components. For example, the video editing device based on 3D reconstruction technology may further include input and output devices, network access devices, buses, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the video editing device based on 3D reconstruction technology, connecting the various parts of the whole device by means of various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the video editing device based on 3D reconstruction technology by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the units integrated in the video editing device based on 3D reconstruction technology are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be accomplished by a computer program: the computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of the method embodiments above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.
The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. Note that what a computer-readable medium may contain can be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunications signals.
It should be noted that the above device embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, the connection relationships between modules indicate communication connections between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above embodiments may be further combined or substituted. They only describe preferred embodiments of the present invention and do not limit its concept and scope; various changes and modifications made to the technical solution of the present invention by those skilled in the art without departing from its design concept fall within the protection scope of the present invention.

Claims (10)

1. A video editing method based on 3D reconstruction technology, the method comprising:
acquiring a video to be edited;
detecting identifiable objects in each frame of the video to be edited;
reconstructing a first 3D model corresponding to each of the objects using a neural network;
selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model;
and performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited.
2. The video editing method based on 3D reconstruction technology according to claim 1, wherein detecting identifiable objects in each frame of the video to be edited includes:
detecting identifiable objects in each frame of the video to be edited using a generic object detection technique.
3. The video editing method based on 3D reconstruction technology according to claim 1, wherein reconstructing the first 3D model corresponding to each of the objects using a neural network comprises:
reconstructing, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
4. The video editing method based on 3D reconstruction technology according to claim 1, wherein performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited includes:
cropping the object out of each frame image according to its coordinates, and inputting it into the second 3D model;
outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
5. A video editing apparatus based on 3D reconstruction technology, the apparatus comprising:
an acquisition unit for acquiring a video to be edited;
a detection unit for detecting identifiable objects in each frame of the video to be edited;
a reconstruction unit for reconstructing a first 3D model corresponding to each of the objects using a neural network;
an editing unit for selecting the current frame of the object in the video to be edited, editing the selected object, applying the edited content to the first 3D model, and generating a second 3D model;
and a rendering unit for performing real-time pose estimation of the object in each frame image based on the second 3D model, driving the second 3D model according to the estimated pose to generate a replacement image, and rendering the replacement image onto all frames of the same object in the video to be edited.
6. The video editing apparatus based on 3D reconstruction technology according to claim 5, wherein the detection unit is further configured to:
detect identifiable objects in each frame of the video to be edited using a generic object detection technique.
7. The video editing apparatus based on 3D reconstruction technology according to claim 5, wherein the editing unit is further configured to:
reconstruct, with an auto-encoder, the first 3D model corresponding to each object as a composition of the object's voxels.
8. The video editing apparatus based on 3D reconstruction technology according to claim 5, wherein the rendering unit further includes:
an input unit for cropping the object out of each frame image according to its coordinates and inputting it into the second 3D model;
an output unit for outputting the coordinates of the object in each frame image and the three-dimensional pose parameters of the object;
and a driving unit for driving the second 3D model, according to the coordinates and the three-dimensional pose parameters, to rotate and translate to the position where the object appears in each corresponding frame image, projecting the edited content onto all frames of the same object, and replacing the pixels in those frames to complete the rendering.
9. A video editing device based on 3D reconstruction technology, comprising a processor, a memory, and a computer program stored in the memory, the computer program being executable by the processor to implement the video editing method based on 3D reconstruction technology according to any one of claims 1 to 4.
10. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to perform the video editing method based on 3D reconstruction technology according to any one of claims 1 to 4.
Application CN202010725481.8A, priority date 2020-07-24, filing date 2020-07-24: Video editing method, device and equipment based on 3D reconstruction technology. Status: Active. Granted as CN111862275B.

Priority Applications (1)

Application Number: CN202010725481.8A
Priority Date / Filing Date: 2020-07-24
Title: Video editing method, device and equipment based on 3D reconstruction technology

Publications (2)

CN111862275A, published 2020-10-30 (application)
CN111862275B, published 2023-06-06 (grant)

Family

ID: 72950754

Family Applications (1)

CN202010725481.8A (granted as CN111862275B): Video editing method, device and equipment based on 3D reconstruction technology; priority date 2020-07-24, filing date 2020-07-24

Country Status (1)

CN: CN111862275B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party

US9736449B1 * (priority 2013-08-12, published 2017-08-15), Google Inc.: Conversion of 2D image to 3D video
CN106254941A * (priority 2016-10-10, published 2016-12-21), 乐视控股(北京)有限公司: Video processing method and device
CN107067429A * (priority 2017-03-17, published 2017-08-18), 徐迪: Video editing system and method for deep-learning-based face three-dimensional reconstruction and face replacement
CN108765529A * (priority 2018-05-04, published 2018-11-06), 北京比特智学科技有限公司: Video generation method and device
CN110475157A * (priority 2019-07-19, published 2019-11-19), 平安科技(深圳)有限公司: Multimedia information display method and device, computer equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party

CN112270736A * (published 2021-01-26) / CN112270736B (granted 2024-03-01), Oppo广东移动通信有限公司: Augmented reality processing method and device, storage medium and electronic equipment
CN112767534A * (published 2021-05-07) / CN112767534B (granted 2024-02-09), 北京达佳互联信息技术有限公司: Video image processing method and device, electronic equipment and storage medium
CN113518187A * (published 2021-10-19) / CN113518187B (granted 2024-01-09), 北京达佳互联信息技术有限公司: Video editing method and device

Also Published As

CN111862275B, published 2023-06-06


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant