CN114979785A - Video processing method and related device - Google Patents

Video processing method and related device

Info

Publication number
CN114979785A
Authority
CN
China
Prior art keywords
target
radiation field
video
mirror
field network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210396606.6A
Other languages
Chinese (zh)
Other versions
CN114979785B (en)
Inventor
李宇
王龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202210396606.6A
Publication of CN114979785A
Application granted
Publication of CN114979785B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440245 Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of this application provide a video processing method and a related apparatus, relating to the field of terminal technologies. The method includes: a terminal device displays a video to be processed; the terminal device obtains indication information indicating a mirror-moving track for processing the video to be processed; based on the indication information, the terminal device obtains key frames of the video to be processed and pose information corresponding to the key frames; the terminal device trains a target neural radiance field (NeRF) network on the key frames and their corresponding pose information; the terminal device inputs a target mirror-moving track into the target neural radiance field network to obtain a target image sequence, where the target mirror-moving track is carried in the indication information or is user-defined; and the terminal device renders the target image sequence to obtain a video that follows the target mirror-moving track. In this way, the terminal device can process the video to be processed into a video whose target mirror-moving track differs from the original mirror-moving track, improving the user experience.

Description

Video processing method and related device
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a video processing method and a related apparatus.
Background
With the development of terminal technology, shooting video with electronic equipment has become commonplace. During shooting, the subject can be captured by moving the lens, changing the optical axis of the lens, or changing the focal length of the lens, thereby improving the technical effect of the captured video. The route along which the lens moves over the whole shooting process can serve as the mirror-moving track of the video.
At present, when a video of a scene is shot, the result may turn out poorly for various reasons, for example the photographer's limited professional skill or an emergency during shooting, and thus fail to meet the user's requirements. The user then has to switch to a new mirror-moving track and reshoot the scene until the captured video meets those requirements.
However, this way of shooting requires the user to shoot many times and consumes a great deal of time, resulting in a poor user experience.
Disclosure of Invention
The embodiments of this application provide a video processing method and a related apparatus that can reprocess a captured video into a video whose mirror-moving track meets the user's requirements, improving the user experience.
In a first aspect, an embodiment of the present application provides a video processing method, where the method includes:
displaying a video to be processed; obtaining indication information indicating a mirror-moving track for processing the video to be processed; obtaining, based on the indication information, key frames of the video to be processed and pose information corresponding to the key frames; training a target neural radiance field (NeRF) network on the key frames and their corresponding pose information; inputting a target mirror-moving track into the target neural radiance field network to obtain a target image sequence, where the target mirror-moving track is carried in the indication information or is user-defined; and rendering the target image sequence to obtain a video that follows the target mirror-moving track. In this way, the embodiments of this application modify the mirror-moving track of the video to be processed through a neural radiance field network, obtain a video that follows the target mirror-moving track, and improve the user experience. A minimal end-to-end sketch of this flow is given below.
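By way of illustration only, the following is a minimal, runnable sketch of this first-aspect flow, assuming NumPy. Every name in it (extract_keyframes, estimate_poses, ToyNeRF) is a hypothetical stand-in for a real component (frame decoder, structure-from-motion pose estimator, NeRF model, video encoder), not the implementation of this application.

```python
import numpy as np

def extract_keyframes(frames, step=10):
    # Sample one frame every `step` frames (a "preset time interval").
    return frames[::step]

def estimate_poses(keyframes):
    # Stand-in: one 4x4 camera-to-world matrix per keyframe.
    return [np.eye(4) for _ in keyframes]

class ToyNeRF:
    def train(self, keyframes, poses):
        pass  # fit the radiance field to the (keyframe, pose) pairs

    def render(self, pose, hw=(8, 8)):
        return np.zeros((*hw, 3))  # one RGB frame for this camera pose

video = [np.zeros((8, 8, 3)) for _ in range(100)]  # decoded video to be processed
keyframes = extract_keyframes(video)
poses = estimate_poses(keyframes)

nerf = ToyNeRF()
nerf.train(keyframes, poses)

target_track = [np.eye(4) for _ in range(30)]      # target mirror-moving track
image_sequence = [nerf.render(p) for p in target_track]
# Encoding image_sequence then yields the video that follows the track.
```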
In a possible implementation, training the target neural radiance field network on the key frames and their corresponding pose information includes: inputting the pose information corresponding to the key frame into an initial neural radiance field network to obtain an image frame to be adjusted; and adjusting the initial neural radiance field network according to the difference between the image frame to be adjusted and the key frame, until that difference meets a preset condition, thereby obtaining the target neural radiance field network. Training the initial network with the camera poses corresponding to the key frames in this way yields a target neural radiance field network that matches the real scene captured in the video to be processed.
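A toy version of this train-and-compare loop, in PyTorch, is sketched below. The tiny network, the random ray batch, and the loss threshold standing in for the "preset condition" are all illustrative assumptions, not the application's actual training procedure.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps (x, y, z, view direction) to (r, g, b, density).
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, xyz, viewdir):
        out = self.mlp(torch.cat([xyz, viewdir], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])

def render_rays(model, origins, dirs, n_samples=16):
    # Sample points along each ray and alpha-composite them (volume rendering).
    t = torch.linspace(0.1, 4.0, n_samples)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]
    rgb, sigma = model(pts, dirs[:, None, :].expand_as(pts))
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)  # composited pixel colors

model = TinyNeRF()
opt = torch.optim.Adam(model.parameters(), lr=5e-4)

# Toy batch: ray origins/directions derived from a keyframe's camera pose,
# plus that keyframe's ground-truth pixel colors for those rays.
origins = torch.zeros(128, 3)
dirs = torch.nn.functional.normalize(torch.randn(128, 3), dim=-1)
gt_rgb = torch.rand(128, 3)

for step in range(200):
    pred = render_rays(model, origins, dirs)
    loss = ((pred - gt_rgb) ** 2).mean()  # rendered frame vs. keyframe
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < 1e-3:                # the "preset condition"
        break
```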
In a possible implementation, the initial neural radiance field network distinguishes a foreground radiance field and a background radiance field: in the foreground radiance field, the volume density is 0 at points whose camera object distance is smaller than or equal to a preset value, and in the background radiance field, the volume density is 0 at points whose camera object distance is larger than the preset value. Distinguishing the foreground radiance field from the background radiance field makes the target neural radiance field network reproduce the real scene of the video to be processed more closely.
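A minimal sketch of this density split, assuming NumPy. The preset value and the masking rule simply follow the sentence above and are placeholders rather than the application's actual partition.

```python
import numpy as np

def masked_density(sigma, object_distance, preset, field):
    # Zero out volume density outside the depth band a field is responsible
    # for, so the two fields cover disjoint object-distance ranges.
    sigma = np.asarray(sigma, dtype=float).copy()
    if field == "foreground":
        sigma[object_distance <= preset] = 0.0  # rule as stated in the text
    else:  # "background"
        sigma[object_distance > preset] = 0.0
    return sigma

# Densities sampled at increasing object distances along one camera ray.
dist = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
sigma = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
fg = masked_density(sigma, dist, preset=2.0, field="foreground")
bg = masked_density(sigma, dist, preset=2.0, field="background")
# fg and bg now contribute to disjoint depth ranges of the same ray.
```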
In one possible implementation, the target neural radiance field network includes a first target neural radiance field network and a second target neural radiance field network, and the training includes: inputting the pose information corresponding to the key frame into a first initial neural radiance field network to obtain a foreground image frame to be adjusted; adjusting the first initial network according to the difference between the foreground of the foreground image frame to be adjusted and the foreground of the key frame, until that difference meets a first preset condition, to obtain the first target neural radiance field network; inputting the pose information corresponding to the key frame into a second initial neural radiance field network to obtain a background image frame to be adjusted; and adjusting the second initial network according to the difference between the background image frame to be adjusted and the background of the key frame, until that difference meets a second preset condition, to obtain the second target neural radiance field network. Obtaining a first target network corresponding to the foreground of the key frames and a second target network corresponding to their background further improves how faithfully the real scene of the video to be processed is reproduced, and can effectively improve the definition of the rendered video that follows the target mirror-moving track.
In one possible implementation, inputting the target mirror-moving track into the target neural radiance field network to obtain a target image sequence includes: inputting the target mirror-moving track into the first target neural radiance field network and the second target neural radiance field network respectively, to obtain a first target image sequence and a second target image sequence; and fusing the first target image sequence and the second target image sequence to obtain the target image sequence. Fusing the foreground image sequence with the background image sequence brings the rendered video that follows the target mirror-moving track closer to what shooting the real scene would produce, improving the user experience.
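The application does not spell out the fusion rule; as one hedged illustration, the sketch below fuses the two sequences frame by frame with standard alpha ("over") compositing, assuming NumPy and a per-frame foreground opacity matte.

```python
import numpy as np

def fuse_sequences(fg_frames, fg_alphas, bg_frames):
    fused = []
    for fg, a, bg in zip(fg_frames, fg_alphas, bg_frames):
        a = a[..., None]                       # HxW -> HxWx1 for broadcasting
        fused.append(a * fg + (1.0 - a) * bg)  # "over" compositing
    return fused

h, w, n = 8, 8, 30
fg_frames = [np.ones((h, w, 3)) for _ in range(n)]    # first target image sequence
bg_frames = [np.zeros((h, w, 3)) for _ in range(n)]   # second target image sequence
fg_alphas = [np.full((h, w), 0.6) for _ in range(n)]  # assumed foreground opacity
target_sequence = fuse_sequences(fg_frames, fg_alphas, bg_frames)
```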
In one possible implementation, inputting the target mirror-moving track into the target neural radiance field network to obtain the target image sequence includes: the target neural radiance field network outputs image frames in the order of the camera poses in the target mirror-moving track, yielding the target image sequence. This ensures that the video corresponding to the target image sequence follows the target mirror-moving track.
In one possible implementation, obtaining the indication information indicating a mirror-moving track for processing the video to be processed includes: receiving a first operation corresponding to the video to be processed; displaying, in response to the first operation, one or more first recommended mirror-moving tracks; and obtaining the indication information when a second operation on a target mirror-moving track among the one or more first recommended mirror-moving tracks is received, where the indication information includes the target mirror-moving track. In this way, an operation performed by the user on the video to be processed causes the terminal device to display one or more first recommended mirror-moving tracks, and the terminal device then obtains indication information that includes the target mirror-moving track according to the operation the user performs on those recommendations.
In one possible implementation, obtaining the indication information indicating a mirror-moving track for processing the video to be processed includes: receiving a third operation corresponding to the video to be processed; and obtaining the indication information in response to the third operation. Before the target mirror-moving track is input into the target neural radiance field network, the method further includes: obtaining the target mirror-moving track. The terminal device thus obtains the indication information from the operation the user performs on the video to be processed, and obtains the target mirror-moving track before feeding it to the target neural radiance field network, ensuring that a video following the target mirror-moving track can be obtained from that network.
In one possible implementation, obtaining the target mirror-moving track includes: displaying a first interface that includes one or more second recommended mirror-moving tracks, the one or more second recommended mirror-moving tracks being tracks that satisfy the constraints of the pose information corresponding to the key frames; and obtaining the target mirror-moving track when a fourth operation on a target mirror-moving track among the one or more second recommended mirror-moving tracks is received. Because the obtained target mirror-moving track satisfies those pose constraints, images corresponding to the camera poses along the target mirror-moving track can be obtained accurately.
In one possible implementation, obtaining the target mirror-moving track includes: displaying a second interface that includes the original viewpoint track of the key frames, a recommended viewpoint track, and an editable self-selected viewpoint track; receiving a fifth operation on the self-selected viewpoint track; and generating the target mirror-moving track in response to the fifth operation. The user can thus edit the self-selected viewpoint track in the second interface to obtain an edited target mirror-moving track, improving the user experience.
In a possible implementation, the self-selected viewpoint track includes a camera pose to be processed, and the fifth operation is an operation that processes that camera pose; generating the target mirror-moving track in response to the fifth operation includes: generating, in response to the fifth operation, the target mirror-moving track from the processed camera pose. The user can thus customize individual camera poses in the mirror-moving track, and the terminal device obtains a target mirror-moving track formed by the processed camera poses, improving the user experience.
In a possible implementation, the self-selected viewpoint track includes a mirror-moving mode to be determined and a corresponding duration to be determined; the fifth operation includes an operation on a target mirror-moving mode and/or a target duration; and generating the target mirror-moving track in response to the fifth operation includes: generating, in response to the fifth operation, the target mirror-moving track from the target mirror-moving mode and/or the target duration. The user can thus customize the mirror-moving mode and/or the duration, so that the terminal device generates a target mirror-moving track that meets the user's requirements, improving the user experience.
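As a hedged illustration of turning a chosen mode plus a duration into a concrete pose sequence, the sketch below assumes NumPy, a fixed frame rate, and a linear motion model; the mode names and their motions are illustrative, not the application's rule.

```python
import numpy as np

def make_trajectory(mode, duration_s, fps=30, start=np.zeros(3)):
    # One camera pose per frame over the requested duration.
    n = int(duration_s * fps)
    t = np.linspace(0.0, 1.0, n)[:, None]
    if mode == "push":        # move the camera toward the subject
        offsets = t * np.array([0.0, 0.0, -1.0])
    elif mode == "pull":      # move the camera away from the subject
        offsets = t * np.array([0.0, 0.0, 1.0])
    elif mode == "truck":     # move the camera sideways
        offsets = t * np.array([1.0, 0.0, 0.0])
    else:
        raise ValueError(f"unknown mirror-moving mode: {mode!r}")
    poses = []
    for off in offsets:
        pose = np.eye(4)      # identity rotation: the camera keeps facing -z
        pose[:3, 3] = start + off
        poses.append(pose)
    return poses

trajectory = make_trajectory("push", duration_s=2.0)  # 60 poses over 2 seconds
```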
In a possible implementation, obtaining the key frames of the video to be processed and the pose information corresponding to the key frames includes: extracting frames from the video to be processed at a preset time interval to obtain the key frames, and obtaining the pose information corresponding to the key frames with a feature retrieval-and-matching algorithm and an incremental reconstruction algorithm; or extracting frames from the video to be processed at a preset time interval to obtain initial key frames, removing the initial key frames whose definition is below a definition threshold and/or removing some initial key frames whose similarity exceeds a similarity threshold to obtain the key frames, and then obtaining the pose information corresponding to the key frames with a feature retrieval-and-matching algorithm and an incremental reconstruction algorithm. Filtering the initial key frames by the preset time interval and/or their similarity improves the accuracy of the target neural radiance field network trained on the key frames and their pose information, bringing it closer to the real scene of the video to be processed. A sketch of such a filter is given below.
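A minimal sketch of such a key-frame filter, assuming NumPy and grayscale frames. The sharpness and similarity measures (gradient variance, mean absolute difference) and all thresholds are illustrative stand-ins for whatever the implementation actually uses; pose estimation itself is outside the snippet.

```python
import numpy as np

def sharpness(frame):
    gy, gx = np.gradient(frame.astype(float))
    return (gx ** 2 + gy ** 2).var()       # blurrier frames score lower

def similarity(a, b):
    return 1.0 - np.abs(a.astype(float) - b.astype(float)).mean() / 255.0

def select_keyframes(frames, step=10, sharp_thresh=1.0, sim_thresh=0.98):
    initial = frames[::step]               # frames at a preset time interval
    kept = []
    for f in initial:
        if sharpness(f) < sharp_thresh:
            continue                       # drop low-definition frames
        if kept and similarity(kept[-1], f) > sim_thresh:
            continue                       # drop near-duplicate frames
        kept.append(f)
    return kept

frames = [np.random.randint(0, 256, (16, 16), dtype=np.uint8) for _ in range(100)]
keyframes = select_keyframes(frames)
# Pose information for each keyframe would then come from feature retrieval
# and matching plus incremental reconstruction (an SfM pipeline such as COLMAP).
```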
In a second aspect, an embodiment of the present application provides a video processing apparatus. The video processing apparatus may be a terminal device, or a chip or chip system in the terminal device, and may include a display unit and a processing unit. When the video processing apparatus is a terminal device, the display unit may be a display screen and the processing unit may be a processor. The display unit is configured to perform the displaying steps so that the terminal device implements the display-related method described in the first aspect or any one of its possible implementations, and the processing unit is configured to implement any processing-related method in the first aspect or any one of its possible implementations. The video processing apparatus may further include a storage unit, which may be a memory; the storage unit stores instructions, and the processing unit executes them so that the terminal device implements the method described in the first aspect or any one of its possible implementations. When the video processing apparatus is a chip or chip system in a terminal device, the processing unit may likewise be a processor that executes the instructions stored in the storage unit, where the storage unit may be a storage unit inside the chip (e.g., a register or a buffer) or a storage unit outside the chip but inside the terminal device (e.g., a read-only memory or a random access memory).
Illustratively, the display unit is configured to display the video to be processed. The processing unit is configured to obtain indication information indicating a mirror-moving track for processing the video to be processed, and is further configured to obtain, based on the indication information, the key frames of the video to be processed and the pose information corresponding to the key frames. The processing unit is further configured to train a target neural radiance field network on the key frames and their corresponding pose information; to input the target mirror-moving track, which is either carried in the indication information or user-defined, into the target neural radiance field network to obtain a target image sequence; and to render the target image sequence to obtain a video that follows the target mirror-moving track.
In a possible implementation, the processing unit is specifically configured to input the pose information corresponding to the key frame into an initial neural radiance field network to obtain an image frame to be adjusted, and to adjust the initial network according to the difference between the image frame to be adjusted and the key frame, until that difference meets a preset condition, thereby obtaining the target neural radiance field network.
In a possible implementation, the initial neural radiance field network distinguishes a foreground radiance field and a background radiance field, where the volume density is 0 at points whose camera object distance is smaller than or equal to a preset value in the foreground radiance field, and the volume density is 0 at points whose camera object distance is larger than the preset value in the background radiance field.
In one possible implementation, the target neural radiance field network includes a first target neural radiance field network and a second target neural radiance field network. The processing unit is specifically configured to input the pose information corresponding to the key frame into a first initial neural radiance field network to obtain a foreground image frame to be adjusted; to adjust the first initial network according to the difference between the foreground of that frame and the foreground of the key frame, until the difference meets a first preset condition, to obtain the first target neural radiance field network; to input the pose information corresponding to the key frame into a second initial neural radiance field network to obtain a background image frame to be adjusted; and to adjust the second initial network according to the difference between that frame and the background of the key frame, until the difference meets a second preset condition, to obtain the second target neural radiance field network.
In a possible implementation, the processing unit is specifically configured to input the target mirror-moving track into the first target neural radiance field network and the second target neural radiance field network respectively, to obtain a first target image sequence and a second target image sequence, and to fuse the two into the target image sequence.
In a possible implementation, the processing unit is specifically configured to have the target neural radiance field network output image frames in the order of the camera poses in the target mirror-moving track, yielding the target image sequence.
In a possible implementation, the display unit is further configured to display one or more first recommended mirror-moving tracks in response to a first operation corresponding to the video to be processed. When the terminal device receives a second operation on a target mirror-moving track among them, the processing unit is specifically configured to obtain the indication information, which includes the target mirror-moving track.
In a possible implementation, the processing unit is specifically configured to receive a third operation corresponding to the video to be processed and to obtain the indication information in response to it. Before the target mirror-moving track is input into the target neural radiance field network, the processing unit is further configured to obtain the target mirror-moving track.
In a possible implementation, the display unit is further configured to display a first interface that includes one or more second recommended mirror-moving tracks, which satisfy the constraints of the pose information corresponding to the key frames. When the terminal device receives a fourth operation on a target mirror-moving track among them, the processing unit is further configured to obtain the target mirror-moving track.
In a possible implementation, the display unit is further configured to display a second interface that includes the original viewpoint track of the key frames, a recommended viewpoint track, and an editable self-selected viewpoint track. When the terminal device receives a fifth operation on the self-selected viewpoint track, the processing unit is further configured to generate the target mirror-moving track in response to the fifth operation.
In a possible implementation, the self-selected viewpoint track includes a camera pose to be processed, and the fifth operation processes that camera pose. The processing unit is specifically configured to generate, in response to the fifth operation, the target mirror-moving track from the processed camera pose.
In a possible implementation, the self-selected viewpoint track includes a mirror-moving mode to be determined and a corresponding duration to be determined, and the fifth operation includes an operation on a target mirror-moving mode and/or a target duration. The processing unit is specifically configured to generate, in response to the fifth operation, the target mirror-moving track from the target mirror-moving mode and/or the target duration.
In a possible implementation, the processing unit is specifically configured to extract frames from the video to be processed at a preset time interval to obtain the key frames and to obtain their pose information with a feature retrieval-and-matching algorithm and an incremental reconstruction algorithm; or to extract frames at a preset time interval to obtain initial key frames, remove those whose definition is below a definition threshold and/or remove some whose similarity exceeds a similarity threshold to obtain the key frames, and then obtain their pose information with a feature retrieval-and-matching algorithm and an incremental reconstruction algorithm.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the electronic device is caused to execute the video processing method described in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program or an instruction is stored, and when the computer program or the instruction runs on a computer, the computer is caused to execute the video processing method described in the first aspect or any one of the possible implementation manners of the first aspect.
In a fifth aspect, the present application provides a computer program product including a computer program, which when run on a computer, causes the computer to execute the video processing method described in the first aspect or any one of the possible implementation manners of the first aspect.
It should be understood that the second to fifth aspects of the present application correspond to the technical solution of the first aspect; the beneficial effects achieved by these aspects and their corresponding possible implementations are similar and are not described again.
Drawings
Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 2 is a block diagram of a software structure of a terminal device according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a first schematic view of an interface for acquiring indication information according to an embodiment of the present application;
fig. 5 is a second schematic view of an interface for acquiring indication information according to an embodiment of the present application;
fig. 6 is an interface schematic diagram of a terminal device displaying a first interface according to an embodiment of the present application;
fig. 7 is an interface schematic diagram of a terminal device displaying a second interface according to an embodiment of the present application;
fig. 8 is a first schematic interface diagram of a custom target mirror-moving track according to an embodiment of the present application;
fig. 9 is a second schematic interface diagram of a custom target mirror-moving track according to an embodiment of the present application;
fig. 10 is a first schematic interface diagram for displaying a target mirror-moving mode according to an embodiment of the present application;
fig. 11 is a second schematic interface diagram for displaying a target mirror-moving mode according to an embodiment of the present application;
fig. 12 is a third schematic interface diagram for displaying a target mirror-moving mode according to an embodiment of the present application;
fig. 13 is a flowchart illustrating a method for training a target neural radiance field network according to an embodiment of the present application;
fig. 14 is a schematic flowchart of another video processing method according to an embodiment of the present application;
fig. 15 is a flowchart illustrating a method for obtaining a key frame and the camera pose corresponding to the key frame according to an embodiment of the present application;
fig. 16 is a schematic flowchart of training and application of a neural radiance field network according to an embodiment of the present application;
fig. 17 is a schematic hardware structure diagram of a control device according to an embodiment of the present application;
fig. 18 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In the embodiments of the present application, terms such as "first" and "second" are used to distinguish identical or similar items with substantially the same functions and effects. For example, the first interface and the second interface merely distinguish different response interfaces; no order between them is implied. Those skilled in the art will appreciate that the terms "first," "second," and the like do not limit quantity or execution order, nor do they indicate relative importance.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In order to facilitate clear description of the technical solutions of the embodiments of the present application, some terms and techniques referred to in the embodiments of the present application are briefly described below:
pose: i.e., position and attitude, are the position of the rigid body in space and its own attitude, and the camera attitude, i.e., the position of the camera in space and the orientation of the camera.
Mirror moving track: the moving route of the lens of the camera in the three-dimensional space is represented, and the mirror moving track can be formed by a plurality of camera poses. The moving mirror track of the video can be understood as the track of the lens moving in the shooting process.
A mirror moving mode: the mode of moving the lens includes pushing the lens, pulling the lens, shaking the lens, moving the lens, following the lens, lifting the lens, and moving the lens comprehensively. The multiple mirror moving modes are realized by moving the position of the camera, changing the optical axis of the lens, changing the focal length of the lens and the like.
Neural radiation field network: representation of a three-dimensional scene is an implicit representation of a scene, which is called NeRF for short. The method can learn the image of the known camera parameters through a neural network to obtain a static three-dimensional scene.
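For reference, the standard NeRF formulation from the literature (recalled here only to make the definition concrete; it is not quoted from this application) can be written as follows.

```latex
% F_\Theta maps a 3D point \mathbf{x} and a viewing direction \mathbf{d}
% to an emitted color and a volume density:
%   F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma)
% A pixel is rendered by compositing N samples along its camera ray r:
\[
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\Bigl(-\sum_{j<i} \sigma_j \delta_j\Bigr),
\]
% where \delta_i is the distance between adjacent samples along the ray.
```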
When shooting video with a terminal device, a user can produce videos with different mirror-moving tracks by controlling how the device moves. For example, during shooting, the user can push the lens from far to near toward an object to create a close-up of the shot subject and enhance the artistic effect of the video.
When a video of a scene is shot, the technical effect of the captured video may be poor, for example because of the photographer's limited professional skill, and may fail to meet the user's requirements for the video effect. The user can then change the mirror-moving track and shoot the scene again until the captured video meets their requirements. However, repeatedly shooting the same scene consumes a great deal of time, and because of the same skill limitations a video meeting the user's requirements may never be captured, resulting in a poor user experience.
The artistic effect of a video is tied to the mirror-moving track used during shooting, and shooting with a better mirror-moving track can effectively improve that effect. To avoid the poor experience of repeatedly reshooting a scene, for videos that do not meet the user's requirements, the terminal device can build, through a neural radiance field network, a three-dimensional scene corresponding to the real scene in the video. The terminal device can then output an image sequence along a target mirror-moving track that meets the user's requirements, obtaining a video that follows that track, including views from angles that do not exist in the original video.
With this technical scheme, the user does not need to reshoot the scene: the terminal device modifies the mirror-moving track of the already-captured video and produces a video with a new mirror-moving track that meets the user's requirements, that is, a video from new viewing angles. This saves a great deal of time and can effectively improve the user experience.
It can be understood that the terminal device may be a smart phone, a tablet, or the like, or the terminal device may also be a wearable device, such as a smart watch, a smart bracelet, a wearable Virtual Reality (VR) device, or a wearable Augmented Reality (AR) device. The specific technology and the specific device form adopted by the terminal device are not limited in the embodiment of the application.
To better understand the embodiments of the present application, the structure of the terminal device is described below. Exemplarily, fig. 1 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
The terminal device may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, an indicator 192, a camera 193, a display 194, and the like.
It is to be understood that the illustrated structure of the embodiments of the present application does not constitute a specific limitation to the terminal device. In other embodiments of the present application, a terminal device may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units. The different processing units may be separate devices or may be integrated into one or more processors. A memory may also be provided in processor 110 for storing instructions and data.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device, and may also be used to transmit data between the terminal device and a peripheral device. And the earphone can also be used for connecting an earphone and playing audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices and the like.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. The power management module 141 is used for connecting the charging management module 140 and the processor 110.
The wireless communication function of the terminal device can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Antennas in terminal devices may be used to cover single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the terminal device. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation.
The wireless communication module 160 may provide a solution for wireless communication applied to a terminal device, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), and the like.
The terminal device realizes the display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. In some embodiments, the terminal device may include 1 or N display screens 194, with N being a positive integer greater than 1.
The terminal device can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The camera 193 is used to capture still images or video. In some embodiments, the terminal device may include 1 or N cameras 193, N being a positive integer greater than 1.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area.
The terminal device can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal device can listen to music through the speaker 170A, or listen to a handsfree call. The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal device answers a call or voice information, it is possible to answer a voice by bringing the receiver 170B close to the human ear. The earphone interface 170D is used to connect a wired earphone.
The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. In the embodiment of the present application, the terminal device may receive a sound signal for waking up the terminal device based on the microphone 170C and convert the sound signal into an electrical signal that can be subsequently processed, and the terminal device may have at least one microphone 170C.
The sensor module 180 may include one or more of the following sensors, for example: a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, or a bone conduction sensor, etc. (not shown in fig. 1).
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The terminal device may receive key inputs and generate key signal inputs related to user settings and function control of the terminal device. Indicator 192 may be an indicator light used to indicate the charging state or a change in charge, or to indicate a message, a missed call, a notification, and the like.
The software system of the terminal device 100 may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, a cloud architecture, or the like. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the terminal device 100.
Fig. 2 is a block diagram of a software structure of a terminal device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, from top to bottom: an application layer, an application framework layer, an Android runtime and system library layer, a hardware abstraction layer, and a kernel layer.
The application layer may include a series of application packages. As shown in fig. 2, the application packages may include phone, mailbox, calendar, camera, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layers may include a window manager, an activity manager, a location manager, a package manager, a notification manager, a resource manager, a telephony manager, a view system, a frame rate decision manager, and the like.
A Window Manager (WMS) is used to manage the window program. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The activity manager is used for managing the life cycle of each application program and the navigation backspacing function. The method is responsible for the creation of the main thread of the Android and the maintenance of the life cycle of each application program.
The location manager is used to provide location services for applications, including querying for last known location, registering and deregistering location updates from a periodic location, etc.
The package manager is used for program management within the system, for example: application installation, uninstallation, upgrade, and the like.
The notification manager enables an application to display notification information in the status bar and can be used to convey notification-type messages, which can disappear automatically after a short dwell without user interaction. For example, the notification manager is used to announce download completion, message alerts, and so on. The notification manager may also present notifications as a chart or scroll-bar text in the system's top status bar, such as notifications from background applications, or as a dialog window on the screen. For example, text information is prompted in the status bar, a prompt tone sounds, the terminal device vibrates, or an indicator light flickers.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The telephony manager is used for managing mobile device functions, including: the handset call state, obtaining telephone information (device, SIM card, and network information), monitoring the telephone state, and calling the telephone dialer to place calls.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The frame rate decision manager is used for determining the screen refreshing frame rate of the terminal equipment and selecting a switching mode of the screen refreshing frame rate.
The Android runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection. A display composition process (e.g., SurfaceFlinger) also runs in the virtual machine and controls the composition of images.
The system library may include a plurality of functional modules. For example: the system comprises an image drawing module, an image rendering module, an image synthesis module, a function library, a media library and the like.
The image drawing module is used for drawing two-dimensional or three-dimensional images. The image rendering module is used for rendering two-dimensional or three-dimensional images. The image synthesis module is used for synthesizing two-dimensional or three-dimensional images.
In a possible implementation manner, an application draws an image through the image drawing module, renders the drawn image through the image rendering module, and then sends the rendered image to a cache queue of the display composition process. Each time a Vsync signal arrives, the display composition process (e.g., SurfaceFlinger) sequentially acquires one frame of image to be composed from the buffer queue, and the image synthesis module then performs image composition.
The function library provides the macros, type definitions, character string operation functions, mathematical calculation functions, input and output functions, and the like used in the C language.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The hardware abstraction layer may include a plurality of library modules, such as a hardware composer module (hwcomposer, HWC) and a camera library module. The Android system can load the corresponding library modules for the device hardware, so that the application framework layer can access the device hardware. The device hardware may include, for example, the LCD screen and the camera in the electronic device.
The kernel layer is the layer between hardware and software. It is used to drive the hardware so that the hardware works. The kernel layer includes at least an LCD/LED driver, a display driver, an audio driver, a camera driver, a sensor driver, and the like.
The hardware may be audio devices, bluetooth devices, camera devices, sensor devices, etc.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following embodiments may be implemented independently or in combination, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure. Illustratively, referring to fig. 3, the video processing method may include:
and S301, displaying the video to be processed.
In this embodiment of the application, the video to be processed may be any video whose mirror-moving track needs to be changed. It may be a video pre-stored on the terminal device, for example a video shot by the user in advance and saved in the terminal device's gallery, or a video downloaded from the internet. The embodiment of the present application places no limit on the specific content of the video to be processed.
In a possible implementation, the terminal device may receive an operation triggered by a user to display the video to be processed, and display the video to be processed in response to the operation.
S302, acquiring indication information for indicating a mirror moving track for processing the video to be processed.
In this embodiment of the application, the indication information for indicating the mirror-moving track for processing the video to be processed may be obtained when the user performs an operation on the video to be processed on the display interface of the terminal device. It can be understood that, when a user performs different operations on a video to be processed, the terminal device may obtain different indication information, where the indication information may or may not carry a target mirror moving track.
In a possible implementation, the terminal device may receive an operation triggered by a user and used for instructing processing of a mirror movement track of the video to be processed, and in response to the operation, the terminal device obtains instruction information used for instructing processing of the mirror movement track of the video to be processed. Illustratively, there are two possible implementations for the terminal device to acquire the indication information for indicating the mirror-moving track for processing the video to be processed. In the following, in the embodiments corresponding to fig. 4 to fig. 13, a terminal device is taken as an example for illustration, and the example description does not limit the embodiments of the present application.
In a first possible implementation, when the terminal device receives a first operation corresponding to the video to be processed, the terminal device may display one or more first recommended mirror-moving tracks in response to the first operation. When the terminal device receives a second operation on a target mirror-moving track among the displayed first recommended mirror-moving tracks, the indication information is obtained, where the indication information includes the target mirror-moving track. Fig. 4 is a schematic view of a first interface for acquiring indication information according to an embodiment of the present application.
In the interface state shown as a in fig. 4, when the mobile phone receives a click operation of the user on the editing control, the mobile phone may display the interface shown as b in fig. 4. In the interface shown as a in fig. 4, the editing control is used to instruct the mobile phone to edit the video displayed on the current interface. As shown as a in fig. 4, the interface may further include one or more controls, such as: sharing, collecting, deleting, and adding. As shown as b in fig. 4, the interface may further include one or more controls, such as: picture-in-picture, rotation, mirror-moving track, deletion, and more.
Further, when the mobile phone receives an operation of the user clicking the mirror-moving track control in the interface shown as b in fig. 4, the mobile phone may display the interface shown as c in fig. 4. As shown as c in fig. 4, the interface includes controls for a plurality of first recommended mirror-moving tracks, namely mirror-moving track 1 to mirror-moving track 8. When the mobile phone receives an operation of the user clicking the control of recommended mirror-moving track 2 in the interface shown as c in fig. 4, indication information including mirror-moving track 2 is obtained.
In a second possible implementation, when receiving a third operation corresponding to a video to be processed, the terminal device obtains the indication information in response to the third operation. Fig. 5 is a schematic view of a second interface for acquiring indication information according to an embodiment of the present application.
In the interface state shown as a in fig. 5, when the mobile phone receives a click operation of the user on the editing control, the mobile phone may display the interface shown as b in fig. 5. When the mobile phone receives an operation of the user clicking the mirror-moving track control in the interface shown as b in fig. 5, indication information that does not carry a target mirror-moving track is obtained. The other content shown as a in fig. 5 is similar to that shown as a in fig. 4, and the content shown as b in fig. 5 is similar to that shown as b in fig. 4; neither is described again here.
S303, acquiring key frames of the video to be processed and pose information corresponding to the key frames based on the indication information.
In this embodiment of the application, a key frame may be a video frame capable of representing the real scene corresponding to the video to be processed. The pose information may be the position and orientation, in the space of that real scene, of the camera corresponding to each key frame. The key frames and the pose information corresponding to the key frames are used to reconstruct a three-dimensional space of the real scene corresponding to the video to be processed.
It can be understood that the video to be processed includes a large number of video frames, so the terminal device may acquire multiple key frames of the video to be processed, which helps ensure that the three-dimensional space reconstructed from the key frames and their corresponding pose information is closer to the real scene. When there are multiple key frames, the same processing is performed for each key frame.
In a possible implementation, the terminal device may extract frames from the video to be processed based on the indication information, for example at a preset time interval, to obtain the key frames of the video to be processed, and may analyze the key frames to obtain the pose information corresponding to them. The embodiment of the present application does not limit the preset time interval for extracting the key frames.
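As an illustration of this frame-extraction step, the following is a minimal sketch using OpenCV; the 0.5-second interval and the file path are assumptions for illustration, not values fixed by this embodiment.

```python
import cv2

def extract_keyframes(video_path: str, interval_s: float = 0.5):
    """Sample one frame from the video every interval_s seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0      # fall back if FPS is unreadable
    step = max(1, int(round(fps * interval_s)))  # frames per sampling interval
    keyframes, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                    # keep one frame per interval
            keyframes.append(frame)
        index += 1
    cap.release()
    return keyframes

frames = extract_keyframes("to_be_processed.mp4", interval_s=0.5)
```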
S304, training to obtain a target neural radiation field network according to the key frames and the pose information corresponding to the key frames.
In the embodiment of the application, the target neural radiation field network may be used to represent a three-dimensional space of a real scene corresponding to a video to be processed.
Illustratively, the terminal device may input the keyframe and pose information corresponding to the keyframe to the initial neural radiation field network, and train the initial neural radiation field network to obtain the target neural radiation field network.
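The embodiment does not specify the internals of the network; for illustration, the following PyTorch sketch shows a minimal radiance field MLP of the kind commonly used for such training, mapping a positionally encoded 3D point to a color and a volume density. The layer sizes and encoding frequencies are common NeRF defaults, not values specified by this embodiment.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=10):
    # Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k = 0..n_freqs-1.
    out = [x]
    for k in range(n_freqs):
        out += [torch.sin((2.0 ** k) * x), torch.cos((2.0 ** k) * x)]
    return torch.cat(out, dim=-1)

class RadianceField(nn.Module):
    """Minimal NeRF-style MLP: encoded 3D point -> (color, volume density)."""
    def __init__(self, n_freqs=10, hidden=256):
        super().__init__()
        in_dim = 3 * (1 + 2 * n_freqs)  # dimension of the encoded position
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # volume density head
        self.rgb_head = nn.Linear(hidden, 3)    # color head

    def forward(self, xyz):
        h = self.backbone(positional_encoding(xyz))
        sigma = torch.relu(self.sigma_head(h))  # density must be non-negative
        rgb = torch.sigmoid(self.rgb_head(h))   # color constrained to [0, 1]
        return rgb, sigma
```

A full NeRF additionally feeds the encoded view direction into the color head so the network can model view-dependent effects; that conditioning is omitted here for brevity.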
S305, inputting the target mirror-moving track into the target neural radiation field network to obtain a target image sequence.
In the embodiment of the application, the target mirror-moving track is either carried in the indication information or customized. For the target mirror-moving track carried in the indication information, refer to the description of the above steps; details are not repeated here.
In the embodiment of the application, the target image sequence is an image sequence corresponding to a plurality of camera poses in the target mirror-moving track.
Illustratively, there are two possible implementations of the customized target mirror-moving track. In the first, the target mirror-moving track is selected by the user from a plurality of recommended mirror-moving tracks; in the second, the target mirror-moving track is a user-defined mirror-moving track input by the user on an interface of the terminal device.
In the first possible implementation, when the terminal device receives the operation of the user clicking the mirror-moving track control on the interface shown as b in fig. 5, the terminal device may display a first interface. The first interface may include one or more second recommended mirror-moving tracks, which are mirror-moving tracks satisfying the constraint of the pose information corresponding to the key frames. When the terminal device receives a fourth operation on a target mirror-moving track among the one or more second recommended mirror-moving tracks, the target mirror-moving track is obtained.
It can be understood that the constraint of the pose information corresponding to the key frames is the range of camera poses covered by the three-dimensional space reconstructed by the target neural radiation field network. For example, if the video to be processed captures the front and the left and right sides of a person, the target neural radiation field network can reconstruct a three-dimensional space of the real scene where the person is located; however, because the back of the person does not appear in the video, the scene behind the person cannot be reconstructed. In this case, the camera pose range corresponding to the reconstructed three-dimensional scene can only cover the front and the left and right sides of the person, not the back of the person.
Exemplarily, fig. 6 is a schematic diagram of a terminal device displaying the first interface according to an embodiment of the present application. As shown in fig. 6, in addition to the video to be processed, the interface may further include one or more recommended mirror-moving track controls, such as mirror-moving track A, mirror-moving track B, mirror-moving track C, mirror-moving track D, mirror-moving track E, and mirror-moving track F. All of the second recommended mirror-moving tracks in the interface shown in fig. 6 satisfy the constraint of the pose information corresponding to the key frames.
In the second possible implementation, when the terminal device receives the operation of the user clicking the mirror-moving track control on the interface shown as b in fig. 5, the terminal device may display a second interface, which includes the original viewpoint track of the key frames, a recommended viewpoint track, and an editable self-selection viewpoint track. When the terminal device receives a fifth operation on the self-selection viewpoint track, the terminal device generates the target mirror-moving track in response to the fifth operation.
Exemplarily, fig. 7 is an interface schematic diagram of a terminal device displaying a second interface according to an embodiment of the present application.
As shown in fig. 7, the second interface includes the original viewpoint track, the recommended viewpoint track, and the editable self-selection viewpoint track, all represented by line segments with arrows. A plurality of editable camera poses are distributed on the editable self-selection viewpoint track; in the interface shown in fig. 7, these editable camera poses are displayed with camera-like identifiers. The interface may also include a custom mirror-moving track control.
For example, the self-selection viewpoint track in the second interface displayed by the terminal device may include camera poses to be processed, and may also include a target mirror-moving mode and/or a target duration. Accordingly, there are two possible implementations for the terminal device to generate the target mirror-moving track in response to the fifth operation on the self-selection viewpoint track. In the first, the fifth operation is an operation of processing the camera poses to be processed included in the self-selection viewpoint track. In the second, the fifth operation is an operation on the target mirror-moving mode and/or the target duration included in the self-selection viewpoint track.
In the first possible implementation, the self-selection viewpoint track in the second interface displayed by the terminal device may include camera poses to be processed, and the terminal device may generate the target mirror-moving track according to the processed camera poses in response to the fifth operation of processing the camera poses to be processed.
Fig. 8 is a first schematic interface diagram of a customized target mirror-moving track provided in an embodiment of the present application. In the interface state shown as a in fig. 8, when the mobile phone receives an operation, such as a drag operation or a click operation, of the user on the plurality of to-be-processed camera poses on the self-selection viewpoint track in the interface, the interface shown as b in fig. 8 is obtained, and the self-selection viewpoint track formed by the plurality of camera poses in that interface is the target mirror-moving track. The content shown as a in fig. 8 is similar to that shown in fig. 7 and is not described again here.
Fig. 9 is a second schematic interface diagram of a customized target mirror-moving track provided in an embodiment of the present application. In the interface state shown as a in fig. 9, when the mobile phone receives a click operation of the user on the custom mirror-moving track control in the interface, the mobile phone may display the interface shown as b in fig. 9. When the mobile phone receives an operation of the user in the interface shown as b in fig. 9, a plurality of camera poses are distributed on the mirror-moving track in that interface. For example, the user performs click operations at different positions of the interface shown as b in fig. 9, and the target mirror-moving track is generated according to the clicked positions and the order of the clicks; or the user performs a gesture operation in the interface shown as b in fig. 9, and the trajectory of the gesture operation is the target mirror-moving track. The content shown as a in fig. 9 is similar to that shown in fig. 7 and is not described again here.
In the second possible implementation, the self-selection viewpoint track in the second interface displayed by the terminal device may include mirror-moving modes to be determined and corresponding durations to be determined. The terminal device responds to the fifth operation on the target mirror-moving mode and/or the target duration, and generates the target mirror-moving track according to the target mirror-moving mode and/or the target duration.
Illustratively, the terminal device may generate the target mirror-moving track according to the target mirror-moving mode and the target duration in response to a fifth operation on both. Fig. 10 is a first schematic interface diagram for displaying target mirror-moving modes according to an embodiment of the present application. In the interface shown in fig. 10, when the mobile phone receives operations of the user clicking the roll control and the push control and filling in durations, the mobile phone generates the target mirror-moving track according to the 15-second duration corresponding to the roll and the 5-second duration corresponding to the push. As shown in fig. 10, when the mobile phone receives the click operations on the mirror-moving modes, the ordinal number of each mirror-moving mode within the mirror-moving track may be displayed on its control according to the order of the click operations. With the interface shown in fig. 10, the mobile phone generates a mirror-moving track that rolls for 15 seconds and then pushes in for 5 seconds.
For example, the terminal device may generate the target mirror-moving track according to the target mirror-moving mode in response to a fifth operation on the target mirror-moving mode. Fig. 11 is a second schematic interface diagram for displaying target mirror-moving modes according to an embodiment of the present application. In the interface shown in fig. 11, when the mobile phone receives operations of the user clicking the roll control and the push control, the ordinal number of each mirror-moving mode within the mirror-moving track and the corresponding mirror-moving duration are displayed on the selected mirror-moving mode controls according to the order of the click operations. When the mobile phone receives a click operation of the user on the confirm control in the interface shown in fig. 11, the mobile phone generates the target mirror-moving track according to the roll with its corresponding first mirror-moving duration and the push with its corresponding second mirror-moving duration. With the interface shown in fig. 11, the mobile phone generates a mirror-moving track that rolls for 15 seconds and then pushes in for 5 seconds.
For example, the terminal device may generate the target mirror-moving track according to the target duration in response to a fifth operation on the target duration. Fig. 12 is a third schematic interface diagram for displaying target mirror-moving modes according to an embodiment of the present application. In the interface shown in fig. 12, the user can input the mirror-moving durations corresponding to the roll, the zoom-in, and the zoom-out. When the mobile phone receives the mirror-moving durations input by the user in the interface shown in fig. 12 and receives a click operation of the user on the confirm control, the target mirror-moving track is generated according to each selected mirror-moving mode and its corresponding mirror-moving duration.
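To make the "mirror-moving mode plus duration" idea concrete, the following sketch generates a camera pose sequence from an ordered list of (mode, duration) pairs, such as a 15-second roll followed by a 5-second push; the per-frame motion increments and the 30 fps rate are illustrative assumptions, since the embodiment does not specify them.

```python
import numpy as np

FPS = 30  # assumed output frame rate

def pose_sequence(segments):
    """segments: ordered (mode, duration_s) pairs, e.g. [("roll", 15), ("push", 5)]."""
    poses, position, roll = [], np.zeros(3), 0.0
    for mode, duration in segments:
        for _ in range(int(duration * FPS)):
            if mode == "roll":
                roll += np.radians(0.2)                  # rotate about the view axis
            elif mode == "push":
                position += np.array([0.0, 0.0, -0.01])  # move toward the scene
            c, s = np.cos(roll), np.sin(roll)
            pose = np.eye(4)                             # camera-to-world matrix
            pose[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
            pose[:3, 3] = position
            poses.append(pose)
    return poses

trajectory = pose_sequence([("roll", 15), ("push", 5)])  # the fig. 10/fig. 11 example
```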
In the embodiment of the application, the terminal device can obtain the target mirror-moving track in any of the above manners and input it into the target neural radiation field network, which outputs image frames in the order of the plurality of camera poses in the target mirror-moving track, yielding the target image sequence.
S306, rendering the target image sequence to obtain a video conforming to the target mirror moving track.
Illustratively, the terminal device may render the target image sequence through a volume renderer: for each camera pose in the target mirror-moving track, the terminal device traverses all pixels of the image at that pose, determines the camera ray corresponding to each pixel, and integrates the color and volume density of the sampling points on the ray to obtain the video frame at that pose. The terminal device can then generate a video conforming to the target mirror-moving track according to the order of the camera poses in the target track and the video frames corresponding to those poses.
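The integration along each camera ray described here is the standard volume rendering quadrature. The following sketch shows it for a batch of rays, reusing the RadianceField interface from the earlier sketch; the near/far bounds and sample count are illustrative assumptions, and stratified sampling is omitted for brevity.

```python
import torch

def render_rays(model, origins, directions, near=0.1, far=6.0, n_samples=64):
    """Composite colors along each ray; origins/directions are (N, 3) tensors."""
    t = torch.linspace(near, far, n_samples)  # sample depths along every ray
    points = origins[:, None, :] + t[None, :, None] * directions[:, None, :]
    rgb, sigma = model(points.reshape(-1, 3))
    rgb = rgb.reshape(points.shape[0], n_samples, 3)
    sigma = sigma.reshape(points.shape[0], n_samples)
    delta = t[1:] - t[:-1]                    # spacing between adjacent samples
    delta = torch.cat([delta, delta[-1:]]).expand_as(sigma)
    alpha = 1.0 - torch.exp(-sigma * delta)   # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]
    weights = alpha * trans                   # contribution of each sample
    return (weights[..., None] * rgb).sum(dim=1)  # (N, 3) pixel colors
```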
Based on this method, the terminal device trains the target neural radiation field network from the key frames of the video to be processed and the pose information corresponding to the key frames, thereby reconstructing the real scene corresponding to the video to be processed. When the target mirror-moving track is input into the target neural radiation field network, the network can output the image frames corresponding to the plurality of camera poses in the track, so that a video conforming to the target mirror-moving track is obtained. In this way, the mirror-moving track of an already-shot video can be modified to obtain a video with new viewing angles, improving user experience.
To facilitate understanding of the video processing method provided in the embodiment of the present application, the following details how the target neural radiation field network described in steps S303 and S304 of the above embodiment is obtained. For example, see fig. 13, which is a schematic flowchart of a method for training a target neural radiation field network according to an embodiment of the present application. The method may include the following steps:
S1301, acquiring key frames of the video to be processed and pose information corresponding to the key frames.
In the embodiment of the present application, a key frame of the video to be processed may be a video frame in the video that is capable of representing the real scene in the video.
In the embodiment of the application, there are two possible implementations for the terminal device to acquire the key frames of the video to be processed and the pose information corresponding to the key frames.
In a first possible implementation, the terminal device may extract frames from the video to be processed at a preset time interval to obtain the key frames, and acquire the pose information corresponding to the key frames using a feature retrieval and matching algorithm and an incremental reconstruction algorithm. For example, the preset time interval may be inversely related to the speed of camera movement in the video to be processed; the preset time interval is not limited in the embodiment of the present application.
In a second possible implementation, the terminal device may extract frames from the video to be processed at a preset time interval to obtain initial key frames; remove the initial key frames whose sharpness is below a sharpness threshold, and/or remove the initial key frames whose similarity is above a similarity threshold, to obtain the key frames; and acquire the pose information corresponding to the key frames using a feature retrieval and matching algorithm and an incremental reconstruction algorithm. For example, the preset time interval may be inversely related to the speed of camera movement in the video to be processed; the embodiment of the present application does not limit the preset time interval. The sharpness threshold and the similarity threshold may be set according to actual conditions; for example, the sharpness threshold may be a value such as 80% or 85%, and the similarity threshold may be a value such as 90% or 95%, which is not limited in this embodiment of the application.
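As one illustration of this screening, the sketch below scores sharpness with the variance of the Laplacian and similarity with a gray-level histogram correlation; both scoring functions and both threshold values are assumptions chosen for illustration, since the embodiment leaves them open.

```python
import cv2

def sharpness(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()  # higher value = sharper frame

def similarity(a, b):
    # Correlation of gray-level histograms, roughly in [0, 1] for similar frames.
    ha = cv2.calcHist([cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    hb = cv2.calcHist([cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

def filter_keyframes(initial_frames, sharp_thresh=100.0, sim_thresh=0.95):
    kept = []
    for frame in initial_frames:
        if sharpness(frame) < sharp_thresh:
            continue                                   # drop blurry initial key frames
        if kept and similarity(kept[-1], frame) > sim_thresh:
            continue                                   # drop near-duplicate key frames
        kept.append(frame)
    return kept
```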
In the embodiment of the application, the terminal device trains the target neural radiation field network according to the key frames of the video to be processed and the pose information corresponding to the key frames. In a first possible implementation, a single target neural radiation field network that distinguishes a foreground radiation field from a background radiation field is obtained through training, as described in steps S1302 and S1303. In a second possible implementation, two target neural radiation field networks are obtained through training: one corresponding to the foreground of the video to be processed, and the other corresponding to the background of the video to be processed, as described in steps S1304 and S1305.
S1302, inputting the pose information corresponding to the key frames into the initial neural radiation field network to obtain an image frame to be adjusted.
In this embodiment, the initial neural radiation field network may be a neural radiation field network constructed by the terminal device. The image frame to be adjusted may be an image frame corresponding to the key frame, and the camera pose of the image frame to be adjusted is the same as that of the key frame. The image frame to be adjusted may be used to determine whether the trained initial neural radiation field network has reached convergence.
For example, the initial neural radiation field network can distinguish a foreground radiation field from a background radiation field: in the foreground radiation field, the volume density corresponding to a camera object distance greater than a preset value is 0, and in the background radiation field, the volume density corresponding to a camera object distance less than or equal to the preset value is 0, so that each field represents only its own depth range. Distinguishing the foreground radiation field from the background radiation field helps ensure the accuracy and sharpness of the background of the video to be processed.
In the embodiment of the present application, the volume density is the volume density of the sampling points on the camera rays corresponding to the video frames in the video to be processed. The preset value is the camera object distance that separates the foreground and the background of the video to be processed, and is not limited in the embodiment of the application.
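One way to realize this split is to zero out the density a field predicts outside its own depth range, as in the minimal sketch below; the convention shown (each field keeps only the points on its own side of the preset object distance) and the threshold value of 2.0 are assumptions for illustration.

```python
import torch

PRESET_OBJECT_DISTANCE = 2.0  # assumed foreground/background split distance

def masked_density(sigma, object_distance, is_foreground):
    """Zero out the volume density outside the field's own depth range."""
    if is_foreground:
        keep = object_distance <= PRESET_OBJECT_DISTANCE  # foreground keeps near points
    else:
        keep = object_distance > PRESET_OBJECT_DISTANCE   # background keeps far points
    return sigma * keep.to(sigma.dtype)
```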
In a possible implementation, the terminal device inputs the pose information corresponding to the key frames into the initial neural radiation field network and trains it; during training, the initial neural radiation field network can output, for each input camera pose, the image frame to be adjusted at that pose.
S1303, adjusting the initial neural radiation field network according to the difference between the image frame to be adjusted and the key frame until the difference meets a preset condition, thereby obtaining the target neural radiation field network.
In the embodiment of the present application, the difference between the image frame to be adjusted and the key frame may be used to determine whether the initial neural radiation field network has converged during training. The preset condition is the condition under which the initial neural radiation field network is considered converged, and is not limited in the embodiment of the present application.
Illustratively, in the process of training the initial neural radiation field network, the terminal device may iteratively train the network according to the key frames and the pose information corresponding to the key frames. The terminal device can determine whether the trained network has converged according to the image frame to be adjusted and the key frame: when the difference between them meets the preset condition, that is, the convergence condition, the initial neural radiation field network is determined to have converged, and the terminal device obtains the target neural radiation field network.
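The adjust-until-converged loop can be sketched as follows, using the mean squared photometric error between rendered pixels and key-frame pixels as the difference, and a PSNR threshold as the preset condition; the threshold, learning rate, and iteration cap are illustrative assumptions, and render_rays is the quadrature from the earlier sketch.

```python
import torch

def train(model, rays, target_rgb, psnr_threshold=30.0, max_iters=200_000):
    """rays: (origins, directions) cast from the key frames via their camera poses."""
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
    for _ in range(max_iters):
        pred = render_rays(model, *rays)   # pixels of the image frame to be adjusted
        loss = torch.mean((pred - target_rgb) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        psnr = -10.0 * torch.log10(loss)   # difference metric against the key frames
        if psnr.item() > psnr_threshold:   # preset (convergence) condition satisfied
            break
    return model
```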
It can be understood that, because the initial neural radiation field network distinguishes between the foreground radiation field and the background radiation field, the trained target neural radiation field network is closer to the real scene corresponding to the video to be processed.
S1304, inputting the pose information corresponding to the key frames into a first initial neural radiation field network to obtain a foreground image frame to be adjusted; and adjusting the first initial neural radiation field network according to the difference between the foreground image frame to be adjusted and the foreground of the key frame until the difference meets a first preset condition, thereby obtaining a first target neural radiation field network.
In this embodiment of the application, the foreground image frame to be adjusted may be an image frame corresponding to the foreground in the key frame, with the same camera pose as the key frame. The first target neural radiation field network may represent the three-dimensional space corresponding to the foreground of the real scene of the video to be processed. The first preset condition is the convergence condition of the first initial neural radiation field network, which is not limited in the embodiment of the present application.
Illustratively, in the process of training the first initial neural radiation field network, the terminal device may iteratively train the network according to the key frames and the pose information corresponding to the key frames. The terminal device may determine whether the trained network has converged according to the foreground image frame to be adjusted and the key frame: when the difference between them meets the first preset condition, that is, the convergence condition, the first initial neural radiation field network is determined to have converged, and the terminal device obtains the first target neural radiation field network.
S1305, inputting the pose information corresponding to the key frames into a second initial neural radiation field network to obtain a background image frame to be adjusted; and adjusting the second initial neural radiation field network according to the difference between the background image frame to be adjusted and the background of the key frame until the difference meets a second preset condition, thereby obtaining a second target neural radiation field network.
In this embodiment of the application, the background image frame to be adjusted may be an image frame corresponding to the background in the key frame, with the same camera pose as the key frame. The second target neural radiation field network may represent the three-dimensional space corresponding to the background of the real scene of the video to be processed. The second preset condition is the convergence condition of the second initial neural radiation field network, which is not limited in the embodiment of the present application.
Illustratively, in the process of training the second initial neural radiation field network, the terminal device may iteratively train the network according to the key frames and the pose information corresponding to the key frames. The terminal device may determine whether the trained network has converged according to the background image frame to be adjusted and the key frame: when the difference between them meets the second preset condition, that is, the convergence condition, the second initial neural radiation field network is determined to have converged, and the terminal device obtains the second target neural radiation field network.
In the second possible implementation of training the neural radiation field, for step S305 in fig. 3, after the first target neural radiation field network and the second target neural radiation field network are obtained, the method for inputting the target mirror-moving track into the target neural radiation field networks to obtain the target image sequence is as follows:
Illustratively, the terminal device may input the target mirror-moving track into the first target neural radiation field network and the second target neural radiation field network respectively, to obtain a first target image sequence and a second target image sequence, and may fuse the first target image sequence and the second target image sequence to obtain the target image sequence.
In an embodiment of the application, the first target image sequence is the sequence of foreground images corresponding to the plurality of camera poses in the target mirror-moving track, and the second target image sequence is the sequence of background images corresponding to those camera poses.
For example, the terminal device may fuse the first target image sequence and the second target image sequence according to the object distances corresponding to each, that is, fuse the foreground image sequence and the background image sequence to obtain the target image sequence corresponding to the real scene. Specifically, the colors and volume densities of the sampling points on the camera rays corresponding to the first target image sequence and the second target image sequence may be fused according to their corresponding object distances to obtain fused camera rays, and thereby the target image sequence.
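The ray-level fusion described here can be sketched by evaluating both fields at the same sampling points and merging their colors and densities before compositing; this is one plausible reading of the fusion step, with the per-field density masking of the earlier sketch assumed to be built into each model.

```python
import torch

def fuse_fields(fg_model, bg_model, points):
    """Merge foreground and background predictions at shared sample points."""
    flat = points.reshape(-1, 3)
    rgb_fg, sigma_fg = fg_model(flat)  # assumed zero density beyond the preset distance
    rgb_bg, sigma_bg = bg_model(flat)  # assumed zero density within the preset distance
    sigma = sigma_fg + sigma_bg        # at most one field contributes per point
    rgb = (sigma_fg * rgb_fg + sigma_bg * rgb_bg) / (sigma + 1e-10)
    # The merged (rgb, sigma) can then be composited with the same quadrature as
    # render_rays to obtain the fused video frame at each camera pose.
    return rgb, sigma
```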
Based on this method, the terminal device can train a high-precision target neural radiation field network according to the key frames of the video to be processed and the camera poses corresponding to the key frames, which effectively improves the sharpness of the video conforming to the target mirror-moving track that is obtained from the network.
Fig. 14 is a schematic flowchart of another video processing method according to an embodiment of the present disclosure. As shown in fig. 14, the method for processing the mirror movement track of the shot video to be processed includes:
When receiving an operation of the user for processing the mirror-moving track of the video to be processed, the terminal device performs key frame screening and camera pose estimation on the video to be processed to construct a training set. The training set comprises multi-angle key frames of the video to be processed and the pose information corresponding to the key frames.
Fig. 15 is a flowchart of a method for acquiring key frames and the camera poses corresponding to the key frames according to an embodiment of the present disclosure. As shown in fig. 15, the terminal device may obtain the video to be processed, perform image sharpness estimation and motion displacement estimation on the video sequence to be processed, and extract frames from the video at a time interval threshold to obtain the multi-angle key frames. It can be understood that the motion displacement estimation may determine the time interval threshold: the terminal device extracts frames from the video sequence at this threshold to obtain initial key frames (for the frame-extraction method, refer to the description of frame extraction in the foregoing embodiment, which is not repeated here), and the sharpness estimate may be used to screen the initial key frames to obtain the final multi-angle key frames. After the key frames are obtained, retrieval matching and incremental reconstruction are performed on them using COLMAP to obtain the camera poses corresponding to the key frames.
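The retrieval matching and incremental reconstruction mentioned here correspond to COLMAP's feature extraction, matching, and mapping stages. The sketch below uses the pycolmap bindings; the function names follow recent pycolmap releases and the paths are placeholders, so this is an assumption-laden outline of the tooling rather than the embodiment's exact pipeline.

```python
import pycolmap

def estimate_poses(image_dir: str, work_dir: str):
    database = f"{work_dir}/database.db"
    pycolmap.extract_features(database, image_dir)  # local features per key frame
    pycolmap.match_exhaustive(database)             # retrieval/feature matching
    maps = pycolmap.incremental_mapping(database, image_dir, work_dir)
    reconstruction = maps[0]                        # the reconstructed model
    # cam_from_world holds each registered key frame's rotation and translation.
    return {img.name: img.cam_from_world for img in reconstruction.images.values()}
```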
Further, the terminal device constructs an initial neural radiation field network, inputs the camera poses corresponding to the multi-angle key frames in the training set into it, and trains to obtain the target neural radiation field network. As shown in fig. 14, the initial neural radiation field network constructed by the terminal device either distinguishes the foreground radiation field from the background radiation field, or comprises a first initial neural radiation field network and a second initial neural radiation field network. Correspondingly, the trained target neural radiation field network either distinguishes a foreground radiation field from a background radiation field, or comprises a first target neural radiation field network and a second target neural radiation field network.
Fig. 16 is a schematic flowchart of the training and application of a neural radiation field network according to an embodiment of the present application. As shown in fig. 16, the terminal device may acquire the camera poses corresponding to the key frames, perform sampling processing on them, and input the sampled camera poses and sampling data into the initial neural radiation field, obtaining either a target neural radiation field network capable of distinguishing the foreground radiation field from the background radiation field, or a target neural radiation field network comprising the first target neural radiation field network and the second target neural radiation field network.
Illustratively, when constructing the training set, the terminal device may determine the constraint of the pose information corresponding to the key frames, that is, the selectable mirror-moving range corresponding to the video to be processed, which is the selectable range of camera poses. According to this selectable mirror-moving range, the terminal device can provide recommended mirror-moving tracks and/or a user-defined mirror-moving track, so as to obtain the target mirror-moving track and the mirror-moving view-angle sequence corresponding to it, that is, the camera pose sequence in the target mirror-moving track. For example, the terminal device displays an interface including recommended mirror-moving modes and/or a custom mirror-moving track, and in response to the user's operation determines a recommended mirror-moving mode or the user-defined mirror-moving track as the target mirror-moving track, thereby achieving interactive mirror-moving selection.
Further, the terminal device may input the mirror-moving view-angle sequence corresponding to the target mirror-moving track into the target neural radiation field network. As shown in fig. 14 and fig. 16, the terminal device may render the image sequence output by the target neural radiation field network with a volume renderer and output the view-angle image corresponding to each view angle in the mirror-moving sequence, so as to obtain the video after mirror-moving processing. It can be understood that this video conforms to the target mirror-moving track.
It should be understood that the interfaces of the terminal device provided in the embodiments of the present application are only examples and do not limit the embodiments of the present application.
The methods provided by the embodiments of the present application are described above with reference to fig. 4 to fig. 16. To implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the exemplary method steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the functional modules of the device implementing the video processing method may be divided according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division; there may be other division manners in actual implementation.
Fig. 17 is a schematic diagram of a hardware structure of a control device according to an embodiment of the present disclosure. As shown in fig. 17, the control device includes a processor 1701, a communication line 1704, and at least one communication interface (the communication interface 1703 in fig. 17 is taken as an example).
The processor 1701 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
The communication line 1704 may include circuitry for communicating information between the aforementioned components.
The communication interface 1703 may use any transceiver-like device for communicating with another device or a communication network, such as Ethernet or a wireless local area network (WLAN).
Possibly, the control device may also include a memory 1702.
The memory 1702 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via the communication line 1704, or may be integrated with the processor.
The memory 1702 is used for storing computer-executable instructions for implementing the present solution, and is controlled by the processor 1701 for execution. The processor 1701 is configured to execute computer executable instructions stored in the memory 1702 to implement the video processing method provided by the embodiments of the present application.
Possibly, the computer-executable instructions in the embodiments of the present application may also be referred to as application program code, which is not specifically limited in the embodiments of the present application.
In a particular implementation, as an embodiment, the processor 1701 may include one or more CPUs, such as CPU0 and CPU1 in fig. 17.
In a particular implementation, as an embodiment, the control device may include multiple processors, such as the processor 1701 and the processor 1705 in fig. 17. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
Exemplarily, fig. 18 is a schematic structural diagram of a chip provided in an embodiment of the present application. Chip 180 includes one or more (including two) processors 1820 and a communication interface 1830.
In some embodiments, memory 1840 stores the following elements: an executable module or a data structure, or a subset thereof, or an expanded set thereof.
In an embodiment of the present application, the memory 1840 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1820. A portion of the memory 1840 may also include non-volatile random access memory (NVRAM).
In the illustrated embodiment, the processor 1820, the communication interface 1830, and the memory 1840 are coupled via a bus system 1810. The bus system 1810 may include a power bus, a control bus, a status signal bus, and the like, in addition to the data bus. For ease of description, the various buses are identified in fig. 18 as the bus system 1810.
The method described in the embodiments of the present application may be applied to the processor 1820 or implemented by the processor 1820. The processor 1820 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 1820. The processor 1820 may be a general-purpose processor (e.g., a microprocessor or a conventional processor), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component, and the processor 1820 may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application.
The steps of the methods disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the field, such as a random access memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable read-only memory (EEPROM). The storage medium resides in the memory 1840, and the processor 1820 reads the information in the memory 1840 and performs the steps of the above method in conjunction with its hardware.
In the above embodiments, the instructions stored by the memory for execution by the processor may be implemented in the form of a computer program product. The computer program product may be written in the memory in advance, or may be downloaded in the form of software and installed in the memory.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available media may include magnetic media (e.g., a floppy disk, a hard disk, or a magnetic tape), optical media (e.g., a digital versatile disc (DVD)), or semiconductor media (e.g., a solid state disk (SSD)).
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer-readable media may include both computer storage media and communication media, and may include any medium that can transfer a computer program from one place to another. A storage medium may be any target medium that can be accessed by a computer.
As one possible design, the computer-readable medium may include a compact disc read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disc storage; the computer-readable medium may also include a disk memory or other disk storage device. Also, any connection may be properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media. The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A video processing method applied to an electronic device is characterized by comprising the following steps:
displaying a video to be processed;
acquiring indication information for indicating the mirror-moving track for processing the video to be processed;
acquiring a key frame of the video to be processed and pose information corresponding to the key frame based on the indication information;
training to obtain a target neural radiation field network according to the key frame and pose information corresponding to the key frame;
inputting a target mirror moving track into the target neural radiation field network to obtain a target image sequence; the target mirror moving track is carried in the indication information, or the target mirror moving track is customized;
and rendering the target image sequence to obtain a video conforming to the target mirror moving track.
2. The method according to claim 1, wherein the training to obtain the target neural radiation field network according to the keyframe and pose information corresponding to the keyframe comprises:
inputting pose information corresponding to the key frame into an initial neural radiation field network to obtain an image frame to be adjusted;
and adjusting the initial neural radiation field network according to the difference between the image frame to be adjusted and the key frame until the difference between the image frame to be adjusted and the key frame meets a preset condition, thereby obtaining the target neural radiation field network.
3. The method according to claim 2, wherein the initial neural radiation field network distinguishes a foreground radiation field from a background radiation field, wherein in the foreground radiation field the volume density corresponding to a camera object distance greater than a preset value is 0, and in the background radiation field the volume density corresponding to a camera object distance less than or equal to the preset value is 0.
4. The method of claim 1, wherein the target neural radiation field network comprises a first target neural radiation field network and a second target neural radiation field network; the training according to the key frame and the pose information corresponding to the key frame to obtain the target nerve radiation field network comprises:
inputting pose information corresponding to the key frame into a first initial neural radiation field network to obtain a foreground image frame to be adjusted;
adjusting the first initial neural radiation field network according to the difference between the foreground of the foreground image frame to be adjusted and the foreground of the key frame until the difference between the foreground of the foreground image frame to be adjusted and the foreground of the key frame meets a first preset condition, and obtaining a first target neural radiation field network;
inputting pose information corresponding to the key frame into a second initial neural radiation field network to obtain a background image frame to be adjusted;
and adjusting the second initial neural radiation field network according to the difference between the background image frame to be adjusted and the background of the key frame until the difference between the background image frame to be adjusted and the background of the key frame meets a second preset condition, so as to obtain a second target neural radiation field network.
5. The method of claim 4, wherein the inputting the target mirror moving track into the target neural radiation field network to obtain a target image sequence comprises:
inputting the target mirror moving track into the first target neural radiation field network and the second target neural radiation field network respectively to obtain a first target image sequence and a second target image sequence;
and fusing the first target image sequence and the second target image sequence to obtain a target image sequence.
6. The method of any one of claims 1-5, wherein the inputting the target mirror moving track into the target neural radiation field network to obtain a target image sequence comprises:
and the target neural radiation field network respectively outputs image frames according to the sequence of the plurality of camera poses in the target mirror moving track to obtain a target image sequence.
7. The method according to any one of claims 1 to 6, wherein the obtaining of the indication information for indicating the mirror-moving track for processing the video to be processed comprises:
receiving a first operation corresponding to the video to be processed;
displaying one or more first recommended mirror-moving tracks in response to the first operation;
when a second operation on a target mirror moving track in the one or more first recommended mirror moving tracks is received, obtaining the indication information, wherein the indication information comprises the target mirror moving track.
8. The method according to any one of claims 1 to 6, wherein the obtaining of the indication information for indicating the mirror-moving track for processing the video to be processed comprises:
receiving a third operation corresponding to the video to be processed;
responding to the third operation to obtain the indication information;
before the target mirror moving track is input into the target neural radiation field network, the method further comprises the following steps:
and acquiring the target mirror moving track.
9. The method of claim 8, wherein the acquiring the target mirror moving track comprises:
displaying a first interface, wherein the first interface comprises one or more second recommended mirror moving tracks, and the one or more second recommended mirror moving tracks are mirror moving tracks meeting the constraint of the pose information corresponding to the key frames;
and when a fourth operation on a target mirror moving track in the one or more second recommended mirror moving tracks is received, obtaining the target mirror moving track.
10. The method of claim 8, wherein the acquiring the target mirror moving track comprises:
displaying a second interface, wherein the second interface comprises an original viewpoint track of the key frame, a recommended viewpoint track, and an editable self-selection viewpoint track;
receiving a fifth operation on the self-selection viewpoint track;
generating the target mirror moving track in response to the fifth operation.
11. The method of claim 10, wherein the self-selection viewpoint track comprises a camera pose to be processed, and the fifth operation is a processing operation on the camera pose to be processed;
the generating the target mirror moving track in response to the fifth operation comprises:
in response to the fifth operation, generating the target mirror motion trajectory according to the processed camera pose.
12. The method according to claim 10, wherein the self-selection viewpoint track comprises a to-be-determined mirror moving mode and a corresponding to-be-determined time length; the fifth operation comprises an operation on a target mirror moving mode and/or a target time length;
the generating the target mirror moving track in response to the fifth operation comprises:
and in response to the fifth operation, generating the target mirror moving track according to the target mirror moving mode and/or the target time length.
13. The method according to any one of claims 1 to 6, wherein the acquiring the keyframe of the video to be processed and the pose information corresponding to the keyframe comprises:
extracting frames from the video to be processed according to a preset time interval to obtain the key frames; acquiring pose information corresponding to the key frame by using a feature retrieval matching algorithm and an incremental reconstruction algorithm;
or,
extracting frames from the video to be processed according to a preset time interval to obtain initial key frames; removing the initial key frames whose sharpness is less than a sharpness threshold, and/or removing the part of the initial key frames whose similarity is greater than a similarity threshold, to obtain the key frames; and acquiring the pose information corresponding to the key frame by using a feature retrieval matching algorithm and an incremental reconstruction algorithm.
14. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, causes the electronic device to perform the method according to any one of claims 1 to 13.
15. A computer-readable storage medium storing a computer program which, when executed by a processor, causes a computer to perform the method according to any one of claims 1 to 13.
16. A computer program product comprising a computer program which, when executed, causes a computer to perform the method according to any one of claims 1 to 13.
CN202210396606.6A 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium Active CN114979785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210396606.6A CN114979785B (en) 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114979785A 2022-08-30
CN114979785B 2023-09-08

Family

ID=82977096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210396606.6A Active CN114979785B (en) 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114979785B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111760286A (en) * 2020-06-29 2020-10-13 完美世界(北京)软件科技发展有限公司 Switching method and device of mirror operation mode, storage medium and electronic device
CN112153242A (en) * 2020-08-27 2020-12-29 北京电影学院 Virtual photography method based on camera behavior learning and sample driving
CN112019768A (en) * 2020-09-04 2020-12-01 北京奇艺世纪科技有限公司 Video generation method and device and electronic equipment
CN114245000A (en) * 2020-09-09 2022-03-25 北京小米移动软件有限公司 Shooting method and device, electronic equipment and storage medium
CN112927271A (en) * 2021-03-31 2021-06-08 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703995A (en) * 2022-10-31 2023-09-05 荣耀终端有限公司 Video blurring processing method and device
CN116703995B (en) * 2022-10-31 2024-05-14 荣耀终端有限公司 Video blurring processing method and device
CN116958492A (en) * 2023-07-12 2023-10-27 数元科技(广州)有限公司 VR editing application based on NeRf reconstruction three-dimensional base scene rendering
CN116958492B (en) * 2023-07-12 2024-05-03 数元科技(广州)有限公司 VR editing method for reconstructing three-dimensional base scene rendering based on NeRf

Also Published As

Publication number Publication date
CN114979785B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN109819313B (en) Video processing method, device and storage medium
CN107087101B (en) Apparatus and method for providing dynamic panorama function
CN115473957B (en) Image processing method and electronic equipment
US10181203B2 (en) Method for processing image data and apparatus for the same
CN114979785B (en) Video processing method, electronic device and storage medium
CN111597000B (en) Small window management method and terminal
CN110070496B (en) Method and device for generating image special effect and hardware device
CN113806306B (en) Media file processing method, device, equipment, readable storage medium and product
CN113409427B (en) Animation playing method and device, electronic equipment and computer readable storage medium
CN113747199A (en) Video editing method, video editing apparatus, electronic device, storage medium, and program product
CN113554932B (en) Track playback method and device
CN116055857A (en) Photographing method and electronic equipment
CN112911337B (en) Method and device for configuring video cover pictures of terminal equipment
CN111031377B (en) Mobile terminal and video production method
CN113038141A (en) Video frame processing method and electronic equipment
CN116095413B (en) Video processing method and electronic equipment
CN112416984A (en) Data processing method and device
CN114222187B (en) Video editing method and electronic equipment
CN114979533A (en) Video recording method, device and terminal
CN114332709A (en) Video processing method, video processing device, storage medium and electronic equipment
CN116797767A (en) Augmented reality scene sharing method and electronic device
CN116089368B (en) File searching method and related device
CN116055799B (en) Multi-track video editing method, graphical user interface and electronic equipment
CN116193275B (en) Video processing method and related equipment
US12019669B2 (en) Method, apparatus, device, readable storage medium and product for media content processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant