CN114979785B - Video processing method, electronic device and storage medium


Info

Publication number
CN114979785B
CN114979785B (application CN202210396606.6A)
Authority
CN
China
Prior art keywords
target
mirror
video
radiation field
track
Prior art date
Legal status
Active
Application number
CN202210396606.6A
Other languages
Chinese (zh)
Other versions
CN114979785A (en)
Inventor
李宇 (Li Yu)
王龙 (Wang Long)
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210396606.6A
Publication of CN114979785A
Application granted
Publication of CN114979785B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present application provides a video processing method and a related device, relating to the field of terminal technologies. The method includes: a terminal device displays a video to be processed; the terminal device acquires indication information indicating a mirror trajectory for processing the video to be processed; based on the indication information, the terminal device acquires key frames of the video to be processed and pose information corresponding to the key frames; the terminal device trains a target neural radiance field network according to the key frames and their corresponding pose information; the terminal device inputs a target mirror trajectory into the target neural radiance field network to obtain a target image sequence, where the target mirror trajectory is carried in the indication information or is user-defined; and the terminal device renders the target image sequence to obtain a video conforming to the target mirror trajectory. In this way, the terminal device can process the video to be processed into a video whose mirror trajectory differs from the original one, improving user experience.

Description

Video processing method, electronic device and storage medium
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a video processing method and a related device.
Background
With the development of terminal technology, shooting video with an electronic device has become commonplace. During shooting, the subject can be filmed by moving the lens, changing the optical axis of the lens, changing the focal length of the lens, and so on, improving the technical effect of the captured video. The path along which the lens moves over the whole shooting process can serve as the mirror trajectory of the video.
Currently, when shooting a video of a certain scene, the captured video may turn out poorly for various reasons, such as limits to the photographer's skill or unexpected events during shooting, and thus fail to meet the user's requirement. The user then has to adopt a new mirror trajectory and reshoot the scene until the captured video meets the requirement.
However, this way of capturing video requires the user to shoot multiple times, which consumes a lot of time and results in poor user experience.
Disclosure of Invention
The embodiments of the present application provide a video processing method and a related device, which can process a shot video into a video whose mirror trajectory meets the user's requirement, thereby improving user experience.
In a first aspect, an embodiment of the present application provides a video processing method, including:
displaying a video to be processed; acquiring indication information indicating a mirror trajectory for processing the video to be processed; acquiring key frames of the video to be processed and pose information corresponding to the key frames based on the indication information; training a target neural radiance field network according to the key frames and their corresponding pose information; inputting a target mirror trajectory into the target neural radiance field network to obtain a target image sequence, where the target mirror trajectory is carried in the indication information or is user-defined; and rendering the target image sequence to obtain a video conforming to the target mirror trajectory. In this way, the embodiment of the present application modifies the mirror trajectory of the video to be processed through a neural radiance field network, obtains a video conforming to the target mirror trajectory, and improves user experience. An end-to-end sketch of this flow is given below.
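For orientation only, the following Python sketch outlines the claimed flow; the function names, the NumPy representation of frames and poses, and the callable radiance field are illustrative assumptions, not structures defined by this application.

```python
# Structural sketch of the claimed flow; all names and types are
# illustrative assumptions, not APIs defined by this application.
from typing import Callable, List, Sequence
import numpy as np

Frame = np.ndarray  # H x W x 3 image
Pose = np.ndarray   # 4 x 4 camera-to-world matrix (position + orientation)

def extract_keyframes(video: Sequence[Frame], interval: int) -> List[Frame]:
    """Extract frames from the video to be processed at a preset interval."""
    return [video[i] for i in range(0, len(video), interval)]

def render_along_trajectory(field: Callable[[Pose], Frame],
                            trajectory: Sequence[Pose]) -> List[Frame]:
    """Query the trained radiance field once per camera pose, in trajectory
    order, so the image sequence conforms to the target mirror trajectory."""
    return [field(pose) for pose in trajectory]
```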
In one possible implementation manner, training the target neural radiance field network according to the key frames and their corresponding pose information includes: inputting the pose information corresponding to a key frame into an initial neural radiance field network to obtain an image frame to be adjusted; and adjusting the initial neural radiance field network according to the difference between the image frame to be adjusted and the key frame until that difference meets a preset condition, to obtain the target neural radiance field network. In this way, the initial neural radiance field network is trained using the camera poses corresponding to the key frames, and a target neural radiance field network that matches the real scene of the video to be processed can be obtained. A minimal training-loop sketch follows.
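A minimal PyTorch training-loop sketch of the adjustment step above. The MSE loss, the Adam optimizer, and a loss threshold standing in for the "preset condition" are assumptions, and the volumetric renderer `render` (which turns a field and a pose into an image tensor) is assumed to be supplied externally.

```python
# Minimal sketch: fit a radiance field so rendered frames match the key
# frames; MSE loss and a threshold stand in for the "difference" and
# "preset condition" of the text. `keyframes` are float image tensors.
import torch

def fit_field(field: torch.nn.Module, render, keyframes, poses,
              threshold: float = 1e-3, max_epochs: int = 1000):
    opt = torch.optim.Adam(field.parameters(), lr=5e-4)
    for _ in range(max_epochs):
        worst = 0.0
        for frame, pose in zip(keyframes, poses):
            pred = render(field, pose)              # image frame to be adjusted
            loss = torch.mean((pred - frame) ** 2)  # difference from the key frame
            opt.zero_grad()
            loss.backward()
            opt.step()
            worst = max(worst, loss.item())
        if worst < threshold:                       # preset condition met
            break
    return field                                    # target neural radiance field network
```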
In one possible implementation, a foreground radiance field and a background radiance field are distinguished within the initial neural radiance field network: in the foreground radiance field, the volume density corresponding to camera object distances less than or equal to a preset value is 0, and in the background radiance field, the volume density corresponding to camera object distances greater than the preset value is 0. By distinguishing the foreground and background radiance fields in this way, the target neural radiance field network is made closer to the real scene corresponding to the video to be processed. A sketch of this masking is given below.
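A sketch of that density split, applied to a network's predicted densities. The direction of the comparison follows the text exactly as stated above, and the tensor shapes are assumptions.

```python
# Zero the predicted volume density on one side of a preset object distance,
# following the foreground/background split as stated in the text.
import torch

def masked_density(sigma: torch.Tensor, object_distance: torch.Tensor,
                   preset: float, foreground: bool) -> torch.Tensor:
    if foreground:
        keep = object_distance > preset   # density at d <= preset is 0
    else:
        keep = object_distance <= preset  # density at d > preset is 0
    return sigma * keep.to(sigma.dtype)
```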
In one possible implementation, the target neural radiance field network includes a first target neural radiance field network and a second target neural radiance field network, and the training includes: inputting the pose information corresponding to the key frame into a first initial neural radiance field network to obtain a foreground image frame to be adjusted; adjusting the first initial neural radiance field network according to the difference between the foreground image frame to be adjusted and the foreground of the key frame until that difference meets a first preset condition, to obtain the first target neural radiance field network; inputting the pose information corresponding to the key frame into a second initial neural radiance field network to obtain a background image frame to be adjusted; and adjusting the second initial neural radiance field network according to the difference between the background image frame to be adjusted and the background of the key frame until that difference meets a second preset condition, to obtain the second target neural radiance field network. In this way, obtaining one network for the foreground of the key frame and another for its background further improves how faithfully the real scene of the video to be processed is reconstructed, which effectively improves the clarity of the rendered video conforming to the target mirror trajectory.
In one possible implementation, inputting the target mirror trajectory into the target neural radiance field network to obtain the target image sequence includes: inputting the target mirror trajectory into the first target neural radiance field network and the second target neural radiance field network respectively, to obtain a first target image sequence and a second target image sequence; and fusing the first target image sequence and the second target image sequence to obtain the target image sequence. In this way, fusing the image sequence corresponding to the foreground with the one corresponding to the background brings the rendered video conforming to the target mirror trajectory closer to footage shot in the real scene, improving user experience. One plausible fusion rule is sketched below.
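One plausible realisation of the fusion step is per-pixel alpha compositing of each foreground frame over its background frame; the application does not fix a particular fusion rule, so the use of an alpha map here is an assumption.

```python
# Composite each foreground frame over its background counterpart;
# alpha compositing is an assumed (not claimed) fusion rule.
import numpy as np

def fuse(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """fg, bg: HxWx3 float images; alpha: HxWx1 foreground opacity in [0, 1]."""
    return alpha * fg + (1.0 - alpha) * bg

def fuse_sequences(fg_seq, bg_seq, alpha_seq):
    """Fuse the first and second target image sequences frame by frame."""
    return [fuse(f, b, a) for f, b, a in zip(fg_seq, bg_seq, alpha_seq)]
```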
In one possible implementation, inputting the target mirror trajectory into the target neural radiance field network to obtain the target image sequence includes: the target neural radiance field network outputs image frames in the order of the plurality of camera poses in the target mirror trajectory, yielding the target image sequence. This ensures that the video corresponding to the target image sequence conforms to the target mirror trajectory.
In one possible implementation manner, obtaining the indication information indicating the mirror trajectory for processing the video to be processed includes: receiving a first operation corresponding to the video to be processed; in response to the first operation, displaying one or more first recommended mirror trajectories; and upon receiving a second operation on a target mirror trajectory among the one or more first recommended mirror trajectories, obtaining the indication information, the indication information including the target mirror trajectory. Thus, by performing an operation on the video to be processed, the user causes the terminal device to display one or more first recommended mirror trajectories, and the terminal device acquires the indication information including the target mirror trajectory according to the operation the user performs on them.
In one possible implementation manner, obtaining the indication information includes: receiving a third operation corresponding to the video to be processed; and obtaining the indication information in response to the third operation. Before the target mirror trajectory is input into the target neural radiance field network, the method further includes: obtaining the target mirror trajectory. In this way, the terminal device obtains the indication information according to the user's operation on the video to be processed, and obtains the target mirror trajectory before inputting it into the target neural radiance field network, ensuring that a video conforming to the target mirror trajectory can be obtained based on that network.
In one possible implementation, obtaining the target mirror trajectory includes: displaying a first interface that includes one or more second recommended mirror trajectories, the one or more second recommended mirror trajectories being trajectories that satisfy the constraints of the pose information corresponding to the key frames; and obtaining the target mirror trajectory upon receiving a fourth operation on a target mirror trajectory among the one or more second recommended mirror trajectories. Because the target mirror trajectory obtained by the terminal device satisfies the constraints of the pose information corresponding to the key frames, images corresponding to the plurality of camera poses in the target mirror trajectory can be obtained accurately.
In one possible implementation, obtaining the target mirror trajectory includes: displaying a second interface that includes the original viewpoint trajectory of the key frames, a recommended viewpoint trajectory, and an editable self-selected viewpoint trajectory; receiving a fifth operation on the self-selected viewpoint trajectory; and generating the target mirror trajectory in response to the fifth operation. The user can thus edit the self-selected viewpoint trajectory in the second interface to obtain an edited target mirror trajectory, improving user experience.
In one possible implementation, the self-selected viewpoint trajectory includes a to-be-processed camera pose, and the fifth operation is an operation that processes this camera pose; generating the target mirror trajectory in response to the fifth operation includes: generating, in response to the fifth operation, the target mirror trajectory from the processed camera pose. The user can thus customize individual camera poses in the mirror trajectory, so that the terminal device obtains a target mirror trajectory composed of the processed camera poses, improving user experience.
In one possible implementation, the self-selected viewpoint trajectory includes a to-be-determined mirror mode and a corresponding to-be-determined duration, and the fifth operation includes an operation on a target mirror mode and/or a target duration; generating the target mirror trajectory in response to the fifth operation includes: generating, in response to the fifth operation, the target mirror trajectory according to the target mirror mode and/or the target duration. The user can thus customize the mirror mode and/or duration in the target mirror trajectory, so that the terminal device generates a target mirror trajectory that meets the user's requirement, improving user experience. One illustrative representation of such a trajectory is sketched below.
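Purely as an illustration of a (mirror mode, duration) representation, the sketch below expands mode segments into a list of camera poses. The segment fields, the mode names (which echo the glossary), and the externally supplied `expand` function are assumptions, not claimed structures.

```python
# Illustrative (mirror mode, duration) representation of a self-selected
# viewpoint trajectory; `expand`, which turns one mirror mode into concrete
# camera poses, is assumed to be supplied elsewhere.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    mode: str        # e.g. "push", "pull", "pan", "truck", "follow", "lift"
    seconds: float   # target duration of this mirror mode

def to_trajectory(segments: List[Segment], fps: int,
                  expand: Callable[[str, int], list]) -> list:
    poses: list = []
    for seg in segments:
        poses += expand(seg.mode, int(seg.seconds * fps))
    return poses
```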
In one possible implementation manner, obtaining the key frames of the video to be processed and the pose information corresponding to the key frames includes: extracting frames from the video to be processed at a preset time interval to obtain the key frames, and acquiring the pose information corresponding to the key frames by using a feature retrieval and matching algorithm and an incremental reconstruction algorithm; or, extracting frames from the video to be processed at a preset time interval to obtain initial key frames, removing initial key frames whose sharpness is below a sharpness threshold and/or removing some initial key frames whose mutual similarity exceeds a similarity threshold to obtain the key frames, and then acquiring the pose information corresponding to the key frames by using the feature retrieval and matching algorithm and the incremental reconstruction algorithm. Filtering the initial key frames by sharpness and/or similarity in this way improves the accuracy of the target neural radiance field network trained on the key frames and their pose information, making it closer to the real scene of the video to be processed. A sketch of such filtering follows.
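A sketch of the filtering variant using OpenCV. Laplacian variance as the sharpness measure and histogram correlation as the similarity measure are illustrative choices, not requirements of the application.

```python
# Illustrative keyframe filtering: Laplacian variance stands in for the
# sharpness measure, histogram correlation for the similarity measure.
import cv2

def filter_keyframes(frames, sharp_thresh=100.0, sim_thresh=0.98):
    kept = []
    for frame in frames:                       # frames: BGR uint8 images
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if cv2.Laplacian(gray, cv2.CV_64F).var() < sharp_thresh:
            continue  # sharpness below the threshold: drop
        hist = cv2.normalize(
            cv2.calcHist([gray], [0], None, [64], [0, 256]), None).flatten()
        if kept and cv2.compareHist(kept[-1][1], hist,
                                    cv2.HISTCMP_CORREL) > sim_thresh:
            continue  # too similar to the previous kept frame: drop
        kept.append((frame, hist))
    return [frame for frame, _ in kept]
```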
In a second aspect, an embodiment of the present application provides a video processing apparatus, where the video processing apparatus may be a terminal device, or may be a chip or a chip system in the terminal device. The video processing apparatus may include a display unit and a processing unit. When the video processing apparatus is a terminal device, the display unit may be a display screen. The display unit is configured to perform the step of displaying, so that the terminal device implements the display-related method described in the first aspect or any one of the possible implementation manners of the first aspect, and the processing unit is configured to implement the processing-related method in the first aspect or any one of the possible implementation manners of the first aspect. When the video processing apparatus is a terminal device, the processing unit may be a processor. The video processing device may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the terminal device implements a method described in the first aspect or any one of possible implementation manners of the first aspect. When the video processing apparatus is a chip or a system of chips within a terminal device, the processing unit may be a processor. The processing unit executes instructions stored by the storage unit to cause the terminal device to implement a method as described in the first aspect or any one of the possible implementations of the first aspect. The memory unit may be a memory unit (e.g., a register, a cache, etc.) in the chip, or a memory unit (e.g., a read-only memory, a random access memory, etc.) located outside the chip in the terminal device.
Exemplarily, the display unit is configured to display the video to be processed. The processing unit is configured to acquire indication information indicating a mirror trajectory for processing the video to be processed, and to acquire, based on the indication information, key frames of the video to be processed and pose information corresponding to the key frames. The processing unit is further configured to train a target neural radiance field network according to the key frames and their corresponding pose information; to input the target mirror trajectory, which is carried in the indication information or is user-defined, into the target neural radiance field network to obtain a target image sequence; and to render the target image sequence to obtain a video conforming to the target mirror trajectory.
In a possible implementation manner, the processing unit is specifically configured to input the pose information corresponding to the key frame into an initial neural radiance field network to obtain an image frame to be adjusted, and to adjust the initial neural radiance field network according to the difference between the image frame to be adjusted and the key frame until that difference meets a preset condition, to obtain the target neural radiance field network.
In a possible implementation manner, a foreground radiance field and a background radiance field are distinguished within the initial neural radiance field network: in the foreground radiance field, the volume density corresponding to camera object distances less than or equal to a preset value is 0, and in the background radiance field, the volume density corresponding to camera object distances greater than the preset value is 0.
In one possible implementation, the target neural radiance field network includes a first target neural radiance field network and a second target neural radiance field network. The processing unit is specifically configured to input the pose information corresponding to the key frame into a first initial neural radiance field network to obtain a foreground image frame to be adjusted; adjust the first initial neural radiance field network according to the difference between the foreground image frame to be adjusted and the foreground of the key frame until that difference meets a first preset condition, to obtain the first target neural radiance field network; input the pose information corresponding to the key frame into a second initial neural radiance field network to obtain a background image frame to be adjusted; and adjust the second initial neural radiance field network according to the difference between the background image frame to be adjusted and the background of the key frame until that difference meets a second preset condition, to obtain the second target neural radiance field network.
In a possible implementation manner, the processing unit is specifically configured to input the target mirror trajectory into the first target neural radiance field network and the second target neural radiance field network respectively, to obtain a first target image sequence and a second target image sequence, and to fuse the first target image sequence and the second target image sequence into the target image sequence.
In one possible implementation manner, the processing unit is specifically configured to cause the target neural radiance field network to output image frames in the order of the plurality of camera poses in the target mirror trajectory, to obtain the target image sequence.
In a possible implementation manner, the display unit is further configured to display one or more first recommended mirror trajectories in response to a first operation corresponding to the video to be processed, and the processing unit is configured to obtain the indication information, which includes the target mirror trajectory, when a second operation on a target mirror trajectory among the one or more first recommended mirror trajectories is received.
In a possible implementation, the processing unit is specifically configured to receive a third operation corresponding to the video to be processed and obtain the indication information in response to the third operation. The processing unit is further configured to acquire the target mirror trajectory before inputting it into the target neural radiance field network.
In a possible implementation manner, the display unit is further configured to display a first interface that includes one or more second recommended mirror trajectories, the one or more second recommended mirror trajectories being trajectories that satisfy the constraints of the pose information corresponding to the key frames, and the processing unit is further configured to obtain the target mirror trajectory when a fourth operation on a target mirror trajectory among the one or more second recommended mirror trajectories is received.
In a possible implementation manner, the display unit is further configured to display a second interface that includes the original viewpoint trajectory of the key frames, a recommended viewpoint trajectory, and an editable self-selected viewpoint trajectory, and the processing unit is further configured to generate the target mirror trajectory in response to a fifth operation on the self-selected viewpoint trajectory.
In a possible implementation manner, the self-selected viewpoint trajectory includes a to-be-processed camera pose, the fifth operation is an operation that processes this camera pose, and the processing unit is specifically configured to generate, in response to the fifth operation, the target mirror trajectory from the processed camera pose.
In one possible implementation manner, the self-selected viewpoint trajectory includes a to-be-determined mirror mode and a corresponding to-be-determined duration, the fifth operation includes an operation on a target mirror mode and/or a target duration, and the processing unit is specifically configured to generate, in response to the fifth operation, the target mirror trajectory according to the target mirror mode and/or the target duration.
In one possible implementation manner, the processing unit is specifically configured to extract frames from the video to be processed at a preset time interval to obtain the key frames and acquire the pose information corresponding to the key frames by using a feature retrieval and matching algorithm and an incremental reconstruction algorithm; or to extract frames at a preset time interval to obtain initial key frames, remove initial key frames whose sharpness is below a sharpness threshold and/or remove some initial key frames whose mutual similarity exceeds a similarity threshold to obtain the key frames, and then acquire the pose information corresponding to the key frames by using the feature retrieval and matching algorithm and the incremental reconstruction algorithm.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, causes the electronic device to perform the video processing method described in the first aspect or any one of its possible implementations.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein a computer program or instructions which, when run on a computer, cause the computer to perform the video processing method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the video processing method described in the first aspect or any one of the possible implementations of the first aspect.
It should be understood that the second aspect to the fifth aspect of the present application correspond to the technical solution of the first aspect, and the advantages obtained by each aspect and its corresponding possible implementations are similar and are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 2 is a software architecture block diagram of a terminal device according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a video processing method according to an embodiment of the present application;
FIG. 4 is a first interface diagram for obtaining indication information according to an embodiment of the present application;
FIG. 5 is a second interface schematic diagram for obtaining indication information according to an embodiment of the present application;
fig. 6 is an interface schematic diagram of displaying a first interface by a terminal device according to an embodiment of the present application;
fig. 7 is an interface schematic diagram of a terminal device displaying a second interface according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an interface for customizing a target mirror trajectory according to an embodiment of the present application;
FIG. 9 is a second interface schematic diagram of a custom target mirror track according to an embodiment of the present application;
FIG. 10 is a first interface diagram illustrating display of a target mirror mode according to an embodiment of the present application;
FIG. 11 is a second interface diagram illustrating display of a target mirror mode according to an embodiment of the present application;
FIG. 12 is a third interface diagram illustrating display of a target mirror mode according to an embodiment of the present application;
FIG. 13 is a flowchart of a method for training a target neural radiation field network according to an embodiment of the present application;
fig. 14 is a flowchart of another video processing method according to an embodiment of the present application;
FIG. 15 is a flowchart of a method for obtaining a keyframe and a camera pose corresponding to the keyframe according to an embodiment of the present application;
FIG. 16 is a schematic flow chart of training and application of a neural radiation field network according to an embodiment of the present application;
fig. 17 is a schematic hardware structure of a control device according to an embodiment of the present application;
fig. 18 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used in the embodiments of the present application to distinguish between identical or similar items having substantially the same function and effect. For example, the first interface and the second interface are merely distinguished as different response interfaces, and their order is not limited. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit number or order of execution, and do not necessarily indicate a difference.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In order to facilitate the clear description of the technical solutions of the embodiments of the present application, the following simply describes some terms and techniques involved in the embodiments of the present application:
pose: i.e. position and pose, is the position of the rigid body in space and its own pose, the camera pose is the position of the camera in space and the orientation of the camera.
Mirror track: representing the moving path of the lens of the camera in three-dimensional space, the mirror trajectory may be constituted by a plurality of camera poses. The mirror trajectory of a video can be understood as the trajectory of the lens movement during shooting.
Mirror operation mode: namely, the mode of moving the lens comprises lens pushing, lens pulling, lens shaking, lens moving, lens following, lens lifting, lens combining and the like. The various lens-carrying modes are realized by means such as moving the position of the camera, changing the optical axis of the lens, changing the focal length of the lens, and the like.
Neural radiation field network: representing Scenes as Neural Radiance Fields for View Synthesis, neRF for short, is an implicit representation of a three-dimensional scene. The method can learn the images with known camera parameters through a neural network to obtain a static three-dimensional scene.
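For readers unfamiliar with NeRF, a toy PyTorch radiance-field network follows: it maps a 3-D position and a 3-D viewing direction to colour and volume density. The layer sizes are assumptions, and the positional encoding and volume rendering that a real NeRF needs are omitted.

```python
# Toy NeRF-style MLP: (position, view direction) -> (RGB colour, density).
# Positional encoding and hierarchical sampling are omitted for brevity.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)     # volume density head
        self.rgb = nn.Sequential(             # view-dependent colour head
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.trunk(xyz)
        sigma = torch.relu(self.sigma(h))
        rgb = self.rgb(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma
```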
In a scene where a terminal device is used to shoot video, the user can shoot videos with different mirror trajectories by controlling the movement of the terminal device. For example, during recording the user can push the lens, moving it toward the subject from far to near so as to close in on the shot object, enhancing the artistic effect of the captured video.
When shooting a video of a certain scene, the technical effect of the captured video may be poor because of limits to the photographer's skill, failing to meet the user's requirement for the video effect. The user can then change the mirror trajectory and shoot the scene again until the captured video meets the requirement. However, repeatedly shooting one scene takes a lot of time, and owing to the same skill limits a video satisfying the user's requirement may never be captured, resulting in poor user experience.
The artistic effect of a video is related to the mirror trajectory used during shooting, and shooting with a better mirror trajectory can effectively improve that effect. Therefore, to solve the poor user experience caused by repeatedly reshooting a scene whose video effect is unsatisfactory, for a video that does not meet the user's requirement the terminal device can establish, through a neural radiance field network, a three-dimensional scene corresponding to the real scene in the video. The terminal device can then output an image sequence along a target mirror trajectory that meets the user's requirement and obtain a video conforming to that trajectory, that is, a video from new viewpoints that do not exist in the original video.
With the technical solution provided by the embodiments of the present application, the user does not need to reshoot a scene repeatedly: the terminal device modifies the mirror trajectory of the already shot video to produce a video with a new mirror trajectory that meets the user's requirement, i.e., a video from new viewpoints. This saves a large amount of time and can effectively improve user experience.
It will be appreciated that the terminal device may be a smartphone or a tablet, or a wearable device such as a smart watch, a smart bracelet, a wearable virtual reality (VR) device, or a wearable augmented reality (AR) device. The embodiments of the present application do not limit the specific technology or device form of the terminal device.
Therefore, in order to better understand the embodiments of the present application, the structure of the terminal device of the embodiments of the present application will be described below. Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
The terminal device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, an indicator 192, a camera 193, a display 194, and the like.
It will be appreciated that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the terminal device. In other embodiments of the application, the terminal device may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units. Wherein the different processing units may be separate devices or may be integrated in one or more processors. A memory may also be provided in the processor 110 for storing instructions and data.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device, to transfer data between the terminal device and a peripheral device, or to connect a headset and play audio through it. The interface may also be used to connect other electronic devices, such as AR devices.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used for connecting the charge management module 140 and the processor 110.
The wireless communication function of the terminal device may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Antennas in the terminal device may be used to cover single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G or the like applied on a terminal device. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wirelesslocal area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), etc. as applied on a terminal device.
The terminal device implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. In some embodiments, the terminal device may include 1 or N display screens 194, N being a positive integer greater than 1.
The terminal device may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The camera 193 is used to capture still images or video. In some embodiments, the terminal device may include 1 or N cameras 193, N being a positive integer greater than 1.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to realize expansion of the memory capability of the terminal device. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The internal memory 121 may include a storage program area and a storage data area.
The terminal device may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The terminal device can listen to music through the speaker 170A or listen to hands-free calls. A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When the terminal device picks up a call or voice message, the voice can be picked up by placing the receiver 170B close to the human ear. The earphone interface 170D is used to connect a wired earphone.
Microphone 170C, also referred to as a "microphone" or "microphone", is used to convert sound signals into electrical signals. In the embodiment of the present application, the terminal device may receive the sound signal for waking up the terminal device based on the microphone 170C and convert the sound signal into an electrical signal that may be processed later, and the terminal device may have at least one microphone 170C.
The sensor module 180 may include one or more of the following sensors, for example: a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, or a bone conduction sensor (not shown in FIG. 1).
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The terminal device may receive key inputs and generate key signal inputs related to user settings and function control of the terminal device. The indicator 192 may be an indicator light and may be used to indicate a charging state, a change in charge, a message, a missed call, a notification, and the like.
The software system of the terminal device 100 may employ a layered architecture, an event driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture, etc. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the terminal device 100 is illustrated.
Fig. 2 is a software architecture block diagram of a terminal device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labour. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, from top to bottom: an application layer, an application framework layer, the Android runtime and system libraries, a hardware abstraction layer, and a kernel layer.
The application layer may include a series of application packages. As shown in fig. 2, the application package may include telephone, mailbox, calendar, camera, and like applications.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application layer applications. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, an activity manager, a location manager, a package manager, a notification manager, a resource manager, a telephony manager, a view system, a frame rate decision manager, and the like.
A window manager (window manager service, WMS) is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The activity manager is used to manage the life cycle of each application and the navigation back-stack function, and is responsible for the creation of Android's main thread and the maintenance of each application's life cycle.
The location manager is used to provide location services for applications, including querying the last known location, registering and deregistering periodic location updates, and the like.
The package manager is used for program management within the system, for example: application installation, uninstallation, and upgrade.
The notification manager allows an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction, for example to announce that a download is complete or to give a message alert. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as notifications of applications running in the background, or in the form of a dialog window on the screen. For example, text may be prompted in the status bar, a prompt tone may sound, the terminal device may vibrate, or an indicator light may blink.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The telephony manager is used to manage mobile device functions, including: call state, obtaining telephone information (device, SIM card, network information), monitoring telephone state, and calling the telephone dialer to place calls.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The frame rate decision manager is used for determining the screen refreshing frame rate of the terminal equipment and selecting a switching mode of the screen refreshing frame rate.
The Android runtime includes core libraries and a virtual machine, and is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part comprises the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in virtual machines. The virtual machine executes java files of the application layer and the application framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like. A display composition process (e.g., surface flinger) also runs in the virtual machine. The display composition process is used to control the composition of the image.
The system library may include a plurality of functional modules. For example: an image drawing module, an image rendering module, an image synthesizing module, a function library, a media library and the like.
The image drawing module is used for drawing two-dimensional or three-dimensional images. The image rendering module is used for rendering two-dimensional or three-dimensional images. The image synthesis module is used for synthesizing two-dimensional or three-dimensional images.
In a possible implementation manner, an application draws an image through the image drawing module, renders the drawn image through the image rendering module, and then sends the rendered image to the buffer queue of the display composition process. Each time Vsync arrives, the display composition process (e.g., SurfaceFlinger) sequentially acquires one frame to be composed from the buffer queue and composes it through the image composition module.
The function library provides macros, type definitions, string operation functions, mathematical calculation functions, input/output functions, and the like used in the C language.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The hardware abstraction layer may include a plurality of library modules, such as a hardware composer module (HWC) and a camera library module. The Android system can load the corresponding library module for the device hardware, so that the application framework layer can access the device hardware. The device hardware may include, for example, the LCD screen and the camera of the electronic device.
The kernel layer is a layer between hardware and software, and is used to drive the hardware so that it works. The kernel layer includes at least LCD/LED drivers, display drivers, audio drivers, camera drivers, sensor drivers, and the like.
The hardware may be an audio device, a bluetooth device, a camera device, a sensor device, etc.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be implemented independently or combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 3 is a flowchart of a video processing method according to an embodiment of the present application. For example, referring to fig. 3, the video processing method may include:
s301, displaying the video to be processed.
In the embodiment of the application, the video to be processed may be a video whose mirror trajectory needs to be changed. It may be a video stored in advance on the terminal device, for example a video previously shot by the user and stored in the gallery of the terminal device, or a video downloaded from the Internet. The embodiment of the application does not limit the specific content of the video to be processed.
In a possible implementation, the terminal device may receive an operation triggered by a user for displaying the video to be processed, and display the video to be processed in response to the operation.
S302, acquiring indication information for indicating a mirror track for processing the video to be processed.
In the embodiment of the present application, the indication information indicating the mirror trajectory for processing the video to be processed may be obtained according to operations the user performs on the display interface of the terminal device. It can be understood that when the user performs different operations on the video to be processed, the terminal device may acquire different indication information, and the indication information may or may not carry the target mirror trajectory.
In a possible implementation, the terminal device may receive an operation triggered by the user for indicating the mirror trajectory for processing the video to be processed, and in response obtain the indication information. By way of example, there are two possible implementations for the terminal device to obtain this indication information. In the following embodiments corresponding to FIG. 4 to FIG. 13, the terminal device is taken to be a mobile phone by way of example; this does not constitute a limitation on the embodiments of the present application.
In a first possible implementation, when the terminal device receives a first operation corresponding to the video to be processed, the terminal device may display one or more first recommended mirror trajectories in response to the first operation. When the terminal device, while displaying the one or more first recommended mirror trajectories, receives a second operation on a target mirror trajectory among them, indication information is obtained; the indication information includes the target mirror trajectory. Fig. 4 is a schematic diagram of an interface for obtaining indication information according to an embodiment of the present application.
In the interface state shown as a in fig. 4, when the mobile phone receives a click operation of the edit control by the user, the mobile phone may display the interface shown as b in fig. 4. In the interface shown as a in fig. 4, the editing control is used for instructing the mobile phone to edit the video displayed on the current interface. As shown in a of fig. 4, one or more controls may also be included in the interface, such as: sharing, collecting, deleting, more, etc. As shown in b of fig. 4, the interface may also include one or more controls, such as: picture-in-picture, rotation, mirror trajectories, deletion, more, etc.
Further, when the mobile phone receives an operation of the user clicking the mirror trajectory control in the interface shown as b in fig. 4, the mobile phone may display the interface shown as c in fig. 4. As shown in c of fig. 4, the interface includes a plurality of controls for first recommended mirror trajectories, namely mirror trajectory 1 through mirror trajectory 8. When the mobile phone receives an operation of the user clicking the recommended mirror trajectory 2 control in the interface shown as c in fig. 4, indication information comprising mirror trajectory 2 is obtained.
In a second possible implementation, when receiving a third operation corresponding to the video to be processed, the terminal device responds to the third operation to obtain the indication information. Fig. 5 is a schematic diagram of an interface for obtaining indication information according to an embodiment of the present application.
In the interface state shown as a in fig. 5, when the mobile phone receives a click operation of the edit control by the user, the mobile phone may display an interface shown as b in fig. 5. When the mobile phone receives the operation of the mirror track control clicked by the user in the interface shown as b in fig. 5, the indication information which does not carry the target mirror track is obtained. Other contents shown in a of fig. 5 are similar to those shown in a of fig. 4, and will not be described again. The content shown in b in fig. 5 is similar to the content shown in b in fig. 4, and will not be described again.
S303, acquiring a key frame of the video to be processed and pose information corresponding to the key frame based on the indication information.
In the embodiment of the application, the key frame can be a video frame capable of representing a real scene corresponding to the video to be processed. The pose information may be a position of a camera corresponding to each key frame in a space of a real scene corresponding to the video to be processed and an orientation of the camera. The key frames and the pose information corresponding to the key frames are used for reconstructing a three-dimensional space of a real scene corresponding to the video to be processed.
It can be understood that the video to be processed includes a large number of video frames, and the terminal device may acquire a plurality of key frames of the video to be processed, so as to ensure that the three-dimensional space reconstructed from the key frames and the pose information corresponding to the key frames is closer to the real scene. When there are a plurality of key frames, the same processing is performed for each key frame.
In a possible implementation, the terminal device may perform frame extraction on the video to be processed based on the indication information, for example, extract frames from the video to be processed at a preset time interval to obtain the key frames of the video to be processed, and analyze the key frames to obtain the pose information corresponding to the key frames. The embodiment of the present application does not particularly limit the preset time interval for extracting the key frames.
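For illustration only, the following is a minimal sketch of frame extraction at a preset time interval using OpenCV; the function name, the default interval, and the fallback frame rate are assumptions rather than part of the embodiment.

```python
import cv2

def extract_key_frames(video_path: str, interval_s: float = 0.5):
    """Sample one candidate key frame every interval_s seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback when FPS metadata is missing
    step = max(1, round(fps * interval_s))
    key_frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            key_frames.append(frame)
        index += 1
    cap.release()
    return key_frames
```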
S304, training to obtain a target neural radiation field network according to the key frames and the pose information corresponding to the key frames.
In the embodiment of the present application, the target neural radiation field network may be used to represent the three-dimensional space of the real scene corresponding to the video to be processed.
The terminal device may input the key frame and pose information corresponding to the key frame to the initial neural radiation field network, and train the initial neural radiation field network to obtain the target neural radiation field network.
S305, inputting the target mirror trajectory into the target neural radiation field network to obtain a target image sequence.
In the embodiment of the present application, the target mirror trajectory is carried in the indication information, or the target mirror trajectory is user-defined. For the case where the target mirror trajectory is carried in the indication information, reference may be made to the description of the above steps, which is not repeated here.
In the embodiment of the application, the target image sequence is an image sequence corresponding to a plurality of camera poses in the target mirror track.
Illustratively, there are two possible implementations of the user-defined target mirror trajectory. In a first possible implementation, the target mirror trajectory is selected by the user from a plurality of recommended mirror trajectories; in a second possible implementation, the target mirror trajectory is a custom mirror trajectory input by the user on an interface of the terminal device.
In a first possible implementation, according to the interface shown in b in fig. 5, when the terminal device receives an operation of the user clicking the mirror trajectory control on the interface shown in b in fig. 5, the terminal device may display a first interface. The first interface may include one or more second recommended mirror trajectories, where the one or more second recommended mirror trajectories are mirror trajectories that meet the constraint of the pose information corresponding to the key frames. When the terminal device receives a fourth operation on a target mirror trajectory among the one or more second recommended mirror trajectories, the target mirror trajectory is obtained.
It can be understood that the constraint of the pose information corresponding to the key frames is the camera pose range corresponding to the three-dimensional space reconstructed by the target neural radiation field network. For example, if the video to be processed captures the front face and the left and right sides of a person, the target neural radiation field network can only reconstruct the three-dimensional space of the real scene accordingly: the camera pose range corresponding to the reconstructed three-dimensional scene can cover the front face and the left and right sides of the person, but cannot cover the back of the person.
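As a non-limiting sketch, one way such a constraint check might be implemented is to test whether every camera position of a candidate mirror trajectory falls within a slightly expanded bounding box of the key-frame camera positions; the bounding-box approximation, the margin value, and the names below are illustrative assumptions (a fuller check could also consider viewing directions).

```python
import numpy as np

def trajectory_within_constraint(candidate_positions: np.ndarray,
                                 keyframe_positions: np.ndarray,
                                 margin: float = 0.2) -> bool:
    """Accept a candidate trajectory only if every camera position stays inside
    the (slightly expanded) bounding box of the key-frame camera positions."""
    lo = keyframe_positions.min(axis=0) - margin
    hi = keyframe_positions.max(axis=0) + margin
    return bool(np.all((candidate_positions >= lo) & (candidate_positions <= hi)))
```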
Fig. 6 is an interface schematic diagram of a terminal device displaying a first interface according to an embodiment of the present application. As shown in fig. 6, the interface may include one or more controls for second recommended mirror trajectories, for example: mirror trajectory A, mirror trajectory B, mirror trajectory C, mirror trajectory D, mirror trajectory E, mirror trajectory F, etc. The plurality of second recommended mirror trajectories in the interface shown in fig. 6 all conform to the constraint of the pose information corresponding to the key frames.
In a second possible implementation, according to the interface shown in b in fig. 5, when the terminal device receives the operation of clicking the mirror trajectory control by the user on the interface shown in b in fig. 5, the terminal device may display a second interface, where the second interface includes an original viewpoint trajectory, a recommended viewpoint trajectory, and an editable self-selection viewpoint trajectory of the key frame. And when receiving a fifth operation on the self-selection viewpoint track, the terminal equipment responds to the fifth operation to generate a target mirror track.
Fig. 7 is an interface schematic diagram of a terminal device displaying a second interface according to an embodiment of the present application.
As shown in fig. 7, the second interface includes an original viewpoint trajectory, a recommended viewpoint trajectory, and an editable self-selection viewpoint trajectory. The original viewpoint trajectory, the recommended viewpoint trajectory, and the editable self-selection viewpoint trajectory are all represented by line segments with arrows; a plurality of editable camera positions are distributed on the editable self-selection viewpoint trajectory, and the editable camera positions are displayed using camera-like marks in the interface shown in fig. 7. The interface may also include a custom mirror trajectory control.
The self-selection viewpoint trajectory in the second interface displayed by the terminal device may include a camera pose to be processed, and may also include a target mirror mode and/or a target duration. There are two possible implementations for the terminal device to generate the target mirror trajectory in response to the fifth operation on the self-selection viewpoint trajectory. In a first possible implementation, the fifth operation may be an operation of processing the camera pose to be processed included in the self-selection viewpoint trajectory. In a second possible implementation, the fifth operation may be an operation on the target mirror mode and/or the target duration included in the self-selection viewpoint trajectory.
In a first possible implementation, the self-selection viewpoint track in the second interface displayed by the terminal device may include a camera pose to be processed, and the terminal device may generate the target mirror track according to the processed camera pose in response to a fifth operation of processing the camera pose to be processed.
Fig. 8 is a schematic diagram of an interface of a custom target mirror track according to an embodiment of the present application. In the interface state shown in a in fig. 8, when the mobile phone receives an operation, for example, a drag operation or a click operation, of a user on a plurality of to-be-processed camera poses on a self-selected viewpoint track in the interface, an interface shown in b in fig. 8 is obtained, and the self-selected viewpoint track formed by the plurality of camera poses in the interface is the target mirror track. The content shown in a in fig. 8 is similar to that shown in fig. 7, and will not be described again here.
Fig. 9 is a schematic diagram of an interface of a custom target mirror trajectory according to an embodiment of the present application. In the interface state shown as a in fig. 9, when the mobile phone receives a click operation of the user on the custom mirror trajectory control in the interface, the mobile phone may display the interface shown as b in fig. 9. When the mobile phone receives operations of the user in the interface shown as b in fig. 9, a plurality of camera poses are distributed on the mirror trajectory in that interface. For example, the user performs click operations at different positions of the interface shown as b in fig. 9, and a target mirror trajectory is generated according to the order of the click positions; or the user performs a gesture operation in the interface shown as b in fig. 9, and the gesture operation trajectory is the target mirror trajectory. The content shown in a in fig. 9 is similar to that shown in fig. 7, and will not be described again here.
In a second possible implementation, the self-selection viewpoint track in the second interface displayed by the terminal device may include a mirror mode to be determined and a corresponding duration to be determined. And the terminal equipment responds to a fifth operation on the target mirror mode and/or the target time length, and generates a target mirror track according to the target mirror mode and/or the target time length.
For example, the terminal device may generate the target mirror trajectory according to the target mirror mode and the target duration in response to a fifth operation on the target mirror mode and the target duration. Fig. 10 is a schematic diagram of an interface for displaying a target mirror mode according to an embodiment of the present application. When receiving operations of the user clicking the roll control and the push control and filling in the corresponding durations in the interface shown in fig. 10, the mobile phone generates a target mirror trajectory according to the 15-second duration corresponding to the roll and the 5-second duration corresponding to the push. As shown in the interface of fig. 10, when the mobile phone receives a click operation of the user on a mirror mode, the sequence number of that mirror mode in the mirror trajectory can be displayed on the control of the mirror mode according to the order of the click operations. With the interface shown in fig. 10, the mobile phone generates a mirror trajectory of a 15-second roll followed by a 5-second push.
For example, the terminal device may generate the target mirror trajectory according to the target mirror mode in response to a fifth operation on the target mirror mode. Fig. 11 is a second interface schematic diagram of displaying a target mirror mode according to an embodiment of the present application. When receiving operations of the user clicking the roll control and the push control in the interface shown in fig. 11, the mobile phone displays, according to the order of the click operations, the sequence number of each mirror mode in the mirror trajectory and the corresponding mirror duration on the selected mirror mode control. When the mobile phone receives the user clicking the determination control in the interface shown in fig. 11, the mobile phone generates a target mirror trajectory according to the roll and its corresponding first mirror duration, and the push and its corresponding second mirror duration. With the interface shown in fig. 11, the mobile phone generates a mirror trajectory of a 15-second roll followed by a 5-second push.
For example, the terminal device may generate the target mirror trajectory according to the target time length in response to a fifth operation on the target time length. Fig. 12 is a schematic diagram of an interface for displaying a target mirror mode according to an embodiment of the present application. As shown in the interface of fig. 12, the user may input the mirror duration corresponding to the roll and the mirror duration corresponding to the zoom out in the interface of fig. 12. When the mobile phone receives the mirror duration input by the user in the interface shown in fig. 12 and receives the click operation of the user on the determination control in the interface shown in fig. 12, a target mirror track is generated according to the roll and the mirror duration corresponding thereto and the zoom-out and the mirror duration corresponding thereto.
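For illustration, the sketch below shows one plausible way to turn a sequence of (mirror mode, duration) pairs into a camera position sequence, matching the 15-second roll followed by a 5-second push in the examples above. Interpreting "roll" as an orbit around the subject and "push" as a dolly toward it, as well as the frame rate and all names, are assumptions.

```python
import numpy as np

FPS = 25  # assumed output frame rate

def push_segment(start_pos, look_at, duration_s, distance=1.0):
    """'Push': dolly the camera from start_pos toward look_at over duration_s seconds."""
    n = max(2, int(duration_s * FPS))
    direction = (look_at - start_pos) / np.linalg.norm(look_at - start_pos)
    return [start_pos + direction * distance * t / (n - 1) for t in range(n)]

def roll_segment(center, radius, duration_s, height=0.0):
    """'Roll': orbit the camera once around center at the given radius."""
    n = max(2, int(duration_s * FPS))
    angles = np.linspace(0.0, 2.0 * np.pi, n)
    return [center + np.array([radius * np.cos(a), height, radius * np.sin(a)])
            for a in angles]

# A 15-second roll followed by a 5-second push, as in the example above.
subject = np.array([0.0, 0.0, 0.0])
positions = roll_segment(subject, radius=2.0, duration_s=15)
positions += push_segment(positions[-1], subject, duration_s=5)
# Each position would still be paired with an orientation (e.g., looking at the
# subject) to form the full camera pose sequence of the mirror trajectory.
```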
In the embodiment of the present application, the terminal device can obtain the target mirror trajectory in any of the above manners, input the obtained target mirror trajectory into the target neural radiation field network, and the target neural radiation field network outputs image frames according to the order of the plurality of camera poses in the target mirror trajectory, so as to obtain the target image sequence.
S306, rendering the target image sequence to obtain a video conforming to the target mirror trajectory.
The terminal device may render the target image sequence through a volume renderer, that is, traverse all pixels of the image under each camera pose in the target mirror trajectory, determine the camera ray corresponding to each pixel, and integrate the color and the volume density of the sampling points on each camera ray to obtain the video frame under each camera pose. The terminal device can then generate a video conforming to the target mirror trajectory according to the order of the camera poses in the target mirror trajectory and the video frames corresponding to the camera poses.
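The per-ray integration described here follows the standard volume-rendering quadrature; the sketch below illustrates it for one ray, with per-sample colors, densities, and inter-sample distances as inputs (names are illustrative).

```python
import numpy as np

def render_ray(colors, sigmas, deltas):
    """Composite per-sample colors (N, 3), densities (N,), and inter-sample
    distances (N,) along one camera ray into a single pixel color."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # accumulated transmittance
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)
```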
Based on the above, the target neural radiation field network is obtained by the terminal device according to the key frames of the video to be processed and the pose information corresponding to the key frames, which realizes reconstruction of the real scene corresponding to the video to be processed. When the target mirror trajectory is input into the target neural radiation field network, the network can output the image frames corresponding to the plurality of camera poses in the target mirror trajectory, so as to obtain a video conforming to the target mirror trajectory. The mirror trajectory of the captured video is thereby modified to obtain a video with a new viewing angle, improving the user experience.
In order to facilitate understanding of the video processing method provided in the embodiment of the present application, the following describes in detail the obtaining of the target neural radiation field network described in step S303 and step S304 in the above embodiment. For example, refer to fig. 13; fig. 13 is a schematic flow chart of a method for training to obtain a target neural radiation field network according to an embodiment of the present application. The method may comprise the following steps:
S1301, acquiring the key frames of the video to be processed and the pose information corresponding to the key frames.
In the embodiment of the application, the key frame of the video to be processed can be a video frame in the video to be processed, wherein the video frame can represent a real scene in the video.
In the embodiment of the application, the terminal equipment obtains the key frame of the video to be processed and the pose information corresponding to the key frame, which comprises two possible implementations.
In a first possible implementation, the terminal device may extract frames from the video to be processed according to a preset time interval, to obtain key frames; and acquiring pose information corresponding to the key frame by utilizing a feature retrieval matching algorithm and an incremental reconstruction algorithm. For example, the preset time interval may be inversely related to the speed of the camera moving in the video to be processed, and the embodiment of the present application does not limit the preset time interval.
In a second possible implementation, the terminal device may extract frames from the video to be processed at a preset time interval to obtain initial key frames; the terminal device removes initial key frames whose sharpness is smaller than a sharpness threshold and/or removes part of the initial key frames whose similarity is larger than a similarity threshold, to obtain the key frames; and pose information corresponding to the key frames is acquired by using a feature retrieval matching algorithm and an incremental reconstruction algorithm. For example, the preset time interval may be inversely related to the speed of camera movement in the video to be processed; the embodiment of the present application does not limit the preset time interval. The sharpness threshold and the similarity threshold may be set according to practical situations; for example, the sharpness threshold may be 80%, 85%, or other values, and the similarity threshold may be 90%, 95%, or other values, which is not limited in the embodiment of the present application.
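As an illustrative sketch of the screening step, blurry initial key frames can be dropped using the variance of the Laplacian as a sharpness proxy, and near-duplicates using histogram similarity with the last kept frame. Note that the embodiment expresses both thresholds as percentages, whereas the absolute sharpness threshold below is chosen only for illustration; the function names are likewise assumptions.

```python
import cv2
import numpy as np

def filter_key_frames(frames, sharpness_thresh=100.0, similarity_thresh=0.95):
    """Drop blurry frames (low Laplacian variance) and near-duplicates
    (high grayscale-histogram similarity with the last kept frame)."""
    kept, last_hist = [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if cv2.Laplacian(gray, cv2.CV_64F).var() < sharpness_thresh:
            continue  # too blurry
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if last_hist is not None:
            cos_sim = float(np.dot(hist, last_hist) /
                            (np.linalg.norm(hist) * np.linalg.norm(last_hist)))
            if cos_sim > similarity_thresh:
                continue  # too similar to the previous kept frame
        kept.append(frame)
        last_hist = hist
    return kept
```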
In the embodiment of the present application, there are two possible implementations for the terminal device to train the target neural radiation field network according to the key frames of the video to be processed and the pose information corresponding to the key frames. In a first possible implementation, the training results in a target neural radiation field network that distinguishes between a foreground radiation field and a background radiation field, as described in step S1302 and step S1303. In a second possible implementation, two target neural radiation field networks are obtained through training, where one is the neural radiation field network corresponding to the foreground of the video to be processed and the other is the neural radiation field network corresponding to the background of the video to be processed, as described in step S1304 and step S1305.
S1302, inputting pose information corresponding to the key frame into an initial neural radiation field network to obtain an image frame to be adjusted.
In the embodiment of the present application, the initial neural radiation field network may be a neural radiation field network constructed by the terminal device. The image frame to be adjusted may be an image frame corresponding to a key frame, with the same camera pose as that key frame. The image frames to be adjusted may be used to determine whether the trained initial neural radiation field network has reached convergence.
For example, the initial neural radiation field network may distinguish a foreground radiation field from a background radiation field: in the foreground radiation field, the volume density corresponding to camera object distances greater than a preset value is 0, and in the background radiation field, the volume density corresponding to camera object distances less than or equal to the preset value is 0. Distinguishing the foreground radiation field from the background radiation field can ensure the accuracy and sharpness of the reconstructed background of the video to be processed.
In the embodiment of the present application, the volume density is the volume density of sampling points on the camera rays corresponding to the video frames in the video to be processed. The preset value is the camera object distance used to distinguish the foreground from the background of the video to be processed; its specific value is not limited in the embodiment of the present application.
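A minimal sketch of one way to realize this split is given below, assuming the sampling depth t along a camera ray approximates the camera object distance; the names and the masking approach are illustrative only.

```python
import numpy as np

def split_density(sigma: np.ndarray, t: np.ndarray, t_split: float):
    """Foreground field keeps samples with depth t <= t_split (density 0 beyond);
    background field keeps samples with depth t > t_split (density 0 before)."""
    sigma_fg = sigma * (t <= t_split)
    sigma_bg = sigma * (t > t_split)
    return sigma_fg, sigma_bg
```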
In a possible implementation, the terminal device inputs pose information corresponding to the key frame into the initial neural radiation field network, trains the initial neural radiation field network, and in the training process, the initial neural radiation field network can output an image frame to be adjusted under the camera pose according to the input camera pose.
S1303, adjusting the initial neural radiation field network according to the difference between the image frame to be adjusted and the key frame until the difference between the image frame to be adjusted and the key frame meets a preset condition, to obtain the target neural radiation field network.
In the embodiment of the present application, the difference between the image frame to be adjusted and the key frame can be used to determine whether the initial neural radiation field network has converged during training. The preset condition is a condition for convergence of the initial neural radiation field network, which is not limited in the embodiment of the present application.
For example, in the process of training the initial neural radiation field network by the terminal device, the terminal device can perform iterative training on the initial neural radiation field model according to the key frames and pose information corresponding to the key frames. The terminal equipment can determine whether the trained initial neural radiation field network is converged according to the image frame to be adjusted and the key frame, and when the difference between the image frame to be adjusted and the key frame meets the preset condition, namely the convergence condition is met, the initial neural radiation field network is determined to be converged, and the terminal equipment obtains the target neural radiation field network.
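For illustration, a training loop of this kind might look as follows, assuming a model that renders pixel colors from camera rays, a mean-squared photometric error as the "difference", and a fixed tolerance as the "preset condition"; the interface of model, the learning rate, and the stopping values are all assumptions.

```python
import torch

def train_nerf(model, rays, target_rgb, max_iters=200_000, tol=1e-3):
    """Iteratively fit the radiance field to the key frames; stop when the
    photometric difference between rendered and key-frame pixels is small."""
    optim = torch.optim.Adam(model.parameters(), lr=5e-4)
    for step in range(max_iters):
        pred_rgb = model(rays)                  # render pixels at key-frame poses
        loss = torch.mean((pred_rgb - target_rgb) ** 2)
        optim.zero_grad()
        loss.backward()
        optim.step()
        if loss.item() < tol:                   # "preset condition" met
            break
    return model
```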
It can be appreciated that since the initial neural radiation field network distinguishes between the foreground radiation field and the background radiation field, the training-derived target neural radiation field network is closer to the real scene corresponding to the video to be processed.
S1304, inputting the pose information corresponding to the key frames into a first initial neural radiation field network to obtain a foreground image frame to be adjusted; and adjusting the first initial neural radiation field network according to the difference between the foreground image frame to be adjusted and the foreground of the key frame, until the difference between the foreground image frame to be adjusted and the foreground of the key frame meets a first preset condition, so as to obtain a first target neural radiation field network.
In the embodiment of the application, the foreground image frame to be adjusted can be an image frame corresponding to the foreground in the key frame, and the foreground image frame to be adjusted corresponds to the camera pose of the key frame. The first target neural radiation field network may be a three-dimensional space corresponding to a foreground of a real scene corresponding to the video to be processed. The first preset condition is a convergence condition of the first initial neural radiation field network, which is not limited in the embodiment of the present application.
For example, in the process of training the first initial neural radiation field network, the terminal device may perform iterative training on the first initial neural radiation field network according to the key frames and the pose information corresponding to the key frames. The terminal device can determine whether the trained first initial neural radiation field network has converged according to the foreground image frame to be adjusted and the foreground of the key frame; when the difference between the two meets the first preset condition, that is, the convergence condition is met, the first initial neural radiation field network is determined to have converged, and the terminal device obtains the first target neural radiation field network.
S1305, inputting pose information corresponding to the key frame into a second initial neural radiation field network to obtain a background image frame to be adjusted; and adjusting a second initial neural radiation field network according to the difference between the background image frame to be adjusted and the background of the key frame until the difference between the background image frame to be adjusted and the background of the key frame meets a second preset condition, so as to obtain a second target neural radiation field network.
In the embodiment of the application, the background image frame to be adjusted can be an image frame corresponding to the background in the key frame, and the background image frame to be adjusted corresponds to the camera pose of the key frame. The second target neural radiation field network may be a three-dimensional space corresponding to a background of a real scene corresponding to the video to be processed. The second preset condition is a convergence condition of the second initial neural radiation field network, which is not limited in the embodiment of the present application.
For example, in the process of training the second initial neural radiation field network, the terminal device may perform iterative training on the second initial neural radiation field network according to the key frames and the pose information corresponding to the key frames. The terminal device can determine whether the trained second initial neural radiation field network has converged according to the background image frame to be adjusted and the background of the key frame; when the difference between the two meets the second preset condition, that is, the convergence condition is met, the second initial neural radiation field network is determined to have converged, and the terminal device obtains the second target neural radiation field network.
According to the second possible implementation of training the neural radiation field, for step S305 in fig. 3, after the first target neural radiation field network and the second target neural radiation field network are obtained, the method for inputting the target mirror trajectory into the target neural radiation field network to obtain the target image sequence is as follows:
The terminal device may input the target mirror trajectory into the first target neural radiation field network and the second target neural radiation field network, respectively, to obtain a first target image sequence and a second target image sequence; and fuse the first target image sequence and the second target image sequence to obtain the target image sequence.
In the embodiment of the application, the first target image sequence is an image sequence of foreground images corresponding to a plurality of camera poses in a target mirror track. The second target image sequence is an image sequence of a background image in a plurality of camera poses in the target mirror trajectory.
For example, the terminal device may fuse the first target image sequence and the second target image sequence according to the object distances corresponding to the first target image sequence and the object distances corresponding to the second target image sequence, to obtain the target image sequence; that is, the foreground image sequence and the background image sequence are fused to obtain the target image sequence corresponding to the real scene. For example, the colors and volume densities of the sampling points on the camera rays corresponding to the first target image sequence and on the camera rays corresponding to the second target image sequence may be fused according to the corresponding object distances to obtain fused camera rays, which are then rendered into the target image sequence.
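A sketch of one way such a fusion could be performed per ray is shown below, by merging the foreground and background samples in depth (object-distance) order and compositing them with the quadrature from the earlier render_ray sketch; the names and the sorting-based merge are illustrative assumptions.

```python
import numpy as np

def fuse_fg_bg(colors_fg, sigmas_fg, colors_bg, sigmas_bg, t_fg, t_bg):
    """Merge foreground and background ray samples by object distance (depth),
    then composite them as a single ray with the usual quadrature."""
    t = np.concatenate([t_fg, t_bg])
    order = np.argsort(t)                             # sort samples front-to-back
    colors = np.concatenate([colors_fg, colors_bg])[order]
    sigmas = np.concatenate([sigmas_fg, sigmas_bg])[order]
    deltas = np.diff(t[order], append=t[order][-1] + 1e-3)
    return render_ray(colors, sigmas, deltas)          # from the earlier sketch
```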
Based on the method, the terminal device can train a target neural radiation field network with higher accuracy according to the key frames of the video to be processed and the camera poses corresponding to the key frames, which effectively improves the sharpness of the video conforming to the target mirror trajectory obtained from the target neural radiation field network.
Fig. 14 is a flowchart of another video processing method according to an embodiment of the present application. According to fig. 14, the method for processing the mirror trajectory of the shot video to be processed includes:
when receiving the operation of processing the mirror track of the video to be processed by the user, the terminal equipment performs key frame screening and camera pose estimation on the video to be processed, and builds a training set. The training set comprises multi-angle key frames of the video to be processed and pose information corresponding to the key frames.
Fig. 15 is a flowchart of a method for acquiring a keyframe and a camera pose corresponding to the keyframe according to an embodiment of the present application. According to fig. 15, a terminal device may obtain a video to be processed, perform image sharpness estimation and motion displacement estimation on a video sequence to be processed, and perform frame extraction on the video to be processed according to a time interval threshold, so as to obtain multi-angle key frames in the video to be processed. It may be appreciated that the motion displacement estimation may determine a time interval threshold, and the terminal device may perform frame extraction on the video sequence to be processed according to the time interval threshold to obtain an initial key frame, and the method of performing frame extraction processing according to the time interval threshold may refer to the description about frame extraction in the above embodiment, which is not repeated herein. The sharpness estimation can be used for screening the obtained initial key frames to obtain final multi-angle key frames. After obtaining the key frame, carrying out search matching and incremental reconstruction processing on the key frame by utilizing COLMAP to obtain the camera pose corresponding to the key frame.
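For illustration, the retrieval-matching and incremental-reconstruction steps can be driven through COLMAP's command-line interface roughly as follows; the paths are placeholders, and using the exhaustive matcher (rather than a retrieval-based matcher) is an assumption made for brevity.

```python
# Sketch of pose estimation via the COLMAP command-line tool:
# feature extraction, matching, then incremental reconstruction (mapping).
import subprocess

def estimate_poses(image_dir: str, workspace: str):
    db = f"{workspace}/database.db"
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", db], check=True)
    subprocess.run(["colmap", "mapper",
                    "--database_path", db, "--image_path", image_dir,
                    "--output_path", f"{workspace}/sparse"], check=True)
    # Camera poses can then be read from the sparse model under {workspace}/sparse.
```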
Further, the terminal device constructs an initial neural radiation field network, inputs the camera poses corresponding to the multi-angle key frames in the training set into the initial neural radiation field network, and trains to obtain a target neural radiation field network. As illustrated in fig. 14, the terminal device constructs an initial neural radiation field network that can distinguish between a foreground radiation field and a background radiation field, or the constructed initial neural radiation field network includes a first initial neural radiation field network and a second initial neural radiation field network. Correspondingly, the target neural radiation field network obtained through training can distinguish a foreground radiation field from a background radiation field, or the target neural radiation field network comprises a first target neural radiation field network and a second target neural radiation field network.
Fig. 16 is a schematic flow chart of training and application of a neural radiation field network according to an embodiment of the present application. According to fig. 16, the terminal device may acquire the camera poses corresponding to the key frames, sample the camera poses, and input the sampled camera poses and the sampling data to the initial neural radiation field, to obtain a target neural radiation field network capable of distinguishing a foreground radiation field from a background radiation field, or a target neural radiation field network comprising a first target neural radiation field network and a second target neural radiation field network.
When the terminal device constructs the training set, the terminal device can determine the constraint of the pose information corresponding to the key frames, that is, the selectable mirror range corresponding to the video to be processed, namely the selectable camera pose range. The terminal device can provide recommended mirror trajectories and/or a custom mirror trajectory for the user according to the selectable mirror range, so as to acquire the target mirror trajectory and the mirror view angle sequence corresponding to the target mirror trajectory, that is, the camera pose sequence in the target mirror trajectory. For example, the terminal device displays an interface including recommended mirror modes and/or a custom mirror trajectory, so that, in response to the user's operation, the terminal device determines the selected mirror mode as the target mirror trajectory or determines the custom mirror trajectory as the target mirror trajectory, thereby realizing interactive mirror selection.
Further, the terminal device may input a sequence of mirror views corresponding to the target mirror trajectory to the target neural radiation field network. According to fig. 14 and 16, the terminal device may render, by using a volume renderer, an image sequence output by the target neural radiation field network, and output a view angle image corresponding to the mirror view angle sequence, that is, an image under each mirror view angle, so as to obtain a video after mirror processing. It can be appreciated that the mirror processed video conforms to the target mirror trajectory.
It may be understood that the interface of the terminal device provided in the embodiment of the present application is only used as an example, and is not limited to the embodiment of the present application.
The method provided by the embodiment of the present application is described above with reference to fig. 4 to fig. 16; to implement the above functions, the terminal device includes corresponding hardware structures and/or software modules that perform the respective functions. Those skilled in the art will readily appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide the functional modules of the device implementing the video processing method according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated in one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of the modules in the embodiment of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.
Fig. 17 is a schematic hardware structure of a control device according to an embodiment of the present application, as shown in fig. 17, where the control device includes a processor 1701, a communication line 1704 and at least one communication interface (illustrated in fig. 17 by taking a communication interface 1703 as an example).
The processor 1701 may be a general purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
Communication lines 1704 may include circuitry to communicate information between the components described above.
The communication interface 1703 uses any transceiver-like device for communicating with other devices or communication networks, such as ethernet, wireless local area network (wireless local area networks, WLAN), etc.
Possibly, the control device may also comprise a memory 1702.
The memory 1702 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be separate and coupled to the processor via the communication line 1704. The memory may also be integrated with the processor.
The memory 1702 is used for storing computer-executable instructions for performing the aspects of the present application, and is controlled by the processor 1701 for execution. The processor 1701 is configured to execute computer-executable instructions stored in the memory 1702, thereby implementing the video processing method provided by the embodiment of the present application.
Possibly, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not limited in particular.
In a particular implementation, the processor 1701 may include one or more CPUs, such as CPU0 and CPU1 in fig. 17, as an embodiment.
In a specific implementation, as an embodiment, the control device may include a plurality of processors, such as processor 1701 and processor 1705 in fig. 17. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
Fig. 18 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip 180 includes one or more (including two) processors 1820 and a communication interface 1830.
In some implementations, the memory 1840 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.
In an embodiment of the application, memory 1840 may include read only memory and random access memory and provide instructions and data to processor 1820. A portion of the memory 1840 may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
In an embodiment of the application, the memory 1840, the communication interface 1830, and the processor 1820 are coupled together by a bus system 1810. The bus system 1810 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For ease of description, the various buses are labeled as the bus system 1810 in fig. 18.
The methods described above for embodiments of the present application may be implemented in the processor 1820 or by the processor 1820. Processor 1820 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or instructions in software in processor 1820. The processor 1820 may be a general purpose processor (e.g., a microprocessor or a conventional processor), a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), an off-the-shelf programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, discrete gates, transistor logic, or discrete hardware components, and the processor 1820 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the application.
The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable read-only memory (EEPROM). The storage medium is located in the memory 1840, and the processor 1820 reads the information from the memory 1840 and performs the steps of the above method in combination with its hardware.
In the above embodiments, the instructions stored by the memory for execution by the processor may be implemented in the form of a computer program product. The computer program product may be written in the memory in advance, or may be downloaded in the form of software and installed in the memory.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be, for example, a magnetic medium, an optical medium, or a semiconductor medium (e.g., a solid state disk (SSD)).
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.
As one possible design, the computer-readable medium may include a compact disc read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disc storage; the computer-readable medium may include disk storage or other disk storage devices. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media. The foregoing is merely an illustrative embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any person skilled in the art can easily conceive of variations or substitutions within the technical scope disclosed by the present invention, and such variations or substitutions should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention is subject to the protection scope of the claims.

Claims (11)

1. A video processing method applied to an electronic device, the method comprising:
displaying a video to be processed;
acquiring indication information for indicating a mirror trajectory for processing the video to be processed;
acquiring a key frame of the video to be processed and pose information corresponding to the key frame based on the indication information;
training to obtain a target neural radiation field network according to the key frame and pose information corresponding to the key frame; the target neural radiation field network is used for representing a three-dimensional space of a real scene corresponding to the video to be processed;
inputting a target mirror track into the target neural radiation field network to obtain a target image sequence; the target mirror track is carried in the indication information, or the target mirror track is user-defined;
rendering the target image sequence to obtain a video conforming to the target mirror track;
the target neural radiation field network comprises a first target neural radiation field network and a second target neural radiation field network; training to obtain a target neural radiation field network according to the key frame and pose information corresponding to the key frame, wherein the training comprises the following steps:
inputting pose information corresponding to the key frame into a first initial neural radiation field network to obtain a foreground image frame to be adjusted;
adjusting the first initial neural radiation field network according to the difference between the foreground image frame to be adjusted and the foreground of the key frame until the difference between the foreground image frame to be adjusted and the foreground of the key frame meets a first preset condition, so as to obtain a first target neural radiation field network; the first target neural radiation field network is a three-dimensional space corresponding to the foreground of the real scene corresponding to the video to be processed;
inputting pose information corresponding to the key frame into a second initial neural radiation field network to obtain a background image frame to be adjusted;
adjusting the second initial neural radiation field network according to the difference between the background image frame to be adjusted and the background of the key frame until the difference between the background image frame to be adjusted and the background of the key frame meets a second preset condition, so as to obtain a second target neural radiation field network; the second target neural radiation field network is a three-dimensional space corresponding to the background of the real scene corresponding to the video to be processed;
inputting the target mirror track into the target neural radiation field network to obtain the target image sequence comprises:
inputting the target mirror track into the first target neural radiation field network and the second target neural radiation field network respectively, to obtain a first target image sequence and a second target image sequence;
and fusing the first target image sequence and the second target image sequence to obtain a target image sequence.
2. The method of claim 1, wherein inputting the target mirror trajectory into the target neural radiation field network results in a sequence of target images, comprising:
the target neural radiation field network outputs image frames according to the order of the plurality of camera poses in the target mirror trajectory, to obtain the target image sequence.
3. The method according to claim 1 or 2, wherein the obtaining the indication information for indicating the mirror trajectory for processing the video to be processed includes:
receiving a first operation corresponding to the video to be processed;
in response to the first operation, displaying one or more first recommended mirror trajectories;
and when receiving a second operation on a target mirror trajectory among the one or more first recommended mirror trajectories, obtaining the indication information, wherein the indication information comprises the target mirror trajectory.
4. The method according to claim 1 or 2, wherein the obtaining the indication information for indicating the mirror trajectory for processing the video to be processed includes:
receiving a third operation corresponding to the video to be processed;
responding to the third operation to obtain the indication information;
before the target mirror track is input into the target neural radiation field network, the method further comprises:
and acquiring the target mirror track.
5. The method of claim 4, wherein the acquiring the target mirror trajectory comprises:
displaying a first interface, wherein the first interface comprises one or more second recommended mirror trajectories, and the one or more second recommended mirror trajectories are mirror trajectories conforming to the constraints of pose information corresponding to the key frames;
and when a fourth operation on a target mirror trajectory among the one or more second recommended mirror trajectories is received, obtaining the target mirror trajectory.
6. The method of claim 4, wherein the acquiring the target mirror trajectory comprises:
displaying a second interface, wherein the second interface comprises an original viewpoint track, a recommended viewpoint track and an editable self-selection viewpoint track of the key frame;
Receiving a fifth operation on the self-selection viewpoint track;
and generating the target mirror track in response to the fifth operation.
7. The method according to claim 6, wherein the self-selected viewpoint trajectory includes a camera pose to be processed, and the fifth operation is an operation of processing the camera pose to be processed;
the generating the target mirror trajectory in response to the fifth operation includes:
and responding to the fifth operation, and generating the target mirror track according to the processed camera pose.
8. The method of claim 6, wherein the self-selected viewpoint trajectory includes a mirror mode to be determined and a corresponding duration to be determined; the fifth operation comprises an operation on a target mirror mode and/or a target duration;
the generating the target mirror trajectory in response to the fifth operation includes:
and responding to the fifth operation, and generating the target mirror track according to the target mirror mode and/or the target time length.
9. The method according to claim 1 or 2, wherein the obtaining the key frame of the video to be processed and pose information corresponding to the key frame includes:
Extracting frames from the video to be processed according to a preset time interval to obtain the key frames; acquiring pose information corresponding to the key frame by utilizing a feature retrieval matching algorithm and an incremental reconstruction algorithm;
or,
extracting frames from the video to be processed according to a preset time interval to obtain an initial key frame; removing an initial key frame with definition smaller than a definition threshold value in the initial key frame, and/or removing a part of initial key frames with similarity larger than a similarity threshold value in the initial key frame to obtain the key frame; and acquiring pose information corresponding to the key frame by utilizing a feature retrieval matching algorithm and an incremental reconstruction algorithm.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, causes the electronic device to perform the method of any one of claims 1 to 9.
11. A computer readable storage medium storing a computer program, which when executed by a processor causes a computer to perform the method of any one of claims 1 to 9.
CN202210396606.6A 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium Active CN114979785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210396606.6A CN114979785B (en) 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114979785A CN114979785A (en) 2022-08-30
CN114979785B true CN114979785B (en) 2023-09-08

Family

ID=82977096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210396606.6A Active CN114979785B (en) 2022-04-15 2022-04-15 Video processing method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114979785B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703995B (en) * 2022-10-31 2024-05-14 荣耀终端有限公司 Video blurring processing method and device
CN116958492B (en) * 2023-07-12 2024-05-03 数元科技(广州)有限公司 VR editing method for reconstructing three-dimensional base scene rendering based on NeRf

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111760286A (en) * 2020-06-29 2020-10-13 完美世界(北京)软件科技发展有限公司 Switching method and device of mirror operation mode, storage medium and electronic device
CN112019768A (en) * 2020-09-04 2020-12-01 北京奇艺世纪科技有限公司 Video generation method and device and electronic equipment
CN112153242A (en) * 2020-08-27 2020-12-29 北京电影学院 Virtual photography method based on camera behavior learning and sample driving
CN112927271A (en) * 2021-03-31 2021-06-08 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
CN114245000A (en) * 2020-09-09 2022-03-25 北京小米移动软件有限公司 Shooting method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114979785A (en) 2022-08-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant