CN113572948B - Video processing method and video processing device - Google Patents

Info

Publication number
CN113572948B
Authority
CN
China
Prior art keywords
video
image
original
ith
terminal
Prior art date
Legal status
Active
Application number
CN202010359399.8A
Other languages
Chinese (zh)
Other versions
CN113572948A (en)
Inventor
陈帅
苏霞
刘蒙
吴虹
马靖
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010359399.8A
Publication of CN113572948A
Application granted
Publication of CN113572948B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • H04N23/683Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
    • G06T5/73
    • G06T5/90
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording

Abstract

The embodiment of the invention discloses a video processing method and a video processing device. In the method, the terminal can execute N preview processes to obtain an original global video and a first parameter set, where N is a positive integer not less than 2. Then, the terminal may send the first parameter set to a server, so that the server processes the first parameter set to obtain a second parameter set and sends the second parameter set back to the terminal. After receiving the second parameter set, the terminal can process the original global video through the second parameter set to obtain a target global video and finally store the target global video. By implementing this technical solution, the terminal can realize global processing of the original global video through interaction with the server, which improves the quality of the video recorded by the terminal.

Description

Video processing method and video processing device
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to a video processing method and a video processing apparatus.
Background
In today's society, video recording has become one of the essential functions of terminals such as mobile phones and tablets, and users can record and share videos through these terminals. Users' requirements on the video recording effect are increasingly high, and more and more video post-processing algorithms are applied to the terminal.
At present, video processing methods applied to the terminal face problems such as insufficient terminal memory and computing power that cannot meet the requirement of processing a global video, so the terminal can only process the acquired video clip by clip using a video post-processing algorithm. Specifically, after the terminal collects a video clip, the clip is processed according to various video post-processing algorithms, the processed clip is displayed in real time, and the preview video finally stored by the terminal is formed by splicing the processed video clips. This method realizes the processing of the video and ensures the real-time performance of video preview in the video recording process.
However, in video processing, the effect of a video post-processing algorithm on the global video is often better than its effect on individual video segments. Therefore, the video processing methods currently applied to the terminal have a limited processing effect, and the video post-processing algorithm cannot achieve its optimal effect on the video.
Disclosure of Invention
The embodiment of the invention provides a video processing method and a video processing device, wherein when a terminal carries out local processing and previewing on a video, the terminal obtains global parameters of an original video through interaction with a server, then carries out global processing on the original video to obtain a globally processed video, and updates the previewed video into the globally processed video. The method not only ensures the real-time performance of video preview, but also realizes the global processing of the video, and improves the quality of the video recorded by the terminal.
In a first aspect, an embodiment of the present invention provides a video processing method, which is applied to a terminal, and the method includes:
executing N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, the first parameter set comprises parameters of each frame of image in the original global video, and the ith preview process comprises the following steps: acquiring an ith original video clip and parameters of the ith original video clip; processing the video sub-segments in the ith original video segment according to the parameters of the ith original video segment to obtain an ith preview video segment; displaying the ith preview video clip; i is a positive integer not greater than N;
sending the first parameter set to a server so that the server processes the first parameter set to obtain a second parameter set, and sending the second parameter set to the terminal by the server;
receiving the second set of parameters;
processing the original global video through the second parameter set to obtain a target global video;
and storing the target global video.
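For orientation only, the terminal-side flow described above can be sketched in Python as follows. This is a non-normative illustration: the callables it receives (capture_segment, local_process, display, global_process, save_video) and the server object are hypothetical placeholders, not functions defined by this application.

```python
# Non-normative sketch of the terminal-side flow of the first aspect.
# All callables passed in are hypothetical placeholders supplied by the caller.

def record_and_process(n_segments, capture_segment, local_process,
                       display, global_process, server, save_video):
    original_global_video = []   # set of all images obtained in the N preview processes
    first_parameter_set = []     # parameters of every frame (e.g. camera pose)

    for i in range(n_segments):                        # N preview processes
        segment, params = capture_segment(i)           # i-th original video segment + parameters
        preview = local_process(segment, params)       # i-th preview video segment
        display(preview)                               # displayed in real time
        original_global_video.extend(segment)
        first_parameter_set.extend(params)

    second_parameter_set = server.optimize(first_parameter_set)    # server-side global processing
    target_global_video = global_process(original_global_video,
                                          second_parameter_set)    # global post-processing
    save_video(target_global_video)
    return target_global_video
```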
As a possible implementation, the method further comprises:
the parameters comprise camera poses, and the parameters of the ith original video segment form an original pose sequence obtained by arranging the camera poses of each frame of image in the ith original video segment in the acquisition order of the images; the first parameter set is a sequence of camera poses formed by arranging the camera poses of each frame of image in the original global video in the order of acquisition of the images.
As a possible implementation manner, the processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment includes:
processing each camera pose in the original pose sequence to obtain an optimized pose sequence, wherein the camera poses in the original pose sequence correspond one to one with the camera poses in the optimized pose sequence;
and processing the video sub-segments in the ith original video segment through an image stabilizing algorithm according to the optimized pose sequence to obtain the ith preview video segment.
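As a concrete but non-normative illustration of the two steps above, the following Python sketch simplifies the camera pose to a per-frame 2D translation: a moving-average filter stands in for the pose optimization, and an affine warp stands in for the image stabilizing algorithm. The simplified pose model and helper names are assumptions made only for this example.

```python
# Simplified sketch: pose = per-frame 2D translation (dx, dy).
import numpy as np
import cv2

def moving_average(poses, window=5):
    """Smooth an (N, 2) array of per-frame translations with a box filter."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(poses[:, k], kernel, mode='same') for k in range(2)], axis=1)

def preview_segment(frames, original_pose_sequence):
    """frames: list of HxWx3 images; original_pose_sequence: (N, 2) translations."""
    raw = np.asarray(original_pose_sequence, dtype=np.float32)
    optimized = moving_average(raw)                # one optimized pose per original pose
    stabilized = []
    for frame, r, s in zip(frames, raw, optimized):
        dx, dy = s - r                             # compensation offset for this frame
        m = np.float32([[1, 0, dx], [0, 1, dy]])   # 2x3 affine shift
        h, w = frame.shape[:2]
        stabilized.append(cv2.warpAffine(frame, m, (w, h)))
    return stabilized
```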
As a possible implementation, the method further comprises:
the second parameter set is a sequence of camera poses obtained by the server processing the first parameter set, and the camera poses in the first parameter set correspond one to one with the camera poses in the second parameter set.
As a possible implementation, the method further comprises:
each frame image in the ith original video clip comprises at least 2 image layers, and the exposure parameters of the at least 2 image layers are different from each other; the parameter is used for indicating the gray information of the image layer; the parameters of the ith original video clip comprise the gray scale information of each frame image in the ith original video clip; the first parameter set comprises gray scale information of each frame of image in the original global video.
As a possible implementation manner, the processing, according to the parameter of the ith original video segment, a video sub-segment in the ith original video segment to obtain an ith preview video segment includes:
processing each frame image of a video sub-segment in the ith original video segment according to the gray scale information of the ith original video segment to obtain a main image layer of each frame image in the video sub-segment;
and the ith preview video segment comprises a main image layer of each frame of image in the video sub-segment.
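A minimal sketch of how such preview-time processing might look is given below. It assumes, purely for illustration, that the gray information of a layer is its per-pixel grayscale and that the main image layer is a well-exposedness-weighted blend of the layers; the actual scheme of the embodiment is not limited to this.

```python
# Illustrative only: blend differently exposed layers of one frame into a main layer,
# weighting each layer by a "well-exposedness" measure derived from its grayscale.
import numpy as np
import cv2

def main_layer_from_gray_info(layers):
    """layers: list of HxWx3 uint8 images of the same scene with different exposures."""
    weights = []
    for layer in layers:
        gray = cv2.cvtColor(layer, cv2.COLOR_BGR2GRAY).astype(np.float64) / 255.0
        # weight peaks at mid-gray and falls off for under/over-exposed pixels
        weights.append(np.exp(-((gray - 0.5) ** 2) / (2 * 0.2 ** 2)))
    weights = np.stack(weights)
    weights /= weights.sum(axis=0, keepdims=True) + 1e-8
    fused = sum(w[..., None] * layer.astype(np.float64)
                for w, layer in zip(weights, layers))
    return np.clip(fused, 0, 255).astype(np.uint8)
```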
As a possible implementation manner, the second parameter set is a set of fusion parameters obtained by the server processing the first parameter set; the processing of the original global video through the second parameter set to obtain the target global video includes:
synthesizing at least 2 image layers of each frame of image through the corresponding fusion parameters of each frame of image in the original global video to obtain a main image layer corresponding to each frame of image, wherein the target global video comprises the main image layer corresponding to each frame of image.
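The following sketch illustrates this global fusion step under the simplifying assumption that the fusion parameters returned by the server are one scalar weight per layer for each frame; the real fusion parameters may be richer (for example, spatially varying).

```python
# Illustrative fusion of each frame's layers with server-returned per-layer weights.
import numpy as np

def fuse_global_video(original_global_video, second_parameter_set):
    """original_global_video: list of frames, each frame being a list of layer images;
    second_parameter_set: per-frame lists of layer weights returned by the server."""
    target_global_video = []
    for layers, weights in zip(original_global_video, second_parameter_set):
        w = np.asarray(weights, dtype=np.float64)
        w = w / w.sum()                                    # normalize the fusion weights
        main_layer = sum(wi * layer.astype(np.float64) for wi, layer in zip(w, layers))
        target_global_video.append(np.clip(main_layer, 0, 255).astype(np.uint8))
    return target_global_video
```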
In a second aspect, an embodiment of the present invention provides a video processing method, which is applied to a server, and the method includes:
receiving a first parameter set sent by a terminal, wherein the first parameter set comprises parameters of each frame of image in an original global video, and the original global video is a video recorded by the terminal;
processing the first parameter set to obtain a second parameter set;
and sending the second parameter set to the terminal so that the terminal processes the original global video according to the second parameter set to obtain a target global video.
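By way of illustration only, the server-side processing could look like the following sketch, which assumes the first parameter set is a camera-pose trajectory over the whole video and applies a Gaussian-weighted global smoothing; the actual algorithm used by the server is not limited to this.

```python
# Illustrative server-side computation of the second parameter set:
# globally smooth the whole-video pose trajectory (assumed (N, D) numeric poses).
import numpy as np

def compute_second_parameter_set(first_parameter_set, sigma=10.0):
    poses = np.asarray(first_parameter_set, dtype=np.float64)   # shape (N, D)
    n = len(poses)
    idx = np.arange(n)
    second = np.empty_like(poses)
    for t in range(n):
        w = np.exp(-0.5 * ((idx - t) / sigma) ** 2)   # weights span the whole video
        second[t] = (w[:, None] * poses).sum(axis=0) / w.sum()
    return second
```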
As a possible implementation, the method further comprises:
the parameter comprises a camera pose; the first set of parameters is a sequence of camera poses formed by arranging the camera poses of each frame of image in the original global video in the order of acquisition of the images.
As a possible implementation, the method further comprises:
each frame image in the original global video comprises at least 2 image layers, the exposure parameters of the at least 2 image layers are different from each other, and the parameters are used for indicating the gray information of the image layers; the first parameter set comprises gray scale information of each frame of image in the original global video.
In a third aspect, an embodiment of the present invention provides a terminal, including: a processor, a memory, a display screen, at least one camera and a communication interface, where the at least one camera is used for collecting images; the memory, the display screen, the at least one camera, and the communication interface are coupled to the processor, the memory is configured to store instructions, and the processor is configured to invoke the instructions stored in the memory to perform:
executing N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, the first parameter set comprises parameters of each frame of image in the original global video, and the ith preview process comprises the following steps: acquiring an ith original video clip and parameters of the ith original video clip; processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain an ith preview video segment; displaying the ith preview video clip through the display screen; i is a positive integer not greater than N;
sending the first parameter set to a server through the communication interface so that the server processes the first parameter set to obtain a second parameter set, and sending the second parameter set to the terminal by the server;
receiving, by the communication interface, the second set of parameters;
processing the original global video through the second parameter set to obtain a target global video;
and storing the target global video.
As a possible implementation, the parameters include camera poses, and the parameters of the ith original video clip are a sequence of original poses formed by arranging the camera poses of each frame image in the ith original video clip in the acquisition order of the images; the first set of parameters is a sequence of camera poses formed by arranging the camera poses of each frame of image in the original global video in the order of acquisition of the images.
As a possible implementation manner, the processor performing the processing of the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment includes:
processing each camera pose in the original pose sequence to obtain an optimized pose sequence, wherein the original pose sequence corresponds to the camera poses in the optimized pose sequence one to one;
and processing the video sub-segment in the ith original video segment through an image stabilizing algorithm according to the optimized pose sequence to obtain the ith preview video segment.
As a possible implementation manner, the second parameter set is a sequence of camera poses obtained by the server processing the first parameter set, and the camera poses in the first parameter set correspond one to one with the camera poses in the second parameter set.
As a possible implementation manner, each frame image in the ith original video segment includes at least 2 image layers, and exposure parameters of the at least 2 image layers are different from each other; the parameter is used for indicating the gray information of the image layer; the parameters of the ith original video clip comprise gray information of each frame image in the ith original video clip; the first parameter set comprises gray scale information of each frame of image in the original global video.
As a possible implementation manner, the processor performing the processing of the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment includes:
processing each frame image of a video sub-segment in the ith original video segment according to the gray information of the ith original video segment to obtain a main image layer of each frame image in the video sub-segment;
and the ith preview video segment comprises a main image layer of each frame of image in the video sub-segment.
As a possible implementation manner, the second parameter set is a set of fusion parameters obtained by the server processing the first parameter set; the processor performing the processing of the original global video through the second parameter set to obtain the target global video includes:
synthesizing at least 2 image layers of each frame of image through the corresponding fusion parameters of each frame of image in the original global video to obtain a main image layer corresponding to each frame of image, wherein the target global video comprises the main image layer corresponding to each frame of image.
In a fourth aspect, an embodiment of the present invention provides a server, including: a processor, a memory, and a communication interface, where the memory and the communication interface are coupled to the processor, the memory is configured to store computer program code, the computer program code includes computer instructions, and the processor is configured to invoke the computer instructions to cause the server to perform:
receiving a first parameter set sent by a terminal, wherein the first parameter set comprises parameters of each frame of image in an original global video, and the original global video is a video recorded by the terminal;
processing the first parameter set to obtain a second parameter set;
and sending the second parameter set to the terminal through the communication interface so that the terminal processes the original global video according to the second parameter set to obtain a target global video.
As a possible implementation, the parameter includes a camera pose; the first set of parameters is a sequence of camera poses formed by arranging the camera poses of each frame of image in the original global video in the order of acquisition of the images.
As a possible implementation manner, each frame of image in the original global video includes at least 2 image layers, exposure parameters of the at least 2 image layers are different from each other, and the parameters are used for indicating gray scale information of the image layers; the first parameter set comprises gray scale information of each frame of image in the original global video.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip is applied to a terminal, and the chip includes one or more processors, where the processors are configured to invoke computer instructions to cause the terminal to perform the method described in the first aspect and any possible implementation manner of the first aspect.
In a sixth aspect, the present application provides a chip, where the chip is applied to a server, and the chip includes one or more processors, where the processor is configured to invoke computer instructions to cause the server to execute the method described in the second aspect and any possible implementation manner of the second aspect.
In a seventh aspect, an embodiment of the present application provides a computer program product including instructions, which, when run on a terminal, cause the terminal to perform the method described in the first aspect and any possible implementation manner of the first aspect.
In an eighth aspect, embodiments of the present application provide a computer program product including instructions, which, when run on a server, cause the server to perform the method as described in the second aspect and any possible implementation manner of the second aspect.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a terminal, cause the terminal to perform the method described in the first aspect and any possible implementation manner of the first aspect.
In a tenth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a server, cause the server to perform a method as described in the second aspect and any possible implementation manner of the second aspect.
It is to be understood that the terminal provided by the third aspect, the server provided by the fourth aspect, the chips provided by the fifth aspect and the sixth aspect, the computer program product provided by the seventh aspect and the eighth aspect, and the computer storage medium provided by the ninth aspect and the tenth aspect are all used to execute the method provided by the embodiments of the present application.
In this embodiment of the present invention, first, a terminal may obtain an original global video and a first parameter set by performing N times of preview processes, where N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, and the first parameter set includes parameters of each frame image in the original global video, where the ith preview process may include: the terminal firstly obtains an ith original video segment and parameters of the ith original video segment, then processes video sub-segments in the ith original video segment according to the parameters of the ith original video segment to obtain an ith preview video segment, and displays the ith preview video segment, wherein i is a positive integer not greater than N. And then, the terminal sends the first parameter set to a server so that the server processes the first parameter set to obtain a second parameter set, and the server sends the second parameter set to the terminal. And after receiving the second parameter set, the terminal processes the original global video through the second parameter set to obtain a target global video, and finally, stores the target global video. According to the method, when the terminal carries out local processing and previewing on the video, the terminal obtains the global parameters of the original video through interaction with the server, then carries out global processing on the original video to obtain the video after the global processing, and updates the previewed video. The method not only ensures the real-time performance of video preview, but also realizes the global processing of the video, and improves the quality of the video recorded by the terminal.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 2 is a schematic hardware structure diagram of a terminal according to an embodiment of the present disclosure;
fig. 3 is a block diagram of a software structure of a terminal exemplarily provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a video processing method applied to a video image stabilization technique according to an embodiment of the present application;
fig. 6 is a schematic diagram of a front-end and back-end based video image stabilization processing procedure provided in an embodiment of the present application;
fig. 7 is a flowchart illustrating a video processing method applied to the video HDR technology according to an embodiment of the present application;
fig. 8 is a schematic diagram of a front-end and back-end based video HDR processing procedure provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a human-computer interaction interface provided by an embodiment of the present application;
fig. 10 is a schematic hardware structure diagram of a server according to an embodiment of the present application.
Detailed Description
The terminology used in the following examples of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the present application. As used in the description of the embodiments of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in the embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the listed items.
The embodiment of the application provides a video processing method and a video processing device. The video processing device comprises a terminal and a server. In the method, the terminal can execute N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2. Then, the terminal may send the first parameter set to a server, so that the server processes the first parameter set to obtain a second parameter set, and the server sends the second parameter set to the terminal. And after receiving the second parameter set, the terminal can process the original global video through the second parameter set to obtain a target global video, and finally, the target global video is stored.
In the video processing method, the terminal can realize global processing of the video through interaction with the server, update the preview video, and improve the quality of the video recorded by the terminal.
Some concepts related to the embodiments of the present application are described below.
1. Video post-processing techniques
In the embodiment of the present application, the video post-processing technology refers to a video processing technology that processes a collected video through a video post-processing algorithm to obtain a video with an improved effect in a certain aspect, and includes a video image stabilization (video stabilization) technology, a high dynamic range (HDR) processing technology, and the like. The video post-processing algorithm is used for processing the collected pictures or videos and can be called by the image processing module provided by the embodiment of the application to process the acquired picture or video.
2. Video image stabilization technique
The video image stabilization technology, also called video jitter removal technology, mainly reduces the jitter of the video through an image stabilization algorithm to generate a stable video and improve the quality of the video.
Video image stabilization techniques are classified into optical image stabilization, mechanical image stabilization, and digital video image stabilization according to mechanisms of action. The optical image stabilization adjusts an optical path in a self-adaptive manner through an active optical component, compensates image motion caused by the shake of a camera platform and achieves the aim of stabilizing images; the mechanical image stabilization detects the shake of the camera platform through devices such as a gyro sensor and the like, and then adjusts a servo system to achieve the aim of stabilizing the image; the digital video image stabilization is based on motion estimation between continuous video images, and then each frame image in the video is subjected to motion smoothing and motion compensation processing to obtain a stable image.
The video image stabilization technology in the embodiment of the invention refers to a digital video image stabilization technology, which can be specifically divided into an electronic image stabilization technology and a pure digital image stabilization technology. These two types of techniques are very similar and differ only in the device shake detection and motion estimation methods: the electronic image stabilization technique uses a hardware sensor (such as a gyroscope) to detect camera shake, while the pure digital image stabilization technique estimates camera shake by an image processing method. After obtaining a camera motion vector, both perform motion compensation and perform image inpainting according to the compensated motion.
The traditional video image stabilization algorithm is mainly divided into 3 parts: motion estimation, motion smoothing and motion compensation.
(1) Motion estimation, i.e., camera motion estimation, is the process of determining camera motion vectors, which are quantities describing the motion transformation between images, under a specific camera motion model. A camera motion vector refers to a global motion vector related to the motion of the whole frame image, and its estimation is often completed from local motion vectors. A local motion vector refers to the motion vector of a part of an image, e.g., a rectangular block, an arbitrarily shaped block or even a single pixel. Video image stabilization algorithms usually include 2D image stabilization, 2.5D image stabilization and 3D image stabilization. Taking 2D image stabilization as an example, camera motion estimation in conventional 2D image stabilization usually first estimates local motion vectors and then estimates the global motion vector from the local motion vectors. Local motion estimation methods can be divided into two broad categories: pixel-based methods and feature-based methods. Pixel-based methods include the block matching method, the phase correlation method, the optical flow method and the like, and common feature point detection methods include corner detection and blob detection. In this embodiment, the motion estimation is specifically the acquisition process of a camera pose sequence.
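As an illustration of the feature-based approach mentioned above, the following sketch estimates the global motion between two consecutive frames with ORB feature matching and a RANSAC-fitted homography (OpenCV building blocks chosen only as an example; the embodiment is not limited to them).

```python
# Illustrative feature-based global motion estimation between two consecutive frames.
import numpy as np
import cv2

def estimate_global_motion(prev_gray, curr_gray):
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # local motion vectors (matched features) -> global motion of the whole frame
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return homography
```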
(2) Motion smoothing, i.e., camera motion smoothing, is an algorithm used to generate new camera motion in which jitter is suppressed, so that the camera motion becomes more stable. For 2D video stabilization, a motion smoothing mechanism receives camera 2D motion data and generates new, smoothed 2D motion data via algorithmic calculation. Motion smoothing is generally divided into 2 types of schemes: motion path smoothing and motion path fitting. Motion path smoothing specifically means that the camera motion data corresponding to a stable video should also be smooth; conversely, the camera motion data of a jittered video contains small-amplitude "noise". From a digital signal processing perspective, motion smoothing is the removal of this motion noise. Smoothing filters commonly used in video image stabilization schemes are moving average filters, particle filters, and other polynomial filters. Motion path fitting is different from motion path smoothing in that it mimics professional cinematographic paths, such as straight lines, parabolas, etc. This scheme can achieve more stable results than motion path smoothing, since it can remove not only high-frequency jitter but also ineffective low-frequency jitter in the path. In this embodiment, the motion smoothing is specifically an optimization process of a camera pose sequence.
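A minimal sketch of motion path smoothing with a moving-average filter is shown below, assuming a simplified per-frame motion of [dx, dy, d_angle]; this is one possible filter among those mentioned above, not a prescribed choice.

```python
# Illustrative motion path smoothing: accumulate per-frame motion into a trajectory,
# box-filter the trajectory, and fold the correction back into the per-frame motion.
import numpy as np

def smooth_motion(per_frame_motion, radius=15):
    """per_frame_motion: (N, 3) array of [dx, dy, d_angle] between consecutive frames."""
    motion = np.asarray(per_frame_motion, dtype=np.float64)
    trajectory = np.cumsum(motion, axis=0)                      # raw camera path
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode='edge')
    smoothed = np.stack(
        [np.convolve(padded[:, k], kernel, mode='valid') for k in range(3)], axis=1)
    return motion + (smoothed - trajectory)                     # corrected per-frame motion
```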
(3) The motion compensation step generally includes obtaining a smooth path through motion smoothing, compensating each frame of the video, obtaining a compensation matrix of each frame, and performing geometric transformation on each frame to obtain a stable video frame sequence. In this embodiment, the motion compensation is specifically a process of obtaining a stable image through the optimized camera pose sequence.
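A corresponding compensation sketch, continuing the simplified [dx, dy, d_angle] model from the smoothing example above, builds a 2x3 compensation matrix for each frame and applies the geometric transformation; again this is only an illustration.

```python
# Illustrative motion compensation: warp each frame by its corrected motion.
import numpy as np
import cv2

def compensate(frames, corrected_motion):
    out = []
    for frame, (dx, dy, da) in zip(frames, corrected_motion):
        m = np.float32([[np.cos(da), -np.sin(da), dx],
                        [np.sin(da),  np.cos(da), dy]])   # per-frame compensation matrix
        h, w = frame.shape[:2]
        out.append(cv2.warpAffine(frame, m, (w, h)))      # geometric transformation
    return out
```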
3. Video HDR techniques
The human eye can, to a certain extent, perceive scenes that contain particularly bright or dark areas. However, due to the limitations of the internal components of an imaging device, only part of the color information and brightness information of a real scene can be recorded. Although the brightness information of the scene can be adjusted by controlling the exposure of the camera's internal components, the adjustment range of this method is limited, so some areas in the captured image are under-exposed while others are over-exposed; some detail information in the image is therefore lost, and the image cannot reproduce the visual effect perceived by the human eye.
In order to capture scene information that better matches the visual effect of human eyes, a high dynamic range technology, i.e., HDR technology, needs to be adopted. This technology can clearly image both the particularly bright areas and the particularly dark areas of an image, so that the details of these areas are richer. The dynamic range refers to the ratio of the maximum value to the minimum value of a physical measurement quantity; for an actual scene, it refers to the ratio of the brightest area to the darkest area in the scene, with luminance measured in candela per square meter (cd/m²). In the digital imaging field, the dynamic range refers to the ratio of the maximum pixel value to the minimum pixel value in an image.
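As a small worked illustration of the definition above, the dynamic range of a digital image can be computed as the ratio of its largest pixel value to its smallest non-zero pixel value:

```python
# Toy illustration of the dynamic-range definition (max pixel value / min pixel value).
import numpy as np

def dynamic_range(image):
    values = image[image > 0].astype(np.float64)   # ignore zero pixels to avoid division by zero
    return values.max() / values.min()

# An 8-bit image whose pixels span 1..255 has a dynamic range of 255:1.
print(dynamic_range(np.array([[1, 128], [200, 255]], dtype=np.uint8)))  # 255.0
```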
The HDR technology can be applied to image processing and also applied to video processing.
In this embodiment, a video HDR technology is adopted. Specifically, the image processing module may use an HDR algorithm to synthesize a plurality of LDR (low dynamic range) images into one HDR image. HDR images can provide a larger dynamic range and more image detail than ordinary images. The plurality of LDR images have different exposure times, and images with different exposure times differ in brightness and in the details they provide. The HDR image can be sent to a preview display module for previewing as one frame of image, and can also be sent to an encoding module for encoding. The HDR algorithm often includes a tone mapping algorithm: because the dynamic range of a high dynamic image far exceeds the dynamic range of a common display, such images cannot be displayed directly, and in order to display the high dynamic image correctly, the terminal needs to use a specific tone mapping algorithm to compress the dynamic range.
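A hedged illustration of the synthesis and tone-mapping steps described above, built from standard OpenCV routines (Debevec merging and gamma tone mapping, as in the OpenCV HDR tutorial); the embodiment does not prescribe these particular algorithms:

```python
# Illustrative merge of differently exposed LDR frames into an HDR image,
# followed by tone mapping so the result can be shown on a common display.
import numpy as np
import cv2

def merge_and_tonemap(ldr_images, exposure_times):
    """ldr_images: list of uint8 images of the same scene; exposure_times: seconds."""
    times = np.asarray(exposure_times, dtype=np.float32)
    merge = cv2.createMergeDebevec()
    hdr = merge.process(ldr_images, times=times)      # 32-bit floating-point HDR image
    tonemap = cv2.createTonemap(gamma=2.2)
    ldr_display = tonemap.process(hdr)                # dynamic range compressed for display
    return np.clip(ldr_display * 255, 0, 255).astype(np.uint8)
```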
In addition, a video is an image sequence formed by continuous frames, so strong correlation exists between adjacent images of the video. In this embodiment, fusion parameters that exploit this inter-frame correlation are added in video HDR processing, so that the processing effect of the HDR video is effectively improved and an HDR video with a better visual effect is presented.
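One simple way to picture such inter-frame fusion parameters, offered only as an assumption for illustration, is to smooth the per-frame layer weights over time so that the fused HDR video does not flicker:

```python
# Illustrative temporal smoothing of per-frame fusion weights (one weight per layer).
import numpy as np

def temporally_smooth_weights(per_frame_weights, alpha=0.8):
    """per_frame_weights: (N, L) array, one fusion weight per layer for each frame."""
    weights = np.asarray(per_frame_weights, dtype=np.float64)
    smoothed = np.empty_like(weights)
    smoothed[0] = weights[0]
    for t in range(1, len(weights)):
        # exponential smoothing keeps each frame's weights close to the previous frame's
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * weights[t]
    return smoothed / smoothed.sum(axis=1, keepdims=True)   # renormalize per frame
```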
Referring to fig. 1, fig. 1 is a schematic structural diagram of a video processing system according to an embodiment of the present invention, which may include a terminal 100 and a server 200. Wherein:
the terminal 100 may include an application "camera," and when the terminal 100 starts the application "camera," the recording operation input by the user may be received, and in response to the recording operation, the terminal 100 continuously acquires images of the current scene through a camera to obtain an original video, where the original video obtained after the recording process is finished is referred to as an original global video. In the course of video recording, the terminal 100 may preview a video while recording. In order to ensure the quality of the preview video, the terminal 100 may process the original video acquired by the video post-processing algorithm to obtain the preview video, and play the preview video through the display screen.
In some embodiments, to implement image stabilization of a video, the terminal 100 may detect motion information of a camera, such as a rotation angle, an acceleration, and the like, through an inertial sensor during the video recording process, and determine, based on the motion information, the camera pose of the camera when each frame of image is captured. The terminal may record the camera pose, and further, during video post-processing, may process the original video through a video image stabilization technique in combination with the camera pose.
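For illustration, a per-frame camera pose could be derived from gyroscope readings by first-order integration of the angular velocity, as in the hypothetical sketch below (no bias or drift correction; not the method claimed by this application):

```python
# Illustrative integration of gyroscope angular velocities into per-frame rotations.
import numpy as np

def poses_from_gyro(angular_velocities, dt):
    """angular_velocities: (N, 3) rad/s about x, y, z; dt: sampling interval in seconds."""
    rotation = np.eye(3)
    poses = []
    for wx, wy, wz in angular_velocities:
        omega = np.array([[0.0, -wz,  wy],
                          [ wz, 0.0, -wx],
                          [-wy,  wx, 0.0]]) * dt      # skew-symmetric rotation increment
        rotation = rotation @ (np.eye(3) + omega)     # first-order integration
        poses.append(rotation.copy())
    return poses
```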
In some embodiments, to implement high dynamic imaging of a video, during a video recording process, the terminal 100 may maintain camera parameters used by a camera in a shooting process, such as an exposure coefficient, an exposure time, an aperture, and the like, and the terminal 100 may obtain multiple frames of images in the same scene through different camera parameters, at this time, each frame of image in an original video includes multiple images (also referred to as layers) with different camera parameters, and the terminal 100 may collect parameters of each layer, such as gray scale information, and further, during video post-processing, may process the original video through a video HDR technology in combination with the gray scale information.
Due to the limitation of the computing resources of the terminal 100 itself and the real-time requirement on the preview video, when performing video post-processing the terminal only processes segments of the video, rather than performing video post-processing after the original global video has been obtained. Specifically, the terminal may process and preview the original video obtained in the whole recording process in N batches, that is, the terminal 100 executes the preview process N times during video recording, where N is a positive integer not less than 2. The ith preview process may include: first obtaining an ith original video clip and parameters (such as camera pose, gray scale information and the like) of the ith original video clip, then processing video sub-segments in the ith original video clip according to the parameters of the ith original video clip to obtain an ith preview video clip, and finally displaying the ith preview video clip, where i is a positive integer not greater than N. After the recording is finished, the terminal 100 may store the original global video and the global preview video, where the global preview video is a composite of the N preview video segments obtained in the N preview processes, and the images in the original global video correspond one to one to the images in the global preview video.
After the terminal 100 executes the N preview processes, the original global video is obtained. At this time, to reduce the delay, the terminal 100 may send a first parameter set (the parameters of the original global video) to the server 200, and the first parameter set is processed by the server 200 to obtain a second parameter set, where the second parameter set is a set of parameters obtained by performing global optimization on the first parameter set, taking into account the influence of the whole original global video on each frame image. The server 200 then transmits the second parameter set to the terminal 100.
After receiving the second parameter set, the terminal 100 may process the original global video through the second parameter set to obtain the target global video, and then store the target global video or update the global preview video to the target global video.
The terminal 100 is a terminal having at least one camera and at least one display screen, and can implement video recording and video preview functions. It may be a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an on-board unit (OBU), a wearable device (e.g., a watch, a bracelet, a smart helmet), a smart home device (e.g., an electric cooker, a stereo, a home steward device), an augmented reality (AR)/virtual reality (VR) device, etc.
The server 200 may be implemented by an independent server or a server cluster composed of a plurality of servers, and may be a cloud server, a cloud end, or the like.
The following describes a terminal according to an embodiment of the present application.
Fig. 2 shows a schematic configuration of the terminal 100.
The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a Subscriber Identity Module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the terminal 100. In other embodiments of the present application, terminal 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be, among other things, a neural center and a command center of the terminal 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system. For example, in some embodiments, the processor may directly invoke the memory-stored instructions to perform high dynamic range imaging processing on each frame of image acquired according to the HDR algorithm.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, a camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus, enabling communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 with peripheral devices such as the display screen 194, the camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of terminal 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 100, and may also be used to transmit data between the terminal 100 and peripheral devices. And the method can also be used for connecting a headset and playing audio through the headset. The interface may also be used to connect other terminals, such as AR devices, etc.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the terminal 100. In other embodiments of the present application, the terminal 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the terminal 100. The charging management module 140 may also supply power to the terminal through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied on the terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the terminal 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with a network and other devices through a wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The terminal 100 implements a display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the terminal 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The terminal 100 may implement a capture function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like, so as to implement an image capture module of the HAL layer in the embodiment of the present application.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image or video visible to the naked eye. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image or video signal. And the ISP outputs the digital image or video signal to the DSP for processing. The DSP converts the digital image or video signal into image or video signal in standard RGB, YUV and other formats. In some embodiments, terminal 100 may include 1 or N cameras 193, N being a positive integer greater than 1. For example, in some embodiments, the terminal 100 may capture images of multiple exposure coefficients by using the N cameras 193, and then, in video post-processing, the terminal 100 may synthesize an HDR image by an HDR technique according to the images of multiple exposure coefficients.
The digital signal processor is used for processing digital signals, and can process digital images or video signals and other digital signals. For example, when the terminal 100 selects a frequency bin, the digital signal processor is configured to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal 100, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image and video playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The terminal 100 may implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into analog audio signals for output, and also used to convert analog audio inputs into digital audio signals. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal 100 can listen to music through the speaker 170A or listen to a handsfree call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal 100 receives a call or voice information, it can receive voice by bringing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a voice signal to the microphone 170C by uttering a voice signal close to the microphone 170C through the mouth of the user. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal 100 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and so on.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The terminal 100 determines the intensity of the pressure according to the change in the capacitance. When a touch operation is applied to the display screen 194, the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A. The terminal 100 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
The gyro sensor 180B may be used to determine a motion attitude of the terminal 100. In some embodiments, the angular velocity of terminal 100 about three axes (i.e., x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the terminal 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal 100 by a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.
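As an illustration of the anti-shake compensation calculation described above, the following Python sketch converts a measured shake angle into an image-plane offset under a simple pinhole-camera assumption; the focal length, pixel pitch, and function name are hypothetical values chosen for illustration, not parameters of this embodiment.

```python
import math

def compensation_offset_px(shake_angle_deg: float, focal_length_mm: float,
                           pixel_pitch_um: float) -> float:
    """Estimate how far the image shifts on the sensor for a small shake angle,
    i.e. the distance the lens module (or a digital crop window) must move back.

    Assumes a simple pinhole model: offset ~= f * tan(theta)."""
    offset_mm = focal_length_mm * math.tan(math.radians(shake_angle_deg))
    return offset_mm * 1000.0 / pixel_pitch_um  # mm -> um -> pixels

# Example: a 0.5 degree shake with a 5 mm focal length and 1.0 um pixel pitch
print(round(compensation_offset_px(0.5, 5.0, 1.0)))  # roughly 44 pixels
```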
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 100 calculates an altitude from the barometric pressure measured by the barometric pressure sensor 180C to assist in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The terminal 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the terminal 100 is a flip phone, the terminal 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon opening the flip cover can then be set according to the detected opening and closing state of the holster or the flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the terminal 100 in various directions (generally, along three axes). The magnitude and direction of gravity can be detected when the terminal 100 is stationary. The acceleration sensor may also be used to recognize the posture of the terminal, and is applied to landscape/portrait screen switching, pedometers, and other applications.
The distance sensor 180F is used to measure distance. The terminal 100 may measure the distance by infrared light or laser. In some embodiments, when photographing a scene, the terminal 100 may use the distance sensor 180F to measure distance so as to achieve fast focusing.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 100 emits infrared light outward through the light emitting diode. The terminal 100 detects infrared light reflected from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100. The terminal 100 can use the proximity light sensor 180G to detect that the user is holding the terminal 100 close to the ear for a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in a holster mode and a pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. The terminal 100 may adaptively adjust the brightness of the display 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The terminal 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the terminal 100 heats the battery 142 to avoid an abnormal shutdown of the terminal 100 caused by the low temperature. In other embodiments, when the temperature is lower than a further threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation acting thereon or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the terminal 100 at a different position than the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, bone conduction sensor 180M may also be provided in a headset, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone block vibrated by the sound part obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration prompts as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the terminal 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards can be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as call and data communication. In some embodiments, the terminal 100 employs an eSIM, namely, an embedded SIM card. The eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
In the embodiment of the present application, the software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the terminal 100.
Referring to fig. 3, fig. 3 is a block diagram illustrating a software structure of the terminal 100 according to an exemplary embodiment of the present disclosure. The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface.
As shown in fig. 3, the Android system can be divided into three layers, which are from top to bottom: an application layer, an application framework layer, and a Hardware Abstraction Layer (HAL) layer. Wherein:
the application layer includes a series of application packages, for example, a camera application. It is not limited to the camera application, and may also include other applications such as gallery, calendar, phone, map, navigation, WLAN, Bluetooth, music, video, and short message applications.
The camera application may provide a video recording mode for the user. In some embodiments, the video recording mode may specifically include a video HDR mode, an image stabilization mode, and the like. As shown in fig. 3, the camera application may include a mode loading module, a photographing control module, and a preview display module. Wherein:
and the mode loading module is used for inquiring the HAL layer about the mode when the camera application is started and loading the mode according to the inquiry result. The modes may include a night mode, a portrait mode, a photograph mode, a short video mode, a video mode, etc.
And the shooting control module is used to start, together with the preview display module, when switching to the video recording mode is detected, and to notify the capability enabling module of the HAL layer to start the modules related to the video recording mode. The shooting control module can also respond to a touch operation of a user on a start-video-recording control in the user interface of the camera application by notifying the encoding module and the image processing module in the application framework layer; after receiving the notification, the encoding module begins to acquire the video data stream from the HAL layer and can encode the video stream to generate a video file. For example, in some embodiments, when the user performs a touch operation on the HDR mode control in the user interface of the camera application, the shooting control module may further notify, in response to the touch operation, the encoding module and the image processing module in the application framework layer; after receiving the notification, the image processing module performs HDR processing on the captured images and then sends the processed images to the encoding module, and the encoding module may then encode the video stream to generate the video file. For another example, when the user performs a touch operation on the end-recording control in the user interface of the camera application, the shooting control module may also notify the encoding module and the image processing module in the application framework layer in response to the touch operation. After receiving the notification, the encoding module stops acquiring the video data stream from the image processing module of the HAL layer.
And the preview display module is used for receiving the video data stream from the image processing module or the image acquisition module of the HAL layer, displaying a preview image or a preview video on a user interface, and updating the preview image and the preview video in real time.
The application framework layer (FWK) provides an Application Programming Interface (API) and a programming framework for the application programs in the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 3, the application framework layer may include a Camera Service interface (Camera Service) that may provide a communication interface between a Camera application and the HAL layer in the application layer. The application framework layer may also include an encoding module. The encoding module may receive a notification from a capture control module in the camera application to start or stop receiving a video data stream from an image processing module of the HAL layer and encode the video data stream to obtain a video file.
As shown in fig. 3, the HAL layer contains modules for providing video recording mode for camera applications. These modules providing the video recording mode may capture video in the video recording mode. The HAL layer also provides corresponding post-processing algorithms for the video recording mode.
Specifically, as shown in fig. 3, the HAL layer may include modules related to the video recording mode of the camera: the system comprises a capability enabling module, an image acquisition module, a scene recognition module and an image processing module. Wherein:
and the capability enabling module is used for starting modules related to the video recording mode of the HAL layer after receiving the notification of the shooting control module, such as the image acquisition module and the image processing module. Specifically, when a user operates in the user interface of the camera application to switch to the video recording mode, the shooting control module in the camera application may notify the capability enabling module of the HAL layer, and after receiving the notification, the capability enabling module starts the image acquisition module and the image processing module.
And the image acquisition module is used for acquiring image videos or video parameters and sending the acquired image videos or video parameters to the image processing module.
The image processing module may include a plurality of video post-processing algorithms, and may execute one or more video post-processing algorithms simultaneously, which is not limited herein. The image processing module can process the image video through a video post-processing algorithm to obtain a video data stream, and sends the video data stream to the preview display module for preview display and to the encoding module to form a video file. For example, in some embodiments, when the user may perform a touch operation on the HDR mode control in the user interface of the camera application, the capture control module may further notify the encoding module and the image processing module in the application framework layer in response to the touch operation, and the image processing module, after receiving the notification, sends the captured image to the encoding module to form a video file after processing the captured image according to the video HDR technology.
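To make the data flow of the image processing module more concrete, the following Python sketch chains several post-processing algorithms and fans the result out to a preview path and an encoding path; the class and callback names are assumptions made for illustration and do not correspond to an actual HAL interface.

```python
from typing import Any, Callable, List

Frame = Any  # e.g. a numpy array holding one captured image

class ImageProcessingModule:
    """Chains video post-processing algorithms and forwards each processed frame
    to the preview display module and to the encoding module."""

    def __init__(self, send_to_preview: Callable[[Frame], None],
                 send_to_encoder: Callable[[Frame], None]) -> None:
        self.algorithms: List[Callable[[Frame], Frame]] = []
        self.send_to_preview = send_to_preview
        self.send_to_encoder = send_to_encoder

    def enable(self, algorithm: Callable[[Frame], Frame]) -> None:
        # e.g. enable(hdr_merge) or enable(stabilize) when the user picks a mode
        self.algorithms.append(algorithm)

    def on_frame(self, frame: Frame) -> None:
        for algorithm in self.algorithms:   # run the enabled post-processing steps in order
            frame = algorithm(frame)
        self.send_to_preview(frame)         # preview display module updates the UI
        self.send_to_encoder(frame)         # encoding module appends to the video file
```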
It should be noted that the software architecture of the terminal shown in fig. 3 is only one implementation manner of the embodiment of the present application, and in practical applications, the terminal may further include more or fewer software modules, which is not limited herein.
The following describes a video processing method provided in the embodiment of the present application with reference to fig. 4. The method may be implemented by the video processing system shown in fig. 1, as shown in fig. 4, the video processing method includes the following steps:
s101, the terminal executes N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2. The original global video is a set of images acquired by the terminal in N times of previewing processes, and the first parameter set comprises parameters of each frame of image in the original global video.
The terminal may include a camera application. When the camera application is started, the terminal can receive a recording operation input by a user; in response to the operation, the terminal continuously acquires images of the current scene through a camera to obtain an original video, and the original video obtained after the recording process is finished is called an original global video. In the process of recording the video, the terminal may record and preview the video at the same time. The preview video may be the unprocessed original video, or may be a video obtained by processing the images of the current scene acquired in real time in a preset video post-processing manner, for example, performing HDR algorithm processing on each acquired frame of image and displaying it in real time. After turning on the camera application, the user may also select a desired function, e.g. image stabilization, HDR, etc., through the user interface. The terminal then performs video post-processing on the acquired original video in the video post-processing manner corresponding to the selected function, so as to obtain and preview a better preview video.
It should also be understood that in the process of recording a video, the terminal may acquire an image and acquire parameters corresponding to the image, such as a camera pose, an exposure coefficient, gray scale information, and the like. The camera pose corresponding to the image is the pose of the camera when the image is acquired, the exposure coefficient corresponding to the image is the exposure coefficient of the camera used for acquiring the image, and the gray information corresponding to the image is the gray information extracted from the image, and can be the gray value of the image or the gray information extracted from the image by down-sampling. It should be understood that the gray scale information of the images acquired by different exposure coefficients is different.
After receiving an instruction for indicating the end of recording, the terminal ends recording of the video, obtains all images collected in the recording time period to form an original global video, and also can obtain parameters corresponding to each frame of image in the original global video to form a first parameter set. In the embodiment of the application, the whole video recording process of the terminal can be divided into N previewing processes, and after the N previewing processes are executed, the whole video recording process is completed.
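One way to picture the first parameter set is as an ordered, per-frame record of the parameters mentioned above (camera pose, exposure coefficient, gray scale information). The following Python sketch is illustrative only; the field names are assumptions rather than a format defined by this embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class FrameParams:
    frame_index: int
    camera_pose: Optional[Tuple[float, ...]] = None  # e.g. rotation + translation, used for image stabilization
    exposure_coefficient: Optional[float] = None     # used by the video HDR path
    gray_info: Optional[List[int]] = None            # e.g. a down-sampled gray-scale map

@dataclass
class FirstParameterSet:
    algorithm: str                                   # indication information, e.g. "stabilization" or "hdr"
    frames: List[FrameParams] = field(default_factory=list)

    def append(self, params: FrameParams) -> None:
        # filled in frame by frame during the N preview processes
        self.frames.append(params)
```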
In the embodiment of the present application, an ith preview process is taken as an example to describe each preview process of N preview processes, each preview process is implemented by a terminal, i is 1, 2, 3, …, N, and the ith preview process includes, but is not limited to, the following partial or all steps:
S1011, obtaining the ith original video clip and the parameter of the ith original video clip.
It should be understood that the ith original video segment may be a continuous multi-frame image, where one frame of image corresponds to one parameter, and the parameter of the ith original video segment is a set of parameters corresponding to each frame of image in the ith original video segment.
Specifically, according to the requirements of different video post-processing algorithms, the mode of the original video clip obtained by the terminal and the parameters of the obtained original video clip can be different. Alternatively, the terminal may execute one or more video post-processing algorithms simultaneously. For example, when the video needs to be stabilized, a video image stabilization algorithm may be used, at this time, the original video segment may be a continuous multi-frame image, the parameter may be a camera pose, and the parameter of the ith original video segment may be an original pose sequence formed by arranging the camera poses of each frame image in the ith original video segment in the acquisition order of the images.
S1012, processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment.
Specifically, step S1012 may be executed by an image processing module of the terminal, and the image processing module may process the video sub-segment corresponding to the video post-processing algorithm through a video post-processing algorithm to obtain the ith preview video segment.
Optionally, the ith original video segment may include W continuously acquired frames of images, where the first V frames may be images acquired before the ith preview process, and the remaining W-V frames may be images acquired by the camera during the ith preview process. In this case, the ith preview video clip previewed in the ith preview process may be the W-V frames of images after video post-processing by the terminal.
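The following Python sketch shows one way such a W-frame segment could be assembled, with the first V frames taken from earlier preview processes and the remaining frames newly acquired; the buffer size and function name are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Deque, List

def build_segment(history: Deque[Any], new_frames: List[Any], w: int) -> List[Any]:
    """Return one original video segment of w frames: the most recently acquired
    earlier frames followed by the frames captured in this preview process."""
    v = w - len(new_frames)                               # frames reused from earlier passes
    segment = (list(history)[-v:] + new_frames) if v > 0 else new_frames[-w:]
    history.extend(new_frames)                            # keep a rolling history for the next pass
    return segment

history: Deque[Any] = deque(maxlen=64)                    # frames retained from earlier preview processes
# segment = build_segment(history, frames_captured_this_pass, w=10)
```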
S1013, displaying the ith preview video clip;
specifically, the image processing module in the terminal may send the ith preview video segment to the preview display module, and the preview display module performs preview display on the ith preview video segment. In the embodiment of the application, the original video clip and the parameters of the original video clip may be collected and sent in real time, that is, after the image collection module collects a frame of image and the parameters of the image, the image and the parameters of the image may be sent to the image processing module, so that the terminal may obtain the first parameter set when the execution of the N preview processes is finished. The original video clip and the N preview video clips can also be sent to the encoding module in real time, so that when the terminal finishes executing the N preview processes, the encoding module encodes the original video image and the N preview video clips to form an original global video and a preview video.
It can be understood that, in order to implement the real-time property of video preview, the image frame number of the ith preview video clip is small, but the effect of previewing the video clip is not as good as that of performing global processing on the original global video due to the small image frame number, so that after the video recording is finished, the data of the original global video is sent to the server, and the video with better effect after global optimization is finally obtained through the interactive processing of the server and the terminal.
S102, the terminal sends the first parameter set to the server.
After receiving an instruction indicating the end of recording, the terminal ends the recording of the video. It should be understood that, by executing the N preview processes, the terminal obtains the original global video and the relevant parameters of the original global video, such as the exposure coefficients and camera poses. Specifically, the terminal may determine, according to the functions selected by the user operation (such as image stabilization), the parameters needed by the video post-processing algorithm corresponding to those functions, determine the first parameter set from the original global video and its relevant parameters, and send the obtained first parameter set to the server.
S103, the server processes the first parameter set to obtain a second parameter set.
In particular, it should be understood that the first parameter set may further include indication information indicating a video post-processing algorithm corresponding to the first parameter set. The server may process the first parameter set by using a corresponding video post-processing algorithm according to the indication information, to obtain a second parameter set.
For example, for video image stabilization processing, the first set of parameters may include a global pose sequence and indication information, and the indication information may be used for indicating the server to process the global pose sequence by using camera motion compensation, where the global pose sequence is formed by arranging camera poses of each frame of image in an acquisition order of the image by the terminal in a video recording process. Specifically, after receiving the global pose sequence and the indication information, the server processes each camera pose in the global pose sequence according to a manner indicated by the indication information to obtain a processed global pose sequence, where the global pose sequence corresponds to the camera poses in the processed global pose sequence one to one, and the processed global pose sequence is a second parameter set.
S104, the server sends the second parameter set to the terminal.
After obtaining the second parameter set, the server sends the second parameter set to the terminal that sent the first parameter set.
S105, the terminal processes the original global video through the second parameter set to obtain the target global video.
The second parameter set may include indication information indicating a video post-processing algorithm, and after receiving the second parameter set, the terminal processes the original global video according to the video post-processing algorithm in the indication information, so as to obtain a target global video processed by the video post-processing algorithm. Specifically, the terminal may also receive a plurality of second parameter sets, and the terminal may perform multiple processing on the original global video through the corresponding second parameter sets respectively according to the video post-processing algorithm in the indication information to obtain the target global video.
S106, the terminal stores the target global video.
Specifically, when the target global video is obtained through processing, the terminal may delete the preview video corresponding to the target global video, and store the target global video at the position of the preview video. The terminal can directly update the preview video to the target global video so that the user can obtain a video with a better video effect when the video is reopened, and can also provide an update notification to remind the user whether to update the preview video so as to obtain a video with a better video effect.
In this embodiment of the present invention, first, a terminal may respond to a video recording operation input by a user, perform video recording and display a preview video in real time by executing N times of preview processes, and when the terminal responds to an operation of ending the recording input by the user, may obtain an original global video and a first parameter set, where N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, and the first parameter set includes parameters of each frame of image in the original global video, where the ith preview process may include: the terminal firstly obtains an ith original video segment and parameters of the ith original video segment, then processes video sub-segments in the ith original video segment according to the parameters of the ith original video segment to obtain an ith preview video segment, and displays the ith preview video segment, wherein i is a positive integer not greater than N. And then, the terminal sends the first parameter set to a server so that the server processes the first parameter set to obtain a second parameter set, and the server sends the second parameter set to the terminal. And after receiving the second parameter set, the terminal processes the original global video through the second parameter set to obtain a target global video, and finally, stores the target global video. The method comprises the steps that when a terminal carries out local processing and previewing on a video, the terminal carries out global parameter acquisition on an original global video through interaction with a server, global processing is carried out on the original global video according to the global parameter, a globally processed video is obtained, and the previewed video is updated. The method not only ensures the real-time performance of video preview, but also realizes the global processing of the video, and improves the quality of the video recorded by the terminal.
The video processing method is respectively described below by taking a video image stabilization technology and a video HDR technology as examples.
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a video processing method applied to a video image stabilization technique according to an embodiment of the present disclosure. As shown in fig. 5, the video processing method includes steps S201 to S206.
In order to realize video image stabilization, the terminal can detect motion information of the camera, such as a rotation angle, acceleration and the like, through the inertial sensor in a video recording process, determine a camera pose (also called a camera pose) of the camera when shooting each frame of image based on the motion information, record the camera pose, and further process an original video through a video image stabilization technology by combining the camera pose in video post-processing.
The original video segment in this embodiment is a continuous multi-frame image, the parameter is a camera pose, the parameter of the original video segment is an original pose sequence formed by arranging the camera poses of each frame of image in the original video segment in an image acquisition order, the first parameter set is a global pose sequence formed by arranging the camera poses of each frame of image in the original global video in the image acquisition order.
S201, the terminal executes N times of preview processes to obtain an original global video and a global pose sequence, wherein N is a positive integer not less than 2.
Specifically, the terminal may execute N times of preview processes to obtain an original global video and a global pose sequence, where N is a positive integer not less than 2, the original global video is a set of images acquired by the terminal in the N times of preview processes, and the global pose sequence is formed by arranging camera poses of each frame of image in the original global video in an image acquisition order.
In the embodiment of the present application, an ith preview process is taken as an example to describe each preview process of N preview processes, each preview process is implemented by a terminal, i is 1, 2, 3, …, N, and the ith preview process includes, but is not limited to, the following partial or all steps:
S2011, acquiring the ith original video clip and the camera pose sequence of the ith original video clip.
Specifically, the terminal may determine an image to be processed according to the acquisition order of the images, where the image to be processed may be one frame or several frames, determine an original video clip according to the image to be processed, and then acquire the original video clip and the camera pose sequence of the original video clip, where the original video clip includes the image to be processed. For example, the terminal may determine the acquired first frame image as the first image to be processed and then determine the first to tenth frame images as the first original video clip; the terminal may determine the acquired second frame image as the second image to be processed and then determine the second to eleventh frame images as the second original video clip; and so on, and the original video clip corresponding to the last frame image may be the last ten frames of images. It should be understood that the greater the number of image frames included in the original video segment, the better the image stabilization effect that can be achieved; however, the number of frames is limited by the real-time requirement of the video and by the conditions of the terminal, such as its memory and computing power, and the original video segment includes at least two consecutive frames of images. Specifically, the number of image frames in the original video segment can be set according to the memory and computing power of the terminal.
In the video recording process of the terminal, the motion information of the camera, such as the rotation angle and acceleration, can be detected through the inertial sensor, and the camera pose of the camera when each frame of image is shot is determined based on the motion information. Specifically, the terminal may obtain the camera pose of each frame of image in the ith original video clip by a camera pose estimation method, and then arrange the camera poses of each frame of image in the image acquisition order to obtain the camera pose sequence of the ith original video clip. The camera pose estimation method may be a feature point method. Specifically, the terminal can extract features from two consecutive frames of images and perform feature matching between the two frames to obtain multiple groups of matching points and their pixel coordinates, and then solve for the camera pose from the pixel coordinates of the matching points, i.e. solve for the coordinates and the rotation angle of the camera in a coordinate system, to obtain the camera pose corresponding to the two frames of images. The camera pose estimation method may also be a block matching method, an optical flow method, or other methods, which are not limited herein.
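As a sketch of the feature point method just described, the following Python example uses OpenCV to match ORB features between two consecutive frames and recover the relative camera rotation and translation direction; the intrinsic matrix K is assumed to come from camera calibration, and the parameter values are illustrative.

```python
import cv2
import numpy as np

def relative_pose(prev_gray: np.ndarray, curr_gray: np.ndarray, K: np.ndarray):
    """Estimate the camera rotation R and translation direction t between two frames."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Solve for the relative pose from the matched pixel coordinates.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation matrix and unit translation between the two frames
```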
S2012, processing the video sub-segments in the ith original video segment according to the camera pose sequence of the ith original video segment to obtain the ith preview video segment.
The camera pose sequence of the ith original video clip is the ith original pose sequence, and the video sub-clip in the ith original video clip can be the ith frame of image to be processed. Specifically, the image stabilization algorithm includes that the terminal processes each camera pose in the ith original pose sequence to obtain an ith optimized pose sequence, and then performs motion compensation on the ith frame of image to be processed according to the optimized pose sequence to obtain an ith preview video clip. For example, if the image to be processed is a first frame image and the first original video segment is an image of the first frame to the tenth frame, the terminal may first acquire camera poses of the images of the first frame to the tenth frame, obtain a camera pose sequence according to time arrangement, smooth-process the camera pose sequence to obtain an optimized pose sequence, and further process the first frame image by using the optimized pose sequence to obtain a processed image which is the first frame image of the preview video. It should be understood that the image to be processed may also be each frame image of the ith original video segment, and thus the camera pose sequence of the ith original video segment may also process each frame image in the ith original video segment to obtain the ith preview video segment.
The terminal can use a filter to smooth the original pose sequence to obtain the optimized pose sequence. For example, the original pose sequence may be processed with Kalman filtering, which uses a linear system state equation and a recursive estimation and calculation form to process the original pose sequence. The optimization of the camera pose sequence may also be performed by constructing an objective optimization function, and other methods may also be used, which are not limited herein.
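A minimal Python sketch of the smoothing step follows; a centered moving-average filter stands in for the Kalman filter or objective-function optimization mentioned above, so it should be read as an illustrative assumption rather than the exact algorithm of this embodiment.

```python
import numpy as np

def smooth_pose_sequence(poses: np.ndarray, radius: int = 15) -> np.ndarray:
    """Smooth an (n, d) pose sequence (e.g. d = 3 for yaw/pitch/roll per frame)
    with a centered moving average, a simple stand-in for Kalman filtering."""
    n = len(poses)
    padded = np.vstack([np.repeat(poses[:1], radius, axis=0),
                        poses,
                        np.repeat(poses[-1:], radius, axis=0)])
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.vstack([np.convolve(padded[:, k], kernel, mode="valid")
                          for k in range(poses.shape[1])]).T
    return smoothed[:n]

# The per-frame compensation is then the difference between the smoothed pose and
# the original pose, which the terminal uses to warp each frame in step S2012.
```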
S2013, displaying the ith preview video clip;
specifically, the image processing module in the terminal may send the ith preview video segment to the preview display module in real time, and the preview display module performs preview display on the ith preview video segment.
In this embodiment, the terminal may acquire and send the image in real time through the image acquisition module, that is, the image acquisition module of the terminal may send the image to the encoding module after acquiring one frame of image, so that when the N times of preview processes are finished, the encoding module may encode the original video image to obtain the original global video, and the terminal stores the original global video.
S202, the terminal sends a global pose sequence to the server.
Specifically, in the N previewing processes, the terminal may store the camera pose of each frame of image in real time according to the image acquisition order, so that when the N previewing processes are completed, the terminal may obtain a global pose sequence, and at this time, the terminal sends the global pose sequence to the server.
In one implementation, please refer to fig. 6, where fig. 6 is a schematic diagram of video image stabilization processing based on a front end and a back end. As shown in fig. 6, when the user starts to record a video, the terminal starts to acquire images and the camera pose of each frame of image, where a continuous multi-frame image forms an original video segment. The images and camera poses are then processed in two paths: one path is sent to the front end and the other path is sent to the back end. The front end and the back end are distinguished by their function in the video processing method: the front end implements real-time optimization and display of the local video, and the back end implements optimization of the global video. It should be noted that the function of the front end is implemented by the terminal, while the function of the back end is implemented by the interaction of the terminal and the server. Specifically, after the terminal collects the original video and the camera pose of each frame of image, one path of the original video segments and camera poses is sent to the front end; the front end can solve an objective function on the ith original pose sequence to obtain the ith optimized pose sequence, and process the ith original video segment according to the ith optimized pose sequence to obtain the ith preview video segment, so that a preview video is obtained after multiple preview processes have been executed. The other path sends the collected images and camera poses to the back end; the back end can generate a global pose sequence from the original pose sequences, or directly from the collected camera poses, and solve an objective function on the global pose sequence to obtain a global optimization sequence. When the user finishes recording, the back end performs image stabilization processing on the original global video according to the global optimization sequence and replaces the preview video. In other words, when the user finishes recording, the back end can obtain the global pose sequence: the front end collects the camera poses and sends them to the back end, and the back end processes the camera poses to obtain the global pose sequence and sends the global pose sequence to the server.
S203, the server processes the global pose sequence to obtain a processed global pose sequence.
The server can process each camera pose in the global pose sequence to obtain a processed global pose sequence, namely a global optimization sequence, and the global pose sequence corresponds to the camera poses in the global optimization sequence one to one. Specifically, the method for processing the global pose sequence by the server may refer to the method for processing the original pose sequence by the terminal in step S2012 to obtain the optimized pose sequence, which is not described herein again.
S204, the server sends the processed global pose sequence to the terminal.
S205, the terminal processes the original global video through the processed global pose sequence to obtain a target global video.
And after receiving the global optimization sequence, the terminal processes the original global video according to an image stabilization algorithm to obtain a processed target global video. Specifically, the implementation of the image stabilization algorithm may refer to a process in which the terminal processes the original video segment to obtain the preview video segment in step S2012, and details are not repeated here.
S206, the terminal stores the target global video.
Specifically, when the terminal processes the target global video, the terminal may delete the preview video corresponding to the target global video, and store the target global video at the position of the preview video.
In this embodiment of the present invention, first, an original global video and a global pose sequence may be obtained by executing N preview processes at the user terminal, where N is a positive integer not less than 2, the original global video is a set of images acquired in the N preview processes, and the global pose sequence includes the camera pose of each frame image in the original global video. The ith preview process may include: the terminal first obtains the ith original video clip and the camera poses of the ith original video clip, then processes the video sub-clip in the ith original video clip according to the camera poses of the ith original video clip to obtain the ith preview video clip, and displays the ith preview video clip, where i is a positive integer not greater than N. Then, the terminal sends the global pose sequence to the server so that the server processes the global pose sequence to obtain a processed global pose sequence, and the server sends the processed global pose sequence to the terminal. After receiving the processed global pose sequence, the terminal processes the original global video through the processed global pose sequence to obtain a target global video, and finally stores the target global video. According to the method, while the terminal performs local processing and previewing of the video, it obtains the globally processed camera poses of the original global video through interaction with the server, then performs global processing on the original global video to obtain a globally processed video, and updates the previewed video. The method not only ensures the real-time performance of the video preview, but also realizes global processing of the video and improves the image stabilization effect of the terminal video.
Referring to fig. 7, fig. 7 is a flowchart illustrating a video processing method applied to the HDR technology of video according to an embodiment of the present application. As shown in fig. 7, the video processing method includes steps S301 to S306.
In order to realize high dynamic imaging of a video, the terminal can record the camera parameters used by the camera in the shooting process, such as the exposure coefficient, exposure time, and aperture, during video recording, and the terminal can acquire multiple frames of images of the same scene with different camera parameters.
The original video segment in this embodiment is a frame or multi-frame image, where a frame of image includes at least two image layers with different exposure parameters. The parameter in this embodiment may be gray scale information of a layer, and the parameter of the ith original video segment includes gray scale information of each layer of each frame image in the ith original video segment. The first parameter set in this embodiment is a gray level information set, the gray level information set includes gray level information of each frame image in the original global video, the second parameter set is a fusion parameter set, the fusion parameter set is used to enhance correlation between image frames of a video, and the fusion parameter set is a set of fusion parameters obtained by processing the gray level information set by the server.
S301, the terminal executes N times of preview processes to obtain an original global video and gray information set, wherein N is a positive integer not less than 2.
Specifically, the terminal may process and preview the original video obtained in the whole recording process in batches, and preview the original video in N times, that is, the terminal executes the N times of preview process in the video recording process, where N is a positive integer not less than 2. The original global video is a set of images acquired by the terminal in N times of previewing processes.
In the embodiment of the present application, an ith preview process is taken as an example to describe each preview process of N preview processes, each preview process is implemented by a terminal, i is 1, 2, 3, …, N, and the ith preview process includes, but is not limited to, the following partial or all steps:
S3011, obtaining the ith original video clip and the gray scale information of the ith original video clip.
Specifically, in the video recording process, the terminal may use different camera parameters, such as the exposure coefficient, exposure time, and aperture, to obtain multiple frames of images of the same scene, where each obtained frame of image includes a plurality of images (also referred to as layers) captured with different camera parameters. For example, the terminal may use a plurality of cameras whose camera parameters differ from one another to shoot the same scene, so that a plurality of layers with different exposure degrees can be obtained.
After acquiring multiple image layers of an image, a terminal can combine one or more frames of images into an original video clip to obtain an ith original video clip, where the ith original video clip at least includes 2 image layers of one frame of image. Meanwhile, the terminal can acquire parameters of each layer, such as gray information. The frame number of the image in the original video segment and the layer number of the layer in one frame of image may be set according to the conditions of the memory and computational power of the terminal, which is not limited herein.
S3012, processing the video sub-segment in the ith original video segment according to the gray information of the ith original video segment to obtain the ith preview video segment.
For example, the video sub-segment in the ith original video segment may be all images in the original video segment, or may be a main exposure layer of each frame image in the original video segment, where the main exposure layer may be any layer in the images, or may be a layer in which the gray value is the median. Specifically, the terminal may process each frame image of a video sub-segment in the ith original video segment according to the gray level information of the ith original video segment and according to the HDR algorithm, so as to obtain a main image layer of each frame image in the video sub-segment, where the main image layer is an HDR image of the corresponding image.
In one implementation, the ith original video clip is one frame of image, and the terminal may process it according to a single-frame HDR algorithm. Specifically, during image acquisition, the terminal may first obtain the illumination information of the current scene through scene recognition, select a main exposure coefficient according to the illumination information of the current scene, and then shoot an image with the main exposure coefficient; this image is the main exposure layer. The terminal then shoots two images with different exposure coefficients, for example, an image with a higher exposure value and an image with a lower exposure value, where the image with the higher exposure value is the high exposure layer and the image with the lower exposure value is the low exposure layer. The terminal determines a dark area and a bright area of the main exposure layer according to the gray information of the main exposure layer, extracts local gray values from the other layers, and applies the local gray values to the corresponding areas of the main layer; for example, gray values of the high exposure layer are applied to the dark area and gray values of the low exposure layer are applied to the bright area, so as to obtain the main layer. Finally, the terminal can process the gray values of the main layer according to a tone mapping algorithm. Specifically, an average gray value can be calculated from the gray values of the main layer, a gray value range is selected according to the average gray value, and the main layer is then mapped into the gray value range to obtain the ith preview video segment. The tone mapping algorithm may be a global tone mapping algorithm or a local tone mapping algorithm, which is not limited herein.
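The following Python sketch illustrates the single-frame fusion and global tone mapping described above on gray-scale layers; the thresholds, the target gray range, and the region-filling convention (dark areas from the high exposure layer, bright areas from the low exposure layer) are illustrative assumptions rather than values defined by this embodiment.

```python
import numpy as np

def fuse_single_frame(main: np.ndarray, low: np.ndarray, high: np.ndarray,
                      dark_thr: int = 40, bright_thr: int = 215) -> np.ndarray:
    """Build the main layer of one HDR frame from three gray-scale layers
    (main / low / high exposure), then apply a simple global tone mapping."""
    fused = main.astype(np.float32)
    dark = main < dark_thr            # under-exposed regions of the main exposure layer
    bright = main > bright_thr        # over-exposed regions of the main exposure layer
    fused[dark] = high.astype(np.float32)[dark]     # dark areas: detail from the high exposure layer
    fused[bright] = low.astype(np.float32)[bright]  # bright areas: detail from the low exposure layer

    # Global tone mapping: normalize and map into a range centered on the mean gray value.
    mean = fused.mean()
    lo, hi = max(0.0, mean - 96.0), min(255.0, mean + 96.0)
    fused = (fused - fused.min()) / max(float(fused.max() - fused.min()), 1e-6)
    return (lo + fused * (hi - lo)).astype(np.uint8)
```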
In another implementation, the ith preview video clip may also be obtained through a multi-exposure image fusion model trained in advance in the terminal, which is not described herein again.
It should be noted that, in other implementation manners, other HDR algorithms may also be specifically used, and are not limited herein.
S3013, displaying the ith preview video clip;
specifically, the image processing module in the terminal may send the ith preview video segment to the preview display module, and the preview display module displays the ith preview video segment in real time.
S302, the terminal sends the gray information set to the server.
Specifically, in the N previewing processes, the terminal may store the grayscale information of each frame of image in real time according to the image acquisition order, so that when the N previewing processes are completed, the terminal may obtain a grayscale information set, and at this time, the terminal sends the grayscale information set to the server.
In one implementation, please refer to fig. 8, where fig. 8 is a schematic diagram of a video HDR processing process based on a front end and a back end. When the user starts to record a video, the terminal acquires, for each frame, a plurality of layers with different exposure coefficients and then processes the layers in two paths: one path sends the layers of the image to the front end, and the other path sends the layers of the image to the back end. The front end and the back end are distinguished by their function in the video processing method: the front end implements real-time optimization and display of the local video, and the back end implements optimization of the global video. Specifically, after the terminal collects the layers of each frame with different exposure coefficients and the original global video, one path is sent to the front end; the front end can select a main exposure layer from the layers of one frame of image and process it according to a single-frame high dynamic imaging algorithm to obtain a single-frame HDR image and display it in real time, so that a preview video is obtained when the user finishes recording. The other path is sent to the back end; the layers of the image are compressed to obtain down-sampled layers, multi-frame registration is performed on the down-sampled images to obtain a fusion parameter set, and finally high dynamic imaging processing is performed on the original global video according to the fusion parameter set to obtain a globally optimized video that replaces the preview video. It should be noted that the function of the front end is implemented by the terminal, and the function of the back end is implemented by the interaction between the terminal and the server. When the user finishes recording, the back end may obtain the down-sampled layers; specifically, the layers of the images collected by the front end are sent to the back end, the terminal obtains the down-sampled layers in the back-end processing, and the down-sampled layers are sent to the server.
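As an illustration of the back-end preparation step, the following Python sketch down-samples each exposure layer before multi-frame registration; the use of pyrDown and the fixed number of pyramid levels are assumptions, not the specific compression used in this embodiment.

```python
import cv2
import numpy as np
from typing import List

def downsample_layers(layers: List[np.ndarray], levels: int = 2) -> List[np.ndarray]:
    """Compress each exposure layer of a frame into a small down-sampled layer
    so that multi-frame registration on the back end stays cheap."""
    small = []
    for layer in layers:
        for _ in range(levels):
            layer = cv2.pyrDown(layer)  # halve width and height at each level
        small.append(layer)
    return small

# e.g. send downsample_layers([low_exposure, main_exposure, high_exposure]) to the server
```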
And S303, the server processes the gray information set to obtain a fusion parameter set.
The fusion parameters are used for enhancing the correlation between inter-frame images, and may be determined according to an HDR video generation algorithm. For example, in an HDR video generation algorithm based on a motion estimation algorithm, the fusion parameter is a binary image. Specifically, the gray information set sent by the terminal includes information of the HDR images generated by the terminal. The server may, for example, generate a motion vector by using the motion estimation algorithm according to the information of the first frame image of the HDR video, find out the dynamic region and the static region of the second frame image by using the motion vector, generate a dynamic-region binary image, and obtain a static-region binary image by inverting the dynamic-region binary image. After receiving the dynamic-region binary image and the static-region binary image, the terminal adds the dynamic-region binary image to the weight map used for generating the HDR image, thereby generating the HDR image of the dynamic region, and generates the HDR image of the static region according to the static-region binary image. Finally, the HDR image of the dynamic region and the HDR image of the static region are fused by weighting to generate the HDR image of the second frame, and so on for subsequent frames.
Specifically, on the assumption that all pixel values inside a sub-block of an image translate in the same way, an optimal matching block for each sub-block of the current frame is searched for within a certain search area of the previous frame, and the motion vector of the sub-block is the translation from the best matching block found in the previous frame. A mean absolute difference (MAD) criterion may be used as the criterion for measuring the matching quality between blocks of the two frames so as to obtain the motion vector; alternatively, an adaptive rood pattern search (ARPS) method may be used to predict the motion vector of the current block from the adjacent block on its left.
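The block-matching step can be sketched as follows: a brute-force search under the MAD criterion plus a simple dynamic/static mask derived from the motion vectors. Block and search sizes are illustrative, and an ARPS implementation would instead probe a rood-shaped pattern seeded by the left neighbour's vector.

    import numpy as np

    def mad(block_a, block_b):
        """Mean absolute difference between two equally sized blocks."""
        return np.mean(np.abs(block_a.astype(np.float32) - block_b.astype(np.float32)))

    def estimate_motion(prev, curr, block=16, search=8):
        """For each block of the current frame, search a (2*search+1)^2 window of
        the previous frame for the best match and return per-block motion vectors."""
        h, w = curr.shape
        vectors = np.zeros((h // block, w // block, 2), dtype=np.int32)
        for by in range(h // block):
            for bx in range(w // block):
                y, x = by * block, bx * block
                cur_blk = curr[y:y + block, x:x + block]
                best, best_mv = np.inf, (0, 0)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        py, px = y + dy, x + dx
                        if 0 <= py and py + block <= h and 0 <= px and px + block <= w:
                            cost = mad(cur_blk, prev[py:py + block, px:px + block])
                            if cost < best:
                                best, best_mv = cost, (dy, dx)
                vectors[by, bx] = best_mv
        return vectors

    def dynamic_static_masks(vectors, shape, block=16):
        """Turn the per-block motion vectors into a dynamic-region binary image
        and its inverted static-region binary image (the fusion parameters)."""
        dynamic = np.zeros(shape, dtype=np.uint8)
        for by in range(vectors.shape[0]):
            for bx in range(vectors.shape[1]):
                if vectors[by, bx, 0] or vectors[by, bx, 1]:
                    dynamic[by * block:(by + 1) * block,
                            bx * block:(bx + 1) * block] = 1
        return dynamic, 1 - dynamic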
In other specific implementations, the video fusion method may also be an HDR video generation algorithm based on curve fitting, or may also be other HDR video generation methods, and the fusion parameter may also be other parameters, which is not limited in this embodiment.
S304, the server sends the fusion parameter set to the terminal.
S305, the terminal processes the original global video through the fusion parameter set to obtain a target global video.
In one implementation, the terminal receives the fusion parameter set and processes the original global video according to an HDR video generation algorithm to obtain the processed target global video. Specifically, the HDR video generation algorithm may be the one described in the HDR processing of step S3012 and step S303, and is not described herein again.
S306, the terminal stores the target global video.
Specifically, when the target global video is obtained through processing, the terminal may delete the preview video corresponding to the target global video, and store the target global video at the position of the preview video.
In another implementation, a video HDR icon may be provided in the video recording function of this embodiment, so that the user can choose whether to enable the video HDR mode, and the video processing method applied to the video HDR technology is triggered by the user touching the video HDR icon.
Before step S301, the specific steps of starting the video HDR mode may include:
S401, the user starts the camera application.
In the embodiment of the present application, the user may start the camera application by operating, for example touching, the application icon of the camera application; see (a) in fig. 9.
S402, when the camera application is started, the mode loading module queries the HAL layer for the available modes.
In the embodiments of the present application, the HAL layer may provide a video HDR mode for the camera application. In the video HDR mode, the capability enabling module, the image acquisition module and the image processing module in the HAL layer may be enabled to perform their respective functions. Specifically, the mode loading module may query the capability enabling module for the available modes. In response to the query of the mode loading module, the capability enabling module may feed back to the mode loading module the modes provided by the HAL layer for the camera application, for example: a video HDR mode, a portrait mode, a normal mode, a night scene mode, a video mode and the like.
And S403, loading the mode according to the query result by the mode loading module.
The loaded modes include the video HDR mode. During loading, the mode loading module initializes the modules corresponding to each mode in the application layer and the HAL layer. After the initialization, the terminal may display an icon corresponding to each mode, as shown in (B) and (C) of fig. 9. After initialization, in response to a touch operation by the user on the icon corresponding to the video HDR mode, the shooting control module may notify the capability enabling module, the image acquisition module, the scene recognition module and the image processing module in the HAL layer to start and perform their respective functions. Other modes behave similarly to the video HDR mode after initialization: the corresponding modules in the HAL layer can be started in response to the user touching the icon corresponding to that mode.
S404, the user switches to the HDR mode.
As shown in (C) of fig. 9, the user may touch the HDR mode icon 204G on the mode selection interface 30 to switch to the HDR mode.
S405, responding to the touch operation of the user on the HDR mode icon 204G, the terminal starts a corresponding module to support video HDR processing.
Specifically, the processing procedure of the video HDR is as shown in step S201 to step S206.
After the shooting control module is started, it may notify the capability enabling module to enable the modules related to the video recording mode in the HAL layer, such as the image acquisition module and the image processing module.
The user interface involved in the process of loading the HDR mode is described below. Referring to fig. 9, fig. 9 is a schematic diagram of a human-computer interaction interface according to an embodiment of the present disclosure. As shown in (a) of fig. 9, the terminal 100 may display the user interface 10 as a home screen interface 10 of the terminal 100. Home screen interface 10 includes calendar widget 101, weather widget 102, application icons 103, status bar 104, and navigation bar 105. Wherein:
the calendar gadget 101 may be used to indicate the current time, such as the date, day of the week, time division information, etc.
The weather widget 102 may be used to indicate the weather type, such as cloudy to sunny or light rain, and may also be used to indicate information such as the temperature and the current location.
The application icon 103 may include an icon of Wechat (Wechat), an icon of QQ (Tencent QQ), an icon of Gallery (Gallery), an icon of camera (camera), and the like, and may further include icons of other applications, which is not limited in this embodiment of the present application. Any application icon can be used for responding to the operation of the user, such as touch operation, so that the terminal starts the application corresponding to the icon.
The status bar 104 may include the name of the operator (e.g., China Mobile), the time, a WI-FI icon, the signal strength, and the current remaining power.
Navigation bar 105 may include: a return key 1051, a home screen key 1052, a recent tasks key 1053 and other system navigation keys. The home screen interface is the interface displayed by the terminal 100 after a user operation on the home screen key 1052 is detected on any user interface. When it is detected that the user clicks the return key 1051, the terminal 100 may display the user interface previous to the current user interface. When it is detected that the user clicks the home screen key 1052, the terminal 100 may display the home screen interface 10. When it is detected that the user clicks the recent tasks key 1053, the terminal 100 may display the tasks that the user has recently opened. The navigation keys may also have other names; for example, 1051 may be called Back Button, 1052 may be called Home Button, and 1053 may be called Menu Button, which is not limited in the embodiments of the present application. The navigation keys in the navigation bar 105 are not limited to virtual keys and may also be implemented as physical keys.
The user launches the camera application, which may be accomplished by touching the camera icon. As shown in fig. 9 (a), in response to a touch operation of the camera icon by the user, the mode loading module performs steps S402 to S403. After the mode loading module completes loading the modes, the terminal 100 may display an icon corresponding to each mode.
Illustratively, the loaded modes include a night mode, a portrait mode, a photographing mode, a short video mode, a video recording mode, and the like. As shown in (B) of fig. 9, the terminal 100 may display the camera application interface 20. Icons 204 corresponding to the loaded modes may be included on the camera application interface 20. The icons 204 may include a night mode icon 204A, a portrait mode icon 204B, a photographing mode icon 204C, a short video mode icon 204D, a video recording mode icon 204E, and a more icon 204F. The more icon 204F is used to display the icons of the other loaded modes, as described for (C) in fig. 9. The shooting control module may start the mode corresponding to any one of the icons 204 in response to a touch operation by the user on that icon.
As shown in fig. 9 (B), the camera application interface 20 may further contain a captured image redisplay control 201, a shooting control 202, a camera switching control 203, a finder frame 205, a focus control 206A, a setting control 206B, and a flash switch 206C.
Wherein:
a captured image redisplay control 201 for the user to view captured images and video.
And the camera switching control 203 is used for switching the camera for acquiring the image between the front camera and the rear camera.
And a viewing frame 205 for performing real-time preview display on the acquired image.
And a focusing control 206A for focusing the camera.
And the setting control 206B is used for setting various parameters during image acquisition.
And a flash switch 206C for turning the flash on or off.
As shown in (C) of fig. 9, in response to the user's touch operation on the more icon 204F, the terminal 100 displays the mode selection interface 30, and the mode selection interface 30 may include the icons of the other modes loaded through step S403.
The mode selection interface 30 may include a video HDR mode icon 204G, and may also include a professional video mode icon, a skin makeup video mode icon, a slow motion mode icon, a gourmet mode icon, a 3D dynamic panoramic mode icon, and a streamer shutter mode icon.
In this embodiment, the terminal may respond to a user operation to open the camera application, and then display the camera application interface 20 on the display screen.
The user can operate any mode icon, for example by a touch operation, to start the corresponding mode, and the terminal then starts the corresponding modules in the HAL layer.
In this embodiment of the present invention, the terminal first executes N preview processes to obtain an original global video and a gray information set, where N is a positive integer not less than 2, the original global video is the set of images acquired in the N preview processes, and the gray information set includes the gray information of each frame of image in the original global video. The ith preview process may include: the terminal first obtains the ith original video clip and the gray information of the ith original video clip, then processes the video sub-segment in the ith original video clip according to the gray information of the ith original video clip to obtain the ith preview video clip, and displays the ith preview video clip, where i is a positive integer not greater than N. The terminal then sends the gray information set to the server, the server processes the gray information set to obtain a fusion parameter set, and the server sends the fusion parameter set to the terminal. After receiving the fusion parameter set, the terminal processes the original global video through the fusion parameter set to obtain a target global video, and finally stores the target global video. In this way, while the terminal performs local processing and preview of the video, it obtains the fusion parameter set of the original video through interaction with the server, performs global processing on the original video to obtain a globally processed video, and updates the previewed video to the globally processed video. The method not only ensures the real-time performance of the video preview, but also realizes global processing of the video and improves the quality of the HDR video recorded by the terminal.
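To put the summarized flow into pseudo-runnable form, the sketch below traces the whole method: N preview processes on the terminal, upload of the gray information set, the fusion parameter set coming back, global processing, and storage. Every name here (capture_segment, server.compute_fusion_params and the two stand-in helpers) is a hypothetical placeholder, not an API of the embodiment.

    def record_with_global_processing(capture_segment, server, n_segments):
        """`capture_segment(i)` yields (frames, gray_info) for the ith preview
        process; `server.compute_fusion_params` stands in for steps S302-S304."""
        original_global_video, gray_info_set, preview_video = [], [], []

        for i in range(n_segments):                       # the N preview processes (S301)
            frames, gray_info = capture_segment(i)        # ith original video clip
            gray_info_set.extend(gray_info)
            original_global_video.extend(frames)
            preview_video.extend(local_hdr(frames, gray_info))  # displayed in real time

        fusion_params = server.compute_fusion_params(gray_info_set)   # S302 to S304
        target_global_video = [apply_fusion(f, p)                     # S305
                               for f, p in zip(original_global_video, fusion_params)]
        return target_global_video          # stored in place of the preview video (S306)

    def local_hdr(frames, gray_info):
        # Stand-in for the per-segment single-frame HDR preview processing (S3012).
        return frames

    def apply_fusion(frame, params):
        # Stand-in for fusing one frame's exposure layers with its fusion parameter.
        return frame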
It can be understood that, in the embodiment of the present application, the video image stabilization technology and the video HDR technology in the video post-processing algorithm are taken as examples to describe the video processing method, but the embodiment of the present application is not limited to the video image stabilization technology and the video HDR technology, and the video processing method may also be used under other video post-processing algorithms, and the embodiment of the present application is not limited thereto.
Fig. 10 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention. The server 200 shown in fig. 10 (the server 200 may specifically be a computer device) includes a memory 201, a processor 202, a communication interface 203, and a bus 204. The memory 201, the processor 202 and the communication interface 203 are connected to each other through a bus 204.
The Memory 201 may be a Read Only Memory (ROM), a static Memory device, a dynamic Memory device, or a Random Access Memory (RAM). The memory 201 may store a program, and the processor 202 and the communication interface 203 are used to perform the steps of video processing in the embodiments of the present application when the program stored in the memory 201 is executed by the processor 202.
The processor 202 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU) or one or more Integrated circuits, and is configured to execute related programs to implement the video Processing method according to the embodiment of the present invention.
The processor 202 may also be an integrated circuit chip having signal processing capability. In implementation, the steps of the video processing method of the present application may be implemented by integrated logic circuits of hardware in the processor 202 or by instructions in the form of software. The processor 202 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or EPROM, or a register. The storage medium is located in the memory 201, and the processor 202 reads the information in the memory 201 and completes the video processing method of the embodiments of the present application in combination with its hardware.
The communication interface 203 enables communication between the server 200 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver. For example, data (such as the global pose sequence in the embodiment of the present application) may be acquired through the communication interface 203.
Bus 204 may include a path that transfers information between the various components of the server 200 (e.g., the memory 201, the processor 202 and the communication interface 203). In the above-described embodiments, all or part of the functions may be implemented by software, hardware, or a combination of software and hardware. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.

Claims (22)

1. A video processing method is applied to a terminal, and the method comprises the following steps:
executing N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, the first parameter set comprises parameters of each frame of image in the original global video, and the ith preview process comprises the following steps: acquiring an ith original video clip and parameters of the ith original video clip; processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain an ith preview video segment; displaying the ith preview video clip; i is a positive integer not greater than N;
sending the first parameter set to a server so that the server processes the first parameter set to obtain a second parameter set, and sending the second parameter set to the terminal by the server;
receiving the second set of parameters;
processing the original global video through the second parameter set to obtain a target global video;
and storing the target global video.
2. The method of claim 1, wherein the parameters comprise camera poses, and the parameters of the ith original video segment are a sequence of original poses formed by arranging the camera poses of each frame image in the ith original video segment in the order of image acquisition; the first set of parameters is a sequence of camera poses formed by arranging the camera poses of each frame of image in the original global video in the order of acquisition of the images.
3. The method of claim 2, wherein the processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment comprises:
processing each camera pose in the original pose sequence to obtain an optimized pose sequence, wherein the original pose sequence corresponds to the camera poses in the optimized pose sequence one to one;
and processing the video sub-segments in the ith original video segment through an image stabilizing algorithm according to the optimized pose sequence to obtain the ith preview video segment.
4. The method of claim 2 or 3, wherein the second set of parameters is a sequence of camera poses processed by the server for the first set of parameters, the first set of parameters corresponding one-to-one to the camera poses in the second set of parameters.
5. The method according to claim 1, wherein each frame image in the ith original video segment comprises at least 2 layers, and the exposure parameters of the at least 2 layers are different from each other; the parameter is used for indicating the gray information of the image layer; the parameters of the ith original video clip comprise gray information of each frame image in the ith original video clip; the first parameter set comprises gray scale information of each frame of image in the original global video.
6. The method of claim 5, wherein the processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment comprises:
processing each frame image of the video sub-segment in the ith original video segment according to the gray scale information of the ith original video segment to obtain a main picture layer of each frame image in the video sub-segment;
wherein the ith preview video clip comprises a main layer of each frame of image in the video sub-clip.
7. The method according to claim 5 or 6, wherein the second set of parameters is a set of fusion parameters resulting from the processing of the first set of parameters by the server; the target global video obtained by processing the original global video through the second parameter set comprises:
synthesizing at least 2 image layers of each frame of image through the corresponding fusion parameters of each frame of image in the original global video to obtain a main image layer corresponding to each frame of image, wherein the target global video comprises the main image layer corresponding to each frame of image.
8. A video processing method applied to a server, the method comprising:
receiving a first parameter set sent by a terminal, wherein the first parameter set comprises parameters of each frame of image in an original global video, and the original global video is a video recorded by the terminal;
processing the first parameter set to obtain a second parameter set;
and sending the second parameter set to the terminal so that the terminal processes the original global video according to the second parameter set to obtain a target global video.
9. The method of claim 8, wherein the parameters include a camera pose; the first set of parameters is a sequence of camera poses formed by arranging the camera pose of each frame of image in the original global video in the order of acquisition of the images.
10. The method according to claim 8, wherein each frame image in the original global video comprises at least 2 layers, and exposure parameters of the at least 2 layers are different from each other, and the parameters are used for indicating gray scale information of the layers; the first parameter set comprises gray scale information of each frame of image in the original global video.
11. A terminal, comprising: the system comprises a processor, a memory, a display screen, at least one camera and a communication interface, wherein the at least one camera is used for collecting images; the memory, the display screen, the at least one camera, and the communication interface are coupled to the processor, the memory is configured to store instructions, and the processor is configured to invoke the instructions stored by the memory to perform:
executing N times of preview processes to obtain an original global video and a first parameter set, wherein N is a positive integer not less than 2, the original global video is a set of images obtained in the N times of preview processes, the first parameter set comprises parameters of each frame of image in the original global video, and the ith preview process comprises the following steps: acquiring an ith original video clip and parameters of the ith original video clip; processing the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain an ith preview video segment; displaying the ith preview video clip through the display screen; i is a positive integer not greater than N;
sending the first parameter set to a server through the communication interface so that the server processes the first parameter set to obtain a second parameter set, and sending the second parameter set to the terminal by the server;
receiving, by the communication interface, the second set of parameters;
processing the original global video through the second parameter set to obtain a target global video;
and storing the target global video.
12. The terminal of claim 11, wherein the parameters include camera poses, and the parameters of the ith original video segment are a sequence of original poses formed by arranging the camera poses of each frame of image in the ith original video segment in the order of acquisition of the images; the first set of parameters is a sequence of camera poses formed by arranging the camera pose of each frame of image in the original global video in the order of acquisition of the images.
13. The terminal of claim 12, wherein the processor performs the processing on the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment, and comprises:
processing each camera pose in the original pose sequence to obtain an optimized pose sequence, wherein the original pose sequence corresponds to the camera poses in the optimized pose sequence one to one;
and processing the video sub-segment in the ith original video segment through an image stabilizing algorithm according to the optimized pose sequence to obtain the ith preview video segment.
14. The terminal of claim 12 or 13, wherein the second set of parameters is a sequence of camera poses processed by the server for the first set of parameters, the first set of parameters corresponding one-to-one to the camera poses in the second set of parameters.
15. The terminal according to claim 11, wherein each frame image in the ith original video segment comprises at least 2 layers, and the exposure parameters of the at least 2 layers are different from each other; the parameter is used for indicating the gray information of the image layer; the parameters of the ith original video clip comprise gray information of each frame image in the ith original video clip; the first parameter set comprises gray scale information of each frame of image in the original global video.
16. The terminal of claim 15, wherein the processor performs the processing on the video sub-segment in the ith original video segment according to the parameter of the ith original video segment to obtain the ith preview video segment comprises:
processing each frame image of the video sub-segment in the ith original video segment according to the gray scale information of the ith original video segment to obtain a main picture layer of each frame image in the video sub-segment;
and the ith preview video segment comprises a main image layer of each frame of image in the video sub-segment.
17. The terminal according to claim 15 or 16, wherein the second parameter set is a set of fusion parameters obtained by the server processing the first parameter set; the processor executes the target global video obtained by processing the original global video through the second parameter set, and the execution includes:
synthesizing at least 2 image layers of each frame of image through the corresponding fusion parameters of each frame of image in the original global video to obtain a main image layer corresponding to each frame of image, wherein the target global video comprises the main image layer corresponding to each frame of image.
18. A server, comprising: a processor, a memory, and a communication interface, the memory, the processor coupled with the communication interface, the memory to store computer program code, the computer program code including computer instructions, the processor to invoke the computer instructions to perform:
receiving a first parameter set sent by a terminal, wherein the first parameter set comprises parameters of each frame of image in an original global video, and the original global video is a video recorded by the terminal;
processing the first parameter set to obtain a second parameter set;
and sending the second parameter set to the terminal through the communication interface so that the terminal processes the original global video according to the second parameter set to obtain a target global video.
19. The server of claim 18, wherein the parameters include a camera pose; the first set of parameters is a sequence of camera poses formed by arranging the camera pose of each frame of image in the original global video in the order of acquisition of the images.
20. The server according to claim 18, wherein each frame image in the original global video comprises at least 2 image layers, the exposure parameters of the at least 2 image layers are different from each other, and the parameter is used for indicating gray scale information of the image layers; the first parameter set comprises gray scale information of each frame of image in the original global video.
21. A computer-readable storage medium comprising instructions that, when executed on a terminal, cause the terminal to perform the method of any one of claims 1 to 7.
22. A computer-readable storage medium comprising instructions that, when executed on a server, cause the server to perform the method of any of claims 8 to 10.
CN202010359399.8A 2020-04-29 2020-04-29 Video processing method and video processing device Active CN113572948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010359399.8A CN113572948B (en) 2020-04-29 2020-04-29 Video processing method and video processing device

Publications (2)

Publication Number Publication Date
CN113572948A CN113572948A (en) 2021-10-29
CN113572948B (en) 2022-11-11

Family

ID=78158627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010359399.8A Active CN113572948B (en) 2020-04-29 2020-04-29 Video processing method and video processing device

Country Status (1)

Country Link
CN (1) CN113572948B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117177066A (en) * 2022-05-30 2023-12-05 荣耀终端有限公司 Shooting method and related equipment
CN117651221A (en) * 2022-08-09 2024-03-05 荣耀终端有限公司 Video processing method and electronic equipment
CN117135268A (en) * 2023-02-23 2023-11-28 荣耀终端有限公司 Shooting method and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106060655A (en) * 2016-08-04 2016-10-26 腾讯科技(深圳)有限公司 Video processing method, server and terminal
CN106572344A (en) * 2016-09-29 2017-04-19 宇龙计算机通信科技(深圳)有限公司 Virtual reality live broadcast method and system and cloud server
CN108683826A (en) * 2018-05-15 2018-10-19 腾讯科技(深圳)有限公司 Video data handling procedure, device, computer equipment and storage medium
CN110581948A (en) * 2018-06-07 2019-12-17 三星电子株式会社 electronic device for providing quality customized image, control method thereof and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013089684A2 (en) * 2011-12-13 2013-06-20 Empire Technology Development, Llc Graphics render matching for displays
WO2016138121A1 (en) * 2015-02-24 2016-09-01 Plaay, Llc System and method for creating a sports video

Also Published As

Publication number Publication date
CN113572948A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN112532859B (en) Video acquisition method and electronic equipment
CN110072070B (en) Multi-channel video recording method, equipment and medium
CN113132620B (en) Image shooting method and related device
CN111183632A (en) Image capturing method and electronic device
CN113475057B (en) Video frame rate control method and related device
CN113810601B (en) Terminal image processing method and device and terminal equipment
CN113572948B (en) Video processing method and video processing device
CN113810600B (en) Terminal image processing method and device and terminal equipment
CN112532892B (en) Image processing method and electronic device
CN111770282B (en) Image processing method and device, computer readable medium and terminal equipment
CN113810603B (en) Point light source image detection method and electronic equipment
CN113542580B (en) Method and device for removing light spots of glasses and electronic equipment
CN114095666B (en) Photographing method, electronic device, and computer-readable storage medium
CN110602403A (en) Method for taking pictures under dark light and electronic equipment
CN115526787B (en) Video processing method and device
CN113452898A (en) Photographing method and device
CN113747058A (en) Image content shielding method and device based on multiple cameras
CN113556466B (en) Focusing method and electronic equipment
CN112188094B (en) Image processing method and device, computer readable medium and terminal equipment
CN112532508A (en) Video communication method and video communication device
CN115412678A (en) Exposure processing method and device and electronic equipment
CN114079725B (en) Video anti-shake method, terminal device, and computer-readable storage medium
CN113852755A (en) Photographing method, photographing apparatus, computer-readable storage medium, and program product
CN113542574A (en) Shooting preview method under zooming, terminal, storage medium and electronic equipment
CN115297269B (en) Exposure parameter determination method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant