CN113747085A

CN113747085A - Method and device for shooting video

Info

Publication number: CN113747085A
Application number: CN202011043999.XA
Authority: CN
Inventors: 赵威; 李宏俏; 李宗原; 赵鑫源; 李成臣; 曾毅华; 廖桂明; 周承涛; 李欣; 周蔚
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-05-30
Filing date: 2020-09-28
Publication date: 2021-12-03
Anticipated expiration: 2040-09-28
Also published as: CN113747050B; CN113747085B; CN113747050A; WO2022062318A1; WO2021244295A1

Abstract

The application discloses a method and a device for shooting a video, which relate to the technical field of shooting and image processing, and can enable the white balance effect of a Hirschhook zoom video to be better so as to improve the quality of the Hirschhok zoom video. The method comprises the following steps: and when the terminal is farther away from the target subject, acquiring N +1 images including the target subject in real time aiming at the first scene. And for N images acquired later, performing white balance processing based on a neural network for ensuring white balance consistency of adjacent images in a time domain to obtain N optimized images. Carrying out amplification cutting on the N optimized images to obtain N target images; the size of a target subject in the N target images is consistent with that of the target subject in the first acquired image, and the relative position of the target subject in the N target images is consistent with that of the target subject in the first image; the N target images are of the same size as the first image. A hessian zoom video is generated based on the N target images and the first image.

Description

Method and device for shooting video

The present application claims priority from a chinese patent application filed by the national intellectual property office on 30/05/2020, having application number 202010480536.3 entitled "a zoom method and apparatus for distinguishing a subject person from a background", which is incorporated herein by reference in its entirety.

Technical Field

The present application relates to the field of photographing technologies and image processing technologies, and in particular, to a method and an apparatus for photographing a video.

Background

With the rapid development of the camera shooting function of the mobile phone, more and more camera users begin to try to add movie shooting elements into a video so as to increase the richness and the high-grade feeling of a picture and enable the user to experience a more cool and gorgeous visual effect. One of the satisfying such visual effects is the heucheck zoom video.

Hirschk zoom is a special video capture technique. In shooting a heucheck zoom video, the camera is moved forward or backward, and the focal length is changed while moving forward or backward, so that the size of the target subject is not changed in the captured image, while the background image is drastically changed. The effect can be used for reflecting rich emotion of the main character, and can bring tension and impact feeling of space compression or expansion to the user, so that the super-conventional video recording experience is obtained.

Therefore, how to realize the heucher zoom becomes a technical problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a method and a device for shooting a video, which can enable the white balance effect of an obtained xico k zoom video to be better, so that the quality of the xico k zoom video is improved, and further the user experience is improved.

In order to achieve the purpose, the technical scheme is as follows:

in a first aspect, a method for shooting a video is provided, and the method is applied to a terminal. The method comprises the following steps: acquiring N +1 images in real time aiming at a first scene, wherein the N +1 images comprise target main bodies; wherein, in the process of acquiring the N +1 images, the terminal is farther away from the target subject. N is an integer of 1 or more. Carrying out white balance processing on the N images acquired later in the N +1 images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain. Amplifying and cutting the N optimized images to obtain N target images; the size of the target subject in each of the N target images is consistent with the size of the target subject in the first image acquired from the N +1 images, and the relative position of the target subject in each of the N target images is consistent with the relative position of the target subject in the first image. The N target images are of the same size as the first image. A hessian zoom video is generated based on the N target images and the first image.

According to the technical scheme, in the process of acquiring the Hirschhorn zoom video, white balance processing is carried out on the last N images in the N +1 images acquired in real time, so that the white balance of the processed images is consistent with that of the first image in the N +1 images acquired. Therefore, the white balance effect of the obtained Hirschk zoom video is better, the quality of the Hirschk zoom video is improved, and the user experience is improved.

In one possible design, the N +1 images include N1+1 images captured before and N2 images captured after, where N1+1 images are captured by a first camera of the terminal and N2 images are captured by a second camera of the terminal; n1 and N2 are each an integer of 1 or more. That is to say, the technical solution provided by the embodiment of the present application can be applied to shooting a hessian kock zoom video in a scene where cameras are switched.

In one possible design, acquiring N +1 images in real-time for a first scene includes: acquiring the shooting magnification of the ith image in the N +1 images; wherein i is not less than 2 and not more than N, i is an integer; if the shooting magnification of the ith image is within a first shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images aiming at a first scene based on a first camera of a terminal; and if the shooting magnification of the ith image is within the second shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images aiming at the first scene based on a second camera of the terminal. Wherein the multiplying power of the first camera is a, and the multiplying power of the second camera is b; a is less than b; the first photographing magnification range is [ a, b); the second shooting magnification range is a range equal to or greater than b.

That is, the terminal determines a camera which acquires the (i + 1) th image based on the shooting magnification of the (i) th image. Therefore, the terminal can amplify the size of the target main body in the subsequently acquired image in a camera switching mode, and compared with the traditional technology, the method is beneficial to enabling the definition of the obtained Hirschhorn zooming effect video to be higher, and therefore user experience is improved.

In one possible design, the shooting magnification of the ith image is determined based on a zoom magnification of the size of the target subject in the ith image relative to the size of the target subject in the first image, and a magnification of a camera that captures the first image.

In one possible design, the size of the target subject in the ith image is characterized by at least one of the following features: the width of the target subject in the ith image, the height of the target subject in the ith image, the area of the target subject in the ith image, or the number of pixel points occupied by the target subject in the ith image.

In one possible design, the method further includes: an example segmentation algorithm is used to extract the target subject from the ith image to determine the size of the target subject in the ith image. This helps to improve the accuracy of determining the size of the target subject in the ith image.

In one possible design, the method further includes: and displaying the first information in the current preview interface. The first information is for instructing to stop shooting the Hirscoke zoom video. In this way, the user can know when to stop the mobile terminal, thereby improving the user experience.

In one possible design, the method further includes: and displaying the second information in the current preview interface. The second information is used to indicate that the target subject is stationary. One of the requirements of the Hirschk zoom video is that the positions of target main bodies in the images are consistent, so that based on the possible design, a user can know whether the requirement of obtaining the Hirschk zoom video is met currently in the process of obtaining the Hirschk zoom video, and the user experience is improved.

In one possible design, the method further includes: and displaying third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image. In this way, the user can determine whether to move the terminal based on whether the terminal displays the third information, thereby contributing to an improvement in the quality of the heuchker video. The current preview interface contains a current preview image (i.e., an image captured by a camera) and information (such as a shooting control, indication information, etc.) other than the current preview image.

In one possible design, the method further includes: acquiring N +1 images in real time for a first scene, comprising: the first of the N +1 images is acquired when the target subject is in the center of the current preview image. This helps to improve the quality of the Hirschk video.

In one possible design, the method further includes: and displaying a user interface, wherein the user interface comprises a first control used for indicating that the Hooke zoom video is shot from near to far. Acquiring N +1 images in real time for a first scene, comprising: an operation for a first control is received, and in response to the operation, N +1 images are acquired in real-time for a first scene.

In one possible design, the moving speed of the terminal is less than or equal to a preset speed. This helps to improve the quality of the Hirschk zoom video.

In one possible design, the preset neural network is used for predicting the white balance gain of the image to be processed by combining with the characteristic diagram of the historical network layer so as to ensure the white balance consistency of the adjacent images in the time domain; wherein the history network layer is a network layer used in predicting a white balance gain of an image preceding and temporally continuous with the image to be processed. Illustratively, the image to be processed is one of the N images described above. The white balance network fuses the network layer feature information of the current frame and the historical frame. Therefore, the multi-frame information is considered, the predicted value of the white balance gain between the frames is enabled to be closer, the white balance network is enabled to be more stable, and the white balance consistency effect between the images obtained after the continuous multiple images are subjected to white balance processing is enabled to be better.

In one possible design, the preset neural network is obtained based on preset constraint condition training; wherein the preset constraint condition comprises: the predicted values of the white balance gains for a plurality of images which are continuous in an analog time domain are consistent.

In one possible design, for N images acquired later in the N +1 images, white balance processing is performed based on a preset neural network to obtain N optimized images, including: inputting a jth image in the N +1 images into a preset neural network to obtain a predicted value of a white balance gain of the jth image; wherein j is more than or equal to 2 and less than or equal to N-1, and j is an integer. Applying the white balance gain predicted value of the jth image to obtain an optimized image corresponding to the jth image; and the N optimized images comprise optimized images corresponding to the jth image.

In a second aspect, a method for shooting a video is provided, and the method is applied to a terminal, and includes: acquiring N +1 images aiming at a first scene, wherein the N +1 images comprise target bodies; in the process of acquiring the N +1 images, the terminal is closer to the target main body; n is an integer of 1 or more. The first image in the N +1 images is acquired by a first camera of the terminal, part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, and the multiplying power of the second camera is smaller than that of the first camera. The size of the target subject in the later acquired N images of the N +1 images is less than or equal to the size of the target subject in the first image acquired in the N +1 images. Carrying out white balance processing on the later acquired N images in the N +1 images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain. And amplifying and cutting the N optimized images to obtain N target images. The size of a target body in each target image in the N target images is consistent with the size of the target body in a first image acquired from the N +1 images, and the relative position of the target body in each target image in the N target images is consistent with the relative position of the target body in the first image; the N target images are of the same size as the first image. Based on the N target images and the first image, a heucher zoom video is generated.

Alternatively, the N +1 images may be N +1 images acquired continuously, i.e., N +1 images acquired in real time.

According to the technical scheme, in a scene that the terminal is closer to the target main body, the camera with the lower magnification rate is switched to acquire the subsequent image, so that the size of the target main body in the subsequent acquired image is smaller than or equal to the size of the target main body in the previously acquired image. And in the process of acquiring the Hirschk zoom video, white balance processing is performed on the last N images of the acquired N +1 images so that the processed images are consistent with the white balance of the first image of the acquired N +1 images. Therefore, the white balance effect of the obtained Hirschk zoom video is better, the quality of the Hirschk zoom video is improved, and the user experience is improved.

In one possible design, the N images include N1 images captured before and N2 images captured after, where N1 images are captured by the second camera and N2 images are captured by the third camera of the terminal; n1 and N2 are each an integer of 1 or more.

In one possible design, acquiring N +1 images in real-time for a first scene includes: acquiring the shooting magnification of the ith image in the N +1 images; wherein i is not less than 2 and not more than N, i is an integer; acquiring an (i + 1) th image in the (N + 1) th images aiming at the first scene based on the second camera if the shooting magnification of the ith image is within the first shooting magnification range; if the shooting magnification of the ith image is within the second shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images aiming at the first scene based on a third camera of the terminal; wherein the multiplying power of the second camera is b, and the multiplying power of the third camera is c; b is more than c; the first shooting magnification range is a range greater than or equal to b; the second photographing magnification range is [ c, b).

That is, the terminal determines a camera which acquires the (i + 1) th image based on the shooting magnification of the (i) th image. Therefore, the terminal can adopt the camera with the magnification smaller than that of the camera used for collecting the previous image to collect the subsequent image, namely, the target main body is reduced by switching to the camera with the smaller magnification, so that the acquired image is not required to be subjected to edge repairing, and the user experience is improved.

In one possible design, the method further includes: in the current preview interface, first information is displayed, and the first information is used for indicating that the shooting of the Hirschhorn zoom video is stopped. In this way, the mobile terminal can be instructed to stop when appropriate, and therefore the user experience is improved.

In one possible design, the method further includes: and displaying second information in the current preview interface, wherein the second information is used for indicating that the target main body is static. Based on the possible design, a user can know whether the requirement for acquiring the Hirschhook zoom video is met currently in the process of acquiring the Hirschhok zoom video, so that the user experience is improved.

In one possible design, the method further includes: and displaying third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image. In this way, the user can determine whether to move the terminal based on whether the terminal displays the third information, thereby contributing to an improvement in the quality of the heuchker video.

In one possible design, the acquiring N +1 images for the first scene includes: the first image is acquired when the target subject is in the center of the current preview image. This helps to improve the quality of the Hirschk video.

In one possible design, the method further includes: and displaying a user interface, wherein the user interface comprises a second control used for indicating that the Koch zoom video is shot from far to near. Acquiring N +1 images for a first scene, comprising: an operation for a second control is received, and in response to the operation, N +1 images are acquired for the first scene.

In one possible design, the moving speed of the terminal is less than or equal to a preset speed. This helps to improve the quality of the high-altitude kock zoom video.

In one possible design, the preset neural network is used for predicting the white balance gain of the image to be processed by combining with the characteristic diagram of the historical network layer so as to ensure the white balance consistency of the adjacent images in the time domain; wherein the history network layer is a network layer used in predicting a white balance gain of an image preceding and temporally continuous with the image to be processed. The advantageous effects of which can be seen in the related possible designs of the first aspect described above.

In a third aspect, a method for shooting a video is provided, and is applied to a terminal, where the terminal includes a first camera and a second camera, and a magnification of the first camera is different from a magnification of the second camera. The method comprises the following steps: respectively acquiring a first image and a second image aiming at a first scene at a first moment through a first camera and a second camera; the first image and the second image both comprise the target subject. And determining the number N of the images to be inserted between the first image and the second image based on the preset playing duration and the preset playing frame rate of the video, wherein N is an integer greater than or equal to 1. And determining N images to be inserted based on the frame number N, the first image and the second image. Generating a video based on the first image, the second image and the N images to be inserted; the size of the target subject in each image in the video gradually becomes larger or smaller.

According to the technical scheme, the terminal collects multiple frames of images aiming at the same scene at the same time through the multiple cameras and performs frame interpolation based on the multiple frames of images, so that a video is generated, and the size of a target main body in each image in the video is gradually increased or decreased. This helps to improve the quality of the generated video compared to conventional techniques. In addition, the method is favorable for improving the interest of the motion picture effect and enhancing the viscosity of the user to the terminal.

In one possible design, the terminal further includes a third camera, and a magnification of the third camera is between the magnifications of the first camera and the second camera. The method further comprises the following steps: acquiring a third image for the first scene at a first time through a third camera; wherein the third image includes the target subject. Determining N images to be inserted based on the frame number N, the first image and the second image, comprising: and determining N images to be inserted based on the frame number N, the first image, the second image and the third image. In this way, it is helpful to further improve the quality of the video.

In a fourth aspect, a terminal is provided.

In one possible design, the terminal may be configured to perform any one of the methods provided in the first to third aspects. The present application may perform the division of the functional modules for the terminal according to any one of the methods provided in the first to third aspects. For example, the functional blocks may be divided for the respective functions, or two or more functions may be integrated into one processing block. For example, the terminal can be divided into an acquisition unit, a processing unit, a display unit and the like according to functions. All possible technical solutions and advantageous effects of the execution of the divided functional modules may refer to the corresponding technical solutions provided in the first to third aspects, and are not described herein again.

In another possible design, the apparatus includes a memory for storing computer instructions and a processor for invoking the computer instructions to perform any one of the methods as provided in the first to third aspects. In this possible design, the acquisition step in any one of the methods provided in the first to third aspects may be replaced by a control acquisition step. The display step in the above-described corresponding method may be specifically replaced with a control display step in this possible design.

In a fifth aspect, a terminal is provided, including: processor, memory and camera. The camera is used for collecting images and the like, the memory is used for storing computer programs and instructions, and the processor is used for calling the computer programs and the instructions and executing the corresponding technical schemes provided by the first aspect to the third aspect in cooperation with the one or more cameras.

In a sixth aspect, a computer-readable storage medium, such as a computer-non-transitory readable storage medium, is provided. Having stored thereon a computer program (or instructions) which, when run on a computer, causes the computer to perform any of the methods provided in the first to third aspects above. In this possible design, the acquisition step in any one of the methods provided in the first to third aspects may be replaced by a control acquisition step. The display step in the above-described corresponding method may be specifically replaced with a control display step in this possible design.

In a seventh aspect, there is provided a computer program product which, when run on a computer, causes any one of the methods provided in the first to third aspects to be performed. In this possible design, the acquisition step in any one of the methods provided in the first to third aspects may be replaced by a control acquisition step. The display step in the above-described corresponding method may be specifically replaced with a control display step in this possible design.

It is understood that any one of the terminals, computer storage media, computer program products, or chip systems provided above may be applied to the corresponding methods provided above, and therefore, the beneficial effects achieved by the methods may refer to the beneficial effects in the corresponding methods, and are not described herein again.

In the present application, the names of the above-mentioned terminals or functional modules do not limit the devices or functional modules themselves, and in actual implementation, the devices or functional modules may appear by other names. Insofar as the functions of the respective devices or functional modules are similar to those of the present application, they fall within the scope of the claims of the present application and their equivalents.

These and other aspects of the present application will be more readily apparent from the following description.

Drawings

Fig. 1 is a schematic hardware structure diagram of a terminal to which the embodiment of the present application is applicable;

fig. 2 is a block diagram of a software structure of a terminal to which the embodiment of the present application is applicable;

fig. 3 is a schematic diagram of an interface change for starting a heucher zoom video shooting mode according to an embodiment of the present disclosure;

fig. 4 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application;

fig. 5 is a flowchart illustrating a training data preparation process before training a white balance network according to an embodiment of the present disclosure;

fig. 6 is a schematic diagram of a network architecture used in training a white balance network according to an embodiment of the present application;

fig. 7 is a schematic diagram of another network architecture used in training a white balance network according to an embodiment of the present application;

FIG. 8 is a diagram illustrating a network architecture used in a prediction phase according to an embodiment of the present application;

fig. 9 is a schematic flowchart of a method for predicting white balance gain according to an embodiment of the present disclosure;

fig. 10 is a schematic flowchart of a method for capturing video according to an embodiment of the present disclosure;

fig. 11 is a schematic diagram illustrating an interface change for starting a kock zoom video in a near-far mode according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of a set of interfaces provided by an embodiment of the present application;

FIG. 13 is a schematic view of another set of interfaces provided by embodiments of the present application;

FIG. 14 is a schematic view of another set of interfaces provided by embodiments of the present application;

FIG. 15 is a schematic view of another set of interfaces provided by embodiments of the present application;

FIG. 16 is a schematic view of another set of interfaces provided by embodiments of the present application;

FIG. 17 is a schematic view of another set of interfaces provided by embodiments of the present application;

FIG. 18 is a schematic view of another set of interfaces provided by embodiments of the present application;

fig. 19a is a schematic flowchart of a method for acquiring an image by a terminal according to an embodiment of the present disclosure;

fig. 19b is a schematic flowchart of a method for determining a camera for capturing an image according to an embodiment of the present disclosure;

FIG. 20 is a schematic diagram of an example segmentation provided by an embodiment of the present application;

fig. 21 is a schematic diagram of a process of acquiring an image by a terminal according to an embodiment of the present application;

fig. 22a is a schematic diagram of an image acquired by a terminal according to an embodiment of the present disclosure;

fig. 22b is a schematic diagram of an image acquired by another terminal according to an embodiment of the present disclosure;

FIG. 23a is a diagram illustrating a current preview interface according to an embodiment of the present application;

FIG. 23b is a diagram of another current preview interface provided by an embodiment of the present application;

FIG. 23c is a schematic diagram of another current preview interface provided by an embodiment of the present application;

fig. 24 is a schematic diagram illustrating an enlarged and cropped acquired image according to an embodiment of the present application;

fig. 25 is a schematic flowchart of another method for capturing video according to an embodiment of the present disclosure;

FIG. 26 is a schematic diagram illustrating a process of processing a captured image in a Hirschk zoom video according to a conventional technique;

fig. 27 is a schematic flowchart of another method for capturing video according to an embodiment of the present disclosure;

fig. 28 is a schematic diagram of a process for processing an image according to an embodiment of the present application;

fig. 29 is a schematic structural diagram of a terminal according to an embodiment of the present application;

fig. 30 is a schematic structural diagram of another terminal according to an embodiment of the present application.

Detailed Description

In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.

In the embodiments of the present application, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.

In the embodiments of the present application, the term "identical" is used only to describe the same or similar (i.e., not much different). The difference can be embodied by the difference between the corresponding parameters being less than or equal to the threshold value. For example, the sizes of the target subjects are identical, and the sizes of the target subjects are equal to or different by a threshold or less.

The method for shooting the video provided by the embodiment of the application can be applied to a terminal, the terminal can be a terminal with a camera, such as a smart phone, a tablet computer, a wearable device, an AR/VR device, a Personal Computer (PC), a Personal Digital Assistant (PDA), a netbook and other devices, and any other terminal capable of achieving the embodiment of the application can be used. The present application does not limit the specific form of the terminal.

In the present application, the structure of the terminal may be as shown in fig. 1. As shown in fig. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It is to be understood that the illustrated structure of the present embodiment does not constitute a specific limitation to the terminal 100. In other embodiments, terminal 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. For example, in the present application, the processor 110 may control the camera 193 to acquire N +1 images in real time for the first scene, where each of the N +1 images includes the target subject. Wherein, in the process of acquiring N +1 images, the camera 193 is farther from the target subject; n is an integer of 1 or more. Then, for the N images acquired later in the N +1 images, the processor 110 may perform white balance processing based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain. Next, the processor 110 may amplify and crop the N optimized images to obtain N target images; the size of a target subject in the N target images is consistent with the size of the target subject in a first image acquired from the N +1 images, and the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image; the N target images are of the same size as the first image. Finally, the processor 110 may generate a Hirschk zoom video based on the N target images and the first image. The following can be referred to for a description of this solution.

The controller may be, among other things, a neural center and a command center of the terminal 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.

A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.

In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.

MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of terminal 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal 100.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.

The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB type c interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 100, and may also be used to transmit data between the terminal 100 and peripheral devices. And the earphone can also be used for connecting an earphone and playing audio through the earphone. The interface may also be used to connect other terminals, such as AR devices, etc.

It should be understood that the interface connection relationship between the modules illustrated in the present embodiment is only an exemplary illustration, and does not limit the structure of the terminal 100. In other embodiments of the present application, the terminal 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.

The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.

The terminal 100 implements a display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.

The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (FLED), a miniature, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 100 may include 1 or N display screens 194, with N being a positive integer greater than 1.

A series of Graphical User Interfaces (GUIs) may be displayed on the display screen 194 of the terminal 100, which are the main screens of the terminal 100. Generally, the size of the display 194 of the terminal 100 is fixed, and only a limited number of controls can be displayed in the display 194 of the terminal 100. A control is a GUI element, which is a software component contained in an application program and controls all data processed by the application program and interactive operations related to the data, and a user can interact with the control through direct manipulation (direct manipulation) to read or edit information related to the application program. Generally, a control may include a visual interface element such as an icon, button, menu, tab, text box, dialog box, status bar, navigation bar, Widget, and the like.

The terminal 100 may implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.

The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.

The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the terminal 100 may include 1 or N cameras 193, N being a positive integer greater than 1. For example, the camera 193 may include one or at least two kinds of cameras, such as a main camera, a telephoto camera, a wide-angle camera, an infrared camera, a depth camera, and a black-and-white camera. In combination with the technical solution provided by the embodiment of the present application, the first terminal may adopt one or at least two cameras to acquire an image, and process (e.g., merge) the acquired image to obtain a preview image (e.g., a first preview image or a second preview image).

The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the terminal 100 selects a frequency bin, the digital signal processor is configured to perform fourier transform or the like on the frequency bin energy.

Video codecs are used to compress or decompress digital video. The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.

The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal 100, for example: image recognition, face recognition, speech recognition, text understanding, and the like.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.

The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121. For example, in the present embodiment, the processor 110 may acquire the posture of the terminal 100 by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The terminal 100 can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal 100 can listen to music through the speaker 170A or listen to a handsfree call.

The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal 100 receives a call or voice information, it can receive voice by bringing the receiver 170B close to the human ear.

The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a voice signal to the microphone 170C by speaking the user's mouth near the microphone 170C. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal 100 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and so on.

The headphone interface 170D is used to connect a wired headphone. The headset interface 170D may be the USB interface 130, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.

The pressure sensor 180A is used for sensing a pressure signal, and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The terminal 100 determines the intensity of the pressure according to the change in the capacitance. When a touch operation is applied to the display screen 194, the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A. The terminal 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.

The gyro sensor 180B may be used to determine a motion posture of the terminal 100. In some embodiments, the angular velocity of terminal 100 about three axes (i.e., x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the terminal 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal 100 by a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 100 calculates an altitude from the barometric pressure measured by the barometric pressure sensor 180C to assist in positioning and navigation.

The magnetic sensor 180D includes a hall sensor. The terminal 100 may detect the opening and closing of the flip holster using the magnetic sensor 180D. In some embodiments, when the terminal 100 is a folder, the terminal 100 may detect the opening and closing of the folder according to the magnetic sensor 180D. And then according to the opening and closing state of the leather sheath or the opening and closing state of the flip cover, the automatic unlocking of the flip cover is set.

The acceleration sensor 180E may detect the magnitude of acceleration of the terminal 100 in various directions (generally, three axes). The magnitude and direction of gravity can be detected when the terminal 100 is stationary. The method can also be used for recognizing terminal gestures, and is applied to horizontal and vertical screen switching, pedometers and other applications.

A distance sensor 180F for measuring a distance. The terminal 100 may measure the distance by infrared or laser. In some embodiments, the scene is photographed and the terminal 100 may range using the distance sensor 180F to achieve fast focus.

The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 100 emits infrared light outward through the light emitting diode. The terminal 100 detects infrared reflected light from a nearby object using a photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100. The terminal 100 can utilize the proximity light sensor 180G to detect that the user holds the terminal 100 close to the ear for talking, so as to automatically turn off the screen to achieve the purpose of saving power. The proximity light sensor 180G may also be used in a holster mode, a pocket mode automatically unlocks and locks the screen.

The ambient light sensor 180L is used to sense the ambient light level. The terminal 100 may adaptively adjust the brightness of the display 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touches.

The fingerprint sensor 180H is used to collect a fingerprint. The terminal 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering, and the like.

The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 performs a reduction in the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, terminal 100 heats battery 142 when the temperature is below another threshold to avoid a low temperature causing abnormal shutdown of terminal 100. In other embodiments, when the temperature is lower than a further threshold, the terminal 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.

The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the terminal 100 at a different position than the display screen 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone mass vibrated by the sound part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.

The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The terminal 100 may receive a key input, and generate a key signal input related to user setting and function control of the terminal 100.

The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.

Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.

In addition, an operating system runs on the above components. For example, the iOS os developed by apple, the Android open source os developed by google, the Windows os developed by microsoft, and the like. A running application may be installed on the operating system.

The operating system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the terminal 100.

Fig. 2 is a block diagram of a software configuration of the terminal 100 according to the embodiment of the present application.

The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.

The application layer may include a series of application packages. As shown in fig. 2, the application package may include applications such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc. For example, when taking a picture, a camera application may access a camera interface management service provided by the application framework layer.

The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. As shown in FIG. 2, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like. For example, in the embodiment of the present application, when taking a picture, the application framework layer may provide an API related to a picture taking function for the application layer, and provide a camera interface management service for the application layer, so as to implement the picture taking function.

The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.

The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.

The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.

The phone manager is used to provide a communication function of the terminal 100. Such as management of call status (including on, off, etc.).

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.

The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is given, the terminal vibrates, an indicator light flashes, and the like.

The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.

The core library comprises two parts: one part is a function which needs to be called by java language, and the other part is a core library of android.

The application layer and the application framework layer run in a virtual machine. And executing java files of the application program layer and the application program framework layer into a binary file by the virtual machine. The virtual machine is used for performing the functions of object life cycle management, stack management, thread management, safety and exception management, garbage collection and the like.

The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.

The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.

The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.

Although the Android system is taken as an example for description in the embodiments of the present application, the basic principle is also applicable to a terminal based on an os such as iOS or Windows.

The workflow of the software and hardware of the terminal 100 is exemplarily described below in conjunction with fig. 1 and a video shooting scene.

First, the touch sensor 180K receives a touch operation on a camera application icon, and reports the touch operation to the processor 110, so that the processor 110 starts the camera application in response to the touch operation, and displays a user interface of the camera application on the display screen 194, as shown in a diagram in fig. 3. In addition, in the embodiment of the present application, the terminal 100 may be enabled to start the camera application in other manners, and a user interface of the camera application is displayed on the display screen 194. For example, when the terminal 100 displays a user interface after a screen is blacked, a screen lock interface is displayed, or the terminal is unlocked, the camera application may be started in response to a voice instruction or a shortcut operation of the user, and the user interface of the camera application may be displayed on the display screen 194. The user interface of the camera comprises controls such as night scenes, portraits, photos, videos and more.

Secondly, the touch sensor 180K receives a touch operation on the "record" control, and reports the touch operation to the processor 110, so that the processor 110 highlights the "record" control in response to the touch operation, as shown in a diagram b in fig. 3, and the "record" control in the diagram b in fig. 3 is framed to be highlighted; and starts the video recording function and displays the user interface under the video recording function, as shown in fig. 3 c. The user interface under the video recording function comprises controls of 'xizon Keke zoom video', 'common video', 'more' and the like.

Next, the touch sensor 180K receives a touch operation on the "xico zoom video" control, and reports the touch operation to the processor 110, so that the processor 110 highlights the "xico zoom video" control in response to the touch operation, as shown in a diagram d in fig. 3, where the "xico zoom video" control in the diagram d in fig. 3 is framed to be highlighted; and starting to record by adopting a Hirschhorn zoom video shooting mode, namely starting to shoot the Hirschhorn zoom video.

The terminal 100 may be enabled to start the heuchock zoom video capture mode in other ways in the embodiments of the present application. For example, the terminal 100 may start the xike zoom video photographing mode in response to a voice instruction or a shortcut operation of the user, or the like.

Fig. 4 is a schematic diagram of a hardware structure of a computer device 30 according to an embodiment of the present disclosure. The computer device 30 includes a processor 301, a memory 302, a communication interface 303, and a bus 304. The processor 301, the memory 302, and the communication interface 303 may be connected to each other via a bus 304.

The processor 301 is a control center of the computer device 30, and may be a general-purpose CPU, another general-purpose processor, or the like. Wherein a general purpose processor may be a microprocessor or any conventional processor or the like.

By way of example, processor 301 may include one or more CPUs, such as CPU 0 and CPU 1 shown in fig. 4.

The memory 302 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

In one possible implementation, the memory 301 may exist independently of the processor 301. A memory 302 may be coupled to the processor 301 through a bus 304 for storing data, instructions, or program code. When the processor 301 calls and executes the instructions or program codes stored in the memory 302, it can implement the corresponding method of the training data preparation process before training the white balance network, and the method of training the white balance network provided by the embodiment of the present application.

In another possible implementation, the memory 302 may also be integrated with the processor 301.

The communication interface 303 may be any device capable of inputting parameter information, such as a communication interface, and the embodiment of the present application is not limited. The communication interface may include a receiving unit and a transmitting unit. For example, the communication interface 303 may be configured to transmit information (e.g., values of related parameters) about the trained white balance network to the terminal 100.

The bus 304 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an extended ISA (enhanced industry standard architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.

It should be noted that the configuration shown in fig. 4 does not constitute a limitation of the computer device 30, and that the computer device 30 may include more or less components than those shown in fig. 4, or combine some components, or a different arrangement of components, in addition to the components shown in fig. 4.

It should be further noted that the computer device 30 shown in fig. 4 may specifically be any one of the terminals 100 provided above, and may also be any one of network devices, such as an access network device (e.g., a base station).

The technical scheme provided by the embodiment of the application is described below with reference to the accompanying drawings:

the method for shooting the video can be applied to shooting the kock zoom video in the Hirschhorn area. And in the process of shooting the Hirschhorn zoom video, white balance processing is carried out on the collected images based on a white balance network so as to enable the white balance of the adjacent images in the time domain to be consistent. Wherein:

white balance is an index for describing the accuracy of white color generated by mixing red, green and blue three primary colors. White balance is a very important concept in the field of television photography, by which a series of problems with tone processing can be solved.

The white balance gain is a parameter for correcting the white balance of an image.

The white balance network is a network for predicting a white balance gain of an image. The white balance network may be a deep learning network, such as a neural network or the like.

White balance consistency refers to processing time-domain adjacent images by using an approximate white balance gain such that white balance effects between the processed images are the same or similar. Specifically, the white balance network provided by the embodiment of the application is used for processing the images adjacent to each other in the time domain, so that the white balance effect of the processed images is the same or similar.

For example, the terminal continuously acquires images, or images obtained by processing the images continuously acquired by the terminal; can be understood as temporally adjacent images.

The white balance network provided in the embodiments of the present application is explained below:

training data preparation process:

fig. 5 is a schematic flowchart of a training data preparation process before training a white balance network according to an embodiment of the present application. The training data preparation process prior to training the white balance network may be performed by the computer device 30 described hereinabove. The method shown in fig. 5 comprises the following steps:

s101: the computer device acquires original images in a plurality of environments (such as indoor and outdoor environments with different color temperatures, different brightness, different viewing angles and the like) acquired by a plurality of cameras (such as a main camera, a wide-angle camera and the like).

S102: for each original image, the computer device performs parameter extraction on the gray or white part in the original image to obtain the white balance gain of the original image.

The white balance gain herein is used as a prediction target of the white balance network in the training stage, and therefore, in order to distinguish the white balance gain prediction value hereinafter, the white balance gain obtained here is hereinafter referred to as a white balance gain reference value.

Alternatively, the gray or white portions of the original image may be obtained based on standard color chart contrast.

S103: for each original image, the computer device performs data enhancement on the original image to obtain a set of enhanced images. A set of enhanced images is used to determine an original sample.

For convenience of description, each group of enhanced images is hereinafter referred to as a group of images.

Implementation mode 1: one image set is taken as one original sample. One image group includes P images, P being an integer of 2 or more. P images in one image group are used for simulating P continuous images in time domain collected by a camera.

Optionally, P images in one image group are used to simulate P time-domain continuous images acquired by the same camera.

Optionally, P images in one image group are used to simulate P time-domain continuous images acquired before and after switching of different cameras.

The first image in an image group may be generated based on a set of random numbers based on an original image corresponding to the image group, although the embodiment of the present application is not limited thereto.

Implementation mode 2: an image group and an original image corresponding to the image group are used as an original sample. One image group includes Q images, Q being an integer of 1 or more. And the images in one image group and the original images corresponding to the image group are used for simulating Q +1 continuous images in the time domain acquired by a camera.

Optionally, Q images in one image group and the original image corresponding to the image group are used to simulate Q +1 time-domain continuous images acquired by the same camera.

Optionally, Q images in one image group and original images corresponding to the image group are used to simulate Q +1 time-domain continuous images acquired before and after switching of different cameras.

For example, in an original sample, the original image may be the 1 st image, and the images in the image group corresponding to the original image are the 2 nd image to the Q +1 th image in the original sample.

Generally, a part of original samples are used for simulating a plurality of time-domain continuous images acquired by the same camera, and another part of original samples are used for simulating a plurality of time-domain continuous images acquired before and after switching of different cameras. Therefore, the white balance network trained based on all original samples can be simultaneously suitable for scenes without camera switching and scenes with camera switching.

For convenience of description, it is to be noted that, for convenience of description, the following description is given by taking an example in which one original sample is one image group (i.e., the above implementation mode 1).

S104: for each original sample, the computer device converts the image in the original sample to the same color space, resulting in one sample. All samples constitute training data.

Alternatively, the color space may be a Red Green Blue (RGB) color space, etc., although the specific implementation is not limited thereto.

Optionally, the same color space is used for all samples in the training data.

It should be noted that the images in the sample belong to the same color space, so as to avoid (or eliminate) the difference between different camera modules in the prediction stage. In addition, if the images in an original sample are themselves in the same color space, the step of converting to the same color space may not be performed. In this case, a group of enhanced images obtained based on the same original image is referred to as an image group.

A training stage:

fig. 6 is a schematic diagram of a network architecture used in training a white balance network according to an embodiment of the present application.

in (n) represents the nth image in the sample. n is the number of images in one sample, n is 2 or more, and n is an integer. in (n-a) represents the n-a th image in the sample. a < n, a being an integer.

out (n) represents the output of the first sub-network when the input of the first sub-network is in (n). out (n-a) represents the output of the first sub-network when the input of the first sub-network is in (n-a).

mem (n-1, 1) represents a feature map corresponding to the (n-1) th image through a feature map corresponding to the 1 st image in the sample. mem (n-a-1, 1) represents the feature map corresponding to the (n-a-1) th image through the feature map corresponding to the 1 st image in the sample.

Wherein, the characteristic diagram corresponding to the (n-1) th image is the characteristic diagram of the network layer included by the first sub-network when the input of the first sub-network is in (n-1). The embodiment of the present application does not limit which network layer or network layers in the first sub-network is specifically the network layer, and a specific implementation manner of each network layer. In addition, the feature maps corresponding to different images in a sample may be feature maps of the same network layer or feature maps of different network layers in the first subnetwork.

The loss function (loss) is used to constrain the training process to achieve the "out (n), out (n-1) … … out (n-a) consistent" training goal.

During training, for any one sample in the training data:

first, the computer device inputs the (n-a) th picture to the (n) th picture in the sample into the network architecture shown in fig. 6, and the network architecture outputs a set of white balance gain prediction values out (n-a) to out (n).

Then, the computer device uses the white balance gain reference value of the original image corresponding to the sample as a monitor, and adjusts the values of the parameters in the first sub-network by taking each value in the "out (n-a) … … out (n)", as close to the white balance gain reference value of the original image as possible as a target.

And by analogy, inputting a plurality of samples into the network architecture in sequence, and repeatedly executing the steps. In the execution process, the loss function is used to perform constraint so that the "out (n-a) … … out (n)" corresponding to the same sample and the white balance gain reference value of the original image corresponding to the sample are consistent. When the accuracy of the network architecture reaches a preset accuracy, the white balance network is trained.

That is, the white balance network is trained based on the constraint condition that "the white balance gain prediction values for a plurality of images consecutive in the analog time domain are consistent".

In fig. 6, the white balance gain prediction values of a +1 images that are consecutive in time domain are supervised by using a consistency supervision method.

In one example, if n-2 and a-1, the network architecture used when training the white balance network may be as shown in fig. 7.

In training, for any one sample in the training data:

first, the computer apparatus inputs the first picture in the sample as in (1) and the second picture as in (2) into the network architecture shown in fig. 7, which outputs a pair of white balance gain prediction values out (1) and out (2).

Secondly, the computer device uses the white balance gain reference value of the original image corresponding to the sample as supervision, and adjusts the value of the parameter of the white balance network by combining the white balance gain predicted values out (1) and out (2) output by the network architecture.

And by analogy, inputting a plurality of samples into the network architecture in sequence, and repeatedly executing the steps. In the execution process, the loss function is used to perform constraint so that "out (1) and out (2)" corresponding to the same sample and the white balance gain reference value of the original image corresponding to the sample are consistent. When the accuracy of the network architecture reaches a preset accuracy, the white balance network is trained.

In fig. 7, the white balance gain prediction values of 2 consecutive images in the time domain are supervised by using a consistency supervision method.

A prediction stage:

fig. 8 is a schematic diagram of a network architecture used in a prediction phase according to an embodiment of the present application.

The first sub-network in the white balance network in fig. 8 is the first sub-network at the end of the training phase.

in (t) represents the input of the white balance network for inputting the image to be predicted.

out (t) is the output of the white balance network when the input of the white balance network is in (t).

mem (T-1, T-T) represents a feature map of a first target network layer output (hereinafter referred to as a feature map corresponding to the T-1 th image) used in a process of predicting the white balance gain of the T-1 th acquired image based on the white balance network, to a feature map of a second target network layer output (hereinafter referred to as a feature map corresponding to the T-T th image) used in a process of predicting the white balance gain of the T-T acquired image based on the white balance network.

Optionally, the value of T may be adjustable. Generally, the larger the value of T is, the smaller the overall fluctuation range of the white balance gain of a plurality of time-domain continuous images predicted by using the white balance network is, that is, the larger the value of T is, after the white balance network is used to perform white balance processing on a plurality of time-domain continuous images, the better the white balance consistency effect between the obtained images is.

mem (T-1, T-T) is specifically which feature map or feature maps, and is updated with the update of the value of T. For example, if T is 3, then mem (T-1, T-T) represents the feature map corresponding to the 4 th acquired image to the feature map corresponding to the 1 st acquired image, assuming that T is 5. Assuming that T is 6, mem (T-1, T-T) represents the feature map corresponding to the acquired 5 th image to the feature map corresponding to the acquired 2 nd image.

In prediction, for the T-th acquired image, the image is input as in (T) into the white balance network shown in fig. 8, and the white balance network outputs out (T) as a predicted white balance gain value under the constraint of mem (T-1, T-T).

That is, the white balance network is used for predicting the white balance gain of the image to be processed (i.e. in (T)) in combination with the feature map of the historical network layer (i.e. mem (T-1, T-T)) so as to ensure the white balance consistency of the adjacent images in the time domain. Wherein the history network layer is a network layer used in predicting a white balance gain of an image preceding and temporally continuous with the image to be processed.

Fig. 9 is a schematic flow chart of a method for predicting white balance gain according to an embodiment of the present disclosure. The executing subject of the solution may be the computer device 30 provided above. The method shown in fig. 9 may include the steps of:

s201: the computer device acquires a first image to be predicted and an original color space of the first image to be predicted. The original color space of the first image to be preset is the color space of the camera used when the image to be predicted is acquired.

The first image to be predicted may be any one of non-first images in a plurality of consecutive images acquired by the same camera, or may be any one of non-first images in a plurality of consecutive images acquired by different cameras.

S202: and the computer equipment converts the original color space of the first image to be predicted into a preset color space to obtain a second image to be predicted. Wherein the preset color space is a color space used in the training phase.

It should be noted that S202 may not be performed if the original color space of the first image to be predicted is the preset color space, and in this case, the "second image to be predicted" in the following step may be replaced with the "first image to be predicted".

S203: the computer device inputs the second image to be predicted as in (T) to a white balance network (e.g., the white balance network shown in fig. 8) which outputs out (T) as a white balance gain prediction value under the constraint of mem (T-1, T-T).

S204: and the computer equipment converts the predicted value into an original color space and acts the converted predicted value on the first image to be predicted to obtain an optimized image corresponding to the first image to be predicted.

The applying the converted prediction value to the first to-be-predicted image to obtain an optimized image corresponding to the first to-be-predicted image may include: and multiplying the converted predicted value with the pixel value of each pixel in the first image to be predicted to obtain a new pixel value, and taking the pixel value as the pixel value of the pixel corresponding to the pixel in the optimized image corresponding to the first image to be predicted.

The optimized image is an image obtained by processing a first image to be predicted based on a white balance network. In the examples of the application: the image obtained after performing the processing of S201-S204 for each image in the plurality of temporally successive images other than the first image has white balance consistency with the first image in the plurality of temporally successive images.

Conventional white balance networks generally only consider single frame information, which may cause the white balance gain prediction value to have a jump from frame to frame, that is, the overall fluctuation range of the white balance gain prediction value from frame to frame is large.

The white balance network provided by the embodiment of the application fuses network layer feature information (feature) of the current frame and the historical frame (i.e. a feature map, specifically mem (T-1, T-T) in the foregoing). Thus, considering the information of multiple frames helps to make the white balance gain prediction values closer from frame to frame, thereby making the white balance network more stable. More specifically, in the above technical solution, the current frame and the historical frame are time-domain continuous. Therefore, the white balance network provided by the above technical solution enables the predicted values of the white balance gains of the plurality of images that are continuous in time domain to be closer, that is, the whole fluctuation range of the white balance gains of the plurality of images that are continuous and predicted by using the white balance network is smaller, so that the white balance network is more stable, that is, the white balance consistency effect between the images obtained after the white balance processing is performed on the plurality of images is better.

It should be noted that, in the white balance network provided in the embodiment of the present application, a constraint of "the white balance gain prediction values of the images obtained by enhancing the same original image are consistent" is used in the training process. In practice, it cannot be guaranteed that the predicted values of the white balance gains of a plurality of continuous images are completely the same, but have certain fluctuation. However, compared with the network for predicting the white balance gain based on a single frame in the prior art, the white balance network provided by the embodiment of the application is beneficial to reducing the whole fluctuation range of the white balance gains of a plurality of continuous images in a time domain, so that the stability of the white balance network is improved.

In addition, the conventional white balance network is trained based on a plurality of images acquired by the same camera, which may result in that the conventional white balance network cannot be applied to a multi-shot switching scene.

The embodiment of the application performs data enhancement aiming at a multi-shot switching scene. Specifically, the method can be embodied in that when the white balance network is trained, the used training data includes samples for simulating a plurality of continuous images before and after a multi-shot switching scene. Due to the fact that visual angles, sizes and input image statistics generally change during multi-shot switching, a multi-shot switching scene is simulated through data enhancement during training, and therefore the predicted value of the constraint network can be kept consistent under the multi-shot switching scene.

The application of the white balance network provided above is not limited by the embodiments of the present application.

Hereinafter, an application of the white balance network provided in the embodiment of the present application to capturing a xike kock zoom video will be described.

The sizes of the target subjects in different images in the heucher zoom video are consistent (e.g., the same or slightly different), and the relative positions of the target subjects in different images are consistent (e.g., the same or slightly different). Optionally, the postures of the target subjects in different images are consistent (e.g. same posture or similar postures). For example, the sizes of the target subjects in different images are consistent, which may include: the contours (or minimum bounding rectangles) of the target subject in the different images are relatively consistent. For example, the relative positions of the target subject in different images are consistent, which may include: the relative position of the target subject in different images with respect to the same static object in the background is consistent. For example, the center position (or outline or minimum bounding rectangle) of the target subject in different images is consistent with respect to the center position (or outline or minimum bounding rectangle) of the same static object in the background. For example, posture similarity may include overall posture identity (e.g., standing, sitting, or lying), while local posture identity (e.g., different gestures, etc.).

It should be noted that, in one example, the sizes of the target subjects in the different images of the heuchock zoom video are the same, which means that there is no jump or the jump degree is small, so that the user does not feel the jump or can accept the jump.

It should be noted that, in one example, the relative positions of the target subject in different images of the heuchker zoom video are consistent, which means that the target subject in different images of the heuchker zoom video is static or has a small degree of dynamic change, so that the user does not feel or can accept the dynamic change.

Fig. 10 is a schematic flowchart of a method for capturing a video according to an embodiment of the present disclosure. The method is applied to the terminal. The terminal comprises at least two cameras. The technical scheme provided by the embodiment is applied to a scene of shooting the heucher zoom video from near to far, namely, in the process of shooting the heucher zoom video, the distance between the terminal and the target body is more and more far.

In this embodiment, a hessian kock zoom video is shot under the condition that the terminal is farther from the target subject, and when the camera is not switched, the size of the target subject in the later captured image is smaller than that of the target subject in the earlier captured image, and the sizes of the target subjects in different images of the hessian kock zoom video are the same, so that the later captured image needs to be enlarged to realize the hessian kock zoom video. Directly magnifying the image may result in the magnified image being unclear.

Based on this, the basic principle of the technical scheme provided by this embodiment is as follows: the terminal can adopt the camera that the multiplying power that uses is bigger than when gathering preceding image to gather the image in back, realizes the enlargeing to the target subject through switching into the camera of bigger multiplying power promptly, like this, compares the technical scheme who "enlargies the image of gathering after, helps improving the definition of image.

The method shown in fig. 10 may include the steps of:

s300: the terminal determines to shoot the Hirsch Kock zoom video from near to far and determines an initial camera.

Alternatively, the terminal may determine to take the kock zoom video from near to far at the direction of the user.

For example, in conjunction with diagram d in fig. 3, in response to a touch operation on the "heucher zoom video" control, the terminal may also display a "near-to-far mode" 401 control and a "far-to-near mode" 402 control on the user interface, as shown in diagram a in fig. 11. Based on this, the user can perform a touch operation on the control 401 in the near-to-far mode, and in response to the touch operation, the terminal highlights the control 401 in the near-to-far mode (for example, a frame of the control is bold-faced), as shown in a b diagram in fig. 11, and simultaneously starts to start shooting the kock zoom video in the near-to-far mode.

It should be noted that the terminal may also start shooting the kock zoom video from the near-distance mode in other manners (e.g., a voice instruction manner or a shortcut key manner), which is not specifically limited in this embodiment of the application.

Since in this embodiment, if the terminal switches cameras, the camera is switched to a camera with a larger magnification, the magnification of the initial camera is not usually the camera with the largest magnification in the terminal, and it can be generally predefined that the initial camera is the camera with a smaller magnification (e.g. the smallest magnification) in the terminal.

S301: the terminal collects N +1 images in real time aiming at a first scene, wherein the N +1 images all comprise a target main body. Wherein, in the process of acquiring the N +1 images, the terminal is farther and farther away from the target main body. N is an integer of 1 or more.

Optionally, the N +1 images may be N +1 images continuously acquired by the terminal.

Optionally, the first image in the N +1 images is the first image saved by the terminal when the terminal starts shooting in the "hek zoom video" mode.

The first scene may be understood as a shooting scene in the shooting view field of the camera of the terminal or a scene around the shooting scene when the terminal executes S301, and is related to the environment where the user is located, the posture of the terminal, or parameters of the camera, which is not limited in the present application.

The target body can be an object, and the position of the target body can not move in the shooting process or can move transversely at the same depth.

Alternatively, the target body may include a plurality of objects having the same depth, and the entirety of the plurality of objects may be the target body. In some embodiments, when the target subject includes multiple objects, the images of the multiple objects are connected or partially overlap. In the video shooting process, along with the change of the distance between the user and the target subject, the change range of the imaging sizes of the objects at different depths is different. Thus, it is difficult to simultaneously achieve substantially constant image sizes for objects of different depths as the distance between the user and the target subject varies. Therefore, to keep the size of the target subject image substantially unchanged, multiple objects in the target subject should have the same depth.

The target subject may be determined automatically by the terminal or specified by the user, and these two cases will be described separately below.

(1) The terminal automatically determines a target subject, which may include one or more objects.

In some embodiments, the target subject is a preset type of object. For example, the predetermined type of object is a person, an animal, a famous building or a landmark, and the like. The terminal determines an object of a preset type as a target subject based on the preview image.

In other embodiments, the target subject is an object whose image on the preview image is located in the center region. The target subject of interest to the user will typically be facing the zoom camera and the image of the target subject on the preview image will thus typically be located in the central region.

In other embodiments, the target subject is an object whose image on the preview image is close to the central region and whose area is larger than the preset threshold 1. The target subject of interest to the user will typically be facing the zoom camera and closer to the zoom camera, so that the image of the target subject on the preview image is close to the central region and has an area greater than a preset threshold 1.

In other embodiments, the target subject is a preset type of object whose image on the preview image is near the center region.

In other embodiments, the target subject is a preset type of object whose image on the preview image is near the center region and whose area is larger than a preset threshold.

In other embodiments, the target subject is a preset type of object with the smallest depth near the center region of the image on the preview image. When the preset type of object of the image near the center area on the preview image includes a plurality of objects having different depths, the target object is an object of which the depth is the smallest.

In some embodiments, the terminal default target subject includes only one object.

It can be understood that there may be various ways for the terminal to automatically determine the target subject, and this way is not particularly limited in this embodiment of the application.

In some embodiments, after the terminal determines the target subject, the terminal may prompt the target subject to the user by displaying a prompt message or by voice broadcasting.

For example, the preset type is a person, and the terminal determines that the target subject is a preset type of person 1 whose image on the preview image is close to the center area. For example, referring to fig. 12 (a), the terminal may select a character 1 through a box 501 to prompt the user that the character 1 is a target subject.

For another example, the preset type is a person, and the terminal determines that the target subject is the preset type of the person 2 and the person 3 having the same depth near the center area in the image on the preview image. For example, referring to fig. 12 (b), the terminal may box out the person 2 and the person 3 through the circle 502 to prompt the user that the person 2 and the person 3 are the target subjects.

As another example, the preset types include a person and an animal, and the terminal determines the target subject as the preset type of the person 4 and the animal 1 having the same depth near the center area in the image on the preview image. Illustratively, referring to (c) in fig. 12, the terminal may prompt the user that the character 4 and the animal 1 are the target subjects by displaying prompt information.

It can be understood that there are various ways for the terminal to prompt the target subject to the user, and this embodiment of the present application does not specifically limit this way.

In some embodiments, after the terminal automatically determines the target subject, the terminal may also modify the target subject in response to a user operation, such as switching, adding, or deleting the target subject.

For example, in the case shown in fig. 13 (a), the target subject automatically specified by the terminal is the person 1, and when the terminal detects an operation of the user clicking the person 5 on the preview image, the target subject is modified from the person 1 to the person 5 as shown in fig. 13 (b).

For another example, in the case shown in fig. 14 (a), the target subject automatically determined by the terminal is person 1, and after the terminal detects an operation of the user dragging the box to simultaneously frame person 1 and person 5, the target subject is modified from person 1 to person 1 and person 5 as shown in fig. 14 (b).

For another example, in the case shown in fig. 15 (a), the terminal automatically determines that the target subjects are the person 1 and the person 5, and after the terminal detects that the user can click on the person 5, the terminal modifies the target subjects from the person 1 and the person 5 to the person 1 as shown in fig. 15 (b).

For another example, the terminal first enters the target subject modification mode according to the instruction of the user, and then modifies the target subject in response to the operation of the user.

It is understood that there may be various ways for the user to modify the target subject, and the embodiment of the present application is not particularly limited to this way.

(2) The user specifies a target subject, the target subject including one or more objects.

After the terminal enters the Hirschker mode, the target subject can be determined in response to the preset operation of the user on the preview interface. The preset operation is used to designate a certain object or objects as target objects. The preset operation may be a touch operation, a voice instruction operation, or a gesture operation, and the embodiment of the present application is not limited. For example, the touch operation may be a single click, a double click, a long press, a pressure press, an operation of circling an object, or the like.

For example, in the preview interface shown in (a) of fig. 16, after the terminal detects the user's operation of double-clicking the character 1 on the preview image, the character 1 is determined as the target subject as shown in (b) of fig. 16.

In other embodiments, the terminal may prompt the user to specify a target subject after entering the Hirschk mode. For example, referring to (a) in fig. 17, the terminal may display a prompt message: please designate the target subject so that the image size of the target subject is substantially unchanged during the photographing process. Then, the terminal responds to the preset operation of the user on the preview interface to determine the target subject. For example, when the terminal detects an operation of the user to circle the person 1 shown in fig. 17 (a), the corresponding person 1 is determined as the target subject as shown in fig. 17 (b). For another example, when the terminal detects an operation in which the user voice indicates that the person is the target subject, the terminal determines the person 1 as the target subject.

As another example, in the case where the target subject is a preset object type and the preset object type is a person, referring to (a) in fig. 18, the terminal may display a prompt message: is a person detected and is the person designated as the target subject so that the size of the image of the target subject is substantially unchanged during photographing? Then, in response to the user clicking the "yes" control, the terminal determines that the person is the target subject as shown in (b) in fig. 18.

In some embodiments, if the terminal defaults that the target subject includes only one object, when the user designates a plurality of objects as the target subject, the terminal may prompt the user to: please select only one object as the target subject.

Similar to the terminal automatically determining the target subject, the terminal may also prompt the target subject to the user by displaying prompt information or voice broadcasting or the like after determining the target subject in response to a preset operation of the user. Also, the terminal may also modify the target body in response to an operation by the user, such as switching, adding, or deleting the target body. And will not be described in detail herein.

The terminal acquires N +1 images in real time for the first scene, which means that the terminal acquires N +1 images for the first scene in the shooting process, instead of acquiring N +1 images for the first scene before shooting.

Optionally, as shown in fig. 19a, S301 may include the following steps S301 a-S301 d:

s301 a, the terminal adopts an initial camera to acquire a first image aiming at a first scene, wherein the first image comprises a target main body.

And S301 b, the terminal adopts the initial camera to acquire a second image aiming at the first scene, wherein the second image comprises a target main body.

And S301c, the terminal determines a camera for acquiring the (i + 1) th image based on the shooting magnification of the (i) th image. i is not less than 2, i is an integer. Specific implementations can be found in the following. The ith image includes a target subject.

S301 d: and the terminal acquires the (i + 1) th image by adopting the determined camera for acquiring the (i + 1) th image.

By analogy, the terminal can acquire N +1 images.

In one implementation, the N +1 images include N1+1 images captured before and N2 images captured after, where the N1+1 images are captured by a first camera of the terminal and the N2 images are captured by a second camera of the terminal; n1 and N2 are each an integer of 1 or more. That is to say, the technical solution provided by the embodiment of the present application can be applied to shooting a hessian kock zoom video in a scene where cameras are switched. Of course, in a specific implementation, the camera may be switched many times during shooting of a segment of the hek zoom video.

In this implementation, in combination with the above S301 a to S301d, it can be known that:

the zoom magnifications of the second image to the (N1) th image in the (N1 + 1) th image with respect to the first image all belong to the first photographing magnification range. The first shooting magnification range corresponds to the first camera.

The scaling magnifications of the N1+1 image of the N1+1 images and the first N2-1 image of the N2 images relative to the first image both belong to the second photographing magnification range. The second shooting magnification range corresponds to the second camera.

In another implementation manner, the N +1 images are shot and collected by the first camera. That is to say, the technical solution provided by the embodiment of the present application can be applied to shooting the hessian kock zoom video in a scene without switching the cameras.

Alternatively, as shown in fig. 19b, S301c may include the following steps: s301c-1 to S301c-3:

s301 c-1: and the terminal performs anti-shake processing on the ith image based on the first image. Specifically, the terminal determines the position of a feature point in the first image, and performs motion compensation on the position of the feature point matched with the feature point in the ith image based on the position of the feature point in the first image, so as to perform anti-shake processing on the ith image.

The anti-shake processing technology adopted by the terminal when executing the anti-shake processing in the embodiment of the present application is not limited, for example, the anti-shake processing technology may be an optical anti-shake processing technology, an Artificial Intelligence (AI) anti-shake processing technology, an electronic processing anti-shake technology, or the like.

It should be noted that S301c-1 is an optional step. After S301c-1 is performed for each of the N images, the video (i.e., the last N acquired images) that contributes to the zoom ratio calculation module (i.e., the module for calculating the zoom ratio in the terminal) is less jittered (i.e., the video is better stabilized/smoothed as a whole) as a whole, thereby making the accuracy of the obtained zoom ratio greater.

S301c-2, the terminal acquires the shooting magnification of the ith image. If the terminal performs S301c-1, the ith image here is specifically the ith image after the anti-shake processing.

Optionally, the shooting magnification of the ith image is determined based on the zoom magnification of the ith image relative to the first image and the magnification of a camera acquiring the first image. The scaling factor of the ith image relative to the first image is: based on the size of the target subject in the ith image and the size of the target subject in the first image.

For example, ci ═ c1/(di/d 1). Where di is the size of the target subject in the ith image and d1 is the size of the target subject in the first image. di/d1 is the zoom magnification of the ith image relative to the first image. c1 is the magnification of the first camera, and ci is the shooting magnification of the ith image.

Optionally, the size of the target subject in the image is characterized by at least one of the following features 1-4:

the method is characterized in that: the width of the target subject in the image.

And (2) feature: the height of the target subject in the image.

And (3) feature: the area of the target subject in the image.

And (4) feature: the number of pixel points occupied by the target subject in the image.

For example, taking feature 2 as an example of characterizing the size of the target subject in one image, the photographing magnification of the ith image may be obtained based on the formula ci ═ c1/(hi/h 1). Where hi is the height of the target subject in the ith image and h1 is the height of the target subject in the first image. hi/h1 is the zoom magnification of the ith image relative to the first image. c1 is the magnification of the first camera, and ci is the shooting magnification of the ith image.

Optionally, S301c-2 may include: the terminal extracts a target subject from the ith image and determines the scaling factor of the ith image relative to the first image based on the size of the target subject and the size of the target subject of the first image.

The embodiment of the application does not limit the specific implementation manner of extracting the target subject from the image by the terminal. For example, the terminal extracts the target subject from the image by one or more of a subject segmentation algorithm, a subject skeleton point detection algorithm, a subject contour detection algorithm, and the like.

Illustratively, the subject segmentation algorithm includes an instance segmentation algorithm. Specifically, the terminal extracts an instance segmentation mask of the target subject from the ith image by using an instance segmentation algorithm, and then divides the size of the instance segmentation mask of the target subject extracted from the ith image by the size of the instance segmentation mask of the target subject extracted from the first image to obtain the scaling factor of the ith image relative to the first image. Fig. 20 is a schematic diagram illustrating example segmentation provided in the embodiment of the present application. Note that a in fig. 20 represents an i-th image, b in fig. 20 represents an example division mask of a target subject in the i-th image, in which pixels having pixel values larger than 0 are pixels representing the target subject, and the other regions are pixels representing a background.

The example segmentation algorithm is a pixel-level segmentation method, and the accuracy of the target subject extracted based on the example segmentation algorithm is higher, which is helpful for making the zoom ratio calculated by the terminal more accurate. For example, in the case where the target subject includes a plurality of persons, it is also possible to effectively distinguish the subject portrait from the background.

S301c-3, if the shooting magnification of the ith image is in the first shooting magnification range, the terminal determines that the camera for collecting the (i + 1) th image is the first camera. And if the shooting magnification of the ith image is in the second shooting magnification range, determining that the camera acquiring the (i + 1) th image is the second camera. The first shooting magnification range corresponds to the first camera, and the second shooting magnification range corresponds to the second camera.

That is, the terminal determines a camera which acquires the (i + 1) th image based on the shooting magnification of the (i) th image. Since the difference between the size of the target subject and the position of the target subject in the two adjacent images is not too large, the terminal can acquire the (i + 1) th image based on the camera determined by the shooting magnification of the ith image.

Optionally, the magnification of the first camera is a, and the magnification of the second camera is b; a is less than b; the first shooting magnification range is [ a, b ], and the second shooting magnification range is a range equal to or greater than b.

For example, taking a terminal including a wide-angle camera and a main camera as an example, since the magnification of the wide-angle camera is 0.6 and the magnification of the main camera is equal to 1, the shooting magnification range corresponding to the wide-angle camera is [0.6,1 ], and the shooting magnification range corresponding to the main camera is a range greater than or equal to 1.

Further optionally, if the terminal further includes a third camera, and the magnification of the third camera is c, and a < b < c, the first shooting magnification range is [ a, b ], and the second shooting magnification range is [ b, c); the shooting magnification range corresponding to the third camera is a range larger than or equal to c.

For example, taking a terminal including a wide camera, a main camera and a telephoto camera as an example, since the magnification of the wide camera is 0.6, the magnification of the main camera is 1, the magnification of the telephoto camera is w, and w is an integer greater than 1, the shooting magnification range corresponding to the wide camera is [0.6,1 ], the shooting magnification range corresponding to the main camera is [1, w ], and the shooting magnification range corresponding to the telephoto camera is a range greater than or equal to w.

Based on this, in example 1, if the first image is captured by the wide-angle camera and the zoom magnification of the ith image with respect to the first image is calculated to be 0.5, the shooting magnification of the ith image is 0.6/0.5 — 1.2. Assuming that the magnification of the telephoto camera is 10 (i.e., w is 10), since the photographing magnification (i.e., 1.2) of the ith image is within the photographing magnification range (i.e., [1, 10)) corresponding to the main camera, the terminal determines to capture the (i + 1) th image using the main camera.

For another example, taking the terminal as an example including the wide-angle camera, the main camera, the first telephoto camera, and the second telephoto camera, since the magnification of the wide-angle camera is 0.6, the magnification of the main camera is 1, the magnification of the first telephoto camera is w1, the magnification of the second telephoto camera is w2, and 1 < w1 < w2, the shooting magnification range corresponding to the wide-angle camera is [0.6,1), the shooting magnification range corresponding to the main camera is [1, w1 ], the shooting magnification range corresponding to the first telephoto camera is [ w1, w2 ], and the shooting magnification range corresponding to the second telephoto camera is a range not less than w 2.

Based on this, in example 2, if the first image is captured by the wide-angle camera and the zoom magnification of the ith image with respect to the first image is calculated to be 0.2, the shooting magnification of the ith image is 0.6/0.2 — 3. Assuming that the magnification of the first tele camera is 2 (i.e., w1 is 2), and the magnification of the second tele camera is 10 (i.e., w2 is 10), since the photographing magnification (i.e., 3) of the ith image is within the photographing magnification range (i.e., [2, 10)) corresponding to the first tele camera, the terminal determines to acquire the (i + 1) th image using the first tele camera.

It should be noted that, in a specific implementation, the terminal may switch to a certain camera when the shooting magnification of the ith image is within a small range before the shooting magnification range corresponding to the camera reaches the minimum critical value, so as to reduce the problem of shooting delay caused by switching cameras.

As shown in fig. 21, a schematic view of a process of acquiring N +1 images for a terminal provided in the embodiment of the present application is shown. The terminal comprises a camera 1 to a camera x, wherein x is an integer greater than or equal to 2, and the multiplying power of the camera with the larger number is larger.

Based on fig. 21, the process of acquiring N +1 images by the terminal may include the following steps:

the terminal uses camera 1 (i.e. the initial camera) to capture the 1 st image.

The terminal acquires a 2 nd image by using the camera 1, performs anti-shake processing on the 2 nd image based on the 1 st image, and then obtains the scaling factor of the anti-shake processed 2 nd image relative to the 1 st image. And if the shooting magnification of the 2 nd image is determined to be in the shooting magnification range corresponding to the camera a based on the zooming magnification, acquiring the 3 rd image by using the camera a. Wherein a is more than or equal to 1 and less than or equal to x.

The terminal uses the camera a to collect the 3 rd image, carries out anti-shake processing on the 3 rd image, and then obtains the scaling factor of the 3 rd image which is subjected to anti-shake processing relative to the 1 st image. And if the shooting magnification of the 3 rd image is determined to be in the shooting magnification range corresponding to the camera b based on the scaling magnification, acquiring the 4 th image by using the camera b. Wherein a is more than or equal to b is more than or equal to x.

By analogy, the terminal acquires the 4 th to the N +1 th images.

Optionally, the size of the target subject in the later acquired image in the N +1 images is smaller than the size of the target subject in the first image.

In one implementation, the size of the target subject in the later captured image is smaller than the size of the target subject in the earlier captured image in the N +1 images. As shown in fig. 22a, a schematic diagram of an image acquired by a terminal in S301 according to an embodiment of the present application is provided. In fig. 22a, a figure shows a first image acquired by the terminal, b figure shows a second image acquired by the terminal, and c figure shows a third image acquired by the terminal.

In another implementation, the size of the target subject in the later captured image of the N +1 images may be larger than the size of the target subject in the earlier captured image, but smaller than the size of the target subject in the first image. As shown in fig. 22b, a schematic diagram of an image acquired by a terminal in S301 according to an embodiment of the present application is provided. In fig. 22b, N +1 is 3 as an example. Fig. 22b shows a diagram a of the first image acquired by the terminal, b of the second image acquired by the terminal, and c of the third image acquired by the terminal.

As will be explained below by way of an example, the size of the target subject in the later captured image of the N +1 images may be larger than the size of the target subject in the earlier captured image, but smaller than the size of the target subject in the first image.

Assuming that the size of the target subject in the first image is d and the camera acquiring the first image is a 1X camera, since the first image and the second image are acquired by using 1X cameras, taking the size of the target subject in the second image as d/2 as an example, the zoom magnification of the second image with respect to the first image is 0.5, and thus the shooting magnification of the second image is 1/0.5-2. And the subsequent terminal can adopt a 2X camera to acquire a third image. On the one hand, since the terminal acquires the third image using the 2X camera and the second image using the 1X camera, it is possible that the size of the target subject in the third image is larger than the size of the target subject in the second image. On the other hand, since the terminal is further away from the target subject during the acquisition of the second to third images, the size of the target subject in the third image is smaller than that in the first image.

Optionally, the method may further include: and the terminal displays first information in the current preview interface, wherein the first information is used for indicating that the shooting of the Hockey zoom video is stopped.

For example, the terminal may display the first information in the current preview interface when the currently used camera is the camera with the maximum magnification in the terminal. It is possible for the user to stop shooting the heuchek zoom video for a period of time after the first information is acquired.

The method includes the steps that a Hokka zoom video is shot under the condition that the distance between a terminal and a target body is farther and farther, if a camera used at present is the camera with the maximum magnification in the terminal, the magnification of the current camera cannot be larger, so that the terminal cannot switch the camera, at the moment, a user is prompted to stop shooting the video in time by displaying first information on a current preview interface, otherwise, a subsequently acquired image of the terminal can only be amplified, so that the size of the target body in the amplified image is consistent with that of the target body in a first image, and the definition of a target image generated based on the subsequently acquired image is not high. That is, the present embodiment provides a method of guiding a user to stop shooting a heucher zoom video, which helps to improve the user experience.

The embodiment of the application does not limit what information the first information specifically contains to instruct to stop shooting the heucher zoom video. For example, it may directly indicate that "the currently used camera is the camera with the maximum magnification in the terminal", or indirectly indicate that the currently used camera is the camera with the maximum magnification in the terminal by indicating "please stop recording video". Fig. 23a is a schematic diagram of a current preview interface provided in the embodiment of the present application. The current preview interface includes an image 501 of the currently playing scholar kock zoom video (i.e., the current preview image), and a first message "please stop recording video" 502.

Optionally, the method may further include: and the terminal displays second information in the current preview interface, wherein the second information is used for indicating that the target main body is static.

For example, the terminal may display the second information in the current preview interface if the position of the target body in the current preview image is determined to coincide with the position of the target body in the previous preview image. One of the requirements of the Hirschk zoom video is that the positions of the target main bodies in the images are consistent, so that a user can know whether the requirement for acquiring the Hirschk zoom video is met currently in the process of acquiring the Hirschk zoom video, and the user experience is improved.

The embodiment of the application specifically includes which information the first information includes to indicate that the target subject is still without limitation. For example, as shown in fig. 23b, a schematic diagram of a current preview interface provided in the embodiment of the present application is provided. The current preview interface contains an image 501 of the currently presented heucher zoom video (i.e., the current preview image), and the second information "target subject still" 503.

Optionally, the method may further include: and the terminal displays third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image. In this way, the user can move the terminal based on when the third information is not displayed in the terminal, so that the target subject is in the center of the current preview image, which contributes to improving the quality of the Hirschk video.

For example, the terminal may display the third information in the current preview interface when detecting that the position of the target subject in the current preview image (e.g., the center of the target subject, or the outline of the target subject or the minimum external rectangle of the target subject, etc.) is in a preset central area of the current preview image (i.e., a preset area centered on the center of the current preview image).

The embodiment of the present application specifically indicates which information the third information includes, but is not limited to, indicating that the target subject is in the center of the current preview image. For example, as shown in fig. 23c, a schematic diagram of a current preview interface provided in the embodiment of the present application is shown. The current preview interface contains an image 501 of the currently presented heucher zoom video (i.e., the current preview image), and a third message "target subject is in the center of the current preview image" 504.

Alternatively, the terminal may display fourth information indicating that the target subject is not in the center of the current preview image in the current preview interface. In this way, the user can move the terminal based on the fourth information being displayed in the terminal so that the target subject is in the center of the current preview image, which helps to improve the quality of the hucko video.

Optionally, acquiring N +1 images in real time for the first scene includes: the first image is acquired when the target subject is in the center of the current preview image. This helps to improve the quality of the Hirschk video.

Optionally, in the process of acquiring the N +1 image, the moving speed of the terminal is less than or equal to the preset speed.

Due to the fact that the number of the cameras in the terminal is limited, the camera switching speed can be too high due to the fact that the terminal moves too fast, and when the camera with the maximum magnification is switched, the camera cannot be switched any more. When an image is acquired by using a camera with the maximum magnification, as the terminal is farther away from a target subject, the target subject in the later acquired image is smaller, and when a Hirschk zoom video is generated, the images need to be amplified, which may result in low definition of the image. Based on this, this possible design is proposed. This helps to improve the quality of the Hirschk zoom video.

The specific value of the preset speed is not limited in the embodiment of the application, and may be an empirical value, for example.

S302: and for the later acquired N images in the N +1 images, the terminal carries out white balance processing based on a preset neural network to obtain N optimized images. The preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain.

The preset neural network may be the white balance network provided by the embodiment of the present application, such as the white balance network shown in fig. 8. The terminal may download the preset neural network in the network device, or may obtain the preset neural network through local training. This is not limited in the embodiments of the present application.

Optionally, because there are some differences between images captured by different cameras, the terminal may correct parameters such as luminance and chromaticity of each of the N images acquired later in the N +1 images, in addition to performing white balance processing, so as to avoid (or avoid as much as possible) the problem of image inconsistency caused by switching cameras.

For example, for image chromaticity and brightness, luminance/chromaticity values of temporally adjacent images are respectively obtained, and a multiplicative factor or an additive factor of the luminance/chromaticity values is obtained to convert the luminance/chromaticity of the image so that the luminance/chromaticity values of a subsequent image are converted to be close to those of a previous image, thereby keeping the luminance/chromaticity values of the temporally adjacent images consistent.

S303: the terminal amplifies and cuts the N optimized images to obtain N target images; the size of a target subject in the N target images is consistent with the size of the target subject in a first image acquired from the N +1 images, and the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image; the N target images are of the same size as the first image.

As can be seen from the above description, the size of the target subject in the later captured image is smaller than that in the first image, and therefore, the terminal needs to enlarge and crop the corresponding N optimized images of the later captured N images.

And the terminal enlarges the N optimized images to ensure that the size of the target subject in the enlarged N optimized images is consistent with that of the target subject in the first image, and the relative position of the target subject in the enlarged N optimized images is consistent with that of the target subject in the first image. And the terminal cuts the amplified N images to obtain N target images, so that each image in the N target images is consistent with the first image in size.

In one example, as shown in fig. 24. Fig. 24 a represents the first image of the N +1 images, and fig. 24 b represents one of the N optimized images. Fig. 24 c is a diagram showing an enlarged image of the optimized image shown in fig. 24 b, in which the size of the target subject in the enlarged image matches the size of the target subject in the first image shown in fig. 24 a. Fig. 24 d shows an object image obtained by cropping the image shown in fig. 24 c, and the size of the object image is the same as the size of the first image shown in fig. 24 a, wherein the position of the object body in the object image is ensured to be consistent with the position of the object body in the first image as much as possible during the cropping process.

It should be noted that, since the "distance between the terminal and the target subject" is different from the "distance between the terminal and the object in the background", in acquiring two different images of the N +1 images, the zoom ratio of the target subject in the two images is different from the zoom ratio of the same object in the background in the two images. This causes the background in the target image obtained after the enlarged cropping of one optimized image to be different from the background in the first image. For example, the background in the a and d plots in fig. 24 is different. Based on this, the terminal can generate a hessian zoom video of "the size of the target subject in different images is uniform, the relative position is uniform, and the background is not uniform" based on the N target images and the first image.

S304: the terminal generates a Hirschk zoom video based on the N target images and the first image.

The playing time interval between adjacent images in a heuchock zoom video may be predefined.

In one implementation, the terminal presents the Hirschhook zoom video in real-time during the shooting process. That is, the generated heuchock zoom video is presented while the heuchock zoom video is generated.

In this case, when the terminal executes the above S302 and S303: for the ith image, after S301d is executed, the ith image is input into a preset neural network, and an optimized image corresponding to the ith image is obtained. And amplifying and cutting the optimized image corresponding to the ith image to obtain a target image corresponding to the optimized image. That is, after an image is captured, the image may be white balanced, enlarged, cropped, rendered (i.e., the target image is displayed), and so forth. And, in the process of processing the image, a next image may be acquired and processed. Instead of performing S302 after obtaining N +1 images, and performing S303 after performing S302 for all images.

In another implementation, the terminal may perform S302 after obtaining N +1 images, and perform S303 after performing S302 for all images. That is, the terminal performs post-processing on the acquired N +1 images, thereby obtaining a heucher zoom video.

It should be noted that, if the camera that acquires the image is switched during the process of acquiring the N +1 images, the target subject may have different positions in different images acquired before and after the camera is switched due to the camera being switched, that is, the target subject may be unstable.

Based on this, optionally, the method may further include: the terminal acquires the position information of a target body in a first image and the position information of the target body in each target image in N target images; then, for each target image in the N target images, image stabilization processing is carried out on the target image based on a main body image stabilization algorithm and the position information of the target main body in the first image, and a new target image is obtained. Wherein the position of the target subject in the new target image coincides with the position of the target subject in the first image.

For example, if the mask of the target subject is obtained by the example segmentation algorithm, the position information of the target subject in the corresponding image may be obtained based on the mask of the target subject. The feature point detection is carried out on the target body position area by combining the position information of the target body, so that the influence of the feature point outside the target body area can be effectively eliminated. By performing image stabilization processing on the feature points of the target main body region, a new target image can be obtained.

The embodiment of the present application does not limit the subject image stabilization algorithm. For example, an AI image stabilization algorithm may be used.

Based on this, in S304, the terminal generates the heucher zoom video based on the N target images and the first image, and specifically includes: the terminal generates a Hirschk zoom video based on the N new target images and the first image. In this way, the main body of the resultant heuchek zoom video can be stabilized.

In the method for shooting the video, provided by the embodiment of the application, in the process of obtaining the xike zoom video, white balance processing is performed on the last N images in the N +1 images collected in real time, so that the white balance of the processed images is consistent with that of the first image in the N +1 images collected. Therefore, the white balance effect of the obtained Hirschk zoom video is better, the quality of the Hirschk zoom video is improved, and the user experience is improved. In addition, in the embodiment of the application, the terminal can amplify the size of the target main body in the subsequently acquired image in a camera switching mode, and compared with the traditional technology, the method is beneficial to enabling the definition of the obtained Hirschhook zoom effect video to be higher, and therefore the user experience is improved.

Fig. 25 is a schematic flowchart of another method for capturing video according to an embodiment of the present disclosure. The method is applied to the terminal. The terminal comprises at least two cameras. The technical scheme provided by the embodiment is applied to the scene of shooting the heucher zoom video from far to near, namely, the distance between the terminal and the target body is closer and closer in the process of shooting the heucher zoom video.

In this embodiment, a heucheck zoom video is shot under the condition that the terminal is closer to the target subject, and when the camera is not switched, the size of the target subject in the later acquired image is larger than that in the earlier acquired image. However, the sizes of the target subjects in different images in the heksuke zoom video are consistent, and therefore, the images acquired later need to be reduced. Because the image acquired later is reduced, the image needs to be subjected to edge complementing processing, so that the size of the image subjected to edge complementing is consistent with that of the first image acquired by the terminal, which can cause the image subjected to edge complementing to have a black edge, and thus causes poor user experience when the image is presented.

For example, a first image acquired by the terminal is shown as an a diagram in fig. 26, a second image is shown as a b diagram in fig. 26, an image obtained by reducing the second image is shown as a c diagram in fig. 26, and an image obtained by "edge-filling" the c diagram in fig. 26 is shown as a d diagram in fig. 26.

Based on this, the basic principle of the technical scheme provided by this embodiment is as follows: the terminal can adopt the camera that the multiplying power that uses is less than when gathering preceding image to gather the image in back, realizes reducing the target subject through the camera that switches into less multiplying power promptly, like this, need not carry out "mend limit" to the image after gathering to improve user experience.

The method shown in fig. 25 may include the steps of:

s400: the terminal determines to shoot the Hooke zoom video from far to near and determines an initial camera.

Alternatively, the terminal may determine to take the heucher zoom video from far to near at the direction of the user.

For example, based on the user interface shown in a diagram in fig. 11, the user may click the "near-far mode" 402 control through a touch operation, and in response to the touch operation, the terminal highlights the "near-far mode" 402 control and simultaneously starts to initiate capturing of the heucher zoom video in the near-far mode.

It should be noted that the terminal may also be activated in other manners (e.g., a voice command manner, etc.) to capture the heuchker zoom video from the far-near mode, which is not specifically limited in this embodiment of the application.

Since in this embodiment, if the terminal switches the cameras, the camera is switched to a camera with a smaller magnification, so the magnification of the initial camera is not usually the camera with the smallest magnification in the terminal, and it can be usually predefined that the camera is the camera with the larger magnification (e.g. the largest) in the terminal.

S401: the terminal collects N +1 images aiming at a first scene, wherein the N +1 images all comprise a target main body. Wherein, in the process of acquiring the N +1 images, the terminal is closer to the target main body. N is an integer of 1 or more. The first image in the N +1 images is acquired by a first camera of the terminal, part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, and the multiplying power of the second camera is smaller than that of the first camera. The size of the target subject in the later acquired N images of the N +1 images is less than or equal to the size of the target subject in the first image acquired in the N +1 images.

That is, in the present embodiment, during the process of capturing an image by the terminal, the camera with the large magnification is switched to the camera with the small magnification, which helps to make the size of the target subject in the later captured image smaller than or equal to the size of the target subject in the earlier captured image in the scene where the terminal is closer to the target subject.

In one implementation, the N +1 images are N +1 images acquired continuously, i.e., N +1 images acquired in real-time.

Optionally, the last N images in the N +1 images include N1 images captured before and N2 images captured after, where N1 images are captured by the second camera and N2 images are captured by the third camera of the terminal; n1 and N2 are each an integer of 1 or more.

Optionally, acquiring N +1 images for the first scene includes: acquiring the shooting magnification of the ith image in the N +1 images; wherein i is not less than 2 and not more than N, i is an integer; acquiring an (i + 1) th image in the (N + 1) th images aiming at the first scene based on the second camera if the shooting magnification of the ith image is within the first shooting magnification range; and if the shooting magnification of the ith image is within the second shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images aiming at the first scene based on a third camera of the terminal. Wherein the multiplying power of the second camera is b, and the multiplying power of the third camera is c; b is more than c; the first shooting magnification range is a range greater than or equal to b; the second photographing magnification range is [ c, b). The explanation and examples of the relevant content in this alternative implementation are based on example reasoning referred to above and will not be described here again.

Optionally, the shooting magnification of the first image is larger than that of the second camera. That is, the initial shooting magnification of the terminal is larger than the magnification of the camera used when the second image is acquired. For example, the terminal includes a 5X camera and a 1X camera, and the camera used for acquiring the first image may be the 5X camera, and the shooting magnification at this time may be a range greater than 5, or [1, 5 ]. The camera used to capture the second image is a 1X camera.

Optionally, the method may further include: in the current preview interface, first information is displayed, and the first information is used for indicating that the shooting of the Hirschhorn zoom video is stopped.

For example, the terminal may display the first information in the current preview interface when the currently used camera is the camera with the smallest magnification in the terminal. It is possible for the user to stop shooting the heuchek zoom video for a period of time after the first information is acquired.

The method includes the steps that a Hokka zoom video in the Hikka zoom video is shot under the condition that the distance between a terminal and a target body is closer and closer, if a camera used at present is the camera with the minimum magnification in the terminal, the magnification of the current camera cannot be smaller, so that the terminal cannot switch the camera, at the moment, a user is prompted to stop shooting the video in time by displaying first information on a current preview interface, and otherwise, images acquired subsequently by the terminal need to be reduced and edge compensated, so that the user experience in the process of playing the Hokka zoom video is reduced. That is, the present embodiment provides a method of guiding a user to stop shooting a heucher zoom video, which helps to improve the user experience.

Optionally, the method may further include: and displaying second information in the current preview interface, wherein the second information is used for indicating that the target main body is static. Optionally, the method may further include: and displaying third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image. Optionally, acquiring N +1 images for the first scene includes: the first image is acquired when the target subject is in the center of the current preview image. For specific implementation and examples thereof, reference may be made to the above description, which is not repeated herein.

In one possible design, the moving speed of the terminal is less than or equal to a preset speed. Due to the limited number of cameras in the terminal, the speed of camera switching may be too fast due to the fact that the terminal moves too fast, and when the camera with the minimum magnification is switched, the camera cannot be switched any more. When an image is captured using a camera of the minimum magnification, as the terminal is closer to the target subject, the target subject in the post-captured image becomes larger, which may cause the size of the target subject in the post-captured image to be larger than the size of the target subject in the first image of the above-mentioned N +1 images. When generating a heuchock zoom video, these images need to be scaled down and edge-filled, thereby degrading the user experience. Based on this, this possible design is proposed. This helps to improve the quality of the high-altitude kock zoom video.

S402: and for the later acquired N images in the N +1 images, the terminal carries out white balance processing based on a preset neural network to obtain N optimized images. The preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain.

S403: and the terminal amplifies and cuts the N optimized images to obtain N target images. Wherein the size of the target subject in the N target images is consistent with the size of the target subject in the first image acquired in the N +1 images. The relative position of the target subject in the N target images coincides with the relative position of the target subject in the first image. The N target images are of the same size as the first image.

S404: the terminal generates a Hirschk zoom video based on the N target images and the first image.

For specific implementation of S402-S404, reference may be made to the above description of S302-S304, and details are not repeated here.

In the method for shooting the video, the size of the target body in the later collected image is smaller than or equal to the size of the target body in the previous collected image by switching to the camera with smaller magnification in the scene that the terminal is closer to the target body, so that the Hirschhorn zoom video is obtained based on the collected images. Then, the last N images of the acquired N +1 images are subjected to white balance processing so that the processed images match the white balance of the first image of the acquired N +1 images. Therefore, the white balance effect of the obtained Hirschk zoom video is better, the quality of the Hirschk zoom video is improved, and the user experience is improved. In addition, in the embodiment of the application, the terminal can reduce the size of the target main body in the subsequently acquired image by switching the camera, and compared with the traditional technology, the acquired image does not need to be subjected to edge repairing processing, so that the user experience can be improved.

In addition, the method for shooting the Hirschhorn zoom video can be applied to scenes that the terminal is closer to the target main body. The method may comprise the steps of:

step 1: reference may be made to S400 described above.

Step 2: the terminal collects N +1 images in real time aiming at a first scene, wherein the N +1 images all comprise a target main body. Wherein, in the process of acquiring the N +1 images, the terminal is closer to the target main body. N is an integer of 1 or more. The first image in the N +1 images is acquired by a first camera of the terminal, part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, and the multiplying power of the second camera is smaller than that of the first camera.

In some examples, reference may be made to the above description of S301 for specific implementation of this step, and details are not described here.

In other examples, based on the scheme (as shown in fig. 19 a) of determining the camera used for acquiring the images in S301, the cameras used for acquiring the first image and the second image of the N +1 images are the same. In this embodiment, in the process of shooting the kock zoom video in the hek area, the terminal is closer to the target subject, and the size of the target subject in the later captured image is larger than that of the target subject in the earlier captured image without switching the camera. Thus, determining the camera used to capture the image using the method shown in FIG. 19a results in the size of the target subject in the second image being larger than the size of the target subject in the first image.

In this regard, in a solution of the embodiment of the present application, a camera used by the terminal to acquire the second image is different from a camera used to acquire the first image. And the magnification of the camera used for acquiring the second image is smaller than the magnification of the camera used for acquiring the first image, which helps to realize that the size of the target image in the second image is smaller than the size of the target subject image in the first image.

And step 3: reference may be made to S402 described above.

And 4, step 4: and the terminal amplifies and cuts the optimized image meeting the first condition in the N optimized images to obtain at least one target image. Wherein the optimized image satisfying the first condition is an optimized image including a target subject whose size is smaller than that of the target image in the first image. The size of the target subject in the at least one target image is consistent with the size of the target subject in the first image captured in the N +1 images. The relative position of the target subject in the at least one target image coincides with the pre-aligned position of the target subject in the first image. The at least one target image is of a size consistent with the first image.

Regarding the processing manner of the optimized image satisfying the first condition, reference may be made to the relevant description in S303, and details are not repeated here.

And 5: the terminal generates a Hirschk zoom video based on the at least one target image and the first image.

For example, the terminal generates the heuchock zoom video based on the at least one target image and the images of the first image and the N optimized images that do not satisfy the first condition.

Due to the technical solution provided by this embodiment, the size of the target subject in the post-captured image may be larger than, equal to, or smaller than the size of the target subject in the first image. Therefore, the present embodiment distinguishes the optimized image satisfying the first condition from the optimized image not satisfying the first condition. For an optimized image that does not satisfy the first condition, it can be directly taken as one image in the heucher zoom video without performing the zoom-in cropping.

For a specific implementation manner of step 5, reference may also be made to the relevant description in S304.

Optionally, the size of the target subject in the N +1 th image acquired in the N +1 th image is smaller than or equal to the size of the target subject in the first image.

Optionally, the size of the target subject in the (N + 1) th image is larger than the size of the target subject in the first image, and a difference between the size of the target subject in the (N + 1) th image and the size of the target subject in the first image is smaller than or equal to a preset threshold. That is, when the size of the target subject in the N +1 th image is larger than that in the first image, the difference therebetween cannot be so large that "the sizes of the target subjects in different images of the heucheck zoom video are uniform" can be satisfied when the heucheck zoom video is generated based on the N +1 th image.

It should be noted that, in the actual implementation, in the process of acquiring an image in real time, if the degree of enlargement of the target subject in the acquired image due to the fact that the terminal is close to the target subject is greater than the degree of reduction of the target subject in the acquired image due to the fact that the terminal switches the current camera to a camera with a smaller magnification (or does not switch the camera), the size of the target subject in the later acquired image may be greater than the size of the target subject in the first image.

For this purpose:

in a solution of the embodiment of the present application, if a difference between a size of a target subject in an image currently captured by a terminal and a size of the target subject in a first image is greater than a preset threshold, and the size of the target subject in the currently captured image is greater than the size of the target subject in the first image, the capturing of the image is stopped. Accordingly, the terminal generates the Hirschk zoom video using the images acquired before that.

Since one of the requirements of the heucheck zoom video is that the sizes of the target subjects in different images are consistent, "the difference between the size of the target subject in the image currently captured by the terminal and the size of the target subject in the first image is greater than the preset threshold, and the size of the target subject in the image currently captured is greater than the size of the target subject in the first image", which explains that: based on the image currently acquired by the terminal, the condition that the sizes of target bodies in different images in the kirk zoom video are consistent cannot be met, and therefore, the acquisition of the image is stopped.

Based on this, optionally, the method may further include: acquiring an (N + 2) th image aiming at a first scene, wherein the (N + 2) th image comprises a target subject; the distance between the terminal and the target main body when the N +2 images are collected is smaller than the distance between the terminal and the target main body when the N +1 images are collected. In this case, the step 4 includes: and when the difference value between the size of the target subject in the (N + 2) th image and the size of the target subject in the first image is larger than a preset threshold value, and the size of the target subject in the (N + 2) th image is larger than the size of the target subject in the first image, generating the Hirschk zoom video based on the N target images and the first image.

Optionally, when stopping capturing the image based on the above scheme, the terminal may output first information, where the first information is used to instruct to stop shooting the heuchek zoom video. That is, the embodiment of the present application provides a method for guiding a user to stop shooting a kock zoom video, so that the user can stop the mobile terminal based on the image, thereby improving user experience.

The embodiment of the present application does not limit the specific implementation manner of the first information, and for example, the first information may be output in the form of an image, a text, a voice, and the like. In this embodiment, the terminal may display the current preview interface as shown in fig. 23a when determining to stop capturing the image based on the above scheme, so as to prompt the user to stop shooting the kock zoom video in the xitake area.

In yet another solution of the embodiment of the present application, N +1 images are acquired by a camera with the minimum magnification in the terminal. And when the size of the target subject in the (N + 1) th image is larger than that of the target subject in the first image, and the difference value between the size of the target subject in the (N + 1) th image and the size of the target subject in the first image is equal to a preset threshold value, outputting first information, wherein the first information is used for indicating that the shooting of the Hirschk zoom video is stopped.

On one hand, since the N +1 images are acquired by the camera with the minimum magnification in the terminal, the camera cannot be switched any more if the images are continuously acquired subsequently. On the other hand, the heuchker zoom video requires the size of the target subject to be consistent from image to image. And when the size of the target main body in the (N + 1) th image is larger than that of the target main body in the first image, and the difference value between the size of the target main body in the (N + 1) th image and that of the target main body in the first image is equal to a preset threshold value, it is described that the jumping degree of the size of the target main body in the (N + 1) th image compared with that of the target main body in the first image reaches a critical value for obtaining the Hirscoke zoom video. In view of this, the present embodiment provides the above-described method of stopping capturing the heucher zoom video.

In another solution of the embodiment of the present application, the optimized image satisfying the first condition in S403 may be replaced with: the size of the included target subject is smaller than the optimized image of the size of the target image in the reference image. The reference image is an image "closest to the image before the image corresponding to the optimized image and including a target subject having a size larger than or equal to the size of the target subject included in the first image" among the N +1 images.

For example, taking a terminal including 0.6X, 1X, 2X, 5X and 10X cameras as an example, the shooting magnification ranges corresponding to these cameras are respectively: [0.6,1), [1,2), [2,5), [5,10), and a range of 10 or more.

The 1 st image is acquired with a 10X camera, where the size of the target subject is d.

The 2 nd image was acquired with a 5X camera, where the size of the target subject was 0.8 d. From this, it can be seen that the imaging magnification of the 2 nd image is: 5/(0.8d/d) — 6.25, 6.25 ∈ [5,10), so the camera that acquired the 3 rd image is a 5X camera.

A 3 rd image was acquired with a 5X camera, where the size of the target subject was 1.5 d. From this, it is understood that the photographing magnification of the 3 rd image is: 5/(1.5d/d) — 3.33, 3.33 ∈ [2,5), so the camera that acquired the 4 th image is a 2X camera.

The 4 th image was acquired with a 2X camera, where the size of the target subject was 0.8 d. From this, it is understood that the shooting magnification of the 4 th image is: 2/(0.8d/d) — 2.5, 2.5 ∈ [2,5), so the camera that acquired the 5 th image is a 2X camera.

……

Based on this example, the optimized images satisfying the first condition are the optimized image corresponding to the 2 nd image and the optimized image corresponding to the 4 th image. For the optimized image of the 4 th image, when the optimized image is subjected to the enlarging and cutting, the reference image is the 3 rd image. For the optimized image of the 2 nd image, when the optimized image is subjected to the enlarging and cutting, the reference image is the 1 st image.

Fig. 27 is a schematic flowchart of a method for capturing a video according to an embodiment of the present disclosure. The method shown in fig. 27 is applied to a terminal including at least two cameras whose magnifications are different. The method shown in fig. 27 may include the steps of:

s500: the terminal respectively collects at least two images aiming at a first scene at a first time through at least two cameras; wherein, a camera corresponds an image, and all include the target subject in at least two images.

That is to say, in the embodiment of the present application, the image acquired for the video to be captured is an image acquired by a plurality of cameras at the same time for the same scene.

S501: the method comprises the steps that a terminal determines the number N of frames of images to be inserted between a first image and a second image in at least two images based on the preset playing duration and the preset playing frame rate of a video; the first image is an image collected by a first camera in the at least two images, and the first camera is a camera with the largest magnification in the at least two cameras. The second image is an image collected by a second camera in the at least two camera images, and the second camera is the camera with the smallest magnification in the at least two cameras. N is an integer of 1 or more.

This is in view of: the technical scheme is provided based on that in images acquired by different cameras at the same time and aiming at the same scene, the size of a target body in the image acquired by the camera with the maximum magnification is the largest, and the size of the target body in the image acquired by the camera with the minimum magnification is the smallest.

S502: the terminal determines N images to be inserted based on the frame number N of the images to be inserted and partial or all images in the at least two images. Wherein the part or all of the images comprise at least a first image and a second image.

During specific implementation, the terminal extracts the size of the target main body in each image of the acquired part or all of the images, and then determines the value of the pixel of the corresponding image to be inserted based on the size of the target main body in the corresponding image. Specific examples may refer to the example shown in fig. 28.

The more images in the at least two images are determined by the terminal to be inserted, which is more beneficial to improving the accuracy of frame insertion, so that the images in the finally generated video can reflect a real scene more, and further the user experience is improved.

S503: and the terminal generates a video based on the at least two images and the N images to be inserted. Wherein the size of the target subject in each image of the video is gradually increased or decreased.

In one example, a 10X camera, a 3X camera, a 1X camera, and a 0.6X camera are provided in the terminal.

When S500 is executed, the terminal acquires images based on the 4 cameras respectively at the same time, and obtains images 1 to 4, as shown in fig. 28. In fig. 28, the diagrams a to d respectively show the images 1 to 4.

When S501 is executed, it is assumed that the preset playing time of the video to be captured is n seconds, n is an integer greater than or equal to 1, and the preset playing frame rate is 24 frames/second, that is, 24 frames of images are played in total per second, and then the total number of images required by the video is n × 24. Therefore, the zoom ratio between two adjacent frames in the video to be shot is obtained

Taking n as 1 as an example, the zoom ratio between two adjacent frames in the video to be shot is

When S502 is executed, the terminal may perform the following steps:

firstly, the terminal determines a reference image, and determines the shooting magnification of an image to be inserted based on the zoom magnification between the reference image and two adjacent frames in a video to be shot, wherein the number of the images to be inserted is N. The reference image may be the image captured by the camera with the maximum magnification (i.e., image 1) or the image captured by the camera with the minimum magnification (i.e., image 4). Taking the reference image as image 4 as an example, the shooting magnifications of the N images to be inserted are respectively: 0.6 × 1.124 ═ 0.6744,0.6744 × 1.124 ═ 0.758,0.758 × 1.124 ═ 0.852, … …

Secondly, the terminal inserts frames based on the shooting magnification of any frame image to be inserted and the values of the pixels in the two images of which the shooting magnification is larger than the shooting magnification of the frame image to be inserted and smaller than the shooting magnification of the frame image to be inserted, so as to obtain the values of the pixels in the frame image to be inserted. By analogy, the terminal can obtain N images to be inserted.

It can be understood that, before the frame interpolation is performed, the terminal needs to perform target subject detection on the two images, so as to obtain the size of the target subject. Schematic diagrams illustrating target subject detection are shown as e-h diagrams in fig. 28, in which portions in rectangular boxes represent target subjects. Also, the step of frame interpolation is illustrated in fig. 28.

Optionally, the two images are images with shooting magnifications greater than that of the frame image to be inserted and with the smallest difference from the shooting magnification of the frame image to be inserted, and the images with shooting magnifications less than that of the frame image to be inserted and with the smallest difference from the shooting magnification of the frame image to be inserted.

For example, for an image to be inserted with a shooting magnification of 0.6-1, frame insertion is performed using image 4 and image 3. And for the image to be inserted with the shooting magnification of 1-3, adopting the image 3 and the image 2 to perform frame insertion. And for the image to be inserted with the shooting magnification of 3-10, adopting the image 2 and the image 1 to perform frame insertion.

And finally, the terminal generates a video (or a dynamic graph) according to the sequence from large to small or from small to large of the size of the target main body to be inserted into the images 1-4. The steps for generating the video are illustrated in fig. 28.

In the conventional technology, the same camera is usually adopted to capture images at different object distances, so as to generate a video, wherein the size of a target subject in each image of the video gradually becomes larger or smaller. This may cause a large difference between the backgrounds of different images due to a positional shift (such as a left-right shift or a top-bottom shift) of the terminal or a movement of a dynamic object in the background when the images are captured at different times, thereby reducing the quality of the video. In the method for shooting a video provided by this embodiment, a terminal collects multiple frames of images for the same scene at the same time through multiple cameras, and performs frame interpolation based on the multiple frames of images, thereby generating the video. This helps to improve the quality of the generated video compared to conventional techniques. In addition, the method is favorable for improving the interest of the motion picture effect and enhancing the viscosity of the user to the terminal.

The scheme provided by the embodiment of the application is mainly introduced from the perspective of a method. To implement the above functions, it includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the terminal may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.

Fig. 29 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 220 shown in fig. 29 can be used to implement the functions of the terminal in the above method embodiment, and therefore, the beneficial effects of the above method embodiment can also be achieved. In an embodiment of the present application, the terminal may be the terminal 100 shown in fig. 1.

As shown in fig. 29, the terminal 220 includes an acquisition unit 221 and a processing unit 222. Optionally, as shown in fig. 30, the terminal 220 further includes a display unit 223.

In some embodiments:

the acquisition unit 221 is configured to acquire N +1 images in real time for a first scene, where the N +1 images each include a target subject; and in the process of acquiring the N +1 images, the terminal is farther away from the target main body. N is an integer of 1 or more. A processing unit 222 for performing the steps of: carrying out white balance processing on the later acquired N images in the N +1 images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain. Amplifying and cutting the N optimized images to obtain N target images; the size of a target subject in the N target images is consistent with the size of the target subject in a first image acquired from the N +1 images, and the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image; the size of the N target images is consistent with that of the first image; a hessian zoom video is generated based on the N target images and the first image. For example, in conjunction with fig. 10, the acquisition unit 221 is configured to perform S301, and the processing unit 222 is configured to perform S302-S304.

Optionally, the N +1 images include N1+1 images captured at the front and N2 images captured at the back, where N1+1 images are captured by a first camera of the terminal, and N2 images are captured by a second camera of the terminal; n1 and N2 are each an integer of 1 or more.

Optionally, the collecting unit 221 is specifically configured to: acquiring the shooting magnification of the ith image in the N +1 images, wherein i is more than or equal to 2 and less than or equal to N, and i is an integer; and if the shooting magnification of the ith image is within the first shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images for the first scene based on the first camera. Acquiring an (i + 1) th image in the (N + 1) th images aiming at the first scene based on the second camera if the shooting magnification of the ith image is in a second shooting magnification range; wherein the multiplying power of the first camera is a, and the multiplying power of the second camera is b; a is less than b; the first photographing magnification range is [ a, b); the second shooting magnification range is a range equal to or greater than b. For example, in connection with fig. 19b, the acquisition unit 221 may be configured to perform S301 c-3.

Optionally, the shooting magnification of the ith image is determined based on a zoom magnification of the size of the target subject in the ith image relative to the size of the target subject in the first image, and a magnification of a camera that captures the first image.

Optionally, the size of the target subject in the ith image is characterized by at least one of the following features: the width of the target subject in the ith image, the height of the target subject in the ith image, the area of the target subject in the ith image, or the number of pixel points occupied by the target subject in the ith image.

Optionally, the processing unit 222 is further configured to extract the target subject from the ith image by using an example segmentation algorithm to determine the size of the target subject in the ith image.

Optionally, the display unit 223 is configured to display, in the current preview interface, first information, where the first information is used to instruct to stop shooting the heuchker zoom video.

Optionally, the display unit 223 is configured to display second information in the current preview interface, where the second information is used to indicate that the target subject is still.

Optionally, the display unit 223 is configured to display third information in the current preview interface, where the third information is used to indicate that the target subject is in the center of the current preview image.

Optionally, the acquiring unit 221 is specifically configured to acquire the first image when the target subject is in the center of the current preview image.

Optionally, the display unit 223 is configured to display a user interface, where the user interface includes a first control, and the first control is used to instruct to shoot a kock zoom video in the heuchen region from near to far; and receiving an operation for the first control. The acquisition unit 221 is specifically configured to, in response to the operation, acquire N +1 images in real time for the first scene.

Optionally, the moving speed of the terminal is less than or equal to a preset speed.

Optionally, the preset neural network is configured to predict a white balance gain of the image to be processed in combination with the feature map of the historical network layer, so as to ensure white balance consistency of the time-domain adjacent images; wherein the history network layer is a network layer used in predicting a white balance gain of an image preceding and temporally continuous with the image to be processed.

Optionally, the preset neural network is obtained by training based on preset constraint conditions; wherein the preset constraint condition comprises: the predicted values of the white balance gains for a plurality of images which are continuous in an analog time domain are consistent.

Optionally, the processing unit 222 performs white balance processing on N images acquired later in the N +1 images based on a preset neural network to obtain N optimized images, which is specifically configured to: inputting a jth image in the N +1 images into a preset neural network to obtain a predicted value of a white balance gain of the jth image; wherein j is more than or equal to 2 and less than or equal to N-1, and j is an integer. Applying the white balance gain predicted value of the jth image to obtain an optimized image corresponding to the jth image; and the N optimized images comprise optimized images corresponding to the jth image.

In other embodiments:

the acquisition unit 221 is configured to acquire N +1 images in real time for a first scene, where the N +1 images each include a target subject; in the process of acquiring the N +1 images, the terminal is closer to the target main body; n is an integer of 1 or more. The first image in the N +1 images is acquired by a first camera of the terminal, part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, and the multiplying power of the second camera is smaller than that of the first camera. The size of the target subject in the later acquired N images of the N +1 images is less than or equal to the size of the target subject in the first image acquired in the N +1 images. A processing unit 222 for performing the steps of: carrying out white balance processing on the N acquired images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of the adjacent images in the time domain. And amplifying and cutting the N optimized images to obtain N target images, wherein the size of a target body in the N target images is consistent with that of a target body in a first image acquired from the N +1 images, and the relative position of the target body in the N target images is consistent with that of the target body in the first image. The N target images are of the same size as the first image. Based on the N target images and the first image, a heucher zoom video is generated. For example, in conjunction with fig. 25, the acquisition unit 221 may be configured to perform S401, and the processing unit 222 may be configured to perform S402-S404.

Optionally, the N images acquired later include N1 images acquired before and N2 images acquired after, where N1 images are acquired by the second camera and N2 images are acquired by the third camera of the terminal; n1 and N2 are each an integer of 1 or more.

Optionally, the acquisition unit 221, in terms of acquiring N +1 images for the first scene, is specifically configured to: acquiring the shooting magnification of the ith image in the N +1 images, wherein i is more than or equal to 2 and less than or equal to N, and i is an integer; and if the shooting magnification of the ith image is within the first shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images for the first scene based on the second camera. And if the shooting magnification of the ith image is within the second shooting magnification range, acquiring the (i + 1) th image in the (N + 1) th images for the first scene based on the third camera. Wherein the multiplying power of the second camera is b, and the multiplying power of the third camera is c; b is more than c; the first shooting magnification range is a range greater than or equal to b; the second photographing magnification range is [ c, b).

Optionally, the collecting unit 221 is specifically configured to: the first image is acquired when the target subject is in the center of the current preview image.

Optionally, the display unit 223 is configured to display a user interface, where the user interface includes a second control, and the second control is configured to instruct to shoot the kock zoom video in the heuchen area from far to near; and receiving an operation for a second control. The acquisition unit 221 is specifically configured to, in response to the operation, acquire N +1 images for the first scene.

Optionally, for N images acquired later in the N +1 images, performing white balance processing based on a preset neural network to obtain N optimized images, including: inputting a jth image in the N +1 images into a preset neural network to obtain a predicted value of a white balance gain of the jth image; wherein j is more than or equal to 2 and less than or equal to N-1, and j is an integer. Applying the white balance gain predicted value of the jth image to obtain an optimized image corresponding to the jth image; and the N optimized images comprise optimized images corresponding to the jth image.

In other embodiments:

the acquisition unit 221 includes a first camera and a second camera, and a magnification of the first camera is different from a magnification of the second camera. The acquisition unit 221 is configured to acquire a first image and a second image for a first scene at a first time through a first camera and a second camera, respectively; the first image and the second image both comprise the target subject. A processing unit 222 for performing the steps of: determining the number N of frames of images to be inserted between the first image and the second image based on the preset playing duration and the preset playing frame rate of the video; wherein N is an integer of 1 or more. And determining N images to be inserted based on the frame number N, the first image and the second image. Generating a video based on the first image, the second image and the image to be inserted; wherein the size of the target subject in each image of the video is gradually increased or decreased. For example, in conjunction with fig. 27, the acquisition unit 221 may be configured to perform S500, and the processing unit 222 may be configured to perform S501-S503.

Optionally, the collecting unit 221 further includes a third camera, and a magnification of the third camera is between the magnifications of the first camera and the second camera. The collecting unit 221 is further configured to collect a third image for the first scene at the first time by using a third camera; wherein the third image includes the target subject. The processing unit 222 is specifically configured to, in determining the N images to be inserted based on the number of frames, the size of the target subject in the first image, and the size of the target subject in the second image: determining aspects of the N images to be inserted based on the number of frames N, the first image and the second image, and specifically configured to: and determining N images to be inserted based on the frame number N, the first image, the second image and the third image.

For the detailed description of the above alternative modes, reference may be made to the foregoing method embodiments, which are not described herein again. In addition, for any explanation and beneficial effect description of the terminal 220 provided above, reference may be made to the corresponding method embodiment described above, and details are not repeated.

As an example, in connection with fig. 1, the above-described acquisition unit may be implemented by a camera 193. The functions of the processing unit 222 can be realized by the processor 110 calling the level code stored in the internal memory 121.

Another embodiment of the present application further provides a terminal, including: the system comprises a processor, a memory and a camera, wherein the camera is used for collecting images, the memory is used for storing computer programs and instructions, and the processor is used for calling the computer programs and the instructions and executing corresponding steps executed by the terminal in the method flow shown in the embodiment of the method in cooperation with the camera.

Another embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by a terminal, the terminal performs each step in the method flow shown in the foregoing method embodiment.

In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles of manufacture.

It should be understood that the arrangements described herein are for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and that some elements may be omitted altogether depending upon the desired results. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions according to the embodiments of the present application are generated in whole or in part when the computer-executable instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). Computer-readable storage media can be any available media that can be accessed by a computer or can comprise one or more data storage devices, such as servers, data centers, and the like, that can be integrated with the media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for shooting video, which is applied to a terminal and comprises the following steps:

acquiring N +1 images in real time aiming at a first scene, wherein the N +1 images comprise target main bodies; wherein, in the process of collecting the N +1 images, the terminal is farther and farther away from the target main body; n is an integer of 1 or more;

carrying out white balance processing on the later acquired N images in the N +1 images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of adjacent images in a time domain;

amplifying and cutting the N optimized images to obtain N target images; wherein the size of the target subject in the N target images is consistent with the size of the target subject in a first image acquired in the N +1 images, and the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image; the N target images are consistent with the first image in size;

generating a Hirschk zoom video based on the N target images and the first image.

2. The method of claim 1,

the N +1 images comprise N1+1 images captured before and N2 images captured after, wherein the N1+1 images are captured by a first camera of the terminal and the N2 images are captured by a second camera of the terminal; the N1 and the N2 are each an integer of 1 or more.

3. The method of claim 1 or 2, wherein the acquiring N +1 images in real-time for a first scene comprises:

acquiring the shooting magnification of the ith image in the N +1 images; wherein i is not less than 2 and not more than N, i is an integer;

acquiring an i +1 th image in the N +1 images aiming at the first scene based on a first camera of the terminal if the shooting magnification of the i image is in a first shooting magnification range;

acquiring an i +1 th image in the N +1 images for the first scene based on a second camera of the terminal if the shooting magnification of the i image is within a second shooting magnification range;

wherein the multiplying power of the first camera is a, and the multiplying power of the second camera is b; a is less than b; the first photographing magnification range is [ a, b); the second shooting magnification range is a range of b or more.

4. The method of claim 3,

the shooting magnification of the ith image is determined based on the zoom magnification of the size of the target subject in the ith image relative to the size of the target subject in the first image, and the magnification of a camera that captures the first image.

5. The method of claim 4, wherein the size of the target subject in the ith image is characterized by at least one of:

the width of the target subject in the ith image,

the height of the target subject in the ith image,

the area of the target subject in the ith image, or,

the number of pixel points occupied by the target subject in the ith image.

6. The method according to claim 4 or 5, characterized in that the method further comprises:

extracting the target subject from the ith image using an example segmentation algorithm to determine a size of the target subject in the ith image.

7. The method according to any one of claims 1 to 6, further comprising:

and displaying first information in the current preview interface, wherein the first information is used for indicating that the shooting of the Hockey zoom video is stopped.

8. The method according to any one of claims 1 to 7, further comprising:

and displaying second information in the current preview interface, wherein the second information is used for indicating that the target main body is static.

9. The method according to any one of claims 1 to 8, further comprising:

and displaying third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image.

10. The method of any of claims 1 to 9, wherein the acquiring N +1 images in real-time for a first scene comprises:

the first image is acquired when the target subject is in the center of the current preview image.

11. The method according to any one of claims 1 to 10, further comprising:

displaying a user interface, wherein the user interface comprises a first control used for indicating that a Cock zoom video in a Hirsch area is shot from near to far;

the acquiring N +1 images in real time for a first scene includes:

receiving an operation for the first control, and acquiring the N +1 images for the first scene in real time in response to the operation.

12. The method according to any one of claims 1 to 11, wherein the moving speed of the terminal is equal to or less than a preset speed.

13. The method according to any one of claims 1 to 12,

the preset neural network is used for predicting the white balance gain of the image to be processed by combining the characteristic diagram of the historical network layer so as to ensure the white balance consistency of the adjacent images in the time domain; wherein the historical network layer is a network layer used in predicting a white balance gain of an image preceding and temporally continuous with the image to be processed.

14. The method of claim 13,

the preset neural network is obtained by training based on preset constraint conditions; wherein the preset constraint condition comprises: the predicted values of the white balance gains for a plurality of images which are continuous in an analog time domain are consistent.

15. The method according to any one of claims 1 to 14, wherein the performing white balance processing on N images acquired later in the N +1 images based on a preset neural network to obtain N optimized images comprises:

inputting a jth image in the N +1 images into the preset neural network to obtain a white balance gain predicted value of the jth image; wherein j is more than or equal to 2 and less than or equal to N-1, and j is an integer;

applying the white balance gain predicted value of the jth image to obtain an optimized image corresponding to the jth image; wherein the N optimized images comprise the optimized image corresponding to the jth image.

16. A method for shooting video, which is applied to a terminal and comprises the following steps:

acquiring N +1 images aiming at a first scene, wherein the N +1 images comprise a target main body; wherein, in the process of acquiring the N +1 images, the terminal is closer to the target main body; n is an integer of 1 or more; a first image in the N +1 images is acquired by a first camera of the terminal, and part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, wherein the multiplying power of the second camera is smaller than that of the first camera; the size of the target subject in later-acquired N images of the N +1 images is smaller than or equal to the size of the target subject in a first image acquired in the N +1 images;

performing white balance processing on the N images based on a preset neural network to obtain N optimized images; the preset neural network is used for ensuring the white balance consistency of adjacent images in a time domain;

amplifying and cutting the N optimized images to obtain N target images; wherein the size of the target subject in the N target images is consistent with the size of the target subject in the first image, and the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image; the N target images are consistent with the first image in size;

17. The method of claim 16, wherein the N images comprise N1 images captured before and N2 images captured after, wherein the N1 images are captured by the second camera and the N2 images are captured by a third camera of the terminal; the N1 and the N2 are each an integer of 1 or more.

18. The method of claim 16 or 17, wherein acquiring N +1 images for a first scene comprises:

acquiring an i +1 th image of the N +1 images for the first scene based on the second camera if the shooting magnification of the i image is within a first shooting magnification range;

acquiring an i +1 th image in the N +1 images for the first scene based on a third camera of the terminal if the shooting magnification of the i image is within a second shooting magnification range;

wherein the magnification of the second camera is b, and the magnification of the third camera is c; b is more than c; the first shooting magnification range is a range greater than or equal to b; the second photographing magnification range is [ c, b).

19. The method of claim 18,

20. The method of claim 19, wherein the size of the target subject in the ith image is characterized by at least one of:

the width of the target subject in the ith image,

the height of the target subject in the ith image,

the area of the target subject in the ith image, or,

the number of pixel points occupied by the target subject in the ith image.

21. The method according to claim 19 or 20, further comprising:

22. The method of any one of claims 16 to 21, further comprising:

23. The method of any one of claims 16 to 22, further comprising:

24. The method of any one of claims 16 to 23, further comprising:

25. The method of any of claims 16 to 24, wherein acquiring N +1 images for a first scene comprises:

26. The method of any one of claims 16 to 25, further comprising:

displaying a user interface, wherein the user interface comprises a second control used for indicating that a Cock zoom video in a Hirsch area is shot from far to near;

the acquiring N +1 images for a first scene includes:

receiving an operation on the second control, and acquiring the N +1 images for the first scene in response to the operation.

27. The method according to any one of claims 16 to 26, wherein the moving speed of the terminal is equal to or less than a preset speed.

28. The method according to any one of claims 16 to 27,

29. The method of claim 28,

30. The method according to any one of claims 16 to 29, wherein the performing white balance processing on N later-acquired images of the N +1 images based on a preset neural network to obtain N optimized images comprises:

inputting a jth image in the N +1 images into the preset neural network to obtain a white balance gain predicted value of the jth image; wherein j is more than or equal to 2 and less than or equal to N-1, and j is an integer.

31. A method for shooting videos is applied to a terminal, the terminal comprises a first camera and a second camera, the magnification of the first camera is different from that of the second camera, and the method comprises the following steps:

respectively acquiring a first image and a second image for a first scene at a first time through the first camera and the second camera; wherein the first image and the second image both contain a target subject;

determining the number N of frames of images to be inserted between the first image and the second image based on the preset playing duration and the preset playing frame rate of the video; wherein N is an integer of 1 or more;

determining N images to be inserted based on the frame number N, the first image and the second image;

generating the video based on the first image, the second image and the N images to be inserted; wherein the size of the target subject in each image of the video gradually becomes larger or smaller.

32. The method of claim 31, wherein the terminal further comprises a third camera having a magnification between the magnifications of the first camera and the second camera, the method further comprising:

acquiring, by the third camera, a third image for the first scene at the first time; wherein the third image contains the target subject;

the determining N images to be inserted based on the frame number, the first image, and the second image includes:

and determining N images to be inserted based on the frame number, the first image, the second image and the third image.

33. A terminal, characterized in that the terminal comprises:

the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring N +1 images in real time aiming at a first scene, and the N +1 images comprise target main bodies; wherein, in the process of collecting the N +1 images, the terminal is farther and farther away from the target main body; n is an integer of 1 or more;

a processing unit for performing the steps of:

34. The terminal of claim 33,

35. The terminal according to claim 33 or 34, wherein the acquisition unit is specifically configured to:

36. The terminal of claim 35,

37. The terminal of claim 36, wherein the size of the target subject in the ith image is characterized by at least one of:

the width of the target subject in the ith image,

the height of the target subject in the ith image,

the area of the target subject in the ith image, or,

the number of pixel points occupied by the target subject in the ith image.

38. The terminal according to claim 36 or 37,

the processing unit is further configured to extract the target subject from the ith image using an instance segmentation algorithm to determine a size of the target subject in the ith image.

39. The terminal according to any of claims 33 to 38, characterized in that the terminal further comprises:

and the display unit is used for displaying first information in the current preview interface, wherein the first information is used for indicating that the shooting of the Hirschhorn zoom video is stopped.

40. The terminal according to any of claims 33 to 39, wherein the terminal further comprises:

and the display unit is used for displaying second information in the current preview interface, wherein the second information is used for indicating that the target main body is static.

41. The terminal according to any of claims 33 to 40, characterized in that the terminal further comprises:

and the display unit is used for displaying third information in the current preview interface, wherein the third information is used for indicating that the target main body is in the center of the current preview image.

42. A terminal according to any of claims 33 to 41,

the acquisition unit is specifically configured to acquire the first image when the target subject is in the center of the current preview image.

43. The terminal according to any of claims 33 to 42, characterized in that the terminal further comprises:

the display unit is used for displaying a user interface, the user interface comprises a first control, and the first control is used for indicating that a Cock zoom video in a Hirsch area is shot from near to far; and receiving an operation for the first control;

the acquisition unit is specifically configured to, in response to the operation, acquire the N +1 images for the first scene.

44. A terminal as claimed in any one of claims 33 to 43, characterised in that the speed of movement of the terminal is less than or equal to a predetermined speed.

45. A terminal according to any of claims 33 to 44,

46. The terminal of claim 45,

47. The terminal according to any one of claims 33 to 46, wherein the processing unit is configured to, in the respect of performing white balance processing on the N images acquired later in the N +1 images based on a preset neural network to obtain N optimized images, specifically:

48. A terminal, characterized in that the terminal comprises:

the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring N +1 images aiming at a first scene, and the N +1 images comprise target bodies; wherein, in the process of acquiring the N +1 images, the terminal is closer to the target main body; n is an integer of 1 or more; a first image in the N +1 images is acquired by a first camera of the terminal, and part or all of the last N images in the N +1 images are acquired by a second camera of the terminal, wherein the multiplying power of the second camera is smaller than that of the first camera; the size of the target subject in later-acquired N images of the N +1 images is smaller than or equal to the size of the target subject in a first image acquired in the N +1 images;

a processing unit for performing the steps of:

49. The terminal of claim 48, wherein the N images comprise N1 images captured before and N2 images captured after, wherein the N1 images are captured by the second camera and the N2 images are captured by a third camera of the terminal; the N1 and the N2 are each an integer of 1 or more.

50. The terminal according to claim 48 or 49, wherein the acquisition unit is specifically configured to:

51. The terminal of claim 50,

52. The terminal of claim 51, wherein the size of the target subject in the ith image is characterized by at least one of:

the width of the target subject in the ith image,

the height of the target subject in the ith image,

the area of the target subject in the ith image, or,

the number of pixel points occupied by the target subject in the ith image.

53. The terminal according to claim 51 or 52,

54. The terminal according to any of claims 48 to 53, wherein the terminal further comprises:

55. The terminal according to any of claims 48 to 54, characterized in that the terminal further comprises:

56. The terminal according to any of claims 48 to 55, characterized in that the terminal further comprises:

57. A terminal as claimed in any one of claims 48 to 56,

the acquisition unit is specifically configured to: the first image is acquired when the target subject is in the center of the current preview image.

58. A terminal as claimed in any of claims 48 to 57, further comprising:

the display unit is used for displaying a user interface, the user interface comprises a second control, and the second control is used for indicating that the Koch zoom video is shot from far to near; and receiving an operation for the second control;

59. A terminal as claimed in any one of claims 48 to 58, wherein the speed of movement of the terminal is less than or equal to a predetermined speed.

60. The terminal according to any of the claims 48 to 59,

61. The terminal of claim 60,

62. The terminal according to any one of claims 48 to 61, wherein the performing white balance processing on the later-collected N images of the N +1 images based on a preset neural network to obtain N optimized images comprises:

63. A terminal is characterized by comprising an acquisition unit and a processing unit, wherein the acquisition unit comprises a first camera and a second camera, and the multiplying power of the first camera is different from that of the second camera;

the acquisition unit is used for respectively acquiring a first image and a second image aiming at a first scene at a first moment through the first camera and the second camera; wherein the first image and the second image both contain a target subject;

the processing unit is used for executing the following steps:

64. The terminal of claim 63, wherein the capture unit further comprises a third camera having a magnification between the magnifications of the first camera and the second camera;

the acquisition unit is further configured to acquire, by the third camera, a third image for the first scene at the first time; wherein the third image contains the target subject;

the processing unit, in the aspect of determining N images to be inserted based on the frame number, the first image, and the second image, is specifically configured to:

65. A terminal, comprising: a processor, a memory for storing computer programs and instructions, and a camera, the processor for invoking the computer programs and instructions to perform the method of any of claims 1-32 in cooperation with the camera.