WO2021244295A1 - Method and Apparatus for Shooting Video - Google Patents

Method and Apparatus for Shooting Video (拍摄视频的方法和装置)

Info

Publication number
WO2021244295A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
target subject
camera
terminal
Prior art date
Application number
PCT/CN2021/094695
Other languages
English (en)
French (fr)
Inventor
赵威
李宏俏
李宗原
赵鑫源
李成臣
曾毅华
廖桂明
周承涛
李欣
周蔚
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021244295A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/80 Camera processing pipelines; Components thereof
    • H04N 23/84 Camera processing pipelines for processing colour signals
    • H04N 23/88 Camera processing pipelines for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • H04N 5/76 Television signal recording

Definitions

  • This application relates to the field of photographing technology and image processing technology, and in particular, to a method and apparatus for shooting video.
  • The Hitchcock zoom video is one such visual effect.
  • The Hitchcock zoom (also known as the dolly zoom) is a special video shooting technique.
  • The camera moves forward or backward and changes its focal length while moving, so that the size of the target subject in the captured images does not change while the background changes drastically.
  • This effect can be used to convey the rich emotions of the main character, give the viewer the tension and shock of space being compressed or expanded, and provide an unconventional video recording experience.
  • The embodiments of the present application provide a method and apparatus for shooting video that give the resulting Hitchcock zoom video a better white balance effect, thereby improving the quality of the Hitchcock zoom video and the user experience.
  • a method for shooting a video is provided, and the method is applied to a terminal.
  • The method includes: collecting N+1 images for a first scene in real time, each of the N+1 images including the target subject; in the process of collecting the N+1 images, the terminal moves farther and farther away from the target subject.
  • N is an integer greater than or equal to 1.
  • For the last N of the N+1 images, white balance processing is performed based on a preset neural network to obtain N optimized images; the preset neural network is used to ensure the white balance consistency of images adjacent in the time domain.
  • The N optimized images are enlarged and cropped to obtain N target images, where the size of the target subject in each of the N target images is the same as the size of the target subject in the first of the N+1 collected images, and the relative position of the target subject in each of the N target images is consistent with the relative position of the target subject in the first image.
  • Each of the N target images has the same size as the first image.
  • A Hitchcock zoom video is generated based on the N target images and the first image.
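  • As an illustration of the enlarge-and-crop step, the following is a minimal Python sketch, assuming the target subject's size ratio and center position in the current frame have already been measured; the function name, the OpenCV dependency, and the window-clamping policy are illustrative assumptions rather than details fixed by this application.

```python
import cv2  # assumption: any resize routine would serve equally well

def dolly_zoom_crop(frame, subject_ratio, subject_center):
    """Crop and enlarge one optimized frame so the target subject regains the
    size and relative position it had in the first captured image.

    frame          -- HxWx3 array, one of the N optimized images
    subject_ratio  -- subject size in this frame / subject size in the first
                      frame; < 1 when the terminal has moved away
    subject_center -- (x, y) center of the subject, used to keep its relative
                      position consistent with the first image
    """
    h, w = frame.shape[:2]
    crop_w, crop_h = int(w * subject_ratio), int(h * subject_ratio)
    cx, cy = subject_center
    # clamp the crop window so it stays inside the frame
    x0 = min(max(cx - crop_w // 2, 0), w - crop_w)
    y0 = min(max(cy - crop_h // 2, 0), h - crop_h)
    crop = frame[y0:y0 + crop_h, x0:x0 + crop_w]
    # resize back so the target image has the same size as the first image
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)
```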
  • The N+1 images consist of N1+1 images collected first and N2 images collected afterwards, where the N1+1 images are collected by a first camera of the terminal and the N2 images by a second camera of the terminal; N1 and N2 are both integers greater than or equal to 1.
  • the technical solutions provided by the embodiments of the present application can be applied to shooting Hitchcock zoom videos in a scene where cameras are switched.
  • Acquiring the N+1 images for the first scene in real time includes: acquiring the shooting magnification of the i-th image among the N+1 images, where 2 ≤ i ≤ N and i is an integer; if the shooting magnification of the i-th image is within a first shooting magnification range, collecting the (i+1)-th of the N+1 images for the first scene with the first camera of the terminal; and if the shooting magnification of the i-th image is within a second shooting magnification range, collecting the (i+1)-th of the N+1 images for the first scene with the second camera of the terminal.
  • The magnification of the first camera is a and the magnification of the second camera is b, with a < b; the first shooting magnification range is [a, b), and the second shooting magnification range is the range greater than or equal to b.
  • In other words, the terminal determines the camera that collects the (i+1)-th image based on the shooting magnification of the i-th image. In this way, the terminal can enlarge the target subject in subsequently captured images by switching cameras, which, compared with conventional techniques, helps produce a sharper Hitchcock zoom video and thereby improves the user experience.
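  • A minimal sketch of this camera-selection rule follows; the default magnification values are illustrative examples, not values taken from this application.

```python
def select_camera(mag_i, a=1.0, b=3.0):
    """Pick the camera for the (i+1)-th frame from the shooting magnification
    of the i-th frame (near-to-far shooting). a and b are the magnifications
    of the first and second cameras, with a < b; the defaults are examples."""
    if a <= mag_i < b:      # first shooting magnification range: [a, b)
        return "first camera"
    return "second camera"  # second shooting magnification range: >= b
```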
  • The shooting magnification of the i-th image is determined based on the ratio of the size of the target subject in the i-th image to the size of the target subject in the first image, together with the magnification of the camera that collected the first image.
  • The size of the target subject in the i-th image is characterized by at least one of the following features: the width of the target subject in the i-th image, the height of the target subject in the i-th image, the area of the target subject in the i-th image, or the number of pixels occupied by the target subject in the i-th image.
  • The method further includes: extracting the target subject from the i-th image by using an instance segmentation algorithm to determine the size of the target subject in the i-th image. This helps improve the accuracy of determining the size of the target subject in the i-th image.
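  • For example, with an instance-segmentation mask available for each frame, the shooting magnification could be computed as sketched below; the use of subject height as the size feature and the helper names are assumptions for illustration (width, area, or pixel count would work equally, per the characterization above).

```python
import numpy as np

def subject_size(mask):
    """Height of the target subject taken from a boolean HxW segmentation
    mask; assumes the mask is non-empty."""
    rows = np.where(np.any(mask, axis=1))[0]
    return rows[-1] - rows[0] + 1

def shooting_magnification(mask_i, mask_1, first_cam_mag):
    """Shooting magnification of the i-th image: the ratio of the subject's
    size in the i-th image to its size in the first image, scaled by the
    magnification of the camera that collected the first image."""
    return (subject_size(mask_i) / subject_size(mask_1)) * first_cam_mag
```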
  • the method further includes: displaying the first information in the current preview interface.
  • The first information is used to instruct the user to stop shooting the Hitchcock zoom video. In this way, the user knows when to stop moving the terminal, which improves the user experience.
  • the method further includes: displaying the second information in the current preview interface.
  • The second information is used to indicate that the target subject is stationary. Since one requirement of a Hitchcock zoom video is that the position of the target subject be consistent across the images, based on this possible design the user can know, while the video is being captured, whether the requirements of the Hitchcock zoom video are currently being met, thereby improving the user experience.
  • the method further includes: displaying third information in the current preview interface, and the third information is used to indicate that the target subject is in the center of the current preview image.
  • The current preview interface includes the current preview image (i.e., the image collected by the camera) and information other than the current preview image (such as shooting controls, instruction information, etc.).
  • Collecting the N+1 images for the first scene in real time includes: collecting the first of the N+1 images when the target subject is in the center of the current preview image. This helps improve the quality of the Hitchcock zoom video.
  • The method further includes: displaying a user interface that includes a first control, the first control being used to instruct shooting a Hitchcock zoom video from near to far.
  • Collecting the N+1 images for the first scene in real time includes: receiving an operation on the first control and, in response to the operation, collecting the N+1 images for the first scene in real time.
  • The moving speed of the terminal is less than or equal to a preset speed. This helps improve the quality of the Hitchcock zoom video.
  • The preset neural network combines feature maps of historical network layers to predict the white balance gain of the image to be processed, so as to ensure the white balance consistency of images adjacent in the time domain; a historical network layer is a network layer used when predicting the white balance gain of an image that precedes the image to be processed and is contiguous with it in the time domain.
  • the image to be processed is one of the above N images.
  • In other words, the white balance network fuses the network-layer feature information of the current frame and of historical frames. Taking the information of multiple frames into account brings the predicted white balance gains of successive frames closer together, making the white balance network more stable, so that the white balance consistency among multiple consecutively processed images is better.
  • The preset neural network is trained based on preset constraint conditions, where the preset constraint conditions include: the white balance gain prediction values of multiple images used to simulate a temporally consecutive sequence are consistent.
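  • One way to express such a constraint during training is a temporal-consistency penalty over the gains predicted for a simulated consecutive sequence; the L2 form and the PyTorch framing below are assumptions, as this application does not spell out the loss function.

```python
import torch

def temporal_consistency_loss(gain_preds: torch.Tensor) -> torch.Tensor:
    """gain_preds: (T, 3) per-frame RGB white balance gain predictions for T
    images simulating a temporally consecutive sequence. Penalizing
    frame-to-frame differences pushes the predictions to be consistent."""
    return ((gain_preds[1:] - gain_preds[:-1]) ** 2).mean()
```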
  • Obtaining the N optimized images includes: inputting the j-th of the N images into the preset neural network to obtain the white balance gain prediction value of the j-th image, where 2 ≤ j ≤ N-1 and j is an integer.
  • The white balance gain prediction value of the j-th image is then applied to the j-th image to obtain the optimized image corresponding to the j-th image; the N optimized images include the optimized image corresponding to the j-th image.
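  • Applying a predicted gain is a per-channel multiplication, as in the sketch below; the linear-RGB assumption and the [0, 1] value range are illustrative conventions.

```python
import numpy as np

def apply_wb_gain(image, gain_rgb):
    """image: HxWx3 float array in linear RGB; gain_rgb: (r, g, b) gain
    predicted by the network for this frame. Returns the optimized image."""
    balanced = image * np.asarray(gain_rgb, dtype=image.dtype)
    return np.clip(balanced, 0.0, 1.0)  # keep values in the displayable range
```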
  • a method for shooting a video is provided.
  • the method is applied to a terminal.
  • The method includes: collecting N+1 images for a first scene, each of the N+1 images including a target subject; in the process of collecting the N+1 images, the terminal gets closer and closer to the target subject; N is an integer greater than or equal to 1.
  • The first of the N+1 images is collected by a first camera of the terminal, and part or all of the last N of the N+1 images are collected by a second camera of the terminal.
  • The magnification of the second camera is smaller than the magnification of the first camera.
  • The size of the target subject in each of the last N of the N+1 images is smaller than or equal to the size of the target subject in the first of the N+1 collected images.
  • For the last N of the N+1 images, white balance processing is performed based on a preset neural network to obtain N optimized images; the preset neural network is used to ensure the white balance consistency of images adjacent in the time domain.
  • The N optimized images are enlarged and cropped to obtain N target images.
  • The size of the target subject in each of the N target images is the same as the size of the target subject in the first of the N+1 collected images, and the relative position of the target subject in each of the N target images is consistent with the relative position of the target subject in the first image; the N target images have the same size as the first image.
  • Based on the N target images and the first image, a Hitchcock zoom video is generated.
  • The N+1 images may be N+1 continuously collected images, that is, N+1 images collected in real time.
  • In this way, the size of the target subject in a later captured image is less than or equal to the size of the target subject in the previously captured image.
  • White balance processing is performed on the last N of the N+1 captured images so that the processed images are consistent in white balance with the first of the N+1 captured images. In this way, the white balance effect of the resulting Hitchcock zoom video is better, which improves the quality of the Hitchcock zoom video and the user experience.
  • The N images include N1 images collected first and N2 images collected afterwards, where the N1 images are collected by the second camera and the N2 images by a third camera of the terminal; N1 and N2 are integers greater than or equal to 1.
  • Acquiring the N+1 images for the first scene in real time includes: acquiring the shooting magnification of the i-th image among the N+1 images, where 2 ≤ i ≤ N and i is an integer; if the shooting magnification of the i-th image is within a first shooting magnification range, collecting the (i+1)-th of the N+1 images for the first scene with the second camera; and if the shooting magnification of the i-th image is within a second shooting magnification range, collecting the (i+1)-th of the N+1 images for the first scene with the third camera of the terminal. The magnification of the second camera is b and the magnification of the third camera is c, with b > c; the first shooting magnification range is the range greater than or equal to b, and the second shooting magnification range is [c, b).
  • In other words, the terminal determines the camera that collects the (i+1)-th image based on the shooting magnification of the i-th image. In this way, the terminal can collect a later image with a camera whose magnification is smaller than that of the camera used for the previous image, that is, it shrinks the target subject by switching to a camera with a smaller magnification, so the captured image does not need to be "filled" (padded), which improves the user experience.
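  • The far-to-near selection rule mirrors the earlier near-to-far sketch, now switching toward the camera with the smaller magnification; the default values are again illustrative.

```python
def select_camera_far_to_near(mag_i, b=3.0, c=1.0):
    """Pick the camera for the (i+1)-th frame from the shooting magnification
    of the i-th frame (far-to-near shooting). b and c are the magnifications
    of the second and third cameras, with b > c; the defaults are examples."""
    if mag_i >= b:         # first shooting magnification range: >= b
        return "second camera"
    return "third camera"  # second shooting magnification range: [c, b)
```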
  • The shooting magnification of the i-th image is determined based on the ratio of the size of the target subject in the i-th image to the size of the target subject in the first image, together with the magnification of the camera that collected the first image.
  • The size of the target subject in the i-th image is characterized by at least one of the following features: the width of the target subject in the i-th image, the height of the target subject in the i-th image, the area of the target subject in the i-th image, or the number of pixels occupied by the target subject in the i-th image.
  • The method further includes: extracting the target subject from the i-th image by using an instance segmentation algorithm to determine the size of the target subject in the i-th image. This helps improve the accuracy of determining the size of the target subject in the i-th image.
  • The method further includes: displaying, in the current preview interface, first information used to instruct the user to stop shooting the Hitchcock zoom video. This helps instruct the user to stop moving the terminal at an appropriate time, thereby improving the user experience.
  • The method further includes: displaying, in the current preview interface, second information used to indicate that the target subject is stationary. Based on this possible design, the user can know, while the video is being captured, whether the current requirements for obtaining the Hitchcock zoom video are met, thereby improving the user experience.
  • The method further includes: displaying, in the current preview interface, third information used to indicate that the target subject is in the center of the current preview image. In this way, the user can decide whether to move the terminal based on whether the terminal displays the third information, which helps improve the quality of the Hitchcock zoom video.
  • Collecting the N+1 images for the first scene includes: collecting the first image when the target subject is in the center of the current preview image. This helps improve the quality of the Hitchcock zoom video.
  • The method further includes: displaying a user interface that includes a second control, the second control being used to instruct shooting a Hitchcock zoom video from far to near.
  • Collecting the N+1 images for the first scene includes: receiving an operation on the second control and, in response to the operation, collecting the N+1 images for the first scene.
  • The moving speed of the terminal is less than or equal to a preset speed. This helps improve the quality of the Hitchcock zoom video.
  • The preset neural network combines feature maps of historical network layers to predict the white balance gain of the image to be processed, so as to ensure the white balance consistency of images adjacent in the time domain; a historical network layer is a network layer used when predicting the white balance gain of an image that precedes the image to be processed and is contiguous with it in the time domain.
  • The preset neural network is trained based on preset constraint conditions, where the preset constraint conditions include: the white balance gain prediction values of multiple images used to simulate a temporally consecutive sequence are consistent.
  • Obtaining the N optimized images includes: inputting the j-th of the N images into the preset neural network to obtain the white balance gain prediction value of the j-th image, where 2 ≤ j ≤ N-1 and j is an integer.
  • The white balance gain prediction value of the j-th image is then applied to the j-th image to obtain the optimized image corresponding to the j-th image; the N optimized images include the optimized image corresponding to the j-th image.
  • a method for shooting a video is provided, which is applied to a terminal.
  • the terminal includes a first camera and a second camera, and the magnification of the first camera is different from the magnification of the second camera.
  • The method includes: collecting a first image and a second image for a first scene at a first moment through the first camera and the second camera, respectively, where the first image and the second image both contain the target subject.
  • The number of frames N of images to be inserted between the first image and the second image is determined, where N is an integer greater than or equal to 1.
  • The N images to be inserted are then determined.
  • A video is generated; the size of the target subject across the images of the video gradually becomes larger or smaller.
  • In other words, the terminal collects multiple frames for the same scene at the same moment through multiple cameras and interpolates frames based on them to generate a video in which the size of the target subject gradually increases or decreases.
  • Compared with conventional techniques, this helps improve the quality of the generated video; it also makes the animation effect more engaging and increases the user's attachment to the terminal.
  • the terminal further includes a third camera, and the magnification of the third camera is between the magnifications of the first camera and the second camera.
  • the method further includes: collecting a third image for the first scene at the first moment through the third camera; wherein the third image contains the target subject.
  • Determining the N images to be inserted includes: determining the N images to be inserted based on the number of frames N, the first image, the second image, and the third image. This helps further improve the quality of the video.
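  • As a sketch of the frame-insertion step, intermediate frames can be synthesized by center-cropping the wider image at magnifications between the two cameras' magnifications and resizing back, so the target subject grows steadily from frame to frame; this concrete interpolation scheme is an assumption, since the application does not prescribe the algorithm.

```python
import numpy as np
import cv2

def insert_zoom_frames(wide_img, mag_wide, mag_tele, n):
    """wide_img: HxWx3 image from the lower-magnification camera;
    mag_wide < mag_tele are the two cameras' magnifications; n is the
    number of frames to insert. Returns n intermediate frames."""
    h, w = wide_img.shape[:2]
    frames = []
    for m in np.linspace(mag_wide, mag_tele, n + 2)[1:-1]:  # n interior steps
        scale = mag_wide / m                 # relative crop size, < 1
        cw, ch = int(w * scale), int(h * scale)
        x0, y0 = (w - cw) // 2, (h - ch) // 2
        crop = wide_img[y0:y0 + ch, x0:x0 + cw]
        frames.append(cv2.resize(crop, (w, h)))
    return frames
```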
  • a terminal is provided.
  • the terminal may be used to execute any of the methods provided in the first aspect to the third aspect.
  • the terminal may be divided into functional modules according to any of the methods provided in the first aspect to the third aspect.
  • Each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the present application may divide the terminal into a collection unit, a processing unit, a display unit, etc. according to functions.
  • The device includes a memory and a processor; the memory is used to store computer instructions, and the processor is used to call the computer instructions to perform any one of the methods provided in the first to third aspects.
  • the acquisition step in any one of the methods provided in the first aspect to the third aspect can be specifically replaced with a control acquisition step in this possible design.
  • the display step in the above corresponding method can be specifically replaced with a control display step in this possible design.
  • a terminal including a processor, a memory, and a camera.
  • the camera is used to collect images, etc.
  • the memory is used to store computer programs and instructions
  • the processor is used to call the computer programs and instructions, and cooperate with the one or more cameras to execute the corresponding technical solutions provided in the first to third aspects.
  • a computer-readable storage medium such as a non-transitory computer-readable storage medium.
  • a computer program (or instruction) is stored thereon, and when the computer program (or instruction) runs on a computer, the computer is caused to execute any one of the methods provided in the first aspect to the third aspect.
  • the acquisition step in any one of the methods provided in the first aspect to the third aspect can be specifically replaced with a control acquisition step in this possible design.
  • the display step in the above corresponding method can be specifically replaced with a control display step in this possible design.
  • a computer program product which when running on a computer, enables any one of the methods provided in the first to third aspects to be executed.
  • the acquisition step in any one of the methods provided in the first aspect to the third aspect can be specifically replaced with a control acquisition step in this possible design.
  • the display step in the above corresponding method can be specifically replaced with a control display step in this possible design.
  • Any of the terminals, computer storage media, computer program products, or chip systems provided above can be applied to the corresponding methods provided above; for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods, which are not repeated here.
  • FIG. 1 is a schematic diagram of the hardware structure of a terminal applicable to the embodiments of this application;
  • FIG. 2 is a block diagram of the software structure of a terminal applicable to the embodiments of this application;
  • FIG. 3 is a schematic diagram of an interface change for starting a Hitchcock zoom video shooting mode provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of the hardware structure of a computer device provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of a training data preparation process before training a white balance network according to an embodiment of the application
  • FIG. 6 is a schematic diagram of a network architecture used when training a white balance network according to an embodiment of the application
  • FIG. 7 is a schematic diagram of another network architecture used when training a white balance network according to an embodiment of the application.
  • FIG. 8 is a schematic diagram of a network architecture used in the prediction stage provided by an embodiment of the application.
  • FIG. 9 is a schematic flowchart of a method for predicting white balance gain according to an embodiment of the application.
  • FIG. 10 is a schematic flowchart of a method for shooting a video provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of an interface change for starting the near-to-far Hitchcock zoom video shooting mode according to an embodiment of the application;
  • FIG. 12 is a schematic diagram of a set of interfaces provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
  • FIG. 16 is a schematic diagram of another set of interfaces provided by an embodiment of this application.
  • FIG. 17 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
  • FIG. 18 is a schematic diagram of another set of interfaces provided by an embodiment of this application.
  • FIG. 19a is a schematic flowchart of a method for a terminal to collect images according to an embodiment of the application.
  • FIG. 19b is a schematic flowchart of a method for determining a camera for collecting images according to an embodiment of the application.
  • FIG. 20 is a schematic diagram of an instance segmentation provided by an embodiment of this application.
  • FIG. 21 is a schematic diagram of a process of collecting images by a terminal according to an embodiment of the application.
  • FIG. 22a is a schematic diagram of an image collected by a terminal according to an embodiment of the application.
  • FIG. 22b is a schematic diagram of an image collected by another terminal according to an embodiment of the application.
  • FIG. 23a is a schematic diagram of a current preview interface provided by an embodiment of the application.
  • FIG. 23b is a schematic diagram of another current preview interface provided by an embodiment of the application.
  • FIG. 23c is a schematic diagram of another current preview interface provided by an embodiment of the application.
  • FIG. 24 is a schematic diagram of zooming in and cropping a captured image provided by an embodiment of the application.
  • FIG. 25 is a schematic flowchart of another method for shooting a video provided by an embodiment of the application.
  • FIG. 26 is a schematic diagram of the process of processing the collected images in the Hitchcock zoom video in the traditional technology
  • FIG. 27 is a schematic flowchart of another method for shooting a video provided by an embodiment of the application.
  • FIG. 28 is a schematic diagram of a process of processing an image provided by an embodiment of the application.
  • FIG. 29 is a schematic structural diagram of a terminal provided by an embodiment of this application.
  • FIG. 30 is a schematic structural diagram of another terminal provided by an embodiment of this application.
  • Words such as “exemplary” or “for example” are used to present examples, illustrations, or explanations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, words such as “exemplary” or “for example” are used to present related concepts in a concrete manner.
  • The terms “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Therefore, a feature defined with “first” or “second” may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.
  • The term "consistent" means the same or similar (that is, with little difference).
  • A small difference may be reflected in the difference between the corresponding parameters being less than or equal to a threshold.
  • For example, "the sizes of the target subjects are consistent" means that the sizes of the target subjects are the same or differ by no more than a threshold, and so on.
  • The video shooting method provided in the embodiments of this application can be applied to a terminal with a camera, such as a smartphone, a tablet computer, a wearable device, an AR/VR device, a personal computer (PC), a personal digital assistant (PDA), or a netbook, or to any other terminal that can implement the embodiments of the present application.
  • This application does not limit the specific form of the terminal.
  • the structure of the terminal may be as shown in Figure 1.
  • The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the terminal 100.
  • the terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the processor 110 may control the camera 193 to collect N+1 images for the first scene in real time, and the N+1 images all include the target subject. Among them, in the process of collecting N+1 images, the camera 193 is getting farther and farther from the target subject; N is an integer greater than or equal to 1.
  • The processor 110 may perform white balance processing based on a preset neural network to obtain N optimized images; the preset neural network is used to ensure the white balance consistency of images adjacent in the time domain. Then, the processor 110 may enlarge and crop the N optimized images to obtain N target images, where the size of the target subject in each of the N target images is the same as the size of the target subject in the first of the N+1 collected images, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the N target images have the same size as the first image. Finally, the processor 110 may generate a Hitchcock zoom video based on the N target images and the first image. For a description of this technical solution, see below.
  • the controller may be the nerve center and command center of the terminal 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
  • The processor 110 may include one or more interfaces.
  • The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and so on.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the terminal 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the terminal 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect to other terminals, such as AR devices.
  • the interface connection relationship between the modules illustrated in this embodiment is merely a schematic description, and does not constitute a structural limitation of the terminal 100.
  • In other embodiments, the terminal 100 may also adopt an interface connection manner different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the terminal 100 implements a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the terminal 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • A series of graphical user interfaces (GUIs) can be displayed on the display screen 194 of the terminal 100. However, because the size of the display screen 194 is fixed, only limited controls can be displayed on it.
  • A control is a GUI element and a software component contained in an application; it manages all the data processed by the application and the interactive operations on that data, and the user can interact with a control through direct manipulation to read or edit information of the application.
  • Controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
  • the terminal 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened and light is transmitted through the lens to the photosensitive element of the camera, which converts the optical signal into an electrical signal and passes it to the ISP for processing, where it is converted into an image visible to the naked eye.
  • The ISP can also optimize the noise, brightness, and skin color of the image, as well as parameters of the shooting scene such as exposure and color temperature.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • An optical image of the object is generated through the lens and projected onto the photosensitive element.
  • The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the terminal 100 may include 1 or N cameras 193, and N is a positive integer greater than 1.
  • The aforementioned camera 193 may include one or at least two cameras such as a main camera, a telephoto camera, a wide-angle camera, an infrared camera, a depth camera, or a black-and-white camera.
  • The terminal may use one or at least two of these cameras to collect images and process the collected images (for example, by fusion) to obtain a preview image (such as the first preview image or the second preview image).
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
  • The NPU is a neural-network (NN) computing processor.
  • Through the NPU, applications such as intelligent cognition of the terminal 100 can be implemented, for example image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, to save files such as music and videos in the external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121.
  • the processor 110 may acquire the posture of the terminal 100 by executing instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • The storage program area can store an operating system and at least one application program required by at least one function (such as a sound playback function or an image playback function).
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the terminal 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the terminal 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B also called a "handset" is used to convert audio electrical signals into sound signals.
  • the terminal 100 answers a call or voice message, it can receive the voice by bringing the receiver 170B close to the human ear.
  • The microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals.
  • When making a sound, the user can put the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the terminal 100 determines the strength of the pressure according to the change in capacitance.
  • the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • Touch operations that act on the same touch position but with different intensities can correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the movement posture of the terminal 100.
  • In some embodiments, the angular velocity of the terminal 100 around three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal 100 through a reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the terminal 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the terminal 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • The terminal 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking based on the detected opening or closing state of the holster or the flip cover.
  • The acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally along three axes), and can detect the magnitude and direction of gravity when the terminal 100 is stationary. It can also be used to recognize the terminal's posture and is applied in scenarios such as switching between landscape and portrait modes and pedometers.
  • The terminal 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal 100 emits infrared light to the outside through the light emitting diode.
  • the terminal 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100.
  • the terminal 100 can use the proximity light sensor 180G to detect that the user holds the terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • The terminal 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • In other embodiments, when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid an abnormal shutdown of the terminal 100 caused by low temperature.
  • In still other embodiments, when the temperature is lower than yet another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
  • The touch sensor 180K is also called a "touch device".
  • The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch screen".
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the terminal 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 180M, and realize the voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the terminal 100 may receive key input, and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • Touch operations acting on different applications can correspond to different vibration feedback effects, and so can touch operations acting on different areas of the display screen 194.
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • The touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • An operating system runs on the above components, for example, the iOS operating system developed by Apple, the Android open-source operating system developed by Google, or the Windows operating system developed by Microsoft. Applications can be installed and run on the operating system.
  • the operating system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the terminal 100 by way of example.
  • FIG. 2 is a block diagram of the software structure of the terminal 100 according to an embodiment of the present application.
  • The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the camera application can access the camera interface management service provided by the application framework layer.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • the application framework layer may provide APIs related to the photographing function for the application layer, and provide camera interface management services for the application layer to realize the photographing function.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
• The notification manager can also present notifications in the status bar at the top of the system in the form of a chart or scroll-bar text (such as notifications of applications running in the background), or notifications that appear on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt sound is issued, the terminal vibrates, or the indicator light flashes.
• Android Runtime includes a core library and a virtual machine, and is responsible for the scheduling and management of the Android system.
• The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
• The touch sensor 180K receives the touch operation on the camera application icon and reports it to the processor 110, so that the processor 110 starts the camera application in response to the touch operation and displays the user interface of the camera application on the display screen 194, as shown in figure a in Figure 3.
  • the terminal 100 may also start the camera application in other ways, and display the user interface of the camera application on the display screen 194. For example, when the terminal 100 displays a black screen, displays a lock screen interface, or displays a certain user interface after unlocking, it may start the camera application in response to a user's voice instruction or shortcut operation, and display the user interface of the camera application on the display screen 194.
  • the camera's user interface includes controls such as "night scene", “portrait”, “photograph”, "video” and "more”.
• The touch sensor 180K receives the touch operation on the "recording" control and reports it to the processor 110, so that the processor 110 highlights the "recording" control in response to the touch operation, as shown in figure b in Figure 3, where the "recording" control is framed to highlight it; the recording function is then activated to display the user interface under the recording function, as shown in figure c in Figure 3.
  • the user interface under the video function includes controls such as "Hitchcock Zoom Video”, “Normal Video” and "More".
• The touch sensor 180K receives the touch operation on the "Hitchcock Zoom Video" control and reports it to the processor 110, so that the processor 110 highlights the "Hitchcock Zoom Video" control in response to the touch operation, as shown in figure d in Figure 3, where the control is framed to highlight it; the Hitchcock zoom video shooting mode is then activated to start recording, that is, to start shooting the Hitchcock zoom video.
  • the terminal 100 may also activate the Hitchcock zoom video shooting mode in other ways.
  • the terminal 100 may activate the Hitchcock zoom video shooting mode in response to a user's voice instruction or shortcut operation.
• FIG. 4 is a schematic diagram of the hardware structure of a computer device 30 provided by an embodiment of this application.
  • the computer device 30 includes a processor 301, a memory 302, a communication interface 303, and a bus 304.
  • the processor 301, the memory 302, and the communication interface 303 may be connected through a bus 304.
  • the processor 301 is the control center of the computer device 30, and may be a general-purpose CPU or other general-purpose processors. Among them, the general-purpose processor may be a microprocessor or any conventional processor.
  • the processor 301 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 4.
• The memory 302 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by the computer, but is not limited thereto.
• The memory 302 may exist independently of the processor 301.
  • the memory 302 may be connected to the processor 301 through the bus 304, and is used to store data, instructions or program codes.
• When the processor 301 calls and executes the instructions or program codes stored in the memory 302, it can implement the method corresponding to the training data preparation process before training the white balance network and the method of training the white balance network provided in the embodiments of the present application.
  • the memory 302 may also be integrated with the processor 301.
• The communication interface 303 may be any device capable of inputting parameter information, which is not limited in the embodiment of the present application.
  • the communication interface may include a receiving unit and a sending unit.
  • the communication interface 303 may be used to send related information (such as the values of related parameters) of the trained white balance network to the terminal 100.
  • the bus 304 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus.
• The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 4 to represent the bus, but this does not mean that there is only one bus or one type of bus.
• FIG. 4 does not constitute a limitation on the computer device 30. The computer device 30 may include more or fewer components than those shown in the figure, combine certain components, or have a different component arrangement.
  • the computer device 30 shown in FIG. 4 may specifically be any terminal 100 provided above, or may be any network device, such as an access network device (such as a base station).
• The video shooting method provided in the embodiments of the present application can be applied to shooting Hitchcock zoom video. In the process of shooting the Hitchcock zoom video, white balance processing is performed on the collected images based on a white balance network, so that the white balance of adjacent images in the time domain is consistent. Specifically:
  • White balance is an index that describes the accuracy of white after the three primary colors of red, green, and blue are mixed.
  • White balance is a very important concept in the field of television photography, through which a series of problems with tone processing can be solved.
  • White balance gain is a parameter for correcting the white balance of the image.
  • the white balance network is a network used to predict the white balance gain of an image.
  • the white balance network can be a deep learning network, such as a neural network.
  • White balance consistency refers to processing adjacent images in the time domain by using approximate white balance gains, so that the white balance effects between the processed images are the same or similar.
  • the white balance network provided by the embodiment of the present application is used to process adjacent images in the time domain, so that the white balance effects between the processed images are the same or similar.
  • the images continuously collected by the terminal or the images obtained after processing based on the images continuously collected by the terminal can be understood as images adjacent in the time domain.
• FIG. 5 is a schematic flow diagram of a training data preparation process before training a white balance network provided by an embodiment of this application.
  • the training data preparation process before training the white balance network can be performed by the computer device 30 described above.
  • the method shown in Figure 5 includes the following steps:
• S101: The computer device acquires original images in multiple environments (such as indoor and outdoor environments with different color temperatures, different brightness, and different viewing angles) collected through multiple cameras (such as a main camera, a wide-angle camera, etc.).
• S102: For each original image, the computer device performs parameter extraction on the gray or white part of the original image to obtain the white balance gain of the original image.
• The white balance gain here is used as the prediction target of the white balance network during the training phase. Therefore, in order to distinguish it from the white balance gain predicted value mentioned below, the white balance gain obtained here is hereinafter referred to as the white balance gain reference value.
  • the gray or white part of the original image can be obtained based on a standard color card comparison.
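• For illustration, a minimal sketch of this parameter extraction in Python/NumPy, assuming the image is a linear RGB array, the gray/white patch has already been located via the color card comparison, and the gains are normalized on the green channel (the function name and normalization convention are illustrative assumptions, not part of this application):

```python
import numpy as np

def white_balance_gain_reference(image, patch_mask):
    # Mean response of each color channel inside the gray/white patch
    # located via the standard color card comparison.
    patch = image[patch_mask]              # (K, 3) pixels of the patch
    channel_means = patch.mean(axis=0)     # mean R, G, B of the patch
    # A neutral patch should have equal channel responses; the gain is the
    # per-channel multiplier that restores that equality (anchored on green).
    gains = channel_means[1] / channel_means
    return gains                           # e.g. array([g_R, 1.0, g_B])
```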
• S103: For each original image, the computer device performs data enhancement on the original image to obtain a group of enhanced images. A set of enhanced images is used to determine an original sample.
  • each group of enhanced images is referred to as an image group.
• Implementation method 1: an image group is used as an original sample.
  • An image group includes P images, and P is an integer greater than or equal to 2.
  • the P images in an image group are used to simulate the continuous P images in the time domain collected by the camera.
  • the P images in an image group are used to simulate P consecutive images in the time domain collected by the same camera.
  • the P images in an image group are used to simulate P continuous images in the time domain collected before and after different cameras are switched.
  • the first image in an image group may be generated on the basis of a set of random numbers based on the original image corresponding to the image group.
  • the embodiment of the present application is not limited to this.
• Implementation method 2: an image group and the original image corresponding to the image group are used as an original sample.
  • An image group includes Q images, and Q is an integer greater than or equal to 1.
  • the images in an image group and the original image corresponding to the image group are used to simulate the time domain continuous Q+1 images collected by the camera.
  • the Q images in an image group and the original images corresponding to the image group are used to simulate Q+1 time-domain continuous images collected by the same camera.
  • the Q images in an image group and the original images corresponding to the image group are used to simulate Q+1 time-domain continuous images collected before and after different cameras are switched.
  • the original image may be used as the first image, and the images in the image group corresponding to the original image are used as the second image to the Q+1th image in the original sample.
  • a part of the original sample is used to simulate multiple time-domain continuous images collected by the same camera, and the other part of the original sample is used to simulate multiple time-domain continuous images collected before and after different cameras are switched.
• In this way, the white balance network trained based on all the original samples can be applied both to scenes where cameras are not switched and to scenes where cameras are switched.
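• The following is a hedged NumPy sketch of such data enhancement; the jitter strengths, crop factor, and group size are illustrative assumptions, and the sketch only shows the idea of generating an image group whose later images mimic frames collected after a camera switch:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(image, strength):
    # Random per-channel gain simulating a small illumination/sensor shift.
    gains = 1.0 + rng.uniform(-strength, strength, size=3)
    return np.clip(image * gains, 0.0, 1.0)

def make_image_group(original, p=4, simulate_switch=False):
    """Enhance one original image into an image group of P images."""
    h, w, _ = original.shape
    switch_at = rng.integers(1, p) if simulate_switch else p  # p = never switch
    group = []
    for k in range(p):
        # Small jitter mimics consecutive frames from one camera; after the
        # simulated switch, a larger statistics + field-of-view change mimics
        # the frames collected by a different camera.
        strength, crop = (0.02, 1.0) if k < switch_at else (0.10, 0.8)
        ch, cw = int(h * crop), int(w * crop)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        group.append(jitter(original[y0:y0 + ch, x0:x0 + cw], strength))
    return group
```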
  • the color space may be a red green blue (RGB) color space, etc., of course, the specific implementation is not limited to this.
  • all samples in the training data use the same color space.
  • the images in the sample belong to the same color space in order to avoid (or eliminate) the difference between different camera modules in the prediction stage.
  • the step of converting to the same color space may not be performed.
  • a group of enhanced images obtained based on the same original image is called an image group.
• FIG. 6 is a schematic diagram of a network architecture used in training a white balance network provided by an embodiment of this application.
  • in(n) represents the nth image in the sample.
  • n is the number of images in a sample, n is greater than or equal to 2, and n is an integer.
• in(n-a) represents the (n-a)-th image in the sample, where a < n and a is an integer.
  • out(n) represents the output of the first sub-network when the input of the first sub-network is in(n).
  • out(n-a) represents the output of the first sub-network when the input of the first sub-network is in(n-a).
• mem(n-1, 1) represents the feature maps from the one corresponding to the (n-1)-th image to the one corresponding to the first image in the sample.
• mem(n-a-1, 1) represents the feature maps from the one corresponding to the (n-a-1)-th image to the one corresponding to the first image in the sample.
  • the feature map corresponding to the n-1th image is the feature map of the network layer included in the first sub-network when the input of the first sub-network is in(n-1).
  • the embodiments of the present application do not limit which network layer or which of the first sub-network the network layer is specifically, and the specific implementation manner of each network layer.
  • feature maps corresponding to different images in a sample may be feature maps of the same network layer in the first sub-network or feature maps of different network layers.
• The loss function (loss) is used as a constraint during the training process, so as to achieve the training goal that out(n), out(n-1), ..., out(n-a) are consistent.
• During training, the computer device inputs the (n-a)-th image to the n-th image in the sample into the network architecture shown in FIG. 6, and the network architecture outputs a set of white balance gain prediction values out(n-a) to out(n).
• The computer device uses the white balance gain reference value of the original image corresponding to the sample as supervision, and adjusts the values of the parameters in the first sub-network with the goal that each value in out(n-a) to out(n) is as close as possible to the white balance gain reference value of the original image.
  • the white balance network is trained based on the constraint condition that the white balance gain prediction values of multiple consecutive images used to simulate the time domain are consistent.
  • the network architecture used when training the white balance network may be as shown in FIG. 7.
• During training, the computer device inputs the first image in the sample as in(1) and the second image as in(2) into the network architecture shown in FIG. 7, which outputs a pair of white balance gain prediction values out(1) and out(2).
• The computer device uses the white balance gain reference value of the original image corresponding to the sample as supervision, and combines the white balance gain prediction values out(1) and out(2) output by the network architecture to adjust the values of the parameters of the white balance network.
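• As an illustration only, one training step under these constraints could look like the following PyTorch sketch, where the model architecture, the loss weighting lam, and the tensor shapes are assumptions rather than details of this application:

```python
import torch

def training_step(model, sample, gain_ref, optimizer, lam=1.0):
    """One training step: supervision by the reference gain plus the
    consistency constraint between time-domain-adjacent predictions."""
    preds = model(sample)                        # (P, 3): out(n-a) ... out(n)
    # Supervision: each prediction should approach the white balance gain
    # reference value of the corresponding original image.
    loss_sup = ((preds - gain_ref) ** 2).mean()
    # Consistency: predictions for adjacent images in the group should agree.
    loss_con = ((preds[1:] - preds[:-1]) ** 2).mean()
    loss = loss_sup + lam * loss_con
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```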
• FIG. 8 is a schematic diagram of a network architecture used in the prediction phase provided by an embodiment of this application.
  • the first sub-network in the white balance network in FIG. 8 is the first sub-network at the end of the training phase.
  • in(t) represents the input of the white balance network and is used to input the image to be predicted.
  • out(t) is the output of the white balance network when the input of the white balance network is in(t).
• mem(t-1, t-T) represents the feature maps from the feature map output by the first target network layer in the process of predicting, based on the white balance network, the white balance gain of the (t-1)-th collected image (hereinafter referred to as the feature map corresponding to the (t-1)-th image), to the feature map output by the second target network layer in the process of predicting the white balance gain of the (t-T)-th collected image (hereinafter referred to as the feature map corresponding to the (t-T)-th image).
  • the value of T can be adjusted.
• The larger the value of T, the smaller the overall fluctuation range of the white balance gains that the white balance network predicts for multiple consecutive images in the time domain; that is, the larger the value of T, the better the white balance consistency between the images obtained after the white balance network performs white balance processing on multiple consecutive images in the time domain.
• The image is input as in(t) into the white balance network shown in FIG. 8, and the white balance network outputs out(t), the predicted value of the white balance gain, under the constraint of mem(t-1, t-T).
• That is, the white balance network combines the feature maps of the historical network layers (i.e., mem(t-1, t-T)) when predicting the white balance gain of the image to be processed (i.e., in(t)), so as to ensure white balance consistency in the time domain.
  • the historical network layer is a network layer used when predicting the white balance gain of an image that precedes the image to be processed and is continuous in the time domain with the image to be processed.
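• A minimal inference-loop sketch follows, assuming a hypothetical model interface model(frame, memory) -> (gain, feature_map) and using a deque to hold the feature maps of the T most recent previous frames:

```python
from collections import deque

import torch

def predict_gains(model, frames, T=8):
    """Predict gains for time-domain-consecutive frames, feeding the model
    the feature maps mem(t-1, t-T) of the T most recent previous frames."""
    memory = deque(maxlen=T)            # newest first: mem(t-1) ... mem(t-T)
    gains = []
    with torch.no_grad():
        for frame in frames:
            gain, feature_map = model(frame, list(memory))
            gains.append(gain)
            # The oldest feature map, mem(t-T-1), is discarded automatically.
            memory.appendleft(feature_map)
    return gains
```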
• FIG. 9 is a schematic flowchart of a white balance gain prediction method provided by an embodiment of this application.
  • the execution subject of this technical solution may be the computer device 30 provided above.
  • the method shown in FIG. 9 may include the following steps:
• S201: The computer device obtains the first image to be predicted and the original color space of the first image to be predicted.
• The original color space of the first image to be predicted is the color space of the camera used when collecting the image.
• The first image to be predicted can be any non-first image among consecutive images captured by the same camera, or any non-first image among consecutive images captured by different cameras.
• S202: The computer device converts the first image to be predicted from the original color space into a preset color space to obtain a second image to be predicted.
  • the preset color space is the color space used in the training phase.
• S203: The computer device inputs the second image to be predicted as in(t) to a white balance network (for example, the white balance network shown in FIG. 8), and the white balance network outputs out(t), the predicted value of the white balance gain, under the constraint of mem(t-1, t-T).
• S204: The computer device converts the predicted value into the original color space, and applies the converted predicted value to the first image to be predicted to obtain an optimized image corresponding to the first image to be predicted.
• Applying the converted predicted value to the first image to be predicted may include: multiplying the converted predicted value by the pixel value of each pixel in the first image to be predicted to obtain a new pixel value, and using the new pixel value as the pixel value of the corresponding pixel in the optimized image.
  • the optimized image is an image obtained after processing the first image to be predicted based on a white balance network.
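• For illustration, applying the converted gain can be sketched as follows (NumPy; the color space conversions of S202/S204 are omitted, and the [0, 1] value range is an assumption of this sketch):

```python
import numpy as np

def apply_white_balance(image, gain):
    # image: HxWx3 array in the camera's original color space; gain: length-3
    # predicted value already converted back to the original color space.
    # Each pixel's channel values are multiplied by the converted gain (S204).
    return np.clip(image * gain, 0.0, 1.0)
```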
• In this way, after the processing of S201-S204 is performed on each non-first image among images that are continuous in the time domain, white balance consistency exists between the obtained images.
• The traditional white balance network usually considers only single-frame information, which causes the white balance gain prediction value to jump between frames; that is to say, the overall fluctuation range of the white balance gain prediction values between frames is larger.
• In contrast, the white balance network provided by the embodiment of the present application integrates the network layer feature information (i.e., the feature maps, specifically mem(t-1, t-T) above) of the current frame and the historical frames.
  • the current frame and the historical frame are continuous in the time domain.
• The white balance network provided by the above technical solution makes the white balance gain predicted values of multiple consecutive images in the time domain closer; that is, the overall fluctuation range of the white balance gains of multiple consecutive images predicted by the white balance network is smaller. This makes the white balance network more stable, so the white balance consistency between the images obtained after white balance processing of consecutive images is better.
• It should be noted that, although training takes as a constraint that the white balance gain prediction values of images enhanced from the same original image are consistent, in practice there is no guarantee that the predicted values of the white balance gains of consecutive images are exactly the same; certain fluctuations remain.
• The white balance network provided by the embodiment of the present application helps reduce the overall fluctuation range of the white balance gains of multiple consecutive images in the time domain, thereby improving the stability of the white balance network.
• The traditional white balance network is trained based on multiple images collected by the same camera, which makes the traditional white balance network unusable in multi-shot switching scenes.
• In contrast, the embodiment of the present application implements data enhancement for the multi-shot switching scene: the training data used for training the white balance network contains samples that simulate multiple consecutive images before and after a camera switch. Since viewing angle, size, and image statistics generally change during multi-camera switching, using data enhancement to simulate multi-camera switching scenes during training helps constrain the network's predicted values to be consistent in such scenes.
• In the Hitchcock zoom video, the size of the target subject in different images is consistent (that is, the same or not much different), and the relative position of the target subject in different images is consistent (that is, the same or not much different).
  • the posture of the target subject in different images is the same (for example, the posture is the same or the posture is similar).
  • the size of the target subject in different images is the same, which may include: the contour (or the smallest bounding rectangle) of the target subject in the different images is relatively consistent.
  • the relative positions of the target subjects in different images are consistent, which may include: the target subjects in different images have the same relative positions relative to the same static object in the background.
• For example, the position of the center (or contour, or minimum circumscribed rectangle) of the target subject relative to that of the same static object in the background is consistent across the different images.
  • similar postures may include the same overall posture (such as standing, sitting, or lying posture), but differences in local postures (such as different gestures, etc.).
• That the size of the target subject is consistent across different images of the Hitchcock zoom video means that the size of the target subject does not jump between images, or changes to a degree small enough that the user does not perceive the jump or can accept it.
• That the relative positions of the target subject are consistent across different images of the Hitchcock zoom video means that the target subject appears static between images, or changes dynamically to a degree small enough that the user does not perceive the change or can accept it.
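• These consistency notions can be made concrete with a simple check on the target subject's minimum circumscribed rectangle; the tolerances below are illustrative assumptions, not values from this application:

```python
def subject_consistent(box_a, box_b, size_tol=0.05, pos_tol=0.02):
    """Check Hitchcock-zoom consistency of the target subject between two
    frames via its minimum circumscribed rectangle (normalized coords)."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    def center(b):
        return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

    size_change = abs(area(box_a) - area(box_b)) / max(area(box_a), 1e-9)
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    center_shift = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    # "Consistent" here means: no perceptible jump in size or position.
    return size_change <= size_tol and center_shift <= pos_tol
```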
• FIG. 10 is a schematic flowchart of a method for shooting a video provided by an embodiment of this application.
  • This method is applied to the terminal.
  • the terminal includes at least two cameras.
• The technical solution provided in this embodiment is applied to a scene where the Hitchcock zoom video is shot from near to far, that is, in the process of shooting the Hitchcock zoom video, the distance between the terminal and the target subject becomes farther and farther.
• In other words, the Hitchcock zoom video is shot under the condition that the terminal gets farther and farther from the target subject.
• In this case, the size of the target subject in an image captured later is smaller than the size of the target subject in an image captured earlier, while the size of the target subject in different images of the Hitchcock zoom video needs to be consistent. Therefore, in order to realize the Hitchcock zoom video, the captured images need to be enlarged; however, enlarging an image directly causes the enlarged image to be unclear.
• Based on this, in the embodiment of the present application, the terminal can collect a later image with a camera whose magnification is larger than that of the camera used to collect the earlier image, that is, realize the enlargement of the target subject by switching to a camera with a larger magnification. Compared with the technical solution of directly enlarging the later-collected image, this helps to improve the definition of the image.
  • the method shown in FIG. 10 may include the following steps:
• S300: The terminal determines to shoot a Hitchcock zoom video from near to far, and determines the initial camera.
• The terminal may determine to shoot the Hitchcock zoom video from near to far under the instruction of the user.
• For example, in response to a touch operation on the "Hitchcock zoom video" control, the terminal can display a "near-to-far mode" control 401 and a "far-to-near mode" control 402 on the user interface, as shown in figure a in Figure 11. Based on this, the user can touch the "near-to-far mode" control 401. In response to the touch operation, the terminal highlights the "near-to-far mode" control 401 (such as by bolding the border of the control), as shown in figure b in Figure 11, and at the same time starts shooting the Hitchcock zoom video in the near-to-far mode.
• Of course, the terminal may also initiate shooting of the Hitchcock zoom video in the near-to-far mode in other ways (such as a voice command or a shortcut key), which is not specifically limited in the embodiment of the present application.
• The initial camera is usually not the camera with the largest magnification in the terminal; it can usually be predefined that the initial camera is a camera with a smaller (e.g., the smallest) magnification in the terminal.
• S301: The terminal collects N+1 images of the first scene in real time, where N is an integer greater than or equal to 1.
  • the N+1 images may be N+1 images continuously collected by the terminal.
  • the first image in the N+1 images is the first image saved by the terminal when the terminal starts shooting in the "Hitchcock zoom video” mode.
• The first scene can be understood as the shooting scene within the field of view of the terminal's camera (or the surrounding scene) when the terminal performs S301; it is related to the environment of the user, the posture of the terminal, and the parameters of the camera, which is not limited in this application.
  • the target subject may be an object, and the position of the target subject may not move during the shooting process, or may move laterally at the same depth.
• The target subject may also include multiple objects with the same depth, and the whole of the multiple objects may be used as the target subject.
  • the images of the multiple objects are connected or partially overlapped.
• This is because the imaging sizes of objects at different depths change by different magnitudes, so when the distance between the user and the target subject changes, it is difficult for objects at different depths to keep the same image size at the same time. Therefore, in order to keep the image size of the target subject basically unchanged, the multiple objects in the target subject should have the same depth.
  • the target subject may be automatically determined by the terminal or specified by the user, and the following two cases will be described separately.
• Case 1: The terminal automatically determines the target subject, and the target subject may include one or more objects.
  • the target subject is a preset type of object.
  • the preset type of object is a person, an animal, a famous building or a landmark, etc.
  • the terminal determines the object of the preset type as the target subject based on the preview image.
  • the target subject is an object whose image on the preview image is located in the center area.
  • the target subject that the user is interested in is usually directly facing the zoom camera, so the image of the target subject on the preview image is usually located in the center area.
  • the target subject is an object whose image on the preview image is close to the central area and whose area is greater than the preset threshold 1.
  • the target subject that the user is interested in usually faces the zoom camera and is closer to the zoom camera, so that the image of the target subject on the preview image is close to the central area and the area is greater than the preset threshold 1.
  • the target subject is an object of a preset type whose image on the preview image is close to the central area.
  • the target subject is an object of a preset type whose area on the preview image is close to the central area and whose area is greater than a preset threshold.
  • the target subject is a preset type of object with the smallest depth and the image on the preview image is close to the central area.
• If the preset type of object whose image on the preview image is close to the central area includes multiple objects with different depths, the target subject is the object with the smallest depth.
  • the terminal defaults that the target subject includes only one object.
  • the terminal may prompt the target subject to the user by means of displaying prompt information or voice broadcast.
• For example, if the preset type is a person, the terminal determines that the target subject is Person 1, an object of the preset type whose image on the preview image is close to the central area.
• The terminal may frame Person 1 with the box 501 to prompt the user that Person 1 is the target subject.
• For another example, if the preset type is a person, the terminal determines that the target subjects are Person 2 and Person 3, objects of the preset type with the same depth whose images on the preview image are close to the central area.
• The terminal may frame Person 2 and Person 3 with the circle 502 to prompt the user that Person 2 and Person 3 are the target subjects.
• For another example, if the preset types include a person and an animal, the terminal determines that the target subjects are Person 4 and Animal 1, objects of the preset types with the same depth whose images on the preview image are close to the central area.
• The terminal may prompt the user that Person 4 and Animal 1 are the target subjects by displaying prompt information.
  • the terminal after the terminal automatically determines the target subject, it can also modify the target subject in response to the user's operation, such as switching, adding, or deleting the target subject.
• For example, the target subject automatically determined by the terminal is Person 1. After the terminal detects that the user has clicked Person 5 on the preview image, the target subject is changed from Person 1 to Person 5, as shown in Figure 13(b).
• For another example, the target subject automatically determined by the terminal is Person 1. After the terminal detects that the user has dragged a box to select Person 1 and Person 5 simultaneously, as shown in Figure 14, the target subject is changed from Person 1 to Person 1 and Person 5.
• For another example, the target subjects automatically determined by the terminal are Person 1 and Person 5. After the terminal detects that the user has clicked Person 5, the target subject is modified from Person 1 and Person 5 to Person 1, as shown in Figure 15(b).
  • the terminal first enters the target subject modification mode according to the user's instruction, and then modifies the target subject in response to the user's operation.
• Case 2: The user specifies the target subject, and the target subject includes one or more objects.
• After the terminal enters the Hitchcock mode, it can determine the target subject in response to the user's preset operation on the preview interface.
• The preset operation is used to designate a certain object or certain objects as the target subject.
  • the preset operation may be a touch operation, a voice command operation or a gesture operation, etc., which is not limited in the embodiment of the present application.
  • the touch operation may be a single click, a double tap, a long press, a pressure press, or an operation to circle an object, etc.
• For example, after the terminal detects the user's double-click operation on Person 1 on the preview image, it determines Person 1 as the target subject, as shown in (b) in FIG. 16.
• Optionally, the terminal may prompt the user to specify the target subject.
  • the terminal may display a prompt message: Please specify the target subject, so that the image size of the target subject during shooting is basically unchanged.
• Then, the terminal determines the target subject in response to the user's preset operation on the preview interface. For example, after the terminal detects the user's operation of circling Person 1 as shown in (a) in FIG. 17, it determines Person 1 as the target subject, as shown in (b) in FIG. 17. For another example, after the terminal detects the user's voice operation indicating that the person is the target subject, it determines Person 1 as the target subject.
• For example, the terminal may display a prompt message: "A person is detected. Designate this person as the target subject, so that the image size of the target subject remains basically unchanged during shooting?" After the terminal responds to the user's operation of clicking the "Yes" control, it determines the person as the target subject, as shown in (b) of FIG. 18.
  • the terminal may prompt the user: Please select only one object as the target subject.
  • the terminal can also prompt the user of the target subject by displaying prompt information or voice broadcast.
• Optionally, the terminal can also modify the target subject in response to the user's operation, such as switching, adding, or deleting the target subject. Details are not repeated here.
• The terminal collecting N+1 images of the first scene in real time means that the terminal collects the N+1 images of the first scene during the shooting process, rather than using N+1 images of the first scene acquired before the shooting.
  • S301 may include the following steps S301a-S301d:
• S301a: The terminal uses the initial camera to collect the first image of the first scene, and the first image includes the target subject.
• S301b: The terminal uses the initial camera to collect the second image of the first scene, and the second image includes the target subject.
• S301c: The terminal determines the camera that collects the (i+1)-th image based on the shooting magnification of the i-th image.
  • i is an integer.
  • the i-th image includes the target subject.
• S301d: The terminal uses the determined camera to collect the (i+1)-th image.
  • the terminal can collect N+1 images.
• The N+1 images include N1+1 images collected earlier and N2 images collected later, where the N1+1 images are collected by the first camera of the terminal and the N2 images are collected by the second camera of the terminal; N1 and N2 are both integers greater than or equal to 1.
  • the technical solutions provided by the embodiments of the present application can be applied to shooting Hitchcock zoom videos in a scene where cameras are switched.
  • the camera can be switched multiple times during the process of shooting a Hitchcock zoom video.
  • the zoom magnifications of the second image to the N1th image in the N1+1 images relative to the first image all belong to the first shooting magnification range.
  • the first shooting magnification range corresponds to the first camera.
  • the zoom magnifications of the N1+1th image in the N1+1 images and the first N2-1 images in the N2 images relative to the first image all belong to the second shooting magnification range.
  • the second shooting magnification range corresponds to the second camera.
• Alternatively, the N+1 images may all be captured by the first camera. That is to say, the technical solutions provided by the embodiments of the present application can also be applied to shooting Hitchcock zoom videos in a scene where cameras are not switched.
  • S301c may include the following steps: S301c-1 to S301c-3:
• S301c-1: The terminal performs anti-shake processing on the i-th image based on the first image. Specifically, the terminal determines the locations of feature points in the first image and, based on these locations, performs motion compensation on the locations of the matching feature points in the i-th image, so as to realize the anti-shake processing of the i-th image.
• The anti-shake processing technology may be optical anti-shake processing technology, artificial intelligence (AI) anti-shake processing technology, electronic anti-shake processing technology, or the like.
• S301c-1 is an optional step. After S301c-1 is executed for each of the above N images, the video entering the zoom ratio calculation module (that is, the module in the terminal used to calculate the zoom ratio), namely the N later-captured images, has weaker jitter as a whole (that is, the overall video is more stable/smooth), so the obtained zoom ratio is more accurate.
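• One possible realization of this feature-point motion compensation is sketched below with OpenCV; ORB features, brute-force matching, and a RANSAC similarity transform are implementation choices of this sketch, not mandated by this application:

```python
import cv2
import numpy as np

def stabilize_to_first(first_gray, frame_gray, frame):
    """Align the i-th image to the first image by matching feature points
    and compensating their motion (one possible S301c-1 implementation)."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(first_gray, None)
    kp2, des2 = orb.detectAndCompute(frame_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:100]
    src = np.float32([kp2[m.queryIdx].pt for m in matches])  # i-th image points
    dst = np.float32([kp1[m.trainIdx].pt for m in matches])  # first image points
    # Similarity transform that moves the i-th image's feature points onto the
    # matching feature points of the first image (RANSAC rejects outliers).
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    h, w = first_gray.shape
    return cv2.warpAffine(frame, M, (w, h))
```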
• S301c-2: The terminal obtains the shooting magnification of the i-th image. If the terminal executes S301c-1, the i-th image here is specifically the i-th image after anti-shake processing.
  • the shooting magnification of the i-th image is determined based on the magnification of the i-th image relative to the first image and the magnification of the camera that collects the first image.
  • the zoom ratio of the i-th image relative to the first image is determined based on the size of the target subject in the i-th image and the size of the target subject in the first image.
• For example, ci = c1/(di/d1), where:
  • di is the size of the target subject in the i-th image
  • d1 is the size of the target subject in the first image
  • di/d1 is the zoom ratio of the i-th image relative to the first image
  • c1 is the magnification of the first camera
  • ci is the shooting magnification of the i-th image.
  • the size of the target subject in the image is characterized by at least one of the following features 1-4:
• Feature 1: the width of the target subject in the image.
• Feature 2: the height of the target subject in the image.
• Feature 3: the area of the target subject in the image.
• Feature 4: the number of pixels occupied by the target subject in the image.
• For example, taking Feature 2 as an example, ci = c1/(hi/h1), where hi is the height of the target subject in the i-th image, h1 is the height of the target subject in the first image, hi/h1 is the zoom ratio of the i-th image relative to the first image, c1 is the magnification of the first camera, and ci is the shooting magnification of the i-th image.
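• A direct transcription of this formula follows; the function name and the example numbers are illustrative:

```python
def shooting_magnification(h1, hi, c1):
    """ci = c1 / (hi / h1): shooting magnification of the i-th image from
    the target subject's height in the first and i-th images."""
    zoom_ratio = hi / h1  # zoom ratio of the i-th image relative to the first
    return c1 / zoom_ratio

# As the terminal moves away, the subject shrinks, so the magnification grows:
# shooting_magnification(h1=400, hi=200, c1=1.0) == 2.0 -> a 2X camera is needed
```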
• S301c-2 may include: the terminal extracts the target subject from the i-th image, and determines the zoom ratio of the i-th image relative to the first image based on the size of the target subject in the i-th image and the size of the target subject in the first image.
  • the embodiment of the present application does not limit the specific implementation manner of the terminal extracting the target subject from the image.
  • the terminal extracts the target subject from the image through one or more of the subject segmentation algorithm, the subject skeleton point detection algorithm, and the subject contour detection algorithm.
  • the subject segmentation algorithm includes an instance segmentation algorithm.
• The terminal uses the instance segmentation algorithm to extract the instance segmentation mask of the target subject from the i-th image, and then divides the size of the instance segmentation mask of the target subject extracted from the i-th image by the size of the instance segmentation mask of the target subject extracted from the first image, to obtain the zoom ratio of the i-th image relative to the first image.
• FIG. 20 is a schematic diagram of an instance segmentation provided by an embodiment of this application.
• Picture a in FIG. 20 represents the i-th image, and picture b in FIG. 20 represents the instance segmentation mask of the target subject in the i-th image.
• In the mask, pixels with a pixel value greater than 0 represent the target subject, and the pixels in the other areas represent the background.
• The instance segmentation algorithm is a pixel-level segmentation method, so the target subject extracted based on it is more accurate, which helps make the zoom ratio calculated by the terminal more accurate. For example, when the target subject includes multiple people, the subjects' portraits and the background can still be effectively distinguished.
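• For example, using the mask pixel count (Feature 4) as the size measure, the zoom ratio computation reduces to the following NumPy sketch (width, height, or area could be used instead):

```python
import numpy as np

def zoom_ratio_from_masks(mask_i, mask_1):
    # Size of the instance segmentation mask of the target subject in the
    # i-th image divided by its size in the first image, using the pixel
    # count as the size measure.
    size_i = np.count_nonzero(mask_i)  # pixels with value > 0 = subject
    size_1 = np.count_nonzero(mask_1)
    return size_i / size_1
```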
• S301c-3: If the shooting magnification of the i-th image is in the first shooting magnification range, the terminal determines that the camera that collects the (i+1)-th image is the first camera; if the shooting magnification of the i-th image is in the second shooting magnification range, it determines that the camera that collects the (i+1)-th image is the second camera.
  • the first shooting magnification range corresponds to the first camera, and the second shooting magnification range corresponds to the second camera.
• In other words, the terminal determines the camera that collects the (i+1)-th image based on the shooting magnification of the i-th image. Since the size and position of the target subject do not differ greatly between two adjacent images, the terminal can collect the (i+1)-th image with the camera determined by the shooting magnification of the i-th image.
• For example, if the magnification of the first camera is a and the magnification of the second camera is b, where a < b, then the first shooting magnification range is [a, b) and the second shooting magnification range is the range greater than or equal to b.
• For example, the shooting magnification range corresponding to the wide-angle camera is [0.6, 1), and the shooting magnification range corresponding to the main camera is the range greater than or equal to 1.
• If the terminal also includes a third camera whose magnification is c, where a < b < c, then the first shooting magnification range is [a, b), the second shooting magnification range is [b, c), and the shooting magnification range corresponding to the third camera is the range greater than or equal to c.
• For example, the shooting magnification range of the wide-angle camera is [0.6, 1), the shooting magnification range of the main camera is [1, w), and the shooting magnification range of the telephoto camera is the range greater than or equal to w.
• For another example, if the magnification of the wide-angle camera is 0.6, the magnification of the main camera is 1, the magnification of the first telephoto camera is w1, and the magnification of the second telephoto camera is w2, where 1 < w1 < w2, then the shooting magnification range of the wide-angle camera is [0.6, 1), the shooting magnification range of the main camera is [1, w1), the shooting magnification range corresponding to the first telephoto camera is [w1, w2), and the shooting magnification range corresponding to the second telephoto camera is the range greater than or equal to w2.
• Of course, the terminal may also switch cameras when the shooting magnification of the i-th image falls within a small range just below the minimum critical value of the shooting magnification range corresponding to a certain camera, so as to reduce the shooting delay caused by switching cameras.
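• A sketch of this camera selection, including the early-switch margin described above (the camera names, magnifications, and margin value are illustrative assumptions):

```python
def select_camera(magnification_i, cameras, early_switch=0.05):
    """Pick the camera for the (i+1)-th image from the shooting magnification
    of the i-th image. cameras: (name, magnification) pairs in ascending
    order; camera k covers [mag_k, mag_{k+1}), the last range is open-ended.
    early_switch shifts each range's minimum critical value down slightly to
    reduce the delay caused by switching cameras."""
    chosen = cameras[0][0]
    for name, mag in cameras:
        if magnification_i >= mag - early_switch:
            chosen = name
    return chosen

# With the example ranges above:
# select_camera(0.8, [("wide", 0.6), ("main", 1.0), ("tele", 3.0)])  -> "wide"
# select_camera(1.2, [("wide", 0.6), ("main", 1.0), ("tele", 3.0)])  -> "main"
```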
• FIG. 21 is a schematic diagram of a process in which a terminal collects N+1 images according to an embodiment of this application.
• The terminal includes camera 1 to camera x, where x is an integer greater than or equal to 2; the larger a camera's number, the greater its magnification.
  • the process of collecting N+1 images by the terminal may include the following steps:
  • the terminal uses camera 1 (that is, the initial camera) to collect the first image.
• The terminal uses camera 1 to capture the second image, performs anti-shake processing on the second image based on the first image, and then obtains the zoom ratio of the anti-shake-processed second image relative to the first image. If it is determined based on the zoom ratio that the shooting magnification of the second image is within the shooting magnification range corresponding to camera a, camera a is used to collect the third image, where 1 ≤ a ≤ x.
• The terminal uses camera a to capture the third image, performs anti-shake processing on the third image, and then obtains the zoom ratio of the anti-shake-processed third image relative to the first image. If it is determined based on the zoom ratio that the shooting magnification of the third image is within the shooting magnification range corresponding to camera b, camera b is used to collect the fourth image, where a ≤ b ≤ x.
• By analogy, the terminal collects the 4th to (N+1)-th images.
  • the size of the target subject in the images acquired later in the N+1 images is smaller than the size of the target subject in the first image.
• FIG. 22a is a schematic diagram of images collected by a terminal in S301 according to an embodiment of this application.
• Figure a represents the first image collected by the terminal, figure b represents the second image collected by the terminal, and figure c represents the third image collected by the terminal.
• Of course, the size of the target subject in a later-captured image may be larger than the size of the target subject in a previously captured image, but smaller than the size of the target subject in the first image.
• FIG. 22b is a schematic diagram of images collected by a terminal in S301 according to an embodiment of this application.
• Figure a represents the first image collected by the terminal, figure b represents the second image collected by the terminal, and figure c represents the third image collected by the terminal.
• Of course, the size of the target subject in a later-captured image may be larger than the size of the target subject in a previously captured image, but smaller than the size of the target subject in the first image.
• In this case, the terminal can subsequently use a 2X camera to capture the third image.
• Since the terminal uses a 2X camera to capture the third image and a 1X camera to capture the second image, it is possible that the size of the target subject in the third image is larger than the size of the target subject in the second image, while the size of the target subject in the third image is still smaller than the size of the target subject in the first image.
• Optionally, the method may further include: the terminal displays first information in the current preview interface, where the first information is used to indicate that shooting of the Hitchcock zoom video should be stopped.
  • the terminal may display the first information in the current preview interface when the currently used camera is the camera with the largest magnification in the terminal. For the user, it is possible to stop shooting the Hitchcock zoom video within a period of time after acquiring the first information.
• The Hitchcock zoom video is shot while the distance between the terminal and the target subject becomes farther and farther. If the currently used camera is the camera with the largest magnification in the terminal, the magnification cannot increase further, so the terminal can no longer switch cameras. At this time, displaying the first information on the current preview interface helps prompt the user to stop shooting the video in time. Otherwise, subsequently captured images could only be enlarged to make the size of the target subject in the enlarged image consistent with the size of the target subject in the first image, which would result in low definition of the target images generated based on the subsequently captured images. That is, the embodiment of the present application provides a method for instructing the user to stop shooting the Hitchcock zoom video, which helps to improve the user experience.
• The embodiment of the present application does not limit the specific information contained in the first information to indicate that shooting should be stopped. For example, the first information may directly indicate "The camera currently in use is the camera with the largest magnification in the terminal", or indirectly indicate this by indicating "Please stop recording video".
• FIG. 23a is a schematic diagram of a current preview interface provided by an embodiment of this application.
• The current preview interface contains an image 501 of the currently presented Hitchcock zoom video (that is, the current preview image) and the first information "Please stop recording video" 502.
  • the method may further include: the terminal displays second information in the current preview interface, and the second information is used to indicate that the target subject is stationary.
• For example, the terminal may display the second information in the current preview interface when it determines that the position of the target subject in the current preview image is consistent with the position of the target subject in the previous preview image. Since one of the requirements of the Hitchcock zoom video is that the position of the target subject in each image is consistent, in this way the user can know, during shooting, whether the current requirements of the Hitchcock zoom video are satisfied, which improves the user experience.
• The embodiment of the present application does not limit the specific information contained in the second information to indicate that the target subject is stationary.
• FIG. 23b is a schematic diagram of a current preview interface provided by an embodiment of this application.
  • the current preview interface contains the image 501 of the currently presented Hitchcock zoom video (that is, the current preview image), and the second information "the target subject is still" 503.
  • the method may further include: the terminal displays third information in the current preview interface, and the third information is used to indicate that the target subject is in the center of the current preview image.
• In this way, when the third information is not displayed in the terminal, the user can move the terminal so that the target subject is in the center of the current preview image, which helps to improve the quality of the Hitchcock zoom video.
• When the terminal detects that the position of the target subject in the current preview image (such as the center of the target subject, or the outline or the smallest circumscribed rectangle of the target subject) is within the preset central area of the current preview image (that is, a preset area centered on the center of the current preview image), the third information is displayed in the current preview interface.
  • the embodiment of the present application does not limit the specific information contained in the third information to indicate that the target subject is in the center of the current preview image.
  • FIG. 23c is a schematic diagram of another current preview interface provided by an embodiment of this application.
  • the current preview interface contains the image 501 of the currently presented Hitchcock zoom video (that is, the current preview image), and the third information "the target subject is in the center of the current preview image" 504.
  • the terminal may display fourth information in the current preview interface, and the fourth information is used to indicate that the target subject is not in the center of the current preview image.
  • When the fourth information is displayed, the user can move the terminal so that the target subject returns to the center of the current preview image, which helps improve the quality of the Hitchcock zoom video.
  • Collecting N+1 images in real time for the first scene includes: collecting the first image when the target subject is in the center of the current preview image. This helps improve the quality of the Hitchcock zoom video. A sketch of such a centering check is given below.
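  • As an illustrative aside, the centering check can be sketched as follows. This is a minimal sketch in Python, assuming the subject position is available as a bounding box and that the "preset central area" is a fixed fraction of the frame; the function name and the fraction are hypothetical, not details fixed by this application.

```python
# Minimal sketch: decide whether the target subject is centered, assuming the
# subject position is given as a bounding box (x, y, w, h) in pixel coordinates.
# The fraction 0.2 defining the "preset central area" is an illustrative value.
def is_subject_centered(box, image_w, image_h, frac=0.2):
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # center of the target subject
    half_rw = image_w * frac / 2           # half-width of the central area
    half_rh = image_h * frac / 2
    return (abs(cx - image_w / 2) <= half_rw and
            abs(cy - image_h / 2) <= half_rh)
```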
  • the moving speed of the terminal is less than or equal to a preset speed.
  • the embodiment of the present application does not limit the specific value of the preset speed, for example, it may be an empirical value.
  • S302: For the N later-collected images among the N+1 images, the terminal performs white balance processing based on a preset neural network to obtain N optimized images.
  • the preset neural network is used to ensure the white balance consistency of adjacent images in the time domain.
  • the preset neural network here may be the white balance network provided by the embodiment of the present application above, such as the white balance network shown in FIG. 8.
  • The terminal can download the preset neural network from a network device, or can obtain the preset neural network through local training.
  • the embodiment of the application does not limit this.
  • In specific implementation, the terminal can perform white balance processing on each of the N later-collected images among the N+1 images. In addition, parameters such as brightness and chroma can be corrected to avoid (or try to avoid) inconsistency between images caused by switching cameras. For both brightness and chroma, the terminal obtains the brightness/chroma values of temporally adjacent images, derives a multiplicative or additive factor between them, and converts the later image so that, after conversion, its brightness/chroma values are close to those of the earlier image, so that the brightness/chroma values of temporally adjacent images remain consistent.
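  • A minimal sketch of the multiplicative-factor variant of this brightness/chroma matching, assuming frames are float arrays in [0, 1] and using the mean luma as the matched statistic (both assumptions, since this application does not fix the statistic):

```python
import numpy as np

# Match the mean luma of the current frame to that of the previous frame.
# The clipping and the per-frame mean are illustrative choices.
def match_luma(prev_frame, cur_frame, eps=1e-6):
    gain = prev_frame.mean() / (cur_frame.mean() + eps)   # multiplicative factor
    return np.clip(cur_frame * gain, 0.0, 1.0)            # converted frame
```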
  • S303: The terminal enlarges and crops the N optimized images to obtain N target images, where the size of the target subject in each of the N target images is consistent with the size of the target subject in the first image collected among the N+1 images, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the size of the N target images is the same as the size of the first image.
  • The size of the target subject in a later-captured image is smaller than the size of the target subject in the first image. Therefore, the terminal needs to enlarge and crop the N optimized images corresponding to the N later-captured images.
  • The terminal enlarges the N optimized images so that the size of the target subject in each enlarged optimized image is consistent with the size of the target subject in the first image, and the relative position of the target subject in each enlarged optimized image is consistent with the relative position of the target subject in the first image.
  • the terminal crops the enlarged N images to obtain N target images, so that each of the N target images has the same size as the first image.
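  • The enlarge-and-crop step of S303 can be sketched as follows. This is a minimal sketch assuming OpenCV for resizing, subject sizes measured as widths (or heights), and a subject that stays near the image center so that a center crop preserves its relative position; all of these are illustrative assumptions rather than details fixed by this application.

```python
import cv2  # OpenCV is used here for illustration only

# Enlarge an optimized image so the target subject reaches the size it has in
# the first image, then center-crop back to the original frame size.
# Assumes subj_size_first >= subj_size_cur (the near-to-far case), so scale >= 1.
def enlarge_and_crop(img, subj_size_first, subj_size_cur):
    h, w = img.shape[:2]
    scale = subj_size_first / subj_size_cur
    big = cv2.resize(img, None, fx=scale, fy=scale)
    bh, bw = big.shape[:2]
    top, left = (bh - h) // 2, (bw - w) // 2   # crop about the image center
    return big[top:top + h, left:left + w]
```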
  • Picture a in FIG. 24 represents the first image among the N+1 images, and picture b represents one of the N optimized images. Picture c represents the image obtained by enlarging the optimized image shown in picture b, where the size of the target subject in the enlarged image is the same as the size of the target subject in picture a. Picture d represents the target image obtained by cropping the image shown in picture c; the size of the target image is the same as the size of the first image shown in picture a, and during cropping the terminal tries to ensure that the position of the target subject in the target image is consistent with the position of the target subject in the first image.
  • Because the scaling of the target subject between the two images differs from the scaling of the same objects in the background, the background in the target image obtained after enlarging and cropping an optimized image differs from the background in the first image. For example, the backgrounds in pictures a and d in FIG. 24 are different.
  • In this way, the terminal can generate, based on the N target images and the first image, a Hitchcock zoom video in which "the target subject has the same size and the same relative position in different images, while the background changes".
  • S304 The terminal generates a Hitchcock zoom video based on the N target images and the first image.
  • the playback time interval between adjacent images in the Hitchcock zoom video can be predefined.
  • The terminal displays the Hitchcock zoom video in real time during the shooting process; that is, the Hitchcock zoom video is presented while it is being generated.
  • In an implementation, the terminal executes the above S302 and S303 as follows: for the i-th image, after S301d is executed, the i-th image is input into the preset neural network to obtain the optimized image corresponding to the i-th image; then the optimized image corresponding to the i-th image is enlarged and cropped to obtain the corresponding target image. In other words, each collected image can be white-balanced, enlarged, cropped, and presented (that is, the corresponding target image is displayed), and while one image is being processed, the next image can be collected and processed.
  • Alternatively, the terminal may perform S302 after obtaining all N+1 images, and perform S303 after S302 has been performed for all images. In other words, the terminal performs post-processing on the collected N+1 images to obtain the Hitchcock zoom video.
  • It should be noted that, due to camera switching, the position of the target subject may differ between images captured before and after the switch; that is to say, the target subject may appear unstable.
  • Based on this, the method may further include: the terminal obtains the position information of the target subject in the first image and the position information of the target subject in each of the N target images; then, for each of the N target images, the terminal performs image stabilization on the target image based on a subject image stabilization algorithm and the position information of the target subject in the first image, to obtain a new target image.
  • the position of the target subject in the new target image is consistent with the position of the target subject in the first image.
  • the position information of the target subject in the corresponding image can be obtained based on the target subject mask.
  • By combining the position information of the target subject and performing feature point detection within the area of the target subject, the influence of feature points outside the target subject area can be effectively eliminated. A new target image can be obtained by stabilizing the feature points in the target subject area.
  • the embodiment of the application does not limit the subject image stabilization algorithm.
  • it can be an AI image stabilization algorithm.
  • the terminal generates the Hitchcock zoom video based on the N target images and the first image, which specifically includes: the terminal generates the Hitchcock zoom video based on the N new target images and the first image. In this way, the subject of the obtained Hitchcock zoom video can be stabilized.
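  • A minimal sketch of this subject stabilization step, under the simplifying assumption that aligning the subject centers by a pure translation is sufficient (a real implementation would stabilize tracked feature points within the subject area, as described above):

```python
import numpy as np
import cv2

# Translate a target image so the detected subject center coincides with the
# subject center in the first image. Centers are (x, y) pixel coordinates.
def stabilize(target_img, subj_center_first, subj_center_cur):
    dx = subj_center_first[0] - subj_center_cur[0]
    dy = subj_center_first[1] - subj_center_cur[1]
    m = np.float32([[1, 0, dx], [0, 1, dy]])   # 2x3 translation matrix
    h, w = target_img.shape[:2]
    return cv2.warpAffine(target_img, m, (w, h))
```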
  • In the technical solution provided in this embodiment, in the process of acquiring the Hitchcock zoom video, white balance processing is performed on the last N of the N+1 images collected in real time, so that the processed images are consistent with the white balance of the first image among the collected N+1 images. In this way, the white balance effect of the obtained Hitchcock zoom video is better, thereby improving the quality of the Hitchcock zoom video and the user experience.
  • Moreover, the terminal can enlarge the size of the target subject in subsequently captured images by switching cameras, which, compared with the traditional technology, helps make the obtained Hitchcock zoom video clearer, thereby improving the user experience.
  • FIG. 25 is a schematic flowchart of another video shooting method provided by an embodiment of this application.
  • This method is applied to the terminal.
  • the terminal includes at least two cameras.
  • The technical solution provided in this embodiment is applied to a scene where the Hitchcock zoom video is shot from far to near; that is, in the process of shooting the Hitchcock zoom video, the distance between the terminal and the target subject keeps decreasing.
  • When the Hitchcock zoom video is shot while the terminal gets closer and closer to the target subject, the size of the target subject in a later-captured image is larger than the size of the target subject in an earlier-captured image. However, the size of the target subject must be the same across the images of a Hitchcock zoom video, so the later-collected images need to be reduced. After a later-collected image is reduced, "edge-filling" processing is needed so that the size of the "edge-filled" image is consistent with the size of the first image collected by the terminal, which causes the resulting image to have black edges and leads to a poor user experience when the image is presented.
  • For example, the first image collected by the terminal is shown in picture a in FIG. 26, the second image is shown in picture b, the image obtained by reducing the second image is shown in picture c, and the image obtained by "edge-filling" picture c is shown in picture d in FIG. 26.
  • Based on this, in the technical solution provided in this embodiment, the terminal can collect a later image with a camera whose magnification is smaller than that of the camera used to collect the earlier image; that is, the reduction of the target subject is realized by switching to a camera with a smaller magnification, so the collected images do not need to be "edge-filled", thereby improving the user experience.
  • the method shown in FIG. 25 may include the following steps:
  • S400: The terminal determines to shoot a Hitchcock zoom video from far to near, and determines the initial camera.
  • The terminal may determine, under the instruction of the user, to shoot the Hitchcock zoom video from far to near.
  • For example, the user can tap the "Far-to-near mode" 402 control through a touch operation; in response, the terminal highlights the "Far-to-near mode" 402 control and starts shooting a Hitchcock zoom video in the far-to-near mode.
  • The terminal may also be activated in other ways (such as a voice command) to shoot a Hitchcock zoom video in the far-to-near mode, which is not specifically limited in the embodiment of this application.
  • In this scenario, the initial camera is usually not the camera with the smallest magnification in the terminal; it may be predefined as a camera with a larger (for example, the largest) magnification in the terminal.
  • S401: The terminal collects N+1 images in real time for the first scene, and all the N+1 images include the target subject, where, in the process of collecting the N+1 images, the terminal gets closer and closer to the target subject.
  • N is an integer greater than or equal to 1.
  • The first image of the N+1 images is collected by the first camera of the terminal, and part or all of the last N images among the N+1 images are collected by the second camera of the terminal, where the magnification of the second camera is smaller than the magnification of the first camera.
  • The size of the target subject in the N later-collected images among the N+1 images is smaller than or equal to the size of the target subject in the first image collected among the N+1 images.
  • That is, while collecting images, the terminal switches from a large-magnification camera to a small-magnification camera, which helps ensure that, in the scene where the terminal gets closer and closer to the target subject, the size of the target subject in a later-captured image is smaller than or equal to the size of the target subject in an earlier-captured image.
  • the N+1 images are N+1 images that are continuously collected, that is, N+1 images that are collected in real time.
  • In a possible design, the last N images among the N+1 images include N1 earlier-collected images and N2 later-collected images, where the N1 images are collected by the second camera and the N2 images are collected by the third camera of the terminal; N1 and N2 are both integers greater than or equal to 1.
  • Collecting N+1 images for the first scene includes: acquiring the shooting magnification of the i-th image among the N+1 images, where 2≤i≤N and i is an integer; if the shooting magnification of the i-th image is within the first shooting magnification range, collecting the (i+1)-th image among the N+1 images for the first scene based on the second camera; if the shooting magnification of the i-th image is within the second shooting magnification range, collecting the (i+1)-th image among the N+1 images for the first scene based on the third camera of the terminal.
  • Here, the magnification of the second camera is b and the magnification of the third camera is c; the first shooting magnification range is the range greater than or equal to b, and the second shooting magnification range is [c, b).
  • In a possible design, the shooting magnification of the first image is greater than the magnification of the second camera; that is, the initial shooting magnification of the terminal is greater than the magnification of the camera used when capturing the second image.
  • For example, if the terminal includes a 5X camera and a 1X camera, the camera used to collect the first image can be the 5X camera. The first shooting magnification range can be the range greater than or equal to 5, and the second shooting magnification range can be [1, 5); the camera used to capture the second image is then the 1X camera.
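  • Following the 5X/1X example above, the far-to-near switching rule can be sketched as follows; the magnification values and the function name are illustrative, not details fixed by this application.

```python
# Given the shooting magnification of the i-th image, pick the camera for the
# (i+1)-th image. b = 5 (second camera) and c = 1 (third camera) follow the
# 5X/1X example; they are assumed values.
def pick_camera(shooting_mag, b=5.0, c=1.0):
    if shooting_mag >= b:        # first shooting magnification range: [b, +inf)
        return "second_camera"   # e.g. the 5X camera
    elif c <= shooting_mag < b:  # second shooting magnification range: [c, b)
        return "third_camera"    # e.g. the 1X camera
    else:
        return None              # below the smallest magnification: no switch possible
```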
  • the method may further include: in the current preview interface, displaying first information, where the first information is used to instruct to stop shooting the Hitchcock zoom video.
  • The terminal may display the first information in the current preview interface when the currently used camera is the camera with the smallest magnification in the terminal. The user can then stop shooting the Hitchcock zoom video within a period of time after obtaining the first information.
  • When the Hitchcock zoom video is shot in this scenario, if the currently used camera is the camera with the smallest magnification in the terminal, the magnification cannot be reduced further, so the terminal can no longer switch cameras. At this time, displaying the first information on the current preview interface helps prompt the user to stop shooting the video in time. Otherwise, subsequently collected images would need to be reduced and edge-filled, thereby degrading the user experience when the Hitchcock zoom video is played. That is, the embodiment of this application provides a method for instructing the user to stop shooting the Hitchcock zoom video, which helps improve the user experience.
  • the method may further include: in the current preview interface, displaying second information, where the second information is used to indicate that the target subject is stationary.
  • the method may further include: displaying third information in the current preview interface, and the third information is used to indicate that the target subject is in the center of the current preview image.
  • collecting N+1 images for the first scene includes: collecting the first image when the target subject is in the center of the current preview image.
  • In a possible design, the moving speed of the terminal is less than or equal to a preset speed. Because the number of cameras in the terminal is limited, moving the terminal too fast may cause the cameras to be switched too quickly, and once the camera with the smallest magnification is in use, no further switch is possible. When the smallest-magnification camera keeps capturing images as the terminal gets closer and closer to the target subject, the target subject in later-captured images becomes larger and larger, which may make its size exceed the size of the target subject in the first image of the N+1 images. When generating the Hitchcock zoom video, such images would need to be reduced and edge-filled, thereby degrading the user experience. This possible design is proposed on this basis; in this way, it helps improve the quality of the Hitchcock zoom video.
  • S402: For the N later-collected images among the N+1 images, the terminal performs white balance processing based on a preset neural network to obtain N optimized images, where the preset neural network is used to ensure the white balance consistency of temporally adjacent images.
  • S403: The terminal enlarges and crops the N optimized images to obtain N target images, where the size of the target subject in the N target images is consistent with the size of the target subject in the first image collected among the N+1 images, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the N target images have the same size as the first image.
  • S404 The terminal generates a Hitchcock zoom video based on the N target images and the first image.
  • In the technical solution provided in this embodiment, in the scene where the terminal gets closer and closer to the target subject, the camera is switched so that the size of the target subject in a later-captured image is less than or equal to the size of the target subject in an earlier-captured image, and the Hitchcock zoom video is then obtained based on the captured images.
  • white balance processing is performed on the last N images of the N+1 captured images, so that the processed image is consistent with the white balance of the first image among the N+1 captured images. In this way, the white balance effect of the obtained Hitchcock zoom video can be better, thereby improving the quality of the Hitchcock zoom video and improving the user experience.
  • In addition, the terminal can reduce the size of the target subject in subsequently collected images by switching cameras; compared with the traditional technology, the collected images do not need to be "edge-filled", so the user experience can be improved.
  • this application also provides a method for shooting a Hitchcock zoom video, which can be applied to a scene where the terminal is getting closer and closer to the target subject.
  • the method can include the following steps:
  • Step 1 Refer to S400 above.
  • Step 2 The terminal collects N+1 images in real time for the first scene, and the N+1 images all include the target subject. Among them, in the process of collecting N+1 images, the terminal is getting closer and closer to the target subject.
  • N is an integer greater than or equal to 1.
  • The first image of the N+1 images is collected by the first camera of the terminal, and part or all of the last N images among the N+1 images are collected by the second camera of the terminal, where the magnification of the second camera is smaller than the magnification of the first camera.
  • the camera used by the terminal to collect the second image is different from the camera used to collect the first image.
  • The magnification of the camera used to collect the second image is smaller than the magnification of the camera used to collect the first image, which helps ensure that the size of the target subject in the second image is smaller than the size of the target subject in the first image.
  • Step 3 Refer to S402 above.
  • Step 4 The terminal enlarges and crops the optimized images that meet the first condition among the N optimized images to obtain at least one target image.
  • An optimized image that satisfies the first condition is an optimized image in which the size of the included target subject is smaller than the size of the target subject in the first image.
  • Among them, the size of the target subject in the at least one target image is consistent with the size of the target subject in the first image collected among the N+1 images; the relative position of the target subject in the at least one target image is consistent with the relative position of the target subject in the first image; and the at least one target image has the same size as the first image.
  • Step 5 The terminal generates a Hitchcock zoom video based on the at least one target image and the first image.
  • Alternatively, the terminal generates the Hitchcock zoom video based on the at least one target image, the first image, and the optimized images among the N optimized images that do not meet the first condition.
  • In the scene where the terminal gets closer and closer to the target subject, the size of the target subject in a later-captured image may be greater than, equal to, or smaller than the size of the target subject in the first image. Therefore, this embodiment distinguishes between optimized images that meet the first condition and optimized images that do not. An optimized image that does not meet the first condition can be used directly as an image in the Hitchcock zoom video without being enlarged and cropped.
  • For the specific implementation of step 5, reference may also be made to the related description in S304.
  • In a possible design, the size of the target subject in the (N+1)-th image collected among the N+1 images is less than or equal to the size of the target subject in the first image. Alternatively, the size of the target subject in the (N+1)-th image is greater than the size of the target subject in the first image, but the difference between the two is less than or equal to a preset threshold. That is to say, when the size of the target subject in the (N+1)-th image is greater than the size of the target subject in the first image, the difference between the two cannot be too large, so that the Hitchcock zoom video generated based on the (N+1)-th image can meet the requirement that "the size of the target subject is the same in the different images of the Hitchcock zoom video".
  • In the process of acquiring the Hitchcock zoom video, even if the terminal switches the current camera to a camera with a smaller magnification (or does not switch the camera), the target subject in a later-captured image may still become larger than in an earlier-captured image as the terminal approaches, so the size of the target subject in a captured image may be larger than the size of the target subject in the first image.
  • Based on this, in a possible design, if the difference between the size of the target subject in the image currently captured by the terminal and the size of the target subject in the first image is greater than the preset threshold, and the size of the target subject in the currently captured image is greater than the size of the target subject in the first image, the terminal stops image acquisition.
  • Subsequently, the terminal generates the Hitchcock zoom video using the previously collected images.
  • Specifically, the method may further include: collecting an (N+2)-th image for the first scene, where the (N+2)-th image includes the target subject, and the distance between the terminal and the target subject when the (N+2)-th image is collected is less than the distance between the terminal and the target subject when the (N+1)-th image is collected.
  • In this case, generating the Hitchcock zoom video includes: when the difference between the size of the target subject in the (N+2)-th image and the size of the target subject in the first image is greater than the preset threshold, and the size of the target subject in the (N+2)-th image is larger than the size of the target subject in the first image, generating the Hitchcock zoom video based on the N target images and the first image.
  • Further, the first information may be output, where the first information is used to instruct stopping the shooting of the Hitchcock zoom video. That is to say, the embodiment of this application provides a method for instructing the user to stop shooting the Hitchcock zoom video, so that the user can stop moving the terminal in time, thereby improving the user experience.
  • the embodiment of the present application does not limit the specific implementation of the first information.
  • the first information may be output in the form of images, text, voice, and the like.
  • For example, the terminal may display the current preview interface shown in FIG. 23a when it determines, based on the above solution, to stop collecting images, thereby prompting the user to stop shooting the Hitchcock zoom video.
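  • The stop condition described above can be sketched as follows; subject sizes and the threshold are in whatever units the implementation uses to measure the subject (width, height, area, or pixel count), and the function name is hypothetical.

```python
# Stop collecting once the subject in the current image exceeds the subject in
# the first image by more than the preset threshold, as described above.
def should_stop(subj_size_cur, subj_size_first, threshold):
    return (subj_size_cur > subj_size_first and
            subj_size_cur - subj_size_first > threshold)
```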
  • N+1 images are acquired by the camera with the smallest magnification in the terminal.
  • If the size of the target subject in the (N+1)-th image is greater than the size of the target subject in the first image, and the difference between the two is equal to the preset threshold, the first information is output, where the first information is used to instruct stopping the shooting of the Hitchcock zoom video. A Hitchcock zoom video requires the size of the target subject to be the same in different images. When the size of the target subject in the (N+1)-th image is greater than the size of the target subject in the first image and the difference between the two is equal to the preset threshold, it means that the size of the target subject in the (N+1)-th image has jumped relative to the size of the target subject in the first image and has reached the critical value for obtaining the Hitchcock zoom video. Based on this, the embodiment of this application provides the above-mentioned method for stopping the shooting of the Hitchcock zoom video.
  • In a possible design, the optimized image that satisfies the first condition in S403 may be replaced with: an optimized image in which the size of the included target subject is smaller than the size of the target subject in a reference image. The reference image is, among the N+1 images, the image that precedes the image corresponding to the optimized image, is closest to it, and includes a target subject whose size is greater than or equal to the size of the target subject included in the first image.
  • For example, assume the shooting magnification ranges of the cameras in the terminal are: [0.6, 1), [1, 2), [2, 5), [5, 10), and the range greater than or equal to 10; a 10X camera is used to collect the first image, and the size of the target subject in this image is d. If the optimized images that meet the first condition are the optimized image corresponding to the 2nd image and the optimized image corresponding to the 4th image, then when the optimized image corresponding to the 2nd image is enlarged and cropped, the reference image is the first image, and when the optimized image corresponding to the 4th image is enlarged and cropped, the reference image is the 3rd image.
  • FIG. 27 is a schematic flowchart of another method for shooting a video provided by an embodiment of this application.
  • the method shown in FIG. 27 is applied to a terminal, and the terminal includes at least two cameras, and the magnifications of the at least two cameras are different.
  • the method shown in FIG. 27 may include the following steps:
  • S500: The terminal separately collects at least two images for the first scene at a first moment through at least two cameras, where one camera corresponds to one image, and each of the at least two images includes the target subject.
  • the images collected for the video to be shot are images collected by multiple cameras for the same scene at the same time.
  • S501: The terminal determines, based on the preset playing duration and the preset playing frame rate of the video, the number of frames N of images to be inserted between the first image and the second image among the at least two images, where the first image is the image collected by the first camera among the at least two images and the first camera is the camera with the largest magnification among the at least two cameras; the second image is the image collected by the second camera among the at least two images and the second camera is the camera with the smallest magnification among the at least two cameras; N is an integer greater than or equal to 1.
  • This is a technical solution proposed in consideration of the fact that the size of the target subject is largest in the image captured by the camera with the largest magnification and smallest in the image captured by the camera with the smallest magnification.
  • S502: The terminal determines N images to be inserted based on the number of frames N and part or all of the at least two images, where the part or all of the images include at least the first image and the second image.
  • the terminal first extracts the size of the target subject in each of the part or all of the collected images, and then determines the pixel value of the corresponding image to be inserted based on the size of the target subject in the corresponding image. For specific examples, refer to the example shown in FIG. 28.
  • The more images among the at least two images the terminal uses to determine the images to be inserted, the more accurate the frame interpolation, so that the images in the finally generated video better reflect the real scene, thereby improving the user experience.
  • S503 The terminal generates a video based on the at least two images and the N images to be inserted. Wherein, the size of the target subject in each image of the video gradually becomes larger or smaller.
  • a 10X camera, a 3X camera, a 1X camera, and a 0.6X camera are provided in the terminal.
  • the terminal collects images based on these 4 cameras at the same time to obtain images 1-4, as shown in Figure 28.
  • Figures a-d in Figure 28 represent images 1-4, respectively.
  • Assume the preset playback duration of the video to be shot is n seconds, where n is an integer greater than or equal to 1, and the preset playback frame rate is 24 frames per second, that is, a total of 24 frames of images are played per second.
  • the terminal can perform the following steps:
  • Step 1: The terminal determines a reference image, and determines the shooting magnification of each image to be inserted based on the zoom magnification between the reference image and two adjacent frames in the video to be shot; the number of images to be inserted is N. The reference image may be the image collected by the camera with the largest magnification (that is, image 1) or the image collected by the camera with the smallest magnification (that is, image 4).
  • Step 2: For any frame to be inserted, the terminal performs frame interpolation based on the shooting magnification of that frame and the values of pixels in two captured images whose shooting magnifications are respectively greater than and less than the shooting magnification of the frame to be inserted, to obtain the values of the pixels in the image to be inserted. In this way, the terminal can obtain the N images to be inserted.
  • FIG. 28 also shows a schematic diagram of target subject detection; the parts inside the rectangular boxes in these figures represent the target subject. FIG. 28 further illustrates the frame interpolation steps.
  • Specifically, the two images are: the image whose shooting magnification is greater than the shooting magnification of the frame to be inserted and differs from it the least, and the image whose shooting magnification is less than the shooting magnification of the frame to be inserted and differs from it the least.
  • image 4 and image 3 are used for frame interpolation.
  • image 3 and image 2 are used for frame interpolation.
  • image 2 and image 1 are used for frame interpolation.
  • Step 3: The terminal generates the video (or dynamic image) from the N images to be inserted and images 1-4, arranged in ascending or descending order of the size of the target subject.
  • Figure 28 illustrates the steps of generating a video.
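  • The interpolation of a frame between two captured images, keyed by shooting magnification, can be sketched as follows. Linear blending by magnification and pre-aligned, same-shape images are illustrative assumptions; the closing comment works through the frame count under the 24 fps example above, assuming the four captured images count toward the played frames.

```python
import numpy as np

# For a desired shooting magnification m, blend the two captured images whose
# magnifications bracket m (mag_lo < m < mag_hi).
def interpolate_frame(img_lo, mag_lo, img_hi, mag_hi, m):
    t = (m - mag_lo) / (mag_hi - mag_lo)          # 0 at mag_lo, 1 at mag_hi
    return ((1.0 - t) * img_lo + t * img_hi).astype(img_lo.dtype)

# Worked example: at 24 fps over n seconds, 24 * n frames are played in total,
# so with 4 captured images, N = 24 * n - 4 frames would be interpolated.
```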
  • In this technical solution, the terminal collects multiple frames of images for the same scene at the same moment through multiple cameras and performs frame interpolation based on these frames, thereby generating the video. Compared with the traditional technology, this helps improve the quality of the generated video. In addition, it helps enhance the fun of the animation effect and strengthen the user's stickiness to the terminal.
  • the embodiments of the present application may divide the terminal into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 29 is a schematic structural diagram of a terminal provided by an embodiment of this application.
  • the terminal 220 shown in FIG. 29 can be used to implement the functions of the terminal in the foregoing method embodiment, and therefore, can also achieve the beneficial effects of the foregoing method embodiment.
  • the terminal may be the terminal 100 shown in FIG. 1.
  • the terminal 220 includes a collection unit 221 and a processing unit 222.
  • the terminal 220 further includes a display unit 223.
  • the acquisition unit 221 is configured to acquire N+1 images in real time for the first scene, and the N+1 images include the target subject; wherein, in the process of acquiring the N+1 images, the terminal is getting farther and farther away from the target subject .
  • N is an integer greater than or equal to 1.
  • The processing unit 222 is configured to perform the following steps: for the N later-collected images among the N+1 images, performing white balance processing based on a preset neural network to obtain N optimized images, where the preset neural network is used to ensure the white balance consistency of temporally adjacent images.
  • The N optimized images are enlarged and cropped to obtain N target images, where the size of the target subject in the N target images is the same as the size of the target subject in the first image collected among the N+1 images, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the N target images have the same size as the first image; a Hitchcock zoom video is generated based on the N target images and the first image.
  • The collection unit 221 is used to perform S301, and the processing unit 222 is used to perform S302-S304.
  • In a possible design, the N+1 images include N1+1 earlier-collected images and N2 later-collected images, where the N1+1 images are collected by the first camera of the terminal and the N2 images are collected by the second camera of the terminal; N1 and N2 are integers greater than or equal to 1.
  • In a possible design, the acquisition unit 221 is specifically configured to: acquire the shooting magnification of the i-th image among the N+1 images, where 2≤i≤N and i is an integer; if the shooting magnification of the i-th image is within the first shooting magnification range, collect the (i+1)-th image among the N+1 images for the first scene based on the first camera; if the shooting magnification of the i-th image is within the second shooting magnification range, collect the (i+1)-th image among the N+1 images for the first scene based on the second camera. Here, the magnification of the first camera is a, the magnification of the second camera is b, and a<b; the first shooting magnification range is [a, b), and the second shooting magnification range is the range greater than or equal to b.
  • the collection unit 221 may be used to perform S301c-3.
  • the shooting magnification of the i-th image is determined based on the magnification of the size of the target subject in the i-th image relative to the size of the target subject in the first image, and the magnification of the camera that collects the first image.
  • The size of the target subject in the i-th image is characterized by at least one of the following features: the width of the target subject in the i-th image, the height of the target subject in the i-th image, the area of the target subject in the i-th image, or the number of pixels occupied by the target subject in the i-th image.
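  • One plausible reading of this rule can be sketched as follows; treating the required shooting magnification as the first camera's magnification scaled by the subject-size ratio is an assumption, and area-like size measures would need a square root before being used as linear sizes.

```python
# Shooting magnification of the i-th image, under the assumed reading that it
# is the first camera's magnification scaled by how much the subject would have
# to be enlarged to regain its size in the first image. Sizes are linear
# measures (width or height), per the features listed above.
def shooting_magnification(subj_size_first, subj_size_i, first_camera_mag):
    return first_camera_mag * (subj_size_first / subj_size_i)
```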
  • the processing unit 222 is further configured to extract the target subject from the i-th image by using an instance segmentation algorithm to determine the size of the target subject in the i-th image.
  • the display unit 223 is configured to display first information in the current preview interface, and the first information is used to instruct to stop shooting the Hitchcock zoom video.
  • the display unit 223 is configured to display second information in the current preview interface, and the second information is used to indicate that the target subject is stationary.
  • the display unit 223 is configured to display third information in the current preview interface, and the third information is used to indicate that the target subject is in the center of the current preview image.
  • the collecting unit 221 is specifically configured to collect the first image when the target subject is in the center of the current preview image.
  • the display unit 223 is configured to display a user interface, the user interface includes a first control, the first control is used to instruct to shoot a Hitchcock zoom video from near and far; and to receive an operation on the first control.
  • the collection unit 221 is specifically configured to collect N+1 images in real time for the first scene in response to the operation.
  • the moving speed of the terminal is less than or equal to a preset speed.
  • In a possible design, the preset neural network is used to predict the white balance gain of the image to be processed in combination with the feature maps of a historical network layer, to ensure the white balance consistency of temporally adjacent images; the historical network layer is the network layer used when predicting the white balance gain of an image that precedes, and is temporally continuous with, the image to be processed.
  • the preset neural network is obtained by training based on preset constraint conditions; wherein, the preset constraint conditions include: the white balance gain prediction value of a plurality of consecutive images used for simulating the time domain is consistent.
  • In a possible design, when performing white balance processing based on the preset neural network on the N later-collected images among the N+1 images to obtain the N optimized images, the processing unit 222 is specifically configured to: input the j-th image among the N+1 images into the preset neural network to obtain the white balance gain prediction value of the j-th image, where 2≤j≤N-1 and j is an integer; and apply the white balance gain prediction value of the j-th image to the j-th image to obtain the optimized image corresponding to the j-th image, where the N optimized images include the optimized image corresponding to the j-th image.
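  • Applying a predicted white balance gain can be sketched as follows, assuming the network outputs one gain per RGB channel and that gains are applied in linear space with green-channel normalization (a common convention, assumed here rather than specified by this application).

```python
import numpy as np

# img_rgb: HxWx3 float array in [0, 1]; gains: per-channel (r, g, b) gains.
def apply_wb_gain(img_rgb, gains):
    g = np.asarray(gains, dtype=np.float32)
    g = g / g[1]                  # normalize so the green channel is unchanged
    return np.clip(img_rgb * g, 0.0, 1.0)
```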
  • the acquisition unit 221 is configured to acquire N+1 images in real time for the first scene, and the N+1 images include the target subject; wherein, in the process of acquiring the N+1 images, the terminal is getting closer and closer to the target subject ; N is an integer greater than or equal to 1.
  • The first image of the N+1 images is collected by the first camera of the terminal, and part or all of the last N images among the N+1 images are collected by the second camera of the terminal, where the magnification of the second camera is smaller than the magnification of the first camera.
  • The size of the target subject in the N later-collected images among the N+1 images is smaller than or equal to the size of the target subject in the first image collected among the N+1 images.
  • The processing unit 222 is configured to perform the following steps: for the N later-collected images, performing white balance processing based on a preset neural network to obtain N optimized images, where the preset neural network is used to ensure the white balance consistency of temporally adjacent images.
  • The N optimized images are enlarged and cropped to obtain N target images, where the size of the target subject in the N target images is the same as the size of the target subject in the first image collected among the N+1 images, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the N target images have the same size as the first image; a Hitchcock zoom video is generated based on the N target images and the first image.
  • The collection unit 221 may be used to perform S401, and the processing unit 222 may be used to perform S402-S404.
  • In a possible design, the N later-collected images include N1 earlier-collected images and N2 later-collected images, where the N1 images are collected by the second camera and the N2 images are collected by the third camera of the terminal; N1 and N2 are both integers greater than or equal to 1.
  • In a possible design, the collecting unit 221 is specifically configured to: acquire the shooting magnification of the i-th image among the N+1 images, where 2≤i≤N and i is an integer; if the shooting magnification of the i-th image is within the first shooting magnification range, collect the (i+1)-th image among the N+1 images for the first scene based on the second camera; if the shooting magnification of the i-th image is within the second shooting magnification range, collect the (i+1)-th image among the N+1 images for the first scene based on the third camera. Here, the magnification of the second camera is b, the magnification of the third camera is c, and b>c; the first shooting magnification range is the range greater than or equal to b, and the second shooting magnification range is [c, b).
  • the shooting magnification of the i-th image is determined based on the magnification of the size of the target subject in the i-th image relative to the size of the target subject in the first image, and the magnification of the camera that collects the first image.
  • The size of the target subject in the i-th image is characterized by at least one of the following features: the width of the target subject in the i-th image, the height of the target subject in the i-th image, the area of the target subject in the i-th image, or the number of pixels occupied by the target subject in the i-th image.
  • the processing unit 222 is further configured to extract the target subject from the i-th image by using an instance segmentation algorithm to determine the size of the target subject in the i-th image.
  • the display unit 223 is configured to display first information in the current preview interface, and the first information is used to instruct to stop shooting the Hitchcock zoom video.
  • the display unit 223 is configured to display second information in the current preview interface, and the second information is used to indicate that the target subject is stationary.
  • the display unit 223 is configured to display third information in the current preview interface, and the third information is used to indicate that the target subject is in the center of the current preview image.
  • the collecting unit 221 is specifically configured to collect the first image when the target subject is in the center of the current preview image.
  • the display unit 223 is configured to display a user interface, and the user interface includes a second control, the second control is used to instruct to shoot a Hitchcock zoom video from far and near; and to receive an operation on the second control.
  • the collection unit 221 is specifically configured to collect N+1 images for the first scene in response to the operation.
  • the moving speed of the terminal is less than or equal to a preset speed.
  • In a possible design, the preset neural network is used to predict the white balance gain of the image to be processed in combination with the feature maps of a historical network layer, to ensure the white balance consistency of temporally adjacent images; the historical network layer is the network layer used when predicting the white balance gain of an image that precedes, and is temporally continuous with, the image to be processed.
  • the preset neural network is obtained by training based on preset constraint conditions; wherein, the preset constraint conditions include: the white balance gain prediction value of a plurality of consecutive images used for simulating the time domain is consistent.
  • In a possible design, performing white balance processing based on the preset neural network on the N later-collected images among the N+1 images to obtain the N optimized images includes: inputting the j-th image among the N+1 images into the preset neural network to obtain the white balance gain prediction value of the j-th image, where 2≤j≤N-1 and j is an integer; and applying the white balance gain prediction value of the j-th image to the j-th image to obtain the optimized image corresponding to the j-th image, where the N optimized images include the optimized image corresponding to the j-th image.
  • the acquisition unit 221 includes a first camera and a second camera, and the magnification of the first camera is different from the magnification of the second camera.
  • the collecting unit 221 is configured to collect the first image and the second image for the first scene at the first moment through the first camera and the second camera; wherein, the first image and the second image both contain the target subject.
  • The processing unit 222 is configured to perform the following steps: determining the number of frames N of images to be inserted between the first image and the second image based on the preset playback duration and the preset playback frame rate of the video, where N is an integer greater than or equal to 1; and determining N images to be inserted based on the number of frames N, the first image, and the second image.
  • A video is generated based on the first image, the second image, and the N images to be inserted, where the size of the target subject in each image of the video gradually becomes larger or smaller.
  • The collection unit 221 may be used to perform S500, and the processing unit 222 may be used to perform S501-S503.
  • the collection unit 221 further includes a third camera, and the magnification of the third camera is between the magnifications of the first camera and the second camera.
  • the acquisition unit 221 is further configured to acquire a third image for the first scene at the first moment through the third camera; wherein, the third image includes the target subject.
  • In a possible design, in the aspect of determining the N images to be inserted based on the number of frames N, the first image, and the second image, the processing unit 222 is specifically configured to determine the N images to be inserted based on the number of frames N, the first image, the second image, and the third image.
  • the aforementioned acquisition unit may be implemented by a camera 193.
  • The functions of the aforementioned processing unit 222 can all be implemented by the processor 110 invoking the program code stored in the internal memory 121.
  • Another embodiment of this application further provides a terminal, including a processor, a memory, and a camera, where the camera is used to collect images, the memory is used to store computer programs and instructions, and the processor is used to invoke the computer programs and instructions to execute, in cooperation with the camera, the method flows described in the foregoing method embodiments.
  • Another embodiment of this application further provides a computer-readable storage medium that stores instructions; when the instructions are run on a terminal, each step performed by the terminal in the method flows shown in the foregoing method embodiments is executed.
  • the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by a software program, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • When the computer-executable instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • Computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

This application discloses a method and apparatus for shooting video, relating to the fields of photographing technology and image processing technology, which can make the white balance effect of a Hitchcock zoom video better, so as to improve the quality of the Hitchcock zoom video. The method includes: when the terminal gets farther and farther away from a target subject, collecting in real time, for a first scene, N+1 images that include the target subject; performing white balance processing on the N later-collected images based on a neural network "used to ensure the white balance consistency of temporally adjacent images" to obtain N optimized images; enlarging and cropping the N optimized images to obtain N target images, where the size of the target subject in the N target images is consistent with the size of the target subject in the first collected image, the relative position of the target subject in the N target images is consistent with the relative position of the target subject in the first image, and the N target images have the same size as the first image; and generating a Hitchcock zoom video based on the N target images and the first image.

Description

This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on May 30, 2020, with application number 202010480536.3 and the application title "Zoom method and apparatus for distinguishing a main character from the background", and to the Chinese patent application filed with the State Intellectual Property Office on September 28, 2020, with application number 202011043999.X and the application title "Method and apparatus for shooting video", the entire contents of which are incorporated herein by reference.

Background

How to implement the Hitchcock zoom has become an urgent technical problem to be solved.
发明内容
本申请实施例提供了拍摄视频的方法和装置,可以使得获得的希区柯克变焦视频的白平衡效果更好,从而提高希区柯克变焦视频的质量,进而提高用户体验。
为达到上述目的,本申请采用如下技术方案:
第一方面,提供一种拍摄视频的方法,该方法应用于终端。该方法包括:针对第一场景实时采集N+1个图像,该N+1个图像中均包括目标主体;其中,在采该集N+1个图像的过程中,终端距离目标主体越来越远。N是大于等于1的整数。对该于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;预设神经网络用于保证时域相邻图像的白平衡一致性。对该N个优化图像进行放大并裁剪,得到N个目标图像;其中,N个目标图像中的每个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致,N个目标图像中的每个目标图像中的目标主体的相对位置,与第一个图像中目标主体的相对位置一致。N个目标图像与第一个图像的大小一致。基于N个目标图像和第一个图像生成希区柯克变焦视频。
本技术方案,在获取希区柯克变焦视频的过程中,对实时采集到的N+1个图像中的后N个图像进行了白平衡处理,以使得处理后的图像与所采集的N+1个图像中的第一个图像的白平衡一致。这样,可以使得获得的希区柯克变焦视频的白平衡效果更好,从而提高希区柯克变焦视频的质量,进而提高用户体验。
在一种可能的设计中,N+1个图像包括在前采集的N1+1个图像和在后采集的N2个图像,其中,N1+1个图像由终端的第一摄像头采集得到,N2个图像由终端的第二摄像头采集得到;N1和N2均是大于等于1的整数。也就是说,本申请实施例提供的技术方案可以应用于在切换摄像头的场景中拍摄希区柯克变焦视频。
在一种可能的设计中,针对第一场景实时采集N+1个图像,包括:获取N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;如果第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于终端的第一摄像头针对第一场景采集N+1个图像中的第i+1个图像;如果第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于终端的第二摄像头针对第一场景采集N+1个图像中的第i+1个图像。其中,第一摄像头的倍率是a,第二摄像头的倍率是b;a<b;第一拍摄倍率范围是[a,b);第二拍摄倍率范围是大于等于b的范围。
也就是说,终端基于第i个图像的拍摄倍率,确定采集第i+1个图像的摄像头。这样,终端可以通过切换摄像头的方式,放大后续采集的图像中的目标主体的大小,相比传统技术,有助于使所获得的希区柯克变焦效果视频的清晰度更高,从而提高用户体验。
在一种可能的设计中,第i个图像的拍摄倍率是基于第i个图像中目标主体的大小相对于第一个图像中目标主体的大小的缩放倍率,和采集第一个图像的摄像头的倍率确定的。
在一种可能的设计中,第i个图像中的目标主体的大小通过以下至少一个特征来表征:第i个图像中的目标主体的宽度,第i个图像中的目标主体的高度,第i个图像中的目标主体的面积,或者,第i个图像中的目标主体的所占的像素点的数量。
在一种可能的设计中,该方法还包括:采用实例分割算法从第i个图像中提取目标主体,以确定第i个图像中的目标主体的大小。这样,有助于提高确定第i个图像中的目标主体的大小的精确度。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第一信息。第一信息用于指示停止拍摄希区柯克变焦视频。这样,用户可以获知什么时候停止移动终端,从而提高用户体验。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第二信息。第二信息用于指示目标主体静止。由于希区柯克变焦视频的要求之一是各图像中的目标主体的位置一致,因此,基于该可能的设计,用户可以在获取希区柯克变焦视频的过程中,获知当前是否满足获取希区柯克变焦视频的要求,从而提高用户体验。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第三信息,第三信息用于指示目标主体在当前预览图像的中央。这样,用户可以基于是否终端显示第三信息确定是否移动终端,从而有助于提高希区柯克视频的质量。其中,当前预览界面包含当前预览图像(即摄像头采集的图像)和除当前预览图像之外的信息(如拍摄控件、指示信息等)。
在一种可能的设计中,该方法还包括:针对第一场景实时采集N+1个图像,包括:在目标主体在当前预览图像的中央时,采集N+1个图像中的第一个图像。这样,有助于提高希区柯克视频的质量。
在一种可能的设计中,该方法还包括:显示用户界面,用户界面中包含第一控件,第一控件用于指示由近及远拍摄希区柯克变焦视频。针对第一场景实时采集N+1个图像,包括:接收针对第一控件的操作,响应于该操作,针对第一场景实时采集N+1个图像。
在一种可能的设计中,终端的移动速度小于等于预设速度。这样,有助于提高希区柯克变焦视频的质量。
在一种可能的设计中,预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,历史网络层是预测在待处理图像之前且与待处理图像时域连续的图像的白平衡增益时所使用的网络层。示例的,待处理图像是上述N个图像中的其中一个。该白平衡网络融合了当前帧和历史帧的网络层特征信息。这样,考虑多帧的信息,有助于使得帧与帧之间的白平衡增益预测值更为接近,从而使得该白平衡网络更稳定,进而使得对连续多个图像进行白平衡处理后所得到的图像之间的白平衡一致性效果更好。
在一种可能的设计中,预设神经网络基于预设约束条件训练得到;其中,预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
在一种可能的设计中,对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像,包括:将N+1个图像中的第j个图像输入到预设神经网络,得到第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数。将第j个图像的白平衡增益预测值作用于第j个图像,得到第j个图像对应的优化图像;其中,N个优化图像包括第j个图像对应的优化图像。
第二方面,提供一种拍摄视频的方法,该方法应用于终端,该方法包括:针对第一场景采集N+1个图像,N+1个图像中均包括目标主体;其中,在采集N+1个图像的过程中,终端距离目标主体越来越近;N是大于等于1的整数。N+1个图像中的第一个图像由终端的第一摄像头采集得到,N+1个图像中的后N个图像中的部分或全部图像由终端的第二摄像头采集得到,第二摄像头的倍率小于第一摄像头的倍率。N+1个图像中后采集的N个图像中目标主体的大小小于或等于N+1个图像中采集的第一个图像中的目标主体的大小。对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;预设神经网络用于保证时域相邻图像的白平衡一致性。对N个优化图像进行放大并裁剪,得到N个目标图像。其中,N个目标图像中的每个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致,N个目标图像中每个目标图像中的目标主体的相对位置,与第一个图像中目标主体的相对位置一致;N个目标图像与第一个图像的大小一致。基于N个目标图像和第一个图像,生成希区柯克变焦视频。
可选的,该N+1个图像可以是连续采集的N+1个图像,即实时采集的N+1个图像。
该技术方案中,在终端距离目标主体越来越近的场景中,通过切换成倍率更小的摄像头采集在后的图像,从而实现在后采集的图像中目标主体的大小小于或等于在前采集的图像中目标主体的大小。并且,在获取希区柯克变焦视频的过程中,对采集到的N+1个图像中的后N个图像进行了白平衡处理,以使得处理后的图像与所采集的 N+1个图像中的第一个图像的白平衡一致。这样,可以使得获得的希区柯克变焦视频的白平衡效果更好,从而提高希区柯克变焦视频的质量,提高了用户体验。
在一种可能的设计中,N个图像包括在前采集的N1个图像和在后采集的N2个图像,其中,N1个图像由第二摄像头采集得到,N2个图像由终端的第三摄像头采集得到;N1和N2均是大于等于1的整数。
在一种可能的设计中,针对第一场景实时采集N+1个图像,包括:获取N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;如果第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于第二摄像头针对第一场景采集N+1个图像中的第i+1个图像;如果第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于终端的第三摄像头针对第一场景采集N+1个图像中的第i+1个图像;其中,第二摄像头的倍率是b,第三摄像头的倍率是c;b>c;第一拍摄倍率范围是大于等于b的范围;第二拍摄倍率范围是[c,b)。
也就是说,终端基于第i个图像的拍摄倍率,确定采集第i+1个图像的摄像头。这样,终端可以采用比采集在前的图像时所使用的摄像头的倍率更小的摄像头采集在后的图像,即通过切换成更小倍率的摄像头实现对目标主体的缩小,这样,不需要对采集后的图像进行“补边”,从而提高用户体验。
在一种可能的设计中,第i个图像的拍摄倍率是基于第i个图像中目标主体的大小相对于第一个图像中目标主体的大小的缩放倍率,和采集第一个图像的摄像头的倍率确定的。
在一种可能的设计中,第i个图像中的目标主体的大小通过以下至少一个特征来表征:第i个图像中的目标主体的宽度,第i个图像中的目标主体的高度,第i个图像中的目标主体的面积,或者,第i个图像中的目标主体的所占的像素点的数量。
在一种可能的设计中,该方法还包括:采用实例分割算法从第i个图像中提取目标主体,以确定第i个图像中的目标主体的大小。这样,有助于提高确定第i个图像中的目标主体的大小的精确度。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。这样,有助于指示用户在合适的时候,停止移动终端,从而提高用户体验。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第二信息,第二信息用于指示目标主体静止。基于该可能的设计,用户可以在获取希区柯克变焦视频的过程中,获知当前是否满足获取希区柯克变焦视频的要求,从而提高用户体验。
在一种可能的设计中,该方法还包括:在当前预览界面中,显示第三信息,第三信息用于指示目标主体在当前预览图像的中央。这样,用户可以基于是否终端显示第三信息确定是否移动终端,从而有助于提高希区柯克视频的质量。
在一种可能的设计中,该针对第一场景采集N+1个图像,包括:在目标主体在当前预览图像的中央时,采集第一个图像。这样,有助于提高希区柯克视频的质量。
在一种可能的设计中，该方法还包括：显示用户界面，所述用户界面中包含第二控件，第二控件用于指示由远及近拍摄希区柯克变焦视频。针对第一场景采集N+1个图像，包括：接收针对第二控件的操作，响应于该操作，针对第一场景采集N+1个图像。
在一种可能的设计中，终端的移动速度小于等于预设速度。这样，有助于提高希区柯克变焦视频的质量。
在一种可能的设计中,预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,历史网络层是预测在待处理图像之前且与待处理图像时域连续的图像的白平衡增益时所使用的网络层。其有益效果可以参考上述第一方面的相关可能的设计。
在一种可能的设计中,预设神经网络基于预设约束条件训练得到;其中,预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
在一种可能的设计中,对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像,包括:将N+1个图像中的第j个图像输入到预设神经网络,得到第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数。将第j个图像的白平衡增益预测值作用于第j个图像,得到第j个图像对应的优化图像;其中,N个优化图像包括第j个图像对应的优化图像。
第三方面,提供了一种拍摄视频的方法,应用于终端,终端包括第一摄像头和第二摄像头,第一摄像头的倍率与第二摄像头的倍率不同。该方法包括:通过第一摄像头和第二摄像头在第一时刻针对第一场景分别采集第一图像和第二图像;其中,第一图像和第二图像中均包含目标主体。基于视频的预设播放时长和预设播放帧率,确定第一图像和第二图像之间的待插入图像的帧数N,N是大于等于1的整数。基于帧数N、第一图像和第二图像,确定N个待插入图像。基于第一图像、第二图像和N个待插入图像,生成视频;该视频中各图像中的目标主体的大小逐渐变大或逐渐变小。
该技术方案中,终端通过多个摄像头在同一时刻针对同一场景采集多帧图像,并基于该多帧图像进行插帧,从而生成视频,该视频中各图像中的目标主体的大小逐渐变大或逐渐变小。这样,相比传统技术,有助于提高所生成的视频的质量。另外,有助于提升动图效果的趣味性,增强用户对终端的粘性。
在一种可能的设计中,终端还包括第三摄像头,第三摄像头的倍率在第一摄像头与第二摄像头的倍率之间。该方法还包括:通过第三摄像头在第一时刻针对第一场景采集第三图像;其中,第三图像包含目标主体。基于帧数N、第一图像和第二图像,确定N个待插入图像,包括:基于帧数N、第一图像、第二图像和第三图像,确定N个待插入图像。这样,有助于进一步提升视频的质量。
第四方面,提供了一种终端。
在一种可能的设计中,该终端可以用于执行上述第一方面至第三方面提供的任一种方法。本申请可以根据上述第一方面至第三方面提供的任一种方法,对该终端进行功能模块的划分。例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。示例性的,本申请可以按照功能将该终端划分为采集单元、处理单元和显示单元等。上述划分的各个功能模块执行的可能的技术方案和有益效果的描述均可以参考上述第一方面至第三方面提供的相应技术方案,此处不再赘述。
在另一种可能的设计中，该装置包括存储器和处理器，所述存储器用于存储计算机指令，所述处理器用于调用所述计算机指令，以执行如第一方面至第三方面提供的任一种方法。其中，上述第一方面至第三方面提供的任一种方法中的采集步骤，在该可能的设计中具体可以替换为控制采集步骤。上述相应方法中的显示步骤，在该可能的设计中具体可以替换为控制显示步骤。
第五方面,提供了一种终端,包括:处理器、存储器和摄像头。该摄像头用于采集图像等,存储器用于存储计算机程序和指令,处理器用于调用该计算机程序和指令,与该一个或多个摄像头协同执行上述第一方面至第三方面提供的相应技术方案。
第六方面,提供了一种计算机可读存储介质,如计算机非瞬态的可读存储介质。其上储存有计算机程序(或指令),当该计算机程序(或指令)在计算机上运行时,使得该计算机执行上述第一方面至第三方面提供的任一种方法。其中,上述第一方面至第三方面提供的任一种方法中的采集步骤,在该可能的设计中具体可以替换为控制采集步骤。上述相应方法中的显示步骤,在该可能的设计中具体可以替换为控制显示步骤。
第七方面,提供了一种计算机程序产品,当其在计算机上运行时,使得第一方面至第三方面提供的任一种方法被执行。其中,上述第一方面至第三方面提供的任一种方法中的采集步骤,在该可能的设计中具体可以替换为控制采集步骤。上述相应方法中的显示步骤,在该可能的设计中具体可以替换为控制显示步骤。
可以理解的是,上述提供的任一种终端、计算机存储介质、计算机程序产品或芯片系统等均可以应用于上文所提供的对应的方法,因此,其所能达到的有益效果可参考对应的方法中的有益效果,此处不再赘述。
在本申请中，上述终端或者各功能模块的名字对设备或功能模块本身不构成限定，在实际实现中，这些设备或功能模块可以以其他名称出现。只要各个设备或功能模块的功能和本申请类似，即属于本申请权利要求及其等同技术的范围之内。
本申请的这些方面或其他方面在以下的描述中会更加简明易懂。
附图说明
图1为本申请实施例可适用的一种终端的硬件结构示意图;
图2为本申请实施例可适用的一种终端的软件结构框图;
图3为本申请实施例提供的一种启动希区柯克变焦视频拍摄模式的界面变化示意图;
图4为本申请实施例提供的一种计算机设备的硬件结构示意图;
图5为本申请实施例提供的一种训练白平衡网络之前的训练数据准备过程的流程示意图;
图6为本申请实施例提供的一种训练白平衡网络时所使用的网络架构的示意图;
图7为本申请实施例提供的另一种训练白平衡网络时所使用的网络架构的示意图;
图8为本申请实施例提供的一种预测阶段所使用的网络架构的示意图;
图9为本申请实施例提供的一种白平衡增益预测方法的流程示意图;
图10为本申请实施例提供的一种拍摄视频的方法的流程示意图;
图11为本申请实施例提供的一种启动由近及远模式拍摄希区柯克变焦视频的界面变化示意图;
图12为本申请实施例提供的一组界面示意图;
图13为本申请实施例提供的另一组界面示意图;
图14为本申请实施例提供的另一组界面示意图;
图15为本申请实施例提供的另一组界面示意图;
图16为本申请实施例提供的另一组界面示意图;
图17为本申请实施例提供的另一组界面示意图;
图18为本申请实施例提供的另一组界面示意图;
图19a为本申请实施例提供的一种终端采集图像的方法的流程示意图;
图19b为本申请实施例提供的一种确定采集图像的摄像头的方法的流程示意图;
图20为本申请实施例提供的一种实例分割的示意图;
图21为本申请实施例提供的一种终端采集图像的过程示意图;
图22a为本申请实施例提供的一种终端采集到的图像的示意图;
图22b为本申请实施例提供的另一种终端所采集到的图像的示意图;
图23a为本申请实施例提供的一种当前预览界面的示意图;
图23b为本申请实施例提供的另一种当前预览界面的示意图;
图23c为本申请实施例提供的另一种当前预览界面的示意图;
图24为本申请实施例提供的一种对所采集的图像进行放大并裁剪的示意图;
图25为本申请实施例提供的另一种拍摄视频的方法的流程示意图;
图26为传统技术中希区柯克变焦视频中对所采集的图像进行处理的过程示意图;
图27为本申请实施例提供的另一种拍摄视频的方法的流程示意图;
图28为本申请实施例提供的一种对图像进行处理的过程示意图;
图29为本申请实施例提供的一种终端的结构示意图;
图30为本申请实施例提供的另一种终端的结构示意图。
具体实施方式
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
在本申请实施例中,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
在本申请实施例中,术语“一致”仅用于描述相同或相似(即相差不大)。相差不大可以通过相应参数之差小于等于阈值来体现。例如,目标主体的大小一致,是指目标主体的大小相同或相差小于等于阈值等。
本申请实施例提供的拍摄视频的方法可以应用于终端中，该终端可以是带有摄像头的终端，如智能手机、平板电脑、可穿戴设备、AR/VR设备，也可以是个人计算机(personal computer，PC)、个人数字助理(personal digital assistant，PDA)、上网本等设备，还可以是其他任一能够实现本申请实施例的终端。本申请对终端的具体形态不予限定。
在本申请中,终端的结构可以如图1所示。如图1所示,终端100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本实施例示意的结构并不构成对终端100的具体限定。在另一些实施例中,终端100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。例如,在本申请中,处理器110可以控制摄像头193针对第一场景实时采集N+1个图像,N+1个图像中均包括目标主体。其中,在采集N+1个图像的过程中,摄像头193距离目标主体越来越远;N是大于等于1的整数。然后,对于N+1个图像中后采集的N个图像,处理器110可以基于预设神经网络进行白平衡处理,得到N个优化图像;预设神经网络用于保证时域相邻图像的白平衡一致性。接着,处理器110可以对N个优化图像进行放大并裁剪,得到N个目标图像;其中,N个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致,N个目标图像中目标主体的相对位置,与第一个图像中目标主体的相对位置一致;N个目标图像与第一个图像的大小一致。最后,处理器110可以基于N个目标图像和第一个图像生成希区柯克变焦视频。该技术方案的相关说明可以参考下文。
其中,控制器可以是终端100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器 (universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现终端100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现终端100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为终端100充电,也可以用于终端100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他终端,例如AR设备等。
可以理解的是,本实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对终端100的结构限定。在本申请另一些实施例中,终端100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
终端100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
终端100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像，视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display，LCD)，有机发光二极管(organic light-emitting diode，OLED)，有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode，AMOLED)，柔性发光二极管(flex light-emitting diode，FLED)，Miniled，MicroLed，Micro-oled，量子点发光二极管(quantum dot light emitting diodes，QLED)等。在一些实施例中，终端100可以包括1个或N个显示屏194，N为大于1的正整数。
终端100的显示屏194上可以显示一系列图形用户界面(graphical user interface,GUI),这些GUI都是该终端100的主屏幕。一般来说,终端100的显示屏194的尺寸是固定的,只能在该终端100的显示屏194中显示有限的控件。控件是一种GUI元素,它是一种软件组件,包含在应用程序中,控制着该应用程序处理的所有数据以及关于这些数据的交互操作,用户可以通过直接操作(direct manipulation)来与控件交互,从而对应用程序的有关信息进行读取或者编辑。一般而言,控件可以包括图标、按钮、菜单、选项卡、文本框、对话框、状态栏、导航栏、Widget等可视的界面元素。
终端100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,终端100可以包括1个或N个摄像头193,N为大于1的正整数。例如,上述摄像头193可以包括主摄像头、长焦摄像头、广角摄像头、红外摄像头、深度摄像头或者黑白摄像头等一种或者至少两种摄像头。结合本申请实施例提供的技术方案,第一终端可以采用上述一种或者至少两种摄像头采集图像,并将采集到的图像进行处理(如融合等),得到预览图像(如第一预览图像或第二预览图像等)。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当终端100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。终端100可以支持一种或多种视频编解码器。这样,终端100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现终端100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展终端100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码，所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令，从而执行终端100的各种功能应用以及数据处理。例如，在本实施例中，处理器110可以通过执行存储在内部存储器121中的指令，获取终端100的姿势。内部存储器121可以包括存储程序区和存储数据区。其中，存储程序区可存储操作系统，至少一个功能所需的应用程序（比如声音播放功能，图像播放功能等）等。存储数据区可存储终端100使用过程中所创建的数据（比如音频数据，电话本等）等。此外，内部存储器121可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件，闪存器件，通用闪存存储器(universal flash storage，UFS)等。处理器110通过运行存储在内部存储器121的指令，和/或存储在设置于处理器中的存储器的指令，执行终端100的各种功能应用以及数据处理。
终端100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。终端100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当终端100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。终端100可以设置至少一个麦克风170C。在另一些实施例中,终端100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,终端100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。终端100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,终端100根据压力传感器180A检测所述触摸操作强度。终端100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。 当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定终端100的运动姿势。在一些实施例中,可以通过陀螺仪传感器180B确定终端100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测终端100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消终端100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,终端100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。终端100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当终端100是翻盖机时,终端100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测终端100在各个方向上(一般为三轴)加速度的大小。当终端100静止时可检测出重力的大小及方向。还可以用于识别终端姿势,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。终端100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,终端100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。终端100通过发光二极管向外发射红外光。终端100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定终端100附近有物体。当检测到不充分的反射光时,终端100可以确定终端100附近没有物体。终端100可以利用接近光传感器180G检测用户手持终端100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。终端100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测终端100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。终端100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,终端100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,终端100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,终端100对电池142加热,以避免低温导致终端100异常关机。在其他一些实施例中,当温度低于又一阈值时,终端100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194, 由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于终端100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。终端100可以接收按键输入,产生与终端100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
另外,在上述部件之上,运行有操作系统。例如苹果公司所开发的iOS操作系统,谷歌公司所开发的Android开源操作系统,微软公司所开发的Windows操作系统等。在该操作系统上可以安装运行应用程序。
终端100的操作系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明终端100的软件结构。
图2是本申请实施例的终端100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。如图2所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。例如,在拍照时,相机应用可以访问应用程序框架层提供的相机接口管理服务。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface，API)和编程框架。应用程序框架层包括一些预先定义的函数。如图2所示，应用程序框架层可以包括窗口管理器，内容提供器，视图系统，电话管理器，资源管理器，通知管理器等。例如，在本申请实施例中，在拍照时，应用程序框架层可以为应用程序层提供拍照功能相关的API，并为应用程序层提供相机接口管理服务，以实现拍照功能。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供终端100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,终端振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
需要说明的是,本申请实施例虽然以Android系统为例进行说明,但是其基本原理同样适用于基于iOS或Windows等操作系统的终端。
下面结合图1和视频拍摄场景,示例性说明终端100软件以及硬件的工作流程。
首先，触摸传感器180K接收到对相机应用图标的触摸操作，上报给处理器110，使得处理器110响应于上述触摸操作，启动相机应用，并在显示屏194上显示该相机应用的用户界面，如图3中的a图所示。此外，本申请实施例中还可以通过其它方式使得终端100启动相机应用，并在显示屏194上显示相机应用的用户界面。例如，终端100在黑屏、显示锁屏界面或者解锁后显示某一用户界面时，可以响应于用户的语音指令或者快捷操作等，启动相机应用，并在显示屏194上显示相机应用的用户界面。其中，相机的用户界面上包含“夜景”“人像”“拍照”“录像”“更多”等控件。
其次,触摸传感器180K接收到对“录像”控件的触摸操作,上报给处理器110,使得处理器110响应于上述触摸操作,将“录像”控件突出显示,如图3中的b图所示,图3中的b图中“录像”控件加边框以突出显示;并启动录像功能,显示录像功能下的用户界面,如图3中的c图所示。录像功能下的用户界面包括“希区柯克变焦视频”“普通视频”“更多”等控件。
接着,触摸传感器180K接收到对“希区柯克变焦视频”控件的触摸操作,上报给处理器110,使得处理器110响应于上述触摸操作,将“希区柯克变焦视频”控件突出显示,如图3中的d图所示,图3中的d图中“希区柯克变焦视频”控件加边框以突出显示;并采用希区柯克变焦视频拍摄模式开始录像,即开始拍摄希区柯克变焦视频。
本申请实施例中还可以通过其它方式使得终端100启动希区柯克变焦视频拍摄模式。例如,终端100可以响应于用户的语音指令或快捷操作等,启动希区柯克变焦视频拍摄模式。
如图4所示,为本申请实施例提供的一种计算机设备30的硬件结构示意图。该计算机设备30包括处理器301、存储器302、通信接口303以及总线304。其中,处理器301、存储器302以及通信接口303之间可以通过总线304连接。
处理器301是计算机设备30的控制中心,可以是一个通用CPU,也可以是其他通用处理器等。其中,通用处理器可以是微处理器或者是任何常规的处理器等。
作为示例,处理器301可以包括一个或多个CPU,例如图4中所示的CPU 0和CPU 1。
存储器302可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
一种可能的实现方式中,存储器301可以独立于处理器301存在。存储器302可以通过总线304与处理器301相连接,用于存储数据、指令或者程序代码。处理器301调用并执行存储器302中存储的指令或程序代码时,能够实现本申请实施例提供的训练白平衡网络之前的训练数据准备过程的相应方法,以及训练白平衡网络的方法等。
另一种可能的实现方式中,存储器302也可以和处理器301集成在一起。
通信接口303，可以是任意能够输入参数信息的器件，本申请实施例不作限定。其中，通信接口可以包括接收单元和发送单元。例如，该通信接口303可以用于向终端100发送训练好的白平衡网络的相关信息（如相关参数的值）等。
总线304,可以是工业标准体系结构(industry standard architecture,ISA)总线、外部设备互连(peripheral component interconnect,PCI)总线或扩展工业标准体系结构(extended industry standard architecture,EISA)总线等。该总线可以分为地址总线、数据总线、控制总线等。为便于表示,图4中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
需要指出的是,图4中示出的结构并不构成对该计算机设备30的限定,除图4所示部件之外,该计算机设备30可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
另外需要指出的是,图4所示的计算机设备30具体可以是上文中提供的任一种终端100,也可以是任一种网络设备,如接入网设备(如基站)等。
以下,结合附图,说明本申请实施例提供的技术方案:
本申请实施例提供的拍摄视频的方法,可以应用于拍摄希区柯克变焦视频。并且,在拍摄希区柯克变焦视频的过程中,基于白平衡网络对所采集的图像进行白平衡处理,以使时域相邻图像的白平衡一致。其中:
白平衡是描述红、绿、蓝三基色混合生成白色的精确度的一项指标。白平衡是电视摄像领域一个非常重要的概念，通过它可以解决色彩还原和色调处理的一系列问题。
白平衡增益,是对图像的白平衡进行矫正的参数。
白平衡网络是用于对图像的白平衡增益进行预测的网络。白平衡网络可以是深度学习网络,如神经网络等。
白平衡一致性是指通过使用近似的白平衡增益对时域相邻的图像进行处理，使得处理后得到的图像之间的白平衡效果相同或相似。具体的，使用本申请实施例提供的白平衡网络对时域相邻的图像进行处理，使得处理后得到的图像之间的白平衡效果相同或相似。
作为举例说明,终端连续采集的图像,或者基于终端连续采集的图像进行处理后得到的图像;都可以理解为时域相邻的图像。
以下,对本申请实施例中提供的白平衡网络进行说明:
训练数据准备过程:
如图5所示,为本申请实施例提供的一种训练白平衡网络之前的训练数据准备过程的流程示意图。训练白平衡网络之前的训练数据准备过程可以由上文中所描述的计算机设备30执行。图5所示的方法包括以下步骤:
S101:计算机设备获取通过多个摄像头(如主摄像头、广角摄像头等)采集的多种环境(如不同色温、不同亮度、不同视角等的室内外环境)下的原始图像。
S102:对于每个原始图像,计算机设备对该原始图像中灰色或白色部分进行参数提取,得到该原始图像的白平衡增益。
这里的白平衡增益在训练阶段用于作为白平衡网络的预测目标,因此,为了区别下文中的白平衡增益预测值,下文中将此处得到的白平衡增益称为白平衡增益参考值。
可选的,原始图像中的灰色或白色部分可以基于标准色卡对比得到。
S103:对于每个原始图像,计算机设备对该原始图像进行数据增强,得到一组增强后的图像。一组增强后的图像用于确定一个原始样本。
为了方便描述,下文中将每一组增强后的图像称为一个图像组。
实现方式1:一个图像组作为一个原始样本。一个图像组包括P个图像,P是大于等于2的整数。一个图像组中的P个图像用于模拟摄像头采集的时域连续的P个图像。
可选的,一个图像组中的P个图像用于模拟同一摄像头采集的P个时域连续的图像。
可选的,一个图像组中的P个图像用于模拟不同摄像头切换前后采集的P个时域连续的图像。
其中,一个图像组中的第一个图像可以是基于该图像组对应的原始图像在一组随机数的基础上生成的,当然本申请实施例不限于此。
实现方式2:一个图像组和该图像组对应的原始图像作为一个原始样本。一个图像组包括Q个图像,Q是大于等于1的整数。一个图像组中的图像和该图像组对应的原始图像,用于模拟摄像头采集的时域连续的Q+1个图像。
可选的,一个图像组中的Q个图像和该图像组对应的原始图像,用于模拟同一摄像头采集的Q+1个时域连续的图像。
可选的,一个图像组中的Q个图像和该图像组对应的原始图像,用于模拟不同摄像头切换前后采集的Q+1个时域连续的图像。
示例的,在一个原始样本中,原始图像可以作为第1个图像,该原始图像对应的图像组中的图像作为该原始样本中的第2个图像至第Q+1个图像。
通常,一部分原始样本用于模拟同一摄像头采集的多个时域连续的图像,另一部分原始样本用于模拟不同摄像头切换前后采集的多个时域连续的图像。这样,能够使得基于所有原始样本训练得到的白平衡网络,同时适用于不切换摄像头的场景和切换摄像头的场景。
需要说明的是,为了方便说明,下文中均以一个原始样本是一个图像组(即上述实现方式1)为例进行说明。
S104:对于每个原始样本,计算机设备将该原始样本中的图像转换到同一颜色空间,得到一个样本。所有样本组成训练数据。
可选的,颜色空间可以是红绿蓝(red green blue,RGB)颜色空间等,当然具体实现时不限于此。
可选的,训练数据中的所有样本均使用同一颜色空间。
需要说明的是,样本中的图像属于同一颜色空间,是为了在预测阶段,避免(或消除)不同摄像头模组之间的差异。另外,如果一个原始样本中的多个图像本身就在同一个颜色空间,则可以不执行转换到同一颜色空间的步骤。该情况下,基于同一原始图像得到的一组增强后的图像被称为一个图像组。
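作为参考，下面给出一段最小化的Python示意，模拟“由一张原始图像增强得到一个图像组”的过程。需要说明的是，其中的增强方式（逐通道随机增益）仅是示例性假设，本申请并未限定具体的增强手段：

```python
import numpy as np

def make_image_group(original, P=4, jitter=0.05, rng=None):
    """由一张原始图像模拟一组时域连续的P个图像。
    original: HxWx3、取值在[0,1]的数组；返回P个增强后的图像。"""
    rng = np.random.default_rng() if rng is None else rng
    group = []
    for _ in range(P):
        gain = 1.0 + rng.uniform(-jitter, jitter, size=3)  # 模拟光照/色温的微小变化
        group.append(np.clip(original * gain, 0.0, 1.0))
    return group
```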
训练阶段:
如图6所示,为本申请实施例提供的一种训练白平衡网络时所使用的网络架构的示意图。
in(n)表示样本中的第n个图像。n是一个样本中的图像个数，n大于等于2，n是整数。in(n-a)表示样本中的第n-a个图像。a<n，a是整数。
out(n)表示第一子网络的输入是in(n)时第一子网络的输出。out(n-a)表示第一子网络的输入是in(n-a)时第一子网络的输出。
mem(n-1,1)表示样本中的第n-1个图像对应的特征图至第1个图像对应的特征图。mem(n-a-1,1)表示样本中的第n-a-1个图像对应的特征图至第1个图像对应的特征图。
其中,第n-1个图像对应的特征图是第一子网络的输入为in(n-1)时,第一子网络包括的网络层的特征图。本申请实施例对该网络层具体是第一子网络中的哪个或哪些网络层,以及每个网络层的具体实现方式均不进行限定。另外,一个样本中的不同图像对应的特征图可以是第一子网络中的同一网络层的特征图或不同网络层的特征图。
损失函数(loss)用于在训练过程进行约束,从而实现“out(n)、out(n-1)……out(n-a)一致”的训练目标。
在训练过程中,对于训练数据中的任意一个样本:
首先,计算机设备将该样本中的第n-a个图像至第n个图像分别输入到图6所示的网络架构中,该网络架构输出一组白平衡增益预测值out(n-a)至out(n)。
其次，计算机设备使用该样本对应的原始图像的白平衡增益参考值作为监督，以使“out(n-a)……out(n)”中的每个值尽量接近该原始图像的白平衡增益参考值为目标，调整第一子网络中的参数的取值。
以此类推,依次将多个样本输入到该网络架构中,重复执行上述步骤。并且,在执行的过程中,使用损失函数进行约束,以使得同一样本对应的“out(n-a)……out(n)”、以及该样本对应的原始图像的白平衡增益参考值一致。当该网络架构的准确率达到某一预设准确率时,说明该白平衡网络已训练好。
也就是说,白平衡网络是基于“用于模拟时域连续的多个图像的白平衡增益预测值一致”的这一约束条件训练得到的。
图6中,时域连续的a+1个图像的白平衡增益预测值之间采用一致性监督方式进行监督。
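作为理解上述训练约束的参考，下面给出一段最小化的Python（PyTorch风格）示意：监督项使各预测值逼近白平衡增益参考值，一致性项使时域相邻的预测值彼此接近。其中的函数接口和权重lam均为示例性假设，历史特征mem的传递在此从略：

```python
import torch
import torch.nn.functional as F

def wb_training_loss(preds, gain_ref, lam=1.0):
    """preds: (a+1, 3)，同一样本中时域连续图像的白平衡增益预测值out(n-a)…out(n)；
    gain_ref: (3,)，该样本对应原始图像的白平衡增益参考值。"""
    sup = F.mse_loss(preds, gain_ref.expand_as(preds))  # 监督项：逼近参考值
    cons = F.mse_loss(preds[1:], preds[:-1])            # 一致性项：相邻预测值一致
    return sup + lam * cons
```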
在一个示例中,如果n=2,a=1,则训练白平衡网络时所使用的网络架构可以如图7所示。
在训练时,对于训练数据中的任意一个样本:
首先,计算机设备将该样本中的第一个图像作为in(1),第二个图像作为in(2)输入到图7所示的网络架构中,该网络架构输出一对白平衡增益预测值out(1)和out(2)。
其次,计算机设备使用样本对应的原始图像的白平衡增益参考值作为监督,并结合该网络架构输出的白平衡增益预测值out(1)和out(2),调整该白平衡网络的参数的取值。
以此类推,依次将多个样本输入到该网络架构中,重复执行上述步骤。并且,在执行的过程中,使用损失函数进行约束,以使得同一样本对应的“out(1)和out(2)”、以及该样本对应的原始图像的白平衡增益参考值一致。当该网络架构的准确率达到某一预设准确率时,说明该白平衡网络已训练好。
图7中，时域连续的2个图像的白平衡增益预测值之间采用一致性监督方式进行监督。
预测阶段:
如图8所示,为本申请实施例提供的一种预测阶段所使用的网络架构的示意图。
图8中的白平衡网络中的第一子网络是训练阶段结束时的第一子网络。
in(t)表示该白平衡网络的输入,用于输入待预测图像。
out(t)是该白平衡网络的输入是in(t)时,该白平衡网络的输出。
mem(t-1,t-T)表示基于该白平衡网络对采集的第t-1个图像的白平衡增益进行预测的过程中所使用的第一目标网络层输出的特征图(下文中称为第t-1个图像对应的特征图),至基于该白平衡网络对采集的第t-T个图像的白平衡增益进行预测的过程中所使用的第二目标网络层输出的特征图(下文中称为第t-T个图像对应的特征图)。
可选的,T的取值是可以调整的。通常,T的取值越大,使用该白平衡网络预测的时域连续的多个图像的白平衡增益的整体波动范围越小,也就是说,T的取值越大,使用该白平衡网络对时域连续的多个图像进行白平衡处理后,所得到的图像之间的白平衡一致性效果更好。
mem(t-1,t-T)具体是哪个或哪些特征图，会随着t的取值的更新而更新。例如，在T=3的情况下，假设t=5，则mem(t-1,t-T)表示采集的第4个图像对应的特征图至采集的第2个图像对应的特征图。假设t=6，则mem(t-1,t-T)表示采集的第5个图像对应的特征图至采集的第3个图像对应的特征图。
在预测时,对于采集的第t个图像,将该图像作为in(t)输入到图8所示的白平衡网络中,该白平衡网络在mem(t-1,t-T)的约束下,输出的out(t)为白平衡增益预测值。
也就是说,该白平衡网络用于结合历史网络层的特征图(即mem(t-1,t-T)),对待处理图像(即in(t))的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性。其中,历史网络层是预测在待处理图像之前且与待处理图像时域连续的图像的白平衡增益时所使用的网络层。
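作为参考，下面用一段最小化的Python示意预测阶段对mem(t-1,t-T)的一种维护方式：用一个长度为T的滑动窗口缓存历史帧的特征图。其中 subnet(frame, history) 返回（增益预测值，特征图）的接口是示例性假设：

```python
from collections import deque
import torch

class WhiteBalancePredictor:
    """示意性的预测封装：history即mem(t-1, t-T)，最多保留最近T帧的特征图。"""
    def __init__(self, subnet, T=3):
        self.subnet = subnet
        self.history = deque(maxlen=T)   # 超过T帧时自动丢弃最旧的特征图

    @torch.no_grad()
    def predict(self, frame):
        gain, feature = self.subnet(frame, list(self.history))
        self.history.append(feature)     # 滑动更新历史特征窗口
        return gain
```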
如图9所示,为本申请实施例提供的一种白平衡增益预测方法的流程示意图。该技术方案的执行主体可以是上文中提供的计算机设备30。图9所示的方法可以包括以下步骤:
S201：计算机设备获取第一待预测图像和第一待预测图像的原始颜色空间。其中，第一待预测图像的原始颜色空间，是采集该待预测图像时所使用的摄像头的颜色空间。
其中,第一待预测图像可以是同一摄像头采集的连续多个图像中的非首个图像中的任意一个图像,也可以是不同摄像头采集的连续多个图像中的非首个图像中的任意一个图像。
S202:计算机设备将第一待预测图像的原始颜色空间转换到预设颜色空间,得到第二待预测图像。其中,预设颜色空间为训练阶段中所使用的颜色空间。
需要说明的是,如果第一待预测图像的原始颜色空间为预设颜色空间,则可以不执行S202,该情况下,以下步骤中的“第二待预测图像”可以替换为“第一待预测图像”。
S203：计算机设备将第二待预测图像作为in(t)输入到白平衡网络（例如图8所示的白平衡网络），该白平衡网络在mem(t-1,t-T)的约束之下，输出的out(t)为白平衡增益预测值。
S204:计算机设备将该预测值转换到原始颜色空间,并将转换后的预测值作用于第一待预测图像,得到第一待预测图像对应的优化图像。
将转换后的预测值作用于第一待预测图像,得到第一待预测图像对应的优化图像,可以包括:将转换后的预测值与第一待预测图像中的每个像素的像素值相乘,得到新的像素值,并将该像素值作为第一待预测图像对应的优化图像中与该像素对应的像素的像素值。
其中,该优化图像为基于白平衡网络对第一待预测图像进行处理后得到的图像。在本申请实施例认为:对时域连续的多个图像中非首个图像中的每个图像执行S201-S204的处理之后所得到的图像与该时域连续的多个图像中的第一个图像之间具有白平衡一致性。
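下面给出“将白平衡增益预测值逐像素作用于待预测图像”这一步骤的最小化Python示意（假设图像为HxWx3、取值在[0,1]的数组，增益为逐通道的3维向量，且增益已转换回原始颜色空间）：

```python
import numpy as np

def apply_wb_gain(image, gain):
    """将逐通道增益作用于图像的每个像素，得到优化图像。"""
    optimized = image * np.asarray(gain).reshape(1, 1, 3)  # 每个像素的像素值乘以增益
    return np.clip(optimized, 0.0, 1.0)
```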
传统的白平衡网络,通常只考虑单帧信息,这会导致帧与帧之间的白平衡增益预测值存在跳变,也就是说,帧与帧之间的白平衡增益预测值的整体波动范围较大。
而本申请实施例提供的白平衡网络融合了当前帧和历史帧的网络层特征(feature)信息(即特征图,具体为上文中的mem(t-1,t-T))。这样,考虑多帧的信息,有助于使得帧与帧之间的白平衡增益预测值更为接近,从而使得该白平衡网络更稳定。更具体的,由于上述技术方案中,当前帧与历史帧之间是时域连续的。因此,上述技术方案提供的白平衡网络,使得时域连续的多个图像的白平衡增益预测值更为接近,即,使用该白平衡网络预测的连续多个图像的白平衡增益的整体波动范围更小,从而使得该白平衡网络更稳定,也就是说,使得对连续多个图像进行白平衡处理后所得到的图像之间的白平衡一致性效果更好。
需要说明的是,本申请实施例提供的白平衡网络在训练过程中,以“同一原始图像增强得到的图像的白平衡增益预测值一致”为约束。而实际上,并不能保证连续多个图像的白平衡增益预测值完全相同,而是有一定的波动。但是,相比现有技术中基于单帧预测白平衡增益的网络来说,本申请实施例提供的白平衡网络有助于使得时域连续多个图像的白平衡增益的整体波动范围降低,从而提高白平衡网络的稳定性。
此外,传统的白平衡网络,是基于同一摄像头采集的多个图像进行训练的,这会导致传统的白平衡网络不能应用于多摄切换场景中。
而本申请实施例针对多摄切换场景进行了数据增强。具体可以体现在对白平衡网络进行训练时,所使用的训练数据中包含用于模拟多摄切换场景前后连续多个图像的样本。由于多摄切换时一般有视角、尺寸和入图统计的变化,在训练时通过数据增强模拟多摄切换场景,这样,有助于约束网络的预测值在多摄切换场景下保持一致。
本申请实施例对上文提供的白平衡网络的应用不做限制。
以下,说明本申请实施例提供的白平衡网络的在拍摄希区柯克变焦视频时的应用。
希区柯克变焦视频中不同图像中的目标主体的大小一致（如相同或相差不大）、且不同图像中的目标主体的相对位置一致（如相同或相差不大）。可选的，不同图像中的目标主体的姿势一致（如姿势相同或姿势相似）。例如，不同图像中的目标主体的大小一致，可以包括：不同图像中的目标主体的轮廓（或最小外接矩形）的大小一致。例如，不同图像中的目标主体的相对位置一致，可以包括：不同图像中的目标主体相对于背景中的同一静态对象的相对位置一致。例如，不同图像中的目标主体的中心位置（或轮廓或最小外接矩形）相对于背景中的同一静态对象的中心位置（或轮廓或最小外接矩形）一致。例如，姿势相似可以包括整体姿势相同（如均是站姿、坐姿或卧姿），而局部姿势存在差异（如手势不同等）。
需要说明的是,在一个实例中,希区柯克变焦视频中不同图像中的目标主体的大小一致,是指希区柯克变焦视频的不同图像中的目标主体不存在跳变,或者跳变程度较小,从而使得用户感觉不到这种跳变,或者用户能够接受这种跳变。
需要说明的是,在一个实例中,希区柯克变焦视频中不同图像中的目标主体的相对位置一致,是指希区柯克变焦视频的不同图像中的目标主体是静止的,或者是动态变化程度较小的,从而使得用户感觉不到这种动态变化,或者用户能够接受这种动态变化。
如图10所示,为本申请实施例提供的一种拍摄视频的方法的流程示意图。该方法应用于终端。该终端包括至少两个摄像头。本实施例提供的技术方案应用于由近及远地拍摄希区柯克变焦视频的场景中,即在拍摄希区柯克变焦视频的过程中,终端与目标主体之间的距离越来越远。
本实施例中是在终端距离目标主体越来越远的条件下拍摄希区柯克变焦视频,不切换摄像头时,在后采集的图像中的目标主体的大小小于在前采集的图像中的目标主体的大小,而希区柯克变焦视频的不同图像中的目标主体的大小一致,因此,为了实现希区柯克变焦视频,需要对后采集的图像进行放大。而直接对图像进行放大,会导致放大后的图像不清晰。
基于此,本实施例提供的技术方案的基本原理为:终端可以采用比采集在前的图像时所使用的摄像头的倍率更大的摄像头采集在后的图像,即通过切换成更大倍率的摄像头实现对目标主体的放大,这样,相比“对后采集的图像进行放大”的技术方案,有助于提高图像的清晰度。
图10所示的方法可以包括以下步骤:
S300:终端确定由近及远地拍摄希区柯克变焦视频,并确定初始摄像头。
可选的,终端可以在用户的指示下,确定由近及远地拍摄希区柯克变焦视频。
例如,结合图3中的d图,响应于对“希区柯克变焦视频”控件的触摸操作,终端还可以在用户界面上显示“由近及远模式”401控件和“由远及近模式”402控件,如图11中的a图所示。基于此,用户可以通过对“由近及远模式”401控件进行触摸操作,响应于该触摸操作,终端突出显示“由近及远模式”401控件(如加粗显示该控件的边框),如图11中的b图所示,同时开始启动由近及远模式拍摄希区柯克变焦视频。
需要说明的是,终端还可以以其他方式(如语音指令方式或快捷键方式等)启动由近及远模式拍摄希区柯克变焦视频,本申请实施例对此不进行具体限定。
由于在本实施例中，如果终端切换摄像头，则切换成更大倍率的摄像头，因此，初始摄像头通常不是终端中倍率最大的摄像头，通常可以预定义初始摄像头是终端中倍率较小（如最小）的一个摄像头。
S301:终端针对第一场景实时采集N+1个图像,N+1个图像中均包括目标主体。 其中,在采集N+1个图像的过程中,该终端距离目标主体越来越远。N是大于等于1的整数。
可选的,N+1个图像可以是终端连续采集的N+1个图像。
可选的，N+1个图像中的第一个图像，是终端在“希区柯克变焦视频”模式下开始拍摄时保存的第一个图像。
其中,第一场景可以理解为终端执行S301时,终端的摄像头拍摄视野内的拍摄场景或者其周边可以遍及的场景,与用户所处的环境、终端的姿态或者摄像头的参数有关,本申请不做限定。
其中,目标主体可以为一个物体,该目标主体的位置在拍摄过程中可以不发生移动,也可以在同一深度横向移动。
或者,该目标主体也可以包括深度相同的多个物体,该多个物体的整体可以作为目标主体。在一些实施例中,当目标主体包括多个物体时,该多个物体的图像相连接或有部分重叠。在视频拍摄过程中,随着用户与目标主体之间的距离的变化,不同深度的物体成像大小的变化幅度不同。因而,在用户与目标主体之间的距离变化时,不同深度的物体难以同时实现图像的大小基本不变。因此,为保持目标主体图像的大小基本不变,目标主体中的多个物体应具有相同的深度。
其中,该目标主体可以是终端自动确定的或者用户指定的,以下针对这两种情况分别进行说明。
(1)、终端自动确定目标主体,该目标主体可以包括一个或多个物体。
在一些实施例中,目标主体为预设类型的物体。例如,该预设类型的物体为人物、动物、著名建筑或标志物等。终端基于预览图像确定预设类型的物体为目标主体。
在另一些实施例中,目标主体为在预览图像上的图像位于中心区域的物体。用户感兴趣的目标主体通常会正对着变焦摄像头,因而目标主体在预览图像上的图像通常位于中心区域。
在另一些实施例中,目标主体为在预览图像上的图像靠近中心区域且面积大于预设阈值1的物体。用户感兴趣的目标主体通常会对着变焦摄像头且离变焦摄像头较近,从而目标主体在预览图像上的图像靠近中心区域且面积大于预设阈值1。
在另一些实施例中,目标主体为在预览图像上的图像靠近中心区域的预设类型的物体。
在另一些实施例中,目标主体为在预览图像上的图像靠近中心区域,且面积大于预设阈值的预设类型的物体。
在另一些实施例中,目标主体为在预览图像上的图像靠近中心区域,且深度最小的预设类型的物体。当在预览图像上的图像靠近中心区域的预设类型的物体包括多个深度不同的物体时,目标对象为其中深度最小的物体。
在一些实施例中,终端默认目标主体仅包括一个物体。
可以理解的是,终端自动确定目标主体的方式还可以有多种,本申请实施例对该方式不予具体限定。
在一些实施例中,终端确定目标主体后,可以通过显示提示信息或语音播报等方式将目标主体提示给用户。
例如,预设类型为人物,终端确定目标主体为在预览图像上的图像靠近中心区域的预设类型的人物1。示例性的,参见图12中的(a),终端可以通过方框501将人物1框选出来,以提示用户该人物1为目标主体。
再例如,预设类型为人物,终端确定目标主体为在预览图像上的图像靠近中心区域,且具有相同深度的预设类型的人物2和人物3。示例性的,参见图12中的(b),终端可以通过圆圈502将人物2和人物3框选出来,以提示用户人物2和人物3为目标主体。
再例如,预设类型包括人物和动物,终端确定目标主体为在预览图像上的图像靠近中心区域且具有相同深度的预设类型的人物4和动物1。示例性的,参见图12中的(c),终端可以通过显示提示信息来提示用户人物4和动物1为目标主体。
可以理解的是,终端将目标主体提示给用户的方式还可以有多种,本申请实施例对该方式不予具体限定。
在一些实施例中,终端自动确定目标主体后,还可以响应于用户的操作修改目标主体,例如切换、增加或删除目标主体等。
例如,在图13中的(a)所示的情况下,终端自动确定的目标主体为人物1,终端检测到用户点击预览图像上的人物5的操作后,如图13中的(b)所示,将目标主体由人物1修改为人物5。
再例如,在图14中(a)所示的情况下,终端自动确定的目标主体为人物1,终端检测到用户拖动方框以同时框选人物1和人物5的操作后,如图14中的(b)所示,将目标主体由人物1修改为人物1和人物5。
再例如，在图15中的(a)所示的情况下，终端自动确定的目标主体为人物1和人物5，终端检测到用户点击人物5的操作后，如图15中的(b)所示，将目标主体由人物1和人物5修改为人物1。
再例如,终端根据用户的指示首先进入目标主体修改模式后,再响应于用户的操作修改目标主体。
可以理解的是,用户修改目标主体的方式还可以有多种,本申请实施例对该方式不予具体限定。
(2)、用户指定目标主体,该目标主体包括一个或多个物体。
终端进入希区柯克模式后,可以响应于用户在预览界面上的预设操作确定目标主体。该预设操作用于指定某个或某些物体为目标对象。其中,该预设操作可以是触摸操作、语音指令操作或手势操作等,本申请实施例不予限定。例如,该触摸操作可以是单击、双击、长按、压力按或圈定对象的操作等。
示例性的,在图16中的(a)所示的预览界面上,终端检测到用户双击预览图像上人物1的操作后,如图16中的(b)所示将人物1确定为目标主体。
在另一些实施例中,终端进入希区柯克模式后,可以提示用户指定目标主体。示例性的,参见图17中的(a),终端可以显示提示信息:请指定目标主体,以使得目标主体在拍摄过程中的图像大小基本不变。而后,终端响应于用户在预览界面上的预设操作确定目标主体。比如,终端检测到用户划圈圈定图17中的(a)所示的人物1的操作后,如图17中的(b)所示将对应的人物1确定为目标主体。再比如,终端检 测到用户语音指示人物为目标主体的操作后,将人物1确定为目标主体。
再示例性的,在目标主体为预设物体类型且预设物体类型为人物的情况下,参见图18中的(a),终端可以显示提示信息:检测到一个人物,是否指定该人物为目标主体,以使得目标主体在拍摄过程中的图像大小基本不变?而后,终端响应于用户点击“是”控件的操作后,如图18中的(b)所示确定该人物为目标主体。
在一些实施例中,若终端默认目标主体仅包括一个物体,则当用户指定多个物体为目标主体时,终端可以提示用户:请仅选择一个物体作为目标主体。
与终端自动确定目标主体后类似,终端响应于用户的预设操作确定目标主体后,也可以通过显示提示信息或语音播报等方式将目标主体提示给用户。并且,终端也可以响应于用户的操作修改目标主体,例如切换、增加或删除目标主体。此处不再赘述。
其中,终端针对第一场景实时采集N+1个图像,是指终端在拍摄的过程中针对第一场景采集N+1个图像,而非在拍摄之前已经获取到的针对第一场景的N+1个图像。
可选的,如图19a所示,S301可以包括以下步骤S301a-S301d:
S301a:终端采用初始摄像头针对第一场景采集第一个图像,第一个图像包括目标主体。
S301b:终端采用初始摄像头针对第一场景采集第二个图像,第二个图像包括目标主体。
S301c:终端基于第i个图像的拍摄倍率,确定采集第i+1个图像的摄像头。i≥2,i是整数。具体实现方式可以参考下文。第i个图像包括目标主体。
S301d:终端采用所确定的采集第i+1个图像的摄像头,采集第i+1个图像。
以此类推,终端可以采集到N+1个图像。
在一种实现方式中,N+1个图像包括在前采集的N1+1个图像和在后采集的N2个图像,其中,N1+1个图像是由终端的第一摄像头采集得到,N2个图像是由终端的第二摄像头采集得到;N1和N2均是大于等于1的整数。也就是说,本申请实施例提供的技术方案可以应用于在切换摄像头的场景中拍摄希区柯克变焦视频。当然,具体实现时,在拍摄一段希区柯克变焦视频的过程中,可以多次切换摄像头。
在该实现方式中,结合上述S301a-S301d可知:
N1+1个图像中的第二个图像至第N1个图像相对第一个图像的缩放倍率,均属于第一拍摄倍率范围。第一拍摄倍率范围与第一摄像头对应。
N1+1个图像中的第N1+1个图像和N2个图像中的前N2-1个图像相对第一个图像的缩放倍率,均属于第二拍摄倍率范围。第二拍摄倍率范围与第二摄像头对应。
在另一种实现方式中,N+1个图像均由第一摄像头拍摄采集得到。也就是说,本申请实施例提供的技术方案可以应用于在不切换摄像头的场景中拍摄希区柯克变焦视频。
可选的,如图19b所示,S301c可以包括以下步骤:S301c-1至S301c-3:
S301c-1:终端基于第一个图像对第i个图像进行防抖处理。具体的,终端确定第一个图像中的特征点的位置,并基于第一个图像中的特征点的位置,对第i个图像中的与该特征点相匹配的特征点的位置进行运动补偿,从而实现对第i个图像进行防抖处理。
本申请实施例对终端执行防抖处理时所采用的防抖处理技术不进行限定,例如,防抖处理技术可以是光学防抖处理技术、人工智能(artificial intelligence,AI)防抖处理技术或电子处理防抖技术等。
需要说明的是，S301c-1是可选的步骤。对于上述N个图像中的每个图像均执行S301c-1之后，从整体上而言，有助于使进入缩放倍率计算模块（即终端中用于计算缩放倍率的模块）的视频（即所采集的后N个图像）抖动较弱（即使得该视频整体更加稳定/平滑），从而使得所获得的缩放倍率的精确度更大。
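作为理解上述防抖处理的参考，下面给出一段基于特征点匹配与仿射补偿的最小化Python（OpenCV）示意。本申请并未限定具体的防抖处理技术，这里仅演示“以第一个图像中的特征点位置为基准，对第i个图像做运动补偿”的思路：

```python
import cv2
import numpy as np

def stabilize_to_first(first_gray, cur_gray, cur_frame):
    """以第一个图像为基准对当前（第i个）图像做运动补偿的示意实现。"""
    orb = cv2.ORB_create(500)
    k1, d1 = orb.detectAndCompute(first_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k2[m.trainIdx].pt for m in matches])  # 第i个图像中的特征点
    dst = np.float32([k1[m.queryIdx].pt for m in matches])  # 第一个图像中的匹配点
    M, _ = cv2.estimateAffinePartial2D(src, dst)            # 估计补偿用的仿射变换
    h, w = cur_frame.shape[:2]
    return cv2.warpAffine(cur_frame, M, (w, h))             # 补偿后的第i个图像
```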
S301c-2:终端获取第i个图像的拍摄倍率。如果终端执行S301c-1,则这里的第i个图像具体是经防抖处理后的第i个图像。
可选的,第i个图像的拍摄倍率是基于第i个图像相对于第一个图像的缩放倍率,和采集第一个图像的摄像头的倍率确定的。第i个图像相对第一个图像的缩放倍率是:基于第i个图像中的目标主体的大小和第一个图像中的目标主体的大小确定的。
例如,ci=c1/(di/d1)。其中,di是第i个图像中的目标主体的大小,d1是第一个图像中的目标主体的大小。di/d1是第i个图像相对第一个图像的缩放倍率。c1是第一摄像头的倍率,ci是第i个图像的拍摄倍率。
可选的,图像中的目标主体的大小通过以下特征1-4中的至少一个特征来表征:
特征1:该图像中的目标主体的宽度。
特征2:该图像中的目标主体的高度。
特征3:该图像中的目标主体的面积。
特征4:该图像中的目标主体所占的像素点的数量。
例如,以特征2表征目标主体在一个图像中的大小为例,第i个图像的拍摄倍率可以基于公式ci=c1/(hi/h1)得到。其中,hi是第i个图像中的目标主体的高度,h1是第一个图像中的目标主体的高度。hi/h1是第i个图像相对于第一个图像的缩放倍率。c1是第一摄像头的倍率,ci是第i个图像的拍摄倍率。
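上述拍摄倍率的计算可以用如下最小化的Python示意表达（变量名为示例性假设）：

```python
def shooting_magnification(h1, hi, c1):
    """按 ci = c1 / (hi / h1) 计算第i个图像的拍摄倍率。
    h1/hi: 第一个/第i个图像中目标主体的高度；c1: 采集第一个图像的摄像头的倍率。"""
    return c1 / (hi / h1)

# 例如：h1=400像素，hi=200像素，c1=0.6（广角摄像头），则 ci = 0.6/0.5 = 1.2
```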
可选的,S301c-2可以包括:终端从第i个图像中提取目标主体,并基于该目标主体的大小与第一个图像的目标主体的大小,确定第i个图像相对于第一个图像的缩放倍率。
本申请实施例对终端从图像中提取目标主体的具体实现方式不进行限定。例如,终端通过主体分割算法、主体骨骼点检测算法和主体轮廓检测算法等中的一种或多种,从图像中提取目标主体。
示例的,主体分割算法包括实例分割算法。具体的,终端使用实例分割算法从第i个图像提取目标主体的实例分割掩膜,然后,将从第i个图像提取的目标主体的实例分割掩膜的大小除以从第一个图像提取的目标主体的实例分割掩膜的大小,得到第i个图像相对于第一个图像的缩放倍率。如图20所示,为本申请实施例提供的一种实例分割的示意图。其中,图20中的a图表示第i个图像,图20中的b图表示第i个图像中的目标主体的实例分割掩膜,在该掩膜中,像素值大于0的像素是表示目标主体的像素,其他区域是表示背景的像素。
实例分割算法是像素级别的分割方法，基于实例分割算法提取的目标主体的精确度更大，这有助于使得终端计算得到的缩放倍率更精确。比如，在目标主体包括多人的情况下，也可以有效地区分主体人像和背景。
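作为参考，下面给出“从实例分割掩膜确定目标主体大小”的最小化Python示意（掩膜中像素值大于0的像素表示目标主体）：

```python
import numpy as np

def subject_size(mask):
    """从实例分割掩膜计算目标主体的大小，同时给出宽度（特征1）、
    高度（特征2）和像素点数量（特征4）三种表征。"""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                                   # 图像中未检测到目标主体
    return {"width": int(xs.max() - xs.min() + 1),    # 目标主体的宽度
            "height": int(ys.max() - ys.min() + 1),   # 目标主体的高度
            "pixels": int(ys.size)}                   # 目标主体所占像素点数量
```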
S301c-3:如果第i个图像的拍摄倍率在第一拍摄倍率范围,则终端确定采集第i+1个图像的摄像头是第一摄像头。如果第i个图像的拍摄倍率在第二拍摄倍率范围,则确定采集第i+1个图像的摄像头是第二摄像头。第一拍摄倍率范围与第一摄像头对应,第二拍摄倍率范围与第二摄像头对应。
也就是说,终端基于第i个图像的拍摄倍率,确定采集第i+1个图像的摄像头。由于相邻两个图像中的目标主体的大小/目标主体的位置相差不会太大,因此,终端可以基于第i个图像的拍摄倍率所确定的摄像头,采集第i+1个图像。
可选的,第一摄像头的倍率是a,第二摄像头的倍率是b;a<b;第一拍摄倍率范围是[a,b),第二拍摄倍率范围是大于等于b的范围。
例如,以终端中包括广角摄像头、主摄摄像头为例,由于广角摄像头的倍率是0.6,主摄摄像头的倍率等于1,因此,广角摄像头对应的拍摄倍率范围是[0.6,1),主摄摄像头对应的拍摄倍率范围是大于等于1的范围。
进一步可选的,如果终端中还包括第三摄像头,且第三摄像头的倍率是c,a<b<c,则第一拍摄倍率范围是[a,b),第二拍摄倍率范围是[b,c);第三摄像头对应的拍摄倍率范围是大于等于c的范围。
例如,以终端中包括广角摄像头、主摄摄像头和长焦摄像头为例,由于广角摄像头的倍率是0.6,主摄摄像头的倍率是1,长焦摄像头的倍率是w,w是大于1的整数,因此,广角摄像头对应的拍摄倍率范围是[0.6,1),主摄摄像头对应的拍摄倍率范围是[1,w),长焦摄像头对应的拍摄倍率范围是大于等于w的范围。
基于此,在示例1中,若通过广角摄像头采集第一个图像,且经计算得到第i个图像相对于第一个图像的缩放倍率是0.5,则第i个图像的拍摄倍率是0.6/0.5=1.2。假设长焦摄像头的倍率是10(即w=10),由于第i个图像的拍摄倍率(即1.2)在主摄摄像头对应的拍摄倍率范围(即[1,10))内,因此,终端确定使用主摄摄像头采集第i+1个图像。
又如,以终端中包括广角摄像头、主摄摄像头、第一长焦摄像头和第二长焦摄像头为例,由于广角摄像头的倍率是0.6,主摄摄像头的倍率是1,第一长焦摄像头的倍率是w1,第二长焦摄像头的倍率是w2,1<w1<w2,则广角摄像头对应的拍摄倍率范围是[0.6,1),主摄摄像头对应的拍摄倍率范围是[1,w1),第一长焦摄像头对应的拍摄倍率范围是[w1,w2),第二长焦摄像头对应的拍摄倍率范围是大于等于w2的范围。
基于此,在示例2中,若通过广角摄像头采集第一个图像,且经计算得到第i个图像相对于第一个图像的缩放倍率是0.2,则第i个图像的拍摄倍率是0.6/0.2=3。假设第一长焦摄像头的倍率是2(即w1=2),第二长焦摄像头的倍率是10(即w2=10),由于第i个图像的拍摄倍率(即3)在第一长焦摄像头对应的拍摄倍率范围(即[2,10))内,因此,终端确定使用第一长焦摄像头采集第i+1个图像。
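结合上述两个示例，摄像头的选择逻辑可以用如下最小化的Python示意表达：各摄像头的倍率按升序排列，倍率为b的摄像头对应拍摄倍率范围[b, 下一档倍率)，最大档对应大于等于其倍率的范围。该示意假设拍摄倍率落在各摄像头覆盖的范围内：

```python
def pick_camera(ci, magnifications):
    """按第i个图像的拍摄倍率ci选择采集第i+1个图像的摄像头（以倍率表示）。
    magnifications: 终端各摄像头的倍率，升序，如 [0.6, 1, 2, 10]。"""
    chosen = magnifications[0]
    for m in magnifications:
        if ci >= m:
            chosen = m            # 落在 [m, 下一档倍率) 时选倍率为m的摄像头
    return chosen

# 示例1：ci = 1.2，magnifications = [0.6, 1, 10]，返回1（即主摄摄像头）
```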
需要说明的是,在具体实现时,终端也可以在第i个图像的拍摄倍率达到某一摄像头对应的拍摄倍率范围的最小临界值之前的一个小范围内时,就切换成该摄像头,以降低因切换摄像头而导致的拍摄时延的问题。
如图21所示,为本申请实施例提供的一种终端采集N+1个图像的过程示意图。其中,该终端包括摄像头1至摄像头x,x是大于等于2的整数,编号越大的摄像头的倍率越大。
基于图21,终端采集N+1个图像的过程可以包括以下步骤:
终端使用摄像头1(即初始摄像头)采集第1个图像。
终端使用摄像头1采集第2个图像,并基于第1个图像对第2个图像进行防抖处理,然后获取经防抖处理的第2个图像相对第1个图像的缩放倍率。如果基于该缩放倍率确定第2个图像的拍摄倍率在摄像头a对应的拍摄倍率范围内,则使用摄像头a采集第3个图像。其中,1≤a≤x。
终端使用摄像头a采集第3个图像，并对第3个图像进行防抖处理，然后获取经防抖处理的第3个图像相对第1个图像的缩放倍率。如果基于该缩放倍率确定第3个图像的拍摄倍率在摄像头b对应的拍摄倍率范围内，则使用摄像头b采集第4个图像。其中，a≤b≤x。
以此类推,终端采集第4个至第N+1个图像。
可选的,N+1个图像中在后采集的图像中的目标主体的大小小于第一个图像中的目标主体的大小。
在一种实现方式中,N+1个图像中,在后采集的图像中的目标主体的大小,小于在前采集的图像中的目标主体的大小。如图22a所示,为本申请实施例提供的一种终端在S301中所采集到的图像的示意图。图22a中的a图表示终端采集的第一个图像,b图表示终端采集的第二个图像,c图表示终端采集的第三个图像。
在另一种实现方式中,N+1个图像中,在后采集的图像中的目标主体的大小,可能大于在前采集的图像中的目标主体的大小,但小于第一个图像中的目标主体的大小。如图22b所示,为本申请实施例提供的一种终端在S301中所采集到的图像的示意图。图22b中是以N+1=3为例。图22b中的a图表示终端采集的第一个图像,b图表示终端采集的第二个图像,c图表示终端采集的第三个图像。
以下通过一个示例说明,N+1个图像中,在后采集的图像中的目标主体的大小,可能大于在前采集的图像中的目标主体的大小,但小于第一个图像中的目标主体的大小。
假设第一个图像中的目标主体的大小是d,采集第一个图像的摄像头是1X摄像头,那么,由于采集第一个图像和第二个图像使用的均是1X摄像头,因此,以第二个图像中的目标主体的大小是d/2为例,第二个图像相对于第一个图像的缩放倍率是0.5,因此,第二个图像的拍摄倍率是1/0.5=2。后续终端可以采用2X摄像头采集第三个图像。一方面,由于终端采用2X摄像头采集第三个图像,而采用1X摄像头采集第二个图像,因此,可能第三个图像中的目标主体的大小大于第二个图像中的目标主体的大小。另一方面,由于在采集第二个至第三个图像的过程中,终端距离目标主体越来越远,因此,第三个图像中的目标主体的大小小于第一个图像中的目标主体的大小。
可选的,该方法还可以包括:终端在当前预览界面中,显示第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。
例如，终端可以在当前使用的摄像头是该终端中最大倍率的摄像头时，在当前预览界面中，显示第一信息。对于用户来说，可以在获取到第一信息之后的一段时间内停止拍摄希区柯克变焦视频。
在终端与目标主体之间的距离越来越远的情况下拍摄希区柯克变焦视频,若当前使用的摄像头是该终端中最大倍率的摄像头,则由于当前摄像头的倍率不能更大,因此终端不能再切换摄像头,此时,通过在当前预览界面上显示第一信息,有助于提示用户及时停止拍摄视频,否则终端后续采集的图像仅能通过放大,来使得放大后的图像中的目标主体的大小与第一个图像中的目标主体的大小一致,这会导致基于后续采集的图像生成的目标图像的清晰度不高。也就是说,本申请实施例提供了一种指导用户停止拍摄希区柯克变焦视频的方法,这有助于提高用户体验。
本申请实施例对第一信息具体包含哪些信息来指示停止拍摄希区柯克变焦视频不进行限定。例如，可以直接指示“当前使用的摄像头是该终端中最大倍率的摄像头”，也可以通过指示“请停止录制视频”来间接指示当前使用的摄像头是该终端中最大倍率的摄像头。如图23a所示，为本申请实施例提供的一种当前预览界面的示意图。当前预览界面中包含当前播放的希区柯克变焦视频的图像501（即当前预览图像），以及第一信息“请停止录制视频”502。
可选的,该方法还可以包括:终端在当前预览界面中,显示第二信息,第二信息用于指示目标主体静止。
例如,终端可以在确定当前预览图像中的目标主体的位置,与在前的预览图像中的目标主体的位置一致的情况下,在当前预览界面中显示第二信息。由于希区柯克变焦视频的要求之一是各图像中的目标主体的位置一致,因此,这样,用户可以在获取希区柯克变焦视频的过程中,获知当前是否满足获取希区柯克变焦视频的要求,从而提高用户体验。
本申请实施例对第二信息具体包含哪些信息来指示目标主体静止不进行限定。例如，如图23b所示，为本申请实施例提供的一种当前预览界面的示意图。当前预览界面中包含当前呈现的希区柯克变焦视频的图像501（即当前预览图像），以及第二信息“目标主体静止”503。
可选的，该方法还可以包括：终端在当前预览界面中，显示第三信息，第三信息用于指示目标主体在当前预览图像的中央。这样，用户可以在终端没有显示第三信息时移动终端，从而使得目标主体在当前预览图像的中央，这有助于提高希区柯克视频的质量。
例如，终端可以在检测到当前预览图像中目标主体的位置（如目标主体的中心，或目标主体的轮廓或目标主体的最小外接矩形等），在当前预览图像的预设中央区域（即以当前预览图像的中心为中心的一个预设区域）中时，在当前预览界面中，显示第三信息。
本申请实施例对第三信息具体包含哪些信息,来指示目标主体在当前预览图像的中央不进行限定。例如,如图23c所示,为本申请实施例提供的一种当前预览界面的示意图。当前预览界面中包含当前呈现的希区柯克变焦视频的图像501(即当前预览图像),以及第三信息“目标主体在当前预览图像的中央”504。
可替换地，终端可以在当前预览界面中，显示第四信息，第四信息用于指示目标主体不在当前预览图像的中央。这样，用户可以在终端显示了第四信息时移动终端，从而使得目标主体在当前预览图像的中央，这有助于提高希区柯克视频的质量。
可选的,针对第一场景实时采集N+1个图像,包括:在目标主体在当前预览图像的中央时,采集第一个图像。这样,有助于提高希区柯克视频的质量。
可选的，在采集N+1个图像的过程中，终端的移动速度小于等于预设速度。
由于终端中的摄像头的数量有限,而终端移动速度过快可能导致切换摄像头的速度过快,而当切换到最大倍率的摄像头时,不能再切换摄像头。在使用最大倍率的摄像头采集图像时,随着终端距离目标主体越来越远,在后采集的图像中的目标主体越来越小,在生成希区柯克变焦视频时,需要对这些图像进行放大,这会导致图像的清晰度不高。基于此,提出该可能的设计。这样,有助于提高希区柯克变焦视频的质量。
本申请实施例对预设速度的具体取值不进行限定,例如,可以是经验值。
S302:对于N+1个图像中后采集的N个图像,终端基于预设神经网络进行白平衡处理,得到N个优化图像。预设神经网络用于保证时域相邻图像的白平衡一致性。
这里的预设神经网络，可以是上文中本申请实施例提供的白平衡网络，如图8所示的白平衡网络。终端可以从网络设备下载预设神经网络，或者可以通过本地训练得到预设神经网络。本申请实施例对此不进行限定。
可选的，由于不同摄像头拍摄的图像之间存在一些差异，因此，对于N+1个图像中后采集的N个图像中的每个图像，终端除了进行白平衡处理之外，还可以对亮度、色度等参数进行矫正，以避免（或尽量避免）因切换摄像头而导致的图像不一致的问题。
例如，对于图像色度和亮度来说，分别获取时域相邻的图像的亮度值/色度值，并获取亮度值/色度值的乘性因子或加性因子对该图像的亮度值/色度值进行换算，使在后的图像换算后的亮度值/色度值接近在前的图像的亮度值/色度值，从而使得时域相邻的图像的亮度值/色度值保持一致性。
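作为参考，下面给出用乘性因子换算亮度的最小化Python示意（以YUV图像的Y通道为例；色度通道可按同样方式处理，加性因子的做法在此从略）：

```python
import numpy as np

def match_brightness(prev_y, cur_y, cur_frame_yuv):
    """用乘性因子把在后图像的亮度均值换算到接近在前图像。
    prev_y/cur_y: 相邻两帧的Y通道；cur_frame_yuv: 在后图像（HxWx3，YUV）。"""
    factor = prev_y.mean() / max(cur_y.mean(), 1e-6)
    out = cur_frame_yuv.astype(np.float32)
    out[..., 0] = np.clip(out[..., 0] * factor, 0, 255)  # 仅换算亮度通道
    return out.astype(np.uint8)
```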
S303:终端对N个优化图像进行放大并裁剪,得到N个目标图像;其中,N个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致,N个目标图像中目标主体的相对位置,与第一个图像中目标主体的相对位置一致;N个目标图像与第一个图像的大小一致。
基于上文中的描述可知,在后采集的图像中的目标主体的大小小于第一个图像中的目标主体的大小,因此,终端需要对在后采集的N个图像的对应的N个优化图像进行放大并裁剪。
终端对N个优化图像进行放大,以使得放大后的N个优化图像中的目标主体的大小与第一个图像中的目标主体大小一致,且放大后的N个优化图像中的目标主体的相对位置,与第一个图像中目标主体的相对位置一致。终端对放大后的N个图像进行裁剪,得到N个目标图像,以使得N个目标图像中的每个图像与第一图像的大小一致。
在一个示例中，如图24所示。图24中的a图表示N+1个图像中的第一个图像，图24中的b图表示N个优化图像中的其中一个优化图像。图24中的c图表示对图24中的b图所示的优化图像进行放大后得到的图像，其中，放大后的图像中的目标主体的大小与图24中的a图中所示的第一个图像中的目标主体的大小一致。图24中的d图表示对图24中的c图所示的图像进行裁剪后得到的目标图像，该目标图像的尺寸与图24中的a图所示的第一个图像的尺寸相同，其中，在裁剪的过程中，尽量保证该目标图像中目标主体的位置与第一个图像中目标主体的位置一致。
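上述“放大并裁剪”步骤可以用如下最小化的Python（OpenCV）示意表达，其中目标主体的高度与中心位置假设已由前述主体提取步骤得到：

```python
import cv2

def zoom_and_crop(opt_img, subj_h, subj_h_first, subj_center, out_size):
    """对优化图像放大（使目标主体与第一个图像中的目标主体等高），
    再裁剪回第一个图像的尺寸，裁剪时尽量保持目标主体的相对位置。"""
    scale = subj_h_first / subj_h                       # 放大倍数
    big = cv2.resize(opt_img, None, fx=scale, fy=scale)
    out_w, out_h = out_size
    cx, cy = int(subj_center[0] * scale), int(subj_center[1] * scale)
    x0 = max(0, min(cx - out_w // 2, big.shape[1] - out_w))
    y0 = max(0, min(cy - out_h // 2, big.shape[0] - out_h))
    return big[y0:y0 + out_h, x0:x0 + out_w]            # 与第一个图像等大的目标图像
```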
需要说明的是,由于“终端与目标主体之间的距离”与“终端与背景中的对象之间的距离”不同,因此,在采集N+1个图像中的不同的两个图像中,这两个图像中的目标主体的缩放倍率与这两个图像中背景中的同一对象的缩放倍率不同。这会使得对一个优化图像进行放大裁剪之后得到的目标图像中的背景,与第一个图像中的背景不同。例如,图24中的a图和d图中的背景不同。基于此,终端可以基于N个目标图像和第一个图像生成“不同图像中的目标主体的大小一致,相对位置一致,而背景不一致”的希区柯克变焦视频。
S304:终端基于N个目标图像和第一个图像生成希区柯克变焦视频。
希区柯克变焦视频中的相邻图像之间的播放时间间隔可以是预定义的。
在一种实现方式中,终端在拍摄的过程中,实时呈现希区柯克变焦视频。也就是说,边生成希区柯克变焦视频边呈现所生成的希区柯克变焦视频。
该情况下,终端执行上述S302和S303时:对于第i个图像,在执行S301d之后将第i个图像输入到预设神经网络中,得到第i个图像对应的优化图像。并对第i个图像对应的优化图像进行放大并裁剪,得到该优化图像对应的目标图像。也就是说,在采集到一个图像之后,即可对该图像进行白平衡处理、放大、裁剪、呈现(即显示目标图像)等。并且,在对该图像进行处理的过程中,可以采集并处理下一个图像。而非在获得N+1个图像之后才执行S302,以及在针对所有图像执行S302之后才执行S303。
在另一种实现方式中,终端可以在获得N+1个图像之后才执行S302,以及在针对所有图像执行S302之后才执行S303。也就是说,终端对所采集的N+1个图像进行后处理,从而获得希区柯克变焦视频。
需要说明的是,如果在获得N+1个图像的过程中,采集图像的摄像头进行了切换,则可能由于切换摄像头导致目标主体在切换摄像头前后所采集的不同图像中的位置不同,也就是说,目标主体可能会出现不稳定的情况。
基于此,可选的,该方法还可以包括:终端获取第一个图像中的目标主体的位置信息,以及,N个目标图像中的每个目标图像中的目标主体的位置信息;然后,对于N个目标图像中的每个目标图像,基于主体稳像算法和第一个图像中的目标主体的位置信息,对该目标图像进行稳像处理,得到新的目标图像。其中,新的目标图像中目标主体的位置与第一个图像中的目标主体的位置一致。
示例的,若上文中采用实例分割算法得到目标主体的掩膜,则可以基于目标主体掩膜获得目标主体在相应图像中的位置信息。结合目标主体的位置信息对目标主体位置区域进行特征点检测,可以有效排除目标主体区域外的特征点的影响。通过对目标主体区域的特征点进行稳像处理,可以得到新的目标图像。
本申请实施例对主体稳像算法不进行限定。例如,可以是AI稳像算法。
基于此，上述S304中，终端基于N个目标图像和第一个图像生成希区柯克变焦视频，具体包括：终端基于N个新的目标图像和第一个图像生成希区柯克变焦视频。这样，可以使得所得到的希区柯克变焦视频的主体稳定。
本申请实施例提供的拍摄视频的方法中,在获取希区柯克变焦视频的过程中,对实时采集到的N+1个图像中的后N个图像进行了白平衡处理,以使得处理后的图像与所采集的N+1个图像中的第一个图像的白平衡一致。这样,可以使得获得的希区柯克变焦视频的白平衡效果更好,从而提高希区柯克变焦视频的质量,提高了用户体验。另外,在本申请实施例中,终端可以通过切换摄像头的方式,放大后续采集的图像中的目标主体的大小,相比传统技术,有助于使所获得的希区柯克变焦效果视频的清晰度更高,从而提高用户体验。
如图25所示,为本申请实施例提供的另一种拍摄视频的方法的流程示意图。该方法应用于终端。该终端包括至少两个摄像头。本实施例提供的技术方案应用于由远及近地拍摄希区柯克变焦视频的场景,即在拍摄希区柯克变焦视频的过程中,终端与目标主体之间的距离越来越近。
本实施例中是在终端距离目标主体越来越近的条件下拍摄希区柯克变焦视频，不切换摄像头时，在后采集的图像中的目标主体的大小大于在前采集的图像中的目标主体的大小。而希区柯克变焦视频中不同图像中的目标主体的大小一致，因此，需要对后采集的图像进行缩小。由于对在后采集的图像进行缩小之后，需要进行“补边”处理，以使“补边”后的图像的大小与终端采集的第一个图像的大小一致，这会导致“补边”后的图像出现黑边，从而在图像呈现时导致用户体验差。
例如,终端采集的第一个图像如图26中的a图所示,第二个图像如图26中的b图所示,对第二个图像进行缩小后得到的图像如图26中的c图所示,对图26中的c图进行“补边”后得到的图像如图26中的d图所示。
基于此,本实施例提供的技术方案的基本原理为:终端可以采用比采集在前的图像时所使用的摄像头的倍率更小的摄像头采集在后的图像,即通过切换成更小倍率的摄像头实现对目标主体的缩小,这样,不需要对采集后的图像进行“补边”,从而提高用户体验。
图25所示的方法可以包括以下步骤:
S400:终端确定由远及近地拍摄希区柯克变焦视频,并确定初始摄像头。
可选的,终端可以在用户的指示下,确定由远及近地拍摄希区柯克变焦视频。
例如,基于图11中的a图所示的用户界面,用户可以通过触摸操作点击“由远及近模式”402控件,响应于该触摸操作,终端突出显示“由远及近模式”402控件,同时开始启动以由远及近模式拍摄希区柯克变焦视频。
需要说明的是,终端还可以以其他方式(如语音指令方式等)启动以由远及近模式拍摄希区柯克变焦视频,本申请实施例对此不进行具体限定。
由于在本实施例中，如果终端切换摄像头，则切换成更小倍率的摄像头，因此，初始摄像头通常不是终端中倍率最小的摄像头，通常可以预定义初始摄像头是终端中倍率较大（如最大）的一个摄像头。
S401：终端针对第一场景采集N+1个图像，N+1个图像中均包括目标主体。其中，在采集N+1个图像的过程中，该终端距离目标主体越来越近。N是大于等于1的整数。N+1个图像中的第一个图像由终端的第一摄像头采集得到，N+1个图像中的后N个图像中的部分或全部图像由终端的第二摄像头采集得到，第二摄像头的倍率小于第一摄像头的倍率。N+1个图像中后采集的N个图像中目标主体的大小小于或等于N+1个图像中采集的第一个图像中的目标主体的大小。
也就是说,在本实施例中,终端采集图像的过程中,由大倍率的摄像头切换成了小倍率的摄像头,这有助于使得在终端距离目标主体越来越近的场景中,在后采集的图像中的目标主体的大小小于或等于在前采集的图像中的目标主体的大小。
在一种实现方式中,该N+1个图像是连续采集的N+1个图像,即实时采集的N+1个图像。
可选的,N+1个图像中的后N个图像包括在前采集的N1个图像和在后采集的N2个图像,其中,N1个图像由第二摄像头采集得到,N2个图像由终端的第三摄像头采集得到;N1和N2均是大于等于1的整数。
可选的，针对第一场景采集N+1个图像，包括：获取N+1个图像中的第i个图像的拍摄倍率；其中，2≤i≤N，i是整数；如果第i个图像的拍摄倍率在第一拍摄倍率范围内，则基于第二摄像头针对第一场景采集N+1个图像中的第i+1个图像；如果第i个图像的拍摄倍率在第二拍摄倍率范围内，则基于终端的第三摄像头针对第一场景采集N+1个图像中的第i+1个图像。其中，第二摄像头的倍率是b，第三摄像头的倍率是c；b>c；第一拍摄倍率范围是大于等于b的范围；第二拍摄倍率范围是[c,b)。该可选的实现方式中相关内容的解释以及示例，可以参考上文中的示例推理得到，此处不再赘述。
可选的,第一个图像的拍摄倍率大于第二摄像头的倍率。也就是说,终端的初始拍摄倍率大于采集第二个图像时所采用的摄像头的倍率。例如,终端中包含5X摄像头和1X摄像头,采集第一个图像所使用的摄像头可以是5X摄像头,此时的拍摄倍率可以是大于5的范围,或者[1,5)。采集第二个图像所使用的摄像头是1X摄像头。
可选的,该方法还可以包括:在当前预览界面中,显示第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。
例如,终端可以在当前使用的摄像头是该终端中最小倍率的摄像头时,在当前预览界面中,显示第一信息。对于用户来说,可以在获取到第一信息之后的一段时间内停止拍摄希区柯克变焦视频。
在终端与目标主体之间的距离越来越近的情况下拍摄希区柯克变焦视频,若当前使用的摄像头是该终端中最小倍率的摄像头,则由于当前摄像头的倍率不能更小,因此终端不能再切换摄像头,此时,通过在当前预览界面上显示第一信息,有助于提示用户及时停止拍摄视频,否则终端后续采集的图像需要进行缩小和补边,从而降低播放希区柯克变焦视频时的用户体验。也就是说,本申请实施例提供了一种指导用户停止拍摄希区柯克变焦视频的方法,这有助于提高用户体验。
可选的，该方法还可以包括：在当前预览界面中，显示第二信息，第二信息用于指示目标主体静止。可选的，该方法还可以包括：在当前预览界面中，显示第三信息，第三信息用于指示目标主体在当前预览图像的中央。可选的，针对第一场景采集N+1个图像，包括：在目标主体在当前预览图像的中央时，采集第一个图像。其具体实现方式和示例可以参考上文，此处不再赘述。
在一种可能的设计中，终端的移动速度小于等于预设速度。由于终端中的摄像头的数量有限，而终端移动速度过快可能导致切换摄像头的速度过快，而当切换到最小倍率的摄像头时，不能再切换摄像头。在使用最小倍率的摄像头采集图像时，随着终端距离目标主体越来越近，在后采集的图像中的目标主体越来越大，可能导致在后采集的图像中的目标主体的大小大于上述N+1个图像中的第一个图像中目标主体的大小。在生成希区柯克变焦视频时，需要对这些图像进行缩小并补边，从而降低用户体验。基于此，提出该可能的设计。这样，有助于提高希区柯克变焦视频的质量。
S402:对于N+1个图像中后采集的N个图像,终端基于预设神经网络进行白平衡处理,得到N个优化图像。其中,预设神经网络用于保证时域相邻图像的白平衡一致性。
S403:终端对N个优化图像进行放大并裁剪,得到N个目标图像。其中,该N个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致。该N个目标图像中的目标主体的相对位置与第一个图像中目标主体的相对位置一致。该N个目标图像与第一个图像的大小一致。
S404:终端基于该N个目标图像和第一个图像,生成希区柯克变焦视频。
关于S402-S404的具体实现方式可以参考上述S302-S304的相关描述,此处不再赘述。
本申请实施例提供的拍摄视频的方法中,在终端距离目标主体越来越近的场景中,通过切换成更小倍率的摄像头,使得在后采集的图像中目标主体的大小小于或等于在前采集的图像中的目标主体的大小,从而基于所采集的图像得到希区柯克变焦视频。并且,对采集到的N+1个图像中的后N个图像进行了白平衡处理,以使得处理后的图像与所采集的N+1个图像中的第一个图像的白平衡一致。这样,可以使得获得的希区柯克变焦视频的白平衡效果更好,从而提高希区柯克变焦视频的质量,提高了用户体验。另外,在本申请实施例中,终端可以通过切换摄像头的方式,缩小后续采集的图像中的目标主体的大小,相比传统技术,不需要对采集的图像进行“补边”处理,因此,能够提高用户体验。
另外,本申请还提供的一种拍摄希区柯克变焦视频的方法,该方法可以应用于终端距离目标主体越来越近的场景中。该方法可以包括以下步骤:
步骤1:可以参考上述S400。
步骤2:终端针对第一场景实时采集N+1个图像,N+1个图像中均包括目标主体。其中,在采集N+1个图像的过程中,该终端距离目标主体越来越近。N是大于等于1的整数。N+1个图像中的第一个图像由终端的第一摄像头采集得到,N+1个图像中的后N个图像中的部分或全部图像由终端的第二摄像头采集得到,第二摄像头的倍率小于第一摄像头的倍率。
在一些示例中,该步骤的具体实现方式可以参考上述S301的相关描述,此处不再赘述。
在另一些示例中，基于上述S301中确定采集图像所使用的摄像头的方案（如图19a所示）可知，采集N+1个图像中的第一个图像和第二个图像时所采用的摄像头相同。而本实施例中，在拍摄希区柯克变焦视频的过程中，终端距离目标主体越来越近，在不切换摄像头的情况下，在后采集的图像中的目标主体的大小大于在前采集的图像中的目标主体的大小。因此，采用如图19a所示的方法确定采集图像所使用的摄像头，会导致第二个图像中的目标主体的大小大于第一个图像中的目标主体的大小。
对此，在本申请实施例的一种解决方案中，终端采集第二个图像所使用的摄像头与采集第一个图像所使用的摄像头不同。并且，采集第二个图像所使用的摄像头的倍率小于采集第一个图像所使用的摄像头的倍率，这有助于实现第二个图像中的目标主体的大小小于第一个图像中的目标主体的大小。
步骤3:可以参考上述S402。
步骤4：终端对N个优化图像中满足第一条件的优化图像进行放大并裁剪，得到至少一个目标图像。其中，满足第一条件的优化图像是所包含的目标主体的大小小于第一个图像中的目标主体的大小的优化图像。该至少一个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致。该至少一个目标图像中的目标主体的相对位置与第一个图像中目标主体的相对位置一致。该至少一个目标图像与第一个图像的大小一致。
关于满足第一条件的优化图像的处理方式可以参考S303中的相关描述,此处不再赘述。
步骤5:终端基于该至少一个目标图像和第一个图像,生成希区柯克变焦视频。
例如,终端基于该至少一个目标图像和第一个图像和N个优化图像中不满足第一条件的图像,生成希区柯克变焦视频。
由于本实施例提供的技术方案,可能导致在后采集的图像中的目标主体的大小大于、等于或小于第一个图像中的目标主体的大小。因此,本实施例区分了满足第一条件的优化图像和不满足第一条件的优化图像。对于不满足第一条件的优化图像,可以将其直接作为希区柯克变焦视频中的一个图像,而不需要执行放大裁剪。
步骤5的具体实现方式,也可以参考S304中的相关描述。
可选的，N+1个图像中采集的第N+1个图像中目标主体的大小小于或等于第一个图像中目标主体的大小。
可选的,第N+1个图像中目标主体的大小大于第一个图像中目标主体的大小,且第N+1个图像中目标主体的大小与第一个图像中目标主体的大小的差值小于或等于预设阈值。也就是说,当第N+1个图像中目标主体的大小大于第一个图像中目标主体的大小时,二者相差不能太大,以使得基于第N+1个图像生成希区柯克变焦视频时能够满足“希区柯克变焦视频的不同图像中目标主体的大小一致”。
需要说明的是,在终端距离目标主体越来越近的场景下获取希区柯克变焦视频,在实际实现中,实时采集图像的过程中,如果终端因靠近目标主体而导致所采集的图像中的目标主体变大的程度,大于终端因将当前摄像头切换成更小倍率的摄像头(或因不切换摄像头)而导致所采集的图像中的目标主体缩小的程度,那么,可能导致在后采集的图像中的目标主体的大小大于第一个图像中的目标主体的大小。
对此:
在本申请实施例的一种解决方案中，如果终端当前采集的图像中的目标主体的大小与第一个图像中的目标主体的大小的差值大于预设阈值，且当前采集的图像中的目标主体的大小大于第一个图像中的目标主体的大小，则停止采集图像。相应的，终端采用在此之前采集的图像生成希区柯克变焦视频。
由于希区柯克变焦视频的要求之一是不同图像中目标主体的大小一致,“终端当前采集的图像中的目标主体的大小与第一个图像中的目标主体的大小的差值大于预设阈值,且当前采集的图像中的目标主体的大小大于第一个图像中的目标主体的大小”,说明:基于终端当前采集的图像,不能满足希区柯克变焦视频中不同图像中目标主体的大小一致,因此,停止采集图像。
基于此，可选的，该方法还可以包括：针对第一场景采集第N+2个图像，第N+2个图像包括目标主体；采集第N+2个图像时终端与目标主体之间的距离，小于采集第N+1个图像时终端与目标主体之间的距离。该情况下，上述步骤4包括：在第N+2个图像中的目标主体的大小与第一个图像中的目标主体的大小的差值大于预设阈值，且第N+2个图像中的目标主体的大小大于第一个图像中的目标主体的大小时，基于该N个目标图像和第一个图像，生成希区柯克变焦视频。
可选的,终端在基于上述方案停止采集图像时,可以输出第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。也就是说,本申请实施例提供了一种指导用户停止拍摄希区柯克变焦视频的方法,这样,用户可以基于该图像停止移动终端,从而提高用户体验。
本申请实施例对第一信息的具体实现方式不进行限定,例如第一信息可以以图像、文字、语音等方式输出。本实施例中,终端可以在基于上述方案,确定停止采集图像时,显示如图23a所示的当前预览界面,从而提示用户停止拍摄希区柯克变焦视频。
在本申请实施例的再一种解决方案中,N+1个图像由终端中的最小倍率的摄像头采集得到。在第N+1个图像中目标主体的大小大于第一个图像中目标主体的大小,且第N+1个图像中目标主体的大小与第一个图像中目标主体的大小的差值等于预设阈值时,输出第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。
一方面,由于N+1个图像由终端中的最小倍率的摄像头采集得到,因此,后续如果继续采集图像,则不能再切换摄像头。另一方面,希区柯克变焦视频的要求不同图像中目标主体的大小一致。而第N+1个图像中目标主体的大小大于第一个图像中目标主体的大小,且第N+1个图像中目标主体的大小与第一个图像中目标主体的大小的差值等于预设阈值时,说明第N+1个图像中目标主体的大小相比第一个图像中目标主体的大小的跳变程度,已达到获得希区柯克变焦视频的临界值,此时,如果继续采集图像,会因不能再切换摄像头而导致继续采集的图像中目标主体的大小相比第一个图像中目标主体的大小的跳变程度更大,从而导致不能获得希区柯克变焦视频。考虑到这一点,本申请实施例提供了上述停止拍摄希区柯克变焦视频的方法。
在本申请实施例的另一种解决方案中,S403中满足第一条件的优化图像可以替换为:所包含的目标主体的大小小于参考图像中的目标图像的大小的优化图像。该参考图像是N+1个图像中的“在该优化图像对应的图像之前,与该图像之间的距离最近,且所包含的目标主体的大小大于或等于第一个图像中所包含的目标主体的大小”的图像。
例如,以终端中包括0.6X、1X、2X、5X和10X摄像头为例,这几个摄像头对应的拍摄倍率范围分别为:[0.6,1),[1,2),[2,5),[5,10),大于等于10的范围。
采用10X摄像头采集第1个图像,该图像中目标主体的大小是d。
采用5X摄像头采集第2个图像,该图像中目标主体的大小是0.8d。由此可知,第2个图像的拍摄倍率是:5/(0.8d/d)=6.25,6.25∈[5,10),因此采集第3个图像的摄像头是5X摄像头。
采用5X摄像头采集第3个图像,该图像中目标主体的大小是1.5d。由此可知,第3个图像的拍摄倍率是:5/(1.5d/d)=3.33,3.33∈[2,5),因此采集第4个图像的摄像头是2X摄像头。
采用2X摄像头采集第4个图像,该图像中目标主体的大小是0.8d。由此可知,第4个图像的拍摄倍率是:2/(0.8d/d)=2.5,2.5∈[2,5),因此采集第5个图像的摄像头是2X摄像头。
……
基于该示例,满足第一条件的优化图像是第2个图像对应的优化图像,以及第4个图像对应的优化图像。对于第4个图像的优化图像来说,对其进行放大裁剪时,参考图像是第3个图像。对于第2个图像的优化图像来说,对其进行放大裁剪时,参考图像是第1个图像。
如图27所示,为本申请实施例提供的一种拍摄视频的方法的流程示意图。图27所示的方法应用于终端,终端包括至少两个摄像头,该至少两个摄像头的倍率不同。图27所示的方法可以包括以下步骤:
S500：终端通过至少两个摄像头在第一时刻针对第一场景分别采集至少两个图像；其中，一个摄像头对应一个图像，且至少两个图像中均包含目标主体。
也就是说,本申请实施例中,针对待拍摄视频所采集的图像是多个摄像头在同一时刻针对同一场景采集的图像。
S501：终端基于视频的预设播放时长和预设播放帧率，确定至少两个图像中的第一图像和第二图像之间的待插入图像的帧数N；其中，第一图像是至少两个图像中通过第一摄像头采集的图像，第一摄像头是至少两个摄像头中倍率最大的摄像头。第二图像是至少两个图像中通过第二摄像头采集的图像，第二摄像头是至少两个摄像头中倍率最小的摄像头。N是大于等于1的整数。
这是考虑到：基于不同摄像头在同一时刻针对同一场景采集的图像中，通过最大倍率的摄像头采集的图像中的目标主体的大小最大，而通过最小倍率的摄像头采集的图像中的目标主体的大小最小，而提出的技术方案。
S502:终端基于待插入图像的帧数N和该至少两个图像中的部分或全部图像,确定N个待插入图像。其中,该部分或全部图像至少包含第一图像和第二图像。
具体实现时,终端先提取所采集的该部分或全部图像中的每个图像中的目标主体的大小,再基于相应图像中的目标主体的大小,确定相应待插入图像的像素的值。具体示例可以参考图28所示的示例。
终端基于该至少两个图像中的越多图像确定待插入图像,越有助于提高插帧的准确率,从而使得最终生成的视频中的图像更能反映真实场景,进而提高用户体验。
S503:终端基于该至少两个图像和N个待插入图像,生成视频。其中,该视频的各图像中的目标主体的大小逐渐变大或逐渐变小。
在一个示例中，终端中设置有10X摄像头、3X摄像头、1X摄像头和0.6X摄像头。
执行S500时,终端在同一时刻基于这4个摄像头分别采集图像,得到图像1-4,如图28所示。其中,图28中的a-d图分别表示图像1-4。
执行S501时，假设待拍摄视频的预设播放时长是n秒，n是大于等于1的整数，且预设播放帧率是24帧/秒，即每秒共播放24帧图像，则该视频总共需要的图像的个数是n*24。由此可以得到待拍摄视频中相邻两帧之间的缩放倍率是(最大倍率/最小倍率)^(1/(n*24))，即(10/0.6)^(1/(n*24))。
以n=1为例，待拍摄视频中相邻两帧之间的缩放倍率为(10/0.6)^(1/24)≈1.124。
执行S502时,终端可以执行以下步骤:
首先,终端确定参考图像,并基于参考图像和待拍摄视频中相邻两帧之间的缩放倍率确定待插入图像的拍摄倍率,待插入图像的个数是N。其中,该参考图像可以是最大倍率的摄像头采集的图像(即图像1),或最小倍率的摄像头采集的图像(即图像4)。以参考图像是图像4为例,N个待插入图像的拍摄倍率分别是:0.6*1.124=0.6744,0.6744*1.124=0.758,0.758*1.124=0.852,……
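该步骤可以用如下最小化的Python示意表达（以0.6X至10X摄像头、参考图像为最小倍率图像为例；函数名与参数均为示例性假设），后文的“其次”步骤会用到这里得到的各待插入图像的拍摄倍率：

```python
def insertion_plan(mag_min=0.6, mag_max=10.0, seconds=1, fps=24):
    """按预设播放时长与帧率推算相邻两帧之间的缩放倍率，
    并给出各待插入图像的拍摄倍率（由小到大排列）。"""
    total = seconds * fps                          # 视频总帧数 n*24
    ratio = (mag_max / mag_min) ** (1.0 / total)   # 1秒、24帧/秒时约为1.124
    mags = [mag_min * ratio ** k for k in range(1, total)]
    return ratio, mags
```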
其次，终端基于任一待插入图像的拍摄倍率，以及终端所采集的、拍摄倍率分别大于和小于该待插入图像的拍摄倍率的两个图像中的像素的值，进行插帧，得到该待插入图像中的像素的值。以此类推，终端可以获得N个待插入图像。
可以理解的是，在进行插帧之前，终端需要对这两个图像进行目标主体检测，从而得到目标主体的大小。图28中的e-h图示意了目标主体检测的结果，这些图中矩形框中的部分表示目标主体。并且，图28中示意了插帧的步骤。
可选的，这两个图像是拍摄倍率大于该待插入帧图像的拍摄倍率且与该待插入帧图像的拍摄倍率之差最小的图像，以及，拍摄倍率小于该待插入帧图像的拍摄倍率且与该待插入帧图像的拍摄倍率之差最小的图像。
例如,对于拍摄倍率在0.6-1之间的待插入图像,采用图像4和图像3进行插帧。对于拍摄倍率在1-3之间的待插入图像,采用图像3和图像2进行插帧。对于拍摄倍率在3-10之间的待插入图像,采用图像2和图像1进行插帧。
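作为参考，下面给出按待插入图像的拍摄倍率合成一帧的简化Python示意。需要说明的是，该示意只利用了拍摄倍率不大于目标倍率且最接近的一帧，做“中心裁剪+放大”的数字变焦；本申请中结合高、低两档图像的像素值进行插帧的具体方式在此从略，且假设目标倍率落在各摄像头倍率覆盖的范围内：

```python
import cv2

def synthesize_frame(mag, shots):
    """mag: 待插入图像的拍摄倍率；shots: {摄像头倍率: 同一时刻采集的图像}。"""
    m_lo = max(m for m in shots if m <= mag)   # 低档参考帧（最接近的一档）
    img = shots[m_lo]
    h, w = img.shape[:2]
    k = m_lo / mag                             # 中心裁剪比例，k<=1
    ch, cw = max(1, int(h * k)), max(1, int(w * k))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return cv2.resize(img[y0:y0 + ch, x0:x0 + cw], (w, h))
```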
最后,终端将待插入图像和图像1-4,按照所包含的目标主体的大小由大到小的顺序或由小到大的顺序,生成视频(或动态图)。图28中示意了生成视频的步骤。
传统技术中，通常采用同一摄像头采集不同物距下的图像，从而生成视频，该视频的各图像中的目标主体的大小逐渐变大或逐渐变小。这可能因不同时刻采集图像时终端的位置偏移（如左右偏移或上下偏移），或者背景中的动态对象的移动等，而导致不同图像中的背景差异较大，从而降低了视频的质量。本实施例提供的拍摄视频的方法，终端通过多个摄像头在同一时刻针对同一场景采集多帧图像，并基于该多帧图像进行插帧，从而生成该视频。这样，相比传统技术，有助于提高所生成的视频的质量。另外，有助于提升动图效果的趣味性，增强用户对终端的粘性。
上述主要从方法的角度对本申请实施例提供的方案进行了介绍。为了实现上述功能，其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对终端进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
如图29所示,为本申请实施例提供的一种终端的结构示意图。图29所示的终端220可以用于实现上述方法实施例中终端的功能,因此也能实现上述方法实施例所具备的有益效果。在本申请的实施例中,该终端可以是如图1所示的终端100。
如图29所示,终端220包括采集单元221和处理单元222。可选的,如图30所示,终端220还包括显示单元223。
在一些实施例中:
采集单元221,用于针对第一场景实时采集N+1个图像,N+1个图像中均包括目标主体;其中,在采集N+1个图像的过程中,终端距离目标主体越来越远。N是大于等于1的整数。处理单元222,用于执行以下步骤:对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;预设神经网络用于保证时域相邻图像的白平衡一致性。对N个优化图像进行放大并裁剪,得到N个目标图像;其中,N个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致,N个目标图像中目标主体的相对位置,与第一个图像中目标主体的相对位置一致;N个目标图像与第一个图像的大小一致;基于N个目标图像和第一个图像生成希区柯克变焦视频。例如,结合图10,采集单元221用于执行S301,处理单元222用于执行S302-S304。
可选的,N+1个图像包括在前采集的N1+1个图像和在后采集的N2个图像,其中,N1+1个图像由终端的第一摄像头采集得到,N2个图像由终端的第二摄像头采集得到;N1和N2均是大于等于1的整数。
可选的,采集单元221具体用于:获取N+1个图像中的第i个图像的拍摄倍率,其中,2≤i≤N,i是整数;如果第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于第一摄像头针对第一场景采集N+1个图像中的第i+1个图像。如果第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于第二摄像头针对第一场景采集N+1个图像中的第i+1个图像;其中,第一摄像头的倍率是a,第二摄像头的倍率是b;a<b;第一拍摄倍率范围是[a,b);第二拍摄倍率范围是大于等于b的范围。例如,结合图19b,采集单元221可以用于执行S301c-3。
可选的,第i个图像的拍摄倍率是基于第i个图像中目标主体的大小相对于第一个图像中目标主体的大小的缩放倍率,和采集第一个图像的摄像头的倍率确定的。
可选的，第i个图像中的目标主体的大小通过以下至少一个特征来表征：第i个图像中的目标主体的宽度，第i个图像中的目标主体的高度，第i个图像中的目标主体的面积，或者，第i个图像中的目标主体所占的像素点的数量。
可选的,处理单元222还用于,采用实例分割算法从第i个图像中提取目标主体,以确定第i个图像中的目标主体的大小。
可选的,显示单元223,用于在当前预览界面中,显示第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。
可选的,显示单元223,用于在当前预览界面中,显示第二信息,第二信息用于指示目标主体静止。
可选的,显示单元223,用于在当前预览界面中,显示第三信息,第三信息用于指示目标主体在当前预览图像的中央。
可选的,采集单元221具体用于,在目标主体在当前预览图像的中央时,采集第一个图像。
可选的,显示单元223,用于显示用户界面,用户界面中包含第一控件,第一控件用于指示由近及远拍摄希区柯克变焦视频;以及,接收针对第一控件的操作。采集单元221具体用于,响应于操作,针对第一场景实时采集N+1个图像。
可选的,终端的移动速度小于等于预设速度。
可选的,预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,历史网络层是预测在待处理图像之前且与待处理图像时域连续的图像的白平衡增益时所使用的网络层。
可选的,预设神经网络基于预设约束条件训练得到;其中,预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
可选的,处理单元222在对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像的方面,具体用于:将N+1个图像中的第j个图像输入到预设神经网络,得到第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数。将第j个图像的白平衡增益预测值作用于第j个图像,得到第j个图像对应的优化图像;其中,N个优化图像包括第j个图像对应的优化图像。
在另一些实施例中:
采集单元221，用于针对第一场景实时采集N+1个图像，N+1个图像中均包括目标主体；其中，在采集N+1个图像的过程中，终端距离目标主体越来越近；N是大于等于1的整数。N+1个图像中的第一个图像由终端的第一摄像头采集得到，N+1个图像中的后N个图像中的部分或全部图像由终端的第二摄像头采集得到，第二摄像头的倍率小于第一摄像头的倍率。N+1个图像中后采集的N个图像中目标主体的大小小于或等于N+1个图像中采集的第一个图像中的目标主体的大小。处理单元222，用于执行以下步骤：对于后采集的N个图像，基于预设神经网络进行白平衡处理，得到N个优化图像；预设神经网络用于保证时域相邻图像的白平衡一致性。对N个优化图像进行放大并裁剪，得到N个目标图像，N个目标图像中的目标主体的大小与N+1个图像中采集的第一个图像中目标主体的大小一致，该N个目标图像中目标主体的相对位置，与第一个图像中目标主体的相对位置一致。该N个目标图像与第一个图像的大小一致。基于该N个目标图像和第一个图像，生成希区柯克变焦视频。例如，结合图25，采集单元221可以用于执行S401，处理单元222可以用于执行S402-S404。
可选的,后采集的N个图像包括在前采集的N1个图像和在后采集的N2个图像,其中,N1个图像由第二摄像头采集得到,N2个图像由终端的第三摄像头采集得到;N1和N2均是大于等于1的整数。
可选的,采集单元221在针对第一场景采集N+1个图像的方面,具体用于:获取N+1个图像中的第i个图像的拍摄倍率,其中,2≤i≤N,i是整数;如果第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于第二摄像头针对第一场景采集N+1个图像中的第i+1个图像。如果第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于第三摄像头针对第一场景采集N+1个图像中的第i+1个图像。其中,第二摄像头的倍率是b,第三摄像头的倍率是c;b>c;第一拍摄倍率范围是大于等于b的范围;第二拍摄倍率范围是[c,b)。
可选的,第i个图像的拍摄倍率是基于第i个图像中目标主体的大小相对于第一个图像中目标主体的大小的缩放倍率,和采集第一个图像的摄像头的倍率确定的。
可选的，第i个图像中的目标主体的大小通过以下至少一个特征来表征：第i个图像中的目标主体的宽度，第i个图像中的目标主体的高度，第i个图像中的目标主体的面积，或者，第i个图像中的目标主体所占的像素点的数量。
可选的,处理单元222还用于,采用实例分割算法从第i个图像中提取目标主体,以确定第i个图像中的目标主体的大小。
可选的,显示单元223,用于在当前预览界面中,显示第一信息,第一信息用于指示停止拍摄希区柯克变焦视频。
可选的,显示单元223,用于在当前预览界面中,显示第二信息,第二信息用于指示目标主体静止。
可选的,显示单元223,用于在当前预览界面中,显示第三信息,第三信息用于指示目标主体在当前预览图像的中央。
可选的,采集单元221具体用于:在目标主体在当前预览图像的中央时,采集第一个图像。
可选的,显示单元223,用于显示用户界面,用户界面中包含第二控件,第二控件用于指示由远及近拍摄希区柯克变焦视频;以及接收针对第二控件的操作。采集单元221具体用于,响应于操作,针对第一场景采集N+1个图像。
可选的,终端的移动速度小于等于预设速度。
可选的,预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,历史网络层是预测在待处理图像之前且与待处理图像时域连续的图像的白平衡增益时所使用的网络层。
可选的,预设神经网络基于预设约束条件训练得到;其中,预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
可选的,对于N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像,包括:将N+1个图像中的第j个图像输入到预设神经网络,得到第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数。将第j个图像的白平衡增益预测值作用于第j个图像,得到第j个图像对应的优化图像;其中,N个优化 图像包括第j个图像对应的优化图像。
在另一些实施例中:
采集单元221包括第一摄像头和第二摄像头,第一摄像头的倍率与第二摄像头的倍率不同。采集单元221,用于通过第一摄像头和第二摄像头在第一时刻针对第一场景分别采集第一图像和第二图像;其中,第一图像和第二图像中均包含目标主体。处理单元222,用于执行以下步骤:基于视频的预设播放时长和预设播放帧率,确定第一图像和第二图像之间的待插入图像的帧数N;其中,N是大于等于1的整数。基于帧数N、第一图像和第二图像,确定N个待插入图像。基于第一图像、第二图像和待插入图像,生成视频;其中,该视频的各图像中目标主体的大小逐渐变大或逐渐变小。例如,结合图27,采集单元221可以用于执行S500,处理单元222可以用于执行S501-S503。
可选的，采集单元221还包括第三摄像头，第三摄像头的倍率在第一摄像头与第二摄像头的倍率之间。采集单元221还用于，通过第三摄像头在第一时刻针对第一场景采集第三图像；其中，第三图像包含目标主体。处理单元222在基于帧数N、第一图像和第二图像，确定N个待插入图像的方面，具体用于：基于帧数N、第一图像、第二图像和第三图像，确定N个待插入图像。
关于上述可选方式的具体描述可以参见前述的方法实施例,此处不再赘述。此外,上述提供的任一种终端220的解释以及有益效果的描述均可参考上述对应的方法实施例,不再赘述。
作为示例，结合图1，上述采集单元可以通过摄像头193实现。上述处理单元222的功能，均可以通过处理器110调用存储在内部存储器121中的程序代码实现。
本申请另一实施例还提供一种终端，包括：处理器、存储器和摄像头，摄像头用于采集图像，存储器用于存储计算机程序和指令，处理器用于调用计算机程序和指令，与摄像头协同执行上述方法实施例所示的方法流程中该终端执行的相应步骤。
本申请另一实施例还提供一种计算机可读存储介质，该计算机可读存储介质中存储有指令，当指令在终端上运行时，使得终端执行上述方法实施例所示的方法流程中该终端执行的各个步骤。
在一些实施例中,所公开的方法可以实施为以机器可读格式被编码在计算机可读存储介质上的或者被编码在其它非瞬时性介质或者制品上的计算机程序指令。
应该理解,这里描述的布置仅仅是用于示例的目的。因而,本领域技术人员将理解,其它布置和其它元素(例如,机器、接口、功能、顺序、和功能组等等)能够被取而代之地使用,并且一些元素可以根据所期望的结果而一并省略。另外,所描述的元素中的许多是可以被实现为离散的或者分布式的组件的、或者以任何适当的组合和位置来结合其它组件实施的功能实体。
在上述实施例中，可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时，可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机执行指令时，全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线（例如同轴电缆、光纤、数字用户线(digital subscriber line，DSL)）或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质（例如，软盘、硬盘、磁带），光介质（例如，DVD）、或者半导体介质（例如固态硬盘(solid state disk，SSD)）等。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (65)

  1. 一种拍摄视频的方法,其特征在于,所述方法应用于终端,所述方法包括:
    针对第一场景实时采集N+1个图像,所述N+1个图像中均包括目标主体;其中,在采集所述N+1个图像的过程中,所述终端距离所述目标主体越来越远;N是大于等于1的整数;
    对于所述N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;所述预设神经网络用于保证时域相邻图像的白平衡一致性;
    对所述N个优化图像进行放大并裁剪,得到N个目标图像;其中,所述N个目标图像中所述目标主体的大小与所述N+1个图像中采集的第一个图像中所述目标主体的大小一致,所述N个目标图像中所述目标主体的相对位置,与所述第一个图像中所述目标主体的相对位置一致;所述N个目标图像与所述第一个图像的大小一致;
    基于所述N个目标图像和所述第一个图像生成希区柯克变焦视频。
  2. 根据权利要求1所述的方法,其特征在于,
    所述N+1个图像包括在前采集的N1+1个图像和在后采集的N2个图像,其中,所述N1+1个图像由所述终端的第一摄像头采集得到,所述N2个图像由所述终端的第二摄像头采集得到;所述N1和所述N2均是大于等于1的整数。
  3. 根据权利要求1或2所述的方法,其特征在于,所述针对第一场景实时采集N+1个图像,包括:
    获取所述N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;
    如果所述第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于所述终端的第一摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    如果所述第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于所述终端的第二摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    其中,所述第一摄像头的倍率是a,所述第二摄像头的倍率是b;a<b;所述第一拍摄倍率范围是[a,b);所述第二拍摄倍率范围是大于等于b的范围。
  4. 根据权利要求3所述的方法,其特征在于,
    所述第i个图像的拍摄倍率是基于所述第i个图像中所述目标主体的大小相对于所述第一个图像中所述目标主体的大小的缩放倍率,和采集所述第一个图像的摄像头的倍率确定的。
  5. 根据权利要求4所述的方法,其特征在于,所述第i个图像中的所述目标主体的大小通过以下至少一个特征来表征:
    所述第i个图像中的所述目标主体的宽度,
    所述第i个图像中的所述目标主体的高度,
    所述第i个图像中的所述目标主体的面积,或者,
    所述第i个图像中的所述目标主体所占的像素点的数量。
  6. 根据权利要求4或5所述的方法,其特征在于,所述方法还包括:
    采用实例分割算法从所述第i个图像中提取所述目标主体,以确定所述第i个图像中的所述目标主体的大小。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第一信息,所述第一信息用于指示停止拍摄希区柯克变焦视频。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第二信息,所述第二信息用于指示所述目标主体静止。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第三信息,所述第三信息用于指示所述目标主体在当前预览图像的中央。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述针对第一场景实时采集N+1个图像,包括:
    在所述目标主体在当前预览图像的中央时,采集所述第一个图像。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述方法还包括:
    显示用户界面,所述用户界面中包含第一控件,所述第一控件用于指示由近及远拍摄希区柯克变焦视频;
    所述针对第一场景实时采集N+1个图像,包括:
    接收针对所述第一控件的操作,响应于所述操作,针对所述第一场景实时采集所述N+1个图像。
  12. 根据权利要求1至11任一项所述的方法,其特征在于,所述终端的移动速度小于等于预设速度。
  13. 根据权利要求1至12任一项所述的方法,其特征在于,
    所述预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,所述历史网络层是预测在所述待处理图像之前且与所述待处理图像时域连续的图像的白平衡增益时所使用的网络层。
  14. 根据权利要求13所述的方法,其特征在于,
    所述预设神经网络基于预设约束条件训练得到;其中,所述预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
  15. 根据权利要求1至14任一项所述的方法,其特征在于,所述对于所述N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像,包括:
    将所述N+1个图像中的第j个图像输入到所述预设神经网络,得到所述第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数;
    将所述第j个图像的白平衡增益预测值作用于所述第j个图像,得到所述第j个图像对应的优化图像;其中,所述N个优化图像包括所述第j个图像对应的优化图像。
  16. 一种拍摄视频的方法,其特征在于,所述方法应用于终端,所述方法包括:
    针对第一场景采集N+1个图像，所述N+1个图像中均包括目标主体；其中，在所述采集N+1个图像的过程中，所述终端距离所述目标主体越来越近；N是大于等于1的整数；所述N+1个图像中的第一个图像由所述终端的第一摄像头采集得到，所述N+1个图像中的后N个图像中的部分或全部图像由所述终端的第二摄像头采集得到，所述第二摄像头的倍率小于所述第一摄像头的倍率；所述N+1个图像中后采集的N个图像中所述目标主体的大小小于或等于所述N+1个图像中采集的第一个图像中的所述目标主体的大小；
    对于所述N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;所述预设神经网络用于保证时域相邻图像的白平衡一致性;
    对所述N个优化图像进行放大并裁剪，得到N个目标图像；其中，所述N个目标图像中所述目标主体的大小与所述第一个图像中所述目标主体的大小一致，所述N个目标图像中所述目标主体的相对位置，与所述第一个图像中所述目标主体的相对位置一致；所述N个目标图像与所述第一个图像的大小一致；
    基于所述N个目标图像和所述第一个图像,生成希区柯克变焦视频。
  17. 根据权利要求16所述的方法,其特征在于,所述N个图像包括在前采集的N1个图像和在后采集的N2个图像,其中,所述N1个图像由所述第二摄像头采集得到,所述N2个图像由所述终端的第三摄像头采集得到;所述N1和所述N2均是大于等于1的整数。
  18. 根据权利要求16或17所述的方法,其特征在于,所述针对第一场景采集N+1个图像,包括:
    获取所述N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;
    如果所述第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于所述第二摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    如果所述第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于所述终端的第三摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    其中,所述第二摄像头的倍率是b,所述第三摄像头的倍率是c;b>c;所述第一拍摄倍率范围是大于等于b的范围;所述第二拍摄倍率范围是[c,b)。
  19. 根据权利要求18所述的方法,其特征在于,
    所述第i个图像的拍摄倍率是基于所述第i个图像中所述目标主体的大小相对于所述第一个图像中所述目标主体的大小的缩放倍率,和采集所述第一个图像的摄像头的倍率确定的。
  20. 根据权利要求19所述的方法,其特征在于,所述第i个图像中的所述目标主体的大小通过以下至少一个特征来表征:
    所述第i个图像中的所述目标主体的宽度,
    所述第i个图像中的所述目标主体的高度,
    所述第i个图像中的所述目标主体的面积,或者,
    所述第i个图像中的所述目标主体所占的像素点的数量。
  21. 根据权利要求19或20所述的方法,其特征在于,所述方法还包括:
    采用实例分割算法从所述第i个图像中提取所述目标主体,以确定所述第i个图像中的所述目标主体的大小。
  22. 根据权利要求16至21任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第一信息,所述第一信息用于指示停止拍摄希区柯克变焦视频。
  23. 根据权利要求16至22任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第二信息,所述第二信息用于指示所述目标主体静止。
  24. 根据权利要求16至23任一项所述的方法,其特征在于,所述方法还包括:
    在当前预览界面中,显示第三信息,所述第三信息用于指示所述目标主体在当前预览图像的中央。
  25. 根据权利要求16至24任一项所述的方法,其特征在于,所述针对第一场景采集N+1个图像,包括:
    在所述目标主体在当前预览图像的中央时,采集所述第一个图像。
  26. 根据权利要求16至25任一项所述的方法,其特征在于,所述方法还包括:
    显示用户界面,所述用户界面中包含第二控件,所述第二控件用于指示由远及近拍摄希区柯克变焦视频;
    所述针对第一场景采集N+1个图像,包括:
    接收针对所述第二控件的操作,响应于所述操作,针对所述第一场景采集所述N+1个图像。
  27. 根据权利要求16至26任一项所述的方法,其特征在于,所述终端的移动速度小于等于预设速度。
  28. 根据权利要求16至27任一项所述的方法,其特征在于,
    所述预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,所述历史网络层是预测在所述待处理图像之前且与所述待处理图像时域连续的图像的白平衡增益时所使用的网络层。
  29. 根据权利要求28所述的方法,其特征在于,
    所述预设神经网络基于预设约束条件训练得到;其中,所述预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
  30. 根据权利要求16至29任一项所述的方法,其特征在于,所述对于所述N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像,包括:
    将所述N+1个图像中的第j个图像输入到所述预设神经网络,得到所述第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数;
    将所述第j个图像的白平衡增益预测值作用于所述第j个图像,得到所述第j个图像对应的优化图像;其中,所述N个优化图像包括所述第j个图像对应的优化图像。
  31. 一种拍摄视频的方法,其特征在于,应用于终端,所述终端包括第一摄像头和第二摄像头,所述第一摄像头的倍率与所述第二摄像头的倍率不同,所述方法包括:
    通过所述第一摄像头和所述第二摄像头在第一时刻针对第一场景分别采集第一图像和第二图像;其中,所述第一图像和所述第二图像中均包含目标主体;
    基于所述视频的预设播放时长和预设播放帧率,确定所述第一图像和所述第二图像之间的待插入图像的帧数N;其中,N是大于等于1的整数;
    基于所述帧数N、所述第一图像和所述第二图像,确定N个待插入图像;
    基于所述第一图像、所述第二图像和所述N个待插入图像,生成所述视频;其中,所述视频的各图像中的所述目标主体的大小逐渐变大或逐渐变小。
  32. 根据权利要求31所述的方法,其特征在于,所述终端还包括第三摄像头,所述第三摄像头的倍率在所述第一摄像头与所述第二摄像头的倍率之间,所述方法还包 括:
    通过所述第三摄像头在所述第一时刻针对所述第一场景采集第三图像;其中,所述第三图像包含所述目标主体;
    所述基于所述帧数、所述第一图像和所述第二图像,确定N个待插入图像,包括:
    基于所述帧数、所述第一图像、所述第二图像和所述第三图像,确定N个待插入图像。
  33. 一种终端,其特征在于,所述终端包括:
    采集单元,用于针对第一场景实时采集N+1个图像,所述N+1个图像中均包括目标主体;其中,在采集所述N+1个图像的过程中,所述终端距离所述目标主体越来越远;N是大于等于1的整数;
    处理单元,用于执行以下步骤:
    对于所述N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;所述预设神经网络用于保证时域相邻图像的白平衡一致性;
    对所述N个优化图像进行放大并裁剪,得到N个目标图像;其中,所述N个目标图像中所述目标主体的大小与所述N+1个图像中采集的第一个图像中所述目标主体的大小一致,所述N个目标图像中所述目标主体的相对位置,与所述第一个图像中所述目标主体的相对位置一致;所述N个目标图像与所述第一个图像的大小一致;
    基于所述N个目标图像和所述第一个图像生成希区柯克变焦视频。
  34. 根据权利要求33所述的终端,其特征在于,
    所述N+1个图像包括在前采集的N1+1个图像和在后采集的N2个图像,其中,所述N1+1个图像由所述终端的第一摄像头采集得到,所述N2个图像由所述终端的第二摄像头采集得到;所述N1和所述N2均是大于等于1的整数。
  35. 根据权利要求33或34所述的终端,其特征在于,所述采集单元具体用于:
    获取所述N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;
    如果所述第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于所述终端的第一摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    如果所述第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于所述终端的第二摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    其中,所述第一摄像头的倍率是a,所述第二摄像头的倍率是b;a<b;所述第一拍摄倍率范围是[a,b);所述第二拍摄倍率范围是大于等于b的范围。
  36. 根据权利要求35所述的终端,其特征在于,
    所述第i个图像的拍摄倍率是基于所述第i个图像中所述目标主体的大小相对于所述第一个图像中所述目标主体的大小的缩放倍率,和采集所述第一个图像的摄像头的倍率确定的。
  37. 根据权利要求36所述的终端,其特征在于,所述第i个图像中的所述目标主体的大小通过以下至少一个特征来表征:
    所述第i个图像中的所述目标主体的宽度,
    所述第i个图像中的所述目标主体的高度,
    所述第i个图像中的所述目标主体的面积,或者,
    所述第i个图像中的所述目标主体所占的像素点的数量。
  38. 根据权利要求36或37所述的终端,其特征在于,
    所述处理单元还用于,采用实例分割算法从所述第i个图像中提取所述目标主体,以确定所述第i个图像中的所述目标主体的大小。
  39. 根据权利要求33至38任一项所述的终端,其特征在于,所述终端还包括:
    显示单元,用于在当前预览界面中,显示第一信息,所述第一信息用于指示停止拍摄希区柯克变焦视频。
  40. 根据权利要求33至39任一项所述的终端,其特征在于,所述终端还包括:
    显示单元,用于在当前预览界面中,显示第二信息,所述第二信息用于指示所述目标主体静止。
  41. 根据权利要求33至40任一项所述的终端,其特征在于,所述终端还包括:
    显示单元,用于在当前预览界面中,显示第三信息,所述第三信息用于指示所述目标主体在当前预览图像的中央。
  42. 根据权利要求33至41任一项所述的终端,其特征在于,
    所述采集单元具体用于,在所述目标主体在当前预览图像的中央时,采集所述第一个图像。
  43. 根据权利要求33至42任一项所述的终端,其特征在于,所述终端还包括:
    显示单元,用于显示用户界面,所述用户界面中包含第一控件,所述第一控件用于指示由近及远拍摄希区柯克变焦视频;以及,接收针对所述第一控件的操作;
    所述采集单元具体用于,响应于所述操作,针对所述第一场景采集所述N+1个图像。
  44. 根据权利要求33至43任一项所述的终端,其特征在于,所述终端的移动速度小于等于预设速度。
  45. 根据权利要求33至44任一项所述的终端,其特征在于,
    所述预设神经网络用于结合历史网络层的特征图,对待处理图像的白平衡增益进行预测,以保证时域相邻图像的白平衡一致性;其中,所述历史网络层是预测在所述待处理图像之前且与所述待处理图像时域连续的图像的白平衡增益时所使用的网络层。
  46. 根据权利要求45所述的终端,其特征在于,
    所述预设神经网络基于预设约束条件训练得到;其中,所述预设约束条件包括:用于模拟时域连续的多个图像的白平衡增益预测值一致。
  47. 根据权利要求33至46任一项所述的终端,其特征在于,所述处理单元在所述对于所述N+1个图像中后采集的N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像的方面,具体用于:
    将所述N+1个图像中的第j个图像输入到所述预设神经网络,得到所述第j个图像的白平衡增益预测值;其中,2≤j≤N-1,j是整数;
    将所述第j个图像的白平衡增益预测值作用于所述第j个图像,得到所述第j个图像对应的优化图像;其中,所述N个优化图像包括所述第j个图像对应的优化图像。
  48. 一种终端,其特征在于,所述终端包括:
    采集单元，用于针对第一场景采集N+1个图像，所述N+1个图像中均包括目标主体；其中，在所述采集N+1个图像的过程中，所述终端距离所述目标主体越来越近；N是大于等于1的整数；所述N+1个图像中的第一个图像由所述终端的第一摄像头采集得到，所述N+1个图像中的后N个图像中的部分或全部图像由所述终端的第二摄像头采集得到，所述第二摄像头的倍率小于所述第一摄像头的倍率；所述N+1个图像中后采集的N个图像中所述目标主体的大小小于或等于所述N+1个图像中采集的第一个图像中的所述目标主体的大小；
    处理单元,用于执行以下步骤:
    对于所述N个图像,基于预设神经网络进行白平衡处理,得到N个优化图像;所述预设神经网络用于保证时域相邻图像的白平衡一致性;
    对所述N个优化图像进行放大并裁剪，得到N个目标图像；其中，所述N个目标图像中的所述目标主体的大小与所述第一个图像中所述目标主体的大小一致，所述N个目标图像中所述目标主体的相对位置，与所述第一个图像中所述目标主体的相对位置一致；所述N个目标图像与所述第一个图像的大小一致；
    基于所述N个目标图像和所述第一个图像,生成希区柯克变焦视频。
  49. 根据权利要求48所述的终端,其特征在于,所述N个图像包括在前采集的N1个图像和在后采集的N2个图像,其中,所述N1个图像由所述第二摄像头采集得到,所述N2个图像由所述终端的第三摄像头采集得到;所述N1和所述N2均是大于等于1的整数。
  50. 根据权利要求48或49所述的终端,其特征在于,所述采集单元具体用于:
    获取所述N+1个图像中的第i个图像的拍摄倍率;其中,2≤i≤N,i是整数;
    如果所述第i个图像的拍摄倍率在第一拍摄倍率范围内,则基于所述第二摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    如果所述第i个图像的拍摄倍率在第二拍摄倍率范围内,则基于所述终端的第三摄像头针对所述第一场景采集所述N+1个图像中的第i+1个图像;
    其中,所述第二摄像头的倍率是b,所述第三摄像头的倍率是c;b>c;所述第一拍摄倍率范围是大于等于b的范围;所述第二拍摄倍率范围是[c,b)。
  51. 根据权利要求50所述的终端,其特征在于,
    所述第i个图像的拍摄倍率是基于所述第i个图像中所述目标主体的大小相对于所述第一个图像中所述目标主体的大小的缩放倍率,和采集所述第一个图像的摄像头的倍率确定的。
  52. 根据权利要求51所述的终端,其特征在于,所述第i个图像中的所述目标主体的大小通过以下至少一个特征来表征:
    所述第i个图像中的所述目标主体的宽度,
    所述第i个图像中的所述目标主体的高度,
    所述第i个图像中的所述目标主体的面积,或者,
    所述第i个图像中的所述目标主体所占的像素点的数量。
  53. 根据权利要求51或52所述的终端,其特征在于,
    所述处理单元还用于,采用实例分割算法从所述第i个图像中提取所述目标主体,以确定所述第i个图像中的所述目标主体的大小。
  54. The terminal according to any one of claims 48 to 53, characterized in that the terminal further comprises:
    a display unit, configured to display first information in the current preview interface, the first information being used to indicate stopping the shooting of the Hitchcock zoom video.
  55. The terminal according to any one of claims 48 to 54, characterized in that the terminal further comprises:
    a display unit, configured to display second information in the current preview interface, the second information being used to indicate that the target subject remains still.
  56. The terminal according to any one of claims 48 to 55, characterized in that the terminal further comprises:
    a display unit, configured to display third information in the current preview interface, the third information being used to indicate that the target subject is in the center of the current preview image.
  57. The terminal according to any one of claims 48 to 56, characterized in that
    the acquisition unit is specifically configured to acquire the first image when the target subject is in the center of the current preview image.
  58. The terminal according to any one of claims 48 to 57, characterized in that the terminal further comprises:
    a display unit, configured to display a user interface containing a second control, the second control being used to indicate shooting a Hitchcock zoom video from far to near, and to receive an operation on the second control;
    the acquisition unit being specifically configured to acquire, in response to the operation, the N+1 images of the first scene.
  59. The terminal according to any one of claims 48 to 58, characterized in that the movement speed of the terminal is less than or equal to a preset speed.
  60. The terminal according to any one of claims 48 to 59, characterized in that
    the preset neural network is used to predict the white balance gain of an image to be processed in combination with feature maps from historical network layers, so as to ensure white balance consistency between temporally adjacent images; wherein the historical network layers are the network layers used when predicting the white balance gains of images that precede, and are temporally continuous with, the image to be processed.
  61. The terminal according to claim 60, characterized in that
    the preset neural network is trained based on a preset constraint condition, wherein the preset constraint condition includes: the predicted white balance gains of a plurality of images simulating temporal continuity are consistent.
  62. The terminal according to any one of claims 48 to 61, characterized in that the performing of white balance processing on the N later-acquired images of the N+1 images based on the preset neural network to obtain the N optimized images comprises:
    inputting the j-th image of the N+1 images into the preset neural network to obtain a predicted white balance gain for the j-th image, where 2≤j≤N-1 and j is an integer;
    applying the predicted white balance gain of the j-th image to the j-th image to obtain an optimized image corresponding to the j-th image, wherein the N optimized images include the optimized image corresponding to the j-th image.
  63. A terminal, characterized in that the terminal comprises an acquisition unit and a processing unit, the acquisition unit comprising a first camera and a second camera, the magnification of the first camera being different from the magnification of the second camera;
    the acquisition unit is configured to acquire, at a first moment and with the first camera and the second camera respectively, a first image and a second image of a first scene; wherein both the first image and the second image contain a target subject;
    the processing unit is configured to perform the following steps:
    determining, based on a preset playback duration and a preset playback frame rate of the video to be generated, the number of frames N of images to be inserted between the first image and the second image; wherein N is an integer greater than or equal to 1;
    determining N to-be-inserted images based on the number of frames N, the first image, and the second image;
    generating the video based on the first image, the second image, and the N to-be-inserted images; wherein the size of the target subject in the images of the video gradually increases or gradually decreases.
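For claim 63, one plausible reading of the frame-count rule, plus a deliberately naive interpolation placeholder, is sketched below. The claim does not specify the interpolation method; a real implementation would synthesize intermediate zoom levels between the two cameras' fields of view rather than cross-fade.

```python
def frames_to_insert(duration_s: float, fps: float) -> int:
    # Our reading: the clip needs duration_s * fps frames in total, two of
    # which are the captured first and second images; the rest are inserted.
    return max(1, int(round(duration_s * fps)) - 2)

def interpolate(img_a, img_b, n):
    # img_a, img_b: float numpy arrays of identical shape; returns n frames
    # that blend linearly from just after img_a to just before img_b.
    return [img_a + (img_b - img_a) * (k / (n + 1)) for k in range(1, n + 1)]
```

For example, a 2-second clip at 30 fps needs 60 frames, of which 58 would be inserted between the two simultaneous captures.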
  64. The terminal according to claim 63, characterized in that the acquisition unit further comprises a third camera, the magnification of the third camera being between the magnifications of the first camera and the second camera;
    the acquisition unit is further configured to acquire, with the third camera at the first moment, a third image of the first scene; wherein the third image contains the target subject;
    in the aspect of determining the N to-be-inserted images based on the number of frames, the first image, and the second image, the processing unit is specifically configured to:
    determine the N to-be-inserted images based on the number of frames, the first image, the second image, and the third image.
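Claim 64's third, mid-magnification image can act as an anchor halfway through the inserted sequence. Reusing the hypothetical interpolate helper above, one sketch that treats the third image as one of the N inserted frames:

```python
def interpolate_with_mid(img_a, img_mid, img_b, n):
    # Blend toward the mid-magnification capture from both sides, so each
    # blend bridges roughly half the gap between the two fields of view.
    n_first = (n - 1) // 2
    n_second = n - 1 - n_first
    return (interpolate(img_a, img_mid, n_first)
            + [img_mid]
            + interpolate(img_mid, img_b, n_second))
```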
  65. A terminal, characterized by comprising a processor, a memory, and a camera, wherein the memory is configured to store a computer program and instructions, and the processor is configured to invoke the computer program and instructions to perform, in cooperation with the camera, the method according to any one of claims 1 to 32.
PCT/CN2021/094695 2020-05-30 2021-05-19 Method and apparatus for shooting video WO2021244295A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010480536.3 2020-05-30
CN202010480536 2020-05-30
CN202011043999.X 2020-09-28
CN202011043999.XA CN113747085B (zh) 2020-05-30 2020-09-28 Method and apparatus for shooting video

Publications (1)

Publication Number Publication Date
WO2021244295A1 (zh)

Family

ID=78728055

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2021/078543 WO2022062318A1 (zh) 2020-05-30 2021-03-01 Shooting method and device
PCT/CN2021/094695 WO2021244295A1 (zh) 2020-05-30 2021-05-19 Method and apparatus for shooting video

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078543 WO2022062318A1 (zh) 2020-05-30 2021-03-01 Shooting method and device

Country Status (2)

Country Link
CN (2) CN113747085B (zh)
WO (2) WO2022062318A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116634261A (zh) * 2021-12-10 2023-08-22 Honor Device Co., Ltd. Method and apparatus for controlling shooting parameters
CN116051368B (zh) * 2022-06-29 2023-10-20 Honor Device Co., Ltd. Image processing method and related device
CN116055871B (zh) * 2022-08-31 2023-10-20 Honor Device Co., Ltd. Video processing method and related device
CN115965942B (zh) * 2023-03-03 2023-06-23 Anhui NIO Autonomous Driving Technology Co., Ltd. Position estimation method, vehicle control method, device, medium, and vehicle
CN117459830B (zh) * 2023-12-19 2024-04-05 Beijing Sohu Internet Information Service Co., Ltd. Method and system for automatic zooming of a mobile device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007086269A (ja) * 2005-09-21 2007-04-05 Hitachi Kokusai Electric Inc. Camera device and method for adjusting focal length of zoom lens optical system of camera device
JP4991899B2 (ja) * 2010-04-06 2012-08-01 Canon Inc. Imaging apparatus and control method therefor
CN104717427B (zh) * 2015-03-06 2018-06-08 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Automatic zooming method and apparatus, and mobile terminal
JP6512897B2 (ja) * 2015-03-30 2019-05-15 Canon Inc. Zoom control device and control method for zoom control device
KR20180056182A (ko) * 2016-11-18 2018-05-28 LG Electronics Inc. Mobile terminal and control method therefor
CN110557550B (zh) * 2018-05-31 2020-10-30 Hangzhou Hikvision Digital Technology Co., Ltd. Focusing method and apparatus, and computer-readable storage medium
WO2019227441A1 (zh) * 2018-05-31 2019-12-05 SZ DJI Technology Co., Ltd. Shooting control method and device for movable platform
CN109361865B (zh) * 2018-11-21 2020-08-04 Vivo Mobile Communication (Hangzhou) Co., Ltd. Shooting method and terminal
WO2020107372A1 (zh) * 2018-11-30 2020-06-04 SZ DJI Technology Co., Ltd. Control method, apparatus, and device for shooting device, and storage medium
CN109379537A (zh) * 2018-12-30 2019-02-22 Beijing Megvii Technology Co., Ltd. Method and apparatus for realizing dolly zoom effect, electronic device, and computer-readable storage medium
CN110198413B (zh) * 2019-06-25 2021-01-08 Vivo Mobile Communication Co., Ltd. Video shooting method, video shooting apparatus, and electronic device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412547A (zh) * 2016-08-29 2017-02-15 Xiamen Meitu Zhijia Technology Co., Ltd. Image white balance method and apparatus based on convolutional neural network, and computing device
US20180309936A1 (en) * 2017-04-25 2018-10-25 International Business Machines Corporation System and method for photographic effects
CN108234879A (zh) * 2018-02-02 2018-06-29 Chengdu Xiwei Technology Co., Ltd. Method and apparatus for obtaining dolly zoom video
US20190045163A1 (en) * 2018-10-02 2019-02-07 Intel Corporation Method and system of deep learning-based automatic white balancing
CN110262737A (zh) * 2019-06-25 2019-09-20 Vivo Mobile Communication Co., Ltd. Video data processing method and terminal
CN111083380A (zh) * 2019-12-31 2020-04-28 Vivo Mobile Communication Co., Ltd. Video processing method, electronic device, and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116546316A (zh) * 2022-01-25 2023-08-04 Honor Device Co., Ltd. Camera switching method and electronic device
CN116546316B (zh) * 2022-01-25 2023-12-08 Honor Device Co., Ltd. Camera switching method and electronic device
CN114500852A (zh) * 2022-02-25 2022-05-13 Vivo Mobile Communication Co., Ltd. Shooting method and apparatus, electronic device, and readable storage medium
CN114500852B (zh) * 2022-02-25 2024-04-19 Vivo Mobile Communication Co., Ltd. Shooting method and apparatus, electronic device, and readable storage medium
CN116723394A (zh) * 2022-02-28 2023-09-08 Honor Device Co., Ltd. Multi-camera strategy scheduling method and related device
CN116723394B (zh) * 2022-02-28 2024-05-10 Honor Device Co., Ltd. Multi-camera strategy scheduling method and related device
WO2023220868A1 (zh) * 2022-05-16 2023-11-23 Beijing Xiaomi Mobile Software Co., Ltd. Image processing method and apparatus, terminal, and storage medium
CN117596497A (zh) * 2023-09-28 2024-02-23 Shuhang Technology (Beijing) Co., Ltd. Image rendering method and apparatus, electronic device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN113747050B (zh) 2023-04-18
CN113747085B (zh) 2023-01-06
CN113747085A (zh) 2021-12-03
WO2022062318A1 (zh) 2022-03-31
CN113747050A (zh) 2021-12-03

Similar Documents

Publication Publication Date Title
WO2021244295A1 (zh) Method and apparatus for shooting video
WO2021147482A1 (zh) Telephoto shooting method and electronic device
WO2021179773A1 (zh) Image processing method and apparatus
WO2021104485A1 (zh) Shooting method and electronic device
KR20220082926A (ko) Video shooting method and electronic device
WO2021185250A1 (zh) Image processing method and apparatus
WO2021190348A1 (zh) Image processing method and electronic device
US20220343648A1 (en) Image selection method and electronic device
CN115689963B (zh) Image processing method and electronic device
WO2020192761A1 (zh) Method for recording user emotion and related apparatus
WO2022262475A1 (zh) Shooting method, graphical user interface, and electronic device
US20230224574A1 (en) Photographing method and apparatus
WO2023093169A1 (zh) Shooting method and electronic device
CN113099146A (zh) Video generation method and apparatus, and related device
CN113452969B (zh) Image processing method and apparatus
WO2023160230A9 (zh) Shooting method and related device
US20230014272A1 (en) Image processing method and apparatus
CN115442509B (zh) Shooting method, user interface, and electronic device
CN113497888B (zh) Photo preview method, electronic device, and storage medium
WO2023160224A9 (zh) Shooting method and related device
CN116709018B (zh) Zoom bar segmentation method and electronic device
WO2022228010A1 (zh) Method for generating cover and electronic device
WO2022206589A1 (zh) Image processing method and related device
EP4329320A1 (en) Method and apparatus for video playback
WO2023231696A1 (zh) Shooting method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21816865

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21816865

Country of ref document: EP

Kind code of ref document: A1