CN117082295A - Image stream processing method, device and storage medium - Google Patents


Info

Publication number
CN117082295A
Authority
CN
China
Prior art keywords
image
image stream
frame
map
resolution
Prior art date
Legal status
Granted
Application number
CN202311218669.3A
Other languages
Chinese (zh)
Other versions
CN117082295B (en)
Inventor
黄宇星
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202311218669.3A priority Critical patent/CN117082295B/en
Publication of CN117082295A publication Critical patent/CN117082295A/en
Application granted granted Critical
Publication of CN117082295B publication Critical patent/CN117082295B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/951Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Studio Devices (AREA)

Abstract

The application provides an image stream processing method, an image stream processing device and a storage medium. The method is applied to an electronic device that includes a first camera and a second camera. Specifically, a first image stream acquired by the first camera and a second image stream acquired by the second camera are obtained, where the first image stream is a high-resolution, low-frame-rate image stream and the second image stream is a low-resolution, high-frame-rate image stream; a third image stream is then calculated based on the first image stream and the second image stream, the third image stream being a high-resolution, high-frame-rate image stream. That is, by combining a low-resolution, high-frame-rate video stream with a high-resolution, low-frame-rate video stream, a high-resolution, high-frame-rate video stream is obtained, so that slow-motion video can be captured with high quality, low computation and low latency.

Description

Image stream processing method, device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a storage medium for processing an image stream.
Background
Currently, the shooting functions of electronic devices such as mobile phones are becoming increasingly sophisticated, and users' expectations for shooting quality are also rising. For example, when shooting a high-frame-rate slow-motion video, a user expects the video picture to be clear and smooth. This requires electronic devices such as mobile phones to be able to capture high-resolution, high-frame-rate video streams (image streams). However, current electronic devices are limited by hardware and generally cannot directly capture a high-resolution, high-frame-rate video stream, and software-based approaches to obtaining a high-frame-rate video stream, such as video frame interpolation, give poor results.
Disclosure of Invention
To solve the above technical problem, the application provides an image stream processing method, an image stream processing device and a storage medium, which combine a low-resolution, high-frame-rate video stream with a high-resolution, low-frame-rate video stream to obtain a high-resolution, high-frame-rate video stream, thereby enabling slow-motion video shooting with high quality, low computation and low latency.
In a first aspect, the present application provides an image stream processing method applied to an electronic device, where the electronic device includes a first camera and a second camera. The method includes: acquiring a first image stream captured by the first camera and a second image stream captured by the second camera, where the first image stream is a high-resolution, low-frame-rate image stream and the second image stream is a low-resolution, high-frame-rate image stream; and calculating a third image stream based on the first image stream and the second image stream, where the third image stream is a high-resolution, high-frame-rate image stream.
In this way, by combining a low-resolution, high-frame-rate video stream with a high-resolution, low-frame-rate video stream, a high-resolution, high-frame-rate video stream is obtained, enabling slow-motion video shooting with high quality, low computation and low latency.
According to the first aspect, calculating a third image stream based on the first image stream and the second image stream, the third image stream being a high-resolution, high-frame-rate image stream, includes: calculating, based on the second image stream, a first optical flow map and a first visibility map for an intermediate frame in the second image stream; calculating an image of an intermediate frame in the first image stream based on the first image stream, the first optical flow map and the first visibility map; and encoding this image together with the images in the first image stream to obtain the third image stream.
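As an illustrative sketch only, the three steps above can be organized as follows; the helper implementations are trivial placeholders (zero flow, nearest-neighbour upsampling, a visibility-weighted average instead of warping) so that the control flow runs end to end, and none of the names or numbers are taken from the patent:

```python
# Illustrative sketch of the processing pipeline described above; the helpers are
# deliberately trivial placeholders, not the patent's algorithms.
import numpy as np

def estimate_flow_and_visibility(prev_lr, mid_lr, next_lr):
    h, w = mid_lr.shape[:2]
    flow_fwd = np.zeros((h, w, 2)); flow_bwd = np.zeros((h, w, 2))   # placeholder flows
    vis_fwd = np.full((h, w), 0.5); vis_bwd = np.full((h, w), 0.5)   # placeholder visibility
    return (flow_fwd, flow_bwd), (vis_fwd, vis_bwd)

def upsample(x, scale):
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)      # nearest-neighbour

def synthesize_intermediate(hr_0, hr_1, flows_hr, vis_hr):
    v_f, v_b = vis_hr
    # Placeholder blend (no warping): visibility-weighted average of the two frames.
    return (v_f[..., None] * hr_0 + v_b[..., None] * hr_1) / (v_f + v_b)[..., None]

def build_high_res_high_fps_frames(first_stream, second_stream, scale):
    """first_stream: high-res low-fps frames; second_stream: low-res high-fps frames
    at twice the frame rate, clock-synchronized with first_stream."""
    out = []
    for k in range(0, len(second_stream) - 2, 2):
        prev_lr, mid_lr, next_lr = second_stream[k:k + 3]
        hr_0, hr_1 = first_stream[k // 2], first_stream[k // 2 + 1]
        flows, vis = estimate_flow_and_visibility(prev_lr, mid_lr, next_lr)  # step 1
        flows_hr = tuple(upsample(f, scale) for f in flows)                  # step 2a
        vis_hr = tuple(upsample(v, scale) for v in vis)
        mid_hr = synthesize_intermediate(hr_0, hr_1, flows_hr, vis_hr)       # step 2b
        out.extend([hr_0, mid_hr])                                           # step 3
    out.append(first_stream[-1])
    return out  # frames ready for encoding into the third image stream
```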
According to the first aspect or any implementation of the first aspect above, the first optical flow map includes a first forward optical flow map and a first backward optical flow map, and the first visibility map includes a first forward visibility map and a first backward visibility map. Calculating, based on the second image stream, the first optical flow map and the first visibility map for the intermediate frame in the second image stream includes: calculating the first forward optical flow map, the first forward visibility map, the first backward optical flow map and the first backward visibility map from the frame preceding the intermediate frame, the intermediate frame, and the frame following the intermediate frame in the second image stream.
According to the first aspect or any implementation of the first aspect above, the first forward optical flow map, the first forward visibility map, the first backward optical flow map and the first backward visibility map are calculated based on a pre-trained convolutional neural network. Calculating the first forward optical flow map, the first forward visibility map, the first backward optical flow map and the first backward visibility map from the frame preceding the intermediate frame, the intermediate frame, and the frame following the intermediate frame in the second image stream includes: inputting the preceding frame, the intermediate frame and the following frame into the convolutional neural network as input parameters, taking the first optical flow map output by the convolutional neural network as the first forward optical flow map, taking the first visibility map output by the convolutional neural network as the first forward visibility map, taking the second optical flow map output by the convolutional neural network as the first backward optical flow map, and taking the second visibility map output by the convolutional neural network as the first backward visibility map.
In this way, the intermediate frame together with its preceding and following frames is fed into the convolutional neural network as input, so that the forward optical flow map, forward visibility map, backward optical flow map and backward visibility map can be obtained in a single pass, reducing intermediate processing steps and making the computation simple and fast.
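A minimal sketch of this single-pass inference, assuming a PyTorch-style network; the input packing (channel concatenation) and the output channel layout are assumptions made for illustration, not details from the patent:

```python
# Illustrative only: the channel layout and output splitting below are assumptions.
import torch

def infer_flows_and_visibility(model, prev_frame, mid_frame, next_frame):
    """Frames are (3, H, W) tensors from the low-resolution, high-frame-rate stream."""
    x = torch.cat([prev_frame, mid_frame, next_frame], dim=0).unsqueeze(0)  # (1, 9, H, W)
    with torch.no_grad():
        y = model(x)                                    # single forward pass
    # Assumed output layout: 2 forward-flow channels, 2 backward-flow channels,
    # 1 forward-visibility channel, 1 backward-visibility channel.
    flow_fwd = y[:, 0:2]                 # first optical flow map  -> first forward optical flow map
    flow_bwd = y[:, 2:4]                 # second optical flow map -> first backward optical flow map
    vis_fwd = torch.sigmoid(y[:, 4:5])   # first visibility map    -> first forward visibility map
    vis_bwd = torch.sigmoid(y[:, 5:6])   # second visibility map   -> first backward visibility map
    return flow_fwd, flow_bwd, vis_fwd, vis_bwd
```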
According to the first aspect or any implementation of the first aspect above, calculating the image of the intermediate frame in the first image stream based on the first image stream, the first optical flow map and the first visibility map includes: upsampling the first optical flow map and the first visibility map to obtain a second optical flow map and a second visibility map; and calculating the image of the intermediate frame in the first image stream based on the first image stream, the second optical flow map and the second visibility map.
According to the first aspect or any implementation of the first aspect above, the first optical flow map includes a first forward optical flow map and a first backward optical flow map, and the first visibility map includes a first forward visibility map and a first backward visibility map; the second optical flow map includes a second forward optical flow map and a second backward optical flow map, and the second visibility map includes a second forward visibility map and a second backward visibility map. Upsampling the first optical flow map and the first visibility map to obtain the second optical flow map and the second visibility map includes: upsampling the first forward optical flow map to the resolution of the first image stream to obtain the second forward optical flow map; upsampling the first forward visibility map to the resolution of the first image stream to obtain the second forward visibility map; upsampling the first backward optical flow map to the resolution of the first image stream to obtain the second backward optical flow map; and upsampling the first backward visibility map to the resolution of the first image stream to obtain the second backward visibility map.
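A sketch of the upsampling step under the assumption that bilinear interpolation is used (the text above does not specify the interpolation method); the rescaling of the flow magnitudes after resizing is likewise an assumption, added because resized flow fields are conventionally rescaled:

```python
# Illustrative only: bilinear interpolation and flow-magnitude rescaling are assumptions.
import torch
import torch.nn.functional as F

def upsample_maps(flow_lr, vis_lr, high_res_size):
    """flow_lr: (1, 2, h, w) optical flow map; vis_lr: (1, 1, h, w) visibility map.
    high_res_size: (H, W) of the first (high-resolution, low-frame-rate) image stream."""
    H, W = high_res_size
    _, _, h, w = flow_lr.shape
    flow_hr = F.interpolate(flow_lr, size=(H, W), mode="bilinear", align_corners=False)
    # Flow vectors are pixel displacements, so scale them with the resize factor.
    flow_hr[:, 0] *= W / w   # horizontal component
    flow_hr[:, 1] *= H / h   # vertical component
    vis_hr = F.interpolate(vis_lr, size=(H, W), mode="bilinear", align_corners=False)
    return flow_hr, vis_hr
```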
According to the first aspect or any implementation of the first aspect above, the first image stream and the second image stream are kept clock-synchronized. When the second image stream includes a first low-resolution image frame, a second low-resolution image frame and a third low-resolution image frame, the first low-resolution image frame is time-aligned with a first high-resolution image frame in the first image stream, and the third low-resolution image frame is time-aligned with a second high-resolution image frame in the first image stream; the second low-resolution image frame is the intermediate frame in the second image stream, the first low-resolution image frame is the frame preceding the second low-resolution image frame, and the third low-resolution image frame is the frame following the second low-resolution image frame.
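For example, assuming the second image stream runs at exactly twice the frame rate of the first image stream (an assumption consistent with the alignment just described, though the exact ratio is not stated here), the clock-synchronized frames can be grouped as in the following sketch:

```python
# Illustrative only: assumes the second stream runs at exactly twice the frame rate of
# the first stream and that the two streams are clock-synchronized as described above.
def group_aligned_frames(high_res_frames, low_res_frames):
    """Yield (first_hr, second_hr, prev_lr, intermediate_lr, next_lr) tuples in which
    prev_lr/next_lr are time-aligned with first_hr/second_hr respectively."""
    for i in range(len(high_res_frames) - 1):
        first_hr, second_hr = high_res_frames[i], high_res_frames[i + 1]
        prev_lr = low_res_frames[2 * i]          # aligned with first_hr
        intermediate_lr = low_res_frames[2 * i + 1]
        next_lr = low_res_frames[2 * i + 2]      # aligned with second_hr
        yield first_hr, second_hr, prev_lr, intermediate_lr, next_lr
```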
According to the first aspect or any implementation of the first aspect above, calculating the image of the intermediate frame in the first image stream based on the first image stream, the second optical flow map and the second visibility map includes: calculating the image of the intermediate frame in the first image stream based on the first high-resolution image frame in the first image stream, the second forward optical flow map, the second forward visibility map, the second backward optical flow map and the second backward visibility map.
According to the first aspect or any implementation of the first aspect above, the image of the intermediate frame in the first image stream is calculated according to a formula whose inputs are the first high-resolution image frame and the second high-resolution image frame in the first image stream, the second forward optical flow map, the second forward visibility map, the second backward optical flow map and the second backward visibility map, and whose output is the image of the intermediate frame in the first image stream.
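As an illustrative sketch only, assuming a visibility-weighted warping blend of the kind commonly used in video frame interpolation (not necessarily the exact formula of this publication), and writing $I_0$ for the first high-resolution image frame, $I_1$ for the second high-resolution image frame, $F_f$ and $V_f$ for the second forward optical flow map and visibility map, $F_b$ and $V_b$ for the second backward optical flow map and visibility map, and $I_t$ for the image of the intermediate frame, such a computation can take the form:

```latex
% Illustrative sketch only; w(I, F) warps image I with optical flow F,
% \odot is element-wise multiplication, and t in (0, 1) is the temporal
% position of the intermediate frame between I_0 and I_1.
I_t = \frac{(1 - t)\, V_f \odot w(I_0, F_f) + t\, V_b \odot w(I_1, F_b)}
           {(1 - t)\, V_f + t\, V_b}
```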
According to the first aspect or any implementation of the first aspect above, encoding the image of the intermediate frame together with the images in the first image stream to obtain the third image stream includes: inserting the image of the intermediate frame between the first high-resolution image frame and the second high-resolution image frame to obtain the third image stream.
In a second aspect, the present application provides an electronic device. The electronic device includes a memory and a processor that are coupled to each other. The memory stores program instructions that, when executed by the processor, cause the electronic device to perform the method of the first aspect or any possible implementation of the first aspect.
The second aspect and any implementation of the second aspect correspond to the first aspect and any implementation of the first aspect, respectively. For the technical effects of the second aspect and any implementation of the second aspect, reference may be made to the technical effects of the first aspect and the corresponding implementations, which are not repeated here.
In a third aspect, the application provides a computer readable medium storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
The third aspect and any implementation of the third aspect correspond to the first aspect and any implementation of the first aspect, respectively. For the technical effects of the third aspect and any implementation of the third aspect, reference may be made to the technical effects of the first aspect and the corresponding implementations, which are not repeated here.
In a fourth aspect, the present application provides a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
The fourth aspect and any implementation of the fourth aspect correspond to the first aspect and any implementation of the first aspect, respectively. For the technical effects of any implementation of the fourth aspect, reference may be made to the technical effects of the first aspect and the corresponding implementations, which are not repeated here.
In a fifth aspect, the present application provides a chip system, including a processor and a memory, where the memory stores program instructions that, when executed by the processor, cause the chip system to perform the method of the first aspect or any possible implementation of the first aspect.
The implementations of the fifth aspect correspond to the first aspect and any implementation of the first aspect. For the technical effects of the implementations of the fifth aspect, reference may be made to the technical effects of the first aspect and the corresponding implementations, which are not repeated here.
Drawings
Fig. 1A is a schematic diagram of an exemplary application scenario;
Fig. 1B is a schematic diagram of an exemplary scene of shooting a high-resolution slow-motion video;
Fig. 1C is a schematic diagram of a captured video frame in a high-resolution slow-motion video shooting scene;
Fig. 1D is a schematic diagram of a video frame output after processing by the image stream processing method provided by an embodiment of the present application in a high-resolution slow-motion video shooting scene;
Fig. 2 is a schematic diagram of the external appearance of an exemplary electronic device;
Fig. 3 is a schematic diagram of the hardware structure of an exemplary electronic device;
Fig. 4 is a schematic diagram of the software architecture of an exemplary electronic device;
Fig. 5 is a schematic diagram of the processing stages involved in the image stream processing method provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of the processing logic of the image stream acquisition stage, the optical flow map and visibility map calculation stage, the optical flow map and visibility map upsampling stage, and the high-resolution intermediate frame calculation stage involved in the image stream processing method provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of an exemplary process of obtaining an optical flow map and a visibility map from a low-resolution, high-frame-rate image stream through a convolutional neural network;
Fig. 8 is a schematic diagram of the longitudinal interaction of the functional modules involved in implementing the image stream processing method provided by an embodiment of the present application;
Fig. 9 is a schematic flowchart of an image stream processing method provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application without creative effort shall fall within the protection scope of the application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
The terms "first", "second" and the like in the description and claims of the embodiments of the application are used to distinguish between different objects, not to describe a particular order of the objects. For example, a first target object and a second target object are used to distinguish between different target objects, not to describe a particular order of the target objects.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment of the present application is not to be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
Multimedia such as photos and videos is one of the main application scenarios of electronic devices such as mobile phones. As users' expectations for mobile phone photography grow, the shooting functions of mobile phones continue to improve. The following takes a mobile phone as an example and explains the shooting function of an electronic device with reference to the accompanying drawings.
In fig. 1A (1), a cell phone interface 10a is exemplarily shown. Referring to fig. 1A (1), the interface 10a is shown with icons of a plurality of applications, such as the camera application icon 10a-1, and icons of applications such as address book, telephone, information, clock, calendar, gallery, memo, file management, email, music, calculator, video, recorder, weather, browser, settings, etc.
It should be noted that, in some possible implementations, the interface 10a shown in (1) in Fig. 1A may be referred to as the main interface. When the user clicks the icon 10a-1 in the interface 10a, the camera application can be used to perform shooting functions such as taking photos and recording videos.
With continued reference to Fig. 1A, when the user clicks the icon 10a-1 of the camera application, the mobile phone responds to the user operation by identifying the control corresponding to the click operation as a control of the camera application, invokes the corresponding interface in the application framework layer to start the camera application, starts the camera driver by invoking the kernel layer, and captures an image stream (at this time, a preview stream) through the camera. The mobile phone then displays the interface of the camera application, for example the interface 10b shown in (2) of Fig. 1A, in which the picture corresponding to the preview stream is displayed, for example the picture of the user playing basketball shown in (2) of Fig. 1A.
With the improvement of the shooting functions of mobile phones, the camera application supports more and more shooting modes. By way of example, the shooting modes may include an aperture mode, a night mode, a portrait mode, a photo mode, a video mode, a smiling-face mode, a professional mode, etc.; see the mode options displayed in the shooting mode list 10b-2 in (2) of Fig. 1A.
For example, when the user clicks the icon option corresponding to a certain shooting mode, the mobile phone displays the camera application interface in that shooting mode. For example, if the user clicks the icon option of the photo mode, the mobile phone displays the interface of the camera application in the photo mode, which may be the interface 10b shown in (2) of Fig. 1A.
It should be noted that, in some possible implementations, after the mobile phone starts the camera application, the camera application selects the photo mode by default, that is, the photo mode is the default shooting mode of the camera application. In the photo mode, the shutter control 10b-1 is also displayed in the interface 10b. Illustratively, when the user clicks the shutter control 10b-1, the mobile phone detects the user operation on the shutter control 10b-1 and takes a photo in response to the operation.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
With the improvement of the shooting functions of mobile phones, users' expectations for shooting are also rising. For example, when shooting a high-frame-rate slow-motion video, a user expects the video picture to be clear and smooth. Specifically, to ensure the clarity of the video picture, the resolution of the image frames in the acquired image stream must be high enough; to ensure the smoothness of the video picture, the frame rate of the image frames in the acquired image stream must be high enough. That is, a clear and smooth video picture needs to be obtained by encoding a high-resolution, high-frame-rate image stream. This requires electronic devices such as mobile phones to be able to capture high-resolution, high-frame-rate image streams.
However, at present, electronic devices such as mobile phones are limited by hardware and generally cannot directly capture a high-resolution, high-frame-rate video stream. The specific reason is that the mobile industry processor interface (MIPI) used by the image sensors in current mainstream mobile terminals has a rate (frame rate) limitation. Therefore, electronic devices such as mobile phones cannot sustain high-resolution, high-frame-rate image streaming. For example, the IMX800 image sensor with a C-PHY interface supports only 12M 120FPS video data.
In addition, since the image sensor area in a non-professional image capturing device such as a mobile phone is much smaller than that of professional image capturing equipment, the per-frame exposure time of a high-frame-rate video stream/image stream is short; for example, the exposure time of a single frame of 480 FPS video is at most about 2 ms. This causes serious noise problems for high-frame-rate video: in the user interface, the video picture may show many bright noise points.
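The per-frame exposure budget follows directly from the frame rate, and the pixel rate implied by the 12M 120FPS mode mentioned above can be checked the same way (figures are illustrative):

```latex
% Frame period at 480 FPS bounds the single-frame exposure time:
T_{\mathrm{frame}} = \tfrac{1}{480\ \mathrm{fps}} \approx 2.08\ \mathrm{ms}
\;\Rightarrow\; t_{\mathrm{exposure}} \lesssim 2\ \mathrm{ms}.
% Pixel throughput of the 12M 120 FPS mode cited above:
12 \times 10^{6}\ \mathrm{pixels/frame} \times 120\ \mathrm{frames/s}
= 1.44 \times 10^{9}\ \mathrm{pixels/s}.
```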
To address this, some existing implementations perform frame interpolation on a high-resolution, low-frame-rate image stream in software, for example by video frame interpolation. Mainstream video frame interpolation algorithms, such as optical flow calculation and optical flow estimation algorithms, currently focus mainly on obtaining more accurate optical flow estimates (and thus high-resolution intermediate frames) from high-resolution, low-frame-rate image frames. Although this kind of approach can work well, for some high-speed motion and complex scenes it produces motion ghosting and picture artifacts due to inaccurate optical flow estimation.
For ease of understanding, the following takes recording a video of a basketball-playing scene as an example to describe the problems that electronic devices such as mobile phones currently cannot capture a high-resolution, high-frame-rate image stream and that the high-frame-rate video picture obtained by video frame interpolation is of limited quality.
Referring to (1) in Fig. 1B, for example, when the user selects the video recording mode displayed in the shooting mode list 10b-2 (for example, a mode in which the recorded video is high-resolution slow-motion video data), the shutter control 10b-1 displayed in the interface 10b is replaced with the video recording control 10b-3.
Understandably, before the user clicks the video recording control 10b-3, the picture corresponding to the preview stream is still displayed in the interface 10b, and since recording of the high-resolution slow-motion video has not yet started, the picture displayed in the interface 10b is clear and smooth. When the user clicks the video recording control 10b-3, the mobile phone starts recording the high-resolution slow-motion video in response to the user operation. At this time, the interface 10b switches to a video recording interface, such as the interface 10c shown in (2) of Fig. 1B.
Referring to (2) in Fig. 1B, for example, during recording, a control 10c-1 for pausing and ending the recording is displayed in the interface 10c, and the current recording time is shown as "01:11", indicating how long the recording has been in progress.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
Because the video recorded in the currently selected video recording mode is high-resolution slow-motion video data, and because of the hardware limitation (the limited frame rate of the image sensor) and the defects of the existing video frame interpolation approaches, the video recorded in the basketball-playing scene suffers from blurring and ghosting at the edges of the fast-moving basketball. Moreover, because of the short exposure time of high-frame-rate video, severe noise may be present in the picture. Specifically, when the picture recorded in (2) of Fig. 1B is enlarged, as shown in Fig. 1C, the noise in the picture and the blurring and ghosting at the edge of the basketball can be clearly seen.
In view of this, the present application provides an image stream processing method, which combines a low-resolution, high-frame-rate video stream (image stream) and a high-resolution, low-frame-rate video stream (image stream) to obtain a high-resolution, high-frame-rate video stream, thereby achieving slow-motion video shooting with high quality, low computation and low latency.
Taking the video of the user playing basketball shown in Fig. 1B as an example, with the image stream processing method provided by the application, the video picture finally output by video encoding is as shown in Fig. 1D: the edge of the fast-moving basketball is clear, essentially no ghosting is produced, and noise is effectively filtered out. As can be seen from the comparison between Fig. 1C and Fig. 1D, the high-resolution, high-frame-rate image stream obtained by the image stream processing method provided by the present application yields a video picture that, after video encoding, is better than existing implementations in terms of clarity and smoothness. The image stream processing method provided by the application combines the low-resolution, high-frame-rate and high-resolution, low-frame-rate video streams (image streams), thereby ensuring the quality of the finally encoded video, meeting user requirements, and improving the shooting experience of users of non-professional shooting devices such as mobile phones.
In order to better understand the technical solution provided by the embodiments of the present application, the following describes the hardware structure and the software structure of the electronic device with reference to fig. 2 to fig. 4.
It should be noted that, the electronic device to which the embodiment of the present application is applicable is mainly a non-professional camera device with a camera, such as a mobile phone, a tablet computer, an intelligent wearable device, etc.
In addition, it should be noted that, since the image stream processing method provided by the embodiment of the present application needs to be based on two paths (respectively, high resolution low frame rate and low resolution high frame rate) of image streams, the non-professional image capturing device to which the embodiment of the present application is applicable is an electronic device that has at least two cameras and the two cameras are located on the same side.
The mobile phone 100 shown in Fig. 2 is taken as an example of an electronic device to which the embodiment of the present application is applicable. As one possible implementation, the mobile phone 100 may include at least two rear cameras, such as the first camera and the second camera shown in Fig. 2.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
By way of example, as one possible implementation, the hardware structure of the electronic device, such as the mobile phone 100 shown in fig. 2, may further include various functional devices shown in fig. 3.
Referring to fig. 3, exemplary, the handset 100 may include: processor 110, external memory interface 120, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (subscriber identification module, SIM) card interface 195, among others.
The processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a Modem processor (Modem), a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, a neural network processor (neural-network processing unit, NPU), etc., which are not further listed here, and the present application is not limited thereto.
The controller as the processing unit may be a neural center or a command center of the mobile phone 100. In practical application, the controller can generate operation control signals according to the instruction operation codes and the time sequence signals to complete instruction fetching and instruction execution control.
With respect to the modem processor described above, a modulator and demodulator may be included. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal and transmitting the low-frequency baseband signal obtained by demodulation to the baseband processor for processing.
The baseband processor is used to process the low-frequency baseband signal transmitted by the demodulator and transmit the processed low-frequency baseband signal to the application processor.
It should be noted that in some implementations, the baseband processor may be integrated within the modem, i.e., the modem may be provided with the functionality of the baseband processor.
With respect to the above-mentioned application processor, it is used to output sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or to display images or videos through the display screen 194.
The above-mentioned digital signal processor is used for processing digital signals. Specifically, the digital signal processor may process other digital signals in addition to the digital image signal.
The above-mentioned neural network processor, particularly in the technical solution provided in the present application, may be used to train the convolutional neural network described in the embodiment of the present application. Understandably, to reduce the resource occupation of the mobile phone 100, the convolutional neural network may be trained by a cloud server or other server and issued to the mobile phone 100.
With respect to the video codec described above, it is used to compress or decompress digital video. Illustratively, the mobile phone 100 may support one or more video codecs. In this way, the mobile phone 100 can play or record video in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The ISP is used to process the data fed back by the camera 193 and to output the digital image signal to the DSP for processing. For example, during photographing or video recording, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness and skin tone of the image, as well as parameters such as the exposure and color temperature of the shooting scene. In some implementations, the ISP may be provided in the camera 193.
The DSP is used to convert digital image signals into standard RGB, YUV, and other image signals.
Furthermore, it should be noted that, with respect to the processor 110 including the processing units described above, in some implementations, the different processing units may be separate devices. That is, each processing unit may be considered a processor. In other implementations, different processing units may also be integrated in one or more processors.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
Further, the processor 110 may also include one or more interfaces. The interfaces may include, but are not limited to, an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
Further, a memory may be provided in the processor 110 for storing instructions and data. In some implementations, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
With continued reference to fig. 3, the external memory interface 120 may be used to interface with an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the handset 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
With continued reference to fig. 3, the internal memory 121 may be used to store computer executable program code, including instructions. The processor 110 executes various functional applications of the cellular phone 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, and a function for processing a non-high-resolution high-frame-rate image stream acquired by a camera in the embodiment of the present application, so as to obtain a high-resolution high-frame-rate image stream) required by at least one function of the operating system. The data storage area may store data created during use of the mobile phone 100 (such as high resolution and high frame rate video data recorded based on the technical scheme provided by the embodiment of the present application), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
With continued reference to Fig. 3, the charge management module 140 is used to receive a charge input from a charger. The charger may be a wireless charger or a wired charger. Understandably, while charging the battery 142, the charge management module 140 may also supply power to the electronic device through the power management module 141.
With continued reference to fig. 3, the power management module 141 is configured to connect the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other implementations, the power management module 141 may also be provided in the processor 110. In other implementations, the power management module 141 and the charge management module 140 may also be disposed in the same device.
With continued reference to fig. 3, the wireless communication function of the handset 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and so on.
The antennas 1 and 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other implementations, the antenna may be used in conjunction with a tuning switch.
With continued reference to fig. 3, the mobile communication module 150 may provide a solution for wireless communications, including 2G/3G/4G/5G, applied to the handset 100.
With continued reference to fig. 3, the wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), and the like, as applied to the handset 100.
It should be noted that, in some implementations, the convolutional neural network (model) used to determine the optical flow maps and visibility maps from the low-resolution, high-frame-rate image stream may be trained by a cloud server or another server. For such an implementation scenario, the mobile phone 100 may communicate with the cloud server or other server providing the convolutional neural network through the mobile communication module 150 or the wireless communication module 160. For example, the mobile phone 100 may send a request to the cloud server through the mobile communication module 150 to obtain or update the convolutional neural network. Accordingly, the cloud server can deliver the trained convolutional neural network to the mobile phone 100 according to the request of the mobile phone 100.
In addition, it should be further noted that, in the scenario where the convolutional neural network is trained by the cloud server (or other servers), the cloud server may customize the convolutional neural network suitable for different mobile phones 100 according to the customization requirements corresponding to the mobile phones 100 with different configurations, and update and iterate the convolutional neural network according to the image stream processing results fed back by different mobile phones 100.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
With continued reference to Fig. 3, the audio module 170 may include a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, and the like. Illustratively, the mobile phone 100 may implement audio functions, such as audio and video recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
With continued reference to fig. 3, the sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., which are not further illustrated herein, but are not limiting.
With continued reference to fig. 3, the keys 190 include a power-on key, a volume key, etc. The handset 100 may receive key inputs, generating signal inputs related to user settings and function control of the handset 100. The motor 191 may generate a vibration cue. The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
With continued reference to fig. 3, a camera 193 is used to capture still images or video. The mobile phone 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display 194, an application processor, and the like. Specifically, the object generates an optical image through a lens and projects the optical image onto a photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some implementations, the cell phone 100 may include 1 or N cameras 193, N being a positive integer greater than 1. In particular, in the technical solution provided in the present application, the mobile phone 100 at least needs to include 2 cameras 193, and the 2 cameras 193 are located on the same side, for example, all rear cameras or all front cameras.
With continued reference to fig. 3, a display screen 194 is used to display images, video, etc. The display 194 includes a display panel. In some implementations, the cell phone 100 may include 1 or N display screens 194, N being a positive integer greater than 1. The cell phone 100 may implement display functions through a GPU, a display 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
As to the hardware architecture of the handset 100, it should be understood that the handset 100 shown in fig. 3 is only one example, and in a specific implementation, the handset 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 3 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In order to better understand the software structure of the mobile phone 100 shown in fig. 3, the following describes the software structure of the mobile phone 100. Before explaining the software structure of the mobile phone 100, an architecture that the software system of the mobile phone 100 can employ will be first described.
Specifically, in practical applications, the software system of the mobile phone 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
Furthermore, it is understood that software systems currently in use in mainstream electronic devices include, but are not limited to, windows systems, android systems, and iOS systems. For convenience of explanation, the embodiment of the present application takes an Android system with a layered architecture as an example, and illustrates a software structure of the mobile phone 100.
In addition, the image stream processing scheme provided in the embodiment of the application is applicable to other systems in specific implementation.
Referring to fig. 4, a software architecture diagram of a mobile phone 100 according to an embodiment of the present application is shown.
As shown in Fig. 4, the layered architecture of the electronic device 100 divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some implementations, the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android Runtime and system libraries, the hardware abstraction layer (HAL), and the kernel layer.
The application layer may include a series of application packages, among other things. As shown in fig. 4, the application package may include applications such as camera, setup, map, WLAN, bluetooth, gallery, music, etc., which are not to be construed as limiting the application.
Wherein the application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. In some implementations, these programming interfaces and programming frameworks can be described as functions. As shown in FIG. 4, the application framework layer may include functions of a high resolution high frame rate image stream processing module, a camera service, a view system, a window manager, a resource manager, etc., which are not explicitly recited herein, and the present application is not limited in this regard.
Illustratively, in this embodiment, the camera service is configured to invoke a camera, such as the first camera and the second camera shown in fig. 2, in response to a request of an application.
In this embodiment, the high-resolution high-frame-rate image stream processing module is configured to process the high-resolution low-frame-rate image stream and the low-resolution high-frame-rate image stream provided by the camera service, so as to obtain the high-resolution high-frame-rate image stream.
The interaction between the camera service and the high-resolution high-frame-rate image stream processing module and the interaction between the camera service and other functional modules involved in implementing the image stream processing method provided in this embodiment will be described in detail in the embodiment shown in fig. 8, which is not repeated here.
It should be understood that the above-mentioned division of the functional modules is merely an example for better understanding the technical solution of the present embodiment, and is not the only limitation of the present embodiment. In practical applications, the above functions may also be integrated into one functional module, which is not limited in this embodiment.
In addition, in practical applications, the above functional modules may also be represented as services, frameworks, such as a high-resolution high-frame-rate image stream processing service, and the like, which is not limited in this embodiment.
In addition, it should be noted that the window manager located in the application framework layer is used for managing the window program. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
In addition, it should be noted that the view system located in the application framework layer includes visual controls, such as a control for displaying text, a control for displaying pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
In addition, it should be noted that the resource manager in the application framework layer provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like, which are not listed here, and the application is not limited thereto.
Android Runtime includes a core library and a virtual machine. Android Runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional (3D) graphics processing Libraries (e.g., openGL ES), two-dimensional (2D) graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video formats for playback and recording, still image files, and the like. The media library may support a variety of audio video encoding formats, such as: MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
It will be appreciated that the 2D graphics engine described above is a drawing engine for 2D drawing.
The HAL layer is an interface layer between the operating system kernel and the hardware circuitry. The HAL layer includes, but is not limited to: an audio hardware abstraction layer (Audio HAL) and a camera hardware abstraction layer (Camera HAL). The Audio HAL is used to process the audio stream, for example, performing noise reduction, directional enhancement and the like on the audio stream; the Camera HAL is used to process the image stream.
In particular, in the technical solution provided by the embodiment of the present application, shooting with the camera application installed in the mobile phone 100, for example recording a video, requires the Camera HAL of the HAL layer.
Furthermore, it is understood that the kernel layer in the Android system is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, a microphone driver, a Bluetooth driver, a sensor driver and the like. For example, the camera driver may be configured to transmit an image captured by the camera to the camera service through the camera hardware abstraction layer, so that the camera service may send the image data captured by the camera to the high resolution high frame rate image stream processing module for processing.
It should be noted that, as one possible implementation, there may be a plurality of camera drivers, for example, one camera driver for each camera. The camera driver corresponding to each camera can output a suitable image stream to the camera hardware abstraction layer according to the picture mode (parameters) of that camera, and the image stream is then transmitted to the camera service.
As another possible implementation, the number of camera drivers may be 1, that is, a plurality of cameras correspond to the same camera driver. In specific operation, the camera driver can output suitable image streams to the camera hardware abstraction layer according to the picture modes (parameters) corresponding to the different cameras, and the image streams are then transmitted to the camera service.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
For convenience of explanation, this embodiment takes the case where a plurality of cameras correspond to one camera driver, where the first camera corresponds to a high-resolution low-frame-rate picture mode and the second camera corresponds to a low-resolution high-frame-rate picture mode. Illustratively, in this scenario, the camera driver acquires a high-resolution low-frame-rate image stream from the first camera according to the high-resolution low-frame-rate configuration and transmits the high-resolution low-frame-rate image stream to the camera service. Meanwhile, the camera driver acquires a low-resolution high-frame-rate image stream from the second camera according to the low-resolution high-frame-rate configuration and transmits the low-resolution high-frame-rate image stream to the camera service. In this way, the camera service can send the two image streams obtained for the same scene to the high-resolution high-frame-rate image stream processing module, so that the module can derive a high-resolution high-frame-rate video stream from the two image streams, thereby achieving slow-motion video shooting with high quality, low computation and low latency.
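As an illustration of this dual-stream configuration, the following sketch shows one camera driver opening one output stream per camera with a different picture mode (parameters); the driver object, method names, resolutions and frame rates are illustrative assumptions rather than an actual driver interface.

```python
# Illustrative sketch only: one camera driver serving two cameras with
# different picture modes (parameters). The driver object, method names,
# resolutions and frame rates are assumptions, not an actual driver API.
CAMERA_CONFIGS = {
    "first_camera":  {"resolution": (3840, 2160), "frame_rate": 30},   # high resolution, low frame rate
    "second_camera": {"resolution": (1280, 720),  "frame_rate": 120},  # low resolution, high frame rate
}

def configure_streams(driver, configs=CAMERA_CONFIGS):
    """Open one output stream per camera according to its picture mode."""
    streams = {}
    for camera_id, cfg in configs.items():
        width, height = cfg["resolution"]
        streams[camera_id] = driver.open_stream(
            camera_id=camera_id, width=width, height=height, fps=cfg["frame_rate"]
        )
    return streams
```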
As to the software structure of the mobile phone 100, it will be understood that the layers and the components included in the layers in the software structure shown in fig. 4 do not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the handset 100 may include more or fewer layers than shown, and more or fewer components may be included in each layer, as the application is not limited.
Based on the above hardware structure and software structure, the image stream processing method provided by the embodiment of the present application is specifically described below.
Referring to fig. 5, an exemplary image stream processing method provided by an embodiment of the present application may include an image stream acquisition link, an optical flow map (Optical Flow) and visual map (Visual Map) calculation link, an optical flow map and visual map up-sampling link, a high-resolution intermediate frame calculation link, and a video encoding output link.
The image stream acquisition link includes acquisition of a high-resolution low-frame-rate image stream (for example, acquisition by a first camera of the mobile phone 100 shown in fig. 2) and acquisition of a low-resolution high-frame-rate image stream (for example, acquisition by a second camera of the mobile phone 100 shown in fig. 2).
It will be appreciated that in a particular implementation, the image stream described above may be regarded as an image sequence comprising a plurality of image frames. In particular, in the technical scheme provided by the embodiment of the application, the image sequence corresponding to the high-resolution low-frame-rate image stream acquired by the first camera may be represented by H, and the image sequence corresponding to the low-resolution high-frame-rate image stream acquired by the second camera may be represented by L.
In addition, it should be noted that, as a possible implementation manner, the optical flow map and the visual map calculation link, the optical flow map and the visual map up-sampling link, and the high-resolution intermediate frame calculation link may be integrated in the high-resolution high-frame rate image flow processing module provided by the embodiment of the present application.
For example, as another possible implementation, the optical flow map and visual map calculation link, the optical flow map and visual map up-sampling link, and the high-resolution intermediate frame calculation link may also each be implemented as a separate image stream processing module, that is, the image stream processing modules respectively corresponding to the three links together achieve the function of the high-resolution high-frame-rate image stream processing module.
Furthermore, it should be noted that, in some possible implementations, the video encoding output link may be implemented by a separate functional module. In another possible implementation manner, the video coding output link may also be integrated in the high resolution high frame rate image stream processing module provided in the embodiment of the present application.
For convenience of explanation, in this embodiment, the four links (other than the image stream acquisition link) are integrated into one functional module, that is, the high-resolution high-frame-rate image stream processing module provided in the embodiment of the present application is taken as an example.
Specifically, in the calculation link of the optical flow map and the visual map, the high-resolution high-frame-rate image stream processing module calculates the optical flow map and the visual map of the low-resolution intermediate frame by using the low-resolution high-frame-rate image stream. The calculation of the optical flow diagram and the visual diagram of the low-resolution intermediate frame can be realized based on a convolutional neural network.
It is understood that the optical flow map and the visual map calculated by the convolutional neural network are low-resolution, whereas the frame to be inserted by video frame interpolation is a high-resolution intermediate frame. Therefore, in order to bring the optical flow map and the visual map of the low-resolution intermediate frame up to the high-resolution level corresponding to the first camera, the optical flow map and the visual map calculated by the convolutional neural network need to be up-sampled.
Specifically, in the step of upsampling the optical flow map and the visual map, the high-resolution high-frame-rate image flow processing module performs upsampling processing on the optical flow map and the visual map of the intermediate frame with low resolution, so as to process the resolutions of the obtained optical flow map and visual map into the resolutions corresponding to the first camera, namely the high resolution.
With continued reference to fig. 5, illustratively, after the up-sampled optical flow map and visual map are obtained, the high-resolution intermediate frame calculation link may be entered.
Specifically, in the high-resolution intermediate frame calculation link, the high-resolution high-frame-rate image stream processing module uses the optical flow map and the visual map obtained in the optical flow map and visual map up-sampling link, combined with the preceding and following frames in the image sequence corresponding to the high-resolution low-frame-rate image stream, to calculate the image frame (intermediate frame) that lies between those two frames in the image sequence.
In this way, in the video encoding output link, the intermediate frame calculated in the high-resolution intermediate frame calculation link and the two image frames before and after the intermediate frame are encoded, so that the high-resolution high-frame-rate image stream can be obtained.
Specific implementation details in an image stream acquisition link, an optical flow map and a visual map calculation link, an optical flow map and a visual map up-sampling link and a high-resolution intermediate frame calculation link in the image stream processing method provided by the embodiment of the application are described below with reference to fig. 6.
Referring to fig. 6, an example is illustrated in which the image stream acquired by the first camera is a high-resolution low-frame-rate image stream and the image stream acquired by the second camera is a low-resolution high-frame-rate image stream.
With continued reference to fig. 6, by way of example, in one possible implementation, the first camera may acquire 2 high-resolution image frames during one acquisition period. The size information of each high-resolution image frame may be expressed as nH×nW×Frame, and each high-resolution image frame is referred to below as an nH×nW×Frame image frame.
With continued reference to fig. 6, by way of example, the second camera may acquire 3 low-resolution image frames during the same acquisition period. The size information of each low-resolution image frame may be expressed as H×W×mFrame, and each low-resolution image frame is referred to below as an H×W×mFrame image frame.
With continued reference to fig. 6, illustratively, in the image sequence corresponding to the high-resolution low-frame-rate image stream, the 2 nH×nW×Frame image frames may be denoted, in temporal order, as H0 and H1. In the image sequence corresponding to the low-resolution high-frame-rate image stream, the 3 H×W×mFrame image frames may be denoted as L0, L1 and L2. Here, L1 is the low-resolution intermediate frame, L0 is the frame preceding L1, and L2 is the frame following L1.
In addition, in order to process the two image streams, namely the high-resolution low-frame-rate image stream and the low-resolution high-frame-rate image stream, into a high-resolution high-frame-rate image stream, the image frames in the image sequences corresponding to the two image streams need to be synchronized. As shown in fig. 6, the two image streams output by the first camera and the second camera are set to be always synchronized. Still taking the case where the high-resolution low-frame-rate image stream output by the first camera includes 2 nH×nW×Frame image frames (H0 and H1) and the low-resolution high-frame-rate image stream output by the second camera includes 3 H×W×mFrame image frames (L0, L1 and L2) as an example, as one possible implementation, under clock synchronization, H0 is time-aligned with L0 and H1 is time-aligned with L2. That is, for the high-resolution low-frame-rate image stream, what needs to be inserted is the nH×nW×Frame image frame that is time-aligned with L1; this frame is referred to below as Ht.
With continued reference to fig. 6, illustratively, after the camera driver acquires the 2 nH×nW×Frame image frames H0 and H1 from the first camera and the 3 H×W×mFrame image frames L0, L1 and L2 from the second camera, it transmits them through the camera hardware abstraction layer to the camera service, and finally the camera service sends the 2 nH×nW×Frame image frames H0 and H1 and the 3 H×W×mFrame image frames L0, L1 and L2 to the high-resolution high-frame-rate image stream processing module.
With continued reference to fig. 6, the high-resolution high-frame-rate image stream processing module then illustratively enters, in sequence, the optical flow map and visual map calculation link, the optical flow map and visual map up-sampling link, and the high-resolution intermediate frame calculation link.
Illustratively, in one possible implementation, in the optical flow map and visual map calculation link, the high-resolution high-frame-rate image stream processing module may take the 3 H×W×mFrame image frames L0, L1 and L2 as input parameters and input them into a pre-trained convolutional neural network to directly obtain a forward optical flow map (F_fwd), a forward visual map (V_fwd), a backward optical flow map (F_bwd) and a backward visual map (V_bwd), as shown in fig. 7.
It will be appreciated that, for the manner shown in fig. 7, the convolutional neural network needs to be iteratively trained in the construction stage, using the previous frame, the intermediate frame and the subsequent frame in a low-resolution high-frame-rate image stream as training data, until it meets the requirements of the image stream processing method provided by the embodiment of the present application.
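As a rough illustration of the kind of convolutional neural network described above, the following PyTorch sketch takes the previous, intermediate and next low-resolution frames and directly predicts the forward/backward optical flow maps and forward/backward visual maps; the layer sizes and channel layout are illustrative assumptions, not the network actually used in this embodiment.

```python
# A minimal PyTorch sketch of a network of the kind described above: it takes
# the previous, intermediate and next low-resolution frames and directly
# predicts forward/backward optical flow maps and forward/backward visual
# maps. Layer sizes and channel layout are illustrative assumptions.
import torch
import torch.nn as nn

class FlowAndVisualNet(nn.Module):
    def __init__(self, base_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(9, base_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_channels, base_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.flow_head = nn.Conv2d(base_channels, 4, kernel_size=3, padding=1)  # 2 channels per flow map
        self.vis_head = nn.Conv2d(base_channels, 2, kernel_size=3, padding=1)   # 1 channel per visual map

    def forward(self, prev_frame, mid_frame, next_frame):
        x = torch.cat([prev_frame, mid_frame, next_frame], dim=1)  # (N, 9, H, W)
        feat = self.encoder(x)
        flows = self.flow_head(feat)              # (N, 4, H, W)
        vis = torch.sigmoid(self.vis_head(feat))  # (N, 2, H, W), weights in [0, 1]
        f_fwd, f_bwd = flows[:, :2], flows[:, 2:]
        v_fwd, v_bwd = vis[:, :1], vis[:, 1:]
        return f_fwd, v_fwd, f_bwd, v_bwd

# Example inference on a batch of low-resolution frames L0, L1, L2 of shape (N, 3, H, W):
# net = FlowAndVisualNet()
# F_fwd, V_fwd, F_bwd, V_bwd = net(L0, L1, L2)
```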
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not the only limitation of the present embodiment.
With continued reference to fig. 6, illustratively, after obtaining F_fwd, V_fwd, F_bwd and V_bwd, the high-resolution high-frame-rate image stream processing module up-samples these 4 H×W×mFrame maps. Specifically, these 4 H×W×mFrame maps are processed into nH×nW×Frame maps, that is, brought up to the same resolution as the high-resolution low-frame-rate image frames acquired by the first camera. For convenience of description, this embodiment denotes the up-sampled F_fwd as F_fwd_up, the up-sampled V_fwd as V_fwd_up, the up-sampled F_bwd as F_bwd_up, and the up-sampled V_bwd as V_bwd_up.
In addition, regarding the up-sampling method used in the optical flow map and visual map up-sampling link, any technique that converts a low-resolution image into a high-resolution one may be used in actual operation, such as resampling or interpolation, and the present application is not limited in this respect.
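The up-sampling step can be sketched as follows, assuming bilinear resampling as one possible choice; note that when an optical flow map is resized, the flow vectors also need to be rescaled, because displacements measured in low-resolution pixels correspond to larger displacements in high-resolution pixels.

```python
# A sketch of the up-sampling link assuming bilinear resampling (one of many
# possible choices). Flow vectors are additionally rescaled, since a
# displacement measured in low-resolution pixels becomes a larger displacement
# in high-resolution pixels.
import torch
import torch.nn.functional as F

def upsample_maps(f_fwd, v_fwd, f_bwd, v_bwd, target_hw):
    """Resize (N, C, H, W) flow/visual maps to target_hw = (nH, nW)."""
    src_h, src_w = f_fwd.shape[-2:]
    dst_h, dst_w = target_hw
    # Per-channel scale for the (dx, dy) flow components.
    scale = torch.tensor([dst_w / src_w, dst_h / src_h], device=f_fwd.device).view(1, 2, 1, 1)

    def resize(x):
        return F.interpolate(x, size=target_hw, mode="bilinear", align_corners=False)

    return resize(f_fwd) * scale, resize(v_fwd), resize(f_bwd) * scale, resize(v_bwd)
```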
With continued reference to fig. 6, illustratively, after obtaining F_fwd_up, V_fwd_up, F_bwd_up and V_bwd_up, the high-resolution high-frame-rate image stream processing module combines the 2 nH×nW×Frame image frames H0 and H1 to calculate the predicted high-resolution intermediate frame Ht, that is, it enters the high-resolution intermediate frame calculation link.
Specifically, in the high-resolution intermediate frame calculation link, the high-resolution high-frame-rate image stream processing module calculates Ht based on a formula of the following form:

Ht = (V_fwd_up ⊙ warp(H0, F_fwd_up) + V_bwd_up ⊙ warp(H1, F_bwd_up)) / (V_fwd_up + V_bwd_up)

where warp(·, ·) denotes warping an image frame by an optical flow map and ⊙ denotes element-wise multiplication.
In some possible implementations, the calculation of the intermediate frame in the high-resolution intermediate frame calculation link may be implemented on the basis of a mainstream video frame interpolation approach such as bicubic interpolation or linear interpolation. However, when the high-resolution intermediate frame is calculated in the image stream processing method provided by the embodiment of the present application, not only the preceding and following frames in the high-resolution low-frame-rate image stream are used; as shown in the above formula, the forward optical flow map and visual map of the intermediate frame determined from the low-resolution high-frame-rate stream and up-sampled to high resolution, and the backward optical flow map and visual map determined and up-sampled in the same way, are also taken into account. Therefore, the image stream processing method provided by the embodiment of the present application can ensure that the calculated high-resolution intermediate frame reflects both the high-resolution characteristic and the high-frame-rate characteristic.
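Reading the formula above as a visibility-weighted fusion of warped frames (an assumed interpretation for illustration, not a verbatim transcription of the embodiment), the intermediate frame calculation can be sketched as follows: each high-resolution frame is warped towards the intermediate moment by the up-sampled optical flow map, and the two warped frames are blended using the up-sampled visual maps as weights.

```python
# A sketch of the high-resolution intermediate frame calculation, reading the
# formula above as a visibility-weighted fusion of warped frames (an assumed
# interpretation for illustration). H0 and H1 are (N, 3, nH, nW) frames; the
# flow maps are (N, 2, nH, nW) and the visual maps are (N, 1, nH, nW).
import torch
import torch.nn.functional as F

def backward_warp(image, flow):
    """Warp an (N, C, H, W) image by a per-pixel flow field (N, 2, H, W)."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device),
        torch.arange(w, device=image.device),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)   # (N, H, W, 2)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)

def synthesize_intermediate(h0, h1, f_fwd_up, v_fwd_up, f_bwd_up, v_bwd_up, eps=1e-6):
    warped_0 = backward_warp(h0, f_fwd_up)         # H0 pulled towards the intermediate moment
    warped_1 = backward_warp(h1, f_bwd_up)         # H1 pulled towards the intermediate moment
    return (v_fwd_up * warped_0 + v_bwd_up * warped_1) / (v_fwd_up + v_bwd_up + eps)
```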
Thus, in the video encoding output link, the obtained Ht is encoded and inserted into the image sequence corresponding to the high-resolution low-frame-rate image stream, so that the high-resolution high-frame-rate image stream can be obtained.
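A minimal sketch of this interleaving, assuming each frame carries a capture timestamp (the list layout is an illustrative assumption), is the following:

```python
# A minimal sketch of the interleaving performed before encoding, assuming
# each frame carries a capture timestamp; the list layout is an assumption.
def interleave_frames(high_res_frames, intermediate_frames):
    """Merge two lists of (timestamp, frame) into one temporally ordered sequence."""
    merged = sorted(high_res_frames + intermediate_frames, key=lambda item: item[0])
    return [frame for _, frame in merged]

# Example: interleave_frames([(t0, H0), (t1, H1)], [(t_mid, Ht)]) yields [H0, Ht, H1].
```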
In addition, for a high-resolution high-frame-rate image stream acquisition scenario, which camera of the electronic device, such as a mobile phone, performs image stream acquisition in the high-resolution low-frame-rate picture mode and which camera performs image stream acquisition in the low-resolution high-frame-rate picture mode may be predetermined according to the shooting mode provided by the camera application.
Regarding the relationship between the shooting mode and the picture mode (parameters), when the camera application is installed, a corresponding configuration file (a file recording the relationship between shooting modes and picture modes (parameters)) may be saved in the internal memory of the electronic device under a path corresponding to the camera application. In this way, each time the camera application is started and a specific shooting mode is selected, the electronic device can automatically load the corresponding picture mode according to the shooting mode selected by the user, so that different cameras can acquire image streams according to the preset picture modes.
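A hypothetical sketch of such a configuration file and of loading it when a shooting mode is selected is shown below; the mode names, resolutions and frame rates are illustrative only and are not the actual configuration format used by the camera application.

```python
# Hypothetical sketch of a configuration file mapping each shooting mode of
# the camera application to the picture modes (parameters) of the cameras it
# uses; names and values are illustrative assumptions.
import json

EXAMPLE_CONFIG = {
    "high_resolution_slow_motion": {
        "first_camera":  {"resolution": [3840, 2160], "frame_rate": 30},
        "second_camera": {"resolution": [1280, 720],  "frame_rate": 120},
    }
}

def load_picture_modes(config_path, shooting_mode):
    """Return the per-camera picture modes configured for the selected shooting mode."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    return config[shooting_mode]
```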
The following describes in detail the creation process of the camera application in the process of calling the camera and the process of processing the image stream after the image stream is acquired, thereby obtaining the high-resolution high-frame-rate image stream, in combination with the interactive flow schematic diagram of each functional module shown in fig. 8.
Referring to fig. 8, an exemplary creation process specifically includes:
S101, the camera application calls the camera service, and the camera service performs corresponding processing.
Illustratively, after the camera application is started (e.g., the process from (1) in fig. 1A to (2) in fig. 1A), the camera application invokes the camera service, e.g., the camera application sends a request message to the camera service, which may include, but is not limited to: application ID (which may be, for example, an application package name), PID (Process Identification, process identification number), configuration information of the camera application, and the like. For example, the configuration information may include resolution (for example 1080×720 p) corresponding to an image displayed by the camera application, that is, drawing mode (parameter) information corresponding to different photographing modes read from the configuration file in the above embodiment.
Alternatively, the request message may not include the application ID, for example, the request message includes configuration information, and the camera service may obtain, through an interface of the application layer, the application ID and the PID of the application corresponding to the received request message.
The corresponding processing with respect to the camera service may be, for example, creating a camera service instance, a camera device client instance, a camera device instance, a camera data stream instance, etc.
The camera service instance is used for providing an API interface for an application program layer capable of calling the camera, and creating a corresponding session based on a request of the application. Taking the camera application as an example, the camera service instance may receive a request input by the camera application (including an ID of the application, a configuration 1, and the like) based on the API interface, and the camera service instance may create a corresponding session (for example, identification information of the session is session 1) based on the request of the camera application, and output the ID of the application of the camera application, the configuration 1, and the identification information of the session (i.e., session 1) to the camera device client instance.
For example, configuration 1 is a graph mode (parameter) corresponding to a video in a high-resolution slow motion scene, for example, a first camera configures an image acquisition stream with a high resolution and a low frame rate, and a second camera configures an image acquisition stream with a low resolution and a high frame rate.
The camera equipment client instance can be regarded as a client of the camera service, and is mainly used for providing an E interface (an interface between different mobile service switching centers controlling adjacent areas) for the camera service to perform data interaction with other modules. The camera equipment client instance stores the corresponding relation between the application ID and the session1, and outputs the ID, the configuration 1 and the session1 of the application of the camera application to the camera equipment instance.
Wherein the camera device instance is used to provide an interface for the HAL layer and for transparent transmission of data (e.g. image streams, image frames). Specifically, the camera device instance records the correspondence between the information based on the application ID, configuration 1 and session1 of the camera application input by the camera device client instance, and outputs the application ID and session1 to the camera data stream instance.
The camera data flow instance is used for carrying out corresponding processing on the image. Specifically, the camera data flow instance stores the application ID and session1 of the camera application input by the camera device instance.
The interaction between the above examples of the camera service creation may refer to the existing API standard, and will not be described herein.
It should be understood that the above description is only an example for better understanding of the technical solution of the present application, and is not the only limitation of the present application. In practical applications, the image stream processing method provided by this embodiment can be adopted not only when shooting with the camera application, for example when recording video in a high-resolution slow-motion scene, but also when recording video in a similar (high-resolution, high-frame-rate) scene through a camera invocation integrated in other applications.
S102, the camera service calls a camera hardware abstract layer, and the camera hardware abstract layer performs corresponding processing.
That is, the camera device instance calls the camera hardware abstraction layer of the HAL layer and transmits the shooting request triggered by the camera application, such as a request for recording video in a high-resolution slow-motion scene, to the camera hardware abstraction layer, so that the camera hardware abstraction layer can create a corresponding instance in response to the request.
S103, the camera hardware abstraction layer calls a camera driver in the kernel layer, and the camera driver carries out corresponding processing.
Understandably, the respective processing by the camera driver is, for example, to establish the corresponding instance.
S104, the camera driver calls the corresponding camera.
Illustratively, the camera begins to acquire image streams in a corresponding configuration in response to a call from the camera driver, e.g., the first camera acquires image streams in a high resolution low frame rate configuration and the second camera acquires image streams in a low resolution high frame rate configuration. It should be noted that, in the creating process, each instance or module in the camera hardware abstraction layer and the camera driver performs corresponding processing on data (for example, an image), and the specific processing process may refer to a technical scheme in the embodiment of the prior art, which is not described in detail in the present application.
With continued reference to fig. 8, an exemplary process for processing an image stream after the image stream is acquired to obtain a high resolution high frame rate image stream specifically includes:
S201, the camera outputs the acquired image stream to the camera driver.
It should be noted that, in this embodiment, the electronic device is provided with at least two cameras on the same side, so two image streams are output to the camera driver, namely the high-resolution low-frame-rate image stream acquired by the first camera and the low-resolution high-frame-rate image stream acquired by the second camera.
Correspondingly, after the camera driver obtains the two paths of image streams, the two paths of image streams are transmitted to the high-resolution high-frame-rate image stream processing module through a camera hardware abstraction layer and camera service.
S202, the camera driver outputs two paths of image streams to a camera hardware abstraction layer.
And S203, the camera hardware abstraction layer outputs two paths of image streams to the camera service.
S204, the camera service outputs the two paths of image streams to the high-resolution high-frame-rate image stream processing module.
Understandably, as can be seen from the above description, the two image streams output by the camera driver to the camera hardware abstraction layer are, specifically, the high-resolution low-frame-rate image stream acquired by the first camera and the low-resolution high-frame-rate image stream acquired by the second camera.
Correspondingly, the two paths of image streams output to the camera service by the camera hardware abstraction layer are the high-resolution low-frame-rate image stream acquired by the first camera and the low-resolution high-frame-rate image stream acquired by the second camera. In addition, it should be noted that, the camera hardware abstraction layer outputs two image streams to the camera service, specifically, outputs to the camera device instance in the camera service.
Correspondingly, the two paths of image streams output to the high-resolution high-frame-rate image stream processing module by the camera service are the high-resolution low-frame-rate image stream acquired by the first camera and the low-resolution high-frame-rate image stream acquired by the second camera.
In addition, as can be seen from the description of the creation process, the camera device instance records the corresponding relationship between the application ID, the configuration 1 and the session1 of the camera application input by the camera device client instance, so that after receiving the two image streams input by the camera hardware abstraction layer, the camera device instance detects the currently stored session. When session1 and other information (including application ID and configuration 1) are stored currently, the camera device instance outputs the two image streams, session1, and the configuration (resolution information) corresponding to each image stream to the high-resolution high-frame-rate image stream processing module.
It should be noted that, in this embodiment, the purpose of maintaining and associating sessions during the creation process and the shooting process is to make it easier to distinguish between different sessions, so that the method provided in this embodiment can be applied to various implementation scenarios, such as multiple concurrent shooting invocations.
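The session bookkeeping described above can be sketched, under the assumption of a simple in-memory registry, as follows; the function and field names are illustrative, not the actual camera service implementation.

```python
# Sketch of session bookkeeping, assuming a simple in-memory registry; the
# function and field names are illustrative.
sessions = {}

def register_session(session_id, app_id, configuration):
    """Record which application and stream configuration a session belongs to."""
    sessions[session_id] = {"app_id": app_id, "configuration": configuration}

def route_image_stream(session_id, image_stream):
    """Look up the application that a processed image stream should be returned to."""
    entry = sessions.get(session_id)
    if entry is None:
        raise KeyError(f"unknown session: {session_id}")
    return entry["app_id"], image_stream
```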
S205, based on the high-resolution low-frame-rate image stream acquired by the first camera and the low-resolution high-frame-rate image stream acquired by the second camera, the high-resolution high-frame-rate image stream processing module performs, in the order of the image stream processing links shown in fig. 5, the processing operations of the optical flow map and visual map calculation link, the optical flow map and visual map up-sampling link, the high-resolution intermediate frame calculation link and the video encoding output link, thereby obtaining the high-resolution high-frame-rate image stream, and outputs the obtained high-resolution high-frame-rate image stream to the camera service.
Details of the specific implementation of step S205 may be found in the descriptions of the embodiments shown in fig. 5, 6 and 7, and will not be described herein.
S206, the camera service outputs the high-resolution high-frame-rate image stream to the camera application.
Regarding the process by which the camera service outputs the high-resolution high-frame-rate image stream to the camera application: for example, the camera device instance receives the high-resolution high-frame-rate image stream and the session corresponding to the camera application, for example session1, from the high-resolution high-frame-rate image stream processing module, and outputs the high-resolution high-frame-rate image stream and session1 to the camera data stream instance. The camera data stream instance then outputs the high-resolution high-frame-rate image stream to the camera application based on the recorded correspondence between session1 and the application ID of the camera application, so that the pictures corresponding to the image frames in the high-resolution high-frame-rate image stream can be displayed on the current interface, or the video data can be directly stored in the directory corresponding to the gallery application.
Therefore, in the image stream processing method provided by the embodiment of the present application, by combining the two video streams, one with low resolution and high frame rate and one with high resolution and low frame rate, high-resolution high-frame-rate video stream acquisition is achieved without changing the hardware of the electronic device, that is, without increasing hardware cost, thereby realizing slow-motion video shooting with high quality, low computation and low latency and meeting the shooting requirements of users.
Through the above description, the image stream processing method provided by the embodiment of the present application may specifically include the following steps in one possible implementation manner:
S301, acquiring a first image stream acquired by the first camera and a second image stream acquired by the second camera.
In one possible implementation, the first image stream acquired by the first camera is, for example, a high resolution low frame rate image stream. The second image stream acquired by the second camera is, for example, a low-resolution high-frame-rate image stream.
It should be noted that, in order to ensure that the image stream processing method provided by the embodiment of the present application can be performed normally, the first image stream and the second image stream need to be kept always synchronous.
Regarding keeping the first image stream and the second image stream synchronized at all times, in one possible implementation this may be realized in hardware. In another possible implementation, time alignment may also be performed in subsequent processing based on the timestamp of each image frame.
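As an illustration of timestamp-based alignment, the following sketch, which assumes every captured frame carries a capture timestamp, looks up the two high-resolution frames that bracket a given low-resolution intermediate frame in time.

```python
# Sketch of timestamp-based alignment, assuming every captured frame carries
# a capture timestamp: for a given low-resolution intermediate frame, find the
# two high-resolution frames that bracket it in time.
def find_bracketing_frames(high_res_frames, low_res_timestamp):
    """high_res_frames: list of (timestamp, frame) pairs sorted by timestamp."""
    previous, following = None, None
    for timestamp, frame in high_res_frames:
        if timestamp <= low_res_timestamp:
            previous = (timestamp, frame)
        else:
            following = (timestamp, frame)
            break
    return previous, following
```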
For convenience of explanation, in this embodiment, in one acquisition period, the second image stream acquired by the second camera includes 3 image frames, i.e., a first low-resolution image frame, a second low-resolution image frame, and a third low-resolution image frame, and the first image stream acquired by the first camera includes 2 image frames, i.e., a first high-resolution image frame and a second high-resolution image frame. In such a scenario, the second low resolution image frame may be considered as an intermediate frame in the second image stream, the first low resolution image frame being a frame preceding the second low resolution image frame, and the third low resolution image frame being a frame following the second low resolution image frame.
Corresponding to the above embodiment, the first high-resolution image frame can be regarded as the H0 described above, the second high-resolution image frame can be regarded as H1, the first low-resolution image frame can be regarded as L0, the second low-resolution image frame can be regarded as L1, and the third low-resolution image frame can be regarded as L2.
The process of acquiring the first image stream acquired by the first camera and the second image stream acquired by the second camera in this embodiment is essentially the process performed in the image stream acquisition step described in the embodiment shown in fig. 5. For details of the specific implementation of step S301, reference may be made to the embodiments shown in fig. 5 to 8, and description about the operations performed in the image stream acquisition link will not be repeated here.
S302, based on the second image stream, a first optical flow map and a first visual map of an intermediate frame in the second image stream are calculated.
The first optical flow map comprises a first forward optical flow map and a first backward optical flow map, and the first visual map comprises a first forward visual map and a first backward visual map.
Corresponding to the above embodiment, the first forward optical flow map can be regarded as the F_fwd described above, the first backward optical flow map can be regarded as F_bwd, the first forward visual map can be regarded as V_fwd, and the first backward visual map can be regarded as V_bwd.
For example, in one possible implementation, the first forward optical flow map, the first forward visual map, the first backward optical flow map, and the first backward visual map may be calculated directly from a previous frame of the intermediate frame, the intermediate frame, and a subsequent frame of the intermediate frame in the second image stream.
For this way, the specific implementation may be that the convolutional neural network as shown in fig. 7 is trained in advance based on the above parameter types, and then when the first optical flow graph and the first visual graph of the intermediate frame in the second image stream need to be calculated based on the second image stream, the previous frame, the intermediate frame and the subsequent frame are taken as input parameters, the convolutional neural network is input, the first optical flow graph output by the convolutional neural network is taken as a first forward optical flow graph, the first visual graph output by the convolutional neural network is taken as a first forward visual graph, the second optical flow graph output by the convolutional neural network is taken as a first backward optical flow graph, and the second visual graph output by the convolutional neural network is taken as a first backward visual graph.
Calculating the first optical flow map and the first visual map of the intermediate frame in the second image stream based on the second image stream is essentially the operation performed in the optical flow map and visual map calculation link described in the embodiment shown in fig. 5. For details of the specific implementation of step S302, reference may be made to the embodiments shown in fig. 5 to 8; the description of the operations performed in the optical flow map and visual map calculation link is not repeated here.
S303, the first optical flow map and the first visual map are up-sampled to obtain a second optical flow map and a second visual map.
Wherein the second optical flow map comprises a second forward optical flow map and a second backward optical flow map, and the second visual map comprises a second forward visual map and a second backward visual map.
Corresponding to the above embodiment, the second forward optical flow map can be regarded as the F_fwd_up described above, the second backward optical flow map can be regarded as F_bwd_up, the second forward visual map can be regarded as V_fwd_up, and the second backward visual map can be regarded as V_bwd_up.
In practical application, the second forward optical flow map can be obtained by up-sampling the first forward optical flow map to the resolution corresponding to the first image stream; the second forward visual map can be obtained by up-sampling the first forward visual map to the resolution corresponding to the first image stream; the second backward optical flow map can be obtained by up-sampling the first backward optical flow map to the resolution corresponding to the first image stream; and the second backward visual map can be obtained by up-sampling the first backward visual map to the resolution corresponding to the first image stream.
The up-sampling of the first optical flow graph and the first visual graph in this embodiment to obtain the second optical flow graph and the second visual graph is essentially the operation performed in the up-sampling link of the optical flow graph and the visual graph described in the embodiment shown in fig. 5. For details of the specific implementation of step S303, reference may be made to the embodiments shown in fig. 5 to 8, and descriptions about operations performed in the upsampling step of the optical flow diagram and the visualization are not repeated here.
S304, calculating an image of an intermediate frame in the first image stream based on the first image stream, the second optical flow map and the second visual map.
Specifically, the image of the intermediate frame in the first image stream is calculated based on the first high-resolution image frame and the second high-resolution image frame in the first image stream, together with the second forward optical flow map, the second forward visual map, the second backward optical flow map and the second backward visual map.
Specifically, in practical application, a mainstream video frame interpolation approach may be adopted, and the image of the intermediate frame in the first image stream may be calculated from the first high-resolution image frame and the second high-resolution image frame in the first image stream, the second forward optical flow map, the second forward visual map, the second backward optical flow map and the second backward visual map according to a formula of the following form:

Ht = (V_fwd_up ⊙ warp(H0, F_fwd_up) + V_bwd_up ⊙ warp(H1, F_bwd_up)) / (V_fwd_up + V_bwd_up)

where H0 is the first high-resolution image frame, H1 is the second high-resolution image frame, F_fwd_up is the second forward optical flow map, V_fwd_up is the second forward visual map, F_bwd_up is the second backward optical flow map, V_bwd_up is the second backward visual map, warp(·, ·) denotes warping an image frame by an optical flow map, ⊙ denotes element-wise multiplication, and Ht is the image of the intermediate frame in the first image stream.
Calculating the image of the intermediate frame in the first image stream based on the first image stream, the second optical flow map and the second visual map is essentially the operation performed in the high-resolution intermediate frame calculation link described in the embodiment shown in fig. 5. For details of the specific implementation of step S304, reference may be made to the embodiments shown in fig. 5 to 8; the description of the operations performed in the high-resolution intermediate frame calculation link is not repeated here.
S305, encoding the image of the intermediate frame in the first image stream and the images in the first image stream to obtain a third image stream.
For example, Ht is inserted between H0 and H1; in this way, a high-resolution high-frame-rate third image stream is obtained.
The process of encoding the intermediate frame image in the first image stream and the image in the first image stream to obtain the third image stream in this embodiment is essentially the process performed in the video encoding output link described in the embodiment shown in fig. 5. For details of the specific implementation of step S305, reference may be made to the embodiments shown in fig. 5 to 8, and description of the operations performed in the video coding output link will not be repeated here.
Therefore, by combining the two video streams, one with low resolution and high frame rate and one with high resolution and low frame rate, high-resolution high-frame-rate video stream acquisition is realized, and slow-motion video shooting with high quality, low computation and low latency is achieved.
Compared with current image stream processing approaches that are based on optical flow calculation and optical flow estimation algorithms and that focus on obtaining more accurate optical flow estimates, the image stream processing method provided by the embodiment of the present application can solve the problems that limit the effect of single-camera frame interpolation algorithms, such as blurred image edges and ghosting for high-speed moving objects and in complex scenes, caused by the lack of original intermediate frame information.
In addition, compared with current methods that train a prediction and estimation model to learn inter-frame information and predict the optical flow and mask, the image stream processing method provided by the embodiment of the present application can solve the problem that such methods lack original physical information as input, which lowers the upper limit of prediction accuracy and in turn affects image quality.
In addition, it should be further noted that the image stream processing scheme provided by the embodiment of the present application is applicable both to situations where, due to bandwidth limitations, the frame rate and resolution of image stream acquisition cannot be satisfied simultaneously, and to situations where, due to hardware limitations, a high-resolution high-frame-rate image stream cannot be acquired directly. Specifically, by constructing an imaging system with two low-bandwidth camera paths and combining it with the image stream processing technology provided by the embodiment of the present application, acquisition of a high-frame-rate, high-resolution image stream can be realized.
Furthermore, it will be appreciated that the electronic device, in order to achieve the above-described functions, comprises corresponding hardware and/or software modules that perform the respective functions. The present application can be implemented in hardware or a combination of hardware and computer software, in conjunction with the example algorithm steps described in connection with the embodiments disclosed herein. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In addition, it should be noted that, in an actual application scenario, the image stream processing method provided in each of the foregoing embodiments implemented by the electronic device may also be executed by a chip system included in the electronic device, where the chip system may include a processor. The chip system may be coupled to a memory such that the chip system, when running, invokes a computer program stored in the memory, implementing the steps performed by the electronic device described above. The processor in the chip system can be an application processor or a non-application processor.
In addition, an embodiment of the present application further provides a computer readable storage medium, where computer instructions are stored, which when executed on an electronic device, cause the electronic device to execute the related method steps to implement the image stream processing method in the foregoing embodiment.
In addition, the embodiment of the application also provides a computer program product, which when being run on the electronic device, causes the electronic device to execute the related steps so as to realize the image stream processing method in the embodiment.
In addition, embodiments of the present application also provide a chip (which may also be a component or module) that may include one or more processing circuits and one or more transceiver pins; wherein the transceiver pin and the processing circuit communicate with each other through an internal connection path, and the processing circuit executes the related method steps to implement the image stream processing method in the above embodiment, so as to control the receiving pin to receive signals, and control the transmitting pin to transmit signals.
In addition, as can be seen from the above description, the electronic device, the computer-readable storage medium, the computer program product, or the chip provided by the embodiments of the present application are used to perform the corresponding methods provided above, and therefore, the advantages achieved by the embodiments of the present application can refer to the advantages in the corresponding methods provided above, and are not repeated herein.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (12)

1. An image stream processing method, applied to an electronic device, the electronic device including a first camera and a second camera, the method comprising:
acquiring a first image stream acquired by the first camera and a second image stream acquired by the second camera, wherein the first image stream is a high-resolution low-frame-rate image stream, and the second image stream is a low-resolution high-frame-rate image stream;
a third image stream is calculated based on the first image stream and the second image stream, the third image stream being a high resolution high frame rate image stream.
2. The method of claim 1, wherein the computing a third image stream based on the first image stream and the second image stream, the third image stream being a high resolution high frame rate image stream, comprises:
Calculating a first light flow map and a first visual map of an intermediate frame in the second image stream based on the second image stream;
calculating an image of an intermediate frame in the first image stream based on the first image stream, the first light flow map and the first visualization map;
and encoding the image and the image in the first image stream to obtain the third image stream.
3. The method of claim 2, wherein the first optical flow map comprises a first forward optical flow map and a first backward optical flow map, the first visual map comprising a first forward visual map and a first backward visual map;
the computing a first light flow map and a first visualization map of an intermediate frame in the second image stream based on the second image stream, comprising:
and calculating the first forward optical flow map, the first forward visual map, the first backward optical flow map and the first backward visual map according to a previous frame of the intermediate frame, an intermediate frame and a subsequent frame of the intermediate frame in the second image stream.
4. The method of claim 3, wherein the first forward optical flow map and the first forward visual map, and the first backward optical flow map and the first backward visual map are calculated based on a pre-trained convolutional neural network;
Calculating the first forward optical flow map, the first forward visual map, the first backward optical flow map, and the first backward visual map from a previous frame of the intermediate frame, an intermediate frame, and a subsequent frame of the intermediate frame in the second image stream, comprising:
and taking the previous frame, the intermediate frame and the next frame as input parameters, inputting the convolutional neural network, taking a first optical flow diagram output by the convolutional neural network as the first forward optical flow diagram, taking the first visual diagram output by the convolutional neural network as the first forward visual diagram, taking a second optical flow diagram output by the convolutional neural network as the first backward optical flow diagram, and taking the second visual diagram output by the convolutional neural network as the first backward visual diagram.
5. The method of claim 2, wherein the computing an image of an intermediate frame in the first image stream based on the first image stream, the first light flow map, and the first visualization map comprises:
upsampling the first light flow graph and the first visual graph to obtain a second light flow graph and a second visual graph;
An image of an intermediate frame in the first image stream is calculated based on the first image stream, the second light flow map and the second visualization map.
6. The method of claim 5, wherein the first optical flow map comprises a first forward optical flow map and a first backward optical flow map, the first visual map comprising a first forward visual map and a first backward visual map; the second optical flow map comprises a second forward optical flow map and a second backward optical flow map, and the second visual map comprises a second forward visual map and a second backward visual map;
the up-sampling the first light flow graph and the first visual graph to obtain a second light flow graph and a second visual graph, including:
upsampling the first forward optical flow graph to a resolution corresponding to the first image flow to obtain the second forward optical flow graph;
upsampling the first forward visual map to a resolution corresponding to the first image stream to obtain the second forward visual map;
upsampling the first backward light flow graph to a resolution corresponding to the first image flow to obtain the second backward light flow graph;
and upsampling the first backward visual image to the resolution corresponding to the first image stream to obtain the second backward visual image.
7. The method of claim 6, wherein the first image stream and the second image stream remain clock-synchronized;
wherein, when the first image stream includes a first high resolution image frame and a second high resolution image frame and the second image stream includes a first low resolution image frame, a second low resolution image frame and a third low resolution image frame, the first low resolution image frame is time aligned with the first high resolution image frame and the third low resolution image frame is time aligned with the second high resolution image frame;
wherein the second low resolution image frame is the intermediate frame in the second image stream, the first low resolution image frame is a frame preceding the second low resolution image frame, and the third low resolution image frame is a frame following the second low resolution image frame.
8. The method of claim 7, wherein the computing an image of an intermediate frame in the first image stream based on the first image stream, the second light flow map, and the second visualization map comprises:
an image of an intermediate frame in the first image stream is calculated based on the first high resolution image frame, the second forward light flow map, the second forward visualization map, the second backward light flow map, the second backward visualization map in the first image stream.
9. The method of claim 8, wherein the image of the intermediate frame in the first image stream is calculated based on the first high resolution image frame, the second forward light flow map, the second forward visualization map, the second backward light flow map, the second backward visualization map in the first image stream according to the following formula:
Ht = (V_fwd_up ⊙ warp(H0, F_fwd_up) + V_bwd_up ⊙ warp(H1, F_bwd_up)) / (V_fwd_up + V_bwd_up)

wherein H0 is the first high resolution image frame, H1 is the second high resolution image frame, F_fwd_up is the second forward light flow map, V_fwd_up is the second forward visualization map, F_bwd_up is the second backward light flow map, V_bwd_up is the second backward visualization map, warp(·, ·) denotes warping an image frame by an optical flow map, ⊙ denotes element-wise multiplication, and Ht is the image of the intermediate frame in the first image stream.
10. The method of claim 8, wherein encoding the image and the image in the first image stream to obtain the third image stream comprises:
the saidInsert said->And said->And obtaining the third image stream.
11. An electronic device, the electronic device comprising: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the image stream processing method according to any one of claims 1 to 10.
12. A computer readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the image stream processing method of any one of claims 1 to 10.
CN202311218669.3A 2023-09-21 2023-09-21 Image stream processing method, device and storage medium Active CN117082295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311218669.3A CN117082295B (en) 2023-09-21 2023-09-21 Image stream processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN117082295A true CN117082295A (en) 2023-11-17
CN117082295B CN117082295B (en) 2024-03-08

Family

ID=88706130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311218669.3A Active CN117082295B (en) 2023-09-21 2023-09-21 Image stream processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117082295B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138889A1 (en) * 2017-11-06 2019-05-09 Nvidia Corporation Multi-frame video interpolation using optical flow
CN109922231A (en) * 2019-02-01 2019-06-21 重庆爱奇艺智能科技有限公司 A kind of method and apparatus for generating the interleave image of video
CN113099146A (en) * 2019-12-19 2021-07-09 华为技术有限公司 Video generation method and device and related equipment
CN114071223A (en) * 2020-07-30 2022-02-18 武汉Tcl集团工业研究院有限公司 Optical flow-based video interpolation frame generation method, storage medium and terminal equipment
CN114066730A (en) * 2021-11-04 2022-02-18 西北工业大学 Video frame interpolation method based on unsupervised dual learning
CN115695905A (en) * 2021-07-26 2023-02-03 华为技术有限公司 Video stream recovery method, electronic device and computer-readable storage medium
CN115706810A (en) * 2021-08-16 2023-02-17 北京字跳网络技术有限公司 Video frame adjusting method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUAIZU JIANG et al.: "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9000-9008 *

Also Published As

Publication number Publication date
CN117082295B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN115473957B (en) Image processing method and electronic equipment
CN113556598A (en) Multi-window screen projection method and electronic equipment
CN113099146B (en) Video generation method and device and related equipment
CN113329176A (en) Image processing method and related device applied to camera of intelligent terminal
KR20230039723A (en) Projection data processing method and apparatus
CN116055857B (en) Photographing method and electronic equipment
CN115359105B (en) Depth-of-field extended image generation method, device and storage medium
WO2022160985A1 (en) Distributed photographing method, electronic device, and medium
CN115514883A (en) Cross-device collaborative shooting method, related device and system
CN114866659A (en) Shooting method and electronic equipment
CN116347217B (en) Image processing method, device and storage medium
CN116048323B (en) Image processing method and electronic equipment
CN117082295B (en) Image stream processing method, device and storage medium
CN116723383B (en) Shooting method and related equipment
CN117873367A (en) Split screen display method and related device
CN114793283A (en) Image encoding method, image decoding method, terminal device, and readable storage medium
CN113382162B (en) Video shooting method and electronic equipment
CN117560574B (en) Shooting method, electronic equipment and readable storage medium
WO2022206600A1 (en) Screen projection method and system, and related apparatus
CN115460343B (en) Image processing method, device and storage medium
CN116708931B (en) Image processing method and electronic equipment
CN117278864B (en) Image capturing method, electronic device, and storage medium
WO2024002164A1 (en) Display method and related apparatus
CN115643407A (en) Video processing method and related equipment
CN117956264A (en) Shooting method, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant