CN112767295A - Image processing method, image processing apparatus, storage medium, and electronic device - Google Patents

Image processing method, image processing apparatus, storage medium, and electronic device

Info

Publication number
CN112767295A
Authority
CN
China
Prior art keywords
image
foreground
images
sub
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110049901.XA
Other languages
Chinese (zh)
Inventor
邹涵江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110049901.XA priority Critical patent/CN112767295A/en
Publication of CN112767295A publication Critical patent/CN112767295A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Devices (AREA)

Abstract

The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The image processing method includes the following steps: acquiring consecutive multi-frame images; extracting foreground region images of at least two frames of images; fusing foreground sub-regions according to the maximum pixel value of each foreground sub-region across the foreground region images to form a foreground fusion image, wherein the foreground sub-regions are sub-regions of the foreground region images; and generating a target image based on the foreground fusion image and a background region image of at least one frame of image. The solution simulates a long-exposure effect through multi-frame image fusion, is suitable for electronic devices such as smartphones, preserves a clear and continuous motion trail in the finally output target image, and reduces the influence of noise.

Description

Image processing method, image processing apparatus, storage medium, and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Long exposure is a photographing technique in which the shutter is kept open for an extended time to obtain a special effect. Generally, a long-exposure image can capture the motion trail of a moving object; for example, long-exposure shooting of a moving light source in a night scene (such as car lights or a meteor) yields a continuous light-trail effect (i.e., a streamer image), and long-exposure shooting of flowing water (such as a waterfall or a river) yields an image with a sense of flowing water.
Current long-exposure shooting is generally suited to single-lens reflex cameras. On electronic devices such as smartphones the camera lens is small, so when long-exposure shooting is used, an insufficient exposure time may leave ghosting of the moving subject (such as a ghost of a car body), while an excessively long exposure time may cause the image to be overexposed; in addition, the long-exposure process is easily affected by shaking of the device body, resulting in image blur.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby achieving an image capturing effect of simulating a long exposure at least to a certain extent.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring continuous multi-frame images; extracting foreground region images of at least two frames of images; fusing the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fused image, wherein the foreground sub-regions are sub-regions of the foreground region image; and generating a target image based on the foreground fusion image and the background area image of at least one frame of image.
According to a second aspect of the present disclosure, there is provided an image processing method comprising: determining a reference frame image in continuous multi-frame images, and determining at least one frame image except the reference frame image as an image to be fused; determining a foreground region of the image to be fused; extracting at least one subregion from the image to be fused; and respectively fusing the sub-regions in the image to be fused to the reference frame image, wherein when the sub-regions are located in the foreground region, the larger pixel value of the sub-regions in the image to be fused and the reference frame image is used as the pixel value of the sub-regions after fusion.
According to a third aspect of the present disclosure, there is provided an image processing apparatus comprising: an image acquisition module configured to acquire a continuous multi-frame image; an image segmentation module configured to extract foreground region images of at least two frames of images; the foreground fusion module is configured to fuse the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fusion image, wherein the foreground sub-regions are sub-regions of the foreground region image; and the target fusion module is configured to generate a target image based on the foreground fusion image and a background area image of at least one frame of image.
According to a fourth aspect of the present disclosure, there is provided an image processing apparatus comprising: the image fusion device comprises an image determining module, a fusion module and a fusion module, wherein the image determining module is configured to determine a reference frame image in continuous multi-frame images and determine at least one frame image except the reference frame image as an image to be fused; the image segmentation module is configured to determine a foreground region of the image to be fused from the foreground region; a subregion extraction module configured to extract at least one subregion from the image to be fused; and the sub-region fusion module is configured to fuse the sub-regions in the image to be fused to the reference frame image respectively, wherein when the sub-regions are located in the foreground region, a larger pixel value of the sub-regions in the image to be fused and the reference frame image is used as a pixel value of the sub-regions after fusion.
According to a fifth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first or second aspect described above and possible implementations thereof.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing executable instructions of the processor. Wherein the processor is configured to execute the image processing method of the first or second aspect and possible implementations thereof via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
A technical solution is provided for simulating a long-exposure effect through multi-frame image fusion. On the one hand, the solution is applicable to electronic devices such as smartphones: in practical application, multiple frames can simply be shot with ordinary exposure. Compared with long-exposure shooting, this avoids problems such as ghosting and overexposure caused by an improper exposure time, and is less affected by shaking of the device body, thereby alleviating image blur. On the other hand, the solution adopts maximum-pixel-value fusion for the foreground region images, which is equivalent to fusing the actual images of the moving light source or moving object; a clear and continuous motion trail can be retained while the influence of noise is reduced, thereby improving the quality of the finally output target image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Fig. 1 shows a schematic configuration diagram of an electronic apparatus in the present exemplary embodiment;
FIG. 2 illustrates a flow chart of an image processing method in the present exemplary embodiment;
FIG. 3 is a flowchart showing a flow of determining an exposure condition in the present exemplary embodiment;
FIG. 4 is a flowchart illustrating a process for extracting foreground region images in the exemplary embodiment;
Fig. 5 is a diagram illustrating extraction of a foreground region image in the present exemplary embodiment;
FIG. 6 is a diagram illustrating one manner of determining sub-region pixel values in the present exemplary embodiment;
FIG. 7 illustrates a flowchart of the steps of an image registration process in the present exemplary embodiment;
FIG. 8 is a flowchart illustrating one process for generating a target image in the present exemplary embodiment;
fig. 9 is a flowchart illustrating an image processing method according to the exemplary embodiment;
fig. 10 is a flowchart illustrating another image processing method according to the present exemplary embodiment;
fig. 11 is a flowchart showing a flow of another image processing method in the present exemplary embodiment;
fig. 12 is a flowchart showing another image processing method in the present exemplary embodiment;
fig. 13 shows a continuous multi-frame image in the present exemplary embodiment;
FIG. 14 shows a target image resulting from fusion of the successive multi-frame images of FIG. 13;
Fig. 15 is a schematic diagram showing the configuration of an image processing apparatus in the present exemplary embodiment;
fig. 16 shows a schematic configuration diagram of another image processing apparatus in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Exemplary embodiments of the present disclosure first provide an image processing method, application scenarios of which include but are not limited to: in night scene shooting, continuous multi-frame shooting is performed on a moving light source, and a streamer image with a continuous light-trail effect is obtained through fusion; continuous multi-frame shooting is performed on flowing water, and a waterfall or river image with a sense of flowing water is obtained through fusion.
Exemplary embodiments of the present disclosure also provide an electronic device for performing the image processing method of the present exemplary embodiment. The electronic device may be a terminal device with a shooting function, such as a smart phone, a tablet computer, a wearable device, an unmanned aerial vehicle, or a computing device with an image processing function, for example, the terminal device sends a plurality of continuous frames of shot images to a server, the server processes the images, and returns the obtained target images to the terminal device. Generally, an electronic device includes a processor and a memory. The memory is used for storing executable instructions of the processor and can also be used for storing application data, such as image data, video data and the like; the processor is configured to perform the image processing method of the present exemplary embodiment via execution of executable instructions.
The structure of the electronic device is exemplarily described below by taking the mobile terminal 100 in fig. 1 as an example. It will be appreciated by those skilled in the art that the configuration of figure 1 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes.
As shown in fig. 1, the mobile terminal 100 may specifically include: a processor 110, an internal memory 121, an external memory interface 122, a USB (Universal Serial Bus) interface 130, a charging management Module 140, a power management Module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication Module 150, a wireless communication Module 160, an audio Module 170, a speaker 171, a receiver 172, a microphone 173, an earphone interface 174, a sensor Module 180, a display 190, a camera Module 191, an indicator 192, a motor 193, a key 194, and a SIM (Subscriber identity Module) card interface 195.
Processor 110 may include one or more processing units, such as: the Processor 110 may include an AP (Application Processor), a modem Processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband Processor, and/or an NPU (Neural-Network Processing Unit), etc.
The encoder may encode (i.e., compress) image or video data, for example, encode a plurality of consecutive captured frames to form corresponding code stream data, so as to reduce the bandwidth occupied by data transmission; the decoder may decode (i.e., decompress) the code stream data of an image or video to restore the image or video data, for example, decode the code stream data of the above-mentioned consecutive multi-frame images to obtain complete image data, which facilitates executing the image processing method of the present exemplary embodiment. The mobile terminal 100 may support one or more encoders and decoders. In this way, the mobile terminal 100 may process images or video in a variety of encoding formats, such as: image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG-1 (Moving Picture Experts Group), MPEG-2, H.263, H.264, and HEVC (High Efficiency Video Coding).
In one embodiment, processor 110 may include one or more interfaces through which connections are made to other components of mobile terminal 100.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include volatile memory and nonvolatile memory. The processor 110 executes various functional applications of the mobile terminal 100 and data processing by executing instructions stored in the internal memory 121.
The external memory interface 122 may be used to connect an external memory, such as a Micro SD card, for expanding the storage capability of the mobile terminal 100. The external memory communicates with the processor 110 through the external memory interface 122 to implement data storage functions, such as storing audio, video, and other files.
The USB interface 130 is an interface conforming to the USB standard specification, and may be used to connect a charger to charge the mobile terminal 100, or connect an earphone or other electronic devices.
The charging management module 140 is configured to receive charging input from a charger. While the charging management module 140 charges the battery 142, the power management module 141 may also supply power to the device; the power management module 141 may also monitor the status of the battery.
The wireless communication function of the mobile terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the mobile terminal 100. The wireless communication module 160 may provide wireless communication solutions applied to the mobile terminal 100, including WLAN (Wireless Local Area Network, e.g., Wi-Fi (Wireless Fidelity)) networks, BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), IR (Infrared), and the like.
The mobile terminal 100 may implement a display function through the GPU, the display screen 190, the AP, and the like, and display a user interface. For example, when the user turns on a photographing function, the mobile terminal 100 may display a photographing interface, a preview image, and the like in the display screen 190.
The mobile terminal 100 may implement a photographing function through the ISP, the camera module 191, the encoder, the decoder, the GPU, the display screen 190, the AP, and the like.
The mobile terminal 100 may implement an audio function through the audio module 170, the speaker 171, the receiver 172, the microphone 173, the earphone interface 174, the AP, and the like.
The sensor module 180 may include a depth sensor 1801, a pressure sensor 1802, a gyroscope sensor 1803, an air pressure sensor 1804, etc. to implement corresponding sensing detection functions.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc. The motor 193 may generate a vibration cue, may also be used for touch vibration feedback, and the like. The keys 194 include a power-on key, a volume key, and the like.
The mobile terminal 100 may support one or more SIM card interfaces 195 for connecting SIM cards to implement functions such as telephony and mobile communications.
The image processing method of the present exemplary embodiment is described below with reference to fig. 2, where fig. 2 shows an exemplary flow of the image processing method, and may include:
step S210, acquiring continuous multi-frame images;
step S220, extracting foreground region images of at least two frames of images;
step S230, fusing the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fused image, wherein the foreground sub-regions are sub-regions of the foreground region image;
step S240, generating a target image based on the foreground fusion image and the background region image of the at least one frame of image.
Based on the method of fig. 2, a technical solution is provided for simulating a long-exposure effect through multi-frame image fusion. On the one hand, the solution is applicable to electronic devices such as smartphones: in practical application, multiple frames can simply be shot with ordinary exposure. Compared with long-exposure shooting, this avoids problems such as ghosting and overexposure caused by an improper exposure time, and is less affected by shaking of the device body, thereby alleviating image blur. On the other hand, the solution adopts maximum-pixel-value fusion for the foreground region images, which is equivalent to fusing the actual images of the moving light source or moving object; a clear and continuous motion trail can be retained while the influence of noise is reduced, thereby improving the quality of the finally output target image.
Each step in fig. 2 is explained in detail below.
In step S210, a continuous multi-frame image is acquired.
The consecutive multi-frame images may be obtained by aiming the lens at a certain area or object and shooting continuously. The areas captured by the consecutive frames are substantially the same, although there may be a slight shift due to shaking of the device body during shooting.
In one embodiment, step S210 may include the steps of:
determining exposure conditions according to the ambient illumination parameters;
a plurality of frame images continuously photographed under the above exposure condition are acquired.
The ambient illumination parameter is a parameter representing the intensity of ambient illumination. It may be obtained from an ambient light sensor configured in the electronic device, or determined by detecting the brightness of a preview image; for example, a preview image is collected with a predetermined exposure time (Exp_time) and sensitivity (ISO) during shooting, and the brightness level of the preview image is detected to estimate the ambient illumination level and obtain the corresponding ambient illumination parameter.
Generally, the larger the value of the ambient light parameter, the stronger the ambient light, and in order to obtain a proper exposure, lower exposure conditions, such as a shorter exposure time, a lower sensitivity, etc., may be set, so as to prevent adverse conditions such as overexposure and underexposure. For example, the ambient lighting parameter may be divided into three luminance intervals, which represent darker, medium and brighter environments, respectively; correspondingly setting a reference exposure time length for each brightness interval, such as the reference exposure time length of a darker environment being 1.5s, a medium brightness environment being 1s, and a brighter environment being 0.5 s; after the current ambient illumination parameters are obtained, the corresponding reference exposure time length is determined according to the brightness interval where the current ambient illumination parameters are located.
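As an illustration of the interval-based mapping described above, the following minimal Python sketch maps an ambient illumination parameter to a reference exposure duration; the interval boundaries and duration values are assumptions for demonstration only, not values prescribed by the present disclosure.

```python
# Illustrative sketch only: the interval boundaries and reference exposure
# durations below are assumed values for demonstration.
def reference_exposure_duration(ambient_lux: float) -> float:
    """Map an ambient illumination parameter to a reference exposure duration in seconds."""
    if ambient_lux < 50:        # darker environment (assumed boundary)
        return 1.5
    elif ambient_lux < 500:     # medium-brightness environment (assumed boundary)
        return 1.0
    return 0.5                  # brighter environment
```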
And after the exposure condition adaptive to the environmental illumination parameter is determined, shooting by adopting the exposure condition to obtain continuous multi-frame images.
In one embodiment, referring to fig. 3, the above-mentioned determining the exposure condition according to the ambient light parameter may include the following steps:
step S310, determining a corresponding first exposure duration according to the ambient illumination parameter;
step S320, obtaining a second exposure duration corresponding to the minimum sensitivity;
in step S330, an exposure condition is determined according to a maximum value of the first exposure duration and the second exposure duration and the minimum sensitivity.
The first exposure duration may be an exposure duration adapted to the ambient lighting parameter, such as the above-mentioned reference exposure duration. The system may be configured with a corresponding relationship between the ambient lighting parameter and the exposure duration in advance, where the ambient lighting parameter and the exposure duration are generally in positive correlation, such as linear positive correlation, interval positive correlation, and the like, so that the current ambient lighting parameter may be mapped to the corresponding first exposure duration.
Sensitivity is essentially the photosensitive speed of the image sensor, and the minimum sensitivity, i.e., the minimum photosensitive speed, is usually determined by the performance of the camera itself; for example, the minimum sensitivity of a smartphone camera may be 50, 100, etc. Sensitivity and exposure duration are two related parameters: for example, with a fixed exposure value (EV, e.g., EV set to 0), the corresponding exposure duration can be calculated from the sensitivity, and the higher the sensitivity, the shorter the exposure duration. The present exemplary embodiment uses the minimum sensitivity as the sensitivity in actual shooting, so the corresponding exposure duration is the longest; this is the second exposure duration.
The system may pre-configure a correspondence between sensitivity and exposure duration, which is generally negative correlation, and may configure different correspondence tables for different ambient lighting parameters, for example, as the ambient lighting parameter increases, the corresponding exposure duration for the same sensitivity decreases.
The present exemplary embodiment determines to use the maximum value of the first exposure duration and the second exposure duration as the exposure duration in the actual shooting, and if the first exposure duration is longer than the second exposure duration, the first exposure duration is used, and otherwise, the second exposure duration is used.
In actual shooting, a higher sensitivity requires a shorter exposure time, otherwise overexposure easily occurs; in addition, a certain interval must be kept between the exposures of two adjacent frames, which easily causes the position of an object to jump between the two frames and is unfavorable for subsequent image fusion. The exposure conditions determined in the manner of FIG. 3 ensure that a single frame has a longer exposure time during shooting, so that smear and light-trail effects can form within a single frame and the continuity between different frames is stronger, which facilitates the subsequent fusion of a high-quality target image.
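The selection logic of FIG. 3 can be sketched as follows. The helper reference_exposure_duration is the mapping sketched earlier, and exposure_duration_for_iso is a hypothetical lookup of the exposure duration corresponding to a given sensitivity under a fixed exposure value; both helper names and the fixed-EV relation are assumptions used only for illustration.

```python
# Sketch of the exposure-condition selection in FIG. 3 (steps S310-S330).
def determine_exposure_condition(ambient_lux: float, min_iso: int = 100):
    # Step S310: first exposure duration, adapted to the ambient illumination parameter.
    t1 = reference_exposure_duration(ambient_lux)
    # Step S320: second exposure duration, i.e. the duration corresponding to the
    # minimum sensitivity under a fixed exposure value (hypothetical lookup).
    t2 = exposure_duration_for_iso(min_iso, ambient_lux)
    # Step S330: shoot with the larger of the two durations and the minimum sensitivity.
    return max(t1, t2), min_iso
```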
With continued reference to fig. 2, in step S220, foreground region images of at least two frames of images are extracted.
For convenience of explanation, and to distinguish from images generated during subsequent processing, the continuous frame images acquired in step S210 are recorded as original images. Step S220 is to substantially segment the foreground region and the background region of the original image. The foreground region may be a moving part in the original image and correspondingly the background region may be a stationary part in the original image.
The at least two frames of images may be any two or more frames among the consecutive multi-frame original images. In one embodiment, screening may be performed according to image brightness, image sharpness, and the like to select at least two original frames of higher quality; for example, the gradient of each original frame is calculated, frames whose gradient is below a certain threshold are removed, and foreground region images are extracted from the remaining original frames, thereby improving the quality of subsequent image fusion. In another implementation, foreground region images may be extracted from every frame of the consecutive multi-frame original images, i.e., every captured frame participates in the subsequent fusion of foreground region images, ensuring that sufficient information is available for the fusion.
For each frame of original image, the foreground and background can be segmented by detecting the moving object in the original image, so as to extract the foreground area image. For example, semantic segmentation may be performed on an original image, a semantic classification result corresponding to each portion is determined, whether each semantic is a moving object is determined, and then which portions in the image are moving objects are determined.
In one embodiment, as shown with reference to fig. 4, step S220 may include the steps of:
step S410, subtracting at least two frames of images to be segmented in the continuous multi-frame images from the reference frame images in the continuous multi-frame images respectively to obtain a difference image corresponding to each frame of image to be segmented;
step S420, according to the comparison result of each pixel value in the difference image and a preset threshold value, a foreground mask image corresponding to each frame of image to be segmented is generated;
and step S430, extracting a foreground area image from the image to be segmented by using the foreground mask image.
The image to be segmented is an original image subsequently used for foreground fusion, namely the at least two frames of images. The reference frame image provides reference information for detecting a moving object, and may be any one of a plurality of consecutive original frames, for example, the first frame. After the image to be segmented is subtracted from the reference frame image, the pixel difference value of the part where the moving object is located is larger, the pixel difference value of the background part is smaller, and the pixel value of each pixel point in the obtained difference image is the pixel difference value of the two images; note that, in the case where a negative value exists in the pixel difference value, the absolute value of the pixel difference value may be adopted as the pixel value in the difference image in the present exemplary embodiment. The preset threshold is a pixel difference standard for dividing a moving object and a static object, and can be determined according to experience or an actual scene. In the difference image, if the pixel value of a certain pixel point is greater than a preset threshold value, the pixel value belongs to a foreground region, otherwise, the pixel value belongs to a background region, so that the foreground and the background are segmented, and a foreground mask image is obtained. Step S420 corresponds to the difference image being actually subjected to the binarization process. And finally, multiplying the foreground mask image and the image to be segmented to obtain a foreground area image, wherein the complete information of a foreground part is reserved, and the pixel values of a background part are all 0.
The flow of FIG. 4 is illustrated below by FIG. 5: acquire k consecutive original images I_0, I_1, I_2, …, I_{k-1}; select the first frame I_0 as the reference frame image, and take each frame as an image to be segmented; subtract I_0 from each of I_1, I_2, …, I_{k-1} to obtain difference images D_1, D_2, …, D_{k-1}; compare the pixel value of each pixel point in D_1, D_2, …, D_{k-1} with a preset threshold, setting the mask value of the pixel point to 1 if the pixel value is greater than the threshold and to 0 otherwise, which is equivalent to binarizing D_1, D_2, …, D_{k-1} and yields the foreground mask images M_1, M_2, …, M_{k-1}; finally, multiply each foreground mask image with the corresponding image to be segmented to obtain the foreground region images F_1, F_2, …, F_{k-1}.
When i represents any frame from 1 to k-1, the above process can be expressed by the following formula:
M_i(x, y) = 1, if |I_i(x, y) - I_0(x, y)| > T; M_i(x, y) = 0, otherwise    (1)
F_i(x, y) = M_i(x, y) · I_i(x, y)    (2)
Here (x, y) denotes any pixel point in the i-th frame image, and T is the preset threshold. Formula (1) indicates that after the image to be segmented is subtracted from the reference frame image, the pixel difference is compared with the preset threshold T to determine whether the mask value of each pixel point is 1 or 0, thereby obtaining the foreground mask image M_i. Formula (2) indicates that the foreground mask image is multiplied by the image to be segmented to obtain the foreground region image F_i.
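A minimal NumPy sketch of formulas (1) and (2) is given below, assuming 8-bit input frames; collapsing the per-channel difference with a maximum before thresholding is an implementation assumption not specified above.

```python
import numpy as np

def foreground_mask_and_region(img_i, img_ref, T):
    """Formulas (1) and (2): threshold the absolute frame difference to obtain M_i,
    then multiply the mask with the image to keep only the foreground region."""
    # D_i(x, y) = |I_i(x, y) - I_0(x, y)|, computed in a wider dtype to avoid overflow
    diff = np.abs(img_i.astype(np.int16) - img_ref.astype(np.int16))
    if diff.ndim == 3:                       # collapse channels into a single-channel difference
        diff = diff.max(axis=2)
    mask = (diff > T).astype(img_i.dtype)    # M_i(x, y) in {0, 1}
    foreground = img_i * (mask[..., None] if img_i.ndim == 3 else mask)
    return mask, foreground
```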
For different frames of images to be segmented, after subtracting the reference frame image to obtain a difference image, a uniform preset threshold value can be adopted to compare pixel values in the difference image so as to determine a foreground mask image, and different preset threshold values can also be adopted. In one embodiment, after obtaining the difference image, the distribution of pixel values in the difference image may be counted to determine the preset threshold. For example, a histogram doublet method, a maximum inter-class variance method, a two-dimensional entropy threshold segmentation method, a local threshold method and the like are adopted to determine a preset threshold, and then a corresponding foreground mask image is generated according to the preset threshold. Therefore, different preset threshold values are adopted for the images to be segmented of different frames, the flexibility of image segmentation is improved, and more accurate foreground region segmentation is facilitated.
In consideration of the fact that the foreground mask image and the foreground region image obtained through fig. 4 may have edge burrs and the like, in an embodiment, morphological processing, such as image etching, dilation and the like, may be performed on the foreground mask image, so that the edge of the foreground mask image is smooth and the shape is more complete, thereby further improving the quality of the extracted foreground region image.
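A possible OpenCV sketch of such morphological processing is shown below, assuming mask is the binary foreground mask from the previous sketch; the kernel size and the open/close order are illustrative assumptions.

```python
import cv2
import numpy as np

# Optional clean-up of a binary foreground mask (mask from the previous sketch).
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))                # assumed kernel size
opened = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_OPEN, kernel)     # erosion then dilation removes edge burrs
smoothed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)                 # dilation then erosion fills small holes
```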
It should be added that, in the flow of FIG. 4, if the images to be segmented include the reference frame image, the foreground mask image corresponding to any other frame to be segmented may be used to extract the foreground region image from the reference frame image. For example, FIG. 5 illustrates multiplying the foreground mask image M_1 by the reference frame image I_0 to obtain the foreground region image of the reference frame, which may be denoted F_0.
With reference to fig. 2, in step S230, the foreground sub-regions are fused according to the maximum pixel value of each foreground sub-region in each foreground region image, so as to form a foreground fused image, where the foreground sub-regions are sub-regions of the foreground region image.
The foreground sub-region may be a sub-region obtained by dividing a foreground region image in any form. For example, every 2 × 2 or 3 × 3 pixel points in the foreground region image are used as a foreground sub-region, or every pixel point in the foreground region image is used as a foreground sub-region, or the foreground sub-regions are divided according to the color distribution of the foreground region image, and so on. It should be noted that, for different foreground region images, the manner of dividing the foreground sub-regions is the same, so that each foreground sub-region is at the same position in different foreground region images.
In the exemplary embodiment, the foreground region image is divided into different foreground sub-regions, for each foreground sub-region, the maximum pixel value is selected from each foreground region image, and then each foreground sub-region is spliced into the foreground fusion image. Step S230 actually performs maximum pixel value fusion on the foreground region images extracted in step S220, that is, fusing the foreground region images into a foreground fusion image, and retaining the maximum pixel values of the foreground sub-regions during fusion.
In an embodiment, each pixel point in the foreground region image may be used as a foreground sub-region, the pixel value of the pixel point is the pixel value of the foreground sub-region, and the maximum pixel value of the pixel point is selected from each foreground region image to be used as the pixel value of the pixel point in the foreground fusion image. Therefore, the foreground fusion at the pixel level can be realized, and the improvement of the fineness of the foreground fusion is facilitated.
In one embodiment, step S230 may include the steps of:
for each foreground subarea, respectively determining a pixel statistic value of the foreground subarea in each foreground area image;
determining a foreground area image corresponding to the maximum pixel statistic value of the foreground sub-area as a target foreground area image corresponding to the foreground sub-area;
and taking the pixel value of the foreground subarea in the target foreground area image as the pixel value of the foreground subarea in the foreground fusion image.
The above flow is illustrated with reference to FIG. 6: a foreground sub-region A is extracted from the foreground region images F_0, F_1, F_2, …, F_{k-1}, and the image blocks corresponding to A in the respective foreground region images are denoted A_F_0, A_F_1, A_F_2, …, A_F_{k-1}. These image blocks have the same position, shape, and size in their respective images; for example, the position, shape, and size of A_F_0 in F_0 are the same as those of A_F_1 in F_1. Pixel statistics are performed on A_F_0, A_F_1, A_F_2, …, A_F_{k-1} to obtain pixel statistic values, which may be a pixel average, a pixel maximum, a luminance average, and the like, and which the present disclosure does not limit. Note that when a foreground sub-region is a single pixel point, the average of the pixel values over its channels may be counted, or the pixel value may be converted into a luminance value (or gray value) and used as the pixel statistic. The pixel statistic provides a way to measure the pixel values of a foreground sub-region, so that the relative magnitudes of the same foreground sub-region in different foreground region images can be compared. Taking the pixel average as an example, the average pixel value of A_F_0 is computed and denoted P(A_F_0); likewise, P(A_F_1), P(A_F_2), …, P(A_F_{k-1}) are obtained. P(A_F_0), P(A_F_1), P(A_F_2), …, P(A_F_{k-1}) are compared to determine the maximum pixel statistic; assuming it is P(A_F_2), the corresponding foreground region image F_2 is determined as the target foreground region image corresponding to foreground sub-region A, and the pixel values of foreground sub-region A in F_2 are used as the pixel values of foreground sub-region A in the foreground fusion image. The pixel values of the other foreground sub-regions can be determined by the same method, and the foreground fusion image is finally stitched together.
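The sub-region fusion described above might be sketched as follows; the use of square blocks of a fixed size and the mean as the pixel statistic are assumptions chosen for illustration (the disclosure also permits per-pixel sub-regions and other statistics).

```python
import numpy as np

def fuse_foreground(foregrounds, block=8):
    """Step S230 sketch: for each block-sized foreground sub-region, keep the pixels
    from the foreground region image whose sub-region has the largest pixel statistic."""
    fused = np.zeros_like(foregrounds[0])
    h, w = foregrounds[0].shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            candidates = [f[y:y + block, x:x + block] for f in foregrounds]
            stats = [c.mean() for c in candidates]        # pixel statistic P(A_F_i), here the mean
            fused[y:y + block, x:x + block] = candidates[int(np.argmax(stats))]
    return fused
```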
In the foreground fusion image, the maximum pixel value of each foreground sub-region (or each pixel point) of the foreground part across the consecutive multi-frame images is retained, which to some extent corresponds to the actual position of the moving light source or moving object. For example, in FIG. 6, the foreground sub-region A has its largest pixel statistic in the foreground region image F_2, which indicates that in F_2 (or the original image I_2) the foreground sub-region A is the actual position of the moving light source, whereas in other frames the foreground sub-region A may only be the smear of the moving light source. When the foreground region images are fused, selecting the maximum pixel value of each foreground sub-region across the different foreground region images is therefore equivalent to fusing the actual images of the moving light source or moving object, which helps realize the motion-trail effect of a long exposure. Moreover, adopting maximum-pixel-value fusion for the moving part reduces the influence of noise and improves image quality.
In consideration of the possible slight shift of the shooting area between different frames of original images, in order to ensure the accuracy of image fusion, the different frames of original images can be registered. For example, after acquiring the continuous multi-frame images in step S210, the registration between the multi-frame images is performed first, and then the subsequent steps S220 to S240 are performed based on the registered images.
In one embodiment, the registration may be performed after step S220, specifically including:
and registering the at least two frames of images according to the background area images of the at least two frames of images.
The background region image is the part of the original image other than the foreground region image. For example, in FIG. 5, after the foreground mask images M_1, M_2, …, M_{k-1} are obtained, an inverse mask operation is performed on each, i.e., the 0/1 values in the foreground mask image are inverted, to obtain background mask images RM_1, RM_2, …, RM_{k-1}; the background mask images RM_1, RM_2, …, RM_{k-1} are then multiplied by the corresponding original images I_1, I_2, …, I_{k-1} to obtain background region images B_1, B_2, …, B_{k-1}. Registration of the original images is then performed according to the background region images B_1, B_2, …, B_{k-1}. Because the image content of the background regions of the original images of different frames is basically consistent, registration based on the background region images is not interfered with by moving objects in the foreground region, which helps improve registration accuracy.
It should be noted that any background mask image may be used to extract the background region image of the reference frame image; for example, the background mask image RM_1 is multiplied by the reference frame image I_0 to obtain the background region image B_0 of the reference frame image.
The registration is intended to ensure the accuracy of the subsequent foreground region image fusion, so only the images to be segmented among the original images need to be registered; when all original images are selected as images to be segmented, all original images are registered.
In one embodiment, the at least two frame images may include the reference frame image, and during registration the images other than the reference frame image may be registered to the reference frame image. For example, among the original images I_0, I_1, I_2, …, I_{k-1}, the first frame I_0 is selected as the reference frame image, and I_1, I_2, …, I_{k-1} are all registered to I_0, thereby unifying the registration standard; registering all original images with I_0 as the reference actually achieves registration between any two original frames.
In one embodiment, referring to fig. 7, registering the at least two images according to the background region image of the at least two images may be implemented by:
step S710, respectively extracting characteristic points from the background area images of the two frames of images;
step S720, carrying out feature point matching on the background area images of the two frames of images to obtain matching point pairs;
step S730, determining a homography matrix according to the matching point pairs;
step S740, registering one of the two frames of images to the other frame of image through the homography matrix.
Feature points are representative, highly recognizable points or regions in an image, such as corners and boundaries. In the background region image of an original image, gradients at different positions can be detected, and feature points extracted at positions with larger gradients. Generally, after feature points are extracted they need to be described, for example by describing the pixel distribution around the feature point with an array; this is called the description information (or descriptor) of the feature point. The description information of the feature points can be regarded as local description information of the background region image. The present disclosure does not limit the type of feature points or their description information; for example, the Harris corner algorithm may be used to extract and describe the feature points, or algorithms such as FAST (Features from Accelerated Segment Test), BRIEF (Binary Robust Independent Elementary Features), ORB (Oriented FAST and Rotated BRIEF), and SIFT (Scale-Invariant Feature Transform) may be used.
The matching point pair refers to a pair of similar characteristic points in the two background area images, and can be regarded as the projection of the same object point in the real world on the two background area images. Whether the two characteristic points are matched point pairs can be judged by calculating the similarity of the description information of the two characteristic points. In general, the description information of the feature points may be expressed as vectors, the L1 distance or the L2 distance between the description information vectors of the two feature points is calculated, and the top point pairs (e.g., the top 15% of the point pairs, or the top n point pairs, where the n value may be set according to actual needs) are selected as matching point pairs by traversing all the point pairs between the two background region images and sorting the point pairs according to the L1 distance or the L2 distance from small to large.
The homography matrix is a matrix for describing the posture transformation relation between the two background area images. Generally, the homography matrix can be solved by 4 matching point pairs. If the number of the matching point pairs obtained in step S720 is greater than 4, 4 point pairs may be randomly selected, or 4 matching point pairs with the smallest L1 distance or L2 distance may be selected. When the homography matrix is solved, iterative optimization can be performed by using algorithms such as RANSAC (Random Sample Consensus) and the like based on geometric constraint relations among images in the background region, such as epipolar constraint and the like, so as to obtain an optimal solution of the homography matrix, and further improve the accuracy of image registration.
Affine transformation can be carried out on one of the two frames of images through the homography matrix so as to be registered to the other frame of image, for example, the two frames of images comprise a reference frame image, and then a non-reference frame image is registered to the reference frame image, so that the registration of the two frames of images is completed.
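A sketch of the registration flow of FIG. 7 using ORB features (one of the options named above) and RANSAC is given below, assuming 8-bit BGR inputs; the match keep-ratio and the RANSAC reprojection threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def register_to_reference(bg_ref, bg_img, frame):
    """Sketch of FIG. 7: register a frame to the reference using its background region image."""
    g_ref = cv2.cvtColor(bg_ref, cv2.COLOR_BGR2GRAY)
    g_img = cv2.cvtColor(bg_img, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(g_ref, None)           # step S710: feature points of the reference background
    kp2, des2 = orb.detectAndCompute(g_img, None)           # step S710: feature points of the current background
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    matches = matches[:max(4, int(len(matches) * 0.15))]    # step S720: keep the closest point pairs (assumed ratio)
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)    # step S730: homography via RANSAC
    h, w = bg_ref.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))            # step S740: warp the full frame to the reference
```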
Referring to FIG. 5 above, the foreground mask images M_1, M_2, …, M_{k-1} may differ, because the foreground/background segmentation of different frames is not identical; as a result, the ranges corresponding to the background regions of the original images of different frames may also differ.
In one embodiment, in order to make the background region of the reference frame image consistent with the background region of each other original image, the background mask images of the different frames may be used in turn to extract the background region image of the reference frame image, so that the original images of different frames are registered separately. For example, to register I_1, I_2, …, I_{k-1} to the reference frame image I_0, the k-1 frame images are paired with the reference frame image I_0 to obtain image pairs (I_0, I_1), (I_0, I_2), …, (I_0, I_{k-1}). For the image pair (I_0, I_1), the background mask image RM_1 of I_1 is multiplied by I_0 and I_1 respectively to obtain two background region images, denoted B_0_1 and B_1; the homography matrix between B_0_1 and B_1 is then calculated, and I_1 is thereby registered to I_0. For the image pair (I_0, I_2), the background mask image RM_2 of I_2 is multiplied by I_0 and I_2 respectively to obtain two background region images, denoted B_0_2 and B_2; the homography matrix between B_0_2 and B_2 is then calculated, and I_2 is thereby registered to I_0. In this way each original image is registered to the reference frame image I_0. The consistency of the background region can thus be exploited to the greatest extent for registration, improving registration accuracy.
In another embodiment, the background areas of the original images of different frames may be unified. Specifically, before the registration, the following steps may be performed:
when the background regions corresponding to the background region images do not completely overlap, taking the intersection of the background regions corresponding to the background region images to obtain a common background region;
and deleting the part of each background region image that lies outside the common background region.
For example, the intersection of the background mask images RM_1, RM_2, …, RM_{k-1} may be taken: each pixel point in the background mask images is traversed, and if its value is 1 in all background mask images it is determined to be in the common background region, otherwise it is determined to be in the foreground region; this yields a common background mask image, denoted RM_inter. Multiplying RM_inter by the background region images B_0, B_1, B_2, …, B_{k-1} (or by the original images I_0, I_1, I_2, …, I_{k-1}) is then equivalent to deleting the parts outside the common background region (in practice, setting the pixel points outside the common background region to 0), thereby optimizing the background region images. Subsequent registration can be performed based on the optimized background region images, which further improves registration accuracy.
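A sketch of taking the intersection of the background masks is given below, assuming the masks are 0/1 integer arrays of identical shape; the same logic applies to the common foreground mask M_inter described later.

```python
import numpy as np

def common_background_mask(background_masks):
    """Intersection of the background masks: a pixel stays 1 only if it is 1 in every mask."""
    rm_inter = background_masks[0].copy()
    for rm in background_masks[1:]:
        rm_inter &= rm
    return rm_inter

# Deleting the parts outside the common background region (for 3-channel images):
# optimized = [b * rm_inter[..., None] for b in background_region_images]
```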
In an embodiment, the foreground region image may be extracted again from the at least two registered frames of images to perform step S230. Therefore, the extracted foreground area images have pose consistency through registration, and accurate fusion is facilitated.
Because the foreground/background segmentation of different frames is not exactly the same, the area ranges corresponding to the foreground region images of different frames may differ; for example, in FIG. 5 the shapes of the foreground parts in the foreground region images F_0, F_1, F_2, …, F_{k-1} differ slightly. In order to achieve more accurate foreground fusion, in one embodiment, before step S230 the following steps may be performed:
when the foreground regions corresponding to the foreground region images do not completely overlap, taking the intersection of the foreground regions corresponding to the foreground region images to obtain a common foreground region;
and deleting the parts of each foreground region image that lie outside the common foreground region.
For example, the intersection of the foreground mask images M_1, M_2, …, M_{k-1} may be taken: each pixel point in the foreground mask images is traversed, and if its value is 1 in all foreground mask images it is determined to be in the common foreground region, otherwise it is determined to be in the background region; this yields a common foreground mask image, denoted M_inter. Multiplying M_inter by the foreground region images F_0, F_1, F_2, …, F_{k-1} (or by the original images I_0, I_1, I_2, …, I_{k-1}) is then equivalent to deleting the parts outside the common foreground region (in practice, setting the pixel points outside the common foreground region to 0), thereby optimizing the foreground region images. Step S230 can subsequently be performed based on the optimized foreground region images, which further improves fusion accuracy and in particular reduces errors caused by inconsistent segmentation at the foreground/background boundary.
With continued reference to fig. 2, in step S240, a target image is generated based on the foreground fusion image and the background region image of at least one frame of image.
In the foreground fusion image, the pixel value of the background part is 0, and the background part can be spliced with the background area image of any frame of original image to obtain a complete image, namely a target image.
In one embodiment, referring to fig. 8, step S240 may include:
step S810, fusing background area images of at least two frames of images to obtain a background fused image;
step S820, a target image is generated based on the foreground fusion image and the background fusion image.
The at least two frames of images in step S810 are used for background fusion, and may be the same as or different from the at least two frames of images used for foreground fusion in step S220, which is not limited in this disclosure. In one embodiment, all of the original images may be selected for background fusion in step S810. Compared with the original background region image, the background fusion image obtained by fusing the multi-frame background region images has richer detail information, and can reduce the defects of blurring, ghost and the like possibly existing in a single background region image, so that the target image is further obtained by splicing the foreground fusion image and the background fusion image, and the quality of the target image is favorably improved.
The present disclosure is not limited to the specific manner of background fusion. In one embodiment, step S810 may be implemented by:
and averaging the pixel values of each pixel point in the background area in each background area image to obtain a background fusion image.
The average of the pixel values at the same position in each background region image is taken as the pixel value at that position in the background fusion image, thereby realizing pixel-average fusion of the background region images. In this way, pixel-maximum fusion is used for the foreground part and pixel-average fusion is used for the background part, which effectively reduces noise in the target image while preserving a continuous and clear motion track.
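The two fusion rules, together with the splicing of the two fused images, can be sketched as follows. This is a minimal NumPy illustration, not the patent's exact implementation; it assumes registered H×W×3 images in which pixels outside the respective region are already set to 0:

```python
import numpy as np

def fuse_foreground_max(foreground_images):
    """Pixel-maximum fusion of the registered foreground region images."""
    return np.stack(foreground_images).astype(np.float32).max(axis=0)

def fuse_background_mean(background_images):
    """Pixel-average fusion of the registered background region images."""
    return np.stack(background_images).astype(np.float32).mean(axis=0)

def splice(foreground_fused, background_fused):
    """The background pixels of the foreground fusion image are 0 (and vice
    versa), so adding the two images stitches them into the target image."""
    return np.clip(foreground_fused + background_fused, 0, 255).astype(np.uint8)
```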
Fig. 9 shows a flow architecture diagram of an image processing method in the present exemplary embodiment, which divides the entire image processing flow into three parts:
and (5) image registration. Acquiring continuous k frames of original images I0,I1,I2,…,Ik-1K is a positive integer greater than 1; selecting reference frame pictures, e.g. first frame I0As a reference frame image, subtracting the original images of other frames from the reference frame image to obtain a difference image D1,D2,…,Dk-1(ii) a Carrying out binarization processing on the difference image to obtain a foreground mask image M1,M2,…,Mk-1And a background mask image RM1,RM2,…,RMk-1The foreground mask image and the background mask image are in a complementary relationship, namely the foreground mask image is a reverse mask image of the background mask image; for RM1,RM2,…,RMk-1Obtaining the intersection to obtain the public background mask image RMinter(ii) a Masking the common background image RMinterRespectively with the original image I0,I1,I2,…,Ik-1Multiplying to obtain a background area image B before registration0,B1,B2,…,Bk-1(ii) a Respectively mixing B with1,B2,…,Bk-1And B0Matching the characteristic points and calculating to obtain a homography matrix H1,H2,…,Hk-1(ii) a Respectively converting I into I by using homography matrix1,I2,…,Ik-1To I0Registering to obtain k registered images I0,I1',I2',…,Ik-1'。
(2) Image segmentation. Take the intersection of the foreground mask images M_1, M_2, …, M_(k-1) to obtain the common foreground mask image M_inter. Multiply M_inter with each registered image I_0, I_1', I_2', …, I_(k-1)' to obtain the registered foreground region images F_0', F_1', F_2', …, F_(k-1)'. Multiply the inverse mask of M_inter, namely (1 − M_inter) (note that (1 − M_inter) and RM_inter are generally not equal), with each registered image I_0, I_1', I_2', …, I_(k-1)' to obtain the registered background region images B_0', B_1', B_2', …, B_(k-1)'.
(3) Image fusion. Perform pixel-maximum fusion on the registered foreground region images F_0', F_1', F_2', …, F_(k-1)' to obtain the foreground fusion image F_fu. Perform pixel-average fusion on the registered background region images B_0', B_1', B_2', …, B_(k-1)' to obtain the background fusion image B_fu. Splice the foreground fusion image F_fu with the background fusion image B_fu; when splicing, the pixel values of the two images can simply be added, giving the target image I_out. The target image I_out has the image effect of a simulated long exposure.
Fig. 10 shows a flow architecture diagram of another image processing method in the present exemplary embodiment, and similarly to the flow architecture diagram of fig. 9, the entire image processing flow is also divided into three parts:
and (5) image registration. Acquiring continuous k frames of original images I0,I1,I2,…,Ik-1K is a positive integer greater than 1; selecting reference frame pictures, e.g. first frame I0As a reference frame image, subtracting the original images of other frames from the reference frame image to obtain a difference image D1,D2,…,Dk-1(ii) a Carrying out binarization processing on the difference image, setting pixel points with pixel values higher than a preset threshold value in the difference image as 0, and setting pixel points with pixel values lower than the preset threshold value as 1, thereby obtaining a background mask image RM1,RM2,…,RMk-1(ii) a For RM1,RM2,…,RMk-1Obtaining the intersection to obtain the public background mask image RMinter(ii) a Masking the common background image RMinterRespectively with the original image I0,I1,I2,…,Ik-1Multiplying to obtain a background area image B before registration0,B1,B2,…,Bk-1(ii) a Respectively mixing B with1,B2,…,Bk-1And B0Matching the characteristic points and calculating to obtain a homography matrix H1,H2,…,Hk-1(ii) a Respectively converting I into I by using homography matrix1,I2,…,Ik-1To I0Registering to obtain k registered images I0,I1',I2',…,Ik-1'。
(2) Image segmentation. Subtract the reference frame image I_0 from each registered image other than I_0 to obtain the registered difference images D_1', D_2', …, D_(k-1)'. Binarize D_1', D_2', …, D_(k-1)', setting pixels whose difference value is higher than the preset threshold to 1 and pixels whose difference value is lower than the preset threshold to 0, thereby obtaining the registered foreground mask images M_1', M_2', …, M_(k-1)'. Take the intersection of M_1', M_2', …, M_(k-1)' to obtain the registered common foreground mask image M_inter'. Multiply M_inter' with each registered image I_0, I_1', I_2', …, I_(k-1)' to obtain the registered foreground region images F_0', F_1', F_2', …, F_(k-1)'. Multiply the inverse mask (1 − M_inter') with each registered image I_0, I_1', I_2', …, I_(k-1)' to obtain the registered background region images B_0', B_1', B_2', …, B_(k-1)'.
(3) Image fusion. Perform pixel-maximum fusion on the registered foreground region images F_0', F_1', F_2', …, F_(k-1)' to obtain the foreground fusion image F_fu. Perform pixel-average fusion on the registered background region images B_0', B_1', B_2', …, B_(k-1)' to obtain the background fusion image B_fu. Splice F_fu with B_fu; when splicing, the pixel values of the two images can simply be added, giving the target image I_out. The target image I_out has the image effect of a simulated long exposure.
Exemplary embodiments of the present disclosure also provide another image processing method. Referring to fig. 11, the image processing method includes the following steps S1110 to S1140:
In step S1110, a reference frame image is determined among the continuous multi-frame images, and at least one frame image other than the reference frame image is determined as an image to be fused.
The continuous multi-frame images can be obtained by pointing the lens at a certain area or object and shooting continuously. The specific manner of acquiring the continuous multi-frame images may refer to the content of step S210. The reference frame image provides the reference for image fusion and may be any one of the consecutive original frames, for example the first frame. The image to be fused is an image to be fused with the reference frame image and may be any one or more frames other than the reference frame image; for example, in the continuous multi-frame images, the first frame may serve as the reference frame image and every other frame as an image to be fused. In one embodiment, images to be fused with higher quality may be selected by screening according to image brightness, sharpness, and the like.
In step S1120, the foreground region of the image to be fused is determined.
The foreground region may be a moving part in the image to be fused, and correspondingly, the stationary part is a background region, and the foreground region and the background region are usually in a complementary relationship. The foreground and the background can be segmented by detecting the moving object in the image to be fused, so that the foreground area is determined.
In one embodiment, step S1120 may include the steps of:
subtracting the image to be fused from the reference frame image to obtain a difference image corresponding to the image to be fused;
and determining a foreground region of the image to be fused according to the comparison result of each pixel value in the difference image and a preset threshold value.
The above steps can refer to the contents of fig. 4 and fig. 5. The preset threshold is the pixel-difference criterion for separating moving objects from static objects and can be determined empirically or according to the actual scene. In the difference image, if the value of a pixel is greater than the preset threshold, that pixel belongs to the foreground region; otherwise it belongs to the background region. This realizes the segmentation of foreground and background, determining the foreground region of the image to be fused and, at the same time, the background region.
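A minimal sketch of this frame-difference segmentation is given below. It is only an illustration: the threshold value of 25 and the grayscale conversion are assumptions, since the patent leaves the threshold to experience or the actual scene:

```python
import cv2
import numpy as np

def foreground_mask(ref_bgr, img_bgr, threshold=25):
    """Binarize the frame difference: 1 = moving (foreground), 0 = static."""
    diff = cv2.absdiff(img_bgr, ref_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    mask = (gray > threshold).astype(np.uint8)
    return mask  # the background mask is simply 1 - mask
```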
Step S1130, at least one subregion is extracted from the image to be fused.
The sub-regions may be sub-regions obtained by dividing the image to be fused in any form. For example, every 2 × 2 or 3 × 3 pixel points in the image to be fused are used as a sub-region, or every pixel point in the image to be fused is used as a sub-region, or sub-regions are divided according to the color distribution of the image to be fused, and the like. The sub-region extracted in step S1130 may be located in a foreground region, which is a foreground sub-region, or located in a background region, which is a background sub-region.
In one embodiment, part of the sub-regions in the image to be fused, typically those that need to be fused into the reference frame image, may be extracted according to actual requirements; for example, sub-regions may be extracted only from the foreground region, or the gradients of the sub-regions in the image to be fused may be computed and only sub-regions whose gradient exceeds a certain threshold (i.e., high-frequency sub-regions) extracted.
In one embodiment, the image to be fused may be divided into a plurality of sub-regions, and each sub-region is extracted, that is, all sub-regions in the image to be fused are extracted, which is equivalent to the requirement that the whole image to be fused is fused to the reference frame image.
In one embodiment, each pixel point in the image to be fused may be determined as a sub-region. Therefore, each pixel point is used as a sub-region and is fused to the reference frame image in the subsequent processing, so that the image fusion at the pixel level is realized, and the improvement of the fineness of the image fusion is facilitated.
In one embodiment, the image to be fused and the reference frame image may be registered to improve the accuracy of subsequent image fusion. The specific manner of registration may refer to the above-mentioned registration of at least two images in fig. 2.
It should be noted that the reference frame image and the image to be fused may be registered after being determined in step S1110, or the foreground region and the background region of the image to be fused may be determined in step S1120, and then the registration may be performed based on the background region, and the foreground region of the image to be fused may be re-determined after the registration.
Step S1140, respectively fusing the sub-regions in the image to be fused to the reference frame image, wherein when the sub-regions are located in the foreground region, the larger pixel value of the sub-region in the image to be fused and the reference frame image is taken as the pixel value of the sub-region after fusion.
It should be noted that a sub-region in the image to be fused is generally an image block; the sub-region has a corresponding image block in the reference frame image, and the two image blocks generally have the same position, shape, and size. For example, a sub-region A is extracted from the image to be fused I_1 and recorded as image block A_I1; the image block of sub-region A in the reference frame image I_0 is A_I0, and A_I1 in I_1 has the same position, shape, and size as A_I0 in I_0. The larger of the pixel values of A_I0 and A_I1 is selected as the pixel value of sub-region A in the fused image. Specifically, denote the pixel values of A_I0 and A_I1 as P(A_I0) and P(A_I1), respectively. If P(A_I0) ≥ P(A_I1), the pixel values of sub-region A in the reference frame image I_0 are retained; if P(A_I0) < P(A_I1), the pixel values of sub-region A in the image to be fused I_1 replace those in the reference frame image I_0, completing the fusion of sub-region A.
In one embodiment, step S1140 may include the steps of:
determining the pixel statistic value of the sub-area in the image to be fused and the pixel statistic value in the reference frame image;
and taking the pixel value of the sub-area in the image corresponding to the larger pixel statistic value as the pixel value of the sub-area after fusion.
The pixel statistic may be a pixel average value, a pixel maximum value, a luminance average value, and so on, which is not limited in this disclosure. It should be noted that when the sub-region is a single pixel, the average of the pixel values of its channels may be taken, or the pixel value may be converted into a luminance value (or gray value) and that luminance value used as the pixel statistic. By comparing the pixel statistics of the same sub-region in different images (namely the reference frame image and the image to be fused), the pixel values of the sub-region in the image with the larger statistic are selected as the fused pixel values, completing the fusion of the sub-region.
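A hedged sketch of this block-statistic comparison follows; the block coordinates, the choice of np.mean as the statistic, and the in-place update are illustrative assumptions rather than the patent's prescribed implementation:

```python
import numpy as np

def fuse_block_by_statistic(ref, img, y, x, h, w, statistic=np.mean):
    """Keep whichever image has the larger block statistic for the block
    (y:y+h, x:x+w); the chosen block is written into the reference frame."""
    ref_block = ref[y:y + h, x:x + w]
    img_block = img[y:y + h, x:x + w]
    if statistic(img_block) > statistic(ref_block):
        ref[y:y + h, x:x + w] = img_block
    return ref
```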
In one embodiment, when fusing the sub-regions in the image to be fused to the reference frame image respectively, the following steps may be further performed:
and when the sub-region is positioned in the background region of the image to be fused, taking the pixel weighted value of the sub-region in the image to be fused and the reference frame image as the pixel value of the sub-region after fusion.
For example, the image block of sub-region A in the reference frame image I_0 is denoted A_I0 and in the image to be fused I_1 is denoted A_I1, with pixel values P(A_I0) and P(A_I1), respectively; the pixel weighted value may be a weighted average of P(A_I0) and P(A_I1). The weights may be determined empirically or according to the actual situation; for example, the weights of P(A_I0) and P(A_I1) may both be set to 0.5, i.e. the average of P(A_I0) and P(A_I1) is taken as the pixel value of sub-region A after fusion.
It should be added that one or more sub-regions may lie on the boundary between the foreground region and the background region; for example, part of a sub-region A may lie in the foreground region and the rest in the background region. Such a sub-region can generally be treated as a sub-region of the background region.
In one embodiment, step S1110 may include the steps of:
determining an initial reference frame image in continuous multi-frame images, and sequentially determining each other frame image as an image to be fused;
and after the sub-region of the image to be fused is fused to the reference frame image, updating the reference frame image by the fused image.
For example, acquire k consecutive frame images I_0, I_1, I_2, …, I_(k-1); select the first frame I_0 as the initial reference frame image I_ref, i.e. I_ref = I_0, and take each of the other frames in turn as the image to be fused. Extract sub-regions from I_1 and fuse them into the reference frame image I_ref, updating I_ref after the fusion; then extract sub-regions from I_2 and fuse them into the reference frame image I_ref, updating I_ref again; and so on. The process is iterated, each fusion being equivalent to an update of the reference frame image I_ref, until the fusion of I_(k-1) is completed and the final reference frame image I_ref is obtained.
In one embodiment, the target image is obtained after the sub-regions of the last image to be fused have been fused into the reference frame image. For example, after I_1, I_2, …, I_(k-1) have been fused in turn as images to be fused, the finally obtained reference frame image I_ref is the target image I_out for output.
In an embodiment, in the process of sequentially fusing each other frame as the image to be fused, when a sub-region is located in the background region, the pixel weighted value of the sub-region in the image to be fused and in the reference frame image may be taken as the fused pixel value of the sub-region, based on a first weight for the reference frame image and a second weight for the image to be fused. The first weight is i/(i+1), the second weight is 1/(i+1), and i is the ordinal number of the current image to be fused. For example, in the above sequential fusion of I_1, I_2, …, I_(k-1), denote the current image to be fused as I_i (0 < i < k), where i is its ordinal number, i.e. the current image is the i-th image to be fused; if i = 3, the first weight is 3/4 and the second weight is 1/4. At that point the current reference frame image is the result of fusing the initial reference frame image with two images to be fused, i.e. the fusion of three images, while the current image to be fused is a single image. The first weight and the second weight are therefore proportional to the numbers of images already fused, and the resulting pixel weighted value is in fact the pixel average of the reference frame image and all images to be fused so far.
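This running-average property can be checked numerically with a short sketch (illustrative only; random 4×4 arrays stand in for the background pixels of the frames):

```python
import numpy as np

frames = [np.random.rand(4, 4) for _ in range(5)]    # I_0 plus four frames to fuse
ref = frames[0].copy()
for i, img in enumerate(frames[1:], start=1):
    ref = (i / (i + 1)) * ref + (1 / (i + 1)) * img  # weighted background update
assert np.allclose(ref, np.mean(frames, axis=0))     # equals the plain pixel average
```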
Based on the method of fig. 11, another technical scheme for simulating a long-exposure effect through multi-frame image fusion is provided. The method is applicable to electronic equipment such as smartphones, avoids problems such as ghosting and overexposure caused by an unsuitable exposure time, and is less affected by shaking of the device body, so the image-blur problem is reduced; clear and continuous motion tracks can be retained in the fused image while the influence of noise is reduced, improving the quality of the image fusion.
In one embodiment, the image processing method of fig. 11 may be implemented as follows:
Among the k consecutive frame images I_0, I_1, I_2, …, I_(k-1), select the first frame I_0 as the initial reference frame image I_ref, i.e. I_ref = I_0; register I_1, I_2, …, I_(k-1) to I_ref to obtain the registered images I_1', I_2', …, I_(k-1)', which serve in turn as the images to be fused;
Determine the foreground region in each of I_1', I_2', …, I_(k-1)' to obtain the foreground mask images M_1, M_2, …, M_(k-1) of the k−1 images to be fused;
Take each pixel of each image to be fused as a sub-region; for example, in the i-th image to be fused I_i' (0 < i < k), the pixel with coordinates (x, y) is denoted I_i'(x, y);
At fusion time, the inputs are I_0, I_1', I_2', …, I_(k-1)' and the k−1 foreground mask images M_1, M_2, …, M_(k-1);
Fuse I_1', I_2', …, I_(k-1)' into I_ref in turn, updating I_ref each time. For the i-th image to be fused I_i', the foreground mask image M_i determines whether a pixel lies in the foreground region or the background region; foreground pixels are fused into I_ref by the pixel maximum, and background pixels are fused into I_ref by the pixel weighted value, expressed as formula (3):
I_ref(x, y) = max(I_ref(x, y), I_i'(x, y)),                          if M_i(x, y) = 1 (foreground)      (3)
I_ref(x, y) = (i/(i+1))·I_ref(x, y) + (1/(i+1))·I_i'(x, y),          if M_i(x, y) = 0 (background)
This continues until I_1', I_2', …, I_(k-1)' have all been fused, and the final target image I_out = I_ref is output.
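A compact NumPy sketch of this per-pixel fusion loop follows. It is an illustration under the assumption of H×W×3 uint8 frames and H×W masks with 1 marking foreground, not the patent's reference implementation:

```python
import numpy as np

def fuse_sequence(reference, warped_frames, foreground_masks):
    """Iteratively fuse registered frames into the reference frame:
    pixel maximum where the mask marks foreground (1), running average
    with weights i/(i+1) and 1/(i+1) where it marks background (0)."""
    out = reference.astype(np.float32)
    for i, (img, mask) in enumerate(zip(warped_frames, foreground_masks), start=1):
        img = img.astype(np.float32)
        is_fg = mask.astype(bool)[..., None]                   # H x W -> H x W x 1
        fg = np.maximum(out, img)                               # foreground rule
        bg = (i / (i + 1)) * out + (1 / (i + 1)) * img          # background rule
        out = np.where(is_fg, fg, bg)
    return np.clip(out, 0, 255).astype(np.uint8)
```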
Fig. 12 is a flow architecture diagram of the image processing method, which divides the whole image processing flow into two parts:
and (4) image segmentation and registration. Acquiring continuous k frames of original images I0,I1,I2,…,Ik-1K is a positive integer greater than 1; selecting reference frame pictures, e.g. first frame I0As an initial reference frame image IrefDetermining other frame original images as images to be fused; respectively fusing the image to be fused with IrefSubtracting to obtain a difference image D1,D2,…,Dk-1(ii) a Carrying out binarization processing on the difference image to obtain a foreground mask image M1,M2,…,Mk-1And a background mask image RM1,RM2,…,RMk-1The foreground mask image and the background mask image are in a complementary relationship, namely the foreground mask image is a reverse mask image of the background mask image; using RM1To IrefAnd I1Extracting and matching the characteristic points of the background area, and calculating I through the matching point pairsrefAnd I1Homography matrix H between1Using RM2To IrefAnd I2Extracting characteristic points of the background area for matching, and calculating I through matching point pairsrefAnd I2Homography matrix H between2… … thus obtain a homography matrix H of all the images to be fused1,H2,…,Hk-1(ii) a Respectively converting I into I by using homography matrix1,I2,…,Ik-1To I0Registering to obtain k registered images I0,I1',I2',…,Ik-1'。
(2) Image fusion. The inputs are I_0, I_1', I_2', …, I_(k-1)' and M_1, M_2, …, M_(k-1). Let I_ref = I_0, and for i = 1, 2, …, k−1 in increasing order apply formula (3) above to iteratively update I_ref. After all the images to be fused have been traversed, output the target image I_out = I_ref.
Fig. 13 shows continuous multi-frame images shot of an urban road at night, and fig. 14 shows the target image obtained by fusing the continuous multi-frame images of fig. 13 using the flow of fig. 12. It can be seen that, when shooting night scenes containing moving light sources, the image processing method of this exemplary embodiment can obtain light-trail images that are clear and have continuous light tracks, enriching the shooting styles and expressiveness of night-scene pictures and improving the user experience.
Exemplary embodiments of the present disclosure also provide an image processing apparatus. Referring to fig. 15, the image processing apparatus 1500 may include:
an image acquisition module 1510 configured to acquire a plurality of consecutive frame images;
an image segmentation module 1520 configured to extract foreground region images of the at least two frames of images;
the foreground fusion module 1530 is configured to fuse the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fusion image, wherein each foreground sub-region is a sub-region of the foreground region image;
a target fusion module 1540 configured to generate a target image based on the foreground fusion image and the background region image of the at least one frame of image.
In one embodiment, each pixel in the foreground region image is a foreground sub-region.
In one embodiment, the foreground fusion module 1530 is configured to:
for each foreground subarea, respectively determining a pixel statistic value of the foreground subarea in each foreground area image;
determining a foreground area image corresponding to the maximum pixel statistic value of the foreground sub-area as a target foreground area image corresponding to the foreground sub-area;
and taking the pixel value of the foreground subarea in the target foreground area image as the pixel value of the foreground subarea in the foreground fusion image.
In one embodiment, the target fusion module 1540 is configured to:
fusing background area images of at least two frames of images to obtain a background fused image;
and generating a target image based on the foreground fusion image and the background fusion image.
In one embodiment, the target fusion module 1540 is configured to:
and averaging the pixel values of each pixel point in the background area in each background area image to obtain a background fusion image.
In one embodiment, the image acquisition module 1510 is configured to:
determining exposure conditions according to the ambient illumination parameters;
a plurality of frames of images continuously photographed under an exposure condition are acquired.
In one embodiment, the image acquisition module 1510 is configured to:
determining a corresponding first exposure time length according to the ambient illumination parameter;
acquiring a second exposure time corresponding to the minimum sensitivity;
and determining an exposure condition according to the maximum value of the first exposure time length and the second exposure time length and the minimum sensitivity.
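The exposure selection just described can be sketched as follows. This is only an assumption-laden illustration: the mapping from ambient illuminance to the first exposure time and the numeric values (100 ISO, 10/30/60/40 ms) are placeholders the patent does not specify; only the rule of keeping the minimum sensitivity and taking the longer of the two candidate exposure times comes from the text:

```python
def determine_exposure(ambient_lux, min_iso=100, min_iso_exposure_ms=40):
    """Pick the exposure condition for the burst: keep the minimum
    sensitivity and take the longer of the two candidate exposure times."""
    # Illustrative mapping from ambient illuminance to a first exposure time.
    if ambient_lux > 500:
        first_exposure_ms = 10
    elif ambient_lux > 50:
        first_exposure_ms = 30
    else:
        first_exposure_ms = 60
    exposure_ms = max(first_exposure_ms, min_iso_exposure_ms)
    return {"iso": min_iso, "exposure_ms": exposure_ms}
```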
In one embodiment, the image segmentation module 1520 is configured to:
subtracting at least two frames of images to be segmented in the continuous multi-frame images from reference frame images in the continuous multi-frame images respectively to obtain a difference image corresponding to each frame of image to be segmented;
generating a foreground mask image corresponding to each frame of image to be segmented according to a comparison result of each pixel value in the difference image and a preset threshold;
and extracting a foreground area image from the image to be segmented by using the foreground mask image.
In one embodiment, the image processing apparatus 1500 may further include an image registration module configured to:
and registering the at least two frames of images according to the background area images of the at least two frames of images.
In one embodiment, the image segmentation module 1520 is configured to:
and extracting the foreground area image again from the at least two registered frames of images.
In one embodiment, an image registration module configured to:
respectively extracting characteristic points from background region images of the two frames of images;
carrying out feature point matching on the background region images of the two frames of images to obtain a matching point pair;
determining a homography matrix according to the matching point pairs;
one of the two images is registered to the other by means of a homography matrix.
In one embodiment, the image segmentation module 1520 is configured to:
when the background regions corresponding to the background region images do not completely overlap, taking the intersection of the background regions corresponding to the background region images to obtain a common background region;
and deleting the part of each background region image that lies outside the common background region.
In one embodiment, the at least two frame images include a reference frame image.
An image registration module configured to:
and registering images except the reference frame image in the at least two frames of images to the reference frame image.
In one embodiment, the reference frame image may be a first frame image of the above-described consecutive multi-frame images.
In one embodiment, the image segmentation module 1520 is configured to:
when the foreground regions corresponding to the foreground region images do not completely overlap, taking the intersection of the foreground regions corresponding to the foreground region images to obtain a common foreground region;
and deleting the part of each foreground region image that lies outside the common foreground region.
In one embodiment, the image segmentation module 1520 is configured to:
and extracting a foreground area image of each frame image.
Exemplary embodiments of the present disclosure also provide another image processing apparatus. Referring to fig. 16, the image processing apparatus 1600 may include:
an image determining module 1610 configured to determine a reference frame image among the consecutive multi-frame images, and determine at least one frame image other than the reference frame image as an image to be fused;
an image segmentation module 1620 configured to determine a foreground region of the image to be fused;
a sub-region extraction module 1630 configured to extract at least one sub-region from the image to be fused;
and a sub-region fusion module 1640 configured to fuse the sub-regions in the image to be fused to the reference frame image, respectively, wherein when a sub-region is located in the foreground region, the larger pixel value of the sub-region in the image to be fused and in the reference frame image is used as the pixel value of the fused sub-region.
In one embodiment, the sub-region fusion module 1640 is configured to:
determining the pixel statistic value of the sub-region in the image to be fused and the pixel statistic value in the reference frame image;
and taking the pixel value of the sub-region in the image corresponding to the larger pixel statistic value as the pixel value of the sub-region after fusion.
In one embodiment, the sub-region fusion module 1640 is configured to:
in the process of respectively fusing the sub-regions in the image to be fused to the reference frame image, when the sub-regions are located in the background region of the image to be fused, the pixel weighted values of the sub-regions in the image to be fused and the reference frame image are used as the pixel values of the sub-regions after fusion.
In one embodiment, the image determination module 1610 is configured to:
determining an initial reference frame image in continuous multi-frame images, and sequentially determining each other frame image as an image to be fused;
and after the sub-region of the image to be fused is fused to the reference frame image, updating the reference frame image by the fused image.
In one embodiment, the sub-region fusion module 1640 is configured to:
in the process of respectively fusing the sub-regions in the image to be fused to the reference frame image, when the sub-regions are located in the background region, based on the first weight of the reference frame image and the second weight of the image to be fused, taking the pixel weighted values of the sub-regions in the image to be fused and the reference frame image as the pixel values of the fused sub-regions;
wherein the first weight is i/(i+1), the second weight is 1/(i+1), and i is the ordinal number of the current image to be fused.
In one embodiment, the sub-region fusion module 1640 is configured to:
and obtaining a target image after fusing the subarea of the last frame image to be fused to the reference frame image.
In one embodiment, the image segmentation module 1620 is configured to:
subtracting the image to be fused from the reference frame image to obtain a difference image corresponding to the image to be fused;
and determining a foreground region of the image to be fused according to the comparison result of each pixel value in the difference image and a preset threshold value.
In one embodiment, the sub-region extraction module 1630 is configured to:
dividing an image to be fused into a plurality of sub-regions, and extracting each sub-region.
In one embodiment, the sub-region extraction module 1630 is configured to:
and determining each pixel point in the image to be fused as a subarea.
The details of the above-mentioned parts of the apparatus have been described in detail in the method part embodiments, and thus are not described again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product, including program code for causing an electronic device to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification, when the program product is run on the electronic device. In one embodiment, the program product may be embodied as a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (29)

1. An image processing method, comprising:
acquiring continuous multi-frame images;
extracting foreground region images of at least two frames of images;
fusing the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fused image, wherein the foreground sub-regions are sub-regions of the foreground region image;
and generating a target image based on the foreground fusion image and the background area image of at least one frame of image.
2. The method of claim 1, wherein each pixel point in the foreground region image is one of the foreground sub-regions.
3. The method according to claim 1, wherein fusing the foreground sub-regions according to their maximum pixel values in the foreground region images to form a foreground fused image comprises:
for each foreground subarea, respectively determining a pixel statistic value of the foreground subarea in each foreground area image;
determining the foreground area image corresponding to the maximum pixel statistic value of the foreground sub-area as a target foreground area image corresponding to the foreground sub-area;
and taking the pixel value of the foreground subarea in the target foreground area image as the pixel value of the foreground subarea in the foreground fusion image.
4. The method according to claim 1, wherein generating the target image based on the foreground fusion image and the background area image of at least one frame of image comprises:
fusing background area images of at least two frames of images to obtain a background fused image;
and generating the target image based on the foreground fusion image and the background fusion image.
5. The method according to claim 4, wherein the fusing the background region images of the at least two frames of images to obtain a background fused image comprises:
and averaging the pixel values of each pixel point in the background area in each background area image to obtain the background fusion image.
6. The method of claim 1, wherein said acquiring a plurality of consecutive frame images comprises:
determining exposure conditions according to the ambient illumination parameters;
acquiring a plurality of frames of images continuously shot under the exposure condition.
7. The method of claim 6, wherein determining an exposure condition based on the ambient lighting parameter comprises:
determining a corresponding first exposure time length according to the ambient illumination parameter;
acquiring a second exposure time corresponding to the minimum sensitivity;
determining the exposure condition according to a maximum value of the first exposure time length and the second exposure time length and the minimum sensitivity.
8. The method according to claim 1, wherein said extracting foreground region images of at least two frames of images comprises:
subtracting at least two frames of images to be segmented in the continuous multi-frame images from reference frame images in the continuous multi-frame images respectively to obtain a difference image corresponding to each frame of image to be segmented;
generating a foreground mask image corresponding to each frame of image to be segmented according to a comparison result of each pixel value in the difference image and a preset threshold;
and extracting the foreground area image from the image to be segmented by using the foreground mask image.
9. The method of claim 1, wherein after extracting the foreground region image of at least two frames of images, the method further comprises:
and registering the at least two frames of images according to the background area images of the at least two frames of images.
10. The method of claim 9, wherein after registering the at least two images, the method further comprises:
and extracting the foreground area image again from the at least two frames of images after registration.
11. The method of claim 9, wherein said registering the at least two images from their background region images comprises:
respectively extracting characteristic points from background region images of the two frames of images;
carrying out feature point matching on the background region images of the two frames of images to obtain a matching point pair;
determining a homography matrix according to the matching point pairs;
registering one of the two images to the other image by the homography matrix.
12. The method of claim 9, wherein prior to registering the at least two images based on the background region images of the at least two images, the method further comprises:
when the background areas corresponding to the background area images are not completely overlapped, taking intersection of the background areas corresponding to the background area images to obtain a common background area;
and deleting the part of each background area image, which is positioned outside the common background area.
13. The method of claim 9, wherein the at least two frame pictures comprise reference frame pictures;
the registering the at least two frame images comprises:
and registering images except the reference frame image in the at least two frame images to the reference frame image.
14. The method according to claim 8 or 13, wherein the reference frame image is a first frame image of the consecutive multi-frame images.
15. The method of claim 1, wherein before fusing the foreground sub-regions according to their maximum pixel values in the foreground region images to form a foreground fused image, the method further comprises:
when the foreground regions corresponding to the foreground region images are not completely overlapped, taking intersection of the foreground regions corresponding to the foreground region images to obtain a common foreground region;
and deleting the part of each foreground area image, which is positioned outside the common foreground area.
16. The method according to claim 1, wherein said extracting foreground region images of at least two frames of images comprises:
and extracting a foreground area image of each frame image.
17. An image processing method, comprising:
determining a reference frame image in continuous multi-frame images, and determining at least one frame image except the reference frame image as an image to be fused;
determining a foreground region of the image to be fused;
extracting at least one subregion from the image to be fused;
and respectively fusing the sub-regions in the image to be fused to the reference frame image, wherein when the sub-regions are located in the foreground region, the larger pixel value of the sub-regions in the image to be fused and the reference frame image is used as the pixel value of the sub-regions after fusion.
18. The method according to claim 17, wherein the using the larger pixel value of the sub-region in the image to be fused and the reference frame image as the pixel value of the sub-region after fusion comprises:
determining the pixel statistics of the sub-region in the image to be fused and the pixel statistics in the reference frame image;
and taking the pixel value of the sub-area in the image corresponding to the larger pixel statistic value as the pixel value of the sub-area after fusion.
19. The method according to claim 17, wherein when fusing the sub-regions in the image to be fused to the reference frame image respectively, the method further comprises:
and when the sub-region is positioned in the background region of the image to be fused, taking the pixel weighted value of the sub-region in the image to be fused and the reference frame image as the pixel value of the sub-region after fusion.
20. The method according to claim 17, wherein determining a reference frame image among the continuous multiple frame images and determining at least one other frame image except the reference frame image as an image to be fused comprises:
determining an initial reference frame image in the continuous multi-frame images, and sequentially determining each other frame image as the image to be fused;
and after the sub-region of the image to be fused is fused to the reference frame image, updating the reference frame image by the fused image.
21. The method according to claim 20, wherein when fusing the sub-regions in the image to be fused to the reference frame image respectively, the method further comprises:
when the subregion is located in the background area, based on the first weight of the reference frame image and the second weight of the image to be fused, taking the weighted pixel value of the subregion in the image to be fused and the reference frame image as the pixel value of the subregion after fusion;
wherein the first weight is i/(i+1), the second weight is 1/(i+1), and i is the ordinal number of the current image to be fused.
22. The method of claim 20, further comprising:
and obtaining a target image after fusing the subarea of the image to be fused of the last frame to the reference frame image.
23. The method according to claim 17, wherein the determining the foreground region of the image to be fused comprises:
subtracting the reference frame image from the image to be fused to obtain a difference image corresponding to the image to be fused;
and determining the foreground area of the image to be fused according to the comparison result of each pixel value in the difference image and a preset threshold value.
24. The method according to claim 17, wherein said extracting at least one sub-region from the image to be fused comprises:
and dividing the image to be fused into a plurality of sub-regions, and extracting each sub-region.
25. The method according to claim 24, wherein the dividing the image to be fused into a plurality of sub-regions comprises:
and determining each pixel point in the image to be fused as a subarea.
26. An image processing apparatus characterized by comprising:
an image acquisition module configured to acquire a continuous multi-frame image;
an image segmentation module configured to extract foreground region images of at least two frames of images;
the foreground fusion module is configured to fuse the foreground sub-regions according to the maximum pixel value of each foreground sub-region in each foreground region image to form a foreground fusion image, wherein the foreground sub-regions are sub-regions of the foreground region image;
and the target fusion module is configured to generate a target image based on the foreground fusion image and a background area image of at least one frame of image.
27. An image processing apparatus characterized by comprising:
the image fusion device comprises an image determining module, a fusion module and a fusion module, wherein the image determining module is configured to determine a reference frame image in continuous multi-frame images and determine at least one frame image except the reference frame image as an image to be fused;
the image segmentation module is configured to determine a foreground region of the image to be fused from the foreground region;
a subregion extraction module configured to extract at least one subregion from the image to be fused;
and the sub-region fusion module is configured to fuse the sub-regions in the image to be fused to the reference frame image respectively, wherein when the sub-regions are located in the foreground region, a larger pixel value of the sub-regions in the image to be fused and the reference frame image is used as a pixel value of the sub-regions after fusion.
28. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 25.
29. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 25 via execution of the executable instructions.
CN202110049901.XA 2021-01-14 2021-01-14 Image processing method, image processing apparatus, storage medium, and electronic device Pending CN112767295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110049901.XA CN112767295A (en) 2021-01-14 2021-01-14 Image processing method, image processing apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN112767295A true CN112767295A (en) 2021-05-07

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015175201A1 (en) * 2014-05-15 2015-11-19 Intel Corporation Content adaptive background-foreground segmentation for video coding
WO2017101489A1 (en) * 2015-12-14 2017-06-22 乐视控股(北京)有限公司 Method and device for image filtering
CN108133488A (en) * 2017-12-29 2018-06-08 安徽慧视金瞳科技有限公司 A kind of infrared image foreground detection method and equipment
CN110189285A (en) * 2019-05-28 2019-08-30 北京迈格威科技有限公司 A kind of frames fusion method and device
CN111369469A (en) * 2020-03-10 2020-07-03 北京爱笔科技有限公司 Image processing method and device and electronic equipment

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222870A (en) * 2021-05-13 2021-08-06 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN115706767A (en) * 2021-08-12 2023-02-17 荣耀终端有限公司 Video processing method and device, electronic equipment and storage medium
CN115706767B (en) * 2021-08-12 2023-10-31 荣耀终端有限公司 Video processing method, device, electronic equipment and storage medium
CN113516672A (en) * 2021-09-07 2021-10-19 北京美摄网络科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN114219744A (en) * 2021-11-25 2022-03-22 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium
CN114040117A (en) * 2021-12-20 2022-02-11 努比亚技术有限公司 Photographing processing method of multi-frame image, terminal and storage medium
CN115529462A (en) * 2022-09-30 2022-12-27 中国电信股份有限公司 Video frame processing method and device, electronic equipment and readable storage medium
CN115409753A (en) * 2022-11-01 2022-11-29 北京开运联合信息技术集团股份有限公司 Image fusion method and device, electronic equipment and computer readable storage medium
CN116193279A (en) * 2022-12-29 2023-05-30 影石创新科技股份有限公司 Video processing method, device, computer equipment and storage medium
CN116758081A (en) * 2023-08-18 2023-09-15 安徽乾劲企业管理有限公司 Unmanned aerial vehicle road and bridge inspection image processing method
CN116758081B (en) * 2023-08-18 2023-11-17 安徽乾劲企业管理有限公司 Unmanned aerial vehicle road and bridge inspection image processing method

Similar Documents

Publication Publication Date Title
CN112767295A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN111353948B (en) Image noise reduction method, device and equipment
CN111179282B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN112802033B (en) Image processing method and device, computer readable storage medium and electronic equipment
KR102072014B1 (en) Method and Apparatus for Using Image Stabilization
CN111784614A (en) Image denoising method and device, storage medium and electronic equipment
KR102559859B1 (en) Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling
US10515463B2 (en) Object segmentation in a sequence of color image frames by background image and background depth correction
CN112272832A (en) Method and system for DNN-based imaging
US10708499B2 (en) Method and apparatus having a function of constant automatic focusing when exposure changes
CN113096022B (en) Image blurring processing method and device, storage medium and electronic device
CN113409203A (en) Image blurring degree determining method, data set constructing method and deblurring method
CN110796012B (en) Image processing method and device, electronic equipment and readable storage medium
CN111416937B (en) Image processing method, image processing device, storage medium and mobile equipment
CN113781336B (en) Image processing method, device, electronic equipment and storage medium
CN113409209B (en) Image deblurring method, device, electronic equipment and storage medium
CN112419161A (en) Image processing method and device, storage medium and electronic equipment
US20230131418A1 (en) Two-dimensional (2d) feature database generation
CN115457024A (en) Method and device for processing cryoelectron microscope image, electronic equipment and storage medium
CN113658070A (en) Image processing method, image processing apparatus, storage medium, and electronic device
EP4395357A1 (en) Electronic device including camera, and moving image generating method for photographing moving object
CN112991188B (en) Image processing method and device, storage medium and electronic equipment
CN118212146A (en) Image fusion method and device, storage medium and electronic equipment
CN114187371A (en) Background image generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination