WO2022011657A1 - Image processing method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Image processing method and apparatus, electronic device, and computer-readable storage medium Download PDF

Info

Publication number
WO2022011657A1
Authority
WO
WIPO (PCT)
Prior art keywords
area
scene image
image
pixels
subject
Prior art date
Application number
PCT/CN2020/102502
Other languages
French (fr)
Chinese (zh)
Inventor
布迪萨·艾哈迈德
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to PCT/CN2020/102502 priority Critical patent/WO2022011657A1/en
Publication of WO2022011657A1 publication Critical patent/WO2022011657A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • the present application relates to the field of imaging, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
  • An embodiment of the present application provides an image processing method. The image processing method includes: acquiring a first scene image, where the first scene image includes a subject area and the subject area is located in a depth-of-field area of the first scene image; acquiring a second scene image, where the far depth of field of the second scene image is not greater than the near depth of field of the first scene image; and fusing the subject area and the second scene image to obtain a target image.
  • An embodiment of the present application provides an electronic device. The electronic device includes a memory and one or more processors connected to the memory. The one or more processors are used for: acquiring a first scene image, where the first scene image includes a subject area and the subject area is located in the depth-of-field area of the first scene image; acquiring a second scene image, where the far depth of field of the second scene image is not greater than the near depth of field of the first scene image; and fusing the subject area and the second scene image to obtain the target image.
  • Embodiments of the present application provide an image processing apparatus, where the image processing apparatus includes a first acquisition module, a second acquisition module, and an image fusion module.
  • the first acquisition module is used for acquiring a first scene image, where the first scene image includes a subject area, and the subject area is located in a depth-of-field area of the first scene image.
  • the second acquisition module is configured to acquire a second scene image, and the far field depth of the second scene image is not greater than the near field depth of the first scene image.
  • the image fusion module is used for fusing the subject area and the second scene image to obtain the target image.
  • FIG. 1 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 5 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 6 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 7 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 8 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 9 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 10 is a flowchart of an image processing method according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the principle of an image processing method according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of distortion correction of an image processing method according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the principle of color overflow correction of the image processing method according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of connection between a computer-readable storage medium and a processor according to an embodiment of the present application.
  • an embodiment of the present application provides an image processing method, and the image processing method includes:
  • 01 Acquire a first scene image, the first scene image including a subject area, and the subject area being located in a depth-of-field area of the first scene image;
  • 03 Acquire a second scene image, the far depth of field of the second scene image being not greater than the near depth of field of the first scene image; and
  • 05 Fuse the subject area and the second scene image to obtain a target image.
  • an embodiment of the present application provides an electronic device. The electronic device includes a memory and one or more processors 320 connected to the memory, and the one or more processors 320 are used to execute the methods in 01, 03, and 05. That is, the one or more processors 320 are configured to: acquire a first scene image, where the first scene image includes a subject area and the subject area is located in the depth-of-field area of the first scene image; acquire a second scene image, where the far depth of field of the second scene image is not greater than the near depth of field of the first scene image; and fuse the subject area with the second scene image to obtain the target image.
  • the one or more processors 320 include a first ISP processor 232 and a second ISP processor 234 .
  • the memory includes an image memory 260 .
  • the electronic device further includes a first camera 210 , a second camera 220 , a control logic 250 , a display 270 , and a communication module 280 .
  • the first camera 210 includes one or more first lenses 212 and a first image sensor 214 .
  • the first scene image may be any one of a visible light image (RGB image), an infrared image (IR image), or a black and white image. In an example, the acquisition of the first scene image here may be achieved by the first camera 210.
  • when the first scene image is a visible light image, the first camera 210 is a color camera.
  • the first image sensor 214 may include a color filter array (eg, a Bayer filter); the first image sensor 214 can obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of image data that can be processed by the first ISP processor 232.
  • the first camera 210 is an infrared light camera, and correspondingly, the first image sensor 214 may include an infrared filter array.
  • the first camera 210 is a black and white camera, and correspondingly, the first image sensor 214 may not be provided with a filter array.
  • the first camera 210 collects the first scene image and stores it in the image memory 260 of the electronic device 200; in this case, acquiring the first scene image is implemented by the first ISP processor 232 and the second ISP processor 234 reading the first scene image stored in the image memory 260.
  • the first camera 210 collects the first scene image and stores it in the cloud or in another device; in this case, the first scene image may be obtained from the cloud or the other device by the communication module 280 in the electronic device 200 and then transmitted by the communication module 280 to the first ISP processor 232 and the second ISP processor 234.
  • the second camera 220 includes one or more second lenses 222 and a second image sensor 224 .
  • the second scene image may also be any one of a visible light image (RGB image), an infrared image (IR image), or a black and white image.
  • the acquisition of the second scene image here may be achieved by the second camera 220.
  • when the second scene image is a visible light image, the second camera 220 is a color camera.
  • the second image sensor 224 may include a color filter array; the second image sensor 224 can obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of image data that can be processed by the second ISP processor 234.
  • the second camera 220 is an infrared light camera, and correspondingly, the second image sensor 224 may include an infrared filter array.
  • the second camera 220 is a black and white camera, and correspondingly, the second image sensor 224 may not be provided with a filter array.
  • the second camera 220 collects the second scene image and stores it in the image memory 260 of the electronic device 200; in this case, acquiring the second scene image is implemented by the first ISP processor 232 and the second ISP processor 234 reading the second scene image stored in the image memory 260.
  • the second camera 220 collects the second scene image and stores it in the cloud or in another device; in this case, the second scene image may be obtained from the cloud or the other device by the communication module 280 in the electronic device 200 and then transmitted by the communication module 280 to the first ISP processor 232 and the second ISP processor 234.
  • the first scene image collected by the first camera 210 is transmitted to the first ISP processor 232 for processing.
  • the statistical data of the first scene image can be sent to the control logic 250.
  • the control logic 250 may determine the control parameters of the first camera 210 according to the statistical data, so that the first camera 210 may perform operations such as automatic focusing and automatic exposure according to the control parameters.
  • the first scene image can be stored in the image memory 260 after being processed by the first ISP processor 232 .
  • the first scene image can be directly sent to the display 270 for display, and the display 270 can also read the image in the image memory 260 for display.
  • the second scene image captured by the second camera 220 is transmitted to the second ISP processor 234 for processing.
  • the second ISP processor 234 can send the statistical data of the second scene image to the control logic 250.
  • the control logic 250 may determine control parameters of the second camera 220 according to the statistical data, so that the second camera 220 may perform operations such as auto-focusing, auto-exposure, and the like according to the control parameters.
  • the second scene image can be stored in the image memory 260 after being processed by the second ISP processor 234 .
  • after being processed by the second ISP processor 234, the second scene image can be directly sent to the display 270 for display, and the display 270 can also read the image in the image memory 260 for display.
  • both the first ISP processor 232 and the second ISP processor 234 can process image data pixel by pixel in various formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits
  • both the first ISP processor 232 and the second ISP processor 234 may perform one or more image processing operations on the image data and collect statistical information about the image data.
  • the image processing operations can be performed with the same or different bit depth precision.
  • Statistics determined by the first ISP processor 232 may be sent to the control logic 250 .
  • the statistical data may include statistical information of the first image sensor 214 such as automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, shading correction of the first lens 212, and the like.
  • Statistics determined by the second ISP processor 234 may also be sent to the control logic 250 .
  • the statistical data may include second image sensor 224 statistics such as automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, second lens 222 shading correction, and the like.
  • the control logic 250 may include a processor and/or a microcontroller executing one or more routines (eg, firmware) that may determine the control parameters of the first camera 210 based on the received statistical data.
  • the control parameters of the first camera 210 may include gain, integration time for exposure control, anti-shake parameters, flash control parameters, first lens 212 control parameters (eg, focal length for focusing or zooming), or a combination of these parameters, etc.
  • Control parameters for the first ISP processor 232 or the second ISP processor 234 may include gain levels and color correction matrices for automatic white balance and color adjustment (eg, during RGB processing).
  • the image memory 260 may be a part of a memory device, or may be an independent dedicated memory within a storage device or electronic device, and may include a DMA (Direct Memory Access) feature.
  • the first ISP processor 232 and the second ISP processor 234 may be the same processor, or may be two independent processors, which are not limited herein.
  • the first scene image is an image obtained by shooting a specific scene, and the subject area refers to the region of interest in the obtained first scene image.
  • the process of identifying the region of interest while selectively ignoring the uninteresting regions is the process of subject recognition.
  • the subject area refers to the region of interest in the first scene image determined according to a preset subject recognition model, a preset subject recognition neural network, a subject recognition model obtained by machine learning, or a subject recognition neural network obtained by machine learning. The subject area includes, but is not limited to, the person or object with the largest area in the first scene image, the person or object at the center of the first scene image, or the person or object of a specific color in the first scene image.
  • the subject area may be the focus area of the first scene image.
  • the subject area of the first scene image is located in the depth-of-field area of the first scene image. The depth of field refers to the range of distances before and after the focused object, measured at the front of the camera lens or other imager, within which a clear image can be obtained; that is, after the first camera module is focused, the range of distances before and after the focus within which the image is rendered clearly. It can be understood that the subject area of the first scene image being located in the depth-of-field area of the first scene image means that the sharpness of the subject area is high, eg, higher than a first predetermined sharpness.
  • the second scene image is also an image obtained by shooting for a specific scene.
  • the imaging effect of the second scene image is different from that of the first scene image, including but not limited to a difference in depth-of-field position. For example, in this embodiment, the far depth of field of the second scene image is not greater than the near depth of field of the first scene image.
  • because the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, the first scene image and the second scene image differ in parameters such as focus position, image sharpness, resolution, gray value, shooting direction, angle, and exposure.
  • the sharpness of the second scene image is relatively low, for example, lower than a second predetermined sharpness, and the second predetermined sharpness may be less than or equal to the first predetermined sharpness. Thus the visual experience presented to the user by the first scene image is a clear image, while the visual experience presented by the second scene image is a blurred image.
  • the scene targeted by the first scene image and the scene targeted by the second scene image may be exactly the same. In this case, the first camera 210 and the second camera 220 may be the same camera; that is, the first scene image and the second scene image are different frames acquired by the same camera at different times. The first scene image and the second scene image then have no parallax, and performing the fusion in 05 is faster and simpler.
  • the scene targeted by the first scene image and the scene targeted by the second scene image may have a low degree of parallax.
  • the first camera 210 and the second camera 220 may be two cameras located on the same baseline. In this case, the image of the first scene and the image of the second scene may be acquired at the same time, which can reduce the time for acquiring images.
  • in the related art, the usual practice is to first capture a scene image, then distinguish the foreground part and the background part in the scene image, and finally apply software blurring to the background part to imitate the optical bokeh effect.
  • because this blurring is obtained through software processing, the resulting optical bokeh effect is poor.
  • the image processing method and the electronic device obtain the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is located in the depth-of-field area of the first scene image, its sharpness is high and the visual experience presented to the user is a clear image; since the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, the second scene image has lower sharpness and presents a blurred image. That is, the target image combines a clear subject area with a blurred second scene image directly captured by the camera, thereby avoiding software-based background blurring, and the obtained target image has a good optical bokeh effect.
  • the image processing method further includes:
  • 07 Process the first scene image to obtain the subject area and the background mask area.
  • one or more processors 320 are further configured to perform the method in 07; that is, the one or more processors 320 are used to process the first scene image to obtain the subject area and the background mask area.
  • the body region refers to the region of interest in the acquired first scene image.
  • the body region refers to the region of interest in the first scene image determined according to a preset subject recognition model, a preset subject recognition neural network, a subject recognition model obtained by machine learning, or a subject recognition neural network obtained by machine learning; the other areas in the first scene image except the subject area can be delineated as the background mask area.
  • 07 Process the first scene image to obtain the subject area and the background mask area, including:
  • 071 Process the first scene image through the subject recognition detection network to obtain an initial foreground area and an initial background area of the first scene image;
  • 073 Acquire a color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and
  • 075 Perform color correction on the color overflow pixels to obtain the subject area and the background mask area.
  • one or more processors 320 are further configured to perform the methods in 071 , 073 , and 075 . That is, one or more processors 320 can also be used to process the first scene image through the subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image; obtain the color at the junction of the initial foreground area and the initial background area an overflow area, the color overflow area includes at least one color overflow pixel; and performing color correction on the color overflow pixel to obtain the main body area and the background mask area.
  • the subject recognition detection network may be pre-stored in the memory, or may be acquired later through machine learning. As shown in FIG. 14 , by inputting the first scene image into the subject recognition detection network, the initial foreground area FG and the initial background area BG can be output.
  • the processor may divide the junction into a plurality of sub-areas Ai, each sub-area Ai including a certain number of pixels, and obtain the sharpness of each sub-area Ai. When the sharpness of a sub-area Ai is greater than a third predetermined sharpness, the sub-area Ai does not belong to the color overflow area, as shown in Figure 13(2); when the sharpness of a sub-area Ai is less than the third predetermined sharpness, the sub-area Ai belongs to the color overflow area, as shown in Figure 13(1).
  • the color overflow area includes at least one color overflow pixel
  • the third predetermined definition is smaller than the first predetermined definition.
  • acquiring the sharpness of each sub-area Ai may specifically include: first obtaining the ratio of the number of pixels of high-frequency information in the sub-area Ai to all the pixels of the entire first scene image, and using this ratio to characterize the sharpness of the sub-area Ai; the higher the ratio, the higher the image sharpness.
  • the first scene image is first processed by shaping low-pass filtering to obtain a filtered image.
  • high-frequency information is obtained according to the first scene image and the filtered image.
  • the high-frequency information can be obtained by subtracting the filtered image from the first scene image.
  • the high-frequency information is a part of the discrete cosine transform coefficient far from zero frequency, and this part is used to describe the detailed information of the first scene image.
  • the proportion of the number of pixels of high frequency information of the sub-region Ai in all the pixels of the first scene image is counted. For example, if the number of high-frequency information pixels in the sub-region Ai accounts for 10% of all the pixels in the first scene image, the ratio of 10% is used to represent the sharpness of the sub-region Ai.
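  • As an illustration, the ratio-based sharpness measure described above might be sketched as follows; the box low-pass filter, the hf_threshold parameter, and the function name are assumptions for illustration rather than the embodiment's exact "shaping low-pass filtering":

```python
import numpy as np
from scipy.ndimage import uniform_filter

def subregion_sharpness(image: np.ndarray, region_mask: np.ndarray,
                        kernel_size: int = 5, hf_threshold: float = 10.0) -> float:
    """Characterize the sharpness of a sub-area Ai as the ratio of its
    high-frequency pixels to all pixels of the first scene image."""
    # Low-pass filter the whole scene image, then subtract the filtered
    # image from the original to isolate the high-frequency information.
    lowpass = uniform_filter(image.astype(np.float32), size=kernel_size)
    highfreq = np.abs(image.astype(np.float32) - lowpass)
    # A pixel counts as high-frequency information when its residual
    # exceeds a threshold (hf_threshold is an assumed tuning parameter).
    hf_pixels_in_region = np.count_nonzero((highfreq > hf_threshold) & region_mask)
    return hf_pixels_in_region / image.size  # e.g. 0.10 means "10%" sharpness
```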
  • the image processing method and electronic device obtain the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is obtained after color overflow correction, its sharpness is further improved, so the sharpness of the subject area in the obtained target image is also high, which contrasts sharply with the blurred second scene image and further improves the optical bokeh effect of the target image.
  • 071 Process the first scene image through a subject recognition detection network to obtain an initial foreground area and an initial background area of the first scene image, including:
  • 0711 Acquire depth information of the first scene image;
  • 0713 Generate a center weight map corresponding to the first scene image according to the depth information, the weight values represented by the center weight map gradually decreasing from the center to the edge;
  • 0715 Input the first scene image and the center weight map into the subject recognition detection network to obtain a confidence map of the foreground subject area of the first scene image; and
  • 0717 Determine the initial foreground area and the initial background area according to the confidence map of the foreground subject area.
  • one or more processors 320 are further configured to perform the methods in 0711 , 0713 , 0715 , and 0717 . That is, the one or more processors 320 may be further configured to: acquire depth information of the first scene image; generate a center weight map corresponding to the first scene image according to the depth information, and the weight values represented by the center weight map are from the center to the edge Gradually decrease; input the first scene image and the center weight map into the subject recognition detection network to obtain the confidence map of the foreground subject area of the first scene image; and determine the initial foreground area and the initial background area according to the confidence map of the foreground subject area .
  • the first scene image is processed through the subject recognition detection network. Specifically, the depth information of the first scene image may be obtained first, and then a center weight map corresponding to the first scene image may be generated according to the depth information of the first scene image.
  • the weight value represented by the center weight map gradually decreases from the center to the edge, which is conducive to highlighting the subject in the center position, and also conforms to the operating habits of general terminals or shooting users.
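  • A minimal sketch of such a center weight map is shown below; the Gaussian falloff and the sigma parameter are illustrative assumptions, since the embodiment derives the map from depth information and does not fix a specific weight profile:

```python
import numpy as np

def center_weight_map(height: int, width: int, sigma: float = 0.5) -> np.ndarray:
    """Generate a weight map whose values gradually decrease from the
    image center to the edges (a Gaussian falloff is one common choice)."""
    ys = np.linspace(-1.0, 1.0, height)[:, None]
    xs = np.linspace(-1.0, 1.0, width)[None, :]
    dist_sq = ys ** 2 + xs ** 2                   # squared distance from center
    return np.exp(-dist_sq / (2.0 * sigma ** 2))  # 1.0 at center, falls toward edges
```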
  • the first scene image and the center weight map are input into the subject recognition detection network to obtain a confidence map of the subject area of the first scene image. There may be some points with low confidence or scattered points in the confidence map.
  • the confidence map can be filtered and corrected by the ISP processor or the central processing unit, so as to obtain the initial foreground area and the initial background area.
  • the filtering process may employ a configured confidence threshold, and filter pixels whose confidence values are lower than the confidence threshold in the confidence map.
  • the confidence threshold can be an adaptive confidence threshold, a fixed threshold, or a corresponding threshold can be configured by region.
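  • A possible sketch of this confidence-map filtering, assuming the mean of the map as the adaptive threshold (the embodiment leaves the threshold choice open):

```python
import numpy as np

def filter_confidence_map(conf: np.ndarray,
                          threshold: float | None = None
                          ) -> tuple[np.ndarray, np.ndarray]:
    """Filter out low-confidence / scattered points in a confidence map.
    If no fixed threshold is configured, fall back to an adaptive one
    (here: the map's mean, an assumed heuristic)."""
    if threshold is None:
        threshold = float(conf.mean())  # adaptive confidence threshold
    mask = conf >= threshold
    initial_foreground = mask            # pixels kept as the foreground subject
    initial_background = ~mask           # everything else
    return initial_foreground, initial_background
```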
  • the preset detection model of the subject recognition detection network is obtained by collecting a large amount of training data in advance, and inputting the training data into the subject detection model including the initial network weights for training.
  • Each set of training data includes the visible light map, depth map, center weight map and annotated subject mask map corresponding to the same scene.
  • the visible light map and the center weight map are used as the input of the trained subject detection model, and the annotated subject mask map is used as the actual value that the trained subject detection model expects to output.
  • the subject mask map is an image filter template used to identify the subject in the image, which can block other parts of the image and filter out the subject in the image.
  • the subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
  • the preset detection model of the subject recognition detection network is obtained by training according to the center weight map corresponding to the first scene image and the depth information of the first scene image.
  • the depth map and the center weight map are used as the input of the preset detection model.
  • the depth information of the depth map can be used to make objects closer to the camera easier to detect, and the center attention mechanism of the center weight map (large weight at the center, small weight at the edges) makes objects at the center of the image easier to detect. Introducing the depth map enhances the depth feature of the subject, and introducing the center weight map enhances the central attention feature of the subject; this not only accurately identifies the target subject in a simple scene (a scene with a single subject and low contrast in the background area), but also greatly improves the accuracy of subject recognition in more complex scenes. In addition, introducing depth maps mitigates the poor robustness of traditional target detection methods to the ever-changing targets of natural images.
  • the above image processing method further includes: when there are multiple subjects, determining the target subject according to at least one of the priority of the category to which each subject belongs, the area occupied by each subject, and the position of each subject.
  • the depth information of the first scene image may be acquired through a binocular vision system;
  • the depth information of the first scene image may be acquired through a monocular vision system;
  • the depth information of the first scene image may be acquired through a structured light camera module; or
  • the depth information of the first scene image may be acquired through a time-of-flight camera module.
  • the binocular disparity value is obtained through a binocular vision system to obtain the depth information of the first scene image. Specifically, the first depth image and the second depth image corresponding to the first scene image are obtained through the first camera and the second camera of the binocular vision system, respectively, so as to obtain the relative geometric position relationship between pixels of the first depth image and pixels of the second depth image. Specifically, the first camera and the second camera capture images of a checkerboard calibration board from multiple angles, and the internal parameters, external parameters, and distortion coefficients of the first camera and the second camera, as well as the geometrical relationship between the two cameras, are calculated.
  • the first camera and the second camera are two independent cameras with the same performance index (same optical lens and image sensor), and their optical axes are parallel to each other and on the same baseline.
  • the baseline between the first camera and the second camera can be adjusted according to different requirements, and two cameras with different focal lengths or models can be used to fulfill different functions; the two cameras can be placed horizontally or vertically, and the baseline distance can also be adjusted as required.
  • the first camera and the second camera can be color cameras of the same model, or a color camera and a black-and-white camera; they can also be a high-resolution zoom color camera and a low-resolution color camera, or a color camera with OIS optical image stabilization and a fixed-focus color camera, etc., without the above-mentioned restrictions.
  • the binocular disparity value between the corresponding pixels of the first depth image and the second depth image is determined, so as to obtain the coordinates of the corresponding pixels of the first depth image and the second depth image, and further obtain a sparse disparity map.
  • for the sparse disparity map, starting from its upper left corner, the calculation is performed line by line from left to right, top to bottom, pixel by pixel. If a pixel is a reference disparity point, it is skipped; if a pixel is not a reference disparity point, the reference disparity point closest to it is selected as a reference, and the disparity of the pixel is calculated.
  • centered at the point in the right camera image corresponding to the reference disparity point, a search window is extracted, and the disparity of the pixel is calculated by block matching. After the calculation for every pixel is completed, the disparity value of the entire image is obtained, and finally the depth information of the first scene image is obtained.
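  • The following is a didactic block-matching sketch of the disparity-to-depth conversion (Z = f·B/d); the exhaustive per-pixel search stands in for the sparse-disparity propagation described above, and window and max_disp are assumed tuning parameters:

```python
import numpy as np

def block_matching_depth(left: np.ndarray, right: np.ndarray,
                         focal_px: float, baseline_m: float,
                         window: int = 5, max_disp: int = 64) -> np.ndarray:
    """For each pixel of the left image, search along the same row in the
    right image for the disparity with the lowest sum of absolute
    differences, then convert disparity to depth via Z = f*B/d."""
    h, w = left.shape
    half = window // 2
    depth = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
            best_d, best_cost = 0, np.inf
            for d in range(max_disp):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.float32)
                cost = np.abs(patch - cand).sum()   # SAD block-matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            if best_d > 0:
                depth[y, x] = focal_px * baseline_m / best_d  # Z = f*B/d
    return depth
```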
  • the depth information of the first scene image can also be obtained through algorithmic calculation by a monocular vision system.
  • at least two frames of images are acquired through the camera of the monocular vision system, and depth information of the first scene image is acquired through a preset depth prediction model.
  • using methods such as superpixels, the acquired image can be pre-divided into multiple blocks under the assumption that the depth values within each image block are the same; absolute depth features and relative depth features are then selected respectively.
  • the depth information of the first scene image is acquired through the structured light camera module; that is, the first scene image is acquired based on the structured light image sensor in the structured light camera module, and the first scene image is a structured light image.
  • the structured light image sensor may include a laser light and a laser camera. Pulse Width Modulation (PWM) can modulate the laser light to emit structured light; the structured light irradiates the imaging object, and the laser camera captures the structured light reflected by the imaging object to obtain the first scene image.
  • the depth engine can calculate and obtain the corresponding depth information according to the first scene image.
  • the depth engine demodulates the phase information corresponding to the pixels at deformed positions in the first scene image, converts the phase information into height information, and determines the corresponding depth information according to the height information.
  • a time of flight (ToF) camera module is used to obtain depth information of a first scene image, where the first scene image is a time of flight depth image.
  • the laser transmitter in the ToF camera module is controlled to turn on and emit laser light to the scene; at the same time, the timing circuit of each photosensitive pixel of the image sensor in the ToF camera module is controlled to start counting. The emitted laser is reflected back by objects in the scene and received by the image collector.
  • the avalanche photodiode in each photosensitive pixel of the image collector works in Geiger mode (the reverse bias voltage is higher than the avalanche voltage), so an avalanche occurs when a single photon is absorbed and the output current instantaneously (in less than 1 ps) reaches its maximum value; this is fed back to the independent timing circuit of each photosensitive pixel to stop the counting. From the count value of each timing circuit and the speed of light, the depth of each pixel in the time-of-flight depth image is calculated.
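  • The per-pixel depth computation reduces to multiplying the counted round-trip time by the speed of light and halving it; a minimal sketch (tick_seconds, the timing-circuit resolution, is an assumed parameter):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_pixel_depth(count: int, tick_seconds: float) -> float:
    """Convert a photosensitive pixel's timing-circuit count into depth.
    The counted round-trip time times the speed of light gives twice
    the distance, so divide by two."""
    round_trip_time = count * tick_seconds   # seconds until the avalanche stopped the counter
    return SPEED_OF_LIGHT * round_trip_time / 2.0
```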
  • 075 Perform color correction on color overflow pixels to obtain a subject area and a background mask area, including:
  • 07511 Extend a predetermined range with each color overflow pixel as the center to obtain a correction area, the correction area including pixels of the foreground area and pixels of the background area;
  • 07513 Acquire the pixel values of the foreground area pixels in the correction area;
  • 07515 Correct the color overflow pixels according to the pixel values of the foreground area pixels in the correction area to obtain corrected pixels; and
  • 07517 Merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
  • one or more processors 320 are further configured to perform the methods in 07511, 07513, 07515, and 07517. That is, the one or more processors 320 can also be used to: extend a predetermined range around each color overflow pixel to obtain a correction area, the correction area including pixels of the foreground area and pixels of the background area; obtain the pixel values of the foreground area pixels in the correction area; correct the color overflow pixels according to the pixel values of the foreground area pixels in the correction area to obtain corrected pixels; and merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
  • the area enclosed by the two black curves in FIG. 14 represents the color overflow area A1 .
  • the color overflow area A1 includes a plurality of color overflow pixels. Assuming that a color overflow pixel in the color overflow area A1 is P, the first ISP processor can take the color overflow pixel P as the center and expand a predetermined range around it to obtain the correction area A2. The first ISP processor may extend the predetermined range according to a predetermined shape, where the predetermined shape can be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, etc.; that is, the correction area A2 obtained after expanding the predetermined range may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, etc., which is not limited herein.
  • the correction area A2 also includes three types of pixels: foreground area pixels, background area pixels, and color overflow pixels P.
  • the first ISP processor may select all foreground area pixels in the correction area A2 and use the pixel values of all foreground area pixels to correct the pixel value Pc of the color overflow pixel P. For example, assuming there are N foreground area pixels and the pixel value of each foreground area pixel is Pi (i ≤ N, i a positive integer), then Pc = (P1 + P2 + ... + PN) / N. In another example, the first ISP processor may select only the part of the foreground area pixels in the correction area A2 whose spatial distance to the color overflow pixel P is less than or equal to a predetermined distance.
  • the first ISP then corrects the pixel value Pc of the color overflow pixel P using the pixel values of the selected foreground area pixels. For example, assuming there are N foreground area pixels, the first ISP processor may select M of them (M ≤ N) such that, for each selected foreground area pixel, the spatial distance between its coordinates (x, y) and the coordinates (u, v) of the color overflow pixel P is less than or equal to the predetermined distance D, that is, sqrt((x − u)² + (y − v)²) ≤ D. If the pixel value of each of the M foreground area pixels is Pi (i ≤ M, i a positive integer), then Pc = (P1 + P2 + ... + PM) / M. Compared with using the pixel values of all foreground area pixels in the correction area A2, using only the pixel values of the foreground area pixels closer to the color overflow pixel can, on the one hand, eliminate the color overflow phenomenon and, on the other hand, make the pixel values of the corrected pixels more accurate.
  • in addition, compared with using the pixel values of all foreground area pixels in the correction area A2 to correct the pixel values of the color overflow pixels, the calculation amount of the first ISP processor can be reduced.
  • the first ISP processor may merge the corrected pixels into the foreground region.
  • the first ISP processor may use the manner shown in FIG. 6 to traverse all the color overflow pixels in the correction area to obtain a plurality of corrected pixels.
  • a plurality of corrected pixels are merged into the initial foreground area, so as to obtain the updated initial foreground area, that is, the subject area.
  • the area other than the subject area in the first scene image is the background mask area.
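  • A sketch of the correction of a single color overflow pixel, assuming a square correction area and a Euclidean predetermined distance; radius and max_dist are illustrative stand-ins for the embodiment's predetermined range and predetermined distance D:

```python
import numpy as np

def correct_color_overflow(image: np.ndarray, overflow_yx: tuple[int, int],
                           fg_mask: np.ndarray, radius: int = 3,
                           max_dist: float = 2.0) -> np.ndarray:
    """Correct one color overflow pixel P: expand a correction area around
    it, collect the foreground area pixels within distance D of P, and
    replace P's value with their mean, Pc = (P1 + ... + PM) / M."""
    y, x = overflow_yx
    h, w = image.shape[:2]
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    samples = []
    for yy in range(y0, y1):
        for xx in range(x0, x1):
            # keep only foreground area pixels close enough to P
            if fg_mask[yy, xx] and np.hypot(yy - y, xx - x) <= max_dist:
                samples.append(image[yy, xx].astype(np.float32))
    if samples:
        image[y, x] = np.mean(samples, axis=0).astype(image.dtype)
    return image
```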
  • alternatively, 075 Perform color correction on color overflow pixels to obtain a subject area and a background mask area, including:
  • 07521 Extend a predetermined range with each color overflow pixel as the center to obtain a correction area, the correction area including pixels of the foreground area and pixels of the background area;
  • 07523 Acquire the pixel values of the background area pixels in the correction area;
  • 07525 Correct the color overflow pixels according to the pixel values of the background area pixels in the correction area to obtain corrected pixels; and
  • 07527 Merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
  • one or more processors 320 are also used to perform the methods in 07521, 07523, 07525, and 07527. That is, the one or more processors 320 can also be used to: extend a predetermined range around each color overflow pixel to obtain a correction area, the correction area including pixels of the foreground area and pixels of the background area; obtain the pixel values of the background area pixels in the correction area; correct the color overflow pixels according to the pixel values of the background area pixels in the correction area to obtain corrected pixels; and merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
  • the color overflow area A1 includes a plurality of color overflow pixels. Assuming that a color overflow pixel in the color overflow area A1 is P, the first ISP processor can take the color overflow pixel P as the center and expand a predetermined range around it to obtain the correction area A2. The first ISP processor may extend the predetermined range according to a predetermined shape, where the predetermined shape may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, etc.
  • the correction area also includes three types of pixels: foreground area pixels, background area pixels, and color overflow pixels P.
  • the first ISP processor may select all background area pixels in the correction area A2, and use the pixel values of all background area pixels to correct the pixel value Pc of the color overflow pixel P.
  • the first ISP processor may select a part of the background area pixels located in the correction area A2 such that, among the selected background area pixels, the spatial distance between each background area pixel and the color overflow pixel P is less than or equal to the predetermined distance. Then, the first ISP corrects the pixel value Pc of the color overflow pixel P using the pixel values of the selected background area pixels.
  • for example, assuming there are N background area pixels, the first ISP processor may select M of them (M ≤ N) such that, for each selected background area pixel, the spatial distance between its coordinates (x, y) and the coordinates (u, v) of the color overflow pixel P is less than or equal to the predetermined distance D, that is, sqrt((x − u)² + (y − v)²) ≤ D. If the pixel value of each of the M background area pixels is Pi (i ≤ M, i a positive integer), then Pc = (P1 + P2 + ... + PM) / M. Compared with using the pixel values of all background area pixels in the correction area A2 to correct the pixel values of the color overflow pixels, using only the pixel values of the background area pixels closer to the color overflow pixels can, on the one hand, eliminate the color overflow phenomenon and, on the other hand, make the pixel values of the corrected pixels more accurate.
  • in addition, compared with using the pixel values of all background area pixels in the correction area A2 to correct the pixel values of the color overflow pixels, the calculation amount of the first ISP processor can be reduced.
  • the first ISP processor may merge the corrected pixels into the background area.
  • the first ISP processor may use the manner shown in FIG. 7 to traverse all the color overflow pixels in the correction area to obtain a plurality of corrected pixels.
  • a plurality of corrected pixels are merged into the initial background area, so as to obtain the updated initial background area, that is, the background mask area.
  • the area other than the background mask area in the first scene image is the main area.
  • the second scene image is an optically blurred image
  • the sharpness of the first scene image is higher than that of the second scene image.
  • 05 Fuse the subject area and the second scene image to obtain the target image, including:
  • 051 Take the subject area as the target subject area;
  • 053 Fuse the background mask area and the background area in the second scene image to obtain the target background area; and
  • 055 Acquire the target image according to the target subject area and the target background area.
  • one or more processors 320 are further configured to perform the methods in 051, 053, and 055. That is, the one or more processors 320 may also be used to: take the subject area as the target subject area; fuse the background mask area and the background area in the second scene image to obtain the target background area; and obtain the target image according to the target subject area and the target background area.
  • the second scene image is, for example, an image obtained by macro shooting, in which at least part of the subject area and all of the background area are blurred, so that the second scene image has an optical defocus effect and the sharpness of the second scene image is lower than that of the first scene image.
  • any one of the first ISP processor or the second ISP processor may use the subject area in the first scene image as the target subject area.
  • any one of the first ISP processor or the second ISP processor can fuse the background mask area and the background area in the second scene image to obtain the target background area.
  • assume that the pixel value of a background area pixel in the background mask area of the first scene image is Pi, and the pixel value of the corresponding background area pixel in the background area of the second scene image is Pi′, where the position of the pixel with value Pi in the first scene image corresponds to the position of the pixel with value Pi′ in the second scene image; the pixel value of the corresponding pixel in the target background area can then be obtained as a weighted sum of the two, P = a·Pi + b·Pi′.
  • the image processing method of the embodiment of the present application uses the subject area as the target subject area. Since the first scene image has high definition, after the subject area is used as the target subject area, the target subject area can also have high definition.
  • the image processing method of the embodiment of the present application fuses the background mask area and the background area in the second scene image to obtain the target background area, and can blur the target background image with the help of the optical defocus effect of the second scene image, which can improve the Bokeh effect for the target background image.
  • the weights a and b may be determined according to the depths corresponding to the background area pixel with pixel value Pi and the background area pixel with pixel value Pi′: a can be set greater than b to retain more of the first scene image, or b can be set greater than a to strengthen the optical blur contributed by the second scene image.
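  • A sketch of this weighted background fusion; the per-pixel rule P = a·Pi + b·Pi′ and the example weights are assumptions reconstructed from the description of a and b above:

```python
import numpy as np

def fuse_background(bg_mask_area: np.ndarray, bg_second: np.ndarray,
                    a: float = 0.3, b: float = 0.7) -> np.ndarray:
    """Fuse the background mask area of the first scene image (pixel
    values Pi) with the background area of the second scene image
    (pixel values Pi') as a per-pixel weighted sum P = a*Pi + b*Pi'.
    Setting b > a, as here, favors the optically blurred second image."""
    fused = a * bg_mask_area.astype(np.float32) + b * bg_second.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```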
  • the image processing method further includes:
  • 09 When it is detected that there is image distortion in the second scene image, obtain a transformation matrix corresponding to the second scene image; and
  • 011 Perform correction processing on the second scene image according to the transformation matrix to obtain a corrected image.
  • fusing the subject area and the second scene image to obtain the target image may then include:
  • 057 Fuse the subject area with the corrected image to obtain the target image.
  • one or more processors 320 are further configured to perform the methods in 09 , 011 and 057 . That is, the one or more processors 320 are configured to: obtain a transformation matrix corresponding to the second scene image when it is detected that there is image distortion in the second scene image; perform correction processing on the second scene image according to the transformation matrix to obtain a corrected image; And fuse the subject area with the corrected image to obtain the target image.
  • the image distortion may be distortion of one or more parameters of the image, such as pixel value, chromaticity, depth value, or exposure.
  • the content difference of the second scene image can be obtained, and the content difference can be compared with a preset threshold to determine whether there is distortion in a specific parameter.
  • the content difference may be the difference between parameters of two adjacent pixel blocks (a pixel block may include one or more pixels), or between pixel blocks at corresponding positions in different frame images acquired for the same object.
  • the content values of the same parameter of a first pixel block, a second pixel block, and a third pixel block are obtained, where the parameters include but are not limited to gray value, chromaticity, and depth value. The first pixel block and the third pixel block are pixel blocks adjacent to the second pixel block, the second pixel block is located between the first pixel block and the third pixel block, and each pixel of the first, second, and third pixel blocks contains a depth value (ie, the content value is the pixel value).
  • the first content mean value of the first pixel block and the second content mean value of the second pixel block may be determined, and the content difference between the first content mean value and the second content mean value may be determined. Understandably, if the content difference between the first pixel block and the second pixel block is low (for example, less than a preset threshold), it means that the first pixel block and the second pixel block actually correspond to the same position on the photographed object.
  • the error of the depth value between the first pixel block and the second pixel block is small, and the distortion of the first pixel block and the second pixel block is small.
  • similarly, the content difference between the second pixel block and the third pixel block, and the content difference between the first pixel block and the third pixel block, can be determined and compared with the preset threshold. If a content difference is greater than the preset threshold, there is image distortion; if the content difference is less than the preset threshold, there is no image distortion.
  • when the above-mentioned first, second, or third pixel blocks are said to be adjacent, this can also mean that the three pixel blocks are located in three adjacent frames of images and that each pixel block occupies the same position in its corresponding image as the other two pixel blocks do in theirs; which interpretation applies is determined by the specific application requirements.
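  • A sketch of the block-wise distortion check between two adjacent frames, assuming gray value as the content parameter; the block size and threshold are illustrative:

```python
import numpy as np

def has_distortion(frame_a: np.ndarray, frame_b: np.ndarray,
                   block: int = 8, threshold: float = 12.0) -> bool:
    """Compare the mean content value (here: gray value) of pixel blocks
    at corresponding positions in adjacent frames; a difference above the
    preset threshold is treated as image distortion."""
    h, w = frame_a.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            mean_a = frame_a[y:y + block, x:x + block].mean()  # first content mean
            mean_b = frame_b[y:y + block, x:x + block].mean()  # second content mean
            if abs(mean_a - mean_b) > threshold:
                return True   # content difference exceeds the preset threshold
    return False
```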
  • FIG. 12 is a schematic diagram illustrating a comparison between an image with distortion and an image without distortion.
  • Figure 12 shows the images of the calibration plate acquired by the two cameras.
  • (1) in Figure 12 is the image of the calibration plate without image distortion.
  • a series of points in the image are regularly arranged in the horizontal and vertical directions with the same spacing, and the shape of each square is normal.
  • (2) in FIG. 12 is an image of the calibration plate with image distortion: the points on the calibration plate are no longer arranged neatly and regularly in the horizontal and vertical directions, and the shape of the squares has changed.
  • a transformation matrix corresponding to the second scene image may be obtained.
  • the transformation matrix corresponding to the second scene image may be determined according to the focus segment used by the camera when capturing the second scene image. If the focus segment of the camera is F1 to F2, the transformation matrix corresponding to the second scene image is matrix1; if the focus segment is F2 to F3, the transformation matrix is matrix2; if the focus segment is F3 to F4, the transformation matrix is matrix3; ...; if the focus segment is F(n−1) to Fn, the transformation matrix corresponding to the second scene image is matrix(n−1).
  • the corresponding relationship between focus segments and transformation matrices is pre-calibrated and stored in the image memory shown in FIG. 2 . It can be understood that when the focus segment of the camera differs, the distortion form of the obtained image also differs; selecting the transformation matrix corresponding to the camera's focus segment corrects the distortion more accurately and yields a corrected image with a better distortion-correction effect.
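  • A sketch of selecting the pre-calibrated transformation matrix by focus segment and applying it with OpenCV; the segment boundaries and the identity placeholder matrices are assumptions standing in for the calibrated values:

```python
import cv2
import numpy as np

# Pre-calibrated mapping from focus segment to correction matrix; the
# boundaries F1..F4 and the identity matrices below are placeholders.
FOCUS_SEGMENT_MATRICES = [
    ((1.0, 2.0), np.eye(3, dtype=np.float32)),   # F1..F2 -> matrix1
    ((2.0, 3.0), np.eye(3, dtype=np.float32)),   # F2..F3 -> matrix2
    ((3.0, 4.0), np.eye(3, dtype=np.float32)),   # F3..F4 -> matrix3
]

def correct_distortion(image: np.ndarray, focus: float) -> np.ndarray:
    """Select the transformation matrix calibrated for the camera's
    focus segment and apply it to the second scene image."""
    for (lo, hi), matrix in FOCUS_SEGMENT_MATRICES:
        if lo <= focus < hi:
            h, w = image.shape[:2]
            return cv2.warpPerspective(image, matrix, (w, h))
    return image  # no matching segment: leave the image unchanged
```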
  • the method of distorted image correction is not limited to the above-mentioned processing through the corresponding transformation matrix, and the distortion correction can also be performed by means of edge erosion or morphological processing.
  • morphological processing can include erosion and dilation.
  • the second scene image may be eroded first, then dilated, and the morphologically processed binarized mask image may then be subjected to guided filtering to implement edge filtering and obtain the corrected image.
  • through morphological processing and guided filtering, the noise in the edge part of the corrected image can be reduced or even eliminated, and the edges of the corrected image are softer.
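  • A sketch of this erosion-dilation-then-guided-filtering pipeline using OpenCV; cv2.ximgproc.guidedFilter requires the opencv-contrib package, and the kernel size, radius, and eps values are illustrative assumptions:

```python
import cv2
import numpy as np

def soften_mask_edges(mask: np.ndarray, guide: np.ndarray) -> np.ndarray:
    """Erode then dilate the binarized mask image, then edge-filter it
    with a guided filter using the image itself as guidance."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    eroded = cv2.erode(mask, kernel)     # erosion first ...
    opened = cv2.dilate(eroded, kernel)  # ... then dilation
    # Guided filtering softens the mask edge and suppresses edge noise.
    return cv2.ximgproc.guidedFilter(guide=guide, src=opened,
                                     radius=8, eps=1e-2)
```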
  • the subject area of the first scene image can be fused with the corrected image to obtain the target image. Since the target image is obtained by fusing the subject area of the first scene image and the corrected second scene image, there is no distortion in the target image, and the image quality is higher.
  • the first scene image and the second scene image are acquired by different cameras, and the image processing method may further include:
  • 013 Acquire at least one first feature point in the first scene image;
  • 015 Acquire at least one second feature point in the second scene image;
  • 017 Match the first feature points and the second feature points to obtain at least one feature point pair;
  • 019 Determine a mapping matrix according to the feature point pairs; and
  • 021 Align the first scene image and the second scene image according to the mapping matrix.
  • one or more processors 320 are further configured to execute the methods in 013, 015, 017, 019, and 021. That is, the one or more processors 320 may be further configured to: acquire at least one first feature point in the first scene image; acquire at least one second feature point in the second scene image; match the first feature points with the second feature points to obtain at least one feature point pair; determine a mapping matrix according to the feature point pairs; and align the first scene image and the second scene image according to the mapping matrix.
  • when the first scene image and the second scene image are acquired by different cameras, for example, the first scene image is acquired by the first camera 210 shown in FIG. 2 and the second scene image is acquired by the second camera 220 shown in FIG. 2 .
  • the electronic device shown in FIG. 2 may further include a third camera (not shown), and the third camera may include a third lens and a third image sensor.
  • alternatively, the first scene image may be obtained by, for example, the first camera 210, and the second scene image by, for example, the third camera, in which case the second camera 220 may be used to form a binocular stereo vision system with the first camera 210 to obtain depth information.
  • the first feature points in the first scene image and the second feature points in the second scene image can be identified; the number of first feature points can be one or more, and the number of second feature points may also be one or more.
  • the first feature point and the second feature point may be matched to obtain one or more feature point pairs.
  • the first feature point and the second feature point in each pair of feature points indicate the same position in the subject.
  • the mapping matrix of the first camera and the third camera can be determined according to one or more feature point pairs, and the first scene image and the second scene image can be aligned according to the mapping matrix.
  • when the first scene image and the second scene image are obtained by different cameras, the fields of view of the different cameras do not completely overlap, resulting in overlapping areas and non-overlapping areas between the first scene image and the second scene image.
  • the first scene image and the second scene image may be aligned to obtain the aligned first scene image and the aligned second scene image, and the aligned first scene image and the aligned second scene image are completely overlapping. In this way, the target image obtained by fusing the first scene image and the second scene image may have better image quality.
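  • A sketch of the feature-point alignment using ORB features and a RANSAC-estimated homography as the mapping matrix; the embodiment does not fix a particular feature detector, so ORB is an illustrative choice:

```python
import cv2
import numpy as np

def align_images(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Detect feature points in both scene images, match them into
    feature point pairs, estimate the mapping (homography) matrix, and
    warp the second scene image onto the first."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(first, None)    # first feature points
    kp2, des2 = orb.detectAndCompute(second, None)   # second feature points
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects feature point pairs that do not indicate the same
    # physical position on the subject.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = first.shape[:2]
    return cv2.warpPerspective(second, H, (w, h))
```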
  • in this case, step 07 is to process the aligned first scene image;
  • step 09 is to acquire the transformation matrix corresponding to the aligned second scene image when it is detected that the aligned second scene image has image distortion;
  • step 011 is to perform correction processing on the aligned second scene image to obtain a corrected image; and
  • step 05 is to fuse the aligned and color-corrected subject area of the first scene image with the aligned and distortion-corrected second scene image to obtain the target image.
  • alternatively, the alignment of the first scene image and the second scene image may be performed after step 07 and before step 09 and step 011. In this case, aligning the first scene image and the second scene image means aligning the color-overflow-corrected first scene image with the second scene image, after which distortion correction is performed on the aligned second scene image; step 05 then fuses the aligned and color-corrected subject area of the first scene image with the aligned and distortion-corrected second scene image to obtain the target image.
  • the first scene image and the second scene image can be acquired at the same time, so that there is no time difference between the acquisition of the two frames of images, which avoids the ghosting problem that occurs during fusion when there is a time difference between the acquisition times of the two frames.
  • the first scene image and the second scene image may also be acquired by the same camera (e.g., the first camera 210 or the second camera 220 shown in FIG. 2) in a time-sharing manner.
  • when the same camera is used to acquire the first scene image and the second scene image, the two frames completely overlap; in this case, alignment processing is not required, which reduces the amount of computation.
  • the present application further provides an image processing apparatus 150 .
  • the image processing apparatus 150 includes a first acquisition module 1510 , a second acquisition module 1512 , and an image fusion module 1514 .
  • the first acquisition module 1510 is configured to acquire a first scene image, where the first scene image includes a subject area, and the subject area is located in a depth-of-field area of the first scene image.
  • the second acquisition module 1512 is configured to acquire a second scene image, where the far-field depth of the second scene image is not greater than the near-field depth of the first scene image.
  • the image fusion module 1514 is used to fuse the subject area and the second scene image to obtain the target image.
  • the image processing apparatus 150 of the embodiment of the present application obtains the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is located in the depth-of-field area of the first scene image, its definition is high and the visual impression presented to the user is a clear image; since the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, the second scene image has lower definition and the visual impression presented to the user is a blurred image. That is, a clear subject area is combined with a blurred second scene image captured directly by the camera, which avoids relying on software algorithm processing to achieve background blur, so the resulting target image has a good optical blur effect; a sketch of such mask-based fusion follows.
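As an illustration of the fusion itself, the following minimal sketch assumes a subject mask has already been produced by the segmentation described below; the alpha-blending formula is our own simplification, not a formula prescribed by the application:

```python
# A minimal fusion sketch, assuming NumPy and 8-bit BGR/RGB inputs.
import numpy as np

def fuse(first_scene: np.ndarray, second_scene: np.ndarray,
         subject_mask: np.ndarray) -> np.ndarray:
    """Keep the sharp subject area from first_scene; take everything else
    from the optically blurred second_scene. subject_mask is in [0, 1],
    with 1.0 inside the subject area."""
    alpha = subject_mask.astype(np.float32)[..., None]  # broadcast over channels
    target = alpha * first_scene.astype(np.float32) \
           + (1.0 - alpha) * second_scene.astype(np.float32)
    return np.clip(target, 0.0, 255.0).astype(np.uint8)
```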
  • the image processing apparatus 150 may further include a processing module 1516, and the processing module 1516 is configured to process the first scene image to obtain the subject area and the background mask area. More specifically, the processing module 1516 can also be used to: process the first scene image through the subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image; obtain the color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and perform color correction on the color overflow pixels to obtain the subject area and the background mask area.
  • the processing module 1516 can also be used to: obtain the depth information of the first scene image; generate, according to the depth information, a center weight map corresponding to the first scene image, the weight values represented by the center weight map gradually decreasing from the center to the edge; input the first scene image and the center weight map into the subject recognition detection network to obtain the confidence map of the foreground subject area of the first scene image; and determine the initial foreground area and the initial background area according to the confidence map of the foreground subject area.
  • the processing module 1516 can also be used to: expand a predetermined range around each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the foreground area pixels in the correction area; correct the color overflow pixels according to the pixel values of the foreground area pixels in the correction area to obtain corrected pixels; and merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
  • alternatively, the processing module 1516 can also be used to: expand a predetermined range around each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the background area pixels in the correction area; correct the color overflow pixels according to the pixel values of the background area pixels in the correction area to obtain corrected pixels; and merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
  • the image fusion module 1514 can also be used to: use the subject area as the target subject area; fuse the background mask area and the background area in the second scene image to obtain the target background area; and acquire the target image according to the target subject area and the target background area.
  • the image processing apparatus 150 may further include a third acquisition module 1518 and a fourth acquisition module 1520 .
  • the third obtaining module 1518 is configured to obtain a transformation matrix corresponding to the second scene image when it is detected that there is image distortion in the second scene image.
  • the fourth obtaining module 1520 is configured to perform correction processing on the second scene image according to the transformation matrix to obtain a corrected image.
  • the image fusion module 1514 can also be used to fuse the subject area and the corrected image to obtain the target image.
  • the image processing apparatus 150 may further include a fifth acquisition module 1522 , a sixth acquisition module 1524 , a matching module 1526 , a determination module 1528 and an alignment module 1530 .
  • the fifth acquisition module 1522 is configured to acquire at least one first feature point in the first scene image;
  • the sixth acquisition module 1524 is configured to acquire at least one second feature point in the second scene image;
  • the matching module 1526 is configured to match the first feature points with the second feature points to obtain at least one feature point pair;
  • the determining module 1528 is configured to determine a mapping matrix according to the feature point pairs; and
  • the alignment module 1530 is configured to align the first scene image and the second scene image according to the mapping matrix.
  • an embodiment of the present application further provides a computer-readable storage medium 160 on which a computer program 162 is stored.
  • when the program is executed by the processor 320, the steps of the image processing method of any of the above embodiments can be implemented, such as the methods in 01, 03, 05, 07, 071, 073, 075, 0711, 0713, 0715, 0717, 07511, 07513, 07515, 07517, 07521, 07523, 07525, 07527, 051, 053, 055, 09, 011, 057, 013, 015, 017, 019 and 021.
  • the computer-readable storage medium 160 may be provided in the image processing apparatus 150 or the electronic device 200, or may be provided in a cloud server, in which case the image processing apparatus 150 or the electronic device 200 can communicate with the cloud server to obtain the corresponding computer program 162.
  • the computer program 162 includes computer program code.
  • the computer program code may be in source code form, object code form, an executable file or some intermediate form, or the like.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a software distribution medium, and the like.
  • any description of a process or method in a flowchart or otherwise described herein may be understood to represent a module, segment, or portion of code comprising one or more executable instructions for implementing specified logical functions or steps of the process. The scope of the preferred embodiments of the present application includes alternative implementations in which the functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

Abstract

An image processing method, comprising: acquiring a first scene image, the first scene image comprising a subject area, the subject area being located in a depth-of-field area of the first scene image; acquiring a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and fusing the subject area and the second scene image to obtain a target image.

Description

Image processing method and apparatus, electronic device, and computer-readable storage medium

Technical Field

The present application relates to the field of imaging, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.

Background

With the development of imaging technology, it is often necessary to blur the background of an image in certain scenarios to achieve an optical blur effect. To obtain a background-blurred image, the usual practice is to first capture a scene image, then distinguish the foreground part from the background part of the scene image, and finally apply a software blurring algorithm to the background part to approximate the optical blur effect of the image. However, because such blurring is produced by software processing, the resulting optical blur effect is not good.
Summary of the Invention

An embodiment of the present application provides an image processing method. The image processing method includes: acquiring a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image; acquiring a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and fusing the subject area and the second scene image to obtain a target image.

An embodiment of the present application provides an electronic device. The electronic device includes a memory and one or more processors connected to the memory. The one or more processors are configured to: acquire a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image; acquire a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and fuse the subject area and the second scene image to obtain a target image.

An embodiment of the present application provides an image processing apparatus. The image processing apparatus includes a first acquisition module, a second acquisition module, and an image fusion module. The first acquisition module is configured to acquire a first scene image, the first scene image including a subject area located in a depth-of-field area of the first scene image. The second acquisition module is configured to acquire a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image. The image fusion module is configured to fuse the subject area and the second scene image to obtain a target image.

Additional aspects and advantages of embodiments of the present application will be set forth in part in the following description, and in part will become apparent from the following description or be learned by practice of the present application.
Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 5 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 6 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 7 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 8 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 9 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 10 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the principle of an image processing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of distortion correction in an image processing method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of the principle of color overflow correction in an image processing method according to an embodiment of the present application;
FIG. 14 is a schematic diagram of the principle of color overflow correction in an image processing method according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 16 is a schematic diagram of the connection between a computer-readable storage medium and a processor according to an embodiment of the present application.
Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, are only used to explain the present application, and should not be construed as limiting the present application.

Referring to FIG. 1, an embodiment of the present application provides an image processing method. The image processing method includes:

01: acquiring a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image;

03: acquiring a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and

05: fusing the subject area and the second scene image to obtain a target image.
Referring to FIG. 2, an embodiment of the present application provides an electronic device. The electronic device includes a memory and one or more processors 320 connected to the memory. The one or more processors 320 are configured to execute the methods in 01, 03 and 05. That is, the one or more processors 320 are configured to: acquire a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image; acquire a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and fuse the subject area and the second scene image to obtain a target image.

Specifically, as shown in FIG. 2, the one or more processors 320 include a first ISP processor 232 and a second ISP processor 234. The memory 260 includes an image memory 260. The electronic device further includes a first camera 210, a second camera 220, a control logic 250, a display 270, and a communication module 280.
The first camera 210 includes one or more first lenses 212 and a first image sensor 214. The first scene image may be any one of a visible light image (RGB image), an infrared image (IR image), or a black-and-white image. In one example, acquiring the first scene image may be achieved by the first camera 210. When the first scene image is a visible light image, the first camera 210 is a color camera; correspondingly, the first image sensor 214 may include a color filter array (such as a Bayer filter), and the first image sensor 214 can obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of image data that can be processed by the first ISP processor 232. When the first scene image is an infrared image, the first camera 210 is an infrared camera; correspondingly, the first image sensor 214 may include an infrared filter array. When the first scene image is a black-and-white image, the first camera 210 is a black-and-white camera; correspondingly, the first image sensor 214 may not be provided with a filter array. In another example, the first camera 210 collects the first scene image and stores it in the image memory 260 of the electronic device 200; acquiring the first scene image is then achieved by the first ISP processor 232 and the second ISP processor 234 reading the first scene image stored in the image memory 260. In yet another example, the first camera 210 collects the first scene image and stores it in the cloud or another device; acquiring the first scene image is then achieved by the communication module 280 of the electronic device 200 obtaining the image from the cloud or the other device and transmitting it to the first ISP processor 232 and the second ISP processor 234.

The second camera 220 includes one or more second lenses 222 and a second image sensor 224. The second scene image may also be any one of a visible light image (RGB image), an infrared image (IR image), or a black-and-white image. In one example, acquiring the second scene image may be achieved by the second camera 220. When the second scene image is a visible light image, the second camera 220 is a color camera; correspondingly, the second image sensor 224 may include a color filter array, and the second image sensor 224 can obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of image data that can be processed by the first ISP processor 232. When the second scene image is an infrared image, the second camera 220 is an infrared camera; correspondingly, the second image sensor 224 may include an infrared filter array. When the second scene image is a black-and-white image, the second camera 220 is a black-and-white camera; correspondingly, the second image sensor 224 may not be provided with a filter array. In another example, the second camera 220 collects the second scene image and stores it in the image memory 260 of the electronic device 200; acquiring the second scene image is then achieved by the first ISP processor 232 and the second ISP processor 234 reading the second scene image stored in the image memory 260. In yet another example, the second camera 220 collects the second scene image and stores it in the cloud or another device; acquiring the second scene image is then achieved by the communication module 280 of the electronic device 200 obtaining the image from the cloud or the other device and transmitting it to the first ISP processor 232 and the second ISP processor 234.
The first scene image collected by the first camera 210 is transmitted to the first ISP processor 232 for processing. After the first ISP processor 232 processes the first scene image, statistical data of the first scene image can be sent to the control logic 250, and the control logic 250 may determine control parameters of the first camera 210 according to the statistical data, so that the first camera 210 may perform operations such as auto-focus and auto-exposure according to the control parameters. The first scene image can be stored in the image memory 260 after being processed by the first ISP processor 232. In addition, after being processed by the first ISP processor 232, the first scene image can be sent directly to the display 270 for display, and the display 270 can also read the image in the image memory 260 for display.

Similarly, the second scene image collected by the second camera 220 is transmitted to the second ISP processor 234 for processing. After the second ISP processor 234 processes the second scene image, statistical data of the second scene image can be sent to the control logic 250, and the control logic 250 may determine control parameters of the second camera 220 according to the statistical data, so that the second camera 220 may perform operations such as auto-focus and auto-exposure according to the control parameters. The second scene image can be stored in the image memory 260 after being processed by the second ISP processor 234. In addition, after being processed by the second ISP processor 234, the second scene image can be sent directly to the display 270 for display, and the display 270 can also read the image in the image memory 260 for display.

Both the first ISP processor 232 and the second ISP processor 234 can process image data pixel by pixel in various formats. For example, each image pixel may have a bit depth of 8, 10, 12 or 14 bits, and both the first ISP processor 232 and the second ISP processor 234 may perform one or more image processing operations on the image data and collect statistical information about the image data. The image processing operations may be performed with the same or different bit-depth precision.

The statistical data determined by the first ISP processor 232 may be sent to the control logic 250. In this case, the statistical data may include first image sensor 214 statistics such as auto-exposure, auto-white-balance, auto-focus, flicker detection, black level compensation, and shading correction of the first lens 212. The statistical data determined by the second ISP processor 234 may also be sent to the control logic 250. In this case, the statistical data may include second image sensor 224 statistics such as auto-exposure, auto-white-balance, auto-focus, flicker detection, black level compensation, and shading correction of the second lens 222. The control logic 250 may include a processor and/or a microcontroller executing one or more routines (such as firmware), and the one or more routines may, according to the received statistical data, determine the control parameters of the first camera 210 and the control parameters of the first ISP processor 232, and/or the control parameters of the second camera 220 and the control parameters of the second ISP processor 234. For example, the control parameters of the first camera 210 or the second camera 220 may include gain, integration time for exposure control, anti-shake parameters, flash control parameters, control parameters of the first lens 212 (such as the focal length for focusing or zooming), or a combination of these parameters. The control parameters of the first ISP processor 232 or the second ISP processor 234 may include gain levels and color correction matrices for auto-white-balance and color adjustment (for example, during RGB processing).
The image memory 260 may be a part of a memory device, or may be an independent dedicated memory within a storage device or the electronic device, and may include a DMA (Direct Memory Access) feature.

In one embodiment, the first ISP processor 232 and the second ISP processor 234 may be the same processor, or may be two independent processors, which is not limited here.
The first scene image is an image captured of a specific scene, and the subject area refers to the region of interest in the acquired first scene image; the process of processing the region of interest while selectively ignoring the regions that are not of interest is the process of subject recognition. In one embodiment, the subject area refers to the region of interest in the first scene image determined by a preset subject recognition model, a preset subject recognition neural network, a subject recognition model obtained by machine learning, or a subject recognition neural network obtained by machine learning. The subject area includes, but is not limited to, the person or object occupying the largest area in the first scene image, the person or object at the center of the first scene image, or a person or object of a specific color in the first scene image. In this embodiment, the subject area may be the in-focus area of the first scene image.

The subject area of the first scene image is located in the depth-of-field area of the first scene image. Depth of field refers to the range of distances in front of and behind the subject within which a camera lens or other imager can produce a sharp image, that is, the distance range within which a clear image is presented before and after the focal point once the first camera module has completed focusing. It can be understood that if the subject area of the first scene image is located in the depth-of-field area of the first scene image, the definition of the subject area of the first scene image is relatively high, for example, higher than a first predetermined definition.
The second scene image is also an image captured of a specific scene, and its imaging effect differs from that of the first scene image, including but not limited to: the depth-of-field position of the second scene image differs from that of the first scene image. For example, in this embodiment, the far depth of field of the second scene image is not greater than the near depth of field of the first scene image. The near depth of field, also called the front depth of field, satisfies ΔL1 = FδL^2/(f^2 + FδL); the far depth of field, also called the back depth of field, satisfies ΔL2 = FδL^2/(f^2 − FδL), where δ is the permissible circle-of-confusion diameter of the camera module, F is the aperture value of the camera, f is the focal length of the camera lens, and L is the focusing distance of the camera. The depth of field is related to the aperture, lens focal length, and shooting distance of the camera, as well as the image-quality requirement (expressed as the size of the permissible circle of confusion). In this embodiment, the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, so that the first scene image and the second scene image differ in parameters such as focus position, image definition, resolution, gray value, shooting direction, angle, and exposure. For example, the definition of the second scene image is relatively low, for example, lower than a second predetermined definition, and the second predetermined definition may be less than or equal to the first predetermined definition. As a result, the visual impression the first scene image presents to the user is a clear image, while the visual impression the second scene image presents to the user is a blurred image.
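Using the symbols of the preceding formulas (δ: permissible circle of confusion, F: aperture value, f: focal length, L: focusing distance), the near and far depths of field can be computed directly. The helper below is a sketch; the units and example values are chosen for illustration only:

```python
def depth_of_field(f_mm: float, F: float, L_mm: float, delta_mm: float):
    """Near depth ΔL1 = FδL²/(f² + FδL), far depth ΔL2 = FδL²/(f² − FδL).
    The far-depth formula is only meaningful while f² > FδL (inside the
    hyperfocal distance)."""
    num = F * delta_mm * L_mm ** 2
    dL1 = num / (f_mm ** 2 + F * delta_mm * L_mm)  # near (front) depth of field
    dL2 = num / (f_mm ** 2 - F * delta_mm * L_mm)  # far (back) depth of field
    return dL1, dL2

# Example: 50 mm lens at f/1.8 focused at 2 m, with δ = 0.03 mm.
print(depth_of_field(50.0, 1.8, 2000.0, 0.03))  # ≈ (82.8 mm, 90.3 mm)
```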
Further, in one embodiment, the scene captured in the first scene image and the scene captured in the second scene image may be exactly the same. In this case, the first camera 210 and the second camera 220 may be the same camera, that is, the first scene image and the second scene image are different frames acquired by the same camera at different times; there is then no parallax between the first scene image and the second scene image, which makes the fusion in 05 faster and simpler. In another embodiment, the scene captured in the first scene image and the scene captured in the second scene image may have a small degree of parallax. In this case, the first camera 210 and the second camera 220 may be two cameras on the same baseline, and the first scene image and the second scene image may be acquired at the same time, which reduces the image acquisition time.

To obtain a background-blurred image, the usual practice is to first capture a scene image, then distinguish the foreground part from the background part of the scene image, and finally apply software blurring to the background part to approximate the optical blur effect. However, because such blurring is produced by software processing, the optical blur effect is not good. The image processing method and electronic device of the embodiments of the present application obtain the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is located in the depth-of-field area of the first scene image, its definition is high and it is presented to the user as a clear image; since the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, the second scene image has lower definition and is presented to the user as a blurred image. That is, a clear subject area is combined with a blurred second scene image captured directly by the camera, which avoids relying on software algorithm processing to achieve background blur, so the resulting target image has a good optical blur effect.
Referring to FIG. 3, in some embodiments, the image processing method further includes:

07: processing the first scene image to obtain the subject area and a background mask area.

Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to execute the method in 07. That is, the one or more processors 320 are configured to process the first scene image to obtain the subject area and the background mask area.
As described above, the subject area refers to the region of interest in the acquired first scene image. In one embodiment, the subject area refers to the region of interest in the first scene image determined by a preset subject recognition model, a preset subject recognition neural network, a subject recognition model obtained by machine learning, or a subject recognition neural network obtained by machine learning; the area of the first scene image other than the subject area can then be delineated as the background mask area.
Referring to FIG. 4, in some embodiments, 07 (processing the first scene image to obtain the subject area and the background mask area) includes:

071: processing the first scene image through a subject recognition detection network to obtain an initial foreground area and an initial background area of the first scene image;

073: obtaining a color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and

075: performing color correction on the color overflow pixels to obtain the subject area and the background mask area.

Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to execute the methods in 071, 073 and 075. That is, the one or more processors 320 can also be used to: process the first scene image through the subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image; obtain the color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and perform color correction on the color overflow pixels to obtain the subject area and the background mask area.
The subject recognition detection network may be pre-stored in the memory, or may be obtained later through machine learning. As shown in FIG. 14, inputting the first scene image into the subject recognition detection network outputs the initial foreground area BG and the initial background area FG. There is a junction between the initial foreground area BG and the initial background area FG, and the processor may: divide the junction into a plurality of sub-areas Ai, each sub-area Ai including a certain number of pixels; and obtain the definition of each sub-area Ai. When the definition of a sub-area Ai is greater than a third predetermined definition, the sub-area Ai does not belong to the color overflow area, as shown for example in FIG. 13(2); when the definition of a sub-area Ai is less than the third predetermined definition, the sub-area Ai belongs to the color overflow area, as shown for example in FIG. 13(1). The color overflow area includes at least one color overflow pixel, and the third predetermined definition is smaller than the first predetermined definition.

Obtaining the definition of each sub-area Ai may specifically include: first obtaining the proportion of the number of high-frequency-information pixels in the sub-area Ai among all the pixels of the entire first scene image, and using this proportion to characterize the definition of the sub-area Ai; the higher the proportion, the higher the image definition. In one example, the first scene image is first processed by shaping low-pass filtering to obtain a filtered image. High-frequency information is then obtained from the first scene image and the filtered image; specifically, the high-frequency information can be obtained by subtracting the filtered image from the first scene image. The high-frequency information is the part of the discrete cosine transform coefficients far from zero frequency, which describes the detail information of the first scene image. Finally, the proportion of the number of high-frequency-information pixels of the sub-area Ai among all the pixels of the first scene image is counted. For example, if the number of high-frequency-information pixels in the sub-area Ai accounts for 10% of all the pixels of the first scene image, the proportion of 10% is used to characterize the definition of the sub-area Ai.
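A sketch of this sharpness measure follows, assuming OpenCV and NumPy, with a Gaussian blur standing in for the "shaping low-pass filtering" mentioned above (the application does not fix the filter, so this choice and the threshold are assumptions):

```python
import cv2
import numpy as np

def subregion_sharpness(image_gray: np.ndarray, region_mask: np.ndarray,
                        hf_threshold: float = 10.0) -> float:
    """Ratio of high-frequency pixels inside a sub-area Ai to all pixels of
    the first scene image, used as a proxy for the sub-area's definition."""
    # Low-pass filter the image, then subtract to keep the high-frequency detail.
    filtered = cv2.GaussianBlur(image_gray, (9, 9), 0)
    high_freq = cv2.absdiff(image_gray, filtered)

    # Count high-frequency pixels inside the sub-area Ai (boolean region_mask).
    hf_pixels = np.count_nonzero((high_freq > hf_threshold) & region_mask)
    return hf_pixels / image_gray.size
```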
After the color overflow area at the junction of the initial foreground area and the initial background area is obtained, color correction needs to be performed on the color overflow pixels in order to obtain the subject area and the background mask area used for fusion.

The image processing method and electronic device of the embodiments of the present application obtain the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is obtained after color overflow correction, its definition is further improved, and the definition of the subject area in the resulting target image is also high, forming a sharp contrast with the blurred second scene image and further improving the optical blur effect of the target image.
Referring to FIG. 5, in some embodiments, 071 (processing the first scene image through the subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image) includes:

0711: obtaining depth information of the first scene image;

0713: generating, according to the depth information, a center weight map corresponding to the first scene image, the weight values represented by the center weight map gradually decreasing from the center to the edge;

0715: inputting the first scene image and the center weight map into the subject recognition detection network to obtain a confidence map of the foreground area of the first scene image; and

0717: determining the initial foreground area and the initial background area according to the confidence map of the foreground area.

Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to execute the methods in 0711, 0713, 0715 and 0717. That is, the one or more processors 320 can also be used to: obtain the depth information of the first scene image; generate, according to the depth information, a center weight map corresponding to the first scene image, the weight values represented by the center weight map gradually decreasing from the center to the edge; input the first scene image and the center weight map into the subject recognition detection network to obtain the confidence map of the foreground subject area of the first scene image; and determine the initial foreground area and the initial background area according to the confidence map of the foreground subject area.
In one embodiment, the first scene image is processed through the subject recognition detection network. Specifically, the depth information of the first scene image may be obtained first, and a center weight map corresponding to the first scene image may then be generated according to the depth information, the weight values represented by the center weight map gradually decreasing from the center to the edge. This helps to highlight a subject located at the center, and also matches the operating habits of typical terminals and photographers. In one example, the first scene image and the center weight map are input into the subject recognition detection network to obtain a confidence map of the subject area of the first scene image. There may be some low-confidence or scattered points in the confidence map; the confidence map can be filtered and corrected by the ISP processor or the central processor, so that the initial foreground area and the initial background area are obtained from the confidence map of the foreground area. In one example, the filtering process may use a configured confidence threshold to filter out pixels in the confidence map whose confidence values are lower than the confidence threshold. The confidence threshold may be an adaptive confidence threshold or a fixed threshold, or corresponding thresholds may be configured by region. In this embodiment, filtering and correcting the confidence map of the first scene image improves the reliability of the initial foreground area, and performing separate segmentation processing on the initial foreground area and the initial background area, which affect the accuracy of subject recognition, helps to improve the precision and accuracy of the image fusion processing.
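A minimal sketch of one way to generate such a center weight map follows; a Gaussian falloff is assumed here, while the application only requires the weights to decrease monotonically from center to edge:

```python
import numpy as np

def center_weight_map(h: int, w: int, sigma_scale: float = 0.5) -> np.ndarray:
    """Weight map in [0, 1], largest at the image center and decaying
    toward the edges."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / (h * sigma_scale)) ** 2 + ((xs - cx) / (w * sigma_scale)) ** 2
    return np.exp(-d2)  # 1.0 at the center, smaller near the edges
```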
In one example, the preset detection model of the subject recognition detection network is obtained by collecting a large amount of training data in advance and inputting the training data into a subject detection model containing initial network weights for training. Each set of training data includes a visible light map, a depth map, a center weight map, and an annotated subject mask map corresponding to the same scene. The visible light map and the center weight map are used as the input of the subject detection model to be trained, and the annotated subject mask map is used as the ground truth the trained subject detection model is expected to output. The subject mask map is an image filter template used to identify the subject in an image; it can mask the other parts of the image and filter out the subject. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, and backgrounds.

In one example, the preset detection model of the subject recognition detection network is trained according to the center weight map corresponding to the first scene image and the depth information of the first scene image. In this embodiment, the depth map and the center weight map are used as inputs of the preset detection model. The depth information of the depth map can be used to make objects closer to the camera easier to detect, and the center attention mechanism of the center weight map (large weight at the center, small weight at the periphery) makes objects at the center of the image easier to detect. In addition, introducing the depth map enables depth feature enhancement of the subject, and introducing the center weight map enables center-attention feature enhancement of the subject. This not only accurately identifies the target subject in a simple scene (a scene with a single subject and low background contrast), but also greatly improves the accuracy of subject recognition in complex scenes; moreover, introducing the depth map also addresses the poor robustness of traditional target detection methods to the great variability of natural images.

In one embodiment, the above image processing method further includes: when there are multiple subjects, determining the foreground subject area of the first scene image according to at least one of the priority of the category to which each subject belongs, the area occupied by each subject, and the position of each subject.
In some embodiments, the depth information of the first scene image is obtained through a binocular vision system;

in some embodiments, the depth information of the first scene image is obtained through a monocular vision system;

in some embodiments, the depth information of the first scene image is obtained through a structured light camera module;

in some embodiments, the depth information of the first scene image is obtained through a time-of-flight camera module.
In one embodiment, a binocular disparity value is obtained through a binocular vision system to obtain the depth information of the first scene image. Specifically, a first depth image and a second depth image corresponding to the first scene image are acquired through the first camera and the second camera of the binocular vision system, respectively, so as to obtain the relative geometric position relationship between the pixels of the first depth image and the pixels of the second depth image. Specifically, the first camera and the second camera are used to capture images of a checkerboard calibration board from multiple angles, and the intrinsic parameters, extrinsic parameters, and distortion coefficients of the first camera and the second camera, as well as the geometric position relationship between them, are calculated. Preferably, the first camera and the second camera are two independent cameras with the same performance specifications (the same optical lens and image sensor), with their optical axes parallel to each other and lying on the same baseline. In practical applications, the baseline between the first camera and the second camera can be adjusted according to different requirements, and two cameras with different focal lengths or models can also be used to satisfy different functions. The first camera and the second camera can be placed horizontally or vertically, and their baseline distance can also be adjusted as required. The first camera and the second camera may be color cameras of the same model; or one color camera and one black-and-white camera; or one high-resolution zoom color camera and one low-resolution color camera; or one color camera with OIS optical image stabilization and one fixed-focus color camera; and so on, without being limited to the above.

According to the relative geometric position relationship, the binocular disparity values between corresponding pixels of the first depth image and the second depth image are determined, so as to obtain the coordinates of the corresponding pixels of the first depth image and the second depth image and further obtain a sparse disparity map. Based on the sparse disparity map, starting from its upper-left corner, the computation proceeds pixel by pixel, row by row from left to right and top to bottom. If a pixel is a reference disparity point, it is skipped; if a pixel is not a reference disparity point, the reference disparity point closest to the pixel is selected as a reference, and the disparity of the pixel is computed: in the left camera's image, an image block centered on the pixel is extracted; a search window is extracted in the right camera's image, centered on the point corresponding to the reference disparity point; and the disparity of the pixel is then computed by block matching. After the pixel-by-pixel computation is completed, the disparity values of the entire image are obtained, and the depth information of the first scene image is finally obtained.
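For illustration, the sketch below substitutes OpenCV's semi-global block matching for the sparse-reference block-matching propagation described above, and converts disparity to depth with Z = f·B/d (focal length f in pixels, baseline B in meters). It is a sketch under those assumptions for a rectified stereo pair, not the application's exact procedure:

```python
import cv2
import numpy as np

def stereo_depth(left_gray: np.ndarray, right_gray: np.ndarray,
                 focal_px: float, baseline_m: float) -> np.ndarray:
    """Depth map (meters) from a rectified stereo pair: Z = f * B / disparity."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    # SGBM returns fixed-point disparities scaled by 16.
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0

    depth = np.zeros_like(disparity)
    valid = disparity > 0  # leave invalid/occluded pixels at depth 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```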
In one embodiment, algorithmic computation is performed by a monocular vision system to obtain the depth information of the first scene image. Specifically, at least two frames of images are acquired through the camera of the monocular vision system, and the depth information of the first scene image is obtained through a preset depth prediction model. In one example, the acquired image can be pre-segmented into multiple blocks using methods such as superpixels, under the assumption that the depth values within each image block are the same; absolute depth features and relative depth features are then selected to estimate the absolute depth of each block and the relative depth (i.e., the depth difference) of adjacent blocks, respectively, and a back-end model such as a Markov Random Field (MRF) is constructed to establish the association between local features and depth and the depth association among multiple blocks, so that a depth prediction model is obtained by training.

In one embodiment, the depth information of the first scene image is obtained through a structured light camera module, that is, the first scene image is acquired based on the structured light image sensor in the structured light camera module, and the first scene image is a structured light image. Specifically, the structured light image sensor may include a laser light and a laser camera. Pulse width modulation (PWM) can modulate the laser light to emit structured light; the structured light is projected onto the imaged object, and the laser camera can capture the structured light reflected by the imaged object for imaging to obtain the first scene image. A depth engine can calculate the corresponding depth information according to the first scene image. Specifically, the depth engine demodulates the phase information corresponding to the pixels at deformed positions in the first scene image, converts the phase information into height information, and determines the corresponding depth information according to the height information.
In one embodiment, the depth information of the first scene image is obtained through a time-of-flight (ToF) camera module, and the first scene image is a time-of-flight depth image. Specifically, the laser transmitter in the ToF camera module is turned on to emit laser light towards the scene, and at the same time the timing circuit of each photosensitive pixel of the image sensor in the ToF camera module starts counting. The emitted laser light is reflected by objects in the scene and received by the image collector. Since the avalanche photodiode in each photosensitive pixel of the image collector works in Geiger mode (the reverse bias voltage is higher than the avalanche voltage), absorbing a single photon triggers an avalanche, so that the output current reaches its maximum almost instantaneously (in less than 1 ps) and is fed back to the independent timing circuit of each photosensitive pixel, which then stops counting. The depth of each pixel in the time-of-flight depth image is calculated from the count value of each timing circuit and the speed of light.
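A minimal sketch of the per-pixel depth computation from the timer counts described above; the timer resolution `tick_seconds` is an assumed parameter, not a value from the original.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def tof_depth(counts, tick_seconds):
    """Convert per-pixel timing-circuit counts into depth values.

    counts: 2-D array of timer counts (one per photosensitive pixel)
    tick_seconds: duration of one timer tick
    The light travels to the object and back, hence the division by 2.
    """
    round_trip_time = counts.astype(np.float64) * tick_seconds
    return round_trip_time * C / 2.0
```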
Referring to FIG. 6, in some embodiments, 075: performing color correction on the color overflow pixels to obtain the subject area and the background mask area includes:
07511: expanding a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels;
07513: obtaining the pixel values of the foreground area pixels within the correction area;
07515: correcting the color overflow pixels according to the pixel values of the foreground area pixels within the correction area to obtain corrected pixels; and
07517: merging the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to perform the methods in 07511, 07513, 07515, and 07517. That is, the one or more processors 320 may further be configured to: expand a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the foreground area pixels within the correction area; correct the color overflow pixels according to the pixel values of the foreground area pixels within the correction area to obtain corrected pixels; and merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
Specifically, referring to FIG. 14, the area enclosed by the two black curves in FIG. 14 represents the color overflow area A1. The color overflow area A1 includes a plurality of color overflow pixels. Assuming that one color overflow pixel in the color overflow area A1 is P, the first ISP processor may expand a predetermined range around the color overflow pixel P to obtain a correction area A2. The first ISP processor may expand the predetermined range around the color overflow pixel P according to a predetermined shape, where the predetermined shape may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, or the like; that is, the correction area A2 obtained after expanding the predetermined range may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, or the like, which is not limited herein. The correction area A2 includes three types of pixels: foreground area pixels, background area pixels, and the color overflow pixel P. In one example, the first ISP processor may select all the foreground area pixels in the correction area A2 and use their pixel values to correct the pixel value Pc of the color overflow pixel P. For example, assuming that there are N foreground area pixels and the pixel value of each foreground area pixel is Pi, with i ≤ N and i a positive integer, then
Pc = (P1 + P2 + … + PN) / N
In another example, the first ISP processor may select some of the foreground area pixels located in the correction area A2, where, among the selected foreground area pixels, the spatial distance between each foreground area pixel and the color overflow pixel P is less than or equal to a predetermined distance. The first ISP processor then corrects the pixel value Pc of the color overflow pixel P using the pixel values of the selected foreground area pixels. For example, assuming that there are N foreground area pixels, the first ISP processor may select M foreground area pixels from them, where M < N, and among the M foreground area pixels, the spatial distance between the coordinate value Pxy of each foreground area pixel and the coordinate value Puv of the color overflow pixel P is less than or equal to the predetermined distance D, that is,
√((Px − Pu)² + (Py − Pv)²) ≤ D, where Pxy = (Px, Py) is the coordinate value of the foreground area pixel and Puv = (Pu, Pv) is the coordinate value of the color overflow pixel P.
If, among the M foreground area pixels, the pixel value of each foreground area pixel is Pi, with i ≤ M and i a positive integer, then
Pc = (P1 + P2 + … + PM) / M
Compared with correcting the pixel value of the color overflow pixel using the pixel values of all the foreground area pixels in the correction area A2, using only the pixel values of the foreground area pixels closer to the color overflow pixel eliminates the color overflow phenomenon while also making the pixel values of the corrected pixels more accurate. Conversely, compared with using only the pixel values of the foreground area pixels closer to the color overflow pixel, using the pixel values of all the foreground area pixels in the correction area A2 reduces the computation load of the first ISP processor.
After the corrected pixels are obtained, the first ISP processor may merge them into the foreground area. The first ISP processor may traverse all the color overflow pixels in the correction area in the manner shown in FIG. 6 to obtain a plurality of corrected pixels. The corrected pixels are all merged into the initial foreground area, so that the updated initial foreground area, that is, the subject area, is obtained. The area of the first scene image other than the subject area is the background mask area.
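For illustration, the following is a minimal sketch of the correction-area averaging described above. Passing a background mask instead of a foreground mask covers the symmetric variant of FIG. 7 below. The square window, the plain mean, and all names are assumptions made for the example.

```python
import numpy as np

def correct_color_overflow(image, overflow_mask, region_mask, radius=3, max_dist=None):
    """Correct color overflow pixels by averaging the reference pixels
    (foreground or background, selected by region_mask) inside a square
    correction area of the given radius around each overflow pixel.

    overflow_mask and region_mask are boolean H x W arrays and are
    assumed to be disjoint; max_dist is the optional predetermined
    distance D restricting which reference pixels are used.
    """
    out = image.copy()
    h, w = overflow_mask.shape
    for y, x in zip(*np.nonzero(overflow_mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        ys, xs = np.nonzero(region_mask[y0:y1, x0:x1])
        if ys.size == 0:
            continue                      # no reference pixels in this area
        ys, xs = ys + y0, xs + x0
        if max_dist is not None:          # keep only pixels within D of P
            keep = (ys - y) ** 2 + (xs - x) ** 2 <= max_dist ** 2
            ys, xs = ys[keep], xs[keep]
            if ys.size == 0:
                continue
        out[y, x] = image[ys, xs].mean(axis=0)   # corrected Pc = mean of reference Pi
    return out
```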
Referring to FIG. 7, in some embodiments, 075: performing color correction on the color overflow pixels to obtain the subject area and the background mask area includes:
07521: expanding a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels;
07523: obtaining the pixel values of the background area pixels within the correction area;
07525: correcting the color overflow pixels according to the pixel values of the background area pixels within the correction area to obtain corrected pixels; and
07527: merging the corrected pixels into the initial background area to obtain the subject area and the background mask area.
Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to perform the methods in 07521, 07523, 07525, and 07527. That is, the one or more processors 320 may further be configured to: expand a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the background area pixels within the correction area; correct the color overflow pixels according to the pixel values of the background area pixels within the correction area to obtain corrected pixels; and merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
Specifically, referring to FIG. 14, the area enclosed by the two black curves in FIG. 14 represents the color overflow area A1. The color overflow area A1 includes a plurality of color overflow pixels. Assuming that one color overflow pixel in the color overflow area A1 is P, the first ISP processor may expand a predetermined range around the color overflow pixel P to obtain a correction area A2. The first ISP processor may expand the predetermined range around the color overflow pixel P according to a predetermined shape, where the predetermined shape may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, or the like; that is, the correction area A2 obtained after expanding the predetermined range may be a circle, a triangle, a quadrilateral, a pentagon, a hexagon, an octagon, a dodecagon, or the like, which is not limited herein. The correction area likewise includes three types of pixels: foreground area pixels, background area pixels, and the color overflow pixel P. In one example, the first ISP processor may select all the background area pixels in the correction area A2 and use their pixel values to correct the pixel value Pc of the color overflow pixel P. For example, assuming that there are N background area pixels and the pixel value of each background area pixel is Pi, with i ≤ N and i a positive integer, then
Pc = (P1 + P2 + … + PN) / N
In another example, the first ISP processor may select some of the background area pixels located in the correction area A2, where, among the selected background area pixels, the spatial distance between each background area pixel and the color overflow pixel P is less than or equal to a predetermined distance. The first ISP processor then corrects the pixel value Pc of the color overflow pixel P using the pixel values of the selected background area pixels. For example, assuming that there are N background area pixels, the first ISP processor may select M background area pixels from them, where M < N, and among the M background area pixels, the spatial distance between the coordinate value Pxy of each background area pixel and the coordinate value Puv of the color overflow pixel P is less than or equal to the predetermined distance D, that is,
√((Px − Pu)² + (Py − Pv)²) ≤ D, where Pxy = (Px, Py) is the coordinate value of the background area pixel and Puv = (Pu, Pv) is the coordinate value of the color overflow pixel P.
If, among the M background area pixels, the pixel value of each background area pixel is Pi, with i ≤ M and i a positive integer, then
Pc = (P1 + P2 + … + PM) / M
Compared with correcting the pixel value of the color overflow pixel using the pixel values of all the background area pixels in the correction area A2, using only the pixel values of the background area pixels closer to the color overflow pixel eliminates the color overflow phenomenon while also making the pixel values of the corrected pixels more accurate. Conversely, compared with using only the pixel values of the background area pixels closer to the color overflow pixel, using the pixel values of all the background area pixels in the correction area A2 reduces the computation load of the first ISP processor.
After the corrected pixels are obtained, the first ISP processor may merge them into the background area. The first ISP processor may traverse all the color overflow pixels in the correction area in the manner shown in FIG. 7 to obtain a plurality of corrected pixels. The corrected pixels are all merged into the initial background area, so that the updated initial background area, that is, the background mask area, is obtained. The area of the first scene image other than the background mask area is the subject area.
Referring to FIG. 8, in some embodiments, the second scene image is an optically blurred image, and the definition of the first scene image is higher than that of the second scene image. 05: fusing the subject area and the second scene image to obtain the target image includes:
051: taking the subject area as the target subject area;
053: fusing the background mask area and the background area in the second scene image to obtain the target background area; and
055: obtaining the target image according to the target subject area and the target background area.
Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to perform the methods in 051, 053, and 055. That is, the one or more processors 320 may further be configured to: take the subject area as the target subject area; fuse the background mask area and the background area in the second scene image to obtain the target background area; and obtain the target image according to the target subject area and the target background area.
Specifically, referring to FIG. 2 and FIG. 11, in an embodiment of the present application, the second scene image is, for example, an image obtained by macro shooting. In this case, most of the second scene image (including at least part of the subject area and all of the background area in the second scene image) is blurred, which gives the second scene image an optical defocus effect, and the definition of the second scene image is lower than that of the first scene image.
After the first scene image and the second scene image are obtained, either the first ISP processor or the second ISP processor may take the subject area in the first scene image as the target subject area. Moreover, either the first ISP processor or the second ISP processor may fuse the background mask area and the background area in the second scene image to obtain the target background area. As an example, assume that the pixel value of a background area pixel in the background mask area of the first scene image is Pi, the pixel value of a background area pixel in the background area of the second scene image is Pi', and the position of the background area pixel with pixel value Pi in the first scene image corresponds to the position of the background area pixel with pixel value Pi' in the second scene image. The pixel value Pi'' of a target background pixel in the target background area can then be calculated as Pi'' = a*Pi + b*Pi', where a and b are weights and a + b = 1. Either the first ISP processor or the second ISP processor may traverse all the background area pixels in the background mask area and in the background area of the second scene image using the calculation Pi'' = a*Pi + b*Pi' to obtain a plurality of target background pixels. The target background pixels together constitute the target background area.
The image processing method of the embodiments of the present application takes the subject area as the target subject area; since the definition of the first scene image is high, the target subject area also has high definition. The method fuses the background mask area and the background area in the second scene image to obtain the target background area, so the target background area can be blurred with the help of the optical defocus effect of the second scene image, which improves the blurring effect of the target background area.
Further, in some embodiments, the depths corresponding to the background area pixel with pixel value Pi and the background area pixel with pixel value Pi' may be determined. When the corresponding depth is small, a may be set greater than b; when the corresponding depth is large, b may be set greater than a. In this way, the parts of the target background area with smaller depth are blurred less and the parts with larger depth are blurred more, which further improves the blurring effect of the target background area.
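As an illustration, the following sketch implements the weighted fusion Pi'' = a*Pi + b*Pi' with depth-dependent weights. The linear depth-to-weight mapping and the `d_mid` parameter are assumptions for the example; the original only requires a + b = 1, with a > b at small depths and b > a at large depths.

```python
import numpy as np

def fuse_background(first_bg, second_bg, depth, d_mid=2.0):
    """Blend the background-mask pixels of the first scene image (first_bg)
    with the optically blurred background of the second scene image
    (second_bg), both H x W x 3, using per-pixel weights a and b with
    a + b = 1 derived from the H x W depth map."""
    b = np.clip(depth / (2.0 * d_mid), 0.0, 1.0)[..., None]  # weight of blurred image
    a = 1.0 - b                        # small depth -> a > b (sharper result)
    return a * first_bg + b * second_bg
```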
Referring to FIG. 9, in some embodiments, the image processing method further includes:
09: when it is detected that image distortion exists in the second scene image, obtaining a transformation matrix corresponding to the second scene image; and
011: performing correction processing on the second scene image according to the transformation matrix to obtain a corrected image.
05: fusing the subject area and the second scene image to obtain the target image may include:
057: fusing the subject area and the corrected image to obtain the target image.
Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to perform the methods in 09, 011, and 057. That is, the one or more processors 320 are configured to: when it is detected that image distortion exists in the second scene image, obtain a transformation matrix corresponding to the second scene image; perform correction processing on the second scene image according to the transformation matrix to obtain a corrected image; and fuse the subject area and the corrected image to obtain the target image.
The image distortion may be distortion in one or more of parallel parameters such as the pixel value, chromaticity, depth value, or exposure of the image. Specifically, a content difference of the second scene image may be obtained and compared with a preset threshold to determine whether distortion exists in a specific parameter. The content difference may be the parameter difference between two adjacent pixel blocks (a pixel block may include one or more pixels), or the difference between the parameters of pixel blocks at corresponding positions in different frame images acquired for the same photographed object. For example, the content values of the same parameter of a first pixel block, a second pixel block, and a third pixel block are obtained, where the parameter includes but is not limited to a gray value, chromaticity, or depth value. The first pixel block and the third pixel block are each adjacent to the second pixel block, the second pixel block is located between the first pixel block and the third pixel block, and each pixel of the first, second, and third pixel blocks contains a depth value (i.e., the content value is the pixel value). A first content mean of each first pixel block and a second content mean of the second pixel block may be determined, and the content difference between the first content value and the second content value may be determined. Understandably, if the content difference between the first pixel block and the second pixel block is low (for example, less than the preset threshold), it indicates that the first pixel block and the second pixel block actually correspond to the same position of the photographed object, the error of the depth value between the first pixel block and the second pixel block is small, and the distortion of the first pixel block and the second pixel block is small. Similarly, the content difference between the second pixel block and the third pixel block can be determined, and the content difference between the first pixel block and the third pixel block can be compared with the preset threshold; if the content difference is greater than the preset threshold, image distortion exists, and if the content difference is less than the preset threshold, no image distortion exists. It should be noted that the first, second, and third pixel blocks being adjacent may also mean that the three pixel blocks are located in three adjacent frame images, with each pixel block at the same position within its corresponding image as the other two pixel blocks are within theirs, which depends on the specific application requirements.
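For illustration, the following is a simplified sketch of the block-based content-difference check described above, using the gray value as the content parameter; the block size and threshold are assumed values.

```python
import numpy as np

def has_distortion(image, block=8, threshold=12.0):
    """Flag possible image distortion by comparing the mean content value
    of adjacent pixel blocks against a preset threshold."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(np.float64)
    h, w = gray.shape
    hh, ww = (h // block) * block, (w // block) * block
    means = gray[:hh, :ww].reshape(hh // block, block, ww // block, block).mean(axis=(1, 3))
    dx = np.abs(np.diff(means, axis=1))   # content difference, horizontally adjacent blocks
    dy = np.abs(np.diff(means, axis=0))   # content difference, vertically adjacent blocks
    return bool((dx > threshold).any() or (dy > threshold).any())
```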
FIG. 12 is a schematic diagram comparing an example image with distortion and an image without distortion. FIG. 12 shows two images of a calibration board acquired by cameras. Part (1) of FIG. 12 is an image of the calibration board without image distortion: a series of points in the image are arranged neatly and regularly in the horizontal and vertical directions with equal spacing, and the shape of each square is normal. Part (2) of FIG. 12 is an image of the calibration board with image distortion: the squares on the calibration board are no longer arranged neatly and regularly in the horizontal and vertical directions, and their shapes have changed.
When it is determined that image distortion exists in the second scene image, the transformation matrix corresponding to the second scene image may be obtained. In one example, the transformation matrix corresponding to the second scene image may be determined according to the focus segment of the camera used to capture the second scene image. If the focus segment of the camera is F1 to F2, the transformation matrix corresponding to the second scene image is matrix1; if the focus segment is F2 to F3, the transformation matrix is matrix2; if the focus segment is F3 to F4, the transformation matrix is matrix3; and so on; if the focus segment is Fn-1 to Fn, the transformation matrix corresponding to the second scene image is matrix(n-1). The correspondence between focus segments and transformation matrices is calibrated in advance and stored in the image memory shown in FIG. 2. It can be understood that when the focus segment of the camera differs, the distortion pattern of the obtained image also differs. Selecting the transformation matrix corresponding to the focus segment of the camera allows the distortion of the image to be corrected more accurately, so that a corrected image with a better distortion correction effect is obtained.
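A minimal sketch of the pre-calibrated focus-segment lookup described above; the boundary values F1 to Fn and the matrix placeholders are illustrative, not calibration data from the original.

```python
import bisect

# Pre-calibrated focus-segment boundaries F1..Fn and one matrix per
# segment (matrix1 for [F1, F2), matrix2 for [F2, F3), ...); the values
# below are placeholders.
FOCUS_BOUNDS = [10.0, 20.0, 30.0, 40.0]        # F1, F2, F3, F4
MATRICES = ["matrix1", "matrix2", "matrix3"]   # one per focus segment

def matrix_for_focus(focus):
    """Return the pre-calibrated transformation matrix for the focus
    segment containing the given focus value."""
    idx = bisect.bisect_right(FOCUS_BOUNDS, focus) - 1
    if idx < 0 or idx >= len(MATRICES):
        raise ValueError("focus value outside the calibrated segments")
    return MATRICES[idx]
```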
It can be understood that the method of correcting a distorted image is not limited to the above processing with a corresponding transformation matrix; the distortion may also be corrected by means of edge erosion or morphological processing. Specifically, the morphological processing may include erosion and dilation. An erosion operation may first be performed on the second scene image, followed by a dilation operation, and guided filtering may then be applied to the binarized mask map obtained from the morphological processing to achieve edge filtering, thereby obtaining the corrected image. Through the morphological processing and the guided filtering, the noise at the edges of the corrected image can be reduced or even eliminated, and the edges of the corrected image become softer.
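For illustration, a sketch of the erosion-then-dilation pipeline with guided edge filtering described above, using OpenCV. The guided filter assumes the opencv-contrib `ximgproc` module; the kernel size, radius, and eps are assumed values.

```python
import cv2
import numpy as np

def morphological_correction(mask, guide):
    """Erode, then dilate a binarized mask, then soften its edges with a
    guided filter driven by the original image (guide)."""
    kernel = np.ones((5, 5), np.uint8)
    eroded = cv2.erode(mask, kernel)     # erosion first
    opened = cv2.dilate(eroded, kernel)  # then dilation
    # Guided filtering performs the edge filtering on the mask
    # (radius=8, eps=1e-2 are illustrative values).
    return cv2.ximgproc.guidedFilter(guide, opened, 8, 1e-2)
```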
After the corrected image is obtained, the subject area of the first scene image can be fused with the corrected image to obtain the target image. Since the target image is obtained by fusing the subject area of the first scene image with the corrected second scene image, no distortion exists in the target image, and the image quality is higher.
Referring to FIG. 10, in some embodiments, the first scene image and the second scene image are acquired by different cameras, and the image processing method may further include:
013: acquiring at least one first feature point in the first scene image;
015: acquiring at least one second feature point in the second scene image;
017: matching the first feature points and the second feature points to obtain at least one feature point pair;
019: determining a mapping matrix according to the feature point pairs; and
021: aligning the first scene image and the second scene image according to the mapping matrix.
Referring to FIG. 2, in some embodiments, the one or more processors 320 are further configured to perform the methods in 013, 015, 017, 019, and 021. That is, the one or more processors 320 may further be configured to: acquire at least one first feature point in the first scene image; acquire at least one second feature point in the second scene image; match the first feature points and the second feature points to obtain at least one feature point pair; determine a mapping matrix according to the feature point pairs; and align the first scene image and the second scene image according to the mapping matrix.
When the first scene image and the second scene image are acquired by different cameras, in one example, the first scene image is acquired by the first camera 210 shown in FIG. 2 and the second scene image is acquired by the second camera 220 shown in FIG. 2. In another example, the electronic device shown in FIG. 2 may further include a third camera (not shown), and the third camera may include a third lens and a third image sensor. The first scene image may be obtained by, for example, the first camera 210, and the second scene image may be obtained by, for example, the third camera, where the second camera 220 may be used to form a binocular stereo vision system with the first camera 210 to obtain depth information.
After the first scene image and the second scene image are obtained, the first feature points in the first scene image and the second feature points in the second scene image can be identified; the number of first feature points may be one or more, and the number of second feature points may also be one or more. The first feature points and the second feature points may be matched to obtain one or more feature point pairs. The first feature point and the second feature point in each feature point pair indicate the same position on the photographed object. The mapping matrix between the first camera and the third camera can be determined according to the one or more feature point pairs, and the first scene image and the second scene image can be aligned according to the mapping matrix.
It can be understood that when the first scene image and the second scene image are obtained by different cameras, the fields of view of the different cameras do not overlap completely, so the first scene image and the second scene image contain an overlapping area and non-overlapping areas. Accordingly, the first scene image and the second scene image may be aligned to obtain an aligned first scene image and an aligned second scene image that overlap completely. In this way, the target image obtained by fusing the first scene image and the second scene image can have better image quality.
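As an illustration, the following sketch performs the feature matching and alignment of steps 013 to 021 with OpenCV, assuming ORB feature points and a RANSAC-estimated homography as the mapping matrix; the original does not name a specific detector or estimator.

```python
import cv2
import numpy as np

def align_images(img1, img2):
    """Match feature points between the first and second scene images,
    determine a mapping (homography) matrix from the feature point
    pairs, and warp the second image into alignment with the first."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(img1, None)   # first feature points
    k2, d2 = orb.detectAndCompute(img2, None)   # second feature points
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # mapping matrix
    h, w = img1.shape[:2]
    return cv2.warpPerspective(img2, H, (w, h))
```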
It should be noted that, in one example, the alignment of the first scene image and the second scene image may be performed before step 07, step 09, and step 011. In this case, step 07 processes the aligned first scene image to obtain the subject area and the background mask area; step 09 obtains, when image distortion is detected in the aligned second scene image, the transformation matrix corresponding to the aligned second scene image; step 011 performs correction processing on the aligned second scene image according to the transformation matrix to obtain the corrected image; and step 05 fuses the aligned and color-corrected subject area of the first scene with the aligned and distortion-corrected second scene image to obtain the target image. In another example, the alignment of the first scene image and the second scene image may be performed after step 07 and before step 09 and step 011; in this case, aligning the first scene image and the second scene image means aligning the distortion-corrected second scene image and the color-overflow-corrected first scene image, and step 05 fuses the aligned and color-corrected subject area of the first scene with the aligned and distortion-corrected second scene image to obtain the target image.
In the image acquisition method of the embodiments of the present application, different cameras are used to acquire the first scene image and the second scene image respectively, so the two images can be acquired at the same time and no acquisition time difference exists between the two frames; this avoids the ghosting problem that can occur during fusion when the acquisition times of the two frames differ.
Of course, in other embodiments, the first scene image and the second scene image may also be acquired by the same camera (for example, the first camera 210 or the second camera 220 shown in FIG. 2) in a time-sharing manner. When the same camera is used to acquire the first scene image and the second scene image, the two frames overlap completely; in this case, no alignment processing is required, and the amount of calculation can be reduced.
Referring to FIG. 15, the present application further includes an image processing apparatus 150. The image processing apparatus 150 includes a first acquisition module 1510, a second acquisition module 1512, and an image fusion module 1514. The first acquisition module 1510 is configured to acquire a first scene image, the first scene image including a subject area, the subject area being located in the depth-of-field area of the first scene image. The second acquisition module 1512 is configured to acquire a second scene image, the far depth of field of the second scene image being not greater than the near depth of field of the first scene image. The image fusion module 1514 is configured to fuse the subject area and the second scene image to obtain a target image.
The image processing apparatus 150 of the embodiments of the present application obtains the target image by fusing the subject area and the second scene image. Since the subject area of the first scene image is located in the depth-of-field area of the first scene image, it has high definition and presents a clear image to the user, while the far depth of field of the second scene image is not greater than the near depth of field of the first scene image, so the second scene image has lower definition and presents a blurred image to the user. That is, the composition combines a clear subject area with a blurred second scene image captured directly by the camera, which avoids relying on software algorithm processing to achieve background blurring, and the resulting target image has a good optical blurring effect.
In some embodiments, referring to FIG. 15, the image processing apparatus 150 may further include a processing module 1516, and the processing module 1516 is configured to process the first scene image to obtain the subject area and the background mask area. More specifically, the processing module 1516 may further be configured to process the first scene image through a subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image; obtain the color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and perform color correction on the color overflow pixels to obtain the subject area and the background mask area. Further, the processing module 1516 may also be configured to obtain the depth information of the first scene image; generate, according to the depth information, a center weight map corresponding to the first scene image, the weight values represented by the center weight map gradually decreasing from the center to the edges; input the first scene image and the center weight map into the subject recognition detection network to obtain the confidence map of the foreground subject area of the first scene image; and determine the initial foreground area and the initial background area according to the confidence map of the foreground subject area. Still further, the processing module 1516 may also be configured to expand a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the foreground area pixels within the correction area; correct the color overflow pixels according to the pixel values of the foreground area pixels within the correction area to obtain corrected pixels; and merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area. Still further, the processing module 1516 may also be configured to expand a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area including foreground area pixels and background area pixels; obtain the pixel values of the background area pixels within the correction area; correct the color overflow pixels according to the pixel values of the background area pixels within the correction area to obtain corrected pixels; and merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
In some embodiments, referring to FIG. 15, the image fusion module 1514 may further be configured to take the subject area as the target subject area; fuse the background mask area and the background area in the second scene image to obtain the target background area; and obtain the target image according to the target subject area and the target background area.
In some embodiments, referring to FIG. 15, the image processing apparatus 150 may further include a third acquisition module 1518 and a fourth acquisition module 1520. The third acquisition module 1518 is configured to obtain the transformation matrix corresponding to the second scene image when it is detected that image distortion exists in the second scene image. The fourth acquisition module 1520 is configured to perform correction processing on the second scene image according to the transformation matrix to obtain the corrected image. Furthermore, the image fusion module 1514 may also be configured to fuse the subject area and the corrected image to obtain the target image.
In some embodiments, referring to FIG. 15, the image processing apparatus 150 may further include a fifth acquisition module 1522, a sixth acquisition module 1524, a matching module 1526, a determining module 1528, and an alignment module 1530. The fifth acquisition module 1522 is configured to acquire at least one first feature point in the first scene image; the sixth acquisition module 1524 is configured to acquire at least one second feature point in the second scene image; the matching module 1526 is configured to match the first feature points and the second feature points to obtain at least one feature point pair; the determining module 1528 is configured to determine a mapping matrix according to the feature point pairs; and the alignment module 1530 is configured to align the first scene image and the second scene image according to the mapping matrix.
Referring to FIG. 16, an embodiment of the present application further provides a computer-readable storage medium 160 on which a computer program 162 is stored. When the program is executed by the processor 320, the steps of the image processing method of any of the above embodiments are implemented, for example, the methods in 01, 03, 05, 07, 071, 073, 075, 0711, 0713, 0715, 0717, 07511, 07513, 07515, 07517, 07521, 07523, 07525, 07527, 051, 053, 055, 09, 011, 057, 013, 015, 017, 019, and 021.
More specifically, for example, when the program is executed by the processor 320, the following steps of the image processing method may be implemented:
01: acquiring a first scene image, the first scene image including a subject area, the subject area being located in the depth-of-field area of the first scene image;
03: acquiring a second scene image, the far depth of field of the second scene image being not greater than the near depth of field of the first scene image; and
05: fusing the subject area and the second scene image to obtain a target image.
The computer-readable storage medium 160 may be disposed in the image processing apparatus 150 or the electronic device 200, or may be disposed in a cloud server; in the latter case, the image processing apparatus 150 or the electronic device 200 can communicate with the cloud server to obtain the corresponding computer program 162.
It can be understood that the computer program 162 includes computer program code. The computer program code may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), a software distribution medium, and the like.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "exemplary embodiment", "example", "specific example", "some examples", and the like mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples, provided they do not contradict each other.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which the functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, which should be understood by those skilled in the art to which the embodiments of the present application belong.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (22)

  1. An image processing method, comprising:
    acquiring a first scene image, wherein the first scene image comprises a subject area, and the subject area is located in a depth-of-field area of the first scene image;
    acquiring a second scene image, wherein a far depth of field of the second scene image is not greater than a near depth of field of the first scene image; and
    fusing the subject area and the second scene image to obtain a target image.
  2. The image processing method according to claim 1, further comprising:
    processing the first scene image to obtain the subject area and a background mask area.
  3. The image processing method according to claim 2, wherein the processing the first scene image to obtain the subject area and the background mask area comprises:
    processing the first scene image through a subject recognition detection network to obtain an initial foreground area and an initial background area of the first scene image;
    obtaining a color overflow area at a junction of the initial foreground area and the initial background area, the color overflow area comprising at least one color overflow pixel; and
    performing color correction on the color overflow pixels to obtain the subject area and the background mask area.
  4. The image processing method according to claim 3, wherein the processing the first scene image through the subject recognition detection network to obtain the initial foreground area and the initial background area of the first scene image comprises:
    obtaining depth information of the first scene image;
    generating, according to the depth information, a center weight map corresponding to the first scene image, wherein weight values represented by the center weight map gradually decrease from the center to the edges;
    inputting the first scene image and the center weight map into the subject recognition detection network to obtain a confidence map of a foreground area of the first scene image; and
    determining the initial foreground area and the initial background area according to the confidence map of the foreground area.
  5. The image processing method according to claim 4, wherein
    the depth information of the first scene image is obtained through a binocular vision system; and/or
    the depth information of the first scene image is obtained through a monocular vision system; and/or
    the depth information of the first scene image is obtained through a structured light camera module; and/or
    the depth information of the first scene image is obtained through a time-of-flight camera module.
  6. The image processing method according to claim 3, wherein the performing color correction on the color overflow pixels to obtain the subject area and the background mask area comprises:
    expanding a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area comprising foreground area pixels and background area pixels;
    obtaining pixel values of the foreground area pixels within the correction area;
    correcting the color overflow pixels according to the pixel values of the foreground area pixels within the correction area to obtain corrected pixels; and merging the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
  7. The image processing method according to claim 3, wherein the performing color correction on the color overflow pixels to obtain the subject area and the background mask area comprises:
    expanding a predetermined range centered on each color overflow pixel to obtain a correction area, the correction area comprising foreground area pixels and background area pixels;
    obtaining pixel values of the background area pixels within the correction area;
    correcting the color overflow pixels according to the pixel values of the background area pixels within the correction area to obtain corrected pixels; and merging the corrected pixels into the initial background area to obtain the subject area and the background mask area.
  8. The image processing method according to claim 2, wherein the second scene image is an optically blurred image, and a definition of the first scene image is higher than that of the second scene image;
    the fusing the subject area and the second scene image to obtain a target image comprises:
    taking the subject area as a target subject area; fusing the background mask area and a background area in the second scene image to obtain a target background area; and
    obtaining the target image according to the target subject area and the target background area.
  9. The image processing method according to any one of claims 1 to 8, further comprising:
    when it is detected that image distortion exists in the second scene image, acquiring a transformation matrix corresponding to the second scene image; and
    performing correction processing on the second scene image according to the transformation matrix to obtain a corrected image;
    wherein fusing the subject area and the second scene image to obtain the target image comprises:
    fusing the subject area and the corrected image to obtain the target image.
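The claims do not fix the type of transformation matrix. Assuming a 3x3 perspective matrix, the correction step might look like the following sketch; the matrix values are illustrative placeholders only.

```python
import cv2
import numpy as np

# Sketch of claim 9: correct a distorted second scene image with a
# transformation matrix. A 3x3 perspective matrix is an assumption here.
second = cv2.imread("second_scene.png")
h, w = second.shape[:2]

T = np.array([[1.0,  0.02, 0.0],   # illustrative values, not from the patent
              [0.01, 1.0,  0.0],
              [0.0,  0.0,  1.0]], dtype=np.float32)

corrected = cv2.warpPerspective(second, T, (w, h))
# The subject area is then fused with `corrected` instead of the raw image.
```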
  10. The image processing method according to any one of claims 1 to 9, wherein the first scene image and the second scene image are acquired by different cameras, the image processing method further comprising:
    acquiring at least one first feature point in the first scene image;
    acquiring at least one second feature point in the second scene image;
    matching the first feature point with the second feature point to obtain at least one feature point pair;
    determining a mapping matrix according to the feature point pair; and
    aligning the first scene image and the second scene image according to the mapping matrix.
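A common way to realize this alignment is ORB features, brute-force matching, and a RANSAC-estimated homography as the mapping matrix. The claim requires only feature points, matched pairs, and a mapping matrix, so the specific detector and estimator below are assumptions.

```python
import cv2
import numpy as np

# Sketch of claim 10: align two-camera images via feature-point pairs.
img1 = cv2.imread("first_scene.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("second_scene.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)   # first feature points
kp2, des2 = orb.detectAndCompute(img2, None)   # second feature points

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)

# Mapping matrix estimated robustly from the feature point pairs
M, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
aligned = cv2.warpPerspective(img2, M, (img1.shape[1], img1.shape[0]))
```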
  11. An electronic device, comprising:
    a memory; and
    one or more processors connected to the memory, the one or more processors being configured to:
    acquire a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image;
    acquire a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and
    fuse the subject area and the second scene image to obtain a target image.
  12. The electronic device according to claim 11, wherein the one or more processors are further configured to:
    process the first scene image to obtain the subject area and a background mask area.
  13. The electronic device according to claim 12, wherein the one or more processors are further configured to:
    process the first scene image through a subject recognition detection network to obtain an initial foreground area and an initial background area of the first scene image;
    obtain a color overflow area at the junction of the initial foreground area and the initial background area, the color overflow area including at least one color overflow pixel; and
    perform color correction on the color overflow pixel to obtain the subject area and the background mask area.
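The color overflow area of claim 13 is a thin band at the foreground/background junction. One hypothetical way to extract such a band is morphological dilation minus erosion of the initial foreground mask; the kernel size, and hence the band width, is an assumed choice rather than a value from the claims.

```python
import cv2
import numpy as np

# Sketch: locate the band of pixels around the foreground/background boundary.
fg = cv2.imread("initial_foreground_mask.png", cv2.IMREAD_GRAYSCALE)
_, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)

kernel = np.ones((5, 5), np.uint8)  # band width is an illustrative choice
dilated = cv2.dilate(fg, kernel)
eroded = cv2.erode(fg, kernel)

overflow_band = cv2.subtract(dilated, eroded)   # pixels near the boundary
overflow_pixels = np.argwhere(overflow_band > 0)  # candidate spill pixels
```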
  14. The electronic device according to claim 13, wherein the one or more processors are further configured to:
    acquire depth information of the first scene image;
    generate, according to the depth information, a center weight map corresponding to the first scene image, weight values represented by the center weight map decreasing gradually from the center to the edge;
    input the first scene image and the center weight map into the subject recognition detection network to obtain a confidence map of a foreground area of the first scene image; and
    determine the initial foreground area and the initial background area according to the confidence map of the foreground area.
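A minimal sketch of the center weight map of claim 14 follows, assuming a normalized Euclidean-distance falloff; the patent specifies only that weights decrease from center to edge, not the falloff function.

```python
import numpy as np

def center_weight_map(h, w):
    """Weights decrease gradually from the image center toward the edges.
    The Euclidean falloff is an assumption for the sketch."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    return 1.0 - dist / dist.max()  # 1.0 at the center, 0.0 at the corners

# The weight map would be stacked with the first scene image and fed to the
# subject recognition detection network to produce the confidence map.
weights = center_weight_map(480, 640)
```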
  15. The electronic device according to claim 14, further comprising:
    a binocular vision system configured to acquire the depth information of the first scene image; and/or
    a monocular vision system configured to acquire the depth information of the first scene image; and/or
    a structured light camera module configured to acquire the depth information of the first scene image; and/or
    a time-of-flight camera module configured to acquire the depth information of the first scene image.
  16. The electronic device according to claim 13, wherein the one or more processors are further configured to:
    expand a predetermined range centered on each of the color overflow pixels to obtain a correction area, the correction area comprising foreground area pixels and background area pixels;
    obtain pixel values of the foreground area pixels in the correction area;
    correct the color overflow pixels according to the pixel values of the foreground area pixels in the correction area to obtain corrected pixels; and
    merge the corrected pixels into the initial foreground area to obtain the subject area and the background mask area.
  17. The electronic device according to claim 13, wherein the one or more processors are further configured to:
    expand a predetermined range centered on each of the color overflow pixels to obtain a correction area, the correction area comprising foreground area pixels and background area pixels;
    obtain pixel values of the background area pixels in the correction area;
    correct the color overflow pixels according to the pixel values of the background area pixels in the correction area to obtain corrected pixels; and
    merge the corrected pixels into the initial background area to obtain the subject area and the background mask area.
  18. The electronic device according to claim 12, wherein the second scene image is an optically blurred image, and the definition of the first scene image is higher than that of the second scene image; and the one or more processors are further configured to:
    take the subject area as a target subject area;
    fuse the background mask area and a background area in the second scene image to obtain a target background area; and
    acquire the target image according to the target subject area and the target background area.
  19. The electronic device according to any one of claims 11 to 18, wherein the one or more processors are further configured to:
    when it is detected that image distortion exists in the second scene image, acquire a transformation matrix corresponding to the second scene image; and
    perform correction processing on the second scene image according to the transformation matrix to obtain a corrected image;
    wherein fusing the subject area and the second scene image to obtain the target image comprises:
    fusing the subject area and the corrected image to obtain the target image.
  20. The electronic device according to any one of claims 11 to 19, wherein the first scene image and the second scene image are acquired by different cameras, and the one or more processors are further configured to:
    acquire at least one first feature point in the first scene image;
    acquire at least one second feature point in the second scene image;
    match the first feature point with the second feature point to obtain at least one feature point pair;
    determine a mapping matrix according to the feature point pair; and
    align the first scene image and the second scene image according to the mapping matrix.
  21. An image processing apparatus, comprising:
    a first acquisition module configured to acquire a first scene image, the first scene image including a subject area, the subject area being located in a depth-of-field area of the first scene image;
    a second acquisition module configured to acquire a second scene image, a far depth of field of the second scene image being not greater than a near depth of field of the first scene image; and
    an image fusion module configured to fuse the subject area and the second scene image to obtain a target image.
  22. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
Priority Applications (1)

Application Number: PCT/CN2020/102502; Priority Date: 2020-07-16; Filing Date: 2020-07-16; Title: Image processing method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number: WO2022011657A1; Publication Date: 2022-01-20

Family ID: 79554436; Country Status: WO (1)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050047672A1 (en) * 2003-06-17 2005-03-03 Moshe Ben-Ezra Method for de-blurring images of moving objects
CN105847674A (en) * 2016-03-25 2016-08-10 维沃移动通信有限公司 Preview image processing method based on mobile terminal, and mobile terminal therein
CN106791416A (en) * 2016-12-29 2017-05-31 努比亚技术有限公司 A kind of background blurring image pickup method and terminal
CN107707809A (en) * 2017-08-17 2018-02-16 捷开通讯(深圳)有限公司 A kind of method, mobile device and the storage device of image virtualization
CN110428366A (en) * 2019-07-26 2019-11-08 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment, computer readable storage medium
CN110969595A (en) * 2019-12-02 2020-04-07 成都索贝数码科技股份有限公司 Pure-color background matting and synthesizing method based on real-time background color overflow inhibition
CN111212231A (en) * 2020-01-16 2020-05-29 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862735A (en) * 2022-05-23 2022-08-05 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium


Legal Events

Code  Description
121: EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20945041; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: EP: PCT application non-entry in European phase (Ref document number: 20945041; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1