CN117437258A - Image processing method, device, equipment and medium

Image processing method, device, equipment and medium

Info

Publication number
CN117437258A
Authority
CN
China
Prior art keywords
image
motion vector
depth information
determining
common
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210822681.4A
Other languages
Chinese (zh)
Inventor
王宝林
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202210822681.4A
Publication of CN117437258A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Abstract

Embodiments of the present disclosure relate to an image processing method, apparatus, device, and medium. The method includes: acquiring at least one image pair through a plurality of cameras, where each image pair includes a first image and a second image having a common-view region; determining a plurality of motion vectors corresponding to the common-view region from the image pair; determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras; and mapping the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the projection points. With this technical solution, the depth information of each point in the target coordinate system can be determined without a depth camera, which reduces hardware cost and allows fast three-dimensional reconstruction on a variety of hardware devices.

Description

Image processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, device, and medium.
Background
As an important research area in computer vision, three-dimensional reconstruction is widely used in hardware devices such as virtual reality devices and augmented reality devices.
In the related art, three-dimensional reconstruction schemes generate three-dimensional image information by fusing a depth camera with a multi-view camera; however, depth cameras are expensive, so reducing the hardware cost is a problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, the present disclosure provides an image processing method, apparatus, device, and medium.
An embodiment of the present disclosure provides an image processing method, including the following steps:
acquiring a corresponding plurality of images through a plurality of cameras, where the plurality of images includes at least one image pair, and each image pair includes a first image and a second image having a common-view region;
determining a plurality of motion vectors corresponding to the common-view region according to the image pair;
determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras;
mapping the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
An embodiment of the present disclosure further provides an image processing apparatus, including:
an acquisition module for acquiring a corresponding plurality of images through a plurality of cameras, where the plurality of images includes at least one image pair, and each image pair includes a first image and a second image having a common-view region;
a first determining module for determining a plurality of motion vectors corresponding to the common-view region according to the image pair;
a second determining module for determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras;
a generating module for mapping the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method provided by the embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium storing a computer program for executing the image processing method as provided by the embodiments of the present disclosure.
The disclosed embodiments also provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement an image processing method as provided by the disclosed embodiments.
Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages. According to the image processing scheme provided by the embodiments of the present disclosure, a plurality of motion vectors corresponding to the common-view region is generated from a first image and a second image having a common-view region, and depth information corresponding to each motion vector is determined based on the camera parameters of the multiple cameras. Further, the three-dimensional point corresponding to each motion vector in the world coordinate system is mapped to a preset target coordinate system to obtain a plurality of projection points, and the depth information of each point in the target coordinate system is generated according to the depth information corresponding to each motion vector and the plurality of projection points.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the disclosure;
fig. 2 is a flowchart of another image processing method according to an embodiment of the disclosure;
fig. 3 is a flowchart illustrating another image processing method according to an embodiment of the disclosure;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be more thoroughly and completely understood. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
The embodiments of the present disclosure provide an image processing method, which is described below with reference to specific embodiments.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure. The method may be performed by an image processing apparatus, which may be implemented in software and/or hardware and may generally be integrated in an electronic device. As shown in Fig. 1, the method includes:
step 101, acquiring a plurality of corresponding images through a plurality of cameras, wherein the plurality of images comprise at least one group of image pairs, and the image pairs comprise a first image and a second image with a common view area.
In this embodiment, the plurality of images is acquired by a multi-view camera. Optionally, the multi-view camera includes a plurality of cameras, each of which acquires one image, and the cameras are arranged such that a common-view region exists among the acquired images.
The multi-view camera may be a binocular camera, or three or more cameras may be used. Taking a binocular camera as an example, image one and image two acquired by the binocular camera share a common-view region. Taking a trinocular camera as another example, image one, image two, and image three are acquired; a common-view region exists among the three images, and image one and image two, image one and image three, and image two and image three each form an image pair.
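The pairing of per-camera images can be sketched as follows. This is a minimal illustration assuming every pair of cameras overlaps; the text only requires that each image pair share a common-view region.

```python
# A minimal sketch: form every unordered pair of images, one per camera.
# Assumes all cameras overlap pairwise; real rigs may pair only neighbours.
from itertools import combinations

def make_image_pairs(images):
    """images: list of per-camera images (numpy arrays). Returns all pairs:
    one pair for a binocular rig, three pairs for a trinocular rig,
    matching the example above."""
    return list(combinations(images, 2))
```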
Step 102, determining a plurality of motion vectors corresponding to the common-view region according to the image pair.
In this embodiment, there are various ways to generate the plurality of motion vectors corresponding to the common-view region from the first image and the second image. Optionally, a hardware processing module may be invoked to process the first image and the second image to generate the motion vectors, where the hardware processing module may be a video processing module provided in the device hardware or a vision processing hardware module on a chip. Alternatively, the first image and the second image may be processed by feature extraction and matching to generate the motion vectors.
A motion vector is the vector connecting the pixel positions of the same feature in the two images.
As an example, features are extracted for the pixel points in the common-view region of the first image and the second image. If pixel point P in the first image corresponds to feature one, and feature one corresponds to pixel point P' in the second image, then a motion vector can be generated from pixel point P to pixel point P'. In this way, a plurality of motion vectors corresponding to the common-view region can be generated.
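A feature-extraction-and-matching variant of this step can be sketched with OpenCV as follows. ORB features and brute-force matching stand in for the hardware processing module described above; they are an assumption, not the module the text specifies.

```python
# A hedged sketch of step 102 via feature matching (a stand-in for the
# hardware processing module). Assumes grayscale input images.
import cv2
import numpy as np

def motion_vectors(img1, img2):
    """Return an (N, 2, 2) array: one (start, end) pixel pair per motion
    vector, running from pixel P in image one to pixel P' in image two."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    return np.array([(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt)
                     for m in matches])
```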
Step 103, determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras.
In this embodiment, each motion vector is triangulated using the camera parameters to generate the depth information of the motion vector. The camera parameters include intrinsic parameters, extrinsic parameters, and distortion parameters.
Taking a binocular camera as an example, the positional relationship between the two cameras can be determined from the camera parameters of the two cameras, and for each motion vector, the corresponding depth information can be determined from the camera parameters of the two cameras and the motion vector by the principle of similar triangles.
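For a rectified binocular pair, the similar-triangles relation reduces to Z = f * B / d, with f the focal length in pixels, B the baseline, and d the disparity. The sketch below assumes rectified images, so the disparity is the horizontal component of the motion vector; unrectified setups would triangulate with the full extrinsics instead.

```python
# A minimal sketch of step 103 for a rectified binocular pair.
import numpy as np

def depth_from_vectors(vectors, focal_px, baseline_m):
    """vectors: (N, 2, 2) (start, end) pixel pairs; returns one depth
    value (metres) per motion vector via Z = f * B / d."""
    disparity = np.abs(vectors[:, 0, 0] - vectors[:, 1, 0])
    disparity = np.where(disparity > 1e-6, disparity, np.nan)  # guard d=0
    return focal_px * baseline_m / disparity
```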
Step 104, mapping the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
In this embodiment, for each motion vector, the corresponding three-dimensional point is determined in the world coordinate system. Optionally, taking one of the cameras as a reference, the coordinates of each motion vector in that camera's coordinate system can be determined and, combined with the depth information of the motion vector, the corresponding three-dimensional point in the world coordinate system can be determined; in this step, the depth information corresponding to the motion vector is the depth information of the three-dimensional point. The three-dimensional points are then mapped into the target coordinate system according to the transformation between the world coordinate system and the target coordinate system to obtain a plurality of projection points, where the depth information of a projection point is the depth information of its corresponding three-dimensional point, and the depth information of each point in the target coordinate system is determined from the depth information of the plurality of projection points.
The target coordinate system may be a coordinate system oriented along a required direction. For example, when a virtual reality device is worn on the user's head, the target coordinate system may take the center of the user's head as the origin, with the forward direction as the positive z-axis, the rightward direction as the positive x-axis, and the upward direction as the positive y-axis.
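The world-to-target mapping is a rigid transform. A minimal sketch, assuming the rotation R_tw and translation t_tw from the world frame to the head-centred target frame are known from device tracking (they are not specified in the text):

```python
# A sketch of the mapping in step 104: world frame -> target frame.
import numpy as np

def project_to_target(points_world, R_tw, t_tw):
    """points_world: (N, 3); R_tw: (3, 3); t_tw: (3,). Returns the points
    in the target frame; their z component is the depth along the target
    frame's forward (+z) axis."""
    points_target = points_world @ R_tw.T + t_tw
    return points_target, points_target[:, 2]
```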
The method of the embodiments of the present disclosure can be applied to a virtual reality (VR) device or an augmented reality (AR) device provided with a plurality of cameras, or to a platform with a corresponding hardware configuration. In this embodiment, by generating the depth information of each point in the target coordinate system, depth map information corresponding to the two-dimensional images acquired by the cameras is determined in the target coordinate system, and this depth map information can be used for three-dimensional scene reconstruction.
According to the technical solution of the embodiments of the present disclosure, a plurality of motion vectors corresponding to the common-view region is generated from the first image and the second image of the common-view region, and depth information corresponding to each motion vector is determined based on the camera parameters of the multiple cameras. Further, the three-dimensional point corresponding to each motion vector in the world coordinate system is mapped to a preset target coordinate system to obtain a plurality of projection points, and the depth information of each point in the target coordinate system is generated according to the depth information corresponding to each motion vector and the plurality of projection points. The depth information of each point in the target coordinate system is thus determined through the motion vectors among the plurality of images without using a depth camera, which reduces hardware cost, allows the method to be applied to fast three-dimensional reconstruction on a variety of hardware devices, and preserves the real-time performance of three-dimensional reconstruction.
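Putting steps 101 to 104 together, a hedged end-to-end sketch built from the helper functions above (rectified stereo assumed; the intrinsic matrix K, baseline, and world-to-target transform are assumed known from calibration and tracking):

```python
# An end-to-end sketch of steps 101-104 for one rectified image pair.
import numpy as np

def depth_points_in_target(img1, img2, K, baseline_m, R_tw, t_tw):
    focal_px = K[0, 0]                        # K: (3, 3) intrinsic matrix
    vecs = motion_vectors(img1, img2)         # step 102
    depth = depth_from_vectors(vecs, focal_px, baseline_m)     # step 103
    # Back-project each start pixel into the reference camera's frame,
    # taken as the world frame in this sketch.
    px = np.concatenate([vecs[:, 0, :], np.ones((len(vecs), 1))], axis=1)
    pts_world = (np.linalg.inv(K) @ px.T).T * depth[:, None]
    _, target_depth = project_to_target(pts_world, R_tw, t_tw)  # step 104
    return target_depth
```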
Based on the above embodiments, the processing procedure of the motion vector will be described below.
Fig. 2 is a flowchart of another image processing method according to an embodiment of the disclosure, as shown in fig. 2, where the method includes:
step 201, acquiring a plurality of corresponding images through a plurality of cameras, wherein the plurality of images comprise at least one group of image pairs, and wherein the image pairs comprise a first image and a second image with a common view area.
The explanation of step 101 in the previous embodiment applies equally to step 201.
Step 202, invoking a hardware processing module to process the first image and the second image to generate a plurality of candidate motion vectors corresponding to the common-view region, and removing candidate motion vectors that do not satisfy a preset condition to generate a plurality of motion vectors corresponding to the common-view region, and/or performing accuracy-improvement processing on the plurality of motion vectors.
In this embodiment, the hardware processing module may be invoked to process the first image and the second image to generate a plurality of candidate motion vectors. For example, for VR devices and AR devices, the hardware processing module may be a video processing module provided in the device hardware, or a vision processing hardware module on a chip.
Next, candidate motion vectors that do not satisfy the preset condition are removed.
In one embodiment of the present disclosure, the direction of each candidate motion vector is matched against a specified direction, and the candidate motion vectors whose directions are consistent with the specified direction are determined as the plurality of motion vectors corresponding to the common-view region.
In this embodiment, the direction of a correctly matched motion vector between the first image and the second image is known in advance; therefore, a specified direction can be set, and the direction of each candidate motion vector can be matched against it to reject mismatched candidates. As an example, with the first image as reference and the specified direction known to be leftward, any candidate motion vector whose direction is rightward is rejected, and the leftward candidates are kept as the motion vectors corresponding to the common-view region.
In one embodiment of the present disclosure, the first modulus value of each candidate motion vector is matched against a preset modulus range, and the candidate motion vectors whose first modulus values fall within the preset range are determined as the plurality of motion vectors corresponding to the common-view region.
In this embodiment, when the candidate motion vectors are generated, the modulus value of each candidate can be obtained. Since the modulus of a motion vector corresponds to the depth of the scene captured by the cameras, a preset modulus range can be set to reject candidates that fall outside it. As an example, a first threshold and a second threshold are set, where the first threshold is smaller than the second. Candidates with a modulus smaller than the first threshold or larger than the second threshold are rejected, and candidates with a modulus between the two thresholds (inclusive) are kept as the motion vectors corresponding to the common-view region.
In one embodiment of the present disclosure, the first modulus value of each candidate motion vector and the second modulus value of the contrast motion vector corresponding to each candidate are obtained, and the candidates whose first modulus value equals the second modulus value are determined as the plurality of motion vectors corresponding to the common-view region.
The candidate motion vectors and the contrast motion vectors are obtained by taking the first image and the second image, respectively, as the reference.
In this embodiment, when the hardware processing module is invoked to process the first image and the second image, candidate motion vectors are extracted with the first image as reference and contrast motion vectors are extracted with the second image as reference. The two sets of vectors point in opposite directions, but since the modulus of a motion vector corresponds to scene depth, candidates whose first modulus value differs from the corresponding second modulus value can be rejected. "The first modulus value equals the second modulus value" may mean strict equality, or that the difference between the two values falls within a preset range.
It should be noted that these ways of rejecting candidate motion vectors that do not satisfy the preset condition may be applied alone or in combination, which is not limited here. Rejecting such candidates improves the accuracy of the motion vectors generated for the common-view region, which in turn improves the accuracy of the three-dimensional points and of the depth information. The three rejection rules are illustrated together in the sketch below.
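A combined sketch of the three rejection rules; the thresholds and the specified direction used here are illustrative assumptions:

```python
# Filtering candidate motion vectors: direction check, modulus-range
# check, and forward/backward modulus consistency. Thresholds are assumed.
import numpy as np

def keep_valid(vectors, back_vectors, d_min=1.0, d_max=64.0, tol=0.5):
    """vectors: (N, 2, 2) candidates (first image as reference);
    back_vectors: (N, 2, 2) corresponding contrast vectors (second image
    as reference)."""
    disp = vectors[:, 1, :] - vectors[:, 0, :]
    back = back_vectors[:, 1, :] - back_vectors[:, 0, :]
    norm = np.linalg.norm(disp, axis=1)
    back_norm = np.linalg.norm(back, axis=1)
    ok_dir = disp[:, 0] < 0                     # assumed direction: leftward
    ok_mod = (norm >= d_min) & (norm <= d_max)  # preset modulus range
    ok_fb = np.abs(norm - back_norm) <= tol     # first vs second modulus
    return vectors[ok_dir & ok_mod & ok_fb]
```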
Next, the accuracy-improvement processing of the motion vectors is described.
In one embodiment of the present disclosure, sub-pixel-level offset processing is performed on the start position and the end position corresponding to a motion vector to generate a plurality of candidate start positions and a plurality of candidate end positions; a score between each candidate start position and each candidate end position is determined by block matching, and the start and end positions of the motion vector are updated to the highest-scoring candidate start and end positions. A candidate start position lies between the start position and its adjacent positions, a candidate end position lies between the end position and its adjacent positions, and an adjacent position is the position of a pixel adjacent to the start or end position.
In this embodiment, with the first image as reference, the start position of a motion vector is determined from the first image and the end position from the second image. For each motion vector, sub-pixel-level offset processing is performed on the start position to generate a plurality of candidate start positions and on the end position to generate a plurality of candidate end positions; the score between each candidate start position and each candidate end position is determined by block matching; a target score satisfying a preset condition is determined from the resulting scores; and the candidate start and end positions corresponding to the target score are taken as the start and end positions of the motion vector.
As an example, for each motion vector, N candidate start positions and M candidate end positions are generated by sub-pixel-level offset processing; for instance, for the start position (1, 1), candidate start positions such as (0.8, 1), (1, 1.2), (1, 0.8), and (1.2, 1) are generated. Then N×M scores are determined by block matching, the highest score is found among them, and the corresponding candidate start and end positions are taken as the start and end positions of the motion vector. For pixel points belonging to distant parts of the scene, the accuracy of the motion vector is low; in this example, obtaining sub-pixel-level positions for the motion vectors improves the accuracy of the generated depth information for scenes beyond a certain distance.
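A sketch of this refinement: fractional offsets around the start and end points are scored by block matching over a small patch, and the best pair is kept. The 0.2-pixel step and 8x8 patch are illustrative assumptions, and a lower sum of squared differences is treated as a higher score.

```python
# Sub-pixel refinement of one motion vector by exhaustive block matching
# over fractional candidate positions. Assumes grayscale images.
import cv2
import numpy as np

OFFSETS = [(0.0, 0.0), (-0.2, 0.0), (0.2, 0.0), (0.0, -0.2), (0.0, 0.2)]

def refine_subpixel(img1, img2, start, end, patch=(8, 8)):
    cands_s = [(start[0] + dx, start[1] + dy) for dx, dy in OFFSETS]
    cands_e = [(end[0] + dx, end[1] + dy) for dx, dy in OFFSETS]
    best_ssd, best_pair = np.inf, (start, end)
    for s in cands_s:
        p1 = cv2.getRectSubPix(img1, patch, s)  # bilinear sub-pixel patch
        for e in cands_e:
            p2 = cv2.getRectSubPix(img2, patch, e)
            ssd = float(np.sum((p1.astype(np.float32)
                                - p2.astype(np.float32)) ** 2))
            if ssd < best_ssd:                  # lower SSD = higher score
                best_ssd, best_pair = ssd, (s, e)
    return best_pair                            # updated (start, end)
```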
In one embodiment of the present disclosure, a plurality of adjacent positions of one of the start position and the end position corresponding to a motion vector is determined; a plurality of scores between the other of the two positions and the plurality of adjacent positions is determined; and an offset of the chosen position is determined from the plurality of scores, and that position is corrected according to the offset.
In this embodiment, with the first image as reference, the start position of a motion vector is determined from the first image and the end position from the second image. For each motion vector, a plurality of positions adjacent to the start position may be determined and the scores between those positions and the end position computed, or a plurality of positions adjacent to the end position may be determined and the scores between those positions and the start position computed, where an adjacent position is the position of an adjacent pixel. The offset of the start or end position is then determined from the scores by parabolic interpolation, and the position is corrected according to the offset.
As an example, for each motion vector, the adjacent pixels in the four directions above, below, left of, and right of the start position are determined, and the positions of these four pixels are taken as the adjacent positions. The score between the start position and the end position and the scores between the four adjacent positions and the end position are determined by block matching; the offset of the start position is determined from these five scores by parabolic interpolation; and the start position is corrected according to the offset. For pixel points belonging to distant parts of the scene, the accuracy of the motion vector is low; in this example, correcting the start or end position of the motion vector by parabolic interpolation improves the accuracy of the generated depth information for scenes beyond a certain distance.
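The one-dimensional parabolic correction can be written out directly: fitting a parabola through the scores at a position and its two neighbours along one axis puts the peak at a fractional offset. A minimal sketch, applied once per axis:

```python
# Parabolic (three-point) interpolation of the score peak along one axis.
# s_left, s_center, s_right are block-matching scores at x-1, x, x+1;
# the returned offset lies in [-0.5, 0.5] and is added to the position.
def parabolic_offset(s_left, s_center, s_right):
    denom = s_left - 2.0 * s_center + s_right
    if abs(denom) < 1e-12:      # flat scores: no correction
        return 0.0
    return 0.5 * (s_left - s_right) / denom

# Applied with the left/right scores for x and the up/down scores for y,
# the two offsets move the start position off the integer pixel grid.
```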
In step 203, depth information corresponding to each motion vector is determined based on camera parameters of the plurality of cameras.
The explanation of step 103 in the previous embodiment applies equally to step 203.
Step 204, determining a depth threshold according to the binocular baseline length of the cameras corresponding to the image pair, and removing, from the depth information of the plurality of motion vectors, depth information greater than the depth threshold.
In this embodiment, for the first image, the second image, and the two corresponding cameras, the binocular baseline length between the two cameras can be obtained, where the binocular baseline length indicates the distance between the two cameras. The baseline length affects ranging accuracy: for a fixed baseline, the farther the scene, the lower the ranging accuracy. A mapping between baseline length and depth threshold can therefore be preset, and the depth threshold currently to be used is determined from the baseline length between the two cameras and this mapping. Depth information greater than the current depth threshold, together with the corresponding motion vectors, is then removed from the depth information determined in the preceding steps, so that low-accuracy depth information is discarded and accuracy is further improved.
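A hedged sketch of this cut-off; the proportional mapping used here is an illustrative assumption, since the text only requires some preset mapping from baseline length to depth threshold:

```python
# Baseline-dependent rejection of far (low-accuracy) depths.
import numpy as np

def reject_far_depths(depths, baseline_m, k=100.0):
    """k is an assumed proportionality constant, e.g. a 0.1 m baseline
    gives a 10 m cut-off. Returns kept depths and the keep mask so the
    caller can drop the corresponding motion vectors as well."""
    keep = depths <= k * baseline_m
    return depths[keep], keep
```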
In step 205, the three-dimensional points corresponding to each motion vector in the world coordinate system are mapped to a preset target coordinate system to obtain a plurality of projection points, and the depth information of each point in the target coordinate system is generated according to the depth information corresponding to each motion vector and the plurality of projection points.
In this embodiment, for each motion vector processed in the preceding steps, the corresponding three-dimensional point is determined in the world coordinate system. The three-dimensional points are then mapped to the target coordinate system according to the transformation between the world coordinate system and the target coordinate system to obtain a plurality of projection points, where the depth information of a projection point is the depth information of its corresponding three-dimensional point. Compensation processing is performed according to the depth information of the plurality of projection points, and the depth information of each point in the target coordinate system is determined. The target coordinate system may be a coordinate system oriented along a required direction.
In the embodiments of the present disclosure, invoking the hardware processing module to acquire the motion vectors among the plurality of images gives low power consumption and high real-time performance. By rejecting candidate motion vectors that do not satisfy the preset condition and performing accuracy-improvement processing on the motion vectors, the accuracy of the three-dimensional points and of the depth information is further improved while real-time performance is preserved.
Based on the above-described embodiments, a description is given below of a depth information compensation process based on a plurality of cameras.
Fig. 3 is a flowchart of another image processing method according to an embodiment of the disclosure, as shown in fig. 3, where the method includes:
step 301, acquiring a corresponding plurality of images through a plurality of cameras, wherein the plurality of images comprise at least one group of image pairs, and wherein the image pairs comprise a first image and a second image with a common view area.
In this embodiment, the plurality of images is acquired by the multi-view camera, and image preprocessing is performed on the images. Optionally, image preprocessing includes, but is not limited to, image filtering, image contrast adjustment, image anti-distortion operations, image alignment, and image stitching.
Step 302, acquiring an image type of an image pair, and determining a feature type corresponding to the image type, wherein the feature type comprises a point feature and a line feature.
In this embodiment, image types may be preset. The preset image types include a first type and a second type, where the first type corresponds to line features and the second type corresponds to point features; the first type covers, for example, scenes containing window edges, door frames, and the like. Optionally, an image classification model is trained based on a neural network, with an image as input and an image type as output, and the image type of the image pair is determined by the trained image classification model.
Step 303, invoking a hardware processing module to extract, for each image block in the common-view region, the target features matching the feature type, and generating a plurality of motion vectors according to the target features of the plurality of image blocks.
In this embodiment, an image block consists of the pixel points in a certain region of an image. For example, if an image is divided into blocks of size 8×8, an image of size 640×480 is divided into 80×60 image blocks; features are extracted for each image block, and 80×60 motion vectors are generated correspondingly, as shown in the sketch below.
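A small sketch of this block layout (grayscale input assumed):

```python
# Divide an image into fixed-size blocks: 640x480 with 8x8 blocks gives
# an 80x60 grid, one motion vector per block.
import numpy as np

def split_blocks(image, block=8):
    h, w = image.shape[:2]
    rows, cols = h // block, w // block      # 480//8 = 60, 640//8 = 80
    return (image[:rows * block, :cols * block]
            .reshape(rows, block, cols, block)
            .swapaxes(1, 2))                 # shape (rows, cols, 8, 8)
```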
As an example, when the image type of the image pair is the first type, the line features of each image block in the common-view region are extracted, and the motion vector corresponding to each image block is generated based on its line features. By introducing line features, the feature-extraction effect is improved for scenes such as window edges and door frames, which further improves the accuracy of the depth information of each point in the target coordinate system.
Step 304, determining depth information corresponding to each motion vector based on camera parameters of a plurality of cameras, and mapping three-dimensional points corresponding to each motion vector under a world coordinate system to a preset target coordinate system to obtain a plurality of projection points.
The explanation of step 104 in the previous embodiment applies equally to this step.
In step 305, depth information of each projection point is determined according to the depth information corresponding to each motion vector.
In an embodiment of the present disclosure, taking the first image and the second image as an example, several three-dimensional points may correspond to the same projection point. Optionally, at least one three-dimensional point corresponding to each projection point is determined, and weighting is performed on the depth information of the at least one three-dimensional point to generate the depth information of each projection point.
In one embodiment of the present disclosure, the multi-view camera includes three or more cameras and there are three or more image pairs, and determining the depth information of each projection point according to the depth information corresponding to each motion vector includes: determining at least one three-dimensional point corresponding to each projection point, and performing weighting on the depth information of the at least one three-dimensional point to generate the depth information of each projection point.
In this embodiment, for each image pair, the depth information of each projection point is determined according to the depth information corresponding to each motion vector. When there are multiple image pairs, each projection point may correspond to multiple three-dimensional points under different image pairs, that is, multiple three-dimensional points may map to the same projection point. Optionally, the at least one three-dimensional point corresponding to each projection point is determined, and a weighted average is taken over the depth information of those three-dimensional points to generate the depth information of each projection point.
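A sketch of this fusion; equal weights are used as a stand-in, since the text leaves the weighting unspecified:

```python
# Fuse the depths of all 3-D points that land on the same projection
# point (possibly from different image pairs). Equal-weight averaging
# is an assumption.
import numpy as np
from collections import defaultdict

def fuse_depths(point_ids, depths):
    """point_ids: (N,) index of the projection point hit by each 3-D
    point; depths: (N,) target-frame depths. Returns {id: fused depth}."""
    buckets = defaultdict(list)
    for pid, d in zip(point_ids, depths):
        buckets[pid].append(d)
    return {pid: float(np.mean(ds)) for pid, ds in buckets.items()}
```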
Step 306, performing compensation processing according to the depth information of each projection point to generate the depth information of all points in the target coordinate system.
In this embodiment, a compensation algorithm performs compensation processing according to the depth information of each projection point to generate the depth information of the remaining points in the target coordinate system other than the plurality of projection points. Optionally, the compensation algorithm may employ Poisson reconstruction: for a point with unknown depth in the target coordinate system, the surrounding known depth information is used to determine the depth of that point.
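A hedged stand-in for this step: the simple iterative diffusion fill below propagates known projection-point depths into unknown points, illustrating the idea of filling each unknown point from surrounding known depths. It is not the Poisson reconstruction the text names, and np.roll wraps at the image border, which a real implementation would handle explicitly.

```python
# Fill unknown depths by repeatedly averaging known neighbours, keeping
# known projection-point depths fixed. A crude stand-in for Poisson
# reconstruction.
import numpy as np

def fill_depth(depth, known, iters=200):
    """depth: (H, W) depth map, valid where known == True."""
    filled = np.where(known, depth, 0.0).astype(np.float64)
    mask = known.astype(np.float64)
    for _ in range(iters):
        num = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0)
               + np.roll(filled, 1, 1) + np.roll(filled, -1, 1))
        den = (np.roll(mask, 1, 0) + np.roll(mask, -1, 0)
               + np.roll(mask, 1, 1) + np.roll(mask, -1, 1))
        avg = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        filled = np.where(known, depth, avg)   # known depths stay fixed
        mask = np.maximum(mask, (den > 0).astype(np.float64))
    return filled
```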
As an example, four cameras are arranged at the upper left, upper right, lower left, and lower right, and a common-view region exists among the first, second, third, and fourth images they capture. A plurality of image pairs is determined from the four images; for each image pair, the depth information of the projection points in the target coordinate system is determined; and compensation processing is performed according to the depth information of each projection point to generate the depth information of all points in the target coordinate system.
The accuracy-improvement steps of the foregoing embodiment apply equally to this embodiment, which is not limited here.
In the embodiments of the present disclosure, based on the established target coordinate system, depth information for points in the target coordinate system can be acquired separately under different camera view angles, and the depth information of each point is determined from the depth information under these different view angles, further improving its accuracy. Moreover, generating the motion vectors from the target features of the image blocks and determining the depth information of each point in the target coordinate system by compensation improves processing efficiency and real-time performance while accurately generating depth information covering every point in the target coordinate system.
Fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated into an electronic device for image processing. As shown in fig. 4, the image processing apparatus includes: the system comprises an acquisition module 41, a first determination module 42, a second determination module 43 and a generation module 44.
The acquiring module 41 is configured to acquire a plurality of images through a plurality of cameras, where the plurality of images includes at least one group of image pairs, and the image pairs include a first image and a second image that have a common view area.
A first determining module 42 is configured to determine a plurality of motion vectors corresponding to the common-view region according to the image pair.
A second determining module 43, configured to determine depth information corresponding to each motion vector based on camera parameters of the multi-view camera.
The generating module 44 is configured to map three-dimensional points corresponding to each motion vector in a world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generate depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
In one embodiment of the present disclosure, the first determining module 42 is specifically configured to: invoke a hardware processing module to process the first image and the second image and generate a plurality of candidate motion vectors corresponding to the common-view region; match the direction of each candidate motion vector against a specified direction, and determine the candidates whose directions are consistent with the specified direction as the plurality of motion vectors corresponding to the common-view region; and/or match the first modulus value of each candidate motion vector against a preset modulus range, and determine the candidates whose first modulus values fall within the preset range as the plurality of motion vectors corresponding to the common-view region; and/or obtain the first modulus value of each candidate motion vector and the second modulus value of the contrast motion vector corresponding to each candidate, and determine the candidates whose first modulus value equals the second modulus value as the plurality of motion vectors corresponding to the common-view region, where the candidate motion vectors and the contrast motion vectors are obtained by taking the first image and the second image, respectively, as the reference.
In one embodiment of the present disclosure, the common-view region includes a plurality of image blocks, and the first determining module 42 is specifically configured to: acquire the image type of the image pair and determine the feature type corresponding to the image type, where the feature types include point features and line features; and invoke a hardware processing module to extract, for each image block in the common-view region, the target features matching the feature type, and generate a plurality of motion vectors according to the target features of the plurality of image blocks.
In one embodiment of the present disclosure, the apparatus further comprises: the updating module is used for respectively carrying out offset processing on the starting point position and the end point position corresponding to the motion vector to generate a plurality of candidate starting point positions and a plurality of candidate end point positions; and determining the score between each candidate starting point position and each candidate end point position in a block matching mode, and updating the starting point position and the end point position corresponding to the motion vector according to the candidate starting point position and the candidate end point position with the highest score.
In one embodiment of the present disclosure, the apparatus further comprises: a correction module, configured to determine a plurality of adjacent positions of any one of a start position and an end position corresponding to the motion vector; determining a plurality of scores between the other of the start position and the end position and the plurality of adjacent positions; and determining the offset of any one of the starting point position and the end point position according to the scores, and correcting any one of the starting point position and the end point position according to the offset.
In one embodiment of the present disclosure, the apparatus further comprises: the rejecting module is used for determining a depth threshold according to the binocular baseline length of the camera corresponding to the image pair; and removing depth information greater than the depth threshold from the depth information of the plurality of motion vectors.
In one embodiment of the present disclosure, the generating module 44 is specifically configured to: determine the depth information of each projection point according to the depth information corresponding to each motion vector; and perform compensation processing according to the depth information of each projection point to generate the depth information of the remaining points in the target coordinate system other than the plurality of projection points.
In one embodiment of the present disclosure, the multi-view camera includes three or more cameras, and the generating module 44 is specifically configured to: determine at least one three-dimensional point corresponding to each projection point; and perform weighting on the depth information of the at least one three-dimensional point to generate the depth information of each projection point.
The image processing device provided by the embodiment of the disclosure can execute the image processing method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
To implement the above embodiments, the present disclosure also proposes a computer program product including a computer program/instructions which, when executed by a processor, implement the image processing method of the above embodiments.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Referring now in particular to fig. 5, a schematic diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 500 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in Fig. 5, the electronic device 500 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. When the computer program is executed by the processing apparatus 501, the above-described functions defined in the image processing method of the embodiment of the present disclosure are performed.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a corresponding plurality of images by a plurality of cameras, wherein the plurality of images comprise at least one group of image pairs, wherein the image pairs comprise a first image and a second image with a common view area; determining a plurality of motion vectors corresponding to the common-view region according to the image pair; determining depth information corresponding to each motion vector based on camera parameters of the multi-view camera; mapping the three-dimensional points corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, the present disclosure provides an image processing method including: acquiring a corresponding plurality of images by a plurality of cameras, wherein the plurality of images comprise at least one group of image pairs, wherein the image pairs comprise a first image and a second image with a common view area; determining a plurality of motion vectors corresponding to the common-view region according to the image pair; determining depth information corresponding to each motion vector based on camera parameters of the multi-view camera; mapping the three-dimensional points corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, determining the plurality of motion vectors corresponding to the common-view region according to the image pair includes: invoking a hardware processing module to process the first image and the second image and generate a plurality of candidate motion vectors corresponding to the common-view region; matching the direction of each candidate motion vector against a specified direction, and determining the candidate motion vectors whose directions are consistent with the specified direction as the plurality of motion vectors corresponding to the common-view region; and/or matching the first modulus of each candidate motion vector against a preset modulus range, and determining the candidate motion vectors whose moduli fall within the preset modulus range as the plurality of motion vectors corresponding to the common-view region; and/or obtaining the first modulus of each candidate motion vector and the second modulus of the comparison motion vector corresponding to each candidate motion vector, and determining the candidate motion vectors whose first modulus equals the second modulus as the plurality of motion vectors corresponding to the common-view region, wherein the candidate motion vectors and the comparison motion vectors are obtained by taking the first image and the second image, respectively, as the reference.
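As a non-authoritative sketch of the three screening criteria just described (direction match, modulus range, and agreement between a candidate and its comparison vector), the following Python snippet keeps only candidates that pass all three checks. The angle and modulus tolerances are hypothetical knobs, and strict equality of the two moduli is relaxed to a small tolerance, as a practical implementation would require.

    import numpy as np

    def filter_motion_vectors(cands, comps, direction,
                              angle_tol_deg=5.0,
                              mod_range=(1.0, 64.0),
                              mod_tol=0.5):
        # cands, comps: (N, 2) arrays of (dx, dy); comps are the comparison
        # vectors computed with the roles of the two images swapped.
        # direction: unit-length (dx, dy) giving the specified direction.
        mods = np.linalg.norm(cands, axis=1)
        comp_mods = np.linalg.norm(comps, axis=1)

        # 1) direction: angle to the specified direction within tolerance
        unit = cands / np.maximum(mods[:, None], 1e-9)
        dir_ok = unit @ np.asarray(direction) >= np.cos(np.deg2rad(angle_tol_deg))

        # 2) modulus: magnitude inside the preset range
        mod_ok = (mods >= mod_range[0]) & (mods <= mod_range[1])

        # 3) forward/backward: the two moduli agree within a tolerance
        fb_ok = np.abs(mods - comp_mods) <= mod_tol

        return cands[dir_ok & mod_ok & fb_ok]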
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the common-view region includes a plurality of image blocks, and determining the plurality of motion vectors corresponding to the common-view region according to the image pair includes: acquiring the image type of the image pair and determining the feature type corresponding to the image type, wherein the feature type includes point features and line features; and extracting, for each image block in the common-view region, target features matching the feature type, and generating the plurality of motion vectors according to the target features of the image blocks.
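A minimal sketch of per-block feature extraction keyed by the feature type might look as follows; ORB point features and the OpenCV line segment detector are stand-ins chosen for illustration (the embodiment names no particular detector, and createLineSegmentDetector is unavailable in some OpenCV builds):

    import cv2

    def extract_block_features(block, feature_type):
        # block: 8-bit grayscale image block from the common-view region.
        if feature_type == "point":
            orb = cv2.ORB_create(nfeatures=100)
            return orb.detectAndCompute(block, None)  # keypoints, descriptors
        if feature_type == "line":
            lsd = cv2.createLineSegmentDetector()
            lines = lsd.detect(block)[0]              # (M, 1, 4) segments
            return lines, None
        raise ValueError(f"unknown feature type: {feature_type}")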
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, after generating the plurality of motion vectors corresponding to the common-view region, the method further includes: performing offset processing on the start position and the end position corresponding to each motion vector, respectively, to generate a plurality of candidate start positions and a plurality of candidate end positions; and determining a score between each candidate start position and each candidate end position by block matching, and updating the start position and the end position corresponding to the motion vector according to the candidate start position and the candidate end position with the highest score.
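Sketched in Python, this refinement can be an exhaustive search over jittered start and end positions. The match score here is the sum of absolute differences between patches, so the embodiment's "highest score" corresponds to the lowest cost; the patch size, search radius, and the assumption that endpoints lie far enough from the image border are all illustrative.

    import numpy as np

    def refine_by_block_matching(img0, img1, p0, p1, patch=8, radius=1):
        def crop(img, x, y):  # (2*patch)-square window centered on (x, y)
            return img[y - patch:y + patch, x - patch:x + patch].astype(np.float32)

        offsets = [(dx, dy) for dx in range(-radius, radius + 1)
                            for dy in range(-radius, radius + 1)]
        best_cost, best_pair = np.inf, (p0, p1)
        for dx0, dy0 in offsets:                # candidate start positions
            a = crop(img0, p0[0] + dx0, p0[1] + dy0)
            for dx1, dy1 in offsets:            # candidate end positions
                b = crop(img1, p1[0] + dx1, p1[1] + dy1)
                cost = np.abs(a - b).sum()      # SAD block-matching score
                if cost < best_cost:
                    best_cost = cost
                    best_pair = ((p0[0] + dx0, p0[1] + dy0),
                                 (p1[0] + dx1, p1[1] + dy1))
        return best_pair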
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, after generating the plurality of motion vectors corresponding to the common-view region, the method further includes: determining a plurality of neighboring positions of one of the start position and the end position corresponding to a motion vector; determining a plurality of scores between the other of the start position and the end position and the plurality of neighboring positions; and determining an offset for the one position according to the plurality of scores, and correcting the one position according to the offset.
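One common way to convert the scores at neighboring positions into a corrective offset is parabolic interpolation, sketched below for a single axis (apply it once horizontally and once vertically). The embodiment does not fix a formula, so this choice is an assumption.

    def subpixel_offset(left, center, right):
        # left/center/right: matching costs at offsets -1, 0, +1 along one
        # axis around the endpoint; returns a fractional offset in (-0.5, 0.5).
        denom = left - 2.0 * center + right
        if abs(denom) < 1e-9:  # flat or degenerate score profile
            return 0.0
        return 0.5 * (left - right) / denom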
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, after determining the depth information corresponding to each motion vector based on the camera parameters of the plurality of cameras, the method further includes: determining a depth threshold according to the binocular baseline length of the cameras corresponding to the image pair; and removing depth information greater than the depth threshold from the depth information of the plurality of motion vectors.
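The baseline-dependent threshold can be motivated by triangulation accuracy: once the disparity falls to about one pixel, the depth f·B/d is no longer reliable, so larger depths are discarded. In the sketch below, the minimum disparity and margin factor are illustrative knobs that the disclosure does not specify.

    def depth_threshold(baseline_m, focal_px, min_disparity_px=1.0, margin=1.0):
        # largest depth the pair can still triangulate reliably
        return margin * focal_px * baseline_m / min_disparity_px

    # e.g. a 0.10 m baseline at f = 500 px gives a ~50 m usable range:
    # depths = [d for d in depths if d <= depth_threshold(0.10, 500.0)]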
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, generating the depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points includes: determining the depth information of each projection point according to the depth information corresponding to each motion vector; and performing compensation processing according to the depth information of each projection point to generate the depth information of the remaining points in the target coordinate system other than the plurality of projection points.
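The embodiment leaves the form of the compensation processing open; one plausible realization is scattered-data interpolation over the projection points, sketched here with SciPy (linear interpolation, with a nearest-neighbor fallback outside the convex hull of the samples). At least a handful of non-collinear projection points are assumed.

    import numpy as np
    from scipy.interpolate import griddata

    def densify_depth(points_uv, depths, height, width):
        # points_uv: (N, 2) pixel coordinates (u, v) of the projection
        # points in the target coordinate system; depths: (N,) values.
        grid_u, grid_v = np.meshgrid(np.arange(width), np.arange(height))
        dense = griddata(points_uv, depths, (grid_u, grid_v), method="linear")
        holes = np.isnan(dense)  # cells outside the convex hull
        dense[holes] = griddata(points_uv, depths,
                                (grid_u[holes], grid_v[holes]), method="nearest")
        return dense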
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the plurality of cameras includes three or more cameras, there is more than one group of image pairs, and determining the depth information of each projection point according to the depth information corresponding to each motion vector includes: determining at least one three-dimensional point corresponding to each projection point; and performing weighting processing according to the depth information corresponding to the at least one three-dimensional point to generate the depth information corresponding to each projection point.
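When three or more cameras produce several three-dimensional points for the same projection point, the weighting processing could be an inverse-error weighted mean, as in the sketch below; the disclosure does not pin down the weights, so treating per-sample error estimates as the weight source is an assumption.

    import numpy as np

    def fuse_depths(depth_samples, errors=None):
        # depth_samples: depths from different image pairs landing on one
        # projection point; errors: optional per-sample error estimates.
        depth_samples = np.asarray(depth_samples, dtype=np.float64)
        if errors is None:
            return float(depth_samples.mean())
        weights = 1.0 / np.maximum(np.asarray(errors, dtype=np.float64), 1e-9)
        return float(np.average(depth_samples, weights=weights))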
According to one or more embodiments of the present disclosure, there is provided an image processing apparatus including: an acquisition module configured to acquire a corresponding plurality of images through a plurality of cameras, wherein the plurality of images comprises at least one group of image pairs, and each image pair comprises a first image and a second image having a common-view region; a first determining module configured to determine a plurality of motion vectors corresponding to the common-view region according to the image pair; a second determining module configured to determine depth information corresponding to each motion vector based on camera parameters of the plurality of cameras; and a generating module configured to map the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and to generate the depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
According to one or more embodiments of the present disclosure, the present disclosure provides an electronic device including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the image processing methods provided in the present disclosure.
According to one or more embodiments of the present disclosure, the present disclosure provides a computer-readable storage medium storing a computer program for performing any one of the image processing methods provided by the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first", "second", and the like in this disclosure are merely used to distinguish between different devices, modules, or units, and are not used to define an order of, or interdependence between, the functions performed by these devices, modules, or units. References to "a" or "an" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

Claims (14)

1. An image processing method, comprising:
acquiring a corresponding plurality of images through a plurality of cameras, wherein the plurality of images comprises at least one group of image pairs, and each image pair comprises a first image and a second image having a common-view region;
determining a plurality of motion vectors corresponding to the common-view region according to the image pair;
determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras;
mapping the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and generating depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
2. The method of claim 1, wherein determining the plurality of motion vectors corresponding to the common-view region according to the image pair comprises:
processing the first image and the second image to generate a plurality of candidate motion vectors corresponding to the common-view region;
and matching the direction of each candidate motion vector with a specified direction, and determining the candidate motion vectors whose directions are consistent with the specified direction as the plurality of motion vectors corresponding to the common-view region.
3. The method of claim 1, wherein determining the plurality of motion vectors corresponding to the common-view region according to the image pair comprises:
processing the first image and the second image to generate a plurality of candidate motion vectors corresponding to the common-view region;
and matching a first modulus of each candidate motion vector with a preset modulus range, and determining the candidate motion vectors whose first moduli fall within the preset modulus range as the plurality of motion vectors corresponding to the common-view region.
4. The method of claim 1, wherein determining the plurality of motion vectors corresponding to the common-view region according to the image pair comprises:
processing the first image and the second image to generate a plurality of candidate motion vectors corresponding to the common-view region;
and acquiring a first modulus of each candidate motion vector and a second modulus of a comparison motion vector corresponding to each candidate motion vector, and determining the candidate motion vectors whose first modulus equals the second modulus as the plurality of motion vectors corresponding to the common-view region, wherein the candidate motion vectors and the comparison motion vectors are obtained by taking the first image and the second image, respectively, as the reference.
5. The method of claim 1, wherein the common-view region comprises a plurality of image blocks, and determining the plurality of motion vectors corresponding to the common-view region according to the image pair comprises:
acquiring an image type of the image pair, and determining a feature type corresponding to the image type, wherein the feature type comprises point features and line features;
and extracting, for each image block in the common-view region, target features matching the feature type, and generating the plurality of motion vectors according to the target features of the image blocks.
6. The method of any of claims 1-5, further comprising, after determining the plurality of motion vectors corresponding to the common-view region:
performing offset processing on the start position and the end position corresponding to a motion vector, respectively, to generate a plurality of candidate start positions and a plurality of candidate end positions;
and determining a score between each candidate start position and each candidate end position by block matching, and updating the start position and the end position corresponding to the motion vector according to the candidate start position and the candidate end position with the highest score.
7. The method of any of claims 1-5, further comprising, after determining the plurality of motion vectors corresponding to the common-view region:
determining a plurality of neighboring positions of one of the start position and the end position corresponding to a motion vector;
determining a plurality of scores between the other of the start position and the end position and the plurality of neighboring positions;
and determining an offset for the one of the start position and the end position according to the plurality of scores, and correcting that position according to the offset.
8. The method of claim 1, further comprising, after determining depth information corresponding to each motion vector based on camera parameters of the plurality of cameras:
determining a depth threshold according to the binocular baseline length of the cameras corresponding to the image pair;
and removing depth information greater than the depth threshold from the depth information of the plurality of motion vectors.
9. The method of claim 1, wherein generating depth information for each point in the target coordinate system from the depth information for each motion vector and the plurality of projection points comprises:
determining the depth information of each projection point according to the depth information corresponding to each motion vector;
and performing compensation processing according to the depth information of each projection point to generate depth information of the remaining points in the target coordinate system other than the plurality of projection points.
10. The method of claim 9, wherein the plurality of cameras comprises three or more cameras and there is more than one group of image pairs, and wherein determining the depth information of each projection point according to the depth information corresponding to each motion vector comprises:
determining at least one three-dimensional point corresponding to each projection point;
and performing weighting processing according to the depth information corresponding to the at least one three-dimensional point to generate the depth information corresponding to each projection point.
11. An image processing apparatus, comprising:
an acquisition module, configured to acquire a corresponding plurality of images through a plurality of cameras, wherein the plurality of images comprises at least one group of image pairs, and each image pair comprises a first image and a second image having a common-view region;
a first determining module, configured to determine a plurality of motion vectors corresponding to the common-view region according to the image pair;
a second determining module, configured to determine depth information corresponding to each motion vector based on camera parameters of the plurality of cameras;
and a generating module, configured to map the three-dimensional point corresponding to each motion vector in the world coordinate system to a preset target coordinate system to obtain a plurality of projection points, and to generate depth information of each point in the target coordinate system according to the depth information corresponding to each motion vector and the plurality of projection points.
12. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method according to any one of the preceding claims 1-10.
13. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the image processing method according to any one of the preceding claims 1-10.
14. A computer program product, characterized in that the computer program product comprises a computer program/instruction which, when executed by a processor, implements the image processing method according to any of claims 1-10.
CN202210822681.4A 2022-07-12 2022-07-12 Image processing method, device, equipment and medium Pending CN117437258A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210822681.4A CN117437258A (en) 2022-07-12 2022-07-12 Image processing method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN117437258A true CN117437258A (en) 2024-01-23

Family

ID=89552190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210822681.4A Pending CN117437258A (en) 2022-07-12 2022-07-12 Image processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117437258A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination