CN110533710A

CN110533710A - Method and processing device for a GPU-based binocular matching algorithm

Info

Publication number: CN110533710A
Application number: CN201910779546.4A
Authority: CN
Inventors: 符强; 罗鑫禹; 孙希延; 纪元法; 任风华; 严素清; 付文涛
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2019-12-03
Anticipated expiration: 2039-08-22
Also published as: CN110533710B

Abstract

Embodiments of the invention disclose a method and a processing device for a GPU-based binocular matching algorithm, which improve the computational efficiency of image matching in binocular vision and the real-time performance of binocular depth perception. The method comprises: obtaining first image data and second image data, which are captured by different cameras respectively; performing cost calculation according to the first image data and the second image data to obtain cost values; performing cost aggregation calculations in a first direction, a second direction, and a third direction concurrently according to the cost values, to obtain the cost aggregation values of the first, second, and third directions; performing a cost aggregation calculation in a fourth direction according to the cost values, to obtain the cost aggregation value of the fourth direction; and determining a disparity value according to the cost aggregation values of the first, second, third, and fourth directions.

Description

Method and processing device for a GPU-based binocular matching algorithm

Technical field

The present invention relates to the field of image processing, and in particular to a method and a processing device for a GPU-based binocular matching algorithm.

Background art

With the development of science and technology, unmanned robots are widely used in many fields, and they share one common requirement: they need to perceive distance. Common ranging methods at present include active ranging, such as ultrasonic ranging and infrared ranging, and binocular ranging. Although active ranging is simple in principle and offers good real-time performance, it is easily affected by the reflective surface of the object, the surrounding environment, and so on, so active ranging is not the main ranging method in the field of unmanned robots.

Binocular ranging captures images of a scene with two cameras. Because the same object is imaged at different positions in the two cameras, the disparity can be estimated, and the final distance is then calculated from the estimated disparity. Existing binocular vision positioning algorithms involve a large amount of computation in the image matching stage, so real-time performance is difficult to guarantee and binocular vision technology cannot be applied well in unmanned robots.

Summary of the invention

Embodiments of the invention provide a method and a processing device for a GPU-based binocular matching algorithm, which improve the computational efficiency of image matching in binocular vision and the real-time performance of binocular depth perception.

In view of this, a first aspect of the present invention provides a method for a GPU-based binocular matching algorithm, which may include:

obtaining first image data and second image data, the first image data and the second image data being captured by different cameras respectively;

performing cost calculation according to the first image data and the second image data to obtain cost values;

performing cost aggregation calculations in a first direction, a second direction, and a third direction concurrently according to the cost values, to obtain a cost aggregation value of the first direction, a cost aggregation value of the second direction, and a cost aggregation value of the third direction;

performing a cost aggregation calculation in a fourth direction according to the cost values, to obtain a cost aggregation value of the fourth direction; and

determining a disparity value according to the cost aggregation value of the first direction, the cost aggregation value of the second direction, the cost aggregation value of the third direction, and the cost aggregation value of the fourth direction.

Optionally, in some embodiments of the invention, performing cost calculation according to the first image data and the second image data to obtain cost values includes:

performing cost calculation according to the first image data and the second image data through blocks in different arrangements to obtain the cost values, wherein each thread in each block processes one pixel.

Optionally, in some embodiments of the invention, performing cost aggregation calculations in the first, second, and third directions concurrently according to the cost values, to obtain the cost aggregation values of the first, second, and third directions, includes:

performing, according to the cost values and through the SGM binocular image matching algorithm, cost aggregation calculations in the first, second, and third directions concurrently, to obtain the cost aggregation value of the first direction, the cost aggregation value of the second direction, and the cost aggregation value of the third direction;

and performing the cost aggregation calculation in the fourth direction according to the cost values, to obtain the cost aggregation value of the fourth direction, includes:

performing, according to the cost values and through the SGM binocular image matching algorithm, the cost aggregation calculation in the fourth direction, to obtain the cost aggregation value of the fourth direction.

Optionally, in some embodiments of the invention, performing cost aggregation calculations in the first, second, and third directions concurrently according to the cost values, and performing the cost aggregation calculation in the fourth direction according to the cost values, include:

determining, according to the cost values, the cost values of the first direction, the cost values of the second direction, the cost values of the third direction, and the cost values of the fourth direction;

performing, according to the cost values of the first, second, and third directions and through a butterfly sorting algorithm, cost aggregation calculations in the first, second, and third directions concurrently, to obtain the cost aggregation value of the first direction, the cost aggregation value of the second direction, and the cost aggregation value of the third direction; and

performing, according to the cost values of the fourth direction and through the butterfly sorting algorithm, the cost aggregation calculation in the fourth direction, to obtain the cost aggregation value of the fourth direction.

Optionally, in some embodiments of the invention, determining the disparity value according to the cost aggregation values of the first, second, third, and fourth directions includes:

accumulating the cost aggregation value of the first direction, the cost aggregation value of the second direction, the cost aggregation value of the third direction, and the cost aggregation value of the fourth direction for each of the different disparity values, to obtain an accumulated aggregation value corresponding to each disparity value; and

determining, through the butterfly sorting algorithm, the disparity value whose accumulated aggregation value is the minimum among the accumulated aggregation values corresponding to the different disparity values.

A second aspect of the present invention provides a processing device, which may include:

an obtaining module, configured to obtain first image data and second image data, the first image data and the second image data being captured by different cameras respectively; and

a processing module, configured to: perform cost calculation according to the first image data and the second image data to obtain cost values; perform cost aggregation calculations in a first direction, a second direction, and a third direction concurrently according to the cost values, to obtain the cost aggregation values of the first, second, and third directions; perform a cost aggregation calculation in a fourth direction according to the cost values, to obtain the cost aggregation value of the fourth direction; and determine a disparity value according to the cost aggregation values of the first, second, third, and fourth directions.

Optionally, in some embodiments of the invention,

the processing module is specifically configured to perform cost calculation according to the first image data and the second image data through blocks in different arrangements to obtain the cost values, wherein each thread in each block processes one pixel.

Optionally, in some embodiments of the invention,

the processing module is specifically configured to: perform, according to the cost values and through the SGM binocular image matching algorithm, cost aggregation calculations in the first, second, and third directions concurrently, to obtain the cost aggregation values of the first, second, and third directions; and perform, according to the cost values and through the SGM binocular image matching algorithm, the cost aggregation calculation in the fourth direction, to obtain the cost aggregation value of the fourth direction.

Optionally, in some embodiments of the invention,

the processing module is specifically configured to: determine, according to the cost values, the cost values of the first, second, third, and fourth directions; perform, according to the cost values of the first, second, and third directions and through a butterfly sorting algorithm, cost aggregation calculations in the first, second, and third directions concurrently, to obtain the cost aggregation values of the first, second, and third directions; and perform, according to the cost values of the fourth direction and through the butterfly sorting algorithm, the cost aggregation calculation in the fourth direction, to obtain the cost aggregation value of the fourth direction.

Optionally, in some embodiments of the invention,

the processing module is specifically configured to: accumulate the cost aggregation values of the first, second, third, and fourth directions for each of the different disparity values, to obtain an accumulated aggregation value corresponding to each disparity value; and determine, through the butterfly sorting algorithm, the disparity value whose accumulated aggregation value is the minimum among the accumulated aggregation values corresponding to the different disparity values.

A third aspect of the present invention provides a processing device, which may include:

a transceiver, a processor, and a memory, wherein the transceiver, the processor, and the memory are connected by a bus;

the memory is configured to store operation instructions;

the transceiver is configured to obtain first image data and second image data, the first image data and the second image data being captured by different cameras respectively; and

the processor is configured to call the operation instructions to execute the steps of the method for a GPU-based binocular matching algorithm described in the first aspect of the present invention or in any optional implementation of the first aspect.

A fourth aspect of the present invention provides a readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method for a GPU-based binocular matching algorithm described in the first aspect of the present invention or in any optional implementation of the first aspect are implemented.

As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:

In the embodiments of the present invention, first image data and second image data are obtained, the first image data and the second image data being captured by different cameras respectively; cost calculation is performed according to the first image data and the second image data to obtain cost values; cost aggregation calculations in a first direction, a second direction, and a third direction are performed concurrently according to the cost values, to obtain the cost aggregation values of the first, second, and third directions; a cost aggregation calculation in a fourth direction is performed according to the cost values, to obtain the cost aggregation value of the fourth direction; and a disparity value is determined according to the cost aggregation values of the first, second, third, and fourth directions. The present invention exploits the parallel computing capability of the GPU, which is well suited to large-scale computation, by introducing the GPU into the binocular matching algorithm, thereby improving the computational efficiency of image matching in binocular vision and the real-time performance of the binocular matching algorithm.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments and the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them.

Fig. 1 is a schematic diagram of the fusion of cost aggregation and disparity calculation in an embodiment of the present invention;

Fig. 2 is a schematic diagram of one embodiment of the method for a GPU-based binocular matching algorithm in an embodiment of the present invention;

Fig. 3A is a timing diagram of the GPU streams in an embodiment of the present invention;

Fig. 3B is a schematic diagram of the cost calculation design in an embodiment of the present invention;

Fig. 3C is a schematic diagram of the cost aggregation design in an embodiment of the present invention;

Fig. 3D is a schematic diagram of the cost calculation optimization in an embodiment of the present invention;

Fig. 3E is a schematic diagram of the length of the array Shared_base, which stores the reference pixels, in an embodiment of the present invention;

Fig. 3F is a schematic diagram of the butterfly sorting algorithm in an embodiment of the present invention;

Fig. 4 is a schematic diagram of one embodiment of the processing device in an embodiment of the present invention;

Fig. 5 is a schematic diagram of another embodiment of the processing device in an embodiment of the present invention.

Detailed description of the embodiments

To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Other embodiments based on the embodiments of the present invention shall fall within the scope of protection of the present invention.

The present invention exploits the parallel computing capability of the graphics processing unit (GPU), which is well suited to large-scale computation, by introducing the GPU into the binocular matching algorithm in order to improve its real-time performance. To let the binocular matching algorithm perform better, however, its implementation on the GPU has been redesigned: a series of GPU optimization schemes has been devised for the operation flow of the binocular matching algorithm. First, at the level of the overall algorithm architecture, the present invention fuses cost aggregation with disparity calculation in order to increase data reuse. Fig. 1 is a schematic diagram of the fusion of cost aggregation and disparity calculation in an embodiment of the present invention.

The technical solution of the present invention is further described below by way of embodiments. Fig. 2 is a schematic diagram of one embodiment of the method for a GPU-based binocular matching algorithm in an embodiment of the present invention, which may include:

201. Obtain first image data and second image data, the first image data and the second image data being captured by different cameras respectively.

It can be understood that, since the first image data and the second image data are captured by different cameras, the first image data may be captured by the left camera and the second image data by the right camera. Different GPU resource allocation schemes can be designed for the different operation flows of different binocular matching algorithms. Fig. 3A is a timing diagram of the GPU streams in an embodiment of the present invention. The present invention adopts a design in which three streams run in parallel to improve the running efficiency of the matching algorithm: as shown in Fig. 3A, in the cost aggregation stage the GPU computes the costs of three directions simultaneously. Because the present invention fuses cost aggregation with disparity calculation, the disparity calculation can be executed only after the cost aggregation operations in the other directions have all finished. Since tasks within the same stream are executed in order, the cost aggregation calculations for the first three directions are guaranteed to be complete when the '↑' cost aggregation is executed.

In the cost calculation stage, each individual step of the cost calculation is very simple and easy to parallelize, so a simple, practical, and efficient design is to make each thread responsible for one pixel. In this way an entire image can be processed directly in one instruction cycle. Accordingly, a two-dimensional block is designed first, in which each block contains 32 threads in the x direction and 32 threads in the y direction; 32 is exactly the number of threads in a warp. Then a two-dimensional grid is set up, containing cols/blockDim.x blocks in the x direction and rows/blockDim.y blocks in the y direction. Fig. 3B is a schematic diagram of the cost calculation design in an embodiment of the present invention.

In the cost aggregation stage, each block is designed to process the optimal disparity of two pixels. To improve the efficiency of data interaction, the design relies on the fact that threads in the same warp can share data with each other directly, so the calculation of all disparities of one pixel in a given direction is completed by one warp. A warp contains at most 32 threads. Since the present invention completes the calculation of all disparity costs of one pixel with one warp, processing two pixels requires two warps in total, and a block therefore contains 64 threads. So that 32 threads can compute all the disparity costs, each thread is designed to process MAX_DISPARITY/32 disparities. The overall approach is similar to the cost calculation: each block processes two rows of pixels, and a for loop inside the block steps through the remaining pixels of each row. The resulting resource allocation is shown in Fig. 3C, which is a schematic diagram of the cost aggregation design in an embodiment of the present invention.

It should be noted that the GPU architecture design differs for different directions, but the general idea is the same: each grid processes one row or one column, and each block processes two pixels. With this GPU-based binocular vision matching optimization algorithm, the computational efficiency of the binocular vision matching algorithm is significantly improved. For example, on an NVIDIA TX2 processor the image processing speed reaches 42 FPS, so the algorithm can be applied in UAV obstacle avoidance systems.

202. Perform cost calculation according to the first image data and the second image data to obtain cost values.

Performing cost calculation according to the first image data and the second image data to obtain cost values may include: performing cost calculation according to the first image data and the second image data through blocks in different arrangements to obtain the cost values, wherein each thread in each block processes one pixel.

The cost calculation is briefly described below. In embodiments of the present invention, a center-symmetric census transform may be used for the cost calculation. This method is as robust to illumination changes as the traditional census transform while requiring less memory. The present invention exploits the parallelism of the GPU to optimize the center-symmetric census transform; the specific optimization approach is as follows:
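As an illustration of why the center-symmetric variant needs less memory, the following Python sketch computes a descriptor for one pixel and compares two descriptors by Hamming distance. It is a minimal sketch under assumptions: the function names `cs_census` and `hamming`, the 3×3 default window, and the "brighter than its mirror" bit rule are illustrative choices, not details taken from the patent.

```python
def cs_census(img, y, x, radius=1):
    """Center-symmetric census descriptor of pixel (y, x) as an integer.

    Each pair of pixels mirrored through the window center contributes one
    bit: 1 if the first pixel is brighter than its mirror, otherwise 0.
    (Hypothetical helper; the bit rule is an assumption.)
    """
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Visit only one pixel of each symmetric pair and skip the center.
            if (dy, dx) >= (0, 0):
                continue
            a = img[y + dy][x + dx]
            b = img[y - dy][x - dx]
            bits = (bits << 1) | (1 if a > b else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two descriptors (the matching cost)."""
    return bin(a ^ b).count("1")
```

With a 3×3 window this yields 4 bits per pixel, compared with 8 bits for a census transform that compares every neighbor with the center pixel.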

First, a two-dimensional thread block is designed, in which each block contains 32 threads in the x direction and 32 threads in the y direction. 32 is chosen because it is exactly the number of threads in a warp. Then a two-dimensional grid is set up, containing cols/blockDim.x blocks in the x direction and rows/blockDim.y blocks in the y direction.

For the center-symmetric census transform, each individual calculation step is very simple and easy to parallelize, so a simple, practical, and efficient design is to make each thread responsible for one pixel. In this way an entire image can be processed directly in one instruction cycle. The GPU resource structure is shown in Fig. 3B and is illustrated here for camera images with a resolution of 640*480: the grid contains cols/32 = 20 blocks in the x direction and rows/32 = 15 blocks in the y direction, so that each thread processes exactly one pixel.
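The launch dimensions for the census stage can be checked with a short Python sketch. The helper name `census_launch_dims` is hypothetical, and the rounding up for image sizes that are not multiples of 32 is an assumption that the text does not discuss.

```python
def census_launch_dims(cols, rows, block_x=32, block_y=32):
    """Return ((grid_x, grid_y), (block_x, block_y)) for one thread per pixel.

    Ceiling division is an assumption added here so that images whose size is
    not a multiple of the block size are still fully covered.
    """
    grid_x = (cols + block_x - 1) // block_x
    grid_y = (rows + block_y - 1) // block_y
    return (grid_x, grid_y), (block_x, block_y)


grid, block = census_launch_dims(640, 480)
print(grid, block)
```

For a 640*480 image this gives a 20 × 15 grid of 32 × 32 blocks, which matches the figures in the text.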

Next comes the cost calculation itself. According to binocular vision theory, the entire matching cost space is W × H × D, and the present invention is designed along these lines. Fig. 3D is a schematic diagram of the cost calculation optimization in an embodiment of the present invention.

The space of a traditional cost calculation is W × H × D, yet the disparity cost of a pixel in a given direction is in fact also a pixel cost, so a traditional implementation carries a large amount of memory redundancy. To maximize data reuse and reduce the amount of stored data, the present invention does not let each thread process one pixel as in the census transform. Instead, each block processes one row of pixels, a for loop inside the block steps through the pixels of that row, and all disparities of a pixel are computed at once by the threads of the block. That is, each grid is allocated H blocks, each block is allocated D threads, and the for loop in each block repeats W times.

For example, 480 blocks may be designed, with 128 threads in each block, so that each block is responsible for one row of pixels. In this case the corresponding data structures also differ from the traditional scheme: the array Shared_base, which stores the reference pixels, has length D, while the array Shared_match, which stores the pixels to be compared, has length 2D. The data structure is shown in Fig. 3E, which is a schematic diagram of the length of the array Shared_base storing the reference pixels in an embodiment of the present invention. The first D entries of Shared_match store the costs of the preceding segment of 128 pixels, and the last D entries store the costs of the following segment of 128 pixels.
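The row-wise work split (one block per row, one thread per disparity, a loop over the W pixels of the row) can be mimicked sequentially in Python. This is only a sketch under assumptions: `row_costs` is a hypothetical name, the cost is taken to be the Hamming distance between census descriptors, and clamping at the left image border is an illustrative choice. The shared-memory arrays Shared_base and Shared_match are not modeled.

```python
def row_costs(census_left_row, census_right_row, max_disparity):
    """Matching costs for one image row, returned as costs[x][d].

    The outer loop plays the role of the in-block for loop over W pixels;
    the inner loop stands in for the D threads of the block.
    """
    width = len(census_left_row)
    costs = []
    for x in range(width):
        per_pixel = []
        for d in range(max_disparity):
            xr = max(x - d, 0)  # clamp at the image border (assumption)
            diff = census_left_row[x] ^ census_right_row[xr]
            per_pixel.append(bin(diff).count("1"))
        costs.append(per_pixel)
    return costs
```

Running this once per row for H rows produces the full W × H × D cost space described above.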

203. Perform cost aggregation calculations in a first direction, a second direction, and a third direction concurrently according to the cost values, to obtain the cost aggregation value of the first direction, the cost aggregation value of the second direction, and the cost aggregation value of the third direction.

Performing cost aggregation calculations in the first, second, and third directions concurrently according to the cost values may include: performing, according to the cost values and through the SGM binocular image matching algorithm, cost aggregation calculations in the first, second, and third directions concurrently, to obtain the cost aggregation value of the first direction, the cost aggregation value of the second direction, and the cost aggregation value of the third direction.

It should be noted that, for cost aggregation, the present invention adopts the SGM (semi-global matching) binocular image matching algorithm, in which cost aggregation must be carried out along multiple directions. To further improve computational efficiency, several streams can therefore be used for cost aggregation. In this example only four directions are aggregated, so the cost aggregation value of '→' is handled by stream0, that of '↓' by stream1, and that of '←' by stream2. However, to increase data utilization, the present invention does not assign the '↑' cost aggregation to a stream3 but to stream1. To improve memory utilization, the task of computing the optimal disparity is completed while the '↑' direction is being calculated, so this step can be executed only after the operations in the other directions have all finished. Since tasks in the same stream are executed in order, the resulting design computes three directions in parallel and performs the disparity optimization once the costs of all four directions have been calculated.

The SGM algorithm requires evaluating the following formula:

$$L_r(p, d) = C(p, d) + \min\bigl(L_r(p-r, d),\; L_r(p-r, d-1) + P_1,\; L_r(p-r, d+1) + P_1,\; \min_i L_r(p-r, i) + P_2\bigr) - \min_k L_r(p-r, k)$$

In the above formula, the terms have the following meanings:

L_r(p, d): the cost aggregation value of matching point p in direction r at disparity d;

C(p, d): the matching cost value of matching point p at disparity d;

L_r(p-r, d): the cost aggregation value of the previous matching point in direction r at the same disparity;

L_r(p-r, d-1): the cost aggregation value of the previous matching point at the disparity minus one;

L_r(p-r, d+1): the cost aggregation value of the previous matching point at the disparity plus one;

min_i L_r(p-r, i): the minimum cost aggregation value of the previous matching point over all disparities (the same applies to the trailing term indexed by k);

P1, P2: adjustable penalty parameters that compensate the disparity and are used to fine-tune the algorithm.
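The recurrence can be traced on a single image row with the following Python sketch for the '→' direction. It follows the standard SGM path recurrence; the function name `aggregate_left_to_right`, the initialization of the first pixel with its raw costs, and the handling of the disparity range limits are assumptions made for illustration rather than details from the patent.

```python
def aggregate_left_to_right(costs, p1, p2):
    """Aggregate costs[x][d] of one row along the direction '→'.

    Returns agg[x][d], the path cost L_r(p, d) for every pixel of the row.
    """
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel has no predecessor (assumption)
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)  # min over all disparities of the previous pixel
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(costs[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg
```

The other directions differ only in the order in which pixels are visited, which is why each direction can be given its own kernel layout and stream.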

The cost of a pixel depends on its neighboring pixel: it is necessary to compare the costs of the neighboring pixel at the adjacent disparities, the cost of the neighboring pixel at the current disparity, and the minimum cost over all disparities of the neighboring pixel. There is therefore considerable data reuse and data interaction here. The traditional scheme likewise lets each thread process one disparity cost, and the resulting data communication between threads reduces parallel efficiency. The present invention proposes a more efficient processing scheme with a different architecture design strategy for each direction; taking the '→' direction as an example, the design strategy is shown in Fig. 3C.

204. Perform a cost aggregation calculation in a fourth direction according to the cost values, to obtain the cost aggregation value of the fourth direction.

Performing the cost aggregation calculation in the fourth direction according to the cost values may include: performing, according to the cost values and through the SGM binocular image matching algorithm, the cost aggregation calculation in the fourth direction, to obtain the cost aggregation value of the fourth direction.

205. Determine a disparity value according to the cost aggregation value of the first direction, the cost aggregation value of the second direction, the cost aggregation value of the third direction, and the cost aggregation value of the fourth direction.

Determining the disparity value according to the cost aggregation values of the first, second, third, and fourth directions may include: accumulating the cost aggregation values of the first, second, third, and fourth directions for each of the different disparity values, to obtain an accumulated aggregation value corresponding to each disparity value; and determining, through the butterfly sorting algorithm, the disparity value whose accumulated aggregation value is the minimum among the accumulated aggregation values corresponding to the different disparity values.
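For a single pixel, the accumulation over directions and the selection of the minimum can be written as the following Python sketch. The name `best_disparity` is hypothetical, and a plain linear search stands in for the butterfly sorting step, which is a GPU-specific optimization of the same minimum search.

```python
def best_disparity(aggregated_by_direction):
    """Pick the disparity with the smallest accumulated aggregation value.

    aggregated_by_direction holds one list per direction, each indexed by
    disparity. Returns (best_disparity_index, accumulated_values).
    """
    n_disp = len(aggregated_by_direction[0])
    summed = [
        sum(direction[d] for direction in aggregated_by_direction)
        for d in range(n_disp)
    ]
    best = min(range(n_disp), key=summed.__getitem__)
    return best, summed
```

Fusing this step into the last aggregation pass, as the text describes, avoids writing the accumulated values back to memory before the minimum is taken.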

It can be understood that each block is designed to process the optimal disparity of two pixels. To improve the efficiency of data interaction, the design relies on the fact that threads in the same warp can share data with each other directly, so the calculation of all disparities of one pixel in one direction is completed by one warp. Under the Pascal architecture of the current TX2 platform, a warp contains at most 32 threads. Since the present invention completes the calculation of all disparity costs of one pixel with one warp, processing two pixels requires two warps in total, and a block therefore contains 64 threads. So that 32 threads can compute all the disparity costs, each thread is designed to process MAX_DISPARITY/32 disparities. The overall approach is similar to the cost calculation: each block processes two rows of pixels, and a for loop inside the block steps through the remaining pixels of each row. A one-dimensional grid is therefore designed, each grid containing rows/2 blocks. Taking a 640*480 image as an example, the grid contains 480/2 = 240 blocks, each block contains 64 threads, and each thread processes the costs of 128/32 = 4 disparities.
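The resource arithmetic for the aggregation stage can be verified with a few lines of Python. The helper name `aggregation_launch_config` and its parameter names are illustrative, and the sketch assumes that the row count is even and that the disparity range is a multiple of the warp size.

```python
def aggregation_launch_config(rows, max_disparity, warp_size=32, pixels_per_block=2):
    """Return (blocks, threads_per_block, disparities_per_thread).

    One warp handles all disparities of one pixel, and each block holds
    pixels_per_block warps, one per row assigned to the block.
    """
    blocks = rows // pixels_per_block
    threads_per_block = warp_size * pixels_per_block
    disparities_per_thread = max_disparity // warp_size
    return blocks, threads_per_block, disparities_per_thread


print(aggregation_launch_config(480, 128))
```

For 480 rows and 128 disparities this yields 240 blocks of 64 threads, with 4 disparities per thread.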

According to the SGM algorithm, it is necessary to calculate $S(p,d)=\sum_{r}L_r(p,d)$ and $d^*=\arg\min_d S(p,d)$, where $L_r(p,d)$ denotes the cost polymerizing value of pixel $p$ at parallax $d$ along direction $r$. The conventional way to obtain the minimum is an ordinary sorting method; bubble sort, for example, needs n × (n-1)/2 comparisons for n values. A GPU can achieve higher efficiency. The present invention devises a butterfly sort algorithm that exploits the fact that threads in the same warp can share data with one another directly and uses the `__shfl_xor_sync` instruction to reduce the number of comparisons significantly. First, adjacent threads exchange data, and each thread compares the value it already holds with the value it receives, so that the minimum of each pair of adjacent threads is found and the amount of candidate data is halved. Then threads separated by an interval of one thread exchange data; after this round of comparison the minimum of every 4 consecutive threads is found, and the amount of candidate data is halved again. The comparison continues in this way until the data of the 32 threads have been reduced to a single value. Because each exchange halves the number of values that still need to be compared, only $\log_2 n$ rounds of comparison are needed (5 rounds for 32 threads), which reduces the number of comparisons and the computational complexity significantly. Fig. 3F is a schematic diagram of the butterfly sort algorithm in the embodiment of the present invention.
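
The exchange pattern described above can be simulated on the CPU. The following minimal Python sketch is an assumption-laden illustration rather than the CUDA code of the embodiment: a list index plays the role of a thread's lane number, reading `values[lane ^ offset]` stands in for what a warp-wide XOR shuffle would deliver, and the names `butterfly_min` and `butterfly_rounds` are hypothetical.

```python
from typing import List


def butterfly_min(lane_values: List[int]) -> List[int]:
    """Simulate a butterfly minimum reduction over the lanes of one warp.

    After the last round every lane holds the minimum of all inputs.
    """
    n = len(lane_values)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of lanes must be a power of two")
    values = list(lane_values)
    offset = 1
    while offset < n:
        # Each lane reads the value held by lane (lane XOR offset) and keeps
        # the smaller of the two; offsets run 1, 2, 4, ... up to n/2.
        values = [min(values[lane], values[lane ^ offset]) for lane in range(n)]
        offset *= 2
    return values


def butterfly_rounds(n: int) -> int:
    """Number of exchange rounds for n lanes (n a power of two): log2(n)."""
    return n.bit_length() - 1


# 32 lanes holding a permutation of 0..31, so the minimum is 0.
lanes = [(7 * i + 13) % 32 for i in range(32)]
reduced = butterfly_min(lanes)
rounds = butterfly_rounds(32)
bubble_comparisons = 32 * 31 // 2
```

For 32 lanes the simulation needs 5 rounds, against 496 comparisons for a bubble sort over the same 32 values.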

As an example, consider finding the minimum matching cost among 128 parallax points. Since each thread processes the matching costs of 4 parallaxes, each thread first finds the minimum of its own 4 values, which takes 3 comparisons in total. After this step the 128 values are narrowed down to 32 values that still need to be compared, one in each of the 32 threads. Adjacent threads are compared first, which leaves 16 candidate values. The threads then apply `__shfl_xor_sync` to val with a lane offset of 2, that is, an exchange across an interval of one thread: the value of thread 2, for example, is delivered to thread 0. The two values are compared in the same way, giving the minimum of threads 0, 1, 2 and 3, and likewise for the other threads, so the number of val values that still need to be compared is halved again and 8 remain. Exchanges with offsets of 4, 8 and 16 follow in the same manner until the comparison across all 32 threads is complete.
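
The 128-parallax example can be traced with a small CPU simulation that also carries the parallax index along with each cost, since the final result is a parallax rather than a cost. This Python sketch rests on assumptions that the description does not state: thread `lane` is taken to hold parallaxes 4*lane to 4*lane+3, ties are resolved toward the smaller parallax, and the name `warp_argmin` is hypothetical.

```python
from typing import List, Tuple

WARP_SIZE = 32
PER_THREAD = 4  # 128 parallax candidates spread over 32 threads


def warp_argmin(costs: List[int]) -> Tuple[int, int, int]:
    """Return (minimum cost, its parallax, comparisons made by each thread)."""
    if len(costs) != WARP_SIZE * PER_THREAD:
        raise ValueError("expected exactly 128 cost values")
    # Step 1: every thread reduces its own 4 values (3 comparisons).
    lanes: List[Tuple[int, int]] = []
    for lane in range(WARP_SIZE):
        first = lane * PER_THREAD  # assumed mapping of parallaxes to threads
        lanes.append(min((costs[d], d) for d in range(first, first + PER_THREAD)))
    comparisons = PER_THREAD - 1
    # Step 2: butterfly exchange with offsets 1, 2, 4, 8, 16; each round
    # costs every thread one comparison against the value it receives.
    offset = 1
    while offset < WARP_SIZE:
        lanes = [min(lanes[lane], lanes[lane ^ offset]) for lane in range(WARP_SIZE)]
        comparisons += 1
        offset *= 2
    cost, parallax = lanes[0]
    return cost, parallax, comparisons


# Synthetic cost curve with a single minimum at parallax 37.
example_costs = [(d - 37) ** 2 + 5 for d in range(128)]
result = warp_argmin(example_costs)
```

In this sketch each thread performs 3 + 5 = 8 comparisons, and `result` is the tuple of minimum cost 5, parallax 37 and comparison count 8.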

In the embodiments of the present invention, the binocular vision matching algorithm is optimized in depth, and a series of GPU-based optimized structural designs are proposed. For the problem of finding the minimum matching cost in the SGM algorithm, a butterfly sort algorithm is proposed that finds the optimal parallax among n candidate values in only $\log_2 n$ rounds of comparison. To increase the utilization of the cost calculation, the present invention redesigns the data structure of the cost so that data reuse is ensured to the greatest possible extent.

The embodiments of the present invention provide a new binocular vision image matching method based on a graphics processing unit (Graphics Processing Unit, GPU). The method greatly improves operation efficiency and reduces processing time while the matching precision remains unchanged. It thereby improves the operation efficiency of the image matching algorithm in binocular vision and the real-time performance of binocular depth perception technology.

Fig. 4 is a schematic diagram of one embodiment of the processing unit in the embodiments of the present invention. The processing unit may include:

An obtaining module 401, configured to obtain first image data and second picture data, the first image data and the second picture data being obtained by different cameras respectively;

A processing module 402, configured to: perform cost calculation according to the first image data and the second picture data to obtain a cost value; synchronously perform cost polymerization calculation of the first direction, the second direction and the third direction according to the cost value, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction; perform cost polymerization calculation of the fourth direction according to the cost value, to obtain the cost polymerizing value of the fourth direction; and determine a parallax value according to the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction.

Optionally, in some embodiments of the invention,

The processing module 402 is specifically configured to perform cost calculation through differently arranged blocks according to the first image data and the second picture data to obtain the cost value, wherein each thread in each block processes one pixel.

Optionally, in some embodiments of the invention,

The processing module 402 is specifically configured to: synchronously perform, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction; and perform, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction.

Optionally, in some embodiments of the invention,

The processing module 402 is specifically configured to: determine, according to the cost value, the cost value of the first direction, the cost value of the second direction, the cost value of the third direction and the cost value of the fourth direction; synchronously perform, according to the cost value of the first direction, the cost value of the second direction and the cost value of the third direction and through the butterfly sort algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction; and perform, according to the cost value of the fourth direction and through the butterfly sort algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction.

Optionally, in some embodiments of the invention,

The processing module 402 is specifically configured to: accumulate the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction separately for each of the different parallax values, to obtain cumulative polymerizing values corresponding to the different parallax values; and determine, through the butterfly sort algorithm, the parallax corresponding to the minimum of the cumulative polymerizing values as the parallax value.

Fig. 5 is a schematic diagram of another embodiment of the processing unit in the embodiments of the present invention. The processing unit may include:

A transceiver 501, a processor 502 and a memory 503, wherein the transceiver 501, the processor 502 and the memory 503 are connected by a bus. It can be understood that the transceiver 501 may be an image capture device.

The memory 503 is configured to store operation instructions;

The transceiver 501 is configured to obtain first image data and second picture data, the first image data and the second picture data being obtained by different cameras respectively;

The processor 502 is configured to call the operation instructions and execute the following steps:

Optionally, in some embodiments of the invention, the processor 502 is configured to call the operation instructions and execute the following steps:

Accumulating the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction separately for each of the different parallax values, to obtain cumulative polymerizing values corresponding to the different parallax values; and determining, through the butterfly sort algorithm, the parallax corresponding to the minimum of the cumulative polymerizing values as the parallax value.

The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)).

It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of a binocular ranging algorithm based on GPU, characterized by comprising:

Obtaining first image data and second picture data, the first image data and the second picture data being obtained by different cameras respectively;

Synchronously performing cost polymerization calculation of a first direction, a second direction and a third direction according to the cost value, to obtain a cost polymerizing value of the first direction, a cost polymerizing value of the second direction and a cost polymerizing value of the third direction;

Determining a parallax value according to the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction.

2. The method according to claim 1, characterized in that performing cost calculation according to the first image data and the second picture data to obtain the cost value comprises:

Performing cost calculation through differently arranged blocks according to the first image data and the second picture data to obtain the cost value, wherein each thread in each block processes one pixel.

3. The method according to claim 1, characterized in that synchronously performing the cost polymerization calculation of the first direction, the second direction and the third direction according to the cost value, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction, comprises:

Synchronously performing, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction;

Performing the cost polymerization calculation of the fourth direction according to the cost value, to obtain the cost polymerizing value of the fourth direction, comprises:

Performing, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction.

4. The method according to any one of claims 1-3, characterized in that synchronously performing the cost polymerization calculation of the first direction, the second direction and the third direction according to the cost value, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction, and performing the cost polymerization calculation of the fourth direction according to the cost value, to obtain the cost polymerizing value of the fourth direction, comprise:

Determining, according to the cost value, the cost value of the first direction, the cost value of the second direction, the cost value of the third direction and the cost value of the fourth direction;

Synchronously performing, according to the cost value of the first direction, the cost value of the second direction and the cost value of the third direction and through a butterfly sort algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction;

Performing, according to the cost value of the fourth direction and through the butterfly sort algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction.

5. The method according to any one of claims 1-3, characterized in that determining the parallax value according to the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction comprises:

Accumulating the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction separately for each of the different parallax values, to obtain cumulative polymerizing values corresponding to the different parallax values;

6. A processing unit, characterized by comprising:

An obtaining module, configured to obtain first image data and second picture data, the first image data and the second picture data being obtained by different cameras respectively;

A processing module, configured to: perform cost calculation according to the first image data and the second picture data to obtain a cost value; synchronously perform cost polymerization calculation of a first direction, a second direction and a third direction according to the cost value, to obtain a cost polymerizing value of the first direction, a cost polymerizing value of the second direction and a cost polymerizing value of the third direction; perform cost polymerization calculation of a fourth direction according to the cost value, to obtain a cost polymerizing value of the fourth direction; and determine a parallax value according to the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction.

7. The processing unit according to claim 6, characterized in that

The processing module is specifically configured to perform cost calculation through differently arranged blocks according to the first image data and the second picture data to obtain the cost value, wherein each thread in each block processes one pixel;

The processing module is specifically configured to: synchronously perform, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction; and perform, according to the cost value and through the SGM binocular image matching algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction.

8. The processing unit according to claim 6 or 7, characterized in that

The processing module is specifically configured to: determine, according to the cost value, the cost value of the first direction, the cost value of the second direction, the cost value of the third direction and the cost value of the fourth direction; synchronously perform, according to the cost value of the first direction, the cost value of the second direction and the cost value of the third direction and through a butterfly sort algorithm, the cost polymerization calculation of the first direction, the second direction and the third direction, to obtain the cost polymerizing value of the first direction, the cost polymerizing value of the second direction and the cost polymerizing value of the third direction; and perform, according to the cost value of the fourth direction and through the butterfly sort algorithm, the cost polymerization calculation of the fourth direction, to obtain the cost polymerizing value of the fourth direction;

The processing module is specifically configured to: accumulate the cost polymerizing value of the first direction, the cost polymerizing value of the second direction, the cost polymerizing value of the third direction and the cost polymerizing value of the fourth direction separately for each of the different parallax values, to obtain cumulative polymerizing values corresponding to the different parallax values; and determine, through the butterfly sort algorithm, the parallax corresponding to the minimum of the cumulative polymerizing values as the parallax value.

9. A processing unit, characterized by comprising:

A transceiver, a processor and a memory, wherein the transceiver, the processor and the memory are connected by a bus;

The memory is configured to store operation instructions;

The transceiver is configured to obtain first image data and second picture data, the first image data and the second picture data being obtained by different cameras respectively;

The processor is configured to call the operation instructions and execute the steps of the method of the binocular ranging algorithm based on GPU according to any one of claims 1 to 5.

10. A readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of the binocular ranging algorithm based on GPU according to any one of claims 1 to 5.