CN105117738B

CN105117738B - Fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip

Info

Publication number: CN105117738B
Application number: CN201510462430.XA
Authority: CN
Inventors: 曹泉; 郭强; 艾通
Original assignee: SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Current assignee: SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Priority date: 2015-07-31
Filing date: 2015-07-31
Publication date: 2018-08-10
Anticipated expiration: 2035-07-31
Also published as: CN105117738A

Abstract

The invention discloses a fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip, comprising the following steps: (1) according to the camera calibration result, the detection regions of the Haar algorithm are computed once, and the resulting Haar detection region table is saved in DDR memory; (2) the ARM extracts data from DDR according to the Haar detection region table and the Haar parameter table and arranges them linearly in L2 memory; (3) the DSP extracts the data linearly from L2 according to the Haar parameter table, computes the result, and updates the detection result in L2 memory; (4) the ARM organizes the next batch of data according to the detection results saved in L2 memory and passes it to the DSP, which extracts the data according to the Haar parameter table and computes; this is repeated until the detection results for all image data are stored in L2; (5) the ARM extracts the detection results from L2 to obtain the Haar detection result of the whole image. By exploiting the hardware characteristics of the OMAP-L138 chip, the invention speeds up the implementation of the Haar detection algorithm from several angles and improves its practical value.

Description

Fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip

Technical field

The present invention relates to Haar detection algorithms, and specifically to a fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip.

Background art

The Haar detection algorithm detects objects in an image according to a Haar feature table trained in advance; its basic flow is shown in Fig. 1. The Haar feature table consists of Haar features, i.e. the rectangular features in Fig. 1. A rectangular feature value is the difference between the gray-level pixel sums of two or more identically shaped rectangles in the image under test. For example, the rectangular feature value in Fig. 2 is the difference between the pixel sum of the white region and the pixel sum of the black region, and the rectangular feature value in Fig. 3 is the difference between the pixel sum of the white region and twice the pixel sum of the black region.

To detect vehicles quickly, researchers have summarized the rectangular features, which fall mainly into the following three classes:

(1) features that detect edges, as shown in Fig. 2;

(2) features that detect the linear direction of the image, as shown in Fig. 3;

(3) features that detect the difference between a center pixel and its surrounding pixels, as shown in Fig. 4.

A detector with a resolution of M × M pixels contains a very large number of rectangles that satisfy the conditions, and evaluating them one by one is both complicated and computationally expensive. For this reason a fast and convenient method was devised whose cost is the same for rectangular feature values of any size: the integral image. In addition, a single image contains only a few license plates but a great many scanning windows, and most of the windows visited during scanning are not license plates. To reduce the time that such unnecessary windows spend in the classifier, the cascade classifier was introduced.
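The constant-cost property of the integral image can be illustrated with a minimal Python sketch. It is not the implementation of the invention; the function names `integral_image`, `rect_sum` and `edge_feature` and the 4 × 4 test image are assumptions made for illustration. Any rectangle sum takes four table lookups regardless of the rectangle size, and a two-rectangle edge feature is the difference of two such sums.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle edge feature: left (white) half minus right (black) half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 4 x 4 gray image used only to exercise the functions.
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ii = integral_image(img)
```

Here `rect_sum(ii, 1, 1, 2, 2)` adds the four center pixels and `edge_feature(ii, 0, 0, 4, 4)` compares the left two columns with the right two columns.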

A cascade classifier is trained differently from other classifiers: training proceeds stage by stage. The first stage uses all positive and negative samples and produces one weak classifier. The second stage uses all positive samples but not all negative samples: the first weak classifier is applied to the negative samples, and a negative sample takes part in the second stage only if it was misclassified as positive; otherwise it is dropped from further training. Subsequent stages are trained in the same way. The total number of stages is set in advance by the operator and is generally around 17. Because negative samples are eliminated stage by stage, the ability of the classifier to recognize negative samples grows gradually from the first stage to the last. During training, the first stage passes the negative samples it misclassifies to the second stage, the second stage passes the negative samples it misclassifies to the third stage, and so on. When license plates are actually detected, most false plates are rejected by the early stages, and the later stages concentrate on the negative samples that are hard to distinguish. The basic flow is shown in Fig. 5.
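The stage-by-stage filtering described above can be sketched as follows. This is a toy illustration only: each "window" is a plain number, each stage is a hypothetical threshold test, and the names `surviving_negatives` and `cascade_accepts` are assumptions, not terms from the source.

```python
def surviving_negatives(stage, negatives):
    """Keep only the negatives the stage wrongly accepts; they train the next stage."""
    return [n for n in negatives if stage(n)]


def cascade_accepts(stages, window):
    """Return (accepted, stages_evaluated); stop at the first rejecting stage."""
    for i, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, i
    return True, len(stages)


# Toy stages (hypothetical): progressively stricter threshold tests.
stages = [lambda v: v > 10, lambda v: v > 20, lambda v: v > 30]

# Training side: negatives that fool stage 1 go on to stage 2, and so on.
negatives = [5, 15, 25, 8]
after_stage1 = surviving_negatives(stages[0], negatives)
after_stage2 = surviving_negatives(stages[1], after_stage1)
```

On the detection side, `cascade_accepts(stages, 5)` stops after one stage, which is why easy non-plate windows cost little time.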

However, even with the integral image and the cascade classifier, the Haar detection algorithm is increasingly unable to meet current practical demands as technology develops, and further improvement is urgently needed.

Summary of the invention

The purpose of the present invention is to provide a fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip, which solves the problem that prior-art Haar detection algorithms struggle to meet practical demands and improves their practical value.

To achieve the above purpose, the present invention adopts the following technical solution:

A fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip, comprising the following steps:

(1) according to the camera calibration result, the detection regions of the Haar algorithm are computed once, and the resulting Haar detection region table is saved in DDR memory;

(2) the ARM extracts data from DDR according to the Haar detection region table and the Haar parameter table and arranges them linearly in L2 memory;

(3) the DSP extracts the data linearly from L2 according to the Haar parameter table, computes the result, and updates the detection result in L2 memory;

(4) the ARM organizes the next batch of data according to the detection results saved in L2 memory and passes it to the DSP; the DSP extracts the data according to the Haar parameter table and computes; this is repeated until the detection results for all image data are stored in L2;

(5) the ARM extracts the detection results from L2 to obtain the Haar detection result of the whole image.
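The division of labor in steps (1) to (5) can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the patented code: `GROUP`, `BATCH`, `arm_pack`, `dsp_compute` and `detect` are hypothetical names, plain lists stand in for DDR and L2 memory, and a simple sum-against-threshold test is a placeholder for the real Haar computation.

```python
GROUP = 4  # hypothetical: number of values in one window group
BATCH = 2  # hypothetical: number of groups that fit in the L2 buffer at once


def arm_pack(ddr, region_table, pending):
    """ARM side: copy the next pending windows from DDR into a linear L2 buffer."""
    batch = pending[:BATCH]
    l2 = []
    for idx in batch:
        start = region_table[idx]
        l2.extend(ddr[start:start + GROUP])
    return batch, l2


def dsp_compute(l2, threshold):
    """DSP side: walk the L2 buffer group by group, one flag per group.

    The sum/threshold test is a placeholder for the Haar feature computation.
    """
    return [int(sum(l2[g:g + GROUP]) > threshold) for g in range(0, len(l2), GROUP)]


def detect(ddr, region_table, threshold):
    """Repeat pack/compute until every region in the table has a result."""
    results = {}
    pending = list(range(len(region_table)))
    while pending:
        batch, l2 = arm_pack(ddr, region_table, pending)
        flags = dsp_compute(l2, threshold)
        results.update(zip(batch, flags))
        pending = pending[len(batch):]
    return [results[i] for i in range(len(region_table))]


# Hypothetical data: five regions of four values each, precomputed start offsets.
ddr = [1, 1, 1, 1, 9, 9, 9, 9, 2, 2, 2, 2, 8, 8, 8, 8, 7, 7, 7, 7]
region_table = [0, 4, 8, 12, 16]
```

The region table is built once, as in step (1), and the loop in `detect` mirrors the repeated hand-off between the two cores in steps (2) to (4).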

Further, a separate BIT table is established in L2; the calculation results of the DSP in step (3) are stored in the BIT table, and in step (4) the ARM extracts the detection results from the BIT table to obtain the result of the Haar detection algorithm. The ARM and the DSP read and write directly in the same table, so the data need not be copied repeatedly, which effectively improves data processing efficiency.
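The source does not specify the layout of the BIT table. As one possible reading, the sketch below assumes one bit per detection window in a shared byte buffer; the helper names `make_bit_table`, `set_bit` and `get_bit` are hypothetical. The writer (standing in for the DSP) and the reader (standing in for the ARM) operate on the same buffer, so no copy is made.

```python
def make_bit_table(n_windows):
    """Allocate one bit per window, rounded up to whole bytes (assumed layout)."""
    return bytearray((n_windows + 7) // 8)


def set_bit(table, i, value):
    """Writer side: record the result of window i in place."""
    byte, bit = divmod(i, 8)
    if value:
        table[byte] |= 1 << bit
    else:
        table[byte] &= ~(1 << bit) & 0xFF


def get_bit(table, i):
    """Reader side: read the result of window i from the same buffer."""
    byte, bit = divmod(i, 8)
    return (table[byte] >> bit) & 1


# Hypothetical run: ten windows, three marked as hits, one cleared again.
table = make_bit_table(10)
for i in (1, 3, 9):
    set_bit(table, i, 1)
set_bit(table, 3, 0)
```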

Further, when the data extracted from DDR in step (2) are arranged linearly, every group of data has the same size. Linear arrangement makes the data easier to extract, and because every group has the same size, one group can be fetched directly each time without spending time computing the data size, which improves access efficiency.
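The benefit of equal-sized groups can be shown with a small comparison. The constant `GROUP_BYTES` and both function names are assumptions for illustration: with a fixed group size the address of a group is a single multiplication, whereas unequal groups require an offset table to be built first.

```python
GROUP_BYTES = 16  # hypothetical fixed size of one data group in bytes


def group_offset(index):
    """Equal-sized groups: the byte offset of group `index` is one multiplication."""
    return index * GROUP_BYTES


def variable_offsets(sizes):
    """Unequal groups: a running sum over all sizes is needed to locate each group."""
    offsets, pos = [], 0
    for s in sizes:
        offsets.append(pos)
        pos += s
    return offsets
```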

Still further, in step (3) the DSP computes the detection result using pure assembly. Pure assembly effectively raises the utilization of the eight functional units of the DSP and avoids wasting DSP resources.

Further, while the DSP computes the detection result in step (3), the weight, left node, right node and threshold are each set as constants and assigned to registers for the computation. This avoids the functional-unit overhead added by memory-access Load instructions and saves DSP resources.

In the present invention, the OMAP-L138 chip is a dual-core high-speed processor from TI that combines a C6748 floating-point DSP core with an ARM9 core. The device integrates image, voice, network and storage functions and is cost-effective. Its C6748 core, clocked at up to 456 MHz, provides floating-point capability together with higher-performance fixed-point capability. The ARM9 core is highly flexible: developers can run operating systems such as Linux on it, which makes it easy to add a human-machine interface, network functions, a touch screen and the like to an application.

The OMAP-L138 chip has abundant memory and peripheral resources, which fully meet the system design requirements of the Gaussian mixture algorithm and also make later extension and upgrading of the system convenient.

Compared with the prior art, the present invention has the following beneficial effects:

The present invention combines the practical needs of the Haar detection algorithm with the core characteristics of the OMAP-L138 chip, matching what the algorithm demands to what the chip supplies so that the resources of both cores of the OMAP-L138 chip are used to the fullest. Because the design covers both hardware and software, the implementation speed of the Haar detection algorithm is greatly increased, which solves the problem that existing Haar detection implementations are slow and cannot meet practical demands, and improves the practical value of the Haar detection algorithm.

Brief description of the drawings

Fig. 1 is a schematic diagram of the basic flow of the Haar detection algorithm in the prior art.

Fig. 2 is a schematic diagram of one kind of rectangular feature in the prior art.

Fig. 3 is a schematic diagram of another kind of rectangular feature in the prior art.

Fig. 4 is a schematic diagram of a further kind of rectangular feature in the prior art.

Fig. 5 is a flow diagram of cascade classifier training in the prior art.

Fig. 6 is a schematic diagram of the linear arrangement of data in the present invention.

Detailed description

The present invention is further described below with reference to the drawings and an embodiment. Embodiments of the present invention include, but are not limited to, the following embodiment.

Embodiment

The main principle of the fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip disclosed in this embodiment is to exploit the dual-core nature of the OMAP-L138 chip to the full: control logic and numerical computation are separated and each is handled by one core, which increases the implementation speed of the Haar detection algorithm.

Specifically, the OMAP-L138 chip integrates two independent cores, an ARM and a DSP, and making full use of the characteristics of each is the key to real-time Haar detection. The ARM acts as the controller: using the Haar detection template, it extracts and organizes the data for the Haar computation from DDR and sends the organized data in batches to the L2 memory of the DSP. The DSP extracts the data from L2 segment by segment and uses its eight functional units to compute the Haar detection result at high speed. The DSP then hands the detection result to the ARM, which organizes the next batch of data. This is repeated until the final Haar detection result is obtained on the ARM side.

The present invention was designed around the structure and processing flow of Haar detection and the dual-core nature of the OMAP-L138 chip; its general steps are steps (1) to (5) set out above.

Specifically, to exploit the parallel execution capability of the DSP and make full use of the computing power of its eight functional units during the DSP computation, the present invention writes the calculation routine in pure assembly, assigning instructions to each of the eight units and scheduling the pipeline by hand.

Among the parameters of the Haar detection algorithm, the weight, left node, right node and threshold (T) would each require a Load instruction. To remove this drawback, and given that these values are known in advance, each parameter is assigned to a register as a constant during the computation, which avoids the functional-unit overhead of frequent memory-access Load instructions.

In addition, it is known in advance that each node in layers 0 to 4 of the cascade classifier has only two rectangular features, and that the nodes evaluated in layers 0-4 account for 77% of all nodes evaluated across layers 0-22; the computation in the other layers is comparatively small. An optimization is therefore designed specifically for the structure of layers 0-4:

First, a modulo scheduling table for the iteration interval is drawn up, and then functional units are assigned to the instructions, because the functional units an instruction may use are restricted: for example, the LDDW and STW instructions can only use the .D units, MPYLI can only use the .M units, CMPLTSP and CMPGTSP can only use the .S units, and ADDSP can only use the .L and .S units.

For example, computing the two features of one node requires the following instructions:

Instruction * count, followed by the functional units used

（1）LDDW * 4 .D1 .D2 .D1 .D2

（2）ADD * 4 .D1 .D2 .L1 .L2

（3）SUB * 2 .L1 .L2

（4）INTSP * 2 .L1 .L2

（5）MPYSP * 2 .M1 .M2

（6）CMPLTSP * 2 .S1 .S2

（7）CMPGTSP * 2 .S1 .S2

（8）MPYLI * 4 .M1 .M2 .M1 .M2

（9）ADDSP * 2 .L1 .L2

（10）STW * 2 .D1 .D2

During the design it was found that the store offset of the STW instruction is limited to a 5-bit field, i.e. an offset range of 32*4 = 128 bytes, which cannot meet the storage requirement, so ADDK instructions must also be added as follows:

（11）ADDK * 2 .S1 .S2

The most heavily used units are the .D units, with 8 uses in total. Since these 8 uses are shared between .D1 and .D2, the two features of a node need at least 4 cycles, so the minimum iteration interval of the modulo scheduling table can be determined to be 2 cycles per feature.
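The unit counts in the table above can be checked mechanically. The sketch below only computes the resource-constrained lower bound on the iteration interval (it ignores data dependencies and register pressure, which a real schedule must also respect); the list `USAGE` is transcribed from the table, and the variable names are assumptions.

```python
from collections import Counter
from math import ceil

# (instruction, functional units used) for the two features of one node,
# transcribed from the table above, including the two ADDK instructions.
USAGE = [
    ("LDDW", [".D1", ".D2", ".D1", ".D2"]),
    ("ADD", [".D1", ".D2", ".L1", ".L2"]),
    ("SUB", [".L1", ".L2"]),
    ("INTSP", [".L1", ".L2"]),
    ("MPYSP", [".M1", ".M2"]),
    ("CMPLTSP", [".S1", ".S2"]),
    ("CMPGTSP", [".S1", ".S2"]),
    ("MPYLI", [".M1", ".M2", ".M1", ".M2"]),
    ("ADDSP", [".L1", ".L2"]),
    ("STW", [".D1", ".D2"]),
    ("ADDK", [".S1", ".S2"]),
]

# Uses per individual unit (.D1, .D2, ...) and per unit class (.D, .L, .M, .S).
per_unit = Counter(u for _, units in USAGE for u in units)
per_class = Counter()
for unit, n in per_unit.items():
    per_class[unit[:2]] += n

# Each unit issues one instruction per cycle, so the busiest unit sets the bound.
cycles_per_node = max(per_unit.values())
cycles_per_feature = ceil(cycles_per_node / 2)  # two features per node
total_slots = sum(per_unit.values())
units_per_cycle = total_slots / cycles_per_node
```

Under these assumptions the bound is 4 cycles per node, i.e. 2 cycles per feature, and the 28 issued instructions keep on average 7 of the 8 functional units busy in every cycle.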

Writing pure assembly code differs from calling a C function in that the code must save the execution context itself. Before the computation, the assembly code saves registers A10-A15 and B10-B15 on the stack, and after the computation it restores them from the stack; the return address PC and the stack pointer SP are saved as well.

From the pipeline in the modulo scheduling table it can be seen that 8+8+6+6 = 28 functional-unit slots are used in 4 cycles, i.e. on average 7 functional units are used per cycle. In the pipelined computation one feature is computed every 2 cycles; because each node in layers 0-4 has only 2 rectangular features, one node in layers 0-4 is completed in 4 cycles. The result of each rectangular feature is written to memory with an STW instruction; finally the values in memory are accumulated, the accumulated value is compared with the threshold of the layer, and a flag is set.
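The final accumulate-and-compare step can be sketched as follows. The source names the parameters (weight, left node, right node, threshold) but does not spell out the decision rule, so this sketch assumes the common stump convention in which a node contributes its left value when the weighted feature is below the node threshold and its right value otherwise; `node_value` and `stage_passes` are hypothetical names, and the multiply-by-flag select only loosely mirrors the compare and multiply instructions listed in the table.

```python
def node_value(rect_sums, weights, threshold, left, right):
    """Evaluate one node: weighted rectangle sums, then a branch-free select.

    Assumed convention: feature < threshold selects `left`, otherwise `right`.
    """
    feature = sum(w * s for w, s in zip(weights, rect_sums))
    lt = int(feature < threshold)
    return lt * left + (1 - lt) * right


def stage_passes(nodes, stage_threshold):
    """Accumulate the node values of one layer and compare with its threshold."""
    total = sum(node_value(*n) for n in nodes)
    return total >= stage_threshold


# Hypothetical layer with two nodes, each built from two rectangle sums.
nodes = [
    ([10, 4], [1.0, -2.0], 3.0, 0.5, -0.5),   # feature = 2.0, selects left
    ([6, 1], [1.0, -2.0], 3.0, 0.25, -0.75),  # feature = 4.0, selects right
]
```

With these values the accumulated sum is -0.25, so the window passes a layer threshold of -0.5 and fails a layer threshold of 0.0.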

Through the above improvements to the software design, combined with the hardware characteristics of the OMAP-L138 chip, the implementation of the Haar detection algorithm is accelerated from several angles, which greatly improves the efficiency with which the Haar algorithm detects target vehicles and increases its practical value.

The above embodiment is merely a preferred embodiment of the present invention and does not limit its scope of protection. Any variation that uses the design principle of the present invention and is made on this basis without creative work shall fall within the scope of protection of the present invention.

Claims

1. A fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip, characterized by comprising the following steps:

(1) according to the camera calibration result, computing the detection regions of the Haar algorithm once, and saving the resulting Haar detection region table in DDR memory;

(2) the ARM extracting data from DDR according to the Haar detection region table and the Haar parameter table and arranging them linearly in L2 memory;

(3) the DSP extracting the data linearly from L2 memory according to the Haar parameter table, computing the result, and updating the calculation result in L2 memory;

(4) the ARM organizing the next batch of data according to the calculation results saved in L2 memory and passing it to the DSP, the DSP extracting the data according to the Haar parameter table and computing, this being repeated until the calculation results for all image data are stored in L2 memory;

(5) the ARM extracting the calculation results from L2 memory to obtain the Haar calculation result of the whole image.

2. The fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip according to claim 1, characterized in that a separate BIT table is established in the L2 memory, the calculation result of the DSP in step (3) is stored in the BIT table, and in step (4) the ARM extracts the calculation result from the BIT table to obtain the result of the Haar detection algorithm.

3. The fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip according to claim 2, characterized in that, when the data extracted from DDR in step (2) are arranged linearly, every group of data has the same size.

4. The fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip according to claim 3, characterized in that the DSP computes the calculation result in step (3) using pure assembly.

5. The fast implementation method of the Haar detection algorithm based on the OMAP-L138 chip according to claim 4, characterized in that, while the DSP computes the calculation result in step (3), the weight, left node, right node and threshold are each set as constants and assigned to registers for the computation.