CN111145211A - Monocular camera upright pedestrian head pixel height acquisition method - Google Patents

Monocular camera upright pedestrian head pixel height acquisition method

Info

Publication number
CN111145211A
Authority
CN
China
Prior art keywords
pixel
head
pedestrian
height
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911235573.1A
Other languages
Chinese (zh)
Other versions
CN111145211B
Inventor
杨大伟 (Yang Dawei)
毛琳 (Mao Lin)
程凡 (Cheng Fan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University filed Critical Dalian Minzu University
Priority to CN201911235573.1A
Publication of CN111145211A
Application granted
Publication of CN111145211B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/215 - Motion-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)

Abstract

A monocular-camera method for acquiring the pixel height of an upright pedestrian's head belongs to the field of intelligent image vision and aims to solve the problem that head information is acquired ambiguously when distance is estimated from the head part. S1, a monocular camera located in front of a vehicle collects an image of the pedestrian, a pedestrian head calibration frame is obtained through HOG features, the head pixel region is obtained through a superpixel algorithm, and a target binary image is obtained. S2, an energy distribution map in the vertical direction is established and the pixel height of the pedestrian's head is obtained. The effect is that, after the pedestrian head pixel region has been acquired as required, the height and other related information are extracted.

Description

Monocular camera upright pedestrian head pixel height acquisition method
Technical Field
The invention belongs to the field of intelligent image vision, and particularly relates to a method for acquiring an image through a monocular camera and processing a standing pedestrian by using an image processing method to achieve the purpose of acquiring the head of the pedestrian.
Background
A traffic system mainly comprises two kinds of objects, vehicles and pedestrians, and the distance between vehicle and pedestrian is the key factor in judging whether the pedestrian is in a dangerous state. Processing pedestrian body-part features directly in the image, without assistance from external equipment, satisfies the requirements of accuracy, simplicity and other aspects placed on pedestrian distance determination in an assisted traffic environment.
The method for acquiring the pedestrian information on the image level is various. In the patent, "segmentation method of multi-target motion human body region in video" (publication number: CN108648198A), a motion region is binarized by a background difference method, a region where a pedestrian is positioned is subjected to rectangular maximum region segmentation, and a target motion human body region is finally extracted by a method of multiple iterations of an optical flow method. In the patent human target visual detection method and device based on the BING characteristics (publication number: CN108734200A), the BING characteristics are utilized to carry out significance detection on an image, an area possibly containing pedestrians is screened out, a screening result is processed through a cascade classifier, and finally position information of the area where the pedestrians are located is determined.
Disclosure of Invention
In order to solve the problem that head information acquisition is ambiguous in the process of estimating the distance through a head part, the invention aims to acquire a pixel area of an upright pedestrian head part, and defines three steps of upright pedestrian head part acquisition criterion based on HOG characteristics, super-pixel pedestrian head part acquisition and head part integrity on the basis of image segmentation.
The technical scheme of the invention is as follows: a monocular camera upright pedestrian head pixel height acquisition method comprises the following steps:
S1, acquiring a pedestrian image with a monocular camera positioned in front of the vehicle, obtaining a pedestrian head calibration frame through HOG features, and obtaining the pedestrian head pixel region through a superpixel algorithm to produce a target binary image;
and S2, establishing a vertical energy distribution map to obtain the pixel height of the head of the pedestrian.
Application of the monocular camera upright pedestrian head pixel height acquisition method to continuous distance estimation based on infrared-visible light binocular fusion of multiple pedestrian body parts.
Beneficial effects: the superpixel upright-pedestrian head acquisition method aims at obtaining pedestrian body-part information, and the acquired result is used specifically to estimate the pedestrian distance. Taking distance estimation as the point of application, after the pedestrian head pixel region has been obtained as required, the height and other related information are extracted, which facilitates the specific application of subsequent algorithms such as distance estimation.
Drawings
FIG. 1 is a schematic logic diagram of an upright pedestrian head acquisition method;
FIG. 2 is a schematic diagram of pixel allocation;
FIG. 3 is a schematic diagram of head pixel acquisition;
FIG. 4 is a body height pixel acquisition diagram;
FIG. 5 is a graph showing the results of example 1;
FIG. 6 is a graph showing the results of example 2;
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
a schematic logic diagram of a method for obtaining a head part of a pedestrian standing upright is shown in fig. 1, and the method is implemented by the following steps:
step 1: acquiring a pedestrian head calibration frame through HOG characteristics;
step 2: processing pixels in the target area and the head calibration frame through a superpixel algorithm and acquiring a head height pixel;
Step 3: acquiring the pedestrian body height pixels and verifying the head pixel height result according to the pedestrian head-to-body pixel proportion;
Step 4: outputting the pedestrian head pixel height for results that pass the verification; for results that fail the verification, obtaining the final pedestrian head pixel height from the acquired body height pixels and the fixed proportion.
From the above description, the specific method of the present invention is as follows:
1. Upright pedestrian head part acquisition criterion based on HOG features
Detecting pedestrians with HOG features is a mature method with clear advantages in stability and detection accuracy. The pedestrian head part is extracted through the HOG features, and the head part and the region where it is located are judged according to the following rules:
1) Each pedestrian frame is processed with an optical-flow-based method to obtain a target binary image.
2) The result of the previous step is energy-filtered to obtain energy filtering curves in the x and y directions respectively.
3) The x-direction energy filtering curve should have the shape of an inverted quadratic (single-peaked) curve; the y-direction energy filtering curve should be approximately constant over its front portion and follow an inverted quadratic shape over its rear portion. The head part is selected from the body parts accordingly and used in the subsequent calculation.
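As a rough illustration of rules 1) to 3) above, the sketch below (an assumption of this write-up, not code from the patent) computes the x- and y-direction energy filtering curves of a binary pedestrian mask by counting non-zero pixels along each axis and applies a simplified single-peak shape test based on a quadratic fit.

import numpy as np

def energy_curves(binary_mask):
    # Energy filtering of a binary pedestrian image: count the non-zero
    # pixels along each image axis.
    mask = np.asarray(binary_mask, dtype=np.int32)
    energy_x = mask.sum(axis=0)   # one value per column (x direction)
    energy_y = mask.sum(axis=1)   # one value per row (y direction)
    return energy_x, energy_y

def looks_single_peaked(curve):
    # Crude stand-in for the "inverted quadratic" shape test: fit a
    # quadratic and require a negative leading coefficient (opens downward).
    idx = np.arange(len(curve))
    a = np.polyfit(idx, curve, deg=2)[0]
    return a < 0

In this sketch the x-direction curve of an upright pedestrian is expected to pass the single-peak test, while the y-direction curve should stay roughly constant over its front portion and be single-peaked over its rear portion, matching rule 3.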
2. Superpixel pedestrian head component pixel acquisition
The superpixel algorithm is essentially a method that groups pixels with similar positions and features. Using the superpixel algorithm helps remove redundant information: the effective information of the pedestrian head is extracted from the head detection frame, which facilitates the subsequent algorithm. When the superpixel algorithm is used to acquire the pedestrian head region, accurately dividing the target and non-target regions relies mainly on the edge adherence of the superpixel algorithm, which achieves the purpose of acquiring the pedestrian head pixel region.
Acquiring the superpixel pedestrian head pixel region mainly comprises the following steps:
(1) Cluster initialization
In the CIELAB color space, the number k of superpixels to be generated is input, and a uniform grid interval s is determined from the number n of pixel points in the processing region,
s = sqrt(n / k)
ensuring that the acquired superpixel blocks have the same size. A five-dimensional vector is defined in the CIELAB color space from the color and position information of the pixel points,
C_k = [l_k, a_k, b_k, x_k, y_k]^T   (1)
where l, a, b denote color information and x, y spatial information: l_k is the lightness of the centre point, a_k its position on the red/magenta-green axis, b_k its position on the yellow-blue axis, x_k its distance from the x-axis and y_k its distance from the y-axis;
(2) Pixel distance calculation
The algorithm defines a new distance index D to express the relationship between a pixel i and the cluster centre C_k; it is judged jointly from the color distance and the spatial distance, whose contributions to D are balanced by the weight m:
d_c = sqrt((l_i - l_k)^2 + (a_i - a_k)^2 + (b_i - b_k)^2), d_s = sqrt((x_i - x_k)^2 + (y_i - y_k)^2)   (2)
D = sqrt((d_c / m)^2 + (d_s / s)^2)   (3)
where d_c denotes the color distance and d_s the spatial distance, and m is a distance-adjustment weight coefficient: when m is smaller, the weight of the color distance is higher, the superpixels adhere more strongly to the target edges, and the regularity of superpixel shape and size decreases; when m is larger, the weight of the spatial distance is higher and the superpixel blocks formed are more regular;
(3) Pixel allocation
In the pixel assignment process, each pixel i is assigned to the corresponding superpixel block according to its distance from the cluster centre points. When the superpixel region size is s × s, the corresponding search region size is 2s × 2s.
(4) Cluster centre update
After the pixels have been assigned to cluster centres, the cluster centres are re-determined from the color and position information of their pixels. The residual between the updated centre and the previous cluster centre is computed with the spatial two-norm, and the update is repeated until this error converges, at which point updating stops and the superpixel blocks are determined.
(5) Post-processing
Because the superpixel algorithm does not enforce connectivity, after clustering some pixel points do not belong to any superpixel block. To address this, the isolated pixel points are reassigned with a connected-component algorithm.
(6) Obtaining head pixel height
The target region is acquired with the superpixel algorithm; taking the target binary image as input, an energy distribution map in the vertical direction is obtained with the energy filtering algorithm. In the vertical-direction energy distribution map, the abscissa direction is the ordinate direction of the image coordinate system and has the same extent, while the ordinate of the map is the energy value of the corresponding pixel row in the image. From the correspondence between the image and the vertical energy distribution map, the top of the pedestrian's head lies at the initial value of the energy curve, which determines the ordinate P_t of the head-top pixel position; the junction between the bottom of the head and the rest of the body produces a valley, from which the ordinate P_b of the head-bottom pixel position is determined. The pedestrian head pixel height is then
H_h = P_b - P_t   (4)
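A minimal sketch of steps (1) to (6), under stated assumptions: scikit-image's SLIC implementation stands in for the clustering loop described above, the selection of which superpixels make up the target binary image is left out, and the valley search is an illustrative heuristic rather than the patent's exact procedure.

import numpy as np
from skimage.segmentation import slic

def superpixel_labels(head_roi_rgb, n_segments=100, compactness=10.0):
    # Steps (1)-(5): SLIC-style clustering of the head region of interest;
    # scikit-image's implementation is used here as a stand-in.
    return slic(head_roi_rgb, n_segments=n_segments,
                compactness=compactness, start_label=0)

def head_pixel_height(target_mask):
    # Step (6): vertical-direction energy distribution of the target binary
    # image, i.e. the count of non-zero pixels in every image row.
    energy = np.asarray(target_mask, dtype=np.int32).sum(axis=1)
    rows = np.nonzero(energy)[0]
    if rows.size == 0:
        return None                       # segmentation yielded no target
    p_t = int(rows[0])                    # head top: start of the energy curve
    # Walk up to the first local maximum (the head) ...
    peak = p_t
    while peak + 1 <= rows[-1] and energy[peak + 1] >= energy[peak]:
        peak += 1
    # ... then down to the first local minimum: the valley where the head
    # joins the shoulders.
    p_b = peak
    while p_b + 1 <= rows[-1] and energy[p_b + 1] <= energy[p_b]:
        p_b += 1
    return p_b - p_t                      # H_h = P_b - P_t, equation (4)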
3. Head integrity definition criterion
Head integrity means that, within an error range, the head pixel height satisfies a fixed proportion of the body height pixel.
In the process of extracting the head pixel region, the characteristics of the height, width and pixel area of the head pixel are obtained correspondingly. In the process of acquiring the head pixel region through the superpixel algorithm, when the head region is similar to the background, the superpixel algorithm cannot effectively acquire the target region. And judging whether the regional acquisition is effective or not, and processing the failure condition to ensure the integrity of the invention.
There is a corresponding proportional relation between the pedestrian head pixel height and the body height pixel. When the superpixel algorithm is applied, the pedestrian body height is easier to obtain than the pedestrian head pixel region. Let the pedestrian body height pixel be h_b, the head pixel height h_h, the ratio coefficient r and the error ratio e; when the relation
h_b × r × (1 + e) ≥ h_h ≥ h_b × r × (1 - e)   (5)
is satisfied, the pedestrian head part is judged to have been completely acquired, and the head height feature is obtained from the relation between the body height pixel and the head height pixel for the subsequent calculations of pedestrian distance estimation.
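The integrity test of equation (5) can be written as a small check; note that the ratio coefficient r and the error ratio e are parameters the patent leaves unspecified, so the default values below are placeholders rather than figures from the text.

def verified_head_height(h_h, h_b, r=0.13, e=0.2):
    # Equation (5): the head pixel height h_h must lie within the error
    # band around the fixed proportion r of the body height pixel h_b.
    # r = 0.13 and e = 0.2 are illustrative placeholders.
    lower, upper = h_b * r * (1 - e), h_b * r * (1 + e)
    if lower <= h_h <= upper:
        return h_h          # verification passed: keep the measured height
    return h_b * r          # otherwise fall back to the fixed proportion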
4. Body height pixel
Body height pixel acquisition can be divided into the steps of superpixel segmentation of the pedestrian, energy statistics, and result analysis. Specifically, the pixels inside the pedestrian frame are computed with the superpixel segmentation algorithm, and the pedestrian region and background region are separated to remove redundant information; the segmentation result is then operated on with the energy filtering algorithm to obtain the vertical-direction energy statistics; finally the result is analysed to obtain the body height pixel.
Compared with the head part, the pedestrian's overall features are rich and easier to acquire, so using the body pixel height as the basis for judging head integrity is feasible and keeps the judgement result stable.
The method for acquiring the head part of the upright pedestrian is based on intelligent image analysis, the approximate region where the head of the pedestrian is located is determined through the HOG characteristics, due to the fact that the complexity of a superpixel segmentation algorithm is high, superpixel processing is only carried out on the approximate region of the head, and the method for acquiring the head part guarantees the accuracy and the real-time performance of the algorithm at the same time. In order to avoid the condition of algorithm failure, the integrity of the head is verified through the relationship between the body and the height of the head pixel, and a specific method for acquiring the height characteristic is provided.
From the perspective of safety protection of pedestrians, the method for acquiring information based on body parts does not need the pedestrians to carry other devices, meets the requirements of actual conditions, and is more beneficial to popularization in a traffic system. With the purpose of pedestrian safety, the invention provides data support for a pedestrian distance estimation method based on the head height.
1) As a feature description method, the HOG feature has been widely applied to human detection in images and has gradually become a mainstream algorithm. The invention takes HOG features as the basis for information acquisition, which ensures the stability of the subsequent algorithm. The information obtained through HOG features is mined further, and the head part is distinguished and used in the process of acquiring the pedestrian head pixels. This assists the head pixel acquisition process: operating directly on all the information in the image would harm real-time performance, so distinguishing the head component and then processing it separately ensures both the timeliness and the pixel-acquisition precision of the method.
2) The head is used as a rigid part of the body, has the characteristic of difficult deformation under the condition that the pedestrian is in any posture, and is specifically estimated by adopting the monocular distance measuring principle. The integrity acquired by the head part is the key to whether the distance estimation result is accurate. In order to obtain the pixel size of the pedestrian head part more accurately, the invention adopts a superpixel segmentation algorithm to be applied to the acquisition process. The super-pixel segmentation algorithm effectively removes redundant regions and extracts effective information by segmenting pixel points which are adjacent in position and similar in characteristics such as color and texture. Since the algorithm calculates the pixel points one by one, the amount of calculation is a problem to be solved in the application process of the algorithm. The method carries out primary processing on the image through the upright pedestrian head part acquisition criterion based on the HOG characteristic to obtain the head and the peripheral pixel points. Therefore, when the super-pixel pedestrian head part pixel is obtained, the calculated amount can be controlled within a certain range, and the calculating speed of the method is guaranteed.
3) In an image, the proportion of pedestrian head pixels in body pixels is small, in a complex traffic environment, in the face of various scenes, the head pixels are obtained only by two methods, namely an upright pedestrian head part obtaining criterion based on HOG characteristics and a super-pixel pedestrian head part pixel obtaining method, and the risk of method failure exists. In order to ensure the integrity of the invention and form a closed loop, the head pixel acquisition result is tested through the proportional relation between the head height and the body height, and other schemes are adopted to acquire the head information of the results which do not pass the test. The head information acquisition result is confirmed in a self-checking mode, so that the error rate of the distance output result is greatly reduced, and the safety is guaranteed in the actual application process.
4) The self-checking process is mainly carried out according to the height proportion of the head and body pixels of the pedestrian. The pedestrian head pixel height is obtained according to the super-pixel pedestrian head part pixel, the body pixel height is obtained according to a segmentation algorithm and threshold segmentation, specifically, the image is processed for the first time through the HOG characteristic, then the processing result is subjected to super-pixel segmentation, and redundant information around the pedestrian is removed. And processing the segmentation result by an energy filtering method, and finally obtaining the pedestrian pixel height through a result oscillogram. The pedestrian risk index judgment is a main application point of the invention, so that the invention has higher requirements on accuracy and stability. Therefore, the process of checking the head height is particularly critical in the present invention. The pixel height ratio between the head and the body is used as a pedestrian head height result, the judgment basis is strong in feasibility, other hardware facilities do not need to be added in the process of checking the result to carry out auxiliary processing, and the application in a complex environment is facilitated due to the convenience.
(5) The invention can be used in the process of avoiding pedestrians by the mobile robot. The method is characterized in that an internal camera of the robot is used as a hardware basis, the pixel height of the head of the pedestrian is obtained through the method, the distance between the pedestrian and the robot is obtained according to the existing pedestrian body part distance estimation method, and the method is used as an effective judgment basis for the robot to avoid the pedestrian.
(6) The pedestrian danger judgment method and the pedestrian danger judgment system can be used in the process of judging the pedestrian danger by the vehicle-mounted equipment. The distance between the pedestrian and the vehicle is an important criterion for judging whether the vehicle forms danger to the pedestrian, and the head pixel height obtained according to the method is used as a distance estimation judgment basis, so that the distance is estimated on the basis of not increasing hardware load, and the method is suitable for complex traffic environments.
(7) The method can be used for the process of judging the distance between pedestrians by the unmanned aerial vehicle in the traffic law enforcement process. At present, the development of the unmanned aerial vehicle is not limited to technologies such as aerial photography, and the application of the unmanned aerial vehicle in traffic law enforcement is a breakthrough for the unmanned aerial vehicle to serve people. In the law enforcement process, unmanned aerial vehicle mainly used is to the shooting of the certificate that the pedestrian held. By applying the method and the device, the pixel height of the head of the pedestrian is acquired simultaneously in the process of shooting the certificate, the distance is estimated, the proper distance is kept, and the certificate is shot clearly on the premise of not damaging the pedestrian.
(8) The invention can be used for the process of judging the pedestrian distance by the unmanned aerial vehicle in art shooting. Along with the development of artificial intelligence, unmanned aerial vehicles also draw close to the service industry gradually, and the use of unmanned aerial vehicles in the art shooting process has been accepted by the public. In the unmanned aerial vehicle use, generally need the people or equipment to intervene, also promoted the shooting cost when having increased shooting personnel work load. The invention obtains the head pixel height of the photographer, and accordingly, the proper distance is kept between the head pixel height and the photographer, and the shooting is finished.
Example 1:
In this embodiment the experimental images are acquired with a camera (480 × 640 @ 30 Hz); the images are taken on a street in winter and contain 1 pedestrian target. The method estimates the pixel height of the target pedestrian's head; the error of the estimation result is shown in figure 5, and the final error does not exceed 2 pixels.
Example 2:
In this embodiment the experimental images are acquired with a camera (480 × 640 @ 30 Hz); the images are taken on a street in winter and contain 1 pedestrian target. The method estimates the pixel height of the target pedestrian's head; the error of the estimation result is shown in figure 6, and the final error does not exceed 2 pixels.
Example 3:
in order to solve the problem of accuracy of estimating the human-vehicle distance through the image in front of the vehicle, the following scheme is proposed: an infrared-visible binocular pedestrian body multi-component fused continuous distance estimation method, comprising the steps of:
S1, the same scene in front of the vehicle is captured with an infrared-visible light binocular camera to obtain an infrared vehicle front image and a visible light vehicle front image;
S2, multi-time-scale detection and tracking is performed to determine the position of the target pedestrian in the infrared vehicle front image and in the visible light vehicle front image;
S3, the pedestrian head heights in the two images are acquired, a head-part distance estimation result is calculated, and a foot-part distance estimation result is calculated;
and S4, first-level fusion is performed on the distance estimation results of the different pedestrian body parts, second-level fusion is performed on the distances estimated from the visible light and infrared light images, the cascaded fusion of pedestrian head-part and foot-part distances is completed, and the distance between the pedestrian and the front of the vehicle is determined.
Further, the infrared-visible light binocular pedestrian body multi-component fusion continuous distance estimation method further comprises S5, verifying the distance output result by tracking and outputting the verified, accurate distance.
Further, the method for obtaining the height of the head pixel in the two images in step S3 is to determine the head height of the pedestrian with coarse granularity, and the method includes the following steps:
coarse-grained pedestrian head height estimation by a fixed ratio r between head height and body height to head height estimation, head height Hre_headHeight H of human bodybodyRatio rhbAs determined by the simulation example, it is,
head height is
Hre_head=Hbody×rhb
The fine-grained method is to obtain the height of the head pixel by a super-pixel algorithm, and the height of the head pixel should float in a proportional range, wherein the reference range is as follows:
Hre_head×(1-rre)<Hhead<Hre_head×(1+rre)
wherein r isreControlling the height H of the head pixel obtained by a fine-grained method to be between 0.2 and 0.3 for a floating coefficientheadWithin the reference range, adding HheadOutputting as the head height, otherwise, judging that the height of the pedestrian head pixel obtained by the super pixel is invalid, and making Hre_headAs an output.
Further, the multi-time scale detection and tracking method of step S2 is:
(1) setting a certain frame in a video sequence as a first frame, and actively marking the pedestrian information of the frame;
(2) continuously tracking the pedestrians by using a KCF algorithm according to the marked content of the first frame;
(3) after m frames have been tracked, the tracking result is taken as input, HOG features are extracted, and a pedestrian detection model is trained online with an SVM classifier; the images in the video sequence are then detected and the detection result is used to check the tracking result. Detection is performed once every m frames to correct the tracking, n times in total, so the number k of detection frames is:
k = 1 + m × n, n ∈ Z
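The track-then-verify schedule can be sketched as below; the tracker and detector objects are assumed stand-ins for whatever KCF tracker and HOG+SVM detector are used (for example OpenCV implementations), with a hypothetical init/update/detect interface rather than the patent's own code.

def track_with_periodic_detection(frames, tracker, detector, first_box, m=10):
    # Continuous KCF-style tracking corrected by one detection pass every
    # m frames, so after n corrections k = 1 + m * n frames have been
    # detected.  `tracker.init/update` and `detector.detect` are assumed.
    boxes = []
    tracker.init(frames[0], first_box)            # first frame: manually marked pedestrian
    for i, frame in enumerate(frames):
        if i > 0 and i % m == 0:
            detections = detector.detect(frame)   # periodic detection pass
            if detections:
                tracker.init(frame, detections[0])  # correct the tracker
        boxes.append(tracker.update(frame))
    return boxes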
Further, in step S4 the method for the first-level fusion of the distance estimation results of different pedestrian body parts is as follows: a pedestrian whose distance is to be estimated stands in turn at a plurality of different positions of known distance (the distance between the pedestrian and the front of the vehicle); at each position a head-part distance estimate and a foot-part distance estimate are made, and when all positions have been completed, a set x_1 of distance estimates obtained from the head part and a set x_2 obtained from the foot part are available.
Let x̄_1 be the mean of the head-part distance estimate set x_1 and x̄_2 the mean of the foot-part distance estimate set x_2; the head-part estimate is given weight p_1 and the foot-part estimate weight p_2, and σ denotes the standard deviation. The fusion weights p_1 and p_2 are computed from these quantities (the weight formula is given in the original only as an equation image).
Further, for a given actual distance measurement with head-part distance estimate D_A and foot-part distance estimate D_B, the distance estimate D_1 is:
D_1 = p_1 D_A + p_2 D_B
Further, the method of performing the second-level fusion on the estimated distance output according to the visible light and infrared light images in step S4 is:
acquiring an infrared vehicle front image distance estimation set: set of head distance estimates x for acquisition using infrared front images1And the foot part acquires the set of distance estimation results x2The head part distance estimation result set obtained by the infrared front image is DHThe set of results of estimating the distance between the foot members of the infrared front image is DFThe distance estimation value set of the infrared front image is DV
DV=p1DH+p2DF
Acquiring a visible light vehicle front image distance estimation set: head distance estimation result set x for using a visible light vehicle front image acquisition1And the foot part acquires the set of distance estimation results x2The head part distance estimation result set acquired by the visible light vehicle front image is DGThe set of foot part distance estimates for the visible light front image is DKAnd the distance estimation value set of the visible light vehicle front image is DI
DI=p1DG+p2DK
Set of distance estimates for infrared front images DVSet of distance estimates for visible light pre-vehicle images DI
Figure BDA0002304803250000091
Set of distance estimates D being an infrared front imageVThe average value of (a) of (b),
Figure BDA0002304803250000092
set of distance estimates D for visible light pre-vehicle imagesIThe mean value of (1), the weight occupied by the distance estimation result of the infrared front image is p3The weight occupied by the distance estimation result of the visible light vehicle front image is p4Then the fusion weight is:
Figure BDA0002304803250000093
for a certain actual distance detection, the distance estimation result D of the front image of the infrared vehicleCDistance estimation result D of visible light vehicle front imageDThen distance estimate D2
D2=p3DC+p4DD
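Collected into code, the two-level cascade is just two weighted sums; the weights p_1 to p_4 are assumed to have been calibrated beforehand from the known-distance experiment described above (their defining formulas appear only as equation images in the original), so the values in the usage comment are placeholders.

def fuse_head_foot(d_head, d_foot, p1, p2):
    # First-level fusion of head-part and foot-part estimates: D1 = p1*DA + p2*DB
    return p1 * d_head + p2 * d_foot

def fuse_ir_visible(d_ir, d_vis, p3, p4):
    # Second-level fusion of infrared and visible-light estimates: D2 = p3*DC + p4*DD
    return p3 * d_ir + p4 * d_vis

# Example use with placeholder weights:
# d_ir  = fuse_head_foot(d_head_ir,  d_foot_ir,  p1=0.6, p2=0.4)
# d_vis = fuse_head_foot(d_head_vis, d_foot_vis, p1=0.6, p2=0.4)
# d_out = fuse_ir_visible(d_ir, d_vis, p3=0.5, p4=0.5)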
The infrared-visible light binocular pedestrian body multi-component distance fusion estimation method provided by the invention ensures the effectiveness of the method in obtaining images under different visibility and illumination conditions; the problem that distance estimation of partial frame numbers is invalid when the positions of pedestrians are obtained on mobile equipment is solved, and the continuous and effective distance estimation of the pedestrians in the later period is ensured; the method for acquiring the head height in the existing body part distance estimation algorithm is improved, and the accuracy of the head height is improved; the application range of the method is enlarged, and the distance can be accurately estimated in severe weather environments such as heavy fog, rain, snow and the like or when body parts are partially shielded; the check of the previous and the next frames of the distance estimation is completed, and the integrity of the algorithm is ensured.
The method is based on intelligent images: it acquires images with the infrared-visible light binocular camera, determines the positions of pedestrians with a multi-time-scale detection and tracking method, improves the accuracy of head pixel height acquisition with the coarse- and fine-grained pedestrian head height judgement, and estimates the distance accurately through cascaded distance fusion, thereby improving the precision of the distance estimation results; distance estimation verification is added to prevent failure of the method and ensure its integrity.
(1) The image to be detected is obtained with only a monocular camera and an infrared camera, so the hardware cost is low, the method is easy to popularize, and the binocular camera guarantees the effectiveness of the image source;
(2) to reduce the time consumption of the method and improve the accuracy of pedestrian positioning, a multi-time-scale detection and tracking method is adopted, determining the pedestrian position by continuous tracking with intermittent detection;
(3) the method of judging the head height by coarse-grained body proportion and fine-grained image segmentation improves the accuracy of the head height while avoiding failure of the fine-grained judgement;
(4) suitability for various complex conditions has become an indispensable requirement for driver-assistance systems; the body-part distance estimates are processed by cascaded fusion of pedestrian head-part and foot-part distances, so the first-level fusion avoids the influence of body-part occlusion on the distance estimate, and the second-level fusion widens the application range of the method, allowing pedestrian distance estimation under different visibility environments;
(5) because the algorithm is applied in complex environments, failures of the distance estimate are hard to avoid; to prevent this, a tracking algorithm is added, and the prediction function of the tracking algorithm uses the distance estimates of the preceding and following frames to predict the distance of a failed frame, ensuring the integrity of the invention. The invention meets the many requirements of a driver-assistance system during autonomous driving and has strong popularization value.
According to the method, firstly, an infrared-visible light binocular camera is utilized to ensure effective acquisition of an image, the position of a pedestrian is determined by a continuous tracking-intermittent detection method, the position of the pedestrian in the image is continuously tracked after the image to be detected is acquired, and in order to ensure the detection precision, the tracking result is verified by a detection algorithm at the same time interval; acquiring a pixel image area corresponding to a pedestrian according to the position of the pedestrian, judging the head height of the pedestrian with coarse granularity, acquiring the head height of the pedestrian according to the proportion between the pedestrian and the head in the coarse granularity judgment, determining the fine granularity judgment according to a segmentation algorithm, and meanwhile, taking the coarse granularity judgment as a verification basis of the fine granularity judgment to prevent the failure condition of the fine granularity judgment caused by the conditions of excessive segmentation and the like; and obtaining the estimation result of the distance between the head and the foot parts under the conditions of visible light and infrared light by using a known algorithm, and processing by adopting a cascading type fusion mode to finally obtain the estimated distance. The cascade fusion is divided into two stages, wherein the first stage fusion is carried out on the estimation result of the distance between the head and the foot parts of the pedestrian under the condition of visible light or infrared light, and the second stage fusion is carried out on the fusion result of the visible light, the infrared head and the foot; in the operation of the method, when the estimated distance exceeds a certain threshold range of the tracking prediction result, the distance estimation result is judged to be invalid, and the tracking prediction result is used as a final distance estimation result.
The method is suitable for pedestrian distance estimation by mobile devices.
(1) Applicable to mobile robots avoiding pedestrians: with the arrival of the intelligent-manufacturing era the mobile-robot industry has developed rapidly, and pedestrian avoidance has become a problem that mobile robots urgently need to solve. The invention addresses how a mobile robot can effectively avoid pedestrians in complex environments and under dim light and low visibility. The visible-infrared binocular camera acquires the images simultaneously, guaranteeing effective image-information acquisition, so that the subsequent image processing and distance estimation are not affected by the external environment; this meets the distance-estimation requirement when a mobile robot avoids pedestrians.
(2) Pedestrian positioning for unmanned ground vehicles (UGVs): at present UGVs are mainly applied to emergency scenarios such as logistics transport, reconnaissance, protection and medical evacuation; under dangerous conditions, locating pedestrians in the environment in time and taking corresponding measures is a problem UGVs need to solve. UGVs face complex environments, and extreme outdoor weather such as strong illumination, sand and wind, rain and snow increases the difficulty of pedestrian distance estimation. The infrared-visible light binocular image acquisition and the cascaded fusion of pedestrian head-part and foot-part distances guarantee the effectiveness and accuracy of pedestrian distance acquisition under extremely severe outdoor weather conditions.
(3) Suitable for autonomous-driving assistance: the method is mainly used to judge the distance between pedestrian and vehicle during driving and provides important data support for pedestrian danger judgement. It meets the requirements of continuity, accuracy and integrity in estimating pedestrian distance during assisted driving, and improves pedestrian position determination, body-part information acquisition, and emergency handling of application-scene and algorithm failures. Only a monocular camera and an infrared camera are used as image-acquisition equipment, so the hardware requirement is low and the method is easy to implement; the binocular camera also ensures safety when the autonomous vehicle drives at night, addressing a major potential safety hazard.
Example 4:
In the scheme of embodiment 3, the method for acquiring the pedestrian head pixel height comprises the following steps:
s1, acquiring a pedestrian image by a monocular camera positioned in front of a vehicle, acquiring a pedestrian head calibration frame through HOG characteristics, acquiring a head pixel area through a superpixel algorithm, and acquiring the pedestrian head pixel area to obtain a target binary image;
and S2, establishing a vertical energy distribution map to obtain the pixel height of the head of the pedestrian.
Further, the method of step S2 is:
acquiring a vertical-direction energy distribution map of the target binary image with the energy filtering algorithm, wherein in the vertical energy distribution map the abscissa direction is the ordinate direction of the image coordinate system and has the same extent, and the ordinate of the map is the energy value of the corresponding pixel row in the image;
determining, from the correspondence between the image of the head target region and the vertical-direction energy distribution map, the ordinate P_t of the head-top pixel position, the top of the pedestrian's head lying at the initial value of the energy curve; the junction between the bottom of the head and the rest of the body produces a valley, from which the ordinate P_b of the head-bottom pixel position is determined;
The pedestrian head pixel height is represented by:
H_h = P_b - P_t
further, the method for acquiring the head pixel height of the standing pedestrian with the monocular camera further comprises the following steps:
s3, acquiring height pixels of the pedestrians, and verifying the height results of the head pixels according to the head-body pixel proportion of the pedestrians;
and S4, outputting the height of the pixel of the head of the pedestrian according with the verification result, and outputting the final height of the pixel of the head of the pedestrian according to the obtained fixed proportion of the pixel of the height of the pedestrian for the result of non-passing of the verification.
Further, the method for obtaining the body height pixel comprises the steps of calculating the pixel in the pedestrian frame by utilizing a superpixel segmentation algorithm, segmenting out a pedestrian and a background region, removing redundant information, operating the segmentation result by utilizing an energy filtering algorithm to obtain a vertical energy statistical result, and analyzing the result to obtain the body height pixel.
Further, in the extraction of the head pixel region, the method for obtaining the height of the head pixel of the pedestrian is to judge the head height of the pedestrian by coarse granularity, and the method comprises the following steps of estimating the head height of the pedestrian by coarse granularity and estimating the head height of the pedestrian by fine granularity:
coarse-grained pedestrian head height estimation by a fixed ratio r between head height and body height to head height estimation, head height Hre_headHeight H of human bodybodyRatio rhbAs determined by the simulation example, it is,
head height is
Hre_head=Hbody×rhb
The fine-grained method is to obtain the height of the head pixel by a super-pixel algorithm, and the height of the head pixel should float in a proportional range, wherein the reference range is as follows:
Hre_head×(1-rre)<Hhead<Hre_head×(1+rre)
wherein r isreControlling the height H of the head pixel obtained by a fine-grained method to be between 0.2 and 0.3 for a floating coefficientheadWithin the reference range, adding HheadOutputting as the head height, otherwise, judging that the height of the pedestrian head pixel obtained by the super pixel is invalid, and making Hre_headAs an output.
Further, the method for acquiring the head pixel region comprises the following steps:
(1) initializing clusters: inputting the number k of superpixels to be generated in a CIELAB color space, and determining the same grid spacing s according to the number n of pixel points in a processing region to ensure that the sizes of the obtained superpixel blocks are the same;
wherein:
s = sqrt(n / k)
defining the cluster centre as a five-dimensional vector in the CIELAB color space from the color and position information of the pixel points:
C_k = [l_k, a_k, b_k, x_k, y_k]^T
where l, a, b denote color information and x, y spatial information: l_k is the lightness of the centre point, a_k its position on the red/magenta-green axis, b_k its position on the yellow-blue axis, x_k its distance from the x-axis and y_k its distance from the y-axis;
(2) pixel distance calculation: a distance index D is defined to express the relationship between a pixel i and the cluster centre C_k; it is judged jointly from the color distance and the spatial distance, whose contributions to D are balanced by the weight m:
d_c = sqrt((l_i - l_k)^2 + (a_i - a_k)^2 + (b_i - b_k)^2)
d_s = sqrt((x_i - x_k)^2 + (y_i - y_k)^2)
D = sqrt((d_c / m)^2 + (d_s / s)^2)
where d_c denotes the color distance and d_s the spatial distance, and m is a distance-adjustment weight coefficient: when m is smaller, the weight of the color distance is higher, the superpixels adhere more strongly to the target edges, and the regularity of superpixel shape and size decreases; when m is larger, the weight of the spatial distance is higher and the superpixel blocks formed are more regular;
(3) pixel allocation: each pixel i in the pixel allocation process is allocated to a corresponding super pixel block according to the distance from the clustering center point, and the corresponding search area of the pixel area is twice of the super pixel area;
(4) updating a clustering center: when the pixel i is distributed to the clustering center, re-determining the clustering center according to the color and position information of the pixel point, calculating the residual value between the updated pixel and the previous clustering center by utilizing a space two-norm, continuously repeating the updating process until the error is converged, stopping updating and determining the super-pixel block;
(5) and (3) post-treatment: after clustering processing, part of the pixels do not belong to any super-pixel blocks, and redistribution processing is carried out on the isolated pixels by using a connected algorithm.
The method for acquiring the head part of the upright pedestrian is based on intelligent image analysis, the approximate region where the head of the pedestrian is located is determined through the HOG characteristics, due to the fact that the complexity of a superpixel segmentation algorithm is high, superpixel processing is only carried out on the approximate region of the head, and the method for acquiring the head part guarantees the accuracy and the real-time performance of the algorithm at the same time. In order to avoid the condition of algorithm failure, the integrity of the head is verified through the relationship between the body and the height of the head pixel, and a specific method for acquiring the height characteristic is provided.
From the perspective of safety protection of pedestrians, the method for acquiring information based on body parts does not need the pedestrians to carry other devices, meets the requirements of actual conditions, and is more beneficial to popularization in a traffic system. With the purpose of pedestrian safety, the invention provides data support for a pedestrian distance estimation method based on the head height.
1) As a feature description method, the HOG feature has been widely applied to human detection in images and has gradually become a mainstream algorithm. The invention takes HOG features as the basis for information acquisition, which ensures the stability of the subsequent algorithm. The information obtained through HOG features is mined further, and the head part is distinguished and used in the process of acquiring the pedestrian head pixels. This assists the head pixel acquisition process: operating directly on all the information in the image would harm real-time performance, so distinguishing the head component and then processing it separately ensures both the timeliness and the pixel-acquisition precision of the method.
2) The head is used as a rigid part of the body, has the characteristic of difficult deformation under the condition that the pedestrian is in any posture, and is specifically estimated by adopting the monocular distance measuring principle. The integrity acquired by the head part is the key to whether the distance estimation result is accurate. In order to obtain the pixel size of the pedestrian head part more accurately, the invention adopts a superpixel segmentation algorithm to be applied to the acquisition process. The super-pixel segmentation algorithm effectively removes redundant regions and extracts effective information by segmenting pixel points which are adjacent in position and similar in characteristics such as color and texture. Since the algorithm calculates the pixel points one by one, the amount of calculation is a problem to be solved in the application process of the algorithm. The method carries out primary processing on the image through the upright pedestrian head part acquisition criterion based on the HOG characteristic to obtain the head and the peripheral pixel points. Therefore, when the super-pixel pedestrian head part pixel is obtained, the calculated amount can be controlled within a certain range, and the calculating speed of the method is guaranteed.
3) In an image, the proportion of pedestrian head pixels in body pixels is small, in a complex traffic environment, in the face of various scenes, the head pixels are obtained only by two methods, namely an upright pedestrian head part obtaining criterion based on HOG characteristics and a super-pixel pedestrian head part pixel obtaining method, and the risk of method failure exists. In order to ensure the integrity of the invention and form a closed loop, the head pixel acquisition result is tested through the proportional relation between the head height and the body height, and other schemes are adopted to acquire the head information of the results which do not pass the test. The head information acquisition result is confirmed in a self-checking mode, so that the error rate of the distance output result is greatly reduced, and the safety is guaranteed in the actual application process.
4) The self-checking process is mainly carried out according to the height proportion of the head and body pixels of the pedestrian. The pedestrian head pixel height is obtained according to the super-pixel pedestrian head part pixel, the body pixel height is obtained according to a segmentation algorithm and threshold segmentation, specifically, the image is processed for the first time through the HOG characteristic, then the processing result is subjected to super-pixel segmentation, and redundant information around the pedestrian is removed. And processing the segmentation result by an energy filtering method, and finally obtaining the pedestrian pixel height through a result oscillogram. The pedestrian risk index judgment is a main application point of the invention, so that the invention has higher requirements on accuracy and stability. Therefore, the process of checking the head height is particularly critical in the present invention. The pixel height ratio between the head and the body is used as a pedestrian head height result, the judgment basis is strong in feasibility, other hardware facilities do not need to be added in the process of checking the result to carry out auxiliary processing, and the application in a complex environment is facilitated due to the convenience.
(5) The invention can be used in the process of avoiding pedestrians by the mobile robot. The method is characterized in that an internal camera of the robot is used as a hardware basis, the pixel height of the head of the pedestrian is obtained through the method, the distance between the pedestrian and the robot is obtained according to the existing pedestrian body part distance estimation method, and the method is used as an effective judgment basis for the robot to avoid the pedestrian.
(6) The pedestrian danger judgment method and the pedestrian danger judgment system can be used in the process of judging the pedestrian danger by the vehicle-mounted equipment. The distance between the pedestrian and the vehicle is an important criterion for judging whether the vehicle forms danger to the pedestrian, and the head pixel height obtained according to the method is used as a distance estimation judgment basis, so that the distance is estimated on the basis of not increasing hardware load, and the method is suitable for complex traffic environments.
(7) The method can be used for the process of judging the distance between pedestrians by the unmanned aerial vehicle in the traffic law enforcement process. At present, the development of the unmanned aerial vehicle is not limited to technologies such as aerial photography, and the application of the unmanned aerial vehicle in traffic law enforcement is a breakthrough for the unmanned aerial vehicle to serve people. In the law enforcement process, unmanned aerial vehicle mainly used is to the shooting of the certificate that the pedestrian held. By applying the method and the device, the pixel height of the head of the pedestrian is acquired simultaneously in the process of shooting the certificate, the distance is estimated, the proper distance is kept, and the certificate is shot clearly on the premise of not damaging the pedestrian.
(8) The invention can be used for the process of judging the pedestrian distance by the unmanned aerial vehicle in art shooting. Along with the development of artificial intelligence, unmanned aerial vehicles also draw close to the service industry gradually, and the use of unmanned aerial vehicles in the art shooting process has been accepted by the public. In the unmanned aerial vehicle use, generally need the people or equipment to intervene, also promoted the shooting cost when having increased shooting personnel work load. The invention obtains the head pixel height of the photographer, and accordingly, the proper distance is kept between the head pixel height and the photographer, and the shooting is finished.
Example 5:
For the solution of embodiment 3 or 4, a method for acquiring the foot component from the pedestrian image based on energy filtering comprises the following steps:
step 1: acquiring a pedestrian foot calibration frame through HOG characteristics;
step 2: obtaining the area of the foot target through a superpixel algorithm;
and 3, step 3: and obtaining the coordinates of the output foot part through energy filtering.
Further, the method of step 3 is as follows: the center point of the pedestrian's toe is taken as the specific point P_f corresponding to the foot position. The binary result of the region containing the foot target is projected in the horizontal and vertical directions respectively; the energy feature is the count of non-zero pixel points in the horizontal and vertical directions. Energy filtering is applied to the binary image: the non-zero pixel points are accumulated to form the corresponding energy filtering curves. In the vertical energy distribution diagram, the abscissa direction corresponds to the ordinate direction of the image coordinate system and has the same extent, and the ordinate of the distribution diagram is the energy value of the corresponding pixel row in the image. From the correspondence between the image and the energy distribution diagrams, the abscissa of P_f is the midpoint between the abscissa of the initial value of the horizontal energy distribution, x_h^start, and the abscissa of its end value, x_h^end, i.e.:

x_{P_f} = (x_h^start + x_h^end) / 2

The ordinate of P_f is the abscissa of the end value of the vertical energy distribution diagram, y_v^end, namely:

y_{P_f} = y_v^end
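As an illustration of this energy-filtering step, the following is a minimal sketch, assuming the foot region is already available as a NumPy binary mask; the function name, the array conventions and the midpoint/end-point rule follow the description above but are otherwise illustrative and not prescribed by the patent.

```python
import numpy as np

def foot_point_from_energy(binary_mask):
    """Locate the foot reference point P_f from a binary foot-region mask.

    The horizontal/vertical 'energy' curves are the counts of non-zero
    pixels per column and per row, as described in step 3."""
    col_energy = np.count_nonzero(binary_mask, axis=0)   # horizontal direction
    row_energy = np.count_nonzero(binary_mask, axis=1)   # vertical direction

    cols = np.flatnonzero(col_energy)   # columns where the distribution is non-zero
    rows = np.flatnonzero(row_energy)   # rows where the distribution is non-zero
    if cols.size == 0 or rows.size == 0:
        return None                     # empty mask, no foot region found

    x_pf = (cols[0] + cols[-1]) // 2    # midpoint of first and last non-zero column
    y_pf = rows[-1]                     # end value of the vertical energy curve
    return x_pf, y_pf
```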
further, the method for acquiring the head pixel region comprises the following steps:
(1) cluster initialization: in the CIELAB color space, input the number k of superpixels to be generated and determine a uniform grid spacing s from the number n of pixel points in the processing region, so that the resulting superpixel blocks have the same size;
wherein:

s = √(n / k)

A five-dimensional cluster-center vector is defined in the CIELAB color space from the pixel color and position information:

C_k = [l_k, a_k, b_k, x_k, y_k]^T    (1)

where l, a, b denote color information and x, y denote spatial information; l_k is the color lightness of the center point, a_k its position between red/magenta and green, b_k its position between yellow and blue, x_k its distance from the x-axis, and y_k its distance from the y-axis;
(2) pixel distance calculation: a distance index D is defined to represent the relationship between pixel i and the cluster center C_k; it is judged jointly from the color distance and the spatial distance, and the contribution of each to D is controlled by the weight m:

d_c = √((l_i − l_k)² + (a_i − a_k)² + (b_i − b_k)²)

d_s = √((x_i − x_k)² + (y_i − y_k)²)

D = √(d_c² + (d_s / s)² · m²)

where d_c denotes the color distance, d_s the spatial distance, and m the distance-adjustment weight coefficient: when m is smaller the color distance carries more weight, the superpixels adhere more closely to the target edges, and the regularity of superpixel shape and size decreases; when m is larger the spatial distance carries more weight and the resulting superpixel blocks are more regular;
(3) pixel allocation: each pixel i is assigned to the corresponding superpixel block according to its distance from the cluster center, the search region around each center being twice the superpixel size;
(4) cluster center update: after the pixels are assigned to cluster centers, the cluster centers are re-determined from the color and position information of their pixels; the residual between the updated centers and the previous cluster centers is computed with the spatial two-norm, and the update is repeated until this error converges, at which point the superpixel blocks are fixed;
(5) post-processing: after clustering, some pixels do not belong to any superpixel block; these isolated pixels are reassigned with a connected-component algorithm (a brief sketch of this clustering procedure follows the list).
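The sketch below illustrates the distance index D of step (2) and shows how an off-the-shelf SLIC implementation could be applied to the cropped region; the scikit-image call, the file name and the parameter values are assumptions for illustration and are not prescribed by the patent.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic

def slic_distance(pixel, center, s, m):
    """Distance index D between a (l, a, b, x, y) pixel vector and a cluster
    center: small m favours the color distance (better edge adherence),
    large m favours the spatial distance (more regular superpixels)."""
    d_c = np.linalg.norm(pixel[:3] - center[:3])   # color distance in CIELAB
    d_s = np.linalg.norm(pixel[3:] - center[3:])   # spatial distance
    return np.sqrt(d_c ** 2 + (d_s / s) ** 2 * m ** 2)

# The full assign / update / post-process iteration of steps (3)-(5) can be
# delegated to an existing SLIC implementation, for example:
region = io.imread("target_region.png")            # hypothetical cropped head/foot region
labels = slic(region, n_segments=100, compactness=10.0, start_label=1)
```

The resulting label map can then be binarized into target and background regions, from which the energy curves described above are computed.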
The upright pedestrian foot component acquisition method is based on intelligent image analysis: the pedestrian foot component area is determined through HOG features, the target area is further processed with a superpixel algorithm to remove redundant information, and the area where the target is located is extracted. To further mine the image information and extract the required features, the binary image of the target area is processed with the energy filtering algorithm to obtain horizontal and vertical energy statistical curves, from which the feature point, i.e. the position of the pedestrian's foot, is obtained.
The pedestrian foot component acquisition method provided by the invention serves a distance estimation algorithm based on pedestrian body components and provides effective data support for it. The pedestrian's own body components are used as the information source, no external hardware intervention is needed, and this convenience makes the method suitable for pedestrian distance estimation in complex environments. The invention is based on computer vision and uses only a monocular camera as the image acquisition hardware, so its hardware requirements are low. The pedestrian foot features are obtained by deep mining of the image information, which supports the subsequent stages of the algorithm.
Compared with other body parts, the feet are in contact with the ground and their position is relatively fixed, which is an advantage in terms of acquisition complexity. Meanwhile, the ground color is relatively uniform, which reduces the burden of the image segmentation process.
1) Pedestrian body multi-component determination: the multiple pedestrian body components are obtained through HOG features, and the foot component is identified from its inherent characteristics using an optical flow method and the energy filtering curve. The HOG feature is widely used as the main feature in pedestrian detection; the algorithm is relatively mature and has advantages in detection accuracy and timeliness. The invention only acquires the foot component information and extracts those features; the multiple components obtained from HOG are processed in parallel with a basic image segmentation algorithm, the optical flow method, which preserves real-time performance. The foot component region is obtained by analyzing the component map produced by the optical flow processing. Because mainstream basic algorithms are used to obtain the foot component area, both the stability of the method and the real-time performance of the subsequent distance estimation are ensured.
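For reference, the HOG pedestrian detection step can be reproduced with OpenCV's built-in HOG person detector; this is a tooling assumption for illustration, the patent does not prescribe a particular implementation, and the file name below is hypothetical.

```python
import cv2

# OpenCV's default HOG + linear SVM people detector; each returned box is a
# pedestrian calibration frame from which head and foot sub-regions can be cropped.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("front_view.jpg")  # hypothetical monocular camera frame
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
```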
2) The superpixel algorithm belongs to the field of image segmentation and processes the pixels within the foot component area. Because the superpixel algorithm works at the pixel level, it guarantees accuracy at the cost of high complexity and reduced real-time performance; therefore only the pixels inside the foot component area are processed, which preserves both accuracy and speed. The clustering-based superpixel segmentation algorithm computes the feature values of pixel points and groups pixels with similar characteristics into a superpixel block. The weights governing the size and range of the feature blocks can be adjusted manually, which keeps the superpixel algorithm effective in different scenarios.
3) The energy filtering algorithm extracts feature points from the filtering curve, thereby mining the image information in depth. The algorithm mainly accumulates the non-zero pixel points to obtain a filtering curve; by analyzing the image together with the filtering curve, the pedestrian foot position feature is obtained at a specific pixel location. Compared with other feature extraction methods, it is easy to implement and fast to compute. Since the method is mainly applied to pedestrian distance acquisition, where real-time performance requirements are high, obtaining the pedestrian foot position with a fast and stable method is a necessary requirement. The energy filtering algorithm only accumulates pixel points and extracts the foot position pixels by the stated rule, ensuring both real-time performance and stability.
Example 6:
for the solutions of embodiments 3, 4 or 5, the multi-time-scale detection and tracking in step s2, which determines the positions of the target pedestrian in the infrared and visible-light front-view images, may also use the method of this embodiment: a target tracking method with a sample-selectable update mechanism is used to determine the positions of the target pedestrian in the infrared and visible-light front-view images. The tracking method comprises:
firstly, acquiring a video starting frame and initializing a tracker;
secondly, tracking the target by using a filtering tracking method for the next frame and returning a tracking result;
thirdly, analyzing the tracking result with the image-feature forgetting method: extracting the image features of the target area, comparing them with the reference image, and forgetting tracking results whose difference is too large;
and fourthly, verifying the tracking results forgotten in the third step with the energy-significance memory method: extracting the gradient energy of the target area of each forgotten result for significance analysis, memorizing tracking results that contain the target back into the sample library, maintaining the forgetting operation for results that do not contain the tracking target, and returning to the second step or ending the tracking.
Further, if the third step yields no tracking result with an overlarge difference, return to the second step or end the tracking (a schematic sketch of this loop is given below).
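The following is a schematic sketch of the four-step loop above, assuming caller-supplied tracker, feature-extraction and HOG-energy functions; all names and thresholds are placeholders meant only to show the control flow of the forgetting and re-memorizing mechanism, not an implementation fixed by the patent.

```python
import numpy as np

def track_sequence(frames, tracker, features, hog_energy, delta, energy_thresh):
    """Schematic loop for the sample-selectable update mechanism."""
    training_set = [frames[0]]                   # step 1: initialize on the start frame
    tracker.init(frames[0])
    results = []
    for frame in frames[1:]:
        result = tracker.track(frame)            # step 2: filtering-based tracking
        results.append(result)
        # Step 3: image-feature forgetting (normalized Manhattan distance to reference).
        f_ref, f_new = features(training_set[-1]), features(result)
        dist = np.abs(f_ref - f_new).sum() / f_ref.size
        if dist <= delta:
            training_set.append(result)          # small change: memorize into the sample library
            continue
        # Step 4: energy-significance check on the forgotten result.
        mean_energy = np.mean([hog_energy(s) for s in training_set])
        if abs(hog_energy(result) - mean_energy) > energy_thresh:
            training_set.append(result)          # re-memorize: target likely still present
        # otherwise the forgetting operation is maintained
    return results
```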
Further, the third step is: respectively extracting HOG and CNN characteristics of target areas of a kth frame image and a (k + 1) th frame image by taking the kth frame image as a reference image, and then calculating Manhattan distance values of the image characteristics of the two areas as image distance values of a tracking result of the (k + 1) th frame;
let the k frame image of the video be characterized by Jk(x) The k +1 th frame image is characterized by Jk+1(x) Then, calculating an image distance value of the k +1 frame image by using the formula (2);
Figure BDA0002304803250000181
distk+1>δ,δ∈(0,1) (3)
where δ is an upper limit value of the overlap determination failure sample, distk+1Is the image distance value of the k +1 th frame image, n is the number of elements in the feature map, Jk(x)iAnd Jk+1(x)iThe ith element in the image characteristics of the 1 st frame and the (k + 1) th frame respectively;
and if the image characteristic distance of the (k + 1) th frame is greater than delta, judging the tracking result of the (k + 1) th frame as a tracking result needing to be forgotten, and if the image characteristic distance is less than delta, memorizing the tracking result into a training set and jumping to the fifth step.
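As an illustration of this forgetting criterion, the sketch below computes a normalized Manhattan distance between HOG features of two target-area crops; it covers only the HOG part of the feature comparison, and the HOG parameters and the helper name are assumptions rather than values taken from the patent.

```python
import numpy as np
from skimage.feature import hog

def image_distance(region_k, region_k1):
    """Normalized Manhattan distance (formula (2)) between HOG features of two
    same-sized grayscale target-area crops."""
    j_k = hog(region_k, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    j_k1 = hog(region_k1, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.abs(j_k - j_k1).sum() / j_k.size

# Forgetting rule of formula (3): forget the frame-(k+1) result when the
# distance exceeds the upper limit delta in (0, 1); otherwise memorize it.
# forget = image_distance(crop_k, crop_k1) > delta
```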
Further, the fourth step is as follows: extract the HOG energy of the target area from the (k+1)th frame image of the latest tracking result, extract the HOG energy of all images in the training set and compute its mean as the reference, and take the change of the (k+1)th frame HOG energy relative to this mean as the energy significance value of the (k+1)th frame image;
let H_{k+1} be the HOG energy value of the (k+1)th frame image and H_x the set of HOG energies of all images in the training set; then

ener_{k+1} = | H_{k+1} − (1/m) · Σ_{i=1}^{m} H_x(i) |    (7)

Formula (7) is the energy significance calculation; ener_{k+1} is the energy significance value of the (k+1)th frame image, m is the number of images in the training set, and H_x(i) is the HOG energy of the ith image in the training set;
if the energy significance value of the (k+1)th frame image satisfies formula (8), the (k+1)th frame image is memorized into the training set again; otherwise the forgetting operation on the (k+1)th frame image is maintained;
ener_{k+1} > ε    (8)

where ε is the preset significance threshold.
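A small sketch of this energy-significance check follows; it assumes the per-image HOG energy values are already available as scalars, and the threshold handling mirrors the reconstructed condition (8) above rather than an exact formula from the patent.

```python
import numpy as np

def energy_significance(h_k1, training_energies):
    """Energy significance of frame k+1 (formula (7)): absolute change of its
    HOG energy relative to the mean HOG energy of the training set."""
    return abs(h_k1 - float(np.mean(training_energies)))

def should_rememorize(h_k1, training_energies, threshold):
    """Re-memorize a forgotten result when its energy change is significant,
    i.e. when the significance value exceeds the (assumed) threshold."""
    return energy_significance(h_k1, training_energies) > threshold
```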
in one scheme, to solve the problem of re-memorizing valid samples among the forgotten results, a method is provided in which the gradient energy of the target area of each forgotten result is extracted for significance analysis, and the tracking results that contain the target are memorized into the sample library again, as detailed in the fourth step above.
Further, HOG energy of a target area is extracted from the (k + 1) th frame of image of the latest target tracking result, HOG energy of all images in a training set is extracted and the mean value of the HOG energy is calculated to be used as comparison, and the HOG energy change value of the (k + 1) th frame of image is calculated to be used as the energy significant value of the (k + 1) th frame of image;
let Hk+1HOG energy value, H, for the k +1 frame imagexFor the HOG energy set of all images in the training set, then
Figure BDA0002304803250000184
Equation (7) is the energy significance calculation equation, enerk+1Is the energy significance value of the k +1 th frame image, m represents the number of images in the training set, Hx(i) HOG energy representing the ith image in the training set;
if the energy significance value of the (k + 1) th frame image meets the formula (8), the (k + 1) th frame image is memorized into the training set again, and if the energy significance value of the (k + 1) th frame image does not meet the formula, the forgetting operation of the (k + 1) th frame image is maintained;
Figure BDA0002304803250000191
(1) The target tracking algorithm with the sample-selectable update mechanism keeps good adaptability in complex conditions such as target occlusion, severe light changes and target deformation, so the target tracking method can be applied in more real scenes and can provide more reliable target position information for subsequent judgments such as pedestrian intention analysis;
(2) the image-feature forgetting method can screen the tracking results and forget those that differ too much from the reference image; it is applicable to all discriminative-model target tracking methods, can prevent the feature information of occluding objects from polluting the training set, and improves the adaptability of the target tracking method to target occlusion;
(3) the energy-significance memory method can verify the forgetting results of the image-feature forgetting method; it mainly re-memorizes target feature information that changes greatly in complex conditions such as light changes and target deformation;
(4) the invention can provide more accurate road condition information for mobile robots, autonomous vehicles and driver assistance systems, and can play an important role in obstacle avoidance and path planning for industrial robots or autonomous vehicles, and for service robots providing guidance to a specific person.

Claims (7)

1. A monocular camera upright pedestrian head pixel height acquisition method is characterized by comprising the following steps:
s1, acquiring a pedestrian image with a monocular camera positioned in front of the vehicle, acquiring a pedestrian head calibration frame through HOG features, acquiring the pedestrian head pixel region through a superpixel algorithm, and obtaining the target binary image;
and S2, establishing a vertical energy distribution map to obtain the pixel height of the head of the pedestrian.
2. The monocular camera upright pedestrian head pixel height acquisition method according to claim 1, characterized in that: the method of step S2 is:
acquiring a vertical energy distribution map of the target binary image by using an energy filtering algorithm, wherein in the vertical energy distribution map, the abscissa direction is the ordinate direction in an image coordinate system, the size of the abscissa direction is the same as the size of the ordinate in the image coordinate system, and the ordinate of the energy distribution map is the energy value size of a corresponding pixel row in the image;
determining the ordinate P_t of the pixel position of the pedestrian head top from the correspondence between the head target area image and the vertical energy distribution diagram, the head top position being the initial value of the energy curve; a valley exists where the bottom of the pedestrian's head joins the other body parts, and the ordinate P_b of the pixel position of the head bottom is determined from this valley;
The pedestrian head pixel height is represented by:
H_h = P_b − P_t
3. the monocular camera upright pedestrian head pixel height acquisition method according to claim 2, characterized in that: further comprising:
s3, acquiring the pedestrian body-height pixel value, and verifying the head pixel height result against the pedestrian head-to-body pixel ratio;
and S4, outputting the pedestrian head pixel height if it passes the verification; if the verification fails, outputting the final pedestrian head pixel height computed from the acquired body-height pixel value and the fixed ratio.
4. The monocular camera upright pedestrian head pixel height acquisition method according to claim 3, characterized in that: the method for obtaining the body height pixel comprises the steps of calculating the pixel in a pedestrian frame by utilizing a superpixel segmentation algorithm, segmenting out a pedestrian and a background region, removing redundant information, operating a segmentation result by utilizing an energy filtering algorithm to obtain a vertical energy statistical result, and analyzing the result to obtain the body height pixel.
5. The monocular camera upright pedestrian head pixel height acquisition method according to claim 2, characterized in that:
in the extraction of the head pixel region, the pedestrian head pixel height is determined by combining a coarse-grained estimate and a fine-grained estimate of the head height, as follows:
the coarse-grained estimate derives the head height from a fixed ratio between head height and body height: the ratio r_hb of the head height H_re_head to the body height H_body is determined by simulation experiments,
and the head height is

H_re_head = H_body × r_hb    (2)

the fine-grained method obtains the head pixel height H_head with the superpixel algorithm; this height should lie within a proportional range around the coarse estimate, the reference range being:

H_re_head × (1 − r_re) < H_head < H_re_head × (1 + r_re)    (3)

where r_re is a floating coefficient taking values between 0.2 and 0.3; if the head pixel height H_head obtained by the fine-grained method lies within the reference range, H_head is output as the head height; otherwise the head pixel height obtained by the superpixel method is judged invalid and H_re_head is output.
6. The monocular camera upright pedestrian head pixel height acquisition method according to claim 1, characterized in that:
the method for acquiring the head pixel region comprises the following steps:
(1) cluster initialization: in the CIELAB color space, input the number k of superpixels to be generated and determine a uniform grid spacing s from the number n of pixel points in the processing region, so that the resulting superpixel blocks have the same size;
wherein:

s = √(n / k)

the five-dimensional cluster-center vector is defined in the CIELAB color space from the pixel color and position information:

C_k = [l_k, a_k, b_k, x_k, y_k]^T    (1)

where l, a, b denote color information and x, y denote spatial information; l_k is the color lightness of the center point, a_k its position between red/magenta and green, b_k its position between yellow and blue, x_k its distance from the x-axis, and y_k its distance from the y-axis;
(2) pixel distance calculation: a distance index D is defined to represent the relationship between pixel i and the cluster center C_k; it is judged jointly from the color distance and the spatial distance, and the contribution of each to D is controlled by the weight m:

d_c = √((l_i − l_k)² + (a_i − a_k)² + (b_i − b_k)²)

d_s = √((x_i − x_k)² + (y_i − y_k)²)

D = √(d_c² + (d_s / s)² · m²)

where d_c denotes the color distance, d_s the spatial distance, and m the distance-adjustment weight coefficient: when m is smaller the color distance carries more weight, the superpixels adhere more closely to the target edges, and the regularity of superpixel shape and size decreases; when m is larger the spatial distance carries more weight and the resulting superpixel blocks are more regular;
(3) pixel allocation: each pixel i is assigned to the corresponding superpixel block according to its distance from the cluster center, the search region around each center being twice the superpixel size;
(4) cluster center update: after the pixels are assigned to cluster centers, the cluster centers are re-determined from the color and position information of their pixels; the residual between the updated centers and the previous cluster centers is computed with the spatial two-norm, and the update is repeated until this error converges, at which point the superpixel blocks are fixed;
(5) post-processing: after clustering, some pixels do not belong to any superpixel block; these isolated pixels are reassigned with a connected-component algorithm.
7. An application of a monocular camera upright pedestrian head pixel height acquisition method in continuous distance estimation of infrared-visible light binocular pedestrian body multi-component fusion.
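To make the head-height extraction and verification of claims 1–5 concrete, the following is a minimal sketch assuming the head pixel region is already available as a binary mask and the body-height pixel value has been obtained; the valley-detection rule, the default ratio values and all names are illustrative assumptions rather than values fixed by the claims.

```python
import numpy as np

def head_pixel_height(head_mask, body_height_px, r_hb=0.15, r_re=0.25):
    """Head pixel height from a binary head-region mask (claims 1-2), verified
    against the body-height pixel value via a fixed head/body ratio (claims 3-5)."""
    # Vertical energy distribution: non-zero pixel count of every image row.
    energy = np.count_nonzero(head_mask, axis=1)
    rows = np.flatnonzero(energy)
    if rows.size == 0:
        return None
    p_t = rows[0]                          # head top P_t: first non-zero row
    # Head bottom P_b: the valley where the head joins the body, approximated
    # here by the first local minimum of the energy curve below the head top.
    segment = energy[p_t:rows[-1] + 1]
    valleys = [i for i in range(1, len(segment) - 1)
               if segment[i] < segment[i - 1] and segment[i] <= segment[i + 1]]
    p_b = p_t + (valleys[0] if valleys else len(segment) - 1)
    h_head = p_b - p_t                     # H_h = P_b - P_t

    # Coarse-grained reference (claim 5) and verification range.
    h_ref = body_height_px * r_hb
    if h_ref * (1 - r_re) < h_head < h_ref * (1 + r_re):
        return h_head                      # fine-grained result passes verification
    return h_ref                           # otherwise fall back to the coarse estimate
```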
CN201911235573.1A 2019-12-05 2019-12-05 Method for acquiring pixel height of head of upright pedestrian of monocular camera Active CN111145211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911235573.1A CN111145211B (en) 2019-12-05 2019-12-05 Method for acquiring pixel height of head of upright pedestrian of monocular camera


Publications (2)

Publication Number Publication Date
CN111145211A true CN111145211A (en) 2020-05-12
CN111145211B CN111145211B (en) 2023-06-30

Family

ID=70517674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911235573.1A Active CN111145211B (en) 2019-12-05 2019-12-05 Method for acquiring pixel height of head of upright pedestrian of monocular camera

Country Status (1)

Country Link
CN (1) CN111145211B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738088A (en) * 2020-05-25 2020-10-02 西安交通大学 Pedestrian distance prediction method based on monocular camera
CN111923042A (en) * 2020-07-21 2020-11-13 北京全路通信信号研究设计院集团有限公司 Virtualization processing method and system for cabinet grid and inspection robot
CN114522410A (en) * 2022-02-14 2022-05-24 复旦大学 Badminton net passing height detection method


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164858A (en) * 2013-03-20 2013-06-19 浙江大学 Adhered crowd segmenting and tracking methods based on superpixel and graph model
WO2015184764A1 (en) * 2014-11-17 2015-12-10 中兴通讯股份有限公司 Pedestrian detection method and device
JP2019505866A (en) * 2016-12-27 2019-02-28 シェンチェン ユニバーシティー Passerby head identification method and system
CN106951872A (en) * 2017-03-24 2017-07-14 江苏大学 A kind of recognition methods again of the pedestrian based on unsupervised depth model and hierarchy attributes
CN107341947A (en) * 2017-07-21 2017-11-10 华南理工大学 A kind of fire-alarm and fire alarm method based on thermal infrared imager
CN109920001A (en) * 2019-03-14 2019-06-21 大连民族大学 Method for estimating distance based on pedestrian head height

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENG Fan; MAO Lin; YANG Dawei: "Monocular vision pedestrian body multi-component distance fusion estimation algorithm", Journal of Dalian Minzu University, no. 05 *


Also Published As

Publication number Publication date
CN111145211B (en) 2023-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant