CN111368630B - Foot part acquisition method of pedestrian image based on energy filtering - Google Patents
- Publication number
- CN111368630B (application CN201911236896.2A)
- Authority
- CN
- China
- Prior art keywords
- pixel
- distance
- pedestrian
- image
- foot
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
A foot part acquisition method for pedestrian images based on energy filtering, belonging to the field of image-information understanding in computer-vision applications. To address the problem that distance estimation based on body-part information cannot accurately locate the pixel position of a pedestrian's feet, the method first obtains the extent of the target region from HOG features; a superpixel algorithm then further processes the pixels inside the target region and removes redundant information; the image is binarized and an energy-filtering curve is obtained by the energy-filtering method; finally, a target point determined from the curve by experimental analysis is mapped back into the original image, completing the acquisition of the pedestrian's foot position.
Description
Technical Field
The invention belongs to the field of intelligent image vision, and in particular relates to a method that acquires an image with a monocular camera and processes a standing pedestrian in the image using image-processing techniques in order to obtain the position of the pedestrian's feet.
Background
With the continuing maturation of autonomous-driving technology, pedestrian avoidance has become one of the key topics of current research. The energy-filtering upright-pedestrian foot acquisition method proposed by the invention is applied in a traffic system to estimate the distance between pedestrian and vehicle, enabling the autonomous vehicle to avoid pedestrians effectively. Using the pedestrian's body parts as the basis for distance judgment requires no external hardware, a convenience that suits complex traffic environments. Moreover, unlike other body parts, the position of the feet does not change with body posture, which avoids invalid distance-estimation results caused by changes in the pedestrian's motion.
Numerous obstacle-avoidance methods exist for traffic systems. The patent "External rearview mirror obstacle avoidance system and method based on the front parking radar" (publication number CN109094504A) combines a radar controller, a body control system, and a sound system: the radar performs the ranging task, and when the measured distance between obstacle and vehicle body falls below the safety distance, the body control system folds the rearview mirror and the sound system raises an alarm, completing the avoidance. The patent "Automatic obstacle avoidance method and obstacle avoidance system of AGV" (publication number CN109085832A) completes the avoidance process with an obstacle-detection module, a distance-detection module, an alarm device, a control system, and an actuator; its distance-detection module uses ultrasonic ranging to acquire the obstacle distance quickly.
Disclosure of Invention
In order to solve the problem that foot-part information cannot be acquired accurately when the distance is estimated from the foot part, the invention provides an energy-filtering upright-pedestrian foot-part acquisition method. The invention aims to acquire the centre point between a pedestrian's two feet; the image is processed in two main stages: an upright-pedestrian foot-part acquisition criterion based on HOG features, and energy-filtering acquisition of the foot pixel position.
The technical scheme of the invention is as follows: a foot component acquisition method of pedestrian images based on energy filtering, comprising the steps of:
Step 1: acquire a pedestrian foot calibration frame through HOG features;
Step 2: obtain the region where the foot target is located through a superpixel algorithm;
Step 3: obtain the output foot coordinates through energy filtering.
An application of the foot part acquisition method of pedestrian images based on energy filtering in continuous distance estimation with infrared-visible binocular multi-part fusion of the pedestrian body.
The beneficial effects are as follows: the energy-filtering upright-pedestrian foot-part acquisition method is applied in the pedestrian distance-estimation module of a vehicle-mounted system. It takes the pedestrian's body parts as the information source for distance estimation and needs no additional external hardware, which keeps the method convenient and well suited to complex traffic systems. Pedestrians in traffic are not under objective control, so estimating distance from body parts risks invalid ranging results when the pedestrian's posture or motion changes. The feet, however, are in direct contact with the ground and, at a fixed distance, do not move with the pedestrian's pose, which guarantees the validity of the distance-estimation result.
Drawings
FIG. 1 is a schematic logic diagram of a method of upright pedestrian foot member acquisition;
FIG. 2 is a schematic diagram of pixel allocation;
FIG. 3 is a DPM acquisition pedestrian foot component diagram;
FIG. 4 is a schematic diagram of superpixel target area acquisition;
FIG. 5 is a schematic representation of energy filtering to obtain pedestrian foot position;
FIG. 6 is a schematic diagram of simulation example 1;
FIG. 7 is a schematic diagram of simulation example 2;
Detailed Description
The invention is described in further detail below with reference to the drawings. A schematic logic diagram of the upright-pedestrian foot-part acquisition method is shown in Fig. 1; the algorithm is implemented in the following steps:
Step 1: acquire a pedestrian foot calibration frame through HOG features;
Step 2: obtain the region where the target is located through a superpixel algorithm;
Step 3: obtain the specific position of the foot through energy filtering;
Step 4: output the foot coordinates.
From the above, the specific implementation method of the invention is as follows:
1. Upright-pedestrian foot part acquisition criterion based on HOG features
The invention detects pedestrians from HOG features and obtains a multi-part detection frame of the pedestrian's body. Since the goal is the position information of the pedestrian's feet, the problem to be solved is to select the foot region from the detected multiple parts.
1) Each pedestrian part frame is processed with an optical-flow method to obtain a target binary image.
2) Energy filtering is applied to the result of the previous step, yielding an energy-filtering curve in each of the x and y directions.
3) Since an upright pedestrian has two feet, the x-direction energy curve of the foot region should be bimodal. The foot part is selected from the multiple body parts according to this criterion, and the subsequent calculation is performed.
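The bimodality test in 3) can be sketched as follows. This is an illustrative check, assuming a simple local-maximum count on the column projection rather than the patent's exact criterion:

```python
import numpy as np

def count_peaks(curve):
    # Count local maxima of a projection curve; the foot box is the
    # body part whose x-direction energy curve is bimodal (two feet).
    peaks = 0
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]:
            peaks += 1
    return peaks

def is_foot_component(binary):
    # x-direction energy curve: non-zero pixels accumulated per column.
    x_curve = (binary > 0).sum(axis=0)
    return count_peaks(x_curve) == 2

feet = np.zeros((10, 20), dtype=np.uint8)
feet[:, 3:7] = 1      # left foot blob
feet[:, 13:17] = 1    # right foot blob
head = np.zeros((10, 20), dtype=np.uint8)
head[:, 6:14] = 1     # single compact blob
print(is_foot_component(feet), is_foot_component(head))   # True False
```

A part frame with a single compact blob (e.g. the head) yields a unimodal curve and is rejected.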
2. Super pixel pedestrian foot component pixel acquisition
The region containing the pedestrian's feet is obtained by the HOG-based acquisition criterion above. The superpixel algorithm is essentially a method that groups pixels that are close together and share similar characteristics. Using it helps remove redundant information: the effective foot information is extracted from the pedestrian foot detection frame, which simplifies the subsequent algorithm. When acquiring the foot region, the accurate division between target and non-target regions relies mainly on the edge-adherence of the superpixel algorithm, which achieves the goal of acquiring the pedestrian foot pixel region.
The superpixel algorithm mainly comprises the following steps:
(1) Cluster initialization
Input the number k of superpixels to be generated in the CIELAB colour space, and determine the grid spacing S = √(n/k) from the number n of pixels in the processing region, so that the resulting superpixel blocks have the same size. A five-dimensional vector is defined in CIELAB colour space from the colour and position of each cluster-centre pixel:
Ck = [lk, ak, bk, xk, yk]T (1)
where l, a, b carry the colour information and x, y the spatial information: lk is the brightness of the centre point, ak its position between red/magenta and green, bk its position between yellow and blue, and xk, yk are its spatial coordinates.
(2) Pixel distance calculation
The algorithm defines a distance index D that expresses the relation between a pixel i and the cluster centre Ck, judged jointly from the colour distance and the spatial distance, with a weight m controlling their contributions:
dc = √((li − lk)² + (ai − ak)² + (bi − bk)²)
ds = √((xi − xk)² + (yi − yk)²)
D = √(dc² + (ds/S)²·m²) (2)
where dc is the colour distance, ds the spatial distance, and m the distance-adjustment weight coefficient. When m is small, the colour distance carries more weight: the superpixels adhere more tightly to target edges, but the regularity of their shape and size decreases. When m is large, the spatial distance carries more weight, and the resulting superpixel blocks are more regular.
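The distance index of step (2) can be sketched as follows; the combination D = √(dc² + (ds/S)²·m²) follows the original SLIC formulation and is assumed here to match the patent's index:

```python
import numpy as np

S = 10.0   # grid spacing sqrt(n / k)
m = 10.0   # distance-adjustment weight: small m favours colour
           # adherence, large m favours compact, regular blocks

def slic_distance(pixel, center, S, m):
    # pixel / center: (l, a, b, x, y) five-dimensional vectors.
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_c = np.sqrt((l1 - l2)**2 + (a1 - a2)**2 + (b1 - b2)**2)
    d_s = np.sqrt((x1 - x2)**2 + (y1 - y2)**2)
    # Combined index: spatial term normalised by the grid spacing S
    # and weighted by m.
    return np.sqrt(d_c**2 + (d_s / S)**2 * m**2)

p = (50.0, 10.0, 10.0, 3.0, 4.0)
c = (52.0, 10.0, 10.0, 0.0, 0.0)
print(round(slic_distance(p, c, S, m), 3))   # sqrt(4 + 25) = 5.385
```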
(3) Pixel allocation
In the pixel-allocation process, each pixel i is assigned to the superpixel block whose cluster centre is nearest under the distance index D. When the superpixel area size is S×S, the corresponding search area size is 2S×2S.
(4) Cluster center update
After the pixels are assigned to cluster centres, each centre is recomputed from the colour and position information of its member pixels. The residual between the updated centre and the previous one is calculated as the spatial two-norm; the update is repeated until this error converges, at which point the superpixel blocks are fixed.
(5) Post-processing
Because the superpixel algorithm does not enforce connectivity explicitly, some pixels may belong to no superpixel block after clustering. To solve this, isolated pixels are reassigned using a connected-component algorithm.
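Steps (1)-(5) can be sketched as a minimal SLIC-style loop. This is an illustrative reimplementation on a grayscale image, not the patent's code; intensity stands in for the CIELAB colour term, and the 2S×2S search-window restriction is omitted for brevity:

```python
import numpy as np

def mini_slic(img, k, m=10.0, iters=10):
    # (1) Cluster initialization: centres on a regular grid of
    # spacing S = sqrt(n / k).
    h, w = img.shape
    S = int(np.sqrt(h * w / k))
    ys, xs = np.meshgrid(np.arange(S // 2, h, S),
                         np.arange(S // 2, w, S), indexing="ij")
    centers = np.stack([img[ys, xs].ravel().astype(float),
                        ys.ravel().astype(float),
                        xs.ravel().astype(float)], axis=1)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(iters):
        # (2)+(3) distance computation and pixel allocation
        best = np.full((h, w), np.inf)
        for idx, (cl, cy, cx) in enumerate(centers):
            d_c = np.abs(img - cl)                  # colour distance
            d_s = np.sqrt((yy - cy)**2 + (xx - cx)**2)
            D = np.sqrt(d_c**2 + (d_s / S)**2 * m**2)
            mask = D < best
            best[mask] = D[mask]
            labels[mask] = idx
        # (4) cluster-centre update with two-norm residual check
        new = centers.copy()
        for idx in range(len(centers)):
            sel = labels == idx
            if sel.any():
                new[idx] = (img[sel].mean(), yy[sel].mean(),
                            xx[sel].mean())
        if np.linalg.norm(new - centers) < 1e-3:    # converged
            break
        centers = new
    return labels

img = np.zeros((20, 20))
img[:, 10:] = 100.0        # two flat intensity regions
labels = mini_slic(img, k=4)
print(len(np.unique(labels)))
```

With m = 10 the colour term dominates here, so each of the four grid centres claims a quadrant that respects the intensity boundary.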
3. Energy filtering pedestrian foot position acquisition
The region where the target is located is determined by the superpixel algorithm. To extract effective information about the pedestrian's foot position, energy filtering is applied to this region; the centre point between the pedestrian's toes is taken as the specific corresponding point Pf of the foot position. The binary result of the target region is projected in the horizontal and vertical directions, and the energy feature, i.e. the count of non-zero pixels, is accumulated in each direction. By counting the pixels of the binary image, the energy-filtering algorithm mines the image information and completes the feature-acquisition process. The specific procedure is as follows:
1) The superpixel segmentation result, i.e. the region where the target is located, is binarized.
2) Energy filtering is applied to the binary image: non-zero pixels are accumulated to form the corresponding energy-filtering curves.
3) In the vertical energy-distribution diagram, the abscissa corresponds to the ordinate (row) axis of the image coordinate system and has the same extent, while the ordinate of the diagram is the energy value of the corresponding pixel row; the horizontal diagram is defined analogously over the columns. From the correspondence between the image and the two distributions, the abscissa of Pf is the midpoint between the start point xs and the end point xe of the horizontal energy distribution:
xPf = (xs + xe) / 2
and the ordinate of Pf is the end point ye of the vertical energy distribution:
yPf = ye
as shown in Fig. 5.
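A minimal sketch of the Pf computation, assuming the projections and end-point rules described above:

```python
import numpy as np

def foot_point(binary):
    # Horizontal energy curve: non-zero pixels per column;
    # vertical energy curve: non-zero pixels per row.
    horiz = (binary > 0).sum(axis=0)
    vert = (binary > 0).sum(axis=1)
    xs = np.nonzero(horiz)[0]
    ys = np.nonzero(vert)[0]
    # Abscissa of P_f: midpoint of the start/end of the horizontal
    # support; ordinate: last non-empty row (feet meet the ground).
    return float(xs[0] + xs[-1]) / 2.0, int(ys[-1])

binary = np.zeros((30, 40), dtype=np.uint8)
binary[20:28, 8:14] = 1     # left foot blob
binary[20:28, 26:32] = 1    # right foot blob
x_f, y_f = foot_point(binary)
print(x_f, y_f)             # 19.5 27
```

The returned point lies between the two blobs horizontally and on their lowest row, matching the centre-between-toes definition of Pf.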
The upright-pedestrian foot-part acquisition method is based on intelligent image analysis: the foot-part region is determined through HOG features, the target region is then refined with the superpixel algorithm, and redundant information is removed to isolate the region where the target is located. To further extract the required features from the image, the binary image of the target region is processed with the energy-filtering algorithm, yielding horizontal and vertical energy-statistics curves, from which the feature point, i.e. the position of the pedestrian foot part, is obtained.
The upright-pedestrian foot-part acquisition method serves a distance-estimation algorithm based on the pedestrian's body parts and provides effective data support for the foot part. With the body parts as the information source, no external hardware intervention is required, a convenience suited to pedestrian distance estimation in complex environments. Being based on computer vision, the invention uses only a monocular camera as the image-acquisition hardware, keeping hardware requirements low. The pedestrian foot features are acquired by deep mining of the image information, completing the input to the subsequent algorithm.
The foot parts are in contact with the ground and their position is relatively fixed, which gives them an advantage in acquisition complexity over other body parts. Meanwhile, the colour features of the ground are relatively uniform, which reduces the burden of the image-segmentation step.
1) The multiple body parts of the pedestrian are obtained through HOG features, and the foot part is identified by its inherent characteristics using an optical-flow method and the energy-filtering curve. The HOG feature is widely used as the mainstream feature in pedestrian detection; the algorithm is mature and has advantages in detection accuracy and timeliness. Since the invention needs only the foot-part information, the multiple parts returned by the HOG detector are processed in parallel with a basic image-segmentation algorithm, the optical-flow method, which preserves the real-time performance of the method. The part images processed by the optical-flow method are then analyzed to obtain the foot-part region. Using mainstream basic algorithms in this stage guarantees both the stability of the method and the real-time performance of the distance-estimation process.
2) The superpixel algorithm belongs to the field of image segmentation and processes the pixels inside the foot-part region. Because it works at the pixel level, it is accurate but also complex and slow; therefore only the pixels inside the foot-part region are processed, which preserves both accuracy and speed. The clustering-based superpixel segmentation computes feature values of the pixel points and groups pixels with similar features into superpixel blocks. The size, range, and other weights of the feature blocks can be tuned manually, which keeps the superpixel algorithm effective across different scenarios.
3) The energy-filtering algorithm extracts feature points from the filtering curves, achieving deep mining of the image information. It mainly accumulates non-zero pixels to obtain the filtering curves; by analyzing the image together with the curves, the feature of the pedestrian foot point is obtained and its specific pixel position determined. Compared with other feature-extraction methods, this one is easy to implement and fast to compute. Since the invention is mainly applied to pedestrian distance acquisition, which demands real-time operation, a fast and stable method for acquiring the pedestrian foot position is a necessity. The energy-filtering algorithm only performs accumulation over pixels and extracts the foot-position pixels by fixed rules, guaranteeing both real-time performance and stability.
Example 1:
In this embodiment, the experimental images were acquired with a camera (480×640@30 Hz) on a winter street; each image contains one target pedestrian. The foot position of the target pedestrian is estimated in pixels by the present method; the estimate and its error are shown in Fig. 6, and the final error does not exceed 3 pixels.
Example 2:
In this embodiment, the experimental images were acquired with a camera (480×640@30 Hz) on a winter street; each image contains one target pedestrian. The foot position of the target pedestrian is estimated in pixels by the present method; the estimate and its error are shown in Fig. 7, and the final error does not exceed 2 pixels.
Example 3:
In order to accurately estimate the distance between a person and the vehicle from the front-view image, this embodiment provides the following scheme: a continuous distance-estimation method with infrared-visible binocular multi-part fusion of the pedestrian body, comprising the following steps:
S1: capture the same front scene with an infrared-visible binocular camera to obtain an infrared front image and a visible-light front image;
S2: detect and track at multiple time scales, and determine the position of the target pedestrian in both the infrared and the visible-light front image;
S3: acquire the pedestrian head heights in the two images, calculate the head-part distance estimates, and calculate the foot-part distance estimates;
S4: perform first-level fusion of the distance estimates of the different body parts of the pedestrian, then second-level fusion of the estimated distances output from the visible-light and infrared images, completing the cascaded fusion of head-part and foot-part distances and determining the distance between the pedestrian and the front of the vehicle.
Furthermore, the continuous distance-estimation method further comprises S5: track and check the distance output, and output the verified distance.
Further, in step S3 the head pixel height in the two images is obtained by judging the pedestrian head height at coarse and fine granularity:
Coarse-grained estimation uses a fixed ratio between head height and body height: the ratio r_hb of head height H_re_head to body height H_body is determined from simulation examples, and the head height is
H_re_head = H_body × r_hb
The fine-grained method obtains the head pixel height with a superpixel algorithm. The fine result must lie within a floating band around the coarse one, with reference range:
H_re_head × (1 − r_re) < H_head < H_re_head × (1 + r_re)
where r_re is the floating coefficient, kept between 0.2 and 0.3. When the fine-grained head pixel height H_head lies in the reference range, H_head is output as the head height; otherwise the superpixel head height is judged invalid and H_re_head is output.
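A small sketch of the coarse-to-fine head-height check; the ratio r_hb = 0.125 is only a placeholder, since the patent fixes it from simulation examples:

```python
def head_height(h_body, h_fine, r_hb=0.125, r_re=0.25):
    # r_hb: head/body ratio (placeholder value); r_re: floating
    # coefficient, kept in [0.2, 0.3] per the text.
    h_coarse = h_body * r_hb                     # coarse estimate
    low, high = h_coarse * (1 - r_re), h_coarse * (1 + r_re)
    # Accept the fine-grained (superpixel) height only inside the
    # reference band; otherwise fall back to the coarse estimate.
    return h_fine if low < h_fine < high else h_coarse

print(head_height(200, 28))   # inside (18.75, 31.25): fine wins -> 28
print(head_height(200, 60))   # outside the band: coarse -> 25.0
```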
Further, the method for detecting and tracking the multiple time scales in the step S2 is as follows:
(1) Setting a certain frame in a video sequence as a first frame, and actively marking pedestrian information of the frame;
(2) Continuously tracking pedestrians by using a KCF algorithm according to the labeling content of the first frame;
(3) After tracking for m frames, the tracking result is taken as input: HOG features are extracted and a pedestrian-detection model is trained online with an SVM classifier; the images of the video sequence are then detected, and a positive detection result verifies the tracking result. Detection is performed once every m frames to correct the tracking, n times in total, so the detected frame number k is:
k=1+m×n,n∈Z
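The verification schedule k = 1 + m×n can be sketched as:

```python
def detection_frames(m, n_max):
    # Frames at which the tracker's output is verified by the HOG+SVM
    # detector: the first frame is hand-labelled, then every m frames
    # a detection pass corrects the KCF tracking result.
    return [1 + m * n for n in range(n_max + 1)]

print(detection_frames(m=5, n_max=4))   # [1, 6, 11, 16, 21]
```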
Further, in step S4 the first-level fusion of the distance estimates of the different body parts is performed as follows. The pedestrian to be estimated stands at several different positions at the same known distance from the front of the vehicle; at each position a head-part and a foot-part distance estimate is made. When all positions have been processed, x_1 denotes the set of head-part distance estimates with mean x̄_1 and standard deviation σ_1, and x_2 the set of foot-part estimates with mean x̄_2 and standard deviation σ_2. The weight of the head-part estimate is p_1 and that of the foot-part estimate is p_2; the fusion weights are taken inversely proportional to the scatter of each set:
p_1 = (1/σ_1) / (1/σ_1 + 1/σ_2), p_2 = (1/σ_2) / (1/σ_1 + 1/σ_2)
Further, for a given actual distance measurement, if the head-part distance estimate is D_A and the foot-part distance estimate is D_B, the fused distance estimate D_1 is:
D1=p1DA+p2DB
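A sketch of the first-level fusion; since the exact weight formula is not legible in this text, weights inversely proportional to each set's standard deviation are an assumption, and all numbers are made up for illustration:

```python
import statistics

def fusion_weights(x1, x2):
    # Assumption: each part's weight is inversely proportional to the
    # standard deviation of its calibration estimates (less scatter
    # means more trust); the two weights sum to 1.
    s1, s2 = statistics.pstdev(x1), statistics.pstdev(x2)
    p1 = (1 / s1) / (1 / s1 + 1 / s2)
    return p1, 1 - p1

x1 = [10.1, 9.9, 10.0, 10.2]   # head-part estimates at a known 10 m
x2 = [10.5, 9.4, 10.3, 9.8]    # foot-part estimates, more scatter
p1, p2 = fusion_weights(x1, x2)
D_A, D_B = 12.1, 12.6          # estimates for a new measurement
D1 = p1 * D_A + p2 * D_B       # first-level fused distance
print(p1 > p2, round(D1, 2))   # True 12.2
```

The tighter head-part set receives the larger weight, pulling D_1 toward D_A.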
Further, in step S4 the second-level fusion of the estimated distances output from the visible-light and infrared images is performed as follows:
Acquire the infrared front-image distance-estimate set: with the head-part estimate set x_1 and foot-part estimate set x_2 obtained from the infrared front image, let D_H be the head-part set and D_F the foot-part set of the infrared image; the infrared distance-estimate set D_V is then:
D_V = p_1·D_H + p_2·D_F
Acquire the visible-light front-image distance-estimate set: with the head-part set x_1 and foot-part set x_2 obtained from the visible-light front image, let D_G be the head-part set and D_K the foot-part set; the visible-light distance-estimate set D_I is then:
D_I = p_1·D_G + p_2·D_K
For the infrared distance-estimate set D_V with mean D̄_V and standard deviation σ_3, and the visible-light distance-estimate set D_I with mean D̄_I and standard deviation σ_4, the weight of the infrared result is p_3 and that of the visible-light result is p_4; the second-level fusion weights are again taken inversely proportional to the scatter:
p_3 = (1/σ_3) / (1/σ_3 + 1/σ_4), p_4 = (1/σ_4) / (1/σ_3 + 1/σ_4)
For a given actual distance measurement, the infrared front-image distance estimate D_C and the visible-light front-image distance estimate D_D give the distance estimate D_2:
D2=p3DC+p4DD。
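The two cascade levels compose as weighted sums; an illustrative sketch with made-up estimates and weights:

```python
def fuse(a, b, pa, pb):
    # Weighted fusion used at both cascade levels.
    return pa * a + pb * b

# Level 1 (per spectrum): head + foot estimates. All numbers and
# weights below are invented for illustration only.
D_V = fuse(12.1, 12.6, 0.8, 0.2)   # infrared-image estimate
D_I = fuse(12.3, 12.2, 0.8, 0.2)   # visible-light estimate
# Level 2: infrared + visible, weights p3/p4 from each spectrum's
# calibration scatter (fixed here for illustration).
D_2 = fuse(D_V, D_I, 0.45, 0.55)
print(round(D_2, 3))               # 12.244
```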
The infrared-visible binocular multi-part distance-fusion estimation method provided by the invention guarantees effective image acquisition under different visibility and illumination conditions; it solves the problem that distance estimation fails for some frames when the pedestrian position is acquired on a mobile platform, guaranteeing continuous and effective pedestrian distance estimation; it improves the head-height acquisition of existing body-part distance-estimation algorithms, raising the head-height accuracy; it widens the application range of the method, so that the distance can be estimated accurately in severe weather such as heavy fog, rain, and snow, or when body parts are partially occluded; and it completes a preceding/following-frame check of the distance estimate, guaranteeing the integrity of the algorithm.
The invention is based on intelligent imaging. An infrared-visible binocular camera acquires the images; during acquisition, a multi-time-scale detection-and-tracking method determines the pedestrian position; a coarse-to-fine head-height judgment improves the accuracy of the acquired head pixel height; and cascaded distance fusion produces an accurate distance estimate. A distance-estimation check is added to prevent failure cases and guarantee the completeness of the invention. (1) The invention needs only a monocular visible-light camera and an infrared camera to acquire the images to be detected; the hardware cost is low, the method is easy to popularize, and the binocular pair guarantees an effective image source. (2) To reduce runtime and improve the accuracy of pedestrian localization, a multi-time-scale detection-and-tracking scheme is adopted: the pedestrian position is maintained by continuous tracking with intermittent detection. (3) Acquiring information about pedestrian body parts is the key to distance estimation. The invention judges the pedestrian head at coarse and fine granularity: the coarse granularity derives the head height from body proportion, the fine granularity from image segmentation. Combining the two improves the accuracy of the acquired head height while avoiding failures of the fine-grained judgment. (4) The body-part distance estimates are processed by cascaded fusion of the head and foot parts: the first-level fusion suppresses the influence of partial occlusion of body parts on the distance-estimation result, and the second-level fusion widens the application range of the invention, adapting the distance estimation to different visibility environments. (5) Because the application environment of the algorithm is complex, occasional failures of the distance estimate are unavoidable; to prevent such problems, a tracking algorithm is added. Its prediction function estimates the distance of a failed frame from the preceding and following frames, guaranteeing the completeness of the invention. The invention meets multiple requirements of driver-assistance systems in autonomous driving and has strong popularization value.
The method comprises: first, an infrared-visible light binocular camera guarantees effective image acquisition, and the pedestrian position is determined by a continuous-tracking, intermittent-detection method; after the images to be detected are acquired, the pedestrian position in the images is tracked continuously, and at fixed intervals the tracking result is verified with a detection algorithm to ensure detection accuracy. The pixel image area corresponding to the pedestrian is then obtained from the pedestrian position and a coarse-fine granularity head height judgment is carried out: the coarse-grained judgment obtains the pedestrian head height from the ratio between the body and the head, the fine-grained judgment is determined by a segmentation algorithm, and the coarse-grained judgment simultaneously serves as the verification basis for the fine-grained judgment, preventing fine-grained failures caused by over-segmentation and similar conditions. The distance estimates of the pedestrian head and foot components under visible-light and infrared conditions are obtained with a known algorithm and processed by cascade fusion to yield the final estimated distance. The cascade fusion has two stages: the first stage fuses the head and foot distance estimates within the visible-light or the infrared condition, and the second stage fuses the visible-light and infrared head-foot fusion results. During operation, when the estimated distance exceeds a certain threshold range around the tracking prediction result, the distance estimate is judged invalid and the tracking prediction result is taken as the final distance estimation result.
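The two-stage cascade fusion described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's exact formula: the equal-weight averaging and the use of `None` to mark an invalid (e.g. occluded) estimate are assumptions.

```python
def fuse(values):
    """Average the valid (non-None) estimates; None if all are invalid."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

def cascade_fusion(vis_head, vis_foot, ir_head, ir_foot):
    # Stage 1: fuse head/foot distances within each modality,
    # so an occluded body part does not invalidate the modality.
    vis = fuse([vis_head, vis_foot])
    ir = fuse([ir_head, ir_foot])
    # Stage 2: fuse the visible-light and infrared results,
    # so the estimate adapts to different visibility environments.
    return fuse([vis, ir])
```

For example, with the visible-light foot estimate occluded, `cascade_fusion(12.1, None, 11.8, 12.4)` still yields a distance near 12.1 m.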
The method and the device are suitable for pedestrian distance estimation by mobile equipment. (1) Suitable for mobile robots evading pedestrians: with the arrival of the intelligent-manufacturing age, the mobile robot industry has grown at an unprecedented pace, and pedestrian avoidance has become a problem mobile robots must solve. Mobile robots face complex environments, and how to evade pedestrians effectively under dim light and low visibility is a main problem the invention solves. The visible-infrared binocular camera acquires both images simultaneously, guaranteeing effective image information acquisition, so that the subsequent image processing and distance estimation are not affected by the external environment. Accordingly, the invention meets the pedestrian distance estimation requirements of mobile robots when avoiding pedestrians. (2) Suitable for pedestrian positioning by unmanned ground vehicles (UGVs): at present, unmanned ground vehicles are mainly applied to logistics transportation, reconnaissance, protection and medical evacuation scenes, and timely positioning of pedestrians in the environment under dangerous conditions is a problem they must solve. UGVs operate in complex environments, and extreme outdoor conditions such as strong illumination, wind-blown sand, rain and snow increase the difficulty of pedestrian distance estimation. The infrared-visible light binocular image acquisition method and the cascaded fusion of pedestrian head and foot distances ensure the effectiveness and accuracy of pedestrian distance acquisition under extremely severe outdoor weather conditions.
(3) Suitable for the field of autonomous driver assistance, where the method mainly judges the distance between pedestrians and vehicles and provides important data support for pedestrian risk judgment. The method meets the sustainability, accuracy and completeness requirements of pedestrian distance estimation in driver assistance, and innovatively improves pedestrian position determination, body-part information acquisition, and the handling of emergency application scenes and algorithm failures. Only a monocular camera and an infrared camera are used as image acquisition equipment, so the hardware requirements are low and implementation is easy. Meanwhile, the binocular pair guarantees the safety of autonomous driving at night, removing a large potential safety hazard.
Example 4:
For the scheme in embodiment 3, the pedestrian head pixel height is acquired by a method comprising the following steps:
S1, a monocular camera positioned in front of the vehicle acquires a pedestrian image; a pedestrian head calibration frame is obtained through HOG features, the head pixel region is extracted by a super-pixel algorithm, and a target binary image of the pedestrian head pixel region is obtained;
S2, establishing a vertical direction energy distribution diagram, and obtaining the pixel height of the head of the pedestrian.
Further, the method of step S2 is:
The energy filtering algorithm is used to obtain the vertical energy distribution diagram of the target binary image. In this diagram, the abscissa direction is the ordinate direction of the image coordinate system and has the same size as the image ordinate, while the ordinate of the distribution diagram is the energy value of the corresponding pixel row in the image;
According to the correspondence between the head target area image and the vertical energy distribution diagram, the top position of the pedestrian head is taken as the initial value of the energy curve, determining the ordinate P t of the pixel position at the top of the head; a valley appears at the junction between the bottom of the head and the other body parts, from which the ordinate P b of the pixel position at the bottom of the head is determined;
The pedestrian head pixel height is then:
H h = P b - P t
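The vertical energy filtering and the P t/P b read-off can be sketched in a few lines. This is an illustrative sketch only: the valley detector (first local minimum of the row-energy curve) is an assumed realization of the "valley at the head/body junction" described above, not the patent's verbatim procedure.

```python
import numpy as np

def head_pixel_height(binary):
    # Energy filtering: count non-zero pixels per image row.
    energy = (binary != 0).sum(axis=1)
    rows = np.nonzero(energy)[0]
    p_t = rows[0]                      # P_t: top of head, onset of the curve
    p_b = rows[-1]                     # fallback if no valley is found
    # The head/body junction shows up as the first valley of the curve:
    # a row whose energy drops below its upper neighbour.
    for r in range(p_t + 1, rows[-1]):
        if energy[r] < energy[r - 1] and energy[r] <= energy[r + 1]:
            p_b = r
            break
    return p_b - p_t                   # H_h = P_b - P_t
```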
Further, the method for obtaining the head pixel height of an upright pedestrian with the monocular camera further comprises the following steps:
S3, obtaining the pedestrian height in pixels, and checking the head pixel height result against the pixel ratio between the pedestrian head and body;
S4, outputting the pedestrian head pixel height when it passes the verification; for a result that fails the verification, outputting the final head pixel height as a fixed proportion of the obtained pedestrian height in pixels.
Further, the body height in pixels is obtained by computing the pixels inside the pedestrian frame with the super-pixel segmentation algorithm, segmenting the pedestrian from the background area to remove redundant information, processing the segmentation result with the energy filtering algorithm to obtain the vertical energy statistics, and deriving the body height in pixels from the analysis result.
Further, when extracting the head pixel area, the pedestrian head pixel height is obtained by a coarse-fine granularity head height judgment, comprising a coarse-grained and a fine-grained pedestrian head height estimation:
The coarse-grained head height estimation uses a fixed ratio between head height and body height: the ratio r hb of head height H re_head to body height H body is determined from simulation examples,
The head height is
Hre_head=Hbody×rhb
The fine-grained method obtains the head pixel height through the super-pixel algorithm; this height is required to float within a proportional range of the coarse estimate, the reference range being:
Hre_head×(1-rre)<Hhead<Hre_head×(1+rre)
Wherein r re is a floating coefficient, controlled between 0.2 and 0.3. When the head pixel height H head obtained by the fine-grained method lies in the reference range, H head is output as the head height; otherwise the pedestrian head pixel height obtained by the super-pixel method is judged invalid and H re_head is output.
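The coarse-fine check above amounts to a simple interval test. In the sketch below, the value of r hb is an illustrative placeholder (the patent says it is determined from simulation examples), and r re is set inside the stated 0.2-0.3 band.

```python
def check_head_height(h_body, h_head_fine, r_hb=0.15, r_re=0.25):
    """Coarse estimate H_re_head = H_body * r_hb; accept the fine
    (super-pixel) head height only if it floats within +/- r_re of
    the coarse estimate, otherwise fall back to the coarse value."""
    h_re = h_body * r_hb                       # coarse-grained estimate
    low, high = h_re * (1 - r_re), h_re * (1 + r_re)
    return h_head_fine if low < h_head_fine < high else h_re
```

For a 200-pixel-tall pedestrian the coarse estimate is 30 pixels, so a fine result of 32 is accepted while an over-segmented result of 80 is rejected in favour of the coarse value.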
Further, the method for acquiring the head pixel area comprises the following steps:
(1) Cluster initialization: the number k of super pixels to be generated is input in the CIELAB color space, and a uniform grid spacing s is determined from the number n of pixel points in the processing area so that the resulting super pixel blocks have the same size;
Wherein: s = √(n/k)
the pixel point color and position information define the five-dimensional cluster center in the CIELAB color space:
C k = [l k a k b k x k y k] T
Where l, a, b represent color information and x, y represent spatial information: l k is the brightness of the center point color, a k is the position of the center point between red/magenta and green, b k is the position of the center point between yellow and blue, and x k, y k are the spatial coordinates of the center point;
(2) Pixel distance calculation: a distance index D is defined to represent the relation between pixel i and the cluster center C k, judged jointly through the color distance and the space distance, whose contributions to D are set by the weight m:
d c = √((l i - l k)² + (a i - a k)² + (b i - b k)²)
d s = √((x i - x k)² + (y i - y k)²)
D = √(d c² + (d s/s)² · m²)
Wherein d c represents the color distance, d s the space distance and m the distance adjustment weight coefficient: when m is smaller, the color distance weighs more, the super pixels adhere more strongly to target edges, and the regularity of super pixel shape and size decreases; when m is larger, the space distance weighs more, and the super pixel blocks form more regularly;
(3) Pixel allocation: each pixel i is allocated to the corresponding super pixel block according to its distance from the cluster center point, the search area corresponding to each center being twice the super pixel area;
(4) Cluster center update: after the pixels are allocated, the cluster center is determined again from the color and position information of its pixel points; the residual between the updated and the previous cluster center is computed with the spatial two-norm, and the update is repeated until the error converges, at which point the update stops and the super pixel blocks are determined;
(5) Post-processing: after clustering, some pixel points may not belong to any super pixel block; such isolated pixel points are reassigned using a connectivity algorithm.
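The distance calculation at the heart of the clustering steps above can be sketched minimally. The combined form D = √(d c² + (d s/s)²·m²) is the standard SLIC-style combination consistent with the stated behaviour of m (small m favours color, large m favours spatial regularity); treat it as an assumed formulation rather than the patent's verbatim formula.

```python
import numpy as np

def slic_distance(pixel, center, s, m):
    """Distance index D between a five-dimensional pixel [l, a, b, x, y]
    and a cluster centre C_k = [l_k, a_k, b_k, x_k, y_k]."""
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_c = np.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_s = np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    # small m -> colour dominates; large m -> spatial regularity dominates
    return np.sqrt(d_c ** 2 + (d_s / s) ** 2 * m ** 2)
```

With grid spacing s = √(n/k), a pixel one grid step away from the centre with identical colour gets D = m, so m directly sets how costly spatial spread is relative to colour difference.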
The upright pedestrian head component acquisition method is based on intelligent image analysis: the general area where the pedestrian head is located is determined through HOG features, and because the super-pixel segmentation algorithm is computationally expensive, super-pixel processing is carried out only on that general head area, ensuring both the accuracy and the real-time performance of the algorithm. To avoid algorithm failure, the integrity of the head is checked through the relation between the body and head pixel heights, providing a concrete method for acquiring the height feature.
From the perspective of safety protection of pedestrians, the method for acquiring information based on the body parts does not need the pedestrians to carry other equipment, meets the actual condition requirements, and is more beneficial to popularization in traffic systems. Aiming at pedestrian safety, the invention provides data support for a pedestrian distance estimation method based on head height.
1) HOG features have been widely used as a feature description method in image-based human detection and have gradually become a mainstream algorithm. The invention takes HOG features as the basis of information acquisition, ensuring the stability of the subsequent algorithm. Information acquired through HOG features is mined in depth to distinguish the head component and use it in the head pixel acquisition process. Because head pixel acquisition is an auxiliary process, operating directly on all information in the image would harm real-time performance; distinguishing the head component first and then processing it separately guarantees both the timeliness and the pixel acquisition precision of the invention.
2) The head, as a rigid body part, is not easily deformed under any pedestrian posture, and its distance is estimated using the monocular ranging principle. The integrity of the head component is the key to whether the distance estimation result is accurate. To obtain the pixel size of the pedestrian head more accurately, the invention applies a super-pixel segmentation algorithm in the acquisition process. The super-pixel segmentation algorithm removes redundant areas and extracts effective information by grouping pixels that are adjacent in position and similar in features such as color and texture. Since the algorithm computes pixel points one by one, its computational load is a problem to be solved in application. The invention processes the image once with the HOG-based upright pedestrian head component acquisition criterion to obtain the head and surrounding pixel points; therefore, when the super-pixel head component pixels are acquired, the computation can be kept within a certain range, guaranteeing the calculation speed of the method.
3) In the image, head pixels account for only a small share of the body pixels, and in complex traffic environments, acquiring head pixels with only the HOG-based head component acquisition criterion and the super-pixel head component pixels carries a risk of method failure across varied scenes. To ensure the integrity of the invention and form a closed loop, the head pixel acquisition result is tested through the proportional relation between head height and body height, and for results that fail the test, an alternative scheme acquires the head information. Confirming the head information result through this self-check greatly reduces the error rate of the distance output and keeps safety high in practical application.
4) The self-check is carried out mainly through the pixel height ratio between the pedestrian head and body. The head pixel height is obtained from the super-pixel head component pixels, and the body pixel height is obtained by a segmentation algorithm combined with a threshold: the image is first processed through HOG features, super-pixel segmentation is applied to the result, and redundant information around the pedestrian is removed. The segmentation result is then processed by the energy filtering method, and the pedestrian pixel height is finally obtained from the resulting waveform. Since pedestrian risk judgment is a main application of the invention, the requirements on accuracy and stability are high, and the verification of head height acquisition is therefore particularly critical. Using the head-to-body pixel height ratio as the basis for judging the head height result is highly feasible, requires no additional hardware for the check, and this convenience benefits application in complex environments.
(5) The invention can be used for the process of avoiding pedestrians by the mobile robot. Based on the camera inside the robot as hardware, the invention obtains the pixel height of the head of the pedestrian, and the distance between the pedestrian and the robot is obtained according to the existing pedestrian body part distance estimation method and is used as the effective judgment basis for the robot to avoid the pedestrian.
(6) The invention can be used for judging the pedestrian danger in the vehicle-mounted equipment. The distance between the pedestrians and the vehicles is an important criterion for judging whether the vehicles form danger to the pedestrians, the head pixel height obtained according to the invention is used as a distance estimation judgment basis, the distance is estimated on the basis of not increasing the hardware burden, and the method is suitable for complex traffic environments.
(7) The invention can be used by unmanned aerial vehicles for pedestrian distance judgment during traffic law enforcement. The development of unmanned aerial vehicles is no longer limited to technologies such as aerial photography, and their application in traffic law enforcement is a breakthrough. In law enforcement, the unmanned aerial vehicle is mainly used to photograph the credentials held by a pedestrian. The invention obtains the pedestrian head pixel height and estimates the distance, keeping a suitable distance and photographing the credentials clearly without endangering the pedestrian.
(8) The invention can be used by unmanned aerial vehicles for pedestrian distance judgment in artistic photography. With the development of artificial intelligence, unmanned aerial vehicles are gradually moving toward the service industry and have been accepted by the public in artistic photography. Their use usually requires human or equipment intervention, which increases the photographer's workload and raises the shooting cost. The invention acquires the head pixel height of the person being photographed, so that a suitable distance can be kept and the shot completed.
Example 5:
For the scheme in embodiment 3 or 4, a foot component acquisition method for pedestrian images based on energy filtering comprises the following steps:
step 1: acquiring a pedestrian foot calibration frame through HOG features;
step 2: obtaining the region where the foot target is located through a super-pixel algorithm;
Step 3: the output foot coordinates are obtained by energy filtering.
Further, the method of step 3 is as follows: the center point of the pedestrian toe is set as the specific corresponding point P f of the foot position. The binary result of the region where the foot target is located is projected in the horizontal and the vertical direction respectively: the non-zero pixel points are counted and accumulated along each direction, forming the corresponding energy filtering curves. In the vertical energy distribution diagram, the abscissa direction is the ordinate direction of the image coordinate system, its size is the same as the image ordinate size, and the ordinate of the distribution diagram is the energy value of the corresponding pixel row in the image. According to the correspondence between the image and the energy distribution diagrams, the abscissa of P f is the median between the abscissa x s of the initial value and the abscissa x e of the end point value of the horizontal energy distribution, namely:
x Pf = (x s + x e)/2
The ordinate of P f is the abscissa y e of the end point value of the vertical energy distribution diagram, namely:
y Pf = y e
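The projection and read-off of P f from the two energy curves can be sketched directly with array sums. This is an illustrative sketch of the described procedure; variable names x_f/y_f are introduced here for clarity.

```python
import numpy as np

def foot_point(binary):
    """P_f from a foot-region binary image: abscissa = midpoint of the
    horizontal energy curve's support, ordinate = end of the vertical
    energy curve's support (bottom of the foot)."""
    col_energy = (binary != 0).sum(axis=0)   # horizontal distribution
    row_energy = (binary != 0).sum(axis=1)   # vertical distribution
    cols = np.nonzero(col_energy)[0]
    rows = np.nonzero(row_energy)[0]
    x_f = (cols[0] + cols[-1]) / 2           # median of start/end values
    y_f = int(rows[-1])                      # end point of vertical curve
    return x_f, y_f
```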
Further, the super-pixel method for acquiring the region where the target is located comprises the following steps:
(1) Cluster initialization: the number k of super pixels to be generated is input in the CIELAB color space, and a uniform grid spacing s is determined from the number n of pixel points in the processing area so that the resulting super pixel blocks have the same size;
Wherein: s = √(n/k)
the pixel color and position information define the five-dimensional cluster center in the CIELAB color space:
C k = [l k a k b k x k y k] T (1)
Where l, a, b represent color information and x, y represent spatial information: l k is the brightness of the center point color, a k is the position of the center point between red/magenta and green, b k is the position of the center point between yellow and blue, and x k, y k are the spatial coordinates of the center point;
(2) Pixel distance calculation: a distance index D is defined to represent the relation between pixel i and the cluster center C k, judged jointly through the color distance and the space distance, whose contributions to D are set by the weight m:
d c = √((l i - l k)² + (a i - a k)² + (b i - b k)²)
d s = √((x i - x k)² + (y i - y k)²)
D = √(d c² + (d s/s)² · m²)
Wherein d c represents the color distance, d s the space distance and m the distance adjustment weight coefficient: when m is smaller, the color distance weighs more, the super pixels adhere more strongly to target edges, and the regularity of super pixel shape and size decreases; when m is larger, the space distance weighs more, and the super pixel blocks form more regularly;
(3) Pixel allocation: each pixel i is allocated to the corresponding super pixel block according to its distance from the cluster center point, the search area corresponding to each center being twice the super pixel area;
(4) Cluster center update: after the pixels are allocated, the cluster center is determined again from the color and position information of its pixel points; the residual between the updated and the previous cluster center is computed with the spatial two-norm, and the update is repeated until the error converges, at which point the update stops and the super pixel blocks are determined;
(5) Post-processing: after clustering, some pixel points may not belong to any super pixel block; such isolated pixel points are reassigned using a connectivity algorithm.
The upright pedestrian foot component acquisition method is based on intelligent image analysis: the pedestrian foot component area is determined through HOG features, the target area is then further processed with the super-pixel algorithm, and redundant information is removed to extract the region where the target is located. To obtain the required features from the image, the target area binary image is processed with the energy filtering algorithm, yielding horizontal and vertical energy statistics curves from which the feature point, i.e. the position point of the pedestrian foot component, is obtained.
The upright pedestrian foot component acquisition method provided by the invention serves the distance estimation algorithm based on pedestrian body components and provides effective data support for the foot component. The pedestrian body is used as the information acquisition source, no external hardware intervention is needed, and this convenience suits pedestrian distance estimation in complex environments. The invention is based on computer vision, uses only a monocular camera as the image acquisition hardware, and therefore places low demands on equipment. The pedestrian foot component features are acquired by deep mining of the image information, completing the subsequent algorithm.
The foot component contacts the ground and its position is relatively fixed, which gives it an advantage in acquisition complexity over other body components. Meanwhile, the ground color features are relatively uniform, reducing the burden of the image segmentation process.
1) The invention obtains multiple pedestrian body components through HOG features and judges the foot components from their inherent characteristics using an optical flow method and the energy filtering curve. HOG features are widely used by researchers as the mainstream feature in pedestrian detection; the algorithm is relatively mature and has advantages in detection precision and timeliness. The invention acquires only the information of the pedestrian foot components and extracts those features, processing the multiple components obtained by HOG in parallel with a basic image segmentation algorithm, the optical flow method, which guarantees real-time performance. The component map processed by the optical flow method is analyzed to obtain the foot component area. Using mainstream basic algorithms in this acquisition process ensures both the stability of the method and the real-time performance of the distance estimation.
2) The super-pixel algorithm belongs to the field of image segmentation and processes the pixels in the foot component area. Because it works at the pixel level, it guarantees accuracy but also brings high complexity and low real-time performance; therefore only the pixels inside the foot component area are computed, ensuring both calculation accuracy and calculation speed. The cluster-analysis-based super-pixel segmentation algorithm computes the feature values of pixel points and processes pixels with similar features as one super pixel block. The sizes, ranges and other weights of the feature blocks can be regulated manually, ensuring the effectiveness of the super-pixel algorithm on different occasions.
3) The energy filtering algorithm extracts feature points through a filtering curve, achieving deep mining of image information. It mainly accumulates non-zero pixel points to obtain a filtering curve; analyzing the image together with the curve yields the characteristics of the pedestrian foot position point and its specific pixel position. Compared with other feature extraction methods, this one is easy to implement and fast to compute. Since the invention is mainly applied to pedestrian distance acquisition, which demands high real-time performance, a fast and stable method for acquiring the pedestrian foot position is a necessary requirement. The energy filtering algorithm only performs accumulation over pixel points and extracts the foot position pixels by fixed rules, guaranteeing both real-time performance and stability.
Example 6:
For the solution in embodiment 3, 4 or 5, in step S2 (multi-time-scale detection tracking, determining the positions of the target pedestrians in the infrared front image and the visible front image), the method of this embodiment may also be used: a target tracking method with a sample-updating mechanism may be selected to determine the positions of the target pedestrians in the infrared front image and the visible front image, the tracking method comprising:
firstly, acquiring a video initial frame and initializing a tracker;
Secondly, tracking a target for the next frame by using a filtering tracking method, and returning a tracking result;
Thirdly, analyzing the tracking result by using an image characteristic forgetting method, extracting image characteristics of a target area, comparing the image characteristics with a reference image, and forgetting the tracking result with an excessively large gap;
And fourthly, checking the forgotten tracking result in the third step by using an energy significant memory method, extracting gradient energy of a target area of the forgotten result, performing significance analysis, memorizing the tracking result containing the target in a sample library again, maintaining the forgotten operation by other results not containing the tracked target, and returning to the second step or ending tracking.
Further, if the tracking result with the too large difference does not exist in the third step, returning to the second step or ending the tracking.
Further, the third step is as follows: taking the kth frame image as the reference image, the HOG and CNN features of the target areas of the kth and (k+1)th frame images are extracted respectively, and the Manhattan distance between the two sets of image features is computed as the image distance value of the (k+1)th frame tracking result;
Let J k (x) be the feature of the kth frame image of the video and J k+1 (x) the feature of the (k+1)th frame image; the image distance value of the (k+1)th frame is then computed by formula (2):
dist k+1 = (1/n) Σ i=1..n |J k (x) i - J k+1 (x) i| (2)
dist k+1 > δ, δ∈(0,1) (3)
Wherein δ is the upper limit value for a failed sample in the overlap judgment, dist k+1 is the image distance value of the (k+1)th frame image, n is the number of elements in the feature map, and J k (x) i and J k+1 (x) i are the ith elements of the kth frame and (k+1)th frame image features respectively;
If the image feature distance of the (k+1)th frame is larger than δ, the (k+1)th frame tracking result is judged to be a result that needs to be forgotten; if it is smaller than δ, the tracking result is memorized into the training set and the procedure jumps to the fifth step.
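The forgetting criterion of formulas (2) and (3) can be sketched as a normalised L1 comparison. The per-element normalisation (which keeps the distance comparable with δ∈(0,1) for normalised features) and the sample value of δ are assumptions in this sketch.

```python
import numpy as np

def should_forget(feat_k, feat_k1, delta=0.5):
    """Image-feature forgetting: mean Manhattan (L1) distance between
    the kth and (k+1)th frame target features; tracking results whose
    distance exceeds delta are forgotten."""
    dist = np.abs(np.asarray(feat_k) - np.asarray(feat_k1)).mean()
    return dist > delta
```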
Further, the fourth step is as follows: the HOG energy of the target area is extracted from the (k+1)th frame image of the latest tracking result, the HOG energy of all images in the training set is extracted and its mean value calculated for comparison, and the HOG energy change value of the (k+1)th frame image is taken as its energy significance value;
Let H_{k+1} be the HOG energy value of the (k+1)th frame image and H_x be the HOG energy set of all images in the training set, then
Enable_{k+1} = |H_{k+1} − (1/m) Σ_{i=1}^{m} H_x(i)| (7)
Equation (7) is the energy significance calculation formula, where Enable_{k+1} is the energy significance value of the (k+1)th frame image, m is the number of images in the training set, and H_x(i) is the HOG energy of the ith image in the training set;
If the energy significance value of the (k+1)th frame image satisfies formula (8), the (k+1)th frame image is re-memorized into the training set; if it does not satisfy the formula, the forgetting operation on the (k+1)th frame image is maintained;
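A minimal sketch of this energy-salience check, under the assumption that equation (7) is the absolute deviation of the new frame's HOG energy from the training-set mean, and that the (unreproduced) formula (8) compares that deviation against a tolerance; the function names and the tolerance `epsilon` are illustrative:

```python
import numpy as np

def energy_significance(h_k1, h_train):
    """Energy significance value of the (k+1)th frame: absolute deviation
    of its HOG energy h_k1 from the mean HOG energy of the m training images."""
    h_train = np.asarray(h_train, dtype=float)
    return float(abs(h_k1 - h_train.mean()))

def re_memorize(h_k1, h_train, epsilon):
    """Re-memorize the frame when the significance value stays within the
    tolerance epsilon (a stand-in for formula (8)); else keep it forgotten."""
    return bool(energy_significance(h_k1, h_train) <= epsilon)
```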
In one scheme, in order to re-memorize valid samples among the forgotten results, a method is provided that extracts the gradient energy of the target area of each forgotten result, performs significance analysis, and memorizes tracking results that contain the target back into the sample library.
(1) The target tracking algorithm with the selectable sample-updating mechanism maintains good adaptability to complex environments such as target occlusion, strong light-dark changes and target deformation, so the target tracking method can be applied in more real scenes and can provide more reliable target position information for subsequent judgments such as pedestrian intention analysis;
(2) The image feature forgetting method screens tracking results and forgets those with a large gap from the reference image; it is applicable to all discriminative-model target tracking methods, prevents feature information of occluding objects from polluting the training set, and improves the adaptability of the target tracking method to target occlusion;
(3) The energy significant memory method verifies the forgetting results of the image feature forgetting method and mainly re-memorizes target feature information that changes greatly in complex environments such as light-dark changes and target deformation;
(4) The invention can provide more accurate road condition information for mobile robots, autonomous vehicles and driver-assistance systems, and plays an important role in obstacle avoidance and path planning for industrial robots or autonomous vehicles, and in guiding services provided by service robots for specific person targets.
Claims (2)
1. A foot component acquisition method of pedestrian images based on energy filtering, characterized by: the method comprises the following steps:
step 1: acquiring a pedestrian foot calibration frame through HOG features;
step 2: obtaining the region where the foot target is located through a super-pixel algorithm;
step 3: obtaining an output foot coordinate through energy filtering;
The method in step 3 is as follows: the position of the center point of the pedestrian's toe is set as the specific corresponding point P_f of the foot position; the binary result of the region where the foot target is located is projected in the horizontal and vertical directions respectively, and the binary image is energy-filtered by counting and accumulating the non-zero pixel points in the horizontal and vertical directions to form the corresponding energy filtering curves. In the vertical energy distribution diagram, the abscissa direction corresponds to the ordinate direction of the image coordinate system and has the same size as the image ordinate, and the ordinate of the energy distribution diagram is the energy value of the corresponding pixel row in the image. According to the correspondence between the image and the energy distribution diagrams, the abscissa of P_f is the median of the start-point abscissa x_s and the end-point abscissa x_e of the horizontal-direction energy distribution, namely:
x_{P_f} = (x_s + x_e) / 2;
The ordinate of P_f is the end-point abscissa y_e of the vertical-direction energy distribution diagram, namely:
y_{P_f} = y_e;
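A minimal sketch of this energy-filtering localization, assuming the foot region is given as a binary mask; the symbols x_s, x_e, y_e for the start/end of the energy distributions and the function name are illustrative:

```python
import numpy as np

def foot_point(binary):
    """Locate the toe center point P_f from a binary foot-region mask.

    binary: 2-D array, non-zero where the foot region was segmented.
    The horizontal/vertical energy curves are the counts of non-zero
    pixels per column/row; the abscissa of P_f is the midpoint of the
    horizontal energy support, the ordinate is the last row with
    non-zero energy.
    """
    binary = np.asarray(binary) != 0
    col_energy = binary.sum(axis=0)   # horizontal-direction energy curve
    row_energy = binary.sum(axis=1)   # vertical-direction energy curve
    cols = np.nonzero(col_energy)[0]
    rows = np.nonzero(row_energy)[0]
    x_f = (cols[0] + cols[-1]) // 2   # median of start/end abscissa x_s, x_e
    y_f = rows[-1]                    # end point y_e of vertical distribution
    return int(x_f), int(y_f)
```

For a 3 x 3 blob centered in a 5 x 5 mask this returns the toe point at column 2, row 3.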
The method for acquiring the foot pixel region comprises the following steps:
(1) Cluster initialization: the number k of superpixels to be generated is input in the CIELAB color space, and the uniform grid spacing s is determined from the number n of pixel points in the processing area so that the obtained superpixel blocks have the same size;
Wherein: s = √(n/k);
the pixel color and position information are used for defining five-dimensional vectors in the CIELAB color space:
C_k = [l_k a_k b_k x_k y_k]^T (1)
Where l, a, b represent color information and x, y represent spatial information: l_k is the lightness of the center point color, a_k is the position of the center point between red/magenta and green, b_k is the position of the center point between yellow and blue, and x_k and y_k are the spatial coordinates of the center point;
(2) Pixel distance calculation: a distance index D is defined to represent the relationship between pixel i and the cluster center C_k;
The color distance and the spatial distance are judged jointly, and their contributions to the distance index D are balanced by the weight m:
D = √(d_c² + (d_s/s)²·m²)
Wherein d_c represents the color distance, d_s represents the spatial distance, and m is the distance-adjustment weight coefficient: when m is smaller, the color distance carries more weight, the superpixels adhere more strongly to target edges, and the regularity of superpixel shape and size decreases; when m is larger, the spatial distance carries more weight and the resulting superpixel blocks are more regular;
(3) Pixel allocation: in the pixel allocation process each pixel i is assigned to the corresponding superpixel block according to its distance from the cluster center point, and the search area corresponding to each pixel is twice the superpixel area;
(4) Cluster center update: after the pixels are assigned to cluster centers, each cluster center is re-determined from the color and position information of its pixel points; the residual between the updated cluster center and the previous one is calculated with the spatial L2 norm, and the update process is repeated until the error converges, at which point updating stops and the superpixel blocks are fixed;
(5) Post-processing: after the clustering process some pixel points may not belong to any superpixel block; these isolated pixel points are reassigned using a connectivity algorithm.
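Steps (1)-(4) above can be sketched as a simplified SLIC-style clustering loop; the function name and iteration count are illustrative, the search is done over the whole image rather than the restricted 2s x 2s window, and the post-processing of step (5) is omitted:

```python
import numpy as np

def slic_sketch(lab_image, k=100, m=10.0, iters=10):
    """Simplified SLIC-style superpixel clustering sketch.

    lab_image: H x W x 3 array in CIELAB. k: requested superpixel count.
    m: weight balancing color distance d_c against spatial distance d_s.
    Returns an H x W label map.
    """
    h, w, _ = lab_image.shape
    n = h * w
    s = int(np.sqrt(n / k))                       # grid spacing s = sqrt(n/k)
    # (1) initialize cluster centers C_k = [l, a, b, x, y] on a regular grid
    ys, xs = np.mgrid[s // 2:h:s, s // 2:w:s]
    centres = np.array([[*lab_image[y, x], x, y]
                        for y, x in zip(ys.ravel(), xs.ravel())], float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), int)
    for _ in range(iters):
        best_d = np.full((h, w), np.inf)
        for idx, (l, a, b, cx, cy) in enumerate(centres):
            # (2) joint distance D = sqrt(d_c^2 + (d_s/s)^2 * m^2)
            d_c2 = ((lab_image[..., 0] - l) ** 2 +
                    (lab_image[..., 1] - a) ** 2 +
                    (lab_image[..., 2] - b) ** 2)
            d_s2 = (xx - cx) ** 2 + (yy - cy) ** 2
            d = np.sqrt(d_c2 + (d_s2 / s ** 2) * m ** 2)
            better = d < best_d                   # (3) assign pixel to center
            best_d[better] = d[better]
            labels[better] = idx
        # (4) update each center to the mean color/position of its pixels
        for idx in range(len(centres)):
            mask = labels == idx
            if mask.any():
                centres[idx] = [lab_image[..., 0][mask].mean(),
                                lab_image[..., 1][mask].mean(),
                                lab_image[..., 2][mask].mean(),
                                xx[mask].mean(), yy[mask].mean()]
    return labels
```

On an image whose left and right halves differ strongly in lightness, the returned label map separates the two halves into different superpixel blocks.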
2. The use of the foot component acquisition method of pedestrian images based on energy filtering according to claim 1 in continuous distance estimation for infrared-visible light binocular pedestrian body multi-component fusion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911236896.2A CN111368630B (en) | 2019-12-05 | 2019-12-05 | Foot part acquisition method of pedestrian image based on energy filtering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111368630A CN111368630A (en) | 2020-07-03 |
CN111368630B true CN111368630B (en) | 2024-07-19 |
Family
ID=71208044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911236896.2A Active CN111368630B (en) | 2019-12-05 | 2019-12-05 | Foot part acquisition method of pedestrian image based on energy filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111368630B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949356A (en) * | 2019-03-14 | 2019-06-28 | 大连民族大学 | Equal space line monocular vision pedestrian's method for estimating distance |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6965645B2 (en) * | 2001-09-25 | 2005-11-15 | Microsoft Corporation | Content-based characterization of video frame sequences |
- 2019-12-05: CN application CN201911236896.2A filed (patent CN111368630B, status Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949356A (en) * | 2019-03-14 | 2019-06-28 | 大连民族大学 | Equal space line monocular vision pedestrian's method for estimating distance |
Also Published As
Publication number | Publication date |
---|---|
CN111368630A (en) | 2020-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10956756B2 (en) | Hazard detection from a camera in a scene with moving shadows | |
CN110178167B (en) | Intersection violation video identification method based on cooperative relay of cameras | |
CN105835880B (en) | Lane following system | |
CN111145211B (en) | Method for acquiring pixel height of head of upright pedestrian of monocular camera | |
CN110244322A (en) | Pavement construction robot environment sensory perceptual system and method based on Multiple Source Sensor | |
CN104091348B (en) | The multi-object tracking method of fusion marked feature and piecemeal template | |
Wu et al. | Applying a functional neurofuzzy network to real-time lane detection and front-vehicle distance measurement | |
CN105608417B (en) | Traffic lights detection method and device | |
CN107688764B (en) | Method and device for detecting vehicle violation | |
US11170272B2 (en) | Object detection device, object detection method, and computer program for object detection | |
CN110379168B (en) | Traffic vehicle information acquisition method based on Mask R-CNN | |
CN111126178B (en) | Continuous distance estimation method for infrared-visible light binocular pedestrian body multi-component fusion | |
CN108776974B (en) | A kind of real-time modeling method method suitable for public transport scene | |
CN111881749B (en) | Bidirectional people flow statistics method based on RGB-D multi-mode data | |
Nguyen et al. | Compensating background for noise due to camera vibration in uncalibrated-camera-based vehicle speed measurement system | |
CN104282020A (en) | Vehicle speed detection method based on target motion track | |
CN102447835A (en) | Non-blind-area multi-target cooperative tracking method and system | |
CN112241969A (en) | Target detection tracking method and device based on traffic monitoring video and storage medium | |
Rodríguez et al. | An adaptive, real-time, traffic monitoring system | |
CN106558051A (en) | A kind of improved method for detecting road from single image | |
CN111191516B (en) | Target tracking method of sample selectable updating mechanism, method for remembering valid samples and distance estimation method | |
CN117949942B (en) | Target tracking method and system based on fusion of radar data and video data | |
CN104951758A (en) | Vehicle-mounted method and vehicle-mounted system for detecting and tracking pedestrians based on vision under urban environment | |
JPH10255057A (en) | Mobile object extracting device | |
JP2021026683A (en) | Distance estimation apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||