CN111145211B - Method for acquiring pixel height of head of upright pedestrian of monocular camera - Google Patents


Info

Publication number
CN111145211B
Authority
CN
China
Prior art keywords
pixel
head
pedestrian
height
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911235573.1A
Other languages
Chinese (zh)
Other versions
CN111145211A (en)
Inventor
杨大伟
毛琳
程凡
Current Assignee
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date
Filing date
Publication date
Application filed by Dalian Minzu University filed Critical Dalian Minzu University
Priority to CN201911235573.1A
Publication of CN111145211A
Application granted
Publication of CN111145211B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence


Abstract

A method for acquiring the head pixel height of an upright pedestrian with a monocular camera belongs to the field of intelligent image vision, and aims to solve the problem that head information acquisition is ambiguous when distance is estimated from the head.

Description

Method for acquiring pixel height of head of upright pedestrian of monocular camera
Technical Field
The invention belongs to the field of intelligent image vision, and in particular relates to a method that acquires an image with a monocular camera and processes the upright pedestrians in the image with image-processing methods, so as to acquire the pedestrian's head.
Background
A traffic system consists mainly of two kinds of objects, vehicles and pedestrians, and judging the distance between them is a key factor in deciding whether a pedestrian is in a dangerous state. Processing the features of a pedestrian's body parts from images requires no assistance from external equipment, and in an assisted-driving traffic environment it meets requirements on pedestrian distance determination in accuracy, simplicity, and other respects.
At the image level there are many pedestrian information acquisition methods. Patent CN108648198A discloses a segmentation method for multi-target moving human-body regions in video: the motion region is binarized with a background-difference method, a maximal rectangular region is segmented around the region where a person is located, and the target moving human-body region is finally extracted by repeated iteration of an optical-flow method. Patent CN108734200A discloses a human-target visual detection method and device based on BING features: saliency detection is performed on the image with BING features to screen out regions that may contain pedestrians, the screening results are processed by a cascade classifier, and the position of the region where the pedestrian is located is finally determined.
Disclosure of Invention
In order to solve the problem that head information acquisition is ambiguous when estimating distance from the head, the invention aims at acquiring the pixel region of the upright pedestrian's head. On the basis of image segmentation, it adopts three steps: an upright-pedestrian head acquisition criterion based on HOG features, superpixel pedestrian head acquisition, and a head-integrity definition.
The technical scheme of the invention is as follows. A monocular-camera upright-pedestrian head pixel height acquisition method comprises the following steps:
S1, a monocular camera positioned at the front of a vehicle acquires a pedestrian image; a pedestrian head calibration frame is obtained through HOG features, a head pixel region is obtained through a superpixel algorithm, and acquiring the pedestrian head pixel region yields a target binary image;
S2, a vertical-direction energy distribution map is established, and the pixel height of the pedestrian's head is obtained.
Also claimed is an application of the monocular-camera upright-pedestrian head pixel height acquisition method in continuous distance estimation with infrared/visible-light binocular multi-component fusion of the pedestrian body.
Beneficial effects: the superpixel upright-pedestrian head acquisition method aims at acquiring information on pedestrian body parts, with the acquisition result used specifically to estimate pedestrian distance. Taking distance estimation as the application point, after obtaining the pedestrian head pixel region as required, the invention extracts related information such as the height, facilitating the concrete application of subsequent algorithms such as distance estimation.
Drawings
FIG. 1 is a schematic logic diagram of a method of upright pedestrian head member acquisition;
FIG. 2 is a schematic diagram of pixel allocation;
FIG. 3 is a head pixel acquisition schematic;
FIG. 4 is a schematic view of body height pixel acquisition;
FIG. 5 is a graph of the results of example 1;
FIG. 6 is a graph of the results of example 2.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and detailed description:
A schematic logic diagram of the upright-pedestrian head component acquisition method is shown in FIG. 1. The algorithm is implemented in the following steps:
Step 1: acquire the pedestrian head calibration frame through HOG features;
Step 2: process the pixels in the target area and the head calibration frame through the superpixel algorithm and obtain the head-height pixels;
Step 3: acquire the pedestrian body-height pixels, and verify the head pixel height result against the ratio between the pedestrian's head and body pixels;
Step 4: output the pedestrian head pixel height that passes verification; for results that fail verification, obtain the final pedestrian head pixel height from the fixed proportion of the acquired body-height pixels.
From the above description, the specific method of the present invention is as follows:
1. Upright-pedestrian head component acquisition criterion based on HOG features
Detecting pedestrians with HOG features is a mature method with clear advantages in stability and detection precision. Aiming at acquiring the pedestrian head area, the invention extracts the pedestrian body parts through HOG features and judges the head component and the region where it is located by set rules. The rules are as follows:
1) Process each pedestrian part frame with an optical-flow method and obtain a target binary image.
2) Apply energy filtering to the result of the previous step to obtain energy-filtering curves, in two directions, x and y.
3) The x-direction energy-filtering curve has an inverted-parabola shape; the y-direction curve is constant over the front (head) region and has an inverted-parabola shape over the rear region. The head component is selected from the multiple body components accordingly, and subsequent calculations are performed.
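As an illustration of rule 2), the two energy-filtering curves can be computed as column and row sums of the binary target image. A minimal sketch (NumPy; the toy silhouette and the reading of "energy" as a pixel count per column/row are assumptions for illustration):

```python
import numpy as np

def energy_profiles(mask):
    """Column (x) and row (y) energy curves of a binary target mask.

    mask: 2-D array, 1 = target pixel.
    Returns (x_energy, y_energy): x_energy[j] sums column j,
    y_energy[i] sums row i.
    """
    mask = np.asarray(mask, dtype=np.int64)
    x_energy = mask.sum(axis=0)  # energy along x (per column)
    y_energy = mask.sum(axis=1)  # energy along y (per row)
    return x_energy, y_energy

# A crude upright-pedestrian silhouette: a narrow head block on a wider body.
mask = np.zeros((12, 7), dtype=np.uint8)
mask[0:4, 2:5] = 1   # head block (rows 0-3, width 3)
mask[4:12, 1:6] = 1  # body block (rows 4-11, width 5)
x_e, y_e = energy_profiles(mask)
# x_e rises to a central plateau (inverted-parabola-like);
# y_e is constant over the head rows, then jumps at the body.
```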
2. Superpixel pedestrian head component pixel acquisition
The superpixel algorithm is essentially a method of grouping pixels that are close in position and similar in characteristics. It helps remove redundant information, so that effective pedestrian head information is extracted from the head detection frame, which is convenient for the subsequent algorithm. When the superpixel algorithm is used to acquire the pedestrian head region, it is mainly its edge adherence that accurately divides target from non-target regions, achieving the purpose of acquiring the pedestrian head pixel region.
The superpixel acquisition of head component pixels mainly comprises the following steps:
(1) Cluster initialization
Input the number k of superpixels to be generated in the CIELAB color space, and determine the uniform grid interval from the number n of pixel points in the processing area:

S = sqrt(n/k)

which ensures that the resulting superpixel blocks have the same size. A five-dimensional vector is defined in the CIELAB color space from the pixel color and position information:

C_k = [l_k, a_k, b_k, x_k, y_k]^T (1)

where l, a, b carry color information and x, y spatial information: l_k is the lightness of the center point's color, a_k its position between red/magenta and green, b_k its position between yellow and blue, and x_k, y_k the distances of the center point from the x- and y-axes.
(2) Pixel distance calculation
The algorithm defines a new distance index D describing the relation between pixel i and cluster center C_k; it is judged jointly from the color distance and the spatial distance, and the weight m determines the contribution of each to D:

d_c = sqrt((l_i − l_k)² + (a_i − a_k)² + (b_i − b_k)²) (2)

d_s = sqrt((x_i − x_k)² + (y_i − y_k)²) (3)

D = sqrt(d_c² + (d_s/S)²·m²)

where d_c is the color distance, d_s the spatial distance, and m the distance-adjustment weight coefficient. When m is smaller, the color distance carries more weight, the superpixels adhere more strongly to target edges, and the regularity of superpixel shape and size decreases; when m is larger, the spatial distance carries more weight and the superpixel blocks form more regularly.
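A minimal sketch of evaluating this distance index for one pixel/centre pair (the combined form D = sqrt(d_c² + (d_s/S)²·m²) is the common SLIC formulation and is assumed here):

```python
import math

def slic_distance(p_i, c_k, m, S):
    """Combined SLIC distance between pixel i and cluster centre C_k.

    p_i, c_k: 5-tuples (l, a, b, x, y); m: compactness weight;
    S: grid interval. Smaller m -> colour distance dominates
    (stronger edge adherence); larger m -> more regular blocks.
    """
    l1, a1, b1, x1, y1 = p_i
    l2, a2, b2, x2, y2 = c_k
    d_c = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_s = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)
```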
(3) Pixel allocation
In the pixel allocation process, each pixel i is assigned to the corresponding superpixel block according to its distance from the cluster center points. When the superpixel region size is S×S, the corresponding search region size is 2S×2S.
(4) Cluster center update
After the pixels are assigned to cluster centers, each cluster center is re-determined from the color and position information of its pixel points. The residual between the updated center and the previous cluster center is computed as a spatial two-norm, and the update process is repeated until this error converges, at which point updating stops and the superpixel blocks are fixed.
(5) Post-treatment
Because the superpixel algorithm does not enforce explicit connectivity, after clustering some pixel points may not belong to any superpixel block. To solve this, isolated pixels are reassigned with a connected-components algorithm.
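Steps (1)–(5) can be sketched as a toy single-channel SLIC (pure NumPy; for brevity it searches all centres instead of restricting to the 2S×2S window, uses image intensity in place of the (l, a, b) colour channels, and omits the connectivity post-processing):

```python
import numpy as np

def mini_slic(img, k, m, n_iter=5):
    """Toy SLIC on a single-channel float image. Returns a label map.

    Illustrative only: intensity stands in for colour, every centre is
    searched (no 2S x 2S window), and no connectivity fix-up is done.
    """
    h, w = img.shape
    S = int(np.sqrt(h * w / k))          # grid interval S = sqrt(n/k)
    # (1) seed cluster centres on a regular grid: (intensity, x, y)
    centres = [(float(img[y, x]), float(x), float(y))
               for y in range(S // 2, h, S)
               for x in range(S // 2, w, S)]
    labels = np.zeros((h, w), dtype=int)
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_iter):
        # (2)+(3) assign each pixel to the centre with smallest D
        best = np.full((h, w), np.inf)
        for idx, (c, cx, cy) in enumerate(centres):
            d_c = np.abs(img - c)                           # colour distance
            d_s = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2)  # spatial distance
            D = np.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)
            upd = D < best
            best[upd] = D[upd]
            labels[upd] = idx
        # (4) recompute each centre from its assigned pixels
        for idx in range(len(centres)):
            sel = labels == idx
            if sel.any():
                centres[idx] = (float(img[sel].mean()),
                                float(xx[sel].mean()),
                                float(yy[sel].mean()))
    return labels
```

In practice an off-the-shelf implementation such as scikit-image's `skimage.segmentation.slic` (whose `n_segments` and `compactness` parameters play the roles of k and m) would be used instead.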
(6) Acquiring head pixel height
A target region is obtained by the superpixel algorithm; taking the target binary image as input, a vertical energy distribution map is obtained by the energy-filtering algorithm. In the vertical energy distribution map, the abscissa corresponds to the ordinate (row) direction of the image coordinate system and has the same extent; the ordinate of the distribution map is the energy value of the corresponding pixel row in the image. From the correspondence between the image and the vertical energy distribution map, the top of the pedestrian's head is the starting point of the energy curve, determining the ordinate P_t of the head-top pixel position; the junction of the head bottom with the other body parts produces a valley, determining the ordinate P_b of the head-bottom pixel position. The pedestrian head pixel height is then

H_h = P_b − P_t (4)
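A sketch of step (6): reading P_t and P_b off the vertical energy distribution of a binary mask (the valley detection below is a simple first-local-minimum rule, an assumption for illustration):

```python
import numpy as np

def head_pixel_height(mask):
    """Head pixel height H_h = P_b - P_t from a binary head/body mask.

    P_t: first row with nonzero energy (top of the head).
    P_b: first valley of the row-energy curve after P_t -- taken here
    as the head/shoulder junction.
    """
    energy = np.asarray(mask).sum(axis=1)
    nz = np.nonzero(energy)[0]
    p_t = int(nz[0])
    p_b = None
    for i in range(p_t + 1, len(energy) - 1):
        if energy[i] <= energy[i - 1] and energy[i] < energy[i + 1]:
            p_b = i  # valley: bottom of the head
            break
    if p_b is None:
        p_b = int(nz[-1])  # fallback: no valley found
    return p_b - p_t, p_t, p_b

# Head (rows 1-4), narrow neck (row 5), wider torso (rows 6-10).
mask = np.zeros((12, 7), dtype=np.uint8)
mask[1:5, 2:5] = 1
mask[5, 3] = 1
mask[6:11, 1:6] = 1
h_h, p_t, p_b = head_pixel_height(mask)
```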
3. Head integrity definition
Head integrity means that the head pixel height satisfies, within an error range, a fixed ratio to the body-height pixels.
In the head pixel region extraction process, features such as the head pixel height, width, and pixel area are obtained along the way. When the head region is similar to the background, the superpixel algorithm cannot effectively obtain the target region; judging whether the region acquisition succeeded, and handling failure cases, ensures the completeness of the invention.
Repeated experiments show a proportional relation between the pedestrian head pixel height and the body-height pixels, and when applying the superpixel algorithm, the pedestrian body height is more readily obtained than the head pixel area. Let the body-height pixels be h_b, the head pixel height h_h, the ratio coefficient r, and the error ratio e. When the relation

h_b·r·(1+e) ≥ h_h ≥ h_b·r·(1−e) (5)

is satisfied, the pedestrian head component is judged complete; head-height features are then obtained from the relation between body-height and head-height pixels and used in the subsequent pedestrian distance-estimation calculations.
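The integrity check of eq. (5), together with the fallback of step 4, can be sketched as follows (the ratio r = 0.125 and error e = 0.2 are illustrative values, not values fixed by the patent):

```python
def head_complete(h_h, h_b, r=0.125, e=0.2):
    """Eq. (5): the head passes if its pixel height lies within +-e of
    the fixed body-height ratio r. r and e are illustrative defaults."""
    return h_b * r * (1 - e) <= h_h <= h_b * r * (1 + e)

def head_height(h_h, h_b, r=0.125, e=0.2):
    """Return the measured head height if complete; otherwise fall back
    to the fixed proportion of the body height (step 4)."""
    return h_h if head_complete(h_h, h_b, r, e) else h_b * r
```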
4. Body height pixel
The acquisition of the body-height pixels can be divided into superpixel segmentation of the pedestrian, energy-filtering statistics, and result analysis. Specifically, the pixels in the pedestrian frame are computed with the superpixel segmentation algorithm, pedestrian and background regions are segmented, and redundant information is removed; the segmentation result is processed with the energy-filtering algorithm to obtain a vertical energy statistic; finally the result is analyzed to obtain the body-height pixels.
The pedestrian as a whole is more feature-rich than the head component and is therefore easier to acquire. Using the body pixel height as the basis for judging head integrity, this feasibility ensures the stability of the judgment result.
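The body-height readout from the vertical energy statistics can be sketched as the span between the first and last non-empty rows of the segmented mask:

```python
import numpy as np

def body_pixel_height(mask):
    """Body-height pixels from the segmented binary pedestrian mask:
    the row span between the first and last rows with nonzero energy
    in the vertical energy statistics."""
    energy = np.asarray(mask).sum(axis=1)
    rows = np.nonzero(energy)[0]
    return int(rows[-1] - rows[0] + 1)
```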
The upright-pedestrian head component acquisition method is based on intelligent image analysis. The general area where the pedestrian head is located is determined through HOG features; since the superpixel segmentation algorithm has high complexity, superpixel processing is performed only on this general head area, preserving both the accuracy and the real-time performance of the algorithm. To avoid algorithm failure, head integrity is checked through the relation between body and head pixel heights, and a specific method for acquiring the height features is provided.
From the perspective of pedestrian safety protection, this body-part-based information acquisition method does not require pedestrians to carry other equipment; it meets practical requirements and is easier to popularize in traffic systems. Aiming at pedestrian safety, the invention provides data support for pedestrian distance estimation methods based on head height.
1) HOG features have been widely used as a feature-description method in image-based human detection and have gradually become a mainstream algorithm. The method takes HOG features as the basis of information acquisition, ensuring the stability of the subsequent algorithm. Information acquired through HOG features is mined in depth to distinguish the head component and use it in the pedestrian head pixel acquisition process. Because directly operating on all information in the image would affect real-time performance, first distinguishing the head component and then processing it separately preserves both the timeliness and the pixel-acquisition precision of the invention.
2) The head, as a rigid body part, is not easily deformed in any pedestrian posture, so distance is estimated specifically from it by the monocular ranging principle. The integrity of the head component is key to whether the distance estimation result is accurate. To obtain the pixel size of the pedestrian head component more accurately, the invention applies a superpixel segmentation algorithm in the acquisition process. By segmenting pixel points that are adjacent in position and similar in characteristics such as color and texture, the superpixel segmentation algorithm effectively removes redundant areas and extracts effective information. Since the algorithm computes pixel points one by one, the amount of computation is a problem to be solved in its application. The invention obtains the head and surrounding pixel points by processing the image once with the HOG-based upright-pedestrian head component acquisition criterion; therefore, when the superpixel head component pixels are acquired, the amount of computation can be kept within a certain range, ensuring the computation speed of the method.
3) In the image, the pedestrian head pixels make up a small proportion of the body pixels, and in a complex traffic environment with varied scenes, acquiring head pixels only with the HOG-based head acquisition criterion and the superpixel head component pixels carries a risk of failure. To ensure the integrity of the invention and form a closed loop, the head pixel acquisition result is checked through the proportional relation between head height and body height, and for results that fail the check, other schemes are adopted to acquire the head information. Confirming the head information in this self-checking manner greatly reduces the error rate of the distance output and ensures higher safety in practical application.
4) The self-check is carried out mainly through the pixel-height ratio between the pedestrian's head and body. The head pixel height is obtained from the superpixel head component pixels; the body pixel height is obtained with a segmentation algorithm combined with a threshold. Specifically, the image is first processed with HOG features, superpixel segmentation is then performed on the result, and redundant information around the pedestrian is removed; the segmentation result is processed by the energy-filtering method, and the pedestrian pixel height is finally obtained from the result waveform. Judging the pedestrian risk index is the main application of the invention, which therefore has high requirements on accuracy and stability, and the verification of head-height acquisition is particularly critical. Using the head-to-body pixel-height ratio as the basis for judging the head-height result is highly feasible; the check requires no additional hardware, and this convenience makes it easier to apply in complex environments.
5) The invention can be used by a mobile robot to avoid pedestrians. With the robot's internal camera as the hardware, the invention obtains the pedestrian head pixel height; the distance between pedestrian and robot is then obtained by an existing pedestrian body-part distance estimation method and used as an effective basis for the robot to avoid pedestrians.
6) The invention can be used in vehicle-mounted equipment to judge pedestrian danger. The distance between pedestrians and vehicles is an important criterion for whether a vehicle endangers a pedestrian; with the head pixel height obtained by the invention as the basis for distance estimation, the distance is estimated without increasing the hardware burden, which suits complex traffic environments.
7) The invention can be used in the pedestrian distance judgment of unmanned aerial vehicles during traffic law enforcement. UAV development is no longer limited to technologies such as aerial photography, and applying UAVs in traffic law enforcement is a breakthrough in their service to people. In law enforcement, UAVs are mainly used to photograph credentials held by pedestrians. The invention obtains the pedestrian head pixel height and estimates the distance, so the UAV keeps a proper distance and clearly photographs the credentials without harming the pedestrian.
8) The invention can be used by UAVs in artistic photography to judge the distance to the person being photographed. With the development of artificial intelligence, UAVs are gradually moving toward the service industry and have been accepted by the public for artistic photography. UAV use usually requires human or equipment intervention, which increases the photographer's workload and raises the shooting cost. The invention acquires the head pixel height of the person being photographed, so that the UAV keeps a proper distance and completes the shooting.
Example 1:
In this embodiment, experimental images were acquired with a 480×640 @ 30 Hz camera, shot on a winter street; each image contains one target pedestrian. The head pixel height of the target pedestrian is estimated by the present method and used for distance estimation; the results and their errors are shown in FIG. 5, and the final error does not exceed 2 pixels.
Example 2:
In this embodiment, experimental images were acquired with a 480×640 @ 30 Hz camera, shot on a winter street; each image contains one target pedestrian. The head pixel height of the target pedestrian is estimated by the invention and used for distance estimation; the results and their errors are shown in FIG. 6, and the final error does not exceed 2 pixels.
Example 3:
To solve the problem of accurately estimating the person-vehicle distance from the front-of-vehicle image, this embodiment provides the following scheme: a continuous distance estimation method with infrared/visible-light binocular multi-component fusion of the pedestrian body, comprising the following steps:
s1, shooting the same front image through an infrared-visible light binocular camera to obtain an infrared front image and a visible light front image;
s2, detecting and tracking in multiple time scales, and determining the positions of target pedestrians in the infrared front image and the visible front image;
s3, acquiring the head heights of pedestrians in the two images, calculating head part distance estimation results, and calculating foot part distance estimation results;
s4, carrying out primary fusion on the distance estimation results of different body parts of the pedestrian, carrying out secondary fusion on the estimated distances output according to the visible light and infrared light images, and completing the distance fusion of the cascaded pedestrian head part and the foot part so as to determine the distance between the pedestrian and the front of the vehicle.
Furthermore, the continuous distance estimation method with infrared/visible-light binocular multi-component fusion of the pedestrian body also comprises S5: tracking and checking the distance output results, and outputting the verified, accurate distance.
Further, in step S3 the head pixel height in the two images is obtained by a coarse/fine-grained pedestrian head-height judgment, comprising a coarse-grained pedestrian head-height estimate and a fine-grained pedestrian head-height estimate.
Coarse-grained head-height estimation: the head height H_re_head is estimated from the fixed ratio r_hb between the head-component height and the body height H_body, with r_hb determined by a simulation instance. The head height is

H_re_head = H_body × r_hb

The fine-grained method obtains the head pixel height by the superpixel algorithm; this height is required to float within a proportional range, with reference range:

H_re_head × (1 − r_re) < H_head < H_re_head × (1 + r_re)

where r_re is the floating coefficient, controlled between 0.2 and 0.3. When the head pixel height H_head obtained by the fine-grained method lies within the reference range, H_head is output as the head height; otherwise the superpixel acquisition of the pedestrian head pixel height is judged to have failed, and H_re_head is output.
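The coarse/fine-grained arbitration above can be sketched as follows (r_hb is an assumed head-to-body ratio, not a value given by the patent; r_re = 0.25 sits in the stated 0.2–0.3 band):

```python
def fuse_head_height(h_head_fine, h_body, r_hb=0.125, r_re=0.25):
    """Coarse/fine head-height arbitration.

    Coarse estimate: H_re_head = H_body * r_hb. The fine (superpixel)
    estimate is kept only if it floats within +-r_re of the coarse one;
    otherwise the superpixel extraction is judged failed.
    """
    h_re = h_body * r_hb
    if h_re * (1 - r_re) < h_head_fine < h_re * (1 + r_re):
        return h_head_fine       # fine-grained estimate accepted
    return h_re                  # fall back to the coarse estimate
```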
Further, the method for detecting and tracking the multiple time scales in the step S2 is as follows:
(1) Setting a certain frame in a video sequence as a first frame, and actively marking pedestrian information of the frame;
(2) Continuously tracking pedestrians by using a KCF algorithm according to the labeling content of the first frame;
(3) After tracking for m frames, the tracking result is taken as input, HOG features are extracted, and a pedestrian detection model is trained online with an SVM classifier; the images in the video sequence are then detected, an existing detection result serving as verification of the tracking result. Detection is performed once every m frames to correct the tracking, n times in total, so the detected frame indices k are:

k = 1 + m × n, n ∈ Z
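The detection schedule k = 1 + m×n can be sketched as:

```python
def detection_frames(m, n_checks):
    """Frames at which the online HOG+SVM detector re-verifies the KCF
    tracker: frame 1 is hand-labelled, then detection runs every m
    frames, n_checks times, at frames k = 1 + m*n."""
    return [1 + m * n for n in range(1, n_checks + 1)]
```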
Further, in step S4, the first-level fusion of the distance estimates of the different pedestrian body parts is performed as follows: the pedestrian to be estimated is placed at several different positions at the same known distance from the front of the vehicle; at each position a head-component distance estimate and a foot-component distance estimate are made, until distance estimation at all positions is complete. A distance estimation result set x_1 is obtained through the head component and a result set x_2 through the foot component; x̄_1 is the mean of the head-component set x_1, and x̄_2 the mean of the foot-component set x_2. The weight of the head-component result is p_1, the weight of the foot-component result is p_2, and σ_1, σ_2 are the standard deviations of the two sets; the fusion weights are then

p_1 = σ_2² / (σ_1² + σ_2²), p_2 = σ_1² / (σ_1² + σ_2²)
Further, for a given actual distance measurement, with head-component distance estimate D_A and foot-component distance estimate D_B, the fused distance estimate D_1 is

D_1 = p_1·D_A + p_2·D_B
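A sketch of the first-level fusion. The weight formula here is standard inverse-variance weighting from the standard deviations of the two calibration sets, an assumption consistent with the surrounding text (the steadier estimator gets the larger weight, and p_1 + p_2 = 1):

```python
import statistics

def fusion_weights(x1, x2):
    """Weights (p1, p2) from the standard deviations of the head- and
    foot-component calibration result sets x1, x2. Inverse-variance
    weighting is an assumed reconstruction of the patent's formula."""
    v1 = statistics.pstdev(x1) ** 2
    v2 = statistics.pstdev(x2) ** 2
    p1 = v2 / (v1 + v2)
    return p1, 1 - p1

def fuse(d_a, d_b, p1, p2):
    """First-level fusion D_1 = p1*D_A + p2*D_B."""
    return p1 * d_a + p2 * d_b
```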
Further, in step S4, the second-level fusion of the estimated distances output from the visible-light and infrared images is performed as follows:
Acquiring the infrared front-of-vehicle image distance estimation set: using the infrared image, the head component yields distance estimation result set x_1 and the foot component yields result set x_2; the head-component set acquired from the infrared image is D_H and the foot-component set is D_F, so the infrared-image distance estimation set D_V is

D_V = p_1·D_H + p_2·D_F

Acquiring the visible-light front-of-vehicle image distance estimation set: using the visible-light image, the head component yields result set x_1 and the foot component yields result set x_2; the head-component set acquired from the visible-light image is D_G and the foot-component set is D_K, so the visible-light-image distance estimation set D_I is

D_I = p_1·D_G + p_2·D_K

For the infrared-image distance estimation set D_V and the visible-light-image set D_I, let D̄_V be the mean of D_V and D̄_I the mean of D_I; the weight of the infrared-image distance estimate is p_3 and that of the visible-light-image estimate is p_4, with σ_V, σ_I the standard deviations of D_V and D_I. The fusion weights are then

p_3 = σ_I² / (σ_V² + σ_I²), p_4 = σ_V² / (σ_V² + σ_I²)

For a given actual distance measurement, with infrared-image distance estimate D_C and visible-light-image distance estimate D_D, the final distance estimate D_2 is

D_2 = p_3·D_C + p_4·D_D
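The cascaded two-level fusion can be sketched end-to-end (the weights p_1..p_4 are assumed precomputed from calibration statistics):

```python
def cascade_distance(head_ir, foot_ir, head_vis, foot_vis, p1, p2, p3, p4):
    """Cascaded fusion: head/foot within each modality first
    (D_V, D_I), then infrared/visible across modalities (D_2)."""
    d_v = p1 * head_ir + p2 * foot_ir    # infrared-image estimate
    d_i = p1 * head_vis + p2 * foot_vis  # visible-light-image estimate
    return p3 * d_v + p4 * d_i           # final estimate D_2
```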
The infrared/visible-light binocular pedestrian-body multi-component distance fusion estimation method provided by the invention ensures that images can be acquired effectively under different visibility and illumination conditions; it solves the problem that distance estimation fails for some frames when acquiring pedestrian positions on mobile equipment, ensuring continuous and effective pedestrian distance estimation thereafter; it improves the head-height acquisition of existing body-part distance estimation algorithms, raising head-height precision; it broadens the applicable range of the method, so that distance can be accurately estimated in severe weather such as heavy fog, rain, and snow, or when body parts are partially occluded; and it completes front-and-back frame verification of the distance estimation, ensuring the integrity of the algorithm.
The invention is based on intelligent image analysis. Images are acquired with an infrared–visible binocular camera; during acquisition the pedestrian position is determined by a multi-time-scale detection-and-tracking method; the accuracy of head pixel-height acquisition is enhanced by a coarse-to-fine head-height judgment method; and distance is estimated accurately through cascaded distance fusion, improving the accuracy of the distance estimation result. Distance-estimation verification is added to prevent failure cases and guarantee the completeness of the invention. (1) The images to be detected are acquired with only a monocular camera and an infrared camera, so the hardware cost is low, the method is easy to popularize, and the binocular camera guarantees an effective image source. (2) To reduce run time and improve the accuracy of pedestrian localization, the invention adopts a multi-time-scale detection-and-tracking method, determining the pedestrian position by tracking with intermittent detection. (3) Information acquisition for pedestrian body components is the key to distance estimation. The invention judges the pedestrian head at coarse and fine granularity: head height is estimated at coarse granularity from body proportions and at fine granularity from image segmentation. Combining the two improves head-height accuracy while avoiding failures of the fine-grained judgment. (4) The distance estimation results of the body components are processed by cascaded fusion of the pedestrian head and foot distances: the first-stage fusion avoids the influence of partial occlusion of body components on the distance estimation result, and the second-stage fusion widens the applicable range, allowing pedestrian distance estimation to adapt to environments with different visibility. (5) Because the application environment of the algorithm is complex, occasional failure of the distance estimation result is unavoidable; to handle this, a tracking algorithm is added. Its prediction function uses the distance estimation results of the preceding and following frames to predict the distance of a failed frame, ensuring the completeness of the invention. The invention meets many requirements placed on driver-assistance systems during autonomous driving and has strong popularization value.
The method comprises: first, an infrared–visible binocular camera is used to guarantee effective image acquisition, and pedestrian positions are determined by a continuous-tracking / intermittent-detection method: after the images to be detected are acquired, the pedestrian position in the image is tracked continuously, and the tracking result is verified with a detection algorithm at fixed intervals to ensure detection accuracy. The pixel image region corresponding to the pedestrian is then obtained from the pedestrian position and coarse-to-fine pedestrian head-height judgment is carried out: the coarse-grained judgment obtains the head height from the ratio between the pedestrian and the head, the fine-grained judgment is determined by a segmentation algorithm, and the coarse-grained judgment simultaneously serves as a verification basis for the fine-grained judgment, preventing fine-grained failures caused by over-segmentation and similar conditions. The distance estimation results of the pedestrian head and foot components under visible-light and infrared conditions are obtained with a known algorithm and processed by cascaded fusion to finally obtain the estimated distance. The cascaded fusion has two stages: the first stage fuses the head and foot distance estimation results within the visible-light or infrared condition, and the second stage fuses the visible-light and infrared head-foot fusion results. During operation, when the estimated distance exceeds a certain threshold range around the tracking prediction result, the distance estimation result is judged invalid and the tracking prediction result is taken as the final distance estimate.
The method and the device are suitable for distance estimation from mobile equipment to pedestrians. (1) Suitable for mobile robots avoiding pedestrians: with the arrival of the intelligent-manufacturing age, the mobile robot industry has developed at an unprecedented pace, and avoiding pedestrians has drawn attention as a problem mobile robots must solve. Mobile robots face complex environments, and effectively avoiding pedestrians under dim light and low visibility is a main problem solved by the invention. Acquiring images simultaneously with a visible–infrared binocular camera guarantees effective image information, so that image post-processing and distance estimation are not affected by the external environment. Accordingly, the invention meets the mobile robot's requirements for pedestrian distance estimation during avoidance. (2) Suitable for pedestrian positioning by unmanned ground vehicles (UGVs): at present, unmanned ground vehicles are mainly applied in logistics transport, reconnaissance, protection, and emergency scenarios such as medical evacuation, and timely positioning of pedestrians in the environment under dangerous conditions is a problem they must solve. UGVs operate in complex environments, and extreme outdoor weather such as strong illumination, wind-blown sand, rain, and snow increases the difficulty of pedestrian distance estimation. Through infrared–visible binocular image acquisition and cascaded fusion of pedestrian head and foot distances, the invention guarantees the effectiveness and accuracy of pedestrian distance acquisition under extremely severe outdoor weather conditions.
(3) Suitable for the field of autonomous-driving assistance: here the method is mainly used to judge the distance between pedestrians and vehicles, providing important data support for pedestrian risk judgment. The method meets requirements such as sustainability, accuracy, and completeness in assisted-driving pedestrian distance estimation, and innovatively improves pedestrian position determination, body-component information acquisition, adaptation to the application scene, and handling of algorithm failure. Only a monocular camera and an infrared camera are used as image acquisition equipment, so hardware requirements are low and implementation is easy. Meanwhile, the binocular camera safeguards autonomous driving at night, addressing a major potential safety hazard.
Example 4:
For the scheme in embodiment 3, the method for acquiring the pedestrian head pixel height comprises the following steps:
s1, acquiring a pedestrian image with a monocular camera positioned at the front of the vehicle, obtaining a pedestrian head calibration frame through HOG features, and extracting the head pixel region with a super-pixel algorithm to obtain a target binary image of the pedestrian head pixel region;
s2, establishing a vertical direction energy distribution diagram, and obtaining the pixel height of the head of the pedestrian.
Further, the method of step S2 is:
a vertical energy distribution diagram of the target binary image is obtained with the energy filtering algorithm. In the diagram, the abscissa direction corresponds to the ordinate direction of the image coordinate system and spans the same extent, and the ordinate of the diagram is the energy value of the corresponding pixel row in the image;
according to the correspondence between the head target-area image and the vertical energy distribution diagram, the top position of the pedestrian head is taken as the initial value of the energy curve, determining the ordinate P_t of the pixel position at the top of the head; the junction of the bottom of the head with the other body components produces a valley in the curve, from which the ordinate P_b of the pixel position at the bottom of the head is determined.
The pedestrian head pixel height is represented by:
H_h = P_b - P_t
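The vertical energy distribution and the height H_h = P_b - P_t can be sketched as follows. This is a simplified illustration: it takes the last non-zero row as P_b, assuming the binary image is already cropped to the head region, whereas the patent locates P_b at the valley where the head meets the body:

```python
import numpy as np

def head_pixel_height(binary_head):
    """Vertical energy distribution: count the non-zero pixels of each
    image row, then read the head top P_t as the first non-zero row and
    the head bottom P_b as the last non-zero row (simplification: the
    image is assumed cropped to the head, so no valley search is done)."""
    energy = (binary_head != 0).sum(axis=1)   # energy value per pixel row
    rows = np.nonzero(energy)[0]
    p_t, p_b = int(rows[0]), int(rows[-1])
    return p_b - p_t                          # H_h = P_b - P_t

mask = np.zeros((10, 6), dtype=np.uint8)
mask[2:7, 1:5] = 1                            # synthetic head blob
h = head_pixel_height(mask)                   # -> 4
```

For an uncropped silhouette, P_b would instead be found at the first local minimum of `energy` below the head peak.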
Further, the method for obtaining the pixel height of the head of the upright pedestrian of the monocular camera further comprises the following steps:
s3, obtaining the pedestrian height in pixels and checking the head pixel-height result against the pixel ratio between the pedestrian head and body;

s4, outputting the pedestrian head pixel height if it passes the check; for results that fail the check, outputting a final head pixel height computed as a fixed proportion of the obtained pedestrian height in pixels.
Further, the method for obtaining the body-height pixels comprises: computing the pixels in the pedestrian frame with a super-pixel segmentation algorithm, segmenting the pedestrian from the background region and removing redundant information; processing the segmentation result with the energy filtering algorithm to obtain a vertical energy statistic; and obtaining the body-height pixels from the analysis result.
Further, in extracting the head pixel region, the pedestrian head height is judged at coarse and fine granularity, comprising a coarse-grained and a fine-grained estimate of the head height:

The coarse-grained estimate obtains the head height from a fixed ratio between head height and body height: the ratio r_hb of the head height H_re_head to the body height H_body is determined from simulation instances, and the head height is

H_re_head = H_body × r_hb

The fine-grained method obtains the head pixel height H_head with the super-pixel algorithm; it is required to float within a proportional band around the coarse estimate, with reference range:

H_re_head × (1 - r_re) < H_head < H_re_head × (1 + r_re)

where r_re is the floating coefficient, controlled between 0.2 and 0.3. When the head pixel height H_head obtained by the fine-grained method falls within the reference range, H_head is output as the head height; otherwise the super-pixel acquisition of the head pixel height is judged to have failed, and H_re_head is output instead.
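The coarse-fine consistency check can be sketched as a small function; the head/body ratio r_hb = 0.13 below is an assumed example value (the patent determines it from simulation instances):

```python
def check_head_height(h_fine, h_body, r_hb=0.13, r_re=0.25):
    """Coarse estimate H_re_head = H_body * r_hb, then accept the
    fine-grained (super-pixel) height only when it lies inside the
    +/- r_re band; otherwise fall back to the coarse estimate.
    r_hb = 0.13 is an assumed example head/body ratio."""
    h_coarse = h_body * r_hb
    if h_coarse * (1 - r_re) < h_fine < h_coarse * (1 + r_re):
        return h_fine        # fine-grained result passes the check
    return h_coarse          # super-pixel result judged to have failed

check_head_height(26, 200)   # 26 lies in (19.5, 32.5) -> 26
check_head_height(60, 200)   # outside the band -> 26.0 (coarse value)
```

The fallback makes the fine-grained path fail-safe: an over-segmented head can never push the output outside the coarse band.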
Further, the method for acquiring the head pixel area comprises the following steps:
(1) Cluster initialization: inputting the number k of the super pixels to be generated in the CIELAB color space, and determining the same grid spacing s according to the number n of the pixel points in the processing area to ensure that the obtained super pixel blocks have the same size;
wherein:
s = sqrt(n / k)
the pixel color and position information define the five-dimensional cluster-center vector in the CIELAB color space:

C_k = [l_k a_k b_k x_k y_k]^T

where l, a, b carry color information and x, y spatial information: l_k is the color brightness of the center point, a_k its position between red/magenta and green, b_k its position between yellow and blue, x_k its distance from the x-axis, and y_k its distance from the y-axis;
(2) Pixel distance calculation: a distance index D is defined to express the relation between pixel i and the cluster center C_k; it is judged jointly by the color distance and the spatial distance, whose contributions to D are set by the weight m:
d_c = sqrt((l_i - l_k)^2 + (a_i - a_k)^2 + (b_i - b_k)^2)

d_s = sqrt((x_i - x_k)^2 + (y_i - y_k)^2)

D = sqrt(d_c^2 + (d_s / s)^2 · m^2)
where d_c is the color distance, d_s the spatial distance, and m a distance-adjustment weight coefficient: when m is smaller, the color distance has higher weight, the super-pixels adhere more strongly to the target edges, and the regularity of super-pixel shape and size decreases; when m is larger, the spatial distance has higher weight and the super-pixel blocks form more regularly;
(3) Pixel allocation: each pixel i in the pixel allocation process is allocated to a corresponding super-pixel block according to the distance from the clustering center point, and the corresponding search area of the pixel area is twice as large as the super-pixel area;
(4) Cluster center update: after the pixel i is distributed to the clustering center, determining the clustering center again according to the color and position information of the pixel point, calculating the residual value between the updated pixel i and the previous clustering center by using the space two norms, continuously repeating the updating process until the error converges, stopping updating and determining the super pixel block;
(5) Post-processing: after clustering, some pixel points may not belong to any super-pixel block; these isolated pixel points are reassigned using a connectivity algorithm.
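Steps (1)–(3) hinge on the distance index D; the sketch below computes it for pixels and centers given as [l, a, b, x, y] vectors, with placeholder values for m and the grid spacing s:

```python
import numpy as np

def slic_distance(pixel, center, m=10.0, s=20.0):
    """Distance index D between a pixel and a cluster center, both given
    as [l, a, b, x, y] vectors (CIELAB colour + image position).  Smaller
    m weights colour more (stronger edge adherence); larger m weights
    space more (more regular blocks).  m and s here are placeholders."""
    p, c = np.asarray(pixel, float), np.asarray(center, float)
    d_c = np.linalg.norm(p[:3] - c[:3])          # colour distance
    d_s = np.linalg.norm(p[3:] - c[3:])          # spatial distance
    return float(np.sqrt(d_c**2 + (d_s / s)**2 * m**2))

# pixel assignment (step 3): pick the nearest of two cluster centers
centers = [[50, 0, 0, 10, 10], [80, 5, -5, 40, 40]]
pix = [52, 1, 0, 12, 11]
label = min(range(len(centers)), key=lambda k: slic_distance(pix, centers[k]))
```

In a full implementation each pixel would only be compared against centers within its 2s × 2s search window, which is what keeps the clustering fast.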
The method for acquiring the upright pedestrian head component is based on intelligent image analysis: the approximate region where the pedestrian head is located is determined through HOG features, and because the super-pixel segmentation algorithm has high complexity, super-pixel processing is performed only on this approximate head region, ensuring both the accuracy and the real-time performance of the algorithm. To avoid algorithm failure, the completeness of the head is checked through the relation between the body and head pixel heights, and a specific method for acquiring the height feature is given.
From the perspective of safety protection of pedestrians, the method for acquiring information based on the body parts does not need the pedestrians to carry other equipment, meets the actual condition requirements, and is more beneficial to popularization in traffic systems. Aiming at pedestrian safety, the invention provides data support for a pedestrian distance estimation method based on head height.
1) As a feature description method, HOG features have been widely used in image-based human detection and have gradually become a mainstream algorithm. The method takes HOG features as the basis of information acquisition, ensuring the stability of subsequent algorithms. Information obtained through HOG features is mined in depth to distinguish the head component, which is then used in the pedestrian head pixel acquisition process. Since the head pixel acquisition process is auxiliary, operating directly on all information in the image would hurt real-time performance; distinguishing the head component first and then processing it separately guarantees both the timeliness and the pixel acquisition accuracy of the invention.
2) The head, as a rigid body component, is not easily deformed in any pedestrian posture, and its distance is estimated using the monocular ranging principle. The completeness of the head component is the key to whether the distance estimation result is accurate. To obtain the pixel size of the pedestrian head more accurately, the invention applies a super-pixel segmentation algorithm in the acquisition process. By grouping pixels that are adjacent in position and similar in features such as color and texture, the super-pixel segmentation algorithm effectively removes redundant regions and extracts the effective information. Since the algorithm computes pixel points one by one, computational load is a problem to be solved in its application. The invention first processes the image once with the HOG-based upright-pedestrian head-component acquisition criterion to obtain the head and surrounding pixels, so that when super-pixel head pixels are acquired, the amount of computation stays within a bounded range and the computation speed of the method is guaranteed.
3) In the image, the pedestrian head occupies only a small fraction of the body pixels. In complex traffic environments with varied scenes, acquiring head pixels with only the two methods above (the HOG-based upright head-component acquisition criterion and super-pixel head pixels) carries a risk of method failure. To ensure the completeness of the invention and form a closed loop, the head pixel acquisition result is checked through the proportional relation between head height and body height, and for results that fail the check another scheme is adopted to acquire the head information. Confirming the head information through this self-check greatly reduces the error rate of the output distance and ensures higher safety in practical application.
4) The self-check is carried out mainly through the pixel-height ratio between the pedestrian head and body. The head pixel height is obtained from the super-pixel head-component pixels; the body pixel height is obtained by a segmentation algorithm combined with a threshold: the image is first processed with HOG features, the result is then super-pixel segmented to remove redundant information around the pedestrian, the segmentation result is processed by energy filtering, and the pedestrian pixel height is finally read from the resulting waveform. Since judging the pedestrian risk index is the main application of the invention, high accuracy and stability are required, and the verification of head-height acquisition is particularly critical. Using the head-to-body pixel-height ratio as the basis for checking the head-height result is highly feasible, requires no additional hardware for the check, and its convenience makes it better suited to application in complex environments.
(5) The invention can be used for the process of avoiding pedestrians by the mobile robot. Based on the camera inside the robot as hardware, the invention obtains the pixel height of the head of the pedestrian, and the distance between the pedestrian and the robot is obtained according to the existing pedestrian body part distance estimation method and is used as the effective judgment basis for the robot to avoid the pedestrian.
(6) The invention can be used for judging the pedestrian danger in the vehicle-mounted equipment. The distance between the pedestrians and the vehicles is an important criterion for judging whether the vehicles form danger to the pedestrians, the head pixel height obtained according to the invention is used as a distance estimation judgment basis, the distance is estimated on the basis of not increasing the hardware burden, and the method is suitable for complex traffic environments.
(7) The invention can be used in the pedestrian distance judgment process of unmanned aerial vehicles in traffic law enforcement. The development of UAVs is no longer limited to technologies such as aerial photography; their application in traffic law enforcement is a breakthrough. In law enforcement, the UAV is mainly used to photograph credentials held by pedestrians. The invention obtains the pedestrian head pixel height and estimates the distance, keeping a suitable distance so that the credentials held by the pedestrian can be photographed clearly without harming the pedestrian.
(8) The invention can be used in the pedestrian distance judgment process of unmanned aerial vehicles in artistic photography. With the development of artificial intelligence, UAVs are gradually moving toward the service industry and have been accepted by the public for artistic photography. UAV use usually requires human or equipment intervention, which increases the workload of shooting personnel and raises the shooting cost. The invention acquires the head pixel height of the person being photographed, so that a suitable distance to the subject can be kept and the shot completed.
Example 5:
For the solution in embodiment 3 or 4, a foot-component acquisition method based on energy filtering of the pedestrian image comprises the following steps:
step 1: acquiring a pedestrian foot calibration frame through HOG features;
step 2: obtaining the region where the foot target is located through a super-pixel algorithm;
step 3: the output foot coordinates are obtained by energy filtering.
Further, the method in step 3 is: the center point of the pedestrian's toe is set as the specific point P_f corresponding to the foot position. The binary result of the region where the foot target is located is projected in the horizontal and vertical directions, and the non-zero pixel points are counted along each direction as an energy feature: energy filtering of the binary image accumulates the non-zero pixel points into corresponding energy filtering curves. In the vertical energy distribution diagram, the abscissa direction corresponds to the ordinate direction of the image coordinate system and spans the same extent, and the ordinate of the diagram is the energy value of the corresponding pixel row in the image. From the correspondence between the image and the energy distribution diagrams, the abscissa of P_f is the median of the initial-value abscissa x_s and the end-value abscissa x_e of the horizontal energy distribution, namely:

x_f = (x_s + x_e) / 2

and the ordinate of P_f is the end-value abscissa y_e of the vertical energy distribution diagram, namely:

y_f = y_e
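The projection-based foot point can be sketched as follows, a simplified illustration on a synthetic binary mask (midpoint of the horizontal curve's non-zero support, last non-zero row of the vertical curve):

```python
import numpy as np

def foot_point(binary_foot):
    """Project the foot-region binary image horizontally and vertically;
    the abscissa of P_f is the midpoint of the horizontal energy curve's
    non-zero support, and its ordinate is the last non-zero row of the
    vertical energy curve (the lowest image row containing the foot)."""
    h_energy = (binary_foot != 0).sum(axis=0)   # energy per image column
    v_energy = (binary_foot != 0).sum(axis=1)   # energy per image row
    cols = np.nonzero(h_energy)[0]
    rows = np.nonzero(v_energy)[0]
    x_f = int((cols[0] + cols[-1]) // 2)        # median of the support
    y_f = int(rows[-1])                         # end value of vertical curve
    return x_f, y_f

mask = np.zeros((8, 12), dtype=np.uint8)
mask[5:7, 3:9] = 1                              # synthetic foot blob
foot_point(mask)                                # -> (5, 6)
```

Because only accumulation and an index lookup are involved, the per-frame cost is linear in the number of pixels of the foot region.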
further, the method for acquiring the head pixel area comprises the following steps:
(1) Cluster initialization: inputting the number k of the super pixels to be generated in the CIELAB color space, and determining the same grid spacing s according to the number n of the pixel points in the processing area to ensure that the obtained super pixel blocks have the same size;
wherein:
s = sqrt(n / k)
the pixel color and position information define the five-dimensional vector in the CIELAB color space:

C_k = [l_k a_k b_k x_k y_k]^T (1)

where l, a, b carry color information and x, y spatial information: l_k is the color brightness of the center point, a_k its position between red/magenta and green, b_k its position between yellow and blue, x_k its distance from the x-axis, and y_k its distance from the y-axis;
(2) Pixel distance calculation: a distance index D is defined to express the relation between pixel i and the cluster center C_k; it is judged jointly by the color distance and the spatial distance, whose contributions to D are set by the weight m:
d_c = sqrt((l_i - l_k)^2 + (a_i - a_k)^2 + (b_i - b_k)^2)

d_s = sqrt((x_i - x_k)^2 + (y_i - y_k)^2)

D = sqrt(d_c^2 + (d_s / s)^2 · m^2)
where d_c is the color distance, d_s the spatial distance, and m a distance-adjustment weight coefficient: when m is smaller, the color distance has higher weight, the super-pixels adhere more strongly to the target edges, and the regularity of super-pixel shape and size decreases; when m is larger, the spatial distance has higher weight and the super-pixel blocks form more regularly;
(3) Pixel allocation: each pixel i in the pixel allocation process is allocated to a corresponding super-pixel block according to the distance from the clustering center point, and the corresponding search area of the pixel area is twice as large as the super-pixel area;
(4) Cluster center update: after the pixel i is distributed to the clustering center, determining the clustering center again according to the color and position information of the pixel point, calculating the residual value between the updated pixel i and the previous clustering center by using the space two norms, continuously repeating the updating process until the error converges, stopping updating and determining the super pixel block;
(5) Post-processing: after clustering, some pixel points may not belong to any super-pixel block; these isolated pixel points are reassigned using a connectivity algorithm.
The method for acquiring the upright pedestrian foot component is based on intelligent image analysis: the pedestrian foot-component region is determined through HOG features, the target region is then further processed with the super-pixel algorithm, and redundant information is removed to extract the region where the target is located. To extract the required features from the image, the binary image of the target region is processed with the energy filtering algorithm to obtain horizontal and vertical energy statistics curves, from which the feature point, i.e. the position point of the pedestrian foot component, is taken.
The upright-pedestrian foot-component acquisition method provided by the invention serves a distance estimation algorithm based on pedestrian body components and provides effective data support for the pedestrian foot component. The pedestrian body itself is used as the information source, with no external hardware intervention, a convenience that suits pedestrian distance estimation in complex environments. The invention is based on computer vision and uses only a monocular camera as the image-acquisition hardware, placing low demands on equipment. The pedestrian foot-component features are acquired by deep mining of the image information, completing the subsequent algorithmic process.
The foot components are in contact with the ground and relatively fixed in position, an advantage in acquisition complexity over other body components. Meanwhile, the ground's color features are relatively uniform, reducing the burden of the image segmentation process.
1) The invention obtains multiple pedestrian body components through HOG features and judges the foot component from its inherent characteristics using an optical-flow method and an energy filtering curve. HOG features are widely used as mainstream features in pedestrian detection; the algorithm is mature and has advantages in detection accuracy and timeliness. The invention acquires information only for the pedestrian foot component, extracting just those features, and processes the components obtained by HOG in parallel with a basic image segmentation algorithm (the optical-flow method), ensuring real-time performance. The component map processed by the optical-flow method is analyzed to obtain the foot-component region. Using mainstream basic algorithms in this acquisition ensures both the stability of the method and real-time performance during distance estimation.
2) The super-pixel algorithm belongs to the field of image segmentation and processes the pixels in the foot-component region. Working at the pixel level guarantees accuracy but also brings high algorithmic complexity and low real-time performance; therefore only the pixels inside the foot-component region are computed, ensuring both accuracy and speed. The super-pixel segmentation algorithm, based on cluster analysis, computes feature values of pixel points and groups pixels with similar features into one super-pixel block. The size, range, and other weights of the feature blocks can be regulated manually, ensuring the effectiveness of the super-pixel algorithm in different settings.
3) The energy filtering algorithm extracts feature points through a filtering curve, achieving deep mining of image information. It mainly accumulates non-zero pixel points to obtain a filtering curve; by analyzing the image and the curve, the pedestrian foot-position feature is obtained and its specific pixel position extracted. Compared with other feature extraction methods, this is easy to implement and fast to compute. Since the invention is mainly applied to pedestrian distance acquisition, with high real-time requirements, a fast and stable method for acquiring the pedestrian foot position is a necessity. The energy filtering algorithm only performs accumulation over pixel points and extracts the foot-position pixels by existing rules, guaranteeing both real-time performance and stability.
Example 6:
For the solution in embodiment 3, 4 or 5, in step s2 (multi-time-scale detection and tracking, determining the positions of the target pedestrians in the infrared front image and the visible-light front image), the method of this embodiment may also be used: a target tracking method with a sample-update mechanism is selected to determine the positions of the target pedestrians in the infrared front image and the visible-light front image. The tracking method comprises:
in the first step, a video initial frame is acquired and the tracker is initialized;
in the second step, the target is tracked in the next frame by a filtering tracking method and the tracking result is returned;
in the third step, the tracking result is analysed with an image feature forgetting method: image features of the target area are extracted and compared with a reference image, and tracking results that differ too much are forgotten;
in the fourth step, the tracking results forgotten in the third step are checked with an energy salient memory method: the gradient energy of the target area of each forgotten result is extracted for saliency analysis, tracking results that still contain the target are memorized back into the sample library, the forgetting operation is maintained for the remaining results that do not contain the tracked target, and the process returns to the second step or the tracking ends.
Further, if no tracking result differs too much in the third step, the process returns to the second step or the tracking ends.
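The four steps above can be sketched as a loop. Everything here (the class name, the stand-in distance and saliency functions) is illustrative scaffolding, not the patent's actual implementation:

```python
def track_with_selective_update(frames, tracker, image_distance,
                                energy_significance, delta, sample_library):
    """Skeleton of the sample-selectable update tracking loop.

    Step 1: initialise on the first frame; step 2: filter-track each frame;
    step 3: forget results whose feature distance to the reference exceeds
    delta; step 4: re-memorise forgotten results whose gradient energy is
    still salient.
    """
    reference = frames[0]
    tracker.init(reference)                                  # step 1
    results = []
    for frame in frames[1:]:
        result = tracker.update(frame)                       # step 2
        if image_distance(reference, result) > delta:        # step 3: forget
            if energy_significance(result, sample_library):  # step 4: re-memorise
                sample_library.append(result)
        else:
            sample_library.append(result)
        results.append(result)
    return results

class DummyTracker:
    """Trivial stand-in tracker for demonstration only."""
    def init(self, frame): self.last = frame
    def update(self, frame): self.last = frame; return frame

frames = [0, 1, 2, 9, 3]
lib = []
out = track_with_selective_update(
    frames, DummyTracker(),
    image_distance=lambda ref, r: abs(r - ref),   # stand-in feature distance
    energy_significance=lambda r, lib: r > 5,     # stand-in saliency check
    delta=4, sample_library=lib)
print(out, lib)  # [1, 2, 9, 3] [1, 2, 9, 3]
```

In the toy run, the frame with value 9 is first forgotten (distance 9 > delta) and then re-memorised because the stand-in saliency check accepts it.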
Further, the third step is as follows: taking the k-th frame image as the reference image, the HOG and CNN features of the target areas of the k-th and (k+1)-th frame images are extracted, and the Manhattan distance between the two sets of image features is calculated as the image distance value of the (k+1)-th frame tracking result;
let the feature of the k-th frame image of the video be J_k(x) and the feature of the (k+1)-th frame image be J_(k+1)(x); the image distance value of the (k+1)-th frame image is then calculated by equation (2):

dist_(k+1) = (1/n) * Σ_(i=1..n) | J_k(x)_i − J_(k+1)(x)_i |    (2)

dist_(k+1) > δ, δ ∈ (0, 1)    (3)

where δ is the upper limit for judging a failed sample, dist_(k+1) is the image distance value of the (k+1)-th frame image, n is the number of elements in the feature map, and J_k(x)_i and J_(k+1)(x)_i are the i-th elements of the image features of the k-th and (k+1)-th frames respectively;
if the image feature distance of the (k+1)-th frame is larger than δ, the (k+1)-th frame tracking result is judged as a result that needs to be forgotten; if it is smaller than δ, the tracking result is memorized in the training set and the process jumps to the fifth step.
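A minimal sketch of the forgetting test of equations (2)-(3). Note that the source renders equation (2) only as an image, so whether the Manhattan distance is summed or averaged over the n feature elements is an assumption; the mean is used here so that the result is comparable with δ ∈ (0, 1):

```python
import numpy as np

def image_distance(feat_k, feat_k1):
    """Mean Manhattan distance between two feature maps (equation (2));
    n is the number of elements in the feature map."""
    feat_k = np.asarray(feat_k, dtype=float).ravel()
    feat_k1 = np.asarray(feat_k1, dtype=float).ravel()
    return np.abs(feat_k - feat_k1).sum() / feat_k.size

def should_forget(dist, delta):
    """Equation (3): forget the frame-(k+1) result if dist > delta."""
    return dist > delta

d = image_distance([0.2, 0.4, 0.6], [0.2, 0.1, 0.6])
print(round(d, 2), bool(should_forget(d, 0.25)))  # 0.1 False
```

In a real system the inputs would be concatenated HOG and CNN feature maps of the two target areas, not three-element toy vectors.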
Further, the fourth step is as follows: the HOG energy of the target area is extracted from the (k+1)-th frame image of the latest tracking result, the HOG energy of all images in the training set is extracted and its average value is used as the reference, and the HOG energy change value of the (k+1)-th frame image is calculated as its energy saliency value;
let H_(k+1) be the HOG energy value of the (k+1)-th frame image and H_x be the set of HOG energies of all images in the training set; then
Ener_(k+1) = | H_(k+1) − (1/m) * Σ_(i=1..m) H_x(i) |    (7)
Equation (7) is the energy saliency calculation, where Ener_(k+1) is the energy saliency value of the (k+1)-th frame image, m is the number of images in the training set, and H_x(i) is the HOG energy of the i-th image in the training set;
if the energy saliency value of the (k+1)-th frame image satisfies equation (8), the (k+1)-th frame image is memorized into the training set; otherwise, the forgetting operation on the (k+1)-th frame image is maintained;
[equation (8): the threshold condition on Ener_(k+1); rendered only as an image in the source]
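A sketch of the energy saliency computation of equation (7); the function name and the example values are illustrative only:

```python
import numpy as np

def energy_saliency(h_k1, training_energies):
    """Equation (7): absolute deviation of the frame-(k+1) HOG energy
    from the mean HOG energy of the m images in the training set."""
    h_x = np.asarray(training_energies, dtype=float)
    return abs(h_k1 - h_x.mean())

# Forgotten frame with HOG energy 9.0 vs. a training set averaging 5.0:
ener = energy_saliency(9.0, [4.0, 5.0, 6.0])
print(ener)  # 4.0
```

The resulting value would then be compared against the threshold condition of equation (8) to decide whether the forgotten frame is memorized back into the training set.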
In one scheme, to solve the problem of re-memorizing valid samples among the forgotten results, a method is provided that extracts the gradient energy of the target area of each forgotten result for saliency analysis and memorizes the tracking results that contain the target back into the sample library.
(1) The target tracking algorithm with the sample-selectable update mechanism maintains good adaptability in complex environments such as target occlusion, intense light-dark changes and target deformation, so the target tracking method can be applied to more real scenes and can provide more reliable target position information for subsequent judgments such as pedestrian intention analysis;
(2) The image feature forgetting method screens the tracking results and forgets those that differ too much from the reference image; it is applicable to all discriminant-model target tracking methods, prevents feature information of occluding objects from polluting the training set, and improves the adaptability of the target tracking method to target occlusion;
(3) The energy salient memory method verifies the forgetting results of the image feature forgetting method and mainly memorizes target feature information that changes greatly in complex environments such as light-dark changes and target deformation;
(4) The invention can provide more accurate road condition information for mobile robots, autonomous vehicles and driver assistance systems, and plays an important role in obstacle avoidance and path planning for industrial robots or autonomous vehicles, in guidance services provided by service robots for specific person targets, and so on.

Claims (3)

1. A method for acquiring the head pixel height of an upright pedestrian with a monocular camera, comprising the following steps:
S1, acquiring a pedestrian image by a monocular camera positioned at the front of a vehicle, obtaining a pedestrian head calibration frame through HOG features, and extracting the pedestrian head pixel region with a super-pixel algorithm to obtain a target binary image;
s2, establishing a vertical direction energy distribution diagram, and acquiring the pixel height of the head of the pedestrian;
the method of step S2 is:
a vertical energy distribution diagram of the target binary image is obtained with the energy filtering algorithm; in this diagram, the abscissa direction corresponds to the vertical coordinate direction of the image coordinate system and has the same extent, and the ordinate is the energy value of the corresponding pixel row of the image;
according to the correspondence between the head target area image and the vertical energy distribution diagram, the top position of the pedestrian head is taken as the initial value of the energy curve, which determines the ordinate P_t of the pixel position of the top of the pedestrian head; the junction of the bottom of the pedestrian head with the rest of the body produces a valley in the curve, from which the ordinate P_b of the pixel position of the bottom of the pedestrian head is determined;
the pedestrian head pixel height is then:

H_h = P_b − P_t
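The S2 procedure (vertical energy curve, top position P_t, neck valley P_b, H_h = P_b − P_t) can be sketched on a toy binary silhouette. The valley search here is deliberately naive, and a real implementation would smooth the curve before searching:

```python
import numpy as np

def head_pixel_height(binary_head_img):
    """Return H_h = P_b - P_t from the vertical energy curve of a binary
    head-region image: P_t is the first non-zero row (top of head), P_b
    the first local minimum after it (valley where head meets shoulders).
    Assumes the image contains at least one non-zero pixel."""
    curve = np.count_nonzero(np.asarray(binary_head_img), axis=1)
    p_t = int(np.nonzero(curve)[0][0])          # top of head
    p_b = p_t
    for i in range(p_t + 1, len(curve) - 1):    # first valley after the top
        if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]:
            p_b = i
            break
    return p_b - p_t

# Toy silhouette: head in rows 1-3, neck valley at row 4, shoulders at row 5.
img = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
])
print(head_pixel_height(img))  # 3  (P_t = 1, P_b = 4)
```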
when extracting the head pixel region, the pedestrian head pixel height is obtained by a coarse-and-fine-granularity judgment of the head height, comprising a coarse-granularity estimation and a fine-granularity estimation of the pedestrian head height:
the coarse-granularity estimation computes the head height H_re_head from a fixed ratio r_hb between head height and body height H_body, with r_hb determined by simulation; the head height is

H_re_head = H_body × r_hb    (2)
the fine-granularity method obtains the head pixel height with the super-pixel algorithm; this height must float within a proportional range, the reference range being:

H_re_head × (1 − r_re) < H_head < H_re_head × (1 + r_re)    (3)
where r_re is the floating coefficient, controlled between 0.2 and 0.3; when the head pixel height H_head obtained by the fine-granularity method lies within the reference range, H_head is output as the head height; otherwise, the super-pixel acquisition of the pedestrian head pixel height is judged to have failed and H_re_head is output;
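The coarse/fine decision of equations (2)-(3) can be sketched as follows (names and the numeric value of r_hb are illustrative; the claim only fixes r_re to the 0.2-0.3 band):

```python
def select_head_height(h_body, h_head_fine, r_hb, r_re):
    """Coarse estimate H_re_head = H_body * r_hb (equation (2)); the fine
    super-pixel result is accepted only if it lies inside the floating
    range of equation (3), otherwise the coarse estimate is output."""
    h_re_head = h_body * r_hb
    low, high = h_re_head * (1 - r_re), h_re_head * (1 + r_re)
    return h_head_fine if low < h_head_fine < high else h_re_head

# Fine result 24 px lies inside the floating range -> accepted:
print(select_head_height(170, 24, r_hb=0.13, r_re=0.2))            # 24
# Fine result 40 px falls outside -> fall back to the coarse estimate:
print(round(select_head_height(170, 40, r_hb=0.13, r_re=0.2), 1))  # 22.1
```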
the method for acquiring the head pixel area comprises the following steps:
(1) Cluster initialization: the number k of super-pixels to be generated in the CIELAB color space is input, and the grid spacing s is determined from the number n of pixel points in the processing area so that the resulting super-pixel blocks have the same size, where:

s = √(n / k)
the cluster center is defined from pixel color and position information as a five-dimensional vector in the CIELAB color space:

C_k = [l_k, a_k, b_k, x_k, y_k]^T    (1)

where l, a, b carry the color information and x, y the spatial information: l_k is the color brightness of the center point, a_k its position on the red/magenta-green axis, b_k its position on the yellow-blue axis, and x_k and y_k its pixel coordinates along the x-axis and y-axis;
(2) Pixel distance calculation: a distance index D is defined to represent the relationship between pixel i and the cluster center C_k; D is determined jointly by the color distance and the spatial distance, whose contributions are balanced by the weight m:

d_c = √( (l_i − l_k)² + (a_i − a_k)² + (b_i − b_k)² )

d_s = √( (x_i − x_k)² + (y_i − y_k)² )

D = √( (d_c / m)² + (d_s / s)² )

where d_c is the color distance, d_s the spatial distance and m the distance adjustment weight coefficient: when m is smaller, the color distance carries more weight, the super-pixels adhere more strongly to target edges, and the regularity of super-pixel shape and size decreases; when m is larger, the spatial distance carries more weight and the resulting super-pixel blocks are more regular;
(3) Pixel allocation: each pixel i is allocated to the super-pixel block of the nearest cluster center, the search area corresponding to each pixel being twice the super-pixel area;
(4) Cluster center update: after the pixels are allocated, the cluster centers are redetermined from the color and position information of their pixel points; the residual between the updated and previous cluster centers is calculated with the spatial two-norm, and the update is repeated until the error converges, at which point updating stops and the super-pixel blocks are fixed;
(5) Post-processing: after clustering, some pixel points may not belong to any super-pixel block; such isolated pixel points are reassigned using a connectivity algorithm.
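Steps (1)-(4) can be sketched as a minimal single-channel SLIC-style loop. This is a sketch only: the CIELAB conversion, the convergence test of step (4) (replaced by a fixed iteration count) and the connectivity post-processing of step (5) are omitted, and all names are our own:

```python
import numpy as np

def simple_superpixels(img, k, m=10.0, iters=5):
    """Grid-initialise k centres with spacing s = sqrt(n/k) (step 1),
    assign pixels within a 2s x 2s window by the distance index D
    (steps 2-3), and update the centres (step 4)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    s = int(max(1, round(np.sqrt(h * w / k))))          # step (1): spacing
    gy, gx = np.meshgrid(np.arange(s // 2, h, s),
                         np.arange(s // 2, w, s), indexing="ij")
    centers = np.stack([img[gy, gx].ravel(),
                        gy.ravel().astype(float),
                        gx.ravel().astype(float)], axis=1)  # [colour, y, x]
    labels = np.zeros((h, w), dtype=int)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    for _ in range(iters):
        dist = np.full((h, w), np.inf)
        for idx, (c, cy, cx) in enumerate(centers):     # steps (2)+(3)
            y0, y1 = max(0, int(cy) - s), min(h, int(cy) + s + 1)
            x0, x1 = max(0, int(cx) - s), min(w, int(cx) + s + 1)
            d_c = np.abs(img[y0:y1, x0:x1] - c)          # colour distance
            d_s = np.hypot(yy[y0:y1, x0:x1] - cy,        # spatial distance
                           xx[y0:y1, x0:x1] - cx)
            d = np.sqrt((d_c / m) ** 2 + (d_s / s) ** 2)  # distance index D
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = idx
        for idx in range(len(centers)):                  # step (4): update
            mask = labels == idx
            if mask.any():
                centers[idx] = [img[mask].mean(),
                                yy[mask].mean(), xx[mask].mean()]
    return labels

# Toy 6x6 image: dark left half, bright right half, k = 4 superpixels.
img = np.zeros((6, 6))
img[:, 3:] = 10.0
labels = simple_superpixels(img, k=4)
print(labels.shape)  # (6, 6)
```

With a small m the colour term dominates D, so the superpixel boundary snaps to the dark/bright edge: pixels on opposite sides of it land in different blocks.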
2. The monocular camera upright pedestrian head pixel height acquisition method of claim 1, further comprising:
S3, obtaining the pixel height of the pedestrian body and checking the head pixel height result against the pixel ratio of the pedestrian head to body;
S4, outputting the pedestrian head pixel height if it passes the check; for a result that fails the check, outputting a final pedestrian head pixel height computed from the obtained body height pixels by the fixed proportion.
3. The monocular camera upright pedestrian head pixel height acquisition method of claim 2, wherein the body height pixels are obtained by: calculating the pixels in the pedestrian frame with a super-pixel segmentation algorithm, segmenting the pedestrian from the background area and removing redundant information; applying the energy filtering algorithm to the segmentation result to obtain a vertical energy statistic; and obtaining the body height pixels from the analysis result.
CN201911235573.1A 2019-12-05 2019-12-05 Method for acquiring pixel height of head of upright pedestrian of monocular camera Active CN111145211B (en)

Publications (2)

Publication Number Publication Date
CN111145211A CN111145211A (en) 2020-05-12
CN111145211B (en) 2023-06-30



