CN112183287A - People counting method of mobile robot under complex background - Google Patents


Info

Publication number
CN112183287A
CN112183287A (application CN202011002069.XA)
Authority
CN
China
Prior art keywords
detection
mobile robot
visible light
training
light image
Prior art date
Legal status
Pending
Application number
CN202011002069.XA
Other languages
Chinese (zh)
Inventor
彭倍 (Peng Bei)
罗忠福 (Luo Zhongfu)
邵继业 (Shao Jiye)
葛森 (Ge Sen)
Current Assignee
Sichuan Artigent Robotics Equipment Co ltd
Original Assignee
Sichuan Artigent Robotics Equipment Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Artigent Robotics Equipment Co ltd filed Critical Sichuan Artigent Robotics Equipment Co ltd
Priority to CN202011002069.XA priority Critical patent/CN112183287A/en
Publication of CN112183287A publication Critical patent/CN112183287A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/10: Scenes; scene-specific elements; terrestrial scenes
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V 2201/07: Target detection


Abstract

The invention provides a people counting method for a mobile robot under a complex background. A visible light image acquired by the mobile robot is first subjected to preliminary detection by a deep learning model, and the preliminary detection results are then filtered against an infrared image acquired by an infrared camera. This completes the people count of the mobile robot under a complex background and solves the problems of the prior art: detection precision is not high, false detection is severe, and the strict requirements of special scenes such as prisons on detection results cannot be met.

Description

People counting method of mobile robot under complex background
Technical Field
The invention relates to the technical field of mobile robots, in particular to a people counting method of a mobile robot under a complex background.
Background
At present, most existing people counting methods suffer from low speed, low accuracy and poor reliability under a complex background, which makes them unsuitable for scenes with strict requirements on accuracy and false detection rate. The invention patent with application number CN109359577A introduces a machine-learning-based system for detecting the number of people under a complex background: starting from an infrared video stream, it superimposes multi-scale channel features and trains an Adaboost classifier to detect people. However, infrared images carry limited information, and the channel features used for classification are all selected manually, which limits the accuracy of such algorithms at the data source; infrared imaging is also strongly disturbed by light sources, smooth reflective surfaces and the like, which restricts the applicable scenes.
With the development of deep learning, computer vision has advanced greatly, and on vision tasks such as face detection and pedestrian detection the precision of methods based on traditional image processing or machine learning has been far surpassed. For example, in 2018 Zhang et al. focused on pedestrian detection under occlusion: their paper "Detecting Pedestrians in a Crowd" adopts the idea of body parts, dividing the human body into 5 parts that are processed one by one and then fused at the feature level, adjusts the loss function to the task target, and realizes pedestrian detection under a complex background on the basis of the two-stage detector Faster R-CNN.
Deep-learning-based methods, however, generally need large numbers of training images, and when the image resolution is low and the targets are small, severe overfitting often leads to false detections. Special scenes such as prison head counts generally require both a high recall rate and a low false detection rate: under insufficient light or partial occlusion of personnel, it is unacceptable for lights, clothing and the like to be erroneously detected as people. It is therefore desirable to provide a solution that improves the accuracy and reliability of person detection and counting by a mobile robot under a complex background.
Disclosure of Invention
The invention aims to provide a people counting method for a mobile robot under a complex background, so as to improve the accuracy and reliability of person detection and counting by the mobile robot in such settings.
The invention provides a people counting method of a mobile robot under a complex background, which comprises the following steps:
step S1, acquiring a first visible light image shot by the mobile robot, labeling the first visible light image to construct a person detection data set for the specific environment, and dividing the data set into a training set, a verification set and a test set; wherein the labeled part is the person's head;
step S2, taking the darknet-53 network as the feature extraction backbone, constructing a detection model with the one-stage YOLO v3 target detection framework, and fine-tuning it, specifically by adjusting the activation function and the loss function; the activation function is the Mish function, whose expression is:
Mish(x) = x·tanh(log(1 + e^x))
the loss function is a focal loss function, and the expression is as follows:
FL(p, y) = -α·(1 - p)^γ·log(p), if y = 1
FL(p, y) = -(1 - α)·p^γ·log(1 - p), if y = 0
in the formula, y is the true label of the detection box, p is the predicted value, α is the positive/negative sample balance parameter, and γ is the hard/easy sample balance parameter;
step S3, performing data enhancement processing on the training set, and then sending the training set to the detection model for training;
step S4, loading the weights of a pre-trained network by a transfer learning method, training the detection model with the training set after data enhancement processing, and verifying on the verification set to obtain a detection model whose mean average precision (mAP) meets the requirement;
step S5, acquiring a second visible light image and an infrared image shot when the mobile robot runs, and performing projection alignment processing on the second visible light image and the infrared image;
step S6, acquiring the set detection hyper-parameters, and carrying out person detection on the second visible light image through the trained detection model to obtain a preliminary detection box set;
and step S7, analyzing whether the cumulative intensity of the infrared image pixels in each detection box of the set exceeds a set threshold; if so, confirming the target and outputting the corresponding detection result.
Further, the data enhancement processing on the training set in step S3 at least includes horizontal flipping, random crop sampling, and random rotation.
Further, the confidence threshold range of the detection model in step S6 is 0.2-0.4.
Further, when the mobile robot performs person detection on the target scene, the flow of steps S1 to S7 is executed at least 3 times at the set time interval, and the output with the largest detected number of people is taken as the final detection result.
Further, the calculation method of the pixel cumulative intensity in the step S7 is as follows:
I_j = ( Σ_{RGB(x′,y′) ∈ box_j} Infrared(x, y) ) / (255·w_j·h_j)
where RGB(x′, y′) denotes the coordinates of an infrared image point projected onto the visible light image; w_j is the width of the jth detection box; h_j is the height of the jth detection box; the threshold is set to not less than 0.8 during calculation.
The invention has the following beneficial effects: based on deep learning and the fusion of visible light and infrared image data, a people counting scheme with high precision and a low false detection rate is provided, completing the people count of the mobile robot under a complex background. The scheme solves the problems of the prior art, namely low detection precision and severe false detection that cannot meet the strict requirements of special scenes such as prisons on detection results, and shows good operating performance under complex conditions such as occlusion, difficult illumination and night-time.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a people counting method of a mobile robot in a complex background according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a people counting method of a mobile robot under a complex background according to an embodiment of the present invention.
The inventors have found through research that people counting methods which run well in ordinary environments mostly suffer from low speed, low accuracy and poor reliability under a complex background, and are therefore not fully applicable to scenes with strict requirements on accuracy and false detection rate. To improve the accuracy of person detection by the mobile robot and reduce the probability of missed detection, an embodiment of the invention provides a people counting method for a mobile robot under a complex background; the specific flow is as follows.
The first step: obtain a first visible light image shot by the mobile robot, label it to construct a person detection data set for the specific environment, and divide the data set into a training set, a verification set and a test set.
In one embodiment, a mobile robot with SLAM automatic navigation can be equipped with visible light and infrared cameras. Visible light images are shot, the data are labeled with the labeling tool labelme, and the images are divided in a 6:2:2 ratio into a training set, a verification set and a test set, where the training set is used for training, the verification set is used for tuning and model selection, and the test set is used for measuring the final model precision. When labeling, it is observed that the head is the region of the human body with relatively high infrared intensity, and that in visible light images the head is generally occluded least, even in complex situations such as sleeping, daily activity and top-down photography; therefore only the person's head is labeled, which also makes detection more convenient.
It should be noted that the division of the training set, the verification set, and the test set is not limited to the ratio of 6:2:2, and may be adjusted according to actual requirements.
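For concreteness, a minimal sketch of the 6:2:2 split described above follows. The directory name, the .jpg extension and the function name are illustrative assumptions; the patent itself prescribes no code.

```python
# Hedged sketch: shuffle the labelme-annotated images and split 6:2:2.
# "patrol_images" and the helper name are assumptions for illustration.
import random
from pathlib import Path

def split_dataset(image_dir: str, ratios=(0.6, 0.2, 0.2), seed: int = 0):
    """Return (train, val, test) lists of image paths in the given ratios."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)  # fixed seed for reproducibility
    n = len(images)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return (images[:n_train],
            images[n_train:n_train + n_val],
            images[n_train + n_val:])

train_set, val_set, test_set = split_dataset("patrol_images")
```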
The second step: use the darknet-53 network as the feature extraction backbone, construct a detection model with the one-stage YOLO v3 target detection framework, and fine-tune it; during fine-tuning the activation function of the detection model is the Mish function, whose expression is:
Mish(x) = x·tanh(log(1 + e^x))
the loss function is a focal loss function, and the expression is as follows:
FL(p, y) = -α·(1 - p)^γ·log(p), if y = 1
FL(p, y) = -(1 - α)·p^γ·log(1 - p), if y = 0
in the formula, y is the true label of the detection box, p is the predicted value, α is the positive/negative sample balance parameter, and γ is the hard/easy sample balance parameter.
Through fine-tuning YOLO v3 in this way, the Mish activation adds a certain amount of computation but passes negative gradient information better and offers better stability; meanwhile, the focal loss balances to a certain degree the proportion imbalance between positive and negative samples on the feature layers, while adding focused learning of hard samples.
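As a concrete illustration of these two modifications, a minimal PyTorch sketch follows. The function names are ours, and the defaults α = 0.25, γ = 2.0 are the common choices from the focal loss literature, not values given by the patent.

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish(x) = x * tanh(log(1 + e^x)); softplus(x) = log(1 + e^x)."""
    return x * torch.tanh(F.softplus(x))

def focal_loss(p: torch.Tensor, y: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: alpha balances positive/negative samples,
    gamma down-weights easy samples to focus training on hard ones."""
    eps = 1e-7
    p = p.clamp(eps, 1.0 - eps)  # numerical stability
    pos = -alpha * (1.0 - p) ** gamma * torch.log(p)
    neg = -(1.0 - alpha) * p ** gamma * torch.log(1.0 - p)
    return torch.where(y == 1, pos, neg).mean()
```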
The third step: perform data enhancement processing on the training set, and then send it to the detection model of the second step for training.
In one embodiment, after the training set data are obtained, the training set may be subjected to data enhancement processing and then sent to the detection model, which is trained up to the set maximum number of iterations. Specifically, the training set can be processed with horizontal flipping, random crop sampling, random rotation and the like, to enhance the model's robustness to data noise and improve its generalization capability.
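One possible realization of these augmentations is sketched below with the albumentations library, whose bbox_params keeps the head boxes consistent with the transformed pixels; the probabilities, the 416-pixel crop size and the 10-degree rotation limit are assumptions, not values from the patent.

```python
import albumentations as A

train_transforms = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                                  # horizontal flipping
        A.RandomSizedBBoxSafeCrop(height=416, width=416, p=0.5),  # random crop sampling
        A.Rotate(limit=10, p=0.5),                                # random rotation
    ],
    # transform the head bounding boxes together with the image
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# usage: out = train_transforms(image=img, bboxes=boxes, class_labels=labels)
```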
The fourth step: load the weights of a pre-trained network by a transfer learning method, train the detection model with the training set after data enhancement processing, and verify on the verification set to obtain a detection model whose mean average precision (mAP) meets the requirement.
In one embodiment, network weights pre-trained on the ImageNet data set can be loaded by a transfer learning method, and the detection model is trained with the training set constructed in the first step until the mean average precision (mAP) on the verification set meets the requirement; the hyper-parameters are adjusted several times, and the model with the highest mAP on the verification set is selected as the optimal model.
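The train/validate/select loop can be sketched as follows. Here build_yolov3, load_darknet53_weights, train_one_epoch and evaluate_map are hypothetical helpers standing in for framework-specific code; only the overall selection logic reflects the text above.

```python
import copy

model = build_yolov3(num_classes=1)                 # single class: human head
load_darknet53_weights(model, "darknet53.conv.74")  # ImageNet-pretrained backbone

best_map, best_state = 0.0, None
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)
    val_map = evaluate_map(model, val_loader)       # mAP on the verification set
    if val_map > best_map:                          # keep the best checkpoint
        best_map, best_state = val_map, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)                   # optimal model by validation mAP
```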
The fifth step: acquire a second visible light image and an infrared image shot by the mobile robot, and perform projection alignment processing on the two images.
In one embodiment, after the detection model with the highest mAP is obtained, the visible light and infrared cameras on the mobile robot shoot a second visible light image and an infrared image of the current detection environment, and the two images are then subjected to projection alignment processing.
Specifically, the correspondence between highlights in the infrared image and actual objects in the visible light image can be judged manually, and 10 groups of corresponding points are labeled by hand; the RANSAC algorithm is then used to solve the projection transformation matrix H from the infrared image to the visible light image, and the matrix is fixed, giving the projection relationship:
RGB(x',y')=H·Infrared(x,y)
where Infrared(x, y) is a point coordinate in the infrared image; RGB(x′, y′) represents the coordinates of the point where Infrared(x, y) is projected on the visible light image; and H denotes the projective transformation matrix.
It should be noted that the number of groups of corresponding points is not limited to the 10 groups above; any number not less than 4 groups achieves the purpose of projection alignment, and the user can label according to actual requirements. The projective transformation matrix needs only a one-time manual registration calculation; afterwards it is fixed and used directly for subsequent projections of the infrared image onto the visible light image.
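This one-time registration can be sketched with OpenCV's RANSAC homography estimation. The two point lists stand for the manually labeled correspondences described above; their variable names and the 5-pixel reprojection threshold are assumptions.

```python
import cv2
import numpy as np

# >= 4 manually labeled correspondences (10 groups in this embodiment)
ir_pts = np.asarray(labelled_infrared_points, dtype=np.float32)   # shape (N, 2)
rgb_pts = np.asarray(labelled_visible_points, dtype=np.float32)   # shape (N, 2)

# Solve the infrared -> visible projection matrix H with RANSAC
H, inlier_mask = cv2.findHomography(ir_pts, rgb_pts, cv2.RANSAC, 5.0)
np.save("ir_to_rgb_homography.npy", H)  # fixed once, reused afterwards

def project(x: float, y: float) -> tuple:
    """RGB(x', y') = H · Infrared(x, y) in homogeneous coordinates."""
    v = H @ np.array([x, y, 1.0])
    return (v[0] / v[2], v[1] / v[2])
```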
The sixth step: acquire the set detection hyper-parameters, and carry out person detection on the second visible light image through the trained detection model to obtain a preliminary detection box set.
In an embodiment, after the detection hyper-parameters set by the user are obtained, person detection may be performed on the second visible light image through the trained detection model to obtain a preliminary detection box set S = {box_1(x_1, y_1, w_1, h_1), box_2(x_2, y_2, w_2, h_2), …}, where box is a detection box, x is the abscissa of its top-left corner, y is the ordinate of its top-left corner, w is the width of the detection box, and h is its height. In addition, the confidence threshold of the detection model during person detection can be set in the range 0.2-0.4, so that all persons are detected with high tolerance while a certain amount of false detection is allowed.
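An illustrative call of the detector at this stage is shown below; model.detect and its return format are assumptions, not an API defined by the patent.

```python
CONF_THRESHOLD = 0.3  # within the 0.2-0.4 range recommended above

# Deliberately permissive: tolerate some false positives here, since the
# infrared check in the seventh step filters them out again.
boxes = model.detect(second_visible_image, conf_threshold=CONF_THRESHOLD)
# boxes: [(x, y, w, h), ...], (x, y) being each box's top-left corner
```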
The seventh step: analyze whether the cumulative intensity of the infrared image pixels in each detection box of the detection box set exceeds a set threshold; if so, confirm the target and output the corresponding detection result.
In one embodiment, the cumulative intensity of the infrared image pixels is calculated by:
I_j = ( Σ_{RGB(x′,y′) ∈ box_j} Infrared(x, y) ) / (255·w_j·h_j)
where RGB(x′, y′) denotes the coordinates of an infrared image point projected onto the visible light image; w_j is the width of the jth detection box and h_j is its height; the threshold is set to not less than 0.8 during calculation. If the cumulative pixel intensity I of the infrared image exceeds the set threshold, the corresponding result is output as a final result; otherwise the output is kept as a candidate result.
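A minimal sketch of this confirmation step follows. Normalizing the sum by 255·w_j·h_j is our reading of the (image-only) formula in the original text, chosen so that a threshold of 0.8-0.85 is meaningful for 8-bit infrared images; treat that normalization as an assumption.

```python
import numpy as np

def cumulative_intensity(ir_aligned: np.ndarray, box) -> float:
    """Normalized infrared intensity inside one detection box.

    ir_aligned: single-channel 8-bit infrared image already projected
    into visible-light coordinates via the matrix H.
    box: (x, y, w, h), with (x, y) the top-left corner.
    """
    x, y, w, h = box
    patch = ir_aligned[y:y + h, x:x + w].astype(np.float64)
    return patch.sum() / (255.0 * w * h)

THRESHOLD = 0.8  # "not less than 0.8" per the description above

confirmed = [b for b in boxes  # `boxes` from the sixth step
             if cumulative_intensity(ir_aligned, b) > THRESHOLD]
print(f"people counted: {len(confirmed)}")
```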
To test the practical application of the invention, a verification test was carried out at a detention facility in this embodiment (the confidence threshold of the detection model in this verification test was set to 0.4, and the threshold for the cumulative pixel intensity I was set to 0.85). Specifically, a robot carrying a visible light camera and an infrared camera was deployed for one night; 640 images were collected during normal inspection of the monitored rooms, with more than 10 people per image on average, and the images were divided into a training set, a verification set and a test set in a 6:2:2 ratio. The model was first trained according to the training process above. Because the activity of personnel in the prison is complex, and sleeping, non-sleeping and heavily occluded scenes differ greatly, the models were subdivided by scene; the performance of the trained models was tested in each scene and compared with the original YOLO v3 model, giving the test results shown in Table 1.
TABLE 1 test results
(Table 1 is presented as an image in the original publication; its numeric values are not reproduced here.)
To better put the invention into practical application, the detection process of the mobile robot can be further optimized: the flow of the first to seventh steps is executed once every 2 seconds, 3 times in total, and the largest number of people among the 3 runs is taken as the final detection result, so as to avoid missed detections caused by people walking around. It should be noted that the flow is not limited to 3 executions; the number of repetitions can be set according to actual requirements.
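This repeated-pass strategy can be sketched as below; detect_people_once is a hypothetical wrapper around the first to seventh steps.

```python
import time

def count_people(n_runs: int = 3, interval_s: float = 2.0) -> int:
    """Run the full pipeline n_runs times and keep the largest count,
    which avoids undercounting when people are walking around."""
    counts = []
    for _ in range(n_runs):
        counts.append(len(detect_people_once()))
        time.sleep(interval_s)
    return max(counts)
```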
In summary, the embodiment of the invention provides a people counting method for a mobile robot under a complex background: a visible light image acquired by the robot is preliminarily detected with a deep learning model, and the preliminary detection results are further filtered with an infrared image acquired by an infrared camera to complete the people count. This in particular solves the problems of the prior art under a complex background, namely low detection precision, severe false detection, and the inability to meet the strict requirements of special scenes such as prisons on detection results.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A people counting method of a mobile robot under a complex background is characterized by comprising the following steps:
step S1, acquiring a first visible light image shot by the mobile robot, labeling the first visible light image to construct a person detection data set for a specific environment, and dividing the data set into a training set, a verification set and a test set; wherein the labeled part is the person's head;
step S2, taking the darknet-53 network as the feature extraction backbone, constructing a detection model with the one-stage YOLO v3 target detection framework, and fine-tuning it, specifically by adjusting the activation function and the loss function, wherein the activation function is the Mish function, whose expression is:
Mish(x) = x·tanh(log(1 + e^x))
the loss function is a focal loss function, and the expression is as follows:
FL(p, y) = -α·(1 - p)^γ·log(p), if y = 1
FL(p, y) = -(1 - α)·p^γ·log(1 - p), if y = 0
in the formula, y is the true label of the detection box, p is the predicted value, α is the positive/negative sample balance parameter, and γ is the hard/easy sample balance parameter;
step S3, performing data enhancement processing on the training set, and then sending the training set to the detection model for training;
step S4, loading the weights of a pre-trained network by a transfer learning method, training the detection model with the training set after data enhancement processing, and verifying on the verification set to obtain a detection model whose mean average precision (mAP) meets the requirement;
step S5, acquiring a second visible light image and an infrared image shot when the mobile robot runs, and performing projection alignment processing on the second visible light image and the infrared image;
step S6, acquiring the set detection hyper-parameters, and carrying out person detection on the second visible light image through the trained detection model to obtain a preliminary detection box set;
and step S7, analyzing whether the cumulative intensity of the infrared image pixels in each detection box of the detection box set exceeds a set threshold, and if it exceeds the set threshold, confirming the target and outputting the corresponding detection result.
2. The method according to claim 1, wherein the data enhancement processing on the training set in step S3 at least includes horizontal flipping, random crop sampling and random rotation.
3. The method according to claim 1, wherein the confidence level of the detection model in the step S6 is in a range of 0.2-0.4.
4. The method as claimed in any one of claims 1 to 3, wherein the mobile robot performs person detection on the target scene by executing the flow of steps S1 to S7 at least 3 times at the set time interval, and outputs the result with the largest detected number of people as the final detection result.
5. The method according to claim 1, wherein the pixel cumulative intensity in step S7 is calculated by:
I_j = ( Σ_{RGB(x′,y′) ∈ box_j} Infrared(x, y) ) / (255·w_j·h_j)
where RGB(x′, y′) represents the coordinates of the point where point (x, y) of the infrared image is projected on the visible light image; w_j represents the width of the jth detection box; h_j is the height of the jth detection box; the threshold is set to not less than 0.8 during calculation.
CN202011002069.XA 2020-09-22 2020-09-22 People counting method of mobile robot under complex background Pending CN112183287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002069.XA CN112183287A (en) 2020-09-22 2020-09-22 People counting method of mobile robot under complex background


Publications (1)

Publication Number Publication Date
CN112183287A (en) 2021-01-05

Family

ID=73955324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002069.XA Pending CN112183287A (en) 2020-09-22 2020-09-22 People counting method of mobile robot under complex background

Country Status (1)

Country Link
CN (1) CN112183287A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113693590A (en) * 2021-09-27 2021-11-26 江苏凤凰智慧教育研究院有限公司 Seat body forward bending monitoring device and method
CN114511821A (en) * 2022-04-18 2022-05-17 深圳市爱深盈通信息技术有限公司 Statistical method, system, computer equipment and storage medium for people getting on and off

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015052896A1 (en) * 2013-10-09 2015-04-16 日本電気株式会社 Passenger counting device, passenger counting method, and program recording medium
CA3010997A1 (en) * 2016-03-17 2017-09-21 Nec Corporation Passenger counting device, system, method and program, and vehicle movement amount calculation device, method and program
CN109815886A (en) * 2019-01-21 2019-05-28 南京邮电大学 A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3
CN110675447A (en) * 2019-08-21 2020-01-10 电子科技大学 People counting method based on combination of visible light camera and thermal imager
CN111327788A (en) * 2020-02-28 2020-06-23 北京迈格威科技有限公司 Synchronization method, temperature measurement method and device of camera set and electronic system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination