CN116311351A

CN116311351A - Indoor human body target recognition and monocular distance measurement method and system

Info

Publication number: CN116311351A
Application number: CN202310076992.5A
Authority: CN
Inventors: 李振华; 李超; 徐胜男
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2023-01-29
Filing date: 2023-01-29
Publication date: 2023-06-23

Abstract

The present disclosure provides an indoor human body target recognition and monocular ranging method and system, relating to the technical field of target detection and image processing. The method comprises: acquiring image data of an indoor human body target and preprocessing it; constructing a lightweight neural network human body target detection model and a tracking model, first inputting the image data into the target detection and tracking model to perform target detection, and retaining the target frame of the current frame image; taking the human body target detected in the current frame as the tracking target of the next frame, extracting image feature information of the tracking target, generating a tracking frame in the next frame by matching the image feature information, then calculating the IOU with the human body target frame detected in the current frame, and judging whether a detection has been missed; and, when a human body target is occluded, finding the human body frame from the last frame in which the occluded target was not occluded, calculating the coordinate of the bottom edge of the human body frame under occlusion from the height of the unoccluded human body frame, and performing monocular ranging with the new coordinate.

Description

Indoor human body target recognition and monocular distance measurement method and system

Technical Field

The disclosure relates to the technical field of target detection and image processing, in particular to an indoor human body target recognition and monocular distance measurement method and system.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the rapid development of society and technology and the rise in living standards and expectations, safety is receiving increasing attention, and demand for video surveillance systems is growing explosively. Real-time crowd detection and counting can be applied for safety in public places such as hospitals, stations and scenic spots, and the detection and analysis of abnormal human targets is also an important component of intelligent video surveillance systems. In surveillance and security, obtaining the position and the distance of a human target at the same time is important for raising alarms on intrusion by abnormal targets. In a home environment in particular, when only elderly people or children are at home alone, safety is much harder to guarantee, so human anti-collision systems, danger early-warning systems and the like are needed to protect personal safety; to realize these functions, human targets in the environment must be identified and their distances estimated. In addition, intelligent control of some household equipment also depends on recognizing and counting human targets and on their distance from the equipment: equipment parameters are adjusted according to the number of people and their distance from the equipment, achieving real-time intelligent adjustment and a more comfortable living environment.

Current ranging methods include ultrasonic ranging, lidar ranging, binocular stereo vision ranging and monocular camera ranging. Ultrasonic, lidar and binocular stereo vision ranging are easily affected by the environment and are expensive, whereas camera-based ranging processes image information and is stable, and monocular ranging is simpler to use and more cost-effective than binocular ranging. A monocular camera, however, cannot capture the depth of an object, which makes ranging difficult. A common approach is to perform simple object ranging using the pinhole imaging model and similar triangles; this approach is affected by camera calibration parameters, environmental factors, object position and so on, and its result deviates from the actual distance. Moreover, few existing methods perform object recognition and ranging simultaneously in real time, which is a limitation of the prior art.

Disclosure of Invention

To solve the above problems, the present disclosure provides an indoor human body target recognition and monocular ranging method and system. It adopts a deep-learning-based convolutional neural network, trains a lightweight convolutional neural network model, deploys the lightweight model on an embedded device, and runs the neural network on an AI chip. In addition, the detection and tracking systems compensate for each other, which addresses missed detections and provides important correction information for ranging occluded targets, improving performance and efficiency.

According to some embodiments, the present disclosure employs the following technical solutions:

An indoor human body target recognition and monocular ranging method comprises the following steps:

acquiring image data of an indoor human body target, and preprocessing it;

constructing a lightweight neural network human body target detection model and a tracking model, first inputting the image data into the target detection and tracking model to perform target detection, and retaining the target frame of the current frame image;

taking the human body target detected in the current frame as the tracking target of the next frame, extracting image feature information of the tracking target, generating a tracking frame in the next frame by matching the image feature information, then calculating the IOU with the human body target frame detected in the current frame, and judging whether a detection has been missed;

when a human body target is occluded, finding the human body frame from the last frame in which the occluded target was not occluded, calculating the coordinate of the bottom edge of the human body frame under occlusion from the height of the unoccluded human body frame, and performing monocular ranging with the new coordinate.

An indoor human body target recognition and monocular ranging system, comprising:

an image acquisition module, used for acquiring image data of an indoor human body target and preprocessing the image data;

a human body detection and tracking module, used for constructing a lightweight neural network human body target detection model and a tracking model, inputting the image data into the target detection model to perform target detection, and retaining the target frame of the current frame image;

the human body target detected in the current frame is taken as the tracking target of the next frame and input into the tracking model to extract image feature information of the tracking target, a tracking frame is generated in the next frame by matching the image feature information, and the IOU is then calculated with the human body target frame detected in the current frame to judge whether a detection has been missed;

and a monocular ranging module, used for, when a human body target is occluded, finding the human body frame from the last frame in which the occluded target was not occluded, calculating the coordinate of the bottom edge of the human body frame under occlusion from the height of the unoccluded human body frame, and performing monocular ranging with the new coordinate.

A non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the indoor human body target recognition and monocular ranging method.

An electronic device, comprising: a processor, a memory, and a computer program; the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory so that the electronic device performs the indoor human body target recognition and monocular ranging method.

Compared with the prior art, the beneficial effects of the present disclosure are:

The human body target detection model and the human body tracking model complement each other, which effectively addresses missed detections and the ranging of occluded targets.

The present disclosure proposes a new monocular ranging method that achieves a good ranging effect, with a ranging error of around 25 cm for targets within a certain range.

A more lightweight convolutional neural network prediction model is designed, which greatly reduces the use of computing resources while keeping model accuracy essentially unchanged; it is better suited to porting and deployment on embedded devices and can conveniently be deployed directly on an artificial intelligence chip.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a general flow chart of the human body target detection system, tracking system and monocular ranging system provided by an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of the lightweight detection neural network model provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the improved SE-C3 module in the Backbone of the lightweight detection neural network model provided by an embodiment of the present disclosure;

FIG. 4 is a training record of the lightweight detection neural network model provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the ranging area layout of the monocular ranging system provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the camera placement of the monocular ranging system provided by an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of the ranging principle of the monocular ranging system provided by an embodiment of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Example 1

An embodiment of the present disclosure provides an indoor human body target recognition and monocular ranging method, including:

Step 1: acquiring image data of an indoor human body target, and preprocessing it;

Step 2: constructing a lightweight neural network human body target detection model and a tracking model, first inputting the image data into the target detection and tracking model to perform target detection, and retaining the target frame of the current frame image;

Step 3: taking the human body target detected in the current frame as the tracking target of the next frame, extracting image feature information of the tracking target, generating a tracking frame in the next frame by matching the image feature information, then calculating the IOU with the human body target frame detected in the current frame, and judging whether a detection has been missed;

Step 4: when a human body target is occluded, finding the human body frame from the last frame in which the occluded target was not occluded, calculating the coordinate of the bottom edge of the human body frame under occlusion from the height of the unoccluded human body frame, and performing monocular ranging with the new coordinate.

As an embodiment, in step 1, the process of acquiring image data of an indoor human body target and performing preprocessing includes:

A camera is used to photograph different scenes and different people indoors, at different positions and under different lighting conditions. The resulting image data is then augmented: the original images are flipped by a certain angle and mirrored.

Specifically, (1) Images are captured with a camera of the same model and specification as the one mounted on the RV1126 development board. The camera is fixed at a height of 2.5 m with a tilt angle of 35°, and different scenes and different people are photographed indoors at different positions and under different lighting conditions.

(2) The captured human body image data set is augmented: the original images are flipped by a certain angle and mirrored to obtain a batch of new images, greatly expanding the training image data set.

(3) To better simulate real indoor occlusion, the Cutout method is used: a square area of fixed size is selected at random and filled entirely with zeros. In effect this achieves erasure of arbitrary size while retaining more of the important regions.

(4) To better simulate environments of different brightness, the image brightness is increased and decreased, enriching the training set for backlit environments.
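The augmentation steps (2) to (4) can be sketched as follows. This is a minimal illustration rather than the applicant's pipeline: it assumes a grayscale image stored as a list of rows of 0..255 integers, and the function names `mirror`, `flip`, `cutout` and `adjust_brightness` are hypothetical.

```python
import random

def mirror(img):
    """Horizontal mirror: reverse the pixels of every row."""
    return [row[::-1] for row in img]

def flip(img):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]

def cutout(img, size, rng=random):
    """Zero-fill one randomly centred size x size square (Cutout).

    The centre may lie near the border, so the part of the square that
    actually falls inside the image can be smaller than size x size.
    """
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in img]  # copy so the input is left untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0
    return out

def adjust_brightness(img, factor):
    """Scale every pixel by `factor` and clamp to the 0..255 range."""
    return [[min(255, max(0, int(round(p * factor)))) for p in row]
            for row in img]
```

A factor above 1 brightens the image and a factor below 1 darkens it; clipping the Cutout square at the border is what makes the erased area vary in size even though the square itself is fixed.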

As an embodiment, a lightweight neural network human body target detection model and a tracking model are constructed, and the improved network model is trained on the sample data set from step 1 for 500 epochs with a batch size of 16, yielding the lightweight neural network human body target detection model and tracking model.

A training set is built from the sample data obtained in step 1, and a lightweight convolutional neural network model is designed and built. Relative to the CSP-Darknet53 network, the network model has 2 innovations, characterized as follows:

The MobileOne module is introduced: MobileOneBlock is integrated into the backbone network, reducing training and inference time and achieving lightweight, real-time and fast detection.

An attention mechanism is introduced. As a plug-and-play module, attention can be added anywhere in the YOLOv5 network; here the SE channel attention mechanism is added at the C3 module of the Backbone.

The model is lightweight and well suited to embedded devices: the computation and parameter count of the network model are small, and the network model supports direct conversion from ONNX or pt format into an rknn model file, which is deployed on a Rockchip chip, where the constructed human body detection and tracking model and monocular ranging model are fused.

Specifically, porting the model to the Rockchip RV1126 proceeds as follows:

(1) The improved lightweight convolutional neural network is trained on a local PC host, and the best-performing model file is saved.

(2) The saved model is converted to pt format, and the pt format is then converted to TorchScript for subsequent conversion to an rknn model file.

(3) The USB OTG port of the Rockchip RV1126 core board is connected to the PC and the device is correctly recognized; communication between the RV1126 and the PC is established over SSH for file transfer and code debugging.

(4) The rknn model file is copied to the RV1126 and run on the NPU of the Rockchip RV1126 to identify human body targets and predict their positions and distance estimates.

As an embodiment, the indoor human body target recognition and monocular ranging method is implemented as follows:

First, target detection is performed on the input image and the target frame of the current frame image is retained. The person detected in the current frame is taken as the tracking target of the next frame, image feature information of the tracking target is extracted, and a tracking frame is generated in the next frame by matching the image feature information. The IOU between the tracking frame and the human body target frame detected in the current frame is then calculated. If the IOU is higher than a set threshold, the target from the previous frame is considered to have been detected in the current frame and no detection has been missed. In repeated tests the best results were obtained with a threshold of 0.75; that is, the target frames of the previous and next frames are considered to belong to the same target when their overlap is above 0.75. If the target disappears from the field of view in the next frame, no tracking frame is generated in the current frame.

The tracking model also generates an ID label for each target. The ID of the current target frame is updated on every detection and tracking step, and the ID labels of the same target are identical across consecutive frames. On this basis, when a human body target is occluded, the human body frame from the last frame in which the occluded target was not occluded is found by means of the ID, the coordinate of the bottom edge of the human body frame under occlusion is calculated from the height of the unoccluded human body frame, and the distance is calculated with the new coordinate.
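The IOU check between tracking frames and detected frames can be sketched as follows. This is an illustrative sketch only: boxes are assumed to be `(x0, y0, x1, y1)` tuples (upper-left and lower-right corners), `tracks` is assumed to map a tracker ID to its tracking frame, and the helper names `iou` and `find_missed` are hypothetical. Only the 0.75 threshold comes from the description.

```python
IOU_THRESHOLD = 0.75  # value reported as working best in the description

def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def find_missed(tracks, detections, threshold=IOU_THRESHOLD):
    """Return the IDs of tracking frames that no detection overlaps
    with an IOU above the threshold (candidate missed detections)."""
    missed = []
    for track_id, track_box in tracks.items():
        if not any(iou(track_box, det) > threshold for det in detections):
            missed.append(track_id)
    return missed
```

For example, `find_missed({1: (0, 0, 10, 10), 2: (100, 100, 120, 140)}, [(0, 1, 10, 10)])` reports only ID 2, because the single detection overlaps track 1 with an IOU of 0.9.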

Further, the disclosure proposes a novel ranging method. The monocular ranging is implemented as follows:

The distance between the bottom of a target within a given area and the camera is measured. The ranging range is assumed to be a rectangular area (dimensions in m), which is divided into a number of small square areas with side length s (m). Each square area corresponds to its own ranging formula: first the ranging area in which the target falls is determined, and then the distance is calculated with the ranging formula of that area.

Specifically, (1) Select the ranging area of the camera. The ranging area is set to 11.2 m × 7.2 m and divided into 14 × 9 square grid cells, each with a side length of 80 cm; red cards are placed at the grid vertices.

(2) Fix the position of the camera. The camera height is set to 2.5 m with a downward tilt of 35°, and the camera is placed 80 cm in front of the center point of the front edge of the ranging area.

(3) Capture calibration pictures. A capture command is entered on the command line of the RV1126 development board, and several pictures with a resolution of 1920 × 1080 are taken.

(4) Using the calibration pictures obtained in step (3), mark the pixel coordinates of the grid points indicated by the red cards, and calculate the actual distance between each point and the camera.
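The actual distance of each grid vertex in step (4) can be worked out from the stated layout. The sketch below rests on two assumptions that the text does not confirm: that the 14-cell side runs across the camera's view while the 9-cell side runs away from it, and that "distance" means the straight-line distance from the lens to the point on the floor. The function name `vertex_distance` is hypothetical.

```python
import math

CELL = 0.8          # grid cell side length in metres (from the description)
COLS, ROWS = 14, 9  # 14 x 9 cells, i.e. 11.2 m x 7.2 m
CAM_HEIGHT = 2.5    # camera height in metres
CAM_OFFSET = 0.8    # metres between the camera and the front edge

def vertex_distance(i, j):
    """Straight-line distance (m) from the lens to grid vertex (i, j).

    i = 0..COLS counts across the area (assumed orientation),
    j = 0..ROWS counts away from the camera.
    """
    lateral = (i - COLS / 2) * CELL     # offset from the centre line
    forward = CAM_OFFSET + j * CELL     # ground distance along the view
    return math.sqrt(lateral ** 2 + forward ** 2 + CAM_HEIGHT ** 2)
```

Under these assumptions the nearest vertex, at the center of the front edge, lies about 2.62 m from the lens, since 0.8² + 2.5² = 6.89.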

Further, the distance calculating method comprises the following steps:

Because each grid area corresponds to one ranging formula, the vector cross product is first used to determine which grid area the target falls in. The ranging formula of that area is then used, and the coordinates of the target are substituted into it to obtain the distance.
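A cross-product containment test of this kind can be sketched as follows. It assumes each grid area appears in the image as a convex quadrilateral whose four pixel vertices are listed in a consistent order (all clockwise or all counter-clockwise); the names `cross`, `in_cell` and `locate_cell` and the `cells` dictionary are hypothetical.

```python
def cross(o, a, p):
    """z-component of the cross product (a - o) x (p - o)."""
    return (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])

def in_cell(quad, p):
    """True if point p lies inside or on the convex quadrilateral `quad`.

    p is inside when it is on the same side of all four edges, i.e. when
    the four cross products share one sign (zero means p is on an edge).
    """
    signs = [cross(quad[k], quad[(k + 1) % 4], p) for k in range(4)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

def locate_cell(cells, p):
    """Return the key of the first cell containing p, or None."""
    for cell_id, quad in cells.items():
        if in_cell(quad, p):
            return cell_id
    return None
```

The returned key would then select the ranging parameters fitted for that grid area; a point outside every area yields `None`, which a caller can treat as outside the ranging range.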

Three grid points are taken for each area, and the pixel coordinates of each point and the corresponding distances are substituted into the following equation; the ranging parameters a, b and c are obtained by fitting for each grid area. The ranging formula is calculated as follows:

$$R^2 = \left((x-960)\,k_x\right)^2 + \left((y-1080)\,k_y\right)^2 + c$$

where $R$ is the distance from the target to the camera, $x$ and $y$ are the coordinates of the midpoint of the bottom edge of the human body frame, and $k_x$, $k_y$ and $c$ are ranging parameters.

Rearranging the above formula gives:

$$R^2 = a(x-960)^2 + b(y-1080)^2 + c$$

where $a = k_x^2$ and $b = k_y^2$.
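Evaluating the ranging formula $R^2 = a(x-960)^2 + b(y-1080)^2 + c$ for one grid area can be sketched as follows. The offsets 960 and 1080 correspond to the bottom-center pixel of the 1920 × 1080 calibration image; the function name `distance` and any parameter values passed to it are illustrative, not fitted values from the disclosure.

```python
import math

CX, CY = 960, 1080  # bottom-centre pixel of a 1920 x 1080 image

def distance(x, y, a, b, c):
    """Distance R for pixel (x, y) given one area's parameters a, b, c.

    (x, y) is the midpoint of the bottom edge of the human body frame.
    """
    r_squared = a * (x - CX) ** 2 + b * (y - CY) ** 2 + c
    # Guard against a slightly negative value from a poor fit.
    return math.sqrt(max(r_squared, 0.0))
```

At the reference pixel (960, 1080) both squared terms vanish, so the result reduces to $\sqrt{c}$ for that area.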

When a human body target is occluded, assume that the coordinates of the human body frame in the last frame before the target enters occlusion are $(x_0, y_0, x_1, y_1)$, where $(x_0, y_0)$ is the upper-left corner of the frame and $(x_1, y_1)$ is the lower-right corner. The height $h$ and width $w$ of the unoccluded human body frame are:

$$h = y_1 - y_0$$

$$w = x_1 - x_0$$

From the height $h$ and width $w$ of the unoccluded frame, the corrected lower-right coordinate $(x''_1, y''_1)$ of the bottom edge of the human body frame used for ranging under occlusion is calculated. Assuming the coordinates of the human body frame under occlusion are $(x'_0, y'_0, x'_1, y'_1)$:

$$x''_1 = x'_0 + w$$

$$y''_1 = y'_0 + h$$
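The occlusion correction can be sketched as follows. The corrected lower-right corner follows the two formulas directly; taking the midpoint of the corrected bottom edge as the ranging point is an assumption that combines them with the earlier definition of $x$ and $y$. The function name `corrected_bottom` is hypothetical.

```python
def corrected_bottom(prev_box, occluded_box):
    """Estimate the bottom edge of an occluded human body frame.

    prev_box:     (x0, y0, x1, y1) from the last unoccluded frame.
    occluded_box: (x0', y0', x1', y1') detected under occlusion.
    Returns the corrected lower-right corner and the midpoint of the
    corrected bottom edge (the assumed ranging point).
    """
    x0, y0, x1, y1 = prev_box
    h, w = y1 - y0, x1 - x0            # unoccluded height and width
    ox0, oy0, _, _ = occluded_box      # only the upper-left corner is used
    cx1, cy1 = ox0 + w, oy0 + h        # corrected lower-right corner
    midpoint = ((ox0 + cx1) / 2, cy1)
    return (cx1, cy1), midpoint
```

For instance, an unoccluded frame `(100, 200, 180, 500)` followed by an occluded frame `(120, 210, 200, 380)` gives the corrected corner `(200, 510)` and the ranging point `(160.0, 510)`.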

A nonlinear least squares method is used to fit the ranging formula and solve for the undetermined coefficients a, b and c. The basic idea is as follows:

Let $(x, y, r)$ be a set of observations, with $x = [x_1, x_2, x_3, x_4]^T$ and $y = [y_1, y_2, y_3, y_4]^T$, where $(x_i, y_i)$ $(i = 1, 2, 3, 4)$ are the pixel coordinates of the vertices of a square grid cell and $r = R^2$, satisfying the theoretical function:

$$r = f(x, y, w)$$

where $w = [w_1, w_2, w_3]^T$ is the vector of undetermined parameters, with $w_1 = a$, $w_2 = b$ and $w_3 = c$. To find the optimal estimate of the parameter $w$ of the function $f(x, y, w)$, for the given 4 sets of observations

$$(x_i, y_i, r_i) \quad (i = 1, 2, 3, 4)$$

the parameter $w$ that minimizes the objective function $L(r, f(x, y, w))$ is solved for. The objective function, the sum of squared residuals of the least squares fit, is as follows:

$$L = \sum_{i=1}^{4} \left(r_i - f(x_i, y_i, w)\right)^2$$

Finally, the parameters a, b and c in the ranging formula are obtained.
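Fitting a, b and c for one grid area can be sketched as follows. Because $r = a(x-960)^2 + b(y-1080)^2 + c$ is linear in a, b and c, this sketch solves the ordinary least squares normal equations directly instead of running an iterative nonlinear solver; that simplification, the feature scaling constant `SCALE` and the function name `fit_cell` are assumptions of this example.

```python
SCALE = 1e-5  # keeps the squared pixel offsets near 1 for better conditioning

def fit_cell(points):
    """Fit (a, b, c) from observations (x, y, R): pixel coordinates of a
    grid vertex and its measured distance R in metres.

    Minimises sum_i (R_i^2 - a*(x_i-960)^2 - b*(y_i-1080)^2 - c)^2.
    """
    rows = [((x - 960) ** 2 * SCALE, (y - 1080) ** 2 * SCALE, 1.0, R * R)
            for x, y, R in points]
    # Augmented 3x4 matrix of the normal equations (A^T A) w = A^T r.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * r[3] for r in rows)] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("observations do not determine a, b and c")
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [mk - f * mc for mk, mc in zip(m[k], m[col])]
    w1, w2, w3 = (m[i][3] / m[i][i] for i in range(3))
    return w1 * SCALE, w2 * SCALE, w3  # undo the feature scaling
```

With the four vertices of a grid cell as observations the system is overdetermined by one equation, so the fit averages out small calibration errors rather than interpolating exactly.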

Finally, the source files of the host computer display program are compiled on the PC to obtain an executable file, a push/pull streaming image transmission system is set up between the RV1126 and the PC, and the detection picture is displayed on the host computer in real time.

Real-time detection is performed with the camera mounted on the embedded RV1126 development board: images are fed to the detection and tracking network model for prediction, yielding the detection results and distance estimates of human body targets, and the images and distance parameters are transmitted to the host computer for display in real time.

Example 2

One embodiment of the present disclosure provides an indoor human target recognition and monocular ranging system, comprising:

Example 3

An embodiment of the present disclosure provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the indoor human target recognition and monocular ranging method steps.

Example 4

One embodiment of the present disclosure provides an electronic device, including: a processor, a memory, and a computer program; the processor is connected with the memory, the computer program is stored in the memory, and when the electronic equipment runs, the processor executes the computer program stored in the memory so as to enable the electronic equipment to execute the steps of the indoor human body target identification and monocular distance measurement method.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. An indoor human body target recognition and monocular ranging method is characterized by comprising the following steps:

acquiring image data of an indoor human body target, and preprocessing;

2. The method for recognizing and monocular ranging an indoor human target according to claim 1, wherein the step of acquiring image data of the indoor human target and performing preprocessing comprises:

3. The method for recognizing and monocular ranging an indoor human target according to claim 1, wherein the step of determining whether to miss the detection comprises: setting an IOU threshold value, comparing the IOU threshold value with the calculated IOU parameter value, and if the calculated IOU parameter value is higher than the set IOU threshold value, considering that the target of the previous frame is detected in the current frame and no omission condition exists; if the target disappears in the field of view in the next frame, no tracking frame will be generated in the current frame.

4. The method for recognizing and monocular ranging an indoor human body object according to claim 1, wherein an ID tag is generated for each object in the tracking model, the ID of the current object frame is updated every time the object is detected and tracked, and the ID tags of the previous and subsequent frames of the same object are identical.

5. The method for recognizing and monocular ranging an indoor human body target according to claim 4, wherein when the human body target is occluded, the human body frame from the last frame in which the occluded target was not occluded is found by means of the ID, the coordinate of the bottom edge of the human body frame under occlusion is then calculated from the height of the unoccluded human body frame, and the distance is calculated using the new coordinate.

6. The method for recognizing and monocular ranging an indoor human body target according to claim 1, wherein the monocular ranging comprises: measuring the distance between the bottom of a target within a given area and the camera; assuming that the ranging range is a rectangular area, dividing the rectangular area into a plurality of small square areas, each of which corresponds to one ranging formula; first determining which ranging area the target falls in, and then calculating the distance with the ranging formula of that area.

7. The method for recognizing and measuring a human body target in a room according to claim 1, wherein the distance parameter obtained after the monocular distance measurement and the detection result obtained after the detection of the human body target are transmitted to the upper computer in real time and the detection picture is displayed.

8. An indoor human target recognition and monocular ranging system, comprising:

the human body detection and tracking module is used for constructing a lightweight neural network human body target detection model and a tracking model, inputting image data into the target detection and tracking model to carry out target detection, and reserving a target frame of a current frame image;

9. A non-transitory computer readable storage medium storing computer instructions which, when executed by a processor, implement an indoor human target recognition and monocular ranging method as claimed in any of claims 1 to 7.

10. An electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, and the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device executes a method for realizing indoor human body target recognition and monocular ranging according to any one of claims 1-7.