CN113378752B

CN113378752B - Pedestrian backpack detection method and device, electronic equipment and storage medium

Info

Publication number: CN113378752B
Application number: CN202110700423.4A
Authority: CN
Inventors: 余永龙
Original assignee: Jinan Boguan Intelligent Technology Co Ltd
Current assignee: Jinan Boguan Intelligent Technology Co Ltd
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2022-09-06
Anticipated expiration: 2041-06-23
Also published as: CN113378752A

Abstract

The invention discloses a pedestrian backpack detection method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image to be detected, and determining a backpack detection area and a strap detection area in the image to be detected, wherein the backpack detection area contains the strap detection area; when a preset backpack detection model detects a backpack in the backpack detection area, generating a backpack type detection result corresponding to the backpack; when the preset backpack detection model does not detect a backpack in the backpack detection area, determining the type of the strap in the strap detection area by using a preset strap detection model, and determining the backpack type detection result according to the type of the strap. The method detects the main body features of the backpack in the backpack detection area and the strap features in the strap detection area contained in it, so the strap features can still be detected when the main body is occluded. This not only ensures that the backpack detection result and the strap detection result come from the same backpack, but also improves the accuracy and flexibility of pedestrian backpack detection.

Description

Pedestrian backpack detection method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of image recognition, and in particular, to a method and an apparatus for detecting a backpack for a pedestrian, an electronic device, and a storage medium.

Background

In the field of image recognition, extracting target attributes is an important part of target recognition tasks. For example, in pedestrian recognition, determining whether a pedestrian carries a bag, and the type of the bag, is an important part of human attribute recognition.

In the related art, pedestrian backpack detection generally uses a deep learning network model to determine a pedestrian region in the image to be detected, and then detects the backpack body in that region. However, a backpack is small and easily occluded by the human body, which makes the backpack target difficult to detect and reduces the accuracy of pedestrian backpack detection.

Disclosure of Invention

The invention aims to provide a pedestrian backpack detection method, a pedestrian backpack detection device, electronic equipment and a storage medium, which can simultaneously utilize the main body characteristics and the strap characteristics of a backpack to carry out detection, thereby improving the accuracy and the flexibility of pedestrian backpack detection.

In order to solve the technical problem, the invention provides a pedestrian backpack detection method, which comprises the following steps:

acquiring an image to be detected, and determining a backpack detection area and a strap detection area in the image to be detected; the backpack detection area contains the strap detection area;

when a preset backpack detection model detects a backpack in the backpack detection area, generating a backpack type detection result corresponding to the backpack;

when the backpack is not detected in the backpack detection area by the preset backpack detection model, the type of the straps in the strap detection area is determined by the preset strap detection model, and the backpack type detection result is determined according to the type of the straps.

Optionally, the determining a backpack detection area and a strap detection area in the image to be detected includes:

extracting a human body region in the image to be detected, and extracting human body key point data in the human body region;

and generating the backpack detection area and the strap detection area by using the human body key point data.

Optionally, when the human body key point data includes shoulder key point data, hip key point data, and ankle key point data, the generating the backpack detection area and the strap detection area using the human body key point data includes:

judging whether the human body inclines or not by utilizing the shoulder key point data, the hip key point data and the ankle key point data;

if so, calculating a frame compensation amount by using the shoulder key point data, the hip key point data and the ankle key point data, and generating the strap detection area by using the shoulder key point data, the hip key point data, the ankle key point data and the frame compensation amount;

if not, generating the strap detection area by using the shoulder key point data, the hip key point data and the ankle key point data;

and generating the backpack detection area by using the position data of the strap detection area.

Optionally, after generating the backpack detection area and the strap detection area using the human body key point data, the method further includes:

judging whether the backpack detection area exceeds the human body region;

if so, using the image to be detected to perform edge-padding processing on the part of the backpack detection area that exceeds the human body region.

Optionally, the determining a type of a strap in the strap detection area by using a preset strap detection model includes:

determining a first strap type and a corresponding first confidence in the strap detection area by using the preset strap detection model;

extracting line segments in the strap detection area by using the LSD (Line Segment Detector) method, and dilating the line segments to obtain a strap area;

determining a second strap type and a corresponding second confidence according to the strap area, and selecting the strap type from the first strap type and the second strap type according to the first confidence and the second confidence.

Optionally, when the preset backpack detection model and the preset strap detection model share a basic network, before acquiring the image to be detected, the method further includes:

acquiring a marked training image, and extracting a backpack area image in the training image;

normalizing the backpack area image, and performing feature extraction on the normalized backpack area image by using the basic network to obtain a backpack area feature map;

cropping the backpack area feature map to obtain a strap area feature map;

and performing classification training on the preset backpack detection model and the preset strap detection model by using the backpack area feature map and the strap area feature map, respectively.

The invention also provides a pedestrian backpack detection device, comprising:

the area determination module is used for acquiring an image to be detected and determining a backpack detection area and a strap detection area in the image to be detected; the backpack detection area contains the strap detection area;

the first detection module is used for generating a backpack type detection result corresponding to the backpack when a preset backpack detection model detects a backpack in the backpack detection area;

and the second detection module is used for determining the type of the strap in the strap detection area by using a preset strap detection model when the preset backpack detection model does not detect a backpack in the backpack detection area, and determining the backpack type detection result according to the type of the strap.

Optionally, the region determining module includes:

the human body identification submodule is used for extracting a human body region in the image to be detected and extracting human body key point data in the human body region;

and the region determining submodule is used for generating the backpack detection region and the strap detection region by using the human body key point data.

The present invention also provides an electronic device comprising:

a memory for storing a computer program;

and the processor is used for realizing the pedestrian backpack detection method when executing the computer program.

The invention also provides a storage medium, wherein the storage medium stores computer-executable instructions, and when the computer-executable instructions are loaded and executed by a processor, the pedestrian backpack detection method is realized.

The invention provides a pedestrian backpack detection method, which comprises the following steps: acquiring an image to be detected, and determining a backpack detection area and a strap detection area in the image to be detected; the backpack detection area contains the strap detection area; when a preset backpack detection model detects a backpack in the backpack detection area, generating a backpack type detection result corresponding to the backpack; when the backpack is not detected in the backpack detection area by the preset backpack detection model, the type of the strap in the strap detection area is determined by the preset strap detection model, and the backpack type detection result is determined according to the type of the strap.

Therefore, the invention extracts the backpack detection area and the strap detection area from the image to be detected, and uses the preset backpack detection model and the preset strap detection model to detect in the corresponding areas. When the preset backpack detection model cannot detect a backpack, the strap type determined by the preset strap detection model is used to determine the backpack type detection result. In other words, backpack detection uses both the main body features and the strap features of the backpack, so the strap features can still be detected when the backpack body is occluded by the human body, which effectively improves the application flexibility of pedestrian backpack detection. Meanwhile, because the backpack detection area contains the strap detection area, the backpack detection result and the strap detection result are guaranteed to come from the same backpack, which further ensures the accuracy and reliability of pedestrian backpack detection and ultimately improves its application effect. The invention also provides a pedestrian backpack detection device, an electronic device and a storage medium, which have the same beneficial effects.
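The two-stage decision described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function `detect_backpack_type`, the `BackpackResult` record, the label strings, and the convention that the backpack model returns `None` when it finds nothing are all assumptions introduced here, and the two models are stand-in callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class BackpackResult:
    backpack_type: str
    source: str  # "body": backpack model fired; "strap": inferred from the straps


def detect_backpack_type(
    backpack_region: Any,
    strap_region: Any,
    backpack_model: Callable[[Any], Optional[str]],
    strap_model: Callable[[Any], str],
    strap_to_backpack: Dict[str, str],
) -> BackpackResult:
    """Try the backpack model first; fall back to the strap model if it finds nothing."""
    body_type = backpack_model(backpack_region)  # assumed to return None on no detection
    if body_type is not None:
        return BackpackResult(body_type, "body")
    # Backpack body not detected (for example, occluded): classify the straps instead.
    strap_type = strap_model(strap_region)
    return BackpackResult(strap_to_backpack[strap_type], "strap")


if __name__ == "__main__":
    # Hypothetical labels; the source does not fix concrete label names.
    mapping = {"none": "no_bag", "double": "double_shoulder_bag"}
    result = detect_backpack_type(
        "bag_roi", "strap_roi", lambda r: None, lambda r: "double", mapping
    )
    print(result.backpack_type, result.source)
```

Because the strap model is called only on the fallback path, the strap classifier costs nothing when the backpack body is clearly visible.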

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart of a pedestrian backpack detection method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating the generation of a backpack detection area and a strap detection area according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a strap extraction process based on LSD line detection according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a multi-task network model training process according to an embodiment of the present invention;

FIG. 5 is a block diagram of a pedestrian backpack detection apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the related art, pedestrian backpack detection generally uses a deep learning network model to determine a pedestrian region in the image to be detected, and then detects the backpack body in that region. However, a backpack is small and easily occluded by the human body, which makes the backpack target difficult to detect and reduces the accuracy of pedestrian backpack detection. In view of this, the present invention provides a pedestrian backpack detection method that detects the backpack body features and the strap features at the same time, so the straps can still be detected when the backpack body is occluded by the human body, thereby improving the accuracy and flexibility of pedestrian backpack detection. Referring to FIG. 1, FIG. 1 is a flowchart of a pedestrian backpack detection method according to an embodiment of the present invention, where the method may include:

S101, acquiring an image to be detected, and determining a backpack detection area and a strap detection area in the image to be detected; the backpack detection area contains the strap detection area.

In the embodiment of the present invention, the backpack detection area and the strap detection area are both determined in the image to be detected, and backpack body detection and strap detection are carried out in the two areas respectively. Because the backpack body is easily occluded by the human body, detecting only the backpack body easily leads to missed detections and reduces the accuracy of pedestrian backpack detection. However, when the backpack body is occluded, the straps are not. For example, when the pedestrian carries a double-shoulder backpack, the way such a backpack is worn means that when the backpack body is occluded on one side of the human body, the two shoulder straps appear on the opposite side. Whether the pedestrian carries a backpack, and the type of the backpack, can therefore be determined by detecting the straps. In other words, detecting the backpack body features and the strap features at the same time avoids the influence of human body occlusion on backpack detection and improves the accuracy and reliability of pedestrian backpack detection.

It should be noted that the embodiment of the present invention does not limit the manner of obtaining the image to be detected, and for example, the image may be obtained by directly shooting with an image capturing device, or may be obtained by extracting a frame from a video. In order to ensure real-time monitoring of the backpack target, the image to be detected in the embodiment of the invention can be obtained by frame extraction from the video. The embodiment of the present invention does not limit the specific frame extraction manner, such as random frame extraction, fixed duration frame extraction, fixed frame number frame extraction, and the like, and can be set according to the actual application requirements. The embodiment of the invention also does not limit various shooting parameters of the image to be detected, and can be adjusted according to the actual application requirements.
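As an illustration of two of the frame extraction options mentioned above (a fixed number of frames and a fixed duration), the following sketch computes which frame indices would be sampled from a video. The function names and parameters are hypothetical, and decoding the frames themselves is left to whatever video library is actually used.

```python
from typing import List


def fixed_step_frame_indices(total_frames: int, step: int) -> List[int]:
    """Indices sampled when one frame is taken every `step` frames."""
    if total_frames < 0 or step <= 0:
        raise ValueError("total_frames must be >= 0 and step must be > 0")
    return list(range(0, total_frames, step))


def fixed_duration_frame_indices(
    total_frames: int, fps: float, seconds: float
) -> List[int]:
    """Indices sampled when one frame is taken every `seconds` seconds."""
    if fps <= 0 or seconds <= 0:
        raise ValueError("fps and seconds must be > 0")
    # Convert the time interval to a whole number of frames (at least 1).
    step = max(1, round(fps * seconds))
    return fixed_step_frame_indices(total_frames, step)


if __name__ == "__main__":
    print(fixed_step_frame_indices(10, 4))
    print(fixed_duration_frame_indices(100, 25.0, 1.0))
```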

Further, it can be understood that, in order to detect a pedestrian's backpack, the backpack detection area and the strap detection area should at least contain the torso of the human body. The human body in the image to be detected can therefore be identified first to determine its position information, and the backpack detection area and the strap detection area can then be generated from that position information. The embodiment of the present invention does not limit the specific form of the human body position information; for example, it may be the human body region where the human body is located in the image, or the position information of each human body key point in the image, where a key point is a specific position on the human body. Because a backpack is small, generating the backpack detection area and the strap detection area from the human body region alone easily leads to inaccurate region positioning and large errors, whereas human body key point data locate the torso more precisely and therefore allow the two areas to be generated more accurately. Of course, extracting human body key point data directly from the whole image to be detected would require a large amount of computation, so the human body region can first be determined in the image to be detected, and the human body key point data can then be extracted from the human body region.

In one possible case, determining the backpack detection area and the strap detection area in the image to be detected may include:

Step 11: extracting a human body region in the image to be detected, and extracting human body key point data in the human body region;

Step 12: generating a backpack detection area and a strap detection area by using the human body key point data.

It should be noted that the embodiment of the present invention does not limit how the human body region is determined; reference may be made to the related art of human body target detection, for example using the YOLO-V4 target detection network to identify human targets. The embodiment of the present invention also does not limit the specific way of extracting human body key point data in the human body region; reference may be made to the related art of human body key point detection, for example predicting the human body key points with a regression network. Furthermore, the embodiment of the present invention does not limit which human body key point data are needed to generate the two areas, which can be set by combining knowledge of the human torso with the application requirements. In the embodiment of the present invention, in order to delimit the human torso, at least the shoulder key point data of the two shoulders, the hip key point data of the two sides of the hip bone, and the ankle key point data of the two ankles are required. Of course, in order to further expand the bag detection range, elbow key point data, wrist key point data and the like can also be extracted, and key points of other non-joint parts of the human body can also be extracted, as set according to the actual application requirements.

Furthermore, after the shoulder key point data, the hip key point data and the ankle key point data are extracted, the positions where a backpack and straps may appear can be located, and the backpack detection area and the strap detection area can then be determined. It can be understood that the backpack detection area contains the strap detection area, so the backpack detection area is larger than the strap detection area; further, in order to ensure that the backpack detection area contains the strap detection area, the two areas should be generated in sequence, with the position information of one area used to generate the other. The embodiment of the present invention does not limit the generation order of the backpack detection area and the strap detection area, which can be adjusted according to the actual application requirements. Considering that the strap detection area is small and requires higher positioning accuracy, the strap detection area may be generated first using the shoulder key point data, the hip key point data and the ankle key point data, and the backpack detection area may then be generated using the position data of the strap detection area.

Furthermore, the human body may not directly face the image acquisition device but instead form an angle with it, that is, the human body may be inclined. In that case the strap detection area generated using the shoulder key point data, the hip key point data and the ankle key point data may be too small to meet the application requirement. Therefore, whether the human body is inclined can be judged using the shoulder key point data, the hip key point data and the ankle key point data, and when the human body is inclined a corresponding area compensation amount is generated from these key point data to expand the strap detection area and improve the effectiveness of strap detection.
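The "strap area first, backpack area second" ordering can be sketched as below. This is only an illustration under loud assumptions: the strap area is taken as the plain bounding box of the shoulder and hip key points, and the backpack area is obtained by scaling that box about its centre with made-up factors `scale_w` and `scale_h`. The embodiment instead uses proportional parameters derived from sample statistics and also involves the ankle key points, none of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    x_s: float
    y_s: float
    x_e: float
    y_e: float

    def contains(self, other: "Box") -> bool:
        return (self.x_s <= other.x_s and self.y_s <= other.y_s
                and self.x_e >= other.x_e and self.y_e >= other.y_e)


def strap_area_from_keypoints(shoulders: Sequence[Point], hips: Sequence[Point]) -> Box:
    """Assumed rule: bounding box of the shoulder and hip key points."""
    pts = list(shoulders) + list(hips)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return Box(min(xs), min(ys), max(xs), max(ys))


def backpack_area_from_strap_area(strap: Box, scale_w: float, scale_h: float) -> Box:
    """Scale the strap area about its centre; factors >= 1 guarantee containment."""
    if scale_w < 1 or scale_h < 1:
        raise ValueError("scale factors must be >= 1 so the backpack area contains the strap area")
    cx, cy = (strap.x_s + strap.x_e) / 2, (strap.y_s + strap.y_e) / 2
    half_w = (strap.x_e - strap.x_s) * scale_w / 2
    half_h = (strap.y_e - strap.y_s) * scale_h / 2
    return Box(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    strap = strap_area_from_keypoints([(10, 20), (30, 22)], [(12, 60), (28, 58)])
    bag = backpack_area_from_strap_area(strap, scale_w=2.0, scale_h=1.5)
    print(strap, bag, bag.contains(strap))
```

Deriving the second box from the first, rather than computing both independently, is what makes the containment relation hold by construction.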

In one possible case, when the human body key point data includes shoulder key point data, hip key point data, and ankle key point data, generating the backpack detection area and the strap detection area using the human body key point data may include:

Step 21: judging whether the human body is inclined by using the shoulder key point data, the hip key point data and the ankle key point data; if yes, go to step 22; if not, go to step 23;

Step 22: calculating a frame compensation amount by using the shoulder key point data, the hip key point data and the ankle key point data, and generating a strap detection area by using the shoulder key point data, the hip key point data, the ankle key point data and the frame compensation amount;

Step 23: generating a strap detection area by using the shoulder key point data, the hip key point data and the ankle key point data;

Step 24: generating a backpack detection area by using the position data of the strap detection area.

It can be understood that the shoulder key point data, the hip key point data and the ankle key point data are coordinate data in a coordinate system whose origin is a certain point in the image to be detected. The embodiment of the present invention does not limit how the coordinate system is set, which can be chosen according to the actual application requirements.

Furthermore, the backpack detection area may be so large that it exceeds the image boundary of the human body region. A backpack detection area cropped from the human body region would then have missing image content and a wrong size, which in turn causes an error in the size ratio between the backpack detection area and the strap detection area and affects the subsequent backpack detection and strap detection. Therefore, after the backpack detection area is generated, its positional relationship with the human body region can be judged, and the image to be detected can be used to pad the edges of the part of the backpack detection area that exceeds the human body region, so that the image information beyond the human body region is recovered and the original size of the backpack detection area is restored.

In one possible case, after generating the backpack detection area and the strap detection area by using the human body key point data, the method may further include:

Step 31: judging whether the backpack detection area exceeds the human body region; if yes, go to step 32; if not, end the judgment;

Step 32: performing edge-padding processing on the part of the backpack detection area that exceeds the human body region by using the image to be detected.

Of course, if the backpack detection area exceeds the image boundary of the image to be detected, in order to ensure the correct size data of the backpack detection area, a blank image may be supplemented to the backpack detection area, and the original size information of the backpack detection area is recovered to maintain the original size ratio of the backpack detection area to the strap detection area.
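A minimal sketch of this size-preserving crop is given below, assuming the image is a plain list of pixel rows and that "blank" means a constant fill value. The function name `crop_with_padding`, the half-open coordinate convention, and the fill value are assumptions made for illustration.

```python
from typing import List


def crop_with_padding(
    image: List[List[int]], x_s: int, y_s: int, x_e: int, y_e: int, fill: int = 0
) -> List[List[int]]:
    """Crop the half-open box [x_s, x_e) x [y_s, y_e) from `image`.

    Pixels of the box that fall outside the image are filled with `fill`,
    so the output always has the requested size (y_e - y_s) x (x_e - x_s).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cropped = []
    for y in range(y_s, y_e):
        row = []
        for x in range(x_s, x_e):
            inside = 0 <= y < height and 0 <= x < width
            row.append(image[y][x] if inside else fill)
        cropped.append(row)
    return cropped


if __name__ == "__main__":
    image = [[1, 2],
             [3, 4]]
    # The box sticks out one column to the left and one row below the image.
    for row in crop_with_padding(image, -1, 0, 2, 3):
        print(row)
```

Keeping the output size fixed is what preserves the size ratio between the backpack detection area and the strap detection area.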

The above process of generating the backpack detection area and the strap detection area is described below with reference to a specific schematic diagram. Referring to FIG. 2, FIG. 2 is a schematic diagram illustrating generation of a backpack detection area and a strap detection area according to an embodiment of the present invention. The thick grey frame is the human body region, where $P_{det\_start}(x_s, y_s)$, $P_{det\_end}(x_e, y_e)$, $W_{det}$ and $H_{det}$ respectively represent the starting point, the ending point, the width and the height of the detection frame; $P_0(x_0, y_0)$ and $P_1(x_1, y_1)$ are the shoulder key point data, $P_2(x_2, y_2)$ and $P_3(x_3, y_3)$ are the hip key point data, and $P_4(x_4, y_4)$ and $P_5(x_5, y_5)$ are the ankle key point data. The dotted frame and the thin black solid frame in the figure represent the backpack detection area and the strap detection area, respectively. The final goal of the calculation is the starting coordinates, the width and the height of the backpack detection area and of the strap detection area, where $P_{straps\_s}(x_{s\_s}, y_{s\_s})$, $P_{straps\_e}(x_{s\_e}, y_{s\_e})$, $W_{straps}$ and $H_{straps}$ respectively represent the starting point, the ending point, the width and the height of the strap detection area, and $P_{bags\_s}(x_{b\_s}, y_{b\_s})$, $P_{bags\_e}(x_{b\_e}, y_{b\_e})$, $W_{bags}$ and $H_{bags}$ respectively represent the starting point, the ending point, the width and the height of the backpack detection area. Whether the human body is inclined can be judged by the following formula:

In this formula, the computed quantity is the region compensation amount: a value less than zero indicates that the human body is not inclined; otherwise the human body is inclined. In other words, the above formula determines whether the human body is inclined from the relationship between the coordinates of locations on the body that are relatively fixed, such as the shoulder and hip key points, and the coordinates of locations on the body that are prone to positional shifts, such as the ankle key points. Further, the specific position data of the strap detection area may be calculated using the following formula:

where max() and min() denote the maximum function and the minimum function, which return the larger or the smaller of two values. Further, the position data of the backpack detection area can be calculated from the position data of the strap detection area:

It should be noted that the proportional parameters in the above formulas (all real-number constants appearing in them are proportional parameters) are determined from statistics of bag positions and strap positions in a large number of actual samples. In other words, the parameters depend on the image samples used for the statistics, and will differ when the samples change.

S102, when the preset backpack detection model detects a backpack in the backpack detection area, generating a backpack type detection result corresponding to the backpack.

It can be understood that the preset backpack detection model is a classification network model, that is, it can identify and classify backpack targets. It can also be understood that the preset backpack detection model used in the embodiment of the present invention is a classification network model that has already been trained. The embodiment of the present invention does not limit the basic network on which the preset backpack detection model is based; it can be a deep learning classification network such as ResNet18 or VGG-16, selected according to the actual application requirements.

Further, the embodiment of the present invention does not limit the specific backpack types, which can be set according to the actual application requirements, for example a single-shoulder bag, a handbag, a double-shoulder bag, or the like.

S103, when the preset backpack detection model does not detect a backpack in the backpack detection area, determining the type of the strap in the strap detection area by using a preset strap detection model, and determining the backpack type detection result according to the type of the strap.

It can be understood that the preset strap detection model is also a classification network model, that is, it can identify and classify strap targets; the description of the preset backpack detection model above applies to it as well. It can also be understood that the preset strap detection model used in the embodiment of the present invention is a classification network model that has already been trained. It should be noted that the embodiment of the present invention does not limit whether the preset backpack detection model and the preset strap detection model share the same basic network; in other words, it does not limit whether the two models are two branch models based on the same basic network. If high detection efficiency can still be ensured when the two models are set as independent classification network models, they need not share the same basic network; when the detection efficiency and the model detection performance need to be improved, the same basic network can be configured for both models, which are then set as two branch models based on that basic network. In the embodiment of the present invention, in order to improve the detection efficiency and enhance the model detection performance, the preset backpack detection model and the preset strap detection model can be configured to share the same basic network.

Further, it can be understood that, if the two models are based on the same basic network, then after the backpack detection area is extracted it is first input to the basic network for feature extraction; after feature extraction is complete, the features of the strap detection area are cropped from the features of the backpack detection area; finally, the features of the backpack detection area and of the strap detection area are fed to the preset backpack detection model and the preset strap detection model, respectively, for the corresponding detection, which effectively shortens the detection time and improves the detection efficiency.
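The cropping step on the shared features can be illustrated with a small sketch. It assumes, purely for illustration, that the feature map is a 2-D grid covering the backpack detection area uniformly and that both areas are given as integer pixel boxes in image coordinates; the function `crop_strap_features` and this proportional mapping are not taken from the source.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_start, y_start, x_end, y_end), end exclusive


def crop_strap_features(
    feature_map: List[List[float]], backpack_box: Box, strap_box: Box
) -> List[List[float]]:
    """Crop the cells of `feature_map` that overlap the strap box.

    `feature_map` is assumed to cover `backpack_box` uniformly, so pixel
    offsets inside the backpack box map proportionally to feature cells.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    bx_s, by_s, bx_e, by_e = backpack_box
    sx_s, sy_s, sx_e, sy_e = strap_box
    box_w, box_h = bx_e - bx_s, by_e - by_s
    if box_w <= 0 or box_h <= 0:
        raise ValueError("backpack_box must have positive width and height")
    # Floor for the start index, ceiling for the end index (integer arithmetic),
    # so that every feature cell touched by the strap box is kept.
    c0 = max(0, (sx_s - bx_s) * cols // box_w)
    c1 = min(cols, -(-(sx_e - bx_s) * cols // box_w))
    r0 = max(0, (sy_s - by_s) * rows // box_h)
    r1 = min(rows, -(-(sy_e - by_s) * rows // box_h))
    return [row[c0:c1] for row in feature_map[r0:r1]]


if __name__ == "__main__":
    fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    print(crop_strap_features(fmap, (0, 0, 80, 80), (20, 20, 60, 40)))
```

Running the basic network once and slicing its output avoids a second forward pass for the strap branch, which is the efficiency gain the paragraph above refers to.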

Further, the embodiment of the present invention does not limit the specific strap types, which may be, for example, no strap, single strap, and double strap. Since the number of straps is related to the backpack type, the backpack type can be determined from the strap type: for example, a single strap corresponds to a single-shoulder bag or a handbag, a double strap corresponds to a double-shoulder bag, and no strap corresponds to no bag.
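The correspondence between strap type and backpack type can be written as a simple lookup. The sketch below is only illustrative; the enum `StrapType` and the label strings are hypothetical names chosen for the example.

```python
from enum import Enum


class StrapType(Enum):
    """Strap types named in the text; the identifiers themselves are assumptions."""
    NONE = "no_strap"
    SINGLE = "single_strap"
    DOUBLE = "double_strap"


# Mapping taken from the examples in the text; label strings are hypothetical.
STRAP_TO_BACKPACK = {
    StrapType.NONE: "no_bag",
    StrapType.SINGLE: "single_shoulder_bag_or_handbag",
    StrapType.DOUBLE: "double_shoulder_bag",
}


def backpack_type_from_strap(strap: StrapType) -> str:
    """Return the backpack type implied by the detected strap type."""
    return STRAP_TO_BACKPACK[strap]
```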

Further, considering that the strap occupies only a small area of the image, the strap features may become difficult for the classification network model to identify after the image has been down-sampled multiple times by the deep learning neural network, whereas traditional image detection methods, such as LSD (Line Segment Detector), can extract such fine image features easily. Therefore, in the embodiment of the invention, the preset strap detection model and the LSD line detection method can be fused to detect the strap features together.

In one possible case, determining the strap type in the strap detection area by using the preset strap detection model includes:

Step 41: determining a first strap type and a corresponding first confidence in the strap detection area by using the preset strap detection model.

Step 42: extracting line segments in the strap detection area by using the LSD (Line Segment Detector) line detection method, and performing dilation (expansion) processing on the line segments to obtain the strap area.

Because the LSD line detection method converts image edge information into line segment information, and a strap generally has a fairly regular shape (such as a straight line segment or a rectangle), the strap area can be obtained by extracting the line segments in the strap detection area with the LSD line detection method and then dilating them. Referring to fig. 3, fig. 3 is a schematic diagram of a strap extraction process based on LSD line detection according to an embodiment of the present invention, where diagrams d and e are extraction results of the LSD line detection method, and diagram f is the strap area after dilation of the line segments.
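The expansion (morphological dilation) of detected line segments can be sketched as follows. The example assumes the segments have already been produced by a line detector as integer endpoint tuples; it does not implement LSD itself, and the function names and the square structuring element are choices made for illustration only.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed output of a line detector


def rasterize(segments: List[Segment], width: int, height: int) -> List[List[int]]:
    """Draw each segment onto a binary mask by sampling points along it."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in segments:
        steps = max(abs(x2 - x1), abs(y2 - y1), 1)
        for i in range(steps + 1):
            x = round(x1 + (x2 - x1) * i / steps)
            y = round(y1 + (y2 - y1) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                mask[y][x] = 1
    return mask


def dilate(mask: List[List[int]], radius: int) -> List[List[int]]:
    """Binary dilation with a (2 * radius + 1) square structuring element."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


# One vertical segment in a 5 x 5 area, widened by one pixel on each side.
strap_mask = dilate(rasterize([(2, 0, 2, 4)], width=5, height=5), radius=1)
```

Here a one-pixel-wide vertical line becomes a three-pixel-wide band, which plays the role of the strap area in this toy setting.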

Step 43: determining a second strap type and a corresponding second confidence according to the strap area, and determining the strap type from the first strap type and the second strap type according to the first confidence and the second confidence.

It can be understood that a classification network model may also be used to classify the image corresponding to the strap area, so as to determine the second strap type. It can also be understood that a classification network model attaches a confidence to each recognition result and takes the recognition result with the highest confidence as its output. Because the invention uses two classification recognition networks to detect the strap type, the confidences (namely the first confidence and the second confidence) of the two classification recognition results (namely the first strap type and the second strap type) can be combined to determine the final strap type. The embodiment of the invention does not limit the manner in which the confidences are combined; for example, a weighted calculation may be performed on the confidences, or an OR operation may be performed using the confidences, and this can be set according to the actual application requirements.
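One possible way of combining the two results by weighted confidence is sketched below. The embodiment does not fix the combination rule, so the function `fuse_strap_predictions`, the weight parameter `first_weight`, and its default value are assumptions for illustration.

```python
from typing import Tuple

Prediction = Tuple[str, float]  # (strap type, confidence in [0, 1])


def fuse_strap_predictions(
    first: Prediction,
    second: Prediction,
    first_weight: float = 0.5,  # hypothetical weight for the first result
) -> str:
    """Choose between the two strap types by comparing weighted confidences."""
    first_type, first_conf = first
    second_type, second_conf = second
    if first_type == second_type:
        return first_type  # both results agree, nothing to resolve
    first_score = first_weight * first_conf
    second_score = (1.0 - first_weight) * second_conf
    return first_type if first_score >= second_score else second_type


equal_weights = fuse_strap_predictions(("single_strap", 0.6), ("double_strap", 0.9))
trust_first = fuse_strap_predictions(
    ("single_strap", 0.6), ("double_strap", 0.9), first_weight=0.7
)
```

With equal weights the more confident second result wins; raising `first_weight` to 0.7 makes the first result win instead, which shows how the weighting could be tuned to the application.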

Based on this embodiment, a backpack detection area and a strap detection area are first extracted from the image to be detected, and detection is performed in the corresponding areas by using the preset backpack detection model and the preset strap detection model respectively. When the preset backpack detection model cannot detect a backpack, the backpack type detection result is determined from the strap type determined by the preset strap detection model. In other words, the method performs backpack detection by using both the main body features and the strap features of the backpack, so that the strap features can still be detected when the backpack main body is occluded by the human body, which effectively improves the application flexibility of pedestrian backpack detection. Meanwhile, because the backpack detection area contains the strap detection area, the backpack detection result and the strap detection result are ensured to come from the same backpack, which further ensures the accuracy and reliability of pedestrian backpack detection and ultimately improves its application effect.
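The fallback logic summarized above can be sketched as follows. The two models are represented by plain callables, and the convention that the backpack model returns `None` when no backpack is found is an assumption of this example, as are all label strings.

```python
from typing import Callable, Dict, Optional

# Hypothetical label names; the mapping follows the examples given in the text.
STRAP_TYPE_TO_BACKPACK_TYPE: Dict[str, str] = {
    "no_strap": "no_bag",
    "single_strap": "single_shoulder_bag_or_handbag",
    "double_strap": "double_shoulder_bag",
}


def detect_backpack_type(
    backpack_area: object,
    strap_area: object,
    backpack_model: Callable[[object], Optional[str]],
    strap_model: Callable[[object], str],
) -> str:
    """Use the backpack model first; fall back to the strap model if it finds nothing."""
    backpack_type = backpack_model(backpack_area)
    if backpack_type is not None:
        return backpack_type
    return STRAP_TYPE_TO_BACKPACK_TYPE[strap_model(strap_area)]


# Occluded main body: the backpack model finds nothing, the strap model sees two straps.
occluded = detect_backpack_type("area", "strap", lambda a: None, lambda s: "double_strap")
# Visible main body: the backpack model result is used directly.
visible = detect_backpack_type("area", "strap", lambda a: "handbag", lambda s: "single_strap")
```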

Based on the above embodiment, the following describes the training process of the preset backpack detection model and the preset strap detection model. In a possible case, before acquiring the image to be detected, the method may further include:

S201, obtaining labeled training images, and extracting the backpack area image in each training image.

It can be understood that the training images need to be labeled, i.e., annotated with a backpack type and a strap type. The embodiment of the invention does not limit the number of training images, which can be set according to the actual application requirements. It can be understood that, in general, the larger the number of training images, the higher the recognition rate of the classification network model.

Further, for the extraction of the backpack area image from the training image, reference may be made to the above embodiment; it is consistent with the description of extracting the backpack detection area from the image to be detected.

S202, normalizing the backpack area image, and performing feature extraction on the normalized backpack area image by using the basic network to obtain a backpack area feature map.

In the embodiment of the invention, it is considered that human bodies differ in size in the image, so the backpack area images generated from the human body areas also differ in size, which affects both model training and the slicing of the strap area. To ensure that the areas are aligned, the backpack area images are normalized, i.e., all backpack area images used for training are scaled to the same size. The embodiment of the invention does not limit the size of the normalized backpack area image, which can be set according to the actual application requirements, for example, 224 pixels by 224 pixels.
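Size normalization can be illustrated with a nearest-neighbor resize. This is only a sketch: the embodiment does not specify the interpolation method, and the function name `resize_nearest` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # toy single-channel image (rows x columns)


def resize_nearest(image: Image, out_height: int, out_width: int) -> Image:
    """Scale an image to a fixed size by nearest-neighbor sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


small = resize_nearest([[1, 2], [3, 4]], 4, 4)
normalized = resize_nearest([[1, 2, 3], [4, 5, 6]], 224, 224)  # any input size maps to 224 x 224
```

After this step, backpack area images cropped from large and small pedestrians have identical dimensions, so a fixed slicing ratio selects a comparable strap area in each.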

Further, the embodiment of the present invention does not limit the specific process of extracting the image features by the basic network, and the related technologies may be referred to according to the specifically selected basic network (such as ResNet18, VGG-16, etc.).

S203, slicing the backpack area feature map to obtain a strap area feature map.

After the feature extraction is completed, the backpack area feature map can be sliced (Slice) to obtain the strap area feature map, and the two branch models are then trained for classification on the basis of the same basic network, so as to improve the classification training efficiency. It should be noted that the embodiment of the present invention does not limit the specific slicing process; reference may be made to the related technology of the slice layer (Slice Layer), as long as the strap area can be extracted from the backpack area. It can be understood that the strap area may be cut from the backpack area at a predetermined slicing ratio. The embodiment of the invention does not limit the specific value of the slicing ratio; its setting is related to the proportion parameters used for extracting the backpack detection area and the strap detection area, and can be made with reference to the proportion parameters in the above embodiment.
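The slicing step can be sketched as a ratio-based crop. The ratio parameters below are placeholders chosen for the example; the embodiment leaves their values open.

```python
from typing import List

FeatureMap = List[List[float]]  # toy 2-D feature map (rows x columns)


def slice_by_ratio(
    feature_map: FeatureMap,
    top_ratio: float,
    bottom_ratio: float,
    left_ratio: float,
    right_ratio: float,
) -> FeatureMap:
    """Cut a sub-region out of a feature map using ratios of its height and width."""
    height, width = len(feature_map), len(feature_map[0])
    top, bottom = int(height * top_ratio), int(height * bottom_ratio)
    left, right = int(width * left_ratio), int(width * right_ratio)
    return [row[left:right] for row in feature_map[top:bottom]]


backpack_features = [[float(4 * r + c) for c in range(4)] for r in range(4)]
# Hypothetical ratios: upper half of the rows, middle half of the columns.
strap_features = slice_by_ratio(backpack_features, 0.0, 0.5, 0.25, 0.75)
```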

S204, performing classification training on the preset backpack detection model and the preset strap detection model by using the backpack area feature map and the strap area feature map respectively.

It can be understood that, since the preset backpack detection model and the preset strap detection model are classification network models, they need to undergo classification training. The embodiment of the invention does not limit the specific classification training manner of the two models, and reference may be made to the related technology of classification network models. It should be noted that a general basic network down-samples the input image, trading image quality for lower video memory consumption and higher calculation efficiency. However, strap features are small, and repeated down-sampling may seriously damage them, so the number of down-sampling operations of the basic classification network may be appropriately reduced when the strap detection model is trained. Further, in order to enhance the detail information of the strap area, the strap area feature map may be input to a convolution layer (Convolution Layer) for convolution calculation. It should be noted that the embodiment of the present invention does not limit the specific manner of convolution calculation, and reference may be made to related technologies.
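The effect of repeated down-sampling on a thin strap can be checked with simple arithmetic. The strap width of 8 pixels and the stride of 2 are hypothetical values chosen for the example, not figures from the embodiment.

```python
def size_after_downsampling(pixels: float, times: int, stride: int = 2) -> float:
    """Return the extent of a feature after `times` down-sampling steps of a given stride."""
    return pixels / (stride ** times)


# A strap assumed to be 8 pixels wide in the normalized input image.
after_five = size_after_downsampling(8, times=5)   # 8 / 32
after_three = size_after_downsampling(8, times=3)  # 8 / 8
```

Under these assumptions the strap shrinks to a quarter of a feature-map cell after five down-sampling steps but still covers a full cell after three, which is the motivation for reducing the number of down-sampling operations in the strap branch.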

Referring to fig. 4, fig. 4 is a schematic diagram of a training flow of a multi-task network model provided in an embodiment of the present invention. It can be seen that the network model is divided into two branches. The main branch (i.e., the branch corresponding to the preset backpack detection model) is composed of a main branch feature extraction layer, a Pooling1 layer, an IP-Layer1 layer, a softmax1 layer, and a Loss1 layer, and its corresponding backpack type label is Label1, where the Pooling layer, the IP-Layer, the softmax layer, and the Loss layer are common structures in classification network models. The secondary branch (i.e., the branch corresponding to the preset strap detection model) is composed of a Slices Layer, a Con2 layer (convolution layer), a Pooling2 layer, an IP-Layer2 layer, and a softmax2 layer. The two branches share the same feature extraction result and the Shared Layers formed by the basic network.

When the multi-task network model is trained, the two types of labels can be fed into the network at the same time, so that the two branches are trained simultaneously; alternatively, the backpack type label Label1 can be fed in first to train the main branch, and after the model converges, the learning rate of the shared basic network modules (Shared Layers) is fixed and the strap branch is then trained on the basis of the converged model. Based on this embodiment, the preset backpack detection model and the preset strap detection model can be trained by using the same basic network, which effectively improves the training efficiency; meanwhile, because the backpack area images used in training are size-normalized, their sizes are uniform, so that the backpack areas and the strap areas of the training images are effectively aligned, which further improves the training efficiency.
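When both labels are fed in at the same time, one common way to train two classification branches jointly is to sum their softmax cross-entropy losses. The sketch below illustrates that idea only; the secondary-branch loss and the weight `strap_weight` are assumptions, since the text names a Loss1 layer for the main branch and does not state how the branch losses are combined.

```python
import math
from typing import List


def softmax_cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of softmax(logits) against an integer class label."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def joint_loss(
    backpack_logits: List[float],
    backpack_label: int,
    strap_logits: List[float],
    strap_label: int,
    strap_weight: float = 1.0,  # hypothetical weighting of the strap branch
) -> float:
    """Sum of the two branch losses, so that both branches receive gradients together."""
    main_loss = softmax_cross_entropy(backpack_logits, backpack_label)
    strap_loss = softmax_cross_entropy(strap_logits, strap_label)
    return main_loss + strap_weight * strap_loss


# Uninformative logits: two backpack classes and three strap classes.
example_loss = joint_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0], 1)
```

For uniform logits each branch loss equals the logarithm of its class count, so the joint loss in this example is ln 2 + ln 3.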

The following describes a pedestrian backpack detection apparatus, an electronic device, and a storage medium according to embodiments of the present invention, and the pedestrian backpack detection apparatus, the electronic device, and the storage medium described below and the pedestrian backpack detection method described above may be referred to in correspondence.

Referring to fig. 5, fig. 5 is a block diagram of a pedestrian backpack detection apparatus according to an embodiment of the present invention, the apparatus may include:

the region determining module 501 is configured to obtain an image to be detected, and determine a backpack detection region and a strap detection region in the image to be detected; the backpack detection region comprises the strap detection region;

the first detection module 502 is configured to generate a backpack type detection result corresponding to a backpack when the backpack is detected in the backpack detection area by the preset backpack detection model;

the second detection module 503 is configured to, when the preset backpack detection model does not detect a backpack in the backpack detection region, determine the strap type in the strap detection region by using the preset strap detection model, and determine a backpack type detection result according to the strap type.

Optionally, the area determining module 501 may include:

the key point extraction submodule is used for extracting a human body region in the image to be detected and extracting human body key point data in the human body region;

and the region determining submodule is used for generating a backpack detection region and a strap detection region by using the human body key point data.

Optionally, the region determining sub-module may include:

an inclination determination unit, configured to determine whether the human body is inclined by using the shoulder key point data, the hip key point data, and the ankle key point data;

a first region generation unit, configured to, if the human body is inclined, calculate a frame compensation amount by using the shoulder key point data, the hip key point data, and the ankle key point data, and generate a strap detection region by using the shoulder key point data, the hip key point data, the ankle key point data, and the frame compensation amount;

a second region generation unit, configured to, if the human body is not inclined, generate a strap detection region by using the shoulder key point data, the hip key point data, and the ankle key point data;

and a third region generation unit, configured to generate a backpack detection region by using the position data of the strap detection region.

Optionally, the area determining module 501 may further include:

the abnormal area judgment submodule is used for judging whether the backpack detection area exceeds the human body area;

and the abnormal area removing submodule is used for, if so, performing edge supplementing processing on the part of the backpack detection area that exceeds the human body area by using the image to be detected.

Optionally, the second detecting module 503 may include:

the third detection submodule is used for determining a first strap type and a corresponding first confidence coefficient in a strap detection area by using a preset strap detection model;

the fourth detection submodule is used for extracting line segments in the strap detection area by using the LSD (Line Segment Detector) line detection method and performing dilation (expansion) processing on the line segments to obtain a strap area;

and the strap type determining submodule is used for determining a second strap type and a corresponding second confidence according to the strap area, and determining the strap type from the first strap type and the second strap type according to the first confidence and the second confidence.

Optionally, the apparatus may further include:

the backpack area extracting module is used for acquiring the marked training image and extracting a backpack area image in the training image;

the feature extraction module is used for normalizing the backpack area image and performing feature extraction on the normalized backpack area image by using a basic network to obtain a backpack area feature map;

the slicing module is used for slicing the backpack area feature map to obtain a strap area feature map;

and the training module is used for performing classification training on the preset backpack detection model and the preset strap detection model by using the backpack area feature map and the strap area feature map respectively.

An embodiment of the present invention further provides an electronic device, including:

a memory for storing a computer program;

a processor for implementing the steps of the pedestrian backpack detection method when executing the computer program.

Since the embodiment of the electronic device portion corresponds to the embodiment of the pedestrian backpack detection method portion, please refer to the description of the embodiment of the pedestrian backpack detection method portion for the embodiment of the electronic device portion, which is not repeated here.

The embodiment of the invention also provides a storage medium, wherein a computer program is stored on the storage medium, and when being executed by a processor, the steps of the pedestrian backpack detection method of any embodiment are realized.

Since the embodiment of the storage medium portion corresponds to the embodiment of the pedestrian backpack detection method portion, please refer to the description of the embodiment of the pedestrian backpack detection method portion for the embodiment of the storage medium portion, which is not repeated here.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The pedestrian backpack detection method, the pedestrian backpack detection device, the electronic device and the storage medium provided by the invention are described in detail above. The principles and embodiments of the present invention have been described herein using specific examples, which are presented only to assist in understanding the method and its core concepts of the present invention. It should be noted that, for those skilled in the art, without departing from the principle of the present invention, it is possible to make various improvements and modifications to the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A pedestrian backpack detection method, comprising:

2. The pedestrian backpack detection method according to claim 1, wherein the determining of the backpack detection area and the strap detection area in the image to be detected comprises:

3. The pedestrian backpack detection method of claim 2, wherein when the human body key point data includes shoulder key point data, hip key point data, and ankle key point data, the generating the backpack detection area and the strap detection area using the human body key point data comprises:

4. The pedestrian backpack detection method of claim 2, further comprising, after generating the backpack detection area and the strap detection area using the human body key point data:

judging whether the backpack detection area exceeds the human body area;

5. The pedestrian backpack detection method of claim 1, wherein said determining a strap type in the strap detection area using a preset strap detection model comprises:

and determining a second strap type and a corresponding second confidence according to the strap area, and determining the strap type from the first strap type and the second strap type according to the first confidence and the second confidence.

6. The pedestrian backpack detection method according to any one of claims 1 to 5, wherein when the preset backpack detection model and the preset strap detection model share a basic network, before acquiring the image to be detected, the method further comprises:

7. A pedestrian backpack detection device, comprising:

8. The pedestrian backpack detection device of claim 7, wherein the region determining module comprises:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing the pedestrian backpack detection method according to any one of claims 1 to 6 when executing the computer program.

10. A storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, implement the pedestrian backpack detection method of any one of claims 1 to 6.