CN110717449A - Vehicle annual inspection personnel behavior detection method and device and computer equipment

Vehicle annual inspection personnel behavior detection method and device and computer equipment

Info

Publication number
CN110717449A
Authority
CN
China
Prior art keywords
behavior
position information
category
inspection personnel
detected
Prior art date
Legal status
Pending
Application number
CN201910954333.0A
Other languages
Chinese (zh)
Inventor
周康明
李俊杰
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN201910954333.0A
Publication of CN110717449A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464: Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a behavior detection method and device for vehicle annual inspection personnel and computer equipment. The method comprises the following steps: performing key point detection on a picture sequence to be detected through a key point detection model, connecting a plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix; identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel; positioning the category and the position information of an interested object in the picture sequence to be detected through a target detection model; and generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object. By adopting the method, the influence of each joint of the human body on the behavior can be fully utilized by combining the key points of the vehicle annual inspection personnel and the positioning of the interested object, so that the behavior of the vehicle annual inspection personnel can be more accurately detected.

Description

Vehicle annual inspection personnel behavior detection method and device and computer equipment
Technical Field
The present application relates to the field of vehicle inspection technologies, and in particular, to a method and an apparatus for detecting the behavior of vehicle annual inspection personnel, a computer device, and a storage medium.
Background
Annual vehicle inspection means that every vehicle that has obtained a formal license plate and a driving license must be inspected as required each year, which amounts to an annual physical examination of the vehicle. Annual inspection can eliminate potential safety hazards of a vehicle in time, urge owners to strengthen vehicle maintenance, and reduce traffic accidents.
Annual vehicle inspections are typically performed by trained professional inspection personnel. During the annual inspection process, the inspection personnel should comprehensively examine the vehicle to be inspected according to the specified inspection flow and operation mode. Meanwhile, the vehicle administration station also needs to supervise whether the operation is standard. Traditional supervision usually requires dedicated supervisors, and its result depends on their professional competence; supervision of the inspectors' behavior is therefore often not accurate enough, which leads to a high false-inspection rate in annual vehicle inspection.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for detecting the behavior of vehicle annual inspection personnel, which can detect whether the behavior of the vehicle annual inspection personnel meets the operation requirements, thereby improving the accuracy of vehicle annual inspection.
In order to achieve the above object, in one aspect, an embodiment of the present application provides a behavior detection method for a vehicle annual inspection person, where the method includes:
acquiring a picture sequence to be detected;
performing key point detection on the picture sequence to be detected through a key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected;
connecting the plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix;
identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel;
positioning an interested object in the picture sequence to be detected through a target detection model to obtain the category and position information of the interested object;
and generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
In another aspect, an embodiment of the present application further provides a behavior detection device for vehicle annual inspection personnel, where the device includes:
the acquisition module is used for acquiring a picture sequence to be detected;
the key point detection module is used for detecting key points of the picture sequence to be detected through the key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected;
the matrix generation module is used for connecting the plurality of key points to obtain a human body skeleton, marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix;
the behavior type identification module is used for identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel;
the target detection module is used for positioning an interested object in the picture sequence to be detected through the target detection model to obtain the category and the position information of the interested object;
and the detection result generation module is used for generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
In yet another aspect, the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
In yet another aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the method as described above.
According to the behavior detection method, the behavior detection device, the computer equipment and the storage medium for the annual vehicle inspector, the sequence of the pictures to be detected is obtained, and a plurality of key points in each picture to be detected are detected based on the deep learning theory. And connecting the plurality of key points to obtain a human body skeleton, and marking the positions of the pixel points occupied by the human body skeleton to obtain an initial characteristic matrix. And identifying the initial characteristic matrix based on a deep learning theory to obtain the behavior category of the vehicle annual inspection personnel. And then detecting the category and the position information of the interested object in the picture sequence to be detected based on a deep learning theory. And finally, judging whether the behavior of the vehicle annual inspection personnel meets the requirements or not according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object. The method is based on a deep learning theory, combines the position information of key points of vehicle annual inspection personnel and the positioning of interested objects, and makes full use of the influence of each joint of a human body on the behavior, so that the behavior detection judgment result of the vehicle annual inspection personnel is more accurate.
Drawings
FIG. 1 is a diagram of an exemplary embodiment of an application environment of a method for detecting vehicle annual inspector behavior;
FIG. 2 is a schematic flow chart illustrating a method for detecting vehicle annual inspector behavior according to one embodiment;
FIG. 3 is a schematic diagram of a process for obtaining an initial feature matrix according to an embodiment;
FIG. 4 is a schematic diagram of an initial feature matrix in one embodiment;
FIG. 5 is a flow diagram that illustrates the identification of a behavior category for a vehicle inspector using a behavior category identification network, under an embodiment;
FIG. 6 is a schematic flow chart illustrating the use of an object detection model to locate an object of interest in one embodiment;
FIG. 7 is a flowchart illustrating a method for detecting vehicle annual inspector behavior according to one embodiment;
FIG. 8 is a block diagram showing a behavior detection device for a vehicle inspector in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The behavior detection method for the vehicle annual inspection personnel can be applied to the application environment shown in fig. 1. The application environment includes a terminal 102, a server 104, and an image capture device 106. The terminal 102 may be an electronic device with strong data storage and computing capabilities. The image capturing device 106 may be integrated in the terminal 102 or may be a separate device. The terminal 102 and the server 104 communicate via a network. The terminal 102 may be provided with a trained key point detection model, a behavior category recognition network, and an object detection model. Specifically, the image acquisition device 106 acquires a picture sequence to be detected, and the terminal 102 obtains the picture sequence to be detected. The terminal 102 performs key point detection on the picture sequence to be detected through the key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected, and connects the plurality of key points to obtain the human skeleton. The terminal 102 marks the positions of the pixel points occupied by the human skeleton to obtain an initial feature matrix. The terminal 102 identifies the initial feature matrix through the behavior category identification network to obtain the behavior category of the vehicle annual inspection personnel. The terminal 102 positions the object of interest in the picture sequence to be detected through the target detection model to obtain the category and position information of the object of interest. The terminal 102 generates a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
In other embodiments, the behavior detection method for vehicle annual inspection personnel provided by this application can also be applied jointly on the terminal side and the server side: the image acquisition device 106 acquires the picture sequence to be detected, the terminal 102 sends the picture sequence to the server 104 via a network connection or the like, and the server 104 then detects the behavior of the vehicle annual inspection personnel according to the picture sequence to be detected. The terminal 102 may be, but is not limited to, various portable mobile devices, and the server 104 may be a local server or a remote server.
In one embodiment, as shown in fig. 2, a behavior detection method for a vehicle annual inspection personnel is provided, which is described by taking the method as an example applied to the terminal 102 in fig. 1, and includes the following steps:
step 202, acquiring a picture sequence to be detected.
The picture sequence to be detected is a series of pictures of the vehicle annual inspection personnel that are acquired continuously, in sequence, at different times and from different directions. In this embodiment, the picture sequence to be detected may be obtained from a video captured by the image capturing device by sampling the captured video at a preset interval. The preset interval may be determined according to the duration of the captured video; for example, if the captured video is 3 minutes long, the preset interval may be 2 seconds.
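For illustration only, the sampling step described above can be sketched as follows, assuming OpenCV is used to read the captured video; the file path, interval and function names are hypothetical, not part of the patent.

```python
import cv2

def sample_frames(video_path, interval_seconds=2.0):
    """Sample the captured video at a preset interval to build the picture sequence to be detected."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0          # fall back if the container reports no FPS
    step = max(int(round(fps * interval_seconds)), 1)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)                          # one picture to be detected
        index += 1
    capture.release()
    return frames

# e.g. a 3-minute video sampled every 2 seconds yields roughly 90 pictures to be detected
pictures = sample_frames("inspection.mp4", interval_seconds=2.0)
```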
And 204, performing key point detection on the picture sequence to be detected through the key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected.
The key point detection model may be any deep-learning-based key point detection network, such as OpenPose (an open-source human pose estimation project). In this embodiment, the OpenPose model is taken as an example of the key point detection model. Specifically, the picture to be detected is input into a well-trained key point detection model, and features are extracted through a convolutional network to obtain a group of feature maps. A two-branch, multi-stage (e.g., 6-stage) convolutional neural network is then used to predict key point confidences and key point affinity vectors, where the first branch predicts a set of confidence maps for body part positions (e.g., elbow, knee, etc.) and the second branch predicts a set of part affinity vectors. The location of each key point can be found by taking the maximum of its confidence map.
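As a non-authoritative sketch of the last step above (locating each key point at the maximum of its confidence map), assuming the first branch has already produced a stack of per-part confidence maps as a NumPy array; the array shape and threshold are assumptions.

```python
import numpy as np

def keypoints_from_confidence_maps(confidence_maps, threshold=0.1):
    """confidence_maps: array of shape (num_parts, H, W) output by the confidence-map branch.
    Returns one (x, y) location per body part, or None if the peak is too weak."""
    keypoints = []
    for part_map in confidence_maps:
        y, x = np.unravel_index(np.argmax(part_map), part_map.shape)  # maximum of the confidence map
        score = part_map[y, x]
        keypoints.append((int(x), int(y)) if score >= threshold else None)
    return keypoints

# toy example: 14 body parts on a 40 x 30 map
maps = np.random.rand(14, 40, 30)
print(keypoints_from_confidence_maps(maps))
```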
And step 206, connecting the plurality of key points to obtain a human body skeleton, and marking the positions of the pixel points occupied by the human body skeleton to obtain an initial characteristic matrix.
Specifically, after the key points of the human body are detected, all the key points of the same person may be connected according to a preset sequence to obtain the human skeleton. The preset sequence may refer to the output sequence of the plurality of key points, and is not limited herein. After the human body skeleton is obtained by connection, the pixel positions occupied by the skeleton and the remaining positions are marked separately to obtain an initial characteristic matrix.
And 208, identifying the initial characteristic matrix through the behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel.
The behavior category identification network can be any of various deep-learning-based classification models. In this embodiment, an improved deep learning network is adopted as the behavior category identification network; it consists of a feature extraction network, an LSTM (Long Short-Term Memory) network, and a subsequent fully connected network. Specifically, the initial feature matrix is passed through the feature extraction network of a well-trained behavior category identification network to extract skeleton morphological features, yielding a two-dimensional matrix. The two-dimensional matrix is then input into the long short-term memory network to extract sequence features, which are rearranged into a one-dimensional vector. Finally, the one-dimensional vector is input into the fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
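The composition described here (feature extraction network, LSTM, fully connected classifier) can be sketched in PyTorch as follows. This is a simplified assumption of such an architecture, not the patented implementation; the ResNet-18 backbone, layer sizes and input shape are illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class BehaviorClassifier(nn.Module):
    """Feature extraction network + LSTM + fully connected classifier (illustrative sizes)."""
    def __init__(self, num_classes, feature_dim=512, hidden_dim=256):
        super().__init__()
        backbone = resnet18(weights=None)
        # accept the single-channel 0-1 initial feature matrix instead of RGB images
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, sequence):
        # sequence: (batch, time, 1, H, W) stack of initial feature matrices
        b, t = sequence.shape[:2]
        frames = sequence.flatten(0, 1)                        # (b*t, 1, H, W)
        skeleton_features = self.features(frames).flatten(1)   # (b*t, feature_dim)
        skeleton_features = skeleton_features.view(b, t, -1)
        sequence_features, _ = self.lstm(skeleton_features)    # sequence features over time
        return self.fc(sequence_features[:, -1])               # scores per behavior category

scores = BehaviorClassifier(num_classes=6)(torch.rand(2, 8, 1, 40, 30))
print(scores.shape)  # torch.Size([2, 6])
```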
In this embodiment, the behavior category identification network needs to be trained before it is used to identify the behavior category of the vehicle annual inspection personnel. The training samples may be acquired as follows: obtain a plurality of video samples that cover the various actions of vehicle annual inspection personnel, for example photographing, knocking, pulling, winding, pushing a door, lifting an object, and so on. Each video sample is sampled to obtain a picture sequence sample. A plurality of key points of each picture sample are obtained through the key point detection model, the key points are connected to obtain a human skeleton, and the pixel positions of the human skeleton are marked to obtain initial feature matrix samples. The initial feature matrix samples are then used to train the behavior category identification network to obtain its parameters, and these parameters are used to configure the behavior category identification network in this embodiment.
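A hedged sketch of this training step, assuming a plain PyTorch training loop over pre-built samples; the BehaviorClassifier class is the sketch above, and the tensors, label count and hyperparameters are placeholders rather than values from the patent.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# hypothetical pre-built samples: (N, time, 1, H, W) initial feature matrices and integer behavior labels
samples = torch.rand(32, 8, 1, 40, 30)
labels = torch.randint(0, 6, (32,))
loader = DataLoader(TensorDataset(samples, labels), batch_size=8, shuffle=True)

model = BehaviorClassifier(num_classes=6)            # defined in the previous sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):                              # illustrative epoch count
    for batch, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), target)       # classify each sequence of skeleton matrices
        loss.backward()
        optimizer.step()
```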
Step 210, positioning the interested object in the picture sequence to be detected through the target detection model, and obtaining the category and position information of the interested object.
The target detection model may be a model such as Fast R-CNN (a target detection network), SSD (Single Shot multibox Detector), or YOLO (You Only Look Once, a single-stage detector), and is not limited herein. The object of interest may refer to a tool that may be used in the vehicle inspection process, such as a camera, a mobile phone, or a hammer. Specifically, each picture in the picture sequence to be detected is input into a well-trained target detection model, which outputs scores and position information for the object-of-interest categories in each picture to be detected, and the category with the highest score is determined as the category of the object of interest being used by the vehicle annual inspection personnel.
And step 212, generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
Specifically, whether the behavior category of the vehicle annual inspection personnel matches the category of the object of interest can be judged according to the requirements of the vehicle annual inspection operation specifications. It is then further judged whether the position information of the key points and the position information of the object of interest meet the requirements. If the behavior category of the vehicle annual inspection personnel matches the category of the object of interest, and the position information of the key points and the position information of the object of interest meet the requirements, a result indicating that the behavior detection of the vehicle annual inspection personnel is qualified can be generated. For example, the behavior category of the vehicle annual inspection personnel obtained through the behavior category identification network is photographing, and the category of the object of interest obtained through the target detection model is a mobile phone. According to the vehicle annual inspection operation specifications, photographing requires holding a mobile phone or a camera, so the behavior category of the vehicle annual inspection personnel can be judged to match the category of the object of interest being used. If it is further judged that the position of the hand key points is close to the position of the object of interest, a result indicating that the behavior detection of the vehicle annual inspection personnel is qualified can be generated.
According to the behavior detection method of the annual inspection personnel of the vehicle, a plurality of key points in each picture to be detected are obtained through detection of the key point detection model. And marking the positions of the pixel points occupied by the human skeleton obtained by connecting the plurality of key points to obtain an initial characteristic matrix. And identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel. And detecting through the target detection model to obtain the category and position information of the interested object in the picture sequence to be detected. And finally, judging whether the behavior of the vehicle annual inspection personnel meets the requirements or not according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object. The method is based on a deep learning theory, combines the position information of key points of vehicle annual inspection personnel and the positioning of interested objects, and makes full use of the influence of each joint of a human body on the behavior, so that the behavior detection judgment result of the vehicle annual inspection personnel is more accurate.
In one embodiment, as shown in fig. 3, connecting a plurality of key points to obtain a human skeleton, and marking the positions of pixel points occupied by the human skeleton to obtain an initial feature matrix, includes the following steps:
step 302, marking a plurality of key points on each to-be-detected drawing sheet according to the position information of the key points.
Wherein, the key points can include but are not limited to the following parts: 1) nose, 2) neck, 3) right shoulder, 4) right elbow, 5) right wrist, 6) left shoulder, 7) left elbow, 8) left wrist, 9) right hip, 10) right knee, 11) right ankle, 12) left hip, 13) left knee, 14) left ankle. The position information of the key points may refer to horizontal and vertical coordinate data of the key points. In particular, the key points can be marked on the image to be detected according to respective horizontal and vertical coordinates.
And 304, adjusting the to-be-detected picture marked with the key points to a preset size, and adjusting the position information of the key points on the to-be-detected picture according to the preset size.
Specifically, since the sizes of the source pictures or source videos acquired by different image acquisition devices may differ, in order to adapt to different sources, the pictures to be detected may be uniformly resized to a preset size after the picture sequence to be detected is obtained. The preset size may be set according to actual needs, for example to 30 × 40 pixels (width × height), without limitation. After the size of the picture to be detected is adjusted, the position information of the key points can be updated correspondingly according to the adjustment ratio of the picture. For example, assume that the size of the acquired picture to be detected is W × H, where W is the width and H is the height, and the preset size is S1 × S2, where S1 is the adjusted width and S2 is the adjusted height. The adjustment ratios of the width and the height of the picture to be detected are then S1/W and S2/H. Suppose the position information of the key points is (X1, Y1), (X2, Y2), …, (Xn, Yn); the adjusted position information of the key points obtained according to the adjustment ratios is then:
(X1·S1/W, Y1·S2/H), (X2·S1/W, Y2·S2/H), …, (Xn·S1/W, Yn·S2/H).
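A minimal sketch of this resize-and-rescale step, assuming the picture is a NumPy image array and each key point is an (x, y) pixel coordinate or None; the function and variable names are assumptions.

```python
import cv2

def resize_with_keypoints(picture, keypoints, preset_size=(30, 40)):
    """Resize a picture to the preset size and rescale key point coordinates by the same ratios.
    preset_size is (S1, S2) = (width, height); keypoints are (x, y) tuples or None."""
    height, width = picture.shape[:2]                  # original H x W
    s1, s2 = preset_size
    resized = cv2.resize(picture, (s1, s2))            # cv2.resize takes (width, height)
    scaled = [
        None if point is None else (point[0] * s1 / width, point[1] * s2 / height)
        for point in keypoints
    ]
    return resized, scaled
```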
and step 306, connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain the human skeleton.
And 308, marking the positions of the pixel points occupied by the human skeleton to obtain an initial characteristic matrix.
The preset sequence may refer to the link order of the human skeleton. Specifically, bipartite matching may be used to find part associations (association regions); according to the key points and the association regions, the key points of the same person are connected with vectors in the link order of the human skeleton to form the human skeleton. Then, the pixel positions occupied by the human skeleton and the remaining pixel positions not covered by the skeleton are marked separately to obtain an initial feature matrix. In this embodiment, the pixel positions occupied by the human skeleton may be marked as 1 and the other pixel positions as 0, giving the 0-1 initial feature matrix shown in fig. 4.
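One way to realize this marking step, assuming the rescaled key points from the previous step and using cv2.line to rasterize each skeleton segment into a zero matrix; the link order below is an illustrative chain over the 14 parts listed earlier, not the patent's preset sequence.

```python
import numpy as np
import cv2

# illustrative link order (indices into the 14-key-point list: nose, neck, shoulders, ...)
SKELETON_LINKS = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
                  (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13)]

def initial_feature_matrix(keypoints, size=(40, 30)):
    """Mark pixels covered by the human skeleton as 1 and the background as 0; size is (height, width)."""
    matrix = np.zeros(size, dtype=np.uint8)
    for a, b in SKELETON_LINKS:
        if keypoints[a] is not None and keypoints[b] is not None:
            pa = (int(round(keypoints[a][0])), int(round(keypoints[a][1])))
            pb = (int(round(keypoints[b][0])), int(round(keypoints[b][1])))
            cv2.line(matrix, pa, pb, color=1, thickness=1)   # draw the skeleton segment as 1s
    return matrix
```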
In one embodiment, as shown in fig. 5, the identifying the initial feature matrix by the behavior class identification network to obtain the behavior class of the vehicle annual inspection personnel includes the following steps:
step 502, extracting the characteristics of the initial characteristic matrix to obtain a two-dimensional matrix.
A ResNet residual network may be used as the feature extraction network of the behavior category identification network in this embodiment. Specifically, the initial feature matrix is input into the ResNet residual network, which extracts by convolution the morphological features of the human skeleton formed by connecting the key points in the picture to be detected. In this embodiment, to facilitate numerical processing, the extracted skeleton morphological features may be mapped to a two-dimensional matrix.
Step 504, inputting the two-dimensional matrix into the long and short time memory network model to extract sequence features, and rearranging the extracted sequence features to obtain a one-dimensional feature vector.
The Long Short-Term Memory (LSTM) network is an optimized variant of the RNN (Recurrent Neural Network). At time t, the LSTM has three inputs: the input at the current time, the output at the previous time, and the cell state at the previous time. In particular, the features at the current time may be represented as a two-dimensional matrix, i.e. in picture form. The two-dimensional skeleton features at the current moment and the features output by the LSTM at the previous moment can be combined into a two-dimensional matrix that serves as the LSTM input at the current moment. The LSTM output may also be a two-dimensional matrix comprising a plurality of elements. The final two-dimensional matrix is then rearranged into a one-dimensional vector that can be processed by the fully connected network.
Step 506, inputting the one-dimensional vector into the fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
Specifically, the one-dimensional vector obtained through the LSTM is input into the fully connected layer to obtain a prediction score for each behavior category, and the behavior category with the highest prediction score is taken as the behavior category of the vehicle annual inspection personnel in the picture sequence to be detected. In this embodiment, Softmax normalization may be added after the fully connected layer; mapping the prediction scores to probabilities through Softmax improves interpretability and facilitates subsequent operations such as thresholding.
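A small sketch of this final classification step with the Softmax mapping mentioned above, assuming the fully connected layer has already produced a score vector; the category names and threshold are placeholders.

```python
import torch

BEHAVIOR_CATEGORIES = ["photographing", "knocking", "pulling", "winding", "pushing_door", "lifting"]  # placeholders

def classify_behavior(fc_scores, probability_threshold=0.5):
    """Map fully connected scores to probabilities via Softmax and pick the behavior category."""
    probabilities = torch.softmax(fc_scores, dim=-1)
    best = int(torch.argmax(probabilities))
    if probabilities[best] < probability_threshold:
        return None, probabilities                    # no sufficiently confident category
    return BEHAVIOR_CATEGORIES[best], probabilities

category, probs = classify_behavior(torch.tensor([2.1, 0.3, -0.5, 0.1, 0.0, -1.2]))
print(category)  # "photographing" for this toy score vector
```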
In this embodiment, the obtained initial feature matrix can include information of the human skeleton by marking the position of the human skeleton and the position of the background, respectively. The subsequent behavior category identification network can obtain the behavior categories of the vehicle annual inspection personnel based on the relative position characteristics of the key points, and compared with the traditional filtering method and the method for identifying the human body behaviors by image segmentation, the method has the advantages of good robustness and high accuracy.
In one embodiment, as shown in fig. 6, the step of locating the object of interest in each picture to be detected by the target detection model to obtain the category and position information of the object of interest includes the following steps:
step 602, predicting each picture to be detected in the picture sequence to be detected through the target detection model to obtain a plurality of positioning frames, wherein each positioning frame corresponds to the primary selection scores and the primary selection position information of all the interested object categories.
And step 604, selecting the interested object category with the highest initial selection score.
Here a score may refer to a category score. In this embodiment, the YOLO model is taken as an example of the target detection model; YOLO adopts a one-stage algorithm that is applied directly to the input image and outputs the categories and corresponding position information. Specifically, a plurality of positioning frames are obtained from the picture to be detected through the YOLO model, and each positioning frame comprises a corresponding category score and position information. The positioning frames whose scores are larger than a threshold are selected, and their primary selection position information and primary selection scores are obtained. The category with the highest primary selection score is taken as the category of the object of interest.
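A hedged sketch of selecting candidate frames and the highest-scoring category from a one-stage detector's output; the output format used here (a list of (category, score, box) tuples) is an assumption for illustration, not the YOLO model's actual API.

```python
def select_candidates(detections, score_threshold=0.5):
    """detections: list of (category, score, (x, y, w, h)) tuples from the detector.
    Keep frames whose score exceeds the threshold and pick the category with the highest score."""
    candidates = [d for d in detections if d[1] > score_threshold]
    if not candidates:
        return None, []
    best_category = max(candidates, key=lambda d: d[1])[0]
    return best_category, candidates

detections = [("cell_phone", 0.92, (120, 80, 40, 60)), ("hammer", 0.31, (300, 200, 30, 30))]
category, boxes = select_candidates(detections)
print(category)  # "cell_phone"
```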
And 606, screening the plurality of positioning frames according to the primary selection scores and the primary selection position information to obtain the position information of the interested object.
Specifically, after the plurality of positioning frames with scores larger than the threshold are obtained, a preset algorithm such as NMS (Non-Maximum Suppression) may be adopted to screen them. First, the positioning frame with the highest score is found among all positioning frames; then the overlap IoU (Intersection over Union) between this frame and each remaining frame is calculated in turn, and the frames whose IoU exceeds a certain threshold (for example, 0.7) are removed. The above process is then repeated on the remaining positioning frames until all frames have been processed. The position information of the finally retained positioning frame is determined as the position information of the object of interest. In this embodiment, using the YOLO model increases the detection speed for the object of interest and thus improves the efficiency of supervising and detecting the behavior of the vehicle annual inspection personnel.
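A minimal non-maximum suppression sketch matching the procedure described above (keep the highest-scoring frame, drop frames whose IoU with it exceeds a threshold, repeat); the (x1, y1, x2, y2) box format is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def non_maximum_suppression(boxes, scores, iou_threshold=0.7):
    """Return the indices of the boxes kept after NMS."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)                 # highest-scoring remaining frame
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```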
In one embodiment, generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key point, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object comprises: judging whether the category of the interested object meets the preset requirement or not according to the behavior category of the vehicle annual inspection personnel, judging whether the position information of the key point and the position information of the interested object meet the preset position condition or not when the category of the interested object meets the preset requirement, and if so, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
Here the position information comprises horizontal and vertical coordinate data. Specifically, after the behavior category of the vehicle annual inspection personnel and the category of the object of interest are acquired, whether the tool used by the vehicle annual inspection personnel is consistent with their actual operation behavior (i.e., the behavior category) can be judged according to the category of the object of interest. Then, according to the behavior category of the vehicle annual inspection personnel, the position information of the relevant key points is selected from the key points, and it is judged whether the position information of these key points and the position information of the object of interest meet the preset position condition; if so, the behavior of the vehicle annual inspection personnel is judged to be qualified.
For example, if the behavior category of the vehicle annual inspection personnel is photographing and the category of the object of interest is a mobile phone, it can be determined that the tool used by the vehicle annual inspection personnel is correct. The key points required for the photographing behavior category are set as the right wrist and the left wrist; if the positions of the right wrist and the left wrist are judged to be close to the position of the mobile phone, the behavior of the vehicle annual inspection personnel is considered qualified. Further, the preset position condition may mean that the distances between the coordinates of the right wrist and the left wrist and the coordinates of the mobile phone are smaller than a preset threshold. Let the coordinates of the right wrist be (X1, Y1), the coordinates of the left wrist be (X2, Y2), the coordinates of the mobile phone positioning frame be (X0, Y0), and the preset threshold be T0. Whether the distances between the wrist coordinates and the mobile phone coordinates satisfy the preset threshold can then be judged with the following formula:
|X0 - X1| < T0, |X2 - X0| < T0, |Y0 - Y1| < T0, |Y0 - Y2| < T0.
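The threshold comparison above can be sketched as follows; the coordinate convention (one point per wrist, the centre of the positioning frame for the phone) and the threshold value are assumptions for illustration.

```python
def hands_near_object(right_wrist, left_wrist, object_center, threshold):
    """Check that both wrist key points lie within the preset threshold of the object of interest
    on both axes, mirroring the inequalities above."""
    (x1, y1), (x2, y2), (x0, y0) = right_wrist, left_wrist, object_center
    return (abs(x0 - x1) < threshold and abs(x2 - x0) < threshold
            and abs(y0 - y1) < threshold and abs(y0 - y2) < threshold)

# toy coordinates: both wrists close to the phone centre
print(hands_near_object((100, 200), (140, 205), (120, 202), threshold=50))  # True
```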
For another example, the behavior category of the vehicle annual inspection personnel is acquired as walking, and the category of the object of interest is acquired as a vehicle. The key point required for the walking behavior category is set as the nose or the neck; if the key point position information of the nose or neck of the vehicle annual inspection personnel in four pictures of the picture sequence to be detected is judged to lie respectively above, below, to the left and to the right of the position information of the vehicle, it is considered that the vehicle annual inspection personnel has walked around the vehicle once and the behavior is qualified. Further, the preset position condition may refer to the relative positional relationship between the coordinates of the nose or neck and the coordinates of the vehicle. Taking the nose as an example, suppose the nose coordinates in the four pictures are (N1, M1), (N2, M2), (N3, M3), (N4, M4), and the coordinates of the vehicle positioning frame are (N0, M0); whether the vehicle annual inspection personnel has walked around the vehicle once can be judged with the following formulas:
N0 - N1 < 0, M0 - M1 < 0;
N0 - N2 > 0, M0 - M2 < 0;
N0 - N3 < 0, M0 - M3 > 0;
N0 - N4 > 0, M0 - M4 > 0.
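A sketch of this walk-around check, following the four inequalities above; it is written as a set-based relaxation that accepts the four nose positions in any order, one per quadrant around the vehicle centre, and the coordinate convention is an assumption.

```python
def walked_around_vehicle(nose_positions, vehicle_center):
    """nose_positions: four (N_i, M_i) nose coordinates; vehicle_center: (N0, M0).
    Returns True when the four positions cover all four quadrants around the vehicle."""
    n0, m0 = vehicle_center
    quadrants = {(n0 - n < 0, m0 - m < 0) for n, m in nose_positions}
    return len(quadrants) == 4       # one position per quadrant, i.e. a full circle around the vehicle

print(walked_around_vehicle([(5, 5), (1, 5), (5, 1), (1, 1)], (3, 3)))  # True
```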
As another example, if the behavior category of the vehicle annual inspection personnel is knocking and the category of the object of interest is a hammer, it may be determined that the tool used by the vehicle annual inspection personnel is correct. The key point required for the knocking behavior category is set as the right wrist or the left wrist. Taking the right wrist as an example, if the position of the right wrist is judged to be close to the position of the hammer, the behavior of the vehicle annual inspection personnel is considered qualified. Further, the preset position condition may mean that the distance between the coordinates of the right wrist and the coordinates of the hammer is smaller than a preset threshold. Let the coordinates of the right wrist be (A1, B1), the coordinates of the hammer be (A0, B0), and the preset threshold be T1. Whether the coordinates of the right wrist and the coordinates of the hammer satisfy the preset threshold can be judged with the following formula:
|A0 - A1| < T1, |B0 - B1| < T1.
It is to be understood that the above examples merely illustrate some of the behavior categories of the vehicle annual inspection personnel; the behavior categories may also include others, such as pulling, pushing and lifting. For these, whether the behavior of the vehicle annual inspection personnel meets the requirements can be judged in a similar manner, which is not limited herein.
In this embodiment, a result that the behavior detection of the vehicle annual inspection personnel is not qualified is generated when one of the following conditions is satisfied: it is judged, according to the behavior category of the vehicle annual inspection personnel, that the category of the interested object does not meet the preset requirement; or it is judged, according to the behavior category of the vehicle annual inspection personnel, that the category of the interested object meets the preset requirement, but the position information of the key points and the position information of the interested object do not meet the preset position condition. In this embodiment, the behavior of the vehicle annual inspection personnel is judged by combining the position information of specific key points of the vehicle annual inspection personnel with the position information of the interested object, so the accuracy of judging the behavior of the vehicle annual inspection personnel can be improved.
Further, in this embodiment, when the key point information indicates a plurality of vehicle annual inspection personnel, the behavior category of each vehicle annual inspection person is obtained; if, for at least one vehicle annual inspection person, the category of the interested object corresponding to the behavior category meets the preset requirement and the position information of the corresponding key points and the position information of the interested object meet the preset position condition, a result that the behavior detection of the vehicle annual inspection personnel is qualified is generated.
Specifically, when the key point detection model detects that a plurality of vehicle annual inspection personnel exist in the picture sequence to be detected, a plurality of groups of key points of these personnel can be obtained. The initial feature matrices formed by the groups of key points may be stored in a set V, where vi denotes the initial feature matrix obtained from all key points of the i-th vehicle annual inspection person. The behavior category of each vehicle annual inspection person is then obtained through the behavior category identification network. When, for at least one vehicle annual inspection person, the behavior category matches the category of the interested object and the position information of that person's key points and the position information of the interested object meet the requirements, the behavior of the vehicle annual inspection personnel can be judged to be qualified. In this embodiment, by adding this multi-person detection and classification measure, the behavior of the vehicle annual inspection personnel can still be detected when multiple people appear in the video or picture, which improves the applicability and accuracy of the detection.
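A hedged sketch of this multi-person measure: run the per-person pipeline over the set V of initial feature matrices and mark the detection as qualified if at least one inspector passes. The helper callables are hypothetical stand-ins for the behavior classifier and the category/position rule check described earlier.

```python
def detect_multi_person(initial_feature_matrices, classify_person, check_person):
    """initial_feature_matrices: set V, one entry v_i per detected inspector.
    classify_person and check_person stand in for the behavior category identification
    network and the category/position judgement described above."""
    for matrices in initial_feature_matrices:
        behavior_category = classify_person(matrices)
        if behavior_category is not None and check_person(behavior_category):
            return "qualified"        # at least one inspector behaves as required
    return "not qualified"
```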
In one embodiment, as shown in fig. 7, a method for detecting the behavior of the vehicle annual inspection personnel is described as an embodiment, which includes the following steps:
step 701, acquiring a picture sequence to be detected.
Specifically, the picture sequence to be detected may be obtained from the acquired video by sampling at preset intervals.
And 702, performing key point detection on the picture sequence to be detected through the key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected.
Specifically, the plurality of key points may include 1) a nose, 2) a neck, 3) a right shoulder, 4) a right elbow, 5) a right wrist, 6) a left shoulder, 7) a left elbow, 8) a left wrist, 9) a right hip, 10) a right knee, 11) a right ankle, 12) a left hip, 13) a left knee, 14) a left ankle.
And 703, marking a plurality of key points on each to-be-detected drawing according to the position information of the key points. The position information may refer to the horizontal and vertical coordinates of the key point.
Step 704, adjusting the to-be-detected picture marked with the key points to a preset size, and adjusting the position information of the key points on the to-be-detected picture according to the preset size.
Specifically, the preset dimensions may include dimensions of width and height. And respectively obtaining the adjustment proportion of the width and the height according to the preset size and the size of the original picture to be detected, and then correspondingly adjusting the abscissa value and the ordinate value of the key point according to the adjustment proportion of the width and the height.
Step 705, connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain the human skeleton.
Step 706, marking the pixel point position occupied by the human skeleton to obtain an initial feature matrix.
Specifically, the pixel point position occupied by the human skeleton may be marked as 1, and the other pixel point positions are marked as 0, so as to obtain an initial feature matrix of 0-1.
And 707, identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel.
Specifically, the behavior category identification network may consist of a ResNet feature extraction network, an LSTM network, and a fully connected network. The features of the initial feature matrix are extracted through the ResNet feature extraction network to obtain the skeleton morphological features, which are mapped to a two-dimensional matrix. The two-dimensional matrix is input into the LSTM network to extract sequence features, which are rearranged into a one-dimensional vector. The one-dimensional vector is input into the fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
Step 708, positioning the object of interest in the image sequence to be detected through the target detection model to obtain the category and position information of the object of interest.
Specifically, positioning yields a plurality of positioning frames, and each positioning frame corresponds to a primary selection score and primary selection position information. The category with the highest score among the positioning frames is taken as the category of the object of interest. A final positioning frame is then obtained through NMS according to the primary selection scores and primary selection position information, and the position information of this frame is taken as the position information of the object of interest.
And 709, judging whether the behavior category of the vehicle annual inspection personnel is matched with the category of the interested object, and judging whether the position information of the key point and the position information of the interested object meet a preset position condition.
Specifically, if the behavior category of the vehicle annual inspection personnel is photographing, the category of the interested object may be a mobile phone or a camera; if the behavior category is knocking, the category of the interested object may be a hammer; if the behavior category is walking, the category of the interested object may be a vehicle, and so on, which is not limited herein. By selecting the key points related to the behavior category of the vehicle annual inspection personnel, it can be judged whether the distance between the coordinates of the related key points and the coordinates of the interested object meets the preset threshold.
And step 710, if the behavior category of the vehicle annual inspection personnel is matched with the category of the interested object, and the position information of the key point and the position information of the interested object meet the preset position condition, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
And 711, if the behavior category of the vehicle annual inspection personnel is not matched with the category of the interested object, and/or the position information of the key point and the position information of the interested object do not meet the preset position condition, generating a result that the behavior inspection of the vehicle annual inspection personnel is not qualified.
It should be understood that although the various steps in the flow charts of fig. 1-7 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated otherwise herein, there is no strict order limitation on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1-7 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; the order of execution of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided a behavior detection apparatus 800 for a vehicle annual checking person, including: an obtaining module 801, a key point detecting module 802, a matrix generating module 803, a behavior category identifying module 804, a target detecting module 805, and a detection result generating module 806, wherein:
an obtaining module 801, configured to obtain a picture sequence to be detected;
the key point detection module 802 is configured to perform key point detection on the picture sequence to be detected through the key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected;
the matrix generation module 803 is configured to connect the plurality of key points to obtain a human skeleton, and mark pixel positions occupied by the human skeleton to obtain an initial feature matrix;
the behavior type identification module 804 is used for identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel;
the target detection module 805 is configured to position the object of interest in the to-be-detected picture sequence through the target detection model to obtain the category and the position information of the object of interest;
and the detection result generation module 806 is configured to generate a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
In one embodiment, the matrix generation module 803 is specifically configured to label a plurality of keypoints on each to-be-detected image according to the position information of the keypoints; adjusting the to-be-detected picture marked with the key points to a preset size, and adjusting the position information of the key points on the to-be-detected picture according to the preset size; connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain a human skeleton; and marking the positions of the pixel points occupied by the human skeleton to obtain an initial characteristic matrix.
In an embodiment, the behavior category identification module 804 is specifically configured to perform feature extraction on the initial feature matrix to obtain a two-dimensional matrix; inputting the two-dimensional matrix into a long-time memory network model to extract sequence features, and rearranging the extracted sequence features to obtain a one-dimensional feature vector; and inputting the one-dimensional vectors into a full-connection network for classification to obtain the behavior categories of the vehicle detection personnel.
In one embodiment, the target detection module 805 is specifically configured to predict each to-be-detected picture in the to-be-detected picture sequence through the target detection model to obtain a plurality of positioning frames, where each positioning frame corresponds to the primary selection scores and the primary selection position information of all the interested object categories; select the interested object category with the highest primary selection score; and screen the plurality of positioning frames according to the primary selection scores and the primary selection position information to obtain the position information of the interested object.
In an embodiment, the detection result generating module 806 is specifically configured to determine whether the category of the object of interest meets a preset requirement according to the behavior category of the vehicle annual inspection personnel, determine whether the position information of the key point and the position information of the object of interest meet the preset position condition when the category of the object of interest meets the preset requirement, and if so, generate a result that the behavior detection of the vehicle annual inspection personnel is qualified.
In one embodiment, the detection result generation module 806 is further configured to obtain a behavior category of each vehicle annual inspection person; and if the type of the interested object corresponding to the behavior type of at least one vehicle annual inspection personnel meets the preset requirement, and the position information of the corresponding key point and the position information of the interested object meet the preset position condition, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
In one embodiment, the detection result generation module 806 is further configured to generate a result that the behavior detection of the vehicle annual inspection personnel is not qualified when one of the following conditions is met: judging that the category of the interested object does not meet the preset requirement according to the behavior category of the vehicle annual inspection personnel; and judging that the category of the interested object meets the preset requirement according to the behavior category of the annual inspection personnel of the vehicle, but the position information of the key point and the position information of the interested object do not meet the preset position condition.
For specific limitations of the behavior detection device for vehicle annual inspection personnel, reference may be made to the above limitations of the behavior detection method for vehicle annual inspection personnel, and details are not repeated here. The modules in the behavior detection device for vehicle annual inspection personnel can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded, in the form of hardware, in or independent of a processor in the computer device, or stored, in the form of software, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for detecting the behavior of vehicle annual inspection personnel. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a picture sequence to be detected; performing key point detection on the picture sequence to be detected through a key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected; connecting the plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix; identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel; positioning an interested object in the picture sequence to be detected through a target detection model to obtain the category and position information of the interested object; and generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
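For orientation, the steps restated in this embodiment can be strung together roughly as below. The three model arguments stand in for the key point detection model, the behavior category identification network and the target detection model; their call signatures are assumptions for illustration, and build_initial_feature_matrix and behavior_detection_result are the illustrative helpers sketched earlier in this description, not functions defined by this publication.

def detect_inspector_behavior(frames, keypoint_model, behavior_net, target_model):
    # Key point detection on every picture to be detected in the sequence.
    keypoints_per_frame = [keypoint_model(f) for f in frames]
    # Connect key points into skeletons and rasterize them into initial characteristic matrices.
    matrices = [build_initial_feature_matrix(kps, (f.shape[1], f.shape[0]))
                for kps, f in zip(keypoints_per_frame, frames)]
    # Identify the behavior category of the vehicle annual inspection personnel.
    behavior_category = behavior_net(matrices)
    # Locate the interested object and obtain its category and position information.
    object_category, obj_box = target_model(frames)
    # Combine key-point positions, behavior category and object information into a result.
    return behavior_detection_result(behavior_category, object_category,
                                     keypoints_per_frame[-1], obj_box)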
In one embodiment, the processor, when executing the computer program, further performs the steps of:
marking a plurality of key points on each picture to be detected according to the position information of the key points; adjusting the picture to be detected marked with the key points to a preset size, and adjusting the position information of the key points on the picture to be detected according to the preset size; connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain a human skeleton; and marking the positions of the pixel points occupied by the human skeleton to obtain an initial characteristic matrix.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
performing feature extraction on the initial characteristic matrix to obtain a two-dimensional matrix; inputting the two-dimensional matrix into a long short-term memory (LSTM) network model to extract sequence features, and rearranging the extracted sequence features to obtain a one-dimensional feature vector; and inputting the one-dimensional feature vector into a fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
predicting each picture to be detected in the picture sequence to be detected through a target detection model to obtain a plurality of positioning frames, wherein each positioning frame corresponds to the primary selection scores and the primary selection position information of all interested object categories; selecting the object type with the highest primary selection score; and screening the plurality of positioning frames according to the primary selection score and the primary selection position information to obtain the position information of the interested object.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
judging whether the category of the interested object meets the preset requirement or not according to the behavior category of the vehicle annual inspection personnel, judging whether the position information of the key point and the position information of the interested object meet the preset position condition or not when the category of the interested object meets the preset requirement, and if so, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring the behavior category of each vehicle annual inspection personnel; and if the type of the interested object corresponding to the behavior type of at least one vehicle annual inspection personnel meets the preset requirement, and the position information of the corresponding key point and the position information of the interested object meet the preset position condition, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
when one of the following conditions is met, generating a result that the behavior detection of the vehicle annual inspection personnel is not qualified: judging that the category of the interested object does not meet the preset requirement according to the behavior category of the vehicle annual inspection personnel; and judging that the category of the interested object meets the preset requirement according to the behavior category of the annual inspection personnel of the vehicle, but the position information of the key point and the position information of the interested object do not meet the preset position condition.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a picture sequence to be detected; performing key point detection on the picture sequence to be detected through a key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected; connecting the plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix; identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel; positioning an interested object in the picture sequence to be detected through a target detection model to obtain the category and position information of the interested object; and generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key points, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
In one embodiment, the computer program when executed by the processor further performs the steps of:
marking a plurality of key points on each picture to be detected according to the position information of the key points; adjusting the picture to be detected marked with the key points to a preset size, and adjusting the position information of the key points on the picture to be detected according to the preset size; connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain a human skeleton; and marking the positions of the pixel points occupied by the human skeleton to obtain an initial characteristic matrix.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing feature extraction on the initial characteristic matrix to obtain a two-dimensional matrix; inputting the two-dimensional matrix into a long short-term memory (LSTM) network model to extract sequence features, and rearranging the extracted sequence features to obtain a one-dimensional feature vector; and inputting the one-dimensional feature vector into a fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
In one embodiment, the computer program when executed by the processor further performs the steps of:
predicting each picture to be detected in the picture sequence to be detected through a target detection model to obtain a plurality of positioning frames, wherein each positioning frame corresponds to the primary selection scores and the primary selection position information of all interested object categories; selecting the object type with the highest primary selection score; and screening the plurality of positioning frames according to the primary selection score and the primary selection position information to obtain the position information of the interested object.
In one embodiment, the computer program when executed by the processor further performs the steps of:
judging whether the category of the interested object meets the preset requirement or not according to the behavior category of the vehicle annual inspection personnel, judging whether the position information of the key point and the position information of the interested object meet the preset position condition or not when the category of the interested object meets the preset requirement, and if so, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the behavior category of each vehicle annual inspection personnel; and if the type of the interested object corresponding to the behavior type of at least one vehicle annual inspection personnel meets the preset requirement, and the position information of the corresponding key point and the position information of the interested object meet the preset position condition, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when one of the following conditions is met, generating a result that the behavior detection of the vehicle annual inspection personnel is not qualified: judging that the category of the interested object does not meet the preset requirement according to the behavior category of the vehicle annual inspection personnel; and judging that the category of the interested object meets the preset requirement according to the behavior category of the annual inspection personnel of the vehicle, but the position information of the key point and the position information of the interested object do not meet the preset position condition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments merely express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for detecting the behavior of a vehicle annual inspector, the method comprising:
acquiring a picture sequence to be detected;
performing key point detection on the picture sequence to be detected through a key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected;
connecting the plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix;
identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel;
positioning an interested object in the picture sequence to be detected through a target detection model to obtain the category and position information of the interested object;
and generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key point, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
2. The method of claim 1, wherein the connecting the plurality of key points to obtain a human skeleton, and marking positions of pixel points occupied by the human skeleton to obtain an initial feature matrix comprises:
marking the plurality of key points on each picture to be detected according to the position information of the key points;
adjusting the to-be-detected picture marked with the key points to a preset size, and adjusting the position information of the key points on the to-be-detected picture according to the preset size;
connecting the plurality of key points after the position information is adjusted according to a preset sequence to obtain a human skeleton;
and marking the positions of the pixel points occupied by the human body skeleton to obtain the initial characteristic matrix.
3. The method according to claim 2, wherein in the initial feature matrix, the position of the pixel point occupied by the human skeleton is marked as 1, and the positions of other pixel points are marked as 0.
4. The method of claim 1, wherein the identifying the initial feature matrix through a behavior class identification network to obtain a behavior class of a vehicle annual inspection personnel comprises:
extracting the characteristics of the initial characteristic matrix to obtain a two-dimensional matrix;
inputting the two-dimensional matrix into a long short-term memory (LSTM) network model to extract sequence features, and rearranging the extracted sequence features to obtain a one-dimensional feature vector;
and inputting the one-dimensional feature vector into a fully connected network for classification to obtain the behavior category of the vehicle annual inspection personnel.
5. The method according to claim 1, wherein the locating the object of interest in each picture to be detected through a target detection model to obtain the category and position information of the object of interest comprises:
predicting each picture to be detected in the picture sequence to be detected through the target detection model to obtain a plurality of positioning frames, wherein each positioning frame corresponds to the primary selection scores and the primary selection position information of all interested object categories;
selecting the object type with the highest primary selection score;
and screening the plurality of positioning frames according to the primary selection score and the primary selection position information to obtain the position information of the interested object.
6. The method according to claim 1, wherein the generating the behavior detection result of the vehicle annual inspection personnel according to the position information of the key point, the behavior category of the vehicle annual inspection personnel, the category of the interested object, and the position information of the interested object comprises:
judging whether the category of the interested object meets the preset requirement or not according to the behavior category of the vehicle annual inspection personnel, if so,
and judging whether the position information of the key point and the position information of the interested object meet preset position conditions, and if so, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
7. The method of claim 6, further comprising:
acquiring the behavior category of each vehicle annual inspection personnel;
and if the type of the interested object corresponding to the behavior type of at least one vehicle annual inspection personnel meets the preset requirement, and the corresponding position information of the key point and the position information of the interested object meet the preset position condition, generating a qualified result of the behavior detection of the vehicle annual inspection personnel.
8. The method of claim 6, further comprising:
when one of the following conditions is met, generating a result that the behavior detection of the vehicle annual inspection personnel is not qualified:
judging that the category of the interested object does not meet a preset requirement according to the behavior category of the vehicle annual inspection personnel;
and judging that the category of the interested object meets a preset requirement according to the behavior category of the vehicle annual inspection personnel, but the position information of the key point and the position information of the interested object do not meet the preset position condition.
9. A behavior detection device for a vehicle annual inspector, the device comprising:
the acquisition module is used for acquiring a picture sequence to be detected;
the key point detection module is used for performing key point detection on the picture sequence to be detected through a key point detection model to obtain a plurality of key points in each picture to be detected of the picture sequence to be detected;
the matrix generation module is used for connecting the plurality of key points to obtain a human body skeleton, and marking the positions of pixel points occupied by the human body skeleton to obtain an initial characteristic matrix;
the behavior type identification module is used for identifying the initial characteristic matrix through a behavior type identification network to obtain the behavior type of the vehicle annual inspection personnel;
the target detection module is used for positioning the interested object in the picture sequence to be detected through a target detection model to obtain the category and the position information of the interested object;
and the detection result generation module is used for generating a behavior detection result of the vehicle annual inspection personnel according to the position information of the key point, the behavior category of the vehicle annual inspection personnel, the category of the interested object and the position information of the interested object.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN201910954333.0A 2019-10-09 2019-10-09 Vehicle annual inspection personnel behavior detection method and device and computer equipment Pending CN110717449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910954333.0A CN110717449A (en) 2019-10-09 2019-10-09 Vehicle annual inspection personnel behavior detection method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910954333.0A CN110717449A (en) 2019-10-09 2019-10-09 Vehicle annual inspection personnel behavior detection method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN110717449A true CN110717449A (en) 2020-01-21

Family

ID=69212280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910954333.0A Pending CN110717449A (en) 2019-10-09 2019-10-09 Vehicle annual inspection personnel behavior detection method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN110717449A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597959A (en) * 2020-05-12 2020-08-28 三一重工股份有限公司 Behavior detection method and device and electronic equipment
CN111783724A (en) * 2020-07-14 2020-10-16 上海依图网络科技有限公司 Target object identification method and device
CN112200765A (en) * 2020-09-04 2021-01-08 浙江大华技术股份有限公司 Method and device for determining false-detected key points in vehicle
WO2021164662A1 (en) * 2020-02-18 2021-08-26 上海商汤临港智能科技有限公司 Interaction relationship recognition method and apparatus, and device and storage medium
CN114882597A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target behavior identification method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951889A (en) * 2017-05-23 2017-07-14 煤炭科学技术研究院有限公司 Underground high risk zone moving target monitoring and management system
CN107798301A (en) * 2017-10-13 2018-03-13 上海眼控科技股份有限公司 A kind of signature detection system and method for vehicle annual test
CN108734091A (en) * 2018-03-30 2018-11-02 暨南大学 Compartment anomaly detection method, computer installation and computer readable storage medium
CN110110613A (en) * 2019-04-19 2019-08-09 北京航空航天大学 A kind of rail traffic exception personnel's detection method based on action recognition
CN110309768A (en) * 2019-06-28 2019-10-08 上海眼控科技股份有限公司 The staff's detection method and equipment of car test station

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951889A (en) * 2017-05-23 2017-07-14 煤炭科学技术研究院有限公司 Underground high risk zone moving target monitoring and management system
CN107798301A (en) * 2017-10-13 2018-03-13 上海眼控科技股份有限公司 A kind of signature detection system and method for vehicle annual test
CN108734091A (en) * 2018-03-30 2018-11-02 暨南大学 Compartment anomaly detection method, computer installation and computer readable storage medium
CN110110613A (en) * 2019-04-19 2019-08-09 北京航空航天大学 A kind of rail traffic exception personnel's detection method based on action recognition
CN110309768A (en) * 2019-06-28 2019-10-08 上海眼控科技股份有限公司 The staff's detection method and equipment of car test station

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张力: "Theoretical Analysis of Common Violations by Drive-on Operators on Flat-Plate Brake Test Benches during Vehicle Annual Inspection", Wanfang Data Knowledge Service Platform *
朱炎亮: "Visual Detection Algorithm for Abnormal Personnel Operations Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *
胡占峰: "Recognition of Unsafe Operations of Bus Drivers Based on Depth Information", China Master's Theses Full-text Database, Engineering Science and Technology II Series *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164662A1 (en) * 2020-02-18 2021-08-26 上海商汤临港智能科技有限公司 Interaction relationship recognition method and apparatus, and device and storage medium
CN111597959A (en) * 2020-05-12 2020-08-28 三一重工股份有限公司 Behavior detection method and device and electronic equipment
CN111597959B (en) * 2020-05-12 2023-09-26 盛景智能科技(嘉兴)有限公司 Behavior detection method and device and electronic equipment
CN111783724A (en) * 2020-07-14 2020-10-16 上海依图网络科技有限公司 Target object identification method and device
CN111783724B (en) * 2020-07-14 2024-03-26 上海依图网络科技有限公司 Target object identification method and device
CN112200765A (en) * 2020-09-04 2021-01-08 浙江大华技术股份有限公司 Method and device for determining false-detected key points in vehicle
CN112200765B (en) * 2020-09-04 2024-05-14 浙江大华技术股份有限公司 Method and device for determining false-detected key points in vehicle
CN114882597A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target behavior identification method and device and electronic equipment

Similar Documents

Publication Publication Date Title
WO2021047232A1 (en) Interaction behavior recognition method, apparatus, computer device, and storage medium
CN108399367B (en) Hand motion recognition method and device, computer equipment and readable storage medium
CN110717449A (en) Vehicle annual inspection personnel behavior detection method and device and computer equipment
CN111667011B (en) Damage detection model training and vehicle damage detection method, device, equipment and medium
CN111222423B (en) Target identification method and device based on operation area and computer equipment
CN112534470B (en) System and method for image-based target object inspection
CN110490902B (en) Target tracking method and device applied to smart city and computer equipment
CN111191532A (en) Face recognition method and device based on construction area and computer equipment
WO2020024851A1 (en) Target tracking method, computer device, and storage medium
IL275535B1 (en) Analysis of a captured image to determine a test outcome
CN110807491A (en) License plate image definition model training method, definition detection method and device
CN111144398A (en) Target detection method, target detection device, computer equipment and storage medium
CN111144372A (en) Vehicle detection method, device, computer equipment and storage medium
CN110516559B (en) Target tracking method and device suitable for accurate monitoring and computer equipment
JP2011044046A (en) Object recognition device and object recognition method
CN111008576A (en) Pedestrian detection and model training and updating method, device and readable storage medium thereof
US10395090B2 (en) Symbol detection for desired image reconstruction
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN112241705A (en) Target detection model training method and target detection method based on classification regression
CN113557546B (en) Method, device, equipment and storage medium for detecting associated objects in image
CN110728680A (en) Automobile data recorder detection method and device, computer equipment and storage medium
CN110781887A (en) License plate screw detection method and device and computer equipment
CN109697421A (en) Evaluation method, device, computer equipment and storage medium based on micro- expression
CN114463776A (en) Fall identification method, device, equipment and storage medium
CN115116119A (en) Face recognition system based on digital image processing technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20221018