CN113762168A - Safety alarm method, device, electronic equipment and storage medium

Safety alarm method, device, electronic equipment and storage medium

Info

Publication number
CN113762168A
Authority
CN
China
Prior art keywords
image, vehicle, human body, network, sub
Legal status
Pending
Application number
CN202111056039.1A
Other languages
Chinese (zh)
Inventor
胡方健
余程鹏
王小刚
Current Assignee
Nanjing Leading Technology Co Ltd
Original Assignee
Nanjing Leading Technology Co Ltd
Application filed by Nanjing Leading Technology Co Ltd
Priority to CN202111056039.1A
Publication of CN113762168A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a safety warning method, a safety warning device, an electronic device, and a storage medium, belonging to the technical field of computers. An in-vehicle image of a target vehicle is acquired, and human body key point detection is performed on it to obtain the coordinate positions of a plurality of human body key points. If the coordinate position of at least one of these key points is located in a set area (the area of the in-vehicle image corresponding to a window of the target vehicle), a vehicle-exterior image of the target vehicle, containing that window, is acquired. Whether the exterior image contains a human body image region is then detected, and if it does, safety warning information is output. Through the method in the application, whether a person in the vehicle is leaning toward or extending out of a window can be accurately judged, ensuring personal safety.

Description

Safety alarm method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a safety alarm method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer network technology, intelligent ride services, such as online ride-hailing, are gradually becoming a primary means of travel.
During travel, either the driver or a passenger may extend part of the body out of a window. This behavior is extremely dangerous and can easily cause accidents. Therefore, how to detect whether the body of a passenger or the driver extends out of the window is an urgent problem to be solved during driving.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiments of the present application provide a safety alarm method that can monitor in real time whether a driver or passenger in a vehicle is leaning toward or extending the body out of a window, issue a corresponding warning, and ensure personal safety.
In a first aspect, an embodiment of the present application provides a safety alarm method, where the method includes:
detecting human body key points of the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human body key points;
if the coordinate position of at least one human body key point in the plurality of human body key points is located in a set area, acquiring an image outside the target vehicle; the set area is used for indicating a corresponding area of the window of the target vehicle in the in-vehicle image; the vehicle exterior image comprises a window of the target vehicle;
and if the vehicle exterior image contains a human body image area, outputting safety warning information.
In a possible implementation manner, the performing human body key point detection on the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human body key points includes:
detecting human body key points of the image in the car through a key point detection model to obtain coordinate positions of a plurality of human body key points output by the key point detection model; the key point detection model is obtained by training based on a sample image with human body position marking information.
In one possible implementation, the key point detection model comprises a first convolution sub-network and a first output sub-network; and the performing human body key point detection on the in-vehicle image through the key point detection model to obtain the coordinate positions of the plurality of human body key points output by the key point detection model includes:
inputting the in-vehicle image into the first convolution sub-network, and performing feature extraction on the in-vehicle image through the first convolution sub-network to obtain a first in-vehicle image feature map;
and inputting the first in-vehicle image feature map into the first output sub-network to obtain the coordinate positions of the plurality of human body key points output by the first output sub-network.
In one possible implementation, the key point detection model comprises a second convolution sub-network, a first classification sub-network, a first regression sub-network, and a second output sub-network; and the performing human body key point detection on the in-vehicle image through the key point detection model to obtain the coordinate positions of the plurality of human body key points output by the key point detection model includes:
inputting the in-vehicle image into the second convolution sub-network, and performing feature extraction on the in-vehicle image through the second convolution sub-network to obtain a second in-vehicle image feature map;
inputting the second in-vehicle image feature map into the first classification sub-network, and classifying objects contained in the second in-vehicle image feature map through the first classification sub-network to obtain a first classification feature map;
inputting the first classification feature map into the first regression sub-network, and obtaining an in-vehicle label feature map at the position of the first classification feature map in the labeled human body region through the first regression sub-network;
inputting the in-vehicle annotation feature map into the second output sub-network to obtain the coordinate positions of the plurality of human body key points output by the second output sub-network based on the in-vehicle annotation feature map.
In a possible implementation manner, the acquiring a vehicle-exterior image of the target vehicle if the coordinate position of at least one of the plurality of human body key points is located in a set area includes:
if the coordinate position of the at least one human body key point is located in a set area corresponding to a left window of the target vehicle, acquiring an outside vehicle image of the left side of the target vehicle;
and if the coordinate position of the at least one human body key point is located in a set area corresponding to the right window of the target vehicle, acquiring an outside vehicle image of the right side of the target vehicle.
In a possible implementation manner, before the outputting of safety warning information if the vehicle exterior image contains a human body image region, the method further includes:
carrying out human body detection on the vehicle exterior image through a region detection model, and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model; the region detection model is obtained by training based on a training image with human body key point position marking information.
In one possible approach, the region detection model includes a third convolution sub-network and a second regression sub-network; and the performing human body detection on the vehicle exterior image through the region detection model and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model includes:
inputting the vehicle exterior image into the third convolution sub-network, and performing feature extraction on the vehicle exterior image through the third convolution sub-network to obtain a first vehicle exterior image feature map;
inputting the first vehicle exterior image feature map into the second regression subnetwork, and if a vehicle exterior annotation image marked with a human body region is obtained, determining that the vehicle exterior image contains the human body image region; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
In one possible embodiment, the region detection model includes a fourth convolution sub-network, a second classification sub-network, and a third regression sub-network; and the performing human body detection on the vehicle exterior image through the region detection model and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model includes:
inputting the vehicle exterior image into the fourth convolution sub-network, and performing feature extraction on the vehicle exterior image through the fourth convolution sub-network to obtain a second vehicle exterior image feature map;
inputting the second out-of-vehicle image feature map into the second classification sub-network, and classifying objects contained in the second out-of-vehicle image feature map through the second classification sub-network to obtain a second classification feature map;
inputting the second classification feature map into the third regression subnetwork, and if an outside-vehicle labeled image labeled with a human body region is obtained, determining that the outside-vehicle image contains the human body image region; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
In a second aspect, an embodiment of the present application provides a safety warning device, including:
the determining unit is used for detecting human key points of the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human key points;
the judging unit is used for acquiring an image outside the target vehicle if the coordinate position of at least one human body key point in the plurality of human body key points is located in a set area; the set area is used for indicating a corresponding area of the window of the target vehicle in the in-vehicle image; the vehicle exterior image comprises a window of the target vehicle;
and the warning unit is used for outputting safety warning information if the images outside the vehicle contain human body image areas.
In a possible implementation manner, the determining unit is further configured to:
detecting human body key points of the image in the car through a key point detection model to obtain coordinate positions of a plurality of human body key points output by the key point detection model; the key point detection model is obtained by training based on a sample image with human body position marking information.
In a possible implementation manner, the determining unit is further configured to:
inputting the in-vehicle image into the first convolution sub-network, and performing feature extraction on the in-vehicle image through the first convolution sub-network to obtain a first in-vehicle image feature map;
and inputting the first in-vehicle image feature map into the first output sub-network to obtain the coordinate positions of the plurality of human body key points output by the first output sub-network.
In a possible implementation manner, the determining unit is further configured to:
inputting the in-vehicle image into the second convolution sub-network, and performing feature extraction on the in-vehicle image through the second convolution sub-network to obtain a second in-vehicle image feature map;
inputting the second in-vehicle image feature map into the first classification sub-network, and classifying objects contained in the second in-vehicle image feature map through the first classification sub-network to obtain a first classification feature map;
inputting the first classification feature map into the first regression sub-network, and obtaining an in-vehicle label feature map at the position of the first classification feature map in the labeled human body region through the first regression sub-network;
inputting the in-vehicle annotation feature map into the second output sub-network to obtain the coordinate positions of the plurality of human body key points output by the second output sub-network based on the in-vehicle annotation feature map.
In a possible implementation manner, the determining unit is further configured to:
if the coordinate position of the at least one human body key point is located in a set area corresponding to a left window of the target vehicle, acquiring an outside vehicle image of the left side of the target vehicle; and if the coordinate position of the at least one human body key point is located in a set area corresponding to the right window of the target vehicle, acquiring an outside vehicle image of the right side of the target vehicle.
In a possible embodiment, the safety warning device further includes:
the detection unit is used for detecting the human body of the vehicle exterior image through an area detection model and determining whether the vehicle exterior image contains a human body image area or not according to the output of the area detection model; the region detection model is obtained by training based on a training image with human body key point position marking information.
In a possible implementation, the detection unit is further configured to:
inputting the vehicle exterior image into the third convolution sub-network, and performing feature extraction on the vehicle exterior image through the third convolution sub-network to obtain a first vehicle exterior image feature map;
inputting the first vehicle exterior image feature map into the second regression subnetwork, and if a vehicle exterior annotation image marked with a human body region is obtained, determining that the vehicle exterior image contains the human body image region; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
In one possible embodiment, the region detection model includes a fourth convolution sub-network, a second classification sub-network, and a third regression sub-network; the detection unit is further configured to:
inputting the vehicle exterior image into the fourth convolution sub-network, and performing feature extraction on the vehicle exterior image through the fourth convolution sub-network to obtain a second vehicle exterior image feature map;
inputting the second out-of-vehicle image feature map into the second classification sub-network, and classifying objects contained in the second out-of-vehicle image feature map through the second classification sub-network to obtain a second classification feature map;
inputting the second classification feature map into the third regression subnetwork, and if an outside-vehicle labeled image labeled with a human body region is obtained, determining that the outside-vehicle image contains the human body image region; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and when the computer program is executed by the processor, the steps of the safety alarm method in any one of the implementations of the above first aspect are implemented.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the safety alarm method in any one of the implementations of the above first aspect.
In the safety warning method provided by the embodiments of the present application, an in-vehicle image of a target vehicle is acquired, and human body key point detection is performed on it to obtain the coordinate positions of a plurality of human body key points. If the coordinate position of at least one of these key points is located in the set area, this shows that a passenger or the driver is leaning a body part toward a window or extending it out of the window, so a vehicle-exterior image of the target vehicle, containing the window, is acquired. Whether the exterior image contains a human body image region is then detected; if it does, this confirms that the passenger or driver has actually extended the body out of the window, and safety warning information is output to remind the persons in the vehicle to pay attention, reducing accidents and ensuring personal safety.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a safety alarm method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a key point detection model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of another key point detection model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a region detection model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another region detection model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a key point detection model training process according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a region detection model training process according to an embodiment of the present application;
fig. 8 is a schematic flowchart of another safety alarm method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a safety warning device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another safety warning device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that references in the specification of the present application to the terms "comprises" and "comprising," and variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
During travel, either the driver or a passenger may extend the body out of a window, which is extremely dangerous and easily causes accidents. In order to remind the passenger or driver in real time, an embodiment of the present application provides a safety alarm method. An in-vehicle image of a target vehicle is acquired, and human body key point detection is performed on it to obtain the coordinate positions of a plurality of human body key points. If the coordinate position of at least one of these key points is located in a set area, which indicates the area of the in-vehicle image corresponding to a window of the target vehicle, this shows that a passenger or the driver is leaning a body part toward the window or extending it out of the window, and a vehicle-exterior image of the target vehicle, containing the window, is acquired. Whether the exterior image contains a human body image region is then detected; if it does, this proves that the passenger or driver has actually extended the body out of the window, and safety warning information is output to warn the persons in the vehicle. The safety alarm method can accurately judge whether a person in the vehicle is leaning toward or extending out of a window, ensuring the safety of the persons in the vehicle.
Fig. 1 is a schematic flowchart of a safety alarm method according to an embodiment of the present application. The safety alarm method can be applied to electronic equipment, in particular a vehicle-mounted terminal or the driver's mobile phone; the following description takes the vehicle-mounted terminal as an example. As shown in fig. 1, the safety alarm method includes the following steps:
step S101: and detecting the human body key points of the acquired in-vehicle image of the target vehicle to obtain the coordinate positions of the plurality of human body key points.
In one possible embodiment, the driver may choose to turn on the vehicle-mounted terminal of the target vehicle during driving. The vehicle-mounted terminal is connected to an in-vehicle camera and to a vehicle-exterior camera mounted on an exterior rear-view mirror. The in-vehicle camera may be a DVR (Digital Video Recorder) camera, used to obtain in-vehicle images of the target vehicle; the exterior camera may be a BSD (Blind Spot Detection) camera, used to acquire vehicle-exterior images of the target vehicle.
The vehicle-mounted terminal acquires an in-vehicle image of the target vehicle by using the in-vehicle camera, performs human body key point detection on the acquired in-vehicle image of the target vehicle, and can output coordinate positions of a plurality of human body key points in the in-vehicle image. The human body key points may be key points of different human body parts, such as key points of the head or key points of the limbs. And detecting the human key points of the acquired in-vehicle image to obtain the coordinate positions of all the human key points contained in the in-vehicle image.
In other embodiments, when the target vehicle is stationary, an in-vehicle image of the target vehicle may also be acquired, and it is detected whether a limb of a person in the vehicle extends outside the vehicle. For example, when the target vehicle waits for a red light or is parked in a parking lot, it is also possible to detect whether or not a limb of a person in the vehicle is extended outside the vehicle.
Step S102: and if the coordinate position of at least one human body key point in the plurality of human body key points is located in the set area, acquiring an image outside the target vehicle.
In a possible embodiment, after obtaining the coordinate positions of the plurality of human body key points, the vehicle-mounted terminal may determine whether any of them is located in the set area. The set area is a pre-calibrated area used to indicate the area of the in-vehicle image corresponding to a window of the target vehicle. If the installation position and angle of the in-vehicle camera do not change, the position of the set area in the in-vehicle image does not change; if the installation position or angle of the in-vehicle camera is adjusted, the position of the set area needs to be re-calibrated in the in-vehicle images acquired by the camera.
If the coordinate position of any one or more human body key points is located in the set area, this indicates that part of the body of a person in the vehicle is approaching a window or extending out of it, and the vehicle-exterior image of the target vehicle is acquired using the exterior camera. The exterior image contains the window of the target vehicle, for further determination of whether a body part extends outside the window.
The exterior camera is turned on to capture exterior images only when the coordinate position of at least one of the plurality of human body key points is located in the set area. If none of the coordinate positions of the human body key points is in the set area, no person in the vehicle is leaning toward or extending out of a window, and the exterior camera does not need to be turned on. This avoids wasting resources by keeping the exterior camera always on; moreover, an always-on exterior camera would also detect pedestrians passing near the vehicle, falsely triggering the safety warning information. The embodiment of the present application can therefore reduce false triggering.
In one possible embodiment, the target vehicle may be equipped with one exterior camera on each of the left and right side mirrors. Two set areas can be calibrated in the in-vehicle image acquired by the in-vehicle camera: one corresponding to the left window, and one corresponding to the right window.
If the coordinate position of at least one human body key point is located in the set area corresponding to the left window of the target vehicle, a body part of a person in the vehicle is approaching the left window or extending out of it, and a vehicle-exterior image of the left side of the target vehicle can be acquired through the left exterior camera. If the coordinate position of at least one human body key point is located in the set area corresponding to the right window, a body part is approaching the right window or extending out of it, and an exterior image of the right side can be acquired through the right exterior camera. If at least one key point lies in the set area of the left window and at least one lies in the set area of the right window, both exterior cameras are turned on to capture exterior images. In this way, exterior images are captured in a targeted manner, saving resources.
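As an illustrative sketch (not part of the original patent text), the set-area judgment described above can be expressed in Python as follows; the rectangle coordinates and the keypoint format are hypothetical assumptions:

    # Each set area is a pre-calibrated rectangle (x1, y1, x2, y2) in the
    # in-vehicle image's pixel coordinates; these values are hypothetical.
    LEFT_WINDOW_AREA = (0, 120, 160, 360)
    RIGHT_WINDOW_AREA = (480, 120, 640, 360)

    def point_in_area(point, area):
        # True if the (x, y) keypoint coordinate lies inside the rectangle.
        x, y = point
        x1, y1, x2, y2 = area
        return x1 <= x <= x2 and y1 <= y <= y2

    def windows_to_check(keypoints):
        # Return which side(s) of the vehicle need an exterior image captured.
        sides = set()
        for kp in keypoints:
            if point_in_area(kp, LEFT_WINDOW_AREA):
                sides.add("left")
            if point_in_area(kp, RIGHT_WINDOW_AREA):
                sides.add("right")
        return sides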
Step S103: and if the outside image contains the human body image area, outputting safety warning information.
In a possible embodiment, after the vehicle-exterior image is acquired, whether a human body image region exists in it is detected. If such a region exists, a person in the vehicle has extended a body part out of the window, and safety warning information can be issued through a voice prompt or another prompt mode. For example, a voice prompt of "please do not extend the head or body outside the window" may be issued.
The safety warning method can accurately judge whether the personnel in the vehicle have the action of approaching the body to the window or extending out of the window, and can give a warning in time so as to reduce the occurrence of accidents and ensure the safety of the personnel in the vehicle.
In a possible implementation manner, in step S101, human body key point detection may be performed on the in-vehicle image through the key point detection model, so as to obtain coordinate positions of a plurality of human body key points output by the key point detection model.
For example, a trained key point detection model may be mounted on the vehicle-mounted terminal, and after the in-vehicle camera acquires the in-vehicle image, the in-vehicle image may be input into the key point detection model, and the trained key point detection model may be used to perform human key point detection on the in-vehicle image, so as to obtain coordinate positions of a plurality of human key points. The key point detection model is obtained by training based on a sample image with human body position marking information.
In one embodiment, the structure of the key point detection model may be as shown in fig. 2, comprising a first convolution sub-network and a first output sub-network. The processing of the in-vehicle image by the key point detection model is as follows: the in-vehicle image is input into the first convolution sub-network, which performs feature extraction to obtain a first in-vehicle image feature map; the first in-vehicle image feature map is then input into the first output sub-network, which outputs the coordinate positions of the plurality of human body key points. Whether the coordinate position of at least one of these key points is in the set area is then judged, to determine whether a person in the vehicle is leaning a body part toward a window or extending it out of the window.
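A minimal Python (PyTorch) sketch of this two-sub-network structure follows; the layer shapes and number of key points are assumptions, since the patent does not specify them:

    import torch.nn as nn

    class KeypointModelFig2(nn.Module):
        # First convolution sub-network + first output sub-network (fig. 2).
        def __init__(self, num_keypoints=17):
            super().__init__()
            self.num_keypoints = num_keypoints
            # First convolution sub-network: extracts the first in-vehicle
            # image feature map.
            self.first_conv_subnet = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(8),
            )
            # First output sub-network: regresses an (x, y) coordinate for
            # each human body key point.
            self.first_output_subnet = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
                nn.Linear(256, num_keypoints * 2),
            )

        def forward(self, in_vehicle_image):
            feature_map = self.first_conv_subnet(in_vehicle_image)
            coords = self.first_output_subnet(feature_map)
            return coords.view(-1, self.num_keypoints, 2)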
In another embodiment, the structure of the key point detection model may be as shown in fig. 3, comprising a second convolution sub-network, a first classification sub-network, a first regression sub-network, and a second output sub-network. The processing of the in-vehicle image by the key point detection model is as follows: the in-vehicle image is input into the second convolution sub-network, which performs feature extraction to obtain a second in-vehicle image feature map; the second in-vehicle image feature map is input into the first classification sub-network, which classifies the objects it contains to obtain a first classification feature map; the first classification feature map is input into the first regression sub-network, which labels the position of the human body region to obtain an in-vehicle annotation feature map; and the in-vehicle annotation feature map is input into the second output sub-network, which outputs the coordinate positions of the plurality of human body key points. Whether the coordinate position of at least one of these key points is in the set area is then judged, so as to determine whether a person in the vehicle is leaning a body part toward a window or extending it out of the window.
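Likewise, a minimal sketch of the series-connected structure of fig. 3, with all shapes assumed:

    import torch.nn as nn

    class KeypointModelFig3(nn.Module):
        # Second convolution, first classification, first regression and
        # second output sub-networks connected in series (fig. 3).
        def __init__(self, num_keypoints=17, num_classes=2):
            super().__init__()
            self.num_keypoints = num_keypoints
            self.second_conv_subnet = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            # First classification sub-network: classifies objects in the
            # second in-vehicle image feature map (human / non-human).
            self.first_cls_subnet = nn.Conv2d(64, num_classes, 1)
            # First regression sub-network: marks the human body region,
            # producing the in-vehicle annotation feature map.
            self.first_reg_subnet = nn.Conv2d(num_classes, 16, 3, padding=1)
            # Second output sub-network: keypoint coordinates from the
            # in-vehicle annotation feature map.
            self.second_output_subnet = nn.Sequential(
                nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                nn.Linear(16 * 8 * 8, num_keypoints * 2),
            )

        def forward(self, in_vehicle_image):
            features = self.second_conv_subnet(in_vehicle_image)
            cls_map = self.first_cls_subnet(features)
            annot_map = self.first_reg_subnet(cls_map)
            coords = self.second_output_subnet(annot_map)
            return coords.view(-1, self.num_keypoints, 2)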
In one possible implementation, when determining whether the vehicle exterior image includes the human body image region, the human body detection may be performed on the vehicle exterior image through the region detection model, and whether the vehicle exterior image includes the human body image region may be determined according to an output of the region detection model.
Specifically, the vehicle-mounted terminal may carry a trained region detection model. After the vehicle-exterior camera acquires the vehicle-exterior image, the exterior image may be input into the region detection model, which performs human body image region detection to determine whether the exterior image contains a human body image region; the human body image region may be an image region of human body key points. The region detection model is obtained by training based on training images with human body key point position labeling information.
In one embodiment, the processing of the vehicle-exterior image by the region detection model is as follows: the exterior image is input into a third convolution sub-network, which performs feature extraction to obtain a first vehicle-exterior image feature map; the first vehicle-exterior image feature map is input into a second regression sub-network, and if an exterior annotation image labeled with a human body region is obtained, the exterior image is determined to contain a human body image region; if an image without a labeled human body region is obtained, the exterior image is determined not to contain a human body image region.
In another embodiment, the processing of the vehicle-exterior image by the region detection model is as follows: the exterior image is input into a fourth convolution sub-network, which performs feature extraction to obtain a second vehicle-exterior image feature map; the second vehicle-exterior image feature map is input into a second classification sub-network, which classifies the objects it contains to obtain a second classification feature map; the second classification feature map is input into a third regression sub-network, and if an exterior annotation image labeled with a human body region is obtained, the exterior image is determined to contain a human body image region; if an image without a labeled human body region is obtained, the exterior image is determined not to contain a human body image region.
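A minimal sketch of the series-connected region detection model of fig. 5 (the fig. 4 variant simply omits the classification sub-network); the output convention of one candidate box plus a confidence score is an assumption:

    import torch
    import torch.nn as nn

    class RegionModelFig5(nn.Module):
        # Fourth convolution, second classification and third regression
        # sub-networks in series (fig. 5).
        def __init__(self, num_classes=2):
            super().__init__()
            self.fourth_conv_subnet = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.second_cls_subnet = nn.Conv2d(64, num_classes, 1)
            # Third regression sub-network: one candidate human body region
            # (x1, y1, x2, y2) plus a confidence score per image.
            self.third_reg_subnet = nn.Sequential(
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(num_classes * 4 * 4, 5),
            )

        def forward(self, exterior_image):
            features = self.fourth_conv_subnet(exterior_image)
            cls_map = self.second_cls_subnet(features)
            out = self.third_reg_subnet(cls_map)
            box, score = out[:, :4], torch.sigmoid(out[:, 4])
            # The exterior image is judged to contain a human body image
            # region only when the confidence clears a threshold.
            return box, score > 0.5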
The following describes the training procedures of the keypoint detection model and the region detection model used in the above embodiments, respectively.
In one possible embodiment, in training the keypoint detection model shown in fig. 2, the classification sub-network and the regression sub-network may be used to assist in training the keypoint detection model. Specifically, as shown in fig. 6, the classification sub-network for training assistance, the regression sub-network for training assistance, and the first output sub-network are connected in parallel, and during the training process, the output of the first convolution sub-network is used as the input of the classification sub-network for training assistance, the regression sub-network for training assistance, and the first output sub-network, respectively.
The specific training process of the keypoint detection model shown in fig. 2 includes:
step a1, obtaining a first training data set, wherein the sample images in the first training data set are sample images with human body position labeling information. The human body position marking information comprises a human body category label, a marking frame of a human body key point part and a specific coordinate position of the human body key point.
And b1, extracting a sample image from the first training data set, and inputting the extracted sample image into a first convolution sub-network to obtain a first convolution sample feature map.
And c1, inputting the first convolution sample feature map into the auxiliary-training classification sub-network to obtain an auxiliary-classification human body sample feature image containing the human body position, and a classification loss value.
The classification sub-network for auxiliary training is used for classifying the feature map of the human body part and the feature map of the non-human body part in the in-vehicle image, and outputting to obtain an auxiliary classification human body sample feature map, namely a sample feature image of the human body part determined in the in-vehicle image. And performing loss calculation by using the output of the auxiliary training classification sub-network and the sample image containing the class label of the human body, and determining a classification loss value.
Illustratively, the classification loss value is determined by a loss function of the form:

L_class(y, y*)   [formula shown only as an image in the original]

where y is the auxiliary classification feature map output by the auxiliary-training classification sub-network, and y* is the feature map with human category labels in the first training data set.
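The classification loss formula appears only as an image in the source. As a non-authoritative stand-in, a standard cross-entropy loss between the auxiliary classification map y and the labeled map y* could be sketched as follows; the patent's exact formula may differ:

    import torch.nn.functional as F

    def classification_loss(y_logits, y_star):
        # Cross-entropy between predicted per-location class scores and the
        # human-category labels; an assumed stand-in for L_class(y, y*).
        return F.cross_entropy(y_logits, y_star)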
And d1, inputting the first convolution sample feature map into an auxiliary training regression subnetwork to obtain an auxiliary human body key sample feature map and a regression loss value.
The auxiliary-training regression sub-network is used to determine the specific positions of the human body key points in the first convolution sample feature map, marking those positions with labeling boxes. Loss calculation is performed using the output of the auxiliary-training regression sub-network and the image with human body position labeling information to determine the regression loss value.
The loss function for determining the regression loss value takes an IoU-based form (reconstructed from the definitions below):

L_bbox(t, t*) = 1 - IoU + β²(t, t*) / c²

where t is the auxiliary regression feature map output by the auxiliary-training regression sub-network, and t* is the feature map with the labeling boxes of human key point parts in the first training data set. The terms are defined as:

β²(t, t*) = (x_p1 - x_p2)² + (y_p1 - y_p2)²
x_p1 = x_t2 - x_t1, y_p1 = y_t2 - y_t1
x_p2 = x_t*2 - x_t*1, y_p2 = y_t*2 - y_t*1
c² = (x_c1 - x_c2)² + (y_c1 - y_c2)²
x_c1 = min(x_t1, x_t*1), x_c2 = min(x_t2, x_t*2)
y_c1 = min(y_t1, y_t*1), y_c2 = min(y_t2, y_t*2)

The IoU (Intersection over Union) is the intersection of the image with the labeling box of the human key point part and the image output by the regression sub-network, divided by their union. The larger the value, the more the two regions overlap; the closer the output image coincides with the labeled image, the better the training effect and the better the loss value meets the preset requirement.

The value of m in the loss function of the key point detection model is taken according to the IoU in the loss function of the regression sub-network: if the IoU is greater than 0.5, m takes the value 1; if the IoU is less than 0.5, m takes the value 0.

The subscripts p, t, and so on in x_p1, x_t1, x_t2, x_p2 serve only to distinguish the quantities. (x_t1, y_t1) and (x_t2, y_t2) are the top-left and bottom-right corner coordinates of the box output by the regression sub-network; (x_t*1, y_t*1) and (x_t*2, y_t*2) are the top-left and bottom-right corner coordinates of the labeling box of the human key point part. c² = (x_c1 - x_c2)² + (y_c1 - y_c2)² computes the squared distance between those coordinate points.
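As an illustrative sketch following the reconstructed form above, the regression loss on a pair of corner-format boxes could be computed as follows; the tensor layout and the exact combination of terms are assumptions:

    import torch

    def regression_loss(t, t_star):
        # t and t_star are (x1, y1, x2, y2) corner tensors for the predicted
        # box and the labeling box of the human key point part.
        xt1, yt1, xt2, yt2 = t
        xs1, ys1, xs2, ys2 = t_star
        # IoU: intersection of the two boxes divided by their union.
        iw = (torch.min(xt2, xs2) - torch.max(xt1, xs1)).clamp(min=0)
        ih = (torch.min(yt2, ys2) - torch.max(yt1, ys1)).clamp(min=0)
        inter = iw * ih
        union = (xt2 - xt1) * (yt2 - yt1) + (xs2 - xs1) * (ys2 - ys1) - inter
        iou = inter / (union + 1e-9)
        # beta^2: squared width/height differences, per the definitions above.
        beta2 = ((xt2 - xt1) - (xs2 - xs1)) ** 2 \
              + ((yt2 - yt1) - (ys2 - ys1)) ** 2
        # c^2: squared distance between the combined corner points, using the
        # min-based corners exactly as defined above.
        xc1, xc2 = torch.min(xt1, xs1), torch.min(xt2, xs2)
        yc1, yc2 = torch.min(yt1, ys1), torch.min(yt2, ys2)
        c2 = (xc1 - xc2) ** 2 + (yc1 - yc2) ** 2
        return 1 - iou + beta2 / (c2 + 1e-9)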
And e1, inputting the first convolution sample characteristic image into the first output sub-network to obtain the coordinate positions of the plurality of human body key points corresponding to the first output sub-network and a first output loss value.
The first output sub-network is used for obtaining the coordinate positions of the plurality of human key points, loss calculation is carried out by using the output coordinate positions of the plurality of human key points and the image with the coordinate position mark of the human key points, and a first output loss value is determined.
The loss function that determines the first output loss value is of the form:

L_pts(l, l*)   [formula shown only as an image in the original]

where l is the coordinate positions of the plurality of human body key points output by the first output sub-network, and l* is the labeled coordinate positions of the human body key points.
And f1, adjusting the network parameters of the key point detection model according to the classification loss value, the regression loss value and the first output loss value until the trained key point detection model is obtained.
Specifically, a total loss value of the keypoint detection model may be determined based on the classification loss value, the regression loss value, and the first output loss value.
Illustratively, the total LOSS value LOSS of the keypoint detection model may be determined by the following formula:
LOSS_keypoints = k_1 · L_class(y, y*) + k_2 · m · L_bbox(t, t*) + k_3 · m · L_pts(l, l*)

where L_class(y, y*) is the loss function of the auxiliary-training classification sub-network and k_1 is its weight; L_bbox(t, t*) is the loss function of the auxiliary-training regression sub-network and k_2 is its weight, with m valued according to the IoU in the regression sub-network's loss function; and L_pts(l, l*) is the loss function of the first output sub-network and k_3 is its weight.
Adjusting network parameters of the key point detection model according to the total loss value of the key point detection model, and if the total loss value of the key point detection model meets a preset range, finishing the training of the key point detection model; and if the total loss value of the key point detection model does not meet the preset range, returning to the step b1, continuing to train the key point detection model until the total loss value of the key point detection model meets the preset range, and outputting the trained key point detection model.
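A minimal sketch of how the three loss terms combine into the total loss, with m gated by the IoU as described above; the weights k1, k2 and k3 are not specified in the source and are assumed here:

    def total_keypoint_loss(l_class, l_bbox, l_pts, iou,
                            k1=1.0, k2=1.0, k3=1.0):
        # LOSS_keypoints = k1*L_class + k2*m*L_bbox + k3*m*L_pts, where m is 1
        # when the IoU is greater than 0.5 and 0 otherwise.
        m = 1.0 if iou > 0.5 else 0.0
        return k1 * l_class + k2 * m * l_bbox + k3 * m * l_pts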
After the key point detection model is trained, only the coordinate positions of the plurality of human body key points need to be obtained during application. The auxiliary-training classification sub-network and the auxiliary-training regression sub-network can therefore be turned off, reducing the data processing load of the vehicle-mounted terminal and saving resources. If the outputs of these auxiliary sub-networks are desired, they can also be kept in use during application.
In one possible embodiment, in training the keypoint detection model shown in fig. 3, the first classification subnetwork, the first regression subnetwork, and the second export subnetwork are connected in sequence as shown in fig. 3.
The specific training process of the keypoint detection model shown in fig. 3 includes:
step a2, obtaining a first training data set, where the training data set includes a sample image with human body position labeling information, and the human body position labeling information includes a category label of a human body, a labeling frame of a human body key point part, and a specific coordinate position of a human body key point.
And b2, extracting a sample image from the first training data set, and inputting the extracted sample image into a second convolution sub-network to obtain a second convolution sample feature map.
And c2, inputting the second convolution sample characteristic image into the first classification sub-network to obtain a first classification human body sample characteristic image containing the human body position and a first classification loss value.
Wherein the network structure of the first classification sub-network may be the same as that of the auxiliary-training classification sub-network described above, and the first classification sub-network determines the first classification loss value using the same loss function as the auxiliary-training classification sub-network.
And d2, inputting the characteristic image of the first classified human sample into a first regression sub-network to obtain a characteristic image of a key sample of the first regression human body and a first regression loss value.
Wherein the network structure of the first regression sub-network may be the same as that of the auxiliary-training regression sub-network described above, and the first regression sub-network determines the first regression loss value using the same loss function as the auxiliary-training regression sub-network.
And e2, inputting the first regression human body key point sample characteristic image into a second output sub-network to obtain the coordinate positions of a plurality of human body key points corresponding to the second output sub-network and a second output loss value.
Wherein the network structure of the second output sub-network may be the same as that of the first output sub-network described above, and the second output sub-network determines the second output loss value using the same loss function as the first output sub-network.
Step f2, determining a total loss value of the key point detection model based on the first classification loss value, the first regression loss value, and the second output loss value, and adjusting the network parameters of the key point detection model. If the total loss value of the key point detection model meets the preset range, the key point detection model has finished training; if it does not, return to step b2 and continue training until the total loss value meets the preset range, then output the trained key point detection model.
This key point detection model determines its network parameters using the same total loss function as the key point detection model described above.
The sub-networks included in this key point detection model are connected in series, and using the output of each sub-network as the input of the next allows the loss value to be determined more accurately. A key point detection model with better effect can thus be obtained, and the resulting coordinate positions of the plurality of human body key points are more accurate.
In one possible embodiment, when training the region detection model shown in fig. 4, a classification sub-network may be used to assist in training the region detection model. Specifically, as shown in fig. 7, the auxiliary-training classification sub-network and the second regression sub-network are connected in parallel. During training, the output of the third convolution sub-network is used as the input of the auxiliary-training classification sub-network and of the second regression sub-network, respectively. The specific training process of the region detection model shown in fig. 4 includes:
step a3, acquiring a second training data set, wherein the second training data set comprises sample images with human body key point position marking information.
And b3, extracting a sample image from the second training data set, and inputting the extracted sample image into a third convolution sub-network to obtain a third convolution sample feature map.
And c3, inputting the third convolution sample feature image into the auxiliary-training classification sub-network to obtain an auxiliary-classification human body key sample feature map containing the positions of the human body key points, and an auxiliary-training classification loss value.
Specifically, the input of the auxiliary-training classification sub-network is the third convolution sample feature image, and the output is the auxiliary-classification human body key sample feature map, that is, a classification feature map classified according to the categories of human body parts. The auxiliary-training classification sub-network in the region detection model may be the same as the auxiliary-training classification sub-network in the key point detection model, and it determines the classification loss value using the same loss function.
And d3, inputting the third convolution sample characteristic image into a second regression sub-network to obtain a second regression human key sample characteristic image and a second regression loss value.
Specifically, the input of the second regression subnetwork is a third convolution sample feature image, and the output of the second regression subnetwork is a second regression human key point sample feature image, which is the feature map marked with the human image region marking frame. The second regression subnetwork may be the same as the regression subnetwork in the keypoint detection model, and the second regression subnetwork determines the second regression loss value using the same loss function as the regression subnetwork in the keypoint detection model.
Step e3, determining a loss value of the region detection model based on the auxiliary-training classification loss value and the second regression loss value, and adjusting the network parameters of the region detection model. If the total loss value of the region detection model meets the preset range, the region detection model has finished training; if it does not, return to step b3 and continue training until the total loss value meets the preset range, then output the trained region detection model.
The network parameters of the region detection model are adjusted according to the loss value obtained from the loss function until the region detection model is trained. The loss function of the region detection model is (reconstructed from the definitions below):

LOSS_region = k_1 · L_class(y_i, y_i*) + k_2 · m · L_bbox(t_i, t_i*)

The loss function for the auxiliary-training classification sub-network is the same as that used for the auxiliary-training classification sub-network in the key point detection model, where y_i is the classification feature map output by the auxiliary-training classification sub-network, classified according to the categories of human body parts; y_i* is the sample map labeled with the category labels of human body parts; and k_1 is the weight of the auxiliary-training classification sub-network's loss function.

The loss function for the second regression sub-network is the same as that used by the auxiliary-training regression sub-network in the key point detection model, where t_i is the feature map output by the second regression sub-network with the human body image region labeling box; t_i* is the sample map with the human body image region labeling box; k_2 is the weight of the second regression sub-network's loss function; and m is valued according to the IoU in the second regression sub-network's loss function.
After the training of the region detection model is finished, during application it is only necessary to judge whether a human body image region exists in the vehicle-exterior image. The auxiliary-training classification sub-network can therefore be turned off, reducing the data processing load of the vehicle-mounted terminal and saving resources. If the output of the auxiliary-training sub-network is desired, it can also be kept in use during application.
In one possible embodiment, in training the region detection model shown in fig. 5, the second classification sub-network and the third regression sub-network are connected in sequence as shown in fig. 5.
The specific training process of the region detection model shown in fig. 5 includes:
step a4, acquiring a second training data set, wherein the second training data set comprises a second sample image with the position marking information of the human body key point. And b4, extracting a sample image from the second training data set, and inputting the extracted sample image into a fourth convolution sub-network to obtain a fourth convolution sample feature map.
And c4, inputting the fourth convolution sample characteristic image into a second classification sub-network to obtain a second classification human sample characteristic image containing the human key point position and a second classification loss value.
Specifically, the second classification sub-network may be the same as the classification sub-network for auxiliary training in the region detection model described above, and it determines the second classification loss value using the same loss function as that sub-network.
Step d4, inputting the second classification human body sample feature map into the third regression sub-network to obtain a third regression human body key point sample feature map and a third regression loss value.
Specifically, the third regression sub-network may be the same as the second regression sub-network in the region detection model described above, and it determines the third regression loss value using the same loss function as the second regression sub-network.
Step e4, determining a loss value of the region detection model based on the second classification loss value and the third regression loss value, and adjusting the network parameters of the region detection model. If the total loss value of the region detection model meets the preset range, the training of the region detection model is finished; if not, return to step b4 and continue training until the total loss value meets the preset range, then output the trained region detection model.
This variant of the region detection model determines its network parameters by using the same loss function as the region detection model described above.
The sub-networks in this region detection model are connected in series, and using the output of the previous sub-network as the input of the next allows the loss value to be determined more accurately, so that a region detection model with better effect is obtained and the detected human body image regions are more accurate.
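Steps a4 to e4 amount to a standard training loop that keeps drawing samples from the second training data set until the total loss value falls within the preset range. A sketch under assumed names follows (the threshold, data pipeline, and loss interface are illustrative, not specified by the patent):

```python
def train_region_model(model, loader, optimizer, loss_fn,
                       loss_threshold=0.05, max_epochs=100):
    # Train until the total loss value meets the preset range (steps b4-e4).
    # `loader` yields (images, cls_labels, box_targets) batches; `loss_fn`
    # combines the classification and regression losses as sketched earlier.
    model.train()
    for epoch in range(max_epochs):
        total = 0.0
        for images, cls_labels, box_targets in loader:   # step b4: sample images
            box_preds, aux_logits = model(images)        # steps c4/d4: sub-network outputs
            loss = loss_fn(aux_logits, cls_labels, box_preds, box_targets)
            optimizer.zero_grad()
            loss.backward()                              # step e4: adjust network parameters
            optimizer.step()
            total += loss.item()
        if total / len(loader) < loss_threshold:         # total loss within preset range
            break                                        # training finished
    return model
```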
To make the embodiment of the present application easier to understand, fig. 8 shows a specific implementation flow of the safety alarm method. As shown in fig. 8, the method may include the following steps:
Step S801: An in-vehicle image of a target vehicle is acquired.
Step S802: The coordinate positions of a plurality of human body key points in the in-vehicle image of the target vehicle are determined through the key point detection model.
Step S803: Judge whether the preset area contains the coordinate position of at least one of the plurality of human body key points; if so, execute step S804; otherwise, return to step S801 and acquire the in-vehicle image at the next moment.
Step S804: an image outside the target vehicle is acquired.
Step S805: Judge, through the region detection model, whether the vehicle exterior image of the target vehicle contains a human body image region; if so, execute step S806; otherwise, return to step S801 and acquire the in-vehicle image at the next moment.
Step S806: Safety warning information is output.
In a possible embodiment, the vehicle-mounted terminal calls a DVR (digital video recorder) camera to capture an in-vehicle image and inputs it into the trained key point detection model to obtain the coordinate positions of a plurality of human body key points. It then judges whether the coordinate position of at least one of these human body key points is in the preset area, that is, whether a body part of an in-vehicle person is close to a window or stretched out of a window. If not, the in-vehicle camera continues to acquire in-vehicle images, and the judgment is repeated on the coordinate positions of the human body key points in the in-vehicle image acquired at the next moment. If so, the BSD (blind spot detection) camera is called to acquire a vehicle exterior image of the target vehicle, the vehicle exterior image is input into the region detection model, and it is detected whether a human body image region exists in the vehicle exterior image, that is, whether a body part of an in-vehicle person appears in the vehicle exterior image. If not, the in-vehicle camera continues to acquire in-vehicle images, and the exterior camera is closed and waits to be called; if a human body image region is found, safety warning information is sent out to warn the person who is close to or stretching out of the window to pay attention to safety and retract the body into the vehicle.
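The interplay between the DVR camera, the key point model, the BSD camera, and the region model described above can be summarized as a polling loop. All interfaces below (capture(), contains(), notify(), and the model call signatures) are hypothetical placeholders, not APIs defined by the patent:

```python
def safety_alarm_loop(dvr_camera, bsd_cameras, keypoint_model, region_model,
                      window_regions, alarm):
    # Polling loop for steps S801-S806; every interface is a placeholder.
    while True:
        frame = dvr_camera.capture()              # S801: in-vehicle image
        keypoints = keypoint_model(frame)         # S802: key point coordinates
        side = next((s for s, region in window_regions.items()
                     if any(region.contains(kp) for kp in keypoints)), None)
        if side is None:                          # S803: no key point near a window
            continue                              # next frame
        exterior = bsd_cameras[side].capture()    # S804: exterior image on that side
        body_region = region_model(exterior)      # S805: human body region detection
        if body_region is not None:
            alarm.notify("Body part detected outside the window; "
                         "please retract into the vehicle")   # S806
```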
Based on the same concept, the embodiment of the present application further provides a safety warning device, and fig. 9 is a schematic structural diagram of the safety warning device provided in the embodiment of the present application; as shown in fig. 9, the safety warning apparatus includes:
a determining unit 901, configured to perform human body key point detection on the acquired in-vehicle image of the target vehicle, so as to obtain coordinate positions of a plurality of human body key points;
the judging unit 902 is configured to obtain an image outside the vehicle of the target vehicle if the coordinate position of at least one of the plurality of human body key points is located in the set area; the setting area is used for indicating a corresponding area of a window of the target vehicle in the in-vehicle image; the vehicle exterior image comprises a vehicle window of the target vehicle;
and the warning unit 903 is used for outputting safety warning information if the outside image contains a human body image area.
In a possible implementation manner, the determining unit 901 is further configured to:
performing human body key point detection on the in-vehicle image through the key point detection model, to obtain the coordinate positions of a plurality of human body key points output by the key point detection model; the key point detection model is obtained by training based on a sample image with human body position marking information.
In a possible implementation manner, the determining unit 901 is further configured to:
inputting the in-vehicle image into a first convolution sub-network, and performing feature extraction on the in-vehicle image through the first convolution sub-network to obtain a first in-vehicle image feature map;
and inputting the first in-vehicle image feature map into a first output sub-network to obtain the coordinate positions of a plurality of human body key points output by the first output sub-network.
In a possible implementation manner, the determining unit 901 is further configured to:
inputting the in-vehicle image into a second convolution sub-network, and performing feature extraction on the in-vehicle image through the second convolution sub-network to obtain a second in-vehicle image feature map;
inputting the second in-vehicle image feature map into a first classification sub-network, and classifying objects contained in the second in-vehicle image feature map through the first classification sub-network to obtain a first classification feature map;
inputting the first classification feature map into a first regression sub-network, and obtaining, through the first regression sub-network, an in-vehicle annotation feature map in which the position of the human body region in the first classification feature map is labeled;
and inputting the in-vehicle annotation feature map into a second output sub-network to obtain the coordinate positions of a plurality of human body key points output by the second output sub-network based on the in-vehicle annotation feature map.
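The serial convolution / classification / regression / output chain used by the determining unit can be sketched as follows; the layer sizes, pooling resolution, and number of key points are illustrative assumptions:

```python
import torch.nn as nn

class KeypointDetector(nn.Module):
    # Sketch of the second convolution, first classification, first regression,
    # and second output sub-networks connected in series. All sizes assumed.
    def __init__(self, num_keypoints=17):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.conv_subnet = nn.Sequential(               # second convolution sub-network
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)))
        self.cls_subnet = nn.Conv2d(32, 32, 1)          # first classification sub-network
        self.reg_subnet = nn.Conv2d(32, 32, 1)          # first regression sub-network
        self.out_subnet = nn.Linear(32 * 8 * 8, num_keypoints * 2)  # second output sub-network

    def forward(self, image):
        feat = self.conv_subnet(image)       # second in-vehicle image feature map
        cls_map = self.cls_subnet(feat)      # first classification feature map
        ann_map = self.reg_subnet(cls_map)   # in-vehicle annotation feature map
        coords = self.out_subnet(ann_map.flatten(1))
        return coords.view(-1, self.num_keypoints, 2)   # (x, y) per key point
```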
In a possible implementation, the judging unit 902 is further configured to:
if the coordinate position of at least one human body key point is located in a set area corresponding to a left window of the target vehicle, acquiring an outside image of the left side of the target vehicle; and if the coordinate position of at least one human body key point is located in a set area corresponding to the right window of the target vehicle, acquiring an image outside the vehicle on the right side of the target vehicle.
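The left/right selection rule can be written as a small helper; the rectangle representation of the set areas is a placeholder for whatever calibration the in-vehicle camera uses:

```python
def select_exterior_side(keypoints, left_window, right_window):
    # Returns 'left', 'right', or None, deciding which exterior image to fetch.
    # keypoints: list of (x, y) pixel coordinates from the key point model.
    # left_window / right_window: (x_min, y_min, x_max, y_max) set areas
    # calibrated for the in-vehicle camera (assumed representation).
    def inside(pt, box):
        x, y = pt
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    if any(inside(kp, left_window) for kp in keypoints):
        return "left"
    if any(inside(kp, right_window) for kp in keypoints):
        return "right"
    return None
```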
In a possible embodiment, as shown in fig. 10, the safety warning device may further include:
a detection unit 1001 configured to perform human body detection on the vehicle exterior image through the region detection model, and determine whether the vehicle exterior image includes a human body image region according to an output of the region detection model; the region detection model is obtained by training based on a training image with the position marking information of the key points of the human body.
In a possible implementation, the detecting unit 1001 is further configured to:
inputting the vehicle exterior image into a third convolution sub-network, and performing feature extraction on the vehicle exterior image through the third convolution sub-network to obtain a first vehicle exterior image feature map;
inputting the first vehicle exterior image feature map into a second regression subnetwork, and if a vehicle exterior annotation image marked with a human body area is obtained, determining that the vehicle exterior image contains the human body image area; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
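The third-convolution-plus-second-regression variant used by the detecting unit can be sketched as below; the channel sizes and the confidence threshold that turns the regression output into a "region / no region" decision are assumptions:

```python
import torch
import torch.nn as nn

class SimpleRegionDetector(nn.Module):
    # Sketch: third convolution sub-network followed by the second regression
    # sub-network; sizes and the score threshold are assumptions.
    def __init__(self, threshold=0.5):
        super().__init__()
        self.conv_subnet = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.reg_subnet = nn.Linear(32, 5)   # (score, x_min, y_min, x_max, y_max)
        self.threshold = threshold

    def forward(self, exterior_image):
        feat = self.conv_subnet(exterior_image)   # first out-of-vehicle feature map
        out = self.reg_subnet(feat)
        scores = torch.sigmoid(out[:, 0])
        boxes = out[:, 1:]
        # Report a human body image region only when the score clears the threshold.
        return [b if s > self.threshold else None for s, b in zip(scores, boxes)]
```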
In a possible implementation, the detecting unit 1001 is further configured to:
inputting the vehicle exterior image into a fourth convolution sub-network, and performing feature extraction on the vehicle exterior image through the fourth convolution sub-network to obtain a second vehicle exterior image feature map;
inputting the second out-of-vehicle image feature map into a second classification sub-network, and classifying objects contained in the second out-of-vehicle image feature map through the second classification sub-network to obtain a second classification feature map;
inputting the second classification feature map into a third regression sub-network, and if a vehicle exterior annotation image labeled with a human body region is obtained, determining that the vehicle exterior image contains a human body image region; and if an image without a labeled human body region is obtained, determining that the vehicle exterior image does not contain a human body image region.
Corresponding to the method embodiments, an embodiment of the present application further provides an electronic device. The electronic device may be a server, or a terminal device such as a mobile terminal or a computer, and it includes at least a memory for storing data and a processor for data processing. The processor for data processing may be implemented by a microprocessor, a CPU, a GPU (Graphics Processing Unit), a DSP, or an FPGA. The memory stores operation instructions, which may be computer-executable code, and these operation instructions implement the steps in the flow of the safety alarm method according to the embodiments of the present application.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 11, the electronic device 1100 in the embodiment of the present application includes: a processor 1110, a display 1120, a memory 1130, an input device 1160, a bus 1150, and a communication module 1140; the processor 1110, the memory 1130, the input device 1160, the display 1120, and the communication module 1140 are all connected through the bus 1150, which is used for data transfer between them.
The memory 1130 may be used to store software programs and modules, and the processor 1110 executes the software programs and modules stored in the memory 1130 to perform various functional applications and data processing of the electronic device 1100, such as the safety alarm method provided in the embodiments of the present application. The memory 1130 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program of at least one application, and the like, and the data storage area may store data created according to the use of the electronic device 1100, and the like. In addition, the memory 1130 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 1110 is the control center of the electronic device 1100. It connects the various parts of the entire electronic device 1100 through the bus 1150 and various interfaces and lines, and performs the various functions of the electronic device 1100 and processes data by running or executing the software programs and/or modules stored in the memory 1130 and calling the data stored in the memory 1130. Optionally, the processor 1110 may include one or more processing units, such as a CPU, a GPU (Graphics Processing Unit), a digital processing unit, and the like.
Through the communication module 1140, the processor 1110 may further connect to the in-vehicle camera and the vehicle-exterior camera by Bluetooth or the like, so that the acquired in-vehicle images and vehicle-exterior images are sent to the electronic device.
The input device 1160 is mainly used to obtain input operations from a user, and may differ depending on the electronic device. For example, when the electronic device is a portable device such as a smart phone or a tablet computer, the input device 1160 may be a touch screen.
The embodiment of the present application further provides a computer storage medium, where computer-executable instructions are stored in the computer storage medium, and the computer-executable instructions are used to implement the security alarm method described in any embodiment of the present application.
Such a program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (12)

1. A safety alarm method, the method comprising:
detecting human body key points of the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human body key points;
if the coordinate position of at least one human body key point in the plurality of human body key points is located in a set area, acquiring an image outside the target vehicle; the set area is used for indicating a corresponding area of the window of the target vehicle in the in-vehicle image; the vehicle exterior image comprises a window of the target vehicle;
and if the vehicle exterior image contains a human body image area, outputting safety warning information.
2. The method according to claim 1, wherein the performing human body key point detection on the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human body key points comprises:
performing human body key point detection on the in-vehicle image through a key point detection model to obtain the coordinate positions of a plurality of human body key points output by the key point detection model; the key point detection model is obtained by training based on a sample image with human body position marking information.
3. The method of claim 2, wherein the key point detection model comprises a first convolution sub-network and a first output sub-network, and the performing human body key point detection on the in-vehicle image through the key point detection model to obtain the coordinate positions of the plurality of human body key points output by the key point detection model comprises:
inputting the in-vehicle image into the first convolution sub-network, and performing feature extraction on the in-vehicle image through the first convolution sub-network to obtain a first in-vehicle image feature map;
and inputting the first in-vehicle image feature map into the first output sub-network to obtain the coordinate positions of the plurality of human body key points output by the first output sub-network.
4. The method of claim 2, wherein the key point detection model comprises a second convolution sub-network, a first classification sub-network, a first regression sub-network, and a second output sub-network, and the performing human body key point detection on the in-vehicle image through the key point detection model to obtain the coordinate positions of the plurality of human body key points output by the key point detection model comprises:
inputting the in-vehicle image into the second convolution sub-network, and performing feature extraction on the in-vehicle image through the second convolution sub-network to obtain a second in-vehicle image feature map;
inputting the second in-vehicle image feature map into the first classification sub-network, and classifying objects contained in the second in-vehicle image feature map through the first classification sub-network to obtain a first classification feature map;
inputting the first classification feature map into the first regression sub-network, and obtaining, through the first regression sub-network, an in-vehicle annotation feature map in which the position of the human body region in the first classification feature map is labeled;
inputting the in-vehicle annotation feature map into the second output sub-network to obtain the coordinate positions of the plurality of human body key points output by the second output sub-network based on the in-vehicle annotation feature map.
5. The method according to claim 1, wherein the acquiring the vehicle exterior image of the target vehicle if the coordinate position of at least one of the plurality of human body key points is located in the set area comprises:
if the coordinate position of the at least one human body key point is located in a set area corresponding to a left window of the target vehicle, acquiring an outside vehicle image of the left side of the target vehicle;
and if the coordinate position of the at least one human body key point is located in a set area corresponding to the right window of the target vehicle, acquiring an outside vehicle image of the right side of the target vehicle.
6. The method according to claim 1, wherein before the outputting safety warning information if the vehicle exterior image contains a human body image region, the method further comprises:
carrying out human body detection on the vehicle exterior image through a region detection model, and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model; the region detection model is obtained by training based on a training image with human body key point position marking information.
7. The method of claim 6, wherein the region detection model comprises a third convolution sub-network and a second regression sub-network, and the performing human body detection on the vehicle exterior image through the region detection model and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model comprises:
inputting the vehicle exterior image into the third convolution sub-network, and performing feature extraction on the vehicle exterior image through the third convolution sub-network to obtain a first vehicle exterior image feature map;
inputting the first vehicle exterior image feature map into the second regression subnetwork, and if a vehicle exterior annotation image marked with a human body region is obtained, determining that the vehicle exterior image contains the human body image region; and if the image without the marked human body area is obtained, determining that the vehicle exterior image does not contain the human body image area.
8. The method of claim 6, wherein the region detection model comprises a fourth convolution sub-network, a second classification sub-network, and a third regression sub-network, and the performing human body detection on the vehicle exterior image through the region detection model and determining whether the vehicle exterior image contains a human body image region according to the output of the region detection model comprises:
inputting the vehicle exterior image into the fourth convolution sub-network, and performing feature extraction on the vehicle exterior image through the fourth convolution sub-network to obtain a second vehicle exterior image feature map;
inputting the second out-of-vehicle image feature map into the second classification sub-network, and classifying objects contained in the second out-of-vehicle image feature map through the second classification sub-network to obtain a second classification feature map;
inputting the second classification feature map into the third regression sub-network, and if a vehicle exterior annotation image labeled with a human body region is obtained, determining that the vehicle exterior image contains the human body image region; and if an image without a labeled human body region is obtained, determining that the vehicle exterior image does not contain the human body image region.
9. A safety warning device, characterized in that the device comprises:
the determining unit is used for detecting human key points of the acquired in-vehicle image of the target vehicle to obtain coordinate positions of a plurality of human key points;
the judging unit is used for acquiring an image outside the target vehicle if the coordinate position of at least one human body key point in the plurality of human body key points is located in a set area; the set area is used for indicating a corresponding area of the window of the target vehicle in the in-vehicle image; the vehicle exterior image comprises a window of the target vehicle;
and the warning unit is used for outputting safety warning information if the images outside the vehicle contain human body image areas.
10. The apparatus of claim 9, wherein:
the judging unit is further used for acquiring an outside image of the left side of the target vehicle if the coordinate position of the at least one human body key point is located in a set area corresponding to the left window of the target vehicle; and if the coordinate position of the at least one human body key point is located in a set area corresponding to the right window of the target vehicle, acquiring an outside vehicle image of the right side of the target vehicle.
11. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the method of any one of claims 1 to 8.
12. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202111056039.1A 2021-09-09 2021-09-09 Safety alarm method, device, electronic equipment and storage medium Pending CN113762168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111056039.1A CN113762168A (en) 2021-09-09 2021-09-09 Safety alarm method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113762168A true CN113762168A (en) 2021-12-07

Family

ID=78794430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111056039.1A Pending CN113762168A (en) 2021-09-09 2021-09-09 Safety alarm method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113762168A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104442566A (en) * 2014-11-13 2015-03-25 长安大学 Vehicle inside passenger dangerous state alarming device and alarming method
CN110738110A (en) * 2019-09-11 2020-01-31 北京迈格威科技有限公司 Human face key point detection method, device, system and storage medium based on anchor point
US10643085B1 (en) * 2019-01-30 2020-05-05 StradVision, Inc. Method and device for estimating height and weight of passengers using body part length and face information based on human's status recognition
JP2020126304A (en) * 2019-02-01 2020-08-20 株式会社Subaru Out-of-vehicle object detection apparatus
CN112906515A (en) * 2021-02-03 2021-06-04 珠海研果科技有限公司 In-vehicle abnormal behavior identification method and system, electronic device and storage medium


Similar Documents

Publication Publication Date Title
JP6819594B2 (en) Information processing equipment, information processing methods and programs
CN105620489B (en) Driving assistance system and vehicle real-time early warning based reminding method
JP6870033B2 (en) Method of passing through intersections of automatic guided vehicles, devices, equipment and storage media
CN103732480B (en) Method and device for assisting a driver in performing lateral guidance of a vehicle on a carriageway
US10295360B2 (en) Assistance when driving a vehicle
JP4614005B2 (en) Moving locus generator
US10336252B2 (en) Long term driving danger prediction system
CN108025767A (en) System and method for driving auxiliary for offer of overtaking other vehicles safely
CN108859938A (en) The method and system that automotive vehicle emergency light for automatic driving vehicle controls
CN107067718A (en) Traffic accident responsibility appraisal procedure, traffic accident responsibility apparatus for evaluating and traffic accident responsibility assessment system
CN107072867A (en) A kind of blind safety trip implementation method, system and wearable device
CN104773177A (en) Aided driving method and aided driving device
CN111553605A (en) Vehicle lane change risk assessment method, device, equipment and storage medium
CN113022441A (en) Vehicle blind area detection method and device, electronic equipment and storage medium
JP2010072573A (en) Driving evaluation device
CN115402322A (en) Intersection driving assistance method and system, electronic device and storage medium
CN110450780A (en) A kind of follow the bus method and apparatus based on low speed anticollision
CN113762168A (en) Safety alarm method, device, electronic equipment and storage medium
KR101405785B1 (en) System for assigning automobile level and method thereof
JP2008090683A (en) Onboard navigation device
JP2014010807A (en) Drive support device
JP7163817B2 (en) Vehicle, display method and program
TWI793482B (en) Driving assisting method and vehicle thereof
JP6597331B2 (en) Driving support device and driving support method
CN112818726A (en) Vehicle violation early warning method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination