CN117036868A - Training method and device of human body perception model, medium and electronic equipment


Info

Publication number
CN117036868A
CN117036868A (application number CN202311293165.8A)
Authority
CN
China
Prior art keywords
human body
detected
region
trained
perception model
Prior art date
Legal status
Granted
Application number
CN202311293165.8A
Other languages
Chinese (zh)
Other versions
CN117036868B (en)
Inventor
杨李杰
杨照辉
余显斌
普莱姆
董园园
张雪薇
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311293165.8A
Publication of CN117036868A
Application granted
Publication of CN117036868B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Using neural networks
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

This specification discloses a training method, apparatus, medium, and electronic device for a human body perception model. A region to be detected is determined, millimeter wave signals from a millimeter wave radar performing human body perception in the region are acquired and used as sample data, and detection results from other sensors observing the same region are acquired and used as labels; the detection results include the positions of the parts of the human body in the region to be detected and the human body contour. The sample data are used to obtain a prediction of the part positions and a prediction of the human body contour, a loss is determined from the difference between the prediction results and the detection results, and the human body perception model to be trained is trained according to this loss. Beyond detecting whether a human body is present in the region to be detected, the method achieves fine-grained perception of the human body, namely locating the positions of its parts and presenting its contour.

Description

Training method and device of human body perception model, medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a training method and apparatus for a human body perception model, a medium, and an electronic device.
Background
With the development of technology, artificial intelligence is advancing rapidly. With the rise of concepts such as smart buildings and smart homes, human body perception technology has attracted wide attention, especially in smart healthcare and elderly care, where the physiological state and motion state of a monitored person must be acquired in real time so that abnormal states can be detected effectively and rescue can be provided in time. Millimeter wave radar works in all weather, resists interference, and is contactless, which makes it one of the sensors used in human body perception technology.
However, since the spatial resolution of the millimeter wave radar sensor is very limited and the indoor environment is generally complex, it is difficult for the human body sensing technology based on the millimeter wave radar to exert a satisfactory effect in an indoor scene.
For this reason, this specification provides a training method for a human body perception model based on millimeter wave radar, so that the trained model can not only determine whether a person is present in the space to be detected, but can also locate the positions of the parts of the human body and perceive its morphological contour, improving the perception capability of millimeter-wave-radar-based human body perception models.
Disclosure of Invention
The present disclosure provides a training method, device, medium and electronic device for a human body perception model, so as to at least partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a training method of a human body perception model, which comprises the following steps:
determining a region to be detected;
acquiring millimeter wave signals of a millimeter wave radar for performing human body perception on the region to be detected, and taking the millimeter wave signals as sample data;
obtaining detection results of other sensors for detecting the region to be detected, and taking the detection results as labels; wherein, the detection result comprises: the positions of all parts of the human body in the region to be detected and the contour of the human body;
inputting the sample data into a human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result which are output by the human body perception model to be trained;
determining a loss according to the difference between the prediction result of the positions of the parts of the human body and the detection result, and training the human body perception model to be trained with minimizing the loss as the optimization objective.
Optionally, the other sensor comprises an optical detection device;
the method for acquiring the detection result of the detection of the region to be detected by other sensors specifically comprises the following steps:
acquiring an image of the optical detection equipment for image acquisition of the region to be detected;
determining the positions of all parts of the human body in the image and the human body contour, taking the determined positions of all parts of the human body in the image as a first part position and taking the determined human body contour in the image as a first human body contour;
and taking the first position and the first human body outline as detection results.
Optionally, the other sensor further comprises a lidar;
the method for acquiring the detection result of the detection of the region to be detected by other sensors specifically comprises the following steps:
acquiring a laser radar signal of the laser radar for scanning the region to be detected;
according to the laser radar signal, determining the positions of all parts of the human body in the to-be-detected area and the human body contour, taking the determined positions of all parts of the human body in the to-be-detected area as a second position and taking the determined human body contour in the to-be-detected area as a second human body contour;
Determining a final part position according to the first part position and the second part position, and determining a final human body contour according to the first human body contour and the second human body contour;
and taking the final position and the final human body contour as detection results.
Optionally, the human perception model to be trained includes: the prediction layer comprises a classification sub-layer, a regression sub-layer and a segmentation sub-layer;
the detection result further comprises: whether a human body exists in the region to be detected or not;
inputting the sample data into a human body perception model to be trained to obtain the prediction result of the positions of the parts of the human body and the human body contour prediction result output by the human body perception model to be trained, which specifically comprises the following steps:
inputting the sample data into the coding layer to obtain coding characteristics;
inputting the coding features into the decoding layer to obtain decoding features;
inputting the decoding characteristics into the classification sub-layer to obtain a classification prediction result; the classification prediction result is used for representing whether a human body exists in the region to be detected or not;
inputting the decoding characteristics into the regression sub-layer to obtain the position prediction result of each part of the human body;
And inputting the decoding characteristics into the segmentation sub-layer to obtain a human body contour prediction result.
Optionally, taking whether a human body exists in the region to be detected in the detection result as a first label;
taking the positions of all parts of the human body in the region to be detected in the detection result as a second label;
taking the human body contour in the region to be detected in the detection result as a third label;
determining the loss specifically includes:
determining a first loss according to the difference between the classification prediction result and the first label;
determining a second loss according to the difference between the position prediction result of each part of the human body and the second label;
and determining a third loss according to the difference between the human body contour prediction result and the third label.
Optionally, training the human body perception model to be trained specifically includes:
determining a composite loss based on the first loss and the second loss;
and training the human body perception model to be trained according to the comprehensive loss and the third loss.
Optionally, inputting the sample data into a human perception model to be trained specifically includes:
Performing distance dimension calculation and Doppler dimension calculation on the sample data to respectively obtain distance dimension data and Doppler dimension data corresponding to the sample data;
determining channel number dimension data corresponding to the sample data according to the working mode of the millimeter wave radar; the channel number dimension data represents the channel number corresponding to the working mode of the millimeter wave radar;
obtaining a three-dimensional array corresponding to the sample data based on the distance dimension data, the Doppler dimension data and the channel number dimension data corresponding to the sample data;
and inputting the three-dimensional array corresponding to the sample data into a human body perception model to be trained.
The specification provides a training device of human perception model, includes:
the determining module is used for determining a region to be detected;
the first acquisition module is used for acquiring millimeter wave signals of a millimeter wave radar for human body perception of the region to be detected and taking the millimeter wave signals as sample data;
the second acquisition module is used for acquiring detection results of other sensors for detecting the region to be detected and taking the detection results as labels; wherein, the detection result comprises: the positions of all parts of the human body in the region to be detected and the contour of the human body;
The input module is used for inputting the sample data into a human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result which are output by the human body perception model to be trained;
and the training module is used for determining loss according to the difference between the position prediction result and the detection result of each part of the human body, and training the human body perception model to be trained by taking the minimum loss as an optimization target.
Optionally, the other sensor comprises an optical detection device;
the second acquisition module is specifically configured to acquire an image of the to-be-detected area acquired by the optical detection device; determining the positions of all parts of the human body in the image and the human body contour, taking the determined positions of all parts of the human body in the image as a first part position and taking the determined human body contour in the image as a first human body contour; and taking the first position and the first human body outline as detection results.
Optionally, the other sensor further comprises a lidar;
The second acquisition module is specifically configured to acquire a laser radar signal that is scanned by the laser radar for the region to be detected; according to the laser radar signal, determining the positions of all parts of the human body in the to-be-detected area and the human body contour, taking the determined positions of all parts of the human body in the to-be-detected area as a second position and taking the determined human body contour in the to-be-detected area as a second human body contour; determining a final part position according to the first part position and the second part position, and determining a final human body contour according to the first human body contour and the second human body contour; and taking the final position and the final human body contour as detection results.
Optionally, the human perception model to be trained includes: the prediction layer comprises a classification sub-layer, a regression sub-layer and a segmentation sub-layer;
the detection result further comprises: whether a human body exists in the region to be detected or not;
the input module is specifically configured to input the sample data into the coding layer to obtain coding features; inputting the coding features into the decoding layer to obtain decoding features; inputting the decoding characteristics into the classification sub-layer to obtain a classification prediction result; the classification prediction result is used for representing whether a human body exists in the region to be detected or not; inputting the decoding characteristics into the regression sub-layer to obtain the position prediction result of each part of the human body; and inputting the decoding characteristics into the segmentation sub-layer to obtain a human body contour prediction result.
Optionally, taking whether a human body exists in the region to be detected in the detection result as a first label;
taking the positions of all parts of the human body in the region to be detected in the detection result as a second label;
taking the human body contour in the region to be detected in the detection result as a third label;
the training module is specifically configured to determine a first loss according to a difference between the classification prediction result and the first label; determining a second loss according to the difference between the position prediction result of each part of the human body and the second label; and determining a third loss according to the difference between the human body contour prediction result and the third label.
Optionally, the training module is specifically configured to determine a comprehensive loss according to the first loss and the second loss; and training the human body perception model to be trained according to the comprehensive loss and the third loss.
Optionally, the input module is specifically configured to perform a distance dimension calculation and a doppler dimension calculation on the sample data, to obtain distance dimension data and doppler dimension data corresponding to the sample data, respectively; determining channel number dimension data corresponding to the sample data according to the working mode of the millimeter wave radar; the channel number dimension data represents the channel number corresponding to the working mode of the millimeter wave radar; obtaining a three-dimensional array corresponding to the sample data based on the distance dimension data, the Doppler dimension data and the channel number dimension data corresponding to the sample data; and inputting the three-dimensional array corresponding to the sample data into a human body perception model to be trained.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described method of training a human perception model.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-described method of training a human perception model when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the training method of the human body perception model provided in this specification, while millimeter wave signals of the millimeter wave radar serve as sample data, other sensors detect the region to be detected, yielding detection results that include at least the positions of the parts of the human body in the region and the human body contour; these detection results are used as labels to train the human body perception model to be trained. Using the detection results of other sensors as labels solves the problem that labels are difficult to extract from millimeter wave signals, and, beyond detecting whether a human body is present in the region to be detected, enables fine-grained perception of the human body, namely locating the positions of its parts and presenting its contour.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the specification and, together with the description, serve to explain it; they are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow chart of a training method of a human perception model in the present specification;
FIG. 2 is a schematic diagram of a human perception model according to the present disclosure;
FIG. 3 is a schematic diagram of a training device for a human perception model provided in the present specification;
fig. 4 is a schematic view of the electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
With the rise of concepts such as smart buildings and smart homes, indoor human body perception technology has received much attention. Particularly in smart healthcare and smart elderly-care scenarios, it is desirable to acquire the physiological and motion states of a monitored person in real time so that assistance can be provided immediately when the monitored person enters an abnormal state. At present, optical detection equipment built around a camera is widely used; however, optical imaging raises privacy concerns, since using a camera in an indoor scene can leak private information. Millimeter wave radar, by contrast, works in all weather, resists interference, and is contactless, and it avoids privacy leakage, so millimeter-wave-radar-based human body perception technology is widely applied.
In general, millimeter wave radar relies mainly on the Doppler effect produced by human motion to detect whether a human body is present. The principle is as follows. Actions of a human body in motion, such as waving, walking, running, and jumping, impose Doppler modulation on the millimeter wave signal, which the millimeter wave radar can capture, thereby detecting the presence of the human body. In a static state, basic physiological activities of the human body such as respiration and heartbeat produce tiny, regular fluctuations on the skin surface; when the millimeter wave radar illuminates the human body, these fluctuations phase-modulate the radar's transmitted signal, the modulated signal is received by the radar, and analyzing it yields the corresponding respiratory rate and heart rate, again allowing the presence of the human body to be detected.
However, multipath propagation in indoor environments makes it difficult for millimeter wave radar to achieve a satisfactory perception effect indoors, and existing millimeter wave radar can only sense the presence of a human body; it cannot perceive the human body finely, that is, it cannot determine the positions of the parts of the human body. For this reason, this specification provides a training method for a millimeter-wave-radar-based human body perception model, so that the trained model can not only determine whether a human body is present in the space to be detected but can also finely perceive the morphological contour of the human body and its parts, improving the perception capability of millimeter-wave-radar-based human body perception models.
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a training method of a human perception model provided in the present specification, which specifically includes the following steps:
s100: and determining the area to be detected.
S102: and acquiring millimeter wave signals of a millimeter wave radar for performing human body perception on the region to be detected, and taking the millimeter wave signals as sample data.
The execution body for executing the technical scheme of the specification can be any computing device (such as a server and a terminal) with computing capability.
First, the computing device may determine the region to be detected. The region to be detected may be an indoor area such as a bathroom or a living room, or an outdoor area; that is, it may be chosen according to the specific scene and requirements, which this specification does not limit.
Then, the computing device may acquire millimeter wave signals of the millimeter wave radar that perform human body perception in the region to be detected, and use the millimeter wave signals as sample data for training the human body perception model to be trained in a subsequent step.
Specifically, in one or more embodiments of the present disclosure, a millimeter wave radar system may be configured in the computing device, where the millimeter wave radar system may operate in a multiple-input multiple-output (Multiple Input Multiple Output, MIMO) antenna array, where the antenna array corresponding to the millimeter wave radar system includes a transmitting antenna that transmits millimeter wave signals and a receiving antenna that receives millimeter wave signals. The millimeter wave radar system can transmit a first signal to a region to be detected, receive a returned second signal, sample the second signal to obtain a sampled millimeter wave signal, and further can take the sampled millimeter wave signal as sample data. More specifically, in one or more embodiments of the present description, the transmit antennas in the millimeter wave radar system may be configured in a time division multiplexed mode of operation (Time Division Multiplexing, TDM) or a code division multiplexed mode of operation (Code Division Multiplexing, CDM). In the TDM mode, the millimeter wave radar system may activate each transmitting antenna in turn according to a specific working time slot to transmit the same millimeter wave signal, i.e., the first signal, to the area to be detected. In the CDM mode, the millimeter wave radar system may activate all transmitting antennas simultaneously to transmit millimeter wave signals, i.e., first signals, to the region to be detected, and there is a difference in the initial phase of the millimeter wave signals transmitted by each transmitting antenna. Meanwhile, the receiving antenna in the millimeter wave radar system can sample the second signal at the same time in the TDM mode or the CDM mode to obtain a millimeter wave signal after sampling.
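To make the two operating modes concrete, the sketch below shows which transmitting antennas would be active on each chirp under TDM and CDM. The function name and the per-chirp scheduling granularity are illustrative assumptions, not details fixed by the specification.

```python
from enum import Enum

class TxMode(Enum):
    TDM = "time_division"  # transmitting antennas are activated in turn, one per working time slot
    CDM = "code_division"  # all transmitting antennas are activated simultaneously, with distinct initial phases

def tx_schedule(num_tx: int, num_chirps: int, mode: TxMode) -> list[list[int]]:
    """Return, for each chirp (time slot), the indices of the active transmitting antennas."""
    if mode is TxMode.TDM:
        # Each slot activates exactly one antenna, cycling through the array.
        return [[chirp % num_tx] for chirp in range(num_chirps)]
    # CDM: every slot activates the whole array; the phase coding that separates channels is not shown.
    return [list(range(num_tx)) for _ in range(num_chirps)]

# Example: a 3-transmitter array over 6 chirps.
print(tx_schedule(3, 6, TxMode.TDM))  # [[0], [1], [2], [0], [1], [2]]
print(tx_schedule(3, 6, TxMode.CDM))  # [[0, 1, 2], [0, 1, 2], ...]
```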
Furthermore, in embodiments of the present description, in order to adapt the format of sample data to the format of input data of a human perception model to be trained, and to facilitate the process of model training, the sample data may be preprocessed. Specifically, the distance dimension calculation and the Doppler dimension calculation can be performed on the sampled millimeter wave signals, namely the sample data, and the three-dimensional array corresponding to the sample data is obtained based on the channel number of the working mode corresponding to the millimeter wave radar system. The first dimension of the three-dimensional array represents the distance dimension, the second dimension represents the Doppler dimension, and the third dimension represents the channel number dimension of the working mode corresponding to the millimeter wave radar system. That is, in this specification, when training a human body perception model to be trained according to sample data and labels, the computing device may first perform distance dimension calculation and doppler dimension calculation on the sample data, respectively obtain distance dimension data and doppler dimension data corresponding to the sample data, and determine channel number dimension data corresponding to the sample data according to a working mode of the millimeter wave radar, where the channel number dimension data refers to a channel number corresponding to the working mode of the millimeter wave radar. Then, based on the distance dimension data, doppler dimension data and channel number dimension data corresponding to the sample data, a three-dimensional array corresponding to the sample data can be obtained. And finally, training the human body perception model to be trained according to the three-dimensional array and the label corresponding to the sample data.
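The sketch below illustrates this preprocessing: a range FFT over the fast-time samples of each chirp, a Doppler FFT over the chirps, and a transpose so that the third axis is the channel-number dimension of the operating mode. The input shape convention and the use of magnitude values are assumptions made for illustration, not details taken from the specification.

```python
import numpy as np

def build_radar_cube(adc_samples: np.ndarray) -> np.ndarray:
    """Turn sampled millimeter wave signals into a (range, Doppler, channel) array.

    adc_samples: complex array of shape (num_channels, num_chirps, num_samples),
    i.e. one frame of sampled second signals per virtual receive channel
    (the shape convention is an assumption for illustration).
    """
    # Range dimension: FFT over the fast-time samples of each chirp.
    range_fft = np.fft.fft(adc_samples, axis=2)
    # Doppler dimension: FFT over the slow-time (chirp) axis, centered at zero velocity.
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)
    # Rearrange to (range, Doppler, channel) so the third axis is the channel-number
    # dimension determined by the TDM/CDM operating mode.
    cube = np.transpose(np.abs(doppler_fft), (2, 1, 0))
    return cube.astype(np.float32)
```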
It should be noted that the millimeter wave signals, i.e. the sample data, may also be obtained in advance, for example from a millimeter wave signal database, so that no collection is needed; alternatively, the method above may be used to collect the millimeter wave signals and obtain the sample data.
S104: obtaining detection results of other sensors for detecting the region to be detected, and taking the detection results as labels; wherein, the detection result comprises: and the positions of all parts of the human body in the region to be detected and the contour of the human body.
Next, because labels are difficult to extract from millimeter wave signals when training the human body perception model to be trained, while detecting and perceiving the human body in the region to be detected with certain other sensors (such as optical detection equipment and laser radar) is mature and convenient, the computing device may acquire the detection results of these other sensors for the region to be detected. The detection results include at least the positions of the parts of the human body in the region to be detected and the human body contour, and they can be used as labels. Using the detection results of other sensors as labels to train the human body perception model to be trained realizes cross-modal information fusion and improves the perception performance of the trained model.
Specifically, in one or more embodiments of this specification, the other sensors may include an optical detection device, such as a video camera or a still camera. When the computing device obtains the detection results of other sensors for the region to be detected, it may obtain an image captured of the region by the optical detection device, determine the positions of the parts of the human body and the human body contour in the image, take the determined part positions in the image as the first part positions, and take the determined human body contour in the image as the first human body contour; the first part positions and the first human body contour then constitute the detection result. When determining the part positions and the contour in the image, a target detection model may be used, which may specifically be a RetinaNet model, and the human body contour may be extracted from the image based on that model. Of course, a dedicated human body part recognition model may also be used to determine the part positions in the image, or the optical detection device itself may provide target detection and/or part recognition, so that the part positions and human body contour in the region to be detected output by the device can be obtained directly. This specification places no particular limit, as long as the positions of the parts of the human body and the human body contour in the region to be detected can be obtained from the optical detection device.
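The specification names a RetinaNet-style target detection model for this step; purely as an illustrative stand-in (because they return keypoints and masks directly), the sketch below uses torchvision's pretrained Keypoint R-CNN and Mask R-CNN to produce first part positions and a first human body contour from a camera image. The model choice, the score threshold, and the function name are assumptions, not the patented pipeline.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn, keypointrcnn_resnet50_fpn

mask_model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
kp_model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def camera_labels(image: torch.Tensor, score_thr: float = 0.7):
    """image: float tensor (3, H, W) in [0, 1]. Returns (first_part_positions, first_contour)."""
    masks = mask_model([image])[0]   # instance masks, labels and scores
    kps = kp_model([image])[0]       # 17 COCO keypoints per detected person
    keep_m = (masks["scores"] > score_thr) & (masks["labels"] == 1)  # COCO class 1 = person
    keep_k = kps["scores"] > score_thr
    # First human body contour: union of the kept person masks, thresholded to a binary map.
    contour = (masks["masks"][keep_m, 0] > 0.5).any(dim=0) if keep_m.any() else None
    # First part positions: per-person keypoint coordinates (x, y, visibility).
    parts = kps["keypoints"][keep_k] if keep_k.any() else None
    return parts, contour
```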
In this specification, the human body contour refers to the contour of the human body segmented from the region to be detected, and the positions of the parts of the human body refer to the spatial orientations, within the region to be detected, of the parts or main limbs that make up the human body (such as the head, upper arms, forearms, trunk, thighs, and lower legs).
In addition, different sensors detect the region to be detected with different accuracy and provide different kinds of information. For example, an optical detection device such as a camera can only acquire optical information about the human body and cannot acquire its depth information, whereas a laser radar can localize the human body accurately. Therefore, in one or more embodiments of this specification, the other sensors may further include a laser radar. When obtaining the detection results of other sensors for the region to be detected, the computing device may acquire the laser radar signals from scanning the region, determine the positions of the parts of the human body and the human body contour in the region according to these signals, take the determined part positions as the second part positions, and take the determined contour as the second human body contour. The final part positions may then be determined from the first and second part positions, and the final human body contour from the first and second human body contours; the final part positions and final contour constitute the detection result. In other words, the laser radar scan provides depth information for each part of the human body (such as the head, upper arms, forearms, trunk, thighs, and lower legs), i.e. the part positions, which are fused with the human body contour obtained from the optical detection device to form the detection result used as the label, improving the accuracy of the labels and hence of the trained human body perception model.
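The specification does not spell out how the camera-derived and lidar-derived results are fused; the sketch below shows one simple possibility, a confidence-weighted average of the first and second part positions (leaning on the lidar's depth accuracy) and a union of the two contours. The weights, the common-grid assumption, and the function name are all illustrative assumptions.

```python
import numpy as np

def fuse_labels(first_parts, second_parts, first_contour, second_contour,
                w_camera: float = 0.4, w_lidar: float = 0.6):
    """Fuse camera (first) and lidar (second) detection results into final labels.

    first_parts/second_parts: (num_parts, 3) arrays of part coordinates in a common
    region-to-be-detected frame (extrinsic calibration is assumed to be done already).
    first_contour/second_contour: boolean occupancy maps on the same grid.
    """
    # Final part positions: weighted average, leaning on the lidar's better depth accuracy.
    final_parts = w_camera * np.asarray(first_parts) + w_lidar * np.asarray(second_parts)
    # Final human body contour: a cell belongs to the body if either sensor says so.
    final_contour = np.logical_or(first_contour, second_contour)
    return final_parts, final_contour
```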
Of course, in one or more embodiments of the present disclosure, the detection result determined by the laser radar may not be fused with the detection result determined by the optical detection device, that is, the detection result determined by only a single other sensor may be used as the label, and the label may be determined by combining the detection results of a plurality of other sensors as well, which is not limited in this disclosure.
It should be noted that, other sensors (such as optical detection device, laser radar) and millimeter wave radar may be disposed in the computing device, and the sensing detection range of the other sensors should be consistent with the sensing detection range of the millimeter wave radar.
S106: and inputting the sample data into a human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result which are output by the human body perception model to be trained.
S108: according to the difference between the position prediction result and the detection result of each part of the human body, determining loss, and training the human body perception model to be trained by taking the minimum loss as an optimization target.
Finally, the computing device can train the human body perception model to be trained using the sample data and the labels: the sample data are input into the model to obtain the prediction result of the positions of the parts of the human body and the human body contour prediction result output by the model, a loss is determined from the difference between these prediction results and the detection results, and the model is trained with minimizing the loss as the optimization objective. Specifically, since the positions of the parts of the human body and the human body contour in the region to be detected are to be determined, a precondition for the human body perception model is that it can judge whether a human body is present in the region at all. The model tasks of the human body perception model to be trained therefore include: sensing whether a human body is present in the region to be detected, the human body contour in the region, and the positions of the parts of the human body in the region. Accordingly, the detection result includes whether a human body is present in the region to be detected, the positions of the parts of the human body in the region, and the human body contour.
The computing device may use whether a human body exists in the region to be detected in the detection result as a first label, use positions of all parts of the human body in the region to be detected in the detection result as a second label, and use a human body contour in the region to be detected in the detection result as a third label, so as to train the human body perception model to be trained based on the three labels.
In one or more embodiments of the present specification, the human perception model to be trained may include: the coding layer, the decoding layer and the prediction layer, wherein the prediction layer comprises a classification sub-layer, a regression sub-layer and a segmentation sub-layer, as shown in fig. 2, which is a schematic structural diagram of a human body perception model provided by the present application. When the computing device inputs sample data into the human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result of each part of a human body output by the human body perception model to be trained, the sample data can be input into the coding layer to obtain coding features, the coding features are input into the decoding layer to obtain decoding features, and the decoding features can be further respectively input into the classification sub-layer, the regression sub-layer and the segmentation sub-layer to respectively obtain a classification prediction result, a human body position prediction result and a human body contour prediction result, wherein the classification prediction result is used for representing whether a human body exists in a region to be detected.
When the computing device determines the loss, the first loss may be determined based on a difference between the classification prediction result and the first annotation, the second loss may be determined based on a difference between the position prediction result and the second annotation for each portion of the human body, and the third loss may be determined based on a difference between the human body contour prediction result and the third annotation to train the human body perception model to be trained based on the three losses. Further, in one or more embodiments of the present description, the computing device may further determine a composite loss based on the first loss and the second loss, and train the human perception model to be trained based on the composite loss and the third loss.
Specifically, first, as described in the above steps S100 to S102, the sample data may be preprocessed, and in this step, the preprocessed sample data, that is, the three-dimensional array corresponding to the sample data, may be input into a coding layer, where the coding layer may specifically be a feature pyramid network, and the feature pyramid network may learn multi-scale features in the three-dimensional array using a pyramid structure. More specifically, in one or more embodiments of the present description, the pyramid network may include 5 end-to-end residual layers, and the feature maps output by the residual layers may form a feature pyramid. In the present specification, the channel number dimension data in the three-dimensional array is used for encoding azimuth angles within the millimeter wave radar sensing distance range. And in one or more embodiments of the present disclosure, since the small objects in the area to be detected occupy a small number of data points in the three-dimensional array, to prevent losing the information features of the small objects, the encoding layer corresponding to the feature pyramid network may perform 2×2 downsampling on the three-dimensional array so that the output data tensor, that is, the size of the encoded features, is reduced by 16 times in total in the distance dimension and the doppler dimension.
Further, the computing device may input the encoded features produced by the encoding layer, i.e. the processed three-dimensional array, into a decoding layer that decodes the distance tensor and the angle tensor, the intent being to expand the encoded feature map to a higher-resolution representation. In one or more embodiments of this specification, the decoding layer may implement this feature expansion through several deconvolution layers, i.e. the decoding layer may consist of a plurality of deconvolution layers. The output tensor of each deconvolution layer can be fused with the feature map fed into it, i.e. the feature map obtained from the encoding layer, so as to retain the detail information of the region to be detected. Using a feature pyramid network as the encoding layer together with a decoding layer that decodes distance and angle to compute the bearing of the human body differs from existing indoor human-body detection methods based on the millimeter wave radar Doppler/micro-Doppler effect: it avoids the complex azimuth-dimension computation based on the fast Fourier transform, reduces the computing power required of the computing device, and avoids consuming excessive computing resources.
Finally, the computing device may input the tensor output by the decoding layer (which may be a two-dimensional array over the distance and azimuth dimensions) into the prediction layer, which consists of a classification sub-layer, a regression sub-layer, and a segmentation sub-layer. The classification sub-layer and the regression sub-layer may each comprise four cascaded convolution-batch normalization (Conv-BatchNorm) layers: the tensor output by the decoding layer, i.e. the decoding features, is fed into these Conv-BatchNorm layers, and the processed results are then passed to the classification sub-layer and the regression sub-layer, respectively. The classification sub-layer may be a convolution layer with a sigmoid activation function that generates a probability map, giving a binary judgment of whether each cell of the millimeter wave signal map is occupied by a human body, i.e. whether a human body is present in the region to be detected. The regression sub-layer predicts the position of the human body, i.e. its distance and azimuth angle; it may be a 3×3 convolution layer that outputs two features representing the distance and the azimuth angle, respectively.
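Pulling the layers above together, the sketch below is a compact PyTorch-style skeleton of the encode, decode, and predict pipeline. The channel widths, the four-stage plain-convolution encoder standing in for the feature pyramid network, and the shared Conv-BatchNorm stack are simplifying assumptions; only the overall layout (encoder, deconvolution decoder, sigmoid classification map, two-channel distance/azimuth regression, sigmoid contour map) follows the description.

```python
import torch
import torch.nn as nn

def conv_bn(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class HumanPerceptionNet(nn.Module):
    """Simplified sketch: encoder (stand-in for the feature pyramid), deconvolution decoder,
    and classification / regression / segmentation sub-layers."""

    def __init__(self, in_channels: int):
        super().__init__()
        # Encoding layer: downsamples the (range, Doppler, channel) tensor; the description
        # uses a feature pyramid network with residual stages and repeated 2x2 downsampling.
        self.encoder = nn.Sequential(conv_bn(in_channels, 32, stride=2),
                                     conv_bn(32, 64, stride=2),
                                     conv_bn(64, 128, stride=2),
                                     conv_bn(128, 128, stride=2))
        # Decoding layer: deconvolutions expand the encoded map back to a higher-resolution
        # distance-azimuth representation.
        self.decoder = nn.Sequential(nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
                                     nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True))
        # Conv-BatchNorm stack feeding the classification and regression sub-layers.
        self.head_stack = nn.Sequential(*[conv_bn(32, 32) for _ in range(4)])
        self.cls_head = nn.Conv2d(32, 1, 1)             # probability map: human body present or not
        self.reg_head = nn.Conv2d(32, 2, 3, padding=1)  # two outputs: distance and azimuth angle
        # Segmentation sub-layer: cascaded convolution blocks then a 1x1 projection.
        self.seg_blocks = nn.Sequential(*[conv_bn(32, 32) for _ in range(4)])
        self.seg_head = nn.Conv2d(32, 1, 1)             # per-cell human body contour map

    def forward(self, x):
        feats = self.decoder(self.encoder(x))
        shared = self.head_stack(feats)
        cls_prob = torch.sigmoid(self.cls_head(shared))            # classification prediction
        part_pos = self.reg_head(shared)                           # part position prediction
        contour_prob = torch.sigmoid(self.seg_head(self.seg_blocks(feats)))  # contour prediction
        return cls_prob, part_pos, contour_prob
```

A radar cube from the preprocessing sketch would be fed in with the channel-number dimension as the convolution channels, e.g. model(torch.from_numpy(cube).permute(2, 0, 1).unsqueeze(0)); this layout is again an assumption.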
Furthermore, in one or more embodiments of the present description, the classification sub-layer may employ Focal Loss (Focal Loss), i.e., the first Loss may be Focal Loss, the regression sub-layer may employ Smooth L1 Loss (Smooth L1 Loss), i.e., the second Loss may be Smooth L1 Loss, and the total Loss based on the first Loss and the second Loss in the present description may be expressed using the following formula:
wherein,indicating total loss->Is a superparameter, which can be used to adjust the weights of the first and second losses,/->The value range of (2) is +.>Naturally, the sum of the first loss and the second loss can be directly used as the comprehensive loss without setting the weights of the first loss and the second loss, and the specification is not limited specifically, and the +_s are given as follows>Indicating focus loss, ++>Representing a smooth L1 loss, ">Representing the first label, the classification prediction result, the second label and the prediction result of the position of each part of the human body in sequence.
The segmentation sub-layer determines the human body contour in the region to be detected, i.e. it segments the human body out of the region. The tensor output by the decoding layer, i.e. the decoding features, may be input into the segmentation sub-layer, which may specifically consist of four cascaded convolution blocks, each containing two convolution-batch normalization-ReLU (Conv-BatchNorm-ReLU) blocks. The data processed by these convolution blocks may then pass through a 1×1 convolution layer to obtain a two-dimensional feature map, which is processed by a sigmoid activation function to obtain the class of each cell of the tensor. In this specification, the third loss may be a binary cross entropy loss (Binary Cross Entropy Loss), as follows:
$$L_{\text{seg}} = -\frac{1}{|\Omega|} \sum_{(r,\theta) \in \Omega} \Big[ y_{r,\theta} \log \hat{y}_{r,\theta} + \big(1 - y_{r,\theta}\big) \log\big(1 - \hat{y}_{r,\theta}\big) \Big]$$
where $\Omega$ denotes the region to be detected, $r$ and $\theta$ denote the distance and angle of a cell in $\Omega$, respectively, and $y_{r,\theta}$ and $\hat{y}_{r,\theta}$ denote the third label and the human body contour prediction result at that cell, respectively.
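Finally, a minimal sketch of one training step that adds the binary cross entropy contour loss (the third loss) to the composite loss sketched above. The equal weighting between the composite loss and the third loss, and the generic optimizer interface, are assumptions.

```python
import torch.nn.functional as F

def train_step(model, optimizer, radar_cube, first_label, second_label, third_label,
               composite_loss_fn, lam: float = 0.5):
    """One optimization step on a batch of radar cubes and their three labels."""
    cls_prob, part_pos, contour_prob = model(radar_cube)
    # Third loss: binary cross entropy between the contour prediction and the third label.
    seg_loss = F.binary_cross_entropy(contour_prob, third_label.float())
    # Composite loss (first + lam * second) plus the third loss, minimized jointly.
    loss = composite_loss_fn(cls_prob, first_label, part_pos, second_label, lam) + seg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

An ordinary optimizer such as torch.optim.Adam could drive this step; the specification does not name the optimizer, so that choice is illustrative.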
In the training method of the human body perception model provided above with reference to fig. 1, the detection results of other sensors on the positions of the parts of the human body and the human body contour in the region to be detected are used as labels to train a human body perception model that takes millimeter wave signals as samples. This overcomes the difficulty that labels are hard to extract from millimeter wave signals, and the trained model, beyond determining whether a human body is present in the region to be detected, determines the position of each part of the human body in the region and segments the human body contour, i.e. it achieves fine-grained perception of the human body and improves the perception capability of millimeter-wave-radar-based human body perception models.
Based on the above-mentioned method for human body perception model, the embodiment of the present disclosure further provides a schematic diagram of a training device for human body perception model, as shown in fig. 3.
Fig. 3 is a schematic diagram of a training device for a human perception model according to an embodiment of the present disclosure, where the device includes:
A determining module 300, configured to determine a region to be detected;
a first obtaining module 302, configured to obtain a millimeter wave signal of a millimeter wave radar that performs human body sensing on the area to be detected, and use the millimeter wave signal as sample data;
the second obtaining module 304 is configured to obtain a detection result of detecting the region to be detected by other sensors, and use the detection result as a label; wherein, the detection result comprises: the positions of all parts of the human body in the region to be detected and the contour of the human body;
the input module 306 is configured to input the sample data into a human body perception model to be trained, and obtain a position prediction result and a contour prediction result of each part of the human body output by the human body perception model to be trained;
the training module 308 is configured to determine a loss according to a difference between the position prediction result and the detection result of each part of the human body, and train the human body perception model to be trained with the minimum loss as an optimization target.
Optionally, the other sensor comprises an optical detection device;
the second obtaining module 304 is specifically configured to obtain an image of the image acquisition of the region to be detected by the optical detection device; determining the positions of all parts of the human body in the image and the human body contour, taking the determined positions of all parts of the human body in the image as a first part position and taking the determined human body contour in the image as a first human body contour; and taking the first position and the first human body outline as detection results.
Optionally, the other sensor further comprises a lidar;
the second obtaining module 304 is specifically configured to obtain a laser radar signal that is scanned by the laser radar for the area to be detected; according to the laser radar signal, determining the positions of all parts of the human body in the to-be-detected area and the human body contour, taking the determined positions of all parts of the human body in the to-be-detected area as a second position and taking the determined human body contour in the to-be-detected area as a second human body contour; determining a final part position according to the first part position and the second part position, and determining a final human body contour according to the first human body contour and the second human body contour; and taking the final position and the final human body contour as detection results.
Optionally, the human perception model to be trained includes: the prediction layer comprises a classification sub-layer, a regression sub-layer and a segmentation sub-layer;
the detection result further comprises: whether a human body exists in the region to be detected or not;
the input module 306 is specifically configured to input the sample data into the coding layer to obtain coding features; inputting the coding features into the decoding layer to obtain decoding features; inputting the decoding characteristics into the classification sub-layer to obtain a classification prediction result; the classification prediction result is used for representing whether a human body exists in the region to be detected or not; inputting the decoding characteristics into the regression sub-layer to obtain the position prediction result of each part of the human body; and inputting the decoding characteristics into the segmentation sub-layer to obtain a human body contour prediction result.
Optionally, taking whether a human body exists in the region to be detected in the detection result as a first label;
taking the positions of all parts of the human body in the region to be detected in the detection result as a second label;
taking the human body contour in the region to be detected in the detection result as a third label;
the training module 308 is specifically configured to determine a first loss according to a difference between the classification prediction result and the first label; determining a second loss according to the difference between the position prediction result of each part of the human body and the second label; and determining a third loss according to the difference between the human body contour prediction result and the third label.
Optionally, the training module 308 is specifically configured to determine a comprehensive loss according to the first loss and the second loss; and training the human body perception model to be trained according to the comprehensive loss and the third loss.
Optionally, the input module 306 is specifically configured to perform a distance dimension calculation and a doppler dimension calculation on the sample data, so as to obtain distance dimension data and doppler dimension data corresponding to the sample data, respectively; determining channel number dimension data corresponding to the sample data according to the working mode of the millimeter wave radar; the channel number dimension data represents the channel number corresponding to the working mode of the millimeter wave radar; obtaining a three-dimensional array corresponding to the sample data based on the distance dimension data, the Doppler dimension data and the channel number dimension data corresponding to the sample data; and inputting the three-dimensional array corresponding to the sample data into a human body perception model to be trained.
The embodiments of the present specification also provide a computer readable storage medium storing a computer program, where the computer program is configured to perform the method for training the human perception model described above.
Based on the method of the human body perception model described above, the embodiment of the present disclosure further provides a schematic structural diagram of the electronic device shown in fig. 4. At the hardware level, as in fig. 4, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, although it may include hardware required for other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize the training method of the human body perception model.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 90 s of the 20 th century, improvements to one technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software (improvements to the process flow). However, with the development of technology, many improvements of the current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, an improvement of a method flow cannot be said to be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before the compiling is also written in a specific programming language, which is called hardware description language (Hardware Description Language, HDL), but not just one of the hdds, but a plurality of kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), lava, lola, myHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function. Of course, when implementing the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, refer to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method of training a human perception model, the method comprising:
determining a region to be detected;
acquiring millimeter wave signals of a millimeter wave radar for performing human body perception on the region to be detected, and taking the millimeter wave signals as sample data;
obtaining detection results of other sensors for detecting the region to be detected, and taking the detection results as labels; wherein, the detection result comprises: the positions of all parts of the human body in the region to be detected and the contour of the human body;
inputting the sample data into a human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result which are output by the human body perception model to be trained;
determining a loss according to the difference between the position prediction result of each part of the human body and the detection result, and training the human body perception model to be trained with minimizing the loss as an optimization target.
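To make the steps of claim 1 concrete for readers less familiar with patent language, the following is a minimal training-loop sketch. The network, the choice of loss functions and the data handling are illustrative assumptions, not the implementation claimed here.

```python
import torch.nn.functional as F

def train_step(model, optimizer, radar_cube, part_pos_label, contour_label):
    """One hypothetical training step: a radar sample in, sensor-derived labels as supervision."""
    optimizer.zero_grad()
    # The model outputs a position prediction for each body part and a contour prediction.
    part_pos_pred, contour_pred = model(radar_cube)
    # One plausible loss choice: regression on part positions, per-pixel loss on the contour.
    loss = F.smooth_l1_loss(part_pos_pred, part_pos_label) \
         + F.binary_cross_entropy_with_logits(contour_pred, contour_label)
    loss.backward()
    optimizer.step()  # minimizing the loss is the optimization target
    return loss.item()
```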
2. The method of claim 1, wherein the other sensor comprises an optical detection device;
acquiring the detection results of the other sensors detecting the region to be detected specifically comprises:
acquiring an image obtained by the optical detection device performing image acquisition on the region to be detected;
determining the positions of all parts of the human body in the image and the human body contour, taking the determined positions of all parts of the human body in the image as a first part position and taking the determined human body contour in the image as a first human body contour;
and taking the first part position and the first human body contour as the detection results.
3. The method of claim 2, wherein the other sensor further comprises a lidar;
acquiring the detection results of the other sensors detecting the region to be detected specifically comprises:
acquiring a laser radar signal of the laser radar for scanning the region to be detected;
according to the laser radar signal, determining the positions of all parts of the human body in the region to be detected and the human body contour, taking the determined positions of all parts of the human body in the region to be detected as a second part position and taking the determined human body contour in the region to be detected as a second human body contour;
determining a final part position according to the first part position and the second part position, and determining a final human body contour according to the first human body contour and the second human body contour;
and taking the final part position and the final human body contour as the detection results.
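Claims 2 and 3 amount to fusing camera-derived and lidar-derived annotations into a single label. A toy sketch of one possible fusion rule is given below, assuming both sensors already report body-part positions in a shared coordinate frame and contours as binary masks of the same resolution; the averaging and union rules are assumptions for illustration.

```python
import numpy as np

def fuse_part_positions(first_parts: np.ndarray, second_parts: np.ndarray) -> np.ndarray:
    """Fuse camera- and lidar-derived keypoints, each of shape (num_parts, 3)."""
    # Simple fusion rule: average the two estimates of every body part.
    return 0.5 * (first_parts + second_parts)

def fuse_contours(first_mask: np.ndarray, second_mask: np.ndarray) -> np.ndarray:
    """Fuse two binary human-contour masks of identical resolution."""
    # Keep pixels that either sensor marks as belonging to the human body.
    return np.logical_or(first_mask, second_mask).astype(np.uint8)
```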
4. The method of claim 1, wherein the human body perception model to be trained comprises: a coding layer, a decoding layer and a prediction layer, wherein the prediction layer comprises a classification sub-layer, a regression sub-layer and a segmentation sub-layer;
the detection result further comprises: whether a human body exists in the region to be detected or not;
inputting the sample data into the human body perception model to be trained to obtain the position prediction result of each part of the human body and the human body contour prediction result output by the human body perception model to be trained specifically comprises:
inputting the sample data into the coding layer to obtain coding features;
inputting the coding features into the decoding layer to obtain decoding features;
inputting the decoding features into the classification sub-layer to obtain a classification prediction result, wherein the classification prediction result is used for representing whether a human body exists in the region to be detected;
inputting the decoding features into the regression sub-layer to obtain the position prediction result of each part of the human body;
and inputting the decoding features into the segmentation sub-layer to obtain the human body contour prediction result.
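One way to picture the layered structure of claim 4 is the PyTorch sketch below: a coding stage, a decoding stage, and three prediction sub-layers. Layer counts, channel widths and the use of plain 2-D convolutions are assumptions made for the example, not details of this disclosure.

```python
import torch
import torch.nn as nn

class HumanPerceptionNet(nn.Module):
    """Illustrative coding/decoding network with classification, regression and segmentation sub-layers."""

    def __init__(self, in_channels: int = 3, num_parts: int = 17):
        super().__init__()
        # Coding layer: compress the (range, Doppler, channel) input into coding features.
        self.coder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoding layer: recover spatial resolution to produce decoding features.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Prediction layer: classification, regression and segmentation sub-layers.
        self.classify = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))
        self.regress = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_parts * 3))
        self.segment = nn.Conv2d(32, 1, kernel_size=1)  # per-pixel human contour logits

    def forward(self, x: torch.Tensor):
        decoding_features = self.decoder(self.coder(x))
        return self.classify(decoding_features), self.regress(decoding_features), self.segment(decoding_features)
```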
5. The method of claim 4, wherein whether a human body exists in the region to be detected in the detection result is used as a first label;
taking the positions of all parts of the human body in the region to be detected in the detection result as a second label;
taking the human body contour in the region to be detected in the detection result as a third label;
determining the loss specifically includes:
determining a first loss according to the difference between the classification prediction result and the first label;
determining a second loss according to the difference between the position prediction result of each part of the human body and the second label;
and determining a third loss according to the difference between the human body contour prediction result and the third label.
6. The method of claim 5, wherein training the human body perception model to be trained specifically comprises:
determining a composite loss based on the first loss and the second loss;
and training the human body perception model to be trained according to the comprehensive loss and the third loss.
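Claims 5 and 6 define three losses and combine the first two into a composite loss before adding the third. The sketch below shows one possible reading; the specific loss functions and weights are assumptions for illustration.

```python
import torch.nn.functional as F

def total_loss(cls_pred, reg_pred, seg_pred, first_label, second_label, third_label,
               alpha: float = 1.0, beta: float = 1.0):
    """Hypothetical combination of the three losses of claims 5 and 6."""
    first_loss = F.binary_cross_entropy_with_logits(cls_pred, first_label)  # human present or not
    second_loss = F.smooth_l1_loss(reg_pred, second_label)                  # positions of body parts
    third_loss = F.binary_cross_entropy_with_logits(seg_pred, third_label)  # human contour mask
    composite_loss = alpha * first_loss + beta * second_loss                # composite of first and second
    return composite_loss + third_loss                                      # trained jointly with the third
```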
7. The method according to claim 1, wherein inputting the sample data into the human body perception model to be trained specifically comprises:
performing distance dimension calculation and Doppler dimension calculation on the sample data to respectively obtain distance dimension data and Doppler dimension data corresponding to the sample data;
determining channel number dimension data corresponding to the sample data according to the working mode of the millimeter wave radar; the channel number dimension data represents the channel number corresponding to the working mode of the millimeter wave radar;
obtaining a three-dimensional array corresponding to the sample data based on the distance dimension data, the Doppler dimension data and the channel number dimension data corresponding to the sample data;
and inputting the three-dimensional array corresponding to the sample data into a human body perception model to be trained.
8. A training device for a human body perception model, the device comprising:
the determining module is used for determining a region to be detected;
the first acquisition module is used for acquiring millimeter wave signals of a millimeter wave radar for human body perception of the region to be detected and taking the millimeter wave signals as sample data;
the second acquisition module is used for acquiring detection results of other sensors for detecting the region to be detected and taking the detection results as labels; wherein, the detection result comprises: the positions of all parts of the human body in the region to be detected and the contour of the human body;
the input module is used for inputting the sample data into a human body perception model to be trained to obtain a human body position prediction result and a human body contour prediction result which are output by the human body perception model to be trained;
and the training module is used for determining a loss according to the difference between the position prediction result of each part of the human body and the detection result, and training the human body perception model to be trained with minimizing the loss as an optimization target.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-7 when the program is executed.
CN202311293165.8A 2023-10-08 2023-10-08 Training method and device of human body perception model, medium and electronic equipment Active CN117036868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311293165.8A CN117036868B (en) 2023-10-08 2023-10-08 Training method and device of human body perception model, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117036868A true CN117036868A (en) 2023-11-10
CN117036868B CN117036868B (en) 2024-01-26

Family

ID=88645199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311293165.8A Active CN117036868B (en) 2023-10-08 2023-10-08 Training method and device of human body perception model, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117036868B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018470A (en) * 2019-03-01 2019-07-16 北京纵目安驰智能科技有限公司 Based on example mask method, model, terminal and the storage medium merged before multisensor
CN110045370A (en) * 2019-05-10 2019-07-23 成都宋元科技有限公司 Human perception method and its system based on millimetre-wave radar
US20210241026A1 (en) * 2020-02-04 2021-08-05 Nio Usa, Inc. Single frame 4d detection using deep fusion of camera image, imaging radar and lidar point cloud
CN111352112A (en) * 2020-05-08 2020-06-30 泉州装备制造研究所 Target detection method based on vision, laser radar and millimeter wave radar
CN111830502A (en) * 2020-06-30 2020-10-27 广州小鹏车联网科技有限公司 Data set establishing method, vehicle and storage medium
CN113009486A (en) * 2021-03-02 2021-06-22 安特微智能通讯(深圳)有限公司 Human body sensing method and system based on millimeter wave radar
CN113935379A (en) * 2021-10-15 2022-01-14 中国科学技术大学 Human body activity segmentation method and system based on millimeter wave radar signals
CN113807471A (en) * 2021-11-18 2021-12-17 浙江宇视科技有限公司 Radar and vision integrated vehicle identification method, device, equipment and medium
CN115331254A (en) * 2022-03-03 2022-11-11 之江实验室 Anchor frame-free example portrait semantic analysis method
CN114842196A (en) * 2022-05-07 2022-08-02 南京大学 Radar radio frequency image target detection method
CN115424204A (en) * 2022-08-26 2022-12-02 温州旦光文具有限公司 Pedestrian detection method and system based on information fusion
CN116206359A (en) * 2022-12-27 2023-06-02 浙江大学 Human gait recognition method based on millimeter wave radar and dynamic sampling neural network
CN116665212A (en) * 2023-07-31 2023-08-29 福思(杭州)智能科技有限公司 Data labeling method, device, processing equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117471484A (en) * 2023-12-28 2024-01-30 深圳市镭神智能系统有限公司 Pedestrian navigation method, computer-readable storage medium and electronic equipment
CN117471484B (en) * 2023-12-28 2024-03-05 深圳市镭神智能系统有限公司 Pedestrian navigation method, computer-readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN117036868B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN113095124B (en) Face living body detection method and device and electronic equipment
Liu et al. Deep instance segmentation with automotive radar detection points
CN117036868B (en) Training method and device of human body perception model, medium and electronic equipment
KR20210082234A (en) Image processing method and apparatus, electronic device and storage medium
CN112784857B (en) Model training and image processing method and device
JP2022518745A (en) Target position acquisition method, equipment, computer equipment and computer program
CN113610750B (en) Object identification method, device, computer equipment and storage medium
CN115600157B (en) Data processing method and device, storage medium and electronic equipment
CN111476783A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111126216A (en) Risk detection method, device and equipment
BelMannoubi et al. Deep neural networks for indoor localization using WiFi fingerprints
CN116012445A (en) Method and system for guiding robot to perceive three-dimensional space information of pedestrians based on depth camera
CN111310590B (en) Action recognition method and electronic equipment
CN111709993B (en) Object pose information determining method, device, terminal and storage medium
CN117409466A (en) Three-dimensional dynamic expression generation method and device based on multi-label control
CN116129211A (en) Target identification method, device, equipment and storage medium
CN115049927A (en) SegNet-based SAR image bridge detection method and device and storage medium
CN117726760B (en) Training method and device for three-dimensional human body reconstruction model of video
CN117726907B (en) Training method of modeling model, three-dimensional human modeling method and device
Zhang et al. Lightweight network for small target fall detection based on feature fusion and dynamic convolution
CN115761885B (en) Behavior recognition method for common-time and cross-domain asynchronous fusion driving
CN117830564B (en) Three-dimensional virtual human model reconstruction method based on gesture distribution guidance
CN116188919B (en) Test method and device, readable storage medium and electronic equipment
CN117934858A (en) Point cloud processing method and device, storage medium and electronic equipment
CN116935055B (en) Attention mask-based weak supervision semantic segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant