CN116030493A

CN116030493A - Human shape detection method, device, equipment and storage medium based on event data

Info

Publication number: CN116030493A
Application number: CN202211595376.2A
Authority: CN
Inventors: 刘松华
Original assignee: Shenzhen Ruishi Zhixin Technology Co ltd
Current assignee: Shenzhen Ruishi Zhixin Technology Co ltd
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-04-28

Abstract

The application provides a humanoid detection method, device and equipment based on event data and a storage medium, wherein the method comprises the following steps: aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected; inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key characteristic data; and acquiring a humanoid detection result based on the target key characteristic data. Through implementation of the scheme, the human shape detection is carried out based on the event data acquired by the event camera, the method is suitable for human shape detection of a high-dynamic scene, the requirement of a lightweight model on hardware performance is relatively low, the method is easy to deploy on conventional equipment, and the application universality is effectively improved.

Description

Human shape detection method, device, equipment and storage medium based on event data

Technical Field

The present disclosure relates to the field of computer vision, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a humanoid form based on event data.

Background

With the continuous development of science and technology, humanoid detection is widely applied to the technical fields of auxiliary driving, monitoring, robots and the like. The current mainstream humanoid detection algorithm is realized based on a deep learning technology, is generally applied to detection of images acquired by a traditional CMOS camera, has higher requirements on hardware performance due to deployment of a humanoid detection model, and has larger application limitation.

Disclosure of Invention

The embodiment of the application provides a humanoid detection method, device and equipment based on event data and a storage medium, which at least can solve the problem of larger application limitation of a humanoid detection algorithm based on a COMS camera provided in the related technology.

An embodiment of the present application provides a humanoid detection method based on event data, including: aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected; inputting the event image to be detected into a trained lightweight humanoid detection model, and outputting target key feature data; and acquiring a humanoid detection result based on the target key characteristic data.

A second aspect of the embodiments of the present application provides a humanoid detection apparatus based on event data, including: the aggregation module is used for aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected; the detection module is used for inputting the event image to be detected into the trained lightweight humanoid detection model and outputting target key feature data; and the acquisition module is used for acquiring the humanoid detection result based on the target key characteristic data.

A third aspect of the embodiments of the present application provides an electronic device, including: the system comprises a memory and a processor, wherein the processor is used for executing a computer program stored on the memory, and when the processor executes the computer program, the steps in the human shape detection method provided in the first aspect of the embodiment of the application are realized.

A fourth aspect of the embodiments of the present application provides a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements each step in the human shape detection method provided in the first aspect of the embodiments of the present application.

From the above, according to the humanoid detection method, device, equipment and storage medium based on event data provided by the scheme of the application, a plurality of event data frames associated with a current detection task are aggregated to obtain an event image to be detected; inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key characteristic data; and acquiring a humanoid detection result based on the target key characteristic data. Through implementation of the scheme, the human shape detection is carried out based on the event data acquired by the event camera, the method is suitable for human shape detection of a high-dynamic scene, the requirement of a lightweight model on hardware performance is relatively low, the method is easy to deploy on conventional equipment, and the application universality is effectively improved.

Drawings

Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;

fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 3 is a basic flow chart of a human shape detection method according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram of a network structure of a lightweight humanoid detection model according to an embodiment of the present application;

fig. 5 is a detailed flow chart of a humanoid detection method according to an embodiment of the present application;

fig. 6 is a schematic program module diagram of a humanoid detection device according to an embodiment of the disclosure.

Detailed Description

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

In the description of the embodiments of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

A method, an apparatus, a device, and a storage medium for detecting a humanoid form based on event data according to embodiments of the present application will be described in detail below with reference to the accompanying drawings.

In order to solve the problem of greater application limitation of the COMS camera-based humanoid detection algorithm provided in the related art, an embodiment of the present application provides a humanoid detection method based on event data, which is applied to a scene as shown in fig. 1, and in the application scene, an event camera 10 and an electronic device 20 may be included.

It should be noted that the Event monitoring vision sensor (EVS, event-based Vision Sensor) configured by the Event camera 10 is a novel sensor, which simulates the retina of a human being, responds to the pixel pulse of brightness change generated by motion, that is, the Event camera 10 asynchronously records the brightness change on the pixel, and outputs an Event including coordinates (x, y), a time stamp (t) and an Event polarity (p, with values of +1 and-1, respectively representing the increase and decrease of brightness) when the brightness change exceeds a certain threshold, and each Event is represented in the form of e= (t, x, y, p), so that it can capture the brightness change (that is, the light intensity change) of a scene at a very high frame rate, record the Event at a specific time point and a specific position in an image, and form an Event stream instead of a frame stream, thereby solving the problems of redundancy of information, a data storage amount, a large real-time processing amount and the like of the conventional camera. In addition, the electronic device 20 is a variety of terminal devices having data processing functions, including, but not limited to, a smart phone, a tablet computer, a laptop portable computer, a desktop computer, an in-vehicle terminal, an on-board terminal, and the like.

In the application scenario shown in fig. 1, event data corresponding to an actual application scenario may be collected by the event camera 10, and then the event camera transmits the event data to the electronic device 20. The electronic device 20 performs the following flow of the humanoid detection method with respect to the received event data: firstly, aggregating a plurality of event data frames related to a current detection task to obtain an event image to be detected; then, inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key feature data; and finally, acquiring a humanoid detection result based on the target key characteristic data.

Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device mainly comprises: the number of the processors 202 may be one or more, the memory 201 stores a computer program 203 that can run on the processor 202, the memory 201 is communicatively connected to the processor 202, and the processor 202 implements the flow of the human shape detection method when executing the computer program 203.

It should be noted that the storage 201 may be an internal storage unit, such as a hard disk or a memory; the memory may also be an external storage device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), or the like. Further, the memory may also include both an internal storage unit and an external storage device, and may also be used to temporarily store data that has been output or is to be output. It should be noted that, when the processor 202 is a neural network chip, the electronic device may not include the memory 202, and whether the electronic device needs to use the memory 202 to store the corresponding computer program depends on the type of the processor 202.

In addition, the processor 202 may be a central processing unit (Central Processing Unit, CPU), which may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGAs), neural network chips or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

An embodiment of the present application further provides a computer readable storage medium, which may be provided in the foregoing electronic device, and the computer readable storage medium may be a memory in the foregoing embodiment shown in fig. 2.

The computer readable storage medium stores a computer program which, when executed by a processor, implements the flow of the human shape detection method described above. Further, the computer-readable medium may be any medium capable of storing a program code, such as a usb (universal serial bus), a removable hard disk, a Read-Only Memory (ROM), a RAM, a magnetic disk, or an optical disk.

Fig. 3 is a basic flowchart of a human shape detection method according to an embodiment of the present application, where the human shape detection method may be executed by the electronic device in fig. 1 or fig. 2, and specifically includes the following steps:

step 301, aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected.

In this embodiment, the event stream collected by the event camera includes a plurality of event data frames, and in this embodiment, the plurality of event data frames may be aggregated for event characterization data corresponding to pixels in each event data frame to obtain aggregated event data, and then a corresponding event image is generated based on the aggregated event data.

In an optional implementation manner of this embodiment, the step of aggregating the plurality of event data frames associated with the current detection task to obtain the event image to be detected includes: acquiring event characterization data corresponding to the same pixel position in a plurality of event data frames associated with a current detection task; performing OR operation on all event characterization data of each pixel position respectively to obtain aggregate event data; an event image to be detected is generated based on the aggregate event data.

Specifically, the event characterization data of the present embodiment includes 0 and 1, where 0 characterizes no event generation and 1 characterizes event generation. In this embodiment, the event monitoring vision sensor configured by the event camera includes a pixel array composed of a plurality of pixels, each pixel works independently, and outputs an event when the pixel detects that the brightness change reaches a preset brightness change threshold value, in practical application, the event generated by the pixel includes a positive event and a negative event, the positive event indicates that the brightness of the current moment is stronger relative to the previous moment, the negative event indicates that the brightness of the current moment is weaker relative to the previous moment, and in addition, when the brightness is unchanged, no event is generated.

When event data aggregation is performed, taking each pixel position in the whole pixel array as a unit, acquiring a plurality of event characterization data corresponding to the same pixel position in multi-frame event data, if the plurality of event characterization data comprises at least one value of '1', assigning '1' to the pixel position in the aggregated event data, if the values of all the event characterization data are '0', assigning '0' to the pixel position in the aggregated event data, and executing the operation to all the pixel positions in the whole pixel array, thereby obtaining final aggregated event data, and then generating a corresponding event image based on the aggregated event data to serve as an event image to be detected. It should be noted that the foregoing embodiment is only an optional event data aggregation manner, and may be implemented in other manners, such as an exclusive or operation, in an actual application scenario, which is not limited to this embodiment.

In an optional implementation manner of this embodiment, the step of generating the event image to be detected based on the aggregate event data includes: determining a pixel region of interest from the overall pixel array corresponding to the aggregate event data according to the task attribute of the current detection task; an event image to be detected is generated based on a portion of the aggregated event data corresponding to the pixel region of interest.

Specifically, in practical application, human shape detection scenes are different, different human shape detection scenes have different detection task requirements, event data of different pixel areas in aggregate event data have different references for human shape detection, and the calculated data amount of a global event image generated based on overall event data is large.

Step 302, inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key feature data.

Specifically, the light-weight humanoid detection model of this embodiment is implemented based on a neural network, as shown in fig. 4, which is a network structure schematic diagram of the light-weight humanoid detection model provided by this embodiment, where the light-weight humanoid detection model includes a first CBR module, a second CBR module, a third CBR module, a fourth CBR module, a fifth CBR module, a sixth CBR module, a maximum pooling layer, a fusion module (concatate), an average pooling layer, a full connection layer, and a normalization module, and an input of the first CBR module, the maximum pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, the fusion module, the average pooling layer, the full connection layer, and the normalization module are sequentially cascaded, and an input of the sixth CBR module is connected to an output of the third CBR module, and an input of the sixth CBR module is connected to an input of the fusion module.

It should be noted that the dimensions of the first CBR module, the max pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, and the sixth CBR module are respectively: the number of channels of the first CBR module, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, and the sixth CBR module in this embodiment is 8×8, 4×4, 3×1, and 3×1, respectively: 8. 12, 16, 64, and 12, and the first CBR module performs 4-fold downsampling, the largest pooling layer and the second CBR module each perform 2-fold downsampling, and none of the third CBR module, the fourth CBR module, the fifth CBR module, and the sixth CBR module performs downsampling.

It should also be understood that the CBR module of the present embodiment includes a concatenated convolutional layer, BN layer, and Relu activation function layer, and the size of the event image to be detected input to the lightweight humanoid detection model of the present embodiment is 192×192×1. Therefore, the model of the embodiment has less parameter quantity and correspondingly less operation quantity in the application stage, so that the requirement on hardware resources is lower, the deployment cost is lower, and the model is easy to deploy on low-power-consumption equipment, such as MCU and the like.

In an optional implementation manner of this embodiment, before the step of inputting the event image to be detected to the trained lightweight humanoid detection model and outputting the target key feature data, the method further includes: affine transformation is carried out on each original APS image in a preset public APS data set, and processed APS images are generated; inputting the original APS image and the processed APS image into a preset event image simulation model to generate a simulation event image; and inputting a training sample set formed by all the simulation event images into the initial lightweight humanoid detection model for training, and obtaining the lightweight humanoid detection model after training.

Specifically, the common data set may be a coco data set or the like, in this embodiment, affine transformation is performed on an original APS image in the data set to adjust positions and angles of the images, the generated processed APS image has a certain position and angle offset with respect to the original APS image, then the original APS image and the processed APS image are combined to simulate an event image, finally, a training sample set composed of event images obtained through simulation is used for training an initial lightweight humanoid detection model, and when the loss value of the overall neural network is no longer reduced, the model converges, so that a trained lightweight humanoid detection model is obtained.

In an alternative implementation of this embodiment, the event image simulation model is expressed as:

wherein image is _src Representing the original APS image, image _dst Representing processed APS Image, image _dvs Representing a simulated event image.

And 303, acquiring a humanoid detection result based on the target key feature data.

Specifically, the human shape detection task may be a classification task, after the light-weight human shape detection model finishes the extraction of the key features, the key features are further classified, the human shape detection result in this embodiment includes that a person appears and no person appears, and when the human shape detection result is that a person appears, the human shape detection result in this embodiment may further obtain the human shape recognition result based on the target key feature data. In practical application, a preset skeleton key point feature library can be called, at least one legal skeleton key point feature is stored in the feature library, then target key feature data is compared with the legal skeleton key point features in the skeleton key point feature library, the actual feature matching degree of the target key feature data and the legal skeleton key point features is obtained, and if the actual feature matching degree is higher than a preset matching degree threshold value, the person recognition result is determined to be that a legal person is recognized.

Further, the task recognition result can be uploaded to the application platform, corresponding control is achieved by the application platform, and taking an electronic door lock as an example, an unlocking command can be sent to an electronic lock control of the electronic door lock based on the face recognition result of the legal person, and the electronic lock control controls a mechanical lock body of the door lock to execute unlocking action according to the unlocking command.

In an optional implementation manner of this embodiment, after the step of obtaining the humanoid detection result based on the target key feature data, the method further includes: if the humanoid detection result is that a person appears, acquiring a security application level required by an application task corresponding to the current detection task; and acquiring a person identification result based on part of key feature data corresponding to the security application level in the target key feature data.

Specifically, as described above, when the presence of a person is detected, the present embodiment may further perform a person recognition task, that is, to recognize whether or not the person currently present is a legal person. In this embodiment, the person recognition result is commonly used in an application scenario of identity verification, and for identity verification tasks with different security requirements, the person recognition result generally has different security application level requirements, and the embodiment can be adapted to the security application level to correspondingly acquire part of key feature data from all detected key feature data to perform person recognition, so that the computing amount of person recognition is reduced while the basic security application requirements are ensured, and the recognition efficiency is effectively improved.

The method in fig. 5 is a refined human shape detection method provided in an embodiment of the present application, and the implementation flow of the human shape detection method includes the following steps:

step 501, carrying out affine transformation on each original APS image in a preset public APS data set to generate a processed APS image;

step 502, inputting an original APS image and a processed APS image into a preset event image simulation model to generate a simulation event image;

step 503, inputting a training sample set formed by all the simulation event images into an initial lightweight humanoid detection model for training, and obtaining a lightweight humanoid detection model after training;

step 504, obtaining event characterization data corresponding to the same pixel position in a plurality of event data frames associated with the current detection task;

step 505, performing OR operation on all event characterization data of each pixel position to obtain aggregate event data;

step 506, determining a pixel region of interest from the whole pixel array corresponding to the aggregated event data according to the task attribute of the current detection task;

step 507, generating an event image to be detected based on part of event data corresponding to the pixel region of interest in the aggregated event data;

step 508, inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key feature data;

step 509, acquiring a humanoid detection result based on the target key feature data.

It should be understood that, the sequence number of each step in this embodiment does not mean the order of execution of the steps, and the execution order of each step should be determined by its functions and internal logic, and should not be construed as a unique limitation on the implementation process of the embodiments of the present application.

Fig. 6 is a human form detection device based on event data according to an embodiment of the present application, where the human form detection device may be used to implement the human form detection method in the foregoing embodiment, and the human form detection device mainly includes:

the aggregation module 601 is configured to aggregate a plurality of event data frames associated with a current detection task to obtain an event image to be detected;

the detection module 602 is configured to input an event image to be detected into a trained lightweight humanoid detection model, and output target key feature data;

an obtaining module 603, configured to obtain a humanoid detection result based on the target key feature data.

In some implementations of this embodiment, the aggregation module is specifically configured to: acquiring event characterization data corresponding to the same pixel position in a plurality of event data frames associated with a current detection task; wherein the event characterization data includes 0 and 1,0 characterizes no event generation, 1 characterizes event generation; performing OR operation on all event characterization data of each pixel position respectively to obtain aggregate event data; an event image to be detected is generated based on the aggregate event data.

Further, in some implementations of the present embodiment, the aggregation module, when executing the above-described function of generating the event image to be detected based on the aggregate event data, is specifically configured to: determining a pixel region of interest from the overall pixel array corresponding to the aggregate event data according to the task attribute of the current detection task; an event image to be detected is generated based on a portion of the aggregated event data corresponding to the pixel region of interest.

In some implementations of this embodiment, the lightweight humanoid detection model includes a first CBR module, a second CBR module, a third CBR module, a fourth CBR module, a fifth CBR module, a sixth CBR module, a max pooling layer, a fusion module, an average pooling layer, a full connection layer, and a normalization module, the first CBR module, the max pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, the fusion module, the average pooling layer, the full connection layer, and the normalization module are cascaded in sequence, an input of the sixth CBR module is connected to an output of the third CBR module, and an input of the sixth CBR module is connected to an input of the fusion module; the dimensions of the first CBR module, the max-pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, and the sixth CBR module are: 8×8, 4×4, 3×1, and 3×1.

In some implementations of this embodiment, the humanoid detection apparatus further includes: the training system comprises a generating module and a training module, wherein the generating module is used for: affine transformation is carried out on each original APS image in a preset public APS data set, and processed APS images are generated; and inputting the original APS image and the processed APS image into a preset event image simulation model to generate a simulation event image. The training module is used for: and inputting a training sample set formed by all the simulation event images into the initial lightweight humanoid detection model for training, and obtaining the lightweight humanoid detection model after training.

In some implementations of this embodiment, the obtaining module is further configured to: if the humanoid detection result is that a person appears, acquiring a security application level required by an application task corresponding to the current detection task; and acquiring a person identification result based on part of key feature data corresponding to the security application level in the target key feature data.

It should be noted that, the human shape detection method in the foregoing embodiment may be implemented based on the human shape detection device provided in the foregoing embodiment, and those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the human shape detection device described in the foregoing embodiment may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

Based on the technical scheme of the embodiment of the application, a plurality of event data frames associated with the current detection task are aggregated to obtain an event image to be detected; inputting an event image to be detected into a trained lightweight humanoid detection model, and outputting target key characteristic data; and acquiring a humanoid detection result based on the target key characteristic data. Through implementation of the scheme, the human shape detection is carried out based on the event data acquired by the event camera, the method is suitable for human shape detection of a high-dynamic scene, the requirement of a lightweight model on hardware performance is relatively low, the method is easy to deploy on conventional equipment, and the application universality is effectively improved.

It should be noted that the apparatus and method disclosed in several embodiments provided in the present application may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.

The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a readable storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned readable storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.

It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The foregoing describes the method, apparatus, device and storage medium for detecting a humanoid form based on event data provided in the present application, and those skilled in the art, based on the ideas of the embodiments of the present application, will change the specific implementation and application scope, so that the content of the present application should not be construed as limiting the present application.

Claims

1. A humanoid detection method based on event data, comprising:

aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected;

inputting the event image to be detected into a trained lightweight humanoid detection model, and outputting target key feature data;

and acquiring a humanoid detection result based on the target key characteristic data.

2. The human shape detection method according to claim 1, wherein the step of aggregating a plurality of event data frames associated with a current detection task to obtain an event image to be detected comprises:

acquiring event characterization data corresponding to the same pixel position in a plurality of event data frames associated with a current detection task; wherein the event characterization data includes 0 and 1,0 characterizes no event generation, 1 characterizes event generation;

performing OR operation on all the event characterization data of each pixel position to obtain aggregate event data;

and generating an event image to be detected based on the aggregate event data.

3. The human shape detection method according to claim 2, wherein the step of generating an event image to be detected based on the aggregate event data includes:

determining a pixel region of interest from the overall pixel array corresponding to the aggregate event data according to the task attribute of the current detection task;

generating an event image to be detected based on part of event data corresponding to the pixel region of interest in the aggregated event data.

4. The humanoid detection method of claim 1, wherein the lightweight humanoid detection model includes a first CBR module, a second CBR module, a third CBR module, a fourth CBR module, a fifth CBR module, a sixth CBR module, a max pooling layer, a fusion module, an average pooling layer, a full connection layer, and a normalization module, inputs of the first CBR module, the max pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, the fusion module, the average pooling layer, the full connection layer, and the normalization module are sequentially cascaded, inputs of the sixth CBR module are connected to an output of the third CBR module, and inputs of the sixth CBR module are connected to an input of the fusion module; the dimensions of the first CBR module, the max-pooling layer, the second CBR module, the third CBR module, the fourth CBR module, the fifth CBR module, and the sixth CBR module are respectively: 8×8, 4×4, 3×1, and 3×1.

5. The human shape detection method according to claim 1, wherein before the step of inputting the event image to be detected to the training-completed lightweight human shape detection model and outputting the target key feature data, further comprises:

affine transformation is carried out on each original APS image in a preset public APS data set, and processed APS images are generated;

inputting the original APS image and the processed APS image into a preset event image simulation model to generate a simulation event image;

and inputting a training sample set formed by all the simulation event images into an initial lightweight humanoid detection model for training, and obtaining the lightweight humanoid detection model after training.

6. The humanoid detection method of claim 5, wherein the event image simulation model is expressed as:

wherein image is _src Representing the original APS image, image _dst Representing the processed APS Image, image _dvs Representing the simulated event image.

7. The human form detection method according to any one of claims 1 to 6, characterized by further comprising, after the step of acquiring human form detection results based on the target key feature data:

if the humanoid detection result shows that a person appears, acquiring a security application level required by an application task corresponding to the current detection task;

and acquiring a person identification result based on part of key feature data corresponding to the security application level in the target key feature data.

8. A humanoid detection apparatus based on event data, comprising:

the aggregation module is used for aggregating a plurality of event data frames associated with the current detection task to obtain an event image to be detected;

the detection module is used for inputting the event image to be detected into the trained lightweight humanoid detection model and outputting target key feature data;

and the acquisition module is used for acquiring the humanoid detection result based on the target key characteristic data.

9. An electronic device comprising a memory and a processor, wherein:

the processor is used for executing the computer program stored on the memory;

the processor, when executing the computer program, implements the steps of the human shape detection method of any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the humanoid detection method of any one of claims 1 to 7.