CN113158858A - Behavior analysis method and system based on deep learning - Google Patents


Info

Publication number
CN113158858A
CN113158858A (application CN202110383994.XA)
Authority
CN
China
Prior art keywords
area
target image
positioning
video data
data frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110383994.XA
Other languages
Chinese (zh)
Inventor
刘镇硕
方倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Aikor Intelligent Technology Co ltd
Original Assignee
Suzhou Aikor Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Aikor Intelligent Technology Co ltd filed Critical Suzhou Aikor Intelligent Technology Co ltd
Priority to CN202110383994.XA priority Critical patent/CN113158858A/en
Publication of CN113158858A publication Critical patent/CN113158858A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Abstract

The scheme relates to a behavior analysis method, a behavior analysis system, computer equipment and a storage medium based on deep learning. The method comprises the following steps: inputting a video data frame acquired by a camera in real time into a convolutional neural network to obtain a person positioning result and a face recognition result; determining a person positioning area in the video data frame according to the person positioning result, and performing area image interception on the person positioning area to obtain an intercepted target image area; inputting the target image area into the recognition model, and performing behavior recognition on the person in the target image area to obtain a behavior recognition result; and acquiring a time point in real time, and generating a behavior log according to the time point, the behavior recognition result and the face recognition result. The positions and identities of the persons in the video data frames are identified through the convolutional neural network, and the behaviors of the persons are recognized through the recognition model, so that a behavior log is generated; the behaviors of the elderly are recognized without manual nursing, reducing the cost of elderly care.

Description

Behavior analysis method and system based on deep learning
Technical Field
The invention relates to the technical field of computers, in particular to a behavior analysis method and system based on deep learning, computer equipment and a storage medium.
Background
With the development of society, population aging will continue to increase, and the elderly account for a growing proportion of China's population. As society ages, more and more elderly people live and act without supervision, and accidents are likely to occur, so these elderly people need many life services such as nursing, entertainment, medical treatment and education. Correspondingly, a great number of young people will enter this industry in the future. For example, to relieve the conflict between work and family care, many people currently choose to hire caregivers to look after elderly family members who need care, so that the family members can both care for the elderly and work with peace of mind.
However, there is currently no unified system for nursing workers, and the nursing workers on the market vary widely in skill. Inevitably, some caregivers have a low skill level, and their incorrect nursing methods can easily injure the elderly. Therefore, the traditional nursing methods for the elderly suffer from high cost and low efficiency.
Disclosure of Invention
Based on the above, in order to solve the above technical problems, a behavior analysis method, system, computer device and storage medium based on deep learning are provided, which can reduce the nursing cost of the elderly.
A method of deep learning based behavior analysis, the method comprising:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning area in the video data frame according to the person positioning result, and performing area image interception on the person positioning area to obtain an intercepted target image area;
inputting the target image area into a recognition model, and performing behavior recognition on the persons in the target image area to obtain a behavior recognition result;
and acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result and the face recognition result.
In one embodiment, the inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result includes:
inputting the video data frame into a convolutional neural network, and determining the person positioning coordinates in the video data frame through the convolutional neural network;
calculating the person positioning result according to the person positioning coordinates;
determining face positioning coordinates in the video data frame through the convolutional neural network, determining a face area according to the face positioning coordinates, and obtaining the face recognition result.
In one embodiment, the method further comprises:
respectively calculating a first area of the person positioning area and a second area of the video data frame;
when the ratio of the first area to the second area is greater than the area ratio threshold, taking the video data frame as the target image area;
the performing area image interception on the person positioning area to obtain an intercepted target image area includes: when the ratio of the first area to the second area is not greater than the area ratio threshold, performing area image interception on the person positioning area to obtain the intercepted target image area.
In one embodiment, when the ratio of the first area to the second area is not greater than the area ratio threshold, performing area image interception on the person positioning area to obtain an intercepted target image area includes:
and when the ratio of the first area to the second area is not greater than the area ratio threshold, taking the person positioning area as the center, expanding the target area outwards, and then performing area image interception to obtain an intercepted target image area.
In one embodiment, inputting the target image area into a recognition model comprises:
carrying out size normalization processing on the target image area to obtain a processed target image area;
and inputting the processed target image area into the recognition model.
In one embodiment, the method further comprises:
acquiring video segment data with labels, and inputting the video segment data into an initial convolutional neural network;
obtaining identification data through the initial convolutional neural network;
and adjusting parameters in the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
In one embodiment, the recognition model is a TSM (Temporal Shift Module) model.
A deep learning based behavior analysis system, the system comprising:
the person recognition module is used for acquiring a video data frame in real time through a camera and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
the region intercepting module is used for determining a person positioning region in the video data frame according to the person positioning result, and performing region image interception on the person positioning region to obtain an intercepted target image region;
the behavior recognition module is used for inputting the target image region into a recognition model and performing behavior recognition on the persons in the target image region to obtain a behavior recognition result;
and the log generation module is used for acquiring a time point in real time through a system clock and generating a behavior log according to the time point, the behavior recognition result and the face recognition result.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning area in the video data frame according to the person positioning result, and performing area image interception on the person positioning area to obtain an intercepted target image area;
inputting the target image area into a recognition model, and performing behavior recognition on the persons in the target image area to obtain a behavior recognition result;
and acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result and the face recognition result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning area in the video data frame according to the person positioning result, and performing area image interception on the person positioning area to obtain an intercepted target image area;
inputting the target image area into a recognition model, and performing behavior recognition on the persons in the target image area to obtain a behavior recognition result;
and acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result and the face recognition result.
According to the behavior analysis method, system, computer device and storage medium based on deep learning, the video data frames are collected in real time through the camera and input into the convolutional neural network to obtain a person positioning result and a face recognition result; a person positioning area is determined in the video data frame according to the person positioning result, and area image interception is performed on the person positioning area to obtain an intercepted target image area; the target image area is input into a recognition model, and behavior recognition is performed on the persons in the target image area to obtain a behavior recognition result; and a time point is acquired in real time through a system clock, and a behavior log is generated according to the time point, the behavior recognition result and the face recognition result. Based on the deep learning algorithm of the convolutional neural network, the position and the face information of a person in a video data frame can be identified, and the person's behavior is recognized in combination with a recognition model, so that a behavior log is generated and the accuracy of behavior recognition in long-distance shots is ensured; the behaviors of the elderly are recognized without manual nursing, which can reduce the cost of elderly care.
Drawings
FIG. 1 is a diagram of an application environment of a deep learning based behavior analysis method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a deep learning-based behavior analysis method according to an embodiment;
FIG. 3 is a schematic diagram of person location coordinates and face location coordinates in one embodiment;
FIG. 4 is a feature distribution diagram without TSM in one embodiment;
FIG. 5 is a feature distribution diagram with TSM in one embodiment;
FIG. 6 is a process diagram of a deep learning based behavior analysis method in one embodiment;
FIG. 7 is a block diagram of a deep learning based behavior analysis system in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The behavior analysis method based on deep learning provided by the embodiment of the application can be applied to the application environment shown in fig. 1. As shown in FIG. 1, the application environment includes a computer device 110. The computer device 110 may acquire a video data frame in real time through a camera, and input the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result; the computer device 110 may determine a person positioning area in the video data frame according to the person positioning result, and perform area image capturing on the person positioning area to obtain a captured target image area; the computer device 110 may input the target image area into the recognition model, and perform behavior recognition on the person in the target image area to obtain a behavior recognition result; the computer device 110 may acquire a time point in real time through a system clock and generate a behavior log according to the time point, a behavior recognition result, and a face recognition result. The computer device 110 may be, but is not limited to, various personal computers, laptops, smartphones, robots, unmanned aerial vehicles, tablets, portable wearable devices, and the like.
In one embodiment, as shown in fig. 2, there is provided a deep learning-based behavior analysis method, including the steps of:
step 202, acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result.
The computer equipment can acquire images in real time through the camera, wherein the camera can be installed on the computer equipment or can be a separate camera connected with the computer equipment.
The convolutional neural network may be pre-trained for locating and recognizing a person in a video data frame. The computer equipment can input the video data frame into the pre-trained convolutional neural network, so that a person positioning result and a face recognition result of the person in the video data frame are obtained.
And 204, determining a person positioning area in the video data frame according to the person positioning result, and performing area image interception on the person positioning area to obtain an intercepted target image area.
After obtaining the person positioning result, the computer device may determine the person positioning region in the video data frame according to the person positioning result. The person positioning area may be an area frame where a person is located in the video data frame. The computer device can perform region image interception according to the region frame, so as to obtain an intercepted target image region.
And step 206, inputting the target image area into the recognition model, and performing behavior recognition on the person in the target image area to obtain a behavior recognition result.
The recognition model may be used to recognize the person's behavior in the target image area. After obtaining the target image area, the computer device may input the target image area into the recognition model, so that the recognition model recognizes the behavior of the person in the target image, thereby obtaining a behavior recognition result.
And step 208, acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result and the face recognition result.
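As a minimal sketch of step 208, one log record can combine the system-clock time point with the behavior recognition result and the face recognition result. The dictionary format and field names below are illustrative assumptions; the patent does not specify a log structure.

```python
from datetime import datetime

def make_log_entry(behavior, identity, when=None):
    """Compose one behavior-log record from the system-clock time point,
    the behavior recognition result and the face recognition result.
    Field names are illustrative; the patent does not fix a log format."""
    when = when or datetime.now()  # "acquire a time point in real time"
    return {
        "time": when.isoformat(timespec="seconds"),
        "person": identity,    # from the face recognition result
        "behavior": behavior,  # from the recognition model
    }

# Usage with a fixed time point so the output is deterministic:
entry = make_log_entry("sitting", "resident_01", datetime(2021, 4, 9, 8, 30, 0))
```

In a deployed system, one such record would be appended per recognition event, giving caregivers a searchable timeline of who did what and when.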
In this embodiment, the computer device acquires a video data frame in real time through the camera, and inputs the video data frame into the convolutional neural network to obtain a person positioning result and a face recognition result; determines a person positioning area in the video data frame according to the person positioning result, and performs area image interception on the person positioning area to obtain an intercepted target image area; inputs the target image area into the recognition model, and performs behavior recognition on the person in the target image area to obtain a behavior recognition result; and acquires a time point in real time through a system clock, and generates a behavior log according to the time point, the behavior recognition result and the face recognition result. Based on the deep learning algorithm of the convolutional neural network, the position and the face information of a person in a video data frame can be identified, and the person's behavior is recognized in combination with a recognition model, so that a behavior log is generated and the accuracy of behavior recognition in long-distance shots is ensured; the behaviors of the elderly are recognized without manual nursing, which can reduce the cost of elderly care.
In an embodiment, the behavior analysis method based on deep learning provided by the present invention may further include a process of obtaining a person positioning result and a face recognition result, where the specific process includes: inputting the video data frame into a convolutional neural network, and determining person positioning coordinates in the video data frame through the convolutional neural network; calculating a person positioning result according to the person positioning coordinates; determining face positioning coordinates in the video data frame through the convolutional neural network, and determining a face area according to the face positioning coordinates to obtain a face recognition result.
In this embodiment, after the computer device inputs the video data frame into the convolutional neural network, the person positioning coordinates can be determined in the video data frame through the convolutional neural network. As shown in FIG. 3, after the video data frame is input into the convolutional neural network, the calculated person positioning coordinates, i.e., A(Px1, Py1) and B(Px2, Py2), can be obtained. The computer device may obtain the person positioning result from the person positioning coordinates.
Similarly, as shown in FIG. 3, the computer device may determine the face positioning coordinates, i.e., C(fx1, fy1) and D(fx2, fy2), in the video data frame through the convolutional neural network. The computer device can determine the face area according to the face positioning coordinates, so as to obtain the face recognition result.
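To make the coordinate notation concrete, each point pair can be treated as opposite corners of an axis-aligned box. The helper function and the numeric coordinates below are illustrative, not taken from the patent.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

# Person box A(Px1, Py1)-B(Px2, Py2) and face box C(fx1, fy1)-D(fx2, fy2);
# the coordinate values are made-up examples.
person_box = (100, 50, 300, 450)
face_box = (180, 60, 260, 140)
```

These areas are exactly what the area-ratio comparison in the following embodiment operates on.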
In an embodiment, the behavior analysis method based on deep learning provided by the present invention may further include a process of performing region image interception, where the specific process includes: respectively calculating a first area of the person positioning region and a second area of the video data frame; when the ratio of the first area to the second area is greater than the area ratio threshold, taking the video data frame as the target image region; and when the ratio of the first area to the second area is not greater than the area ratio threshold, performing region image interception on the person positioning region to obtain an intercepted target image region.
The area ratio threshold may be a specific value preset, for example, the area ratio threshold may be 35%.
The computer device may calculate a first area of the person positioning region, i.e., the person positioning region box, and calculate a second area of the video data frame. Specifically, the computer device may further calculate the ratio of the first area to the second area and compare it with the area ratio threshold, thereby obtaining a comparison result. Taking an area ratio threshold of 35% as an example, when the ratio of the first area to the second area is greater than 35%, the information in the video data frame can be considered valid information; at this time, no region image interception is performed, and the video data frame is taken as the target image region. When the ratio of the first area to the second area is not greater than 35%, the computer device may perform region image interception on the person positioning region to obtain an intercepted target image region.
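The 35% comparison above can be sketched in a few lines. The function name, frame dimensions and the threshold constant are illustrative choices, not specified by the patent.

```python
AREA_RATIO_THRESHOLD = 0.35  # the 35% example from the description

def should_crop(person_box, frame_w, frame_h, threshold=AREA_RATIO_THRESHOLD):
    """Return True when the person positioning region is small relative to
    the frame and therefore needs region image interception before
    behavior recognition; False means the whole frame is used as-is."""
    x1, y1, x2, y2 = person_box
    first_area = (x2 - x1) * (y2 - y1)  # area of the person positioning region
    second_area = frame_w * frame_h     # area of the full video data frame
    return first_area / second_area <= threshold

# A small, distant person triggers a crop; a close-up filling the frame does not.
far_person = should_crop((0, 0, 100, 100), 640, 480)
near_person = should_crop((0, 0, 600, 400), 640, 480)
```

The threshold trades off context (larger crops) against resolution on the person (tighter crops); 35% is simply the worked example from the description.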
In one embodiment, the behavior analysis method based on deep learning provided by the present invention may further include a process of intercepting the region image, where the specific process includes: when the ratio of the first area to the second area is not greater than the area ratio threshold, taking the person positioning area as the center, expanding the target area outwards, and then performing region image interception to obtain an intercepted target image area.
In the present embodiment, taking an area ratio threshold of 35% as an example, when the ratio of the first area to the second area is not greater than the area ratio threshold, the computer device may take the person positioning region defined by A(Px1, Py1) and B(Px2, Py2) as the center, expand the target area outwards, and then intercept the region image to obtain the intercepted target image region.
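The outward expansion about the person positioning region can be sketched as follows. The patent does not fix how far the target area is expanded, so the `scale` factor here is an illustrative assumption, and the crop is clamped to the frame boundary.

```python
def expand_and_crop(frame_w, frame_h, person_box, scale=1.5):
    """Expand the person box outwards about its center by `scale`
    (an assumed factor), clamp to the frame, and return the crop
    rectangle (x1, y1, x2, y2) for region image interception."""
    x1, y1, x2, y2 = person_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2          # center of the person region
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    nx1 = max(0, int(cx - half_w))                  # clamp to the frame edges
    ny1 = max(0, int(cy - half_h))
    nx2 = min(frame_w, int(cx + half_w))
    ny2 = min(frame_h, int(cy + half_h))
    return nx1, ny1, nx2, ny2

crop = expand_and_crop(640, 480, (100, 100, 200, 200))
```

Expanding before cropping keeps some surrounding context in the target image region, which helps the downstream behavior recognition.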
In an embodiment, the behavior analysis method based on deep learning provided by the present invention may further include a process of performing normalization processing on the image region, where the specific process includes: carrying out size normalization processing on the target image area to obtain a processed target image area; and inputting the processed target image area into the recognition model.
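A minimal sketch of the size normalization step, using nearest-neighbour resampling to a fixed input size. The 224x224 target is a common CNN input size assumed here for illustration; the patent does not specify one.

```python
import numpy as np

def normalize_size(region, out_h=224, out_w=224):
    """Nearest-neighbour resize of an HxWxC image region to a fixed size,
    so every intercepted target image region matches the recognition
    model's input shape. out_h/out_w are assumed values."""
    h, w = region.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row index per output row
    cols = np.arange(out_w) * w // out_w  # source column index per output column
    return region[rows][:, cols]

patch = np.zeros((120, 90, 3), dtype=np.uint8)  # stand-in for a cropped region
resized = normalize_size(patch)
```

A production system would more likely use a library resize with interpolation (e.g. bilinear); the point here is only that all crops are brought to one shape before entering the recognition model.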
In one embodiment, the deep learning-based behavior analysis method provided by the present invention may further include a process of training a convolutional neural network, where the process includes: acquiring video segment data with labels, and inputting the video segment data into an initial convolutional neural network; obtaining identification data through an initial convolutional neural network; and adjusting parameters in the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
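The training process described above (labelled clips in, identification data out, parameters adjusted) can be sketched with a single softmax layer standing in for the initial convolutional neural network. Everything below is an illustrative toy with made-up shapes, not the patent's actual network or data.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 2)) * 0.01     # the "initial" network's parameters

def forward(features, weights):
    """Softmax classifier standing in for the convolutional neural network."""
    logits = features @ weights
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

features = rng.normal(size=(32, 8))          # stand-in for video clip features
labels = (features[:, 0] > 0).astype(int)    # stand-in for the clip tags

for _ in range(200):                         # adjust parameters from the labels
    probs = forward(features, weights)       # "identification data"
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0   # cross-entropy gradient
    weights -= 0.1 * features.T @ grad / len(labels)

accuracy = (forward(features, weights).argmax(axis=1) == labels).mean()
```

The real pipeline would iterate this predict-compare-adjust loop with a deep network and many labelled video segments, but the parameter-update structure is the same.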
In one embodiment, the recognition model may be a TSM model. Specifically, the behavior recognition model adopted is TSM (Temporal Shift Module), which balances the accuracy and the computational efficiency of behavior recognition. The core idea of TSM is to shift part of the channels along the time dimension to facilitate information exchange between adjacent frames, making up for the inability of a conventional 2D CNN to capture temporal features. As shown in fig. 4 and fig. 5, fig. 4 is a feature distribution diagram without TSM, and fig. 5 is a feature distribution diagram with TSM. Comparing fig. 4 and fig. 5, it can be seen that before TSM is applied, the feature data of each time point are independent, while after TSM, time point n contains data of the three time points n-1, n and n+1, so information between adjacent frames is exchanged. TSM is an optimization with no extra computational cost and can be used in most conventional 2D CNNs. The scheme of the invention performs behavior recognition by combining ResNet (residual network, a common convolutional neural network) with TSM.
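The channel-shift idea can be illustrated on a (T, C) feature array: one fraction of the channels moves one step forward in time, another fraction one step backward, and the rest stay in place, with zero padding at the sequence ends. A 1/8 fraction is the commonly used setting for TSM; this is a sketch of the mechanism, not the patent's implementation.

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """TSM-style shift on features of shape (T, C): move C//shift_div
    channels one step forward in time, the next C//shift_div one step
    backward, and leave the remaining channels unchanged, so that
    frame n mixes information from frames n-1 and n+1."""
    t, c = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # forward: frame n gets n-1
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # backward: frame n gets n+1
    out[:, 2 * fold:] = x[:, 2 * fold:]              # rest: unchanged
    return out

clip = np.arange(4 * 8, dtype=float).reshape(4, 8)   # 4 frames, 8 channels
shifted = temporal_shift(clip)
```

Because the shift is pure memory movement, inserting it before the 2D convolutions of a ResNet block adds temporal modelling at essentially zero extra compute, which is the property the description relies on.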
In one embodiment, as shown in fig. 6, a behavior analysis method based on deep learning is provided, and a specific process includes:
1. the computer device acquires video data frames in real time through a camera and inputs the video data frames into a convolutional neural network to obtain a person positioning result and a face recognition result;
2. the computer device respectively calculates a first area of the person positioning region and a second area of the video data frame; when the ratio of the first area to the second area is greater than the area ratio threshold, the video data frame is taken as the target image region; when the ratio is not greater than the area ratio threshold, region image interception is performed on the person positioning region to obtain an intercepted target image region;
3. the computer device inputs the target image region into the recognition model and performs behavior recognition on the persons in the target image region to obtain a behavior recognition result;
4. the computer device acquires the time point in real time through a system clock and generates a behavior log according to the time point, the behavior recognition result and the face recognition result.
It should be understood that, although the steps in the above flowchart are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps in the above flowchart may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a deep learning based behavior analysis system, including: a person identification module 710, a region interception module 720, a behavior identification module 730, and a log generation module 740, wherein:
the person identification module 710 is used for acquiring a video data frame in real time through a camera and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face identification result;
the region intercepting module 720 is configured to determine a person positioning region in the video data frame according to the person positioning result, and perform region image interception on the person positioning region to obtain an intercepted target image region;
the behavior recognition module 730 is configured to input the target image area into the recognition model, perform behavior recognition on the person in the target image area, and obtain a behavior recognition result;
the log generating module 740 is configured to obtain a time point in real time through a system clock, and generate a behavior log according to the time point, a behavior recognition result, and a face recognition result.
In one embodiment, the person identification module 710 is further configured to input the video data frame into a convolutional neural network, and determine the person positioning coordinates in the video data frame through the convolutional neural network; calculate a person positioning result according to the person positioning coordinates; and determine face positioning coordinates in the video data frame through the convolutional neural network, and determine a face area according to the face positioning coordinates to obtain a face recognition result.
In one embodiment, the region intercepting module 720 is further configured to respectively calculate a first area of the person positioning region and a second area of the video data frame; when the ratio of the first area to the second area is greater than the area ratio threshold, take the video data frame as the target image region; and when the ratio of the first area to the second area is not greater than the area ratio threshold, perform region image interception on the person positioning region to obtain an intercepted target image region.
In one embodiment, the region intercepting module 720 is further configured to, when the ratio of the first area to the second area is not greater than the area ratio threshold, take the person positioning region as the center, expand the target area outwards, and then perform region image interception to obtain an intercepted target image region.
In one embodiment, the behavior recognition module 730 is further configured to perform size normalization processing on the target image area to obtain a processed target image area; and input the processed target image area into the recognition model.
In one embodiment, the person identification module 710 is further configured to obtain tagged video clip data and input the video clip data into an initial convolutional neural network; obtain identification data through the initial convolutional neural network; and adjust parameters in the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
In one embodiment, the recognition model is a TSM (Temporal Shift Module) model.
In one embodiment, a computer device is provided, which may be a terminal, whose internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements the deep learning based behavior analysis method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning region in the video data frame according to the person positioning result, and cropping the region image from the person positioning region to obtain a cropped target image region;
inputting the target image region into the recognition model, and performing behavior recognition on the person in the target image region to obtain a behavior recognition result; and
acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result, and the face recognition result.
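The final step above combines the system-clock time point, the behavior recognition result, and the face recognition result into a behavior log entry. A minimal sketch follows; the record layout (a dictionary with `time`, `person`, and `behavior` fields) is an assumption, since the patent only names the three inputs.

```python
import datetime

def log_entry(behavior, identity, when=None):
    """Build one behavior-log record from the three inputs named in
    the method: time point, behavior result, face (identity) result."""
    when = when or datetime.datetime.now()   # "system clock" time point
    return {"time": when.isoformat(timespec="seconds"),
            "person": identity,
            "behavior": behavior}

entry = log_entry("falling", "resident_01",
                  datetime.datetime(2021, 7, 23, 9, 30, 0))
print(entry)  # -> {'time': '2021-07-23T09:30:00', 'person': 'resident_01', 'behavior': 'falling'}
```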
In one embodiment, the processor, when executing the computer program, further performs the following steps: inputting the video data frame into the convolutional neural network, and determining the person positioning coordinates in the video data frame through the convolutional neural network; calculating the person positioning result from the person positioning coordinates; and determining the face positioning coordinates in the video data frame through the convolutional neural network, determining the face region from the face positioning coordinates, and obtaining the face recognition result.
In one embodiment, the processor, when executing the computer program, further performs the following steps: calculating a first area of the person positioning region and a second area of the video data frame; when the ratio of the first area to the second area is greater than the area ratio threshold, taking the whole video data frame as the target image region; and when the ratio is not greater than the area ratio threshold, cropping the region image from the person positioning region to obtain the cropped target image region.
In one embodiment, the processor, when executing the computer program, further performs the following step: when the ratio of the first area to the second area is not greater than the area ratio threshold, expanding the target region outwards around the person positioning region and then cropping the region image to obtain the cropped target image region.
In one embodiment, the processor, when executing the computer program, further performs the following steps: performing size normalization on the target image region to obtain a processed target image region; and inputting the processed target image region into the recognition model.
In one embodiment, the processor, when executing the computer program, further performs the following steps: acquiring labeled video clip data, and inputting the video clip data into an initial convolutional neural network; obtaining identification data through the initial convolutional neural network; and adjusting the parameters of the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
In one embodiment, the recognition model is a TSM model.
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the following steps:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning region in the video data frame according to the person positioning result, and cropping the region image from the person positioning region to obtain a cropped target image region;
inputting the target image region into the recognition model, and performing behavior recognition on the person in the target image region to obtain a behavior recognition result; and
acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result, and the face recognition result.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: inputting the video data frame into the convolutional neural network, and determining the person positioning coordinates in the video data frame through the convolutional neural network; calculating the person positioning result from the person positioning coordinates; and determining the face positioning coordinates in the video data frame through the convolutional neural network, determining the face region from the face positioning coordinates, and obtaining the face recognition result.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: calculating a first area of the person positioning region and a second area of the video data frame; when the ratio of the first area to the second area is greater than the area ratio threshold, taking the whole video data frame as the target image region; and when the ratio is not greater than the area ratio threshold, cropping the region image from the person positioning region to obtain the cropped target image region.
In one embodiment, the computer program, when executed by the processor, further performs the following step: when the ratio of the first area to the second area is not greater than the area ratio threshold, expanding the target region outwards around the person positioning region and then cropping the region image to obtain the cropped target image region.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: performing size normalization on the target image region to obtain a processed target image region; and inputting the processed target image region into the recognition model.
In one embodiment, the computer program, when executed by the processor, further performs the following steps: acquiring labeled video clip data, and inputting the video clip data into an initial convolutional neural network; obtaining identification data through the initial convolutional neural network; and adjusting the parameters of the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
In one embodiment, the recognition model is a TSM model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A deep learning-based behavior analysis method, the method comprising:
acquiring a video data frame in real time through a camera, and inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
determining a person positioning region in the video data frame according to the person positioning result, and cropping the region image from the person positioning region to obtain a cropped target image region;
inputting the target image region into a recognition model, and performing behavior recognition on the person in the target image region to obtain a behavior recognition result; and
acquiring a time point in real time through a system clock, and generating a behavior log according to the time point, the behavior recognition result, and the face recognition result.
2. The deep learning-based behavior analysis method according to claim 1, wherein inputting the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result comprises:
inputting the video data frame into the convolutional neural network, and determining the person positioning coordinates in the video data frame through the convolutional neural network;
calculating the person positioning result from the person positioning coordinates; and
determining the face positioning coordinates in the video data frame through the convolutional neural network, determining the face region from the face positioning coordinates, and obtaining the face recognition result.
3. The deep learning-based behavior analysis method according to claim 1, further comprising:
calculating a first area of the person positioning region and a second area of the video data frame;
when the ratio of the first area to the second area is greater than an area ratio threshold, taking the whole video data frame as the target image region;
wherein cropping the region image from the person positioning region to obtain a cropped target image region comprises: when the ratio of the first area to the second area is not greater than the area ratio threshold, cropping the region image from the person positioning region to obtain the cropped target image region.
4. The deep learning-based behavior analysis method according to claim 3, wherein, when the ratio of the first area to the second area is not greater than the area ratio threshold, cropping the region image from the person positioning region to obtain a cropped target image region comprises:
expanding the target region outwards around the person positioning region and then cropping the region image to obtain the cropped target image region.
5. The deep learning-based behavior analysis method according to claim 4, wherein inputting the target image region into a recognition model comprises:
performing size normalization on the target image region to obtain a processed target image region; and
inputting the processed target image region into the recognition model.
6. The deep learning-based behavior analysis method according to claim 1, further comprising:
acquiring labeled video clip data, and inputting the video clip data into an initial convolutional neural network;
obtaining identification data through the initial convolutional neural network; and
adjusting the parameters of the initial convolutional neural network according to the identification data to obtain the convolutional neural network.
7. The deep learning-based behavior analysis method according to claim 1, wherein the recognition model is a TSM model.
8. A deep learning based behavior analysis system, the system comprising:
the person identification module, configured to acquire a video data frame in real time through a camera and input the video data frame into a convolutional neural network to obtain a person positioning result and a face recognition result;
the region cropping module, configured to determine a person positioning region in the video data frame according to the person positioning result, and to crop the region image from the person positioning region to obtain a cropped target image region;
the behavior recognition module, configured to input the target image region into a recognition model and perform behavior recognition on the person in the target image region to obtain a behavior recognition result; and
the log generation module, configured to acquire a time point in real time through a system clock and generate a behavior log according to the time point, the behavior recognition result, and the face recognition result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110383994.XA 2021-04-09 2021-04-09 Behavior analysis method and system based on deep learning Pending CN113158858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110383994.XA CN113158858A (en) 2021-04-09 2021-04-09 Behavior analysis method and system based on deep learning


Publications (1)

Publication Number Publication Date
CN113158858A true CN113158858A (en) 2021-07-23

Family

ID=76889760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110383994.XA Pending CN113158858A (en) 2021-04-09 2021-04-09 Behavior analysis method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN113158858A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762184A (en) * 2021-09-13 2021-12-07 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9143741B1 (en) * 2012-08-17 2015-09-22 Kuna Systems Corporation Internet protocol security camera connected light bulb/system
CN106027978A (en) * 2016-06-21 2016-10-12 南京工业大学 Smart home old age support video monitoring abnormal behavior system and method
CN109977870A (en) * 2019-03-27 2019-07-05 李岳 A kind of monitoring identifying system
CN110414360A (en) * 2019-07-02 2019-11-05 桂林电子科技大学 A kind of detection method and detection device of abnormal behaviour
CN111428666A (en) * 2020-03-31 2020-07-17 齐鲁工业大学 Intelligent family accompanying robot system and method based on rapid face detection
CN111932828A (en) * 2019-11-05 2020-11-13 上海中侨健康智能科技有限公司 Intelligent old-age care monitoring and early warning system based on digital twin technology
CN112200081A (en) * 2020-10-10 2021-01-08 平安国际智慧城市科技股份有限公司 Abnormal behavior identification method and device, electronic equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723