CN113723169A - Behavior identification method, system and equipment based on SlowFast - Google Patents
Behavior identification method, system and equipment based on SlowFast
- Publication number
- CN113723169A (application CN202110455595.XA)
- Authority
- CN
- China
- Prior art keywords
- slowfast
- video data
- neural network
- training
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of behavior recognition and specifically relates to a behavior recognition method, system and equipment based on SlowFast, aiming to solve the problems of low recognition efficiency and low recognition accuracy. The method comprises the following steps: preprocessing original video data of the target behaviors to obtain preprocessed video data; dividing the preprocessed video data into a training data set and a verification data set; inputting the training data set into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; calculating the recognition accuracy of the second SlowFast neural network recognition model on the verification data set; adjusting parameters of the second SlowFast neural network recognition model according to the recognition accuracy and performing iterative training to obtain a third SlowFast neural network recognition model; and recognizing target behaviors in a real environment with the third SlowFast neural network recognition model. The invention greatly improves recognition efficiency, saves labor and time, and improves recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, a behavior recognition system and behavior recognition equipment based on SlowFast.
Background
In many medical experiments, humans cannot be used directly as experimental subjects for safety and ethical reasons. Instead, captive-bred animals can serve as substitutes, and experimental results are obtained by observing and recording the animals' behavioral and physiological changes. Because humans and monkeys are closely related, both being primates, observing changes in monkey behavior has direct biological and medical significance.
At present, monkey behavior is generally observed through on-site manual observation and video monitoring, but these existing approaches have the following problems:
1. On-site manual observation is time-consuming and labor-intensive, and the monkeys' behavior is easily disturbed by the presence of operators, which affects experimental results and lowers detection accuracy.
2. With video surveillance, behavior recording still requires extensive manual involvement and is therefore not an optimal solution.
Disclosure of Invention
In order to solve the problems of low efficiency and low detection accuracy in the prior art, the invention provides a behavior identification method, system and equipment based on SlowFast.
in a first aspect of the present invention, a behavior identification method based on SlowFast is provided, where the method includes:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
Optionally, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the starting and stopping moments to obtain video fragment data;
and labeling the video name label of each video clip according to the behavior category label.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
and performing one or both of random cropping and flipping operations on the plurality of video segments to obtain the expanded preprocessed video data.
Optionally, the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model includes:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling a first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of a target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
Optionally, the sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number includes:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data sample is greater than the preset frame number, randomly intercepting the video data sample with the preset frame number to determine the video data sample as a first video data sample.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior class verification label;
comparing the behavior class verification label with a video name label;
and calculating the ratio of the verification identification results of the behavior type verification label and the video name label, and determining the ratio as the identification precision.
In a second aspect, the invention provides a SlowFast-based behavior recognition system, comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
the second training unit is used for adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision and performing iterative training to obtain a third SlowFast neural network recognition model;
and the identification unit is used for identifying the target behaviors in the real environment by utilizing the third SlowFast neural network identification model.
In a third aspect of the present invention, an apparatus is provided, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor to perform a method for SlowFast-based behavior recognition according to any one of the first aspect.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by the computer to implement the method for SlowFast-based behavior recognition according to the first aspect.
The invention has the following beneficial effects: by establishing a neural network recognition model based on the SlowFast algorithm, target behaviors are recognized automatically, greatly improving recognition efficiency. The preprocessed video data is divided into a training data set and a verification data set; the training data set is input into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; the recognition accuracy of the second SlowFast neural network recognition model is calculated on the verification data set; parameters of the second SlowFast neural network recognition model are adjusted according to the recognition accuracy and iterative training is performed to obtain a third SlowFast neural network recognition model; and the third SlowFast neural network recognition model is used to recognize target behaviors in a real environment, greatly improving the detection accuracy of the SlowFast neural network recognition model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a behavior recognition method based on SlowFast according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a behavior recognition method based on SlowFast according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a SlowFast-based behavior recognition system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention provides a behavior recognition method based on SlowFast, which is mainly applied to recognition of monkey behaviors, and comprises the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
In order to more clearly explain the behavior recognition method based on SlowFast of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The behavior recognition method based on SlowFast according to the first embodiment of the invention comprises the following steps S101-S106, and the steps are described in detail as follows:
step S101: and preprocessing the original video data of the target behaviors to obtain preprocessed video data.
In the practice of the present application, the target behaviors are primarily monkey behaviors. Monkeys are simian primates of the class Mammalia; they are omnivores whose diet consists mainly of fruit, supplemented by meat.
In one example, the observation target may be a rhesus monkey or a cynomolgus monkey, and may also be primarily a rhesus monkey, with a small portion being a cynomolgus monkey.
In this step, before preprocessing, the original video data of the target behaviors is first acquired. In one example, front-view and top-view video data of a monkey are obtained. Specifically, two fixtures can be designed and fabricated, a camera placed in each, and the fixtures mounted on the front and the top of the cage housing the monkey to be filmed. All monkey behaviors are filmed continuously and without human interference, and the footage covers monkeys of different sexes and ages as far as possible.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
and carrying out first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment comprises one target behavior.
Specifically, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
in the embodiment of the application, the original video data of the target behaviors are cleaned firstly, the original video data of the target behaviors with higher definition are selected, then the original video data of the target behaviors with higher definition are watched according to the predetermined action categories, and the starting time and the ending time of each action category are recorded; and labeling the behavior category labels.
The action categories are determined in advance, all actions of a target, such as a monkey, are defined in a classified manner by a worker in advance, the actions of the monkey are required to be completely visible, the occurrence frequency is high, the definition can be made clear, the classification is more detailed, and the types of recognition of the SlowFast neural network recognition model are more. In one example, the action categories of monkeys can be divided into 10 categories, 1, lying down, respectively; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. and (4) climbing. Wherein, 1-10 are behavior category labels.
The original video data of the target behaviors is then cut at the start and stop times to obtain the video segment data. In one example, Python code may be written around the FFmpeg command line to batch-cut the data at the recorded start and stop times.
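As a rough illustration of this batch-cutting step, the sketch below builds FFmpeg commands from (source, start, end, label) annotations. The exact flags, file-naming scheme, and helper names are assumptions, not disclosed by the patent.

```python
import subprocess

def build_clip_command(src, start, end, dst):
    """Build an ffmpeg command cutting [start, end] out of src.

    Times are strings like "00:01:23". Flag choice (stream copy, no
    re-encoding) is an assumption for speed; the patent does not
    specify how FFmpeg is invoked.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", start,     # behavior start time
        "-to", end,       # behavior end time
        "-c", "copy",     # stream copy: fast, no re-encoding
        dst,
    ]

def batch_clip(annotations, run=subprocess.run):
    """annotations: iterable of (src, start, end, behavior_label)."""
    for i, (src, start, end, label) in enumerate(annotations):
        # The video name label encodes the behavior category label.
        dst = f"{label}_{i:04d}.mp4"
        run(build_clip_command(src, start, end, dst), check=True)
```

In practice one would iterate `batch_clip` over the annotation table produced during labeling.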
Each video clip is then given a video name label according to its behavior category label. For example, if behavior category label "1" represents lying down, the video name label may be set to "1", so that the behavior category label, and thus the action category, can be recovered from the video name label.
And respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
and performing one or more operations of random cutting and turning on the plurality of video segments to obtain the expanded preprocessed video data.
The second preprocessing expands the data and increases the amount of preprocessed video data; only when the preprocessed video data is sufficient can it be divided into a training data set and a verification data set, so this step provides the data basis for step S102.
Step S102: the pre-processed video data is divided into a training data set and a verification data set.
In this step, the division is performed according to a preset proportion, for example 4:1, with the training data set larger than the verification data set.
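A minimal sketch of this 4:1 split; the shuffling and the fixed seed are my assumptions, since the patent specifies only the ratio.

```python
import random

def split_dataset(clips, train_ratio=0.8, seed=0):
    """Shuffle the preprocessed clips and split them into a training
    set and a verification set at the given ratio (4:1 by default)."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)  # deterministic shuffle for reproducibility
    cut = int(len(clips) * train_ratio)
    return clips[:cut], clips[cut:]
```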
Step S103: and inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model.
Optionally, the step of inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance to perform preliminary training to obtain a second SlowFast neural network recognition model includes the following steps, as shown in fig. 2:
step S201: sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number, and specifically comprising the following steps:
acquiring the initial frame number of each video in the training data set, and determining the sampling interval from a preset correspondence between initial frame number and sampling interval. For example: 30-60 frames, sampling interval 1; 60-90 frames, sampling interval 2; 90-180 frames, sampling interval 3; more than 180 frames, sampling interval 4. The video is then sampled at that interval to obtain an intermediate video data sample. If the frame number of the intermediate sample exceeds the preset frame number, a segment of the preset frame number is randomly cut out as the first video data sample. For example, a 50-frame video with sampling interval 1 yields 50 sampled frames, and 30 consecutive frames among them are randomly cut out as the first video data sample.
Because the initial frame numbers of the videos are inconsistent, the sampling interval is set dynamically according to each video's initial frame number; the longer the video, the larger the sampling interval, which is more conducive to capturing the video's global information.
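The dynamic-interval rule and the random interception of step S201 can be sketched as follows; boundary handling at exactly 60/90/180 frames is an assumption, since the description gives only the ranges.

```python
import random

def sampling_interval(n_frames):
    """Map a clip's initial frame count to a sampling interval,
    following the ranges in the description (boundary handling at
    60/90/180 is assumed)."""
    if n_frames <= 60:
        return 1
    if n_frames <= 90:
        return 2
    if n_frames <= 180:
        return 3
    return 4

def first_sample(frames, preset=30, rng=random):
    """Subsample at the dynamic interval, then randomly cut a
    contiguous window of `preset` frames if still too long."""
    step = sampling_interval(len(frames))
    sampled = frames[::step]
    if len(sampled) > preset:
        start = rng.randrange(len(sampled) - preset + 1)
        sampled = sampled[start:start + preset]
    return sampled
```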
Step S202: performing data enhancement preprocessing on the first video data sample. The processing comprises random cropping and horizontal flipping with 50% probability.
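A hedged sketch of this augmentation, assuming the clip is a NumPy array of shape (T, H, W, C) and a 224x224 crop size; the patent names only the two operations, not the crop size or array layout.

```python
import numpy as np

def augment(clip, crop_hw=(224, 224), rng=None):
    """Random spatial crop plus horizontal flip with 50% probability,
    applied identically to every frame of the clip.

    clip: array of shape (T, H, W, C)."""
    rng = rng or np.random.default_rng()
    t, h, w, c = clip.shape
    ch, cw = crop_hw
    y = rng.integers(0, h - ch + 1)     # random crop origin
    x = rng.integers(0, w - cw + 1)
    out = clip[:, y:y + ch, x:x + cw, :]
    if rng.random() < 0.5:              # horizontal flip at 50% probability
        out = out[:, :, ::-1, :]
    return out
```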
Step S203: sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of the target behavior.
In one example, after the first sampling yields a 30-frame first video data sample, it is sampled again at an interval of 6, and the resulting second video data sample of 5 video frames is input to the Slow branch.
In this application, the Slow branch acquires the spatial information of the video, such as color and the plants and other objects around the monkey. Although the Slow branch receives few input frames, its feature information is complex and fine-grained, so it generates a large amount of computation, accounting for about 80% of the whole network model.
Step S204: sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval.
In one example, after the first sampling yields a 30-frame first video data sample, it is sampled again at an interval of 2, and the resulting third video data sample of 15 video frames is input to the Fast branch.
In this application, the Fast branch acquires the temporal information of the video, such as the monkey's movement from 2 s to 3 s. Although the Fast branch receives many input frames, its feature information is simple and coarse-grained, so its computation is small, accounting for about 20% of the whole network's calculation.
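The two-rate resampling feeding the Slow and Fast branches (strides 6 and 2 in the examples above) reduces to simple strided indexing:

```python
def branch_inputs(frames, slow_stride=6, fast_stride=2):
    """Resample a 30-frame clip at two rates: a sparse sample for the
    Slow pathway (spatial detail) and a dense one for the Fast pathway
    (temporal detail). With the example strides, 30 frames yield
    5 Slow frames and 15 Fast frames."""
    # The first sampling interval must exceed the second, per the claims.
    assert slow_stride > fast_stride
    return frames[::slow_stride], frames[::fast_stride]
```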
Step S205: and fusing the spatial information and the time information.
In this step, the first SlowFast neural network recognition model includes a lateral connection from the Fast branch to the Slow branch to fuse the temporal and spatial information. However, because the two branches receive different numbers of input frames, the feature dimensions they produce also differ. Therefore, at each connection the feature maps of the Fast branch must first be rescaled with a 3D convolution kernel (e.g., 5×1×1) and then summed with the feature map of the Slow branch to fuse the temporal and spatial feature information.
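A NumPy sketch of one such lateral connection, using a temporal kernel of size 5 with stride 3 and padding 2 so that the Fast branch's 15 frames map onto the Slow branch's 5. The stride and padding values are my assumptions; the patent specifies only the 5×1×1 kernel and the summation.

```python
import numpy as np

def fuse(slow_feat, fast_feat, kernel, t_stride=3, pad=2):
    """Time-strided temporal convolution on the Fast feature map,
    then summation with the Slow feature map.

    slow_feat: (C_s, T_s, H, W); fast_feat: (C_f, T_f, H, W);
    kernel: (C_s, C_f, K) -- the 5x1x1 case has K=5 and 1x1 spatial
    extent, so spatial positions are untouched."""
    c_s, t_s, h, w = slow_feat.shape
    c_f, t_f = fast_feat.shape[:2]
    k = kernel.shape[2]
    padded = np.pad(fast_feat, ((0, 0), (pad, pad), (0, 0), (0, 0)))
    t_out = (t_f + 2 * pad - k) // t_stride + 1   # 15 -> 5 with these values
    out = np.zeros((c_s, t_out, h, w))
    for t in range(t_out):
        window = padded[:, t * t_stride:t * t_stride + k]  # (C_f, K, H, W)
        out[:, t] = np.einsum('fkhw,sfk->shw', window, kernel)
    return slow_feat + out                                  # fuse by summation
```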
Step S206: and calculating to obtain a training recognition result according to the fused information.
In this step, the complete video information obtained after fusion is input into the fully connected layer of the first SlowFast neural network recognition model to extract feature values, and the features extracted by the fully connected layer are input into a sigmoid regression layer for calculation to obtain the training recognition result.
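A minimal sketch of this classification head; the global average pooling used to flatten the fused feature map into a vector is an assumption not stated in the patent.

```python
import numpy as np

def classify(fused_features, weights, bias):
    """Fully connected layer followed by a sigmoid over the behavior
    classes, as in step S206.

    fused_features: (C, T, H, W); weights: (num_classes, C)."""
    vec = fused_features.mean(axis=(1, 2, 3))   # global average pool -> (C,)
    logits = weights @ vec + bias               # fully connected layer
    scores = 1.0 / (1.0 + np.exp(-logits))      # sigmoid activation
    return int(np.argmax(scores)) + 1           # behavior labels are 1..10
```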
Step S207: and repeating the training processes S201-S206 according to preset training times to obtain a second SlowFast neural network recognition model.
Step S104: and calculating the identification precision of the second SlowFast neural network identification model according to the verification data set.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior type verification label. The training process refers to steps S201 to S206, which are not described herein again.
Comparing the behavior class verification label with a video name label;
and calculating the ratio of the verification identification results of the behavior type verification label and the video name label, and determining the ratio as the identification precision.
In one example, the verification data set contains 10 groups of video segment data, each group containing 10 video segments, and each segment corresponds to one video name label. The video name labels of the 10 segments are: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here 1-10 serve both as behavior category labels and as video name labels. For example, if a video segment with video name label 1 is processed and the model outputs behavior category label 2, the action category is determined to be squatting according to label 2; since this differs from the input video label, i.e., from the real action, the recognition is wrong. If 5 of the 10 video segments in a group are recognized incorrectly and 5 correctly, that group's recognition accuracy is 50%.
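The accuracy computation itself is a simple ratio of matching labels:

```python
def recognition_accuracy(name_labels, predicted_labels):
    """Fraction of clips whose predicted behavior category label
    matches the video name label (step S104)."""
    assert len(name_labels) == len(predicted_labels)
    correct = sum(p == t for p, t in zip(predicted_labels, name_labels))
    return correct / len(name_labels)
```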
Step S105: and adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model.
In this step, the number of iterative training rounds is preset, and the second SlowFast neural network recognition model is trained for that preset number of rounds, for example 1000. At each round, model parameters such as learning_rate and weight_decay are adjusted according to the output recognition accuracy, and after all rounds are completed, the model parameters with the highest recognition accuracy are configured as the parameters of the third SlowFast neural network recognition model.
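The keep-the-best parameter search can be sketched as below; the grid values and the train_and_eval callback are placeholders, since the patent names only learning_rate and weight_decay and the idea of keeping the best-performing configuration.

```python
def tune(train_and_eval, grid):
    """Iterate over (learning_rate, weight_decay) settings, retrain,
    and keep the setting with the best verification accuracy.

    train_and_eval(lr, wd) -> verification accuracy after training."""
    best_acc, best_cfg = -1.0, None
    for lr, wd in grid:
        acc = train_and_eval(lr, wd)
        if acc > best_acc:
            best_acc, best_cfg = acc, (lr, wd)
    return best_cfg, best_acc
```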
Step S106: and identifying target behaviors in a real environment by using the third SlowFast neural network recognition model. The third SlowFast neural network recognition model is deployed to a server in the real environment to recognize the monkey behaviors.
In another embodiment of the present application, the preprocessed video data may be divided into a training data set, a verification data set, and a test data set, for example according to a ratio of 3:1:1, where the training data set accounts for 60%, the verification data set for 20%, and the test data set for 20%. The test data set is used to test the performance of the third SlowFast neural network recognition model, and the best recognition accuracy achieved by the third SlowFast neural network recognition model on the test data set is determined as its recognition accuracy.
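The 3:1:1 (60%/20%/20%) split described above can be sketched as follows; `clips` stands in for the list of preprocessed video segments, which the patent does not name, and the shuffling seed is an assumption.

```python
import random

def split_dataset(clips, seed=0):
    """Shuffle and split into 60% train, 20% verification, 20% test."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)
    n = len(clips)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return (clips[:n_train],                      # training data set
            clips[n_train:n_train + n_val],       # verification data set
            clips[n_train + n_val:])              # test data set

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```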
In a second aspect, based on the same inventive concept, the invention provides a SlowFast-based behavior recognition system, mainly used for recognition of monkey behaviors. As shown in fig. 3, the system comprises:
the preprocessing unit 301 is configured to preprocess the original video data of the target behavior to obtain preprocessed video data;
a dividing unit 302, configured to divide the preprocessed video data into a training data set and a verification data set;
a first training unit 303, configured to input the training data set into a first SlowFast neural network recognition model that is constructed in advance for preliminary training, so as to obtain a second SlowFast neural network recognition model;
a calculating unit 304, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
a second training unit 305, configured to adjust parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and perform iterative training to obtain a third SlowFast neural network recognition model;
a recognition unit 306, configured to recognize a target behavior in a real environment by using the third SlowFast neural network recognition model.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the SlowFast-based behavior recognition system provided in the foregoing embodiment is illustrated only by the division of the functional modules described above. In practical applications, these functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules of the foregoing embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing them and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor to perform a method for SlowFast-based behavior recognition according to any one of the first aspect.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the method for SlowFast-based behavior recognition according to any one of the first aspect.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Referring now to fig. 4, a block diagram of a computer system of a server that may be used to implement embodiments of the method, system, and apparatus of the present application is shown. The server shown in fig. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (10)
1. A behavior recognition method based on SlowFast, characterized by comprising the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
2. The method of claim 1, wherein preprocessing the raw video data of the target behavior to obtain preprocessed video data comprises:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
3. The method of claim 2, wherein the first pre-processing the raw video data of the target behavior to obtain a plurality of video segment data comprises:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the start-stop moments to obtain a plurality of video segment data;
and labeling each video segment with a video name label according to the behavior category label.
4. The method of claim 2, wherein the second pre-processing the video segment data to expand the data respectively comprises:
and performing one or more operations of random cropping and flipping on the plurality of video segments to obtain the expanded preprocessed video data.
5. The method according to claim 1, wherein the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model comprises:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of the target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
6. The method of claim 5, wherein sampling the training data set according to a predetermined sampling rule to obtain a predetermined number of first video data samples comprises:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the number of frames of the intermediate video data sample is greater than the preset number of frames, randomly intercepting a video data sample with the preset number of frames and determining it as the first video data sample.
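The sampling of claims 5 and 6 can be illustrated with frame indices standing in for decoded frames. This is a hedged sketch only: the target length, the interval ratio, and the Slow/Fast branch strides below are assumed example values, not figures given in the claims (the claims require only that the first sampling interval be greater than the second).

```python
import random

def sample_clip(num_frames, target=32, ratio=8, seed=0):
    """Sample every `ratio`-th frame, then, if more than `target` frames
    remain, randomly intercept a window of `target` frames (claim 6)."""
    idx = list(range(0, num_frames, ratio))        # intermediate video data sample
    if len(idx) > target:                          # random temporal crop
        start = random.Random(seed).randrange(len(idx) - target + 1)
        idx = idx[start:start + target]
    return idx

first_sample = sample_clip(600)   # first video data sample: 32 frame indices
slow = first_sample[::8]          # Slow branch: larger interval, spatial detail
fast = first_sample[::2]          # Fast branch: smaller interval, motion detail
print(len(first_sample), len(slow), len(fast))  # 32 4 16
```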
7. The method according to claim 3, wherein the calculating the recognition accuracy of the second SlowFast neural network recognition model from the verification data set comprises:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior class verification label;
comparing the behavior class verification label with a video name label;
and calculating the proportion of verification recognition results for which the behavior category verification label matches the video name label, and determining this proportion as the recognition accuracy.
8. A SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
the second training unit is used for adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision and performing iterative training to obtain a third SlowFast neural network recognition model;
and the identification unit is used for identifying the target behaviors in the real environment by utilizing the third SlowFast neural network identification model.
9. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110455595.XA CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110455595.XA CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723169A true CN113723169A (en) | 2021-11-30 |
CN113723169B CN113723169B (en) | 2024-04-30 |
Family
ID=78672693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110455595.XA Active CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723169B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359791A (en) * | 2021-12-16 | 2022-04-15 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
CN115376210A (en) * | 2022-10-24 | 2022-11-22 | 杭州巨岩欣成科技有限公司 | Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool |
WO2023108782A1 (en) * | 2021-12-15 | 2023-06-22 | 深圳先进技术研究院 | Method and apparatus for training behavior recognition model, behavior recognition method, apparatus and system, and medium |
CN116363137A (en) * | 2023-06-01 | 2023-06-30 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
CN116110586B (en) * | 2023-04-13 | 2023-11-21 | 南京市红山森林动物园管理处 | Elephant health management system based on YOLOv5 and SlowFast |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647643A (en) * | 2018-05-11 | 2018-10-12 | 浙江工业大学 | A kind of packed tower liquid flooding state on-line identification method based on deep learning |
CN109145789A (en) * | 2018-08-09 | 2019-01-04 | 炜呈智能电力科技(杭州)有限公司 | Power supply system safety work support method and system |
US20190068627A1 (en) * | 2017-08-28 | 2019-02-28 | Oracle International Corporation | Cloud based security monitoring using unsupervised pattern recognition and deep learning |
CN110119703A (en) * | 2019-05-07 | 2019-08-13 | 福州大学 | The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene |
CN110555368A (en) * | 2019-06-28 | 2019-12-10 | 西安理工大学 | Fall-down behavior identification method based on three-dimensional convolutional neural network |
CN110717301A (en) * | 2019-09-19 | 2020-01-21 | 中国石油大学(华东) | Flow unit information classification and identification method based on support vector machine algorithm |
CN110852168A (en) * | 2019-10-11 | 2020-02-28 | 西北大学 | Pedestrian re-recognition model construction method and device based on neural framework search |
CN111291840A (en) * | 2020-05-12 | 2020-06-16 | 成都派沃智通科技有限公司 | Student classroom behavior recognition system, method, medium and terminal device |
CN111598230A (en) * | 2019-02-21 | 2020-08-28 | 北京创新工场旷视国际人工智能技术研究院有限公司 | Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device |
CN111814669A (en) * | 2020-07-08 | 2020-10-23 | 中国工商银行股份有限公司 | Method and device for identifying abnormal behaviors of bank outlets |
CN111814661A (en) * | 2020-07-07 | 2020-10-23 | 西安电子科技大学 | Human behavior identification method based on residual error-recurrent neural network |
CN112183313A (en) * | 2020-09-27 | 2021-01-05 | 武汉大学 | SlowFast-based power operation field action identification method |
US20210073526A1 (en) * | 2019-09-10 | 2021-03-11 | Blue Planet Training, Inc. | System and Method for Visual Analysis of Emotional Coherence in Videos |
CN112529020A (en) * | 2020-12-24 | 2021-03-19 | 携程旅游信息技术(上海)有限公司 | Animal identification method, system, equipment and storage medium based on neural network |
CN112580523A (en) * | 2020-12-22 | 2021-03-30 | 平安国际智慧城市科技股份有限公司 | Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium |
2021
- 2021-04-26 CN CN202110455595.XA patent/CN113723169B/en active Active
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190068627A1 (en) * | 2017-08-28 | 2019-02-28 | Oracle International Corporation | Cloud based security monitoring using unsupervised pattern recognition and deep learning |
CN108647643A (en) * | 2018-05-11 | 2018-10-12 | 浙江工业大学 | A kind of packed tower liquid flooding state on-line identification method based on deep learning |
CN109145789A (en) * | 2018-08-09 | 2019-01-04 | 炜呈智能电力科技(杭州)有限公司 | Power supply system safety work support method and system |
CN111598230A (en) * | 2019-02-21 | 2020-08-28 | 北京创新工场旷视国际人工智能技术研究院有限公司 | Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device |
CN110119703A (en) * | 2019-05-07 | 2019-08-13 | 福州大学 | The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene |
CN110555368A (en) * | 2019-06-28 | 2019-12-10 | 西安理工大学 | Fall-down behavior identification method based on three-dimensional convolutional neural network |
US20210073526A1 (en) * | 2019-09-10 | 2021-03-11 | Blue Planet Training, Inc. | System and Method for Visual Analysis of Emotional Coherence in Videos |
CN110717301A (en) * | 2019-09-19 | 2020-01-21 | 中国石油大学(华东) | Flow unit information classification and identification method based on support vector machine algorithm |
CN110852168A (en) * | 2019-10-11 | 2020-02-28 | 西北大学 | Pedestrian re-recognition model construction method and device based on neural framework search |
CN111291840A (en) * | 2020-05-12 | 2020-06-16 | 成都派沃智通科技有限公司 | Student classroom behavior recognition system, method, medium and terminal device |
CN111814661A (en) * | 2020-07-07 | 2020-10-23 | 西安电子科技大学 | Human behavior identification method based on residual error-recurrent neural network |
CN111814669A (en) * | 2020-07-08 | 2020-10-23 | 中国工商银行股份有限公司 | Method and device for identifying abnormal behaviors of bank outlets |
CN112183313A (en) * | 2020-09-27 | 2021-01-05 | 武汉大学 | SlowFast-based power operation field action identification method |
CN112580523A (en) * | 2020-12-22 | 2021-03-30 | 平安国际智慧城市科技股份有限公司 | Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium |
CN112529020A (en) * | 2020-12-24 | 2021-03-19 | 携程旅游信息技术(上海)有限公司 | Animal identification method, system, equipment and storage medium based on neural network |
Non-Patent Citations (3)
Title |
---|
TJENG WAWAN CENGGORO; AWANG HARSA KRIDALAKSANA; EKA ARRIYANTI; M. IRWAN UKKAS: "Recognition of a human behavior pattern in paper rock scissor game using backpropagation artificial neural network method", 2014 2nd International Conference on Information and Communication Technology (ICoICT) *
YANG Jie; CHEN Lingna; LIN Ying; CHEN Yushao; CHEN Junxi: "Video object detection based on convolutional networks", Journal of University of South China (Natural Science Edition), no. 04 *
XIE Huaiqi; LE Hongbing: "Video human behavior recognition based on channel attention mechanism", Electronic Technology & Software Engineering, no. 04 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023108782A1 (en) * | 2021-12-15 | 2023-06-22 | 深圳先进技术研究院 | Method and apparatus for training behavior recognition model, behavior recognition method, apparatus and system, and medium |
CN114359791A (en) * | 2021-12-16 | 2022-04-15 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
CN114359791B (en) * | 2021-12-16 | 2023-08-01 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
CN115376210A (en) * | 2022-10-24 | 2022-11-22 | 杭州巨岩欣成科技有限公司 | Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool |
CN116110586B (en) * | 2023-04-13 | 2023-11-21 | 南京市红山森林动物园管理处 | Elephant health management system based on YOLOv5 and SlowFast |
CN116363137A (en) * | 2023-06-01 | 2023-06-30 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
CN116363137B (en) * | 2023-06-01 | 2023-08-04 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
Also Published As
Publication number | Publication date |
---|---|
CN113723169B (en) | 2024-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113723169B (en) | SlowFast-based behavior recognition method, system and equipment | |
US10937144B2 (en) | Pipe feature identification using pipe inspection data analysis | |
CN110705405B (en) | Target labeling method and device | |
CN108171260B (en) | Picture identification method and system | |
CN113382279B (en) | Live broadcast recommendation method, device, equipment, storage medium and computer program product | |
CN111046956A (en) | Occlusion image detection method and device, electronic equipment and storage medium | |
CN113158909B (en) | Behavior recognition light-weight method, system and equipment based on multi-target tracking | |
CN110751675B (en) | Urban pet activity track monitoring method based on image recognition and related equipment | |
CN111346842A (en) | Coal gangue sorting method, device, equipment and storage medium | |
CN109285181B (en) | Method and apparatus for recognizing image | |
CN109685847B (en) | Training method and device for visual target detection model | |
CN108982522B (en) | Method and apparatus for detecting pipe defects | |
CN111598913B (en) | Image segmentation method and system based on robot vision | |
CN111950812B (en) | Method and device for automatically identifying and predicting rainfall | |
Mann et al. | Automatic flower detection and phenology monitoring using time‐lapse cameras and deep learning | |
CN114724140A (en) | Strawberry maturity detection method and device based on YOLO V3 | |
CN109088793B (en) | Method and apparatus for detecting network failure | |
Prior et al. | Estimating precision and accuracy of automated video post-processing: A step towards implementation of ai/ml for optics-based fish sampling | |
US10922569B2 (en) | Method and apparatus for detecting model reliability | |
CN115438945A (en) | Risk identification method, device, equipment and medium based on power equipment inspection | |
CN114494971A (en) | Video yellow-related detection method and device, electronic equipment and storage medium | |
CN114038040A (en) | Machine room inspection monitoring method, device and equipment | |
CN114821396A (en) | Normative detection method, device and storage medium for LNG unloading operation process | |
CN114241376A (en) | Behavior recognition model training and behavior recognition method, device, system and medium | |
CN112308090A (en) | Image classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |