CN113723169B - SlowFast-based behavior recognition method, system and equipment - Google Patents

SlowFast-based behavior recognition method, system and equipment

Info

Publication number
CN113723169B
CN113723169B
Authority
CN
China
Prior art keywords
video data
slowfast
neural network
training
identification model
Prior art date
Legal status
Active
Application number
CN202110455595.XA
Other languages
Chinese (zh)
Other versions
CN113723169A (en)
Inventor
马喜波
徐哲
雷震
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110455595.XA priority Critical patent/CN113723169B/en
Publication of CN113723169A publication Critical patent/CN113723169A/en
Application granted granted Critical
Publication of CN113723169B publication Critical patent/CN113723169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, system and equipment based on SlowFast, aiming at solving the problems of low recognition efficiency and low recognition precision. The method comprises the following steps: preprocessing target behavior original video data to obtain preprocessed video data; dividing the preprocessed video data into a training data set and a verification data set; inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set; adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model; and identifying target behaviors in the real environment by using a third SlowFast neural network identification model. The invention greatly improves the recognition efficiency, saves manpower and time and improves the recognition precision.

Description

SlowFast-based behavior recognition method, system and equipment
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, system and equipment based on SlowFast.
Background
In many medical experiments, humans cannot be used directly as subjects for safety and ethical reasons. In such cases, specially bred animals take the place of humans, and experimental results are obtained by observing and recording the animals' behavior and physiological changes. Since monkeys are primates closely related to humans, observing changes in their behavior is of direct biological and medical interest.
At present, monkey behavior is generally observed through on-site manual observation combined with video monitoring, but the existing monitoring approaches have the following problems:
1. On-site manual observation is time-consuming and labor-intensive, and the operators' presence easily disturbs the monkeys' behavior, which affects the experimental results and lowers detection accuracy.
2. With video monitoring, behavior recording still requires a lot of manual involvement and is therefore not an optimal solution.
Disclosure of Invention
In order to solve the above problems in the prior art, namely low efficiency and low detection precision, the invention provides a behavior recognition method, system and equipment based on SlowFast.
In a first aspect of the present invention, a behavior recognition method based on SlowFast is provided, where the method includes:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
Optionally, the preprocessing the target behavior original video data to obtain preprocessed video data includes:
Performing first preprocessing on the target behavior original video data to obtain a plurality of video clip data, wherein each video clip data comprises a target behavior;
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
Optionally, the first preprocessing the target behavior original video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
cutting the original video data of the target behaviors according to the starting and ending moments to obtain video clip data;
and labeling the video name label of each video clip according to the behavior category label.
Optionally, the performing second preprocessing on the plurality of video clip data to expand the data, and obtaining the preprocessed video data includes:
And performing one or more operations of random cutting and overturning on the video clips to obtain the expanded preprocessed video data.
Optionally, inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training, and obtaining the second SlowFast neural network identification model includes:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
Repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model.
Optionally, the sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number includes:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
Sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data samples is larger than the preset frame number, randomly intercepting the video data samples with the preset frame number and determining the video data samples with the preset frame number as first video data samples.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset includes:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag;
comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
In a second aspect, the present invention proposes a SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the target behavior original video data to obtain preprocessed video data;
A dividing unit for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit is used for inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model;
a calculation unit, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit is used for adjusting parameters of the second SlowFast neural network identification model according to the identification precision and performing iterative training to obtain a third SlowFast neural network identification model;
and the identification unit is used for identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
In a third aspect of the invention, an apparatus is presented comprising:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of the first aspects.
In a fourth aspect of the present invention, a computer readable storage medium is provided, where computer instructions are stored, where the computer instructions are configured to be executed by the computer to implement the method for behavior recognition based on SlowFast in the first aspect.
The invention has the beneficial effects that: the target behavior is automatically identified by a neural network identification model built on the SlowFast algorithm, which greatly improves identification efficiency. The preprocessed video data is divided into a training data set and a verification data set; the training data set is input into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; the recognition accuracy of the second SlowFast neural network recognition model is calculated on the verification data set; the parameters of the second SlowFast neural network identification model are adjusted according to the identification precision and iterative training is performed to obtain a third SlowFast neural network identification model; and the third SlowFast neural network recognition model is used to recognize the target behavior in the real environment, which greatly improves the detection accuracy of the SlowFast neural network recognition model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a behavior recognition method based on SlowFast according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a behavior recognition method based on SlowFast according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a SlowFast-based behavior recognition system according to an example embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
The invention provides a SlowFast-based behavior recognition method which is mainly applied to monkey behavior recognition, and the method comprises the following steps:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
In order to more clearly describe the behavior recognition method based on SlowFast of the present invention, each step in the embodiment of the present invention is described in detail below with reference to fig. 1.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The SlowFast-based behavior recognition method according to the first embodiment of the present invention includes steps S101 to S106, and each step is described in detail as follows:
Step S101: and preprocessing the target behavior original video data to obtain preprocessed video data.
In the implementation of the application, the target behavior mainly refers to monkey behavior. Monkeys are a collective term for certain primate mammals; they are omnivorous and feed mainly on fruit.
In one example, the observation targets may be rhesus monkeys or cynomolgus monkeys, mainly rhesus monkeys with a small proportion of cynomolgus monkeys.
In this step, the target behavior raw video data is acquired before it is preprocessed. In one example, video data of the monkey is captured mainly from the front and from the top. Specifically, two fixing devices may be designed and prepared, a camera placed in each, and the devices mounted respectively on the front and the top of the cage housing the monkey to be photographed. All actions of the monkey are photographed without interruption, covering monkeys of different sexes and ages as far as possible.
Optionally, the preprocessing the target behavior original video data to obtain preprocessed video data includes:
and performing first preprocessing on the target behavior original video data to obtain a plurality of video segment data, wherein each video segment comprises a target behavior.
Specifically, the first preprocessing the target behavior original video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
In the embodiment of the application, the target behavior raw video data is first cleaned and the recordings with higher definition are selected; these are then reviewed against the predetermined action categories, the starting time and ending time of each action are recorded, and behavior category labels are assigned.
The action categories are determined in advance: a worker classifies and defines all actions of the target, such as a monkey. Each defined action should be completely visible in the footage, occur frequently enough, and be clearly definable; the more categories are defined, the more categories the SlowFast neural network identification model can identify. In one example, monkey actions may be divided into 10 categories: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here, 1-10 are the behavior category labels.
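For use in the illustrative sketches below, the 10 categories may be represented as a simple Python mapping; the English names follow the translation above and are not part of the original labels.

```python
# Illustrative mapping of behavior category labels (1-10) to action names.
BEHAVIOR_LABELS = {
    1: "lying down", 2: "squatting", 3: "walking",
    4: "jumping upwards", 5: "jumping downwards",
    6: "climbing upwards", 7: "climbing downwards",
    8: "hanging", 9: "standing", 10: "climbing",
}
```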
The target behavior raw video data is then cut according to the starting and ending times to obtain the video clip data. In one example, Python code may be written around the FFmpeg command line to cut the clips out in bulk according to the recorded start and stop times.
Each video clip is then given a video name label according to its behavior category label. For example, if the action category represented by behavior category label "1" is lying down, the video name label may be set to "1", so that the behavior category label, and thus the action category, can later be determined from the video name label.
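In one possible realization, the batch cutting and naming may be scripted with Python and the FFmpeg command line as sketched below; the annotation tuples, directory names and file-name scheme are illustrative assumptions, not taken from the patent.

```python
import subprocess
from pathlib import Path

# Hypothetical annotations: (source video, start time, end time, behavior label 1-10).
annotations = [
    ("raw/monkey_front_001.mp4", "00:01:15", "00:01:22", 3),  # walking
    ("raw/monkey_top_001.mp4",   "00:04:02", "00:04:09", 8),  # hanging
]

out_dir = Path("clips")
out_dir.mkdir(exist_ok=True)

for idx, (src, start, end, label) in enumerate(annotations):
    # Encode the behavior category label into the clip file name so it can
    # later be recovered as the video name label.
    dst = out_dir / f"{label}_{idx:05d}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ss", start, "-to", end,
         "-c:v", "libx264", "-an", str(dst)],
        check=True,
    )
```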
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
Optionally, the performing second preprocessing on the plurality of video clip data to expand the data, and obtaining the preprocessed video data includes:
One or more operations of random cropping and flipping are performed on the video clips to obtain the expanded preprocessed video data.
The second preprocessing expands the data and increases the amount of preprocessed video data; only when the preprocessed video data is sufficient can it be divided into a training data set and a verification data set, which provides the data basis for step S102.
Step S102: the pre-processed video data is divided into a training data set and a validation data set.
In this step, the preprocessed video data is divided according to a preset ratio, such as 4:1, with the training data set taking a larger proportion than the verification data set.
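A minimal sketch of such a 4:1 split, assuming the preprocessed clips are stored as files in a single directory (an illustrative layout):

```python
import random
from pathlib import Path

clips = sorted(Path("clips").glob("*.mp4"))
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(clips)

# 4:1 split: 80% of the preprocessed clips for training, 20% for verification.
split_point = int(0.8 * len(clips))
train_set, val_set = clips[:split_point], clips[split_point:]
```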
Step S103: and inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model.
Optionally, inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training, and obtaining a second SlowFast neural network identification model includes the following steps, as shown in fig. 2:
step S201: sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number, wherein the method specifically comprises the following steps:
Acquiring the initial frame count of each video in the training data set, and determining a sampling interval according to a preset correspondence between the initial frame count and the sampling interval; for example, videos of 30-60 frames use a sampling interval of 1; 60-90 frames, an interval of 2; 90-180 frames, an interval of 3; and more than 180 frames, an interval of 4. Sampling at this interval yields an intermediate video data sample. If the intermediate video data sample contains more frames than the preset frame number, a segment of the preset frame number is randomly cut out and determined as the first video data sample; for example, a 50-frame video with a sampling interval of 1 yields 50 sampled frames, from which 30 consecutive frames are randomly cut out as the first video data sample.
Because the initial frame counts of the videos differ, the application sets the sampling interval dynamically according to the initial frame count: the longer the video, the larger the sampling interval, which helps capture the global information of the video.
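A minimal sketch of this first sampling rule, assuming each video is available as a list of decoded frames; the handling of clips shorter than the preset frame count is an assumption.

```python
import random

def first_sample(frames, preset_frames=30):
    """Sample a clip to at most `preset_frames` frames using the
    frame-count-dependent interval described above."""
    n = len(frames)
    if n < 60:        # 30-60 frames -> interval 1
        interval = 1
    elif n < 90:      # 60-90 frames -> interval 2
        interval = 2
    elif n < 180:     # 90-180 frames -> interval 3
        interval = 3
    else:             # more than 180 frames -> interval 4
        interval = 4
    intermediate = frames[::interval]
    if len(intermediate) <= preset_frames:
        return intermediate            # assumption: shorter clips pass through
    # Randomly cut out `preset_frames` consecutive sampled frames.
    start = random.randint(0, len(intermediate) - preset_frames)
    return intermediate[start:start + preset_frames]
```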
Step S202: and carrying out data enhancement preprocessing on the first video data sample. The processing method comprises the following steps: random clipping and 50% probability horizontal flipping.
Step S203: and sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of target behaviors.
In one example, after the first sampling obtains a 30-frame first video data sample, it is sampled again with a sampling interval of 6 to obtain a second video data sample of 5 video frames, which is input to the Slow branch.
In the application, the Slow branch acquires the spatial information of the video, such as the colors and plants around the monkey. The Slow branch takes fewer input video frames, but its feature information is complex and fine-grained, so it generates a large amount of computation, accounting for approximately 80% of the computation of the whole network model.
Step S204: sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval.
In one example, after the first sampling obtains a 30-frame first video data sample, it is sampled again with a sampling interval of 2 to obtain a third video data sample of 15 video frames, which is input to the Fast branch.
In the application, the Fast branch acquires the temporal information of the video, such as the monkey's motion between 2 s and 3 s. The Fast branch takes more input video frames, but its feature information is simpler and coarser-grained, so its computation is small, accounting for only about 20% of the whole network.
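The second and third samplings thus amount to striding over the same 30-frame sample at two rates. A sketch, assuming the enhanced sample has been converted to a (C, T, H, W) PyTorch tensor; the function name is illustrative.

```python
import torch

def split_pathways(clip, slow_interval=6, fast_interval=2):
    """Build the Slow- and Fast-branch inputs from one enhanced sample."""
    # Slow branch: sparse temporal sampling (30 / 6 = 5 frames), spatial information.
    slow = clip[:, ::slow_interval, :, :]
    # Fast branch: dense temporal sampling (30 / 2 = 15 frames), temporal information.
    fast = clip[:, ::fast_interval, :, :]
    return slow, fast
```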
Step S205: and fusing the space information and the time information.
In this step, the first SlowFast neural network identification model includes a lateral connection from the Fast branch to the Slow branch to fuse the temporal information and the spatial information. However, because the two branches take different numbers of input video frames, the generated feature dimensions also differ, so during the connection the Fast-branch feature maps are rescaled with a 5 × 1 × 1 3D convolution kernel and then summed with the Slow-branch feature maps to fuse the temporal and spatial feature information.
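A sketch of such a lateral connection: a time-strided 3D convolution with a 5 × 1 × 1 kernel rescales the Fast-branch features before element-wise summation. The channel counts, stride and padding are assumptions chosen so that the 15-frame Fast features align with the 5-frame Slow features.

```python
import torch.nn as nn

class LateralFusion(nn.Module):
    """Fuse Fast-branch features into the Slow branch."""
    def __init__(self, fast_channels, slow_channels, alpha=3):
        super().__init__()
        # 5x1x1 kernel over (time, height, width); the temporal stride `alpha`
        # matches the frame-rate ratio between the branches (15 / 5 = 3 here).
        self.conv = nn.Conv3d(
            fast_channels, slow_channels,
            kernel_size=(5, 1, 1), stride=(alpha, 1, 1),
            padding=(2, 0, 0), bias=False,
        )

    def forward(self, slow_feat, fast_feat):
        # (N, C_fast, 15, H, W) -> (N, C_slow, 5, H, W), then element-wise sum.
        return slow_feat + self.conv(fast_feat)
```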
Step S206: and calculating according to the fused information to obtain a training recognition result.
In this step, the fused video information is input into the fully connected layer of the first SlowFast neural network recognition model to extract feature values, and the features extracted by the fully connected layer are input into a sigmoid regression layer to calculate the training recognition result.
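A minimal sketch of this recognition head; the global average pooling before the fully connected layer is an assumption, as the description names only the fully connected and sigmoid layers.

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    """Fully connected layer followed by a sigmoid over the behavior categories."""
    def __init__(self, in_channels, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)   # assumed global pooling of fused features
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, fused_feat):             # (N, C, T, H, W)
        x = self.pool(fused_feat).flatten(1)   # (N, C)
        return torch.sigmoid(self.fc(x))       # per-category scores in [0, 1]
```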
Step S207: repeating the training processes S201-S206 according to the preset training times to obtain a second SlowFast neural network identification model.
Step S104: and calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset includes:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag. The training process refers to step S201-step S206, and is not described herein.
Comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
In one example, the verification data set includes 10 groups of video clip data, each group containing 10 video clips, and each clip corresponds to a video name tag; the 10 video name tags are respectively 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here 1-10 serve both as behavior category labels and as video name labels. For example, if a video clip with video name label 1 is output as behavior category label 2, the recognized action is determined from label 2 to be squatting; since this differs from the input video label, i.e. from the actual action, it counts as a recognition error. Assuming 5 of the 10 clips in a group are recognized incorrectly and 5 correctly, the recognition accuracy of the group is 50%.
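The accuracy computation therefore reduces to counting matching labels; a sketch with illustrative names:

```python
def recognition_accuracy(predicted_labels, name_labels):
    """Fraction of verification clips whose predicted behavior category
    label matches the label encoded in the clip's video name."""
    correct = sum(p == t for p, t in zip(predicted_labels, name_labels))
    return correct / len(name_labels)

# Example matching the text above: 5 of 10 clips correct -> 0.5 (50%).
print(recognition_accuracy([2, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                           [1, 2, 3, 4, 5, 1, 1, 1, 1, 10]))
```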
Step S105: and adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model.
In this step, the number of iterative training rounds is preset, for example 1000, and the second SlowFast neural network recognition model is trained for that number of rounds. After each round, model parameters such as learning_rate and weight_decay are adjusted according to the output recognition accuracy, and after all rounds are completed, the parameters corresponding to the highest recognition accuracy are configured as the parameters of the third SlowFast neural network recognition model.
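One simple way to realize this adjust-and-retrain loop is a search over the two parameters named above; the grid values and the injected train_and_validate callable are illustrative assumptions, not the patent's own procedure.

```python
import itertools

def select_best_parameters(train_and_validate):
    """`train_and_validate` is any callable that retrains the second model
    with the given parameters and returns the verification accuracy."""
    learning_rates = [1e-2, 1e-3, 1e-4]   # illustrative grid
    weight_decays = [1e-4, 1e-5]
    best_acc, best_params = 0.0, None
    for lr, wd in itertools.product(learning_rates, weight_decays):
        acc = train_and_validate(learning_rate=lr, weight_decay=wd)
        if acc > best_acc:
            best_acc, best_params = acc, (lr, wd)
    # The parameters with the highest verification accuracy configure the
    # third SlowFast neural network recognition model.
    return best_params, best_acc
```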
Step S106: and identifying target behaviors in the real environment by using the third SlowFast neural network identification model. And deploying a third SlowFast neural network recognition model into a server of the real environment to recognize the monkey behavior.
In another embodiment of the present application, the preprocessed video data may be further divided to include a test data set, for example according to a ratio of 3:1:1, that is, the training data set accounts for 60%, the verification data set for 20% and the test data set for 20%. The test data set is used to test the performance of the third SlowFast neural network identification model, and the best recognition accuracy of the third SlowFast neural network identification model on the test data set is determined as its recognition accuracy.
In a second aspect, based on the same inventive concept, the present invention proposes a SlowFast-based behavior recognition system, mainly for monkey behavior recognition, as shown in fig. 3, the system comprising:
A preprocessing unit 301, configured to preprocess original video data of a target behavior to obtain preprocessed video data;
A dividing unit 302 for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit 303 is configured to input the training data set into a first SlowFast neural network identification model that is built in advance to perform preliminary training, so as to obtain a second SlowFast neural network identification model;
a calculating unit 304, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit 305 is configured to adjust parameters of the second SlowFast neural network identification model according to the identification accuracy, and perform iterative training to obtain a third SlowFast neural network identification model;
And the identifying unit 306 is configured to identify the target behavior in the real environment by using the third SlowFast neural network identification model.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
It should be noted that, in the behavior recognition system based on SlowFast provided in the foregoing embodiment, only the division of the foregoing functional modules is illustrated, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the modules or steps in the foregoing embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further decomposed into a plurality of sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present invention are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the present invention comprises:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of the first aspects.
A computer readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the SlowFast-based behavior recognition method of any one of the first aspects.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Referring now to FIG. 4, there is shown a block diagram of a computer system of a server for implementing embodiments of the methods, systems, and apparatus of the present application. The server illustrated in fig. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 4, the computer system includes a central processing unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for system operation. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN (local area network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 401. The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (8)

1. A method of behavior recognition based on SlowFast, the method comprising:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; the method comprises the following steps:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number; the method comprises the following steps: acquiring an initial frame number of each video data in a training data set; determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval; sampling according to the sampling interval to obtain an intermediate video data sample; if the number of frames of the intermediate video data samples is larger than the preset number of frames, randomly intercepting the video data samples with the preset number of frames and determining the video data samples with the preset number of frames as first video data samples;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
2. The method of claim 1, wherein preprocessing the target behavior raw video data to obtain preprocessed video data comprises:
Performing first preprocessing on the target behavior original video data to obtain a plurality of video clip data, wherein each video clip data comprises a target behavior;
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
3. The method according to claim 2, wherein the first preprocessing the target behavior raw video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
cutting the original video data of the target behaviors according to the starting and ending moments to obtain video clip data;
and labeling the video name label of each video clip according to the behavior category label.
4. The method of claim 2, wherein performing a second preprocessing on the plurality of video clip data to augment the data, respectively, to obtain preprocessed video data comprises:
And performing one or more operations of random cutting and overturning on the video clips to obtain the expanded preprocessed video data.
5. A method according to claim 3, wherein said calculating an identification accuracy of the second SlowFast neural network identification model from the verification dataset comprises:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag;
comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
6. A SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the target behavior original video data to obtain preprocessed video data;
A dividing unit for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit is used for inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model; the method comprises the following steps:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number; the method comprises the following steps: acquiring an initial frame number of each video data in a training data set; determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval; sampling according to the sampling interval to obtain an intermediate video data sample; if the number of frames of the intermediate video data samples is larger than the preset number of frames, randomly intercepting the video data samples with the preset number of frames and determining the video data samples with the preset number of frames as first video data samples;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model;
a calculation unit, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit is used for adjusting parameters of the second SlowFast neural network identification model according to the identification precision and performing iterative training to obtain a third SlowFast neural network identification model;
and the identification unit is used for identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
7. An apparatus, comprising:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of claims 1-5.
8. A computer-readable storage medium storing computer instructions for execution by the computer to implement the SlowFast behavior recognition-based method of any one of claims 1-5.
CN202110455595.XA 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment Active CN113723169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110455595.XA CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110455595.XA CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Publications (2)

Publication Number Publication Date
CN113723169A CN113723169A (en) 2021-11-30
CN113723169B (en) 2024-04-30

Family

ID=78672693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110455595.XA Active CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Country Status (1)

Country Link
CN (1) CN113723169B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241376A (en) * 2021-12-15 2022-03-25 深圳先进技术研究院 Behavior recognition model training and behavior recognition method, device, system and medium
CN114359791B (en) * 2021-12-16 2023-08-01 北京信智文科技有限公司 Group macaque appetite detection method based on Yolo v5 network and SlowFast network
CN116612524A (en) * 2022-02-07 2023-08-18 北京字跳网络技术有限公司 Action recognition method and device, electronic equipment and storage medium
CN115376210B (en) * 2022-10-24 2023-03-21 杭州巨岩欣成科技有限公司 Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool
CN116110586B (en) * 2023-04-13 2023-11-21 南京市红山森林动物园管理处 Elephant health management system based on YOLOv5 and SlowFast
CN116363137B (en) * 2023-06-01 2023-08-04 合力(天津)能源科技股份有限公司 Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647643A (en) * 2018-05-11 2018-10-12 浙江工业大学 A kind of packed tower liquid flooding state on-line identification method based on deep learning
CN109145789A (en) * 2018-08-09 2019-01-04 炜呈智能电力科技(杭州)有限公司 Power supply system safety work support method and system
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110717301A (en) * 2019-09-19 2020-01-21 中国石油大学(华东) Flow unit information classification and identification method based on support vector machine algorithm
CN110852168A (en) * 2019-10-11 2020-02-28 西北大学 Pedestrian re-recognition model construction method and device based on neural framework search
CN111291840A (en) * 2020-05-12 2020-06-16 成都派沃智通科技有限公司 Student classroom behavior recognition system, method, medium and terminal device
CN111598230A (en) * 2019-02-21 2020-08-28 北京创新工场旷视国际人工智能技术研究院有限公司 Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device
CN111814669A (en) * 2020-07-08 2020-10-23 中国工商银行股份有限公司 Method and device for identifying abnormal behaviors of bank outlets
CN111814661A (en) * 2020-07-07 2020-10-23 西安电子科技大学 Human behavior identification method based on residual error-recurrent neural network
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112529020A (en) * 2020-12-24 2021-03-19 携程旅游信息技术(上海)有限公司 Animal identification method, system, equipment and storage medium based on neural network
CN112580523A (en) * 2020-12-22 2021-03-30 平安国际智慧城市科技股份有限公司 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11165800B2 (en) * 2017-08-28 2021-11-02 Oracle International Corporation Cloud based security monitoring using unsupervised pattern recognition and deep learning
US11282297B2 (en) * 2019-09-10 2022-03-22 Blue Planet Training, Inc. System and method for visual analysis of emotional coherence in videos

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647643A (en) * 2018-05-11 2018-10-12 浙江工业大学 A kind of packed tower liquid flooding state on-line identification method based on deep learning
CN109145789A (en) * 2018-08-09 2019-01-04 炜呈智能电力科技(杭州)有限公司 Power supply system safety work support method and system
CN111598230A (en) * 2019-02-21 2020-08-28 北京创新工场旷视国际人工智能技术研究院有限公司 Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110717301A (en) * 2019-09-19 2020-01-21 中国石油大学(华东) Flow unit information classification and identification method based on support vector machine algorithm
CN110852168A (en) * 2019-10-11 2020-02-28 西北大学 Pedestrian re-recognition model construction method and device based on neural framework search
CN111291840A (en) * 2020-05-12 2020-06-16 成都派沃智通科技有限公司 Student classroom behavior recognition system, method, medium and terminal device
CN111814661A (en) * 2020-07-07 2020-10-23 西安电子科技大学 Human behavior identification method based on residual error-recurrent neural network
CN111814669A (en) * 2020-07-08 2020-10-23 中国工商银行股份有限公司 Method and device for identifying abnormal behaviors of bank outlets
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112580523A (en) * 2020-12-22 2021-03-30 平安国际智慧城市科技股份有限公司 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN112529020A (en) * 2020-12-24 2021-03-19 携程旅游信息技术(上海)有限公司 Animal identification method, system, equipment and storage medium based on neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tjeng Wawan Cenggoro; Awang Harsa Kridalaksana; Eka Arriyanti; M. Irwan Ukkas. "Recognition of a human behavior pattern in paper rock scissor game using backpropagation artificial neural network method". 2014 2nd International Conference on Information and Communication Technology (ICoICT), 2014, full text. *
Video object detection based on convolutional networks; 杨洁; 陈灵娜; 林颖; 陈宇韶; 陈俊熹; Journal of University of South China (Science and Technology), Issue 04, full text *
Video-based human behavior recognition with a channel attention mechanism; 解怀奇; 乐红兵; Electronic Technology &amp; Software Engineering, Issue 04, full text *

Also Published As

Publication number Publication date
CN113723169A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN113723169B (en) SlowFast-based behavior recognition method, system and equipment
CN109104620B (en) Short video recommendation method and device and readable medium
US20210019531A1 (en) Method and apparatus for classifying video
CN108154105B (en) Underwater biological detection and identification method and device, server and terminal equipment
EP3637310A1 (en) Method and apparatus for generating vehicle damage information
CN113382279B (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
CN111046956A (en) Occlusion image detection method and device, electronic equipment and storage medium
CN111346842A (en) Coal gangue sorting method, device, equipment and storage medium
CN109840503B (en) Method and device for determining category information
CN112613569A (en) Image recognition method, and training method and device of image classification model
CN115082752A (en) Target detection model training method, device, equipment and medium based on weak supervision
CN114724140A (en) Strawberry maturity detection method and device based on YOLO V3
CN109088793B (en) Method and apparatus for detecting network failure
CN113253336B (en) Earthquake prediction method and system based on deep learning
CN114494863A (en) Animal cub counting method and device based on Blend Mask algorithm
CN110490056A (en) The method and apparatus that image comprising formula is handled
CN112508078A (en) Image multitask multi-label identification method, system, equipment and medium
CN115438945A (en) Risk identification method, device, equipment and medium based on power equipment inspection
CN114241376A (en) Behavior recognition model training and behavior recognition method, device, system and medium
CN114821396A (en) Normative detection method, device and storage medium for LNG unloading operation process
CN109522203B (en) Software product evaluation method and device
CN111310511A (en) Method and device for identifying objects
CN114445711B (en) Image detection method, image detection device, electronic equipment and storage medium
US20230139957A1 (en) Automated visual recognition for atmospheric visibility measurement
CN111859370A (en) Method, apparatus, electronic device and computer-readable storage medium for identifying service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant