CN113723169B - SlowFast-based behavior recognition method, system and equipment - Google Patents

SlowFast-based behavior recognition method, system and equipment

Info

Publication number
CN113723169B
CN113723169B
Authority
CN
China
Prior art keywords
video data
slowfast
neural network
training
identification model
Prior art date
Legal status
Active
Application number
CN202110455595.XA
Other languages
Chinese (zh)
Other versions
CN113723169A (en)
Inventor
马喜波
徐哲
雷震
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110455595.XA priority Critical patent/CN113723169B/en
Publication of CN113723169A publication Critical patent/CN113723169A/en
Application granted granted Critical
Publication of CN113723169B publication Critical patent/CN113723169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, system and equipment based on SlowFast, aiming at solving the problems of low recognition efficiency and low recognition precision. The method comprises the following steps: preprocessing target behavior original video data to obtain preprocessed video data; dividing the preprocessed video data into a training data set and a verification data set; inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set; adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model; and identifying target behaviors in the real environment by using a third SlowFast neural network identification model. The invention greatly improves the recognition efficiency, saves manpower and time and improves the recognition precision.

Description

SlowFast-based behavior recognition method, system and equipment
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, system and equipment based on SlowFast.
Background
In many medical experiments, humans cannot be used directly as subjects for safety and ethical reasons. In such cases, specially bred animals take the place of humans, and experimental results are obtained by observing and recording the animals' behavior and physiological changes. Since monkeys are primates closely related to humans, observing changes in their behavior is of direct biological and medical interest.
At present, monkey behavior is generally observed through on-site manual observation combined with video monitoring, but the existing monitoring approaches have the following problems:
1. On-site manual observation is time-consuming and labor-intensive, and the operators' presence easily disturbs the monkeys' behavior, which affects the experimental results and lowers detection accuracy.
2. With video monitoring, behavior recording still requires a lot of manual involvement and is therefore not an optimal solution.
Disclosure of Invention
In order to solve the above problems in the prior art, namely low efficiency and low detection precision, the invention provides a behavior recognition method, system and equipment based on SlowFast.
In a first aspect of the present invention, a behavior recognition method based on SlowFast is provided, where the method includes:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
Optionally, the preprocessing the target behavior original video data to obtain preprocessed video data includes:
Performing first preprocessing on the target behavior original video data to obtain a plurality of video clip data, wherein each video clip data comprises a target behavior;
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
Optionally, the first preprocessing the target behavior original video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
cutting the original video data of the target behaviors according to the starting and ending moments to obtain video clip data;
and labeling the video name label of each video clip according to the behavior category label.
Optionally, the performing second preprocessing on the plurality of video clip data to expand the data, and obtaining the preprocessed video data includes:
And performing one or more operations of random cutting and overturning on the video clips to obtain the expanded preprocessed video data.
Optionally, inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training, and obtaining the second SlowFast neural network identification model includes:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
Repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model.
Optionally, the sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number includes:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
Sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data samples is larger than the preset frame number, randomly intercepting the video data samples with the preset frame number and determining the video data samples with the preset frame number as first video data samples.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset includes:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag;
comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
In a second aspect, the present invention proposes a SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the target behavior original video data to obtain preprocessed video data;
A dividing unit for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit is used for inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model;
a calculation unit, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit is used for adjusting parameters of the second SlowFast neural network identification model according to the identification precision and performing iterative training to obtain a third SlowFast neural network identification model;
and the identification unit is used for identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
In a third aspect of the invention, an apparatus is presented comprising:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of the first aspects.
In a fourth aspect of the present invention, a computer readable storage medium is provided, where computer instructions are stored, where the computer instructions are configured to be executed by the computer to implement the method for behavior recognition based on SlowFast in the first aspect.
The invention has the beneficial effects that: the target behavior is automatically identified by a neural network identification model built on the SlowFast algorithm, which greatly improves identification efficiency. The preprocessed video data is divided into a training data set and a verification data set; the training data set is input into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; the recognition accuracy of the second SlowFast neural network recognition model is calculated on the verification data set; the parameters of the second SlowFast neural network identification model are adjusted according to the identification precision and iterative training is performed to obtain a third SlowFast neural network identification model; and the third SlowFast neural network recognition model is used to recognize the target behavior in the real environment, which greatly improves the detection accuracy of the SlowFast neural network recognition model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a behavior recognition method based on SlowFast according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a behavior recognition method based on SlowFast according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a SlowFast-based behavior recognition system according to an example embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
The invention provides a SlowFast-based behavior recognition method which is mainly applied to monkey behavior recognition, and the method comprises the following steps:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
In order to more clearly describe the behavior recognition method based on SlowFast of the present invention, each step in the embodiment of the present invention is described in detail below with reference to fig. 1.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The SlowFast-based behavior recognition method according to the first embodiment of the present invention includes steps S101 to S106, and each step is described in detail as follows:
Step S101: and preprocessing the target behavior original video data to obtain preprocessed video data.
In the implementation of the application, the target behavior mainly refers to monkey behavior. Monkeys are a collective term for certain primate mammals; they are omnivorous and feed mainly on fruit.
In one example, the observation targets may be rhesus monkeys or cynomolgus monkeys, mainly rhesus monkeys with a small proportion of cynomolgus monkeys.
In this step, the target behavior raw video data is acquired before it is preprocessed. In one example, video data of the monkey is captured mainly from the front and from the top. Specifically, two fixing devices may be designed and prepared, a camera placed in each, and the devices mounted respectively on the front and the top of the cage housing the monkey to be photographed. All actions of the monkey are photographed without interruption, covering monkeys of different sexes and ages as far as possible.
Optionally, the preprocessing the target behavior original video data to obtain preprocessed video data includes:
and performing first preprocessing on the target behavior original video data to obtain a plurality of video segment data, wherein each video segment comprises a target behavior.
Specifically, the first preprocessing the target behavior original video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
In the embodiment of the application, the target behavior raw video data is first cleaned and the recordings with higher definition are selected; these are then reviewed against the predetermined action categories, the starting time and ending time of each action are recorded, and behavior category labels are assigned.
The action categories are determined in advance: a worker classifies and defines all actions of the target, such as a monkey. Each defined action should be completely visible in the footage, occur frequently enough, and be clearly definable; the more categories are defined, the more categories the SlowFast neural network identification model can identify. In one example, monkey actions may be divided into 10 categories: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here, 1-10 are the behavior category labels.
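For use in the illustrative sketches below, the 10 categories may be represented as a simple Python mapping; the English names follow the translation above and are not part of the original labels.

```python
# Illustrative mapping of behavior category labels (1-10) to action names.
BEHAVIOR_LABELS = {
    1: "lying down", 2: "squatting", 3: "walking",
    4: "jumping upwards", 5: "jumping downwards",
    6: "climbing upwards", 7: "climbing downwards",
    8: "hanging", 9: "standing", 10: "climbing",
}
```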
The target behavior raw video data is then cut according to the starting and ending times to obtain the video clip data. In one example, Python code may be written around the FFmpeg command line to cut the clips out in bulk according to the recorded start and stop times.
Each video clip is then given a video name label according to its behavior category label. For example, if the action category represented by behavior category label "1" is lying down, the video name label may be set to "1", so that the behavior category label, and thus the action category, can later be determined from the video name label.
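In one possible realization, the batch cutting and naming may be scripted with Python and the FFmpeg command line as sketched below; the annotation tuples, directory names and file-name scheme are illustrative assumptions, not taken from the patent.

```python
import subprocess
from pathlib import Path

# Hypothetical annotations: (source video, start time, end time, behavior label 1-10).
annotations = [
    ("raw/monkey_front_001.mp4", "00:01:15", "00:01:22", 3),  # walking
    ("raw/monkey_top_001.mp4",   "00:04:02", "00:04:09", 8),  # hanging
]

out_dir = Path("clips")
out_dir.mkdir(exist_ok=True)

for idx, (src, start, end, label) in enumerate(annotations):
    # Encode the behavior category label into the clip file name so it can
    # later be recovered as the video name label.
    dst = out_dir / f"{label}_{idx:05d}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ss", start, "-to", end,
         "-c:v", "libx264", "-an", str(dst)],
        check=True,
    )
```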
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
Optionally, the performing second preprocessing on the plurality of video clip data to expand the data, and obtaining the preprocessed video data includes:
One or more operations of random cropping and flipping are performed on the video clips to obtain the expanded preprocessed video data.
The second preprocessing expands the data and increases the amount of preprocessed video data; only when the preprocessed video data is sufficient can it be divided into a training data set and a verification data set, which provides the data basis for step S102.
Step S102: the pre-processed video data is divided into a training data set and a validation data set.
In this step, the preprocessed video data is divided according to a preset ratio, such as 4:1, with the training data set taking a larger proportion than the verification data set.
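A minimal sketch of such a 4:1 split, assuming the preprocessed clips are stored as files in a single directory (an illustrative layout):

```python
import random
from pathlib import Path

clips = sorted(Path("clips").glob("*.mp4"))
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(clips)

# 4:1 split: 80% of the preprocessed clips for training, 20% for verification.
split_point = int(0.8 * len(clips))
train_set, val_set = clips[:split_point], clips[split_point:]
```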
Step S103: and inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model.
Optionally, inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training, and obtaining a second SlowFast neural network identification model includes the following steps, as shown in fig. 2:
step S201: sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number, wherein the method specifically comprises the following steps:
Acquiring the initial frame count of each video in the training data set, and determining a sampling interval according to a preset correspondence between the initial frame count and the sampling interval; for example, videos of 30-60 frames use a sampling interval of 1; 60-90 frames, an interval of 2; 90-180 frames, an interval of 3; and more than 180 frames, an interval of 4. Sampling at this interval yields an intermediate video data sample. If the intermediate video data sample contains more frames than the preset frame number, a segment of the preset frame number is randomly cut out and determined as the first video data sample; for example, a 50-frame video with a sampling interval of 1 yields 50 sampled frames, from which 30 consecutive frames are randomly cut out as the first video data sample.
Because the initial frame counts of the videos differ, the application sets the sampling interval dynamically according to the initial frame count: the longer the video, the larger the sampling interval, which helps capture the global information of the video.
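A minimal sketch of this first sampling rule, assuming each video is available as a list of decoded frames; the handling of clips shorter than the preset frame count is an assumption.

```python
import random

def first_sample(frames, preset_frames=30):
    """Sample a clip to at most `preset_frames` frames using the
    frame-count-dependent interval described above."""
    n = len(frames)
    if n < 60:        # 30-60 frames -> interval 1
        interval = 1
    elif n < 90:      # 60-90 frames -> interval 2
        interval = 2
    elif n < 180:     # 90-180 frames -> interval 3
        interval = 3
    else:             # more than 180 frames -> interval 4
        interval = 4
    intermediate = frames[::interval]
    if len(intermediate) <= preset_frames:
        return intermediate            # assumption: shorter clips pass through
    # Randomly cut out `preset_frames` consecutive sampled frames.
    start = random.randint(0, len(intermediate) - preset_frames)
    return intermediate[start:start + preset_frames]
```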
Step S202: and carrying out data enhancement preprocessing on the first video data sample. The processing method comprises the following steps: random clipping and 50% probability horizontal flipping.
Step S203: and sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of target behaviors.
In one example, after the first sampling obtains a 30-frame first video data sample, it is sampled again with a sampling interval of 6 to obtain a second video data sample of 5 video frames, which is input to the Slow branch.
In the application, the Slow branch acquires the spatial information of the video, such as the colors and plants around the monkey. The Slow branch takes fewer input video frames, but its feature information is complex and fine-grained, so it generates a large amount of computation, accounting for approximately 80% of the computation of the whole network model.
Step S204: sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval.
In one example, after the first sampling obtains a 30-frame first video data sample, it is sampled again with a sampling interval of 2 to obtain a third video data sample of 15 video frames, which is input to the Fast branch.
In the application, the Fast branch acquires the temporal information of the video, such as the monkey's motion between 2 s and 3 s. The Fast branch takes more input video frames, but its feature information is simpler and coarser-grained, so its computation is small, accounting for only about 20% of the whole network.
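The second and third samplings thus amount to striding over the same 30-frame sample at two rates. A sketch, assuming the enhanced sample has been converted to a (C, T, H, W) PyTorch tensor; the function name is illustrative.

```python
import torch

def split_pathways(clip, slow_interval=6, fast_interval=2):
    """Build the Slow- and Fast-branch inputs from one enhanced sample."""
    # Slow branch: sparse temporal sampling (30 / 6 = 5 frames), spatial information.
    slow = clip[:, ::slow_interval, :, :]
    # Fast branch: dense temporal sampling (30 / 2 = 15 frames), temporal information.
    fast = clip[:, ::fast_interval, :, :]
    return slow, fast
```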
Step S205: and fusing the space information and the time information.
In this step, the first SlowFast neural network identification model includes a lateral connection from the Fast branch to the Slow branch to fuse the temporal information and the spatial information. However, because the two branches take different numbers of input video frames, the generated feature dimensions also differ, so during the connection the Fast-branch feature maps are rescaled with a 5 × 1 × 1 3D convolution kernel and then summed with the Slow-branch feature maps to fuse the temporal and spatial feature information.
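A sketch of such a lateral connection: a time-strided 3D convolution with a 5 × 1 × 1 kernel rescales the Fast-branch features before element-wise summation. The channel counts, stride and padding are assumptions chosen so that the 15-frame Fast features align with the 5-frame Slow features.

```python
import torch.nn as nn

class LateralFusion(nn.Module):
    """Fuse Fast-branch features into the Slow branch."""
    def __init__(self, fast_channels, slow_channels, alpha=3):
        super().__init__()
        # 5x1x1 kernel over (time, height, width); the temporal stride `alpha`
        # matches the frame-rate ratio between the branches (15 / 5 = 3 here).
        self.conv = nn.Conv3d(
            fast_channels, slow_channels,
            kernel_size=(5, 1, 1), stride=(alpha, 1, 1),
            padding=(2, 0, 0), bias=False,
        )

    def forward(self, slow_feat, fast_feat):
        # (N, C_fast, 15, H, W) -> (N, C_slow, 5, H, W), then element-wise sum.
        return slow_feat + self.conv(fast_feat)
```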
Step S206: and calculating according to the fused information to obtain a training recognition result.
In this step, the fused video information is input into the fully connected layer of the first SlowFast neural network recognition model to extract feature values, and the features extracted by the fully connected layer are input into a sigmoid regression layer to calculate the training recognition result.
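A minimal sketch of this recognition head; the global average pooling before the fully connected layer is an assumption, as the description names only the fully connected and sigmoid layers.

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    """Fully connected layer followed by a sigmoid over the behavior categories."""
    def __init__(self, in_channels, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)   # assumed global pooling of fused features
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, fused_feat):             # (N, C, T, H, W)
        x = self.pool(fused_feat).flatten(1)   # (N, C)
        return torch.sigmoid(self.fc(x))       # per-category scores in [0, 1]
```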
Step S207: repeating the training processes S201-S206 according to the preset training times to obtain a second SlowFast neural network identification model.
Step S104: and calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset includes:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag. The training process refers to step S201-step S206, and is not described herein.
Comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
In one example, the verification data set includes 10 groups of video clip data, each group containing 10 video clips, and each clip corresponds to a video name tag; the 10 video name tags are respectively 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here 1-10 serve both as behavior category labels and as video name labels. For example, if a video clip with video name label 1 is output as behavior category label 2, the recognized action is determined from label 2 to be squatting; since this differs from the input video label, i.e. from the actual action, it counts as a recognition error. Assuming 5 of the 10 clips in a group are recognized incorrectly and 5 correctly, the recognition accuracy of the group is 50%.
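The accuracy computation therefore reduces to counting matching labels; a sketch with illustrative names:

```python
def recognition_accuracy(predicted_labels, name_labels):
    """Fraction of verification clips whose predicted behavior category
    label matches the label encoded in the clip's video name."""
    correct = sum(p == t for p, t in zip(predicted_labels, name_labels))
    return correct / len(name_labels)

# Example matching the text above: 5 of 10 clips correct -> 0.5 (50%).
print(recognition_accuracy([2, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                           [1, 2, 3, 4, 5, 1, 1, 1, 1, 10]))
```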
Step S105: and adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model.
In this step, the number of iterative training rounds is preset, for example 1000, and the second SlowFast neural network recognition model is trained for that number of rounds. After each round, model parameters such as learning_rate and weight_decay are adjusted according to the output recognition accuracy, and after all rounds are completed, the parameters corresponding to the highest recognition accuracy are configured as the parameters of the third SlowFast neural network recognition model.
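One simple way to realize this adjust-and-retrain loop is a search over the two parameters named above; the grid values and the injected train_and_validate callable are illustrative assumptions, not the patent's own procedure.

```python
import itertools

def select_best_parameters(train_and_validate):
    """`train_and_validate` is any callable that retrains the second model
    with the given parameters and returns the verification accuracy."""
    learning_rates = [1e-2, 1e-3, 1e-4]   # illustrative grid
    weight_decays = [1e-4, 1e-5]
    best_acc, best_params = 0.0, None
    for lr, wd in itertools.product(learning_rates, weight_decays):
        acc = train_and_validate(learning_rate=lr, weight_decay=wd)
        if acc > best_acc:
            best_acc, best_params = acc, (lr, wd)
    # The parameters with the highest verification accuracy configure the
    # third SlowFast neural network recognition model.
    return best_params, best_acc
```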
Step S106: and identifying target behaviors in the real environment by using the third SlowFast neural network identification model. And deploying a third SlowFast neural network recognition model into a server of the real environment to recognize the monkey behavior.
In another embodiment of the present application, the preprocessed video data may be further divided to include a test data set, for example according to a ratio of 3:1:1, that is, the training data set accounts for 60%, the verification data set for 20% and the test data set for 20%. The test data set is used to test the performance of the third SlowFast neural network identification model, and the best recognition accuracy of the third SlowFast neural network identification model on the test data set is determined as its recognition accuracy.
In a second aspect, based on the same inventive concept, the present invention proposes a SlowFast-based behavior recognition system, mainly for monkey behavior recognition, as shown in fig. 3, the system comprising:
A preprocessing unit 301, configured to preprocess original video data of a target behavior to obtain preprocessed video data;
A dividing unit 302 for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit 303 is configured to input the training data set into a first SlowFast neural network identification model that is built in advance to perform preliminary training, so as to obtain a second SlowFast neural network identification model;
a calculating unit 304, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit 305 is configured to adjust parameters of the second SlowFast neural network identification model according to the identification accuracy, and perform iterative training to obtain a third SlowFast neural network identification model;
And the identifying unit 306 is configured to identify the target behavior in the real environment by using the third SlowFast neural network identification model.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
It should be noted that, in the behavior recognition system based on SlowFast provided in the foregoing embodiment, only the division of the foregoing functional modules is illustrated, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the modules or steps in the foregoing embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further decomposed into a plurality of sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present invention are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the present invention comprises:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of the first aspects.
A computer readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the SlowFast-based behavior recognition method of any one of the first aspects.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Referring now to FIG. 4, there is shown a block diagram of a computer system of a server for implementing embodiments of the methods, systems, and apparatus of the present application. The server illustrated in fig. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 4, the computer system includes a central processing unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for system operation. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN (local area network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 401. The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (8)

1. A method of behavior recognition based on SlowFast, the method comprising:
preprocessing target behavior original video data to obtain preprocessed video data;
Dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a pre-constructed first SlowFast neural network identification model for preliminary training to obtain a second SlowFast neural network identification model; the method comprises the following steps:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number; the method comprises the following steps: acquiring an initial frame number of each video data in a training data set; determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval; sampling according to the sampling interval to obtain an intermediate video data sample; if the number of frames of the intermediate video data samples is larger than the preset number of frames, randomly intercepting the video data samples with the preset number of frames and determining the video data samples with the preset number of frames as first video data samples;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model;
Calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification dataset;
Adjusting parameters of the second SlowFast neural network identification model according to the identification precision, and performing iterative training to obtain a third SlowFast neural network identification model;
and identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
2. The method of claim 1, wherein preprocessing the target behavior raw video data to obtain preprocessed video data comprises:
Performing first preprocessing on the target behavior original video data to obtain a plurality of video clip data, wherein each video clip data comprises a target behavior;
And respectively performing second preprocessing on the video clip data to expand the data, so as to obtain preprocessed video data.
3. The method according to claim 2, wherein the first preprocessing the target behavior raw video data to obtain a plurality of video clip data includes:
acquiring the starting and ending time of each target behavior in the target behavior original video data and a behavior class label;
cutting the original video data of the target behaviors according to the starting and ending moments to obtain video clip data;
and labeling the video name label of each video clip according to the behavior category label.
4. The method of claim 2, wherein performing a second preprocessing on the plurality of video clip data to augment the data, respectively, to obtain preprocessed video data comprises:
And performing one or more operations of random cutting and overturning on the video clips to obtain the expanded preprocessed video data.
5. A method according to claim 3, wherein said calculating an identification accuracy of the second SlowFast neural network identification model from the verification dataset comprises:
inputting the verification data set into the second SlowFast neural network identification model, training each video data in the verification data set in sequence, and outputting a verification identification result of each video data, wherein the verification identification result is a behavior type verification tag;
comparing the behavior category verification tag with a video name tag;
And calculating the proportion of verification recognition results in which the behavior category verification tag is the same as the video name tag, and determining the proportion as the recognition precision.
6. A SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the target behavior original video data to obtain preprocessed video data;
A dividing unit for dividing the pre-processed video data into a training data set and a verification data set;
The first training unit is used for inputting the training data set into a pre-constructed first SlowFast neural network identification model to perform preliminary training to obtain a second SlowFast neural network identification model; the method comprises the following steps:
Sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number; the method comprises the following steps: acquiring an initial frame number of each video data in a training data set; determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval; sampling according to the sampling interval to obtain an intermediate video data sample; if the number of frames of the intermediate video data samples is larger than the preset number of frames, randomly intercepting the video data samples with the preset number of frames and determining the video data samples with the preset number of frames as first video data samples;
performing data enhancement preprocessing on the first video data sample;
Sampling the first video data sample subjected to data enhancement pretreatment according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in a first SlowFast neural network identification model to obtain spatial information of a target behavior;
Sampling the first video data sample subjected to data enhancement pretreatment according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the space information and the time information;
calculating according to the fused information to obtain a training recognition result;
repeating the training process according to the preset training times to obtain a second SlowFast neural network identification model;
a calculation unit, configured to calculate an identification accuracy of the second SlowFast neural network identification model according to the verification dataset;
The second training unit is used for adjusting parameters of the second SlowFast neural network identification model according to the identification precision and performing iterative training to obtain a third SlowFast neural network identification model;
and the identification unit is used for identifying target behaviors in the real environment by using the third SlowFast neural network identification model.
7. An apparatus, comprising:
at least one processor; and
A memory communicatively coupled to at least one of the processors; wherein,
The memory stores instructions executable by the processor for execution by the processor to implement the SlowFast-based behavior recognition method of any one of claims 1-5.
8. A computer-readable storage medium storing computer instructions for execution by the computer to implement the SlowFast behavior recognition-based method of any one of claims 1-5.
CN202110455595.XA 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment Active CN113723169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110455595.XA CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110455595.XA CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Publications (2)

Publication Number Publication Date
CN113723169A CN113723169A (en) 2021-11-30
CN113723169B (en) 2024-04-30

Family

ID=78672693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110455595.XA Active CN113723169B (en) 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment

Country Status (1)

Country Link
CN (1) CN113723169B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241376A (en) * 2021-12-15 2022-03-25 深圳先进技术研究院 Behavior recognition model training and behavior recognition method, device, system and medium
CN114359791B (en) * 2021-12-16 2023-08-01 北京信智文科技有限公司 Group macaque appetite detection method based on Yolo v5 network and SlowFast network
CN116612524A (en) * 2022-02-07 2023-08-18 北京字跳网络技术有限公司 Action recognition method and device, electronic equipment and storage medium
CN115376210B (en) * 2022-10-24 2023-03-21 杭州巨岩欣成科技有限公司 Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool
CN116110586B (en) * 2023-04-13 2023-11-21 南京市红山森林动物园管理处 Elephant health management system based on YOLOv5 and SlowFast
CN116363137B (en) * 2023-06-01 2023-08-04 合力(天津)能源科技股份有限公司 Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647643A (en) * 2018-05-11 2018-10-12 浙江工业大学 A kind of packed tower liquid flooding state on-line identification method based on deep learning
CN109145789A (en) * 2018-08-09 2019-01-04 炜呈智能电力科技(杭州)有限公司 Power supply system safety work support method and system
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110717301A (en) * 2019-09-19 2020-01-21 中国石油大学(华东) Flow unit information classification and identification method based on support vector machine algorithm
CN110852168A (en) * 2019-10-11 2020-02-28 西北大学 Pedestrian re-recognition model construction method and device based on neural framework search
CN111291840A (en) * 2020-05-12 2020-06-16 成都派沃智通科技有限公司 Student classroom behavior recognition system, method, medium and terminal device
CN111598230A (en) * 2019-02-21 2020-08-28 北京创新工场旷视国际人工智能技术研究院有限公司 Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device
CN111814669A (en) * 2020-07-08 2020-10-23 中国工商银行股份有限公司 Method and device for identifying abnormal behaviors of bank outlets
CN111814661A (en) * 2020-07-07 2020-10-23 西安电子科技大学 Human behavior identification method based on residual error-recurrent neural network
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112529020A (en) * 2020-12-24 2021-03-19 携程旅游信息技术(上海)有限公司 Animal identification method, system, equipment and storage medium based on neural network
CN112580523A (en) * 2020-12-22 2021-03-30 平安国际智慧城市科技股份有限公司 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11165800B2 (en) * 2017-08-28 2021-11-02 Oracle International Corporation Cloud based security monitoring using unsupervised pattern recognition and deep learning
US11282297B2 (en) * 2019-09-10 2022-03-22 Blue Planet Training, Inc. System and method for visual analysis of emotional coherence in videos

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647643A (en) * 2018-05-11 2018-10-12 浙江工业大学 A kind of packed tower liquid flooding state on-line identification method based on deep learning
CN109145789A (en) * 2018-08-09 2019-01-04 炜呈智能电力科技(杭州)有限公司 Power supply system safety work support method and system
CN111598230A (en) * 2019-02-21 2020-08-28 北京创新工场旷视国际人工智能技术研究院有限公司 Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110717301A (en) * 2019-09-19 2020-01-21 中国石油大学(华东) Flow unit information classification and identification method based on support vector machine algorithm
CN110852168A (en) * 2019-10-11 2020-02-28 西北大学 Pedestrian re-recognition model construction method and device based on neural framework search
CN111291840A (en) * 2020-05-12 2020-06-16 成都派沃智通科技有限公司 Student classroom behavior recognition system, method, medium and terminal device
CN111814661A (en) * 2020-07-07 2020-10-23 西安电子科技大学 Human behavior identification method based on residual error-recurrent neural network
CN111814669A (en) * 2020-07-08 2020-10-23 中国工商银行股份有限公司 Method and device for identifying abnormal behaviors of bank outlets
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112580523A (en) * 2020-12-22 2021-03-30 平安国际智慧城市科技股份有限公司 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN112529020A (en) * 2020-12-24 2021-03-19 携程旅游信息技术(上海)有限公司 Animal identification method, system, equipment and storage medium based on neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tjeng Wawan Cenggoro; Awang Harsa Kridalaksana; Eka Arriyanti; M. Irwan Ukkas. "Recognition of a human behavior pattern in paper rock scissor game using backpropagation artificial neural network method". 2014 2nd International Conference on Information and Communication Technology (ICoICT), 2014, full text. *
Video object detection based on convolutional networks; 杨洁; 陈灵娜; 林颖; 陈宇韶; 陈俊熹; Journal of University of South China (Science and Technology), Issue 04, full text *
Video-based human behavior recognition with a channel attention mechanism; 解怀奇; 乐红兵; Electronic Technology &amp; Software Engineering, Issue 04, full text *

Also Published As

Publication number Publication date
CN113723169A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN113723169B (en) SlowFast-based behavior recognition method, system and equipment
CN109104620B (en) Short video recommendation method and device and readable medium
US20210019531A1 (en) Method and apparatus for classifying video
CN108154105B (en) Underwater biological detection and identification method and device, server and terminal equipment
EP3637310A1 (en) Method and apparatus for generating vehicle damage information
CN113382279B (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
CN111046956A (en) Occlusion image detection method and device, electronic equipment and storage medium
CN111346842A (en) Coal gangue sorting method, device, equipment and storage medium
CN109840503B (en) Method and device for determining category information
CN112613569A (en) Image recognition method, and training method and device of image classification model
CN115082752A (en) Target detection model training method, device, equipment and medium based on weak supervision
CN114724140A (en) Strawberry maturity detection method and device based on YOLO V3
CN109088793B (en) Method and apparatus for detecting network failure
CN113253336B (en) Earthquake prediction method and system based on deep learning
CN114494863A (en) Animal cub counting method and device based on Blend Mask algorithm
CN110490056A (en) The method and apparatus that image comprising formula is handled
CN112508078A (en) Image multitask multi-label identification method, system, equipment and medium
CN115438945A (en) Risk identification method, device, equipment and medium based on power equipment inspection
CN114241376A (en) Behavior recognition model training and behavior recognition method, device, system and medium
CN114821396A (en) Normative detection method, device and storage medium for LNG unloading operation process
CN109522203B (en) Software product evaluation method and device
CN111310511A (en) Method and device for identifying objects
CN114445711B (en) Image detection method, image detection device, electronic equipment and storage medium
US20230139957A1 (en) Automated visual recognition for atmospheric visibility measurement
CN111859370A (en) Method, apparatus, electronic device and computer-readable storage medium for identifying service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant