CN113723169A - Behavior identification method, system and equipment based on SlowFast - Google Patents

Publication number
CN113723169A
Authority
CN
China
Prior art keywords
slowfast
video data
neural network
training
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110455595.XA
Other languages
Chinese (zh)
Other versions
CN113723169B (en)
Inventor
马喜波
徐哲
雷震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110455595.XA priority Critical patent/CN113723169B/en
Publication of CN113723169A publication Critical patent/CN113723169A/en
Application granted granted Critical
Publication of CN113723169B publication Critical patent/CN113723169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of behavior recognition, and particularly relates to a SlowFast-based behavior recognition method, system and equipment, aiming to solve the problems of low recognition efficiency and low recognition accuracy. The method comprises the following steps: preprocessing original video data of the target behaviors to obtain preprocessed video data; dividing the preprocessed video data into a training data set and a verification data set; inputting the training data set into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set; adjusting parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and performing iterative training to obtain a third SlowFast neural network recognition model; and recognizing target behaviors in a real environment by using the third SlowFast neural network recognition model. The invention greatly improves recognition efficiency, saves labor and time, and improves recognition accuracy.

Description

Behavior identification method, system and equipment based on SlowFast
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, a behavior recognition system and behavior recognition equipment based on SlowFast.
Background
In many medical experiments, humans cannot be used directly as experimental subjects for safety and ethical reasons. In such cases, captive-bred animals can be used in place of humans, and experimental results are obtained by observing and recording the animals' behavioral and physiological changes. Because monkeys and humans are closely related, both being primates, observing changes in monkey behavior has direct biological and medical significance.
At present, monkey behavior is generally observed through on-site manual observation or video monitoring, but these approaches have the following problems:
1. On-site manual observation is time-consuming and labor-intensive, and the monkeys' behavior is easily disturbed by the operators, which affects the experimental results and leads to low detection accuracy.
2. With video surveillance, behavior recording still requires a lot of manual involvement and is therefore not an optimal solution.
Disclosure of Invention
In order to solve the problems of low efficiency and low detection accuracy in the prior art, the invention provides a behavior identification method, system and equipment based on SlowFast.
In a first aspect of the present invention, a SlowFast-based behavior identification method is provided, the method comprising:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
Optionally, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the starting and stopping moments to obtain video fragment data;
and labeling the video name label of each video clip according to the behavior category label.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
and performing one or more operations of random cropping and flipping on the plurality of video segments to obtain the expanded preprocessed video data.
Optionally, the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model includes:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in the first SlowFast neural network recognition model to obtain spatial information of a target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
Optionally, the sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number includes:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data sample is greater than the preset frame number, randomly intercepting the video data sample with the preset frame number to determine the video data sample as a first video data sample.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially processing each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior category verification label;
comparing the behavior category verification label with the video name label;
and calculating the proportion of verification recognition results whose behavior category verification label matches the video name label, and determining that proportion as the recognition accuracy.
In a second aspect, the invention provides a SlowFast-based behavior recognition system, comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
the second training unit is used for adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision and performing iterative training to obtain a third SlowFast neural network recognition model;
and the identification unit is used for identifying the target behaviors in the real environment by utilizing the third SlowFast neural network identification model.
In a third aspect of the present invention, an apparatus is provided, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor to perform the SlowFast-based behavior recognition method according to the first aspect.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by a computer to implement the SlowFast-based behavior recognition method according to the first aspect.
The invention has the following beneficial effects: the method automatically recognizes target behaviors by establishing a neural network recognition model based on the SlowFast algorithm, which greatly improves recognition efficiency. In addition, the preprocessed video data is divided into a training data set and a verification data set; the training data set is input into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; the recognition accuracy of the second SlowFast neural network recognition model is calculated according to the verification data set; parameters of the second SlowFast neural network recognition model are adjusted according to the recognition accuracy, and iterative training is performed to obtain a third SlowFast neural network recognition model; and the third SlowFast neural network recognition model is used to recognize target behaviors in a real environment, which greatly improves the detection accuracy of the SlowFast neural network recognition model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a behavior recognition method based on SlowFast according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a behavior recognition method based on SlowFast according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a SlowFast-based behavior recognition system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention provides a behavior recognition method based on SlowFast, which is mainly applied to recognition of monkey behaviors, and comprises the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
In order to more clearly explain the behavior recognition method based on SlowFast of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The behavior recognition method based on SlowFast according to the first embodiment of the invention comprises the following steps S101-S106, and the steps are described in detail as follows:
Step S101: preprocessing the original video data of the target behaviors to obtain preprocessed video data.
In the practice of the present application, the target behaviors are primarily monkey behaviors. "Monkey" is a collective term for a group of animals in the order Primates of the class Mammalia; the name covers members of three groups of simian primates. Monkeys are omnivorous, feeding mainly on fruit but also eating meat.
In one example, the observation targets may be rhesus monkeys or cynomolgus monkeys, or mostly rhesus monkeys with a small proportion of cynomolgus monkeys.
In this step, the original video data of the target behaviors is acquired before it is preprocessed. In one example, front-view and top-view video data of the monkeys is acquired. Specifically, two fixing devices may be designed and prepared, a camera placed in each, and the fixing devices installed at the front and at the top of the cage housing the monkeys to be filmed. All behaviors of the monkeys are filmed continuously and without interference. The footage should cover monkeys of different sexes and ages as far as possible.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
and carrying out first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment comprises one target behavior.
Specifically, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
In the embodiment of the application, the original video data of the target behaviors is first cleaned, and footage with higher definition is selected. The selected footage is then viewed against the predetermined action categories, the start time and end time of each action are recorded, and the behavior category labels are annotated.
The action categories are determined in advance: staff classify and define all actions of the target, such as a monkey, beforehand. The defined actions should be fully visible, occur frequently, and be clearly definable; the finer the classification, the more categories the SlowFast neural network recognition model can recognize. In one example, the action categories of monkeys can be divided into 10 categories: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here, 1-10 are the behavior category labels.
The original video data of the target behaviors is cut according to the start and stop moments to obtain video segment data. In one example, Python code may be written in conjunction with the FFmpeg command line to crop the data in batches according to the start and stop times.
The video name label of each video segment is annotated according to the behavior category label. For example, if the action category represented by behavior category label "1" is lying down, the video name label may be set to "1", so that the behavior category label, and thus the action category, can be determined from the video name label.
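The cutting and labeling steps above can be sketched as follows. This is a minimal illustration and not the code used in the patent: the annotation tuple layout, the `clips` output directory, the `<label>_<index>.mp4` naming scheme and the choice to re-encode with libx264 without audio are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, start, end, label, index, out_dir="clips"):
    """Build an FFmpeg command that cuts the span start..end (seconds) out of src.

    The behavior category label is written into the output file name so that
    the label can later be recovered from the video name (hypothetical scheme).
    """
    out_path = Path(out_dir) / f"{label}_{index:04d}.mp4"
    return [
        "ffmpeg", "-y",          # overwrite existing output files
        "-i", str(src),          # input video
        "-ss", str(start),       # start time of the behavior
        "-to", str(end),         # end time of the behavior
        "-c:v", "libx264",       # re-encode video (assumed codec choice)
        "-an",                   # drop audio, which is not needed here
        str(out_path),
    ]


def crop_all(annotations, out_dir="clips", run=subprocess.run):
    """Cut every annotated behavior; annotations are (src, start, end, label)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for index, (src, start, end, label) in enumerate(annotations):
        run(build_ffmpeg_cmd(src, start, end, label, index, out_dir), check=True)
```

Calling `crop_all([("cage_front.mp4", 12.5, 15.0, 1)])` would invoke FFmpeg once and write a clip whose name begins with the label `1`; the input file name here is hypothetical.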
Second preprocessing is then performed on each of the plurality of video segment data to expand the data and obtain the preprocessed video data.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
performing one or more operations of random cropping and flipping on the plurality of video segments to obtain the expanded preprocessed video data.
The second preprocessing expands the data and increases the amount of preprocessed video data. The preprocessed video data can be divided into a training data set and a verification data set only when there is enough of it, so this step provides the data basis for step S102.
Step S102: the pre-processed video data is divided into a training data set and a verification data set.
In this step, the division is performed according to a preset ratio, for example 4:1, with the training data set larger than the verification data set.
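A ratio-based split of this kind might look like the sketch below. The shuffling before the split and the fixed seed are assumptions added for reproducibility; the text only specifies the ratio.

```python
import random


def split_dataset(clips, ratios=(4, 1), seed=0):
    """Shuffle clips and split them into consecutive parts by integer ratio.

    The last part receives any remainder so that no clip is dropped.
    """
    items = list(clips)
    random.Random(seed).shuffle(items)  # assumed: random assignment of clips
    total = sum(ratios)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        if i == len(ratios) - 1:
            end = len(items)
        else:
            end = start + len(items) * ratio // total
        parts.append(items[start:end])
        start = end
    return parts


train_set, val_set = split_dataset(range(100), ratios=(4, 1))
```

The same helper accepts three ratios, for example `ratios=(3, 1, 1)`, when a separate test data set is wanted.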
Step S103: inputting the training data set into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model.
Optionally, the step of inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance to perform preliminary training to obtain a second SlowFast neural network recognition model includes the following steps, as shown in fig. 2:
Step S201: sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number, which specifically comprises the following steps:
The initial frame number of each video in the training data set is acquired, and a sampling interval is determined from a preset correspondence between initial frame number and sampling interval. For example: for a video of 30-60 frames the sampling interval is 1; for 60-90 frames, 2; for 90-180 frames, 3; and for more than 180 frames, 4. Sampling at this interval yields an intermediate video data sample. If the frame number of the intermediate video data sample is greater than the preset frame number, a run of the preset number of consecutive frames is randomly intercepted and taken as the first video data sample. For example, a 50-frame video has a sampling interval of 1, so the intermediate sample contains 50 frames, and 30 consecutive frames are randomly intercepted from these 50 frames as the first video data sample.
Because the initial frame numbers of the videos differ, the sampling interval is set dynamically according to each video's initial frame number: the longer the video, the larger the sampling interval, which makes it easier to capture the video's global information.
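The sampling rule of step S201 can be sketched as below. The text does not say which interval applies exactly at 60 or 90 frames, nor how videos shorter than 30 frames are treated, so the boundary handling here is an assumption: 60 maps to interval 2, 90 maps to interval 3, and short videos fall back to interval 1.

```python
import random


def sampling_interval(num_frames):
    """Map a video's initial frame count to a sampling interval."""
    if num_frames < 60:      # 30-60 frames (assumed: also anything shorter)
        return 1
    if num_frames < 90:      # 60-90 frames
        return 2
    if num_frames <= 180:    # 90-180 frames
        return 3
    return 4                 # more than 180 frames


def sample_clip(frames, target=30, rng=random):
    """Subsample frames, then randomly keep `target` consecutive frames."""
    intermediate = frames[::sampling_interval(len(frames))]
    if len(intermediate) > target:
        start = rng.randrange(len(intermediate) - target + 1)
        return intermediate[start:start + target]
    return intermediate      # assumed: shorter samples are returned unchanged


first_sample = sample_clip(list(range(50)))
```

For the 50-frame example in the text, the interval is 1 and `first_sample` holds 30 consecutive frame indices taken from a random starting position.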
Step S202: performing data enhancement preprocessing on the first video data sample. The processing comprises random cropping and horizontal flipping with a probability of 50%.
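A dependency-free sketch of this enhancement is given below. It represents a clip as a list of frames, each frame a list of pixel rows, and applies the same crop window and the same flip decision to every frame of the clip; that per-clip consistency and the crop size parameters are assumptions, since the text names only the two operations and the 50% probability.

```python
import random


def augment(frames, crop_h, crop_w, flip_prob=0.5, rng=random):
    """Randomly crop a clip and flip it horizontally with probability flip_prob."""
    height, width = len(frames[0]), len(frames[0][0])
    top = rng.randint(0, height - crop_h)    # random crop origin (inclusive bounds)
    left = rng.randint(0, width - crop_w)
    flip = rng.random() < flip_prob          # one decision for the whole clip
    out = []
    for frame in frames:
        cropped = [row[left:left + crop_w] for row in frame[top:top + crop_h]]
        if flip:
            cropped = [row[::-1] for row in cropped]  # mirror each pixel row
        out.append(cropped)
    return out
```

In a real pipeline the frames would usually be arrays or tensors, and the same two operations would be applied through the framework's own transforms.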
Step S203: sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of the target behavior.
In one example, after the first sampling yields a 30-frame first video data sample, that sample is sampled again at a sampling interval of 6, giving a second video data sample of 5 video frames, which is input to the Slow branch.
In the present application, the Slow branch is used to acquire spatial information of the video, such as color and the plants and other objects around the monkey. Although the Slow branch receives few input video frames, its feature information is complex and fine-grained, so it generates a large amount of computation, accounting for about 80% of the computation of the whole network model.
Step S204: sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval.
In one example, after the first video data sample of 30 frames is obtained by the first sampling, the first video data sample of 30 frames is sampled again with a sampling interval of 2, and the third video data sample of 15 video frames is obtained and input to the Fast branch.
In the present application, the Fast branch is used to acquire temporal information of the video, such as the monkey's motion between 2 s and 3 s. Although the Fast branch receives many input video frames, its feature information is simple and coarse-grained, so its computation is small, accounting for about 20% of the computation of the whole network.
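The two resampling steps of S203 and S204 can be illustrated with plain list slicing. The function name and the check on the interval ordering are illustrative; the intervals 6 and 2 and the 30-frame input come from the examples above.

```python
def two_pathway_inputs(clip, slow_interval=6, fast_interval=2):
    """Resample one clip into the sparse Slow input and the dense Fast input."""
    if slow_interval <= fast_interval:
        raise ValueError("the Slow interval must be larger than the Fast interval")
    slow_frames = clip[::slow_interval]   # few frames, for spatial detail
    fast_frames = clip[::fast_interval]   # many frames, for motion
    return slow_frames, fast_frames


slow_frames, fast_frames = two_pathway_inputs(list(range(30)))
```

With a 30-frame clip this gives 5 frames for the Slow branch and 15 frames for the Fast branch, matching the counts in the examples.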
Step S205: fusing the spatial information and the time information.
In this step, the first SlowFast neural network recognition model includes a lateral connection from the Fast branch to the Slow branch, which fuses the time information with the spatial information. Because the two branches receive different numbers of input video frames, the feature dimensions they generate also differ. The feature maps of the Fast branch therefore need to be rescaled during connection using a 3D convolution kernel, such as 5 × 1 × 1, and are then summed with the feature maps of the Slow branch to fuse the temporal and spatial feature information.
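The idea behind this rescaling can be shown with a deliberately simplified sketch that treats each frame's feature as a single number, so the convolution acts only along the time axis. The kernel weights, the stride of 3 and the padding of 2 are assumptions chosen so that 15 Fast time steps map onto 5 Slow time steps; a real model would use a learned 3D convolution over full feature maps.

```python
def time_strided_conv(features, kernel, stride, pad):
    """1D convolution along time with zero padding and a temporal stride."""
    padded = [0.0] * pad + list(features) + [0.0] * pad
    last_start = len(padded) - len(kernel)
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(0, last_start + 1, stride)
    ]


def fuse_fast_into_slow(slow_feats, fast_feats, kernel=(0.2,) * 5, stride=3, pad=2):
    """Rescale Fast features to the Slow time length, then sum element-wise."""
    lateral = time_strided_conv(fast_feats, kernel, stride, pad)
    if len(lateral) != len(slow_feats):
        raise ValueError("temporal lengths do not match after rescaling")
    return [s + l for s, l in zip(slow_feats, lateral)]


fused = fuse_fast_into_slow([0.0] * 5, [1.0] * 15)
```

Here the kernel of length 5 plays the role of the temporal extent of the convolution kernel, and the stride equals the ratio between the Fast and Slow frame counts.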
Step S206: calculating a training recognition result according to the fused information.
In this step, complete video information is obtained after fusion and is input into the fully connected layer of the first SlowFast neural network recognition model to extract features; the features extracted by the fully connected layer are then input into a sigmoid regression layer for calculation, giving the training recognition result.
Step S207: repeating the training process of steps S201-S206 a preset number of times to obtain a second SlowFast neural network recognition model.
Step S104: calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially processing each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior category verification label. The processing follows steps S201 to S206, which are not described again here.
Comparing the behavior category verification label with the video name label;
and calculating the proportion of verification recognition results whose behavior category verification label matches the video name label, and determining that proportion as the recognition accuracy.
In one example, the verification data set includes 10 groups of video segment data, each group includes 10 video segments, and each video segment corresponds to one video name label. The video name labels corresponding to the 10 video segments are respectively: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. The numbers 1-10 serve both as behavior category labels and as video name labels. For example, if a video segment whose video name label is 1 is input and the output is behavior category label 2, the action category is determined to be squatting according to behavior category label 2; because this differs from the input video name label, that is, from the real action, the recognition is wrong. Assuming that 5 of the 10 video segments in a group are recognized incorrectly and 5 correctly, the recognition accuracy for that group is 50%.
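The accuracy calculation in this example amounts to counting matching labels. A minimal sketch follows; the function name and the sample predictions are made up for illustration.

```python
def recognition_accuracy(predicted_labels, video_name_labels):
    """Fraction of clips whose predicted label equals the video name label."""
    if len(predicted_labels) != len(video_name_labels):
        raise ValueError("one prediction is needed per clip")
    correct = sum(p == t for p, t in zip(predicted_labels, video_name_labels))
    return correct / len(video_name_labels)


# One group of 10 clips labelled 1-10; the first five predictions are wrong.
truth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [2, 3, 4, 5, 6, 6, 7, 8, 9, 10]
group_accuracy = recognition_accuracy(predicted, truth)
```

With 5 of the 10 clips recognized correctly, `group_accuracy` is 0.5, that is, 50%.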
Step S105: adjusting parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and performing iterative training to obtain a third SlowFast neural network recognition model.
In this step, a number of iterative training rounds is preset, for example 1000, and the second SlowFast neural network recognition model is trained for that number of rounds. In each round, model parameters such as learning_rate and weight_decay are adjusted according to the output recognition accuracy. After all rounds are completed, the model parameters that achieved the highest recognition accuracy are configured as the parameters of the third SlowFast neural network recognition model.
Step S106: recognizing target behaviors in a real environment by using the third SlowFast neural network recognition model. The third SlowFast neural network recognition model is deployed to a server in the real environment to recognize monkey behaviors.
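The selection of the best-performing parameters can be sketched as a simple loop. The text does not specify how learning_rate and weight_decay are adjusted between rounds, so this example assumes a fixed list of candidate settings, and `train_and_validate` is a hypothetical callback that trains with one setting and returns the recognition accuracy on the verification data set.

```python
def select_best_params(candidates, train_and_validate):
    """Try each candidate setting and keep the one with the highest accuracy."""
    best_params, best_accuracy = None, float("-inf")
    for params in candidates:
        accuracy = train_and_validate(params)   # hypothetical training callback
        if accuracy > best_accuracy:
            best_params, best_accuracy = params, accuracy
    return best_params, best_accuracy


# Illustrative candidate grid; the values are not taken from the patent.
candidates = [
    {"learning_rate": lr, "weight_decay": wd}
    for lr in (0.1, 0.01, 0.001)
    for wd in (1e-4, 1e-5)
]
```

A caller would pass its own training routine as `train_and_validate` and use the returned settings to configure the final model.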
In another embodiment of the present application, the preprocessed video data may additionally be divided to include a test data set, for example at a ratio of 3:1:1, where the training data set accounts for 60%, the verification data set for 20%, and the test data set for 20%. The test data set is used to test the performance of the third SlowFast neural network recognition model, and the best recognition accuracy achieved by the third SlowFast neural network recognition model on the test data set is taken as its recognition accuracy.
In a second aspect, based on the same inventive concept, the invention provides a SlowFast-based behavior recognition system, which is mainly used for recognizing monkey behaviors. As shown in fig. 3, the system comprises:
a preprocessing unit 301, configured to preprocess the original video data of the target behavior to obtain preprocessed video data;
a dividing unit 302, configured to divide the preprocessed video data into a training data set and a verification data set;
a first training unit 303, configured to input the training data set into a first SlowFast neural network recognition model that is constructed in advance for preliminary training, so as to obtain a second SlowFast neural network recognition model;
a calculating unit 304, configured to calculate, according to the verification data set, the recognition accuracy of the second SlowFast neural network recognition model;
a second training unit 305, configured to adjust parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and perform iterative training to obtain a third SlowFast neural network recognition model;
a recognition unit 306, configured to recognize a target behavior in a real environment by using the third SlowFast neural network recognition model.
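As one illustration of the units listed above, the computation performed by the calculating unit 304 can be sketched as follows. The sketch assumes, in line with the claims, that the recognition accuracy is the fraction of verification videos whose predicted behavior label matches the label carried in the video name. The function name and the example behavior labels are hypothetical.

```python
from typing import Sequence


def recognition_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Return the fraction of videos whose predicted label equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("the verification data set must not be empty")
    # Count the positions where the model output agrees with the video name label.
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

For example, `recognition_accuracy(["groom", "eat", "walk", "eat"], ["groom", "eat", "eat", "eat"])` yields 0.75, since three of the four predictions match.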
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the behavior recognition system based on SlowFast provided in the foregoing embodiment is only illustrated by the division of the foregoing functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein
the memory stores instructions executable by the processor to perform the SlowFast-based behavior recognition method according to any implementation of the first aspect.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by a computer to implement the SlowFast-based behavior recognition method according to any implementation of the first aspect.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Referring now to FIG. 4, therein is shown a block diagram of a computer system of a server that may be used to implement embodiments of the method, system, and apparatus of the present application. The server shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A behavior recognition method based on SlowFast, characterized by comprising the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network recognition model.
2. The method of claim 1, wherein preprocessing the raw video data of the target behavior to obtain preprocessed video data comprises:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
3. The method of claim 2, wherein performing the first preprocessing on the original video data of the target behavior to obtain a plurality of video segment data comprises:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the starting and stopping moments to obtain video fragment data;
and labeling the video name label of each video clip according to the behavior category label.
4. The method of claim 2, wherein the second pre-processing the video segment data to expand the data respectively comprises:
and performing one or more operations of random cropping and flipping on the plurality of video segments to obtain the expanded preprocessed video data.
5. The method according to claim 1, wherein the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model comprises:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into a Slow branch in the first SlowFast neural network recognition model to obtain spatial information of a target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in the first SlowFast neural network recognition model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
6. The method of claim 5, wherein sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number comprises:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data sample is greater than the preset frame number, randomly intercepting the video data sample with the preset frame number to determine the video data sample as a first video data sample.
7. The method according to claim 3, wherein said calculating the recognition accuracy of the second SlowFast neural network recognition model from the validation data set comprises:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior class verification label;
comparing the behavior class verification label with a video name label;
and calculating the proportion of verification recognition results in which the behavior class verification label is consistent with the video name label, and determining the proportion as the recognition accuracy.
8. A SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, the recognition accuracy of the second SlowFast neural network recognition model;
a second training unit, configured to adjust parameters of the second SlowFast neural network recognition model according to the recognition accuracy and perform iterative training to obtain a third SlowFast neural network recognition model;
and a recognition unit, configured to recognize target behaviors in a real environment by using the third SlowFast neural network recognition model.
9. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein
the memory stores instructions executable by the processor to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
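
The frame sampling recited in claims 5 and 6 can be illustrated with the following sketch. It is one possible reading under stated assumptions, not the claimed implementation: the proportional rule (`frames_per_step`), the preset frame number of 64, the slow and fast intervals of 16 and 2, and the choice to take the random cut as a run of consecutive frames are all illustrative values and interpretations not given in the text.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_clip(
    frames: Sequence[Frame],
    target_len: int = 64,
    frames_per_step: int = 128,
    seed: Optional[int] = None,
) -> List[Frame]:
    """Sample a clip at an interval proportional to its length, then cut it to target_len."""
    # Assumption: the interval grows by one for every frames_per_step initial frames.
    interval = max(1, len(frames) // frames_per_step)
    intermediate = list(frames[::interval])
    if len(intermediate) > target_len:
        # Assumption: the random cut takes target_len consecutive sampled frames.
        start = random.Random(seed).randint(0, len(intermediate) - target_len)
        intermediate = intermediate[start:start + target_len]
    return intermediate


def slow_fast_inputs(
    clip: Sequence[Frame],
    slow_interval: int = 16,
    fast_interval: int = 2,
) -> Tuple[List[Frame], List[Frame]]:
    """Subsample one clip twice: sparsely for the Slow branch, densely for the Fast branch."""
    if slow_interval <= fast_interval:
        raise ValueError("the Slow interval must be greater than the Fast interval")
    return list(clip[::slow_interval]), list(clip[::fast_interval])
```

With these illustrative values, a 300-frame video is first sampled at an interval of 2 and cut to 64 frames; the Slow branch then receives 4 frames and the Fast branch 32. Clips shorter than the preset frame number are returned unchanged here, since the claims do not state how they are handled.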
CN202110455595.XA 2021-04-26 2021-04-26 SlowFast-based behavior recognition method, system and equipment Active CN113723169B (en)

Publications (2)

Publication Number Publication Date
CN113723169A (en) 2021-11-30
CN113723169B CN113723169B (en) 2024-04-30

Family

ID=78672693

Also Published As

Publication number Publication date
CN113723169B (en) 2024-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant