CN113723169A - Behavior identification method, system and equipment based on SlowFast - Google Patents
Behavior identification method, system and equipment based on SlowFast
- Publication number
- CN113723169A (application CN202110455595.XA)
- Authority
- CN
- China
- Prior art keywords
- slowfast
- video data
- neural network
- training
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of behavior recognition and specifically relates to a behavior recognition method, system and equipment based on SlowFast, aiming to solve the problems of low recognition efficiency and low recognition accuracy. The method comprises the following steps: preprocessing original video data of the target behaviors to obtain preprocessed video data; dividing the preprocessed video data into a training data set and a verification data set; inputting the training data set into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; calculating the recognition accuracy of the second SlowFast neural network recognition model on the verification data set; adjusting parameters of the second SlowFast neural network recognition model according to the recognition accuracy and performing iterative training to obtain a third SlowFast neural network recognition model; and recognizing target behaviors in a real environment with the third SlowFast neural network recognition model. The invention greatly improves recognition efficiency, saves labor and time, and improves recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method, a behavior recognition system and behavior recognition equipment based on SlowFast.
Background
In many medical experiments, humans cannot be used directly as experimental subjects for safety and ethical reasons. Instead, captive-bred animals can serve as substitutes, and experimental results are obtained by observing and recording the animals' behavioral and physiological changes. Because humans and monkeys are closely related, both being primates, observing changes in monkey behavior has direct biological and medical significance.
At present, monkey behavior is generally observed through on-site manual observation and video monitoring, but these existing approaches have the following problems:
1. On-site manual observation is time-consuming and labor-intensive, and the monkeys' behavior is easily disturbed by the presence of operators, which affects experimental results and lowers detection accuracy.
2. With video surveillance, behavior recording still requires extensive manual involvement and is therefore not an optimal solution.
Disclosure of Invention
In order to solve the problems of low efficiency and low detection accuracy in the prior art, the invention provides a behavior identification method, system and equipment based on SlowFast.
in a first aspect of the present invention, a behavior identification method based on SlowFast is provided, where the method includes:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
Optionally, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the starting and stopping moments to obtain video fragment data;
and labeling the video name label of each video clip according to the behavior category label.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
and performing one or both of random cropping and flipping operations on the plurality of video segments to obtain the expanded preprocessed video data.
Optionally, the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model includes:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling a first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of a target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
Optionally, the sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number includes:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the frame number of the intermediate video data sample is greater than the preset frame number, randomly intercepting the video data sample with the preset frame number to determine the video data sample as a first video data sample.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior class verification label;
comparing the behavior class verification label with a video name label;
and calculating the ratio of the verification identification results of the behavior type verification label and the video name label, and determining the ratio as the identification precision.
In a second aspect, the invention provides a SlowFast-based behavior recognition system, comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
the second training unit is used for adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision and performing iterative training to obtain a third SlowFast neural network recognition model;
and the identification unit is used for identifying the target behaviors in the real environment by utilizing the third SlowFast neural network identification model.
In a third aspect of the present invention, an apparatus is provided, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor to perform a method for SlowFast-based behavior recognition according to any one of the first aspect.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by the computer to implement the method for SlowFast-based behavior recognition according to the first aspect.
The invention has the following beneficial effects: by establishing a neural network recognition model based on the SlowFast algorithm, target behaviors are recognized automatically, greatly improving recognition efficiency. The preprocessed video data is divided into a training data set and a verification data set; the training data set is input into a pre-constructed first SlowFast neural network recognition model for preliminary training to obtain a second SlowFast neural network recognition model; the recognition accuracy of the second SlowFast neural network recognition model is calculated on the verification data set; parameters of the second SlowFast neural network recognition model are adjusted according to the recognition accuracy and iterative training is performed to obtain a third SlowFast neural network recognition model; and the third SlowFast neural network recognition model is used to recognize target behaviors in a real environment, greatly improving the detection accuracy of the SlowFast neural network recognition model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a behavior recognition method based on SlowFast according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a behavior recognition method based on SlowFast according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a SlowFast-based behavior recognition system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention provides a behavior recognition method based on SlowFast, which is mainly applied to recognition of monkey behaviors, and comprises the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
In order to more clearly explain the behavior recognition method based on SlowFast of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The behavior recognition method based on SlowFast according to the first embodiment of the invention comprises the following steps S101-S106, and the steps are described in detail as follows:
step S101: and preprocessing the original video data of the target behaviors to obtain preprocessed video data.
In the practice of the present application, the target behaviors are primarily monkey behaviors. Monkeys are simian primates of the class Mammalia; they are omnivores whose diet consists mainly of fruit, supplemented by meat.
In one example, the observation target may be a rhesus monkey or a cynomolgus monkey, and may also be primarily a rhesus monkey, with a small portion being a cynomolgus monkey.
In this step, before preprocessing, the original video data of the target behaviors is first acquired. In one example, front-view and top-view video data of a monkey are obtained. Specifically, two fixtures can be designed and fabricated, a camera placed in each, and the fixtures mounted on the front and the top of the cage housing the monkey to be filmed. All monkey behaviors are filmed continuously and without human interference, and the footage covers monkeys of different sexes and ages as far as possible.
Optionally, the preprocessing the original video data of the target behavior to obtain preprocessed video data includes:
and carrying out first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment comprises one target behavior.
Specifically, the obtaining of the plurality of pieces of video segment data by performing the first preprocessing on the original video data of the target behavior includes:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
in the embodiment of the application, the original video data of the target behaviors are cleaned firstly, the original video data of the target behaviors with higher definition are selected, then the original video data of the target behaviors with higher definition are watched according to the predetermined action categories, and the starting time and the ending time of each action category are recorded; and labeling the behavior category labels.
The action categories are determined in advance, all actions of a target, such as a monkey, are defined in a classified manner by a worker in advance, the actions of the monkey are required to be completely visible, the occurrence frequency is high, the definition can be made clear, the classification is more detailed, and the types of recognition of the SlowFast neural network recognition model are more. In one example, the action categories of monkeys can be divided into 10 categories, 1, lying down, respectively; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. and (4) climbing. Wherein, 1-10 are behavior category labels.
The original video data of the target behaviors is then cut at the start and stop times to obtain the video segment data. In one example, Python code may be written around the FFmpeg command line to batch-cut the data at the recorded start and stop times.
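As a rough illustration of this batch-cutting step, the sketch below builds FFmpeg commands from (source, start, end, label) annotations. The exact flags, file-naming scheme, and helper names are assumptions, not disclosed by the patent.

```python
import subprocess

def build_clip_command(src, start, end, dst):
    """Build an ffmpeg command cutting [start, end] out of src.

    Times are strings like "00:01:23". Flag choice (stream copy, no
    re-encoding) is an assumption for speed; the patent does not
    specify how FFmpeg is invoked.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", start,     # behavior start time
        "-to", end,       # behavior end time
        "-c", "copy",     # stream copy: fast, no re-encoding
        dst,
    ]

def batch_clip(annotations, run=subprocess.run):
    """annotations: iterable of (src, start, end, behavior_label)."""
    for i, (src, start, end, label) in enumerate(annotations):
        # The video name label encodes the behavior category label.
        dst = f"{label}_{i:04d}.mp4"
        run(build_clip_command(src, start, end, dst), check=True)
```

In practice one would iterate `batch_clip` over the annotation table produced during labeling.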
Each video clip is then given a video name label according to its behavior category label. For example, if behavior category label "1" represents lying down, the video name label may be set to "1", so that the behavior category label, and thus the action category, can be recovered from the video name label.
And respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
Optionally, the performing a second pre-processing on the plurality of pieces of video segment data to expand the data, and obtaining pre-processed video data includes:
and performing one or more operations of random cutting and turning on the plurality of video segments to obtain the expanded preprocessed video data.
The second preprocessing expands the data and increases the amount of preprocessed video data; only when the preprocessed video data is sufficient can it be divided into a training data set and a verification data set, so this step provides the data basis for step S102.
Step S102: the pre-processed video data is divided into a training data set and a verification data set.
In this step, the division is performed according to a preset proportion, for example 4:1, with the training data set larger than the verification data set.
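A minimal sketch of this 4:1 split; the shuffling and the fixed seed are my assumptions, since the patent specifies only the ratio.

```python
import random

def split_dataset(clips, train_ratio=0.8, seed=0):
    """Shuffle the preprocessed clips and split them into a training
    set and a verification set at the given ratio (4:1 by default)."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)  # deterministic shuffle for reproducibility
    cut = int(len(clips) * train_ratio)
    return clips[:cut], clips[cut:]
```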
Step S103: and inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model.
Optionally, the step of inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance to perform preliminary training to obtain a second SlowFast neural network recognition model includes the following steps, as shown in fig. 2:
step S201: sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number, and specifically comprising the following steps:
acquiring the initial frame number of each video in the training data set, and determining the sampling interval from a preset correspondence between initial frame number and sampling interval. For example: 30-60 frames, sampling interval 1; 60-90 frames, sampling interval 2; 90-180 frames, sampling interval 3; more than 180 frames, sampling interval 4. The video is then sampled at that interval to obtain an intermediate video data sample. If the frame number of the intermediate sample exceeds the preset frame number, a segment of the preset frame number is randomly cut out as the first video data sample. For example, a 50-frame video with sampling interval 1 yields 50 sampled frames, and 30 consecutive frames among them are randomly cut out as the first video data sample.
Because the initial frame numbers of the videos are inconsistent, the sampling interval is set dynamically according to each video's initial frame number; the longer the video, the larger the sampling interval, which is more conducive to capturing the video's global information.
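The dynamic-interval rule and the random interception of step S201 can be sketched as follows; boundary handling at exactly 60/90/180 frames is an assumption, since the description gives only the ranges.

```python
import random

def sampling_interval(n_frames):
    """Map a clip's initial frame count to a sampling interval,
    following the ranges in the description (boundary handling at
    60/90/180 is assumed)."""
    if n_frames <= 60:
        return 1
    if n_frames <= 90:
        return 2
    if n_frames <= 180:
        return 3
    return 4

def first_sample(frames, preset=30, rng=random):
    """Subsample at the dynamic interval, then randomly cut a
    contiguous window of `preset` frames if still too long."""
    step = sampling_interval(len(frames))
    sampled = frames[::step]
    if len(sampled) > preset:
        start = rng.randrange(len(sampled) - preset + 1)
        sampled = sampled[start:start + preset]
    return sampled
```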
Step S202: performing data enhancement preprocessing on the first video data sample. The processing comprises random cropping and horizontal flipping with 50% probability.
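A hedged sketch of this augmentation, assuming the clip is a NumPy array of shape (T, H, W, C) and a 224x224 crop size; the patent names only the two operations, not the crop size or array layout.

```python
import numpy as np

def augment(clip, crop_hw=(224, 224), rng=None):
    """Random spatial crop plus horizontal flip with 50% probability,
    applied identically to every frame of the clip.

    clip: array of shape (T, H, W, C)."""
    rng = rng or np.random.default_rng()
    t, h, w, c = clip.shape
    ch, cw = crop_hw
    y = rng.integers(0, h - ch + 1)     # random crop origin
    x = rng.integers(0, w - cw + 1)
    out = clip[:, y:y + ch, x:x + cw, :]
    if rng.random() < 0.5:              # horizontal flip at 50% probability
        out = out[:, :, ::-1, :]
    return out
```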
Step S203: sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of the target behavior.
In one example, after the first sampling yields a 30-frame first video data sample, it is sampled again at an interval of 6, and the resulting second video data sample of 5 video frames is input to the Slow branch.
In this application, the Slow branch acquires the spatial information of the video, such as color and the plants and other objects around the monkey. Although the Slow branch receives few input frames, its feature information is complex and fine-grained, so it generates a large amount of computation, accounting for about 80% of the whole network model.
Step S204: sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval.
In one example, after the first sampling yields a 30-frame first video data sample, it is sampled again at an interval of 2, and the resulting third video data sample of 15 video frames is input to the Fast branch.
In this application, the Fast branch acquires the temporal information of the video, such as the monkey's movement from 2 s to 3 s. Although the Fast branch receives many input frames, its feature information is simple and coarse-grained, so its computation is small, accounting for about 20% of the whole network's calculation.
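The two-rate resampling feeding the Slow and Fast branches (strides 6 and 2 in the examples above) reduces to simple strided indexing:

```python
def branch_inputs(frames, slow_stride=6, fast_stride=2):
    """Resample a 30-frame clip at two rates: a sparse sample for the
    Slow pathway (spatial detail) and a dense one for the Fast pathway
    (temporal detail). With the example strides, 30 frames yield
    5 Slow frames and 15 Fast frames."""
    # The first sampling interval must exceed the second, per the claims.
    assert slow_stride > fast_stride
    return frames[::slow_stride], frames[::fast_stride]
```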
Step S205: and fusing the spatial information and the time information.
In this step, the first SlowFast neural network recognition model includes a lateral connection from the Fast branch to the Slow branch to fuse the temporal and spatial information. However, because the two branches receive different numbers of input frames, the feature dimensions they produce also differ. Therefore, at each connection the feature maps of the Fast branch must first be rescaled with a 3D convolution kernel (e.g., 5×1×1) and then summed with the feature map of the Slow branch to fuse the temporal and spatial feature information.
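A NumPy sketch of one such lateral connection, using a temporal kernel of size 5 with stride 3 and padding 2 so that the Fast branch's 15 frames map onto the Slow branch's 5. The stride and padding values are my assumptions; the patent specifies only the 5×1×1 kernel and the summation.

```python
import numpy as np

def fuse(slow_feat, fast_feat, kernel, t_stride=3, pad=2):
    """Time-strided temporal convolution on the Fast feature map,
    then summation with the Slow feature map.

    slow_feat: (C_s, T_s, H, W); fast_feat: (C_f, T_f, H, W);
    kernel: (C_s, C_f, K) -- the 5x1x1 case has K=5 and 1x1 spatial
    extent, so spatial positions are untouched."""
    c_s, t_s, h, w = slow_feat.shape
    c_f, t_f = fast_feat.shape[:2]
    k = kernel.shape[2]
    padded = np.pad(fast_feat, ((0, 0), (pad, pad), (0, 0), (0, 0)))
    t_out = (t_f + 2 * pad - k) // t_stride + 1   # 15 -> 5 with these values
    out = np.zeros((c_s, t_out, h, w))
    for t in range(t_out):
        window = padded[:, t * t_stride:t * t_stride + k]  # (C_f, K, H, W)
        out[:, t] = np.einsum('fkhw,sfk->shw', window, kernel)
    return slow_feat + out                                  # fuse by summation
```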
Step S206: and calculating to obtain a training recognition result according to the fused information.
In this step, the complete video information obtained after fusion is input into the fully connected layer of the first SlowFast neural network recognition model to extract feature values, and the features extracted by the fully connected layer are input into a sigmoid regression layer for calculation to obtain the training recognition result.
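A minimal sketch of this classification head; the global average pooling used to flatten the fused feature map into a vector is an assumption not stated in the patent.

```python
import numpy as np

def classify(fused_features, weights, bias):
    """Fully connected layer followed by a sigmoid over the behavior
    classes, as in step S206.

    fused_features: (C, T, H, W); weights: (num_classes, C)."""
    vec = fused_features.mean(axis=(1, 2, 3))   # global average pool -> (C,)
    logits = weights @ vec + bias               # fully connected layer
    scores = 1.0 / (1.0 + np.exp(-logits))      # sigmoid activation
    return int(np.argmax(scores)) + 1           # behavior labels are 1..10
```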
Step S207: and repeating the training processes S201-S206 according to preset training times to obtain a second SlowFast neural network recognition model.
Step S104: and calculating the identification precision of the second SlowFast neural network identification model according to the verification data set.
Optionally, the calculating the recognition accuracy of the second SlowFast neural network recognition model according to the verification data set includes:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior type verification label. The training process refers to steps S201 to S206, which are not described herein again.
Comparing the behavior class verification label with a video name label;
and calculating the ratio of the verification identification results of the behavior type verification label and the video name label, and determining the ratio as the identification precision.
In one example, the verification data set contains 10 groups of video segment data, each group containing 10 video segments, and each segment corresponds to one video name label. The video name labels of the 10 segments are: 1. lying down; 2. squatting; 3. walking; 4. jumping upwards; 5. jumping downwards; 6. climbing upwards; 7. climbing downwards; 8. hanging; 9. standing; 10. climbing. Here 1-10 serve both as behavior category labels and as video name labels. For example, if a video segment with video name label 1 is processed and the model outputs behavior category label 2, the action category is determined to be squatting according to label 2; since this differs from the input video label, i.e., from the real action, the recognition is wrong. If 5 of the 10 video segments in a group are recognized incorrectly and 5 correctly, that group's recognition accuracy is 50%.
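The accuracy computation itself is a simple ratio of matching labels:

```python
def recognition_accuracy(name_labels, predicted_labels):
    """Fraction of clips whose predicted behavior category label
    matches the video name label (step S104)."""
    assert len(name_labels) == len(predicted_labels)
    correct = sum(p == t for p, t in zip(predicted_labels, name_labels))
    return correct / len(name_labels)
```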
Step S105: and adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model.
In this step, the number of iterative training rounds is preset, and the second SlowFast neural network recognition model is trained for that preset number of rounds, for example 1000. At each round, model parameters such as learning_rate and weight_decay are adjusted according to the output recognition accuracy, and after all rounds are completed, the model parameters with the highest recognition accuracy are configured as the parameters of the third SlowFast neural network recognition model.
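The keep-the-best parameter search can be sketched as below; the grid values and the train_and_eval callback are placeholders, since the patent names only learning_rate and weight_decay and the idea of keeping the best-performing configuration.

```python
def tune(train_and_eval, grid):
    """Iterate over (learning_rate, weight_decay) settings, retrain,
    and keep the setting with the best verification accuracy.

    train_and_eval(lr, wd) -> verification accuracy after training."""
    best_acc, best_cfg = -1.0, None
    for lr, wd in grid:
        acc = train_and_eval(lr, wd)
        if acc > best_acc:
            best_acc, best_cfg = acc, (lr, wd)
    return best_cfg, best_acc
```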
Step S106: and identifying target behaviors in a real environment by using the third SlowFast neural network recognition model. The third SlowFast neural network recognition model is deployed to a server in the real environment to recognize the monkey behaviors.
In another embodiment of the present application, the preprocessed video data may be divided into a training data set, a verification data set, and a test data set, for example according to a ratio of 3:1:1, where the training data set accounts for 60%, the verification data set for 20%, and the test data set for 20%. The test data set is used to test the performance of the third SlowFast neural network recognition model, and the best recognition accuracy achieved by the third SlowFast neural network recognition model on the test data set is determined as its recognition accuracy.
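The 3:1:1 (60%/20%/20%) split described above can be sketched as follows; `clips` stands in for the list of preprocessed video segments, which the patent does not name, and the shuffling seed is an assumption.

```python
import random

def split_dataset(clips, seed=0):
    """Shuffle and split into 60% train, 20% verification, 20% test."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)
    n = len(clips)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return (clips[:n_train],                      # training data set
            clips[n_train:n_train + n_val],       # verification data set
            clips[n_train + n_val:])              # test data set

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```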
In a second aspect, based on the same inventive concept, the invention provides a SlowFast-based behavior recognition system, mainly used for recognition of monkey behaviors. As shown in fig. 3, the system comprises:
the preprocessing unit 301 is configured to preprocess the original video data of the target behavior to obtain preprocessed video data;
a dividing unit 302, configured to divide the preprocessed video data into a training data set and a verification data set;
a first training unit 303, configured to input the training data set into a first SlowFast neural network recognition model that is constructed in advance for preliminary training, so as to obtain a second SlowFast neural network recognition model;
a calculating unit 304, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
a second training unit 305, configured to adjust parameters of the second SlowFast neural network recognition model according to the recognition accuracy, and perform iterative training to obtain a third SlowFast neural network recognition model;
a recognition unit 306, configured to recognize a target behavior in a real environment by using the third SlowFast neural network recognition model.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the SlowFast-based behavior recognition system provided in the foregoing embodiment is illustrated only by the division of the functional modules described above. In practical applications, these functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules of the foregoing embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing them and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor to perform a method for SlowFast-based behavior recognition according to any one of the first aspect.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the method for SlowFast-based behavior recognition according to any one of the first aspect.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Referring now to fig. 4, a block diagram of a computer system of a server that may be used to implement embodiments of the method, system, and apparatus of the present application is shown. The server shown in fig. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (10)
1. A behavior recognition method based on SlowFast, characterized by comprising the following steps:
preprocessing original video data of the target behaviors to obtain preprocessed video data;
dividing the pre-processed video data into a training data set and a verification data set;
inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
calculating the identification precision of the second SlowFast neural network identification model according to the verification data set;
adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision, and performing iterative training to obtain a third SlowFast neural network recognition model;
and identifying target behaviors in a real environment by using the third SlowFast neural network identification model.
2. The method of claim 1, wherein preprocessing the raw video data of the target behavior to obtain preprocessed video data comprises:
performing first preprocessing on the original video data of the target behaviors to obtain a plurality of video segment data, wherein each video segment data comprises one target behavior;
and respectively carrying out second preprocessing on the plurality of video segment data to expand the data to obtain preprocessed video data.
3. The method of claim 2, wherein the first pre-processing the raw video data of the target behavior to obtain a plurality of video segment data comprises:
acquiring the start-stop moment and the behavior category label of each target behavior in the original video data of the target behaviors;
cutting the original video data of the target behaviors according to the start-stop moments to obtain a plurality of video segment data;
and labeling each video segment with a video name label according to the behavior category label.
4. The method of claim 2, wherein the second pre-processing the video segment data to expand the data respectively comprises:
and performing one or more operations of random cropping and flipping on the plurality of video segments to obtain the expanded preprocessed video data.
5. The method according to claim 1, wherein the inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training, and obtaining a second SlowFast neural network recognition model comprises:
sampling the training data set according to a preset sampling rule to obtain a first video data sample with a preset frame number;
performing data enhancement preprocessing on the first video data sample;
sampling the first video data sample subjected to data enhancement preprocessing according to a first sampling interval to obtain a second video data sample, and inputting the second video data sample into the Slow branch of the first SlowFast neural network recognition model to obtain spatial information of the target behavior;
sampling the first video data sample subjected to data enhancement preprocessing according to a second sampling interval to obtain a third video data sample, and inputting the third video data sample into a Fast branch in a first SlowFast neural network identification model to obtain time information of a target behavior; wherein the first sampling interval is greater than the second sampling interval;
fusing the spatial information and the time information;
calculating according to the fused information to obtain a training recognition result;
and repeating the training process according to the preset training times to obtain a second SlowFast neural network recognition model.
6. The method of claim 5, wherein sampling the training data set according to a predetermined sampling rule to obtain a predetermined number of first video data samples comprises:
acquiring an initial frame number of each video data in a training data set;
determining a sampling interval according to a preset proportional corresponding relation between the initial frame number and the sampling interval;
sampling according to the sampling interval to obtain an intermediate video data sample;
and if the number of frames of the intermediate video data sample is greater than the preset number of frames, randomly intercepting a video data sample with the preset number of frames and determining it as the first video data sample.
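The sampling of claims 5 and 6 can be illustrated with frame indices standing in for decoded frames. This is a hedged sketch only: the target length, the interval ratio, and the Slow/Fast branch strides below are assumed example values, not figures given in the claims (the claims require only that the first sampling interval be greater than the second).

```python
import random

def sample_clip(num_frames, target=32, ratio=8, seed=0):
    """Sample every `ratio`-th frame, then, if more than `target` frames
    remain, randomly intercept a window of `target` frames (claim 6)."""
    idx = list(range(0, num_frames, ratio))        # intermediate video data sample
    if len(idx) > target:                          # random temporal crop
        start = random.Random(seed).randrange(len(idx) - target + 1)
        idx = idx[start:start + target]
    return idx

first_sample = sample_clip(600)   # first video data sample: 32 frame indices
slow = first_sample[::8]          # Slow branch: larger interval, spatial detail
fast = first_sample[::2]          # Fast branch: smaller interval, motion detail
print(len(first_sample), len(slow), len(fast))  # 32 4 16
```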
7. The method according to claim 3, wherein the calculating the recognition accuracy of the second SlowFast neural network recognition model from the verification data set comprises:
inputting the verification data set into the second SlowFast neural network recognition model, sequentially training each video data in the verification data set, and outputting a verification recognition result of each video data, wherein the verification recognition result is a behavior class verification label;
comparing the behavior class verification label with a video name label;
and calculating the proportion of verification recognition results for which the behavior category verification label matches the video name label, and determining this proportion as the recognition accuracy.
8. A SlowFast-based behavior recognition system, the system comprising:
the preprocessing unit is used for preprocessing the original video data of the target behaviors to obtain preprocessed video data;
a dividing unit, configured to divide the preprocessed video data into a training data set and a verification data set;
the first training unit is used for inputting the training data set into a first SlowFast neural network recognition model which is constructed in advance for preliminary training to obtain a second SlowFast neural network recognition model;
a calculating unit, configured to calculate, according to the verification data set, an identification accuracy of the second SlowFast neural network identification model;
the second training unit is used for adjusting parameters of the second SlowFast neural network recognition model according to the recognition precision and performing iterative training to obtain a third SlowFast neural network recognition model;
and the identification unit is used for identifying the target behaviors in the real environment by utilizing the third SlowFast neural network identification model.
9. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to perform the method for SlowFast-based behavior recognition of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110455595.XA CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110455595.XA CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723169A true CN113723169A (en) | 2021-11-30 |
CN113723169B CN113723169B (en) | 2024-04-30 |
Family
ID=78672693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110455595.XA Active CN113723169B (en) | 2021-04-26 | 2021-04-26 | SlowFast-based behavior recognition method, system and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723169B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359791A (en) * | 2021-12-16 | 2022-04-15 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
CN115376210A (en) * | 2022-10-24 | 2022-11-22 | 杭州巨岩欣成科技有限公司 | Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool |
WO2023108782A1 (en) * | 2021-12-15 | 2023-06-22 | 深圳先进技术研究院 | Method and apparatus for training behavior recognition model, behavior recognition method, apparatus and system, and medium |
CN116363137A (en) * | 2023-06-01 | 2023-06-30 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
CN116110586B (en) * | 2023-04-13 | 2023-11-21 | 南京市红山森林动物园管理处 | Elephant health management system based on YOLOv5 and SlowFast |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647643A (en) * | 2018-05-11 | 2018-10-12 | 浙江工业大学 | A kind of packed tower liquid flooding state on-line identification method based on deep learning |
CN109145789A (en) * | 2018-08-09 | 2019-01-04 | 炜呈智能电力科技(杭州)有限公司 | Power supply system safety work support method and system |
US20190068627A1 (en) * | 2017-08-28 | 2019-02-28 | Oracle International Corporation | Cloud based security monitoring using unsupervised pattern recognition and deep learning |
CN110119703A (en) * | 2019-05-07 | 2019-08-13 | 福州大学 | The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene |
CN110555368A (en) * | 2019-06-28 | 2019-12-10 | 西安理工大学 | Fall-down behavior identification method based on three-dimensional convolutional neural network |
CN110717301A (en) * | 2019-09-19 | 2020-01-21 | 中国石油大学(华东) | Flow unit information classification and identification method based on support vector machine algorithm |
CN110852168A (en) * | 2019-10-11 | 2020-02-28 | 西北大学 | Pedestrian re-recognition model construction method and device based on neural framework search |
CN111291840A (en) * | 2020-05-12 | 2020-06-16 | 成都派沃智通科技有限公司 | Student classroom behavior recognition system, method, medium and terminal device |
CN111598230A (en) * | 2019-02-21 | 2020-08-28 | 北京创新工场旷视国际人工智能技术研究院有限公司 | Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device |
CN111814669A (en) * | 2020-07-08 | 2020-10-23 | 中国工商银行股份有限公司 | Method and device for identifying abnormal behaviors of bank outlets |
CN111814661A (en) * | 2020-07-07 | 2020-10-23 | 西安电子科技大学 | Human behavior identification method based on residual error-recurrent neural network |
CN112183313A (en) * | 2020-09-27 | 2021-01-05 | 武汉大学 | SlowFast-based power operation field action identification method |
US20210073526A1 (en) * | 2019-09-10 | 2021-03-11 | Blue Planet Training, Inc. | System and Method for Visual Analysis of Emotional Coherence in Videos |
CN112529020A (en) * | 2020-12-24 | 2021-03-19 | 携程旅游信息技术(上海)有限公司 | Animal identification method, system, equipment and storage medium based on neural network |
CN112580523A (en) * | 2020-12-22 | 2021-03-30 | 平安国际智慧城市科技股份有限公司 | Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium |
2021
- 2021-04-26 CN CN202110455595.XA patent/CN113723169B/en active Active
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190068627A1 (en) * | 2017-08-28 | 2019-02-28 | Oracle International Corporation | Cloud based security monitoring using unsupervised pattern recognition and deep learning |
CN108647643A (en) * | 2018-05-11 | 2018-10-12 | 浙江工业大学 | A kind of packed tower liquid flooding state on-line identification method based on deep learning |
CN109145789A (en) * | 2018-08-09 | 2019-01-04 | 炜呈智能电力科技(杭州)有限公司 | Power supply system safety work support method and system |
CN111598230A (en) * | 2019-02-21 | 2020-08-28 | 北京创新工场旷视国际人工智能技术研究院有限公司 | Training method and system of neural network model with anti-counterfeiting function, anti-counterfeiting verification method and electronic device |
CN110119703A (en) * | 2019-05-07 | 2019-08-13 | 福州大学 | The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene |
CN110555368A (en) * | 2019-06-28 | 2019-12-10 | 西安理工大学 | Fall-down behavior identification method based on three-dimensional convolutional neural network |
US20210073526A1 (en) * | 2019-09-10 | 2021-03-11 | Blue Planet Training, Inc. | System and Method for Visual Analysis of Emotional Coherence in Videos |
CN110717301A (en) * | 2019-09-19 | 2020-01-21 | 中国石油大学(华东) | Flow unit information classification and identification method based on support vector machine algorithm |
CN110852168A (en) * | 2019-10-11 | 2020-02-28 | 西北大学 | Pedestrian re-recognition model construction method and device based on neural framework search |
CN111291840A (en) * | 2020-05-12 | 2020-06-16 | 成都派沃智通科技有限公司 | Student classroom behavior recognition system, method, medium and terminal device |
CN111814661A (en) * | 2020-07-07 | 2020-10-23 | 西安电子科技大学 | Human behavior identification method based on residual error-recurrent neural network |
CN111814669A (en) * | 2020-07-08 | 2020-10-23 | 中国工商银行股份有限公司 | Method and device for identifying abnormal behaviors of bank outlets |
CN112183313A (en) * | 2020-09-27 | 2021-01-05 | 武汉大学 | SlowFast-based power operation field action identification method |
CN112580523A (en) * | 2020-12-22 | 2021-03-30 | 平安国际智慧城市科技股份有限公司 | Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium |
CN112529020A (en) * | 2020-12-24 | 2021-03-19 | 携程旅游信息技术(上海)有限公司 | Animal identification method, system, equipment and storage medium based on neural network |
Non-Patent Citations (3)
Title |
---|
TJENG WAWAN CENGGORO; AWANG HARSA KRIDALAKSANA; EKA ARRIYANTI; M. IRWAN UKKAS: "Recognition of a human behavior pattern in paper rock scissor game using backpropagation artificial neural network method", 2014 2nd International Conference on Information and Communication Technology (ICoICT) *
YANG Jie; CHEN Lingna; LIN Ying; CHEN Yushao; CHEN Junxi: "Video object detection based on convolutional networks", Journal of University of South China (Natural Science Edition), no. 04 *
XIE Huaiqi; LE Hongbing: "Video human behavior recognition based on channel attention mechanism", Electronic Technology & Software Engineering, no. 04 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023108782A1 (en) * | 2021-12-15 | 2023-06-22 | 深圳先进技术研究院 | Method and apparatus for training behavior recognition model, behavior recognition method, apparatus and system, and medium |
CN114359791A (en) * | 2021-12-16 | 2022-04-15 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
CN114359791B (en) * | 2021-12-16 | 2023-08-01 | 北京信智文科技有限公司 | Group macaque appetite detection method based on Yolo v5 network and SlowFast network |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
CN115376210A (en) * | 2022-10-24 | 2022-11-22 | 杭州巨岩欣成科技有限公司 | Drowning behavior identification method, device, equipment and medium for preventing drowning in swimming pool |
CN116110586B (en) * | 2023-04-13 | 2023-11-21 | 南京市红山森林动物园管理处 | Elephant health management system based on YOLOv5 and SlowFast |
CN116363137A (en) * | 2023-06-01 | 2023-06-30 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
CN116363137B (en) * | 2023-06-01 | 2023-08-04 | 合力(天津)能源科技股份有限公司 | Cleaning effect evaluation method and system for guiding automatic cleaning of oil pipe |
Also Published As
Publication number | Publication date |
---|---|
CN113723169B (en) | 2024-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113723169B (en) | SlowFast-based behavior recognition method, system and equipment | |
US10937144B2 (en) | Pipe feature identification using pipe inspection data analysis | |
CN110705405B (en) | Target labeling method and device | |
CN108171260B (en) | Picture identification method and system | |
CN113382279B (en) | Live broadcast recommendation method, device, equipment, storage medium and computer program product | |
CN111046956A (en) | Occlusion image detection method and device, electronic equipment and storage medium | |
CN113158909B (en) | Behavior recognition light-weight method, system and equipment based on multi-target tracking | |
CN110751675B (en) | Urban pet activity track monitoring method based on image recognition and related equipment | |
CN111346842A (en) | Coal gangue sorting method, device, equipment and storage medium | |
CN109285181B (en) | Method and apparatus for recognizing image | |
CN109685847B (en) | Training method and device for visual target detection model | |
CN108982522B (en) | Method and apparatus for detecting pipe defects | |
CN111598913B (en) | Image segmentation method and system based on robot vision | |
CN111950812B (en) | Method and device for automatically identifying and predicting rainfall | |
Mann et al. | Automatic flower detection and phenology monitoring using time‐lapse cameras and deep learning | |
CN114724140A (en) | Strawberry maturity detection method and device based on YOLO V3 | |
CN109088793B (en) | Method and apparatus for detecting network failure | |
Prior et al. | Estimating precision and accuracy of automated video post-processing: A step towards implementation of ai/ml for optics-based fish sampling | |
US10922569B2 (en) | Method and apparatus for detecting model reliability | |
CN115438945A (en) | Risk identification method, device, equipment and medium based on power equipment inspection | |
CN114494971A (en) | Video yellow-related detection method and device, electronic equipment and storage medium | |
CN114038040A (en) | Machine room inspection monitoring method, device and equipment | |
CN114821396A (en) | Normative detection method, device and storage medium for LNG unloading operation process | |
CN114241376A (en) | Behavior recognition model training and behavior recognition method, device, system and medium | |
CN112308090A (en) | Image classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |