CN116580343A - Small sample behavior recognition method, storage medium and controller - Google Patents

Small sample behavior recognition method, storage medium and controller

Info

Publication number
CN116580343A
Authority
CN
China
Prior art keywords
video
time sequence
feature
frame
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310859607.4A
Other languages
Chinese (zh)
Inventor
常峰
刘海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhongke Leinao Intelligent Technology Co., Ltd.
Original Assignee
Hefei Zhongke Leinao Intelligent Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Zhongke Leinao Intelligent Technology Co., Ltd.
Priority to CN202310859607.4A
Publication of CN116580343A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample behavior recognition method, a storage medium and a controller. The method comprises the following steps: acquiring a video to be identified and a first reference video, respectively carrying out feature extraction processing on the video to be identified and the first reference video to obtain corresponding frame features, and carrying out feature enhancement processing on the frame features to obtain enhanced frame features; processing the enhanced frame features according to the time sequence segment prototype to obtain a corresponding high-level time sequence segment, wherein the time sequence segment prototype is obtained by dynamically updating a randomly generated initial time sequence segment prototype; and obtaining a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, and performing behavior identification on the video to be identified according to the first distance. The method can realize high-performance small sample behavior recognition.

Description

Small sample behavior recognition method, storage medium and controller
Technical Field
The invention relates to the technical field of behavior recognition, in particular to a small sample behavior recognition method, a storage medium and a controller.
Background
With the development of deep learning technology, behavior recognition has made breakthrough progress. However, deep learning methods require a large amount of labeled data; in the real world, little data may be available for reasons such as privacy and the high cost of data acquisition, and in fields such as security and healthcare, data labeling is very difficult. Small sample learning is an algorithm that classifies samples of new classes (videos to be identified) using only a few labeled samples (reference videos); it improves the generalization of a model, is suitable for target recognition where labeled data is scarce, and can effectively reduce the dependence on human labeling.
However, compared with still images, video contains a large number of complex structures in the time dimension. Small sample behavior recognition in the related art focuses only on alignment at the frame level or the segment level, which leads to ambiguous alignment and degrades the performance of small sample behavior recognition.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent. To this end, a first object of the present invention is to propose a small sample behavior recognition method to achieve high-performance small sample behavior recognition.
A second object of the present invention is to propose a computer readable storage medium.
A third object of the present invention is to propose a controller.
To achieve the above object, an embodiment of a first aspect of the present invention provides a small sample behavior recognition method, including: acquiring a video to be identified and a first reference video, respectively carrying out feature extraction processing on the video to be identified and the first reference video to obtain corresponding frame features, and carrying out feature enhancement processing on the frame features to obtain enhanced frame features; processing the enhanced frame features according to a time sequence segment prototype to obtain a corresponding high-level time sequence segment, wherein the time sequence segment prototype is obtained by dynamically updating a randomly generated initial time sequence segment prototype; and obtaining a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, and performing behavior identification on the video to be identified according to the first distance.
To achieve the above object, an embodiment of a second aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the small sample behavior recognition method described above.
To achieve the above object, an embodiment of a third aspect of the present invention provides a controller, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program implementing the small sample behavior recognition method described above when executed by the processor.
According to the small sample behavior recognition method, the storage medium and the controller, a video to be recognized and a first reference video are acquired, feature extraction processing is performed on each to obtain corresponding frame features, and feature enhancement processing is performed on the frame features to obtain enhanced frame features; the enhanced frame features are processed according to time sequence segment prototypes to obtain corresponding high-level time sequence segments, the prototypes being obtained by dynamically updating randomly generated initial time sequence segment prototypes; and a first distance between the video to be identified and the first reference video is obtained according to the enhanced frame features and the high-level time sequence segments, with behavior recognition performed on the video to be identified according to the first distance. By introducing time sequence segment prototypes, high-level time sequence segments in the video are discovered automatically through an attention mechanism, which improves the performance of small sample behavior recognition. In addition, since the time sequence segment prototypes are obtained by dynamically updating randomly generated initial prototypes, the high-level time sequence segments can be obtained with better-adapted prototypes, further improving alignment accuracy and thus the performance of small sample behavior recognition.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow diagram of a small sample behavior recognition method in accordance with one or more embodiments of the present invention;
FIG. 2 is a schematic diagram of a small sample behavior recognition method of one example of the present invention;
FIG. 3 is a flow diagram of obtaining a first distance in a small sample behavior recognition method in accordance with one or more embodiments of the present invention.
Detailed Description
The small sample behavior recognition method, storage medium, controller of the embodiments of the present invention are described below with reference to the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described with reference to the drawings are exemplary and should not be construed as limiting the invention.
FIG. 1 is a flow diagram of a small sample behavior recognition method in accordance with one or more embodiments of the invention.
As shown in fig. 1, the small sample behavior recognition method includes:
s11, acquiring a video to be identified and a first reference video, respectively carrying out feature extraction processing on the video to be identified and the first reference video to obtain corresponding frame features, and carrying out feature enhancement processing on the frame features to obtain enhanced frame features.
And S12, processing the enhanced frame features according to the time sequence segment prototype to obtain a corresponding high-level time sequence segment.
And S13, obtaining a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, and performing behavior identification on the video to be identified according to the first distance.
Specifically, after obtaining a video to be identified and a first reference video, firstly, respectively performing feature extraction processing on the video to be identified and the first reference video to obtain corresponding frame features.
In order to improve the motion perception capability, after obtaining the frame features corresponding to the video to be identified and the first reference video, a feature enhancer is adopted to conduct feature enhancement processing on the frame features corresponding to the video to be identified and the first reference video, so that the enhanced features can perceive motion information.
After the enhanced frame features are obtained, an attention mechanism is used to implement automatic video clip mining.
A set of time sequence segment prototypes is first introduced. To generate the time sequence segment prototypes, they are first randomly initialized, realizing random generation of the initial time sequence segment prototypes, and the initial prototypes are then dynamically updated to obtain the time sequence segment prototypes. Specifically, the prototypes are dynamically updated end-to-end according to the training loss, so that better time sequence segment prototypes are learned.
After the time sequence segment prototypes are introduced, they are used to process the enhanced frame features to obtain the soft correspondence between the prototypes and the video frames, so that several semantically related frames are aggregated according to this soft correspondence into a high-level time sequence segment. Because the high-level time sequence segments are obtained by aggregating video frames with the time sequence segment prototypes, the data used for video alignment reflect the soft correspondence between prototypes and frames, which improves alignment accuracy and thus the performance of small sample behavior recognition.
After the time sequence segment prototypes are introduced, the soft correspondence between the prototypes and the frames of the video to be identified and the soft correspondence between the prototypes and the frames of the first reference video can be obtained respectively, yielding a first high-level time sequence segment corresponding to the video to be identified and a second high-level time sequence segment corresponding to the first reference video.
After the high-level time sequence segments are obtained, optimal video matching is performed according to the enhanced frame features and the high-level time sequence segments. After matching is completed, the distance between the video to be identified and the first reference video is obtained, so that a similarity score between the two videos can be determined from this distance and behavior recognition performed on the video to be identified accordingly.
In this way, the method acquires a video to be identified and a first reference video, performs feature extraction processing on each to obtain corresponding frame features, and performs feature enhancement processing on the frame features to obtain enhanced frame features; processes the enhanced frame features according to time sequence segment prototypes to obtain corresponding high-level time sequence segments, the prototypes being obtained by dynamically updating randomly generated initial prototypes; and obtains a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, performing behavior recognition on the video to be identified according to the first distance. Introducing time sequence segment prototypes to obtain high-level time sequence segments improves the performance of small sample behavior recognition. In addition, since the prototypes are obtained by dynamically updating randomly generated initial prototypes, the high-level time sequence segments can be obtained with better-adapted prototypes, further improving alignment accuracy and the performance of small sample behavior recognition.
In one or more embodiments of the present invention, before the feature extraction processing is performed on the video to be identified and the first reference video, the small sample behavior identification method further includes: randomly extracting T frames from the video to be identified and the first reference video respectively, where T is a positive integer. That is, assuming that an acquired video is O, T frames are first randomly sampled from it, and the feature extraction processing is then performed on the T frames to obtain the corresponding frame features. A minimal sketch of this step follows.
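For illustration only, the following Python sketch shows one way the random T-frame sampling and per-frame feature extraction could be implemented; the backbone, dimensions and function names are assumptions made for this example, not details taken from the patent.

```python
import torch
import torch.nn as nn

def sample_frames(video: torch.Tensor, T: int = 8) -> torch.Tensor:
    """video: (num_frames, 3, H, W); returns T randomly chosen frames kept in
    temporal order. Assumes the video contains at least T frames."""
    idx, _ = torch.sort(torch.randperm(video.shape[0])[:T])
    return video[idx]

class FrameFeatureExtractor(nn.Module):
    """Stand-in per-frame backbone; any 2D CNN producing one vector per frame works."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool away the spatial dimensions
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, 3, H, W) -> frame features f_1..f_T: (T, dim)
        return self.net(frames).flatten(1)
```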
in one or more embodiments of the present invention, note that a frame feature extracted from a video to be identified is a first frame feature, and note that a frame feature extracted from a first reference video is a second frame feature, the small sample behavior identification method further includes: acquiring a plurality of third frame features corresponding to a plurality of preset second reference videos, calculating a difference value between convolution of a next frame feature and a previous frame feature according to the frame features of each reference video to obtain a difference value set corresponding to each reference video, and obtaining a task level motion mode according to the difference value set; and obtaining channel enhancement parameters according to the task level motion mode.
Specifically, referring to the example shown in Fig. 2, a video feature calculation module is provided. The module includes two identical feature extractors, a feature enhancer, a motion encoder, and a parameter generator. The feature extractor extracts the second frame features of the first reference video, and the motion encoder acquires the second frame features together with a plurality of third frame features, where the third frame features correspond to a plurality of preset second reference videos, the second reference videos being the support videos other than the first reference video.
After the second frame features corresponding to the first reference video and the third frame features corresponding to the second reference videos are acquired, for each reference video, the difference between the convolution of the subsequent frame feature and the previous frame feature is calculated to obtain a differential feature, specifically by the following formula:

$$d_t = h(f_{t+1}) - f_t$$

where $d_t$ is the calculated difference feature, $f_t$ and $f_{t+1}$ are the frame features of adjacent frames, and $h(\cdot)$ is a convolution layer used for spatial smoothing.
Further, the motion feature of the reference video can be obtained by the following formula:

$$m = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathrm{GAP}\big(g(d_t)\big)$$

where $m$ is the motion feature of the reference video, $g(\cdot)$ is a smoothing convolution layer used for spatial smoothing, $\mathrm{GAP}(\cdot)$ is a global average pooling operation that aggregates spatial information, and $T$ is the number of frames extracted from each video. That is, the specific example shown in Fig. 2 adopts the above-described method of randomly extracting T frames from the video to be identified and from the first reference video, although practical applications are not limited to this.
After the motion feature is calculated for each reference video, the task-level motion pattern is calculated according to the following formula:

$$\bar{m} = \frac{1}{NK} \sum_{i=1}^{NK} m_i$$

where $\bar{m}$ is the task-level motion pattern, $NK$ is the total number of first and second reference videos, and $m_i$ is the motion feature of the i-th reference video.
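The motion encoder described above can be sketched as follows. This is a minimal illustration assuming vector-valued frame features (so the global average pooling of the description reduces to a temporal mean here), with 1-D convolutions standing in for the smoothing layers h(.) and g(.).

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Difference features d_t = h(f_{t+1}) - f_t, smoothed and averaged over time."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.h = nn.Conv1d(dim, dim, kernel_size=1)  # stands in for smoothing conv h(.)
        self.g = nn.Conv1d(dim, dim, kernel_size=1)  # stands in for smoothing conv g(.)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (T, dim) for one reference video -> motion feature m: (dim,)
        f = frame_feats.t().unsqueeze(0)          # (1, dim, T)
        d = self.h(f[..., 1:]) - f[..., :-1]      # d_t = h(f_{t+1}) - f_t, t = 1..T-1
        return self.g(d).mean(dim=-1).squeeze(0)  # average over the T-1 steps

def task_level_motion_pattern(motion_feats: list) -> torch.Tensor:
    # Mean over the NK reference videos' motion features: m_bar = (1/NK) * sum_i m_i
    return torch.stack(motion_feats).mean(dim=0)
```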
After the task-level motion pattern is obtained, the parameter generator generates the channel enhancement parameters according to the task-level motion pattern.
In one or more embodiments of the present invention, the channel enhancement parameters include a first channel enhancement parameter and a second channel enhancement parameter, and the feature enhancer performs the feature enhancement process using the following formula:

$$\tilde{f}^q = \gamma \odot f^q + \beta, \qquad \tilde{f}^s = \gamma \odot f^s + \beta$$

where $\gamma$ is the first channel enhancement parameter, $\beta$ is the second channel enhancement parameter, $f^q$ is the first frame feature, $\tilde{f}^q$ is the corresponding first enhanced frame feature, $f^s$ is the second frame feature, $\tilde{f}^s$ is the corresponding second enhanced frame feature, and $\odot$ denotes channel-wise multiplication.
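A minimal sketch of the parameter generator and feature enhancer follows, assuming the channel-wise affine form reconstructed above; the layer shapes and names are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

class ParameterGenerator(nn.Module):
    """Maps the task-level motion pattern to the two channel enhancement parameters."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.to_gamma = nn.Linear(dim, dim)  # first channel enhancement parameter
        self.to_beta = nn.Linear(dim, dim)   # second channel enhancement parameter

    def forward(self, motion_pattern: torch.Tensor):
        return self.to_gamma(motion_pattern), self.to_beta(motion_pattern)

def enhance(frame_feats: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    # Channel-wise affine modulation (assumed form): f_tilde = gamma * f + beta.
    # frame_feats: (T, dim); gamma, beta: (dim,) broadcast over the T frames.
    return gamma * frame_feats + beta
```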
In one or more embodiments of the invention, processing enhancement frame features according to a temporal segment prototype includes: obtaining a first feature and a second feature according to the first enhancement frame feature, calculating to obtain a first attention score between the first feature and the time sequence segment prototype, and obtaining a first high-level time sequence segment according to the first attention score and the second feature; and obtaining a third feature and a fourth feature according to the second enhanced frame feature, calculating to obtain a second attention score between the third feature and the time sequence segment prototype, and obtaining a second high-level time sequence segment according to the second attention score and the fourth feature.
Specifically, after the enhanced frame features are obtained, a set of time sequence segment prototypes is introduced, and the features of the time sequence segment prototypes are denoted Q.
Continuing with the example shown in Fig. 2, the first enhanced frame feature corresponding to the video to be identified is passed through a linear mapping layer to obtain a corresponding first feature K1 and second feature V1, and the second enhanced frame feature corresponding to the first reference video is passed through a linear mapping layer to obtain a corresponding third feature K2 and fourth feature V2. A first high-level time sequence segment is then obtained from the first feature, the second feature and the time sequence segment prototypes, and a second high-level time sequence segment from the third feature, the fourth feature and the time sequence segment prototypes.
For the first enhanced frame feature, the attention score between the first feature and the time sequence segment prototype features is first calculated according to:

$$a_{jt} = \frac{\exp\big(q_j k_t^{\top} / \sqrt{d}\big)}{\sum_{t'=1}^{T} \exp\big(q_j k_{t'}^{\top} / \sqrt{d}\big)}$$

where $a_{jt}$ is the attention score, $\sqrt{d}$ is a scale factor, $k_t$ is the mapped feature of the t-th frame, $k_{t'}$ ranges over the video frames other than t with $a_{jt'}$ their corresponding attention scores, and $q_j$ is the feature of the j-th time sequence segment prototype.
The attention scores constitute the soft correspondence between the time sequence segment prototypes and the video to be identified, and a first high-level time sequence segment can be obtained by aggregating several semantically related frames as follows:

$$p_j = \sum_{t=1}^{T} a_{jt} \, v_t$$

where $p_j$ is the j-th high-level time sequence segment and $v_t$ is the corresponding t-th frame feature.
Similarly, for the second enhanced frame feature, the second high-level time sequence segment is calculated in the same way as the first high-level time sequence segment described above.
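The prototype-based segment mining can be sketched as a small cross-attention module; the prototype count, dimensions and single-head attention below are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SegmentMiner(nn.Module):
    """Learnable prototypes cross-attend over enhanced frame features; the softmax
    scores are the soft frame-to-prototype correspondence a_{jt}."""
    def __init__(self, dim: int = 512, num_prototypes: int = 4):
        super().__init__()
        # Randomly initialised prototypes Q, updated end-to-end by the training loss.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.to_k = nn.Linear(dim, dim)  # produces K1 / K2
        self.to_v = nn.Linear(dim, dim)  # produces V1 / V2

    def forward(self, enhanced: torch.Tensor) -> torch.Tensor:
        # enhanced: (T, dim) -> high-level time sequence segments: (num_prototypes, dim)
        k, v = self.to_k(enhanced), self.to_v(enhanced)
        scores = self.prototypes @ k.t() / k.shape[-1] ** 0.5  # (J, T) scaled dot products
        attn = scores.softmax(dim=-1)  # a_{jt}: softmax over the T frames
        return attn @ v                # p_j = sum_t a_{jt} v_t
```

In this sketch, the same module with shared prototypes would be applied to the first and second enhanced frame features, yielding the first and second high-level time sequence segments respectively.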
In one or more embodiments of the present invention, referring to fig. 3, obtaining a first distance between a video to be identified and a first reference video according to an enhanced frame feature and a high-level timing slice includes:
s31, obtaining a first time sequence slice according to the first enhanced frame characteristic and the first high-level time sequence slice, and obtaining a second time sequence slice according to the second enhanced frame characteristic and the second high-level time sequence slice.
Specifically, the corresponding time sequence slice is obtained by combining the enhanced frame features with the high-level time sequence segments, e.g. by concatenation:

$$l = \big[\tilde{f};\, p\big]$$

where $l$ is the time sequence slice.
S32, acquiring a first discrete distribution of the first time sequence slice, a second discrete distribution of the second time sequence slice and a second distance between the first time sequence slice and the second time sequence slice.
Specifically, after the first time sequence slice and the second time sequence slice are obtained, the discrete distribution s of the first time sequence slice and the discrete distribution d of the second time sequence slice are acquired, together with the element-wise second distance $D_{ij}$ between the two slices.
S33, obtaining a first distance according to the first discrete distribution, the second discrete distribution and the second distance.
In one or more embodiments of the invention, deriving the first distance from the first discrete distribution, the second discrete distribution, and the second distance includes: selecting a target video matching matrix between the video to be identified and the first reference video from a plurality of preset video matching matrices according to the first discrete distribution, the second discrete distribution and the second distance, wherein the target video matching matrix is the matrix that minimizes the distance calculated from the matching matrix and the second distance; and obtaining a first distance between the video to be identified and the first reference video according to the target video matching matrix.
Continuing with the example shown in Fig. 2, a plurality of selectable video matching matrices F are acquired in advance and $\sum_{i,j} F_{ij} D_{ij}$ is calculated for each, so that the target video matching matrix is determined from the plurality of selectable matrices:

$$\hat{F} = \underset{F \in \Pi(s,d)}{\arg\min} \sum_{i,j} F_{ij} D_{ij}, \qquad \Pi(s,d) = \big\{ F \ge 0 \,\big|\, F\mathbf{1} = s,\; F^{\top}\mathbf{1} = d \big\}$$

where $\hat{F}$ is the video matching matrix corresponding to the smallest calculation result, $s_i$ is an element of the discrete distribution s, and $d_j$ is an element of the discrete distribution d.
Thereby, optimal matching can be achieved.
After the actually used video matching matrix $\hat{F}$ is determined, the matching scores in $\hat{F}$ can be used to measure how much each time sequence segment similarity contributes to the global similarity. The cosine similarity between the first time sequence slice and the second time sequence slice, $\langle l^q, l^s \rangle$, is also calculated, where $\langle \cdot , \cdot \rangle$ denotes cosine similarity, $l^q$ is the first time sequence slice, and $l^s$ is the second time sequence slice.
Further, the first distance between the video to be identified and the first reference video is calculated by the following formula:

$$d(X_q, X_s) = \sum_{i,j} \hat{F}_{ij} \big( 1 - \langle l^q_i, l^s_j \rangle \big)$$

where $d(X_q, X_s)$ is the first distance, $X_q$ is the video to be identified, and $X_s$ is the first reference video.
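A hedged sketch of the slice construction and the optimal-matching distance follows. The patent only requires the matching matrix that minimizes the transport cost over the feasible set; the Sinkhorn iteration below is one standard solver for that problem, and it, together with the uniform marginals, is an assumption of this example.

```python
import torch
import torch.nn.functional as F

def time_sequence_slice(enhanced: torch.Tensor, segments: torch.Tensor) -> torch.Tensor:
    # Assumed slice construction: concatenate frame-level and segment-level features.
    return torch.cat([enhanced, segments], dim=0)  # (T + J, dim)

def ot_first_distance(lq: torch.Tensor, ls: torch.Tensor,
                      n_iter: int = 50, eps: float = 0.05) -> torch.Tensor:
    # Cost D_ij = 1 - cosine similarity between slice elements.
    cost = 1.0 - F.normalize(lq, dim=-1) @ F.normalize(ls, dim=-1).t()
    # Uniform discrete distributions s and d over the two slices (an assumption).
    s = torch.full((lq.shape[0],), 1.0 / lq.shape[0])
    d = torch.full((ls.shape[0],), 1.0 / ls.shape[0])
    K = torch.exp(-cost / eps)  # entropic kernel
    u = torch.ones_like(s)
    for _ in range(n_iter):     # Sinkhorn updates enforce F1 = s and F^T 1 = d
        v = d / (K.t() @ u)
        u = s / (K @ v)
    F_hat = u.unsqueeze(1) * K * v.unsqueeze(0)  # approximate matching matrix
    return (F_hat * cost).sum()                  # first distance: sum_ij F_ij D_ij
```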
In one or more embodiments of the present invention, performing behavior recognition on a video to be recognized according to a first distance includes: and obtaining the probability that the video to be identified belongs to the category of the first reference video according to the first distance.
As an example, assume that the first reference video belongs to category c and denote it $X_c$. The probability that the video to be classified belongs to category c can then be obtained through the following formula:

$$p(c \mid X_q) = \frac{\exp\big(-d(X_q, X_c)\big)}{\sum_{c'} \exp\big(-d(X_q, X_{c'})\big)}$$
moreover, the classification loss function can also be calculated by the following formula:
wherein ,for the probability that the video to be classified belongs to category c, +.>For classifying loss functions, ++>For other categories than c +.>For classifying the number of samples>For a real label->Is the data set to be classified.
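As a minimal illustration of this classification step, distances can be turned into class probabilities with a softmax over their negatives and trained with cross-entropy; the softmax-over-negated-distances form matches the formulas reconstructed above, while the batch layout is an assumption.

```python
import torch

def class_probabilities(distances: torch.Tensor) -> torch.Tensor:
    # distances: (num_classes,) first distances from the query video to each
    # class's reference video; a smaller distance gives a higher probability.
    return torch.softmax(-distances, dim=0)

def classification_loss(all_distances: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # all_distances: (num_queries, num_classes); labels: (num_queries,) true classes.
    log_probs = torch.log_softmax(-all_distances, dim=1)
    return -log_probs[torch.arange(labels.numel()), labels].mean()
```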
In summary, the small sample behavior recognition method of the embodiments of the present invention acquires a video to be identified and a first reference video, performs feature extraction processing on each to obtain corresponding frame features, and performs feature enhancement processing on the frame features to obtain enhanced frame features; processes the enhanced frame features according to time sequence segment prototypes to obtain corresponding high-level time sequence segments, the prototypes being obtained by dynamically updating randomly generated initial prototypes; and obtains a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, performing behavior recognition according to the first distance. By introducing time sequence segment prototypes, high-level time sequence segments in the video are discovered automatically through an attention mechanism, improving the performance of small sample behavior recognition. Since the prototypes are dynamically updated from random initialization, the high-level time sequence segments can be obtained with better-adapted prototypes, further improving alignment accuracy. The feature enhancer enhances the frame features so that the enhanced features perceive motion information, further improving recognition performance. Finally, the optimal transport algorithm realizes multi-level video temporal matching, enhancing the robustness of video matching and the generalization and transferability of small sample features, thereby greatly improving the accuracy of small sample behavior recognition.
Further, the present invention proposes a computer-readable storage medium.
In an embodiment of the present invention, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the small sample behavior recognition method described above.
The computer-readable storage medium of the embodiments of the present invention, by implementing the small sample behavior recognition method described above, acquires a video to be identified and a first reference video, performs feature extraction processing on each to obtain corresponding frame features, and performs feature enhancement processing on the frame features to obtain enhanced frame features; processes the enhanced frame features according to time sequence segment prototypes to obtain corresponding high-level time sequence segments, the prototypes being obtained by dynamically updating randomly generated initial prototypes; and obtains a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, performing behavior recognition according to the first distance. High-level time sequence segments in the video are discovered automatically through an attention mechanism, improving the performance of small sample behavior recognition; the dynamically updated prototypes further improve alignment accuracy; the feature enhancer makes the enhanced features perceive motion information; and the optimal transport algorithm realizes multi-level video temporal matching, enhancing the robustness of video matching and the generalization and transferability of small sample features, thereby greatly improving the accuracy of small sample behavior recognition.
Further, the invention provides a controller.
In an embodiment of the present invention, a controller includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the computer program is executed by the processor, the small sample behavior recognition method described above is implemented.
The controller of the embodiments of the present invention, by implementing the small sample behavior recognition method described above, acquires a video to be identified and a first reference video, performs feature extraction processing on each to obtain corresponding frame features, and performs feature enhancement processing on the frame features to obtain enhanced frame features; processes the enhanced frame features according to time sequence segment prototypes to obtain corresponding high-level time sequence segments, the prototypes being obtained by dynamically updating randomly generated initial prototypes; and obtains a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, performing behavior recognition according to the first distance. High-level time sequence segments in the video are discovered automatically through an attention mechanism, improving the performance of small sample behavior recognition; the dynamically updated prototypes further improve alignment accuracy; the feature enhancer makes the enhanced features perceive motion information; and the optimal transport algorithm realizes multi-level video temporal matching, enhancing the robustness of video matching and the generalization and transferability of small sample features, thereby greatly improving the accuracy of small sample behavior recognition.
It should be noted that the logic and/or steps represented in the flow diagrams or otherwise described herein may be considered an ordered listing of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
In the description of the present specification, the terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. refer to an orientation or positional relationship based on that shown in the drawings, and do not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and should not be construed as limiting the invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the description of the present specification, unless otherwise indicated, the terms "mounted," "connected," "secured," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (10)

1. A method of small sample behavior recognition, the method comprising:
acquiring a video to be identified and a first reference video, respectively carrying out feature extraction processing on the video to be identified and the first reference video to obtain corresponding frame features, and carrying out feature enhancement processing on the frame features to obtain enhanced frame features;
processing the enhanced frame features according to a time sequence segment prototype to obtain a corresponding high-level time sequence segment, wherein the time sequence segment prototype is obtained by dynamically updating a randomly generated initial time sequence segment prototype;
and obtaining a first distance between the video to be identified and the first reference video according to the enhanced frame features and the high-level time sequence segments, and performing behavior identification on the video to be identified according to the first distance.
2. The small sample behavior recognition method of claim 1, wherein the extracted frame feature of the video to be recognized is a first frame feature and the extracted frame feature of the first reference video is a second frame feature, the method further comprising:
acquiring a plurality of third frame features corresponding to a plurality of preset second reference videos, calculating a difference value between convolution of a next frame feature and a previous frame feature according to the frame features of each reference video to obtain a difference value set corresponding to each reference video, and obtaining a task level motion mode according to the difference value set;
and obtaining channel enhancement parameters according to the task level motion mode.
3. The small sample behavior recognition method of claim 2, wherein the channel enhancement parameters include a first channel enhancement parameter and a second channel enhancement parameter, and wherein the feature enhancer performs the feature enhancement process using the following equation:
$$\tilde{f}^q = \gamma \odot f^q + \beta, \qquad \tilde{f}^s = \gamma \odot f^s + \beta$$

wherein $\gamma$ is the first channel enhancement parameter, $\beta$ is the second channel enhancement parameter, $f^q$ is the first frame feature, $\tilde{f}^q$ is the corresponding first enhanced frame feature, $f^s$ is the second frame feature, and $\tilde{f}^s$ is the corresponding second enhanced frame feature.
4. The small sample behavior recognition method according to claim 1, wherein before the feature extraction processing is performed on the video to be recognized and the first reference video, respectively, the method further comprises:
randomly extracting T frames from the video to be identified and the first reference video respectively, wherein T is a positive integer.
5. A small sample behavior recognition method according to claim 3, wherein said processing the enhanced frame features according to a time-series segment prototype comprises:
obtaining a first feature and a second feature according to the first enhancement frame feature, calculating to obtain a first attention score between the first feature and the time sequence segment prototype, and obtaining a first high-level time sequence segment according to the first attention score and the second feature;
and obtaining a third feature and a fourth feature according to the second enhanced frame feature, calculating to obtain a second attention score between the third feature and the time sequence segment prototype, and obtaining a second high-level time sequence segment according to the second attention score and the fourth feature.
6. The small sample behavior recognition method of claim 5, wherein the deriving a first distance between the video to be recognized and the first reference video from the enhanced frame features and the high-level time sequence segments comprises:
obtaining a first time sequence slice according to the first enhanced frame feature and the first high-level time sequence segment, and obtaining a second time sequence slice according to the second enhanced frame feature and the second high-level time sequence segment;
acquiring a first discrete distribution of the first time sequence slice, a second discrete distribution of the second time sequence slice and a second distance between the first time sequence slice and the second time sequence slice;
and obtaining the first distance according to the first discrete distribution, the second discrete distribution and the second distance.
7. The small sample behavior recognition method of claim 6, wherein the deriving the first distance from the first discrete distribution, the second discrete distribution, and the second distance comprises:
selecting a target video matching matrix between the video to be identified and the first reference video from a plurality of preset video matching matrices according to the first discrete distribution, the second discrete distribution and the second distance, wherein the target video matching matrix is a matrix which minimizes the distance calculated according to the target video matching matrix and the second distance;
and obtaining a first distance between the video to be identified and the first reference video according to the video matching matrix.
8. The small sample behavior recognition method according to claim 1, wherein the performing behavior recognition on the video to be recognized according to the first distance includes:
and obtaining the probability that the video to be identified belongs to the category of the first reference video according to the first distance.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the small sample behavior recognition method according to any one of claims 1-8.
10. A controller comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the small sample behavior recognition method of any one of claims 1-8.
CN202310859607.4A 2023-07-13 2023-07-13 Small sample behavior recognition method, storage medium and controller Pending CN116580343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310859607.4A CN116580343A (en) 2023-07-13 2023-07-13 Small sample behavior recognition method, storage medium and controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310859607.4A CN116580343A (en) 2023-07-13 2023-07-13 Small sample behavior recognition method, storage medium and controller

Publications (1)

Publication Number Publication Date
CN116580343A 2023-08-11

Family

ID=87544048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310859607.4A Pending CN116580343A (en) 2023-07-13 2023-07-13 Small sample behavior recognition method, storage medium and controller

Country Status (1)

Country Link
CN (1) CN116580343A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818958A (en) * 2021-03-24 2021-05-18 苏州科达科技股份有限公司 Action recognition method, device and storage medium
WO2021129569A1 (en) * 2019-12-25 2021-07-01 神思电子技术股份有限公司 Human action recognition method
CN114282047A (en) * 2021-09-16 2022-04-05 腾讯科技(深圳)有限公司 Small sample action recognition model training method and device, electronic equipment and storage medium
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021129569A1 (en) * 2019-12-25 2021-07-01 神思电子技术股份有限公司 Human action recognition method
CN112818958A (en) * 2021-03-24 2021-05-18 苏州科达科技股份有限公司 Action recognition method, device and storage medium
CN114282047A (en) * 2021-09-16 2022-04-05 腾讯科技(深圳)有限公司 Small sample action recognition model training method and device, electronic equipment and storage medium
WO2023040506A1 (en) * 2021-09-16 2023-03-23 腾讯科技(深圳)有限公司 Model-based data processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAMIN WU et al.: "Motion-modulated Temporal Fragment Alignment Network For Few-Shot Action Recognition", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9141-9150 *

Similar Documents

Publication Publication Date Title
CN111050219B (en) Method and system for processing video content using a spatio-temporal memory network
Gao et al. Classification of CT brain images based on deep learning networks
CN109977262B (en) Method and device for acquiring candidate segments from video and processing equipment
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN112102237A (en) Brain tumor recognition model training method and device based on semi-supervised learning
Muhammad et al. Visual saliency models for summarization of diagnostic hysteroscopy videos in healthcare systems
US20130070997A1 (en) Systems, methods, and media for on-line boosting of a classifier
US9025889B2 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
CN112102266A (en) Attention mechanism-based cerebral infarction medical image classification model training method
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN108627241B (en) Dolphin widescriptae click signal detection method based on Gaussian mixture model
CN114399644A (en) Target detection method and device based on small sample
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN118043858A (en) Image processing method and system based on convolutional neural network
CN111325766A (en) Three-dimensional edge detection method and device, storage medium and computer equipment
Song et al. Depth-aware saliency detection using discriminative saliency fusion
AlEisa et al. Breast cancer classification using FCN and beta wavelet autoencoder
CN117009838B (en) Multi-scale fusion contrast learning multi-view clustering method and system
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN116912829A (en) Small air channel segmentation method and device, electronic equipment and nonvolatile storage medium
CN116580343A (en) Small sample behavior recognition method, storage medium and controller
CN110874843A (en) Organ image segmentation method and device
CN113627342B (en) Method, system, equipment and storage medium for video depth feature extraction optimization
CN113888567A (en) Training method of image segmentation model, image segmentation method and device
CN112418205A (en) Interactive image segmentation method and system based on focusing on wrongly segmented areas

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230811