CN113516030B - Action sequence verification method and device, storage medium and terminal - Google Patents

Action sequence verification method and device, storage medium and terminal

Info

Publication number
CN113516030B
CN113516030B (application CN202110469750.3A)
Authority
CN
China
Prior art keywords
sequence
action
action sequence
verified
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110469750.3A
Other languages
Chinese (zh)
Other versions
CN113516030A (en)
Inventor
高盛华
钱一成
罗伟鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202110469750.3A priority Critical patent/CN113516030B/en
Publication of CN113516030A publication Critical patent/CN113516030A/en
Application granted granted Critical
Publication of CN113516030B publication Critical patent/CN113516030B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an action sequence verification method, apparatus, storage medium and terminal, the method comprising the following steps: acquiring an action sequence to be verified; performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; performing information fusion on the feature sequence to obtain an overall sequence feature, so as to judge the action category of the action sequence to be verified; and comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether the two belong to the same action sequence. The invention verifies the action sequence by constructing a new neural network model while constraining both the overall feature and the temporally ordered feature sequence, so the accuracy of action verification is high. The application fields are broad: for example, identifying whether the people in two video segments are completing the same action, checking standardized processes in factories and workshops, and scoring actions in the sports and entertainment fields.

Description

Action sequence verification method and device, storage medium and terminal
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method and apparatus for verifying an action sequence, a storage medium, and a terminal.
Background
With the rapid development of information network technology, video has become a primary means for people to acquire information, penetrating fields such as production, security, transportation and entertainment. How to effectively use video information for action recognition has accordingly become a research hotspot. Research on verifying action sequences, however, is still largely blank: for example, judging whether the action sequences in two videos are the same, or judging whether work in a factory workshop complies with a standard procedure.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method, an apparatus, a storage medium and a terminal for verifying an action sequence, so as to solve the problems in the prior art.
To achieve the above and other related objects, a first aspect of the present invention provides an action sequence verification method, including: acquiring an action sequence to be verified; performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; performing information fusion on the feature sequence to obtain an overall sequence feature, so as to judge the action category of the action sequence to be verified; and comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether the two belong to the same action sequence.
In some embodiments of the first aspect of the present invention, the feature extraction includes: extracting features of the action sequence to be verified based on a BN-Inception network or a ResNet-50 network, with the final fully connected layer modified to output a feature sequence of a preset dimension.
In some embodiments of the first aspect of the present invention, the overall sequence feature is obtained by: performing temporal information fusion on the feature sequence of the action sequence to be verified based on a Vision Transformer model to obtain the overall sequence feature.
In some embodiments of the first aspect of the present invention, the feature comparison includes: comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence in temporal order to obtain a feature comparison result; and constructing a loss function to impose a supervision constraint on the feature comparison result.
In some embodiments of the first aspect of the present invention, this loss function is a first loss function, and the method further comprises: acquiring the overall sequence feature of the action sequence to be verified based on a Vision Transformer model; judging the action category of the action sequence to be verified based on the overall sequence feature; and constructing a second loss function to impose a supervision constraint on the action category judgment result.
In some embodiments of the first aspect of the present invention, the action sequence verification method includes: performing a weighted calculation based on the first loss function and the second loss function to obtain a verification result for the action sequence to be verified.
In some embodiments of the first aspect of the present invention, the feature extraction yields a plurality of feature maps, and the method comprises: dividing all feature maps into a number of fixed-size patches to obtain a feature patch sequence; inputting the obtained feature patch sequence into the Vision Transformer; for each patch, the Vision Transformer weights and fuses the features of the remaining patches through a self-attention module, while a special token is reserved whose corresponding feature represents the category information of the entire action sequence to be verified.
To achieve the above and other related objects, a second aspect of the present invention provides an action sequence verification apparatus, comprising: an action sequence acquisition module for acquiring an action sequence to be verified; a feature extraction module for performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; a feature fusion module for performing information fusion on the feature sequence to obtain an overall sequence feature, so as to judge the action category of the action sequence to be verified; and a feature comparison module for comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether the two belong to the same action sequence.
To achieve the above and other related objects, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the action sequence verification method.
To achieve the above and other related objects, a fourth aspect of the present invention provides an electronic terminal, comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so as to enable the terminal to execute the action sequence verification method.
As described above, the present invention provides an action sequence verification method, apparatus, storage medium and terminal. Feature extraction is performed on an action sequence to obtain a feature sequence, and temporal matching and information fusion are performed on the feature sequence to obtain finer-grained features of the action sequence to be verified, thereby realizing accurate verification of the action sequence. The method can compare two videos and verify whether their action sequences are consistent, which better matches real-life scenarios where completing an action requires several consecutive atomic actions; it can also verify action sequences that were not predefined, expanding the range of application. In addition, the invention verifies action sequences by constructing a new neural network model that extracts both the action information of each atomic action and the overall feature of the sequence, and simultaneously constrains the overall feature, the features of the individual atomic actions, and the temporal order of the feature sequence, greatly improving the success rate of action verification. The fields of application are wide: accurately verifying whether the people in two video segments are completing the same action sequence; inspecting standardized processes in production settings such as factories and workshops, which helps improve product quality; and scoring athletes' actions or actions in human-computer interaction games in the sports and entertainment fields.
Drawings
Fig. 1 is a flowchart illustrating a method for verifying an action sequence according to an embodiment of the invention.
Fig. 2 is a schematic diagram of three sets of motion sequences obtained by sampling frames from a video according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of an action sequence verification neural network model and a workflow thereof according to an embodiment of the invention.
Fig. 4 is a schematic diagram of an action sequence verification device according to an embodiment of the invention.
Fig. 5 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, which describes embodiments of the invention through specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that, in the absence of conflict, the following embodiments and the features within them may be combined with one another.
In the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the invention. It is to be understood that other embodiments may be utilized and that changes may be made without departing from the spirit and scope of the present invention. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present invention is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In the present invention, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination: thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions or operations is in some way inherently mutually exclusive.
The invention provides a method, a device, a storage medium and a terminal for verifying an action sequence, so as to solve the problems in the prior art.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are further described in detail by the following embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, this embodiment provides a flowchart of an action sequence verification method, which includes steps S11 to S14, described in detail as follows:
S11, acquiring an action sequence to be verified. An action sequence is composed of a plurality of atomic actions in sequential relation; because a single atomic action accomplishes too little on its own, most actions in daily life appear in the form of action sequences. Different action sequences may differ in the number of atomic actions they contain, in one or more of the corresponding atomic actions, or in both.
For example, fig. 2 shows three action sequences obtained by frame sampling from three videos; each video picture represents one atomic action. The three action sequences are named, in turn, seq1, seq2 and seq3. An action sequence is composed of a plurality of atomic actions in sequential relation (i.e. the corresponding video pictures in fig. 2). Since seq1 and seq2 contain the same atomic actions occurring in the same order, seq1 and seq2 belong to the same action sequence; seq2 and seq3 differ both in the order and in the number of their atomic actions, so they do not belong to the same action sequence.
Typically, the action sequence to be verified is derived from a video and can be obtained by sampling consecutive video frames according to a preset rule. The obtained action sequence is preferably preprocessed, for example by smoothing, restoration, enhancement, median filtering and edge detection on the pictures, which eliminates irrelevant information, recovers useful information, enhances the detectability of relevant information, and simplifies the data as far as possible (see the sketch below).
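As an illustration of this preprocessing step, the sketch below applies median filtering and light smoothing with OpenCV. The patent does not fix a concrete pipeline, so the function name, filter choices and kernel sizes are assumptions, not the invention's specified implementation.

```python
# A minimal preprocessing sketch (assumed pipeline, not the patent's exact one).
import cv2


def preprocess_frame(frame):
    """Denoise a sampled video frame (BGR uint8 array) before feature extraction."""
    frame = cv2.medianBlur(frame, 5)            # median filtering suppresses impulse noise
    frame = cv2.GaussianBlur(frame, (3, 3), 0)  # light smoothing removes residual noise
    return frame
```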
S12, performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence. Specifically, a convolutional neural network performs the feature extraction operation on the action sequence to be verified and yields a temporally ordered feature sequence, which is the representation, in feature space, of the motion information contained in the entire action sequence.
In a preferred implementation of this embodiment, the feature extraction proceeds as follows: a feature extraction module (Backbone) is built on a BN-Inception network or a ResNet-50 network to extract features of the action sequence to be verified, and the final fully connected layer (fc layer) is modified to output a feature sequence of a preset dimension.
Specifically, the BN-Inception and ResNet-50 networks contain many layers that extract features from the input picture hierarchically; as the layers deepen, the extracted features become more high-dimensional and global. Both networks are also strongly robust.
Further, the feature extraction module ultimately serves a 45-class classification problem, so the K-dimensional features must be converted to 45 dimensions through the fc layer, and the preset output dimension of the fc layer is therefore larger than 45. If K is too large, it exceeds the output dimension of the preceding backbone, so the preset dimension of the fc layer is smaller than 512. Preferably, the output dimension of the feature sequence is 128 or 256, which achieves an effective balance between recognition efficiency and recognition accuracy (a minimal code sketch follows).
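For concreteness, a minimal PyTorch sketch of such a feature extraction module is given below, assuming torchvision's ResNet-50 and a 128-dimensional output; the class name and all hyperparameters are illustrative assumptions rather than the patent's implementation.

```python
# Hedged sketch: ResNet-50 backbone with the final fc layer replaced so that each
# frame maps to a preset feature dimension (128 here, per the preferred range above).
import torch
import torch.nn as nn
from torchvision import models


class FrameFeatureExtractor(nn.Module):
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)  # modified fc layer
        self.backbone = backbone

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, seq_len, 3, H, W) -> feature sequence: (batch, seq_len, feature_dim)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w))
        return feats.reshape(b, t, -1)
```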
S13, performing information fusion on the feature sequence to obtain an overall sequence feature, so as to judge the action category of the action sequence to be verified. Specifically, temporal information fusion is performed on the feature sequence of the action sequence to be verified based on a Vision Transformer model to obtain the overall sequence feature.
The workflow of the Vision Transformer model is as follows: the video frame sequence (i.e. the action sequence to be verified) is input into the backbone (such as a ResNet-50) for feature extraction, which outputs a series of feature maps; all feature maps are divided into a number of fixed-size patches, and the resulting feature patch sequence is input into the Vision Transformer; for each patch, the Vision Transformer weights and fuses the features of the remaining patches through a self-attention module (i.e. it performs feature enhancement on the input feature patch sequence); meanwhile, a special token is reserved, and the feature corresponding to the token position represents the category information of the entire video frame sequence. The Vision Transformer model has weak inductive bias but strong temporal modeling capability and a small computational cost, making it particularly suitable for the fused feature extraction of temporally ordered action sequences.
Further, the output dimension of the Vision Transformer model is kept consistent with the feature dimension of the feature extraction module (backbone), i.e., the fc layer in the last layer of the Vision Transformer model is set to the same dimension as the backbone's output features, which facilitates verifying the effectiveness of the Vision Transformer model (a hedged sketch of this fusion step follows).
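The sketch below illustrates this temporal fusion: a Transformer encoder over the per-frame features, with a learnable class token whose output represents the whole sequence. Depth, head count, and maximum sequence length are assumptions; only the class-token idea and the matching feature dimension come from the text.

```python
# Hedged sketch of Vision-Transformer-style temporal fusion (hyperparameters assumed).
import torch
import torch.nn as nn


class TemporalFusion(nn.Module):
    def __init__(self, dim: int = 128, num_classes: int = 45,
                 depth: int = 4, heads: int = 8, max_len: int = 64):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))            # special token
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len + 1, dim))  # position embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, seq_len, dim) per-frame features from the backbone
        b, t, _ = feats.shape
        x = torch.cat([self.cls_token.expand(b, -1, -1), feats], dim=1)
        x = self.encoder(x + self.pos_embed[:, : t + 1])
        # Position 0 (the token) carries the overall sequence feature for classification;
        # positions 1..t are the temporally fused feature sequence used for alignment.
        return self.classifier(x[:, 0]), x[:, 1:]
```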
S14, comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence to judge whether the two belong to the same action sequence. Specifically, the feature sequence of the action sequence to be verified is compared with the feature sequence of the standard action sequence in temporal order to obtain a feature comparison result, and a first loss function is constructed to impose a supervision constraint on the comparison result; the overall sequence feature of the action sequence to be verified is acquired based on the Vision Transformer model, the action category is judged from the overall sequence feature, and a second loss function is constructed to impose a supervision constraint on the category judgment result.
In a preferred implementation of this embodiment, a weighted calculation is performed on the first loss function and the second loss function to balance the influence of the two losses on training the overall model, thereby obtaining the verification result of the action sequence to be verified. In some examples, the weight ratio of the first loss function to the second loss function may be set to 10, 2, 1, etc.; a weight of 10 gives the most accurate verification (see the sketch below).
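In code, this weighting could look like the sketch below, using the 10:1 ratio the text reports as most accurate; the cross-entropy form of the classification term is an assumption.

```python
# Hedged sketch of the weighted training objective (10:1 ratio per the text above).
import torch
import torch.nn.functional as F


def total_loss(alignment_loss: torch.Tensor, class_logits: torch.Tensor,
               class_labels: torch.Tensor, alignment_weight: float = 10.0) -> torch.Tensor:
    classification_loss = F.cross_entropy(class_logits, class_labels)
    return alignment_weight * alignment_loss + classification_loss
```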
In some embodiments, the method may be applied to a controller, such as an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processing) controller, or an MCU (Microcontroller Unit) controller. In some embodiments, the method may also be applied to a computer including components such as memory, a memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuitry, audio circuitry, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports; such computers include, but are not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smart phones, smart televisions, and personal digital assistants (PDAs). In other embodiments, the method may be applied to servers, which may be deployed on one or more physical servers according to factors such as function and load, or may consist of a distributed or centralized server cluster.
Example two
As shown in fig. 3, this embodiment presents a schematic diagram of the action sequence verification neural network model and its workflow. Specifically, the model includes: a feature extraction module (Intra-action module) built on a 2D backbone, a feature fusion module (Inter-action module) built on a Vision Transformer, and a feature comparison module (Alignment module). Notably, the Vision Transformer-based feature fusion module not only permits parallelized training but also captures global feature information of the action sequence, helping guarantee verification accuracy. After the feature extraction and feature fusion modules have been trained, the action sequence to be verified and the standard action sequence are each passed through them; once their feature sequences are obtained, the feature comparison module compares the two to judge whether they belong to the same action sequence.
Fig. 3 illustrates the workflow of the verification neural network model using, as an example, the process of verifying whether two action sequences, input frames 1 and input frames 2, are the same action sequence. The workflow is as follows:
First, the action sequences input frames 1 and input frames 2 are each input into the 2D backbone; the features corresponding to the atomic actions in each action sequence are connected along a specified dimension with a concat operation, and a reshape operation yields the corresponding feature map sequence (Feature map sequence).
Then, the feature sequences corresponding to input frames 1 and input frames 2 are each input into the feature fusion module. A linear projection layer converts feature maps of varying length into fixed-length vectors; position information is incorporated by adding position embeddings to the features; and an extra learnable embedding is appended whose output after the Transformer encoder represents the whole action sequence, i.e. the overall sequence feature vector, from which the action category of the corresponding action sequence is judged. Class scores are computed and supervised by the second loss function, giving the corresponding first classification loss value (Classification loss 1) and second classification loss value (Classification loss 2).
Meanwhile, the feature sequences corresponding to input frames 1 and input frames 2 are input into the feature comparison module (Alignment module), which compares and matches the two feature sequences based on a sequence similarity matrix (Sequence similarity matrix) and an identity matrix; the comparison result is supervised and constrained by the first loss function, giving the sequence alignment loss value (Sequence Alignment loss), as sketched below.
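A hedged sketch of this comparison follows: the similarity matrix between the two feature sequences is supervised toward the identity matrix, so that frame i of one sequence aligns with frame i of the other when the sequences match. The cosine similarity, temperature value, and equal-length assumption are ours, not the patent's.

```python
# Hedged sketch of the sequence alignment loss (similarity matrix vs. identity matrix).
import torch
import torch.nn.functional as F


def sequence_alignment_loss(seq_a: torch.Tensor, seq_b: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    # seq_a, seq_b: (seq_len, dim) feature sequences of equal length
    sim = F.normalize(seq_a, dim=-1) @ F.normalize(seq_b, dim=-1).t()  # similarity matrix
    labels = torch.arange(sim.size(0), device=sim.device)  # identity-matrix targets
    return F.cross_entropy(sim / temperature, labels)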
Finally, the first loss value (Classification loss 1), the second loss value (Classification loss 2) and the alignment loss value (Sequence Alignment loss) are weighted with preset weights to obtain the final action verification result, i.e. whether input frames 1 and input frames 2 are the same action sequence.
It should be noted that the present invention trains the action sequence verification neural network model on a newly proposed data set, which differs from existing data sets in two respects: it focuses on action sequences rather than single atomic actions, and it contains many action sequences with atomic-action-level differences, which existing data sets lack.
In some examples, the training data set is obtained as follows: 2000 videos containing 70 different action sequences were shot, and after eliminating videos with equipment or action problems, 1938 videos remained; extracting every frame of the remaining videos gives about 960,000 pictures in total, each video lasting 20.58 seconds and comprising 495.85 frames on average. The 70 action sequences are divided into 14 groups; the first action sequence in each group is the standard sequence, and the remaining four are error sequences differing slightly from it: two have the order of a certain number of atomic actions shuffled, and the other two have some atomic actions deleted from the standard sequence. The resulting data set can be used to train the action sequence verification neural network model provided by the invention, and the trained model can verify an action sequence to be verified, or verify whether several action sequences are the same action.
Example III
As shown in fig. 4, this embodiment proposes an action sequence verification apparatus, comprising: an action sequence acquisition module 41 for acquiring an action sequence to be verified; a feature extraction module 42 for performing feature extraction on the action sequence to be verified to obtain a corresponding feature sequence; a feature fusion module 43 for performing information fusion on the feature sequence to obtain an overall sequence feature, so as to judge the action category of the action sequence to be verified; and a feature comparison module 44 for comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence to judge whether the two belong to the same action sequence.
The modules provided in this embodiment correspond to the methods and embodiments described above and are therefore not described again. It should be understood that the division of the apparatus into the above modules is merely a division of logical functions; in practice they may be fully or partially integrated into one physical entity or physically separated. These modules may all be implemented in software invoked by a processing element, all in hardware, or partly in software invoked by a processing element and partly in hardware. For example, the feature comparison module 44 may be a separately installed processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus as program code that a processing element of the apparatus invokes to execute the module's functions. The other modules are implemented similarly. Some or all of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability; in implementation, each step of the above method or each module above may be completed by an integrated logic circuit in hardware or by instructions in software within the processor element.
For example, the modules above may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). For another example, when one of the modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. For yet another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Example IV
This embodiment proposes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the aforementioned action sequence verification method.
Those of ordinary skill in the art will appreciate that all or part of the steps implementing the above method embodiments may be completed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium; when executed, the program performs steps including those of the method embodiments above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Example five
As shown in fig. 5, an embodiment of the present invention provides a schematic structural diagram of an electronic terminal. The electronic terminal provided in this embodiment includes a processor 51, a memory 52 and a communicator 53, connected via a system bus and communicating with one another; the memory 52 is used for storing a computer program, the communicator 53 is used for communicating with other devices, and the processor 51 is used for running the computer program so that the electronic terminal executes the steps of the above action sequence verification method.
The system bus mentioned above may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, among others, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is drawn in the figure, but this does not mean there is only one bus or one type of bus. The communication interface is used to enable communication between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may comprise random access memory (RAM) and may also comprise non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but may also be a digital signal processor (Digital Signal Processing, DSP for short), application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), field-programmable gate array (Field-Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
In summary, the present invention provides an action sequence verification method, apparatus, storage medium and terminal. Feature extraction on the action sequence yields a feature sequence, and temporal matching and information fusion on that sequence yield finer-grained features of the action sequence to be verified, thereby realizing accurate verification. Because completing one action in real life is often an action sequence consisting of several consecutive atomic actions, verifying the sequence with its continuity effectively improves the accuracy of action verification and widens the range of practical application. In addition, the invention builds the action sequence verification neural network model on deep learning: a basic network (backbone) performs preliminary feature extraction on the action sequence, and the Vision Transformer together with Sequence Alignment extracts finer-grained features, so the extracted feature information is comprehensive and the action verification accuracy is high. The invention thus effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (7)

1. A method of action sequence verification, comprising:
acquiring an action sequence to be verified;
extracting features of the action sequence to be verified to obtain a corresponding feature sequence;
performing information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified;
comparing the feature sequence of the action sequence to be verified with the feature sequence of a standard action sequence to judge whether the two belong to the same action sequence;
wherein the feature comparison comprises the following steps:
comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence in temporal order to obtain a feature comparison result; constructing a first loss function to impose a supervision constraint on the feature comparison result;
acquiring the overall sequence feature of the action sequence to be verified based on a Vision Transformer model; judging the action category of the action sequence to be verified based on the overall sequence feature; constructing a second loss function to impose a supervision constraint on the action category judgment result;
and carrying out weighted calculation based on the first loss function and the second loss function to obtain a verification result of the action sequence to be verified.
2. The method of claim 1, wherein the feature extraction method includes:
performing feature extraction on the action sequence to be verified based on a BN-Inception network or a ResNet-50 network, and modifying the final fully connected layer to output a feature sequence of a preset dimension.
3. The action sequence verification method according to claim 1, wherein the overall sequence feature is obtained by:
performing temporal information fusion on the feature sequence of the action sequence to be verified based on a Vision Transformer model to obtain the overall sequence feature.
4. The action sequence verification method according to claim 3, wherein the feature extraction yields a plurality of feature maps, and the method comprises the following steps:
dividing all feature maps into a plurality of fixed-size patches to obtain a feature patch sequence;
inputting the obtained feature patch sequence into the Vision Transformer;
wherein, for each patch, the Vision Transformer weights and fuses the features of the remaining patches through a self-attention module, while a special token is reserved whose corresponding feature represents the category information of the entire action sequence to be verified.
5. An action sequence verification apparatus, comprising:
the action sequence acquisition module is used for acquiring an action sequence to be verified;
the feature extraction module is used for carrying out feature extraction on the action sequence to be verified so as to obtain a corresponding feature sequence;
the feature fusion module is used for performing information fusion on the feature sequence to obtain an overall sequence feature so as to judge the action category of the action sequence to be verified;
the feature comparison module is used for comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence to judge whether the two belong to the same action sequence;
wherein the feature comparison comprises the following steps:
comparing the feature sequence of the action sequence to be verified with the feature sequence of the standard action sequence in temporal order to obtain a feature comparison result; constructing a first loss function to impose a supervision constraint on the feature comparison result;
acquiring the overall sequence feature of the action sequence to be verified based on a Vision Transformer model; judging the action category of the action sequence to be verified based on the overall sequence feature; constructing a second loss function to impose a supervision constraint on the action category judgment result;
and carrying out weighted calculation based on the first loss function and the second loss function to obtain a verification result of the action sequence to be verified.
6. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the method of action sequence verification of any one of claims 1 to 4.
7. An electronic terminal, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to execute the action sequence verification method according to any one of claims 1 to 4.
CN202110469750.3A 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal Active CN113516030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110469750.3A CN113516030B (en) 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110469750.3A CN113516030B (en) 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN113516030A CN113516030A (en) 2021-10-19
CN113516030B (en) 2024-03-26

Family

ID=78064106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110469750.3A Active CN113516030B (en) 2021-04-28 2021-04-28 Action sequence verification method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN113516030B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8970348B1 (en) * 2012-08-28 2015-03-03 Intuit Inc. Using sequences of facial gestures to authenticate users
CN106845375A (en) * 2017-01-06 2017-06-13 天津大学 A kind of action identification method based on hierarchical feature learning
CN107122798A (en) * 2017-04-17 2017-09-01 深圳市淘米科技有限公司 Chin-up count detection method and device based on depth convolutional network
CN108573246A (en) * 2018-05-08 2018-09-25 北京工业大学 A kind of sequential action identification method based on deep learning
CN110602526A (en) * 2019-09-11 2019-12-20 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN111539289A (en) * 2020-04-16 2020-08-14 咪咕文化科技有限公司 Method and device for identifying action in video, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Kirill Gavrilyuk et al., "Actor-Transformers for Group Activity Recognition," CVPR, sections 1-3 *
Zhaoyuan Yin, "Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild," arXiv, full text *
Nie Yong, Zhang Peng, Feng Hui, Yang Tao, Hu Bo, "3D video human action recognition based on standard action sequences," Journal of Terahertz Science and Electronic Information Technology, no. 05, full text *
Zhang Zhou, Wu Kewei, Gao Yang, "Action recognition with key frames extracted by order verification," Intelligent Computer and Applications, no. 03, full text *

Also Published As

Publication number Publication date
CN113516030A (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN111241989B (en) Image recognition method and device and electronic equipment
CN109145745B (en) Face recognition method under shielding condition
CN109145766A (en) Model training method, device, recognition methods, electronic equipment and storage medium
CN110232373A (en) Face cluster method, apparatus, equipment and storage medium
CN111461164B (en) Sample data set capacity expansion method and model training method
CN111160225B (en) Human body analysis method and device based on deep learning
CN109934081A (en) A kind of pedestrian's attribute recognition approach, device and storage medium based on deep neural network
Yang et al. Multi-scale bidirectional fcn for object skeleton extraction
CN111738120B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN111067522A (en) Brain addiction structural map assessment method and device
CN110059212A (en) Image search method, device, equipment and computer readable storage medium
CN113158777A (en) Quality scoring method, quality scoring model training method and related device
Wang et al. Hierarchical image segmentation ensemble for objectness in RGB-D images
CN113516030B (en) Action sequence verification method and device, storage medium and terminal
US20220335566A1 (en) Method and apparatus for processing point cloud data, device, and storage medium
CN114220006A (en) Commodity identification method and system based on commodity fingerprints
CN112825143A (en) Deep convolutional neural network compression method, device, storage medium and equipment
Zhao et al. A Lightweight Generalizable Evaluation and Enhancement Framework for Generative Models and Generated Samples
CN115471893B (en) Face recognition model training, face recognition method and device
CN117334186B (en) Speech recognition method and NLP platform based on machine learning
CN117292304B (en) Multimedia data transmission control method and system
CN111310806B (en) Classification network, image processing method, device, system and storage medium
Han et al. A Two‐Branch Pedestrian Detection Method for Small and Blurred Target
CN112614157A (en) Video target tracking method, device, equipment and storage medium
CN116977703A (en) Abnormality detection method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant