CN117292222A - Training data acquisition method, system, device, server and medium - Google Patents

Training data acquisition method, system, device, server and medium

Info

Publication number
CN117292222A
CN117292222A (application CN202311272850.2A)
Authority
CN
China
Prior art keywords
action
object detection
storage area
training data
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311272850.2A
Other languages
Chinese (zh)
Inventor
吴鑫
林平
唐琦松
谢涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai I Search Software Co ltd
Original Assignee
Shanghai I Search Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai I Search Software Co ltd filed Critical Shanghai I Search Software Co ltd
Priority to CN202311272850.2A
Publication of CN117292222A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/764 - Arrangements for image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/82 - Arrangements for image or video recognition or understanding using neural networks
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 - Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A training data acquisition method, system, device, server and medium. The training data acquisition method comprises the following steps: acquiring video data of a test action; inputting the video data of the test action into a trained object detection model to perform object detection and obtain an object detection result; inputting the object detection result into a trained action classification model to perform action classification and obtain an action classification result; and, when the action classification result indicates an action image that was not recognized as belonging to the category of a preset storage area, storing the action image in the preset storage area and automatically labeling it, wherein the storage area is configured with a category identifier of the existing action and the category identifier corresponds to the action category of the test action being performed. The invention enables rapid acquisition of training data and improves the detection accuracy of test actions.

Description

Training data acquisition method, system, device, server and medium
Technical Field
The invention relates to the technical field of data acquisition, and in particular to a training data acquisition method, system, device, server and medium.
Background
In today's business office environment, most employee work is done on computers and networks, so large amounts of confidential data are stored in electronic files on employee computers, and many trade secrets and intangible assets must be stored on them. Preventing these documents from leaking out, and the immeasurable losses that leakage would cause, is a core concern of enterprise anti-disclosure work. Most businesses have already taken measures against data leakage, for example: establishing a pure intranet office environment, prohibiting the use of mobile storage, monitoring employee computer operations, prohibiting cloud disk sharing, and managing outgoing files. However, an important leakage path is often omitted: photographing the screen. Adding a visible watermark to application windows enables tracing after a leak, but it cannot control the leaking behavior in real time; in this situation, the moment a person photographs the screen, the behavior should be recognized immediately so that it can be controlled immediately.
In the prior art, when action category detection is required, existing spatio-temporal action detection schemes can be used. The problem is that in practical applications the set of action categories to be detected is rarely fixed: new categories must often be added, or the detection accuracy of existing categories must be further improved. Either case requires collecting and annotating a large amount of data, which makes such schemes ill-suited to application scenarios whose requirements change rapidly.
Disclosure of Invention
The invention provides a training data acquisition method, system, device, server and medium, which can effectively improve the detection accuracy of existing action categories and add new action categories to be detected.
In a first aspect, the present invention provides a method for acquiring training data, including:
acquiring video data of a test action;
inputting the video data of the test action into a trained object detection model to perform object detection and obtain an object detection result; inputting the object detection result into a trained action classification model to perform action classification and obtain an action classification result;
when the action classification result indicates an action image that was not recognized as belonging to the category of a preset storage area, storing the action image in the preset storage area and automatically labeling it, wherein the storage area is configured with a category identifier of the existing action and the category identifier corresponds to the action category of the test action being performed.
According to this technical scheme, video data is acquired, action classification is performed on it by the action classification model, and the image frames that the action classification model classifies incorrectly are collected, stored in the preset storage area, and automatically labeled. This enables rapid acquisition of training data for the action classification model and improves the detection accuracy of existing action categories.
Optionally, the collecting method further includes:
acquiring video data of the same new action, wherein the new action is an action of a category not included in the training data of the action classification model;
storing the acquired video data of the new action into a storage area corresponding to the new action and automatically labeling it, wherein the storage area is provided with a category identifier of the new action;
extracting the video data and the corresponding category identifiers from the storage area to train the action classification model to recognize the new action.
Optionally, the same existing action is continuously performed during the acquisition process.
Optionally, the object detection model is an object detection model of the YOLO series.
Optionally, the action classification model is a convolutional neural network, a recurrent neural network, or a spatio-temporal attention network (Spatio-Temporal Attention Network) model.
Optionally, the acquisition method further includes: displaying the actually performed action and the recognized action in real time, together with a confidence score.
In a second aspect, the present invention further provides a training data acquisition device, including:
the acquisition module is used for acquiring video data of the test action;
the action detection and recognition module is used for inputting the video data of the test action into the trained object detection model to perform object detection and obtain an object detection result, and for inputting the object detection result into the trained action classification model to perform action classification and obtain an action classification result;
and the storage and labeling module is used for storing an action image into the preset storage area and automatically labeling it when the action classification result indicates that the action image was not recognized as belonging to the category of the preset storage area, wherein the storage area is configured with a category identifier of the existing action, and the category identifier corresponds to the action category of the test action being performed.
In a third aspect, the present invention further provides a training data acquisition system, including:
the video acquisition device is used for continuously capturing the performed actions and uploading the captured data;
the server is used for receiving the collected data and identifying action categories, and is provided with a pre-trained object detection model and an action classification model;
the client is used for providing a real-time recognition interface that displays the performed actions and the recognition results given by the models in real time, and a confidence scoring area that displays the confidence value in real time.
In a fourth aspect, the present invention further provides a server, including a processor, a memory, and a computer program stored on the memory and executable by the processor, where the computer program, when executed by the processor, implements the steps of the training data acquisition method.
In a fifth aspect, the present invention further provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the training data acquisition method.
Compared with the prior art, the invention has the beneficial effects that:
1. Video data is acquired, action classification is performed on it by the action classification model, and the image frames that the model classifies incorrectly are collected, stored in the preset storage area and automatically labeled, so that training data for the action classification model is acquired rapidly and the detection accuracy of existing action categories is improved.
2. A new action category is added by acquiring video data of the new action, configuring a storage area with a category identifier of the new action, storing the video data with automatic labeling, and extracting the video data and category identifiers to train the action classification model.
3. Only the action classification model is trained: only training data related to the action category is added, and only the action classification model is retrained, which reduces training cost.
Drawings
FIG. 1 is a flow chart of a training data acquisition method according to the present invention;
FIG. 2 is a flow chart of adding and identifying new action categories in the training data acquisition process according to the present invention;
FIG. 3 is a schematic diagram of an example of the task of performing motion classification based on video data according to the present invention;
FIG. 4 is a schematic diagram of an example of a training data acquisition method according to the present invention.
Detailed Description
It should be noted that in many cases the actions of a subject must be recognized. For example, in an unattended examination room, it is necessary to recognize whether an examinee's action is a violation, such as looking down to read a book or looking at a mobile phone. In some special office scenarios, such as the service hall of a financial institution like a bank, it is necessary to recognize whether on-site service personnel perform prohibited actions such as eating or sleeping, which damage the enterprise's image. The training data acquisition method is applied to such scenarios: for sensitive or prohibited actions, it improves the accuracy of action classification, rapidly analyzes and classifies the actions performed by the target, and thereby improves the efficiency with which sensitive or prohibited actions are managed.
The following terms are explained:
Existing action: an action whose category already corresponds to a storage area, i.e. an action that the action classification model 8 can recognize.
New action: an action whose category is not present in the training data of the action classification model 8.
The following detailed description of the invention is given with reference to the accompanying drawings and specific embodiments. It should be understood that the specific features of the embodiments are a detailed explanation of the technical solutions of the invention, not a limitation of them, and that the embodiments and their technical features may be combined with one another provided they do not conflict.
The term "and/or" is merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. The character "/", generally indicates that the front and rear associated objects are an or relationship.
The invention provides a training data acquisition method, which comprises the following steps:
and step 1, acquiring video data of the existing action.
In this embodiment, a video capture device 12 may be provided, and the tester 10 may continuously perform existing actions, such as a mobile phone playing action, a transcription action, a eating action, or other actions, in the capturing area of the video capture device 12. After the video data is acquired by the video acquisition device 12, the video data is transmitted to the server 11 via the network. It should be noted that there may be a plurality of types of actions in the existing actions, but the tester 10 may perform only one action in one acquisition. In the present embodiment, the existing action is an action that can be recognized by the action classification model 8.
Step 2: input the video data of the existing action into the trained object detection model 7 to perform object detection and obtain an object detection result; input the object detection result into the trained action classification model 8 to perform action classification and obtain an action classification result.
In this embodiment, the two models implement two-stage detection: Stage 1, the human body detection stage, in which an object recognition model detects the region of the person, i.e. draws a bounding box around the person; Stage 2, the action classification and recognition stage, in which the action classification model 8 detects the action category of the person.
In this embodiment, after obtaining the video data described above, the server 11 inputs it sequentially into the pre-trained object detection model 7 and action classification model 8 to perform object detection and action classification.
In this embodiment, the object detection model 7 may be a YOLO-series object detection model, such as a YOLOv5 model. It may be trained on images containing a person, for example a person in a room such as an office; the training data is preferably images of a person sitting in an office and facing the viewing angle of the video capture device 12. The corresponding annotation targets the person, i.e. a bounding box is drawn around the person in the image.
In this embodiment, the action classification model 8 may be a convolutional neural network, a recurrent neural network, or a spatio-temporal attention network (Spatio-Temporal Attention Network) model. Its training data may likewise be images containing a person, labeled with the action category of the person in the image; here too the training data is preferably images of a person in an office facing the viewing angle of the video capture device 12. The corresponding annotation is the action category of the person in the image, and the action categories correspond to a specific scenario; specifically, each category is a sensitive or prohibited action in that scenario. For example, in a data-leakage-prevention scenario, the categories may include transcribing and photographing a computer screen, actions that can cause data leakage and are therefore sensitive. In an anti-cheating examination scenario, the categories may include playing with a mobile phone. In some scenarios a category may be eating; for example, at the service posts of some large institutions, the actions of on-duty staff must be detected in real time to prevent prohibited actions that would damage the institution's image.
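As an illustration only, the two-stage pipeline can be sketched as follows. The sketch assumes a YOLOv5 person detector loaded via torch.hub; the action_model callable and its (label, score) return value are assumptions standing in for whatever classifier is actually used, not an interface defined by this scheme.

```python
# Minimal sketch of the two-stage detection pipeline (assumed interfaces).
import torch

detector = torch.hub.load("ultralytics/yolov5", "yolov5s")  # stage 1: person detector

def detect_and_classify(frame, action_model):
    """Stage 1: detect persons; stage 2: classify each detected person's action."""
    results = detector(frame)                       # run YOLOv5 on one image array
    outputs = []
    for *box, conf, cls in results.xyxy[0].tolist():
        if int(cls) != 0:                           # COCO class 0 is "person"
            continue
        x1, y1, x2, y2 = map(int, box)
        person_crop = frame[y1:y2, x1:x2]           # the person's bounding-box region
        label, score = action_model(person_crop)    # hypothetical classifier interface
        outputs.append((label, score, (x1, y1, x2, y2)))
    return outputs
```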
Step 3: when the action classification result indicates an action image that was not recognized as belonging to the category of the preset storage area, store the action image in the preset storage area and automatically label it, wherein the storage area is configured with a category identifier of the existing action and the category identifier corresponds to the action category of the existing action being performed.
In this embodiment, a recognition error means that an image should have been recognized as the category corresponding to the existing action of step 1, but the action classification model 8 did not recognize it as that category. For example, suppose the action classification model 8 can recognize the three action categories A, B and C, and in step 1 the tester 10 performs action A; an error occurs when the model outputs B or C, or gives no specific category at all, i.e. the action classification model 8 considers that the action performed by the tester 10 in step 1 belongs to none of A, B and C.
In this embodiment, the storage area corresponding to the existing action is preconfigured. The storage area may be a folder, placed in storage local to the server 11, for example on its hard disk, or in a remote storage repository, for example a file system (FS), a distributed file system (HDFS), a network file system (NFS) or network-attached storage (NAS), or in cloud storage such as Amazon S3 or Google Cloud Storage.
The storage area is configured with a category identifier of the existing action. It may be a folder created in any of the above storage spaces, whose name is the category identifier; for example, if in step 1 the tester 10 performs the action of playing with a mobile phone, the folder may be named "playing with mobile phone".
In this embodiment, the tester 10 continuously performs the same existing action in step 1, so the action classification model 8 should output the same action classification result for every frame, corresponding to that existing action. When some frame is not recognized as the category of the existing action, this indicates that the action classification model 8 may lack training data for that action, so the frame needs to be collected. Accordingly, in this embodiment only the misrecognized image frames are collected as training data (for correctly recognized frames, we consider that the action classification model 8 has already learned the corresponding data representation, so they need not be collected). Furthermore, because it is known in advance which existing action the tester 10 is going to perform, and a storage area carrying that action's category identifier is preset, it is known directly that every image frame in the storage area belongs to that existing action category and was misrecognized by the action classification model 8; in other words, the data representations of the image frames in the storage area have not been learned, or have not been learned sufficiently.
Therefore, in this embodiment, this scheme rapidly collects the images that the action classification model 8 misclassifies or cannot classify and labels them automatically (placing an image into the storage area is the labeling). Training data for the action classification model 8 is thus collected rapidly; in practical application, the images stored in a storage area, with the corresponding category identifier as their label, are used to train the action classification model 8 and thereby improve its accuracy on the existing action categories.
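The collection step can be pictured with a short sketch, assuming the folder-per-category layout described above; the frames_with_preds iterable, the known_action label and the training_data root are illustrative names, and cv2.imwrite stands in for whatever storage backend is used.

```python
# Minimal sketch: store misrecognized frames into the folder whose name
# is the category identifier, which is what makes the labeling automatic.
import os
import cv2

def collect_misclassified(frames_with_preds, known_action, root="training_data"):
    storage_area = os.path.join(root, known_action)   # e.g. training_data/playing_with_mobile_phone
    os.makedirs(storage_area, exist_ok=True)
    saved = 0
    for idx, (frame, predicted) in enumerate(frames_with_preds):
        if predicted != known_action:                 # recognition error -> collect
            cv2.imwrite(os.path.join(storage_area, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
    return saved
```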
Consider that human motion is usually continuous; in other words, a person cannot perform a drinking action at one instant and switch instantaneously to playing with a mobile phone at the next, for there must be a continuous transition between them. For example, the drinking action decomposes into picking up the cup, holding it at the mouth to drink, and putting it down, and together these sub-actions carry the semantics of a person drinking. Therefore, in the time series, the image frames immediately before and after a given moment share a somewhat similar data representation with the frame at that moment; even if the frames around a misrecognized image are themselves recognized correctly, the model may still not have learned their data representation sufficiently, so they also warrant reinforced learning.
For this case, if some frame of the video data is misrecognized, a number of frames may be searched forward or backward from that frame within a certain time window in the time sequence reflected by the video, and placed in the storage area as training data. The time window may be, for example, 1 second or 0.5 seconds. Instead of a time window, one may also traverse forward and/or backward from the misrecognized frame and examine the confidence that the action classification model 8 outputs for each frame, placing every frame whose confidence falls below a certain threshold into the storage area as training data. The threshold may be, for example, 75%.
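A minimal sketch of this expansion follows, assuming per-frame (label, confidence) predictions at a known frame rate; the fps, window and threshold defaults merely follow the example values in the text and are not prescribed by the scheme.

```python
# Minimal sketch: collect each misrecognized frame plus its low-confidence
# neighbours inside a fixed time window.
def frames_to_collect(predictions, known_action, fps=25,
                      window_s=1.0, conf_threshold=0.75):
    window = int(window_s * fps)                      # window size in frames
    selected = set()
    for i, (label, conf) in enumerate(predictions):
        if label == known_action and conf >= conf_threshold:
            continue                                  # correctly learned; skip
        selected.add(i)                               # the misrecognized frame itself
        lo, hi = max(0, i - window), min(len(predictions), i + window + 1)
        for j in range(lo, hi):
            if predictions[j][1] < conf_threshold:    # neighbour not learned well
                selected.add(j)
    return sorted(selected)
```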
In some cases, as user requirements change, recognition of new action categories must be added; for example, the current action classification model 8 can recognize the three categories A, B and C, and the user now requires that the model also recognize action D. For this case, in some embodiments the training data acquisition method further includes the following steps:
Step 4: obtain video data of the tester 10 performing the new action, with the tester 10 continuously performing the same new action. In this embodiment, a new action is an action of a category not included in the training data of the action classification model 8.
Step 5: store the acquired video data of the new action into the storage area corresponding to the new action, thereby automatically labeling the acquired data; the storage area carries a category identifier of the new action.
Step 6: extract the video data and the corresponding category identifiers from the storage area to train the action classification model 8, so that action classification also recognizes the new action.
In this scenario example, the action classification model 8 can already recognize the three categories A, B and C, but its recognition accuracy on them does not meet requirements and must be improved. For example, to improve the recognition accuracy of action A, a folder may be created in advance on the hard disk of the server 11 and named after action A; this folder collects training data for the A category. A tester 10 then continuously performs variants of action A in front of the video capture device 12 (even a simple action such as drinking can be performed in many ways that all convey the meaning of drinking, so action A covers many variants). The camera continuously captures the action, the captured video is fed to the server 11 for real-time recognition and classification, and the classification results contain correctly classified frames and misclassified frames. The misclassified frames are stored in the action-A folder and used to train the action classification model 8, improving its recognition accuracy on action A.
The invention provides a training data acquisition device, which comprises:
and the acquisition module is used for acquiring the video data of the test action.
In this embodiment, a video capture device 12 may be provided, and the tester 10 continuously performs an existing action within the capture area of the video capture device 12. After the video capture device 12 acquires the video data, it transmits the data to the server 11 over the network. It should be noted that although there may be several categories of existing actions, the tester 10 performs only one of them in a single acquisition session. In this embodiment, an existing action is an action that the action classification model can recognize.
The action detection and recognition module is used for inputting the video data of the test action into the trained object detection model to perform object detection and obtain an object detection result, and for inputting the object detection result into the trained action classification model to perform action classification and obtain an action classification result.
In this embodiment, the object detection model may be a YOLO-series object detection model, such as a YOLOv5 model. It may be trained on images containing a person, for example a person in a room such as an office; the training data is preferably images of a person sitting in an office and facing the viewing angle of the video capture device 12. The corresponding annotation targets the person, i.e. a bounding box is drawn around the person in the image.
In this embodiment, the action classification model may be a convolutional neural network, a recurrent neural network, or a spatio-temporal attention network (Spatio-Temporal Attention Network) model. Its training data may likewise be images containing a person, labeled with the action category of the person in the image; here too the training data is preferably images of a person in an office facing the viewing angle of the video capture device 12. The corresponding annotation is the action category of the person in the image, and the action categories correspond to a specific scenario; specifically, each category is a sensitive or prohibited action in that scenario. For example, in a data-leakage-prevention scenario, the categories may include transcribing and photographing a computer screen, actions that can cause data leakage and are therefore sensitive. In an anti-cheating examination scenario, the categories may include playing with a mobile phone. In some scenarios a category may be eating; for example, at the service posts of some large institutions, the actions of on-duty staff must be detected in real time to prevent prohibited actions that would damage the institution's image.
The storage and labeling module is used for storing an action image into the preset storage area and automatically labeling it when the action classification result indicates that the action image was not recognized as belonging to the category of the preset storage area, wherein the storage area is configured with a category identifier of the existing action, and the category identifier corresponds to the action category of the test action being performed.
In this embodiment, a recognition error means that an image should have been recognized as the category corresponding to the existing action of step 1, but the action classification model did not recognize it as that category. For example, suppose the action classification model can recognize the three action categories A, B and C, and in step 1 the tester 10 performs action A; an error occurs when the model outputs B or C, or gives no specific category at all, i.e. the action classification model considers that the action performed by the tester 10 in step 1 belongs to none of A, B and C.
In this embodiment, the storage area corresponding to the existing action is preconfigured. The storage area may be a folder, placed in storage local to the server, for example on its hard disk, or in a remote storage repository, for example a file system (FS), a distributed file system (HDFS), a network file system (NFS) or network-attached storage (NAS), or in cloud storage such as Amazon S3 or Google Cloud Storage.
The invention provides a training data acquisition system, which comprises:
the video acquisition device 12 is used for continuously acquiring the executed actions and uploading acquired data;
in this embodiment, a video capture device 12 may be provided, and the tester 10 may continuously perform existing actions, such as a mobile phone playing action, a transcription action, a eating action, or other actions, in the capturing area of the video capture device 12. After the video data is acquired by the video acquisition device 12, the video data is transmitted to the server 11 via the network. It should be noted that there may be a plurality of types of actions in the existing actions, but the tester 10 may perform only one action in one acquisition. In the present embodiment, the existing action is an action that can be recognized by the action classification model.
The server 11 is used for receiving the collected data and identifying action types, and the server 11 is provided with a pre-trained object detection model and an action classification model;
in the present embodiment, after obtaining the video data described above, the server 11 sequentially inputs the video data into the object detection model and the motion classification model which have been trained in advance, and performs object detection and motion classification.
In the present embodiment, the object detection model may employ an object detection model of the YOLO system series, such as a YOLOv5 model, which may be obtained using training data such as an image with a person, an image of a person in a room, and an office, and in the present embodiment, the training data is preferably an image in which a person sits in an office, and the person is facing the view angle of the video capture device. The corresponding annotation is for the person, i.e. the picture frame is made for the person in the image.
In this embodiment, the motion classification model may be a convolutional neural network or a cyclic neural network, or a Spatio-temporal attention network (space-Temporal Attention Network) model, and the training data of the motion classification model may be obtained by using training data, for example, an image with a person, and the tag is the motion type of the person in the image, in this embodiment, the training data of the motion classification model is also preferably an image of the person in an office, and the person is facing the view angle of the video capture device 12. The corresponding annotation is an action category of the person in the image, the action category corresponds to a specific scene, and specifically, the action category is a sensitive action or a forbidden action in a certain scene. For example, in a scenario where data leakage is prevented, the action category may be an action that a person performs transcription, an action that a person performs photographing on a computer screen, and so on, which may cause data leakage, and thus be a sensitive action. In the scenario of preventing cheating on exams, the action category may also be a cell phone playing action. In some scenarios, the action category may also be an eating action, for example, in the service posts of some large institutions, the actions of staff on the service posts need to be detected in real time, so as to prevent the staff from performing forbidden actions, thereby affecting the image of the institutions.
The client 9 is used for providing a real-time identification interface to display the executed actions and the identification results given by the model in real time; and providing a confidence scoring area for displaying the value of the confidence in real time.
A web page may be provided containing a real-time recognition interface that displays, in real time, the action the user is performing and the recognition result given by the model, i.e. the specific action category. The web page may also contain a confidence scoring area that displays the specific confidence value in real time, so that the tester 10 or another user can view it in real time.
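The same feedback can be mocked up locally before the web client exists; the following sketch, assuming OpenCV and the detect_and_classify helper sketched earlier, overlays the recognized action category and confidence on the live video.

```python
# Minimal sketch: local real-time display of recognized action and confidence.
import cv2

def run_realtime_display(action_model, camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for label, score, (x1, y1, x2, y2) in detect_and_classify(frame, action_model):
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f"{label} {score:.0%}", (x1, y1 - 8),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("real-time recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):         # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```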
The server 11 may be an electronic device with a certain computational processing capability, for example a server of a distributed system, or a system of multiple processors, memories and network communication modules operating in conjunction. The server 11 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host applying artificial intelligence technology, or a cluster formed from several servers. As technology develops, the server 11 may also take new technical forms capable of realizing the functions of the embodiments of this specification, for example a new form of "server" based on quantum computing.
The client may be an electronic device with network access capability, for example a desktop computer, a tablet computer, a notebook computer, or a smartphone. Alternatively, the client may be software running in such an electronic device.
The network described above may be any type of network that supports data communication using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX and the like. It may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g. Bluetooth or WiFi), and/or any combination of these and/or other networks.
Specific implementations of the system or apparatus may be realized with reference to the foregoing method embodiments.
The invention provides a server 11 comprising a processor, a memory and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the training data acquisition method.
The invention provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the training data acquisition method.
In summary, the invention trains only the action classification model 8: only training data related to the action category is added, and only the action classification model 8 is retrained, which reduces the total training cost. To achieve rapid training data acquisition and automatic labeling, the technical scheme sets up a training data storage area for each existing action category or each category to be added; when image data of an action is acquired, the misrecognized images are stored into the corresponding storage area, which accomplishes both rapid acquisition and automatic labeling. At training time, training data is extracted from the storage areas to train the model. This enables new detection categories to be added and the detection accuracy of existing categories to be further improved, meeting rapidly changing action recognition requirements at very low cost.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention has been described above with reference to its embodiments, but it is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Under the inspiration of the present invention, those of ordinary skill in the art may devise many other forms without departing from the spirit of the invention or the scope of the appended claims, all of which fall within the protection of the present invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for collecting training data, comprising:
acquiring video data of a test action;
inputting the video data of the test action into a trained object detection model to perform object detection and obtain an object detection result; inputting the object detection result into a trained action classification model to perform action classification and obtain an action classification result;
when the action classification result indicates an action image that was not recognized as belonging to the category of a preset storage area, storing the action image in the preset storage area and automatically labeling it, wherein the storage area is configured with a category identifier of the existing action and the category identifier corresponds to the action category of the test action being performed.
2. The method of claim 1, further comprising:
acquiring video data of the same new action, wherein the new action is an action of a category not included in the training data of the action classification model;
storing the acquired video data of the new action into a storage area corresponding to the new action and automatically labeling it, wherein the storage area is provided with a category identifier of the new action;
extracting the video data and the corresponding category identifiers from the storage area to train the action classification model to recognize the new action.
3. A method of collecting training data according to claim 1, wherein the same existing action is continuously performed during the collection.
4. The method of claim 1, wherein the object detection model is an object detection model of the YOLO series.
5. The method of claim 1, wherein the action classification model is a convolutional neural network, a recurrent neural network, or a spatio-temporal attention network (Spatio-Temporal Attention Network) model.
6. The method of claim 1, further comprising: displaying the actually performed action and the recognized action in real time, together with a confidence score.
7. A training data acquisition device, comprising:
the acquisition module is used for acquiring video data of the test action;
the action detection and recognition module is used for inputting the video data of the test action into the trained object detection model to perform object detection and obtain an object detection result, and for inputting the object detection result into the trained action classification model to perform action classification and obtain an action classification result;
and the storage and labeling module is used for storing an action image into the preset storage area and automatically labeling it when the action classification result indicates that the action image was not recognized as belonging to the category of the preset storage area, wherein the storage area is configured with a category identifier of the existing action, and the category identifier corresponds to the action category of the test action being performed.
8. A training data acquisition system comprising:
the video acquisition device is used for continuously capturing the performed actions and uploading the captured data;
the server is used for receiving the collected data and identifying action categories, and is provided with a pre-trained object detection model and an action classification model;
the client is used for providing a real-time recognition interface that displays the performed actions and the recognition results given by the models in real time, and a confidence scoring area that displays the confidence value in real time.
9. A server comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor performs the steps of the training data acquisition method according to any one of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the training data acquisition method according to any one of claims 1 to 6.
CN202311272850.2A, filed 2023-09-28 (priority date 2023-09-28): Training data acquisition method, system, device, server and medium. Published as CN117292222A; status: pending.

Priority Applications (1)

Application Number: CN202311272850.2A · Priority Date: 2023-09-28 · Filing Date: 2023-09-28 · Title: Training data acquisition method, system, device, server and medium

Applications Claiming Priority (1)

Application Number: CN202311272850.2A · Priority Date: 2023-09-28 · Filing Date: 2023-09-28 · Title: Training data acquisition method, system, device, server and medium

Publications (1)

Publication Number: CN117292222A · Publication Date: 2023-12-26

Family

ID=89251462

Family Applications (1)

Application Number: CN202311272850.2A · Title: Training data acquisition method, system, device, server and medium · Priority Date: 2023-09-28 · Filing Date: 2023-09-28

Country Status (1)

Country: CN · Publication: CN117292222A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination