WO2021101052A1 - Method and device for weakly supervised learning-based action frame detection using background frame suppression - Google Patents
- Publication number
- WO2021101052A1 (PCT/KR2020/012645)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- class
- activation sequence
- frame
- generating
- behavior
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/144—Movement detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
Definitions
- This invention relates to research on source technology for understanding semantic context based on deep learning, carried out under the Next-Generation Information Computing Technology Development Project with the support of the Ministry of Science and ICT (No. NRF-2017M3C4A7069370).
- A technology for detecting only action frames is therefore required; a video refined by detecting action frames is useful not only to users but also as training data for other deep learning models.
- Patent Document 1: Korean Laid-Open Patent Publication No. 10-2015-0127684 (2015.11.17)
- Patent Document 2: Korean Registered Patent Publication No. 10-0785076 (2007.12.05)
- The main purpose of embodiments of the present invention is to accurately detect action frames based on weakly supervised learning by generating an adjusted class activation sequence through an action classification model in which background frames are suppressed in the feature map extracted from a video.
- According to one aspect, there is provided a method, performed by a computing device, for detecting an action frame based on weakly supervised learning, the method including extracting a feature map from a video and generating an adjusted class activation sequence through an action classification model by applying foreground weights to the feature map.
- The extracting of the feature map may include converting the video into a plurality of color frames and extracting color features from the color frames, converting the video into optical flow frames and extracting optical flow features from the optical flow frames, and generating the feature map by combining the color features and the optical flow features.
- The foreground weights may be computed and adjusted by a filtering model applied to the feature map, so that background frames are filtered out and not activated in the adjusted class activation sequence.
- An adjusted class score may be calculated for the adjusted class activation sequence to perform positive learning for the action class and negative learning for the background class.
- A base class activation sequence may also be generated for the feature map through the action classification model, and a base class score may be calculated for the base class activation sequence to perform positive learning for both the action class and the background class.
- The generating of the base class activation sequence and the generating of the adjusted class activation sequence may share the weights of the action classification model, which are learned jointly.
- According to another aspect, there is provided an apparatus for detecting an action frame, including at least one processor and a memory storing at least one program executed by the at least one processor, wherein the processor extracts a feature map from a video and generates an adjusted class activation sequence through an action classification model by applying foreground weights to the feature map.
- The processor may convert the video into a plurality of color frames and extract color features from the color frames, convert the video into optical flow frames and extract optical flow features from the optical flow frames, and generate the feature map by combining the color features and the optical flow features.
- The processor may generate the adjusted class activation sequence by applying a filtering model to the feature map to adjust the foreground weights, filtering background frames so that they are not activated in the adjusted class activation sequence.
- The processor may generate a base class activation sequence for the feature map through the action classification model while generating the adjusted class activation sequence.
- The processor may calculate a base class score for the base class activation sequence to perform positive learning for both the action class and the background class, and may calculate an adjusted class score for the adjusted class activation sequence to perform positive learning for the action class and negative learning for the background class.
- FIG. 1 is a block diagram illustrating an apparatus for detecting an action frame according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating a method of detecting an action frame according to another embodiment of the present invention.
- FIG. 3 is a diagram illustrating a behavior classification model of a behavior frame detection apparatus according to embodiments of the present invention.
- Supervised learning is a learning strategy in which the correct answer is given; it presupposes that the correct output for each input is known.
- Supervised learning on a data set therefore requires a correct answer (label) for every item in the data set. In contrast, weakly supervised learning uses only coarse labels, such as video-level action labels, without frame-level annotation.
- The present invention generates an adjusted class activation sequence through an action classification model in which background frames are suppressed in the feature map extracted from a video, and thereby accurately detects action frames based on weakly supervised learning.
- FIG. 1 is a block diagram illustrating an apparatus for detecting an action frame according to an embodiment of the present invention.
- the action frame detection apparatus 110 includes at least one processor 120, a computer-readable storage medium 130, and a communication bus 170.
- The processor 120 may control the computing device to operate as the action frame detection apparatus 110.
- the processor 120 may execute one or more programs stored in the computer-readable storage medium 130.
- The one or more programs may include one or more computer-executable instructions that, when executed by the processor 120, may cause the action frame detection apparatus 110 to perform operations according to an exemplary embodiment.
- Computer-readable storage medium 130 is configured to store computer-executable instructions or program code, program data, and/or other suitable form of information.
- the program 140 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 120.
- The computer-readable storage medium 130 includes memory (volatile memory such as random access memory, nonvolatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other forms of storage media that can be accessed by the action frame detection apparatus 110 and store desired information, or a suitable combination thereof.
- The communication bus 170 interconnects various other components of the action frame detection apparatus 110, including the processor 120 and the computer-readable storage medium 130.
- the action frame detection device 110 may also include one or more input/output interfaces 150 and one or more communication interfaces 160 that provide an interface for one or more input/output devices.
- the input/output interface 150 and the communication interface 160 are connected to the communication bus 170.
- the input/output device may be connected to other components of the action frame detection device 110 through the input/output interface 150.
- The action frame detection apparatus generates an adjusted class activation sequence through an action classification model in which background frames are suppressed in the feature map extracted from a video, and accurately detects action frames based on weakly supervised learning.
- The action frame detection apparatus filters background frames so that they are not activated in the adjusted class activation sequence, using a filtering model applied to the feature map, and adjusts the foreground weights to generate the adjusted class activation sequence.
- The action frame detection apparatus generates a base class activation sequence through the action classification model for the feature map while generating the adjusted class activation sequence.
- The action frame detection apparatus calculates a base class score for the base class activation sequence to perform positive learning for the action class and the background class, and calculates an adjusted class score for the adjusted class activation sequence to perform positive learning for the action class and negative learning for the background class.
- The action frame detection method may be performed by an action frame detection apparatus or a computing device.
- In step S210, the processor extracts a feature map from the video.
- The video is converted into a plurality of color frames, and color features are extracted from the color frames.
- The video is also converted into optical flow frames, and optical flow features are extracted from the optical flow frames.
- The feature map is generated by combining the color features and the optical flow features.
- In step S220, the processor generates a base class activation sequence for the feature map through the action classification model.
- A base class score is calculated for the base class activation sequence, and positive learning is performed for both the action class and the background class.
- In step S230, the processor applies foreground weights to the feature map to generate an adjusted class activation sequence through the action classification model.
- The foreground weights are adjusted by a filtering model applied to the feature map, which filters background frames so that they are not activated in the adjusted class activation sequence.
- An adjusted class score is calculated for the adjusted class activation sequence, and positive learning for the action class and negative learning for the background class are performed.
- Steps S220 and S230 share the weights of the action classification model, which are learned jointly.
- FIG. 3 is a diagram illustrating a behavior classification model of a behavior frame detection apparatus according to embodiments of the present invention.
- The action frame detection apparatus includes a feature extraction model and an action classification model.
- The feature extraction model is a network of connected layers that learns weights and biases.
- The feature extraction model may be implemented as a neural network such as a convolutional neural network (CNN).
- The feature extraction model extracts RGB frames and optical flow frames from the input video. The extracted frames are divided into segments of 16 frames each, and 1024-dimensional RGB feature information and 1024-dimensional optical flow feature information are obtained from each segment. The RGB feature information and the optical flow feature information are concatenated to create a 2048-dimensional feature map.
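- As an illustration of this two-stream pipeline, the following is a minimal sketch in Python/PyTorch. The helper names (`build_feature_map`, `rgb_net`, `flow_net`) are hypothetical; the patent does not specify the backbone beyond the segment length and feature dimensions, so any pretrained extractor returning a 1024-dimensional vector per 16-frame segment would fit.

```python
import torch

def build_feature_map(rgb_frames, flow_frames, rgb_net, flow_net, seg_len=16):
    """Divide a video into 16-frame segments and build a 2048-D feature map.

    rgb_frames, flow_frames: tensors of shape (num_frames, C, H, W).
    rgb_net, flow_net: assumed pretrained extractors mapping a 16-frame
    segment to a 1024-D feature vector each.
    """
    num_segments = rgb_frames.shape[0] // seg_len
    features = []
    for i in range(num_segments):
        seg = slice(i * seg_len, (i + 1) * seg_len)
        rgb_feat = rgb_net(rgb_frames[seg])       # (1024,) color features
        flow_feat = flow_net(flow_frames[seg])    # (1024,) optical flow features
        features.append(torch.cat([rgb_feat, flow_feat]))  # (2048,) combined
    return torch.stack(features, dim=0)           # (num_segments, 2048)
```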
- The action classification model predicts the class of input data and assigns a corresponding label.
- Various classification models implemented as neural networks can be applied.
- The feature map is input to a convolutional network, which generates action and background class scores at each time step.
- The generated class activation sequence is trained to be classified into action classes and a background class.
- The total number of classes is expressed as C+1, where C is the number of action classes and one additional class represents the background. Training induces the class activation sequence to be activated in the action portions of the video.
- The action frame detection apparatus generates a base class activation sequence A_n to predict class scores at the segment level: A_n = f_conv(X_n; φ).
- φ is the learnable parameter of the convolutional layer, and X_n is the feature map of the n-th video.
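- A minimal sketch of this step (PyTorch assumed; the number of action classes and the kernel size are placeholders, not values from the patent): a 1-D convolution over the temporal axis maps the 2048-D features to C+1 per-segment class scores.

```python
import torch
import torch.nn as nn

C = 20                                        # number of action classes (placeholder)
classifier = nn.Conv1d(2048, C + 1, kernel_size=1)  # parameters phi; C actions + background

x_n = torch.randn(1, 2048, 100)               # feature map X_n: (batch, features, T segments)
base_cas = classifier(x_n)                    # A_n: (batch, C + 1, T) segment-level scores
```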
- The segment-level class scores are aggregated over time.
- The resulting video-level score is compared with the ground truth.
- The top-k mean is applied to compute the composite video-level score.
- The video-level class score is used to predict the probability (p_n) that each class is present in the video.
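- A sketch of this aggregation (choosing k as a fraction of the sequence length and normalizing with a softmax are assumptions; the patent states only that a top-K average is used):

```python
import torch

def video_level_probability(cas, k_ratio=0.125):
    """Aggregate a class activation sequence (batch, C+1, T) into
    video-level class probabilities via a top-k temporal mean."""
    t = cas.shape[-1]
    k = max(1, int(t * k_ratio))
    topk_scores, _ = cas.topk(k, dim=-1)      # k highest activations per class
    video_scores = topk_scores.mean(dim=-1)   # composite score, (batch, C+1)
    return video_scores.softmax(dim=-1)       # p_n: probability per class
```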
- The loss function of the base processing operation is expressed as Equation 4: L_base = -(1/N) Σ_{n=1..N} y_n · log(p_n), i.e., the cross-entropy between the video-level prediction p_n and the video-level label.
- y_n is the video-level label of the n-th video, and N is the number of videos.
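- With p_n and y_n in hand, a sketch of this loss (normalizing the multi-label target, for videos containing several action classes, is an assumption):

```python
import torch

def base_branch_loss(p, y):
    """Cross-entropy between video-level predictions p (batch, C+1) and
    video-level labels y (batch, C+1); for the base branch the background
    entry of y is set to 1 (see the label discussion below)."""
    y_norm = y / (y.sum(dim=-1, keepdim=True) + 1e-8)  # normalize targets
    return -(y_norm * torch.log(p + 1e-8)).sum(dim=-1).mean()
```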
- The feature map is fed into a filtering module to compute foreground weights, and the foreground weights are then multiplied by the feature map.
- This yields a feature map in which the background is suppressed. As in the base processing operation, the suppressed feature map is input to the convolutional network to generate a class activation sequence and classify the classes. Since the suppression processing operation is designed to suppress background frames, it learns negatively for the background class; as a result, the filtering module learns to compute accurate foreground weights. Positive learning for the background class takes "background present" as the correct answer, whereas negative learning for the background class takes "background absent" as the correct answer.
- The feature map multiplied by the foreground weights is expressed as Equation 5: X'_n = w_n ⊗ X_n, where w_n denotes the foreground weights and ⊗ denotes element-wise multiplication.
- The adjusted class activation sequence is expressed as Equation 6: A'_n = f_conv(X'_n; φ), computed with the same parameters φ shared with the base branch.
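- A minimal sketch of Equations 5 and 6 (the filtering module's layer sizes and sigmoid gate are assumptions; `classifier` and `x_n` are reused from the sketch above, so the parameters φ are literally the same for both branches):

```python
import torch
import torch.nn as nn

filtering = nn.Sequential(                 # filtering module: features -> foreground weights
    nn.Conv1d(2048, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(256, 1, kernel_size=1),
    nn.Sigmoid(),                          # one weight in [0, 1] per segment
)

w_n = filtering(x_n)                       # foreground weights: (batch, 1, T)
x_supp = w_n * x_n                         # Equation 5: X'_n, background suppressed
adj_cas = classifier(x_supp)               # Equation 6: A'_n via the shared classifier
```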
- The loss function of the suppression processing operation is expressed as Equation 7: L_supp = -(1/N) Σ_{n=1..N} y'_n · log(p'_n), the corresponding cross-entropy computed over the adjusted class scores.
- For the base processing operation, the video-level label is y_n with the background-class entry set to 1; that is, the background is treated as present.
- For the suppression processing operation, the video-level label is y'_n with the background-class entry set to 0; that is, the background is treated as absent.
- α, β, and γ are optimization parameters that weight the loss terms, and L1 normalization can be applied to the foreground (attention) weights as a regularizer.
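- Putting the branches together under the label convention above, a hedged sketch of the training objective (assigning α, β, γ to the base loss, the suppression loss, and the L1 term respectively is an assumption consistent with the three parameters named, not a combination stated in the patent; `base_branch_loss` is reused from the sketch above):

```python
import torch

def total_loss(p_base, p_adj, y_action, w, alpha=1.0, beta=1.0, gamma=1e-4):
    """y_action: (batch, C) multi-hot video-level action labels.
    Base branch learns background label 1; suppression branch learns 0."""
    batch = y_action.shape[0]
    y_base = torch.cat([y_action, torch.ones(batch, 1)], dim=-1)   # background = 1
    y_supp = torch.cat([y_action, torch.zeros(batch, 1)], dim=-1)  # background = 0
    l_base = base_branch_loss(p_base, y_base)   # positive learning incl. background
    l_supp = base_branch_loss(p_adj, y_supp)    # negative learning for background
    l_norm = w.abs().mean()                     # L1 regularization on foreground weights
    return alpha * l_base + beta * l_supp + gamma * l_norm
```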
- The action frame detection apparatus detects action frames using the result of the suppression processing operation, in which background frames are suppressed.
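- As one way to read action frames off the suppressed result (the sigmoid squashing and the threshold value are assumptions; the patent does not fix a decision rule):

```python
import torch

def detect_action_segments(adj_cas, class_idx, threshold=0.5):
    """Return (start, end) segment indices where the adjusted class
    activation sequence for class_idx exceeds the threshold."""
    scores = adj_cas[0, class_idx].sigmoid()   # squash raw scores to (0, 1)
    active = (scores > threshold).tolist()
    segments, start = [], None
    for t, on in enumerate(active):
        if on and start is None:
            start = t                          # action segment begins
        elif not on and start is not None:
            segments.append((start, t - 1))    # action segment ends
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    return segments
```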
- FIGS. 4 to 6 show simulation results obtained according to embodiments of the present invention.
- FIG. 4 shows a spike scene from a volleyball video.
- FIG. 5 shows a shot put video.
- FIG. 6 shows a penalty kick scene from a soccer video.
- GT denotes the ground truth.
- The action frame detection apparatus may be implemented as a logic circuit in hardware, firmware, software, or a combination thereof, or may be implemented using a general-purpose or special-purpose computer.
- the device may be implemented using a hardwired device, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like.
- the device may be implemented as a System on Chip (SoC) including one or more processors and controllers.
- The action frame detection apparatus may be mounted on a computing device or server equipped with hardware elements, in the form of software, hardware, or a combination thereof.
- A computing device or server may refer to any of various devices including all or part of a communication device, such as a communication modem, for communicating with other devices or wired/wireless networks, a memory storing data for program execution, and a microprocessor that executes programs to perform computation and issue commands.
- Although each process is described as executing sequentially, this is merely illustrative; those skilled in the art may change the order shown in FIG. 2, execute one or more processes in parallel, or add other processes, in various modifications and variations, without departing from the essential characteristics of the embodiments of the present invention.
- Computer-readable medium refers to any medium that has participated in providing instructions to a processor for execution.
- the computer-readable medium may include program instructions, data files, data structures, or a combination thereof.
- Examples include magnetic media, optical recording media, memory, and the like.
- Computer programs may be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the embodiments can be readily inferred by programmers skilled in the art to which the embodiments belong.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present invention relate to a method and a device for accurately detecting an action frame based on weakly supervised learning, by generating an adjusted class activation sequence through an action classification model that suppresses background frames in a feature map extracted from a video.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190151551A KR102201353B1 (ko) | 2019-11-22 | 2019-11-22 | Method and apparatus for detecting action frames based on weakly supervised learning through background frame suppression
KR10-2019-0151551 | 2019-11-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021101052A1 true WO2021101052A1 (fr) | 2021-05-27 |
Family
ID=74127672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/012645 WO2021101052A1 (fr) | 2019-11-22 | 2020-09-18 | Method and device for weakly supervised learning-based action frame detection using background frame suppression
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102201353B1 (fr) |
WO (1) | WO2021101052A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818829B (zh) * | 2021-01-27 | 2022-09-09 | University of Science and Technology of China | Weakly supervised temporal action localization method and system based on a structured network
CN116612420B (zh) * | 2023-07-20 | 2023-11-28 | University of Science and Technology of China | Weakly supervised video temporal action detection method, system, device, and storage medium
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019099226A1 (fr) * | 2017-11-14 | 2019-05-23 | Google Llc | Weakly-supervised action localization by sparse temporal pooling network
KR20190127261A (ko) * | 2018-05-04 | 2019-11-13 | Industry-Academic Cooperation Foundation, Yonsei University | Method and apparatus for learning class scores of a two-stream network for action recognition
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100785076B1 (ko) | 2006-06-15 | 2007-12-12 | Samsung Electronics Co., Ltd. | Method and apparatus for real-time event detection in sports videos
US9098923B2 (en) | 2013-03-15 | 2015-08-04 | General Instrument Corporation | Detection of long shots in sports video |
-
2019
- 2019-11-22 KR KR1020190151551A patent/KR102201353B1/ko active IP Right Grant
-
2020
- 2020-09-18 WO PCT/KR2020/012645 patent/WO2021101052A1/fr active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019099226A1 (fr) * | 2017-11-14 | 2019-05-23 | Google Llc | Weakly-supervised action localization by sparse temporal pooling network
KR20190127261A (ko) * | 2018-05-04 | 2019-11-13 | Industry-Academic Cooperation Foundation, Yonsei University | Method and apparatus for learning class scores of a two-stream network for action recognition
Non-Patent Citations (3)
Also Published As
Publication number | Publication date |
---|---|
KR102201353B1 (ko) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018217019A1 (fr) | Device for detecting a variant malicious code based on neural network learning, method therefor, and computer-readable recording medium on which a program for executing the method is recorded | |
WO2020246834A1 (fr) | Method for recognizing an object in an image | |
WO2021101052A1 (fr) | Method and device for weakly supervised learning-based action frame detection using background frame suppression | |
WO2021261696A1 (fr) | Visual object instance segmentation using foreground-specialized model imitation | |
WO2020071701A1 (fr) | Method and device for detecting an object in real time by means of a deep learning network model | |
EP3568828A1 (fr) | Apparatus and method for processing images using a multi-channel feature map | |
WO2022158819A1 (fr) | Method and electronic device for determining motion saliency and video playback style in a video | |
WO2021054706A1 (fr) | Teaching GANs (generative adversarial networks) to generate per-pixel annotation | |
WO2017138766A1 (fr) | Hybrid-based image clustering method and server for operating the same | |
CN107316035A (zh) | Object recognition method and device based on a deep learning neural network | |
WO2013048159A1 (fr) | Method, apparatus, and computer-readable recording medium for detecting the location of a facial feature point using an AdaBoost learning algorithm | |
WO2021246810A1 (fr) | Neural network training method using an autoencoder and multi-instance learning, and computing system for carrying out the method | |
WO2020231005A1 (fr) | Image processing device and operating method thereof | |
WO2020231226A1 (fr) | Method for performing, by an electronic device, a convolution operation at a given layer in a neural network, and electronic device therefor | |
WO2021246811A1 (fr) | Method and system for training a neural network to determine severity | |
WO2020017829A1 (fr) | Method for generating a license plate image using a noise pattern, and apparatus therefor | |
CN108921023A (zh) | Method and device for determining low-quality portrait data | |
WO2023013809A1 (fr) | Control method for a sports activity classification learning apparatus, and recording medium and apparatus for implementing the same | |
WO2021091096A1 (fr) | Visual question answering method and apparatus using a fairness classification network | |
CN110852209A (zh) | Target detection method and apparatus, medium, and device | |
WO2019225799A1 (fr) | Method and device for deleting user information using a deep learning generative model | |
WO2022191366A1 (fr) | Electronic device and control method thereof | |
WO2023277448A1 (fr) | Method and system for training an artificial neural network model for image processing | |
WO2023033194A1 (fr) | Specialized knowledge distillation method and system for lightening a deep neural network based on pruning | |
WO2021071258A1 (fr) | Artificial intelligence-based mobile security image learning device and method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20890263 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20890263 Country of ref document: EP Kind code of ref document: A1 |