WO2023136417A1 - Method and device for building a transformer model for video story question answering

Method and device for building a transformer model for video story question answering

Info

Publication number
WO2023136417A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
story
question
transformer model
answer
Prior art date
Application number
PCT/KR2022/012050
Other languages
English (en)
Korean (ko)
Inventor
장병탁
최성호
Original Assignee
서울대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 서울대학교 산학협력단
Publication of WO2023136417A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2477 Temporal data queries
    • G06F 16/70 Information retrieval of video data
    • G06F 16/73 Querying
    • G06F 16/732 Query formulation
    • G06F 16/7328 Query by example, e.g. a complete video frame or video sequence
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval using metadata automatically derived from the content
    • G06F 16/7844 Retrieval using original textual content or text extracted from visual content or transcript of audio data
    • G06F 16/7847 Retrieval using low-level visual features of the video content
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • Embodiments disclosed herein relate to an apparatus and method for building a transformer model for video story question answering, and more particularly, to an apparatus and method for building a transformer model for video story question answering that learns a video story by considering the context of the video clips included in video data.
  • Video question answering measures video comprehension ability by the accuracy of answers to multiple-choice questions posed in natural language.
  • (Patent Document 1) Korean Patent Publication No. 10-2020-0144417 (published on December 29, 2020)
  • Embodiments disclosed in this specification are intended to provide an apparatus and method for building a transformer model for video story question answering that learns a video story in consideration of the context of the video clips included in video data.
  • According to one embodiment, an apparatus for building a transformer model for video story question answering includes: an input/output unit that receives video data including a plurality of consecutive video clips and question data for video question answering, and outputs a video story question answering result; a storage unit that stores programs and data for performing video story question answering; and a control unit that includes at least one processor and builds a transformer model for video story question answering by executing the program, wherein the control unit learns a video story from the video data including the plurality of consecutive video clips by considering the contexts of the preceding and following video clips that are adjacent in temporal order.
  • According to another embodiment, a method of building a transformer model for video story question answering, performed by an apparatus for building a transformer model for video story question answering, includes: receiving video data including a plurality of consecutive video clips and question data for video question answering; and learning a video story from the video data including the plurality of consecutive video clips by considering the contexts of the preceding and following video clips that are adjacent in temporal order.
  • According to another embodiment, a computer-readable recording medium records a program for performing the method of building a transformer model for video story question answering described above.
  • According to another embodiment, a computer program is executed by an apparatus for building a transformer model for video story question answering and is stored in a recording medium to perform the method of building a transformer model for video story question answering described above.
  • According to any one of the above-described solutions, by building a transformer that considers the context of the video clips included in the video data, video story question answering can effectively process a long video without incurring a large computational cost.
  • FIG. 1 is a diagram for explaining a transformer model according to the prior art.
  • FIG. 2 is a diagram for explaining a transformer model according to an exemplary embodiment.
  • FIG. 3 is a functional block diagram of a device for building a transformer model for video story question and answer according to an embodiment.
  • FIG. 4 is a flowchart illustrating a method of constructing a transformer model for video story question and answer according to an exemplary embodiment.
  • FIG. 1 is a diagram for explaining a transformer model according to the prior art.
  • FIG. 1 shows a transformer model according to the prior art, namely the structure of a transformer model used for video representation learning.
  • the transformer model may be a vanilla transformer.
  • the transformer shown in FIG. 1 may be configured such that the encoder 100 is separated for each layer for all video frames.
  • The encoder 100 separated for each section (S1, S2, S3) in the transformer shown in FIG. 1 may be a temporal transformer.
  • Video story question and answer can be performed using the transformer model shown in FIG. 1; however, because the transformer shown in FIG. 1 does not consider the context of the video clips included in the input video data, its computational cost increases exponentially as the length of the video grows, so it has been used only for short video story question and answer.
  • Accordingly, a transformer capable of processing a long video more effectively has been required, and a transformer that considers the context of the video clips included in video data has therefore been constructed.
  • a transformer in consideration of the context of a video clip included in video data according to an exemplary embodiment will be described later in detail with reference to FIGS. 2 and 3 .
  • FIG. 2 is a diagram for explaining a transformer model according to an exemplary embodiment.
  • the transformer model may be a contextual transformer.
  • the transformer shown in FIG. 2 may be configured such that the encoder 200 is separated for each layer for all video frames.
  • The encoder 200 separated for each section (S1, S2, S3) learns the video story by considering the context of the video clips included in the input video data, and the number of preceding and following video clips that can be considered may increase as the layers go higher.
  • a video clip may mean a short recorded video.
  • The video data may include a plurality of consecutive video clips.
  • the above-described video clips may include a plurality of visual tokens and text tokens.
  • The encoder 200 separated for each section (S1, S2, S3) may be a cross-modal transformer, and the above-described cross-modal transformer may receive the visual tokens and text tokens corresponding to each section (S1, S2, S3) as inputs.
  • the above-described transformer of FIG. 2 may be built by an apparatus for building a transformer model for video story question and answer shown in FIG. 3 .
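As an illustration of the section-wise cross-modal encoder described above, the sketch below shows, in PyTorch, how an encoder for one section might jointly consume visual tokens and text tokens. This is a minimal sketch under stated assumptions, not the disclosed implementation: the class name, the modality-embedding scheme, the dimensions, and the layer counts are all illustrative; only the idea that each section encoder receives the section's visual and text tokens as one input sequence is taken from the description.

```python
import torch
import torch.nn as nn

class CrossModalSectionEncoder(nn.Module):
    """Illustrative encoder for one section (e.g. S1): jointly attends over
    the N visual tokens and M text tokens of the clips in that section."""

    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        # Modality embeddings distinguish visual tokens from text tokens.
        self.modality_emb = nn.Embedding(2, dim)  # 0 = visual, 1 = text

    def forward(self, visual_tokens, text_tokens):
        # visual_tokens: (B, N, dim); text_tokens: (B, M, dim)
        v = visual_tokens + self.modality_emb.weight[0]
        t = text_tokens + self.modality_emb.weight[1]
        x = torch.cat([v, t], dim=1)   # joint sequence of N+M tokens
        return self.encoder(x)         # cross-modal hidden states

# Example: a section with N=16 visual and M=8 text tokens, hidden dim 512.
enc = CrossModalSectionEncoder()
h = enc(torch.randn(2, 16, 512), torch.randn(2, 8, 512))
print(h.shape)  # torch.Size([2, 24, 512])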
  • FIG. 3 is a functional block diagram of a device for building a transformer model for video story question and answer according to an embodiment.
  • an apparatus 300 for constructing a transformer model for video story question and answer includes an input/output unit 310, a storage unit 320, and a control unit 330.
  • The input/output unit 310 may include an input unit for receiving input from a user and an output unit for displaying information such as a task execution result or the status of the apparatus 300 for building a transformer model for video story question and answer. That is, the input/output unit 310 is a component that receives video data including a plurality of consecutive video clips and question data for video question answering, and outputs a video story question answering result.
  • the video clip may include a plurality of visual tokens and text tokens.
  • the storage unit 320 is a component capable of storing files and programs, and may be configured through various types of memories.
  • the storage unit 320 may store data and programs that enable the controller 330 to build a transformer model for video story question and answer according to an algorithm presented below.
  • The controller 330 is a component including at least one processor, such as a CPU or a GPU, and can control the overall operation of the apparatus 300 for building a transformer model for video story question and answer. That is, the controller 330 may control the other elements included in the apparatus 300 for building a transformer model for video story question and answer so as to perform video story question and answer.
  • The control unit 330 may perform operations to build a transformer model for video story question and answer according to the algorithm presented below by executing a program stored in the storage unit 320. A method by which the controller 330 performs operations to build a transformer model for video story question and answer will be described later.
  • the controller 330 may learn a video story from video data including a plurality of continuous video clips by considering contexts of video clips before and after that are adjacent to each other in temporal order.
  • the video clip may include a plurality of visual tokens and text tokens.
  • According to an embodiment, the video data input through the input/output unit 310 can be expressed as $T$ consecutive video clips $v_1, v_2, \ldots, v_T$.
  • Each video clip $v_t$ may include $N$ visual tokens and $M$ text tokens.
  • When a transformer having a general structure according to the prior art is used, a hidden representation of the video clip $v_t$ at layer $l$ can be generated, and it can be expressed as $h_t^l \in \mathbb{R}^{(N+M) \times d}$. In this case, $d$ may mean the hidden dimension.
  • According to an embodiment, the hidden representation can be modified and used as shown in Equation 1 below:

$$\tilde{h}_t^{l-1} = \left[\operatorname{SG}\!\left(h_{t-1}^{l-1}\right) \circ h_t^{l-1}\right], \quad q_t^l = h_t^{l-1} W_q^{\top}, \quad k_t^l = \tilde{h}_t^{l-1} W_k^{\top}, \quad v_t^l = \tilde{h}_t^{l-1} W_v^{\top}, \quad h_t^l = \operatorname{TransformerLayer}\!\left(q_t^l, k_t^l, v_t^l\right) \tag{1}$$

  • Here, $q_t^l$, $k_t^l$, and $v_t^l$ correspond to the query, key, and value of the transformer structure, respectively, and $m$ may mean the memory length. Meanwhile, $\tilde{h}_t^{l-1} \in \mathbb{R}^{(m+N+M) \times d}$ is the extended context, and using it only for the key and value can be a difference from transformers according to the prior art. Also, $W_q$, $W_k$, and $W_v$ are the linear projection parameters to be learned, and $\operatorname{SG}(\cdot)$ may mean a stop-gradient. On the other hand, if the recurrent transformer according to Equation 1 described above is modified to consider the context of both the preceding and following video clips, it can be expressed as Equation 2 below:

$$\tilde{h}_t^{l-1} = \left[\operatorname{SG}\!\left(h_{t-1}^{l-1}\right) \circ h_t^{l-1} \circ \operatorname{SG}\!\left(h_{t+1}^{l-1}\right)\right] \tag{2}$$
  • the controller 330 may build a transformer model for video story question and answer using Equation 2 described above.
  • The control unit 330 may receive, through each of the separated encoders, the visual tokens and text tokens included in each video clip corresponding to a preset section as inputs, and calculate hidden representations of the lower layers of the adjacent preceding and following video clips.
  • The video story can then be learned by calculating a representation of the video data that considers the context using the calculated hidden representations.
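The following PyTorch sketch illustrates the computation of Equations 1 and 2 as described above: the query is formed from the current clip only, while the keys and values attend over an extended context built from stop-gradient copies of the neighboring clips' lower-layer hidden states (Equation 1 uses only the preceding clip; the code implements the bidirectional Equation 2 form). It is an illustrative approximation under stated assumptions, not the patented implementation; the class name, shapes, and the use of nn.MultiheadAttention (whose internal projections play the role of $W_q$, $W_k$, and $W_v$) are assumptions.

```python
import torch
import torch.nn as nn

class ContextualAttention(nn.Module):
    """Illustrative sketch of Equations 1-2: the query comes from the current
    clip's lower-layer hidden states only, while the keys and values use an
    extended context that concatenates stop-gradient copies of the neighboring
    clips' hidden states (SG is realized with detach())."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        # The q/k/v projections of Equation 1 live inside MultiheadAttention.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, h_prev, h_cur, h_next):
        # h_*: (B, N+M, dim) hidden states of clips t-1, t, t+1 at layer l-1.
        ctx = torch.cat([h_prev.detach(), h_cur, h_next.detach()], dim=1)
        out, _ = self.attn(query=h_cur, key=ctx, value=ctx)
        return out  # hidden states of clip t at layer l, shape (B, N+M, dim)

# One layer pass over T=3 consecutive clips (zero padding at the boundaries).
layer = ContextualAttention()
clips = [torch.randn(1, 24, 512) for _ in range(3)]  # N+M = 24 tokens per clip
pad = torch.zeros_like(clips[0])
padded = [pad] + clips + [pad]
upper = [layer(padded[t - 1], padded[t], padded[t + 1]) for t in range(1, 4)]
```

Because the neighbor states are detached, gradients flow only through the current clip, which matches the stop-gradient role of $\operatorname{SG}(\cdot)$ and keeps the cost of processing a long video linear in the number of clips.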
  • the controller 330 may learn a temporal order for each video clip using a masked modality model (hereinafter referred to as MMM).
  • MMM may be an extension of the token-based masking technique proposed in the earlier masked language model, in which all tokens of a given section are masked.
  • The masked modality model allows one modality to be generated from the other modality while preventing the encoders from generating the masked tokens too easily from the surrounding tokens, so that the alignment between the modalities can be learned.
  • Here, the modalities may be video and text. Accordingly, when the above-described learning is performed using the contextual transformer according to an embodiment, the content of a segment (e.g., video data separated by section) can be predicted based on the preceding and following context, so that the natural flow of the story can be learned.
  • the masked modality model may be learned through negative contrastive learning.
  • In this case, the masked modality model can be expressed as Equation 3 below:

$$\mathcal{L}_{\mathrm{MMM}} = -\sum_{i \in \mathcal{M}} \log \frac{\exp\!\left(\hat{x}_i^{\top} x_i\right)}{\sum_{j} \exp\!\left(\hat{x}_i^{\top} x_j\right)} \tag{3}$$

  • Accordingly, the predicted token $\hat{x}_i$ is pulled closer to the ground-truth token embedding $x_i$ and pushed away from the other tokens $x_j$, where $\mathcal{M}$ may mean the set of masked positions.
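As a concrete reading of the contrastive objective, the sketch below implements an InfoNCE-style negative contrastive loss that pulls each predicted masked token toward its ground-truth embedding and pushes it away from other candidate tokens. The exact form of Equation 3, the function name, and the shapes are assumptions consistent with the description, not the disclosed formula.

```python
import torch

def mmm_contrastive_loss(predicted, target, candidates):
    """Sketch of an Equation-3-style objective: each predicted token embedding
    is pulled toward its ground-truth token embedding (positive) and pushed
    away from the other candidate tokens (negatives).

    predicted:  (K, d) encoder outputs at the K masked positions
    target:     (K, d) ground-truth embeddings of the masked tokens
    candidates: (C, d) candidate token embeddings (positives + negatives)
    """
    logits = predicted @ candidates.t()                   # (K, C) similarities
    pos = (predicted * target).sum(dim=-1, keepdim=True)  # (K, 1) positive score
    # -log( exp(pos) / sum_j exp(logits_j) ), averaged over masked positions
    return (torch.logsumexp(logits, dim=-1, keepdim=True) - pos).mean()

# Example: 4 masked tokens, 100 candidates, hidden dimension 512.
pred, tgt = torch.randn(4, 512), torch.randn(4, 512)
cands = torch.cat([tgt, torch.randn(96, 512)])
print(mmm_contrastive_loss(pred, tgt, cands))
```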
  • FIG. 4 is a flowchart illustrating a method of constructing a transformer model for video story question and answer according to an exemplary embodiment.
  • The method of building a transformer model for video story question and answer according to the embodiment shown in FIG. 4 includes steps that are processed time-sequentially in the apparatus 300 for building a transformer model for video story question and answer shown in FIGS. 2 and 3. Therefore, even if the contents are omitted below, the above description of the apparatus 300 shown in FIGS. 2 and 3 can also be applied to the method of building a transformer model for video story question and answer according to the embodiment shown in FIG. 4.
  • First, the apparatus 300 for building a transformer model for video story question and answer may receive video data including a plurality of consecutive video clips and question data for video question answering (S410).
  • the video clip may include a plurality of visual tokens and text tokens.
  • The apparatus 300 for building a transformer model for video story question and answer may learn a video story from the video data including the plurality of consecutive video clips input in step S410 by considering the contexts of the preceding and following video clips that are adjacent in temporal order (S420).
  • Specifically, the apparatus 300 may receive, through each of the separated encoders, the visual tokens and text tokens included in each video clip corresponding to a preset section as inputs, calculate hidden representations of the lower layers of the adjacent preceding and following video clips, and learn the video story by calculating a representation of the video data that considers the context using the calculated hidden representations.
  • In addition, the apparatus 300 for building a transformer model for video story question and answer can learn the temporal order of each video clip by using the masked modality model (MMM).
  • MMM may be an extension of the token-based masking technique proposed in the earlier masked language model, in which all tokens of a given section are masked.
  • MMM allows one modality to be generated from the other modality while preventing the encoders from generating the masked tokens too easily from the surrounding tokens, so that the alignment between the modalities can be learned. Meanwhile, the masked modality model may be learned through negative contrastive learning. In this case, the masked modality model can be expressed as Equation 3 described above.
  • The term '~unit' used in the above embodiments means software or a hardware component such as a field-programmable gate array (FPGA) or an ASIC, and a '~unit' performs certain roles.
  • However, a '~unit' is not limited to software or hardware.
  • A '~unit' may be configured to reside in an addressable storage medium and may be configured to run on one or more processors. Therefore, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • Components and '~units' may be implemented to run one or more CPUs in a device or a secure multimedia card.
  • the method for building a transformer model for video story question and answer may be implemented in the form of a computer-readable medium storing instructions and data executable by a computer.
  • instructions and data may be stored in the form of program codes, and when executed by a processor, a predetermined program module may be generated to perform a predetermined operation.
  • computer-readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • A computer-readable medium may be a computer recording medium, that is, volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • the computer recording medium may be a magnetic storage medium such as HDD and SSD, an optical recording medium such as CD, DVD, and Blu-ray disc, or a memory included in a server accessible through a network.
  • the method of building a transformer model for video story question and answer may be implemented as a computer program (or computer program product) including instructions executable by a computer.
  • a computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, object-oriented programming language, assembly language, or machine language.
  • the computer program may be recorded on a tangible computer-readable recording medium (eg, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).
  • A computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device.
  • Each of these components is connected to the others using various buses and may be mounted on a common motherboard or in any other suitable manner.
  • The processor may process instructions within the computing device, for example instructions stored in the memory or the storage device, in order to display graphic information for providing a graphical user interface (GUI) on an external input/output device, such as a display connected to the high-speed interface.
  • multiple processors and/or multiple buses may be used along with multiple memories and memory types as appropriate.
  • the processor may be implemented as a chipset comprising chips including a plurality of independent analog and/or digital processors.
  • Memory also stores information within the computing device.
  • the memory may consist of a volatile memory unit or a collection thereof.
  • the memory may be composed of a non-volatile memory unit or a collection thereof.
  • Memory may also be another form of computer readable medium, such as, for example, a magnetic or optical disk.
  • a storage device may provide a large amount of storage space to the computing device.
  • A storage device may be a computer-readable medium or a component including such a medium, and may include, for example, devices in a storage area network (SAN) or other components, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, flash memory, or another similar semiconductor memory device or device array.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Disclosed herein are a device and method for building a transformer model for video story question answering. The device for building a transformer model for video story question answering comprises: an input/output unit for receiving video data comprising a plurality of consecutive video clips and question data for video question answering, and for outputting the result of executing operations on the video data and the question data; a storage unit in which a program and data for video story question answering are stored; and a control unit which comprises at least one processor and which builds a transformer model for video story question answering by executing the program, wherein the control unit learns a video story from the video data comprising the plurality of consecutive video clips by considering the context of the chronologically adjacent video clips.
PCT/KR2022/012050 2022-01-14 2022-08-11 Method and device for building a transformer model for video story question answering WO2023136417A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0005770 2022-01-14
KR1020220005770A KR20230109931A (ko) 2022-01-14 2022-01-14 Apparatus and method for building a transformer model for video story question answering

Publications (1)

Publication Number Publication Date
WO2023136417A1 true WO2023136417A1 (fr) 2023-07-20

Family

ID=87279250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/012050 WO2023136417A1 (fr) 2022-08-11 Method and device for building a transformer model for video story question answering

Country Status (3)

Country Link
JP (1) JP2023103966A (fr)
KR (1) KR20230109931A (fr)
WO (1) WO2023136417A1 (fr)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102276728B1 (ko) Multimodal content analysis system and method therefor

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101369270B1 * 2012-03-29 2014-03-10 서울대학교산학협력단 Video stream analysis method using multi-channel analysis
KR20190056940A * 2017-11-17 2019-05-27 삼성전자주식회사 Method and apparatus for learning multimodal data
KR102211939B1 * 2018-12-07 2021-02-04 서울대학교산학협력단 Question answering apparatus and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHOI, Seongho et al.: "Multi-modal Contextual Transformer for Video Question Answering", Proceedings of Korea Software Congress 2021, December 2021, pages 801-803, XP009547739 *
LAURIOLA, Ivano; MOSCHITTI, Alessandro: "Context-based Transformer Models for Answer Sentence Selection", arXiv.org, 1 June 2020, XP081690019 *
XU, Hu; GHOSH, Gargi; HUANG, Po-Yao; ARORA, Prahal; AMINZADEH, Masoumeh; FEICHTENHOFER, Christoph; METZE, Florian; ZETTLEMOYER, Luke: "VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding", Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, Stroudsburg, PA, USA, 2021, pages 4227-4239, XP093078913, DOI: 10.18653/v1/2021.findings-acl.370 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117439800A (zh) * 2023-11-21 2024-01-23 河北师范大学 Network security situation prediction method, system, and device
CN117439800B (zh) * 2023-11-21 2024-06-04 河北师范大学 Network security situation prediction method, system, and device

Also Published As

Publication number Publication date
KR20230109931A (ko) 2023-07-21
JP2023103966A (ja) 2023-07-27

Similar Documents

Publication Publication Date Title
Burns et al. A dataset for interactive vision-language navigation with unknown command feasibility
JP6267711B2 Modernization of legacy software systems based on modeled dependencies
WO2017164478A1 Method and apparatus for recognizing micro-expressions through deep-learning analysis of micro-facial dynamics
US10553207B2 Systems and methods for employing predication in computational models
US10664659B2 Method for modifying segmentation model based on artificial intelligence, device and storage medium
CN102741859A Method and apparatus for reducing power consumption in a pattern recognition processor
US20190130270A1 Tensor manipulation within a reconfigurable fabric using pointers
WO2020231005A1 Image processing device and operation method thereof
WO2022163996A1 Device for predicting drug-target interaction by using self-attention-based deep neural network model, and method therefor
WO2023136417A1 Method and device for building a transformer model for video story question answering
WO2022059969A1 Deep neural network pre-training method for classifying electrocardiogram data
WO2018056613A1 Multithreaded processor and control method therefor
WO2022080582A1 Goal-oriented reinforcement learning method and device for carrying out same
CN110647360A Processing method, apparatus, device, and computer-readable storage medium for device-executed code of a coprocessor
US20210264247A1 Activation function computation for neural networks
WO2022025357A1 Block coding processing method for programming education
WO2023068463A1 Storage device system for quantum circuit simulation
WO2021045434A1 Electronic device and control method therefor
WO2023106466A1 Artificial intelligence cloud learning apparatus and method based on learning cloud type
WO2021020848A2 Matrix operator and matrix operation method for artificial neural network
Chamunorwa et al. Embedded system learning platform for developing economies
WO2023101112A1 Offline meta-reinforcement learning method for multiple tasks and computing device for performing same
CN111340043B Key point detection method, system, device, and storage medium
US20210019592A1 Cooperative Neural Network for Recommending Next User Action
US20200184369A1 Machine learning in heterogeneous processing systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22920750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE