WO2024057374A1 - Extraction system, extraction method, and extraction program - Google Patents

Extraction system, extraction method, and extraction program

Info

Publication number
WO2024057374A1
Authority
WO
WIPO (PCT)
Prior art keywords
extraction
data
policy
tokens
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/034124
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
モニカ ロスリアナ ブスト
昇平 榎本
毅晴 江田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to PCT/JP2022/034124 (WO2024057374A1)
Priority to JP2024546692A (JPWO2024057578A1)
Priority to PCT/JP2023/006600 (WO2024057578A1)
Publication of WO2024057374A1
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • the present invention relates to an extraction system, an extraction method, and an extraction program.
  • in edge computing, data acquired by sensors placed at the edge is offloaded to cloud servers.
  • schemes in which edge devices and cloud servers share computation, and data compression for offloading from edge devices to cloud servers, are being considered.
  • a technique for extracting a region of interest (ROI) is known (see Non-Patent Documents 1 and 2). For small-scale systems, extracting the ROI as pre-processing for offloading is sufficient because it minimizes the data size.
  • the present invention has been made in view of the above, and aims to reduce data offloaded from edge devices to cloud servers in transformer-based collaborative intelligence.
  • an extraction device includes an acquisition unit that acquires data to be processed and a policy that specifies the tokens to be extracted from among the tokens constituting the data, and an extraction unit that extracts, according to the policy, the tokens to be transmitted to a cloud server from the data.
  • FIG. 1 is a diagram for explaining an overview of a system including an extraction device.
  • FIG. 2 is a diagram for explaining an overview of a system including an extraction device.
  • FIG. 3 is a diagram for explaining an overview of a system including an extraction device.
  • FIG. 4 is a schematic diagram illustrating a schematic configuration of a system including an extraction device.
  • FIG. 5 is a diagram for explaining the processing of the system including the extraction device.
  • FIG. 6 is a flowchart showing the extraction processing procedure.
  • FIG. 7 illustrates an example of a computer that executes an extraction program.
  • Extraction system overview: FIGS. 1 to 3 are diagrams for explaining the outline of the extraction system.
  • the extraction system of this embodiment compresses and offloads data from an edge device to a cloud server in transformer-based collaborative intelligence.
  • tokens are defined by dividing the original image of size $H \times W \times C$ (height x width x number of channels) into N patches, each of size $P^2 \times C$.
  • as illustrated in FIG. 2, unlike general object segmentation, information regarding importance according to an attention-based policy is included. This allows data to be reduced because an object is input to the transformer only when importance is associated with it.
  • the policy output is used to modify the JPEG-format data to be offloaded, so no new encoder or decoder design is required.
  • the transformer on the cloud server performs self-supervised relearning using auxiliary tokens to achieve robust inference. As a result, as illustrated in FIG. 3, even if input data is reduced, highly accurate inference is possible.
  • FIG. 4 is a schematic diagram illustrating a schematic configuration of the extraction system. Further, FIG. 5 is a diagram for explaining the processing of the extraction system.
  • the extraction system 1 of this embodiment includes an extraction device 10, a cloud server 20, and an edge device 30. Note that the extraction device 10 may be implemented in the same hardware as the edge device 30.
  • the extraction device 10 is realized by a general-purpose computer such as a personal computer, and includes a communication control unit 13, a storage unit 14, and a control unit 15.
  • the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between an external device and the control unit 15 via a telecommunication line such as a LAN (Local Area Network) or the Internet.
  • the communication control unit 13 controls communication between the cloud server 20, the edge device 30, etc., and the control unit 15.
  • the storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
  • in the storage unit 14, a processing program for operating the extraction device 10, data used during execution of the processing program, and the like are stored in advance or are temporarily stored each time processing is performed.
  • the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.
  • the storage unit 14 stores a policy 14a and the like used in the extraction process described later.
  • the control unit 15 is realized using a CPU (Central Processing Unit), an NP (Network Processor), an FPGA (Field Programmable Gate Array), etc., and executes a processing program stored in a memory. Thereby, the control unit 15 functions as an acquisition unit 15a and an extraction unit 15b, as illustrated in FIG. Note that these functional units may be implemented in different hardware. Further, the control unit 15 may include other functional units.
  • the acquisition unit 15a acquires the data to be processed and a policy 14a that specifies the tokens to be extracted from among the tokens that make up the data. For example, the acquisition unit 15a acquires an image (image data) to be processed from the edge device 30 via an input unit (not shown) or the communication control unit 13.
  • the acquisition unit 15a may store the acquired data in the storage unit 14 prior to the extraction process described below. Alternatively, the acquisition unit 15a may not store this information in the storage unit 14, but may immediately transfer it to the extraction unit 15b described below.
  • the acquisition unit 15a also acquires a policy 14a that specifies the token to be extracted from among the tokens that make up the data.
  • the policy 14a is trained to identify tokens among the tokens that make up the input data according to their importance for the task.
  • the offloading policy is trained to imitate the attention map of a self-supervised teacher model such as DINO (Self-Distillation with NO labels).
  • the loss function $L_{KL}$ is expressed by the following equation (1).
  • the attention rank r is used to improve the distillation of the attention map.
  • the ranks are compared using Spearman's rank correlation coefficient $r_s$, and the loss function $L_{RANK}$ is expressed by the following equation (2).
  • the attention score is a measure of importance.
  • rank loss is used so that the policy 14a learns to maintain the same ranking of token importance as the teacher model.
  • the loss function of the learning target is obtained by combining the knowledge distillation loss function $L_{KL}$, which uses the KL divergence in equation (1) above, and the rank loss function $L_{RANK}$ in equation (2) above, as shown in the following equation (3).
  • the extraction unit 15b extracts the tokens to be sent to the cloud server 20 from the data according to the policy 14a. Specifically, as shown in area b of FIG. 5, the extraction unit 15b retains only a predetermined number k of the top-ranked tokens according to the attention values output by the policy 14a, discards the rest, and generates the data to be offloaded. The data to be offloaded is JPEG format data in which the tokens have been changed (retained or discarded).
  • the number of bits used to represent the frequency components of the zero pixel region is significantly reduced, so the data size of the continuous zero pixel region is effectively reduced.
  • the cloud server 20 is virtually constructed on a general-purpose computer such as a server device, and includes a storage unit 24 and a control unit 25.
  • the storage unit 24 is realized by a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.
  • in the storage unit 24, a processing program for operating the cloud server 20, data used during execution of the processing program, and the like are stored in advance or are temporarily stored each time processing is performed.
  • the storage unit 24 may be configured to communicate with the control unit 25 via a communication control unit (not shown).
  • the storage unit 24 stores a model 24a used in extraction processing described later.
  • the control unit 25 is realized using a CPU, NP, FPGA, etc., and executes a processing program stored in a memory. Thereby, the control unit 25 functions as a prediction unit 25a and a learning unit 25b, as illustrated in FIG. Note that the control unit 25 may include other functional units.
  • the prediction unit 25a inputs the extracted tokens into the model 24a to predict data.
  • This model 24a is, for example, a ViT (Vision Transformer). In order to maintain the robustness of the ViT model to sparse inputs, it is necessary to perform retraining.
  • the learning unit 25b re-learns the model 24a by adding tokens whose importance is equal to or higher than a predetermined threshold. Specifically, as shown in area c of FIG. 5, the learning unit 25b performs relearning of ViT by rotation prediction as a self-supervised task. At that time, a CLS token and an auxiliary token are added. In this case, the rotation prediction loss function $L_{ROT}$ is expressed by the following equation (4).
  • the target loss function for relearning is expressed as a combination of the rotation prediction loss function $L_{ROT}$ of equation (4) above and the loss function $L_{TASK}$ of a task such as classification, as shown in the following equation (5).
  • FIG. 6 is a flowchart showing the extraction processing procedure.
  • the flowchart in FIG. 6 is started, for example, at the timing when the user performs an operation input instructing to start.
  • the acquisition unit 15a acquires the data to be processed and the policy 14a that specifies the token to be extracted from among the tokens that make up the data (step S1). For example, the acquisition unit 15a acquires an image (image data) to be processed from the edge device 30 via the input unit or the communication control unit 13.
  • the extraction unit 15b extracts the tokens to be sent to the cloud server 20 from the data to be processed according to the policy 14a (step S2). Specifically, the extraction unit 15b retains only a predetermined number k of the top-ranked tokens according to the attention values output by the policy 14a, discards the rest, and generates the data to be offloaded.
  • the prediction unit 25a predicts class probabilities by inputting the extracted tokens into a model 24a such as ViT (step S3). This completes the series of extraction processes.
  • the acquisition unit 15a acquires the data to be processed and the policy 14a that specifies the tokens to be extracted from among the tokens constituting the data.
  • the extraction unit 15b extracts a token to be transmitted to the cloud server 20 from the data according to the policy 14a.
  • the policy 14a is trained to identify tokens among the tokens that make up the input data according to their importance for the task. This makes it possible to reduce the amount of data offloaded from the edge device 30 to the cloud server 20 while minimizing the impact.
  • the prediction unit 25a inputs the extracted tokens into the model 24a to predict class probabilities. This makes it possible to reproduce the data to be processed.
  • the learning unit 25b re-learns the model 24a by adding tokens whose importance is greater than or equal to a predetermined threshold. As a result, even if input data is reduced, it is possible to suppress a decrease in task prediction accuracy.
  • the extraction device 10 can be implemented by installing an extraction program that executes the above extraction process on a desired computer as packaged software or online software.
  • the information processing device can be made to function as the extraction device 10.
  • the information processing device referred to here includes a desktop or notebook personal computer.
  • information processing devices include mobile communication terminals such as smartphones, mobile phones, and PHSs (Personal Handyphone Systems), as well as slate terminals such as PDAs (Personal Digital Assistants).
  • the functions of the extraction device 10 may be implemented in a cloud server.
  • FIG. 7 is a diagram showing an example of a computer that executes the extraction program.
  • Computer 1000 includes, for example, memory 1010, CPU 1020, hard disk drive interface 1030, disk drive interface 1040, serial port interface 1050, video adapter 1060, and network interface 1070. These parts are connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
  • the ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System).
  • Hard disk drive interface 1030 is connected to hard disk drive 1031.
  • Disk drive interface 1040 is connected to disk drive 1041.
  • a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041, for example.
  • a mouse 1051 and a keyboard 1052 are connected to the serial port interface 1050.
  • a display 1061 is connected to the video adapter 1060.
  • the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiments is stored in, for example, the hard disk drive 1031 or the memory 1010.
  • the extraction program is stored in the hard disk drive 1031, for example, as a program module 1093 in which commands to be executed by the computer 1000 are written.
  • a program module 1093 in which each process executed by the extraction device 10 described in the above embodiment is described is stored in the hard disk drive 1031.
  • data used for information processing by the extraction program is stored as program data 1094 in, for example, the hard disk drive 1031.
  • the CPU 1020 reads out the program module 1093 and program data 1094 stored in the hard disk drive 1031 to the RAM 1012 as necessary, and executes each of the above-described procedures.
  • the program module 1093 and program data 1094 related to the extraction program are not limited to being stored in the hard disk drive 1031; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like.
  • alternatively, the program module 1093 and program data 1094 related to the extraction program may be stored in another computer connected via a network such as a LAN or WAN (Wide Area Network) and read out by the CPU 1020 via the network interface 1070.
  • Reference signs: 1 Extraction System; 10 Extraction Device; 13 Communication Control Unit; 14, 24 Storage Unit; 15, 25 Control Unit; 15a Acquisition Unit; 15b Extraction Unit; 20 Cloud Server; 25a Prediction Unit; 25b Learning Unit; 30 Edge Device
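
The extraction step described above (retain the k tokens with the highest attention values, discard the rest, and leave zero-valued regions for the image codec to compress) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the application: the function names `top_k_token_indices` and `mask_image`, the single-channel nested-list image, the row-major numbering of patches, and the example scores are all hypothetical. The policy 14a that would produce the scores and the JPEG encoding step are omitted.

```python
from typing import List, Sequence

Image = List[List[int]]  # single-channel H x W image, for brevity (assumption)


def top_k_token_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring tokens, sorted ascending."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of tokens")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def mask_image(image: Image, patch: int, scores: Sequence[float], k: int) -> Image:
    """Keep the pixels of the top-k patches and set all other pixels to zero."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    cols = width // patch  # number of patches per row
    if len(scores) != (height // patch) * cols:
        raise ValueError("expected one score per patch")
    keep = set(top_k_token_indices(scores, k))
    return [
        [
            # Row-major token index of the patch containing pixel (y, x).
            image[y][x] if ((y // patch) * cols + (x // patch)) in keep else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical 4 x 4 image split into four 2 x 2 patches (tokens 0..3).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
scores = [0.10, 0.70, 0.05, 0.15]  # made-up attention values, one per token
masked = mask_image(image, patch=2, scores=scores, k=2)
for row in masked:
    print(row)
```

With these made-up scores, tokens 1 and 3 (the two right-hand patches) are retained and the left half of the image becomes zero. Zeroing discarded patches in place, rather than removing them, keeps the image dimensions unchanged, which is consistent with the description's point that an existing image codec can be reused without a new encoder or decoder.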

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)
PCT/JP2022/034124 2022-09-12 2022-09-12 Extraction system, extraction method, and extraction program Ceased WO2024057374A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2022/034124 WO2024057374A1 (ja) 2022-09-12 2022-09-12 Extraction system, extraction method, and extraction program
JP2024546692A JPWO2024057578A1 2022-09-12 2023-02-22
PCT/JP2023/006600 WO2024057578A1 (ja) 2022-09-12 2023-02-22 Extraction system, extraction method, and extraction program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/034124 WO2024057374A1 (ja) 2022-09-12 2022-09-12 Extraction system, extraction method, and extraction program

Publications (1)

Publication Number Publication Date
WO2024057374A1 (ja) 2024-03-21

Family

ID=90274414

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2022/034124 Ceased WO2024057374A1 (ja) 2022-09-12 2022-09-12 Extraction system, extraction method, and extraction program
PCT/JP2023/006600 Ceased WO2024057578A1 (ja) 2022-09-12 2023-02-22 Extraction system, extraction method, and extraction program

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/006600 Ceased WO2024057578A1 (ja) 2022-09-12 2023-02-22 Extraction system, extraction method, and extraction program

Country Status (2)

Country Link
JP (1) JPWO2024057578A1
WO (2) WO2024057374A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
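WO2022064656A1 (ja) * 2020-09-25 2022-03-31 Nippon Telegraph and Telephone Corp Processing system, processing method, and processing program
WO2022113175A1 (ja) * 2020-11-24 2022-06-02 Nippon Telegraph and Telephone Corp Processing method, processing system, and processing program
WO2022130496A1 (ja) * 2020-12-15 2022-06-23 Fujitsu Limited Image processing device, image processing method, and image processing program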

Also Published As

Publication number Publication date
WO2024057578A1 (ja) 2024-03-21
JPWO2024057578A1 2024-03-21

Similar Documents

Publication Publication Date Title
EP2806374B1 (en) Method and system for automatic selection of one or more image processing algorithm
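CN112001914A (zh) Method and apparatus for depth image completion
CN113704531A (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
JP2023547010A (ja) Model training method based on knowledge distillation, apparatus, and electronic device
WO2021027193A1 (zh) Face clustering method, apparatus, device, and storage medium
CN113642583B (zh) Deep learning model training method for text detection, and text detection method
CN113487618B (zh) Portrait segmentation method and apparatus, electronic device, and storage medium
CN112561060A (zh) Neural network training method and apparatus, image recognition method and apparatus, and device
WO2020062191A1 (zh) Image processing method, apparatus, and device
CN112101386A (zh) Text detection method and apparatus, computer device, and storage medium
WO2022064656A1 (ja) Processing system, processing method, and processing program
CN113610064B (zh) Handwriting recognition method and apparatus
CN114863539A (zh) Portrait keypoint detection method and system based on feature fusion
CN113971644A (zh) Image recognition method and apparatus based on data augmentation strategy selection
CN114495102A (zh) Text recognition method, and training method and apparatus for a text recognition network
CN114581657A (zh) Image semantic segmentation method, device, and medium based on multi-scale strip dilated convolution
CN110321892B (zh) Picture screening method and apparatus, and electronic device
US20250086952A1 (en) Method of edge-cloud fusion-aware visual prompt large language model
CN114529750A (zh) Image classification method, apparatus, device, and storage medium
CN113011410A (zh) Training method for a character recognition model, character recognition method, and apparatus
CN113066059A (zh) Image sharpness detection method, apparatus, device, and storage medium
CN112733670A (zh) Fingerprint feature extraction method and apparatus, electronic device, and storage medium
WO2024057374A1 (ja) Extraction system, extraction method, and extraction program
CN113705600A (zh) Feature map determination method and apparatus, computer device, and storage medium
CN116311271B (zh) Text image processing method and apparatus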

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 22958709; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 22958709; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: JP