WO2024101540A1 - Continuous learning method for a transformer model based on adapter attention, and associated apparatus
- Publication number: WO2024101540A1 (application PCT/KR2023/002210)
- Authority: WO (WIPO, PCT)
- Prior art keywords: adapter, module, output, transformer, layer
- Prior art date: 2022-11-08
Classifications
- G06F40/20—Handling natural language data; Natural language analysis
- G06N20/00—Machine learning
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/048—Activation functions
- G06N3/0499—Feedforward networks
- G06N3/063—Physical realisation of neural networks, neurons or parts of neurons using electronic means
Definitions
- the present invention relates to a transformer model continuous learning method and device based on adapter attention.
- Natural language processing is a technology that uses computers to analyze and process human natural language. Recently, deep learning technology has been applied to machine translation and natural language generation.
- transfer learning is commonly used to perform a desired task by fine-tuning a large-scale pre-trained model based on a Transformer architecture, such as BERT or GPT-2.
- because the trainable parameters of a transformer-based pre-trained model number more than one million, for efficiency in memory and training speed a technique is used that inserts an adapter, a small module, into the transformer model and then trains only the adapter during fine-tuning.
- the adapter-based transformer continuous learning technique described above mitigates catastrophic forgetting by using a separate adapter for each domain, so that even when the transformer model learns multiple domains, inference uses the adapter corresponding to the relevant domain.
- however, the testing phase then requires selecting an appropriate adapter for each test sample.
- in existing approaches, the model is told during the testing phase which domain each test sample belongs to.
- in real deployments, the domain of data arriving in real time is often unknown, so this is not possible and the approach is not practical.
- the problem to be solved by the present invention is to provide an adapter attention-based transformer model continuous learning method and device that removes the need to select an appropriate adapter for each test sample during the test phase, which is required by existing adapter-based transformer continuous learning techniques.
- the characteristic configuration of the present invention is as follows.
- a transformer model continuous learning method comprising:
- a transformer model continuous learning method in a device including a processor comprises: a step in which the device learns, for each domain, each of a plurality of adapters inserted into a transformer decoder; and a step in which the device fuses the information of the learned domain-specific adapters using an adapter attention layer inserted into the transformer decoder.
- the step of fusing the learned domain-specific adapter information includes preparing pseudo-domain-specific data from the transformer decoder, inserting the adapter attention layer into the transformer decoder, and performing training on the transformer decoder into which the adapter attention layer is inserted using the pseudo-domain-specific data.
- the transformer decoder includes a multi-head attention layer and a feed-forward network layer located after the multi-head attention layer; the plurality of adapters are included in a first adapter module inserted between the multi-head attention layer and the feed-forward network layer and in a second adapter module inserted after the feed-forward network layer; and the adapter attention layer includes a first adapter attention layer inserted between the first adapter module and the feed-forward network layer and a second adapter attention layer inserted after the second adapter module.
- the first adapter attention layer fuses the information of the learned domain-specific adapters included in the first adapter module, and the second adapter attention layer fuses the information of the learned domain-specific adapters included in the second adapter module.
- the first adapter attention layer includes a query module that outputs the input of the first adapter module, a key module that outputs the output of each adapter included in the first adapter module, a value module that outputs the output of each adapter included in the first adapter module, a first multiplier that computes the inner product of the output of the key module and the output of the query module, a softmax module that normalizes the output of the first multiplier, and a second multiplier that weights the output of the value module by the output of the softmax module and outputs the result.
- the pseudo-domain-specific data are prepared from a transformer decoder that includes the plurality of adapters learned for each domain.
- the method further includes a first residual connection and layer normalization layer that combines the output of the first adapter attention layer with the input of the multi-head attention layer to generate a residual output and applies layer normalization to the residual output, and a second residual connection and layer normalization layer that combines the output of the second adapter attention layer with the input of the feed-forward network layer to generate a residual output and applies layer normalization to that residual output.
- a transformer model continuous learning device comprising:
- the device includes a processor and a memory, wherein the memory is configured to store a set of code, and the code is used to control the processor to execute a process of learning, for each domain, each of a plurality of adapters inserted into a transformer decoder and a process of fusing the learned domain-specific adapter information using an adapter attention layer inserted into the transformer decoder.
- the transformer decoder includes a multi-head attention layer and a feed-forward network layer located after the multi-head attention layer; the plurality of adapters are included in a first adapter module inserted between the multi-head attention layer and the feed-forward network layer and in a second adapter module inserted after the feed-forward network layer; and the adapter attention layer includes a first adapter attention layer inserted between the first adapter module and the feed-forward network layer and a second adapter attention layer inserted after the second adapter module.
- the process of fusing the learned domain-specific adapter information includes a process of preparing pseudo-domain-specific data from the transformer decoder, a process of inserting the adapter attention layer into the transformer decoder, and a process of performing training on the transformer decoder into which the adapter attention layer is inserted using the pseudo-domain-specific data.
- the first adapter attention layer includes a query module that outputs the input of the first adapter module, a key module that outputs the output of each adapter included in the first adapter module, a value module that outputs the output of each adapter included in the first adapter module, a first multiplier that computes the inner product of the output of the key module and the output of the query module, a softmax module that normalizes the output of the first multiplier, and a second multiplier that weights the output of the value module by the output of the softmax module and outputs the result.
- Figure 1 is a schematic flowchart of a transformer model continuous learning method based on adapter attention according to an embodiment of the present invention.
- Figure 2 is a structural diagram of a transformer decoder in an adapter attention-based transformer model continuous learning device according to an embodiment of the present invention.
- Figure 3 is a diagram illustrating the process of learning each adapter for N tasks in the transformer model continuous learning method based on adapter attention according to an embodiment of the present invention.
- Figure 4 is a detailed flowchart of the process of fusing learned adapter information for each domain using an adapter attention layer according to an embodiment of the present invention.
- Figure 5 is a diagram showing the structure of a transformer decoder in which an adapter attention layer is inserted according to an embodiment of the present invention.
- Figure 6 is a diagram showing the specific structure of an adapter attention layer according to an embodiment of the present invention.
- Figure 7 is a diagram showing a schematic configuration of an adapter attention-based transformer model continuous learning device according to an embodiment of the present invention.
- the devices described in the present invention are composed of hardware including at least one processor, a memory device, a communication device, etc., and a program that is executed in conjunction with the hardware is stored in a designated location.
- the hardware has a configuration and performance capable of executing the method of the present invention.
- the program includes instructions that implement the operating method of the present invention described with reference to the drawings, and executes the present invention by combining it with hardware such as a processor and memory device.
- Figure 1 is a schematic flowchart of a transformer model continuous learning method based on adapter attention according to an embodiment of the present invention.
- adapters for individual domains are learned using adapter-based transformer continuous learning technology (S100).
- S100 adapter-based transformer continuous learning technology
- the adapter learning process in step S100 may be the same as the learning process in the existing adapter-based transformer continuous learning technology.
- the structure of the transformer and adapter used in the step (S100) may be implemented as the structure shown in FIG. 2.
- FIG. 2 shows one transformer decoder 100 and two adapter modules 110 and 120 included in the transformer decoder 100.
- one transformer decoder 100 includes a multi-head attention layer (MHA) 101, a feed forward network layer (FFN) 103, and two residual connection and layer normalization layers (Add & Norm) 102 and 104.
- the MHA 101 is configured, at each generation time step, to receive an input for each output position preceding the current output position and, for each output position, to apply an attention mechanism over those positions using one or more queries derived from the input at that output position, generating an updated representation for that output position.
- the FFN 103 is configured, at each generation time step, for each output position preceding the current output position, to receive an input at that output position and apply a sequence of transformations to that input to generate an output for that output position.
- Add & Norm(102) combines the output of MHA(101) with the input to MHA(101) to produce a residual output and applies hierarchical normalization to the residual output
- Add & Norm(104) combines the output of MHA(101) with the input to MHA(101).
- the output is combined with the input to FFN 103 to produce a residual output and hierarchical normalization is applied to the residual output.
- one adapter module 110 is inserted between the MHA 101 and Add & Norm 102, and the other adapter module 120 is inserted between the FFN 103 and Add & Norm 104.
- each of these two adapter modules 110 and 120 may be implemented, for example, as a single feed forward network layer; a sketch of this arrangement follows below.
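As a concrete illustration, the following PyTorch sketch shows one way such an adapter module and the decoder block of FIG. 2 could be put together. The bottleneck width, the activation, and the residual connection inside the adapter are assumptions made for illustration only; the patent states merely that each adapter module may be implemented as a single feed forward network layer.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck feed-forward adapter (assumed design; not specified in the patent)."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # down-projection
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, d_model)    # up-projection back to model width

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual inside the adapter (assumption)

class DecoderBlockWithAdapters(nn.Module):
    """Transformer decoder block 100 with adapter modules 110 and 120 at the positions of FIG. 2."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # MHA 101
        self.adapter_110 = Adapter(d_model)          # between MHA 101 and Add & Norm 102
        self.norm_102 = nn.LayerNorm(d_model)        # Add & Norm 102
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))                    # FFN 103
        self.adapter_120 = Adapter(d_model)          # between FFN 103 and Add & Norm 104
        self.norm_104 = nn.LayerNorm(d_model)        # Add & Norm 104

    def forward(self, x, attn_mask=None):
        h, _ = self.mha(x, x, x, attn_mask=attn_mask)  # masked self-attention over preceding positions
        h = self.adapter_110(h)                        # adapter module 110
        x = self.norm_102(x + h)                       # residual with the MHA input, then layer norm
        h = self.adapter_120(self.ffn(x))              # FFN 103 followed by adapter module 120
        return self.norm_104(x + h)                    # residual with the FFN input, then layer norm
```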
- the step (S100) means learning the corresponding adapter for each domain using the transformer decoder 100 described above. For example, when N tasks (where N is a natural number greater than 1) are given, the corresponding adapter is learned for each of the given N tasks.
- in FIG. 3, a state is shown in which, among the N tasks, the adapters 111 and 121 corresponding to task 1 are learned first for task 1, and then the adapters 112 and 122 corresponding to task 2 are learned for task 2.
- it can be seen that each adapter module 110, 120 in the transformer decoder 100 has learned, for example, three adapters, namely the three FFNs 111, 112, and 113 and the three FFNs 121, 122, and 123, respectively (a training sketch for this per-domain procedure follows below).
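The per-domain adapter learning of step S100 might then look like the sketch below, under common adapter-tuning assumptions: the pre-trained decoder weights stay frozen and only the adapter pair for the current task is updated. The helpers `make_adapter_pair`, `set_active_adapters`, and `loss`, as well as the task data loaders, are hypothetical placeholders rather than details given in the patent.

```python
import torch

def train_adapters_per_task(decoder, tasks, num_epochs=3, lr=1e-4):
    """Step S100 (sketch): learn one adapter pair (111/121, 112/122, ...) per task/domain."""
    for p in decoder.parameters():                   # freeze all pre-trained decoder parameters
        p.requires_grad = False

    adapters = {}
    for task_id, loader in tasks.items():            # tasks: {task_id: DataLoader} (hypothetical)
        pair = decoder.make_adapter_pair(task_id)    # hypothetical helper: fresh adapters for modules 110/120
        decoder.set_active_adapters(task_id)         # hypothetical helper: route through this task's adapters
        opt = torch.optim.Adam(pair.parameters(), lr=lr)

        for _ in range(num_epochs):
            for batch in loader:
                loss = decoder.loss(batch)           # hypothetical task loss (e.g. LM cross-entropy)
                opt.zero_grad()
                loss.backward()                      # gradients reach only the active adapter pair
                opt.step()

        for p in pair.parameters():                  # freeze the finished adapter so later tasks cannot alter it
            p.requires_grad = False
        adapters[task_id] = pair
    return adapters
```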
- the learned adapter information for each domain is fused using the adapter attention layer (S110).
- step (S110) will be described in detail with reference to FIG. 4.
- Figure 4 is a detailed flowchart of the process of fusing learned adapter information for each domain using an adapter attention layer according to an embodiment of the present invention.
- data for each pseudo domain is prepared from the transformer decoder 100 learned through the step (S100) (S111).
- the data for each pseudo-domain may be the data corresponding to each task in step S100.
- an adapter attention layer is inserted into the transformer decoder 100 (S112).
- the structure of the transformer decoder 100 in which the adapter attention layers 130 and 140 are inserted is, for example, as shown in FIG. 5.
- adapter attention layers 130 and 140 are inserted after adapter modules 110 and 120, respectively. Specifically, the adapter attention layer 130 is inserted between the adapter module 110 and Add & Norm 102, and the adapter attention layer 140 is inserted between the adapter module 120 and Add & Norm 104.
- the adapter attention layers 130 and 140 inserted into the transformer decoder 100 use an attention mechanism to compute a normalized weighted sum over the outputs of the domain-specific adapters learned within the adapter modules 110 and 120.
- Figure 6 is a diagram showing the specific structure of an adapter attention layer according to an embodiment of the present invention.
- the adapter attention layer 140 includes a query module (Query) 141, a key module (Key) 142, a value module (Value) 143, multipliers 144 and 146, and a softmax module (SoftMax) 145.
- the query module 141 outputs the input of the adapter module 120 to the multiplier 144.
- the key module 142 outputs the output of each of the learned adapters in the adapter module 120, here three adapters 121, 122, and 123, to the multiplier 144.
- the value module 143 outputs the output of each of the learned adapters 121, 122, and 123 in the adapter module 120 to the multiplier 146.
- the multiplier 144 calculates the dot product of the values output from the key module 142 and the value output from the query module 141 and outputs the dot product to the softmax module 145.
- the similarity between the output of the adapters 121, 122, and 123 and the input of the adapter module 120 can be calculated through the dot product in the multiplier 144.
- Softmax module 145 normalizes the output of multiplier 144 to a value between 0 and 1. That is, the softmax module 145 normalizes the similarity between the output of the adapters 121, 122, and 123 and the input of the adapter module 120 to a value between 0 and 1.
- the multiplier 146 generates a final output vector by weighting the output of the value module 143 with the output of the softmax module 145 and outputs it to Add & Norm 104. That is, the multiplier 146 generates the final output vector by weighting the output values of the adapters 121, 122, and 123 with the normalized similarities between those adapter outputs and the input of the adapter module 120; in equation form, this can be written as shown below.
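Written compactly (notation introduced here for illustration, not taken from the patent): with x the input hidden vector of the adapter module 120 and o_i the output of adapter i among 121, 122, and 123, the layer computes

```latex
\alpha_i = \frac{\exp(o_i \cdot x)}{\sum_{j}\exp(o_j \cdot x)},
\qquad
\mathrm{AdapterAttn}(x) = \sum_{i} \alpha_i \, o_i
```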
- the adapter attention layer 140 can fuse the information of the learned adapters 121, 122, and 123 for each domain in the adapter module 120.
- the adapter attention layer 130 can fuse the information of the learned domain-specific adapters 111, 112, and 113 within the adapter module 110.
- when training the transformer decoder 100 into which the adapter attention layers 130 and 140 are inserted, the adapter attention layer 130 operates as follows for an arbitrary input hidden vector x: a kind of similarity is calculated through the dot product between the output value of each adapter 111, 112, 113 and x; the similarity between each adapter output and the input hidden vector x is normalized to a value between 0 and 1 through the softmax module; and the normalized similarities are weighted with the output values of the adapters 111, 112, and 113 and summed to generate the final output vector.
- similarly, the adapter attention layer 140 generates a final output vector for an arbitrary input hidden vector x: a kind of similarity is calculated through the inner product between the output value of each adapter 121, 122, 123 and x; the similarity between each adapter output and the input hidden vector x is normalized to a value between 0 and 1 through the softmax module 145; and the normalized similarities are weighted with the output values of the adapters 121, 122, and 123 and summed to generate the final output vector. A code-level sketch of this computation follows below.
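A minimal PyTorch sketch of this adapter attention computation is shown below, assuming the value module simply passes the adapter outputs through. The learnable query/key projections are an addition made here so that the layer has parameters to train in step S110; the patent itself describes only pass-through query/key/value modules, a dot product, a softmax, and a weighted sum.

```python
import torch
import torch.nn as nn

class AdapterAttention(nn.Module):
    """Adapter attention layer (e.g. 130 or 140): fuses the outputs of the per-domain adapters."""
    def __init__(self, d_model: int, learned_projections: bool = True):
        super().__init__()
        # Optional projections (assumption); set learned_projections=False for the literal pass-through version.
        self.q_proj = nn.Linear(d_model, d_model) if learned_projections else nn.Identity()
        self.k_proj = nn.Linear(d_model, d_model) if learned_projections else nn.Identity()

    def forward(self, x, adapter_outputs):
        # x: adapter-module input, shape (batch, seq, d_model)            -> query module
        # adapter_outputs: list of per-domain adapter outputs, same shape -> key and value modules
        o = torch.stack(adapter_outputs, dim=2)           # (batch, seq, n_adapters, d_model)
        q = self.q_proj(x)                                # query module (141)
        k = self.k_proj(o)                                # key module (142)
        scores = torch.einsum("bsnd,bsd->bsn", k, q)      # dot-product similarity (first multiplier, 144)
        weights = torch.softmax(scores, dim=-1)           # softmax (145): normalize to values in (0, 1)
        return torch.einsum("bsn,bsnd->bsd", weights, o)  # weighted sum of values (second multiplier, 146)
```

In the decoder block sketched earlier, an `AdapterAttention` instance would sit between the adapter module and the corresponding Add & Norm layer, receiving the adapter-module input together with the outputs of all domain adapters.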
- in this way, the transformer decoder 100 does not use the adapter attention layers 130 and 140 to select a single appropriate adapter for given data, but instead fuses the outputs of all adapters.
- this solves the problem of existing adapter-based transformer continuous learning techniques, namely having to select an appropriate adapter for each test sample at test time, and by separating the per-domain adapter learning stage (S100) from the adapter attention layer learning stage (S110), each adapter can contribute to learning other domains without losing the knowledge of its own domain, which solves the problem that the knowledge of adapters learned on previous domains could not be reused when learning a new domain. A sketch of this two-stage procedure is given below.
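Putting the two stages together, the overall procedure might look like the sketch below. The choice to keep the backbone and the per-domain adapters frozen in the second stage, training only the adapter attention layers, follows from the stated goal of preserving each adapter's domain knowledge, but the patent does not enumerate exactly which parameters are updated; the pooled pseudo-domain data and the helper `insert_adapter_attention` are likewise illustrative assumptions.

```python
import torch

def continual_learning_pipeline(decoder, tasks, attention_layers, epochs_fuse=3, lr=1e-4):
    """Sketch of the full method: step S100 (per-domain adapters) then step S110 (adapter attention fusion)."""
    # Stage S100: learn one adapter pair per domain (see the earlier train_adapters_per_task sketch).
    adapters = train_adapters_per_task(decoder, tasks)

    # Step S111: prepare pseudo-domain data; here simply the data used for each task (assumption).
    pseudo_domain_batches = [batch for loader in tasks.values() for batch in loader]

    # Step S112: insert the adapter attention layers 130 and 140 into the decoder (hypothetical helper).
    decoder.insert_adapter_attention(attention_layers)

    # Final step of S110: train only the adapter attention layers; backbone and adapters stay frozen (assumption).
    for p in decoder.parameters():
        p.requires_grad = False
    trainable = [p for layer in attention_layers for p in layer.parameters()]
    for p in trainable:
        p.requires_grad = True
    opt = torch.optim.Adam(trainable, lr=lr)

    for _ in range(epochs_fuse):
        for batch in pseudo_domain_batches:
            loss = decoder.loss(batch)   # hypothetical loss; every adapter contributes via the fused output
            opt.zero_grad()
            loss.backward()
            opt.step()
    return decoder, adapters
```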
- FIG. 7 is a diagram illustrating a schematic configuration of an adapter attention-based transformer model continuous learning device 200 according to an embodiment of the present invention.
- the transformer model continuous learning device 200 includes at least one processor 210, a memory 220, a communicator 230, an input/output device 240, and a communication bus 250.
- the processor 210 may be a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling program execution in the solution of the present application.
- the memory 220 stores information for the adapter attention-based transformer model continuous learning method according to an embodiment of the present invention.
- memory 220 is further configured to store a set of codes, which are used to control processor 210 to execute the following processes.
- these processes include a process of learning the adapters 111, 112, 113, 121, 122, and 123 for individual domains using an adapter-based transformer continuous learning technique, and a process of fusing the information of the learned domain-specific adapters using the adapter attention layers 130 and 140.
- the process of fusing the information of the learned domain-specific adapters using the adapter attention layers includes a process of preparing pseudo-domain-specific data through the transformer decoder 100, a process of inserting the adapter attention layers 130 and 140 into the transformer decoder 100, and a process of performing training on the transformer decoder 100 into which the adapter attention layers 130 and 140 are inserted using the prepared pseudo-domain-specific data.
- the memory 220 may be a read-only memory (ROM) or another type of static storage device capable of storing instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compressed optical discs, laser discs, digital versatile discs, and Blu-ray discs), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store the expected program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
- the memory 220 may exist independently and is connected to the processor 210 by a communication bus 250.
- the communicator 230 performs wired and wireless communication with other external devices and can be implemented using various wired or wireless communication technologies. For example, when the communicator 230 is connected to the Internet and provides a service, it may follow TCP/IP, a standard protocol for information transmission on the Internet.
- the input/output device 240 specifically consists of an input device 241 and an output device 242, and the input device 241 communicates with the processor 210 and can receive user input in a plurality of ways.
- input device 241 may be a mouse, keyboard, touch screen, or sensing device.
- the output device 242 communicates with the processor 210 and can display information or output audio in multiple ways.
- the output device 242 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, an Organic Light Emitting Diode (OLED) display, or a speaker.
- the input/output unit 240 may be used to input information about the adapter attention-based transformer model continuous learning and output information such as the adapter attention-based transformer model continuous learning process or results according to an embodiment of the present invention.
- the communication bus 250 is configured to couple all components of the transformer model continuous learning device 200, namely the processor 210, the memory 220, the communicator 230, and the input/output device 240.
- components or 'parts' used in embodiments of the present invention may be implemented as software, such as tasks, classes, subroutines, processes, objects, execution threads, and programs executed in a predetermined area of memory, as hardware such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), or as a combination of such software and hardware.
- the above components or '-parts' may be included in a computer-readable storage medium, or parts of them may be dispersed and distributed across a plurality of computers.
Abstract
Disclosed are a continuous learning method for a transformer model based on adapter attention, and an associated apparatus. The method comprises steps in which: the apparatus trains each adapter of a plurality of adapters inserted into a transformer decoder for each domain; and the apparatus fuses information about the trained adapter associated with each domain by using an adapter attention layer inserted into the transformer decoder.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2022-0147918 | 2022-11-08 | ||
KR20220147918 | 2022-11-08 | ||
KR10-2023-0011000 | 2023-01-27 | ||
KR1020230011000A KR20240066944A (ko) | 2022-11-08 | 2023-01-27 | 어댑터 어텐션 기반의 트랜스포머 모델 연속 학습 방법 및 그 장치 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024101540A1 true WO2024101540A1 (fr) | 2024-05-16 |
Family
ID=91033107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2023/002210 WO2024101540A1 (fr) | 2022-11-08 | 2023-02-15 | Procédé d'apprentissage continu pour modèle de transformateur sur la base d'une attention d'un adaptateur et appareil associé |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024101540A1 (fr) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210255862A1 (en) * | 2020-02-13 | 2021-08-19 | The Toronto-Dominion Bank | Initialization of Parameters for Machine-Learned Transformer Neural Network Architectures |
Non-Patent Citations (4)
Title |
---|
LEE TAEWOO; LEE MIN-JOONG; KANG TAE GYOON; JUNG SEOKYEOUNG; KWON MINSEOK; HONG YEONA; LEE JUNGIN; WOO KYOUNG-GU; KIM HO-GYEONG; JE: "Adaptable Multi-Domain Language Model for Transformer ASR", ICASSP 2021 - 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 6 June 2021 (2021-06-06), pages 7358 - 7362, XP033954965, DOI: 10.1109/ICASSP39728.2021.9413475 * |
POTH CLIFTON, HANNAH STERZ: "Adapter-Transformers v3 - Unifying Efficient Fine-Tuning", ADAPTERHUB BLOG, 21 March 2022 (2022-03-21), XP093171296, Retrieved from the Internet <URL:https://adapterhub.ml/blog/2022/03/adapter-transformers-v3-unifying-efficient-fine-tuning/> * |
RABEEH KARIMI MAHABADI; JAMES HENDERSON; SEBASTIAN RUDER: "Compacter: Efficient Low-Rank Hypercomplex Adapter Layers", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 November 2021 (2021-11-27), 201 Olin Library Cornell University Ithaca, NY 14853, XP091089757 * |
ZANGWEI ZHENG; XIANGYU YUE; KAI WANG; YANG YOU: "Prompt Vision Transformer for Domain Generalization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 August 2022 (2022-08-18), 201 Olin Library Cornell University Ithaca, NY 14853, XP091297289 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23888809; Country of ref document: EP; Kind code of ref document: A1 |