CN118140230A - Fine-tuning multi-head network from a single transformer layer of pre-trained language model - Google Patents
Fine-tuning multi-head network from a single transformer layer of pre-trained language model
- Publication number
- CN118140230A (application number CN202280069151.5A)
- Authority
- CN
- China
- Prior art keywords
- layers
- robot
- machine learning
- learning model
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163254740P | 2021-10-12 | 2021-10-12 | |
| US63/254,740 | 2021-10-12 | | |
| US17/735,651 | | 2022-05-03 | |
| US17/735,651 US12512091B2 (en) | 2021-10-12 | 2022-05-03 | Fine-tuning multi-head network from a single transformer layer of pre-trained language model |
| PCT/US2022/040530 WO2023064033A1 (en) | 2021-10-12 | 2022-08-17 | Fine-tuning multi-head network from a single transformer layer of pre-trained language model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118140230A (zh) | 2024-06-04 |
Family
ID=85798249
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202280069151.5A (Pending) CN118140230A (zh) | 2021-10-12 | 2022-08-17 | Fine-tuning multi-head network from a single transformer layer of pre-trained language model |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US12512091B2 |
| JP (1) | JP2024539003A |
| KR (1) | KR20240089615A |
| CN (1) | CN118140230A |
| GB (1) | GB2631139A |
| WO (1) | WO2023064033A1 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
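| CN119418320A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, apparatus, medium, and program product |
| CN119418319A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, text detection method, apparatus, medium, and program product |
| CN119418321A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, method for detecting and recognizing text, and related apparatus |
| CN119915374A (zh) * | 2025-04-03 | 2025-05-02 | 浙江潮汐力科技有限公司 | Fault monitoring method, apparatus, device, storage medium, and program product |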
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12548552B2 (en) * | 2021-11-19 | 2026-02-10 | International Business Machines Corporation | Dynamic language selection of an AI voice assistance system |
| US11947935B2 (en) * | 2021-11-24 | 2024-04-02 | Microsoft Technology Licensing, Llc. | Custom models for source code generation via prefix-tuning |
| US20240061835A1 (en) * | 2022-08-22 | 2024-02-22 | Oracle International Corporation | System and method of selective fine-tuning for custom training of a natural language to logical form model |
| US20240169165A1 (en) * | 2022-11-17 | 2024-05-23 | Samsung Electronics Co., Ltd. | Automatically Generating Annotated Ground-Truth Corpus for Training NLU Model |
| US12562163B2 (en) * | 2023-05-12 | 2026-02-24 | Servicenow, Inc. | Bidirectional assistant for development platforms |
| CN116774140A (zh) * | 2023-06-26 | 2023-09-19 | 南京邮电大学 | Gridless signal source DOA estimation method based on residual attention network |
| US20250005282A1 (en) * | 2023-06-29 | 2025-01-02 | Amazon Technologies, Inc. | Domain entity extraction for performing text analysis tasks |
| CN118446218B (zh) * | 2024-05-16 | 2024-11-01 | 西南交通大学 | Adversarial reading-comprehension method for nested named entity recognition |
| CA3253531A1 (en) * | 2024-06-14 | 2026-01-19 | The Toronto-Dominion Bank | Context retrieval for in-context learning model |
| WO2026000314A1 (en) * | 2024-06-27 | 2026-01-02 | Beijing Youzhuju Network Technology Co., Ltd. | Model-based task processing |
| JP7658644B1 (ja) * | 2024-10-21 | 2025-04-08 | スパーブエーアイ カンパニー リミテッド | Method for training custom model based on pre-trained base model and learning device using the same |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11138392B2 (en) * | 2018-07-26 | 2021-10-05 | Google Llc | Machine translation using neural network models |
| US20200042864A1 (en) | 2018-08-02 | 2020-02-06 | Veritone, Inc. | Neural network orchestration |
| US11556778B2 (en) | 2018-12-07 | 2023-01-17 | Microsoft Technology Licensing, Llc | Automated generation of machine learning models |
| US20210279596A1 (en) | 2020-03-06 | 2021-09-09 | Hitachi, Ltd. | System for predictive maintenance using trace norm generative adversarial networks |
| US20220094713A1 (en) * | 2020-09-21 | 2022-03-24 | Sophos Limited | Malicious message detection |
| US12141701B2 (en) * | 2021-01-21 | 2024-11-12 | International Business Machines Corporation | Channel scaling: a scale-and-select approach for selective transfer learning |
| US11875898B2 (en) * | 2021-05-26 | 2024-01-16 | Merative Us L.P. | Automatic condition diagnosis using an attention-guided framework |
| US20230106669A1 (en) * | 2021-09-27 | 2023-04-06 | X Development Llc | Binding affinity prediction using neural networks |
2022
- 2022-05-03: US, application US17/735,651, patent US12512091B2 (en), status: Active
- 2022-08-17: CN, application CN202280069151.5A, publication CN118140230A (zh), status: Pending
- 2022-08-17: GB, application GB2403625.3A, publication GB2631139A (en), status: Pending
- 2022-08-17: JP, application JP2024522110A, publication JP2024539003A (ja), status: Pending
- 2022-08-17: WO, application PCT/US2022/040530, publication WO2023064033A1 (en), status: Ceased
- 2022-08-17: KR, application KR1020247015690A, publication KR20240089615A (ko), status: Pending
2025
- 2025-11-26: US, application US19/402,418, publication US20260080864A1 (en), status: Pending
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
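| CN119418320A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, apparatus, medium, and program product |
| CN119418319A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, text detection method, apparatus, medium, and program product |
| CN119418321A (zh) * | 2024-10-30 | 2025-02-11 | 上海哔哩哔哩科技有限公司 | Model training method, method for detecting and recognizing text, and related apparatus |
| CN119418319B (zh) * | 2024-10-30 | 2025-09-30 | 上海哔哩哔哩科技有限公司 | Model training method, text detection method, apparatus, medium, and program product |
| CN119418321B (zh) * | 2024-10-30 | 2025-09-30 | 上海哔哩哔哩科技有限公司 | Model training method, method for detecting and recognizing text, and related apparatus |
| CN119418320B (zh) * | 2024-10-30 | 2025-09-30 | 上海哔哩哔哩科技有限公司 | Model training method, apparatus, medium, and program product |
| CN119915374A (zh) * | 2025-04-03 | 2025-05-02 | 浙江潮汐力科技有限公司 | Fault monitoring method, apparatus, device, storage medium, and program product |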
Also Published As
| Publication number | Publication date |
|---|---|
| US20260080864A1 (en) | 2026-03-19 |
| GB2631139A (en) | 2024-12-25 |
| JP2024539003A (ja) | 2024-10-28 |
| US12512091B2 (en) | 2025-12-30 |
| KR20240089615A (ko) | 2024-06-20 |
| US20230115321A1 (en) | 2023-04-13 |
| GB202403625D0 (en) | 2024-04-24 |
| WO2023064033A1 (en) | 2023-04-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12361219B2 (en) | | Context tag integration with named entity recognition models |
| CN115398437B (zh) | | Improved out-of-domain (OOD) detection techniques |
| CN114424185B (zh) | | Stop word data augmentation for natural language processing |
| CN116802629B (zh) | | Multi-factor modeling for natural language processing |
| CN115398436B (zh) | | Noise data augmentation for natural language processing |
| US12512091B2 (en) | | Fine-tuning multi-head network from a single transformer layer of pre-trained language model |
| CN116583837B (zh) | | Distance-based logit values for natural language processing |
| US12288550B2 (en) | | Framework for focused training of language models and techniques for end-to-end hypertuning of the framework |
| CN116547676B (zh) | | Enhanced logits for natural language processing |
| US12412563B2 (en) | | Path dropout for natural language processing |
| CN118265981B (zh) | | Systems and techniques for handling long text for pre-trained language models |
| CN116615727A (zh) | | Keyword data augmentation tool for natural language processing |
| CN116490879A (zh) | | Method and system for over-prediction in neural networks |
| CN119183573A (zh) | | Entity-aware data augmentation techniques |
| US20240062112A1 (en) | | Adaptive training data augmentation to facilitate training named entity recognition models |
| JP2025528391A (ja) | | Adaptive training data augmentation to facilitate training named entity recognition models |
| CN121936414A (zh) | | Systems and techniques for handling long text for pre-trained language models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |