CN118140230A - Fine-tuning multi-head network from a single transformer layer of pre-trained language model - Google Patents

Fine-tuning multi-head network from a single transformer layer of pre-trained language model

Info

Publication number
CN118140230A
Authority
CN
China
Prior art keywords
layers
robot
machine learning
learning model
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280069151.5A
Other languages
English (en)
Chinese (zh)
Inventor
T·T·乌
T·Q·帕姆
O·M·内扎米
M·E·约翰逊
T·L·杜翁
C·D·V·黄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Publication of CN118140230A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN202280069151.5A 2021-10-12 2022-08-17 Fine-tuning multi-head network from a single transformer layer of pre-trained language model Pending CN118140230A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163254740P 2021-10-12 2021-10-12
US63/254,740 2021-10-12
US17/735,651 2022-05-03
US17/735,651 US12512091B2 (en) 2021-10-12 2022-05-03 Fine-tuning multi-head network from a single transformer layer of pre-trained language model
PCT/US2022/040530 WO2023064033A1 (en) 2021-10-12 2022-08-17 Fine-tuning multi-head network from a single transformer layer of pre-trained language model

Publications (1)

Publication Number Publication Date
CN118140230A (zh) 2024-06-04

Family

ID=85798249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280069151.5A Pending CN118140230A (zh) 2021-10-12 2022-08-17 Fine-tuning multi-head network from a single transformer layer of pre-trained language model

Country Status (6)

Country Link
US (2) US12512091B2
JP (1) JP2024539003A
KR (1) KR20240089615A
CN (1) CN118140230A
GB (1) GB2631139A
WO (1) WO2023064033A1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
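CN119418320A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, apparatus, medium, and program product
CN119418319A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, text detection method, apparatus, medium, and program product
CN119418321A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, method for detecting and recognizing text, and related apparatus
CN119915374A (zh) * 2025-04-03 2025-05-02 浙江潮汐力科技有限公司 Fault monitoring method, apparatus, device, storage medium, and program product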

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12548552B2 (en) * 2021-11-19 2026-02-10 International Business Machines Corporation Dynamic language selection of an AI voice assistance system
US11947935B2 (en) * 2021-11-24 2024-04-02 Microsoft Technology Licensing, Llc. Custom models for source code generation via prefix-tuning
US20240061835A1 (en) * 2022-08-22 2024-02-22 Oracle International Corporation System and method of selective fine-tuning for custom training of a natural language to logical form model
US20240169165A1 (en) * 2022-11-17 2024-05-23 Samsung Electronics Co., Ltd. Automatically Generating Annotated Ground-Truth Corpus for Training NLU Model
US12562163B2 (en) * 2023-05-12 2026-02-24 Servicenow, Inc. Bidirectional assistant for development platforms
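CN116774140A (zh) * 2023-06-26 2023-09-19 Nanjing University of Posts and Telecommunications Gridless signal source DOA estimation method based on a residual attention network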
US20250005282A1 (en) * 2023-06-29 2025-01-02 Amazon Technologies, Inc. Domain entity extraction for performing text analysis tasks
CN118446218B (zh) * 2024-05-16 2024-11-01 Southwest Jiaotong University Nested named entity recognition method based on adversarial reading comprehension
CA3253531A1 (en) * 2024-06-14 2026-01-19 The Toronto-Dominion Bank Context retrieval for in-context learning model
WO2026000314A1 (en) * 2024-06-27 2026-01-02 Beijing Youzhuju Network Technology Co., Ltd. Model-based task processing
JP7658644B1 (ja) * 2024-10-21 2025-04-08 Superb AI Co., Ltd. Method for training custom model based on pre-trained base model and learning device using the same

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138392B2 (en) * 2018-07-26 2021-10-05 Google Llc Machine translation using neural network models
US20200042864A1 (en) 2018-08-02 2020-02-06 Veritone, Inc. Neural network orchestration
US11556778B2 (en) 2018-12-07 2023-01-17 Microsoft Technology Licensing, Llc Automated generation of machine learning models
US20210279596A1 (en) 2020-03-06 2021-09-09 Hitachi, Ltd. System for predictive maintenance using trace norm generative adversarial networks
US20220094713A1 (en) * 2020-09-21 2022-03-24 Sophos Limited Malicious message detection
US12141701B2 (en) * 2021-01-21 2024-11-12 International Business Machines Corporation Channel scaling: a scale-and-select approach for selective transfer learning
US11875898B2 (en) * 2021-05-26 2024-01-16 Merative Us L.P. Automatic condition diagnosis using an attention-guided framework
US20230106669A1 (en) * 2021-09-27 2023-04-06 X Development Llc Binding affinity prediction using neural networks

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
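CN119418320A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, apparatus, medium, and program product
CN119418319A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, text detection method, apparatus, medium, and program product
CN119418321A (zh) * 2024-10-30 2025-02-11 Shanghai Bilibili Technology Co., Ltd. Model training method, method for detecting and recognizing text, and related apparatus
CN119418319B (zh) * 2024-10-30 2025-09-30 Shanghai Bilibili Technology Co., Ltd. Model training method, text detection method, apparatus, medium, and program product
CN119418321B (zh) * 2024-10-30 2025-09-30 Shanghai Bilibili Technology Co., Ltd. Model training method, method for detecting and recognizing text, and related apparatus
CN119418320B (zh) * 2024-10-30 2025-09-30 Shanghai Bilibili Technology Co., Ltd. Model training method, apparatus, medium, and program product
CN119915374A (zh) * 2025-04-03 2025-05-02 浙江潮汐力科技有限公司 Fault monitoring method, apparatus, device, storage medium, and program product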

Also Published As

Publication number Publication date
US20260080864A1 (en) 2026-03-19
GB2631139A (en) 2024-12-25
JP2024539003A (ja) 2024-10-28
US12512091B2 (en) 2025-12-30
KR20240089615A (ko) 2024-06-20
US20230115321A1 (en) 2023-04-13
GB202403625D0 (en) 2024-04-24
WO2023064033A1 (en) 2023-04-20

Similar Documents

Publication Publication Date Title
US12361219B2 (en) Context tag integration with named entity recognition models
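CN115398437B (zh) Improved out-of-domain (OOD) detection techniques
CN114424185B (zh) Stop word data augmentation for natural language processing
CN116802629B (zh) Multi-factor modeling for natural language processing
CN115398436B (zh) Noise data augmentation for natural language processing
US12512091B2 (en) Fine-tuning multi-head network from a single transformer layer of pre-trained language model
CN116583837B (zh) Distance-based logit values for natural language processing
US12288550B2 (en) Framework for focused training of language models and techniques for end-to-end hypertuning of the framework
CN116547676B (zh) Enhanced logits for natural language processing
US12412563B2 (en) Path dropout for natural language processing
CN118265981B (zh) Systems and techniques for handling long text for pre-trained language models
CN116615727A (zh) Keyword data augmentation tool for natural language processing
CN116490879A (zh) Methods and systems for overprediction in neural networks
CN119183573A (zh) Entity-aware data augmentation techniques
US20240062112A1 (en) Adaptive training data augmentation to facilitate training named entity recognition models
JP2025528391A (ja) Adaptive training data augmentation to facilitate training named entity recognition models
CN121936414A (zh) Systems and techniques for handling long text for pre-trained language models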

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination