CN115485690A - Batching techniques for handling unbalanced training data for a chatbot - Google Patents

Batching techniques for handling unbalanced training data for a chatbot

Info

Publication number
CN115485690A
Authority
CN
China
Prior art keywords
intent
utterances
batch
training
utterance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180026697.8A
Other languages
English (en)
Chinese (zh)
Inventor
T·L·杜翁
M·E·约翰逊
V·比什诺伊
S·文纳科塔
洪宇衡
E·L·贾拉勒丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-03-30
Filing date
2021-03-30
Publication date
2022-12-16
Application filed by Oracle International Corp
Publication of CN115485690A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN202180026697.8A 2020-03-30 2021-03-30 Batching techniques for handling unbalanced training data for a chatbot Pending CN115485690A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063002151P 2020-03-30 2020-03-30
US63/002,151 2020-03-30
PCT/US2021/024946 WO2021202569A1 (en) 2020-03-30 2021-03-30 Batching techniques for handling unbalanced training data for a chatbot

Publications (1)

Publication Number Publication Date
CN115485690A (zh) 2022-12-16

Family

ID=77856167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180026697.8A Pending CN115485690A (zh) Batching techniques for handling unbalanced training data for a chatbot

Country Status (5)

Country Link
US (1) US12236321B2
EP (1) EP4128011A1
JP (1) JP2023520420A
CN (1) CN115485690A
WO (1) WO2021202569A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120395912A (zh) * 2025-07-03 2025-08-01 浙江理工大学 A task-driven intelligent control method and system for general-purpose robots

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
DE112014000709B4 (de) 2013-02-07 2021-12-30 Apple Inc. Method and apparatus for operating a voice trigger for a digital assistant
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
DK201770428A1 (en) 2017-05-12 2019-02-18 Apple Inc. LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11651162B2 (en) 2019-04-26 2023-05-16 Oracle International Corporation Composite entity for rule driven acquisition of input data to chatbots
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11227599B2 (en) 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11704474B2 (en) * 2020-02-25 2023-07-18 Transposit Corporation Markdown data content with action binding
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US11715464B2 (en) * 2020-09-14 2023-08-01 Apple Inc. Using augmentation to create natural language models
WO2022240918A1 (en) * 2021-05-11 2022-11-17 AskWisy, Inc. Intelligent training and education bot
US12147768B2 (en) * 2021-05-18 2024-11-19 International Business Machines Corporation Natural language bias detection in conversational system environments
US12321428B2 (en) * 2021-07-08 2025-06-03 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
US11763803B1 (en) * 2021-07-28 2023-09-19 Asapp, Inc. System, method, and computer program for extracting utterances corresponding to a user problem statement in a conversation between a human agent and a user
US12609102B2 (en) * 2021-09-30 2026-04-21 Sap Se Training dataset generation for speech-to-text service
US12067363B1 (en) 2022-02-24 2024-08-20 Asapp, Inc. System, method, and computer program for text sanitization
CN114694655B (zh) * 2022-03-28 2025-07-08 南方电网数字企业科技(广东)有限公司 An expansion method for Cantonese audio and a speech recognition method
US12468895B2 (en) 2022-06-21 2025-11-11 Kore.Ai, Inc. Systems and methods for training a virtual assistant
WO2024238420A1 (en) * 2023-05-12 2024-11-21 Genesys Cloud Services, Inc. Systems and methods for computing intent health for enhancing conversational bots
US12231378B2 (en) * 2023-06-08 2025-02-18 Sap Se Realtime conversation AI insights and deployment
US12367342B1 (en) * 2025-01-15 2025-07-22 Conversational AI Ltd Automated analysis of computerized conversational agent conversational data

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7853451B1 (en) * 2003-12-18 2010-12-14 At&T Intellectual Property Ii, L.P. System and method of exploiting human-human data for spoken language understanding systems
JP2010204966A (ja) * 2009-03-03 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Sampling device, sampling method, sampling program, class discrimination device, and class discrimination system
US20120262461A1 (en) * 2011-02-17 2012-10-18 Conversive, Inc. System and Method for the Normalization of Text
US9589074B2 (en) * 2014-08-20 2017-03-07 Oracle International Corporation Multidimensional spatial searching for identifying duplicate crash dumps
US10453117B1 (en) * 2016-06-29 2019-10-22 Amazon Technologies, Inc. Determining domains for natural language understanding
US10909980B2 (en) * 2017-02-27 2021-02-02 SKAEL, Inc. Machine-learning digital assistants
US10546583B2 (en) * 2017-08-30 2020-01-28 Amazon Technologies, Inc. Context-based device arbitration
US10617959B2 (en) 2018-01-18 2020-04-14 Moveworks, Inc. Method and system for training a chatbot
US10497366B2 (en) * 2018-03-23 2019-12-03 Servicenow, Inc. Hybrid learning system for natural language understanding
US10621976B2 (en) 2018-09-18 2020-04-14 International Business Machines Corporation Intent classification from multiple sources when building a conversational system
US10977443B2 (en) * 2018-11-05 2021-04-13 International Business Machines Corporation Class balancing for intent authoring using search
WO2020163627A1 (en) * 2019-02-07 2020-08-13 Clinc, Inc. Systems and methods for machine learning-based multi-intent segmentation and classification
US12026468B2 (en) * 2020-11-30 2024-07-02 Oracle International Corporation Out-of-domain data augmentation for natural language processing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
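CN120395912A (zh) * 2025-07-03 2025-08-01 浙江理工大学 A task-driven intelligent control method and system for general-purpose robots
CN120395912B (zh) * 2025-07-03 2025-09-12 浙江理工大学 A task-driven intelligent control method and system for general-purpose robots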

Also Published As

Publication number Publication date
WO2021202569A1 (en) 2021-10-07
EP4128011A1 (en) 2023-02-08
JP2023520420A (ja) 2023-05-17
US20210304075A1 (en) 2021-09-30
US12236321B2 (en) 2025-02-25

Similar Documents

Publication Publication Date Title
US12236321B2 (en) Batching techniques for handling unbalanced training data for a chatbot
CN114424185B (zh) Stop word data augmentation for natural language processing
US20230252975A1 (en) Routing for chatbots
US20250094725A1 (en) Digital assistant using generative artificial intelligence
CN115398419A (zh) Method and system for objective-based hyperparameter tuning
WO2022040547A1 (en) Techniques for providing explanations for text classification
CN115398437A (zh) Improved out-of-domain (OOD) detection techniques
US11989523B2 (en) Composite entity for rule driven acquisition of input data to chatbots
CN116802629A (zh) Multi-factor modelling for natural language processing
CN114365119A (zh) Detecting unrelated utterances in a chatbot system
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
US12518129B2 (en) Method and system for over-prediction in neural networks
KR102821062B1 (ko) System and techniques for handling long text for pre-trained language models
EP4281880A1 (en) Multi-feature balancing for natural language processors
US12112560B2 (en) Usage based resource utilization of training pool for chatbots
US20230136965A1 (en) Prohibiting inconsistent named entity recognition tag sequences
US20230134149A1 (en) Rule-based techniques for extraction of question and answer pairs from data
WO2025058830A1 (en) Digital assistant using generative artificial intelligence
CN120092248A (zh) Objective function optimization in objective-based hyperparameter tuning
WO2023091436A1 (en) System and techniques for handling long text for pre-trained language models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination