CN115917553A - Entity level data augmentation in chatbots for robust named entity recognition - Google Patents

Entity level data augmentation in chatbots for robust named entity recognition

Info

Publication number
CN115917553A
Authority
CN
China
Prior art keywords
utterance
model
template
data set
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180041814.8A
Other languages
English (en)
Chinese (zh)
Inventor
S·P·K·加德
Y·吴
A·D·卡努加
E·L·贾拉勒丁
V·比什诺伊
M·E·约翰逊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Publication of CN115917553A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN202180041814.8A 2020-06-12 2021-06-11 Entity level data augmentation in chatbots for robust named entity recognition Pending CN115917553A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063038337P 2020-06-12 2020-06-12
US63/038,337 2020-06-12
PCT/US2021/036939 WO2021252845A1 (en) 2020-06-12 2021-06-11 Entity level data augmentation in chatbots for robust named entity recognition

Publications (1)

Publication Number Publication Date
CN115917553A (zh) 2023-04-04

Family

ID=76797120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180041814.8A Pending CN115917553A (zh) 2020-06-12 2021-06-11 Entity level data augmentation in chatbots for robust named entity recognition

Country Status (5)

Country Link
US (2) US11804219B2
EP (1) EP4165540A1
JP (2) JP7686678B2
CN (1) CN115917553A
WO (1) WO2021252845A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
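CN116612743A (zh) * 2023-06-20 2023-08-18 北京云思智学科技有限公司 Evaluation method and apparatus for speech recognition model, electronic device, and storage medium
CN117056720A (zh) * 2023-07-27 2023-11-14 远光软件股份有限公司 Learning and fine-tuning method for pre-trained language, computer device, and computer-readable storage medium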

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2569335B (en) * 2017-12-13 2022-07-27 Sage Global Services Ltd Chatbot system
US11804219B2 (en) * 2020-06-12 2023-10-31 Oracle International Corporation Entity level data augmentation in chatbots for robust named entity recognition
US11720752B2 (en) * 2020-07-07 2023-08-08 Sap Se Machine learning enabled text analysis with multi-language support
US11599721B2 (en) * 2020-08-25 2023-03-07 Salesforce, Inc. Intelligent training set augmentation for natural language processing tasks
US11657227B2 (en) * 2021-01-13 2023-05-23 International Business Machines Corporation Corpus data augmentation and debiasing
US12026471B2 (en) * 2021-04-16 2024-07-02 Accenture Global Solutions Limited Automated generation of chatbot
US12340792B2 (en) * 2021-05-17 2025-06-24 Salesforce, Inc. Systems and methods for few-shot intent classifier models
US12321428B2 (en) * 2021-07-08 2025-06-03 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
US12170079B2 (en) * 2021-08-03 2024-12-17 Samsung Electronics Co., Ltd. System and method for improving named entity recognition
WO2023019255A1 (en) * 2021-08-12 2023-02-16 Yohana Llc Systems and methods for representative support in a task determination system
US12468938B2 (en) * 2021-09-21 2025-11-11 International Business Machines Corporation Training example generation to create new intents for chatbots
US11948599B2 (en) * 2022-01-06 2024-04-02 Microsoft Technology Licensing, Llc Audio event detection with window-based prediction
US12321702B2 (en) 2022-01-31 2025-06-03 Walmart Apollo, Llc Automatically augmenting and labeling conversational data for training machine learning models
US20230252287A1 (en) * 2022-02-07 2023-08-10 Accenture Global Solutions Limited Evaluation of reliability of artificial intelligence (ai) models
CN117290501A (zh) * 2022-06-16 2023-12-26 青岛海尔特种电冰箱有限公司 Method, apparatus, device, and storage medium for automatically generating text classification training corpora
US11784833B1 (en) * 2022-07-25 2023-10-10 Gravystack, Inc. Apparatus and method for generating an endpoint path associated with a user
CN119768794A (zh) * 2022-08-22 2025-04-04 Oracle International Corporation Adaptive training data augmentation to facilitate training named entity recognition models
US12499385B2 (en) 2022-08-22 2025-12-16 Oracle International Corporation Adaptive training data augmentation to facilitate training named entity recognition models
US20240169165A1 (en) * 2022-11-17 2024-05-23 Samsung Electronics Co., Ltd. Automatically Generating Annotated Ground-Truth Corpus for Training NLU Model
US11847565B1 (en) * 2023-02-14 2023-12-19 Fmr Llc Automatic refinement of intent classification for virtual assistant applications
US20240289553A1 (en) * 2023-02-23 2024-08-29 Insight Direct Usa, Inc. Conversational inclusion/exclusion detection
US20240354423A1 (en) * 2023-04-21 2024-10-24 Teachers Insurance And Annuity Association Of America Cybersecurity management systems integrating artificial intelligence, machine learning and extended reality
US11922515B1 (en) * 2023-04-28 2024-03-05 Peppercorn AI Technology Limited Methods and apparatuses for AI digital assistants
US12231378B2 (en) * 2023-06-08 2025-02-18 Sap Se Realtime conversation AI insights and deployment
US12475325B2 (en) * 2023-11-09 2025-11-18 Oracle International Corporation Model robustness on operators and triggering keywords in natural language to a meaning representation language system
US20250181327A1 (en) * 2023-12-05 2025-06-05 Lg Management Development Institute Co., Ltd. Method and system for performing tasks based on the context of task-oriented dialogue

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6499012B1 (en) 1999-12-23 2002-12-24 Nortel Networks Limited Method and apparatus for hierarchical training of speech models for use in speaker verification
US7295981B1 (en) * 2004-01-09 2007-11-13 At&T Corp. Method for building a natural language understanding model for a spoken dialog system
JP4537755B2 (ja) 2004-04-30 2010-09-08 Hitachi, Ltd. Spoken dialogue system
JP5412137B2 (ja) 2009-02-24 2014-02-12 Yahoo Japan Corporation Machine learning device and method
US10217059B2 (en) * 2014-02-04 2019-02-26 Maluuba Inc. Method and system for generating natural language training data
US20190180196A1 (en) 2015-01-23 2019-06-13 Conversica, Inc. Systems and methods for generating and updating machine hybrid deep learning models
US20200143468A1 (en) * 2015-08-13 2020-05-07 Cronus Consulting Group Pty Ltd System for financial information reporting
US11048934B2 (en) * 2015-08-27 2021-06-29 Longsand Limited Identifying augmented features based on a bayesian analysis of a text document
US10417566B2 (en) 2016-05-22 2019-09-17 Microsoft Technology Licensing, Llc Self-learning technique for training a PDA component and a simulated user component
US9910848B2 (en) * 2016-07-07 2018-03-06 International Business Machines Corporation Generating semantic variants of natural language expressions using type-specific templates
US10540967B2 (en) * 2016-11-14 2020-01-21 Xerox Corporation Machine reading method for dialog state tracking
US11138388B2 (en) * 2016-12-22 2021-10-05 Verizon Media Inc. Method and system for facilitating a user-machine conversation
US20180191643A1 (en) * 2016-12-30 2018-07-05 Facebook, Inc. User communications with a third party through a social networking system
WO2018174443A1 (en) 2017-03-23 2018-09-27 Samsung Electronics Co., Ltd. Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
US10572595B2 (en) * 2017-04-13 2020-02-25 Baidu Usa Llc Global normalized reader systems and methods
US10679129B2 (en) * 2017-09-28 2020-06-09 D5Ai Llc Stochastic categorical autoencoder network
US11645277B2 (en) * 2017-12-11 2023-05-09 Google Llc Generating and/or utilizing a machine learning model in response to a search request
WO2019160556A1 (en) * 2018-02-16 2019-08-22 Hewlett-Packard Development Company, L.P. Encoded features and rate-based augmentation based speech authentication
US10726826B2 (en) * 2018-03-04 2020-07-28 International Business Machines Corporation Voice-transformation based data augmentation for prosodic classification
US20190327330A1 (en) * 2018-04-20 2019-10-24 Facebook, Inc. Building Customized User Profiles Based on Conversational Data
EP3561742A1 (en) * 2018-04-25 2019-10-30 Volume Limited Test and training data
US10795752B2 (en) * 2018-06-07 2020-10-06 Accenture Global Solutions Limited Data validation
CN117912447A (zh) 2018-06-25 2024-04-19 Google Llc Hotword-aware speech synthesis
US11183176B2 (en) * 2018-10-31 2021-11-23 Walmart Apollo, Llc Systems and methods for server-less voice applications
US11281857B1 (en) * 2018-11-08 2022-03-22 Amazon Technologies, Inc. Composite slot type resolution
US11003909B2 (en) * 2019-03-20 2021-05-11 Raytheon Company Neural network trained by homographic augmentation
US11158307B1 (en) * 2019-03-25 2021-10-26 Amazon Technologies, Inc. Alternate utterance generation
EP3948570A4 (en) * 2019-03-26 2022-12-21 The Regents of the University of California DISTRIBUTED PRIVACY CALCULATION ON PROTECTED DATA
US10679012B1 (en) 2019-04-18 2020-06-09 Capital One Services, Llc Techniques to add smart device information to machine learning for increased context
US11562180B2 (en) * 2019-05-03 2023-01-24 Microsoft Technology Licensing, Llc Characterizing failures of a machine learning model based on instance features
US10924393B2 (en) * 2019-06-05 2021-02-16 Cisco Technology, Inc. Per-flow call admission control using a predictive model to estimate tunnel QoS in SD-WAN networks
US11886996B2 (en) * 2019-06-20 2024-01-30 Nippon Telegraph And Telephone Corporation Training data extension apparatus, training data extension method, and program
US20190311254A1 (en) * 2019-06-21 2019-10-10 Intel Corporation Technologies for performing in-memory training data augmentation for artificial intelligence
US20210064862A1 (en) * 2019-08-28 2021-03-04 Cognizant Technology Solutions India Pvt. Ltd. System and a method for developing a tool for automated data capture
US11070441B2 (en) * 2019-09-23 2021-07-20 Cisco Technology, Inc. Model training for on-premise execution in a network assurance system
US11803887B2 (en) * 2019-10-02 2023-10-31 Microsoft Technology Licensing, Llc Agent selection using real environment interaction
US20220245574A1 (en) * 2019-11-05 2022-08-04 Strong Force Vcn Portfolio 2019, Llc Systems, Methods, Kits, and Apparatuses for Digital Product Network Systems and Biology-Based Value Chain Networks
US11423235B2 (en) * 2019-11-08 2022-08-23 International Business Machines Corporation Cognitive orchestration of multi-task dialogue system
US11657307B1 (en) * 2019-11-27 2023-05-23 Amazon Technologies, Inc. Data lake-based text generation and data augmentation for machine learning training
US11804219B2 (en) * 2020-06-12 2023-10-31 Oracle International Corporation Entity level data augmentation in chatbots for robust named entity recognition
WO2022043675A2 (en) * 2020-08-24 2022-03-03 Unlikely Artificial Intelligence Limited A computer implemented method for the automated analysis or use of data

Also Published As

Publication number Publication date
JP2023530423A (ja) 2023-07-18
US20240013780A1 (en) 2024-01-11
WO2021252845A1 (en) 2021-12-16
JP2025118956A (ja) 2025-08-13
US12603085B2 (en) 2026-04-14
JP7686678B2 (ja) 2025-06-02
US20210390951A1 (en) 2021-12-16
US11804219B2 (en) 2023-10-31
EP4165540A1 (en) 2023-04-19

Similar Documents

Publication Publication Date Title
JP7686678B2 (ja) Entity level data augmentation in chatbots for robust named entity recognition
US12361219B2 (en) Context tag integration with named entity recognition models
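CN114424185B (zh) Stop word data augmentation for natural language processing
CN115398437B (zh) Improved out-of-domain (OOD) detection techniques
CN115398436B (zh) Noise data augmentation for natural language processing
CN116802629B (zh) Multi-factor modelling for natural language processing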
US12153885B2 (en) Multi-feature balancing for natural language processors
CN118140230A (zh) Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model
US12412563B2 (en) Path dropout for natural language processing
KR102821062B1 (ko) Systems and techniques for handling long text for pre-trained language models
CN118202344A (zh) Deep learning techniques for extracting embedded data from documents
US12572852B2 (en) Lexical dropout for natural language processing
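CN116490879A (zh) Method and system for over-prediction in neural networks
KR20250029146A (ko) Techniques for entity-aware data augmentation
CN118251668A (zh) Rule-based techniques for extracting question-answer pairs from data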
US12499385B2 (en) Adaptive training data augmentation to facilitate training named entity recognition models
US20260065171A1 (en) Adaptive training data augmentation to facilitate training named entity recognition models
CN121936414A (zh) Systems and techniques for handling long text for pre-trained language models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination