CN114424185B - Stop word data augmentation for natural language processing - Google Patents

Stop word data augmentation for natural language processing

Info

Publication number
CN114424185B
Authority
CN
China
Prior art keywords
utterance
utterances
intent
training set
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080064541.4A
Other languages
English (en)
Chinese (zh)
Other versions
CN114424185A (zh)
Inventor
V·比什诺伊
M·E·约翰逊
E·L·贾拉鲁丁
B·S·文纳科塔
T·L·杜翁
G·辛格拉朱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Publication of CN114424185A
Application granted
Publication of CN114424185B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN202080064541.4A 2019-09-16 2020-09-11 Stop word data augmentation for natural language processing Active CN114424185B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962901203P 2019-09-16 2019-09-16
US62/901,203 2019-09-16
PCT/US2020/050407 WO2021055247A1 (en) 2019-09-16 2020-09-11 Stop word data augmentation for natural language processing

Publications (2)

Publication Number Publication Date
CN114424185A (zh) 2022-04-29
CN114424185B (zh) 2026-01-02

Family

ID=72659345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080064541.4A Active CN114424185B (zh) 2019-09-16 2020-09-11 Stop word data augmentation for natural language processing

Country Status (5)

Country Link
US (1) US11651768B2
EP (1) EP4032004A1
JP (1) JP7561836B2
CN (1) CN114424185B
WO (1) WO2021055247A1

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
DE112014000709B4 (de) 2013-02-07 2021-12-30 Apple Inc. Method and device for operating a voice trigger for a digital assistant
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770428A1 (en) 2017-05-12 2019-02-18 Apple Inc. LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
GB2569335B (en) * 2017-12-13 2022-07-27 Sage Global Services Ltd Chatbot system
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11227599B2 (en) 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11222630B1 (en) * 2019-09-19 2022-01-11 Amazon Technologies, Inc. Detecting false accepts in a shopping domain for handling a spoken dialog
US11593608B2 (en) * 2019-10-28 2023-02-28 Paypal, Inc. Systems and methods for predicting and providing automated online chat assistance
US11321532B2 (en) * 2019-12-17 2022-05-03 Microsoft Technology Licensing, Llc Conversational manifests for enabling complex bot communications
US11741140B2 (en) 2019-12-17 2023-08-29 Microsoft Technology Licensing, Llc Marketplace for conversational bot skills
WO2021134432A1 (en) * 2019-12-31 2021-07-08 Paypal, Inc. Framework for managing natural language processing tools
US11909698B2 (en) * 2020-01-17 2024-02-20 Bitonic Technology Labs, Inc. Method and system for identifying ideal virtual assistant bots for providing response to user queries
US11316806B1 (en) * 2020-01-28 2022-04-26 Snap Inc. Bulk message deletion
CN111414731B (zh) * 2020-02-28 2023-08-11 北京小米松果电子有限公司 Text annotation method and apparatus
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US11715464B2 (en) * 2020-09-14 2023-08-01 Apple Inc. Using augmentation to create natural language models
US11893354B2 (en) 2021-03-25 2024-02-06 Cognizant Technology Solutions India Pvt. Ltd. System and method for improving chatbot training dataset
EP4315320A4 (en) * 2021-03-30 2025-01-15 Five9, Inc. SYSTEMS AND METHODS FOR TRAINING NATURAL LANGUAGE PROCESSING MODELS IN A CONTACT CENTER
US12026471B2 (en) * 2021-04-16 2024-07-02 Accenture Global Solutions Limited Automated generation of chatbot
US11663421B2 (en) * 2021-04-27 2023-05-30 Jpmorgan Chase Bank, N.A. Systems and methods for intent-based natural language processing
US12321428B2 (en) * 2021-07-08 2025-06-03 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
US12468938B2 (en) * 2021-09-21 2025-11-11 International Business Machines Corporation Training example generation to create new intents for chatbots
CN114881035B (zh) * 2022-05-13 2023-07-25 平安科技(深圳)有限公司 Training data augmentation method, apparatus, device, and storage medium
US12579448B2 (en) * 2022-06-22 2026-03-17 Oracle International Corporation Techniques for positive entity aware augmentation using two-stage augmentation
US12288031B2 (en) * 2022-07-13 2025-04-29 Adp, Inc. Filtering user intent eligibility
US12499385B2 (en) * 2022-08-22 2025-12-16 Oracle International Corporation Adaptive training data augmentation to facilitate training named entity recognition models
US20240169165A1 (en) * 2022-11-17 2024-05-23 Samsung Electronics Co., Ltd. Automatically Generating Annotated Ground-Truth Corpus for Training NLU Model
KR20240076977A (ko) * 2022-11-24 2024-05-31 고려대학교 산학협력단 Method and apparatus for dialogue relation extraction using prompts and blank inference on entity type and relation information
US20240185369A1 (en) * 2022-12-05 2024-06-06 Capital One Services, Llc Biasing machine learning model outputs
US12573391B2 (en) * 2023-03-22 2026-03-10 Meta Platforms, Inc. Generating contextual responses for out-of-coverage requests for assistant systems
TWI897311B (zh) * 2023-03-22 2025-09-11 宏達國際電子股份有限公司 Language processing method and language processing system
US12608562B2 (en) * 2023-09-21 2026-04-21 Google Llc Providing personalized prompts to users based on documents in cloud storage
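TWI882526B (zh) * 2023-11-14 2025-05-01 開曼群島商沛嘻科技股份有限公司 Intelligent dialogue import method and system
CN118427309B (zh) * 2024-07-03 2024-08-27 云储新能源科技有限公司 Parameter extraction method for an energy storage management system based on natural language interaction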
US12367342B1 (en) * 2025-01-15 2025-07-22 Conversational AI Ltd Automated analysis of computerized conversational agent conversational data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
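JP2010033157A (ja) * 2008-07-25 2010-02-12 Sharp Corp Information processing apparatus and information processing method
CN107862027A (zh) * 2017-10-31 2018-03-30 北京小度信息科技有限公司 Search intent recognition method, apparatus, electronic device, and readable storage medium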

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112014007123T5 (de) 2014-10-30 2017-07-20 Mitsubishi Electric Corporation Dialogue control system and dialogue control method
JP6448765B2 (ja) 2015-03-20 2019-01-09 株式会社東芝 Dialogue apparatus, method, and program
US10579834B2 (en) 2015-10-26 2020-03-03 [24]7.ai, Inc. Method and apparatus for facilitating customer intent prediction
WO2017112813A1 (en) * 2015-12-22 2017-06-29 Sri International Multi-lingual virtual personal assistant
US11158311B1 (en) * 2017-08-14 2021-10-26 Guangsheng Zhang System and methods for machine understanding of human intentions
CN107515857B (zh) 2017-08-31 2020-08-18 科大讯飞股份有限公司 Semantic understanding method and system based on customized skills
US10778614B2 (en) 2018-03-08 2020-09-15 Andre Arzumanyan Intelligent apparatus and method for responding to text messages
CN109241533A (zh) 2018-09-06 2019-01-18 科大国创软件股份有限公司 Semantic understanding system and method based on natural language processing
US11093707B2 (en) * 2019-01-15 2021-08-17 International Business Machines Corporation Adversarial training data augmentation data for text classifiers
WO2020163627A1 (en) * 2019-02-07 2020-08-13 Clinc, Inc. Systems and methods for machine learning-based multi-intent segmentation and classification

Also Published As

Publication number Publication date
CN114424185A (zh) 2022-04-29
WO2021055247A1 (en) 2021-03-25
JP2022547631A (ja) 2022-11-14
JP7561836B2 (ja) 2024-10-04
US20210082400A1 (en) 2021-03-18
EP4032004A1 (en) 2022-07-27
US11651768B2 (en) 2023-05-16

Similar Documents

Publication Publication Date Title
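CN114424185B (zh) Stop word data augmentation for natural language processing
CN115398437B (zh) Improved out-of-domain (OOD) detection techniques
CN116724305B (zh) Integration of context tags with named entity recognition models
CN115398436B (zh) Noise data augmentation for natural language processing
CN116583837B (zh) Distance-based logit values for natural language processing
CN116547676B (zh) Enhanced logits for natural language processing
CN115917553A (zh) Entity-level data augmentation for robust named entity recognition in chatbots
CN116635862A (zh) Out-of-domain data augmentation for natural language processing
JP2024539003A (ja) Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model
CN116615727A (zh) Keyword data augmentation tool for natural language processing
CN118265981B (zh) Systems and techniques for handling long text for pre-trained language models
CN118202344A (zh) Deep learning techniques for extracting embedded data from documents
CN116724306A (zh) Multi-feature balancing for natural language processors
CN118235143A (zh) Path dropout for natural language processing
CN118215920A (zh) Wide and deep network for language detection using hash embeddings
KR20250029146A (ko) Techniques for entity-aware data augmentation
CN118251668A (zh) Rule-based techniques for extracting question-answer pairs from data
CN119768794A (zh) Adaptive training data augmentation to facilitate training named entity recognition models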

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant