WO2022135496A1 - Data processing method and apparatus for voice interaction - Google Patents


Info

Publication number
WO2022135496A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice interaction
information
intent
interaction information
sample
Prior art date
Application number
PCT/CN2021/140591
Other languages
English (en)
French (fr)
Inventor
韩传宇
易晖
翁志伟
Original Assignee
广州橙行智动汽车科技有限公司
广州小鹏汽车科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州橙行智动汽车科技有限公司 and 广州小鹏汽车科技有限公司
Publication of WO2022135496A1 publication Critical patent/WO2022135496A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation

Definitions

  • the present invention relates to the field of data processing, in particular to a data processing method and device for voice interaction.
  • a data processing method for voice interaction comprising:
  • a semantic rejection result is determined according to the target intent field, and the voice interaction event is responded to according to the semantic rejection result.
  • before the determining of the historical voice interaction information, the method further includes:
  • the determining of historical voice interaction information is performed.
  • determining the target intent field for the voice interaction event according to the historical voice interaction information includes:
  • the intent prediction result output by the intent classification model is received, and according to the intent prediction result, the target intent domain for the voice interaction event is determined.
  • obtaining sample voice interaction combination information, and determining sample combination intent label information corresponding to the sample voice interaction combination information
  • Model training is performed according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model.
  • the obtaining sample voice interaction combination information includes:
  • sample voice interaction combination information is obtained.
  • the determining the sample combination intention label information corresponding to the sample voice interaction combination information includes:
  • the sample combination intent label information is determined.
  • the determining of the semantic rejection result according to the target intent field includes:
  • when the target intent domain is the specified intent domain, determining that the semantic rejection result is a rejection processing result
  • the semantic rejection result is determined as a non-rejection processing result.
  • a data processing device for voice interaction comprising:
  • a historical voice interaction information determination module used for determining historical voice interaction information when a voice interaction event is detected
  • a target intent domain determination module configured to determine a target intent domain for the voice interaction event according to the historical voice interaction information
  • a voice interaction event response module configured to determine a semantic rejection result according to the target intent field, and respond to the voice interaction event according to the semantic rejection result.
  • a server comprising a processor, a memory, and a computer program stored on the memory and capable of running on the processor, the computer program being executed by the processor to implement the above-mentioned data processing method for voice interaction.
  • a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned data processing method for voice interaction.
  • when a voice interaction event is detected, historical voice interaction information is determined; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This realizes optimized rejection processing for weak-intent conversations: by determining the target intent domain from the historical voice interaction information and then determining the semantic rejection result based on the target intent domain, the user's voice interaction information can be understood comprehensively, the problem of choosing whether to release or reject a weak-intent conversation is solved, and the semantic rejection capability is improved.
  • FIG. 1 is a flowchart of steps of a data processing method for voice interaction provided by an embodiment of the present invention
  • FIG. 2 is a flowchart of steps of another voice interaction data processing method provided by an embodiment of the present invention.
  • FIG. 3 is a flowchart of steps of another voice interaction data processing method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a technical framework for semantic rejection provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a data processing apparatus for voice interaction according to an embodiment of the present invention.
  • Referring to FIG. 1, a flowchart of the steps of a data processing method for voice interaction provided by an embodiment of the present invention is shown; the method may specifically include the following steps:
  • Step 101 when a voice interaction event is detected, determine historical voice interaction information
  • the voice interaction event may be a user-triggered voice interaction operation.
  • the user may send voice interaction information through the voice interaction operation, and the in-vehicle system may perform voice interaction processing on the voice interaction information.
  • historical voice interaction information can be determined to further perform voice interaction processing for the voice interaction event according to the historical voice interaction information.
  • For a detected voice interaction event, its previous voice interaction information can be used as the historical voice interaction information; for example, the voice interaction information of earlier conversation rounds can be obtained first to serve as the historical voice interaction information.
  • Step 102 determine the target intent field for the voice interaction event
  • the target intent field for the voice interaction event can be determined according to the historical voice interaction information.
  • The target intent domain can be the user intent domain for the voice interaction event. For example, natural language understanding (NLU) may involve multiple taxonomy domains that characterize different user intent domains.
  • Multiple intent domains can be preset, and each intent domain can have a domain name, such as air conditioner, chat, or weather, representing the user's intention to control the air conditioner, chat, query the weather, and so on, which is not limited in the present invention.
  • Step 103 Determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
  • Specifically, the semantic rejection result can be determined according to the target intent domain, and then the voice interaction event can be responded to according to the semantic rejection result.
  • To sum up, when a voice interaction event is detected, historical voice interaction information is determined; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This realizes optimized rejection processing for weak-intent conversations: by determining the target intent domain from the historical voice interaction information and then determining the semantic rejection result based on the target intent domain, the user's voice interaction information can be understood comprehensively, the problem of choosing whether to release or reject a weak-intent conversation is solved, and the semantic rejection capability is improved.
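  • The three-step flow above can be sketched end to end as follows; the function names, the `REJECT_DOMAINS` set, and the keyword rule standing in for the pre-trained intent classification model are all illustrative assumptions, not the patent's actual implementation:

```python
# Hypothetical end-to-end sketch of Steps 101-103. The keyword rule below
# merely stands in for the pre-trained intent classification model, and the
# names (classify_intent, REJECT_DOMAINS, handle_voice_event) are invented.

REJECT_DOMAINS = {"chat"}  # specified intent domains that trigger rejection

def classify_intent(history, utterance):
    # Stand-in for the intent classification model: classify the merged
    # history + current utterance with a trivial keyword rule.
    text = " ".join(history + [utterance])
    return "navigation" if ("navigation" in text or "map" in text) else "chat"

def handle_voice_event(history, utterance):
    """Step 102: determine the target intent domain from historical and
    current voice interaction information; Step 103: derive the semantic
    rejection result and respond accordingly."""
    domain = classify_intent(history, utterance)
    action = "reject" if domain in REJECT_DOMAINS else "release"
    return domain, action

print(handle_voice_event(["open the map and open the navigation"], "I'm hungry"))
# ('navigation', 'release')
```

In a real system, `classify_intent` would call the trained classification model described in the later embodiments.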
  • Referring to FIG. 2, a flowchart of the steps of another data processing method for voice interaction provided by an embodiment of the present invention is shown; the method may specifically include the following steps:
  • Step 201 when a voice interaction event is detected, determine target voice interaction information of the voice interaction event;
  • the target voice interaction information of the voice interaction event can be determined, and the target voice interaction information can be the user voice interaction information for the voice interaction event.
  • the received user voice interaction information of the current round session is the target voice interaction information of the current round session.
  • Step 202 judging whether the target voice interaction information is voice interaction information with weak intent
  • It can be judged whether the target voice interaction information is weak-intent voice interaction information, so as to further choose between release and rejection processing for the weak-intent voice interaction information.
  • A strong-intent conversation carries specific instruction information, such as a wake-up word. A wake-up word can be set so that, when it is detected, it can be determined that the user is issuing an instruction to the interactive device; because the instruction information is clear, the corresponding operation can be performed. For example, when the voice interaction information is "wake-up word, 23 degrees", the air conditioner can be set to 23 degrees. In a continuous-listening scenario, however, the user's statement may be brief, such as just "23 degrees"; such a statement can be called a weak intent, since the weak-intent session carries neither specific instruction information nor contextual information that directly characterizes the user's intent.
  • A judgment condition for weak-intent conversations can be preset; after the target voice interaction information is obtained, whether it is weak-intent voice interaction information can be judged. When the target voice interaction information is weak-intent voice interaction information, the subsequent semantic rejection judgment is performed to decide whether the weak-intent voice interaction information is released or rejected.
  • Step 203 when determining that the target voice interaction information is voice interaction information with weak intent, determine historical voice interaction information
  • historical voice interaction information may be determined according to the target voice interaction information.
  • For the target voice interaction information of the current conversation round, the voice interaction information of its historical conversation rounds can be obtained; for example, the voice interaction information of the previous conversation round can be used as the historical voice interaction information.
  • Step 204 inputting the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model
  • The target voice interaction information and the historical voice interaction information can be merged, and the merged voice interaction information can then be input into a pre-trained intent classification model for intent classification prediction.
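  • The merge step can be as simple as concatenating the turns with a separator, in the spirit of BERT's sentence-pair input; the `[SEP]` marker and the function name here are assumptions for illustration:

```python
def merge_for_model(history_utterances, target_utterance, sep=" [SEP] "):
    """Merge historical and target voice interaction information into one
    model input string, oldest turn first. The [SEP] marker is an assumed
    convention, not specified by the patent."""
    return sep.join(history_utterances + [target_utterance])

merged = merge_for_model(["open the map and open the navigation"], "I'm hungry")
print(merged)  # open the map and open the navigation [SEP] I'm hungry
```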
  • Step 205 receiving the intent prediction result output by the intent classification model, and determining the target intent field for the voice interaction event according to the intent prediction result;
  • the intent prediction result output by the intent classification model can be received, and the target intent field for the voice interaction event can be determined according to the intent prediction result.
  • The weak-intent voice interaction information can be combined with its historical voice interaction information; an intent classification model can then be used to perform intent classification prediction on the combined voice interaction information, and the intent prediction result output by the model can be obtained.
  • The intent classification model can output a classification label result; using a preset mapping relationship between classification labels and intent domains, the intent domain corresponding to the output classification label result can be obtained.
  • Step 206 Determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
  • The intent classification model can be used to perform classification prediction, so that the classification label result for the weak-intent voice interaction information can be obtained, and the corresponding intent domain can be determined based on the classification label.
  • For example, the weak-intent voice interaction information can be "full scene voice", and the voice interaction information of the previous conversation round can be "XX (a certain vehicle model) has many new functions". After the intent classification model performs classification prediction, the output result (i.e., the intent prediction result) can be the classification label "chat", and the corresponding domain can be determined as the "chat" domain (i.e., the target intent domain). Based on the "chat" domain, it can be determined that rejection processing is required; that is, the user's intention is not to enable the full-scene voice function, so the weak-intent voice interaction information can be rejected.
  • For another example, the weak-intent voice interaction information can be "I'm hungry", and the voice interaction information of the previous conversation round can be "open the map and open the navigation". When the intent classification model performs classification prediction, the output result can be the classification label "navigation", so the corresponding domain can be determined as the "navigation" domain. Based on the "navigation" domain, it can be determined that non-rejection processing is required; that is, the user's intention is to be recommended navigation to nearby restaurants, so the weak-intent voice interaction information can be released.
  • step 206 may include the following sub-steps:
  • when the target intent domain is a specified intent domain, the semantic rejection result is determined to be a rejection processing result; when the target intent domain is a non-specified intent domain, the semantic rejection result is determined to be a non-rejection processing result.
  • The specified intent domain may be an intent domain preset for rejection processing. For example, the "chat" domain can be preset as the specified intent domain, so that rejection processing is performed when the target intent domain is detected to be the "chat" domain.
  • When the target intent domain is the specified intent domain, the semantic rejection result can be determined as the rejection processing result; when the target intent domain is a non-specified intent domain, the semantic rejection result can be determined as the non-rejection processing result.
  • For example, the "chat" domain can be preset as the specified intent domain: when the target intent domain is detected to be the "chat" domain, rejection processing is performed; when the target intent domain is detected not to be the "chat" domain, release processing is performed.
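  • A minimal sketch of this decision rule, assuming a configurable set of specified intent domains (the function name and the example domain set are illustrative):

```python
def semantic_rejection_result(target_domain, specified_domains=frozenset({"chat"})):
    """Return the rejection decision described above: rejection when the
    target intent domain is a specified domain, non-rejection otherwise."""
    return "reject" if target_domain in specified_domains else "release"

print(semantic_rejection_result("chat"))        # reject
print(semantic_rejection_result("navigation"))  # release
```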
  • By combining context information, the semantic space is expanded: the user's voice interaction information can be understood comprehensively, the semantic rejection capability is improved, and the ambiguity of unclear intent in weak-intent voice interaction information can be resolved.
  • Referring to FIG. 3, a flowchart of the steps of another data processing method for voice interaction provided by an embodiment of the present invention is shown; the method may specifically include the following steps:
  • Step 301 Obtain sample voice interaction combination information, and determine sample combination intent label information corresponding to the sample voice interaction combination information;
  • the sample voice interaction combination information can be obtained, and the sample combination intention label information corresponding to the sample voice interaction combination information can be determined, so as to further perform model training.
  • the step of acquiring sample voice interaction combination information may include the following sub-steps:
  • Sub-step 11 obtaining sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information
  • sample voice interaction information and the sample intent domain information corresponding to the sample voice interaction information can be obtained.
  • massive data can be obtained, which can include the user voice interaction information and the intent domain information corresponding to the user voice interaction information.
  • Sub-step 12 from the sample voice interaction information, determine the seed voice interaction information, and determine the seed historical voice interaction information of the seed voice interaction information;
  • For each item of sample intent domain information, the seed voice interaction information can be determined from the corresponding sample voice interaction information, and the seed historical voice interaction information of the seed voice interaction information can be determined.
  • the high-frequency voice interaction information can be screened through the NLU domain classification results, and then a high-frequency voice interaction information base (ie, seed voice interaction information) for each domain (ie, sample intent domain information) can be established.
  • the seed voice interaction information base is generated in a supervised manner, which is efficient and labor-saving.
  • High-frequency voice interaction information can be obtained and then, through TF-IDF calculation, representative voice interaction information for each domain can be obtained, where Table 1 can be the established high-frequency voice interaction information database:
  • For each item of seed voice interaction information, its previous voice interaction information (such as the previous round of voice interaction information) may be used as the seed historical voice interaction information.
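  • The TF-IDF screening described above can be sketched by treating each domain's high-frequency utterances as one document and picking the utterance whose terms score highest; the function name, the toy data, and the smoothed-IDF formula are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_representatives(domain_utterances):
    """For each domain (document = all of its utterances), score terms by
    TF-IDF and pick the utterance whose terms score highest, as a stand-in
    for selecting 'representative voice interaction information'."""
    # Term frequencies per domain document.
    docs = {dom: Counter(w for u in us for w in u.split())
            for dom, us in domain_utterances.items()}
    n_docs = len(docs)
    df = Counter()  # document frequency: in how many domains a word occurs
    for tf in docs.values():
        df.update(tf.keys())

    def tfidf(dom, word):
        tf = docs[dom][word] / sum(docs[dom].values())
        idf = math.log(n_docs / df[word]) + 1.0  # smoothed IDF (assumed form)
        return tf * idf

    return {dom: max(us, key=lambda u: sum(tfidf(dom, w) for w in u.split()))
            for dom, us in domain_utterances.items()}

reps = tfidf_representatives({
    "navigation": ["open the navigation", "navigate to work", "open the map"],
    "weather": ["what is the weather", "weather tomorrow"],
})
print(reps)
```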
  • Sub-step 13 obtain sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
  • the sample voice interaction combination information can be obtained according to the seed voice interaction information and the seed historical voice interaction information.
  • After the voice interaction information database is obtained based on the seed voice interaction information, it can be represented by context-related text features.
  • The BERT model can be used to represent the seed voice interaction information and its corresponding seed historical voice interaction information, which can be characterized in the following way:
  • Seed historical voice interaction information and seed voice interaction information are merged:
  • The BERT feature representation is feature_matrix_size[q_length, 768]:
  • the k-means method can be used to cluster the features, and an appropriate K value can be selected.
  • the K value can be automatically selected, and the K value can be the number of fields involved in the contextual voice interaction information.
  • For example, the K value can be set to 3.
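  • The clustering step can be sketched with a plain k-means over toy 2-D points (the real features would be 768-dimensional BERT vectors); the farthest-point initialization is an assumption made so the sketch is deterministic:

```python
def kmeans(points, k, iters=10):
    """Plain k-means with farthest-point initialization, standing in for
    clustering the BERT feature vectors; in the scheme above, k is the
    number of domains involved in the contextual voice interaction data."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Farthest-point init: deterministic and spreads seeds across clusters.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: d2(p, centroids[i]))].append(p)
        # Recompute centroids; keep the old one if a cluster emptied out.
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]

    return [min(range(k), key=lambda i: d2(p, centroids[i])) for p in points]

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
labels = kmeans(pts, k=3)
print(labels)  # [0, 0, 2, 2, 1, 1]: each pair of nearby points shares a cluster
```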
  • the step of determining the sample combination intent label information corresponding to the sample voice interaction combination information may include the following sub-steps:
  • Sub-step 21 using the sample voice interaction combination information to generate a label matching result
  • The sample voice interaction combination information can be used to generate a label matching result; the label matching result can be obtained according to the clustering result, such as a pseudo-label mark.
  • $i can be used to represent the clustering result, where i ∈ [0, n] and n is the number of categories; the pseudo-label can then be obtained, which can be expressed as follows:
  • Sub-step 22 according to the label matching result, determine the sample combination intention label information.
  • The sample combination intent label information can be determined according to the label matching result, so that, by using the sample voice interaction combination information to determine the sample combination intent label information, massive context-related voice interaction information can be pre-labeled.
  • The clustering result can be set as a pseudo-label, and TF-IDF can then be used to calculate a score against the seed voice interaction information database, from which the real label (i.e., the sample combination intent label information) can be obtained.
  • The seed historical voice interaction information and the seed voice interaction information can be combined to obtain the sample voice interaction combination information; the TF-IDF value between the sample voice interaction combination information and each domain in the seed voice interaction information database (such as the classes in Table 1) can then be calculated, and the most relevant real label can be obtained from the pseudo-label.
  • In this way, the clustered contextual voice interaction information features (i.e., the sample voice interaction combination information) and the pseudo-labels (i.e., the label matching results) can be obtained.
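  • The pseudo-label-to-real-label step can be sketched as follows; a simple word-overlap score stands in for the TF-IDF score against the seed database, and all names and data are illustrative:

```python
from collections import Counter

def real_labels_from_pseudo(cluster_texts, seed_db):
    """Map each pseudo-label (cluster id) to the most relevant real domain
    label by scoring the cluster's combined text against each domain's seed
    phrases. Word overlap stands in for the TF-IDF score described above."""
    seed_words = {dom: Counter(w for phrase in phrases for w in phrase.split())
                  for dom, phrases in seed_db.items()}

    def score(text, dom):
        # Sum seed-phrase word counts over the cluster text's words.
        return sum(seed_words[dom][w] for w in text.split())

    return {cid: max(seed_db, key=lambda dom: score(text, dom))
            for cid, text in cluster_texts.items()}

seed_db = {"navigation": ["open the navigation", "navigate home"],
           "chat": ["tell me a joke", "let's chat"]}
mapping = real_labels_from_pseudo(
    {0: "open the map and open the navigation I'm hungry",
     1: "tell me a joke that is funny"},
    seed_db)
print(mapping)  # {0: 'navigation', 1: 'chat'}
```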
  • Step 302 Perform model training according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model
  • model training can be performed according to the sample voice interaction combination information and the sample combination intent label information, and then an intent classification model can be obtained.
  • For example, a BERT classification model can be trained, although other models can also be used, which is not limited in the present invention. After the intent classification model is obtained, it can be tested with test data to obtain the real intent classification result (namely, the target intent domain) of the weak-intent voice interaction information in relation to its context.
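  • As a lightweight, hypothetical stand-in for the trained classifier, a nearest-centroid model over feature vectors illustrates the train/predict interface (the names and toy 2-D features are invented; a real system would fine-tune BERT on the labeled combination features):

```python
def train_nearest_centroid(samples):
    """Train a nearest-centroid classifier: `samples` maps each combination
    intent label to a list of feature vectors (standing in for the sample
    voice interaction combination information)."""
    return {label: tuple(sum(xs) / len(xs) for xs in zip(*vecs))
            for label, vecs in samples.items()}

def predict(model, vec):
    """Predict the intent label whose centroid is nearest to `vec`."""
    return min(model, key=lambda lb: sum((a - b) ** 2 for a, b in zip(vec, model[lb])))

model = train_nearest_centroid({
    "navigation": [(1.0, 0.0), (0.9, 0.1)],
    "chat": [(0.0, 1.0), (0.1, 0.9)],
})
print(predict(model, (0.8, 0.2)))  # navigation
```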
  • Step 303 when a voice interaction event is detected, determine the target voice interaction information of the voice interaction event;
  • Step 304 judging whether the target voice interaction information is voice interaction information with weak intent
  • Step 305 when determining that the target voice interaction information is voice interaction information with weak intent, determine historical voice interaction information
  • Step 306 inputting the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model
  • Step 307 Receive the intent prediction result output by the intent classification model, and determine the target intent field for the voice interaction event according to the intent prediction result;
  • Step 308 Determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
  • The BERT features of the weak-intent query and its context (that is, the seed voice interaction information and the seed historical voice interaction information) and the clustering results (that is, the sample voice interaction combination information) are obtained; the intent classification model is then used to predict and obtain the classification label of the weak-intent query (i.e., the intent prediction result).
  • Referring to FIG. 5, a schematic structural diagram of a data processing device for voice interaction provided by an embodiment of the present invention is shown; the device may specifically include the following modules:
  • a historical voice interaction information determination module 501 configured to determine historical voice interaction information when a voice interaction event is detected
  • a target intent domain determination module 502 configured to determine a target intent domain for the voice interaction event according to the historical voice interaction information
  • the voice interaction event response module 503 is configured to determine a semantic rejection result according to the target intent field, and respond to the voice interaction event according to the semantic rejection result.
  • it also includes:
  • a target voice interaction information determining module configured to determine the target voice interaction information of the voice interaction event
  • a judgment module for judging whether the target voice interaction information is voice interaction information with weak intent
  • the determining module is configured to call the historical voice interaction information determining module 501 when determining that the target voice interaction information is voice interaction information with weak intent.
  • the target intent domain determination module 502 includes:
  • a model input submodule used for inputting the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model
  • the model output sub-module is configured to receive the intent prediction result output by the intent classification model, and determine the target intent field for the voice interaction event according to the intent prediction result.
  • it also includes:
  • a module for acquiring sample voice interaction combination information and sample combination intent label information configured to acquire sample voice interaction combination information, and determine sample combination intent label information corresponding to the sample voice interaction combination information
  • the intent classification model obtaining module is configured to perform model training according to the sample voice interaction combination information and the sample combination intent label information to obtain an intent classification model.
  • the acquisition module for the sample voice interaction combination information and the sample combination intention label information includes:
  • a submodule for acquiring sample voice interaction information and sample intent domain information which is used to obtain sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information
  • a submodule for determining the seed voice interaction information for determining the seed voice interaction information from the sample voice interaction information, and determining the seed historical voice interaction information of the seed voice interaction information
  • the sample voice interaction combination information obtaining sub-module is configured to obtain sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
  • the acquisition module for the sample voice interaction combination information and the sample combination intention label information further includes:
  • a label matching result generating submodule is used to generate a label matching result by using the sample voice interaction combination information
  • the sample combination intention label information determination sub-module is configured to determine the sample combination intention label information according to the label matching result.
  • the voice interaction event response module 503 includes:
  • a recognition rejection processing result determination submodule configured to determine the semantic rejection result as a rejection processing result when the target intent field is a designated intention field;
  • the non-rejection processing result determination sub-module is configured to determine the semantic rejection result as a non-rejection processing result when the target intent domain is a non-designated intent domain.
  • To sum up, when a voice interaction event is detected, historical voice interaction information is determined; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This realizes optimized rejection processing for weak-intent conversations: by determining the target intent domain from the historical voice interaction information and then determining the semantic rejection result based on the target intent domain, the user's voice interaction information can be understood comprehensively, the problem of choosing whether to release or reject a weak-intent conversation is solved, and the semantic rejection capability is improved.
  • An embodiment of the present invention also provides a server, which may include a processor, a memory, and a computer program stored in the memory and capable of running on the processor; when the computer program is executed by the processor, the above data processing method for voice interaction is implemented.
  • An embodiment of the present invention also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above data processing method for voice interaction is implemented.
  • embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
  • Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal equipment create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present invention provide a data processing method and apparatus for voice interaction. The method includes: when a voice interaction event is detected, determining historical voice interaction information; determining, according to the historical voice interaction information, a target intent domain for the voice interaction event; determining a semantic rejection result according to the target intent domain, and responding to the voice interaction event according to the semantic rejection result. The embodiments of the present invention optimize rejection handling for weak-intent utterances: by determining the target intent domain from historical voice interaction information and then deriving the semantic rejection result from that domain, the user's voice interaction information can be understood comprehensively, the choice between releasing and rejecting a weak-intent utterance is resolved, and the semantic rejection capability is improved.

Description

Data processing method and apparatus for voice interaction
The present invention claims priority to the Chinese patent application filed with the China Patent Office on December 23, 2020, with application number 202011541411.3 and invention title "Data processing method and apparatus for voice interaction", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of data processing, and in particular to a data processing method and apparatus for voice interaction.
Background
At present, when a user interacts with an in-vehicle system by voice in a continuous-listening scenario, the user may speak tersely (e.g. just "23 degrees"; such an utterance is commonly called a weak intent). Because the instruction in a weak-intent utterance received by the in-vehicle system is unclear and carries no context, the system cannot judge the intent, so the weak-intent utterance is rejected; in some cases, however, this produces false interception. How to understand the user's query (voice interaction information) comprehensively, so as to choose between releasing and rejecting a weak-intent utterance, is a problem that urgently needs to be solved.
Summary of the Invention
In view of the above problems, a data processing method and apparatus for voice interaction are proposed to overcome, or at least partially solve, the above problems, including:
A data processing method for voice interaction, the method comprising:
when a voice interaction event is detected, determining historical voice interaction information;
determining, according to the historical voice interaction information, a target intent domain for the voice interaction event;
determining a semantic rejection result according to the target intent domain, and responding to the voice interaction event according to the semantic rejection result.
Optionally, before the determining historical voice interaction information, the method further includes:
determining target voice interaction information of the voice interaction event;
judging whether the target voice interaction information is weak-intent voice interaction information;
when the target voice interaction information is judged to be weak-intent voice interaction information, performing the determining historical voice interaction information.
Optionally, the determining, according to the historical voice interaction information, a target intent domain for the voice interaction event includes:
inputting the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model;
receiving an intent prediction result output by the intent classification model, and determining the target intent domain for the voice interaction event according to the intent prediction result.
Optionally, the method further includes:
acquiring sample voice interaction combination information, and determining sample combination intent label information corresponding to the sample voice interaction combination information;
performing model training according to the sample voice interaction combination information and the sample combination intent label information to obtain an intent classification model.
Optionally, the acquiring sample voice interaction combination information includes:
acquiring sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information;
determining seed voice interaction information from the sample voice interaction information, and determining seed historical voice interaction information of the seed voice interaction information;
obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
Optionally, the determining sample combination intent label information corresponding to the sample voice interaction combination information includes:
generating a label matching result using the sample voice interaction combination information;
determining sample combination intent label information according to the label matching result.
Optionally, the determining a semantic rejection result according to the target intent domain includes:
when the target intent domain is a designated intent domain, determining that the semantic rejection result is a rejection processing result;
when the target intent domain is a non-designated intent domain, determining that the semantic rejection result is a non-rejection processing result.
A data processing apparatus for voice interaction, the apparatus comprising:
a historical voice interaction information determination module, configured to determine historical voice interaction information when a voice interaction event is detected;
a target intent domain determination module, configured to determine, according to the historical voice interaction information, a target intent domain for the voice interaction event;
a voice interaction event response module, configured to determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
A server, including a processor, a memory, and a computer program stored in the memory and executable on the processor, the computer program implementing the above data processing method for voice interaction when executed by the processor.
A computer-readable storage medium storing a computer program, the computer program implementing the above data processing method for voice interaction when executed by a processor.
Embodiments of the present invention have the following advantages:
In the embodiments of the present invention, historical voice interaction information is determined when a voice interaction event is detected; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This optimizes rejection handling for weak-intent utterances: by determining the target intent domain from historical voice interaction information and then deriving the semantic rejection result from that domain, the user's voice interaction information can be understood comprehensively, the choice between releasing and rejecting a weak-intent utterance is resolved, and the semantic rejection capability is improved.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the contents of the specification, and in order to make the above and other objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the steps of a data processing method for voice interaction according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of another data processing method for voice interaction according to an embodiment of the present invention;
Fig. 3 is a flowchart of the steps of another data processing method for voice interaction according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a semantic rejection technical architecture according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data processing apparatus for voice interaction according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, a flowchart of the steps of a data processing method for voice interaction according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 101: when a voice interaction event is detected, determine historical voice interaction information.
The voice interaction event may be a voice interaction operation triggered by the user. For example, the user may issue voice interaction information through a voice interaction operation, and the in-vehicle system may then perform voice interaction processing on that information.
While the user interacts with the in-vehicle system by voice, historical voice interaction information may be determined when a voice interaction event is detected, so that voice interaction processing for the event can then be performed based on that historical information.
In one example, the prior voice interaction information of a detected voice interaction event may serve as the historical voice interaction information. For example, in a continuous-listening scenario, when a voice interaction event is detected, the historical voice interaction information may be obtained from the voice interaction information of earlier conversation turns.
Step 102: determine, according to the historical voice interaction information, a target intent domain for the voice interaction event.
After the historical voice interaction information is obtained, the target intent domain for the voice interaction event may be determined from it. The target intent domain may be the user intent domain for the voice interaction event; for example, NLU (natural language understanding) may have multiple classification domains, each representing a different user intent domain.
In one example, multiple intent domains may be preset through NLU domain classification. Each intent domain may have a domain name, such as air conditioning, chat, or weather, representing the user intents of controlling the air conditioner, chatting, or querying the weather; other intent domains are possible, and the present invention places no limitation on this.
Step 103: determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
After the target intent domain is obtained, the semantic rejection result may be determined from it, and the voice interaction event may then be responded to according to that result; that is, release or rejection processing may be chosen for the voice interaction event based on the target intent domain.
In the embodiments of the present invention, historical voice interaction information is determined when a voice interaction event is detected; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This optimizes rejection handling for weak-intent utterances: by determining the target intent domain from historical voice interaction information and then deriving the semantic rejection result from that domain, the user's voice interaction information can be understood comprehensively, the choice between releasing and rejecting a weak-intent utterance is resolved, and the semantic rejection capability is improved.
Referring to Fig. 2, a flowchart of the steps of another data processing method for voice interaction according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 201: when a voice interaction event is detected, determine target voice interaction information of the voice interaction event.
While the user interacts with the in-vehicle system by voice, the target voice interaction information of a detected voice interaction event may be determined. The target voice interaction information may be the user's voice interaction information for the event, such as the user utterance of the current conversation turn received when the event is detected.
Step 202: judge whether the target voice interaction information is weak-intent voice interaction information.
In practice, by judging whether the target voice interaction information is weak-intent voice interaction information, release or rejection processing can then be chosen specifically for weak-intent utterances.
For example, in a single-turn conversation between the user and the in-vehicle system, specific instruction information such as a wake word may be set; when the wake word is detected, it can be judged that the user is issuing an instruction to the interaction device. Since the instruction is clear, the corresponding operation can be performed; e.g. when the voice interaction information is "wake word, 23 degrees", the operation of setting the air conditioner to 23 degrees can be performed. In a continuous-listening scenario, however, the user may speak tersely, e.g. just "23 degrees"; such an utterance may be called a weak intent, which carries neither specific instruction information nor context that directly expresses the user's intent.
In one example, judgment conditions for weak-intent utterances may be preset, so that once the target voice interaction information is obtained, whether it is weak-intent voice interaction information can be judged; when it is, the subsequent semantic rejection judgment is performed to choose whether to release or reject the weak-intent voice interaction information.
Step 203: when the target voice interaction information is judged to be weak-intent voice interaction information, determine historical voice interaction information.
In a specific implementation, when the target voice interaction information is judged to be weak-intent voice interaction information, the historical voice interaction information may be determined according to the target voice interaction information.
In one example, when the target voice interaction information is weak-intent voice interaction information, the voice interaction information of historical conversation turns may be obtained for the current turn's target voice interaction information; for example, the voice interaction information of the previous conversation turn may serve as its historical voice interaction information.
Step 204: input the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model.
After the target voice interaction information and historical voice interaction information are obtained, they may be merged, and the merged voice interaction information may then be input into a pre-trained intent classification model for intent classification prediction.
Step 205: receive an intent prediction result output by the intent classification model, and determine the target intent domain for the voice interaction event according to the intent prediction result.
After the intent classification model finishes processing, the intent prediction result it outputs may be received, and the target intent domain for the voice interaction event may be determined from it.
In one example, weak-intent voice interaction information may be merged with its historical voice interaction information, and the intent classification model may then perform intent classification prediction on the merged information to obtain the model's intent prediction result. For example, the model may output a classification label result mapped to intent domains, so that the corresponding intent domain can be obtained from the output label.
Step 206: determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
In one example, for the weak-intent voice interaction information to be predicted, the context may be merged and the intent classification model used for classification prediction, yielding a classification label result for the weak-intent utterance; the voice interaction event can then be responded to based on that label, i.e. the weak-intent utterance can be released or rejected.
For example, the weak-intent voice interaction information (i.e. target voice interaction information) may be "full-scene voice", and the previous turn's utterance (i.e. historical voice interaction information) may be "XX (a vehicle model) has a lot of new features". After merging and classification prediction with the intent classification model, the output (i.e. intent prediction result) may be the classification label "chat", so the corresponding domain is the "chat" domain (i.e. target intent domain). Based on the "chat" domain it can be determined that rejection processing is required, since the user's intent is not to enable the full-scene voice feature, so the weak-intent utterance can be rejected.
As another example, the weak-intent voice interaction information may be "I'm hungry", and the previous turn's utterance may be "open the map and start navigation". After merging and classification prediction, the output may be the classification label "navigation", so the corresponding domain is the "navigation" domain. Based on the "navigation" domain it can be determined that non-rejection processing is required, since the user's intent is to be navigated to a nearby restaurant, so the weak-intent utterance can be released.
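The merge-then-classify flow of steps 204 and 205 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the trained BERT intent classifier is replaced by a hypothetical keyword lookup (`KEYWORD_DOMAINS` is invented for this example) so that only the control flow is shown.

```python
# Hypothetical stand-in for the trained intent classification model:
# maps keywords in the merged text to an intent domain.
KEYWORD_DOMAINS = {
    "navigation": "navigation",
    "map": "navigation",
    "new features": "chat",
}

def predict_domain(history: str, current: str) -> str:
    """Merge the previous-turn utterance with the current weak-intent
    utterance, then 'classify' the merged text by keyword lookup."""
    merged = f"{history}; {current}"  # context merging (step 204)
    for keyword, domain in KEYWORD_DOMAINS.items():
        if keyword in merged:
            return domain
    return "unknown"

# "I'm hungry" after "open the map and start navigation" -> navigation
print(predict_domain("open the map and start navigation", "I'm hungry"))
```

With a real model, the lookup would be replaced by encoding the merged text and taking the classifier's predicted label; the merging step is the part the patent emphasizes.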
In an embodiment of the present invention, step 206 may include the following sub-steps:
when the target intent domain is a designated intent domain, determining that the semantic rejection result is a rejection processing result; when the target intent domain is a non-designated intent domain, determining that the semantic rejection result is a non-rejection processing result.
As an example, the designated intent domain may be an intent domain preset for rejection processing. For instance, by presetting the "chat" domain as the designated intent domain, rejection processing can be performed when the target intent domain is detected to be the "chat" domain.
In practice, when the target intent domain is the designated intent domain, the semantic rejection result may be determined to be a rejection processing result; when the target intent domain is a non-designated intent domain, the semantic rejection result may be determined to be a non-rejection processing result. For example, the "chat" domain may be preset as the designated intent domain: when the target intent domain is detected to be the "chat" domain, rejection processing is performed; when it is not the "chat" domain, release processing is performed.
By merging the target voice interaction information with the historical voice interaction information for judgment, context is incorporated and the semantic space is expanded, so the user's voice interaction information can be understood comprehensively, the semantic rejection capability is improved, and the problem of unclear intent in weak-intent utterances can be solved.
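The release/reject decision described above reduces to a simple mapping from the predicted intent domain to a processing result. A minimal sketch, assuming (following the example in the text) that only the "chat" domain is designated for rejection; the designated set is configurable in general:

```python
# Designated rejection domains; {"chat"} follows the example in the text
# and is an illustrative assumption, not a fixed choice of the patent.
REJECT_DOMAINS = {"chat"}

def semantic_rejection_result(target_domain: str) -> str:
    """Map a predicted intent domain to a semantic rejection result:
    'reject' for designated domains, 'release' (non-rejection) otherwise."""
    return "reject" if target_domain in REJECT_DOMAINS else "release"

print(semantic_rejection_result("chat"))        # reject
print(semantic_rejection_result("navigation"))  # release
```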
Referring to Fig. 3, a flowchart of the steps of another data processing method for voice interaction according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 301: acquire sample voice interaction combination information, and determine sample combination intent label information corresponding to the sample voice interaction combination information.
In a specific implementation, sample voice interaction combination information may be acquired and its corresponding sample combination intent label information determined, for subsequent model training.
In an embodiment of the present invention, the step of acquiring sample voice interaction combination information may include the following sub-steps:
Sub-step 11: acquire sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information.
In practice, sample voice interaction information and its corresponding sample intent domain information may be acquired; for example, massive data may be collected, including user voice interaction information and the intent domain information corresponding to it.
Sub-step 12: determine seed voice interaction information from the sample voice interaction information, and determine seed historical voice interaction information of the seed voice interaction information.
After the sample voice interaction information and sample intent domain information are acquired, since there may be multiple pieces of sample intent domain information, seed voice interaction information may be determined from the corresponding sample voice interaction information for each sample intent domain, and the seed historical voice interaction information of that seed voice interaction information may be determined.
Specifically, high-frequency voice interaction information may be filtered using the NLU domain classification results, and a high-frequency voice interaction information library (i.e. seed voice interaction information) may be built for each domain (i.e. sample intent domain information). Generating the seed voice interaction information library in an unsupervised manner is efficient and saves labor.
For example, high-frequency voice interaction information can be obtained through frequency statistics, and representative voice interaction information for each domain can then be obtained through TF-IDF computation. Table 1 shows an example of the resulting high-frequency (seed) voice interaction information library:
class    seed queries
ac       sixteen degrees, air conditioner, air outlet, one-way airflow, open the car window, fan level five, it's so hot...
chat     twenty-six degrees, it's so hot, a bit cold, it's windy...
weather  thirty degrees, sunny, windy...
...      ...
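The frequency-plus-TF-IDF filtering of sub-steps 11 and 12 can be sketched with standard-library tools only. The `corpus` below is toy data standing in for the massive per-domain query logs, with utterances loosely following Table 1; the score is a plain TF-IDF where document frequency is counted across domains, so queries shared by every domain drop out as non-representative:

```python
import math
from collections import Counter

# Toy stand-in for the per-domain query logs (illustrative data only).
corpus = {
    "ac":      ["sixteen degrees", "air conditioner", "sixteen degrees", "it's so hot"],
    "chat":    ["it's so hot", "a bit cold", "twenty-six degrees"],
    "weather": ["thirty degrees", "sunny", "windy"],
}

def build_seed_library(corpus):
    """Keep high-frequency queries per domain, ranked by a TF-IDF-style
    score: term frequency within the domain times log(inverse domain
    frequency). Queries occurring in every domain score zero and are dropped."""
    n_domains = len(corpus)
    counts = {d: Counter(qs) for d, qs in corpus.items()}
    seeds = {}
    for domain, counter in counts.items():
        scored = []
        for query, tf in counter.items():
            df = sum(query in c for c in counts.values())  # domains containing query
            score = tf * math.log(n_domains / df)
            if score > 0:
                scored.append((score, query))
        seeds[domain] = [q for _, q in sorted(scored, reverse=True)]
    return seeds

seeds = build_seed_library(corpus)
print(seeds["ac"][0])  # sixteen degrees (tf=2, unique to "ac")
```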
In one example, for the seed voice interaction information, its prior voice interaction information (such as the previous turn's utterance) may serve as the seed historical voice interaction information.
Sub-step 13: obtain sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
After the seed voice interaction information and seed historical voice interaction information are obtained, the sample voice interaction combination information may be obtained from them.
Specifically, after the library based on the seed voice interaction information is obtained, a context-dependent text feature representation may be computed; for example, a BERT model may be used to represent the seed voice interaction information together with its corresponding seed historical voice interaction information. The feature representation may proceed as follows:
Merge the seed historical voice interaction information and the seed voice interaction information:
a. "turn on the AC; twenty-three degrees" (seed historical voice interaction information; seed voice interaction information);
BERT feature representation feature_matrix_size[q_length, 768]:
a_feature_matrix[8, 768].
After merging the seed historical voice interaction information and the seed voice interaction information and computing the BERT feature representation, the features may be clustered with the k-means method, selecting an appropriate K value. For example, when clustering the BERT-represented contexts, the K value may be chosen automatically as the number of domains involved in the contextual voice interaction information.
In one example, during the clustering of BERT-represented contexts, if 4 pieces of seed voice interaction information involving 3 different domains are clustered, the K value may be set to 3.
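The clustering step above can be sketched as follows. The 768-dimensional BERT vectors are replaced by tiny hand-made 2-D vectors so the example stays self-contained, and K is set to the number of domains the merged contexts involve, as the text describes; the k-means itself is a plain standard-library implementation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over tuples of floats; returns the final clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:             # assign each point to the nearest center
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        new_centers = []
        for i, cluster in enumerate(clusters):  # recompute centroids
            if cluster:
                new_centers.append(tuple(sum(dim) / len(cluster)
                                         for dim in zip(*cluster)))
            else:
                new_centers.append(centers[i])  # keep center of empty cluster
        centers = new_centers
    return clusters

# Four merged contexts involving two domains -> K = 2, per the text's rule.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
clusters = kmeans(embeddings, k=2)
```

In practice the embeddings would come from a BERT encoder and a library implementation of k-means would be used; the point here is only the shape of the step.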
In an embodiment of the present invention, the step of determining sample combination intent label information corresponding to the sample voice interaction combination information may include the following sub-steps:
Sub-step 21: generate a label matching result using the sample voice interaction combination information.
After the sample voice interaction combination information is obtained, a label matching result may be generated from it; the label matching result may be a result obtained from the clustering, such as a pseudo-label marking.
For example, based on the clustering result, $i may denote cluster i, i ∈ [0, n], where n is the number of classes; the resulting pseudo-label marking may be represented as follows:
pseudo-label  $0                                    $1
              turn on the AC; twenty-three degrees  a bit cold; five degrees
              fan level three; sixteen degrees
Sub-step 22: determine sample combination intent label information according to the label matching result.
After the label matching result is obtained, the sample combination intent label information may be determined from it. Thus, by using the sample voice interaction combination information to determine the sample combination intent label information, massive context-dependent voice interaction information can be pre-annotated.
Specifically, the clustering results may be given pseudo-label markings, and TF-IDF computation may then be used to compute scores against the seed voice interaction information library, from which the real label markings (i.e. sample combination intent label information) are obtained.
In one example, using the seed voice interaction information library, the seed historical voice interaction information and seed voice interaction information may be merged into sample voice interaction combination information, whose TF-IDF value against each domain in the seed library (e.g. the class column in Table 1) may then be computed, so that the most relevant real label marking can be obtained from the pseudo-label marking.
For example, with the clustered contextual voice interaction features (i.e. sample voice interaction combination information) and pseudo-label markings (i.e. label matching results) as input, the matched real label results (i.e. sample combination intent label information) are output, which may be represented as follows:
real label  ac                                    chat
            turn on the AC; twenty-three degrees  a bit cold; five degrees
            fan level three; sixteen degrees
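The pseudo-label-to-real-label matching of sub-steps 21 and 22 can be sketched as follows. The per-domain seed sets and the token-overlap score are simplifications: the patent scores against the seed-query library with TF-IDF, while here a plain overlap count stands in, and the `seed_library` contents are illustrative:

```python
# Hypothetical per-domain seed token sets standing in for the seed-query
# library; overlap counting stands in for the TF-IDF scoring in the text.
seed_library = {
    "ac":   {"turn", "on", "ac", "degrees", "fan", "level"},
    "chat": {"bit", "cold", "hot", "degrees"},
}

def match_label(context_pair: str) -> str:
    """Assign a clustered 'history; current' pair the domain whose seed
    set it overlaps most, turning its pseudo-label into a real label."""
    tokens = set(context_pair.replace(";", " ").lower().split())
    return max(seed_library, key=lambda d: len(tokens & seed_library[d]))

print(match_label("turn on the AC; twenty-three degrees"))  # ac
print(match_label("a bit cold; five degrees"))              # chat
```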
Step 302: perform model training according to the sample voice interaction combination information and the sample combination intent label information to obtain an intent classification model.
In a specific implementation, model training may be performed with the sample voice interaction combination information and the sample combination intent label information, thereby obtaining the intent classification model.
For example, a BERT classification model may be trained with the sample voice interaction combination information and sample combination intent label information; other models may also be used, and the present invention places no limitation on this. After the intent classification model is obtained, it may be tested with test data, yielding the real intent classification result (i.e. target intent domain) of weak-intent voice interaction information in its context.
Step 303: when a voice interaction event is detected, determine target voice interaction information of the voice interaction event.
Step 304: judge whether the target voice interaction information is weak-intent voice interaction information.
Step 305: when the target voice interaction information is judged to be weak-intent voice interaction information, determine historical voice interaction information.
Step 306: input the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model.
Step 307: receive an intent prediction result output by the intent classification model, and determine the target intent domain for the voice interaction event according to the intent prediction result.
Step 308: determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
The following illustrates an embodiment of the present invention with reference to Fig. 4:
1. Massive queries and their corresponding domains (i.e. sample voice interaction information and its corresponding sample intent domain information);
2. Generate a weak-intent seed library;
3. Extract seed queries (i.e. seed voice interaction information) and their corresponding domains (i.e. sample intent domain information);
4. Context-dependent text feature representation and clustering;
5. BERT features of weak-intent queries and their contexts (i.e. seed voice interaction information and seed historical voice interaction information) with clustering results (i.e. sample voice interaction combination information);
6. Pseudo-label (i.e. label matching result) matching method and batch pre-annotation;
7. Real classes corresponding to weak-intent queries and their context features (i.e. sample combination intent label information);
8. BERT model (i.e. intent classification model) training;
9. For a query whose intent is to be predicted (i.e. target voice interaction information) and its context features (i.e. historical voice interaction information), prediction with the intent classification model yields the weak-intent query's classification label (i.e. intent prediction result).
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations; however, those skilled in the art should know that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 5, a schematic structural diagram of a data processing apparatus for voice interaction according to an embodiment of the present invention is shown, which may specifically include the following modules:
a historical voice interaction information determination module 501, configured to determine historical voice interaction information when a voice interaction event is detected;
a target intent domain determination module 502, configured to determine, according to the historical voice interaction information, a target intent domain for the voice interaction event;
a voice interaction event response module 503, configured to determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
In an embodiment of the present invention, the apparatus further includes:
a target voice interaction information determination module, configured to determine target voice interaction information of the voice interaction event;
a judgment module, configured to judge whether the target voice interaction information is weak-intent voice interaction information;
a determination module, configured to invoke the historical voice interaction information determination module 501 when the target voice interaction information is judged to be weak-intent voice interaction information.
In an embodiment of the present invention, the target intent domain determination module 502 includes:
a model input sub-module, configured to input the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model;
a model output sub-module, configured to receive an intent prediction result output by the intent classification model, and determine the target intent domain for the voice interaction event according to the intent prediction result.
In an embodiment of the present invention, the apparatus further includes:
a sample voice interaction combination information and sample combination intent label information acquisition module, configured to acquire sample voice interaction combination information and determine sample combination intent label information corresponding to the sample voice interaction combination information;
an intent classification model obtaining module, configured to perform model training according to the sample voice interaction combination information and the sample combination intent label information to obtain an intent classification model.
In an embodiment of the present invention, the sample voice interaction combination information and sample combination intent label information acquisition module includes:
a sample voice interaction information and sample intent domain information acquisition sub-module, configured to acquire sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information;
a seed voice interaction information determination sub-module, configured to determine seed voice interaction information from the sample voice interaction information, and determine seed historical voice interaction information of the seed voice interaction information;
a sample voice interaction combination information obtaining sub-module, configured to obtain sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
In an embodiment of the present invention, the sample voice interaction combination information and sample combination intent label information acquisition module further includes:
a label matching result generation sub-module, configured to generate a label matching result using the sample voice interaction combination information;
a sample combination intent label information determination sub-module, configured to determine sample combination intent label information according to the label matching result.
In an embodiment of the present invention, the voice interaction event response module 503 includes:
a rejection processing result determination sub-module, configured to determine that the semantic rejection result is a rejection processing result when the target intent domain is a designated intent domain;
a non-rejection processing result determination sub-module, configured to determine that the semantic rejection result is a non-rejection processing result when the target intent domain is a non-designated intent domain.
In the embodiments of the present invention, historical voice interaction information is determined when a voice interaction event is detected; a target intent domain for the voice interaction event is determined according to the historical voice interaction information; a semantic rejection result is determined according to the target intent domain, and the voice interaction event is responded to according to the semantic rejection result. This optimizes rejection handling for weak-intent utterances: by determining the target intent domain from historical voice interaction information and then deriving the semantic rejection result from that domain, the user's voice interaction information can be understood comprehensively, the choice between releasing and rejecting a weak-intent utterance is resolved, and the semantic rejection capability is improved.
An embodiment of the present invention also provides a server, which may include a processor, a memory, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements the above data processing method for voice interaction.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program; when executed by a processor, the computer program implements the above data processing method for voice interaction.
Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant parts, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts of the embodiments, reference may be made to one another.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal equipment create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operational steps are performed on the computer or other programmable terminal equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", or any other variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or terminal device that includes the element.
The data processing method and apparatus for voice interaction provided above have been introduced in detail. Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A data processing method for voice interaction, wherein the method comprises:
    when a voice interaction event is detected, determining historical voice interaction information;
    determining, according to the historical voice interaction information, a target intent domain for the voice interaction event;
    determining a semantic rejection result according to the target intent domain, and responding to the voice interaction event according to the semantic rejection result.
  2. The method according to claim 1, wherein before the determining historical voice interaction information, the method further comprises:
    determining target voice interaction information of the voice interaction event;
    judging whether the target voice interaction information is weak-intent voice interaction information;
    when the target voice interaction information is judged to be weak-intent voice interaction information, performing the determining historical voice interaction information.
  3. The method according to claim 2, wherein the determining, according to the historical voice interaction information, a target intent domain for the voice interaction event comprises:
    inputting the target voice interaction information and the historical voice interaction information into a pre-trained intent classification model;
    receiving an intent prediction result output by the intent classification model, and determining the target intent domain for the voice interaction event according to the intent prediction result.
  4. The method according to claim 3, further comprising:
    acquiring sample voice interaction combination information, and determining sample combination intent label information corresponding to the sample voice interaction combination information;
    performing model training according to the sample voice interaction combination information and the sample combination intent label information to obtain an intent classification model.
  5. The method according to claim 4, wherein the acquiring sample voice interaction combination information comprises:
    acquiring sample voice interaction information and sample intent domain information corresponding to the sample voice interaction information;
    determining seed voice interaction information from the sample voice interaction information, and determining seed historical voice interaction information of the seed voice interaction information;
    obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
  6. The method according to claim 4 or 5, wherein the determining sample combination intent label information corresponding to the sample voice interaction combination information comprises:
    generating a label matching result using the sample voice interaction combination information;
    determining sample combination intent label information according to the label matching result.
  7. The method according to claim 1, 2, or 3, wherein the determining a semantic rejection result according to the target intent domain comprises:
    when the target intent domain is a designated intent domain, determining that the semantic rejection result is a rejection processing result;
    when the target intent domain is a non-designated intent domain, determining that the semantic rejection result is a non-rejection processing result.
  8. A data processing apparatus for voice interaction, wherein the apparatus comprises:
    a historical voice interaction information determination module, configured to determine historical voice interaction information when a voice interaction event is detected;
    a target intent domain determination module, configured to determine, according to the historical voice interaction information, a target intent domain for the voice interaction event;
    a voice interaction event response module, configured to determine a semantic rejection result according to the target intent domain, and respond to the voice interaction event according to the semantic rejection result.
  9. A server, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the data processing method for voice interaction according to any one of claims 1 to 7.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the data processing method for voice interaction according to any one of claims 1 to 7.
PCT/CN2021/140591 2020-12-23 2021-12-22 Data processing method and apparatus for voice interaction WO2022135496A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011541411.3 2020-12-23
CN202011541411.3A CN112667076A (zh) 2020-12-23 2020-12-23 Data processing method and apparatus for voice interaction

Publications (1)

Publication Number Publication Date
WO2022135496A1 true WO2022135496A1 (zh) 2022-06-30

Family

ID=75409121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140591 WO2022135496A1 (zh) 2020-12-23 2021-12-22 一种语音交互的数据处理方法和装置

Country Status (2)

Country Link
CN (1) CN112667076A (zh)
WO (1) WO2022135496A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667076A (zh) * 2020-12-23 2021-04-16 Guangzhou Chengxing Zhidong Motors Technology Co., Ltd. Data processing method and apparatus for voice interaction
CN113221580B (zh) * 2021-07-08 2021-10-12 Guangzhou Xiaopeng Motors Technology Co., Ltd. Semantic rejection method, semantic rejection apparatus, vehicle, and medium
CN113470649A (zh) * 2021-08-18 2021-10-01 Samsung Electronics (China) R&D Center Voice interaction method and apparatus
CN115910035B (zh) * 2023-03-01 2023-06-30 Guangzhou Xiaopeng Motors Technology Co., Ltd. Voice interaction method, server, and computer-readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509619A (zh) * 2018-04-04 2018-09-07 iFlytek Co., Ltd. Voice interaction method and device
CN109326289A (zh) * 2018-11-30 2019-02-12 Shenzhen Skyworth Digital Technology Co., Ltd. Wake-free voice interaction method, apparatus, device, and storage medium
CN110807333A (zh) * 2019-10-30 2020-02-18 Tencent Technology (Shenzhen) Co., Ltd. Semantic processing method and apparatus for a semantic understanding model, and storage medium
WO2020155766A1 (zh) * 2019-01-31 2020-08-06 Ping An Technology (Shenzhen) Co., Ltd. Rejection method, apparatus, device, and storage medium in intent recognition
CN111583919A (zh) * 2020-04-15 2020-08-25 Beijing Xiaomi Pinecone Electronics Co., Ltd. Information processing method, apparatus, and storage medium
CN112382291A (zh) * 2020-11-23 2021-02-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Voice interaction processing method and apparatus, electronic device, and storage medium
CN112667076A (zh) * 2020-12-23 2021-04-16 Guangzhou Chengxing Zhidong Motors Technology Co., Ltd. Data processing method and apparatus for voice interaction


Also Published As

Publication number Publication date
CN112667076A (zh) 2021-04-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21909491

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21909491

Country of ref document: EP

Kind code of ref document: A1