CN112667076A - Voice interaction data processing method and device - Google Patents


Info

Publication number
CN112667076A
Authority
CN
China
Prior art keywords
voice interaction
information
intention
determining
interaction information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011541411.3A
Other languages
Chinese (zh)
Inventor
韩传宇
易晖
翁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd, Guangzhou Chengxingzhidong Automotive Technology Co., Ltd filed Critical Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202011541411.3A priority Critical patent/CN112667076A/en
Publication of CN112667076A publication Critical patent/CN112667076A/en
Priority to PCT/CN2021/140591 priority patent/WO2022135496A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G06F 40/35 - Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention provides a voice interaction data processing method and device. The method comprises the following steps: when a voice interaction event is detected, determining historical voice interaction information; determining a target intention field for the voice interaction event according to the historical voice interaction information; and determining a semantic rejection result according to the target intention field and responding to the voice interaction event according to the semantic rejection result. The embodiment of the invention optimizes rejection processing for weak-intent sessions: the target intention field is determined from the historical voice interaction information, and the semantic rejection result is then determined from the target intention field, so that the user's voice interaction information can be understood comprehensively, the choice between passing and rejecting a weak-intent session is resolved, and the semantic rejection capability is improved.

Description

Voice interaction data processing method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for processing voice interaction data.
Background
At present, during voice interaction between a user and a car machine in a continuous-listening scenario, a user may speak only a brief utterance (for example, "23 degrees"; such an utterance is generally called a weak intent). Because the command the car machine receives in a weak-intent session is unclear, carries no context information, and does not allow an intent judgment, the weak-intent session is rejected; in some cases, however, this produces false interceptions. How to comprehensively understand a user query (voice interaction information) so as to choose whether to pass or reject a weak-intent session is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, a voice interaction data processing method and apparatus are proposed that overcome, or at least partially solve, the above-mentioned problems, comprising:
a method of data processing for voice interaction, the method comprising:
when a voice interaction event is detected, determining historical voice interaction information;
determining a target intention field for the voice interaction event according to the historical voice interaction information;
and determining a semantic rejection result according to the target intention field, and responding to the voice interaction event according to the semantic rejection result.
Optionally, before the determining the historical voice interaction information, the method further includes:
determining target voice interaction information of the voice interaction event;
judging whether the target voice interaction information is weak-intent voice interaction information;
and executing the step of determining the historical voice interaction information when the target voice interaction information is judged to be weak-intent voice interaction information.
Optionally, the determining a target intention field for the voice interaction event according to the historical voice interaction information includes:
inputting the target voice interaction information and the historical voice interaction information into a pre-trained intention classification model;
and receiving an intention prediction result output by the intention classification model, and determining a target intention field for the voice interaction event according to the intention prediction result.
Optionally, the method further comprises:
acquiring sample voice interaction combination information, and determining sample combination intention label information corresponding to the sample voice interaction combination information;
and performing model training according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model.
Optionally, the obtaining of the sample voice interaction combination information includes:
acquiring sample voice interaction information and sample intention field information corresponding to the sample voice interaction information;
determining seed voice interaction information from the sample voice interaction information, and determining seed historical voice interaction information of the seed voice interaction information;
and obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
Optionally, the determining sample combination intention tag information corresponding to the sample voice interaction combination information includes:
generating a label matching result by adopting the sample voice interaction combination information;
and determining the sample combination intention label information according to the label matching result.
Optionally, the determining a semantic rejection result according to the target intention field includes:
when the target intention field is a designated intention field, determining that the semantic rejection result is a rejection processing result;
and when the target intention field is a non-designated intention field, determining that the semantic rejection result is a non-rejection processing result.
A voice-interactive data processing apparatus, the apparatus comprising:
the historical voice interaction information determining module is used for determining historical voice interaction information when a voice interaction event is detected;
the target intention field determining module is used for determining a target intention field for the voice interaction event according to the historical voice interaction information;
and the voice interaction event response module is used for determining a semantic rejection result according to the target intention field and responding to the voice interaction event according to the semantic rejection result.
A server comprising a processor, a memory, and a computer program stored on the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the voice interaction data processing method described above.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voice interaction data processing method described above.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, when a voice interaction event is detected, historical voice interaction information is determined; a target intention field for the voice interaction event is determined according to the historical voice interaction information; and a semantic rejection result is determined according to the target intention field, with the voice interaction event answered according to that result. This realizes optimized rejection processing for weak-intent sessions: the target intention field is determined from the historical voice interaction information, and the semantic rejection result is then determined from the target intention field, so that the user's voice interaction information can be understood comprehensively, the choice between passing and rejecting a weak-intent session is resolved, and the semantic rejection capability is improved.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the description of the present invention are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating steps of a method for processing voice interaction data according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating steps of another method for processing voice-interactive data according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating steps of another method for processing voice-interactive data according to an embodiment of the present invention;
FIG. 4 is a diagram of a semantic rejection technique architecture provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a voice interaction data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart illustrating steps of a voice interaction data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, when a voice interaction event is detected, determining historical voice interaction information;
the voice interaction event may be a voice interaction operation triggered by a user, for example, the user may send out voice interaction information through the voice interaction operation, and then the vehicle-mounted system may perform voice interaction processing on the voice interaction information.
In the process of voice interaction between a user and the car machine, when a voice interaction event is detected, the historical voice interaction information can be determined, and the voice interaction event can then be processed according to that historical voice interaction information.
In an example, the previous voice interaction information of the detected voice interaction event may be used as the historical voice interaction information, for example, in a continuous listening scenario, the historical voice interaction information may be obtained according to the voice interaction information of the previous turn session when the voice interaction event is detected.
Step 102, determining a target intention field aiming at the voice interaction event according to the historical voice interaction information;
after obtaining the historical voice interaction information, a target intent field for the voice interaction event may be determined based on the historical voice interaction information, the target intent field may be a user intent field for the voice interaction event, such as for NLU natural language understanding, and may have multiple classification fields (domains) that may characterize different user intent fields.
In an example, multiple intention fields may be preset through the field classification used by NLU, and each intention field may have a field name; for example, the field name may be "air conditioner", "chat", or "weather", representing a user intention to control the air conditioner, chat, or query the weather. Other intention fields are also possible, which the present invention does not limit.
And 103, determining a semantic rejection result according to the target intention field, and responding to the voice interaction event according to the semantic rejection result.
After the target intention field is obtained, the semantic rejection result can be determined from it, and the voice interaction event can then be answered according to the semantic rejection result; that is, pass processing or rejection processing can be selected for the voice interaction event based on the target intention field.
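Steps 101 to 103 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a keyword rule stands in for the trained intention classification model described later, and the set of designated rejection fields is an assumed configuration.

```python
# End-to-end sketch of steps 101-103 (illustrative only; a real system
# would call a trained intention classification model, not this stub).

def get_history(session):
    """Step 101: take the previous-turn query as the historical information."""
    return session[-1] if session else ""

def predict_intent_field(history, query):
    """Step 102 stub: a keyword rule stands in for the trained classifier."""
    merged = f"{history}; {query}"  # merge history and current query
    return "navigation" if "navigat" in merged or "map" in merged else "chat"

def handle_event(session, query, reject_fields=("chat",)):
    """Step 103: reject when the predicted field is a designated field."""
    field = predict_intent_field(get_history(session), query)
    return "reject" if field in reject_fields else "pass"
```

With the example dialogues from the description, a weak-intent query following a navigation turn is passed, while one with no supporting context falls into the designated "chat" field and is rejected.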
In the embodiment of the invention, when a voice interaction event is detected, historical voice interaction information is determined; a target intention field for the voice interaction event is determined according to the historical voice interaction information; and a semantic rejection result is determined according to the target intention field, with the voice interaction event answered according to that result. This realizes optimized rejection processing for weak-intent sessions: the target intention field is determined from the historical voice interaction information, and the semantic rejection result is then determined from the target intention field, so that the user's voice interaction information can be understood comprehensively, the choice between passing and rejecting a weak-intent session is resolved, and the semantic rejection capability is improved.
Referring to fig. 2, a flowchart illustrating steps of another voice interaction data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 201, when a voice interaction event is detected, determining target voice interaction information of the voice interaction event;
in the process of voice interaction between a user and a vehicle machine, when a voice interaction event is detected, target voice interaction information of the voice interaction event can be determined, and the target voice interaction information can be user voice interaction information aiming at the voice interaction event, such as user voice interaction information of a current turn of conversation received when the voice interaction event is detected.
Step 202, judging whether the target voice interaction information is weak voice interaction information;
in practical applications, by judging whether the target voice interaction information is weak-intent voice interaction information, pass or rejection processing can then be selected for the weak-intent voice interaction information.
For example, in a single-round conversation between a user and the car machine, specific instruction information such as a wake-up word can be set, and when the wake-up word is detected it can be determined that the user is addressing the interactive device; since the instruction information is clear, the corresponding operation can be executed. If the voice interaction information is "wake-up word, 23 degrees", the operation of setting the air conditioner to 23 degrees can be executed. In the continuous-listening scenario, however, if the user speaks only a short utterance such as "23 degrees", such language may be called a weak intent: the session contains no specific instruction information, nor context information that directly characterizes the user's intent.
In an example, a judgment condition for weak-intent sessions can be preset. After the target voice interaction information is obtained, it is judged whether it is weak-intent voice interaction information; when it is, the subsequent semantic rejection judgment is performed to choose whether to pass or reject the weak-intent voice interaction information.
Step 203, determining historical voice interaction information when the target voice interaction information is judged to be the voice interaction information with weak intentions;
in a specific implementation, when the target voice interaction information is judged to be the weak-intention voice interaction information, the historical voice interaction information can be determined according to the target voice interaction information.
In an example, in a case that the target voice interaction information is weak-intention voice interaction information, the voice interaction information of the historical conversation turn may be acquired for the target voice interaction information of the current conversation turn, for example, the voice interaction information of the previous conversation turn may be taken as the historical voice interaction information of the current conversation turn.
Step 204, inputting the target voice interaction information and the historical voice interaction information into a pre-trained intention classification model;
after the target voice interaction information and the historical voice interaction information are obtained, merging processing can be performed on the target voice interaction information and the historical voice interaction information, and then the merged voice interaction information can be input into a pre-trained intention classification model to perform intention classification prediction.
Step 205, receiving an intention prediction result output by the intention classification model, and determining a target intention field aiming at the voice interaction event according to the intention prediction result;
after the intention classification model is processed, an intention prediction result output by the intention classification model can be received, and a target intention field aiming at the voice interaction event can be determined according to the intention prediction result.
In an example, for voice interaction information of a weak intention, the voice interaction information of the weak intention may be merged with historical voice interaction information thereof, and then an intention classification model may be adopted to perform intention classification prediction on the merged voice interaction information, and an intention prediction result output by the model may be obtained.
And step 206, determining a semantic rejection result according to the target intention field, and responding to the voice interaction event according to the semantic rejection result.
In an example, the weak-intent voice interaction information to be predicted can be classified by combining it with its context and applying the intention classification model, yielding a classification label result for the weak-intent voice interaction information; the voice interaction event can then be answered based on that classification label result, that is, the weak-intent voice interaction information can be passed or rejected.
For example, the weak-intent voice interaction information (i.e., the target voice interaction information) may be "full-scene voice", and the previous-turn voice interaction information (i.e., the historical voice interaction information) may be "what new functions does XX (a certain vehicle model) have". After merging, classification prediction is performed with the intention classification model, and the output result (i.e., the intention prediction result) may be the classification label "chat" (chit-chat), so the corresponding field can be determined to be the "chat" field (i.e., the target intention field). Based on the "chat" field it can then be determined that rejection processing is required, that is, the user does not intend to turn on the full-scene voice function, and the weak-intent voice interaction information can be rejected.
For another example, the weak-intent voice interaction information may be "I'm hungry", and the previous-turn voice interaction information may be "open the map for navigation". After merging, classification prediction is performed with the intention classification model, the output result may be the classification label "navigation", and the corresponding field can be determined to be the "navigation" field. Based on the "navigation" field it can then be determined that non-rejection processing is required, that is, the user intends to be recommended navigation to a nearby restaurant, and the weak-intent voice interaction information can be passed.
In an embodiment of the present invention, step 206 may include the following sub-steps:
when the target intention field is a designated intention field, determining that the semantic rejection result is a rejection processing result; and when the target intention field is a non-designated intention field, determining that the semantic rejection result is a non-rejection processing result.
As an example, the designated intention field may be an intention field preset for rejection processing, and for example, by presetting a "chatting" field as the designated intention field, the rejection processing may be performed when it is detected that the target intention field is the "chatting" field.
In practical applications, when the target intention field is a designated intention field, the semantic rejection result can be determined to be a rejection processing result; when the target intention field is a non-designated intention field, the semantic rejection result can be determined to be a non-rejection processing result. For example, the "chat" field may be preset as the designated intention field: when the target intention field is detected to be the "chat" field, rejection processing is performed; when the target intention field is detected not to be the "chat" field, pass processing is performed.
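The designated-field decision described here reduces to a set-membership check. A minimal sketch follows; the field set and result strings are illustrative, not fixed by the patent.

```python
# Designated intention fields preset for rejection (assumed configuration).
DESIGNATED_FIELDS = {"chat"}

def semantic_rejection_result(target_field):
    """Map a target intention field to a rejection or non-rejection result."""
    if target_field in DESIGNATED_FIELDS:
        return "rejection"      # designated field: reject the weak-intent session
    return "non-rejection"      # non-designated field: let the session pass
```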
By combining the target voice interaction information with the historical voice interaction information for the judgment, the semantic space is expanded with context information, the user's voice interaction information can be comprehensively understood, the semantic rejection capability is improved, and the problem of unclear intent in weak-intent voice interaction information can be solved.
Referring to fig. 3, a flowchart illustrating steps of another voice interaction data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 301, obtaining sample voice interaction combination information, and determining sample combination intention label information corresponding to the sample voice interaction combination information;
in specific implementation, model training can be further performed by obtaining sample voice interaction combination information and determining sample combination intention label information corresponding to the sample voice interaction combination information.
In an embodiment of the present invention, the step of obtaining the sample voice interaction combination information may include the following sub-steps:
substep 11, obtaining sample voice interaction information and sample intention field information corresponding to the sample voice interaction information;
in practical applications, the sample voice interaction information and the sample intention field information corresponding to it may be obtained; for example, a large amount of data may be obtained, including user voice interaction information and the intention field information corresponding to that user voice interaction information.
Substep 12, determining seed voice interaction information from the sample voice interaction information, and determining seed historical voice interaction information of the seed voice interaction information;
after the sample voice interaction information and the sample intention field information are obtained, since the number of the sample intention field information can be multiple, the seed voice interaction information can be determined from the corresponding sample voice interaction information according to each sample intention field information, and the seed historical voice interaction information of the seed voice interaction information can be determined.
Specifically, high-frequency voice interaction information can be screened using the NLU field classification results, and a high-frequency voice interaction information base (the seed voice interaction information) can be established for each field (each piece of sample intention field information). The seed voice interaction information base is generated in an unsupervised manner, which is efficient and saves manual labor.
For example, high-frequency voice interaction information can be obtained through frequency statistics, and then the voice interaction information most representative of each field can be obtained through TF-IDF calculation. Table 1 shows an example of an established high-frequency voice interaction information base:
class    | seed queries
ac       | air conditioner; air outlet; unidirectional wind; open the car window; five-gear air volume; the heating is good
chat     | twenty-six degrees; warm and cool; windy
weather  | thirty degrees; sunny weather; windy
...      | ...
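The frequency-statistics plus TF-IDF screening described above can be sketched with a toy per-field query log. The data and the scoring details (smoothed IDF over whitespace tokens) are assumptions for illustration, not the patent's exact formula.

```python
import math
from collections import Counter

# Toy per-field query logs standing in for the mass data (illustrative).
field_queries = {
    "ac":      ["air conditioner", "air conditioner on", "air outlet"],
    "weather": ["sunny weather", "windy weather"],
}

def tf_idf_top_terms(field, k=1):
    """Score each term of a field's queries by TF-IDF across fields and
    return the k highest-scoring terms as the field's seed vocabulary."""
    docs = {f: " ".join(qs).split() for f, qs in field_queries.items()}
    tf = Counter(docs[field])
    scores = {}
    for term, count in tf.items():
        # document frequency: in how many fields does the term appear?
        df = sum(1 for terms in docs.values() if term in terms)
        idf = math.log(len(docs) / df) + 1.0  # smoothed IDF
        scores[term] = (count / len(docs[field])) * idf
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

Here `tf_idf_top_terms("ac")` picks "air" as the most field-representative term, since it is frequent in the ac log and absent from the weather log.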
In an example, for a piece of seed voice interaction information, the voice interaction information that precedes it (e.g., the previous round of voice interaction information) may be taken as its seed historical voice interaction information.
And a substep 13, obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
After the seed voice interaction information and the seed historical voice interaction information are obtained, sample voice interaction combination information can be obtained according to the seed voice interaction information and the seed historical voice interaction information.
Specifically, after the seed voice interaction information base is obtained, the seed voice interaction information and its corresponding seed historical voice interaction information can be represented with context-dependent text features; for example, a BERT model may be used for the representation, as follows:
Merge the seed historical voice interaction information and the seed voice interaction information:
a. "open the air conditioner; twenty-three degrees" (seed historical voice interaction information; seed voice interaction information);
The BERT feature representation has shape feature_matrix_size = [q_length, 768]:
a_feature_matrix = [8, 768].
after the seed historical voice interaction information and the seed voice interaction information are merged and the BERT feature representation is obtained, the features can be clustered using the K-means method, with a suitable K value selected.
In an example, in the process of clustering the contexts after the bert feature representation, if 4 pieces of seed speech interaction information are clustered, where the 4 pieces of seed speech interaction information relate to 3 different fields, the K value may be set to 3.
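The clustering step can be sketched as below. In the real pipeline each point would be a 768-dimensional BERT feature of a merged "history; query" pair; tiny 2-D vectors stand in here so the K-means mechanics stay visible, and the deterministic initialization is an assumption made for reproducibility.

```python
# Minimal K-means over feature vectors (stand-ins for BERT features).

def kmeans(points, k, iters=10):
    """Cluster points into k groups; returns a cluster index per point."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points

    def nearest(p):
        # index of the centroid with the smallest squared distance to p
        return min(range(k),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # recompute centroid as the member mean
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return [nearest(p) for p in points]
```

With K = 3 chosen as in the example above (4 pieces of seed voice interaction information spanning 3 fields), each field's merged contexts would land in their own cluster.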
In an embodiment of the present invention, the step of determining the sample combination intention tag information corresponding to the sample voice interaction combination information may include the following sub-steps:
substep 21, generating a label matching result by adopting the sample voice interaction combination information;
after the sample voice interaction combination information is obtained, a label matching result can be generated from it; the label matching result may be obtained from the clustering result, for example as a pseudo label.
For example, the clustering result can be represented by $i, i ∈ [0, n], where n is the number of categories; the pseudo labels can then be obtained and represented as follows:
pseudo label | $0                                                       | $1
             | open the air conditioner; twenty-three degrees           | cool down somewhat; five degrees
             | switch the air volume to the third gear; sixteen degrees |
Substep 22, determining the sample combination intention label information according to the label matching result.
After the tag matching result is obtained, the sample combination intention tag information can be determined from it. Determining the sample combination intention tag information from the sample voice interaction combination information in this way enables pre-labeling of massive context-dependent voice interaction information.
Specifically, a pseudo label can be assigned to each clustering result, and a TF-IDF score can then be computed against the seed voice interaction information base to obtain the real label (i.e., the sample combination intention label information).
In an example, using the seed voice interaction information base, the seed historical voice interaction information and the seed voice interaction information may be merged to obtain the sample voice interaction combination information; a TF-IDF value may then be calculated for each field (e.g., each class in table 1) over the sample voice interaction combination information and the seed voice interaction information base, so that the most relevant real tag label can be obtained from the pseudo tag label.
For example, given the clustered contextual voice interaction information features (i.e., the sample voice interaction combination information) and the pseudo tag labels (i.e., the tag matching results) as input, the matched real tag results (i.e., the sample combination intention tag information) are output and can be represented as follows:
true label ac: "Opening the air conditioner; twenty-three degrees", "The air quantity is switched to the third gear; sixteen degrees"
true label chat: "Cooling somewhat; five degrees"
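The pseudo-to-real label matching can be sketched with a TF-IDF-style score, as described above: each cluster's merged texts are scored against each field's seed utterances, and the best-scoring field becomes the real label. The seed library, tokenization, and field names here are hypothetical stand-ins, not the patent's data.

```python
import math
from collections import Counter

def match_real_label(cluster_tokens, domain_docs):
    """Score each candidate field for one pseudo-labelled cluster with a
    TF-IDF-style weight and return the best-scoring field name.

    cluster_tokens: tokens of the merged "history; query" texts in one cluster.
    domain_docs: {field name: tokens of that field's seed utterances}.
    """
    n = len(domain_docs)
    df = Counter()                        # in how many fields each term occurs
    for tokens in domain_docs.values():
        df.update(set(tokens))
    tf = Counter(cluster_tokens)          # term frequency inside the cluster
    scores = {
        field: sum(tf[t] * math.log((1 + n) / (1 + df[t]))
                   for t in set(tokens) & set(tf))
        for field, tokens in domain_docs.items()
    }
    return max(scores, key=scores.get)

# Hypothetical seed voice interaction information base, one token list per field.
seed_library = {
    "ac":   ["open", "air", "conditioner", "degrees", "cool", "airflow", "gear"],
    "chat": ["tell", "joke", "weather", "sing", "song"],
}
# Tokens of the $0 cluster ("Opening the air conditioner; twenty-three degrees").
real_label = match_real_label(
    ["open", "air", "conditioner", "twenty", "three", "degrees"], seed_library)
```

In this toy run the $0 cluster shares four weighted terms with the "ac" seed utterances and none with "chat", so the pseudo label $0 is matched to the real label "ac".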
Step 302, performing model training according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model;
In specific implementation, model training can be performed using the sample voice interaction combination information and the sample combination intention label information to obtain the intention classification model.
For example, a bert classification model can be trained with the sample voice interaction combination information and the sample combination intention label information; other models can also be used, and the present invention is not limited in this respect. After the intention classification model is obtained, it can be tested on test data to obtain the real intention classification result (i.e., the target intention field) for weak-intention voice interaction information and its context.
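The training interface of step 302 can be illustrated with a deliberately simple stand-in: a bag-of-words profile classifier trained on (merged "history; query" text, combination intention label) pairs. This only mirrors the inputs and outputs of the bert classifier the patent describes; the model, texts, and labels below are hypothetical.

```python
from collections import Counter

def train_intent_classifier(samples):
    """Train a toy intent classifier on (merged "history; query" text,
    combination intention label) pairs.

    A bag-of-words stand-in for the bert classification model of step 302:
    one token-count profile per label; prediction picks the label whose
    profile best overlaps the input tokens.
    """
    profiles = {}
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())

    def predict(text):
        tokens = text.lower().split()
        return max(profiles,
                   key=lambda lbl: sum(profiles[lbl][t] for t in tokens))
    return predict

model = train_intent_classifier([
    ("open the air conditioner; twenty three degrees", "ac"),
    ("switch the airflow to third gear; sixteen degrees", "ac"),
    ("tell me a joke; something funny", "chat"),
])
field = model("cool it a bit; five degrees")  # weak query plus its context
```

Even this crude profile assigns the weak query "cool it a bit; five degrees" to the "ac" field because of its shared context vocabulary, which is the behaviour the trained intention classification model is meant to provide.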
Step 303, when a voice interaction event is detected, determining target voice interaction information of the voice interaction event;
step 304, judging whether the target voice interaction information is weak-intention voice interaction information;
step 305, determining historical voice interaction information when the target voice interaction information is judged to be weak-intention voice interaction information;
step 306, inputting the target voice interaction information and the historical voice interaction information into a pre-trained intention classification model;
step 307, receiving an intention prediction result output by the intention classification model, and determining a target intention field aiming at the voice interaction event according to the intention prediction result;
step 308, determining a semantic rejection result according to the target intention field, and responding to the voice interaction event according to the semantic rejection result.
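Steps 303 to 308 can be sketched end to end as follows. The weak-intent heuristic, the designated rejected field, and the stand-in classifier are all hypothetical; in the patent the classifier is the pre-trained intention classification model.

```python
REJECTED_FIELDS = {"chat"}   # hypothetical designated intention field(s)

def is_weak_intent(query):
    # Hypothetical heuristic: treat a query that names no device as weak.
    return not any(w in query for w in ("air conditioner", "window", "music"))

def handle_event(query, history, classify):
    """Sketch of steps 303-308.

    `classify` stands in for the pre-trained intention classification model
    and maps a merged "history; query" string to a target intention field.
    """
    if not is_weak_intent(query):                 # steps 303-304
        return "execute"                          # clear intent: respond
    field = classify(history + "; " + query)      # steps 305-307
    if field in REJECTED_FIELDS:                  # step 308
        return "reject"                           # semantic rejection result
    return "execute"                              # non-rejection result

# A trivial stand-in classifier keyed on the merged text.
toy_model = lambda text: "ac" if "air conditioner" in text else "chat"
result = handle_event("twenty three degrees", "open the air conditioner",
                      toy_model)
```

With an air-conditioner history the weak query "twenty three degrees" resolves to the "ac" field and is executed; the same query after a chat history would fall into the designated field and be rejected.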
An embodiment of the invention is illustrated below with reference to fig. 4:
1. massive queries and their corresponding domains (namely, sample voice interaction information and the corresponding sample intention field information);
2. generating a weak intention seed library;
3. extracting seed query (namely seed voice interaction information) and corresponding domain (namely sample intention field information);
4. context-dependent textual feature representations and clusters;
5. the bert features of the weak intention query and its context (namely, of the seed voice interaction information and the seed historical voice interaction information) and the clustering results (namely, the sample voice interaction combination information);
6. matching method and batch pre-labeling of pseudo labels (namely label matching results);
7. weak intention query and its real category corresponding to the context feature (i.e. sample combination intention label information);
8. training a Bert model (namely an intention classification model);
9. predicting, with the intention classification model, the intention query to be predicted (namely, target voice interaction information) and its context features (namely, historical voice interaction information), so as to obtain the weak intention query classification label (namely, the intention prediction result).
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 5, a schematic structural diagram of a voice interaction data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a historical voice interaction information determining module 501, configured to determine historical voice interaction information when a voice interaction event is detected;
a target intention domain determining module 502, configured to determine a target intention domain for the voice interaction event according to the historical voice interaction information;
and a voice interaction event response module 503, configured to determine a semantic rejection result according to the target intention field, and respond to the voice interaction event according to the semantic rejection result.
In an embodiment of the present invention, the apparatus further includes:
the target voice interaction information determining module is used for determining target voice interaction information of the voice interaction event;
the judging module is used for judging whether the target voice interaction information is weak-intention voice interaction information;
a determining module, configured to invoke the historical voice interaction information determining module 501 when it is determined that the target voice interaction information is the weak-intention voice interaction information.
In an embodiment of the present invention, the target intention domain determining module 502 includes:
the model input submodule is used for inputting the target voice interaction information and the historical voice interaction information into a pre-trained intention classification model;
and the model output submodule is used for receiving the intention prediction result output by the intention classification model and determining a target intention field aiming at the voice interaction event according to the intention prediction result.
In an embodiment of the present invention, the apparatus further includes:
a sample voice interaction combination information and sample combination intention label information acquisition module, configured to acquire sample voice interaction combination information and determine sample combination intention label information corresponding to the sample voice interaction combination information;
and the intention classification model obtaining module is used for carrying out model training according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model.
In an embodiment of the present invention, the module for acquiring sample voice interaction combination information and sample combination intention tag information includes:
a sample voice interaction information and sample intention field information acquisition submodule, configured to acquire sample voice interaction information and sample intention field information corresponding to the sample voice interaction information;
the seed voice interaction information determining submodule is used for determining seed voice interaction information from the sample voice interaction information and determining seed historical voice interaction information of the seed voice interaction information;
and the sample voice interaction combination information obtaining submodule is used for obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
In an embodiment of the present invention, the module for acquiring sample voice interaction combination information and sample combination intention tag information further includes:
the tag matching result generation submodule is used for generating a tag matching result by adopting the sample voice interaction combination information;
and the sample combination intention label information determining submodule is used for determining the sample combination intention label information according to the label matching result.
In an embodiment of the present invention, the voice interaction event response module 503 includes:
a rejection processing result determining submodule, configured to determine that the semantic rejection result is a rejection processing result when the target intention field is a designated intention field;
and the non-rejection processing result determining submodule is used for determining the semantic rejection result as a non-rejection processing result when the target intention field is a non-specified intention field.
In the embodiment of the invention, when a voice interaction event is detected, historical voice interaction information is determined; a target intention field for the voice interaction event is determined according to the historical voice interaction information; and a semantic rejection result is determined according to the target intention field, with the voice interaction event responded to according to the semantic rejection result. This realizes rejection optimization for weak-intention sessions: the target intention field is determined from historical voice interaction information, the semantic rejection result is then determined based on the target intention field, the user's voice interaction information is understood comprehensively, the choice between passing and rejecting a weak-intention session is resolved, and the semantic rejection capability is improved.
An embodiment of the present invention further provides a server, which may include a processor, a memory, and a computer program stored in the memory and capable of running on the processor; when executed by the processor, the computer program implements the above voice interaction data processing method.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored; when executed by a processor, the computer program implements the above voice interaction data processing method.
Since the apparatus embodiment is substantially similar to the method embodiment, its description is brief; for relevant details, refer to the description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The voice interaction data processing method and apparatus provided above have been introduced in detail. Specific examples are used herein to explain the principle and implementation of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for processing data for voice interaction, the method comprising:
when a voice interaction event is detected, determining historical voice interaction information;
determining a target intention field aiming at the voice interaction event according to the historical voice interaction information;
and determining a semantic rejection result according to the target intention field, and responding to the voice interaction event according to the semantic rejection result.
2. The method of claim 1, prior to the determining historical voice interaction information, further comprising:
determining target voice interaction information of the voice interaction event;
judging whether the target voice interaction information is weak voice interaction information or not;
and executing the step of determining historical voice interaction information when the target voice interaction information is judged to be weak-intention voice interaction information.
3. The method of claim 2, wherein determining a target intent field for the voice interaction event based on the historical voice interaction information comprises:
inputting the target voice interaction information and the historical voice interaction information into a pre-trained intention classification model;
and receiving an intention prediction result output by the intention classification model, and determining a target intention field aiming at the voice interaction event according to the intention prediction result.
4. The method of claim 3, further comprising:
acquiring sample voice interaction combination information, and determining sample combination intention label information corresponding to the sample voice interaction combination information;
and performing model training according to the sample voice interaction combination information and the sample combination intention label information to obtain an intention classification model.
5. The method of claim 4, wherein obtaining sample voice interaction combination information comprises:
acquiring sample voice interaction information and sample intention field information corresponding to the sample voice interaction information;
determining seed voice interaction information from the sample voice interaction information, and determining seed historical voice interaction information of the seed voice interaction information;
and obtaining sample voice interaction combination information according to the seed voice interaction information and the seed historical voice interaction information.
6. The method according to claim 4 or 5, wherein the determining sample combination intention label information corresponding to the sample voice interaction combination information comprises:
generating a label matching result by adopting the sample voice interaction combination information;
and determining the sample combination intention label information according to the label matching result.
7. The method according to claim 1, 2 or 3, wherein the determining a semantic rejection result according to the target intention field comprises:
when the target intention field is a designated intention field, determining the semantic rejection result as a rejection processing result;
and when the target intention field is a non-specified intention field, determining the semantic rejection result as a non-rejection processing result.
8. A voice interactive data processing apparatus, characterized in that the apparatus comprises:
the historical voice interaction information determining module is used for determining historical voice interaction information when a voice interaction event is detected;
the target intention field determining module is used for determining a target intention field aiming at the voice interaction event according to the historical voice interaction information;
and the voice interaction event response module is used for determining a semantic rejection result according to the target intention field and responding to the voice interaction event according to the semantic rejection result.
9. A server, characterized in that it comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, which computer program, when executed by the processor, implements the data processing method of voice interaction according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a data processing method of voice interaction according to any one of claims 1 to 7.
CN202011541411.3A 2020-12-23 2020-12-23 Voice interaction data processing method and device Pending CN112667076A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011541411.3A CN112667076A (en) 2020-12-23 2020-12-23 Voice interaction data processing method and device
PCT/CN2021/140591 WO2022135496A1 (en) 2020-12-23 2021-12-22 Voice interaction data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011541411.3A CN112667076A (en) 2020-12-23 2020-12-23 Voice interaction data processing method and device

Publications (1)

Publication Number Publication Date
CN112667076A true CN112667076A (en) 2021-04-16

Family

ID=75409121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011541411.3A Pending CN112667076A (en) 2020-12-23 2020-12-23 Voice interaction data processing method and device

Country Status (2)

Country Link
CN (1) CN112667076A (en)
WO (1) WO2022135496A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221580A (en) * 2021-07-08 2021-08-06 广州小鹏汽车科技有限公司 Semantic rejection method, semantic rejection device, vehicle and medium
CN113470649A (en) * 2021-08-18 2021-10-01 三星电子(中国)研发中心 Voice interaction method and device
WO2022135496A1 (en) * 2020-12-23 2022-06-30 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device
CN115910035A (en) * 2023-03-01 2023-04-04 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509619A (en) * 2018-04-04 2018-09-07 科大讯飞股份有限公司 A kind of voice interactive method and equipment
CN110807333A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Semantic processing method and device of semantic understanding model and storage medium
WO2020155766A1 (en) * 2019-01-31 2020-08-06 平安科技(深圳)有限公司 Method, device and apparatus for identification rejection in intention identification, and storage medium
CN111583919A (en) * 2020-04-15 2020-08-25 北京小米松果电子有限公司 Information processing method, device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109326289B (en) * 2018-11-30 2021-10-22 深圳创维数字技术有限公司 Wake-up-free voice interaction method, device, equipment and storage medium
CN112382291B (en) * 2020-11-23 2021-10-22 北京百度网讯科技有限公司 Voice interaction processing method and device, electronic equipment and storage medium
CN112667076A (en) * 2020-12-23 2021-04-16 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device

Also Published As

Publication number Publication date
WO2022135496A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN109918673B (en) Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN112667076A (en) Voice interaction data processing method and device
CN108255934B (en) Voice control method and device
CN110168535B (en) Information processing method and terminal, computer storage medium
CN109003624B (en) Emotion recognition method and device, computer equipment and storage medium
CN110262273A (en) A kind of home equipment control method, device, storage medium and smart home system
CN110442718A (en) Sentence processing method, device and server and storage medium
US7412383B1 (en) Reducing time for annotating speech data to develop a dialog application
CN109741735B (en) Modeling method, acoustic model acquisition method and acoustic model acquisition device
CN1637744A (en) Machine-learned approach to determining document relevance for search over large electronic collections of documents
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
US10347243B2 (en) Apparatus and method for analyzing utterance meaning
CN111832305B (en) User intention recognition method, device, server and medium
CN110415679A (en) Voice error correction method, device, equipment and storage medium
CN111462761A (en) Voiceprint data generation method and device, computer device and storage medium
CN112925904B (en) Lightweight text classification method based on Tucker decomposition
CN111666416A (en) Method and apparatus for generating semantic matching model
CN108710653B (en) On-demand method, device and system for reading book
CN111444930A (en) Method and device for determining prediction effect of two-classification model
Al-Talabani et al. Kurdish dialects and neighbor languages automatic recognition
CN110797013A (en) Live broadcast entrance display method of voice live broadcast room, related equipment and storage medium
CN114297390B (en) Aspect category identification method and system in long tail distribution scene
CN113111855B (en) Multi-mode emotion recognition method and device, electronic equipment and storage medium
CN110458383B (en) Method and device for realizing demand processing servitization, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210416