CN112331185B - Voice interaction method, system, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112331185B
CN112331185B (application CN202011248204.9A)
Authority
CN
China
Prior art keywords
intention
voice interaction
voice
necessary parameters
matched
Prior art date
Legal status
Active
Application number
CN202011248204.9A
Other languages
Chinese (zh)
Other versions
CN112331185A (en)
Inventor
李禹慧
黄姿荣
李�瑞
吴伟
贾巨涛
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202011248204.9A
Publication of CN112331185A
Application granted
Publication of CN112331185B
Legal status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/02 — Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 — Speech classification or search

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a voice interaction method, system, storage medium and electronic device. The voice interaction method includes: acquiring voice information; parsing the voice information to obtain the intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element matches a necessary parameter required to execute the intention; judging whether the intention is an explicit intention according to whether the weight value of the necessary parameters matched by the first element is greater than or equal to a preset threshold; and, if the intention is an explicit intention, executing the intention. With this scheme, the user's intention can be executed as soon as the weight value of the necessary parameters matched by the elements parsed from the voice information reaches the preset threshold, without matching every necessary parameter one by one through multiple rounds of dialogue with the user; the number of rounds of voice interaction is thus greatly reduced, and the user experience during voice interaction is markedly improved.

Description

Voice interaction method, system, storage medium and electronic equipment
Technical Field
The present application relates to the field of voice interaction technologies, and in particular to a voice interaction method, system, storage medium, and electronic device.
Background
At present, voice interaction technology is widely used in smart home systems to let users interact with and control the system. In practice, a user often cannot provide in a single utterance all the necessary parameters required to execute the intention, so multiple rounds of dialogue interaction are generally needed to collect the corresponding necessary parameters one by one.
The current smart home voice control process therefore has an urgent problem: when several parameters each require their own round of dialogue, the number of interaction rounds becomes excessive, the user experience suffers, and the user's patience may be worn down before the expected result is reached. A voice interaction method is therefore needed that completes the dialogue in as few rounds of voice interaction as possible and improves the user experience.
Disclosure of Invention
In order to solve the above-mentioned problems in the prior art, the present application provides a voice interaction method, system, storage medium and electronic device that complete a conversation with as few rounds of voice interaction as possible.
In a first aspect, the present application provides a voice interaction method, including:
acquiring voice information;
analyzing the voice information, and acquiring an intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element is matched with a necessary parameter required by executing the intention;
judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold value;
if the intent is an explicit intent, then the intent is performed.
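The first-aspect steps can be sketched in code. This is a minimal illustration, not the patent's implementation: the parameter names, weights and threshold are assumptions modeled on the rice-cooker embodiment later in the description.

```python
# Sketch of the claimed decision rule: an intention is "explicit" when the
# summed weight values of the necessary parameters matched by parsed
# elements reach a preset threshold. All names/numbers are illustrative.

def matched_weight(weights, matched_params):
    """Sum the weight values of the necessary parameters that were matched."""
    return sum(weights[p] for p in matched_params)

def is_explicit(weights, matched_params, threshold):
    """The intention is explicit iff the matched weight reaches the threshold."""
    return matched_weight(weights, matched_params) >= threshold

# Hypothetical rice-cooker scenario (weights sum to a full score of 10):
weights = {"reservation_time": 2.0, "cooking_mode": 4.5,
           "rice_seed": 0.5, "rice_type": 0.5, "taste": 2.5}
THRESHOLD = 6.0
```

With these assumed weights, matching taste and cooking mode (2.5 + 4.5 = 7.0) clears the threshold, while matching cooking mode alone (4.5) does not.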
In one embodiment, the method further comprises:
if the intent is not explicit intent, performing supplemental voice interaction to obtain a second element that matches the necessary parameters that did not match the first element;
enabling the sum of the weight values of the necessary parameters respectively corresponding to the second element and the first element to be larger than or equal to the preset threshold value;
judging the intention as an explicit intention and executing the intention.
In this embodiment, when the initially acquired voice information does not satisfy the condition for executing the intention, supplemental voice interaction is performed to obtain the missing information needed to execute the intention, ensuring that the intention is carried out smoothly.
In one embodiment, the method further comprises:
judging whether a target record matched with the intention exists in the history record according to the analysis result of the voice information;
if there is a target record in the history that matches the intent, a round of voice interaction is performed to confirm whether the intent is performed according to the target record.
In this embodiment, target records matching the user's intention are looked up in the history; a record matching the current intention can be found quickly from the user's usage habits, so that, with the user's consent, the intention can be executed immediately according to the target record.
In one embodiment, if there is a target record in the history that matches the intent, performing a round of voice interaction to confirm whether the intent is performed according to the target record includes:
judging whether only one record with highest occurrence frequency exists in the target records;
if only one record with the highest occurrence frequency exists, performing a round of voice interaction to confirm whether the intention is executed according to the record with the highest occurrence frequency;
if there is no unique record with the highest occurrence frequency, a round of voice interaction is performed to confirm whether the intention is executed according to the latest record among the target records.
In this embodiment, the matching priority of target records is set so that frequency of use carries the most weight, followed by recency of use, which better fits the user's habits.
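The selection rule above (unique most-frequent record first, otherwise the most recent record) can be sketched as follows; representing each history record as a `(timestamp, settings)` pair is a hypothetical simplification, not part of the specification.

```python
from collections import Counter

def pick_target_record(records):
    """records: list of (timestamp, settings) already matched to the intent.
    Return the settings of the unique most-frequent record if one exists;
    otherwise fall back to the settings of the most recent record."""
    counts = Counter(settings for _, settings in records)
    (top, top_n), *rest = counts.most_common()
    if all(n < top_n for _, n in rest):
        return top                      # unique highest-frequency record
    return max(records)[1]              # frequency ties: most recent wins
```

A single matching record is trivially the unique most-frequent one, so it is returned directly.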
In one embodiment, the complementary voice interaction is performed to obtain a second element that matches the necessary parameters that do not match the first element, including:
judging the weight value of the necessary parameter which is not matched with the first element;
and preferentially acquiring the second element corresponding to the necessary parameter with the largest weight value through the supplementary voice interaction.
In this embodiment, when supplemental voice interaction is performed, the interaction for the necessary parameter with the largest weight value is performed first. This keeps the number of supplemental interaction rounds as small as possible while still satisfying the condition for executing the intention, improving the user experience.
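The prioritization described above (steps S511/S512, detailed later) amounts to picking the largest-weight unmatched parameter as the next question; a small illustrative helper, with hypothetical names:

```python
def next_parameter_to_ask(weights, matched_params):
    """Among the necessary parameters not yet matched by any element, pick
    the one with the largest weight value as the subject of the next round
    of supplemental voice interaction. Returns None if all are matched."""
    unmatched = {p: w for p, w in weights.items() if p not in matched_params}
    return max(unmatched, key=unmatched.get) if unmatched else None
```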
In one embodiment, the supplemental voice interaction is performed at most two times.
In this embodiment, at most two rounds of supplemental voice interaction are set, so that the dialogue ends within two rounds; this greatly reduces the number of interaction rounds and improves the user experience.
In one embodiment, among the necessary parameters required to execute the intention, there exist two necessary parameters whose weight values sum to a value greater than or equal to the preset threshold.
In one embodiment, the method further comprises:
judging, when executing the intention, whether there are vacant necessary parameters not matched by any element;
if vacant necessary parameters exist, filling the vacant necessary parameters and executing the intention.
In this embodiment, once the condition for executing the intention is satisfied, the remaining vacant necessary parameters are filled automatically by the voice interaction system, avoiding unnecessary interaction with the user and improving the user experience.
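The automatic filling of vacant parameters can be sketched as below. The two default sources (`history_defaults` from the user's habits, `fallback_defaults` for conventional settings) are hypothetical stand-ins for the habit-based filling the description mentions.

```python
def fill_vacancies(weights, matched, history_defaults, fallback_defaults):
    """Once the intent is executable, fill any still-vacant necessary
    parameters from the user's history if available, otherwise from
    conventional defaults, without asking the user (illustrative policy)."""
    filled = dict(matched)
    for param in weights:                       # weights lists all necessary parameters
        if param not in filled:
            filled[param] = history_defaults.get(param,
                                fallback_defaults[param])
    return filled
```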
In a second aspect, the present application also provides a voice interaction system, including:
the voice acquisition module is used for acquiring voice information;
the analysis module is used for analyzing the voice information and acquiring an intention corresponding to the voice information and a first element corresponding to a necessary parameter of the intention;
the judging module is used for judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched to the first element is larger than or equal to a preset threshold value;
and the execution module is used for executing the intention when the intention is an explicit intention.
In a third aspect, the present application further provides a storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the above-mentioned voice interaction method.
In a fourth aspect, the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the computer program implements the above-mentioned voice interaction method when executed by the processor.
The above-described features may be combined in various suitable ways or replaced by equivalent features as long as the object of the present application can be achieved.
Compared with the prior art, the voice interaction method, the voice interaction system, the storage medium and the electronic equipment provided by the application have the following beneficial effects:
according to the voice interaction method, the voice interaction system, the storage medium and the electronic equipment, provided by the application, through setting the preset threshold value, the necessary parameters and the weight values of the necessary parameters in the corresponding application scene, the user intention can be executed as long as the weight values of the necessary parameters matched to the elements through the analysis of the voice information are larger than or equal to the preset threshold value, all the necessary parameters are matched one by one through the dialogue between a plurality of rounds of interaction and the user, the number of rounds of voice interaction is greatly reduced, and the user experience in the voice interaction process is remarkably improved.
Drawings
The application will be described in more detail hereinafter on the basis of embodiments and with reference to the accompanying drawings. Wherein:
fig. 1 shows a schematic flow chart of the voice interaction method of the present application.
Detailed Description
The application will be further described with reference to the accompanying drawings.
Example 1
This embodiment mainly illustrates the principle of the voice interaction method of the present application.
As shown in fig. 1 of the accompanying drawings, the application provides a voice interaction method, which comprises the following steps:
step S1: and acquiring voice information.
Specifically, the voice information input by the user can be acquired through the microphone of the corresponding electronic device.
Step S2: analyzing the voice information, and acquiring the intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element is matched with necessary parameters required by executing the intention.
Specifically, the voice information input by the user carries a corresponding intention, which corresponds to the target result the user wants to obtain. Executing the intention requires satisfying its necessary parameters; by parsing the voice information, the first element is extracted from it to match and satisfy the corresponding necessary parameters.
Preferably, step S2 further includes:
step S21: judging whether a target record matched with the intention exists in the history record according to the analysis result of the voice information;
step S22: if the target record matched with the intention exists in the history record, a round of voice interaction is performed to confirm whether the intention is executed according to the target record.
Specifically, according to semantic analysis of the voice information and the first element matched with the corresponding necessary parameters, searching whether a target record with the same intention as that corresponding to the current voice information exists in the history record, and if the target record exists, confirming whether the target record is executed to the user in a voice interaction mode. If the user agrees to execute according to the target record, skipping the subsequent steps of the voice interaction method to directly enter the execution of the intention; if the user does not agree to perform per target recording, then proceed to the subsequent steps of the voice interaction method.
Through the matching of the history records, the current intention of the user can be quickly matched according to the past interaction records of the user, and the conversation can be completed and the intention can be executed through one round of interaction.
Preferably, step S22 further includes:
step S221: judging whether only one record with highest occurrence frequency exists in the target records;
step S222: if only one record with the highest occurrence frequency exists, performing a round of voice interaction to confirm whether the intention is executed according to the record with the highest occurrence frequency;
step S223: if there is no unique record with the highest occurrence frequency, a round of voice interaction is performed to confirm whether the intention is executed according to the latest record among the target records.
Specifically, when matching target records in the history, if there is only one target record, it may be defined to some extent as the only one record with the highest occurrence frequency, according to which the user is confirmed through voice interaction. More often than not, there will be more than one target record, as the first element in the user's voice message may only match some of the necessary parameters, and accordingly there will be multiple corresponding target records. Therefore, the target record needs to be screened through steps S221 to S223.
Specifically, it is first determined whether the target records contain a unique record with the highest occurrence frequency; such a record represents, to some extent, the user's usage habit. If it exists, that record is confirmed with the user through voice interaction. If there is no unique most-frequent record — for example, all target records occur equally often, or several of them share the same frequency — the user is instead asked, through voice interaction, to confirm the most recent target record, i.e. the latest record in time.
When the user is confirmed through voice interaction, if the user agrees to execute according to the corresponding target record, the subsequent steps of the voice interaction method are skipped to directly enter the execution of the intention; if the user does not agree to execute the corresponding target record, the following steps of the voice interaction method are continued.
Step S3: and judging whether the intention is an explicit intention according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold value.
Specifically, executing the intention corresponding to the voice information requires satisfying the intention's necessary parameters, and a necessary parameter is satisfied when it is matched by the first element in the voice information. There are usually several necessary parameters; in this embodiment it suffices to satisfy only some of them. When the weight value of the satisfied necessary parameters is greater than or equal to the preset threshold, the key information required to execute the intention has been acquired, the meaning of the intention is clear, and the explicit intention can be executed directly. If the weight value of the necessary parameters satisfied by the first element is smaller than the preset threshold, the intention has not been expressed clearly enough — it is an ambiguous intention — and the necessary parameters must be supplemented by acquiring the second element in step S5.
It should be noted that, if the first element matches several necessary parameters, the weight value of the matched necessary parameters here means the sum of the weight values of those parameters.
Preferably, the necessary parameters, the weight values of the necessary parameters and the preset threshold are set according to the application scene of the voice interaction.
Specifically, the application scenario herein represents the field of voice interaction application, and may be specific electronic devices corresponding to different types, for example, home appliances such as an air conditioner, a washing machine, an electric cooker, and the like with a voice interaction function, or may be software systems corresponding to an intelligent home voice system and the like.
Furthermore, the necessary parameters should be understood as a set of key information required to execute an intention, set in advance according to the application scenario of the voice interaction. The intentions executable in a given scenario fall within a certain range — not every user intention can necessarily be executed — and the necessary parameters cover all executable intentions in that range. They therefore do not change with the particular intention the user inputs; only the elements in the voice information that match them, i.e. the specific key information, change.
Step S4: if the intention is an explicit intention, the intention is executed.
Specifically, the meaning of the explicit intention is explicit, and execution may be performed directly.
Step S5 comprises the steps of:
step S51: if the intent is not explicit, a supplemental voice interaction is performed to obtain a second element that matches the necessary parameters that did not match the first element.
Specifically, if the intention is not an explicit intention, that is, the weight value of the necessary parameters matched by the first element is smaller than the preset threshold, the intention is an ambiguous intention, and second elements must be acquired to supplement the necessary parameters not matched by the first element. Depending on the gap between the weight value of the necessary parameters matched by the first element and the preset threshold, several second elements may be acquired through supplemental voice interaction, each round of which acquires one second element.
Step S52: and enabling the sum of the weight values of the necessary parameters respectively corresponding to the second element and the first element to be larger than or equal to a preset threshold value.
Specifically, as second elements are obtained round by round through supplemental voice interaction, as soon as the sum of the weight values of the necessary parameters matched by the acquired second elements and by the first element reaches the preset threshold, execution of the intention begins directly; even if some necessary parameters remain unmatched, no further round of supplemental voice interaction is performed.
Step S53: the intention is determined to be a clear intention and the intention is executed.
The purpose of step S5 is to acquire second elements through supplemental voice interaction to match and satisfy the necessary parameters not matched by the first element, so that the sum of the weight values of the necessary parameters corresponding to the second elements and the first element is greater than or equal to the preset threshold; that is, once enough second elements are acquired, the intention changes from an ambiguous intention to an executable explicit intention. In step S5 at least one round of voice interaction is performed, each round acquiring one second element; the exact number of rounds depends on the gap between the weight value of the necessary parameters matched in step S2 and the preset threshold, and on the weight values of the necessary parameters not matched by the first element.
Preferably, the supplementary voice interaction is performed at most in two rounds.
Specifically, in order to improve the voice interaction experience of the user as much as possible, the number of rounds of voice interaction must be reduced as much as possible, so that the supplementary voice interaction is set to be performed at most two rounds, so that after the voice information of the user is acquired for the first time, the acquisition of the second element is completed and the dialogue is ended through at most two rounds of supplementary voice interaction, and the user experience is greatly improved.
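Step S5 with the two-round cap can be sketched as a loop; `ask` is a hypothetical stand-in for one round of supplemental voice interaction, and the simplifying assumption here is that each round successfully matches the parameter asked about.

```python
def supplemental_rounds(weights, matched, threshold, ask, max_rounds=2):
    """Run at most max_rounds supplemental interactions. Each round asks for
    the largest-weight unmatched parameter (step S512) and the loop stops as
    soon as the summed matched weight reaches the threshold (step S52)."""
    matched = set(matched)
    rounds = 0
    while sum(weights[p] for p in matched) < threshold and rounds < max_rounds:
        unmatched = [p for p in weights if p not in matched]
        if not unmatched:
            break
        target = max(unmatched, key=lambda p: weights[p])
        ask(target)            # one round of supplemental voice interaction
        matched.add(target)
        rounds += 1
    return matched, sum(weights[p] for p in matched) >= threshold
```

With the rice-cooker weights, starting from cooking mode alone (4.5) one round suffices (asking about taste reaches 7.0); starting from rice seed alone (0.5) both rounds are needed (cooking mode, then taste, reaching 7.5).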
Further, in order to complete the conversation within two rounds of supplemental voice interaction, the preset threshold and the weight values of the necessary parameters must be set in advance so that, among the necessary parameters required to execute the intention, there exist three whose weight values sum to at least the preset threshold, and these three must include the necessary parameter with the smallest weight value.
Specifically, in the extreme case where only one necessary parameter matches the first element acquired when the voice information from step S1 is parsed in step S2, second elements must be acquired in step S5. To preserve the user experience, at most two rounds of supplemental voice interaction are set, and the second element acquired in each round matches only one necessary parameter, so step S5 can match and satisfy at most two necessary parameters; steps S2 and S5 together can therefore satisfy at most three. The preset threshold and the parameter weights must be set so that the weight values of these three necessary parameters sum to at least the preset threshold. Moreover, because which necessary parameters the first element matches in step S2 depends on the user's input and is uncertain, the first element may happen to match the necessary parameter with the smallest weight value; to guard against this uncertainty, the three necessary parameters whose weights sum to at least the preset threshold must include the one with the smallest weight value.
Preferably, among the necessary parameters required to execute the intention, there exist two necessary parameters whose weight values sum to a value greater than or equal to the preset threshold.
Specifically, as described above, step S5 preferably performs at most two rounds of supplemental voice interaction. To improve the user experience further still, it is preferable that step S5 complete the conversation with only one round of voice interaction, which requires the weight values of the necessary parameters to be set accordingly. In the extreme case, only one necessary parameter matches the first element obtained in step S2, and the second element obtained through one round of supplemental interaction in step S5 matches one more necessary parameter, so two necessary parameters in total are satisfied by elements; the sum of their weight values must therefore be greater than or equal to the preset threshold.
Therefore, in an extreme case, if only one necessary parameter in step S2 matches the acquired first element and the necessary parameter is one of the two necessary parameters with the sum of the weight values greater than or equal to the preset threshold, step S5 acquires the second element corresponding to the other necessary parameter through one round of voice interaction, and the dialogue can be completed through one round of voice interaction, so as to further improve the user experience.
Preferably, in step S51, further includes:
step S511: judging the weight value of the necessary parameter which is not matched with the first element;
step S512: and preferentially acquiring the second element corresponding to the necessary parameter with the largest weight value through the supplementary voice interaction.
Specifically, the second element corresponding to the necessary parameter with the largest weight value is acquired first, so that the sum of the weight values of the necessary parameters corresponding to the second element and the first element reaches the preset threshold in as few rounds of voice interaction as possible.
Preferably, the voice interaction method of the present embodiment further includes:
judging, when executing the intention, whether there are vacant necessary parameters not matched by any element;
if vacant necessary parameters exist, filling the vacant necessary parameters and executing the intention.
Specifically, when the condition for executing the intention is satisfied — that is, when the weight value of the necessary parameters matched and satisfied by the first element, or the sum of the weight values of the necessary parameters corresponding to the second element and the first element, is greater than or equal to the preset threshold — there may still be vacant necessary parameters not matched by any element. In principle, all necessary parameters should be satisfied when executing the intention; but once the satisfied parameters carry a weight at or above the preset threshold, the core information required for execution has been acquired, and the remaining vacant necessary parameters can be filled adaptively by the voice interaction system itself, without requiring the user to fill them in person.
Further, vacant necessary parameters are preferably filled by matching the user's usage habits in the history. If the history contains no corresponding habit, they are filled according to the first element (or the first and second elements) combined with conventional settings.
Example two
In this embodiment, an electric cooker is taken as an example, and the voice interaction method of the present application is further described by a voice interaction example.
The application scenario of the voice interaction is a specific household appliance, an electric rice cooker. The smart rice cooker has the following functional parameters: reservation time, cooking mode, rice seed, rice type and taste, and corresponding necessary parameters are set for these functional parameters. Meanwhile, a weight value for each necessary parameter and a preset threshold are set according to how strongly each parameter affects the cooker's operation: out of a full score of 10, reservation time is 2, cooking mode is 4.5, rice seed is 0.5, rice type is 0.5 and taste is 2.5, and the preset threshold is 6.
The user: i eat a little softer rice.
The voice information is acquired and analyzed, the intention of the user is obtained to cook rice, and two first elements are acquired at the same time: the soft, the rice and the soft correspond to the taste, the rice corresponds to the cooking mode, the taste and the cooking mode are satisfied, the sum of the weight values of the taste and the cooking mode is 7 and is larger than a preset threshold 6, the intention of the user is clear intention, and the subsequent voice interaction step is skipped; meanwhile, according to the soft and rice, the reservation time, rice seeds and rice type are filled by the user by combining the habit of the user and the conventional situation, and the intention of cooking is executed.
Preferably, the user's voice information and the first element are matched to reply to the user by voice.
And (3) replying: a little softer rice is being cooked for you.
Example III
In this embodiment, an electric cooker is taken as an example, and the voice interaction method of the present application is further described by a voice interaction example.
The application scenario of the voice interaction is a specific household appliance, an electric cooker. The known intelligent electric cooker has the following functional parameters: reservation time, cooking mode, rice variety, rice type, and taste, and corresponding necessary parameters are set for these functional parameters. Meanwhile, a weight value and a preset threshold are set according to how strongly each necessary parameter affects the cooker's functions. Out of a full score of 10, the reservation time is weighted 2, the cooking mode 4.5, the rice variety 0.5, the rice type 0.5, and the taste 2.5, with a preset threshold of 6.
The user: i want to eat rice.
The voice information is acquired and analyzed, the intention of the user is obtained to cook rice, and a first element is acquired at the same time: the rice and the rice correspond to a cooking mode, the necessary parameters of the cooking mode are met, the weight value is 4.5 and is smaller than the preset threshold value 6, the intention of the user is not clear intention and is fuzzy intention, and the subsequent voice interaction step is continued. In addition to the cooking mode, the remaining necessary parameters have the largest taste weight value, and the second element matched with the taste is preferentially obtained through a round of voice interaction.
And (3) replying: please ask you what taste of cooked rice?
The user: is somewhat soft.
Obtaining a second element 'soft' of the matched taste, wherein the sum of the taste and the weight value of the cooking mode is 7, and is larger than a preset threshold 6, the intention of the user is converted from fuzzy intention into executable clear intention, and the conversation is completed; meanwhile, according to the soft and rice, the reservation time, rice seeds and rice type are filled by the user by combining the habit of the user and the conventional situation, and the intention of cooking is executed.
Preferably, the voice information of the user is matched with the first element and the second element to reply to the user through voice.
And (3) replying: a little softer rice is being cooked for you.
Furthermore, in the above example, if the user cannot answer the question about taste, the next round of voice interaction continues, for example:
Reply: What taste would you like your rice cooked to?
User: I don't know.
The taste parameter is set aside; among the remaining necessary parameters, the reservation time has the largest weight value, so a second element matching the reservation time is preferentially sought through the next round of voice interaction.
Reply: How long would you like to reserve?
User: XX hours.
The second element "XX hours" matching the reservation time is obtained. The reservation time and the cooking mode are satisfied, and the sum of their weight values is 6.5, greater than the preset threshold of 6, so the user's intention becomes a directly executable explicit intention and the dialogue is completed. Meanwhile, based on the acquired elements combined with the user's habits and conventional defaults, the remaining necessary parameters such as the rice variety are filled in automatically, and the cooking intention is executed.
Preferably, a voice reply is generated for the user by matching the user's voice information with the first and second elements.
Reply: The rice will be ready for you in XX hours.
Further, if during the interaction the user cannot answer, or the answer cannot be matched, supplementary voice interaction continues round by round through the remaining necessary parameters in descending order of weight, acquiring second elements and matching the corresponding necessary parameters until the execution condition of the intention is finally met. If, due to unexpected situations (the user being unable to reply), the number of supplementary rounds exceeds the preset two rounds, the dialogue may either be ended, or, when the weight values of the remaining necessary parameters can still satisfy the execution condition of the intention, the next supplementary round may continue, as chosen according to the actual situation. If the necessary parameters matched through voice interaction ultimately cannot meet the execution condition (the user cannot reply, or the reply cannot be recognized), a fallback reply is given to the user, for example:
Reply: Please tell me again once you have decided.
Further, if during the supplementary voice interaction the sum of the weight values of all remaining necessary parameters and those of the satisfied necessary parameters cannot reach the preset threshold, the dialogue can be set to end early with a fallback reply, instead of continuing meaningless supplementary rounds one by one.
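The supplementary-interaction logic of this embodiment, including the two-round cap and the early fallback when the remaining weights cannot reach the threshold, can be sketched as follows. This is an illustrative sketch under the stated assumptions; `supplement` and `ask_user` are hypothetical names, with `ask_user` standing in for a real speech round-trip that returns None when the user cannot answer.

```python
# Illustrative sketch (not the patented implementation) of the supplementary
# interaction loop: ask for the highest-weight vacant parameter first, cap
# the rounds at two, and end early when the threshold is no longer reachable.

WEIGHTS = {"reservation_time": 2.0, "cooking_mode": 4.5,
           "rice_variety": 0.5, "rice_type": 0.5, "taste": 2.5}
THRESHOLD = 6.0
MAX_ROUNDS = 2  # preset number of supplementary rounds


def supplement(matched, ask_user):
    """Return the matched parameters once the intention is explicit, else None."""
    matched = dict(matched)
    skipped = set()                     # parameters the user could not answer
    for _ in range(MAX_ROUNDS):
        if sum(WEIGHTS[p] for p in matched) >= THRESHOLD:
            break                       # already explicit, no more rounds
        vacant = [p for p in WEIGHTS if p not in matched and p not in skipped]
        reachable = sum(WEIGHTS[p] for p in matched) + sum(WEIGHTS[p] for p in vacant)
        if not vacant or reachable < THRESHOLD:
            return None                 # meaningless to continue: fallback reply
        target = max(vacant, key=WEIGHTS.get)   # largest weight first
        answer = ask_user(target)
        if answer is None:
            skipped.add(target)         # e.g. the user answers "I don't know"
        else:
            matched[target] = answer
    return matched if sum(WEIGHTS[p] for p in matched) >= THRESHOLD else None
```

Tracing Example three: taste (2.5) is asked first; on "I don't know" it is skipped and the reservation time (2) is asked next, after which 4.5 + 2 = 6.5 clears the threshold.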
Example IV
In this embodiment, an electric cooker is taken as an example, and the voice interaction method of the present application is further described by a voice interaction example.
The application scenario of the voice interaction is a specific household appliance, an electric cooker. The known intelligent electric cooker has the following functional parameters: reservation time, cooking mode, rice variety, rice type, and taste, and corresponding necessary parameters are set for these functional parameters. Meanwhile, a weight value and a preset threshold are set according to how strongly each necessary parameter affects the cooker's functions. Out of a full score of 10, the reservation time is weighted 2, the cooking mode 4.5, the rice variety 0.5, the rice type 0.5, and the taste 2.5, with a preset threshold of 6.
The user: i eat a little softer rice.
Analyzing the voice information, acquiring corresponding intention 'cooking' and first element 'soft' and 'rice', searching the history record, and judging whether a target record matching the corresponding intention and the first element exists in the history record. If the target record does not exist, continuing to enter the subsequent steps of the voice interaction method. If the target record exists, further judging whether the target record has only one record with highest occurrence frequency.
If there is a unique record with the highest frequency of occurrence, replying to the user for confirmation according to the unique record with the highest frequency of occurrence, for example:
and (3) replying: matching to you have the most commonly used cooking record, reservation time: XX, cooking mode: XX, rice: XX, rice type: XX, mouthfeel: XX, cooking according to the record.
If there is no record with the highest occurrence frequency, replying to the user for confirmation according to the latest record in the target records, for example:
and (3) replying: matching to your most recently used cooking record, reservation time: XX, cooking mode: XX, rice: XX, rice type: XX, mouthfeel: XX, cooking according to the record.
The user replies agreements, and the subsequent steps of the voice interaction method are executed according to the corresponding target record and skipped; if the user replies disagreement, continuing to enter the subsequent corresponding step of the voice interaction method according to whether the user's intention is an explicit intention. In the above example, after the user replies disagreement, according to the analysis result of the voice information, the necessary parameters corresponding to the first element "soft" and "rice" are taste and cooking mode, the sum of the weight values of the two is 7, and is greater than the preset threshold 6, the intention of the user is clear intention, and the subsequent voice interaction step is skipped; meanwhile, according to the soft and rice, the reservation time, rice seeds and rice type are filled by the user by combining the habit of the user and the conventional situation, and the intention of cooking is executed.
Note that, regardless of whether the voice information first input by the user corresponds to an explicit intention or a fuzzy intention, target-record matching in the history is performed first. For example:
User: I want to eat rice.
Similarly, the voice information is parsed, a matching target record is sought in the history according to the parse result, and the corresponding step is taken depending on whether a target record exists. If no target record exists, or the user does not agree to execute according to it, the method proceeds to the subsequent step corresponding to the parse result of the voice information. For the above example, the first element "rice" matching the cooking mode is parsed; its weight value of 4.5 is smaller than the preset threshold of 6, so the intention is not explicit, and the subsequent interaction proceeds with reference to the related steps of the third embodiment.
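The history lookup used in this embodiment, preferring the unique most frequent target record and falling back to the most recent one on ties, can be sketched as follows. The record fields (`intention`, `elements`, `params`, `time`) are hypothetical names for illustration, not a disclosed data format.

```python
# Illustrative sketch of target-record matching: among history records
# matching the intention and first elements, return the unique most
# frequent parameter set, or the most recent one when frequencies tie.
from collections import Counter


def pick_target_record(history, intention, elements):
    """history: list of dicts with 'intention', 'elements', 'params', 'time'."""
    matches = [r for r in history
               if r["intention"] == intention
               and set(elements) <= set(r["elements"])]
    if not matches:
        return None                      # no target record: normal flow continues
    freq = Counter(tuple(sorted(r["params"].items())) for r in matches)
    top_count = max(freq.values())
    top = [k for k, v in freq.items() if v == top_count]
    if len(top) == 1:                    # a unique most-frequent record exists
        return dict(top[0])
    latest = max(matches, key=lambda r: r["time"])
    return latest["params"]              # otherwise use the most recent record
```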
Example five
According to an embodiment of the present application, there is also provided a voice interaction system including:
the voice acquisition module is used for acquiring voice information;
the analysis module is used for analyzing the voice information and acquiring the intention corresponding to the voice information and a first element matching a necessary parameter of the intention;
the judging module is used for judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched to the first element is larger than or equal to a preset threshold value;
and the execution module is used for executing the intention when the intention is clear.
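A minimal structural sketch of how these four modules might be wired together is shown below. This is illustrative only: the class and method names are hypothetical, the acquisition step is assumed to have already produced an utterance, and the parser and executor are injected placeholders for real ASR/NLU services.

```python
# Illustrative wiring of the voice interaction system's modules.

class VoiceInteractionSystem:
    def __init__(self, parser, executor, weights, threshold):
        self.parser = parser        # analysis module: utterance -> (intention, matched params)
        self.executor = executor    # execution module
        self.weights = weights      # weight value per necessary parameter
        self.threshold = threshold  # preset threshold

    def handle(self, utterance):
        # Voice acquisition is assumed to have produced `utterance` already.
        intention, matched = self.parser(utterance)
        score = sum(self.weights[p] for p in matched)  # judging module
        if score >= self.threshold:                    # explicit intention
            return self.executor(intention, matched)
        return None    # fuzzy intention: supplementary rounds would follow
```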
Example six
According to an embodiment of the present application, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the voice interaction method of any of the above embodiments.
Example seven
According to an embodiment of the present application, there is also provided an electronic device including a memory, a processor, and a computer program stored in the memory, the computer program being executable on the processor, and when the computer program is executed by the processor, implementing the voice interaction method according to any one of the above embodiments.
By setting, for each application scenario, the necessary parameters, their weight values, and a preset threshold, the voice interaction method, system, storage medium, and electronic device provided by the present application can execute the user's intention as soon as the weight values of the necessary parameters matched through parsing the voice information reach the preset threshold, instead of matching every necessary parameter one by one through many rounds of dialogue with the user. This greatly reduces the number of voice interaction rounds and markedly improves the user experience.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Although embodiments of the present application are disclosed above, they are provided only to aid understanding of the present application and are not intended to limit it. Any person skilled in the art may make modifications and variations in form and detail without departing from the spirit and scope of the present disclosure, but the scope of protection of the present application is still subject to the appended claims.

Claims (10)

1. A method of voice interaction, comprising:
acquiring voice information;
analyzing the voice information, and acquiring an intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element matches a necessary parameter required for executing the intention;
judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold value;
if the intent is an explicit intent, then performing the intent;
judging, when executing the intention, whether there are vacant necessary parameters not matched to any element;
if vacant necessary parameters exist, filling the vacant necessary parameters and executing the intention.
2. The voice interaction method of claim 1, wherein the method further comprises:
if the intention is not an explicit intention, performing supplementary voice interaction to acquire a second element matching a necessary parameter not matched by the first element;
making the sum of the weight values of the necessary parameters respectively corresponding to the second element and the first element greater than or equal to the preset threshold;
and judging the intention to be an explicit intention and executing the intention.
3. The voice interaction method according to claim 1 or 2, characterized in that the method further comprises:
judging whether a target record matched with the intention exists in the history record according to the analysis result of the voice information;
if there is a target record in the history that matches the intent, a round of voice interaction is performed to confirm whether the intent is performed according to the target record.
4. The voice interaction method of claim 3, wherein if there is a target record in the history that matches the intent, performing a round of voice interaction to confirm whether the intent is executed according to the target record comprises:
judging whether there is a unique record with the highest occurrence frequency among the target records;
if there is a unique record with the highest occurrence frequency, performing a round of voice interaction to confirm whether to execute the intention according to that record;
if there is no unique record with the highest occurrence frequency, performing a round of voice interaction to confirm whether to execute the intention according to the latest record among the target records.
5. The voice interaction method according to claim 2, wherein performing supplementary voice interaction to acquire a second element matching a necessary parameter not matched by the first element comprises:
judging the weight values of the necessary parameters not matched by the first element;
and preferentially acquiring the second element corresponding to the necessary parameter with the largest weight value through the supplementary voice interaction.
6. The voice interaction method of claim 2, wherein the supplemental voice interaction is performed at most two times.
7. The voice interaction method according to claim 6, wherein a sum of weight values of two of the necessary parameters required for performing the intention is greater than or equal to the preset threshold.
8. A voice interactive system, comprising:
the voice acquisition module is used for acquiring voice information;
the analysis module is used for analyzing the voice information and acquiring an intention corresponding to the voice information and a first element corresponding to a necessary parameter of the intention;
the judging module is used for judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched to the first element is larger than or equal to a preset threshold value;
an execution module for executing the intention when the intention is an explicit intention;
wherein, when executing the intention, the judging module is further used for judging whether there are vacant necessary parameters not matched to any element; if vacant necessary parameters exist, the execution module fills the vacant necessary parameters and executes the intention.
9. A storage medium having a computer program stored thereon, which, when executed by a processor, implements the voice interaction method according to any of claims 1 to 7.
10. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the voice interaction method of any of claims 1-7.
CN202011248204.9A 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment Active CN112331185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011248204.9A CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011248204.9A CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112331185A CN112331185A (en) 2021-02-05
CN112331185B true CN112331185B (en) 2023-08-11

Family

ID=74317969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011248204.9A Active CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112331185B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113889076B (en) * 2021-09-13 2022-11-01 北京百度网讯科技有限公司 Speech recognition and coding/decoding method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847278A (en) * 2012-12-31 2017-06-13 威盛电子股份有限公司 System of selection and its mobile terminal apparatus and information system based on speech recognition
CN109657236A (en) * 2018-12-07 2019-04-19 腾讯科技(深圳)有限公司 Guidance information acquisition methods, device, electronic device and storage medium
US10418032B1 (en) * 2015-04-10 2019-09-17 Soundhound, Inc. System and methods for a virtual assistant to manage and use context in a natural language dialog
CN110516786A (en) * 2019-08-28 2019-11-29 出门问问(武汉)信息科技有限公司 A kind of dialogue management method and apparatus
CN111223485A (en) * 2019-12-19 2020-06-02 深圳壹账通智能科技有限公司 Intelligent interaction method and device, electronic equipment and storage medium
CN111680144A (en) * 2020-06-03 2020-09-18 湖北亿咖通科技有限公司 Method and system for multi-turn dialogue voice interaction, storage medium and electronic equipment



Similar Documents

Publication Publication Date Title
CN107948761B (en) Bullet screen play control method, server and bullet screen play control system
McCarthy et al. Experience-based critiquing: Reusing critiquing experiences to improve conversational recommendation
KR20030004448A (en) Adaptive Sampling technique for selecting negative examples for artificial intelligence applications
GB2453753A (en) Method and system for generating recommendations of content items
CN110784768B (en) Multimedia resource playing method, storage medium and electronic equipment
CN113485144B (en) Intelligent home control method and system based on Internet of things
CN112331185B (en) Voice interaction method, system, storage medium and electronic equipment
CN103218034A (en) Application object adjusting method and electronic device
CN111385594A (en) Virtual character interaction method, device and storage medium
CN109299293A (en) Cooking tip method, apparatus, equipment and storage medium for AR scene
CN108932947B (en) Voice control method and household appliance
CN111523050B (en) Content recommendation method, server and storage medium
CN110781341A (en) Audio album recommendation method and system fusing multi-strategy recall data sets
CN113009839B (en) Scene recommendation method and device, storage medium and electronic equipment
CN108769809B (en) Smart television-based home user behavior data acquisition method and device and computer-readable storage medium
CN114253147A (en) Intelligent device control method and device, electronic device and storage medium
CN116540556A (en) Equipment control method and device based on user habit
CN116224812A (en) Intelligent device control method and device and electronic device
CN113920995A (en) Processing method and device of voice engine, electronic equipment and storage medium
CN110381339B (en) Picture transmission method and device
JP5116811B2 (en) Program recommendation device, method and program
CN113439253B (en) Application cleaning method and device, storage medium and electronic equipment
CN113268631B (en) Video screening method and device based on big data
CN117118773B (en) Scene generation method, system and storage medium
CN111273555B (en) Smart home control method and device, terminal and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant