CN116129890A

CN116129890A - Voice interaction processing method, device and storage medium

Info

Publication number: CN116129890A
Application number: CN202310121375.2A
Authority: CN
Inventors: 张强; 智建军; 王丹叶
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-05-16

Abstract

The application provides a voice interaction processing method, a voice interaction processing device, and a storage medium. While the electronic device is in a voice interaction mode, the method detects in real time whether a sensitive input condition is met and, when it is met, promptly obtains and plays an interference voice. The user of the electronic device can then speak input voice containing sensitive content while the interference voice is playing, so that other people cannot directly recognize the sensitive content and the risk of leaking it is reduced. Moreover, because the electronic device knows the interference voice it played, it can quickly and accurately filter that interference voice out of the collected voice to be processed and obtain the input voice containing the sensitive content entered by the user, ensuring the safety and reliability of voice interaction.

Description

Voice interaction processing method, device and storage medium

Technical Field

The present application relates generally to the field of artificial intelligence applications, and more particularly, to a method and apparatus for processing voice interaction, and a storage medium.

Background

With the development of voice interaction technology, it has been widely applied to electronic devices such as AR (Augmented Reality) devices, intelligent robots, self-service devices, and intelligent medical devices. It allows users to interact by voice with the electronic device, or with other users communicatively connected to the electronic device, which improves interaction efficiency and convenience.

However, voice interaction often involves sensitive content such as account passwords, identity card numbers, and other personal private information. When users speak such content aloud, privacy is easily leaked and the user experience suffers.

Disclosure of Invention

In order to solve the technical problems, the application provides the following technical scheme:

the application provides a voice interaction processing method, which comprises the following steps:

determining that sensitive input conditions are met when the electronic equipment is in a voice interaction mode, and obtaining interference voice;

controlling an audio player to play the interference voice under the current voice interaction environment;

obtaining voice to be processed collected by an audio collector in the voice interaction environment; the voice to be processed comprises input voice which is input by the user of the electronic equipment and contains sensitive content in the process of playing the interference voice;

and carrying out noise reduction processing on the voice to be processed according to the interference voice to obtain the input voice for voice interaction processing.

Optionally, the determining that the sensitive input condition is met includes any one of the following implementation manners:

obtaining content to be interacted aiming at the electronic equipment user, and determining that sensitive content exists in the content to be interacted;

Determining that a voice interaction interface output by the electronic equipment belongs to a preset sensitive interaction interface;

and determining that the interactive prompt statement output by the electronic equipment belongs to a preset sensitive statement.
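The three alternatives above can be sketched as a single check. This is only an illustration under stated assumptions: the preset sets, the `InteractionState` record, and the function name are hypothetical and are not specified in the application.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical presets for illustration only; the application does not specify them.
SENSITIVE_KEYWORDS = {"password", "id number", "account"}
SENSITIVE_INTERFACES = {"login_password", "id_card_entry"}
SENSITIVE_PROMPTS = {"please say your password", "please say your id number"}


@dataclass
class InteractionState:
    """Snapshot of what the device currently shows or says (assumed structure)."""
    content_to_interact: Optional[str] = None  # content expected from the user
    interface_id: Optional[str] = None         # voice interaction interface being output
    prompt_sentence: Optional[str] = None      # interactive prompt sentence just output


def sensitive_input_condition_met(state: InteractionState) -> bool:
    """Return True if any one of the three alternatives holds."""
    # Alternative 1: sensitive content exists in the content to be interacted.
    content = (state.content_to_interact or "").lower()
    if any(keyword in content for keyword in SENSITIVE_KEYWORDS):
        return True
    # Alternative 2: the output interface is a preset sensitive interaction interface.
    if state.interface_id in SENSITIVE_INTERFACES:
        return True
    # Alternative 3: the output prompt sentence is a preset sensitive sentence.
    prompt = (state.prompt_sentence or "").strip().lower()
    return prompt in SENSITIVE_PROMPTS
```

Any single field is enough to trigger the condition, which mirrors the "any implementation manner" wording; a deployment could equally enable only one of the three checks.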

Optionally, the obtaining the content to be interacted with for the electronic device user includes any one of the following implementation manners:

obtaining the content to be interacted indicated by the voice interaction interface output by the electronic equipment;

analyzing the interactive prompt voice output by the electronic equipment, and predicting the content to be interacted for the user of the electronic equipment;

analyzing the interactive voice from the second electronic equipment, and predicting the content to be interacted for the user of the electronic equipment; the second electronic device is communicatively connected with the electronic device.

Optionally, the obtaining the interfering voice includes any implementation manner of the following:

obtaining interference voice for the user of the electronic equipment;

reading interference voice prestored in the electronic equipment;

and obtaining corresponding interference voice according to the content meeting the sensitive input condition.

Optionally, the obtaining the interference voice for the user of the electronic device includes:

obtaining historical input voice of a user of the electronic equipment;

Analyzing the historical input voice to obtain voice characteristics of the user of the electronic equipment;

and generating interference voice aiming at the user of the electronic equipment according to the voice characteristics.

Optionally, the obtaining the historical input voice of the user of the electronic device includes:

acquiring historical input voice input by the electronic equipment user under the condition that the sensitive input condition is not met; or,

obtaining historical input voice of a preset continuous audio frame input by a user of the electronic equipment under the condition of meeting the sensitive input condition; the historical input speech of the preset continuous audio frames does not contain sensitive content or contains incomplete sensitive content.
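The two alternatives for choosing historical input voice could be sketched as frame selectors. The `Frame` record, its fields, and the `preset_frames` parameter are hypothetical names introduced for illustration; the application does not fix a frame count, and the audio payload is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int                     # position of the frame in the captured stream
    sensitive_condition_met: bool  # was the sensitive input condition met at capture time?


def history_outside_sensitive(frames: List[Frame]) -> List[Frame]:
    """Alternative 1: frames captured while the sensitive input condition was not met."""
    return [frame for frame in frames if not frame.sensitive_condition_met]


def history_leading_sensitive(frames: List[Frame], preset_frames: int) -> List[Frame]:
    """Alternative 2: the first `preset_frames` consecutive frames captured after the
    condition became met, assumed too short to hold complete sensitive content."""
    selected: List[Frame] = []
    for frame in frames:
        if frame.sensitive_condition_met:
            if len(selected) < preset_frames:
                selected.append(frame)
            else:
                break  # enough consecutive frames collected
        elif selected:
            break  # the consecutive run has ended
    return selected
```

For a stream of two ordinary frames followed by three frames captured under the sensitive input condition, the first selector returns the two ordinary frames and the second, with `preset_frames=2`, returns the first two of the remaining three.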

Optionally, the method further comprises:

determining that the sensitive input condition is not met, and controlling the audio player to stop playing the interference voice;

under the voice interaction environment which does not meet the sensitive input conditions, the obtained input voice of the electronic equipment user does not contain sensitive content.

Optionally, the controlling the audio player to play the interfering voice in the current voice interaction environment includes:

when input voice from the user of the electronic equipment is detected while the sensitive input condition is met, controlling an audio player to play the interference voice under the current voice interaction environment;

The audio player is integrated in the electronic equipment or is located in the voice interaction environment and is in communication connection with the electronic equipment.

The application also provides a voice interaction processing device, which comprises:

the interference voice obtaining module is used for determining that the sensitive input condition is met when the electronic equipment is in a voice interaction mode to obtain interference voice;

the interference voice playing control module is used for controlling the audio player to play the interference voice in the current voice interaction environment;

the voice to be processed obtaining module is used for obtaining the voice to be processed collected by the audio collector in the voice interaction environment; the voice to be processed comprises input voice which is input by the user of the electronic equipment and contains sensitive content in the process of playing the interference voice;

and the noise reduction processing module is used for carrying out noise reduction processing on the voice to be processed according to the interference voice to obtain the input voice for voice interaction processing.

The application also provides a computer readable storage medium, on which a computer program is stored, the computer program being loaded and executed by a processor, to implement the voice interaction processing method as described above.

The application also proposes an electronic device comprising:

an audio collector; an audio player; a communication interface;

a memory for storing a program for implementing the voice interaction processing method as described above;

and the processor is used for loading and executing the program stored in the memory to realize the voice interaction processing method.

Therefore, the application provides a voice interaction processing method, a voice interaction processing device, and a storage medium. While the electronic device is in a voice interaction mode, whether the sensitive input condition is met can be detected in real time, so that an interference voice is promptly obtained and played when the condition is met. The user of the electronic device can then speak input voice containing sensitive content while the interference voice is playing, so that other people cannot directly recognize the sensitive content and the risk of sensitive content leakage is reduced. Moreover, because the electronic device knows the interference voice it played, it can quickly and accurately filter that interference voice out of the collected voice to be processed and obtain the input voice containing the sensitive content entered by the user, ensuring the safety and reliability of voice interaction.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart illustrating an alternative example of a voice interaction processing method proposed in the present application;

FIG. 2 is a flow chart illustrating yet another alternative example of a voice interaction processing method proposed in the present application;

FIG. 3 is a flow chart illustrating yet another alternative example of a voice interaction processing method proposed in the present application;

FIG. 4 is a schematic structural diagram of an alternative example of a voice interaction processing apparatus according to the present application;

FIG. 5 is a schematic diagram of a hardware architecture of an alternative example of an electronic device suitable for use in the voice interaction processing method presented herein;

FIG. 6 is a schematic flow chart of an alternative scenario of the voice interaction processing method proposed in the present application.

Detailed Description

To address the technical problem described in the Background and to protect sensitive content entered by voice, the present application proposes that, when it is predicted that the voice the user is about to input contains sensitive content, an interference voice is output in time, so that the user speaks the input voice containing the sensitive content while the interference voice is playing. Other people in the voice collection environment then cannot clearly hear sensitive content such as account numbers or identification numbers spoken by the user, which prevents that content from being leaked. Because the interference voice is known, the collected voice can be filtered with it directly, so the voice actually input by the user is obtained quickly and accurately, ensuring the reliability and security of voice interaction.

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to FIG. 1, a flowchart of an optional example of the voice interaction processing method provided in the present application is shown. The method is applicable to any electronic device terminal with a voice interaction function, such as a smart phone, a notebook computer, a self-service device, a smart home device, a smart medical device, a vehicle-mounted terminal, a robot, or a smart wearable device such as AR glasses/helmets. In a voice interaction scenario in which a user uses the electronic device, to ensure the safety of voice interaction, the voice interaction processing method executed by the electronic device may, as shown in FIG. 1, include:

step S11, determining that sensitive input conditions are met when the electronic equipment is in a voice interaction mode, and obtaining interference voice;

in combination with the above description of the technical solution of the present application, when a user starts a voice assistant of an electronic device and enters a voice interaction mode of the electronic device, the user may input interaction content to the electronic device through a voice input manner, and the principles of implementing the voice interaction function of the electronic device will not be described in detail in this application.

While a user is providing voice input through the electronic device, the input voice may contain sensitive content such as account numbers, passwords, identification numbers, and other personal private information, and speaking such content directly would leak it. To prevent this, the electronic device obtains an interference voice when it determines that the sensitive input condition is met. The present application does not limit the content of the interference voice or the method for obtaining it.

It can be seen that the above-mentioned sensitive input condition may be a condition for predicting/determining that the interactive voice to be output by the user of the electronic device contains sensitive content. In practical application, whether the sensitive input condition is met or not can be determined through the modes of the interactive object of the voice interactive scene, the output content thereof and the like, and the content of the sensitive input condition can be different for different interactive objects.

Step S12, controlling an audio player to play the interference voice in the current voice interaction environment;

step S13, the voice to be processed collected by the audio collector in the voice interaction environment is obtained; the voice to be processed comprises input voice containing sensitive content, which is input by a user of the electronic equipment, in the process of playing the interference voice;

in combination with the above description of the technical solution of the present application, since the voice content subsequently spoken by the user of the electronic device may include the sensitive content, in order to avoid the sensitive content from being heard by others, the electronic device may timely control the audio player to play the interfering voice under the current voice interaction scene, so that the user of the electronic device inputs the input voice including the sensitive content in the interfering voice playing environment, and thus, the other people in the current voice interaction environment are interfered by the voice, and cannot accurately hear the sensitive content, thereby reducing the leakage risk of the voice-inputted sensitive content.

The audio player can be integrated in the electronic equipment, such as a loudspeaker of the electronic equipment, and the processor of the electronic equipment timely controls the loudspeaker to play the obtained interference voice after determining that the sensitive input condition is met, so that the interference voice exists in the environment when the user of the electronic equipment speaks the sensitive content, and the voice interaction safety is improved.

Optionally, the audio player may instead be an independent audio device in the current voice interaction environment of the electronic device, such as a standalone speaker or another terminal with a loudspeaker. The independent audio device may be communicatively connected to the electronic device, so that after obtaining the interference voice, the electronic device can promptly transmit it to the independent audio device and control that device to play it while the user of the electronic device speaks the sensitive content.

In addition, the audio collector may be deployed in the same ways as the audio player: it may be integrated in the electronic device, such as a microphone, or it may be an independent audio collection device or another terminal with an audio collection function. In either case, once the electronic device enters the voice interaction mode, the audio collector can collect the voice output by the user of the electronic device in the current voice interaction environment and send the collected input voice to the processor of the electronic device for processing. The present application does not limit the voice collection and transmission method.

Therefore, in practical application, when the electronic device is in the voice interaction mode, the audio collector can enter an audio collection state and collect in real time the various audio signals (i.e., voices) present in the current voice interaction environment. According to the processing method described above, the user of the electronic device inputs the input voice containing sensitive content in a voice interaction environment in which the interference voice is playing, so the input voice and the interference voice are superposed in that environment, ensuring that other people in the environment cannot directly recognize the sensitive content in the input voice. At the same time, each frame of audio collected by the audio collector is mixed audio containing both the interference voice and the input voice output by the user.

Step S14, noise reduction processing is carried out on the voice to be processed according to the interference voice, and input voice for voice interaction processing is obtained.

After the electronic device obtains the voice to be processed collected while the interference voice was playing, it can filter the voice to be processed directly with the interference voice, because that interference voice is known to it. The interference voice contained in the voice to be processed is thereby filtered out quickly and accurately, yielding the input voice containing sensitive content that the user of the electronic device actually input.

Optionally, to improve voice recognition efficiency or voice communication quality, the electronic device may also apply other noise reduction methods to the voice from which the interference voice has been filtered, so as to accurately obtain the input voice spoken by the user of the electronic device and use it for subsequent voice interaction processing.

The input voice obtained by the noise reduction processing can then be processed according to the interaction requirements of the voice interaction scenario. For example, in a human-machine interaction scenario, the electronic device can perform voice recognition on the input voice and respond to the recognized content by executing the corresponding operation, such as responding to recognized sensitive content (an account number, a password, and the like) by entering a target interface of an application running on the electronic device and continuing the subsequent human-machine interaction. Where the user uses the voice assistant of the electronic device for voice communication with other users, the electronic device may send the obtained input voice to the second electronic device communicatively connected to it, and the second electronic device plays the voice input by the user of the electronic device; the implementation process is not described in detail in the present application. It should be noted that the subsequent voice interaction processing the electronic device performs on the obtained input voice includes, but is not limited to, the examples listed above.
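One simple way to filter a known interference voice out of the collected mixture is to estimate how late and how loud it arrives at the audio collector and then subtract it. The sketch below assumes a single delay and a single gain; the application does not name a filtering algorithm, the function names are hypothetical, and a practical system would more likely rely on adaptive echo cancellation to cope with room acoustics.

```python
from typing import List, Tuple


def estimate_delay_and_gain(mixed: List[float], reference: List[float],
                            max_delay: int) -> Tuple[int, float]:
    """Find the delay (in samples) and gain at which `reference` best explains `mixed`.

    Assumes `reference` is at least as long as `mixed`.
    """
    best_delay, best_corr = 0, float("-inf")
    for delay in range(max_delay + 1):
        # Cross-correlation between the mixture and the delayed reference.
        corr = sum(mixed[n] * reference[n - delay] for n in range(delay, len(mixed)))
        if corr > best_corr:
            best_delay, best_corr = delay, corr
    # Least-squares gain for the chosen delay.
    energy = sum(reference[n - best_delay] ** 2 for n in range(best_delay, len(mixed)))
    gain = best_corr / energy if energy > 0 else 0.0
    return best_delay, gain


def remove_interference(mixed: List[float], reference: List[float],
                        max_delay: int = 16) -> List[float]:
    """Subtract the scaled, delayed interference voice from the collected mixture."""
    delay, gain = estimate_delay_and_gain(mixed, reference, max_delay)
    return [
        mixed[n] - gain * reference[n - delay] if n >= delay else mixed[n]
        for n in range(len(mixed))
    ]
```

Because the reference is the exact signal that was played, the subtraction leaves mainly the user's input voice plus any residual caused by estimation error, which is where the optional additional noise reduction would apply.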

In summary, under the voice interaction mode of electronic equipment, the method and the device can detect whether the sensitive input condition is met in real time, so that under the condition that the sensitive input condition is met, interference voice is timely obtained and played, an electronic equipment user can output input voice containing sensitive content in the environment of the interference voice playing, the effect of superposition output of the input voice and the interference voice is presented in the current voice interaction environment, the fact that other people cannot directly identify the sensitive content contained in the input voice is guaranteed, and the risk of leakage of the sensitive content is greatly reduced.

Moreover, the electronic equipment knows the played interference voice, can rapidly and accurately filter the interference voice in the acquired voice to be processed, accurately obtain the input voice containing the sensitive content input by the user, and ensure the safety and reliability of voice interaction.

Referring to fig. 2, for a schematic flow chart of yet another alternative example of the voice interaction processing method proposed in the present application, this embodiment may describe an alternative refinement implementation of the voice interaction processing method proposed above, as shown in fig. 2, where the method may include:

step S21, obtaining the content to be interacted for the user of the electronic equipment when the electronic equipment is in a voice interaction mode;

Step S22, determining that sensitive content exists in the content to be interacted, and obtaining interference voice aiming at a user of the electronic equipment;

To predict whether the voice the user of the electronic device is about to input contains sensitive content, and so avoid the leakage caused by speaking it directly in the current voice interaction environment, this embodiment obtains in real time the content to be interacted for the user, that is, the voice content the user is expected or instructed to output. Whether the sensitive input condition is met is determined by detecting whether sensitive content exists in the content to be interacted, which in turn determines whether the interference voice needs to be played.

The sensitive content may be predefined content that must not be disclosed, such as personal private information (account passwords, identity card numbers, and the like) or important content defined by industry rules. It may take the form of predefined words and/or voices that must not be disclosed together with their related words, and may also include matching conditions defined for such content. In practical application, the sensitive content may be configured by a manager, maintainer, or manufacturer of the electronic device in the relevant industry; if the electronic device is a user's personal terminal, the user may also predefine it to meet the personalized confidentiality requirements of different users. The present application does not limit the sensitive content or the method for configuring it.

On this basis, after obtaining the content to be interacted for the user, the electronic device can match it against the preset sensitive content, for example by detecting whether it contains any predefined sensitive word or sentence or satisfies any predefined matching condition. If the content to be interacted successfully matches any sensitive word or sentence, any related word, or any matching condition, sensitive content exists in it, the sensitive input condition is met, and the interference voice needs to be obtained. Otherwise, the sensitive input condition is determined not to be met, and the interference voice need not be obtained.

Regarding the matching process of the content to be interacted with the sensitive content, a content comparison or similarity algorithm may be used to perform similarity comparison, if the similarity reaches a threshold value, the sensitive content may be considered to exist in the content to be interacted, and the implementation process of the similarity comparison will not be described in detail in the present application.
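A minimal sketch of such a threshold-based similarity comparison, using the character-level ratio from Python's standard `difflib` module as a stand-in for whatever similarity algorithm an implementation might choose. The phrase list and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical preset sensitive sentences and threshold.
SENSITIVE_PHRASES = ("enter your password", "say your id card number")
SIMILARITY_THRESHOLD = 0.8


def max_similarity(content: str, phrases: Iterable[str] = SENSITIVE_PHRASES) -> float:
    """Highest similarity between the content to be interacted and any preset phrase."""
    text = content.strip().lower()
    return max(SequenceMatcher(None, text, phrase).ratio() for phrase in phrases)


def contains_sensitive_content(content: str,
                               threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Treat the content as sensitive once the similarity reaches the threshold."""
    return max_similarity(content) >= threshold
```

Here "Please enter your password" is judged sensitive, while "What is the weather today" is not. A lower threshold catches more paraphrases at the cost of more false positives, so the value would need tuning against the configured sensitive content.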

In this embodiment, to interfere more reliably with the input voice containing sensitive content output by the user and further reduce the risk of leaking sensitive content through voice output, the electronic device obtains an interference voice corresponding to the user who actually provides the input voice. That is, when different users provide the input voice to the electronic device, the interference voice obtained may differ.

Optionally, the present application may obtain online the current voice features of the user of the electronic device, such as one or more of the pitch, speech speed, and volume of the voice input by the user, and construct an interference voice whose features are the same as or similar to those voice features, thereby improving the interference effect on the voice input by the user.
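As a rough illustration of deriving voice features and building an interference signal from them, the sketch below estimates volume as an RMS level and pitch from the autocorrelation peak, then produces a harmonic signal with the same pitch and volume. Everything here is an assumption made for illustration: the sample rate, the function names, and above all the tone-based generator, which is not speech synthesis and ignores speech speed.

```python
import math
import random
from typing import List

SAMPLE_RATE = 8000  # Hz, assumed for this sketch


def rms(samples: List[float]) -> float:
    """Root-mean-square level, used here as a stand-in for volume."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples: List[float], fmin: float = 80.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


def make_interference(pitch_hz: float, volume: float, n_samples: int,
                      seed: int = 0) -> List[float]:
    """Build a harmonic signal whose pitch and RMS level match the given features."""
    rng = random.Random(seed)
    # Three harmonics of the estimated pitch with random phases.
    partials = [(pitch_hz * k, rng.uniform(0, 2 * math.pi)) for k in (1, 2, 3)]
    raw = [
        sum(math.sin(2 * math.pi * freq * n / SAMPLE_RATE + phase) for freq, phase in partials)
        for n in range(n_samples)
    ]
    gain = volume / rms(raw)  # scale so the result has the requested volume
    return [gain * s for s in raw]
```

A realistic implementation would more plausibly generate speech-like audio in the user's voice; the point of the sketch is only that features measured from the user's voice parameterize the interference voice.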

Step S23, controlling an audio player to play the interference voice in the current voice interaction environment;

step S24, the voice to be processed collected by the audio collector in the voice interaction environment is obtained; the voice to be processed comprises input voice containing sensitive content, which is input by a user of the electronic equipment, in the process of playing the interference voice;

In practical application, the electronic device can transmit the interference voice to the audio player for playing as soon as it is obtained, which ensures reliable interference with the voice input by the user and avoids leakage of sensitive voice content caused by playing the interference voice too late. Optionally, the audio player may instead be controlled to play the obtained interference voice in the current voice interaction environment only when input voice from the user is detected while the sensitive input condition is met, that is, when any frame of audio input by the user is detected after the sensitive input condition has been determined to be met. In this way, the interference voice is not played while the user is not speaking, which avoids ineffective interference and adverse effects on the user and improves the user experience.
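The two playback options described above can be sketched as one decision function. The energy-based voice detection and its threshold are assumptions made for illustration; the application does not say how input voice is detected.

```python
import math
from typing import Sequence

ENERGY_THRESHOLD = 0.02  # hypothetical RMS level above which a frame counts as speech


def frame_has_voice(frame: Sequence[float], threshold: float = ENERGY_THRESHOLD) -> bool:
    """Very simple voice detection: compare the frame's RMS level with a threshold."""
    if not frame:
        return False
    level = math.sqrt(sum(s * s for s in frame) / len(frame))
    return level > threshold


def should_play_interference(condition_met: bool, frame: Sequence[float],
                             wait_for_voice: bool) -> bool:
    """Decide whether the interference voice should be playing for this frame."""
    if not condition_met:
        return False
    if not wait_for_voice:
        return True  # first option: play as soon as the interference voice is available
    return frame_has_voice(frame)  # second option: play only once the user is speaking
```

With `wait_for_voice=True` the interference voice stays silent through pauses, at the cost of needing detection fast enough that the first spoken frame is not left unmasked.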

Step S25, noise reduction processing is carried out on the voice to be processed according to the obtained interference voice, and input voice for voice interaction processing is obtained;

In combination with the description of the interference voice above, the voice features of the interference voice played in the voice interaction scenario of the electronic device are similar to those of the user's input voice. The user outputs the input voice containing the sensitive content while the interference voice is playing, and the two are superposed, so that other people in the current voice interaction environment are interfered with and cannot directly identify the sensitive content output by the user, which greatly reduces the risk of leaking it.

The mixed voice formed by the input voice and the interference voice superposed in the voice interaction environment is collected in real time by the audio collector and sent to the processor of the electronic device. The processor can then read the interference voice obtained in advance and use it to perform noise reduction on the voice to be processed collected in real time, accurately obtaining the input voice actually output by the user for subsequent voice interaction processing. The implementation process is not described in detail in the present application.

Step S26, determining that the sensitive input condition is no longer met, and controlling the audio player to stop playing the interference voice.

To reduce the resources consumed by voice processing on the electronic device, the interference voice need not be played when the user of the electronic device is not going to input sensitive content, and the collected voice to be processed then does not need interference voice filtering, which avoids the resource consumption of the filtering operation. Whether the sensitive input condition is met can therefore be detected in real time. If it is met, the interference voice is played while the user outputs the input voice containing the sensitive content, according to the method described above, reducing leakage of the sensitive content. If it is detected that the sensitive input condition is no longer met, the audio player can be controlled to stop playing the interference voice in time; the voice to be processed collected afterwards is the user's input voice, and it does not contain sensitive content.

That is, in a voice interaction environment that does not satisfy the sensitive input condition, the obtained input voice of the user of the electronic device does not include sensitive content, and the electronic device can directly use the voice to be processed collected in the environment for subsequent voice interaction processing.
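The start and stop behaviour of steps S23 and S26 can be sketched as a small controller driven by the real-time detection result. `RecordingPlayer` is a hypothetical stand-in for the audio player that only records the commands it receives, and the class and method names are illustrative.

```python
from typing import List


class RecordingPlayer:
    """Stand-in for an audio player; it only records the commands it receives."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def play(self) -> None:
        self.commands.append("play")

    def stop(self) -> None:
        self.commands.append("stop")


class InterferenceController:
    """Starts and stops the interference voice as the sensitive input condition changes."""

    def __init__(self, player: RecordingPlayer) -> None:
        self.player = player
        self.playing = False

    def update(self, condition_met: bool) -> bool:
        """Apply the latest detection result.

        Returns True while collected audio needs interference filtering, and
        False when it can be used directly for voice interaction processing.
        """
        if condition_met and not self.playing:
            self.player.play()
            self.playing = True
        elif not condition_met and self.playing:
            self.player.stop()
            self.playing = False
        return self.playing
```

The return value is what lets the processor skip the filtering operation whenever the interference voice is not playing, which is the resource saving the text describes.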

In combination with the related description of the sensitive input conditions, the content to be interacted for the user of the electronic equipment can be dynamically obtained, whether the content to be interacted exists or not is detected, if the newly obtained content to be interacted does not contain the sensitive content any more, namely the content to be interacted obtained last time contains the sensitive content, the content to be interacted obtained this time does not contain the sensitive content, and the playing of the interference voice can be stopped.

Optionally, the electronic device may also determine whether the content of the obtained input voice of the user is complete by performing integrity detection on the input voice of the user, and may directly control the audio player to stop playing the interference voice if it is determined that the input voice is complete. The processing can be continued according to the method described above, it is determined that sensitive content exists in the newly obtained content to be interacted with, and the obtained interference voice can be replayed.

Under the condition that the user of the electronic equipment is unchanged, for example, the login account is unchanged or the detected voice characteristics are unchanged, when sensitive content exists in the to-be-interacted content which is not obtained for the first time, the user can directly read the historical interference voice aiming at the user of the electronic equipment and play the voice without obtaining the interference voice again, so that the playing efficiency of the interference voice is improved, and the consumption of resources in the process of obtaining the interference voice is reduced. Of course, in order to ensure interference reliability, the interference voice can be obtained again according to the method described above and then played, and the method for obtaining the interference voice of different contents to be interacted with, which includes sensitive contents, is not limited in the application.
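Reusing a historical interference voice for an unchanged user amounts to a per-user cache with an optional refresh. In the sketch below, the user key (for example a login account), the `generate` callback, and the `force_refresh` flag are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

Voice = List[float]  # an interference voice represented as raw samples (assumed)


class InterferenceCache:
    """Keeps the last interference voice obtained for each user."""

    def __init__(self, generate: Callable[[str], Voice]) -> None:
        self._generate = generate            # obtains a new interference voice for a user
        self._by_user: Dict[str, Voice] = {}

    def get(self, user_id: str, force_refresh: bool = False) -> Voice:
        """Return the stored voice for this user, obtaining one only when needed.

        `force_refresh=True` models obtaining the interference voice again
        when interference reliability matters more than speed.
        """
        if force_refresh or user_id not in self._by_user:
            self._by_user[user_id] = self._generate(user_id)
        return self._by_user[user_id]
```

Repeated calls for the same user return the stored voice without invoking `generate` again, while a different user or a forced refresh triggers a new one.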

In summary, in the embodiment of the present application, in a voice interaction environment of an electronic device, whether sensitive content exists in the content to be interacted for the user of the electronic device is predicted, and an interference voice for the user is obtained before the user speaks an input voice containing the sensitive content. The user then speaks the input voice containing the sensitive content in a voice interaction environment in which the interference voice is being played, and the interference voice disturbs other people in that environment so that they cannot accurately identify the sensitive content spoken by the user, which reduces the risk that sensitive content output by voice is leaked.

After obtaining the voice to be processed that is collected by the audio collector in real time, the electronic device can use the locally stored copy of the played interference voice to perform interference filtering on the voice to be processed and obtain the input voice containing the sensitive content that the user actually spoke. During this process, if it is detected that the sensitive input condition is no longer met, the playing of the interference voice can be stopped in time, so that the audio collector directly collects the input voice of the user. That input voice needs no interference filtering and can be used directly for voice interaction processing, which reduces the voice processing workload and improves the efficiency and reliability of voice interaction.

In some embodiments provided in the present application, besides detecting whether sensitive content exists in the content to be interacted as described above, whether the sensitive input condition is met can also be detected by interface type. One or more preset sensitive interaction interfaces that require the user to input sensitive content, such as an account password input interface or an identity card number input interface, may be predefined. When the electronic device outputs a voice interaction interface and waits for the user to speak the corresponding content, it can determine whether that voice interaction interface belongs to the preset sensitive interaction interfaces. If the output voice interaction interface belongs to any preset sensitive interaction interface, it may be determined that the sensitive input condition is met; otherwise, it may be determined that the sensitive input condition is not met, or whether the sensitive input condition is met may be further determined in other ways (which may include, but are not limited to, the other detection modes described in the context).

In still other embodiments, in a human-computer interaction scenario in which, for example, the electronic device interacts with the user by outputting interaction prompt voice to instruct the user to perform a subsequent operation, the electronic device may analyze the interaction prompt sentence it outputs to determine whether that sentence belongs to the predefined preset sensitive sentences that call for sensitive content, such as "please input a password". If so, it may be determined that the sensitive input condition is met; if not, it may be determined that the sensitive input condition is not met, or whether the sensitive input condition is met may be further determined in other ways.

Therefore, according to the methods described in the above embodiments, the present application may directly obtain the content to be interacted for the user of the electronic device and determine whether sensitive content exists in it, or may use preset sentences or interfaces, such as the above-mentioned preset sensitive sentences and/or preset sensitive interaction interfaces, to determine whether the sensitive input condition is met and hence whether to trigger the interference voice playing function, that is, to obtain the interference voice according to the method described above and play it so as to interfere with the input voice containing sensitive content that the user speaks. It should be noted that the ways of determining whether the sensitive input condition is met include, but are not limited to, those described above; for example, the user may also actively trigger the playing of the interference voice through a key or the like, and the implementation process is not described in detail in this application.
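The alternative ways of deciding whether the sensitive input condition is met can be combined in a single check. The following Python sketch is a hedged illustration: the preset sets, the identifier strings and the function name `sensitive_input_condition_met` are all hypothetical, and a real device would configure its own presets and matching rules (exact string matching is used here only for brevity).

```python
from typing import Iterable, Optional

# Hypothetical presets, for illustration only.
PRESET_SENSITIVE_INTERFACES = {"account_password_input", "id_card_number_input"}
PRESET_SENSITIVE_SENTENCES = {"please input a password"}
SENSITIVE_CONTENT_TYPES = {"password", "id_card_number"}


def sensitive_input_condition_met(
    interface_id: Optional[str] = None,
    prompt_sentence: Optional[str] = None,
    content_to_interact: Iterable[str] = (),
    manual_trigger: bool = False,
) -> bool:
    """Return True if any of the described detection modes fires."""
    # The user actively triggered interference playback, e.g. through a key.
    if manual_trigger:
        return True
    # The output voice interaction interface is a preset sensitive interface.
    if interface_id in PRESET_SENSITIVE_INTERFACES:
        return True
    # The output interaction prompt sentence is a preset sensitive sentence.
    if prompt_sentence is not None:
        if prompt_sentence.strip().lower() in PRESET_SENSITIVE_SENTENCES:
            return True
    # Sensitive content exists in the predicted content to be interacted.
    return any(item in SENSITIVE_CONTENT_TYPES for item in content_to_interact)
```

Any one mode returning true is treated as meeting the condition, which mirrors the "or may be further determined in other ways" fallback in the text.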

Referring to fig. 3, which is a schematic flow chart of a further alternative example of the voice interaction processing method proposed in the present application, this embodiment describes a further alternative refinement of the voice interaction processing method proposed above. As shown in fig. 3, the method may include:

Step S31, obtaining the content to be interacted indicated by a voice interaction interface output by the electronic equipment when the electronic equipment is in a voice interaction mode;

In practical application, the content to be interacted for the user of the electronic device can be predicted from the display content of the voice interaction interface currently output by the electronic device, once that display content has been identified; this prediction is the content to be interacted indicated by the voice interaction interface. For example, if the voice interaction interface is an account password input interface, its display content includes an account input prompt, a password input prompt and the like, so it can be predicted that the user will input an account and a password by voice, and the content to be interacted includes the account and the password.

Optionally, if the electronic device prompts the user by voice to speak the content required by the service, instead of through the interface interaction described above, the electronic device may directly analyze the interaction prompt voice it outputs, for example through an artificial intelligence algorithm such as voice recognition or semantic recognition, to obtain the content of the interaction prompt voice and predict the content to be interacted for the user of the electronic device accordingly. The prediction process is similar to that of the voice interaction interface example described above and is not described in detail herein.

In still other embodiments, if the current voice interaction scenario of the electronic device is not a man-machine interaction scenario but a voice interaction scenario among multiple devices, the electronic device may be in communication connection with the other electronic devices participating in the scenario (denoted as second electronic devices). After receiving the interaction voice sent by a second electronic device, the electronic device may analyze that interaction voice and predict the content to be interacted for its user. The prediction is implemented in a manner similar to that described above and is not described in detail in the application.

Step S32, determining that sensitive content exists in the content to be interacted, and obtaining historical input voice of a user of the electronic equipment;

Step S33, analyzing the history input voice to obtain voice characteristics of a user of the electronic equipment;

Step S34, generating interference voice for the user of the electronic equipment according to the voice characteristics;

To enhance the interference with the input voice containing sensitive content, as in the above analysis, the voice characteristics of the user who is currently inputting voice may be obtained, including but not limited to the user's intonation, speech speed and/or other acoustic characteristics, so that the voice characteristics of the interference voice are determined from them and an interference voice having those voice characteristics is generated.

To obtain an interference voice for the user of the electronic device, the historical input voice of the user can be obtained and voice feature extraction performed on it to obtain the voice characteristics of the user. The interference voice is then generated from these voice characteristics, for example by processing the voice characteristics of the user with an acoustic model to obtain the interference voice for the user's input voice, although the application is not limited to this implementation.

The historical input voice can be input voice spoken by the user of the electronic device while the sensitive input condition is not met. For example, once the user starts voice interaction with the electronic device, the input voice spoken by the user is collected and recorded directly, to be used for constructing the interference voice for the user according to the method above.

Optionally, the present application may also obtain, after determining that the sensitive input condition is met, a historical input voice consisting of a preset number of continuous audio frames output by the user. The number of continuous audio frames is not limited in this application and may be determined in advance through experience or experiment, so as to ensure that the complete voice characteristics of the user can be accurately extracted from them. The historical input voice of the preset number of continuous audio frames contains no sensitive content or only incomplete sensitive content, so it does not cause leakage of the sensitive content, yet it still carries the voice characteristics of the user. Voice feature extraction is therefore performed on the preset number of continuous audio frames to accurately obtain the current voice characteristics of the user, after which an interference voice for the user can be generated according to, but not limited to, the method described above.
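The application leaves both the feature extraction and the acoustic model unspecified. Purely as an illustration of steps S33 and S34, the following Python sketch substitutes a much cruder stand-in: it estimates a loudness level (RMS) and a rough pitch (from the zero-crossing rate) of the historical input voice, then synthesizes a pitch-jittered tone with similar loudness and pitch. This is not an acoustic model, and the function names, the 10% jitter and the fixed seed are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def extract_voice_features(samples: Sequence[float], sample_rate: int) -> Tuple[float, float]:
    """Return (rms_level, rough_pitch_hz) of a mono signal (crude stand-in)."""
    if not samples:
        raise ValueError("historical input voice is empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # A periodic signal changes sign about twice per cycle.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    return rms, crossings / (2 * duration)


def generate_interference(rms: float, pitch_hz: float, sample_rate: int,
                          n_samples: int, seed: int = 0) -> List[float]:
    """Synthesize a pitch-jittered tone with similar loudness and pitch."""
    rng = random.Random(seed)
    peak = rms * math.sqrt(2)  # peak amplitude of a sine with this RMS level
    phase = 0.0
    out = []
    for _ in range(n_samples):
        # Jitter the instantaneous frequency by up to 10% (assumed value).
        freq = pitch_hz * (1 + rng.uniform(-0.1, 0.1))
        phase += 2 * math.pi * freq / sample_rate
        out.append(peak * math.sin(phase))
    return out
```

A practical system would use richer features (for example intonation contours and speech speed) and a speech synthesis model; the sketch only shows the shape of the "extract features, then generate" pipeline.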

Step S35, when input voice of the user of the electronic equipment is detected while the sensitive input condition is met, controlling the audio player to play the interference voice in the current voice interaction environment;

In practical application of the embodiment, the interference voice can be played while the user is speaking sensitive content, so that the interference voice and the input voice present in the current voice interaction environment overlap and other people in the environment cannot directly recognize the sensitive content spoken by the user. When the user is not speaking sensitive content, the interference voice need not be played.

In combination with the above description of the sensitive input conditions, after it is determined that the sensitive input condition is met, the user of the electronic device will normally go on to speak the sensitive content, so the obtained interference voice can be played directly. Optionally, to reduce the adverse effect of ineffective interference on the user, the audio player can be controlled to play the obtained interference voice only when the user is detected to be speaking, so that the interference voice is not played during pauses in the user's speech, which improves the reliability and effectiveness of the interference.
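The optional gating of playback on detected speech can be sketched with a simple energy-based speech detector. This is a minimal Python illustration under stated assumptions: the application does not say how speaking is detected, and the function names and the `energy_threshold` value of 0.01 are hypothetical.

```python
import math
from typing import List, Sequence


def frame_is_speech(frame: Sequence[float], energy_threshold: float = 0.01) -> bool:
    """Treat a frame as speech if its RMS level reaches the (assumed) threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms >= energy_threshold


def playback_schedule(frames: Sequence[Sequence[float]], condition_met: bool,
                      energy_threshold: float = 0.01) -> List[bool]:
    """For each frame, decide whether the interference voice should play."""
    # Play only while the sensitive input condition holds AND the user speaks,
    # so that nothing is played during pauses.
    return [condition_met and frame_is_speech(f, energy_threshold) for f in frames]
```

In a real device the collected frames also contain the interference voice being played, so speech detection would more plausibly run on the signal after interference filtering; the sketch ignores this for brevity.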

Step S36, the voice to be processed collected by the audio collector in the voice interaction environment is obtained; the voice to be processed comprises input voice containing sensitive content, which is input by a user of the electronic equipment, in the process of playing the interference voice;

Step S37, noise reduction processing is carried out on the voice to be processed according to the obtained interference voice, and input voice for voice interaction processing is obtained.

Regarding the implementation procedure of step S36 and step S37, reference may be made to the description of the corresponding parts of the above embodiments, which are not described in detail herein.
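Because the electronic device knows the interference voice it played, step S37 can in principle subtract that known signal from the collected mixture. The following Python sketch is a simplification under stated assumptions: the played interference is taken to be time-aligned with the collected voice and to reach the audio collector scaled by a single unknown gain, which is estimated by least squares. Real acoustic paths add delay and reverberation and typically call for adaptive filtering; the function name is hypothetical.

```python
from typing import List, Sequence


def remove_known_interference(mixed: Sequence[float],
                              reference: Sequence[float]) -> List[float]:
    """Subtract a gain-scaled copy of the known interference from the mixture."""
    if len(mixed) != len(reference):
        raise ValueError("mixed and reference must have the same length")
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        # Nothing was played, so the collected voice is already the input voice.
        return list(mixed)
    # Least-squares estimate of how strongly the reference appears in the mixture.
    gain = sum(m * r for m, r in zip(mixed, reference)) / energy
    return [m - gain * r for m, r in zip(mixed, reference)]
```

The gain estimate is exact only when the user's voice is uncorrelated with the interference; any correlation biases it, which is one reason practical systems use adaptive echo cancellation instead.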

In this embodiment of the present application, when the newly obtained content to be interacted contains no sensitive content, that is, the sensitive input condition is not satisfied, the audio player may be controlled to stop playing the interference voice in time, and the voice to be processed collected by the audio collector is determined as the input voice for voice interaction processing. How the input voice is then used in subsequent voice interaction processing may be determined according to the actual requirements of the voice interaction scene and is not described in detail herein.

Referring to fig. 4, which is a schematic structural diagram of an alternative example of the voice interaction processing apparatus proposed in the present application, the apparatus may include:

the interfering voice obtaining module 41 is configured to determine that a sensitive input condition is satisfied when the electronic device is in a voice interaction mode, and obtain interfering voice;

an interfering voice playing control module 42, configured to control the audio player to play the interfering voice in the current voice interaction environment;

A to-be-processed voice obtaining module 43, configured to obtain to-be-processed voice collected by the audio collector in the voice interaction environment; the voice to be processed comprises input voice which is input by the user of the electronic equipment and contains sensitive content in the process of playing the interference voice;

the noise reduction processing module 44 is configured to perform noise reduction processing on the to-be-processed voice according to the interfering voice, so as to obtain the input voice for voice interaction processing.
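The cooperation of modules 41 to 44 can be pictured as four pluggable callables wired in sequence. The Python sketch below is illustrative only: the class `VoiceInteractionProcessor`, its field names and the `process_once` method are hypothetical, and each callable stands in for whatever implementation a device actually provides.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]


@dataclass
class VoiceInteractionProcessor:
    condition_met: Callable[[], bool]          # condition check used by module 41
    obtain_interference: Callable[[], Audio]   # module 41: obtain interference voice
    play: Callable[[Audio], None]              # module 42: control the audio player
    capture: Callable[[], Audio]               # module 43: voice to be processed
    denoise: Callable[[Audio, Audio], Audio]   # module 44: noise reduction

    def process_once(self) -> Audio:
        """Return the input voice to hand to voice interaction processing."""
        if not self.condition_met():
            # No interference was played, so the collected voice is used as is.
            return self.capture()
        interference = self.obtain_interference()
        self.play(interference)
        mixed = self.capture()
        return self.denoise(mixed, interference)
```

For simplicity the sketch plays and captures sequentially; on a device, playback and collection overlap in time.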

Optionally, the interfering voice obtaining module 41 may include any one of the following determining units:

the to-be-interacted content obtaining unit is used for obtaining to-be-interacted content aiming at the electronic equipment user;

the first determining unit is used for determining that sensitive content exists in the content to be interacted;

the second determining unit is used for determining that the voice interaction interface output by the electronic equipment belongs to a preset sensitive interaction interface;

and the third determining unit is used for determining that the interactive prompt statement output by the electronic equipment belongs to a preset sensitive statement.

In one implementation manner, the to-be-interacted content obtaining unit may include any one of the following units:

the first obtaining unit is used for obtaining the content to be interacted indicated by the voice interaction interface output by the electronic equipment;

The first prediction unit is used for analyzing the interaction prompt voice output by the electronic equipment and predicting the content to be interacted for the user of the electronic equipment;

the second prediction unit is used for analyzing the interactive voice from the second electronic equipment and predicting the content to be interacted for the user of the electronic equipment; the second electronic device is communicatively connected with the electronic device.

In still other embodiments, the interfering voice obtaining module 41 may further include any one of the following units:

a second obtaining unit configured to obtain an interference voice for the user of the electronic device;

the reading unit is used for reading the interference voice prestored in the electronic equipment;

and the third obtaining unit is used for obtaining corresponding interference voice according to the content meeting the sensitive input condition.

Optionally, the second obtaining unit may include:

a history input voice obtaining unit, configured to obtain a history input voice of a user of the electronic device;

the voice characteristic obtaining unit is used for analyzing the historical input voice to obtain the voice characteristic of the user of the electronic equipment;

and the interference voice generating unit is used for generating interference voice for the user of the electronic equipment according to the voice characteristics.

In one implementation, the history input voice obtaining unit may include:

a fourth obtaining unit, configured to obtain a history input voice output by the electronic device user under the condition that the sensitive input condition is not satisfied;

in still another implementation manner, the history input voice obtaining unit may further include:

a fifth obtaining unit, configured to obtain a historical input voice of a preset number of continuous audio frames output by the electronic device user under the condition that the sensitive input condition is satisfied; the historical input voice of the preset number of continuous audio frames does not contain sensitive content or contains incomplete sensitive content.

The voice interaction processing apparatus described in connection with the above embodiment may further include:

the control module is used for determining that the sensitive input condition is not met and controlling the audio player to stop playing the interference voice;

Optionally, the interfering voice playing control module 42 may include:

the playing control unit is used for controlling the audio player to play the interference voice under the current voice interaction environment when the input voice of the electronic equipment user under the sensitive input condition is detected;

In the application, the audio player can be integrated in the electronic device, or can be an independent audio device that is located in the voice interaction environment and is in communication connection with the electronic device. The relation between the audio player and the electronic device is not limited and can be determined as the case may be.

It should be noted that the various modules, units, and the like in the foregoing apparatus embodiments may be stored as program modules in a memory of an electronic device, and the processor of the electronic device executes the program modules stored in the memory to implement the corresponding functions. For the functions implemented by each program module and combinations thereof, and the technical effects achieved, reference may be made to the descriptions of the corresponding parts of the foregoing method embodiments, which are not repeated herein.

The present application also provides a computer-readable storage medium on which a computer program can be stored, which can be called and loaded by a processor to implement the steps of the voice interaction processing method described in the above embodiments.

Referring to fig. 5, which is a schematic hardware structure diagram of an alternative example of an electronic device suitable for the voice interaction processing method proposed in the present application, the electronic device may include: an audio collector 51, an audio player 52, a communication interface 53, a memory 54, and a processor 55, wherein:

There may be at least one of each of the audio collector 51, the audio player 52, the communication interface 53, the memory 54 and the processor 55, and they may be connected to a communication bus of the electronic device so as to implement data interaction between different components; the connection manner is not limited in this application.

The audio collector 51 may include at least one microphone, and the present application does not limit the arrangement of a plurality of microphones in the electronic device, which may be determined as the case may be. When the electronic device enters a voice interaction mode, the voice present in the current voice interaction environment, typically the voice spoken by the user of the electronic device, may be captured by the audio collector 51 and transmitted to the processor 55 for subsequent processing.

The audio player 52 may include at least one speaker, and the deployment location of each audio player 52 in the electronic device may be determined according to the product structure of the electronic device, which is not limited in this application. In practical applications of the present application, the audio player 52 may be used to play interference voice that masks the sensitive content uttered by a user, so as to avoid leakage of the sensitive content caused by voice input.

In some embodiments, for example in a man-machine interaction scenario, the audio player 52 may also play the prompt voice that the electronic device outputs in response to the input voice of the user, such as a prompt voice for the next interaction of the current service and/or a prompt voice giving the response result for the input voice. The playing content of the audio player 52 and the manner of controlling it are not limited and may be determined according to the actual interaction requirement.

The communication interface 53 may include physical interfaces for data interaction between components inside the electronic device, such as a USB interface, a serial/parallel port, or an audio/video data transmission interface, which connect each of the audio collector 51 and the audio player 52 to the processor 55 and/or the memory 54, so that audio data (such as the input voice, the interference voice, and the voice to be processed) is transmitted through the communication channel between the corresponding communication interfaces 53. The implementation process is not described in detail.

To enable the electronic device to provide the business service, the communication interface 53 may further include an interface supporting a wireless and/or wired communication mode, such as the communication interface of a communication module, for example a WIFI module, a 5G/6G (fifth generation mobile communication network/sixth generation mobile communication network) module, or a GPRS module, so that the electronic device can access, through the communication interface, a business server supporting the corresponding business service and meet the user's operation requirements for the business.

Optionally, in a scenario that the electronic device needs to participate in voice interaction between multiple devices, such as an online conference, the electronic device may also be connected to other electronic devices directly or through a communication server through the above communication interface supporting a wireless/wired communication manner, so as to implement data interaction between multiple electronic devices, thereby meeting the voice interaction communication requirement between multiple devices.

The memory 54 may be used to store a program for implementing the voice interaction processing method described in the above-described method embodiments; the processor 55 may load and execute the program stored in the memory to implement the steps of the voice interaction processing method described in the above-described corresponding method embodiment, and the specific implementation process may refer to the description of the corresponding portion of the above-described embodiment.

In embodiments of the present application, the memory 54 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device. The processor 55 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device. The structures and models of the memory 54 and the processor 55 are not limited and can be flexibly adjusted according to actual requirements.

It should be understood that the structure of the electronic device shown in fig. 5 does not limit the electronic device in the embodiments of the present application. In practical application, the electronic device may include more components than those shown in fig. 5, such as a display screen, various sensors, a power module, a camera, and the like, or some components may be combined; these are not listed one by one herein.

In combination with the voice interaction processing method described in the foregoing embodiments and the related description of the electronic device suitable for the method, an optional implementation process of the method is described below by taking AR glasses as an example of the electronic device. As shown in fig. 6, when a user wearing the AR glasses performs voice interaction and it is determined, according to the method described above, that a sensitive input condition is satisfied, the interference voice playing function of the AR glasses may be triggered in order to avoid leakage of the sensitive content the user is about to utter, and the historical input voice of the user is analyzed to generate an interference voice whose voice characteristics are similar to those of the user, such as tone, intonation and/or speech speed. After that, when the user speaks the input voice containing the sensitive content, the AR glasses synchronously play the interference voice, so that outsiders cannot make out the voice content spoken by the user, and the leakage risk of the sensitive content is reduced.

In addition, the processor of the AR glasses knows the interference voice that is played. For the collected mixed voice formed by the interference voice overlapping the input voice of the user, the processor can therefore directly filter out the interference voice, accurately obtain the input voice of the user, and meet the voice interaction processing requirement. It should be noted that the voice interaction processing method performed by other types of electronic devices is similar and is not described in detail herein.

It should be understood that, when the interference voice playing function of the AR glasses is not triggered, that is, when the sensitive input condition is not satisfied, the processor neither generates interference voice nor controls the audio player to play it. The processor can directly obtain the user input voice collected by the audio collector and output it without filtering out any interference voice.

In still other embodiments of the present application, the audio collector used for voice collection and/or the audio player used for playing the interference voice may be integrated into the electronic device as described above, or may be independent audio devices, denoted respectively as an audio collection device and an audio playback device. The audio collection device and the audio playback device may also be integrated into the same other device, such as an intelligent speaker, as determined by the scene requirements, which is not limited in this application. The processes by which these devices implement the voice interaction processing method are similar and are not described one by one.

Finally, it should be noted that, in the embodiments described above, unless the context clearly indicates otherwise, the words "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements. An element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, in the description of the embodiments of the present application, "plurality" means two or more.

The terms "first," "second," and the like herein are used for descriptive purposes only, to distinguish one operation, unit or module from another, and do not necessarily require or imply any actual relationship or order between these units, operations or modules. Nor are they to be taken as indicating or implying relative importance or the number of technical features indicated; a feature defined by "first" or "second" may thus explicitly or implicitly include one or more such features.

In addition, the embodiments in this specification are described in a progressive or parallel manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by reference to one another. Because the apparatus and the electronic device disclosed in the embodiments correspond to the method disclosed in the embodiments, their description is relatively brief, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A voice interaction processing method, the method comprising:

2. The method of claim 1, the determining that the sensitive input condition is satisfied comprising any one of the following implementations:

3. The method of claim 2, the obtaining content to be interacted with for the electronic device user, comprising any implementation of:

4. A method according to any of claims 1-3, said obtaining interfering speech comprising any of the following implementations:

obtaining interference voice for the user of the electronic equipment;

Reading interference voice prestored in the electronic equipment;

5. The method of claim 4, the obtaining interfering speech for the electronic device user, comprising:

obtaining historical input voice of a user of the electronic equipment;

6. The method of claim 5, the obtaining historical input speech of the electronic device user, comprising:

7. A method according to any one of claims 1-3, the method further comprising:

8. A method according to any of claims 1-3, the controlling the audio player to play the interfering speech in the current speech interaction environment comprising:

9. A voice interaction processing apparatus, the apparatus comprising:

10. A computer-readable storage medium having stored thereon a computer program for execution by a processor, the computer program implementing the voice interaction processing method according to any of claims 1-8.