CN117877468A - Multi-modal voice rejection method and system for electric power human-machine interaction scenarios - Google Patents

Multi-modal voice rejection method and system for electric power human-machine interaction scenarios

Publication number: CN117877468A
Authority: CN (China)
Prior art keywords: voice, human, acquired, text, electric power
Legal status: Pending (assumed by Google; not a legal conclusion)
Application number: CN202311851194.1A
Other languages: Chinese (zh)
Inventors: 周逸聪 (Zhou Yicong), 龚梁 (Gong Liang), 钟刚 (Zhong Gang), 郭鹏程 (Guo Pengcheng), 胡华 (Hu Hua)
Current/Original Assignee: Wuhan Firehome Putian Information Technology Co., Ltd. (assignee listing assumed by Google; not verified)
Application filed by Wuhan Firehome Putian Information Technology Co., Ltd.
Priority: CN202311851194.1A
Publication: CN117877468A (legal status: Pending)

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The application provides a multi-modal voice rejection method and system for electric power human-machine interaction scenarios. The method comprises the following steps: collecting voice signals from the electric power human-machine interaction scenario; performing human-voice discrimination on the acquired voice signal; when the acquired voice signal is human voice, converting it into text in real time; and performing multi-modal fusion processing on the acquired text and the original voice to obtain multi-modal fusion features, according to which different multi-modal voice rejection strategies are executed. The method achieves higher rejection accuracy and stronger practicability; it effectively enhances the speech recognition performance of the voice assistant, improves voice interaction efficiency, facilitates continuous dialogue, and improves the user experience.

Description

Multi-modal voice rejection method and system for electric power human-machine interaction scenarios
Technical Field
The application relates to the field of information technology, and in particular to a multi-modal voice rejection method and system for electric power human-machine interaction scenarios.
Background
With the development of core human-machine interaction technologies such as speech recognition and natural language processing, and the continuous expansion of human-machine interaction application scenarios, rejection of irrelevant voice has become a key component of continuous dialogue in voice interaction. It distinguishes whether a voice signal or voice instruction is directed at the voice assistant and, by filtering out environmental noise, interfering audio, and any other human voice outside the designated instruction set, lets the user issue consecutive instructions without repeating the wake word, thereby improving the continuous-dialogue experience between the user and the voice assistant.
Accurately identifying valid voice signals and voice commands from the designated user instruction set is essential during intelligent interaction. In the intelligent human-machine interaction process, besides the user's directed voice instruction set, there are a large number of invalid human-voice signals, environmental noise, interfering audio, and other undirected voice instructions. If these irrelevant sounds cannot be identified and filtered effectively, the voice assistant will misrecognize and misoperate during interaction, harming the user's voice interaction experience. In short, effective rejection of invalid voice is essential for improving voice interaction efficiency and achieving the goal of continuous dialogue. By recognizing and filtering out background noise and other non-directed sounds, the speech recognition performance of the voice assistant can be enhanced and the user experience improved.
The prior art lacks an effective method for rejecting invalid voice in electric power human-machine interaction scenarios.
Disclosure of Invention
The application provides a multi-modal voice rejection method and system for electric power human-machine interaction scenarios, which solve the technical problem in the prior art of how to effectively reject invalid voice during voice interaction in such scenarios.
In a first aspect, the present application provides a multi-modal voice rejection method for an electric power human-machine interaction scenario, comprising the following steps:
collecting voice signals from the electric power human-machine interaction scenario;
performing human-voice discrimination on the acquired voice signal;
when the acquired voice signal is human voice, converting it into text in real time and acquiring the text;
and performing multi-modal fusion processing on the acquired text and the original voice to obtain multi-modal fusion features, according to which different multi-modal voice rejection strategies are executed.
With reference to the first aspect, in one implementation, the step of performing human-voice discrimination on the acquired voice signal specifically comprises:
constructing a non-human-voice classification model for the power scenario, comprising an energy-band-based non-human-voice discrimination model and a deep-learning-based non-human-voice discrimination model;
inputting the voice signal into the energy-band-based model to obtain a first non-human-voice probability;
inputting the voice signal into the deep-learning-based model to obtain a second non-human-voice probability;
calculating the non-human-voice probability of the acquired voice signal from the first and second non-human-voice probabilities;
comparing the non-human-voice probability with a non-human-voice probability threshold to obtain a comparison result;
and obtaining the human-voice discrimination result of the voice signal from the comparison result.
With reference to the first aspect, in one implementation, when the acquired voice signal is human voice, the step of converting it into text in real time and acquiring the text specifically comprises:
constructing a speech recognition model for converting power-domain speech into text;
when the acquired voice signal is human voice, inputting it into the constructed speech recognition model for real-time text conversion, and acquiring the text.
With reference to the first aspect, in one implementation, the step of constructing a speech recognition model for converting power-domain speech into text specifically comprises:
constructing a power voice instruction set;
and constructing the speech recognition model based on a power-domain speech corpus and the constructed power voice instruction set.
With reference to the first aspect, in one implementation, the power voice instruction set comprises instruction-set texts and the corresponding speech corpus.
With reference to the first aspect, in one implementation, the step of constructing the speech recognition model based on the power-domain speech corpus and the constructed power voice instruction set specifically comprises:
obtaining the recognition accuracy of each voice instruction in the power voice instruction set;
when the recognition accuracy of every voice instruction in the set is greater than an accuracy threshold, constructing the speech recognition model for converting power-domain speech into text based on the power-domain speech corpus and the constructed power voice instruction set.
With reference to the first aspect, in one implementation, the step of performing multi-modal fusion processing on the acquired text and the original voice and executing different rejection strategies specifically comprises:
constructing a multi-modal rejection model over the human-machine interaction text and voice of the power scenario;
inputting the text of the human voice and the original voice into the multi-modal rejection model, and performing real-time semantic extraction through a text encoder and a speech encoder to obtain text features and speech features;
inputting the text features and speech features into a multi-modal fusion module to obtain multi-modal fusion features;
and inputting the multi-modal fusion features into the classifier, which controls the execution of different voice rejection strategies according to the designated power voice instruction set.
In a second aspect, the present application provides a multi-modal voice rejection system for an electric power human-machine interaction scenario, comprising:
a signal acquisition module for acquiring voice signals of the electric power human-machine interaction scenario;
a human-voice discrimination module, communicatively connected with the signal acquisition module, for performing human-voice discrimination on the acquired voice signal;
a voice conversion module, communicatively connected with the human-voice discrimination module, for converting human voice into text in real time and acquiring the text when the acquired voice signal is human voice;
and a voice rejection module, communicatively connected with the signal acquisition module and the voice conversion module, for performing multi-modal fusion processing on the acquired text and the original voice, obtaining multi-modal fusion features, and executing different multi-modal voice rejection strategies accordingly.
With reference to the second aspect, in one implementation, the human-voice discrimination module comprises:
a model construction unit for constructing the power-scenario non-human-voice classification model, comprising an energy-band-based non-human-voice discrimination model and a deep-learning-based non-human-voice discrimination model;
a first probability acquisition unit, communicatively connected with the model construction unit and the signal acquisition module, for inputting the voice signal into the energy-band-based model to obtain a first non-human-voice probability;
a second probability acquisition unit, communicatively connected with the model construction unit and the signal acquisition module, for inputting the voice signal into the deep-learning-based model to obtain a second non-human-voice probability;
a non-human-voice probability acquisition unit, communicatively connected with the first and second probability acquisition units, for calculating the non-human-voice probability of the acquired voice signal from the first and second probabilities;
a comparison unit, communicatively connected with the non-human-voice probability acquisition unit, for comparing the non-human-voice probability with the non-human-voice probability threshold to obtain a comparison result;
and a discrimination result acquisition unit, communicatively connected with the comparison unit, for obtaining the human-voice discrimination result of the voice signal from the comparison result.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a multi-modal voice rejection program for an electric power human-machine interaction scenario; when the program is executed by a processor, it implements the steps of the multi-modal voice rejection method for the electric power human-machine interaction scenario described above.
The beneficial effects of the technical solution provided by the embodiments of the application include at least the following:
according to the multi-mode voice rejection method for the electric power man-machine interaction scene, the voice signals are acquired through voice discrimination, then the voice signals are converted into texts, the texts and original voices are subjected to multi-mode fusion processing, multi-mode fusion characteristics are acquired, accordingly, rejection of invalid voices is executed, compared with irrelevant voice rejection of single modes such as traditional voices or texts, the multi-mode irrelevant voice rejection technology does not need complex characteristic engineering, learning can be directly conducted from original voices and text modes, and due to the fact that multi-mode information is fused, higher rejection accuracy can be provided by multi-mode rejection models through complementation among the modes, and the multi-mode voice rejection method has higher practicability;
the voice recognition performance of the voice assistant is enhanced, the voice interaction efficiency is effectively improved, continuous dialogue is facilitated, and the user experience is improved.
Drawings
Fig. 1 is a schematic flow chart of the multi-modal voice rejection method for an electric power human-machine interaction scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of the implementation of the energy-band-based non-human-voice discrimination model provided in an embodiment of the present application;
Fig. 3 is a schematic flow chart of the implementation of the deep-learning-based non-human-voice discrimination model provided in an embodiment of the present application;
Fig. 4 is a schematic flow chart of the implementation of the power-scenario human-machine interaction multi-modal rejection model provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of the multi-modal voice rejection procedure for an electric power human-machine interaction scenario provided in an embodiment of the present application;
Fig. 6 is a functional block diagram of the multi-modal voice rejection system for an electric power human-machine interaction scenario provided in an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solution in the embodiments is described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The terms "comprising" and "having," and any variations thereof, in the description and claims of the present application and in the foregoing drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those listed steps or elements, but may include other steps or elements not listed or inherent to it. The terms "first," "second," "third," etc. are used to distinguish different objects rather than to describe a sequential or chronological order, and do not limit the objects so labeled to being different.
In the description of embodiments of the present application, "exemplary," "such as," or "for example," etc., are used to indicate an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. In addition, "plural" means two or more.
In some of the processes described in the embodiments of the present application, a plurality of operations or steps occurring in a particular order are included, but it should be understood that these operations or steps may be performed out of the order in which they occur in the embodiments of the present application or in parallel, the sequence numbers of the operations merely serve to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the processes may include more or fewer operations, and the operations or steps may be performed in sequence or in parallel, and the operations or steps may be combined.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In a first aspect, referring to fig. 1, an embodiment of the present application provides a multi-modal voice rejection method for an electric power human-machine interaction scenario, comprising the following steps:
Step S1, acquiring a voice signal of the electric power human-machine interaction scenario;
Step S2, performing human-voice discrimination on the acquired voice signal;
Step S3, when the acquired voice signal is human voice, converting it into text in real time and acquiring the text;
Step S4, performing multi-modal fusion processing on the acquired text and the original voice to obtain multi-modal fusion features, according to which different multi-modal voice rejection strategies are executed.
According to this method, human-voice signals are obtained through human-voice discrimination and converted into text; the text and the original voice undergo multi-modal fusion processing to obtain multi-modal fusion features, on the basis of which invalid voice is rejected. This enhances the speech recognition performance of the voice assistant, effectively improves voice interaction efficiency, facilitates continuous dialogue, and improves the user experience.
In an embodiment, given that human voice in power human-machine interaction is rich in regional accents and the types of non-human-voice signals in the interaction environment are relatively concentrated, the acquired voice signal must undergo human-voice discrimination. Step S2, performing human-voice discrimination on the acquired voice signal, specifically comprises the following steps:
s21, constructing an electric power scene non-human voice classification model, wherein the electric power scene non-human voice classification model comprises an energy band-based non-human voice discrimination model A and a deep learning-based non-human voice discrimination model B;
s22, inputting a voice signal to a non-human voice discrimination model based on an energy band, and acquiring a first non-human voice probability alpha;
s23, inputting a voice signal to a non-human voice discrimination model based on deep learning, and acquiring a second non-human voice probability beta;
step S24, calculating the non-human voice probability lambda of the acquired voice signal according to the acquired first non-human voice probability alpha and second non-human voice probability beta;
s25, comparing the non-human voice probability with a non-human voice probability threshold value to obtain a comparison result;
and S26, acquiring a voice judgment result of the voice signal according to the acquired comparison result.
The power-scenario non-human-voice classification model is mainly used to eliminate non-human-voice signals during power-scenario human-machine interaction. When a voice signal is judged to be non-human voice, the interaction system rejects it; when it is determined to be human voice, the subsequent steps S3 and S4 are performed.
In an embodiment, as shown in fig. 2, step S22, inputting the voice signal into the energy-band-based non-human-voice discrimination model to obtain the first non-human-voice probability α, specifically comprises the following steps:
resampling the voice signal acquired by the microphone of the power-scenario human-machine interaction system to 8000 Hz before it enters the non-human-voice classification model; designing a group of six band-pass filters with pass bands of 80-250 Hz, 250-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz; dividing the voice signal by frequency, computing the energy of each sub-band signal, and normalizing to obtain ε_i with 0 ≤ ε_i ≤ 1, where i ranges over [1, 6].
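The sub-band energy computation above can be sketched as follows. This is a minimal pure-Python illustration assuming the six pass bands and the 8000 Hz sample rate stated in the text; the naive DFT and the normalization by total band energy are illustrative choices, not the patent's implementation.

```python
# Hedged sketch: normalized sub-band energies epsilon_i for the six bands
# named in the text, computed from a power spectrum of an 8 kHz signal.
import cmath

BANDS_HZ = [(80, 250), (250, 500), (500, 1000),
            (1000, 2000), (2000, 3000), (3000, 4000)]

def power_spectrum(samples):
    # Naive O(N^2) DFT, for illustration only (an FFT would be used in practice).
    n = len(samples)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec

def subband_energies(samples, sample_rate=8000):
    spec = power_spectrum(samples)
    n = len(samples)
    energies = []
    for lo, hi in BANDS_HZ:
        k_lo = int(lo * n / sample_rate)   # map band edges to DFT bins
        k_hi = int(hi * n / sample_rate)
        energies.append(sum(spec[k_lo:k_hi]))
    total = sum(energies) or 1.0
    return [e / total for e in energies]   # each epsilon_i lies in [0, 1]
```

For a pure 312.5 Hz tone the second band (250-500 Hz) dominates, which matches the intent of per-band energy analysis.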
In an embodiment, as shown in fig. 3, the deep-learning-based non-human-voice discrimination model B mainly comprises a SpecAugment layer, a Convolution layer, a Linear layer, a Dropout layer, and a Softmax layer. Step S23, inputting the voice signal into the deep-learning-based model to obtain the second non-human-voice probability β, specifically comprises the following steps:
resampling the voice signal acquired by the microphone of the power-scenario human-machine interaction system to 16000 Hz before it enters the model; enhancing the signal through the SpecAugment layer; passing the enhanced signal through the Convolution, Linear, and Dropout layers in turn; and finally producing a two-class output through the Softmax layer, whose non-human-voice probability is β.
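The layer sequence above can be sketched at inference time as a minimal pure-Python forward pass. All weights, shapes, and the class ordering are hypothetical; SpecAugment is a training-time augmentation and Dropout is inactive at inference, so both are omitted here.

```python
import math

def conv1d(frames, kernel):
    # frames: a 1-D feature sequence (list of floats); valid convolution
    k = len(kernel)
    return [sum(frames[i + j] * kernel[j] for j in range(k))
            for i in range(len(frames) - k + 1)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def non_human_probability(frames, kernel, w, b):
    # Convolution -> ReLU -> mean pooling -> Linear -> Softmax.
    conv = [max(0.0, x) for x in conv1d(frames, kernel)]
    pooled = sum(conv) / len(conv)
    logits = [w[0] * pooled + b[0], w[1] * pooled + b[1]]
    probs = softmax(logits)
    return probs[1]  # index 1 = "non-human voice" class (assumed ordering)
```

A production model would use a deep-learning framework with trained parameters; the point here is only the data flow from features to the two-class softmax output β.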
In an embodiment, step S24, calculating the non-human-voice probability λ of the acquired voice signal from the first non-human-voice probability α and the second non-human-voice probability β, specifically comprises the following steps:
transforming the acquired probabilities α and β according to the following formula to obtain λ:
λ = μ·α + (1 − μ)·β
where μ and (1 − μ) are the weights of the non-human-voice discrimination probabilities given by model A and model B, respectively, in the power-scenario non-human-voice classification model designed in this application.
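The weighted fusion and the subsequent threshold test (steps S25/S26) can be written directly from the formula. The default values of μ and τ below are placeholders; as the text notes, both are tuned against the field operating environment.

```python
def fuse_non_human_probability(alpha, beta, mu=0.6):
    """Weighted fusion: lambda = mu*alpha + (1 - mu)*beta.
    mu=0.6 is a placeholder weight, not a value from the patent."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * alpha + (1.0 - mu) * beta

def is_non_human(alpha, beta, mu=0.6, tau=0.5):
    # lambda > tau -> non-human voice (reject); otherwise treat as human voice
    return fuse_non_human_probability(alpha, beta, mu) > tau
```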
In an embodiment, step S26, obtaining the human-voice discrimination result of the voice signal from the comparison result, specifically comprises the following steps:
setting the non-human-voice discrimination threshold to τ;
Step S26A, when λ > τ, judging the acquired voice signal of the electric power human-machine interaction scenario to be non-human voice;
Step S26B, when λ ≤ τ, judging the acquired voice signal of the electric power human-machine interaction scenario to be human voice.
In practice, the values of μ and τ are tested and adjusted according to the field operating environment of the electric power human-machine interaction application system.
In an embodiment, step S3, converting human voice into text in real time when the acquired voice signal is human voice, specifically comprises the following steps:
Step S31, constructing a speech recognition model for converting power-domain speech into text;
Step S32, when the acquired voice signal is human voice, inputting it into the constructed speech recognition model for real-time text conversion and acquiring the text.
In an embodiment, step S31, constructing a speech recognition model for converting power-domain speech into text, specifically comprises the following steps:
Step S311, constructing a power voice instruction set;
Step S312, constructing the speech recognition model based on a power-domain speech corpus and the constructed power voice instruction set.
In an embodiment, the power voice instruction set comprises instruction-set texts and the corresponding speech corpus. The instruction texts must be clear and unambiguous, concise and free of redundancy, and must involve power-domain terminology; the corresponding speech corpus must be read clearly and naturally.
In an embodiment, step S312, constructing the speech recognition model based on the power-domain speech corpus and the constructed power voice instruction set, specifically comprises the following steps:
Step S3121, obtaining the recognition accuracy acc of each voice instruction in the power voice instruction set;
Step S3122A, when the recognition accuracy acc of every voice instruction in the set is greater than the accuracy threshold θ, constructing a speech recognition model (DL-ASR) for converting power-domain speech into text based on the power-domain speech corpus and the constructed power voice instruction set;
Step S3122B, when the recognition accuracy acc of any voice instruction in the set is not greater than θ, returning to step S311, redesigning the instruction, adding it to the designated power voice instruction set once the design is complete, and updating the set.
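The accuracy gate of steps S3122A/S3122B amounts to a simple filter over per-instruction accuracies. The threshold value and the instruction names below are hypothetical; the patent does not give a value for θ.

```python
ACC_THRESHOLD = 0.95  # placeholder for theta; not specified in the patent

def validate_instruction_set(accuracies):
    """Return the instructions whose recognition accuracy fails acc > theta,
    i.e. the ones step S3122B sends back for redesign."""
    return [cmd for cmd, acc in accuracies.items() if acc <= ACC_THRESHOLD]

def build_model_or_redesign(accuracies):
    failed = validate_instruction_set(accuracies)
    if failed:
        return ("redesign", failed)   # S3122B: redesign, update set, retry
    return ("build_dl_asr", [])       # S3122A: build the DL-ASR model
```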
In an embodiment, as shown in fig. 4, step S4, performing multi-modal fusion processing on the acquired text and the original voice to obtain multi-modal fusion features and executing different multi-modal voice rejection strategies accordingly, specifically comprises the following steps:
Step S41, constructing a multi-modal rejection model over the text and voice of power-scenario human-machine interaction;
Step S42, inputting the text of the human voice and the original voice into the multi-modal rejection model, and performing real-time semantic extraction through a text encoder and a speech encoder to obtain text features and speech features; specifically, extracting high-order acoustic features of dimension (N, X) through an acoustic high-order feature extractor and high-order text features of dimension (N, Y) through a text high-order feature extractor;
Step S43, inputting the text features and speech features into the multi-modal fusion module to obtain multi-modal fusion features;
Step S44, inputting the multi-modal fusion features into the classifier and executing different voice rejection strategies according to the designated power voice instruction set, so as to reject non-power voice interaction instructions.
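A minimal sketch of steps S42-S44, assuming feature concatenation as the fusion strategy (the patent only names a "multi-modal fusion module") and a single linear-plus-softmax classifier with placeholder weights.

```python
import math

def fuse_features(acoustic, textual):
    """Concatenate an utterance's acoustic feature vector (length X) with its
    text feature vector (length Y) into one fused vector of length X + Y.
    Concatenation is an assumed fusion strategy, not the patent's design."""
    return list(acoustic) + list(textual)

def classify(fused, weights, biases):
    # One linear layer followed by softmax over {accept, reject};
    # the weights are illustrative placeholders, not trained parameters.
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```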
Through the construction of four core components, namely the power-scenario non-human-voice classification model (step S21), the power voice instruction set (step S311), the speech recognition model for converting power-domain speech into text (step S312), and the power-scenario human-machine interaction text-and-voice multi-modal rejection model (step S41), the method achieves continuous dialogue over the designated power voice instructions in the power scenario. It learns directly from the raw speech and text modalities and fuses multi-modal information; through the complementarity of the modalities, the multi-modal rejection model provides higher precision and realizes efficient, accurate rejection of invalid voice.
In a more specific embodiment, as shown in fig. 5, the system determines whether the acquired human-voice signal is a voice instruction from the designated instruction set. If so, the voice signal is not rejected; if not, it is rejected. After the invalid-voice rejection decision for one round of dialogue ends, the decision is applied to the voice signal of the next round: the non-human-voice discrimination model and the multi-modal voice rejection model are executed cyclically to determine whether each new round's voice signal is human voice and a designated voice instruction.
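The per-round decision loop of fig. 5 can be sketched as follows; the three callbacks are hypothetical stand-ins for the non-human-voice classifier, the DL-ASR model, and the multi-modal rejection model.

```python
def rejection_pipeline(utterance, is_non_human, transcribe, is_directed):
    """One round of the per-utterance decision flow:
    non-human-voice check -> speech-to-text -> instruction-set check."""
    if is_non_human(utterance):
        return "reject: non-human voice"
    text = transcribe(utterance)
    if not is_directed(text, utterance):
        return "reject: not a designated power voice instruction"
    return "accept: " + text
```

Each new round of dialogue simply calls the pipeline again, which mirrors the cyclic execution described above.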
In a second aspect, please refer to fig. 6. The present application provides a multi-modal voice rejection system for an electric power human-machine interaction scenario, comprising a signal acquisition module 100, a human-voice discrimination module 200, a voice conversion module 300, and a voice rejection module 400. The signal acquisition module 100 is configured to acquire voice signals of the electric power human-machine interaction scenario; the human-voice discrimination module 200 is communicatively connected with the signal acquisition module 100 and performs human-voice discrimination on the acquired voice signal; the voice conversion module 300 is communicatively connected with the human-voice discrimination module 200 and converts human voice into text in real time, acquiring the text, when the acquired voice signal is human voice; the voice rejection module 400 is communicatively connected with the signal acquisition module 100 and the voice conversion module 300, performs multi-modal fusion processing on the acquired text and the original voice, obtains multi-modal fusion features, and executes different multi-modal voice rejection strategies accordingly.
In an embodiment, the voice discriminating module includes:
the model construction unit is used for constructing a power scene non-human voice classification model, which comprises an energy band-based non-human voice discrimination model and a deep learning-based non-human voice discrimination model;
the first probability acquisition unit is in communication connection with the model construction unit and the signal acquisition module and is used for inputting a voice signal to the non-human voice discrimination model based on the energy band to acquire a first non-human voice probability;
the second probability acquisition unit is in communication connection with the model construction unit and the signal acquisition module and is used for inputting a voice signal to a non-human voice discrimination model based on deep learning to acquire a second non-human voice probability;
the non-human voice probability acquisition unit is in communication connection with the first probability acquisition unit and the second probability acquisition unit and is used for calculating the non-human voice probability of acquiring the voice signal according to the acquired first non-human voice probability and second non-human voice probability;
the comparison unit is in communication connection with the non-human voice probability acquisition unit and is used for comparing the non-human voice probability with a non-human voice probability threshold value to acquire a comparison result;
and the judging result acquisition unit is in communication connection with the comparison unit and is used for acquiring the voice judging result of the voice signal according to the acquired comparison result.
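The description does not pin the fusion of the two non-human voice probabilities to a specific formula. One plausible reading is a weighted average followed by a threshold test, as in this hypothetical sketch, where the weights and the 0.5 threshold are assumptions, not values from the patent.

```python
def fuse_non_human_probability(p_energy: float, p_deep: float,
                               w_energy: float = 0.4,
                               w_deep: float = 0.6) -> float:
    """Combine the energy-band and deep-learning non-human voice
    probabilities; a weighted average is one plausible choice, with
    the weights treated as tunable assumptions."""
    return w_energy * p_energy + w_deep * p_deep

def is_non_human(p_energy: float, p_deep: float,
                 threshold: float = 0.5) -> bool:
    # The comparison unit's role: fused probability vs. threshold.
    return fuse_non_human_probability(p_energy, p_deep) > threshold
```

The discrimination result then follows directly from the comparison: a signal whose fused probability exceeds the threshold is treated as non-human voice and filtered out before speech recognition.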
In a third aspect, embodiments of the present application also provide a readable storage medium.
The readable storage medium of the present application stores a multi-modal voice rejection program for an electric power man-machine interaction scene; when the program is executed by a processor, it implements the steps of the multi-modal voice rejection method for the electric power man-machine interaction scene described above.
For the method implemented when the multi-modal voice rejection program for the electric power man-machine interaction scene is executed, reference may be made to the various embodiments of the multi-modal voice rejection method of the present application, which are not repeated here.
It should be noted that the foregoing embodiment numbers are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware alone; in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and comprising several instructions for causing a terminal device to perform the methods described in the various embodiments of the present application.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit its patent scope; any equivalent structure or equivalent process transformation made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (9)

1. The multi-mode voice refusing method for the electric power man-machine interaction scene is characterized by comprising the following steps of:
collecting and acquiring voice signals of an electric power man-machine interaction scene;
performing voice discrimination on the acquired voice signal;
when the acquired voice signal is voice, converting the voice into text in real time, and acquiring the text;
and carrying out multi-modal fusion processing on the acquired text and the original voice, acquiring multi-modal fusion characteristics, and controlling and executing different multi-modal voice refusal strategies according to the acquired multi-modal fusion characteristics.
2. The multi-modal voice rejection method for the electric power man-machine interaction scene of claim 1, wherein the step of performing human voice discrimination on the acquired voice signal comprises the following steps:
constructing a power scene non-human voice classification model, wherein the power scene non-human voice classification model comprises an energy band-based non-human voice discrimination model and a deep learning-based non-human voice discrimination model;
inputting a voice signal to a non-human voice discrimination model based on an energy band, and acquiring a first non-human voice probability;
inputting a voice signal to a non-human voice discrimination model based on deep learning, and acquiring a second non-human voice probability;
according to the acquired first non-human voice probability and second non-human voice probability, calculating the non-human voice probability of the acquired voice signal;
comparing the non-human voice probability with a non-human voice probability threshold value to obtain a comparison result;
and obtaining a voice judgment result of the voice signal according to the obtained comparison result.
3. The multi-modal voice rejection method for the electric power man-machine interaction scene of claim 1, wherein the step of converting human voice into text in real time and acquiring the text when the acquired voice signal is human voice, and executing different rejection strategies according to the acquired recognition result, comprises the following steps:
constructing a voice recognition model for converting voice into characters in the electric power field;
when the acquired voice signal is voice, inputting voice to the constructed voice recognition model to perform real-time text conversion, and acquiring text.
4. The multi-modal voice rejection method for the electric power man-machine interaction scene of claim 3, wherein the step of constructing a speech recognition model for converting electric power domain speech into text specifically comprises the following steps:
constructing a power voice instruction set;
and constructing a voice recognition model for converting the voice in the electric power field into characters based on the voice corpus in the electric power field and the constructed electric power voice instruction set.
5. The method of claim 4, wherein the power speech instruction set includes instruction set text and corresponding speech corpus.
6. The multi-modal speech rejection method for electric power man-machine interaction scenario as in claim 4, wherein the step of constructing a speech recognition model for converting electric power domain speech into text based on electric power domain speech corpus and constructed electric power speech instruction set specifically comprises the steps of:
acquiring the recognition accuracy of each voice command in the electric power voice command set;
when the recognition accuracy of each voice instruction in the acquired power voice instruction set is larger than an accuracy threshold, a voice recognition model for converting the power field voice into characters is built based on the power field voice corpus and the built power voice instruction set.
7. The multi-modal voice rejection method for the electric power man-machine interaction scene of claim 1, wherein the step of performing multi-modal fusion processing on the acquired text and the original voice, acquiring multi-modal fusion features, and controlling execution of different multi-modal voice rejection strategies according to the acquired multi-modal fusion features comprises the following steps:
constructing a multi-mode refusing model of human-computer interaction text and voice of an electric power scene;
respectively inputting a text of voice and an original voice into a multi-modal rejection model, and carrying out real-time semantic extraction through a text encoder and voice encoding to obtain text features and voice features;
inputting text features and voice features into a multi-modal fusion module to obtain multi-modal fusion features;
inputting the multi-mode fusion characteristics to the classifier, and controlling and executing different voice refusal strategies according to the appointed power voice instruction set.
8. A multi-modal voice rejection system for an electric power man-machine interaction scene, comprising:
the signal acquisition module is used for acquiring voice signals of the electric power man-machine interaction scene;
the voice judging module is in communication connection with the signal acquisition module and is used for judging the voice of the acquired voice signal;
the voice conversion module is in communication connection with the voice judging module and is used for converting voice into text in real time when the acquired voice signal is voice and acquiring the text;
the voice refusing module is in communication connection with the signal acquisition module and the voice conversion module and is used for carrying out multi-mode fusion processing on the acquired text and the original voice, acquiring multi-mode fusion characteristics and controlling and executing different multi-mode voice refusing strategies according to the acquired multi-mode fusion characteristics.
9. The multi-modal voice rejection system for the electric power man-machine interaction scene of claim 8, wherein the voice discrimination module comprises:
the model construction unit is used for constructing a power scene non-human voice classification model, which comprises an energy band-based non-human voice discrimination model and a deep learning-based non-human voice discrimination model;
the first probability acquisition unit is in communication connection with the model construction unit and the signal acquisition module and is used for inputting a voice signal to the non-human voice discrimination model based on the energy band to acquire a first non-human voice probability;
the second probability acquisition unit is in communication connection with the model construction unit and the signal acquisition module and is used for inputting a voice signal to a non-human voice discrimination model based on deep learning to acquire a second non-human voice probability;
the non-human voice probability acquisition unit is in communication connection with the first probability acquisition unit and the second probability acquisition unit and is used for calculating the non-human voice probability of acquiring the voice signal according to the acquired first non-human voice probability and second non-human voice probability;
the comparison unit is in communication connection with the non-human voice probability acquisition unit and is used for comparing the non-human voice probability with a non-human voice probability threshold value to acquire a comparison result;
and the judging result acquisition unit is in communication connection with the comparison unit and is used for acquiring the voice judging result of the voice signal according to the acquired comparison result.
CN202311851194.1A 2023-12-27 2023-12-27 Multi-mode voice refusing method and system for electric power man-machine interaction scene Pending CN117877468A (en)

Publications (1)

Publication Number Publication Date
CN117877468A (en) 2024-04-12



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination