CN109949831B

CN109949831B - Method and device for voice recognition in intelligent equipment and computer readable storage medium

Info

Publication number: CN109949831B
Application number: CN201711384201.6A
Authority: CN
Inventors: 曾显伟; 俞国新
Original assignee: Qingdao Haier Smart Technology R&D Co Ltd
Current assignee: Qingdao Haier Smart Technology R&D Co Ltd; Haier Smart Home Co Ltd
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2021-09-24
Anticipated expiration: 2037-12-20
Also published as: CN109949831A

Abstract

The invention discloses a method and a device for voice recognition in an intelligent device, and a computer-readable storage medium, in the technical field of intelligent household appliances. The method comprises: determining the current energy value of the current frame of an acquired sound signal; if the current energy value is greater than the adaptive energy threshold of the intelligent device, acquiring the current feature vector value of the current frame; and if the current feature vector value is greater than a set feature threshold, determining that the current frame is a speech signal. This two-stage check improves both the efficiency and the accuracy of voice recognition.

Description

Method and device for voice recognition in intelligent equipment and computer readable storage medium

Technical Field

The present invention relates to the field of intelligent household appliance technologies, and in particular, to a method and an apparatus for speech recognition in an intelligent device, and a computer-readable storage medium.

Background

With the development of intelligent household appliance technology, household appliances such as air conditioners, refrigerators, washing machines, range hoods and electric fans can now be controlled intelligently, and these smart devices can acquire speech signals and perform corresponding voice control.

Not all sound signals collected by a smart device are speech; the collected signals may also contain other sounds or noise, for example the sound of a door opening or of cooking. Such sounds may be captured together with the speech, which can make it difficult for the smart device to pick out, from the collected sound signals, the speech intended to control it.

Disclosure of Invention

The embodiment of the invention provides a method and a device for voice recognition in intelligent equipment and a computer readable storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

According to a first aspect of the embodiments of the present invention, there is provided a method for speech recognition in an intelligent device, including:

determining the current energy value of the current-frame sound signal in the acquired sound signal;

if the current energy value is greater than the adaptive energy threshold of the smart device, acquiring the current feature vector value of the current-frame sound signal; and

if the current feature vector value is greater than the set feature threshold, determining that the current-frame sound signal is a speech signal.

In an embodiment of the present invention, before the current energy value of the current-frame sound signal is determined, the method includes:

determining, according to the running state of the smart device, a first energy value corresponding to the steady-state noise of the smart device; and

determining the adaptive energy threshold according to the first energy value.

In an embodiment of the present invention, before the current energy value of the current-frame sound signal is determined, the method further includes:

performing framing processing on the acquired sound signal, and determining one frame of the sound signal as the current-frame sound signal.

In an embodiment of the present invention, after the current-frame sound signal is determined to be a speech signal, the method further includes:

after every frame of the sound signal has undergone speech recognition processing, determining a voice control endpoint according to the frames of the sound signal, and performing corresponding voice control on the smart device.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for speech recognition in a smart device, including:

the determining unit, configured to determine the current energy value of the current-frame sound signal in the acquired sound signal;

the acquisition unit, configured to acquire the current feature vector value of the current-frame sound signal if the current energy value is greater than the adaptive energy threshold of the smart device; and

the recognition unit, configured to determine that the current-frame sound signal is a speech signal if the current feature vector value is greater than the set feature threshold.

In an embodiment of the present invention, the apparatus further includes:

the adaptive unit, configured to determine, according to the running state of the smart device, a first energy value corresponding to the steady-state noise of the smart device, and to determine the adaptive energy threshold according to the first energy value.

In an embodiment of the present invention, the apparatus further includes:

the framing unit, configured to perform framing processing on the acquired sound signal and to determine one frame of the sound signal as the current-frame sound signal.

In an embodiment of the present invention, the apparatus further includes:

the control unit, configured to determine a voice control endpoint according to the frames of the sound signal after every frame has undergone speech recognition processing, and to perform corresponding voice control on the smart device.

According to a third aspect of the embodiments of the present invention, there is provided an apparatus for speech recognition in a smart device, the apparatus being used in the smart device and including:

a processor;

a memory for storing processor-executable instructions;

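wherein the processor is configured to:

determine the current energy value of the current-frame sound signal in the acquired sound signal;

if the current energy value is greater than the adaptive energy threshold of the smart device, acquire the current feature vector value of the current-frame sound signal; and

if the current feature vector value is greater than the set feature threshold, determine that the current-frame sound signal is a speech signal.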

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above-described method.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

In the embodiment of the invention, the speech signal can be identified from the sound signal through dual detection of the energy and the feature vector of speech, which improves the efficiency and accuracy of voice recognition and, in turn, the accuracy and intelligence of voice control of the smart device.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow diagram illustrating a method of speech recognition in a smart device in accordance with an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method of speech recognition in a smart device in accordance with an exemplary embodiment;

FIG. 3 is a block diagram illustrating a speech recognition apparatus in a smart device in accordance with an exemplary embodiment;

FIG. 4 is a block diagram illustrating a speech recognition apparatus in a smart device according to an exemplary embodiment.

Detailed Description

The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, as well as all available equivalents of the claims. Embodiments may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed. The embodiments are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the structures, products and the like disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief; for the relevant parts, refer to the description of the method.

A smart device can support human-computer interaction for intelligent control; for example, it can collect a user's voice and perform corresponding voice control according to the collected speech signal. However, not all sound signals collected by the smart device are speech; they may include sudden sounds or noise, so the speech signal within the sound signal needs to be recognized. In the embodiment of the invention, the speech signal is identified from the sound signal through dual detection of energy and of a feature vector matched to speech, which improves the efficiency and accuracy of voice recognition on the smart device. The method is particularly suitable for smart devices with steady-state noise, such as range hoods and electric fans.

FIG. 1 is a flow chart illustrating a method of speech recognition in a smart device according to an exemplary embodiment. As shown in fig. 1, the process of speech recognition in the smart device includes:

Step 101: determine the current energy value of the current-frame sound signal in the acquired sound signal.

Generally, a smart device with a human-computer interaction function can collect sound and thus acquire a corresponding sound signal. After the sound signal is acquired, it may be divided into frames, and speech recognition processing is then performed on each frame; one frame obtained by the framing is determined as the current-frame sound signal. Preferably, the frames are processed in chronological order. Alternatively, any frame may be selected at random as the current-frame sound signal.
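Framing can be illustrated with a minimal Python sketch. The frame length and hop used in the comment (400 and 160 samples, i.e. 25 ms frames with a 10 ms step at an assumed 16 kHz sampling rate) are illustrative assumptions; the patent does not specify framing parameters. Frames are returned in chronological order, matching the preferred processing order.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split samples into (possibly overlapping) frames in chronological order.

    Trailing samples that do not fill a whole frame are dropped.
    Example parameters (assumed): frame_len=400, hop=160 at 16 kHz.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames: List[List[float]] = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames
```

Overlapping frames (hop smaller than frame length) are a common choice so that speech onsets are not split awkwardly between frames; a hop equal to the frame length gives non-overlapping frames.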

In the embodiment of the present invention, speech recognition processing is based on detecting energy matched to speech in the sound signal, so the energy value of the current-frame sound signal, that is, the current energy value, needs to be determined. Here, the energy of the sound signal may specifically be the short-time average energy of speech, so the energy value of each frame is its short-time energy value. In general, the short-time energy value can be expressed as a weighted sum of squares of the sample amplitudes of one frame, so the current energy value can be determined from the weighted sum of squares of the sample amplitudes in the current frame. Of course, the embodiments of the present invention are not limited thereto, and other ways of determining the energy value of the sound signal may also be applied.
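The weighted sum of squares can be written as $E = \sum_{n=0}^{N-1} w(n)\,x(n)^2$, where $x(n)$ are the $N$ sample amplitudes of the frame and $w(n)$ are weights. The sketch below assumes either uniform weights (a plain sum of squares) or, as one possible choice, a Hamming window; the patent does not name the weighting.

```python
import math
from typing import List, Optional, Sequence


def hamming(n: int) -> List[float]:
    """Hamming window of length n (one possible, assumed choice of weights)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_energy(frame: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of squared sample amplitudes of one frame.

    With weights=None every sample gets weight 1 (plain sum of squares).
    """
    if weights is None:
        weights = [1.0] * len(frame)
    if len(weights) != len(frame):
        raise ValueError("weights must match the frame length")
    return sum(w * x * x for w, x in zip(weights, frame))
```

Dividing E by N gives the short-time average energy; either form works as long as the adaptive energy threshold is expressed on the same scale.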

Step 102: if the current energy value is greater than the adaptive energy threshold of the smart device, acquire the current feature vector value of the current-frame sound signal.

Some smart devices produce sound while running, for example range hoods, electric fans and air purifiers. Therefore, the sound signal collected by a smart device is not necessarily a speech signal. Only sound signals whose energy value exceeds a predetermined threshold are likely to be candidate speech signals, so a threshold can be pre-configured as the adaptive energy threshold. For some smart devices, the adaptive energy threshold may be a fixed value determined from statistics of the noise energy values measured during operation. For other smart devices, such as range hoods, electric fans and other motor-driven devices, the noise energy differs between operating states, for example between high-speed and low-speed fan operation. Within a single operating state, however, the noise energy is stable, that is, it is steady-state noise, so each operating state corresponds to one steady-state noise energy value, which may be taken as the first energy value.

It can be seen that the adaptive energy threshold needs to be determined before the current energy value of the current-frame sound signal is determined. The adaptive energy threshold may be a preset fixed value; alternatively, a first energy value corresponding to the steady-state noise of the smart device is determined according to its running state, and the adaptive energy threshold is determined according to the first energy value. Preferably, the adaptive energy threshold is slightly greater than the first energy value.
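One way to realize the adaptive threshold is sketched below under stated assumptions: the first energy value of each running state is estimated as the mean short-time energy of frames recorded in that state while no one is speaking, and "slightly greater" is modelled as a hypothetical multiplicative margin of 1.2. The state names are illustrative; a device with a fixed threshold simply has a one-entry table.

```python
from statistics import mean
from typing import Dict, Sequence

# Hypothetical margin: the text only says the threshold is "slightly greater"
# than the steady-state noise energy (the first energy value).
MARGIN = 1.2


def first_energy_value(noise_energies: Sequence[float]) -> float:
    """Estimate the steady-state noise energy of one running state as the
    mean short-time energy of noise-only frames recorded in that state."""
    if not noise_energies:
        raise ValueError("need at least one noise-only frame")
    return mean(noise_energies)


def build_threshold_table(
    noise_by_state: Dict[str, Sequence[float]], margin: float = MARGIN
) -> Dict[str, float]:
    """Map each running state (e.g. fan speed) to its adaptive energy threshold."""
    return {state: margin * first_energy_value(e) for state, e in noise_by_state.items()}


def adaptive_energy_threshold(table: Dict[str, float], running_state: str) -> float:
    """Look up the threshold for the device's current running state."""
    return table[running_state]
```

For example, noise frames with energies 10 and 14 in a "high" state give a first energy value of 12 and a threshold of 1.2 × 12 = 14.4.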

The current energy value of the current-frame sound signal is compared with the adaptive energy threshold. If the current energy value is greater than the adaptive energy threshold of the smart device, the current frame is a candidate speech signal, and its current feature vector value is then acquired.

An important step in speech recognition is extracting the feature vector of the sound signal. Depending on the application scenario, the feature vector may include linear predictive coding (LPC) coefficients, cepstral coefficients (CEP), or mel-frequency cepstral coefficients (MFCC). Each type of feature vector has its own extraction procedure, and many existing methods for extracting feature vectors from sound signals can be applied.

Preferably, the feature vector comprises mel-frequency cepstral coefficients (MFCC). The mel-scale filter bank has high resolution at low frequencies, consistent with the auditory characteristics of the human ear, so MFCCs can simulate human auditory perception and focus on specific frequency components: the filters are numerous and densely distributed in the low-frequency region, and fewer and more sparsely distributed in the high-frequency region. Each frame of the sound signal corresponds to a multidimensional MFCC array, and the extracted values of this array are the feature vector value of the frame. If the current frame is a speech signal, the values of its MFCC array are large even when the sound is quiet; if the current frame is a sudden noise signal, the values of its MFCC array are much smaller than those of speech even when the sound is loud. Therefore, when the current energy value is greater than the adaptive energy threshold of the smart device, the MFCC values of the current frame are obtained, typically by applying frequency-domain transformation, cepstral transformation, differential (delta) processing and the like to the current frame, which gives the current feature vector value of the current-frame sound signal.
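The MFCC pipeline described above can be sketched in pure Python: power spectrum (frequency-domain transformation), triangular mel filter bank, logarithm, and a DCT-II (cepstral transformation). This is a simplified sketch, not the patent's implementation: it uses a naive DFT to stay self-contained, and it omits pre-emphasis and the differential (delta) processing mentioned in the text. The defaults (16 kHz, 512-point FFT, 26 filters, 13 coefficients) are common choices assumed here.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float], n_fft: int) -> List[float]:
    """|DFT|^2 / n_fft for bins 0..n_fft//2, using a naive O(N^2) DFT."""
    # Truncate or zero-pad the frame to n_fft samples.
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters equally spaced on the mel scale, so they are dense
    at low frequencies and sparse at high frequencies."""
    top = hz_to_mel(sample_rate / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc(frame: Sequence[float], sample_rate: int = 16000, n_fft: int = 512,
         n_filters: int = 26, n_ceps: int = 13) -> List[float]:
    """Return n_ceps mel-frequency cepstral coefficients for one frame."""
    spec = power_spectrum(frame, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Log filter-bank energies; the floor avoids log(0) for silent frames.
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10)) for filt in bank]
    # Unnormalized DCT-II of the log energies (cepstral transformation).
    return [sum(log_e[m] * math.cos(math.pi * i * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for i in range(n_ceps)]
```

A production system would use an FFT library rather than the naive DFT; the naive form is kept here only so the sketch runs without dependencies.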

Of course, other types of feature vector are acquired in different ways. Since LPC coefficients, CEP and the like are conventional feature vectors of speech signals, existing methods for extracting them can be applied, and they are not described further here.

Step 103: if the current feature vector value is greater than the set feature threshold, determine that the current-frame sound signal is a speech signal.

Based on the characteristics of speech signals, a set feature threshold can be configured in advance for each type of feature vector. For example, a set feature threshold can be configured for MFCC, and when the value of the MFCC array is greater than this threshold, the current-frame sound signal is determined to be a speech signal.

Because the current energy value of the current frame sound signal is greater than the adaptive energy threshold of the intelligent device, and the current characteristic vector value of the current frame sound signal is greater than the set characteristic threshold, the current frame sound signal can be determined to be a voice signal.
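The two-stage decision can be sketched as below. The patent compares "the value of the MFCC array" with a single set feature threshold but does not say how the array is reduced to one number; the hypothetical `mfcc_summary` uses the mean absolute value of c1..cN purely as an illustration. The energy and feature functions are passed in, so feature extraction runs only for frames that pass the energy gate.

```python
from typing import Callable, Sequence

Frame = Sequence[float]


def mfcc_summary(coeffs: Sequence[float]) -> float:
    """Hypothetical scalar summary of an MFCC array: mean |c_i| for i >= 1.

    c0 is skipped because it mainly tracks overall loudness, which the
    energy test already covers; it is used only if it is the sole value.
    """
    if not coeffs:
        raise ValueError("empty MFCC array")
    tail = list(coeffs[1:]) or list(coeffs)
    return sum(abs(c) for c in tail) / len(tail)


def is_speech_frame(
    frame: Frame,
    energy_threshold: float,
    feature_threshold: float,
    energy_fn: Callable[[Frame], float],
    feature_fn: Callable[[Frame], float],
) -> bool:
    # Stage 1 (steps 101-102): cheap energy gate against the adaptive threshold.
    if energy_fn(frame) <= energy_threshold:
        return False
    # Stage 2 (step 103): feature test, evaluated only for candidate frames.
    return feature_fn(frame) > feature_threshold
```

Ordering the checks this way keeps the more expensive feature extraction off the path for quiet frames, which is where the efficiency gain described in the text comes from.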

The dual detection of energy and feature vector identifies the speech signal within the sound signal, improving the efficiency and accuracy of speech recognition and, in turn, the accuracy and intelligence of voice control of the smart device.

In the embodiment of the present invention, some smart devices, such as range hoods, electric fans and air purifiers, contain motors, fans and the like that generate a certain noise during operation, namely steady-state noise, and different running states correspond to different steady-state noise levels. The adaptive energy threshold of such a device can therefore be configured adaptively: the first energy value corresponding to the steady-state noise of the smart device is determined according to its running state, and the adaptive energy threshold is then determined based on the first energy value. Only when the current energy value of the current-frame sound signal is greater than the adaptive energy threshold can the frame become a candidate speech signal, so the speech recognition method provided by the embodiment of the invention can recognize the speech signal within the sound signal without being affected by the running state of the smart device, enabling intelligent control. For example, whether the range hood is running at high, medium or low speed, and even if sudden noise such as cooking or a door opening occurs, the current-frame sound signal is determined to be a speech signal as long as its current energy value is greater than the adaptive energy threshold of the smart device and its current feature vector value is greater than the set feature threshold.
This ensures the recognition rate of the smart device and improves the accuracy of speech recognition; the method is particularly suitable for smart devices with steady-state noise, such as range hoods and electric fans.

Of course, the smart device may acquire the sound signal through human-computer interaction, perform framing processing on it, and determine one frame as the current-frame sound signal; the method above is then applied to each frame in turn. Thus, in another embodiment of the present invention, after every frame of the sound signal has undergone speech recognition processing, the voice control endpoint is determined according to the frames, and corresponding voice control is performed on the smart device. This realizes voice control of the smart device and improves both its intelligence and the accuracy of voice control.
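The patent does not specify how the voice control endpoint is derived from the per-frame decisions. A common approach, assumed here rather than taken from the source, is a silence hangover: an utterance starts at the first speech frame and ends at the last speech frame once a run of `min_silence` consecutive non-speech frames follows it. The default of 30 frames (about 300 ms at a 10 ms hop) is hypothetical.

```python
from typing import List, Optional, Tuple


def find_endpoints(is_speech: List[bool], min_silence: int = 30) -> Optional[Tuple[int, int]]:
    """Return (first_speech_frame, last_speech_frame) of the first utterance,
    or None if no frame was classified as speech.

    Hypothetical rule: the utterance is finished once `min_silence`
    consecutive non-speech frames follow the last speech frame.
    """
    start: Optional[int] = None
    end: Optional[int] = None
    silence = 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            end = i
            silence = 0  # a speech frame resets the silence run
        elif start is not None:
            silence += 1
            if silence >= min_silence:
                break
    return None if start is None else (start, end)
```

Short pauses inside an utterance (fewer than `min_silence` frames) do not end it, which avoids cutting a command at a brief gap between words.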

The operational flows below are combined into a specific embodiment to illustrate the method provided by the embodiments of the present disclosure.

In this embodiment, the smart device may be a range hood.

FIG. 2 is a flow diagram illustrating a method of speech recognition in a smart device, according to an example embodiment. As shown in fig. 2, the process of speech recognition in the smart device is as follows:

Step 201: acquire a sound signal.

The range hood is provided with a voice acquisition device, through which the sound signal can be acquired.

Step 202: perform framing processing on the acquired sound signal and determine one frame as the current-frame sound signal.

Preferably, the first frame in order of acquisition time is determined as the current-frame sound signal.

Step 203: determine the adaptive energy threshold according to the running state of the smart device.

A range hood generally has several speed settings with different fan speeds, so the first energy values of the corresponding steady-state noise also differ. The first energy value corresponding to the steady-state noise of the smart device can therefore be determined according to its running state, and the adaptive energy threshold is then determined based on the first energy value. Typically, the adaptive energy threshold is slightly greater than the first energy value.

Step 204: determine the current energy value of the current-frame sound signal from the sample amplitudes in the current frame.

Step 205: determine whether the current energy value is greater than the adaptive energy threshold. If so, go to step 206; otherwise, the process ends.

If the current energy value of the current-frame sound signal is greater than the adaptive energy threshold, the current frame may be a candidate speech signal and step 206 is performed; otherwise, the process ends.

Step 206: acquire the values of the MFCC array corresponding to the current-frame sound signal.

In this embodiment, the current feature vector value of the current-frame sound signal may be the MFCC value, that is, the values of the corresponding MFCC array.

Step 207: determine whether the value of the MFCC array is greater than the set feature threshold. If so, go to step 208; otherwise, the process ends.

The current feature vector value may be the MFCC value, and the set feature threshold likewise corresponds to MFCC. If the value of the MFCC array is greater than the set feature threshold, step 208 is executed; otherwise, the process ends.

Step 208: determine that the current-frame sound signal is a speech signal.

Step 209: determine whether every frame of the acquired sound signal has undergone speech recognition processing. If so, go to step 211; otherwise, go to step 210.

Step 210: determine another frame as the current-frame sound signal and return to step 204.

Preferably, the next frame in order of acquisition time is determined as the current-frame sound signal.

Step 211: determine a voice control endpoint according to the frames determined to be speech, and perform corresponding voice control on the smart device.

Speech recognition and matching can further be performed on the frames determined to be speech, to determine the voice control endpoint and carry out the corresponding voice control, thereby realizing voice control of the smart device.
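Steps 202 to 210 can be combined into one loop, sketched below under assumptions: frame energy is a plain sum of squares (uniform weights), the per-state thresholds come from a hypothetical lookup table, and `feature_fn` stands in for MFCC extraction plus reduction to a scalar. "The process ends" at steps 205 and 207 is read here as ending processing of the current frame, after which step 209 moves on to the next frame.

```python
from typing import Callable, Dict, List, Sequence


def recognize_frames(
    samples: Sequence[float],
    running_state: str,
    thresholds: Dict[str, float],
    feature_fn: Callable[[Sequence[float]], float],
    feature_threshold: float,
    frame_len: int = 400,
    hop: int = 160,
) -> List[bool]:
    """Return one speech/non-speech flag per frame, following FIG. 2."""
    # Step 203: adaptive energy threshold for the current running state (e.g. fan speed).
    energy_threshold = thresholds[running_state]
    flags: List[bool] = []
    # Steps 202, 209, 210: visit frames in chronological order.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Step 204: short-time energy from the sample amplitudes (uniform weights).
        energy = sum(x * x for x in frame)
        # Step 205: frames at or below the threshold are not speech candidates.
        if energy <= energy_threshold:
            flags.append(False)
            continue
        # Steps 206-208: feature test only for candidate frames.
        flags.append(feature_fn(frame) > feature_threshold)
    return flags
```

The returned flags are the per-frame input to step 211; endpoint detection and command matching are left out because the patent does not detail them.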

It can thus be seen that in this embodiment, the range hood determines the adaptive energy threshold according to its running state and identifies the speech signal within the sound signal through dual detection of energy and a feature vector matched to speech. Therefore, whatever state the range hood is running in, and even if unexpected sounds such as stir-frying, a door opening or closing, or a heavy object falling occur during speech acquisition, the speech signal can be accurately recognized from the sound signal. This ensures the voice control function of the range hood and further improves its intelligence.

The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.

Based on the above process of voice recognition in a smart device, an apparatus for voice recognition in a smart device can be constructed.

FIG. 3 is a block diagram illustrating a speech recognition apparatus in a smart device according to an exemplary embodiment. As shown in FIG. 3, the apparatus may include a determining unit 100, an acquisition unit 200 and a recognition unit 300, wherein:

the determining unit 100 is configured to determine the current energy value of the current-frame sound signal in the acquired sound signal;

the acquisition unit 200 is configured to acquire the current feature vector value of the current-frame sound signal if the current energy value is greater than the adaptive energy threshold of the smart device; and

the recognition unit 300 is configured to determine that the current-frame sound signal is a speech signal if the current feature vector value is greater than the set feature threshold.

In an embodiment of the present invention, the apparatus further includes:

the adaptive unit, configured to determine, according to the running state of the smart device, a first energy value corresponding to the steady-state noise of the smart device, and to determine the adaptive energy threshold based on the first energy value.

In an embodiment of the present invention, the apparatus further includes:

the framing unit, configured to perform framing processing on the acquired sound signal and to determine one frame of the sound signal as the current-frame sound signal.

In an embodiment of the present invention, the apparatus further includes:

the control unit, configured to determine a voice control endpoint according to the frames of the sound signal after every frame has undergone speech recognition processing, and to perform corresponding voice control on the smart device.

The following illustrates an apparatus provided by an embodiment of the present disclosure.

FIG. 4 is a block diagram illustrating a speech recognition apparatus in a smart device according to an exemplary embodiment. As shown in FIG. 4, in addition to the determining unit 100, the acquisition unit 200 and the recognition unit 300, the apparatus further includes an adaptive unit 400, a framing unit 500 and a control unit 600.

The adaptive unit 400 may determine the adaptive energy threshold according to the running state of the smart device. After the sound signal is acquired through human-computer interaction, the framing unit 500 may perform framing processing on it and determine one frame as the current-frame sound signal.

The determining unit 100 can then determine the current energy value of the current-frame sound signal from the sample amplitudes in the current frame. When the current energy value is greater than the adaptive energy threshold, the acquisition unit 200 may acquire the values of the MFCC array corresponding to the current frame, and when the value of the MFCC array is greater than the set feature threshold, the recognition unit 300 may determine that the current-frame sound signal is a speech signal.

In this way, the speech recognition apparatus in the smart device performs speech recognition processing on every frame of the acquired sound signal. After processing is complete, the control unit 600 may determine a voice control endpoint according to the frames determined to be speech, and perform corresponding voice control on the smart device.

Therefore, in this embodiment, the smart device can recognize the speech signal within the sound signal through dual detection of the energy and the feature vector of speech, improving the efficiency and accuracy of speech recognition and, in turn, the accuracy and intelligence of voice control of the smart device.

In an embodiment of the present invention, an apparatus for speech recognition in a smart device is provided, the apparatus being used in the smart device and including:

a processor;

a memory for storing processor-executable instructions;

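wherein the processor is configured to:

determine the current energy value of the current-frame sound signal in the acquired sound signal;

if the current energy value is greater than the adaptive energy threshold of the smart device, acquire the current feature vector value of the current-frame sound signal; and

if the current feature vector value is greater than the set feature threshold, determine that the current-frame sound signal is a speech signal.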

In one embodiment of the present invention, a computer-readable storage medium is provided, on which computer instructions are stored, and the instructions, when executed by a processor, implement the steps of the voice recognition method in the intelligent device.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It is to be understood that the present invention is not limited to the procedures and structures described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method for speech recognition in an intelligent device, comprising:

determining a first energy value corresponding to the steady-state noise of the intelligent device according to the running state of the intelligent device;

determining an adaptive energy threshold according to the first energy value;

determining the current energy value of the current frame sound signal in the acquired sound signals;

if the current energy value is greater than the adaptive energy threshold of the intelligent device, acquiring the MFCC multi-dimensional array value corresponding to the current frame sound signal as the current feature vector value; and

if the current feature vector value is greater than a set feature threshold, determining that the current frame sound signal is a speech signal.

2. The method of claim 1, wherein said determining a current energy value of the current frame sound signal further comprises:

3. The method of claim 2, wherein after determining that the current frame sound signal is a speech signal, further comprising:

4. An apparatus for speech recognition in a smart device, comprising:

an adaptive unit, configured to determine a first energy value corresponding to the steady-state noise of the smart device according to the running state of the smart device, and to determine an adaptive energy threshold according to the first energy value;

a determining unit, configured to determine the current energy value of the current frame sound signal in the acquired sound signals;

an obtaining unit, configured to obtain the MFCC multi-dimensional array value corresponding to the current frame sound signal as the current feature vector value if the current energy value is greater than the adaptive energy threshold of the smart device; and

a recognition unit, configured to determine that the current frame sound signal is a speech signal if the current feature vector value is greater than a set feature threshold.

5. The apparatus of claim 4, wherein the apparatus further comprises:

6. The apparatus of claim 5, wherein the apparatus further comprises:

7. An apparatus for speech recognition in a smart device, the apparatus being used for the smart device and comprising:

a processor;

a memory for storing processor-executable instructions;

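wherein the processor is configured to:

determine a first energy value corresponding to the steady-state noise of the smart device according to the running state of the smart device;

determine an adaptive energy threshold according to the first energy value;

determine the current energy value of the current frame sound signal in the acquired sound signals;

if the current energy value is greater than the adaptive energy threshold, acquire the MFCC multi-dimensional array value corresponding to the current frame sound signal as the current feature vector value; and

if the current feature vector value is greater than a set feature threshold, determine that the current frame sound signal is a speech signal.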

8. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 3.