CN110349566B - Voice wake-up method, electronic device and storage medium - Google Patents


Info

Publication number
CN110349566B
CN110349566B
Authority
CN
China
Prior art keywords
voice
time length
value
envelope
envelopes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910624198.3A
Other languages
Chinese (zh)
Other versions
CN110349566A (en)
Inventor
聂镭
沙露露
聂颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Original Assignee
Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longma Zhixin Zhuhai Hengqin Technology Co ltd filed Critical Longma Zhixin Zhuhai Hengqin Technology Co ltd
Priority to CN201910624198.3A priority Critical patent/CN110349566B/en
Publication of CN110349566A publication Critical patent/CN110349566A/en
Application granted granted Critical
Publication of CN110349566B publication Critical patent/CN110349566B/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a voice wake-up method comprising the following steps: S1, acquiring the user's current voice and intercepting a segment of unit duration; S2, determining the number of envelopes in the unit-duration voice; S3, calculating the duration of each envelope; S4, deciding, according to the number of envelopes in the unit-duration voice and the duration of each envelope, whether to perform voice wake-up recognition on it. Because the invention does not require the collection of negative samples, the labor cost of data collection is saved; at the same time, the false wake-up rate is greatly reduced and the user experience is greatly improved.

Description

Voice wake-up method, electronic device and storage medium
Technical Field
The present invention relates to the field of voice recognition technologies, and in particular, to a voice wake-up method, an electronic device, and a storage medium.
Background
Voice wake-up is an important branch of speech recognition technology. It is applied widely, for example in robots, mobile phones, wearable devices, smart homes, and vehicles. The process by which a device goes from sleep to wake is generally as follows: the device starts, loads its resources, and enters a dormant state; when a user speaks a specific wake-up word, the device wakes and switches to a working state, awaiting the user's next instruction. In this process, the user can operate the device directly by voice without touching it with the hands, and, by means of the voice wake-up mechanism, the device does not need to remain in a working state at all times, which saves energy.
Existing voice wake-up technology generally uses a separate voice recognition hardware system and a separate voice recognition software system. In the process of implementing the present invention, the inventors found that existing voice wake-up schemes have at least the following defect:
when the voice wake-up model recognizes a command word (also referred to as a wake-up word), insufficient negative samples in the model's training data can cause frequent false wake-ups. The usual remedy is to collect the command words that trigger false wake-ups in the specific deployment environment and retrain the voice wake-up model with the collected negative samples, so as to lower the false wake-up rate in that environment. In practice, however, it is difficult to collect all negative samples exhaustively; although this method reduces the false wake-up rate to some extent, the wake-up performance still fails to meet customer requirements: the false wake-up rate remains high and the experience is poor.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a voice wake-up method, an electronic device and a storage medium, so as to solve the problems of high false wake-up rate and poor user experience of voice wake-up in the related art.
According to an embodiment of the present invention, there is provided a voice wake-up method, including the steps of: s1, acquiring the current voice of the user, and intercepting the voice of unit duration; s2, judging the number of envelopes in the voice of the unit time length according to the voice of the unit time length; s3, calculating the duration of each envelope; s4, judging whether to carry out voice awakening recognition on the voice in unit time length according to the number of the envelopes in the voice in unit time length and the time length of each envelope.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
The method acquires the user's current voice and intercepts a segment of unit duration; determines the number of envelopes in the unit-duration voice; calculates the duration of each envelope; and decides, according to the number of envelopes in the unit-duration voice and the duration of each envelope, whether to perform voice wake-up recognition on it. By computing the number of envelopes and their durations, the voice wake-up method provided by the invention can pre-filter voice that might otherwise cause a false wake-up, thereby greatly reducing the false wake-up rate.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 is a block diagram of a hardware structure of a terminal of a voice wake-up method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice wake-up method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a speech envelope of a voice wakeup word according to an embodiment of the present invention.
Detailed Description
The present invention will be described below on the basis of examples, but the present invention is not limited to these examples. In the following detailed description of the present invention, certain specific details are set forth; in order to avoid obscuring the essence of the present invention, well-known methods, procedures, and components are not described in detail.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and for distinguishing between similar elements and not for indicating or implying relative importance or order, nor for describing a particular order or sequence. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal or a similar operation device. Taking the example of operating on a mobile terminal, fig. 1 is a block diagram of a hardware structure of a terminal of a voice wake-up method according to an embodiment of the present invention. As shown in fig. 1, the mobile terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to the voice wakeup method in the embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to execute various functional applications and data processing, i.e., to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
As shown in fig. 2, in this embodiment, a voice wake-up method is provided, which includes the following steps:
s1, acquiring the current voice of the user, and intercepting the voice of unit duration;
s2, judging the number of envelopes in the voice of the unit time length according to the voice of the unit time length;
s3, calculating the duration of each envelope;
s4, judging whether to carry out voice awakening recognition on the voice in unit time length according to the number of the envelopes in the voice in unit time length and the time length of each envelope.
The method acquires the user's current voice and intercepts a segment of unit duration; determines the number of envelopes in the unit-duration voice; calculates the duration of each envelope; and decides, according to the number of envelopes in the unit-duration voice and the duration of each envelope, whether to perform voice wake-up recognition on it. By computing the number of envelopes and their durations, the voice wake-up method provided by the invention can pre-filter voice that might otherwise cause a false wake-up, thereby greatly reducing the false wake-up rate.
The individual steps will be described in detail below with reference to specific embodiments.
Acquiring the current voice of the user and intercepting the voice of unit duration.
The user's current voice is captured by an audio acquisition device (such as a microphone in the input/output device 108). The original speech is then resampled to 8 kHz: common sampling frequencies are 8 kHz, 16 kHz, and 48 kHz, so for convenience of uniform processing the voice data is resampled to 8 kHz in the present invention.
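As an illustration of the resampling step, here is a minimal, naive downsampling sketch (the function name `decimate_to_8k` is illustrative, not from the patent; a real system would apply an anti-aliasing low-pass filter before decimating):

```python
def decimate_to_8k(samples, fs):
    """Naively downsample to 8 kHz by keeping every (fs // 8000)-th sample.

    Assumes fs is an integer multiple of 8000 Hz (e.g. 16 kHz or 48 kHz).
    A production implementation should low-pass filter first to avoid aliasing.
    """
    if fs % 8000 != 0:
        raise ValueError("fs must be an integer multiple of 8000 Hz")
    step = fs // 8000
    return samples[::step]

# 1 ms of fake 16 kHz audio, reduced to 8 samples at 8 kHz
sixteen_khz = list(range(16))
eight_khz = decimate_to_8k(sixteen_khz, 16000)
```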
After the user's current voice has been acquired as described above, it is intercepted in segments of unit duration. It should be noted that the unit duration can be set as needed; in the present invention it is set to the length of the voice segment that the voice wake-up model accepts. That is, a segment of unit duration is the basic unit on which the voice wake-up model performs each wake-up recognition: the duration of a segment input to the model is one unit. The voice wake-up model used in the present invention is prior art and is not the inventive point of this application, so it is not described in detail here.
Judging the number of envelopes in the voice of unit duration according to the voice of unit duration.
The number of envelopes in the voice of unit duration is judged from the intercepted segment. The envelope of speech, i.e., the envelope characteristic of the speech waveform, reflects the overall shape of the waveform.
For a specific wake-up word (command word), its content is fixed once set, so the number of envelopes in the corresponding speech waveform is also fixed. Typically, in a speech waveform one word corresponds to one envelope. Taking the wake-up word "小度小度" ("Xiaodu Xiaodu") as an example, in the standard case 4 speech envelopes should appear, as shown in fig. 3-a. The purpose of extracting the envelope of the wake-up word's voice signal is to judge the number of words in the acquired voice from the number of envelopes; that is, the number of envelopes in the voice signal reflects, to a certain degree, the number of words in the wake-up word.
In some embodiments, the number of speech envelopes may be determined by first plotting a waveform of the speech and then determining the number of envelopes directly from the waveform of the speech. For example, the speech waveform diagram in fig. 3-a contains 4 envelopes, the speech waveform diagram in fig. 3-b contains 3 envelopes, and the speech waveform diagram in fig. 3-c contains 2 envelopes.
In some implementations of the present application, step S2 includes the steps of:
S21, converting the voice of unit duration into a voice signal sequence.
By digitally processing the unit-duration voice intercepted in step S1, a voice signal sequence corresponding to it is obtained through voice signal conversion. For example, signal = [0.01,0.005,0.02,0.01,0.001,0.02,0.03,0.02,0.001,0.02,0.001,0.02,0.02,0.0105] is the voice signal sequence obtained by converting one intercepted segment of unit duration.
S22, judging the number of envelopes according to the voice signal sequence.
In some embodiments of the present application, after the step S21 and before the step S22, the method further comprises:
s211, performing feature binarization processing on feature values in the voice signal sequence to obtain a first voice signal mark sequence corresponding to the voice signal sequence, wherein a first threshold value of the feature binarization is an average value of the feature values in the voice signal sequence, the mark is 1 if the feature value is greater than or equal to the first threshold value, and the mark is 0 if the feature value is less than the first threshold value;
s212, performing secondary feature binarization on the feature values in the speech signal sequence to obtain a second speech signal flag sequence corresponding to the speech signal sequence, where a second threshold of the secondary feature binarization is a mean value of the feature values marked as 1 in the speech signal sequence, the feature value is marked as 1 if the second threshold is greater than or equal to the second threshold, and the feature value is marked as 0 if the feature value is smaller than the second threshold.
In this embodiment, through the above processing in step S211 and step S212, the part of the speech signal sequence that contains the environmental noise can be filtered, that is, through two times of feature binarization processing, the envelope value of the speech signal sequence that only contains the noise is changed to 0, and only the value of the speech part in the audio is retained. By adopting the double judgment of the two times of feature binarization, the envelope of the voice signal can be effectively extracted, and the noise in the environment can be filtered in a self-adaptive manner. Compared with the traditional noise reduction method (such as a filter method, a related characteristic method, a nonlinear processing method, a spectrum reduction method and the like), the noise filtering method for the characteristic binarization processing adopted in the embodiment of the invention only needs a small amount of calculation, has good noise reduction effect, does not cause signal distortion and introduce extra noise, has low requirement on hardware, is easy to realize on various mainstream hardware and has good universality. The following examples are given for illustrative purposes:
For the speech signal sequence signal above, first the threshold of the first feature binarization is computed: the first threshold is the mean mean1 of the feature values in signal, mean1 ≈ 0.0135. Positions greater than or equal to this mean are marked 1 and positions below it are marked 0, and the result is the first voice signal mark sequence sign1 = [0,0,1,0,0,1,1,1,0,1,0,1,1,0]. Then the second feature binarization is applied to signal. Its threshold is the mean of the feature values marked 1 in sign1, i.e. mean2 = sum(signal*sign1)/sum(sign1), optionally multiplied by a threshold coefficient (an empirical value); in this embodiment the threshold coefficient is 5 and the second threshold is computed to be 0.0275. Positions greater than or equal to the second threshold are marked 1, otherwise 0, giving the second voice signal mark sequence sign2 = [0,0,0,0,0,0,1,0,0,0,0,0,0,0].
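The two-pass binarization of steps S211-S212 can be sketched as follows (function names are my own; the empirical threshold coefficient is exposed as a parameter and left at 1.0 for illustration, whereas the embodiment above scales the second threshold by an empirical coefficient):

```python
def binarize(seq, threshold):
    """Mark each value 1 if it is >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in seq]

def two_pass_binarize(signal, coef=1.0):
    """Two-pass feature binarization in the spirit of steps S211-S212.

    Pass 1: threshold is the mean of all feature values.
    Pass 2: threshold is the mean of the values marked 1 in pass 1,
    scaled by an empirical coefficient `coef`.
    """
    mean1 = sum(signal) / len(signal)
    sign1 = binarize(signal, mean1)
    ones = [v for v, m in zip(signal, sign1) if m == 1]
    mean2 = sum(ones) / len(ones)
    sign2 = binarize(signal, coef * mean2)
    return sign1, sign2

signal = [0.01, 0.005, 0.02, 0.01, 0.001, 0.02, 0.03, 0.02,
          0.001, 0.02, 0.001, 0.02, 0.02, 0.0105]
sign1, sign2 = two_pass_binarize(signal)
```

With `coef=1.0`, the second pass keeps only the strongest peak (0.03) of this toy sequence; the noise-only positions are set to 0, as the embodiment describes.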
The feature binarization processing of steps S211 and S212 solves the problem in the prior art that, with a single binarization used to filter noise, a small amount of background noise is kept as signal or useful signal is removed as noise. The envelope of the voice signal can be extracted effectively, noise in the environment is filtered adaptively, the background noise is adaptively set to zero, and no distortion of the desired signal or extra noise is introduced.
In this embodiment, the step S22 is: judging the number of envelopes according to the second speech signal mark sequence, which comprises the following steps:
s221, setting a Step Value Step and a merging threshold Value;
s222, carrying out segmentation processing on the second voice signal marking sequence according to the Step value Step;
s223, summing Sum of the mark values in each segment in the second voice signal mark sequence one by oneiWherein i represents the ith segment;
s224, SumiComparing with the merge threshold Value if SumiIf Value is greater than or equal to Value, the mark Value in the segment is marked as 1, if Sumi< Value, then SumiAnd marking the mark value in the corresponding segment as 0 to obtain a voice signal mark merging sequence.
It should be noted that, since the envelope of the voice signal has already been effectively extracted into the second voice signal mark sequence sign2 computed in steps S211 and S212, the determination of the envelope count in steps S221 to S224 of this embodiment operates on the mark sequence sign2 rather than on the signal itself.
In some embodiments of the present invention, before the envelope is determined, the second speech signal mark sequence needs to be merged to solve the problem of abnormal speech data sampling of individual sampling points and ensure the consistency of the speech data of the sampling points. The merging process comprises the following specific steps:
First, a step value Step and a merging threshold Value are set; both may be chosen according to actual needs. In this embodiment, Step = 5 and Value = 3.
Then the second voice signal mark sequence is segmented according to the step value Step. Suppose that, after the above processing of a certain unit-duration voice, the obtained second voice signal mark sequence is sign2' = [0,0,1,0,0,0,1,1,1,1,0,0,1,0,0]; sign2' has length 15, so with Step = 5 it is divided into 3 segments in total.
Then the mark values in each segment of the second voice signal mark sequence are summed one by one to obtain Sum_i, where i denotes the i-th segment. The first segment is [0,0,1,0,0], the second segment is [0,1,1,1,1], and the third segment is [0,0,1,0,0]; the sums are Sum_1 = 1, Sum_2 = 4, Sum_3 = 1.
Finally, each Sum_i is compared with the merging threshold Value: if Sum_i >= Value, every mark value in that segment is set to 1; if Sum_i < Value, every mark value in that segment is set to 0, yielding the voice signal mark merging sequence. Here Value = 3. The first segment [0,0,1,0,0] has Sum_1 = 1, less than the merging threshold Value, so after merging it becomes [0,0,0,0,0]; the second segment [0,1,1,1,1] has Sum_2 = 4, greater than the merging threshold Value, so it becomes [1,1,1,1,1]; the third segment has Sum_3 = 1, less than Value, so it becomes [0,0,0,0,0]. Merging sign2' therefore gives the voice signal mark merging sequence [0,0,0,0,0,1,1,1,1,1,0,0,0,0,0].
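The segmentation and merging of steps S221-S224 can be sketched as follows (the function name and example sequence are illustrative):

```python
def merge_marks(sign, step, value):
    """Split `sign` into chunks of length `step`; within each chunk,
    set every mark to 1 if the chunk's sum >= `value`, else to 0
    (steps S221-S224)."""
    merged = []
    for i in range(0, len(sign), step):
        segment = sign[i:i + step]
        bit = 1 if sum(segment) >= value else 0
        merged.extend([bit] * len(segment))
    return merged

sign2 = [0, 0, 1, 0, 0,  0, 1, 1, 1, 1,  0, 0, 1, 0, 0]
merged = merge_marks(sign2, step=5, value=3)
# merged -> [0,0,0,0,0, 1,1,1,1,1, 0,0,0,0,0]
```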
In some implementations of the invention, the number M of consecutive segments whose mark value is 1 in the voice signal mark merging sequence is computed and used as the number of envelopes in the voice of unit duration. In this embodiment, the number M of consecutive runs of mark value 1 in the merging sequence [0,0,0,0,0,1,1,1,1,1,0,0,0,0,0] is 1.
Calculating the duration of each envelope.
In some implementations of the invention, the calculating of the duration of each of the envelopes in step S3 includes:
and calculating the number N of sampling points in a continuous segment with a mark value marked as 1 in the voice signal mark merging sequence, wherein the envelope duration calculation method comprises the following steps:
T=1000*N/fs
where N represents the number of sampling points, fs represents the sampling frequency, and the unit of the duration of the envelope is ms.
For example, the number N of sampling points in the consecutive run of mark value 1 in the voice signal mark merging sequence [0,0,0,0,0,1,1,1,1,1,0,0,0,0,0] is 5, so the envelope duration is T = 1000 * 5 / 8000 = 0.625 ms.
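Counting the runs of 1s and converting each run length into a duration can be sketched as (the function name `envelopes` is illustrative):

```python
def envelopes(merged, fs=8000):
    """Return (count, durations_ms) of the maximal runs of 1s in `merged`.
    Each run of N samples lasts T = 1000 * N / fs milliseconds."""
    runs = []
    n = 0
    for bit in merged:
        if bit == 1:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    if n:                      # run reaching the end of the sequence
        runs.append(n)
    durations = [1000.0 * n / fs for n in runs]
    return len(runs), durations

count, durations = envelopes([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
# count -> 1, durations -> [0.625]
```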
Judging whether to perform voice wake-up recognition on the voice of unit duration according to the number of envelopes in it and the duration of each envelope.
In some embodiments of the present invention, it is determined whether the number of envelopes in the unit-duration voice and the duration of each envelope meet preset requirements; if so, voice wake-up recognition is performed on the unit-duration voice, and if not, the segment is discarded. Deciding whether to perform wake-up recognition by checking the envelope count and envelope durations has two benefits: on the one hand, experiments show that it greatly reduces the false wake-up rate of wake-up recognition; on the other hand, voice that does not meet the criteria is discarded without wake-up recognition, which reduces the number of recognitions performed, the amount of hardware computation, and the hardware power consumption. Specific examples follow:
For example, when the wake-up word is "小度小度" ("Xiaodu Xiaodu"), 4 speech envelopes should appear in the standard case, but in practice 3, or even 2, speech envelopes may appear. Therefore, whether to perform voice wake-up recognition on the unit-duration voice must be judged case by case from the envelope count and envelope durations obtained above. In addition, the pronunciation time of a Chinese character is about 0.2-0.4 s (an empirical value), so the duration of each envelope should normally fall within 0.2-0.4 s, though the envelope durations differ when the envelope count differs. Taking the wake-up word "小度小度" as an example, the specific judgment proceeds as follows:
if the detected number of envelopes is 4, respectively calculating the time length of each envelope, if the time length of each envelope is in the range of 0.2-0.4s, returning to 1, otherwise, returning to 0.
If the number of detected envelopes is 3, respectively calculating the time length of each envelope, if the time length of two envelopes is within 0.2-0.4 and the time length of one envelope is within 0.4-0.6s (empirical value), returning to 1, otherwise, returning to 0.
If the number of detected envelopes is 2, respectively calculating the time length of each envelope, if the time length of each envelope is within 0.3-0.6s (empirical value), returning the value to be 1, otherwise, returning the value to be 0.
The number of envelopes in the other cases, i.e. when the calculated number of envelopes is not 2, 3, 4, the direct return value is 0.
If the return value is 1, voice wake-up recognition is performed; otherwise the voice segment is discarded.
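The case analysis above can be sketched as a small gating function (a sketch under stated assumptions: the function name `should_wake` and the treatment of the 0.4 s boundary in the 3-envelope case are my own choices, as the embodiment does not specify which side of the boundary counts):

```python
def should_wake(durations_s):
    """Gate wake-up recognition on envelope count and per-envelope duration,
    following the rules for a 4-character wake-up word.
    `durations_s` is a list of envelope durations in seconds.
    Returns 1 if the segment should be passed to the wake-up model, else 0."""
    n = len(durations_s)
    if n == 4:
        return 1 if all(0.2 <= t <= 0.4 for t in durations_s) else 0
    if n == 3:
        short = sum(1 for t in durations_s if 0.2 <= t <= 0.4)
        long_ = sum(1 for t in durations_s if 0.4 < t <= 0.6)
        return 1 if short == 2 and long_ == 1 else 0
    if n == 2:
        return 1 if all(0.3 <= t <= 0.6 for t in durations_s) else 0
    return 0  # any other envelope count is rejected outright
```

For instance, `should_wake([0.3, 0.3, 0.3, 0.3])` accepts the segment, while a single 0.1 s envelope is rejected without ever invoking the wake-up model.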
Compared with the prior-art approach of collecting negative-sample data and retraining the model, the voice wake-up method provided by the invention needs no negative-sample collection, which saves the labor cost of data collection while greatly reducing the false wake-up rate and greatly improving the user experience. Moreover, gating wake-up recognition on the envelope count and envelope durations reduces the number of wake-up recognitions performed, and with it the hardware computation and power consumption.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, acquiring the current voice of the user, and intercepting the voice of unit duration;
s2, judging the number of envelopes in the voice of the unit time length according to the voice of the unit time length;
s3, calculating the duration of each envelope;
s4, judging whether to carry out voice awakening recognition on the voice in unit time length according to the number of the envelopes in the voice in unit time length and the time length of each envelope.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring the current voice of the user, and intercepting voice of a unit time length;
S2, determining the number of envelopes in the voice of the unit time length;
S3, calculating the duration of each envelope;
S4, judging, according to the number of envelopes in the voice of the unit time length and the duration of each envelope, whether to perform voice wake-up recognition on the voice of the unit time length.
Optionally, the storage medium is further configured to store program code for executing the steps of the methods in the foregoing embodiments; details are not repeated in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be fabricated separately as individual integrated circuit modules, or multiple ones of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within its protection scope.

Claims (9)

1. A voice wake-up method, characterized in that the method comprises the steps of:
S1, acquiring the current voice of the user, and intercepting voice of a unit time length;
S2, determining the number of envelopes in the voice of the unit time length;
S3, calculating the duration of each envelope;
S4, judging, according to the number of envelopes in the voice of the unit time length and the duration of each envelope, whether to perform voice wake-up recognition on the voice of the unit time length;
the step S4 includes: judging whether the number of envelopes and the time length of each envelope meet preset requirements or not, if so, performing voice awakening recognition on the voice of the unit time length, and if not, discarding the voice of the unit time length;
and judging whether the number of envelopes and the duration of each envelope meet preset requirements or not, wherein the judging comprises the following steps:
judging whether the number of envelopes is within a preset envelope number range threshold value, if so, determining a preset time length threshold value corresponding to each envelope according to the number of envelopes;
and judging whether the duration of each envelope is within the corresponding preset duration threshold, if so, meeting the preset requirement.
2. The method according to claim 1, wherein the step S2 includes:
S21, performing voice signal transformation on the voice of the unit time length to obtain a voice signal sequence;
and S22, determining the number of envelopes according to the voice signal sequence.
3. The method according to claim 2, further comprising, after the step S21 and before the step S22:
S211, performing feature binarization on the feature values in the voice signal sequence to obtain a first voice signal mark sequence corresponding to the voice signal sequence, wherein the first threshold of the feature binarization is the mean of the feature values in the voice signal sequence; a feature value is marked 1 if it is greater than or equal to the first threshold, and marked 0 if it is less than the first threshold;
S212, performing a second feature binarization on the feature values in the voice signal sequence to obtain a second voice signal mark sequence corresponding to the voice signal sequence, wherein the second threshold of the second binarization is the mean of the feature values marked 1 in the voice signal sequence, or that mean multiplied by a threshold coefficient; a feature value is marked 1 if it is greater than or equal to the second threshold, and marked 0 if it is less than the second threshold, the threshold coefficient being an empirical value.
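A minimal Python sketch of the two binarization passes in steps S211 and S212; the threshold coefficient of 0.5 is an assumed example of the empirical value the claim mentions, and the function name is hypothetical:

```python
def binarize_twice(features, threshold_coef=0.5):
    """Steps S211-S212: two-pass feature binarization.

    Pass 1 thresholds at the mean of all feature values; pass 2 thresholds
    at the mean of the values marked 1, scaled by an empirical coefficient.
    """
    # S211: first threshold = mean of all feature values
    t1 = sum(features) / len(features)
    first_marks = [1 if f >= t1 else 0 for f in features]
    # S212: second threshold = threshold_coef * mean of the values marked 1
    ones = [f for f, m in zip(features, first_marks) if m == 1]
    t2 = threshold_coef * (sum(ones) / len(ones))
    return [1 if f >= t2 else 0 for f in features]

print(binarize_twice([0.1, 0.9, 0.8, 0.45, 0.05, 0.7]))  # [0, 1, 1, 1, 0, 1]
```

With a coefficient below 1 the second threshold is lower than the first, so borderline values (0.45 in the example) that the first pass dropped are recovered, smoothing the mark sequence before envelope counting.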
4. The method according to claim 3, wherein the step S22 is:
and judging the number of the envelopes according to the second voice signal mark sequence.
5. The method according to claim 4, wherein the step S22 comprises the steps of:
S221, setting a step value Step and a merging threshold Value;
S222, segmenting the second voice signal mark sequence according to the step value Step;
S223, computing, segment by segment, the sum Sumi of the mark values in each segment of the second voice signal mark sequence, wherein i denotes the i-th segment;
S224, comparing Sumi with the merging threshold Value: if Sumi is greater than or equal to Value, setting the mark values in the segment to 1; if Sumi is less than Value, setting the mark values in the segment corresponding to Sumi to 0, thereby obtaining a voice signal mark merging sequence.
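The segmentation and merging in steps S221 to S224 can be sketched as follows in Python; the Step and Value settings in the example are arbitrary illustrations, not values from the patent:

```python
def merge_by_step(marks, step, value):
    """Steps S221-S224: split the second mark sequence into segments of
    length `step`; a segment whose mark sum Sumi >= `value` is set to all
    1s, otherwise to all 0s, giving the mark merging sequence."""
    merged = []
    for i in range(0, len(marks), step):
        segment = marks[i:i + step]          # S222: segmentation by Step
        bit = 1 if sum(segment) >= value else 0  # S223 sum, S224 compare
        merged.extend([bit] * len(segment))
    return merged

# Step = 4, Value = 2: the three segment sums are 3, 1, 3
print(merge_by_step([1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0], step=4, value=2))
# [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```

Merging fills short dropouts inside an envelope and suppresses isolated spikes, so that each surviving run of 1s corresponds to one envelope.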
6. The method according to claim 5, wherein determining the number of envelopes in the step S2 comprises:
counting the number M of continuous segments whose mark value is 1 in the voice signal mark merging sequence, and taking M as the number of envelopes in the voice of the unit time length.
7. The method according to claim 5 or 6, wherein the step S3 comprises:
counting the number N of sampling points in a continuous segment whose mark value is 1 in the voice signal mark merging sequence, the envelope duration being calculated as:
T = 1000 * N / fs
where N denotes the number of sampling points, fs denotes the sampling frequency, and the envelope duration T is in ms.
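Claims 6 and 7 can be combined into one pass over the mark merging sequence: count the runs of 1s (the envelope number M) and convert each run's sample count N into a duration via T = 1000 * N / fs. A sketch in Python, with a hypothetical helper name:

```python
def envelope_stats(merged_marks, fs):
    """Claim 6: the number of continuous runs of 1s is the envelope count M.
    Claim 7: a run of N sampling points lasts T = 1000 * N / fs ms."""
    durations_ms = []
    n = 0
    for mark in merged_marks + [0]:  # trailing 0 closes a final run of 1s
        if mark == 1:
            n += 1
        elif n > 0:
            durations_ms.append(1000 * n / fs)
            n = 0
    return len(durations_ms), durations_ms

# At fs = 8000 Hz, a run of 1600 points lasts 1000 * 1600 / 8000 = 200 ms
print(envelope_stats([0] * 10 + [1] * 1600 + [0] * 10, fs=8000))  # (1, [200.0])
```

The pair (M, durations) returned here is exactly the input needed for the judgment in step S4 of claim 1.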
8. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 7.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.
CN201910624198.3A 2019-07-11 2019-07-11 Voice wake-up method, electronic device and storage medium Active CN110349566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910624198.3A CN110349566B (en) 2019-07-11 2019-07-11 Voice wake-up method, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN110349566A CN110349566A (en) 2019-10-18
CN110349566B true CN110349566B (en) 2020-11-24

Family

ID=68175698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910624198.3A Active CN110349566B (en) 2019-07-11 2019-07-11 Voice wake-up method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110349566B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105575405A (en) * 2014-10-08 2016-05-11 展讯通信(上海)有限公司 Double-microphone voice active detection method and voice acquisition device
CN105898065A (en) * 2016-05-16 2016-08-24 深圳天珑无线科技有限公司 Intelligent terminal and control method thereof
CN106131292A (en) * 2016-06-03 2016-11-16 上海与德通讯技术有限公司 Method for setting terminal wake-up, wake-up method, and corresponding system
CN107102713A (en) * 2016-02-19 2017-08-29 北京君正集成电路股份有限公司 Method and device for reducing power consumption
DE102018204860A1 (en) * 2017-03-31 2018-10-04 Intel Corporation Systems and methods for energy efficient and low power distributed automatic speech recognition on portable devices
CN109360585A (en) * 2018-12-19 2019-02-19 晶晨半导体(上海)股份有限公司 Voice activation detection method
CN109378000A (en) * 2018-12-19 2019-02-22 科大讯飞股份有限公司 Voice awakening method, device, system, equipment, server and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9953632B2 (en) * 2014-04-17 2018-04-24 Qualcomm Incorporated Keyword model generation for detecting user-defined keyword


Also Published As

Publication number Publication date
CN110349566A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN108492827B (en) Wake-up processing method and device for application program, and storage medium
CN107240395B (en) Acoustic model training method and device, computer equipment and storage medium
CN105261366B (en) Audio recognition method, speech engine and terminal
EP3522153A1 (en) Voice control system, wakeup method and wakeup apparatus therefor, electrical appliance and co-processor
CN110428810A (en) Voice wake-up recognition method and device, and electronic device
CN110970016B (en) Wake-up model generation method, and intelligent terminal wake-up method and device
CN109065046A (en) Voice wake-up method and apparatus, electronic device, and computer-readable storage medium
CN105632486A (en) Voice wake-up method and device of intelligent hardware
CN111091813A (en) Voice wakeup model updating method, device, equipment and medium
CN110942763A (en) Voice recognition method and device
CN113436611B (en) Test method and device for vehicle-mounted voice equipment, electronic equipment and storage medium
CN112507118A (en) Information classification and extraction method and device and electronic equipment
CN104123930A (en) Guttural identification method and device
CN108053822A (en) Audio signal processing method and device, terminal device and medium
CN110349566B (en) Voice wake-up method, electronic device and storage medium
CN112767935B (en) Awakening index monitoring method and device and electronic equipment
CN113190678B (en) Chinese dialect language classification system based on parameter sparse sharing
CN114791771A (en) Interaction management system and method for intelligent voice mouse
CN114267342A (en) Recognition model training method, recognition method, electronic device and storage medium
CN113903334B (en) Method and device for training sound source positioning model and sound source positioning
CN115881124A (en) Voice wake-up recognition method, device and storage medium
CN113593553B (en) Voice recognition method, voice recognition apparatus, voice management server, and storage medium
CN115862604A (en) Voice wakeup model training and voice wakeup method, device and computer equipment
CN115670397A (en) PPG artifact identification method and device, storage medium and electronic equipment
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 519031 office 1316, No. 1, lianao Road, Hengqin new area, Zhuhai, Guangdong

Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.

Address before: Room 417, 418, 419, building 20, creative Valley, 1889 Huandao East Road, Hengqin New District, Zhuhai City, Guangdong Province

Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.

PP01 Preservation of patent right

Effective date of registration: 20240718

Granted publication date: 20201124