CN112509596A - Wake-up control method and device, storage medium and terminal
- Publication number
- CN112509596A (CN202011303745.7A)
- Authority
- CN
- China
- Prior art keywords
- confidence
- target
- audio data
- signal processing
- terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The disclosure relates to a wake-up control method and device, a storage medium, and a terminal. The method comprises: collecting multi-channel audio data; performing signal processing on each channel of first audio data collected in a target time period to obtain a plurality of first target audio data; obtaining a first confidence for each of the plurality of first target audio data, where a confidence represents the probability that the audio data can wake up the terminal; obtaining second confidences of second target audio data in a historical time period, where the historical time period is a preset time period before the target time period and the second confidences comprise the confidences of a plurality of second target audio data; and determining whether to wake up the terminal according to the first confidences and the second confidences. Because the decision uses both the first confidences of the target time period and the second confidences of the historical time period, the probability of a false wake-up or a missed wake-up can be reduced, which improves the accuracy of the voice recognition system.
Description
Technical Field
The present disclosure relates to the field of terminal technologies, and in particular, to a wake-up control method and apparatus, a storage medium, and a terminal.
Background
With the development of science and technology, more and more intelligent devices gradually enter the lives of users, and applications such as voice control, voice input, voice start and the like in the intelligent devices become more and more popular. The intelligent equipment can collect voice data of the user in real time by carrying the voice recognition system, execute a control instruction sent by the user according to the voice data and interact with the user.
However, in a real-world environment, due to noise interference, when a control instruction of a user is responded according to real-time voice data, the probability of occurrence of false recognition is high, so that the accuracy of the existing voice recognition system is low.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a wake-up control method, apparatus, storage medium, and terminal.
According to a first aspect of the embodiments of the present disclosure, there is provided a wake-up control method, including: collecting multi-channel audio data; respectively carrying out signal processing on each path of first audio data acquired in a target time period to obtain a plurality of first target audio data; respectively obtaining first confidence degrees of a plurality of first target audio data, wherein the confidence degrees are used for representing the probability that the audio data can wake up a terminal; acquiring a second confidence degree of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence degree comprises the confidence degrees of a plurality of second target audio data; and determining whether to awaken the terminal according to the first confidence coefficient and the second confidence coefficient.
Optionally, the respectively performing signal processing on each path of first audio data acquired in the target time period to obtain a plurality of first target audio data includes: selecting one microphone of a microphone array of the terminal as a reference channel; acquiring reference audio data acquired by the reference channel in the target time period; and according to the reference audio data, performing signal processing on each path of first audio data through a plurality of signal processing modes respectively to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data in different paths are different.
Optionally, the respectively obtaining the first confidence degrees of the plurality of first target audio data includes: determining a signal processing mode corresponding to each first target audio data in the plurality of first target audio data; determining a target decoder corresponding to the first target audio data according to the signal processing mode, wherein different signal processing modes correspond to different decoders; and inputting the first target audio data into the target decoder for decoding processing to obtain a first confidence coefficient of the first target audio data.
Optionally, the determining, according to the signal processing method, a target decoder corresponding to the first target audio data includes: determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relationship, wherein the decoder association relationship comprises the correspondence relationship between different signal processing modes and the decoders; and taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the determining whether to wake up the terminal according to the first confidence level and the second confidence level includes: each time a first confidence output by one target decoder is obtained, executing the following wake-up processing according to the first confidence output by that target decoder, until the terminal is woken up or the wake-up processing has been executed for the first confidences output by the plurality of target decoders; the wake-up processing includes: determining whether to wake up the terminal according to the second confidence and the first confidence output by the target decoder.
Optionally, the determining whether to wake up the terminal according to the first confidence level and the second confidence level includes: determining a target confidence degree from the second confidence degrees, wherein the target confidence degree and the first confidence degree are confidence degrees obtained by decoding through the same decoder; acquiring a weight value corresponding to the first confidence degree according to the target confidence degree and a third confidence degree, wherein the third confidence degree comprises other confidence degrees except the target confidence degree in the second confidence degree; determining a final confidence degree according to the weight value and the first confidence degree; and determining whether to awaken the terminal according to the final confidence.
Optionally, the obtaining, according to the target confidence and the third confidence, a weight value corresponding to the first confidence includes: obtaining a confidence difference between the target confidence and the third confidence; and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
Optionally, the determining whether to wake up the terminal according to the final confidence degree includes: and determining to awaken the terminal under the condition that the final confidence degree is greater than or equal to a preset confidence degree threshold value.
Optionally, the signal processing mode includes blind source separation or noise suppression.
According to a second aspect of the embodiments of the present disclosure, there is provided a wake-up control apparatus, the apparatus including: the data acquisition module is configured to acquire multi-channel audio data; the signal processing module is configured to perform signal processing on each path of first audio data acquired in a target time period respectively to obtain a plurality of first target audio data; a first confidence coefficient obtaining module configured to obtain first confidence coefficients of a plurality of first target audio data, respectively, where the confidence coefficients are used to characterize a probability that the audio data can wake up a terminal; a second confidence obtaining module configured to obtain a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises a plurality of confidences of the second target audio data; a wake-up module configured to determine whether to wake up the terminal according to the first confidence level and the second confidence level.
Optionally, the signal processing module includes: a channel selection submodule configured to select one microphone of a microphone array of the terminal as a reference channel; a reference data acquisition submodule configured to acquire reference audio data acquired by the reference channel at the target time period; and the signal processing submodule is configured to perform signal processing on each path of the first audio data through a plurality of signal processing modes according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of different paths of the first audio data are different.
Optionally, the first confidence level obtaining module includes: a processing mode determining submodule configured to determine, for each of the plurality of first target audio data, a signal processing mode corresponding to the first target audio data; a decoder determining submodule configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders; and the confidence coefficient determining submodule is configured to input the first target audio data into the target decoder for decoding processing to obtain a first confidence coefficient of the first target audio data.
Optionally, the decoder determining sub-module is configured to: determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relationship, wherein the decoder association relationship comprises the correspondence relationship between different signal processing modes and the decoders; and taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the wake-up module includes: a wake-up processing submodule configured to, each time a first confidence output by one target decoder is obtained, execute the following wake-up processing according to the first confidence output by that target decoder, until the terminal is woken up or the wake-up processing has been executed for the first confidences output by the plurality of target decoders; the wake-up processing includes: determining whether to wake up the terminal according to the second confidence and the first confidence output by the target decoder.
Optionally, the wake-up module includes: a target confidence determination submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being confidences decoded by the same decoder; the weight value determining submodule is configured to obtain a weight value corresponding to the first confidence degree according to the target confidence degree and a third confidence degree, wherein the third confidence degree comprises confidence degrees other than the target confidence degree in the second confidence degree; a final confidence coefficient obtaining sub-module configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient; and the awakening submodule is configured to determine whether to awaken the terminal according to the final confidence level.
Optionally, the weight value determining submodule is further configured to: obtaining a confidence difference between the target confidence and the third confidence; and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
Optionally, the wake processing sub-module is further configured to: and determining to awaken the terminal under the condition that the final confidence degree is greater than or equal to a preset confidence degree threshold value.
Optionally, the signal processing mode includes blind source separation or noise suppression.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the wake-up control method provided by the first aspect of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a terminal, including: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the wake-up control method provided by the first aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects: multi-channel audio data is collected; signal processing is performed on each channel of first audio data collected in a target time period to obtain a plurality of first target audio data; a first confidence is obtained for each of the plurality of first target audio data, where a confidence represents the probability that the audio data can wake up the terminal; second confidences of second target audio data in a historical time period are obtained, where the historical time period is a preset time period before the target time period and the second confidences comprise the confidences of a plurality of second target audio data; and whether to wake up the terminal is determined according to the first confidences and the second confidences. That is, the present disclosure decides whether to wake up the terminal using both the first confidences of the target time period and the second confidences of the historical time period, which can reduce the probability of a false wake-up or a missed wake-up and thereby improve the accuracy of the voice recognition system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a wake-up control method according to an exemplary embodiment;
FIG. 2 is a block diagram illustrating a terminal according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating another wake-up control method in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating a wake-up control apparatus according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a terminal according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
First, an application scenario of the present disclosure is described. The present disclosure can be applied to a terminal with a voice recognition function. In a real environment, environmental noise affects the voice recognition system, so the probability that the terminal is woken up by mistake or fails to wake up is high. A single microphone cannot effectively handle noise, especially noise whose frequency response changes over time, such as music. In the related art, to cope with different noise scenes, such as background voices, a washing machine, or a television, a terminal may therefore employ a microphone array and use a plurality of decoders for prediction to determine whether to wake up the terminal.
However, when prediction is performed by a plurality of decoders, the terminal is woken up as soon as the prediction result output by any one of the decoders indicates a wake-up. In this case, if that decoder has low prediction accuracy, the terminal may be woken up by mistake, which lowers the accuracy of the terminal's voice recognition system and degrades the user experience.
To solve the above problem, the present disclosure provides a wake-up control method, device, storage medium, and terminal. Signal processing is performed on each channel of first audio data collected in a target time period to obtain a plurality of first target audio data, a first confidence is obtained for each of the plurality of first target audio data, second confidences of second target audio data in a historical time period are then obtained, and whether to wake up the terminal is determined according to the first confidences and the second confidences. That is, the present disclosure decides whether to wake up the terminal using both the first confidences of the target time period and the second confidences of the historical time period, which can reduce the probability of a false wake-up or a missed wake-up and thereby improve the accuracy of the voice recognition system.
The present disclosure is described below with reference to specific examples.
Fig. 1 is a flowchart illustrating a wake-up control method according to an exemplary embodiment. As shown in fig. 1, the method includes:
S101, collecting multi-channel audio data.
It should be noted that the wake-up control method is applied to a terminal device with a voice interaction function. For example, the terminal device has an application with a voice interaction function installed, such as a voice assistant application used to recognize the user's voice information. The embodiments of the present disclosure may be applied to various terminal devices, including but not limited to fixed devices and mobile devices. Fixed devices include, but are not limited to, personal computers (PCs), televisions, air conditioners, and wall-mounted ovens; mobile devices include, but are not limited to, mobile phones, tablet computers, wearable devices, speakers, and alarm clocks. The present disclosure does not limit this. Fig. 2 is a schematic diagram illustrating a structure of a terminal according to an exemplary embodiment. As shown in fig. 2, the terminal may include a microphone array, a signal processing module, a decoder, and a wake-up module; the microphone array may include a plurality of microphones, and there may also be a plurality of decoders. The terminal collects multi-channel audio data in real time through the microphone array and sends it to the signal processing module. The signal processing module processes the multi-channel audio data to obtain a plurality of processed target audio data, the target audio data are then decoded by the plurality of decoders to obtain a plurality of confidences, and finally whether to wake up the terminal is determined according to these confidences.
In this step, after the terminal is powered on and started, the acquisition module of the terminal may acquire multiple paths of audio data through multiple microphones in the microphone array, where each microphone corresponds to one path of audio data.
S102, respectively carrying out signal processing on each path of first audio data collected in the target time period to obtain a plurality of first target audio data.
In this step, the corresponding signal processing method may be preset according to the environment in which the terminal is used, for example, a large number of signal processing methods may be set for a terminal that is often used in a noisy environment, such as a mobile phone, and a small number of signal processing methods may be set for a terminal that is used in a relatively quiet environment, such as an air conditioner.
After the multi-channel first audio data collected in the target time period are obtained, the first audio data can be subjected to signal processing in a signal processing mode preset by the terminal, and a plurality of first target audio data are obtained.
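The per-channel processing in S102 can be sketched as follows. This is only an illustration under stated assumptions: the two processing functions are toy stand-ins (the disclosure names blind source separation and noise suppression but gives no algorithm), and the names `PROCESSING_MODES` and `process_channels` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]


def noise_gate(frame: Sequence[float], floor: float = 0.05) -> Frame:
    """Toy stand-in for noise suppression: zero every sample below a floor."""
    return [s if abs(s) >= floor else 0.0 for s in frame]


def remove_mean(frame: Sequence[float]) -> Frame:
    """Toy stand-in for a second processing mode: subtract the DC offset."""
    mean = sum(frame) / len(frame) if frame else 0.0
    return [s - mean for s in frame]


# Hypothetical registry: processing-mode name -> processing function.
PROCESSING_MODES: Dict[str, Callable[[Sequence[float]], Frame]] = {
    "noise_gate": noise_gate,
    "remove_mean": remove_mean,
}


def process_channels(
    channels: Sequence[Sequence[float]], mode_per_channel: Sequence[str]
) -> Dict[str, Frame]:
    """Apply one distinct mode to each channel and key the result by mode."""
    if len(channels) != len(mode_per_channel):
        raise ValueError("one processing mode is required per channel")
    if len(set(mode_per_channel)) != len(mode_per_channel):
        raise ValueError("different channels must use different modes")
    return {
        mode: PROCESSING_MODES[mode](channel)
        for channel, mode in zip(channels, mode_per_channel)
    }
```

Keying the output by processing mode keeps the information needed later to pick the decoder associated with each mode.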
S103, respectively obtaining first confidence degrees of a plurality of first target audio data.
The confidence represents the probability that the audio data can wake up the terminal, and it may range from 0 to 100.
In this step, after the plurality of first target audio data are obtained, for each first target audio data, a decoder may perform decoding processing on the first target audio data to obtain a first confidence of the first target audio data, and finally obtain a plurality of first confidence levels.
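A minimal sketch of S103 follows, assuming the preset decoder association is a simple lookup table from signal processing mode to decoder. The two decoders here are placeholders that score a frame from its amplitude; a real decoder would be a wake-word model, which the disclosure does not specify. All names (`DECODER_ASSOCIATION`, `first_confidences`) are hypothetical.

```python
from typing import Callable, Dict, Sequence

Decoder = Callable[[Sequence[float]], float]


def _clamp(value: float) -> float:
    """Keep a confidence inside the 0-100 range used in the description."""
    return max(0.0, min(100.0, value))


def energy_decoder(frame: Sequence[float]) -> float:
    """Placeholder decoder: confidence from the mean absolute amplitude."""
    if not frame:
        return 0.0
    return _clamp(100.0 * sum(abs(s) for s in frame) / len(frame))


def peak_decoder(frame: Sequence[float]) -> float:
    """Placeholder decoder: confidence from the peak absolute amplitude."""
    return _clamp(100.0 * max((abs(s) for s in frame), default=0.0))


# Hypothetical decoder association: each processing mode has its own decoder.
DECODER_ASSOCIATION: Dict[str, Decoder] = {
    "blind_source_separation": energy_decoder,
    "noise_suppression": peak_decoder,
}


def first_confidences(processed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Decode each processed frame with the decoder tied to its mode."""
    result: Dict[str, float] = {}
    for mode, frame in processed.items():
        decoder = DECODER_ASSOCIATION[mode]  # KeyError for an unknown mode
        result[mode] = decoder(frame)
    return result
```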
S104, acquiring a second confidence of the second target audio data in the historical time period.
The historical time period is a preset time period before the target time period, and the historical time period may be a time period belonging to the same scene as the target time period, for example, the historical time period and the target time period both belong to a time period for acquiring audio data in a voice wake-up scene. In addition, the duration of the preset time period may also be set according to the type of the terminal, and the duration of the preset time period may also be set according to a test experience value, which is not limited in this disclosure.
The second confidence may include the confidences of a plurality of second target audio data, which are the audio data obtained by performing signal processing on the second audio data. The second confidence is obtained in the same way as the first confidence, which is not described here again. In addition, the second confidence may be stored in the terminal. For example, if the preset time period is 1 minute, the second confidences of the second target audio data from the 1 minute before the target time period may be stored. The present disclosure may store the second confidences in queues, with different second confidences corresponding to different queues; for example, if there are 10 second confidences, 10 queues may be used to store them. In each queue, the entry at the head is the second confidence of the second target audio data obtained earliest, and the entry at the tail is the second confidence of the second target audio data obtained latest.
In this step, after the first confidence degrees of the plurality of first target audio data are acquired, a plurality of stored second confidence degrees may be acquired.
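The queue-based storage of second confidences described above might look like the sketch below. It assumes one queue per decoder and a time-based window (60 seconds by default, matching the 1-minute example); the class name `ConfidenceHistory` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ConfidenceHistory:
    """One FIFO queue of (timestamp, confidence) pairs per decoder."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._queues: Dict[str, Deque[Tuple[float, float]]] = {}

    def push(self, decoder_id: str, timestamp: float, confidence: float) -> None:
        """Append at the tail (newest) and drop expired entries at the head."""
        queue = self._queues.setdefault(decoder_id, deque())
        queue.append((timestamp, confidence))
        while queue and timestamp - queue[0][0] > self.window_seconds:
            queue.popleft()

    def values(self, decoder_id: str) -> List[float]:
        """Return stored confidences for one decoder, oldest first."""
        return [conf for _, conf in self._queues.get(decoder_id, ())]
```

A deque gives constant-time insertion at the tail and removal at the head, which matches the head-is-oldest, tail-is-newest layout in the description.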
S105, determining whether to wake up the terminal according to the first confidence and the second confidence.
In this step, after the plurality of first confidences of the plurality of first target audio data are obtained, whether to wake up the terminal may be determined according to any one of the first confidences together with the second confidence. Because environmental noise changes little over a short period of time, the first confidence can be adjusted according to the second confidence, so that an inaccurate first confidence does not cause a false wake-up or a missed wake-up. For example, if the obtained first confidence is relatively high but the second confidences of the second target audio data obtained in the historical time period before the corresponding first target audio data are relatively low, the first confidence may contain a relatively large error. In this case, the first confidence may be reduced with reference to the second confidence to obtain a more accurate first confidence.
Further, after the adjusted first confidence is obtained, whether to wake up the terminal may be determined according to the adjusted first confidence, for example, when the adjusted first confidence is higher, it may be determined to wake up the terminal, and when the adjusted first confidence is lower, it may be determined not to wake up the terminal.
By adopting the method, signal processing is performed on each path of first audio data collected in the target time period to obtain a plurality of first target audio data, and the first confidences of the plurality of first target audio data are respectively obtained; then, the second confidence of the second target audio data in the historical time period may be obtained, and whether to wake up the terminal is determined according to the first confidence and the second confidence. That is to say, the present disclosure may determine whether to wake up the terminal according to the first confidence of the target time period and the second confidence of the historical time period, which may reduce the probability that the terminal is falsely woken up or fails to be woken up, and may thereby improve the accuracy of the voice recognition system.
Fig. 3 is a flow chart illustrating another wake-up control method according to an exemplary embodiment, as shown in fig. 3, the method including:
S301, collecting multi-channel audio data.
S302, selecting one microphone of the microphone array of the terminal as a reference channel.
It should be noted that, while the terminal collects the multiple channels of audio data through the collection module, the terminal may also be outputting audio data, for example, playing music, video, or a ring tone; in this case, the multiple channels of audio data collected by the terminal also include the audio data output by the terminal. When the terminal performs speech recognition, it needs to extract the speech input by the user from the collected audio data. Therefore, one microphone in the microphone array of the terminal may be used as a reference channel, as shown in fig. 2, and the audio data output by the terminal may be obtained through the reference channel.
S303, acquiring reference audio data acquired by the reference channel in the target time period.
In this step, the terminal may acquire the multiple channels of audio data and simultaneously acquire the reference audio data output by the terminal in real time through the reference channel, so that the terminal acquires the first audio data of the target time period and then synchronously acquires the reference audio data of the target time period.
S304, according to the reference audio data, performing signal processing on each path of the first audio data through a plurality of signal processing modes respectively to obtain a plurality of first target audio data.
The signal processing modes of the first audio data of different paths are different, and the signal processing modes may include blind source separation or noise suppression.
In this step, after acquiring the multiple paths of first audio data acquired within the target time period, signal processing may be performed on the multiple paths of first audio data through the multiple signal processing modes preset by the terminal according to the reference audio data, so as to filter noise in the first audio data, and obtain a voice in the first audio data, that is, the first target audio data. For example, if the first audio data includes two paths, and the signal processing manner includes two manners, namely a blind source separation manner and a noise suppression manner, the two paths of first audio data may be respectively signal-processed by the blind source separation manner and the noise suppression manner according to the reference audio data to obtain two first target audio data. Because the reference audio data output by the terminal in the target time period is synchronously acquired, when the first audio data is subjected to signal processing, partial noise in the first audio data can be filtered according to the reference audio data, so that the complexity of signal processing can be simplified, the awakening delay of the terminal can be reduced, and the user experience can be improved.
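The idea of using the reference audio data to filter part of the noise before applying a different signal processing mode to each path can be sketched as follows. The text does not specify how the reference is used, so this example swaps in a deliberately simple technique: a single-tap least-squares subtraction of the reference signal. The function names `remove_reference` and `process_channels`, and the representation of the signal processing modes as plain callables, are assumptions for illustration; real blind source separation or noise suppression would be considerably more involved.

```python
def remove_reference(channel, reference):
    """Subtract the least-squares scaled reference from one channel.

    Simplified stand-in (single gain, no delay or filtering) for using the
    reference audio data to remove the terminal's own output.
    """
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        # Nothing is being played back, so there is nothing to remove.
        return list(channel)
    gain = sum(x * r for x, r in zip(channel, reference)) / energy
    return [x - gain * r for x, r in zip(channel, reference)]


def process_channels(channels, reference, modes):
    """Apply a different (hypothetical) processing mode to each path."""
    if len(channels) != len(modes):
        raise ValueError("one signal processing mode is required per path")
    return [
        mode(remove_reference(channel, reference))
        for channel, mode in zip(channels, modes)
    ]
```

Each element of the returned list plays the role of one first target audio data; the callables in `modes` are placeholders for whatever signal processing modes the terminal presets.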
S305, determining a signal processing mode corresponding to each of the plurality of first target audio data.
The first audio data of different paths correspond to different signal processing modes.
In this step, after each path of first audio data is acquired, the path of first audio data is subjected to signal processing in a corresponding signal processing manner to obtain first target audio data corresponding to the path of first audio data, and therefore, each first target audio data corresponds to one signal processing manner. After the plurality of first target audio data are obtained, a signal processing manner corresponding to the first target audio data may be determined for each first target audio data.
S306, determining a target decoder corresponding to the first target audio data according to the signal processing mode.
The different signal processing modes correspond to different decoders, and parameters of different decoders may be different, for example, the parameters may be determined according to types of the signal processing modes, and different parameters may be set for different types of signal processing modes, which is not limited in this disclosure.
In this step, after determining the signal processing method corresponding to the first target audio data, a decoder corresponding to the signal processing method may be determined from a plurality of decoders through a preset decoder association relationship, where the decoder association relationship may include a correspondence relationship between different signal processing methods and decoders, and the decoder corresponding to the signal processing method is taken as the target decoder.
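The preset decoder association relationship can be pictured as a lookup table from signal processing mode to decoder. The sketch below is illustrative only: the `DecoderConfig` type, the decoder names, and the `beam_width` parameter are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    name: str
    beam_width: int  # hypothetical decoder parameter, varies per processing mode


# Hypothetical decoder association relationship: signal processing mode -> decoder.
DECODER_ASSOCIATION = {
    "blind_source_separation": DecoderConfig("decoder_bss", 8),
    "noise_suppression": DecoderConfig("decoder_ns", 12),
}


def select_target_decoder(signal_processing_mode):
    """Return the decoder associated with the given signal processing mode."""
    try:
        return DECODER_ASSOCIATION[signal_processing_mode]
    except KeyError:
        raise ValueError(
            f"no decoder associated with {signal_processing_mode!r}"
        ) from None
```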
S307, inputting the first target audio data into the target decoder for decoding, and outputting a first confidence of the first target audio data.
In this step, after the target decoder corresponding to each first target audio data is obtained, for the first target audio data, the first target audio data may be input into the target decoder, and the target decoder performs decoding processing on the first target audio data to obtain a first confidence of the first target audio data.
S308, acquiring a second confidence of the second target audio data in the historical time period.
The historical time period is a preset time period before the target time period, the historical time period may be a time period belonging to the same scene as the target time period, the duration of the preset time period may be set according to the type of the terminal, or the duration of the preset time period may be set according to a test experience value, which is not limited in the present disclosure.
The second confidence may include the confidences of a plurality of second target audio data, which are the audio data obtained after signal processing of the second audio data. The second confidence may be obtained in the same manner as the first confidence, which is not described herein again. In addition, the second confidence may be stored in the terminal; for example, if the preset time period is 1 minute, the second confidences of the second target audio data obtained within the 1 minute before the target time period may be stored. For example, the present disclosure may store the second confidence in queues, and different second confidences may correspond to different queues; for example, if the second confidence includes 10 confidences, 10 queues may be used to store the 10 second confidences. For each queue, the second confidence at the head of the queue is that of the second target audio data obtained earliest, and the second confidence at the tail of the queue is that of the second target audio data obtained latest.
S309, determining a target confidence from the second confidence.
Wherein the target confidence and the first confidence are confidences decoded by the same decoder.
In this step, before the target confidence is determined, the target decoder corresponding to the first confidence is determined; then, among the second confidences, the confidence that was output by that same target decoder is determined as the target confidence.
S310, acquiring a weight value corresponding to the first confidence coefficient according to the target confidence coefficient and the third confidence coefficient.
Wherein the third confidence level comprises the confidence levels of the second confidence levels except the target confidence level.
In this step, after the target confidence is obtained, the third confidence may be determined according to the target confidence, and then, a confidence difference between the target confidence and the third confidence may be obtained, and a weight value corresponding to the first confidence may be obtained according to the confidence difference and a preset corresponding relationship. Wherein, if the third confidence level includes a confidence level, the confidence level difference between the target confidence level and the third confidence level can be directly calculated; if the third confidence level includes multiple confidence levels, an average confidence level of the multiple third confidence levels may be obtained first, and then a confidence level difference between the target confidence level and the average confidence level may be calculated.
In a possible implementation manner, the preset corresponding relationship may be a preset weight value relationship, where the weight value relationship includes a corresponding relationship between the confidence difference and the weight value, and after obtaining the confidence difference between the target confidence and the third confidence, the weight value corresponding to the confidence difference may be determined through the weight value relationship. The weight value relationship may be set empirically, for example, the weight value may be 1.1 when the confidence difference is 0.15, and the weight value may be 0.8 when the confidence difference is -0.2.
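A weight value relationship of this kind can be sketched as a small table. In the sketch below, the entries for 0.15 and -0.2 follow the example values given in the text, while the 0.0 entry and the nearest-entry lookup rule are assumptions added for illustration, since the text does not say how differences between table entries are handled.

```python
# Hypothetical weight value relationship: confidence difference -> weight value.
# The 0.15 and -0.2 entries follow the example values in the text;
# the 0.0 -> 1.0 entry is an assumption added for illustration.
WEIGHT_RELATIONSHIP = {-0.2: 0.8, 0.0: 1.0, 0.15: 1.1}


def weight_from_relationship(confidence_difference, relationship=WEIGHT_RELATIONSHIP):
    """Return the weight of the table entry whose difference is closest to the input."""
    nearest = min(relationship, key=lambda d: abs(d - confidence_difference))
    return relationship[nearest]
```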
In another possible implementation manner, the preset corresponding relationship of the weight values of any decoder may be the following calculation formula:
$$\text{ratio} = 1 + \frac{A_{\text{smooth}} - B_{\text{smooth}}}{a} \qquad (1)$$
wherein ratio is the weight value, $A_{\text{smooth}}$ is the target confidence, $B_{\text{smooth}}$ is the third confidence, and $a$ is a preset constant.
After the confidence difference between the target confidence and the third confidence is obtained, the weight value corresponding to the first confidence may be calculated according to the confidence difference and a preset constant by using formula (1).
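Formula (1), together with the averaging of multiple third confidences described earlier, can be sketched as follows. The sketch assumes that the historical confidences of each decoder have already been reduced to a single value per decoder and are held in a dictionary keyed by a decoder identifier; that representation, the function names, and the error handling are illustrative assumptions.

```python
def split_second_confidence(second_confidences, target_decoder):
    """Split the second confidences into the target confidence and the third confidences.

    second_confidences: dict mapping a (hypothetical) decoder id to one confidence.
    """
    target = second_confidences[target_decoder]
    third = [c for d, c in second_confidences.items() if d != target_decoder]
    return target, third


def weight_value(second_confidences, target_decoder, a):
    """Compute ratio = 1 + (A_smooth - B_smooth) / a as in formula (1)."""
    if a == 0:
        raise ValueError("the preset constant a must be non-zero")
    target, third = split_second_confidence(second_confidences, target_decoder)
    if not third:
        raise ValueError("at least one third confidence is required")
    # With several third confidences, their average is used, as described in the text.
    b_smooth = sum(third) / len(third)
    return 1.0 + (target - b_smooth) / a
```

For instance, with second confidences of 0.8 for the target decoder and 0.6 and 0.7 for the others, the average third confidence is 0.65, the confidence difference is 0.15, and with a = 2 the weight value is 1.075.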
S311, acquiring a final confidence according to the weight value and the first confidence.
In some embodiments of this step, after obtaining the weight value, the first confidence level may be multiplied by the weight value to obtain the final confidence level. In other embodiments, the final confidence may also be a sum, a difference, or a division of the first confidence and the weight value. The method for acquiring the final confidence level according to the weight value and the first confidence level is not limited in the present disclosure, and may be set according to different needs.
When the target confidence is higher than the third confidence, the obtained final confidence is also higher than the first confidence output by the target decoder, and when the target confidence is lower than the third confidence, the obtained final confidence is also lower than the first confidence output by the target decoder. Therefore, the first confidence coefficient can be corrected through the second confidence coefficient of the historical time period, so that more accurate confidence coefficient can be obtained, and the accuracy of the voice recognition system of the terminal is improved.
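As a worked illustration only, take multiplication as the combination, a hypothetical first confidence of 0.8, and the example weight values 1.1 and 0.8 given for the weight value relationship:

$$0.8 \times 1.1 = 0.88, \qquad 0.8 \times 0.8 = 0.64$$

In the first case the historical target confidence exceeded the third confidence, so the final confidence is raised above the decoder output; in the second case it is lowered. Against a hypothetical threshold of 0.7, only the first case would lead to waking up the terminal.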
S312, determining whether to wake up the terminal according to the final confidence.
In this step, after the final confidence is obtained, a preset confidence threshold may be obtained first, the final confidence and the confidence threshold are compared, and the terminal is determined to be awakened when the final confidence is greater than or equal to the preset confidence threshold. The preset confidence threshold may be determined according to the type of the terminal, for example, for a terminal with a higher requirement on the wake-up rate but a lower requirement on the false alarm rate, a lower preset confidence threshold, for example, 0.7 may be set, for a terminal with a lower requirement on the wake-up rate but a higher requirement on the false alarm rate, a higher preset confidence threshold, for example, 0.9 may be set, or according to the requirements on the wake-up rate and the false alarm rate of the terminal, it is determined through a test, and the setting mode of the preset confidence threshold is not limited by the present disclosure.
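The comparison in this step can be sketched as follows, assuming the final confidence is obtained by multiplying the first confidence by the weight value. The function name `should_wake` and the default threshold of 0.7 (one of the example values in the text) are choices made for the illustration.

```python
def should_wake(first_confidence, weight, threshold=0.7):
    """Decide whether to wake the terminal from the weighted first confidence."""
    # Final confidence: here, the product of the first confidence and the weight value.
    final_confidence = first_confidence * weight
    # Wake up when the final confidence is greater than or equal to the threshold.
    return final_confidence >= threshold
```

A terminal that tolerates fewer false alarms could pass a higher `threshold`, such as the 0.9 mentioned in the text.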
It should be noted that the first confidence level in steps S308 to S312 may be any one of a plurality of first confidence levels, but it is considered that after the first audio data in the target time period is acquired, the first audio data may be signal-processed by a plurality of signal processing methods to obtain a plurality of first target audio data, and then the plurality of first target audio data may be input to a plurality of target decoders to obtain the first confidence levels of the plurality of first target audio data. Since the time taken for the signal processing of the first audio data by the plurality of signal processing methods is different, the time taken for obtaining each first target audio data is different, and therefore, the time taken for each first target audio data to be input into the target decoder is also different, and the time taken for each target decoder to perform decoding processing on the first target audio data is also different, so that the time taken for each target decoder to output the first confidence of the first target audio data is also different finally.
For the above reasons, if the arbitrarily selected first confidence is output slowly, the wake-up delay will be long and the user experience will suffer. Therefore, to avoid an excessively long wake-up delay, each time the first confidence output by one target decoder is obtained, the following wake-up processing mode may be executed according to that first confidence, until the terminal is woken up or until the wake-up processing mode has been executed according to the first confidences output by all of the plurality of target decoders.
Wherein, the awakening processing mode comprises: and determining whether to wake up the terminal according to the second confidence coefficient and the first confidence coefficient output by the target decoder.
For example, if the terminal includes a target decoder a, a target decoder B, and a target decoder C, if the target decoder B outputs the first confidence level first, the wake-up processing mode may be executed according to the first confidence level output by the target decoder B first. If the terminal is confirmed to be awakened after the awakening processing mode is executed, the awakening processing mode is stopped being executed, and the terminal can be directly awakened; if the terminal is determined not to be woken up after the wakening-up processing mode is executed, a first confidence coefficient output by a next target decoder can be obtained, if the target decoder A outputs the first confidence coefficient next, the wakening-up processing mode can be continuously executed according to the first confidence coefficient output by the target decoder A, if the terminal is determined to be woken up according to the wakening-up processing mode, the wakening-up processing mode is stopped to be executed, and the terminal can be wakened up directly; if the terminal is determined not to be woken up after the wakeup processing mode is executed, the first confidence coefficient output by the target decoder C can be obtained, the wakeup processing mode is continuously executed according to the first confidence coefficient output by the target decoder C, and whether the terminal is woken up or not is determined. Therefore, whether the terminal is awakened can be determined according to the first confidence coefficient output firstly without waiting for the first confidence coefficient output by a specific target decoder, so that the awakening delay time can be shortened, and the user experience is improved.
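The decoder A, B, C example above amounts to processing first confidences in their order of arrival and stopping at the first one that leads to a wake-up decision. A minimal sketch under that reading is shown below; the function name, the `(decoder_id, first_confidence)` tuple format, and the `decide` callback (which stands in for the wake-up processing mode) are assumptions for illustration.

```python
def wake_on_first_sufficient(confidence_stream, decide):
    """Run the wake-up processing on each first confidence as it arrives.

    confidence_stream: iterable yielding (decoder_id, first_confidence) in the
        order in which the target decoders produce their outputs.
    decide: callable(decoder_id, first_confidence) -> bool, standing in for the
        wake-up processing mode (for example, weighting plus threshold).
    Returns the id of the decoder whose output triggered the wake-up, or None.
    """
    for decoder_id, first_confidence in confidence_stream:
        if decide(decoder_id, first_confidence):
            # Stop immediately: later decoder outputs are not waited for.
            return decoder_id
    return None
```

Because the loop returns as soon as one decision is positive, outputs from slower decoders are never consumed once the terminal has been woken up.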
By adopting the method, the target confidence can be determined from the second confidence according to the first confidence output by the target decoder, the weight value corresponding to the first confidence is obtained according to the target confidence and the third confidence, the final confidence is determined according to the weight value and the first confidence, and whether to wake up the terminal is determined according to the final confidence. In this way, the first confidence can be adjusted according to the target confidence and the third confidence to obtain a more accurate final confidence, so that the accuracy of the voice recognition system of the terminal is higher. In addition, according to the present disclosure, the wake-up processing mode is executed as soon as the first confidence output by one target decoder is obtained, so that the wake-up efficiency of the terminal can be improved and the user experience is improved.
Fig. 4 is a schematic structural diagram illustrating a wake-up control apparatus according to an exemplary embodiment, where as shown in fig. 4, the apparatus includes:
a data acquisition module 401 configured to acquire multiple channels of audio data;
a signal processing module 402, configured to perform signal processing on each path of first audio data acquired in a target time period, respectively, to obtain a plurality of first target audio data;
a first confidence obtaining module 403, configured to obtain first confidences of a plurality of first target audio data, respectively, where the confidences are used to characterize a probability that the audio data can wake up the terminal;
a second confidence obtaining module 404 configured to obtain a second confidence of the second target audio data in a historical time period, where the historical time period is a preset time period before the target time period, and the second confidence includes the confidence of a plurality of second target audio data;
a wake-up module 405 configured to determine whether to wake up the terminal according to the first confidence level and the second confidence level.
Optionally, the signal processing module 402 includes:
a channel selection submodule configured to select one microphone of the microphone array of the terminal as a reference channel;
a reference data acquisition submodule configured to acquire reference audio data acquired by the reference channel at the target time period;
and the signal processing submodule is configured to perform signal processing on each path of first audio data through a plurality of signal processing modes respectively according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data of different paths are different.
Optionally, the first confidence obtaining module 403 includes:
the processing mode determining sub-module is configured to determine, for each first target audio data in the plurality of first target audio data, a signal processing mode corresponding to the first target audio data;
a decoder determining submodule configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders;
and the confidence level determination submodule is configured to input the first target audio data into the target decoder for decoding processing, and output a first confidence level of the first target audio data.
Optionally, the decoder determining sub-module is configured to:
determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relationship, wherein the decoder association relationship comprises different signal processing modes and the corresponding relationship of the decoders;
and taking the decoder corresponding to the signal processing mode as the target decoder.
Optionally, the wake-up module 405 includes:
a wake-up processing submodule configured to, each time the first confidence output by one target decoder is obtained, execute the following wake-up processing mode according to that first confidence, until the terminal is woken up or until the wake-up processing mode has been executed according to the first confidences output by all of the plurality of target decoders;
the wake-up processing mode comprises the following steps:
and determining whether to wake up the terminal according to the second confidence coefficient and the first confidence coefficient output by the target decoder.
Optionally, the wake-up module includes:
a target confidence determination submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being the confidence obtained by decoding by the same decoder;
the weight value determining submodule is configured to obtain a weight value corresponding to the first confidence degree according to the target confidence degree and a third confidence degree, wherein the third confidence degree comprises other confidence degrees except the target confidence degree in the second confidence degree;
a final confidence coefficient obtaining sub-module configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient;
and the awakening submodule is configured to determine whether to awaken the terminal according to the final confidence level.
Optionally, the weight value determining submodule is further configured to:
obtaining a confidence difference between the target confidence and the third confidence;
and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
Optionally, the wake-up sub-module is further configured to:
and determining to awaken the terminal under the condition that the final confidence coefficient is greater than or equal to a preset confidence coefficient threshold value.
Optionally, the signal processing mode includes blind source separation or noise suppression.
By the device, signal processing is performed on each path of first audio data collected in the target time period to obtain a plurality of first target audio data, and the first confidences of the plurality of first target audio data are respectively obtained; then, the second confidence of the second target audio data in the historical time period may be obtained, and whether to wake up the terminal is determined according to the first confidence and the second confidence. That is to say, the present disclosure may determine whether to wake up the terminal according to the first confidence of the target time period and the second confidence of the historical time period, which may reduce the probability that the terminal is falsely woken up or fails to be woken up, and may thereby improve the accuracy of the voice recognition system.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the wake-up control method provided by the present disclosure.
Fig. 5 is a block diagram illustrating a terminal 500 according to an example embodiment. For example, the terminal 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, terminal 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the terminal 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or some of the steps of the wake-up control method described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the terminal 500. Examples of such data include instructions for any application or method operating on terminal 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The multimedia component 508 includes a screen providing an output interface between the terminal 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 500 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the terminal 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the terminal 500. For example, sensor assembly 514 can detect an open/closed state of terminal 500, relative positioning of components, such as a display and keypad of terminal 500, position changes of terminal 500 or a component of terminal 500, presence or absence of user contact with terminal 500, orientation or acceleration/deceleration of terminal 500, and temperature changes of terminal 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communications between the terminal 500 and other devices in a wired or wireless manner. The terminal 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described wake-up control method.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the terminal 500 to perform the wake-up control method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In another exemplary embodiment, a computer program product is also provided, which contains a computer program executable by a programmable apparatus, the computer program having code portions for performing the wake-up control method described above when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (20)
1. A wake-up control method, the method comprising:
collecting multi-channel audio data;
respectively carrying out signal processing on each path of first audio data acquired in a target time period to obtain a plurality of first target audio data;
respectively obtaining first confidence degrees of a plurality of first target audio data, wherein the confidence degrees are used for representing the probability that the audio data can wake up a terminal;
acquiring a second confidence degree of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence degree comprises the confidence degrees of a plurality of second target audio data;
and determining whether to wake up the terminal according to the first confidence level and the second confidence level.
2. The method according to claim 1, wherein the separately performing signal processing on each path of first audio data collected in a target time period to obtain a plurality of first target audio data comprises:
selecting one microphone of a microphone array of the terminal as a reference channel;
acquiring reference audio data acquired by the reference channel in the target time period;
and according to the reference audio data, performing signal processing on each path of first audio data through a plurality of signal processing modes respectively to obtain a plurality of first target audio data, wherein the signal processing modes of the first audio data in different paths are different.
3. The method according to claim 1 or 2, wherein the respectively obtaining a first confidence level of a plurality of the first target audio data comprises:
determining a signal processing mode corresponding to each first target audio data in the plurality of first target audio data;
determining a target decoder corresponding to the first target audio data according to the signal processing mode, wherein different signal processing modes correspond to different decoders;
and inputting the first target audio data into the target decoder for decoding processing to obtain a first confidence coefficient of the first target audio data.
4. The method of claim 3, wherein determining the target decoder corresponding to the first target audio data according to the signal processing mode comprises:
determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relationship, wherein the decoder association relationship comprises the correspondence relationship between different signal processing modes and the decoders;
and taking the decoder corresponding to the signal processing mode as the target decoder.
5. The method of claim 3, wherein the determining whether to wake up the terminal according to the first confidence level and the second confidence level comprises:
each time a first confidence level output by one target decoder is obtained, executing the following wake-up processing according to that first confidence level, until the terminal is woken up; or executing the wake-up processing according to the first confidence levels output by a plurality of the target decoders;
wherein the wake-up processing comprises:
determining whether to wake up the terminal according to the second confidence level and the first confidence level output by the target decoder.
6. The method of claim 1, wherein the determining whether to wake up the terminal according to the first confidence level and the second confidence level comprises:
determining a target confidence degree from the second confidence degrees, wherein the target confidence degree and the first confidence degree are confidence degrees obtained by decoding through the same decoder;
acquiring a weight value corresponding to the first confidence degree according to the target confidence degree and a third confidence degree, wherein the third confidence degree comprises other confidence degrees except the target confidence degree in the second confidence degree;
determining a final confidence degree according to the weight value and the first confidence degree;
and determining whether to wake up the terminal according to the final confidence level.
7. The method according to claim 6, wherein the obtaining the weight value corresponding to the first confidence level according to the target confidence level and the third confidence level comprises:
obtaining a confidence difference between the target confidence and the third confidence;
and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
8. The method of claim 6, wherein the determining whether to wake up the terminal according to the final confidence level comprises:
determining to wake up the terminal when the final confidence level is greater than or equal to a preset confidence threshold.
9. The method of claim 1, wherein the signal processing comprises blind source separation or noise suppression.
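As an illustration of claims 3 and 4, the following Python sketch looks up a decoder for each signal processing mode through a preset association table and collects one first confidence per processed stream. The table name `DECODER_ASSOCIATION`, the function `first_confidences`, and both toy decoders are hypothetical; the toy decoders only stand in for real wake-word decoders so that the routing can be run, and the claims do not prescribe any of them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Audio = Sequence[float]
Decoder = Callable[[Audio], float]


def bss_decoder(audio: Audio) -> float:
    """Placeholder decoder (assumption): mean absolute amplitude, capped at 1."""
    return min(1.0, sum(abs(x) for x in audio) / len(audio))


def ns_decoder(audio: Audio) -> float:
    """Placeholder decoder (assumption): peak absolute amplitude, capped at 1."""
    return min(1.0, max(abs(x) for x in audio))


# Preset decoder association: signal processing mode -> decoder (claim 4).
DECODER_ASSOCIATION: Dict[str, Decoder] = {
    "blind_source_separation": bss_decoder,
    "noise_suppression": ns_decoder,
}


def first_confidences(processed: List[Tuple[str, Audio]]) -> List[Tuple[str, float]]:
    """Decode each first target audio stream with the decoder tied to its mode."""
    results = []
    for mode, audio in processed:
        target_decoder = DECODER_ASSOCIATION[mode]  # select the target decoder
        results.append((mode, target_decoder(audio)))  # first confidence
    return results
```

Keeping the association in a plain dictionary means that supporting a further signal processing mode only requires one new entry, without changing the routing loop.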
10. A wake-up control apparatus, the apparatus comprising:
the data acquisition module is configured to acquire multi-channel audio data;
the signal processing module is configured to perform signal processing on each path of first audio data acquired in a target time period respectively to obtain a plurality of first target audio data;
a first confidence coefficient obtaining module configured to obtain first confidence coefficients of a plurality of first target audio data, respectively, where the confidence coefficients are used to characterize a probability that the audio data can wake up a terminal;
a second confidence obtaining module configured to obtain a second confidence of second target audio data in a historical time period, wherein the historical time period is a preset time period before the target time period, and the second confidence comprises a plurality of confidences of the second target audio data;
a wake-up module configured to determine whether to wake up the terminal according to the first confidence level and the second confidence level.
11. The apparatus of claim 10, wherein the signal processing module comprises:
a channel selection submodule configured to select one microphone of a microphone array of the terminal as a reference channel;
a reference data acquisition submodule configured to acquire reference audio data acquired by the reference channel at the target time period;
and the signal processing submodule is configured to perform signal processing on each path of the first audio data through a plurality of signal processing modes respectively according to the reference audio data to obtain a plurality of first target audio data, wherein the signal processing modes of different paths of the first audio data are different.
12. The apparatus of claim 10 or 11, wherein the first confidence score obtaining module comprises:
a processing mode determining submodule configured to determine, for each of the plurality of first target audio data, a signal processing mode corresponding to the first target audio data;
a decoder determining submodule configured to determine a target decoder corresponding to the first target audio data according to the signal processing manner, wherein different signal processing manners correspond to different decoders;
and the confidence coefficient determining submodule is configured to input the first target audio data into the target decoder for decoding processing to obtain a first confidence coefficient of the first target audio data.
13. The apparatus of claim 12, wherein the decoder determining submodule is configured to:
determining a decoder corresponding to the signal processing mode from a plurality of decoders through a preset decoder association relationship, wherein the decoder association relationship comprises the correspondence relationship between different signal processing modes and the decoders;
and taking the decoder corresponding to the signal processing mode as the target decoder.
14. The apparatus of claim 12, wherein the wake-up module comprises:
a wake-up processing submodule configured to, each time a first confidence level output by one target decoder is obtained, execute the following wake-up processing according to that first confidence level until the terminal is woken up, or to execute the wake-up processing according to the first confidence levels output by a plurality of the target decoders;
wherein the wake-up processing comprises:
determining whether to wake up the terminal according to the second confidence level and the first confidence level output by the target decoder.
15. The apparatus of claim 10, wherein the wake-up module comprises:
a target confidence determination submodule configured to determine a target confidence from the second confidence, the target confidence and the first confidence being confidences decoded by the same decoder;
the weight value determining submodule is configured to obtain a weight value corresponding to the first confidence degree according to the target confidence degree and a third confidence degree, wherein the third confidence degree comprises confidence degrees other than the target confidence degree in the second confidence degree;
a final confidence coefficient obtaining sub-module configured to obtain a final confidence coefficient according to the weight value and the first confidence coefficient;
and a wake-up submodule configured to determine whether to wake up the terminal according to the final confidence level.
16. The apparatus of claim 15, wherein the weight value determination submodule is further configured to:
obtaining a confidence difference between the target confidence and the third confidence;
and acquiring a weight value corresponding to the first confidence coefficient according to the confidence coefficient difference value and a preset corresponding relation.
17. The apparatus of claim 15, wherein the wake-up sub-module is further configured to:
determining to wake up the terminal when the final confidence level is greater than or equal to a preset confidence threshold.
18. The apparatus of claim 10, wherein the signal processing comprises blind source separation or noise suppression.
19. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 9.
20. A terminal, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 9.
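The weighting logic of claims 6 to 8 (mirrored by claims 15 to 17) can be sketched in Python as follows. This is only one possible reading under stated assumptions: the final confidence is taken as the product of the weight and the first confidence, the confidence difference is measured against the largest of the third confidences, and the step table (`DIFF_BOUNDS`, `WEIGHTS`) and the default threshold of 0.6 are invented for illustration. The claims fix none of these choices, and all names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Tuple

# Hypothetical preset correspondence (claim 7): confidence difference -> weight.
# diff < -0.2 -> 0.8, -0.2 <= diff < 0.2 -> 1.0, diff >= 0.2 -> 1.2
DIFF_BOUNDS = [-0.2, 0.2]
WEIGHTS = [0.8, 1.0, 1.2]


def weight_for(diff: float) -> float:
    """Look up the weight for a confidence difference in the step table."""
    return WEIGHTS[bisect_right(DIFF_BOUNDS, diff)]


def decide_wake(
    decoder_id: str,
    first_conf: float,
    history: Dict[str, float],
    threshold: float = 0.6,
) -> Tuple[bool, float]:
    """Return (wake?, final confidence) for one first confidence.

    `history` holds the second confidences from the historical time period,
    keyed by the decoder that produced them.
    """
    # Target confidence: historical confidence from the same decoder (claim 6).
    target = history[decoder_id]
    # Third confidences: the remaining historical confidences.
    others = [c for d, c in history.items() if d != decoder_id]
    # Assumption: compare against the strongest other decoder.
    diff = target - max(others) if others else 0.0
    # Assumption: final confidence = weight * first confidence.
    final = weight_for(diff) * first_conf
    # Claim 8: wake when the final confidence reaches the preset threshold.
    return final >= threshold, final
```

With `history = {"bss": 0.9, "ns": 0.5}`, a first confidence of 0.55 from the `bss` decoder is boosted above the threshold, while the same value from the `ns` decoder is attenuated below it, which shows how the historical confidences steer the decision.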
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011303745.7A CN112509596B (en) | 2020-11-19 | 2020-11-19 | Wakeup control method, wakeup control device, storage medium and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011303745.7A CN112509596B (en) | 2020-11-19 | 2020-11-19 | Wakeup control method, wakeup control device, storage medium and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112509596A (en) | 2021-03-16 |
CN112509596B (en) | 2024-07-09 |
Family
ID=74959093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011303745.7A Active CN112509596B (en) | 2020-11-19 | 2020-11-19 | Wakeup control method, wakeup control device, storage medium and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112509596B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114229637A (en) * | 2021-12-03 | 2022-03-25 | 北京声智科技有限公司 | Elevator floor determining method, device, equipment and computer readable storage medium |
CN115050013A (en) * | 2022-06-14 | 2022-09-13 | 南京人工智能高等研究院有限公司 | Behavior detection method and device, vehicle, storage medium and electronic equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105654949A (en) * | 2016-01-07 | 2016-06-08 | 北京云知声信息技术有限公司 | Voice wake-up method and device |
US20170125036A1 (en) * | 2015-11-03 | 2017-05-04 | Airoha Technology Corp. | Electronic apparatus and voice trigger method therefor |
TW201717192A (en) * | 2015-11-03 | 2017-05-16 | 絡達科技股份有限公司 | Electronic apparatus and voice trigger method therefor |
CN110047485A (en) * | 2019-05-16 | 2019-07-23 | 北京地平线机器人技术研发有限公司 | Identification wakes up method and apparatus, medium and the equipment of word |
US20190287518A1 (en) * | 2018-03-16 | 2019-09-19 | Wistron Corporation | Speech service control apparatus and method thereof |
CN110428810A (en) * | 2019-08-30 | 2019-11-08 | 北京声智科技有限公司 | A kind of recognition methods, device and electronic equipment that voice wakes up |
CN110838306A (en) * | 2019-11-12 | 2020-02-25 | 广州视源电子科技股份有限公司 | Voice signal detection method, computer storage medium and related equipment |
CN111508493A (en) * | 2020-04-20 | 2020-08-07 | Oppo广东移动通信有限公司 | Voice wake-up method and device, electronic equipment and storage medium |
CN111522592A (en) * | 2020-04-24 | 2020-08-11 | 腾讯科技(深圳)有限公司 | Intelligent terminal awakening method and device based on artificial intelligence |
CN111696562A (en) * | 2020-04-29 | 2020-09-22 | 华为技术有限公司 | Voice wake-up method, device and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114229637A (en) * | 2021-12-03 | 2022-03-25 | 北京声智科技有限公司 | Elevator floor determining method, device, equipment and computer readable storage medium |
CN114229637B (en) * | 2021-12-03 | 2024-02-27 | 北京声智科技有限公司 | Elevator floor determination method, device, equipment and computer readable storage medium |
CN115050013A (en) * | 2022-06-14 | 2022-09-13 | 南京人工智能高等研究院有限公司 | Behavior detection method and device, vehicle, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112509596B (en) | 2024-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3136793A1 (en) | | Method and apparatus for awakening electronic device |
CN112037787B (en) | | Wake-up control method, device and computer readable storage medium |
CN106791893A (en) | | Net cast method and device |
EP3133874A1 (en) | | Method and apparatus for starting energy saving mode |
EP3779968A1 (en) | | Audio processing |
US10230891B2 (en) | | Method, device and medium of photography prompts |
EP3024211B1 (en) | | Method and device for announcing voice call |
CN109087650B (en) | | Voice wake-up method and device |
EP3933570A1 (en) | | Method and apparatus for controlling a voice assistant, and computer-readable storage medium |
CN110349578A (en) | | Equipment wakes up processing method and processing device |
CN112509596B (en) | | Wakeup control method, wakeup control device, storage medium and terminal |
CN111540350B (en) | | Control method, device and storage medium of intelligent voice control equipment |
CN111968680A (en) | | Voice processing method, device and storage medium |
CN109522058B (en) | | Wake-up method, device, terminal and storage medium |
US20170034347A1 (en) | | Method and device for state notification and computer-readable storage medium |
CN112489653B (en) | | Speech recognition method, device and storage medium |
US11561278B2 (en) | | Method and device for processing information based on radar waves, terminal, and storage medium |
CN112019948B (en) | | Intercommunication method for intercom equipment, intercom equipment and storage medium |
CN109788367A (en) | | A kind of information cuing method, device, electronic equipment and storage medium |
CN112489650B (en) | | Wakeup control method, wakeup control device, storage medium and terminal |
CN115579003A (en) | | Voice wake-up method, device and storage medium |
CN109922203A (en) | | Terminal puts out screen method and apparatus |
CN107979695B (en) | | Network message receiving method and device and storage medium |
CN112866480A (en) | | Information processing method, information processing device, electronic equipment and storage medium |
CN108491180B (en) | | Audio playing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||