CN110277095B - Voice service control device and method thereof - Google Patents

Publication number
CN110277095B
Authority
CN
China
Prior art keywords
voice data
value
voice
threshold value
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810325210.6A
Other languages
Chinese (zh)
Other versions
CN110277095A (en)
Inventor
李金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wistron Corp
Original Assignee
Wistron Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wistron Corp
Publication of CN110277095A
Application granted
Publication of CN110277095B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
              • G10L15/063 Training
                • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
                  • G10L2015/0636 Threshold criteria for the updating
            • G10L15/08 Speech classification or search
              • G10L2015/088 Word spotting
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L2015/223 Execution procedure of a spoken command
            • G10L15/28 Constructional details of speech recognition systems
              • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/78 Detection of presence or absence of voice signals
              • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
                • G10L2025/786 Adaptive threshold

Abstract

The invention provides a voice service control device and a method thereof. In the method, voice data is obtained. A keyword in the voice data is identified to determine a confidence value corresponding to the keyword, where the confidence value is the degree to which the keyword conforms to a wake-up keyword used to request a voice service. In response to the confidence value being smaller than a recognition threshold value, an accumulated failure count is determined. The voice service is requested in response to the confidence value being larger than the recognition threshold value, and the accumulated failure count is the number of times, within a time interval, that the confidence values of the voice data and of at least one piece of previous voice data were smaller than the recognition threshold value. The recognition threshold value is then adjusted according to the accumulated failure count and an operational relationship among the confidence values of the voice data and the previous voice data. The user can thereby start the voice service smoothly.

Description

Voice service control device and method thereof
Technical Field
The present invention relates to voice control technology, and more particularly, to a voice service control apparatus and method based on voice control technology.
Background
In recent years, network service providers have introduced voice assistants and related voice services, home appliance makers have introduced voice-controlled appliances, and other electronic equipment manufacturers have added voice control to their products, allowing users to control the operation of various electronic devices by voice (e.g., power on, weather broadcast, music playback, etc.). To meet user requirements and improve product usability, some manufacturers have even opened the related source code so that third-party developers can customize services or combine peripheral application services. In this source code, a developer can define a wake-up keyword (e.g., Alexa, Cortana, Hey Siri, OK Google, etc.) so that a server or a program is requested through that specific wake-up keyword to obtain a corresponding voice service.
However, users in different regions may pronounce the wake-up keyword differently and with different accents, and different voice control devices (e.g., computers, mobile phones, smart speakers, etc.) may use different sound receiving devices (e.g., microphones) or recognize voice data through different sound receiving algorithms. As a result, the same user speaking the same wake-up keyword to different voice control devices may get different results (e.g., a call to device A successfully obtains the corresponding voice service, but a call to device B fails to issue a request). The existing voice service control technology therefore still has shortcomings.
Disclosure of Invention
In view of the above, the present invention provides a voice service control device and a method thereof, which learn from several of a user's calls of the wake-up keyword and can thereby effectively avoid failures to start the voice service.
The voice service control method of the present invention includes the following steps. Voice data is obtained. A keyword in the voice data is identified to determine a confidence value corresponding to the keyword, where the confidence value is the degree to which the keyword conforms to a wake-up keyword used to request a voice service. In response to the confidence value being smaller than a recognition threshold value, an accumulated failure count is determined. The voice service is requested in response to the confidence value being larger than the recognition threshold value, and the accumulated failure count is the number of times, within a time interval, that the confidence values of the voice data and of at least one piece of previous voice data were smaller than the recognition threshold value. The recognition threshold value is adjusted according to the accumulated failure count and an operational relationship among the confidence values of the voice data and the previous voice data.
The voice service control device of the invention comprises a sound receiving device and a processor. The sound receiving device receives voice data. The processor is coupled to the sound receiving device and configured to perform the following steps. A keyword in the voice data is identified to determine a confidence value corresponding to the keyword, where the confidence value is the degree to which the keyword conforms to a wake-up keyword used to request a voice service. In response to the confidence value being smaller than a recognition threshold value, an accumulated failure count is determined. The voice service is requested in response to the confidence value being larger than the recognition threshold value, and the accumulated failure count is the number of times, within a time interval, that the confidence values of the voice data and of at least one piece of previous voice data were smaller than the recognition threshold value. The recognition threshold value is adjusted according to the accumulated failure count and an operational relationship among the confidence values of the voice data and the previous voice data.
Based on the above, the recognition threshold value is key to activating the voice service. The voice service control device and the method thereof of the embodiments of the invention respond to the situation in which the voice service has repeatedly failed to be requested, and lower the recognition threshold value based on the confidence values of the voice data from those failed requests, so that the user's subsequent calls can successfully request the voice service. On the other hand, if the voice service is successfully requested even though the user did not call the wake-up keyword, the embodiments of the invention can raise the recognition threshold value in a timely manner, making it harder for external sounds to successfully request the voice service.
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a voice service system according to an embodiment of the invention.
Fig. 2 is a flowchart of a voice service control method according to an embodiment of the present invention.
Fig. 3 is a flowchart of an application scenario.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by persons skilled in the art without any inventive step based on the embodiments of the present invention, belong to the protection scope of the present invention.
Fig. 1 is a schematic diagram of a voice service system 1 according to an embodiment of the present invention. Referring to Fig. 1, the voice service system 1 includes a voice service control device 110 and a voice service providing server 150.
The voice service control device 110 may be a smart phone, a tablet computer, a desktop computer, a notebook computer, a voice assistant, a smart multimedia device, a smart speaker, or a smart home appliance. The voice service control device 110 includes, but is not limited to, a sound receiving device 111, an input/output interface 112, a processor 113, and a memory 114.
The sound receiving device 111 includes, but is not limited to, an omnidirectional microphone, a directional microphone, or another electronic component capable of receiving sound waves (e.g., human voice, environmental sound, machine operation sound, etc.) and converting them into audio signals, together with an analog-to-digital converter, a filter, and an audio processor. In the present embodiment, the sound receiving device 111 generates digital voice data (or audio data) in response to receiving sound waves.
The input/output interface 112 may be a network interface card supporting communication technologies such as Wi-Fi, mobile communication, or Ethernet, or a transmission interface such as any of various serial or parallel buses. In the present embodiment, the input/output interface 112 exchanges data with external devices.
The processor 113 is coupled to the sound receiving device 111 and the input/output interface 112, and may be a central processing unit (CPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), another similar component, or a combination thereof. In the embodiment of the present invention, the processor 113 performs all operations of the voice service control device 110, acquires and processes the voice data generated by the sound receiving device 111, and transmits data through the input/output interface 112.
The memory 114 is coupled to the processor 113 and may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a similar component, or a combination thereof. The memory 114 stores software programs, related voice data, related values (for example, confidence values, the recognition threshold value, the variance values, the maximum threshold value, etc., described in detail in the following embodiments), and the equations relating to the confidence values that are used to execute the voice service control method of the present invention (described in the following embodiments). These software programs, data, values, and equations may be loaded and executed or used by the processor 113.
The voice service providing server 150 may be a personal computer, a notebook computer, a workstation, or any of various types of servers. The voice service providing server 150 receives a service request and recognizes the voice data in the service request using speech-to-text and semantic analysis techniques to understand the content of the service request. The voice service providing server 150 determines whether the content of the service request matches a voice function that it provides (e.g., keyword query, music playback, calendar reminder, etc.), so as to provide the corresponding voice service.
To facilitate understanding of the operation flow of the present invention, several embodiments are described in detail below. Fig. 2 is a flowchart of a voice service control method according to an embodiment of the present invention. Referring to Fig. 2, the method of this embodiment is described with reference to the components and modules of the voice service control device 110 in Fig. 1. The steps of the method may be adapted to the implementation and are not limited to those described here.
After the processor 113 obtains the voice data through the sound receiving device 111 (step S210), it recognizes a keyword in the voice data to determine the confidence value corresponding to the keyword (step S220). In this embodiment, the processor 113 recognizes the voice data through speech-to-text and semantic analysis techniques to obtain the sentence content of the voice data. The processor 113 detects whether the sentence content is or contains a specific wake-up keyword (e.g., Alexa, Cortana, Hey Siri, OK Google, etc.) used to initiate a voice service request. However, the recognition of the sentence content by the processor 113 inevitably involves some error. The processor 113 therefore determines the degree to which the sentence content conforms to the wake-up keyword (i.e., the confidence value, usually between zero and one) before deciding whether to request the service. Assuming that the sound receiving device 111 receives the sound waves of the user calling the wake-up keyword, the voice data converted from those sound waves will include a keyword (contained in the sentence content) related to the wake-up keyword. The processor 113 then takes the degree to which this keyword conforms to the wake-up keyword as the confidence value of the current voice data.
This confidence value determines whether the processor 113 issues a service request. The processor 113 determines whether the confidence value of the current voice data is greater than a recognition threshold value (between zero and one, e.g., 0.6, 0.55, etc.). If the confidence value is greater than the recognition threshold value, the processor 113 issues a service request. Conversely, if the confidence value is less than the recognition threshold value, the processor 113 does not issue (or ignores, or disables) the service request. In the prior art, the recognition threshold value is fixed. Consequently, if a user's call of the wake-up keyword fails to make the voice control device provide the service, a conventional voice control device may still fail to issue a service request even when the same user calls the wake-up keyword repeatedly, because the corresponding confidence values are all smaller than the fixed recognition threshold value.
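The decision rule and the weakness of a fixed threshold can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `should_request_service` and the sample confidence values are hypothetical.

```python
def should_request_service(confidence: float, threshold: float) -> bool:
    """Issue a service request only when the confidence value exceeds the threshold."""
    return confidence > threshold


# With a fixed threshold, a user whose calls always score slightly below it
# never succeeds, no matter how often the wake-up keyword is repeated.
FIXED_THRESHOLD = 0.6
calls = [0.5, 0.56, 0.45]  # hypothetical confidence values of repeated calls
results = [should_request_service(c, FIXED_THRESHOLD) for c in calls]
print(results)  # [False, False, False]
```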
To solve the foregoing problem, the embodiments of the present invention learn the confidence values corresponding to several of the user's calls of the wake-up keyword and then adjust the recognition threshold value according to those confidence values, as described in detail below.
In response to the confidence value being less than the recognition threshold value, the processor 113 determines the accumulated failure count (step S230). The accumulated failure count is the number of times, within a time interval (e.g., 3 seconds, 5 seconds, etc.), that the confidence values of the current voice data and of at least one piece of previous voice data were less than the recognition threshold value. Each time the processor 113 determines that the confidence value of the current voice data is smaller than the recognition threshold value, the accumulated failure count is increased by one.
It should be noted that, in some embodiments, the accumulated failure count is the number of times that the confidence values of consecutively obtained voice data (the current voice data and at least one piece of previous voice data) within a period of time were smaller than the recognition threshold value. That is, the processor 113 increments the accumulated failure count only when it detects the user calling the wake-up keyword in consecutive pieces of voice data. In practice, however, the user may inadvertently mix in sentence content other than the wake-up keyword between calls, and would then need to repeat the wake-up keyword several more times. The consecutive-detection mechanism is therefore somewhat strict but avoids misjudgment, and those applying the embodiments of the present invention can decide whether to require consecutiveness according to the situation. In addition, each time the time interval expires, the processor 113 resets the accumulated failure count to zero and starts accumulating again.
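The counting behaviour described above might be sketched as follows. The class name `FailureCounter`, the explicit `now` timestamps, and the choice to start the interval at the first voice data received after a reset are assumptions made for this example; the text does not specify how the interval is anchored, and this sketch does not enforce the optional consecutiveness condition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureCounter:
    """Counts failed wake-up attempts within one time interval (hypothetical helper)."""
    interval_s: float = 5.0                # length of the time interval (assumed value)
    window_start: Optional[float] = None   # timestamp at which the interval began
    failed: List[float] = field(default_factory=list)  # confidences of failed calls

    def record(self, confidence: float, threshold: float, now: float) -> int:
        """Register one piece of voice data and return the accumulated failure count."""
        # Reset the count once the current interval has expired.
        if self.window_start is None or now - self.window_start >= self.interval_s:
            self.window_start = now
            self.failed = []
        if confidence < threshold:
            self.failed.append(confidence)
        return len(self.failed)


counter = FailureCounter(interval_s=5.0)
counts = [counter.record(c, 0.6, t) for c, t in [(0.5, 0.0), (0.56, 1.5), (0.45, 3.0)]]
late = counter.record(0.3, 0.6, 9.0)  # arrives after the interval expired
print(counts, late)  # [1, 2, 3] 1
```

Keeping the failed confidence values rather than a bare integer is a design choice of the sketch: the later threshold adjustment needs those values.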
The processor 113 then adjusts the recognition threshold value according to the accumulated failure count and an operational relationship among the confidence values of the voice data and the previous voice data (step S240). Specifically, the processor 113 determines whether the current accumulated failure count is greater than a count threshold (an integer greater than one, e.g., 2, 3, 5, etc.). In response to the accumulated failure count not being greater than the count threshold, the processor 113 continues to recognize subsequent voice data. In response to the accumulated failure count being greater than the count threshold, the processor 113 lowers the recognition threshold value according to the operational relationship among the confidence values of the voice data and the previous voice data.
In one embodiment, the processor 113 selects, from the confidence values of the voice data and the previous voice data, at least one of the largest confidence values (e.g., the largest two or three), and then uses the average of the selected confidence values and the recognition threshold value as the adjusted recognition threshold value. Since the confidence values of the voice data and the previous voice data are all less than the initial recognition threshold value, the average of these confidence values and the initial recognition threshold value is also less than the initial recognition threshold value, so the recognition threshold value is lowered. For example, if the confidence values are 0.5, 0.56, 0.45, and 0.3, the processor 113 takes the two largest confidence values, 0.5 and 0.56, and averages them with the current recognition threshold value 0.6 to obtain approximately 0.55 as the adjusted recognition threshold value.
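The averaging step can be illustrated with a short Python sketch. The function name `lower_threshold` and the parameter `top_k` are hypothetical, and the sample input reuses the confidence values 0.56, 0.55, and 0.5 from the Fig. 3 scenario; the sketch assumes every value passed in belongs to a failed call.

```python
from typing import Sequence


def lower_threshold(confidences: Sequence[float], threshold: float, top_k: int = 2) -> float:
    """Average the top_k largest failed confidence values with the current threshold."""
    largest = sorted(confidences, reverse=True)[:top_k]
    # The threshold itself counts as one more term of the average.
    return (sum(largest) + threshold) / (len(largest) + 1)


new_threshold = lower_threshold([0.56, 0.55, 0.5], threshold=0.6)
print(round(new_threshold, 2))  # 0.57
```

Because every input is below the current threshold, the mean is below it too, so the function can only lower the threshold or, for an empty input, leave it unchanged.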
In another embodiment, the processor 113 selects, from the confidence values of the voice data and the previous voice data, those that are greater than a minimum threshold value, where the minimum threshold value is the recognition threshold value minus a first variance value (between zero and one, e.g., 0.05, 0.08, etc.). The processor 113 may then average the selected confidence values directly with the recognition threshold value, or first pick at least one of the largest selected confidence values and average those with the recognition threshold value, to obtain the adjusted recognition threshold value. For example, assuming that the confidence values are 0.2, 0.5, 0.56, 0.45, and 0.3 and that the minimum threshold value is the recognition threshold value 0.6 minus the first variance value 0.05, i.e., 0.55, the processor 113 selects the confidence value 0.56, which is greater than 0.55, and averages it with the current recognition threshold value 0.6 to obtain 0.58 as the adjusted recognition threshold value. This embodiment thus sets a lower bound on the confidence values that may be used for the adjustment. The lower bound prevents the recognition threshold value from being lowered so far that voice data that does not correctly pronounce the keyword (or environmental noise) could start the service of the voice control device at will.
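A sketch of this variant with a minimum threshold value follows. The function name `lower_threshold_with_floor` and its parameters are hypothetical, and leaving the threshold unchanged when no confidence value clears the lower bound is an assumption of the sketch; the text does not describe that case.

```python
from typing import Sequence


def lower_threshold_with_floor(confidences: Sequence[float], threshold: float,
                               sigma1: float = 0.05, top_k: int = 1) -> float:
    """Lower the threshold using only confidence values above threshold - sigma1."""
    floor = threshold - sigma1  # the minimum threshold value
    eligible = sorted((c for c in confidences if c > floor), reverse=True)[:top_k]
    if not eligible:
        # Assumption: with no eligible value, keep the threshold as it is.
        return threshold
    return (sum(eligible) + threshold) / (len(eligible) + 1)


adjusted = lower_threshold_with_floor([0.2, 0.5, 0.56, 0.45, 0.3], threshold=0.6)
print(round(adjusted, 2))  # 0.58
```

Low scores such as 0.2 or 0.3, which are more likely to come from noise than from a genuine call, never enter the average, so they cannot drag the threshold down.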
It should be noted that the foregoing embodiments determine the recognition threshold value by averaging, but the recognition threshold value may be adjusted in many other ways. For example, weights may be assigned to the confidence values and to the recognition threshold value, or the first variance value may be subtracted from the largest confidence value to obtain the recognition threshold value. The approach can be chosen according to the actual requirements of the user.
After being lowered, the recognition threshold value is therefore likely to be close to or smaller than the confidence values of the voice data corresponding to the user's calls, so that the user's calls can successfully request the voice service.
The foregoing description concerns lowering the recognition threshold value. In some cases, however, the recognition threshold value may become too low, so that the confidence value of an environmental sound exceeds the recognition threshold value and the voice control device mistakenly issues a service request. To reduce the occurrence of this situation, in response to the confidence value of the voice data not being less than the recognition threshold value, the processor 113 sends a service request to the voice service providing server 150 through the input/output interface 112. The service request includes the voice data obtained by the sound receiving device 111. The voice service providing server 150 determines whether the sentence content recorded in the voice data matches a voice function provided by the voice service providing server 150 (e.g., how is tomorrow.
The processor 113 receives the service response through the input/output interface 112 and determines whether the service response indicates that the voice data does not match any voice function provided by the voice service providing server 150. In response to the service response indicating such a mismatch, which means the confidence value produced a false positive, the processor 113 adjusts the recognition threshold value. In the present embodiment, the processor 113 has a maximum threshold value, which is determined from the recognition threshold value and is greater than it. For example, the maximum threshold value is the recognition threshold value plus a second variance value (between zero and one, e.g., 0.05, 0.03, etc.; in some embodiments equal to the first variance value). In response to the confidence value of the voice data being less than the maximum threshold value, the processor 113 uses the confidence value of the voice data as the recognition threshold value. Since the confidence value corresponding to a successful service request is greater than the recognition threshold value, setting the recognition threshold value to that confidence value raises it. On the other hand, in response to the confidence value of the voice data not being less than the maximum threshold value, the processor 113 uses the maximum threshold value as the recognition threshold value, so that a single adjustment does not raise the recognition threshold value too far. That is, the embodiments of the present invention raise the recognition threshold value by learning the confidence values that caused a misjudgment, so that environmental sound with such a confidence value can no longer start the voice service.
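The raising step amounts to clamping the new threshold at the maximum threshold value. The following Python sketch shows this; the function name `raise_threshold` is hypothetical, and the inputs are taken from the Fig. 3 scenario (recognition threshold value 0.57, second variance value 0.05).

```python
def raise_threshold(confidence: float, threshold: float, sigma2: float = 0.05) -> float:
    """Raise the threshold after a false activation, capped at threshold + sigma2."""
    upper_bound = threshold + sigma2  # the maximum threshold value
    # Use the offending confidence value itself unless it reaches the cap.
    return confidence if confidence < upper_bound else upper_bound


print(round(raise_threshold(0.60, threshold=0.57), 2))  # 0.6
print(round(raise_threshold(0.63, threshold=0.57), 2))  # 0.62
```

The cap limits how far a single false activation can move the threshold, so one unusually loud noise does not lock out the user's next genuine call.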
It should be noted that the recognition threshold value may be raised in many other ways, for example by using the confidence value plus the second variance value as the recognition threshold value, or by adding half of the second variance value to the recognition threshold value to obtain the adjusted recognition threshold value. The approach can be chosen according to the actual requirements of the user.
On the other hand, if the voice data has a matching voice function, the service response includes a corresponding voice service (e.g., transmitting weather information, today's trip content, music streaming, etc.), so that the processor 113 can execute the voice function corresponding to the voice service (e.g., displaying weather information, displaying today's trip, playing music, etc.).
It should be noted that the voice service control device 110 of the foregoing embodiment is connected to the voice service providing server 150 remotely or by wire. In some embodiments, however, the voice service control device 110 may provide an offline voice service, in which case the processor 113 directly determines whether the voice data matches a voice function and provides the voice service accordingly. That is, both the service request and the service response are handled by the processor 113, and the input/output interface 112 may be omitted.
To help the reader understand the spirit of the embodiments of the present invention, an application scenario is described below.
Fig. 3 is a flowchart of the application scenario. Assume that the recognition threshold value is 0.6, the first variance value and the second variance value are both 0.05, and the count threshold is two. When the sound receiving device 111 receives the user's call and generates voice data, the processor 113 starts recognizing the voice data (step S310) and determines whether the wake-up keyword is detected (step S315). If the wake-up keyword is not detected, the process returns to step S310, and the processor 113 continues to recognize the next received voice data. If the wake-up keyword is detected, the processor 113 obtains the confidence value corresponding to the voice data (step S320) and determines whether the confidence value is greater than the recognition threshold value (step S325). Assuming that the confidence value is 0.5, which is not greater than the recognition threshold value, the service request is not issued (step S330). The processor 113 then determines whether the accumulated failure count is greater than the count threshold (step S335). Assuming that the accumulated failure count is three, the processor 113 adjusts the recognition threshold value according to equation (1) (step S340):
$$LB = \frac{LB + \sum \operatorname{max2}\left(V_i, V_{i-1}, V_{i-2}\right)}{3} \quad \ldots(1)$$
restricted to $LB - \sigma_1 \le V_i, V_{i-1}, V_{i-2} \le LB$
where $LB$ is the recognition threshold value, $\operatorname{max2}(\cdot)$ denotes the two largest of its arguments, $V_i$, $V_{i-1}$, and $V_{i-2}$ are the confidence values corresponding to the current voice data and to the two preceding pieces of voice data in which the wake-up keyword was detected, and $\sigma_1$ is the first variance value (i.e., 0.05). For example, if $V_i$, $V_{i-1}$, and $V_{i-2}$ are 0.56, 0.55, and 0.5, respectively, the processor 113 takes the confidence values $V_i$ and $V_{i-1}$ (0.56 and 0.55 are both greater than or equal to $LB - \sigma_1 = 0.55$) and substitutes them into equation (1), giving an adjusted recognition threshold value of $(0.6 + 0.56 + 0.55)/3 = 0.57$ (lower than the initial value of 0.6).
If the confidence value corresponding to the next received voice data is 0.63, it is greater than the adjusted recognition threshold value (0.57), so the processor 113 successfully issues a service request to the voice service providing server 150 (step S350). After receiving the service response through the input/output interface 112, the processor 113 determines whether the service response corresponds to any voice function (step S355).
Assuming that the voice data originated from environmental sound and therefore does not correspond to any voice function, the processor 113 adjusts the recognition threshold value according to equations (2) and (3) (step S370):
UB=LB+σ2…(2)
Figure GDA0003015397180000081
wherein sigma2The second variance (i.e., 0.05), UB is the highest threshold (i.e., 0.62 is the recognition threshold 0.57 plus the second variance 0.05), and V is the confidence level of the current speech data (i.e., 0.63). Since the confidence value of the current speech data is greater than the maximum threshold, the maximum threshold is used as the adjusted recognition threshold.
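The raising step of equations (2) and (3) can be sketched in the same spirit. This is a hedged illustration based on the description above and on claims 7 to 9: the new threshold is the confidence value of the falsely accepted voice data, capped at the highest threshold UB. The function name `raise_threshold` is hypothetical.

```python
def raise_threshold(lb, confidence, sigma2=0.05):
    """Return a raised recognition threshold (illustrative sketch).

    lb         -- current recognition threshold (LB)
    confidence -- confidence value V of the voice data that matched no voice function
    sigma2     -- second variance value, used to build the highest threshold UB
    """
    ub = lb + sigma2  # equation (2): highest threshold
    # Equation (3): use V when it is below UB, otherwise cap at UB.
    return confidence if confidence < ub else ub


print(round(raise_threshold(0.57, 0.63), 2))  # V exceeds UB, so UB is used
print(round(raise_threshold(0.57, 0.60), 2))  # V is below UB, so V is used
```

Capping at UB limits how far a single environmental sound can push the threshold upward, so one loud false trigger does not make the device much harder to wake.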
Conversely, if the voice data originates from a human voice and corresponds to a voice function, the processor 113 maintains the recognition threshold and executes the corresponding voice function (step S360).
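The decision flow of steps S325 to S370 can be put together in one small sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, the flag `matches_voice_function` stands in for the service response from the voice service providing server, the failure count is reset on a successful wake-up (the text does not specify a reset policy), and the time interval over which failures are accumulated is omitted.

```python
from collections import deque


class WakeThresholdController:
    """Illustrative model of steps S325-S370; names and reset policy are assumptions."""

    EPS = 1e-9  # tolerance for floating-point comparisons at the band edges

    def __init__(self, threshold=0.6, sigma1=0.05, sigma2=0.05, count_threshold=2):
        self.threshold = threshold
        self.sigma1 = sigma1
        self.sigma2 = sigma2
        self.count_threshold = count_threshold
        self.failures = 0
        self.recent = deque(maxlen=3)  # confidence values of the last three failed calls

    def on_wake_keyword(self, confidence, matches_voice_function=True):
        """Process one utterance in which the wake-up keyword was detected."""
        if confidence <= self.threshold:              # S325 fails, so S330
            self.failures += 1
            self.recent.append(confidence)
            if self.failures > self.count_threshold:  # S335
                self._lower()                         # S340
            return "rejected"
        self.failures = 0                             # assumption: success resets the count
        self.recent.clear()
        if matches_voice_function:                    # S355, then S360
            return "executed"
        self._raise(confidence)                       # S370
        return "no_function"

    def _lower(self):
        # Average the threshold with the two largest values in [LB - sigma1, LB].
        low = self.threshold - self.sigma1 - self.EPS
        high = self.threshold + self.EPS
        top = sorted((v for v in self.recent if low <= v <= high), reverse=True)[:2]
        if top:
            self.threshold = (self.threshold + sum(top)) / (len(top) + 1)

    def _raise(self, confidence):
        # New threshold is the confidence value, capped at UB = LB + sigma2.
        self.threshold = min(confidence, self.threshold + self.sigma2)


ctrl = WakeThresholdController()
for value in (0.5, 0.55, 0.56):
    ctrl.on_wake_keyword(value)
print(round(ctrl.threshold, 2))  # lowered after the third failure
ctrl.on_wake_keyword(0.63, matches_voice_function=False)
print(round(ctrl.threshold, 2))  # raised after a request that matched no voice function
```

Replaying the scenario from the description, three failed calls lower the threshold from 0.6 to 0.57, and the environmental sound with confidence 0.63 then raises it to 0.62.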
In summary, the voice service control device and method according to the embodiments of the present invention determine whether the user has repeatedly failed to trigger the wake-up keyword, and then lower the recognition threshold according to an operational relationship of the confidence values corresponding to the failed calls, so that the user can successfully start the voice service. Conversely, to avoid false triggers caused by the recognition threshold remaining lower than the confidence value corresponding to environmental sound, the embodiments of the present invention also determine whether the voice data actually requests a voice function, and raise the recognition threshold when the voice data does not match any voice function.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (18)

1. A voice service control method, comprising:
obtaining voice data;
identifying a keyword in the voice data to determine a confidence value corresponding to the keyword, wherein the confidence value is a degree of conformity of the keyword with a wake-up keyword for requesting a voice service;
determining an accumulated failure count in response to the confidence value being less than a recognition threshold, wherein the voice service is requested in response to detecting that the confidence value is greater than the recognition threshold, and the accumulated failure count is the accumulated number of times that the confidence values of the voice data and at least one previous voice data are less than the recognition threshold within a time interval; and
adjusting the recognition threshold according to the accumulated failure count and an operational relationship between the confidence values of the voice data and the at least one previous voice data;
wherein, after determining the confidence value corresponding to the keyword, the method further comprises:
sending a service request in response to the confidence value of the voice data not being less than the recognition threshold, wherein the service request comprises the voice data;
receiving a service response in response to the service request;
determining whether the service response is associated with the voice data not matching at least one voice function; and
adjusting the recognition threshold in response to the service response being associated with the voice data not matching the at least one voice function.
2. The method of claim 1, wherein the step of adjusting the recognition threshold according to the accumulated failure count and the operational relationship between the confidence values of the voice data and the at least one previous voice data comprises:
determining whether the accumulated failure count is greater than a count threshold, wherein the count threshold is greater than one; and
lowering the recognition threshold according to the operational relationship between the confidence values of the voice data and the at least one previous voice data in response to the accumulated failure count being greater than the count threshold.
3. The method of claim 2, wherein the step of lowering the recognition threshold according to the operational relationship between the confidence values of the voice data and the at least one previous voice data comprises:
taking an average of the recognition threshold and the confidence value of at least one of the voice data and the at least one previous voice data as the adjusted recognition threshold.
4. The method of claim 3, further comprising, before taking the average of the recognition threshold and the confidence value of at least one of the voice data and the at least one previous voice data as the adjusted recognition threshold:
obtaining at least one largest confidence value among the confidence values of the voice data and the at least one previous voice data.
5. The method of claim 3, further comprising, before taking the average of the recognition threshold and the confidence value of at least one of the voice data and the at least one previous voice data as the adjusted recognition threshold:
obtaining at least one of the confidence values of the voice data and the at least one previous voice data that is greater than a minimum threshold, wherein the minimum threshold is the recognition threshold minus a variance value.
6. The method of claim 1, wherein the accumulated failure count is the accumulated number of times that the confidence values of the voice data and the at least one previous voice data, obtained consecutively, are less than the recognition threshold within the time interval.
7. The method of claim 1, wherein the step of adjusting the recognition threshold comprises:
determining a highest threshold according to the recognition threshold, wherein the highest threshold is greater than the recognition threshold; and
raising the recognition threshold according to the highest threshold.
8. The method of claim 7, wherein the step of raising the recognition threshold according to the highest threshold comprises:
taking the confidence value of the voice data as the recognition threshold in response to the confidence value of the voice data being less than the highest threshold; and
taking the highest threshold as the recognition threshold in response to the confidence value of the voice data not being less than the highest threshold.
9. The method of claim 7, wherein the highest threshold is the recognition threshold plus a variance value.
10. A voice service control device, comprising:
a sound receiving device for obtaining voice data; and
a processor coupled to the sound receiving device and configured to perform:
identifying a keyword in the voice data to determine a confidence value corresponding to the keyword, wherein the confidence value is a degree of conformity of the keyword with a wake-up keyword for requesting a voice service;
determining an accumulated failure count in response to the confidence value being less than a recognition threshold, wherein the voice service is requested in response to detecting that the confidence value is greater than the recognition threshold, and the accumulated failure count is the accumulated number of times that the confidence values of the voice data and at least one previous voice data are less than the recognition threshold within a time interval; and
adjusting the recognition threshold according to the accumulated failure count and an operational relationship between the confidence values of the voice data and the at least one previous voice data;
wherein the voice service control device further comprises:
an input/output interface coupled to the processor for receiving and transmitting data; and the processor is further configured to perform:
sending a service request through the input/output interface in response to the confidence value of the voice data not being less than the recognition threshold, wherein the service request comprises the voice data;
receiving a service response responding to the service request through the input/output interface;
determining whether the service response is associated with the voice data not matching at least one voice function; and
adjusting the recognition threshold in response to the service response being associated with the voice data not matching the at least one voice function.
11. The voice service control device of claim 10, wherein the processor is configured to perform:
determining whether the accumulated failure count is greater than a count threshold, wherein the count threshold is greater than one; and
lowering the recognition threshold according to the confidence values of the voice data and the at least one previous voice data in response to the accumulated failure count being greater than the count threshold.
12. The voice service control device of claim 11, wherein the processor is configured to perform:
taking an average of the recognition threshold and the confidence value of at least one of the voice data and the at least one previous voice data as the adjusted recognition threshold.
13. The voice service control device of claim 12, wherein the processor is configured to perform:
obtaining at least one largest confidence value among the confidence values of the voice data and the at least one previous voice data.
14. The voice service control device of claim 12, wherein the processor is configured to perform:
obtaining at least one of the confidence values of the voice data and the at least one previous voice data that is greater than a minimum threshold, wherein the minimum threshold is the recognition threshold minus a variance value.
15. The voice service control device of claim 10, wherein the accumulated failure count is the accumulated number of times that the confidence values of the voice data and the at least one previous voice data, obtained consecutively, are less than the recognition threshold within the time interval.
16. The voice service control device of claim 10, wherein the processor is configured to perform:
determining a highest threshold according to the recognition threshold, wherein the highest threshold is greater than the recognition threshold; and
raising the recognition threshold according to the highest threshold.
17. The voice service control device of claim 16, wherein the processor is configured to perform:
taking the confidence value of the voice data as the recognition threshold in response to the confidence value of the voice data being less than the highest threshold; and
taking the highest threshold as the recognition threshold in response to the confidence value of the voice data not being less than the highest threshold.
18. The voice service control device of claim 16, wherein the highest threshold is the recognition threshold plus a variance value.
CN201810325210.6A 2018-03-16 2018-04-12 Voice service control device and method thereof Active CN110277095B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW107108937 2018-03-16
TW107108937A TWI682385B (en) 2018-03-16 2018-03-16 Speech service control apparatus and method thereof

Publications (2)

Publication Number Publication Date
CN110277095A (en) 2019-09-24
CN110277095B (en) 2021-06-18

Family

ID=63012890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810325210.6A Active CN110277095B (en) 2018-03-16 2018-04-12 Voice service control device and method thereof

Country Status (4)

Country Link
US (1) US10755696B2 (en)
EP (1) EP3540730B1 (en)
CN (1) CN110277095B (en)
TW (1) TWI682385B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111128155B (en) * 2019-12-05 2020-12-01 珠海格力电器股份有限公司 Awakening method, device, equipment and medium for intelligent equipment
KR20210079004A (en) * 2019-12-19 2021-06-29 삼성전자주식회사 A computing apparatus and a method of operating the computing apparatus
CN111816178A (en) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 Voice equipment control method, device and equipment
EP4191577A4 (en) * 2020-09-25 2024-01-17 Samsung Electronics Co Ltd Electronic device and control method therefor
CN112509596A (en) * 2020-11-19 2021-03-16 北京小米移动软件有限公司 Wake-up control method and device, storage medium and terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015045689A (en) * 2013-08-27 2015-03-12 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Method for evaluating voice recognition result about voice recognition system, computer and computer program for the same
CN105912725A (en) * 2016-05-12 2016-08-31 上海劲牛信息技术有限公司 System for calling vast intelligence applications through natural language interaction
CN107659847A (en) * 2016-09-22 2018-02-02 腾讯科技(北京)有限公司 Voice interface method and apparatus

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7895039B2 (en) 2005-02-04 2011-02-22 Vocollect, Inc. Methods and systems for optimizing model adaptation for a speech recognition system
KR100679044B1 (en) * 2005-03-07 2007-02-06 삼성전자주식회사 Method and apparatus for speech recognition
US8396715B2 (en) * 2005-06-28 2013-03-12 Microsoft Corporation Confidence threshold tuning
JP5725028B2 (en) * 2010-08-10 2015-05-27 日本電気株式会社 Speech segment determination device, speech segment determination method, and speech segment determination program
JP5949550B2 (en) * 2010-09-17 2016-07-06 日本電気株式会社 Speech recognition apparatus, speech recognition method, and program
US8639508B2 (en) * 2011-02-14 2014-01-28 General Motors Llc User-specific confidence thresholds for speech recognition
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
US9704486B2 (en) * 2012-12-11 2017-07-11 Amazon Technologies, Inc. Speech recognition power management
US20140337031A1 (en) 2013-05-07 2014-11-13 Qualcomm Incorporated Method and apparatus for detecting a target keyword
JP2015155975A (en) * 2014-02-20 2015-08-27 ソニー株式会社 Sound signal processor, sound signal processing method, and program
TWI639153B (en) * 2015-11-03 2018-10-21 絡達科技股份有限公司 Electronic apparatus and voice trigger method therefor
CN106653010B (en) * 2015-11-03 2020-07-24 络达科技股份有限公司 Electronic device and method for waking up electronic device through voice recognition
US10038787B2 (en) * 2016-05-06 2018-07-31 Genesys Telecommunications Laboratories, Inc. System and method for managing and transitioning automated chat conversations
US10169319B2 (en) * 2016-09-27 2019-01-01 International Business Machines Corporation System, method and computer program product for improving dialog service quality via user feedback
WO2018097969A1 (en) 2016-11-22 2018-05-31 Knowles Electronics, Llc Methods and systems for locating the end of the keyword in voice sensing

Also Published As

Publication number Publication date
TWI682385B (en) 2020-01-11
CN110277095A (en) 2019-09-24
EP3540730A1 (en) 2019-09-18
TW201939482A (en) 2019-10-01
US20190287518A1 (en) 2019-09-19
EP3540730B1 (en) 2020-07-08
US10755696B2 (en) 2020-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant