WO2011134288A1 - Voice control method and voice control device - Google Patents

Voice control method and voice control device

Info

Publication number
WO2011134288A1
WO2011134288A1 (PCT/CN2011/070198)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
grammar
packet
identification information
voice recognition
Prior art date
Application number
PCT/CN2011/070198
Other languages
English (en)
French (fr)
Inventor
李满海
肖开利
王景平
廖芯
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Priority to EP11774279.1A priority Critical patent/EP2521121B1/en
Priority to US13/575,717 priority patent/US9236048B2/en
Publication of WO2011134288A1 publication Critical patent/WO2011134288A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/04 - Segmentation; Word boundary detection
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 - Announcement of recognition results
    • G10L2015/223 - Execution procedure of a spoken command
    • G10L2015/226 - Procedures used during a speech recognition process using non-speech characteristics
    • G10L2015/228 - Procedures used during a speech recognition process using non-speech characteristics of application context
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M2250/00 - Details of telephonic subscriber devices
    • H04M2250/74 - Details of telephonic subscriber devices with voice recognition means

Definitions

  • The present invention relates to the fields of voice recognition and communication technologies, and in particular to a voice control method and a voice control device.

Background

  • Voice control technology emerged so that people can use the various services provided by terminal devices without pressing buttons, including in scenarios where pressing buttons is not possible. The user simply speaks instructions near the microphone of the terminal device, and the device performs the corresponding processing. Taking voice dialing as an example: to let a person place a call when the hands are occupied and buttons cannot be pressed (for instance while driving), or when the user has an upper-limb disability, the technology recognizes the information required for the call from the user's voice and dials based on that information.
  • The terminal device may be a fixed terminal or a mobile terminal.
  • The terminal device can thus establish a call with the called party for the user, which greatly simplifies the user's operation.
  • Voice control technology is also widely used in many other products, such as robots and garages with voice-activated switches.
  • Taking voice dialing as an example, the basic principle of existing voice control technology is as follows:
  • The terminal device first generates a grammar packet from the various contact information in the address book, such as names, addresses, and contact numbers; the packet contains the voice data of that contact information. The terminal device then receives the voice signal input by the user through an audio input interface such as a microphone, performs voice recognition on the received signal against the generated grammar packet, and determines whether the voice data of each word in the grammar packet is present in the received signal; if so, that word is considered recognized from the received voice signal.
  • In practice the terminal device can often recognize only a few of the words, so the proportion of recognized words fails to reach the predetermined threshold and processing ends. The success rate of existing voice control technology is therefore low.

Summary of the invention
  • The main purpose of the embodiments of the present invention is to provide a voice control method that addresses the low success rate of voice control in the prior art.
  • An embodiment of the present invention also provides a voice control device.
  • The technical solution provided by the embodiments of the present invention is as follows:
  • A voice control method includes: classifying the stored identification information used for voice recognition to obtain a grammar packet for each type of identification information; receiving an input voice signal and performing voice recognition processing on it with each obtained grammar packet in turn; and performing corresponding control processing according to each grammar packet's voice recognition result for the voice signal.
  • Performing voice recognition processing on the voice signal with each grammar packet includes: when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, selecting, from the pre-specified identifiers corresponding to the identification information in that packet, the identifier corresponding to the recognized identification information as that packet's voice recognition result; otherwise, determining that voice recognition has failed and selecting, from the pre-specified identifiers corresponding to the possible failure reasons, the identifier corresponding to the actual failure reason as that packet's voice recognition result.
  • Performing the corresponding control processing according to each grammar packet's voice recognition result specifically includes: when at least one identifier corresponding to a voice recognition failure reason is present among the results, outputting a prompt signal indicating that voice recognition failed.
  • Performing the corresponding control processing specifically includes: counting, across all grammar packets' results, the identifiers corresponding to the same voice recognition failure reason, and presenting the failure reason whose identifier occurs most often to the user through a prompt message.
  • Performing the corresponding control processing specifically includes: when a specified grammar packet's voice recognition result contains no identifier corresponding to a failure reason, performing the predetermined control process that corresponds to that packet's voice recognition result.
  • Performing the corresponding control processing specifically includes: combining each grammar packet's voice recognition result in a predetermined order and sending the combined result to an external device; receiving a query request sent by the external device, which carries the split results obtained by splitting the combined result according to the predetermined order; selecting, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to each split result; and providing that identification information to the external device, so that the external device performs control processing according to it.
  • The identification information used for voice recognition includes contact-name information, contact-number information, and operation information.
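The classification step above can be sketched as follows. This is an illustrative model only: the type tags, entries, and function name are assumptions for demonstration, not the patent's actual data model.

```python
# Hypothetical sketch: classify stored identification information by type,
# so that each type yields its own "grammar packet" of entries.
def build_grammar_packets(identification_info):
    """Group (type, text) pairs into one packet per information type."""
    packets = {}
    for info_type, text in identification_info:
        packets.setdefault(info_type, []).append(text)
    return packets

# Example entries drawn from the embodiment described later.
stored_info = [
    ("name", "Zhang San"), ("name", "Li Si"),              # contact names
    ("contact", "mobile phone"), ("contact", "landline"),  # contact numbers
    ("operation", "dial"), ("operation", "query"),         # operations
]
packets = build_grammar_packets(stored_info)
# Three packets result, one per information type, as the method requires.
```

In a real implementation each packet would then be compiled into a speech grammar containing the voice data of its entries; that compilation step is prior art and not modeled here.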
  • a voice control device includes:
  • a grammar packet acquisition unit configured to classify the stored identification information used for voice recognition and obtain a grammar packet for each type of identification information;
  • a voice recognition processing unit configured to receive the input voice signal and perform voice recognition processing on it with each grammar packet obtained by the grammar packet acquisition unit in turn; and an execution unit configured to perform corresponding control processing according to each grammar packet's voice recognition result obtained by the voice recognition processing unit.
  • the voice recognition processing unit specifically includes:
  • a first determining subunit configured to, for each grammar packet obtained by the acquisition unit, when at least one piece of identification information in the packet can be recognized from the received voice signal, select from the pre-specified identifiers for that packet's identification information the identifier corresponding to the recognized identification information as that packet's voice recognition result;
  • a second determining subunit configured to, when no identification information in the grammar packet can be recognized from the received voice signal, determine that voice recognition has failed and select, from the pre-specified identifiers corresponding to the possible failure reasons, the identifier corresponding to the actual failure reason as that packet's voice recognition result.
  • A voice control device connected to an external device includes:
  • a grammar packet acquisition unit configured to classify the stored identification information used for voice recognition and obtain a grammar packet for each type of identification information;
  • a voice recognition processing unit configured to receive the input voice signal and perform voice recognition processing on it with each grammar packet obtained by the acquisition unit in turn; a combining subunit configured to combine each grammar packet's voice recognition result in a predetermined order and send the combined result to the external device;
  • a receiving subunit configured to receive a query request sent by the external device, the query request carrying the split results obtained by splitting the combined result;
  • a selecting subunit configured to select, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results in the query request received by the receiving subunit; and a sending subunit configured to provide the identification information selected by the selecting subunit to the external device, so that the external device performs control processing according to it.
  • The solution provided by the embodiments of the present invention generates the grammar packets needed for voice recognition separately for each type of identification information, instead of generating one grammar packet from all identification information as in the prior art, and then performs voice recognition on the received voice signal with each grammar packet in turn. Regardless of the total number of words in the received voice signal, whenever the identification information in a grammar packet can be recognized from it, subsequent control processing can proceed according to the recognized identification information, which improves the success rate of voice control.
  • FIG. 1 is a schematic flowchart of a main implementation principle of an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a specific process of performing voice dialing according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of another voice control apparatus according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of another voice control apparatus according to an embodiment of the present invention.

Detailed description
  • The main reason is that when existing voice control technology performs voice recognition on the voice signal input through the microphone, subsequent processing is performed only when the ratio of recognized words to the total number of words in the signal exceeds a predetermined threshold. In many cases the recognized words are sufficient to guide subsequent processing, yet recognition is deemed to fail because that ratio is still below the threshold. Simply lowering the threshold is not a reasonable fix either, because in many cases subsequent processing would then become impossible due to too few recognized words. The existing voice control technology is therefore inflexible in its voice recognition processing.
  • The embodiments of the present invention therefore propose generating the grammar packets needed for voice recognition separately for each type of information used for voice recognition, performing voice recognition on the received voice signal with each grammar packet, and proceeding with subsequent processing whenever the per-packet recognition results include the information required for it, thereby improving the success rate of voice control.
  • Step 10: classify the stored identification information used for voice recognition and obtain a grammar packet for each type of identification information;
  • Step 20: receive an input voice signal and perform voice recognition processing on it with each grammar packet obtained in step 10 in turn;
  • Step 30: perform corresponding control processing according to each grammar packet's voice recognition result obtained in step 20.
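Steps 10 to 30 can be sketched as a per-packet recognition loop. This is an assumed model, not the patent's implementation: `recognize_in_packet` stands in for the real acoustic matcher, which the patent does not specify, and is simulated here as substring matching against a text transcript; the entries and identifier codes follow the embodiment described later.

```python
def recognize_in_packet(transcript, packet, failure_id):
    """Return the identifier of the first recognized entry in this packet,
    or the packet's failure identifier if nothing is recognized."""
    for word, ident in packet.items():
        if word in transcript:
            return ident
    return failure_id  # e.g. "volume too low" code for this packet

# One packet per information type, each entry mapped to its identifier.
packets = {
    "name": {"Zhang San": "c01", "Li Si": "c02"},
    "contact": {"mobile phone": "e01", "landline": "e02"},
    "operation": {"dial": "d01", "query": "d02"},
}
failure_ids = {"name": "cx1", "contact": "ex1", "operation": "dx1"}

# Simulated step 20: one recognition pass per packet, in a fixed order.
transcript = "dial Zhang San's mobile phone for me"
results = [recognize_in_packet(transcript, packets[t], failure_ids[t])
           for t in ("name", "contact", "operation")]
# results -> ["c01", "e01", "d01"] for this transcript
```

Step 30 would then act on `results`, either executing the recognized command or prompting on failure codes, as the following sections describe.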
  • In step 20, the following method is used to obtain each grammar packet's voice recognition result for the received voice signal:
  • the identifier corresponding to the identification information recognized from the received voice signal is used as that grammar packet's voice recognition result. For example, if voice recognition against the voice data of the contact-name information in the first grammar packet determines that the received voice signal contains the name "Zhang San", then the pre-specified identifier corresponding to "Zhang San" is used as the first grammar packet's voice recognition result.
  • Steps 20 and 30 can be performed by two different functional modules (a first module and a second module) or by two devices. To avoid transmitting too many parameters between them when there are many grammar packets (that is, one module sending each grammar packet's voice recognition result to the other separately), the first module can combine all the per-packet voice recognition results and send the combined result to the second module; the second module performs the split processing corresponding to the combination and acts according to the split results.
  • Steps 10 and 20 are performed by a voice control device connected to an external device, and step 30 is performed by the external device, specifically:
  • The voice control device combines each grammar packet's voice recognition result in a predetermined order and uses the combined result as the voice recognition result for the voice signal. For example, if the first grammar packet's result is the identifier for "Zhang San" and the second grammar packet's result is the identifier for "mobile phone", and the predetermined order places the first packet's result before the second packet's, then the combined result (the identifier for "Zhang San" + the identifier for "mobile phone") is used as the voice recognition result of the received voice signal.
  • The voice control device sends the combined voice recognition result to the external device;
  • the external device splits the voice recognition result according to the split order corresponding to the predetermined combination order in step 20, carries the split results in a query request, and sends it to the voice control device;
  • the voice control device receives the query request, selects the identification information corresponding to the split results from the pre-specified correspondence between identification information and identifiers, and provides that identification information to the external device;
  • the external device performs control processing according to the identification information corresponding to the split results.
  • Using the identifiers corresponding to the recognized contact information as the voice recognition results, combining the per-packet results into the voice recognition result of the received voice signal, and then splitting the combined result for subsequent processing is advantageous because an identifier requires less storage space than a string, which improves the processing efficiency of the terminal device.
  • The address book of the terminal device stores various contact information. In this embodiment the contact information is classified (in practice many terminal devices already store contact information by category, in which case this step can be omitted): for example, contact-name information such as "Zhang San" and "Li Si", contact-number information such as "mobile phone" and "landline", and operation information such as "dial" and "query".
  • Each type of contact information obtained by the classification is compiled into a grammar separately, yielding for each type a grammar packet that contains the voice data of that type of contact information.
  • The technique for compiling a grammar packet belongs to the prior art and is not described in detail here.
  • In this way, a first grammar packet corresponding to the contact-name information, a second grammar packet corresponding to the contact-number information, and a third grammar packet corresponding to the operation information are obtained.
  • A corresponding identifier is preset for each piece of contact information; the identifier may be a string of predetermined length. In this embodiment, the identifier for each contact name is a 3-character string starting with "c": the identifier for "Zhang San" is c01 and the identifier for "Li Si" is c02. The identifier for each contact number is a 3-character string starting with "e": the identifier for "mobile phone" is e01, for "office phone" e02, and for "home phone" e03. The identifier for each operation is a 3-character string starting with "d": the identifier for "dial" is d01 and for "query" d02.
  • An identifier is also predetermined for each possible reason for voice recognition failure. In this embodiment these identifiers carry an "x" in the second position; for example, the identifier corresponding to "recognition failed because the input volume is too low" is cx1 for the first grammar packet (and correspondingly ex1 and dx1 for the second and third).
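The embodiment's identifier scheme can be written out as lookup tables. The entries the text names (c01, c02, e01 through e03, d01, d02, cx1) are taken from the description above; the reverse table and the per-packet failure codes ex1 and dx1 are hedged reconstructions used later in the example flow.

```python
# Identifier table from the embodiment: fixed 3-character codes,
# 'c' for names, 'e' for contact numbers, 'd' for operations.
ID_OF = {
    "Zhang San": "c01", "Li Si": "c02",
    "mobile phone": "e01", "office phone": "e02", "home phone": "e03",
    "dial": "d01", "query": "d02",
}

# Per-packet failure codes, e.g. "recognition failed: volume too low".
# The 'x' in the second position marks a failure identifier.
FAILURE_IDS = {"name": "cx1", "contact": "ex1", "operation": "dx1"}

# Reverse table, used when the external device later queries by identifier.
INFO_OF = {ident: info for info, ident in ID_OF.items()}
```

Because every identifier has the same width, a concatenation of several identifiers can later be split back into its parts without any delimiter, which is what makes the combine/split exchange below unambiguous.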
  • Step 202: the terminal device receives a voice signal input by the user through a voice input interface such as a microphone.
  • Suppose the user says "dial Zhang San's mobile phone for me". Two scenarios are considered:
  • Scenario 1: the user's volume is sufficient for voice recognition and every word can be recognized from the input voice signal; proceed to step 203.
  • Scenario 2: the user's volume is very low, recognition is difficult, and no word can be recognized from the input voice signal; proceed to step 210.
  • Step 203: the terminal device performs voice recognition on the voice signal received in step 202 with each of the three grammar packets obtained in step 201 in turn (that is, the voice recognition processing module loads the grammar packets obtained in step 201). For each packet, the identifier corresponding to the contact information whose voice data is found in the received signal is used as that packet's voice recognition result; proceed to step 204.
  • When the input voice signal is processed with the first grammar packet, "Zhang San" is recognized in it, so the identifier c01 corresponding to "Zhang San" is used as the first grammar packet's voice recognition result.
  • When the input voice signal is processed with the second grammar packet, "mobile phone" is recognized in it, so the identifier e01 corresponding to "mobile phone" is used as the second grammar packet's voice recognition result.
  • When the input voice signal is processed with the third grammar packet, "dial" is recognized in it, so the identifier d01 corresponding to "dial" is used as the third grammar packet's voice recognition result.
  • Step 204: in the predetermined combination order, the per-packet voice recognition results obtained in step 203 are combined, and the combined result is sent to the external device as the voice recognition result for the voice signal; proceed to step 205.
  • The first grammar packet's result is placed first, the second packet's second, and the third packet's third; the combined result c01e01d01 is used as the voice recognition result of the voice signal received in step 202.
  • Step 205: the external device splits the voice recognition result according to the split order corresponding to the combination order in step 204, obtaining the three identifiers c01, e01, and d01; proceed to step 206.
  • Step 206: the external device carries the split results c01, e01, and d01 from step 205 in a query request and sends it to the terminal device.
  • Step 207: the terminal device selects the identification information corresponding to the split results from the pre-specified correspondence between contact information and identifiers, for example "Zhang San" for c01, "mobile phone" for e01, and "dial" for d01. Step 208: the terminal device provides the identification information corresponding to the split results from step 207 to the external device, for example carried in a query response.
  • Step 209: the external device, according to "Zhang San" for c01, "mobile phone" for e01, and "dial" for d01 in the query response, initiates a call to Zhang San's mobile phone.
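The combine/split handshake of steps 204 to 207 can be sketched as follows. Because the identifiers are fixed-width (3 characters in this embodiment), the combined string splits unambiguously with no delimiter; the function names are illustrative.

```python
ID_WIDTH = 3  # every identifier in the embodiment is 3 characters

def combine(results):
    """Step 204: concatenate per-packet results in the agreed order
    (name, contact number, operation)."""
    return "".join(results)

def split(combined):
    """Step 205: undo combine() using the fixed identifier width."""
    return [combined[i:i + ID_WIDTH]
            for i in range(0, len(combined), ID_WIDTH)]

combined = combine(["c01", "e01", "d01"])  # sent to the external device
query = split(combined)                    # carried in the query request
# query -> ["c01", "e01", "d01"], then looked up against the identifier
# table (step 207) to recover "Zhang San", "mobile phone", "dial".
```

The same machinery handles the failure branch below, since the failure codes cx1, ex1, dx1 share the 3-character width.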
  • Step 210: the terminal device performs voice recognition on the voice signal received in step 202 with each of the three grammar packets obtained in step 201 in turn. Because no contact information in any grammar packet can be recognized from the received voice signal, voice recognition is determined to have failed, and for each packet the identifier corresponding to the actual failure reason is selected from the pre-specified identifiers for the possible failure reasons and used as that packet's voice recognition result; proceed to step 211.
  • When the input voice signal is processed with the first grammar packet, none of its contact-name information can be recognized, so the pre-specified identifier cx1 corresponding to "recognition failed because the volume is too low" is used as the first grammar packet's voice recognition result;
  • when it is processed with the second grammar packet, none of its contact-number information can be recognized, so the corresponding identifier ex1 is used as the second grammar packet's voice recognition result;
  • when it is processed with the third grammar packet, none of its operation information can be recognized, so the corresponding identifier dx1 is used as the third grammar packet's voice recognition result.
  • Step 211: in the predetermined combination order, the per-packet voice recognition results obtained in step 210 are combined, and the combined result is sent to the external device as the voice recognition result of the received voice signal; proceed to step 212.
  • The first grammar packet's result is placed first, the second packet's second, and the third packet's third; the combined result cx1ex1dx1 is used as the voice recognition result of the voice signal received in step 202.
  • Step 212: the external device splits the voice recognition result according to the split order corresponding to the combination order in step 211, obtaining the three identifiers cx1, ex1, and dx1; proceed to step 213.
  • Step 213: the external device carries the split results cx1, ex1, and dx1 from step 212 in a query request and sends it to the terminal device.
  • Step 214: the terminal device selects the voice recognition failure reason corresponding to the split results, for example "recognition failed because the volume is too low", from the pre-specified correspondence between failure reasons and identifiers.
  • Step 215: the terminal device provides the failure reason corresponding to the split results from step 214 to the external device, for example carried in a query response.
  • Step 216: the external device determines from the failure reason in the query response that subsequent processing cannot be performed, and outputs a prompt signal to the user indicating that voice recognition failed.
	• The two scenarios given in step 202 are both rather extreme cases. In practice it often happens that, when some grammar packets are used for speech recognition processing, the contact information contained in those packets can be recognized from the received voice signal, while recognition fails for the remaining grammar packets. In that case the following adaptive schemes can be used:
	• When it is determined that at least one grammar packet's speech recognition result for the voice signal is an identifier corresponding to a speech recognition failure reason, a prompt message is sent to the user indicating that speech recognition failed; optionally, the terminal device decides, according to the user's feedback after receiving the prompt signal, whether to further determine the failure reason from the identifier corresponding to it; or,
	• preferably, among all grammar packets' speech recognition results for the voice signal, the number of identifiers corresponding to each identical speech recognition failure reason is counted, and the failure reason whose identifiers are most numerous is presented to the user in a prompt message; or,
	• when the number of failure-reason identifiers among all grammar packets' speech recognition results exceeds a predetermined threshold, a prompt signal is output according to the failure reason corresponding to the most numerous identical failure-reason identifiers; otherwise corresponding processing is performed according to the split results; or,
	• when it is pre-specified that the speech recognition results of certain grammar packets for the voice signal are not identifiers corresponding to speech recognition failure, corresponding processing is performed according to those packets' results. For example, when the first and third grammar packets' speech recognition results for the voice signal are not failure identifiers (say the first grammar packet's result is c01, corresponding to the contact information "Zhang San", and the third grammar packet's result is d01, corresponding to the operation information "dial"), Zhang San's mobile phone or office phone can be dialed.
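The majority-vote variant just described (present the failure reason whose identifiers occur most often) can be sketched as follows. This is a sketch under the assumption that the per-packet results are available as a list of identifiers and that the failure-reason lookup table from step 201 maps each failure identifier (cx1, ex1, dx1 in this embodiment) to its reason; the function name `most_common_failure` is illustrative:

```python
from collections import Counter

# Failure-reason identifiers pre-specified per packet type; in this
# embodiment cx1/ex1/dx1 all denote the same "volume too low" reason.
FAILURE_REASONS = {
    "cx1": "recognition failed because the volume is too low",
    "ex1": "recognition failed because the volume is too low",
    "dx1": "recognition failed because the volume is too low",
}

def most_common_failure(results):
    """Count identical failure reasons across all per-packet results
    and return the most frequent one, or None if no packet failed."""
    reasons = [FAILURE_REASONS[r] for r in results if r in FAILURE_REASONS]
    if not reasons:
        return None  # no failure identifier: proceed with normal processing
    return Counter(reasons).most_common(1)[0][0]

print(most_common_failure(["c01", "ex1", "dx1"]))
# recognition failed because the volume is too low
print(most_common_failure(["c01", "e01", "d01"]))  # None
```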
	• In step 203 or step 210, since speech recognition processing is performed with the small grammar packets each corresponding to one type of contact information, rather than, as in the prior art, with a single large grammar packet containing all the contact information, the speed of speech recognition is increased and the time spent on recognition processing is saved.
	• The solution provided by this embodiment of the present invention generates the grammar packets required for speech recognition separately according to the different types of contact information, instead of generating one grammar packet from all contact information as in the prior art, and then performs speech recognition processing on the received voice signal with each grammar packet in turn. When the contact information in every grammar packet, or in some of the grammar packets, can be recognized from the received voice signal, subsequent processing is performed according to the recognized contact information. In the prior art, by contrast, subsequent processing can be performed only when the syllables of the recognized contact information exceed a predetermined proportion of all the syllables contained in the voice signal; otherwise recognition is deemed failed and processing ends, ignoring whether the contact information already recognized is sufficient to support subsequent processing. The voice control solution provided by this embodiment therefore solves the prior-art problem of a low voice control success rate.
	• Correspondingly, an embodiment of the present invention further provides a voice control apparatus.
	• The apparatus includes a grammar packet obtaining unit 301, a speech recognition processing unit 302 and an executing unit 303, specifically as follows:
	• The grammar packet obtaining unit 301 is configured to classify the stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;
	• the speech recognition processing unit 302 is configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit 301 in turn;
	• the executing unit 303 is configured to perform corresponding control processing according to each grammar packet's speech recognition result for the voice signal obtained by the speech recognition processing unit 302.
	• The speech recognition processing unit 302 specifically includes a first determining subunit 401 and a second determining subunit 402, where:
	• the first determining subunit 401 is configured to: for each grammar packet obtained by the grammar packet obtaining unit 301, when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, select, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal;
	• the second determining subunit 402 is configured to: when no piece of identification information in the grammar packet can be recognized from the received voice signal, determine that this speech recognition has failed and, according to the reason for the failure, select, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.
	• The voice control apparatus is connected to an external device and includes a grammar packet obtaining unit 501, a speech recognition processing unit 502, a combining unit 503, a receiving unit 504, a selecting unit 505 and a sending unit 506, where:
	• the grammar packet obtaining unit 501 is configured to classify the stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;
	• the speech recognition processing unit 502 is configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit 501 in turn;
	• the combining unit 503 is configured to combine, in a predetermined combination order, the grammar packets' speech recognition results for the voice signal and send the combined result to the external device;
	• the receiving unit 504 is configured to receive a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result;
	• the selecting unit 505 is configured to select, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request received by the receiving unit 504; the sending unit 506 is configured to provide the identification information corresponding to the split results selected by the selecting unit 505 to the external device, so that the external device performs control processing according to the identification information corresponding to the split results.

Description

Voice Control Method and Voice Control Apparatus

Technical Field

The present invention relates to the fields of speech recognition and communication technology, and in particular to a voice control method and a voice control apparatus.

Background Art
Voice control technology emerged so that, in particular scenarios, people can use the various services provided by a terminal device quickly and without pressing keys. A user only needs to speak commands near the microphone of the terminal device, and the device performs the corresponding processing according to those commands. Take voice dialing as an example: so that people can place calls when their hands are occupied and keys cannot be pressed (for instance while driving), or so that people with upper-limb disabilities can also make phone calls, this technology recognizes the information needed to place a call from the user's speech and dials according to the recognized information. A user only needs to speak a voice command, such as "dial Zhang San's mobile phone", into the microphone of a terminal device (fixed or mobile), and the device sets up a call to the called party, greatly simplifying the user's operation. Besides voice dialing, voice control technology is also widely used in many products such as robots and garages with voice-controlled doors.

Taking voice dialing as an example, the basic principle of existing voice control technology is as follows: the terminal device first generates one grammar packet from all the contact information contained in the phone book, such as names, addresses and contact methods; this grammar packet contains the speech data of that contact information. The terminal device then receives the voice signal input by the user through an audio signal receiving interface such as a microphone, performs speech recognition according to the received voice signal and the generated grammar packet, and judges, for every character in the received voice signal, whether its speech data exists in the grammar packet; if so, that character is considered recognized from the received voice signal. When the proportion of characters recognized from the received voice signal among all the characters it contains exceeds a predetermined threshold, recognition of the received voice signal is deemed successful and the corresponding subsequent processing is performed. For example, suppose the terminal device stipulates that recognition succeeds when 60% of the characters can be recognized, and the user's input is "拨通张三的手机" (dial Zhang San's mobile phone, seven characters): if the terminal device can recognize the syllables of more than four (7 × 60% = 4.2) of these characters, recognition is deemed successful and the subsequent dialing flow proceeds; otherwise recognition is deemed failed and processing ends.
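The prior-art success criterion just described can be sketched as follows. This is a minimal illustration, assuming recognition is reduced to counting how many of the utterance's characters were matched; the function name `prior_art_success` is illustrative, while the 60% threshold and the seven-character example come from the text:

```python
def prior_art_success(total_chars: int, recognized_chars: int, threshold: float = 0.6) -> bool:
    """Prior-art rule: recognition succeeds only if the share of
    recognized characters exceeds the predetermined threshold."""
    return recognized_chars / total_chars > threshold

# "拨通张三的手机" has 7 characters; with a 60% threshold,
# 7 * 0.6 = 4.2, so at least 5 characters must be recognized.
print(prior_art_success(7, 5))  # True  (5/7 ≈ 0.714 > 0.6)
print(prior_art_success(7, 4))  # False (4/7 ≈ 0.571 < 0.6)
```

This all-or-nothing rule is exactly what the embodiments below replace with per-type grammar packets.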
So that the corresponding dialing processing can be carried out effectively from the recognized information after recognition succeeds, the threshold used when judging whether speech recognition succeeded, that is, the proportion of recognized characters among the total characters contained in the received voice signal, is usually set rather high in advance. In reality, many causes make it hard for the proportion of recognizable characters to reach this threshold, so that speech recognition fails and processing ends. For example, the user may unconsciously input a long passage of which only a few characters relate to the dialing action; recognition then often fails because the proportion of recognizable characters cannot reach the predetermined threshold, and processing ends. Likewise, because of a user's dialect or accent, the terminal device may recognize only a very few characters, and processing again ends because the proportion of recognizable characters cannot reach the threshold. The success rate of existing voice control technology is therefore low.

Summary of the Invention
In view of this, the main purpose of the embodiments of the present invention is to provide a voice control method to solve the prior-art problem of a low voice control success rate. Correspondingly, the embodiments of the present invention also provide a voice control apparatus.

To solve the above technical problem, the technical solutions provided by the embodiments of the present invention are as follows:

A voice control method includes: classifying stored identification information used for speech recognition, and obtaining a grammar packet corresponding to each type of identification information; receiving an input voice signal, and performing speech recognition processing on the received voice signal with each obtained grammar packet in turn; and performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal.
Performing speech recognition processing on the voice signal according to each grammar packet specifically includes: when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, selecting, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal; otherwise, determining that this speech recognition has failed and, according to the reason for the failure, selecting, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.

Performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically includes: when at least one identifier corresponding to a speech recognition failure reason exists among the grammar packets' speech recognition results for the voice signal, outputting a prompt signal indicating that speech recognition has failed.

Performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically includes: counting, among the grammar packets' speech recognition results for the voice signal, the number of identifiers corresponding to each identical speech recognition failure reason, and presenting the failure reason whose identifiers are most numerous to the user in a prompt message.

Performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically includes: when no identifier corresponding to a speech recognition failure reason exists in a designated grammar packet's speech recognition result for the voice signal, performing, according to the designated grammar packet's speech recognition result for the voice signal, the predetermined control processing corresponding to that result.
Performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically includes: combining, in a predetermined combination order, the grammar packets' speech recognition results for the voice signal, and sending the combined result to an external device; receiving a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result in a split order corresponding to the predetermined combination order; selecting, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request; and providing the identification information corresponding to the split results to the external device, the external device performing control processing according to the identification information corresponding to the split results.

The identification information used for speech recognition includes contact name type information, contact method type information, and operation type information.
A voice control apparatus includes:

a grammar packet obtaining unit, configured to classify stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

a speech recognition processing unit, configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit in turn; and an executing unit, configured to perform corresponding control processing according to each grammar packet's speech recognition result for the voice signal obtained by the speech recognition processing unit.

The speech recognition processing unit specifically includes:

a first determining subunit, configured to: for each grammar packet obtained by the grammar packet obtaining unit, when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, select, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal;

a second determining subunit, configured to: when no piece of identification information in the grammar packet can be recognized from the received voice signal, determine that this speech recognition has failed and, according to the reason for the failure, select, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.
A voice control apparatus connected to an external device includes:

a grammar packet obtaining unit, configured to classify stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

a speech recognition processing unit, configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit in turn; a combining subunit, configured to combine, in a predetermined combination order, the speech recognition results that the grammar packets obtained by the speech recognition processing unit produced for the voice signal, and send the combined result to the external device;

a receiving subunit, configured to receive a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result;

a selecting subunit, configured to select, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request received by the receiving subunit; and a sending subunit, configured to provide the identification information corresponding to the split results selected by the selecting subunit to the external device, so that the external device performs control processing according to the identification information corresponding to the split results.
The solutions provided by the embodiments of the present invention generate the grammar packets required for speech recognition separately according to the different types of identification information, instead of generating one grammar packet from all identification information as in the prior art, and then perform speech recognition processing on the received voice signal with each grammar packet in turn. Regardless of the total number of characters contained in the received voice signal, when the identification information in every grammar packet, or in some of the grammar packets, can be recognized from the received voice signal, subsequent control processing is performed according to the recognized identification information, which raises the success rate of voice control.

Brief Description of the Drawings

Fig. 1 is a schematic flowchart of the main implementation principle of an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the specific voice dialing process in an embodiment of the present invention; Fig. 3 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another voice control apparatus provided by an embodiment of the present invention; Fig. 5 is a schematic structural diagram of yet another voice control apparatus provided by an embodiment of the present invention.

Detailed Description
In implementing voice control technologies including voice dialing, the inventors found that the success rate of voice control is low. The main reason is that when existing voice control technology performs speech recognition processing on the voice signal the user inputs through the microphone, it stipulates that subsequent processing can proceed only when the ratio of recognized characters to all the characters contained in the voice signal exceeds a predetermined threshold. In fact, in many cases the recognized characters are already sufficient to direct the subsequent processing, yet voice dialing fails because their proportion among the characters of the voice signal is still below the threshold. Simply lowering the threshold to solve this problem is also unreasonable, because in many cases too few characters would then be recognized for the subsequent processing to complete. Existing voice control technology is thus rather inflexible in the speech recognition stage.

The embodiments of the present invention propose generating the grammar packets required for speech recognition separately according to the different types of information used for speech recognition, performing speech recognition processing on the received voice signal with each grammar packet, and, when the grammar packets' speech recognition results contain the information needed for subsequent processing, performing that subsequent processing, thereby raising the success rate of voice control.

The main implementation principle and specific implementations of the technical solutions of the embodiments of the present invention, together with the corresponding achievable beneficial effects, are elaborated below in conjunction with the accompanying drawings.
As shown in Fig. 1, the main implementation principle of an embodiment of the present invention proceeds as follows:

Step 10: according to the stored identification information used for speech recognition, obtain the grammar packets corresponding to the types of identification information;

Step 20: receive an input voice signal, and perform speech recognition processing on the received voice signal with each grammar packet obtained in step 10 in turn;

Step 30: perform corresponding control processing according to each grammar packet's speech recognition result for the voice signal obtained in step 20.
In step 20 above, the following method is used to perform speech recognition processing on the received voice signal according to each grammar packet and obtain the speech recognition results:

For each grammar packet, according to the pre-specified identifier corresponding to each piece of identification information in the grammar packet, the identifier corresponding to the identification information of that packet recognized from the received voice signal is taken as the packet's speech recognition result for the voice signal. For example, if, based on the speech signal data of the contact name information contained in the first grammar packet, speech recognition processing finds that the received voice signal contains the contact name "张三" (Zhang San) from that packet, the pre-specified identifier corresponding to "张三" is taken as the first grammar packet's speech recognition result for the voice signal.
In actual implementation, for flexibility in configuring functional modules, steps 10 and 20 on the one hand and step 30 on the other may be performed by two different functional modules, a first module and a second module, or by two devices. So that, when there are many grammar packets, the two need not transfer too many parameters (that is, one module passing each grammar packet's speech recognition result for the voice signal to the other separately), the first module may, after obtaining each grammar packet's speech recognition result for the voice signal, merge them into a single result and send it to the second module; the second module performs the split processing corresponding to the combination and executes the corresponding processing according to the split results. For example, a voice control apparatus connected to an external device performs steps 10 and 20, while the external device performs step 30. Specifically:

In step 20, the voice control apparatus combines each grammar packet's speech recognition result for the voice signal in a predetermined combination order and takes the combined result as the speech recognition result of the voice signal. For example, suppose there are two grammar packets, a first and a second, where the first packet's result for the voice signal is the identifier of "张三" (Zhang San) and the second packet's result is the identifier of "手机" (mobile phone). Then, in the order of the first packet's result first and the second packet's result second, the two results are combined, and the combined result (the identifier of "张三" plus the identifier of "手机") is taken as the speech recognition result of the received voice signal. The voice control apparatus sends the combined speech recognition result of the voice signal to the external device;
Correspondingly, the external device splits the speech recognition result in the split order corresponding to the predetermined combination order of step 20, carries the split results in a query request, and sends it to the voice control apparatus;

The voice control apparatus receives the query request sent by the external device, selects, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request, and provides the identification information corresponding to the split results to the external device;

The external device performs control processing according to the identification information corresponding to the split results.

In the above steps, the identifier corresponding to the contact information recognized in each grammar packet is used as the speech recognition result, the packets' results are combined into the speech recognition result of the received voice signal, and the combined result is later split, with processing performed according to the split results, because an identifier requires less storage space than a character string and can improve the processing efficiency of the terminal device.
Based on the above inventive principle of the present invention, a specific embodiment, taking the voice dialing process as an example, is described in detail below to elaborate the main implementation principle of the method of the present invention.

The phone book of the terminal device stores various kinds of contact information. In this embodiment the contact information is classified (in practice many terminal devices already store contact information by category, in which case this step can be omitted), for example into contact name type information including "张三" (Zhang San) and "李四" (Li Si), contact method type information including "手机" (mobile phone) and "座机" (landline), and operation type information including "拨通" (dial) and "打电话" (make a call).

Referring to Fig. 2, in step 201, grammar compilation is performed separately for each type of contact information obtained by classification, so as to obtain, for each type, a grammar packet containing the speech data of that type of contact information; the specific technique of compiling a grammar packet belongs to the prior art and is not detailed here. In this embodiment a first grammar packet corresponding to the contact name type information, a second grammar packet corresponding to the contact method type information, and a third grammar packet corresponding to the operation type information are obtained.

For reasons of terminal device execution efficiency, when the grammar packets are generated, a corresponding identifier is preset for each piece of contact information. The identifier may be a character string of predetermined length. In this embodiment, for example, the identifier corresponding to each piece of contact name information is a 3-character string beginning with the character "c": the identifier of "张三" is c01 and that of "李四" is c02. The identifier corresponding to each piece of contact method information is a 3-character string beginning with the character "e": the identifier of "手机" (mobile phone) is e01, that of "办公电话" (office phone) is e02, and that of "家庭电话" (home phone) is e03. The identifier corresponding to each piece of operation information is a 3-character string beginning with the character "d": the identifier of "拨通" (dial) is d01 and that of "查询" (query) is d02.
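The identifier scheme of step 201 can be illustrated as follows. This is a sketch under the assumption that a grammar packet can be represented as a mapping from each identifier to the piece of contact information it encodes; the function name `build_grammar_packet` is illustrative, while the prefixes and example entries come from the text:

```python
def build_grammar_packet(prefix, entries):
    """Assign each piece of identification information a fixed-length
    identifier: a one-character type prefix plus a two-digit sequence
    number, e.g. "张三" -> c01, "手机" -> e01, "拨通" -> d01."""
    return {f"{prefix}{i:02d}": entry for i, entry in enumerate(entries, start=1)}

first_packet = build_grammar_packet("c", ["张三", "李四"])                 # contact names
second_packet = build_grammar_packet("e", ["手机", "办公电话", "家庭电话"])  # contact methods
third_packet = build_grammar_packet("d", ["拨通", "查询"])                 # operations

print(first_packet["c01"])   # 张三
print(second_packet["e03"])  # 家庭电话
```

Keeping every identifier the same length is what later allows the combined result to be split positionally by the external device.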
Preferably, identifiers corresponding to a predetermined number of failure reasons for speech recognition processing may also be set. In this embodiment, for example, the identifier corresponding to each failure reason is a 3-character string beginning with the characters "cx"; the identifier corresponding to "the input speech is too quiet" is cx1.

In step 202, the terminal device receives the voice signal the user inputs through a speech input interface such as a microphone, for example "替我拨通张三的手机" (dial Zhang San's mobile phone for me). For ease of explaining the scheme of this embodiment, assume two scenarios exist:

Scenario 1: the user's volume is sufficient for speech recognition and every character can be recognized from the input voice signal; proceed to step 203;

Scenario 2: the user's volume is very low and insufficient for speech recognition, so the characters cannot be recognized from the input voice signal; proceed to step 210;
In step 203, the terminal device performs speech recognition processing on the voice signal received in step 202 with each of the three grammar packets obtained in step 201 in turn (that is, the grammar packets obtained in step 201 are loaded in the speech recognition processing module), and the identifier of the contact information that appears in the received voice signal and whose speech data exists in the grammar packet is taken as that packet's speech recognition result for the voice signal; proceed to step 204. For example, speech recognition processing is first performed on the user's voice signal according to the first grammar packet; since it can be recognized that the signal includes "张三" (Zhang San), the identifier c01 corresponding to "张三" is taken as the first grammar packet's speech recognition result for the received voice signal;

Likewise, speech recognition processing is performed on the user's voice signal according to the second grammar packet; since it can be recognized that the signal includes "手机" (mobile phone), the identifier e01 corresponding to "手机" is taken as the second grammar packet's speech recognition result for the received voice signal;

Then, speech recognition processing is performed on the user's voice signal according to the third grammar packet; since it can be recognized that the signal includes "拨通" (dial), the identifier d01 corresponding to "拨通" is taken as the third grammar packet's speech recognition result for the received voice signal.
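The per-packet decision of steps 203 and 210 can be sketched as follows. This is a minimal illustration that abstracts the acoustic front end into the set of words recognizable from the signal; the function name `recognize_with_packet` is illustrative, and the failure identifier cx1 is the one pre-specified in step 201:

```python
def recognize_with_packet(recognized_words, packet, failure_id):
    """Return the identifier of the first piece of identification
    information from this grammar packet found in the signal; if none
    is found, return the pre-specified failure-reason identifier
    (step 203 vs. step 210)."""
    for identifier, info in packet.items():
        if info in recognized_words:
            return identifier
    return failure_id

first_packet = {"c01": "张三", "c02": "李四"}
# Scenario 1: every word of "替我拨通张三的手机" is recognizable.
print(recognize_with_packet({"替", "我", "拨通", "张三", "的", "手机"},
                            first_packet, failure_id="cx1"))  # c01
# Scenario 2: the volume is too low, nothing is recognizable.
print(recognize_with_packet(set(), first_packet, failure_id="cx1"))  # cx1
```

Note that each packet yields a result either way, so subsequent steps never depend on a global character-count threshold.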
In step 204, each grammar packet's speech recognition result for the voice signal obtained in step 203 is combined in a predetermined combination order, and the combined result is sent to the external device as the speech recognition result of the voice signal; proceed to step 205;

In this embodiment the first grammar packet's speech recognition result is placed first, the second packet's second and the third packet's third; the packets' speech recognition results are combined in this order, and the combined result c01e01d01 is taken as the speech recognition result of the voice signal received in step 202.
In step 205, the external device splits the speech recognition result of the voice signal in the split order corresponding to the combination order of step 204, obtaining the three identifiers c01, e01 and d01; proceed to step 206;

In step 206, the external device carries the split results c01, e01 and d01 of step 205 in a query request and sends it to the terminal device;

In step 207, the terminal device selects, from the pre-specified correspondence between each piece of contact information and its identifier, the identification information corresponding to the split results contained in the query request, for example "张三" (Zhang San) corresponding to c01, "手机" (mobile phone) corresponding to e01 and "拨通" (dial) corresponding to d01. In step 208, the terminal device provides the identification information corresponding to the split results of step 207 to the external device, for example by carrying it in a query response;

In step 209, the external device initiates a call to Zhang San according to "张三" corresponding to c01, "手机" (mobile phone) corresponding to e01 and "拨通" (dial) corresponding to d01 in the query response;
In step 210, the terminal device performs speech recognition processing on the voice signal received in step 202 with each of the three grammar packets obtained in step 201 in turn. Since no contact information contained in the grammar packet can be recognized from the received voice signal, it is determined that this speech recognition has failed, and according to the failure reason, the identifier corresponding to this failure reason is selected from the pre-specified identifiers corresponding to the various failure reasons as that grammar packet's speech recognition result; proceed to step 211. For example, speech recognition processing is first performed on the user's voice signal according to the first grammar packet; since none of the contact name information of the first grammar packet can be recognized from the received voice signal, the pre-specified identifier cx1 corresponding to "recognition failed because the volume is too low" is taken as the first grammar packet's speech recognition result for the received voice signal;

Likewise, speech recognition processing is performed on the user's voice signal according to the second grammar packet; since none of the contact method information of the second grammar packet can be recognized from the received voice signal, the pre-specified identifier ex1 corresponding to "recognition failed because the volume is too low" is taken as the second grammar packet's speech recognition result for the received voice signal;

Then, speech recognition processing is performed on the user's voice signal according to the third grammar packet; since none of the operation information of the third grammar packet can be recognized from the received voice signal, the pre-specified identifier dx1 corresponding to "recognition failed because the volume is too low" is taken as the third grammar packet's speech recognition result for the received voice signal.
In step 211, each grammar packet's speech recognition result for the voice signal obtained in step 210 is combined in the predetermined combination order, and the combined result is sent to the external device as the speech recognition result of the received voice signal; proceed to step 212;

In this embodiment the first grammar packet's speech recognition result is placed first, the second packet's second and the third packet's third; the packets' speech recognition results are combined in this order, and the combined result cx1ex1dx1 is taken as the speech recognition result of the voice signal received in step 202.

In step 212, the external device splits the speech recognition result of the voice signal in the split order corresponding to the predetermined combination order of step 211, obtaining the three identifiers cx1, ex1 and dx1; proceed to step 213;

In step 213, the external device carries the split results cx1, ex1 and dx1 of step 212 in a query request and sends it to the terminal device;
In step 214, the terminal device selects, from the pre-specified correspondence between speech recognition failure reasons and identifiers, the failure reason corresponding to the split results contained in the query request, for example "recognition failed because the volume is too low";

In step 215, the terminal device provides the speech recognition failure reason corresponding to the split results of step 214 to the external device, for example by carrying it in a query response;

In step 216, the external device determines, from the failure reason corresponding to the split results in the query response, that subsequent processing cannot be performed, and sends the user a prompt signal indicating that speech recognition has failed.
The two scenarios given in step 202 are both rather extreme cases. In practice it often happens that, when some grammar packets are used for speech recognition processing, the contact information contained in those packets can be recognized from the received voice signal, while recognition fails for the remaining grammar packets. In that case the following adaptive schemes can be used:

When it is determined that at least one grammar packet's speech recognition result for the voice signal is an identifier corresponding to a speech recognition failure reason, a prompt message is sent to the user indicating that speech recognition failed; optionally, the terminal device decides, according to the user's feedback after receiving the prompt signal, whether to further determine the failure reason from the identifier corresponding to it; or, preferably, among all grammar packets' speech recognition results for the voice signal, the number of identifiers corresponding to each identical speech recognition failure reason is counted, and the failure reason whose identifiers are most numerous is presented to the user in a prompt message; or,

when the number of failure-reason identifiers among all grammar packets' speech recognition results for the voice signal exceeds a predetermined threshold, a prompt signal is output according to the failure reason corresponding to the most numerous identical failure-reason identifiers; otherwise corresponding processing is performed according to the split results; or,

when it is pre-specified that the speech recognition results of certain grammar packets for the voice signal are not identifiers corresponding to speech recognition failure (that is, the identifiers corresponding to the various failure reasons), corresponding processing is performed according to those packets' speech recognition results for the voice signal. For example, suppose it is pre-specified that the first and third grammar packets' results for the voice signal are not failure identifiers, say the first grammar packet's result is c01 (corresponding contact information "张三", Zhang San) and the third grammar packet's result is d01 (corresponding operation information "拨通", dial); then Zhang San's mobile phone or office phone can be dialed.
In step 203 or step 210 above, speech recognition processing is performed with the small grammar packets each corresponding to one type of contact information, rather than, as in the prior art, with a single large grammar packet containing all the contact information; this speeds up speech recognition and saves the time spent on recognition processing.

The solution provided by this embodiment of the present invention generates the grammar packets required for speech recognition separately according to the different types of contact information, instead of generating one grammar packet from all contact information as in the prior art, and then performs speech recognition processing on the received voice signal with each grammar packet. When the contact information in every grammar packet, or in some of the grammar packets, can be recognized from the received voice signal, subsequent processing is performed according to the recognized contact information; in the prior art, by contrast, subsequent processing can be performed only when the syllables of the recognized contact information exceed a predetermined proportion of all the syllables contained in the voice signal, and otherwise recognition is deemed failed and processing ends, ignoring whether the contact information already recognized is sufficient to support subsequent processing. The voice control solution provided by this embodiment therefore solves the prior-art problem of a low voice control success rate.
Correspondingly, an embodiment of the present invention further provides a voice control apparatus. As shown in Fig. 3, the apparatus includes a grammar packet obtaining unit 301, a speech recognition processing unit 302 and an executing unit 303, specifically as follows:

the grammar packet obtaining unit 301 is configured to classify the stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

the speech recognition processing unit 302 is configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit 301 in turn;

the executing unit 303 is configured to perform corresponding control processing according to each grammar packet's speech recognition result for the voice signal obtained by the speech recognition processing unit 302.
Referring to Fig. 4, in the voice control apparatus shown in Fig. 3, the speech recognition processing unit 302 specifically includes a first determining subunit 401 and a second determining subunit 402, where:

the first determining subunit 401 is configured to: for each grammar packet obtained by the grammar packet obtaining unit 301, when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, select, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal;

the second determining subunit 402 is configured to: when no piece of identification information in the grammar packet can be recognized from the received voice signal, determine that this speech recognition has failed and, according to the failure reason, select, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.
Preferably, referring to Fig. 5, which is a schematic structural diagram of another voice control apparatus provided by an embodiment of the present invention, this voice control apparatus is connected to an external device and includes a grammar packet obtaining unit 501, a speech recognition processing unit 502, a combining unit 503, a receiving unit 504, a selecting unit 505 and a sending unit 506, where:

the grammar packet obtaining unit 501 is configured to classify the stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

the speech recognition processing unit 502 is configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit 501 in turn;

the combining unit 503 is configured to combine, in a predetermined combination order, the grammar packets' speech recognition results for the voice signal and send the combined result to the external device;

the receiving unit 504 is configured to receive a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result;

the selecting unit 505 is configured to select, from the pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request received by the receiving unit 504; the sending unit 506 is configured to provide the identification information corresponding to the split results selected by the selecting unit 505 to the external device, so that the external device performs control processing according to the identification information corresponding to the split results.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc. Various modifications and variations can be made to the present invention without departing from its spirit and scope; thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims

Claims

1. A voice control method, characterized by comprising:

classifying stored identification information used for speech recognition, and obtaining a grammar packet corresponding to each type of identification information;

receiving an input voice signal, and performing speech recognition processing on the received voice signal with each obtained grammar packet in turn; and,

performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal.
2. The method according to claim 1, characterized in that performing speech recognition processing on the voice signal according to each grammar packet specifically comprises:

when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, selecting, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal;

otherwise, determining that this speech recognition has failed and, according to the reason for this failure of speech recognition processing, selecting, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.
3. The method according to claim 2, characterized in that performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically comprises:

when at least one identifier corresponding to a speech recognition failure reason exists among the grammar packets' speech recognition results for the voice signal, outputting a prompt signal indicating that speech recognition has failed.
4. The method according to claim 2, characterized in that performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically comprises:

counting, among the grammar packets' speech recognition results for the voice signal, the number of identifiers corresponding to each identical speech recognition failure reason, and presenting the failure reason whose identifiers are most numerous to the user in a prompt message.
5. The method according to claim 2, characterized in that performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically comprises:

when no identifier corresponding to a speech recognition failure reason exists in a designated grammar packet's speech recognition result for the voice signal, performing, according to the designated grammar packet's speech recognition result for the voice signal, the predetermined control processing corresponding to that result.
6. The method according to claim 2, characterized in that performing corresponding control processing according to each grammar packet's speech recognition result for the voice signal specifically comprises:

combining, in a predetermined combination order, the grammar packets' speech recognition results for the voice signal, and sending the combined result to an external device; and,

receiving a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result in a split order corresponding to the predetermined combination order; and,

selecting, from a pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request; and

providing the identification information corresponding to the split results to the external device, the external device performing control processing according to the identification information corresponding to the split results.
7. The method according to any one of claims 1 to 6, characterized in that the identification information used for speech recognition comprises contact name type information, contact method type information, and operation type information.
8. A voice control apparatus, characterized by comprising:

a grammar packet obtaining unit, configured to classify stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

a speech recognition processing unit, configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit in turn; and an executing unit, configured to perform corresponding control processing according to each grammar packet's speech recognition result for the voice signal obtained by the speech recognition processing unit.
9. The apparatus according to claim 8, characterized in that the speech recognition processing unit specifically comprises:

a first determining subunit, configured to: for each grammar packet obtained by the grammar packet obtaining unit, when at least one piece of identification information in the grammar packet can be recognized from the received voice signal, select, from the pre-specified identifiers corresponding to the pieces of identification information in the grammar packet, the identifier corresponding to the recognized identification information as the grammar packet's speech recognition result for the voice signal;

a second determining subunit, configured to: when no piece of identification information in the grammar packet can be recognized from the received voice signal, determine that this speech recognition has failed and, according to the failure reason, select, from the pre-specified identifiers corresponding to the speech recognition failure reasons, the identifier corresponding to this failure reason as the grammar packet's speech recognition result for the voice signal.
10. A voice control apparatus connected to an external device, characterized by comprising: a grammar packet obtaining unit, configured to classify stored identification information used for speech recognition and obtain a grammar packet corresponding to each type of identification information;

a speech recognition processing unit, configured to receive an input voice signal and perform speech recognition processing on the received voice signal with each grammar packet obtained by the grammar packet obtaining unit in turn; a combining subunit, configured to combine, in a predetermined combination order, the speech recognition results that the grammar packets obtained by the speech recognition processing unit produced for the voice signal, and send the combined result to the external device;

a receiving subunit, configured to receive a query request sent by the external device, the query request containing the split results obtained by the external device after splitting the combined result;

a selecting subunit, configured to select, from a pre-specified correspondence between identification information and identifiers, the identification information corresponding to the split results contained in the query request received by the receiving subunit; and a sending subunit, configured to provide the identification information corresponding to the split results selected by the selecting subunit to the external device, so that the external device performs control processing according to the identification information corresponding to the split results.
PCT/CN2011/070198 WO2011134288A1 (zh) 2010-04-27 2011-01-12 Voice control method and voice control apparatus WO2011134288A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP11774279.1A EP2521121B1 (en) 2010-04-27 2011-01-12 Method and device for voice controlling
US13/575,717 US9236048B2 (en) 2010-04-27 2011-01-12 Method and device for voice controlling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010165495.5 2010-04-27
CN201010165495.5A CN102237087B (zh) 2010-04-27 2010-04-27 Voice control method and voice control apparatus

Publications (1)

Publication Number Publication Date
WO2011134288A1 true WO2011134288A1 (zh) 2011-11-03

Family

ID=44860825

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/070198 WO2011134288A1 (zh) 2010-04-27 2011-01-12 Voice control method and voice control apparatus

Country Status (4)

Country Link
US (1) US9236048B2 (zh)
EP (1) EP2521121B1 (zh)
CN (1) CN102237087B (zh)
WO (1) WO2011134288A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729123A (zh) * 2013-12-31 2014-04-16 青岛高校信息产业有限公司 Method and system for mapping an application program

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN103366740B (zh) * 2012-03-27 2016-12-14 Voice command recognition method and apparatus
CN102739869A (zh) * 2012-06-26 2012-10-17 华为终端有限公司 Method and terminal for finding a target contact's information by voice
CN102780653B (zh) * 2012-08-09 2016-03-09 上海量明科技发展有限公司 Method, client and system for quick communication in instant messaging
KR101453979B1 (ko) * 2013-01-28 2014-10-28 주식회사 팬택 Method, terminal and system for transmitting and receiving data by voice command
US10046457B2 (en) 2014-10-31 2018-08-14 General Electric Company System and method for the creation and utilization of multi-agent dynamic situational awareness models
CN104469002A (zh) * 2014-12-02 2015-03-25 科大讯飞股份有限公司 Method and apparatus for determining a mobile phone contact
KR101643560B1 (ko) * 2014-12-17 2016-08-10 현대자동차주식회사 Speech recognition apparatus, vehicle having the same, and method thereof
CN105872687A (zh) * 2016-03-31 2016-08-17 乐视控股(北京)有限公司 Method and apparatus for controlling a smart device by voice
CN107545896A (zh) * 2016-06-24 2018-01-05 中兴通讯股份有限公司 Device control method, apparatus and system, and file sending method and apparatus
CN106531158A (zh) * 2016-11-30 2017-03-22 北京理工大学 Method and apparatus for recognizing a response voice
US10229682B2 (en) * 2017-02-01 2019-03-12 International Business Machines Corporation Cognitive intervention for voice recognition failure
CN109215640B (zh) * 2017-06-30 2021-06-01 深圳大森智能科技有限公司 Speech recognition method, smart terminal and computer-readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
JPH10187185A (ja) * 1996-12-25 1998-07-14 Language processing apparatus and method
CN1389852A (zh) * 2001-06-06 2003-01-08 Automatic control of household activity using speech recognition and natural language
US20030105633A1 (en) * 1999-12-02 2003-06-05 Christophe Delaunay Speech recognition with a complementary language model for typical mistakes in spoken dialogue
CN1783213A (zh) * 2004-12-01 2006-06-07 Method and apparatus for automatic speech recognition
CN101369425A (zh) * 2007-08-17 2009-02-18 Speech recognition apparatus and method thereof

Family Cites Families (22)

Publication number Priority date Publication date Assignee Title
JP3402100B2 (ja) * 1996-12-27 2003-04-28 Casio Computer Co., Ltd. Voice control host device
US6058166A (en) * 1997-10-06 2000-05-02 Unisys Corporation Enhanced multi-lingual prompt management in a voice messaging system with support for speech recognition
US6249765B1 (en) * 1998-12-22 2001-06-19 Xerox Corporation System and method for extracting data from audio messages
US6377922B2 (en) * 1998-12-29 2002-04-23 At&T Corp. Distributed recognition system having multiple prompt-specific and response-specific speech recognizers
JP3444486B2 (ja) * 2000-01-26 2003-09-08 International Business Machines Corporation Automatic voice response system and method using voice recognition means
US6654720B1 (en) * 2000-05-09 2003-11-25 International Business Machines Corporation Method and system for voice control enabling device in a service discovery network
US20040085162A1 (en) 2000-11-29 2004-05-06 Rajeev Agarwal Method and apparatus for providing a mixed-initiative dialog between a user and a machine
US6728348B2 (en) * 2000-11-30 2004-04-27 Comverse, Inc. System for storing voice recognizable identifiers using a limited input device such as a telephone key pad
JP2003295890A (ja) * 2002-04-04 2003-10-15 Nec Corp Speech recognition dialogue selection device, speech recognition dialogue system, speech recognition dialogue selection method, and program
US7003464B2 (en) * 2003-01-09 2006-02-21 Motorola, Inc. Dialog recognition and control in a voice browser
US7331036B1 (en) * 2003-05-02 2008-02-12 Intervoice Limited Partnership System and method to graphically facilitate speech enabled user interfaces
US20050043067A1 (en) * 2003-08-21 2005-02-24 Odell Thomas W. Voice recognition in a vehicle radio system
US7363228B2 (en) * 2003-09-18 2008-04-22 Interactive Intelligence, Inc. Speech recognition system and method
US8000452B2 (en) * 2004-07-26 2011-08-16 General Motors Llc Method and system for predictive interactive voice recognition
US8121839B2 (en) * 2005-12-19 2012-02-21 Rockstar Bidco, LP Method and apparatus for detecting unsolicited multimedia communications
US8214213B1 (en) * 2006-04-27 2012-07-03 At&T Intellectual Property Ii, L.P. Speech recognition based on pronunciation modeling
US8055502B2 (en) * 2006-11-28 2011-11-08 General Motors Llc Voice dialing using a rejection reference
US8725513B2 (en) * 2007-04-12 2014-05-13 Nuance Communications, Inc. Providing expressive user interaction with a multimodal application
US20080300025A1 (en) * 2007-05-31 2008-12-04 Motorola, Inc. Method and system to configure audio processing paths for voice recognition
WO2009048434A1 (en) 2007-10-11 2009-04-16 Agency For Science, Technology And Research A dialogue system and a method for executing a fully mixed initiative dialogue (fmid) interaction between a human and a machine
US8868424B1 (en) * 2008-02-08 2014-10-21 West Corporation Interactive voice response data collection object framework, vertical benchmarking, and bootstrapping engine
JP2009229529A (ja) * 2008-03-19 2009-10-08 Toshiba Corp Speech recognition apparatus and speech recognition method

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
JPH10187185A (ja) * 1996-12-25 1998-07-14 Language processing apparatus and method
US20030105633A1 (en) * 1999-12-02 2003-06-05 Christophe Delaunay Speech recognition with a complementary language model for typical mistakes in spoken dialogue
CN1389852A (zh) * 2001-06-06 2003-01-08 Automatic control of household activity using speech recognition and natural language
CN1783213A (zh) * 2004-12-01 2006-06-07 Method and apparatus for automatic speech recognition
CN101369425A (zh) * 2007-08-17 2009-02-18 Speech recognition apparatus and method thereof

Non-Patent Citations (1)

Title
See also references of EP2521121A4 *

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN103729123A (zh) * 2013-12-31 2014-04-16 青岛高校信息产业有限公司 Method and system for mapping an application program

Also Published As

Publication number Publication date
US9236048B2 (en) 2016-01-12
US20130289995A1 (en) 2013-10-31
EP2521121A4 (en) 2014-03-19
EP2521121A1 (en) 2012-11-07
CN102237087A (zh) 2011-11-09
CN102237087B (zh) 2014-01-01
EP2521121B1 (en) 2017-10-25

Similar Documents

Publication Publication Date Title
WO2011134288A1 (zh) Voice control method and voice control apparatus
AU2021286393B2 (en) Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal
KR100232873B1 (ko) Portable telephone having a memory for speech recognition processing
US9570076B2 (en) Method and system for voice recognition employing multiple voice-recognition techniques
WO2013182118A1 (zh) Method and apparatus for transmitting voice data
JP2004248248A (ja) Voice dial entry for a user-programmable mobile station handset
TW200540649A (en) Method and apparatus for automatic telephone menu navigation
JP2001308970A (ja) Voice recognition operation method and system for a mobile phone
US9077802B2 (en) Automated response system
TW201246899A (en) Handling a voice communication request
CN111325039B (zh) Language translation method, system, program and handheld terminal based on real-time calls
WO2001008384A1 (fr) Cellular telephone
TW200304638A (en) Network-accessible speaker-dependent voice models of multiple persons
JP3319186B2 (ja) PBX-computer interworking system
KR20080054591A (ko) Call service method for a mobile terminal
JP2015100054A (ja) Voice communication system, voice communication method and program
WO2014180197A1 (zh) Method and apparatus for automatically sending a multimedia file, mobile terminal, and storage medium
JP2019204112A (ja) Speech recognition method, voice wakeup apparatus, speech recognition apparatus, and terminal
JP6508251B2 (ja) Voice dialogue system and information processing apparatus
CN111274828B (zh) Message-based language translation method, system, computer program and handheld terminal
KR100291002B1 (ko) Method for ending a call and redialing in a voice recognition digital portable telephone
JP3716928B2 (ja) Voice calling device
JP2002320037A (ja) Translation telephone system
JP2013214924A (ja) Wireless operation device, control method for wireless operation device, and program
CN115273912A (zh) Voice state determination method, apparatus, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11774279

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 13575717

Country of ref document: US

REEP Request for entry into the european phase

Ref document number: 2011774279

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2011774279

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE