WO2014177015A1 - Voice recognition method for a mobile terminal and device thereof - Google Patents

Voice recognition method for a mobile terminal and device thereof

Info

Publication number
WO2014177015A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
voice
mobile terminal
contact
category
Prior art date
Application number
PCT/CN2014/076180
Other languages
English (en)
French (fr)
Inventor
罗永浩
Original Assignee
锤子科技(北京)有限公司
Priority date
Filing date
Publication date
Application filed by 锤子科技(北京)有限公司
Priority to US14/787,926 (US9502035B2)
Publication of WO2014177015A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/26 Devices for calling a subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
    • H04W8/183 Processing at user equipment or user record carrier
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Definitions

  • A common human-computer interaction method on a smart mobile terminal is to touch the screen with a finger; the touch sensor built into the mobile terminal senses the finger's touch information to realize the interaction.
  • Apple's human-computer interaction mode has changed from traditional physical touch to voice control; that is, human language is used to command the mobile terminal to carry out the tasks the user needs.
  • The speech recognition process allows the user to give instructions to voice assistant software in natural language. After receiving an instruction, the mobile terminal uses the voice assistant software to perform speech recognition and semantic analysis locally and/or on a cloud server, and gives feedback based on the results of the recognition and analysis.
  • However, the recognition accuracy of this approach is low, especially for multi-word input, long sentences, and multiple sentences.
  • The error rate is often high.
  • The results can differ greatly from the user's real needs. Users have to repeatedly re-enter input and correct the results of recognition and analysis, which seriously affects the accuracy and speed of mobile-terminal-based speech recognition.
  • The embodiments of the present application provide a voice recognition method for a mobile terminal and a corresponding device, so as to improve the accuracy and speed of mobile-terminal-based voice recognition.
  • The voice recognition method of the mobile terminal provided by the present application includes:
  • receiving a trigger message of a to-be-operated operation category for operating the mobile terminal, the operation category being a category classified according to a service function of the mobile terminal;
  • receiving voice keyword information, determining a voice keyword from the voice keyword information, retrieving the keyword library under the to-be-operated operation category according to the voice keyword, and returning the search result.
  • Receiving the trigger message of the to-be-operated operation category for operating the mobile terminal includes:
  • presenting an operation category window on the screen of the mobile terminal and, when a label corresponding to one of the operation categories in the window is clicked or determined to be in focus, determining that a trigger message of the to-be-operated operation category for operating the mobile terminal has been received.
  • The labels corresponding to the operation categories in the operation category window include a contact label for implementing communication service functions, an application label for implementing application service functions, a music label for implementing music playback service functions, and/or a web search label for implementing online search service functions.
  • Receiving the trigger message of the to-be-operated operation category for operating the mobile terminal specifically includes:
  • determining whether the gravitational acceleration component on the Z axis monitored by a first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by a second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to that plane, the first listener is a listener for the gravity sensor registered after obtaining the sensor service, and the second listener is a listener for the distance sensor registered after obtaining the sensor service;
  • if so, determining that a trigger message of the to-be-operated operation category for operating the mobile terminal has been received, the operation category being contacts.
  • In that case, receiving the voice keyword information, determining a voice keyword from the voice keyword information, retrieving the keyword library under the to-be-operated operation category according to the voice keyword, and returning the search result include:
  • receiving voice keyword information containing a contact, determining a contact keyword from the voice keyword information, retrieving the contact library according to the contact keyword, returning the retrieved contact, and calling the contact.
  • When multiple contacts are retrieved according to the contact keyword, each contact is numbered, numbered voice information is received, and the contact corresponding to the numbered voice information is called.
  • After the mobile terminal is operated, the frequency of the corresponding keyword in the keyword library under the operation category is increased; when the keyword library under the to-be-operated operation category is retrieved according to the voice keyword, it is searched in descending order of keyword frequency.
  • When a preset condition is satisfied, the keyword library under the operation category is updated according to the operation result.
  • The voice recognition device of the mobile terminal includes a trigger message receiving unit, a voice keyword information receiving unit, a voice keyword identifying unit, and a keyword library searching unit, wherein:
  • the trigger message receiving unit is configured to receive a trigger message of a to-be-operated operation category for operating the mobile terminal, the operation category being a category classified according to a service function of the mobile terminal;
  • the voice keyword information receiving unit is configured to receive voice keyword information;
  • the voice keyword identifying unit is configured to determine a voice keyword from the voice keyword information, and the keyword library searching unit is configured to retrieve the keyword library under the to-be-operated operation category according to the voice keyword and return the search result.
  • The trigger message receiving unit specifically includes an operation category window presentation subunit and a trigger message receiving subunit, wherein:
  • the operation category window presentation subunit is configured to present an operation category window on the screen of the mobile terminal;
  • the trigger message receiving subunit is configured to determine that a trigger message of the to-be-operated operation category for operating the mobile terminal has been received when a label corresponding to an operation category in the operation category window is clicked or determined to be in focus.
  • The trigger message receiving unit specifically includes a monitoring result judging subunit and a trigger message receiving subunit, wherein:
  • the monitoring result judging subunit is configured to determine whether the gravitational acceleration component on the Z axis monitored by the first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by the second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel and the Z axis is perpendicular to that plane;
  • the first listener is a listener for the gravity sensor registered after obtaining the sensor service;
  • the second listener is a listener for the distance sensor registered after obtaining the sensor service;
  • the trigger message receiving subunit is configured to determine, when the determination result is yes, that a trigger message of the to-be-operated operation category for operating the mobile terminal has been received, the operation category being contacts;
  • the voice keyword information receiving unit is specifically configured to receive voice keyword information containing a contact, the voice keyword identifying unit is specifically configured to determine a contact keyword from the voice keyword information, and the keyword library searching unit is specifically configured to retrieve the contact library according to the contact keyword and return the retrieved contact;
  • the apparatus also includes a call unit for calling the retrieved contact.
  • The device further includes a contact numbering unit and a numbered voice information receiving unit, wherein: the contact numbering unit is configured to number each contact when multiple contacts are retrieved according to the contact keyword; the numbered voice information receiving unit is configured to receive numbered voice information; and the calling unit is specifically configured to call the contact corresponding to the numbered voice information.
  • The device further includes a keyword frequency increasing unit, configured to increase the frequency of the corresponding keyword in the keyword library under the operation category after the mobile terminal is operated; the keyword library searching unit is specifically configured to search the keyword library in descending order of keyword frequency when the keyword library under the to-be-operated operation category is retrieved according to the voice keyword.
  • The device further includes a keyword updating unit, configured to update the keyword library under the operation category according to the operation result of operating the mobile terminal when a preset condition is met.
  • In the embodiment of the present application, after a trigger message of an operation category divided according to the service functions of the mobile terminal is received, voice keyword information is received, a voice keyword is determined from the voice keyword information, the corresponding keyword library is searched according to the voice keyword, and the search result is returned.
  • The embodiment of the present application divides operation categories by service function, so that each keyword library corresponds to only one operation category. This has three effects.
  • First, the objects searched for a voice keyword are limited to the keyword library corresponding to the operation to be performed on the mobile terminal, which reduces the number of objects to process and suits the relatively weak processing capability of the mobile terminal. Second, with fewer objects involved in the retrieval, the retrieval takes less time, which improves the efficiency of speech recognition. Third, with fewer objects involved in the retrieval, duplicate and ambiguous keywords become less likely, which improves the accuracy of speech recognition.
  • The embodiment of the present application receives voice input in the form of voice keyword information rather than ordinary natural language, which avoids multi-word input, long sentences, and multiple sentences. On the one hand, keywords are easier to extract from the voice information, which improves the efficiency of speech recognition; on the other hand, matching the extracted keywords against the keyword library to obtain the returned result helps improve the accuracy of speech recognition.
  • FIG. 1 is a flowchart of an embodiment of a voice recognition method of a mobile terminal according to the present application.
  • FIG. 2 is a structural block diagram of an embodiment of a voice recognition apparatus of a mobile terminal according to the present application.
  • Referring to FIG. 1, there is shown a flow chart of an embodiment of a voice recognition method of a mobile terminal of the present application.
  • The process includes:
  • Step S101: Receive a trigger message of a to-be-operated operation category for operating the mobile terminal, where the operation category is a category classified according to a service function of the mobile terminal.
  • Mobile terminals now offer not only traditional communication functions but also many new service functions, such as web search, audio and video playback, and games.
  • These service functions differ in nature, and the operation modes and operation instructions that users employ to carry out each of them have their own characteristics.
  • The various operations that implement the same service function, however, generally have much in common.
  • The possible operations of the mobile terminal are therefore classified in advance according to service function. This division gives the subsequent speech recognition process a clear target.
  • This embodiment does not limit the number or type of operation categories, as long as the needs of the actual application are met.
  • For example, the following categories may be defined according to the service functions of the mobile terminal itself and the way its user uses it:
  • a contact category, which stores information such as contact names, phone numbers, and personal characteristics; when a contact is recognized from the voice input, the contact's information can be viewed, the contact can be called, a short message can be sent to the contact, and so on;
  • an application category, which records the name, icon, storage location, and so on of each application; when an application is recognized from the voice input, its basic attribute information can be viewed and operations such as start, uninstall, delete, and update can be performed on it;
  • a music category, which records the music name, singer name, album name, and other related information; when a piece of music is recognized from the voice input, its basic attribute information can be viewed and operations such as play, move, and delete can be performed on it;
  • a web search category, which is used to implement the web search function.
  • Step S102: Receive voice keyword information, and determine a voice keyword from the voice keyword information. If the user needs to control and operate the mobile terminal by voice, the voice recognition engine may be started and put into a working state. When voice recognition is required, the voice keyword information is received by the voice recognition engine.
  • The voice information received in this embodiment is voice content consisting mainly of keywords; it need not be general natural language containing a complete sentence. For example, to call Zhang Moumou, the prior art requires saying "Call Zhang Moumou", whereas in this embodiment, once the operation category has been determined to be "contact", the user can directly say "Zhang Moumou". That is, the user only needs to speak the keyword of the operation to make the mobile terminal perform the corresponding operation.
  • In practice, the user's voice information is usually not exactly the voice keyword. It may include transitional sounds, modal particles, and the like, which are noise for voice recognition and need to be removed from the voice keyword information.
  • The voice keyword is thus extracted from the voice keyword information; it corresponds directly to a keyword in the keyword library and, in turn, to an operation command.
  • Step S103: Retrieve the keyword library under the to-be-operated operation category according to the voice keyword, and return the search result.
  • After the voice keyword is determined, it is searched for in the keyword library corresponding to the to-be-operated operation category, and the search result is returned.
  • The search result may then trigger the corresponding operation on the mobile terminal.
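As an illustration only, steps S101 to S103 can be sketched in Python as a lookup restricted to the library of the triggered category. The library contents, the filler-word set, and the function names `extract_keyword` and `retrieve` are assumptions made for this sketch and do not come from the application; exact string matching stands in for whatever matching the recognition engine actually performs.

```python
from typing import Dict, List

# Hypothetical per-category keyword libraries; the entries are illustrative only.
KEYWORD_LIBRARIES: Dict[str, List[str]] = {
    "contact": ["Zhang Moumou", "Chen Moumou", "Liu Moumou"],
    "application": ["Calendar", "Camera"],
    "music": ["Zhang Moumou", "Album One"],
}

# Hypothetical filler sounds treated as noise and dropped before matching.
FILLER_WORDS = {"um", "uh", "er"}


def extract_keyword(utterance: str) -> str:
    """Drop filler words and surrounding punctuation from recognized text."""
    kept = [w.strip(",.") for w in utterance.split()]
    kept = [w for w in kept if w and w.lower() not in FILLER_WORDS]
    return " ".join(kept)


def retrieve(category: str, utterance: str) -> List[str]:
    """Search only the keyword library of the triggered category."""
    keyword = extract_keyword(utterance).lower()
    library = KEYWORD_LIBRARIES.get(category, [])
    return [entry for entry in library if entry.lower() == keyword]


# S101: the "contact" category was triggered; S102: the user speaks a keyword;
# S103: only the contact library is searched.
print(retrieve("contact", "um, Zhang Moumou"))  # ['Zhang Moumou']
```

Because the lookup never leaves the triggered category, the music entry with the same name is not considered at all.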
  • Steps S101 and S102 may run in parallel in actual operation, or step S102 may be performed after step S101. That is, the user may first trigger the to-be-operated operation category and then input the voice keyword; or the voice keyword may be received first and the trigger of the to-be-operated operation category received afterwards; or the voice keyword information may be received at the same time as the trigger of the to-be-operated operation category.
  • The execution order of the two does not affect the realization of the object of the present application and may be chosen according to the needs of the application.
  • In this embodiment, after a trigger message of an operation category divided according to the service functions of the mobile terminal is received, voice keyword information is received, a voice keyword is determined from the voice keyword information, the corresponding keyword library is searched according to the voice keyword, and the search result is returned.
  • This embodiment of the application can achieve the following technical effects:
  • Because operation categories are divided by service function, each keyword library corresponds to only one operation category, unlike existing speech recognition, which uses a single speech recognition library covering operations of every nature and mode. The objects searched for a voice keyword are therefore limited to the keyword library corresponding to the operation to be performed on the mobile terminal, which reduces the number of objects to process and suits the relatively weak processing capability of the mobile terminal.
  • For example, suppose the existing speech recognition library includes 100 voice operation instructions.
  • In this embodiment the 100 voice operation instructions are classified, and those implementing the "contact" function form one category containing 10 voice operation instructions. When the user only needs the contact function, retrieval and recognition are triggered only under that category, so the search covers only those 10 instructions and the amount of processing is greatly reduced.
  • Because fewer objects are involved in the retrieval, the retrieval takes much less time for the same processing capability, and the search result corresponding to the keyword spoken by the user can be returned within a short time, which improves the efficiency of speech recognition.
  • For example, suppose matching against each voice operation instruction takes 0.01 s and the instruction spoken by the user is at the 80th position in the library.
  • In the prior art, the search runs over all 100 voice operation instructions.
  • The instruction is found after 80 match attempts, which takes 0.8 s.
  • In this embodiment, the matching is limited to the 10 voice operation instructions that implement the contact function, so it takes at most 0.1 s. The retrieval time is greatly shortened, which improves the efficiency of speech recognition.
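The arithmetic in this example can be checked with a short Python sketch. It assumes a plain linear search and uses placeholder instruction names; both libraries are hypothetical, and the contact library is arranged so that the spoken instruction sits in its last position, which is the worst case.

```python
from typing import List, Optional

TIME_PER_MATCH_S = 0.01  # per-instruction matching time taken from the example


def comparisons_until_found(library: List[str], spoken: str) -> Optional[int]:
    """Linear search: return how many entries are compared up to the match."""
    for count, instruction in enumerate(library, start=1):
        if instruction == spoken:
            return count
    return None


# Hypothetical libraries: 100 instructions overall, 10 of them for contacts.
full_library = [f"instruction_{i}" for i in range(1, 101)]
contact_library = [f"instruction_{i}" for i in range(71, 81)]

full_count = comparisons_until_found(full_library, "instruction_80")
scoped_count = comparisons_until_found(contact_library, "instruction_80")

print(round(full_count * TIME_PER_MATCH_S, 2))    # 0.8
print(round(scoped_count * TIME_PER_MATCH_S, 2))  # 0.1
```

The saving comes entirely from the smaller search space; the per-match cost is unchanged.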
  • Because fewer objects are involved in the retrieval, the probability of duplicate and ambiguous keywords is lowered, which improves the accuracy of speech recognition.
  • For example, suppose the user says "Zhang Moumou".
  • In the prior art, two entries for "Zhang Moumou" may be found.
  • One "Zhang Moumou" is the name of a contact stored by the user on the mobile terminal.
  • The other "Zhang Moumou" is the name of a singer stored in the user's music library. The spoken word is thus duplicated and ambiguous, and the system cannot tell whether the user intends to phone the contact or is referring to the singer.
  • The embodiment receives voice input in the form of voice keyword information rather than ordinary natural language, which avoids multi-word input, long sentences, and multiple sentences. On the one hand, keywords are easier to extract from the voice information, which improves the efficiency of speech recognition; on the other hand, matching the extracted keywords against the keyword library to obtain the returned result helps improve the accuracy of speech recognition.
  • The trigger message can be received in various ways.
  • For example, an operation category window is displayed on the screen of the mobile terminal, and the window shows labels for the operation categories. The labels may include a contact label for implementing communication service functions, an application label for implementing application service functions, a music label for implementing music playback service functions, a web search label for implementing online search service functions, and so on.
  • When the user clicks one of the labels, a trigger event (trigger message) is generated in the system; when the trigger event is detected, the trigger message for that operation category is considered to have been received.
  • The trigger message may also arise in other ways. For example, when the user has enabled automatic application updates and a new version of an application appears on the network, the mobile terminal receives an update notification. That notification can be regarded as a trigger message for the "application" operation category, so that the user's voice instruction can then be received to decide whether or not to update the application.
  • The trigger message of the operation category may also be determined from the user's habitual actions with the mobile terminal.
  • For example, a common action is the user raising the phone to the ear, which means the user wants to call a contact; in that case a trigger message for the "contact" category is considered to have been received.
  • In a specific implementation, the sensor service of the system is obtained when the speech recognition engine is initialized, and a listener for the gravity sensor and a listener for the distance sensor are registered. The gravity sensor provides the components of gravitational acceleration in three dimensions (x, y, z).
  • When the phone lies flat, the gravitational acceleration along the z axis tends to 9.8, while the x and y components tend to 0.
  • The voice assistant application monitors the values returned by the gravity sensor in real time. When the phone is horizontal or slightly tilted (as when the user holds it normally), the z component tends to 7. If at the same time the distance sensor returns a non-zero value (that is, nothing is blocking the phone's distance sensor), both conditions are met, the whole process is initialized, and the initialization time is recorded.
  • Before the user brings the phone to the ear, the distance sensor keeps returning a non-zero value (no obstruction), and the state is "working".
  • When the phone is raised toward the ear, the z component tends to 2 (any value from 0 to 4 gravitational acceleration units satisfies the object of the invention), the sum of the absolute values of the x and y components tends to 7 (any value in the range of 4 to 10 is acceptable), and the absolute value of the x component should be greater than 2.
  • Once these conditions are met, the system state is set to WAIT-PROXI. In this state the system waits for the distance sensor to return 0 (the face blocks the distance sensor); as soon as it returns 0, the procedure for dialing a contact is started. If, when the distance sensor returns 0, the whole process from initialization to WAIT-PROXI has taken more than 2 seconds, the action recognition is judged to have failed.
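The raise-to-ear detection described above can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not the application's implementation: the class and state names are hypothetical, sensor readings are passed in as plain values instead of coming from registered listeners, and the threshold `FLAT_Z_MIN = 6.0` for "z tends to 7" is an assumed value, since the text gives no tolerance.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    WORKING = auto()      # initialized, phone held roughly flat
    WAIT_PROXI = auto()   # raised, waiting for the distance sensor to read 0
    DIAL = auto()         # gesture recognized, start contact dialing
    FAILED = auto()       # gesture took longer than the timeout


@dataclass
class Sample:
    t: float         # timestamp in seconds
    x: float         # gravity components
    y: float
    z: float
    distance: float  # distance sensor reading; 0 means covered


class RaiseToEarDetector:
    FLAT_Z_MIN = 6.0  # assumed threshold for "z tends to 7"
    TIMEOUT_S = 2.0   # limit from initialization to the zero reading

    def __init__(self) -> None:
        self.state = State.IDLE
        self.t_init: Optional[float] = None

    def feed(self, s: Sample) -> State:
        if self.state is State.IDLE:
            # Flat or slightly tilted with nothing covering the sensor.
            if s.z >= self.FLAT_Z_MIN and s.distance != 0:
                self.state = State.WORKING
                self.t_init = s.t
        elif self.state is State.WORKING:
            raised = (
                0 <= s.z <= 4
                and 4 <= abs(s.x) + abs(s.y) <= 10
                and abs(s.x) > 2
            )
            if raised and s.distance != 0:
                self.state = State.WAIT_PROXI
        elif self.state is State.WAIT_PROXI:
            if s.distance == 0:
                elapsed = s.t - self.t_init
                self.state = State.FAILED if elapsed > self.TIMEOUT_S else State.DIAL
        return self.state


detector = RaiseToEarDetector()
detector.feed(Sample(t=0.0, x=0.3, y=0.5, z=7.2, distance=5.0))
detector.feed(Sample(t=0.6, x=3.0, y=4.5, z=2.0, distance=5.0))
print(detector.feed(Sample(t=1.1, x=3.0, y=4.5, z=2.0, distance=0.0)))  # State.DIAL
```

Feeding the same three samples with the last timestamp later than 2 seconds after initialization ends in `State.FAILED` instead.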
  • The user can then directly say the contact's name, and the system reads the matching contacts from the phone's contact list according to the recognition result. If several contacts match, the system prompts the user by voice, for example "1. Chen Moumou. 2. Liu Moumou."; the user then only needs to say "1" or "2" to choose to call Chen Moumou or Liu Moumou. After the user chooses, the system announces that it is dialing and calls the chosen contact directly. If only one contact matches, the system announces that it is dialing and places the call.
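The numbering step can be illustrated with a short Python sketch. The function names are hypothetical, and the spoken number is assumed to arrive from the recognizer as a digit string such as "2".

```python
from typing import Dict, List, Optional


def number_contacts(matches: List[str]) -> Dict[str, str]:
    """Assign "1", "2", ... to each matching contact, in order."""
    return {str(i): name for i, name in enumerate(matches, start=1)}


def build_prompt(numbered: Dict[str, str]) -> str:
    """Text of the voice prompt listing the numbered candidates."""
    return " ".join(f"{number}. {name}." for number, name in numbered.items())


def choose_contact(numbered: Dict[str, str], spoken_number: str) -> Optional[str]:
    """Return the contact for the spoken number, or None if it is not offered."""
    return numbered.get(spoken_number.strip())


numbered = number_contacts(["Chen Moumou", "Liu Moumou"])
print(build_prompt(numbered))         # 1. Chen Moumou. 2. Liu Moumou.
print(choose_contact(numbered, "2"))  # Liu Moumou
```

Restricting the second utterance to a short number keeps it within the same keyword-only style of input as the first.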
  • In addition, a user who uses the speech recognition function over a long period inevitably forms regular habits, which can be exploited in retrieving the keyword library. For example, if a mobile terminal often performs a certain operation, the user evidently needs that operation frequently. A counter may therefore be set to record, each time an operation is performed, the total number of times the mobile terminal has performed it.
  • The total is stored as an attribute (the frequency) of the keyword corresponding to the operation in the keyword library, and when a search is performed for a voice keyword, the keyword library is searched in descending order of keyword frequency. Because an operation the user performs often necessarily has a high frequency, and its keyword is necessarily in the keyword library, searching from largest to smallest obtains the search result relatively quickly.
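A minimal Python sketch of frequency-ordered retrieval follows. The class name and its methods are hypothetical, and re-sorting on every search is chosen for brevity; a real keyword library might keep its entries ordered incrementally instead.

```python
from typing import Dict, List, Optional, Tuple


class FrequencyOrderedLibrary:
    """Keyword library that is searched in descending order of use count."""

    def __init__(self, keywords: List[str]) -> None:
        self.counts: Dict[str, int] = {k: 0 for k in keywords}

    def record_operation(self, keyword: str) -> None:
        """Increase the frequency after the operation has been performed."""
        self.counts[keyword] += 1

    def search(self, spoken: str) -> Tuple[Optional[str], int]:
        """Return the matched keyword and the number of comparisons made."""
        # sorted() is stable, so equally frequent keywords keep their order.
        ordered = sorted(self.counts, key=lambda k: -self.counts[k])
        for comparisons, keyword in enumerate(ordered, start=1):
            if keyword == spoken:
                return keyword, comparisons
        return None, len(ordered)


library = FrequencyOrderedLibrary(["Chen Moumou", "Liu Moumou", "Zhang Moumou"])
print(library.search("Zhang Moumou"))  # ('Zhang Moumou', 3)
for _ in range(3):
    library.record_operation("Zhang Moumou")
print(library.search("Zhang Moumou"))  # ('Zhang Moumou', 1)
```

After three recorded calls the frequently used contact moves to the front and is found on the first comparison.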
  • In addition, the voice keyword library under an operation category may be updated according to the operation result when a preset condition is met. For example, when a person is added to the contact list, the voice keyword library needs to be updated.
  • The update may take place each time a contact is added or each time the phone is restarted; this can be set according to actual conditions. When the preset condition is met, the update operation is triggered.
  • Referring to FIG. 2, there is shown a structural block diagram of a voice recognition apparatus of a mobile terminal of the present application.
  • The device includes a trigger message receiving unit 201, a voice keyword information receiving unit 202, a voice keyword identifying unit 203, and a keyword library searching unit 204, wherein:
  • the trigger message receiving unit 201 is configured to receive a trigger message of a to-be-operated operation category for operating the mobile terminal, the operation category being a category classified according to a service function of the mobile terminal;
  • the voice keyword information receiving unit 202 is configured to receive voice keyword information;
  • the voice keyword identifying unit 203 is configured to determine a voice keyword from the voice keyword information, and the keyword library searching unit 204 is configured to retrieve the keyword library under the to-be-operated operation category according to the voice keyword and return the search result.
  • The working process of the foregoing apparatus embodiment is as follows: the trigger message receiving unit 201 receives the trigger message of the to-be-operated operation category for operating the mobile terminal; the voice keyword information receiving unit 202 receives the voice keyword information, and the voice keyword identifying unit 203 determines the voice keyword from the voice keyword information; the keyword library searching unit 204 then retrieves the keyword library under the to-be-operated operation category according to the voice keyword and returns the search result.
  • the device embodiment After receiving the trigger message of an operation category according to the service function of the mobile terminal, the device embodiment receives the voice keyword information, determines the voice keyword from the voice keyword, and then searches for the corresponding keyword library according to the voice keyword. And return the search results.
  • the device embodiment divides the operation category according to the business function, so that the keyword library only corresponds to each operation category, and on the other hand, the retrieval processing object is limited to the retrieval based on the speech keyword.
  • the keyword library corresponding to the operation of the mobile terminal reduces the number of processing objects, and adapts to the weak processing capability of the mobile terminal; on the other hand, the number of processing objects involved in the retrieval is reduced to shorten the retrieval process time , thereby improving the efficiency of speech recognition; on the other hand, the reduction in the number of processing objects involved in the retrieval reduces the probability of occurrence of repetition and ambiguity of the keyword, thereby improving the accuracy of speech recognition. Sex.
  • the device embodiment receives the voice information in the form of voice keyword information, is no longer an ordinary natural language, avoids multiple words, long sentences and multiple sentences, and on the other hand, is more easy to extract key from the voice information. Words, in turn, improve the efficiency of speech recognition; on the other hand, the retrieval of the keywords extracted from the speech keyword information and the keyword library to obtain the return results, is conducive to improve the accuracy of speech recognition.
  • Method 1: receipt of the operation category trigger message is determined by popping up a window and receiving the user's click or focus movement. In this mode, the trigger message receiving unit 201 may include an operation category window presentation subunit 2011 and a trigger message receiving subunit 2012, where:
  • the operation category window presentation subunit 2011 is configured to present an operation category window on the screen of the mobile terminal;
  • the trigger message receiving subunit 2012 is configured to receive the trigger message for the operation category to be operated on the mobile terminal when the label corresponding to an operation category in the operation category window is clicked or determined to be the focus.
  • Method 2: receipt of the operation category trigger message is confirmed by identifying the user's action through a sensor.
  • In this mode, the trigger message receiving unit specifically includes a monitoring result judging subunit and a trigger message receiving subunit, where:
  • the monitoring result judging subunit is configured to determine whether the gravitational acceleration component on the Z axis monitored by the first listener is 2, whether the gravitational acceleration component on the X and Y axes is 7, and whether the distance monitored by the second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained;
  • the trigger message receiving subunit is configured to determine, when all the determination results are yes, that the trigger message for the operation category to be operated on the mobile terminal has been received, where the operation category is contacts.
  • In the second mode, the other functional units change accordingly: the voice keyword information receiving unit is specifically configured to receive voice keyword information containing a contact, the voice keyword recognition unit is specifically configured to determine a contact keyword from the voice keyword information, and the keyword retrieval unit is specifically configured to search the contact library according to the contact keyword and return the retrieved contact.
  • The above apparatus embodiment further includes a call unit for calling the retrieved contact.
  • Further, the apparatus embodiment includes a contact numbering unit and a numbered voice information receiving unit, where: the contact numbering unit is configured to number each contact when multiple contacts are retrieved according to the contact keyword; the numbered voice information receiving unit is configured to receive numbered voice information; and the call unit is specifically configured to call the contact corresponding to the numbered voice information.
  • The apparatus embodiment further includes a keyword frequency increasing unit, configured to increase, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category.
  • In that case, the keyword library retrieval unit is specifically configured to search the keyword library in descending order of keyword frequency when searching the keyword library under the operation category to be operated according to the voice keyword.
  • Adding this unit can increase the speed of retrieval.
  • The apparatus embodiment may further include a keyword updating unit 205, configured to update, after the mobile terminal is operated and when a preset condition is met, the keyword library under the operation category according to the operation result.
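
The cooperation of units 201 to 204 described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method names, and the dictionary layout of the keyword libraries are hypothetical, and real audio handling is replaced by text that is assumed to be transcribed already.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpeechRecognitionApparatus:
    """Illustrative stand-in for units 201 to 204 (all names are assumptions)."""

    # Assumed layout: operation category -> {keyword: operation command}.
    libraries: Dict[str, Dict[str, str]] = field(default_factory=dict)
    pending_category: Optional[str] = None

    def receive_trigger(self, category: str) -> None:
        """Unit 201: record which operation category was triggered."""
        if category not in self.libraries:
            raise KeyError(f"unknown operation category: {category}")
        self.pending_category = category

    @staticmethod
    def recognize_keyword(keyword_info: str) -> str:
        """Units 202 and 203: the audio is assumed to be transcribed already,
        so determining the keyword reduces to trimming whitespace."""
        return keyword_info.strip()

    def retrieve(self, keyword_info: str) -> Optional[str]:
        """Unit 204: search only the library of the triggered category."""
        if self.pending_category is None:
            raise RuntimeError("no operation category has been triggered")
        keyword = self.recognize_keyword(keyword_info)
        return self.libraries[self.pending_category].get(keyword)
```

Keeping one library per category means the retrieval step never touches entries that belong to other service functions, which is the property the embodiment relies on.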

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Environmental & Geological Engineering (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

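A speech recognition method and apparatus for a mobile terminal, intended to improve the efficiency and accuracy of speech recognition. The method includes: receiving a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal (S101); receiving voice keyword information and determining a voice keyword from the voice keyword information (S102); and searching the keyword library under the operation category to be operated according to the voice keyword, and returning the search result (S103).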

Description

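Voice recognition method for mobile terminal and device thereof
This application claims priority to Chinese patent application No. 201310157943.0, entitled "Voice recognition method for mobile terminal and device thereof", filed with the State Intellectual Property Office of China on May 2, 2013, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of information processing technology, and in particular to a speech recognition method based on a mobile terminal and a corresponding apparatus.
Background
The use of a mobile terminal is inseparable from human-computer interaction. In smart mobile terminals, the most common mode of interaction is touching the screen with a finger, with a sensor built into the terminal sensing the touch and pressure information. Since Apple added the Siri voice assistant to its iPhone products, human-computer interaction has shifted from traditional physical touch to voice control, in which the user instructs the mobile terminal in spoken language to accomplish the task the user needs. This speech recognition process allows the user to give instructions to voice assistant software freely in natural language; after the relevant component of the mobile terminal receives the instruction, the voice assistant software performs speech recognition and semantic analysis locally and/or on a cloud server, and gives feedback according to the results of the recognition and analysis.
However, existing speech recognition technology, and semantic analysis in particular, is imperfect and its recognition accuracy is low. The error rate in recognizing and analyzing multiple words, long sentences, and multiple sentences is especially high, and the results often differ widely from what the user actually needs. The user has to repeat the input and keep correcting the results, which seriously impairs the accuracy and speed of speech recognition on mobile terminals.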
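Summary
To solve the above technical problem, the embodiments of the present application provide a speech recognition method for a mobile terminal and a corresponding apparatus, so as to improve the accuracy and speed of speech recognition on mobile terminals.
The speech recognition method for a mobile terminal provided by the present application includes:
receiving a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal;
receiving voice keyword information, determining a voice keyword from the voice keyword information, searching the keyword library under the operation category to be operated according to the voice keyword, and returning the search result.
Preferably, receiving the trigger message for the operation category to be operated on the mobile terminal specifically includes:
presenting an operation category window on the screen of the mobile terminal, and, when the label corresponding to one operation category in the window is clicked or determined to be the focus, determining that the trigger message for the operation category to be operated on the mobile terminal has been received.
Further preferably, the labels corresponding to the operation categories in the operation category window include a contact label for implementing the communication service function, an application label for implementing the application service function, a music label for implementing the music playback service function, and/or a web search label for implementing the online search service function.
Preferably, receiving the trigger message for the operation category to be operated on the mobile terminal specifically includes:
determining whether the gravitational acceleration component on the Z axis monitored by a first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by a second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained; if all are yes, determining that the trigger message for the operation category to be operated on the mobile terminal has been received, the operation category being contacts; in which case receiving voice keyword information, determining a voice keyword from the voice keyword information, searching the keyword library under the operation category to be operated according to the voice keyword, and returning the search result includes:
receiving voice keyword information containing a contact, determining a contact keyword from the voice keyword information, searching the contact library according to the contact keyword, returning the retrieved contact, and calling that contact.
Further preferably, when multiple contacts are retrieved according to the contact keyword, each contact is numbered, numbered voice information is received, and the contact corresponding to the numbered voice information is called.
Preferably, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category is increased, and when the keyword library under the operation category to be operated is searched according to the voice keyword, it is searched in descending order of keyword frequency.
Preferably, after the mobile terminal is operated, the voice keyword library under the operation category is updated according to the operation result when a preset condition is met.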
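The speech recognition apparatus for a mobile terminal provided by the present application includes a trigger message receiving unit, a voice keyword information receiving unit, a voice keyword recognition unit, and a keyword library retrieval unit, where:
the trigger message receiving unit is configured to receive a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal;
the voice keyword information receiving unit is configured to receive voice keyword information;
the voice keyword recognition unit is configured to determine a voice keyword from the voice keyword information;
the keyword library retrieval unit is configured to search the keyword library under the operation category to be operated according to the voice keyword, and return the search result.
Preferably, the trigger message receiving unit specifically includes an operation category window presentation subunit and a trigger message receiving subunit, where:
the operation category window presentation subunit is configured to present an operation category window on the screen of the mobile terminal;
the trigger message receiving subunit is configured to receive the trigger message for the operation category to be operated on the mobile terminal when the label corresponding to one operation category in the operation category window is clicked or determined to be the focus.
Preferably, the trigger message receiving unit specifically includes a monitoring result judging subunit and a trigger message receiving subunit, where:
the monitoring result judging subunit is configured to determine whether the gravitational acceleration component on the Z axis monitored by a first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by a second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained;
the trigger message receiving subunit is configured to determine, when all the determination results are yes, that the trigger message for the operation category to be operated on the mobile terminal has been received, the operation category being contacts;
the voice keyword information receiving unit is specifically configured to receive voice keyword information containing a contact, the voice keyword recognition unit is specifically configured to determine a contact keyword from the voice keyword information, and the keyword retrieval unit is specifically configured to search the contact library according to the contact keyword and return the retrieved contact;
the apparatus further includes a call unit configured to call the retrieved contact.
Further preferably, the apparatus also includes a contact numbering unit and a numbered voice information receiving unit, where: the contact numbering unit is configured to number each contact when multiple contacts are retrieved according to the contact keyword; the numbered voice information receiving unit is configured to receive numbered voice information; and the call unit is specifically configured to call the contact corresponding to the numbered voice information.
Preferably, the apparatus further includes a keyword frequency increasing unit, configured to increase, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category, in which case the keyword library retrieval unit is specifically configured to search the keyword library in descending order of keyword frequency when searching the keyword library under the operation category to be operated according to the voice keyword.
Preferably, the apparatus further includes a keyword updating unit, configured to update, after the mobile terminal is operated and when a preset condition is met, the keyword library under the operation category according to the operation result.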
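After receiving a trigger message for an operation category divided according to the service functions of the mobile terminal, the embodiments of the present application receive voice keyword information, determine a voice keyword from it, then search the corresponding keyword library according to the voice keyword and return the search result. Compared with existing speech recognition technology, the embodiments divide operations into categories by service function, so that each keyword library corresponds to only one operation category. First, a search based on the voice keyword is confined to the keyword library corresponding to the operation on the mobile terminal, which reduces the number of objects to process and suits the relatively weak processing capability of a mobile terminal. Second, with fewer objects involved, the search takes less time, which improves the efficiency of speech recognition. Third, with fewer objects involved, keywords are less likely to be duplicated or ambiguous, which improves the accuracy of speech recognition. Moreover, the embodiments receive voice information in the form of voice keyword information rather than ordinary natural language, avoiding multiple words, long sentences, and multiple sentences. On the one hand, this makes it easier to extract the keyword from the voice information, which improves the efficiency of speech recognition; on the other hand, obtaining the result by matching the extracted keyword against the keyword library helps improve the accuracy of speech recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other embodiments and their drawings from the embodiments shown in these drawings.
FIG. 1 is a flowchart of an embodiment of the speech recognition method for a mobile terminal of the present application;
FIG. 2 is a structural block diagram of an embodiment of the speech recognition apparatus for a mobile terminal of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.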
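Referring to FIG. 1, the figure shows the flow of an embodiment of the speech recognition method for a mobile terminal of the present application. The flow includes:
Step S101: receive a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal;
With the development of information technology, mobile terminals no longer have only the traditional communication function but also many new service functions, such as web search, audio and video playback, and games. These service functions differ in nature, and the ways of operating and the operation instructions by which users carry out each function have their own characteristics. Even so, the various operations that implement one service function usually have something in common, so this embodiment divides the possible operations on the mobile terminal into categories in advance according to service function. Dividing operations into categories in this way gives the subsequent speech recognition process a clear target. This embodiment does not limit the number or types of operation categories, as long as the needs of the actual application are met. For example, the following categories may be defined according to the service functions of the mobile terminal itself and the range of uses of its user: a contacts category, which stores information such as contact names, telephone numbers, and personal characteristics, so that when a contact is recognized by voice the contact's information can be viewed, the contact can be called, a text message can be sent to the contact, and so on; an applications category, which records application-related information such as program names, icons, and storage locations, so that when an application is recognized by voice its basic attributes can be viewed and operations such as launching, uninstalling, deleting, and updating can be performed on it; a music category, which records related information such as song titles, artist names, and album names, so that when a piece of music is recognized by voice its basic attributes can be viewed and operations such as playing, moving, and deleting can be performed on it; and a web search category, which implements the web search function.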
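Step S102: receive voice keyword information and determine a voice keyword from the voice keyword information;
A user who wants to control or operate the mobile terminal by voice can start the speech recognition engine so that it is in a working state; when speech recognition is needed, the voice keyword information is received through the engine. The voice information received in this embodiment is speech content centered on a keyword and need not be ordinary natural language with a complete sentence meaning. For example, to call Zhang XX, the utterance in the prior art is "Call Zhang XX", whereas in this embodiment, once the operation category has been determined to be "contacts", the user can simply say "Zhang XX". That is, giving only the keyword of the operation is enough to make the mobile terminal perform the corresponding operation.
After the voice keyword information is received, the voice keyword has to be determined from it. A user's utterance usually does not consist of exactly the voice keyword and nothing else; it may include transitional sounds, interjections, and the like. For speech recognition these sounds are noise and must be removed from the voice keyword information so that the voice keyword can be extracted. The voice keyword corresponds directly to a keyword in the keyword library, and thus to an operation command.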
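
The removal of transitional sounds and interjections can be pictured with a minimal sketch. It assumes the utterance has already been transcribed into tokens, which the description does not specify; the filler list and the function name are hypothetical and stand in for whatever noise model a real engine would use.

```python
from typing import List

# Hypothetical filler tokens; a real engine would work on audio, not text.
FILLERS = {"um", "uh", "er", "well"}


def extract_keyword(tokens: List[str]) -> str:
    """Drop filler tokens and join what is left into the voice keyword."""
    kept = [token for token in tokens if token.lower() not in FILLERS]
    return " ".join(kept)
```

For instance, `extract_keyword(["um", "Zhang", "XX"])` leaves only the tokens that can be matched against the keyword library.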
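Step S103: search the keyword library under the operation category to be operated according to the voice keyword, and return the search result;
Once the voice keyword has been determined in the preceding steps, it is used to search the keyword library corresponding to the operation category to be operated, and the search result is returned. After the search result is obtained, it can trigger the corresponding operation on the mobile terminal.
It should be noted that steps S101 and S102 of this embodiment may in practice run in parallel, or step S102 may precede step S101. That is, the user of the mobile terminal may first trigger the operation category to be operated, as described above, after which the voice keyword input by the user is received; alternatively, the user's voice keyword may be received first and the trigger for the operation category afterwards, or the voice keyword information may be received at the same time as the trigger. The order of execution between the two does not affect achievement of the purpose of the present application, and a suitable manner may be chosen according to the needs of the application.
After receiving a trigger message for an operation category divided according to the service functions of the mobile terminal, this embodiment receives voice keyword information, determines a voice keyword from it, then searches the corresponding keyword library according to the voice keyword and returns the search result. Compared with existing speech recognition technology, the embodiments of the present application can achieve the following technical effects: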
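(1) Because operations are divided into categories by service function, each keyword library corresponds to only one operation category. This differs from the single, all-inclusive speech recognition library used by existing speech recognition, which contains operations of every nature and manner. A search based on the voice keyword is therefore confined to the keyword library corresponding to the operation about to be performed on the mobile terminal, which reduces the number of objects to process and suits the relatively weak processing capability of a mobile terminal. For example, suppose an existing speech recognition library contains 100 voice operation instructions. This embodiment divides those 100 instructions into categories and places the instructions that implement the "contacts" function in one category containing 10 instructions. When the user only needs the contacts function, a trigger causes the search and recognition to take place within that category, that is, only among those 10 instructions, so the amount of processing is greatly reduced.
(2) Because fewer objects are involved in the search, with the processing capability of the mobile terminal unchanged the search completes in far less time, and the search result corresponding to the voice keyword input by the user can be given quickly, which improves the efficiency of speech recognition. Continuing the previous example, assume that checking each voice operation instruction takes 0.01 s and that the word spoken by the user sits at position 80. With the existing approach, the instruction is found only after 80 comparisons in the library of 100 instructions, taking 0.8 s. If the search is confined to the 10 instructions that implement the contacts function, the maximum time is only 0.1 s. The search time is thus greatly shortened, which improves the efficiency of speech recognition.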
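
The arithmetic in this example can be reproduced with a small sequential-scan sketch. The 0.01 s cost per comparison comes from the text; the instruction names and the assumption that the 10 contacts instructions occupy positions 71 to 80 of the full library are hypothetical and chosen only so that the target is the last entry of the scoped library (the worst case).

```python
from typing import List, Optional

SECONDS_PER_COMPARISON = 0.01  # figure taken from the example in the text


def comparisons_to_find(library: List[str], target: str) -> Optional[int]:
    """Count the entries a sequential scan examines before it hits target."""
    for count, entry in enumerate(library, start=1):
        if entry == target:
            return count
    return None


# Hypothetical library of 100 instructions; entries 71 to 80 are assumed
# to be the 10 "contacts" instructions.
full_library = [f"instruction-{i}" for i in range(1, 101)]
contacts_only = full_library[70:80]

full_cost = comparisons_to_find(full_library, "instruction-80")
scoped_cost = comparisons_to_find(contacts_only, "instruction-80")
print(full_cost * SECONDS_PER_COMPARISON, scoped_cost * SECONDS_PER_COMPARISON)
```

The scan over the full library needs 80 comparisons, while the scoped scan can never need more than 10, which is where the 0.8 s and 0.1 s figures come from.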
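(3) Because fewer objects are involved in the search, keywords are less likely to be duplicated or ambiguous, which improves the accuracy of speech recognition. For example, suppose the user says "Zhang XX". Among the 100 voice operation instructions above, two entries for "Zhang XX" may be found: one is the name of a contact stored on the mobile terminal, the other the name of a singer in the user's music library. The spoken word is thus duplicated and ambiguous, and the system cannot tell whether the user wants to call the "Zhang XX" in the phone book or listen to songs by the "Zhang XX" in the music library. If the former is chosen by default, the user may actually have wanted the latter; if the latter is chosen by default, the user may actually have wanted the former. In this embodiment, however, the user specifies the operation category beforehand. If the specified category is "contacts", saying "Zhang XX" means the user wants to call Zhang XX; if it is "music", saying "Zhang XX" means the user wants to hear Zhang XX's songs. Speech recognition can therefore be performed accurately.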
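
A short sketch makes the ambiguity concrete. The library contents, the command strings, and both function names are assumptions introduced for illustration; the point is only that an unscoped lookup returns two candidates for the same keyword while a category-scoped lookup returns one.

```python
from typing import Dict, List, Optional

# Hypothetical per-category keyword libraries: keyword -> operation command.
LIBRARIES: Dict[str, Dict[str, str]] = {
    "contacts": {"Zhang XX": "call Zhang XX"},
    "music": {"Zhang XX": "play songs by Zhang XX"},
}


def search_all(keyword: str) -> List[str]:
    """Unscoped search: every category whose library contains the keyword."""
    return sorted(cat for cat, lib in LIBRARIES.items() if keyword in lib)


def search_in_category(category: str, keyword: str) -> Optional[str]:
    """Scoped search: look only in the library of the triggered category."""
    return LIBRARIES[category].get(keyword)
```

With the category fixed in advance, `search_in_category("music", "Zhang XX")` has exactly one possible answer, so no default has to be guessed.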
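(4) This embodiment receives voice information in the form of voice keyword information rather than ordinary natural language, avoiding multiple words, long sentences, and multiple sentences. On the one hand, this makes it easier to extract the keyword from the voice information, which improves the efficiency of speech recognition; on the other hand, obtaining the result by matching the extracted keyword against the keyword library helps improve the accuracy of speech recognition.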
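The foregoing embodiment mentions that a trigger message for the operation category to be operated on the mobile terminal has to be received. In practice, the trigger message can be received in many ways. For example, when the user needs to control the mobile terminal through the speech recognition engine, an operation category window is presented on the screen, showing the various operation category labels. The labels may include a contact label for the communication service function, an application label for the application service function, a music label for the music playback service function, a web search label for the online search service function, and so on. When the user clicks one of these labels, or the focus moves to one of them, a trigger event (trigger message) is generated in the system, and detecting that event is regarded as receiving the trigger message for the operation category. As another example, when the user has enabled automatic application updates and a new version of an application appears on the network, the mobile terminal receives an update notification. Receiving the notification can be treated as the trigger message for the "applications" category, and the user's voice instruction can then be received to update or not update the application.
Besides treating a touch event or network event as the trigger message in this way, whether a trigger message has been received can also be determined from certain habitual actions of the user with the mobile terminal. A common one is the user raising the phone to the ear, which indicates that the user wants to call a contact; in this case the trigger message for the "contacts" category can be considered received. The specific process of this trigger mode is as follows:
When the speech recognition engine is initialized, it obtains the system's sensor service and registers a listener for the gravity sensor and a listener for the distance sensor. The gravity sensor provides the components of gravitational acceleration in three dimensions (x, y, z). When the phone lies flat, the gravitational acceleration along the z axis tends toward 9.8 while the x and y components tend toward 0. The voice assistant application therefore monitors the values returned by the gravity sensor in real time. When the phone is held flat or slightly tilted (that is, when the user holds the phone normally in the hand), the z component tends toward 7. If at the same time the distance sensor returns a non-zero value (that is, nothing is blocking the distance sensor), both conditions are met, the whole flow is initialized, and the initialization time is recorded. While the user is bringing the phone to the ear, the distance sensor keeps returning a non-zero value (no obstruction), and the state is working.
When the user places the phone at the ear, the z component tends toward 2 (note that any value from 0 to 4 gravitational acceleration units serves the purpose of the present application), and the sum of the absolute values of the x and y components tends toward 7 (this value may range from 4 to 10). Considering that the phone is tilted about the x axis when held to the ear, the absolute value of the x component should be greater than 2. When these conditions are met and the system is in the working state, the system state is set to WAIT_PROXI, in which the system waits for the distance sensor to return 0 (the face covering the distance sensor). As soon as 0 is returned, the program starts the operation of dialing a contact. If, before the distance sensor returns 0, the whole process from initialization to WAIT_PROXI takes more than 2 seconds, the gesture recognition is judged to have failed.
Once the contact dialing function has started, the user can simply say the contact's name, and the system reads the matching contacts from the phone's contact list according to the recognition result. If several contacts match, the system prompts the user by voice, for example "1. Chen XX. 2. Liu XX", and the user only needs to say "1" or "2" to choose to call Chen XX or Liu XX. After the user chooses, the system tells the user that it is dialing and calls the selected contact directly. If only one contact matches, the system directly tells the user that it is dialing and places the call.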
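
The raise-to-ear gesture described above behaves like a small state machine, which the following sketch simulates on plain sensor samples. It is not tied to any platform sensor API. The thresholds 0 to 4, 4 to 10, greater than 2, and the 2-second limit come from the description; the 5 to 10 band used for "z tends toward 7", the extra idle, dial, and failed state names, and the class and field names are assumptions.

```python
from dataclasses import dataclass

IDLE, WORKING, WAIT_PROXI, DIAL, FAILED = (
    "idle", "working", "WAIT_PROXI", "dial", "failed")
TIMEOUT_S = 2.0  # the description allows 2 seconds for the whole gesture


@dataclass
class Sample:
    t: float          # timestamp in seconds
    x: float          # gravity components reported by the gravity listener
    y: float
    z: float
    proximity: float  # 0 means the distance sensor is covered


class RaiseToEarDetector:
    def __init__(self) -> None:
        self.state = IDLE
        self.started_at = 0.0

    def feed(self, s: Sample) -> str:
        if self.state in (DIAL, FAILED):
            return self.state
        if self.state == IDLE:
            # Assumed band for "z tends toward 7" with nothing covering the sensor.
            if 5.0 <= s.z <= 10.0 and s.proximity != 0:
                self.state, self.started_at = WORKING, s.t
            return self.state
        if s.t - self.started_at > TIMEOUT_S:
            self.state = FAILED
            return self.state
        if self.state == WORKING:
            at_ear = (0.0 <= s.z <= 4.0
                      and 4.0 <= abs(s.x) + abs(s.y) <= 10.0
                      and abs(s.x) > 2.0)
            if at_ear:
                self.state = WAIT_PROXI
        if self.state == WAIT_PROXI and s.proximity == 0:
            self.state = DIAL  # face covers the sensor: start dialing
        return self.state
```

Feeding the detector a flat-hold sample, then an at-ear sample, then a covered-sensor sample within 2 seconds ends in the dial state; if the samples arrive too late, the detector reports failure instead.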
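The above embodiment does not limit how the keyword library under the operation category is searched once the voice keyword has been obtained, and this does not affect achievement of the purpose of the present application. However, over long-term use of the speech recognition function a given user inevitably forms regular habits, and these habits can be applied to the search of the keyword library. For example, when a certain operation is performed on the mobile terminal often, the user evidently needs that operation frequently. A counter can then be set to record the total number of times (the frequency) the operation has been performed, and the total is stored as an attribute of the keyword corresponding to the operation in the keyword library. When a search is performed based on the voice keyword, the keyword library is searched in descending order of keyword frequency. Because the user performs the operation often, its frequency is high and its keyword sits near the front of the keyword library, so searching from the largest frequency to the smallest obtains the result relatively quickly.
In addition, after the mobile terminal is operated, the voice keyword library under the operation category may be updated according to the operation result when a preset condition is met. For example, when the user adds a person to the contact list, the voice keyword library needs to be updated, with the added contact entered into the keyword library as a keyword. The update may take place at the moment each contact is added, or each time the phone is restarted; this can be set according to actual conditions, and the update operation is triggered when the preset condition is met.
The foregoing describes in detail the method embodiment of speech recognition for a mobile terminal of the present application. Correspondingly, the present application also provides an apparatus embodiment of speech recognition for a mobile terminal. Referring to FIG. 2, the figure shows a structural block diagram of the speech recognition apparatus for a mobile terminal of the present application. The apparatus includes a trigger message receiving unit 201, a voice keyword information receiving unit 202, a voice keyword recognition unit 203, and a keyword library retrieval unit 204, where:
the trigger message receiving unit 201 is configured to receive a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal;
the voice keyword information receiving unit 202 is configured to receive voice keyword information;
the voice keyword recognition unit 203 is configured to determine a voice keyword from the voice keyword information;
the keyword library retrieval unit 204 is configured to search the keyword library under the operation category to be operated according to the voice keyword, and return the search result.
The working process of the above apparatus embodiment is as follows: the trigger message receiving unit 201 receives the trigger message for the operation category to be operated on the mobile terminal; the voice keyword information receiving unit 202 receives the voice keyword information, and the voice keyword recognition unit 203 determines the voice keyword from it; the keyword library retrieval unit 204 then searches the keyword library under the operation category to be operated according to the voice keyword, and returns the search result.
After receiving a trigger message for an operation category divided according to the service functions of the mobile terminal, this apparatus embodiment receives voice keyword information, determines a voice keyword from it, then searches the corresponding keyword library according to the voice keyword and returns the search result. Compared with existing speech recognition technology, this apparatus embodiment divides operations into categories by service function, so that each keyword library corresponds to only one operation category. First, a search based on the voice keyword is confined to the keyword library corresponding to the operation on the mobile terminal, which reduces the number of objects to process and suits the relatively weak processing capability of a mobile terminal. Second, with fewer objects involved, the search takes less time, which improves the efficiency of speech recognition. Third, with fewer objects involved, keywords are less likely to be duplicated or ambiguous, which improves the accuracy of speech recognition. Moreover, this apparatus embodiment receives voice information in the form of voice keyword information rather than ordinary natural language, avoiding multiple words, long sentences, and multiple sentences. On the one hand, this makes it easier to extract the keyword from the voice information, which improves the efficiency of speech recognition; on the other hand, obtaining the result by matching the extracted keyword against the keyword library helps improve the accuracy of speech recognition.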
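In practice there are many ways of triggering an operation category, and the specific structure of the trigger message receiving unit may differ between them. Two ways are given below, from which a person skilled in the art can infer other implementations:
Method 1: receipt of the operation category trigger message is determined by popping up a window and receiving the user's click or focus movement. In this mode, the trigger message receiving unit 201 may include an operation category window presentation subunit 2011 and a trigger message receiving subunit 2012, where:
the operation category window presentation subunit 2011 is configured to present an operation category window on the screen of the mobile terminal;
the trigger message receiving subunit 2012 is configured to receive the trigger message for the operation category to be operated on the mobile terminal when the label corresponding to one operation category in the operation category window is clicked or determined to be the focus.
Method 2: receipt of the operation category trigger message is confirmed by identifying the user's action through a sensor. In this mode, the trigger message receiving unit specifically includes a monitoring result judging subunit and a trigger message receiving subunit, where:
the monitoring result judging subunit is configured to determine whether the gravitational acceleration component on the Z axis monitored by the first listener is 2, whether the gravitational acceleration component on the X and Y axes is 7, and whether the distance monitored by the second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained;
the trigger message receiving subunit is configured to determine, when all the determination results are yes, that the trigger message for the operation category to be operated on the mobile terminal has been received, the operation category being contacts.
In the second mode, the other functional units change accordingly: the voice keyword information receiving unit is specifically configured to receive voice keyword information containing a contact, the voice keyword recognition unit is specifically configured to determine a contact keyword from the voice keyword information, and the keyword retrieval unit is specifically configured to search the contact library according to the contact keyword and return the retrieved contact. The above apparatus embodiment further includes a call unit configured to call the retrieved contact. Further, the apparatus embodiment includes a contact numbering unit and a numbered voice information receiving unit, where: the contact numbering unit is configured to number each contact when multiple contacts are retrieved according to the contact keyword; the numbered voice information receiving unit is configured to receive numbered voice information; and the call unit is specifically configured to call the contact corresponding to the numbered voice information.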
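
How the contact numbering unit, the numbered voice information receiving unit, and the call unit fit together can be sketched as below. The function names are hypothetical, and the spoken number is assumed to arrive already recognized as a digit string.

```python
from typing import Dict, List, Optional


def number_contacts(matches: List[str]) -> Dict[str, str]:
    """Contact numbering unit: assign 1-based numbers to the matches."""
    return {str(i): name for i, name in enumerate(matches, start=1)}


def choose_contact(matches: List[str],
                   spoken_number: Optional[str] = None) -> Optional[str]:
    """Pick the contact to call, or None if a choice is still needed."""
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]  # a single match is dialed directly
    return number_contacts(matches).get(spoken_number)
```

A `None` result for several matches simply means the system still has to prompt the user for a number before the call unit can act.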
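In addition, certain modifications or equivalent substitutions can be made to the above apparatus embodiment according to practical needs to obtain a better technical effect. For example, the apparatus embodiment further includes a keyword frequency increasing unit, configured to increase, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category, in which case the keyword library retrieval unit is specifically configured to search the keyword library in descending order of keyword frequency when searching the keyword library under the operation category to be operated according to the voice keyword. Adding this unit can increase search speed. As another example, the apparatus embodiment may further include a keyword updating unit 205, configured to update, after the mobile terminal is operated and when a preset condition is met, the keyword library under the operation category according to the operation result.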
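
The effect of the keyword frequency increasing unit and the keyword updating unit can be illustrated with a minimal sketch. The class and method names are hypothetical, the library is modeled as a plain dictionary from keyword to frequency, and the preset update condition is left to the caller.

```python
from typing import Dict, Iterable, List, Optional


class KeywordLibrary:
    """Keyword library of one operation category with per-keyword frequency."""

    def __init__(self, keywords: Iterable[str]) -> None:
        self.freq: Dict[str, int] = {k: 0 for k in keywords}

    def record_operation(self, keyword: str) -> None:
        # Keyword frequency increasing unit: count each performed operation.
        self.freq[keyword] = self.freq.get(keyword, 0) + 1

    def add_keyword(self, keyword: str) -> None:
        # Keyword updating unit: e.g. a newly added contact becomes a keyword.
        self.freq.setdefault(keyword, 0)

    def search_order(self) -> List[str]:
        # Descending frequency; ties keep insertion order (sorted is stable).
        return sorted(self.freq, key=lambda k: -self.freq[k])

    def search(self, keyword: str) -> Optional[int]:
        """Return how many entries were examined before the match."""
        for examined, entry in enumerate(self.search_order(), start=1):
            if entry == keyword:
                return examined
        return None
```

A keyword that starts at the back of the library moves to the front after it has been used more often than the others, so later searches for it examine fewer entries.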
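It should be noted that, for brevity, the above embodiments and their variants focus on what distinguishes each from the other embodiments or variants; for parts that are the same or similar, the cases may be referred to one another. In particular, the improvements to the apparatus embodiment are described briefly because they are essentially similar to the method embodiment; for related details, see the corresponding description of the method embodiment. The units of the apparatus embodiment described above may or may not be physically separate; they may be located in one place or distributed across multiple network environments. In practice, some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which a person of ordinary skill in the art can understand and implement without creative effort.
The above are only specific implementations of the present application. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principle of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.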

Claims

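1. A speech recognition method for a mobile terminal, characterized in that the method comprises:
receiving a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal; receiving voice keyword information and determining a voice keyword from the voice keyword information;
searching the keyword library under the operation category to be operated according to the voice keyword, and returning the search result.
2. The method according to claim 1, characterized in that receiving the trigger message for the operation category to be operated on the mobile terminal specifically comprises:
presenting an operation category window on the screen of the mobile terminal, and, when the label corresponding to one operation category in the operation category window is clicked or determined to be the focus, determining that the trigger message for the operation category to be operated on the mobile terminal has been received.
3. The method according to claim 2, characterized in that the labels corresponding to the operation categories in the operation category window comprise a contact label for implementing the communication service function, an application label for implementing the application service function, a music label for implementing the music playback service function, and/or a web search label for implementing the online search service function.
4. The method according to claim 1, characterized in that receiving the trigger message for the operation category to be operated on the mobile terminal specifically comprises:
determining whether the gravitational acceleration component on the Z axis monitored by a first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by a second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained; if all are yes, determining that the trigger message for the operation category to be operated on the mobile terminal has been received, the operation category being contacts; in which case receiving voice keyword information, determining a voice keyword from the voice keyword information, searching the keyword library under the operation category to be operated according to the voice keyword, and returning the search result comprises:
receiving voice keyword information containing a contact, determining a contact keyword from the voice keyword information, searching the contact library according to the contact keyword, returning the retrieved contact, and calling that contact.
5. The method according to claim 4, characterized in that, when multiple contacts are retrieved according to the contact keyword, each contact is numbered, numbered voice information is received, and the contact corresponding to the numbered voice information is called.
6. The method according to claim 1, characterized in that, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category is increased, and when the keyword library under the operation category to be operated is searched according to the voice keyword, the keyword library is searched in descending order of keyword frequency.
7. The method according to claim 1, characterized in that, after the mobile terminal is operated, the voice keyword library under the operation category is updated according to the operation result when a preset condition is met.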
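8. A speech recognition apparatus for a mobile terminal, characterized in that the apparatus comprises a trigger message receiving unit, a voice keyword information receiving unit, a voice keyword recognition unit, and a keyword library retrieval unit, where:
the trigger message receiving unit is configured to receive a trigger message for an operation category to be operated on the mobile terminal, the operation category being a category divided according to the service functions of the mobile terminal;
the voice keyword information receiving unit is configured to receive voice keyword information;
the voice keyword recognition unit is configured to determine a voice keyword from the voice keyword information;
the keyword library retrieval unit is configured to search the keyword library under the operation category to be operated according to the voice keyword, and return the search result.
9. The apparatus according to claim 8, characterized in that the trigger message receiving unit specifically comprises an operation category window presentation subunit and a trigger message receiving subunit, where:
the operation category window presentation subunit is configured to present an operation category window on the screen of the mobile terminal;
the trigger message receiving subunit is configured to receive the trigger message for the operation category to be operated on the mobile terminal when the label corresponding to one operation category in the operation category window is clicked or determined to be the focus.
10. The apparatus according to claim 8, characterized in that the trigger message receiving unit specifically comprises a monitoring result judging subunit and a trigger message receiving subunit, where:
the monitoring result judging subunit is configured to determine whether the gravitational acceleration component on the Z axis monitored by a first listener is within the range of 0 to 4 gravitational acceleration units, whether the gravitational acceleration component on the X and Y axes is within the range of 4 to 10 gravitational acceleration units, and whether the distance monitored by a second listener is zero, where the X and Y axes lie in the plane of the mobile terminal panel, the Z axis is perpendicular to the plane formed by the X and Y axes, the first listener is a listener for the gravity sensor registered after the sensor service is obtained, and the second listener is a listener for the distance sensor registered after the sensor service is obtained;
the trigger message receiving subunit is configured to determine, when all the determination results are yes, that the trigger message for the operation category to be operated on the mobile terminal has been received, the operation category being contacts;
the voice keyword information receiving unit is specifically configured to receive voice keyword information containing a contact, the voice keyword recognition unit is specifically configured to determine a contact keyword from the voice keyword information, and the keyword retrieval unit is specifically configured to search the contact library according to the contact keyword and return the retrieved contact;
the apparatus further comprises a call unit configured to call the retrieved contact.
11. The apparatus according to claim 10, characterized in that the apparatus further comprises a contact numbering unit and a numbered voice information receiving unit, where: the contact numbering unit is configured to number each contact when multiple contacts are retrieved according to the contact keyword; the numbered voice information receiving unit is configured to receive numbered voice information; and the call unit is specifically configured to call the contact corresponding to the numbered voice information.
12. The apparatus according to claim 8, characterized in that the apparatus further comprises a keyword frequency increasing unit, configured to increase, after the mobile terminal is operated, the frequency of the keyword corresponding to the operation in the keyword library under its operation category, in which case the keyword library retrieval unit is specifically configured to search the keyword library in descending order of keyword frequency when searching the keyword library under the operation category to be operated according to the voice keyword.
13. The apparatus according to claim 8, characterized in that the apparatus further comprises a keyword updating unit, configured to update, after the mobile terminal is operated and when a preset condition is met, the keyword library under the operation category according to the operation result.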
PCT/CN2014/076180 2013-05-02 2014-04-25 Voice recognition method for mobile terminal and device thereof WO2014177015A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/787,926 US9502035B2 (en) 2013-05-02 2014-04-25 Voice recognition method for mobile terminal and device thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310157943.0A CN103280217B (zh) 2013-05-02 Voice recognition method for mobile terminal and device thereof
CN201310157943.0 2013-05-02

Publications (1)

Publication Number Publication Date
WO2014177015A1 true WO2014177015A1 (zh) 2014-11-06

Family

ID=49062712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/076180 WO2014177015A1 (zh) 2013-05-02 2014-04-25 Voice recognition method for mobile terminal and device thereof

Country Status (3)

Country Link
US (1) US9502035B2 (zh)
CN (1) CN103280217B (zh)
WO (1) WO2014177015A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106020763A (zh) * 2015-03-26 2016-10-12 三星电子株式会社 用于提供内容的方法和电子设备
CN110561453A (zh) * 2019-09-16 2019-12-13 北京觅机科技有限公司 一种绘本机器人的引导式陪读方法

Families Citing this family (124)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US20120311585A1 (en) 2011-06-03 2012-12-06 Apple Inc. Organizing task items that represent tasks to perform
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
CN110442699A (zh) 2013-06-09 2019-11-12 苹果公司 操作数字助理的方法、计算机可读介质、电子设备和系统
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
CN103455642B (zh) * 2013-10-10 2017-03-08 三星电子(中国)研发中心 一种多媒体文件检索的方法和装置
CN103578474B (zh) * 2013-10-25 2017-09-12 小米科技有限责任公司 一种语音控制方法、装置和设备
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
EP3480811A1 (en) 2014-05-30 2019-05-08 Apple Inc. Multi-command single utterance input method
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
CN105407316B (zh) * 2014-08-19 2019-05-31 北京奇虎科技有限公司 智能摄像系统的实现方法、智能摄像系统和网络摄像头
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
CN105991827A (zh) * 2015-02-11 2016-10-05 中兴通讯股份有限公司 呼叫处理方法及装置
US10152299B2 (en) 2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
CN106328129B (zh) * 2015-06-18 2020-11-27 中兴通讯股份有限公司 指令处理方法及装置
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
CN105161099B (zh) * 2015-08-12 2019-11-26 恬家(上海)信息科技有限公司 一种语音控制的遥控装置及其实现方法
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
CN105426357A (zh) * 2015-11-06 2016-03-23 武汉卡比特信息有限公司 语音快速选择方法
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
CN105450822A (zh) * 2015-11-11 2016-03-30 百度在线网络技术(北京)有限公司 智能语音交互方法和装置
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
CN107025046A (zh) * 2016-01-29 2017-08-08 阿里巴巴集团控股有限公司 终端应用语音操作方法及系统
CN106098066B (zh) * 2016-06-02 2020-01-17 深圳市智物联网络有限公司 语音识别方法及装置
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
CN107799115A (zh) * 2016-08-29 2018-03-13 法乐第(北京)网络科技有限公司 一种语音识别方法及装置
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
CN106683669A (zh) * 2016-11-23 2017-05-17 河池学院 一种机器人语音控制系统
CN106603826A (zh) * 2016-11-29 2017-04-26 维沃移动通信有限公司 一种应用事件的处理方法及移动终端
CN106844484B (zh) * 2016-12-23 2020-08-28 北京安云世纪科技有限公司 信息搜索方法、装置及移动终端
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
KR102398390B1 (ko) * 2017-03-22 2022-05-16 삼성전자주식회사 전자 장치 및 전자 장치의 제어 방법
KR102068182B1 (ko) * 2017-04-21 2020-01-20 엘지전자 주식회사 음성 인식 장치, 및 음성 인식 시스템
CN107038052A (zh) * 2017-04-28 2017-08-11 陈银芳 语音卸载文件的方法及终端
CN108874797B (zh) * 2017-05-08 2020-07-03 北京字节跳动网络技术有限公司 语音处理方法和装置
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. USER INTERFACE FOR CORRECTING RECOGNITION ERRORS
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK201770428A1 (en) 2017-05-12 2019-02-18 Apple Inc. LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US20180336275A1 (en) 2017-05-16 2018-11-22 Apple Inc. Intelligent automated assistant for media exploration
DK179549B1 (en) 2017-05-16 2019-02-12 Apple Inc. FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES
CN107564517A (zh) * 2017-07-05 2018-01-09 百度在线网络技术(北京)有限公司 语音唤醒方法、设备及系统、云端服务器与可读介质
CN107731231B (zh) * 2017-09-15 2020-08-14 瑞芯微电子股份有限公司 一种支持多云端语音服务的方法及一种存储设备
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
CN108665900B (zh) 2018-04-23 2020-03-03 百度在线网络技术(北京)有限公司 云端唤醒方法及系统、终端以及计算机可读存储介质
US10674427B2 (en) * 2018-05-01 2020-06-02 GM Global Technology Operations LLC System and method to select and operate a mobile device through a telematics unit
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS
DK179822B1 (da) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
US11076039B2 (en) 2018-06-03 2021-07-27 Apple Inc. Accelerated task performance
CN109120774A (zh) * 2018-06-29 2019-01-01 深圳市九洲电器有限公司 终端应用语音操控方法及系统
CN108962261A (zh) * 2018-08-08 2018-12-07 联想(北京)有限公司 信息处理方法、信息处理装置和蓝牙耳机
CN108984800B (zh) * 2018-08-22 2020-10-16 广东小天才科技有限公司 一种语音搜题方法及终端设备
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
CN110970032A (zh) * 2018-09-28 2020-04-07 深圳市冠旭电子股份有限公司 一种音箱语音交互控制的方法及装置
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
KR102590914B1 (ko) * 2018-12-14 2023-10-19 삼성전자주식회사 전자 장치 및 이의 제어 방법
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
CN109918040B (zh) * 2019-03-15 2022-08-16 阿波罗智联(北京)科技有限公司 语音指令分发方法和装置、电子设备及计算机可读介质
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
DK201970510A1 (en) 2019-05-31 2021-02-11 Apple Inc Voice identification in digital assistant systems
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
WO2021056255A1 (en) 2019-09-25 2021-04-01 Apple Inc. Text detection using global geometry estimators
US11615790B1 (en) * 2019-09-30 2023-03-28 Amazon Technologies, Inc. Disambiguating contacts using relationship data
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11043220B1 (en) 2020-05-11 2021-06-22 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11917092B2 (en) * 2020-06-04 2024-02-27 Syntiant Systems and methods for detecting voice commands to generate a peer-to-peer communication link
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
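CN112199033B (zh) * 2020-09-30 2023-06-20 北京搜狗科技发展有限公司 Voice input method, apparatus, and electronic device
CN113838467B (zh) * 2021-08-02 2023-11-14 北京百度网讯科技有限公司 Voice processing method, apparatus, and electronic device
CN115659302B (zh) * 2022-09-22 2023-07-14 北京睿家科技有限公司 Method and apparatus for determining persons missed in inspection, electronic device, and storage medium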

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
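CN1801846A (zh) * 2004-12-30 2006-07-12 中国科学院自动化研究所 Method for headset-based, fully voice-driven mobile phone dialing interaction
US20090273682A1 (en) * 2008-04-06 2009-11-05 Shekarri Nache D Systems And Methods For A Recorder User Interface
CN101601259A (zh) * 2007-01-12 2009-12-09 松下电器产业株式会社 Method for controlling the voice recognition function of a portable terminal, and wireless communication system
CN101855521A (zh) * 2007-11-12 2010-10-06 大众汽车有限公司 Multimodal user interface of a driver assistance system for inputting and presenting information
CN102663016A (zh) * 2012-03-21 2012-09-12 上海汉翔信息技术有限公司 System and method for extending input information via an input candidate box on an electronic device
CN102915733A (zh) * 2011-11-17 2013-02-06 微软公司 Interactive speech recognition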

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6449496B1 (en) * 1999-02-08 2002-09-10 Qualcomm Incorporated Voice recognition user interface for telephone handsets
US6741963B1 (en) * 2000-06-21 2004-05-25 International Business Machines Corporation Method of managing a speech cache
US7246063B2 (en) * 2002-02-15 2007-07-17 Sap Aktiengesellschaft Adapting a user interface for voice control
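JP2004341033A (ja) * 2003-05-13 2004-12-02 Matsushita Electric Ind Co Ltd Voice-mediated activation device and method thereof
KR20050028150A (ko) * 2003-09-17 2005-03-22 삼성전자주식회사 Portable terminal providing a user interface using voice signals, and method thereof
CN101853253A (zh) * 2009-03-30 2010-10-06 三星电子株式会社 Apparatus and method for managing multimedia content in a mobile terminal
CN103020069A (zh) * 2011-09-22 2013-04-03 联想(北京)有限公司 Method and apparatus for searching data, and electronic device
CN102591932A (zh) * 2011-12-23 2012-07-18 优视科技有限公司 Voice search method and system, mobile terminal, and relay server
CN103077176A (zh) * 2012-01-13 2013-05-01 北京飞漫软件技术有限公司 Method for quick search by keyword type in a browser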

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106020763A (zh) * 2015-03-26 2016-10-12 三星电子株式会社 Method and electronic device for providing content
US10049662B2 (en) 2015-03-26 2018-08-14 Samsung Electronics Co., Ltd. Method and electronic device for providing content
CN106020763B (zh) * 2015-03-26 2019-03-15 三星电子株式会社 Method and electronic device for providing content
CN110561453A (zh) * 2019-09-16 2019-12-13 北京觅机科技有限公司 Guided reading-companion method for a picture-book robot

Also Published As

Publication number Publication date
US20160098991A1 (en) 2016-04-07
US9502035B2 (en) 2016-11-22
CN103280217B (zh) 2016-05-04
CN103280217A (zh) 2013-09-04

Similar Documents

Publication Publication Date Title
WO2014177015A1 (zh) Voice recognition method for a mobile terminal and device thereof
US11321116B2 (en) Systems and methods for integrating third party services with a digital assistant
US20230206940A1 (en) Method of and system for real time feedback in an incremental speech input interface
US20210224032A1 (en) Virtual assistant for media playback
US9697829B1 (en) Evaluating pronouns in context
US9502031B2 (en) Method for supporting dynamic grammars in WFST-based ASR
US9502032B2 (en) Dynamically biasing language models
US8745025B2 (en) Methods and apparatus for searching the Internet
AU2012227294B2 (en) Speech recognition repair using contextual information
US10698654B2 (en) Ranking and boosting relevant distributable digital assistant operations
US20120259636A1 (en) Method and apparatus for processing spoken search queries
US20120060113A1 (en) Methods and apparatus for displaying content
US20120059658A1 (en) Methods and apparatus for performing an internet search
US11575624B2 (en) Contextual feedback, with expiration indicator, to a natural understanding system in a chat bot
US20120059814A1 (en) Methods and apparatus for selecting a search engine to which to provide a search query
CN106663113B (zh) Saving and retrieving locations of objects
US20200380076A1 (en) Contextual feedback to a natural understanding system in a chat bot using a knowledge model
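CN112334979A (zh) Detecting a continuing conversation with a computing device
CN112262382A (zh) Annotation and retrieval of contextual deep bookmarks
WO2017028635A1 (zh) Information processing system and method, electronic device, and computer storage medium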
US11477140B2 (en) Contextual feedback to a natural understanding system in a chat bot
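CN117099077A (zh) Voice-assistant-enabled client application with user view context and multi-modal input support
TW202240461A (zh) Text editing using voice and gesture inputs for assistant systems
CN115130478A (zh) Intent decision method and device, and computer-readable storage medium
KR20190134929A (ko) Method for storing bookmark information, by a conversational understanding AI system, to provide a keyword-based bookmark search service, and computer-readable recording medium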

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14791728; Country of ref document: EP; Kind code of ref document: A1
WWE Wipo information: entry into national phase. Ref document number: 14787926; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 14791728; Country of ref document: EP; Kind code of ref document: A1