JP2014010456A - Mobile terminal and voice recognition method thereof - Google Patents

Mobile terminal and voice recognition method thereof

Info

Publication number
JP2014010456A
Authority
JP
Japan
Prior art keywords
voice recognition
recognition data
mobile terminal
voice
speech recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2013134874A
Other languages
Japanese (ja)
Other versions
JP5956384B2 (en)
Inventor
Juhee Kim
チュヒ キム
Hyunseob Lee
ヒョンソプ リ
Jun-Yeob Lee
ジュンヨプ リ
Jungkyu Choi
チョンギュ チェ
Original Assignee
Lg Electronics Inc
エルジー エレクトロニクス インコーポレイティド
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to KR10-2012-0070353
Priority to KR1020120070353A (patent KR101961139B1)
Application filed by LG Electronics Inc (エルジー エレクトロニクス インコーポレイティド)
Publication of JP2014010456A
Application granted
Publication of JP5956384B2
Application status is Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers; Analogous equipment at exchanges
    • H04M1/26 Devices for signalling identity of wanted subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Taking into account non-speech characteristics
    • G10L2015/228 Taking into account non-speech characteristics of application context

Abstract

A mobile terminal having a voice recognition function and a voice recognition method thereof are provided.
According to an embodiment of the present invention, there is provided a voice recognition method for a mobile terminal operating in conjunction with a server, comprising: receiving a user's voice; providing the received voice to a first voice recognition engine provided in the server and a second voice recognition engine provided in the mobile terminal; obtaining first voice recognition data as a result of the first voice recognition engine recognizing the received voice; obtaining second voice recognition data as a result of the second voice recognition engine recognizing the received voice; predicting a function corresponding to the user's intention based on at least one of the first and second voice recognition data; calculating a similarity between the first and second voice recognition data when the predicted function requires personal information; and selecting one of the first and second voice recognition data based on the calculated similarity.
[Selection] Figure 4

Description

  The present invention relates to a mobile terminal, and more particularly, to a mobile terminal having a voice recognition function and a voice recognition method thereof.

  Terminals are classified into mobile terminals (mobile/portable terminals) and fixed terminals (stationary terminals) depending on whether they can be moved. Mobile terminals are further classified into handheld terminals and vehicle-mounted terminals depending on whether the user can carry them directly.

  Such terminals have been realized in the form of multimedia devices having composite functions such as capturing still images and videos, playing music and video files, playing games, and receiving broadcasts. In addition, to support and enhance the functions of such terminals, improvements to the structural part and/or the software part of the terminal have been attempted.

  As an example of such an improvement, a voice recognition function can be executed in a mobile terminal using various algorithms. Executing the voice recognition function requires a large amount of data computation and resources. Accordingly, distributed voice recognition systems that achieve an appropriate distribution of resources have been introduced.

  However, in such a distributed speech recognition system, it is required to improve the speed and accuracy of speech recognition.

  An object of the present invention is to provide a mobile terminal capable of improving the reliability of a speech recognition result.

  Another object of the present invention is to provide a mobile terminal capable of preventing leakage of personal information when executing a voice recognition function.

  In order to achieve the above object, a voice recognition method for a mobile terminal operating in conjunction with a server according to an embodiment of the present invention includes: receiving a user's voice; providing the received voice to a first voice recognition engine provided in the server and a second voice recognition engine provided in the mobile terminal; obtaining first voice recognition data as a result of the first voice recognition engine recognizing the received voice; obtaining second voice recognition data as a result of the second voice recognition engine recognizing the received voice; predicting a function corresponding to the user's intention based on at least one of the first voice recognition data and the second voice recognition data; calculating a similarity between the first voice recognition data and the second voice recognition data when the predicted function requires personal information; and selecting one of the first voice recognition data and the second voice recognition data based on the calculated similarity.

  According to an aspect, the speech recognition method may further include ignoring the second speech recognition data when personal information is not required for the predicted function.

  According to another aspect, obtaining the first voice recognition data may include transmitting a request signal for the first voice recognition data to the server, and receiving the first voice recognition data from the server as a response to the request signal.

  According to still another aspect, the voice recognition method may further include obtaining status information of the network connecting the server and the mobile terminal, and blocking reception of the first voice recognition data based on the status information of the network. The voice recognition method may further include executing the predicted function using the second voice recognition data when reception of the first voice recognition data is blocked.

  According to still another aspect, the voice recognition method may further include displaying a menu button for executing a personal information protection function, and blocking provision of the received voice to the first voice recognition engine when the personal information protection function is executed in response to a touch input on the menu button. The voice recognition method may further include executing the predicted function using the selected voice recognition data.

  According to still another aspect, the step of acquiring the second voice recognition data may include a step of recognizing the received voice based on the personal information database.

  In order to achieve the above object, a mobile terminal operating in conjunction with a server according to an embodiment of the present invention includes: a microphone that receives a user's voice; a communication unit that transmits the received voice to the server and receives first voice recognition data generated as a result of a first voice recognition engine provided in the server recognizing the received voice; a second voice recognition engine that generates second voice recognition data as a result of recognizing the received voice; and a control unit that predicts a function corresponding to the user's intention based on at least one of the first voice recognition data and the second voice recognition data, calculates a similarity between the first voice recognition data and the second voice recognition data when the predicted function requires personal information, and selects one of the first voice recognition data and the second voice recognition data based on the calculated similarity.

  According to an aspect, the control unit may ignore the second speech recognition data when personal information is not required for the predicted function.

  According to another aspect, the control unit may obtain status information of the network connecting the server and the mobile terminal and block reception of the first voice recognition data based on the status information of the network. The control unit may execute the predicted function using the second voice recognition data when reception of the first voice recognition data is blocked.

  According to another aspect, the mobile terminal may further include a display unit that displays a menu button for executing a personal information protection function. The control unit may block the transmission of the received voice to the server when the personal information protection function is executed in response to a touch input of the menu button.

  According to still another aspect, the control unit may execute the predicted function using the selected voice recognition data.

  According to still another aspect, the second speech recognition engine may recognize the received speech based on the personal information database.

  According to the present invention, the voice recognition rate of the mobile terminal can be improved by using a predetermined algorithm to select and use the recognition result determined to be more reliable from among the results of the mutually complementary remote voice recognition engine (first voice recognition engine) and local voice recognition engine (second voice recognition engine).

  Further, according to the present invention, when the function predicted in the voice recognition process requires personal information, the recognition result of the local voice recognition engine is used, so that the remote voice recognition engine can be kept from recognizing voice related to the personal information. That is, leakage of personal information can be prevented.

  Furthermore, according to the present invention, when the network condition is poor, the recognition processing of the remote voice recognition engine is ignored, which eliminates the delay caused by waiting for the recognition result from the remote voice recognition engine and thereby increases the voice recognition speed.

FIG. 1 is a block diagram illustrating a mobile terminal according to an embodiment of the present invention.
FIG. 2A is a front perspective view of a mobile terminal according to an embodiment of the present invention.
FIG. 2B is a rear perspective view of a mobile terminal according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating a speech recognition system according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a voice recognition method of a mobile terminal according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a voice recognition method of a mobile terminal related to whether voice recognition data is received, according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a voice recognition method of a mobile terminal related to whether voice recognition data is received, according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a voice recognition method of a mobile terminal related to a personal information protection function according to an embodiment of the present invention.
FIG. 8 is a conceptual diagram illustrating a user interface of a mobile terminal to which the voice recognition method of FIG. 7 is applied.
FIG. 9 is a flowchart illustrating a voice recognition method of a mobile terminal related to a user's selection of voice recognition data according to an embodiment of the present invention.
FIG. 10 is a conceptual diagram illustrating a user interface of a mobile terminal to which the voice recognition method of FIG. 9 is applied.

  Hereinafter, a mobile terminal and a voice recognition method thereof according to preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that a person having ordinary knowledge in the technical field to which the present invention belongs can easily implement them. However, the present invention is not limited to the embodiments described below and can be realized in various forms. To describe the present invention more clearly, portions not related to the description are omitted from the drawings, and the same or similar components are denoted by the same or similar reference numerals throughout the specification.

  The mobile terminals described in this specification include mobile phones, smartphones, notebook computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, and the like. However, a person having ordinary knowledge in the field will easily understand that the configuration according to the embodiments disclosed in this specification can also be applied to fixed terminals such as digital televisions and desktop computers, except where it is applicable only to mobile terminals.

  FIG. 1 is a block diagram illustrating a mobile terminal according to an embodiment of the present invention.

  As shown in FIG. 1, the mobile terminal 100 includes a wireless communication unit 110, an A/V (Audio/Video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a control unit 180, a power supply unit 190, and the like. Not all of the components shown in FIG. 1 are essential; the mobile terminal according to the present invention may be realized with more or fewer components than those shown in the figure.

  Hereinafter, components of the mobile terminal 100 will be sequentially described.

  The wireless communication unit 110 includes at least one module that enables wireless communication between the mobile terminal 100 and a wireless communication system, or between the mobile terminal 100 and a network in which the mobile terminal 100 is located. For example, the wireless communication unit 110 includes a broadcast receiving module 111, a mobile communication module 112, a wireless Internet module 113, a short-range communication module 114, a position information module 115, and the like.

  The broadcast receiving module 111 receives a broadcast signal and broadcast related information from an external broadcast management server via a broadcast channel. Broadcast related information includes information regarding broadcast channels, broadcast programs, or broadcast service providers. Note that the broadcast-related information can be provided via a mobile communication network, and in this case, can be received by the mobile communication module 112. Broadcast signals and broadcast related information received by the broadcast receiving module 111 can be stored in the memory 160.

  The mobile communication module 112 transmits and receives radio signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network. The radio signals include voice call signals, videophone call signals, or various forms of data accompanying the transmission and reception of SMS/MMS messages.

  The wireless internet module 113 is a module for wireless internet connection, and is built in or externally attached to the mobile terminal 100. As a wireless Internet technology, WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wibro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), or the like can be used.

  The near field communication module 114 is a module for near field communication. Bluetooth, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), ZigBee, etc. can be used as the short-range communication technology.

  The position information module 115 is a module for acquiring the position of the mobile terminal 100. A typical example is a GPS (Global Positioning System) module.

  The A / V input unit 120 is for inputting an audio signal or a video signal, and includes a front camera 121, a microphone 122, and the like. The front camera 121 processes an image frame such as a still image or a moving image obtained by an image sensor in a videophone mode or a shooting mode.

  The image frame processed by the front camera 121 can be displayed on the display unit 151. Also, the image frame processed by the front camera 121 can be stored in the memory 160 or transmitted to the outside by the wireless communication unit 110. Two or more front cameras 121 may be provided depending on the use environment.

  The microphone 122 processes an acoustic signal input from the outside into electrical voice data in a call mode, a recording mode, a voice selection mode, or the like. The voice data processed by the microphone 122 in the call mode can be converted into a form that can be transmitted to the mobile communication base station by the mobile communication module 112 and output. The microphone 122 implements various noise removal algorithms for removing noise generated in the process of inputting an external acoustic signal.

  The user input unit 130 generates input data for controlling the operation of the mobile terminal 100 by the user. The user input unit 130 may be configured by a keypad, a dome switch, a touch pad (static pressure / electrostatic), a jog wheel, a jog switch, or the like.

  The sensing unit 140 senses the current state of the mobile terminal 100, such as the presence or absence of user contact, the open/closed state of the mobile terminal 100, and its position, orientation, acceleration, and deceleration, and generates a sensing signal for controlling the operation of the mobile terminal 100. For example, when the mobile terminal 100 is a slide type, the sensing unit 140 may sense the open/closed state of the mobile terminal 100. In addition, the sensing unit 140 may sense whether power is supplied from the power supply unit 190, whether an external device is coupled to the interface unit 170, and the like.

  The sensing unit 140 may include a proximity sensor 141. Furthermore, the sensing unit 140 may include a touch sensor (not shown) that senses a touch operation on the display unit 151.

  The touch sensor may take the form of, for example, a touch film, a touch sheet, or a touch pad. The touch sensor may be configured to convert a change in pressure applied to a specific part of the display unit 151, or a change in capacitance generated at a specific part of the display unit 151, into an electrical input signal. The touch sensor may be configured to detect not only the touched position and area but also the pressure at the time of the touch.

  When the touch sensor and the display unit 151 have a layer structure, the display unit 151 can be used as an input device in addition to the output device. Such a display unit 151 is referred to as a “touch screen”.

  When there is a touch input on the touch screen, a corresponding signal is sent to a touch controller (not shown). The touch controller processes the signal sent from the touch sensor and sends data corresponding to the processed signal to the control unit 180. As a result, the control unit 180 can determine which area of the display unit 151 has been touched.

  The electrostatic touch screen is configured to detect the proximity of the sensing object from a change in the electric field due to the proximity of the sensing object. Such touch screens are also classified as proximity sensors 141.

  The proximity sensor 141 refers to a sensor that can detect the presence / absence of a sensing object without mechanical contact using the force of an electromagnetic field or infrared rays. The proximity sensor 141 has a longer life than a contact sensor, and its utilization is high. Examples of the proximity sensor 141 include a transmission photoelectric sensor, a direct reflection photoelectric sensor, a regression reflection photoelectric sensor, a high-frequency oscillation proximity sensor, a capacitance proximity sensor, a magnetic proximity sensor, and an infrared proximity sensor.

  In the following, for convenience of explanation, bringing a sensing object close to the touch screen without contact is referred to as a "proximity touch", and bringing a sensing object into contact with the touch screen is referred to as a "contact touch".

  The proximity sensor 141 detects the presence / absence of a proximity touch and a proximity touch pattern (for example, proximity touch distance, proximity touch direction, proximity touch speed, proximity touch time, proximity touch position, proximity touch movement state, etc.). Information on the presence / absence of the detected proximity touch and the proximity touch pattern may be output on the touch screen.

  The output unit 150 generates an output related to vision, hearing, touch, and the like. The output unit 150 may include a display unit 151, a front sound output unit 152, an alarm unit 153, and a haptic module 154.

  The display unit 151 displays (outputs) information processed by the mobile terminal 100. For example, when the mobile terminal 100 is in the call mode, the display unit 151 displays a UI (User Interface) or a GUI (Graphic User Interface) related to the call. When the mobile terminal 100 is in the videophone mode or the shooting mode, the display unit 151 displays a captured image, a received image, a UI, a GUI, or the like.

  The display unit 151 includes at least one of a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and an electronic ink display.

  At least one display (or display element) included in the display unit 151 may be configured as a transparent type or a light transmissive type so that the outside can be seen from the display (or display element). This is also called a transparent display, and a typical example of a transparent display is TOLED (Transparent OLED). The rear structure of the display unit 151 may also be configured as a light transmissive structure. With this structure, the user can see what is located behind the terminal body from the area occupied by the display unit 151 of the terminal body.

  Depending on how the mobile terminal 100 is implemented, two or more display units 151 may be provided. For example, in the mobile terminal 100, a plurality of display units may be arranged separately or integrally on one surface, or may be arranged on different surfaces.

  The front acoustic output unit 152 outputs audio data received from the wireless communication unit 110 or stored in the memory 160 in a call reception mode, a call mode, a recording mode, a voice selection mode, or a broadcast reception mode. In addition, the front sound output unit 152 outputs sound signals related to functions (for example, call signal reception sound, message reception sound, etc.) executed by the mobile terminal 100. Such a front sound output unit 152 includes a receiver, a speaker, a buzzer, and the like.

  The alarm unit 153 outputs a signal for notifying the occurrence of an event of the mobile terminal 100. Events that occur in the mobile terminal 100 include call signal reception, message reception, key signal input, and touch input. In addition to the video signal and the audio signal, the alarm unit 153 can output a signal for notifying the occurrence of the event in another form, for example, vibration. Since the video signal or the audio signal can be output by the display unit 151 or the front sound output unit 152, the display unit 151 and the front sound output unit 152 are also classified as a part of the alarm unit 153.

  The haptic module 154 generates various haptic effects that the user can feel. A typical example of the haptic effect generated by the haptic module 154 is vibration. The intensity and pattern of vibration generated by the haptic module 154 can be controlled. For example, different vibrations can be combined and output, or sequentially output.

  In addition to vibration, the haptic module 154 can generate various tactile effects, such as the effect of a pin arrangement moving vertically against the skin contact surface, an air jet or suction force through a jet port or suction port, brushing against the skin surface, contact of an electrode, a stimulus such as an electrostatic force, and the reproduction of a cool or warm sensation using an element capable of absorbing or generating heat.

  The haptic module 154 can be configured not only to transmit the haptic effect by direct contact, but also to allow the user to feel the haptic effect by muscle senses such as fingers and arms. Two or more haptic modules 154 may be provided according to the configuration of the mobile terminal 100.

  The memory 160 can store a program for the operation of the control unit 180, and can temporarily store input / output data (for example, a phone book, a message, a still image, a moving image, etc.). The memory 160 may store various patterns of vibration and sound data that are output when a touch screen is touched.

  The memory 160 may include at least one type of storage medium among flash memory, a hard disk, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk. The mobile terminal 100 may also operate in connection with web storage that performs the storage function of the memory 160 on the Internet.

  The interface unit 170 serves as a path to all external devices connected to the mobile terminal 100. The interface unit 170 receives data from an external device, delivers supplied power to each component in the mobile terminal 100, or sends data in the mobile terminal 100 to the external device. The interface unit 170 includes, for example, a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, a port for connecting a device equipped with an identification module, an audio I/O (Input/Output) port, a video I/O port, and an earphone port.

  The identification module is a chip that stores various types of information for authenticating the right to use the mobile terminal 100, and includes a user identity module (UIM), a subscriber identity module (SIM), and a universal subscriber identity module (USIM). A device provided with an identification module (hereinafter, an identification device) may be manufactured in the form of a smart card. Accordingly, the identification device can be connected to the mobile terminal 100 via the port.

  When the mobile terminal 100 is connected to an external cradle, the interface unit 170 serves as a path through which power from the cradle is supplied to the mobile terminal 100 and through which various command signals input by the user from the cradle are transmitted to the mobile terminal 100. The various command signals or the power input from the cradle may also function as a signal for recognizing that the mobile terminal 100 is correctly mounted on the cradle.

  The control unit 180 controls the overall operation of the mobile terminal 100. For example, the control unit 180 performs control and processing related to voice calls, data communication, videophone calls, and the like. The control unit 180 may also include a multimedia module 181 for playing back multimedia. The multimedia module 181 may be realized within the control unit 180 or separately from the control unit 180. Further, the control unit 180 can perform pattern recognition processing for recognizing handwriting input and drawing input performed on the touch screen as characters and images, respectively.

  The power supply unit 190 receives external power and internal power under the control of the control unit 180 and supplies the power required for the operation of each component.

  The various embodiments described herein may be implemented in software, hardware, or a combination thereof in a recording medium that can be read by a computer or similar device.

  In a hardware implementation, the embodiments described herein may be realized using at least one of ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, microprocessors, and other electrical units for performing functions. In some cases, these embodiments may be realized by the control unit 180.

  In a software implementation, embodiments such as the procedures and functions described herein may be implemented by separate software modules. Each software module may perform one or more of the functions or operations described herein. The software code may be realized by a software application written in an appropriate programming language, stored in the memory 160, and executed by the control unit 180.

  Hereinafter, a user input processing method of the mobile terminal 100 will be described.

  The user input unit 130 receives commands for controlling the operation of the mobile terminal 100 and may include a plurality of operation units. The operation units may also be referred to as manipulation portions, and any manner may be employed as long as the user can operate them in a tactile manner.

  Various types of visual information can be displayed on the display unit 151. The visual information may be displayed in the form of letters, numbers, symbols, graphics, icons, etc., or may be composed of a three-dimensional stereoscopic image. For the input of visual information, at least one of letters, numbers, symbols, graphics, and icons may be displayed in a predetermined arrangement in the form of a keypad. Such a keypad is called a “soft key”.

  The display unit 151 may be operated in the entire area, or may be divided into a plurality of areas. In the latter case, a plurality of regions may be configured to operate in cooperation with each other. For example, an output window may be displayed at the top of the display unit 151 and an input window may be displayed at the bottom of the display unit 151. The output window is an area allocated for outputting information, and the input window is an area allocated for inputting information. A soft key on which a number for inputting a telephone number is displayed may be output to the input window. When the soft key is touched, a number corresponding to the touched soft key is displayed in the output window. When the operation unit is operated, a call connection to the telephone number displayed in the output window may be attempted, or text displayed in the output window may be input to the application.

  The display unit 151 or the touch pad may be configured to detect touch scrolling. By scrolling the display unit 151 or the touch pad, the user can move an object displayed on the display unit 151, for example a cursor or pointer positioned on an icon. Furthermore, when a finger is moved on the display unit 151 or the touch pad, the path along which the finger moves can be displayed visually on the display unit 151. This is useful for editing an image displayed on the display unit 151.

  When the display unit 151 and the touch pad are touched together within a predetermined time range, one function of the mobile terminal 100 may be executed. As a case where the display unit 151 and the touch pad are touched together, for example, a user may pinch the main body of the mobile terminal 100 using a thumb and an index finger. One function of the mobile terminal 100 executed in this case may be, for example, activation or deactivation of the display unit 151 or the touch pad.

  FIGS. 2A and 2B are perspective views illustrating the appearance of a mobile terminal according to an embodiment of the present invention. FIG. 2A shows the front and one side of the mobile terminal, and FIG. 2B shows the back and the other side of the mobile terminal.

  Referring to FIGS. 2A and 2B, the mobile terminal 100 includes a bar-type terminal body. However, the mobile terminal 100 is not limited to this, and can be realized in various forms, such as a slide type, a folding type, a swing type, and a two-axis rotation type, in which two or more bodies are coupled so as to be movable relative to each other.

  The terminal body includes a case (casing, housing, cover, etc.) that forms the appearance of the mobile terminal 100. In the present embodiment, the terminal body case includes a front case 101 and a rear case 102. Various electronic components are built in a space formed between the front case 101 and the rear case 102. One or more intermediate cases may be further disposed between the front case 101 and the rear case 102.

  The case may be formed by injecting synthetic resin, or may be formed of a metal material such as stainless steel or titanium (Ti).

  A display unit 151, a front sound output unit 152, a front camera 121, a user input unit 130 (see FIG. 1), a microphone 122, an interface unit 170, and the like are disposed on the front surface of the terminal body, particularly the front case 101.

  The user input unit 130 is for receiving commands to control the operation of the mobile terminal 100 and may include a plurality of operation units (a first operation unit 131 and a second operation unit 132).

  The first operation unit 131 and the second operation unit 132 can receive various commands. For example, the first operation unit 131 receives commands such as start, end, and scroll, and the second operation unit 132 receives commands such as adjusting the volume of the sound output from the front sound output unit 152 and switching the display unit 151 to a touch selection mode.

  The display unit 151 occupies most of the main surface of the front case 101. The front acoustic output unit 152 and the front camera 121 are disposed in a region adjacent to one end of the display unit 151, and the first operation unit 131 and the microphone 122 are disposed in a region adjacent to the other end. The second operation unit 132 and the interface unit 170 are disposed on the side surfaces of the front case 101 and the rear case 102.

  A rear camera 121 ′ is disposed on the rear surface of the terminal body, particularly the rear case 102. The rear camera 121 ′ may be configured to have a shooting direction opposite to that of the front camera 121 and have different pixels from the front camera 121.

  For example, the front camera 121 may be configured with a low pixel camera, and the rear camera 121 ′ may be configured with a high pixel camera. As a result, in the case of a videophone or the like, the size of the transmission data can be reduced by transmitting an image captured by photographing the user's face using the front camera 121 to the other party in real time. On the other hand, the rear camera 121 'is used mainly for the purpose of storing high-quality images.

  Meanwhile, the front camera 121 and the rear camera 121 'may be installed on the terminal body so as to be rotatable or pop-up.

  Further, a flash 123 and a mirror 124 are disposed adjacent to the rear camera 121 '. The flash 123 emits light toward the subject when the user uses the rear camera 121 ′ to photograph the subject. The mirror 124 can display the user's face and the like when the user photographs the user himself / herself with the rear camera 121 '(self-photographing).

  A back acoustic output unit 152 ′ is further disposed on the back of the terminal body. The back sound output unit 152 ′ performs a stereo function together with the front sound output unit 152 and performs a speakerphone function during a call.

  In addition to the antenna for calling, a broadcast signal receiving antenna 116 is further disposed on the side of the terminal body. The antenna 116 constituting a part of the broadcast receiving module 111 (see FIG. 1) may be installed so as to be pulled out from the terminal body.

  A power supply unit 190 for supplying power to the mobile terminal 100 is attached to the terminal body. The power supply unit 190 may be configured to be built in the terminal body, or may be configured to be directly detachable from the terminal body.

  A touch pad 135 for sensing touch is further attached to the rear case 102. The touch pad 135 may be configured to be a light transmission type similarly to the display unit 151. In addition, a display unit on the back for displaying visual information may be attached to the touch pad 135. Here, information output from both the front display unit 151 and the rear display unit may be controlled by the touch pad 135.

  The touch pad 135 and the display unit 151 operate in association with each other. The touch pad 135 may be arranged in parallel behind the display unit 151. Further, the size of the touch pad 135 may be equal to or smaller than that of the display unit 151.

  FIG. 3 is a block diagram showing a speech recognition system according to an embodiment of the present invention.

  As shown in FIG. 3, the voice recognition system includes a server 200 and a mobile terminal 300 that are linked to each other via a network so that voice recognition can be processed using distributed resources. That is, the speech recognition system can realize a distributed speech recognition technology.

  The server 200 may include a first speech recognition engine 210 and a first database 220. The first speech recognition engine 210 recognizes speech provided by the mobile terminal 300 based on the first database 220 whose information range (domain) is specified as general-purpose information. As a result, the first speech recognition engine 210 generates first speech recognition data. The server 200 transmits the first voice recognition data generated by the first voice recognition engine 210 to the mobile terminal 300.

  The mobile terminal 300 may include a microphone 310, a second speech recognition engine 320, a second database 330, a communication unit 340, a display unit 350, and a control unit 360. The microphone 310 receives the user's voice. The second voice recognition engine 320 recognizes the voice received from the microphone 310 based on the second database 330 whose range of information is specified as personal information. As a result, the second speech recognition engine 320 generates second speech recognition data. The communication unit 340 transmits the voice received from the microphone 310 to the server 200, and receives first voice recognition data as a response to the voice from the server 200. The display unit 350 displays various information related to voice recognition and a control menu. The controller 360 controls general operations of the mobile terminal 300 related to voice recognition.
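
As a rough structural sketch of the components just described, the following Python classes model the division of labor: the server-side engine is bound to a database whose domain is general-purpose information, while the terminal-side engine is bound to a database whose domain is personal information. The class and attribute names are illustrative only and do not appear in the patent.

```python
class RecognitionEngine:
    """Stand-in for a voice recognition engine bound to a database (its information domain)."""
    def __init__(self, name: str, domain: str):
        self.name = name        # e.g. "first (server)" or "second (terminal)"
        self.domain = domain    # "general-purpose" or "personal information"

    def recognize(self, voice: bytes) -> str:
        # A real engine would run speech-to-text against its database;
        # this sketch only shows how the two engines are wired together.
        raise NotImplementedError


class Server:
    """Holds the first voice recognition engine and the first (general-purpose) database."""
    def __init__(self):
        self.first_engine = RecognitionEngine("first (server)", "general-purpose")


class MobileTerminal:
    """Holds the second voice recognition engine and the second (personal) database."""
    def __init__(self, server: Server):
        self.server = server
        self.second_engine = RecognitionEngine("second (terminal)", "personal information")
```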

  Hereinafter, the voice recognition processing of the first voice recognition engine 210 and the second voice recognition engine 320 will be described in detail. For convenience of explanation, the first speech recognition engine 210 and the second speech recognition engine 320 are collectively referred to as a speech recognition engine, the first database 220 and the second database 330 are collectively referred to as a database, and the first speech recognition data and the second speech recognition. The data is collectively referred to as voice recognition data.

  The speech recognition engine uses a speech recognition algorithm to analyze the meaning and context of received (input) speech within the range of database information. For this purpose, the speech recognition engine converts speech into text format data using an STT (Speech To Text) algorithm and stores it in a database.

  The voice of the user can be converted into a plurality of data by the voice recognition algorithm. In this case, the speech recognition engine determines a recognition rate of a plurality of data, and selects data having the highest recognition rate among the plurality of data as a speech recognition result.
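
A minimal sketch of the selection just described, assuming the recognition algorithm returns each candidate transcription together with a recognition rate; the data structure and example values are assumptions, not taken from the patent.

```python
def select_best_candidate(candidates):
    """candidates: list of (text, recognition_rate) pairs produced by the STT step.
    Returns the text with the highest recognition rate, or None if there is none."""
    if not candidates:
        return None
    best_text, _best_rate = max(candidates, key=lambda pair: pair[1])
    return best_text


# Example: three alternative transcriptions of the same utterance
print(select_best_candidate([("call kim tae hee", 0.91),
                             ("call kim tae fi", 0.64),
                             ("call him today", 0.33)]))   # -> "call kim tae hee"
```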

  FIG. 4 is a flowchart illustrating a voice recognition method of a mobile terminal according to an embodiment of the present invention.

  Referring to FIG. 4, first, a step (S102) of receiving the user's voice from the microphone 310 is performed.

  Next, the step (S104) of providing the received speech to the first speech recognition engine 210 and the second speech recognition engine 320 is performed. The received voice may be transmitted from the communication unit 340 to the server 200 to be provided to the first voice recognition engine 210. Here, transmission of the received voice to the server 200 may be blocked according to the state of the network.

  Next, a step (S106) of acquiring first voice recognition data as a result of the first voice recognition engine 210 recognizing the received voice is performed. At this time, the first voice recognition data is acquired by receiving it from the server 200. Here, reception of the first voice recognition data from the server 200 may be blocked according to the state of the network. Further, a step (S108) of acquiring second voice recognition data as a result of the second voice recognition engine 320 recognizing the received voice is performed.

  Next, a step (S110) of predicting a function corresponding to the user's intention based on at least one of the acquired first voice recognition data and second voice recognition data is performed. For example, the function corresponding to the user's intention is predicted by spoken language understanding (SLU). Spoken language understanding refers to extracting meaningful information from a speech-recognized sentence and inferring the user's intention, mainly by extracting the main action, the speech act, and named entities. Here, the main action means the specific action that the user wants to perform, as revealed by the user's utterance; the speech act means the type of the user's utterance; and a named entity means an important word appearing in the utterance, for example, information such as a person, place, organization, or time.
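
To make the SLU terms concrete, here is a deliberately simplified, keyword-based sketch that pulls a main action, a speech act, and a named entity out of a recognized sentence. Real systems use statistical or neural models; the rules and labels below are illustrative assumptions.

```python
def understand(sentence):
    """Toy spoken language understanding: extract speech act, main action, named entities."""
    tokens = sentence.lower().rstrip("!?.").split()
    result = {
        "speech_act": "request" if sentence.endswith("!") else "statement",
        "main_action": None,
        "named_entities": [],
    }
    if "call" in tokens:
        result["main_action"] = "place_call"
        # Treat everything after "call" as the callee, i.e. a person-type named entity.
        callee = " ".join(tokens[tokens.index("call") + 1:])
        if callee:
            result["named_entities"].append(("person", callee))
    return result


print(understand("Call Kim Tae Hee!"))
# {'speech_act': 'request', 'main_action': 'place_call',
#  'named_entities': [('person', 'kim tae hee')]}
```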

  Next, a step of determining whether or not personal information (for example, contact information) is requested for the predicted function is performed (S112). For example, in order to execute a call function, personal information regarding a call target is required. When personal information is required for the predicted function, a step (S114) of calculating the similarity between the first voice recognition data and the second voice recognition data is performed. Here, the similarity may indicate the ratio of the number of matching characters or words in the texts compared with each other. For example, “ABCD” and “ABCF” have a similarity of 75% because three of the four characters match.
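
One straightforward way to implement the character-match similarity described above is a position-wise comparison. The exact matching rule (position-wise versus set-based) is not fixed by the text, so the sketch below is an assumption that happens to reproduce the "ABCD"/"ABCF" = 75% example.

```python
def character_similarity(text_a, text_b):
    """Ratio of position-wise matching characters between two recognition results."""
    if not text_a or not text_b:
        return 0.0
    matches = sum(1 for a, b in zip(text_a, text_b) if a == b)
    return matches / max(len(text_a), len(text_b))


print(character_similarity("ABCD", "ABCF"))   # 0.75, as in the example above
```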

  Next, a step (S116) of comparing the calculated similarity with a predetermined reference value (for example, 80%) is performed. When the calculated similarity is smaller than the reference value, that is, when the difference between the first voice recognition data and the second voice recognition data is determined to be large, a step (S118) of selecting the first voice recognition data from the first voice recognition data and the second voice recognition data is performed. Thereby, the predicted function is executed using the selected first voice recognition data. At this time, the predicted function is corrected or supplemented by the selected first voice recognition data before being executed.

  On the other hand, when the calculated similarity is equal to or larger than the reference value, that is, when the difference between the first voice recognition data and the second voice recognition data is determined to be small, a step (S120) of selecting the second voice recognition data from the first voice recognition data and the second voice recognition data is performed. Thereby, the predicted function is executed using the selected second voice recognition data. At this time, the predicted function is corrected or supplemented by the selected second voice recognition data before being executed.

  On the other hand, when personal information is not required for the predicted function, a step (S122) of ignoring the second speech recognition data is performed. As a result, the first speech recognition data is used and the predicted function is executed.
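
Putting steps S110 through S122 together, the selection logic can be sketched as follows. The 80% reference value comes from the example above; the function and variable names are illustrative, and the similarity measure reuses the simple character-match ratio sketched earlier.

```python
REFERENCE_SIMILARITY = 0.80   # the predetermined reference value (80%) from the example above


def character_similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)


def choose_recognition_result(first_data, second_data, needs_personal_info):
    """first_data: server (remote) engine result; second_data: terminal (local) engine result."""
    if not needs_personal_info:
        return first_data                                        # S122: ignore the local result
    similarity = character_similarity(first_data, second_data)   # S114
    if similarity < REFERENCE_SIMILARITY:
        return first_data    # S118: results differ greatly -> use the general-purpose server result
    return second_data       # S120: results agree -> keep the recognition local to protect privacy
```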

  As described above, according to the present invention, the voice recognition rate of the mobile terminal 300 can be improved by using a predetermined algorithm to select and use the recognition result determined to be more reliable from among the results of the mutually complementary remote voice recognition engine (first voice recognition engine) and local voice recognition engine (second voice recognition engine).

  Further, according to the present invention, when the function predicted in the voice recognition process requires personal information, the recognition result of the local voice recognition engine is used, so that the remote voice recognition engine can be kept from recognizing voice related to the personal information. That is, leakage of personal information can be prevented.

  Furthermore, according to the present invention, when the network condition is poor, the recognition processing of the remote voice recognition engine is ignored, which eliminates the delay caused by waiting for the recognition result from the remote voice recognition engine and thereby increases the voice recognition speed.

FIGS. 5 and 6 are flowcharts illustrating voice recognition methods of a mobile terminal, related to whether voice recognition data is received, according to an embodiment of the present invention.

  Referring to FIG. 5, first, a step (S210) of determining the state of the network established between the server 200 and the mobile terminal 300 is performed. The state of the network is determined based on the transmission speed and the data packet loss rate.

  Next, a step (S220) of determining whether the state of the network is poor is performed. If the state of the network is poor, a step (S230) of blocking reception of the first voice recognition data from the server 200 is performed.
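
A sketch of the network check in steps S210 through S230. The patent only says the state is judged from the transmission speed and the data packet loss rate; the concrete thresholds below are placeholders, not values from the text.

```python
MIN_SPEED_KBPS = 64       # illustrative threshold, not specified in the patent
MAX_PACKET_LOSS = 0.05    # illustrative threshold, not specified in the patent


def network_is_poor(transmission_speed_kbps, packet_loss_rate):
    """S210/S220: judge the network state from transmission speed and packet loss rate."""
    return transmission_speed_kbps < MIN_SPEED_KBPS or packet_loss_rate > MAX_PACKET_LOSS


def should_block_first_data(transmission_speed_kbps, packet_loss_rate):
    """S230: block reception of the first voice recognition data when the network is poor."""
    return network_is_poor(transmission_speed_kbps, packet_loss_rate)
```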

  Referring to FIG. 6, first, a step (S310) of transmitting a request signal for the first speech recognition data to the server 200 is performed. The first voice recognition data is received from the server 200 as a response to the request signal.

  Next, a step of determining whether or not the first voice recognition data is received within the reference response time (S320) is performed. When the first speech recognition data is not received within the reference response time, a step (S330) of transmitting a cancel signal for canceling the request for the first speech recognition data to the server 200 is performed. The server 200 interrupts the generation and transmission of the first speech recognition data in response to the cancel signal.
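
The request/cancel exchange in steps S310 through S330 can be sketched with a blocking wait and a timeout. The reference response time value and the callback names (send_request, send_cancel) are assumptions; the queue simply stands in for the communication unit delivering the server's response.

```python
import queue

REFERENCE_RESPONSE_TIME_S = 2.0   # illustrative value; the patent does not specify one


def request_first_recognition_data(send_request, send_cancel, response_queue):
    """S310: request the first voice recognition data; S320: wait up to the reference
    response time; S330: send a cancel signal and fall back to the local result on timeout."""
    send_request()
    try:
        return response_queue.get(timeout=REFERENCE_RESPONSE_TIME_S)
    except queue.Empty:
        send_cancel()     # the server stops generating/transmitting the first data
        return None       # caller then uses the second (local) voice recognition data
```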

  FIG. 7 is a flowchart illustrating a voice recognition method of a mobile terminal related to a personal information protection function according to an embodiment of the present invention.

  Referring to FIG. 7, first, a step (S410) of displaying a menu button for executing the personal information protection function is performed in the voice recognition mode. The personal information protection function is executed in response to the touch input of the menu button.

  Next, a step (S420) of determining whether to execute the personal information protection function is performed. When the personal information protection function is executed, a step (S430) of blocking provision of the voice received from the user to the first voice recognition engine 210 is performed. This means that transmission of the user's voice to the server 200 is blocked.

  FIG. 8 is a conceptual diagram illustrating a user interface of a mobile terminal to which the speech recognition method of FIG. 7 is applied.

  Referring to FIG. 8, the control unit 360 controls the display unit 350 to display a screen image 351 related to voice recognition. Screen image 351 includes guide information 352 indicating that the voice recognition mode is being executed, menu button 353 for executing the personal information protection function, and the like.

  When a touch input on the menu button 353 is detected, the control unit 360 executes the personal information protection function. When the user's voice is received from the microphone 310 while the personal information protection function is being executed, the control unit 360 blocks provision of the received voice to the first voice recognition engine 210 and provides the received voice to the second voice recognition engine 320.
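
The routing behind the menu button 353 can be sketched as below: while the personal information protection function is active, the received voice is given only to the local (second) engine and nothing is sent to the server. The engine objects and method names are illustrative assumptions.

```python
class VoiceRouter:
    """Sketch of the privacy routing of FIGS. 7 and 8 (names are illustrative)."""

    def __init__(self, remote_engine, local_engine):
        self.remote_engine = remote_engine   # first voice recognition engine (on the server)
        self.local_engine = local_engine     # second voice recognition engine (on the terminal)
        self.privacy_mode = False            # toggled by the menu button 353

    def on_menu_button_touched(self):
        self.privacy_mode = not self.privacy_mode

    def recognize(self, voice):
        results = {"second": self.local_engine.recognize(voice)}
        if not self.privacy_mode:            # S430: in privacy mode nothing reaches the server
            results["first"] = self.remote_engine.recognize(voice)
        return results
```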

  The second voice recognition engine 320 recognizes the received voice based on the second database 330, whose information range is specified as personal information, and sends the recognition result to the control unit 360. The control unit 360 predicts and executes a function corresponding to the user's intention based on the recognition result of the second voice recognition engine 320. For example, the control unit 360 predicts and executes the call function by recognizing the voice "Call Kim Tae Hee!" received from the user. In addition, the control unit 360 controls the display unit 350 to display a screen image 354 related to the call function.

  Furthermore, in order to execute the call function, contact information of "Kim Tae Hee" is required as personal information. In this case, the personal information protection function can be executed manually using the menu button 353 so that voice related to the personal information is not transmitted to the server 200.

  FIG. 9 is a flowchart illustrating a voice recognition method of a mobile terminal related to a user selection for voice recognition data according to an embodiment of the present invention.

  Referring to FIG. 9, first, a step (S510) of displaying the first voice recognition data as the recognition result of the first voice recognition engine 210 and the second voice recognition data as the recognition result of the second voice recognition engine 320 is performed.

  Next, a step of selecting either the first voice recognition data or the second voice recognition data in response to the touch input (S520) is performed.

  Next, a step (S530) of executing the predicted function using the selected voice recognition data is performed.

  FIG. 10 is a conceptual diagram illustrating a user interface of a mobile terminal to which the speech recognition method of FIG. 9 is applied.

  Referring to FIG. 10, the control unit 360 controls the display unit 350 to display a screen image 451 related to voice recognition. Screen image 451 includes guidance information 452 indicating that the voice recognition mode is being executed, first voice recognition data 453, second voice recognition data 454, and the like.

  For example, as a result of recognizing the voice "Call Kim Tae Hee!" received from the user, the first voice recognition data 453 ("Kim Tae Hei!") and the second voice recognition data 454 are each displayed in text format. Here, characters or words that differ between the first voice recognition data 453 and the second voice recognition data 454 may be emphasized. For example, the character style, such as bold, color, italics, or font, may be changed so that "fi" and "hi" are distinguished from the other characters. Alternatively, graphic effects such as underlining or shadowing may be applied to "fi" and "hi". As a result, the user can intuitively recognize which voice recognition data better matches the user's intention.
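
One way to emphasize the differing parts of the two displayed results, as suggested above, is to mark the words that do not match position-wise and let the UI render the marked words in a different style. The marker characters and the example strings are assumptions for illustration.

```python
def emphasize_differences(first_text, second_text, mark="__"):
    """Wrap position-wise differing words in a marker so the UI can restyle them."""
    a_words, b_words = first_text.split(), second_text.split()
    out_a, out_b = [], []
    for i in range(max(len(a_words), len(b_words))):
        wa = a_words[i] if i < len(a_words) else ""
        wb = b_words[i] if i < len(b_words) else ""
        if wa == wb:
            out_a.append(wa)
            out_b.append(wb)
        else:
            out_a.append(f"{mark}{wa}{mark}")
            out_b.append(f"{mark}{wb}{mark}")
    return " ".join(out_a).strip(), " ".join(out_b).strip()


print(emphasize_differences("Call Kim Tae fi!", "Call Kim Tae hi!"))
# ('Call Kim Tae __fi!__', 'Call Kim Tae __hi!__')
```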

  The control unit 360 selects either the first voice recognition data 453 or the second voice recognition data 454 in response to a touch input. Then, the control unit 360 predicts and executes a function corresponding to the user's intention based on the selected voice recognition data. For example, when the voice recognition data 454 "Call Kim Tae Hee!" is selected, the call function is predicted and executed.

  According to one embodiment disclosed herein, the method described above can be implemented as processor-readable code on a program recording medium. Processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices, and the like, and also include media realized in the form of a carrier wave (for example, transmission over the Internet).

  The mobile terminal and the voice recognition method thereof according to the present invention are not limited to the configurations and methods of the embodiments described above, and the embodiments may be modified in various ways by selectively combining all or some of them.

200 Server
210 First voice recognition engine
220 First database
300 Mobile terminal
310 Microphone
320 Second voice recognition engine
330 Second database
340 Communication unit
350 Display unit
353 Menu button
360 Control unit
453 First voice recognition data
454 Second voice recognition data

Claims (16)

  1. A voice recognition method for a mobile terminal operating in conjunction with a server, the method comprising:
    receiving a user's voice;
    providing the received voice to a first voice recognition engine provided in the server and a second voice recognition engine provided in the mobile terminal;
    obtaining first voice recognition data as a result of the first voice recognition engine recognizing the received voice;
    obtaining second voice recognition data as a result of the second voice recognition engine recognizing the received voice;
    predicting a function corresponding to the user's intention based on at least one of the first voice recognition data and the second voice recognition data;
    when personal information is required for the predicted function, calculating a similarity between the first voice recognition data and the second voice recognition data; and
    selecting one of the first voice recognition data and the second voice recognition data based on the calculated similarity.
  2.   The method of claim 1, further comprising ignoring the second voice recognition data when personal information is not required for the predicted function.
  3. The method of claim 1, wherein obtaining the first voice recognition data comprises:
    transmitting, to the server, a request signal requesting the first voice recognition data; and
    receiving the first voice recognition data from the server as a response to the request signal.
  4. The method of claim 3, further comprising:
    checking status information of a network connecting the server and the mobile terminal; and
    blocking reception of the first voice recognition data based on the status information of the network.
  5.   The method of claim 4, further comprising executing the predicted function using the second voice recognition data when reception of the first voice recognition data is blocked.
  6. The method of claim 1, further comprising:
    displaying a menu button for executing a personal information protection function; and
    blocking provision of the received voice to the first voice recognition engine when the personal information protection function is executed in response to a touch input on the menu button.
  7.   The method of claim 1, further comprising executing the predicted function using the selected voice recognition data.
  8. The method of claim 1, wherein obtaining the second voice recognition data comprises recognizing the received voice based on a personal information database.
  9. A mobile terminal operating in conjunction with a server, the mobile terminal comprising:
    a microphone that receives a user's voice;
    a communication unit that transmits the received voice to the server and receives first voice recognition data generated as a result of a first voice recognition engine provided in the server recognizing the received voice;
    a second voice recognition engine that generates second voice recognition data as a result of recognizing the received voice; and
    a controller that predicts a function corresponding to the user's intention based on at least one of the first voice recognition data and the second voice recognition data, calculates a similarity between the first voice recognition data and the second voice recognition data when personal information is required for the predicted function, and selects one of the first voice recognition data and the second voice recognition data based on the calculated similarity.
  10. The mobile terminal of claim 9, wherein the controller ignores the second voice recognition data when personal information is not required for the predicted function.
  11. The mobile terminal of claim 9, wherein the controller checks status information of a network connecting the server and the mobile terminal and blocks reception of the first voice recognition data based on the status information of the network.
  12. The mobile terminal of claim 10, wherein the controller executes the predicted function using the second voice recognition data when reception of the first voice recognition data is blocked.
  13.   The mobile terminal of claim 9, further comprising a display unit that displays a menu button for executing a personal information protection function.
  14. The mobile terminal of claim 13, wherein the controller blocks transmission of the received voice to the server when the personal information protection function is executed in response to a touch input on the menu button.
  15. The mobile terminal of claim 9, wherein the controller executes the predicted function using the selected voice recognition data.
  16. The mobile terminal of claim 9, wherein the second voice recognition engine recognizes the received voice based on a personal information database.
JP2013134874A 2012-06-28 2013-06-27 Mobile terminal and voice recognition method thereof Active JP5956384B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR10-2012-0070353 2012-06-28
KR1020120070353A KR101961139B1 (en) 2012-06-28 2012-06-28 Mobile terminal and method for recognizing voice thereof

Publications (2)

Publication Number Publication Date
JP2014010456A true JP2014010456A (en) 2014-01-20
JP5956384B2 JP5956384B2 (en) 2016-07-27

Family

ID=48747311

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2013134874A Active JP5956384B2 (en) 2012-06-28 2013-06-27 Mobile terminal and voice recognition method thereof

Country Status (6)

Country Link
US (1) US9147395B2 (en)
EP (1) EP2680257B1 (en)
JP (1) JP5956384B2 (en)
KR (1) KR101961139B1 (en)
CN (1) CN103533154B (en)
WO (1) WO2014003329A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016045487A (en) * 2014-08-21 2016-04-04 本田技研工業株式会社 Information processing device, information processing system, information processing method, and information processing program

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966065B2 (en) * 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
KR20160001359A (en) * 2014-06-27 2016-01-06 삼성전자주식회사 Method for managing data and an electronic device thereof
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
JP6418820B2 (en) * 2014-07-07 2018-11-07 キヤノン株式会社 Information processing apparatus, display control method, and computer program
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9934406B2 (en) 2015-01-08 2018-04-03 Microsoft Technology Licensing, Llc Protecting private information in input understanding system
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
EP3091422A1 (en) * 2015-05-08 2016-11-09 Nokia Technologies Oy Method, apparatus and computer program product for entering operational states based on an input type
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10338959B2 (en) * 2015-07-13 2019-07-02 Microsoft Technology Licensing, Llc Task state tracking in systems and services
KR101910383B1 (en) * 2015-08-05 2018-10-22 엘지전자 주식회사 Driver assistance apparatus and vehicle including the same
CN105206266B (en) * 2015-09-01 2018-09-11 重庆长安汽车股份有限公司 Vehicle-mounted voice control system and method based on user view conjecture
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
CN106971716A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of robot noise database updates and speech recognition equipment, method
CN106971720A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of robot voice recognition methods for updating noise database and device
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK201670578A1 (en) 2016-06-09 2018-02-26 Apple Inc Intelligent automated assistant in a home environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
WO2018117608A1 (en) * 2016-12-20 2018-06-28 삼성전자 주식회사 Electronic device, method for determining utterance intention of user thereof, and non-transitory computer-readable recording medium
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
KR102033929B1 (en) * 2017-06-28 2019-10-18 포항공과대학교 산학협력단 A real-time speech-recognition device using an ASIC chip and a smart-phone
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08259125A (en) * 1995-03-28 1996-10-08 Fujitec Co Ltd Voice inputting device for elevator
JP2001142487A (en) * 1999-11-11 2001-05-25 Sony Corp Voice data input system
US20030120486A1 (en) * 2001-12-20 2003-06-26 Hewlett Packard Company Speech recognition system and method
JP2004272134A (en) * 2003-03-12 2004-09-30 Advanced Telecommunication Research Institute International Speech recognition device and computer program
JP2004312210A (en) * 2003-04-04 2004-11-04 R & D Associates:Kk Individual authentication method, apparatus, and system
JP2005284543A (en) * 2004-03-29 2005-10-13 Advanced Media Inc Business support system and method
JP2009237439A (en) * 2008-03-28 2009-10-15 Kddi Corp Speech recognition device of mobile terminal, speech recognition method of mobile terminal and speech recognition program for the mobile terminal
JP2010014885A (en) * 2008-07-02 2010-01-21 Advanced Telecommunication Research Institute International Information processing terminal with voice recognition function
JP2010113678A (en) * 2008-11-10 2010-05-20 Advanced Media Inc Full name analysis method, full name analysis device, voice recognition device, and full name frequency data generation method
WO2010090679A1 (en) * 2009-01-22 2010-08-12 Microsoft Corporation Markup language-based selection and utilization of recognizers for utterance processing
JP2012013910A (en) * 2010-06-30 2012-01-19 Denso Corp Voice recognition terminal

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020138274A1 (en) * 2001-03-26 2002-09-26 Sharma Sangita R. Server based adaption of acoustic models for client-based speech systems
US6738743B2 (en) * 2001-03-28 2004-05-18 Intel Corporation Unified client-server distributed architectures for spoken dialogue systems
KR100956941B1 (en) * 2003-06-27 2010-05-11 주식회사 케이티 Selective speech recognition apparatus and method which it follows in network situation
US8589156B2 (en) * 2004-07-12 2013-11-19 Hewlett-Packard Development Company, L.P. Allocation of speech recognition tasks and combination of results thereof
US8024194B2 (en) 2004-12-08 2011-09-20 Nuance Communications, Inc. Dynamic switching between local and remote speech rendering
KR101073190B1 (en) * 2005-02-03 2011-10-13 주식회사 현대오토넷 Distribute speech recognition system
US20070276651A1 (en) * 2006-05-23 2007-11-29 Motorola, Inc. Grammar adaptation through cooperative client and server based speech recognition
JP5212910B2 (en) * 2006-07-07 2013-06-19 日本電気株式会社 Speech recognition apparatus, speech recognition method, and speech recognition program
US8635243B2 (en) * 2007-03-07 2014-01-21 Research In Motion Limited Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application
KR101326262B1 (en) * 2007-12-27 2013-11-20 삼성전자주식회사 Speech recognition device and method thereof
US8364481B2 (en) * 2008-07-02 2013-01-29 Google Inc. Speech recognition with parallel recognition tasks
JP5381988B2 (en) * 2008-07-28 2014-01-08 日本電気株式会社 Dialogue speech recognition system, dialogue speech recognition method, and dialogue speech recognition program
US9959870B2 (en) * 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
JP5326892B2 (en) * 2008-12-26 2013-10-30 富士通株式会社 Information processing apparatus, program, and method for generating acoustic model
JP5377430B2 (en) * 2009-07-08 2013-12-25 本田技研工業株式会社 Question answering database expansion device and question answering database expansion method
CN102496364A (en) 2011-11-30 2012-06-13 苏州奇可思信息科技有限公司 Interactive speech recognition method based on cloud network
US20130238326A1 (en) * 2012-03-08 2013-09-12 Lg Electronics Inc. Apparatus and method for multiple device voice control
US10354650B2 (en) * 2012-06-26 2019-07-16 Google Llc Recognizing speech with mixed speech recognition models to generate transcriptions

Also Published As

Publication number Publication date
KR101961139B1 (en) 2019-03-25
KR20140001711A (en) 2014-01-07
EP2680257B1 (en) 2016-08-10
JP5956384B2 (en) 2016-07-27
CN103533154A (en) 2014-01-22
US20140006027A1 (en) 2014-01-02
US9147395B2 (en) 2015-09-29
CN103533154B (en) 2015-09-02
WO2014003329A1 (en) 2014-01-03
EP2680257A1 (en) 2014-01-01

Similar Documents

Publication Publication Date Title
US8661369B2 (en) Mobile terminal and method of controlling the same
KR101708821B1 (en) Mobile terminal and method for controlling thereof
US9344622B2 (en) Control of input/output through touch
US9459793B2 (en) Mobile terminal and controlling method thereof
KR101608532B1 (en) Method for displaying data and mobile terminal thereof
KR101537596B1 (en) Mobile terminal and method for recognizing touch thereof
KR101496512B1 (en) Mobile terminal and control method thereof
KR101510484B1 (en) Mobile Terminal And Method Of Controlling Mobile Terminal
KR20130123064A (en) Mobile terminal and control method thereof
EP2464084A1 (en) Mobile terminal and displaying method thereof
EP2261785B1 (en) Mobile terminal and controlling method thereof
US9167059B2 (en) Mobile terminal and control method thereof
KR20110054452A (en) Method for outputting tts voice data in mobile terminal and mobile terminal thereof
KR101651135B1 (en) Mobile terminal and method for controlling the same
KR20110045330A (en) Mobile terminal
US9280263B2 (en) Mobile terminal and control method thereof
KR101873413B1 (en) Mobile terminal and control method for the mobile terminal
EP2261784B1 (en) Mobile terminal and method of displaying information in mobile terminal
EP2151980A1 (en) Mobile terminal with touch screen and method of processing messages using the same
US9147395B2 (en) Mobile terminal and method for recognizing voice thereof
EP2259175A2 (en) Mobile terminal and method of displaying information in mobile terminal
CN101859226A (en) Method for inputting command and mobile terminal using the same
KR101917685B1 (en) Mobile terminal and control method thereof
US9310996B2 (en) Mobile terminal and method for providing user interface thereof
CN104298450A (en) Mobile terminal and control method of same mobile terminal

Legal Events

Date Code Title Description
A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20141209

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20150309

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20150908

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20151208

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20160517

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20160616

R150 Certificate of patent or registration of utility model

Ref document number: 5956384

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250