WO2011028051A2 - Electronic device and a voice recognition method using the same

Info

Publication number: WO2011028051A2
Application number: PCT/KR2010/005984
Authority: WO
Inventors: 김유진; 신원호
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2009-09-04
Filing date: 2010-09-02
Publication date: 2011-03-10
Also published as: KR20110025510A; WO2011028051A3

Abstract

The present invention relates to an electronic device and to a voice recognition method using the same. Provided are: an electronic device which effectively recognises voices including numbers and which can facilitate user access to and correction of recognition results; and a voice recognition method using the same.

Description

Electronic device and voice recognition method using same

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to speech recognition and, more particularly, to an electronic device capable of efficiently recognizing speech that includes numbers, and to a speech recognition method using the same.

In general, speech recognition of a string containing a plurality of numbers is more difficult, and has a lower recognition rate, than speech recognition of characters. For example, for a string of 10 numbers, even if the recognition rate for each individual number is 90%, the recognition rate for all 10 numbers degrades to (90%)^10, or roughly 35%, assuming the errors on individual numbers are independent.
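By way of illustration only, the arithmetic above can be sketched as follows. The function name `string_recognition_rate` is hypothetical, and the calculation assumes that recognition errors on individual numbers are independent of one another.

```python
def string_recognition_rate(per_number_rate: float, count: int) -> float:
    """Probability that every number in a string is recognized correctly,
    assuming independent errors (an assumption of this sketch)."""
    if not 0.0 <= per_number_rate <= 1.0:
        raise ValueError("per_number_rate must be between 0 and 1")
    if count < 0:
        raise ValueError("count must be non-negative")
    return per_number_rate ** count


# Whole-string rate for typical telephone number lengths at 90% per number.
for count in (7, 10, 11):
    print(count, round(string_recognition_rate(0.9, count), 3))
```

For 10 numbers at 90% each, the whole-string rate is about 0.349, which is why long strings such as telephone numbers call for additional measures.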

A mobile communication terminal that provides a call function may provide a function of recognizing a spoken phone number. In this case, a more efficient and effective method is required in order to recognize, with a reliable recognition rate, a telephone number string of seven to ten or more digits.

Disclosure of Invention

An object of the present invention is to provide an electronic device capable of efficiently and effectively recognizing a voice including numbers, and a voice recognition method using the same.

Another object of the present invention is to provide an electronic device and a voice recognition method using the same, which improves a user's accessibility to a voice recognition result and enables the user to easily and conveniently modify the voice recognition result.

An electronic device according to a first aspect of the present invention includes: a display unit; a voice receiver configured to receive a voice including a plurality of numbers; and a controller configured to recognize the received voice and to display, on the display unit, a plurality of recognition candidates respectively corresponding to a plurality of different numeric strings as a voice recognition result for the plurality of numbers, wherein the controller highlights the numbers that differ between the different numeric strings.

An electronic device according to a second aspect of the present invention includes: a voice receiver configured to receive a voice; and a controller configured to perform voice recognition on the received voice, wherein, when the received voice includes a predefined first keyword indicating an international call, the controller recognizes a second keyword, received through the voice receiver following the first keyword, by assuming the second keyword to be a country code number.

An electronic device according to a third aspect of the present invention includes: a voice receiver configured to receive a voice including a plurality of numbers; and a controller configured to perform voice recognition on the received voice, wherein, whenever a pause, which is a silent section, is detected in the received voice, the controller performs voice recognition on the at least one number received prior to the detected pause.

A voice recognition method of an electronic device according to a fourth aspect of the present invention includes: receiving a voice including a plurality of numbers; recognizing the received voice; and displaying a plurality of recognition candidates respectively corresponding to a plurality of different numeric strings as a voice recognition result for the plurality of numbers, wherein the numbers that differ between the different numeric strings are highlighted.
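By way of illustration only, the highlighting of differing numbers among recognition candidates may be sketched as follows. The function names, the bracket notation used as a stand-in for on-screen highlighting, and the sample candidate strings are all assumptions of this sketch and are not taken from the embodiments.

```python
def differing_positions(candidates):
    """Return the set of indices at which the candidate strings disagree."""
    if not candidates:
        return set()
    longest = max(len(c) for c in candidates)
    positions = set()
    for i in range(longest):
        # A missing character (shorter candidate) counts as a difference.
        column = {c[i] if i < len(c) else None for c in candidates}
        if len(column) > 1:
            positions.add(i)
    return positions


def highlight(candidate, positions):
    """Wrap each differing character in brackets (stand-in for highlighting)."""
    return "".join(
        f"[{ch}]" if i in positions else ch for i, ch in enumerate(candidate)
    )


# Hypothetical recognition candidates for one spoken telephone number.
candidates = ["01012345678", "01012345578", "01072345678"]
marks = differing_positions(candidates)
for candidate in candidates:
    print(highlight(candidate, marks))
```

Marking only the positions where candidates disagree lets the user compare candidates at a glance instead of reading each full string.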

A voice recognition method of an electronic device according to a fifth aspect of the present invention includes: receiving a voice; and, when the received voice includes a predefined first keyword indicating an international call, recognizing a second keyword received after the first keyword by assuming the second keyword to be a country code number.
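By way of illustration only, the keyword-dependent recognition described above may be sketched as follows. This sketch assumes that a recognizer has already produced scored hypotheses for the second keyword and that the user speaks a country name. The keyword set, the small vocabulary table, and the function name are hypothetical, and the embodiments may constrain recognition in a different way.

```python
# Hypothetical first keywords indicating an international call.
INTERNATIONAL_KEYWORDS = {"international", "international call"}

# Hypothetical vocabulary: spoken form of the second keyword -> country code.
COUNTRY_CODES = {
    "korea": "82",
    "japan": "81",
    "united states": "1",
    "united kingdom": "44",
}


def resolve_country_code(first_keyword, second_hypotheses):
    """Pick the best-scoring hypothesis that is a known country-code entry.

    second_hypotheses is a list of (word, score) pairs; higher is better.
    Returns the country code, or None when the rule does not apply.
    """
    if first_keyword.lower() not in INTERNATIONAL_KEYWORDS:
        return None
    in_vocabulary = [
        (word, score)
        for word, score in second_hypotheses
        if word.lower() in COUNTRY_CODES
    ]
    if not in_vocabulary:
        return None
    best_word, _ = max(in_vocabulary, key=lambda pair: pair[1])
    return COUNTRY_CODES[best_word.lower()]


# "career" scores higher acoustically but is not a country-code entry.
print(resolve_country_code("international call", [("career", 0.6), ("korea", 0.5)]))
```

Restricting the second keyword to country-code entries lets a lower-scoring but plausible hypothesis win over an acoustically similar word that makes no sense in that position.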

A voice recognition method of an electronic device according to a sixth aspect of the present invention includes: receiving a voice including a plurality of numbers; and, whenever a pause, which is a silent section, is detected in the received voice, performing voice recognition on the at least one number received prior to the detected pause.
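By way of illustration only, pause-triggered recognition may be sketched as follows. The sketch assumes the audio has already been split into frames, that a caller-supplied `is_silent` predicate marks silent frames, and that a caller-supplied `recognize` function handles one group of frames. All names and the `min_pause_frames` parameter are hypothetical.

```python
def recognize_by_pause(frames, is_silent, recognize, min_pause_frames=3):
    """Recognize each group of speech frames as soon as a pause follows it.

    A pause is a run of at least min_pause_frames consecutive silent frames.
    Returns the list of per-group recognition results in order.
    """
    results = []
    buffered = []      # speech frames received since the last pause
    silent_run = 0     # length of the current run of silent frames
    for frame in frames:
        if is_silent(frame):
            silent_run += 1
            if silent_run == min_pause_frames and buffered:
                results.append(recognize(buffered))
                buffered = []
        else:
            silent_run = 0
            buffered.append(frame)
    if buffered:       # flush whatever remains when the input ends
        results.append(recognize(buffered))
    return results


# Toy input: each non-empty frame stands for one spoken digit, "" for silence.
frames = ["0", "1", "0", "", "", "", "1", "2", "3", "4", "", "", "", "5", "6", "7", "8"]
groups = recognize_by_pause(frames, lambda f: f == "", "".join)
print(groups)
```

Recognizing short groups one at a time keeps each recognition task small, which relates to the per-string recognition rate problem described in the background.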

The electronic device and the voice recognition method using the same according to the present invention provide the following effects.

First, the recognition rate for a voice that includes numbers, such as a telephone number, can be significantly improved.

Second, the user's accessibility to the voice recognition result for a voice that includes numbers is improved.

Third, the user can easily and conveniently modify the voice recognition result for a voice that includes numbers.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating a case where the electronic device 100 according to an embodiment of the present invention is a mobile terminal.

FIG. 3 is a conceptual diagram in which the electronic device 100 and the external server 300 are connected to the Internet 400.

FIG. 4 is a flowchart of a voice recognition method of an electronic device according to a first embodiment of the present invention.

FIGS. 5 to 8 are diagrams for describing a voice recognition method of an electronic device according to a first embodiment of the present invention.

FIG. 9 is a flowchart of a voice recognition method of an electronic device according to a second embodiment of the present invention.

FIGS. 10 to 15 are diagrams for describing a voice recognition method of an electronic device according to a second embodiment of the present invention.

FIG. 16 is a flowchart of a voice recognition method of an electronic device according to a third embodiment of the present invention.

FIGS. 17 to 19 are diagrams for describing a voice recognition method of an electronic device according to a third embodiment of the present invention.

FIG. 20 is a diagram showing an example in which the fourth embodiment of the present invention is implemented.

FIG. 21 is a diagram showing an example in which the fifth embodiment of the present invention is implemented.

The above objects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Like numbers refer to like elements throughout. In addition, when it is determined that the detailed description of the known function or configuration related to the present invention may unnecessarily obscure the subject matter of the present invention, the detailed description thereof will be omitted.

Speech recognition technology is an application of pattern matching. In other words, the feature parameters of the words or phonemes to be recognized are stored in advance; when a voice is input, its features are extracted, their similarity to the features of the prestored words or phonemes is measured, and the best match is output as the recognition result. Since the voice changes over time, its characteristics are stable only within a short frame. Therefore, the features of the speech are analyzed frame by frame to generate a feature vector for each frame, and the speech is represented as a sequence of feature vectors.
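By way of illustration only, the frame-by-frame analysis described above may be sketched as follows. The two features used here (log energy and zero-crossing rate), the frame length, and the hop size are assumptions chosen for brevity. Practical recognizers typically use richer features, and the embodiments do not specify any of these values.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len samples, hop apart."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_features(frame):
    """Toy feature vector for one frame: (log energy, zero-crossing rate).

    The frame must contain at least two samples.
    """
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))


def feature_sequence(samples, frame_len=200, hop=80):
    """Represent a signal as a sequence of per-frame feature vectors."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]


# 0.1 s of a 440 Hz tone sampled at 8 kHz, as a synthetic stand-in for speech.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
features = feature_sequence(tone)
print(len(features), features[0])
```

With 800 samples, a 200-sample frame, and an 80-sample hop, the signal yields eight overlapping frames, each summarized by one feature vector.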

There are two main methods of speech recognition. The first treats speech as a kind of pattern and recognizes it by measuring the similarity between a registered pattern and the input pattern. The second models the speech production process, assigns a unique model to each target word or phoneme, and recognizes the input voice by determining which model is most likely to have generated it. In addition, there are methods using neural networks, hybrids of various methods, and the like. Apart from these signal processing aspects, a language model containing knowledge of the language system may also be applied to the speech recognition process.
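By way of illustration only, the first method (similarity between a registered pattern and an input pattern) may be sketched with dynamic time warping (DTW). DTW is one well-known technique for comparing feature sequences of different lengths; the description does not name it, so its use here is an assumption of this sketch, as are the function names and the toy one-dimensional templates.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature-vector sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeated
                cost[i][j - 1],      # template frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(templates, observed):
    """Return the label of the registered template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(templates[label], observed))


# Toy one-dimensional "feature vectors"; real features would have more dimensions.
templates = {
    "one": [(0.0,), (1.0,), (2.0,)],
    "two": [(2.0,), (1.0,), (0.0,)],
}
observed = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # a slower rendition of "one"
print(recognize(templates, observed))
```

The warping step lets a slowly spoken input still match its template, which is why time alignment is commonly paired with pattern-similarity recognition.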

EMBODIMENT OF THE INVENTION

Hereinafter, an electronic device according to the present invention will be described in detail with reference to the drawings. The suffixes "module" and "unit" for components used in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles.

The electronic device described herein may include a mobile phone, a smart phone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation terminal, a digital television (DTV), and an Internet Protocol television (IPTV).

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention. FIG. 2 is a diagram illustrating a case where the electronic device 100 according to an embodiment of the present invention is a mobile terminal.

The electronic device 100 may include a wireless communication unit 110, an A / V input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory unit 160, an interface unit 170, a controller 180, a voice detector 182, a voice recognizer 183, a voice synthesizer 184, and a power supply unit 190.

Since the components shown in FIG. 1 are not essential, an electronic device having more or fewer components may be implemented.

Hereinafter, the components will be described in order.

The wireless communication unit 110 may include one or more modules that enable wireless communication between the electronic device 100 and a wireless communication system, or between the electronic device 100 and a network in which the electronic device 100 is located. For example, the wireless communication unit 110 may include a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short range communication module 114, a location information module 115, and the like.

The broadcast receiving module 111 receives a broadcast signal and / or broadcast related information from an external broadcast management server through a broadcast channel.

The broadcast channel may include a satellite channel and a terrestrial channel. The broadcast management server may mean a server that generates and transmits a broadcast signal and / or broadcast related information, or a server that receives a previously generated broadcast signal and / or broadcast related information and transmits the same to an electronic device. The broadcast signal may include not only a TV broadcast signal, a radio broadcast signal, and a data broadcast signal, but also a broadcast signal having a data broadcast signal combined with a TV broadcast signal or a radio broadcast signal.

The broadcast related information may mean information related to a broadcast channel, a broadcast program, or a broadcast service provider. The broadcast related information may also be provided through a mobile communication network. In this case, it may be received by the mobile communication module 112.

The broadcast related information may exist in various forms. For example, it may exist in the form of Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB) or Electronic Service Guide (ESG) of Digital Video Broadcast-Handheld (DVB-H).

The broadcast receiving module 111 may receive broadcast signals using various broadcasting systems and, in particular, may receive digital broadcast signals using digital broadcasting systems such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), media forward link only (MediaFLO), digital video broadcast-handheld (DVB-H), and integrated services digital broadcast-terrestrial (ISDB-T). Of course, the broadcast receiving module 111 may be configured to be suitable not only for the above-described digital broadcasting systems but also for other broadcasting systems that provide broadcast signals.

The broadcast signal and / or broadcast related information received through the broadcast receiving module 111 may be stored in the memory unit 160.

The mobile communication module 112 transmits and receives wireless signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network. The wireless signals may include various types of data according to the transmission and reception of a voice call signal, a video call signal, or a text / multimedia message.

The wireless internet module 113 refers to a module for wireless internet access, and may be internal or external to the electronic device 100. Wireless Internet technologies may include Wireless LAN (Wi-Fi), Wireless Broadband (Wibro), Worldwide Interoperability for Microwave Access (Wimax), High Speed Downlink Packet Access (HSDPA), and the like.

The short range communication module 114 refers to a module for short range communication. As a short range communication technology, Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, and the like may be used.

The location information module 115 is a module for checking or obtaining the location of the electronic device. The location information module 115 may obtain location information using a global navigation satellite system (GNSS). Here, GNSS is a term for radio navigation satellite systems in which satellites orbiting the earth send reference signals from which certain types of radio navigation receivers can determine their location on or near the earth's surface. GNSS includes the Global Positioning System (GPS) operated by the United States, Galileo operated by Europe, the Global Orbiting Navigational Satellite System (GLONASS) operated by Russia, COMPASS operated by China, and the Quasi-Zenith Satellite System (QZSS) operated by Japan.

As a representative example of GNSS, the location information module 115 may be a Global Positioning System (GPS) module. The GPS module calculates the distances between one point (object) and three or more satellites, together with the time at which the distance information was measured, and then applies triangulation to the calculated distances to obtain three-dimensional position information for that point in terms of latitude, longitude, and altitude. A method of calculating position and time information using three satellites and correcting the error of the calculated position and time information using one additional satellite is also used. The GPS module continuously calculates the current position in real time and uses it to calculate speed information.
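By way of illustration only, the idea of locating a point from its distances to known reference positions may be sketched in two dimensions as follows. This is a deliberately simplified planar model with exact distances; an actual GPS module solves for a three-dimensional position together with a receiver clock error. The function name and the sample coordinates are assumptions of this sketch.

```python
import math


def trilaterate_2d(p1, d1, p2, d2, p3, d3):
    """Locate a point in the plane from its distances to three known points.

    Subtracting the first circle equation from the other two yields two
    linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Hypothetical reference points and a true position used to derive distances.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors]
estimate = trilaterate_2d(anchors[0], distances[0], anchors[1], distances[1],
                          anchors[2], distances[2])
print(estimate)
```

Real measurements are noisy, so practical receivers use more satellites than the minimum and solve the resulting equations in a least-squares sense.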

Referring to FIG. 1, the A / V input unit 120 is for inputting an audio signal or a video signal, and may include a camera 121 and an audio receiver 122. The camera 121 processes image frames such as still images or moving images obtained by the image sensor in the video call mode or the photographing mode. The processed image frame may be displayed on the display unit 151.

The image frame processed by the camera 121 may be stored in the memory unit 160 or transmitted to the outside through the wireless communication unit 110. Two or more cameras 121 may be provided according to the configuration aspect of the electronic device 100.

The audio receiver 122 receives an external sound signal by a microphone in a call mode, a recording mode, a voice recognition mode, etc., and processes the external sound signal into electrical voice data. The processed voice data may be converted into a form transmittable to the mobile communication base station through the mobile communication module 112 and output in the call mode. The audio receiver 122 may implement various noise removing algorithms for removing noise generated in the process of receiving an external sound signal.

The user input unit 130 generates input data for the user to control the operation of the electronic device 100. The user input unit 130 may include a key pad, a dome switch, a touch pad (static pressure / capacitance), a jog wheel, a jog switch, and the like.

The sensing unit 140 detects the current state of the electronic device 100 or its external environment, such as the open / closed state of the electronic device 100, the location of the electronic device 100, the presence or absence of user contact, the orientation of the electronic device 100, and the acceleration / deceleration of the electronic device 100, and generates a sensing signal for controlling the operation of the electronic device 100. For example, when the electronic device 100 is in the form of a slide phone, the sensing unit 140 may sense whether the slide phone is opened or closed. In addition, it may be responsible for sensing functions related to whether the power supply unit 190 is supplying power, whether the interface unit 170 is coupled to an external device, and the like. The sensing unit 140 may include a proximity sensor 142.

The output unit 150 generates output related to sight, hearing, or touch, and may include a display unit 151, an audio output module 152, an alarm unit 153, and a haptic module 154.

The display unit 151 displays and outputs information processed by the electronic device 100. For example, when the electronic device 100 is in a call mode, the electronic device 100 displays a user interface (UI) or a graphic user interface (GUI) related to the call. When the electronic device 100 is in a video call mode or a photographing mode, the electronic device 100 displays a photographed and / or received image, a UI, or a GUI.

The display unit 151 may be a liquid crystal display (LCD), a thin film transistor-liquid crystal display (TFT LCD), an organic light-emitting diode (OLED) display, a flexible display, or a three-dimensional (3D) display.

Some of these displays can be configured to be transparent or light transmissive so that they can be seen from the outside. This may be referred to as a transparent display. A representative example of the transparent display is a transparent LCD. The rear structure of the display unit 151 may also be configured as a light transmissive structure. With this structure, the user can see an object located behind the body of the electronic device 100 through the area occupied by the display unit 151 of the body of the electronic device 100.

Two or more display units 151 may exist according to the implementation form of the electronic device 100. For example, a plurality of display units may be spaced apart or integrally disposed on one surface of the electronic device 100, or may be disposed on different surfaces.

When the display unit 151 and a sensor for detecting a touch operation (hereinafter referred to as a "touch sensor") form a mutual layer structure (hereinafter referred to as a "touch screen"), the display unit 151 may be used as an input device in addition to an output device. The touch sensor may take the form of, for example, a touch film, a touch sheet, or a touch pad.

The touch sensor may be configured to convert a change in pressure applied to a specific portion of the display unit 151 or capacitance generated in a specific portion of the display unit 151 into an electrical input signal. The touch sensor may be configured to detect not only the position and area of the touch but also the pressure at the touch.

If there is a touch input to the touch sensor, the corresponding signal (s) is sent to the touch controller. The touch controller processes the signal (s) and then transmits the corresponding data to the controller 180. As a result, the controller 180 can know which area of the display unit 151 is touched.

The proximity sensor 142 may be disposed in an inner region of the electronic device 100 covered by the touch screen, or near the touch screen. The proximity sensor 142 is a sensor that detects, without mechanical contact, the presence or absence of an object approaching a predetermined detection surface or an object present nearby, using electromagnetic force or infrared rays. The proximity sensor 142 has a longer life and higher utility than a contact sensor.

Examples of the proximity sensor 142 include a transmission photoelectric sensor, a direct reflection photoelectric sensor, a mirror reflection photoelectric sensor, a high frequency oscillation proximity sensor, a capacitive proximity sensor, a magnetic proximity sensor, and an infrared proximity sensor.

When the touch screen is capacitive, the touch screen is configured to detect the proximity of the pointer by the change of the electric field according to the proximity of the pointer. In this case, the touch screen (touch sensor) may be classified as a proximity sensor.

Hereinafter, for convenience of explanation, the act of bringing the pointer close to the touch screen without contact, so that the pointer is recognized as being positioned on the touch screen, is referred to as a "proximity touch", and the act of actually bringing the pointer into contact with the touch screen is referred to as a "contact touch". The position of a proximity touch on the touch screen is the position at which the pointer vertically corresponds to the touch screen when the pointer makes the proximity touch.

The proximity sensor 142 detects a proximity touch and a proximity touch pattern (for example, a proximity touch distance, a proximity touch direction, a proximity touch speed, a proximity touch time, a proximity touch position, and a proximity touch movement state). Information corresponding to the sensed proximity touch operation and proximity touch pattern may be output on the touch screen.

The sound output module 152 may output audio data received from the wireless communication unit 110 or stored in the memory unit 160 in a call signal reception, a call mode or a recording mode, a voice recognition mode, a broadcast reception mode, and the like. The sound output module 152 outputs a sound signal related to a function (for example, a call signal reception sound or a message reception sound) performed in the electronic device 100. The sound output module 152 may include a receiver, a speaker, a buzzer, and the like. In addition, the sound output module 152 may output sound through the earphone jack 116. The user can hear the sound output by connecting the earphone to the earphone jack 116.

The alarm unit 153 outputs a signal for notifying occurrence of an event of the electronic device 100. Examples of events generated in the electronic device 100 include call signal reception, message reception, key signal input, and touch input. The alarm unit 153 may output a signal for notifying occurrence of an event in a form other than a video signal or an audio signal, for example, vibration. The video signal or the audio signal may also be output through the display unit 151 or the sound output module 152.

The haptic module 154 generates various haptic effects that a user can feel. Vibration is a representative example of the haptic effect generated by the haptic module 154. The intensity and pattern of vibration generated by the haptic module 154 can be controlled. For example, different vibrations may be synthesized and output or may be sequentially output.

In addition to vibration, the haptic module 154 may generate various other tactile effects, such as the effect of a pin arrangement moving vertically against the contacted skin surface, the effect of an air jet force or suction force through a jet or suction opening, the effect of brushing against the skin surface, the effect of contact with an electrode, the effect of electrostatic force, and the effect of reproducing a sense of cold or warmth using an element capable of absorbing or generating heat.

The haptic module 154 may not only deliver the haptic effect through direct contact, but also implement the haptic effect through the muscle sense of the user's finger or arm. Two or more haptic modules 154 may be provided according to a configuration aspect of the electronic device 100.

The memory unit 160 may store a program for the operation of the controller 180 and may temporarily store input / output data (for example, a phone book, a message, a still image, a video, etc.). The memory unit 160 may store data regarding vibration and sound of various patterns output when a touch input on the touch screen is performed.

The memory unit 160 may include an acoustic model and a recognition dictionary required for speech recognition. In addition, the memory unit 160 may include a language model.

The recognition dictionary may include at least one of a word, a keyword, and an expression formed in a specific language.

The memory unit 160 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, SD or XD memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk. The electronic device 100 may operate in association with a web storage that performs the storage function of the memory unit 160 on the Internet.

The interface unit 170 serves as a path to all external devices connected to the electronic device 100. The interface unit 170 receives data or power from an external device and delivers it to each component in the electronic device 100, or transmits data from inside the electronic device 100 to an external device. For example, the interface unit 170 may include wired / wireless headset ports, external charger ports, wired / wireless data ports, memory card ports, ports for connecting devices with identification modules, audio input / output (I / O) ports, video input / output (I / O) ports, earphone ports, and the like.

The identification module is a chip that stores various types of information for authenticating the usage authority of the electronic device 100, and may include a User Identity Module (UIM), a Subscriber Identity Module (SIM), a Universal Subscriber Identity Module (USIM), and the like. A device equipped with an identification module (hereinafter referred to as an "identification device") may be manufactured in the form of a smart card. Therefore, the identification device may be connected to the electronic device 100 through a port.

When the electronic device 100 is connected to an external cradle, the interface unit 170 may serve as a passage through which power from the cradle is supplied to the electronic device 100, or as a passage through which various command signals input by the user at the cradle are transmitted to the electronic device 100. The various command signals or the power input from the cradle may operate as signals for recognizing that the electronic device 100 is correctly mounted on the cradle.

The voice detector 182 detects a voice signal included in the audio signal input through the audio receiver 122. The voice detector 182 may determine whether voice is present in the audio signal. The voice detector 182 is generally referred to as a voice activity detector (VAD).

The voice detector 182 may detect the voice while buffering the audio signal for a predetermined period or a predetermined time.
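By way of illustration only, voice detection over a short buffer of recent frames may be sketched as follows. The sketch assumes per-frame energies have already been computed and uses a simple energy threshold with a majority-style vote; the function name, the threshold, and the window sizes are hypothetical, and the voice detector 182 is not limited to this approach.

```python
from collections import deque


def detect_voice(frame_energies, threshold, window=5, min_active=3):
    """Return one decision per frame: True while voice is judged present.

    The last `window` frames are buffered, and voice is reported when at
    least `min_active` of them exceed `threshold`. Buffering smooths out
    isolated spikes and short dips in energy.
    """
    recent = deque(maxlen=window)
    decisions = []
    for energy in frame_energies:
        recent.append(energy > threshold)
        decisions.append(sum(recent) >= min_active)
    return decisions


# Toy energies: silence, a burst of speech, then silence again.
energies = [0.01, 0.02, 0.9, 0.8, 0.7, 0.9, 0.02, 0.01, 0.01, 0.01]
decisions = detect_voice(energies, threshold=0.1)
print(decisions)
```

Because of the buffering, the decision lags the true onset and offset of speech by a couple of frames, which is the usual trade-off for robustness against brief noise.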

The voice recognition unit 183 performs voice recognition on the audio signal or the voice signal input through the audio receiver 122, and obtains at least one recognition candidate corresponding to the recognized voice.

For example, the voice recognition unit 183 may recognize the input voice signal by detecting a voice section in the input voice signal, performing acoustic analysis on it, and recognizing it in recognition units. The voice recognition unit 183 may obtain the at least one recognition candidate corresponding to the speech recognition result by referring to the recognition dictionary and a translation database stored in the memory unit 160.

The voice recognition unit 183 may include the voice detection unit 182.

The speech synthesizer 184 converts text into speech using a text-to-speech (TTS) engine. TTS technology converts character information or symbols into human speech so that they can be heard. It generates continuous speech by building a pronunciation database for all phonemes of a language and concatenating them. Natural-sounding speech is synthesized by adjusting the loudness, duration, and pitch of the voice, and natural language processing technology may be included for this purpose. TTS technology is readily found in electronic communication fields such as CTI, PCs, PDAs, and mobile phones, and in consumer electronics such as recorders, toys, and game machines; it is also widely used in factories to improve productivity and in home automation systems for more convenient everyday life. Since TTS technology is well known, a detailed description thereof is omitted.

Meanwhile, the voice detector 182 and the voice recognizer 183 are not necessarily provided in the electronic device 100. For example, at least one of the voice detector 182 and the voice recognizer 183 may exist outside the electronic device 100. FIG. 3 is a conceptual diagram in which the electronic device 100 and the external server 300 are connected to the Internet 400.

The external server 300 may include the voice recognition unit 183. In this case, the electronic device 100 may include the voice detector 182 and may not include the voice recognizer 183. The electronic device 100 may detect a voice section in the audio signal received from the audio receiver 122 using the voice detector 182, and transmit the detected voice section to the external server 300.

The external server 300 may recognize the voice section transmitted from the electronic device 100 through the voice recognition unit 183, and transmit the recognition result to the electronic device 100.

The external server 300 may include the voice detector 182 and the voice recognizer 183. The electronic device 100 may transmit the audio signal received from the audio receiver 122 to the external server 300.

The external server 300 may detect the voice in the audio signal received from the electronic device 100 using the voice detector 182, recognize the detected voice using the voice recognizer 183, and transmit the recognition result to the electronic device 100.
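By way of illustration only, the two divisions of labor between the electronic device 100 and the external server 300 may be sketched as follows. The class names, the `handle` method, and the toy detection and recognition functions are hypothetical, and network transport is replaced by a direct method call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Audio = Sequence[float]


@dataclass
class ExternalServer:
    recognize: Callable[[Audio], List[str]]           # stands in for recognizer 183
    detect: Optional[Callable[[Audio], Audio]] = None  # stands in for detector 182

    def handle(self, audio: Audio, already_segmented: bool) -> List[str]:
        """Run detection on the server only if the device has not done so."""
        if already_segmented:
            segment = audio
        else:
            if self.detect is None:
                raise ValueError("server has no voice detector")
            segment = self.detect(audio)
        return self.recognize(segment)


@dataclass
class Device:
    server: ExternalServer
    detect: Optional[Callable[[Audio], Audio]] = None  # on-device detector, if any

    def recognize(self, audio: Audio) -> List[str]:
        if self.detect is not None:
            # Configuration A: detect locally, send only the voice section.
            return self.server.handle(self.detect(audio), already_segmented=True)
        # Configuration B: send the raw audio; the server detects and recognizes.
        return self.server.handle(audio, already_segmented=False)


def toy_detect(audio):
    """Toy detector: keep only samples whose magnitude exceeds 0.1."""
    return [x for x in audio if abs(x) > 0.1]


def toy_recognize(segment):
    """Toy recognizer: report how many voiced samples were received."""
    return [f"{len(segment)} voiced samples"]


audio = [0.0, 0.5, -0.4, 0.0, 0.3]
result_a = Device(ExternalServer(toy_recognize), detect=toy_detect).recognize(audio)
result_b = Device(ExternalServer(toy_recognize, detect=toy_detect)).recognize(audio)
print(result_a, result_b)
```

Detecting on the device reduces the amount of audio sent over the network, while detecting on the server keeps the device simpler; both configurations return the same recognition result in this sketch.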

The controller 180 typically controls the overall operation of the electronic device 100. For example, it performs related control and processing for voice call, data communication, video call, voice recognition and the like. The controller 180 may include a multimedia module 181 for playing multimedia. The multimedia module 181 may be implemented in the controller 180 or may be implemented separately from the controller 180.

The controller 180 may perform a pattern recognition process for recognizing a writing input or a drawing input performed on the touch screen as text and an image, respectively.

The power supply unit 190 receives external or internal power under the control of the controller 180 and supplies the power required for operation of each component.

Various embodiments described herein may be implemented in a recording medium readable by a computer or similar device using, for example, software, hardware or a combination thereof.

According to a hardware implementation, the embodiments described herein may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and electrical units for performing the functions. In some cases, such embodiments may be implemented by the controller 180.

In a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that allow at least one function or operation to be performed. The software code may be implemented by a software application written in a suitable programming language. In addition, the software code may be stored in the memory unit 160 and executed by the controller 180.

Hereinafter, embodiments of the present invention will be described. Embodiments of the present invention may be implemented in the electronic device 100 described with reference to FIGS. 1 to 3. Hereinafter, for convenience of description, the embodiments of the present invention will be described assuming that the electronic device 100 is a mobile terminal. As mentioned above, it is clear that the technical idea disclosed in the present document can be applied to various electronic devices.

FIG. 4 is a flowchart of a voice recognition method of an electronic device according to a first embodiment of the present invention. FIGS. 5 to 8 are diagrams for describing a voice recognition method of an electronic device according to the first embodiment of the present invention. Hereinafter, a voice recognition method of an electronic device and an operation of the electronic device 100 for implementing the same will be described in detail with reference to the accompanying drawings.

The controller 180 receives a voice through the audio receiver 122 [S100]. The voice may include a plurality of numbers.

FIG. 5A illustrates an example of a screen corresponding to a standby mode for receiving a voice of a user. The controller 180 can receive a user's voice in the screen state shown in FIG. 5A (see FIG. 5B).

The controller 180 controls the voice recognition unit 183 to perform voice recognition on the received voice [S110].

In the example of FIG. 5, the controller 180 may recognize voice by separating “Call” and the remaining numeric part from “Call 225 6142” spoken by a user. "Call" is a command corresponding to a call function.

When the controller 180 recognizes "Call", the controller 180 may perform voice recognition on the numbers received after "Call" and place a call to the party corresponding to the recognized number.
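The separation of the command from the numeric part can be illustrated on an already-recognized transcript. The following is only a minimal sketch: the function name `split_command_and_digits` and the `COMMANDS` table are hypothetical, and the sketch works on text rather than on the audio signal that the controller 180 actually processes.

```python
import re

# Hypothetical command table; only "Call" appears in the description.
COMMANDS = {"call": "CALL"}


def split_command_and_digits(transcript):
    """Split a transcript such as 'Call 225 6142' into (command, digit string)."""
    words = transcript.strip().split()
    if not words:
        return None, ""
    # The first word is treated as a command only if it is in the table.
    command = COMMANDS.get(words[0].lower())
    rest = words[1:] if command else words
    # Keep digits only; spaces between digit groups are dropped.
    digits = re.sub(r"\D", "", "".join(rest))
    return command, digits


print(split_command_and_digits("Call 225 6142"))
```

A transcript without a known command yields `None` for the command, so the caller can decide whether to treat the digits as a bare number.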

The controller 180 displays a plurality of recognition candidates on the display unit 151 as the result of the voice recognition, highlighting the portions that differ between them [S120].

FIG. 5C illustrates an example in which the plurality of recognition candidates are displayed on the display unit 151.

In the example of FIG. 5, as shown in FIG. 5C, the controller 180 may display on the display unit 151 a plurality of recognition candidates 10, each corresponding to one of a plurality of different numeric strings, as the voice recognition result for the plurality of numbers 225 6142.

In this case, the controller 180 highlights the numbers whose voice recognition results differ between the different numeric strings 10.

The differing numbers can be emphasized in a variety of ways. For example, the controller 180 may highlight the numbers whose voice recognition results differ, or may emphasize them by using at least one of a color, a font, and a size of those numbers.

In addition, the controller 180 displays the plurality of recognition candidates 10 on the display unit 151 in descending order of recognition score.

For example, referring to FIG. 5C, the controller 180 displays the first recognition candidate 10a having the highest recognition score at the top. The second recognition candidate 10b and the third recognition candidate 10c are then displayed in descending order of recognition score.

The controller 180 may compare the numbers at the same positions of the different numeric strings 10 and, if the numbers at a given position differ from each other, highlight those numbers.

For example, referring to FIG. 6, the controller 180 compares the numbers at the same positions (e.g., the first to seventh columns) of the numeric strings constituting the plurality of recognition candidates 10 to determine whether they are identical. In FIG. 6, the first to third, fifth, and sixth columns are identical, whereas the fourth and seventh columns contain different numbers.
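The column-by-column comparison can be sketched as follows. This is an illustration under assumptions, not the implementation of the controller 180: the candidate strings are hypothetical values chosen so that the fourth and seventh columns differ as in FIG. 6, and square brackets stand in for the color, font, or size emphasis.

```python
def differing_positions(candidates):
    """Return the 0-based columns at which equal-length digit strings disagree."""
    if not candidates:
        return []
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("candidates must have equal length for column comparison")
    # A column differs if more than one distinct digit occurs in it.
    return [i for i in range(length) if len({c[i] for c in candidates}) > 1]


def mark(candidate, positions):
    """Wrap the digits at the given columns in brackets as a stand-in for highlighting."""
    return "".join(f"[{ch}]" if i in positions else ch for i, ch in enumerate(candidate))


# Hypothetical recognition candidates, already sorted by recognition score.
candidates = ["2256142", "2259142", "2256148"]
columns = differing_positions(candidates)
for c in candidates:
    print(mark(c, columns))
```

Columns are 0-based in the sketch, so the fourth and seventh columns of the description correspond to indices 3 and 6.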

The controller 180 receives a selection signal for a specific recognition candidate among the plurality of recognition candidates 10 [S130], and places a call with the selected recognition candidate as the called party [S140].

For example, referring to FIG. 7, when a user selects the first recognition candidate 10a, the controller 180 places a call using the numeric string corresponding to the first recognition candidate 10a as the called telephone number.

Meanwhile, the controller 180 may receive a call origination command from the user in another way and place the call.

For example, referring to FIG. 8, in providing the plurality of recognition candidates 10, the controller 180 may display an ordinal number as an identifier of each recognition candidate. In FIG. 8A, the first recognition candidate 10a corresponds to "1", the second recognition candidate 10b corresponds to "2", and the third recognition candidate 10c corresponds to "3".

The user may speak the ordinal number corresponding to the desired recognition candidate from among the plurality of recognition candidates 10 (see FIG. 8B). The controller 180 recognizes the user's voice, selects the recognition candidate indicated by the corresponding ordinal number, and places a call with the selected recognition candidate as the called party (see FIG. 8C).
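Selection by spoken ordinal amounts to a lookup from the recognized word to a position in the candidate list. The sketch below assumes the ordinal has already been recognized as text; the `ORDINALS` table and the candidate values are hypothetical.

```python
# Hypothetical mapping from recognized words to 0-based candidate indices.
ORDINALS = {
    "1": 0, "one": 0, "first": 0,
    "2": 1, "two": 1, "second": 1,
    "3": 2, "three": 2, "third": 2,
}


def select_by_ordinal(spoken, candidates):
    """Return the candidate named by the spoken ordinal, or None if it is unknown."""
    index = ORDINALS.get(spoken.strip().lower())
    if index is None or index >= len(candidates):
        return None
    return candidates[index]


candidates = ["2256142", "2259142", "2256148"]
print(select_by_ordinal("2", candidates))
```

Returning `None` for an unknown or out-of-range ordinal leaves it to the caller to re-prompt the user rather than dialing an unintended number.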

According to the first embodiment of the present invention described above, the user can easily recognize the result of the contents spoken by the user, and the accessibility to the desired candidate can be improved. Thus, the user can easily access the desired result and select it quickly.

FIG. 9 is a flowchart of a voice recognition method of an electronic device according to a second embodiment of the present invention. FIGS. 10 to 15 are diagrams for describing a voice recognition method of an electronic device according to the second embodiment of the present invention. Hereinafter, a voice recognition method of an electronic device and an operation of the electronic device 100 to implement the same will be described in detail with reference to the accompanying drawings.

The controller 180 receives a voice through the audio receiver 122 [S200].

The controller 180 performs voice recognition on the received voice to determine whether the received voice includes a first predefined keyword representing an international call [S210].

The first keyword may vary. In general, "+" is a symbol for international calls.

Examples of the first keyword include "+", "plus", "international", and the like. That is, the controller 180 determines whether the received voice includes the first keyword indicating an international call as a result of performing voice recognition on the received voice.

If the received voice includes the first keyword, the controller 180 performs voice recognition on the assumption that a second keyword received after the first keyword is a country code number [S220]. The controller 180 then performs voice recognition on the assumption that a plurality of syllables received after the second keyword form a phone number [S230].

For example, referring to FIGS. 10 and 11, the controller 180 receives a voice (see FIG. 10 (b)) on a voice recognition screen for performing a call origination function (see FIG. 10 (a)).

The content of the voice spoken by the user shown in FIG. 10 (b) has the structure shown in FIG. 11. The structure shown in Fig. 11 starts with a keyword 30 corresponding to a call origination function, and includes the first keyword 31 representing the international call following the keyword 30.

Upon recognizing the first keyword 31, the controller 180 performs voice recognition on the assumption that a second keyword 32 received after the first keyword 31 is a country code number.

The memory 160 may store a database in which a country and a country code number are matched. For example, the country "Korea" matches the country code number "82" and the country "USA" matches the country code number "1".

The controller 180 may perform voice recognition with respect to the second keyword 32 that is received after the first keyword 31 with reference to the database. That is, the controller 180 may perform voice recognition on the second keyword 32 by comparing the second keyword 32 with a country name or country code number included in the database.

Referring to FIGS. 10 and 11, the controller 180 compares the second keyword 32 with the country code numbers included in the database. The "820" 32 is not present in the database. Because "0" cannot follow the country code number and "82", corresponding to "Korea", exists in the database, "820" may be corrected to "82".
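One way to picture the correction of "820" to "82" is a lookup that falls back to the longest known prefix. The description does not state how the correction is carried out, so the prefix strategy, the function name `resolve_country_code`, and the two-entry table (taken from the examples given for the database) are assumptions made for illustration.

```python
# Entries taken from the examples in the description; a real database would be larger.
COUNTRY_CODES = {"korea": "82", "usa": "1"}


def resolve_country_code(second_keyword):
    """Map a recognized second keyword (country name or digits) to a known country code."""
    token = second_keyword.strip().lower()
    # Case 1: the keyword is a country name.
    if token in COUNTRY_CODES:
        return COUNTRY_CODES[token]
    # Case 2: the keyword is a digit string; try the longest known prefix first.
    if token.isdigit():
        known = set(COUNTRY_CODES.values())
        for end in range(len(token), 0, -1):
            if token[:end] in known:
                return token[:end]
    return None


print(resolve_country_code("820"))
print(resolve_country_code("Korea"))
```

In this sketch the trailing "0" of "820" is simply discarded, which matches the observation that "0" cannot follow the country code number.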

The controller 180 may recognize a plurality of syllables 33 received after the second keyword 32 as a phone number.

Meanwhile, as shown in FIG. 12, the second keyword 32 received following the first keyword 31 may be a country name in addition to a country code number. As in the voice recognition process of the country code number, the controller 180 can perform voice recognition with reference to the database on the country name received after the first keyword 31.

According to the second embodiment of the present invention, the recognition rate of a voice composed of numbers can be greatly improved.

FIG. 13 is a view for explaining the structure of a number recognizer according to the prior art, and FIG. 14 is a view for explaining the structure of a number recognizer according to the present invention.

Referring to FIG. 13, the structure of the number recognizer according to the prior art will be described. The number recognizer according to the prior art is composed of a search space in which any digit (0 to 9) may occur at each position. The path to each leaf node is selected by an acoustic matching process based on the similarity between the speech and the acoustic models.

The structure of the number recognizer according to the prior art requires a large amount of computation because every possible case must be evaluated, and the recognition rate decreases exponentially with the length of the numeric string. For example, the number of cases for recognizing a four-digit numeric string is 10,000 (10^4).
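Both effects can be written compactly. The expressions below assume, as a simplification, that each digit is recognized independently with the same per-digit accuracy p; the symbols n, p, N, and P are introduced here only for illustration.

```latex
N(n) = 10^{n}, \qquad P(n) = p^{n}
% Four-digit string: N(4) = 10^{4} = 10\,000 possible cases.
% Ten-digit string with p = 0.9: P(10) = 0.9^{10} \approx 0.349.
```

The second example is the case mentioned in the background section: a 90% per-digit rate leaves roughly a 35% chance of recognizing all ten digits correctly.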

The structure of the number recognizer according to the present invention will be described with reference to FIG. 14. FIG. 14 illustrates a case where speech recognition is performed on a four-digit numeric string, as in the case of FIG. 13.

Referring to FIG. 14, the numbers permitted at each position in the search space may be designated according to the country code. Accordingly, unlike the case of FIG. 13, both the size and the complexity of the search space are greatly reduced. The amount of computation therefore falls sharply, and the number of possible recognition results is greatly reduced (to about 300 or fewer). As a result, the number recognizer according to the present invention can greatly improve the recognition rate for speech composed of a numeric string.
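The effect of restricting the digits allowed at each position can be shown by counting paths. The sketch below is a simplified model that treats positions as independent, which a real tree-structured search space need not do, and the constrained digit sets are hypothetical; they are not the constraints of FIG. 14 and do not reproduce its figure of about 300.

```python
from math import prod


def search_space_size(allowed_per_position):
    """Count digit strings when each position has its own set of allowed digits."""
    return prod(len(allowed) for allowed in allowed_per_position)


ALL_DIGITS = set("0123456789")

# Prior-art style: any digit at each of four positions.
unconstrained = [ALL_DIGITS] * 4

# Hypothetical constraint: the first two positions are fixed by a country code,
# and the third position is limited to three digits.
constrained = [{"8"}, {"2"}, {"1", "2", "3"}, ALL_DIGITS]

print(search_space_size(unconstrained))
print(search_space_size(constrained))
```

Every position whose allowed set shrinks divides the number of paths that acoustic matching has to score.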

The controller 180 may recognize, as a predetermined identification number, a third keyword starting first among the plurality of syllables received after the second keyword. For example, referring to FIG. 15, the controller 180 may recognize the third keyword 34 received after the second keyword 32 as a mobile phone identification number. The method and process of recognizing the mobile phone identification number is the same as or similar to the method and process of recognizing the second keyword 32.

FIG. 16 is a flowchart of a voice recognition method of an electronic device according to a third embodiment of the present invention. FIGS. 17 to 19 are diagrams for describing a voice recognition method of an electronic device according to the third embodiment of the present invention. Hereinafter, a voice recognition method of an electronic device and an operation of the electronic device 100 for implementing the same will be described in detail with reference to the accompanying drawings.

The controller 180 receives a voice through the audio receiver 122 [S300].

The controller 180 determines whether a pause, which is a silent section, is detected in the received voice [S310].

The controller 180 may detect the pause by using the voice detector 182, or may determine that the pause exists when no voice is received from the audio receiver 122 for a predetermined time.

When the pause is detected in the received voice as a result of the determination in step S310, the controller 180 performs voice recognition on the numbers received prior to the detected pause [S320], and outputs the result of the voice recognition performed in step S320 [S330].

The controller 180 may return to step S300 to repeat steps S300 to S330.

That is, according to the third embodiment of the present invention, whenever the pause is detected in the received voice, the controller 180 performs voice recognition on at least one number received prior to the detected pause. Each time the voice recognition is performed on the at least one number, a result of the voice recognition may be output.
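The loop of steps S300 to S330 can be sketched as a generator that closes a segment whenever enough consecutive silent frames are seen. This is a simplified model under assumptions: frames are represented by placeholder strings, `None` stands for a silent frame as a voice detector might report it, and `min_pause_frames` is a hypothetical stand-in for the predetermined time.

```python
def split_on_pauses(frames, min_pause_frames=3):
    """Yield lists of speech frames, closing a segment after a long enough silent run."""
    segment = []
    silent_run = 0
    for frame in frames:
        if frame is None:  # silent frame
            silent_run += 1
            if segment and silent_run >= min_pause_frames:
                yield segment  # pause detected: hand the segment to recognition
                segment = []
        else:
            silent_run = 0
            segment.append(frame)
    if segment:
        yield segment  # input ended while a segment was still open


# Placeholder frames: strings stand for speech frames, None for silence.
stream = ["0", "1", "0", None, None, None, "2", "3", None, "4", None, None, None, "5"]
for part in split_on_pauses(stream):
    print(part)  # each part would be recognized and displayed as soon as it closes
```

A single silent frame between "3" and "4" is shorter than the threshold, so those frames stay in one segment; only the longer silent runs count as pauses.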

An example in which the third embodiment of the present invention is actually implemented will be described with reference to FIGS. 17 and 18.

As shown in FIG. 17, the user utters each of the sections 40, 41, 42, 43 with a pause between the sections 40, 41, 42, 43. In the case of FIG. 17, considering only the numeric strings, a first pause exists between the first numeric string 41 and the second numeric string 42, a second pause exists between the second numeric string 42 and the third numeric string 43, and a third pause exists after the third numeric string 43 is received, so there are three pauses in all.

As shown in FIG. 18, whenever one of the pauses is detected, the controller 180 performs voice recognition on the numeric string received prior to the detected pause and outputs the result.

For example, when the first pause is detected, voice recognition is performed on the first numeric string 41 and the result is output to the display unit 151 (see FIG. 18A). When the second pause is detected, voice recognition is performed on the second numeric string 42 and the result is output to the display unit 151 (see FIG. 18B). When the third pause is detected, voice recognition is performed on the third numeric string 43 and the result is output to the display unit 151 (see FIG. 18C).

In FIGS. 18 (a) to 18 (c), as in the first embodiment of the present invention, the numbers that differ between the voice recognition results are highlighted.

FIG. 19 (a) shows a case in which the user speaks the numeric string in groups of three digits, and FIG. 19 (b) shows a case in which the user speaks the numeric string in groups of two digits. In FIGS. 19A and 19B, the display unit 151 may display, in the same manner as in FIG. 18, a voice recognition result for the numeric string received before each detected pause.

According to the third embodiment of the present invention, partial recognition results can be output successively by detecting pauses, which are silent sections, in the middle of the user's utterance. Users generally do not utter a telephone number from beginning to end without a break. In France, for example, it is customary to speak telephone numbers two digits at a time. Therefore, when pause detection is used, voice recognition can be performed for each section of the telephone number rather than after the user's utterance has completely ended, which increases the recognition rate.

A fourth embodiment of the present invention discloses a method for modifying a voice recognition result and a mobile terminal implementing the same. FIG. 20 is a diagram showing an example in which the fourth embodiment of the present invention is implemented.

FIG. 20 (a) shows a voice recognition result for the user's voice. When the user touches "3", which is the third digit of the voice recognition result, the controller 180 may display the recognition candidate group 50 according to the recognition score (see FIG. 20B).

As shown in FIG. 20B, the user may select a desired number from the recognition candidate group 50. Here, the method of selecting a number desired by the user may vary. For example, as shown in FIG. 20B, when the display unit 151 is a touch screen, the user may touch a desired number. Also, for example, the user can select a desired number by voice. Referring to FIG. 20B, when the user says "6", the controller 180 recognizes the user's voice and selects "6" from the recognition candidate group 50.

As illustrated in FIG. 20C, the controller 180 changes the number corresponding to the corresponding place among the voice recognition results into a number selected by the user and displays the number.

Meanwhile, in the state of FIG. 20A, the user may select a desired number by using the voice recognition function. For example, if the user says "5" in the state of FIG. 20 (a), the controller 180 may recognize the user's voice and output a screen as shown in FIG. 20 (b).

If the user says "4" in the state of Fig. 20 (a), since the voice recognition result of Fig. 20 (a) includes two "4" s, both "4" s are selected. That is, the user may select a plurality of digits from the voice recognition result and modify the plurality of digits at once.
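Selecting every occurrence of a spoken digit and replacing all of them at once can be sketched with two small helpers. The result string below is a hypothetical value containing two "4"s; it is not the number shown in FIG. 20 (a), and the helper names are illustrative.

```python
def positions_of(result, digit):
    """Return every 0-based position at which the spoken digit occurs."""
    return [i for i, ch in enumerate(result) if ch == digit]


def replace_at(result, positions, new_digit):
    """Replace the digits at the selected positions with the corrected digit."""
    chars = list(result)
    for i in positions:
        chars[i] = new_digit
    return "".join(chars)


result = "0143485"                     # hypothetical recognition result
selected = positions_of(result, "4")   # the user says "4": both occurrences are selected
print(selected)
print(replace_at(result, selected, "9"))
```

If only one of the occurrences is wrong, a touch selection of a single position, as in FIG. 20B, avoids changing the other.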

FIG. 20 illustrates a case in which the recognition candidate group 50 is provided in a predetermined number in the order of the recognition scores. However, embodiments of the present invention are not limited thereto.

A fifth embodiment of the present invention discloses another method for modifying a voice recognition result and a mobile terminal implementing the same. FIG. 21 is a diagram showing an example in which the fifth embodiment of the present invention is implemented. FIG. 21 shows another form of displaying voice recognition results according to the first embodiment of the present invention described with reference to FIG

FIG. 21A shows a case where only the recognition candidate 10a having the highest recognition score is output among the speech recognition results shown in FIG. 5C (see the first embodiment of the present invention).

As illustrated in FIG. 21A, the controller 180 may highlight and display different numbers among the plurality of recognition candidates.

When the user selects the first "6" on the screen of FIG. 21 (a), the controller 180 may display, as shown in FIG. 21 (b), a recognition candidate group 51 including the numbers that other recognition candidates with lower recognition scores contain at the position of the selected "6". The user may select a desired number from the recognition candidate group 51.

When the user selects the second "6" on the screen of FIG. 21 (a), the controller 180 may display, as shown in FIG. 21 (c), a recognition candidate group 52 including the numbers that other recognition candidates with lower recognition scores contain at the position of the selected "6". The user may select a desired number from the recognition candidate group 52.
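Building the candidate group for a selected position can be sketched by collecting the digits that the lower-scored candidates place there. The candidate list below is hypothetical, and dropping duplicates and the digit already displayed is a design choice of this sketch rather than something stated in the description.

```python
def alternatives_at(ranked_candidates, position):
    """Collect the digits that lower-ranked candidates propose at the given position."""
    top_digit = ranked_candidates[0][position]
    seen = {top_digit}  # the digit already shown is not offered again
    alternatives = []
    for candidate in ranked_candidates[1:]:
        digit = candidate[position]
        if digit not in seen:
            seen.add(digit)
            alternatives.append(digit)  # preserves recognition-score order
    return alternatives


# Hypothetical candidates sorted by descending recognition score.
ranked = ["2256142", "2259142", "2256148", "2258142"]
print(alternatives_at(ranked, 3))
print(alternatives_at(ranked, 6))
```

An empty list means that all candidates agree at that position, so there is nothing to offer for it.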

The situation shown in FIG. 21 is the same as the example shown in FIG. However, the way that the recognition candidate group is provided to the user is different.

The speech recognition method of the electronic device according to the present invention described above may be provided as a program for execution on a computer, recorded on a computer-readable recording medium.

The speech recognition method of the electronic device according to the present invention can be executed through software. When implemented in software, the constituent means of the present invention are code segments that perform the necessary work. The program or code segments may be stored on a processor readable medium or transmitted by a computer data signal coupled with a carrier on a transmission medium or network.

Computer-readable recording media include all types of recording devices that store data that can be read by a computer system. Examples of computer-readable recording devices include ROM, RAM, CD-ROM, DVD±ROM, DVD-RAM, magnetic tape, floppy disks, hard disks, optical data storage devices, and the like. The computer-readable recording medium can also be distributed over network-coupled computer devices so that the computer-readable code is stored and executed in a distributed fashion.

The present invention described above may be variously substituted, modified, and changed by those of ordinary skill in the art to which the present invention pertains without departing from the spirit of the present invention, and is therefore not limited by the embodiments described above and the accompanying drawings. In addition, the embodiments described in this document are not to be applied restrictively; all or part of the embodiments may be selectively combined so that various modifications may be made.

According to the present invention, by efficiently and effectively recognizing a voice including numbers, the recognition rate for a voice including numbers, such as a telephone number, can be greatly improved. It is also possible to provide an electronic device, and a voice recognition method using the same, that improve the user's accessibility to the voice recognition result for a voice including numbers and allow that result to be modified easily.

Claims

An electronic device comprising:

a display unit;

a voice receiver configured to receive a voice including a plurality of numbers; and

a controller configured to recognize the received voice and to display, on the display unit, a plurality of recognition candidates respectively corresponding to a plurality of different numeric strings as a voice recognition result for the plurality of numbers,

wherein the controller highlights the numbers whose voice recognition results differ between the different numeric strings.
The electronic device of claim 1, wherein the controller compares the numbers at the same positions of the different numeric strings and highlights the numbers whose voice recognition results differ.
The electronic device of claim 1, wherein the controller highlights the numbers whose voice recognition results differ, or emphasizes those numbers by using at least one of a color, a font, and a size of the numbers.
An electronic device comprising:

a voice receiver configured to receive a voice; and

a controller configured to perform voice recognition on the received voice,

wherein, if the received voice includes a predefined first keyword indicating an international call, the controller recognizes a second keyword, received through the voice receiver following the first keyword, on the assumption that the second keyword is a country code number.
The electronic device of claim 4, further comprising a memory storing a first database in which countries are matched with country codes,

wherein the controller performs voice recognition on the second keyword with reference to the first database.
The electronic device of claim 5, wherein the controller performs voice recognition on the second keyword by comparing the second keyword with a country name or a country code included in the first database.
The electronic device of claim 4, wherein the controller recognizes a plurality of syllables, received through the voice receiver following the second keyword, as a phone number.
The electronic device of claim 4, wherein the controller recognizes a third keyword, which comes first among the plurality of syllables, as a predetermined identification number.
The electronic device of claim 8, wherein the third keyword is a mobile communication identification number or an area code.
An electronic device comprising:

a voice receiver configured to receive a voice including a plurality of numbers; and

a controller configured to perform voice recognition on the received voice,

wherein, whenever a pause, which is a silent section, is detected in the received voice, the controller performs voice recognition on at least one number received prior to the detected pause.
The electronic device of claim 10, wherein the controller detects the pause using a voice detector (VAD), or determines that the pause exists if no voice is received from the voice receiver for a predetermined time.
The electronic device of claim 10, wherein the controller outputs a result of the voice recognition whenever the voice recognition is performed on the at least one number.
A voice recognition method of an electronic device, the method comprising:

receiving a voice including a plurality of numbers;

recognizing the received voice; and

displaying a plurality of recognition candidates respectively corresponding to a plurality of different numeric strings as a voice recognition result for the plurality of numbers, while highlighting the numbers whose voice recognition results differ between the different numeric strings.
The method of claim 13, wherein the displaying comprises comparing the numbers at the same positions of the different numeric strings and highlighting the numbers whose voice recognition results differ.
A voice recognition method of an electronic device, the method comprising:

receiving a voice; and

if the received voice includes a predefined first keyword indicating an international call, performing voice recognition on a second keyword received after the first keyword on the assumption that the second keyword is a country code number.
The method of claim 15, further comprising performing voice recognition on a plurality of syllables received following the second keyword as a phone number.
A voice recognition method of an electronic device, the method comprising:

receiving a voice including a plurality of numbers; and

whenever a pause, which is a silent section, is detected in the received voice, performing voice recognition on at least one number received prior to the detected pause.
The method of claim 17, further comprising outputting a result of the voice recognition whenever the voice recognition is performed on the at least one number.