WO2006137246A1

WO2006137246A1 - Speech recognizing device, speech recognizing method, speech recognizing program, and recording medium

Info

Publication number: WO2006137246A1
Application number: PCT/JP2006/310673
Authority: WO
Inventors: Kentaro Yamamoto
Original assignee: Pioneer Corporation
Priority date: 2005-06-21
Filing date: 2006-05-29
Publication date: 2006-12-28

Abstract

A character input section (101) receives an input of character contained in a part of a phrase or word to be recognized. A speech input section (102) receives an input of a speech to be recognized. An extracting section (103) extracts a waiting word containing the inputted character from predetermined waiting words. A speech recognizing section (104) recognizes the inputted speech by using the extracted waiting word. A display section (105) displays the extracted waiting words. A start section (106) starts a predetermined processing depending on the result of the speech recognition. A selecting section (107) selects a desired waiting word from the waiting words displayed by the display section (105). A genre input section (108) receives an input of the genre to which the phrase or word belongs.

Description

Specification

Voice recognition apparatus, voice recognition method, voice recognition program, and recording medium

TECHNICAL FIELD [0001] The present invention relates to a voice recognition device, a voice recognition method, a voice recognition program, and a recording medium that recognize a spoken voice. However, the use of the present invention is not limited to the above-described voice recognition device, voice recognition method, voice recognition program, and recording medium.

Background art

Conventionally, a voice recognition technique for recognizing human voice input via a microphone or the like is known. In such speech recognition, the frequency vector of the utterance (voice input) by the user is analyzed, compared with the phoneme model prepared in advance, and the phoneme is specified. Then, by comparing * collating the identified word model with the standby words registered in the speech recognition dictionary in advance, the matching frequency of both is calculated, and the spoken phrase is identified (for example, (See Patent Document 1.) o

[0003] Patent Document 1: JP 2000-99546 A

Disclosure of the invention

Problems to be solved by the invention

[0004] However, according to the above-described prior art, time is required for the recognition processing in proportion to the number of standby words registered in the speech recognition dictionary. For this reason, when many standby words are registered, the response of the speech recognition processing is bad, for example.

[0005] Generally, in speech recognition, the convenience of a user is higher when as many standby words as possible are registered. For example, “parking lot” and “parking” are words that indicate the space in which the same vehicle is parked. It is impossible to know in advance which user will speak, and it is better to register both as standby words. Convenience is high. In order to maintain such convenience, it is necessary to efficiently perform speech recognition processing while maintaining the number of registered standby words.

[0006] When many standby words are registered, words having similar voice characteristics An example is the problem that the possibility of registering phrases increases and the accuracy of speech recognition decreases.

Means for solving the problem

In order to solve the above-described problems and achieve the object, the speech recognition apparatus according to the invention of claim 1 includes a character input means for inputting characters included in a part of a phrase to be speech-recognized, and the speech Extracted by a speech input means for inputting recognition speech, an extraction means for extracting a standby word including the character input to the character input means from a plurality of preset standby words, and extracted by the extraction means Voice recognition means for recognizing the voice input to the voice input means using the standby word.

[0008] Further, the speech recognition method according to the invention of claim 7 includes a character input step of inputting characters included in a part of a phrase after speech recognition, and a speech input of speech of the speech recognition. An input step, an extraction step for extracting a standby word including the character input in the character input step from a plurality of preset standby words, and the standby word extracted by the extraction step. And a voice recognition step for recognizing the voice inputted in the voice input step.

[0009] Further, a voice recognition program according to the invention of claim 8 causes a computer to execute the voice recognition method of claim 7.

[0010] Further, a recording medium according to the invention of claim 9 is readable by a computer having the voice recognition program according to claim 8 recorded thereon.

Brief Description of Drawings

[0011] FIG. 1 is a block diagram showing a functional configuration of a speech recognition apparatus according to an embodiment.

FIG. 2 is a flowchart showing a procedure of voice recognition processing by the voice recognition device.

FIG. 3 is a block diagram showing a hardware configuration of a navigation device that is effective in the embodiment.

FIG. 4 is a flowchart showing a procedure of voice recognition processing by a voice recognition unit. O

FIG. 5 is a diagram showing an example of a first character input screen.

[Figure 1-

〇 6] Figure 6 is a chart showing an example of refined words.

Yes

FIG. 7 is a diagram showing an example of a menu screen.

FIG. 8 is a diagram showing an example of a menu screen.

[FIG. 9] FIG. 9 is a diagram showing an example of a first character input screen when voice recognition is performed on the menu screen.

[FIG. 10] FIG. 10 is a chart showing an example of a narrowed word when performing speech recognition on the menu screen.

Explanation of symbols

Voice recognition device

101 Character input part

102 Audio input section

103 Extractor

104 Voice recognition unit

105 Display

106 Starter

107 Selector

108 Genre input 咅 1

BEST MODE FOR CARRYING OUT THE INVENTION

Exemplary embodiments of a speech recognition device, a speech recognition method, a speech recognition program, and a recording medium according to the present invention will be described below in detail with reference to the accompanying drawings.

[0014] (Embodiment)

FIG. 1 is a block diagram showing a functional configuration of a speech recognition apparatus that works on the embodiment. In FIG. 1, a speech recognition device 100 that is relevant to the embodiment includes a character input unit 101, a speech input unit 102, an extraction unit 103, a speech recognition unit 104, a display unit 105, an activation unit 106, a selection unit 107, and a genre input. Part 108 is constituted.

[0015] Character input unit 101 inputs characters included in a part of a word / phrase to be recognized. Characters that are part of a phrase you want to recognize are, for example, the first letter of the phrase, The characters that make up the phrase. A plurality of characters may be input to the character input unit 101. You can also specify the position that the character occupies in the phrase (for example, the first, second, or end).

The voice input unit 102 receives voice recognition voice. The voice input unit 102 realizes its function by, for example, a voice microphone. A plurality of voice input units 102 may be provided.

The extraction unit 103 extracts a standby word including a character input to the character input unit 101 from a plurality of standby words that have been set in advance. For example, the extraction unit 103 extracts a standby word whose first character is the character input to the character input unit 101. In addition, when the position where the input character occupies the phrase is specified, the standby word of the input character at the specified position is extracted.

The speech recognition unit 104 recognizes the speech input to the speech input unit 102 using the standby word extracted by the extraction unit 103. The voice recognition unit 104 performs voice recognition, for example, by converting input voice into data and performing matching processing with the extracted standby words (standby word data).

Display unit 105 displays the standby word extracted by extraction unit 103. The display unit 105 realizes its function by, for example, a display. When a plurality of standby words are extracted by the extraction unit 103, a plurality of standby words are displayed on the display unit 105. If the number of extracted standby words is larger than the display space of the display unit 105, it may be displayed using a scroll screen or the like.

The activation unit 106 activates a predetermined process based on the voice recognition result of the voice recognition unit 104. In addition, when the selection unit 107 described later selects a standby word, the activation unit 106 activates a predetermined process based on the selection result. For example, the predetermined process starts the instructed process if the speech-recognized word / phrase is a word / phrase meaning an instruction for the process. If the process has already been started, the necessary information can be obtained from the voice recognition result of the voice recognition unit 104 during the process.

The selection unit 107 selects a desired standby word from a plurality of standby words displayed on the display unit 105. For example, the user wants to speak This is a standby word that indicates a phrase.

The genre input unit 108 receives the genre to which the word belongs. The word to which a word belongs is classified according to the meaning and content of the word, such as place name, person name, and instruction word. When a genre is input to the genre input unit 108, the extraction unit 103 extracts standby words belonging to the input genre.

FIG. 2 is a flowchart showing a procedure of voice recognition processing by the voice recognition device. The voice recognition device 100 first waits until a character is input to the character input unit 101 (step S201: No loop). When a character is input (step S201: Yes), the extraction unit 103 extracts a standby word including the input character (step S202). Then, the display unit 105 displays the extracted standby word (step S203).

Next, it is determined whether or not a voice is input to the voice input unit 102 (step S204). When the voice is input (step S204: Yes), the input voice is recognized using the standby word extracted in step S202 (step S205). Then, the activation unit 106 activates a predetermined process based on the voice recognition result (step S206), and ends the process according to the present flow chart.

On the other hand, if no voice is input in step S204 (step S204: No), it is determined whether one of the standby words displayed on display unit 105 has been selected (step S207). When any one is selected (step S207: Yes), the activation unit 106 activates a predetermined process based on the selection result (step S208), and ends the process according to this flowchart. On the other hand, if none is selected (step S207: No), the process returns to step S204 and the subsequent processing is repeated.

[0026] As described above, according to the speech recognition apparatus 100 that is effective in the embodiment, the standby words used for speech recognition can be narrowed down by inputting characters included in the phrase to be recognized. As a result, the time required for voice recognition can be shortened and voice recognition processing can be performed efficiently. In addition, the accuracy of speech recognition can be improved by narrowing down standby words that are candidates for speech recognition results.

Example

[0027] (Hardware configuration of navigation device 300) FIG. 3 is a block diagram showing a hardware configuration of a navigation apparatus that is effective in the embodiment. In the example, a case where the speech recognition apparatus 100 according to the embodiment is used as an input unit of the navigation apparatus 300 will be described. In FIG. 3, a navigation device 300 is mounted on a vehicle, and includes a navigation control unit 301, a user operation unit 302, a display unit 303, a position acquisition unit 304, a recording medium 305, and a recording medium decoding. A unit 306, a voice output unit 307, a communication unit 308, a route search unit 309, a route guidance unit 310, a guidance sound generation unit 311, and a voice recognition unit 312 are configured.

The navigation control unit 301 controls the entire navigation device 300. The navigation control unit 301 includes, for example, a CPU (Central Processing Unit) that executes predetermined arithmetic processing, a ROM (Read Only Memory) that stores various control programs, and a RAM (Random) that functions as a work area for the CPU. It can be realized by a microcomputer constituted by an Access Memory).

In addition, the navigation control unit 301 inputs / outputs information on route guidance to / from the route search unit 309, the route guidance unit 310, and the guidance sound generation unit 311, and obtains the result. The information is output to the display unit 303 and the audio output unit 307.

The user operation unit 302 outputs information input by the user, such as characters, numerical values, and various instructions, to the navigation control unit 301. As the configuration of the user operation unit 302, various known forms such as a touch panel configured integrally with a display unit 303 described later, a push button switch for detecting physical press Z non-press, a keyboard, and a joystick may be employed. Is possible.

[0031] The user operation unit 302 includes a microphone 302a for inputting sound from the outside. The voice input from the microphone 302a is recognized by the voice recognition unit 312 described later. As a result, the user can perform an input operation by voice.

[0032] The user operation unit 302 may be provided integrally with the navigation device 300, and may be configured to be operated separately from the navigation device 300, such as a remote controller. The user operation unit 302 may be configured in any one of the various forms described above, or may be configured in a plurality of forms. The user inputs information by appropriately performing an input operation according to the form of the user operation unit 302. User Examples of information input by operating the operation unit 302 include a destination point or a departure point of a route to be searched.

[0033] Entering the destination or departure point is applicable by entering the latitude, longitude, and address of each point, as well as specifying the telephone number, genre, keyword, etc. of the facility that is the destination or departure point. The facility is searched and its location can be determined. More specifically, these pieces of information are specified as one point on the map based on background type data included in map information recorded on the recording medium 305 described later. Also, display map information on the display unit 303 described later, and specify a point on the displayed map.

[0034] Further, when voice recognition is performed by a voice recognition unit 312 (to be described later), the first character of a phrase that the user wants to utter is input to the user operation unit 302. When a character is input to the user operation unit 302 on a predetermined screen, the speech recognition unit 312 described later performs speech recognition using only a word having the input character as the first character as a standby word. The character input at this time is not limited to the first character, and may be the character at the end of the phrase or the character included in the phrase.

[0035] The display unit 303 is, for example, a CRT (Cathode Ray Tube), a TFT liquid crystal display, an organic EL display, a plasma display, or the like, and displays necessary information. Specifically, the display unit 303 can be configured by, for example, a video IZF or a video display device connected to the video IZF. Video IZF includes, for example, a graphic controller that controls the entire display device, a buffer memory such as VRAM (Video RAM) that temporarily stores image information that can be displayed immediately, and image information that is output from the graphic controller. Based on the above, it is composed of a control IC that controls display of the display device. The display unit 303 displays icons, cursors, menus, windows, or various information such as characters and images. The display unit 303 displays map information and route guidance information stored in a recording medium 305 to be described later.

[0036] The position acquisition unit 304 includes a GPS receiver and various sensor forces, and acquires information on the current position of the apparatus main body (current position of the vehicle). The position acquisition unit 304 also has a GPS receiver. When a server enters a predetermined area such as an area where GPS information cannot be received, it receives GPS alternative information transmitted from a communication device installed in that area and detects the current position of the vehicle.

[0037] The GPS receiver receives GPS information transmitted from the GPS satellite force, and obtains a geometric position with respect to the GPS satellite. GPS is an abbreviation for Global Positioning System, and is a system that accurately determines the position on the ground by receiving radio waves from four or more satellites. The GPS receiver is composed of an antenna for receiving radio waves from a GPS satellite, a tuner for demodulating the received radio waves, and an arithmetic circuit for calculating the current position based on the demodulated information.

[0038] The various sensors are various sensors mounted on the vehicle such as a vehicle speed sensor, an angular velocity sensor, a travel distance sensor, and an inclination sensor, and the travel locus of the vehicle is obtained from information output from these sensors. Thus, by using the information output from various sensors mounted on the vehicle in combination with the information obtained from the external force by the GPS receiver, the vehicle position can be recognized with higher accuracy.

[0039] The vehicle speed sensor detects from the output shaft of the transmission of the vehicle on which the navigation device 300 is mounted. The angular velocity sensor detects the angular velocity when the host vehicle is rotating, and outputs angular velocity information and relative orientation information. The mileage sensor calculates the number of pulses per rotation of the wheel by counting the number of pulses of a pulse signal with a predetermined period that is output as the wheel rotates, and the mileage information based on the number of pulses per rotation Is output. The inclination sensor detects the inclination angle of the road surface.

The recording medium 305 records various control programs and various information in a state that can be read by a computer. The recording medium 305 accepts writing of information by the recording medium decoding unit 306 and records the written information in a nonvolatile manner. The recording medium 305 can be realized by, for example, an HD (Hard Disk). The recording medium 305 is not limited to HD. Instead of HD or in addition to HD, DVD (Digital Versatile Disk) and CD (Compact Disk) can be attached to and removed from the recording medium decoding unit 306. Use portable media as the recording medium 305. The recording medium 305 is not limited to DVD and CD. CD—ROM (CD—R, CD-RW), MO (Mag It is also possible to use a portable medium that is detachable from the recording medium decoding unit 306 such as a neto-optical disk or a memory card.

[0041] The map information stored in the recording medium 305 includes background data representing features such as buildings, rivers, and the ground surface, and road shape data representing the shape of the road. It is drawn in 2D or 3D on the display screen of part 303. When the navigation device 300 is guiding a route, the map information recorded on the recording medium 305 and the vehicle position acquired by the position acquisition unit 304 are displayed in an overlapping manner.

In the present embodiment, the map information is recorded on the recording medium 305. However, the present invention is not limited to this. The map information may be provided outside the navigation device 300, not the information recorded only in the one integrated with the hardware of the navigation device 300. In this case, the navigation device 300 acquires map information via the network through the communication unit 308, for example. The acquired map information is stored in RAM.

The recording medium decoding unit 306 controls reading of information on the recording medium 305 and writing of Z. For example, when HD is used as a recording medium, the recording medium decoding unit 306 is an HDD (Hard Disk Drive). Similarly, when a DVD or CD (including CD-R and CD-RW) is used as a recording medium, the recording medium decoding unit 306 is a DVD drive or a CD drive. When a CD-ROM (CD-R, CD-RW), MO, memory card, etc. is used as a writable and removable recording medium 30 5, information can be written to various recording media and various recording media can be used. A dedicated drive device capable of reading stored information is appropriately used as the recording medium decoding unit 306.

[0044] The audio output unit 307 reproduces the guide sound by controlling the output to the connected speaker (not shown). There may be one or more speakers. Specifically, the audio output unit 307 can be realized by an audio IZF connected to an audio output speaker. More specifically, the audio IZF is, for example, a DZA converter that performs DZA conversion of audio digital information, an amplifier that amplifies the audio analog signal output from the DZ A converter, and AZD that converts audio analog information. Con It can be configured with a barter and force.

[0045] The communication unit 308 acquires road traffic information such as traffic jams and traffic regulations regularly or irregularly. The communication unit 308 is connected to a network and transmits / receives information to / from other devices connected to the network such as a server.

[0046] The reception of road traffic information by the communication unit 308 may be performed at the timing when the road traffic information is distributed from the VICS (Vehicle Information and Communication System) center, or the road traffic information is periodically sent to the VICS center. It may be done on request. In addition, road traffic information in a desired area may be acquired via a network from nationwide VICS information collected in Sano. The communication unit 308 can be realized by, for example, an FM tuner, a VICS / beacon resino, a wireless communication device, and other communication devices.

[0047] The route search unit 309 searches for an optimal route from the departure point to the destination point using map information stored in the recording medium 305, VICS information acquired via the communication unit 308, and the like. To do. Here, the optimum route is a route that best meets the conditions specified by the user. In general, there are an infinite number of routes from a departure point to a destination point. For this reason, items to be considered in route search are set, and routes that match the conditions are searched.

[0048] The route guidance unit 310 is obtained from the guidance route information searched by the route search unit 309, the vehicle position information acquired by the position acquisition unit 304, and the recording medium 305 via the recording medium decoding unit 300. Real-time route guidance information is generated based on the map information. The route guidance information generated at this time may be information that considers the traffic jam information received by the communication unit 308. The route guidance information generated by the route guidance unit 310 is output to the display unit 303 via the navigation control unit 301.

[0049] The guide sound generator 311 generates tone and voice information corresponding to the pattern. That is, based on the route guidance information generated by the route guidance unit 310, the virtual sound source corresponding to the guidance point is set and the voice guidance information is generated, and the voice is transmitted via the navigation control unit 301. Output to the output unit 307.

[0050] The voice recognition unit 312 recognizes voice input via the microphone 302a. Voice recognition The recognition unit 312 has an utterance button or the like in a part of the user operation unit 302, and recognizes the voice input to the microphone 302a after the utterance trigger is generated by using the utterance button as an utterance trigger. When the voice is recognized by the voice recognition unit 312, the navigation control unit 301 performs processing corresponding to the recognized word.

[0051] For example, when an utterance is made on the destination point setting screen and a place name is recognized by voice recognition, the navigation control unit 301 sets the recognized place name as the destination point. The user can set the destination point by speaking the destination point name instead of specifying the destination point from the map displayed on the display unit 303. As described above, the voice recognition performed by the voice recognition unit 312 can be replaced with the operation performed by the user operation unit 302.

[0052] Here, various methods of voice recognition are known. Generally, in order to identify the input voice, by analyzing the frequency distribution of the voice to be recognized in advance, for example, In addition, a speech recognition dictionary is provided that extracts time series information of spectrum and fundamental frequency as feature quantities of input speech and stores the pattern corresponding to each word.

[0053] When the speech to be recognized is input, the frequency spectrum of the input speech is analyzed, and the phoneme is specified by comparing and collating with a phoneme model prepared in advance. Then, the identified phoneme model and the pattern of each word stored in the speech recognition dictionary (hereinafter referred to as a standby word) are compared and verified by pattern matching to calculate the similarity for each word. Next, the standby word with the highest similarity (the word with the closest pattern) is recognized as the input speech, and the standby word is output. That is, the input speech is determined by examining which standby word the frequency distribution pattern of the input word is most similar to.

Here, the voice recognition unit 312 limits the number of standby words to be subjected to the matching process in the voice recognition process from the relationship with the processing time of the matching process. As described above, the speech recognition unit 312 performs a matching process on the frequency pattern of the input speech and all the standby words to be processed, and then calculates the similarity for each standby word. To do. For this reason, the processing time can be shortened as the number of waiting words to be subjected to the matching processing is small. However, the standby word to be matched If the word does not match the spoken word, misrecognitions and errors (no corresponding word) will occur frequently, resulting in poor use and use.

[0055] Therefore, the voice recognition unit 312 narrows down the words to be matched (hereinafter referred to as narrowed words) by waiting for the user to input the first character of the word to be recognized. For example, when “sa” is input as the first character, only words having “sa” as the first character, such as “sa, tamago” and “sasebo”, are extracted as narrowed words. When performing speech recognition processing, matching processing is performed on the input speech and refined words. As a result, the efficiency of the speech recognition process can be improved while improving the accuracy of speech recognition.

[0056] Note that the narrowing down of phrases is not limited to the input of the first character, and for example, the last character may be input or the phrases including the input character may be narrowed down. You can also refine by specifying the attribute of the word. For example, when a place name input screen such as a destination point setting screen is displayed, only the place name is narrowed down.

[0057] Further, character input is not limited to the touch panel, and may be handwritten input, for example. In that case, a sensor panel for handwriting input is provided on the dominant hand side of the user. At this time, reception of voice input may be started with the utterance trigger that the input character is recognized.

The navigation device 300 is configured by the hardware configuration as described above!

It should be noted that the character input unit 101, the selection unit 107, and the genre input unit 108, which are functional configurations of the speech recognition device 100 that is relevant to the embodiment, are the user operation unit 302, the voice input unit 102 is the microphone 302a, The navigation control unit 301, the voice recognition unit 104, the voice recognition unit 312 and the display unit 105, the display control unit 103 and the activation unit 106, respectively.

[0059] (Voice recognition processing by voice recognition unit 312)

FIG. 4 is a flowchart showing a procedure of voice recognition processing by the voice recognition unit. In the following description, a touch panel is adopted as the user operation unit 302. First, the voice recognition unit 312 waits until a character is input via the user operation unit 302 (step S401: loop of No). For example, the user should speak This is the first character (first character) of the phrase. In addition, it may be the last character or a character included in a phrase. At this time, if an utterance trigger occurs without any character input, and speech is input, the speech recognition is performed by matching with the entire waiting word without narrowing down.

[0060] When a character is input (step S401: Yes), a refined word is extracted based on the input character (step S402). A narrowed word is a narrowed word that is narrowed down under certain conditions, as described above. Then, the refined word is displayed on the display unit 303 (step S403).

[0061] When the narrowed word is displayed in step S403, the user can determine whether or not to narrow down by looking at the displayed word and phrase and further inputting characters. If more characters are input by the user (step S404: Yes), the process returns to step S402 and the subsequent processing is repeated. This further narrows down the standby words.

[0062] In step S404, if there is no further character input (step S404: No), an utterance trigger is generated and the process waits until a voice is input (step S405: loop of No). When speech is input (step S405: Yes), the input speech and the narrowed word are matched (step S406), and the spoken phrase is recognized (step S407). The process ends. In this case, it is assumed that characters are input, but for example, the phrase attributes (for example, the semantic classification of place names, song titles, directives, etc.) are specified, and only the phrases with the specified attributes are extracted. As a matter of fact.

[0063] As described above, the speech recognition unit 312 narrows down the standby words to be matched under the conditions specified by the user, and performs the matching process with the input speech only for the narrowed words and phrases. Thereby, the time required for the voice recognition processing can be shortened, and the voice recognition response of the navigation device 300 can be improved. In addition, the phrase recognition that has been narrowed down to a certain extent also performs speech recognition, so the recognition accuracy can be improved.

[0064] (Display screen example when extracting refined words)

FIG. 5 is a diagram showing an example of the input screen for the first character. Details of the extraction of narrowed words (step S402) and the display of narrowed words (step S403) shown in FIG. 4 will be described. Here, speech recognition is used when setting the destination point. Figure 5 In this case, a touch panel is adopted as the user operation unit 302, and a character input screen 500 is displayed on the display unit 303. On the character input screen 500, a character input key 511, an input character display unit 512, and a narrowed word display unit 513 are displayed.

[0065] In the character input key 511, hiragana characters are arranged in the order of 50 tones. A switch button is provided to display alphanumeric and katakana input keys. The user can input a desired character by touching the screen display of the desired character. The input character is displayed in the input character display section 512. In the illustrated example, a key corresponding to “sa” among the character input keys 511 is pressed, and the character “sa” is displayed in the input character display portion 512.

When the first character is input via the user operation unit 302, the voice recognition unit 312 extracts a standby word having the input character as the first character as a narrowed word. The extracted narrowed words are displayed in the narrowed word display unit 513. In the example shown in the figure, since the voice recognition is performed for setting the destination point, place names such as “Yuki-ku”, “Saimura”, “Nishikaien”, etc., with “sa” as the first character are displayed as narrowed words. In addition, when the scroll button 513a is pressed, the narrowed words are displayed!

[0067] The user presses an utterance button (not shown) to generate an utterance trigger, and utters a desired phrase (a place name starting with “sa”). The voice recognition unit 312 performs matching processing between the input voice and the narrowed word, and recognizes a phrase uttered by the user. For this reason, in the example shown in the figure, for example, words such as “Tokyo” that do not start with “sa” cannot be recognized.

Note that the user can also select a place name by touching the place name displayed on the narrowed word display unit 513. In this case, in order to select a word that is not displayed in the narrowed word display unit 513, the user presses the scroll button 513a to display a desired word in the narrowed word display unit 513, and touches the display.

FIG. 6 is a chart showing an example of narrowed words. In FIG. 6, a word / phrase group 601 displays place names having “sa” as the first character. This is because “sa” is entered in the character input screen 500 shown in FIG. The speech recognition unit 312 recognizes the input speech by matching processing with the words in the word group 601. Also, as shown, There are many words that start with place names. Therefore, only the display word / phrase group 602 indicated by the dotted line can be displayed on the narrowed word display portion 513 (see FIG. 5) of the character input screen 500. To display words other than the display word group 602, press the scroll button 513a (FIG. 5).

7 and 8 are diagrams showing an example of the menu screen. In the examples shown in Fig. 5 and Fig. 6, the destination point is set by voice recognition, so the narrow-down word is also limited to the place name. In this way, the speech recognition unit 312 extracts narrowed words that meet the purpose of speech recognition. In FIG. 7, a menu screen 700 is displayed on the display unit 303. The menu screen 700 is a screen for selecting an operation to be performed by the user. The user selects an operation by touching a desired operation display 711 to 714 or by speaking a desired operation content.

[0071] In the example shown in the figure, selectable operations include “Set destination point” (operation display 711), “Search for a song” (operation display 712), “View traffic information” (operation display 713), “ Change device settings "(operation display 714) is displayed. Further, when the scroll bar 721 is pressed, an operation display of other operations is displayed.

[0072] When setting the destination point, the user presses or utters "set destination point" (operation display 711). Then, a setting method selection screen 800 shown in FIG. 8 is displayed. Setting method selection screen 800 is a screen for selecting a method for executing the operation content (in the illustrated example, setting of a destination point). As in FIG. 7, the user selects the operation by touching the desired method display 811 to 814 or by speaking the desired setting method.

[0073] In the illustrated example, selectable methods are “Search in order of 50 notes” (Method display 811), “Search for map power” (Method display 812), “Search for driving history power” (Method display 813), “ “Search by genre” (method display 814) is displayed. Furthermore, if the scroll bar 821 is pressed, a method display of another method is displayed.

[0074] Here, "Search in order of 50 notes" (method display 811) is pressed or uttered. Then, a character input screen 500 shown in FIG. 5 is displayed. The user may input a desired point on the character input screen 500, or input only the first character and set by voice recognition. Yes. Since it has gone through the hierarchy shown in Fig. 7 and Fig. 8, it is clear that the place name that can be entered is the destination name. For this reason, the speech recognition unit 312 extracts only place names as the targets for speech recognition. Furthermore, when the user inputs the first character on the character input screen 500, a narrowed word is extracted from the standby word, and voice recognition can be performed more efficiently and accurately.

Here, when performing voice recognition on the menu screen 700 shown in FIG. 7, it is possible to narrow down standby words by the first character. When speech recognition is performed on the menu screen 700, speech recognition is performed from all standby words. Accordingly, not only a specific operation but also various operations of the navigation device 300 can be simultaneously waited, and it is possible to save the user the trouble of displaying the setting screen through the hierarchy.

FIG. 9 is a diagram showing an example of the input screen for the first character when performing voice recognition on the menu screen. When the utterance button is pressed while the menu screen 700 shown in FIG. 7 is displayed, the character input screen 900 shown in FIG. The character input screen 900 〖has a character input key 911, an input character display portion 912, and a narrowed word display portion 913.

In the illustrated example, a key corresponding to “ko” among the character input keys 911 is pressed, and the character “ko” is displayed in the input character display unit 912. A standby word with “ko” as the initial character is displayed as a narrowing word. When the scroll button 913a is pressed, words that cannot be displayed are displayed. As this standby word, all the words that can be uttered in the operation of the navigation device 300 without being limited to a specific attribute are displayed. For example, directives such as “go here”, facility names (place names) such as “Koshien”, and compound phrases such as “go to Koshien”.

FIG. 10 is a chart showing an example of narrowed words when performing voice recognition on the menu screen. In FIG. 10, a phrase group 1001 displays standby words with “ko” as the first character. In the narrowed word display portion 913 of the character input screen 900 shown in FIG. 9, a display word / phrase group 1002 indicated by a dotted line is displayed. To display words other than the displayed word group 1002, press the scroll button 913a (see FIG. 9).

[0079] The phrases included in the phrase group 1001 include the above-mentioned directives and place names, as well as navigation devices. There are song titles of music data recorded on the recording medium 305 of the device 300, map scale change instructions, and the like. The user can perform these operations directly from the menu screen 700 by voice recognition. For this reason, it is possible to perform a desired operation without going through the hierarchy of the display screen, and the user's operation burden can be reduced.

[0080] Normally, when waiting for various operations of the navigation device 300 at the same time, the number of standby words becomes enormous, so that the speech recognition process takes time and the response of the process becomes slow. In the navigation device 300 that is effective in the present embodiment, by inputting the first character of the standby word, the standby word to be processed is narrowed down, and the user's operation burden is reduced while performing the speech recognition processing efficiently. be able to. In this way, the user's operational burden can be reduced by appropriately narrowing down the words to be processed from many standby words.

[0081] As described above, according to the navigation device 300 that is powerful in the embodiment, by narrowing down the standby words to be processed by the first word or the like, the time required for the matching process is shortened and the processing load is reduced. Can be reduced. In addition, since the corresponding words are recognized from the narrowed standby words, the accuracy of voice recognition can be improved. Furthermore, by appropriately narrowing down the words to be processed from many standby words, it is possible to reduce the operation burden during the setting operation for the user.

Note that the speech recognition method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read by the computer. The program may be a transmission medium that can be distributed via a network such as the Internet.

Claims

The scope of the claims

[1] A character input means for inputting characters included in a part of a phrase to be recognized,

Voice input means for inputting the voice of the voice recognition;

Extraction means for extracting a standby word including the above-mentioned characters input to the character input means from a plurality of standby words set in advance;

Using the standby word extracted by the extraction means, voice recognition means for recognizing the voice input to the voice input means;

A speech recognition apparatus comprising:

2. The speech recognition apparatus according to claim 1, further comprising display means for displaying the standby word extracted by the extraction means.

3. The speech recognition apparatus according to claim 1, further comprising start means for starting predetermined processing based on a speech recognition result of the speech recognition means.

[4] Selection means for selecting a desired standby word from a plurality of the standby words displayed by the display means;

3. The speech recognition apparatus according to claim 2, further comprising: an activation unit that activates a predetermined process based on a selection result of the selection unit.

5. The speech recognition apparatus according to claim 1, wherein the extraction unit extracts the standby word whose first character is the character input to the character input unit.

[6] A genre input means for inputting a genre to which the word belongs,

6. The speech recognition apparatus according to claim 1, wherein the extraction unit extracts the standby word belonging to the genre input by the genre input unit.

[7] A character input process in which characters included in a part of a phrase to be recognized are input,

A voice input step in which the voice of the voice recognition is input;

An extraction step of extracting a standby word including the above-mentioned characters input in the character input step from a plurality of standby words set in advance;

Using the standby words extracted in the extraction step, a speech recognition step for recognizing the speech input in the speech input step; A speech recognition method comprising:

A speech recognition program for causing a computer to execute the speech recognition method according to claim 7.

A computer-readable recording medium on which the voice recognition program according to claim 8 is recorded.