WO2006137246A1 - Speech recognizing device, speech recognizing method, speech recognizing program, and recording medium - Google Patents


Info

Publication number
WO2006137246A1
Authority
WO
WIPO (PCT)
Prior art keywords
input
unit
word
character
voice
Application number
PCT/JP2006/310673
Other languages
French (fr)
Japanese (ja)
Inventor
Kentaro Yamamoto
Original Assignee
Pioneer Corporation
Application filed by Pioneer Corporation
Publication of WO2006137246A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • Speech recognition device, speech recognition method, speech recognition program, and recording medium
  • The present invention relates to a speech recognition device, a speech recognition method, a speech recognition program, and a recording medium for recognizing spoken speech.
  • However, the use of the present invention is not limited to the above-described speech recognition device, speech recognition method, speech recognition program, and recording medium.
  • Patent Document 1: JP 2000-99546 A
  • The speech recognition device includes: character input means to which a character included in a part of a phrase to be speech-recognized is input; speech input means to which the speech to be recognized is input; extraction means for extracting, from a plurality of preset standby words, standby words that include the character input to the character input means; and speech recognition means for recognizing the speech input to the speech input means using the standby words extracted by the extraction means.
  • The speech recognition method includes: a character input step of inputting a character included in a part of a phrase to be speech-recognized; a speech input step of inputting the speech to be recognized; an extraction step of extracting, from a plurality of preset standby words, standby words that include the character input in the character input step; and a speech recognition step of recognizing the speech input in the speech input step using the standby words extracted in the extraction step.
  • A speech recognition program according to the invention of claim 8 causes a computer to execute the speech recognition method according to claim 7.
  • A recording medium according to the invention of claim 9 is a computer-readable recording medium on which the speech recognition program according to claim 8 is recorded.
  • FIG. 1 is a block diagram showing the functional configuration of a speech recognition device according to the embodiment.
  • FIG. 2 is a flowchart showing the procedure of speech recognition processing by the speech recognition device.
  • FIG. 3 is a block diagram showing the hardware configuration of a navigation device according to the example.
  • FIG. 4 is a flowchart showing the procedure of speech recognition processing by the speech recognition unit.
  • FIG. 5 is a diagram showing an example of the input screen for the first character.
  • FIG. 6 is a chart showing an example of narrowed-down words.
  • FIG. 7 is a diagram showing an example of the menu screen.
  • FIG. 8 is a diagram showing an example of the menu screen.
  • FIG. 9 is a diagram showing an example of the input screen for the first character when speech recognition is performed on the menu screen.
  • FIG. 10 is a chart showing an example of narrowed-down words when speech recognition is performed on the menu screen.
  • FIG. 1 is a block diagram showing the functional configuration of the speech recognition device according to the embodiment.
  • In FIG. 1, the speech recognition device 100 according to the embodiment comprises a character input unit 101, a speech input unit 102, an extraction unit 103, a speech recognition unit 104, a display unit 105, an activation unit 106, a selection unit 107, and a genre input unit 108.
  • To the character input unit 101, a character included in a part of the phrase to be speech-recognized is input. Such a character is a character constituting the phrase, for example its first character. A plurality of characters may be input to the character input unit 101. The position the character occupies in the phrase (for example, first, second, or last) may also be made specifiable.
  • To the speech input unit 102, the speech to be recognized is input. The speech input unit 102 realizes its function by, for example, a microphone. A plurality of speech input units 102 may be provided.
  • The extraction unit 103 extracts, from a plurality of preset standby words, the standby words that include the character input to the character input unit 101. For example, the extraction unit 103 extracts standby words whose first character is the character input to the character input unit 101. In addition, when the position the input character occupies in the phrase is specified, standby words having the input character at the specified position are extracted.
  • The speech recognition unit 104 recognizes the speech input to the speech input unit 102 using the standby words extracted by the extraction unit 103. The speech recognition unit 104 performs speech recognition by, for example, converting the input speech into data and performing a matching process against the extracted standby words (standby word data).
  • The display unit 105 displays the standby words extracted by the extraction unit 103. The display unit 105 realizes its function by, for example, a display. When a plurality of standby words are extracted by the extraction unit 103, the plurality of standby words are displayed on the display unit 105. If the number of extracted standby words exceeds the display space of the display unit 105, they may be shown using a scrolling screen or the like.
  • The activation unit 106 activates a predetermined process based on the speech recognition result of the speech recognition unit 104. When a standby word is selected by the selection unit 107 described later, the activation unit 106 activates a predetermined process based on the selection result. For example, if the recognized phrase is a phrase that instructs a process, the instructed process is started. If a process has already been started, information needed during that process may be obtained from the speech recognition result of the speech recognition unit 104.
  • The selection unit 107 selects a desired standby word from the plurality of standby words displayed on the display unit 105. A desired standby word is, for example, the standby word indicating the phrase the user intends to utter.
  • To the genre input unit 108, the genre to which a phrase belongs is input. The genre of a phrase is a classification by the meaning or content of the phrase, such as place names, personal names, and directives. When a genre is input to the genre input unit 108, the extraction unit 103 extracts standby words belonging to the input genre.
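  • As a concrete illustration of the extraction described above, the following is a minimal sketch (hypothetical code, not from the patent; the function name, data layout, and example words are assumptions) that filters standby words by an input character, an optional character position, and an optional genre:

```python
# Hypothetical sketch of the extraction unit 103 (names and data are
# assumptions): filter preset standby words by an input character, an
# optional character position, and an optional genre.
from typing import List, Optional, Tuple

# (word, genre) pairs standing in for the preset standby-word dictionary.
STANDBY_WORDS: List[Tuple[str, str]] = [
    ("saitama", "place name"), ("sasebo", "place name"),
    ("tokyo", "place name"), ("set destination", "directive"),
]

def extract(char: str, position: Optional[int] = None,
            genre: Optional[str] = None) -> List[str]:
    """Return standby words that include `char`.

    position: 0 for the first character, -1 for the last character,
    None to match the character anywhere in the word.
    """
    result = []
    for word, word_genre in STANDBY_WORDS:
        if genre is not None and word_genre != genre:
            continue  # keep only words belonging to the requested genre
        if position is None:
            if char in word:
                result.append(word)
        elif word and word[position:].startswith(char):
            result.append(word)
    return result

print(extract("sa", position=0))  # ['saitama', 'sasebo']
```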
  • FIG. 2 is a flowchart showing the procedure of speech recognition processing by the speech recognition device.
  • The speech recognition device 100 first waits until a character is input to the character input unit 101 (step S201: No loop). When a character is input (step S201: Yes), the extraction unit 103 extracts standby words including the input character (step S202). The display unit 105 then displays the extracted standby words (step S203).
  • Next, it is determined whether speech has been input to the speech input unit 102 (step S204). If speech has been input (step S204: Yes), the input speech is recognized using the standby words extracted in step S202 (step S205). The activation unit 106 then activates a predetermined process based on the speech recognition result (step S206), and the processing according to this flowchart ends.
  • On the other hand, if no speech is input in step S204 (step S204: No), it is determined whether one of the standby words displayed on the display unit 105 has been selected (step S207). If one has been selected (step S207: Yes), the activation unit 106 activates a predetermined process based on the selection result (step S208), and the processing according to this flowchart ends. If none has been selected (step S207: No), the process returns to step S204 and the subsequent processing is repeated.
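  • The following is a minimal sketch of the FIG. 2 flow (hypothetical code, not from the patent; the unit behaviors are passed in as callables standing in for the units 101 to 107):

```python
# Hypothetical driver for the FIG. 2 flow (steps S201-S208). The unit
# behaviors are injected as callables so the control flow stays visible.
from typing import Callable, List, Optional

def recognition_loop(
    wait_for_character: Callable[[], str],                # character input unit 101
    extract: Callable[[str], List[str]],                  # extraction unit 103
    display: Callable[[List[str]], None],                 # display unit 105
    get_voice: Callable[[], Optional[str]],               # speech input unit 102
    get_selection: Callable[[List[str]], Optional[str]],  # selection unit 107
    recognize: Callable[[str, List[str]], str],           # recognition unit 104
    start_process: Callable[[str], None],                 # activation unit 106
) -> None:
    char = wait_for_character()    # S201: wait until a character is input
    candidates = extract(char)     # S202: narrow the standby words
    display(candidates)            # S203: show the narrowed words
    while True:
        voice = get_voice()        # S204: was speech input?
        if voice is not None:
            start_process(recognize(voice, candidates))  # S205-S206
            return
        choice = get_selection(candidates)               # S207: word touched?
        if choice is not None:
            start_process(choice)  # S208: start from the selection
            return
```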
  • As described above, according to the speech recognition device 100 of the embodiment, the standby words used for speech recognition can be narrowed down by inputting a character included in the phrase to be recognized. This shortens the time required for speech recognition and makes the speech recognition processing efficient. In addition, narrowing down the standby words that are candidates for the speech recognition result improves the accuracy of speech recognition.
  • FIG. 3 is a block diagram showing the hardware configuration of the navigation device according to the example. In the example, the speech recognition device 100 according to the embodiment is used as input means of the navigation device 300.
  • The navigation device 300 is mounted on a vehicle and comprises a navigation control unit 301, a user operation unit 302, a display unit 303, a position acquisition unit 304, a recording medium 305, a recording medium decoding unit 306, an audio output unit 307, a communication unit 308, a route search unit 309, a route guidance unit 310, a guidance sound generation unit 311, and a speech recognition unit 312.
  • The navigation control unit 301 controls the entire navigation device 300. The navigation control unit 301 can be realized by, for example, a microcomputer comprising a CPU (Central Processing Unit) that executes predetermined arithmetic processing, a ROM (Read Only Memory) that stores various control programs, and a RAM (Random Access Memory) that functions as a work area for the CPU.
  • For route guidance, the navigation control unit 301 exchanges route guidance information with the route search unit 309, the route guidance unit 310, and the guidance sound generation unit 311, and outputs the resulting information to the display unit 303 and the audio output unit 307.
  • The user operation unit 302 outputs information input by the user, such as characters, numerical values, and various instructions, to the navigation control unit 301. As the configuration of the user operation unit 302, various known forms can be adopted, such as a touch panel configured integrally with the display unit 303 described later, push-button switches that detect physical press/non-press, a keyboard, and a joystick.
  • The user operation unit 302 also includes a microphone 302a for inputting speech from the outside. Speech input from the microphone 302a is recognized by the speech recognition unit 312 described later, so the user can perform input operations by voice.
  • The user operation unit 302 may be provided integrally with the navigation device 300, or may be operable separately from the navigation device 300, like a remote controller. The user operation unit 302 may be configured in any single one of the various forms described above, or in a plurality of forms. The user inputs information by performing input operations appropriate to the form of the user operation unit 302.
  • Examples of information input by operating the user operation unit 302 include the destination point or departure point of the route to be searched. The destination or departure point can be input by entering the latitude, longitude, or address of the point, or by specifying the telephone number, genre, keyword, or the like of the facility serving as the destination or departure point, whereby the corresponding facility is searched for and its position identified. More specifically, this information is identified as a single point on the map based on background type data included in the map information recorded on the recording medium 305 described later. Map information may also be displayed on the display unit 303 described later so that a point on the displayed map can be designated.
  • In addition, when speech recognition is performed by the speech recognition unit 312 described later, the first character of the phrase the user intends to utter is input to the user operation unit 302. When a character is input to the user operation unit 302 on a predetermined screen, the speech recognition unit 312 performs speech recognition using as standby words only the phrases whose first character is the input character. The character input at this time is not limited to the first character, and may be the last character of the phrase or any character included in the phrase.
  • The display unit 303 is, for example, a CRT (Cathode Ray Tube), a TFT liquid crystal display, an organic EL display, or a plasma display, and displays necessary information. Specifically, the display unit 303 can be configured by, for example, a video I/F and a video display device connected to the video I/F. The video I/F comprises, for example, a graphics controller that controls the entire display device, a buffer memory such as a VRAM (Video RAM) that temporarily stores immediately displayable image information, and a control IC that controls the display of the display device based on image information output from the graphics controller. The display unit 303 displays icons, cursors, menus, windows, and various information such as characters and images. The display unit 303 also displays map information and route guidance information stored on the recording medium 305 described later.
  • The position acquisition unit 304 comprises a GPS receiver and various sensors, and acquires information on the current position of the device body (the current position of the vehicle). When the vehicle enters a predetermined area, such as an area where the GPS receiver cannot receive GPS information, the position acquisition unit 304 receives GPS alternative information transmitted from a communication device installed in that area and detects the current position of the vehicle.
  • The GPS receiver receives GPS information transmitted from GPS satellites and obtains its geometric position relative to the GPS satellites. GPS is an abbreviation for Global Positioning System, a system that accurately determines a position on the ground by receiving radio waves from four or more satellites. The GPS receiver comprises an antenna for receiving radio waves from GPS satellites, a tuner for demodulating the received radio waves, and an arithmetic circuit for calculating the current position based on the demodulated information.
  • The various sensors are sensors mounted on the vehicle, such as a vehicle speed sensor, an angular velocity sensor, a travel distance sensor, and an inclination sensor; the travel locus of the vehicle is obtained from the information they output.
  • The vehicle speed sensor detects the vehicle speed from the output shaft of the transmission of the vehicle on which the navigation device 300 is mounted. The angular velocity sensor detects the angular velocity when the vehicle turns, and outputs angular velocity information and relative heading information. The travel distance sensor counts the pulses of a pulse signal of predetermined period output as the wheel rotates, calculates the number of pulses per wheel rotation, and outputs travel distance information based on that pulse count, as illustrated in the sketch below. The inclination sensor detects the inclination angle of the road surface.
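  • As a worked example of the travel distance computation (the sensor resolution and wheel diameter below are assumed values, not from the patent), the distance follows from the pulse count, the pulses per rotation, and the wheel circumference:

```python
# Assumed example values, not from the patent: a travel distance sensor
# that emits a fixed number of pulses per wheel rotation.
import math

PULSES_PER_ROTATION = 24   # assumed sensor resolution
WHEEL_DIAMETER_M = 0.65    # assumed wheel diameter in meters

def distance_from_pulses(pulse_count: int) -> float:
    """Travel distance in meters: rotations times wheel circumference."""
    rotations = pulse_count / PULSES_PER_ROTATION
    return rotations * math.pi * WHEEL_DIAMETER_M

print(distance_from_pulses(2400))  # 100 rotations -> about 204.2 m
```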
  • The recording medium 305 records various control programs and various information in a computer-readable state. The recording medium 305 accepts writing of information by the recording medium decoding unit 306 and records the written information in a nonvolatile manner. The recording medium 305 can be realized by, for example, an HD (Hard Disk). The recording medium 305 is not limited to an HD; instead of or in addition to an HD, a DVD (Digital Versatile Disk) or a CD (Compact Disk) that can be inserted into and removed from the recording medium decoding unit 306 may be used. Nor is the recording medium 305 limited to DVDs and CDs.
  • The map information stored on the recording medium 305 includes background data representing features such as buildings, rivers, and the ground surface, and road shape data representing the shapes of roads, and is rendered in 2D or 3D on the display screen of the display unit 303. While the navigation device 300 is providing route guidance, the map information recorded on the recording medium 305 and the vehicle position acquired by the position acquisition unit 304 are displayed overlapping each other.
  • In this example the map information is recorded on the recording medium 305, but it need not be recorded integrally with the hardware of the navigation device 300 and may be provided outside the navigation device 300. In that case, the navigation device 300 acquires the map information via a network through, for example, the communication unit 308. The acquired map information is stored in the RAM or the like.
  • The recording medium decoding unit 306 controls reading/writing of information from/to the recording medium 305. For example, when an HD is used as the recording medium 305, the recording medium decoding unit 306 is an HDD (Hard Disk Drive); when a DVD or a CD is used, it is a DVD drive or a CD drive. When a writable and removable recording medium such as a CD-ROM (CD-R, CD-RW), an MO, or a memory card is used as the recording medium 305, a dedicated drive device capable of writing information to and reading stored information from such recording media is used as the recording medium decoding unit 306 as appropriate.
  • The audio output unit 307 reproduces guidance sound by controlling output to a connected speaker (not shown). There may be one or more speakers. Specifically, the audio output unit 307 can be realized by an audio I/F connected to the speaker for audio output. More specifically, the audio I/F can be configured by, for example, a D/A converter that performs D/A conversion of digital audio information, an amplifier that amplifies the analog audio signal output from the D/A converter, and an A/D converter that performs A/D conversion of analog audio information.
  • The communication unit 308 regularly or irregularly acquires road traffic information such as traffic jams and traffic regulations. The communication unit 308 is connected to a network and exchanges information with other devices connected to the network, such as servers. Road traffic information may be received by the communication unit 308 at the timing at which it is distributed from the VICS (Vehicle Information and Communication System) center, or by periodically requesting it from the VICS center. Road traffic information for a desired area may also be acquired via the network from nationwide VICS information collected on a server. The communication unit 308 can be realized by, for example, an FM tuner, a VICS/beacon receiver, a wireless communication device, or other communication devices.
  • The route search unit 309 searches for the optimum route from the departure point to the destination point using the map information stored on the recording medium 305, the VICS information acquired via the communication unit 308, and the like. The optimum route is the route that best matches the conditions specified by the user. In general, there are countless routes from a departure point to a destination point; therefore, the items to be considered in the route search are set, and routes matching those conditions are searched for.
  • The route guidance unit 310 generates real-time route guidance information based on the guidance route information found by the route search unit 309, the vehicle position information acquired by the position acquisition unit 304, and the map information obtained from the recording medium 305 via the recording medium decoding unit 306. The route guidance information generated at this time may take into account the traffic jam information received by the communication unit 308. The route guidance information generated by the route guidance unit 310 is output to the display unit 303 via the navigation control unit 301.
  • The guidance sound generation unit 311 generates tone and voice information corresponding to the guidance pattern. That is, based on the route guidance information generated by the route guidance unit 310, it sets a virtual sound source corresponding to the guidance point, generates voice guidance information, and outputs this to the audio output unit 307 via the navigation control unit 301.
  • The speech recognition unit 312 recognizes speech input via the microphone 302a. Specifically, an utterance button or the like provided in a part of the user operation unit 302 serves as an utterance trigger, and the speech recognition unit 312 recognizes the speech input to the microphone 302a after the utterance trigger occurs.
  • The navigation control unit 301 performs processing corresponding to the recognized phrase. For example, when a place name is recognized, the navigation control unit 301 sets the recognized place name as the destination point. The user can thus set the destination point by uttering its name, instead of designating it on the map displayed on the display unit 303. In this way, speech recognition by the speech recognition unit 312 can substitute for operations through the user operation unit 302.
  • The speech recognition unit 312 has a speech recognition dictionary that extracts time-series information of the spectrum and the fundamental frequency as feature quantities of input speech and stores the corresponding pattern for each phrase. In speech recognition, the frequency spectrum of the input speech is analyzed and compared against phoneme models prepared in advance to identify the phonemes. The identified phonemes and the pattern of each phrase stored in the speech recognition dictionary (hereinafter referred to as standby words) are then compared by pattern matching, and a similarity is calculated for each phrase. The standby word with the highest similarity (the phrase with the closest pattern) is recognized as the input speech and output. In other words, the input speech is determined by examining which standby word's frequency distribution pattern the input phrase most resembles.
  • In general, the speech recognition unit 312 limits the number of standby words subjected to the matching process, because of its relation to the processing time of that process. That is, the speech recognition unit 312 performs the matching process between the frequency pattern of the input speech and every standby word subject to processing, and calculates a similarity for each standby word. The smaller the number of standby words subjected to the matching process, the shorter the processing time.
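  • To make the cost concrete, the following is a minimal sketch (hypothetical; the patent describes matching of spectral patterns, here reduced to fixed-length feature vectors and cosine similarity) in which one similarity is computed per standby word, so processing time grows linearly with the number of standby words:

```python
# Hypothetical matching step: score the input-speech feature vector
# against every standby word's stored pattern and keep the best match.
# Cost is one comparison per standby word, hence linear in their number.
import math
from typing import Dict, List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(input_features: List[float],
               patterns: Dict[str, List[float]]) -> str:
    """Return the standby word whose stored pattern is most similar."""
    return max(patterns,
               key=lambda w: cosine_similarity(input_features, patterns[w]))
```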
  • On the other hand, if the standby words subject to matching do not include the uttered phrase, misrecognition and errors (no corresponding phrase) occur frequently, impairing usability.
  • Therefore, the speech recognition unit 312 narrows down the phrases to be matched (hereinafter referred to as narrowed-down words) by waiting for the user to input the first character of the phrase to be recognized. For example, when “sa” is input as the first character, only phrases whose first character is “sa”, such as “Saitama” and “Sasebo”, are extracted as narrowed-down words.
  • In the speech recognition process, the matching process is performed between the input speech and the narrowed-down words. As a result, the efficiency of the speech recognition process can be improved together with its accuracy. Narrowing down the phrases is not limited to input of the first character; for example, the last character may be input, or the phrases containing the input character may be kept, as in the sketch below.
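  • The three narrowing conditions just mentioned reduce to simple string predicates, as in this hypothetical sketch (the word list is an assumption, not from the patent):

```python
# Hypothetical narrowing predicates for the three conditions mentioned:
# first character, last character, or any occurrence of the character.
def narrow(words, char, mode="first"):
    if mode == "first":
        return [w for w in words if w.startswith(char)]
    if mode == "last":
        return [w for w in words if w.endswith(char)]
    return [w for w in words if char in w]  # "contains" narrowing

words = ["saitama", "sasebo", "tokyo", "osaka"]
print(narrow(words, "sa"))              # ['saitama', 'sasebo']
print(narrow(words, "sa", "contains"))  # ['saitama', 'sasebo', 'osaka']
```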
  • Character input is not limited to a touch panel and may be, for example, handwritten input. In that case, for example, a sensor panel for handwriting input is provided on the user's dominant-hand side. Reception of speech input may then be started with recognition of the input character serving as the utterance trigger.
  • The navigation device 300 is configured by the hardware configuration described above. The functional components of the speech recognition device 100 according to the embodiment are realized as follows: the character input unit 101, the selection unit 107, and the genre input unit 108 by the user operation unit 302; the speech input unit 102 by the microphone 302a; the extraction unit 103 and the speech recognition unit 104 by the speech recognition unit 312; the display unit 105 by the display unit 303; and the activation unit 106 by the navigation control unit 301.
  • FIG. 4 is a flowchart showing the procedure of speech recognition processing by the speech recognition unit. In the following description, a touch panel is adopted as the user operation unit 302.
  • The speech recognition unit 312 waits until a character is input via the user operation unit 302 (step S401: No loop). The input character is, for example, the first character of the phrase the user intends to utter; it may also be the last character or any character included in the phrase. If an utterance trigger occurs and speech is input without any character having been input, speech recognition is performed by matching against all the standby words, without narrowing down.
  • When a character is input (step S401: Yes), narrowed-down words are extracted based on the input character (step S402). A narrowed-down word is, as described above, a standby word narrowed down under a predetermined condition. The narrowed-down words are then displayed on the display unit 303 (step S403).
  • When the narrowed-down words are displayed in step S403, the user can look at the displayed phrases and decide whether to narrow them down further by inputting more characters. If the user inputs further characters (step S404: Yes), the process returns to step S402 and the subsequent processing is repeated, narrowing the standby words down further.
  • If there is no further character input (step S404: No), an utterance trigger is generated and the process waits until speech is input (step S405: No loop). When speech is input (step S405: Yes), matching is performed between the input speech and the narrowed-down words (step S406), the uttered phrase is recognized (step S407), and the processing according to this flowchart ends.
  • In this example, characters are input; however, it is also possible, for example, to specify an attribute of the phrase (for example, a semantic classification such as place name, song title, or directive) and extract only the phrases having the specified attribute.
  • In this way, the speech recognition unit 312 narrows down the standby words to be matched under the condition specified by the user, and performs the matching process with the input speech only on the narrowed-down phrases. This shortens the time required for the speech recognition processing and improves the speech recognition response of the navigation device 300. Moreover, since speech recognition is performed on phrases already narrowed down to some extent, recognition accuracy can also be improved.
  • FIG. 5 is a diagram showing an example of the input screen for the first character. The extraction of narrowed-down words (step S402) and the display of narrowed-down words (step S403) shown in FIG. 4 are described in detail below, for a case where speech recognition is used to set the destination point.
  • In FIG. 5, a touch panel is adopted as the user operation unit 302, and a character input screen 500 is displayed on the display unit 303. On the character input screen 500, character input keys 511, an input character display section 512, and a narrowed-down word display section 513 are displayed.
  • On the character input keys 511, hiragana characters are arranged in syllabary order. A switch button is also provided for displaying alphanumeric and katakana input keys. The user inputs a desired character by touching its display on the screen, and the input character is shown in the input character display section 512. In the illustrated example, the key corresponding to “sa” among the character input keys 511 has been pressed, and the character “sa” is displayed in the input character display section 512.
  • When a character is input, the speech recognition unit 312 extracts standby words having the input character as their first character as narrowed-down words. The extracted narrowed-down words are displayed in the narrowed-down word display section 513. In the illustrated example, place names having “sa” as their first character, such as “Yuki-ku”, “Saimura”, and “Nishikaien”, are displayed as narrowed-down words.
  • When the scroll button 513a is pressed, the narrowed-down words that could not fit on the screen are displayed.
  • When speech is input, the speech recognition unit 312 performs the matching process between the input speech and the narrowed-down words, and recognizes the phrase uttered by the user. For this reason, in the illustrated example, phrases that do not begin with “sa”, such as “Tokyo”, cannot be recognized. The user can also select a place name directly by touching it in the narrowed-down word display section 513.
  • FIG. 6 is a chart showing an example of narrowed-down words. In FIG. 6, the phrase group 601 shows place names having “sa” as their first character, because “sa” was input on the character input screen 500 shown in FIG. 5. The speech recognition unit 312 recognizes the input speech by the matching process with the phrases in the phrase group 601. As illustrated, there are many place names beginning with “sa”; therefore, only the display phrase group 602 indicated by the dotted line can be shown in the narrowed-down word display section 513 (see FIG. 5) of the character input screen 500. To display phrases other than the display phrase group 602, the scroll button 513a (FIG. 5) is pressed.
  • FIGS. 7 and 8 are diagrams showing examples of the menu screen. In the example of FIGS. 5 and 6, the destination point is set by speech recognition, so the narrowed-down words are limited to place names. In this way, the speech recognition unit 312 extracts narrowed-down words that match the purpose of the speech recognition.
  • In FIG. 7, a menu screen 700 is displayed on the display unit 303. The menu screen 700 is a screen for selecting an operation to be performed by the user. The user selects an operation by touching a desired operation display 711 to 714 or by uttering the desired operation content. As selectable operations, “Set destination point” (operation display 711), “Search for a song” (operation display 712), “View traffic information” (operation display 713), and “Change device settings” (operation display 714) are displayed. When the scroll bar 721 is operated, operation displays for other operations are shown.
  • The setting method selection screen 800 of FIG. 8 is a screen for selecting a method for executing the selected operation content (in the illustrated example, setting the destination point). As in FIG. 7, the user selects a method by touching a desired method display 811 to 814 or by uttering the desired setting method. As selectable methods, “Search in syllabary order” (method display 811), “Search from map” (method display 812), “Search from driving history” (method display 813), and “Search by genre” (method display 814) are displayed. When the scroll bar 821 is operated, method displays for other methods are shown.
  • FIG. 9 is a diagram showing an example of the input screen for the first character when speech recognition is performed on the menu screen. The character input screen 900 shown in FIG. 9 has character input keys 911, an input character display section 912, and a narrowed-down word display section 913. In the illustrated example, the key corresponding to “ko” among the character input keys 911 has been pressed, and the character “ko” is displayed in the input character display section 912. In the narrowed-down word display section 913, standby words having “ko” as their first character are displayed as narrowed-down words; pressing the scroll button 913a displays the words that could not fit. As these standby words, all phrases that can be uttered in operating the navigation device 300 are displayed, without limitation to a specific attribute: for example, directives such as “go here”, facility names (place names) such as “Koshien”, and compound phrases such as “go to Koshien”.
  • FIG. 10 is a chart showing an example of narrowed-down words when speech recognition is performed on the menu screen. The phrase group 1001 shows standby words having “ko” as their first character. On the character input screen 900, the display phrase group 1002 indicated by the dotted line is displayed; to display other phrases, the scroll button 913a (see FIG. 9) is pressed. The phrases in the phrase group 1001 include, besides the directives and place names mentioned above, song titles of music data recorded on the recording medium 305 of the navigation device 300, map scale change instructions, and the like. The user can perform these operations by speech recognition directly from the menu screen 700. A desired operation can therefore be performed without descending through the hierarchy of display screens, reducing the user's operation burden.
  • As described above, with the navigation device 300 according to the present example, inputting the first character of a standby word narrows down the standby words to be processed, so the speech recognition processing can be performed efficiently while reducing the user's operation burden. Narrowing down the standby words to be processed by the first character or the like shortens the time required for the matching process and reduces the processing load. In addition, since the uttered phrase is recognized from among the narrowed-down standby words, the accuracy of speech recognition can be improved. Furthermore, appropriately narrowing down the phrases to be processed from among many standby words reduces the user's burden in setting operations.
  • The speech recognition method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium distributable via a network such as the Internet.

Abstract

A character input section (101) receives an input of a character contained in a part of a phrase or word to be recognized. A speech input section (102) receives an input of the speech to be recognized. An extracting section (103) extracts waiting words containing the inputted character from predetermined waiting words. A speech recognizing section (104) recognizes the inputted speech by using the extracted waiting words. A display section (105) displays the extracted waiting words. A start section (106) starts a predetermined processing depending on the result of the speech recognition. A selecting section (107) selects a desired waiting word from the waiting words displayed by the display section (105). A genre input section (108) receives an input of the genre to which the phrase or word belongs.

Description

Specification
Speech recognition device, speech recognition method, speech recognition program, and recording medium
Technical Field
[0001] The present invention relates to a speech recognition device, a speech recognition method, a speech recognition program, and a recording medium for recognizing spoken speech. However, the use of the present invention is not limited to the above-described speech recognition device, speech recognition method, speech recognition program, and recording medium.
Background Art
[0002] Conventionally, speech recognition techniques for recognizing human speech input via a microphone or the like are known. In such speech recognition, the frequency spectrum of the user's utterance (speech input) is analyzed and compared against phoneme models prepared in advance to identify the phonemes. The identified word model is then compared with standby words registered in advance in a speech recognition dictionary, the matching frequency between the two is calculated, and the uttered phrase is identified (see, for example, Patent Document 1 below).
[0003] Patent Document 1: JP 2000-99546 A
Disclosure of the Invention
Problems to be Solved by the Invention
[0004] However, according to the above-described prior art, the recognition processing takes time in proportion to the number of standby words registered in the speech recognition dictionary. For this reason, when many standby words are registered, one example of a problem is that the response of the speech recognition processing is poor.
[0005] In general, in speech recognition, the more standby words are registered, the more convenient the system is for the user. For example, "parking lot" and "parking" are both words denoting a space in which the same vehicle is parked; since it cannot be known in advance which of them the user will utter, it is more convenient to register both as standby words. To maintain such convenience, the speech recognition processing must be performed efficiently while maintaining the number of registered standby words.
[0006] In addition, when many standby words are registered, there is a higher possibility that phrases with similar acoustic features are registered together, and one example of a problem is that the accuracy of speech recognition decreases.
Means for Solving the Problems
[0007] In order to solve the above problems and achieve the object, a speech recognition device according to the invention of claim 1 comprises: character input means to which a character included in a part of a phrase to be speech-recognized is input; speech input means to which the speech to be recognized is input; extraction means for extracting, from a plurality of preset standby words, standby words that include the character input to the character input means; and speech recognition means for recognizing the speech input to the speech input means using the standby words extracted by the extraction means.
[0008] A speech recognition method according to the invention of claim 7 includes: a character input step of inputting a character included in a part of a phrase to be speech-recognized; a speech input step of inputting the speech to be recognized; an extraction step of extracting, from a plurality of preset standby words, standby words that include the character input in the character input step; and a speech recognition step of recognizing the speech input in the speech input step using the standby words extracted in the extraction step.
[0009] A speech recognition program according to the invention of claim 8 causes a computer to execute the speech recognition method according to claim 7.
[0010] A recording medium according to the invention of claim 9 is a computer-readable recording medium on which the speech recognition program according to claim 8 is recorded.
Brief Description of the Drawings
[0011] [FIG. 1] FIG. 1 is a block diagram showing the functional configuration of a speech recognition device according to the embodiment.
[FIG. 2] FIG. 2 is a flowchart showing the procedure of speech recognition processing by the speech recognition device.
[FIG. 3] FIG. 3 is a block diagram showing the hardware configuration of a navigation device according to the example.
[FIG. 4] FIG. 4 is a flowchart showing the procedure of speech recognition processing by the speech recognition unit.
[FIG. 5] FIG. 5 is a diagram showing an example of the input screen for the first character.
[FIG. 6] FIG. 6 is a chart showing an example of narrowed-down words.
[FIG. 7] FIG. 7 is a diagram showing an example of the menu screen.
[FIG. 8] FIG. 8 is a diagram showing an example of the menu screen.
[FIG. 9] FIG. 9 is a diagram showing an example of the input screen for the first character when speech recognition is performed on the menu screen.
[FIG. 10] FIG. 10 is a chart showing an example of narrowed-down words when speech recognition is performed on the menu screen.
Explanation of Reference Numerals
100 Speech recognition device
101 Character input unit
102 Speech input unit
103 Extraction unit
104 Speech recognition unit
105 Display unit
106 Activation unit
107 Selection unit
108 Genre input unit
Best Mode for Carrying Out the Invention
[0013] Exemplary embodiments of a speech recognition device, a speech recognition method, a speech recognition program, and a recording medium according to the present invention will be described below in detail with reference to the accompanying drawings.
[0014] (Embodiment)
FIG. 1 is a block diagram showing the functional configuration of the speech recognition device according to the embodiment. In FIG. 1, the speech recognition device 100 according to the embodiment comprises a character input unit 101, a speech input unit 102, an extraction unit 103, a speech recognition unit 104, a display unit 105, an activation unit 106, a selection unit 107, and a genre input unit 108.
[0015] To the character input unit 101, a character included in a part of the phrase to be speech-recognized is input. Such a character is a character constituting the phrase, for example its first character. A plurality of characters may be input to the character input unit 101. The position the character occupies in the phrase (for example, first, second, or last) may also be made specifiable.
[0016] To the speech input unit 102, the speech to be recognized is input. The speech input unit 102 realizes its function by, for example, a microphone. A plurality of speech input units 102 may be provided.
[0017] The extraction unit 103 extracts, from a plurality of preset standby words, the standby words that include the character input to the character input unit 101. For example, the extraction unit 103 extracts standby words whose first character is the character input to the character input unit 101. In addition, when the position the input character occupies in the phrase is specified, standby words having the input character at the specified position are extracted.
[0018] The speech recognition unit 104 recognizes the speech input to the speech input unit 102 using the standby words extracted by the extraction unit 103. The speech recognition unit 104 performs speech recognition by, for example, converting the input speech into data and performing a matching process against the extracted standby words (standby word data).
[0019] The display unit 105 displays the standby words extracted by the extraction unit 103. The display unit 105 realizes its function by, for example, a display. When a plurality of standby words are extracted by the extraction unit 103, the plurality of standby words are displayed on the display unit 105. If the number of extracted standby words exceeds the display space of the display unit 105, they may be shown using a scrolling screen or the like.
[0020] The activation unit 106 activates a predetermined process based on the speech recognition result of the speech recognition unit 104. When a standby word is selected by the selection unit 107 described later, the activation unit 106 activates a predetermined process based on the selection result. For example, if the recognized phrase is a phrase that instructs a process, the instructed process is started. If a process has already been started, information needed during that process may be obtained from the speech recognition result of the speech recognition unit 104.
[0021] The selection unit 107 selects a desired standby word from the plurality of standby words displayed on the display unit 105. A desired standby word is, for example, the standby word indicating the phrase the user intends to utter.
[0022] To the genre input unit 108, the genre to which a phrase belongs is input. The genre of a phrase is a classification by the meaning or content of the phrase, such as place names, personal names, and directives. When a genre is input to the genre input unit 108, the extraction unit 103 extracts standby words belonging to the input genre.
[0023] FIG. 2 is a flowchart showing the procedure of speech recognition processing by the speech recognition device. The speech recognition device 100 first waits until a character is input to the character input unit 101 (step S201: No loop). When a character is input (step S201: Yes), the extraction unit 103 extracts standby words including the input character (step S202). The display unit 105 then displays the extracted standby words (step S203).
[0024] Next, it is determined whether speech has been input to the speech input unit 102 (step S204). If speech has been input (step S204: Yes), the input speech is recognized using the standby words extracted in step S202 (step S205). The activation unit 106 then activates a predetermined process based on the speech recognition result (step S206), and the processing according to this flowchart ends.
[0025] On the other hand, if no speech is input in step S204 (step S204: No), it is determined whether one of the standby words displayed on the display unit 105 has been selected (step S207). If one has been selected (step S207: Yes), the activation unit 106 activates a predetermined process based on the selection result (step S208), and the processing according to this flowchart ends. If none has been selected (step S207: No), the process returns to step S204 and the subsequent processing is repeated.
[0026] As described above, according to the speech recognition device 100 of the embodiment, the standby words used for speech recognition can be narrowed down by inputting a character included in the phrase to be recognized. This shortens the time required for speech recognition and makes the speech recognition processing efficient. In addition, narrowing down the standby words that are candidates for the speech recognition result improves the accuracy of speech recognition.
実施例  Example
[0027] (ナビゲーシヨン装置 300のハードウェア構成) 図 3は、実施例に力かるナビゲーシヨン装置のハードウェア構成を示すブロック図で ある。実施例では、実施の形態にかかる音声認識装置 100を、ナビゲーシヨン装置 3 00の入力手段として用いる場合について説明する。図 3において、ナビゲーシヨン装 置 300は、車両に搭載されており、ナビゲーシヨン制御部 301と、ユーザ操作部 302 と、表示部 303と、位置取得部 304と、記録媒体 305と、記録媒体デコード部 306と、 音声出力部 307と、通信部 308と、経路探索部 309と、経路誘導部 310と、案内音生 成部 311と、音声認識部 312と、によって構成される。 [0027] (Hardware configuration of navigation device 300) FIG. 3 is a block diagram showing a hardware configuration of a navigation apparatus that is effective in the embodiment. In the example, a case where the speech recognition apparatus 100 according to the embodiment is used as an input unit of the navigation apparatus 300 will be described. In FIG. 3, a navigation device 300 is mounted on a vehicle, and includes a navigation control unit 301, a user operation unit 302, a display unit 303, a position acquisition unit 304, a recording medium 305, and a recording medium decoding. A unit 306, a voice output unit 307, a communication unit 308, a route search unit 309, a route guidance unit 310, a guidance sound generation unit 311, and a voice recognition unit 312 are configured.
[0028] ナビゲーシヨン制御部 301は、ナビゲーシヨン装置 300全体を制御する。ナビゲー シヨン制御部 301は、たとえば所定の演算処理を実行する CPU (Central Process ing Unit)や、各種制御プログラムを格納する ROM (Read Only Memory)、お よび、 CPUのワークエリアとして機能する RAM (Random Access Memory)など によって構成されるマイクロコンピュータなどによって実現することができる。  The navigation control unit 301 controls the entire navigation device 300. The navigation control unit 301 includes, for example, a CPU (Central Processing Unit) that executes predetermined arithmetic processing, a ROM (Read Only Memory) that stores various control programs, and a RAM (Random) that functions as a work area for the CPU. It can be realized by a microcomputer constituted by an Access Memory).
[0029] また、ナビゲーシヨン制御部 301は、経路誘導に際し、経路探索部 309、経路誘導 部 310、案内音生成部 311との間で経路誘導に関する情報の入出力をおこない、そ の結果得られる情報を表示部 303および音声出力部 307へ出力する。  In addition, the navigation control unit 301 inputs / outputs information on route guidance to / from the route search unit 309, the route guidance unit 310, and the guidance sound generation unit 311, and obtains the result. The information is output to the display unit 303 and the audio output unit 307.
[0030] ユーザ操作部 302は、文字、数値、各種指示など、ユーザによって入力操作された 情報をナビゲーシヨン制御部 301に対して出力する。ユーザ操作部 302の構成とし ては、後述する表示部 303と一体として構成されるタツチパネル、物理的な押下 Z非 押下を検出する押ボタンスィッチ、キーボード、ジョイスティックなど公知の各種形態 を採用することが可能である。  The user operation unit 302 outputs information input by the user, such as characters, numerical values, and various instructions, to the navigation control unit 301. As the configuration of the user operation unit 302, various known forms such as a touch panel configured integrally with a display unit 303 described later, a push button switch for detecting physical press Z non-press, a keyboard, and a joystick may be employed. Is possible.
[0031] また、ユーザ操作部 302は、外部からの音声を入力するマイク 302aを備える。マイ ク 302aから入力された音声は、後述する音声認識部 312によって音声認識される。 これにより、ユーザは音声によって入力操作をおこなうことができる。  [0031] The user operation unit 302 includes a microphone 302a for inputting sound from the outside. The voice input from the microphone 302a is recognized by the voice recognition unit 312 described later. As a result, the user can perform an input operation by voice.
[0032] ユーザ操作部 302は、ナビゲーシヨン装置 300に対して一体に設けられていてもよ V、し、リモコンのようにナビゲーシヨン装置 300から分離して操作可能な形態であって もよい。ユーザ操作部 302は、上述した各種形態のうちいずれか単一の形態で構成 されていてもよいし、複数の形態で構成されていてもよい。ユーザは、ユーザ操作部 302の形態に応じて、適宜入力操作をおこなうことによって情報を入力する。ユーザ 操作部 302の操作によって入力される情報としては、たとえば、探索する経路の目的 地点または出発地点が挙げられる。 [0032] The user operation unit 302 may be provided integrally with the navigation device 300, and may be configured to be operated separately from the navigation device 300, such as a remote controller. The user operation unit 302 may be configured in any one of the various forms described above, or may be configured in a plurality of forms. The user inputs information by appropriately performing an input operation according to the form of the user operation unit 302. User Examples of information input by operating the operation unit 302 include a destination point or a departure point of a route to be searched.
[0033] 目的地点または出発地点の入力は、それぞれの地点の緯度 ·経度や住所を入力 する他、目的地点または出発地点となる施設の電話番号やジャンル、キーワードなど を指定することによって、該当する施設が探索され、その位置を特定することができる 。より詳細には、これらの情報は、後述する記録媒体 305に記録された地図情報に含 まれる背景種別データに基づいて、地図上の一点として特定される。また、後述する 表示部 303に地図情報を表示させ、表示された地図上の一点を指定するようにして ちょい。 [0033] Entering the destination or departure point is applicable by entering the latitude, longitude, and address of each point, as well as specifying the telephone number, genre, keyword, etc. of the facility that is the destination or departure point. The facility is searched and its location can be determined. More specifically, these pieces of information are specified as one point on the map based on background type data included in map information recorded on the recording medium 305 described later. Also, display map information on the display unit 303 described later, and specify a point on the displayed map.
[0034] When speech recognition is performed by the speech recognition unit 312 described later, the first character of the phrase the user intends to utter is entered into the user operation unit 302. When a character is entered into the user operation unit 302 on a predetermined screen, the speech recognition unit 312 performs speech recognition using only the phrases that begin with the entered character as standby words. The character entered here is not limited to the first character; it may be the last character of the phrase or any character the phrase contains.
[0035] The display unit 303 is, for example, a CRT (Cathode Ray Tube), a TFT liquid crystal display, an organic EL display, or a plasma display, and displays necessary information. Specifically, the display unit 303 can be configured by, for example, a video I/F and a video display device connected to the video I/F. The video I/F comprises, for example, a graphics controller that controls the display device as a whole, a buffer memory such as VRAM (Video RAM) that temporarily stores image information ready for immediate display, and a control IC that controls the display device based on the image information output from the graphics controller. The display unit 303 displays icons, cursors, menus, windows, and various other information such as characters and images. It also displays the map information stored on the recording medium 305 described later and information related to route guidance.
[0036] The position acquisition unit 304 comprises a GPS receiver and various sensors, and acquires information on the current position of the device (the current position of the vehicle). When the vehicle enters a predetermined area, such as one where the GPS receiver cannot receive GPS information, the position acquisition unit 304 receives GPS alternative information transmitted from a communication device installed in that area and detects the current position of the vehicle.
[0037] The GPS receiver receives GPS information transmitted from GPS satellites and determines the geometric position relative to the satellites. GPS is an abbreviation of Global Positioning System, a system that accurately determines a position on the ground by receiving radio waves from four or more satellites. The GPS receiver comprises an antenna for receiving radio waves from the GPS satellites, a tuner that demodulates the received radio waves, and an arithmetic circuit that calculates the current position based on the demodulated information.
[0038] The various sensors are sensors mounted on the vehicle, such as a vehicle speed sensor, an angular velocity sensor, a travel distance sensor, and an inclination sensor; the travel locus of the vehicle is determined from the information they output. By using the information output by these on-board sensors together with the information obtained externally by the GPS receiver, the vehicle position can be recognized with higher accuracy.
[0039] The vehicle speed sensor detects the vehicle speed from the output shaft of the transmission of the vehicle on which the navigation device 300 is mounted. The angular velocity sensor detects the angular velocity when the vehicle turns and outputs angular velocity information and relative bearing information. The travel distance sensor counts the pulses of a fixed-period pulse signal output as the wheels rotate, calculates the number of pulses per wheel revolution, and outputs travel distance information based on that count. The inclination sensor detects the inclination angle of the road surface.
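As a rough illustration of the travel distance calculation above, the sketch below converts an accumulated pulse count into distance. The pulses-per-revolution and wheel-circumference constants are assumed values for illustration; the document specifies neither.

    # Pulse-based odometry sketch; both constants are assumptions.
    PULSES_PER_REVOLUTION = 48    # pulses output per wheel revolution (assumed)
    WHEEL_CIRCUMFERENCE_M = 1.93  # wheel circumference in meters (assumed)

    def travel_distance_m(pulse_count: int) -> float:
        """Convert an accumulated wheel pulse count into meters traveled."""
        revolutions = pulse_count / PULSES_PER_REVOLUTION
        return revolutions * WHEEL_CIRCUMFERENCE_M

    print(travel_distance_m(4800))  # 100 revolutions -> 193.0 m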
[0040] The recording medium 305 records various control programs and various information in a computer-readable state. The recording medium 305 accepts writing of information by the recording medium decoding unit 306 and records the written information in a nonvolatile manner. The recording medium 305 can be realized by, for example, an HD (Hard Disk), but it is not limited to an HD: instead of, or in addition to, an HD, media that are removable from the recording medium decoding unit 306 and portable, such as a DVD (Digital Versatile Disk) or a CD (Compact Disk), may be used as the recording medium 305. Nor is it limited to DVD and CD; other removable, portable media such as a CD-ROM (CD-R, CD-RW), an MO (Magneto-Optical disk), or a memory card can also be used.
[0041] The map information stored on the recording medium 305 comprises background data representing features such as buildings, rivers, and the ground surface, and road shape data representing the shapes of roads, and is rendered in two or three dimensions on the display screen of the display unit 303. While the navigation device 300 is guiding a route, the map information recorded on the recording medium 305 and the vehicle position acquired by the position acquisition unit 304 are displayed superimposed.
[0042] Although the map information is recorded on the recording medium 305 in this embodiment, the invention is not limited to this arrangement. The map information need not be recorded on hardware integral to the navigation device 300; it may be provided outside the navigation device 300. In that case, the navigation device 300 acquires the map information over a network, for example through the communication unit 308, and stores it in RAM or the like.
[0043] The recording medium decoding unit 306 controls the reading and writing of information on the recording medium 305. For example, when an HD is used as the recording medium, the recording medium decoding unit 306 is an HDD (Hard Disk Drive). Similarly, when a DVD or CD (including CD-R and CD-RW) is used, the recording medium decoding unit 306 is a DVD drive or a CD drive. When a writable, removable recording medium 305 such as a CD-ROM (CD-R, CD-RW), an MO, or a memory card is used, a dedicated drive device capable of writing information to and reading information from the corresponding medium is used as the recording medium decoding unit 306 as appropriate.
[0044] The audio output unit 307 reproduces guidance sounds by controlling output to one or more connected speakers (not shown). Specifically, the audio output unit 307 can be realized by an audio I/F connected to the speakers. More specifically, the audio I/F can comprise, for example, a D/A converter that performs D/A conversion of digital audio information, an amplifier that amplifies the analog audio signal output from the D/A converter, and an A/D converter that performs A/D conversion of analog audio information.
[0045] The communication unit 308 acquires road traffic information, such as congestion and traffic regulations, regularly or irregularly. The communication unit 308 is also connected to a network and exchanges information with other devices connected to the network, such as servers.
[0046] The communication unit 308 may receive road traffic information at the timing when it is distributed from a VICS (Vehicle Information and Communication System) center, or by periodically requesting it from the VICS center. Road traffic information for a desired area may also be acquired over the network from nationwide VICS information aggregated on a server. The communication unit 308 can be realized by, for example, an FM tuner, a VICS/beacon receiver, a wireless communication device, or other communication equipment.
[0047] The route search unit 309 searches for the optimal route from the departure point to the destination point using the map information stored on the recording medium 305, VICS information acquired via the communication unit 308, and the like. Here, the optimal route is the route that best matches the conditions specified by the user. In general, there are countless routes from a departure point to a destination point, so the items to be considered in the search are set and routes matching those conditions are searched for.
[0048] The route guidance unit 310 generates real-time route guidance information based on the guidance route information found by the route search unit 309, the vehicle position information acquired by the position acquisition unit 304, and the map information obtained from the recording medium 305 via the recording medium decoding unit 306. The route guidance information generated here may take into account the congestion information received by the communication unit 308. The route guidance information generated by the route guidance unit 310 is output to the display unit 303 via the navigation control unit 301.
[0049] The guidance sound generation unit 311 generates tone and voice information corresponding to guidance patterns. That is, based on the route guidance information generated by the route guidance unit 310, it sets virtual sound sources corresponding to guidance points, generates voice guidance information, and outputs it to the audio output unit 307 via the navigation control unit 301.
[0050] The speech recognition unit 312 recognizes speech input via the microphone 302a. The speech recognition unit 312 has an utterance button or the like, for example as part of the user operation unit 302; pressing the utterance button serves as the utterance trigger, and speech input to the microphone 302a after the trigger occurs is recognized. When speech is recognized by the speech recognition unit 312, the navigation control unit 301 performs the processing corresponding to the recognized words.
[0051] For example, when an utterance is made on the destination setting screen and a place name is recognized, the navigation control unit 301 sets the recognized place name as the destination point. Instead of designating the destination on the map shown on the display unit 303, the user can set it by speaking its name. In this way, speech recognition by the speech recognition unit 312 can substitute for operations performed through the user operation unit 302.
[0052] Various speech recognition methods are known. In general, to identify input speech, the frequency distribution of the speech to be recognized is analyzed in advance: for example, time-series information such as the spectrum and the fundamental frequency is extracted as the feature quantities of the input speech, and a speech recognition dictionary stores these patterns in association with each word.
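As a loose illustration of this kind of feature extraction, the sketch below computes a time series of short-time magnitude spectra with NumPy. The frame length and hop size (25 ms and 10 ms at 16 kHz) are assumptions for illustration; the document says only that spectral and fundamental-frequency time-series features are extracted, without fixing a particular front end.

    import numpy as np

    def spectral_features(samples: np.ndarray, frame_len: int = 400,
                          hop: int = 160) -> np.ndarray:
        """Return one magnitude spectrum per frame (rows are frames)."""
        window = np.hanning(frame_len)
        frames = []
        for start in range(0, len(samples) - frame_len + 1, hop):
            frame = samples[start:start + frame_len] * window
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames)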
[0053] When speech to be recognized is input, its frequency spectrum is analyzed and phonemes are identified by comparison and collation against phoneme models prepared in advance. The identified phoneme sequence is then compared and collated, by pattern matching, against the pattern stored for each word in the speech recognition dictionary (hereinafter, standby words), and a similarity score is calculated for each word. The standby word with the highest calculated similarity (the word whose pattern is closest) is recognized as the input speech, and that standby word is output. In other words, the input speech is determined by examining which standby word the frequency distribution pattern of the input word most closely resembles.
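The selection step described here, scoring every standby word and returning the best match, can be sketched as follows. The similarity function is a deliberately crude placeholder over feature matrices; an actual recognizer would use something like dynamic time warping or an HMM likelihood, which the document does not specify.

    import numpy as np

    def similarity(input_feat: np.ndarray, word_feat: np.ndarray) -> float:
        """Placeholder score: negative mean distance over the overlapping
        frames of two feature matrices (illustrative only)."""
        n = min(len(input_feat), len(word_feat))
        return -float(np.mean(np.abs(input_feat[:n] - word_feat[:n])))

    def recognize(input_feat: np.ndarray, dictionary: dict) -> str:
        """Return the standby word whose stored pattern scores highest
        against the input features; dictionary maps word -> feature pattern."""
        return max(dictionary, key=lambda w: similarity(input_feat, dictionary[w]))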
[0054] Because of the processing time matching requires, the speech recognition unit 312 limits the number of standby words subjected to matching in the recognition process. As described above, the speech recognition unit 312 matches the frequency pattern of the input speech against every standby word being processed and then calculates a similarity score for each, so the fewer standby words there are to match, the shorter the processing time. However, if the standby words subjected to matching do not include the word that was uttered, misrecognitions and errors (no matching word) occur frequently, and usability actually deteriorates.
[0055] The speech recognition unit 312 therefore has the user input the first character of the phrase to be recognized, and thereby narrows the standby words down to those that will undergo matching (hereinafter, narrowed words). For example, when "さ" (sa) is entered as the first character, only the standby words beginning with "さ", such as "さいたま市" (Saitama City) and "佐世保市" (Sasebo City), are extracted as narrowed words. When speech recognition is performed, matching is carried out between the input speech and the narrowed words. This improves the efficiency of the recognition process while raising its accuracy.
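A minimal sketch of this narrowing step follows. Since the input keys are hiragana while many standby words are written in kanji, the sketch assumes each standby word is stored with its kana reading and filters on the reading; the document does not describe the dictionary's actual data structure.

    def narrow_by_prefix(standby, prefix: str):
        """standby holds (word, kana reading) pairs; keep the words whose
        reading begins with the entered kana."""
        return [word for word, reading in standby if reading.startswith(prefix)]

    standby = [("さいたま市", "さいたまし"), ("佐世保市", "させぼし"),
               ("東京", "とうきょう")]
    print(narrow_by_prefix(standby, "さ"))  # ['さいたま市', '佐世保市']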
[0056] Narrowing is not limited to input of the first character: for example, the last character may be entered, or the candidates may be narrowed to phrases containing the entered character. Narrowing may also be done by specifying an attribute of the phrase; for example, while a screen for entering a place name, such as the destination setting screen, is displayed, only place names are used as narrowed words.
[0057] Character input is not limited to the touch panel; it may be, for example, handwriting input. In that case, a sensor panel for handwriting input is provided on the user's dominant-hand side. Recognition of the handwritten character may then serve as the utterance trigger that starts the acceptance of speech input.
[0058] The navigation device 300 is configured by the hardware configuration described above.
The functional components of the speech recognition device 100 according to the embodiment are realized as follows: the character input unit 101, the selection unit 107, and the genre input unit 108 by the user operation unit 302; the voice input unit 102 by the microphone 302a; the extraction unit 103 and the activation unit 106 by the navigation control unit 301; the speech recognition unit 104 by the speech recognition unit 312; and the display unit 105 by the display unit 303.
[0059] (Speech recognition processing by the speech recognition unit 312)
Fig. 4 is a flowchart showing the procedure of the speech recognition processing performed by the speech recognition unit. In the following description, a touch panel is adopted as the user operation unit 302. First, the speech recognition unit 312 waits until a character is entered via the user operation unit 302 (step S401: No loop). The character entered here is, for example, the first character of the phrase the user intends to utter; it may instead be the last character or a character contained in the phrase. If an utterance trigger occurs and speech is input without any character having been entered, no narrowing is performed and speech recognition is carried out by matching against the entire set of standby words.
[0060] When a character is entered (step S401: Yes), narrowed words are extracted based on the entered character (step S402). As described earlier, narrowed words are standby words narrowed down under predetermined conditions. The narrowed words are then displayed on the display unit 303 (step S403).
[0061] Once the narrowed words are displayed in step S403, the user can look at the displayed phrases and decide whether to enter further characters to narrow them down further. If the user enters another character (step S404: Yes), the process returns to step S402 and the subsequent steps are repeated, narrowing the standby words down further.
[0062] If no further character is entered in step S404 (step S404: No), the unit waits until an utterance trigger occurs and speech is input (step S405: No loop). When speech is input (step S405: Yes), matching is performed between the input speech and the narrowed words (step S406), the uttered phrase is recognized (step S407), and the processing of this flowchart ends. Although character input is assumed here, it is also possible, for example, to specify an attribute of the phrase (a semantic classification such as place name, song title, or command word) and extract only the phrases having the specified attribute.
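Putting the flowchart together, the loop below is one way the S401 to S407 sequence could look, reusing the hypothetical narrow_by_prefix and recognize helpers sketched earlier. The event interface (next_event returning ('char', c) or ('speech', audio)) and the get_features hook are invented for illustration.

    def recognition_session(standby, dictionary, next_event, get_features):
        """Hypothetical driver for steps S401-S407. standby holds
        (word, reading) pairs; dictionary maps word -> feature pattern."""
        prefix = ""
        candidates = [word for word, _ in standby]  # no input yet: full standby set
        while True:
            kind, payload = next_event()
            if kind == "char":                       # S401/S404: a character arrives
                prefix += payload
                candidates = narrow_by_prefix(standby, prefix)  # S402
                print(candidates)                    # S403: display stub
            elif kind == "speech":                   # S405: utterance trigger + audio
                feats = get_features(payload)
                vocab = {w: dictionary[w] for w in candidates}
                return recognize(feats, vocab)       # S406-S407: match and recognize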
[0063] As described above, the speech recognition unit 312 narrows the standby words subjected to matching according to conditions specified by the user, and performs matching against the input speech only for the narrowed phrases. This shortens the time required for speech recognition and improves the recognition response of the navigation device 300. And because recognition is performed over a set of phrases that has already been narrowed to some extent, recognition accuracy can also be improved.
[0064] (Example display screens when extracting narrowed words)
Fig. 5 is a diagram showing an example of the first-character input screen, illustrating the details of the narrowed-word extraction (step S402) and the narrowed-word display (step S403) shown in Fig. 4. Here, speech recognition is used when setting the destination point. In Fig. 5, a touch panel is adopted as the user operation unit 302, and a character input screen 500 is shown on the display unit 303. The character input screen 500 displays character input keys 511, an input character display field 512, and a narrowed-word display field 513.
[0065] The character input keys 511 are hiragana keys arranged in gojūon (Japanese syllabary) order. A switching button is also provided so that alphanumeric and katakana input keys can be displayed. The user inputs a character by touching its on-screen key, and the entered character appears in the input character display field 512. In the illustrated example, the key corresponding to "さ" among the character input keys 511 has been pressed, and the character "さ" is shown in the input character display field 512.
[0066] When the first character is entered via the user operation unit 302, the speech recognition unit 312 extracts the standby words beginning with that character as narrowed words. The extracted narrowed words are shown in the narrowed-word display field 513. In the illustrated example, because the recognition is for setting a destination, place names beginning with "さ", such as "幸区" (Saiwai-ku), "佐井村" (Sai-mura), and "西海市" (Saikai City), are displayed as narrowed words. Pressing the scroll button 513a displays the narrowed words that do not fit on the screen.
[0067] The user presses the utterance button (not shown) to generate the utterance trigger and utters the desired phrase (a place name beginning with "さ"). The speech recognition unit 312 performs matching between the input speech and the narrowed words and recognizes the phrase the user uttered. In the illustrated example, therefore, a phrase that does not begin with "さ", such as "東京" (Tokyo), cannot be recognized even if it is uttered.
[0068] The user can also select a place name by touching it in the narrowed-word display field 513. In that case, to select a phrase not currently shown in the narrowed-word display field 513, the user presses the scroll button 513a until the desired phrase appears and then touches it.
[0069] Fig. 6 is a chart showing an example of narrowed words. In Fig. 6, the phrase group 601 lists place names beginning with "さ", because "さ" was entered on the character input screen 500 shown in Fig. 5. The speech recognition unit 312 recognizes the input speech by matching it against the phrases in the phrase group 601. As shown, there are many phrases beginning with "さ" even when limited to place names, so only the display phrase group 602 indicated by the dotted line can be shown in the narrowed-word display field 513 of the character input screen 500 (see Fig. 5). To display phrases outside the display phrase group 602, the scroll button 513a (Fig. 5) is pressed.
[0070] Figs. 7 and 8 are diagrams showing an example of a menu screen. In the examples of Figs. 5 and 6, the destination was set by speech recognition, so the narrowed words were limited to place names. In this way, the speech recognition unit 312 extracts narrowed words suited to the purpose of the recognition. In Fig. 7, a menu screen 700 is shown on the display unit 303. The menu screen 700 is a screen for selecting the operation the user wishes to perform. The user selects an operation by touching one of the operation displays 711 to 714 or by uttering the desired operation.
[0071] In the illustrated example, the selectable operations shown are "Set a destination" (operation display 711), "Find a song" (operation display 712), "View traffic information" (operation display 713), and "Change device settings" (operation display 714). Pressing the scroll bar 721 displays further operations.
[0072] To set a destination, the user presses or utters "Set a destination" (operation display 711). The setting method selection screen 800 shown in Fig. 8 is then displayed. The setting method selection screen 800 is a screen for selecting the method by which the chosen operation (in this example, setting the destination) is carried out. As in Fig. 7, the user selects by touching one of the method displays 811 to 814 or by uttering the desired setting method.
[0073] In the illustrated example, the selectable methods shown are "Search in gojūon order" (method display 811), "Search from the map" (method display 812), "Search from driving history" (method display 813), and "Search by genre" (method display 814). Pressing the scroll bar 821 displays further methods.
[0074] Here, the user presses or utters "Search in gojūon order" (method display 811), whereupon the character input screen 500 shown in Fig. 5 is displayed. The user may type in the desired point on the character input screen 500, or may enter only the first character and complete the setting by speech recognition. Because the hierarchy of Figs. 7 and 8 has been traversed, it is clear that what will be entered is a place name that can serve as a destination. The speech recognition unit 312 therefore extracts only place names as recognition targets. Furthermore, when the user enters the first character on the character input screen 500, narrowed words are extracted from the standby words, allowing recognition to be performed still more efficiently and accurately.
[0075] Standby words can also be narrowed by the first character when speech recognition is performed on the menu screen 700 shown in Fig. 7. When recognition is performed on the menu screen 700, it is carried out over all standby words. This allows the device to await not just a specific operation but the full range of operations the navigation device 300 offers at the same time, sparing the user the trouble of stepping through the hierarchy to reach a setting screen.
[0076] Fig. 9 is a diagram showing an example of the first-character input screen used when performing speech recognition on the menu screen. Pressing the utterance button while the menu screen 700 of Fig. 7 is displayed brings up the character input screen 900 shown in Fig. 9 on the display unit 303. The character input screen 900 displays character input keys 911, an input character display field 912, and a narrowed-word display field 913.
[0077] In the illustrated example, the key corresponding to "こ" (ko) among the character input keys 911 has been pressed, and the character "こ" is shown in the input character display field 912. Standby words beginning with "こ" are displayed as narrowed words; pressing the scroll button 913a displays the words that do not fit on the screen. These standby words are not limited to a specific attribute: all phrases that might be uttered in operating the navigation device 300 are displayed, for example command phrases such as "ここへ行く" ("Go here"), facility names (place names) such as "甲子園" (Koshien), and compound phrases such as "甲子園へ行く" ("Go to Koshien").
[0078] Fig. 10 is a chart showing an example of narrowed words when performing speech recognition on the menu screen. In Fig. 10, the phrase group 1001 lists standby words beginning with "こ". The display phrase group 1002 indicated by the dotted line is shown in the narrowed-word display field 913 of the character input screen 900 of Fig. 9. To display phrases outside the display phrase group 1002, the scroll button 913a (see Fig. 9) is pressed.
[0079] Besides the command phrases and place names mentioned above, the phrases in the phrase group 1001 include the titles of music data recorded on the recording medium 305 of the navigation device 300, map scale-change commands, and the like. Through speech recognition, the user can perform these operations directly from the menu screen 700. The desired operation can thus be carried out without stepping through the hierarchy of display screens, reducing the user's operational burden.
[0080] Ordinarily, awaiting all of the navigation device 300's operations at once means an enormous number of standby words, so speech recognition takes a long time and the response becomes slow. In the navigation device 300 according to this embodiment, having the user enter the first character of the standby word narrows down the standby words to be processed, reducing the user's operational burden while keeping the recognition processing efficient. Appropriately narrowing the many standby words down to those actually subject to processing thus lightens the burden on the user.
[0081] As described above, according to the navigation device 300 of this embodiment, narrowing the standby words to be processed by their first character or the like shortens the time required for matching and reduces the processing load. Because the applicable phrase is recognized from the narrowed standby words, recognition accuracy is also improved. Furthermore, appropriately narrowing the many standby words down to those subject to processing reduces the user's operational burden during setting operations.
[0082] The speech recognition method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium that can be distributed over a network such as the Internet.

Claims

What is claimed is:
[1] A speech recognition apparatus characterized by comprising:
character input means for inputting a character included in a part of a phrase to be speech-recognized;
voice input means for inputting the speech to be recognized;
extraction means for extracting, from a plurality of preset standby words, the standby words that include the character entered into the character input means; and
speech recognition means for recognizing the speech input to the voice input means, using the standby words extracted by the extraction means.
[2] The speech recognition apparatus according to claim 1, characterized by comprising display means for displaying the standby words extracted by the extraction means.
[3] The speech recognition apparatus according to claim 1, characterized by comprising activation means for starting predetermined processing based on the speech recognition result of the speech recognition means.
[4] The speech recognition apparatus according to claim 2, characterized by comprising:
selection means for selecting a desired standby word from the plurality of standby words displayed by the display means; and
activation means for starting predetermined processing based on the selection result of the selection means.
[5] The speech recognition apparatus according to claim 1, characterized in that the extraction means extracts the standby words whose first character is the character entered into the character input means.
[6] The speech recognition apparatus according to any one of claims 1 to 5, characterized by comprising genre input means for inputting the genre to which the phrase belongs, wherein the extraction means extracts the standby words belonging to the genre entered through the genre input means.
[7] A speech recognition method characterized by including:
a character input step of inputting a character included in a part of a phrase to be speech-recognized;
a voice input step of inputting the speech to be recognized;
an extraction step of extracting, from a plurality of preset standby words, the standby words that include the character entered in the character input step; and
a speech recognition step of recognizing the speech input in the voice input step, using the standby words extracted in the extraction step.
[8] A speech recognition program characterized by causing a computer to execute the speech recognition method according to claim 7.
[9] A computer-readable recording medium characterized by having the speech recognition program according to claim 8 recorded thereon.
PCT/JP2006/310673 2005-06-21 2006-05-29 Speech recognizing device, speech recognizing method, speech recognizing program, and recording medium WO2006137246A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005181065 2005-06-21
JP2005-181065 2005-06-21

Publications (1)

Publication Number Publication Date
WO2006137246A1 true WO2006137246A1 (en) 2006-12-28

Family

ID=37570280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/310673 WO2006137246A1 (en) 2005-06-21 2006-05-29 Speech recognizing device, speech recognizing method, speech recognizing program, and recording medium

Country Status (1)

Country Link
WO (1) WO2006137246A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11305790A (en) * 1998-04-23 1999-11-05 Denso Corp Voice recognition device
JP2002123279A (en) * 2000-10-16 2002-04-26 Pioneer Electronic Corp Institution retrieval device and its method
JP2002350146A (en) * 2001-05-25 2002-12-04 Mitsubishi Electric Corp Navigation device


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010147624A (en) * 2008-12-17 2010-07-01 Konica Minolta Business Technologies Inc Communication device, search processing method and search processing program
JP2011243011A (en) * 2010-05-19 2011-12-01 Yahoo Japan Corp Input support device, extraction method, program and information processor
JP2013222229A (en) * 2012-04-12 2013-10-28 Konica Minolta Inc Input operation device, image forming apparatus including the device, input operation method, and input operation program
WO2017219991A1 (en) * 2016-06-23 2017-12-28 华为技术有限公司 Optimization method and apparatus suitable for model of pattern recognition, and terminal device
CN107545889A (en) * 2016-06-23 2018-01-05 华为终端(东莞)有限公司 Suitable for the optimization method, device and terminal device of the model of pattern-recognition
CN107545889B (en) * 2016-06-23 2020-10-23 华为终端有限公司 Model optimization method and device suitable for pattern recognition and terminal equipment
US10825447B2 (en) 2016-06-23 2020-11-03 Huawei Technologies Co., Ltd. Method and apparatus for optimizing model applicable to pattern recognition, and terminal device
JP2019133025A (en) * 2018-01-31 2019-08-08 トヨタ自動車株式会社 Information processing device and information processing method
JP7056185B2 (en) 2018-01-31 2022-04-19 トヨタ自動車株式会社 Information processing equipment and information processing method


Legal Events

Date Code Title Description
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase; Ref country code: DE
122 Ep: pct application non-entry in european phase; Ref document number: 06746951; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase; Ref country code: JP