WO2006025106A1 - Speech recognition system, speech recognition method, and program therefor - Google Patents
Speech recognition system, speech recognition method, and program therefor
- Publication number
- WO2006025106A1 (PCT/JP2004/012626)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speaker
- voice
- preset information
- speech
- input
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 23
- 238000012545 processing Methods 0.000 claims description 31
- 238000000926 separation method Methods 0.000 claims description 13
- 238000010586 diagram Methods 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 230000004044 response Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 230000007423 decrease Effects 0.000 description 3
- 206010039203 Road traffic accident Diseases 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012880 independent component analysis Methods 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Definitions
- Speech recognition system, speech recognition method, and program thereof
- the present invention relates to a voice recognition system, a voice recognition method, and a program thereof.
- a technique is known for separating speech based on the directivity of a microphone (see Patent Documents 1 to 3).
- related techniques are also described in Non-Patent Documents 1 to 3.
- the algorithm used for this signal processing is BSS (Blind Source Separation), which uses independent component analysis (ICA) to separate sound sources using only the received audio signals.
- ICA: independent component analysis
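The BSS/ICA idea mentioned above can be illustrated with a toy experiment: mix two independent non-Gaussian signals, then recover them from the mixtures alone by whitening and searching for the rotation that maximizes non-Gaussianity. This is a crude stand-in for ICA, not the algorithm of the cited documents; all signals and matrices below are synthetic.

```python
import numpy as np

# Toy blind source separation: two independent uniform sources are mixed
# by an "unknown" matrix; only the mixtures x are observed.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # mixing matrix, unknown to the separator
x = A @ s

# Step 1: whiten the mixtures (decorrelate, unit variance).
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

# Step 2: after whitening, separation reduces to a rotation; pick the
# angle that maximizes non-Gaussianity (|excess kurtosis|) of the outputs.
best_score, recovered = -np.inf, None
for theta in np.linspace(0.0, np.pi / 2, 180):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    y = R @ z
    kurt = np.mean(y ** 4, axis=1) - 3.0 * np.mean(y ** 2, axis=1) ** 2
    score = np.abs(kurt).sum()
    if score > best_score:
        best_score, recovered = score, y
# `recovered` now matches the sources up to order, sign, and scale.
```

Uniform sources have negative excess kurtosis while their mixtures are closer to Gaussian, so maximizing |kurtosis| undoes the mixing; real ICA implementations use fixed-point iterations rather than a grid search.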
- Patent Document 1 Japanese Patent Application Laid-Open No. 2003-040992 (Claim 1)
- Patent Document 2 JP-A-11-298988 (Claim 1)
- Patent Document 3 Japanese Patent Laid-Open No. 2001-337694 (Claim 1)
- Non-Patent Document 2: S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, SAM-P2-5, pp. 3140-3143, June 2000.
- Non-Patent Document 3: Hiroshi Saruwatari, Katsuyuki Sawai, et al., "In-vehicle speech recognition using blind sound source separation and subband elimination processing," IEICE Technical Report, Vol. 102, No. 35, pp. 7-12.
- a CPU (Central Processing Unit)
- the problem is that the time required to recognize the voice command increases, or, to shorten that time, a CPU with high processing capacity must be used, which leads to increased cost.
- another problem is that the speech recognition rate drops when the position from which the speaker speaks changes with the speaker's build or the way the seat position and inclination angle are adjusted.
- the present invention solves the above problems; its object is to provide a voice recognition system that suppresses the consumption of CPU resources when recognizing a speaker's voice command and whose voice recognition rate does not decrease even when the position from which the speaker speaks changes.
- the voice recognition system includes a storage device that stores preset information indicating the sound source position of a speaker's voice; referring to the preset information of the speaker stored in the storage device, the speech recognition unit separates the voice of the speaker from the speech input through the microphone and performs speech recognition.
- the speech recognition system of the present invention further includes a sensor that detects the position of the speaker's seat; the storage device stores preset information for each position of the speaker's seat, and the preset information search unit acquires the seat position of the speaker from the sensor, retrieves the corresponding preset information from the storage device based on that position, and outputs it to the voice recognition unit.
- since the voice recognition system stores the sound source position for each speaker in advance, sound source separation between the speaker's voice and noise is easy even when different speakers (drivers) use the system. In addition, the voice recognition rate of voice commands can be kept from dropping when the position from which the speaker speaks changes with the speaker's build or the way the seat is adjusted.
- because the voice recognition system performs the voice separation processing based on the sound source position information stored in advance, the CPU resources required for voice separation are saved. The time required for the voice separation (voice recognition) process can therefore be shortened, and the operation response when the speaker operates the in-vehicle terminal device by voice command improves; in other words, it becomes easier for the speaker to operate by voice command.
- FIG. 1 is a block diagram showing the speech recognition system of the present embodiment.
- the speech recognition system of the present embodiment includes a microphone (speech input means) 14 that accepts speech input from a speaker 31 (31A, 31B) in a vehicle 1, a navigation device (in-vehicle information terminal) 10, speakers (sound output means) 15 (15A, 15B) that output audio data from the navigation device 10, a monitor (display means) 13 that outputs image data from the navigation device 10, a remote controller (input means) 18 with which the speaker 31 (31A, 31B) inputs various information to the navigation device 10, and a receiver 19 that receives the information input from the remote controller 18.
- Microphone 14 is connected to navigation device 10 and receives the voice input of speaker 31 (31A, B), and outputs this voice information to navigation device 10.
- the microphone 14 includes at least two microphones, as shown in FIG. 1.
- the microphone 14 may also consist of three or more microphones.
- the remote controller 18 is an input means for inputting various information such as the ID of the speaker 31 (31A, B) and an instruction for calling preset information to the navigation device 10.
- the receiver 19 receives information input via the remote controller 18 and outputs this information to the navigation device 10.
- the remote controller 18 and the receiver 19 may perform wireless communication using infrared rays, Bluetooth, or the like, or may be connected by a cable or the like to perform wired communication.
- the monitor 13 may allow the speaker 31 (31A, B) to input various information by touching the screen. That is, the monitor 13 may be provided with a touch panel function.
- the remote controller 18 may be replaced by a mobile phone or a PDA (Personal Digital Assistant) having a predetermined communication function.
- FIG. 2 (a) is a block diagram of the speech recognition system
- FIG. 2 (b) is a block diagram showing functions realized by the CPU of FIG. 2 (a).
- the navigation device 10 comprises an A/D (Analog/Digital) converter 16, a CPU (Central Processing Unit) 17, a storage device 21, and a D/A (Digital/Analog) converter 24.
- the A/D (Analog/Digital) converter 16 converts the voice command of the speaker 31 input from the microphone 14 from an analog waveform to a digital signal, and outputs the voice command converted to a digital signal to the CPU 17.
- as shown in FIG. 2(b), the CPU 17 comprises a sound source position specifying unit 23, a speech recognition unit 20, a preset information search unit 25, and a navigation processing unit 22.
- Each configuration shown here is realized by the CPU 17 executing a sound source position specifying program, a speech recognition program, and the like stored in the storage device 21.
- upon receiving the voice command converted into a digital signal by the A/D converter 16, the sound source position specifying unit 23 computes the phase information (advance and delay) and intensity distribution of the voice command, analyzes the directivity of the utterance, and identifies the sound source position of the voice of the speaker 31. This sound source position is then registered (stored) in the storage device 21 as the preset information of the speaker 31 (details are described later).
- the sound source position is specified using known techniques (Patent Documents 1 to 3, Non-Patent Documents 1 to 3), and the sound source position specifying accuracy here is about ±5 cm.
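The phase (advance/delay) computation described above amounts to estimating the time difference of arrival (TDOA) between microphones; source positions are then triangulated from those differences. A minimal numpy sketch of the TDOA step, illustrative only — the patent does not prescribe this particular estimator, and the signals below are synthetic:

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival between two microphone
    signals from the peak of their cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)  # positive: sig_a arrives later
    return lag / fs

# Synthetic check: the same waveform reaches mic A 5 samples after mic B.
fs = 16000
rng = np.random.default_rng(1)
src = rng.standard_normal(1024)
delay = 5
mic_b = src
mic_a = np.concatenate([np.zeros(delay), src[:-delay]])
tdoa = estimate_tdoa(mic_a, mic_b, fs)
# With the speed of sound c ~ 343 m/s, the inter-mic path difference is:
path_diff = 343.0 * tdoa
```

A 0.0625 ms sampling interval at 16 kHz corresponds to roughly 2 cm of path difference per sample, which is consistent with the ±5 cm localization accuracy stated above.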
- the preset information search unit 25 searches the preset information of the speaker 31 from the storage device 21 in response to the input of the ID of the speaker 31 from the remote controller 18 or the like. Then, the retrieved preset information is transferred to the voice recognition unit 20.
- the preset information search unit 25 may display the preset information stored in the storage device 21 on the monitor 13 and output a screen prompting the speaker 31 to select and input preset information.
- upon receiving voice input via the microphone 14, the voice recognition unit 20 refers to the preset information of the speaker 31 and creates a voice signal in which the directivity of the voice command is set, thereby separating the voice of the speaker 31 from the input voice. It then determines what command the separated voice (voice command) signal indicates; that is, the voice command is recognized as a predetermined command by referring to the voice dictionary (information indicating, for each voice signal, the command that signal means) stored in the storage device 21.
- when the navigation processing unit 22 receives a command output from the voice recognition unit 20, it performs various types of navigation processing based on that command.
- the storage device 21 stores the preset information, the sound source position specifying program, the voice recognition program, the preset information registration program, the voice dictionary, and the like, and is configured by a hard disk, a nonvolatile memory, or the like.
- the voice dictionary is information indicating a command that the voice signal means for each voice signal.
- FIG. 3 is a diagram illustrating preset information stored in the storage device of FIG.
- the preset information stores, for each speaker 31 (passenger of the vehicle 1), information on the sound source position when the speaker 31 utters a voice command.
- the sound source position when the speaker 31A speaks is (X, Y), and this preset information is stored in the storage device 21 as coordinate position data.
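The preset-information table of FIG. 3 can be pictured as a small keyed store: one coordinate record per registered speaker, retrievable by ID. The IDs, names, and coordinates below are invented for illustration; the patent does not specify a storage format.

```python
from dataclasses import dataclass

@dataclass
class PresetInfo:
    speaker_id: str
    name: str
    source_pos: tuple  # (x, y) sound source position, e.g. in metres

# Storage-device contents keyed by speaker ID (values are illustrative).
preset_store = {
    "0001": PresetInfo("0001", "speaker_31A", (0.35, 0.62)),
    "0002": PresetInfo("0002", "speaker_31B", (-0.40, 0.60)),
}

def search_preset(speaker_id):
    """Preset-information search: look up a speaker's stored sound source
    position by ID, as the preset information search unit 25 does."""
    return preset_store.get(speaker_id)
```

Extending each record with a height coordinate gives the three-dimensional (X, Y, Z) variant discussed below.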
- image data indicating the sound source position when the speaker 31 speaks (see FIG. 3) may also be created and output.
- the speaker 31 may register the preset information every time operation of the vehicle 1 starts. Alternatively, if it is determined in advance that a certain person is the driver (speaker), that person's preset information is registered, and when the speaker 31 starts driving the vehicle 1, the speaker 31 inputs an ID or the like to the navigation device 10 to call up the preset information.
- this preset information may also include the sound source position of the speaker 31B in the front passenger seat (see FIG. 3) and the sound source positions of passengers in the rear seats (not shown). In this way, speakers 31 other than the driver can also operate the navigation device 10 with voice commands.
- the information on the sound source position in the preset information is assumed here to be stored as two-dimensional information (X, Y), but information on height may be added so that it is stored as three-dimensional information (X, Y, Z).
- in that case, the navigation device 10 acquires sound from three microphones 14 so that the CPU 17 can calculate the directivity of the sound in the height direction.
- by obtaining information on the height of the sound source position of the speaker 31, the navigation device 10 allows the CPU 17 to identify the accurate sound source position and sound directivity of the speaker 31, making the arithmetic processing easier.
- the accuracy of the sound source position of the speaker 31 stored in the preset information is about ±5 cm, as described above.
- the preset information stored in the storage device 21 is called up by inputting an ID or the like from the remote controller 18 (or from the monitor 13, if the monitor 13 has a touch panel function).
- the preset information search unit 25 searches the storage device 21 for the preset information of the speaker 31 using this ID as a key and calls up this information.
- the ID of the speaker 31 may be manually input by the speaker 31 using the keys of the remote controller 18, or may be stored in advance in a storage unit (not shown) of the remote controller 18 and transmitted to the navigation device 10.
- FIG. 4 is a flowchart showing a preset information registration procedure according to the present embodiment.
- the preset information registration procedure (execution processing of the preset information registration program by the CPU 17) in the present embodiment will be described with reference to FIG. 4 (see FIG. 1 and FIG. 3 as appropriate).
- a case where the monitor 13 (see FIGS. 1 and 2) has a touch panel and the speaker 31 performs various inputs by touching the monitor 13 will be described as an example.
- when the CPU 17 receives, from the monitor 13 of the vehicle 1, an instruction to start registration of preset information, it reads the information registration screen for the speaker 31 from the storage device 21 and outputs it to the monitor 13.
- the CPU 17 receives input of the information of the speaker 31 (for example, the name and ID of the speaker 31) via the monitor 13 (step S401) and stores this information in the storage device 21.
- the CPU 17 then outputs from the speaker 15 voice data (voice guidance) prompting the speaker 31 to utter a voice command (step S402).
- for example, the CPU 17 outputs from the speaker 15 a voice guidance saying, "Initial setting of the utterance position (sound source position) will be performed. Press the utterance position setting start button on the monitor."
- the CPU 17 then outputs from the speaker 15 guidance such as "Hold the steering wheel and take a normal driving posture. After the tone, repeat the voice command for 10 seconds; a beep will signal the end." Voices stored in advance in the storage device 21, such as "Destination setting", "Restoration guidance", and "Reroute", are then output from the speaker 15, prompting the speaker 31 to utter these voice commands.
- the CPU 17 receives an input of a voice command uttered by the speaker 31 via the microphone 14 (step S403).
- the input voice command is a voice command converted into a digital signal by the A / D converter 16.
- the sound source position specifying unit 23 of the CPU 17 calculates the phase information (advance and delay) and intensity distribution of the voice command converted into a digital signal, analyzes the directivity of the utterance, and identifies the sound source position of the utterance of the speaker 31 based on the analyzed directivity information (step S404). This sound source position is then registered in the storage device 21 as the preset information of the speaker 31 (step S405), and the registration process ends. When the preset information is registered, the information (ID, etc.) of the speaker 31 input in step S401 is included with it, so that the preset information search unit 25 can later call up the preset information of the speaker 31 from the storage device 21 using the ID of the speaker 31 as a key.
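The registration flow of steps S401–S405 can be sketched as one function whose microphone and locator dependencies are stubbed out. All names and values here are illustrative, not the patent's implementation:

```python
def register_preset(store, speaker_id, locate_source, record_command):
    """Sketch of the registration flow: store the speaker's ID (S401),
    prompt for and capture a voice command (S402-S403), locate the sound
    source (S404), and save the result as preset information (S405)."""
    command_audio = record_command()           # S402-S403: guidance + capture
    source_pos = locate_source(command_audio)  # S404: phase/intensity analysis
    store[speaker_id] = source_pos             # S405: register, keyed by ID
    return source_pos

# Stub dependencies standing in for the microphone and the locator.
store = {}
pos = register_preset(
    store, "0001",
    locate_source=lambda audio: (0.35, 0.62),  # pretend analysis result
    record_command=lambda: [0.0] * 160,        # pretend audio frame
)
```

Keying the store by speaker ID is what later lets the preset information search unit retrieve the entry in step S502.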
- FIG. 5 is a flowchart showing a speech recognition processing procedure in the present embodiment.
- the speech recognition processing (execution of the speech recognition program by the CPU 17) in the present embodiment will be described with reference to FIG. 5 (see FIGS. 1 and 4 as appropriate).
- the preset information search unit 25 uses the ID as a key to retrieve the preset information of the speaker 31 from the storage device 21 and calls up this preset information (step S502). The preset information is then passed to the voice recognition unit 20.
- when the voice command of the speaker 31 is input via the microphone 14 (step S503), the voice of the voice command is recognized by referring to the preset information of the speaker 31 retrieved by the preset information search unit 25 (step S504).
- specifically, the voice recognition unit 20 refers to the preset information of the speaker 31 and identifies the directivity of the voice (voice command). Next, based on this directivity, the voice input via the microphone 14 is separated into vehicle interior noise (for example, noise generated when traveling through a tunnel) and the voice of the voice command. The voice dictionary in the storage device 21 is then consulted to specify the command meant by the separated voice command.
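One standard way to "set the directivity" toward a stored source position, as described above, is delay-and-sum beamforming: compensate each channel's known arrival delay and sum, so that sound from the preset position adds coherently while sound from elsewhere does not. A toy numpy sketch — the patent does not name a specific separation algorithm, and the signals below are synthetic:

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Steer a microphone array toward a stored source position by
    compensating per-channel delays (in samples) and averaging."""
    n = min(len(m) for m in mics)
    out = np.zeros(n)
    for sig, d in zip(mics, delays):
        out += np.roll(sig, -d)[:n]  # np.roll wraps; fine for this synthetic check
    return out / len(mics)

fs = 16000
t = np.arange(512) / fs
speech = np.sin(2 * np.pi * 300 * t)
# Mic 1 hears the speech 4 samples later than mic 0 (geometry-dependent).
mic0 = speech
mic1 = np.roll(speech, 4)
aligned = delay_and_sum([mic0, mic1], delays=[0, 4])
```

The per-channel delays follow directly from the (X, Y) preset position and the microphone geometry, which is why a stored position spares the CPU the cost of blind separation.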
- the speech recognition unit 20 outputs the command specified in step S504 to the navigation processing unit 22 (step S505), and the navigation processing unit 22 performs navigation processing according to this command (step S506).
- the navigation processing here is, for example, outputting navigation image data to the monitor 13 in accordance with the command output from the voice recognition unit 20, or outputting navigation voice to the speaker 15 via the D/A converter 24.
- as described above, the voice recognition unit 20 refers to the preset information when separating voice commands, which reduces the speech recognition processing load on the CPU 17 compared with the conventional case.
- the ID of the speaker 31 is input from the monitor (touch panel) 13.
- a wireless entry key that locks and unlocks the doors of the vehicle 1 wirelessly may also be used. That is, when a door of the vehicle 1 is opened, the unique ID (the ID of the speaker 31) transmitted from the wireless entry key is obtained via the receiver 19, and the preset information search unit 25 may call up the preset information of the speaker 31 based on this ID and pass it to the speech recognition unit 20.
- the driver can easily use the navigation device 10 of the present embodiment.
- the storage device 21 stores information on standard sound source positions (standard preset information) for each seat position (driver seat, passenger seat, right rear seat, left rear seat, etc.).
- the sound source position identifying unit 23 may then identify the sound source position with reference to this standard preset information.
- when a seat position is input, the information on the standard sound source position for that seat position is called up from the storage device 21. The sound source position specifying unit 23 then specifies the sound source position of the speaker 31 based on the standard sound source position information and the voice command acquired from the speaker 31. In this way, the sound source position specifying unit 23 can create more accurate sound source position preset information, and the load of the sound source position specifying process in the sound source position specifying unit 23 can be reduced.
- the voice recognition unit 20 may also perform the voice recognition processing of the speaker 31 based on the information on the standard sound source position described above. That is, when the preset information search unit 25 receives a selection input of the seat position of the speaker 31, it reads the information on the standard sound source position for that seat position (standard preset information) from the storage device 21 and passes it to the voice recognition unit 20. The voice recognition unit 20 then sets the directivity of the speaker 31 based on the standard sound source position and performs voice separation and voice recognition of the voice command. In this way, a person who has not registered preset information (for example, a passenger in the front passenger seat) can easily act as a temporary speaker 31.
- the vehicle 1 may include sensors that detect the fore-and-aft shift amount (seat position) of the seat, the inclination angle of the backrest, and so on, and the preset information search unit 25 may search for preset information based on the detection results of these sensors.
- in this case, preset information indicating the sound source position for each fore-and-aft shift amount of the seat of the vehicle 1, each inclination angle of the backrest, and so on is registered in the storage device 21 in advance. When the preset information search unit 25 obtains the seat's fore-and-aft shift amount, the backrest's inclination angle, and so on from the sensors, it retrieves the corresponding preset information from the storage device 21, and the speech recognition unit 20 performs speech recognition based on the retrieved preset information. This reduces the speech recognition processing load on the speech recognition unit 20.
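The sensor-driven lookup described above can be sketched as a nearest match over stored (seat shift, recline angle) keys, since sensor readings rarely hit a registered value exactly. The table contents, units, and distance weighting below are all illustrative assumptions, not from the patent:

```python
def nearest_preset(presets, seat_shift, recline_deg):
    """Return the stored sound source position whose (seat shift, recline
    angle) key is closest to the sensor readings. The 10-degree weighting
    that trades angle against shift is an arbitrary illustrative choice."""
    def dist(key):
        s, r = key
        return (s - seat_shift) ** 2 + ((r - recline_deg) / 10.0) ** 2
    return presets[min(presets, key=dist)]

# Preset table: (shift in cm, recline in degrees) -> sound source position.
presets = {
    (0, 100): (0.35, 0.62),
    (5, 100): (0.38, 0.60),
    (0, 110): (0.33, 0.66),
}
pos = nearest_preset(presets, seat_shift=1, recline_deg=102)
```

The returned position then feeds the same directivity-setting step as an ID-keyed preset would.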
- the speaker 31 inputs an instruction to confirm whether or not the voice command is correctly recognized by the navigation device 10 via the remote controller 18.
- the CPU 17 then calls up noise data stored in the storage device 21 (for example, noise data recorded while traveling through a tunnel at 100 km/h) and outputs it from the speaker 15.
- the speaker 31 utters a voice command under such noise, and the CPU 17 performs a process of specifying the voice command uttered by the speaker 31 as in Steps S503 and S504 of FIG.
- based on the content of the specified command, the CPU 17 refers to the text-to-speech conversion table recorded in the storage device 21 and converts the command into a speech synthesis signal, which is converted into an analog waveform by the D/A converter 24 and output from the speaker 15 as synthesized speech. That is, the navigation device 10 is made to repeat back the voice command input by the speaker 31.
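The repeat-back confirmation described above reduces to: recognize the command under playback noise, synthesize the recognized result, and compare it with what was uttered. A stubbed sketch — the recognizer and text-to-speech here are placeholder lambdas, not the patent's implementation:

```python
def confirm_recognition(recognize, synthesize, uttered_command, audio):
    """Repeat-back check: recognize the captured audio, synthesize the
    result for playback, and report whether it matches the utterance."""
    recognized = recognize(audio)
    playback_text = synthesize(recognized)  # what the speaker hears repeated
    return playback_text == uttered_command

ok = confirm_recognition(
    recognize=lambda audio: "destination setting",  # stub recognizer
    synthesize=lambda cmd: cmd,                     # stub TTS: echoes the text
    uttered_command="destination setting",
    audio=[0.0] * 160,
)
```

In the actual system the comparison is performed by the speaker listening to the playback; a mismatch is the cue to re-register the preset information.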
- if the synthesized speech (repeated voice command) output from the speaker 15 matches the voice command uttered by the speaker 31, the navigation device 10 is recognizing the voice command correctly.
- if the synthesized speech (repeated voice command) output from the speaker 15 differs from the voice command uttered by the speaker 31, the navigation device 10 has not recognized the voice command correctly, and the speaker 31 can take measures such as registering the preset information again.
- the present invention is not limited to the above-described embodiments, and can be applied without departing from the spirit of the invention.
- the case where the voice recognition system of the present invention is applied to a navigation device has been described as an example, but may be applied to other in-vehicle information terminals.
- the noise output from the speaker 15 may use sound data stored in the storage device 21, or sound data stored in a storage medium such as a CD.
- the speech recognition system can be realized by a computer and a program, and the program can be provided stored in a computer-readable storage medium (such as a CD-ROM).
- the program can be provided through a network.
- the computer system includes software such as an OS (Operating System) and hardware such as peripheral devices.
- FIG. 1 is a block diagram showing a configuration of a voice recognition system according to the present exemplary embodiment.
- FIG. 2 (a) is a block diagram of the speech recognition system
- FIG. 2 (b) is a block diagram showing functions realized by the CPU of FIG. 2 (a).
- FIG. 3 is a diagram illustrating preset information stored in the storage device of FIG. 2 (a).
- FIG. 4 is a flowchart showing a preset information registration procedure in the present embodiment.
- FIG. 5 is a flowchart showing a speech recognition processing procedure in the present embodiment.
Explanation of symbols
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Navigation (AREA)
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006515454A JP4478146B2 (ja) | 2004-09-01 | 2004-09-01 | 音声認識システム、音声認識方法およびそのプログラム |
PCT/JP2004/012626 WO2006025106A1 (ja) | 2004-09-01 | 2004-09-01 | 音声認識システム、音声認識方法およびそのプログラム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/012626 WO2006025106A1 (ja) | 2004-09-01 | 2004-09-01 | 音声認識システム、音声認識方法およびそのプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006025106A1 true WO2006025106A1 (ja) | 2006-03-09 |
Family
ID=35999770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/012626 WO2006025106A1 (ja) | 2004-09-01 | 2004-09-01 | 音声認識システム、音声認識方法およびそのプログラム |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP4478146B2 (ja) |
WO (1) | WO2006025106A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009025714A (ja) * | 2007-07-23 | 2009-02-05 | Xanavi Informatics Corp | 車載装置および音声認識方法 |
JP2009036810A (ja) * | 2007-07-31 | 2009-02-19 | National Institute Of Information & Communication Technology | 近傍場音源分離プログラム、及びこのプログラムを記録したコンピュータ読取可能な記録媒体、並びに近傍場音源分離方法 |
WO2012160602A1 (ja) * | 2011-05-24 | 2012-11-29 | 三菱電機株式会社 | 目的音強調装置およびカーナビゲーションシステム |
US9583119B2 (en) | 2015-06-18 | 2017-02-28 | Honda Motor Co., Ltd. | Sound source separating device and sound source separating method |
US9697832B2 (en) | 2015-06-18 | 2017-07-04 | Honda Motor Co., Ltd. | Speech recognition apparatus and speech recognition method |
CN112185353A (zh) * | 2020-09-09 | 2021-01-05 | 北京小米松果电子有限公司 | 音频信号的处理方法、装置、终端及存储介质 |
CN113241073A (zh) * | 2021-06-29 | 2021-08-10 | 深圳市欧瑞博科技股份有限公司 | 智能语音控制方法、装置、电子设备及存储介质 |
WO2022176085A1 (ja) * | 2021-02-18 | 2022-08-25 | 三菱電機株式会社 | 車載向け音声分離装置及び音声分離方法 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9619645B2 (en) * | 2013-04-04 | 2017-04-11 | Cypress Semiconductor Corporation | Authentication for recognition systems |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05122689A (ja) * | 1991-10-25 | 1993-05-18 | Seiko Epson Corp | テレビ会議システム |
JPH11219193A (ja) * | 1998-02-03 | 1999-08-10 | Fujitsu Ten Ltd | 車載用音声認識装置 |
JP2001296891A (ja) * | 2000-04-14 | 2001-10-26 | Mitsubishi Electric Corp | 音声認識方法および装置 |
JP2002034092A (ja) * | 2000-07-17 | 2002-01-31 | Sharp Corp | 収音装置 |
JP2003114699A (ja) * | 2001-10-03 | 2003-04-18 | Auto Network Gijutsu Kenkyusho:Kk | 車載音声認識システム |
JP2004029299A (ja) * | 2002-06-25 | 2004-01-29 | Auto Network Gijutsu Kenkyusho:Kk | 音声認識システム |
-
2004
- 2004-09-01 JP JP2006515454A patent/JP4478146B2/ja not_active Expired - Fee Related
- 2004-09-01 WO PCT/JP2004/012626 patent/WO2006025106A1/ja active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05122689A (ja) * | 1991-10-25 | 1993-05-18 | Seiko Epson Corp | テレビ会議システム |
JPH11219193A (ja) * | 1998-02-03 | 1999-08-10 | Fujitsu Ten Ltd | 車載用音声認識装置 |
JP2001296891A (ja) * | 2000-04-14 | 2001-10-26 | Mitsubishi Electric Corp | 音声認識方法および装置 |
JP2002034092A (ja) * | 2000-07-17 | 2002-01-31 | Sharp Corp | 収音装置 |
JP2003114699A (ja) * | 2001-10-03 | 2003-04-18 | Auto Network Gijutsu Kenkyusho:Kk | 車載音声認識システム |
JP2004029299A (ja) * | 2002-06-25 | 2004-01-29 | Auto Network Gijutsu Kenkyusho:Kk | 音声認識システム |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009025714A (ja) * | 2007-07-23 | 2009-02-05 | Xanavi Informatics Corp | 車載装置および音声認識方法 |
JP2009036810A (ja) * | 2007-07-31 | 2009-02-19 | National Institute Of Information & Communication Technology | 近傍場音源分離プログラム、及びこのプログラムを記録したコンピュータ読取可能な記録媒体、並びに近傍場音源分離方法 |
WO2012160602A1 (ja) * | 2011-05-24 | 2012-11-29 | 三菱電機株式会社 | 目的音強調装置およびカーナビゲーションシステム |
JP5543023B2 (ja) * | 2011-05-24 | 2014-07-09 | 三菱電機株式会社 | 目的音強調装置およびカーナビゲーションシステム |
US9583119B2 (en) | 2015-06-18 | 2017-02-28 | Honda Motor Co., Ltd. | Sound source separating device and sound source separating method |
US9697832B2 (en) | 2015-06-18 | 2017-07-04 | Honda Motor Co., Ltd. | Speech recognition apparatus and speech recognition method |
CN112185353A (zh) * | 2020-09-09 | 2021-01-05 | 北京小米松果电子有限公司 | 音频信号的处理方法、装置、终端及存储介质 |
WO2022176085A1 (ja) * | 2021-02-18 | 2022-08-25 | 三菱電機株式会社 | 車載向け音声分離装置及び音声分離方法 |
CN113241073A (zh) * | 2021-06-29 | 2021-08-10 | 深圳市欧瑞博科技股份有限公司 | 智能语音控制方法、装置、电子设备及存储介质 |
CN113241073B (zh) * | 2021-06-29 | 2023-10-31 | 深圳市欧瑞博科技股份有限公司 | 智能语音控制方法、装置、电子设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2006025106A1 (ja) | 2008-05-08 |
JP4478146B2 (ja) | 2010-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106796786B (zh) | 语音识别系统 | |
JP4779748B2 (ja) | 車両用音声入出力装置および音声入出力装置用プログラム | |
US8010359B2 (en) | Speech recognition system, speech recognition method and storage medium | |
JP6584731B2 (ja) | ジェスチャ操作装置及びジェスチャ操作方法 | |
JPWO2008084575A1 (ja) | 車載用音声認識装置 | |
JP2010130223A (ja) | 音声操作システムおよび音声操作方法 | |
JP4478146B2 (ja) | 音声認識システム、音声認識方法およびそのプログラム | |
CN111007968A (zh) | 智能体装置、智能体提示方法及存储介质 | |
JP6459330B2 (ja) | 音声認識装置、音声認識方法、及び音声認識プログラム | |
JP2015074315A (ja) | 車載中継装置及び車載通信システム | |
JP3654045B2 (ja) | 音声認識装置 | |
JP2018116130A (ja) | 車内音声処理装置および車内音声処理方法 | |
JP4410378B2 (ja) | 音声認識方法および装置 | |
JP6522009B2 (ja) | 音声認識システム | |
JP5052241B2 (ja) | 車載用の音声処理装置、音声処理システム、及び音声処理方法 | |
JP7280074B2 (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
JP2004301875A (ja) | 音声認識装置 | |
WO2022137534A1 (ja) | 車載用音声認識装置及び車載用音声認識方法 | |
JP2007057805A (ja) | 車両用情報処理装置 | |
JP2009098217A (ja) | 音声認識装置、音声認識装置を備えたナビゲーション装置、音声認識方法、音声認識プログラム、および記録媒体 | |
JP6509098B2 (ja) | 音声出力装置および音声出力制御方法 | |
JP5446540B2 (ja) | 情報検索装置、制御方法及びプログラム | |
JP7192561B2 (ja) | 音声出力装置および音声出力方法 | |
JP2019212168A (ja) | 音声認識システムおよび情報処理装置 | |
JP2003345389A (ja) | 音声認識装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2006515454 Country of ref document: JP |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |