KR101681988B1 - Speech recognition apparatus, vehicle having the same and speech recognition method - Google Patents
Speech recognition apparatus, vehicle having the same and speech recognition method
- Publication number
- KR101681988B1 (application KR1020150106376A)
- Authority
- KR
- South Korea
- Prior art keywords
- speech
- signal
- voice
- noise signal
- speech recognition
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Abstract
Disclosed are a speech recognition apparatus capable of improving the accuracy of preprocessing for speech recognition by using noise signals received from other nearby speech recognition apparatuses in the preprocessing, a vehicle including the same, a server for exchanging signals with a plurality of speech recognition apparatuses, and a speech recognition method.
A speech recognition apparatus that converts input speech into a speech signal and recognizes the speech when a user's speech is input includes: a preprocessing unit that extracts a first noise signal from the speech signal, synthesizes the first noise signal and a second noise signal extracted by a nearby speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the nearby speech recognition apparatus, to generate a synthesized noise signal, and performs preprocessing using the synthesized noise signal; and a recognition unit that recognizes the preprocessed speech signal.
Description
The disclosed invention relates to a speech recognition apparatus that recognizes a user's speech, a vehicle that includes the apparatus and performs a specific function according to the recognized speech, and a speech recognition method.
Speech recognition technology is a technology that increases the usability of a device by allowing the user to control the device by simply uttering a command without physically manipulating the interface.
Such speech recognition technology can be applied to various fields. In recent years, attempts have been made to apply speech recognition technology to a vehicle in order to reduce a driver's operation load.
For speech recognition technology to be applied effectively, the accuracy of speech recognition must be guaranteed to some extent, and much research and development has been conducted to improve it. To improve the accuracy of speech recognition, noise removal must be performed efficiently.
The present invention provides a speech recognition apparatus, a vehicle including the same, and a speech recognition method capable of improving the accuracy of preprocessing for speech recognition by using noise signals received from other nearby speech recognition apparatuses in the preprocessing.
A speech recognition apparatus that converts input speech into a speech signal and recognizes the speech when a user's speech is input includes: a preprocessing unit that extracts a first noise signal from the speech signal, synthesizes the first noise signal and a second noise signal extracted by a nearby speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the nearby speech recognition apparatus, to generate a synthesized noise signal, and performs preprocessing using the synthesized noise signal; and a recognition unit that recognizes the preprocessed speech signal.
The preprocessing unit may extract a speech section by removing the synthesized noise signal from the speech signal, and may extract a feature from the speech section.
The recognition unit can recognize the voice signal by comparing the extracted feature with a model stored in advance.
The apparatus may further include a voice input unit for receiving the voice signal, and a communication unit for receiving the second time and the second noise signal.
The communication unit may receive the second time and the second noise signal from an external server.
The communication unit may receive the second time and the second noise signal from the peripheral speech recognition apparatus.
The communication unit may use at least one of wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct, Ultra-Wideband (UWB), Infrared Data Association (IrDA), Bluetooth Low Energy (BLE), Near Field Communication (NFC), and Radio Frequency Identification (RFID).
The communication unit may transmit the first time and the first noise signal to the peripheral speech recognition apparatus.
The communication unit may transmit the first time and the first noise signal to an external server.
According to an embodiment of the present invention, there is provided a vehicle including: a voice input unit for receiving a user's voice; a preprocessing unit that extracts a first noise signal from the input speech signal, synthesizes the first noise signal and a second noise signal extracted by a peripheral speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the peripheral speech recognition apparatus, to generate a synthesized noise signal, and performs preprocessing using the synthesized noise signal; and a recognition unit for recognizing the preprocessed speech signal.
The preprocessing unit may extract a speech section by removing the synthesized noise signal from the speech signal, and may extract a feature from the speech section.
The recognition unit can recognize the voice signal by comparing the extracted feature with a model stored in advance.
The vehicle may further include a communication unit for receiving the second time and the second noise signal.
The communication unit may receive the second time and the second noise signal from an external server.
The communication unit may receive the second time and the second noise signal from the peripheral speech recognition apparatus.
According to an embodiment of the present invention, there is provided a speech recognition method for converting input speech into a speech signal and recognizing the speech, the method comprising: extracting a first noise signal from the speech signal; synthesizing the first noise signal and a second noise signal extracted by a peripheral speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the peripheral speech recognition apparatus, to generate a synthesized noise signal; performing preprocessing using the synthesized noise signal; and recognizing the preprocessed speech signal.
Performing the preprocessing may include extracting a speech section by removing the synthesized noise signal from the speech signal, and extracting features from the speech section.
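The preprocessing described above, removing the synthesized noise signal from the speech signal and then extracting features from the speech section, might be sketched as follows. The use of frame-wise spectral subtraction and per-frame log energy as the feature is an illustrative assumption; the patent does not specify these particular techniques, and the function and parameter names are hypothetical.

```python
import numpy as np

def preprocess(speech, synthesized_noise, frame=256):
    """Remove an estimated noise signal from a speech signal by
    frame-wise spectral subtraction, then return per-frame
    log-energy features. Illustrative sketch only."""
    n = min(len(speech), len(synthesized_noise))
    speech, noise = speech[:n], synthesized_noise[:n]
    features = []
    for start in range(0, n - frame + 1, frame):
        s = np.fft.rfft(speech[start:start + frame])
        d = np.fft.rfft(noise[start:start + frame])
        # Subtract the noise magnitude spectrum, floored at zero,
        # keeping the phase of the noisy speech.
        mag = np.maximum(np.abs(s) - np.abs(d), 0.0)
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(s)), frame)
        features.append(np.log(np.sum(clean ** 2) + 1e-10))
    return np.array(features)
```

Frames where the residual energy stays high after subtraction would correspond to the extracted speech section.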
Recognizing the preprocessed speech signal may include recognizing the speech signal by comparing the extracted feature with a previously stored model.
The method may further include receiving the second time and the second noise signal from an external server or the peripheral speech recognition apparatus.
According to one aspect of the present invention, a speech recognition apparatus, a vehicle including the same, and a speech recognition method can receive noise signals from other nearby speech recognition devices and use them in preprocessing, thereby improving the accuracy of preprocessing for speech recognition.
FIG. 1 and FIG. 2 are views for schematically illustrating a relationship between a speech recognition apparatus according to an embodiment and other speech recognition apparatuses located around the speech recognition apparatus.
FIGS. 3 and 4 are control block diagrams of a speech recognition apparatus according to an embodiment.
FIGS. 5 to 7 are views showing examples of voice signals input to the first, second, and third voice recognition devices.
FIG. 8 is a diagram illustrating a process of synthesizing a plurality of noise signals.
FIG. 9 is a diagram showing a voice signal from which a synthesized noise signal has been removed.
FIG. 10 is an external view of a vehicle according to an embodiment.
FIG. 11 is a view showing an internal configuration of a vehicle according to an embodiment.
FIG. 12 is a control block diagram of a server according to an embodiment.
FIG. 13 is a flowchart illustrating a process in which a plurality of speech recognition apparatuses and a server exchange signals with each other.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, embodiments of a speech recognition apparatus, a vehicle including the same, and a speech recognition method will be described in detail with reference to the accompanying drawings.
FIG. 1 and FIG. 2 are views schematically illustrating the relationship between a speech recognition apparatus according to an embodiment and other speech recognition apparatuses located around it. Both FIGS. 1 and 2 are plan views of the vehicle from above.
Referring to FIG. 1, a
Other
In the example of FIG. 1, there are two
In order to recognize the voice of the user aboard the
When the
As shown in FIG. 1, the
The
When the noise signals extracted by the peripheral
The
The peripheral
FIGS. 3 and 4 are control block diagrams of a speech recognition apparatus according to an embodiment.
3, the
The
For example, the
The
When synthesizing the noise signal, it is possible to consider the time information when the speech is input to the
Then, the
The
Acoustic models can be divided into a direct comparison method, which sets the recognition target as a feature vector model and compares it directly with the feature vectors of the speech data, and a statistical method, which uses the feature vectors of the recognition target statistically.
The direct comparison method sets units of the recognition target, such as words or phonemes, as feature vector models and measures how similar the input speech is to them; vector quantization is a typical example. In the vector quantization method, the feature vectors of the input speech data are mapped to a codebook, which serves as the reference model, and encoded as representative values, and these code values are then compared with one another.
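The vector quantization comparison described above can be sketched as follows: each feature vector of the input speech is encoded as the index of its nearest codebook entry, and the total distortion serves as a similarity score against the reference codebook. This is a minimal illustration; the function names and the Euclidean distance measure are assumptions, not taken from the patent.

```python
import numpy as np

def vq_encode(features, codebook):
    """Map each feature vector to the index of its nearest
    codebook entry (Euclidean distance)."""
    # distances has shape (num_frames, codebook_size)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def vq_distortion(features, codebook):
    """Total quantization distortion: lower means the input
    speech is closer to this reference codebook."""
    codes = vq_encode(features, codebook)
    return float(np.sum(np.linalg.norm(features - codebook[codes], axis=1)))
```

In a direct-comparison recognizer, the codebook with the smallest distortion for the input utterance would be chosen as the recognition result.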
The statistical model method constructs the units of the recognition target as state sequences and uses the relations between the state sequences, where a state sequence may consist of a plurality of nodes. Methods that use the relations between state sequences include Dynamic Time Warping (DTW), the Hidden Markov Model (HMM), and methods using neural networks.
Dynamic time warping compensates for differences along the time axis when comparing against a reference model, taking into account the dynamic characteristic of speech that the length of the signal varies over time even when the same person pronounces the same word. The hidden Markov model assumes that speech is a Markov process with state transition probabilities and observation probabilities of nodes (output symbols) in each state; it estimates the state transition probabilities and the node observation probabilities from training data, and calculates the probability that the input speech was generated by the estimated model.
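The time-axis compensation performed by dynamic time warping can be illustrated with the standard textbook recurrence; this is a generic sketch of DTW, not the patented implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    compensating for differences along the time axis."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow a match, an insertion, or a deletion,
            # always taking the cheapest alignment so far.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path can stretch or compress either sequence, a slower repetition of the same utterance can still align with the reference at zero or low cost.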
Meanwhile, the language model, which models the linguistic order relations of words and syllables, can reduce acoustic ambiguity and recognition errors by applying the order relations between the units constituting the language to the units obtained by speech recognition. Language models include statistical models and models based on Finite State Automata (FSA); statistical language models use the chained probabilities of words, such as unigrams, bigrams, and trigrams.
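The chained word probabilities used by the statistical language models mentioned above, here the bigram case, can be estimated from a corpus as follows. This is an unsmoothed maximum-likelihood sketch for illustration only; the function name is hypothetical.

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate bigram probabilities P(w2 | w1) from a token list
    by maximum likelihood (no smoothing)."""
    unigrams = Counter(corpus[:-1])          # counts of each context word
    bigrams = Counter(zip(corpus[:-1], corpus[1:]))  # counts of word pairs
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}
```

A recognizer would multiply (or sum in log space) these probabilities along a candidate word sequence to penalize linguistically unlikely orderings.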
The
The recognition result of the
At least one of the
Some or all of the
The
The
In addition, the processor and the memory may be provided in a single configuration, a plurality of configurations, physically separated configurations, or a single chip, depending on their capacities.
As described above, the
4, the
The
The
The
The
Also, the
On the other hand, there may be cases where other
Hereinafter, the process of removing the noise signal from the speech signal by the
FIGS. 5 to 7 are views showing examples of voice signals input to the first, second, and third voice recognition devices.
When the voice signal input to the first
6, the time at which the speech interval starts can be estimated as T 2 , and the second
7, the time at which the voice section starts can be estimated to be T 3 , and the third
FIG. 8 is a diagram illustrating a process of synthesizing a plurality of noise signals, and FIG. 9 is a diagram illustrating a voice signal from which a synthesized noise signal is removed.
The second noise signal extracted by the second
The
Specifically, as shown in FIG. 8, the noise signals can be synthesized in time order: the first noise signal may be arranged from T1, the second noise signal from T2, and the third noise signal from T3, to generate a synthesized noise signal.
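The time-ordered synthesis shown in FIG. 8 might be sketched as follows, with each device's noise segment placed on a common timeline starting at its own input time (T1, T2, T3). Averaging overlapping regions is an assumption made here for illustration; the patent does not state how overlapping segments are combined, and the function name is hypothetical.

```python
import numpy as np

def synthesize_noise(segments, total_len):
    """Arrange noise segments from several devices on one timeline.
    segments: list of (start_sample, signal) pairs, where
    start_sample is the sample index of that device's input time.
    Overlapping regions are averaged; uncovered samples stay zero."""
    acc = np.zeros(total_len)
    count = np.zeros(total_len)
    for start, sig in segments:
        end = min(start + len(sig), total_len)
        acc[start:end] += sig[: end - start]
        count[start:end] += 1
    return np.where(count > 0, acc / np.maximum(count, 1), 0.0)
```

The resulting synthesized noise signal could then be subtracted from the speech signal during preprocessing.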
The
When the user selects the first
However, when the user selects the second
In this case, the first
FIG. 10 is an external view of a vehicle according to an embodiment, and FIG. 11 is a diagram illustrating an internal configuration of a vehicle according to an embodiment.
The
10, the outer appearance of the
The
The
The
The side mirrors 204L and 204R include a
In addition, the
The proximity sensor can send a detection signal to the side or rear of the vehicle and receive a reflection signal reflected from an obstacle such as another vehicle. The presence or absence of an obstacle on the side or rear of the
Referring to FIG. 11, an AVN (Audio Video Navigation) terminal 270 may be provided in a central area of the
The
The
The
A
The
For example, the
On the other hand, the
The
The microphone for receiving voice from the user may be provided in the head lining 209, the
As described above, the peripheral
When the
12 is a control block diagram of a server according to an embodiment.
The
The
The wireless communication module may include an antenna or a wireless communication chip for transmitting and receiving wireless signals with at least one of a base station and an external device over a mobile communication network. For example, the wireless communication module may be a Wi-Fi module conforming to a wireless LAN standard (IEEE 802.11x).
The short-range communication module refers to a module for performing short-range communication with a device located within a predetermined distance. Examples of short-range communication technologies applicable to an embodiment include wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct, ultra-wideband (UWB), infrared data association (IrDA), Bluetooth low energy (BLE), and Near Field Communication (NFC).
When at least one of the
When the
The
It is also possible that the
The speech recognition program or the speech recognition application that executes the operation to perform the above-described operations of the
The speech recognition program may be installed at the time of manufacturing the
The
When the storage medium is included in the application or the server providing the program, it is possible to access the server through the Internet and download the speech recognition program. Here, the server providing the program may be the same as or different from the
In the case where the storage medium is implemented as an auxiliary storage device such as a magnetic disk, an optical disk, a CD-ROM, or a DVD, it is also possible to download a speech recognition program by inserting an auxiliary storage device into the
FIG. 13 is a flowchart illustrating a process in which a plurality of speech recognition apparatuses and a server exchange signals with each other. All or some of the following processes may be included in the speech recognition method according to an embodiment.
In the example of FIG. 13, it is assumed that there are two peripheral speech recognition devices, and that the speech recognition devices exchange signals through the server. As in the examples of FIGS. 5 to 9, the
First, the user turns on the voice recognition mode of the first
The first
The first
The
Here, the second
Information regarding which speech recognition apparatus has agreed to provide the noise signal may be stored in the
Alternatively, it is also possible that information regarding which speech recognition apparatus has agreed to provide the noise signal is stored in the
If the second
The
Alternatively, when the
The above examples show a method of turning on the voice recognition mode of the second
When the first, second and third
When a beep sound is generated, the user inputs a voice, the first
The second
The second
The second
The third
The
Then, noise preprocessing is performed by removing the synthesized noise signal from the speech signal (615).
On the other hand, when the first
According to the above-described embodiments, various noise environments can be reflected in speech recognition, and an effect of utilizing the different speech recognition engines of a plurality of speech recognition devices located at different positions can be obtained.
100: Speech recognition device
200: vehicle
300,400: Peripheral speech recognition device
110:
120:
130: Post-
140:
500: Server
Claims (20)
A communication unit for transmitting a signal for turning on a voice recognition mode of the peripheral voice recognition device to an external server connected to the peripheral voice recognition device or the peripheral voice recognition device when the voice recognition mode is turned on;
A preprocessor configured to extract a first noise signal from the speech signal, to synthesize the first noise signal and a second noise signal extracted by the peripheral speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the peripheral speech recognition apparatus, to generate a synthesized noise signal, and to perform preprocessing using the synthesized noise signal; And
And a recognition unit for recognizing the speech signal on which the preprocessing has been performed.
The pre-
Removes the synthesized noise signal from the speech signal to extract a speech section, and extracts a feature from the speech section.
Wherein,
And comparing the extracted feature with a previously stored model to recognize the speech signal.
And a voice input unit for receiving the voice signal.
Wherein,
And receives the second time and the second noise signal from the external server.
Wherein,
And receives the second time and the second noise signal from the peripheral speech recognition apparatus.
Wherein,
Wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct, Ultra-Wideband (UWB), Infrared Data Association (IrDA), Bluetooth Low Energy (BLE), Near Field Communication (NFC), and Radio Frequency Identification (RFID).
Wherein,
And transmits the first time and the first noise signal to the peripheral speech recognition apparatus.
Wherein,
And transmits the first time and the first noise signal to the external server.
A communication unit for transmitting a signal for turning on a voice recognition mode of the peripheral voice recognition device to an external server connected to the peripheral voice recognition device or the peripheral voice recognition device when the voice recognition mode is turned on;
Extracting a first noise signal from a speech signal converted from the input speech, synthesizing the first noise signal and a second noise signal extracted by the peripheral speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the peripheral speech recognition apparatus, to generate a synthesized noise signal, and performing preprocessing using the synthesized noise signal; And
And a recognition unit for recognizing the speech signal on which the preprocessing has been performed.
The pre-
Removing the synthesized noise signal from the voice signal to extract a voice section, and extracting features from the voice section.
Wherein,
And comparing the extracted feature with a previously stored model to recognize the speech signal.
Wherein,
And receives the second time and the second noise signal from the external server.
Wherein,
And receives the second time and the second noise signal from the peripheral speech recognition apparatus.
Transmitting a signal for turning on the voice recognition mode of the peripheral voice recognition device to an external server connected to the peripheral voice recognition device or the peripheral voice recognition device when the voice recognition mode is turned on;
Extracting a first noise signal from the speech signal;
Synthesizing the first noise signal and the second noise signal extracted by the peripheral speech recognition apparatus, based on a first time at which the speech is input and a second time at which the speech is input to the peripheral speech recognition apparatus, to generate a synthesized noise signal;
Performing preprocessing using the synthesized noise signal;
And recognizing the speech signal on which the preprocessing has been performed.
Performing the pre-
Removing the synthesized noise signal from the speech signal to extract a speech section, and extracting features from the speech section.
Recognizing the speech signal on which the preprocess has been performed,
And comparing the extracted feature with a previously stored model to recognize the speech signal.
And receiving the second time and the second noise signal.
Wherein the second time and the second noise signal are received from the external server or the peripheral speech recognition apparatus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150106376A KR101681988B1 (en) | 2015-07-28 | 2015-07-28 | Speech recognition apparatus, vehicle having the same and speech recongition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150106376A KR101681988B1 (en) | 2015-07-28 | 2015-07-28 | Speech recognition apparatus, vehicle having the same and speech recongition method |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101681988B1 (en) | 2016-12-02 |
Family
ID=57571634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150106376A KR101681988B1 (en) | 2015-07-28 | 2015-07-28 | Speech recognition apparatus, vehicle having the same and speech recongition method |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101681988B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109509465A (en) * | 2017-09-15 | 2019-03-22 | 阿里巴巴集团控股有限公司 | Processing method, component, equipment and the medium of voice signal |
KR20210081050A (en) * | 2019-12-23 | 2021-07-01 | 주식회사 에스에이치비쥬얼 | System having Power Distributor controller with energy saving function |
KR20230001968A (en) * | 2021-06-29 | 2023-01-05 | 혜윰기술 주식회사 | Voice and gesture integrating device of vehicle |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003510645A (en) * | 1999-09-23 | 2003-03-18 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Voice recognition device and consumer electronic system |
JP2006227634A (en) * | 2006-03-29 | 2006-08-31 | Seiko Epson Corp | Equipment control method using voice recognition, equipment control system using voice recognition and recording medium which records equipment control program using voice recognition |
JP2011128391A (en) * | 2009-12-18 | 2011-06-30 | Toshiba Corp | Voice processing device, voice processing program and voice processing method |
- 2015-07-28: KR application KR1020150106376A filed, patent KR101681988B1 (en), active, IP Right Grant
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003510645A (en) * | 1999-09-23 | 2003-03-18 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Voice recognition device and consumer electronic system |
JP2006227634A (en) * | 2006-03-29 | 2006-08-31 | Seiko Epson Corp | Equipment control method using voice recognition, equipment control system using voice recognition and recording medium which records equipment control program using voice recognition |
JP2011128391A (en) * | 2009-12-18 | 2011-06-30 | Toshiba Corp | Voice processing device, voice processing program and voice processing method |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109509465A (en) * | 2017-09-15 | 2019-03-22 | 阿里巴巴集团控股有限公司 | Processing method, component, equipment and the medium of voice signal |
CN109509465B (en) * | 2017-09-15 | 2023-07-25 | 阿里巴巴集团控股有限公司 | Voice signal processing method, assembly, equipment and medium |
KR20210081050A (en) * | 2019-12-23 | 2021-07-01 | 주식회사 에스에이치비쥬얼 | System having Power Distributor controller with energy saving function |
KR102295020B1 (en) | 2019-12-23 | 2021-08-27 | 정혜진 | System having Power Distributor controller with energy saving function |
KR20230001968A (en) * | 2021-06-29 | 2023-01-05 | 혜윰기술 주식회사 | Voice and gesture integrating device of vehicle |
KR102492229B1 (en) * | 2021-06-29 | 2023-01-26 | 혜윰기술 주식회사 | Voice and gesture integrating device of vehicle |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10854195B2 (en) | Dialogue processing apparatus, a vehicle having same, and a dialogue processing method | |
US9619645B2 (en) | Authentication for recognition systems | |
US10839797B2 (en) | Dialogue system, vehicle having the same and dialogue processing method | |
KR101579533B1 (en) | Vehicle and controlling method for the same | |
US9756161B2 (en) | Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle | |
US20180350366A1 (en) | Situation-based conversation initiating apparatus, system, vehicle and method | |
US8005681B2 (en) | Speech dialog control module | |
US11004447B2 (en) | Speech processing apparatus, vehicle having the speech processing apparatus, and speech processing method | |
KR101681988B1 (en) | Speech recognition apparatus, vehicle having the same and speech recongition method | |
US11189276B2 (en) | Vehicle and control method thereof | |
US10861460B2 (en) | Dialogue system, vehicle having the same and dialogue processing method | |
EP3982359A1 (en) | Electronic device and method for recognizing voice by same | |
US20230102157A1 (en) | Contextual utterance resolution in multimodal systems | |
US10770070B2 (en) | Voice recognition apparatus, vehicle including the same, and control method thereof | |
CN111421557A (en) | Electronic device and control method thereof | |
KR102339443B1 (en) | An apparatus for determining an action based on a situation, a vehicle which is capable of determining an action based on a situation, a method of an action based on a situation and a method for controlling the vehicle | |
KR102594310B1 (en) | Dialogue processing apparatus, vehicle having the same and dialogue processing method | |
WO2024070080A1 (en) | Information processing device, information processing method, and program | |
WO2022038724A1 (en) | Voice interaction device and interaction target determination method carried out in voice interaction device | |
KR101804765B1 (en) | Vehicle and control method for the same | |
KR20230075915A (en) | An electronic apparatus and a method thereof | |
KR102304342B1 (en) | Method for recognizing voice and apparatus used therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |