KR101006257B1

KR101006257B1 - Apparatus and method for recognizing speech according to speaking environment and speaker

Info

Publication number: KR101006257B1
Application number: KR1020080055815A
Authority: KR
Inventors: 전주성
Original assignee: 주식회사 케이티
Priority date: 2008-06-13
Filing date: 2008-06-13
Publication date: 2011-01-06
Also published as: KR20090129739A

Abstract

The present invention relates to a speech recognition method and apparatus adapted to the speaking environment and the speaker. The speech recognition method uses a database that stores place-specific noise pattern information, standard voice pattern information by gender and age indicating an optimized speech recognition rate, and voice pattern information by terminal model. The method includes: receiving voice data from a subscriber; removing noise from the voice data by referring to the place-specific noise patterns in the database; converting the noise-removed voice data into the standard voice pattern in the database corresponding to the subscriber's gender and age; and converting the voice data converted according to age and gender into the terminal voice pattern in the database corresponding to the subscriber's terminal model.

Noise pattern, standard voice pattern, terminal-model voice pattern, speech recognition

Description

Apparatus and method for recognizing speech according to speaking environment and speaker

The present invention relates to a speech recognition method and, more particularly, to a speech recognition method and apparatus that take into account the subscriber's speaking environment and the subscriber's gender and age.

As mobile communication systems have developed and wireless communication terminals have become a necessity of modern life, subscribers can make voice calls with other subscribers anytime and anywhere. In addition to voice calls, subscribers use wireless communication terminals to receive various services such as stock quotes, weather, sports broadcasts, and news.

Meanwhile, as speech recognition technology advances, input that used to be entered through a dedicated device such as a keyboard or keypad is being replaced by voice. One application of this technology is a voice search service: when a subscriber utters a search word, the service extracts the search results corresponding to that word and provides them to the subscriber. More specifically, a speech recognition server receives voice data from the subscriber and analyzes the voice frequencies of that data to identify the words it contains. The speech recognition server then extracts information corresponding to the identified words and transmits the search result to the subscriber. A main goal of such speech recognition technology is to improve the recognition rate for the subscriber's speech.

However, when a subscriber utters a search word, noise present around the subscriber is mixed into the voice signal of the search word. That is, when the subscriber speaks in a place such as a bathhouse, a playground, or a vehicle, the noise scattered around that place becomes part of the voice signal of the search word. In addition, when the speech recognition server receives the uttered search word from the subscriber, electrical noise present in a communication network such as the PSTN (Public Switched Telephone Network) may also be mixed into the voice signal. When the speech recognition server receives a voice signal mixed with the subscriber's ambient noise or with network noise, the recognition rate for the subscriber's speech drops.

Furthermore, the speech recognition server receives voice signals that are not standardized and that have varying voice frequencies depending on the age and gender of the subscriber who input the signal and on the model of the terminal through which it was input. That is, rather than voice signals optimized for speech recognition, the speech recognition server receives a variety of voice signals whose pitch, tone frequency waveform, and loudness differ according to the subscriber's gender and age and the terminal model. For some of these non-standardized voice signals, the pitch, tone frequency waveform, and loudness are unsuitable for speech recognition, so the speech recognition server sometimes fails to recognize them properly.

The present invention has been proposed to solve the above problems. Its object is to provide a speech recognition method and apparatus that improve the speech recognition rate based on the place where the subscriber speaks, the subscriber's gender and age, and information about the terminal used for the utterance.

Other objects and advantages of the present invention can be understood from the following description and will become clearer from the embodiments of the invention. It will also be readily apparent that the objects and advantages of the invention can be realized by the means indicated in the claims and combinations thereof.

To achieve the above object, a speech recognition method according to a first aspect of the present invention uses a database that stores place-specific noise pattern information, standard voice pattern information by gender and age indicating an optimized speech recognition rate, and voice pattern information by terminal model. The method comprises: (a) receiving voice data from a subscriber; (b) removing noise from the voice data by referring to the place-specific noise patterns of the database; (c) converting the noise-removed voice data into the standard voice pattern of the database corresponding to the subscriber's gender and age; and (d) converting the voice data converted according to age and gender into the terminal voice pattern of the database corresponding to the subscriber's terminal model.

To achieve the above object, a voice conversion apparatus according to a second aspect of the present invention uses a database that stores place-specific noise pattern information, standard voice pattern information by gender and age indicating an optimized speech recognition rate, and voice pattern information by terminal model. The apparatus comprises: a receiving unit that receives voice data from a subscriber; a noise removing unit that removes noise from the voice data by referring to the place-specific noise pattern information; and a conversion unit that converts the noise-removed voice data into the standard voice pattern corresponding to the subscriber's gender and age and into the terminal model voice pattern corresponding to the model of the subscriber's terminal.

Even when noise is mixed into the voice data uttered by a subscriber, the present invention removes that noise from the voice data.

In addition, the present invention converts the voice data, based on the subscriber's gender, age, and terminal model, into a form that yields the optimal speech recognition rate, and thereby improves the recognition rate for the subscriber's speech.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known technology are omitted where they are judged to unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a speech recognition system according to an embodiment of the present invention.

As shown in FIG. 1, the speech recognition system according to an embodiment of the present invention includes a wireless communication terminal 110, a voice conversion server 130, a voice standardization database 140 (hereinafter, "voice standardization DB"), a speech recognition server 150, and a search database 160 (hereinafter, "search DB"). These components are connected to one another through a mobile communication network 120. The mobile communication network 120 includes base stations (BTS/NodeB), base station controllers (BSC/RNC), switching centers, devices for packet data services (PDSN, SGSN, IWF, etc.), and the like; because these are well-known elements in the technical field to which the invention pertains, a detailed description is omitted.

The wireless communication terminal 110 communicates with the mobile communication network 120 and, in particular, transmits voice data that the subscriber inputs as a search word to the voice conversion server 130. Preferably, the wireless communication terminal 110 records the voice signal uttered by the subscriber, converts it into an audio file, stores the file, and transmits the audio file to the voice conversion server 130. The wireless communication terminal 110 is any terminal capable of exchanging data with the mobile communication network 120, including a PDA, cellular phone, PCS phone, GSM phone, W-CDMA phone, DMB phone, Wibro phone, Wifi phone, and 3G video phone.

The voice conversion server 130 analyzes the frequency waveform of the voice data received from the wireless communication terminal 110 and removes the noise contained in the voice data. In doing so, the voice conversion server 130 may receive an audio file from the wireless communication terminal 110, decode it, and encode the decoded voice data into the format of the voice patterns stored in the voice standardization DB 140. The voice conversion server 130 also extracts from the voice standardization DB 140 the standard voice pattern corresponding to the caller's gender and age, and converts the pitch and tone frequency waveform of the voice data into those of the standard voice pattern. In addition, the voice conversion server 130 extracts from the voice standardization DB 140 the terminal voice pattern corresponding to the model name of the wireless communication terminal 110, and converts the loudness and pitch of the voice data into those of the terminal voice pattern. The voice conversion server 130 may then transmit the noise-removed, converted voice data to the speech recognition server 150 to request a search.

The voice standardization DB 140 stores place-specific noise patterns, standard voice patterns by gender and age, and voice patterns by terminal model. A place-specific noise pattern is the specific frequency of the echo and delay phenomena that occur in a bathhouse, restroom, vehicle, playground, or similar place; the noise frequencies arising at each place are stored in the voice standardization DB 140. The standard voice patterns by gender and age are voice patterns stored for each subscriber gender and age, each being a voice frequency at which the speech recognition rate is optimized for that gender and age. Likewise, the voice patterns by terminal model are voice patterns stored for each model of the wireless communication terminal 110, each being a voice frequency at which the speech recognition rate is optimized for that terminal model. The voice standardization DB 140 may also store each subscriber's gender, age, and wireless communication terminal 110 model name in association with the subscriber's identification number (or IMSI).
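As an illustration only, the contents of the voice standardization DB 140 described above could be modeled as four lookup tables. The patent does not define a schema, so every class name, field name, and value encoding in this minimal Python sketch is a hypothetical assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SubscriberProfile:
    # Hypothetical record; the patent only says gender, age, and model name are stored.
    gender: str
    age: int
    terminal_model: str


@dataclass
class VoiceStandardizationDB:
    # place name -> noise frequencies (Hz) characteristic of that place
    noise_patterns: Dict[str, List[float]] = field(default_factory=dict)
    # (gender, age band) -> parameters of the standard voice pattern
    standard_patterns: Dict[Tuple[str, str], Dict[str, float]] = field(default_factory=dict)
    # terminal model name -> parameters of the terminal voice pattern
    terminal_patterns: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # subscriber identification number (or IMSI) -> subscriber profile
    subscribers: Dict[str, SubscriberProfile] = field(default_factory=dict)

    def profile_for(self, subscriber_id: str) -> Optional[SubscriberProfile]:
        """Return the stored profile, or None if the subscriber is unknown."""
        return self.subscribers.get(subscriber_id)
```

Keeping the subscriber table next to the pattern tables mirrors the option, stated above, of storing gender, age, and model name in the same database under the subscriber's identification number.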

The speech recognition server 150 analyzes the received voice data to identify the words it contains, extracts information corresponding to those words from the search DB 160, and transmits the information to the wireless communication terminal 110.

The search DB 160 stores search data such as geographic information, telephone number information, web site information, person information, weather information, stock information, and news information.

FIG. 2 is a flowchart illustrating a method of recognizing speech with reference to the subscriber's speaking environment and the subscriber's gender and age, according to an embodiment of the present invention.

Referring to FIG. 2, the voice conversion server 130 receives voice data containing a search word from the wireless communication terminal 110 (S201). At this time, the voice conversion server 130 also receives the identification number and IMSI of the wireless communication terminal 110. The voice conversion server 130 may receive the voice data as an audio file (for example, a QCP file) from the wireless communication terminal 110. The voice conversion server 130 then decodes the received voice data and encodes the decoded voice data into the voice pattern format stored in the voice standardization DB 140 (S203). For example, the voice conversion server 130 may decode the received QCP audio file and encode the decoded voice data in the WAVE format.
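The re-encoding half of step S203 can be sketched with Python's standard `wave` module. The standard library has no QCP decoder, so this sketch assumes the QCP file has already been decoded into 16-bit mono PCM samples by some other component; the function names and the 8000 Hz default sample rate are illustrative assumptions, not details from the patent.

```python
import io
import struct
import wave


def encode_pcm_as_wave(samples, sample_rate=8000):
    """Pack 16-bit mono PCM samples into an in-memory WAVE file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def read_wave(data):
    """Return (sample_rate, samples) from 16-bit mono WAVE bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        count = len(frames) // 2
        return wav.getframerate(), list(struct.unpack("<%dh" % count, frames))
```

Round-tripping through `read_wave` gives later steps a uniform list of integer samples regardless of the format the terminal originally used.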

Next, referring to the place-specific noise patterns in the voice standardization DB 140, the voice conversion server 130 detects the frequencies of those noise patterns in the encoded voice data and removes the noise frequencies contained in the voice data (S205). That is, the voice conversion server 130 compares the voice data against each noise frequency stored in the voice standardization DB 140 and, if the voice data contains a noise frequency, removes that frequency from the voice data.
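The patent does not say how a noise frequency is detected or removed. One simple way to picture step S205, offered here purely as an assumption, is to transform the signal to the frequency domain, zero the bins that match a stored noise frequency when they carry noticeable energy, and transform back. The naive DFT below is only suitable for short illustrative signals, and the `threshold` parameter is hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); fine for short examples."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def remove_noise_frequencies(signal, sample_rate, noise_freqs_hz, threshold=1.0):
    """Zero the spectral bins of stored noise frequencies that are present."""
    n = len(signal)
    spectrum = dft(signal)
    for freq in noise_freqs_hz:
        k = round(freq * n / sample_rate) % n
        if abs(spectrum[k]) > threshold:   # the noise frequency is present
            spectrum[k] = 0
            spectrum[-k % n] = 0           # mirrored bin of a real-valued signal
    return idft(spectrum)
```

Hard zeroing of single bins is crude. It is shown only because it maps directly onto the "detect, then remove" wording of the step.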

Next, based on the identification number (or IMSI) of the wireless communication terminal 110, the voice conversion server 130 looks up in the voice standardization DB 140 the gender and age of the subscriber who spoke and the model name of the wireless communication terminal 110. Alternatively, the voice conversion server 130 may transmit the identification number (or IMSI) of the wireless communication terminal 110 to a home location register (not shown) and receive the subscriber's gender and age and the model name of the wireless communication terminal 110 in return. The voice conversion server 130 then extracts, from the standard voice patterns by gender and age in the voice standardization DB 140, the standard voice pattern corresponding to the subscriber's gender and age, and compares the noise-removed voice data with that standard voice pattern. The voice conversion server 130 then converts the pitch and tone frequency waveform of the voice data into the pitch and tone frequency waveform of the standard voice pattern (S207).
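The lookup part of step S207 can be sketched as follows. The patent does not state how ages are grouped or how a standard voice pattern is parameterized, so the age bands and the `pitch_hz` field are hypothetical. The sketch stops at computing the factor by which the utterance pitch would need to be scaled; the actual waveform conversion is left out.

```python
def age_band(age):
    """Map an age to a coarse band; the band boundaries are hypothetical."""
    if age < 20:
        return "under-20"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60-plus"


def select_standard_pattern(standard_patterns, gender, age):
    """Look up the standard voice pattern for a subscriber's gender and age."""
    return standard_patterns[(gender, age_band(age))]


def pitch_scale_factor(measured_pitch_hz, standard_pattern):
    """Ratio by which the utterance pitch is scaled to reach the pattern pitch."""
    return standard_pattern["pitch_hz"] / measured_pitch_hz
```

For example, an utterance measured at 105 Hz and a pattern pitch of 210 Hz give a factor of 2, meaning the pitch would be raised by one octave.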

Next, the voice conversion server 130 extracts, from the voice patterns by terminal model in the voice standardization DB 140, the terminal voice pattern corresponding to the model name of the wireless communication terminal 110, and compares the voice data converted according to gender and age with that terminal voice pattern. The voice conversion server 130 then converts the loudness and pitch of the voice data into the loudness and pitch of the terminal voice pattern (S209).
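The loudness half of step S209 can be illustrated by scaling the samples so that their root-mean-square (RMS) level matches a target taken from the terminal voice pattern. Using RMS as the loudness measure and storing a single target value per terminal model are assumptions made for this sketch; the pitch half of the step is not covered.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_loudness(samples, target_rms):
    """Scale samples so their RMS level equals target_rms."""
    current = rms(samples)
    if current == 0:
        # A silent signal cannot be scaled to a non-zero level.
        return list(samples)
    gain = target_rms / current
    return [s * gain for s in samples]
```

A call such as `match_loudness(samples, pattern["target_rms"])`, with `pattern` being the hypothetical terminal voice pattern record, would bring quiet and loud handsets to the same level before recognition.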

Next, the voice conversion server 130 transmits the voice data, from which the place-specific noise has been removed and which has been converted according to gender, age, and terminal model, to the speech recognition server 150 to request an information search (S211). The speech recognition server 150 then analyzes the voice data to identify the search word it contains, extracts information corresponding to the search word from the search DB 160, and transmits the information to the wireless communication terminal 110. The search results include geographic information, person information, site information, image information, video information, news information, music information, shopping information, and the like.
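The ordering of steps S205 through S211 can be summarized as a small pipeline: the conversion steps run in a fixed sequence and the result is handed to the recognizer. The function names and the representation of each step as a callable are assumptions made for this sketch; the actual steps and the recognizer are supplied by the caller.

```python
from typing import Callable, List, Sequence

Samples = List[float]
Step = Callable[[Samples], Samples]


def convert_voice_data(samples: Samples, steps: Sequence[Step]) -> Samples:
    """Apply the conversion steps in order, e.g. S205, then S207, then S209."""
    for step in steps:
        samples = step(samples)
    return samples


def request_search(samples: Samples, steps: Sequence[Step], recognize):
    """Convert the voice data, then pass it to the recognizer (S211)."""
    return recognize(convert_voice_data(samples, steps))
```

Because each step receives the output of the previous one, the order matters: noise is removed first so that the gender and age conversion and the terminal model conversion operate on the cleaned signal.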

FIG. 3 is a diagram illustrating the configuration of a voice conversion server according to an embodiment of the present invention.

As shown in FIG. 3, the voice conversion server 130 according to an embodiment of the present invention includes a receiving unit 131, a decoding/encoding unit 132, a noise removing unit 133, a conversion unit 134, and a search request unit 135. The voice standardization DB 140 stores place-specific noise patterns 141, standard voice patterns by gender and age 142, and voice patterns by terminal model 143.

The receiving unit 131 receives voice data as a search word from the wireless communication terminal 110. At this time, the receiving unit 131 also receives the identification number and IMSI of the wireless communication terminal 110. The receiving unit 131 may receive an audio file recorded by the wireless communication terminal 110 as the voice data.

The decoding/encoding unit 132 decodes the voice data and encodes the decoded audio into the voice pattern format stored in the voice standardization DB 140.

The noise removing unit 133 refers to the place-specific noise patterns 141 to detect the frequencies of those noise patterns in the encoded voice data and remove them.

Based on the identification number (or IMSI) of the wireless communication terminal 110, the conversion unit 134 determines, through the voice standardization DB 140 or the home location register, the gender and age of the subscriber who spoke and the model name of the wireless communication terminal 110. The conversion unit 134 then extracts, from the standard voice patterns by gender and age 142, the standard voice pattern corresponding to the subscriber's gender and age, and converts the noise-removed voice data into the extracted standard voice pattern. That is, the conversion unit 134 converts the pitch and tone frequency waveform of the voice data into the pitch and tone frequency waveform of the standard voice pattern.

The conversion unit 134 also extracts, from the voice patterns by terminal model 143, the terminal voice pattern corresponding to the model name of the wireless communication terminal 110, and converts the voice data into the extracted terminal voice pattern. That is, the conversion unit 134 converts the loudness and pitch of the voice data into the loudness and pitch of the terminal voice pattern.

The search request unit 135 transmits the voice data, from which the place-specific noise has been removed and which has been converted according to gender, age, and terminal model, to the speech recognition server 150 to request an information search.

According to the present invention described above, even if the voice data input by a subscriber is mixed with the subscriber's ambient noise, the voice conversion server 130 removes that ambient noise. In addition, even when non-standardized voice data is received, the present invention converts the voice data, by referring to the standard voice patterns by gender and age 142 and the voice patterns by terminal model 143, so that an optimal recognition rate can be achieved, thereby raising the recognition rate for the subscriber's speech.

Although the above embodiment describes the voice conversion server 130 and the speech recognition server 150 as operating independently, the present invention is not limited to this arrangement; the voice conversion server 130 and the speech recognition server 150 may also be integrated and operated as one.

The method of the present invention described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Because a person of ordinary skill in the art to which the invention pertains can readily carry this out, it is not described in further detail.

A person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes to the present invention described above without departing from its technical idea, so the invention is not limited by the foregoing embodiment or the accompanying drawings.

The following drawings attached to this specification illustrate a preferred embodiment of the present invention and, together with the detailed description above, serve to further the understanding of its technical idea; the invention should therefore not be construed as limited to the matters shown in those drawings.

<Explanation of reference numerals for the main parts of the drawings>

110: wireless communication terminal
130: voice conversion server
140: voice standardization DB
150: speech recognition server
160: search DB
131: receiving unit
132: decoding/encoding unit
133: noise removing unit
134: conversion unit
135: search request unit
141: place-specific noise patterns
142: standard voice patterns by gender and age
143: voice patterns by terminal model

Claims

(Deleted)

A speech recognition method using a database that stores place-specific noise pattern information, standard voice pattern information by gender and age indicating an optimized speech recognition rate, and voice pattern information by terminal model, the method comprising:

(a) receiving voice data from a subscriber;

(b) removing noise from the voice data by referring to the place-specific noise patterns of the database;

(c) converting, by referring to the pitch and tone frequency waveform of the standard voice pattern of the database corresponding to the gender and age of the subscriber, the pitch and tone frequency waveform of the noise-removed voice data into the standard voice pattern corresponding to the gender and age of the subscriber; and

(d) converting the voice data converted according to the age and gender into a terminal voice pattern of the database corresponding to the terminal model of the subscriber.

(a) receiving voice data from a subscriber;

(c) converting the noise-removed voice data into a standard voice pattern of the database corresponding to the gender and age of the subscriber; and

(d) converting, by referring to the loudness and pitch of the terminal voice pattern of the database corresponding to the terminal model of the subscriber, the loudness and pitch of the voice data converted according to the age and gender into the terminal voice pattern corresponding to the terminal model of the subscriber.

The method of claim 2 or 3,

wherein step (a) comprises

decoding the voice data and encoding it into the voice pattern format stored in the database.

The method of claim 2 or 3,

further comprising, after step (d),

identifying a search word contained in the converted voice data, extracting information corresponding to the search word, and transmitting the extracted information to the terminal of the subscriber.

(Deleted)

An apparatus for voice conversion using a database that stores place-specific noise pattern information, standard voice pattern information by gender and age indicating an optimized speech recognition rate, and voice pattern information by terminal model, the apparatus comprising:

a receiving unit that receives voice data from a subscriber;

a noise removing unit that removes noise from the voice data by referring to the place-specific noise pattern information; and

a conversion unit that converts the noise-removed voice data into a standard voice pattern corresponding to the gender and age of the subscriber and a terminal model voice pattern corresponding to the model of the subscriber's terminal,

wherein the conversion unit converts the pitch and tone frequency waveform of the voice data by referring to the pitch and tone frequency waveform of the standard voice pattern corresponding to the gender and age of the subscriber.

a receiving unit that receives voice data from a subscriber;

wherein the conversion unit converts the loudness and pitch of the voice data by referring to the loudness and pitch of the terminal model voice pattern corresponding to the terminal model of the subscriber.

9. The apparatus according to claim 7 or 8,

further comprising a decoding/encoding unit that decodes the received voice data and encodes it into the voice pattern format stored in the database.