KR20160055059A

KR20160055059A - Method and apparatus for speech signal processing

Info

Publication number: KR20160055059A
Application number: KR1020150152525A
Authority: KR
Inventors: 김태윤; 김상하; 김성수; 이진식; 한창우; 김은경; 이재원
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2014-11-07
Filing date: 2015-10-30
Publication date: 2016-05-17
Also published as: KR102536944B1

Abstract

A speech signal processing method of a terminal comprises: receiving a speech signal; detecting, in the speech signal, a personalized information section including personal information; performing data processing on the part of the speech signal corresponding to the personalized information section by using a personalized model generated based on the personal information; and receiving, from a server, a result of data processing performed on the part of the speech signal corresponding to a general information section, which is the section other than the personalized information section. The speech signal processing method can therefore protect personal information while using a personalized model.

Description

TECHNICAL FIELD

The present invention relates to a speech signal processing method and apparatus, and more particularly, to a speech signal processing method and apparatus capable of protecting personal information while using a personalized model.

Speech recognition is a technology that receives a user's speech and converts it into text. Because this process is performed automatically, it is also called automatic speech recognition (ASR). Recently, ASR has become widespread as an interface technology that replaces keyboard input on devices such as smartphones and TVs. Natural language understanding (NLU) is a technology that extracts the meaning of a user's utterance from the result of speech recognition. Rather than merely recognizing the user's speech, NLU analyzes it at a higher level so that its meaning can be grasped more accurately.

Such a speech recognition and language understanding system can generally be divided into a client that receives a speech signal and an ASR/NLU engine that performs speech recognition and language understanding on that signal. To increase speech signal processing speed, the two modules may be designed to be separate from each other. In this case, a device with limited processing and data storage capability, such as a smartphone or TV, serves as the client, while the ASR/NLU engine is configured as an independent server with high computing capability, and the two modules are connected through a network. The device located close to the user receives the speech signal, and the server, which processes data quickly, performs speech recognition and language understanding. In another configuration, an ASR/NLU engine is also installed inside the device in addition to the one on the server, and the two engines cooperate to perform speech recognition and language understanding.

One way to improve the performance of such a speech recognition and language understanding system is to collect data for each user and generate a per-user model. Such a per-user model is called a personalized model, and this approach is called personalized modeling. Because a personalized model is customized to a specific individual, it generally performs better than a general model built for an unspecified majority of users.

However, personalized modeling requires the user's personal information in order to generate the personalized model. Information-protection problems may arise while this personal information is transmitted and processed, and if encryption is applied to solve them, processing may become slower.

The disclosed embodiments provide a speech signal processing method and apparatus capable of protecting personal information while using a personalized model.

Specifically, the disclosed embodiments provide a speech signal processing method and apparatus in which the personalized information section and the general information section are divided between the terminal and the server for processing.

The disclosed embodiments also provide a speech signal processing method and apparatus in which the terminal uses a personalized model to re-process a speech signal that has already been processed by the server.

The disclosed embodiments also provide a speech signal processing method and apparatus that use an ID-based personalized model.

A speech signal processing method of a terminal according to a disclosed embodiment includes: receiving a speech signal; detecting, in the speech signal, a personalized information section that includes personal information; performing data processing, using a personalized model generated based on the personal information, on the part of the speech signal corresponding to the personalized information section; and receiving, from a server, a result of data processing performed on the part of the speech signal corresponding to a general information section, which is the section other than the personalized information section.
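
The terminal-side flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. A first-pass token sequence stands in for the time segments of the speech signal, and a lowercase set of personal terms stands in for the personal information. `personalized_model` and `server_result` are hypothetical callables that represent the on-device personalized model and the result the server returns for general sections.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


@dataclass
class Section:
    start: int          # index of the first token in the section (inclusive)
    end: int            # index one past the last token (exclusive)
    personalized: bool  # True if the section contains personal information


def detect_sections(tokens: Sequence[str], personal_terms: Set[str]) -> List[Section]:
    """Group consecutive tokens into alternating personalized / general sections.

    `personal_terms` is assumed to be lowercase.
    """
    sections: List[Section] = []
    for i, tok in enumerate(tokens):
        is_personal = tok.lower() in personal_terms
        if sections and sections[-1].personalized == is_personal:
            sections[-1].end = i + 1  # extend the current section
        else:
            sections.append(Section(i, i + 1, is_personal))
    return sections


def process_utterance(
    tokens: Sequence[str],
    personal_terms: Set[str],
    personalized_model: Callable[[List[str]], str],
    server_result: Callable[[List[str]], str],
) -> str:
    """Route each section: personal ones stay on the device, general ones use the server."""
    pieces = []
    for sec in detect_sections(tokens, personal_terms):
        chunk = list(tokens[sec.start:sec.end])
        if sec.personalized:
            pieces.append(personalized_model(chunk))  # processed locally, never sent out
        else:
            pieces.append(server_result(chunk))  # stand-in for the server's general-model output
    return " ".join(pieces)
```

Only the general sections are passed to `server_result`, so in this sketch the personal terms never reach the server-side stand-in.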

The method may further include generating speech section information about the personalized information section and the general information section and transmitting it to the server.

The speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.
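
The sketch below shows one possible representation of the speech section information and its section marking information. The millisecond boundaries, the `"personalized"`/`"general"` labels, and the JSON layout are assumptions for illustration. The disclosure does not specify a format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class SectionMark:
    start_ms: int  # section start, in milliseconds from the beginning of the signal
    end_ms: int    # section end (exclusive), in milliseconds
    kind: str      # "personalized" or "general" (labels assumed for this sketch)


def encode_section_info(marks: List[SectionMark]) -> str:
    """Serialize section marking information: boundaries and labels only, no content."""
    for prev, cur in zip(marks, marks[1:]):
        if cur.start_ms < prev.end_ms:
            raise ValueError("sections must be ordered and non-overlapping")
    return json.dumps({"sections": [asdict(m) for m in marks]})


def general_spans(payload: str) -> List[Tuple[int, int]]:
    """Receiving side: recover the spans that the general model may process."""
    return [
        (s["start_ms"], s["end_ms"])
        for s in json.loads(payload)["sections"]
        if s["kind"] == "general"
    ]
```

The payload carries only time boundaries and labels. The words spoken inside a personalized section are not part of it.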

The method may further include receiving, from the server, speech section information about the personalized information section and the general information section.

The result of data processing received from the server for the speech signal corresponding to the general information section may be the result of the server processing that speech signal using a general model.

A speech signal processing method of a server according to a disclosed embodiment includes: receiving a speech signal; detecting, in the speech signal, a personalized information section that includes personal information; performing data processing, using a general model, on the part of the speech signal corresponding to a general information section, which is the section other than the personalized information section; and transmitting the result of that data processing to a terminal.
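
On the server side, the general model only needs to see the general information sections. The sketch below assumes the server already holds a list of `(start_ms, end_ms, kind)` sections, obtained either from its own detector or from the terminal. It also assumes a hypothetical `general_model` callable. Personalized sections are skipped, and results are keyed by section index so that the terminal can merge them back in order.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def process_general_sections(
    samples: Sequence[float],
    sample_rate: int,
    sections: List[Tuple[int, int, str]],
    general_model: Callable[[Sequence[float]], str],
) -> Dict[int, str]:
    """Run the general model on general sections only; skip personalized ones."""
    results: Dict[int, str] = {}
    for idx, (start_ms, end_ms, kind) in enumerate(sections):
        if kind != "general":
            continue  # personalized sections are left for the terminal
        first = start_ms * sample_rate // 1000  # convert ms boundaries to sample indices
        last = end_ms * sample_rate // 1000
        results[idx] = general_model(samples[first:last])
    return results
```

Because the server returns section indices rather than a joined transcript, the terminal can slot its own personalized results into the gaps.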

The method may further include generating speech section information about the personalized information section and the general information section and transmitting it to the terminal.

The method may further include receiving, from the terminal, speech section information about the personalized information section and the general information section.

A speech signal processing method of a terminal according to another disclosed embodiment includes: receiving a speech signal; receiving, from a server, a result of data processing performed on the speech signal using a general model; and performing data processing on the speech signal using that data processing result and a personalized model generated based on personal information.

Performing data processing on the speech signal using the data processing result and the personalized model generated based on personal information may include performing data processing on the part of the speech signal corresponding to a personalized information section that includes the personal information.
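
In this embodiment, the terminal refines the server's general-model result. The sketch below uses fuzzy string matching from Python's standard `difflib` as a simple stand-in for a personalized lexical model: tokens inside the personalized information section are replaced by the closest entry in a personal lexicon. This substitute technique, the `cutoff` value, and the token-level `span` are assumptions. They are not the method described in the disclosure.

```python
import difflib
from typing import List, Optional, Sequence, Tuple


def refine_with_personal_lexicon(
    server_tokens: Sequence[str],
    lexicon: Sequence[str],
    span: Optional[Tuple[int, int]] = None,
    cutoff: float = 0.75,
) -> List[str]:
    """Replace tokens in `span` with their closest personal-lexicon entry, if close enough."""
    start, end = span if span is not None else (0, len(server_tokens))
    refined = list(server_tokens)
    for i in range(start, end):
        # get_close_matches returns at most n candidates whose similarity ratio >= cutoff
        match = difflib.get_close_matches(refined[i], lexicon, n=1, cutoff=cutoff)
        if match:
            refined[i] = match[0]
    return refined
```

Restricting the search to the personalized information section keeps ordinary words from being pulled toward a contact name.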

The method may further include detecting the personalized information section in the speech signal.

The method may further include receiving, from the server, speech section information about the personalized information section and the general information section.

The personalized model may be at least one of a personalized speech recognition model, a personalized natural language understanding model, and a personalized lexical model.

A speech signal processing method of a terminal according to yet another disclosed embodiment includes: generating a mapping table by mapping IDs to personal information; generating an ID-based personalized model using the mapping table; transmitting the ID-based personalized model to a server; receiving, from the server, a result of data processing performed on a speech signal using the ID-based personalized model; and restoring the personal information corresponding to the ID using the data processing result and the mapping table.
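
The mapping table and the restoration step can be illustrated with two small functions. The `IDnnnn` format is a hypothetical choice. The point is that only opaque IDs appear in anything sent to the server, never the personal strings, while the reverse table stays on the terminal.

```python
from typing import Dict, List, Sequence, Tuple


def build_mapping_table(personal_items: Sequence[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Assign an opaque ID to each personal item; the table never leaves the terminal."""
    forward = {item: f"ID{n:04d}" for n, item in enumerate(personal_items)}
    reverse = {pid: item for item, pid in forward.items()}
    return forward, reverse


def restore_personal_info(server_output: Sequence[str], reverse: Dict[str, str]) -> List[str]:
    """Replace IDs in the server's result with the original personal information."""
    return [reverse.get(tok, tok) for tok in server_output]
```

Sequential IDs keep the example readable. Random IDs, for example from Python's `secrets` module, would additionally hide the order in which items were added.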

Generating the ID-based personalized model using the mapping table may include representing each ID mapped to the personal information by acoustic unit IDs, which are IDs mapped to sounds.

The acoustic unit IDs may be IDs mapped to the sounds according to an agreement with the server.

The method may further include generating the mapping table by mapping IDs to additional information generated from the personal information.
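
The sketch below represents personal-information IDs with acoustic unit IDs, using a pronunciation as the additional information derived from the personal information. The phone symbols, the agreed unit inventory `AGREED_UNIT_IDS`, and the pronunciations are illustrative assumptions. In practice, the pronunciations would come from an on-device pronunciation step that is not shown.

```python
from typing import Dict, List

# Hypothetical acoustic-unit inventory (ARPAbet-style phone symbols) that the
# terminal and the server are assumed to have agreed on in advance.
AGREED_UNIT_IDS: Dict[str, int] = {"AE": 1, "L": 2, "IH": 3, "S": 4, "B": 5, "AA": 6}


def to_id_based_lexicon(
    pronunciations: Dict[str, List[str]],
    mapping: Dict[str, str],
) -> Dict[str, List[int]]:
    """Build the ID-based model: personal-information ID -> acoustic unit ID sequence.

    `pronunciations` maps each personal item to its phone sequence (assumed input);
    `mapping` is the terminal's table of personal item -> opaque ID.
    """
    return {
        mapping[item]: [AGREED_UNIT_IDS[ph] for ph in phones]
        for item, phones in pronunciations.items()
    }
```

The resulting lexicon pairs opaque IDs with numbers from the shared inventory, so neither the spelling of a name nor the terminal's mapping table is exposed to the server.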

A speech signal processing method of a server according to yet another disclosed embodiment includes: receiving an ID-based personalized model from a terminal; receiving a speech signal; performing data processing on the speech signal using the ID-based personalized model; and transmitting the data processing result to the terminal.

Performing data processing on the speech signal using the ID-based personalized model may include representing each ID mapped to the personal information by acoustic unit IDs, which are IDs mapped to sounds according to an agreement with the terminal.
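
On the server, the ID-based personalized model can be used without the server learning the underlying names. The sketch below assumes that the server's acoustic decoder has already produced a sequence of agreed acoustic unit IDs (not shown). It then performs a greedy longest-match lookup against the ID-based lexicon. Matched spans are replaced by the personal-information ID, which the terminal later restores with its mapping table. The greedy exact matching is a simplification chosen for illustration.

```python
from typing import Dict, List, Sequence, Union


def tag_personal_ids(
    decoded_units: Sequence[int],
    id_lexicon: Dict[str, List[int]],
) -> List[Union[int, str]]:
    """Replace spans of acoustic unit IDs that match a lexicon entry with its ID."""
    units = list(decoded_units)
    # Try longer entries first so that a longer name wins over a shorter prefix.
    entries = sorted(id_lexicon.items(), key=lambda kv: len(kv[1]), reverse=True)
    out: List[Union[int, str]] = []
    i = 0
    while i < len(units):
        for pid, seq in entries:
            if seq and units[i:i + len(seq)] == seq:
                out.append(pid)  # only the opaque ID is returned to the terminal
                i += len(seq)
                break
        else:
            out.append(units[i])  # no personal entry starts here; keep the unit as-is
            i += 1
    return out
```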

A terminal according to a disclosed embodiment includes: a receiver that receives speech; a communication unit that communicates with a server; and a controller that controls the terminal to receive a speech signal, detect in the speech signal a personalized information section that includes personal information, perform data processing on the part of the speech signal corresponding to the personalized information section using a personalized model generated based on the personal information, and receive from the server a result of data processing performed on the part of the speech signal corresponding to a general information section, which is the section other than the personalized information section.

The controller may control the generation of speech section information about the personalized information section and the general information section and its transmission to the server.

The speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.

The controller may control the reception, from the server, of speech section information about the personalized information section and the general information section.

A server according to a disclosed embodiment may include: a receiver that receives speech; a communication unit that communicates with a terminal; and a controller that controls the server to receive a speech signal, detect in the speech signal a personalized information section that includes personal information, perform data processing using a general model on the part of the speech signal corresponding to a general information section, which is the section other than the personalized information section, and transmit the result of that data processing to the terminal.

The controller may control the generation of speech section information about the personalized information section and the general information section and its transmission to the terminal.

The controller may control the reception, from the terminal, of speech section information about the personalized information section and the general information section.

A terminal according to another disclosed embodiment includes: a communication unit that communicates with a server; and a controller that controls the terminal to receive, from the server, a result of data processing performed on a speech signal using a general model, and to perform data processing on the speech signal using that data processing result and a personalized model generated based on personal information.

When performing data processing on the speech signal using the data processing result and the personalized model generated based on personal information, the controller may control data processing to be performed on the part of the speech signal corresponding to a personalized information section that includes the personal information.

The controller may control the detection of the personalized information section in the speech signal.

The controller may control the reception, from the server, of speech section information about the personalized information section and the general information section.

The personalized model may be at least one of a personalized speech recognition model, a personalized natural language understanding model, and a personalized lexical model.

A terminal according to yet another disclosed embodiment includes: a receiver that receives a speech signal; a communication unit that communicates with a server; and a controller that controls the terminal to generate a mapping table by mapping IDs to personal information, generate an ID-based personalized model using the mapping table, transmit the ID-based personalized model to the server, receive from the server a result of data processing performed on a speech signal using the ID-based personalized model, and restore the personal information corresponding to the ID using the data processing result and the mapping table.

When generating the ID-based personalized model using the mapping table, the controller may control each ID mapped to the personal information to be represented by acoustic unit IDs, which are IDs mapped to sounds.

The controller may control the generation of the mapping table by mapping IDs to additional information generated from the personal information.

A server according to yet another disclosed embodiment includes: a receiver that receives a speech signal; a communication unit that communicates with a terminal; and a controller that controls the server to receive an ID-based personalized model from the terminal, receive a speech signal, perform data processing on the speech signal using the ID-based personalized model, and transmit the data processing result to the terminal.

The controller may control each ID mapped to the personal information to be represented by acoustic unit IDs, which are IDs mapped to sounds according to an agreement with the terminal.

FIG. 1 is a block diagram illustrating the internal configuration of a terminal according to a disclosed embodiment.
FIG. 2 is a block diagram illustrating the internal configuration of a server according to a disclosed embodiment.
FIG. 3 is a block diagram illustrating the internal configuration of the terminal shown in FIG. 1 in more detail.
FIG. 4 is a block diagram illustrating the internal configuration of the server shown in FIG. 2 in more detail.
FIG. 5 is a flowchart illustrating a speech processing method of a terminal according to a disclosed embodiment.
FIG. 6 is a flowchart illustrating a speech processing method of a server according to a disclosed embodiment.
FIG. 7 is a diagram for explaining a personalized information section and a general information section.
FIG. 8 is a flowchart illustrating an example of specific operations of a terminal and a server according to a disclosed embodiment.
FIG. 9 is a flowchart illustrating a speech processing method of a terminal according to another disclosed embodiment.
FIG. 10 is a flowchart illustrating a speech processing method of a server according to another disclosed embodiment.
FIG. 11 is a flowchart illustrating an example of specific operations of a terminal and a server according to another disclosed embodiment.
FIG. 12 is a block diagram illustrating the internal configuration of a terminal according to yet another disclosed embodiment.
FIG. 13 is a block diagram illustrating the internal configuration of a server according to yet another disclosed embodiment in more detail.
FIG. 14 is a block diagram illustrating the internal configuration of the terminal shown in FIG. 12 in more detail.
FIG. 15 is a block diagram illustrating the internal configuration of the server shown in FIG. 13 in more detail.
FIG. 16 is a flowchart illustrating a speech processing method of a terminal according to yet another disclosed embodiment.
FIG. 17 is a flowchart illustrating a speech processing method of a server according to yet another disclosed embodiment.
FIG. 18 is a diagram showing personal information.
FIG. 19 is a diagram showing personal information by pronunciation symbol.
FIG. 20 is a diagram showing a mapping table in which personal information is mapped to IDs.
FIG. 21 is a diagram showing a mapping table in which pronunciation symbols of personal information are mapped to IDs.
FIG. 22 is a diagram showing personal information IDs represented by pronunciation symbol IDs.
FIG. 23 is a flowchart illustrating an example of specific operations of a terminal and a server according to another disclosed embodiment.

The advantages and features of the disclosed embodiments, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the disclosed embodiments are not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the disclosed embodiments pertain, and the disclosed embodiments are defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. The term "unit" used in the specification refers to software or a hardware component such as an FPGA or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

Hereinafter, embodiments of the disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the disclosed embodiments pertain can easily carry them out. However, the disclosed embodiments may be implemented in various different forms and are not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the disclosed embodiments clearly.

The terms used in the disclosed embodiments have been selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the disclosed embodiments. However, their meanings may vary depending on the intention of those skilled in the art, legal precedents, the emergence of new technologies, and the like. In certain cases, the applicant has selected terms arbitrarily, and their meanings are then described in detail in the corresponding part of the description. Therefore, the terms used in the disclosed embodiments should be defined based on their meanings and on the overall content of the disclosed embodiments, not simply on their names.

In this specification, a personalized information section refers to a section of the speech signal that contains personal information capable of directly or indirectly identifying an individual. For example, a section of the speech signal containing a name stored in the phone book of the user terminal, the user's search history, the user's location information, or the like may correspond to a personalized information section.
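
Detecting a personalized information section in a first-pass transcript could be sketched as a longest-match search over the user's personal data sources. The source names (`phone_book`, `location`, and so on) and the word-level matching are assumptions for illustration. A real system would operate on the speech signal itself and would need to tolerate recognition errors.

```python
from typing import Dict, List, Sequence, Tuple


def find_personal_spans(
    tokens: Sequence[str],
    sources: Dict[str, List[str]],
) -> List[Tuple[int, int, str]]:
    """Return (start, end, source) spans whose words match an entry from a personal source."""
    # Pre-split each entry so that multi-word names are matched as a unit.
    phrases = [(entry.lower().split(), src) for src, entries in sources.items() for entry in entries]
    phrases.sort(key=lambda p: len(p[0]), reverse=True)  # prefer the longest match
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(lowered):
        for words, src in phrases:
            if words and lowered[i:i + len(words)] == words:
                spans.append((i, i + len(words), src))
                i += len(words)
                break
        else:
            i += 1  # token belongs to a general information section
    return spans
```

Every token that falls outside the returned spans belongs to the general information section.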

In this specification, a general information section refers to a section of the speech signal that contains general, non-personal information, that is, the remaining section excluding the personalized information section. For example, a section containing a sentence with a general meaning, such as "Call me" or "Did you eat?", may correspond to a general information section.

In this specification, a personalized model is a speech processing model that reflects the characteristics of an individual, that is, a speech processing model customized for a specific individual.

In this specification, a general model is a generic speech processing model for processing the speech of unspecified persons rather than of a specific individual.

FIG. 1 is a block diagram illustrating the internal configuration of a terminal according to a disclosed embodiment.

Referring to FIG. 1, the terminal 100 according to the disclosed embodiment includes a receiving unit 110, a communication unit 130, and a control unit 150.

The receiving unit 110 receives a speech signal. The receiving unit 110 may include various components such as a microphone unit, a USB interface unit, and a DVD interface unit. For example, when the receiving unit 110 includes a microphone unit, the terminal 100 can receive the user's speech signal directly through the microphone unit. When the receiving unit 110 includes a USB interface unit, the terminal 100 may receive a speech signal file from a USB device. Furthermore, when a speech signal is received from an external device through the communication unit 130, the communication unit 130 may serve as the receiving unit 110.

The communication unit 130 communicates with external devices. The communication unit 130 may be connected to a network by wire or wirelessly to communicate with an external device. According to the disclosed embodiment, the communication unit 130 communicates with the server to transmit and receive data. For example, the communication unit 130 may include a short-range communication module, a mobile communication module, a wireless Internet module, a wired Internet module, and the like. The communication unit 130 may also include one or more components.

The control unit 150 controls the overall operation of the terminal 100 and can process the speech signal by controlling the receiving unit 110 and the communication unit 130. The control unit 150 may include a RAM that stores signals or data input from outside the terminal 100 or is used as a storage area for various tasks performed in the electronic device, a ROM that stores a control program for controlling peripheral devices, and a processor. The processor may be implemented as a system on chip (SoC) integrating a core (not shown) and a GPU (not shown). The processor may also include a plurality of processors.

The control unit 150 according to the disclosed embodiment receives a speech signal through the receiving unit 110, detects a personalized information section containing personal information in the received speech signal, and processes the speech signal corresponding to the personalized information section using a personalized model generated based on the personal information. It also controls the communication unit 130 to receive, from the server, the result of processing the speech signal corresponding to the general information section, that is, the section other than the personalized information section. The result received from the server for the general information section may be the result of the server processing that part of the speech signal using a general model. Because the server has high computing power, it can process the general information section quickly.
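
The terminal-side flow of this embodiment can be sketched as follows. This is a minimal Python sketch, assuming that the speech signal has already been split into segments flagged as personalized or general; the `Segment` alias, the function `hybrid_recognize`, and the two callables standing in for the on-device personalized model and the request to the server are all hypothetical. In this sketch only the general segments are handed to the server callback, which is one possible design choice, and joining the partial results with spaces is a simplification.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a chunk of audio; an opaque string keeps the sketch simple.
Segment = str


def hybrid_recognize(
    segments: Sequence[Segment],
    is_personal: Sequence[bool],
    personalized_model: Callable[[Segment], str],
    request_server: Callable[[dict[int, Segment]], dict[int, str]],
) -> str:
    """Process personalized segments locally and general segments via the server, then merge."""
    if len(segments) != len(is_personal):
        raise ValueError("one personal/general flag is required per segment")

    # Only general segments are passed to the server callback, keyed by position.
    general = {i: seg for i, seg in enumerate(segments) if not is_personal[i]}
    server_results = request_server(general)

    # Personalized segments are processed on the terminal; results are merged in order.
    merged = []
    for i, seg in enumerate(segments):
        merged.append(personalized_model(seg) if is_personal[i] else server_results[i])
    return " ".join(merged)
```

Keeping the server call behind a callable makes it easy to substitute a stub when testing the merge logic.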

In addition, the control unit 150 may generate speech section information on the personalized information section and the general information section and transmit it to the server through the communication unit 130, or may control the communication unit 130 to receive such speech section information from the server. Here, the speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.
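
Section marking information only needs to describe where each section lies and what kind it is. The sketch below, assuming millisecond offsets and a JSON encoding (both hypothetical; the specification does not define a format), shows one way to serialize such marks without including any recognized text. The names `SectionMark`, `encode_section_info`, and `decode_section_info` are illustrative only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SectionMark:
    start_ms: int  # start of the marked section within the speech signal
    end_ms: int    # end of the marked section (exclusive)
    label: str     # "personalized" or "general"


def encode_section_info(marks: list[SectionMark]) -> str:
    """Serialize section marking information; only boundaries and labels are included."""
    for m in marks:
        if m.label not in ("personalized", "general"):
            raise ValueError(f"unknown label: {m.label}")
        if not 0 <= m.start_ms < m.end_ms:
            raise ValueError(f"invalid interval: {m}")
    return json.dumps({"sections": [asdict(m) for m in marks]})


def decode_section_info(payload: str) -> list[SectionMark]:
    """Inverse of encode_section_info."""
    return [SectionMark(**d) for d in json.loads(payload)["sections"]]
```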

Furthermore, the control unit 150 can control the speech processing result to be output to the user.

According to the disclosed embodiment, the terminal 100 processes the speech signal corresponding to the personalized information section using a personalized model generated based on personal information, and receives from the server the result of processing the speech signal corresponding to the general information section, that is, the section other than the personalized information section. Because the server processes only the general information section, it does not use the personalized model. Therefore, no information containing personal information, in any form, is transmitted between the server and the terminal 100, and consequently no personal information is stored on the server. As a result, the disclosed embodiment makes it possible to implement a speech processing system with higher performance and processing speed while fundamentally protecting personal information.

The control unit 150 according to another disclosed embodiment receives a speech signal through the receiving unit 110, receives from the server through the communication unit 130 the result of processing the speech signal using a general model, and processes the speech signal using that result and a personalized model generated based on personal information. In this case, because the server 200 has high computing power, it can process the general information section quickly. The personalized model may be at least one of a personalized speech recognition model, a language understanding model, and a personalized lexical model.

In addition, when processing the speech signal using the server's processing result and the personalized model generated based on personal information, the control unit 150 may perform this processing on the speech signal corresponding to the personalized information section containing personal information. In this case, the control unit 150 may detect the personalized information section in the speech signal itself, or may control the communication unit 130 to receive speech section information on the personalized information section and the general information section from the server. Here, the speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.
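
One way to picture this second embodiment is n-best rescoring: the server returns several candidate transcriptions scored by the general model, and the terminal re-ranks them using personal vocabulary. This technique is substituted here for illustration and is not stated in the specification; the function name `rescore_with_personal_lexicon`, the log-score convention, and the fixed `bonus` per matched term are assumptions.

```python
def rescore_with_personal_lexicon(
    nbest: list[tuple[str, float]],
    personal_terms: set[str],
    bonus: float = 2.0,
) -> str:
    """Pick the best hypothesis after boosting those that contain personal terms.

    nbest: (hypothesis text, general-model log score) pairs received from the server.
    bonus: hypothetical log-score reward added per matched personal term.
    """
    if not nbest:
        raise ValueError("empty n-best list")

    def personalized_score(item: tuple[str, float]) -> float:
        text, score = item
        matches = sum(1 for word in text.split() if word in personal_terms)
        return score + bonus * matches

    best_text, _ = max(nbest, key=personalized_score)
    return best_text
```

The personal terms are consulted only inside this function on the terminal; the server sees nothing but the audio it scored.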

According to the disclosed embodiment, the terminal 100 receives from the server the result of processing the speech signal using the general model, and then processes the speech signal once more using that result and the personalized model generated based on personal information. That is, because speech processing with the personalized model is performed only on the terminal 100, no information containing personal information, in any form, is transmitted between the server and the terminal 100, and consequently no personal information is stored on the server. As a result, the disclosed embodiment makes it possible to implement a speech processing system with higher performance and processing speed while fundamentally protecting personal information.

FIG. 2 is a block diagram illustrating the internal configuration of a server according to the disclosed embodiment.

Referring to FIG. 2, the server 200 according to the disclosed embodiment includes a receiving unit 210, a communication unit 230, and a control unit 250.

The receiving unit 210 receives a speech signal. The receiving unit 210 may include components capable of receiving a speech signal in various forms, such as a USB interface unit and a DVD interface unit. For example, when the receiving unit 210 includes a USB interface, the server 200 can receive a speech signal file from a USB device. Furthermore, when a speech signal is received from an external device through the communication unit 230, the communication unit 230 may serve as the receiving unit 210.

The communication unit 230 communicates with external devices. The communication unit 230 may be connected to a network by wire or wirelessly to communicate with an external device. According to the disclosed embodiment, the communication unit 230 communicates with the terminal 100 to transmit and receive data. For example, the communication unit 230 may include a short-range communication module, a mobile communication module, a wireless Internet module, a wired Internet module, and the like. The communication unit 230 may also include one or more components.

The control unit 250 controls the overall operation of the server 200 and can process the speech signal by controlling the receiving unit 210 and the communication unit 230. The control unit 250 may include a RAM that stores signals or data input from outside the server 200 or is used as a storage area for various tasks performed in the electronic device, a ROM that stores a control program for controlling peripheral devices, and a processor. The processor may be implemented as a system on chip (SoC) integrating a core (not shown) and a GPU (not shown). The processor may also include a plurality of processors.

The control unit 250 according to the disclosed embodiment receives a speech signal through the receiving unit 210, detects a personalized information section containing personal information in the speech signal, processes the speech signal corresponding to the general information section, that is, the section other than the personalized information section, using a general model, and controls the communication unit 230 to transmit the result of that processing to the terminal 100.
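
The server-side behaviour of this embodiment might look like the following Python sketch, in which the server skips any segment that its own detector flags as personalized or that the terminal's marking lists as personalized. The segment representation, the function `server_process_general`, and the callables for the general model and the detector are hypothetical.

```python
from typing import Callable, Iterable


def server_process_general(
    segments: list[str],
    general_model: Callable[[str], str],
    detect_personal: Callable[[str], bool],
    marked_personal: Iterable[int] = (),
) -> dict[int, str]:
    """Run the general model only on segments not treated as personalized.

    A segment is skipped if the server's own detector flags it or if the
    terminal's section marking lists its index as personalized.
    """
    skip = set(marked_personal)
    skip.update(i for i, seg in enumerate(segments) if detect_personal(seg))
    # Results are keyed by segment index so the terminal can merge them in order.
    return {i: general_model(seg) for i, seg in enumerate(segments) if i not in skip}
```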

In addition, the control unit 250 generates speech section information on the personalized information section and the general information section and transmits it to the terminal 100 through the communication unit 230, or receives such speech section information from the terminal 100. Here, the speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.

According to the disclosed embodiment, because the server 200 processes only the speech signal corresponding to the general information section, it does not use the personalized model 171. Therefore, no information containing personal information, in any form, is transmitted between the server and the terminal 100, and consequently no personal information is stored on the server. As a result, the disclosed embodiment makes it possible to implement a speech processing system with higher performance and processing speed while fundamentally protecting personal information.

The control unit 250 according to another disclosed embodiment receives a speech signal through the receiving unit 210, processes the speech signal corresponding to the general information section, that is, the section other than the personalized information section, using a general model, and transmits the processing result to the terminal through the communication unit 230. The control unit 250 may generate speech section information on the personalized speech section of the speech signal and transmit it to the terminal 100. Here, the speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal.

According to the disclosed embodiment, because the server 200 processes the received speech signal using only the general model, it does not use the personalized model 171. Therefore, no information containing personal information, in any form, is transmitted between the server and the terminal 100, and consequently no personal information is stored on the server. As a result, the disclosed embodiment makes it possible to implement a speech processing system with higher performance and processing speed while fundamentally protecting personal information.

FIG. 3 is a block diagram illustrating the internal configuration of the terminal shown in FIG. 1 in more detail. In FIG. 3, descriptions of components that overlap with FIG. 1 are omitted.

The control unit 150 may include a speech processing engine 151. According to the disclosed embodiment, the speech processing engine 151 may include a speech recognition engine (ASR engine) and a language understanding engine (NLU engine), and processes the received speech signal to perform speech recognition and language understanding. The speech recognition engine and the language understanding engine can process the speech signal using a speech recognition model and a language understanding model, respectively.

The speech recognition model may include an acoustic model and a language model. The acoustic model is a model of the speech signal and is generated by statistical methods from a large amount of collected speech data. The language model is a grammatical model of user utterances and is likewise typically obtained by statistical learning from a large amount of collected text data. The language understanding model is a semantic model representing the meaning of user utterances; it is obtained either by statistical learning from a large amount of text data or by writing semantic understanding rules that take usage scenarios into account.
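
As a textbook illustration of a language model obtained by statistical learning from text, the sketch below estimates maximum-likelihood bigram probabilities from tokenized sentences. It is not the language model of the disclosure: real systems use far larger corpora, higher-order n-grams or neural models, and smoothing, which is deliberately omitted here. The function name `train_bigram_lm` and the `<s>`/`</s>` boundary markers are illustrative choices.

```python
from collections import Counter


def train_bigram_lm(sentences: list[list[str]]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood bigram probabilities P(w_i | w_{i-1}) from tokenized sentences.

    No smoothing is applied, so unseen bigrams simply have no entry.
    """
    bigrams: Counter = Counter()
    history: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]  # sentence start/end markers
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return {(prev, cur): n / history[prev] for (prev, cur), n in bigrams.items()}
```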

The terminal 100 may further include a storage unit 170. The storage unit 170 stores programs and data necessary for the operation of the terminal 100. The storage unit 170 may consist of a volatile storage medium, a nonvolatile storage medium, or a combination of both. Volatile storage media may include semiconductor memory such as RAM, DRAM, and SRAM, and nonvolatile storage media may include a hard disk and flash NAND memory.

According to the disclosed embodiment, personal information 172 may be stored in the storage unit 170. The personal information 172 is information that can directly or indirectly identify an individual, and the type of data stored may vary depending on the type of terminal. For example, a mobile device may store contacts, a music list, the contents of short messages and their received and sent history, and web search history, while a TV may store a personal playlist.

In addition, a personalized model 171 may be stored in the storage unit 170. The personalized model 171 is a speech processing model that is generated using personal information and reflects individual characteristics. The storage unit 170 may store a personalized speech recognition model and/or a personalized language understanding model. Using such a personalized speech recognition model and/or personalized language understanding model makes it possible to implement a speech processing system with higher performance.
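
A personalized model starts from personal information held on the device. As a hedged sketch, the function below (a hypothetical name, `build_personal_lexicon`) collects a word list with counts from sources such as contact names or playlist titles; such a list could feed a personalized lexical model. Whitespace tokenization and the `min_count` threshold are simplifying assumptions, not details from the specification.

```python
from collections import Counter
from typing import Iterable


def build_personal_lexicon(*sources: Iterable[str], min_count: int = 1) -> dict[str, int]:
    """Collect vocabulary from on-device personal data (e.g. contact names, playlist titles).

    Each source is an iterable of strings; entries are whitespace-split and the words counted.
    Words seen fewer than min_count times are dropped. The result is meant to stay on the terminal.
    """
    counts: Counter = Counter()
    for source in sources:
        for entry in source:
            counts.update(entry.split())
    return {word: n for word, n in counts.items() if n >= min_count}
```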

FIG. 4 is a block diagram illustrating the internal configuration of the server shown in FIG. 2 in more detail. In FIG. 4, descriptions of components that overlap with FIG. 2 are omitted.

The control unit 250 may include a speech processing engine 251. According to the disclosed embodiment, the speech processing engine 251 may include a speech recognition engine (ASR engine) and a language understanding engine (NLU engine), and processes the received speech signal to perform speech recognition and language understanding. The speech recognition engine and the language understanding engine can process the speech signal using a speech recognition model and a language understanding model, respectively.

The server 200 may further include a storage unit 270. The storage unit 270 stores programs and data necessary for the operation of the server 200. The storage unit 270 may consist of a volatile storage medium, a nonvolatile storage medium, or a combination of both. Volatile storage media may include semiconductor memory such as RAM, DRAM, and SRAM, and nonvolatile storage media may include a hard disk and flash NAND memory. According to the disclosed embodiment, a general model 271 may be stored in the storage unit 270. The general model 271 is a generic speech processing model intended for processing the speech of unspecified persons rather than a specific individual. The large-capacity general model 271, combined with the high computing power of the server, provides high speech processing performance for the user's diverse linguistic expressions (large vocabulary). Therefore, according to the disclosed embodiment, no personal information of any kind is stored in the storage unit 270, so high speech processing performance can be provided while fundamentally protecting personal information.

The operations of the terminal 100 and the server 200 are described in more detail below.

FIG. 5 is a flowchart illustrating a speech signal processing method of a terminal according to the disclosed embodiment.

First, in step 510, the terminal 100 receives a speech signal. The terminal 100 may receive the speech signal through various components. Receiving it through a microphone unit is the most common case, but it may also be received through a USB interface unit, a DVD interface unit, or the like, or through communication with an external device. According to the disclosed embodiment, the terminal 100 may transmit the received speech signal to the server 200.

Then, in step 520, the terminal 100 detects a personalized information section containing personal information in the speech signal. A personalized information section is a section of the speech signal that contains personal information capable of directly or indirectly identifying an individual. For example, the terminal 100 may refer to the personal information 172 stored in the storage unit 170 of FIG. 3 and detect, as a personalized information section, a section of the speech signal containing a name stored in the phone book of the user terminal, the user's search history, the user's location information, or the like. This is described with reference to FIG. 7.

FIG. 7 is a diagram illustrating a personalized information section and a general information section.

Referring to FIG. 7, the Korean sentence '홍길동씨 10층 김길동씨에게 전화 부탁드립니다' ('Mr. Hong Gil-dong, please call Mr. Kim Gil-dong on the 10th floor') can be divided into nine sections: 홍길동 (Hong Gil-dong) 701, 씨 (Mr.) 702, 10 (ten) 703, 층 (floor) 704, 김길동 (Kim Gil-dong) 705, 씨 (Mr.) 706, 에게 (to) 707, 전화 (call) 708, and 부탁드립니다 (please) 709. The criteria for dividing a sentence into sections may be applied differently depending on the situation.

Here, assuming that 홍길동 (Hong Gil-dong) 701 is a word referring to the user and 김길동 (Kim Gil-dong) 705 is a name stored in the phone book of the user terminal, the speech signal sections corresponding to 홍길동 701 and 김길동 705 belong to the personalized information section 710. That is, 홍길동 701 and 김길동 705 are personal information, and the sections 701 and 705 containing this personal information form the personalized information section 710. In step 520, the terminal 100 detects the personalized information section 710 containing personal information in the speech signal in this way.
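
The FIG. 7 example can be reproduced in a few lines of Python. The sketch assumes, as the text does, that 홍길동 is the user's name and that 김길동 is in the phone book; the extra phone-book entry 이몽룡 and the function `split_by_personal_info` are hypothetical, and exact string matching on already-segmented sections is a simplification of real detection.

```python
# The nine sections of the FIG. 7 sentence, keyed by their reference numerals.
SECTIONS = {
    701: "홍길동", 702: "씨", 703: "10", 704: "층", 705: "김길동",
    706: "씨", 707: "에게", 708: "전화", 709: "부탁드립니다",
}


def split_by_personal_info(
    sections: dict[int, str], user_name: str, phone_book: set[str]
) -> tuple[list[int], list[int]]:
    """Return (personalized, general) reference numerals, mirroring sections 710 and 720."""
    personal_terms = phone_book | {user_name}
    personalized = [ref for ref, word in sections.items() if word in personal_terms]
    general = [ref for ref, word in sections.items() if word not in personal_terms]
    return personalized, general


# Hypothetical phone book: 김길동 as in the example, plus one unrelated entry.
personalized, general = split_by_personal_info(SECTIONS, "홍길동", {"김길동", "이몽룡"})
```

Here `personalized` comes out as the numerals of section 710 (701 and 705) and `general` as those of section 720.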

The terminal 100 may detect the personalized information section in various ways. For example, a word not included in a dictionary may be judged to be personal information, and the speech section containing that word may be judged to be a personalized information section. However, this is merely one example, and various other methods of detecting the personalized information section may be used.
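
The out-of-dictionary heuristic described above can be sketched as follows. The function name `detect_personalized_sections`, the word-level input, and the choice to merge consecutive unknown words into one section are assumptions made for illustration; the general dictionary is simply a set of known words here.

```python
def detect_personalized_sections(
    words: list[str], general_dictionary: set[str]
) -> list[tuple[int, int]]:
    """Return [start, end) word-index ranges whose words are absent from the general dictionary.

    Consecutive out-of-dictionary words are merged into a single section.
    """
    sections: list[tuple[int, int]] = []
    for i, word in enumerate(words):
        if word in general_dictionary:
            continue  # a known word belongs to the general information section
        if sections and sections[-1][1] == i:
            sections[-1] = (sections[-1][0], i + 1)  # extend the current run
        else:
            sections.append((i, i + 1))
    return sections
```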

In addition, the terminal 100 may use the personalized information section detected as above to generate speech section information on the personalized information section and the general information section and transmit it to the server 200. Here, the speech section information may include section marking information that marks at least one of the personalized information section and the general information section in the speech signal. That is, the terminal 100 may mark the personalized information section and/or the general information section of the speech signal so that each can be identified as such, and generate and transmit speech section information including this section marking information. The speech section information transmitted to the server 200 may be used by the server 200 to identify and/or process the general information section of the speech signal. The server 200 can identify the sections marked as general information sections, or the sections that remain after excluding those marked as personalized information sections, and process the corresponding speech signal.
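
When the terminal marks only the personalized sections, the server can recover the general sections as their complement over the duration of the signal. The sketch below assumes marks given as (start, end) times in seconds; the function name `general_sections_from_marks` is hypothetical, and overlapping, unsorted, or out-of-range marks are handled defensively.

```python
def general_sections_from_marks(
    personalized: list[tuple[float, float]], total_duration: float
) -> list[tuple[float, float]]:
    """Complement of the personalized intervals within [0, total_duration).

    Intervals are (start, end) in seconds; marks are sorted and clipped to the signal first.
    """
    general: list[tuple[float, float]] = []
    cursor = 0.0  # end of the region already accounted for
    for start, end in sorted(personalized):
        start = min(max(start, 0.0), total_duration)
        end = min(max(end, 0.0), total_duration)
        if start > cursor:
            general.append((cursor, start))  # gap before this mark is general
        cursor = max(cursor, end)
    if cursor < total_duration:
        general.append((cursor, total_duration))
    return general
```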

Returning to FIG. 5, in step 530, the terminal 100 processes the speech signal corresponding to the personalized information section using the personalized model 171 generated based on personal information. The personalized model 171 is a speech processing model that reflects individual characteristics, that is, a speech processing model customized for a specific individual. Processing the speech signal with the personalized model 171 enables more accurate speech processing than the general model 271, which is built for an unspecified large number of people. In addition, because the terminal 100 itself processes the speech signal corresponding to the personalized section using the personalized model 171, the personalized model 171, which contains personal information, need not be transmitted to an external device such as a server, so personal information can be fundamentally protected.

Furthermore, in step 540, the terminal 100 receives from the server 200 the result of processing the speech signal corresponding to the general information section, that is, the section other than the personalized information section. A general information section is a section of the speech signal that contains general information rather than personal information, that is, the remainder excluding the personalized information section. For example, a section containing a sentence with a general meaning, such as 'Call me' or 'Did you eat?', may correspond to a general information section. This is described with reference to FIG. 7.

As described above, in FIG. 7 the sentence '홍길동씨 10층 김길동씨에게 전화 부탁드립니다' ('Mr. Hong Gil-dong, please call Mr. Kim Gil-dong on the 10th floor') can be divided into nine sections.

Of these, '씨' (702, the honorific 'Mr.'), '10' (703), '층' ('floor', 704), '씨' (706), '에게' ('to', 707), '전화' ('call', 708), and '부탁드립니다' ('please', 709) are common words, so the sections 702, 703, 704, 706, 707, 708, and 709 that carry this general information belong to the general information section 720.

In step 540, the terminal 100 does not itself process the voice signal corresponding to the general information section 720; instead, it receives the processing result from the server 200, which has greater computing power than the terminal 100. The voice signal corresponding to the general information section 720 can therefore be processed quickly.

The result received from the server 200 for the general information section may be obtained by the server 200 processing the corresponding voice signal with the general model 271. Combined with the high computing power of the server 200, the general model 271 can deliver strong speech processing performance across the wide range of expressions users produce (large vocabulary). In addition, because the server 200 has high computing power, it can process the general information section quickly.

The terminal 100 may also receive, from the server 200, voice section information about the personalized information section and the general information section. Here, the voice section information may include section marking information that marks at least one of the personalized information section and the general information section within the voice signal. The terminal 100 may also detect the personalized information section and generate voice section information on its own. However, because the terminal 100 and the server 200 each process the personalized and/or general information sections that they themselves detected, a section may be left unprocessed if the personalized information section detected by the terminal 100 differs from the one detected by the server 200. To ensure that every section is processed, the terminal 100 and the server 200 may therefore share their voice section information on the personalized and general information sections.

The terminal 100 may then output the data processing result to the user.

Consequently, according to the disclosed embodiment, the personal information 172 and the personalization model 171 remain on the terminal 100 without being transmitted to the server 200, and the terminal 100 uses them for speech processing. This makes it possible to build a speech processing system that protects personal information at the source while achieving higher performance and processing speed.

FIG. 6 is a flowchart illustrating a method by which a server processes a voice signal, according to the disclosed embodiment.

First, in step 610 the server 200 receives a voice signal. The server 200 can receive the voice signal through various components. Receiving it from the terminal 100 is likely the most common case, but the voice signal may also be received through a USB interface unit, a DVD interface unit, or the like.

Then, in step 620, the server 200 detects, within the received voice signal, a personalized information section containing personal information. The server 200 may detect it by analyzing the received voice signal, or by receiving and parsing voice section information from the terminal 100. Various detection methods can be used. For example, a word not contained in the dictionary may be judged to be personal information, and the voice section containing that word may be judged to be a personalized information section. This is only one example, however, and various other methods of detecting the personalized information section may be used.
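The dictionary-based example above can be sketched as follows. This is a minimal illustration only: it assumes the voice signal has already been decoded into word-level tokens, one per section as in FIG. 7, and the names `Section` and `detect_personal_sections` are hypothetical rather than part of the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Section:
    index: int       # position of the section in the utterance
    token: str       # first-pass token recognized for this section
    personal: bool   # True if treated as a personalized information section


def detect_personal_sections(tokens: Iterable[str], dictionary: Set[str]) -> List[Section]:
    """Flag every token missing from the general dictionary as personal information."""
    return [Section(i, tok, tok not in dictionary) for i, tok in enumerate(tokens)]


# Sentence of FIG. 7, split into its nine sections (701 to 709).
tokens = ["홍길동", "씨", "10", "층", "김길동", "씨", "에게", "전화", "부탁드립니다"]
dictionary = {"씨", "10", "층", "에게", "전화", "부탁드립니다"}  # hypothetical general dictionary
sections = detect_personal_sections(tokens, dictionary)
print([s.token for s in sections if s.personal])  # ['홍길동', '김길동']
```

The two names, which a general dictionary would not contain, are the only sections flagged, matching the split between the names and the general information section 720 in FIG. 7.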

The personalized information section and the general information section were described above with reference to FIG. 7, so the description is not repeated here.

The server 200 may also use the detected personalized information section to generate voice section information about the personalized and general information sections and transmit it to the terminal 100. Here, the voice section information may include section marking information that marks at least one of the personalized information section and the general information section within the voice signal. That is, the server 200 may mark the personalized information section and/or the general information section of the voice signal to indicate which type each section is, and then generate and transmit voice section information containing this section marking information. The voice section information sent to the terminal 100 can be used by the terminal 100 to identify and/or process the personalized information section of the voice signal. The terminal 100 can identify either the sections marked as personalized information sections or the sections left over after excluding those marked as general information sections, and process the corresponding voice signal.
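One way to picture section marking information, and the terminal's step of finding the sections left over after excluding those marked as general, is as intervals over the voice signal. The sketch below assumes marks are half-open frame ranges `(start, end)`; this representation and the function name `unmarked_intervals` are illustrative assumptions, not a format defined in the disclosure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open frame range [start, end) within the voice signal


def unmarked_intervals(total: Interval, marked: List[Interval]) -> List[Interval]:
    """Return the parts of `total` that no marked interval covers."""
    start, end = total
    gaps: List[Interval] = []
    cursor = start
    for s, e in sorted(marked):
        if s > cursor:
            gaps.append((cursor, min(s, end)))  # uncovered stretch before this mark
        cursor = max(cursor, e)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))  # uncovered tail after the last mark
    return gaps


# Frames 0-300 of an utterance; the server marked two spans as general information.
general_marks = [(40, 120), (180, 300)]
print(unmarked_intervals((0, 300), general_marks))  # [(0, 40), (120, 180)]
```

Sorting the marks first lets overlapping or out-of-order marks be handled in a single pass.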

Next, in step 630 the server 200 uses the general model 271 to process the voice signal corresponding to the general information section, i.e., the part of the voice signal outside the personalized information section. Combined with the server's high computing power, the general model 271 can deliver strong speech processing performance across the wide range of expressions users produce (large vocabulary). Furthermore, because the server 200 has high computing power, it can process the general information section quickly. The voice signal corresponding to the personalized information section, which contains personal information, is not processed by the server 200; it is processed by the terminal 100, which holds the personalization model 171, so the personal information is protected at the source.

Then, in step 640, the server 200 transmits the result of processing the voice signal corresponding to the general information section to the terminal 100.

The server 200 may also receive, from the terminal 100, voice section information about the personalized information section and the general information section. Here, the voice section information may include section marking information that marks at least one of the personalized information section and the general information section within the voice signal. The server 200 may also detect the personalized information section and generate voice section information on its own. However, because the terminal 100 and the server 200 each process the personalized and/or general information sections that they themselves detected, a section may be left unprocessed if the personalized information section detected by the terminal 100 differs from the one detected by the server 200. To ensure that every section is processed, the terminal 100 and the server 200 may therefore share their voice section information on the personalized and general information sections.

Consequently, according to the disclosed embodiment, a speech processing system can be built that protects personal information at the source while achieving higher performance and processing speed.

FIG. 8 is a flowchart illustrating an example of the detailed operations of a terminal and a server according to the disclosed embodiment.

First, in step 805 the terminal 100 receives a voice signal. As described above, the terminal 100 can receive the voice signal through various components. Receiving it through a microphone unit is likely the most common case, but it may also be received through a USB interface unit, a DVD interface unit, or the like, or through communication with an external device. Then, in step 810, the terminal 100 may transmit the received voice signal to the server 200.

In steps 815 and 820, the terminal 100 and the server 200 respectively detect personalized voice sections in the voice signal. Both can use various methods to detect the personalized information section. For example, a word not contained in the dictionary may be judged to be personal information, and the voice section containing that word may be judged to be a personalized information section. This is only one example, however, and various other methods of detecting the personalized information section may be used.

For the personalized voice section detected in step 815, the terminal 100 proceeds to step 825 and processes the voice data based on the personalization model 171; for the general information section outside the personalized voice section, it proceeds to step 830, passes on processing, and generates voice section information. Conversely, for the personalized voice section detected in step 820, the server 200 proceeds to step 835, passes on processing, and generates voice section information; for the general information section outside the personalized voice section, it proceeds to step 840 and processes the voice data based on the general model 271.

Through this process, the terminal 100 directly processes the voice signal corresponding to the personalized information section with the personalization model 171, while the server 200, drawing on its high computing power, processes the voice signal corresponding to the general information section with the general model 271. This achieves high speech processing performance while protecting personal information at the source.
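The division of labor in FIG. 8, where each side processes its own type of section and passes on the rest while recording which sections it passed, can be sketched as one function used by both sides. The per-section boolean labels and the `model` callback are simplifying assumptions for illustration, and `run_side` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple


def run_side(labels: List[bool], handles_personal: bool,
             model: Callable[[int], str]) -> Tuple[Dict[int, str], List[int]]:
    """Process the sections this side is responsible for and pass on the rest.

    labels[i] is True when section i was detected as a personalized section.
    The terminal calls this with handles_personal=True (personalization model),
    the server with handles_personal=False (general model).
    Returns the results of the processed sections and the indices of the
    passed sections, which feed this side's voice section information.
    """
    results: Dict[int, str] = {}
    passed: List[int] = []
    for i, is_personal in enumerate(labels):
        if is_personal == handles_personal:
            results[i] = model(i)
        else:
            passed.append(i)
    return results, passed


labels = [True, False, False, False, True, False, False, False, False]  # FIG. 7 sentence
terminal_results, terminal_passed = run_side(labels, True, lambda i: f"personal:{i}")
server_results, server_passed = run_side(labels, False, lambda i: f"general:{i}")
print(sorted(terminal_results), server_passed)  # [0, 4] [0, 4]
```

When both sides agree on the labels, their processed sections are disjoint and together cover the whole utterance.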

Then, in steps 845 and 850, the terminal 100 and the server 200 share the voice section information and the data processing results. That is, in step 845 the server 200 transmits its voice data processing result and voice section information to the terminal 100, and the terminal 100 transmits its voice section information to the server 200. Although FIG. 8 shows this exchange as steps 845 and 850, either the terminal 100 or the server 200 may transmit its data first during sharing.

As described above, the terminal 100 and the server 200 may each detect the personalized information section and generate voice section information on their own. However, because each processes the personalized and/or general information sections that it detected itself, a section may be left unprocessed if the personalized information sections detected by the terminal 100 and the server 200 differ. To ensure that every section is processed, the terminal 100 and the server 200 may therefore share voice section information on the personalized and general information sections.

In step 855, the terminal 100 determines whether data processing has been completed for all sections of the voice signal and, if so, ends the operation. If not, the terminal 100 proceeds to step 865 and, based on the voice section information received from the server 200, processes the still-unprocessed voice sections using the personalization model 171. Likewise, in step 860 the server 200 determines whether data processing has been completed for all sections of the voice signal and, if so, ends the operation. If not, the server 200 proceeds to step 870 and, based on the voice section information received from the terminal 100, processes the still-unprocessed voice sections using the general model 271.
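The gap that motivates sharing voice section information, and that steps 855 to 870 close, can be shown with a small sketch. It assumes, for illustration, that the terminal and the server each label sections independently with booleans (True = personalized), that the terminal processes its personalized sections, and that the server processes its general ones; the function names are hypothetical.

```python
from typing import List, Set


def processed_by(labels: List[bool], handles_personal: bool) -> Set[int]:
    """Indices a side processes on its own, given its own detection labels."""
    return {i for i, is_personal in enumerate(labels) if is_personal == handles_personal}


def unprocessed(terminal_labels: List[bool], server_labels: List[bool]) -> List[int]:
    """Sections neither side processed, found once voice section information is shared."""
    done = processed_by(terminal_labels, True) | processed_by(server_labels, False)
    return sorted(set(range(len(terminal_labels))) - done)


# Detection disagrees on section 4 (김길동): only the server flags it as personal.
terminal_labels = [True, False, False, False, False, False, False, False, False]
server_labels = [True, False, False, False, True, False, False, False, False]
print(unprocessed(terminal_labels, server_labels))  # [4]
```

Section 4 is passed by both sides, so it is exactly the kind of leftover that the completion checks of steps 855 and 860 detect after the exchange.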

As shown in FIG. 8, the speech processing of steps 815 to 870 may be performed simultaneously and in parallel by the terminal 100 and the server 200, or the terminal 100 and the server 200 may switch back and forth, taking turns to perform the processing.

When the terminal 100 and the server 200 take turns, the terminal 100 processes the voice signal corresponding to a personalized information section based on the personalization model 171; when it reaches a voice signal corresponding to a general information section, it transmits the voice section information to the server 200, and the server 200 may then process, using the general model 271, the voice signal that follows the portion already processed by the terminal 100. Later, when a voice signal corresponding to a personalized information section appears again, the server 200 transmits its processing results so far and the voice section information to the terminal 100, and the terminal 100 may process, using the personalization model 171, the voice signal that follows the portion already processed by the server 200.

Conversely, when the server 200 starts the processing, it processes the voice signal corresponding to a general information section using the general model 271; when it reaches a voice signal corresponding to a personalized information section, it transmits its processing results so far and the voice section information to the terminal 100, and the terminal 100 may process, using the personalization model 171, the voice signal that follows the portion processed by the server 200.
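When the two sides take turns, each boundary between a run of personalized sections and a run of general sections is a handoff point. A minimal sketch, assuming per-section boolean labels and using the hypothetical name `handoff_schedule`, computes which side owns each run:

```python
from itertools import groupby
from typing import List, Tuple


def handoff_schedule(labels: List[bool]) -> List[Tuple[str, int, int]]:
    """Group consecutive sections into runs and assign each run to a side.

    Personalized runs go to the terminal, general runs to the server; each
    boundary between runs corresponds to one handoff of processing results
    and voice section information.
    """
    schedule = []
    pos = 0
    for is_personal, group in groupby(labels):
        length = sum(1 for _ in group)
        owner = "terminal" if is_personal else "server"
        schedule.append((owner, pos, pos + length - 1))  # inclusive section range
        pos += length
    return schedule


labels = [True, False, False, False, True, False, False, False, False]  # FIG. 7 sentence
print(handoff_schedule(labels))
# [('terminal', 0, 0), ('server', 1, 3), ('terminal', 4, 4), ('server', 5, 8)]
```

For the FIG. 7 sentence this yields three handoffs; which side starts depends only on the type of the first section.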

According to the disclosed embodiment, a speech processing system can be built that protects personal information at the source while achieving higher performance and processing speed.

FIG. 9 is a flowchart illustrating a speech processing method of a terminal according to another disclosed embodiment.

First, in step 910, the terminal 100 receives a voice signal. The terminal 100 can receive the voice signal through various components. Receiving it through a microphone unit is likely the most common case, but it may also be received through a USB interface unit, a DVD interface unit, or the like, or through communication with an external device. According to the disclosed embodiment, the terminal 100 may transmit the received voice signal to the server 200.

Then, in step 920, the terminal 100 receives from the server 200 the result of processing the voice signal with the general model 271. That is, the terminal 100 receives the result of data processing that the server 200 performed on the voice signal independently of the terminal 100. Combined with the high computing power of the server 200, the general model 271 can deliver strong speech processing performance across the wide range of expressions users produce (large vocabulary). Furthermore, because the server 200 has high computing power, it can process data quickly.

Next, in step 930, the terminal 100 processes the voice signal using the personalization model 171, generated based on personal information, together with the data processing result received from the server 200. According to the disclosed embodiment, the terminal 100 can use the personalization model 171 and the server's result to process all or part of the voice signal once more. As described above, the server 200 processes the signal with the general model 271; the terminal 100 then processes it again with the personalization model 171 to achieve higher speech processing performance.

In this case, the terminal 100 may process the voice signal corresponding to the personalized information section, which contains personal information. That is, the server 200 uses its high computing power to process the entire voice signal with the general model 271, without distinguishing between general and personalized information sections. Then, for the personalized information section, where the personal information 172 can improve processing performance, the terminal 100 repeats the data processing with the personalization model. To this end, the terminal 100 can detect the personalized information section in the voice signal, using various methods. For example, a word not contained in the dictionary may be judged to be personal information, and the voice section containing that word may be judged to be a personalized information section. This is only one example, however, and various other methods of detecting the personalized information section may be used.
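The second pass of FIG. 9, keeping the server's general-model output for general sections and re-processing only personalized sections on the terminal, can be sketched as below. The token-per-section representation and the `personal_model` callback are illustrative assumptions; the lookup table in the example is a hypothetical stand-in for a real personalization model.

```python
from typing import Callable, List


def refine_personal_sections(server_tokens: List[str], personal_flags: List[bool],
                             personal_model: Callable[[int, str], str]) -> List[str]:
    """Keep the server's result for general sections; re-process personalized ones."""
    return [personal_model(i, tok) if is_personal else tok
            for i, (tok, is_personal) in enumerate(zip(server_tokens, personal_flags))]


# Hypothetical: the general model misrecognized a contact name that only the terminal knows.
contacts_fix = {"김길중": "김길동"}
server_tokens = ["홍길동", "씨", "10", "층", "김길중", "씨", "에게", "전화", "부탁드립니다"]
flags = [True, False, False, False, True, False, False, False, False]
refined = refine_personal_sections(server_tokens, flags, lambda i, tok: contacts_fix.get(tok, tok))
print(refined[4])  # 김길동
```

Only sections 0 and 4 are touched on the terminal, so its extra work scales with the number of personalized sections rather than the length of the utterance.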

Alternatively, the terminal 100 may receive voice section information about the personalized and general information sections from the server 200. Receiving voice section information generated by the server 200, which has high computing power, reduces the processing load on the terminal 100 and thus speeds up overall speech processing. Here, the voice section information may include section marking information that marks at least one of the personalized information section and the general information section within the voice signal.

The personalized information section and the general information section were described above with reference to FIG. 7, so the description is not repeated here.

Here, the personalization model 171 may be at least one of a personalized speech recognition model, a personalized natural language understanding model, and a personalized lexical model.

When the terminal 100 processes the voice signal using the personalized speech recognition model, it may receive from the server 200 data processing results in units of phonemes, pseudo-morphemes, or words, and may also receive multi-pass processing results, such as an N-best hypothesis list, a lattice, or a confusion network, for use in its processing.
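As one illustration of how a terminal might use an N-best list from the server, the sketch below adds a fixed bonus for each word in the user's personal vocabulary to each hypothesis score and picks the best hypothesis. This biasing scheme, the `bonus` value, and the scores in the example are assumptions for illustration; the disclosure does not specify how multi-pass results are combined with the personalized model.

```python
from typing import List, Set, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, log-probability score from the general model)


def rescore_nbest(nbest: List[Hypothesis], personal_words: Set[str],
                  bonus: float = 2.0) -> List[str]:
    """Return the hypothesis with the highest score after a personal-vocabulary bonus."""
    def biased_score(hyp: Hypothesis) -> float:
        tokens, log_score = hyp
        return log_score + bonus * sum(tok in personal_words for tok in tokens)

    best_tokens, _ = max(nbest, key=biased_score)
    return best_tokens


nbest = [
    (["김길중", "씨", "에게", "전화"], -10.0),  # general model's top hypothesis
    (["김길동", "씨", "에게", "전화"], -10.5),  # runner-up containing a contact name
]
print(rescore_nbest(nbest, {"홍길동", "김길동"}))  # ['김길동', '씨', '에게', '전화']
```

With a small enough bonus the general model's top hypothesis wins again, so the bonus acts as a tunable trade-off between the two models.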

When the terminal 100 processes the voice signal using the personalized natural language understanding model, it may receive from the server 200 data processing results in units of phonemes, pseudo-morphemes, or words, together with information such as sentence- or word-level confidence scores, for use in its processing. It may also receive multi-pass processing results from the server 200 for use in its processing.

When the terminal 100 processes the voice signal using the personalized lexical model, it may likewise receive from the server 200 data processing results in units of phonemes, pseudo-morphemes, or words, together with information such as sentence- or word-level confidence scores, for use in its processing. The terminal 100 may also process the data using the result received from the server 200 together with a personal word list; in this case, a pronunciation dictionary may be used to compare a hypothesis with the personal words at the phoneme level.
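The phoneme-level comparison mentioned above can be illustrated with an edit distance over pronunciations. This sketch assumes a toy pronunciation dictionary with romanized phonemes and a distance threshold of 2; the lexicon entries, the threshold, and the function names are hypothetical, chosen only to show the mechanism.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct_word(hyp: str, personal_words: List[str],
                 lexicon: Dict[str, List[str]], max_dist: int = 2) -> str:
    """Replace `hyp` with the phonetically closest personal word if it is close enough."""
    best = min(personal_words, key=lambda w: edit_distance(lexicon[hyp], lexicon[w]))
    return best if edit_distance(lexicon[hyp], lexicon[best]) <= max_dist else hyp


lexicon = {  # toy pronunciation dictionary (romanized phonemes)
    "김길중": ["k", "i", "m", "g", "i", "l", "j", "u", "ng"],
    "김길동": ["k", "i", "m", "g", "i", "l", "d", "o", "ng"],
    "홍길동": ["h", "o", "ng", "g", "i", "l", "d", "o", "ng"],
}
print(correct_word("김길중", ["김길동", "홍길동"], lexicon))  # 김길동
```

Comparing pronunciations rather than spellings lets an acoustically close misrecognition be mapped back to the personal word even when the written forms differ.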

Whichever model is used, the personal information enables more accurate speech processing without ever being transmitted to the server 200.

According to the disclosed embodiment, speech processing with the personalization model is performed only on the terminal 100, so no information containing personal information is transmitted in any form between the server 200 and the terminal 100, and consequently no personal information is stored on the server 200. A speech processing system can therefore be built that protects personal information at the source while achieving higher performance and processing speed.

FIG. 10 is a flowchart illustrating a speech processing method of a server according to another disclosed embodiment.

First, in step 1010 the server 200 receives a voice signal. The server 200 can receive the voice signal through various components. Receiving it from the terminal 100 is likely the most common case, but the voice signal may also be received through a USB interface unit, a DVD interface unit, or the like.

Then, in step 1020, the server 200 processes the received voice signal using the general model 271. That is, the server 200 uses the general model 271 to process the voice signal independently of the terminal 100. Combined with the high computing power of the server 200, the general model 271 can deliver strong speech processing performance across the wide range of expressions users produce (large vocabulary). Furthermore, because the server 200 has high computing power, it can process data quickly.

Next, in step 1030, the server 200 transmits the data processing result to the terminal 100. In this case, the server 200 may also generate voice section information about the personalized and general information sections and transmit it to the terminal 100 along with the result. Generating the voice section information on the server 200, which has high computing power, and sending it to the terminal 100 reduces the processing load on the terminal 100 and thus speeds up overall speech processing. Here, the voice section information may include section marking information that marks at least one of the personalized information section and the general information section within the voice signal.
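Step 1030 can be pictured as a single message carrying both the general-model result and the server-generated voice section information. The JSON layout, field names, and millisecond units below are assumptions made for illustration; the disclosure does not define a transport format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SectionMark:
    start_ms: int   # section start within the voice signal
    end_ms: int     # section end (exclusive)
    kind: str       # "personal" or "general"


@dataclass
class ServerResponse:
    transcript: List[str]                                      # general-model result
    sections: List[SectionMark] = field(default_factory=list)  # voice section information


def encode(resp: ServerResponse) -> str:
    """Serialize the response; ensure_ascii=False keeps Hangul readable."""
    return json.dumps(asdict(resp), ensure_ascii=False)


def decode(payload: str) -> ServerResponse:
    """Rebuild the dataclasses from the JSON payload."""
    raw = json.loads(payload)
    return ServerResponse(raw["transcript"], [SectionMark(**s) for s in raw["sections"]])


resp = ServerResponse(["전화", "부탁드립니다"],
                      [SectionMark(0, 420, "personal"), SectionMark(420, 900, "general")])
print(decode(encode(resp)) == resp)  # True
```

Bundling the marks with the result lets the terminal locate the sections it must handle without running its own detector.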

FIG. 11 is a flowchart illustrating an example of a detailed operation procedure of a terminal and a server according to another disclosed embodiment.

First, in step 1110, the terminal 100 receives a voice signal. As described above, the terminal 100 can receive the voice signal through various components. The most common case is receiving it through a microphone unit, but it may also be received through a USB interface unit, a DVD interface unit, or the like, or through communication with an external device. Then, in step 1120, the terminal 100 may transmit the received voice signal to the server 200.

In step 1130, the server 200 processes the received voice signal using the general model 271. Combined with the high computing power of the server 200, the general model 271 can deliver high speech processing performance across the user's wide range of language expressions (large vocabulary). Furthermore, because the server 200 has high computing power, it can process data quickly.

At this time, the server 200 may generate voice section information for the personalized information section and the general information section. Generating the voice section information on the server 200, which has high computing power, reduces the data processing burden on the terminal 100 and thereby speeds up overall voice processing.

Then, in step 1140, the server 200 may transmit the data processing result and the voice section information to the terminal 100. In step 1150, the terminal 100 may process the voice signal using the personalization model 171, generated based on personal information, together with the received data processing result. That is, the terminal 100 may process all or part of the voice signal a second time, using the personalization model 171 to achieve higher speech processing performance.

In this case, the terminal 100 may detect the personalized information section and process the voice signal corresponding to that section. The terminal 100 may also process the voice signal using at least one of a personalized speech recognition model, a language understanding model, and a personalized lexical model.
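The second pass in step 1150 can be sketched as a merge: keep the server's general-model text for general sections and re-decode personalized sections locally. This is a minimal illustration under assumptions: the `(label, server_text, audio_chunk)` tuple layout and the `personal_decode` callback are hypothetical, and `fake_personal_decode` merely stands in for a real personalized recognizer.

```python
from typing import Callable, List, Tuple

# One entry per marked section, in time order: (label, server_text, audio_chunk).
# This tuple layout is an illustrative assumption, not a format from the patent.
SectionResult = Tuple[str, str, bytes]


def refine_with_personal_model(
    results: List[SectionResult],
    personal_decode: Callable[[bytes, str], str],
) -> str:
    """Keep the server's text for general sections and re-decode personalized
    sections on the terminal, passing the server's hypothesis as a hint."""
    words: List[str] = []
    for label, server_text, chunk in results:
        if label == "personalized":
            # Second pass on the terminal with the personalized model.
            words.append(personal_decode(chunk, server_text))
        else:
            # General sections: accept the general-model result from the server.
            words.append(server_text)
    return " ".join(words)


# Hypothetical stand-in for a personalized recognizer running on the terminal:
# it recognizes one known chunk and otherwise falls back to the server's hint.
def fake_personal_decode(chunk: bytes, hint: str) -> str:
    return "홍길동" if chunk == b"\x01" else hint
```

Passing the server hypothesis as a hint reflects the text's statement that the terminal uses both the personalization model and the received result; how the two are combined in practice is left open.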

FIG. 12 is a block diagram illustrating the internal configuration of a terminal according to yet another disclosed embodiment.

Referring to FIG. 12, a terminal 1200 according to yet another disclosed embodiment includes a communication unit 1210 and a control unit 1230.

The communication unit 1210 communicates with external devices. It may be connected to a network by wire or wirelessly to communicate with an external device. According to the disclosed embodiment, the communication unit 1210 communicates with the server and can transmit and receive data. For example, the communication unit 1210 may include a short-range communication module, a mobile communication module, a wireless Internet module, a wired Internet module, and the like. The communication unit 1210 may also include one or more components.

The control unit 1230 controls the overall operation of the terminal 1200 and can process an audio signal by controlling the communication unit 1210. The control unit 1230 may include a RAM that stores signals or data input from outside the terminal 1200 or serves as a storage area for the various tasks performed by the electronic device, a ROM that stores a control program for controlling peripheral devices, and a processor. The processor may be implemented as a system on chip (SoC) integrating a core (not shown) and a GPU (not shown). The processor may also include a plurality of processors.

The control unit 1230 according to the disclosed embodiment generates a mapping table by associating IDs with personal information, generates an ID-based personalization model using the mapping table, transmits the ID-based personalization model to the server through the communication unit 1210, receives from the server the result of processing a voice signal with the ID-based personalization model, and restores the personal information corresponding to each ID using the data processing result and the mapping table. The control unit 1230 may also generate the mapping table by associating IDs with additional information generated from the personal information. In that case, the control unit 1230 receives from the server the result of data processing performed on the additional information with the ID-based personalization model, and restores the additional information corresponding to each ID using the data processing result and the mapping table.

When generating the ID-based personalization model using the mapping table, the control unit 1230 may represent the ID mapped to the personal information with acoustic unit IDs, that is, IDs mapped to acoustic units. Here, an acoustic unit ID may be an ID mapped to an acoustic unit by agreement with the server. An acoustic unit ID denotes a specific part of the speech recognition model corresponding to a phonetic symbol, and acoustic unit IDs need not map one-to-one to phonetic symbols.

According to the disclosed embodiments, the personalization model is generated with the personal information, and the additional information generated from it, masked by IDs that the terminal assigns itself. Even if the personalization model is exposed externally, the personal information masked by the IDs is difficult to recover, so the personal information is protected. In addition, transmitting the personalization model to a server with high computing power and processing the voice signal there makes it possible to implement a speech processing system with higher performance and processing speed.

FIG. 13 is a block diagram showing the internal configuration of a server according to yet another disclosed embodiment in more detail.

Referring to FIG. 13, the server 1300 according to the disclosed embodiment includes a receiving unit 1310, a communication unit 1330, and a control unit 1350.

The receiving unit 1310 receives voice signals. It may include components capable of receiving voice signals in various forms, such as a USB interface unit and a DVD interface unit. For example, when the receiving unit 1310 includes a USB interface, the server 1300 can receive a voice signal file from a USB device. Furthermore, when a voice signal is received from an external device through the communication unit 1330, the communication unit 1330 may serve as the receiving unit 1310.

The communication unit 1330 communicates with external devices. It may be connected to a network by wire or wirelessly to communicate with an external device. According to the disclosed embodiment, the communication unit 1330 can communicate with the terminal 1200 to transmit and receive data. For example, the communication unit 1330 may include a short-range communication module, a mobile communication module, a wireless Internet module, a wired Internet module, and the like. The communication unit 1330 may also include one or more components.

The control unit 1350 controls the overall operation of the server 1300 and can process a voice signal by controlling the receiving unit 1310 and the communication unit 1330. The control unit 1350 may include a RAM that stores signals or data input from outside the server 1300 or serves as a storage area for the various tasks performed by the electronic device, a ROM that stores a control program for controlling peripheral devices, and a processor. The processor may be implemented as a system on chip (SoC) integrating a core (not shown) and a GPU (not shown). The processor may also include a plurality of processors.

The control unit 1350 according to the disclosed embodiment receives the ID-based personalization model from the terminal 1200 through the communication unit 1330, receives a voice signal through the receiving unit 1310, processes the voice signal using the ID-based personalization model, and transmits the data processing result to the terminal 1200 through the communication unit 1330.

In addition, when processing the voice signal with the ID-based personalization model, the control unit 1350 may represent the IDs mapped to personal information using acoustic unit IDs, that is, IDs mapped to acoustic units by agreement with the terminal 1200.

According to the disclosed embodiment, the server 1300 receives the ID-based personalization model from the terminal 1200 and processes data using it. Even if the personalization model is exposed externally, the personal information 1272 masked by IDs is difficult to recover, so the personal information 1272 is protected. In addition, because the server 1300, which has high computing power, processes the voice signal, a speech processing system with higher performance and processing speed can be implemented.

FIG. 14 is a block diagram showing the internal configuration of a terminal according to yet another disclosed embodiment in more detail. Descriptions of components that overlap with FIG. 12 are omitted.

The control unit 1230 may include a personalization model generation unit 1231. According to the disclosed embodiment, the personalization model generation unit 1231 can generate a personalization model based on the personal information 1272 stored in the storage unit 1270. The personalization model generation unit 1231 generates the mapping table 1273 by associating IDs with the personal information 1272 or with additional information generated from the personal information 1272, and can generate the ID-based personalization model using the generated mapping table 1273.

The terminal 1200 may further include a receiving unit 1250. The receiving unit 1250 receives voice signals and may include various components such as a microphone unit, a USB interface unit, and a DVD interface unit. For example, when the receiving unit 1250 includes a microphone unit, the terminal 1200 can receive the user's voice signal directly through the microphone unit. When the receiving unit 1250 includes a USB interface unit, the terminal 1200 may receive a voice signal file from a USB device. Furthermore, when a voice signal is received from an external device through the communication unit 1210, the communication unit 1210 may serve as the receiving unit 1250.

The terminal 1200 may further include a storage unit 1270. The storage unit 1270 stores programs and data necessary for the operation of the terminal 1200. It may consist of a volatile storage medium, a nonvolatile storage medium, or a combination of both. Volatile storage media may include semiconductor memory such as RAM, DRAM, and SRAM, and nonvolatile storage media may include a hard disk and flash NAND memory. According to the disclosed embodiment, the personal information 1272 and the mapping table 1273 may be stored in the storage unit 1270.

The personal information 1272 is information that can directly or indirectly identify an individual, and the type of data stored varies with the type of terminal. For example, on a mobile device it may include contacts, music lists, the contents of short messages and their received and sent history, and web search history; on a TV it may include personal playlists.

The mapping table 1273 contains the IDs corresponding to the personal information 1272 or to the additional information generated from the personal information 1272. The personalization model generation unit 1231 uses the mapping table 1273 to generate the ID-based personalization model. The mapping table 1273 is also used to restore the personal information 1272 or additional information corresponding to an ID.

FIG. 15 is a block diagram showing the internal configuration of the server shown in FIG. 13 in more detail.

The control unit 1350 may include a speech processing engine 1351. According to the disclosed embodiment, the speech processing engine 1351 may include a speech recognition engine and a language understanding engine, and performs speech recognition and language understanding by processing the received voice signal. The speech recognition engine and the language understanding engine can process the voice signal using a speech recognition model and a language understanding model, respectively.

The server 1300 may further include a storage unit 1370. The storage unit 1370 stores programs and data necessary for the operation of the server 1300. It may consist of a volatile storage medium, a nonvolatile storage medium, or a combination of both. Volatile storage media may include semiconductor memory such as RAM, DRAM, and SRAM, and nonvolatile storage media may include a hard disk and flash NAND memory.

According to the disclosed embodiment, a personalization model 1372 and a general model 1373 may be stored in the storage unit 1370. The personalization model 1372 is the ID-based personalization model received from the terminal 1200; even if it is exposed externally, the personal information 1272 masked by IDs is difficult to recover, so the personal information 1272 is protected. The general model 1373 is a general speech processing model intended for the speech of unspecified speakers rather than a particular individual. The large general model 1373, combined with the high computing power of the server, can provide high speech processing performance across the user's wide range of language expressions (large vocabulary).

The operations of the terminal 1200 and the server 1300 are described in more detail below.

FIG. 16 is a flowchart illustrating a speech processing method of a terminal according to yet another disclosed embodiment.

First, in step 1610, the terminal 1200 generates the mapping table 1273 by associating IDs with the personal information 1272. The mapping table 1273 may also associate IDs with additional information generated from the personal information 1272, in addition to the personal information 1272 itself. Here, the additional information may include phonetic symbols, pronunciation sequences, and the like. According to the disclosed embodiment, the terminal 1200 can generate a pronunciation dictionary from the word list of the personal information 1272 and use the dictionary to map IDs to the phonetic symbols and words. In this case, the terminal 1200 may assign arbitrary IDs. This is described in more detail with reference to FIGS. 18 to 20.

FIG. 18 is a diagram showing personal information.

The personal information 1272 is information that can directly or indirectly identify an individual, such as contacts, music lists, the contents of short messages and their received and sent history, web search history, and playlists. FIG. 18 shows various types of personal information: names stored as contacts, music titles and artists in a music playlist, search results, and the like are stored as the personal information 1272. According to the disclosed embodiment, the terminal 1200 can generate a mapping table by mapping IDs to this personal information 1272, as described with reference to FIG. 19.

FIG. 19 is a diagram showing a mapping table in which personal information is mapped to IDs.

Referring to FIG. 19, the terminal 1200 generates a word mapping table 1910 by mapping the words contained in the personal information 1272, namely 홍길동 (Hong Gil-dong), 김길동 (Kim Gil-dong), 강남스타일 (Gangnam Style), TOXIC, Psy, Galaxy, and Note, to the IDs 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, and 0x07, respectively. Furthermore, the terminal 1200 may generate the mapping table 1273 by associating IDs not only with the personal information 1272 but also with additional information generated from it. Such additional information may include phonetic symbols, pronunciation sequences, and the like, as described with reference to FIG. 20.
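A minimal sketch of building the word mapping table of FIG. 19 follows. It assumes IDs are handed out sequentially in list order, which reproduces the figure's values; the text only requires that the terminal assign the IDs itself, so the function name and the sequential scheme are illustrative choices.

```python
from typing import Dict, List


def build_word_mapping_table(words: List[str], first_id: int = 0x01) -> Dict[str, int]:
    """Assign a terminal-chosen ID to each distinct personal-information word.

    Sequential IDs mirror FIG. 19; any assignment the terminal keeps private
    would serve, since the server never sees this table.
    """
    table: Dict[str, int] = {}
    next_id = first_id
    for word in words:
        if word not in table:  # skip duplicates so each word gets one ID
            table[word] = next_id
            next_id += 1
    return table


personal_words = ["홍길동", "김길동", "강남스타일", "TOXIC", "Psy", "Galaxy", "Note"]
word_table = build_word_mapping_table(personal_words)
for word, word_id in word_table.items():
    print(f"{word}\t0x{word_id:02X}")
```

A real terminal might prefer random IDs over sequential ones so that the ID order leaks nothing about the order of the contact list or playlist.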

FIG. 20 is a diagram showing personal information represented as phonetic symbols.

According to the disclosed embodiment, the terminal 1200 can generate a pronunciation dictionary (phonetic dictionary) 2010 from the word list of the personal information 1272. Referring to FIG. 20, 홍길동 (Hong Gil-dong), a name stored as a contact in the personal information 1272, is written with the phonetic symbols 'HH OW NX K IY T OW NX'. Likewise, 강남스타일 (Gangnam Style), included in the music list, can be written as 'K AA NX N A M ST AI L'. The terminal 1200 can thus generate the pronunciation dictionary 2010 by writing the personal information 1272 as phonetic symbols. The terminal 1200 may use various types of phonetic symbols: pronunciations may be written with the alphabet as above or, for English words, with English phonetic symbols. Referring to FIG. 20, TOXIC and Galaxy are written in English phonetic symbols in this way. The terminal 1200 can generate the mapping table 1273 by associating IDs not only with the personal information 1272 but also with additional information generated from it, such as phonetic symbols.
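Building the pronunciation dictionary can be sketched as running each personal-information word through a grapheme-to-phoneme (G2P) step. In the sketch below, `lookup_g2p` is a hypothetical stand-in that knows only the two pronunciations quoted in the text; a real terminal would use an actual G2P model. Words the G2P cannot handle are simply left out, which is one possible policy among several.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical G2P lookup table. The two entries are the pronunciations quoted
# in the text; everything else is unknown to this stand-in.
KNOWN_PRONUNCIATIONS: Dict[str, str] = {
    "홍길동": "HH OW NX K IY T OW NX",
    "강남스타일": "K AA NX N A M ST AI L",
}


def lookup_g2p(word: str) -> Optional[List[str]]:
    """Return the phone sequence for a word, or None if it is unknown."""
    pron = KNOWN_PRONUNCIATIONS.get(word)
    return pron.split() if pron else None


def build_phonetic_dictionary(
    words: List[str], g2p: Callable[[str], Optional[List[str]]]
) -> Dict[str, List[str]]:
    """Map each personal-information word to its phone sequence.
    Words the G2P cannot handle are left out of the dictionary."""
    dictionary: Dict[str, List[str]] = {}
    for word in words:
        phones = g2p(word)
        if phones:
            dictionary[word] = phones
    return dictionary
```

The resulting phone sequences are the "additional information" that the mapping table can also cover, and they are the input to the acoustic-unit conversion described next.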

Returning to FIG. 16, in step 1620 the terminal 1200 generates the ID-based personalization model 1372 using the mapping table 1273. Because the ID-based personalization model 1372 is generated from the mapping table 1273 created in step 1610, the personal information 1272 and the additional information are masked by IDs. According to the disclosed embodiment, in generating the personalization model the terminal 1200 can represent the IDs mapped to the personal information 1272 and the additional information as acoustic unit IDs. This is described with reference to FIGS. 21 and 22.

FIG. 21 is a diagram showing a mapping table in which acoustic units are mapped to IDs.

An acoustic unit ID denotes a specific part of the speech recognition model corresponding to a phonetic symbol. Referring to FIG. 21, the terminal 1200 generates an acoustic unit mapping table 2110 by mapping the phonetic symbols HH, OW, NX, K, IY, and L contained in the words to the IDs 0x101, 0x102, 0x103, 0x104, 0x105, 0x106, …, respectively. According to the disclosed embodiment, when mapping IDs to acoustic units, the terminal 1200 can map a specific ID to a specific acoustic unit by agreement with the server 1300. That is, the ID 0x101 that the terminal 1200 maps to the acoustic unit HH may be an ID agreed upon with the server 1300 in advance. Accordingly, when the server 1300 processes a voice signal, it can associate a specific voice signal, that is, a specific acoustic unit, with the specific ID agreed upon with the terminal 1200. The agreement may be reached either by the terminal 1200 or the server 1300 designating the ID for each acoustic unit and notifying the other party, or by the two exchanging information to map IDs to phonetic symbols. In FIG. 21, phonetic symbols and acoustic unit IDs are mapped one-to-one for convenience, but acoustic unit IDs need not map one-to-one to phonetic symbols. For example, the sound formed by combining the phonetic symbols HH and OW may be treated as a single acoustic unit and assigned a single acoustic unit ID.
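Because an acoustic unit may span more than one phonetic symbol, converting a phone sequence into agreed acoustic unit IDs needs a segmentation rule. The sketch below uses greedy longest match, which is our assumption rather than a method stated in the text. The single-phone IDs are the FIG. 21 values; the merged unit ("HH", "OW") with ID 0x1A0 is a hypothetical example of the multi-symbol case.

```python
from typing import Dict, List, Tuple

# Acoustic unit table agreed in advance by terminal and server (FIG. 21 values).
# The merged unit ("HH", "OW") -> 0x1A0 is hypothetical, added to show an
# acoustic unit that spans more than one phonetic symbol.
ACOUSTIC_UNITS: Dict[Tuple[str, ...], int] = {
    ("HH",): 0x101,
    ("OW",): 0x102,
    ("NX",): 0x103,
    ("K",): 0x104,
    ("IY",): 0x105,
    ("L",): 0x106,
    ("HH", "OW"): 0x1A0,
}


def phones_to_unit_ids(phones: List[str], units: Dict[Tuple[str, ...], int]) -> List[int]:
    """Greedy longest-match conversion of phones into agreed acoustic unit IDs."""
    max_len = max(len(key) for key in units)
    ids: List[int] = []
    i = 0
    while i < len(phones):
        # Try the longest candidate unit first, then shorter ones.
        for n in range(min(max_len, len(phones) - i), 0, -1):
            key = tuple(phones[i:i + n])
            if key in units:
                ids.append(units[key])
                i += n
                break
        else:  # no unit of any length matched at position i
            raise KeyError(f"no agreed acoustic unit for {phones[i]!r}")
    return ids
```

Raising on an unmatched phone makes a missing agreement between terminal and server visible immediately instead of silently producing an ID the server cannot interpret.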

FIG. 22 is a diagram showing personal information IDs represented as acoustic unit IDs.

The terminal 1200 can map a word ID 2220 to personal information 2210, and can map acoustic unit IDs 2230 to the word ID 2220 using the phonetic symbols of the personal information 2210 and an acoustic model. An acoustic unit ID denotes a specific part of the speech recognition model corresponding to a phonetic symbol and need not map one-to-one to phonetic symbols; for convenience, however, the description here assumes a one-to-one mapping.

Referring to FIG. 22, the terminal 1200 has arbitrarily mapped the word 홍길동 (Hong Gil-dong) in the personal information 1272 to the ID 0x01. The word 홍길동 can be written with the phonetic symbols 'HH OW NX K IY T OW NX', and each phonetic symbol is mapped to an acoustic unit ID agreed upon with the server 1300: 0x101, 0x102, 0x103, 0x104, 0x105, 0x106, …. Therefore, the ID 0x01 corresponding to 홍길동 can be represented by the acoustic unit IDs 0x101, 0x102, 0x103, 0x104, 0x105, 0x106, ….
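One entry of the ID-based model can be sketched as a word ID paired with its acoustic unit IDs, so neither the word text nor its phonetic symbols appear. The sketch assumes a one-to-one phone-to-unit mapping, as the text does for convenience. FIG. 21 lists L as 0x106 and gives no ID for T, so T's ID 0x107 here is a hypothetical value added only to complete the example; `id_based_entry` is likewise an illustrative name.

```python
from typing import Dict, List

# Agreed acoustic unit IDs from FIG. 21. "T" is not listed there, so 0x107 is a
# hypothetical value added only to complete this example.
UNIT_ID: Dict[str, int] = {
    "HH": 0x101, "OW": 0x102, "NX": 0x103, "K": 0x104,
    "IY": 0x105, "L": 0x106, "T": 0x107,
}


def id_based_entry(word_id: int, phones: List[str]) -> Dict[int, List[int]]:
    """Return one entry of the ID-based lexicon: the word appears only as its
    terminal-assigned ID, and its pronunciation only as acoustic unit IDs."""
    return {word_id: [UNIT_ID[p] for p in phones]}


entry = id_based_entry(0x01, "HH OW NX K IY T OW NX".split())
print({hex(k): [hex(u) for u in v] for k, v in entry.items()})
```

Only entries of this form need to leave the terminal; the word mapping table that links 0x01 back to 홍길동 stays behind.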

According to the disclosed embodiment, the terminal 1200 maps arbitrary IDs to words and represents each word ID by acoustic unit IDs agreed upon with the server. Because the personal information 1272 is masked by IDs, it remains protected even if the personalization model is exposed externally, while the server 1300 can still process the voice signal using the agreed acoustic unit IDs.

Next, in step 1630, the terminal 1200 transmits the ID-based personalization model 1372 to the server. As described above, the ID-based personalization model 1372 may be generated based on the word IDs 2220 and acoustic unit IDs 2230 shown in FIG. 22. Accordingly, when the server 1300 receives a speech signal to be recognized, it can process the signal and output, as the result, the word ID 2220 corresponding to the acoustic unit IDs 2230 of that signal. The mapping table 1273 is not transmitted to the server 1300; it is stored only in the terminal 1200, so the personal information 1272 remains protected even if the personalization model is exposed externally.
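
A minimal sketch of what the transmitted ID-based personalization model might contain, assuming a JSON payload. The text does not specify a format; the field names `word_id` and `unit_ids` and the function `build_id_based_model` are hypothetical. The sketch illustrates the point made above: the mapping table stays on the terminal, so the payload carries IDs only.

```python
import json

# Terminal-side state (hypothetical values following FIG. 22).
mapping_table = {"Hong Gil-dong": 0x01}  # personal information -> word ID; never transmitted
unit_sequences = {0x01: [0x101, 0x102, 0x103, 0x104, 0x105, 0x106, 0x102, 0x103]}


def build_id_based_model(mapping_table, unit_sequences):
    """Build the payload sent to the server in step 1630.

    Only word IDs and their acoustic unit ID sequences are included; the
    personal-information strings (the keys of mapping_table) are left out.
    """
    return {
        "entries": [
            {"word_id": word_id, "unit_ids": unit_sequences[word_id]}
            for word_id in sorted(mapping_table.values())
        ]
    }


payload = json.dumps(build_id_based_model(mapping_table, unit_sequences))
print(payload)
```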

Then, in step 1640, the terminal 1200 receives from the server 1300 the result of processing the speech signal with the ID-based personalization model 1372. For example, this result may include a word ID 2220 as shown in FIG. 22.

Next, in step 1650, the terminal 1200 restores the personal information 1272 or the additional information corresponding to the ID, using the data processing result received from the server 1300 and the mapping table 1273. That is, when the terminal 1200 receives a word ID 2220 as shown in FIG. 22 as the processing result from the server 1300, it can restore the personal information corresponding to that word ID 2220 using the stored mapping table 1273. Referring to FIG. 20, the terminal 1200 can restore the ID 0x01 to 'Hong Gil-dong'. According to the disclosed embodiment, the terminal 1200 completes the data processing by using the mapping table 1273 to restore the personal information 1272 that was masked by IDs. Because the terminal 1200 generates the personalization model 1372, the speech processing system can achieve high performance; because the actual data processing is performed on the server 1300, which has high computing power, the speech signal can be processed quickly.
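
Step 1650 can be sketched as an inverse lookup in the locally stored mapping table. The result format (a list mixing plain words, such as the hypothetical "call", with integer word IDs) and the name `restore` are assumptions for illustration only.

```python
# Terminal-side mapping table (hypothetical content following FIG. 22).
mapping_table = {"Hong Gil-dong": 0x01}


def restore(result_tokens, mapping_table):
    """Replace word IDs in the server's result with the personal information they mask.

    Assumes masked words appear as integer word IDs and all other words as plain
    strings; strings, and IDs missing from the table, pass through unchanged.
    """
    inverse = {word_id: word for word, word_id in mapping_table.items()}
    return [inverse.get(token, token) if isinstance(token, int) else token
            for token in result_tokens]


print(restore(["call", 0x01], mapping_table))
```

Because the inverse table is built on the terminal, the server never needs, and never receives, the link between 0x01 and the name.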

Furthermore, the terminal 1200 may output the data processing result to the user.

As a result, according to the disclosed embodiment, the personal information 1272 is transformed into a form from which the content of words or sentences cannot be determined before it is sent to the server 1300. Even if the personalization model is exposed externally, the ID-masked personal information 1272 is difficult to restore, so it remains protected. In addition, sending the personalization model to a server with high computing power, which then processes the speech signal, makes it possible to implement a speech processing system with higher performance and processing speed.

FIG. 17 is a flowchart showing a speech processing method of a server according to another disclosed embodiment.

First, in step 1710, the server 1300 receives the ID-based personalization model 1372 from the terminal 1200. Because the portions of the ID-based personalization model 1372 that relate to the personal information 1272 are masked by IDs, the masked personal information 1272 is difficult to restore even if the model is exposed externally, so it remains protected.

Then, in step 1720, the server 1300 receives a speech signal. The server 1300 can receive speech signals through various components. Receiving the speech signal from the terminal 1200 is likely the most common case, but the signal may also be received through a USB interface unit, a DVD interface unit, or the like.

Next, in step 1730, the server 1300 processes the speech signal using the ID-based personalization model 1372. In this case, the server 1300 may represent the personal information ID using IDs mapped to acoustic units by prior agreement with the terminal 1200. This is described with reference to FIGS. 21 and 22.

Referring to FIG. 21, IDs are mapped to acoustic units according to an agreement between the server 1300 and the terminal 1200. The server 1300 recognizes the speech signal 'Hong Gil-dong' as a sequence of consecutive sounds. It assigns the sound 'HH' the ID 0x101 agreed upon in advance with the terminal 1200, assigns the sound 'OW' the ID 0x102, and so on, so that the speech signal 'Hong Gil-dong' can be represented as the set of acoustic unit IDs 0x101, 0x102, 0x103, 0x104, 0x105, 0x106, ….

The server 1300 can find, in the personalization model 1372 received from the terminal 1200, the word ID corresponding to this set of acoustic unit IDs. Referring to FIG. 22, the ID 0x01 is mapped to the word 'Hong Gil-dong'. Accordingly, the server 1300 can represent the speech signal 'Hong Gil-dong' by the word ID 0x01, which corresponds to the acoustic unit ID set 0x101, 0x102, 0x103, 0x104, 0x105, 0x106, …. Because the server 1300 receives the personalization model 1372 from the terminal 1200 and uses it for data processing, it can achieve high performance; and because the server 1300 has high computing power, it can process the speech signal quickly.
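
In its simplest form, the server-side lookup of FIGS. 21 and 22 converts recognized sounds into agreed acoustic unit IDs and matches the resulting sequence against the received model. The sketch below assumes the pronunciation symbols have already been recognized and uses an exact-match dictionary lookup; a real recognizer would search over competing hypotheses instead. `AGREED_UNIT_IDS`, `received_model`, and `decode_personal_word` are hypothetical names.

```python
# Hypothetical acoustic-unit table agreed in advance with the terminal (1:1 with symbols).
AGREED_UNIT_IDS = {"HH": 0x101, "OW": 0x102, "NX": 0x103, "K": 0x104, "IY": 0x105, "T": 0x106}

# ID-based personalization model received from the terminal: word ID -> unit ID sequence.
received_model = {0x01: (0x101, 0x102, 0x103, 0x104, 0x105, 0x106, 0x102, 0x103)}


def decode_personal_word(recognized_symbols, model):
    """Return the word ID whose unit sequence matches the recognized symbols, else None.

    The server handles only acoustic unit IDs and word IDs, never the word text.
    """
    unit_ids = tuple(AGREED_UNIT_IDS[s] for s in recognized_symbols)
    index = {units: word_id for word_id, units in model.items()}
    return index.get(unit_ids)


print(decode_personal_word("HH OW NX K IY T OW NX".split(), received_model))
```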

Furthermore, in step 1740, the server 1300 transmits the data processing result to the terminal 1200.

As a result, according to the disclosed embodiment, the personal information 1272 is transformed into a form from which the content of words or sentences cannot be determined before it is sent to the server 1300. Even if the personalization model is exposed externally, the ID-masked personal information 1272 is difficult to restore, so it remains protected. In addition, because the server 1300, which has high computing power, processes the speech signal using the personalization model, a speech processing system with higher performance and processing speed can be implemented.

FIG. 23 is a flowchart showing an example of the detailed operation of a terminal and a server according to another disclosed embodiment.

First, in step 2310, the terminal 1200 generates the mapping table 1273 by associating IDs with the personal information 1272. The terminal 1200 can generate a pronunciation dictionary 1910 from the word list of the personal information 1272 and use the pronunciation dictionary 1910 to map IDs to pronunciation symbols and words. In this case, the terminal 1200 may assign arbitrary IDs.
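
Step 2310 notes that the terminal may assign arbitrary IDs. One way to do that, sketched here as an assumption rather than the disclosed method, is to draw each ID from Python's `secrets` module and redraw on collision. The function name `assign_arbitrary_ids`, the 16-bit ID width, and the contact name "Mom" are hypothetical.

```python
import secrets


def assign_arbitrary_ids(words, id_bits=16):
    """Map each distinct word to a random, unused ID below 2**id_bits.

    Random IDs, unlike sequential ones, carry no hint of list order or size.
    """
    unique_words = list(dict.fromkeys(words))  # drop duplicates, keep first-seen order
    if len(unique_words) > (1 << id_bits):
        raise ValueError("not enough IDs for the given word list")
    table, used = {}, set()
    for word in unique_words:
        candidate = secrets.randbelow(1 << id_bits)
        while candidate in used:  # redraw on collision
            candidate = secrets.randbelow(1 << id_bits)
        used.add(candidate)
        table[word] = candidate
    return table


print(assign_arbitrary_ids(["Hong Gil-dong", "Mom", "Hong Gil-dong"]))
```

Each run prints a different table; only the terminal's copy can link an ID back to its word.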

Next, in step 2320, the terminal 1200 generates the ID-based personalization model 1372 using the mapping table 1273 and, in step 2330, transmits it to the server 1300. In step 2340, the server 1300 stores the received ID-based personalization model 1372 in the storage unit 1370. The ID-based personalization model 1372 may be generated based on the word IDs 2220 and acoustic unit IDs 2230 shown in FIG. 22.

Then, the terminal 1200 receives a speech signal in step 2350 and transmits it to the server 1300 in step 2360. As described above, the terminal 1200 can receive speech signals through various components. Receiving the signal through a microphone unit is likely the most common case, but it may also be received through a USB interface unit, a DVD interface unit, or the like, or through communication with an external device.

In step 2370, the server 1300 processes the received speech signal using the ID-based personalization model and, in step 2380, transmits the data processing result to the terminal 1200. In doing so, the server 1300 can represent the ID mapped to the personal information 1272 or additional information using the IDs mapped to acoustic units by agreement with the terminal 1200.

Then, in step 2390, the terminal 1200 restores the personal information 1272 or additional information corresponding to the ID, using the data processing result and the mapping table 1273.
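
The terminal-server exchange of FIG. 23 can be simulated end to end in a few lines. This sketch is illustrative only: the class and method names are hypothetical, audio capture and acoustic recognition (steps 2350 to 2370) are replaced by a list of already-recognized pronunciation symbols, and the lookup is exact-match. It shows which side holds which table: the server stores only IDs, and only the terminal can turn them back into words.

```python
# Hypothetical agreed acoustic-unit table (1:1 with pronunciation symbols, as assumed in the text).
AGREED_UNIT_IDS = {"HH": 0x101, "OW": 0x102, "NX": 0x103, "K": 0x104, "IY": 0x105, "T": 0x106}


class Terminal:
    def __init__(self, pronunciations):
        # Step 2310: map IDs to personal information; this table never leaves the terminal.
        self.pronunciations = pronunciations
        self.mapping_table = {word: i + 1 for i, word in enumerate(pronunciations)}

    def build_id_model(self):
        # Step 2320: ID-based personalization model (word ID -> unit IDs), no word text.
        return {
            self.mapping_table[word]: tuple(AGREED_UNIT_IDS[s] for s in pron.split())
            for word, pron in self.pronunciations.items()
        }

    def restore(self, result):
        # Step 2390: turn word IDs in the server's result back into personal information.
        inverse = {word_id: word for word, word_id in self.mapping_table.items()}
        return [inverse.get(token, token) for token in result]


class Server:
    def __init__(self):
        self.model = {}

    def store_model(self, id_model):
        # Step 2340: store the received ID-based personalization model.
        self.model = dict(id_model)

    def process(self, recognized_symbols):
        # Step 2370, simplified: exact-match lookup of the agreed unit-ID sequence.
        units = tuple(AGREED_UNIT_IDS[s] for s in recognized_symbols)
        index = {seq: word_id for word_id, seq in self.model.items()}
        return [index[units]] if units in index else []


terminal = Terminal({"Hong Gil-dong": "HH OW NX K IY T OW NX"})
server = Server()
server.store_model(terminal.build_id_model())              # step 2330: transmit the model
result = server.process("HH OW NX K IY T OW NX".split())   # steps 2350-2380, simplified
print(terminal.restore(result))
```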

Meanwhile, the above-described embodiments can be written as programs executable on a computer and can be implemented on a general-purpose digital computer that runs the programs from a computer-readable recording medium.

The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks), optical reading media (e.g., CD-ROMs, DVDs), and carrier waves (e.g., transmission over the Internet).

Although the embodiments have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the disclosed embodiments pertain will understand that the embodiments may be practiced in other specific forms without changing their technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: terminal
110: receiver
130: communication unit
150: controller
200: server
210: receiver
230: communication unit
250: controller

Claims

Receiving a speech signal;
detecting a personalized information section of the speech signal that includes personal information;
performing data processing on the portion of the speech signal corresponding to the personalized information section, using a personalized model generated based on the personal information; and
receiving, from a server, a result of data processing of the portion of the speech signal corresponding to a general information section, which is the section other than the personalized information section.

The method according to claim 1,
further comprising generating speech section information on the personalized information section and the general information section and transmitting it to the server.

The method of claim 2,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The method according to claim 1,
further comprising receiving, from the server, speech section information on the personalized information section and the general information section.

The method according to claim 1,
wherein the result, received from the server, of data processing of the speech signal corresponding to the general information section
is a result of the server processing the speech signal corresponding to the general information section using a general model.

Receiving a speech signal;
detecting a personalized information section of the speech signal that includes personal information;
performing data processing, using a general model, on the portion of the speech signal corresponding to a general information section, which is the section of the speech signal other than the personalized information section; and
transmitting, to a terminal, the data processing result of the speech signal corresponding to the general information section.

The method according to claim 6,
further comprising generating speech section information on the personalized information section and the general information section and transmitting it to the terminal.

The method of claim 7,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The method according to claim 6,
further comprising receiving, from the terminal, speech section information on the personalized information section and the general information section.

Receiving a speech signal;
receiving, from a server, a result of data processing of the speech signal using a general model; and
performing data processing on the speech signal using a personalization model generated based on personal information and the received data processing result.

The method of claim 10,
wherein performing data processing on the speech signal using the personalization model and the data processing result comprises
performing data processing on the portion of the speech signal corresponding to a personalized information section that includes the personal information.

The method of claim 11,
further comprising detecting the personalized information section of the speech signal.

The method of claim 10,
further comprising receiving, from the server, speech section information on a personalized information section and a general information section.

The method of claim 13,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The method of claim 10,
wherein the personalization model
is at least one of a personalized speech recognition model, a personalized natural language understanding model, and a personalized lexical model.

Generating a mapping table by mapping IDs to personal information;
generating an ID-based personalization model using the mapping table;
transmitting the ID-based personalization model to a server;
receiving, from the server, a result of data processing of a speech signal using the ID-based personalization model; and
restoring the personal information corresponding to the ID using the data processing result and the mapping table.

The method of claim 16,
wherein generating the ID-based personalization model using the mapping table comprises
representing the ID mapped to the personal information by acoustic unit IDs, which are IDs mapped to sounds.

The method of claim 17,
wherein the acoustic unit ID
is an ID mapped to a sound by agreement with the server.

The method of claim 17,
further comprising generating the mapping table by mapping IDs to additional information generated from the personal information.

Receiving an ID-based personalization model from a terminal;
receiving a speech signal;
performing data processing on the speech signal using the ID-based personalization model; and
transmitting the data processing result to the terminal.

The method of claim 20,
wherein performing data processing on the speech signal using the ID-based personalization model comprises
representing an ID mapped to personal information using acoustic unit IDs, which are IDs mapped to acoustic units by agreement with the terminal.

A receiver configured to receive a speech signal;
a communication unit configured to communicate with a server; and
a controller configured to detect a personalized information section of the speech signal that includes personal information, to perform data processing on the portion of the speech signal corresponding to the personalized information section using a personalized model generated based on the personal information, and to control reception, from the server, of a result of data processing of the portion of the speech signal corresponding to a general information section, which is the section other than the personalized information section.

The terminal of claim 22,
wherein the controller
generates speech section information on the personalized information section and the general information section and transmits it to the server.

The terminal of claim 23,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The terminal of claim 22,
wherein the controller
controls reception, from the server, of speech section information on the personalized information section and the general information section.

The terminal of claim 22,
wherein the result, received from the server, of data processing of the speech signal corresponding to the general information section
is a result of the server processing the speech signal corresponding to the general information section using a general model.

A receiver configured to receive a speech signal;
a communication unit configured to communicate with a terminal; and
a controller configured to detect a personalized information section of the speech signal that includes personal information, to perform data processing, using a general model, on the portion of the speech signal corresponding to a general information section, which is the section other than the personalized information section, and to control transmission of the data processing result of the speech signal corresponding to the general information section to the terminal.

The server of claim 27,
wherein the controller
generates speech section information on the personalized information section and the general information section and transmits it to the terminal.

The server of claim 28,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The server of claim 27,
wherein the controller
controls reception, from the terminal, of speech section information on the personalized information section and the general information section.

A communication unit configured to communicate with a server; and
a controller configured to control reception, from the server, of a result of data processing of a speech signal using a general model, and to control data processing of the speech signal using a personalization model generated based on personal information and the data processing result.

The terminal of claim 31,
wherein the controller,
when performing data processing on the speech signal using the data processing result and the personalization model generated based on the personal information,
performs data processing on the portion of the speech signal corresponding to a personalized information section that includes the personal information.

The terminal of claim 32,
wherein the controller
detects the personalized information section of the speech signal.

The terminal of claim 31,
wherein the controller
controls reception, from the server, of speech section information on a personalized information section and a general information section.

The terminal of claim 34,
wherein the speech section information
marks at least one of the personalized information section and the general information section of the speech signal.

The terminal of claim 31,
wherein the personalization model
is at least one of a personalized speech recognition model, a personalized natural language understanding model, and a personalized lexical model.

A receiver configured to receive a speech signal;
a communication unit configured to communicate with a server; and
a controller configured to generate a mapping table by mapping IDs to personal information, generate an ID-based personalization model using the mapping table, transmit the ID-based personalization model to the server, receive from the server a result of data processing of the speech signal using the ID-based personalization model, and restore the personal information corresponding to the ID using the data processing result and the mapping table.

The terminal of claim 37,
wherein the controller,
when generating the ID-based personalization model using the mapping table,
represents the ID mapped to the personal information by acoustic unit IDs, which are IDs mapped to sounds.

The terminal of claim 38,
wherein the acoustic unit ID
is an ID mapped to a sound by agreement with the server.

The terminal of claim 38,
wherein the controller
generates the mapping table by mapping IDs to additional information generated from the personal information.

A receiver configured to receive a speech signal;
a communication unit configured to communicate with a terminal; and
a controller configured to receive an ID-based personalization model from the terminal, receive a speech signal, perform data processing on the speech signal using the ID-based personalization model, and control transmission of the data processing result to the terminal.

The server of claim 41,
wherein the controller,
when performing data processing on the speech signal using the ID-based personalization model,
represents an ID mapped to personal information using acoustic unit IDs, which are IDs mapped to acoustic units by agreement with the terminal.