KR102345754B1

KR102345754B1 - Speech Recognition Model Management System for Training Speech Recognition Model

Info

Publication number: KR102345754B1
Application number: KR1020190179749A
Authority: KR
Inventors: 이성용; 이영래; 김성욱
Original assignee: 주식회사 포스코아이씨티
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2021-12-30
Also published as: KR20210086068A

Abstract

A speech recognition model management system according to an aspect of the present invention, which allows a user to train a speech recognition model in a web environment, comprises: a data management unit that, in response to a request to upload at least one of language data and voice data, collects the data and stores it in a database; a modeling unit that, in response to a speech recognition model training request including meta information of first data and meta information of a first speech recognition model, extracts the first data mapped to the meta information from among a plurality of data stored in the database, extracts the first speech recognition model mapped to the meta information from among a plurality of speech recognition models, and trains the first speech recognition model with the first data; and a model distribution unit that, in response to a request to distribute the first speech recognition model, distributes the first speech recognition model to a speech recognition server that provides a speech recognition service.

Description

A speech recognition model management system that can train a speech recognition model {Speech Recognition Model Management System for Training Speech Recognition Model}

The present invention relates to a speech recognition model.

Speech recognition refers to digitizing human speech and converting it into text. Creating a speech recognition model that performs speech recognition requires voice data and text data obtained by transcribing that voice data.

In general, the text data is produced by a stenographer who listens to the voice data and transcribes it, which is costly and time-consuming.

In addition, training a speech recognition model using voice data and the corresponding text data as training data takes a long time and requires considerable background knowledge unless the person doing it is an experienced expert.

Moreover, because of the nature of open-source speech recognition models, the training data is managed and processed in a Linux environment and the speech recognition model is trained there. As a result, both the data and the speech recognition models are difficult to manage, and the data is also difficult to search.

The present invention is intended to solve the above problems, and its technical object is to provide a speech recognition management system that allows a user to train a speech recognition model in a web environment.

Another technical object of the present invention is to provide a speech recognition management system that generates text data by inputting voice data into an existing speech recognition model and provides an environment in which a user can easily correct that text data.

Another technical object of the present invention is to provide a speech recognition management system that can distribute each trained speech recognition model to a server that performs speech recognition.

To achieve the above objects, a speech recognition model management system capable of training a speech recognition model according to an aspect of the present invention comprises: a data management unit that, in response to a request to upload at least one of language data and voice data, collects the data and stores it in a database; a modeling unit that, in response to a speech recognition model training request including meta information of first data and meta information of a first speech recognition model, extracts the first data mapped to the meta information from among a plurality of data stored in the database, extracts the first speech recognition model mapped to the meta information from among a plurality of speech recognition models, and trains the first speech recognition model with the first data; and a model distribution unit that, in response to a request to distribute the first speech recognition model, distributes the first speech recognition model to a speech recognition server that provides a speech recognition service.

According to the present invention, a user can train a speech recognition model in a web environment, so even a user without specialized background knowledge of speech recognition can easily train a speech recognition model. In addition, because training that used to be performed in a Linux environment can be performed through a web environment, user convenience is improved.

In addition, the present invention generates text data by inputting voice data into an existing speech recognition model and provides an environment in which a user can easily correct that text data, thereby reducing the cost and time incurred by hiring a stenographer.

In addition, the present invention can distribute trained speech recognition models to a server that performs speech recognition, so that various services requiring speech recognition can easily be provided.

FIG. 1 is a diagram showing the configuration of a speech recognition model management system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the management server 200 according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of a data management page.
FIG. 4 is a diagram showing an example of a language data inquiry page and a language data correction page.
FIG. 5 is a diagram showing an example of a sound data inquiry page and a sound data correction page.
FIG. 6 is a diagram showing an example of a voice recording page.
FIG. 7 is a diagram showing an example of pages for training a model.
FIG. 8 is a diagram showing an example of a model management page.

The terms used in this specification should be understood as follows.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "first" and "second" are used to distinguish one element from another, and the scope of rights should not be limited by these terms.

Terms such as "comprise" or "have" should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

The term "at least one" should be understood to include all combinations that can be formed from one or more of the associated items. For example, "at least one of a first item, a second item, and a third item" means not only each of the first, second, and third items individually, but also every combination of two or more of the first, second, and third items.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a speech recognition model management system capable of training a speech recognition model according to an embodiment of the present invention.

The speech recognition model management system capable of training a speech recognition model according to the present invention (hereinafter, the "speech recognition model management system" 100) collects voice data and language data for training a speech recognition model and uses them to train the model. The speech recognition model management system 100 can also distribute a trained speech recognition model to a speech recognition server that provides a speech recognition service.

In particular, the collection and processing of data, including voice data and language data, and the training, management, and evaluation of speech recognition models have conventionally been performed in a Linux environment. As a result, these tasks were difficult for non-expert users to access.

Accordingly, the present invention provides a web-based speech recognition model management system so that a user can easily perform data collection, data processing, speech recognition model training, speech recognition model management, speech recognition model evaluation, and the like.

To this end, as shown in FIG. 1, the speech recognition model management system 100 according to the present invention includes a management server 200 and a user terminal 300.

The management server 200 collects voice data or language data in response to a user's request, trains a speech recognition model with the voice data or language data, and can distribute the trained model to a server (not shown) that provides a speech recognition service.

Hereinafter, the management server 200 according to the present invention will be described in more detail with reference to FIG. 2.

FIG. 2 is a diagram showing the configuration of the management server 200 according to an embodiment of the present invention.

As shown in FIG. 2, the management server 200 according to an embodiment of the present invention may include a data management unit 210, a modeling unit 220, a model distribution unit 230, a database 240, a search unit 250, and a search database 260.

The data management unit 210 collects data in response to a request to upload at least one of language data and voice data, and stores the data in the database 240. Specifically, when an upload request occurs, the data management unit 210 creates, in the database 240, a storage space in which the language data or voice data will be stored. The data management unit 210 uploads the language data or voice data to a temporary database (not shown) and then moves it to the created storage space.

For example, when the user accesses the data management page through the user terminal 300 as shown in FIG. 3 and selects the add icon 310 on the data management page, an upload page appears through which language data or voice data stored on the user terminal 300 can be uploaded. When the user uploads first language data through the upload page, an upload request is generated. When the upload request occurs, the data management unit 210 creates, in the database 240, a first storage space in which the first language data will be stored. The data management unit 210 uploads the first language data through the temporary database and stores the first language data in the first storage space.
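The upload sequence described above (create a storage space, stage the file in a temporary area, then move it into the storage space) can be sketched as follows. This is only a minimal illustration, assuming that storage spaces are plain directories; the function name `handle_upload` and the directory layout are hypothetical and do not appear in the patent.

```python
import shutil
import tempfile
from pathlib import Path


def handle_upload(db_root: Path, space_name: str, file_name: str, payload: bytes) -> Path:
    """Stage an uploaded file in a temporary area, then move it into its storage space."""
    # 1. Create the storage space inside the database root (hypothetical layout).
    storage_space = db_root / space_name
    storage_space.mkdir(parents=True, exist_ok=True)

    # 2. Write the upload to a temporary staging directory first.
    with tempfile.TemporaryDirectory() as staging:
        staged_file = Path(staging) / file_name
        staged_file.write_bytes(payload)

        # 3. Move the staged file into the storage space created in step 1.
        final_path = storage_space / file_name
        shutil.move(str(staged_file), str(final_path))

    return final_path
```

Staging first means that a partially transferred file never appears in the storage space; only a complete file is moved there.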

In an embodiment, the upload request may include input information for the language data or voice data. For example, when the user uploads first voice data through the upload page, input information for the first voice data may be entered. The input information may include, for example, the name of the first voice data and information about the user uploading the first voice data.

In this embodiment, the data management unit 210 may create the storage space in which the language data or voice data will be stored by encrypting it based on the input information. For example, the data management unit 210 may create the storage space by encrypting a mixture of the user information included in the input information, the current time, and the like into a 32-character value.

In an embodiment, the data management unit 210 may encrypt the language data or voice data based on the input information and store it in the storage space. For example, when the first language data is stored in the encrypted storage space, the data management unit 210 may encrypt the file name of the first language data based on the input information before storing it.
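One possible way to derive a 32-character name from the user information and the current time is sketched below. The patent does not name an algorithm; MD5 is assumed here only because its hexadecimal digest happens to be 32 characters long, and it is a one-way hash rather than reversible encryption. The function name `make_storage_name` and the field separator are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def make_storage_name(user_id: str, data_name: str, now: datetime) -> str:
    """Derive a 32-character name from the input information and the current time."""
    # Mix the user information, data name, and timestamp into one seed string.
    seed = f"{user_id}|{data_name}|{now.isoformat()}"
    # An MD5 hex digest is always 32 hexadecimal characters (assumed algorithm).
    return hashlib.md5(seed.encode("utf-8")).hexdigest()


name = make_storage_name("user01", "call-center-audio", datetime.now(timezone.utc))
print(len(name))  # 32
```

The same function could be applied to a file name, so that neither the storage space nor the stored file reveals the original name.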

The data management unit 210 may extract text data by inputting voice data into a previously created reference speech recognition model. Specifically, once the voice data has been uploaded, the data management unit 210 may, in response to a text data extraction request, input the voice data into the reference speech recognition model and extract the text data. The text data extraction request may include meta information of the voice data.

In this case, the data management unit 210 may extract the text data by inputting, into the reference speech recognition model, the voice data that is mapped to the meta information from among a plurality of voice data.

For example, the user may access the data management page through the user terminal 300 as shown in FIG. 3. When the user selects the text conversion icon 320 of one of the plurality of voice data displayed on the data management page, a text data extraction request including the meta information of that voice data is generated. In response to the text data extraction request, the data management unit 210 inputs the voice data mapped to the meta information, from among the plurality of voice data, into the reference speech recognition model and extracts the text data. The text conversion icon 320 of voice data from which text data has been extracted may be displayed in purple, and the text conversion icon 320 of voice data from which text data has not been extracted may be displayed in gray.

The data management unit 210 may map the extracted text data to the voice data and store it in the database 240. Specifically, the data management unit 210 may store the text data in the storage space of the database 240 in which the corresponding voice data is stored.
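The text extraction flow described above can be sketched as follows: the request carries meta information, the matching voice data is looked up, the reference model produces text, and the text is stored alongside the audio. The `VoiceRecord` class, the dictionary standing in for the database, and the callable standing in for the reference speech recognition model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceRecord:
    """One storage space holding voice data and, once extracted, its text data."""
    audio: bytes
    transcript: Optional[str] = None


def extract_text(
    database: Dict[str, VoiceRecord],
    meta_id: str,
    reference_model: Callable[[bytes], str],
) -> str:
    """Run the reference model on the voice data that the meta information maps to."""
    record = database[meta_id]                          # look up by meta information
    record.transcript = reference_model(record.audio)   # store the text next to the audio
    return record.transcript


def icon_color(record: VoiceRecord) -> str:
    """Color of the text conversion icon: purple once text exists, gray otherwise."""
    return "purple" if record.transcript is not None else "gray"
```

Keeping the transcript in the same record as the audio mirrors the description's choice of storing text data in the storage space of the corresponding voice data.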

In an embodiment, when a request to correct voice data or language data occurs, the data management unit 210 loads the corresponding data so that it can be corrected. Specifically, when a voice data correction request including meta information of voice data occurs, the data management unit 210 loads the voice data mapped to the meta information, from among a plurality of voice data, together with the text data of that voice data so that they can be corrected. When the correction is complete, the data management unit 210 stores the corrected voice data and text data in the database 240. Likewise, when a language data correction request including meta information of language data occurs, the data management unit 210 loads the language data mapped to the meta information, from among a plurality of language data, so that it can be corrected. When the correction is complete, the data management unit 210 stores the corrected language data in the database 240.

For example, as shown in FIG. 4A, when the user accesses the language data inquiry page through the user terminal 300, the plurality of language data stored in the database 240 are displayed on the user terminal 300. When the user selects the text correction icon 410 of one of the plurality of language data shown in FIG. 4A, a language data correction request including the meta information of that language data is generated. In response to the language data correction request, the data management unit 210 loads the language data mapped to the meta information, from among the plurality of language data, so that the user can correct it. As shown in FIG. 4B, the data management unit 210 displays the language data correction page on the user terminal 300, and when the correction is complete, the data management unit 210 stores the corrected language data in the database 240.

Similarly, as shown in FIG. 5A, when the user accesses the voice data inquiry page through the user terminal 300, the plurality of voice data stored in the database 240 are displayed on the user terminal 300. When the user selects the text correction icon 510 of one of the plurality of voice data shown in FIG. 5A, a voice data correction request including the meta information mapped to that voice data is generated. In response to the voice data correction request, the data management unit 210 loads the voice data mapped to the meta information, from among the plurality of voice data, together with its text data so that the user can correct the voice data and text data. As shown in FIG. 5B, the data management unit 210 displays the voice data correction page on the user terminal 300. When the user issues a playback request on the voice data correction page, the data management unit 210 can play back the voice data, and when the user issues an edit request, the voice data correction page can switch to edit mode as shown in FIG. 5C.

The user can also generate a voice data or language data correction request from the data management page shown in FIG. 3, in which case the data management unit 210 can display the voice data or language data correction page.

The data management unit 210 can record voice in real time. For example, as shown in FIG. 6, the user may access the voice recording page through the user terminal 300 and generate a recording request. When the recording request occurs, the data management unit 210 may store the voice data in the database 240 in real time and extract text data by inputting that voice data into the reference speech recognition model. In this case, the data management unit 210 displays the text data on the voice recording page.
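A real-time recording session of this kind might be sketched as a generator that stores each incoming audio chunk and yields the text recognized so far. The patent does not say whether recognition runs per chunk or over the whole recording; re-running the model on all audio received so far is an assumption, as are the function name `record_session` and the callable standing in for the reference model.

```python
from typing import Callable, Iterable, Iterator, List


def record_session(
    chunks: Iterable[bytes],
    stored_audio: List[bytes],
    reference_model: Callable[[bytes], str],
) -> Iterator[str]:
    """Store each incoming audio chunk and yield the transcript recognized so far."""
    for chunk in chunks:
        stored_audio.append(chunk)               # persist the chunk as it arrives
        audio_so_far = b"".join(stored_audio)    # everything recorded up to now
        yield reference_model(audio_so_far)      # text to show on the recording page
```

Each yielded string would replace the text currently shown on the voice recording page.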

Referring again to FIG. 2, the modeling unit 220 trains a speech recognition model stored in the database with voice data or language data in response to a speech recognition model training request. Specifically, in response to a speech recognition model training request including meta information of voice data or language data and meta information of a speech recognition model, the modeling unit 220 extracts the voice data or language data mapped to the meta information from among a plurality of voice data or language data. The modeling unit 220 also extracts the speech recognition model mapped to the meta information from among a plurality of speech recognition models, and may train that speech recognition model with the extracted voice data or language data.
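The way a training request resolves to concrete data and a concrete model can be sketched as two lookups followed by a training call. The `TrainingRequest` class, the dictionaries standing in for the database, and the `train` callable are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TrainingRequest:
    data_meta: str    # meta information identifying the voice or language data
    model_meta: str   # meta information identifying the speech recognition model


def handle_training_request(
    request: TrainingRequest,
    datasets: Dict[str, Any],
    models: Dict[str, Any],
    train: Callable[[Any, Any], Any],
) -> Any:
    """Resolve the request's meta information to data and a model, then train."""
    data = datasets[request.data_meta]      # data mapped to the data meta information
    model = models[request.model_meta]      # model mapped to the model meta information
    trained = train(model, data)            # the training routine is supplied by the caller
    models[request.model_meta] = trained    # keep the trained model in the database
    return trained
```

Because the request carries only meta information, the web front end never needs to transfer the data or the model itself.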

In an embodiment, the modeling unit 220 may train the speech recognition model using the KALDI algorithm.

In an embodiment, the speech recognition model may include an acoustic model and a language model. The acoustic model converts speech into text, and the language model corrects the text output from the acoustic model.
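One common way for a language model to "correct" acoustic model output is to rescore several candidate transcriptions and keep the most plausible one. The patent does not specify the mechanism, so the sketch below is an assumption: candidates carry an acoustic log score, and a toy unigram language model adds its own log probability. The function name `rescore` and the example probabilities are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def rescore(
    candidates: List[Tuple[str, float]],
    unigram_probs: Dict[str, float],
    lm_weight: float = 1.0,
) -> str:
    """Pick the candidate with the best combined acoustic and language-model log score."""
    def lm_log_prob(text: str) -> float:
        # Unknown words get a small floor probability instead of zero.
        return sum(math.log(unigram_probs.get(word, 1e-6)) for word in text.split())

    best = max(candidates, key=lambda c: c[1] + lm_weight * lm_log_prob(c[0]))
    return best[0]


# The second candidate scores slightly better acoustically but is unlikely as text.
candidates = [("recognize speech", -4.1), ("wreck a nice beach", -4.0)]
lm = {"recognize": 0.05, "speech": 0.05, "wreck": 0.001, "a": 0.1, "nice": 0.01, "beach": 0.005}
print(rescore(candidates, lm))  # recognize speech
```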

When a language model creation request occurs, the modeling unit 220 creates a language model from language data. Specifically, when a language model creation request including meta information of language data occurs, the modeling unit 220 extracts the language data mapped to that meta information from among a plurality of language data and creates a language model from the extracted language data.

For example, as shown in FIG. 7A, the user may access the model training page through the user terminal 300. When the user selects the add language model icon 710, the modeling unit 220 displays the add language model page on the user terminal 300 as shown in FIG. 7B. When the user selects the add dataset icon 810, the modeling unit 220 displays the plurality of language data stored in the database 240. When the user selects one of the language data and then selects the start training icon 820, a language model creation request is generated. The language model creation request includes the meta information of the language data selected by the user. In response to the request, the modeling unit 220 extracts the language data mapped to the meta information from among the plurality of language data and creates a language model using the extracted language data. The user may also enter language model input information, such as a language model name and a model description, in which case the input information is included in the language model creation request.

The modeling unit 220 stores the created language model in the database 240. The language model creation request may include language model input information, which may be information such as the name or intended use of the language model. In this embodiment, the modeling unit 220 may create, in the database 240, a storage space for the language model by encrypting it based on the input information, and may store the language model in that storage space.

In this way, language models are stored in the database 240, and the database 240 may hold a plurality of language models.
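The step from selected language data to a stored language model can be illustrated with a toy unigram model. This is a stand-in only: the description mentions KALDI for training, and a production language model would be far richer than word frequencies. The function names and the dictionaries standing in for the database are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_unigram_model(sentences: List[str]) -> Dict[str, float]:
    """Estimate word probabilities from whitespace-tokenized sentences."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def handle_lm_request(
    language_db: Dict[str, List[str]],
    model_db: Dict[str, Dict[str, float]],
    data_meta: str,
    model_name: str,
) -> Dict[str, float]:
    """Create a language model from the selected language data and store it by name."""
    model = build_unigram_model(language_db[data_meta])  # data mapped to the meta information
    model_db[model_name] = model                         # several models can accumulate here
    return model
```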

The modeling unit 220 may create an integrated language model by combining a plurality of language models. Specifically, when an integrated language model creation request including the meta information of n language models occurs, the modeling unit 220 may combine the language models mapped to that meta information, from among the plurality of language models, to create the integrated language model. Here, n is an integer of 2 or more.

For example, the user may access the model training page shown in FIG. 7A through the user terminal 300. When the user selects the add integrated language model icon 720 on the model training page, the modeling unit 220 displays the add integrated language model page shown in FIG. 7C on the user terminal 300. When the user selects the add model icon 910, the plurality of language models stored in the database 240 are displayed. When the user selects n of these language models and then selects the start training icon 920, an integrated language model creation request including the meta information of the n language models is generated. In response to the request, the modeling unit 220 combines the language models mapped to the meta information of the n language models, from among the plurality of language models, to create the integrated language model.

The modeling unit 220 stores the created integrated language model in the database 240.
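The patent does not say how the n language models are combined. One simple possibility, shown here purely as an assumption, is equal-weight linear interpolation of word probabilities; the function name `merge_language_models` and the unigram representation are hypothetical.

```python
from typing import Dict, List


def merge_language_models(models: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine n >= 2 unigram models by equal-weight linear interpolation."""
    if len(models) < 2:
        raise ValueError("an integrated language model needs n >= 2 source models")
    weight = 1.0 / len(models)
    vocabulary = set().union(*models)  # every word known to any source model
    # A word missing from one model contributes probability 0 from that model.
    return {word: sum(weight * m.get(word, 0.0) for m in models) for word in vocabulary}
```

If each source model sums to 1, the interpolated model also sums to 1, so it remains a valid probability distribution.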

The modeling unit 220 may train an acoustic model. Specifically, when an acoustic model training request including meta information of voice data and meta information of an acoustic model occurs, the modeling unit 220 extracts the voice data mapped to the voice data meta information from among the plurality of voice data stored in the database 240, and extracts the acoustic model mapped to the acoustic model meta information from among the plurality of acoustic models stored in the database 240. The modeling unit 220 then trains the extracted acoustic model with the extracted voice data.

For example, the user may access the model training page shown in FIG. 7A through the user terminal 300. When the user selects the acoustic model addition icon 730, the modeling unit 220 displays the acoustic model training page shown in FIG. 7D. By selecting the model addition icon 1010, the user can choose any one of the plurality of acoustic models stored in the database 240, and by selecting the dataset addition icon 1020, the user can choose any one of the plurality of voice data stored in the database 240. When the user has chosen an acoustic model and voice data and selects the training start icon 1030, an acoustic model training request is generated. The request contains the meta information of the voice data and of the acoustic model chosen by the user. In response, the modeling unit 220 extracts the voice data that maps to the voice data meta information from the plurality of voice data, and extracts the acoustic model that maps to the acoustic model meta information from the plurality of acoustic models. The modeling unit 220 trains the extracted acoustic model with the extracted voice data and, when training is complete, stores the acoustic model in the database 240.
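A minimal sketch of how such a training request might be resolved is given below. The request class, the dictionary stores, the key names, and the `toy_train` placeholder are hypothetical and stand in for the real database and training routine; writing the trained model back under the same key is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class AcousticModelTrainingRequest:
    voice_data_meta: str       # meta information of the chosen voice data
    acoustic_model_meta: str   # meta information of the chosen acoustic model


def handle_training_request(request, voice_data_store, acoustic_model_store, train):
    """Resolve both meta-information keys, train, and store the result."""
    voice_data = voice_data_store[request.voice_data_meta]
    acoustic_model = acoustic_model_store[request.acoustic_model_meta]
    trained = train(acoustic_model, voice_data)
    # Assumption: the trained model replaces the stored one under the same key.
    acoustic_model_store[request.acoustic_model_meta] = trained
    return trained


def toy_train(model, utterances):
    # Placeholder for real training: it only counts the utterances seen.
    return {**model, "utterances_seen": model["utterances_seen"] + len(utterances)}


voice_data_store = {"vd-001": ["utt-a.wav", "utt-b.wav"], "vd-002": ["utt-c.wav"]}
acoustic_model_store = {"am-base": {"utterances_seen": 0}}
request = AcousticModelTrainingRequest("vd-001", "am-base")
trained = handle_training_request(request, voice_data_store, acoustic_model_store, toy_train)
```

The point of the sketch is that the request carries only meta information; the data and the model themselves are looked up on the server side.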

In one embodiment, the modeling unit 220 may train the acoustic model on the voice data by transfer learning.

The modeling unit 220 generates a speech recognition model from the integrated language model and the generated acoustic model.

A plurality of such speech recognition models may be stored in the database 240.

The modeling unit 220 may update a speech recognition model. Specifically, when a speech recognition model update request containing the meta information of a speech recognition model and the meta information of an integrated language model is generated, the modeling unit 220 extracts the speech recognition model that maps to the speech recognition model meta information from the plurality of speech recognition models, and extracts the integrated language model that maps to the integrated language model meta information from the plurality of integrated language models. The modeling unit 220 then replaces the language model of the extracted speech recognition model with the extracted integrated language model.

For example, the user may access the model training page shown in FIG. 7A through the user terminal 300. When the user selects the update icon 740, the modeling unit 220 displays the speech recognition model update page shown in FIG. 7E. By selecting the acoustic model addition icon 1110, the user can choose any one of the plurality of speech recognition models stored in the database 240, and by selecting the language model addition icon 1120, the user can choose any one of the plurality of integrated language models stored in the database 240. Although the icon 1110 is labeled as an acoustic model addition icon, it is the icon for adding a speech recognition model.

When the user has chosen a speech recognition model and an integrated language model and selects the training start icon 1130, a speech recognition model update request is generated. The request contains the meta information of the speech recognition model and of the integrated language model chosen by the user. In response, the modeling unit 220 extracts the speech recognition model that maps to the speech recognition model meta information from the plurality of speech recognition models, and extracts the integrated language model that maps to the integrated language model meta information from the plurality of integrated language models. The modeling unit 220 replaces the language model of the extracted speech recognition model with the extracted integrated language model and, when the update is complete, stores the updated speech recognition model in the database 240.
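The update step amounts to swapping one component of a two-part model while leaving the other untouched. The sketch below illustrates this under stated assumptions: the `SpeechRecognitionModel` class, its string-valued fields, and the store layout are hypothetical simplifications of whatever the database actually holds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpeechRecognitionModel:
    meta: str             # hypothetical meta-information key
    acoustic_model: str   # placeholder for the acoustic model component
    language_model: str   # placeholder for the language model component


def update_language_model(asr_store, integrated_lm_store, asr_meta, lm_meta):
    """Replace only the language model of the stored speech recognition model."""
    current = asr_store[asr_meta]
    new_lm = integrated_lm_store[lm_meta]
    updated = replace(current, language_model=new_lm)  # acoustic model is kept
    asr_store[asr_meta] = updated                      # store the updated model
    return updated


asr_store = {"asr-v1": SpeechRecognitionModel("asr-v1", "am-base", "lm-old")}
integrated_lm_store = {"lm-integrated": "lm-integrated-payload"}
updated = update_language_model(asr_store, integrated_lm_store, "asr-v1", "lm-integrated")
```

Because only the language model is replaced, the acoustic model does not need to be retrained when the vocabulary or domain text changes.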

When a request to calculate optimal parameter values for a speech recognition model is generated, the modeling unit 220 optimizes the parameter values of that speech recognition model. The parameters may include LMWT (Language Model Weight for Lattice), WIP (word insertion penalty), iVector Silence Weight, iVector Remembered Frame, and the like.

Specifically, when an optimal parameter value calculation request containing the meta information of voice data and the meta information of a speech recognition model is generated, the modeling unit 220 extracts the speech recognition model that maps to the speech recognition model meta information from the plurality of speech recognition models, and extracts the voice data that maps to the voice data meta information from the plurality of voice data. The modeling unit 220 inputs the extracted voice data to the extracted speech recognition model while setting the parameter to each of m candidate values in turn. The voice data is therefore input to the speech recognition model m times, producing m results. From these m results, the modeling unit 220 selects the parameter value that gives the highest speech recognition rate as the optimal parameter value.

For example, the user may access the model training page shown in FIG. 7A through the user terminal 300. When the user selects the optimal parameter search icon 750, the modeling unit 220 displays the optimal parameter search page shown in FIG. 7F. By selecting the acoustic model addition icon 1210, the user can choose any one of the plurality of speech recognition models stored in the database 240, and by selecting the dataset addition icon 1220, the user can choose any one of the plurality of voice data stored in the database 240. Although the icon 1210 is labeled as an acoustic model addition icon, it is the icon for adding a speech recognition model.

When the user has chosen a speech recognition model and voice data and selects the training start icon 1230, an optimal parameter value calculation request is generated. The request contains the meta information of the speech recognition model and of the voice data chosen by the user. In response, the modeling unit 220 extracts the speech recognition model that maps to the speech recognition model meta information from the plurality of speech recognition models, and extracts the voice data that maps to the voice data meta information from the plurality of voice data. The modeling unit 220 sets the parameter to one of a plurality of candidate values, inputs the voice data to the extracted speech recognition model, and calculates the speech recognition rate. It repeats this while changing the parameter among the candidate values, and selects the candidate value with the highest speech recognition rate as the optimal parameter value. Once the optimal parameter value is calculated, the modeling unit 220 stores the speech recognition model with that value applied in the database 240.
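The search described above is an exhaustive sweep over m candidate values. A minimal sketch follows; the `recognize` callable, the toy utterance tuples, and the use of sentence-level accuracy as the speech recognition rate are assumptions made for illustration, since the description does not define how the rate is measured.

```python
def find_optimal_parameter(recognize, utterances, candidates):
    """Decode the same voice data once per candidate value and keep the best.

    `utterances` is a list of (audio, reference_text) pairs. The recognition
    rate used here is the fraction of exactly matching transcripts.
    """
    best_value, best_rate = None, -1.0
    for value in candidates:   # m candidate values -> m decoding passes
        correct = sum(recognize(audio, value) == ref for audio, ref in utterances)
        rate = correct / len(utterances)
        if rate > best_rate:   # ties keep the earliest candidate
            best_value, best_rate = value, rate
    return best_value, best_rate


def toy_recognize(audio, lm_weight):
    # Toy stand-in for a decoder: the language model's hypothesis wins
    # once the language model weight reaches a per-utterance threshold.
    acoustic_guess, lm_guess, threshold = audio
    return lm_guess if lm_weight >= threshold else acoustic_guess


utterances = [
    (("recognise speech", "recognize speech", 9), "recognize speech"),
    (("wreck a nice beach", "recognize speech", 11), "recognize speech"),
    (("coil", "call", 13), "coil"),
]
best_value, best_rate = find_optimal_parameter(toy_recognize, utterances, [7, 9, 11, 13, 15])
```

With this toy data a weight that is too low leaves acoustic errors uncorrected and a weight that is too high overrides a correct acoustic result, so the sweep settles on the middle candidate.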

In response to a model management request, the modeling unit 220 allows the language models, integrated language models, acoustic models, and speech recognition models stored in the database 240 to be managed through the model management pages shown in FIGS. 8A to 8C. In response to a query request, the modeling unit 220 extracts the models stored in the database 240 and displays them on the user terminal 300. In response to a deletion request, the modeling unit 220 may delete the targeted model from the database 240.

Referring again to FIG. 2, the model distribution unit 230 distributes a speech recognition model in response to a speech recognition model distribution request. Specifically, in response to a distribution request containing the meta information of a speech recognition model, the model distribution unit 230 may distribute the speech recognition model that maps to that meta information, from among the plurality of speech recognition models stored in the database 240, to a speech recognition server. A speech recognition server is a server that provides a speech recognition service. For example, the model may be provided to a server that provides a speech recognition service for a mobile memo application, a mobile data collection application, a voice collection program for meetings, or the like.

For example, the user may access the speech recognition model management page shown in FIG. 8C. When the user selects the deployment icon 1310, the user can choose any one of the plurality of speech recognition models. The user can also choose any one of a plurality of speech recognition service servers. When the user has chosen a speech recognition model, a speech recognition model distribution request is generated. The request contains the meta information of the chosen speech recognition model and of the chosen speech recognition service server. In response, the model distribution unit 230 distributes the speech recognition model that maps to the meta information to the chosen speech recognition service server.
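A distribution request of this kind can be sketched as a lookup of two meta-information keys followed by a transfer. The `SpeechRecognitionServer` class, the dictionary-shaped request, and the byte-string payloads below are hypothetical; a real deployment would push model files to a remote host rather than to an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechRecognitionServer:
    meta: str                                      # hypothetical server identifier
    deployed: dict = field(default_factory=dict)   # model meta -> model payload

    def receive(self, model_meta, payload):
        self.deployed[model_meta] = payload


def distribute(request, model_store, servers):
    """Send the model named in the request to the server named in the request."""
    payload = model_store[request["model_meta"]]
    server = servers[request["server_meta"]]
    server.receive(request["model_meta"], payload)
    return server


model_store = {"asr-v1": b"model-bytes-v1", "asr-v2": b"model-bytes-v2"}
servers = {name: SpeechRecognitionServer(name) for name in ("memo-app", "meeting-app")}
target = distribute({"model_meta": "asr-v2", "server_meta": "meeting-app"}, model_store, servers)
```

Only the server named in the request receives the model; the other server is left unchanged.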

The database 240 stores voice data, text data of the voice data, language data, language models, integrated language models, acoustic models, and speech recognition models. Although a single database 240 is described here, this is only an example; the database may instead be implemented as separate databases for voice data, language data, language models, integrated language models, acoustic models, and speech recognition models. This is merely a difference in implementation, and the invention is not limited to either form.

In the database 240, a storage space may be created for each item of data and each model, and the storage space may be created in encrypted form.

Meanwhile, the search unit 250 extracts the meta information of voice data or language data and stores it in the search database 260. Specifically, the search unit 250 extracts the meta information of voice data or language data as it is uploaded and stores it in the search database 260.

In one embodiment, the search unit 250 may generate indexing information based on the text data of voice data and index the voice data with the indexing information. The search unit 250 may map the indexing information to the meta information of the corresponding voice data and store it in the search database 260.

In this case, when the user searches for a word contained in the text data of voice data, the search unit 250 searches the plurality of indexing information stored in the search database 260 for indexing information containing that word, extracts the voice data from the database 240 according to the meta information mapped to the retrieved indexing information, and provides it to the user.
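The word-to-voice-data lookup described above behaves like an inverted index. The following in-memory sketch illustrates the idea only: the function names, the whitespace tokenization, and the dictionary stores are assumptions, and a production system would delegate indexing to a search engine.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each transcript word to the meta information of the voice data containing it."""
    index = defaultdict(set)
    for meta, text in transcripts.items():
        for word in text.lower().split():   # naive whitespace tokenization
            index[word].add(meta)
    return index


def search(index, voice_data_store, word):
    """Find matching meta information, then fetch the voice data it maps to."""
    metas = sorted(index.get(word.lower(), set()))
    return [voice_data_store[m] for m in metas]


transcripts = {
    "vd-001": "Slab temperature is high",
    "vd-002": "coil temperature is normal",
}
voice_data_store = {"vd-001": "slab.wav", "vd-002": "coil.wav"}
index = build_index(transcripts)
hits = search(index, voice_data_store, "Temperature")
```

The index holds only words and meta information; the voice data itself stays in the main store and is fetched by meta information after the match.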

In one embodiment, the search unit 250 may be implemented with Elasticsearch.

The search database 260 stores meta information. Specifically, the search database 260 stores the meta information of the data held in the management server 200. For example, it may store the meta information of voice data, of the text data of voice data, of language data, of language models, of integrated language models, of acoustic models, and of speech recognition models.

The search database 260 may also store the indexing information generated from the text data of voice data. This indexing information may be stored mapped to the meta information of the text data of the corresponding voice data.

Meanwhile, the management server 200 according to the present invention may further include a user registration unit 260. The user registration unit 260 registers users. In this case, only users registered by the user registration unit 260 can access the management server 200. The user registration unit 260 may grant different permissions to each user.

Meanwhile, the management server 200 according to the present invention may further include a service registration unit 270. The service registration unit 270 may add, delete, and modify the services to be managed by the management server 200. In this case, the model distribution unit 230 may distribute a speech recognition model generated for a given service to the speech recognition server that provides that service.

The user terminal 300 works with the management server 200 over a network to let the user upload the voice data and language data used to train a speech recognition model and to train the speech recognition model.

The user terminal 300 may include a personal computer (PC) or notebook with wired Internet access and browsing functions; a notebook or portable terminal with wireless LAN/mobile Internet access and browsing functions; or a Personal Communication System (PCS) terminal, Global System for Mobile (GSM) terminal, Personal Digital Assistant (PDA), or smartphone with mobile network access and browsing functions.

Those skilled in the art to which the present invention pertains will understand that the invention described above may be embodied in other specific forms without changing its technical spirit or essential features.

Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: speech recognition model management system
200: management server
210: data management unit
220: modeling unit
230: model distribution unit
240: database
250: search unit
260: search database
300: user terminal

Claims

A speech recognition model management system capable of training a speech recognition model, comprising:
a data management unit that, in response to an upload request for at least one of language data and voice data, collects the data and stores it in a database;
a modeling unit that, in response to a speech recognition model training request containing meta information of first data, which includes at least one of language data and voice data, and meta information of a first speech recognition model, extracts the first data that maps to the meta information of the first data from a plurality of data stored in the database, extracts the first speech recognition model that maps to the meta information of the first speech recognition model from a plurality of speech recognition models, and trains the first speech recognition model with the first data; and
a model distribution unit that, in response to a distribution request for the first speech recognition model, distributes the first speech recognition model to a speech recognition server providing a speech recognition service,
wherein the first speech recognition model comprises an acoustic model that converts speech into text and a language model that corrects the text output from the acoustic model.

The system of claim 1,
wherein the data management unit creates, in the database, a storage space in which the data to be uploaded will be stored, uploads the data to a temporary database, and then moves the data to the storage space.

The system of claim 1,
wherein the upload request includes input information of the language data or the voice data, and
the data management unit creates the storage space in which the data is to be stored in encrypted form based on the input information.

The system of claim 1,
wherein, when a text data extraction request containing meta information of first voice data is generated, the data management unit inputs the first voice data that maps to the meta information, from among a plurality of voice data stored in the database, to a pre-generated reference speech recognition model to extract first text data, maps the first text data to the first voice data, and stores them in the database.

The system of claim 1,
wherein, when a voice data modification request containing meta information of first voice data is generated, the data management unit loads the first voice data that maps to the meta information, from among a plurality of voice data stored in the database, together with first text data of the first voice data, and, when the modification is complete, stores the modified first voice data and the first text data of the first voice data in the database.

The system of claim 1,
wherein, when a language data modification request containing meta information of first language data is generated, the data management unit loads the first language data that maps to the meta information from among a plurality of language data stored in the database, and stores the modified first language data in the database.

The system of claim 1,
wherein, when an acoustic model training request containing meta information of first voice data and meta information of a first acoustic model is generated, the modeling unit extracts the first voice data that maps to the meta information from among a plurality of voice data stored in the database, extracts the first acoustic model that maps to the meta information from among a plurality of acoustic models, and trains the first acoustic model on the first voice data by transfer learning.

The system of claim 1,
wherein, when a language model generation request containing meta information of first language data is generated, the modeling unit extracts the first language data that maps to the meta information from among a plurality of language data stored in the database, and generates the language model from the first language data.

The system of claim 8,
wherein there are a plurality of the language models, and
when an integrated language model generation request containing meta information of n language models is generated, the modeling unit generates an integrated language model by integrating the n language models that map to the meta information from among the plurality of language models.

The system of claim 1,
wherein the speech recognition model comprises an acoustic model that converts speech into text and a language model that corrects the text, and
when a speech recognition model update request containing meta information of the first speech recognition model and meta information of a first integrated language model is generated, the modeling unit replaces the language model of the first speech recognition model, which maps to the meta information from among a plurality of speech recognition models, with the first integrated language model, which maps to the meta information from among a plurality of integrated language models.

The system of claim 1,
wherein, when an optimal parameter value calculation request containing meta information of first voice data and meta information of the first speech recognition model is generated, the modeling unit inputs the first voice data to the first speech recognition model m times while setting the parameter to each of m parameter values in turn, and takes the parameter value with the highest speech recognition rate among the m results as the optimal value.

The system of claim 1,
further comprising a search unit that extracts meta information from the data, stores it in a search database, and retrieves the data based on the meta information in response to a search request.

The system of claim 12,
wherein the search unit generates indexing information based on text data of the voice data and indexes the voice data with the indexing information.

The system of claim 12,
wherein the search unit maps the indexing information to the meta information and stores it in the search database.