KR20210020294A

KR20210020294A - Method And Apparatus for Providing Speech Recognition of Word Unit or Sentence Unit

Info

Publication number: KR20210020294A
Application number: KR1020190099358A
Authority: KR
Inventors: 임대환; 심혜영
Original assignee: 주식회사 코너스톤헬스케어랩; 임대환
Priority date: 2019-08-14
Filing date: 2019-08-14
Publication date: 2021-02-24

Abstract

Disclosed are a method and an apparatus for recognizing speech in units of words or sentences. The method according to an embodiment of the present invention checks for partial matches of a word corresponding to input speech, extracts candidate words by combining a plurality of preset algorithms, and generates final candidate words by filtering the candidate words. If there are a plurality of final candidate words, the method compares their approximation values, checks whether the word with the smallest value is a standard-language word, and outputs it in standard language.

Description

Method And Apparatus for Providing Speech Recognition of Word Unit or Sentence Unit

The present embodiment relates to a method and an apparatus for recognizing speech in units of words or sentences.

The content described below merely provides background information related to the present embodiment and does not constitute prior art.

In general, speech recognition is a series of processes that extracts phonological and linguistic information from the acoustic information contained in speech, models the vocabulary registered in a speech vocabulary dictionary through a speech recognizer, and then finds the most similar data and responds to it.

Therefore, throughout the speech recognition process, the size of the vocabulary to be recognized plays an important part in determining both the recognition rate and the recognition speed.

Current speech recognition technology has moved beyond isolated-word recognition over small vocabularies; research on continuous speech recognition over large vocabularies is active, and much effort is being put into processing the vocabulary to be recognized.

In this regard, as large speech vocabularies have become the recognition target, the recognition rate has emerged as a major challenge. This is because the recognition rate is relatively high when the number of words to be recognized is small, but drops markedly as that number grows.

At present, for vocabularies of more than 1,000 words, recognition rates of 90 to 95% are reported in controlled laboratory environments, but in real-life use the rate is below 80% because of ambient environmental noise and channel noise within the various communication networks and communication devices themselves. A successful commercial service is not possible at this level of recognition rate.

An object of the present embodiment is to provide a method and an apparatus for word-unit or sentence-unit speech recognition that check for partial matches of a word corresponding to input speech, extract candidate words by combining a plurality of preset algorithms, generate final candidate words by filtering the candidate words, and, when there are a plurality of final candidate words, compare the approximation values of the final candidate words, check whether the word with the smallest value is a standard-language word, and output it in standard language.

According to one aspect of the present embodiment, there is provided a word-unit speech recognition method comprising: a process of receiving a word-unit speech input and converting it into word text through STT (Speech-To-Text); a preprocessing process of classifying the word text into tokens, checking the tokens for other meanings, performing alphabet processing and number processing based on the result of exception handling, and then generating a final input word by combining the tokens; a candidate selection process of checking for partial matches of the final input word, extracting candidate words by combining a plurality of preset algorithms, and generating final candidate words by filtering the candidate words; and a final candidate filtering process of, when there are a plurality of final candidate words, comparing the approximation values of the final candidate words, checking whether the word with the smallest value is a standard-language word, and outputting it in standard language.

According to another aspect of the present embodiment, there is provided a sentence-unit speech recognition method comprising: a process of receiving a sentence-unit speech input and converting it into sentence text through STT (Speech-To-Text); a preprocessing process of classifying the sentence text into tokens, checking the tokens for other meanings, performing alphabet processing and number processing based on the result of exception handling, and then generating a final input word by combining the tokens; an input sentence classification and processing process of splitting the final input word into sentences on the basis of postpositional particles and then processing them; a food noun partial match extraction process of extracting food nouns that start or end with an input word from the final input word, in which, when the input word is a single character, a food noun is extracted only on an exact match; a food noun partial match verification process of indicating, through a true or false value, whether a partially matching food noun exists; a combined word extraction process of, for an input sentence that was not split by a conjunctive particle in the food noun partial match verification result, combining words in order and checking whether each combined word is a valid word; a group extraction process of generating initial candidate words based on the final input word, partially matching words, and fully matching words, selecting the final candidate words from among the initial candidate words when the initial candidate words exist in an internal database, and selecting the final candidate words using the final input word when they do not; a word search process of checking for partial matches of the final input word and selecting final candidate words by means of approximation values obtained with a plurality of preset algorithms and a consonant comparison; and a dialect-to-standard-language conversion process of converting dialect words stored in the final result and the other-meaning result among the final candidate words into the appropriate standard-language words and removing duplicates within the result for each input value.

As described above, according to the present embodiment, partial matches of a word corresponding to input speech are checked, candidate words are extracted by combining a plurality of preset algorithms, final candidate words are generated by filtering the candidate words, and, when there are a plurality of final candidate words, their approximation values are compared and the word with the smallest value is checked against the standard language and output in standard language.

FIG. 1 is a diagram showing the preprocessing algorithm for word-unit speech recognition according to the present embodiment.
FIG. 2 is a diagram showing the other-meaning check algorithm for word-unit speech recognition according to the present embodiment.
FIG. 3 is a diagram showing the exception handling algorithm for word-unit speech recognition according to the present embodiment.
FIG. 4 is a diagram showing the alphabet notation conversion algorithm for word-unit speech recognition according to the present embodiment.
FIG. 5 is a diagram showing the number notation conversion algorithm for word-unit speech recognition according to the present embodiment.
FIG. 6 is a diagram showing the candidate selection algorithm for word-unit speech recognition according to the present embodiment.
FIG. 7 is a diagram showing the partial match algorithm for word-unit speech recognition according to the present embodiment.
FIG. 8 is a diagram showing the Levenshtein Distance Algorithm for word-unit speech recognition according to the present embodiment.
FIG. 9 is a diagram showing the weight formation table for word-unit speech recognition according to the present embodiment.
FIG. 10 is a diagram showing the Hashing Algorithm for word-unit speech recognition according to the present embodiment.
FIG. 11 is a diagram showing the combination of the two algorithms for word-unit speech recognition according to the present embodiment.
FIG. 12 is a diagram showing the approximation comparison algorithm for word-unit speech recognition according to the present embodiment.
FIG. 13 is a diagram showing the consonant comparison algorithm for word-unit speech recognition according to the present embodiment.
FIG. 14 is a diagram showing the standard-language check algorithm for word-unit speech recognition according to the present embodiment.
FIG. 15 is a diagram showing the mobile-based performance evaluation layout for word-unit speech recognition according to the present embodiment.
FIG. 16 is a diagram showing mismatched standard-language words in word-unit speech recognition according to the present embodiment.
FIG. 17 is a diagram showing mismatched dialect words in word-unit speech recognition according to the present embodiment.
FIG. 18 is a diagram showing the input data processing algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 19 is a diagram showing the preprocessing algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 20 is a diagram showing the exception handling algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 21 is a diagram showing the alphabet notation conversion algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 22 is a diagram showing the number notation conversion algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 23 is a diagram showing the input sentence classification and processing algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 24 is a diagram showing the word extraction algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 25 is a diagram showing the candidate selection algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 26 is a diagram showing the partial match algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 27 is a diagram showing the Levenshtein Distance Algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 28 is a diagram showing the weight formation table for sentence-unit speech recognition according to the present embodiment.
FIG. 29 is a diagram showing the Hashing Algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 30 is a diagram showing the combination of the two algorithms for sentence-unit speech recognition according to the present embodiment.
FIG. 31 is a diagram showing the approximation comparison algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 32 is a diagram showing the consonant comparison algorithm for sentence-unit speech recognition according to the present embodiment.
FIG. 33 is a diagram showing the mobile-based performance evaluation layout for sentence-unit speech recognition according to the present embodiment.

Hereinafter, the present embodiment will be described in detail with reference to the accompanying drawings.

Word-unit speech recognition

A terminal equipped with the speech recognition application receives and processes speech input in units of words, and processes a single food noun obtained as the STT (Speech-To-Text) result.

The speech recognition application is installed on a terminal. The terminal may be an electronic device such as a smartphone, a tablet, a laptop, a personal computer (PC), a portable multimedia player (PMP), or a wireless communication terminal.

The terminal is any of various devices that include (i) a communication device such as a communication modem for communicating with various devices or wired/wireless networks, (ii) a memory for storing various programs and data, and (iii) a microprocessor for executing programs to perform computation and control. According to at least one embodiment, the memory may be a computer-readable recording/storage medium such as random access memory (RAM), read-only memory (ROM), flash memory, an optical disk, a magnetic disk, or a solid state disk (SSD). According to at least one embodiment, the microprocessor may be programmed to selectively perform one or more of the operations and functions described in the specification. According to at least one embodiment, the microprocessor may be implemented, in whole or in part, as hardware such as an application-specific integrated circuit (ASIC) of a specific configuration.

Preprocessing in word-unit speech recognition

As a preprocessing step, as shown in FIG. 1, the terminal equipped with the speech recognition application checks whether the input value produced by STT contains spaces; if it does, the terminal splits the input on the spaces and defines each resulting word as a token.

If there are spaces, the terminal splits the input into words at the spaces and checks each word for other meanings and, at the same time, for exception words. If there are no spaces, it performs the same checks on the input value as is.

For the other-meaning check, if the input value is an alternative meaning of some word, the terminal stores the corresponding original word; otherwise it ignores it. The stored original word goes through the standard-language check in the final candidate filtering stage, after which the terminal asks the user whether that is the word being sought. The terminal also checks whether there is an exception word; if so, it performs exception handling, and if not, it performs alphabet processing.

If there are spaces, the terminal replaces the corresponding token in the input with the word changed by exception handling or alphabet processing and outputs the result; if there are no spaces, it outputs the result of exception handling or alphabet processing as is. Number processing then converts any digits present in the input value.

Finally, if there are spaces, the terminal combines the tokens of the input words produced by number processing, based on the character order and spacing of each word, to generate the final input words. If there are no spaces, the input words produced by number processing are used as the final input words without token classification.

Token classification in word-unit speech recognition

When the STT recognition result contains spaces, the terminal splits it into words at the spaces so that exception handling and alphabet processing can be applied to individual parts.
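
The token classification step can be sketched as follows. This is a minimal illustration; the function name `classify_tokens` and the sample inputs are assumptions, not part of the specification.

```python
def classify_tokens(stt_text):
    """Split an STT result on spaces; an input without spaces stays a single token."""
    return stt_text.split()


# "김치 찌개" contains a space, so it yields two tokens;
# "김치찌개" has no space and is kept whole.
spaced = classify_tokens("김치 찌개")
unspaced = classify_tokens("김치찌개")
```

Each token can then be passed through exception handling and alphabet processing separately.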

Other-meaning check in word-unit speech recognition

As shown in FIG. 2, the terminal checks whether the input value contains a word that matches the other-meaning column of the other-meaning (standard language, dialect) tables. If there is a match, the terminal stores the corresponding original-meaning word, converts it to standard language in the standard-language check step of final candidate filtering, and outputs it together with the final result word.

Exception handling in word-unit speech recognition

Exception handling is an algorithm for words that do not exist in the STT's DB: the terminal compares the input with the words in the exception (standard language, dialect) tables and changes it to the original word. As shown in FIG. 3, when the input matches an entry in the exception-result column of the exception (standard language, dialect) tables, the terminal converts it to the corresponding original word and outputs it.
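
Both the other-meaning check and exception handling amount to table lookups, which can be sketched as below. The table contents and the names `OTHER_MEANING_TABLE`, `EXCEPTION_TABLE`, `check_other_meaning`, and `handle_exception` are hypothetical; the actual tables of the embodiment are not reproduced in the text.

```python
# Hypothetical tables: key = value seen in the input, value = original word.
OTHER_MEANING_TABLE = {"정구지": "부추"}
EXCEPTION_TABLE = {"떡복이": "떡볶이"}


def check_other_meaning(tokens, table=OTHER_MEANING_TABLE):
    """Collect the original-meaning words for tokens found in the other-meaning column.

    The input itself is left unchanged; the collected words are kept for the
    later standard-language check.
    """
    return [table[token] for token in tokens if token in table]


def handle_exception(token, table=EXCEPTION_TABLE):
    """Replace a token that matches the exception-result column with its original word."""
    return table.get(token, token)
```

The difference between the two steps is that the other-meaning check only stores a word for later, whereas exception handling rewrites the token immediately.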

Alphabet processing in word-unit speech recognition

Alphabet processing is an algorithm for converting Latin letters to Hangul when the input value contains a word with alphabet letters. As shown in FIG. 4, the terminal checks whether the input value contains alphabet letters; if it does, the terminal looks up the Hangul notation of each letter in the alphabet table, converts it, and outputs the result.

Using the alphabet processing algorithm, the terminal compares each letter with the letters in the alphabet table and replaces it with its basic Hangul pronunciation, repeating the conversion until no alphabet letters remain in the input value.
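
A minimal sketch of the letter-by-letter conversion follows. The contents of `ALPHABET_TABLE` are an assumption based on the common Korean names of the Latin letters; the alphabet table of the embodiment may differ. A single pass over the characters is used here, which has the same effect as repeating the conversion until no letters remain.

```python
# Assumed alphabet table: Latin letter -> common Hangul pronunciation.
ALPHABET_TABLE = {
    "A": "에이", "B": "비", "C": "씨", "D": "디", "E": "이", "F": "에프",
    "G": "지", "H": "에이치", "I": "아이", "J": "제이", "K": "케이",
    "L": "엘", "M": "엠", "N": "엔", "O": "오", "P": "피", "Q": "큐",
    "R": "알", "S": "에스", "T": "티", "U": "유", "V": "브이",
    "W": "더블유", "X": "엑스", "Y": "와이", "Z": "제트",
}


def convert_alphabet(text):
    """Replace every Latin letter with its Hangul reading; other characters pass through."""
    return "".join(ALPHABET_TABLE.get(ch.upper(), ch) for ch in text)


converted = convert_alphabet("BLT샌드위치")
```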

Number processing in word-unit speech recognition

Number processing is an algorithm for converting digits to Hangul when the input value contains a word with digits. It takes into account the units of the number and the Hangul notation and, because some food names contain digits, also keeps the result in which the digits are not converted, so that up to five converted results are output.

As shown in FIG. 5, when an input arrives, the terminal checks whether it contains digits. If not, the terminal outputs the input as is without further processing; if so, it determines whether there are consecutive digits.

If there are consecutive digits, the terminal outputs four results that take both units and number notation into account; otherwise it outputs two results that take only number notation into account. In both cases the terminal converts using the per-notation words and per-unit words in the number table, and also outputs the result in which the digits are left unchanged.
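
The branching above can be illustrated with the sketch below. It rests on several assumptions that the text does not confirm: "number notation" is taken to mean the Sino-Korean versus native Korean readings, "units" is taken to mean positional readings such as 십, only the first run of digits is converted, and unit readings are produced only for two-digit runs. The tables and the name `number_variants` are hypothetical.

```python
import re

# Assumed number table: digit readings in two notations.
SINO = ["영", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE = ["영", "하나", "둘", "셋", "넷", "다섯", "여섯", "일곱", "여덟", "아홉"]
NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]


def number_variants(text):
    """Return the converted variants of text plus the unchanged text (at most five)."""
    match = re.search(r"\d+", text)
    if match is None:
        return [text]  # no digits: output the input as is
    run = match.group()
    # Digit-by-digit readings in both notations (always produced).
    readings = [
        "".join(SINO[int(d)] for d in run),
        "".join(NATIVE[int(d)] for d in run),
    ]
    # Consecutive digits: also produce readings that use units (two-digit runs only here).
    if len(run) == 2 and run[0] != "0":
        tens, ones = int(run[0]), int(run[1])
        sino = ("" if tens == 1 else SINO[tens]) + "십" + (SINO[ones] if ones else "")
        native = NATIVE_TENS[tens] + (NATIVE[ones] if ones else "")
        readings = [sino, native] + readings
    variants = [text[:match.start()] + r + text[match.end():] for r in readings]
    variants.append(text)  # keep the result with digits unchanged
    return list(dict.fromkeys(variants))  # drop duplicates, keep order
```

Under these assumptions a two-digit input such as "25" yields four converted readings plus the unchanged form, and a single digit yields two readings plus the unchanged form.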

Token combination in word-unit speech recognition

When the STT recognition result contains spaces, the terminal tokenizes it by splitting on spaces as in 1.1, and when the tokenized words have been changed through 1.2, 1.3, and 1.4, it combines the changed words based on the order of each changed input result as in [Table 1], and removes spaces and duplicate words to produce the final input result.

At this point, the terminal generates an initial approximation value according to the character length, using the method of [Equation 1]; this value is called SCORE.

The terminal stores the initial approximation score together with the corresponding final input word. If there are no spaces and the token classification process is therefore skipped, the terminal sets the initial approximation value to 0.

Candidate selection in word-unit speech recognition

Before candidate selection, there may be one or more preprocessed words. As shown in FIG. 6, candidate selection is applied to each final input word: through the partial match check, the terminal finds words that partially or fully match the final input word and generates the initial candidate words.

If initial candidate words exist, the terminal selects the final candidate words from among them; if there are none, it selects the final candidate words using the final input word.

To select the final candidate words, when the initial approximation value created during preprocessing is 0, the terminal first matches the words produced by the Hashing Algorithm against those produced by the Levenshtein Distance Algorithm and, for any word that appears in both, adds -1 to that word's approximation value. When the initial approximation value is not 0, the Hashing Algorithm is not used and the approximation value is stored as is.

The terminal then reduces the number of candidate words through the approximation comparison step and, if two or more candidate words remain, reduces them once more through the consonant comparison step, thereby selecting the final candidate words.
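
The two narrowing steps can be sketched as follows. The approximation comparison is implemented as keeping the candidates with the smallest approximation value. The details of the consonant comparison (FIG. 13) are not given in this section, so the version here, which compares the sequence of initial consonants of each Hangul syllable, is an assumption, as are the names `initial_consonants` and `select_final_candidates`.

```python
# The 19 initial consonants in Unicode Hangul syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def initial_consonants(word):
    """Return the initial consonant of every precomposed Hangul syllable in word."""
    result = []
    for ch in word:
        index = ord(ch) - 0xAC00
        if 0 <= index < 11172:  # precomposed syllable range U+AC00..U+D7A3
            result.append(CHOSEONG[index // 588])
    return "".join(result)


def select_final_candidates(input_word, scored):
    """scored maps candidate word -> approximation value (smaller is closer)."""
    best = min(scored.values())
    survivors = [word for word, score in scored.items() if score == best]
    if len(survivors) >= 2:
        # Assumed consonant comparison: prefer candidates whose initial
        # consonants match those of the input word.
        target = initial_consonants(input_word)
        same = [word for word in survivors if initial_consonants(word) == target]
        if same:
            survivors = same
    return survivors


final = select_final_candidates("깁밥", {"김밥": 1, "김치": 1, "부추": 3})
```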

Partial match check in word-unit speech recognition

For the partial match check, as shown in FIG. 7, the terminal searches the words of the standard-language table and the dialect table for words that partially or fully match the final input word. If a partially matching word exists, the terminal returns true; otherwise it returns false.
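
A minimal sketch of the partial match check, treating a partial match as a substring match. The table contents and the names `STANDARD_TABLE`, `DIALECT_TABLE`, `partial_matches`, and `has_partial_match` are hypothetical.

```python
# Hypothetical vocabulary tables.
STANDARD_TABLE = ["김치찌개", "된장찌개", "부추전"]
DIALECT_TABLE = ["정구지찌짐"]


def partial_matches(final_input, tables=(STANDARD_TABLE, DIALECT_TABLE)):
    """Return every table word that contains the final input word (or equals it)."""
    return [word for table in tables for word in table if final_input in word]


def has_partial_match(final_input, tables=(STANDARD_TABLE, DIALECT_TABLE)):
    """True if at least one partially or fully matching word exists, else False."""
    return bool(partial_matches(final_input, tables))
```

The list returned by `partial_matches` corresponds to the initial candidate words, and `has_partial_match` to the true or false value passed on to the next step.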

Levenshtein Distance Algorithm in word-unit speech recognition

The Levenshtein Distance Algorithm checks how similar each word in the standard-language table and the dialect table is to the preprocessed input word; the terminal uses it to find candidate words, as shown in FIG. 8. The number of character changes needed for a word to become identical to the input word is recorded and defined as the approximation value. The smaller the approximation value, the closer the match.

The approximation value of a candidate word is the value obtained from the Levenshtein Distance Algorithm plus the initial approximation value of the corresponding final input word.

In the current candidate selection, the terminal admits a word as a candidate only when its approximation value is 5 or less, and adds to each candidate word's approximation value the approximation value carried by the corresponding preprocessed input word.
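
The scoring described above can be sketched with the standard dynamic-programming form of the Levenshtein distance. Two points are assumptions: characters are compared as whole Hangul syllables, and the threshold of 5 is applied to the raw distance before the initial approximation value is added. The names `levenshtein` and `levenshtein_candidates` are illustrative.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_candidates(input_word, initial_score, vocabulary, limit=5):
    """Map each word within the distance limit to distance + initial approximation value."""
    scored = {}
    for word in vocabulary:
        distance = levenshtein(input_word, word)
        if distance <= limit:
            scored[word] = distance + initial_score
    return scored


scores = levenshtein_candidates("김치찌게", 0, ["김치찌개", "된장찌개"])
```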

Hashing Algorithm in word-unit speech recognition

The terminal equipped with the speech recognition application executes the Hashing Algorithm as shown in FIG. 10. First, based on the weight formation table of FIG. 9, which is built according to pronunciation similarity, the terminal computes the hash code of the input word, compares it with the hash codes of the words in the standard word table and the dialect table, and retrieves the words having the same value. However, the terminal runs this algorithm only when the input word has the same number of characters as the original input word and contains no digits, that is, when the initial approximation value is 0 and no digit is present. The weight formation table shown in FIG. 9 uses separate symbols to denote the initial consonant (choseong), the medial vowel (jungseong), and the final consonant (jongseong).
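A pronunciation-based hash code of this kind can be sketched as follows. The weights of FIG. 9 are not reproduced in the text, so the grouping tables below are purely hypothetical placeholders; only the arithmetic for splitting a precomposed Hangul syllable into initial, medial, and final indices follows the Unicode layout. All function names are illustrative.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable, "가"
HANGUL_COUNT = 11172   # number of precomposed Hangul syllables

# Hypothetical similarity groups (NOT the values of FIG. 9):
# jamo indices that sound alike are mapped to one representative index.
INITIAL_GROUPS = {1: 0, 15: 0}  # ㄲ, ㅋ -> ㄱ
MEDIAL_GROUPS = {5: 1}          # ㅔ -> ㅐ


def decompose(syllable):
    """Return (initial, medial, final) indices, or None if not a syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < HANGUL_COUNT:
        return None
    return index // 588, (index % 588) // 28, index % 28


def hash_code(word):
    """Tuple of grouped jamo indices, one entry per character."""
    code = []
    for ch in word:
        parts = decompose(ch)
        if parts is None:
            code.append(("raw", ch))  # non-Hangul characters kept verbatim
            continue
        initial, medial, final = parts
        code.append((INITIAL_GROUPS.get(initial, initial),
                     MEDIAL_GROUPS.get(medial, medial),
                     final))
    return tuple(code)


def hash_candidates(input_word, initial_approximation, table_words):
    """Run only when the initial approximation is 0 and no digit is present."""
    if initial_approximation != 0 or any(ch.isdigit() for ch in input_word):
        return []
    target = hash_code(input_word)
    return [word for word in table_words if hash_code(word) == target]


print(hash_candidates("찌게", 0, ["찌개", "찌기"]))
```

With these placeholder groups, "찌게" and "찌개" receive the same code because ㅔ and ㅐ are treated as equivalent; an actual table would define many more groups.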

Combination of the two algorithms in word-unit speech recognition

As shown in FIG. 11, when a candidate word produced by the Hashing Algorithm also appears among the candidate words produced by the Levenshtein Distance Algorithm, the terminal equipped with the speech recognition application adds -1 to the approximation value of that word, but only if the word has the same number of characters as the preprocessed input word. This step is performed only when the input word has the same number of characters as the original input word, that is, when the initial approximation value is 0.
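The combination rule can be sketched as follows, assuming the Levenshtein candidates are held as a dictionary from word to approximation value and the hashing candidates as a list of words. The function name and data shapes are hypothetical.

```python
def combine_candidates(input_word, initial_approximation,
                       levenshtein_result, hashing_result):
    """Reward words found by both algorithms by adding -1 to their value."""
    combined = dict(levenshtein_result)
    if initial_approximation != 0:
        return combined  # the rule applies only when the initial value is 0
    hashed = set(hashing_result)
    for word in combined:
        if word in hashed and len(word) == len(input_word):
            combined[word] -= 1
    return combined


levenshtein_result = {"찌개": 1, "찌기": 1, "김치찌개": 3}
print(combine_candidates("찌게", 0, levenshtein_result, ["찌개"]))
```

The input dictionary is copied rather than modified, so the raw Levenshtein values stay available for inspection.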

Candidate word filtering in word-unit speech recognition

Approximation comparison in word-unit speech recognition

For the approximation comparison, as shown in FIG. 12, the terminal equipped with the speech recognition application compares the approximation values of the candidate words and stores the candidate word with the smallest approximation value as the final result. Words that share the same smallest approximation value are stored together.
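A minimal sketch of the approximation comparison, assuming the candidates are held as a dictionary from word to approximation value (the function name is hypothetical):

```python
def closest_candidates(candidates):
    """Keep every candidate that shares the smallest approximation value."""
    if not candidates:
        return []
    best = min(candidates.values())
    return [word for word, value in candidates.items() if value == best]


print(closest_candidates({"김밥": 1, "김발": 1, "김치": 2}))
```

Returning a list rather than a single word preserves ties, which the later consonant comparison is meant to resolve.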

Consonant comparison in word-unit speech recognition

If two or more words remain after the approximation comparison, the terminal equipped with the speech recognition application compares their consonant order with that of the final input word, as shown in FIG. 13, and stores the most similar word as the final result. However, for a candidate word derived from an input word whose number of characters differs from that of the original word, the candidate is stored in the final candidate result only if its consonant order matches entirely.
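One way to compare consonant order is sketched below. The similarity measure is an assumption: each syllable contributes its initial and final consonant index, and a candidate scores one point for every position where these agree with the input word. The stricter rule for words of differing length is not implemented here, and all names are hypothetical.

```python
def consonants(word):
    """Per Hangul syllable: (initial consonant index, final consonant index)."""
    result = []
    for ch in word:
        index = ord(ch) - 0xAC00
        if 0 <= index < 11172:  # precomposed Hangul syllables only
            result.append((index // 588, index % 28))
    return result


def consonant_score(input_word, candidate):
    """Count matching initial and final consonants, position by position."""
    score = 0
    pairs = zip(consonants(input_word), consonants(candidate))
    for (initial_a, final_a), (initial_b, final_b) in pairs:
        score += (initial_a == initial_b) + (final_a == final_b)
    return score


def most_similar_by_consonants(input_word, candidates):
    """Return the candidates with the highest consonant score."""
    if not candidates:
        return []
    scores = {word: consonant_score(input_word, word) for word in candidates}
    best = max(scores.values())
    return [word for word in candidates if scores[word] == best]


print(most_similar_by_consonants("김빱", ["김밥", "김발"]))
```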

Final candidate filtering in word-unit speech recognition

Approximation comparison

When there are two or more final candidate words, the terminal equipped with the speech recognition application uses the approximation comparison algorithm to compare the approximation values of the final candidate words and stores the word with the smallest value as the final result. Words that share the same approximation value are also stored.

Standard word check

As shown in FIG. 14, the terminal equipped with the speech recognition application checks whether the final result word, or the original word of any other-meaning word found in the preprocessing step, matches an entry in the dialect column of the SRD table. If a match exists, the corresponding standard word is output together with the final result word; otherwise, only the final result word is output.
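The standard word check amounts to a lookup in the dialect column. The sketch below assumes the SRD table is available as a dictionary from dialect word to standard word, and it covers only the final result word, not the other-meaning words. The function name and the sample entry are illustrative.

```python
def check_standard_word(final_word, srd_table):
    """Return the final word, followed by its standard form if it is dialect."""
    standard = srd_table.get(final_word)
    if standard is None:
        return [final_word]
    return [final_word, standard]


# Illustrative entry: dialect word -> standard word
srd_table = {"정구지": "부추"}
print(check_standard_word("정구지", srd_table))
print(check_standard_word("김치", srd_table))
```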

Word-unit recognition rate

Recognition rate test environment

To test the word-unit recognition rate, the terminal equipped with the speech recognition application grants three utterance attempts per test word, saves any recognition result that differs from the test word, and identifies the words that were not recognized within three attempts.

Word-unit speech recognition application UI

The original word area ① on the application execution screen shown in FIG. 5 displays, one at a time, the words in the standard word table and the dialect table.

The speech recognition result word area ② on the application execution screen displays the result of the user reading the original word aloud.

The processing result area ③ on the application execution screen shows whether the operation completed successfully after the Save or Cancel Save button is pressed.

The microphone button ④ on the application execution screen receives the user's voice input when pressed.

The Back button ⑤ on the application execution screen moves to the previous word.

The Forward button ⑥ on the application execution screen moves to the next word.

The Save button ⑦ on the application execution screen saves the result when the original word and the speech recognition result word differ.

The Cancel Save button ⑧ on the application execution screen cancels a save made by mistake.

Recognition rate results

For standard words, 4 of 1,578 words were mismatched, a hit rate of about 99.7%. The mismatched standard words are shown in FIG. 16.

For dialect words, 21 of 661 words were mismatched, a hit rate of about 96%. In 77 cases the standard word was the same, and in that case the hit rate is about 88%. The mismatched dialect words are shown in FIG. 17.

Sentence-unit speech recognition

Sentence-unit speech recognition receives voice input one sentence at a time, where the input describes the amount eaten in a single meal.

Input data processing in sentence-unit speech recognition

For sentence-unit speech recognition, the terminal equipped with the speech recognition application receives voice input one sentence at a time, describing the amount eaten in a single meal. The structure of the input data processing algorithm is shown in FIG. 18.

Preprocessing in sentence-unit speech recognition

For preprocessing, as shown in FIG. 19, the terminal equipped with the speech recognition application splits the input sentence produced by STT on spaces and checks for exception words. The preprocessing consists of token classification, exception handling, number processing, and alphabet processing.

First, in token classification, the terminal splits the input sentence on spaces and then, following the order of the words, checks whether each word partially matches an entry in the exception word list of the exception word table.

If a word partially matches an exception word, the terminal combines it with the following word and checks again whether the combination partially matches the exception word list. If the combination does not partially match, the preceding word that had partially matched is passed to exception handling, and the word that failed to match is checked against the exception word list anew. (The criterion for a partial match is: "Does an exception word exist that begins with this word?")

This iterative structure is used because, in most sentence-unit results, the words are separated by spaces, so an exception word can be found only by combining the words in the manner described above. When a combination matches an entry in the exception word list, the terminal converts it to the corresponding original word.
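The iterative combination described above can be sketched as follows. The sketch assumes the exception table is a dictionary from the exception (misrecognized) form, written without spaces, to the original word. When a combination stops matching and is not itself an exact table entry, the sketch falls back to emitting the first token unchanged; this fallback is an assumption, since the text does not specify that case. The names and the sample entry are hypothetical.

```python
def merge_exception_tokens(tokens, exception_table):
    """Combine adjacent tokens while they form the start of an exception word."""

    def is_prefix(text):
        # Partial-match criterion: does an exception word begin with this text?
        return any(key.startswith(text) for key in exception_table)

    output = []
    i = 0
    while i < len(tokens):
        combined = tokens[i]
        end = i + 1
        if is_prefix(combined):
            while end < len(tokens) and is_prefix(combined + tokens[end]):
                combined += tokens[end]
                end += 1
        if combined in exception_table:
            output.append(exception_table[combined])  # convert to original word
            i = end
        else:
            output.append(tokens[i])  # assumed fallback: keep the token as is
            i += 1
    return output


# Hypothetical exception entry: STT output form -> original word
table = {"된장찌게": "된장찌개"}
print(merge_exception_tokens("된장 찌게 먹었다".split(), table))
```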

Thereafter, through alphabet and number processing, the terminal equipped with the speech recognition application converts the letters and digits that were not converted by exception handling into Hangul notation, stores the converted result, and passes it on as the output value.

Token classification in sentence-unit speech recognition

When the STT recognition result contains spaces, the terminal equipped with the speech recognition application splits it into words on those spaces so that exception handling and alphabet processing can be applied to each part.

Exception handling in sentence-unit speech recognition

Exception handling is an algorithm for words that do not exist in the STT database: the terminal equipped with the speech recognition application compares the input with the words in the exception (standard word, dialect) tables and changes it to the original word. As shown in FIG. 20, if the input matches an entry in the exception result column of an exception (standard word, dialect) table, the terminal converts it to the corresponding original word and outputs it.

Alphabet processing in sentence-unit speech recognition

Alphabet processing is an algorithm that converts Latin letters to Hangul when the input contains a word with such letters. As shown in FIG. 21, the terminal equipped with the speech recognition application checks whether the input contains a letter and, if so, looks up the Hangul notation of that letter in the alphabet table, converts it, and outputs the result.

Using the alphabet notation conversion algorithm, the terminal compares each letter with the letters in the alphabet table and replaces it with its basic Hangul pronunciation, repeating the conversion until no letter remains in the input.
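The letter-by-letter conversion can be sketched as follows. The alphabet table of the disclosure is not reproduced in the text, so the table below, which uses common Korean names of the Latin letters, is an assumption, as is the function name.

```python
# Assumed alphabet table: Latin letter -> common Korean letter name
ALPHABET_TABLE = {
    "A": "에이", "B": "비", "C": "씨", "D": "디", "E": "이", "F": "에프",
    "G": "지", "H": "에이치", "I": "아이", "J": "제이", "K": "케이",
    "L": "엘", "M": "엠", "N": "엔", "O": "오", "P": "피", "Q": "큐",
    "R": "알", "S": "에스", "T": "티", "U": "유", "V": "브이",
    "W": "더블유", "X": "엑스", "Y": "와이", "Z": "제트",
}


def convert_alphabet(text):
    """Replace every Latin letter with its Hangul reading; keep the rest."""
    return "".join(ALPHABET_TABLE.get(ch.upper(), ch) for ch in text)


print(convert_alphabet("BLT샌드위치"))
```

Because every letter in the table is replaced in a single pass, the result contains no remaining letters from A to Z, which corresponds to repeating the conversion until none is left.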

Number processing in sentence-unit speech recognition

Number processing is an algorithm that converts digits to Hangul when the input contains a word with digits. It takes into account the numeric units and the Hangul notation of numbers and, because some food names contain digits, it also stores the result in which the digits are left unconverted, yielding up to five result values.

As shown in FIG. 22, when an input arrives, the terminal equipped with the speech recognition application checks whether it contains a digit. If not, the input is output as is without further processing; if so, the terminal determines whether there are consecutive digits.

If consecutive digits exist, the terminal outputs four results that take both the units and the number notation into account; otherwise, it outputs two results that take only the number notation into account. In both cases, the conversion uses the per-notation words and the per-unit words in the number table, and the result with the digits left unchanged is also included.
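One possible reading of the result counts is sketched below. It assumes that "number notation" refers to the Sino-Korean and native Korean readings, and that "units" refers to reading a multi-digit number with place values (for example 이십오) rather than digit by digit (이오). Under that assumption a two-digit number yields four converted forms plus the unchanged form, and a single digit yields two converted forms plus the unchanged form. The sketch handles only the first run of digits, applies unit readings only to two-digit numbers, and uses hypothetical names throughout.

```python
import re

SINO = ["영", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE = ["영", "하나", "둘", "셋", "넷", "다섯", "여섯", "일곱", "여덟", "아홉"]
NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]


def digit_by_digit(digits, table):
    """Read each digit separately, ignoring place value."""
    return "".join(table[int(d)] for d in digits)


def sino_with_units(n):
    """Sino-Korean reading with place value, for 0..99."""
    tens, ones = divmod(n, 10)
    text = ""
    if tens:
        text += ("" if tens == 1 else SINO[tens]) + "십"
    if ones:
        text += SINO[ones]
    return text or SINO[0]


def native_with_units(n):
    """Native Korean reading with place value, for 0..99."""
    tens, ones = divmod(n, 10)
    text = NATIVE_TENS[tens] + (NATIVE[ones] if ones else "")
    return text or NATIVE[0]


def number_variants(word):
    """Return the unchanged word followed by its converted forms."""
    match = re.search(r"[0-9]+", word)
    if match is None:
        return [word]
    digits = match.group()

    def substitute(reading):
        return word[:match.start()] + reading + word[match.end():]

    variants = [word,
                substitute(digit_by_digit(digits, SINO)),
                substitute(digit_by_digit(digits, NATIVE))]
    if len(digits) == 2:  # consecutive digits: add unit-aware readings
        number = int(digits)
        variants.append(substitute(sino_with_units(number)))
        variants.append(substitute(native_with_units(number)))
    return variants


print(number_variants("25개"))
```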

Classification and processing of the input sentence in sentence-unit speech recognition

The input sentence that has passed through preprocessing is split on spaces, and each word carries its own preprocessing result. The order is the same as in the input sentence.

Particle-based sentence classification is the process in which the terminal equipped with the speech recognition application splits the preprocessed input sentence on conjunctive particles.

For particle-based sentence classification, as shown in FIG. 23, the terminal divides the sentence at words carrying a conjunctive particle and, on that basis, processes and recombines the words.

Particle-based sentence classification in sentence-unit speech recognition

In particle-based sentence classification, the terminal equipped with the speech recognition application compares the preprocessed input sentence with the conjunctive particle list and, when a word ends with a conjunctive particle, splits the sentence at that word.
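Splitting at words that end with a conjunctive particle can be sketched as follows. The particle list is not given in the text, so the list below is a hypothetical example, and the function name is illustrative.

```python
# Hypothetical conjunctive particle list (the actual list is not given)
CONJUNCTIVE_PARTICLES = ("이랑", "하고", "랑", "와", "과")


def split_by_particle(tokens):
    """Close a group after each token that ends with a conjunctive particle."""
    groups, current = [], []
    for token in tokens:
        current.append(token)
        if token.endswith(CONJUNCTIVE_PARTICLES):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


print(split_by_particle("김치 찌개랑 밥 먹었다".split()))
```

A purely suffix-based test also fires on nouns that merely end in the same syllable, such as "사과" (apple), so such words need to be recognized as food nouns before any particle is stripped.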

Processing in sentence-unit speech recognition

The terminal equipped with the speech recognition application recombines the words separated by the particle criterion into a single word through a processing step, shown in FIG. 7. First, the terminal treats the words separated by the particle criterion as one word and performs food noun extraction, deletion of closing words such as "먹었다", "먹었어", and "먹음" (forms of "ate"), and removal of conjunctive particles. The terminal then produces a single word through word combination. If food noun extraction succeeds, the terminal moves on to the next word; if it fails, the terminal proceeds to closing word deletion. If deletion also fails, conjunctive particle removal is performed.

Food noun extraction

In food noun extraction, the terminal equipped with the speech recognition application checks whether the word exists in the food noun table. Specifically, it checks whether the food noun table contains a word that starts with or ends with the given word. If so, the word is used as is in word combination; if not, closing word deletion and conjunctive particle removal are performed.

Closing word deletion

In closing word deletion, the terminal equipped with the speech recognition application checks whether the word is a closing word such as "먹었다", "먹었어", or "먹음".

This step removes the verb "먹었다" ("ate") that appears in the guide. If a closing word is present, it is deleted and excluded from word combination.

Conjunctive particle removal

In conjunctive particle removal, the terminal equipped with the speech recognition application checks whether a particle listed in the conjunctive particle table appears at the end of the word and removes it. This is the last step, which handles the words not filtered out by food noun extraction and closing word deletion.

Word combination

Word combination applies only to words carrying a conjunctive particle: the terminal equipped with the speech recognition application combines the words obtained after food noun extraction, closing word deletion, and conjunctive particle removal, in the order of the input sentence.
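The four steps (food noun extraction, closing word deletion, conjunctive particle removal, word combination) can be sketched together as follows. The sketch assumes each token is handled in turn and falls through the steps in the order given; the particle list and the food noun table are hypothetical, as are the function names.

```python
CLOSING_WORDS = ("먹었다", "먹었어", "먹음")
CONJUNCTIVE_PARTICLES = ("이랑", "하고", "랑", "와", "과")  # hypothetical list


def is_food_noun(word, food_nouns):
    """True if some food noun starts with or ends with the word."""
    return any(n.startswith(word) or n.endswith(word) for n in food_nouns)


def strip_particle(word):
    """Remove one trailing conjunctive particle, if present."""
    for particle in CONJUNCTIVE_PARTICLES:
        if word.endswith(particle) and len(word) > len(particle):
            return word[:-len(particle)]
    return word


def process_group(tokens, food_nouns):
    """Apply the three filters per token, then combine in input order."""
    kept = []
    for token in tokens:
        if is_food_noun(token, food_nouns):
            kept.append(token)                  # food noun: use as is
        elif token in CLOSING_WORDS:
            continue                            # closing word: drop it
        else:
            kept.append(strip_particle(token))  # last resort: strip particle
    return "".join(kept)


food_nouns = ["김치찌개", "사과"]  # hypothetical food noun table
print(process_group(["김치", "찌개랑"], food_nouns))
print(process_group(["사과", "먹었다"], food_nouns))
```

Checking the food noun table first is what keeps "사과" intact even though it ends in a syllable that also serves as a particle.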

Word extraction in sentence-unit speech recognition

For word extraction, as shown in FIG. 24, the terminal equipped with the speech recognition application matches the words produced by the input data extraction process against the words in the food noun table and extracts them.

First, the terminal determines whether the word was split off by a conjunctive particle. If so, it checks whether there is an exception word that could not be converted during input data extraction because of the conjunctive particle and, if there is, runs exception handling once more; if there is none, it extracts the result word using food noun partial-match extraction and word search.

Food noun partial-match extraction is provided for cases in which the speaker says an inexact word; its criterion is "a word that starts with the given word or a word that ends with the given word."

If no word is extracted in the food noun matching step, the terminal uses word search.

Word search finds the word most similar to the input word, which lets the terminal cope more flexibly with poor STT results or poor pronunciation by the speaker.

For an input sentence split by conjunctive particles, the terminal outputs the result of the three steps above as the final result.

In contrast, when the input sentence is not split by conjunctive particles, the exception word check has already been completed during input data extraction, so the terminal equipped with the speech recognition application skips exception handling and, depending on the number of words, performs either word search or combined-word extraction followed by food noun partial-match extraction. The reason is that such a sentence offers no criterion for dividing it, so applying word search without first classifying by the number of words was judged highly likely to produce incorrect search results.

If there is more than one word, the terminal runs combined-word extraction, which looks for words that can be combined based on their input order, and then extracts the result words using food noun partial-match extraction. If a combined word exists, the terminal runs group extraction to organize the relationship between the combined word and the list of parent words that make up part of it. The terminal then outputs the result.

If there is exactly one word, the terminal uses food noun partial-match extraction and word search, because with a single word there is little risk that word search returns an incorrect result. First, food noun partial-match extraction checks whether a partially matching food noun exists and extracts the result word; if no such food noun exists, word search is run. The terminal then outputs the result.

The terminal converts any dialect words in the result of word extraction and in the other-meaning result to standard words, and outputs them as the final result and the final other-meaning result.

Food noun partial-match extraction in sentence-unit speech recognition

In food noun partial-match extraction, the terminal equipped with the speech recognition application extracts the food nouns that start or end with the input word. If the input word is a single character, a food noun is extracted only when it matches exactly.

Food noun partial-match verification in sentence-unit speech recognition

Food noun partial-match verification does not output the partially matched words; instead, it reports through a true or false value whether a partially matching food noun exists. The criterion is the same as in food noun partial-match extraction, "Does a food noun exist that starts or ends with the input word?", but because this step only verifies a partial match, partial matches are allowed even for a single-character input.
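The difference between extraction and verification can be sketched as follows, assuming the food noun table is a list of strings. The function names and sample nouns are hypothetical.

```python
def extract_partial_matches(word, food_nouns):
    """Return food nouns that start or end with the word.

    A single-character word is extracted only on an exact match.
    """
    if not word:
        return []
    if len(word) == 1:
        return [noun for noun in food_nouns if noun == word]
    return [noun for noun in food_nouns
            if noun.startswith(word) or noun.endswith(word)]


def verify_partial_match(word, food_nouns):
    """Return True if any food noun starts or ends with the word.

    Unlike extraction, a single-character word may match partially.
    """
    if not word:
        return False
    return any(noun.startswith(word) or noun.endswith(word)
               for noun in food_nouns)


food_nouns = ["김치찌개", "배추김치", "김밥", "밥"]  # hypothetical table
print(extract_partial_matches("김치", food_nouns))
print(extract_partial_matches("밥", food_nouns))
print(verify_partial_match("밥", ["김밥"]))
```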

Combined-word extraction in sentence-unit speech recognition

When the input sentence is not split by conjunctive particles, there is no criterion for dividing it, so the terminal equipped with the speech recognition application finds suitable food nouns by combining the words in order. The terminal uses food noun partial-match extraction to check whether each combined word is valid.
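Combining words in input order can be sketched as a greedy scan: a combination keeps growing while it still partially matches a food noun. The greedy strategy, the function names, and the sample table are assumptions made for illustration.

```python
def partially_matches(word, food_nouns):
    """True if some food noun starts with or ends with the word."""
    return any(n.startswith(word) or n.endswith(word) for n in food_nouns)


def extract_combined_words(tokens, food_nouns):
    """Greedily join adjacent tokens while the result is still valid."""
    results = []
    i = 0
    while i < len(tokens):
        combined = tokens[i]
        j = i + 1
        while j < len(tokens) and partially_matches(combined + tokens[j],
                                                    food_nouns):
            combined += tokens[j]
            j += 1
        results.append(combined)
        i = j
    return results


food_nouns = ["김치찌개", "계란말이"]  # hypothetical food noun table
print(extract_combined_words(["김치", "찌개", "계란", "말이"], food_nouns))
```

A token that cannot be combined with its neighbor is passed through unchanged, so later steps such as word search can still handle it.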

Group extraction in sentence-unit speech recognition

Group extraction applies when the sentence is not split by conjunctive particles and contains more than one word: the terminal equipped with the speech recognition application organizes the relationship between each combined word produced by combined-word extraction and its parent words.

When generating the partial-match results of a parent word, the terminal excludes words that partially match the combined word; when generating the partial-match results of the combined word, no such restriction applies. Food noun partial-match extraction is used for this.

Word search in sentence-unit speech recognition

Word search finds the word most similar to the input word. As shown in FIG. 25, the terminal equipped with the speech recognition application first uses the partial match check to find words that partially or fully match the final input word and generates the initial candidate words. If initial candidate words exist, the terminal selects the final candidate words from among them; if none exist, it selects the final candidate words using the final input word.

To select the final candidate words, when the initial approximation value produced during preprocessing is 0, the terminal matches the words returned by the Hashing Algorithm against those returned by the Levenshtein Distance Algorithm and, for any word found in both, adds -1 to its approximation value. When the initial approximation value is not 0, the Hashing Algorithm is not used and the approximation value is stored as is.

The terminal then reduces the number of candidate words through approximation comparison and, if two or more candidate words remain, selects the final candidate word through consonant comparison.

Partial match check in sentence-unit speech recognition

For the partial match check, as shown in FIG. 26, the terminal equipped with the speech recognition application searches the standard word table and the dialect table for words that partially or fully match the final input word. If a partially matching word exists, the terminal returns a true value; otherwise, it returns a false value.

Levenshtein Distance Algorithm in sentence-unit speech recognition

The Levenshtein Distance Algorithm checks how similar each word in the standard-language table and the dialect table is to the preprocessed input word; the terminal uses it to find candidate words as shown in FIG. 27. The terminal records how many character changes are needed for a word to become the same as the input word, and this count is called the approximation value. The smaller the approximation value, the closer the match.

The approximation value of a candidate word is the value obtained through the Levenshtein Distance Algorithm plus the initial approximation value of the corresponding final input word.

In the current candidate selection, the terminal registers a word as a candidate only if its approximation value is 5 or less, and stores the approximation value of each candidate word.
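A minimal sketch of this step is given below. It assumes the distance is counted per Hangul syllable (one Python character) rather than per jamo, which the text does not specify; the names `levenshtein_candidates` and `MAX_APPROXIMATION` are hypothetical.

```python
MAX_APPROXIMATION = 5  # candidates need an approximation value of 5 or less


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def levenshtein_candidates(input_word, initial_approximation, table_words):
    """Map each candidate word to distance + initial approximation, if within the limit."""
    candidates = {}
    for word in table_words:
        approximation = levenshtein(input_word, word) + initial_approximation
        if approximation <= MAX_APPROXIMATION:
            candidates[word] = approximation
    return candidates


print(levenshtein_candidates("김치찌게", 0, ["김치찌개", "된장찌개"]))
```

With an initial approximation value of 0, "김치찌개" differs from the input by one syllable and "된장찌개" by three, so both stay under the limit; a larger initial value pushes distant words out of the candidate set.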

Hashing Algorithm in sentence-unit speech recognition

The terminal runs the Hashing Algorithm as shown in FIG. 29. First, it computes the hash code of the input word using the weight formation table of FIG. 28, which is built on pronunciation similarity, compares it with the hash codes of the words in the standard-language table and the dialect table, and finds the words that have the same hash code. The terminal runs this step only when the input word has the same number of characters as the original input word and contains no digits, that is, when the initial approximation value is 0 and there are no digits.

In the weight formation table shown in FIG. 28, the three symbols denote, respectively, the initial consonant (choseong), the medial vowel (jungseong), and the final consonant (jongseong).
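The hashing step can be illustrated with the sketch below. The actual weights of FIG. 28 are not reproduced in this text, so the similarity groups `INITIAL_CANON` and `MEDIAL_CANON` are purely hypothetical stand-ins (tense consonants folded onto plain ones, ㅔ onto ㅐ, ㅖ onto ㅒ). Only the decomposition of a precomposed Hangul syllable into initial, medial, and final indices follows the standard Unicode arithmetic.

```python
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3

# Hypothetical similarity groups (NOT the weights of FIG. 28).
# Initial indices: ㄲ=1→ㄱ=0, ㄸ=4→ㄷ=3, ㅃ=8→ㅂ=7, ㅆ=10→ㅅ=9, ㅉ=13→ㅈ=12
INITIAL_CANON = {1: 0, 4: 3, 8: 7, 10: 9, 13: 12}
# Medial indices: ㅔ=5→ㅐ=1, ㅖ=7→ㅒ=3
MEDIAL_CANON = {5: 1, 7: 3}


def hash_code(word):
    """Build a pronunciation key from per-syllable (initial, medial, final) indices."""
    parts = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial, medial, final = index // 588, (index % 588) // 28, index % 28
            parts.append((INITIAL_CANON.get(initial, initial),
                          MEDIAL_CANON.get(medial, medial),
                          final))
        else:
            parts.append(ch)  # non-Hangul characters are kept unchanged
    return tuple(parts)


def hashing_candidates(input_word, initial_approximation, table_words):
    """Return table words sharing the input's key; skipped unless the run conditions hold."""
    if initial_approximation != 0 or any(ch.isdigit() for ch in input_word):
        return []
    target = hash_code(input_word)
    return [word for word in table_words if hash_code(word) == target]


print(hashing_candidates("김치찌게", 0, ["김치찌개", "된장찌개"]))
```

Under these assumed groups, "김치찌게" and "김치찌개" map to the same key because ㅔ and ㅐ are treated as equivalent, which is the kind of pronunciation-level collision the step relies on.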

Combination of the two algorithms in sentence-unit speech recognition

As shown in FIG. 30, when the same word appears among both the candidate words from the Hashing Algorithm and the candidate words from the Levenshtein Distance Algorithm, the terminal adds -1 to that word's approximation value, but only if the word has the same number of characters as the preprocessed input word. The terminal performs this step only when the input word has the same number of characters as the original input word, that is, when the initial approximation value is 0.
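One way to read this combination rule is sketched below. It assumes the Levenshtein candidates are held as a word-to-approximation mapping and the hashing candidates as a list; the function name and data shapes are hypothetical.

```python
def combine_candidates(levenshtein_result, hashing_result, input_word, initial_approximation):
    """Subtract 1 from words found by both algorithms, under the stated conditions."""
    combined = dict(levenshtein_result)  # leave the caller's mapping untouched
    if initial_approximation != 0:
        return combined  # hashing is not used; values are stored as is
    for word in hashing_result:
        # Assumed reading: the candidate must be as long as the preprocessed input word.
        if word in combined and len(word) == len(input_word):
            combined[word] -= 1
    return combined


print(combine_candidates({"김치찌개": 1, "된장찌개": 3}, ["김치찌개"], "김치찌게", 0))
```

The adjustment acts as a tie-breaking bonus: a word confirmed by both the edit-distance check and the pronunciation key ends up with a smaller approximation value than one found by edit distance alone.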

Candidate word filtering in sentence-unit speech recognition

Approximation comparison

For approximation comparison, as shown in FIG. 31, the terminal compares the approximation values of the candidate words and stores the candidate word with the smallest approximation value as the final result. Words that share that same approximation value are stored as well.
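The comparison can be sketched in a few lines. The word-to-approximation mapping is an assumed data shape, and `filter_by_approximation` is a hypothetical name.

```python
def filter_by_approximation(candidates):
    """Keep every candidate that has the smallest approximation value, including ties."""
    if not candidates:
        return []
    best = min(candidates.values())
    return [word for word, value in candidates.items() if value == best]


print(filter_by_approximation({"김치찌개": 0, "김치찜": 0, "된장찌개": 3}))
```

Because ties are kept, the result may still contain more than one word, which is why a further comparison step is needed afterwards.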

Consonant comparison

If two or more words remain after approximation comparison, the terminal compares their consonant order with that of the final input word, as shown in FIG. 32, and stores the most similar word as the final result. However, for a candidate word of an input word whose number of characters differs from that of the original word, the word is stored in the final candidate result only if its consonant order matches entirely.
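The sketch below shows one possible form of this comparison. Several points are assumptions, since the text does not define them: only the initial consonant of each syllable is compared, similarity is the number of positions at which the consonants agree, and the `length_changed` flag stands for the case where the input word's character count differs from the original word's.

```python
def initial_consonants(word):
    """Return the initial-consonant index of each precomposed Hangul syllable."""
    return [(ord(ch) - 0xAC00) // 588 for ch in word if 0xAC00 <= ord(ch) <= 0xD7A3]


def consonant_filter(input_word, candidates, length_changed):
    """Keep the candidates whose consonant order best matches the input word."""
    target = initial_consonants(input_word)
    scored = []
    for word in candidates:
        sequence = initial_consonants(word)
        if length_changed and sequence != target:
            continue  # a full match of the consonant order is required in this case
        score = sum(1 for a, b in zip(target, sequence) if a == b)
        scored.append((score, word))
    if not scored:
        return []
    best = max(score for score, _ in scored)
    return [word for score, word in scored if score == best]


print(consonant_filter("김치찌게", ["김치찌개", "김치찜"], False))
```

A jamo-level comparison that also includes final consonants would be an equally plausible reading; the positional score here is simply the shortest way to express "most similar consonant order".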

Final candidate filtering in sentence-unit speech recognition

Approximation comparison

When there are two or more final candidate words, the terminal compares their approximation values using the same procedure as the approximation comparison shown in FIG. 31 and stores the word with the smallest value as the final result. Words with the same approximation value are stored as well.

Dialect-to-standard-language conversion in sentence-unit speech recognition

The terminal converts dialect words stored in the final result and in the other-meaning results into the appropriate standard words, and removes duplicates within the results for each input value.
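A minimal sketch of the conversion and duplicate removal follows. It assumes the dialect table can be represented as a dialect-to-standard mapping; the function name and the sample entry are hypothetical.

```python
def to_standard_words(results, dialect_to_standard):
    """Replace dialect words with standard words and drop duplicates, keeping order."""
    converted = []
    for word in results:
        standard = dialect_to_standard.get(word, word)  # non-dialect words pass through
        if standard not in converted:
            converted.append(standard)
    return converted


# Illustrative mapping (hypothetical table contents).
dialect_table = {"정구지": "부추"}
print(to_standard_words(["정구지", "부추", "김치"], dialect_table))
```

Duplicates typically arise when both a dialect word and its standard form survive filtering for the same input, so the conversion runs before the duplicate check.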

Sentence-unit recognition rate in sentence-unit speech recognition

Recognition rate test environment in sentence-unit speech recognition

To test the sentence-unit recognition rate, three women in their twenties were tested one at a time on three utterance forms: the word alone, the word followed by "~랑 먹었어" ("I ate it with ~"), and the word followed by "~하고 먹었어" ("I ate it with ~"). Three utterance attempts were allowed, and words that were still not recognized after three attempts were marked.

Test application UI in sentence-unit speech recognition

On the application execution screen shown in FIG. 5, the original word area ① displays the words in the standard-language table and the dialect table one at a time. The speech recognition result area ② displays the result of the user reading the original word aloud. The processing result area ③ shows whether the operation succeeded after the Save or Cancel Save button is pressed. The microphone button ④ starts capturing the user's speech when pressed. The Back button ⑤ moves to the previous word, and the Forward button ⑥ moves to the next word. The Save button ⑦ stores the entry when the original word and the recognized word differ. The Cancel Save button ⑧ undoes a save made by mistake.

Recognition rate results in sentence-unit speech recognition

The results of the test in which only the food noun was read are shown in [Table 2].

The results of the test in which "~(이)랑 먹었어" ("I ate it with ~") was appended to the food noun are shown in [Table 3].

The results of the test in which "~하고 먹었어" ("I ate it with ~") was appended to the food noun are shown in [Table 4].

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment belongs will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

Claims

A word-unit speech recognition method, comprising:
a process of receiving a word-unit speech input and converting it into word text through Speech-To-Text (STT);
a preprocessing process of classifying the word text into tokens, checking the tokens for other meanings, performing alphabet processing and number processing based on the result of exception handling, and then generating a final input word by combining the tokens;
a candidate selection process of checking partial matches for the final input word, extracting candidate words by combining a plurality of preset algorithms, and generating final candidate words by filtering the candidate words; and
a final candidate filtering process of, when there are a plurality of final candidate words, comparing the approximation values of the final candidate words, checking whether the word having the smallest value is a standard word, and outputting it as a standard word.

The method of claim 1,
wherein the preprocessing process comprises:
a token classification process of checking whether the word text contains a space and, if so, splitting the input word at the spaces and defining each part as a token;
an other-meaning checking process of, when the word text contains a space, checking whether each word split at the spaces has another meaning;
an exception handling process of checking whether the word text contains an exception word and, when there is no space, checking whether an exception word is present in the input value as is;
an alphabet processing process of storing the original word when the word text has another meaning, performing exception handling when the word text contains an exception word, and performing alphabet processing when it does not;
a number processing process of, when the word text contains a space, outputting the input with the corresponding token replaced by the word changed through exception handling or alphabet processing and, when the word text contains no space, outputting the result of exception handling or alphabet processing as is, and then converting any numbers in the word text through number processing; and
a token combination process of, when the plurality of input words obtained through the number processing contain spaces, combining the tokens according to the character and space order of each word to generate the final input words and, when there are no spaces, taking the plurality of number-processed input words, which were not classified into tokens, as the final input words.

The method of claim 1,
wherein the candidate selection process comprises:
a partial match checking process, applied to each final input word, of finding words that partially or fully match the final input word to generate initial candidate words, selecting final candidate words from among the initial candidate words when they exist, and selecting final candidate words using the final input word when no initial candidate word exists;
a two-algorithm combination process of, when the initial approximation value is 0, matching the words returned by the Hashing Algorithm and the Levenshtein Distance Algorithm and adding -1 to the approximation value of any word returned by both, and, when it is not 0, storing the approximation value as is without using the Hashing Algorithm; and
a candidate word filtering process of reducing the number of candidate words through approximation comparison and, when a plurality of candidate words remain, reducing them once more through consonant comparison to select the final candidate word.

The method of claim 1,
wherein the final candidate filtering process comprises:
an approximation comparison process of, when there are a plurality of final candidate words, comparing the approximation value of each final candidate word and storing the word having the smallest value as the final result; and
an output standard-word checking process of, when the final result word or the original word of an other-meaning word found in the preprocessing process matches an entry in the dialect column of the table, outputting the corresponding standard word together with the final result word and, when there is no match, outputting only the final result word.

A sentence-unit speech recognition method, comprising:
a process of receiving a sentence-unit speech input and converting it into sentence text through Speech-To-Text (STT);
a preprocessing process of classifying the sentence text into tokens, checking the tokens for other meanings, performing alphabet processing and number processing based on the result of exception handling, and then generating a final input word by combining the tokens;
an input sentence classification and processing process of classifying the final input word into search reference sentences and processing them;
a food noun partial match extraction process of extracting food nouns that start or end with an input word in the final input word, extracting only those in which the input word is matched in full;
a food noun partial match verification process of indicating, through true and false values, whether the food noun partial match result contains a partially matching food noun;
a combined word extraction process of, when the food noun partial match verification result concerns an input sentence that is not divided by a conjunctive particle, combining the words in order and checking whether each combined word is a valid word;
a group extraction process of generating initial candidate words based on the final input word, the partial match words, and the full match words, selecting the final candidate words from the initial candidate words when the initial candidate words exist in an internal database, and selecting the final candidate words using the final input word when they do not;
a word search process of checking partial matches for the final input word and selecting the final candidate words by comparing approximation values and consonants using a plurality of preset algorithms; and
a dialect-to-standard conversion process of converting dialect words stored in the final result and the other-meaning results among the final candidate words into the appropriate standard words and removing duplicates within the results for each input value.

The method of claim 5,
wherein the preprocessing process comprises:
a token classification process of checking whether the sentence text contains a space and, if so, splitting the input words at the spaces and defining each part as a token;
an exception handling process of checking whether the sentence text contains an exception word and, when there is no space, checking whether an exception word is present in the input value as is;
an alphabet processing process of storing the original word when the sentence text has another meaning, performing exception handling when the sentence text contains an exception word, and performing alphabet processing when it does not; and
a number processing process of, when the sentence text contains a space, outputting the input with the corresponding token replaced by the word changed through exception handling or alphabet processing and, when the sentence text contains no space, outputting the result of exception handling or alphabet processing as is, and then converting any numbers in the sentence text.

The method of claim 5,
wherein the input sentence classification and processing process comprises:
a search reference sentence classification process of comparing the final input word with a list of conjunctive particles and, when a word ends with a conjunctive particle, separating the words at that word; and
a processing process of recombining the words separated in the search reference sentence classification process into a single word.

The method of claim 7,
wherein the processing process comprises:
a food noun extraction process of checking whether each word separated in the search reference sentence classification process exists in the food noun table, using the word as it appears in the food noun table for word combination when it exists, and performing ending word deletion and conjunctive particle removal when it does not;
an ending word deletion process of checking whether there is an ending word among the words existing in the food noun table and, when there is, deleting the ending word so that it is excluded from the word combination process;
a conjunctive particle removal process of checking whether a conjunctive particle is attached to the end of a word among the words existing in the food noun table and removing it; and
a word combination process of combining, in input-sentence order, the words extracted through the food noun extraction process, the ending word deletion process, and the conjunctive particle removal process.

The method of claim 5,
wherein the group extraction process,
when the sentence is not divided by a conjunctive particle and contains more than one word, organizes the relationship between each combined word created by the combined word extraction process and its parent word, excludes words that partially match the combined word when generating the partial match results of the parent word, and generates the partial match results of the combined word without any such restriction.

The method of claim 5,
wherein the word search process,
when the initial approximation value of the final input word is 0, matches the words returned by the Hashing Algorithm and the Levenshtein Distance Algorithm and adds -1 to the approximation value of any word returned by both; when it is not 0, stores the approximation value as is without using the Hashing Algorithm; reduces the number of candidate words through approximation comparison; and, when two or more candidate words remain, selects the final candidate word through consonant comparison.

The method of claim 10,
wherein the word search process comprises:
a partial match checking process of searching the standard-language table or the dialect table for words that the final input word matches partially or in full, passing a true value when such a word exists and a false value when it does not;
a Levenshtein Distance Algorithm execution process of finding candidate words by checking how similar each word in the standard-language table and the dialect table is to the preprocessed input word, recording as the approximation value the number of character changes needed for the word to become the same as the input word, adding the initial approximation value of the final input word to the approximation value obtained through the Levenshtein Distance Algorithm, registering a word as a candidate only when its approximation value is 5 or less in the current candidate selection, and storing the approximation value of each candidate word;
a Hashing Algorithm execution process of computing the hash code of the input word using a weight formation table based on pronunciation similarity and comparing it with the hash codes of the words in the standard-language table and the dialect table to find words having the same value;
a two-algorithm combination process of adding -1 to the approximation value of a word only when the word appears among both the candidate words from the Hashing Algorithm and the candidate words from the Levenshtein Distance Algorithm and has the same number of characters as the preprocessed input word;
a candidate word filtering process of comparing the approximation values of the candidate words and comparing consonants to select the final candidate words; and
a final candidate filtering process of filtering the final candidate words by comparing approximation values.

The method of claim 11,
wherein the candidate word filtering process comprises:
an approximation comparison process of comparing the approximation values of the candidate words, storing the candidate word with the smallest approximation value as the final result, and also storing words with the same approximation value; and
a consonant comparison process of, when two or more words remain after the approximation comparison process, comparing consonant order with the final input word and storing the most similar word as the final result, and, for a candidate word of an input word whose number of characters differs from that of the original word, storing the word in the final candidate result only when its consonant order matches entirely.

The method of claim 11,
wherein the final candidate filtering process,
when there are two or more final candidate words, compares the approximation value of each final candidate word and stores the word having the smallest value as the final result.