KR102374405B1

KR102374405B1 - Apparatus and method for post-processing speech recognition in artificial intelligence interview

Info

Publication number: KR102374405B1
Application number: KR1020210083699A
Authority: KR
Inventors: 임헌영; 김성원
Original assignee: 주식회사 무하유
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2022-03-15
Also published as: KR102597086B1; KR20230001496A

Abstract

The present invention relates to an apparatus and method for post-processing speech recognition results in an artificial intelligence (AI) interview environment. The method includes the steps of: obtaining interview materials of an AI interviewee and generating a word vector space based on the interview material data; obtaining a speech recognition result text of the AI interviewee and verifying the disfluency of the speech recognition result text based on the word vector space; defining the disfluent portion of the speech recognition result text as a correction section and generating, based on the word vector space, a correction vocabulary candidate group corresponding to the correction section; verifying the structural similarity and contextual suitability of the words in the correction vocabulary candidate group and selecting a final correction word candidate by calculating a similarity index through selective weighting between the two; determining whether the selected final correction word candidate is suitable; and, if it is suitable, replacing the correction section in the speech recognition result text with the final correction word candidate and outputting the result. The present invention is a technology developed through "Commercialization of AI Recruitment Evaluation System for Non-face-to-face Recruitment Environment Improvement", part of the 2020 AI Technology Commercialization Support Project (CY201042) of the Seoul Business Agency.

Description

Speech recognition post-processing apparatus and method in AI interview environment

The present invention relates to a speech recognition post-processing method and, more particularly, to a speech recognition post-processing apparatus and method in an AI interview environment.

In general, in speech recognition post-processing, disfluency detection is a technique used to correct or delete misrecognized portions of what a speaker has uttered.

Disfluency detection can use either the data-driven supervised learning techniques that have become common recently or the rule-based methods used previously.

However, data-driven supervised learning requires separate data prepared specifically for disfluency detection, and its performance has not been sufficiently verified for Korean.

Rule-based methods are also problematic: defining rules that account for all of the complex pronunciation and phonological variation of Korean requires considerable human resources and time.

In addition, existing techniques separately build a pronunciation dictionary database or the like and correct vocabulary by measuring the structural similarity between words on that basis, but they do not verify the contextual suitability of the corrected words.

In that case, the speech recognition error correction result depends heavily on the performance of the pronunciation dictionary database that was built, and, given the linguistic characteristics of Korean, building a pronunciation dictionary that covers every possible structural variation entails a high construction cost.

Moreover, Korean is a language that is at a considerable disadvantage in natural language processing because of its inherent linguistic characteristics (ambiguity, agglutination, and so on).

Natural language processing techniques developed abroad for English often cannot be expected to reach the same level of performance when applied to Korean, so the major natural language processing techniques need to be applied selectively.

Hiring culture is one of the things that has recently changed because of COVID-19. Common formats include the contact-free ("untact") interview, in which the interviewee and the interviewer use a video conferencing program or the like; the online presentation (PT) interview, in which a presentation is recorded in advance and submitted; and the AI interview, in which several questions and answers proceed according to an AI interviewer's procedure.

In an AI interview, the outcome is decided by a machine rather than a person, and every interaction during the questions and answers, including speech, is reflected in the interview result.

Accurately capturing the interviewee's speech in varied environments without missing anything is therefore indispensable in an AI interview, but because of the limits of speech recognition, the recognition results obtained in an AI interview environment need to be supplemented.

Accordingly, there is a need to develop a speech recognition post-processing method that can supplement speech recognition results at minimum cost and in minimum time in an AI interview environment and thereby provide speech recognition results with improved accuracy and reliability.

The present invention is a technology developed through "Commercialization of AI Recruitment Evaluation System for Non-face-to-face Recruitment Environment Improvement", part of the 2020 Artificial Intelligence (AI) Technology Commercialization Support Project (CY201042) of the Seoul Business Agency, Seoul Metropolitan Government.

Korean Laid-open Patent Publication No. 10-2018-0062003 (2018. 06. 08)

One object of the present invention, devised to solve the problems described above, is to provide a speech recognition post-processing apparatus and method in an AI interview environment that generate a word vector space from interview material data and, on that basis, verify the structural similarity and contextual suitability of a correction vocabulary candidate group corresponding to a correction section of the speech recognition result text in order to select a final correction word candidate, thereby supplementing speech recognition results at minimum cost and in minimum time and providing speech recognition results with improved accuracy and reliability.

Another object of the present invention is to provide a speech recognition post-processing apparatus and method that post-process the interviewee's speech text, produced by speech recognition in an AI interview environment, using a word vector space generated by an unsupervised-learning-based technique and the like.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problems described above, a speech recognition post-processing method according to an embodiment of the present invention is a method performed by a speech recognition post-processing apparatus in an artificial intelligence (AI) interview environment, and includes: obtaining interview materials of an AI interviewee and generating a word vector space based on the interview material data; obtaining a speech recognition result text of the AI interviewee and verifying the disfluency of the speech recognition result text based on the word vector space; defining the disfluent portion of the speech recognition result text as a correction section and generating, based on the word vector space, a correction vocabulary candidate group corresponding to the correction section; verifying the structural similarity and contextual suitability of the words in the correction vocabulary candidate group and selecting a final correction word candidate by calculating a similarity index through selective weighting between the two; determining whether the selected final correction word candidate is suitable; and, if the selected final correction word candidate is suitable, replacing the correction section in the speech recognition result text with the final correction word candidate and outputting the result.

In an embodiment, the step of generating the word vector space generates the word vector space based on self-introduction data among the interview materials of the AI interviewee.

In an embodiment, the step of verifying the disfluency of the speech recognition result text, upon obtaining the speech recognition result text, separates it into morpheme units and performs part-of-speech classification, then maps it onto the word vector space to detect a disfluent portion of the speech recognition result text and determines that portion to be the correction section.

In an embodiment, the step of generating the correction vocabulary candidate group defines the disfluent portion of the speech recognition result text as the correction section, separates the correction section into morpheme units, preferentially extracts from the word vector space the group of words that co-occur with the other parts separated into morpheme units, and restricts the candidates to the parts of speech expected for the correction section to generate the correction vocabulary candidate group.

In an embodiment, the step of verifying the structural similarity of the words in the correction vocabulary candidate group defines the minimum number of operations required to convert between the word in the correction section and a word in the correction vocabulary candidate group as the edit distance between the words, calculates the minimum edit distance at the phoneme level and at the syllable level, and verifies the structural similarity of the words in the correction vocabulary candidate group based on the calculated minimum edit distances.

In an embodiment, the step of verifying the contextual suitability of the words in the correction vocabulary candidate group verifies, from the interviewee perspective (covering both spoken and written statements) and from the domain perspective, whether a contextually suitable sentence results when a word in the correction vocabulary candidate group replaces the word in the correction section.

In an embodiment, the step of selecting the final correction word candidate by calculating the similarity index quantifies the structural similarity and the contextual suitability of each word in the correction vocabulary candidate group as a value between 0 and 1, and calculates on that basis the similarity index used to extract the final correction word candidate.

In an embodiment, with the structural similarity weight and the contextual similarity weight summing to 1, the step of calculating the similarity index calculates the similarity index step by step while selectively adjusting each weight, and extracts the top N words from the similarity indices calculated at each step.

In an embodiment, the step of calculating the similarity index calculates the similarity index by the formula: similarity index = (contextual suitability × contextual similarity weight) + (structural similarity × structural similarity weight).

In an embodiment, among the top N words extracted at each step, the step of calculating the similarity index determines the words that are also assigned a high similarity index at other steps to be final correction word candidates.
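
The weighted similarity index and the step-by-step top-N extraction described above can be illustrated with a minimal sketch. The function names, the weight steps (0.3, 0.5, 0.7), and the reading of "a high similarity index at other steps" as "ranked in the top N at every step" are assumptions made for illustration; the source does not fix these details.

```python
def similarity_index(contextual, structural, context_weight):
    """Weighted sum of two scores in [0, 1]; the two weights sum to 1."""
    structure_weight = 1.0 - context_weight
    return contextual * context_weight + structural * structure_weight


def select_final_candidates(candidates, weight_steps=(0.3, 0.5, 0.7), top_n=2):
    """Keep the words that rank in the top N at every weight step.

    candidates maps each word to (contextual_suitability, structural_similarity).
    The weight steps and the intersection rule are illustrative assumptions.
    """
    per_step = []
    for weight in weight_steps:
        # Sort by descending similarity index, breaking ties alphabetically.
        ranked = sorted(
            candidates,
            key=lambda word: (-similarity_index(*candidates[word], weight), word),
        )
        per_step.append(set(ranked[:top_n]))
    return sorted(set.intersection(*per_step))


if __name__ == "__main__":
    scores = {"search": (0.9, 0.8), "serve": (0.2, 0.9), "research": (0.8, 0.4)}
    print(select_final_candidates(scores))
```

A word that scores well only under one weighting (for example, one that is structurally close but contextually poor) drops out once the contextual weight is raised, which is the behavior the step-by-step adjustment appears intended to capture.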

In an embodiment, the step of determining whether the selected final correction word candidate is suitable outputs the original sentence of the speech recognition result text if the selected final correction word candidate is not suitable.

A speech recognition post-processing apparatus according to an embodiment of the present invention is a speech recognition post-processing apparatus in an artificial intelligence (AI) interview environment, and includes: a word vector space generation unit that obtains interview materials of an AI interviewee and generates a word vector space based on the interview material data; a disfluency verification unit that obtains a speech recognition result text of the AI interviewee and verifies the disfluency of the speech recognition result text based on the word vector space; a correction vocabulary candidate group generation unit that defines the disfluent portion of the speech recognition result text as a correction section and generates, based on the word vector space, a correction vocabulary candidate group corresponding to the correction section; a structural similarity verification unit that verifies the structural similarity of the words in the correction vocabulary candidate group; a contextual suitability verification unit that verifies the contextual suitability of the words in the correction vocabulary candidate group; a final correction word candidate selection unit that selects a final correction word candidate by calculating a similarity index through selective weighting of the structural similarity and contextual suitability of the words in the correction vocabulary candidate group; a determination unit that determines whether the selected final correction word candidate is suitable; and a result output unit that, if the selected final correction word candidate is suitable, replaces the correction section in the speech recognition result text with the final correction word candidate and outputs the result.

A computer program providing a speech recognition post-processing method according to another embodiment of the present invention, devised to solve the problems described above, is combined with a computer, which is hardware, and stored in a medium in order to execute any one of the methods described above.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium on which a computer program for executing the method is recorded, may further be provided.

As described above, according to the present invention, a word vector space is generated from interview material data and, on that basis, the structural similarity and contextual suitability of the correction vocabulary candidate group corresponding to the correction section of the speech recognition result text are verified to select a final correction word candidate, so that speech recognition results can be supplemented at minimum cost and in minimum time in an AI interview environment to provide speech recognition results with improved accuracy and reliability.

That is, the present invention shows that a word vector space generated by unsupervised learning can be used to detect disfluency in speech recognition results, and that this can be applied to the non-face-to-face interview environment, for which demand has been high since the start of the post-COVID-19 era.

As such, the present invention is a method of post-processing the interviewee's speech text, produced by speech recognition in an AI interview environment, using a word vector space generated by an unsupervised-learning-based technique and the like.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a block diagram illustrating a speech recognition post-processing apparatus according to an embodiment of the present invention.
FIGS. 2 to 5 are flowcharts illustrating a speech recognition post-processing method of the speech recognition post-processing apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the formula for calculating the similarity index.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular also includes the plural unless the wording specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the stated components and every combination of one or more of them. Although "first", "second", and the like are used to describe various components, these components are of course not limited by these terms. These terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may of course be a second component within the technical idea of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before the description, the meanings of the terms used in this specification are briefly explained. Note, however, that these explanations are intended to aid understanding of the specification and are not used to limit the technical idea of the present invention unless they are explicitly described as limiting it.

FIG. 1 is a block diagram illustrating a speech recognition post-processing apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the speech recognition post-processing apparatus in an AI interview environment according to the present invention may include a word vector space generation unit 110, a disfluency verification unit 120, a correction vocabulary candidate group generation unit 130, a structural similarity verification unit 140, a contextual suitability verification unit 150, a final correction word candidate selection unit 160, a determination unit 170, and a result output unit 180.

Here, the word vector space generation unit 110 may obtain interview materials of the AI interviewee and generate a word vector space based on the interview material data.

For example, the word vector space generation unit 110 may generate the word vector space based on self-introduction data among the interview materials of the AI interviewee; this is only one embodiment, and the invention is not limited thereto.

The word vector space generation unit 110 may preprocess the portions of the interview material data that contain personal information, as well as stopwords, and tokenize the preprocessed interview material data to generate the word vector space.

When tokenizing, the word vector space generation unit 110 may tokenize the preprocessed interview material data through morpheme separation and part-of-speech classification.
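
As a rough illustration of the preprocessing and tokenization described above, the following sketch removes personal information and stopwords and then builds a simple co-occurrence vector space. Everything named here (`PII_PATTERNS`, `STOPWORDS`, `preprocess`, `build_vector_space`) is hypothetical. Treating "preprocess" as "remove" is an assumption, a regular-expression word tokenizer stands in for Korean morpheme separation and part-of-speech classification, and co-occurrence counts stand in for whatever unsupervised embedding technique an actual implementation would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns and stopword list, for illustration only.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # e-mail addresses
    re.compile(r"\d{2,3}-\d{3,4}-\d{4}"),     # phone numbers
]
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "my"}


def preprocess(text):
    """Strip personal information, tokenize, and drop stopwords."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(" ", text)
    # Stand-in for morpheme separation and part-of-speech classification.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def build_vector_space(documents, window=2):
    """Map each token to a sparse vector of co-occurrence counts."""
    space = defaultdict(Counter)
    for document in documents:
        tokens = preprocess(document)
        for i, token in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    space[token][tokens[j]] += 1
    return dict(space)


if __name__ == "__main__":
    letters = ["I developed a search engine. Contact: kim@example.com, 010-1234-5678"]
    print(build_vector_space(letters)["search"])
```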

Next, the disfluency verification unit 120 may obtain the speech recognition result text of the AI interviewee and verify the disfluency of the speech recognition result text based on the word vector space.

For example, upon obtaining the speech recognition result text, the disfluency verification unit 120 may separate it into morpheme units and perform part-of-speech classification, then map it onto the word vector space to detect a disfluent portion of the speech recognition result text and determine that portion to be the correction section.

Here, when mapping the speech recognition result text that has undergone morpheme separation and part-of-speech classification onto the word vector space, the disfluency verification unit 120 may detect the disfluent portion by exploiting the fact that a morpheme misrecognized because of a speech recognition error is placed unusually within the vector space or exhibits an out-of-vocabulary (OOV) problem.
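
The two signals named above (an out-of-vocabulary token, or a token that sits unusually in the vector space) might be checked as in the sketch below. The function names, the cosine-based "fit" measure, the context window, and the 0.1 threshold are all assumptions for illustration; the source does not specify how unusual placement is measured. The vector space is assumed to map each token to a sparse dictionary of weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_correction_sections(tokens, space, threshold=0.1, window=2):
    """Return indices of tokens that are OOV or fit their neighbors poorly."""
    flagged = []
    for i, token in enumerate(tokens):
        if token not in space:
            flagged.append(i)  # out-of-vocabulary token
            continue
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        neighbors = [
            tokens[j] for j in range(start, stop)
            if j != i and tokens[j] in space
        ]
        if not neighbors:
            continue  # nothing to compare against
        fit = sum(cosine(space[token], space[n]) for n in neighbors) / len(neighbors)
        if fit < threshold:
            flagged.append(i)  # unusual placement relative to its context
    return flagged
```

Each flagged index would then be treated as a correction section by the later steps.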

Next, the correction vocabulary candidate group generation unit 130 may define the disfluent portion of the speech recognition result text as the correction section and generate, based on the word vector space, a correction vocabulary candidate group corresponding to the correction section.

Here, the correction vocabulary candidate group generation unit 130 may define the disfluent portion of the speech recognition result text as the correction section, separate the correction section into morpheme units, preferentially extract from the word vector space the group of words that co-occur with the other parts separated into morpheme units, and restrict the candidates to the parts of speech expected for the correction section to generate the correction vocabulary candidate group.
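
One possible reading of this candidate generation step is sketched below: words that co-occur with the surrounding context are collected from the vector space, filtered by the expected parts of speech, and ranked by co-occurrence count. The function name, the `pos_tags` lookup table, the tag names, and the ranking rule are hypothetical, and the vector space is again assumed to be a mapping from token to co-occurrence counts.

```python
def generate_candidates(context_tokens, space, pos_tags, expected_pos, limit=10):
    """Rank words that co-occur with the context, restricted by part of speech.

    context_tokens: the tokens surrounding the correction section.
    space: token -> {co-occurring token: count}.
    pos_tags: token -> part-of-speech tag (a hypothetical lookup table).
    expected_pos: set of tags considered plausible for the correction section.
    """
    scores = {}
    for context in context_tokens:
        for word, count in space.get(context, {}).items():
            if word in context_tokens:
                continue  # already present in the sentence
            if pos_tags.get(word) not in expected_pos:
                continue  # wrong part of speech for the correction section
            scores[word] = scores.get(word, 0) + count
    # Highest co-occurrence first, ties broken alphabetically.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:limit]
```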

Next, the structural similarity verification unit 140 may verify the structural similarity of the words in the correction vocabulary candidate group.

Here, the structural similarity verification unit 140 may define the minimum number of operations required to convert between the word in the correction section and a word in the correction vocabulary candidate group as the edit distance between the words, calculate the minimum edit distance at the phoneme level and at the syllable level, and verify the structural similarity of the words in the correction vocabulary candidate group based on the calculated minimum edit distances.

For example, taking the interview environment into account, the structural similarity verification unit 140 may verify the structural similarity of the words in the correction vocabulary candidate group on the assumption that, for a correction-section word and a candidate word with the same number of syllables, the edit distance between the two words is inversely related to their structural similarity.

In doing so, the structural similarity verification unit 140 may, in view of the interview environment, exclude words that contain internet slang, profanity, or abbreviations when verifying the structural similarity of the words in the correction vocabulary candidate group.
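
The syllable-level and phoneme-level edit distances can be sketched as follows. Hangul syllables are decomposed into initial, medial, and final jamo using the standard Unicode arithmetic (syllable block starting at U+AC00, 21 × 28 = 588 code points per initial consonant, 28 per medial vowel). The normalization to a 0 to 1 score and the equal blend of the two levels are assumptions for illustration, as are the function names; the slang and profanity filter is not shown.

```python
def decompose(word):
    """Split each Hangul syllable into (initial, medial, final) jamo units.

    A final index of 0 means the syllable has no final consonant.
    Non-Hangul characters are kept as single units.
    """
    units = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            units.append(("L", code // 588))
            units.append(("V", (code % 588) // 28))
            units.append(("T", code % 28))
        else:
            units.append(("C", ch))
    return units


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def structural_similarity(source, candidate, phoneme_weight=0.5):
    """Blend syllable-level and phoneme-level similarity into a [0, 1] score."""
    syllable = 1 - edit_distance(source, candidate) / max(len(source), len(candidate), 1)
    src_units, cand_units = decompose(source), decompose(candidate)
    phoneme = 1 - edit_distance(src_units, cand_units) / max(len(src_units), len(cand_units), 1)
    return phoneme_weight * phoneme + (1 - phoneme_weight) * syllable
```

Under this sketch, 검생 scores closer to 검색 than 검토 does, because only one final consonant differs at the phoneme level even though both candidates differ from 검색 by one syllable.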

그리고, 문맥적 적합성 검증부(150)는, 교정 어휘 후보군에 속하는 단어들의 문맥적 적합성을 검증할 수 있다.In addition, the contextual relevance verification unit 150 may verify the contextual relevance of words belonging to the proofreading vocabulary candidate group.

여기서, 문맥적 적합성 검증부(150)는, 교정 어휘 후보군에 속한 단어들이 교정 구간의 단어와 교체되었을 때, 문맥적으로 적합한 문장이 완성되는지를 구술 및 기술을 포함한 면접자 관점 및 도메인 관점에서 검증할 수 있다.Here, the contextual relevance verification unit 150 verifies whether or not a contextually appropriate sentence is completed when words belonging to the proofreading vocabulary candidate group are replaced with words in the proofreading section from the interviewer's point of view and domain point of view, including oral and descriptive. can

As an example, when verifying from the interviewee's perspective and the domain perspective, the contextual suitability verification unit 150 may perform the verification through topic modeling and sentiment analysis, calculate a contextual suitability index by selectively weighting each verification item, and apply the calculated contextual suitability index.

For example, the verification items may include: a first verification item that checks whether the content flow is similar to that of the other sentences spoken by the interviewee that are not subject to correction; a second verification item that checks whether the emotional flow is similar to that of those other spoken sentences; a third verification item that checks whether the content flow is similar to that of the self-introduction letter submitted by the interviewee; a fourth verification item that checks whether the emotional flow is similar to that of the submitted self-introduction letter; and a fifth verification item that checks whether a similar contextual flow can be observed within the self-introduction letter data. This is only one embodiment, and the invention is not limited thereto.
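
One way to combine the five verification items into a single contextual suitability index is a weighted average, sketched below. The item names, the use of a weighted average, and the assumption that each item has already been scored in the range 0 to 1 (for example by a topic model or a sentiment classifier) are illustrative choices that the source does not specify.

```python
# Hypothetical names for the five verification items listed above.
CHECKS = (
    "speech_topic",       # item 1: content flow vs. other spoken sentences
    "speech_sentiment",   # item 2: emotional flow vs. other spoken sentences
    "letter_topic",       # item 3: content flow vs. self-introduction letter
    "letter_sentiment",   # item 4: emotional flow vs. self-introduction letter
    "corpus_context",     # item 5: similar context observed in letter data
)


def contextual_suitability(scores, weights):
    """Weighted average of per-item scores, each assumed to lie in [0, 1]."""
    total = sum(weights[name] for name in CHECKS)
    if total <= 0:
        raise ValueError("at least one verification item needs a positive weight")
    return sum(scores[name] * weights[name] for name in CHECKS) / total
```

Setting a weight to zero drops the corresponding item, which is one reading of "selective" weighting.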

Next, the final correction word candidate selection unit 160 may select a final correction word candidate by calculating a similarity index through selective weighting of the structural similarity and the contextual suitability of the words in the correction vocabulary candidate group.

Here, the final correction word candidate selection unit 160 may quantify the structural similarity and the contextual suitability of each word in the correction vocabulary candidate group as a value between 0 and 1, and calculate from these values the similarity index used to extract the final correction word candidate.

As an example, when calculating the similarity index, the final correction word candidate selection unit 160 may, under the constraint that the structural similarity weight and the contextual similarity weight sum to 1, calculate the similarity index step by step while selectively adjusting each weight, and extract the top N words from the similarity indices calculated at each step.

As an example, the final correction word candidate selection unit 160 may calculate the similarity index using the formula: similarity index = (contextual suitability × context similarity weight) + (structural similarity × structure similarity weight).

The final correction word candidate selection unit 160 may then determine, among the top N words extracted at each step, the words that are also given a high similarity index at other steps to be the final correction word candidates.
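
The stepwise weighting and top-N extraction can be sketched as follows. The formula is the one stated in the text. The evenly spaced weight steps, the `min_hits` majority rule used to decide what counts as "high at other steps", and the example scores are assumptions made for illustration.

```python
from collections import Counter


def similarity_index(contextual, structural, w_context):
    """Similarity index from the text; the two weights sum to 1."""
    return contextual * w_context + structural * (1.0 - w_context)


def select_final_candidates(candidates, n=2, steps=4, min_hits=None):
    """candidates maps word -> (structural, contextual), both in [0, 1].

    The context weight is swept from 0 to 1 in `steps` equal increments.
    A word is kept if it reaches the top N in at least `min_hits` rounds
    (default: a majority of rounds, which is an assumed rule).
    """
    rounds = steps + 1
    hits = Counter()
    for k in range(rounds):
        w_context = k / steps
        ranked = sorted(
            candidates,
            key=lambda w: similarity_index(candidates[w][1], candidates[w][0], w_context),
            reverse=True,
        )
        hits.update(ranked[:n])
    threshold = min_hits if min_hits is not None else rounds // 2 + 1
    return [word for word, count in hits.most_common() if count >= threshold]


# Hypothetical (structural, contextual) scores for four candidate words.
example = {
    "지원": (0.90, 0.80),
    "지언": (0.95, 0.10),
    "직원": (0.50, 0.90),
    "사과": (0.00, 0.20),
}
finalists = select_final_candidates(example, n=2, steps=4)
```

In this example "지원" stays in the top two under every weighting, while "지언" only ranks highly when structure dominates, so it is dropped.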

Next, the determination unit 170 may determine whether the selected final correction word candidate is suitable.

Here, the determination unit 170 may determine whether the selected final correction word candidate is suitable on the basis of a correction decision index that determines whether the correction is to be made.

Then, if the selected final correction word candidate is suitable, the result output unit 180 may replace the correction section of the speech recognition result text with the final correction word candidate and output the result.

If the selected final correction word candidate is not suitable, the result output unit 180 may output the original sentence of the speech recognition result text.
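
The decision and output behavior can be sketched as a simple threshold check. The source does not define how the correction decision index is computed or compared, so the `decision_score` argument and the `threshold` value below are hypothetical.

```python
def apply_correction(tokens, span, candidate, decision_score, threshold=0.7):
    """Return corrected tokens if the candidate is judged suitable, else the original.

    tokens:         tokenized speech recognition result
    span:           (start, end) token indices of the correction section
    decision_score: hypothetical correction decision index for the candidate
    threshold:      hypothetical cut-off above which the correction is applied
    """
    start, end = span
    if decision_score >= threshold:
        return tokens[:start] + [candidate] + tokens[end:]
    return list(tokens)  # unsuitable candidate: keep the original sentence
```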

As described above, the present invention generates a word vector space based on interview data and, on that basis, selects a final correction word candidate by verifying the structural similarity and contextual suitability of the correction vocabulary candidate group corresponding to the correction section of the speech recognition result text. In an AI interview environment, this makes it possible to supplement speech recognition results at minimal cost and in minimal time, and thereby to provide speech recognition results with improved accuracy and reliability.

FIGS. 2 to 5 are flowcharts illustrating a speech recognition post-processing method of a speech recognition post-processing apparatus according to an embodiment of the present invention, and FIG. 6 is a diagram illustrating the formula for calculating the similarity index.

As shown in FIGS. 2 to 5, the present invention may acquire interview data of an AI interviewee and generate a word vector space based on the interview data (S10).

Here, the present invention may generate the word vector space based on self-introduction data among the interview data of the AI interviewee. Specifically, portions of the interview data containing personal information, as well as stop words, are preprocessed, and the preprocessed interview data is tokenized through morpheme separation and part-of-speech classification to generate the word vector space.
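
The preprocessing and vector-space construction in step S10 might look like the sketch below. It makes several simplifying assumptions: personal information is limited to e-mail addresses and phone numbers matched by a regular expression, the stop-word list is a toy English one, a plain regex tokenizer stands in for Korean morpheme separation and part-of-speech classification, and the "word vector space" is a sparse co-occurrence count table rather than a trained embedding.

```python
import re
from collections import Counter, defaultdict

# Assumed patterns for personal information (e-mail addresses, phone numbers).
PERSONAL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b|\b\d{2,3}-\d{3,4}-\d{4}\b")
# Toy stop-word list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "i"}


def preprocess(text):
    """Mask personal information, tokenize, and drop stop words."""
    text = PERSONAL.sub(" ", text.lower())
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    return [t for t in tokens if t not in STOPWORDS]


def build_vector_space(documents, window=2):
    """Map each word to a Counter of words seen within `window` tokens of it."""
    space = defaultdict(Counter)
    for doc in documents:
        tokens = preprocess(doc)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return dict(space)
```

A production system would replace the tokenizer with a Korean morphological analyzer and the count table with learned word embeddings; the sketch only shows the order of the steps.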

Next, the present invention may obtain the speech recognition result text of the AI interviewee and verify the non-fluency of the speech recognition result text based on the word vector space (S20).

Here, as shown in FIG. 3, when the speech recognition result text is obtained (S22), the present invention may separate the text into morphemes and classify their parts of speech (S24), map the result onto the word vector space to detect a non-fluent portion of the text (S26), and determine that portion to be the correction section (S28).

As an example, when mapping the speech recognition result text that has undergone morpheme separation and part-of-speech classification onto the word vector space, the present invention may detect non-fluent portions by exploiting the fact that a morpheme misrecognized due to a speech recognition error either is placed unusually within the vector space or exhibits an out-of-vocabulary (OOV) problem.
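
The two detection signals named above, out-of-vocabulary tokens and unusual placement, can be sketched against a co-occurrence table. Treating "unusual placement" as "never observed next to its known neighbors" is an assumption, as are the function name, the `window` size, and the `min_support` threshold. The `space` argument is assumed to map each known word to a dictionary of neighbor counts.

```python
def detect_nonfluent(tokens, space, window=1, min_support=1):
    """Flag tokens that are OOV or that never co-occur with their known neighbors."""
    flagged = []
    for i, word in enumerate(tokens):
        if word not in space:
            flagged.append((i, word, "oov"))
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        known = [n for n in nearby if n in space]  # ignore OOV neighbors
        support = sum(space[word].get(n, 0) for n in known)
        if known and support < min_support:
            flagged.append((i, word, "unusual_context"))
    return flagged
```

Each flagged index can then be treated as a correction section for the candidate generation step.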

Next, the present invention may define the non-fluent portion of the speech recognition result text as a correction section and generate, based on the word vector space, a correction vocabulary candidate group corresponding to the correction section (S30).

Here, as shown in FIG. 4, the present invention may define the non-fluent portion of the speech recognition result text as the correction section (S32), separate the correction section into morphemes (S34), preferentially extract from the word vector space the group of words that appear together with the other morpheme-separated portions (S36), and generate the correction vocabulary candidate group by restricting the candidates to the part-of-speech group expected for the correction section (S38).
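
Steps S36 and S38 can be sketched as a co-occurrence lookup followed by a part-of-speech filter. The function name, the `pos_of` lookup table, the tag labels, and the ranking by summed co-occurrence count are assumptions for illustration.

```python
from collections import Counter


def generate_candidates(context_tokens, space, pos_of, expected_pos, limit=5):
    """Rank words that co-occur with the surrounding context, filtered by POS.

    context_tokens: tokens around the correction section
    space:          word -> {neighbor: co-occurrence count}
    pos_of:         hypothetical word -> part-of-speech tag lookup
    expected_pos:   set of tags expected for the correction section
    """
    scores = Counter()
    for ctx in context_tokens:
        for word, count in space.get(ctx, {}).items():
            if word not in context_tokens and pos_of.get(word) in expected_pos:
                scores[word] += count
    return [word for word, _ in scores.most_common(limit)]
```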

The present invention may then verify the structural similarity and the contextual suitability of the words in the correction vocabulary candidate group, and select a final correction word candidate by calculating a similarity index through selective weighting between them (S40).

That is, as shown in FIG. 5, the present invention may verify the structural similarity of the words in the correction vocabulary candidate group (S42), verify their contextual suitability (S44), calculate the similarity index through selective weighting between the two (S46), and select the final correction word candidate based on the similarity index (S48).

As an example, when calculating the similarity index, the present invention may, under the constraint that the structural similarity weight and the contextual similarity weight sum to 1, calculate the similarity index step by step while selectively adjusting each weight, and extract the top N words from the similarity indices calculated at each step.

Here, as shown in FIG. 6, the present invention may calculate the similarity index using the formula: similarity index = (contextual suitability × context similarity weight) + (structural similarity × structure similarity weight).
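
Written symbolically, the formula reads as below. The symbols are introduced here for illustration and do not appear in the source: $S$ is the similarity index, $C$ the contextual suitability, $T$ the structural similarity, and $w_c$, $w_s$ the context and structure weights.

```latex
S = C \cdot w_c + T \cdot w_s,
\qquad w_c + w_s = 1,
\qquad 0 \le C,\ T \le 1
```

Because the weights sum to 1 and both inputs lie between 0 and 1, $S$ also lies between 0 and 1. As a worked example with hypothetical values $C = 0.8$, $T = 0.9$, and $w_c = 0.25$ (so $w_s = 0.75$), $S = 0.8 \times 0.25 + 0.9 \times 0.75 = 0.875$.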

In addition, when verifying the structural similarity of the words in the correction vocabulary candidate group, the present invention may define the edit distance between words as the minimum number of operations required to convert a word in the correction section into a word in the correction vocabulary candidate group, or vice versa, calculate the minimum edit distance at both the phoneme level and the syllable level, and verify the structural similarity of the candidate words based on the calculated minimum edit distance.

In this case, taking the interview environment into account, the present invention may exclude words containing internet slang, profanity, or abbreviations, and verify the structural similarity of the words in the correction vocabulary candidate group on the assumption that, for a word in the correction section and a candidate word having the same number of syllables, the structural similarity of the two words is inversely proportional to their edit distance.

In addition, when verifying the contextual suitability of the words in the correction vocabulary candidate group, the present invention may verify, from the interviewee's perspective (covering both spoken and written statements) and from the domain perspective, whether a contextually appropriate sentence results when the word in the correction section is replaced with a word from the correction vocabulary candidate group.

In addition, when verifying from the interviewee's perspective and the domain perspective, the present invention may perform the verification through topic modeling and sentiment analysis, calculate a contextual suitability index by selectively weighting each verification item, and apply the calculated contextual suitability index.

When selecting the final correction word candidate by calculating the similarity index, the present invention may quantify the structural similarity and the contextual suitability of each word in the correction vocabulary candidate group as a value between 0 and 1, and calculate from these values the similarity index used to extract the final correction word candidate.

Here, when calculating the similarity index, the present invention may, under the constraint that the structural similarity weight and the contextual similarity weight sum to 1, calculate the similarity index step by step while selectively adjusting each weight, extract the top N words from the similarity indices calculated at each step, and determine, among the top N words extracted at each step, the words that are also given a high similarity index at other steps to be the final correction word candidates.

Next, the present invention may determine whether the selected final correction word candidate is suitable (S50).

Next, if the selected final correction word candidate is suitable, the present invention may replace the correction section of the speech recognition result text with the final correction word candidate and output the result (S60).

If the selected final correction word candidate is not suitable, the present invention may output the original sentence of the speech recognition result text (S70).

The method according to an embodiment of the present invention described above may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a server, which is hardware.

The program described above may include code written in a computer language such as C, C++, JAVA, or machine language that the processor (CPU) of the computer can read through the device interface of the computer, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to functions that define the functionality needed to execute the methods, and may include control code related to the execution procedure needed for the processor of the computer to execute those functions in a predetermined order. Such code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer the additional information or media needed for the processor to execute the functions should be referenced. In addition, when the processor of the computer needs to communicate with another remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with the remote computer or server using the communication module of the computer, and what information or media to transmit and receive during communication.

The storage medium refers not to a medium that stores data for a short time, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specific examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored in various recording media on various servers that the computer can access, or in various recording media on the user's computer. The medium may also be distributed over network-connected computer systems so that computer-readable code is stored in a distributed manner.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

Claims

1. A speech recognition post-processing method of a speech recognition post-processing apparatus in an AI (Artificial Intelligence) interview environment, the method comprising:
acquiring interview data of an AI interviewee and generating a word vector space based on the interview data;
obtaining speech recognition result text of the AI interviewee and verifying non-fluency of the speech recognition result text based on the word vector space;
defining a non-fluent portion of the speech recognition result text as a correction section, and generating a correction vocabulary candidate group corresponding to the correction section based on the word vector space;
selecting a final correction word candidate by verifying structural similarity and contextual suitability of words belonging to the correction vocabulary candidate group, and calculating a similarity index through selective weighting between them;
determining whether the selected final correction word candidate is suitable; and
if the selected final correction word candidate is suitable, replacing the correction section of the speech recognition result text with the final correction word candidate and outputting the result,
wherein the selecting of the final correction word candidate by calculating the similarity index comprises
quantifying the structural similarity and the contextual suitability of the words of the correction vocabulary candidate group as values between 0 and 1, respectively, and calculating, based on those values, the similarity index for extracting the final correction word candidate.

2. The method of claim 1,
wherein the generating of the word vector space comprises
generating the word vector space based on self-introduction data among the interview data of the AI interviewee.

3. The method of claim 2,
wherein the verifying of the non-fluency of the speech recognition result text comprises,
when the speech recognition result text is obtained, separating the speech recognition result text into morphemes and classifying their parts of speech, mapping the result onto the word vector space to detect a non-fluent portion of the speech recognition result text, and determining that portion to be the correction section.

4. The method of claim 3,
wherein the generating of the correction vocabulary candidate group comprises
defining the non-fluent portion of the speech recognition result text as the correction section, separating the correction section into morphemes, preferentially extracting from the word vector space a group of words that appear together with the other morpheme-separated portions, and restricting the candidates to the part-of-speech group expected for the correction section to generate the correction vocabulary candidate group.

5. The method of claim 1,
wherein the verifying of the structural similarity of the words belonging to the correction vocabulary candidate group comprises
defining, as the edit distance between words, the minimum number of operations required for mutual conversion between the word in the correction section and the words belonging to the correction vocabulary candidate group, calculating the minimum edit distance between phonemes and between syllables, and verifying the structural similarity of the words belonging to the correction vocabulary candidate group based on the calculated minimum edit distance.

6. The method of claim 5,
wherein the verifying of the contextual suitability of the words belonging to the correction vocabulary candidate group comprises
verifying, from the interviewee's perspective, including spoken and written statements, and from the domain perspective, whether a contextually appropriate sentence is completed when the word in the correction section is replaced with a word belonging to the correction vocabulary candidate group.

7. (Deleted)

8. The method of claim 1,
wherein the determining of whether the selected final correction word candidate is suitable comprises
outputting the original sentence of the speech recognition result text if the selected final correction word candidate is not suitable.

9. A program stored in a computer-readable recording medium which, in combination with a computer that is hardware, executes the method of any one of claims 1, 2, 3, 4, 5, 6, and 8.

10. A speech recognition post-processing apparatus in an AI (Artificial Intelligence) interview environment, the apparatus comprising:
a word vector space generation unit that acquires interview data of an AI interviewee and generates a word vector space based on the interview data;
a non-fluency verification unit that obtains speech recognition result text of the AI interviewee and verifies non-fluency of the speech recognition result text based on the word vector space;
a correction vocabulary candidate group generation unit that defines a non-fluent portion of the speech recognition result text as a correction section and generates a correction vocabulary candidate group corresponding to the correction section based on the word vector space;
a structural similarity verification unit that verifies structural similarity of words belonging to the correction vocabulary candidate group;
a contextual suitability verification unit that verifies contextual suitability of the words belonging to the correction vocabulary candidate group;
a final correction word candidate selection unit that selects a final correction word candidate by calculating a similarity index through selective weighting of the structural similarity and the contextual suitability of the words belonging to the correction vocabulary candidate group;
a determination unit that determines whether the selected final correction word candidate is suitable; and
a result output unit that, if the selected final correction word candidate is suitable, replaces the correction section of the speech recognition result text with the final correction word candidate and outputs the result,
wherein the final correction word candidate selection unit
quantifies the structural similarity and the contextual suitability of the words of the correction vocabulary candidate group as values between 0 and 1, respectively, and calculates, based on those values, the similarity index for extracting the final correction word candidate.