KR20210076558A

KR20210076558A - Apparatus and Method for verifying the learning phrase quality of the AI service dialogue model

Info

Publication number: KR20210076558A
Application number: KR1020190167901A
Authority: KR
Inventors: 신광수; 김선희
Original assignee: 주식회사 엘지유플러스
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2021-06-24
Also published as: KR102356996B1

Abstract

The present invention relates to an apparatus and a method for verifying the quality of the learning phrases of an artificial intelligence service dialogue model. Among the learning phrases of such a model, it automatically identifies, in advance, similar utterance phrases: phrases that belong to different conversation intentions but are worded so similarly that they can cause confusion. Correcting those phrases improves intention classification accuracy and convenience. According to one aspect of the present invention, the apparatus comprises: an embedding unit that embeds one or more learning phrases for each input conversation intention; a similarity calculation unit that calculates the similarity between two embedded learning phrases having different conversation intentions; a cohesion calculation unit that calculates the cohesion of all embedded learning phrases within the same conversation intention; and a similar phrase extraction unit that selects and extracts a specific learning phrase within a conversation intention as a similar utterance phrase based on the calculated similarity and cohesion.

Description

Apparatus and Method for verifying the learning phrase quality of the AI service dialogue model

The present invention relates to a technique for verifying the quality of the learning phrases of an artificial intelligence service dialogue model.

In general, a dialogue model is trained on data such as existing conversation scenarios or consultation histories, which a model developer or service operator labels by conversation intention. Because a large number of learning phrases are used, similar utterance phrases that are difficult to tell apart across conversation intentions may exist, and such phrases can lower the success rate of conversation intention classification.

Therefore, regardless of whether a rule-based or a deep-learning-based method is used, the data preprocessing stage before training requires manually checking, removing, or correcting similar utterance phrases among the learning phrases of each conversation intention (intent). This manual work is time-consuming and prone to human error, so the accuracy and convenience of intention classification need to be improved through automation.

Korean Laid-Open Patent Publication No. 10-2011-0099434 (2011.09.08.)

The present invention is intended to solve the conventional problems described above. Its object is to provide an apparatus and a method for verifying the quality of the learning phrases of an artificial intelligence service dialogue model that automatically identify, in advance, similar utterance phrases among the learning phrases (phrases whose conversation intentions differ but whose wording is similar enough to cause confusion), so that correcting those phrases improves intention classification accuracy and convenience.

To achieve the above object, an apparatus for verifying the quality of the learning phrases of an artificial intelligence service dialogue model according to one aspect of the present invention may include: an embedding unit for embedding one or more learning phrases for each input conversation intention; a similarity calculation unit for calculating the similarity between two embedded learning phrases having different conversation intentions; a cohesion calculation unit for calculating the cohesion of all embedded learning phrases within the same conversation intention; and a similar phrase extraction unit for selecting and extracting a specific learning phrase within a conversation intention as a similar utterance phrase based on the calculated similarity and cohesion.

The embedding unit may receive the learning phrases in .CSV (Comma-Separated Value) format, the similarity calculation unit may calculate the similarity using a cosine similarity calculation method, and the cosine similarity calculation method may include the cosine similarity method of Scikit-learn.

The cohesion calculation unit may calculate the average similarity of all embedded learning phrases within the same conversation intention as the cohesion, and the similar phrase extraction unit may select and extract, as a similar utterance phrase, a learning phrase whose cohesion is lower than the corresponding criterion and whose similarity is higher than the corresponding criterion.

To achieve the above object, a method for verifying the quality of the learning phrases of an artificial intelligence service dialogue model according to another aspect of the present invention may include: (a) embedding one or more learning phrases for each input conversation intention; (b) calculating the similarity between two embedded learning phrases having different conversation intentions; (c) calculating the cohesion of all embedded learning phrases within the same conversation intention; and (d) selecting and extracting a specific learning phrase within a conversation intention as a similar utterance phrase based on the calculated similarity and cohesion.

In step (a), the learning phrases may be received in .CSV (Comma-Separated Value) format. In step (b), the similarity may be calculated using a cosine similarity calculation method, and the cosine similarity calculation method may include the cosine similarity method of Scikit-learn.

In step (c), the average similarity of all embedded learning phrases within the same conversation intention may be calculated as the cohesion. In step (d), a learning phrase whose cohesion is lower than the corresponding criterion and whose similarity is higher than the corresponding criterion may be selected and extracted as a similar utterance phrase. In step (a), the input learning phrases may be vectorized by embedding them through a TensorFlow Hub method.

According to still another aspect of the present invention for achieving the above object, a computer-readable recording medium may be provided on which a program for executing the above learning phrase quality verification method of the artificial intelligence service dialogue model on a computer is recorded.

According to still another aspect of the present invention for achieving the above object, an application stored in a computer-readable recording medium may be provided in order to execute the above learning phrase quality verification method of the artificial intelligence service dialogue model in combination with hardware.

According to still another aspect of the present invention for achieving the above object, a computer program stored in a computer-readable recording medium may be provided in order to execute the above learning phrase quality verification method of the artificial intelligence service dialogue model on a computer.

As described above, according to various aspects of the present invention, similar utterance phrases among the learning phrases of an artificial intelligence service dialogue model (phrases whose conversation intentions differ but whose wording is similar enough to cause confusion) are automatically identified in advance, and correcting those phrases improves intention classification accuracy and convenience.

That is, conventionally, when the intention classification success rate turned out to be low in the verification stage after a service or chatbot dialogue model had been created, all learning phrases were inspected manually, and the problematic sentences were reviewed, corrected, and fed back. According to the present invention, by contrast, the quality of the learning sentences is evaluated automatically before the dialogue model is created, so degradation of intention classification can be prevented in advance.

FIG. 1 is a block diagram of a learning phrase quality verification apparatus for an artificial intelligence service dialogue model according to an exemplary embodiment of the present invention;
FIG. 2 is an exemplary diagram of learning phrases for each conversation intention;
FIG. 3 is an exemplary diagram of similarity calculation;
FIG. 4 is an exemplary diagram of cohesion calculation; and
FIG. 5 is a flowchart of a learning phrase quality verification method for an artificial intelligence service dialogue model according to an exemplary embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, where a detailed description of a related known configuration or function is judged likely to obscure the gist of the present invention, that detailed description is omitted.

FIG. 1 is a block diagram of a learning phrase quality verification apparatus for an artificial intelligence service dialogue model according to an exemplary embodiment of the present invention. As shown in the figure, the apparatus may include an input unit (11), a preprocessing unit (12), an embedding unit (13), a similarity calculation unit (14), a cohesion calculation unit (15), and a similar phrase extraction unit (16).

The input unit (11) receives the learning phrases of the dialogue model. For example, when the person in charge of the dialogue model prepares the learning phrases organized by conversation intention (intent), as shown in FIG. 2, as a file in .CSV (Comma-Separated Value) format and uploads it to the system, the input unit (11) may process it as input.

The preprocessing unit (12) performs preprocessing, such as whitespace removal and stop-word removal, on the learning phrase data received through the input unit (11).
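The input and preprocessing steps can be sketched as follows. This is a minimal illustration rather than the patented implementation: the column names `intent` and `phrase`, the stop-word list, and the sample rows are all assumptions, since the source does not specify the file layout or which stop words are removed.

```python
import csv
import io
import re

# Hypothetical stop-word list; the source does not say which words are removed.
STOP_WORDS = {"the", "a", "is", "please"}


def preprocess(phrase: str) -> str:
    """Trim and collapse whitespace, then drop stop words from one phrase."""
    tokens = re.split(r"\s+", phrase.strip())
    kept = [t for t in tokens if t and t.lower() not in STOP_WORDS]
    return " ".join(kept)


def load_phrases(csv_text: str) -> dict[str, list[str]]:
    """Read (intent, phrase) rows and group the cleaned phrases by intent."""
    by_intent: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = preprocess(row["phrase"])
        if cleaned:  # skip rows that are empty after preprocessing
            by_intent.setdefault(row["intent"], []).append(cleaned)
    return by_intent


# Assumed layout: one header row, then one learning phrase per row.
SAMPLE_CSV = """intent,phrase
5G_Clear,  cancel the 5G   contract discount
5G_outlineReq,what is the 5G contract discount
"""

phrases_by_intent = load_phrases(SAMPLE_CSV)
```

Grouping by intent at load time keeps the later per-intent cohesion and cross-intent similarity steps simple, because each only needs to iterate over this one mapping.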

The embedding unit (13) embeds one or more learning phrases for each conversation intention from the input in .CSV (Comma-Separated Value) format, and may vectorize the input learning phrases by embedding them through a TensorFlow Hub method.

For example, the embedding unit (13) may embed each learning phrase as a corresponding vector value, as shown in Table 1 below.

Table 1
Learning phrase | Embedding vector value
5G contract discount cancellation | [-0.03649221 0.02498418 -0.03456857 0.02827227 0.00471277]
What is the 5G contract discount | [-0.02732556 -0.00821852 -0.00794602 0.06356855 -0.03726532]
Phone Care Plus compensation inquiry | [-0.01732556 -0.00821852 -0.00494602 0.06357855 -0.01726532]

The similarity calculation unit (14) calculates, as illustrated in FIG. 3, the similarity between two learning phrases that have different conversation intentions among the learning phrases embedded by the embedding unit (13). For example, the similarity may be calculated using a cosine similarity calculation method, and the cosine similarity calculation method may include the cosine similarity method of Scikit-learn. The larger the cosine value, the higher the similarity, and the similarity value may be defined as a value between 0 and 1.
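As a rough sketch of the calculation, cosine similarity is the dot product of two vectors divided by the product of their lengths. The pure-Python function below is written for illustration only. Scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity for rows of matrices. The two input vectors are the five-value vectors printed in Table 1, so the result is not expected to reproduce the values of FIG. 3.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)


# First two embedding vectors as printed in Table 1.
vec_a = [-0.03649221, 0.02498418, -0.03456857, 0.02827227, 0.00471277]
vec_b = [-0.02732556, -0.00821852, -0.00794602, 0.06356855, -0.03726532]

sim_ab = cosine_similarity(vec_a, vec_b)
```

Cosine similarity in general ranges from -1 to 1; the 0-to-1 range stated in the description would hold, for instance, when no pair of embedding vectors points in opposing directions.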

As an example of the similarity values calculated by the similarity calculation unit (14), as illustrated in FIG. 3, the similarity between the learning phrase '5G contract termination' belonging to conversation intention 1 '5G_Clear' and the learning phrase 'What is the 5G contract discount' belonging to conversation intention 2 '5G_outlineReq' is 0.764919, and the similarity between the learning phrase '5G contract discount cancellation' belonging to conversation intention 1 '5G_Clear' and the learning phrase '5G contract discount' belonging to conversation intention 2 '5G_outlineReq' is 0.847193. In this way, each learning phrase of conversation intention 1 is paired with each learning phrase of conversation intention 2, and the similarity between the two paired learning phrases is calculated using the cosine similarity calculation method.
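The pairing described above can be sketched as a loop over every pair of distinct intents and, within each pair, over every combination of one phrase from each side. The function name `cross_intent_similarities`, the data layout, and the two-dimensional vectors are hypothetical and chosen only for readability; they are not the embeddings behind FIG. 3.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_intent_similarities(embedded):
    """Score every phrase pair whose two phrases come from different intents.

    `embedded` maps intent -> {phrase: vector}.
    """
    scores = []
    for intent_a, intent_b in combinations(embedded, 2):
        pairs = product(embedded[intent_a].items(), embedded[intent_b].items())
        for (phrase_a, vec_a), (phrase_b, vec_b) in pairs:
            scores.append(
                (intent_a, phrase_a, intent_b, phrase_b, cosine(vec_a, vec_b))
            )
    return scores


# Hypothetical 2-D vectors, used only to keep the example small.
embedded = {
    "5G_Clear": {
        "5G contract termination": [0.9, 0.1],
        "5G contract discount cancellation": [0.6, 0.8],
    },
    "5G_outlineReq": {
        "What is the 5G contract discount": [0.5, 0.9],
        "5G contract discount": [0.7, 0.7],
    },
}

scores = cross_intent_similarities(embedded)
```

Phrases within the same intent are never compared here; that comparison belongs to the cohesion calculation.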

The cohesion calculation unit (15) calculates the cohesion of all embedded learning phrases within the same conversation intention. For example, it may calculate the average similarity of all embedded learning phrases within the same conversation intention as the cohesion.

As an example of the cohesion values calculated by the cohesion calculation unit (15), as illustrated in FIG. 4, the cohesion of the learning phrases belonging to conversation intention 1 '5G_Clear' may be calculated as 0.217600, and the cohesion of the learning phrases belonging to conversation intention 2 '5G_outlineReq' as 0.388721.
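One way to read "average similarity of all learning phrases within the same intention" is the mean cosine similarity over every distinct pair of phrases in that intent. The sketch below takes that reading as an assumption; the source does not say whether the average is over pairs, against a centroid, or something else. The handling of a single-phrase intent is likewise an assumption.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cohesion(vectors):
    """Mean pairwise cosine similarity of the vectors of one intent."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: an intent with one phrase counts as fully cohesive
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical vectors: one tight intent and one loose intent.
tight_intent = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
loose_intent = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

tight_cohesion = cohesion(tight_intent)
loose_cohesion = cohesion(loose_intent)
```

A low value indicates that the phrases collected under one intent are spread out in the embedding space, which is the condition the extraction step checks for.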

The similar phrase extraction unit (16) selects and extracts a specific learning phrase within a conversation intention as a similar utterance phrase, based on the similarity calculated by the similarity calculation unit (14) and the cohesion calculated by the cohesion calculation unit (15). For example, it may select and extract, as a similar utterance phrase, a learning phrase whose cohesion is lower than the corresponding criterion and whose similarity is higher than the corresponding criterion.

An example of a similar utterance phrase extracted by the similar phrase extraction unit (16) is described with reference to the similarity calculation example of FIG. 3 and the cohesion calculation example of FIG. 4. Within the '5G_Clear' conversation intention, whose cohesion is relatively low (in practice, lower than a preset criterion), the phrase '5G contract discount cancellation', a learning phrase whose similarity is relatively high (in practice, higher than a preset criterion), may be selected and extracted as a similar utterance phrase.

As described above, the similar phrase extraction unit (16) may identify confusing learning phrases (or sentences), exclude them from the corresponding conversation intention (intent), and write them to a separate file.
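The selection rule can be sketched with the numbers quoted from FIG. 3 and FIG. 4. The threshold values `COHESION_MAX` and `SIMILARITY_MIN` are hypothetical, since the source only says that preset criteria exist, and the function name and record layout are likewise illustrative.

```python
# Cohesion per intent, as quoted from FIG. 4.
cohesion_by_intent = {"5G_Clear": 0.217600, "5G_outlineReq": 0.388721}

# Cross-intent similarity records, as quoted from FIG. 3:
# (intent A, phrase A, intent B, phrase B, similarity)
cross_pairs = [
    ("5G_Clear", "5G contract termination",
     "5G_outlineReq", "What is the 5G contract discount", 0.764919),
    ("5G_Clear", "5G contract discount cancellation",
     "5G_outlineReq", "5G contract discount", 0.847193),
]

# Hypothetical thresholds; the source does not give concrete values.
COHESION_MAX = 0.3
SIMILARITY_MIN = 0.8


def extract_similar_phrases(cohesion, pairs, cohesion_max, similarity_min):
    """Flag phrases in low-cohesion intents that closely match another intent."""
    flagged = set()
    for intent_a, phrase_a, intent_b, phrase_b, sim in pairs:
        if sim <= similarity_min:
            continue  # pair is not similar enough to be confusing
        for intent, phrase in ((intent_a, phrase_a), (intent_b, phrase_b)):
            if cohesion[intent] < cohesion_max:
                flagged.add((intent, phrase))
    return sorted(flagged)


flagged = extract_similar_phrases(
    cohesion_by_intent, cross_pairs, COHESION_MAX, SIMILARITY_MIN
)
```

With these assumed thresholds, only '5G contract discount cancellation' in '5G_Clear' is flagged, which matches the example in the description. The flagged list could then be written to a separate file and removed from the training set.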

Accordingly, in the data preprocessing stage, the person in charge of operating the model can use the apparatus of the present invention to remove in advance sentences that could lower the intention classification success rate, and can later correct those sentences and add them back.

FIG. 5 is a flowchart of a learning phrase quality verification method for an artificial intelligence service dialogue model according to an exemplary embodiment of the present invention. Because the method is applied to the apparatus of FIG. 1, it is described together with the operation of that apparatus.

First, the input unit (11) receives the learning phrases organized by conversation intention (intent), as shown in FIG. 2, in .CSV (Comma-Separated Value) format (S501), and the preprocessing unit (12) performs preprocessing, such as whitespace removal and stop-word removal, on the input learning phrase data (S503).

Next, the embedding unit (13) takes the learning phrases for each conversation intention, received in .CSV (Comma-Separated Value) format in step S501 and preprocessed in step S503, and vectorizes them as in Table 1 by embedding them, for example, through a TensorFlow Hub method (S505). The similarity calculation unit (14) then calculates, as illustrated in FIG. 3, the similarity between two learning phrases that have different conversation intentions among the learning phrases embedded by the embedding unit (13) in step S505. As described above, the similarity may be calculated using a cosine similarity calculation method, and the cosine similarity calculation method may include the cosine similarity method of Scikit-learn. The larger the cosine value, the higher the similarity, and the similarity value may be defined as a value between 0 and 1 (S507).

Next, the cohesion calculation unit (15) calculates the cohesion of all embedded learning phrases within the same conversation intention. For example, it may calculate the average similarity of all embedded learning phrases within the same conversation intention and define that average similarity value as the cohesion value. An example of the cohesion calculation is shown in FIG. 4 (S509).

Finally, the similar phrase extraction unit (16) selects and extracts a specific learning phrase within a conversation intention as a similar utterance phrase, based on the similarity calculated by the similarity calculation unit (14) in step S507 and the cohesion calculated by the cohesion calculation unit (15) in step S509. It may select and extract, as a similar utterance phrase, a learning phrase whose cohesion is lower than the corresponding criterion and whose similarity is higher than the corresponding criterion. For example, among the learning phrases in the '5G_Clear' conversation intention, whose cohesion is relatively low in the cohesion example of FIG. 4 (in practice, lower than a preset criterion), the phrase '5G contract discount cancellation', whose similarity is relatively high in the similarity example of FIG. 3 (in practice, higher than a preset criterion), may be selected and extracted as a similar utterance phrase (S511).

As described above, the method of the present invention can identify similar utterance phrases (or sentences), exclude them from the corresponding conversation intention (intent), and write them to a separate file. Accordingly, in the data preprocessing stage, the person in charge of operating the model can use the method of the present invention to remove in advance sentences that could lower the intention classification success rate, and can later correct those sentences and add them back.
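Steps S505 to S511 can be strung together in a short end-to-end sketch. Several parts are stand-ins and assumptions rather than the method of the source: the embedding is a simple bag-of-words count vector used in place of a TensorFlow Hub model, cohesion is taken as the mean pairwise similarity, and the phrases and threshold values are invented for illustration.

```python
import math
from itertools import combinations


def embed_all(phrases_by_intent):
    """Stand-in for S505: bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted(
        {w for ps in phrases_by_intent.values() for p in ps for w in p.lower().split()}
    )
    index = {w: i for i, w in enumerate(vocab)}
    embedded = {}
    for intent, phrases in phrases_by_intent.items():
        embedded[intent] = {}
        for phrase in phrases:
            vec = [0.0] * len(vocab)
            for word in phrase.lower().split():
                vec[index[word]] += 1.0
            embedded[intent][phrase] = vec
    return embedded


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def verify(phrases_by_intent, cohesion_max, similarity_min):
    """Return (intent, phrase) pairs flagged as similar utterance phrases."""
    embedded = embed_all(phrases_by_intent)  # S505
    cohesion = {}
    for intent, vecs in embedded.items():  # S509: mean pairwise similarity
        pairs = list(combinations(vecs.values(), 2))
        cohesion[intent] = (
            sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 1.0
        )
    flagged = set()
    for a, b in combinations(embedded, 2):  # S507: cross-intent pairs
        for phrase_a, vec_a in embedded[a].items():
            for phrase_b, vec_b in embedded[b].items():
                if cosine(vec_a, vec_b) > similarity_min:  # S511
                    if cohesion[a] < cohesion_max:
                        flagged.add((a, phrase_a))
                    if cohesion[b] < cohesion_max:
                        flagged.add((b, phrase_b))
    return sorted(flagged)


# Invented phrases and thresholds, for illustration only.
training_phrases = {
    "cancel": ["cancel 5g contract discount", "stop my plan"],
    "info": ["what is 5g contract discount", "explain 5g contract discount"],
}
flagged = verify(training_phrases, cohesion_max=0.5, similarity_min=0.7)
```

In this toy data, the "cancel" intent has low cohesion and one of its phrases overlaps heavily with an "info" phrase, so that phrase is the one flagged for review.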

Meanwhile, based on the learning phrase quality verification method of the artificial intelligence service dialogue model described above, a computer-readable recording medium can be implemented on which a program for executing the method on a computer is recorded.

In addition, based on the learning phrase quality verification method of the artificial intelligence service dialogue model described above, an application stored in a computer-readable recording medium can be implemented in order to execute the method in combination with hardware.

Furthermore, based on the learning phrase quality verification method of the artificial intelligence service dialogue model described above, a computer program stored in a computer-readable recording medium can be implemented in order to execute the method on a computer.

For example, as described above, the learning phrase quality verification method of the artificial intelligence service dialogue model according to an exemplary embodiment of the present invention may be implemented as a computer-readable recording medium containing program instructions for performing various computer-implemented operations, or as an application stored in such a recording medium. The computer-readable recording medium may contain program instructions, local data files, local data structures, and the like, alone or in combination. The recording medium may be specially designed and configured for the embodiments of the present invention, or may be of a kind known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the rights of the present invention.

11: input unit
12: preprocessing unit
13: embedding unit
14: similarity calculation unit
15: cohesion calculation unit
16: similar phrase extraction unit

Claims

1. A learning phrase quality verification apparatus for an artificial intelligence service dialogue model, comprising:
an embedding unit for embedding one or more learning phrases for each input conversation intention;
a similarity calculation unit for calculating a similarity between two embedded learning phrases having different conversation intentions;
a cohesion calculation unit for calculating a cohesion of all embedded learning phrases within the same conversation intention; and
a similar phrase extraction unit for selecting and extracting a specific learning phrase within a conversation intention as a similar utterance phrase based on the calculated similarity and cohesion.

2. The apparatus of claim 1, wherein the embedding unit receives the learning phrases in .CSV (Comma-Separated Value) format.

3. The apparatus of claim 1, wherein the similarity calculation unit calculates the similarity using a cosine similarity calculation method.

4. The apparatus of claim 3, wherein the cosine similarity calculation method includes the cosine similarity method of Scikit-learn.

5. The apparatus of claim 1, wherein the cohesion calculation unit calculates the average similarity of all embedded learning phrases within the same conversation intention as the cohesion.

6. The apparatus of claim 1, wherein the similar phrase extraction unit selects and extracts, as the similar utterance phrase, a learning phrase whose cohesion is lower than a corresponding criterion and whose similarity is higher than a corresponding criterion.

7. The apparatus of claim 1, wherein the embedding unit embeds the input learning phrases through a TensorFlow Hub method.

8. A learning phrase quality verification method for an artificial intelligence service dialogue model, comprising:
(a) embedding one or more learning phrases for each input conversation intention;
(b) calculating a similarity between two embedded learning phrases having different conversation intentions;
(c) calculating a cohesion of all embedded learning phrases within the same conversation intention; and
(d) selecting and extracting a specific learning phrase within a conversation intention as a similar utterance phrase based on the calculated similarity and cohesion.

9. The method of claim 8, wherein in step (a) the learning phrases are received in .CSV (Comma-Separated Value) format.

10. The method of claim 8, wherein in step (b) the similarity is calculated using a cosine similarity calculation method.

11. The method of claim 10, wherein the cosine similarity calculation method includes the cosine similarity method of Scikit-learn.

12. The method of claim 8, wherein in step (c) the average similarity of all embedded learning phrases within the same conversation intention is calculated as the cohesion.

13. The method of claim 8, wherein in step (d) a learning phrase whose cohesion is lower than a corresponding criterion and whose similarity is higher than a corresponding criterion is selected and extracted as the similar utterance phrase.

14. The method of claim 8, wherein in step (a) the input learning phrases are vectorized by embedding them through a TensorFlow Hub method.

15. A computer-readable recording medium on which a program for executing, on a computer, the learning phrase quality verification method for an artificial intelligence service dialogue model of any one of claims 8 to 14 is recorded.

16. An application stored in a computer-readable recording medium in order to execute, in combination with hardware, the learning phrase quality verification method for an artificial intelligence service dialogue model of any one of claims 8 to 14.

17. A computer program stored in a computer-readable recording medium in order to execute, on a computer, the learning phrase quality verification method for an artificial intelligence service dialogue model of any one of claims 8 to 14.