KR102583434B1

KR102583434B1 - Method and system for evaluating quality of voice counseling

Info

Publication number: KR102583434B1
Application number: KR1020210099202A
Authority: KR
Inventors: 이건수; 김찬호; 김기원; 신종호
Original assignee: 주식회사 씨앤에이아이
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2023-09-27
Also published as: KR20230017554A

Abstract

The present invention relates to a method for evaluating the quality of voice consultations. The method includes: when a voice consultation has been conducted, receiving recording data associated with the voice consultation; generating, based on the received recording data, text data corresponding to the voices of the counselor and the customer over time; dividing the generated text data into sentence units to generate a plurality of text sentences; generating a mel spectrogram associated with the recording data and using the generated mel spectrogram to verify the accuracy of the generated text sentences; and evaluating the quality of the voice consultation using the verified text sentences according to the time sections of the voice consultation and their timing.

Description

Quality evaluation method and system for voice counseling {METHOD AND SYSTEM FOR EVALUATING QUALITY OF VOICE COUNSELING}

The present invention relates to a quality evaluation method and system for voice consultations and, more specifically, to a method and system that convert speech into text and evaluate the quality of the voice consultation from that text.

With the arrival of the untact (contact-free) era, the online industry is growing ever faster. As it grows, customer management has become more important, and so has the contact center that sits at the point of contact with customers. Contact-center work divides broadly into counseling performed by counselors and management of those counselors. As part of counselor management, periodically evaluating the quality of each counselor's consultations is essential for customer management.

Consultation quality evaluation can be performed by training instructors. However, because there are far fewer instructors than counselors, it is impossible for instructors to review every counselor's calls and evaluate them. A technique for automating the quality evaluation of voice consultations with high accuracy is therefore needed.

To solve the problems described above, the present invention provides a method for evaluating the quality of voice consultations, a computer program stored in a recording medium, and a system (device).

The present invention may be implemented in various ways, including as a method, a system (device), or a computer program stored in a readable storage medium.

According to an embodiment of the present invention, a method for evaluating the quality of a voice consultation, performed by at least one processor, includes: when a voice consultation has been conducted, receiving recording data associated with the voice consultation; generating, based on the received recording data, text data corresponding to the voices of the counselor and the customer over time; dividing the generated text data into sentence units to generate a plurality of text sentences; extracting silent sections contained in the recording data and verifying the accuracy of the generated text sentences based on the extracted silent sections; and evaluating the quality of the voice consultation using the verified text sentences according to the time sections of the voice consultation and their timing.

According to an embodiment of the present invention, extracting the silent sections contained in the recording data and verifying the accuracy of the text sentences based on them includes generating a mel spectrogram associated with the recording data and extracting the silent sections using the generated mel spectrogram.
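
The silent-section extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the mel spectrogram has already been computed as a frames-by-mel-bands list of power values (for example, the transpose of `librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop)`, with `hop_seconds = hop / sr`), and the parameters `top_db` and `min_silence` are illustrative values that the document does not specify.

```python
import math
from typing import List, Tuple


def silent_intervals(
    mel_power: List[List[float]],
    hop_seconds: float,
    top_db: float = 40.0,
    min_silence: float = 0.3,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of quiet frames.

    mel_power is a frames-by-mel-bands matrix of power values.
    """
    if not mel_power:
        return []
    # Total mel-band power per frame, in decibels (the epsilon avoids log of zero).
    frame_db = [10.0 * math.log10(sum(frame) + 1e-10) for frame in mel_power]
    # A frame counts as silent if it is more than top_db below the loudest frame.
    threshold = max(frame_db) - top_db

    intervals: List[Tuple[float, float]] = []
    run_start = None
    # The +inf sentinel is never silent, so it closes a run that reaches the end.
    for i, db in enumerate(frame_db + [math.inf]):
        if db < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * hop_seconds, i * hop_seconds
            if end - start >= min_silence:  # ignore very short pauses
                intervals.append((start, end))
            run_start = None
    return intervals
```

Summing the mel bands per frame reduces the spectrogram to a loudness curve; thresholding relative to the loudest frame, rather than at an absolute level, keeps the sketch insensitive to the overall recording gain.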

According to an embodiment of the present invention, evaluating the quality of the voice consultation includes: extracting, from the plurality of text sentences, a first set of text sentences that fall within a specific time range; dividing the extracted first set of text sentences into a plurality of morphemes; and determining whether the resulting morphemes include a greeting keyword associated with a greeting.
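
A minimal sketch of the opening-greeting check, under stated assumptions: the `Sentence` record, the 10-second window, and `GREETING_KEYWORDS` are hypothetical, and whitespace tokenization stands in for the Korean morphological analyzer that the method calls for.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


# Hypothetical greeting keywords; a real system would use its own call script.
GREETING_KEYWORDS = {"hello", "thank", "thanks", "welcome"}


def morphemes(text: str) -> List[str]:
    # Stand-in for a Korean morphological analyzer: split on spaces,
    # strip surrounding punctuation, and lowercase.
    return [w.strip(".,!?").lower() for w in text.split()]


def has_opening_greeting(sentences: List[Sentence], window: float = 10.0) -> bool:
    # First set: sentences that begin within the opening time window.
    first_set = [s for s in sentences if s.start < window]
    tokens = {m for s in first_set for m in morphemes(s.text)}
    return not tokens.isdisjoint(GREETING_KEYWORDS)
```

The claim does not restrict the first set by speaker; filtering it to counselor sentences would be a simple refinement.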

According to an embodiment of the present invention, evaluating the quality of the voice consultation includes: dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; and counting the number of times at least part of the second set overlaps in time with at least part of the third set.
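
One way to count the overlaps described above is to treat each text sentence as a time interval and count the counselor/customer pairs whose intervals intersect. The `Sentence` record is a hypothetical representation of a timed text sentence, not a structure defined by the document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


def count_overlaps(sentences: List[Sentence]) -> int:
    second_set = [s for s in sentences if s.speaker == "counselor"]
    third_set = [s for s in sentences if s.speaker == "customer"]
    overlaps = 0
    for a in second_set:
        for b in third_set:
            # Two time intervals overlap when each one starts before the other ends.
            if a.start < b.end and b.start < a.end:
                overlaps += 1
    return overlaps
```

Sentences that merely touch (one ends exactly when the other starts) are not counted, so normal turn-taking does not count against the counselor.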

According to an embodiment of the present invention, evaluating the quality of the voice consultation includes: extracting, based on the time sections of the voice consultation and the text sentences over time, sections in which silence (muting) lasted for a specific duration; determining, among the plurality of text sentences, the text sentence preceding each such section; and determining whether that preceding text sentence contains a hold-request keyword.
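
The silence check can be sketched as follows, assuming each text sentence carries start and end times. The `Sentence` record, the gap threshold `min_gap`, and the English `HOLD_KEYWORDS` are hypothetical; the sketch returns only those silent sections whose preceding sentence contains no hold-request keyword.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


HOLD_KEYWORDS = {"hold", "moment", "wait"}  # hypothetical hold-request keywords


def unannounced_silences(
    sentences: List[Sentence], min_gap: float = 5.0
) -> List[Tuple[float, float]]:
    ordered = sorted(sentences, key=lambda s: s.start)
    flagged: List[Tuple[float, float]] = []
    if not ordered:
        return flagged
    last = ordered[0]  # the sentence that has ended most recently so far
    for cur in ordered[1:]:
        gap = cur.start - last.end
        if gap >= min_gap:
            words = {w.strip(".,!?").lower() for w in last.text.split()}
            if words.isdisjoint(HOLD_KEYWORDS):
                flagged.append((last.end, cur.start))  # silence without a hold request
        if cur.end > last.end:
            last = cur
    return flagged
```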

According to an embodiment of the present invention, evaluating the quality of the voice consultation includes: dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; extracting, from the third set, a personal-information text sentence containing the customer's personal information; extracting, from the second set, the text sentence immediately following the extracted personal-information text sentence; and determining whether that following text sentence contains a keyword corresponding to the customer's personal information.
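
A minimal sketch of the personal-information repeat-back check. It assumes that personal information means digit strings such as phone or account numbers, matched by a hypothetical regular expression, and that the counselor's next sentence counts as a confirmation if it repeats the same digits; the method itself leaves both the personal-information detector and the keyword matching open.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


# Hypothetical pattern for phone- or account-number-like digit strings.
PERSONAL_INFO = re.compile(r"\d[\d-]{5,}\d")


def digits_only(text: str) -> str:
    return re.sub(r"\D", "", text)


def repeat_back_checks(sentences: List[Sentence]) -> List[bool]:
    """One result per customer sentence containing personal information."""
    ordered = sorted(sentences, key=lambda s: s.start)
    results = []
    for i, s in enumerate(ordered):
        if s.speaker != "customer":
            continue
        found = PERSONAL_INFO.findall(s.text)
        if not found:
            continue
        # The counselor's first sentence after the personal-information sentence.
        reply = next((r for r in ordered[i + 1:] if r.speaker == "counselor"), None)
        confirmed = reply is not None and all(
            digits_only(f) in digits_only(reply.text) for f in found
        )
        results.append(confirmed)
    return results
```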

According to an embodiment of the present invention, evaluating the quality of the voice consultation includes: extracting a fourth set of text sentences consisting of a specific number of the last text sentences among the plurality of text sentences; and determining, using a named entity recognition algorithm, whether the fourth set contains information associated with the counselor.
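
The closing-greeting check can be sketched as follows. The method calls for a named entity recognition algorithm; this sketch substitutes a lookup of the counselor's known name (a hypothetical `counselor_name` parameter) among the last `last_n` sentences, which is enough to show how the fourth set is formed and tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


def closing_mentions_counselor(
    sentences: List[Sentence], counselor_name: str, last_n: int = 3
) -> bool:
    if last_n <= 0:
        return False
    ordered = sorted(sentences, key=lambda s: s.start)
    fourth_set = ordered[-last_n:]  # the last last_n sentences of the call
    target = counselor_name.lower()
    # Stand-in for NER: look for the counselor's name as a whole token.
    return any(
        target in {w.strip(".,!?").lower() for w in s.text.split()}
        for s in fourth_set
    )
```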

According to an embodiment of the present invention, the method further includes generating a quality evaluation report by visualizing the result data of the quality evaluation for each evaluation item.

A computer program stored in a computer-readable recording medium is provided to execute, on a computer, the voice consultation quality evaluation method according to an embodiment of the present invention described above.

A voice consultation quality evaluation system according to an embodiment of the present invention includes: a text conversion unit that, when a voice consultation has been conducted, receives recording data associated with the voice consultation and generates, based on the received recording data, text data corresponding to the voices of the counselor and the customer over time; a sentence generation unit that divides the generated text data into sentence units to generate a plurality of text sentences; a sentence verification unit that extracts silent sections contained in the recording data and verifies the accuracy of the generated text sentences based on the extracted silent sections; and a quality evaluation unit that evaluates the quality of the voice consultation using the verified text sentences according to the time sections of the voice consultation and their timing.

In various embodiments of the present invention, the quality of voice consultations can be evaluated effectively without the user having to listen to every voice consultation conducted by the counselors.

In various embodiments of the present invention, counselors can be trained efficiently using an automatically generated quality evaluation report, without preparing separate documents for improving their counseling skills or for additional training.

In various embodiments of the present invention, the processor can easily determine whether a greeting with a predetermined composition was spoken within a certain time after the voice consultation began.

In various embodiments of the present invention, the processor can recognize, from the text sentences, sections in which the voices of the counselor and the customer overlap, and thereby effectively evaluate the counselor's listening skills.

In various embodiments of the present invention, the processor can evaluate consultation quality by effectively extracting, from among the silent sections that occur during a voice consultation, only those not preceded by a hold request.

In various embodiments of the present invention, when the customer states personal information, the processor can easily recognize whether the counselor repeated it back to reconfirm its accuracy.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the claims by a person having ordinary skill in the art to which the present invention pertains (a “person skilled in the art”).

Embodiments of the present invention will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but the embodiments are not limited thereto.
Figure 1 is a diagram illustrating an example in which quality evaluation of a voice consultation is performed according to an embodiment of the present invention.
Figure 2 is a functional block diagram showing the internal configuration of a voice consultation quality evaluation system according to an embodiment of the present invention.
Figure 3 is a diagram showing an example of generating a Mel spectrogram according to an embodiment of the present invention.
Figure 4 is a flowchart showing an example of a method for evaluating the quality of voice consultation according to an embodiment of the present invention.
Figure 5 is a flowchart showing an example of an introduction evaluation method according to an embodiment of the present invention.
Figure 6 is a flowchart showing an example of a listening ability evaluation method according to an embodiment of the present invention.
Figure 7 is a flowchart showing an example of a silence evaluation method according to an embodiment of the present invention.
Figure 8 is a flowchart showing an example of an information verification evaluation method according to an embodiment of the present invention.
Figure 9 is a flowchart showing an example of a closing greeting evaluation method according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the present invention will be described with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the following embodiments, duplicate descriptions of identical or corresponding components may be omitted. However, omitting the description of a component does not mean that the component is excluded from any embodiment.

Advantages and features of the disclosed embodiments, and methods of achieving them, will become clear with reference to the embodiments described below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those skilled in the art.

The terms used in this specification are briefly explained below, followed by a detailed description of the disclosed embodiments. The terms were chosen, as far as possible, from general terms in current wide use, taking into account their function in the present invention; however, they may vary with the intent of practitioners in the field, precedent, the emergence of new technologies, and so on. In certain cases the applicant has selected terms arbitrarily, and their meaning is then described in detail in the relevant part of the description. The terms used in the present invention should therefore be defined based on their meaning and the overall content of the present invention, not simply on their names.

In this specification, singular expressions include the plural unless the context clearly specifies the singular, and plural expressions include the singular unless the context clearly specifies the plural. Throughout the specification, when a part is said to include a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In the present invention, terms such as “comprise” and “comprising” may indicate the presence of features, steps, operations, elements, and/or components, but do not preclude the addition of one or more other functions, steps, operations, elements, components, and/or combinations thereof.

In the present invention, when a particular component is said to be “coupled to,” “combined with,” “connected to,” or “reacting with” another component, it may be directly coupled to, combined with, connected to, or reacting with that component, but is not limited thereto; for example, one or more intermediate components may exist between them. In addition, “and/or” in the present invention may include each of one or more listed items or a combination of at least some of them.

In the present invention, terms such as “first” and “second” are used to distinguish a particular component from other components, and the components are not limited by these terms. For example, a “first” component may be an element of the same or similar form as a “second” component.

In the present invention, 'recording data' is voice data obtained by recording a voice consultation between a counselor and a customer, and may include information such as the volume (amplitude) and pitch of the counselor's and customer's voices over time.

In the present invention, a 'spectrogram' may refer to an image that represents sound in terms of its principal features, showing how the intensity of the voice is distributed across frequencies over time. A mel spectrogram may refer to a spectrogram that represents the voice in the frequency domain with the frequency axis mapped onto the mel scale, which approximates human pitch perception.

Figure 1 is a diagram illustrating an example in which quality evaluation of a voice consultation is performed according to an embodiment of the present invention. According to an embodiment, a counselor 110 and a customer 120 may conduct a voice consultation. The voice consultation recording server 130 may record the voices of the counselor 110 and the customer 120 in real time during the consultation. For example, the recording server 130 may record the voice of the counselor 110 and the voice of the customer 120 separately, or it may record the consultation as a whole and then separate it into the voice of the counselor 110 and the voice of the customer 120.

Recording data acquired by the voice consultation recording server 130 may be stored and managed in the recording DB 140. For example, the recording data may be stored in the recording DB 140 separately for each counselor who conducted the consultation or for each customer, but is not limited thereto. In another example, the recording data may be stored in the recording DB 140 matched with the names of the counselor and the customer.

The voice consultation quality evaluation system 150 may receive or extract the recording data 142 stored in the recording DB 140 and use it to evaluate the quality of the voice consultation conducted by the counselor 110. For example, the quality evaluation may include, but is not limited to, an introduction evaluation, a listening-skill evaluation, a silence evaluation, an information-confirmation evaluation, and a closing-greeting evaluation, and may further include additional evaluation using bonus and penalty items. For example, bonus points may be awarded if the evaluation determines that the counselor handled the consultation courteously, or if the counselor used cushion words (softening expressions such as “I'm sorry, but…”) n or more times during the consultation, where n is a natural number. Conversely, points may be deducted if the counselor is determined to have handled the consultation discourteously, or if the recording of the consultation history falls short of certain standards.
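
The cushion-word bonus item might be scored as below. The phrase list, the threshold `n`, and the point value are hypothetical; in practice the phrases would be Korean expressions, and the result would be combined with the other bonus and penalty items.

```python
from typing import List

# Hypothetical cushion phrases (softening expressions).
CUSHION_PHRASES = ("i'm sorry, but", "excuse me, but", "if you don't mind")


def cushion_bonus(counselor_texts: List[str], n: int = 3, points: int = 1) -> int:
    """Award `points` if cushion phrases appear at least n times in total."""
    uses = sum(
        text.lower().count(phrase)
        for text in counselor_texts
        for phrase in CUSHION_PHRASES
    )
    return points if uses >= n else 0
```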

For the quality evaluation, the voice consultation quality evaluation system 150 may generate, based on the received recording data 142, text data corresponding to the voices of the counselor 110 and the customer 120 over time. Any STT (Speech-To-Text) algorithm or machine learning model may be used to convert the speech into text. The generated text data can be divided into text data associated with the utterances of the counselor 110 and text data associated with the utterances of the customer 120. Because the text data is generated to follow the speech over time, the order in which the counselor 110 and the customer 120 spoke during the consultation is preserved in the text data. In other words, the text data may include time information associated with the voice consultation.
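
If the recording server stores the counselor and customer channels separately, time-ordered text data of this kind can be produced by tagging each channel's STT results with a speaker and merging them by start time. The `(text, start, end)` tuple format and the `merge_channels` function are assumptions for illustration, not the output format of any particular STT engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float  # seconds from the start of the call
    end: float


def merge_channels(
    counselor: List[Tuple[str, float, float]],
    customer: List[Tuple[str, float, float]],
) -> List[Utterance]:
    """Merge per-channel (text, start, end) STT results into one timeline."""
    merged = [Utterance("counselor", t, s, e) for t, s, e in counselor]
    merged += [Utterance("customer", t, s, e) for t, s, e in customer]
    # Sorting by start time restores the order in which the two parties spoke.
    return sorted(merged, key=lambda u: (u.start, u.end))
```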

According to an embodiment, the voice consultation quality evaluation system 150 may divide the generated text data into sentence units to generate a plurality of text sentences, using any algorithm, machine learning model, or the like. For example, the system 150 may treat a portion of the text data that contains a subject and a verb as one text sentence, but is not limited thereto.

The voice consultation quality evaluation system 150 may verify the accuracy of the generated text sentences, that is, whether the text data has been split correctly into individual sentences. A voice consultation may contain unstructured sentences as well as well-formed ones, and at least some of the unstructured sentences may not be split correctly. Verifying the initially segmented text sentences therefore allows every sentence in the text data to be identified accurately and re-split where necessary.

According to an embodiment, the voice consultation quality evaluation system 150 may extract silent sections contained in the recording data 142 and verify the accuracy of the generated text sentences based on them. Specifically, the system 150 may generate a mel spectrogram associated with the recording data 142, extract silent sections using the generated mel spectrogram, and then verify the text sentences based on the extracted silent sections. For example, if a segmented text sentence reads “Ah, about that (silence) it's 25,000 won,” the system 150 may re-split it at the silent gap into two new text sentences.
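
The re-splitting step in this example can be sketched as follows, assuming the STT engine returns per-word timestamps and the silent intervals have already been extracted from the mel spectrogram. The `Word` record and `resplit` function are hypothetical names; a sentence is split wherever the midpoint of a silent interval falls between two consecutive words.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the call
    end: float


def resplit(
    sentence: List[Word], silences: List[Tuple[float, float]]
) -> List[List[Word]]:
    """Split one STT sentence wherever a silent interval falls between two words."""
    parts: List[List[Word]] = []
    current: List[Word] = []
    for prev, word in zip([None] + sentence[:-1], sentence):
        # Split when the midpoint of a silent interval lies in the gap between words.
        if prev is not None and any(
            prev.end <= (s + e) / 2 <= word.start for s, e in silences
        ):
            parts.append(current)
            current = []
        current.append(word)
    if current:
        parts.append(current)
    return parts
```

Using the midpoint, rather than requiring the silence to fit entirely between the words, tolerates small disagreements between the STT word boundaries and the spectrogram-based silence boundaries.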

The voice consultation quality evaluation system 150 may then evaluate the quality of the voice consultation using the verified text sentences according to the time sections of the consultation and their timing. That is, the system 150 may perform the introduction evaluation, listening-skill evaluation, silence evaluation, information-confirmation evaluation, closing-greeting evaluation, and so on, and combine the results in a defined format to generate a quality report 152. The generated quality report 152 may be transmitted to the user terminals of, for example, the user performing the quality evaluation and/or the counselor 110.
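
A minimal sketch of combining per-item results into a report. It assumes each evaluation item has already been reduced to a score between 0 and 1 and weights all items equally; the document does not specify the report format or weighting, so `build_report` and its layout are illustrative only.

```python
from typing import Dict


def build_report(counselor: str, scores: Dict[str, float]) -> str:
    """Render per-item scores in [0, 1] as a plain-text bar chart."""
    lines = [f"Quality report: {counselor}"]
    for item, score in scores.items():
        bar = "#" * round(score * 10)  # one '#' per 10 percentage points
        lines.append(f"{item:<24}{bar:<11}{score:.0%}")
    overall = sum(scores.values()) / len(scores) if scores else 0.0
    lines.append(f"{'overall':<24}{'':<11}{overall:.0%}")
    return "\n".join(lines)
```

For example, `print(build_report("counselor 110", {"introduction": 1.0, "listening skills": 0.5}))` prints one bar per item followed by an overall score of 75%.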

Figure 2 is a functional block diagram showing the internal configuration of the voice consultation quality evaluation system 150 according to an embodiment of the present invention. As shown, the system 150 may include a text conversion unit 210, a sentence generation unit 220, a sentence verification unit 230, a quality evaluation unit 240, and the like. As described above, the system 150 may communicate with the recording DB and other components to exchange the data and/or information needed for the quality evaluation.

The text conversion unit 210 may receive recording data associated with the voice consultation and generate, based on the received recording data 142, text data corresponding to the voices of the counselor and the customer over time. For example, the text conversion unit 210 may convert the recording data into text data using any STT algorithm, machine learning model, or the like. When STT conversion is performed in this way, the text data may consist of a set of speech recognition results together with the time information of each result.

Evaluating the opening greeting in the introduction of a consultation, the handling of the customer during the consultation, and the closing greeting at its end may require processing the content of the voice consultation sentence by sentence. Accordingly, the sentence generation unit 220 may convert the text data into a sentence-level script. For example, the sentence generation unit 220 may generate a plurality of text sentences by grouping, among the words included in the text data, the words that make up each sentence. In this case, the sentence generation unit 220 may generate each text sentence so that it contains a subject and a verb from adjacent time ranges, but is not limited thereto.
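As a rough illustration of STT output with time information and its grouping into sentences, the sketch below uses a hypothetical `Word` record (text, start, end, speaker). It closes a sentence when the speaker changes, when a pause exceeds a hypothetical `max_gap`, or when a word ends with one of a few assumed Korean sentence-final endings. It is a simplified stand-in, not the grouping rule of the embodiment. A morphological analyzer would identify subjects, verbs and endings far more reliably.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One STT recognition result with its time information (hypothetical layout)."""
    text: str
    start: float   # seconds from the start of the recording
    end: float
    speaker: str   # e.g. "agent" or "customer"


@dataclass
class Sentence:
    text: str
    start: float
    end: float
    speaker: str


# Assumed sentence-final endings; a morphological analyzer would be more robust.
SENTENCE_ENDINGS = ("다", "요", "까", ".", "?", "!")


def _merge(words):
    """Join consecutive words into one Sentence spanning their time range."""
    return Sentence(" ".join(w.text for w in words),
                    words[0].start, words[-1].end, words[0].speaker)


def build_sentences(words, max_gap=0.7):
    """Group time-ordered STT words into sentences.

    A sentence is closed when the speaker changes, when the pause before the
    next word exceeds max_gap seconds, or when a word ends with an assumed
    sentence-final ending.
    """
    sentences, current = [], []
    for word in words:
        if current and (word.speaker != current[-1].speaker
                        or word.start - current[-1].end > max_gap):
            sentences.append(_merge(current))
            current = []
        current.append(word)
        if word.text.endswith(SENTENCE_ENDINGS):
            sentences.append(_merge(current))
            current = []
    if current:
        sentences.append(_merge(current))
    return sentences
```

Keeping the start and end time on every `Sentence` matters here. The later evaluations (introduction window, overlaps, mutes) all work on these timestamps rather than on the raw audio.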

The sentence verification unit 230 may use the recorded data 142 to verify the text sentences generated by the sentence generation unit 220. According to an embodiment, the counselor and the customer may converse in incomplete sentences during the consultation. As a result, the recorded data 142 may contain unstructured sentences, and the sentence generation unit 220 may fail to delimit at least some of them correctly. Accordingly, the sentence verification unit 230 may extract the silent sections contained in the recorded data 142 and verify the accuracy of the generated text sentences based on the extracted silent sections. In this case, the sentence verification unit 230 may generate a mel spectrogram associated with the recorded data 142, extract the silent sections using the generated mel spectrogram, and then split a text sentence into several different text sentences at the extracted silent sections.
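The splitting step can be sketched as follows, assuming word-level timestamps are available and silent intervals have already been extracted. Both input formats and the `min_silence` value are hypothetical. A boundary is inserted between two consecutive words when a silent interval covers enough of the gap between them.

```python
def split_on_silence(words, silences, min_silence=0.3):
    """Split time-ordered words into sentences at long silent sections.

    words:    list of (text, start, end) tuples, times in seconds.
    silences: list of (start, end) silent intervals, e.g. from a mel spectrogram.
    A boundary is placed between two consecutive words when a silent interval
    covers at least min_silence seconds of the gap between them.
    """
    sentences, current = [], []
    for i, (text, start, _end) in enumerate(words):
        if current:
            prev_end = words[i - 1][2]
            for s, e in silences:
                # Length of the part of this silence that falls inside the word gap.
                covered = min(e, start) - max(s, prev_end)
                if covered >= min_silence:
                    sentences.append(" ".join(current))
                    current = []
                    break
        current.append(text)
    if current:
        sentences.append(" ".join(current))
    return sentences
```

Measuring the overlap between a silence and the word gap, rather than requiring exact alignment, tolerates the small timing mismatches expected between STT timestamps and frame-based silence detection.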

The quality evaluation unit 240 may evaluate the quality of the voice consultation using the generated and verified text sentences, arranged by time section and time within the voice consultation. The quality evaluation may be divided into several stages according to a predetermined scheme, and any algorithm or machine learning model suitable for quality evaluation may be used at each stage. After the quality evaluation is completed, the voice consultation quality evaluation system 150 may visualize the result data of the evaluation (e.g., data in JSON format) for each evaluation item to generate the quality report 152, and provide the generated quality report 152 to users.
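The source mentions JSON-format result data but does not give a schema. The sketch below assumes a hypothetical one: each evaluation item maps to a dict holding a boolean `passed` flag plus item-specific details. The items are collected into a single JSON document that a visualization layer could render per item.

```python
import json


def build_report(call_id, results):
    """Combine per-item evaluation results into one JSON document.

    results maps an evaluation item name to a dict that holds at least a
    boolean "passed" flag plus any item-specific details (hypothetical schema).
    """
    report = {
        "call_id": call_id,
        "items": [{"item": name, **detail} for name, detail in results.items()],
        "passed_items": sum(1 for detail in results.values() if detail.get("passed")),
        "total_items": len(results),
    }
    # ensure_ascii=False keeps Korean text readable in the output
    return json.dumps(report, ensure_ascii=False, indent=2)
```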

Although the functional components of the voice consultation quality evaluation system 150 are described separately in FIG. 2, this is only to aid understanding of the invention, and two or more of these functions may be performed by a single computing device. With this configuration, a user can effectively evaluate the quality of voice consultations without having to listen to every consultation handled by the counselors. In addition, counselors can be trained efficiently using the automatically generated quality report 152, without separate documents having to be prepared to improve their counseling skills or for additional training.

FIG. 3 is a diagram illustrating an example of generating a mel spectrogram 340 according to an embodiment of the present invention. As described above, the voice consultation quality evaluation system (150 in FIG. 1) may generate the mel spectrogram 340 from the recorded data 310. The generated mel spectrogram 340 may then be used to verify the accuracy of the text sentences. The mel spectrogram 340 may be generated and visualized in the form of a graph, but is not limited thereto.

According to an embodiment, the voice consultation quality evaluation system may generate a spectrum 320 from the recorded data 310. For example, the system may apply a Fourier transform to the recorded data 310 to extract its frequency information, and generate the spectrum 320 from the extracted frequency information. Specifically, the recorded data 310 may contain the amplitude of the voice over time, whereas the spectrum 320 may contain the amplitude as a function of frequency.

The voice consultation quality evaluation system may also generate a spectrogram 330 using the spectrum 320. For example, the system may compute the spectrum 320 separately for each time segment, applying the Fourier transform segment by segment, to extract the amplitude as a function of frequency at each time. The system may then convert the amplitudes to decibels and apply a logarithmic scale to the frequency axis to generate the spectrogram 330.
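The spectrum and spectrogram steps can be sketched in plain Python. This is an illustrative, unoptimized sketch under several assumptions:
- It uses a naive DFT instead of an FFT.
- It applies a Hann window.
- The frame and hop sizes are chosen only for illustration.
- The log-frequency axis is left to whatever plots the result.

A production system would typically rely on an FFT-based library.

```python
import cmath
import math


def dft_magnitude(frame):
    """Spectrum of one frame: |DFT| for frequency bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram_db(signal, frame_len=256, hop=128, eps=1e-10):
    """Spectrogram: Hann-windowed DFT per time frame, amplitude converted to dB.

    Returns frames[time_index][frequency_bin]; bin k corresponds to
    k * sample_rate / frame_len Hz.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # 20*log10 converts amplitude to decibels; eps avoids log10(0)
        frames.append([20 * math.log10(m + eps) for m in dft_magnitude(frame)])
    return frames
```

For a 1000 Hz tone sampled at 8000 Hz with 64-sample frames, each bin is 125 Hz wide. The energy of every frame therefore peaks in bin 8.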

Thereafter, the voice consultation quality evaluation system may generate a mel spectrogram 340 using the spectrogram 330. For example, the mel spectrogram 340 may be generated by mapping the frequencies of the spectrogram 330 onto the mel scale. The mel scale relates physical frequency to the frequency humans actually perceive. It is designed to reflect the fact that human hearing is more sensitive to differences at low frequencies than at high frequencies. As described above, the system may extract the silent sections of the recorded data 310 using the mel spectrogram 340 generated in this way, and verify the accuracy of the text sentences based on the extracted silent sections. For example, when a single text sentence actually contains two or more sentences, the system may split it at a silent section into two different text sentences: the part before the silent section and the part after it.
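The mel mapping and a simple silence detector can be sketched as follows. The conversion uses the common HTK-style formula $m = 2595 \log_{10}(1 + f/700)$; other mel conventions exist. Construction of the triangular mel filterbank is omitted. The detector assumes the mel spectrogram is already available in dB, for example from librosa's `feature.melspectrogram` followed by `power_to_db`. The -50 dB threshold and 0.5 s minimum duration are hypothetical values, not values from the source.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def detect_silence(mel_db, hop_seconds, threshold_db=-50.0, min_duration=0.5):
    """Return (start, end) times of silent sections in a mel spectrogram.

    mel_db: list of frames, frames[time_index][mel_band] in dB.
    A frame is silent when its loudest mel band is below threshold_db; runs of
    silent frames lasting at least min_duration seconds are reported.
    """
    sections, run_start = [], None
    # A sentinel frame that is never silent closes a run at the end of the audio.
    for i, frame in enumerate(mel_db + [[float("inf")]]):
        if max(frame) < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * hop_seconds >= min_duration:
                sections.append((run_start * hop_seconds, i * hop_seconds))
            run_start = None
    return sections
```

Thresholding the loudest mel band, rather than the average, keeps a frame from being labeled silent when speech energy is concentrated in only a few bands.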

FIG. 4 is a flowchart showing an example of a method 400 for evaluating the quality of a voice consultation according to an embodiment of the present invention. According to an embodiment, the method 400 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). As shown, once a voice consultation has been performed, the method 400 may begin by receiving the recorded data associated with the voice consultation (S410).

Based on the received recorded data, the processor may generate text data corresponding to the voices of the counselor and the customer over time (S420). The processor may then split the generated text data into sentence units to generate a plurality of text sentences (S430). In other words, the processor may extract, from the recorded data, the sentences uttered by the counselor and the customer who took part in the voice consultation.

The processor may extract the silent sections contained in the recorded data and verify the accuracy of the generated text sentences based on the extracted silent sections (S440). For example, the processor may generate a mel spectrogram associated with the recorded data and extract the silent sections using the generated mel spectrogram. The processor may then evaluate the quality of the voice consultation using the verified text sentences, arranged by time section and time within the voice consultation (S450). In this case, the processor may generate a quality evaluation report by visualizing the result data of the evaluation for each evaluation item.

FIG. 5 is a flowchart showing an example of an introduction evaluation method 500 according to an embodiment of the present invention. According to an embodiment, the method 500 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). As shown, the method 500 may begin with the processor extracting, from the plurality of text sentences, a first set of text sentences that fall within a specific time range (S510). Here, the specific time range may span a predetermined time from the start of the conversation. For example, it may be twice the average time taken to utter a greeting. If uttering a greeting is determined to take 2 seconds on average, the specific time range may cover the first 4 seconds of the conversation.

The processor may perform the introduction evaluation based on whether a greeting containing a salutation, the counselor's affiliation, and/or the counselor's name was uttered within the specific time range. To do so, the processor may split the extracted first set of text sentences into a plurality of morphemes (S520) and determine whether the resulting morphemes include a greeting keyword (S530). Here, a greeting keyword is a keyword associated with greetings. It may be a core keyword shared by many different greetings, but is not limited thereto, and may further include predetermined keywords for the counselor's affiliation, name, and the like. For example, the processor may use morphological analysis to check whether each element is present in the utterance. Because the wording of greetings may vary, the processor may determine that a greeting is present whenever the morphological analysis result contains a greeting keyword.

Additionally or alternatively, the processor may use a named entity recognition (NER) algorithm to determine whether the counselor's name is included. That is, the processor may use named entity recognition to detect nouns that have the form of a person's name, and thereby determine whether the text sentences include the counselor's name. In this case, the processor may perform named entity recognition using a machine learning model such as a transformer-based BERT or ELECTRA model. Additionally or alternatively, the processor may store the counselor's name in advance and determine whether the name is included based on whether a noun matching the stored name is present. With this configuration, the processor can easily determine whether a greeting with the prescribed components was uttered within a specific time after the voice consultation began.
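A minimal sketch of the introduction check follows. It takes the first set as every sentence starting within twice the average greeting duration and looks for greeting and affiliation keywords. The counselor's name is checked by matching a stored name, the alternative to NER described above.

The keyword lists and the company name `에이비씨카드` are hypothetical. Whitespace splitting stands in for a real morphological analyzer, such as KoNLPy's `Okt().morphs`. Prefix matching on tokens is an assumed way to tolerate attached Korean endings and particles.

```python
GREETING_KEYWORDS = ("안녕",)             # hypothetical core greeting keyword
AFFILIATION_KEYWORDS = ("에이비씨카드",)   # hypothetical company name


def has_keyword(tokens, keywords):
    """True if any token begins with a keyword (tolerates attached endings/particles)."""
    return any(tok.startswith(kw) for tok in tokens for kw in keywords)


def evaluate_introduction(sentences, agent_name, avg_greeting_seconds=2.0, tokenize=str.split):
    """Introduction check over (start_seconds, text) sentences of the whole call.

    The first set is every sentence starting within twice the average greeting
    duration. tokenize stands in for a morphological analyzer.
    """
    window = 2.0 * avg_greeting_seconds
    first_set = [text for start, text in sentences if start <= window]
    tokens = [tok for text in first_set for tok in tokenize(text)]
    return {
        "greeting": has_keyword(tokens, GREETING_KEYWORDS),
        "affiliation": has_keyword(tokens, AFFILIATION_KEYWORDS),
        # stored-name matching instead of NER
        "name": any(agent_name in text for text in first_set),
    }
```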

FIG. 6 is a flowchart showing an example of a listening ability evaluation method 600 according to an embodiment of the present invention. According to an embodiment, the method 600 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). Here, listening ability is a quantitative measure of how attentively the counselor listened to the customer. It may be judged by counting the number of times the counselor cut in while the customer was speaking. As shown, the method 600 may begin with the processor dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer (S610).

The processor may then evaluate the counselor's listening ability by counting the number of times at least part of a text sentence in the second set overlaps at least part of a text sentence in the third set (S620). That is, the processor may count how many times the counselor started speaking while the customer was already speaking. In this case, the processor may count as an overlap only cases in which the overlap lasts at least a specific time (e.g., 0.5 seconds) and the counselor keeps speaking after the overlap. In addition, even if the customer keeps speaking after the overlap, the utterances may still be judged to overlap if the overlap exceeds another specific time (e.g., 1 second). The higher the overlap count, the lower the counselor's listening ability may be rated. With this configuration, the processor can effectively evaluate the counselor's listening ability by identifying, from the text sentences, the sections in which the counselor's and the customer's voices overlap.
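The counting rule above can be sketched as follows. The 0.5 s and 1 s defaults mirror the example values in the text. The representation of sentences as (start, end) pairs per speaker is an assumption.

```python
def count_interruptions(agent, customer, min_overlap=0.5, long_overlap=1.0):
    """Count agent utterances that cut into customer utterances.

    agent, customer: lists of (start, end) times in seconds, one per sentence.
    An agent sentence counts when it starts while a customer sentence is in
    progress and either (a) overlaps it by at least min_overlap seconds and the
    agent is still speaking after the customer stops, or (b) overlaps it by
    more than long_overlap seconds regardless of who continues.
    """
    count = 0
    for c_start, c_end in customer:
        for a_start, a_end in agent:
            if not (c_start < a_start < c_end):
                continue  # the customer must already be speaking
            overlap = min(a_end, c_end) - a_start
            agent_continues = a_end > c_end
            if (overlap >= min_overlap and agent_continues) or overlap > long_overlap:
                count += 1
    return count
```

With these rules, a short backchannel such as a 0.3 s "네" during the customer's turn is not counted. An agent sentence that talks over the customer for 2 s is counted even though the customer finishes afterwards.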

FIG. 7 is a flowchart showing an example of a silence evaluation method 700 according to an embodiment of the present invention. According to an embodiment, the method 700 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). A mute may be acceptable when it occurs because the counselor must consult external information to meet the customer's needs. A mute that occurs in any other situation, however, may count against the consultation quality. Accordingly, the processor may perform the silence evaluation method 700 by detecting mute sections that were not preceded by a request to wait. As shown, the method 700 may begin by extracting the sections in which a mute lasted for a specific time (S710). This extraction is based on the time sections of the voice consultation and the text sentences over time. Here, a mute section refers to a section in which silence lasted for a specific time (e.g., 5 seconds), which may occur, for example, between particular text sentences.

The processor may identify, among the plurality of text sentences, the text sentence immediately preceding the extracted mute section (S720). The processor may then determine whether that preceding text sentence contains a hold-request keyword (S730). If it does, the mute section may be treated as an exception; if it does not, the section may be judged to be an unjustified mute. According to an embodiment, the check for a hold-request keyword may be performed in a manner similar to the introduction evaluation method (500 in FIG. 5). For example, the processor may split the text sentence preceding the mute section into morphemes and perform the silence evaluation by determining whether the morphemes include a hold-request keyword. With this configuration, the processor can pick out, from all the silent sections in a voice consultation, only those not preceded by a request to wait, and use them to evaluate the consultation quality.
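The mute check can be sketched as follows. Gaps of at least 5 seconds between consecutive sentences (the example value above) are treated as mutes. A mute is excused only when the preceding sentence contains a hold-request keyword.

The keyword list is hypothetical, and whitespace splitting again stands in for morphological analysis.

```python
HOLD_KEYWORDS = ("잠시", "기다려")  # hypothetical hold-request keywords


def unexcused_mutes(sentences, min_mute=5.0, tokenize=str.split):
    """Return (start, end) of mute sections not preceded by a hold request.

    sentences: time-ordered list of (start, end, text), both speakers merged.
    A mute is a gap of at least min_mute seconds between consecutive sentences.
    """
    mutes = []
    for prev, nxt in zip(sentences, sentences[1:]):
        if nxt[0] - prev[1] < min_mute:
            continue
        tokens = tokenize(prev[2])
        asked_to_wait = any(tok.startswith(kw) for tok in tokens for kw in HOLD_KEYWORDS)
        if not asked_to_wait:
            mutes.append((prev[1], nxt[0]))
    return mutes
```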

FIG. 8 is a flowchart showing an example of an information verification evaluation method 800 according to an embodiment of the present invention. According to an embodiment, the method 800 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). Here, information verification refers to confirming that the customer is authorized to access the requested service, for example by checking the customer's name, contact number, or account number. To verify this information, the counselor may ask the customer for it, repeat back what the customer said, and then thank the customer. As shown, the method 800 may begin by dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer (S810).

The processor may extract, from the third set of text sentences, personal information text sentences that contain the customer's personal information (S820). For example, a personal information text sentence may be one containing the customer's name, contact number, account number, or the like. The processor may extract such sentences using the named entity recognition and/or morphological analysis described above. It is not limited thereto, however, and may use any algorithm for extracting text that contains specific information.

The processor may extract, from the second set of text sentences, the sentence immediately following the extracted personal information text sentence (S830). It may then determine whether that sentence contains keywords corresponding to the customer's personal information (S840). According to an embodiment, the processor may extract the keywords corresponding to the customer's personal information from the personal information text sentence. It then determines whether matching keywords appear in the counselor's utterance immediately following that sentence. Additionally, the processor may determine whether a text sentence expressing thanks or the like follows the counselor's confirmation of the customer's personal information. With this configuration, when the customer states personal information, the processor can easily recognize whether the counselor repeated it back to reconfirm its accuracy.
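The read-back check can be sketched for numeric personal information only. The sketch covers phone or account numbers matched by a hypothetical regular expression. Names would need NER, which is not shown. For each customer sentence containing such a number, it checks whether the counselor's next sentence repeats the same digits, ignoring separators.

```python
import re

# Hypothetical pattern for numeric personal information (phone or account numbers).
NUMBER_PATTERN = re.compile(r"\d[\d\- ]{5,}\d")


def digits(s):
    """Keep only the digits, so '010-1234-5678' and '010 1234 5678' compare equal."""
    return re.sub(r"\D", "", s)


def check_read_back(turns):
    """turns: time-ordered list of (speaker, text), speaker 'agent' or 'customer'.

    Returns one result per customer sentence that contains numeric personal info.
    """
    results = []
    for i, (speaker, text) in enumerate(turns):
        if speaker != "customer":
            continue
        found = [digits(m) for m in NUMBER_PATTERN.findall(text)]
        if not found:
            continue
        # First agent sentence after the personal-information sentence.
        reply = next((t for s, t in turns[i + 1:] if s == "agent"), "")
        repeated = all(num in digits(reply) for num in found)
        results.append({"info": found, "repeated": repeated})
    return results
```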

FIG. 9 is a flowchart showing an example of a closing greeting evaluation method 900 according to an embodiment of the present invention. According to an embodiment, the method 900 may be performed by a processor (e.g., at least one processor of the voice consultation quality evaluation system). Here, the closing greeting uttered at the end of the voice consultation may consist of an offer of further assistance (asking whether the customer has any additional inquiries), the counselor's affiliation and name, and a farewell. That is, the processor may base the closing greeting evaluation on whether the counselor uttered such a closing greeting at the end of the consultation. The greeting may include an offer of further assistance, the affiliation, the name, and/or a farewell. As shown, the method 900 may begin with the processor extracting a fourth set of text sentences corresponding to a specific number (e.g., 5) of the last text sentences (S910).

The processor may use a named entity recognition (NER) algorithm to determine whether the counselor's name appears in the fourth set of text sentences. That is, the processor may detect nouns that have the form of a person's name, and thereby determine whether the text sentences include the counselor's name. In this case, the processor may perform named entity recognition using a machine learning model such as a transformer-based BERT or ELECTRA model.

Additionally, the processor may split the fourth set of text sentences into a plurality of morphemes and determine whether the resulting morphemes include a closing keyword. Here, a closing keyword may be a keyword associated with farewells, checking for additional inquiries, and the like. It is not limited thereto, however, and may further include predetermined keywords for the counselor's affiliation, name, and the like. For example, the processor may use morphological analysis to check whether each element is present in the utterance. Because the wording of closing greetings may vary, the processor may determine that a closing greeting is present whenever the morphological analysis result contains a closing keyword.
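The closing check can be sketched in the same style as the introduction check. It takes the last `last_n` sentences as the fourth set (5 by default, matching the example above) and reports each component separately.

The keyword lists are hypothetical. Whitespace splitting stands in for a morphological analyzer, and the name is checked by stored-name matching rather than NER. A fuller version would also restrict the fourth set to the counselor's sentences.

```python
ADDITIONAL_INQUIRY_KEYWORDS = ("궁금", "문의")   # hypothetical
FAREWELL_KEYWORDS = ("감사", "좋은", "행복한")    # hypothetical


def _has(tokens, keywords):
    """True if any token begins with a keyword (tolerates attached endings)."""
    return any(tok.startswith(kw) for tok in tokens for kw in keywords)


def evaluate_closing(sentences, agent_name, affiliation, last_n=5, tokenize=str.split):
    """Closing-greeting check over the last last_n sentences (the fourth set)."""
    fourth_set = sentences[-last_n:]
    tokens = [tok for text in fourth_set for tok in tokenize(text)]
    return {
        "additional_inquiry": _has(tokens, ADDITIONAL_INQUIRY_KEYWORDS),
        "affiliation": _has(tokens, (affiliation,)),
        "name": any(agent_name in text for text in fourth_set),
        "farewell": _has(tokens, FAREWELL_KEYWORDS),
    }
```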

The methods and/or various embodiments described above may be implemented in digital electronic circuitry, computer hardware, firmware, software, and/or combinations thereof. Various embodiments of the present invention may be executed by a data processing apparatus, for example one or more programmable processors and/or one or more computing devices. They may also be implemented as a computer-readable recording medium and/or as a computer program stored on a computer-readable recording medium. The computer program may be written in any form of programming language, including compiled or interpreted languages, and may be distributed in any form, such as a stand-alone program, a module, or a subroutine. The computer program may be deployed on a single computing device, on multiple computing devices connected through the same network, and/or on multiple computing devices distributed across several different networks.

The methods and/or various embodiments described above may be performed by one or more processors configured to execute one or more computer programs. These programs process, store, and/or manage any function or operation by operating on input data or generating output data. For example, the methods and/or various embodiments of the present invention may be performed by special-purpose logic circuitry such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). An apparatus and/or system for performing them may likewise be implemented as such special-purpose logic circuitry.

The one or more processors executing the computer program may include general-purpose or special-purpose microprocessors and/or one or more processors of any kind of digital computing device. A processor may receive instructions and/or data from read-only memory, random access memory, or both. In the present invention, the components of a computing device performing the methods and/or embodiments may include one or more processors for executing instructions and one or more memory devices for storing instructions and/or data.

According to one embodiment, the computing device may exchange data with one or more mass storage devices for storing data. For example, the computing device may receive data from, and/or transmit data to, a magnetic disc or an optical disc. Computer-readable storage media suitable for storing instructions and/or data associated with a computer program may include, without limitation, any form of non-volatile memory, including semiconductor memory devices such as erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), and flash memory devices. Computer-readable storage media may also include magnetic disks such as internal hard disks or removable disks, magneto-optical disks, and CD-ROM and DVD-ROM disks.

To provide interaction with a user, the computing device may include, without limitation, a display device (e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD)) for presenting or displaying information to the user, and an input device (e.g., a keyboard, or a pointing device such as a mouse or trackball) through which the user can provide input and/or commands to the computing device. That is, the computing device may further include any other kind of device for providing interaction with the user. For example, the computing device may provide the user with any form of sensory feedback, including visual, auditory, and/or tactile feedback. Correspondingly, the user may provide input to the computing device through various gestures, such as visual, voice, or motion gestures.

In the present invention, the various embodiments may be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), and/or a front-end component. The components may be interconnected by any form or medium of digital data communication, such as a communication network, for example a local area network (LAN) or a wide area network (WAN).

Computing devices based on the example embodiments described herein may be implemented using hardware and/or software configured to interact with a user, such as a user device, a user interface (UI) device, a user terminal, or a client device. For example, the computing device may include a portable computing device such as a laptop computer. Additionally or alternatively, the computing device may include, without limitation, a personal digital assistant (PDA), a tablet PC, a game console, a wearable device, an Internet of Things (IoT) device, a virtual reality (VR) device, an augmented reality (AR) device, and the like. The computing device may further include other types of devices configured to interact with a user. The computing device may also include a portable communication device (e.g., a mobile phone, smartphone, or wireless cellular phone) suitable for wireless communication over a network such as a mobile communication network. The computing device may be configured to communicate wirelessly with a network server using wireless communication technologies and/or protocols such as radio frequency (RF), microwave frequency (MWF), and/or infrared frequency (IRF).

The various embodiments herein, including their specific structural and functional details, are illustrative. Accordingly, embodiments of the present invention are not limited to those described above and may be implemented in various other forms. The terms used herein describe certain embodiments and are not to be construed as limiting them. For example, singular forms, including terms introduced by "the" or "said", may be construed to include the plural unless the context clearly indicates otherwise.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which these concepts belong. Commonly used terms, such as those defined in dictionaries, should be interpreted as having meanings consistent with their meanings in the context of the relevant art.

Although the present invention has been described herein in connection with some embodiments, various modifications and changes can be made without departing from the scope of the present invention as understood by those skilled in the art to which it pertains. Such modifications and changes should be considered to fall within the scope of the claims appended hereto.

110: Counselor 120: Customer
130: Voice consultation recording server 140: Recording DB
142: Recorded data 150: Voice consultation quality evaluation system
152: Evaluation report

Claims

A method for evaluating the quality of a voice consultation performed by at least one processor, comprising:
When a voice consultation is performed, receiving recorded data associated with the voice consultation;
Based on the received recording data, generating text data corresponding to the voices of the counselor and the customer over time;
generating a plurality of text sentences by dividing the generated text data into sentence units;
extracting a silent section included in the recorded data and verifying the accuracy of the plurality of generated text sentences based on the extracted silent section; and
performing a quality evaluation of the voice consultation using the time sections of the voice consultation and the plurality of verified text sentences over time,
wherein the quality evaluation of the voice consultation includes an introduction evaluation, a listening ability evaluation, a silence evaluation, an information confirmation evaluation, and a closing greeting evaluation, performed in chronological order across the entire voice consultation,
wherein, for the listening ability evaluation, performing the quality evaluation of the voice consultation includes:
splitting the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; and
calculating the number of times at least a portion of the text sentences in the second set overlaps at least a portion of the text sentences in the third set,
wherein the number of overlaps includes the number of times an overlap lasting at least a specific time is followed by continued speech from the counselor, and the number of times an overlap exceeds another specific time even though the customer's speech follows the overlap,
wherein, for the silence evaluation, performing the quality evaluation of the voice consultation includes:
extracting a section in which silence lasts for another specific time, based on the time sections of the voice consultation and the plurality of text sentences over time;
determining, among the plurality of text sentences, the text sentence preceding the section in which the silence occurred; and
determining whether the determined text sentence preceding the silent section includes a waiting-request keyword.
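The overlap count used in the listening ability evaluation can be sketched in Python as follows. This is an illustrative interpretation, not the patented implementation: the `Sentence` type, the function name, and the threshold values `min_overlap` (standing in for the "specific time") and `long_overlap` (standing in for "another specific time") are hypothetical. An overlap is counted when it lasts at least `min_overlap` and the counselor keeps talking afterwards, or when it exceeds `long_overlap` even though the customer keeps talking.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    start: float  # seconds from the start of the recording
    end: float
    text: str = ""


def count_overlaps(sentences, min_overlap=0.5, long_overlap=2.0):
    """Count counselor/customer overlaps that suggest the counselor cut in.

    min_overlap and long_overlap are hypothetical default thresholds.
    """
    counselor = [s for s in sentences if s.speaker == "counselor"]  # second set
    customer = [s for s in sentences if s.speaker == "customer"]    # third set
    count = 0
    for a in counselor:
        for b in customer:
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap < min_overlap:
                continue  # no overlap, or too short to count
            counselor_kept_talking = a.end > b.end
            if counselor_kept_talking or overlap > long_overlap:
                count += 1
    return count
```

Comparing every counselor sentence with every customer sentence is quadratic; for long calls, a sweep over sentences sorted by start time would avoid that cost.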

The method of claim 1, wherein extracting the silent section included in the recorded data and verifying the accuracy of the plurality of generated text sentences based on the extracted silent section comprises:
generating a mel spectrogram associated with the recorded data and extracting the silent section using the generated mel spectrogram.
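One way to read the mel-spectrogram step is sketched below in Python. It assumes the mel spectrogram has already been computed (for example, by transposing the output of `librosa.feature.melspectrogram`) and is passed in as a list of frames of mel-band power values. The patent does not say how the silent sections verify the sentences; as an assumption, a sentence whose time span falls mostly inside silence is treated as unverified (a likely transcription error). All function names and threshold values are hypothetical.

```python
import math


def silent_sections(mel_frames, hop_seconds, db_threshold=-40.0, min_duration=0.3):
    """Return (start, end) intervals, in seconds, of low-energy runs of frames.

    A frame is silent when its total mel power is more than -db_threshold dB
    below the loudest frame. Threshold values are hypothetical.
    """
    energies = [sum(frame) for frame in mel_frames]
    peak = max(energies, default=0.0)
    if peak <= 0.0:
        return [(0.0, len(mel_frames) * hop_seconds)] if mel_frames else []
    sections, run_start = [], None
    # A loud sentinel frame at the end closes any trailing silent run.
    for i, energy in enumerate(energies + [peak]):
        db = 10.0 * math.log10(energy / peak) if energy > 0.0 else -math.inf
        if db < db_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * hop_seconds, i * hop_seconds
            if end - start >= min_duration:
                sections.append((start, end))
            run_start = None
    return sections


def verify_sentences(sentences, sections, max_silent_ratio=0.5):
    """Mark each (start, end, text) sentence as verified unless most of its
    span falls inside a detected silent section."""
    verified = []
    for start, end, text in sentences:
        duration = max(end - start, 1e-9)
        silent = sum(max(0.0, min(end, s_end) - max(start, s_start))
                     for s_start, s_end in sections)
        verified.append((start, end, text, silent / duration <= max_silent_ratio))
    return verified
```

Measuring energy relative to the loudest frame keeps the threshold independent of the recording's overall gain.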

The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
extracting, from among the plurality of text sentences, a first set of text sentences falling within a specific time range;
dividing the extracted first set of text sentences into a plurality of morphemes; and
determining whether the plurality of divided morphemes include a greeting keyword associated with a greeting.
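The introduction (greeting) check can be sketched in Python as follows. Proper morpheme analysis of Korean transcripts would require a morphological analyzer; here, as a simplifying assumption, a lowercase word tokenizer stands in for it. The keyword list, the 15-second window, and the function names are hypothetical.

```python
import re

# Hypothetical greeting keywords; a real deployment would use its own script.
GREETING_KEYWORDS = frozenset({"hello", "welcome", "thank"})


def tokenize(text):
    """Stand-in for morpheme analysis: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def has_opening_greeting(sentences, window_seconds=15.0, keywords=GREETING_KEYWORDS):
    """sentences: (start, end, text) tuples; window_seconds is hypothetical."""
    # First set: sentences that start inside the opening time window.
    first_set = [text for start, _end, text in sentences if start < window_seconds]
    morphemes = [tok for text in first_set for tok in tokenize(text)]
    return any(m in keywords for m in morphemes)
```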

(Deleted)

The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
splitting the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer;
extracting, from the third set of text sentences, a personal information text sentence containing the customer's personal information;
extracting, from the second set of text sentences, the text sentence immediately following the extracted personal information text sentence; and
determining whether the text sentence immediately following the extracted personal information text sentence includes a keyword corresponding to the customer's personal information.
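The information confirmation check can be sketched in Python as follows. As an assumption, the customer's personal information is a hyphenated phone number detected with a regular expression, and the counselor's following sentence counts as a confirmation if it repeats the same digits. The pattern and function names are hypothetical; the patent does not specify which personal information or keywords are used.

```python
import re

# Hypothetical personal-information pattern: a hyphenated phone number.
PHONE = re.compile(r"\d{2,3}-\d{3,4}-\d{4}")


def digits(text):
    """Keep only the digits, so '010-1234-5678' and '010 1234 5678' compare equal."""
    return re.sub(r"\D", "", text)


def check_info_confirmation(sentences):
    """sentences: chronological (speaker, text) tuples.

    For each customer sentence containing a phone number, find the next
    counselor sentence and report whether it repeats the same digits.
    """
    results = []
    for i, (speaker, text) in enumerate(sentences):
        match = PHONE.search(text) if speaker == "customer" else None
        if match is None:
            continue
        reply = next((t for spk, t in sentences[i + 1:] if spk == "counselor"), "")
        results.append((match.group(), digits(match.group()) in digits(reply)))
    return results
```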

The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
extracting a fourth set of text sentences corresponding to a specific number of the last text sentences among the plurality of text sentences; and
determining, using a named entity recognition (NER) algorithm, whether the fourth set of text sentences includes information associated with the counselor.
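The closing greeting check can be sketched in Python as follows. A faithful implementation would run a trained NER model over the fourth set of sentences; to keep the sketch self-contained, a simple gazetteer lookup of the counselor's name (assumed to be known, e.g. from a staff roster) stands in for it. The function names and the default `last_n` are hypothetical.

```python
def extract_person_entities(text, known_names):
    """Gazetteer stand-in for a trained NER model: return the known
    person names that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [name for name in known_names if name.lower() in lowered]


def closing_mentions_counselor(sentences, counselor_name, last_n=3):
    """Check whether the last `last_n` sentences (the fourth set) mention
    the counselor. `last_n` is a hypothetical value."""
    fourth_set = sentences[-last_n:] if last_n > 0 else []
    return any(extract_person_entities(text, [counselor_name]) for text in fourth_set)
```

Replacing the gazetteer with a real NER model would also catch names the roster does not list, at the cost of a model dependency.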

The method of claim 1, further comprising:
generating a quality evaluation report by visualizing, for each quality evaluation item, the result data of the quality evaluation of the voice consultation.
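The claim does not specify a chart type. As an illustration only, the Python sketch below renders per-item results as a plain-text bar chart, assuming each evaluation item has already been reduced to a score between 0 and 1. The score format and the function name are hypothetical.

```python
def render_report(scores, width=20):
    """Render per-item scores in [0, 1] as a plain-text bar chart."""
    label_width = max((len(item) for item in scores), default=0)
    lines = []
    for item, score in scores.items():
        # Clamp so out-of-range scores still produce a valid bar.
        filled = round(max(0.0, min(1.0, score)) * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{item:<{label_width}} |{bar}| {score:.0%}")
    return "\n".join(lines)
```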

A computer program stored in a computer-readable recording medium for executing, on a computer, the method for evaluating the quality of a voice consultation according to any one of claims 1 to 3 and 6 to 8.

A voice consultation quality evaluation system, comprising:
a text conversion unit that, when a voice consultation is performed, receives recorded data associated with the voice consultation and generates, based on the received recorded data, text data corresponding to the voices of the counselor and the customer over time;
a sentence generator that divides the generated text data into sentence units to generate a plurality of text sentences;
a sentence verification unit that extracts a silent section included in the recorded data and verifies the accuracy of the plurality of generated text sentences based on the extracted silent section; and
a quality evaluation performing unit that performs a quality evaluation of the voice consultation using the time sections of the voice consultation and the plurality of verified text sentences over time,
wherein the quality evaluation of the voice consultation includes an introduction evaluation, a listening ability evaluation, a silence evaluation, an information confirmation evaluation, and a closing greeting evaluation, performed in chronological order across the entire voice consultation,
wherein, for the listening ability evaluation, the quality evaluation performing unit splits the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer, and calculates the number of times at least a portion of the second set of text sentences overlaps at least a portion of the third set of text sentences,
wherein the number of overlaps includes the number of times an overlap lasting at least a specific time is followed by continued speech from the counselor, and the number of times an overlap exceeds another specific time even though the customer's speech follows the overlap,
wherein, for the silence evaluation, the quality evaluation performing unit extracts, based on the time sections of the voice consultation and the plurality of text sentences over time, a section in which silence lasts for another specific time, determines, among the plurality of text sentences, the text sentence preceding the section in which the silence occurred, and determines whether the determined text sentence preceding the silent section includes a waiting-request keyword.
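The silence evaluation can be sketched in Python as follows. As an assumption, this sketch finds silences from gaps between sentence timestamps rather than from the audio (the silent sections extracted from the mel spectrogram could be used instead), and it treats a silence as properly announced when the sentence that ended just before it contains a waiting-request phrase. The phrase list, the 5-second minimum, and the function name are hypothetical.

```python
# Hypothetical waiting-request phrases; the patent does not list them.
WAIT_KEYWORDS = ("please hold", "one moment", "please wait")


def evaluate_silences(sentences, min_silence=5.0, keywords=WAIT_KEYWORDS):
    """sentences: (start, end, text) tuples in any order.

    Returns (gap_start, gap_end, announced) for every gap of at least
    min_silence seconds, where announced tells whether the sentence that
    ended just before the gap contains a waiting-request keyword.
    """
    results = []
    latest_end, last_text = None, ""
    for start, end, text in sorted(sentences):
        if latest_end is not None and start - latest_end >= min_silence:
            announced = any(k in last_text.lower() for k in keywords)
            results.append((latest_end, start, announced))
        # Track the sentence that ends latest, since speakers can overlap.
        if latest_end is None or end >= latest_end:
            latest_end, last_text = end, text
    return results
```

Tracking the latest end time, rather than just the previous sentence's end, prevents an overlapping short utterance from creating a false silence.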