KR102455709B1

KR102455709B1 - Method and apparatus for automated evaluation of synthetic speech based on artificial intelligence

Info

Publication number: KR102455709B1
Application number: KR1020200162905A
Authority: KR
Inventors: 한경훈
Original assignee: 주식회사 에스알유니버스
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2022-10-21
Also published as: KR20220074415A

Abstract

The present invention relates to a method and apparatus for automating the evaluation of artificial-intelligence-based synthesized speech. The method includes separating a synthesized voice and an original voice each into harmonic and percussive parts, and calculating an evaluation score by computing, from the separated harmonic and percussive parts, the mean, the standard deviation, the mel-spectrogram mean, and the mel-spectrogram standard deviation as evaluation indicators.

Description

Method and apparatus for automated evaluation of synthetic speech based on artificial intelligence

Embodiments of the present invention relate to a method and apparatus for automating the evaluation of artificial-intelligence-based synthesized speech, capable of automating the evaluation of speech generated by artificial intelligence or a computer.

Artificial intelligence (AI) refers to technology that realizes human learning, reasoning, perception, and natural-language understanding abilities in computer programs. The artificial intelligence currently being developed is mainly used in technologies for implementing a conversational user interface (CUI). These technologies include speech recognition (STT: speech to text), natural language understanding (NLU), natural language generation (NLG), and text-to-speech synthesis (TTS).

Text-to-speech synthesis is a speech synthesis application used to convert text into speech; it can convert arbitrary character strings, words, and sentences into the sound of a person speaking the same content and output it. In devices such as smartphones, TVs, speakers, and navigation systems, the result text corresponding to a command entered through voice recognition is rendered by the text-to-speech application as playable audio of a naturally synthesized human voice and output.

Text-to-speech synthesis based on artificial neural networks, or otherwise performed by computer, now exhibits far more natural speech characteristics than earlier approaches, approaching the level of conversation with a person.

However, although artificial neural networks and computer-based speech synthesis methods produce natural speech characteristics, the evaluation of speech synthesized by these various methods relies on human questionnaire assessments and judgments of timbre, so the evaluation inevitably depends on individual subjectivity.

The background art described above is technical information that the inventor possessed for, or acquired in the course of, deriving the present invention, and it is not necessarily prior art disclosed to the general public before the filing of the present application.

Korean Laid-open Patent Publication No. 2009-0026504

Until now, speech generated by artificial intelligence or a computer has been evaluated by having several third parties who had not previously heard the voice assign scores, from which the performance of the speech module and the quality of the speech uttered through it were gauged. To solve this problem of the prior art, an embodiment of the present invention provides a method and apparatus for automating the evaluation of artificial-intelligence-based synthesized speech, capable of automating the evaluation of speech generated by artificial intelligence or a computer.

Another embodiment of the present invention provides a method and apparatus for automating the evaluation of artificial-intelligence-based synthesized speech that can automate and objectify the evaluation of machine-made speech on the basis of detailed indicators for synthesized-speech evaluation.

One aspect of the present invention includes separating a synthesized voice and an original voice each into harmonic and percussive parts, and calculating an evaluation score by computing, from the separated harmonic and percussive parts, the mean, the standard deviation, the mel-spectrogram mean, and the mel-spectrogram standard deviation as evaluation indicators.

In addition, the step of separating into harmonic and percussive parts is characterized in that the synthesized voice and the original voice are each separated through a short-time Fourier transform (STFT).

In addition, the step of calculating the evaluation score further includes converting the separated harmonic and percussive parts into mel spectrograms.

In addition, the step of converting into mel spectrograms further includes converting the two-dimensional information of the mel spectrogram into one dimension.

In addition, the step of calculating the evaluation score further includes outputting the calculated evaluation score together with a comparison result obtained by comparing the evaluation score with a threshold value.

Another aspect of the present invention includes a harmonic/percussive source separation unit that separates a synthesized voice and an original voice each into harmonic and percussive parts, a mean value module unit that calculates the mean and the mel-spectrogram mean from the separated harmonic and percussive parts, and a standard deviation module unit that calculates the standard deviation and the mel-spectrogram standard deviation from the separated harmonic and percussive parts.

In addition, the harmonic/percussive source separation unit is characterized in that it separates the synthesized voice and the original voice each through a short-time Fourier transform (STFT).

In addition, the evaluation automation apparatus further includes a mel-spectrogram conversion module unit that converts the separated harmonic and percussive parts into mel spectrograms.

In addition, the evaluation automation apparatus further includes a mel one-dimensional conversion unit that converts the two-dimensional information of the mel spectrogram into one dimension.

In addition, the evaluation automation apparatus further includes an output unit that outputs the calculated evaluation score together with a comparison result obtained by comparing the evaluation score with a threshold value.

Other aspects, features, and advantages will become apparent from the following drawings, claims, and detailed description of the invention.

The method and apparatus for automating the evaluation of artificial-intelligence-based synthesized speech according to an embodiment of the present invention can automate the evaluation of speech generated by artificial intelligence or a computer.

Furthermore, whereas such speech has been evaluated by hiring or asking several third parties who had not heard the voice to assign scores, and gauging from those scores the performance of the speech module and the quality of the speech uttered through it, the invention has the effect of enabling automated and objective evaluation of synthesized speech based on AI-based detailed indicators.

FIG. 1 is a diagram illustrating a synthesized voice evaluation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the formula for each detailed indicator according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a synthesized voice evaluation method according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating how the evaluation score is applied in the synthesized voice evaluation apparatus according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods for achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and duplicate descriptions of them are omitted.

In the following embodiments, terms such as first and second are used not in a limiting sense but to distinguish one component from another.

In the following embodiments, a singular expression includes the plural unless the context clearly indicates otherwise.

In the following embodiments, terms such as "include" or "have" mean that the features or components described in the specification are present, and do not exclude in advance the possibility that one or more other features or components may be added.

In the following embodiments, when a part such as a film, region, or component is said to be on or above another part, this includes not only the case where it is directly on the other part but also the case where another film, region, or component is interposed between them.

In the drawings, the sizes of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are chosen arbitrarily for convenience of description, so the present invention is not necessarily limited to what is illustrated.

The present invention automates the evaluation of speech generated by artificial intelligence or a computer; it is a technique for automating and objectifying the evaluation of machine-made speech on the basis of detailed indicators.

To evaluate synthesized speech, questionnaire evaluations have so far been conducted with multiple third parties who have not heard the synthesized speech to be evaluated.

Specifically, a number of people who have heard neither the original voice nor the voice synthesized by a particular program are asked to complete a questionnaire about the synthesized voice. A minimum and a maximum score are fixed, and each person picks a score for each voice within that range according to their own judgment. The questionnaire may simply assign a single score to the voice, or it may assign scores on several indicators, for example whether the voice can be distinguished from human intonation, whether machine-generated noise is present, and whether every syllable is rendered, and then sum or average those scores.

One problem here is the judgment of timbre. Anyone who can hear understands that timbre differs from instrument to instrument and from speaker to speaker, but timbre itself is difficult to define. Experiments have tried to express timbre with three primary colors, as with color perception, or with 13 primary colors, but a complete classification of sound has not been achieved.

With third-party evaluation of synthesized speech, the scores therefore depend on each individual's subjective tendencies, so it is difficult for them to be consistent or objective. The drawings below are used to describe a synthesized voice evaluation apparatus that is both objective and automatable.

FIG. 1 is a diagram illustrating a synthesized voice evaluation apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the synthesized voice evaluation apparatus 100 may include a synthesized voice input unit 110, an original voice input unit 120, a harmonic/percussive source separation (HPSS) module unit 130, a mel-spectrogram conversion module unit 140, a mel one-dimensional conversion unit 150, a mean value module unit 160, a standard deviation module unit 170, an output unit 180, and the like.

The synthesized voice evaluation apparatus 100 is an AI-based automatic synthesized-voice evaluation module; it controls the components within the apparatus and can perform the evaluation using scores calculated from a plurality of indicators. The synthesized voice input unit 110 can receive synthesized voice data output from a speech synthesis device and pass the input voice data to the HPSS module unit 130.

The original voice input unit 120 handles the original voice corresponding to the synthesized voice; it can receive the voice data as a file, or record and store the speaker's voice. The original voice input unit 120 can then pass the input original voice to the HPSS module unit 130.

The HPSS module unit 130 can separate the synthesized voice, that is, the machine-generated sound, into harmonic and percussive source parts. Specifically, a given sound wave $x$ can be transformed by the short-time Fourier transform (STFT) as in (Equation 1).

[Equation 1]

$$X(m, k) = \sum_{n=0}^{N-1} x(n + mH)\, w(n)\, e^{-2\pi i k n / N}$$

Here, $m \in [0, M-1]$ and $k \in [0, N-1]$, where $M$ is the number of frames, $N$ is the frame size (that is, the length of the discrete STFT), $w$ is the window function, and $H$ is the hop size. After this transformation, the power spectrum of the sound wave $x$ can be defined as in (Equation 2).

[Equation 2]

$$Y(m, k) = |X(m, k)|^2$$
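As an illustration of (Equation 1) and (Equation 2), the following Python sketch computes a short-time Fourier transform and its power spectrum using only the standard library. The function names, the sine window, the test tone, and the frame size of 16 with a hop of 8 are assumptions made for this example; the source does not fix a window or any parameter values.

```python
import cmath
import math


def stft(x, frame_size, hop):
    """Return X[m][k] for frames m and bins k in [0, frame_size - 1]."""
    n_frames = 1 + (len(x) - frame_size) // hop
    # Sine window (an assumption; any window function w could be used).
    window = [math.sin(math.pi * (n + 0.5) / frame_size) for n in range(frame_size)]
    spectrum = []
    for m in range(n_frames):
        frame = [x[n + m * hop] * window[n] for n in range(frame_size)]
        spectrum.append([
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size))
            for k in range(frame_size)
        ])
    return spectrum


def power_spectrogram(spectrum):
    """Y(m, k) = |X(m, k)|^2."""
    return [[abs(value) ** 2 for value in row] for row in spectrum]


# A pure tone with a period of 8 samples; with N = 16 its energy sits in bin 2.
signal = [math.sin(2 * math.pi * n / 8) for n in range(64)]
X = stft(signal, frame_size=16, hop=8)
Y = power_spectrogram(X)
```

A direct sum like this costs O(N^2) per frame; a real implementation would normally use an FFT routine, but the direct form keeps the correspondence with (Equation 1) visible.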

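Next, consider a finite set $A = \{a_i\}_{i \in I}$ of real numbers indexed by an arbitrary finite index set $I$ with $n$ elements. When the elements are ordered so that $a_1 \le a_2 \le \dots \le a_n$, the median $\operatorname{med}(A)$ is defined as in (Equation 3).

[Equation 3]

$$\operatorname{med}(A) = \begin{cases} a_{(n+1)/2}, & n \text{ odd} \\ \dfrac{a_{n/2} + a_{n/2+1}}{2}, & n \text{ even} \end{cases}$$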

The harmonic and percussive median filters for a given matrix $Y$ can be defined as in (Equation 4).

[Equation 4]

$$\tilde{Y}_h(m, k) = \operatorname{med}\bigl(Y(m - \ell_h, k), \dots, Y(m + \ell_h, k)\bigr)$$

$$\tilde{Y}_p(m, k) = \operatorname{med}\bigl(Y(m, k - \ell_p), \dots, Y(m, k + \ell_p)\bigr)$$

Here, $\ell_p$ and $\ell_h$ set the sizes of the percussive and harmonic median filters. The two functions of (Equation 4) are applied to the $Y$ of (Equation 2) to obtain $\tilde{Y}_h$ and $\tilde{Y}_p$, and the binary mask functions of (Equation 5) are then applied to the respective results.

[Equation 5]

$$M_h(m, k) = \begin{cases} 1, & \tilde{Y}_h(m, k) \ge \tilde{Y}_p(m, k) \\ 0, & \text{otherwise} \end{cases} \qquad M_p(m, k) = \begin{cases} 1, & \tilde{Y}_p(m, k) > \tilde{Y}_h(m, k) \\ 0, & \text{otherwise} \end{cases}$$
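The median filtering and binary masking steps can be sketched on a toy power spectrogram as follows. This is a minimal sketch rather than the apparatus's implementation: the function names, the truncation of the filter window at the edges, and the rule that ties go to the harmonic mask are all assumptions of this example.

```python
from statistics import median


def median_filter_time(Y, l_h):
    """Harmonic enhancement: median over up to 2*l_h + 1 neighbouring frames."""
    frames, bins = len(Y), len(Y[0])
    return [
        [median(Y[j][k] for j in range(max(0, m - l_h), min(frames, m + l_h + 1)))
         for k in range(bins)]
        for m in range(frames)
    ]


def median_filter_freq(Y, l_p):
    """Percussive enhancement: median over up to 2*l_p + 1 neighbouring bins."""
    bins = len(Y[0])
    return [
        [median(row[max(0, k - l_p):min(bins, k + l_p + 1)]) for k in range(bins)]
        for row in Y
    ]


def binary_masks(Y_h, Y_p):
    """M_h is 1 where Y_h >= Y_p (ties go to harmonic); M_p is its complement."""
    M_h = [[1 if h >= p else 0 for h, p in zip(rh, rp)] for rh, rp in zip(Y_h, Y_p)]
    M_p = [[1 - v for v in row] for row in M_h]
    return M_h, M_p


# Toy power spectrogram Y[m][k] with 5 frames and 5 bins.
# Bin k = 2 is active in every frame (a sustained tone);
# frame m = 2 is active in every bin (a click).
Y = [[1.0 if (k == 2 or m == 2) else 0.0 for k in range(5)] for m in range(5)]
Y_h = median_filter_time(Y, l_h=1)
Y_p = median_filter_freq(Y, l_p=1)
M_h, M_p = binary_masks(Y_h, Y_p)
```

In this toy case the percussive mask selects only the click frame outside the sustained bin, which is the behaviour the two median filters are designed to produce: filtering along time suppresses short events, and filtering along frequency suppresses narrow-band tones.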

Applying the above binary mask functions to $X$, the result of transforming the original sound, yields the Fourier-transform results $X_h = X \cdot M_h$ and $X_p = X \cdot M_p$ (elementwise products) for the harmonic and percussive parts of the original sound. Applying the inverse Fourier transform to these gives $x_h$ and $x_p$, the harmonic and percussive parts of the original sound $x$. This process is called harmonic/percussive source separation. For the original sound $x$, (Equation 6) then holds trivially.

[Equation 6]

$$x = x_h + x_p$$
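One way to see why (Equation 6) can be called self-evident is sketched below. It assumes that the two binary masks are complementary at every time-frequency bin and that the inverse transform is linear and recovers $x$ from $X$; the symbols $M_h$, $M_p$, $X_h$, and $X_p$ follow common HPSS notation and are assumptions of this sketch rather than symbols confirmed by the source.

```latex
\begin{align*}
M_h(m,k) + M_p(m,k) &= 1 \\
X_h(m,k) + X_p(m,k) &= X(m,k)\,\bigl(M_h(m,k) + M_p(m,k)\bigr) = X(m,k) \\
x_h + x_p &= \mathrm{STFT}^{-1}(X_h) + \mathrm{STFT}^{-1}(X_p)
           = \mathrm{STFT}^{-1}(X_h + X_p) = \mathrm{STFT}^{-1}(X) = x
\end{align*}
```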

Most voices have both harmonic and percussive parts. The harmonic part takes the form of a function whose frequency changes continuously over time, whereas the percussive part contains many frequencies at a particular instant. When analyzing the percussive part, it therefore helps to analyze its inverse function in the time-frequency plane. This inverse function has frequency on the horizontal axis and a function of time on the vertical axis. The harmonic and percussive parts can then be treated as probability distributions for analysis.

The original voice and the synthesized voice are each separated into harmonic and percussive parts by the HPSS module unit 130, and the parts can be input to the mel-spectrogram conversion module unit 140, the mean value module unit 160, and the standard deviation module unit 170.

The mel-spectrogram conversion module unit 140 generates a spectrogram by representing the voice data as an image on a two-dimensional plane. A spectrogram is a tool for visualizing and understanding sound or waves, and it combines the characteristics of a waveform and a spectrum. The mel scale reflects the characteristics of human hearing in the spectrogram, which is a color map. In general, the frequency interval that humans can distinguish widens toward higher frequencies, and the mel scale uses filters based on this principle to convert the scale units. A mel spectrogram generated in this way takes a different shape when different speakers say the same sentence or when the same speaker says different sentences.
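The widening of perceptual frequency intervals described above can be illustrated with one widely used mel formula, $\mathrm{mel}(f) = 2595 \log_{10}(1 + f/700)$. The source does not state which mel formula or how many bands the apparatus uses, so the formula, the function names, the 0 to 8000 Hz range, and the 10 bands below are assumptions of this sketch.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, 10)
# Band widths in Hz grow with frequency even though they are equal in mels.
widths = [b - a for a, b in zip(edges, edges[1:])]
```

A mel filter bank places one filter between each triple of consecutive edges, so low frequencies get narrow filters and high frequencies get wide ones.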

The mel spectrogram generated by the mel-spectrogram conversion module unit 140 is passed to the mel one-dimensional conversion unit 150, which can convert the two-dimensional mel spectrogram into one dimension. The one-dimensional mel spectrograms of the original voice and the synthesized voice can then be input to the mean value module unit 160 and the standard deviation module unit 170, respectively.

The mean value module unit 160 can calculate the mean of the harmonic part and of the percussive part of the original voice, and likewise for the synthesized voice. It can also calculate the mean of the mel spectrogram of the harmonic part and of the percussive part of the original voice, and likewise for the synthesized voice. The calculated values can be passed to the output unit 180.

The standard deviation module unit 170 can calculate the standard deviation of the harmonic part and of the percussive part of the original voice, and likewise for the synthesized voice. It can also calculate the standard deviation of the mel spectrogram of the harmonic part and of the percussive part of the original voice, and likewise for the synthesized voice. The calculated values can be passed to the output unit 180.
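The work of the mel one-dimensional conversion unit, the mean value module unit, and the standard deviation module unit can be sketched for a single part (harmonic or percussive) as follows. The source does not say how the two-dimensional mel spectrogram is reduced to one dimension or which standard deviation is meant, so row-by-row flattening and the population standard deviation are assumptions here, as are the function names and the tiny input values.

```python
from statistics import fmean, pstdev


def flatten(mel_spectrogram):
    """Convert a 2-D mel spectrogram (list of rows) into a 1-D list."""
    return [value for row in mel_spectrogram for value in row]


def part_statistics(samples, mel_spectrogram):
    """Return the four statistics used as raw material for the indicators."""
    mel_1d = flatten(mel_spectrogram)
    return {
        "mean": fmean(samples),
        "std": pstdev(samples),      # population standard deviation (assumed)
        "mel_mean": fmean(mel_1d),
        "mel_std": pstdev(mel_1d),
    }


# Hypothetical harmonic part: four waveform samples and a 2 x 2 mel spectrogram.
stats = part_statistics([0.0, 1.0, 0.0, -1.0], [[1.0, 3.0], [5.0, 7.0]])
```

Calling `part_statistics` once each for the harmonic and percussive parts of the original voice and of the synthesized voice gives the sixteen raw values from which the eight relative indicators are derived.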

Expressed concretely, assuming that $x_h$ and $x_p$ have been obtained for a particular person's voice (that is, the original voice), the following values can be computed.

- For $x_h$: the mean ($\mu_h$), the standard deviation ($\sigma_h$), and the mean ($\mu_h^{\mathrm{MEL}}$) and standard deviation ($\sigma_h^{\mathrm{MEL}}$) of the MEL spectrogram.
- For $x_p$: the mean ($\mu_p$), the standard deviation ($\sigma_p$), and the mean ($\mu_p^{\mathrm{MEL}}$) and standard deviation ($\sigma_p^{\mathrm{MEL}}$) of the MEL spectrogram.

If the values for the machine-generated sound (that is, the synthesized voice) produced from the script of the human voice used above are marked with *, the following values can also be obtained.

- For $x_h^{*}$: the mean ($\mu_h^{*}$), the standard deviation ($\sigma_h^{*}$), and the mean ($\mu_h^{\mathrm{MEL}*}$) and standard deviation ($\sigma_h^{\mathrm{MEL}*}$) of the MEL spectrogram.
- For $x_p^{*}$: the mean ($\mu_p^{*}$), the standard deviation ($\sigma_p^{*}$), and the mean ($\mu_p^{\mathrm{MEL}*}$) and standard deviation ($\sigma_p^{\mathrm{MEL}*}$) of the MEL spectrogram.

From these values, eight physical indicators can be obtained, as shown in the table of FIG. 2.

FIG. 2 is a diagram illustrating the formula for each detailed indicator according to an embodiment of the present invention.

Referring to FIG. 2, the eight physical indicators that serve as detailed indicators are the relative mean of the harmonic part, the relative standard deviation of the harmonic part, the relative mean of the percussive part, the relative standard deviation of the percussive part, the relative MEL mean of the harmonic part, the relative MEL standard deviation of the harmonic part, the relative MEL mean of the percussive part, and the relative MEL standard deviation of the percussive part; each value can be calculated from its formula. With preset coefficients found for the respective indicators, (Equation 7) can be calculated.

[Equation 7]

Here, the coefficient values can be established empirically, or set arbitrarily according to the intended meaning of the score.
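The formulas of FIG. 2 and (Equation 7) are not reproduced in this text, so the following is only a hedged sketch of how preset coefficients might combine the eight indicators. It assumes that (Equation 7) is a weighted sum; the indicator names, the function name, and the example values are hypothetical and chosen for illustration only.

```python
# Hypothetical names for the eight relative indicators listed for FIG. 2.
INDICATOR_NAMES = (
    "harmonic_rel_mean", "harmonic_rel_std",
    "percussive_rel_mean", "percussive_rel_std",
    "harmonic_rel_mel_mean", "harmonic_rel_mel_std",
    "percussive_rel_mel_mean", "percussive_rel_mel_std",
)


def evaluation_score(indicators, coefficients):
    """Weighted sum of the eight indicators (assumed form of Equation 7)."""
    missing = [n for n in INDICATOR_NAMES
               if n not in indicators or n not in coefficients]
    if missing:
        raise KeyError(f"missing indicators or coefficients: {missing}")
    return sum(coefficients[n] * indicators[n] for n in INDICATOR_NAMES)


# Placeholder values: every indicator 0.5, every coefficient 1.0.
indicators = {name: 0.5 for name in INDICATOR_NAMES}
coefficients = {name: 1.0 for name in INDICATOR_NAMES}
score = evaluation_score(indicators, coefficients)
```

Because the coefficients may be set empirically or chosen to give the score a desired meaning, they are kept as an explicit input here instead of being hard-coded.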

The output unit 180 receives and outputs the evaluation score produced through the mean value module unit 160 and the standard deviation module unit 170, and may be, for example, a display device. It can also transmit the calculated evaluation score data to the speech synthesis device that supplied the synthesized voice and, when the evaluation score is below the threshold, send that device a message requesting retraining and redesign.

In this way, the synthesized voice evaluation apparatus 100 can automatically calculate and output evaluation scores on the eight indicators for the input synthesized voice relative to the original voice.

FIG. 3 is a flowchart illustrating a synthesized voice evaluation method according to an embodiment of the present invention.

Referring to FIG. 3, in step S300 the synthesized voice input unit 110 of the synthesized voice evaluation apparatus 100 receives the synthesized voice, the original voice input unit 120 receives the original voice, and both are passed to the HPSS module unit 130.

In step S310, the HPSS module unit 130 separates the received synthesized voice and original voice each into harmonic and percussive parts. In step S320, the mel-spectrogram conversion module unit 140 converts the harmonic and percussive parts of the synthesized voice and the original voice into mel spectrograms.

In step S330, the mean value module unit 160 calculates the mean of the harmonic and percussive parts of the original voice and the mean of their mel spectrograms, and does the same for the synthesized voice.

In step S340, the standard deviation module unit 170 calculates the standard deviation of the harmonic and percussive parts of the original voice and the standard deviation of their mel spectrograms, and does the same for the synthesized voice.

Then, in step S350, the output unit 180 receives and outputs the evaluation score produced through the mean value module unit 160 and the standard deviation module unit 170.

도 4는 본 발명의 실시예에 따른 합성음성 평가 장치에서의 평가 점수 반영 방식을 도시한 순서도이다.4 is a flowchart illustrating an evaluation score reflection method in the synthesized speech evaluation apparatus according to an embodiment of the present invention.

도 4를 참조하면, S400단계에서 음성합성장치(미도시)에 문장 또는 음성을 입력값으로 입력하게 되고, S410단계에서 음성합성장치에서는 입력된 문장 또는 음성을 기 학습된 텍스트-음성 합성 모델에 통과시켜, 합성된 음성을 출력하게 된다.Referring to FIG. 4 , in step S400, a sentence or voice is input as an input value to a speech synthesis device (not shown), and in step S410, the input sentence or voice is applied to the pre-learned text-to-speech synthesis model in the speech synthesis device. passed, and the synthesized voice is output.

In step S420, the synthesized speech evaluation apparatus 100 performs the evaluation using scores calculated from the new indicators. Specifically, it obtains the harmonic and percussive parts of the synthesized voice and the original voice, calculates the mean, standard deviation, mel mean, and mel standard deviation from those parts, and uses the calculated values as the evaluation score for the input synthesized voice. Then, in step S430, the synthesized speech evaluation apparatus 100 determines whether the calculated score is equal to or greater than a threshold. If it is, in step S440 the apparatus outputs the score for each indicator as the evaluation result of the speech synthesis device and transmits this data to the speech synthesis device. In this case, the speech synthesis device can be judged to be a reasonably complete module that is ready for use.

However, if the comparison in step S430 shows that the calculated score is below the threshold, then in step S450 the synthesized speech evaluation apparatus 100 outputs the score for each indicator as the comparative evaluation result of the speech synthesis device and also outputs the scores of the indicators that are below the threshold. Along with this, it transmits to the speech synthesis device information or a message suggesting retraining or module redesign.
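The branching in steps S430 to S450 can be sketched as a small decision function. The name `review_scores`, the report layout, the message wording, and the choice to apply the threshold to each indicator separately are assumptions for illustration; the description does not fix a threshold value or a data format.

```python
def review_scores(scores, threshold):
    """Build the report described for steps S430 to S450.

    scores: mapping of indicator name to score (higher is better).
    threshold: minimum acceptable score (assumed to apply per indicator).
    """
    # Indicators that fall below the threshold are reported separately (S450).
    failing = {name: s for name, s in scores.items() if s < threshold}
    return {
        "scores": dict(scores),   # per-indicator scores, output in both branches
        "passed": not failing,    # True corresponds to the S440 branch
        "failing": failing,
        "message": (
            None if not failing
            else "Retraining or module redesign is suggested."
        ),
    }
```

For example, `review_scores({"mean": 0.9, "mel_std": 0.4}, 0.5)` reports `mel_std` as the failing indicator and attaches the redesign message, while a score set that is entirely at or above the threshold returns `passed` as `True` with no message.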

In this way, voices synthesized by the designed speech synthesis device can be evaluated automatically and reasonably objective information can be provided, so the current state of the speech synthesis device and the need for redesign can be checked periodically and at each design iteration.

The AI-based method for automating the evaluation of synthesized speech according to an embodiment of the present invention automates the evaluation of voices generated by artificial intelligence or a computer, and has the advantage that the evaluation of machine-generated voices can be automated and made objective on the basis of detailed indicators.

Meanwhile, the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

In addition, the computer-readable recording medium may be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be easily inferred by programmers in the technical field to which the present invention pertains.

Unless an order is explicitly stated for the steps constituting the method according to the present invention, or there is a statement to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited to the order in which the steps are described.

The use of any examples or exemplary terms (for example, "etc.") in the present invention is merely intended to describe the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes can be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set out below but also everything equivalent to, or equivalently modified from, those claims falls within the scope of the spirit of the present invention.

The present invention has thus been described with reference to an embodiment shown in the drawings, but this is merely exemplary, and those of ordinary skill in the art will understand that various modifications and variations of the embodiment are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: synthesized speech evaluation apparatus
110: synthesized voice input unit
120: original voice input unit
130: HPSS module unit
140: mel spectrogram conversion module unit
150: mel one-dimensional transformation unit
160: average value module unit
170: standard deviation module unit
180: output unit

Claims

An AI-based method for automating the evaluation of synthesized speech, the method comprising:
a step in which a harmonic/percussive source separation unit performs a short-time Fourier transform (STFT) on each of a received synthesized voice and original voice and then applies an inverse Fourier transform to the transform results to obtain the following (1) to (4), respectively:
(1) a harmonic source of the synthesized voice;
(2) a percussive source of the synthesized voice;
(3) a harmonic source of the original voice;
(4) a percussive source of the original voice;
a step in which a mel spectrogram conversion module unit converts (1) to (4) above into two-dimensional mel spectrograms, respectively;
a step in which a mel one-dimensional transformation module unit converts the two-dimensional mel spectrograms of (1) to (4) above into one dimension to obtain the following (5) to (8), respectively:
(5) the one-dimensional mel spectrogram of (1) above;
(6) the one-dimensional mel spectrogram of (2) above;
(7) the one-dimensional mel spectrogram of (3) above;
(8) the one-dimensional mel spectrogram of (4) above;
a step in which an average value module unit calculates the average values of (1) to (8) above, respectively, and then calculates an evaluation score by comparing the average values of (1), (2), (5) and (6) with the average values of (3), (4), (7) and (8); and
a step in which a standard deviation module unit calculates the standard deviations of (1) to (8) above, respectively, and then calculates an evaluation score by comparing the standard deviations of (1), (2), (5) and (6) with the standard deviations of (3), (4), (7) and (8).

An AI-based apparatus for automating the evaluation of synthesized speech, the apparatus comprising:
a harmonic/percussive source separation unit that performs a short-time Fourier transform (STFT) on each of a received synthesized voice and original voice and then applies an inverse Fourier transform to the transform results to obtain the following (1) to (4), respectively:
(1) a harmonic source of the synthesized voice;
(2) a percussive source of the synthesized voice;
(3) a harmonic source of the original voice;
(4) a percussive source of the original voice;
a mel spectrogram conversion module unit that converts (1) to (4) above into two-dimensional mel spectrograms, respectively;
a mel one-dimensional transformation module unit that converts the two-dimensional mel spectrograms of (1) to (4) above into one dimension to obtain the following (5) to (8), respectively:
(5) the one-dimensional mel spectrogram of (1) above;
(6) the one-dimensional mel spectrogram of (2) above;
(7) the one-dimensional mel spectrogram of (3) above;
(8) the one-dimensional mel spectrogram of (4) above;
an average value module unit that calculates the average values of (1) to (8) above, respectively, and calculates an evaluation score by comparing the average values of (1), (2), (5) and (6) with the average values of (3), (4), (7) and (8); and
a standard deviation module unit that calculates the standard deviations of (1) to (8) above, respectively, and calculates an evaluation score by comparing the standard deviations of (1), (2), (5) and (6) with the standard deviations of (3), (4), (7) and (8).

(Deleted)