KR100623214B1

KR100623214B1 - Real-time quality analyzer for voice and audio signals

Info

Publication number: KR100623214B1
Application number: KR1020017000881A
Authority: KR
Inventors: 이안앤드류앗킨슨; 마틴리; 웨이마; 감비즈호마연파
Original assignee: 내셔널 세미컨덕터 코포레이션
Priority date: 1999-05-25
Filing date: 1999-05-25
Publication date: 2006-09-12
Also published as: JP2003500701A; AU4097099A; WO2000072306A1; JP4500458B2; KR20010106412A

Abstract

The present invention is a method for providing a real-time perceptual quality measurement of an audio signal. A quality test signal that includes an audio test signal is received by a device under test. Playback of a pre-stored representation of the audio signal is coarsely synchronized with the received audio test signal, for example by using a synchronization signal in the header of the quality test signal. The playback is then finely synchronized with the received audio test signal, for example by comparing the data in a window portion of the received audio test signal with a window portion of the pre-stored representation of the audio test signal and adjusting the window portion of the pre-stored representation according to the result of the comparison. A window of the received audio test signal is then compared with a portion of the finely synchronized playback of the pre-stored representation to output a quality measurement of the received audio test signal.

Description

Real-time quality analyzer for voice and audio signals {REAL-TIME QUALITY ANALYZER FOR VOICE AND AUDIO SIGNALS}

The present invention relates to a method and apparatus for providing a quality measurement for voice equipment under test, and more particularly to a method and apparatus for providing a real-time objective perceptual quality measurement of a voice or audio signal received by such equipment.

Voice quality assessment is a difficult task for voice systems, particularly those that involve compression and coding, because conventional waveform and spectral similarity measures do not correlate especially well with the perceived quality of the received voice signal. Previously, the voice quality of telecommunication systems was measured off-line using formal perceptual listening tests carried out in carefully controlled environments with prepared speech material. This approach is effective, but it is costly and time consuming. In addition, the results depend on the individual test subjects and their environment. As a result, the results of such tests are not always reproducible or consistent.

Recent research in psychoacoustics has led to a better understanding of how humans perceive speech and sound. By applying some of the findings of this field, such as critical band theory, auditory masking, and perceptual loudness, it is now possible to develop objective measures that closely match the results of formal subjective listening tests. Various organizations, including the International Telecommunication Union (ITU), have developed algorithms for measuring voice quality off-line using files stored on a computer. Examples of known objective measurement algorithms include the Perceptual Speech Quality Measure (PSQM), Measuring Normalizing Blocks (MNB), the Perceptual Analysis Measurement System (PAMS), and the Modified Bark Spectral Distortion (MBSD) measure. The last of these, for example, divides the frequency range into bands that reflect human auditory perception.

Known objective perceptual quality measurement systems require that voice quality be measured off-line, that is, from voice data that has been received and stored. It is desirable for such objective perceptual quality measurements to be made in real time, or near real time, within a computing device.

The present invention provides a method and apparatus for real-time objective perceptual quality measurement of a voice or audio signal, in which the objective perceptual quality measurement can be carried out in real time or near real time.

In one aspect, the present invention provides a method for providing a real-time perceptual quality measurement of an audio signal. A quality test signal that includes an audio test signal is received by a device under test. Playback of a pre-stored representation of the audio signal is coarsely synchronized with the received audio test signal, for example by using a sync pulse in the header of the quality test signal. The playback is then finely synchronized with the received audio test signal, for example by comparing the data in a window portion of the received audio test signal with a window portion of the pre-stored representation of the audio test signal and adjusting the window portion of the pre-stored representation according to the result of the comparison. A window of the received audio test signal is then compared with a portion of the finely synchronized playback of the pre-stored representation to output a quality measurement of the received audio test signal.

In another aspect, the invention comprises an audio quality analyzer (AQA) for evaluating the quality of a quality test signal received by a device under test, the quality test signal including an audio test signal. The AQA is configured to coarsely synchronize playback of a pre-stored representation of the audio test signal with the received audio test signal, to finely synchronize the playback of the pre-stored representation with the received audio test signal, and to compare a window of the received audio test signal with a finely synchronized portion of the pre-stored representation in order to output a quality measurement of the received audio test signal.

Accordingly, the present invention can provide objective perceptual quality measurements of audio and voice signals in real time or near real time within a computing device.

FIG. 1 is a block diagram of one embodiment of a voice quality analyzer in accordance with the present invention;

FIG. 2 illustrates a quality test message frame;

FIG. 3 is a block diagram of another embodiment of a voice quality analyzer in accordance with the present invention;

FIG. 4 is a block diagram of an embodiment of a buffer providing sync windowing and selection windowing in accordance with the present invention;

FIG. 5 illustrates the shape of a rectangular window function;

FIG. 6 illustrates the shape of a nonlinear emphasized window function;

FIG. 7 illustrates a discontinuous rectangular window function;

FIG. 8 is a block diagram of a test configuration in accordance with the present invention; and

FIG. 9 is a flowchart of an embodiment of a test method in accordance with the present invention.

FIG. 1 is a block diagram of a voice quality analyzer (VQA) 10 that receives the voice signal output by voice equipment under test (VEUT) 12. The VQA 10 includes a quality evaluator 14 that generates a quality estimate of the voice test signal received from the VEUT 12. The VQA 10 also includes a header detector 16, which in turn comprises a dual-tone multi-frequency (DTMF) detector 18 and a sequencer 20. The DTMF detector 18 monitors the signal received from the VEUT 12 and detects and decodes any signaling tones present in it. The decoded signal is used by the sequencer 20 to control the operation of a voice sentence generator 22.

Pre-stored representations of voice test signals are stored in the voice sentence generator 22. Such a "sentence" need not represent a complete sentence or word in any particular language, nor speech from any particular person. Rather, the representation is chosen to facilitate the voice quality measurement performed by the quality evaluator 14. When a header signal preceding a voice test signal is received, the sequencer 20 initiates playback of a particular pre-stored voice test signal representation from the voice sentence generator 22, based on the particular voice test signal identified in the header. A fine synchronizer 24 is provided to synchronize the pre-stored representation with the received voice test signal closely enough for the quality evaluator 14 to perform an objective perceptual quality comparison. The voice quality measurement is performed by applying an objective perceptual quality measurement algorithm to compare a portion of the synchronized, locally generated reference signal from the fine synchronizer 24 with a windowed portion of the signal received from the VEUT 12. In one embodiment, one of the following algorithms is used: Perceptual Speech Quality Measure (PSQM), Measuring Normalizing Blocks (MNB), Perceptual Analysis Measurement System (PAMS), or Modified Bark Spectral Distortion (MBSD). In another embodiment, several different algorithms are available and the algorithm is selected manually. In a further embodiment, not shown, several different algorithms are available and the selection depends on which pre-stored representation in the voice sentence generator 22 has been selected by the sequencer 20.

Referring to FIG. 2, an example of a quality test message 30 is shown for one embodiment. The quality test message 30 consists of four parts 32, 34, 36, 38: the first three parts 32, 34, 36 make up a header 40 that is transmitted using DTMF signaling, and the fourth part contains the voice test signal 38. The unique word 32 signals the start of a new quality test message 30. The unique word 32 is included to prevent false measurement starts during periods of severe channel degradation, for example when the VEUT 12 receives a very noisy signal from a cellular network. The sentence ID 34 contains an index or identifier of the voice test signal 38, so that different test messages can be sent to the VEUT 12 and identified by the VQA 10. The sync pulse 36 is a short DTMF pulse that signals the start of the voice test signal 38. The sequencer 20 uses the sync pulse 36 to cause the voice sentence generator 22 to begin playing back the appropriate pre-stored voice test signal representation for comparison with what is received by the VEUT 12. In other embodiments, the header 40 is transmitted in other ways, for example using other forms of in-band signaling or using out-of-band signaling. In these embodiments, means other than the DTMF detector 18 are used to detect and respond to the header 40. Examples of suitable in-band signaling include monotone signaling and telephony data protocols. An example of suitable out-of-band signaling is signaling on a separate paging channel.
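The frame layout of FIG. 2 can be sketched as a small data structure. This is only an illustration: the patent does not state which DTMF digits form the unique word, how long the sentence ID is, or which digit serves as the sync pulse, so `UNIQUE_WORD`, `SYNC_SYMBOL`, and the class and method names below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values: the patent does not give the actual DTMF digits.
UNIQUE_WORD: Tuple[str, ...] = ("*", "7", "3", "*")  # part 32: start-of-message marker
SYNC_SYMBOL = "#"                                    # part 36: short DTMF sync pulse


@dataclass(frozen=True)
class QualityTestMessage:
    sentence_id: str            # part 34: identifies the voice test signal
    voice_samples: List[float]  # part 38: the voice test signal itself

    def header_symbols(self) -> List[str]:
        """Return header 40 as DTMF symbols: unique word, sentence ID, sync pulse."""
        return [*UNIQUE_WORD, *self.sentence_id, SYNC_SYMBOL]


msg = QualityTestMessage(sentence_id="05", voice_samples=[0.0, 0.1, -0.1])
print(msg.header_symbols())  # ['*', '7', '3', '*', '0', '5', '#']
```

Keeping the header as an ordered symbol sequence mirrors the description: the receiver can only act on the sentence ID once the unique word has been seen, and only start playback once the sync pulse arrives.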

Referring to FIG. 3, in one embodiment the sequencer 20 comprises a unique word detector 42, a sentence ID detector 44, and a coarse sync detector 46, which incorporate the function of the DTMF detector 18 of FIG. 1. A separate DTMF detector 18 is therefore not shown in FIG. 3. When the unique word 32 is recognized by the unique word detector 42, subsequently received data is passed to the sentence ID detector 44. The sentence ID detector 44 detects the sentence ID 34 received after the unique word. Once the sentence ID has been identified, it is passed to the voice sentence generator 22 so that the appropriate pre-stored representation, corresponding to the voice test signal identified by the sentence ID 34, can be output, and subsequently received data is passed to the coarse sync detector 46. The coarse sync detector 46 detects the sync pulse 36, which in one embodiment is encoded as a short DTMF pulse. When the coarse sync signal is received from the coarse sync detector 46, the voice sentence generator 22 begins playback of the pre-stored representation of the voice signal corresponding to the determined sentence ID 34.
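The hand-off from one detector to the next can be sketched as a small state machine fed with decoded symbols one at a time. This is a minimal sketch under assumptions: the digit values, the two-digit sentence ID length, and the names `Sequencer`, `State`, and `feed` are hypothetical and do not come from the patent.

```python
from enum import Enum, auto
from typing import Optional

UNIQUE_WORD = "*73*"     # hypothetical DTMF digits for the unique word
SENTENCE_ID_LENGTH = 2   # hypothetical number of digits in the sentence ID
SYNC_SYMBOL = "#"        # hypothetical DTMF digit used as the sync pulse


class State(Enum):
    WAIT_UNIQUE_WORD = auto()   # unique word detector active
    READ_SENTENCE_ID = auto()   # sentence ID detector active
    WAIT_SYNC = auto()          # coarse sync detector active
    PLAYING = auto()            # playback of the stored representation started


class Sequencer:
    def __init__(self) -> None:
        self.state = State.WAIT_UNIQUE_WORD
        self.recent = ""        # most recent symbols, for unique word matching
        self.sentence_id = ""

    def feed(self, symbol: str) -> Optional[str]:
        """Consume one decoded symbol; return the sentence ID when playback should start."""
        if self.state is State.WAIT_UNIQUE_WORD:
            self.recent = (self.recent + symbol)[-len(UNIQUE_WORD):]
            if self.recent == UNIQUE_WORD:
                self.state = State.READ_SENTENCE_ID
        elif self.state is State.READ_SENTENCE_ID:
            self.sentence_id += symbol
            if len(self.sentence_id) == SENTENCE_ID_LENGTH:
                self.state = State.WAIT_SYNC
        elif self.state is State.WAIT_SYNC:
            if symbol == SYNC_SYMBOL:
                self.state = State.PLAYING
                return self.sentence_id
        return None
```

Feeding the symbols `"12*73*05#"` one by one returns `None` for every symbol except the last, where `"05"` is returned and playback of the matching stored representation would begin.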

In one embodiment, the coarse synchronization provided by the sync pulse 36 is not sufficient for the signal comparator 14 to compare the voice test signal 38 with the pre-stored representation of the voice signal in real time, that is, so that the quality assessment performed by the signal comparator 14 takes place during reception of the voice test signal 38 with little or no apparent delay as perceived by the user. In one embodiment, coarse synchronization is not sufficient to analyze the voice test signal 38 using the Perceptual Speech Quality Measure (PSQM), Measuring Normalizing Blocks (MNB), Perceptual Analysis Measurement System (PAMS), or Modified Bark Spectral Distortion (MBSD) measurement algorithms. A fine sync detector 24 is therefore provided for more accurate synchronization. The fine sync detector 24 compares the output of the voice sentence generator 22 with a window of voice data selected by the sync windowing module 52. In one embodiment, this comparison is performed in accordance with International Telecommunication Union (ITU) standard P.931, "Multimedia Communications Delay, Synchronization and Frame Rate Measurement." As a result of this comparison, the fine sync detector 24 produces an output that controls a switch 54, which closes when fine synchronization is achieved. The switch 54 prevents a quality assessment from being output before fine synchronization is achieved. In addition, a data window representing the synchronized portion of the pre-stored representation of the voice test signal is output to the selection windowing module 56. The selection windowing module 56 selects the synchronized portion of the incoming voice test data 58 for comparison with the synchronized portion of the pre-stored representation 60. This comparison is performed by the perceptual comparator 14, which generates a quality assessment. As described above, the quality assessment is output when the switch 54 is closed.
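One way to picture fine synchronization is a search over small offsets for the alignment at which a window of received samples best matches the reference window. The sketch below uses plain normalized cross-correlation as a stand-in; it is not the P.931 procedure, and the function names, the `threshold` value, and the offset search range are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation of two equal-length windows (0.0 if either is silent)."""
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / energy


def fine_sync(received: Sequence[float], reference: Sequence[float],
              window: int, max_lag: int, threshold: float = 0.9
              ) -> Tuple[Optional[int], float]:
    """Search offsets 0..max_lag of `received` for the best match to the reference window.

    Returns (lag, score). The lag is None when no offset reaches `threshold`,
    which corresponds to leaving the output switch open.
    """
    ref_window = reference[:window]
    best_lag: Optional[int] = None
    best_score = -1.0
    for lag in range(max_lag + 1):
        candidate = received[lag:lag + window]
        if len(candidate) < window:
            break  # ran out of received samples
        score = normalized_correlation(candidate, ref_window)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_score < threshold:
        return None, best_score
    return best_lag, best_score
```

Returning `None` until the match is good enough plays the role of switch 54: no quality assessment should be reported from a misaligned pair of windows.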

FIG. 4 illustrates the windowing operations of the sync windowing module 52 and the selection windowing module 56 in one embodiment of the invention. The sync window 62 is selected from the buffer 48 by the sync windowing module 52. The start of the sync window 62 is aligned with the start of the selection window 64 selected by the selection windowing module 56. The buffer 48 is a circular buffer that receives the digitized voice input. The position of the sync window 62 is adjusted according to the quality measurement made by the perceptual comparator 14, as shown in FIG. 3. In this embodiment, the selection window 64 is aligned with the sync window 62 by the fine sync detector 24, through its selection of the window data output from the voice sentence generator 22.
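A circular buffer with a movable window can be sketched as follows. The class name, the use of absolute sample indices, and the error handling are assumptions for illustration; the patent only states that buffer 48 is a circular buffer from which the windows are selected.

```python
from typing import List


class CircularBuffer:
    """Fixed-size ring buffer holding the most recent digitized samples."""

    def __init__(self, capacity: int) -> None:
        self.data: List[float] = [0.0] * capacity
        self.capacity = capacity
        self.total_written = 0  # count of all samples ever written

    def write(self, samples: List[float]) -> None:
        """Append samples, overwriting the oldest ones once the buffer is full."""
        for s in samples:
            self.data[self.total_written % self.capacity] = s
            self.total_written += 1

    def window(self, start: int, length: int) -> List[float]:
        """Return `length` samples starting at absolute sample index `start`."""
        oldest = max(0, self.total_written - self.capacity)
        if start < oldest or start + length > self.total_written:
            raise IndexError("window is not (or no longer) in the buffer")
        return [self.data[(start + i) % self.capacity] for i in range(length)]
```

Moving the sync window then amounts to calling `window` with an adjusted `start`, while the incoming samples keep being written in the background.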

In the embodiment shown in FIG. 3, the selection windowing module 56 also applies a window function, for data weighting, to at least one of the received voice and the pre-stored representation of the voice test signal. In one embodiment, a plurality of weighting functions is provided, including rectangular weighting as shown in FIG. 5, nonlinear emphasized weighting, an example of which is shown in FIG. 6, and discontinuous rectangular weighting, an example of which is shown in FIG. 7. The weighting function is preselected through the choice of quality algorithm. The selection can also be changed adaptively according to the quality measurement from the perceptual comparator 14, as shown in FIG. 3. Discontinuous rectangular weighting is used when a disturbance, such as a hand-off in a cellular system, interrupts reception of the voice signal data. In this case, in one embodiment, the algorithm used by the perceptual comparator 14 excludes the disturbed period from the quality assessment. In one embodiment, the occurrence and length of disturbed periods are reported independently of the quality measurement.
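The three kinds of weighting can be sketched as simple weight vectors. The exact curve of the nonlinear emphasized window in FIG. 6 is not recoverable from the text, so a Hann shape is used here purely as a stand-in, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def rectangular(n: int) -> List[float]:
    """Equal weight for every sample in the window."""
    return [1.0] * n


def emphasized(n: int) -> List[float]:
    """Stand-in nonlinear weighting (Hann shape) that emphasizes the window centre."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def discontinuous_rectangular(n: int, gaps: List[Tuple[int, int]]) -> List[float]:
    """Rectangular weights with zero weight over each disturbed [start, end) span."""
    weights = [1.0] * n
    for start, end in gaps:
        for i in range(max(0, start), min(n, end)):
            weights[i] = 0.0
    return weights


def apply_window(samples: List[float], weights: List[float]) -> List[float]:
    """Multiply samples by weights element by element."""
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    return [s * w for s, w in zip(samples, weights)]
```

For example, `discontinuous_rectangular(6, [(2, 4)])` zeroes samples 2 and 3, which is how a hand-off gap could be kept out of the comparison while its position and length are still known for separate reporting.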

An example of a test configuration in accordance with the present invention is shown in FIG. 8. Many or all of the functional components of the VQA 10 can be implemented in computer software or firmware as a matter of design choice, and the VQA 10 is therefore shown as a computer in FIG. 8. In one embodiment, the VQA 10 is connected to an output port of the VEUT 12, which is a cellular telephone 12 with a hands-free port. In this way, the quality test message 30 received by the cellular telephone 12 is passed to the VQA 10 for analysis. The cellular telephone 12 receives the quality test message 30 from a message source 66 via a network 68, such as a cellular radio network. In one embodiment, the message source 66 is configured as an answering machine that holds recorded quality test messages 30 in voice mailboxes. Each recorded quality test message 30 in a voice mailbox is identified by a sentence ID 34. The voice test signal 38 stored in the message source 66 is identified by a sentence ID 34 that identifies the corresponding pre-stored representation of the voice test signal in the voice sentence generator 22 of the VQA 10.

Referring to FIG. 9, in one embodiment the VEUT 12 dials the message source 66 over the network 68 (step 100) and retrieves a voice mail message from it (step 102). The retrieved voice mail message is the quality test message 30. The VQA 10 then waits until the unique word 32 is recognized (steps 104, 106). Next, the sentence ID 34 is obtained (step 108). The VQA 10 then waits until the sync pulse 36 is received (steps 110, 112). When the sync pulse 36 is received, a local copy of the voice test signal 38 is retrieved, for example from the voice sentence generator 22 (step 114). Fine synchronization of the local copy of the voice test signal 38 is then performed (step 116), and the voice quality measurement is calculated until it is determined that the voice test signal 38 has ended (step 118). When the voice test signal 38 has ended, the calculated quality is displayed (step 122) and the end of the test is reached (step 124). In other embodiments, the quality test can be repeated manually or automatically.

Those skilled in the art will appreciate that the invention described herein provides real-time perceptual quality measurement of voice signals. The invention is particularly advantageous for performing such measurements using algorithms that were not previously known to be suitable for real-time measurement of signals. The invention is also particularly advantageous for providing real-time perceptual quality measurements when highly compressed voice signals are transmitted. Although the embodiments described herein apply to quality measurement of voice signals, it will be appreciated that the invention is more generally applicable to quality measurement of non-voice audio test signals. In such embodiments, the voice quality analyzer 10 is more generally an audio quality analyzer (AQA), the voice test signal 38 is an audio test signal, the voice sentence generator 22 is an audio waveform generator (such as a digitized waveform generator), and the pre-stored representation of the voice test signal in the audio waveform generator is a pre-stored representation of an audio test signal.

As described above, the present invention can provide a method and apparatus for real-time objective perceptual quality measurement of a voice or audio signal, in which the objective perceptual quality measurement can be carried out in real time or near real time.

Those skilled in the art will recognize that various modifications are possible within the scope of the invention. The scope of the invention should therefore be determined by reference to the appended claims and their equivalents.

Claims

1. A method for providing a real-time perceptual quality measurement of an audio signal, the method comprising:

receiving a quality test signal that includes an audio test signal;

coarsely synchronizing playback of a pre-stored representation of the audio test signal with the received audio test signal;

finely synchronizing the playback of the pre-stored representation of the audio test signal with the received audio test signal; and

comparing a window of the received audio test signal with a portion of the finely synchronized playback of the pre-stored representation of the audio test signal to output a quality measurement of the received audio test signal.

2. The method of claim 1, wherein the quality test signal has a header signal comprising a sync pulse, and the step of coarsely synchronizing playback of the pre-stored representation of the audio test signal with the received audio test signal comprises synchronizing the playback of the pre-stored representation of the audio test signal to the sync pulse.

3. The method of claim 2, wherein the step of finely synchronizing the playback of the pre-stored representation of the audio test signal with the received audio test signal comprises:

comparing the data in a window portion of the received audio test signal with a window portion of the pre-stored representation of the audio test signal; and

adjusting the alignment of the window portion of the received audio test signal with the window portion of the pre-stored representation of the audio test signal in accordance with the result of the comparison.

4. The method of claim 3, further comprising receiving an out-of-band header signal.

5. The method of claim 3, further comprising receiving an in-band header signal.

6. The method of claim 5, wherein receiving the header signal comprises receiving a DTMF tone, and coarsely synchronizing playback of the pre-stored representation of the audio test signal comprises synchronizing the playback of the pre-stored representation of the audio test signal with a DTMF pulse.

7. The method of claim 3, wherein the audio test signal is a voice test signal and the pre-stored representation of the audio test signal is a pre-stored representation of the voice test signal.

8. The method of claim 7, further comprising:

receiving a sentence ID identifying the received voice test signal; and

selecting the pre-stored representation of the voice test signal from a plurality of pre-stored representations in accordance with the received sentence ID.

9. The method of claim 8, wherein receiving the sentence ID identifying the received voice test signal comprises receiving a DTMF tone identifying the received voice test signal.

10. The method of claim 3, wherein comparing the window of the received audio test signal with a portion of the finely synchronized reproduction of the pre-stored representation of the audio test signal to output a quality measurement of the received audio test signal comprises generating the quality measurement according to at least one quality measurement algorithm selected from the group consisting of ITU P.861 perceptual speech quality measurement (PSQM), measuring normalizing blocks (MNB), modified bark spectral distortion (MBSD), and perceptual analysis measurement system (PAMS).

11. The method of claim 10, further comprising: receiving a sentence ID in a header signal; and

selecting a quality measurement algorithm for generating the quality measurement according to the received sentence ID.
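In the claims above, the sentence ID drives two lookups: which pre-stored reference to compare against, and which quality measurement algorithm to run. A minimal sketch of that dispatch follows. The table contents, the names `SentenceEntry`, `SENTENCES`, `ALGORITHMS`, and `measure`, and the mean-squared-error placeholder are all hypothetical; in particular, the placeholder is not one of the perceptual algorithms named in the claims.

```python
from typing import List, NamedTuple


class SentenceEntry(NamedTuple):
    reference: List[float]  # pre-stored representation of the test signal
    algorithm: str          # key of the quality measurement algorithm to use


def mean_squared_error(received, reference):
    """Placeholder metric standing in for a perceptual algorithm."""
    return sum((r - s) ** 2 for r, s in zip(received, reference)) / len(reference)


# Hypothetical tables; real entries would hold recorded test sentences
# and perceptual metrics rather than a plain error measure.
ALGORITHMS = {"mse": mean_squared_error}
SENTENCES = {
    1: SentenceEntry(reference=[0.0, 0.5, 1.0, 0.5], algorithm="mse"),
    2: SentenceEntry(reference=[1.0, 0.0, -1.0, 0.0], algorithm="mse"),
}


def measure(sentence_id, received_window):
    """Select reference and algorithm by sentence ID, then score the window."""
    entry = SENTENCES[sentence_id]
    metric = ALGORITHMS[entry.algorithm]
    return metric(received_window, entry.reference)
```

An unknown sentence ID raises `KeyError` here; an analyzer would more likely discard the test and report no measurement.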

12. The method of claim 3, further comprising: receiving a unique language transmitted in a header signal; and

verifying that the unique language has been received before outputting a quality measurement of the received audio test signal.

13. The method of claim 12, wherein receiving the unique language comprises receiving a DTMF signal representing the unique language.

14. The method according to claim 1, further comprising applying a windowing function to at least one of a window of the received audio test signal and a window of the finely synchronized pre-stored representation of the audio test signal before comparing the window portions to generate a quality measurement.

15. The method of claim 14, wherein applying the windowing function comprises preselecting the windowing function.

16. The method of claim 15, wherein applying the windowing function comprises adaptively selecting the windowing function.
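The windowing step in the claims above can be pictured as multiplying each window by a taper before the comparison. The sketch below uses a Hann window as the default and lets the caller pass a different, preselected set of coefficients. The choice of Hann and the names `hann` and `apply_window` are assumptions for illustration; the claims do not name a specific windowing function or a criterion for selecting one adaptively.

```python
import math


def hann(n):
    """Symmetric Hann window coefficients of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(samples, window=None):
    """Multiply `samples` by `window`, defaulting to a Hann window."""
    coeffs = hann(len(samples)) if window is None else window
    return [x * w for x, w in zip(samples, coeffs)]
```

Applying the same taper to the received window and to the stored reference keeps the two comparable and reduces edge effects at the window boundaries.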

17. An audio quality analyzer (AQA) for evaluating a quality test signal comprising an audio test signal received by a device to be tested, the audio quality analyzer being configured to:

roughly synchronize reproduction of a pre-stored representation of the audio test signal with the received audio test signal;

finely synchronize the reproduction of the pre-stored representation of the audio test signal with the received audio test signal; and

compare a window of the received audio test signal with a portion of the finely synchronized reproduction of the pre-stored representation of the audio test signal to output a quality measurement of the received audio test signal.

18. The audio quality analyzer of claim 17, wherein the quality test signal has a sync pulse, and the audio quality analyzer is configured to roughly synchronize the reproduction of the pre-stored representation of the audio test signal with the received audio test signal using the sync pulse.

19. The audio quality analyzer of claim 18, further configured to compare the data of the window portion of the received audio test signal with the window portion of the pre-stored representation of the audio test signal, and

to adjust the alignment of the window portion of the received audio test signal with the window portion of the pre-stored representation of the audio test signal in accordance with the result of the comparison.

20. The audio quality analyzer of claim 19, further configured to receive an out-of-band header signal.

21. The audio quality analyzer of claim 19, further configured to receive an in-band header signal.

22. The audio quality analyzer according to claim 21, further configured to receive a DTMF tone as the header signal and to roughly synchronize reproduction of the pre-stored representation of the audio test signal with a DTMF pulse.

23. The audio quality analyzer of claim 19, wherein the audio test signal is a voice test signal and the pre-stored representation of the audio test signal is a pre-stored representation of the voice test signal.

24. The audio quality analyzer of claim 23, further configured to receive a sentence ID identifying the received voice test signal, and

to select the pre-stored representation of the voice test signal from a plurality of pre-stored representations in accordance with the received sentence ID.

25. The audio quality analyzer of claim 24, further configured to receive a DTMF signal as a sentence ID.

26. The audio quality analyzer of claim 19, further configured to generate the quality measurement in accordance with at least one quality measurement algorithm selected from the group of quality measurements comprising ITU P.861 perceptual speech quality measurement (PSQM), measuring normalizing blocks (MNB), modified bark spectral distortion (MBSD), and perceptual analysis measurement system (PAMS).

27. The audio quality analyzer of claim 26, further configured to receive a sentence ID in a header signal, and

to select a quality measurement algorithm for generating the quality measurement in accordance with the received sentence ID.

28. The audio quality analyzer of claim 19, further configured to receive a unique language transmitted in a header signal, and

to verify that the unique language has been received before outputting a quality measurement of the received audio test signal.

29. The audio quality analyzer of claim 28, further configured to receive a DTMF signal representing a unique language.

30. The audio quality analyzer of claim 19, further configured to apply a windowing function to at least one of the window portion of the received audio test signal and the finely synchronized window portion of the pre-stored representation of the audio test signal before comparing them to generate a quality measurement of the received audio test signal.

31. The audio quality analyzer of claim 30, further configured to apply a preselected windowing function.

32. The audio quality analyzer of claim 31, further configured to adaptively apply a windowing function.