RU2704723C2

RU2704723C2 - Automatic voice verification device

Info

Publication number: RU2704723C2
Application number: RU2018108980A
Authority: RU
Inventors: Андрей Андреевич Катанович; Виктор Леонидович Муравченко
Priority date: 2018-03-12
Filing date: 2018-03-12
Publication date: 2019-10-30
Also published as: RU2018108980A3; RU2018108980A

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to computer engineering for voice recognition. The technical result is achieved by a device for automatic verification of identity by voice that contains a microphone, an ADC, two switches, a unit for calculating speech signal parameters, a unit for comparing speech signal parameters, a verification decision unit, a unit for calculating the microphone standard, a speaker standard storage unit, a microphone standard storage unit, N storage units for standards of microphone equivalents connected in parallel with the microphone standard storage unit, a switch for the input circuits of the microphone standard storage units and a switch for their output circuits, wherein the input of the input-circuit switch is connected to the output of the microphone standard storage unit, and the output of the output-circuit switch is connected to the input of the unit for calculating speech signal parameters.

EFFECT: improved accuracy of voice recognition for identity verification.

1 cl, 3 dwg

Description

The invention relates to systems for establishing or verifying the identity of a speaker. The technical result is an expansion of the functional capabilities of the device.

Various devices for verifying (confirming) identity by voice are known, for example: "Speaker recognition method and device for its implementation", RF Patent No. 2230375 dated 20 May 2012; and "Method and device for automatic verification of identity by voice", RF Patent No. 2399102 dated 10 September 2010.

The closest in technical essence is the latter of these inventions, RF Patent No. 2399102.

A disadvantage of the known device is that its analysis is limited to the acoustic signal of a speaker located in front of the microphone. In practice, however, it is also of interest to verify identity from a speaker's voice that arrives over a wired communication channel or a radio channel, in which case the microphone of the verification device receives the acoustic signal from the loudspeaker of the channel's receiving device.

The expected technical result is an expansion of the capability of the device to automatically confirm identity by voice when the acoustic signal reaches the device's microphone over a communication channel. Here the communication channel is understood to comprise the microphone of the high-frequency signal transmitter, the air or a feeder line, the high-frequency signal receiver, and the receiver's loudspeaker.

The problem is solved by modifying the known device for automatic verification of identity by voice. That device contains a speech signal source (a microphone and an analog-to-digital converter) connected to the input of a first switch, one output of which is connected to the first input of a unit for calculating speech signal parameters, while the other output of the first switch is connected to the input of a unit for calculating the microphone standard, the output of which is connected to the input of a second switch, the first output of which is connected to the first input of a unit for comparing the parameters of the standard and of the input speech signal, the output of which is connected to the input of a unit for deciding on the speaker being recognized, the output of which is the output of the device as a whole. The second output of the second switch is connected to the input of a speaker standard storage unit, the output of which is connected to the second input of a speaker standard selection unit, the first input of which is connected to a unit for entering the speaker's verification attribute, and the output of the speaker standard selection unit is connected to the second input of the unit for comparing the parameters of the standard and of the input speech signal.

The unit for calculating speech signal parameters contains a detector of the beginning/end of the speech signal and a unit for forming the parameters of the input speech signal. The unit for calculating the microphone standard contains a unit for estimating the beginning/end of the speech signal, a unit for calculating the mean value of the power spectral density of the speech signal, and a unit for dividing by the standard of the mean value of the power spectral density of the speech signal.

The unit for calculating the parameters of the input speech signal also contains a unit for calculating the correlation coefficients of the speech signal, connected to a unit for calculating an estimate of the fundamental (pitch) frequency, the first output of which is connected to the first input of the unit for forming the parameters of the input speech signal, and the second output of which is connected to the input of a unit for calculating estimates of the amplitudes of the carrier harmonics, the output of which is connected to the first input of a unit for dividing the amplitudes of the carrier harmonics by the standard of the amplitude-frequency response of the microphone used, the output of which is connected to the second input of the unit for forming the parameters of the input speech signal. The microphone standard storage unit is connected to the second input of the unit for dividing the amplitudes of the carrier harmonics by the standard of the amplitude-frequency response of the microphone used.

The unit for calculating the microphone standard also contains a unit for calculating the correlation coefficients of the speech signal. Its unit for estimating the beginning/end of the speech signal, its unit for calculating the correlation coefficients of the speech signal and its unit for calculating the mean value of the power spectral density of the speech signal are connected in series. The output of the unit for calculating the mean value of the power spectral density is connected to the first input of the unit for dividing by the standard of the mean value of the power spectral density of the speech signal, the second input of the division unit is connected to the output of the unit storing the standard of the mean value of the power spectral density of the speech signal, and the output of the division unit is connected to the input of the microphone standard storage unit. The unit for comparing the parameters of the standard and of the input speech signal is configured to calculate a weighted Euclidean residual between the parameters of the input speech signal and those of the standard.

The claimed device is distinguished in that N storage units for standards of microphone equivalents are additionally connected in parallel with the microphone standard storage unit, together with a switch for the input circuits of the microphone standard storage units and a switch for their output circuits, wherein the input of the input-circuit switch is connected to the output of the microphone standard storage unit, and the output of the output-circuit switch is connected to the input of the unit for calculating speech signal parameters.
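The parameter-calculation chain described above (correlation coefficients, pitch estimate, harmonic amplitudes, division by the microphone standard) can be illustrated with a minimal Python sketch. The patent does not give formulas for these units, so everything here is an assumption made for illustration: the function names, the pitch search range of 60-400 Hz, picking the pitch as the lag with the largest correlation coefficient, and measuring each harmonic amplitude by a single-frequency projection.

```python
import math

def autocorrelation(frame, max_lag):
    """Correlation coefficients r[0..max_lag] of one frame, normalized by its energy."""
    energy = sum(x * x for x in frame) or 1.0
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(max_lag + 1)
    ]

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Assumed pitch rule: the lag in the search range with the largest coefficient."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    r = autocorrelation(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag

def harmonic_amplitudes(frame, sample_rate, pitch, n_harmonics):
    """Amplitude of each pitch harmonic, measured by projecting onto sin/cos."""
    n = len(frame)
    amps = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * pitch / sample_rate
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
        amps.append(2.0 * math.hypot(re, im) / n)
    return amps

def normalize_by_standard(amps, mic_standard):
    """Divide each harmonic amplitude by the microphone standard at that harmonic."""
    return [a / s for a, s in zip(amps, mic_standard)]
```

In this sketch `mic_standard` is assumed to be already sampled at the harmonic frequencies; how the device aligns the stored standard with the harmonics is not stated in the text.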

A block diagram of the device for automatic verification of identity by voice is shown in FIG. 1.

FIG. 2 shows a block diagram of the calculation of speech signal parameters.

FIG. 3 shows a block diagram of the determination of the standard of the amplitude-frequency response (frequency response) of the microphone used.

List of reference numerals.

1 - microphone (M);

2 - analog-to-digital converter (ADC);

3 - switch (lower position: verification or training; upper position: setting of technical parameters);

4 - unit for calculating speech signal parameters (BRPRS);

5 - unit for comparing speech signal parameters (BSPRS);

6 - verification decision unit (BPRV);

7 - microphone standard storage unit (BZEM);

8 - unit for calculating the microphone standard (BVEM);

9 - switch (upper position: verification; lower position: training);

10 - speaker standard storage unit (BZED);

11 - unit for entering the speaker's verification attribute (BVVPD);

12 - speaker standard selection unit (BVED);

13 - unit for estimating the beginning/end of the speech signal (BONORS);

14 - unit for calculating the correlation coefficients of the speech signal (BVKKRS);

15 - unit for calculating an estimate of the fundamental (pitch) frequency (BROChOT);

16 - unit for calculating estimates of the amplitudes of the carrier harmonics (BROANG);

17 - unit for dividing the amplitudes of the carrier harmonics by the standard of the amplitude-frequency response of the microphone used (BDANGM);

18 - unit for forming speech signal parameters (BFPRS);

19 - unit for estimating the beginning/end of the speech signal (BONORS);

20 - unit for calculating the correlation coefficients of the speech signal (BVKKRS);

21 - unit for calculating the mean value of the power spectral density (BVSZSPM);

22 - division unit (BD);

23 - standard of the amplitude-frequency response of the microphone;

24 - standard of the mean value of the power spectral density of the speech signal (ESZSPMRS).

The device operates in two modes: training mode and verification mode.

In training mode, the speech signal of voice passwords spoken by speakers known in advance is fed to the input of microphone 1, or arrives from the output of the communication channel, and passes through ADC 2 and switch 3 to the input of BRPRS 4. Switch 3 puts the device either into verification or training mode (lower position on block 3 in FIG. 1) or into the mode for setting technical parameters (upper position on block 3 in FIG. 1). Individual words are used as voice passwords. From the speech signal of the spoken passwords, BRPRS 4 forms the parameters of the speech standards. Meanwhile, switch 9 connects its input to its second output (the lower one on block 9 in FIG. 1). A separate standard is stored for each utterance of each voice password by each known speaker. The number of speakers known in advance may be any number from one upward. The number of voice passwords used may likewise be any number greater than one.
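The storage rule in training mode (one standard per utterance of each password by each speaker) can be sketched as a small in-memory store. This is only an illustration: the class name `SpeakerStandardStore`, its methods, and the choice to key standards by a (speaker, password) pair are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

class SpeakerStandardStore:
    """Hypothetical stand-in for the speaker standard storage unit."""

    def __init__(self):
        # (speaker_id, password) -> list of parameter vectors, one per utterance
        self._standards = defaultdict(list)

    def enroll(self, speaker_id, password, params):
        """Store a separate standard for this utterance."""
        self._standards[(speaker_id, password)].append(list(params))

    def standards_for(self, speaker_id, password):
        """Return copies of all standards stored for this speaker and password."""
        return [list(s) for s in self._standards.get((speaker_id, password), [])]
```

Repeating `enroll` for the same speaker and password adds a new standard instead of overwriting the earlier one, which mirrors the per-utterance storage described above.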

The stored standards are used for comparison with the input speech signal of the speaker being verified. The standards for verifying the claimed speaker are selected by BVED 12. In verification mode, an unknown speaker uses BVVPD 11 to enter the verification attribute of the speaker whose identity they wish to confirm with their voice password. BVED 12 then selects for comparison the standard of the speaker whose identity the person being verified has claimed.
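The comparison step can be sketched as follows. The text states only that the comparison unit computes a weighted Euclidean residual between the parameters of the input speech signal and those of the standard. The square-root form of the residual, the weights, the use of the closest stored standard, and the acceptance threshold are all assumptions of this sketch.

```python
import math

def weighted_euclidean_residual(params, standard, weights):
    """sqrt(sum_i w_i * (p_i - s_i)^2); the exact form is assumed, not given in the text."""
    return math.sqrt(
        sum(w * (p - s) ** 2 for p, s, w in zip(params, standard, weights))
    )

def verify(params, standards, weights, threshold):
    """Accept the claimed identity if the closest stored standard is within the threshold.

    `standards` holds the standards already selected for the claimed speaker.
    The threshold and the closest-standard rule are hypothetical.
    """
    if not standards:
        return False
    best = min(weighted_euclidean_residual(params, s, weights) for s in standards)
    return best <= threshold
```

For example, `weighted_euclidean_residual([1.0, 2.0], [1.0, 4.0], [1.0, 0.25])` evaluates to `1.0`, because only the second parameter differs (by 2) and its weight is 0.25.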

In this part (carrying out verification of identity by voice), the claimed device coincides fully with the prototype device in the composition of its units and in their operation.

In the prototype, the normalized frequency response of the microphone in use is calculated. By contrast, when a communication channel is used, the acoustic signal reaches the microphone of the claimed device from the loudspeaker of the channel's receiver. In this case, the frequency response of the claimed device's microphone is normalized using a test signal (an acoustic signal with equal harmonic amplitudes from a microphone testing module, for example the Euraudio PRO 600S). The test signal first passes through the communication channel, which alters it according to the channel's actual frequency response, and then through the microphone of the claimed device, which alters it again according to the microphone's frequency response. The resulting signal therefore carries information about the combined frequency response of the particular communication channel and the microphone of the verification device. The claimed device treats this combined frequency response as the characteristic of an equivalent microphone of the verification device and stores it in the memory of BZEM 7. Since several communication channels may be used, it is advisable to have several BZEM units so that speakers' voices can be verified over different channels. With N BZEM units, input and output switches must be added to the verification device to switch between BZEM units when the communication channel changes. When several BZEM units are used, the operation of all units of the claimed device, including the BZEM units, remains fundamentally unchanged and corresponds fully to the operation of the units of the prototype device. The claimed device differs from the prototype only in the increased number of BZEM units and in the switches at their inputs and outputs.
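A minimal sketch of the two ideas in this passage: deriving an equivalent-microphone standard for one channel from the received test signal, and keeping N such standards behind a pair of switches. The names `equivalent_mic_standard` and `MicStandardBank` are hypothetical, and modelling the input and output switches as a single selected index is a simplification assumed here; the division of the measured mean power spectral density by a stored reference follows the division unit described earlier in the text.

```python
def equivalent_mic_standard(measured_psd, reference_psd):
    """Per-frequency ratio of the measured mean PSD of the test signal to the stored
    reference mean PSD. With an equal-amplitude test signal, the ratio reflects the
    combined frequency response of the channel and the device microphone."""
    return [m / r for m, r in zip(measured_psd, reference_psd)]

class MicStandardBank:
    """Hypothetical model of N standard storage units with input/output switches."""

    def __init__(self, n_units):
        self._units = [None] * n_units  # one stored standard per communication channel
        self._selected = 0              # position shared by both switches (assumption)

    def select(self, index):
        """Switch both the input and output circuits to storage unit `index`."""
        if not 0 <= index < len(self._units):
            raise IndexError("no such storage unit")
        self._selected = index

    def store(self, standard):
        """Write a standard into the currently selected unit (input switch path)."""
        self._units[self._selected] = list(standard)

    def current(self):
        """Read the standard of the currently selected unit (output switch path)."""
        return self._units[self._selected]
```

Changing the communication channel then amounts to calling `select` with the index of the unit calibrated for that channel before parameters are calculated.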

Thus, by introducing additional BZEM units, which account for the frequency response of the communication channel when the microphone of the verification device is tested, together with switches for selecting these units, the stated goal is achieved: the capability of the known device for automatic verification of identity by a speaker's voice is extended by increasing the number of channels over which identity can be verified by voice.

Claims

A device for automatic verification of identity by voice, containing a speech signal source (a microphone and an analog-to-digital converter) connected to the input of a first switch, one output of which is connected to the first input of a unit for calculating speech signal parameters, while the other output of the first switch is connected to the input of a unit for calculating the microphone standard, the output of which is connected to the input of a second switch, the first output of which is connected to the first input of a unit for comparing the parameters of the standard and of the input speech signal, the output of which is connected to the input of a unit for deciding on the speaker being recognized, the output of which is the output of the device as a whole, while the second output of the second switch is connected to the input of a speaker standard storage unit, the output of which is connected to the second input of a speaker standard selection unit, the first input of which is connected to a unit for entering the speaker's verification attribute, and the output of the speaker standard selection unit is connected to the second input of the unit for comparing the parameters of the standard and of the input speech signal; wherein the unit for calculating speech signal parameters contains a detector of the beginning/end of the speech signal and a unit for forming the parameters of the input speech signal, and the unit for calculating the microphone standard contains a unit for estimating the beginning/end of the speech signal, a unit for calculating the mean value of the power spectral density of the speech signal, and a unit for dividing by the standard of the mean value of the power spectral density of the speech signal; wherein the unit for calculating the parameters of the input speech signal contains a unit for calculating the correlation coefficients of the speech signal, connected to a unit for calculating an estimate of the fundamental (pitch) frequency, the first output of which is connected to the first input of the unit for forming the parameters of the input speech signal, and the second output of which is connected to the input of a unit for calculating estimates of the amplitudes of the carrier harmonics, the output of which is connected to the first input of a unit for dividing the amplitudes of the carrier harmonics by the standard of the amplitude-frequency response of the microphone used, the output of which is connected to the second input of the unit for forming the parameters of the input speech signal, while the microphone standard storage unit is connected to the second input of the unit for dividing the amplitudes of the carrier harmonics by the standard of the amplitude-frequency response of the microphone used; and the unit for calculating the microphone standard contains a unit for calculating the correlation coefficients of the speech signal, wherein the unit for estimating the beginning/end of the speech signal, the unit for calculating the correlation coefficients of the speech signal and the unit for calculating the mean value of the power spectral density of the speech signal are connected in series, the output of the unit for calculating the mean value of the power spectral density is connected to the first input of the unit for dividing by the standard of the mean value of the power spectral density of the speech signal, the second input of the division unit is connected to the output of the unit storing the standard of the mean value of the power spectral density of the speech signal, and the output of the division unit is connected to the input of the microphone standard storage unit; and the unit for comparing the parameters of the standard and of the input speech signal is configured to calculate a weighted Euclidean residual between the parameters of the input speech signal and those of the standard; characterized in that N storage units for standards of microphone equivalents are additionally connected in parallel with the microphone standard storage unit, together with a switch for the input circuits of the microphone standard storage units and a switch for their output circuits, wherein the input of the input-circuit switch is connected to the output of the microphone standard storage unit, and the output of the output-circuit switch is connected to the input of the unit for calculating speech signal parameters.