RU170249U1

RU170249U1 - DEVICE FOR TEMPERATURE-INVARIANT AUDIO-VISUAL VOICE SOURCE LOCALIZATION

Info

Publication number: RU170249U1
Application number: RU2016135683U
Authority: RU
Inventors: Дмитрий Андреевич Суворов; Роман Алексеевич Жуков; Антон Александрович Евмененко; Дмитрий Олегович Тетерюков
Original assignee: Общество с ограниченной ответственностью ЛЕКСИ (ООО ЛЕКСИ)
Priority date: 2016-09-02
Filing date: 2016-09-02
Publication date: 2017-04-18

Abstract

The utility model relates to measuring equipment, in particular to devices for localizing human speech sources, and can be used in speech recognition or videoconferencing systems, as well as in security or robotic products for monitoring objects or events of interest. The technical result of the claimed solution is improved accuracy in locating human speech sources. The voice source localization device comprises, connected via a common data bus: a microphone array consisting of MEMS microphones; a video capture device rigidly fixed relative to the microphone array; a unit for determining an atmospheric environmental parameter; a memory storing a table relating the speed of sound in air to the values of the atmospheric environmental parameter; and an information processing unit. This result is achieved by adding hardware and software to the device that perform acoustic scanning only over the ranges of azimuth and elevation angles corresponding to the regions where faces are detected, taking into account the actual speed of sound in air, which depends on the atmospheric environmental parameters.

Description

Technical field

The utility model relates to measuring equipment, in particular to devices for localizing human speech sources, and can be used in speech recognition or videoconferencing systems, as well as in security or robotic products for monitoring objects or events of interest.

State of the art

Various devices and systems are known from the prior art that localize human speech sources by means of microphone arrays.

For example, a sound localization system for teleconferencing using self-steering microphone arrays is known from patent No. US 5335011 A, publ. 12.01.1993. In this solution, to determine the direction to sound sources, the area around the installation is divided into zones. Each zone is scanned by a highly directional acoustic beam to check for the presence of sound sources. Such a system is sensitive to reverberation and poorly distinguishes closely spaced sound sources, because the formed directivity pattern has an angular width of several to tens of degrees. In addition, when planar or linear microphone arrays are used, the system cannot distinguish sound sources in front of it from those behind it, i.e. sources located at adjacent angles. When forming the directivity pattern, the system takes the speed of sound in air into account, but it has no sensors to estimate its actual value, which degrades localization quality when the assumed speed differs from the actual one. This solution is the closest analogue.

A method and device for selecting an active speaker using microphone arrays and voice identification are known from application No. US 20090220065 A1, publ. 03.03.2008. The system described in that document determines the directions to sound sources using a microphone array and extracts the speaker's signal. The signal then passes through a speaker identification system, which serves as an additional filter that rejects noise and reverberation. This system likewise has no sensors to estimate the actual speed of sound needed for the acoustic calculations, and it also has trouble distinguishing closely spaced sound sources.

A system that separates sound sources using spatial filtering and phase regularization is known from patent No. US 8583428 B2, publ. 15.06.2010. This system calculates the directions to sound sources from the phase differences of the harmonics of the sound signals arriving at different microphones of the array. The system has trouble distinguishing closely spaced sound sources. Its operation also requires knowledge of the actual speed of sound, which the patent does not estimate.

Essence of the Utility Model

The claimed technical solution addresses the problem of localizing human speech sources by means of audio and video capture.

The technical result of the claimed solution is improved accuracy in locating human speech sources.

This result is achieved by performing acoustic scanning only over the ranges of azimuth and elevation angles corresponding to the regions where faces are detected, taking into account the actual speed of sound in air, which depends on the atmospheric environmental parameters; the acoustic scanning is performed by a microphone array consisting of MEMS microphones.

To achieve this technical result, a voice source localization device was developed that comprises, connected via a common data bus: a microphone array consisting of MEMS microphones; a video capture device rigidly fixed relative to the microphone array; a unit for determining an atmospheric environmental parameter; a memory storing a table relating the speed of sound in air to the values of the atmospheric environmental parameter; and an information processing unit, wherein the information processing unit is configured for:

detecting faces in the video based on a signal received from the video capture device;

determining the ranges of azimuth and elevation angles corresponding to the regions of the detected faces, in order to form the directivity pattern of the microphone array;

determining the speed of sound according to the value of the atmospheric environmental parameter;

acoustically scanning the environment with the microphone array in a set of directions that correspond to the regions of the detected faces and are defined by the formed directivity pattern of the microphone array, taking into account the previously determined speed of sound; and

localizing human speech sources based on the acoustic scanning data.
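The step of determining azimuth and elevation ranges for a detected face can be sketched as follows. This is a minimal illustration, not the device's actual algorithm: it assumes an ideal pinhole camera whose optical axis coincides with the broadside direction of the microphone array, treats azimuth and elevation as separable, and uses hypothetical names (`pixel_to_angles`, `face_box_to_angle_ranges`) and field-of-view values that do not come from the source.

```python
import math


def pixel_to_angles(x, y, width, height, hfov_deg, vfov_deg):
    """Map a pixel to (azimuth, elevation) in degrees under a pinhole model.

    Assumption: the camera axis is the zero direction of the microphone
    array, and the two angles are treated independently (an approximation).
    """
    # Focal lengths in pixels, derived from the assumed fields of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    azimuth = math.degrees(math.atan((x - width / 2) / fx))
    # Image y grows downward, so the sign is flipped for elevation.
    elevation = math.degrees(math.atan((height / 2 - y) / fy))
    return azimuth, elevation


def face_box_to_angle_ranges(box, width, height, hfov_deg, vfov_deg):
    """Convert a face box (left, top, right, bottom) in pixels to angle ranges."""
    left, top, right, bottom = box
    az_min, el_max = pixel_to_angles(left, top, width, height, hfov_deg, vfov_deg)
    az_max, el_min = pixel_to_angles(right, bottom, width, height, hfov_deg, vfov_deg)
    return (az_min, az_max), (el_min, el_max)


# Hypothetical 640x480 frame with a 60 x 45 degree field of view.
az_range, el_range = face_box_to_angle_ranges((400, 180, 480, 280), 640, 480, 60.0, 45.0)
```

Because the camera is rigidly fixed relative to the array, a fixed mapping of this kind is enough; a real device would also need to account for lens distortion and any offset between the camera and the array.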

Brief Description of the Drawings

For a better understanding of the utility model, and to show more clearly how it may be implemented, reference is made below, by way of example only, to the accompanying drawings, in which:

FIG. 1 is a block diagram of the voice source localization device;

FIG. 2 shows the operating algorithm of the voice source localization device.

Utility Model Implementation

FIG. 1 shows a block diagram of the voice source localization device, according to which the device comprises, connected via a common data bus: an information processing unit 1; a microphone array 2 consisting of MEMS microphones; a video capture device 3 rigidly fixed relative to the microphone array; a unit for determining an atmospheric environmental parameter 4; a memory 5; and a communication interface 6.

The operating algorithm of the voice source localization device is described below with reference to the flowchart shown in FIG. 2.

The video signal from the video capture device 3 is continuously fed to the information processing unit 1, which, in accordance with its built-in hardware and software algorithms, detects faces in the video and tracks faces that were detected earlier. Once faces are detected, subsequent acoustic scanning is carried out only over the ranges of azimuth and elevation angles corresponding to the regions of the detected faces. To this end, the information processing unit 1 determines the ranges of azimuth and elevation angles corresponding to the regions of the detected faces and forms the directivity pattern of the microphone array on their basis. To determine the speed of sound used in acoustic scanning, the information processing unit uses information from the unit for determining the atmospheric environmental parameter 4 and the table, stored in the device memory, relating the speed of sound in air to the value of the atmospheric environmental parameter.
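The table lookup described above might look like the following sketch, which takes temperature as the atmospheric parameter. The table contents are an assumption: the source does not give them, so the entries here are generated from the common dry-air approximation c = 331.3 * sqrt(1 + T / 273.15) m/s, and values between entries are linearly interpolated. The names `TEMPS_C`, `SPEEDS_M_S` and `lookup_speed_of_sound` are hypothetical.

```python
import bisect
import math


def dry_air_speed_of_sound(temp_c):
    """Dry-air approximation of the speed of sound in m/s (used to fill the table)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


# Assumed table: one entry every 10 degrees Celsius from -40 to +50.
TEMPS_C = list(range(-40, 51, 10))
SPEEDS_M_S = [dry_air_speed_of_sound(t) for t in TEMPS_C]


def lookup_speed_of_sound(temp_c):
    """Return the tabulated speed of sound, interpolating between entries."""
    # Clamp readings that fall outside the tabulated range.
    if temp_c <= TEMPS_C[0]:
        return SPEEDS_M_S[0]
    if temp_c >= TEMPS_C[-1]:
        return SPEEDS_M_S[-1]
    i = bisect.bisect_right(TEMPS_C, temp_c)
    t0, t1 = TEMPS_C[i - 1], TEMPS_C[i]
    c0, c1 = SPEEDS_M_S[i - 1], SPEEDS_M_S[i]
    return c0 + (c1 - c0) * (temp_c - t0) / (t1 - t0)


speed_at_sensor_reading = lookup_speed_of_sound(17.5)
```

A stored table keeps the run-time cost to one search and one interpolation, which suits the microcontroller-class processing unit the description mentions.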

The atmospheric environmental parameter may be temperature, humidity, atmospheric pressure, or another atmospheric parameter that affects the speed of sound in air. The unit for determining the atmospheric environmental parameter 4 may also determine at least one additional environmental parameter, in which case the information processing unit corrects the speed of sound according to the value of at least one additional atmospheric environmental parameter.

Next, using the microphone array 2, the information processing unit 1 scans the environment with the directivity pattern of the MEMS microphone array, checking the energy of the audio signal in a predetermined set of directions corresponding to the regions of the detected faces, taking into account the previously determined speed of sound. The coordinates (azimuth and elevation angle) of the human speech sources detected during acoustic scanning are processed by the information processing unit using spatio-temporal filtering methods to localize the human speech sources. Accordingly, when the detected faces move, the information processing unit 1 adjusts the ranges of azimuth and elevation angles corresponding to the regions of the detected faces, which in turn adjusts in real time the set of directions in which acoustic scanning is performed according to the algorithm described above.
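The energy check over face-restricted directions can be illustrated with a narrowband delay-and-sum scan. The source does not name a specific beamforming method, so this is a swapped-in textbook technique under stated assumptions: a far-field source, a single analysis frequency, a planar 4 x 4 array with 4 cm spacing, and a coordinate system with x to the right, y up and z along the camera axis. All names and numeric values are hypothetical.

```python
import cmath
import math


def steering_phase(mic_pos, az_deg, el_deg, freq_hz, c):
    """Phase (radians) of a far-field plane wave at one microphone."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    # Unit vector pointing toward the source (assumed axes: x right, y up, z forward).
    u = (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))
    projection = sum(p * ui for p, ui in zip(mic_pos, u))
    return 2.0 * math.pi * freq_hz * projection / c


def steered_power(phasors, mics, az_deg, el_deg, freq_hz, c):
    """Normalized delay-and-sum output power for one look direction."""
    total = sum(
        x * cmath.exp(-1j * steering_phase(p, az_deg, el_deg, freq_hz, c))
        for x, p in zip(phasors, mics)
    )
    return abs(total / len(mics)) ** 2


def scan_face_windows(phasors, mics, freq_hz, c, windows, step_deg=1.0):
    """Scan only the (azimuth, elevation) windows given by detected faces."""
    best = (-1.0, None, None)
    for (az_lo, az_hi), (el_lo, el_hi) in windows:
        n_az = int(round((az_hi - az_lo) / step_deg))
        n_el = int(round((el_hi - el_lo) / step_deg))
        for i in range(n_az + 1):
            for j in range(n_el + 1):
                az, el = az_lo + i * step_deg, el_lo + j * step_deg
                power = steered_power(phasors, mics, az, el, freq_hz, c)
                if power > best[0]:
                    best = (power, az, el)
    return best


# Synthetic check: a 2 kHz source at azimuth 20, elevation 5 degrees, in cold air.
MICS = [(0.04 * ix, 0.04 * iy, 0.0) for ix in range(4) for iy in range(4)]
SPEED, FREQ = 318.0, 2000.0
observed = [cmath.exp(1j * steering_phase(p, 20.0, 5.0, FREQ, SPEED)) for p in MICS]
face_windows = [((10.0, 30.0), (0.0, 10.0))]
peak_power, peak_az, peak_el = scan_face_windows(observed, MICS, FREQ, SPEED, face_windows)
```

Restricting the loops to `face_windows` is what keeps noise sources outside the face regions from ever being evaluated; the speed of sound enters only through `steering_phase`, so the measured value can be swapped in without changing the scan.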

The video capture device 3 may be a video camera, an infrared camera, or another video recording device, and the information processing unit 1 is an industrial controller or a microcontroller-based board.

The unit for determining the atmospheric environmental parameter 4 consists of one or more sensors placed on a single printed circuit board that measure atmospheric environmental parameters such as temperature, humidity, atmospheric pressure, etc.

All components of the claimed device are built as a single structure, for example by placing them on a single printed circuit board or in some other way.

To exchange data with external devices such as a desktop computer, laptop, tablet computer, smartphone, etc., the claimed device additionally contains a communication interface 6.

The main difference between the claimed device and its analogues is the presence of the unit for determining the atmospheric environmental parameter and the use of the video capture device before acoustic scanning. The unit for determining the atmospheric environmental parameter makes it possible to estimate the actual speed of sound in air, which acoustic scanning needs in order to localize human speech sources more accurately, for example outdoors, where the speed of sound varies from 318 m/s to 348 m/s depending on temperature, humidity, atmospheric pressure, and other atmospheric environmental parameters. Because acoustic scanning is confined to the regions of the detected faces, noisy regions have no effect on the scanning result, which also improves the accuracy of localizing human speech sources. In addition, the microphone array of MEMS microphones itself has a positive effect on localization accuracy because, unlike microphones of other types, MEMS microphones have a high signal-to-noise ratio and the maximum range for localizing sound sources.
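The effect of a wrong speed-of-sound assumption can be estimated with a simple two-microphone, far-field model, which is an illustrative assumption and not taken from the source. For a pair with spacing d and a source at angle θ from broadside, the inter-microphone delay is τ = d sin θ / c_real; a system that assumes c_assumed instead recovers sin θ' = c_assumed τ / d = (c_assumed / c_real) sin θ, independent of d. The function name and the chosen values are hypothetical.

```python
import math


def apparent_angle_deg(true_angle_deg, c_real, c_assumed):
    """Angle a two-microphone estimator reports when it assumes the wrong speed."""
    s = (c_assumed / c_real) * math.sin(math.radians(true_angle_deg))
    # Clamp in case the mismatch pushes the sine just outside [-1, 1].
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))


# Source at 45 degrees, actual speed 318 m/s (the low end of the range quoted
# above), while the system assumes 343 m/s (a typical room-temperature value).
error_deg = apparent_angle_deg(45.0, 318.0, 343.0) - 45.0
```

Under this model the bearing error for these values comes out at several degrees, which is comparable to the angular size of a face at conversational distances and shows why a measured speed of sound matters for face-restricted scanning.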

Thus, the claimed solution localizes human speech sources more accurately than the known analogues. In addition, acoustic scanning only in the regions of the detected faces, taking the atmospheric environmental parameters into account, produces less data than the same scanning in all directions, and the use of MEMS microphones means that this data contains less noise. The claimed solution therefore also localizes human speech sources faster: no additional operations are needed to filter out noise sources, and the information processing unit processes the smaller amount of data more quickly when determining the human speech sources according to the algorithm described above.

Claims

1. A device for localizing a voice source, comprising, connected via a common data bus: a microphone array consisting of MEMS microphones; a video capture device rigidly fixed relative to the microphone array; a unit for determining an atmospheric environmental parameter; a memory storing a table relating the speed of sound in air to the values of the atmospheric environmental parameter; and an information processing unit, wherein the information processing unit is configured for:

detecting faces in the video based on a signal received from the video capture device;

determining the ranges of azimuth and elevation angles corresponding to the regions of the detected faces, in order to form the directivity pattern of the microphone array;

determining the speed of sound according to the value of the atmospheric environmental parameter;

acoustically scanning the environment with said microphone array in a set of directions that correspond to the regions of the detected faces and are defined by the formed directivity pattern of the microphone array, taking into account the previously determined speed of sound; and

localizing human speech sources based on the acoustic scanning data.

2. The device according to claim 1, characterized in that the information processing unit is configured to track the detected faces in the video.

3. The device according to claim 1 or 2, characterized in that it contains a communication interface, connected to the common data bus, that provides data exchange with external devices.

4. The device according to claim 1, characterized in that the atmospheric environmental parameter is a parameter of the ambient temperature.

5. The device according to claim 1, characterized in that the atmospheric environmental parameter is a parameter of the ambient humidity.

6. The device according to claim 1, characterized in that the atmospheric environmental parameter is a parameter of the ambient atmospheric pressure.

7. The device according to any one of claims 4-6, characterized in that the unit for determining the atmospheric environmental parameter is configured to determine an additional atmospheric environmental parameter, and the information processing unit is configured to adjust the value of the speed of sound depending on the value of the additional atmospheric environmental parameter.