RU2602667C1

RU2602667C1 - Method for multimedia output

Info

Publication number: RU2602667C1
Application number: RU2015115028/08A
Authority: RU
Inventors: Олег Олегович Басов; Олег Викторович Романюк; Андрей Леонидович Ронжин; Игорь Акрамович Саитов
Priority date: 2015-04-21
Filing date: 2015-04-21
Publication date: 2016-11-20

Abstract

FIELD: information technology.

SUBSTANCE: invention relates to multimedia information and communication systems. Method includes receiving through a network and decoding multiple media streams for a multimedia conference event; calculating the total number of available display frames in a visual composition based on technical constraints associated with the network and viewing constraints associated with a display; determining that the total number of decoded media streams is greater than the total number of available display frames in the visual composition; selecting an active group of decoded media streams from the total number of decoded media streams to be mapped to the available display frames based on estimates of speech and/or physical activity, which characterise most of the conditions that occur when the set of displayed multimedia conference participants changes; and selecting a member of the active group of decoded media streams as an active candidate for replacement, which will be replaced with a member of the inactive group of decoded media streams, based on said estimates of activity.

EFFECT: improved integrity of output information.

1 cl, 3 dwg, 1 tbl

Description

The invention relates to the field of telecommunications, namely to multimedia infocommunication systems, and can be used for the joint output of multimedia content during an interactive (multimedia) conference in real time.

A multimedia conference allows multiple participants (users) to communicate and share different types of multimedia content through the forms (windows) of a Graphical User Interface (GUI). These forms can display video images of the participants, presentation slides, images from a touch panel, text messages exchanged by users, and so on. Geographically dispersed participants can thus exchange information in a virtual conference environment, bringing their interactive communication closer to that of a real conference.

However, displaying all conference participants (their video images) and the content associated with them can be difficult. The problem grows with the number of participants in a multimedia conference, since it becomes more likely that content from inactive participants is displayed or, conversely, that active users are not displayed. This problem is characteristic, in particular, of known methods for displaying active participants (patents US 6628767 B1, US 2005/0078171).

A method is known (patent US 2005/0099492 A1) that resizes the forms corresponding to the video images of multimedia conference participants depending on their level of activity. With a large number of active participants, the corresponding video images are output at a small size that is unacceptable for human visual perception and reduces the reliability of the transmitted information (content). In addition, this analogue estimates the level of activity only from the speech activity of the conference participants.

Methods are known (patents US 6922718 B2, US 2004/0230651, US 2007/0299981) that output the video images (video streams) of multimedia conference participants according to predefined rules. Their common drawbacks are that participant activity is estimated only from audio streams (speech activity), that they are insensitive to activity in other types of multimedia content, and that the chosen output rules depend only weakly on such content, which reduces the integrity of the output information.

The closest in technical essence to the claimed method, and chosen as the prototype, is a method for managing multimedia content (patent RU 2518423), in which: a plurality of multimedia streams for a multimedia conference event is received through a network; the plurality of multimedia streams is decoded; the total number of available display frames in a visual composition is calculated based on at least technical constraints associated with the network and viewing constraints associated with the display; it is determined that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition; an active group of decoded multimedia streams is selected from the total number of decoded multimedia streams for mapping to the available display frames based on speech activity; a first activity score is generated to represent the ratio of speech activity to absence of speech activity, a second activity score to represent the duration of the participant's speech activity, and a third activity score to represent the most recent time of the participant's speech activity; and a member of the active group of decoded multimedia streams is selected as an active replacement candidate, to be replaced by a member of the inactive group of decoded multimedia streams, based on the first, second, or third activity score.

The prototype method replaces the multimedia content (streams) of active virtual conference participants displayed in the visual composition with members of the inactive group based on one of three speech activity scores.

In an organized multimedia conference, the following situations are possible.

1. To join the discussion, a member of the inactive group usually raises a hand. If the conference administrator does not notice the gesture, the participant then asks for permission to join the discussion. In this situation, therefore, replacing a member of the active group requires using the participant's motor (gesture) activity, or motor and speech activity together. The prototype method has no such capability: a conference participant is swapped in only upon speech activity, so part of the participant's speech is lost, which affects the accuracy T^INF of the information the participant communicates.

Accuracy [based on GOST RV 51987-2002. Information technology. Set of standards for automated systems. Typical requirements and quality indicators of the functioning of information systems. General provisions. Moscow: Gosstandart of Russia, 2001] is the property of a multimedia conference system to achieve consistent results of processing and transmitting information, which are needed to obtain reliable output information.

2. When conference participants produce text messages or other content using a touch panel or other input devices, the received multimedia streams contain no speech activity. Under the prototype method, conference participants are then not replaced and the multimedia streams associated with them are not displayed in the visual composition (on the display), so the completeness of the output information is not ensured.

Completeness [based on GOST RV 51987-2002] is the property of the output information to reflect the states of all the required decoded multimedia streams:

where M is the minimum number of decoded multimedia streams needed to hold the conference effectively, and the m-th completeness indicator is defined as:

Based on the foregoing, the drawback of the prototype method is that a member of the active group of decoded multimedia streams cannot be selected as an active replacement candidate based on motor activity, which reduces the integrity of the output information, that is, the completeness and accuracy with which the required decoded multimedia streams are reflected.

Taking expressions (1) and (2) into account, the integrity of the output multimedia information can be defined as

where D_m is the reliability of the estimate of the m-th decoded multimedia stream. Reliability here means the property of information to reflect the decoded multimedia streams with a degree of approximation (accuracy) that allows this information to be used effectively during a multimedia conference.

The objective of the invention is to develop a multimedia output method that increases the integrity of the information (multimedia content) output to a participant in an organized multimedia conference.

In the claimed method, this objective is achieved as follows. In a multimedia output method in which a plurality of multimedia streams for a multimedia conference event is received through a network; the plurality of multimedia streams is decoded; the total number of available display frames in a visual composition is calculated based on at least technical constraints associated with the network and viewing constraints associated with the display; it is determined that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition; and a member of the active group of decoded multimedia streams is selected as an active replacement candidate, to be replaced by a member of the inactive group of decoded multimedia streams, based on a first, second, or third activity score, the following is done in addition: after it is determined that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition, an active group of decoded multimedia streams is selected from the total number of decoded multimedia streams for mapping to the available display frames based on speech and/or motor activity. A first activity score is formed based on speech activity, a second activity score based on motor activity, and a third activity score based on speech and motor activity jointly, and then a member of the active group of decoded multimedia streams is selected as an active replacement candidate.
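
The three activity scores and the choice of a replacement candidate can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `StreamActivity`, the function `pick_replacement`, the normalization of scores to [0, 1], and the weighted sum used for the joint score are all assumptions made for illustration, since the description does not state how the scores are computed or combined.

```python
from dataclasses import dataclass


@dataclass
class StreamActivity:
    """Hypothetical per-stream record; scores are assumed normalized to [0, 1]."""
    stream_id: str
    speech: float  # first score: speech activity
    motion: float  # second score: motor (gesture) activity

    def combined(self, w_speech: float = 0.5) -> float:
        # Third score: speech and motor activity jointly.
        # A weighted sum is an assumption, not taken from the description.
        return w_speech * self.speech + (1.0 - w_speech) * self.motion


def pick_replacement(active, inactive, w_speech=0.5):
    """Return (outgoing_id, incoming_id), or None if no swap is warranted.

    The least active displayed stream is the active replacement candidate;
    the most active hidden stream is the inactive candidate.
    """
    if not active or not inactive:
        return None
    outgoing = min(active, key=lambda s: s.combined(w_speech))
    incoming = max(inactive, key=lambda s: s.combined(w_speech))
    if incoming.combined(w_speech) <= outgoing.combined(w_speech):
        return None
    return outgoing.stream_id, incoming.stream_id


# Usage: participant C is silent but raises a hand (high motor activity).
active = [StreamActivity("A", 0.9, 0.1), StreamActivity("B", 0.1, 0.0)]
inactive = [StreamActivity("C", 0.0, 0.8)]
print(pick_replacement(active, inactive))                # joint score
print(pick_replacement(active, inactive, w_speech=1.0))  # speech only
```

With `w_speech=1.0` the sketch degenerates to a speech-only rule of the kind used by the prototype, and the silent participant who raised a hand is never swapped in.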

The new set of essential features achieves the stated technical result by selecting the active group of decoded multimedia streams from the total number of decoded multimedia streams for mapping to the available display frames based on speech and/or motor activity.

The analysis of the prior art established that there are no analogues characterized by a set of features identical to all the features of the claimed method for managing multimedia content. Therefore, the claimed invention meets the patentability condition of "novelty".

A search for known solutions in this and related fields of technology, carried out to identify features matching those that distinguish the claimed object from the prototype, showed that they do not follow explicitly from the prior art. Nor does the prior art reveal that the effect of the transformations provided for by the essential features of the claimed invention on achieving the stated technical result was known. Therefore, the claimed invention meets the patentability condition of "inventive step".

The claimed invention is illustrated by the following drawings:

- FIG. 1, which shows a flowchart of the sequence of actions implementing the proposed method;

- FIG. 2, which shows one possible arrangement of the available display frames in the visual composition;

- FIG. 3, which shows a multimodal input interface combining speech and gestures.

The claimed method is implemented as follows (FIG. 1).

In block 101, a plurality of multimedia streams for a multimedia conference event is received through the network. In block 102, the plurality of multimedia streams is decoded. Encoding and decoding, transmission and reception of multimedia information, and other supporting procedures are described in detail in, for example, the International Telecommunication Union standard H.323.

Then, in block 103, the total number of available display frames in the visual composition (FIG. 2) is calculated based on at least technical constraints associated with the network and viewing constraints associated with the display. When the total number of decoded multimedia streams does not exceed the total number of display frames (N-3) in the visual composition, all conference participants are displayed in it.

In block 104, it is determined that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition. In this case, a subset of the total number of decoded multimedia streams must be mapped to the available display frames.

When the total number of decoded multimedia streams is not greater than the total number of display frames in the visual composition, the proposed method maps the decoded multimedia streams to the available display frames. In this case the video composition (FIG. 2) may have a sufficient number of available display frames to display all participants in the decoded multimedia streams for the given multimedia conference event.

When the total number of decoded multimedia streams is greater than the total number of display frames in the visual composition, it may be necessary to map a subset of the total number of decoded multimedia streams to the available display frames. In this case the video composition (FIG. 2) may not have a sufficient number of available display frames to display all participants in the decoded multimedia streams for the given multimedia conference event. The particular subset of decoded multimedia streams to be rendered in the available display frames can be selected by forming an active group of decoded multimedia streams.
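
Blocks 103 and 104 can be sketched in Python as follows. The description says only that the frame count depends on network and display constraints; the specific model below (a bandwidth budget divided by a per-stream bitrate, and a grid of minimum-size tiles on the display) and the names `available_frames` and `needs_subset` are assumptions made for illustration.

```python
def available_frames(bandwidth_kbps, per_stream_kbps,
                     display_w, display_h, min_w, min_h):
    """Block 103 (sketch): frames allowed by both network and display.

    Assumed model: the network limits how many streams fit into the
    bandwidth budget, and the display limits how many tiles of at least
    min_w x min_h pixels fit on the screen.
    """
    if per_stream_kbps <= 0 or min_w <= 0 or min_h <= 0:
        raise ValueError("per-stream bitrate and minimum tile size must be positive")
    by_network = bandwidth_kbps // per_stream_kbps
    by_display = (display_w // min_w) * (display_h // min_h)
    return max(0, min(by_network, by_display))


def needs_subset(n_streams, n_frames):
    """Block 104 (sketch): True when only a subset of streams can be shown."""
    return n_streams > n_frames


# Usage: a 4000 kbit/s link, 1000 kbit/s per stream, Full HD display,
# tiles no smaller than 640x360 pixels.
frames = available_frames(4000, 1000, 1920, 1080, 640, 360)
print(frames, needs_subset(6, frames), needs_subset(3, frames))
```

Taking the minimum of the two limits reflects the "at least" wording: either constraint alone can be the one that forces an active group to be formed.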

In block 105, an active group of decoded multimedia streams is selected from the total number of decoded multimedia streams for mapping to the available display frames based on speech and/or motor activity. The active group denotes those decoded multimedia streams that are currently mapped to an available display frame. In contrast, decoded multimedia streams that are not currently mapped to an available display frame are referred to as the inactive group of decoded multimedia streams. Members of the inactive group are generally not visible in the video content of the decoded multimedia content, but they can be heard in its audio content.

At initialization (at the start of a multimedia conference event), the active group of decoded multimedia streams can be selected in many different ways, for example at random or according to a set of selection rules, such as the order in which participants joined the multimedia conference event. In some cases the active group can be chosen using a set of heuristic rules that predict which participants are most likely to take part in the multimedia conference event. For example, certain participants may be designated as "speakers" for the event, while others are designated as "listeners". Since speakers generally talk more than listeners during a multimedia conference event, the participants designated as speakers can be selected into the active group initially. In any case, an active group is initially selected for mapping to the available display frames.
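
One of the initialization rules mentioned above (designated speakers first, then join order) can be sketched in Python. The tuple layout `(stream_id, role, join_index)`, the role labels, and the function name `initial_active_group` are hypothetical and serve only to illustrate the rule.

```python
def initial_active_group(participants, n_frames):
    """Pick the initial active group (sketch).

    participants: list of (stream_id, role, join_index) tuples, where role
    is "speaker" or "listener" (assumed labels). Speakers are preferred;
    ties are broken by the order in which participants joined.
    """
    ordered = sorted(participants, key=lambda p: (p[1] != "speaker", p[2]))
    return [stream_id for stream_id, _, _ in ordered[:n_frames]]


# Usage: four participants, three available display frames.
participants = [
    ("a", "listener", 0),
    ("b", "speaker", 1),
    ("c", "listener", 2),
    ("d", "speaker", 3),
]
print(initial_active_group(participants, 3))
```

Participants left out of the returned list form the initial inactive group and become candidates for later swaps.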

During a multimedia conference event, the active group must be reconfigured periodically to display other conference participants (members of the inactive group).

A member of the inactive group of decoded multimedia streams can be selected as an inactive candidate to replace a member of the active group of decoded multimedia streams based on speech activity. During a multimedia conference event, a participant in the inactive group may well engage in speech activity, for example when a listener asks a speaker a question.

Therefore, in block 106 a first activity score is formed on the basis of speech activity. Such scores may be derived from various characteristics of speech activity that predict, to some extent, when a member of the active group will speak again. For example, the activity score may be formed from:

- the time of the participant's most recent speech activity;

- the ratio of speech activity to the absence of speech activity;

- the number of events in which the participant showed speech activity;

- the duration of the participant's speech activity;

- an integral characteristic of the above.
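As an illustration only, the listed characteristics could be combined into a single score roughly as follows. This is a minimal sketch and not the method of the invention: the SpeechStats record, the weights, the 60 s horizon, and the normalisation of each feature to the range 0..1 are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class SpeechStats:
    """Hypothetical per-participant record (not defined in the source)."""
    last_active_s: float  # time of the most recent speech activity, seconds
    active_s: float       # accumulated duration of speech, seconds
    silent_s: float       # accumulated duration without speech, seconds
    events: int           # number of separate speech events


def speech_score(stats, now_s, weights=(0.4, 0.2, 0.2, 0.2),
                 horizon_s=60.0, event_scale=5.0):
    """Weighted sum of four normalised features; all constants are assumed."""
    # Recency: 1.0 if speaking right now, falling linearly to 0.0 at horizon_s.
    age_s = max(0.0, now_s - stats.last_active_s)
    recency = max(0.0, 1.0 - age_s / horizon_s)
    # Ratio r = speech / silence, squashed as r / (1 + r),
    # which is algebraically active / (active + silent).
    total_s = stats.active_s + stats.silent_s
    ratio = stats.active_s / total_s if total_s > 0 else 0.0
    # Event count and duration, each mapped into the range 0..1.
    count = stats.events / (stats.events + event_scale)
    duration = min(1.0, stats.active_s / horizon_s)
    features = (recency, ratio, count, duration)
    # The weighted sum plays the role of the "integral characteristic".
    return sum(w * f for w, f in zip(weights, features))
```

With these assumed weights, a participant who spoke a moment ago scores higher than one with identical totals whose last speech was 90 s earlier, which matches the intent of ranking by likelihood of speaking again.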

Decoded multimedia streams may be monitored for speech activity using a known device, for example the one described in [Speech activity detector // Utility Model Patent No. 77717 of 27.10.2008]. The speech activity characteristics may be computed on known convergent computing devices, in particular PDSP16112A complex multipliers (Mitel), PDSP16318A complex accumulators (Mitel), and random access memory (RAM) devices. RAM circuits are known and described, for example, in V.N. Veniaminov, O.N. Lebedev, A.I. Miroshnichenko. Microcircuits and Their Application. Moscow: Radio i Svyaz, 1989, p. 146, Fig. 5.2. In particular, the RAM may be implemented on K565-series chips.

A member of the inactive group of decoded multimedia streams may be selected as an inactive candidate to replace a member of the active group of decoded multimedia streams on the basis of motor activity. This situation is typical of the following cases:

- a participant in the inactive group of decoded multimedia streams raises a hand and/or stands up in order to address the speaker, ask a question, or "take the floor";

- a participant in the inactive group of decoded multimedia streams composes text messages or other content using a touch panel or other input devices.

Therefore, in block 107 a second activity score is formed on the basis of motor activity. Such a score may be derived from various characteristics of motor activity (hand movements, body movements, changes in the position of the body and/or its parts) that indicate, to some extent, that a member of the inactive group wishes to take part in the multimedia conference event. For example, the activity score may be formed from:

- the time of the participant's most recent motor activity;

- the ratio of the duration of motor activity to the absence of motor activity;

- the number of events in which the participant showed motor activity;

- the duration of the participant's motor activity.

Decoded multimedia streams may be monitored for motor activity using known algorithms, for example those described in [Aggarwal J.K., Cai Q. Human motion analysis: a review // Comput. Vis. Image Understanding. Vol. 73, 1999. - P. 428-440; Mitra S., Acharya T. Gesture Recognition: a survey // IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews. Vol. 37, No. 3, May 2007]. These algorithms, as well as the computation of the motor activity characteristics, may be implemented on known convergent computing devices, in particular PDSP16112A complex multipliers (Mitel), PDSP16318A complex accumulators (Mitel), and random access memory (RAM) devices. RAM circuits are known and described, for example, in V.N. Veniaminov, O.N. Lebedev, A.I. Miroshnichenko. Microcircuits and Their Application. Moscow: Radio i Svyaz, 1989, p. 146, Fig. 5.2. In particular, the RAM may be implemented on K565-series chips.

A member of the inactive group of decoded multimedia streams may be selected as an inactive candidate to replace a member of the active group of decoded multimedia streams on the basis of speech and motor activity together. In an organized multimedia conference, a participant in the inactive group may raise a hand and ask the speaker for permission to ask a question or "take the floor". This case is an instance of multimodal interaction between a conference participant and the corresponding input devices (technical means).

Therefore, in block 108 a third activity score is formed on the basis of speech and motor activity jointly. A typical process for the joint processing of gestures and speech is shown in Fig. 3.

In the first two processing stages, information arriving over the different channels is processed in parallel and independently. The processed information, in the form of sets of the best hypotheses for each modality (hand movement and speech), is then combined (using frame-oriented or other semantic approaches) into a single representation that takes the situational context into account. An important step here is synchronizing the information from the different channels, because the time offset between speech and gestures that express one multimodal communicative act (an attempt by a member of the inactive group to become a member of the active group) can reach several seconds. During integration, the alternative lexical hypotheses for each modality are sorted by their probability estimates for the final multimodal interpretation. The best hypothesis obtained after the modalities are integrated is passed to the dialogue management subsystem, which provides the link to a specific application.
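The synchronization and ranking steps described above can be sketched as follows. This is an illustrative simplification and not the fusion scheme of Fig. 3: the Hypothesis record, the 3 s pairing window, and the use of a product of probabilities as the joint score are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical single-modality hypothesis (names are illustrative)."""
    label: str    # interpretation, e.g. "request_floor" or "raised_hand"
    prob: float   # probability estimate from the modality's recognizer
    t_s: float    # time at which the event was observed, seconds


def fuse(speech, gesture, max_offset_s=3.0):
    """Pair speech and gesture hypotheses that lie within max_offset_s of
    each other and rank the pairs by the product of their probabilities."""
    pairs = [
        (s.prob * g.prob, s, g)
        for s, g in product(speech, gesture)
        if abs(s.t_s - g.t_s) <= max_offset_s  # tolerate a lag of a few seconds
    ]
    # Highest joint score first; the first entry is the best joint hypothesis.
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs
```

The first element of the returned list stands in for the best hypothesis handed to dialogue management; an empty list means no gesture and utterance were close enough in time to be treated as one communicative act.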

In the case of a multimedia conference, this application forms the activity score from the speech and motor activity characteristics listed above, and also from:

- the desynchronization time between the participant's speech and motor activity;

- the ratio of the duration of motor activity to the duration of speech activity;

- the number of events in which the participant showed speech and motor activity together.
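The three joint characteristics can be computed from per-modality activity intervals, as in the sketch below. The interval representation, the 5 s window for treating a gesture and an utterance as one joint event, and the function name are assumptions made for illustration; the source does not specify how these quantities are measured.

```python
def joint_features(speech, motor, window_s=5.0):
    """speech, motor: lists of (start_s, end_s) activity intervals.

    Returns the smallest speech/gesture onset offset, the ratio of motor to
    speech duration, and the number of gestures accompanied by speech.
    """
    speech_s = sum(end - start for start, end in speech)
    motor_s = sum(end - start for start, end in motor)

    offsets = []
    for m_start, _ in motor:
        # Speech onsets close enough to this gesture to count as one joint act.
        near = [abs(s_start - m_start) for s_start, _ in speech
                if abs(s_start - m_start) <= window_s]
        if near:
            offsets.append(min(near))

    return {
        "desync_s": min(offsets) if offsets else None,
        "motor_to_speech": motor_s / speech_s if speech_s > 0 else None,
        "joint_events": len(offsets),
    }
```

For example, a hand raised from 0 s to 2 s followed by speech from 5 s to 35 s yields a desynchronization of 5 s, a motor-to-speech ratio of 2/30, and one joint event.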

Decoded multimedia streams may be monitored for motor activity using known systems, for example QuickSet, Human-Centric Word Processor, VR Aircraft Maintenance Training System, and Portable Voice Assistant [Oviatt, S.L. Multimodal interfaces. In The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, J. Jacko and A. Sears, Eds. Lawrence Erlbaum Assoc., Mahwah, NJ, chap. 14, 2003. - P. 286-304], or using algorithms such as those discussed in [Karpov, A.A. Audiovisual speech interface for control and notification systems // Izvestiya YuFU. Tekhnicheskie nauki. - Taganrog: TTI YuFU, No. 3 (104), 2010. - P. 218-222].

Next, in block 109, a member of the active group of decoded multimedia streams is selected as the active replacement candidate, to be replaced by a member of the inactive group of decoded multimedia streams, on the basis of the first, second, or third activity score. For example, in the visual composition (Fig. 2) the active candidate may be removed from its available display frame and the inactive candidate mapped to the freed frame. In this way the available display frames can be used to show the more active participants of the multimedia conference. This also reduces the number of transitions between the multimedia content shown in the available display frames.
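One possible selection rule for this step is sketched below. The source states only that the choice is based on the activity scores; picking the lowest-scoring active member and the highest-scoring inactive member, and requiring a minimum score margin before swapping, are assumptions made here to illustrate how unnecessary transitions might be avoided.

```python
def pick_swap(active, inactive, margin=0.0):
    """active, inactive: dicts mapping a participant id to an activity score.

    Returns (active_candidate, inactive_candidate) if a swap is warranted,
    or None if the display should stay unchanged.
    """
    if not active or not inactive:
        return None
    out_id = min(active, key=active.get)     # least active displayed stream
    in_id = max(inactive, key=inactive.get)  # most active hidden stream
    # Assumed hysteresis: swap only if the newcomer is better by more than
    # the margin, so near-ties do not cause the frames to flicker.
    if inactive[in_id] - active[out_id] <= margin:
        return None
    return out_id, in_id


def apply_swap(frames, swap):
    """frames: dict mapping a display frame name to a participant id.
    Returns a new mapping with the inactive candidate in the freed frame."""
    if swap is None:
        return dict(frames)
    out_id, in_id = swap
    return {frame: (in_id if pid == out_id else pid)
            for frame, pid in frames.items()}
```

A larger margin trades responsiveness for fewer transitions between the content shown in the display frames.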

The visual composition (Fig. 2) may contain various display frames (forms) arranged in a particular order for presentation to the participants of an organized multimedia conference. Each display frame is designed to play or display multimedia content from the multimedia streams, such as the video of the presenting participant, presentation slides, images from the touch panel (F₄), and video of the active group members (F₅, …, F_N), as well as basic data about the current event (F₁), an image of the presenting participant, the title screen of the current event, text messages exchanged by the participants (F₂), and an indicator of the duration of the presentation (F₃).

The claimed multimedia output method improves the integrity of the information (multimedia content) output to a participant of an organized multimedia conference. The following experimental studies are presented to demonstrate that the claimed technical result is achieved.

A multimedia conference with M = 16 participants and N = 8 available display frames (5 of them used to display members of the active group) was considered. With M_A = 5 active participants, the following situations were modeled and studied:

1) the sixth participant raised a hand, wishing to speak, and began speaking 5 s later;

2) 3 members of the inactive group signaled (by raising a hand) that they wished to take an active part in the discussion;

3) 7 members of the inactive group signaled (by raising a hand) that they wished to take an active part in the discussion;

4) the sixth participant raised a hand, wishing to speak, then asked permission to join the discussion and:

4a) began speaking without waiting for (or without receiving) the conference administrator's permission;

4b) began speaking after receiving the conference administrator's permission;

5) the sixth participant composed a text message using the touch panel.

The results of evaluating the integrity of the information output to a participant of an organized multimedia conference, for the prototype method and for the claimed method, are presented below (Table 1).

The experimental results in Table 1 show that, compared with the prototype method, the claimed multimedia output method improves the integrity of the information (multimedia content) output to a participant of an organized multimedia conference in all practical cases.

Claims

A multimedia output method, comprising: receiving over a network a plurality of multimedia streams for a multimedia conference event; decoding the plurality of multimedia streams; calculating the total number of available display frames in a visual composition based on at least technical limitations associated with the network and viewing limitations associated with the display; determining that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition; and selecting a member of the active group of decoded multimedia streams as an active replacement candidate, to be replaced by a member of the inactive group of decoded multimedia streams, based on a first, second, or third activity score; characterized in that, after determining that the total number of decoded multimedia streams is greater than the total number of available display frames in the visual composition, the active group of decoded multimedia streams is selected from the total number of decoded multimedia streams for mapping to the available display frames based on speech and/or motor activity; a first activity score is formed based on speech activity; a second activity score is formed based on motor activity; a third activity score is formed based on speech and motor activity jointly; and a member of the active group of decoded multimedia streams is selected as the active replacement candidate.