CN114449205B - Data processing method, terminal device, electronic device and storage medium - Google Patents


Info

Publication number
CN114449205B
CN114449205B
Authority
CN
China
Prior art keywords
scene
conference control
stream
media stream
control scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210362677.4A
Other languages
Chinese (zh)
Other versions
CN114449205A (en)
Inventor
孙俊伟
王克彦
曹亚曦
俞鸣园
费敏健
吕少卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huachuang Video Signal Technology Co Ltd
Original Assignee
Zhejiang Huachuang Video Signal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Huachuang Video Signal Technology Co Ltd
Priority to CN202210362677.4A
Publication of CN114449205A
Application granted
Publication of CN114449205B


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/157Conference systems defining a virtual conference space and using avatars or agents

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a data processing method, a terminal device, an electronic device, and a storage medium. A target conference control scene to be displayed on the terminal device is determined based on at least one piece of conference control scene information included in conference control operation information sent by a first server, together with media stream information corresponding to the at least one conference control scene. Media stream subscription request information is sent to a second server based on the target conference control scene and its corresponding media stream information, so as to acquire the media stream corresponding to the target conference control scene, which is then output through that scene. The media stream includes a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream that the second server determines based on those video and audio streams.

Description

Data processing method, terminal device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, a terminal device, an electronic device, and a storage medium.
Background
With the growing intelligence of terminal devices and the spread of fourth-generation (4G) and fifth-generation (5G) mobile internet, video conference systems cover ever more access scenarios and support ever more types of terminal devices, and the conference picture displayed on a terminal device directly affects the user experience. Moreover, as video conference services become widespread, more conference control scenes are provided, and the combined picture layouts determined by different conference control scenes grow increasingly diverse. How to determine the picture layout displayed on a terminal device therefore needs to be solved.
Disclosure of Invention
The present disclosure provides a data processing method, a terminal device, an electronic device, and a storage medium, to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a data processing method applied to a terminal device, the method including:
determining a target conference control scene displayed on the terminal device based on at least one conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene;
sending media stream subscription request information to a second server based on the media stream information corresponding to the at least one conference control scene and the target conference control scene to acquire a media stream corresponding to the target conference control scene;
outputting a media stream corresponding to the target conference control scene through the target conference control scene;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or the second server determines a virtual stream based on the video stream and the audio stream corresponding to the at least one terminal device.
In the foregoing solution, before determining the target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and the media stream information corresponding to the at least one conference control scene, the method further includes:
confirming the conference control scene coefficient of the at least one conference control scene and/or the media stream coefficient of the media stream corresponding to the at least one conference control scene.
In the foregoing solution, the determining a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene includes:
determining the priority of the at least one conference control scene to be displayed by the terminal device based on at least one conference control scene coefficient and a media stream coefficient corresponding to the at least one conference control scene, both included in the conference control operation information;
sorting the at least one conference control scene to be displayed by the terminal device in descending order of priority to obtain a sorting result;
determining the target conference control scene displayed on the terminal device based on the sorting result;
the target conference control scene comprises at least one conference control scene.
In the foregoing solution, the determining, based on at least one conference control scene coefficient included in the conference control operation information and a media stream coefficient corresponding to the at least one conference control scene, a priority of the at least one conference control scene to be displayed by the terminal device includes:
in response to the media stream corresponding to a first conference control scene in the at least one piece of conference control scene information including the media stream of a second conference control scene, determining the conference control scene coefficient of the first conference control scene to be the conference control scene coefficient of the second conference control scene, and determining the priority of the first conference control scene to be the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene;
or, in response to the media stream corresponding to the first conference control scene not including the media stream of the second conference control scene in the at least one piece of conference control scene information, determining the priority of the first conference control scene to be the conference control scene coefficient of the first conference control scene.
In the foregoing solution, before or after the sending of the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the method further includes:
determining the transmission code rate of the media stream corresponding to each region based on the proportion of each region in the target conference control scene;
sending the transmission code rate of the media stream corresponding to each region to the first server, so that the first server adds the transmission code rate of the media stream corresponding to each region to the virtual stream configuration information.
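The region-proportional bitrate step above can be sketched as follows. This is a hypothetical illustration: the total bitrate budget, region identifiers, and pixel dimensions are assumptions, not values from the disclosure.

```python
# Hypothetical sketch: derive a per-region transmission bitrate from the
# fraction of the target conference control scene that each region occupies.
TOTAL_BITRATE_KBPS = 2048  # assumed overall downlink budget for the scene

def region_bitrates(regions):
    """regions: mapping of region id -> (width, height) inside the scene."""
    total_area = sum(w * h for w, h in regions.values())
    # Each region's bitrate is proportional to its share of the scene area.
    return {
        region_id: round(TOTAL_BITRATE_KBPS * (w * h) / total_area)
        for region_id, (w, h) in regions.items()
    }

# A speaker region taking 3/4 of the canvas and a thumbnail strip taking 1/4.
rates = region_bitrates({"speaker": (1440, 1080), "thumbnails": (480, 1080)})
```

The terminal would then report a mapping like `rates` to the first server, which records it in the virtual stream configuration information.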
In the foregoing solution, before or after the sending of the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the method further includes:
sending the video stream and the audio stream corresponding to the terminal equipment to the second server;
the video stream corresponding to the terminal equipment comprises the video collected by the terminal equipment, and the audio stream corresponding to the terminal equipment comprises the audio collected by the terminal equipment.
According to a second aspect of the present disclosure, there is provided a data processing method applied to a second server, including:
receiving a video stream sent by at least one terminal device;
generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device;
based on the media stream subscription request information sent by the terminal equipment, sending at least one media stream corresponding to the media stream subscription request information to the terminal equipment;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information.
In the foregoing solution, the generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and the video stream sent by the at least one terminal device includes, for each original media stream identifier corresponding to the virtual stream configuration information, performing the following operations:
determining a first video stream sent by the terminal equipment corresponding to the original media stream identifier based on the original media stream identifier included in the virtual stream configuration information;
in response to receiving a first identification frame of the first video stream, replacing the original media stream identifier of the first video stream with the virtual stream identifier included in the virtual stream configuration information;
updating frame metadata of the first video stream;
and determining the video stream whose original media stream identifier has been replaced and whose frame metadata has been updated to be the virtual stream corresponding to the virtual stream configuration information.
In the above solution, the updating the frame metadata of the first video stream includes:
updating a frame sequence number of at least one frame in the first video stream;
and/or updating a frame timestamp of at least one frame in the first video stream.
In the foregoing solution, the generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and the video stream sent by the at least one terminal device further includes:
in response to not receiving the first video stream, generating at least one virtual stream corresponding to the virtual stream configuration information based on a preset frame.
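The splicing steps above — waiting for an identification (key) frame, replacing the stream identifier, rewriting frame metadata, and falling back to a preset frame when no source arrives — can be sketched as below. All class and field names are assumptions for illustration; the disclosure does not prescribe a data model.

```python
# Hypothetical sketch of virtual-stream splicing: once a key ("identification")
# frame of the selected source arrives, frames are relabelled with the virtual
# stream id and their sequence numbers and timestamps are rewritten so the
# spliced output stays continuous without re-encoding; if no qualifying source
# stream is available, a preset placeholder frame is emitted instead.
from dataclasses import dataclass

@dataclass
class Frame:
    stream_id: str
    seq: int
    timestamp: int
    is_key: bool
    payload: bytes

class VirtualStream:
    def __init__(self, virtual_id, frame_interval=40):  # 40 ms ~ 25 fps, assumed
        self.virtual_id = virtual_id
        self.frame_interval = frame_interval
        self.next_seq = 0
        self.next_ts = 0
        self.active = False  # becomes True after the first key frame

    def _emit(self, frame):
        frame.stream_id = self.virtual_id  # replace original media stream id
        frame.seq = self.next_seq          # update frame metadata: sequence number
        frame.timestamp = self.next_ts     # update frame metadata: timestamp
        self.next_seq += 1
        self.next_ts += self.frame_interval
        return frame

    def push(self, frame):
        if not self.active:
            if not frame.is_key:
                return None  # wait for a clean splice point
            self.active = True
        return self._emit(frame)

    def push_preset(self, payload=b"placeholder"):
        # No qualifying source stream: insert a preset frame (e.g. a still image).
        return self._emit(Frame("preset", 0, 0, True, payload))
```

Because only metadata is rewritten, sources produced by different encoders at different times can be concatenated into one continuous stream with no secondary encoding.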
In the foregoing solution, the sending, to the terminal device, at least one media stream corresponding to the media stream subscription request information based on the media stream subscription request information sent by the terminal device includes:
determining a transmission code rate of at least one media stream based on the virtual stream configuration information;
sending a virtual stream corresponding to the virtual stream identifier to the terminal equipment based on the virtual stream identifier in the media stream subscription request information and/or the transmission code rate of the virtual stream corresponding to the virtual stream identifier;
or sending a first video stream corresponding to the original media stream identifier to the terminal device based on the original media stream identifier in the media stream subscription request information and/or the transmission code rate of the video stream corresponding to the original media stream identifier.
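The second server's dispatch of subscribed streams can be sketched as follows; the function and parameter names are assumptions, not API from the disclosure. A virtual stream identifier is served from the generated virtual streams, an original media stream identifier from the per-terminal uplinks, each at the transmission code rate recorded in the virtual stream configuration information.

```python
# Hypothetical sketch of subscription resolution on the second server.
def resolve_subscription(request_ids, virtual_streams, uplink_streams, config):
    """Return (stream, bitrate_kbps) pairs for each subscribed identifier.

    request_ids: stream identifiers from the media stream subscription request.
    virtual_streams: virtual stream id -> stream handle.
    uplink_streams: original media stream id -> stream handle.
    config: stream id -> transmission code rate (kbps); 512 is an assumed default.
    """
    out = []
    for stream_id in request_ids:
        if stream_id in virtual_streams:
            out.append((virtual_streams[stream_id], config.get(stream_id, 512)))
        elif stream_id in uplink_streams:
            out.append((uplink_streams[stream_id], config.get(stream_id, 512)))
        # Unknown identifiers are skipped rather than failing the whole request.
    return out
```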
According to a third aspect of the present disclosure, there is provided a data processing method applied to a first server, including:
sending conference control operation information to at least one terminal device, so that the at least one terminal device determines a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information and media stream information corresponding to the at least one conference control scene;
and sending virtual stream configuration information to a second server so that the second server determines a corresponding virtual stream based on the virtual stream configuration information and the video stream and the audio stream corresponding to at least one terminal device.
In the foregoing solution, before sending the conference control operation information to at least one terminal device, the method further includes:
receiving conference control information sent by a first device;
receiving the transmission code rate of the media stream corresponding to each region in the terminal device, sent by the terminal device;
generating the conference control operation information based on the conference control information;
generating the virtual stream configuration information based on the conference control information and/or the transmission code rate;
wherein the conference control information includes parameter information of the target conference control scene of the conference.
According to a fourth aspect of the present disclosure, there is provided a terminal device including:
the layout unit is used for determining a target conference control scene displayed on the terminal equipment based on at least one conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene;
a first sending unit, configured to send media stream subscription request information to a second server based on media stream information corresponding to the at least one conference control scene and the target conference control scene, so as to obtain a media stream corresponding to the target conference control scene;
the output unit is used for outputting the media stream corresponding to the target conference control scene through the target conference control scene;
The media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or the second server determines a virtual stream based on the video stream and the audio stream corresponding to the at least one terminal device.
According to a fifth aspect of the present disclosure, there is provided a second server comprising:
the first receiving unit is used for receiving a video stream sent by at least one terminal device;
the generating unit is used for generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device;
a second sending unit, configured to send, to a terminal device, at least one media stream corresponding to media stream subscription request information based on the media stream subscription request information sent by the terminal device;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information.
According to a sixth aspect of the present disclosure, there is provided a first server comprising:
a third sending unit, configured to send conference control operation information to at least one terminal device, so that the at least one terminal device determines a target conference control scene displayed on the terminal device based on at least one conference control scene information included in the conference control operation information and media stream information corresponding to the at least one conference control scene; and sending virtual stream configuration information to a second server so that the second server determines a corresponding virtual stream based on the virtual stream configuration information and the video stream and the audio stream corresponding to at least one terminal device.
According to a seventh aspect of the present disclosure, there is provided an electronic apparatus comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the data processing method performed by the terminal device, the data processing method performed by the second server, or the data processing method performed by the first server.
According to an eighth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method performed by the terminal device, the data processing method performed by the second server, or the data processing method performed by the first server.
In the data processing method of the present disclosure, a target conference control scene displayed on the terminal device is determined from at least one piece of conference control scene information included in the conference control operation information sent by the first server and the media stream information corresponding to the at least one conference control scene; media stream subscription request information is sent to the second server based on the media stream information corresponding to the target conference control scene, so as to acquire the media stream corresponding to the target conference control scene; and the media stream is output through the target conference control scene. The media stream includes a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream that the second server determines based on those video and audio streams, so that the conference picture displayed on the terminal device can be determined.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart diagram illustrating an alternative data processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating an alternative data processing method provided by the embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a further alternative data processing method provided by the embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating an alternative configuration of a data processing system provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating a further alternative data processing method provided by the embodiment of the present disclosure;
fig. 6 shows a first schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure;
Fig. 7 shows a second schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure;
fig. 8 illustrates a third schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure;
fig. 9 illustrates a fourth schematic diagram of a display screen of a terminal device according to an embodiment of the disclosure;
fig. 10 shows a fifth schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure;
fig. 11 shows a sixth schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure;
fig. 12 is a schematic diagram illustrating an alternative structure of a terminal device provided by an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating an alternative structure of a second server provided by an embodiment of the present disclosure;
fig. 14 is a schematic diagram illustrating an alternative structure of a first server provided in an embodiment of the present disclosure;
FIG. 15 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and the like are used only to distinguish similar objects and do not denote a particular order; where permissible, the specific order may be interchanged so that the embodiments of the present application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
It should be understood that, in the various embodiments of the present application, the serial numbers of the implementation processes do not imply an execution order; the execution order of the processes should be determined by their functions and inherent logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
A virtual stream, or conference virtual stream, is a single media stream formed, without secondary encoding, by splicing together multiple media streams that meet specific conditions and are generated by different encoders at different times during a conference, with their media frame metadata modified. If no media stream meets the conditions during some period, the system inserts specific media frames: for a video stream, a placeholder picture frame; for an audio stream, a mute frame. When the conference starts, the system allocates a fixed unique identifier to each conference virtual stream.
Many different types of virtual streams can be defined within the same conference. For example: the uplink video stream of the terminal with the currently most active voice can be a voice-activated virtual stream; the uplink video stream of the terminal called on to speak can be a speaking-site virtual stream; the uplink video stream of the terminal set as the focus site can be a focus-site virtual stream; and the presentation video stream of the current conference (a conference has only one presenting terminal) can be a presentation virtual stream.
Similarly, the virtual stream of an audio stream may be determined by analogy with the virtual stream of a video stream.
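The virtual stream types listed above, each receiving a fixed unique identifier at conference start, can be sketched as follows; the type names and id format are illustrative assumptions.

```python
# Hypothetical catalogue of virtual stream types described above, each backed
# by whichever terminal uplink currently meets its condition.
VIRTUAL_STREAM_TYPES = {
    "voice_activated": "uplink of the terminal with the most active speech",
    "speaking_site":   "uplink of the terminal called on to speak",
    "focus_site":      "uplink of the terminal set as the focus site",
    "presentation":    "uplink of the single presenting terminal",
}

def allocate_virtual_ids(conference_id):
    """Assign a fixed unique identifier per virtual stream when the conference starts."""
    return {name: f"{conference_id}-vs-{i}"
            for i, name in enumerate(VIRTUAL_STREAM_TYPES)}
```

Because the identifiers stay fixed for the life of the conference, a terminal can subscribe to "the speaking site" once and keep receiving it as the backing uplink changes.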
A video conference system includes a Multipoint Control Unit (MCU), i.e., a video conference server, and various participating devices such as software and hardware terminals and recording servers. A terminal device captures audio and/or video, encodes it, and sends it to the MCU. According to the conference requirements, the MCU fuses (or does not fuse) the images, audio, or video sent by the terminal devices and sends the mixed (or unmixed) streams to each participating terminal, thereby enabling audio and video conversation among multiple participants. As cloud computing matures, cloud deployment of video conference systems is increasingly common, and the concept of the Selective Forwarding Unit (SFU) is increasingly popular: the conference server does not fuse multiple pictures but only forwards multiple streams, avoiding the computational overhead that media stream processing would impose on the conference server. With the growing intelligence of terminal devices and the development of 4G and 5G mobile internet, video conference systems cover ever more access scenarios and support ever more types of terminals, such as smartphones, personal computers (PCs), and flat-panel televisions. However, terminal devices of different types differ in performance, and a single picture layout per conference does not necessarily satisfy all terminal types at once. Demands for differentiated picture layouts per terminal have therefore arisen.
With the popularization of video conference services, the requirement logic of combining conference control with differentiated picture layouts grows increasingly complex: the definition of picture display priority and the switching between pictures both become more complicated, for example, business logic such as speaker-picture-in-multi-picture layouts and focus-picture-in-multi-picture layouts.
Therefore, in view of the problems in the related art, the present disclosure provides a data processing method to solve at least some or all of the above technical problems.
Fig. 1 shows an alternative flow chart of a data processing method provided by the embodiment of the present disclosure, which will be described according to various steps.
Step S101, determining a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene.
In some embodiments, the terminal device confirms the controlled scene coefficient of the at least one controlled scene in advance, and/or the media stream coefficient of the media stream corresponding to the at least one controlled scene. The controllable scene coefficients of the at least one controllable scene are gradually decreased or increased in a stepwise manner, that is, the minimum value of the difference between the controllable scene coefficients of any two controllable scenes is a fixed value; the media stream coefficient of the media stream corresponding to the at least one controlled scene is increased or decreased (may be increased in a stepwise manner, or may not be increased in a stepwise manner), and the media stream coefficient of the media stream is smaller than the difference between the controlled scene coefficients of any two controlled scenes.
In some embodiments, the terminal device determines the priority of at least one conference control scene to be displayed based on at least one conference control scene coefficient included in the conference control operation information and the media stream coefficient corresponding to the at least one conference control scene; sorts the at least one conference control scene from high priority to low to obtain a sorting result; and determines the target conference control scene displayed on the terminal device based on the sorting result.
In a specific implementation, when, in the at least one piece of conference control scene information, the media stream corresponding to a first conference control scene includes the media stream of a second conference control scene, the terminal device takes the conference control scene coefficient of the second conference control scene in place of that of the first, and determines the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene. Otherwise, when the media stream corresponding to the first conference control scene does not include the media stream of any second conference control scene, the priority of the first conference control scene is simply its own conference control scene coefficient.
In specific implementation, the terminal device determines the target conference control scene displayed on the terminal device according to the priority ranking result of the at least one conference control scene and/or the attribute of the at least one conference control scene.
In some embodiments, the conference control scene may include at least one of a presentation conference site, a focus conference site, a speaking conference site, a polling layout, a broadcast layout, and the like, where the conference control scene coefficients of these scenes decrease in a stepwise manner. The presentation conference site and the focus conference site use a single-picture layout, the speaking conference site uses a two-picture (dual-picture) layout, and the polling layout and the broadcast layout use multi-picture layouts. The conference control operation information may include at least one conference control scene and/or an identifier of the at least one conference control scene, and an identifier of the media stream output in each picture (region) of the at least one conference control scene.
In some embodiments, in response to the conference control operation information including a combination of at least one conference control scene, the priority of the at least one conference control scene is determined according to the at least one conference control scene and its media stream, and the layout corresponding to the conference control scene with the highest priority is determined as the target conference control scene of the terminal device.
In some optional embodiments, after the terminal device determines its target conference control scene, the media stream identifier corresponding to the target conference control scene may also be determined based on the conference control operation information.
For example, the conference control operation information includes a focus conference site and a broadcast layout, where the conference control scene coefficient of the focus conference site is 150 and that of the broadcast layout is 120. The media stream identifier corresponding to the focus conference site is a first media stream identifier; the media stream corresponding to the broadcast layout includes the media stream of the focus conference site together with the media streams of other terminal devices, and its identifier is a second media stream identifier. The media stream corresponding to the focus conference site is collected directly by the first terminal device, that is, it is the video stream and audio stream of the first terminal device, so its media stream coefficient is 0. The broadcast layout must output the media stream of the focus conference site together with the media streams of other terminal devices; that is, the stream in the broadcast layout is a virtual stream that includes the focus conference site, and its media stream coefficient is 2. Therefore the priority of the focus conference site is 150, the priority of the broadcast layout is 150 + 2 = 152, and the terminal device finally displays the broadcast layout, in which the media stream of the focus conference site is output.
For another example, the conference control operation information includes a focus conference site and a broadcast layout, where the conference control scene coefficient of the focus conference site is 150 and that of the broadcast layout is 120. The media stream identifier corresponding to the focus conference site is a first media stream identifier; the media stream corresponding to the broadcast layout includes only the video streams and audio streams of other terminal devices, identified by a second media stream identifier. The media stream corresponding to the focus conference site is collected directly by the first terminal device, so its media stream coefficient is 0; the video streams of the other terminal devices output in the broadcast layout likewise have a media stream coefficient of 0. Therefore the priority of the focus conference site is 150, the priority of the broadcast layout is 120, and the terminal device finally displays the focus conference site.
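The two examples above can be sketched as follows. This is a minimal illustrative model, not an API defined by the disclosure: each scene is described by its scene coefficient, the scene (if any) whose virtual stream it carries, and the media stream coefficient of that virtual stream; all names and values here are taken from the examples or invented for illustration.

```python
# Each entry: scene name -> (scene_coefficient, contained_scene, stream_coefficient).
# A scene whose media stream contains another scene's virtual stream inherits
# that scene's coefficient plus the virtual stream's media stream coefficient.

def priority(scenes, name):
    coeff, contained, stream_coeff = scenes[name]
    if contained is not None:
        return scenes[contained][0] + stream_coeff
    return coeff

def target_scene(scenes):
    # The conference control scene with the highest priority is displayed.
    return max(scenes, key=lambda n: priority(scenes, n))

# First example: the broadcast layout outputs the focus site's virtual stream.
scenes = {"focus": (150, None, 0), "broadcast": (120, "focus", 2)}
assert priority(scenes, "broadcast") == 152
assert target_scene(scenes) == "broadcast"

# Second example: the broadcast layout carries only other terminals' streams.
scenes = {"focus": (150, None, 0), "broadcast": (120, None, 0)}
assert target_scene(scenes) == "focus"
```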
Step S102, sending media stream subscription request information to a second server based on the media stream information corresponding to the at least one conference control scene and the target conference control scene, so as to obtain a media stream corresponding to the target conference control scene.
In some embodiments, after the terminal device determines the target conference control scene, it determines the media stream identifier corresponding to the target conference control scene based on the media stream information corresponding to the target conference control scene among the at least one conference control scene, and sends media stream subscription information to the second server based on that media stream identifier to acquire the media stream corresponding to the target conference control scene. The media stream includes a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream determined by the second server based on the video stream and the audio stream corresponding to the at least one terminal device; the second server may be a media server used to generate the virtual stream.
In some optional embodiments, the conference control operation information includes at least one conference control scenario information, and a media stream identifier of a media stream corresponding to the at least one conference control scenario information; if the media stream corresponding to the conference control scene is a video stream, the media stream identifier may be an original media stream identifier corresponding to the video stream; if the media stream corresponding to the conference control scene is a virtual stream, the media stream identifier may be a virtual stream identifier.
In some optional embodiments, before or after sending the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the terminal device may further determine, based on the proportion of each region (picture) in the target conference control scene, the transmission code rate of the media stream corresponding to each region; send those transmission code rates to the first server; and the first server adds the transmission code rate of the media stream corresponding to each region to the virtual stream configuration information. A region in the target conference control scene is a picture included in that scene; for example, a broadcast layout includes one large picture and several small pictures, and a focus conference site includes a single large picture.
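One plausible way to derive per-region code rates from region proportions is a simple proportional split of a total bitrate budget. The disclosure does not specify the formula; the function and values below are assumptions for illustration only.

```python
def region_bitrates(total_kbps, region_ratios):
    """Split a layout's bitrate budget by each region's area ratio.

    region_ratios maps a region id to its fraction of the layout area;
    larger pictures get proportionally higher transmission code rates.
    """
    return {region: round(total_kbps * ratio)
            for region, ratio in region_ratios.items()}

# A hypothetical broadcast layout with one large picture and three small ones:
rates = region_bitrates(4000, {"a1": 0.7, "a2": 0.1, "a3": 0.1, "a4": 0.1})
# rates == {"a1": 2800, "a2": 400, "a3": 400, "a4": 400}
```

The large picture thus requests a higher code rate than the small pictures, which is the stated goal of sending per-region rates to the first server: obtaining a suitable stream per region while saving transmission bandwidth.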
In further optional embodiments, before or after sending the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the terminal device may further send its own video stream and audio stream and/or its identifier to the second server; the video stream corresponding to the terminal device includes the video collected by the terminal device, and the audio stream includes the audio collected by the terminal device.
And step S103, outputting the media stream corresponding to the target conference control scene through the target conference control scene.
In some embodiments, the terminal device receives the media stream corresponding to the target conference control scene sent by the second server, and outputs the corresponding media stream in at least one region of the target conference control scene.
In a specific implementation, the terminal device determines the media stream identifier to be output in each of at least one region included in the target conference control scene, and outputs the media streams sent by the second server according to those identifiers.
For example, the target conference control scene includes four regions a1, a2, a3 and a4, whose media stream identifiers to be output are x1, x2, x3 and x4 in sequence, and the second server sends the media streams identified x1, x2, x3 and x4. Accordingly, the media stream identified x1 is displayed in region a1, x2 in region a2, x3 in region a3, and x4 in region a4.
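The region-to-stream routing of this example can be sketched as a dictionary lookup. This is an illustrative sketch only; the identifiers and payloads are the hypothetical ones from the example above.

```python
def route_streams(region_to_stream, received):
    """Map each received media stream to the region that subscribed to it."""
    stream_to_region = {sid: region for region, sid in region_to_stream.items()}
    return {stream_to_region[sid]: payload
            for sid, payload in received.items()
            if sid in stream_to_region}

layout = {"a1": "x1", "a2": "x2", "a3": "x3", "a4": "x4"}
frames = {"x1": b"f1", "x2": b"f2", "x3": b"f3", "x4": b"f4"}
assert route_streams(layout, frames)["a3"] == b"f3"
```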
Thus, with the data processing method provided by the embodiment of the present disclosure, the terminal device determines the target conference control scene to display based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and the media stream information corresponding to the at least one conference control scene; sends media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene and the target conference control scene, to acquire the media stream corresponding to the target conference control scene; and outputs that media stream through the target conference control scene. This achieves a differentiated picture layout (a first conference control scene) in which a virtual stream flows into the layout of a combined conference control scene, and improves the diversity, practicability and flexibility of the picture layout of the video conference system. For example, by configuring a speaking conference site in a broadcast layout and subsequently executing the conference control operation that sets the speaking conference site, various picture layout effects containing the speaking conference site's picture can be generated, thereby supporting differentiated picture layouts for combined conference control.
Because virtual streams flow into the layout (conference control scene), pictures switch more smoothly when focus scenes, speaking scenes and the like are operated frequently, and the picture layout itself remains completely unchanged. Different primary and auxiliary priorities can be defined for different terminal types, so that each terminal can select a differentiated picture configuration, which is more flexible. The algorithm is highly general and adaptive, and can suit all kinds of ever-changing, complex combined conference control picture layout requirements.
Fig. 2 shows another alternative flow chart of the data processing method provided in the embodiment of the present disclosure, which will be described according to various steps.
Step S201, receiving a video stream sent by at least one terminal device.
In some embodiments, the second server may be a media server. The second server receives a video stream sent by at least one terminal device, where the video stream may include the audio and/or video captured by the at least one terminal device.
Step S202, generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device.
In some embodiments, the second server determines, based on an original media stream identifier included in the virtual stream configuration information, a first video stream sent by the terminal device corresponding to that identifier; in response to receiving a first identification frame of the first video stream, replaces the original media stream identifier of the first video stream with the virtual stream identifier included in the virtual stream configuration information; updates the frame metadata of the first video stream; and takes the video stream and audio stream whose identifier has been replaced and whose frame metadata has been updated as the virtual stream corresponding to the virtual stream configuration information. The first identification frame may be an I-frame (Intra frame), that is, an intra-coded frame, also called a key frame. The original media stream identifier may be the identifier of a terminal device or of the site where the terminal device is located; in particular, it may be the identifier of the terminal device corresponding to the media stream to be displayed (that is, the identifier of the next media stream). For example, if the output media stream is generated from the video stream and audio stream of terminal device A, the original media stream identifier may be the device identifier of terminal device A.
In a specific implementation, the second server may update the frame sequence number and/or the frame timestamp of at least one frame in the first video stream. Optionally, the second server may insert at least one preset frame into the first video stream and update the frame sequence numbers and/or frame timestamps after the insertion. The preset frame may be a frame from a video stream sent by some terminal device, or a preset image frame.
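The metadata rewrite can be sketched as below: the stream identifier is replaced, frames are renumbered continuously, and timestamps are shifted so the virtual stream stays monotonic across source switches, while the encoded payload is left untouched. The `Frame` structure and all field names here are illustrative assumptions, not a format defined by the disclosure.

```python
from dataclasses import dataclass, replace

@dataclass
class Frame:
    stream_id: str
    seq: int
    timestamp: int
    payload: bytes  # encoded data, never modified by the rewrite

def relabel(frames, virtual_id, seq_base, ts_offset):
    """Rewrite frame metadata when a video stream becomes a virtual stream:
    replace the stream identifier, renumber the frames continuously, and
    shift the timestamps; the encoded payload is passed through unchanged."""
    return [replace(f, stream_id=virtual_id,
                    seq=seq_base + i,
                    timestamp=f.timestamp + ts_offset)
            for i, f in enumerate(frames)]

src = [Frame("camA", 0, 1000, b"I"), Frame("camA", 1, 1040, b"P")]
virt = relabel(src, "virtual-focus", seq_base=500, ts_offset=9000)
assert virt[0].stream_id == "virtual-focus" and virt[0].seq == 500
assert virt[1].timestamp == 10040
assert virt[0].payload == src[0].payload  # encoded data untouched
```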
In some other optional embodiments, the second server may further generate, based on a preset frame, at least one virtual stream corresponding to the virtual stream configuration information in response to not receiving the first video stream.
In some optional embodiments, the second server does not modify the actual encoded data of the video stream.
Step S203, based on the media stream subscription request information sent by the terminal device, sending at least one media stream corresponding to the media stream subscription request information to the terminal device.
In some embodiments, the second server determines at least one media stream based on a media stream identifier included in the media stream subscription request information sent by the terminal device; and sending the at least one media stream to the terminal equipment corresponding to the media stream subscription request information. The media stream identification may be the original media stream identification or the virtual stream identification.
The media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information.
For example, if the media stream corresponding to the media stream subscription request information is a virtual stream, the media stream identifier included in the media stream subscription request information is a virtual stream identifier; and if the media stream corresponding to the media stream subscription request information is a video stream, the media stream identifier included in the media stream subscription request information is an original media stream identifier.
Therefore, with the data processing method provided by the embodiment of the present disclosure, the second server receives the video stream sent by at least one terminal device; generates at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device; and, based on the media stream subscription request information sent by a terminal device, sends at least one corresponding media stream to that terminal device. This achieves a differentiated picture layout (a first conference control scene) in which a virtual stream flows into the layout of a combined conference control scene, and improves the diversity, practicability and flexibility of the picture layout of the video conference system.
Fig. 3 shows a schematic flow chart of yet another alternative of the data processing method provided in the embodiment of the present disclosure, which will be described according to various steps.
Step S301, sending the conference control operation information to at least one terminal device.
In some embodiments, the first server may be a signaling server. The first server receives the conference control information sent by the first device; receives, from each terminal device, the transmission code rate of the media stream corresponding to each region of that terminal device; generates the conference control operation information based on the conference control information; and generates the virtual stream configuration information based on the conference control information and/or the transmission code rates. The conference control information includes parameter information of the target conference control scene of the conference; the first device may be a remote server.
In some embodiments, the first server sends the conference control operation information to the at least one terminal device, so that the at least one terminal device determines the target conference control scene displayed on it based on the at least one piece of conference control scene information included in the conference control operation information and the media stream information corresponding to the at least one conference control scene.
Step S302, sending the virtual stream configuration information to the second server.
In some embodiments, the first server sends virtual stream configuration information to a second server to cause the second server to determine a corresponding virtual stream based on the virtual stream configuration information and a corresponding video stream and audio stream of at least one terminal device.
Thus, with the data processing method provided by the embodiment of the present disclosure, the first server sends the conference control operation information to at least one terminal device, so that the at least one terminal device determines the target conference control scene to display based on the at least one piece of conference control scene information included in the conference control operation information and the media stream information corresponding to the at least one conference control scene; and sends the virtual stream configuration information to the second server, so that the second server determines the corresponding virtual stream based on the virtual stream configuration information and the video stream and audio stream corresponding to at least one terminal device. This achieves a differentiated picture layout (a first conference control scene) in which a virtual stream flows into the layout of a combined conference control scene, and improves the diversity, practicability and flexibility of the picture layout of the video conference system.
Fig. 4 is a schematic diagram illustrating an alternative structure of a data processing system according to an embodiment of the present disclosure, which will be described in terms of various parts.
The data processing system 400 includes a first device 410, a first server 420, a second server 430, and at least one terminal device 440.
In some embodiments, the first device 410 (the conference control server), the first server 420 (the signaling server) and the second server 430 (the media server) may all be the same server, or may be three different servers; or the first device and the first server may be the same server while the second server is a separate server; or the first server and the second server may be the same server while the first device is a separate server; or the first device and the second server may be the same server while the first server is a separate server. When at least two of the first device, the first server and the second server are the same server, that server implements the corresponding combination of the conference control service (the function of the conference control server), the signaling service (the function of the signaling server) and the media service (the function of the media server).
The first device 410 may be a conference control server, configured to provide a conference control interface to a WEB browser or a terminal device (e.g., the terminal device of the conference host), where the conference control interface includes a control interface for setting a presentation conference site, a control interface for setting a focus conference site, a control interface for setting a speaking conference site, a control interface for setting a multi-picture broadcast layout, a control interface for setting a polling layout, and the like. The control interface for setting the multi-picture broadcast layout also provides an interface to select among various picture layouts and a participant list for one-to-one filling of video streams into the layout grids (regions or pictures); notably, virtual streams predefined for the conference, including the speaking conference site, the focus conference site, and so on, are provided alongside for picture filling. The control interface for setting the polling layout is similar: the picture layout is selected first, and then one-to-many filling of video streams into the layout grids is performed; here too, virtual streams predefined for the conference, including the speaking conference site, the focus conference site, and so on, are provided for picture filling. The first device then sends the conference control operation information to the signaling service.
The first server 420 may be a signaling server, configured to receive the conference control operation information of the first device 410, including setting a presentation conference site, setting a focus conference site, setting a speaking conference site, setting a multi-picture broadcast layout, setting a polling layout, and the like; to send, according to the conference control operation information and for each virtual stream involved, virtual stream configuration information to the media service, setting the unique identifier of the next original media stream; and to send a conference control operation list message to the terminal devices that have entered the conference, expressing all currently active conference control types, the unique identifiers of the video streams involved (virtual stream unique identifiers), the picture layout filling information corresponding to those video streams, and so on.
The conference control operation information may be instruction information for changing the conference control scene (for example, instructing a change from a focus conference site to a speaking conference site), or instruction information for changing the content of a conference control scene (for example, the content output by the focus conference site changing from the chairman to the video stream of some terminal device). The unique identifier of the original media stream, that is, the original media stream identifier, may be the identifier of a terminal device or of the site where the terminal device is located.
The second server 430 may be a media server, and is configured to receive media stream subscription request information sent by a terminal device and a video stream (a video stream collected by the terminal device) sent by the terminal device, and send a corresponding media stream to the terminal device according to the media stream subscription request information.
In addition, the second server is further configured to receive the virtual stream configuration information from the first server and generate a virtual stream according to it, which may specifically include: determining the video stream collected by a terminal device according to the original media stream identifier in the virtual stream configuration information, and modifying the frame metadata of that video stream, namely replacing its identifier with the virtual stream identifier, rewriting the frame sequence numbers, rewriting the frame timestamps, and so on, before sending it to the corresponding terminal devices. It should be noted that the actual encoded data of the video stream is not modified in the process of generating the media stream.
When generating a virtual stream, the second server stops the modified output of the old video stream (the video stream collected by the previous terminal device) and starts the modified output of the new video stream (the video stream collected by the current terminal device) when the first I-frame of the new video stream arrives; and when the new video stream is empty, it outputs frame data corresponding to a preset map (that is, preset frames).
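The switch-on-I-frame behavior can be sketched as follows. This is an assumed simplification: frames are modeled as (stream tag, frame type) pairs, and the function only decides which frames the virtual stream emits; real servers would do this incrementally as frames arrive.

```python
def select_frames(old_stream, new_stream):
    """Relay the old stream until the new stream's first I (key) frame,
    then switch to the new stream from that frame onward; when neither
    stream has frames, emit a preset frame (the preset map)."""
    first_i = next((i for i, (_, t) in enumerate(new_stream) if t == "I"), None)
    if first_i is None:
        # No decodable entry point in the new stream yet: keep the old
        # stream, or fall back to the preset map if there is none.
        return old_stream if old_stream else [("preset", "I")]
    return new_stream[first_i:]

old = [("old", "P"), ("old", "P")]
new = [("new", "P"), ("new", "I"), ("new", "P")]
assert select_frames(old, new) == [("new", "I"), ("new", "P")]
assert select_frames([], []) == [("preset", "I")]
```

Switching only at an I-frame matters because inter-coded (P) frames cannot be decoded without the key frame they reference; cutting over mid-GOP would corrupt the picture.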
The terminal device 440 is configured to receive the conference control operation list message (conference control operation information) sent by the first server and invoke a picture layout decision algorithm to select a picture layout (that is, to determine the target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and the media stream information corresponding to the at least one conference control scene); to configure the target conference control scene into the media transmission module included in the terminal device 440, which sends media stream subscription request information, containing the media stream identifiers (virtual stream identifiers and/or original media stream identifiers) filled into the target conference control scene, to the second server; to send, according to the size of each region in the target conference control scene, a media stream code rate requirement setting message to the first server, so that a suitable media stream is obtained and network transmission bandwidth is saved; and to configure the target conference control scene into the media decoding module, which decodes and outputs the multiple media streams according to the target conference control scene.
The terminal device 440 is specifically configured to define a main priority for each individual conference control operation (that is, the conference control scene coefficient of the conference control scene), used to determine the display order of the conference control scenes. The main priority values are separated by a step interval; for example, with a step interval of 10: presentation operation 160, focus operation 150, speaking operation 140, and so on.
It also defines auxiliary priorities (that is, the media stream coefficients of the virtual streams corresponding to different conference control scenes) for combined conference control scenes where a virtual stream flows into a layout; these fine-tune the priorities to resolve the picture display order in complex combined conference control scenes. An auxiliary priority value lies within the step interval of the main priorities; for example, if the step interval is 10, all auxiliary priorities must be less than 10. An example definition: the media stream coefficient of a virtual stream in the polling layout (a virtual stream flowing into the polling layout) is 3, in the broadcast layout is 2, and in the roll call layout is 1. Different terminal types may define different conference control main priorities and combined conference control auxiliary priorities.
Specifically, the terminal device 440 traverses the conference control scene list, determines each conference control scene, and sets the conference control scene coefficient of that scene as its corresponding main priority;
then traverses all video stream lists related to the conference control scene, and if the conference control scene contains a virtual stream, adjusts the priority, specifically:
the priority of a virtual stream flowing into the layout = the main priority of the conference control scene that is the source of the virtual stream + the auxiliary priority of that virtual stream in the present conference control scene;
the priority of the conference control scene = the maximum of the priorities of all virtual streams flowing into that scene's layout;
that is to say, the priority of a virtual stream flowing into a layout is determined by the main priority of the conference control scene from which the virtual stream originates and the auxiliary priority of the virtual stream in the present conference control scene. For example, in a broadcast layout scene, if the virtual stream output by the broadcast layout is the virtual stream of the focus conference site, then the main priority of the source conference control scene is 150 and the auxiliary priority of the virtual stream in the broadcast layout is 2, so the priority of that virtual stream in the broadcast layout is 150 + 2 = 152. Further, if the broadcast layout contains this virtual stream and none of its other streams are virtual streams, the priority of the broadcast layout is 152.
After the priority setting of all the controlled scenes is finished, traversing the controlled operation list again, finding the controlled scene with the maximum priority, and selecting the controlled scene as a target controlled scene; and if the conference control operation list is empty, selecting the voice excitation virtual stream + single-picture layout as a target conference control scene by default.
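The selection rule above can be sketched in a few lines. This is an illustrative model, not the patented implementation: each scene is assumed to carry a primary coefficient and a list of virtual streams flowing into its layout, each tagged with its source scene and a secondary coefficient (all names are hypothetical).

```python
def scene_priority(name, scenes):
    """Priority of a conference control scene: its own primary coefficient,
    or, if virtual streams flow into its layout, the maximum over
    (source scene's primary + this layout's secondary) for those streams."""
    scene = scenes[name]
    candidates = [scene["primary"]]
    for vs in scene.get("virtual_streams", []):
        candidates.append(scenes[vs["source"]]["primary"] + vs["secondary"])
    return max(candidates)

def select_target(scenes):
    # The target conference control scene is the one with the highest priority.
    return max(scenes, key=lambda n: scene_priority(n, scenes))

# Combined scene: the focus site's virtual stream flows into the speaking
# layout, and both focus and speaking flow into the broadcast layout.
scenes = {
    "focus": {"primary": 150, "virtual_streams": []},
    "speaking": {"primary": 140,
                 "virtual_streams": [{"source": "focus", "secondary": 1}]},
    "broadcast": {"primary": 120,
                  "virtual_streams": [{"source": "focus", "secondary": 2},
                                      {"source": "speaking", "secondary": 2}]},
}
```

With these inputs, `select_target(scenes)` picks the broadcast layout with priority max(120, 152, 142) = 152, matching the focus + speaking + broadcast combination worked through in this document.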
Table 1 lists the conference control scene coefficients of conference control scenes together with an optional picture layout for each scene. Table 2 lists the media stream coefficients of the media streams corresponding to conference control scenes.
TABLE 1 Conference control scene coefficients

  Conference control scene     | Primary priority (scene coefficient) | Picture layout
  Presentation conference site | 160 | Full-screen single picture
  Focus conference site        | 150 | Full-screen single picture
  Speaking conference site     | 140 | Two-picture speech layout: chairman or focus on the left, speaker on the right
  Polling layout               | 130 | Multi-picture layout
  Broadcast layout             | 120 | Multi-picture layout
TABLE 2 Media stream coefficients of the media streams corresponding to conference control scenes

  Media stream                           | Secondary priority (media stream coefficient)
  Virtual stream in the polling layout   | 3
  Virtual stream in the broadcast layout | 2
  Virtual stream in the roll-call layout | 1
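The two tables can be written down as constants, and the constraint stated earlier (every secondary priority must stay strictly inside the primary ladder interval of 10) can then be checked mechanically. A sketch; the names and the check itself are illustrative, not from the patent:

```python
PRIMARY = {          # Table 1: primary priorities (scene coefficients)
    "presentation": 160,
    "focus": 150,
    "speaking": 140,
    "polling": 130,
    "broadcast": 120,
}
SECONDARY = {        # Table 2: secondary priorities (media stream coefficients)
    "polling": 3,    # virtual stream in the polling layout
    "broadcast": 2,  # virtual stream in the broadcast layout
    "roll_call": 1,  # virtual stream in the roll-call layout
}
LADDER = 10          # ladder interval between adjacent primary priorities

def coefficients_valid(primary, secondary, ladder):
    """Adjacent primaries differ by at least the ladder interval, and every
    secondary is strictly smaller than that interval, so adding a secondary
    can never promote a scene past the next primary step."""
    steps = sorted(primary.values())
    gaps_ok = all(b - a >= ladder for a, b in zip(steps, steps[1:]))
    return gaps_ok and all(0 < s < ladder for s in secondary.values())
```

`coefficients_valid(PRIMARY, SECONDARY, LADDER)` returns True for the values in Tables 1 and 2.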
Fig. 5 shows a schematic flowchart of yet another alternative of the data processing method provided by an embodiment of the present disclosure, which will be described step by step below.
In step S501, the first server receives the conference control information sent by the first device.
In some embodiments, the first server may be a signaling server. The first server receives the conference control information sent by the first device, and receives, from each terminal device, the transmission code rates of the media streams corresponding to the regions on that terminal device; it generates the conference control operation information based on the conference control information, and generates the virtual stream configuration information based on the conference control information and/or the transmission code rates. The conference control information includes parameter information of the target conference control scene of the conference; the first device may be a remote server.
Step S502, the first server sends the conference control operation information to at least one terminal device.
In some embodiments, the first server sends the conference control operation information to the at least one terminal device, so that each terminal device can determine the target conference control scene to display based on at least one piece of conference control scene information included in the conference control operation information and the media stream information corresponding to the at least one conference control scene.
In step S503, the first server sends the virtual stream configuration information to the second server.
In some embodiments, the first server sends the virtual stream configuration information to a second server, so that the second server can determine the corresponding virtual stream based on the virtual stream configuration information and the video streams and audio streams of at least one terminal device.
The virtual stream configuration information includes at least an original media stream identifier and a virtual stream identifier. The original media stream identifier includes the identifier of a terminal device or the identifier of the site where the terminal device is located; the virtual stream identifier is used to determine the virtual stream to which a terminal device subscribes.
Specifically, when determining a virtual stream, the second server first obtains the corresponding first video stream based on the original media stream identifier, updates the frame metadata of the first video stream, and/or replaces the identifier of the first video stream, changing it from the original media stream identifier to the virtual stream identifier, so that after receiving the virtual stream the terminal device can determine the picture (region) in which to output it based on the virtual stream identifier.
Step S504, the terminal device determines the target conference control scene displayed on the terminal device.
In some embodiments, the terminal device confirms in advance the conference control scene coefficient of the at least one conference control scene and/or the media stream coefficient of the media stream corresponding to the at least one conference control scene. The conference control scene coefficients decrease or increase in a stepwise (ladder) manner, that is, the minimum difference between the coefficients of any two conference control scenes is a fixed value; the media stream coefficients of the corresponding media streams increase or decrease (stepwise or otherwise), and every media stream coefficient is smaller than the difference between the conference control scene coefficients of any two scenes, as shown in Tables 1 and 2.
In some embodiments, the terminal device determines the priority of each conference control scene to be displayed based on the at least one conference control scene coefficient included in the conference control operation information and the media stream coefficient corresponding to the at least one conference control scene; it sorts the conference control scenes by priority from high to low to obtain a ranking, and determines the target conference control scene to display based on that ranking.
In a specific implementation, when, in the at least one piece of conference control scene information, the media stream corresponding to a first conference control scene includes the media stream of a second conference control scene, the terminal device takes the conference control scene coefficient of the second scene as that of the first scene, and determines the priority of the first scene as the sum of the second scene's conference control scene coefficient and the media stream coefficient corresponding to the second scene; conversely, when the media stream corresponding to the first scene does not include the media stream of any second scene, the priority of the first scene is simply its own conference control scene coefficient.
In a specific implementation, the terminal device determines the target conference control scene to display according to the priority ranking of the at least one conference control scene and/or the attributes of the at least one conference control scene.
A specific example of determining, by the terminal device, a target controlled scene displayed on the terminal device based on at least one piece of controlled scene information and media stream information corresponding to the at least one controlled scene will be described in detail later.
Step S505, the terminal device sends media stream subscription request information to the second server.
In some embodiments, after the terminal device determines the target conference control scene, it determines the media stream identifiers corresponding to the target scene based on the media stream information of that scene among the at least one conference control scene, and sends media stream subscription information to the second server based on those identifiers to acquire the media streams of the target conference control scene. The media streams include the video streams and audio streams of at least one terminal device, and/or virtual streams determined by the second server based on those video streams and audio streams; the second server may be a media server used to generate virtual streams.
In some optional embodiments, the conference control operation information includes at least one piece of conference control scene information and the media stream identifiers of the media streams corresponding to that scene information.
In some optional embodiments, before or after sending the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the terminal device may further determine, based on the proportion of each region (picture) in the target conference control scene, the transmission code rate of the media stream corresponding to each region; it sends these per-region code rates to the first server, and the first server adds them to the virtual stream configuration information. A region in the target conference control scene is a picture it contains; for example, a broadcast layout contains one large picture and several small pictures, and a focus conference site contains one large picture.
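The per-region code rate report can be sketched as a proportional split. The source does not specify the formula, so this assumes (as an illustration) that each region's bitrate scales linearly with its share of the picture area and that a total budget is known; all names and numbers are hypothetical:

```python
def region_bitrates(regions, total_kbps):
    """Split a total bitrate budget across layout regions in proportion to
    the fraction of the picture each region occupies; rounding may make the
    parts sum to slightly more or less than the budget."""
    total_area = sum(w * h for _, w, h in regions)
    return {name: round(total_kbps * w * h / total_area)
            for name, w, h in regions}

# Broadcast-style layout: one large picture (2x2 units) and three 1x1 pictures.
layout = [("main", 2, 2), ("s1", 1, 1), ("s2", 1, 1), ("s3", 1, 1)]
rates = region_bitrates(layout, 2048)
```

Here the large picture gets 4/7 of the 2048 kbps budget and each small picture 1/7, so a subscriber can request a high-rate stream only for the dominant region.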
Step S506, the terminal device sends the video stream to the second server.
In some embodiments, the terminal device sends, to the second server, the video stream collected by the terminal device and/or the original media stream identifier corresponding to the video stream.
In some optional embodiments, the original media stream identifier corresponding to a video stream does not change with the content of the video stream; that is, the original media stream identifier of the video stream sent by a terminal device is unique.
It should be noted that there is no sequential relationship between step S505 and step S506, and step S505 may be executed first, and then step S506 may be executed; or, first, step S506 is executed, and then step S505 is executed; or step S505 and step S506 are performed simultaneously.
In step S507, the second server receives a video stream sent by at least one terminal device.
In some embodiments, the second server may be a media server. And the second server receives the video stream sent by at least one terminal device.
In step S508, the second server generates at least one virtual stream.
In some embodiments, the second server may receive media stream subscription information sent by at least one terminal device.
In some embodiments, the second server determines, based on the original media stream identifier included in the virtual stream configuration information, the first video stream sent by the terminal device corresponding to that identifier; in response to receiving a first identification frame of the first video stream, it replaces the original media stream identifier of the first video stream with the virtual stream identifier included in the virtual stream configuration information and updates the frame metadata of the first video stream; the video stream whose identifier has been replaced and whose frame metadata has been updated is then taken as the virtual stream corresponding to the virtual stream configuration information. The first identification frame may be an I-frame. The original media stream identifier may be the identifier of a terminal device or of the site where the terminal device is located, i.e., the identifier of the terminal device whose media stream is to be displayed (the identifier of the next media stream); for example, if the output media stream is generated from the video stream of terminal device A, the original media stream identifier may be the device identifier of terminal device A.
In a specific implementation, the second server may update a frame sequence number of at least one frame in the first video stream; and/or updating a frame timestamp of at least one frame in the first video stream. Optionally, the second server may insert at least one preset frame into the first video stream, and update a frame sequence number of the at least one frame and/or a frame timestamp of the at least one frame after the insertion. The preset frame may be a frame in a video stream sent by at least one terminal device, or may be a preset image frame.
For example, the second server confirms that the original media stream identifier is y1 and the generated virtual stream identifier is y2 based on the virtual stream configuration information. The second server determines a first video stream with an original media stream identifier of y1 from the plurality of video streams sent by the at least one terminal device; in response to the second server receiving an I-frame (first identification frame) of the first video stream, the second server replacing the identification of the first video stream from y1 to y2, updating frame metadata of the first video stream, determining to replace the original media stream identification, and updating the video stream of the frame metadata to a virtual stream corresponding to the virtual stream configuration information.
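The y1-to-y2 rewrite touches only stream identity and frame metadata; the encoded payload is left untouched. A sketch with an assumed per-frame record (the field names and the dict-based frame model are illustrative, not from the patent):

```python
def virtualize(frames, orig_id, virt_id, seq_base=0, ts_offset=0):
    """Relabel the stream 'orig_id' as the virtual stream 'virt_id',
    starting at the first I-frame (the 'first identification frame') so a
    subscriber can decode from the first frame it receives; frame sequence
    numbers are renumbered and timestamps shifted, but the encoded payload
    is copied unchanged."""
    out, started, seq = [], False, seq_base
    for frame in frames:
        if frame["stream_id"] != orig_id:
            continue                      # not the subscribed source stream
        if not started and frame["type"] != "I":
            continue                      # wait for the first I-frame
        started = True
        out.append({**frame,
                    "stream_id": virt_id,  # y1 -> y2 in the example above
                    "seq": seq,
                    "ts": frame["ts"] + ts_offset})
        seq += 1
    return out

frames = [
    {"stream_id": "y1", "type": "P", "ts": 0,  "payload": b"p0"},
    {"stream_id": "y1", "type": "I", "ts": 40, "payload": b"i1"},
    {"stream_id": "y1", "type": "P", "ts": 80, "payload": b"p2"},
]
virtual = virtualize(frames, "y1", "y2", seq_base=100, ts_offset=5)
```

The leading P-frame is dropped because relabeling only begins at the I-frame; the two emitted frames carry the new identifier and fresh metadata while their payload bytes are identical to the source.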
In some other optional embodiments, the second server may further generate, in response to not receiving the first video stream, a corresponding at least one virtual stream based on preset frames. The preset frame may be a preset image frame or a video stream collected by any terminal device.
In some optional embodiments, the second server does not modify the actual encoded data of the video stream.
Step S509, the second server sends the media stream to the terminal device.
In some embodiments, the second server transmits the corresponding media stream to the terminal device based on the media stream identification included in the media stream request information.
In specific implementation, in response to that the media stream identifier is an original media stream identifier, the second server sends a video stream corresponding to the original media stream identifier to the terminal device; or, in response to that the media stream identifier is a virtual stream identifier, the second server sends a virtual stream corresponding to the virtual stream identifier to the terminal device.
Step S510, the terminal device outputs the media streams corresponding to the target conference control scene through the target conference control scene.
In some embodiments, the terminal device receives the media streams of the target conference control scene sent by the second server and outputs each media stream in the corresponding region of the target conference control scene.
In a specific implementation, the terminal device determines the media stream identifier to be output in each region of the target conference control scene, and outputs in each region the media stream sent by the second server that carries that identifier.
For example, the target conference control scene contains four regions a1, a2, a3 and a4 whose media stream identifiers to be output are x1, x2, x3 and x4 in sequence; the second server sends the media streams carrying identifiers x1, x2, x3 and x4, and accordingly the media stream identified x1 is displayed in region a1, the stream identified x2 in region a2, the stream identified x3 in region a3, and the stream identified x4 in region a4; each of these media streams is a video stream or a virtual stream.
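The a1..a4 / x1..x4 routing is essentially a dictionary lookup. A sketch; the data shapes are assumptions for illustration, not part of the source:

```python
def place_streams(region_to_id, received):
    """Map each region of the target scene to the received media stream
    whose identifier matches the one the region expects (None if absent)."""
    by_id = {m["id"]: m for m in received}
    return {region: by_id.get(sid) for region, sid in region_to_id.items()}

layout = {"a1": "x1", "a2": "x2", "a3": "x3", "a4": "x4"}
received = [{"id": f"x{i}", "kind": "video"} for i in (1, 2, 3, 4)]
placed = place_streams(layout, received)
```

Because routing is driven purely by identifiers, the terminal does not need to know whether a given stream is an original video stream or a virtual stream relabeled by the second server.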
Thus, the data processing method provided by the embodiments of the present disclosure can realize differentiated picture-layout effects (a first conference control scene) for virtual streams flowing into the layouts of multiple conference control scenes (combined conference control scenes), improving the diversity, practicality and flexibility of the picture layout of a video conference system. For example, by configuring a broadcast layout for the speaking conference site and subsequently executing the conference control operation that sets the speaking site, various picture-layout effects containing the speaking site's picture can be generated, thereby supporting differentiated picture layouts under combined conference control. Because the logic of virtual streams flowing into layouts is used, pictures switch more smoothly when the focus scene, the speaking scene and the like are operated frequently, while the picture layout itself remains entirely unchanged. Different primary and secondary priorities can be defined for different terminal types, so that differentiated picture configurations can be selected per terminal, which is more flexible. The algorithm is highly general and adaptive, and can suit a wide variety of complex combined conference control picture-layout requirements.
Next, a specific scheme by which the terminal device determines the target conference control scene to display, based on at least one piece of conference control scene information and the media stream information corresponding to the at least one conference control scene, is explained; the conference control scene coefficients and media stream coefficients shown in Tables 1 and 2 are used as examples.
If no conference control operation is received, the target conference control scene of the terminal device may be the voice-activated single-picture layout.
Fig. 6 shows a first schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure.
If the combined conference control type includes a focus conference site and a speaking conference site, and of the two pictures of the speaking site the left picture corresponds to the focus site (i.e., the focus site's virtual stream flows into the speaking layout) while the right picture is the speaking picture, then the priority of the focus site is its conference control scene coefficient, 150, and the priority of the speaking site is the focus site's coefficient plus the media stream coefficient of the virtual stream in the speaking layout, i.e., 150 + 1 = 151. The target conference control scene displayed by the terminal device is therefore the speaking site's picture: the left picture outputs the virtual stream of the focus site, and the right picture outputs the video stream of the terminal device that is speaking (outputting a video stream). As shown in fig. 6, if the speaking site is presented as a roll-call site, the left side is the focus site and the right side is the video stream or virtual stream of the terminal device that is being called (or is speaking).
If the combined conference control type includes a focus conference site and a broadcast layout, and the virtual stream of the focus site does not flow into the broadcast layout (i.e., the broadcast layout contains no virtual stream of the focus site), then the priority of the focus site is its conference control scene coefficient, 150, the priority of the broadcast layout is its conference control scene coefficient, 120, and the target conference control scene finally displayed by the terminal device is the focus single-picture layout.
Fig. 7 shows a second schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure.
If the combined conference control type includes a focus conference site and a broadcast layout, and the focus site's virtual stream flows into the broadcast layout (i.e., the broadcast layout contains a virtual stream of the focus site), then the priority of the focus site is its conference control scene coefficient, 150, the priority of the broadcast layout is the sum of the focus site's coefficient 150 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 152, and the target conference control scene finally displayed by the terminal device is the broadcast layout. As shown in fig. 7, the largest region of the broadcast layout outputs the virtual stream of the focus site, while the other regions output video streams of terminal devices (i.e., yt, lj, and hyx are video streams of terminal devices).
If the combined conference control type includes a speaking conference site and a polling layout, and the speaking site's virtual stream does not flow into the polling layout (i.e., the polling layout contains no virtual stream of the speaking site), then the priority of the speaking site is its conference control scene coefficient, 140, the priority of the polling layout is its conference control scene coefficient, 130, and the target conference control scene finally displayed by the terminal device is the speaking site's picture, i.e., the chairman or focus picture on the left and the speaking picture on the right.
If the combined conference control type includes a speaking conference site, a polling layout and a broadcast layout, and the speaking site's virtual stream flows into both the polling layout and the broadcast layout (i.e., both layouts contain the virtual stream of the speaking site), then the priority of the speaking site is the sum of its conference control scene coefficient 140 and the media stream coefficient 1 of its own virtual stream, i.e., 141; the priority of the polling layout is the sum of the speaking site's coefficient 140 and the media stream coefficient 3 of the virtual stream in the polling layout, i.e., 143; and the priority of the broadcast layout is the sum of the speaking site's coefficient 140 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 142. The target conference control scene finally displayed by the terminal device is the picture of the polling layout.
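The arithmetic of this three-way combination can be replayed directly with the Table 1 and Table 2 coefficients (a sketch; the variable names are illustrative):

```python
SPEAKING = 140                                 # Table 1: speaking site coefficient
SEC = {"polling": 3, "broadcast": 2, "roll_call": 1}   # Table 2 secondaries

priorities = {
    # The speaking layout carries its own virtual stream (secondary 1).
    "speaking": SPEAKING + SEC["roll_call"],   # 141
    # Polling and broadcast layouts receive the speaking site's virtual stream.
    "polling": SPEAKING + SEC["polling"],      # 143
    "broadcast": SPEAKING + SEC["broadcast"],  # 142
}
target = max(priorities, key=priorities.get)   # the polling layout wins
```

Because all three candidates share the same primary (140), the secondary coefficients alone decide the ordering, which is exactly the fine-tuning role the secondaries were introduced for.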
Fig. 8 illustrates a third schematic diagram of a display screen of a terminal device according to an embodiment of the present disclosure.
If the combined conference control type includes a speaking conference site, a focus conference site and a broadcast layout, where the focus site's virtual stream flows into the speaking site and both the focus site's and the speaking site's virtual streams flow into the broadcast layout (the broadcast layout contains the virtual streams of the focus site and the speaking site), then the priority of the focus site is its conference control scene coefficient, 150; the priority of the speaking site is the sum of the focus site's coefficient 150 and the media stream coefficient 1 of the virtual stream in the speaking layout, i.e., 151; and the priority of the broadcast layout is the sum of the focus site's coefficient 150 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 152. The target conference control scene finally displayed by the terminal device is the picture of the broadcast layout. As shown in fig. 8, the picture includes the region corresponding to the virtual stream of the focus site, the region corresponding to the virtual stream of the roll-call site (i.e., the speaking site), and the video streams of the terminal devices yt, thz and lj.
If the combined conference control type includes only a broadcast layout, and the media stream output in the broadcast layout is a voice-activated stream, then the priority of the broadcast layout is its conference control scene coefficient, 120, and the target conference control scene finally displayed by the terminal device is the picture of the broadcast layout.
If the combined conference control type includes a broadcast layout and a presentation conference site, and the virtual stream of the presentation site does not flow into the broadcast layout, then the priority of the broadcast layout is its conference control scene coefficient, 120, the priority of the presentation site is its conference control scene coefficient, 160, and the target conference control scene finally displayed by the terminal device is the picture of the presentation site.
If the combined conference control type includes a broadcast layout and a presentation conference site, and the presentation site's virtual stream flows into the broadcast layout, then the priority of the broadcast layout is the sum of the presentation site's coefficient 160 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 162, the priority of the presentation site is its coefficient, 160, and the target conference control scene finally displayed by the terminal device is the picture of the broadcast layout.
Fig. 9 shows a fourth schematic diagram of a display screen of a terminal device according to an embodiment of the present disclosure.
If the combined conference control type includes a speaking conference site and a broadcast layout, and the speaking site's virtual stream flows into the broadcast layout (i.e., the broadcast layout contains the virtual stream of the speaking site), then the priority of the speaking site is its conference control scene coefficient, 140, the priority of the broadcast layout is the sum of that coefficient 140 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 142, and the target conference control scene finally displayed by the terminal device is the broadcast layout. As shown in fig. 9, the largest region of the broadcast layout outputs the virtual stream of the speaking site, and the other regions output video streams of terminal devices (i.e., lj, hyx, dogenya, and yt2 are video streams of terminal devices).
Fig. 10 shows a fifth schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure.
If the combined conference control type includes a focus conference site and a polling layout, and the focus site's virtual stream flows into the polling layout (i.e., the polling layout contains the virtual stream of the focus site), then the priority of the focus site is its conference control scene coefficient, 150, the priority of the polling layout is the sum of that coefficient 150 and the media stream coefficient 3 of the virtual stream in the polling layout, i.e., 153, and the target conference control scene finally displayed by the terminal device is the polling layout. As shown in fig. 10, the largest region of the polling layout outputs the virtual stream of the focus site, while the other regions output video streams of terminal devices (i.e., han, gen, xyz, zhx, xyz, and wz are video streams of terminal devices); unlike the broadcast layout, a media stream in a small picture (small region) of the polling layout may merge the video streams of several terminal devices, e.g., the two boxes of the second row are each filled with the video streams of three terminal devices.
Fig. 11 shows a sixth schematic diagram of a display screen of a terminal device provided by an embodiment of the present disclosure.
If the combined conference control type includes a presentation conference site and a broadcast layout, and the presentation site's virtual stream flows into the broadcast layout (i.e., the broadcast layout contains the virtual stream of the presentation site), then the priority of the presentation site is its conference control scene coefficient, 160, the priority of the broadcast layout is the sum of that coefficient 160 and the media stream coefficient 2 of the virtual stream in the broadcast layout, i.e., 162, and the target conference control scene finally displayed by the terminal device is the broadcast layout. As shown in fig. 11, the largest region of the broadcast layout outputs the virtual stream of the presentation site, and the other regions output video streams of terminal devices (i.e., Test and Test2 are video streams of terminal devices).
In some embodiments, the method executed by the terminal device (in particular, the picture layout decision method) is also applicable to the picture layout decision made when an MCU server outputs a merged stream during multi-picture merging, and to the picture layout decision made when multi-picture merged recording and broadcasting is performed directly.
Fig. 12 shows an alternative structural diagram of a terminal device provided in an embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the terminal device 440 includes a layout unit 441, a first transmission unit 442, and an output unit 443.
The layout unit 441 is configured to determine the target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and the media stream information corresponding to the at least one conference control scene;
the first sending unit 442 is configured to send media stream subscription request information to the second server based on the target conference control scene and the media stream information corresponding to the at least one conference control scene, so as to acquire the media streams corresponding to the target conference control scene;
the output unit 443 is configured to output a media stream corresponding to the target controlled scene through the target controlled scene;
The media streams include the video streams and audio streams of at least one terminal device, and/or the virtual streams that the second server determines based on those video streams and audio streams.
The layout unit 441 is further configured to, before determining a target controlled scene displayed on the terminal device based on at least one piece of controlled scene information included in the controlled operation information sent by the first server and media stream information corresponding to the at least one controlled scene, confirm a controlled scene coefficient of the at least one controlled scene and/or a media stream coefficient of a media stream corresponding to the at least one controlled scene.
The layout unit 441 is specifically configured to determine, based on at least one event control scene coefficient included in the event control operation information and a media stream coefficient corresponding to the at least one event control scene, a priority of the at least one event control scene to be displayed by the terminal device; sequencing at least one conference control scene to be displayed by the terminal equipment based on the priority from high to low to obtain a sequencing result; and determining a target control scene displayed on the terminal equipment based on the sequencing result.
The layout unit 441 is specifically configured to, in response to the media stream corresponding to a first conference control scene in the at least one piece of conference control scene information including the media stream of a second conference control scene in the at least one piece of conference control scene information, determine that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determine the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not including the media stream of the second conference control scene in the at least one piece of conference control scene information, determine that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene.
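The priority rule above can be sketched in code. The following is a minimal illustration only: the dictionary layout, field names, and sample coefficient values are hypothetical, since the patent does not prescribe concrete data structures.

```python
# Sketch of the conference control scene priority rule. All names and
# coefficient values are illustrative, not taken from the patent.

def scene_priority(scene, scenes_by_id):
    """Priority of a conference control scene.

    If the scene's media stream contains the media stream of a second
    scene, the scene inherits the second scene's coefficient and adds
    that scene's media stream coefficient; otherwise the priority is
    the scene's own coefficient.
    """
    contained = scene.get("contains")  # id of a second scene, or None
    if contained is not None:
        second = scenes_by_id[contained]
        return second["scene_coeff"] + second["stream_coeff"]
    return scene["scene_coeff"]

scenes = [
    {"id": "speaker", "scene_coeff": 3, "stream_coeff": 2, "contains": None},
    {"id": "gallery", "scene_coeff": 1, "stream_coeff": 1, "contains": None},
    {"id": "picture_in_picture", "scene_coeff": 0, "stream_coeff": 0,
     "contains": "speaker"},  # embeds the speaker scene's media stream
]
by_id = {s["id"]: s for s in scenes}

# Sort candidate scenes from high to low priority; the first entry is
# the target conference control scene to display.
ranked = sorted(scenes, key=lambda s: scene_priority(s, by_id), reverse=True)
target = ranked[0]
print([s["id"] for s in ranked])
# ['picture_in_picture', 'speaker', 'gallery']
```

Here the picture-in-picture scene outranks the plain speaker scene because it inherits the speaker scene's coefficient plus its media stream coefficient (3 + 2 = 5).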
The layout unit 441 is further configured to determine, based on the proportion of each region in the target conference control scene, the transmission code rate of the media stream corresponding to each region before or after the media stream subscription request information is sent to the second server based on the media stream information corresponding to the at least one conference control scene; and send the transmission code rate of the media stream corresponding to each region to the first server, so that the first server adds the transmission code rate of the media stream corresponding to each region to the virtual stream configuration information.
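The per-region code-rate determination can be illustrated as a proportional split. This is one plausible reading only: the patent states that the rate is determined based on each region's proportion but fixes no formula, so the linear split and all names below are assumptions.

```python
def region_bitrates(total_kbps, region_ratios):
    """Split a total transmission code rate across layout regions in
    proportion to the fraction of the target conference control scene
    each region occupies. region_ratios maps region name -> area
    fraction (the fractions should sum to 1)."""
    return {name: round(total_kbps * ratio)
            for name, ratio in region_ratios.items()}

# e.g. a main region occupying 3/4 of the layout plus two thumbnails
rates = region_bitrates(2048, {"main": 0.75, "thumb1": 0.125, "thumb2": 0.125})
print(rates)  # {'main': 1536, 'thumb1': 256, 'thumb2': 256}
```

The resulting per-region rates are what the terminal device would report to the first server for inclusion in the virtual stream configuration information.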
The first sending unit 442 is further configured to, before or after sending the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, send a video stream and an audio stream corresponding to the terminal device to the second server; the video stream corresponding to the terminal device comprises video collected by the terminal device, and the audio stream corresponding to the terminal device comprises audio collected by the terminal device.
Fig. 13 shows an alternative structural diagram of the second server provided in the embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the second server 430 includes a first receiving unit 431, a generating unit 432, and a second sending unit 433.
The first receiving unit 431 is configured to receive a video stream sent by at least one terminal device;
the generating unit 432 is configured to generate at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device;
the second sending unit 433 is configured to send, to a terminal device, at least one media stream corresponding to media stream subscription request information based on the media stream subscription request information sent by the terminal device;
The media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information.
The generating unit 432 is specifically configured to execute the following operations for each original media stream identifier corresponding to the virtual stream configuration information:
determining a first video stream sent by the terminal device corresponding to the original media stream identifier based on the original media stream identifier included in the virtual stream configuration information;
in response to receiving a first identification frame of the first video stream, replacing an original media stream identification of the first video stream with a virtual stream identification included in the virtual stream configuration information;
updating frame metadata of the first video stream;
and determining the video stream with the replaced original media stream identifier and the updated frame metadata as a virtual stream corresponding to the virtual stream configuration information.
The generating unit 432 is specifically configured to update a frame sequence number of at least one frame in the first video stream; and/or updating a frame timestamp of at least one frame in the first video stream.
The generating unit 432 is further configured to generate at least one virtual stream corresponding to the virtual stream configuration information based on a preset frame in response to that the first video stream is not received.
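The virtual stream generation steps above (replace the original media stream identifier with the virtual stream identifier, update the frame metadata such as sequence numbers and timestamps, and fall back to a preset frame when no video stream is received) can be sketched as follows. The `Frame` structure and every parameter name are hypothetical; the patent does not specify a frame format.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    stream_id: str   # media stream identifier carried by the frame
    seq: int         # frame sequence number
    ts: int          # frame timestamp

def make_virtual_stream(frames, original_id, virtual_id,
                        seq_base=0, ts_offset=0, preset_frame=None):
    """Rewrite frames of the original video stream into a virtual
    stream: replace the media stream identifier with the virtual
    stream identifier and update the frame metadata (sequence number
    and timestamp). If no frames of the first video stream were
    received, fall back to a preset frame so the virtual stream still
    exists."""
    if not frames:
        # Fallback: generate the virtual stream from a preset frame.
        return [replace(preset_frame, stream_id=virtual_id)] if preset_frame else []
    out = []
    for i, f in enumerate(frames):
        if f.stream_id != original_id:
            continue  # frame belongs to a different source stream
        out.append(Frame(stream_id=virtual_id,
                         seq=seq_base + i,
                         ts=f.ts + ts_offset))
    return out

src = [Frame("cam-1", 100, 9000), Frame("cam-1", 101, 9040)]
virt = make_virtual_stream(src, "cam-1", "virt-7", seq_base=0, ts_offset=500)
print([(f.stream_id, f.seq, f.ts) for f in virt])
# [('virt-7', 0, 9500), ('virt-7', 1, 9540)]
```

Renumbering the sequence and shifting the timestamp is what lets the subscribing terminal treat the virtual stream as an ordinary, self-consistent media stream.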
The second sending unit 433 is specifically configured to determine a transmission code rate of at least one media stream based on the virtual stream configuration information; sending a virtual stream corresponding to the virtual stream identifier to the terminal equipment based on the virtual stream identifier in the media stream subscription request information and/or the transmission code rate of the virtual stream corresponding to the virtual stream identifier; or sending a first video stream corresponding to the original media stream identifier to the terminal device based on the original media stream identifier in the media stream subscription request information and/or the transmission code rate of the video stream corresponding to the original media stream identifier.
Fig. 14 shows an alternative structural diagram of the first server provided in the embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the first server 420 comprises a third sending unit 421.
The third sending unit 421 is configured to send the conference control operation information to at least one terminal device, so that the at least one terminal device determines a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information and media stream information corresponding to the at least one conference control scene; and sending virtual stream configuration information to a second server so that the second server determines a corresponding virtual stream based on the virtual stream configuration information and the video stream and the audio stream corresponding to at least one terminal device.
In some embodiments, the first server 420 comprises a processing unit 422.
The processing unit 422 is configured to receive conference control information sent by a first device; receive the transmission code rate, sent by the terminal device, of the media stream corresponding to each region in the terminal device; generate the conference control operation information based on the conference control information; and generate the virtual stream configuration information based on the conference control information and/or the transmission code rate; wherein the conference control information comprises parameter information of the target conference control scene of the conference.
The present disclosure also provides an electronic device and a readable storage medium according to an embodiment of the present disclosure.
FIG. 15 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 15, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data necessary for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A data processing method, applied to a terminal device, the method comprising:
determining a target conference control scene displayed on the terminal device based on at least one conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene;
sending media stream subscription request information to a second server based on the media stream information corresponding to the at least one conference control scene and the target conference control scene to acquire a media stream corresponding to the target conference control scene;
outputting a media stream corresponding to the target conference control scene through the target conference control scene;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream determined by the second server based on the video stream and the audio stream corresponding to the at least one terminal device;
the determining a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene includes:
in response to the media stream corresponding to a first conference control scene in the at least one piece of conference control scene information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determining that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene; and determining the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene;
or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determining that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene;
sorting the at least one conference control scene to be displayed by the terminal device by priority from high to low to obtain a sorting result; and determining the target conference control scene displayed on the terminal device based on the sorting result.
2. The method according to claim 1, wherein before the determining a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene, the method further comprises:
determining a conference control scene coefficient of the at least one conference control scene and/or a media stream coefficient of the media stream corresponding to the at least one conference control scene.
3. The method according to claim 1, wherein before or after sending the media stream subscription request information to the second server based on the media stream information corresponding to the at least one conference control scene, the method further comprises:
sending the video stream and the audio stream corresponding to the terminal device to the second server;
wherein the video stream corresponding to the terminal device comprises video collected by the terminal device, and the audio stream corresponding to the terminal device comprises audio collected by the terminal device.
4. A data processing method applied to a second server, the method comprising:
receiving a video stream sent by at least one terminal device;
generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or the video stream sent by the at least one terminal device;
sending at least one media stream corresponding to the media stream subscription request information to a terminal device based on the media stream subscription request information sent by the terminal device;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information; the media stream subscription request information is determined based on a target conference control scene, and the target conference control scene is determined by the priority of at least one conference control scene to be displayed by the terminal device, comprising:
in response to the media stream corresponding to a first conference control scene in at least one piece of conference control scene information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determining that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determining the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determining that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene.
5. The method according to claim 4, wherein the generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and the video stream sent by the at least one terminal device comprises, for each original media stream identifier corresponding to the virtual stream configuration information:
determining a first video stream sent by a terminal device corresponding to the original media stream identifier based on the original media stream identifier included in the virtual stream configuration information;
in response to receiving a first identification frame of the first video stream, replacing an original media stream identification of the first video stream with a virtual stream identification included in the virtual stream configuration information;
updating frame metadata of the first video stream;
and determining the video stream with the replaced original media stream identifier and the updated frame metadata as a virtual stream corresponding to the virtual stream configuration information.
6. The method of claim 5, wherein the updating the frame metadata of the first video stream comprises:
updating a frame sequence number of at least one frame in the first video stream;
and/or updating a frame timestamp of at least one frame in the first video stream.
7. The method according to claim 5, wherein the generating at least one virtual stream corresponding to the virtual stream configuration information based on the virtual stream configuration information sent by the first server and the video stream sent by the at least one terminal device further comprises:
and in response to not receiving the first video stream, generating at least one virtual stream corresponding to the virtual stream configuration information based on a preset frame.
8. A data processing method applied to a first server, the method comprising:
sending conference control operation information to at least one terminal device, so that the at least one terminal device, in response to the media stream corresponding to a first conference control scene in at least one piece of conference control scene information included in the conference control operation information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determines that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determines the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determines that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene; and determines a target conference control scene displayed on the terminal device based on the priority of the at least one conference control scene;
and sending virtual stream configuration information to a second server so that the second server determines a corresponding virtual stream based on the virtual stream configuration information and the video stream and the audio stream corresponding to at least one terminal device.
9. The method of claim 8, wherein before sending the conference control operation information to the at least one terminal device, the method further comprises:
receiving conference control information sent by a first device;
generating the conference control operation information based on the conference control information;
generating the virtual stream configuration information based on the conference control information;
wherein the conference control information comprises parameter information of the target conference control scene of the conference.
10. A terminal device, wherein the terminal device comprises:
a layout unit, configured to determine a target conference control scene displayed on the terminal device based on at least one piece of conference control scene information included in the conference control operation information sent by the first server and media stream information corresponding to the at least one conference control scene;
a first sending unit, configured to send media stream subscription request information to a second server based on media stream information corresponding to the at least one conference control scene and the target conference control scene, so as to obtain a media stream corresponding to the target conference control scene;
an output unit, configured to output the media stream corresponding to the target conference control scene through the target conference control scene;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream determined by the second server based on the video stream and the audio stream corresponding to the at least one terminal device;
the layout unit is specifically configured to, in response to the media stream corresponding to a first conference control scene in the at least one piece of conference control scene information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determine that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determine the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determine that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene; sort at least one conference control scene to be displayed by the terminal device by priority from high to low to obtain a sorting result; and determine the target conference control scene displayed on the terminal device based on the sorting result.
11. A second server, wherein the second server comprises:
a first receiving unit, configured to receive a video stream sent by at least one terminal device;
a generating unit, configured to generate at least one virtual stream corresponding to virtual stream configuration information based on the virtual stream configuration information sent by the first server and/or a video stream sent by the at least one terminal device;
a second sending unit, configured to send, to a terminal device, at least one media stream corresponding to media stream subscription request information based on the media stream subscription request information sent by the terminal device;
the media stream comprises a video stream and an audio stream corresponding to at least one terminal device, and/or a virtual stream corresponding to the virtual stream configuration information; the media stream subscription request information is determined based on a target conference control scene, and the target conference control scene is determined by the priority of at least one conference control scene to be displayed by the terminal device, comprising:
in response to the media stream corresponding to a first conference control scene in at least one piece of conference control scene information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determining that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determining the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determining that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene.
12. A first server, wherein the first server comprises:
a third sending unit, configured to send conference control operation information to at least one terminal device, so that the at least one terminal device, in response to the media stream corresponding to a first conference control scene in at least one piece of conference control scene information included in the conference control operation information comprising the media stream of a second conference control scene in the at least one piece of conference control scene information, determines that the conference control scene coefficient of the first conference control scene is the conference control scene coefficient of the second conference control scene, and determines the priority of the first conference control scene as the sum of the conference control scene coefficient of the second conference control scene and the media stream coefficient corresponding to the second conference control scene; or, in response to the media stream corresponding to the first conference control scene not comprising the media stream of the second conference control scene in the at least one piece of conference control scene information, determines that the priority of the first conference control scene is the conference control scene coefficient of the first conference control scene; and determines a target conference control scene displayed on the terminal device based on the priority of the at least one conference control scene; and send virtual stream configuration information to a second server, so that the second server determines a corresponding virtual stream based on the virtual stream configuration information and the video stream and the audio stream corresponding to at least one terminal device.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3;
or, performing the method of any one of claims 4-7;
or, performing the method of any one of claims 8-9.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-3;
or, performing the method of any one of claims 4-7;
or, performing the method of any one of claims 8-9.
CN202210362677.4A 2022-04-08 2022-04-08 Data processing method, terminal device, electronic device and storage medium Active CN114449205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210362677.4A CN114449205B (en) 2022-04-08 2022-04-08 Data processing method, terminal device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210362677.4A CN114449205B (en) 2022-04-08 2022-04-08 Data processing method, terminal device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114449205A CN114449205A (en) 2022-05-06
CN114449205B (en) 2022-07-29

Family

ID=81358835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210362677.4A Active CN114449205B (en) 2022-04-08 2022-04-08 Data processing method, terminal device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114449205B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101031065A (en) * 2007-04-27 2007-09-05 华为技术有限公司 Method, apparatus and system for switching pictures in video service
CN109089070A (en) * 2018-09-26 2018-12-25 福建星网智慧科技股份有限公司 A kind of layout switching method and system of video conference terminal
CN110933359A (en) * 2020-01-02 2020-03-27 随锐科技集团股份有限公司 Intelligent video conference layout method and device and computer readable storage medium
JP2021056840A (en) * 2019-09-30 2021-04-08 株式会社リコー Program, communication method, communication terminal and communication system
CN112788276A (en) * 2019-11-11 2021-05-11 中兴通讯股份有限公司 Video stream display method, transmission method, device, terminal, server and medium
CN113206975A (en) * 2021-04-29 2021-08-03 北京融讯科创技术有限公司 Video conference picture display method, device, equipment and storage medium
CN113315927A (en) * 2021-05-27 2021-08-27 维沃移动通信有限公司 Video processing method and device, electronic equipment and storage medium
CN113542660A (en) * 2021-07-20 2021-10-22 随锐科技集团股份有限公司 Method, system and storage medium for realizing conference multi-picture high-definition display

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9232192B2 (en) * 2013-11-08 2016-01-05 Avaya, Inc. Method and system for video conference snapshot presence
CN105812717A (en) * 2016-04-21 2016-07-27 邦彦技术股份有限公司 Multimedia conference control method and server
CN111131751B (en) * 2019-12-24 2023-04-07 视联动力信息技术股份有限公司 Information display method, device and system for video network conference
US20210227169A1 (en) * 2020-01-22 2021-07-22 Vonage Business Inc. System and method for using predictive analysis to generate a hierarchical graphical layout

Non-Patent Citations (1)

Title
Processing method for multiple pictures in cascaded video conferences; Long Yan et al.; Gansu Keji Zongheng (《甘肃科技纵横》); 2013-02-25 (Issue 02); full text *

Also Published As

Publication number Publication date
CN114449205A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN102422639B (en) System and method for translating communications between participants in a conferencing environment
CN103324457B (en) Terminal and multi-task data display method
US9088692B2 (en) Managing the layout of multiple video streams displayed on a destination display screen during a videoconference
US9154737B2 (en) User-defined content magnification and multi-point video conference system, method and logic
CN102177711B (en) Method, device and computer program for processing images during video conferencing
CN102215375B (en) The system of selection of the video source of the sprite of more pictures and device in multimedia conferencing
US8984156B2 (en) Multi-party mesh conferencing with stream processing
CN109565568B (en) Method for controlling user interface of user equipment
CN113542660A (en) Method, system and storage medium for realizing conference multi-picture high-definition display
WO2016169496A1 (en) Video conference image presentation method and device therefor
US20160285921A1 (en) Techniques for organizing participant interaction during a communication session
US10848712B1 (en) User-defined media source synchronization
CN112788276A (en) Video stream display method, transmission method, device, terminal, server and medium
CN111131757B (en) Video conference display method, device and storage medium
CN113286190A (en) Cross-network and same-screen control method and device and cross-network and same-screen system
US20200329083A1 (en) Video conference transmission method and apparatus, and mcu
CN114449205B (en) Data processing method, terminal device, electronic device and storage medium
US9232192B2 (en) Method and system for video conference snapshot presence
CN112752058B (en) Method and device for adjusting attribute of video stream
CN113507641B (en) Client-based multi-channel video screen mixing method, system and equipment
CN110708491A (en) Video conference display method, mobile terminal, and computer-readable storage medium
CN115396684A (en) Connecting wheat display method and device, electronic equipment and computer readable medium
WO2017173953A1 (en) Server, conference terminal, and cloud conference processing method
CN110636244B (en) Video conference server, system, control method and storage medium
CN108769565B (en) Automatic switching method of picture layout, server and local recording and broadcasting system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant