CN112040168A - Station caption processing method, electronic device and storage medium - Google Patents


Info

Publication number
CN112040168A
CN112040168A
Authority
CN
China
Prior art keywords: station caption, station, information, picture, caption
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202010925539.3A
Other languages
Chinese (zh)
Inventor
王展 (Wang Zhan)
胡小鹏 (Hu Xiaopeng)
顾振华 (Gu Zhenhua)
Current Assignee
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Application filed by Suzhou Keda Technology Co Ltd
Priority: CN202010925539.3A
Publication: CN112040168A
PCT application: PCT/CN2021/082758 (WO2022048137A1)


Classifications

    • H — ELECTRICITY
      • H04 — ELECTRIC COMMUNICATION TECHNIQUE
        • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 7/00 — Television systems
            • H04N 7/14 — Systems for two-way working
              • H04N 7/15 — Conference systems
          • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
              • H04N 21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations; client middleware
                • H04N 21/431 — Generation of visual interfaces for content selection or interaction; content or additional data rendering
                  • H04N 21/4312 — … involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
                    • H04N 21/4316 — … for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
                • H04N 21/435 — Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
            • H04N 21/80 — Generation or processing of content or additional data by content creator independently of the distribution process; content per se
              • H04N 21/81 — Monomedia components thereof
                • H04N 21/8126 — … involving additional data, e.g. news, sports, stocks, weather forecasts
                  • H04N 21/8133 — … specifically related to the content, e.g. biography of the actors in a movie

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of video conferences, and in particular to a station caption processing method, an electronic device, and a storage medium. The method comprises: acquiring station caption information and a video picture corresponding to a conference terminal, wherein the video picture comprises the meeting place pictures of at least one conference terminal; forming a video data packet based on the video picture; and adding the station caption information corresponding to the conference terminal to the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding meeting place picture based on the received target video data packet. Because the station caption information is carried in the video data packet formed from the video picture, rather than being drawn into the meeting place picture before the target video data packet is sent, the station caption is never encoded before sending; after the corresponding conference terminal receives the target video data packet, it processes the station caption on the corresponding meeting place picture, thereby improving the definition of the station caption in the meeting place picture.

Description

Station caption processing method, electronic device and storage medium
Technical Field
The invention relates to the technical field of video conferences, in particular to a station caption processing method, electronic equipment and a storage medium.
Background
In a video conference, in order to let participants obtain information about the other meeting places, a station caption is generally added to each meeting place picture in the video picture to show that meeting place's information. The station caption generally uses a meeting place alias, a registration number, or other information.
Taking a conference system based on an MCU architecture as an example, each conference terminal sends its meeting place picture to the conference platform; the conference platform synthesizes the corresponding meeting place pictures, adds the corresponding station caption to each meeting place picture, and sends the result to each conference terminal.
In analyzing this scheme, the inventor found that the station caption is added before the video picture is sent; the picture is then encoded and sent to each conference terminal, and each conference terminal decodes and plays it. That is, the station caption is added to the meeting place picture and then encoded. Encoding, however, reduces the definition of the station caption, especially at a low code rate, where the definition drops sharply, so the station caption in the meeting place picture appears unclear when the conference terminal decodes and plays the picture.
Disclosure of Invention
In view of this, embodiments of the present invention provide a station caption processing method, an electronic device, and a storage medium to solve the problem of low definition of a station caption.
According to a first aspect, an embodiment of the present invention provides a station caption processing method, including:
acquiring station caption information and a video picture corresponding to a conference terminal; the video pictures comprise meeting place pictures of at least one meeting terminal;
forming a video data packet based on the video picture;
and adding the station caption information corresponding to the conference terminal into the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding conference place picture based on the received target video data packet.
The station caption processing method provided by the embodiment of the invention adds the station caption information into the video data packet formed by the video picture, namely, the station caption is not added into the meeting place picture before the target video data packet is sent, so that the encoding of the station caption before sending is avoided, and the station caption is processed on the corresponding meeting place picture after the corresponding conference terminal receives the target video data packet, thereby improving the definition of the station caption in the meeting place picture.
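The three steps of the first aspect can be sketched as a minimal sender-side flow. This is an illustration only, not the claimed implementation; `LogoInfo`, `build_target_packet`, and the dictionary layout are hypothetical stand-ins:

```python
from dataclasses import dataclass

@dataclass
class LogoInfo:
    """Hypothetical station caption metadata for one conference terminal."""
    logo_id: int   # unique identifier distinguishing this terminal's caption
    text: str      # caption content, e.g. the meeting place alias
    x: int         # position inside the composed video picture
    y: int
    width: int
    height: int

def build_target_packet(encoded_video: bytes, logos: list) -> dict:
    """Carry the caption metadata alongside the encoded video instead of
    drawing it into the meeting place picture before encoding."""
    return {
        "payload": encoded_video,            # codec output, caption-free
        "logos": [vars(l) for l in logos],   # uncompressed caption info
    }

pkt = build_target_packet(b"\x00\x01", [LogoInfo(1, "Room A", 16, 16, 120, 32)])
```

Because `"logos"` travels outside the compressed payload, the caption text is untouched by the video codec, which is the source of the claimed definition improvement.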
With reference to the first aspect, in a first implementation manner of the first aspect, the acquiring station caption information corresponding to a conference terminal includes:
acquiring station caption position information and station caption content information;
and determining the station caption information based on the station caption position information and the station caption content information.
In the station caption processing method provided by the embodiment of the invention, the station caption information includes the station caption position information and the station caption content information, and the station caption is added only based on the station caption position information and the station caption content information when the station caption is processed subsequently, without performing other processing operations on the station caption, so that the station caption processing efficiency is improved, and the time delay of a video picture is reduced.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the acquiring the station caption position information includes:
acquiring the position of a meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption;
and determining the station caption position information based on the position of the meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption.
The station caption processing method provided by the embodiment of the invention determines the position information of the station caption by utilizing the position of the meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption, and ensures the accuracy of the position of the station caption.
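Under the assumption that positions are simple pixel coordinates, this computation can be sketched as follows (all names are illustrative):

```python
def logo_rect(sub_origin, rel_offset, logo_size):
    """Absolute caption rectangle inside the composed video picture.

    sub_origin:  (x, y) of the meeting place picture inside the video picture
    rel_offset:  caption offset relative to that meeting place picture
    logo_size:   (width, height) of the caption
    """
    return (sub_origin[0] + rel_offset[0],
            sub_origin[1] + rel_offset[1],
            logo_size[0], logo_size[1])

# A caption 16 px inside a sub-picture whose top-left corner is at (640, 0):
assert logo_rect((640, 0), (16, 16), (120, 32)) == (656, 16, 120, 32)
```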
With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, the obtaining station caption content information includes:
acquiring a station caption identifier corresponding to the conference terminal and attribute information of the station caption, wherein the attribute information of the station caption comprises at least one of a font name of the station caption, a font size of the station caption, a font color of the station caption or background data of the station caption;
looking up an attribute mapping table based on the to-be-mapped items among the attribute information of the station caption, and determining an attribute identifier of the station caption from the lookup result together with the remaining attribute information;
and determining the station caption content information based on the corresponding station caption identification of the conference terminal and the attribute identification of the station caption.
The station caption processing method provided by the embodiment of the invention determines the attribute identifier of the station caption by using the mapping table, which reduces the size of the target video data packet and ensures the real-time performance of the video conference.
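A minimal sketch of such a mapping table, assuming both sender and receiver hold the same table; the attribute fields and values below are invented for illustration:

```python
# Hypothetical attribute mapping table shared by sender and receiver.
# Transmitting the small table index instead of the full font description
# keeps the target video data packet small.
ATTR_TABLE = [
    {"font": "SimHei", "size": 24, "color": 0xFFFFFF},
    {"font": "SimHei", "size": 32, "color": 0xFFFF00},
]

def attr_id(attrs: dict) -> int:
    """Map a caption's attribute set to its compact identifier,
    registering the set if it is not yet in the table."""
    for i, row in enumerate(ATTR_TABLE):
        if row == attrs:
            return i
    ATTR_TABLE.append(attrs)
    return len(ATTR_TABLE) - 1

assert attr_id({"font": "SimHei", "size": 32, "color": 0xFFFF00}) == 1
```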
With reference to the first aspect, in a fourth implementation manner of the first aspect, the acquiring station caption information corresponding to a conference terminal further includes:
judging whether a station caption in a meeting place picture corresponding to a preset meeting terminal needs to be deleted or not;
when the station caption in the meeting place picture corresponding to the preset meeting terminal needs to be deleted, setting the deletion flag position as a first preset value;
and forming station caption information corresponding to the preset conference terminal by using the station caption identification corresponding to the preset conference terminal and the first preset value.
According to the station caption processing method provided by the embodiment of the invention, when the station caption in the meeting place picture of the preset meeting terminal needs to be deleted, the station caption information corresponding to the preset meeting terminal is determined without influencing the station caption information corresponding to other meeting terminals, and the station caption corresponding to each meeting terminal can be independently processed.
With reference to the first aspect or the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the acquiring station caption information corresponding to a conference terminal further includes:
judging whether a station caption in a current meeting place picture corresponding to a preset meeting terminal is the same as a station caption in a last meeting place picture corresponding to the preset meeting terminal;
when the station caption in the current meeting place picture corresponding to the preset meeting terminal is the same as the station caption in the last meeting place picture corresponding to the preset meeting terminal, setting the repeated mark position as a second preset value;
and forming station caption information corresponding to the preset conference terminal by using the station caption identification corresponding to the preset conference terminal and the second preset value.
According to the station caption processing method provided by the embodiment of the invention, when the station caption of the preset conference terminal is the same as the station caption in the previous conference room picture, the station caption information comprises the station caption identification and the repeated flag bit corresponding to the preset conference terminal, and the station caption attribute information and the station caption position information do not exist, that is, the station caption attribute information and the station caption position information do not need to be transmitted again, so that the non-repeated transmission of the station caption attribute information and the station caption position information of the repeated station caption can be realized, the requirement on bandwidth is reduced, and the time delay of the video picture can be further reduced.
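The deletion-flag and repeat-flag cases of the two embodiments above can be sketched with a hypothetical byte layout; the flag values and field order are illustrative, not taken from the claims:

```python
DELETE_FLAG = 0x01  # "first preset value": clear this terminal's caption
REPEAT_FLAG = 0x02  # "second preset value": reuse the previous caption

def encode_logo_info(logo_id: int, flags: int = 0, body: bytes = b"") -> bytes:
    """When a flag is set, only the caption identifier and the flag byte
    are sent; the position and attribute bytes are omitted, which is what
    saves bandwidth for repeated or deleted captions."""
    if flags:
        return bytes([logo_id, flags])
    return bytes([logo_id, 0]) + body
```

For an unchanged caption the sender thus transmits two bytes instead of the full position and attribute payload.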
With reference to the first aspect or any one of the first to the fifth embodiments of the first aspect, in a sixth embodiment of the first aspect, the adding, to the video data packet, station caption information corresponding to the conference terminal to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in a corresponding meeting place picture based on the received target video data packet, includes:
adding station caption information corresponding to the conference terminal to an extension head of the video data packet to obtain the target video data packet;
and sending the target video data packet to a corresponding conference terminal, so that the corresponding conference terminal performs station caption processing in a corresponding conference place picture based on the received target video data packet.
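One plausible concrete form of such an extension head, if the video data packets are RTP packets, is a header extension in the style of RFC 3550 section 5.3.1; the profile identifier below is invented for illustration:

```python
import struct

PROFILE_ID = 0x4C47  # hypothetical profile marker for caption data

def pack_extension(logo_info: bytes) -> bytes:
    """RTP-style header extension: 16-bit profile identifier, 16-bit
    length counted in 32-bit words, then the caption payload padded
    to a 4-byte boundary."""
    pad = (-len(logo_info)) % 4
    body = logo_info + b"\x00" * pad
    return struct.pack("!HH", PROFILE_ID, len(body) // 4) + body

ext = pack_extension(b"\x01\x00Room A")
```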
According to a second aspect, an embodiment of the present invention further provides a station caption processing method, including:
receiving a target video data packet, wherein the target video data packet comprises station caption information and video pictures corresponding to conference terminals, and the video pictures comprise conference site pictures of at least one conference terminal;
analyzing the target video data packet to obtain station caption information and the video picture corresponding to the conference terminal;
and processing station captions in the corresponding meeting place pictures of the video pictures based on the station caption information corresponding to the meeting terminals.
The station caption processing method provided by the embodiment of the invention processes the station caption at the receiving end, avoids coding the station caption at the sending end and improves the definition of the station caption in a meeting place picture.
With reference to the second aspect, in a first implementation manner of the second aspect, the processing of a station caption in a corresponding meeting place picture of the video picture based on station caption information corresponding to the conference terminal includes:
extracting station caption position information and station caption content information in the station caption information;
determining the position of the station caption in the corresponding meeting place picture by using the station caption position information;
and forming a station caption corresponding to the station caption content information on the meeting place picture based on the determined position.
In the station caption processing method provided by this embodiment, the station caption information includes the station caption position information and the station caption content information, and when the station caption is processed, only the station caption needs to be added based on the station caption position information and the station caption content information, and no other processing operation needs to be performed on the station caption, so that the efficiency of processing the station caption is improved, and the time delay of a video picture is reduced.
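A toy receiver-side sketch of these three steps; a real implementation would rasterize the caption text with a font engine, so a solid rectangle stands in for the rendered caption here:

```python
def render_caption(frame, logo):
    """Stamp the caption rectangle onto the decoded frame at the position
    carried in the station caption information. Because the caption never
    passed through the video codec, it stays sharp even when the picture
    was encoded at a low code rate. `frame` is a 2-D grid of pixel values."""
    for dy in range(logo["height"]):
        for dx in range(logo["width"]):
            frame[logo["y"] + dy][logo["x"] + dx] = logo["color"]
    return frame

frame = [[0] * 8 for _ in range(4)]
frame = render_caption(frame, {"x": 1, "y": 1, "width": 3, "height": 2, "color": 9})
```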
With reference to the second aspect, in a second implementation manner of the second aspect, the processing of a station caption in a corresponding meeting place picture of the video picture based on station caption information corresponding to the conference terminal further includes:
judging whether a deletion flag bit in station caption information corresponding to the conference terminal is a first preset value or not;
and when the deletion flag bit in the station caption information corresponding to the conference terminal is a first preset value, emptying the station caption in the conference place picture corresponding to the conference terminal.
In the station caption processing method provided by this embodiment, when the deletion flag bit in the station caption information corresponding to the conference terminal is the first preset value, the station caption of the conference site picture corresponding to the conference terminal is directly set to be empty without affecting the station captions in the pictures of other conference sites, so that the station caption corresponding to each conference terminal is processed independently.
With reference to the second aspect or the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the processing of a station caption in a corresponding meeting place picture of the video picture based on station caption information corresponding to the conference terminal further includes:
judging whether a repeated flag bit in station caption information corresponding to the conference terminal is a second preset value or not;
and when the repeated flag bit in the station caption information corresponding to the conference terminal is the second preset value, setting the station caption of the current conference site picture of the conference terminal as the station caption of the last conference site picture of the current conference site picture.
According to the station caption processing method provided by the embodiment of the invention, when the station caption of the conference terminal is the same as the station caption in the previous conference room picture, the received station caption information comprises the station caption identifier and the repeated flag bit corresponding to the conference terminal, but no station caption attribute information and station caption position information exist, so that the non-repeated transmission of the station caption attribute information and the station caption position of the repeated station caption can be realized, the requirement on bandwidth is reduced, and the time delay of the video picture can be further reduced.
According to a third aspect, an embodiment of the present invention further provides a station caption processing apparatus, including:
the acquisition module is used for acquiring station caption information and video pictures corresponding to the conference terminal; the video pictures comprise meeting place pictures of at least one meeting terminal;
a forming module for forming a video data packet based on the video picture;
and the adding module is used for adding the station caption information corresponding to the conference terminal into the video data packet to obtain a target video data packet so that the corresponding conference terminal can perform station caption processing in the corresponding conference place picture based on the received target video data packet.
The station caption processing device provided by the embodiment of the invention adds the station caption information into the video data packet formed by the video picture, namely, the station caption is not added into the meeting place picture before the target video data packet is sent, so that the encoding of the station caption before sending is avoided, and the station caption is processed on the corresponding meeting place picture after the corresponding conference terminal receives the target video data packet, thereby improving the definition of the station caption in the meeting place picture.
According to a fourth aspect, an embodiment of the present invention further provides a station caption adding apparatus, including:
the receiving module is used for receiving a target video data packet, wherein the target video data packet comprises station caption information and video pictures corresponding to conference terminals, and the video pictures comprise at least one conference site picture of the conference terminal;
the analysis module is used for analyzing the target video data packet to obtain station caption information and the video picture corresponding to the conference terminal;
and the processing module is used for processing the station caption in the corresponding meeting place picture of the video picture based on the station caption information corresponding to the meeting terminal.
The station caption processing device provided by the embodiment of the invention processes the station caption at the receiving end, avoids coding the station caption at the sending end and improves the definition of the station caption in a meeting place picture.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, including: the station caption processing device comprises a memory and a processor, wherein the memory and the processor are connected with each other in a communication mode, computer instructions are stored in the memory, and the processor executes the computer instructions so as to execute the first aspect or any one of the implementation modes of the first aspect, or execute the station caption processing method described in any one of the implementation modes of the second aspect or the second aspect.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores computer instructions for causing a computer to execute the first aspect or any one of the implementation manners of the first aspect, or execute the station caption processing method described in any one of the implementation manners of the second aspect or the second aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 a-1 b are schematic diagrams illustrating application scenarios of a video conference system in an embodiment of the present invention;
fig. 2 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 3 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of station caption information according to an embodiment of the present invention;
fig. 5 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 6 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 7 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 8 is a flowchart of a station caption processing method according to an embodiment of the present invention;
fig. 9 is a block diagram of a station caption processing apparatus according to an embodiment of the present invention;
fig. 10 is a block diagram of a station caption processing apparatus according to an embodiment of the present invention;
fig. 11 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in the station caption processing method according to the embodiment of the present invention, the station caption information is transmitted together with the video data packet, instead of being drawn into the meeting place picture before transmission. The conference system in the embodiment of the present invention may be a connection between two conference terminals as shown in fig. 1a. For example, when the video picture of conference terminal A is displayed at conference terminal B, the corresponding video picture contains only the meeting place picture of conference terminal A: conference terminal A encodes the video picture to form a video data packet, then sends the station caption information corresponding to conference terminal A together with the video data packet to conference terminal B, and conference terminal B performs station caption processing in the meeting place picture of conference terminal A.
The conference system in the embodiment of the present invention may also be a video conference system based on an MCU architecture as shown in fig. 1b. This conference system includes a conference platform and N conference terminals: each conference terminal sends its own meeting place picture to the conference platform; the conference platform synthesizes the received meeting place pictures into a video picture and processes the video picture to obtain a video data packet; it then sends the station caption information corresponding to each meeting place picture in the video picture together with the video data packet to the corresponding conference terminal. After a conference terminal receives the station caption information and the video data packet, it performs station caption processing in each meeting place picture.
In the following description of the embodiments of the present invention, a conference system shown in fig. 1b is taken as an example for detailed description.
In accordance with an embodiment of the present invention, there is provided a station caption processing method embodiment, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
In this embodiment, a station caption processing method is provided, which can be used in the above-mentioned electronic devices, such as a conference terminal, a conference platform, and the like, and fig. 2 is a flowchart of the station caption processing method according to the embodiment of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
and S11, acquiring station caption information and video pictures corresponding to the conference terminal.
Wherein the video pictures comprise meeting place pictures of at least one meeting terminal.
The station caption information corresponding to the conference terminal may be acquired by the electronic device from the outside or stored in a storage space of the electronic device. The station caption information corresponds to the conference terminal, and the conference terminal corresponds to the conference site picture, so that the conference site picture corresponds to the station caption information.
And each conference terminal collects the respective conference site picture and sends the conference site picture to the electronic equipment. The electronic device performs picture synthesis processing on the corresponding meeting place picture according to the picture synthesis requirement to obtain the video picture, and the specific picture synthesis mode can be set correspondingly according to the actual requirement without any limitation. The video picture can be regarded as a picture obtained by picture synthesis.
For example, for a conference system including a conference terminal a, a conference terminal B, and a conference terminal C, the electronic device may perform picture synthesis on the conference site pictures of the conference terminal B and the conference terminal C and then send the conference site pictures to the conference terminal a, perform picture synthesis on the conference site pictures of the conference terminal a and the conference terminal C and then send the conference site pictures to the conference terminal B, and perform picture synthesis on the conference site pictures of the conference terminal a and the conference terminal B and then send the conference site pictures to the conference terminal C.
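The per-terminal synthesis in this example can be sketched as follows (illustrative only; real layouts depend on the picture synthesis requirement):

```python
def layouts_for(terminals):
    """Each conference terminal receives a video picture synthesized from
    the other terminals' meeting place pictures, not its own."""
    return {t: [u for u in terminals if u != t] for t in terminals}

layouts = layouts_for(["A", "B", "C"])
```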
In the following description, the case in which the electronic device synthesizes the conference site pictures of all conference terminals in the conference system is taken as an example.
S12, forming a video data packet based on the video picture.
After synthesizing the conference site pictures of all conference terminals into a video picture, the electronic device encodes the video picture and performs related processing to form a video data packet.
S13, adding the station caption information corresponding to each conference terminal to the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding conference site picture based on the received target video data packet.
After obtaining the station caption information in S11 and the video data packet in S12, the electronic device adds the station caption information to the video data packet to obtain the target video data packet. Specifically, the target video data packet includes the conference site pictures of all conference terminals and the station caption information in one-to-one correspondence with those pictures. Because the video data packet includes the conference site pictures of all conference terminals, and each picture corresponds to its own station caption information, each piece of station caption information is distinguished by an identifier. The identifier may be the identifier of the conference terminal, a station caption identifier determined by each conference terminal, and so on; it is only necessary to ensure that the station caption information corresponding to each conference site picture can be distinguished by a unique identifier.
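As an illustration only, the identifier-keyed packaging described above can be sketched as follows; the dict layout and names are hypothetical and are not the patent's actual packet format:

```python
def build_target_packet(video_packet, captions):
    """Attach per-terminal station caption records to a video data packet.

    `captions` maps a unique identifier (e.g. a station caption ID) to
    that terminal's station caption information, so the receiver can
    match each caption to its conference site picture.
    """
    return {"video": video_packet,
            "captions": dict(captions)}

pkt = build_target_packet(b"encoded-frame", {1: b"\x01info", 2: b"\x02info"})
assert pkt["captions"][2] == b"\x02info"
```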
After receiving the target video data packet, the conference terminal can perform station caption processing in the corresponding conference site picture by using the station caption information in the packet. The station caption processing may be adding, deleting, or changing a station caption, with the specific processing determined by the station caption information.
In the station caption processing method provided in this embodiment, the station caption information is added to the video data packet formed from the video picture; that is, the station caption is not rendered into the conference site picture before the target video data packet is sent, so the station caption is never encoded at the sending end. After the corresponding conference terminal receives the target video data packet, it processes the station caption on the corresponding conference site picture, thereby improving the definition of the station caption in the conference site picture.
This embodiment provides a station caption processing method that can be used in the above-mentioned electronic devices, such as a conference terminal or a conference platform. Fig. 3 is a flowchart of the station caption processing method according to this embodiment of the present invention. As shown in fig. 3, the flow includes the following steps:
S21, acquiring station caption information and a video picture corresponding to each conference terminal.
The video picture comprises the conference site picture of at least one conference terminal.
Specifically, the step S21 includes the following steps:
S211, acquiring the station caption position information and the station caption content information.
The station caption position information is the position of the station caption in each conference site picture. Because the position of a conference site picture depends on the picture synthesis, the electronic device needs to know the position of each conference site picture in the synthesized picture. The station caption content information represents the station caption identifier and the station caption attribute information corresponding to the conference terminal. This step is described below in two parts: acquisition of the station caption position information and acquisition of the station caption content information.
(1) Obtaining station caption position information
1.1) Acquire the position, in the video picture, of the conference site picture corresponding to the station caption; the positional relation between the station caption and the conference site picture; and the size of the station caption.
The position of each conference terminal's conference site picture in the video picture may be specified by the user, or set by the electronic device according to a corresponding rule, for example, determined by the role of the conference terminal. The size of the station caption may likewise be configured in advance by the user or set by the electronic device according to a corresponding rule.
In the following description, both the conference site picture and the station caption are rectangular, and position information is determined by the coordinates of the upper-left corner of the corresponding rectangular frame together with the size of that frame. The position of each conference site picture in the video picture is thus determined by the coordinates of its upper-left corner and its size, and the station caption position information is determined by the coordinates of the station caption's upper-left corner and the station caption size.
For example, the position of each conference terminal's conference site picture in the video picture can be determined by its upper-left corner coordinates (x_h, y_h) and the size of the conference site picture (width w_h, height h_h); the position of the station caption can be determined by the upper-left corner coordinates of the station caption (x_t, y_t) and the station caption size (width w_t, height h_t). Therefore, when the size of the station caption is known, the station caption position information can be determined by acquiring the coordinates of the upper-left corner of the station caption.
In other embodiments, when the size of the station caption is known, the position information of the station caption can be determined by other coordinates of the station caption, such as an upper right-corner coordinate, a lower left-corner coordinate, a lower right-corner coordinate, and a center-point coordinate of the station caption.
Furthermore, the coordinates of the upper-left corner of the station caption are determined from the upper-left corner coordinates of the conference site picture, the size of the rectangular frame corresponding to the conference site picture, the positional relation between the station caption and the conference site picture, and the size of the station caption, so as to determine the station caption position information. The positional relation between the station caption and the conference site picture includes: the azimuth information of the station caption in the conference site picture, and the margin w_k between the station caption and the edge of the conference site picture. The margin w_k can be understood as the distance between the station caption and the nearest of the four edges of the conference site picture. The azimuth information may place the station caption at the lower center, upper center, left center, or right center of the conference site picture, but is not limited to center positions; 1/3 or 1/4 positions and the like are also possible, as long as the azimuth of the station caption relative to the conference site picture can be accurately located.
1.2) determining the position information of the station caption based on the position of the meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption.
As described above, the coordinates of the upper left corner of the station caption are obtained to determine the position information of the station caption.
The electronic device acquires the upper-left corner coordinates (x_h, y_h) of the conference site picture corresponding to the station caption in the video picture, the size of the conference site picture (width w_h, height h_h), and the size of the station caption (width w_t, height h_t). Taking the station caption at the middle position below the conference site picture, the edge of the conference site picture nearest the station caption is the bottom edge, i.e., the margin between the station caption and the bottom edge of the conference site picture is w_k. The electronic device then determines the position of the station caption by calculation, i.e., the upper-left corner coordinates (x_t, y_t) of the station caption in the conference site picture:
x_t = x_h + (w_h - w_t)/2,  y_t = y_h - h_h + w_k + h_t
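As a minimal sketch, the calculation above can be written as follows, assuming a coordinate system in which the y axis points upward (consistent with the formula y_t = y_h - h_h + w_k + h_t); the example values are purely hypothetical:

```python
def station_caption_top_left(xh, yh, wh, hh, wt, ht, wk):
    """Upper-left corner of a station caption centered below its site picture.

    Implements the formulas from the text:
      x_t = x_h + (w_h - w_t) / 2
      y_t = y_h - h_h + w_k + h_t
    """
    xt = xh + (wh - wt) / 2
    yt = yh - hh + wk + ht
    return xt, yt

# Hypothetical 1920x1080 site picture with top-left at (0, 1080),
# a 320x60 caption, and a 20-unit margin from the bottom edge.
assert station_caption_top_left(0, 1080, 1920, 1080, 320, 60, 20) == (800.0, 80)
```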
(2) obtaining station caption content information
2.1) Acquire the station caption identifier and the station caption attribute information corresponding to the conference terminal.
The station caption attribute information comprises at least one of the font name of the station caption, the font size of the station caption, the font color of the station caption, or the background data of the station caption. The station caption background data comprises at least one of the transparency of the station caption background and the color of the station caption background.
The conference site picture of each conference terminal has a corresponding station caption identifier, such as a station caption ID, and the station captions of the conference site pictures are distinguished by their station caption IDs. The station caption can be customized by the user, or acquired by the electronic device from other devices or from its own storage space.
2.2) Determine the attribute identifiers of the station caption by looking up the attribute mapping table for the attribute information to be mapped within the station caption attribute information, and by directly using the other attribute information.
The station caption attribute information can be divided into attribute information to be mapped and other attribute information. Attribute information whose representation occupies more than a preset number of bytes is treated as attribute information to be mapped, and attribute information whose representation occupies no more than the preset number of bytes is treated as other attribute information. An attribute mapping table between the attribute information to be mapped and attribute identifiers is then established.
In one embodiment, the preset number of bytes is three. Because the font name of the station caption must be represented by several English letters or a byte stream, and the font color of the station caption or the background color of the station caption must be represented in RGB, the number of bytes they occupy exceeds the preset number. The attribute information to be mapped therefore includes the font name of the station caption, the font color of the station caption, and the background color in the station caption background data; the other attribute information includes the font size of the station caption, the background transparency of the station caption, and the like.
The attribute mapping table records the identifier corresponding to the station caption font name, the station caption font color, and the background color in the station caption background data; that is, identifiers are used to represent these attributes.
For example, the attribute mapping table is shown in table 1:
Table 1 Attribute mapping table
Font name         Song typeface (SimSun)    01
Font color        Black                     01
Background color  Gray                      04
The attribute mapping table may be obtained by negotiation between the electronic device and each conference terminal, or may be sent to each conference terminal by the electronic device, and the like.
For the other attribute information, the attribute value itself is used directly as the attribute identifier: if the font size of the station caption is size 4, the attribute identifier is simply 4; if the background transparency of the station caption is 50%, the attribute identifier is simply 50.
Because attribute information such as the station caption font name, font color, and background color occupies many bytes, using the attribute mapping table to determine the identifiers for the attribute information to be mapped allows attributes that occupy many bytes to be represented in fewer bytes. This reduces the byte count of the station caption information, reduces the size of the target video data packet, and helps ensure the real-time performance of the video conference.
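A minimal sketch of such an attribute mapping, assuming the table values shown in Table 1 and one-byte identifiers (the dictionary contents and function name are hypothetical):

```python
# Hypothetical attribute mapping tables, as in Table 1: multi-byte
# attribute values (font name, colors) are replaced by one-byte IDs
# negotiated between the electronic device and the terminals.
FONT_NAME_IDS = {"SimSun": 0x01}
FONT_COLOR_IDS = {"black": 0x01}
BACKGROUND_COLOR_IDS = {"gray": 0x04}

def map_attributes(font_name, font_color, bg_color, font_size, bg_alpha):
    """Return the attribute identifiers for one station caption.

    Mapped attributes come from the tables; font size and background
    transparency are short enough to be used directly as identifiers.
    """
    return (FONT_NAME_IDS[font_name], FONT_COLOR_IDS[font_color],
            BACKGROUND_COLOR_IDS[bg_color], font_size, bg_alpha)

assert map_attributes("SimSun", "black", "gray", 4, 50) == (1, 1, 4, 4, 50)
```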
And 2.3) determining the station caption content information based on the corresponding station caption identification of the conference terminal and the attribute identification of the station caption.
After obtaining the attribute identifiers of the station caption, the electronic device determines the station caption content information from the attribute identifiers and the station caption identifier. Specifically, the station caption identifier and the attribute identifiers of the station caption may be combined into the station caption content information in a preset order.
S212, station caption information is determined based on the station caption position information and the station caption content information.
After the station caption position information and the station caption content information corresponding to each conference terminal are determined, the electronic device can determine the station caption information. The station caption information corresponding to different conference terminals is distinguished by the station caption identifiers in the station caption content information.
S22, forming a video data packet based on the video picture.
Please refer to S12 in fig. 2 for details, which are not described herein.
S23, adding the station caption information corresponding to the conference terminal to the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding conference site picture based on the received target video data packet.
Specifically, the step S23 includes the following steps:
S231, adding the station caption information corresponding to the conference terminal to the extension header of the video data packet to obtain the target video data packet.
For example, if the video data packet formed by the electronic device in S22 is an RTP video data packet, the electronic device adds the station caption information corresponding to each conference terminal to the RTP extension header of the RTP video data packet to obtain the target video data packet.
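As a hedged illustration only, the following sketch appends a generic RTP header extension (per RFC 3550 section 5.3.1) carrying a station caption byte stream. The profile ID value and the assumption of a bare 12-byte RTP header without CSRCs or an existing extension are hypothetical, not the patent's actual wire format:

```python
import struct

def add_rtp_extension(rtp_packet, ext_payload, profile_id=0x1000):
    """Append a generic RTP header extension carrying the station
    caption byte stream.

    Assumes `rtp_packet` starts with the 12-byte fixed RTP header and
    has no CSRC list or existing extension. The extension is the
    16-bit profile-defined ID, a 16-bit length in 32-bit words, then
    the payload padded to a 4-byte boundary.
    """
    header = bytearray(rtp_packet[:12])
    header[0] |= 0x10  # set the X (extension) bit in the first octet
    padded = ext_payload + b"\x00" * (-len(ext_payload) % 4)
    ext = struct.pack("!HH", profile_id, len(padded) // 4) + padded
    return bytes(header) + ext + rtp_packet[12:]
```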
S232, sending the target video data packet to the corresponding conference terminal, so that the corresponding conference terminal performs station caption processing in the corresponding conference site picture based on the received target video data packet.
Please refer to the related description of S13 in the embodiment shown in fig. 2, which is not repeated here.
In the station caption processing method provided by this embodiment, the station caption information includes the station caption position information and the station caption content information. When the station caption is processed, it therefore only needs to be added on the basis of that position and content information, with no other processing of the station caption required, which improves station caption processing efficiency and reduces the time delay of the video picture.
This embodiment provides a station caption processing method that can be used in the above-mentioned electronic devices, such as a conference terminal or a conference platform. Fig. 5 is a flowchart of the station caption processing method according to this embodiment of the present invention. As shown in fig. 5, the flow includes the following steps:
S31, acquiring station caption information and a video picture corresponding to each conference terminal.
The video picture comprises the conference site picture of at least one conference terminal.
After the electronic device obtains the station caption information of each conference terminal, it needs to determine whether the station caption in each conference terminal's conference site picture needs to be deleted, and whether it is the same as the station caption in the previous conference site picture. Station caption information for each conference terminal's conference site picture is then formed based on the determination. In the following description, one conference terminal, referred to as the preset conference terminal, is taken as an example to explain how the station caption information is determined.
Specifically, the step S31 includes the following steps:
S311, determining whether the station caption in the conference site picture corresponding to the preset conference terminal needs to be deleted.
The electronic device may determine whether the station caption in the conference site picture corresponding to the preset conference terminal needs to be deleted through man-machine interaction, or in other ways.
If the station caption in the conference site picture corresponding to the preset conference terminal needs to be deleted, S312 is executed; otherwise, S314 is executed.
S312, setting the deletion flag bit to a first preset value.
For example, when deletion is required, the deletion flag bit may be set to 1; accordingly, when deletion is not required, the deletion flag bit may be set to 0, or not set.
In this step, when determining that the station caption in the conference site picture corresponding to the preset conference terminal needs to be deleted, the electronic device sets the deletion flag bit corresponding to the preset conference terminal to 1.
S313, forming the station caption information corresponding to the preset conference terminal from the station caption identifier and the first preset value.
The station caption information of the preset conference terminal determined by the electronic device is then: station caption ID + DM, where DM represents the value of the deletion flag bit.
S314, determining whether the station caption in the current conference site picture corresponding to the preset conference terminal is the same as the station caption in the previous conference site picture corresponding to the preset conference terminal.
When the electronic device determines that the station caption in the conference site picture corresponding to the preset conference terminal does not need to be deleted, the deletion flag bit may be set to 0 or left unset; the electronic device then determines whether the station caption in the current conference site picture is the same as the station caption in the previous conference site picture. For example, the electronic device uses the station caption identifier corresponding to the preset conference terminal to check in real time whether the corresponding station caption attribute information and station caption position information have changed. If neither has changed, the station caption in the current conference site picture can be determined to be the same as the station caption in the previous conference site picture; otherwise, they are determined to be different.
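The repeat check described here can be sketched as a sender-side cache keyed by the station caption identifier; the class and method names are hypothetical:

```python
class CaptionChangeTracker:
    """Sender-side check for S314: compare the current attribute and
    position information of each station caption (keyed by its ID)
    with what was last sent, to decide the repetition flag bit."""

    def __init__(self):
        self.last_sent = {}

    def is_repeat(self, caption_id, attrs, position):
        """Return True when nothing changed since the last send."""
        state = (attrs, position)
        repeat = self.last_sent.get(caption_id) == state
        self.last_sent[caption_id] = state
        return repeat
```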
When the station caption in the current conference site picture corresponding to the preset conference terminal is the same as the station caption in the previous conference site picture, S315 is executed; otherwise, S317 is executed.
S315, setting the repetition flag bit to a second preset value.
For example, if the repetition flag bit is denoted by CM, CM is set to 1 when the station caption in the current conference site picture is the same as the station caption in the previous conference site picture.
S316, forming the station caption information corresponding to the preset conference terminal from the station caption identifier and the second preset value.
The station caption information of the preset conference terminal determined by the electronic device may then be: the station caption ID plus the repetition flag bit set to the second preset value, i.e., station caption ID + CM; it may also be expressed as station caption ID + DM + CM, where DM is not the first preset value.
S317, acquiring the station caption position information and the station caption content information.
When the station caption of the preset conference terminal does not need to be deleted and is not a repeat of the station caption in the previous conference site picture, the electronic device re-acquires the station caption position information and station caption content information of the preset conference terminal and determines the station caption information.
Please refer to S211 in fig. 3 for details, which are not described herein.
And S318, determining station caption information based on the station caption position information and the station caption content information.
Please refer to S212 of the embodiment shown in fig. 3 for details, which are not described herein.
As an optional implementation of this embodiment, the above S31 includes the following process. The electronic device obtains from the upper-layer service whether the current station caption is to be deleted. If the station caption needs to be deleted, the deletion flag bit DM is set to 1, and the station caption information consists of station caption ID + DM, occupying 2 bytes. If the station caption is unchanged from the previous one, the deletion flag bit DM is set to 0, the repetition flag bit CM is set to 1, and the station caption information consists of station caption ID + DM + CM, occupying 2 bytes. If the station caption information has changed, the deletion flag bit DM is set to 0, the repetition flag bit CM is set to 0, and the font name ID, font size ID, font color ID, background alpha channel value, x coordinate of the upper-left vertex of the station caption, y coordinate of the upper-left vertex of the station caption, width of the station caption, height of the station caption, and name of the conference terminal are converted into byte streams; all the byte streams are then combined into the station caption information, i.e., the RTP extension data. Finally, the generated station caption information byte stream is added to the RTP extension header of the video frame corresponding to the station caption, and the result is shown in fig. 4.
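The three cases above can be sketched as follows. The exact bit positions of DM and CM within the second byte are a hypothetical choice, since the text only fixes the two-byte size of the delete and repeat records:

```python
def encode_caption_info(caption_id, deleted=False, repeated=False, body=b""):
    """Serialize one station caption record per the optional scheme above.

    Byte 0: station caption ID.  Byte 1: DM in bit 7, CM in bit 6
    (hypothetical layout).  A changed caption appends the full
    attribute/position byte stream in `body`; a deleted or repeated
    caption is exactly 2 bytes.
    """
    flags = (0x80 if deleted else 0) | (0x40 if repeated else 0)
    record = bytes([caption_id, flags])
    if not deleted and not repeated:
        record += body
    return record

assert len(encode_caption_info(7, deleted=True)) == 2
assert len(encode_caption_info(7, repeated=True)) == 2
```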
S32, forming a video data packet based on the video picture.
Please refer to S22 in fig. 3 for details, which are not described herein.
S33, adding the station caption information corresponding to the conference terminal to the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding conference site picture based on the received target video data packet.
Please refer to S23 in fig. 3 for details, which are not described herein.
In the station caption processing method provided by this embodiment, when the station caption in the conference site picture of a preset conference terminal needs to be deleted, the station caption information corresponding to that terminal is determined accordingly, so the station caption corresponding to each conference terminal can be processed independently. When the station caption of the preset conference terminal is the same as the station caption in the previous conference site picture, the station caption information contains only the station caption identifier and the repetition flag bit, without the station caption attribute information or station caption position information; that is, the attribute and position information need not be transmitted again. This avoids repeated transmission of the content and position information of unchanged station captions, reduces the bandwidth requirement, and further reduces the time delay of the video picture.
This embodiment provides a station caption processing method that can be used in the above-mentioned electronic device, such as a conference terminal. Fig. 6 is a flowchart of the station caption processing method according to this embodiment of the present invention. As shown in fig. 6, the flow includes the following steps:
S41, receiving a target video data packet.
The target video data packet includes the station caption information corresponding to conference terminals and a video picture, and the video picture includes the conference site picture of at least one conference terminal.
For the description of the target video data packet, refer to the related description in the embodiments shown in fig. 2, 3 or 5, and are not repeated herein.
S42, parsing the target video data packet to obtain the station caption information and video picture corresponding to the conference terminal.
After receiving the target video data packet, the electronic device parses it to obtain the station caption information corresponding to each conference site picture in the video picture.
S43, performing station caption processing in the corresponding conference site picture of the video picture based on the station caption information corresponding to the conference terminal.
After obtaining the station caption information corresponding to each conference site picture, the electronic device adds, deletes, or changes the station caption in the corresponding conference site picture using that information. This step is described in detail below.
The station caption processing method provided by this embodiment performs the station caption processing at the receiving end, which avoids encoding the station caption at the sending end and improves the definition of the station caption in the conference site picture.
This embodiment provides a station caption processing method that can be used in the above-mentioned electronic device, such as a conference terminal. Fig. 7 is a flowchart of the station caption processing method according to this embodiment of the present invention. As shown in fig. 7, the flow includes the following steps:
S51, receiving a target video data packet.
The target video data packet includes the station caption information corresponding to conference terminals and a video picture, and the video picture includes the conference site picture of at least one conference terminal.
Please refer to S41 in fig. 6 for details, which are not described herein.
S52, parsing the target video data packet to obtain the station caption information and video picture corresponding to the conference terminal.
Please refer to S42 in fig. 6 for details, which are not repeated here.
S53, performing station caption processing in the corresponding conference site picture of the video picture based on the station caption information corresponding to the conference terminal.
Specifically, the step S53 includes the following steps:
S531, extracting the station caption position information and the station caption content information from the station caption information.
The station caption information includes station caption position information and station caption content information, and for the specific contents of the station caption position information and the station caption content information, please refer to S21 in the embodiment shown in fig. 3, which is not described herein again.
S532, determining the position of the station caption in the corresponding conference site picture by using the station caption position information.
After extracting the station caption position information, the electronic device locates it within the conference site picture and determines the position of the station caption in the corresponding conference site picture.
S533, forming the station caption corresponding to the station caption content information on the conference site picture based on the determined position.
After determining the position of the station caption in the corresponding conference site picture, the electronic device can use the station caption content information to form the corresponding station caption on the conference site picture. Specifically, the electronic device may use the station caption identifier in the content information to associate the attribute identifiers of each station caption with a conference site picture, generate the corresponding station caption from those attribute identifiers, and superimpose the generated station caption at the position given by the station caption position information on the corresponding conference site picture. Alternatively, the electronic device may first generate the station caption from the attribute identifiers in the content information, then use the station caption identifier to associate the generated station caption with a conference site picture, and superimpose it at the position given by the station caption position information.
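A receiver-side sketch of S532-S533, using a minimal stand-in Frame class and a hypothetical `render` callback in place of the real decoding and drawing pipeline:

```python
class Frame:
    """Minimal stand-in for a decoded conference site picture."""

    def __init__(self):
        self.overlays = []

    def overlay(self, image, x, y):
        # Record what would be superimposed and where.
        self.overlays.append((image, x, y))

def process_caption(frame, info, render):
    """Locate the caption with the position information, generate it
    from the attribute identifiers via the `render` callback, and
    superimpose it on the matching conference site picture."""
    image = render(info["text"], info["attributes"])
    frame.overlay(image, info["x"], info["y"])
    return frame
```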
In the station caption processing method provided by this embodiment, the station caption information includes the station caption position information and the station caption content information. When the station caption is processed, it therefore only needs to be added on the basis of that position and content information, with no other processing of the station caption required, which improves station caption processing efficiency and reduces the time delay of the video picture.
This embodiment provides a station caption processing method that can be used in the above-mentioned electronic device, such as a conference terminal. Fig. 8 is a flowchart of the station caption processing method according to this embodiment of the present invention. As shown in fig. 8, the flow includes the following steps:
S61, receiving a target video data packet.
The target video data packet includes the station caption information corresponding to conference terminals and a video picture, and the video picture includes the conference site picture of at least one conference terminal.
Please refer to S51 in fig. 7 for details, which are not described herein.
And S62, analyzing the target video data packet to obtain station caption information and video pictures corresponding to the conference terminal.
Please refer to S52 in fig. 7 for details, which are not described herein.
And S63, processing the station caption in the corresponding meeting place picture of the video picture based on the station caption information corresponding to the conference terminal.
The station caption information in this embodiment includes a deletion flag bit and a repeat flag bit. For the description of these flag bits, please refer to S31 in the embodiment shown in fig. 5, which is not repeated here.
Specifically, the step S63 includes the following steps:
And S631, judging whether the deletion flag bit in the station caption information corresponding to the conference terminal is the first preset value.
The first preset value indicates that the station caption in the current meeting place picture corresponding to the conference terminal needs to be cleared. When the deletion flag bit in the station caption information corresponding to the conference terminal is the first preset value, S632 is executed; otherwise, S633 is executed.
And S632, setting the station caption in the meeting place picture corresponding to the conference terminal to empty.
If the electronic device determines that the deletion flag bit in the station caption information of the conference terminal is the first preset value, it clears the station caption in the current meeting place picture corresponding to that conference terminal.
And S633, judging whether the repeat flag bit in the station caption information corresponding to the conference terminal is a second preset value.
The second preset value indicates that the station caption in the current meeting place picture is the same as the station caption in the previous meeting place picture. When the repeat flag bit in the station caption information corresponding to the conference terminal is the second preset value, S634 is executed; otherwise, S635 is executed.
And S634, setting the station caption of the current meeting place picture of the conference terminal to the station caption of the previous meeting place picture.
When the electronic device determines that the station caption of the current meeting place picture of the conference terminal is the same as that of the previous meeting place picture, it directly reuses the station caption of the previous picture for the current one.
S635, extracting station caption position information and station caption content information in the station caption information.
When the electronic device determines that the deletion flag bit in the station caption information corresponding to the conference terminal is not the first preset value and the repeat flag bit is not the second preset value, it processes the station caption using the station caption position information and the station caption content information in the station caption information.
For details of this step, please refer to S531 in the embodiment shown in fig. 7, which is not described herein again.
And S636, determining the position of the station caption in the corresponding meeting place picture by using the station caption position information.
Please refer to S532 of the embodiment shown in fig. 7 in detail, which is not described herein again.
And S637, forming a station caption corresponding to the station caption content information on the meeting place picture based on the determined position.
Please refer to S533 in fig. 7, which is not described herein.
For example, the electronic device obtains the station caption information of each conference terminal through parsing. If DM is 1, the station caption corresponding to the station caption ID is no longer displayed; if CM is 1, the previously displayed station caption corresponding to the station caption ID continues to be used; if CM is 0, a new station caption is generated from the new station caption information and displayed for the corresponding station caption ID.
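The DM/CM decision above can be sketched as a small dispatch function. The function and variable names are illustrative; the patent only fixes the semantics of the two flag bits.

```python
def process_caption(displayed: dict, caption_id: str, dm: int, cm: int, new_info=None):
    """Decide what to display for one station caption ID.

    `displayed` maps caption IDs to the caption currently shown (or None).
    DM == 1: stop displaying this caption (other sites are unaffected).
    CM == 1: keep the previously displayed caption; nothing was resent.
    CM == 0: generate and display a new caption from the freshly sent info.
    """
    if dm == 1:
        displayed[caption_id] = None      # clear only this terminal's caption
    elif cm == 1:
        pass                              # reuse: the previous caption is already held
    else:
        displayed[caption_id] = new_info  # build a fresh caption from full info
    return displayed
```

Because the repeat case sends no content or position information, the receiver's cache (`displayed`) is what makes the reuse possible, which is exactly why repeated captions cost no extra bandwidth.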
In the station caption processing method provided by this embodiment, when the deletion flag bit in the station caption information corresponding to a conference terminal is the first preset value, the station caption of the meeting place picture corresponding to that conference terminal is directly set to empty without affecting the station captions in the other meeting place pictures, so that the station caption of each conference terminal is processed independently. When the station caption of a conference terminal is the same as that in the previous meeting place picture, the received station caption information contains only the station caption identifier and the repeat flag bit corresponding to that conference terminal, without station caption attribute information or station caption position information. Repeated station caption content and position information is therefore not retransmitted, which reduces the bandwidth requirement and further reduces the time delay of the video picture.
In this embodiment, a station caption processing apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred implementations; details that have already been described are omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a station caption processing apparatus, as shown in fig. 9, including:
an obtaining module 71, configured to obtain station caption information and a video picture corresponding to the conference terminal; the video picture includes a meeting place picture of at least one conference terminal;
a forming module 72 for forming a video data packet based on the video picture;
and an adding module 73, configured to add the station caption information corresponding to the conference terminal to the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding meeting place picture based on the received target video data packet.
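The obtain/form/add pipeline can be illustrated with a toy packet layout: a length-prefixed metadata block followed by the untouched encoded video payload. The JSON encoding and the 4-byte length prefix are assumptions made for this sketch, not the patent's wire format.

```python
import json

def build_target_packet(video_payload: bytes, caption_infos: list) -> bytes:
    # Sender side (modules 71-73): attach caption metadata alongside the video
    # data instead of rendering the captions into the pixels before encoding.
    meta = json.dumps(caption_infos).encode("utf-8")
    return len(meta).to_bytes(4, "big") + meta + video_payload

def parse_target_packet(packet: bytes):
    # Receiver side (modules 81-82): split the metadata from the video payload.
    meta_len = int.from_bytes(packet[:4], "big")
    infos = json.loads(packet[4:4 + meta_len].decode("utf-8"))
    return infos, packet[4 + meta_len:]
```

The point of the design is visible in the round trip: the video payload comes back byte-for-byte unchanged, so the caption never passes through the video encoder and its sharpness does not depend on the video bitrate.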
The station caption processing device provided in this embodiment adds the station caption information to the video data packet formed from the video picture; that is, the station caption is not added to the meeting place picture before the target video data packet is sent. This avoids encoding the station caption before sending. After the corresponding conference terminal receives the target video data packet, the station caption is processed on the corresponding meeting place picture, which improves the definition of the station caption in the meeting place picture.
The present embodiment provides a station caption processing apparatus, as shown in fig. 10, including:
a receiving module 81, configured to receive a target video data packet, where the target video data packet includes station caption information and video pictures corresponding to a conference terminal, and the video pictures include at least one conference site picture of the conference terminal;
the analysis module 82 is configured to analyze the target video data packet to obtain station caption information and the video picture corresponding to the conference terminal;
and a processing module 83, configured to perform station caption processing in the meeting place picture corresponding to the video picture based on the station caption information corresponding to the conference terminal.
The station caption processing device provided by the embodiment processes the station caption at the receiving end, avoids coding the station caption at the sending end, and improves the definition of the station caption in a meeting place picture.
The station caption processing device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functions.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which includes the station caption processing apparatus shown in fig. 9 or fig. 10.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in fig. 11, the electronic device may include: at least one processor 91, such as a CPU (Central Processing Unit), at least one communication interface 93, a memory 94, and at least one communication bus 92, where the communication bus 92 is used to implement connection and communication between these components. The communication interface 93 may include a display (Display) and a keyboard (Keyboard), and may optionally further include a standard wired interface and a standard wireless interface. The memory 94 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory; optionally, the memory 94 may also be at least one storage device located remotely from the processor 91. The processor 91 may be combined with the apparatus described in fig. 9 or fig. 10; an application program is stored in the memory 94, and the processor 91 calls the program code stored in the memory 94 to perform any of the above-mentioned method steps.
The communication bus 92 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 92 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
The memory 94 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 94 may also comprise a combination of the above types of memory.
The processor 91 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 91 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 94 is also used to store program instructions. The processor 91 may call program instructions to implement the station caption processing method as shown in the embodiments of fig. 2, 3, 5, and fig. 6-8 of the present application.
The embodiment of the invention further provides a non-transitory computer storage medium, where the computer storage medium stores computer-executable instructions that can execute the station caption processing method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above types of memories.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (12)

1. A station caption processing method is characterized by comprising the following steps:
acquiring station caption information and a video picture corresponding to a conference terminal; the video picture comprises a meeting place picture of at least one conference terminal;
forming a video data packet based on the video picture;
and adding the station caption information corresponding to the conference terminal into the video data packet to obtain a target video data packet, so that the corresponding conference terminal performs station caption processing in the corresponding conference place picture based on the received target video data packet.
2. The method of claim 1, wherein the obtaining of the station caption information corresponding to the conference terminal comprises:
acquiring station caption position information and station caption content information;
and determining the station caption information based on the station caption position information and the station caption content information.
3. The method of claim 2, wherein the obtaining station beacon location information comprises:
acquiring the position of a meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption;
and determining the station caption position information based on the position of the meeting place picture corresponding to the station caption in the video picture, the position relation between the station caption and the meeting place picture and the size of the station caption.
4. The method of claim 2, wherein the obtaining station caption content information comprises:
acquiring a station caption identifier corresponding to the conference terminal and attribute information of the station caption, wherein the attribute information of the station caption comprises at least one of a font name of the station caption, a font size of the station caption, a font color of the station caption or background data of the station caption;
searching an attribute mapping table based on the attribute information to be mapped among the attribute information of the station caption, and determining an attribute identifier of the station caption in combination with the other attribute information of the station caption;
and determining the station caption content information based on the corresponding station caption identification of the conference terminal and the attribute identification of the station caption.
5. The method of claim 1, wherein the obtaining of the station caption information corresponding to the conference terminal further comprises:
judging whether a station caption in a meeting place picture corresponding to a preset conference terminal needs to be deleted;
when the station caption in the meeting place picture corresponding to the preset conference terminal needs to be deleted, setting a deletion flag bit to a first preset value;
and forming the station caption information corresponding to the preset conference terminal by using the station caption identifier corresponding to the preset conference terminal and the first preset value.
6. The method according to claim 1 or 5, wherein the obtaining of the station caption information corresponding to the conference terminal further comprises:
judging whether a station caption in a current meeting place picture corresponding to a preset conference terminal is the same as a station caption in a previous meeting place picture corresponding to the preset conference terminal;
when the station caption in the current meeting place picture corresponding to the preset conference terminal is the same as the station caption in the previous meeting place picture corresponding to the preset conference terminal, setting a repeat flag bit to a second preset value;
and forming the station caption information corresponding to the preset conference terminal by using the station caption identifier corresponding to the preset conference terminal and the second preset value.
7. A station caption processing method is characterized by comprising the following steps:
receiving a target video data packet, wherein the target video data packet comprises station caption information and video pictures corresponding to conference terminals, and the video pictures comprise conference site pictures of at least one conference terminal;
analyzing the target video data packet to obtain station caption information and the video picture corresponding to the conference terminal;
and processing station captions in the corresponding meeting place pictures of the video pictures based on the station caption information corresponding to the conference terminals.
8. The method according to claim 7, wherein the processing of the station caption in the corresponding meeting place picture of the video picture based on the station caption information corresponding to the conference terminal comprises:
extracting station caption position information and station caption content information in the station caption information;
determining the position of the station caption in the corresponding meeting place picture by using the station caption position information;
and forming a station caption corresponding to the station caption content information on the meeting place picture based on the determined position.
9. The method according to claim 7, wherein the processing of the station caption in the corresponding meeting place picture of the video picture based on the station caption information corresponding to the conference terminal further comprises:
judging whether a deletion flag bit in station caption information corresponding to the conference terminal is a first preset value or not;
and when the deletion flag bit in the station caption information corresponding to the conference terminal is a first preset value, emptying the station caption in the conference place picture corresponding to the conference terminal.
10. The method according to claim 7 or 9, wherein the processing of the station caption in the corresponding meeting place picture of the video picture based on the station caption information corresponding to the conference terminal further comprises:
judging whether a repeat flag bit in the station caption information corresponding to the conference terminal is a second preset value;
and when the repeat flag bit in the station caption information corresponding to the conference terminal is the second preset value, setting the station caption of the current conference site picture of the conference terminal to the station caption of the previous conference site picture.
11. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, the processor executing the computer instructions to perform the station caption processing method according to any one of claims 1 to 6 or to perform the station caption processing method according to any one of claims 7 to 10.
12. A computer-readable storage medium storing computer instructions for causing a computer to execute the station caption processing method according to any one of claims 1 to 6, or the station caption processing method according to any one of claims 7 to 10.
CN202010925539.3A 2020-09-04 2020-09-04 Station caption processing method, electronic device and storage medium Pending CN112040168A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010925539.3A CN112040168A (en) 2020-09-04 2020-09-04 Station caption processing method, electronic device and storage medium
PCT/CN2021/082758 WO2022048137A1 (en) 2020-09-04 2021-03-24 Station logo processing method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010925539.3A CN112040168A (en) 2020-09-04 2020-09-04 Station caption processing method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112040168A true CN112040168A (en) 2020-12-04

Family

ID=73590789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925539.3A Pending CN112040168A (en) 2020-09-04 2020-09-04 Station caption processing method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112040168A (en)
WO (1) WO2022048137A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048137A1 (en) * 2020-09-04 2022-03-10 苏州科达科技股份有限公司 Station logo processing method, electronic device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117354442B (en) * 2023-11-07 2024-06-14 广东保伦电子股份有限公司 Method for conveniently adding station logo on LED display screen

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110194026A1 (en) * 2005-08-08 2011-08-11 Koninklijke Philips Electronics, N.V. Method and system for video copyright protection
CN104135670A (en) * 2014-07-22 2014-11-05 乐视网信息技术(北京)股份有限公司 Video playing method and device
US20150304609A1 (en) * 2012-12-28 2015-10-22 Huawei Technologies Co., Ltd. Picture Control Method, Terminal, and Video Conferencing Apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506908A (en) * 2016-10-26 2017-03-15 宇龙计算机通信科技(深圳)有限公司 A kind of image synthesizing method and device
US10929597B2 (en) * 2017-03-15 2021-02-23 Adobe Inc. Techniques and systems for storing and protecting signatures and images in electronic documents
CN109756696B (en) * 2019-01-18 2020-11-06 苏州科达科技股份有限公司 Station caption adding method, video conference image generating method, system, device and medium
CN110248145A (en) * 2019-06-25 2019-09-17 武汉冠科智能科技有限公司 A kind of with no paper meeting display control method, device and storage medium
CN112040168A (en) * 2020-09-04 2020-12-04 苏州科达科技股份有限公司 Station caption processing method, electronic device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110194026A1 (en) * 2005-08-08 2011-08-11 Koninklijke Philips Electronics, N.V. Method and system for video copyright protection
US20150304609A1 (en) * 2012-12-28 2015-10-22 Huawei Technologies Co., Ltd. Picture Control Method, Terminal, and Video Conferencing Apparatus
CN104135670A (en) * 2014-07-22 2014-11-05 乐视网信息技术(北京)股份有限公司 Video playing method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048137A1 (en) * 2020-09-04 2022-03-10 苏州科达科技股份有限公司 Station logo processing method, electronic device, and storage medium

Also Published As

Publication number Publication date
WO2022048137A1 (en) 2022-03-10

Similar Documents

Publication Publication Date Title
US7428335B2 (en) Method of extracting contour of image, method of extracting object from image, and video transmission system using the same method
CN108933915A (en) Video conference device and video conference management method
CN112040168A (en) Station caption processing method, electronic device and storage medium
EP2559270B1 (en) Method and apparatus for generating and playing animation message
US11347386B2 (en) Method and device for sharing position
CN108363604A (en) A kind of resolution adaptation method, apparatus and operation system
CN110443140B (en) Text positioning method, device, computer equipment and storage medium
CN110321788B (en) Training data processing method, device, equipment and computer readable storage medium
CN110197238B (en) Font type identification method, system and terminal equipment
CN110675940A (en) Pathological image labeling method and device, computer equipment and storage medium
US20120051633A1 (en) Apparatus and method for generating character collage message
CN113421312A (en) Method and device for coloring black and white video, storage medium and terminal
CN110580678A (en) image processing method and device
US11017254B2 (en) Image data retrieving method and image data retrieving device
US11902522B2 (en) Character restoration method and apparatus, storage medium, and electronic device
CN111179370A (en) Picture generation method and device, electronic equipment and storage medium
EP4184348A1 (en) Information retrieval method and apparatus and electronic device
CN114299056A (en) Defect point recognition method of image and defect image recognition model training method
CN110363092B (en) Histogram identification method, apparatus, device and computer readable storage medium
CN110741635A (en) Encoding method, decoding method, encoding device, and decoding device
CN109685861B (en) Picture compression method, device and equipment and computer readable storage medium
JP6591595B2 (en) Skin undertone determination method and electronic device
US20220351514A1 (en) Image Recognition Method and Related Device
CN114429464A (en) Screen-breaking identification method of terminal and related equipment
CN114724522A (en) Method and device for adjusting backlight brightness of display device, display device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201204
