CN117395459A - Image processing method, system and electronic equipment - Google Patents


Info

Publication number
CN117395459A
CN117395459A
Authority
CN
China
Prior art keywords
terminal
image
stream data
video stream
preset
Prior art date
Legal status
Pending
Application number
CN202311329463.8A
Other languages
Chinese (zh)
Inventor
李鑫
唐峰
王文权
吴桂龙
Current Assignee
Beijing Nera Stentofon Communication Equipment Co Ltd
Original Assignee
Beijing Nera Stentofon Communication Equipment Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Nera Stentofon Communication Equipment Co Ltd filed Critical Beijing Nera Stentofon Communication Equipment Co Ltd
Priority application: CN202311329463.8A
Publication: CN117395459A
Legal status: Pending


Classifications

    • H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N19/146 Adaptive coding controlled by the data rate or code amount at the encoder output
    • H04N21/234363 Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/25833 Management of client data involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure provides an image processing method, an image processing system, and an electronic device, and relates to the technical field of image processing. The method comprises the following steps: acquiring the hardware configuration information and the real-time network bandwidth of each terminal that joins a video conference; processing the received video stream data sent by at least two terminals based on a preset image synthesis mode to obtain an image to be processed; encoding the image to be processed according to the coding rate of each terminal to generate target video stream data matched with that terminal, wherein the coding rate is determined from the terminal's hardware configuration information and real-time network bandwidth; and sending to each terminal the target video stream data matched with it. Each terminal can thus obtain target video stream data matched with its hardware configuration information and real-time network bandwidth, which meets the differentiated usage requirements of terminals and improves the user experience.

Description

Image processing method, system and electronic equipment
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to an image processing method, an image processing system and electronic equipment.
Background
Typically, audio/video conferences conducted within an enterprise will utilize an intra-enterprise private communication system for data transmission and processing.
However, as users' requirements for audio/video conferences keep rising, a traditional private enterprise communication system cannot provide different picture displays based on the different hardware devices users employ, and therefore cannot meet their differentiated usage requirements.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method, an image processing system, and an electronic device, which address the problem of displaying different images to users according to the hardware configuration information of their terminals.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including: acquiring the hardware configuration information and the real-time network bandwidth of each terminal that joins a video conference; processing the received video stream data sent by at least two terminals based on a preset image synthesis mode to obtain an image to be processed; encoding the image to be processed according to the coding rate of each terminal to generate target video stream data matched with that terminal, wherein the coding rate is determined according to the terminal's hardware configuration information and real-time network bandwidth; and sending to each terminal the target video stream data matched with that terminal.
In a second aspect, an embodiment of the present disclosure provides an image processing method, including: sending the hardware configuration information and the real-time network bandwidth of a terminal to a server, so that the server determines the terminal's coding rate based on that information and encodes an image to be processed at that rate to generate target video stream data matched with the terminal, wherein the image to be processed is obtained by the server processing video stream data received from at least two terminals based on a preset image synthesis mode, and the terminals are terminals participating in a video conference; and, in response to the target video stream data sent by the server, processing the target video stream data based on the image processing capability of the terminal to obtain a target display image.
In a third aspect, embodiments of the present disclosure provide an image processing system, comprising: an image processing server, at least two terminals; an image processing server configured to execute any one of the image processing methods applied to the server in the embodiments of the present disclosure; and a terminal configured to perform any one of the image processing methods applied to the terminal in the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a memory having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement any of the image processing methods of the embodiments of the present disclosure.
According to the image processing method, system, and electronic device provided by the embodiments of the disclosure, acquiring the hardware configuration information and the real-time network bandwidth of each terminal that joins the video conference makes the terminal's hardware configuration clear, and the terminal's image processing capability can be inferred from its real-time network bandwidth and hardware configuration information, which facilitates handling different terminals differently. Processing the video stream data sent by at least two terminals based on a preset image synthesis mode yields an image to be processed that can comprehensively present the image information of every terminal participating in the video conference, providing each terminal with a complete conference picture. Encoding the image to be processed according to each terminal's coding rate generates target video stream data matched with that terminal; because the coding rate is determined from the terminal's hardware configuration information and real-time network bandwidth, the target video stream data matches both, making it convenient to provide different picture displays for different terminals. Finally, sending each terminal the target video stream data matched with it lets the terminal obtain a stream matched with its own hardware configuration information and real-time network bandwidth, meeting the terminal's differentiated usage requirements and improving the user experience.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. The above and other features and advantages will become more readily apparent to those of ordinary skill in the art by describing in detail exemplary embodiments with reference to the accompanying drawings in which.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
Fig. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of a server provided by an embodiment of the present disclosure.
Fig. 4 shows a schematic composition diagram of a terminal provided in an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an image processing system provided by an embodiment of the present disclosure.
Fig. 6 is a schematic diagram of a video fusion screen according to an embodiment of the disclosure.
Fig. 7 is a schematic diagram of an image display layout according to an embodiment of the disclosure.
Fig. 8 is a schematic diagram of an image display layout according to an embodiment of the disclosure.
Fig. 9 shows a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure. It will be apparent to one skilled in the art that the present disclosure may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present disclosure by showing examples of the present disclosure.
For the purposes of clarity, technical solutions and advantages of the present disclosure, the following further details the embodiments of the present disclosure with reference to the accompanying drawings.
The traditional approach of handling services such as video conferences with a private enterprise communication system cannot finely control the fused video picture according to the hardware configuration information of each terminal, and cannot meet users' differentiated usage requirements, such as diversified collection of audio/video conference data, differentiated processing, and fusion of multiple different types of data.
The embodiment of the disclosure provides an image processing method, an image processing system and electronic equipment, which can carry out fine control and self-adaptive adjustment on images which are required to be displayed on terminals according to different hardware configuration information and different real-time network bandwidths of all the terminals, and improve the use experience of users.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method is applicable to a server. For example, the server may be a server of an IP Multimedia Subsystem (IMS).
As shown in fig. 1, the image processing method includes, but is not limited to, the following steps.
Step S101, obtaining hardware configuration information of each terminal entering a video conference and real-time network bandwidth of each terminal.
Step S102, processing the received video stream data sent by at least two terminals based on a preset image synthesis mode to obtain an image to be processed.
And step S103, respectively encoding the images to be processed according to the encoding code rate of each terminal, and generating target video stream data matched with each terminal.
The coding rate is determined according to the hardware configuration information of the terminal and the real-time network bandwidth of the terminal.
Step S104, sending to each terminal the target video stream data matched with that terminal.
In this embodiment, acquiring the hardware configuration information and the real-time network bandwidth of each terminal that joins the video conference makes the terminal's hardware configuration clear, and the terminal's image processing capability can be inferred from its real-time network bandwidth and hardware configuration information, which facilitates handling different terminals differently. Processing the video stream data sent by at least two terminals based on a preset image synthesis mode yields an image to be processed that can comprehensively present the image information of every terminal participating in the video conference, providing each terminal with a complete conference picture. Encoding the image to be processed according to each terminal's coding rate generates target video stream data matched with that terminal; because the coding rate is determined from the terminal's hardware configuration information and real-time network bandwidth, the target video stream data matches both, making it convenient to provide different picture displays for different terminals. Finally, sending each terminal the target video stream data matched with it lets the terminal obtain a stream matched with its own hardware configuration information and real-time network bandwidth, meeting the terminal's differentiated usage requirements and improving the user experience.
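The per-terminal rate selection described in steps S101 to S104 can be sketched as follows. This is an illustrative model only: the `Terminal` class, the `hardware_score` abstraction, the 4000 kbps ceiling, and the 20% bandwidth headroom are assumptions, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class Terminal:
    terminal_id: str
    hardware_score: float  # 0.0-1.0 abstraction of the hardware configuration
    bandwidth_kbps: int    # real-time network bandwidth

def coding_rate_kbps(t: Terminal, max_rate_kbps: int = 4000) -> int:
    """Per-terminal coding rate bounded by hardware capability and bandwidth."""
    hw_cap = int(max_rate_kbps * t.hardware_score)  # hardware-limited ceiling
    bw_cap = int(t.bandwidth_kbps * 0.8)            # keep 20% link headroom
    return min(hw_cap, bw_cap)

def serve_conference(terminals):
    """Step S103 analogue: one matched coding rate per terminal."""
    return {t.terminal_id: coding_rate_kbps(t) for t in terminals}

terminals = [
    Terminal("desk-phone", hardware_score=0.5, bandwidth_kbps=1500),
    Terminal("workstation", hardware_score=1.0, bandwidth_kbps=8000),
]
rates = serve_conference(terminals)
# desk-phone is bandwidth-limited: min(2000, 1200) = 1200 kbps
# workstation is hardware-limited: min(4000, 6400) = 4000 kbps
```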
In some exemplary embodiments, the video stream data sent by the at least two terminals comprises a plurality of video streams.
The processing in step S102 of the video stream data received from the at least two terminals based on a preset image synthesis mode to obtain the image to be processed may be implemented as follows: match each video stream against a preset grid to determine the display position of each video stream within the grid; then, according to each video stream's display position in the preset grid, map the corresponding video image into a preset display interface to generate the image to be processed.
The preset grid comprises horizontal panes and vertical panes.

Determining the display position of each video stream in the preset grid fixes the acquisition order of the streams; mapping each stream's video image into the preset display interface according to that position then produces an image to be processed that comprehensively reflects the real-time picture of every terminal participating in the video conference, which facilitates communication between terminals and improves the information exchange efficiency of the conference.
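A minimal sketch of this position assignment, assuming streams are placed in acquisition order, row by row; the function name and the row-major ordering are illustrative, not taken from the patent.

```python
def grid_positions(stream_ids, x_panes, y_panes):
    """Map each video stream id to a (row, col) cell of the preset grid."""
    positions = {}
    for index, stream_id in enumerate(stream_ids):
        if index >= x_panes * y_panes:
            break  # grid is full; later streams get no display position
        positions[stream_id] = (index // x_panes, index % x_panes)
    return positions

cells = grid_positions(["t1", "t2", "t3"], x_panes=2, y_panes=2)
# t1 -> (0, 0), t2 -> (0, 1), t3 -> (1, 0); cell (1, 1) remains a free area
```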
In some exemplary embodiments, the preset grid includes a preset number of display positions, the preset number being the product of the number of horizontal panes and the number of vertical panes.
Mapping the video image corresponding to each video stream into the preset display interface according to its display position in the preset grid to generate the image to be processed includes: scaling the video image of each video stream to obtain a plurality of images to be stitched; when the number of terminals participating in the video conference is smaller than the preset number, mapping the images to be stitched onto the preset display interface in their acquisition order to obtain the image to be processed, which then contains at least one free area.

When the number of terminals participating in the video conference is greater than or equal to the preset number, selecting, up to the preset number, images whose definition exceeds a preset definition threshold from the images to be stitched, and mapping the selected images onto the preset display interface in order to obtain the image to be processed.
For example, if the number of horizontal panes is X and the number of vertical panes is Y, the preset number is X×Y. The image display layout corresponding to the preset grid may be 2×2 (i.e., 2 images displayed horizontally and 2 vertically), 4×4, 8×8, or other layouts.
For example, with a 2×2 image display layout, if 3 terminals participate in the video conference (fewer than the preset number 4), the layout of the image to be processed is 2×1+1, where "+1" indicates that there is 1 free area in the layout, and "2×1" indicates that the video images of 2 terminals are displayed in sequence in the first row and the video image of the remaining terminal is displayed in the next row.
If the number of terminals participating in the video conference is 5, the server selects 4 video images with definition greater than a preset definition threshold from the video images reported by the 5 terminals, and then maps the selected video images to a preset display interface to obtain the image to be processed. At this time, there is no free area in the image display layout, and the image display layout is 2×2+0.
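The two cases above can be expressed as one selection step. The sketch below is illustrative: the tuple representation of a stream and the ordering by descending definition are assumptions, not the patent's method.

```python
def select_for_layout(streams, x_panes, y_panes, definition_threshold):
    """streams: list of (stream_id, definition). Returns (chosen ids, free areas)."""
    capacity = x_panes * y_panes
    if len(streams) < capacity:
        chosen = [sid for sid, _ in streams]  # fewer streams than cells: keep all
    else:
        eligible = [s for s in streams if s[1] > definition_threshold]
        eligible.sort(key=lambda s: s[1], reverse=True)  # clearest first (assumed)
        chosen = [sid for sid, _ in eligible[:capacity]]
    return chosen, capacity - len(chosen)

# 3 participants in a 2x2 grid -> layout "2x1+1" with one free area
chosen, free = select_for_layout(
    [("a", 0.9), ("b", 0.4), ("c", 0.8)], 2, 2, definition_threshold=0.3)
```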
It should be noted that, because each terminal's image processing capability differs, the resolution of the at least two video images displayed by each terminal based on the image display layout may also differ, so as to adapt to each terminal's display requirements through fine control and adaptive adjustment of the images.

Mapping the at least two video images into the preset display interface through different mapping modes allows the video images of multiple terminals to be displayed simultaneously, which facilitates communication between terminals and improves the information exchange efficiency of the video conference.
The embodiment of the present disclosure provides another possible implementation: after step S104 of sending to each terminal the target video stream data matched with that terminal, the method further includes: when it is determined that the hardware configuration information of a terminal has changed, updating the terminal's coding rate based on the updated hardware configuration information to obtain an updated coding rate.
The hardware configuration information of the terminal characterizes the processing capacity of the terminal to the image, and the lower the processing capacity of the terminal to the image is, the lower the updated coding rate is.
For example, the hardware configuration information of the terminal includes at least one of: the processor model of the terminal, the memory capacity of the terminal, and the graphics processor performance information of the terminal.
A graphics processing unit (GPU), also called a display core, visual processor, or display chip, is a microprocessor dedicated to image and graphics operations on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).
For example, if a terminal has a 4-core processor with 8-thread processing capability and 8 gigabytes (GB) of memory, the target video stream data it can receive may have a frame rate of 25 frames per second (fps), and the terminal can display images at 720P resolution (i.e., 1280×720, progressive scanning).
When the terminal's thread count drops from 8 to 4, the terminal's resolution and the coding rate the server uses for that terminal are reduced accordingly; for example, the coding rate is reduced to half of its original value, and/or the terminal's resolution is reduced to 480P.
By monitoring the terminals' hardware configuration information, the server can promptly detect changes in each terminal's image processing capability. For a terminal whose image processing capability has dropped, the server can reduce the picture detail and coding rate for that terminal, lowering its computational load when decoding the target video stream data, so that the terminal obtains target video stream data matched with its capability. For a terminal whose image processing capability has increased, the server can raise the corresponding coding rate, so that the terminal obtains higher-definition target video stream data and its picture quality improves.
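One hypothetical way to realize this hardware-driven update, matching the halving example above; the resolution ladder and its minimum rates are illustrative assumptions, not values from the patent.

```python
# Assumed minimum coding rates (kbps) needed to sustain each resolution tier.
RESOLUTION_LADDER = [(480, 600), (720, 1200), (1080, 2500)]

def update_on_hardware_change(old_rate_kbps, old_threads, new_threads):
    """Scale the coding rate by the change in processing capability and
    pick the highest resolution tier the new rate can still sustain."""
    new_rate = int(old_rate_kbps * new_threads / old_threads)
    resolution = RESOLUTION_LADDER[0][0]
    for lines, min_kbps in RESOLUTION_LADDER:
        if new_rate >= min_kbps:
            resolution = lines
    return new_rate, resolution

rate, res = update_on_hardware_change(old_rate_kbps=2000, old_threads=8, new_threads=4)
# threads halve, so the rate halves to 1000 kbps and resolution drops to 480P
```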
In some exemplary embodiments, after performing the step S104 of transmitting the target video stream data matched thereto to each terminal, the method further includes: under the condition that the real-time network bandwidth of the terminal is determined to change, the coding rate of the terminal is updated according to the changed network bandwidth, and the updated coding rate is obtained.
The smaller the real-time network bandwidth of the terminal is, the lower the coding rate is, and the lower the video resolution corresponding to the target video stream data matched with the terminal is.
The server detects the network condition of each terminal in real time so as to update the real-time network bandwidth of each terminal in real time. If the server detects that the network bandwidth of a certain terminal is reduced, the server can timely update the coding rate of the terminal according to the changed network bandwidth, obtain the updated coding rate, encode the image to be processed according to the updated coding rate, and generate updated target video stream data, so that the updated target video stream data is more suitable for the transmission requirement of the terminal, and the stability of the target video stream data in the transmission process is ensured.
For example, for a terminal with a low real-time network bandwidth or limited hardware performance of the terminal, the server may reduce the coding rate corresponding to the terminal, so as to ensure that the video conference can still maintain good picture quality in a low-bandwidth environment, and improve the data transmission smoothness in the video conference.
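A sketch of the bandwidth-driven update described above; the 10% change tolerance and the 20% headroom are assumed policies, used here only to illustrate re-deriving the coding rate when the measured bandwidth changes.

```python
def maybe_update_rate(state, terminal_id, measured_kbps, tolerance=0.1):
    """state: terminal_id -> (bandwidth_kbps, rate_kbps).
    Returns the updated coding rate, or None if the change is too small."""
    old_bw, _old_rate = state[terminal_id]
    if abs(measured_kbps - old_bw) <= old_bw * tolerance:
        return None                      # minor fluctuation: keep current rate
    new_rate = int(measured_kbps * 0.8)  # assumed 20% headroom policy
    state[terminal_id] = (measured_kbps, new_rate)
    return new_rate

state = {"t1": (4000, 3200)}
r1 = maybe_update_rate(state, "t1", 4100)  # within 10% tolerance: no change
r2 = maybe_update_rate(state, "t1", 1000)  # bandwidth drop: lower the rate
```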
Fig. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method is applicable to a terminal. As shown in fig. 2, the image processing method includes, but is not limited to, the following steps.
Step S201, the hardware configuration information and the real-time network bandwidth of the terminal are sent to the server, so that the server determines the coding rate of the terminal based on the hardware configuration information and the real-time network bandwidth of the terminal, and codes the image to be processed by using the coding rate, and generates target video stream data matched with the terminal.
The image to be processed is an image obtained by the server processing video stream data received from at least two terminals based on a preset image synthesis mode, and the terminals are terminals participating in a video conference.
Step S202, in response to the target video stream data sent by the server, processing the target video stream data based on the image processing capability of the terminal to obtain a target display image.
In the embodiment, the hardware configuration information and the real-time network bandwidth of the terminal are sent to the server, so that the server determines the coding rate of the terminal based on the hardware configuration information and the real-time network bandwidth of the terminal, and codes the image to be processed by using the coding rate to generate target video stream data matched with the terminal; and responding to the target video stream data sent by the server, and processing the target video stream data based on the image processing capability of the terminal to obtain a target display image, so that the target display image is more suitable for the display requirement of the current terminal, and the use experience of the terminal is improved.
In some exemplary embodiments, the higher the image processing capability of the terminal, the higher the video resolution of the target display image; the target display image comprises at least two video images corresponding to the video stream data, wherein the at least two video images are images acquired by any two terminals participating in the video conference.
In some exemplary embodiments, after performing the processing of the target video stream data based on the image processing capability of the terminal in response to the target video stream data transmitted by the server in step S202 to obtain the target display image, the method further includes: based on the display requirement, any one of the following operations is performed on the target display image:
adjusting an image display layout on the terminal, wherein the image display layout comprises display positions corresponding to at least two video images;
performing an enlarging or reducing operation on the target display image;
and switching the target display image and the main video interface.
By processing the target display image with at least one of the operations above according to the acquired display requirement, the image can be displayed clearly for the user to watch, meeting the user's personalized usage requirements while enabling fine control and adaptive adjustment of the image.
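For instance, the enlarging/reducing operation amounts to fitting the decoded target image to the terminal's display; the aspect-preserving scaling below is a generic sketch, not the patent's specific method.

```python
def scale_to_display(frame_w, frame_h, display_w, display_h):
    """Fit a decoded target image into the terminal display, preserving aspect."""
    scale = min(display_w / frame_w, display_h / frame_h)
    return int(frame_w * scale), int(frame_h * scale)

# A 1280x720 target image shown on a 1920x1200 display scales to 1920x1080.
```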
Fig. 3 shows a block diagram of a server provided by an embodiment of the present disclosure. The implementation of the server in this embodiment is not limited to the examples below; implementations not illustrated here also fall within the scope of the server.
As shown in fig. 3, the server 300 is not limited to the following modules.
The acquiring module 301 is configured to acquire hardware configuration information of each terminal entering the video conference, and real-time network bandwidth of each terminal.
The image processing module 302 is configured to process the received video stream data sent by the at least two terminals based on a preset image synthesis mode, so as to obtain an image to be processed.
The encoding module 303 is configured to encode the image to be processed according to the encoding code rate of each terminal, and generate target video stream data matched with each terminal.
The coding rate is determined according to the hardware configuration information of the terminal and the real-time network bandwidth of the terminal.
The first sending module 304 is configured to send, to each terminal, the target video stream data matched with that terminal.
It should be noted that, the server 300 in this embodiment can execute any image processing method applied to the server in any of the embodiments of the present disclosure, which is not described herein.
In this embodiment, the acquisition module acquires the hardware configuration information and the real-time network bandwidth of each terminal in the video conference, so that the hardware configuration of each terminal is known and its image processing capability can be inferred, which makes it convenient to treat different terminals differently. The image processing module processes the video stream data received from at least two terminals based on a preset image synthesis mode to obtain an image to be processed; this image can comprehensively present the picture of every terminal participating in the video conference, providing each terminal with a complete conference view. The encoding module encodes the image to be processed according to the coding rate of each terminal to generate target video stream data matched with that terminal, where the coding rate is determined from the terminal's hardware configuration information and real-time network bandwidth; the target video stream data therefore matches the terminal's hardware configuration and bandwidth, making it convenient to present different pictures to different terminals. Finally, the first sending module sends each terminal the target video stream data matched with it, so that every terminal obtains video stream data suited to its own hardware configuration and real-time network bandwidth; this satisfies the differentiated requirements of the terminals and improves their user experience.
Fig. 4 shows a schematic composition diagram of a terminal provided in an embodiment of the present disclosure. The specific implementation of the terminal in this embodiment is not limited to the examples below; other examples not illustrated here also fall within the protection scope of the terminal.
As shown in fig. 4, the terminal 400 includes, but is not limited to, the following modules.
And the second sending module 401 is configured to send the hardware configuration information of the terminal and the real-time network bandwidth to the server, so that the server determines the coding rate of the terminal based on the hardware configuration information of the terminal and the real-time network bandwidth, and encodes the image to be processed by using the coding rate to generate target video stream data matched with the terminal.
The image to be processed is an image obtained by the server processing, based on a preset image synthesis mode, the video stream data received from at least two terminals, where the terminals are terminals participating in a video conference.
And a processing module 402, configured to respond to the target video stream data sent by the server, and process the target video stream data based on the image processing capability of the terminal to obtain a target display image.
It should be noted that, the terminal 400 in this embodiment can execute any image processing method applied to the terminal in any of the embodiments of the present disclosure, which is not described herein.
In this embodiment, the second sending module sends the hardware configuration information and real-time network bandwidth of the terminal to the server, so that the server determines the terminal's coding rate from that information and encodes the image to be processed at that rate, generating target video stream data matched with the terminal. The processing module then responds to the target video stream data sent by the server and processes it based on the image processing capability of the terminal to obtain a target display image. The target display image therefore better fits the display requirements of the current terminal, improving the user experience of the terminal.
It should be noted that each module in this embodiment is a logic module; in practice, one logic unit may be one physical unit, part of one physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present disclosure, units that are not closely related to solving the technical problem presented by the present disclosure are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
Fig. 5 shows a block diagram of an image processing system provided by an embodiment of the present disclosure. As shown in fig. 5, the image processing system includes: a server 510 and at least two terminals (e.g., a first terminal 521, a second terminal 522, …, an Nth terminal 52N, where N is an integer greater than or equal to 2 and represents the number of terminals).
Wherein the server 510 is configured to perform any one of the image processing methods applied to the server in the embodiments of the present disclosure.
The terminal is configured to perform any one of the image processing methods applied to the terminal in the embodiments of the present disclosure. The terminal includes at least one of the following: a personal computer (Personal Computer, PC), a terminal supporting the Internet Protocol (IP), and a terminal supporting a video conference function.
In some embodiments, the server 510 is a server supporting the IP Multimedia Subsystem (IMS). An IMS-capable server can fuse the characteristics of audio data, service data and video data; by encoding or decoding these different kinds of data, it can support their interworking and fusion, thereby providing diversified and differentiated data fusion services to the terminals.
When at least two terminals access the server 510, the server 510 collects hardware configuration information (e.g., a processor model of the terminal, a memory capacity of the terminal, graphics processor performance information of the terminal, etc.) of each terminal, and real-time network bandwidth of each terminal, respectively.
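As a rough model of the information collected in this step (the record type and field names are assumptions for illustration, not taken from the disclosure):

```python
from dataclasses import dataclass

@dataclass
class TerminalProfile:
    """Hypothetical per-terminal record collected by the server."""
    processor_model: str
    memory_gb: int          # memory capacity of the terminal
    gpu_info: str           # graphics processor performance information
    bandwidth_mbps: float   # real-time network bandwidth; may change over time

# One profile per terminal joining the conference.
profile = TerminalProfile("4-core/8-thread", 8, "integrated GPU", 2.0)
```

A real server would refresh `bandwidth_mbps` continuously from network measurements rather than treat it as a fixed attribute.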
If the central processing unit (Central Processing Unit, CPU) of a terminal has stronger computing power, or its graphics processing unit (Graphics Processing Unit, GPU) has stronger computing power, that terminal has better encoding/decoding capability.
The terminals then establish a video conference through the server 510, and the server 510 receives the video stream data sent by all terminals participating in the video conference. The server 510 processes the received video stream data from the at least two terminals based on the preset image synthesis mode to obtain an image to be processed.
For example, the server 510 matches each video stream data against a preset grid to determine its display position in the grid, where the preset grid contains landscape panes and portrait panes; then, based on the display position of each video stream data in the preset grid, it maps the corresponding video image into a preset display interface, generating the image to be processed.
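The matching of streams to panes described above can be sketched as follows (the function name and the row-major placement order are assumptions for illustration; the disclosure does not specify the algorithm):

```python
def assign_grid_positions(stream_ids, cols, rows):
    """Map each video stream to a (row, col) pane in a preset grid.

    Streams are placed left-to-right, top-to-bottom; streams beyond
    the grid capacity are left unplaced.
    """
    positions = {}
    for index, stream_id in enumerate(stream_ids):
        if index >= cols * rows:
            break  # grid is full
        positions[stream_id] = (index // cols, index % cols)
    return positions

# Four streams in a 2x2 grid fill all four panes.
layout = assign_grid_positions(["t1", "t2", "t3", "t4"], cols=2, rows=2)
```

Each assigned `(row, col)` pane would then determine where the stream's video image is drawn inside the preset display interface.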
It should be noted that the image to be processed can represent the real-time picture of every terminal participating in the video conference. For example, fig. 6 is a schematic diagram of a video fusion screen according to an embodiment of the disclosure. As shown in fig. 6, the first terminal 521, the second terminal 522, the third terminal 523 and the fourth terminal 524 hold a video conference through the server 510, and each terminal generates one video stream during the conference (e.g., video image "1", video image "2", video image "3" and video image "4" shown in fig. 6). When the server 510 receives the video stream data reported by the four terminals, it merges the four video streams into one video stream (which may also be called video fusion screen data); the server 510 then distributes the video fusion screen data to each terminal, so that every terminal can synchronously obtain the video data of the other terminals participating in the video conference.
The video fusion screen data includes video image "1", video image "2", video image "3" and video image "4", arranged in a preset order.
In some embodiments, the video fusion screen data may further include multiple audio data; the server 510 processes the multiple audio data in the same way as it processes the multiple video stream data, which is not repeated here.
After the image to be processed is obtained, the server 510 determines the coding rate of each terminal, and then encodes the image to be processed according to the coding rate of each terminal, so as to generate target video stream data matched with each terminal.
The coding rate of each terminal is determined from that terminal's hardware configuration information and real-time network bandwidth. Because both differ from terminal to terminal, the server 510 can adjust each terminal's coding rate in real time to obtain a coding rate matched with the terminal.
For example, in the case where it is determined that the hardware configuration information of the terminal has been changed, the server 510 updates the coding rate of the terminal based on the updated hardware configuration information, and obtains the updated coding rate.
The hardware configuration information of the terminal characterizes the terminal's image processing capability: the lower that capability, the lower the updated coding rate.
It should be noted that, for a terminal with low image processing capability, the server 510 may reduce picture detail, thereby reducing the terminal's computing load while still letting it obtain target video stream data matched with it.
For another example, in the case of determining that the real-time network bandwidth of the terminal is changed, the server 510 updates the coding rate of the terminal according to the changed network bandwidth, and obtains the updated coding rate.
The smaller the real-time network bandwidth of the terminal is, the lower the coding rate is, and the lower the video resolution corresponding to the target video stream data matched with the terminal is.
It should be noted that the server 510 detects the network status of each terminal in real time, so as to keep each terminal's real-time network bandwidth up to date. If the server 510 detects that a terminal's network bandwidth has dropped, it promptly updates that terminal's coding rate according to the changed bandwidth and encodes the image to be processed at the updated rate, generating updated target video stream data that better fits the terminal's transmission conditions and keeps transmission stable. For example, for a terminal with a low real-time network bandwidth or limited hardware performance, the server 510 may lower the terminal's coding rate, ensuring that the video conference still maintains good picture quality in a low-bandwidth environment and improving the smoothness of data transmission in the conference.
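A hedged sketch of how a coding rate might be derived from bandwidth and hardware configuration (the tier thresholds and the 80% headroom factor are invented for illustration; the disclosure does not give a formula):

```python
def estimate_coding_rate(bandwidth_mbps, cpu_cores, memory_gb):
    """Pick an encoding bitrate (kbps) capped by the terminal's network
    bandwidth and scaled down for weaker hardware.

    Thresholds are illustrative, not from the disclosure.
    """
    # Reserve headroom so the video stream does not saturate the link.
    cap_kbps = int(bandwidth_mbps * 1000 * 0.8)
    if cpu_cores >= 8 and memory_gb >= 16:
        target = 6000   # strong terminal: high-detail stream
    elif cpu_cores >= 8:
        target = 4000
    elif cpu_cores >= 4:
        target = 2000
    else:
        target = 1000   # weak terminal: reduce picture detail
    return min(target, cap_kbps)
```

When the server detects a bandwidth drop, re-running this function with the new measurement yields the updated coding rate.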
For example, if a terminal has a real-time network bandwidth of 2 Mbps, a 4-core processor with 8 threads, and 8 GB of memory, the target video stream data it can obtain has a transmission rate of 25 fps, and the terminal displays images at 720P resolution (i.e., 1280×720, progressive scan).
If a terminal has a real-time network bandwidth of 4 Mbps, an 8-core processor with 16 threads, and 8 GB of memory, the target video stream data it can obtain has a transmission rate of 25 fps, and the terminal can display images at 1080P resolution (i.e., 1920×1080, progressive scan).
If a terminal has a real-time network bandwidth of 6 Mbps, an 8-core processor with 16 threads, and 16 GB of memory, the terminal can display images at 1080P resolution or at the resolution of the original image.
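The three worked examples above suggest a tiered mapping from terminal capability to display resolution; a hypothetical sketch (the tiers are read off the examples, not defined by the disclosure):

```python
def pick_resolution(bandwidth_mbps, cpu_cores, memory_gb):
    """Return a display-resolution tier for a terminal, loosely following
    the examples: 2 Mbps / 4 cores -> 720p; 4 Mbps / 8 cores -> 1080p;
    6 Mbps / 8 cores / 16 GB -> original-image resolution."""
    if bandwidth_mbps >= 6 and cpu_cores >= 8 and memory_gb >= 16:
        return "source"   # resolution of the original image
    if bandwidth_mbps >= 4 and cpu_cores >= 8:
        return "1080p"
    return "720p"
```

The real server would combine such a tier with the negotiated coding rate and frame rate (25 fps in the examples) when encoding each terminal's stream.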
By encoding the image to be processed at different coding rates, the resulting target video stream data matched with each terminal is better suited to that terminal's display and remains stable during transmission.
Further, the server 510 sends each terminal the target video stream data matched with it. After receiving the data, the terminal processes it based on its own image processing capability (for example, decoding the target video stream data and then reducing or enlarging the decoded image) to obtain a target display image. The target display image includes video images corresponding to at least two video stream data, where the at least two video images are images captured by any two terminals participating in the video conference.
The terminal can display the at least two video images by adjusting the image display layout on it; this image display layout is the same as the one on the server 510.
For example, fig. 7 is a schematic diagram of an image display layout according to an embodiment of the disclosure. As shown in fig. 7, video image "1", video image "2", video image "3", video image "4", video image "5", video image "6", video image "7", video image "8" and video image "9" represent video images generated by 9 terminals participating in a video conference, respectively.
After receiving the video stream data corresponding to the 9 video images, the server 510 scales each video image and stitches the scaled images together to meet the requirements of the image display layout. For example, the server 510 automatically tiles the pictures in a grid from the upper-left corner rightward according to the number of pictures to be presented (here, 9); the layout is X (the number of horizontal panes) × Y (the number of vertical panes) + the number of remaining pictures. Each image in the image display layout shown in fig. 7 is obtained in this way (i.e., the images are stitched and displayed using a 3×3 layout).
The image display layout may be 2×2, 4×4, 8×8, or the like.
In some embodiments, the image display layout uses the 2×2 mode. When the number of terminals participating in the video conference is large (for example, greater than 4), the server 510 may select, from the video images reported by the terminals, the 4 clearest images to place in the image display layout; the number of remaining pictures is then 0, and the layout is 2×2+0.
When the number of terminals participating in the video conference is small (e.g., less than 4), the number of remaining pictures is not 0, and those positions may be displayed as blank. For example, fig. 8 is a schematic diagram of an image display layout according to an embodiment of the disclosure. As shown in fig. 8, 3 terminals participate in the video conference, and the image display layout is 2×1+1.
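The `X×Y + remaining pictures` rule described above can be sketched as follows (the helper function is an illustrative assumption):

```python
def grid_layout(num_pictures, cols):
    """Describe a grid layout as full rows of `cols` panes plus a
    remainder row, e.g. 3 pictures in 2 columns -> '2x1+1',
    9 pictures in 3 columns -> '3x3+0'."""
    full_rows, remainder = divmod(num_pictures, cols)
    return f"{cols}x{full_rows}+{remainder}"
```

This reproduces the layouts in the examples: 9 pictures give 3×3+0, while 3 pictures in a 2-column grid give 2×1+1 with one blank position.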
It should be noted that, because each terminal's image processing capability differs, the resolutions of the at least two video images that each terminal displays based on the image display layout also differ, adapting to each terminal's display requirements and allowing fine-grained control and adaptive adjustment of the images.
In this embodiment, the server 510 automatically adjusts each terminal's coding rate according to its hardware configuration information and encodes the image to be processed at the coding rate corresponding to each terminal, so that every terminal obtains target video stream data matched with its own image processing capability; parameters such as image resolution, bit rate and frame rate therefore differ between the target video stream data of different terminals. For a terminal with weaker hardware or a small real-time network bandwidth, the picture resolution can be reduced to relieve the terminal's data processing pressure while improving the smoothness of the video conference.
Fig. 9 shows a block diagram of an electronic device provided by an embodiment of the present disclosure.
As shown in fig. 9, the electronic device includes: at least one processor 901, at least one memory 902, and one or more I/O interfaces 903. Wherein one or more I/O interfaces 903 are coupled between the processor 901 and the memory 902. The memory 902 stores one or more computer programs that are executed by the at least one processor 901 to enable the at least one processor 901 to implement any one of the image processing methods described in the above embodiments.
The present application also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements any of the image processing methods described in the above embodiments. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The foregoing descriptions are merely exemplary embodiments of the present disclosure and are not intended to limit the scope of the present disclosure. In general, the various embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, microprocessor or other computing device, although the disclosure is not limited thereto.
Embodiments of the present disclosure may be implemented by a processor executing computer program instructions, for example, in a processor entity, either in hardware, or in a combination of software and hardware. The computer program instructions may be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages.
The block diagrams of any logic flows in the figures of this disclosure may represent program steps, interconnected logic circuits, modules, and functions, or a combination of program steps and logic circuits, modules, and functions. The computer program may be stored on a memory. The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as, but not limited to, Read-Only Memory (ROM), Random Access Memory (RAM), and optical storage devices and systems (digital versatile discs (DVD) or CD discs). The computer readable medium may include a non-transitory storage medium. The processor may be of any type suitable to the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or a processor based on a multi-core processor architecture.
A detailed description of exemplary embodiments of the present disclosure has been provided above by way of exemplary and non-limiting examples. Various modifications and adaptations of the above embodiments may become apparent to those skilled in the art without departing from the scope of the invention, which is defined in the accompanying drawings and claims. Accordingly, the proper scope of the invention is to be determined according to the claims.

Claims (10)

1. An image processing method, the method comprising:
acquiring hardware configuration information of each terminal entering a video conference and real-time network bandwidth of each terminal;
processing the received video stream data sent by at least two terminals based on a preset image synthesis mode to obtain an image to be processed;
encoding the image to be processed according to the encoding code rate of each terminal to generate target video stream data matched with each terminal, wherein the encoding code rate is determined according to the hardware configuration information of the terminal and the real-time network bandwidth of the terminal;
and sending, to each of the terminals, the target video stream data matched thereto.
2. The method of claim 1, wherein after said transmitting the target video stream data matched thereto to each of said terminals, said method further comprises:
under the condition that the hardware configuration information of the terminal is determined to be changed, updating the coding rate of the terminal based on the updated hardware configuration information to obtain the updated coding rate;
the hardware configuration information of the terminal characterizes the processing capacity of the terminal to the image, and the lower the processing capacity of the terminal to the image is, the lower the updated coding rate is.
3. The method of claim 1, wherein after said transmitting the target video stream data matched thereto to each of said terminals, said method further comprises:
under the condition that the real-time network bandwidth of the terminal is determined to be changed, updating the coding rate of the terminal according to the changed network bandwidth to obtain the updated coding rate;
the smaller the real-time network bandwidth of the terminal is, the lower the coding rate is, and the lower the video resolution corresponding to the target video stream data matched with the terminal is.
4. A method according to any one of claims 1 to 3, wherein the video stream data transmitted by the at least two terminals comprises a plurality of video stream data;
the processing the received video stream data sent by at least two terminals based on the preset image synthesis mode to obtain an image to be processed comprises the following steps:
respectively matching the plurality of video stream data with a preset grid, and determining the display position of each video stream data in the preset grid, wherein the preset grid comprises a transverse pane and a longitudinal pane;
and mapping the video images corresponding to the video stream data into a preset display interface according to the display position of each video stream data in the preset grid, and generating the image to be processed.
5. The method of claim 4, wherein the preset grid comprises a preset number of display positions, the preset number being a product value of the number of landscape panes and the number of portrait panes;
according to the display position of each video stream data in the preset grid, mapping the video image corresponding to each video stream data to a preset display interface to generate the image to be processed, including:
respectively carrying out scaling treatment on video images corresponding to the video stream data to obtain a plurality of images to be spliced;
under the condition that the number of terminals participating in the video conference is smaller than the preset number, mapping each image to be spliced onto the preset display interface in sequence according to the acquisition sequence of the images to be spliced to obtain the image to be processed, wherein the image to be processed comprises at least one idle area;
and under the condition that the number of terminals participating in the video conference is determined to be greater than or equal to the preset number, selecting images with definition greater than a preset definition threshold value from a plurality of images to be spliced according to the preset number, and sequentially mapping the selected images onto a preset display interface to obtain the images to be processed.
6. An image processing method, the method comprising:
transmitting hardware configuration information and real-time network bandwidth of a terminal to a server, so that the server determines a coding rate of the terminal based on the hardware configuration information and the real-time network bandwidth of the terminal, and codes an image to be processed by using the coding rate to generate target video stream data matched with the terminal, wherein the image to be processed is an image obtained by processing video stream data transmitted by at least two received terminals by the server based on a preset image synthesis mode, and the terminal is a terminal participating in a video conference;
and responding to the target video stream data sent by the server, and processing the target video stream data based on the image processing capability of the terminal to obtain a target display image.
7. The method of claim 6, wherein the higher the image processing capability of the terminal, the higher the video resolution of the target display image; the target display image comprises at least two video images corresponding to video stream data, wherein the at least two video images are images acquired by any two terminals participating in the video conference.
8. The method according to claim 7, wherein the processing of the target video stream data based on the image processing capability of the terminal in response to the target video stream data transmitted by the server, after obtaining a target display image, further comprises:
based on the display requirement, any one of the following operations is performed on the target display image:
adjusting an image display layout on the terminal, wherein the image display layout comprises display positions corresponding to the at least two video images;
performing an enlarging or reducing operation on the target display image;
and switching the target display image and the main video interface.
9. An image processing system, comprising: an image processing server, at least two terminals;
an image processing server configured to perform the image processing method according to any one of claims 1 to 5;
the terminal configured to perform the image processing method according to any one of claims 6 to 8.
10. An electronic device, comprising:
one or more processors;
a memory having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the image processing method of any of claims 1 to 5 or the image processing method of any of claims 6 to 8.
CN202311329463.8A 2023-10-13 2023-10-13 Image processing method, system and electronic equipment Pending CN117395459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311329463.8A CN117395459A (en) 2023-10-13 2023-10-13 Image processing method, system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311329463.8A CN117395459A (en) 2023-10-13 2023-10-13 Image processing method, system and electronic equipment

Publications (1)

Publication Number Publication Date
CN117395459A true CN117395459A (en) 2024-01-12

Family

ID=89465932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311329463.8A Pending CN117395459A (en) 2023-10-13 2023-10-13 Image processing method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN117395459A (en)

Similar Documents

Publication Publication Date Title
CN113395478B (en) Method, system and storage medium for providing high resolution video stream
US11303881B2 (en) Method and client for playing back panoramic video
CN113015021B (en) Cloud game implementation method, device, medium and electronic equipment
US9645784B2 (en) Screen splicing system and video data stream processing method
KR101905182B1 (en) Self-Adaptive Display Method and Device for Image of Mobile Terminal, and Computer Storage Medium
CN109640167B (en) Video processing method and device, electronic equipment and storage medium
US20210360047A1 (en) Video coding method and video decoding method
US20090028428A1 (en) Media Content Management
US11582506B2 (en) Video processing method and apparatus, and storage medium
CN109101233B (en) Method for adapting to multiple screen resolutions, storage device and android device
US10389908B2 (en) Image processing device, image processing method, and program with reduction and enlargement scaling of image data
CN112911383A (en) Multipath screen projection method, device and system under local area network
CN104010204B (en) Image information processing method and device
CN113141352B (en) Multimedia data transmission method and device, computer equipment and storage medium
CN110928509A (en) Display control method, display control device, storage medium, and communication terminal
WO2021093882A1 (en) Video meeting method, meeting terminal, server, and storage medium
CN113315999A (en) Virtual reality optimization method, device, equipment and storage medium
CN116016885A (en) Image processing method, cloud server, VR terminal and storage medium
EP3920537A1 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium and electronic device
CN111506241A (en) Special effect display method and device for live broadcast room, electronic equipment and computer medium
EP3688528B1 (en) Network-controlled 3d video capture
CN117395459A (en) Image processing method, system and electronic equipment
CN114466228B (en) Method, equipment and storage medium for improving smoothness of screen projection display
CN114938461A (en) Video processing method, device and equipment and readable storage medium
CN113810755A (en) Panoramic video preview method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination