CN114760435A - Conference relaying method, device, equipment and storage medium based on image processing - Google Patents


Info

Publication number
CN114760435A
CN114760435A
Authority
CN
China
Prior art keywords
video frame, resolution, low, conference, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210663261.6A
Other languages
Chinese (zh)
Inventor
Wang Zhichao (王志超)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dahui Information Technology Co., Ltd.
Original Assignee
Shenzhen Dahui Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dahui Information Technology Co., Ltd.
Priority to CN202210663261.6A
Publication of CN114760435A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0102 Conversion of standards involving the resampling of the incoming video signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0127 Conversion of standards by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses a conference relaying method and apparatus based on image processing, an electronic device, and a storage medium. The method comprises the following steps: framing the real-time video streams of all conference terminals to obtain original conference video frames, and sampling the original conference video frames in different ways to obtain high-resolution reference video frames, low-resolution reference video frames, and low-resolution input video frames; performing feature-information matching transfer and feature fusion on the high-resolution reference video frame, the low-resolution reference video frame, and the low-resolution input video frame to obtain a standardized video-conference fusion feature map; and, according to the standardized video-conference fusion feature map, performing resolution enhancement on the low-resolution input video frame with a video-frame resolution enhancement module to obtain enhanced video frames, then synthesizing and displaying the enhanced video stream. The invention can solve the problem of low picture resolution during video conference relaying.

Description

Conference relaying method, device, equipment and storage medium based on image processing
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a conference relaying method and apparatus based on image processing, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of network technology, online remote video conferencing is used in more and more scenarios. A video conference connects persons or groups in two or more different areas, who exchange sound, images, and document data over transmission lines and multimedia equipment, so that each party can see the other participants or the projected content on a screen.
In scenarios where the displays used for showing the video are limited in number or placement, users far from a display have difficulty seeing its content; in such cases the video is usually shown by relaying it.
Disclosure of Invention
The invention provides a conference relaying method and apparatus based on image processing and a computer-readable storage medium, mainly aiming to solve the problem of low picture resolution during video conference relaying.
In order to achieve the above object, the present invention provides a conference relaying method based on image processing, including:
Acquiring real-time video streams transmitted to each conference terminal, framing the real-time video streams to obtain original conference video frames, downsampling the original conference video frames to obtain low-resolution video frames, and reserving part of the real-time video streams as high-resolution reference video frames according to a preset rule;
respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame;
matching feature information of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
according to the standardized video conference fusion feature map, a preset video frame resolution enhancement module is used for carrying out resolution enhancement processing on the low-resolution input video frame to obtain an enhanced video frame, and the enhanced video frame is synthesized into an enhanced video stream;
and displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
Optionally, the sampling the high-resolution reference video frame and the low-resolution video frame respectively to obtain a low-resolution reference video frame and a low-resolution input video frame includes:
utilizing a pre-constructed deep back-projection network to down-sample the high-resolution reference video frame to obtain a low-resolution pre-reference video frame;
and simultaneously carrying out up-sampling on the low-resolution pre-reference video frame and the low-resolution video frame by utilizing the depth back projection network and adopting a sampling multiplying power which is reciprocal to the down-sampling of the high-resolution reference video frame so as to obtain a low-resolution reference video frame and a low-resolution input video frame.
Optionally, the performing feature information matching on the high-resolution reference video frame, the low-resolution reference video frame, and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image includes:
respectively extracting image characteristics of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain a high-resolution reference video frame characteristic diagram, a low-resolution reference video frame characteristic diagram and a low-resolution input video frame characteristic diagram;
Splitting the high-resolution reference video frame feature map, the low-resolution reference video frame feature map and the low-resolution input video frame feature map into a high-resolution reference video frame feature sub-image set, a low-resolution reference video frame feature sub-image set and a low-resolution input video frame feature sub-image set respectively according to a preset splitting rule;
matching the feature information of the low-resolution reference video frame feature sub-image set with the feature information of the low-resolution input video frame feature sub-image set to obtain a first optimal matching feature sub-image set;
matching a second best matching feature sub-image set of the low-resolution input video frame feature sub-image set in the high-resolution reference video frame feature sub-image set by utilizing a spatial position mapping relation;
performing feature information recombination and transfer on the first optimal matching feature sub-graph set and the second optimal matching feature sub-graph set to obtain a video conference transfer feature graph;
and carrying out feature information fusion and standardization on the low-resolution input video frame feature map and the video conference transfer feature map to obtain a standardized video conference fusion feature map.
Optionally, the performing feature information matching on the low-resolution reference video frame feature sub-atlas and the low-resolution input video frame feature sub-atlas to obtain a first best matching feature sub-atlas includes:
Randomly selecting the ith characteristic subgraph of the characteristic subgraph set of the low-resolution input video frame, wherein i is a positive integer;
calculating the feature similarity of the ith feature sub-graph and each feature sub-graph of the low-resolution reference video frame feature sub-graph set;
selecting a low-resolution reference video frame feature sub-graph with the maximum feature similarity as a first best matching feature sub-graph of the ith feature sub-graph;
and returning to the step of randomly selecting the ith characteristic subgraph of the low-resolution input video frame characteristic subgraph set until all the characteristic subgraphs in the low-resolution input video frame characteristic subgraph set are traversed to obtain a first optimal matching characteristic subgraph set.
Optionally, the performing, according to the standardized video conference fusion feature map, resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module to obtain an enhanced video frame includes:
sequencing the standardized video conference fusion characteristic graphs according to levels to obtain an N-level standardized video conference fusion characteristic graph sequence, wherein N is an integer greater than 1;
and coding and decoding the low-resolution input video frame and the feature maps of different levels in the N-level standardized video conference fusion feature map sequence step by using coding layers of different levels in the preset video frame resolution enhancement module to obtain an enhanced video frame.
Optionally, the step of using coding layers of different levels in the preset video frame resolution enhancement module to perform step-by-step coding and decoding on the low-resolution input video frame and the feature maps of different levels in the N-level standardized video conference fusion feature map sequence to obtain an enhanced video frame includes:
performing preliminary coding on the low-resolution input video frame by using a primary coder in the preset video frame resolution enhancement module to obtain a first coding result;
performing dimensionality splicing on the first coding result and a primary standardized video conference fusion characteristic diagram of the standardized video conference fusion characteristic diagram to obtain a first spliced characteristic diagram;
encoding the first splicing characteristic diagram by using a second encoder, and performing down-sampling and standardization on an encoding result to obtain a second encoding result;
performing dimensionality splicing on the second coding result and a second-level standardized video conference fusion characteristic graph of the standardized video conference fusion characteristic graph to obtain a second spliced characteristic graph;
encoding the second splicing characteristic diagram by using a third encoder, and performing downsampling and standardization on an encoding result to obtain a third encoding result;
Performing dimension splicing on the Nth encoding result and the N-level standardized video conference fusion characteristic graph to obtain an Nth spliced characteristic graph, wherein N is an integer larger than 1;
decoding, up-sampling, standardizing and residual error connection processing are carried out on the Nth splicing characteristic diagram to obtain a pre-enhanced video frame;
respectively acquiring a forward real video frame and a backward real video frame of a real video frame corresponding to the low-resolution input video frame, respectively extracting feature maps of the pre-enhanced video frame, the forward real video frame and the backward real video frame, splicing the feature maps of the pre-enhanced video frame, the forward real video frame and the backward real video frame, and performing convolution and full-connection processing on the spliced feature maps to obtain a confidence coefficient of the pre-enhanced video frame;
and selecting a pre-enhanced video frame with confidence coefficient meeting preset requirements from the pre-enhanced video frames as an enhanced video frame.
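A toy sketch of the confidence step above: the pre-enhanced, forward, and backward feature maps are spliced along the channel dimension and scored by a fully connected layer with a sigmoid. The randomly initialised weights and the 0.5 threshold are purely hypothetical stand-ins for the trained layer and the patent's unspecified "preset requirement".

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_confidence(pre_feat, fwd_feat, bwd_feat, w, b=0.0):
    # Splice the three feature maps along the channel dimension, then apply a
    # fully connected layer and a sigmoid to score the pre-enhanced frame.
    spliced = np.concatenate([pre_feat, fwd_feat, bwd_feat], axis=0)  # (3C, H, W)
    x = spliced.ravel()
    return float(1.0 / (1.0 + np.exp(-(w @ x + b))))

# Toy feature maps (C=2, H=W=4) and hypothetical, randomly initialised weights.
feats = [rng.standard_normal((2, 4, 4)) for _ in range(3)]
w = rng.standard_normal(3 * 2 * 4 * 4) * 0.01
conf = frame_confidence(*feats, w)

threshold = 0.5            # assumed "preset requirement"
keep = conf >= threshold   # the frame becomes an enhanced frame only if kept
```

In the real module the convolution and fully connected layers are learned; this sketch only shows the data flow of splicing, scoring, and thresholding.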
Optionally, the performing feature information fusion and normalization on the low-resolution input video frame feature map and the video conference transfer feature map to obtain a normalized video conference fusion feature map includes:
Performing feature information fusion on the low-resolution input video frame feature map and the video conference transfer feature map by using a guide filtering method to obtain a video conference fusion feature map;
and carrying out feature standardization on the video conference fusion feature map by using the standard deviation to obtain a standardized video conference fusion feature map.
In order to solve the above problem, the present invention also provides a conference relay apparatus based on image processing, the apparatus comprising:
the down-sampling module is used for acquiring real-time video streams transmitted to each conference terminal, performing framing processing on the real-time video streams to obtain original conference video frames, performing down-sampling on the original conference video frames to obtain low-resolution video frames, and reserving part of the real-time video streams as high-resolution reference video frames according to a preset rule;
the picture preprocessing module is used for respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame;
the matching and fusion module is used for matching the feature information of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
The video enhancement module is used for utilizing a preset video frame resolution enhancement module to carry out resolution enhancement processing on the low-resolution input video frame according to the standardized video conference fusion characteristic graph to obtain an enhanced video frame, and synthesizing the enhanced video frame into an enhanced video stream;
and the video display module is used for displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the image processing-based conference relaying method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the image processing-based conference relaying method described above.
According to the embodiment of the invention, the real-time video stream is framed to obtain original conference video frames, the original conference video frames are down-sampled to obtain low-resolution video frames, and part of the real-time video stream is retained as high-resolution reference video frames according to a preset rule. The high-resolution reference video frames and the low-resolution video frames are then sampled to obtain low-resolution reference video frames and low-resolution input video frames; adjusting the size and definition of the video frames in this way lets the low-resolution reference video frame serve as a bridge between the high-resolution reference video frame and the low-resolution input video frame, making features easy to match.
Feature-information matching transfer and feature fusion are performed on the high-resolution reference video frame, the low-resolution reference video frame, and the low-resolution input video frame to obtain a standardized video-conference fusion feature map: the feature information of the three frames is accurately matched, the feature information of the high-resolution reference video frame is transferred into the fusion feature map, and the feature information of the low-resolution input video frame is fused in a second pass, so that the feature information of both the high-resolution reference video frame and the low-resolution input video frame is fully retained, which helps improve the resolution of the relayed conference video picture.
Finally, according to the standardized video-conference fusion feature map, a preset video-frame resolution enhancement module performs resolution enhancement on the low-resolution input video frame to obtain an enhanced video frame; the feature maps are hierarchically encoded and reconstructed, and the spatio-temporal continuity of adjacent frames is fully considered, which improves the accuracy of reconstructing the low-resolution input video frame and thereby the resolution of the relayed conference video picture.
Therefore, the conference relaying method and apparatus based on image processing, the electronic device, and the computer-readable storage medium provided by the invention can solve the problem of low picture resolution in video conference relaying.
Drawings
Fig. 1 is a schematic flowchart of a conference relaying method based on image processing according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a detailed implementation of one step in the conference relaying method based on image processing shown in fig. 1;
fig. 3 is a flowchart illustrating a detailed implementation of another step in the image processing-based conference relaying method shown in fig. 1;
fig. 4 is a functional block diagram of a conference relay apparatus based on image processing according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device for implementing the image processing-based conference relaying method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a conference relaying method based on image processing. The execution body of the method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flowchart of a conference relaying method based on image processing according to an embodiment of the present invention. In this embodiment, the conference relaying method based on image processing includes:
s1, real-time video streams transmitted to each conference terminal are obtained, framing processing is carried out on the real-time video streams to obtain original conference video frames, down sampling is carried out on the original conference video frames to obtain low-resolution video frames, and a part of the real-time video streams are reserved as high-resolution reference video frames according to preset rules.
In the embodiment of the invention, each path of real-time video stream in the video conference can be acquired by utilizing the video conference equipment of each participant in the video conference. The video conference equipment can be equipment with a snapshot function, such as a camera and a video recorder corresponding to conference terminals in conference places of different participants of the video conference.
Further, the embodiment of the present invention may utilize a multipoint control server to transmit the real-time video stream acquired by the main conference room to the conference terminal of the relevant conference room, and may also transmit the real-time video stream acquired by the conference terminal of the relevant conference room to the conference terminal of the main conference room.
In one embodiment of the present invention, the real-time video stream may be compressed before being relayed to each conference terminal.
In one embodiment of the present invention, the frames of the real-time video stream can be screened with a picture-quality evaluation model to obtain the original conference video frames. The picture-quality evaluation model evaluates the quality of candidate frames and eliminates unqualified pictures, such as those with poor illumination, blur, or an excessive offset angle, thereby improving the quality of the original conference video frames.
In one embodiment of the present invention, the original conference video frame may be down-sampled by a down-sampling function. For example, if the original conference video frame has size X × Y, down-sampling it by a factor of S yields a low-resolution video frame of size (X/S) × (Y/S).
In the embodiment of the invention, to control resource usage, an original conference video frame is periodically retained without down-sampling and used as a high-resolution reference video frame.
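The down-sampling and periodic reference-retention steps of S1 can be sketched in a few lines of NumPy; the factor of 2 and retention period of 5 are illustrative assumptions, not values fixed by the method.

```python
import numpy as np

def downsample(frame, s):
    """Block-average down-sampling: each s x s pixel block becomes one pixel,
    turning an X x Y frame into an (X/s) x (Y/s) low-resolution frame."""
    h, w = frame.shape
    h2, w2 = h - h % s, w - w % s
    return frame[:h2, :w2].reshape(h2 // s, s, w2 // s, s).mean(axis=(1, 3))

def split_stream(frames, s=2, ref_period=5):
    """Down-sample every frame; periodically retain an un-sampled original
    frame as a high-resolution reference (the period is an assumption)."""
    refs, low = [], []
    for i, frame in enumerate(frames):
        if i % ref_period == 0:
            refs.append(frame)            # kept at full resolution
        low.append(downsample(frame, s))
    return refs, low

stream = [np.random.rand(8, 8) for _ in range(10)]
refs, low = split_stream(stream)  # 2 high-res references, ten 4x4 low-res frames
```

Retaining only every few frames at full resolution is what keeps the resource occupation bounded while still providing high-resolution detail to transfer later.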
And S2, respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame.
In detail, the S2 includes:
Utilizing a pre-constructed Deep Back-Projection Network (DBPN) to down-sample the high-resolution reference video frame to obtain a low-resolution pre-reference video frame;
and simultaneously carrying out up-sampling on the low-resolution pre-reference video frame and the low-resolution video frame by utilizing the depth back projection network and adopting a sampling multiplying power which is reciprocal to the down-sampling of the high-resolution reference video frame so as to obtain a low-resolution reference video frame and a low-resolution input video frame.
In the embodiment of the invention, the depth back projection network comprises a picture down-sampling unit and a picture up-sampling unit, the picture size of the high-resolution reference video frame is reduced through the picture down-sampling unit, and the sizes of the low-resolution pre-reference video frame and the low-resolution video frame are amplified through the picture up-sampling unit, so that the definition of the low-resolution pre-reference video frame and the definition of the low-resolution video frame are higher.
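A minimal stand-in for the two DBPN units illustrates the reciprocal-ratio relationship described above. Block-mean down-sampling and nearest-neighbour up-sampling here replace the learned network, so this only demonstrates that up-sampling at the ratio reciprocal to the down-sampling restores the original frame size.

```python
import numpy as np

def down(x, s):
    # Stand-in for the DBPN picture down-sampling unit: s x s block means.
    h, w = x.shape
    h2, w2 = h - h % s, w - w % s
    return x[:h2, :w2].reshape(h2 // s, s, w2 // s, s).mean(axis=(1, 3))

def up(x, s):
    # Stand-in for the DBPN picture up-sampling unit: nearest-neighbour
    # magnification by the reciprocal ratio.
    return np.kron(x, np.ones((s, s)))

S = 2
hr_ref = np.random.rand(8, 8)
lr_pre_ref = down(hr_ref, S)   # high-res reference -> low-res pre-reference
lr_ref = up(lr_pre_ref, S)     # up-sampled at the reciprocal ratio
```

Because the up-sampling ratio is the reciprocal of the down-sampling ratio, `lr_ref` regains the 8 × 8 size of the original reference frame while carrying low-resolution pixel content, which is exactly what makes it comparable to the low-resolution input frame.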
In another embodiment of the present invention, during the down-sampling, a pyramid down-sampling method may also be used to change the S × S pixel block of the original video frame into one pixel.
Further, in another embodiment of the present invention, an interpolation algorithm may be used to perform upsampling on the image.
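As a sketch of interpolation-based up-sampling, the following implements plain bilinear interpolation in NumPy; the choice of bilinear is an assumption, since the text only says "an interpolation algorithm".

```python
import numpy as np

def bilinear_upsample(img, s):
    """Up-sample a 2-D image by factor s with bilinear interpolation."""
    h, w = img.shape
    # Sample positions in source coordinates (pixel-centre convention).
    ys = (np.arange(h * s) + 0.5) / s - 0.5
    xs = (np.arange(w * s) + 0.5) / s - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy = np.clip((ys - y0), 0, 1)[:, None]
    wx = np.clip((xs - x0), 0, 1)[None, :]
    # Gather the four neighbouring pixels and blend them.
    a = img[y0][:, x0];     b = img[y0][:, x0 + 1]
    c = img[y0 + 1][:, x0]; d = img[y0 + 1][:, x0 + 1]
    return (a * (1 - wy) * (1 - wx) + b * (1 - wy) * wx
            + c * wy * (1 - wx) + d * wy * wx)

img = np.ones((3, 3))
up_img = bilinear_upsample(img, 2)  # a constant image stays constant
```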
In the embodiment of the invention, sampling the high-resolution reference video frame and the low-resolution video frame adjusts the picture size of the low-resolution video frame while keeping the physical size of the low-resolution reference video frame consistent with that of the original conference video frame, so that the pixel definition of the low-resolution reference video frame is closer to that of the low-resolution input video frame.
And S3, performing feature information matching on the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image.
In the embodiment of the present invention, a preset feature transfer module may be used to perform feature information matching transfer on the high-resolution reference video frame, the low-resolution reference video frame, and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and perform feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image.
In the embodiment of the invention, the preset feature transfer module comprises a feature extraction network, a feature information matching unit, a spatial position mapping unit and an information fusion unit. The feature extraction network can be composed of a convolutional neural network and can extract a feature map of a video frame; the feature information matching unit may find a first best matching feature sub-image set by matching the feature images; the spatial position mapping unit can search a second best matching feature sub-map set by using a spatial position mapping relation; the information fusion unit may perform recombination and transfer on the first best matching feature sub-image set and the second best matching feature sub-image set to obtain a video conference transfer feature map, and perform feature information fusion on the low-resolution input video frame and the video conference transfer feature map.
In detail, referring to fig. 2, the S3 includes:
s31, respectively extracting image features of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain a high-resolution reference video frame feature map, a low-resolution reference video frame feature map and a low-resolution input video frame feature map;
S32, splitting the high-resolution reference video frame feature map, the low-resolution reference video frame feature map and the low-resolution input video frame feature map into a high-resolution reference video frame feature sub-image set, a low-resolution reference video frame feature sub-image set and a low-resolution input video frame feature sub-image set respectively according to a preset splitting rule;
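One possible splitting rule for S32 is a sliding window: each feature map is cut into fixed-size feature sub-maps at a given stride. The size of 3 and stride of 2 below are illustrative assumptions; a stride smaller than the size produces the spatial overlaps handled later by mean fusion.

```python
import numpy as np

def split_patches(fmap, size=3, stride=2):
    """Split an H x W feature map into a set of size x size feature sub-maps,
    recording each sub-map's top-left coordinate for later reassembly."""
    h, w = fmap.shape
    patches, coords = [], []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(fmap[y:y + size, x:x + size])
            coords.append((y, x))
    return patches, coords

fmap = np.arange(36.0).reshape(6, 6)
patches, coords = split_patches(fmap)  # four overlapping 3x3 sub-maps
```

The same rule would be applied to the high-resolution reference, low-resolution reference, and low-resolution input feature maps so that their sub-map sets correspond position by position.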
s33, performing feature information matching on the low-resolution reference video frame feature sub-image set and the low-resolution input video frame feature sub-image set to obtain a first optimal matching feature sub-image set;
s34, matching a second best matching feature sub-image set of the low-resolution input video frame feature sub-image set in the high-resolution reference video frame feature sub-image set by utilizing a spatial position mapping relation;
s35, performing feature information recombination and transfer on the first optimal matching feature sub-graph set and the second optimal matching feature sub-graph set to obtain a video conference transfer feature graph;
and S36, carrying out feature information fusion and standardization on the low-resolution input video frame feature graph and the video conference transfer feature graph to obtain a standardized video conference fusion feature graph.
Further, the S33 includes:
randomly selecting the i-th feature sub-graph from the low-resolution input video frame feature sub-graph set, wherein i is a positive integer;
calculating the feature similarity between the i-th feature sub-graph and each feature sub-graph in the low-resolution reference video frame feature sub-graph set;
selecting the low-resolution reference video frame feature sub-graph with the maximum feature similarity as the first best matching feature sub-graph of the i-th feature sub-graph;
and returning to the step of randomly selecting the i-th feature sub-graph of the low-resolution input video frame feature sub-graph set until all feature sub-graphs in the set have been traversed, so as to obtain the first best matching feature sub-graph set.
In the embodiment of the invention, the feature similarity may be calculated as a weighted sum, with preset weights, of terms derived from the vector cosine distance and the Manhattan distance between the low-resolution input video frame feature sub-graph and the low-resolution reference video frame feature sub-graph: the smaller the cosine distance and the Manhattan distance, the higher the similarity between the two sub-graphs.
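The weighted cosine/Manhattan similarity described above can be sketched as follows. The exact mapping from each distance to a similarity term and the equal preset weights are assumptions for illustration; the text only states a weighted sum with smaller distances meaning higher similarity.

```python
import numpy as np

def feature_similarity(a, b, w_cos=0.5, w_man=0.5):
    """Weighted-sum similarity from vector cosine distance and Manhattan
    distance; both terms increase as the corresponding distance shrinks.
    The distance-to-similarity conversions and weights are assumed."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    cos_dist = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    man_dist = np.abs(a - b).mean()  # per-element Manhattan distance
    return w_cos * (1.0 - cos_dist) + w_man / (1.0 + man_dist)

def first_best_match(query, reference_sub_graphs):
    """Index of the reference sub-graph with maximum feature similarity."""
    return int(np.argmax([feature_similarity(query, r) for r in reference_sub_graphs]))
```

Running `first_best_match` for every input sub-graph, as in the traversal loop above, yields the first best matching feature sub-graph set.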
In the embodiment of the present invention, when the feature information of the first best matching feature sub-graph set and the second best matching feature sub-graph set is recombined and transferred, feature sub-graphs may overlap in space; when multiple values fall on the same spatial position, the mean of those values may be taken as the value finally filled in.
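The mean-value handling of spatially overlapping sub-graphs can be sketched as an accumulate-and-count recombination; the position and shape conventions are illustrative assumptions.

```python
import numpy as np

def recombine_with_mean(sub_graphs, positions, out_shape):
    """Place (C, h, w) sub-graphs back at their (y, x) positions in an
    output of shape out_shape; where sub-graphs overlap, the value
    finally filled in is the mean of all values at that position."""
    acc = np.zeros(out_shape)
    cnt = np.zeros(out_shape)
    for sub, (y, x) in zip(sub_graphs, positions):
        _, ph, pw = sub.shape
        acc[:, y:y + ph, x:x + pw] += sub
        cnt[:, y:y + ph, x:x + pw] += 1
    cnt[cnt == 0] = 1  # leave uncovered positions at zero
    return acc / cnt
```

For example, a constant-1 patch and a constant-3 patch overlapping in one column leave a 2 in the overlap and the original values elsewhere.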
Further, step S36 includes:
performing feature information fusion on the low-resolution input video frame feature map and the video conference transfer feature map by using a guided filtering method to obtain a video conference fusion feature map;
and carrying out feature standardization on the video conference fusion feature map by using the standard deviation to obtain a standardized video conference fusion feature map.
In an embodiment of the present invention, a method that migrates the overall spatial distribution of the data, such as AdaIN (adaptive instance normalization), may also be used to fuse the feature information of the low-resolution input video frame feature map and the video conference transfer feature map to obtain the video conference fusion feature map.
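A minimal sketch of AdaIN-style fusion, which shifts the per-channel mean and standard deviation of the input feature map onto those of the transfer feature map; the channel-first (C, H, W) layout is an assumed convention.

```python
import numpy as np

def adain_fuse(content, transfer, eps=1e-5):
    """Adaptive instance normalization: standardize the content feature
    map per channel, then rescale and shift it with the transfer feature
    map's per-channel standard deviation and mean."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sd = content.std(axis=(1, 2), keepdims=True)
    t_mu = transfer.mean(axis=(1, 2), keepdims=True)
    t_sd = transfer.std(axis=(1, 2), keepdims=True)
    return t_sd * (content - c_mu) / (c_sd + eps) + t_mu
```

After fusion, each channel of the output carries the transfer map's first- and second-order statistics while keeping the content map's spatial structure.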
In the embodiment of the invention, the feature standardization of the video conference fusion feature map may alternatively be carried out using linear normalization or nonlinear normalization.
In the embodiment of the invention, feature information matching, transfer and fusion are performed on the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain a standardized video conference fusion feature map. The feature information of the three frames is accurately matched, the feature information of the high-resolution reference video frame is transferred into the standardized video conference fusion feature map, and the feature information of the low-resolution input video frame is fused in a second pass. The feature information of both the high-resolution reference video frame and the low-resolution input video frame is thereby fully retained, which helps to improve the resolution of the relayed conference video picture.
And S4, according to the standardized video conference fusion feature map, carrying out resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module to obtain an enhanced video frame, and synthesizing the enhanced video frame into an enhanced video stream.
In detail, referring to fig. 3, the step in S4 of performing resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module according to the standardized video conference fusion feature map to obtain an enhanced video frame includes:
S41, sorting the standardized video conference fusion feature maps by level to obtain an N-level standardized video conference fusion feature map sequence, wherein N is an integer greater than 1;
and S42, encoding and decoding the low-resolution input video frame and the feature maps of different levels in the N-level standardized video conference fusion feature map sequence level by level, using the encoding layers of different levels in the preset video frame resolution enhancement module, to obtain an enhanced video frame.
In the embodiment of the invention, the standardized video conference fusion feature map usually comprises feature maps at several levels; sorting these levels from low to high yields the first-level through N-level standardized video conference fusion feature maps.
In the embodiment of the invention, the preset video frame resolution enhancement module is a resolution enhancement network based on a convolutional neural network, adopting a symmetric multi-level coding structure. Further, the video frame resolution enhancement module comprises a resolution enhancement network for reconstructing the low-resolution input video frame and an adversarial network that introduces neighboring video frames to constrain the reconstructed video frame.
Further, step S42 includes:
performing preliminary encoding on the low-resolution input video frame by using a first-level encoder in the preset video frame resolution enhancement module to obtain a first encoding result;
performing dimension-wise splicing on the first encoding result and the first-level standardized video conference fusion feature map to obtain a first spliced feature map;
encoding the first spliced feature map by using a second encoder, and performing down-sampling and standardization on the encoding result to obtain a second encoding result;
performing dimension-wise splicing on the second encoding result and the second-level standardized video conference fusion feature map to obtain a second spliced feature map;
encoding the second spliced feature map by using a third encoder, and performing down-sampling and standardization on the encoding result to obtain a third encoding result;
and so on, until the N-th encoding result and the N-level standardized video conference fusion feature map are spliced dimension-wise to obtain an N-th spliced feature map, wherein N is an integer greater than 1;
performing decoding, up-sampling, standardization and residual connection processing on the N-th spliced feature map to obtain a pre-enhanced video frame;
respectively acquiring the forward real video frame and the backward real video frame of the real video frame corresponding to the low-resolution input video frame, extracting the feature maps of the pre-enhanced video frame, the forward real video frame and the backward real video frame, splicing these three feature maps, and performing convolution and full-connection processing on the spliced feature map to obtain the confidence of the pre-enhanced video frame;
and selecting, from the pre-enhanced video frames, a pre-enhanced video frame whose confidence meets a preset requirement as the enhanced video frame.
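The level-by-level encode, splice and downsample chain above can be sketched structurally as follows. The random channel projection standing in for each encoder, the channel width, and stride-2 subsampling are all illustrative assumptions — the real module would use learned convolutional encoders — and the decoding/confidence stages are omitted.

```python
import numpy as np

def encode(x, out_ch, rng=np.random.default_rng(0)):
    """Stand-in for a convolutional encoder layer: a random linear
    projection over the channel dimension of a (C, H, W) array."""
    return np.tensordot(rng.standard_normal((out_ch, x.shape[0])) * 0.1, x, axes=1)

def downsample(x):
    """Stride-2 subsampling stands in for the module's down-sampling."""
    return x[:, ::2, ::2]

def hierarchical_encode(frame, fusion_maps, width=32):
    """Level by level: encode, splice dimension-wise (along channels)
    with the matching-level standardized fusion feature map, then
    downsample before the next level; return the final spliced map."""
    x = encode(frame, width)
    for level, fmap in enumerate(fusion_maps):
        x = np.concatenate([x, fmap], axis=0)  # dimension-wise splicing
        if level < len(fusion_maps) - 1:
            x = downsample(encode(x, width))
    return x
```

With a (3, 32, 32) input frame and fusion maps of 16 channels at 32x32 and 16x16, the final spliced feature map has shape (48, 16, 16), illustrating how each level halves the spatial size while the channel dimension carries the spliced features.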
In the embodiment of the invention, resolution enhancement is performed on the low-resolution input video frame with a preset video frame resolution enhancement module according to the standardized video conference fusion feature map to obtain an enhanced video frame. The feature map is hierarchically encoded and reconstructed, and the spatio-temporal continuity of adjacent frames is fully considered, which improves the accuracy of reconstructing the low-resolution input video frame and thus the resolution of the relayed conference video picture.
And S5, displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
In the embodiment of the present invention, the display device may be a display screen, a projection device, or the like.
According to the embodiment of the invention, an original conference video frame is obtained by framing the real-time video stream, a low-resolution video frame is obtained by down-sampling the original conference video frame, and part of the real-time video stream is reserved as a high-resolution reference video frame according to a preset rule. The high-resolution reference video frame and the low-resolution video frame are sampled to obtain a low-resolution reference video frame and a low-resolution input video frame, adjusting the size and definition of the video frames so that the low-resolution reference video frame serves as a bridge between the high-resolution reference video frame and the low-resolution input video frame, which makes feature matching convenient. Feature information matching, transfer and fusion are then performed on the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain a standardized video conference fusion feature map: the feature information of the three frames is accurately matched, the feature information of the high-resolution reference video frame is transferred into the standardized video conference fusion feature map, and the feature information of the low-resolution input video frame is fused in a second pass, so that the feature information of both the high-resolution reference video frame and the low-resolution input video frame is fully retained, which helps to improve the resolution of the relayed conference video picture. Finally, according to the standardized video conference fusion feature map, a preset video frame resolution enhancement module performs resolution enhancement on the low-resolution input video frame to obtain an enhanced video frame; the feature map is hierarchically encoded and reconstructed, and the spatio-temporal coherence of adjacent frames is fully considered, improving the accuracy of reconstructing the low-resolution input video frame and thus the resolution of the relayed conference video picture. Therefore, the conference relaying method based on image processing can solve the problem of low picture resolution during video conference relaying.
Fig. 4 is a functional block diagram of a conference relay device based on image processing according to an embodiment of the present invention.
The conference relay apparatus 100 based on image processing according to the present invention can be installed in an electronic device. According to the implemented functions, the image processing based conference relay device 100 may include a down-sampling module 101, a picture preprocessing module 102, a matching and fusing module 103, a video enhancement module 104, and a video display module 105. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the down-sampling module 101 is configured to obtain a real-time video stream transmitted to each conference terminal, perform framing processing on the real-time video stream to obtain an original conference video frame, perform down-sampling on the original conference video frame to obtain a low-resolution video frame, and reserve part of the real-time video stream as a high-resolution reference video frame according to a preset rule;
the picture preprocessing module 102 is configured to sample the high-resolution reference video frame and the low-resolution video frame respectively to obtain a low-resolution reference video frame and a low-resolution input video frame;
the matching and fusing module 103 is configured to perform feature information matching on the high-resolution reference video frame, the low-resolution reference video frame, and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and perform feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
the video enhancement module 104 is configured to perform resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module according to the standardized video conference fusion feature map to obtain an enhanced video frame, and synthesize the enhanced video frame into an enhanced video stream;
the video display module 105 is configured to display the enhanced video stream in a display device of each conference terminal of the video conference.
In detail, when the modules in the conference relay device 100 based on image processing according to the embodiment of the present invention are used, the same technical means as the conference relay method based on image processing described in fig. 1 to fig. 3 are used, and the same technical effect can be produced, which is not described again here.
Fig. 5 is a schematic structural diagram of an electronic device for implementing a conference relaying method based on image processing according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a conference relay program based on image processing, stored in the memory 11 and operable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is a control unit of the electronic device: it connects the various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules stored in the memory 11 (for example, executing a conference relay program based on image processing) and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a conference relay program based on image processing, etc., but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 5 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The image processing based conference relay program stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, enable:
acquiring real-time video streams transmitted to each conference terminal, framing the real-time video streams to obtain original conference video frames, down-sampling the original conference video frames to obtain low-resolution video frames, and reserving part of the real-time video streams as high-resolution reference video frames according to preset rules;
respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame;
performing feature information matching on the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
according to the standardized video conference fusion feature map, a preset video frame resolution enhancement module is used for carrying out resolution enhancement processing on the low-resolution input video frame to obtain an enhanced video frame, and the enhanced video frame is synthesized into an enhanced video stream;
and displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to the drawing, and is not repeated here.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring real-time video streams transmitted to each conference terminal, framing the real-time video streams to obtain original conference video frames, down-sampling the original conference video frames to obtain low-resolution video frames, and reserving part of the real-time video streams as high-resolution reference video frames according to preset rules;
respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame;
performing feature information matching on the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
according to the standardized video conference fusion characteristic graph, a preset video frame resolution enhancement module is used for carrying out resolution enhancement processing on the low-resolution input video frame to obtain an enhanced video frame, and the enhanced video frame is synthesized into an enhanced video stream;
and displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity (tamper resistance) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A conference relaying method based on image processing, the method comprising:
Acquiring real-time video streams transmitted to each conference terminal, framing the real-time video streams to obtain original conference video frames, downsampling the original conference video frames to obtain low-resolution video frames, and reserving part of the real-time video streams as high-resolution reference video frames according to a preset rule;
respectively sampling the high-resolution reference video frame and the low-resolution video frame to obtain a low-resolution reference video frame and a low-resolution input video frame;
matching feature information of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain an optimal matching feature sub-image set, and performing feature fusion processing on the low-resolution input video frame and the optimal matching feature sub-image set to obtain a standardized video conference fusion feature image;
according to the standardized video conference fusion feature map, a preset video frame resolution enhancement module is used for carrying out resolution enhancement processing on the low-resolution input video frame to obtain an enhanced video frame, and the enhanced video frame is synthesized into an enhanced video stream;
and displaying the enhanced video stream in the display equipment of each conference terminal of the video conference.
2. The image-processing-based conference relaying method of claim 1, wherein said sampling said high-resolution reference video frame and said low-resolution video frame respectively to obtain a low-resolution reference video frame and a low-resolution input video frame comprises:
utilizing a pre-constructed deep back-projection network to down-sample the high-resolution reference video frame to obtain a low-resolution pre-reference video frame;
and simultaneously up-sampling the low-resolution pre-reference video frame and the low-resolution video frame by utilizing the deep back-projection network, with a sampling magnification that is the reciprocal of that used for down-sampling the high-resolution reference video frame, so as to obtain a low-resolution reference video frame and a low-resolution input video frame.
3. The image-processing-based conference relaying method according to claim 1, wherein said matching feature information of said high-resolution reference video frame, said low-resolution reference video frame and said low-resolution input video frame to obtain a best matching feature sub-image set, and performing feature fusion processing of said low-resolution input video frame and said best matching feature sub-image set to obtain a standardized video conference fusion feature map comprises:
Respectively extracting image characteristics of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain a high-resolution reference video frame characteristic diagram, a low-resolution reference video frame characteristic diagram and a low-resolution input video frame characteristic diagram;
splitting the high-resolution reference video frame feature map, the low-resolution reference video frame feature map and the low-resolution input video frame feature map into a high-resolution reference video frame feature sub-image set, a low-resolution reference video frame feature sub-image set and a low-resolution input video frame feature sub-image set respectively according to a preset splitting rule;
matching the feature information of the low-resolution reference video frame feature sub-image set with the feature information of the low-resolution input video frame feature sub-image set to obtain a first optimal matching feature sub-image set;
matching a second best matching feature sub-graph set of the low-resolution input video frame feature sub-graph set in the high-resolution reference video frame feature sub-graph set by using a spatial position mapping relation;
performing feature information recombination and transfer on the first optimal matching feature sub-graph set and the second optimal matching feature sub-graph set to obtain a video conference transfer feature graph;
And carrying out feature information fusion and standardization on the low-resolution input video frame feature map and the video conference transfer feature map to obtain a standardized video conference fusion feature map.
4. The image-processing-based conference relaying method of claim 3, wherein said matching the feature information of the low-resolution reference video frame feature sub-graph set with the feature information of the low-resolution input video frame feature sub-graph set to obtain a first best matching feature sub-graph set comprises:
randomly selecting the i-th feature sub-graph from the low-resolution input video frame feature sub-graph set, wherein i is a positive integer;
calculating the feature similarity between the i-th feature sub-graph and each feature sub-graph in the low-resolution reference video frame feature sub-graph set;
selecting the low-resolution reference video frame feature sub-graph with the maximum feature similarity as the first best matching feature sub-graph of the i-th feature sub-graph;
and returning to the step of randomly selecting the i-th feature sub-graph from the low-resolution input video frame feature sub-graph set until all feature sub-graphs in the low-resolution input video frame feature sub-graph set have been traversed, so as to obtain the first best matching feature sub-graph set.
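Claim 4 does not fix a particular similarity measure. A sketch of the matching loop, assuming cosine similarity over flattened feature sub-graphs (the metric and function name are illustrative assumptions):

```python
import numpy as np

def best_match_subgraphs(input_patches, reference_patches):
    """For each input feature sub-graph, return the index of the reference
    sub-graph with the highest feature similarity (cosine similarity is
    assumed here; the claim leaves the metric open)."""
    matches = []
    for q in input_patches:
        qv = q.ravel()
        sims = [
            float(np.dot(qv, r.ravel()) /
                  (np.linalg.norm(qv) * np.linalg.norm(r.ravel()) + 1e-8))
            for r in reference_patches
        ]
        matches.append(int(np.argmax(sims)))  # best-matching reference index
    return matches
```

Since every input sub-graph is visited exactly once, iterating in any order (including the claim's random selection) traverses the whole set and yields the same first best matching feature sub-graph set.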
5. The image-processing-based conference relaying method according to claim 3, wherein said performing feature information fusion and standardization on the low-resolution input video frame feature map and the video conference transfer feature map to obtain a standardized video conference fusion feature map comprises:
performing feature information fusion on the low-resolution input video frame feature map and the video conference transfer feature map by using a guided filtering method to obtain a video conference fusion feature map;
and performing feature standardization on the video conference fusion feature map by using the standard deviation to obtain a standardized video conference fusion feature map.
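The standardization step can be read as a per-channel z-score using the mean and standard deviation; a sketch under that assumption (the guided-filtering fusion itself is omitted, and the exact formula is not spelled out in the claim):

```python
import numpy as np

def standardize_feature_map(fused, eps=1e-8):
    # Per-channel standardization with mean and standard deviation -- one
    # plausible reading of "feature standardization using the standard
    # deviation"; the claim does not disclose the exact formula.
    mean = fused.mean(axis=(0, 1), keepdims=True)
    std = fused.std(axis=(0, 1), keepdims=True)
    return (fused - mean) / (std + eps)
```

The small `eps` guards against division by zero on constant channels without noticeably changing well-scaled features.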
6. The image-processing-based conference relaying method according to claim 1, wherein said performing resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module according to the standardized video conference fusion feature map to obtain an enhanced video frame comprises:
sorting the standardized video conference fusion feature maps by level to obtain an N-level standardized video conference fusion feature map sequence, wherein N is an integer greater than 1;
and progressively encoding and decoding the low-resolution input video frame together with the feature maps at different levels in the N-level standardized video conference fusion feature map sequence by using coding layers at different levels in the preset video frame resolution enhancement module, so as to obtain the enhanced video frame.
7. The method according to claim 6, wherein said progressively encoding and decoding the low-resolution input video frame and the feature maps at different levels in the N-level standardized video conference fusion feature map sequence by using the coding layers at different levels in the preset video frame resolution enhancement module to obtain the enhanced video frame comprises:
preliminarily encoding the low-resolution input video frame by using a first encoder in the preset video frame resolution enhancement module to obtain a first encoding result;
concatenating the first encoding result with the first-level standardized video conference fusion feature map along the feature dimension to obtain a first concatenated feature map;
encoding the first concatenated feature map by using a second encoder, and downsampling and standardizing the encoding result to obtain a second encoding result;
concatenating the second encoding result with the second-level standardized video conference fusion feature map along the feature dimension to obtain a second concatenated feature map;
encoding the second concatenated feature map by using a third encoder, and downsampling and standardizing the encoding result to obtain a third encoding result;
and so on, until the N-th encoding result is concatenated with the N-th-level standardized video conference fusion feature map to obtain an N-th concatenated feature map, wherein N is an integer greater than 1;
performing decoding, upsampling, standardization and residual connection on the N-th concatenated feature map to obtain a pre-enhanced video frame;
respectively acquiring a forward real video frame and a backward real video frame of the real video frame corresponding to the low-resolution input video frame, respectively extracting feature maps of the pre-enhanced video frame, the forward real video frame and the backward real video frame, concatenating these three feature maps, and performing convolution and fully-connected processing on the concatenated feature map to obtain a confidence of the pre-enhanced video frame;
and selecting, from the pre-enhanced video frames, a pre-enhanced video frame whose confidence meets a preset requirement as the enhanced video frame.
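The staged encode-concatenate flow of claim 7 can be sketched as follows. The patent does not disclose the encoder architectures, so the "encoders" below are placeholder random 1x1 channel mixes, downsampling is a stride-2 slice, and all shapes and channel counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, out_channels):
    # Placeholder "encoder": a random 1x1 channel-mixing layer standing in
    # for the patent's unspecified coding layers.
    w = rng.standard_normal((x.shape[-1], out_channels))
    return x @ w

def downsample(x):
    return x[::2, ::2, :]  # stride-2 spatial subsampling

def standardize(x, eps=1e-8):
    return (x - x.mean()) / (x.std() + eps)

def progressive_encode(frame, fusion_maps):
    """Stage k: encode the running result (downsampling and standardizing
    from the second stage on), then concatenate it with the k-th-level
    fusion feature map along the channel axis."""
    result = encode(frame, 8)  # first encoding result
    for level, fusion in enumerate(fusion_maps):
        if level > 0:
            result = standardize(downsample(encode(result, 8)))
        result = np.concatenate([result, fusion], axis=-1)  # feature-dim concat
    return result  # N-th concatenated feature map
```

Each fusion map must match the spatial size of the current stage (the level-2 map is half the resolution of the level-1 map, and so on), which is why the feature maps are first sorted by level in claim 6.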
8. An image-processing-based conference relay apparatus, characterized in that the apparatus comprises:
a downsampling module, configured to acquire a real-time video stream transmitted to each conference terminal, perform framing on the real-time video stream to obtain original conference video frames, downsample the original conference video frames to obtain low-resolution video frames, and retain part of the real-time video stream as high-resolution reference video frames according to a preset rule;
an image preprocessing module, configured to sample the high-resolution reference video frame and the low-resolution video frame respectively to obtain a low-resolution reference video frame and a low-resolution input video frame;
a matching and fusion module, configured to match the feature information of the high-resolution reference video frame, the low-resolution reference video frame and the low-resolution input video frame to obtain best matching feature sub-graph sets, and perform feature fusion processing on the low-resolution input video frame and the best matching feature sub-graph sets to obtain a standardized video conference fusion feature map;
a video enhancement module, configured to perform resolution enhancement processing on the low-resolution input video frame by using a preset video frame resolution enhancement module according to the standardized video conference fusion feature map to obtain enhanced video frames, and synthesize the enhanced video frames into an enhanced video stream;
and a video display module, configured to display the enhanced video stream on the display device of each conference terminal of the video conference.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the image-processing-based conference relaying method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the image-processing-based conference relaying method according to any one of claims 1-7.
CN202210663261.6A 2022-06-13 2022-06-13 Conference relaying method, device, equipment and storage medium based on image processing Pending CN114760435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210663261.6A CN114760435A (en) 2022-06-13 2022-06-13 Conference relaying method, device, equipment and storage medium based on image processing

Publications (1)

Publication Number Publication Date
CN114760435A true CN114760435A (en) 2022-07-15

Family

ID=82337116

Country Status (1)

Country Link
CN (1) CN114760435A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472405A * 2015-12-04 2016-04-06 Xiaomi Technology Co., Ltd. Reminder generation method and device
CN105635791A * 2015-12-30 2016-06-01 Beijing QIYI Century Science and Technology Co., Ltd. Detection method and apparatus for blurred video image
CN112598575A * 2020-12-22 2021-04-02 University of Electronic Science and Technology of China Image information fusion and super-resolution reconstruction method based on feature processing
CN114339232A * 2021-12-16 2022-04-12 Hangzhou Arcvideo Technology Co., Ltd. Adaptive resolution coding method and corresponding decoding method

Non-Patent Citations (1)

Title
Wu Yufeng: "Research on Key Technologies of Video Super-Resolution Based on Feature Processing", China Masters' Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
Tsirikoglou et al. A survey of image synthesis methods for visual machine learning
CN112287820A (en) Face detection neural network, face detection neural network training method, face detection method and storage medium
CN109936745B (en) Method and system for improving decompression of raw video data
CN114782901B (en) Sand table projection method, device, equipment and medium based on visual change analysis
WO2022250401A1 (en) Methods and systems for generating three dimensional (3d) models of objects
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN115205225A (en) Training method, device and equipment of medical image recognition model and storage medium
CN116385827A (en) Parameterized face reconstruction model training method and key point tag data generation method
CN115601710A (en) Examination room abnormal behavior monitoring method and system based on self-attention network architecture
CN113902848A (en) Object reconstruction method and device, electronic equipment and storage medium
CN113240585A (en) Image processing method and device based on generation countermeasure network and storage medium
CN117576292A (en) Three-dimensional scene rendering method and device, electronic equipment and storage medium
CN116385622B (en) Cloud image processing method, cloud image processing device, computer and readable storage medium
CN112906671A (en) Face examination false picture identification method and device, electronic equipment and storage medium
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN115760886B (en) Land parcel dividing method and device based on unmanned aerial vehicle aerial view and related equipment
Khan et al. A review of benchmark datasets and training loss functions in neural depth estimation
CN114760435A (en) Conference relaying method, device, equipment and storage medium based on image processing
CN116309005A (en) Virtual reloading method and device, electronic equipment and readable medium
CN116645468B (en) Human body three-dimensional modeling method, method and device for training human body structure to generate model
CN114495290B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN116778065B (en) Image processing method, device, computer and storage medium
CN116957951A (en) Image generation method, device, computer equipment, storage medium and product
CN113778905B (en) UI design acceptance method, device, equipment and storage medium
CN114663594A (en) Image feature point detection method, device, medium, and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220715