CN110855908B

CN110855908B - Multi-party video screen-mixing method and device, network device and storage medium

Info

Publication number: CN110855908B
Application number: CN201911128504.0A
Authority: CN
Inventors: 周骏华; 王乐才; 方华; 宋钦梅
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2022-09-27
Anticipated expiration: 2039-11-18
Also published as: CN110855908A

Abstract

Embodiments of the present invention relate to the field of communication technologies and disclose a multi-party video screen-mixing method. The method comprises: acquiring the encoded frames of N videos to be mixed and the input frame rate of each video, where N is a natural number greater than 1; decoding the encoded frames of each video whose input frame rate differs from a preset frame rate to obtain decoded frames, and acquiring feature parameters of that video; inputting the feature parameters into a motion complexity model to obtain the motion complexity of that video; performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; and synthesizing a mixed-screen video from the processed decoded frames. Embodiments of the present invention also provide a multi-party video screen-mixing device, a network device and a storage medium. The method, device, network device and storage medium provided by the embodiments can improve the display quality of mixed-screen communication video.

Description

Multi-party video screen-mixing method and device, network device and storage medium

Technical Field

The present invention relates to the field of communication technologies, and in particular to a multi-party video screen-mixing method and device, a network device and a storage medium.

Background Art

In many video communication scenarios, such as multi-party video conferences, the videos of the participants need to be mixed onto one screen so that users can conveniently watch and interact with each other.

However, the terminals taking part in video communication may be of different types, so their videos may have different frame rates. These frame-rate differences leave the videos out of sync after screen mixing, which degrades the display quality of the mixed communication video.

Summary of the Invention

The purpose of the embodiments of the present invention is to provide a multi-party video screen-mixing method and device, a network device and a storage medium that improve the display quality of mixed-screen communication video.

To solve the above technical problem, an embodiment of the present invention provides a multi-party video screen-mixing method comprising the following steps: acquiring the encoded frames of N videos to be mixed and the input frame rate of each video, where N is a natural number greater than 1; decoding the encoded frames of each video whose input frame rate differs from a preset frame rate to obtain decoded frames, and acquiring feature parameters of that video; inputting the feature parameters into a motion complexity model to obtain the motion complexity of that video; performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; and synthesizing a mixed-screen video from the processed decoded frames.

An embodiment of the present invention further provides a multi-party video screen-mixing device, comprising: an encoded-frame acquisition module, configured to acquire the encoded frames of N videos to be mixed and the input frame rate of each video, where N is a natural number greater than 1; a parameter acquisition module, configured to decode the encoded frames of each video whose input frame rate differs from a preset frame rate to obtain decoded frames, and to acquire feature parameters of that video; a complexity acquisition module, configured to input the feature parameters into a motion complexity model to obtain the motion complexity of that video; a frame processing module, configured to perform frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; and a video synthesis module, configured to synthesize a mixed-screen video from the processed decoded frames.

An embodiment of the present invention further provides a network device, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the multi-party video screen-mixing method described above.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-party video screen-mixing method described above.

Compared with the prior art, the embodiments of the present invention perform frame interpolation or frame dropping on each video whose input frame rate differs from the preset frame rate, so that the frame rate of the processed video equals or approaches the preset frame rate. All videos in the mixed screen therefore have the same or similar frame rates, which improves their synchronization. In addition, the motion complexity of each video is obtained from a motion complexity model and taken into account when inserting or dropping frames, so that the choice of inserted or dropped frames is more reasonable, the mixed video better reflects the actual motion, and the display quality of the mixed-screen communication video is improved.

In addition, performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity includes: if the input frame rate is greater than the preset frame rate, dropping decoded frames, giving priority to decoded frames whose motion complexity is less than or equal to a first preset value; and if the input frame rate is less than the preset frame rate, interpolating frames, giving priority to decoded frames whose motion complexity is greater than or equal to a second preset value.

In addition, acquiring the feature parameters of a video whose input frame rate differs from the preset frame rate specifically means acquiring the macroblock motion vectors, the macroblock coding types and the frame-level global motion vector of that video as the feature parameters. Because these quantities reflect the motion information of the video well, feeding them into the motion complexity model yields a more accurate motion complexity.

In addition, before the feature parameters are input into the motion complexity model, the method further includes: acquiring video training samples and extracting their feature parameters; inputting those feature parameters into a deep learning model for training; and using the trained deep learning model as the motion complexity model. With a model trained this way, the motion complexity of each video whose input frame rate differs from the preset frame rate can be obtained, and inserting or dropping frames according to that complexity makes the mixed-screen video better match the actual motion and improves its synchronization.

In addition, inputting the feature parameters of the video training samples into the deep learning model for training specifically means inputting them into an open-source deep learning framework for training.

In addition, acquiring the encoded frames of the N videos to be mixed and the input frame rate of each video includes using Kalman filtering to obtain the input frame rate of each video separately. Because a Kalman filter also yields the range over which the input frame rate varies, the input frame rate can be fine-tuned within that range so that it better matches reality, which helps improve the synchronization and smoothness of the mixed-screen video.

In addition, synthesizing the mixed-screen video from the processed decoded frames includes synthesizing it from the processed decoded frames together with the encoded frames of any video whose input frame rate equals the preset frame rate.

Description of the Drawings

One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not limit the embodiments.

FIG. 1 is a schematic flowchart of the multi-party video screen-mixing method provided by the first embodiment of the present invention;

FIG. 2 is a schematic flowchart detailing step S104 of the multi-party video screen-mixing method provided by the first embodiment of the present invention;

FIG. 3 is a schematic flowchart of the steps preceding S103 in the multi-party video screen-mixing method provided by the first embodiment of the present invention;

FIG. 4 is a schematic module structure diagram of the multi-party video screen-mixing device provided by the second embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the network device provided by the third embodiment of the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are given in the embodiments to help the reader better understand the present application; however, the technical solutions claimed in the present application can be realized even without these details and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a multi-party video screen-mixing method: the encoded frames of N videos to be mixed and the input frame rate of each video are acquired; the encoded frames of each video whose input frame rate differs from a preset frame rate are decoded into decoded frames, and the feature parameters of that video are acquired; the feature parameters are input into a motion complexity model to obtain the video's motion complexity; frame interpolation or frame dropping is performed on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; and a mixed-screen video is synthesized from the processed decoded frames. Interpolating or dropping frames in the videos whose frame rate differs from the preset frame rate keeps the mixed-screen video well synchronized when displayed. Obtaining each video's motion complexity from the motion complexity model and using it to choose which frames to insert or drop makes those choices more reasonable, so the mixed-screen video better reflects the real motion and plays back more smoothly.

Note that the method in the embodiments of the present invention is executed by a server that receives the videos. The server may be a single server or a cluster of multiple servers; the description below refers to it simply as the server.

The specific flow of the multi-party video screen-mixing method provided by this embodiment is shown in FIG. 1 and includes the following steps:

S101: Acquire the encoded frames of N videos to be mixed and the input frame rate of each video, where N is a natural number greater than 1.

The input frame rate is the frame rate at which the server receives a video. Optionally, a low-pass filtering method can be used to estimate the input frame rate of each video; preferably, a Kalman filter is used to obtain the input frame rate of each video separately. The Kalman filter is an efficient recursive filter (autoregressive filter) that can estimate the state of a dynamic system from a series of incomplete and noisy measurements.
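As a minimal sketch of the low-pass option, the following Python function smooths the intervals between frame arrivals with an exponential moving average and converts the result to frames per second. The choice of an exponential moving average, the smoothing factor `alpha` and the use of arrival timestamps are assumptions for illustration; the patent does not specify which low-pass filter is used.

```python
def ema_frame_rate(arrival_times_s, alpha=0.1):
    """Estimate a stream's input frame rate (fps) with a low-pass filter.

    arrival_times_s: frame arrival timestamps in seconds (assumed input).
    alpha: smoothing factor of the exponential moving average (assumed value);
           smaller values give a steadier but slower-reacting estimate.
    """
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two frames")
    smoothed_interval = None
    for prev, cur in zip(arrival_times_s, arrival_times_s[1:]):
        interval = cur - prev
        if interval <= 0:
            continue  # skip duplicate or reordered timestamps
        if smoothed_interval is None:
            smoothed_interval = interval
        else:
            # Exponential moving average of the inter-frame interval.
            smoothed_interval = alpha * interval + (1 - alpha) * smoothed_interval
    if smoothed_interval is None:
        raise ValueError("no valid inter-frame intervals")
    return 1.0 / smoothed_interval
```

Smoothing the interval rather than the instantaneous rate `1 / interval` avoids large spikes when two frames arrive almost together after a network stall.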

It should be understood that, because the network environment is unstable, a video's input frame rate fluctuates with network conditions. A low-pass filter is used to obtain a relatively stable value of the input frame rate. A Kalman filter provides not only this relatively stable value but also the range over which the input frame rate varies under network influence. When the network becomes unstable, for example when playback stutters badly, the input frame rate should be fine-tuned, and the variation range obtained from the Kalman filter can guide this. For example, if a video's relatively stable frame rate is 25, it may be fine-tuned to 26, 27 or 24 according to the variation range, so that the input frame rate better matches reality and the mixed-screen video is displayed with better synchronization and smoothness.

Specifically, the server receives the RTP packets of the N videos to be mixed, parses them to obtain the encoded frames of each video, and estimates each video's input frame rate with the preset low-pass filtering method. For example, after parsing each video's RTP packets into encoded frames, the server can obtain that video's input frame rate by invoking a Kalman filter.
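The Kalman-filter option can be sketched as a scalar filter whose state is the frame rate itself. This is an illustrative sketch, not the patent's implementation: the random-walk model, the noise variances `q` and `r`, the per-second frame counts used as measurements, and the hypothetical `fps_range` and `fine_tuned_fps` helpers (which turn the filter's variance into a variation range and clamp the latest measurement to it, mirroring the 25 to 24/26/27 fine-tuning example) are all assumptions.

```python
import math


class FrameRateKalman:
    """Scalar Kalman filter tracking one stream's input frame rate (fps).

    Assumptions (not from the patent): the true rate follows a random walk
    with process variance q, and each measurement is the number of frames
    received in a 1-second window, with measurement variance r.
    """

    def __init__(self, initial_fps, initial_var=25.0, q=0.05, r=4.0):
        self.x = float(initial_fps)  # current frame-rate estimate
        self.p = float(initial_var)  # variance of the estimate
        self.q = q
        self.r = r

    def update(self, measured_fps):
        # Predict: the random-walk model keeps x; its uncertainty grows by q.
        self.p += self.q
        # Update: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (measured_fps - self.x)
        self.p *= 1 - k
        return self.x

    def fps_range(self, n_sigma=2.0):
        # One possible "variation range": the predicted spread of the next
        # measurement, estimate +/- n_sigma * sqrt(p + r).
        spread = n_sigma * math.sqrt(self.p + self.r)
        return self.x - spread, self.x + spread

    def fine_tuned_fps(self, latest_measured_fps, n_sigma=2.0):
        # Hypothetical fine-tuning rule: follow the latest measurement, but
        # clamp it to the estimated range and round to a whole fps value.
        lo, hi = self.fps_range(n_sigma)
        return round(min(max(latest_measured_fps, lo), hi))
```

In this plain scalar filter the variance `p` does not depend on the measured values, so the range mainly reflects the assumed noise levels `q` and `r`; estimating `r` adaptively from recent innovations would be one way to let the range follow actual network jitter.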

S102: Decode the encoded frames of each video whose input frame rate differs from the preset frame rate to obtain decoded frames, and acquire the feature parameters of that video.

The preset frame rate can be set according to the actual situation and is not limited here. For example, it may be the average of the input frame rates of the N videos, or the input frame rate shared by the largest number of the N videos, and so on. The feature parameters of a video are parameters that characterize its motion complexity. Optionally, they are the video's macroblock motion vectors, macroblock coding types and frame-level global motion vector, because these reflect the motion complexity of the video well; other feature parameters of the video may also be used, which is not specifically limited here.
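Both example ways of choosing the preset frame rate can be written in a few lines of Python. The function names are hypothetical, and rounding the mean to a whole frame rate is an added assumption.

```python
from collections import Counter
from statistics import mean


def preset_fps_mean(input_fps):
    """Preset frame rate as the average of the N input frame rates (rounded)."""
    return round(mean(input_fps))


def preset_fps_mode(input_fps):
    """Preset frame rate as the input frame rate shared by the most videos.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the rate seen first.
    """
    return Counter(input_fps).most_common(1)[0][0]
```

For rates of 25, 30, 25 and 15 fps, the mean rule gives 24 and the mode rule gives 25; the mode rule has the advantage that the most common streams need no frame adjustment at all.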

Optionally, the server first checks the input frame rate of each of the N videos. A video whose input frame rate equals the preset frame rate is left unprocessed; the encoded frames of a video whose input frame rate differs from the preset frame rate are decoded into decoded frames. Because videos at the preset frame rate need no subsequent frame interpolation or dropping, the server only needs to acquire the feature parameters of the videos whose input frame rate differs from the preset frame rate.

Specifically, the server can decode the videos whose input frame rate differs from the preset frame rate with a general-purpose open-source video codec library, such as ffmpeg, Xvid, X264 or ffdshow. Optionally, other video encoding and decoding tools may be used, which is not specifically limited here. While decoding a video, the server can obtain various kinds of motion information and use them as the video's feature parameters.
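The feature parameters named in S102 (macroblock motion vectors, macroblock coding types and a frame-level global motion vector) could be summarized into a fixed-length vector per frame, as sketched below. The `Macroblock` record and its type labels are hypothetical: the sketch assumes the decoder has already exposed this per-macroblock information and does not show how a particular library provides it. Approximating the global motion vector by the median block motion vector is also an assumption.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class Macroblock:
    # Hypothetical per-macroblock record exposed by the decoder.
    mv_x: float   # horizontal motion vector (pixels)
    mv_y: float   # vertical motion vector (pixels)
    mb_type: str  # assumed labels: "intra", "inter" or "skip"


def frame_features(mbs):
    """Build an illustrative feature vector for one decoded frame."""
    if not mbs:
        raise ValueError("frame has no macroblocks")
    n = len(mbs)
    mags = [math.hypot(mb.mv_x, mb.mv_y) for mb in mbs]
    intra_ratio = sum(mb.mb_type == "intra" for mb in mbs) / n
    skip_ratio = sum(mb.mb_type == "skip" for mb in mbs) / n
    # Frame-level global motion approximated by the median motion vector,
    # which is robust to a few outlier blocks.
    gmv_x = median(mb.mv_x for mb in mbs)
    gmv_y = median(mb.mv_y for mb in mbs)
    return [
        sum(mags) / n,             # mean motion-vector magnitude
        max(mags),                 # peak motion-vector magnitude
        intra_ratio,               # share of intra-coded macroblocks
        skip_ratio,                # share of skipped macroblocks
        math.hypot(gmv_x, gmv_y),  # global motion-vector magnitude
    ]
```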

S103: Input the feature parameters into the motion complexity model to obtain the motion complexity of each video whose input frame rate differs from the preset frame rate.

The motion complexity model can be obtained by first acquiring the feature parameters of multiple video samples and then feeding them into a pre-built neural network for training. The neural network may be, for example, a deep neural network, a convolutional neural network, a deep belief network or a recurrent neural network.

Specifically, the server inputs the acquired feature parameters of each video whose input frame rate differs from the preset frame rate into the motion complexity model and obtains that video's motion complexity.
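The patent leaves the network type and training framework open. As a stand-in, the sketch below trains a tiny one-hidden-layer network with NumPy to map feature vectors to a motion complexity in (0, 1) through a sigmoid output, matching the normalized 0 to 1 scale used in S104. The architecture, the mean-squared-error loss, the learning rate and the labeled training targets are all assumptions; the patent does not describe how training samples are labeled.

```python
import numpy as np


class MotionComplexityMLP:
    """Tiny one-hidden-layer network: feature vector -> complexity in (0, 1).

    Illustrative stand-in for the patent's deep-learning model.
    """

    def __init__(self, n_features, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def _forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        out = 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))  # sigmoid
        return h, out

    def predict(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        return self._forward(x)[1].ravel()

    def fit(self, x, y, lr=0.5, epochs=3000):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        n = len(x)
        for _ in range(epochs):
            h, out = self._forward(x)
            # Backpropagate the mean squared error through sigmoid and tanh.
            d_out = (out - y) * out * (1 - out) * (2.0 / n)
            d_h = (d_out @ self.w2.T) * (1 - h ** 2)
            self.w2 -= lr * h.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.w1 -= lr * x.T @ d_h
            self.b1 -= lr * d_h.sum(axis=0)
        return self
```

A model trained this way is then queried per frame with `predict(features)`; a real deployment would more likely use an established deep-learning framework, as the text suggests.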

S104: Perform frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames.

Frame interpolation means inserting decoded frames to raise a video's frame rate; frame dropping means discarding decoded frames to lower it.

Specifically, the server compares a video's input frame rate with the preset frame rate. If the input frame rate is greater, frames are dropped so that the frame rate after dropping equals or approaches the preset frame rate; if the input frame rate is smaller, frames are interpolated so that the frame rate after interpolation equals or approaches the preset frame rate.

When inserting or dropping decoded frames, the server takes the video's motion complexity into account.

In a specific example, performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity in S104, as shown in FIG. 2, specifically includes the following steps:

S1041: If the input frame rate is greater than the preset frame rate, drop decoded frames, giving priority to decoded frames whose motion complexity is less than or equal to a first preset value.

S1042: If the input frame rate is less than the preset frame rate, interpolate frames, giving priority to decoded frames whose motion complexity is greater than or equal to a second preset value.

Optionally, motion complexity can be expressed as a normalized value between 0 and 1 and then divided into categories by value: for example, [0, 0.1] indicates static content, (0.1, 0.5] gentle motion, (0.5, 0.8] fairly complex motion and (0.8, 1.0] very complex motion. The categories and their value ranges can be set according to actual needs and are not specifically limited here.
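The example category boundaries translate directly into a lookup. The category names follow the example above; the function name is hypothetical.

```python
def complexity_category(c):
    """Map a normalized motion complexity in [0, 1] to the example categories."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("motion complexity must lie in [0, 1]")
    if c <= 0.1:
        return "static"          # [0, 0.1]
    if c <= 0.5:
        return "gentle"          # (0.1, 0.5]
    if c <= 0.8:
        return "fairly complex"  # (0.5, 0.8]
    return "very complex"        # (0.8, 1.0]
```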

In S1041, the first preset value can be set according to the actual situation, for example to 0.5; it is not specifically limited here.

Specifically, the server compares the video's input frame rate with the preset frame rate. If the input frame rate is greater, the server drops decoded frames, discarding first those whose motion complexity is less than or equal to the first preset value. Dropping these frames first means that at least some of them are discarded so that the video's frame rate equals or approaches the preset frame rate; if the frame rate still does not equal or approach the preset frame rate after all such frames have been discarded, the server continues by dropping decoded frames whose motion complexity is greater than the first preset value.
Optionally, frames at or below the first preset value can be further ranked by motion complexity so that frames with lower complexity are dropped first; alternatively, they can be dropped in decoding order without further ranking. Either choice can be made according to actual needs and is not limited here. Optionally, when the server goes on to drop frames whose motion complexity is above the first preset value, a third preset value can be set: frames whose motion complexity is greater than or equal to the third preset value are kept as far as possible, and frames whose motion complexity is less than or equal to the third preset value are dropped first. The third preset value can be set according to actual needs, for example to 0.8, and is not specifically limited here.
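The dropping priority described above can be sketched as a sort over priority tiers. This is an illustrative reading of the text, not the patent's code: frames are represented only by their motion complexities, the function returns the indices of the frames to keep in display order, and the `by_complexity` flag (a hypothetical parameter) switches between the two optional orderings inside a tier, lowest complexity first or decoding order.

```python
def drop_frames(complexities, n_drop, first_preset=0.5, third_preset=0.8,
                by_complexity=True):
    """Return indices of frames to keep after dropping n_drop frames.

    complexities: per-frame motion complexity in [0, 1], in display order.
    Tier 0 (<= first_preset) is dropped first, then tier 1 (<= third_preset);
    tier 2 (> third_preset) is kept as long as possible.
    """
    if not 0 <= n_drop <= len(complexities):
        raise ValueError("n_drop out of range")

    def tier(c):
        if c <= first_preset:
            return 0
        if c <= third_preset:
            return 1
        return 2

    def priority(i):
        c = complexities[i]
        # Inside a tier: lowest complexity first, or plain decoding order.
        return (tier(c), c if by_complexity else 0.0, i)

    dropped = set(sorted(range(len(complexities)), key=priority)[:n_drop])
    return [i for i in range(len(complexities)) if i not in dropped]
```

Because the tiers are monotonic in motion complexity, they only change the result when `by_complexity=False`; with the refinement enabled, sorting by complexity alone would give the same order.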

In S1042, the second preset value can be set according to actual needs, for example to 0.1; it is not specifically limited here.

Specifically, the server compares the video's input frame rate with the preset frame rate and, if the input frame rate is smaller, performs frame interpolation on the decoded frames, giving priority to decoded frames whose motion complexity is greater than or equal to the second preset value. This means that interpolation is performed for at least some of those frames so that the video's frame rate equals or approaches the preset frame rate; if the frame rate still does not equal or approach the preset frame rate after interpolating for those frames, the server continues with decoded frames whose motion complexity is less than the second preset value.
Optionally, frames at or above the second preset value can be further ranked by motion complexity so that frames with higher complexity are interpolated first; alternatively, they can be handled in decoding order without further ranking, according to actual needs. Note that when interpolating for a decoded frame whose motion complexity is greater than or equal to the second preset value, the inserted frame is computed from that decoded frame and its preceding or following frame and is then inserted between them. The inserted frame can be computed from some of the video's feature parameters (such as macroblock motion vectors), which makes playback after insertion smoother. Optionally, when the server goes on to interpolate for decoded frames whose motion complexity is less than the second preset value, it can simply copy those frames to reduce computation, or it can use the calculation method above; this is not specifically limited here.
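Frame interpolation with the priority described above might look like the following sketch. To keep it short it makes several assumptions not in the patent: at most one frame is inserted after any decoded frame, frames are ranked highest complexity first, the "computed" inserted frame is a plain average of a frame and its successor (a simple stand-in for the macroblock-motion-vector-based interpolation the text mentions), and lower-priority insertions, as well as insertions after the last frame, just duplicate the frame.

```python
import numpy as np


def insert_frames(frames, complexities, n_insert, second_preset=0.1):
    """Return a new frame list with n_insert frames added (illustrative).

    frames: decoded frames as equally shaped numpy arrays, in display order.
    complexities: per-frame motion complexity in [0, 1].
    """
    n = len(frames)
    if len(complexities) != n:
        raise ValueError("one complexity value per frame is required")
    if not 0 <= n_insert <= n:
        raise ValueError("n_insert out of range")
    # Frames >= second_preset come first, highest complexity first; ties keep
    # decoding order.
    order = sorted(range(n),
                   key=lambda i: (complexities[i] < second_preset,
                                  -complexities[i], i))
    chosen = set(order[:n_insert])
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if i not in chosen:
            continue
        if complexities[i] >= second_preset and i + 1 < n:
            # Stand-in for motion-compensated interpolation: average with the
            # following frame and insert the result between the two.
            mid = (frame.astype(np.float32) + frames[i + 1].astype(np.float32)) / 2
            out.append(mid.astype(frame.dtype))
        else:
            # Low-complexity frame (or no following frame): cheap duplicate.
            out.append(frame.copy())
    return out
```

Averaging produces ghosting on fast motion, which is exactly where the text recommends computing the inserted frame from macroblock motion vectors instead.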

Note that when frames are dropped from or inserted into a video whose input frame rate differs from the preset frame rate, so that its frame rate becomes equal or close to the preset frame rate, the number of frames to drop or insert can be determined from the difference between the input frame rate and the preset frame rate. For example, if the input frame rate of the video is 27 and the preset frame rate is 25, then 27 - 25 = 2 frames should be dropped per second.
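The frame-count rule, combined with the dropping priority (frames at or below the first preset value go first), can be sketched as follows. The per-frame complexities over a one-second window, the `first_threshold` parameter, and the function names are hypothetical. The fallback of dropping the calmest remaining frames when too few frames fall at or below the first preset value is an assumption that mirrors the interpolation rule; it is not stated in this passage.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def frames_to_drop(complexities: Sequence[float], input_fps: int,
                   target_fps: int, first_threshold: float) -> List[int]:
    """Return the indices of the decoded frames to discard from a one-second window."""
    count = max(0, input_fps - target_fps)  # e.g. 27 - 25 = 2 frames

    def order_key(i: int):
        # Least motion first; ties broken by decoding order.
        return (complexities[i], i)

    low = sorted((i for i, c in enumerate(complexities) if c <= first_threshold),
                 key=order_key)
    # Assumed fallback: if too few low-motion frames exist, drop the calmest of the rest.
    rest = sorted((i for i, c in enumerate(complexities) if c > first_threshold),
                  key=order_key)
    return sorted((low + rest)[:count])


def drop_frames(frames: Sequence[T], complexities: Sequence[float], input_fps: int,
                target_fps: int, first_threshold: float) -> List[T]:
    dropped = set(frames_to_drop(complexities, input_fps, target_fps, first_threshold))
    return [f for i, f in enumerate(frames) if i not in dropped]
```

With 27 frames per second, a 25 fps target, and frames 3 and 10 as the only ones at or below the threshold, `frames_to_drop` returns `[3, 10]` and `drop_frames` keeps the other 25 frames.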

S105: Synthesize the mixed-screen video from the processed decoded frames.

Here, screen mixing means compositing the videos to be mixed into a single screen.

Specifically, the server mixes the decoded frames that have undergone frame dropping or interpolation, synthesizes the mixed-screen video, and then encodes and outputs it. It should be understood that a video whose input frame rate already equals the preset frame rate needs no frame dropping or interpolation. If such a video exists, the server mixes it together with the processed decoded frames, synthesizes the mixed-screen video, and then outputs it.
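One simple way to composite the rate-aligned streams is a near-square grid, sketched below. The grid layout, the frame representation (equal-sized 2-D lists of samples), and the function names are assumptions for illustration; the passage does not specify a layout, and the encoding step that follows mixing is omitted.

```python
import math
from typing import List, Sequence

Frame = List[List[int]]  # hypothetical: rows of samples; all inputs share one size


def mix_screen(frames: Sequence[Frame], fill: int = 0) -> Frame:
    """Tile N equal-sized frames into one canvas, left to right, top to bottom."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    h, w = len(frames[0]), len(frames[0][0])
    # Unused grid cells keep the fill value.
    canvas = [[fill] * (cols * w) for _ in range(rows * h)]
    for k, frame in enumerate(frames):
        top, left = (k // cols) * h, (k % cols) * w
        for y in range(h):
            canvas[top + y][left:left + w] = frame[y]
    return canvas


def mix_streams(streams: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Mix frame k of every stream into output frame k.

    Assumes the streams already share the preset frame rate; zip() stops at the
    shortest stream.
    """
    return [mix_screen(list(tick)) for tick in zip(*streams)]
```

Because frame dropping and interpolation have already equalized the frame rates, pairing frames by index is enough to keep the tiles in step; without that step, a faster stream would run ahead of the others.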

Compared with the prior art, the multi-party video screen mixing method provided by this embodiment performs frame interpolation or frame dropping on each video whose input frame rate differs from the preset frame rate, bringing its frame rate equal or close to the preset frame rate. As a result, all videos in the mixed screen have the same or similar frame rates, which improves their synchronization. In addition, the motion complexity of the video is obtained from the motion complexity model and taken into account when choosing which frames to insert or drop. The inserted or dropped frames are therefore more appropriate, the mixed video better reflects the actual motion, and the display quality of mixed-screen communication video is improved.

In a specific example, before S103, that is, before the feature parameters are input into the motion complexity model, the method further includes the following steps, as shown in FIG. 3:

S201: Obtain video training samples and extract their feature parameters.

S202: Input the feature parameters of the video training samples into a deep learning model for training.

S203: Use the trained deep learning model as the motion complexity model.

In S201, the video training samples may consist of video data collected during real calls in various scenarios, or of videos recorded specifically for training; this is not limited here.

In S202, the deep learning model is a learning model built on a deep neural network that learns the motion information of the video in order to produce its motion complexity. Optionally, the deep learning model may be built with a general-purpose open-source deep learning framework such as TensorFlow, Torch, or Caffe; that is, the feature parameters of the video training samples are input into the open-source framework for training.

Specifically, the server obtains the video training samples, encodes and decodes them, extracts their feature parameters, and inputs these parameters into the deep learning model for training. Once the model has been trained to a preset accuracy, the trained deep learning model is used as the motion complexity model.
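The "train until a preset accuracy is reached" loop can be sketched without any framework. To keep the example self-contained, a linear regressor trained by batch gradient descent stands in for the deep neural network; in practice the model would be built in one of the frameworks named above. Treating motion complexity as a scalar regression target, using mean squared error as the accuracy criterion, and all names and default values are assumptions.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_complexity_model(features: Sequence[Sequence[float]],
                           labels: Sequence[float], lr: float = 0.1,
                           target_mse: float = 1e-4,
                           max_epochs: int = 10_000) -> Tuple[List[float], float]:
    """Fit labels ~ w.x + b by batch gradient descent until MSE <= target_mse."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        errs = [predict(w, b, x) - y for x, y in zip(features, labels)]
        mse = sum(e * e for e in errs) / n
        if mse <= target_mse:  # stand-in for "trained to the preset accuracy"
            break
        # Gradient of the mean squared error with respect to each weight and the bias.
        for j in range(dim):
            w[j] -= lr * 2 * sum(e * x[j] for e, x in zip(errs, features)) / n
        b -= lr * 2 * sum(errs) / n
    return w, b
```

A deep model replaces the linear map with stacked nonlinear layers, but the stopping rule is the same: keep updating parameters until the error on the training samples reaches the preset accuracy, or until a maximum number of epochs has elapsed.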

Training a deep learning model on video training samples yields the motion complexity model, from which the motion complexity of any video whose input frame rate differs from the preset frame rate can be obtained. Taking that motion complexity into account when inserting or dropping frames makes the mixed-screen video more faithful to the actual motion and improves its display synchronization and smoothness.

The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one or split into several; as long as the same logical relationship is preserved, all such variants fall within the protection scope of this patent. Adding insignificant modifications or designs to the algorithm or process without changing its core design likewise falls within the protection scope of this patent.

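The second embodiment of the present invention relates to a multi-party video screen mixing device which, as shown in FIG. 4, includes an encoded frame acquisition module 301, a parameter acquisition module 302, a complexity acquisition module 303, a frame processing module 304, and a video synthesis module 305.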

The encoded frame acquisition module 301 is configured to acquire the encoded frames of N videos to be mixed and the input frame rate of each video, where N is a natural number greater than 1;

The parameter acquisition module 302 is configured to decode the encoded frames of each video whose input frame rate differs from the preset frame rate to obtain decoded frames, and to acquire the feature parameters of that video;

The complexity acquisition module 303 is configured to input the feature parameters into the motion complexity model to obtain the motion complexity of the video whose input frame rate differs from the preset frame rate;

The frame processing module 304 is configured to perform frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity, obtaining processed decoded frames;

The video synthesis module 305 is configured to synthesize the mixed-screen video from the processed decoded frames.

Further, the frame processing module 304 is also configured to:

if the input frame rate is greater than the preset frame rate, perform frame dropping on the decoded frames, where decoded frames whose motion complexity is less than or equal to a first preset value are dropped first;

if the input frame rate is less than the preset frame rate, perform frame interpolation on the decoded frames, where decoded frames whose motion complexity is greater than or equal to a second preset value are interpolated first.

Further, the parameter acquisition module 302 is also configured to acquire, as the feature parameters, the macroblock motion vectors, macroblock coding types, and frame-level global motion vector of the video whose input frame rate differs from the preset frame rate.
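A fixed-length feature vector can be built from these three parameters, as sketched below. The macroblock representation, the type labels, and the particular statistics (mean and maximum motion-vector magnitude, mean motion relative to the global motion vector, intra and skip ratios, global motion magnitude) are hypothetical choices for illustration. The description names the inputs but not how they are summarized.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Macroblock:
    mv: Tuple[float, float]  # hypothetical: motion vector in pixels; (0, 0) for intra
    mb_type: str             # hypothetical labels: "intra", "inter" or "skip"


def frame_features(mbs: List[Macroblock],
                   global_mv: Tuple[float, float]) -> List[float]:
    """Summarize one frame's macroblock data into a fixed-length vector."""
    n = len(mbs)
    mags = [math.hypot(*mb.mv) for mb in mbs]
    # Motion left over after removing global (e.g. camera) motion.
    local = [math.hypot(mb.mv[0] - global_mv[0], mb.mv[1] - global_mv[1]) for mb in mbs]
    intra_ratio = sum(mb.mb_type == "intra" for mb in mbs) / n
    skip_ratio = sum(mb.mb_type == "skip" for mb in mbs) / n
    return [sum(mags) / n, max(mags), sum(local) / n,
            intra_ratio, skip_ratio, math.hypot(*global_mv)]
```

A fixed length matters because the motion complexity model expects the same input size for every frame, regardless of resolution or macroblock count.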

Further, the multi-party video screen mixing device provided by this embodiment also includes a model determination module configured to:

acquire video training samples and extract their feature parameters;

input the feature parameters of the video training samples into a deep learning model for training;

use the trained deep learning model as the motion complexity model.

Further, the model determination module is also configured to input the feature parameters of the video training samples into an open-source deep learning framework for training.

Further, the encoded frame acquisition module 301 is also configured to obtain the input frame rate of each video by Kalman filtering.
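A scalar Kalman filter over frame arrival times is one way to obtain a smoothed input frame rate, sketched below. The random-walk state model, the use of 1/Δt as the measurement, the noise values `q` and `r`, and the names are assumptions; the description does not give the filter's design.

```python
from typing import Sequence


class FrameRateKalman:
    """1-D Kalman filter whose state is the frame rate in frames per second."""

    def __init__(self, initial_fps: float = 25.0, initial_var: float = 100.0,
                 q: float = 0.01, r: float = 4.0) -> None:
        self.x = initial_fps  # state estimate
        self.p = initial_var  # variance of the estimate
        self.q = q            # process noise: how fast the true rate may drift
        self.r = r            # noise of a single 1/dt measurement

    def update(self, measured_fps: float) -> float:
        self.p += self.q                       # predict: rate unchanged, uncertainty grows
        k = self.p / (self.p + self.r)         # Kalman gain
        self.x += k * (measured_fps - self.x)  # correct towards the measurement
        self.p *= 1.0 - k
        return self.x


def estimate_input_fps(timestamps: Sequence[float], **kwargs: float) -> float:
    """Estimate a stream's input frame rate from frame arrival times in seconds."""
    kf = FrameRateKalman(**kwargs)
    for t0, t1 in zip(timestamps, timestamps[1:]):
        dt = t1 - t0
        if dt > 0:  # ignore duplicate or out-of-order timestamps
            kf.update(1.0 / dt)
    return kf.x
```

A small `q` relative to `r` makes the estimate steady under network jitter at the cost of reacting more slowly when a participant's encoder actually changes its frame rate.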

Further, the video synthesis module 305 is also configured to synthesize the mixed-screen video from the processed decoded frames together with the encoded frames of any video whose input frame rate equals the preset frame rate.

This embodiment is the device embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The technical details given in the first embodiment remain valid in this embodiment and are not repeated here. Likewise, the technical details given in this embodiment also apply to the first embodiment.

It is worth noting that each module in this embodiment is a logical module. In practice, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem it addresses are not introduced in this embodiment; this does not mean that no other units exist in this embodiment.

The third embodiment of the present invention relates to a network device which, as shown in FIG. 5, includes at least one processor 401 and a memory 402 communicatively connected to the at least one processor 401. The memory 402 stores instructions executable by the at least one processor 401; when executed by the at least one processor 401, the instructions enable it to perform the multi-party video screen mixing method described above.

The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges and links together the various circuits of the one or more processors 401 and the memory 402. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 401.

The processor 401 is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 402 may be used to store data used by the processor 401 when performing operations.

The fourth embodiment of the present invention relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method embodiments.

That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (such as a single-chip microcomputer or a chip) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the present invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.

Claims

1. A multi-party video screen mixing method, comprising the following steps:

acquiring encoded frames of N videos to be mixed and an input frame rate of each video, wherein N is a natural number greater than 1;

decoding the encoded frames of the video whose input frame rate differs from a preset frame rate to obtain decoded frames, and acquiring feature parameters of that video, wherein the feature parameters comprise: a macroblock motion vector, a macroblock coding type, and a frame-level global motion vector;

inputting the feature parameters into a motion complexity model to obtain the motion complexity of the video whose input frame rate differs from the preset frame rate;

performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; wherein the frame dropping comprises dropping decoded frames with lower motion complexity, namely preferentially discarding at least some of the decoded frames whose motion complexity is less than or equal to a first preset value, so that the input frame rate of the video becomes equal or close to the preset frame rate; and the frame interpolation comprises interpolating at decoded frames with higher motion complexity, namely preferentially performing frame interpolation on at least some of the decoded frames whose motion complexity is greater than or equal to a second preset value, so that the input frame rate of the video becomes equal or close to the preset frame rate;

and synthesizing the mixed-screen video from the processed decoded frames.

2. The multi-party video screen mixing method according to claim 1, wherein performing frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity comprises:

if the input frame rate is greater than the preset frame rate, performing frame dropping on the decoded frames, wherein decoded frames whose motion complexity is less than or equal to the first preset value are dropped preferentially;

and if the input frame rate is less than the preset frame rate, performing frame interpolation on the decoded frames, wherein decoded frames whose motion complexity is greater than or equal to the second preset value are interpolated preferentially.

3. The multi-party video screen mixing method according to claim 1, further comprising, before inputting the feature parameters into the motion complexity model:

acquiring video training samples and extracting feature parameters of the video training samples;

inputting the feature parameters of the video training samples into a deep learning model for training;

and using the trained deep learning model as the motion complexity model.

4. The multi-party video screen mixing method according to claim 3, wherein inputting the feature parameters of the video training samples into a deep learning model for training specifically comprises:

inputting the feature parameters of the video training samples into an open-source deep learning framework for training.

5. The multi-party video screen mixing method according to claim 1, wherein acquiring the encoded frames of the N videos to be mixed and the input frame rate of each video comprises:

acquiring the input frame rate of each video by Kalman filtering.

6. The multi-party video screen mixing method according to claim 1, wherein synthesizing the mixed-screen video from the processed decoded frames comprises:

synthesizing the mixed-screen video from the processed decoded frames and the encoded frames of any video whose input frame rate equals the preset frame rate.

7. A multi-party video screen mixing device, comprising:

an encoded frame acquisition module, configured to acquire encoded frames of N videos to be mixed and an input frame rate of each video, wherein N is a natural number greater than 1;

a parameter acquisition module, configured to decode the encoded frames of the video whose input frame rate differs from a preset frame rate to obtain decoded frames, and to acquire feature parameters of that video, wherein the feature parameters comprise: a macroblock motion vector, a macroblock coding type, and a frame-level global motion vector;

a complexity acquisition module, configured to input the feature parameters into a motion complexity model to obtain the motion complexity of the video whose input frame rate differs from the preset frame rate;

a frame processing module, configured to perform frame interpolation or frame dropping on the decoded frames according to the preset frame rate and the motion complexity to obtain processed decoded frames; wherein the frame dropping comprises dropping decoded frames with lower motion complexity, namely preferentially discarding at least some of the decoded frames whose motion complexity is less than or equal to a first preset value, so that the input frame rate of the video becomes equal or close to the preset frame rate; and the frame interpolation comprises interpolating at decoded frames with higher motion complexity, namely preferentially performing frame interpolation on at least some of the decoded frames whose motion complexity is greater than or equal to a second preset value, so that the input frame rate of the video becomes equal or close to the preset frame rate;

and a video synthesis module, configured to synthesize the mixed-screen video from the processed decoded frames.

8. A network device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-party video screen mixing method of any one of claims 1-6.

9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the multi-party video screen mixing method according to any one of claims 1 to 6.