CN107993217B - Video data real-time processing method and device and computing equipment


Info

Publication number
CN107993217B
Authority
CN
China
Prior art keywords
layer
frame image
current frame
video data
neural network
Prior art date
Legal status
Active
Application number
CN201711405700.9A
Other languages
Chinese (zh)
Other versions
CN107993217A (en)
Inventor
Dong Jian (董健)
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201711405700.9A priority Critical patent/CN107993217B/en
Publication of CN107993217A publication Critical patent/CN107993217A/en
Application granted granted Critical
Publication of CN107993217B publication Critical patent/CN107993217B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging

Abstract

The invention discloses a video data real-time processing method and apparatus, and a computing device. The method processes the frame images contained in the video data in groups and comprises the following steps: acquiring the current frame image of a video in real time; judging whether the current frame image is the 1st frame image of any group; if so, inputting the current frame image into a neural network to obtain a processed current frame image; if not, inputting the current frame image into the neural network, computing only as far as the i-th convolution layer of the neural network to obtain the result of the i-th convolution layer, acquiring the result of the j-th deconvolution layer that was obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network, and directly performing image fusion on the i-th convolution layer result and the j-th deconvolution layer result to obtain the processed current frame image, where i and j are natural numbers; outputting the processed current frame image; and repeating the above steps until all frame images in the video data have been processed.

Description

Video data real-time processing method and device and computing equipment
Technical Field
The invention relates to the field of image processing, and in particular to a video data real-time processing method and apparatus, and a computing device.
Background
With the development of science and technology, image acquisition devices have improved steadily. The video they record is clearer, and its resolution and display quality have also improved greatly. While a video is being recorded, it can also be processed to meet various personalized requirements of the user.
When a video is processed, each frame image is often handled as an independent image, without considering the continuity between preceding and following frames. Every frame must then be processed in full, so processing is slow and time-consuming.
Therefore, a real-time video data processing method is needed to increase the speed of video processing.
Disclosure of Invention
In view of the above, the present invention is proposed in order to provide a video data real-time processing method and apparatus, and a computing device, which overcome the above problems or at least partially solve them.
According to one aspect of the present invention, there is provided a video data real-time processing method, which processes the frame images contained in the video data in groups, the method comprising:
acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, acquiring a current frame image in a currently played video in real time;
judging whether the current frame image is the 1 st frame image of any group;
if so, inputting the current frame image into the trained neural network, and obtaining a processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network;
if not, inputting the current frame image into the trained neural network, after calculating to the ith convolution layer of the neural network to obtain the calculation result of the ith convolution layer, acquiring the calculation result of the jth deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the calculation result of the ith convolution layer and the calculation result of the jth deconvolution layer to obtain a processed current frame image; wherein i and j are natural numbers;
outputting the processed current frame image;
and repeatedly executing the steps until the processing of all the frame images in the video data is finished.
Optionally, after determining that the current frame image is not the 1st frame image of any group, the method further comprises:
calculating the frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
determining the values of i and j according to the frame distance; wherein the layer distance between the i-th convolution layer and the last convolution layer is inversely proportional to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer is directly proportional to the frame distance.
Optionally, the method further comprises: presetting the correspondence between the frame distance and the values of i and j.
Optionally, after directly performing image fusion on the operation result of the i-th convolutional layer and the operation result of the j-th deconvolution layer, the method further includes:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting the image fusion result into an output layer to obtain a processed current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the (j+1)-th deconvolution layer, and obtaining the processed current frame image through the operations of the remaining deconvolution layers and the output layer.
Optionally, inputting the current frame image into a trained neural network, and after the operation of all convolutional layers and deconvolution layers of the neural network, obtaining the processed current frame image further includes: after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
Optionally, before the operation on the ith convolutional layer of the neural network obtains the operation result of the ith convolutional layer, the method further includes: after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
Optionally, each group of video data contains n frame images; wherein n is a fixed preset value.
Optionally, the method further comprises:
and displaying the processed video data in real time.
Optionally, the method further comprises:
and uploading the processed video data to a cloud server.
Optionally, uploading the processed video data to a cloud server further includes:
and uploading the processed video data to a cloud video platform server so that the cloud video platform server can display the video data on a cloud video platform.
Optionally, uploading the processed video data to a cloud server further includes:
and uploading the processed video data to a cloud live broadcast server so that the cloud live broadcast server can push the video data to a client of a watching user in real time.
Optionally, uploading the processed video data to a cloud server further includes:
and uploading the processed video data to a cloud public account server, so that the cloud public account server pushes the video data to the clients of users following the public account.
According to another aspect of the present invention, there is provided a video data real-time processing apparatus which processes the frame images contained in the video data in groups, comprising:
the acquisition module is suitable for acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, acquiring a current frame image in a currently played video in real time;
the judging module is suitable for judging whether the current frame image is the 1 st frame image of any group, and if so, the first processing module is executed; otherwise, executing the second processing module;
the first processing module is suitable for inputting the current frame image into a trained neural network, and obtaining a processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network;
the second processing module is suitable for inputting the current frame image into the trained neural network, computing as far as the i-th convolution layer of the neural network to obtain the result of the i-th convolution layer, acquiring the result of the j-th deconvolution layer obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network, and directly performing image fusion on the i-th convolution layer result and the j-th deconvolution layer result to obtain the processed current frame image; wherein i and j are natural numbers;
the output module is suitable for outputting the processed current frame image;
and the loop module is suitable for repeatedly executing the acquiring module, the judging module, the first processing module, the second processing module and/or the output module until the processing of all frame images in the video data is completed.
Optionally, the apparatus further comprises:
the frame spacing calculation module is suitable for calculating the frame spacing between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
the determining module is suitable for determining the values of i and j according to the frame distance; wherein the layer distance between the i-th convolution layer and the last convolution layer is inversely proportional to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer is directly proportional to the frame distance.
Optionally, the apparatus further comprises:
and the presetting module is suitable for presetting the correspondence between the frame distance and the values of i and j.
Optionally, the second processing module is further adapted to:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting the image fusion result into an output layer to obtain a processed current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the (j+1)-th deconvolution layer, and obtaining the processed current frame image through the operations of the remaining deconvolution layers and the output layer.
Optionally, the first processing module is further adapted to:
after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
Optionally, the second processing module is further adapted to:
after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
Optionally, each group of video data contains n frame images; wherein n is a fixed preset value.
Optionally, the apparatus further comprises:
and the display module is suitable for displaying the processed video data in real time.
Optionally, the apparatus further comprises:
and the uploading module is suitable for uploading the processed video data to the cloud server.
Optionally, the upload module is further adapted to:
and uploading the processed video data to a cloud video platform server so that the cloud video platform server can display the video data on a cloud video platform.
Optionally, the upload module is further adapted to:
and uploading the processed video data to a cloud live broadcast server so that the cloud live broadcast server can push the video data to a client of a watching user in real time.
Optionally, the upload module is further adapted to:
and uploading the processed video data to a cloud public account server, so that the cloud public account server pushes the video data to the clients of users following the public account.
According to yet another aspect of the present invention, there is provided a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the video data real-time processing method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, where the executable instruction causes a processor to perform operations corresponding to the video data real-time processing method.
According to the video data real-time processing method and apparatus and the computing device, the current frame image of a video being shot and/or recorded by an image acquisition device, or of a video currently being played, is acquired in real time; whether the current frame image is the 1st frame image of any group is judged; if so, the current frame image is input into the trained neural network and the processed current frame image is obtained after the operations of all convolution layers and deconvolution layers of the neural network; if not, the current frame image is input into the trained neural network and computed only as far as the i-th convolution layer to obtain the result of the i-th convolution layer, the result of the j-th deconvolution layer obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network is acquired, and the two results are directly fused to obtain the processed current frame image, where i and j are natural numbers; the processed current frame image is output; and the above steps are repeated until all frame images in the video data have been processed. The invention makes full use of the continuity and correlation between the frame images of the video data: when the video data is processed in real time, it is processed in groups, the 1st frame image of each group passes through all convolution layers and deconvolution layers of the neural network, while the other frame images are computed only as far as the i-th convolution layer and reuse the j-th deconvolution layer result of the 1st frame image for image fusion. This greatly reduces the amount of computation performed by the neural network and increases the speed of real-time video data processing.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a flow diagram of a method of real-time processing of video data according to an embodiment of the invention;
fig. 2 shows a flow chart of a method of real-time processing of video data according to another embodiment of the invention;
fig. 3 shows a functional block diagram of a video data real-time processing apparatus according to an embodiment of the present invention;
fig. 4 shows a functional block diagram of a video data real-time processing apparatus according to another embodiment of the present invention;
FIG. 5 illustrates a schematic structural diagram of a computing device, according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flow chart of a method for real-time processing of video data according to an embodiment of the invention. As shown in fig. 1, the real-time processing method of video data specifically includes the following steps:
step S101, acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, the current frame image in the currently played video is acquired in real time.
In this embodiment, a mobile terminal is taken as an example of the image acquisition device. The current frame image captured by the camera of the mobile terminal while recording or shooting video is acquired in real time. Besides videos shot and/or recorded by the image acquisition device, the current frame image of a currently played video can also be acquired in real time.
This embodiment makes use of the continuity and correlation between the frame images of the video data: when the frame images are processed, they are processed in groups. When grouping the frame images, the association between them needs to be considered, and closely associated frame images are placed in the same group. Different groups may contain the same or different numbers of frame images; assuming each group contains n frame images, n may be a fixed or non-fixed value, set according to the implementation. When the current frame image is acquired in real time, it is assigned to a group, i.e. it is determined whether it is a frame image of the current group or the 1st frame image of a new group. Specifically, grouping may be performed according to the association between the current frame image and the preceding frame image or images: for example, using a tracking algorithm, if the tracking result for the current frame image is valid, the current frame image is taken as a frame image of the current group, and if the tracking result is invalid, the current frame image is actually the 1st frame image of a new group. Alternatively, every two or three adjacent frame images may simply be grouped in order (see the sketch below); taking groups of three frames as an example, the 1st frame image of the video data is the 1st frame image of the first group, the 2nd frame image is the 2nd frame image of the first group, the 3rd frame image is the 3rd frame image of the first group, the 4th frame image is the 1st frame image of the second group, the 5th frame image is the 2nd frame image of the second group, the 6th frame image is the 3rd frame image of the second group, and so on. The specific grouping manner is determined by the implementation and is not limited here.
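To make the fixed-size grouping rule concrete, here is a minimal sketch in Python; the group size of 3, the 0-based frame indexing and the helper names are illustrative assumptions rather than part of the patent.

```python
# Minimal sketch of the fixed-size grouping rule described above (an assumption:
# groups of n consecutive frames; n = 3 is only an example).
GROUP_SIZE = 3  # n, a preset value

def position_in_group(frame_index: int, group_size: int = GROUP_SIZE) -> int:
    """Position of a frame inside its group (0 means it is the 1st frame of the group)."""
    return frame_index % group_size

def is_first_of_group(frame_index: int, group_size: int = GROUP_SIZE) -> bool:
    """True when the frame is the 1st frame of its group and must run the full network."""
    return position_in_group(frame_index, group_size) == 0

# With n = 3, frames 0, 3, 6, ... are the 1st frames of their groups.
assert [is_first_of_group(k) for k in range(6)] == [True, False, False, True, False, False]
```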
Step S102, judging whether the current frame image is the 1 st frame image of any group.
It is judged whether the current frame image is the 1st frame image of any of the groups; if so, step S103 is executed; otherwise, step S104 is executed. The specific manner of judgment depends on the grouping manner used.
Step S103, inputting the current frame image into the trained neural network, and obtaining the processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network.
If the current frame image is the 1st frame image of a group, it is input into the trained neural network, which performs all convolution layer operations and all deconvolution layer operations on it in sequence and finally produces the processed current frame image. Specifically, if the neural network contains 4 convolution layers and 3 deconvolution layers, the current frame image input into the neural network undergoes all 4 convolution layer operations and all 3 deconvolution layer operations. The neural network also fuses the result of each convolution layer with the result of the corresponding deconvolution layer, and finally the processed current frame image is obtained.
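The following is a minimal PyTorch sketch of this first-frame path, assuming a toy network with 4 convolution layers, 3 deconvolution layers and an output layer; the channel sizes, the additive fusion and the caching of every deconvolution result are illustrative assumptions, not the patent's actual network.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy 4-conv / 3-deconv network in the spirit of the description (sizes are assumptions)."""
    def __init__(self, ch=16):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(3, ch, 3, stride=2, padding=1),            # conv layer 1
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),       # conv layer 2
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1),   # conv layer 3
            nn.Conv2d(ch * 4, ch * 8, 3, stride=2, padding=1),   # conv layer 4 (bottleneck)
        ])
        self.deconvs = nn.ModuleList([
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, stride=2, padding=1),  # deconv layer 1
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, stride=2, padding=1),  # deconv layer 2
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1),      # deconv layer 3
        ])
        self.output = nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1)  # output layer

    def forward_first_frame(self, x):
        """1st frame of a group: run all conv and deconv layers, cache every deconv result."""
        conv_outs = []
        for conv in self.convs:
            x = torch.relu(conv(x))
            conv_outs.append(x)
        deconv_cache = {}
        for k, deconv in enumerate(self.deconvs, start=1):
            x = torch.relu(deconv(x))
            # fuse with the conv result whose output dimensions match (additive fusion here)
            x = x + conv_outs[len(self.convs) - 1 - k]
            deconv_cache[k] = x                      # kept for the other frames of the group
        return self.output(x), deconv_cache
```

For a 64x64 input the k-th deconvolution result and the matching convolution result have identical dimensions, which is what makes the reuse in the next step possible.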
Step S104, inputting the current frame image into the trained neural network, obtaining the operation result of the ith convolution layer after the operation is carried out on the ith convolution layer of the neural network, obtaining the operation result of the jth deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the operation result of the ith convolution layer and the operation result of the jth deconvolution layer to obtain the processed current frame image.
If the current frame image is not the 1st frame image of any group, it is input into the trained neural network, but the neural network does not need to perform all convolution layer and deconvolution layer operations: it computes only as far as the i-th convolution layer to obtain the result of the i-th convolution layer, the result of the j-th deconvolution layer obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network is acquired directly, and the i-th convolution layer result and the j-th deconvolution layer result are fused to obtain the processed current frame image. The correspondence between i and j is that the output dimensions of the i-th convolution layer result and the j-th deconvolution layer result are the same. i and j are natural numbers, where i does not exceed the index of the last convolution layer of the neural network and j does not exceed the index of the last deconvolution layer of the neural network. For example, the current frame image may be input into the neural network and computed as far as the 1st convolution layer to obtain the result of the 1st convolution layer, the result of the 3rd deconvolution layer obtained when the 1st frame image of the group was input into the neural network is acquired directly, and the 1st convolution layer result of the current frame image is fused with the 3rd deconvolution layer result of the 1st frame image. Here the output dimensions of the 1st convolution layer result and the 3rd deconvolution layer result are the same.
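Continuing the toy EncoderDecoder above, here is a minimal sketch of this reuse path; the function name, the additive fusion and the cached results being passed in as a dictionary are illustrative assumptions.

```python
def forward_later_frame(net: EncoderDecoder, x, i: int, j: int, deconv_cache: dict):
    """Frame that is not the 1st of its group: conv layers 1..i only, then reuse and fuse.

    i is the conv layer to stop at and j the deconv layer of the group's 1st frame whose
    cached result is reused; the two results are assumed to have the same output dimensions.
    """
    for conv in net.convs[:i]:                 # compute only conv layers 1..i
        x = torch.relu(conv(x))
    fused = x + deconv_cache[j]                # direct image fusion of the two results
    if j == len(net.deconvs):                  # j is the last deconv layer -> output layer
        return net.output(fused)
    x = fused
    for deconv in net.deconvs[j:]:             # otherwise continue from deconv layer j+1
        x = torch.relu(deconv(x))
    return net.output(x)
```

With the toy network above, i = 1 pairs with j = 3 and i = 2 with j = 2, since those are the combinations whose output dimensions coincide.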
By reusing the j-th deconvolution layer result already computed for the 1st frame image of the group, the computation the neural network performs on the current frame image is reduced, which greatly accelerates the processing of the neural network and improves its computational efficiency.
And step S105, outputting the processed current frame image.
When output, the processed current frame image may be output directly, or it may directly overwrite the original current frame image. The overwriting is fast, generally completed within 1/24 second. Because the overwriting time is so short, the human eye does not perceive the original current frame image in the video data being overwritten. In effect, the processed current frame image is output to the user in real time while the video data is being shot and/or recorded and/or played, and the user does not notice that the current frame image in the video data has been replaced.
Step S106, judging whether the processing of all frame images in the video data is finished.
If the current frame image is the last frame image of the video data, it is determined that all frame images in the video data have been processed, and execution ends. If frame images in the video data continue to be acquired after the current frame image is processed, it is determined that not all frame images have been processed yet, and step S101 is executed to continue acquiring and processing frame images of the video data.
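Putting the steps together, a sketch of the per-frame loop under the same assumptions as the previous sketches (fixed group size and fixed illustrative values of i and j; the frame-distance-dependent choice of i and j described in the next embodiment is omitted here):

```python
def process_stream(net: EncoderDecoder, frames, i: int = 1, j: int = 3,
                   group_size: int = GROUP_SIZE):
    """Per-frame loop: full pass for the 1st frame of each group, reuse path otherwise."""
    deconv_cache = {}
    for k, frame in enumerate(frames):            # frames: iterable of (1, 3, H, W) tensors
        if is_first_of_group(k, group_size):      # steps S102/S103
            out, deconv_cache = net.forward_first_frame(frame)
        else:                                     # step S104: reuse the cached deconv-j result
            out = forward_later_frame(net, frame, i, j, deconv_cache)
        yield out                                 # step S105: output the processed frame

# Usage example with random tensors standing in for a real video source:
# net = EncoderDecoder()
# frames = (torch.randn(1, 3, 64, 64) for _ in range(6))
# for processed in process_stream(net, frames):
#     pass  # display or store the processed frame here
```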
According to the video data real-time processing method provided by the invention, the current frame image of a video being shot and/or recorded by an image acquisition device, or of a video currently being played, is acquired in real time; whether the current frame image is the 1st frame image of any group is judged; if so, the current frame image is input into the trained neural network and the processed current frame image is obtained after the operations of all convolution layers and deconvolution layers of the neural network; if not, the current frame image is input into the trained neural network and computed only as far as the i-th convolution layer to obtain the result of the i-th convolution layer, the result of the j-th deconvolution layer obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network is acquired, and the two results are directly fused to obtain the processed current frame image, where i and j are natural numbers; the processed current frame image is output; and the above steps are repeated until all frame images in the video data have been processed. The invention makes full use of the continuity and correlation between the frame images of the video data: when the video data is processed in real time, it is processed in groups, the 1st frame image of each group passes through all convolution layers and deconvolution layers of the neural network, while the other frame images are computed only as far as the i-th convolution layer and reuse the j-th deconvolution layer result of the 1st frame image for image fusion. This greatly reduces the amount of computation performed by the neural network and increases the speed of real-time video data processing.
Fig. 2 shows a flow chart of a method for real-time processing of video data according to another embodiment of the invention. As shown in fig. 2, the real-time processing method of video data specifically includes the following steps:
step S201, acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, the current frame image in the currently played video is acquired in real time.
This step can refer to step S101 in the embodiment of fig. 1, and is not described herein again.
Step S202, it is determined whether the current frame image is the 1 st frame image of any group.
It is judged whether the current frame image is the 1st frame image of any group; if so, step S203 is executed; otherwise, step S204 is executed.
Step S203, inputting the current frame image into the trained neural network, and obtaining the processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network.
If the current frame image is the 1st frame image of a group, it is input into the trained neural network, which performs all convolution layer operations and deconvolution layer operations on it in sequence and finally produces the processed current frame image.
To further increase the operating speed of the neural network, after each convolution layer before the last convolution layer is computed, its result is downsampled. That is, after the current frame image is input into the neural network and the 1st convolution layer is computed, the result is downsampled to reduce its resolution; the downsampled result is then fed to the 2nd convolution layer, whose result is downsampled in turn; and so on until the last convolution layer of the neural network (the bottleneck convolution layer). Taking the 4th convolution layer as the last convolution layer for example, no downsampling is performed after the result of the 4th convolution layer. Downsampling the result of every convolution layer before the last one lowers the resolution of the frame image fed into each convolution layer and therefore increases the operating speed of the neural network. It should be noted that the first convolution layer of the neural network receives the current frame image acquired in real time without downsampling, so that the details of the current frame image are better preserved; downsampling the subsequent results then does not affect those details while still increasing the operating speed of the neural network.
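A minimal sketch of this downsampling variant, reusing the toy EncoderDecoder: the raw frame enters conv layer 1 at full resolution, every result before the bottleneck is then halved before the next conv layer, and the bottleneck result is left as is. The factor of 2 and bilinear interpolation are assumptions, and the matching upsampling that the deconvolution side would then need is omitted.

```python
import torch.nn.functional as F

def encode_with_downsampling(net: EncoderDecoder, x):
    """Conv pass in which every result before the last conv layer is downsampled."""
    conv_outs = []
    for idx, conv in enumerate(net.convs):
        x = torch.relu(conv(x))            # the 1st conv layer sees the full-resolution frame
        conv_outs.append(x)
        if idx < len(net.convs) - 1:       # no downsampling after the bottleneck layer
            x = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
    return x, conv_outs
```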
Step S204, calculating the frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs.
The frame distance between the current frame image and the 1st frame image of the group to which it belongs is calculated. For example, if the current frame image is the 3rd frame image of a group, the frame distance between it and the 1st frame image of that group is 2.
And step S205, determining the values of i and j according to the frame distance.
The value of i (the i-th convolution layer of the neural network to compute to) and the value of j (the j-th deconvolution layer of the 1st frame image whose result is reused) are determined according to the obtained frame distance. When determining i and j, the layer distance between the i-th convolution layer and the last convolution layer (the bottleneck convolution layer) can be taken as inversely proportional to the frame distance, and the layer distance between the j-th deconvolution layer and the output layer as directly proportional to the frame distance. The larger the frame distance, the smaller the layer distance between the i-th convolution layer and the last convolution layer, i.e. the larger the value of i and the more convolution layer operations need to be performed; and the larger the layer distance between the j-th deconvolution layer and the output layer, i.e. the smaller the value of j and the earlier the deconvolution layer whose result needs to be acquired.
Take as an example a neural network containing convolution layers 1 to 4, of which the 4th is the last convolution layer, plus deconvolution layers 1 to 3 and an output layer. When the frame distance is 1, the layer distance between the i-th convolution layer and the last convolution layer is determined to be 3, so i is 1, i.e. computation stops at the 1st convolution layer; the layer distance between the j-th deconvolution layer and the output layer is determined to be 1, so j is 3, and the result of the 3rd deconvolution layer is acquired. When the frame distance is 2, the layer distance between the i-th convolution layer and the last convolution layer is determined to be 2, so i is 2, i.e. computation stops at the 2nd convolution layer; the layer distance between the j-th deconvolution layer and the output layer is determined to be 2, so j is 2, and the result of the 2nd deconvolution layer is acquired. The specific layer distances depend on the numbers of convolution and deconvolution layers in the neural network and the effect to be achieved in the actual implementation; the above is only an example.
Alternatively, when determining the value of i and the value of j from the obtained frame distance, the correspondence between the frame distance and the values of i and j may simply be preset. Specifically, different values of i and j may be preset for different frame distances, for example: frame distance 1 gives i = 1 and j = 3, and frame distance 2 gives i = 2 and j = 2. The same values of i and j may also be set for all frame distances, for example i = 2 and j = 2 regardless of the frame distance. Or the same values of i and j may be shared by several frame distances, for example frame distances 1 and 2 give i = 1 and j = 3, while frame distances 3 and 4 give i = 2 and j = 2. This is set according to the implementation and is not limited here; a sketch of such a preset table follows.
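A minimal sketch of such a preset correspondence table for the 4-conv / 3-deconv example; the table entries mirror the examples in the text and are otherwise arbitrary, and the fallback rule is an assumption.

```python
# Preset correspondence between the frame distance and the values of i and j.
# Larger frame distance -> larger i (closer to the bottleneck, more conv layers computed)
# and smaller j (a deconv result farther from the output layer is reused).
FRAME_DISTANCE_TO_IJ = {
    1: (1, 3),   # frame distance 1: stop at conv layer 1, reuse deconv layer 3
    2: (2, 2),   # frame distance 2: stop at conv layer 2, reuse deconv layer 2
}

def choose_i_j(frame_distance: int):
    """Look up (i, j) for a frame distance; fall back to the largest preset entry."""
    if frame_distance in FRAME_DISTANCE_TO_IJ:
        return FRAME_DISTANCE_TO_IJ[frame_distance]
    return FRAME_DISTANCE_TO_IJ[max(FRAME_DISTANCE_TO_IJ)]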
Step S206, inputting the current frame image into the trained neural network, obtaining the operation result of the ith convolution layer after calculating to the ith convolution layer of the neural network, obtaining the operation result of the jth deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the operation result of the ith convolution layer and the operation result of the jth deconvolution layer to obtain the processed current frame image.
After the values of i and j are determined, the current frame image is input into the trained neural network and computed only as far as the i-th convolution layer to obtain the result of the i-th convolution layer; the result of the j-th deconvolution layer obtained when the 1st frame image of the group was input into the neural network is acquired directly, and the i-th convolution layer result and the j-th deconvolution layer result are fused to obtain the processed current frame image. Because the j-th deconvolution layer result of the 1st frame image can be acquired directly, the 1st frame image does not need to be input into the neural network again, which greatly reduces the number of neural network operations and speeds up the neural network.
Furthermore, after each convolution layer before the i-th convolution layer of the neural network is computed, its result is downsampled. After the current frame image is input into the neural network and the 1st convolution layer is computed, the result is downsampled to reduce its resolution; the downsampled result is then fed to the 2nd convolution layer, whose result is downsampled in turn; and so on up to the i-th convolution layer. This lowers the resolution of the frame image fed into each convolution layer and increases the operating speed of the neural network. It should be noted that the first convolution layer receives the current frame image acquired in real time without downsampling, so that the details of the current frame image are better preserved; downsampling the subsequent results then does not affect those details while still increasing the operating speed of the neural network.
Further, if the j-th deconvolution layer is the last deconvolution layer of the neural network, the image fusion result is input into the output layer to obtain the processed current frame image. If the j-th deconvolution layer is not the last deconvolution layer of the neural network, the image fusion result is input into the (j+1)-th deconvolution layer, and the processed current frame image is obtained through the operations of the remaining deconvolution layers and the output layer.
Step S207, the processed current frame image is output.
And step S208, displaying the processed video data in real time.
When output, the processed current frame image may be output directly, or it may directly overwrite the original current frame image; the overwriting is fast, generally completed within 1/24 second. The processed video data can therefore be displayed in real time, and the user directly sees the display effect of the processed current frame image without perceiving that the current frame image in the video data has been replaced.
In step S209, it is determined whether or not the processing of all the frame images in the video data is completed.
If the current frame image is the last frame image of the video data, it is determined that all frame images in the video data have been processed, processing ends, and the subsequent step S210 may be performed. If frame images in the video data continue to be acquired after the current frame image is processed, it is determined that not all frame images have been processed yet, and step S201 is executed to continue acquiring and processing frame images of the video data.
And step S210, uploading the processed video data to a cloud server.
The processed video data can be uploaded directly to a cloud server. Specifically, it can be uploaded to one or more cloud video platform servers, such as those of iQiyi, Youku or Kuai Video, so that the cloud video platform servers display the video data on their cloud video platforms. Alternatively, the processed video data can be uploaded to a cloud live-broadcast server, so that when a viewer enters the cloud live-broadcast server to watch, the server pushes the video data to the viewer's client in real time. Alternatively, the processed video data can be uploaded to a cloud public account server, so that when a user follows the public account, the server pushes the video data to the clients of the following users; further, the cloud public account server can push video data matching the viewing habits of those users to their clients.
According to the video data real-time processing method provided by the invention, after the current frame image is acquired it is judged: if it is the 1st frame image of a group, it is input into the trained neural network and the processed current frame image is obtained after the operations of all convolution layers and deconvolution layers of the neural network; if it is not the 1st frame image of any group, the frame distance between it and the 1st frame image of its group is calculated. The value of i for the i-th convolution layer of the neural network is determined from the frame distance, and the result of the i-th convolution layer is obtained. At the same time, the value of j for the j-th deconvolution layer is determined, so that the result of the j-th deconvolution layer obtained when the 1st frame image of the group was input into the neural network is acquired directly and reused; the i-th convolution layer result and the j-th deconvolution layer result are fused to obtain the processed current frame image, which reduces the number of neural network operations and improves computational efficiency. Furthermore, after each convolution layer before the i-th convolution layer or before the last convolution layer is computed, its result is downsampled, which lowers the resolution of the frame image fed into each convolution layer and increases the operating speed of the neural network. The processed video data is obtained directly and can be uploaded directly to the cloud server without any additional processing by the user, which saves the user's time; the processed video data can also be displayed to the user in real time, making it easy for the user to check the display effect.
Fig. 3 shows a functional block diagram of a video data real-time processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the video data real-time processing apparatus includes the following modules:
the acquisition module 301 is adapted to acquire a current frame image in a video shot and/or recorded by an image acquisition device in real time; or, the current frame image in the currently played video is acquired in real time.
In this embodiment, a mobile terminal is taken as an example of the image acquisition device. The obtaining module 301 acquires, in real time, the current frame image captured by the camera of the mobile terminal while recording or shooting video. Besides videos shot and/or recorded by the image acquisition device, the obtaining module 301 can also acquire the current frame image of a currently played video in real time.
This embodiment makes use of the continuity and correlation between the frame images of the video data: when the frame images are processed, they are processed in groups. When grouping the frame images, the association between them needs to be considered, and closely associated frame images are placed in the same group. Different groups may contain the same or different numbers of frame images; assuming each group contains n frame images, n may be a fixed or non-fixed value, set according to the implementation. When acquiring the current frame image in real time, the obtaining module 301 assigns it to a group, i.e. determines whether it is a frame image of the current group or the 1st frame image of a new group. Specifically, grouping may be performed according to the association between the current frame image and the preceding frame image or images: for example, using a tracking algorithm, if the tracking result for the current frame image is valid, it is taken as a frame image of the current group, and if the tracking result is invalid, it is actually the 1st frame image of a new group. Alternatively, every two or three adjacent frame images may simply be grouped in order; taking groups of three frames as an example, the 1st frame image of the video data is the 1st frame image of the first group, the 2nd frame image is the 2nd frame image of the first group, the 3rd frame image is the 3rd frame image of the first group, the 4th frame image is the 1st frame image of the second group, the 5th frame image is the 2nd frame image of the second group, the 6th frame image is the 3rd frame image of the second group, and so on. The specific grouping manner is determined by the implementation and is not limited here.
A judging module 302, adapted to judge whether the current frame image is a 1 st frame image of any group, if yes, execute a first processing module 303; otherwise, the second processing module 304 is executed.
The determining module 302 determines whether the current frame image is the 1 st frame image of any of the groups, if so, the first processing module 303 is executed, otherwise, the second processing module 304 is executed. The determining module 302 determines the specific determining manner according to the different grouping manners.
The first processing module 303 is adapted to input the current frame image into the trained neural network, and obtain a processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network.
When the judging module 302 judges that the current frame image is the 1st frame image of a group, the first processing module 303 inputs the current frame image into the trained neural network, which performs all convolution layer operations and deconvolution layer operations on it in sequence and finally produces the processed current frame image. Specifically, if the neural network contains 4 convolution layers and 3 deconvolution layers, the first processing module 303 inputs the current frame image into the neural network and it undergoes all 4 convolution layer operations and all 3 deconvolution layer operations. The neural network also fuses the result of each convolution layer with the result of the corresponding deconvolution layer, and finally the processed current frame image is obtained.
Furthermore, to increase the operating speed of the neural network, after each convolution layer before the last convolution layer is computed, the first processing module 303 downsamples its result. After the current frame image is input into the neural network and the 1st convolution layer is computed, the first processing module 303 downsamples the result to reduce its resolution; the downsampled result is then fed to the 2nd convolution layer, whose result is downsampled in turn; and so on until the last convolution layer of the neural network (the bottleneck convolution layer). Taking the 4th convolution layer as the last convolution layer for example, the first processing module 303 does not downsample the result of the 4th convolution layer. By downsampling the result of every convolution layer before the last one, the first processing module 303 lowers the resolution of the frame image fed into each convolution layer, which increases the operating speed of the neural network. It should be noted that for the first convolution layer operation of the neural network, the first processing module 303 inputs the current frame image acquired in real time without downsampling, so that the details of the current frame image are better preserved; downsampling the subsequent results then does not affect those details while still increasing the operating speed of the neural network.
The second processing module 304 is adapted to input the current frame image into the trained neural network, obtain an operation result of the ith convolution layer after operating the ith convolution layer of the neural network to obtain an operation result of the ith convolution layer, obtain an operation result of the jth deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly perform image fusion on the operation result of the ith convolution layer and the operation result of the jth deconvolution layer to obtain a processed current frame image.
When the judging module 302 judges that the current frame image is not the 1st frame image of any group, the second processing module 304 inputs the current frame image into the trained neural network; the neural network does not need to perform all convolution layer and deconvolution layer operations, but computes only as far as the i-th convolution layer to obtain the result of the i-th convolution layer. The second processing module 304 directly acquires the result of the j-th deconvolution layer obtained when the 1st frame image of the group to which the current frame image belongs was input into the neural network, and fuses the i-th convolution layer result with the j-th deconvolution layer result to obtain the processed current frame image. The correspondence between i and j is that the output dimensions of the i-th convolution layer result and the j-th deconvolution layer result are the same. i and j are natural numbers, where i does not exceed the index of the last convolution layer of the neural network and j does not exceed the index of the last deconvolution layer of the neural network. For example, the second processing module 304 may input the current frame image into the neural network and compute as far as the 1st convolution layer to obtain its result, directly acquire the result of the 3rd deconvolution layer obtained when the 1st frame image of the group was input into the neural network, and fuse the 1st convolution layer result of the current frame image with the 3rd deconvolution layer result of the 1st frame image. Here the output dimensions of the 1st convolution layer result and the 3rd deconvolution layer result are the same.
By reusing the j-th deconvolution layer result already computed for the 1st frame image of the group to which the current frame image belongs, the second processing module 304 reduces the computation the neural network performs on the current frame image, which greatly accelerates the processing of the neural network and improves its computational efficiency.
Further, after each convolution layer before the ith convolution layer of the neural network is operated, the second processing module 304 downsamples that layer's operation result. After the current frame image is input into the neural network and the 1st convolution layer has been operated, the second processing module 304 downsamples the operation result to reduce its resolution, runs the 2nd convolution layer on the downsampled result, downsamples that result in turn, and so on until the ith convolution layer is reached. This reduces the resolution of the frame image fed into each convolution layer and improves the operation speed of the neural network. As before, the input to the first convolution layer is the current frame image acquired in real time and is not downsampled, so finer details of the current frame image are preserved; downsampling the subsequent operation results therefore does not affect those details while still accelerating the operation of the neural network.
Further, if the jth deconvolution layer is the last deconvolution layer of the neural network, the second processing module 304 inputs the image fusion result to the output layer to obtain the processed current frame image. If the jth deconvolution layer is not the last deconvolution layer of the neural network, the second processing module 304 inputs the image fusion result to the (j + 1) th deconvolution layer, and the processed current frame image is obtained through the subsequent operations of each deconvolution layer and the output layer.
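A corresponding sketch of the non-1st-frame path handled by the second processing module is given below, using the same toy network as in the previous sketch; the element-wise addition used for image fusion is an assumption (the description only requires that the two results have the same output dimensions), as is the rest of what is not stated above.

```python
def process_later_frame(net, frame, deconv_cache, i, j):
    """Non-1st-frame path: reuse the group's cached j-th deconv result."""
    x = frame
    for idx, conv in enumerate(net.convs[:i], start=1):
        x = torch.relu(conv(x))
        if idx < i:                     # downsample after each conv layer before layer i
            x = F.avg_pool2d(x, 2)
    fused = x + deconv_cache[j]         # direct fusion of the conv-i and cached deconv-j results
    if j == len(net.deconvs):           # j-th deconv layer is the last one: go straight to the output layer
        return net.output_layer(fused)
    for deconv in net.deconvs[j:]:      # otherwise continue from the (j + 1)-th deconv layer
        fused = torch.relu(deconv(fused))
    return net.output_layer(fused)
```

With the channel widths chosen above, the conv-i and deconv-j results match in both channels and resolution for i = 1, j = 3 and for i = 2, j = 2, mirroring the dimensional correspondence required by the description.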
The output module 305 is adapted to output the processed current frame image.
When outputting, the output module 305 may output the processed current frame image directly, or it may directly cover the original current frame image with the processed one; the covering is fast, generally completed within 1/24 second. Because the covering takes so little time, the human eye does not perceive the original current frame image in the video data being replaced. In effect, the processed current frame image is output to the user in real time while the video data is being shot and/or recorded and/or played, and the user only perceives the display effect of the processed current frame image.
The loop module 306 is adapted to repeatedly run the above-mentioned acquiring module 301, judging module 302, first processing module 303, second processing module 304 and/or output module 305 until the processing of all frame images in the video data is completed.
If the current frame image is the last frame image of the video data, it is determined that all frame images in the video data have been processed and execution ends. If frame images of the video data continue to be acquired after the current frame image is processed, processing is not yet complete, and the loop module 306 runs the acquiring module 301, the judging module 302, the first processing module 303, the second processing module 304 and the output module 305 again until all frame images in the video data have been processed.
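The per-frame loop run by the acquiring, judging, processing and output modules might look as follows; the OpenCV capture source, the group size of 4, the preset distance-to-(i, j) table and the assumption that frame dimensions are multiples of 8 are all illustrative, and the sketch relies on the FullPassNet and process_later_frame sketches above.

```python
import cv2
import torch

LAYER_TABLE = {1: (1, 3), 2: (2, 2), 3: (2, 2)}   # assumed preset distance -> (i, j) mapping

def to_tensor(frame_bgr):
    # HWC uint8 frame from OpenCV -> 1xCxHxW float tensor in [0, 1]
    return torch.from_numpy(frame_bgr).permute(2, 0, 1).float().unsqueeze(0) / 255.0

def to_frame(tensor):
    # inverse conversion for display; clamp to the displayable range
    out = (tensor.squeeze(0).permute(1, 2, 0).clamp(0, 1) * 255).byte()
    return out.contiguous().numpy()

def process_stream(net, source=0, group_size=4):
    cap = cv2.VideoCapture(source)              # camera or video file being shot / recorded / played
    frame_index, cache = 0, None
    with torch.no_grad():
        while True:
            ok, frame = cap.read()              # acquire the current frame image in real time
            if not ok:
                break                           # last frame reached: all frame images processed
            x = to_tensor(frame)
            if frame_index % group_size == 0:           # 1st frame image of a group
                out, cache = net(x)                     # full conv + deconv pass, cache deconv results
            else:                                       # later frame image of the group
                distance = frame_index % group_size     # inter-frame distance to the group's 1st frame
                i, j = LAYER_TABLE[distance]
                out = process_later_frame(net, x, cache, i, j)
            cv2.imshow("processed", to_frame(out))      # output / cover the original frame
            cv2.waitKey(1)
            frame_index += 1
    cap.release()

# Usage (illustrative): process_stream(FullPassNet().eval())
```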
According to the video data real-time processing device provided by the invention, the current frame image of a video being shot and/or recorded by an image acquisition device, or of a video currently being played, is acquired in real time, and it is judged whether the current frame image is the 1st frame image of any group. If it is, the current frame image is input into the trained neural network and the processed current frame image is obtained after the operations of all convolution layers and deconvolution layers of the neural network. If it is not, the current frame image is input into the trained neural network and operated only up to the ith convolution layer to obtain that layer's operation result; the operation result of the jth deconvolution layer obtained by inputting the 1st frame image of the group to which the current frame image belongs into the neural network is then acquired, and the two results are directly fused to obtain the processed current frame image, where i and j are natural numbers. The processed current frame image is output, and the above steps are repeated until all frame images in the video data have been processed. The invention makes full use of the continuity and correlation between frame images in the video data: when the video data is processed in real time, the frames are processed in groups, the 1st frame image of each group passes through all convolution and deconvolution layers of the neural network, while the remaining frames are operated only up to the ith convolution layer and reuse the 1st frame image's jth deconvolution-layer result for image fusion. This greatly reduces the amount of computation performed by the neural network and increases the speed of real-time video processing. Furthermore, after each convolution layer before the ith convolution layer or before the last convolution layer is operated, its operation result is downsampled, which reduces the resolution of the frame image fed into each convolution layer and further improves the operation speed of the neural network.
Fig. 4 shows a functional block diagram of a video data real-time processing apparatus according to another embodiment of the present invention. As shown in fig. 4, the difference from fig. 3 is that the video data real-time processing apparatus further includes:
the inter-frame distance calculating module 307 is adapted to calculate the inter-frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs.
When the inter-frame distance calculating module 307 calculates the distance between the current frame image and the 1st frame image of the group to which it belongs, for example, if the current frame image is the 3rd frame image of a group, the calculated inter-frame distance is 2.
The determining module 308 is adapted to determine values of i and j according to the inter-frame distance.
The determining module 308 determines the value i of the ith convolution layer and the value j of the jth deconvolution layer used for the 1st frame image according to the obtained inter-frame distance. When determining i and j, the determining module 308 may make the layer distance between the ith convolution layer and the last convolution layer (the bottleneck layer of the convolution layers) inversely proportional to the inter-frame distance, and the layer distance between the jth deconvolution layer and the output layer directly proportional to it. The larger the inter-frame distance, the smaller the layer distance between the ith convolution layer and the last convolution layer, so the value of i is larger and the second processing module 304 must run more convolution layers; at the same time the layer distance between the jth deconvolution layer and the output layer is larger, so the value of j is smaller and the second processing module 304 obtains the operation result of an earlier deconvolution layer. Taking a neural network with convolution layers 1 to 4 (the 4th being the last convolution layer), deconvolution layers 1 to 3 and an output layer as an example: when the inter-frame distance calculating module 307 calculates a distance of 1, the determining module 308 sets the layer distance between the ith convolution layer and the last convolution layer to 3, so i is 1 and the second processing module 304 operates up to the 1st convolution layer, and sets the layer distance between the jth deconvolution layer and the output layer to 1, so j is 3 and the second processing module 304 obtains the operation result of the 3rd deconvolution layer; when the calculated distance is 2, the determining module 308 sets the layer distance between the ith convolution layer and the last convolution layer to 2, so i is 2 and the second processing module 304 operates up to the 2nd convolution layer, and sets the layer distance between the jth deconvolution layer and the output layer to 2, so j is 2 and the second processing module 304 obtains the operation result of the 2nd deconvolution layer. The specific layer distances depend on the number of convolution and deconvolution layers in the neural network and on the effect to be achieved in a given implementation; the above is only an example.
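One way to turn this proportionality rule into code is sketched below; the clamping and the particular linear mapping are assumptions of the sketch (the description leaves the concrete layer distances to the implementation), but the assertions reproduce the two numeric examples given above.

```python
NUM_CONV, NUM_DECONV = 4, 3                  # toy network sizes from the sketches above

def choose_layers(frame_distance):
    # Larger inter-frame distance -> smaller layer distance between conv layer i and
    # the bottleneck (i grows), and larger layer distance between deconv layer j and
    # the output layer (j shrinks).
    i = min(max(frame_distance, 1), NUM_CONV)
    j = min(max(NUM_DECONV + 1 - frame_distance, 1), NUM_DECONV)
    return i, j

assert choose_layers(1) == (1, 3)            # reproduces the first numeric example above
assert choose_layers(2) == (2, 2)            # reproduces the second numeric example above
```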
The presetting module 309 is adapted to preset a corresponding relationship between the inter-frame distance and values of i and j.
Instead of computing the value i of the ith convolution layer and the value j of the jth deconvolution layer used for the 1st frame image from the obtained inter-frame distance, the presetting module 309 may preset the correspondence between the inter-frame distance and the values of i and j directly. Specifically, the presetting module 309 may preset different values of i and j for different inter-frame distances: for example, when the inter-frame distance calculating module 307 calculates a distance of 1, the presetting module 309 sets i to 1 and j to 3; when the calculated distance is 2, it sets i to 2 and j to 2. Alternatively, the same values of i and j may be used for all inter-frame distances, for example i set to 2 and j set to 2; or the same values may be shared by some of the inter-frame distances, for example i set to 1 and j set to 3 for inter-frame distances 1 and 2, and i set to 2 and j set to 2 for inter-frame distances 3 and 4. The mapping is chosen according to the circumstances of the implementation and is not limited here.
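A preset correspondence table of this kind could be as simple as the following sketch; the concrete pairs mirror the examples above and are otherwise arbitrary assumptions.

```python
PRESET_I_J = {
    1: (1, 3),   # inter-frame distance 1 -> operate to conv layer 1, reuse deconv layer 3
    2: (1, 3),   # distances 1 and 2 may share the same pair
    3: (2, 2),   # distances 3 and 4 may share another pair
    4: (2, 2),
}

def lookup_layers(frame_distance, default=(2, 2)):
    # Fall back to a fixed pair for unlisted distances, matching the option of
    # using the same i and j regardless of the inter-frame distance.
    return PRESET_I_J.get(frame_distance, default)
```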
And the display module 310 is adapted to display the processed video data in real time.
After the processed current frame image is obtained, the display module 310 may display it in real time, so that the user can directly see the display effect of the processed video data.
The uploading module 311 is adapted to upload the processed video data to a cloud server.
The uploading module 311 may upload the processed video data directly to a cloud server. Specifically, the uploading module 311 may upload the processed video data to one or more cloud video platform servers, such as the servers of video platforms like iQiyi, Youku or Kuai Video, so that the cloud video platform servers display the video data on their video platforms. Alternatively, the uploading module 311 may upload the processed video data to a cloud live-broadcast server; when a viewing user enters the cloud live-broadcast server to watch, the server pushes the video data to the viewing user's client in real time. The uploading module 311 may also upload the processed video data to a cloud public account server; when a user follows the public account, the server pushes the video data to that user's client, and it may further push video data matching the viewing habits of the users following the account.
According to the video data real-time processing device provided by the invention, after the current frame image is obtained it is judged whether it is the 1st frame image of any group. If it is, the current frame image is input into the trained neural network and the processed current frame image is obtained after the operations of all convolution layers and deconvolution layers of the neural network. If it is not, the inter-frame distance between the current frame image and the 1st frame image of its group is calculated; the value i of the ith convolution layer of the neural network is determined from this distance so as to obtain the ith convolution layer's operation result, and the value j of the jth deconvolution layer is determined at the same time, so that the jth deconvolution layer's operation result obtained by inputting the 1st frame image of the group into the neural network can be acquired directly and multiplexed. The ith convolution layer's result and the jth deconvolution layer's result are then fused to obtain the processed current frame image, which reduces the amount of computation performed by the neural network and improves computational efficiency. The device obtains the processed video data directly and can upload it to the cloud server without requiring additional processing by the user, saving the user's time, and it can display the processed video data to the user in real time so that the user can conveniently check the display effect.
The present application also provides a non-volatile computer storage medium, which stores at least one executable instruction that can cause a processor to execute the video data real-time processing method of any of the above method embodiments.
Fig. 5 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 5, the computing device may include: a processor (processor)502, a Communications Interface 504, a memory 506, and a communication bus 508.
Wherein:
the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.
A communication interface 504 for communicating with network elements of other devices, such as clients or other servers.
The processor 502 is configured to execute the program 510, and may specifically perform relevant steps in the above-described embodiment of the video data real-time processing method.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 510 may be specifically configured to cause the processor 502 to execute the video data real-time processing method of any of the above method embodiments. For the specific implementation of each step in the program 510, reference may be made to the corresponding steps and unit descriptions in the foregoing video data real-time processing embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments and are not repeated here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of an apparatus for real-time processing of video data according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (26)

1. A method for processing video data in real time, the method performing packet processing on frame images included in the video data, comprising:
acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, acquiring a current frame image in a currently played video in real time;
judging whether the current frame image is the 1 st frame image of any group;
if yes, inputting the current frame image into a trained neural network, and obtaining a processed current frame image after operation of all convolution layers and deconvolution layers of the neural network;
if not, inputting the current frame image into the trained neural network, after calculating to the ith layer of convolution layer of the neural network to obtain the calculation result of the ith layer of convolution layer, obtaining the calculation result of the jth layer of deconvolution layer obtained by inputting the 1 st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the calculation result of the ith layer of convolution layer and the calculation result of the jth layer of deconvolution layer to obtain a processed current frame image; wherein i and j are natural numbers, and the output dimensionality of the operation result of the ith layer of convolution layer is the same as that of the operation result of the jth layer of deconvolution layer;
outputting the processed current frame image;
and repeatedly executing the steps until the processing of all the frame images in the video data is finished.
2. The method of claim 1, wherein after determining that the current frame image is not the 1 st frame image of any packet, the method further comprises:
calculating the frame distance between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
determining values of i and j according to the frame interval; the layer distance between the ith convolutional layer and the last convolutional layer is in inverse proportion to the frame distance, and the layer distance between the jth deconvolution layer and the output layer is in direct proportion to the frame distance.
3. The method of claim 2, wherein the method further comprises: and presetting the corresponding relation between the frame interval and the values of i and j.
4. The method of claim 3, wherein after the operation result of the ith convolution layer is directly image-fused with the operation result of the jth deconvolution layer, the method further comprises:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting an image fusion result to an output layer to obtain a processed current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the j + 1-th deconvolution layer, and obtaining the processed current frame image through subsequent operations of the deconvolution layer and the output layer.
5. The method of claim 4, wherein the inputting the current frame image into the trained neural network, and after the operation of all convolutional layers and deconvolution layers of the neural network, obtaining the processed current frame image further comprises: after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
6. The method of claim 4, wherein before computing to an ith convolutional layer of the neural network results in a result of the computation of the ith convolutional layer, the method further comprises: after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
7. The method of claim 6, wherein each group of the video data comprises n frame images; wherein n is a fixed preset value.
8. The method of claim 7, wherein the method further comprises:
and displaying the processed video data in real time.
9. The method according to any one of claims 1-8, wherein the method further comprises:
and uploading the processed video data to a cloud server.
10. The method of claim 9, wherein the uploading the processed video data to a cloud server further comprises:
and uploading the processed video data to a cloud video platform server so that the cloud video platform server can display the video data on a cloud video platform.
11. The method of claim 9, wherein the uploading the processed video data to a cloud server further comprises:
and uploading the processed video data to a cloud live broadcast server so that the cloud live broadcast server can push the video data to a client of a watching user in real time.
12. The method of claim 9, wherein the uploading the processed video data to a cloud server further comprises:
and uploading the processed video data to a cloud public server so that the cloud public server pushes the video data to a public attention client.
13. A video data real-time processing device that performs packet processing on frame images included in the video data, comprising:
the acquisition module is suitable for acquiring a current frame image in a video shot and/or recorded by image acquisition equipment in real time; or, acquiring a current frame image in a currently played video in real time;
the judging module is suitable for judging whether the current frame image is the 1 st frame image of any group, and if so, the first processing module is executed; otherwise, executing the second processing module;
the first processing module is suitable for inputting the current frame image into a trained neural network, and obtaining a processed current frame image after the operation of all the convolution layers and the deconvolution layers of the neural network;
the second processing module is suitable for inputting the current frame image into the trained neural network, obtaining the operation result of the ith layer of convolution layer after calculating to the ith layer of convolution layer of the neural network, obtaining the operation result of the jth layer of deconvolution layer obtained by inputting the 1st frame image of the group to which the current frame image belongs into the neural network, and directly carrying out image fusion on the operation result of the ith layer of convolution layer and the operation result of the jth layer of deconvolution layer to obtain a processed current frame image; wherein i and j are natural numbers, and the output dimensionality of the operation result of the ith layer of convolution layer is the same as that of the operation result of the jth layer of deconvolution layer;
the output module is suitable for outputting the processed current frame image;
and the circulating module is suitable for repeatedly executing the acquiring module, the judging module, the first processing module, the second processing module and/or the output module until the processing of all frame images in the video data is completed.
14. The apparatus of claim 13, wherein the apparatus further comprises:
the frame spacing calculation module is suitable for calculating the frame spacing between the current frame image and the 1 st frame image of the group to which the current frame image belongs;
the determining module is suitable for determining values of i and j according to the frame interval; the layer distance between the ith convolutional layer and the last convolutional layer is in inverse proportion to the frame distance, and the layer distance between the jth deconvolution layer and the output layer is in direct proportion to the frame distance.
15. The apparatus of claim 14, wherein the apparatus further comprises:
and the presetting module is suitable for presetting the corresponding relation between the frame interval and the values of i and j.
16. The apparatus of claim 15, wherein the second processing module is further adapted to:
if the jth deconvolution layer is the last deconvolution layer of the neural network, inputting an image fusion result to an output layer to obtain a processed current frame image;
and if the j-th deconvolution layer is not the last deconvolution layer of the neural network, inputting the image fusion result into the j + 1-th deconvolution layer, and obtaining the processed current frame image through subsequent operations of the deconvolution layer and the output layer.
17. The apparatus of claim 16, wherein the first processing module is further adapted to:
after each convolution layer before the last convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to downsampling processing.
18. The apparatus of claim 17, wherein the second processing module is further adapted to:
after each convolution layer before the ith convolution layer of the neural network is calculated, the calculation result of each convolution layer is subjected to down-sampling processing.
19. The apparatus of claim 18, wherein each group of the video data comprises n frame images; wherein n is a fixed preset value.
20. The apparatus of claim 19, wherein the apparatus further comprises:
and the display module is suitable for displaying the processed video data in real time.
21. The apparatus of any one of claims 13-20, wherein the apparatus further comprises:
and the uploading module is suitable for uploading the processed video data to the cloud server.
22. The apparatus of claim 21, wherein the upload module is further adapted to:
and uploading the processed video data to a cloud video platform server so that the cloud video platform server can display the video data on a cloud video platform.
23. The apparatus of claim 21, wherein the upload module is further adapted to:
and uploading the processed video data to a cloud live broadcast server so that the cloud live broadcast server can push the video data to a client of a watching user in real time.
24. The apparatus of claim 21, wherein the upload module is further adapted to:
and uploading the processed video data to a cloud public server so that the cloud public server pushes the video data to a public attention client.
25. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video data real-time processing method according to any one of claims 1-12.
26. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the video data real-time processing method according to any one of claims 1 to 12.
CN201711405700.9A 2017-12-22 2017-12-22 Video data real-time processing method and device and computing equipment Active CN107993217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711405700.9A CN107993217B (en) 2017-12-22 2017-12-22 Video data real-time processing method and device and computing equipment

Publications (2)

Publication Number Publication Date
CN107993217A CN107993217A (en) 2018-05-04
CN107993217B true CN107993217B (en) 2021-04-09

Family

ID=62042310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711405700.9A Active CN107993217B (en) 2017-12-22 2017-12-22 Video data real-time processing method and device and computing equipment

Country Status (1)

Country Link
CN (1) CN107993217B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740508B (en) * 2018-12-29 2021-07-23 北京灵汐科技有限公司 Image processing method based on neural network system and neural network system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10198624B2 (en) * 2016-02-18 2019-02-05 Pinscreen, Inc. Segmentation-guided real-time facial performance capture

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900A (en) * 2015-01-29 2015-08-26 北京工业大学 Deconvolution neural network training method
CN105550701A (en) * 2015-12-09 2016-05-04 福州华鹰重工机械有限公司 Real-time image extraction and recognition method and device
CN106372390A (en) * 2016-08-25 2017-02-01 Tang Ping Deep convolutional neural network-based lung cancer preventing self-service health cloud service system
CN107239728A (en) * 2017-01-04 2017-10-10 北京深鉴智能科技有限公司 Unmanned plane interactive device and method based on deep learning Attitude estimation
CN106934397A (en) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment
CN107122796A (en) * 2017-04-01 2017-09-01 中国科学院空间应用工程与技术中心 A kind of remote sensing image sorting technique based on multiple-limb network integration model
CN107492068A (en) * 2017-09-28 2017-12-19 北京奇虎科技有限公司 Object video conversion real-time processing method, device and computing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"神经网络技术在网络视频处理的应用探讨";郭子豪;《个人电脑》;20171215;第23卷(第12期);52-55 *

Also Published As

Publication number Publication date
CN107993217A (en) 2018-05-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant