CN114979703A - Method of processing video data and method of processing image data - Google Patents


Info

Publication number
CN114979703A
Authority
CN
China
Prior art keywords
video
interpolation
processed
processing
network
Prior art date
Legal status
Pending
Application number
CN202110187923.2A
Other languages
Chinese (zh)
Inventor
杨涛
任沛然
谢宣松
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202110187923.2A
Publication of CN114979703A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Image Processing (AREA)

Abstract

A method of processing video data and a method of processing image data are disclosed. The method of processing video data includes the following steps: acquiring a video to be processed; performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part of the video to be processed, wherein the first part is the part of the video in which an interpolation operation is allowed and the second part is the part in which an interpolation operation is prohibited; performing interpolation processing on the first part to obtain an interpolated part; and synthesizing the second part and the interpolated part to obtain a target video. The method solves the technical problem that existing optical-flow-based interpolation methods produce erroneous interpolations when performing video interpolation.

Description

Method of processing video data and method of processing image data
Technical Field
The present application relates to the field of image processing, and in particular, to a method of processing video data and a method of processing image data.
Background
With the rapid development of society and technology, users' requirements for video quality keep rising, and a large proportion of existing videos cannot meet the demand for high definition and high frame rates. Video quality can be improved by video processing techniques; in particular, a higher frame rate can be achieved with a video frame interpolation algorithm.
In the prior art, optical flow information is typically employed to guide video frame interpolation. Frame interpolation algorithms play an increasingly important role in improving video frame rate and quality, but scenes and motion in real videos are complex and optical flow estimation contains errors, so that processing such videos with existing optical-flow-based methods yields unsatisfactory results.
Existing algorithms that support frame interpolation at an arbitrary moment mainly suffer from three problems:
(1) they depend heavily on optical flow estimation and cannot operate without an explicit optical flow estimate, yet optical flow estimation is error-prone in real, complex scenes;
(2) they rely on a great deal of human knowledge and experience, mainly reflected in how the optical flow estimation information is used;
(3) the network model must be carefully designed, can only be applied to video frame interpolation rather than other tasks, and therefore has poor extensibility.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide a method of processing video data and a method of processing image data, which at least solve the technical problem that existing optical-flow-based interpolation methods produce erroneous interpolations when performing video interpolation.
According to an aspect of the embodiments of the present application, there is provided a method of processing video data, including: acquiring a video to be processed; performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part of the video to be processed, wherein the first part is the part of the video in which an interpolation operation is allowed and the second part is the part in which an interpolation operation is prohibited; performing interpolation processing on the first part to obtain an interpolated part; and synthesizing the second part and the interpolated part to obtain a target video.
According to another aspect of the embodiments of the present application, there is also provided a method of processing video data, including: displaying a video to be processed in an interactive interface; when the interactive interface receives a separation instruction, obtaining a separation result of the video to be processed, wherein state separation is performed on the video to be processed using a preset neural network model, and the separation result includes: a first part of the video to be processed in which an interpolation operation is allowed and a second part in which an interpolation operation is prohibited; and displaying a target video in the interactive interface, wherein the target video is obtained by synthesizing the second part with the result obtained by interpolating the first part.
According to another aspect of the embodiments of the present application, there is also provided a method of processing video data, including: in the process of playing the video to be processed, detecting a control instruction sent by a client, wherein the control instruction is used for adjusting the frame rate of the video to be processed; based on the control instruction, performing state separation processing on the video to be processed by adopting a preset neural network algorithm, and performing interpolation processing on a part, which allows interpolation, in the video to be processed to obtain a target video; and feeding back the target video to the client, wherein the client plays the target video.
According to another aspect of the embodiments of the present application, there is also provided a method of processing image data, including: acquiring a plurality of continuously changed images to be processed; performing state separation on a plurality of images to be processed based on a preset neural network model, and performing interpolation processing on parts allowing interpolation operation in the plurality of images to be processed to obtain a target image; and displaying the target image.
According to another aspect of the embodiments of the present application, there is also provided an apparatus for processing video data, including: an acquisition module configured to acquire a video to be processed; a separation module configured to perform state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part of the video to be processed, wherein the first part is the part of the video in which an interpolation operation is allowed and the second part is the part in which an interpolation operation is prohibited; an interpolation module configured to perform interpolation processing on the first part to obtain an interpolated part; and a synthesis module configured to synthesize the second part and the interpolated part to obtain a target video.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the method of processing video data described above.
According to another aspect of the embodiments of the present application, there is also provided a processor for executing a program, where the program executes the method for processing video data described above.
According to another aspect of the embodiments of the present application, there is also provided a computing device, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a video to be processed; performing state separation processing on a video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part allowing interpolation operation in the video to be processed, and the second part is a part prohibiting interpolation operation in the video to be processed; carrying out interpolation processing on the first part to obtain an interpolation part; and synthesizing the second part and the interpolation part to obtain the target video.
In the embodiments of the present application, state separation processing is performed on the video to be processed: after the video to be processed is acquired, state separation processing is performed on it by a preset neural network algorithm to obtain a first part of the video in which an interpolation operation is allowed and a second part in which an interpolation operation is prohibited; interpolation processing is performed on the first part to obtain an interpolated part; and finally the second part and the interpolated part are synthesized to obtain a target video.
In this process, optical flow information does not need to be estimated, so the problem of erroneous interpolation caused by estimating optical flow information is avoided and the efficiency of video interpolation is improved. In addition, a neural network algorithm is used to perform state separation on the video to be processed in order to determine its interpolatable part, and this process requires no human knowledge or experience, so the reliability of interpolating the video to be processed is improved. Finally, the solution provided by the present application can process not only video data but also images that change continuously and in an ordered manner, so the solution has general applicability.
Therefore, the solution provided by the present application achieves the purpose of reducing the error rate of video interpolation, thereby achieving the technical effect of improving the efficiency of video interpolation and further solving the technical problem that existing optical-flow-based interpolation methods produce erroneous interpolations when performing video interpolation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an alternative computing device according to embodiments of the present application;
FIG. 2 is a flow chart of a method of processing video data according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative neural network model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative U-network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative composite network according to an embodiment of the present application;
FIG. 6 is a flow chart of a method of processing image data according to an embodiment of the present application;
FIG. 7 is a block diagram of an alternative method of processing image data according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for processing video data according to an embodiment of the present application;
FIG. 9 is a block diagram of a computing device according to an embodiment of the present application;
FIG. 10 is a flow chart of a method of processing video data according to an embodiment of the present application;
FIG. 11 is a flow chart of a method of processing video data according to an embodiment of the present application;
FIG. 12 is a block diagram of an alternative method of processing video data according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
video frame interpolation refers to a method for acquiring a frame at any moment between two continuous frames of a video by using a frame interpolation algorithm.
The optical flow method is a method of analyzing moving images; optical flow is the apparent motion of the image brightness pattern, carries information about object motion, and is commonly used to determine how an object is moving.
Example 1
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method of processing video data, it being noted that the steps illustrated in the flowchart of the figure may be carried out in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be carried out in an order different than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computing device, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computing device (or mobile device) for implementing a method of processing video data. As shown in fig. 1, computing device 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computing device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, computing device 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computing device 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the method for processing video data in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, so as to implement the above-mentioned method for processing video data. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to computing device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by a communications provider of computing device 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device 10 (or mobile device).
Here, it should be noted that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one example of a particular configuration and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In addition, an execution subject for executing the method for processing video data provided in this embodiment may be a terminal device (e.g., a computer, a smart phone, a smart tablet, etc.) capable of playing a video, or may be a server.
Optionally, the definition and/or frame rate of a local video stored on a terminal device is usually not adjustable. With the solution provided in this application, when a user views a video stored locally on the terminal device, the user can control the terminal device to interpolate the local video, thereby improving its definition and/or frame rate.
Optionally, when a user watches an online video through the terminal device, the user may send an adjustment instruction for adjusting the definition and/or frame rate of the online video to the server by operating the terminal device. After receiving the adjustment instruction, the server performs interpolation processing on the online video using the method provided by this embodiment, so that the definition and/or frame rate of the processed online video meets the user's requirement.
In the present embodiment, a server is used as an execution subject.
Optionally, in the above operating environment, the present application provides a method for processing video data as shown in fig. 2. Fig. 2 is a flowchart of a method for processing video data according to a first embodiment of the present application, and as can be seen from fig. 2, the method includes the following steps:
step S202, a video to be processed is obtained.
Optionally, when a user watching an online video (i.e., the video to be processed) on a terminal device needs to adjust its definition, the user may click a definition adjustment control in the client that plays the online video to determine a target definition. After the target definition is determined, the terminal device sends a definition adjustment instruction to the server, and the server parses the instruction to obtain the target definition and the video information (such as the video link address and the video name) of the online video whose definition needs to be adjusted. The server then acquires the online video according to the video information and adjusts the definition of the online video toward the target definition.
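Purely as an illustrative sketch of this interaction (the field names, the JSON encoding, and the handler itself are assumptions made for this example and are not specified in the application), the definition adjustment instruction could be parsed on the server roughly as follows:

```python
# Hypothetical parsing of a definition-adjustment instruction sent by the terminal device.
# All field names are assumptions for illustration only.
import json

def parse_adjustment_instruction(raw_instruction: bytes) -> dict:
    instruction = json.loads(raw_instruction)                    # instruction sent by the terminal device
    return {
        "target_definition": instruction["target_definition"],  # e.g. "1080p"
        "video_url": instruction["video_url"],                   # video link address
        "video_name": instruction["video_name"],                 # video name
    }
```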
Step S204: performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part of the video to be processed.
In step S204, the first part is the part of the video to be processed in which an interpolation operation is allowed, and the second part is the part in which an interpolation operation is prohibited. Thus, through step S204, the video to be processed can be divided into an interpolatable portion and a non-interpolatable portion.
It should be noted that, in step S204, the server implements state separation of the video to be processed through a preset neural network algorithm, and the process does not depend on human knowledge and experience, so that compared with the existing optical flow method, the reliability of interpolation processing on the video to be processed is improved, and the problem of interpolation error in estimating optical flow information is avoided.
In step S206, the first part is interpolated to obtain an interpolated part.
It should be noted that, since the first portion is the interpolatable portion of the video to be processed, the server may perform linear interpolation on the first portion to obtain an interpolated portion. Optionally, the server may interpolate the interpolatable portion of the video to be processed using an existing interpolation algorithm, which may be, but is not limited to, the SepConv algorithm, the SuperSloMo algorithm, the DAIN algorithm, the BMBC algorithm, the RIFE algorithm, the QVI algorithm, the CyclicGen algorithm, and the like.
Step S208: synthesizing the second part and the interpolated part to obtain a target video.
It should be noted that, since state separation processing is performed on the video to be processed in step S204, in order to ensure the integrity of the video, after the interpolatable portion of the video has been interpolated, the resulting interpolated portion and the portion in which interpolation is prohibited need to be synthesized to obtain the target video for the user to view.
Optionally, after the server synthesizes the second portion and the interpolated portion to obtain the target video, the server may send the target video to the terminal device so that the terminal device plays it. Alternatively, the server may store the target video in the cloud and send the corresponding target network link to the terminal device; the terminal device can then switch from the network link of the video to be processed to the target network link and play the higher-definition target video through that link.
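Taken together, the four steps above form a separate-interpolate-synthesize pipeline. The sketch below is only one assumed way of organizing that pipeline in code; `separation_net`, `interpolate`, and `synthesis_net` are placeholders for the preset neural network, the interpolation algorithm, and the synthesis network described in this embodiment, not an implementation taken from the application.

```python
# Illustrative pipeline for steps S202-S208 (names and signatures are assumptions).
def process_video(frames, separation_net, interpolate, synthesis_net, t=0.5):
    target_frames = []
    for prev_frame, next_frame in zip(frames[:-1], frames[1:]):           # S202: consecutive frames
        first_part, second_part = separation_net(prev_frame, next_frame)  # S204: state separation
        interpolated_part = interpolate(first_part, t)                    # S206: interpolate the allowed part
        new_frame = synthesis_net(second_part, interpolated_part)         # S208: synthesize the target frame
        target_frames.extend([prev_frame, new_frame])
    target_frames.append(frames[-1])                                      # keep the last original frame
    return target_frames                                                  # target video at roughly double frame rate
```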
Based on the solutions defined in the foregoing steps S202 to S208, in the embodiment of the present application state separation processing is performed on the video to be processed: after the video to be processed is acquired, state separation processing is performed on it based on a preset neural network algorithm to obtain a first part in which an interpolation operation is allowed and a second part in which an interpolation operation is prohibited, interpolation processing is performed on the first part to obtain an interpolated part, and finally the second part and the interpolated part are synthesized to obtain the target video.
It is easy to notice that in the above process optical flow information does not need to be estimated, which avoids the problem of erroneous interpolation caused by estimating optical flow information and improves the efficiency of video interpolation. In addition, a neural network algorithm is used to separate the states of the video to be processed in order to determine its interpolatable part, and this process requires no human knowledge or experience, so the reliability of interpolating the video to be processed is improved. Finally, the solution provided by the present application can process not only video data but also images that change continuously and in an ordered manner, so the solution has general applicability.
Therefore, the solution provided by the present application achieves the purpose of reducing the error rate of video interpolation, thereby achieving the technical effect of improving the efficiency of video interpolation and further solving the technical problem that existing optical-flow-based interpolation methods produce erroneous interpolations when performing video interpolation.
In an optional embodiment, after the video to be processed is acquired, the server performs state separation processing on it based on a preset neural network algorithm to obtain a first part and a second part of the video to be processed. Specifically, the server first obtains a first frame image and a second frame image from the video to be processed, then inputs the first frame image and the second frame image into a first network in a first order and into a second network in a second order, and finally performs state separation processing on the two frame images based on the first network and the second network to obtain the first part and the second part. The first frame image and the second frame image are two consecutive, adjacent frames.
Optionally, fig. 3 shows a schematic diagram of an alternative neural network model. As shown in fig. 3, the model is composed of a U-shaped network and a synthesis network, where the U-shaped network includes two sub-networks (a first network and a second network), i.e., both the first network and the second network are U-shaped networks. It should be noted that a U-shaped network classifies an image using the whole image and has a down-sampling and up-sampling structure: down-sampling is used to progressively capture contextual information, while up-sampling restores detail by combining the information of each down-sampling layer with the up-sampling input, gradually recovering image precision.
It should be noted that the network structures of the first network and the second network are identical, and the convolution kernel parameters of the first network are the same as those of the second network, i.e., the first network and the second network share weights. Weight sharing means that when the first network and the second network scan an image, the image is scanned by their convolution kernels, and the numerical values in the convolution kernels are the network weights; in application, every position in the same image is scanned by the same convolution kernel, so the weights of the first network and the second network are shared when scanning the image.
In fig. 3, the orders in which images are input to the two networks are reversed with respect to each other, that is, the first order in which images are input to the first network is the reverse of the second order in which images are input to the second network. For example, in fig. 3, the order of the images input to the first network is (I0, I1) and the order of the images input to the second network is (I1, I0), where I0 and I1 represent two consecutive frames of the video to be processed.
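A minimal PyTorch-style sketch of this shared-weight, reversed-order arrangement is given below. The `unet` module stands in for the U-shaped network of fig. 3 and is assumed to accept an interpolation time alongside the stacked frames; the application does not fix a concrete layer configuration, so every detail here is an assumption.

```python
import torch
import torch.nn as nn

class TwinSeparation(nn.Module):
    """Two branches realized by one U-shaped network, so the convolution-kernel weights are shared."""
    def __init__(self, unet: nn.Module):
        super().__init__()
        self.unet = unet  # the same module is reused for both branches => weight sharing

    def forward(self, frame0: torch.Tensor, frame1: torch.Tensor, t: float):
        # First network: frames in the order (I0, I1); second network: the reversed order (I1, I0).
        feat_first = self.unet(torch.cat([frame0, frame1], dim=1), t)
        feat_second = self.unet(torch.cat([frame1, frame0], dim=1), 1.0 - t)
        return feat_first, feat_second
```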
In an alternative embodiment, the first network and the second network perform state separation processing on the first frame image and the second frame image to obtain a first portion and a second portion. Specifically, the server first obtains a preset separation ratio, and in a first network, performs state separation processing on the first frame image and the second frame image based on the preset separation ratio to obtain a third portion and a fourth portion, and in a second network, performs state separation processing on the first frame image and the second frame image based on the preset separation ratio to obtain a fifth portion and a sixth portion, where the first portion includes the third portion and the fifth portion, and the second portion includes the fourth portion and the sixth portion.
It should be noted that, in the foregoing process, the state of the video to be processed is separated in the first network and in the second network respectively, so as to obtain, for each network, an interpolatable part and an interpolation-prohibited part: the third part and the fifth part are the interpolatable parts, and the fourth part and the sixth part are the non-interpolatable parts.
Optionally, the server may obtain a preset separation ratio, which is used to determine the proportion of the interpolatable part in the video to be processed. The separation ratio may be a fixed value, i.e., the same separation ratio is used for all videos when performing state separation. The separation ratio may also be adaptive, i.e., different separation ratios are used for different videos, for example a ratio determined from video parameters such as the image size and resolution of the video to be processed.
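One simple way to realize such a separation ratio, shown here only as an assumption, is to split the channels of a feature layer into an interpolatable part and a non-interpolatable part:

```python
import torch

def split_by_ratio(features: torch.Tensor, ratio: float = 0.5):
    """Split a feature map of shape (N, C, H, W) along the channel dimension according to `ratio`."""
    c = features.shape[1]
    k = int(c * ratio)                      # number of channels in which interpolation is allowed
    interpolatable = features[:, :k]        # e.g. the third/fifth portions
    non_interpolatable = features[:, k:]    # e.g. the fourth/sixth portions
    return interpolatable, non_interpolatable
```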
It should be noted that, unlike a conventional U-shaped network, the U-shaped network in this application can additionally receive the times t and 1-t as inputs in the decoding portion, where they act on each layer of the decoding portion.
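The application does not spell out how t and 1-t enter each decoding layer. One common choice, used here purely as an assumed illustration, is to broadcast the time value as an extra channel that is concatenated to each decoder feature map:

```python
import torch

def add_time_channel(decoder_feat: torch.Tensor, t: float) -> torch.Tensor:
    """Append a constant time channel to a decoder feature map of shape (N, C, H, W)."""
    n, _, h, w = decoder_feat.shape
    t_map = torch.full((n, 1, h, w), t, dtype=decoder_feat.dtype, device=decoder_feat.device)
    return torch.cat([decoder_feat, t_map], dim=1)  # the next decoder layer must expect C + 1 channels
```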
In an alternative embodiment, after obtaining the interpolatable part of the video to be processed, the server performs interpolation processing on it to obtain an interpolated part. Specifically, the server first obtains a first interpolation time and a second interpolation time, calculates the product of the first interpolation time and the third portion to obtain a first result, calculates the product of the second interpolation time and the fifth portion to obtain a second result, and finally obtains the interpolated portion from the first result and the second result, where the sum of the first interpolation time and the second interpolation time is a constant.
Optionally, in the structural diagram of the U-shaped network shown in fig. 4, the black rectangles and the white rectangles form a feature layer; the white rectangles are extracted as the interpolatable part according to a preset separation ratio (for example, 50%), while the black parts are the non-interpolatable part. The white rectangles thus form interpolatable feature layers; for example, P and Q in fig. 4 form interpolatable feature layers. The server then obtains an interpolation time and multiplies it by the interpolatable feature layer to obtain the interpolated part.
It should be noted that the structure of the U-shaped network is not limited to that shown in fig. 4; the depth and number of channels of the U-shaped network may be set according to actual requirements, which is not specifically limited in this embodiment. In addition, in the above process, the interpolation time may also be set according to actual requirements, for example determined from the original resolution and the target resolution of the video to be processed. For the two networks (i.e., the first network and the second network), the sum of the corresponding interpolation times is constant: for example, if the first interpolation time corresponding to the first network is t, the interpolation time corresponding to the second network may be 1-t. In the process of determining the interpolated part of the video to be processed, t is multiplied by the interpolatable feature layer of the first network to obtain a first result, 1-t is multiplied by the interpolatable feature layer of the second network to obtain a second result, and finally the interpolated part of the video to be processed is obtained from the first result and the second result.
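Expressed as code, the weighted combination just described might look like the sketch below; combining the two products by addition is an assumption, since the application only states that the interpolated part is obtained from the first result and the second result.

```python
def interpolated_part(feat_first, feat_second, t: float):
    """feat_first: interpolatable feature layer from the first network (input order I0, I1).
    feat_second: interpolatable feature layer from the second network (input order I1, I0).
    t and 1 - t are the two interpolation times; their sum is the constant 1."""
    first_result = t * feat_first            # product of the first interpolation time and the third portion
    second_result = (1.0 - t) * feat_second  # product of the second interpolation time and the fifth portion
    return first_result + second_result      # assumption: the two results are combined by addition
```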
Further, after the state of the video to be processed has been separated in the U-shaped network and the interpolated part obtained, the server synthesizes, through the synthesis network, the part of the video to be processed in which interpolation is prohibited with the interpolated part to obtain the target video. Specifically, the server inputs the second part and the interpolated part into the synthesis network, and synthesizes them based on the synthesis network to obtain the target video (for example, I_t in fig. 3).
Optionally, fig. 5 shows a schematic structural diagram of an alternative synthesis network; the synthesis network shown in fig. 5 includes a down-sampling part, an up-sampling part, and a side part.
It should be noted that the structure of the synthesis network is not limited to the network structure shown in fig. 5, in practical applications, a plurality of synthesis networks with different structures may be provided, and alternatively, when the interpolation portion and the portion where the interpolation operation is prohibited are subjected to the synthesis operation, an appropriate synthesis network may be selected according to the video parameters of the video to be processed for the synthesis processing, which not only can save the video synthesis time and improve the video synthesis efficiency, but also can ensure the reliability of the video synthesis.
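As a rough stand-in only (the layer sizes and the fusion of the side path are arbitrary assumptions, not the structure of fig. 5), the synthesis network can be pictured as a small down-sampling/up-sampling module with a side path that takes the interpolation-prohibited part and the interpolated part as input and produces the target frame:

```python
import torch
import torch.nn as nn

class SynthesisNet(nn.Module):
    """Toy sketch of a synthesis network with a down-sampling part, an up-sampling part, and a side part."""
    def __init__(self, in_channels: int, out_channels: int = 3):
        # in_channels must equal channels(second_part) + channels(interpolated_part)
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU())
        self.side = nn.Conv2d(in_channels, 32, 1)             # side path at full resolution
        self.head = nn.Conv2d(32, out_channels, 3, padding=1)

    def forward(self, second_part: torch.Tensor, interpolated_part: torch.Tensor) -> torch.Tensor:
        x = torch.cat([second_part, interpolated_part], dim=1)  # combine the two inputs
        y = self.up(self.down(x)) + self.side(x)                # fuse up-sampled and side features
        return self.head(y)                                     # the synthesized target frame I_t
```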
In summary, the solution provided by the present application uses a deep learning network to learn a decoupling of the video to be processed, and then performs a linear interpolation operation only on the part of the video that is in the interpolatable state. In this way, a frame interpolation algorithm that can interpolate a frame at any moment is obtained without relying on optical flow estimation information, and good results are achieved for complex scenes and motion.
Moreover, the solution of separating the states of the video to be processed achieves continuous video interpolation at any moment, does not need to rely on error-prone optical flow estimation information or on human experience, and ensures the reliability of video interpolation.
It should be noted that the concept of state separation provided by the present application can be applied to processing other continuous and ordered changes, for example, scenes such as picture deformation and picture stylization, so that the scheme provided by the present application has generality.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method for processing video data according to the foregoing embodiments can be implemented by software plus a necessary general hardware platform, and certainly can be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
Example 2
According to an embodiment of the present application, there is also provided a method of processing image data, as shown in fig. 6, the method including the steps of:
step S602, a plurality of continuously changing images to be processed are acquired.
It should be noted that the essence of an image generation task may be understood as learning a mapping from input to output, and this mapping can be learned with an image-to-image model framework. However, in the prior art, an image-to-image model framework yields only a single output: given an input, only one unique output is available. In real scenes, however, many changes are continuous, such as the morphing process from a cat image to a dog image, or the process of a face image aging from young to old.
In this embodiment, the images to be processed are ordered, continuously changing images, such as images undergoing deformation (morphing) or stylization.
In addition, the execution subject executing the embodiment may be a terminal device (e.g., a computer, a smart phone, a smart tablet, etc.), or may be a server. In this embodiment, a server is used as an execution subject.
Optionally, when a user viewing an online image (i.e., the image to be processed) on a terminal device needs to adjust its definition, the user may click a definition adjustment control on the terminal device to determine a target definition. After the target definition is determined, the terminal device sends a definition adjustment instruction to the server, and the server parses the instruction to obtain the target definition and the image information (such as the image link address and the image name) of the online image whose definition needs to be adjusted. The server then acquires the online image according to the image information and adjusts the definition of the online image toward the target definition.
Step S604: performing state separation on the plurality of images to be processed based on a preset neural network model, and performing interpolation processing on the parts of the images in which an interpolation operation is allowed, to obtain a target image.
In step S604, state separation is performed on the images to be processed so as to divide them into an interpolation-allowed portion and an interpolation-prohibited portion. Optionally, fig. 7 shows a framework diagram of an alternative way of processing image data. As can be seen from fig. 7, after the images to be processed are acquired, the server performs state separation on them to obtain a separation result that divides the images into an interpolation-allowed portion and an interpolation-prohibited portion, then performs interpolation processing only on the interpolation-allowed portion of the separation result to obtain an interpolation result, and finally performs synthesis processing to obtain and output the target image.
It should be noted that, in step S604, the server implements state separation of the image to be processed through the preset neural network model, and the process does not depend on human knowledge and experience, so that compared with the existing optical flow method, the reliability of interpolation processing on the image to be processed is improved, and the problem of interpolation error in estimating the optical flow information is avoided.
Step S606, the target image is displayed.
Based on the solutions defined in steps S602 to S606, it can be known that, in the embodiment of the present application, a mode of performing state separation processing on a plurality of continuously changing images to be processed is adopted, after the plurality of continuously changing images to be processed are obtained, the state separation processing is performed on the plurality of continuously changing images to be processed based on a preset neural network model, interpolation processing is performed on a portion of the plurality of images to be processed, which allows interpolation operation, so as to obtain a target image, and the target image is displayed.
It is easy to notice that in the above process optical flow information does not need to be estimated, which avoids the problem of erroneous interpolation caused by estimating optical flow information and improves the efficiency of image interpolation. In addition, a neural network algorithm is used to separate the states of the images to be processed in order to determine their interpolatable part, and this process requires no human knowledge or experience, so the reliability of interpolating the images to be processed is improved. Finally, the solution provided by the present application can process video data as well as image data, so the solution has general applicability.
Therefore, the scheme provided by the application achieves the purpose of reducing the error rate of image interpolation, thereby realizing the technical effect of improving the efficiency of image interpolation and further solving the problem of error interpolation in the process of interpolating the image by the conventional optical flow interpolation algorithm.
In an alternative embodiment, after obtaining the multiple images to be processed, the server performs state separation on the multiple images to be processed, and performs interpolation processing on a part of the multiple images to be processed, which allows interpolation operation, to obtain the target image. Specifically, the server performs state separation on a plurality of images to be processed based on a preset neural network model to obtain a first part and a second part, then performs interpolation processing on the first part to obtain an interpolation part, and finally performs synthesis processing on the second part and the interpolation part to obtain a target image. The first part is a part which allows interpolation operation in the plurality of images to be processed, and the second part is a part which prohibits interpolation operation in the plurality of images to be processed.
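For the image case, the same separate-interpolate-synthesize pipeline can be run at several interpolation times to produce an ordered sequence of intermediate images (for example, intermediate stages of a morph). The sketch below reuses the placeholder callables from Example 1 and is an assumption rather than code from the application.

```python
# Generate intermediate images between two ordered, continuously changing images.
def intermediate_images(img_a, img_b, separation_net, interpolate, synthesis_net, steps=4):
    results = []
    for i in range(1, steps):
        t = i / steps                                                   # interpolation times 1/steps, 2/steps, ...
        first_part, second_part = separation_net(img_a, img_b)          # state separation
        interpolated_part = interpolate(first_part, t)                  # interpolate only the allowed part
        results.append(synthesis_net(second_part, interpolated_part))   # synthesize the target image
    return results
```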
It should be noted that, since the first portion is the interpolatable portion of the images to be processed, the server may perform linear interpolation on the first portion to obtain an interpolated portion. Optionally, the server may interpolate the interpolatable portion of the images to be processed using an existing interpolation algorithm, which may be, but is not limited to, the SepConv algorithm, the SuperSloMo algorithm, the DAIN algorithm, the BMBC algorithm, the RIFE algorithm, the QVI algorithm, the CyclicGen algorithm, and the like.
In addition, since state separation processing is performed on the images to be processed in step S604, in order to ensure the integrity of the image, after the interpolatable portion of the images has been interpolated, the resulting interpolated portion and the portion in which interpolation is prohibited need to be synthesized so as to obtain the target image for the user to view.
Optionally, after the server performs synthesis processing on the second portion and the interpolation portion to obtain the target image, the server may send the target image to the terminal device, so that the terminal device displays the target image. Or the server can also store the target image in the cloud, and send the target network link corresponding to the target image to the terminal device, and the terminal device can switch the network link corresponding to the image to be processed to the target network link corresponding to the target image and display the target image with higher definition through the target network link.
It should be noted that the state separation processing, the interpolation processing, and the composition processing of the image to be processed in this embodiment are substantially the same as those performed on the video to be processed in embodiment 1, and only the video data is processed in embodiment 1, whereas the image data is processed in this embodiment. Specific technical details are described in embodiment 1, and are not described herein again.
Example 3
According to an embodiment of the present application, there is also provided a method for processing video data, as shown in fig. 10, the method including the steps of:
and step S1002, displaying the video to be processed in the interactive interface.
Optionally, in step S1002, the interactive interface may be an interactive interface displayed by the terminal device.
In an optional embodiment, when a user watching an online video (i.e., the video to be processed) on a terminal device needs to adjust its definition, the user may click a definition adjustment control in the client that plays the online video to determine a target definition. After the target definition is determined, the terminal device sends a definition adjustment instruction to the server, and the server parses the instruction to obtain the target definition and the video information (such as the video link address and the video name) of the online video whose definition needs to be adjusted. The server then acquires the online video according to the video information and adjusts the definition of the online video toward the target definition.
Step S1004, acquiring a separation result of the video to be processed when the interactive interface receives a separation instruction, wherein a preset neural network model is used to perform state separation on the video to be processed, and the separation result includes: a first part of the video to be processed that allows interpolation operation and a second part of the video to be processed that prohibits interpolation operation.
In step S1004, the preset neural network model may be composed of a U-shaped network and a synthesis network, where the U-shaped network is configured to separate the video to be processed into the first portion and the second portion and to perform interpolation processing on the first portion that allows interpolation operation. Optionally, the U-shaped network includes two sub-networks (a first network and a second network); the network structures of the first network and the second network are identical, and the convolution kernel parameters of the first network are identical to those of the second network, that is, the first network and the second network share weights. The order in which images are input into the two networks is reversed, i.e., the first order of inputting images into the first network and the second order of inputting images into the second network are opposite to each other.
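A minimal PyTorch-style sketch of the weight-sharing arrangement described above, included only as an illustration; the layer sizes, the channel-wise split, and the way the two orderings are combined are assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn


class TwinSeparationNet(nn.Module):
    """Sketch: one U-shaped sub-network applied twice with shared weights.

    The "first network" and the "second network" are the same module, so
    their convolution kernel parameters are identical by construction;
    only the order of the two input frames differs between the passes.
    """

    def __init__(self, channels=3, features=32):
        super().__init__()
        # Stand-in for the U-shaped sub-network (a real one would be an
        # encoder-decoder with skip connections).
        self.unet = nn.Sequential(
            nn.Conv2d(2 * channels, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, 2 * channels, kernel_size=3, padding=1),
        )

    def forward(self, frame1, frame2):
        # First order: (frame1, frame2); second order: (frame2, frame1).
        out_fwd = self.unet(torch.cat([frame1, frame2], dim=1))
        out_bwd = self.unet(torch.cat([frame2, frame1], dim=1))
        # Split each output into an interpolation-allowed part and an
        # interpolation-prohibited part (a channel-wise split is one
        # possible realization of "state separation").
        allowed_fwd, prohibited_fwd = out_fwd.chunk(2, dim=1)
        allowed_bwd, prohibited_bwd = out_bwd.chunk(2, dim=1)
        return (allowed_fwd, allowed_bwd), (prohibited_fwd, prohibited_bwd)
```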
It should be noted that the server performs state separation on the video to be processed through the preset neural network model; because this process does not depend on human knowledge and experience, the reliability of interpolation processing on the video to be processed is improved compared with the existing optical flow method, and the interpolation errors that arise from estimating optical flow information are avoided.
Step S1006, displaying a target video in the interactive interface, wherein the target video is obtained by synthesizing the second part with the result of performing interpolation processing on the first part.
It should be noted that, since state separation processing is performed on the video to be processed in step S1004, the integrity of the video must be preserved: after interpolation processing is performed on the interpolation-allowed portion of the video to be processed, the resulting interpolation portion and the interpolation-prohibited portion need to be synthesized to obtain the target video for the user to view.
Based on the solutions defined in steps S1002 to S1006, this embodiment of the present application adopts a mode of performing state separation processing on the video to be processed: after the video to be processed is obtained, state separation processing is performed on it based on a preset neural network algorithm to obtain a first part that allows interpolation operation and a second part that prohibits interpolation operation; interpolation processing is performed on the first part to obtain an interpolation part; and finally the second part and the interpolation part are synthesized to obtain the target video.
It is easy to notice that, in the above process, optical flow information does not need to be estimated, which avoids the erroneous interpolation caused by estimating optical flow information and improves the efficiency of video interpolation. In addition, the method performs state separation on the video to be processed with a neural network algorithm to determine the interpolation-allowed part, and this process does not require human knowledge or experience, so the reliability of interpolating the video to be processed is improved. Finally, the scheme provided by the application can process not only video data but also continuously and orderly changing images, so the scheme has universality.
Therefore, the scheme provided by the application reduces the error rate of video interpolation, thereby improving the efficiency of video interpolation and solving the technical problem of erroneous interpolation that exists when the existing optical flow interpolation method is used for video interpolation.
In an optional embodiment, in the present application, the terminal device may further determine the portion to be displayed from the separation result in response to a selection instruction of the user, and display the portion to be displayed.
Optionally, while the server is processing the video to be processed, the terminal device may display the first part and the second part obtained by state separation of the video. According to their needs, users can choose to display both parts, only the first part, only the second part, or neither. Optionally, by default, the separated first part and second part are not displayed.
It should be noted that the state separation processing, the interpolation processing, and the synthesis processing of the video to be processed in embodiment 1 can be applied to this embodiment, and specific technical details are already described in embodiment 1 and are not described herein again.
Example 4
According to an embodiment of the present application, there is also provided a method for processing video data, as shown in fig. 11, the method including the steps of:
Step S1102, in the process of playing the video to be processed, detecting a control instruction sent by the client, where the control instruction is used to adjust the frame rate of the video to be processed.
In step S1102, the video to be processed may be a short video or a live video in a live streaming scene. For example, in a live-streaming sales scene, a user who wants to watch the live video with higher definition operates the frame-rate adjustment control on the client, which sends a control instruction for adjusting the video frame rate to the server, so that the server obtains the control instruction. In addition, the video to be processed may also be a teaching video or a distance-education video in an education scene, various videos in an entertainment scene, and the like; the corresponding control instruction is generated in the same manner as above and is not described again here.
Step S1104, based on the control instruction, performing state separation processing on the video to be processed by adopting a preset neural network algorithm, and performing interpolation processing on a part which allows interpolation in the video to be processed to obtain a target video.
In an alternative embodiment, fig. 12 shows a frame diagram of an alternative method for processing video data. As shown in fig. 12, after the client sends a control instruction, the server receives the control instruction, acquires the video to be processed according to the control instruction, and performs state separation on the video to be processed to obtain a separation result, where the first part of the separation result is the interpolation-allowed part and the second part is the interpolation-prohibited part; the server then performs interpolation processing on the interpolation-allowed first part to obtain an interpolation result, and finally performs synthesis processing on the second part and the interpolation result to obtain the target video.
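A hedged sketch of the server-side flow in fig. 12, assuming (as one possibility) that the frame-rate adjustment is realized by inserting one synthesized frame between each pair of adjacent frames; separate, interpolate, and synthesize are again hypothetical stand-ins for the preset neural network algorithm and the synthesis step:

```python
def handle_frame_rate_instruction(frames, separate, interpolate, synthesize):
    """Sketch only: on receiving a frame-rate control instruction, insert an
    interpolated frame between every pair of adjacent frames, roughly
    doubling the frame rate of the video to be processed."""
    if not frames:
        return []
    output = []
    for prev_frame, next_frame in zip(frames[:-1], frames[1:]):
        output.append(prev_frame)
        allowed_part, prohibited_part = separate(prev_frame, next_frame)
        interpolation_result = interpolate(allowed_part)
        # The synthesized in-between frame combines the prohibited part
        # with the interpolation result, as in the description above.
        output.append(synthesize(prohibited_part, interpolation_result))
    output.append(frames[-1])
    return output
```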
Optionally, the neural network model corresponding to the preset neural network algorithm may be composed of a U-shaped network and a synthesis network, where the U-shaped network is configured to separate the video to be processed into the first part and the second part and to perform interpolation processing on the first part that allows interpolation operation. Optionally, the U-shaped network includes two sub-networks (a first network and a second network); the network structures of the first network and the second network are identical, and the convolution kernel parameters of the first network are identical to those of the second network, that is, the first network and the second network share weights. Furthermore, the order in which images are input into the two networks is reversed, i.e., the first order of inputting images to the first network and the second order of inputting images to the second network are the reverse of each other.
It should be noted that the server performs state separation on the video to be processed through a preset neural network algorithm; this process does not rely on human knowledge and experience, so compared with the existing optical flow method, the reliability of interpolation processing on the video to be processed is improved and the interpolation errors introduced by estimating optical flow information are avoided.
Step S1106, feeding the target video back to the client, where the client plays the target video.
Optionally, as shown in fig. 12, after the server obtains the target video, the target video is output to the client, and at this time, the video played by the client is the video after interpolation.
Based on the solutions defined in steps S1102 to S1106, this embodiment of the present application adopts a mode of performing state separation processing on the video to be processed: after the video to be processed is obtained, state separation processing is performed on it based on a preset neural network algorithm to obtain a first part that allows interpolation operation and a second part that prohibits interpolation operation; interpolation processing is performed on the first part to obtain an interpolation part; and finally the second part and the interpolation part are synthesized to obtain the target video.
It is easy to notice that, in the above process, optical flow information does not need to be estimated, which avoids the erroneous interpolation caused by estimating optical flow information and improves the efficiency of video interpolation. In addition, the method performs state separation on the video to be processed with a neural network algorithm to determine the interpolation-allowed part, and this process does not require human knowledge or experience, so the reliability of interpolating the video to be processed is improved. Finally, the scheme provided by the application can process not only video data but also continuously and orderly changing images, so the scheme has universality.
Therefore, the scheme provided by the application reduces the error rate of video interpolation, thereby improving the efficiency of video interpolation and solving the technical problem of erroneous interpolation that exists when the existing optical flow interpolation method is used for video interpolation.
It should be noted that the state separation processing, the interpolation processing, and the synthesis processing of the video to be processed in embodiment 1 can be applied to this embodiment, and specific technical details are already described in embodiment 1 and are not described herein again.
Example 5
According to an embodiment of the present application, there is also provided an apparatus for implementing the method for processing video data, as shown in fig. 8, the apparatus 80 includes: an acquisition module 801, a separation module 803, an interpolation module 805, and a synthesis module 807.
The acquiring module 801 is configured to acquire a video to be processed; a separation module 803, configured to perform state separation processing on a to-be-processed video based on a preset neural network algorithm to obtain a first part and a second part in the to-be-processed video, where the first part is a part in the to-be-processed video that allows interpolation operation, and the second part is a part in the to-be-processed video that prohibits interpolation operation; an interpolation module 805, configured to perform interpolation processing on the first part to obtain an interpolated part; and a synthesizing module 807 for synthesizing the second part and the interpolated part to obtain the target video.
It should be noted here that the acquiring module 801, the separation module 803, the interpolation module 805, and the synthesis module 807 described above correspond to steps S202 to S208 in embodiment 1; the examples and application scenarios implemented by these four modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should also be noted that the above modules, as part of the apparatus, may run in the computing device 10 provided in embodiment 1.
Optionally, the separation module includes: the device comprises a first acquisition module, a first input module, a second input module and a first separation module. The first acquisition module is used for acquiring a first frame image and a second frame image in a video to be processed, wherein the first frame image and the second frame image are two continuous adjacent frames of images; a first input module, configured to input the first frame image and the second frame image into a first network according to a first order; the second input module is used for inputting the first frame image and the second frame image into a second network according to a second sequence, wherein the first sequence and the second sequence are reverse to each other; and the first separation module is used for carrying out state separation processing on the first frame image and the second frame image based on the first network and the second network to obtain a first part and a second part.
Optionally, the convolution kernel parameters of the first network are the same as the convolution kernel parameters of the second network.
Optionally, the first separation module includes: the device comprises a second acquisition module, a second separation module and a third separation module. The second acquisition module is used for acquiring a preset separation proportion; the second separation module is used for carrying out state separation processing on the first frame image and the second frame image based on a preset separation proportion in the first network to obtain a third part and a fourth part; and the third separation module is used for carrying out state separation processing on the first frame image and the second frame image based on a preset separation proportion in a second network to obtain a fifth part and a sixth part, wherein the first part comprises the third part and the fifth part, and the second part comprises the fourth part and the sixth part.
Optionally, the interpolation module includes: the device comprises a third acquisition module, a first calculation module, a second calculation module and a processing module. The third obtaining module is configured to obtain a first interpolation time and a second interpolation time, where a sum of the first interpolation time and the second interpolation time is a constant; the first calculation module is used for calculating the product of the first interpolation time and the third part to obtain a first result; the second calculation module is used for calculating the product of the second interpolation time and the fifth part to obtain a second result; and the processing module is used for obtaining an interpolation part according to the first result and the second result.
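Read literally, the interpolation module combines the two separated (allowed) components with two interpolation times whose sum is a constant; a minimal sketch, assuming the constant is 1 so that the two times behave like blending coefficients, and assuming the two results are combined by addition (the description only says the interpolation part is obtained according to the first result and the second result):

```python
import numpy as np


def interpolate_allowed_parts(third_part, fifth_part, first_time):
    """Sketch of the interpolation module: weight the allowed part separated
    by the first network (third part) and the allowed part separated by the
    second network (fifth part) by two interpolation times whose sum is a
    constant (assumed to be 1 here), then combine the two results."""
    second_time = 1.0 - first_time      # first_time + second_time == 1 (assumption)
    first_result = first_time * np.asarray(third_part)
    second_result = second_time * np.asarray(fifth_part)
    # Combining by addition is one plausible way to obtain the
    # interpolation part from the first and second results.
    return first_result + second_result
```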
Optionally, the synthesis module includes: a third input module and a synthesis submodule. The third input module is used for inputting the second part and the interpolation part into the synthesis network; and the synthesis submodule is used for carrying out synthesis processing on the second part and the interpolation part based on a synthesis network to obtain the target video.
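As a compact, non-authoritative sketch of how the apparatus 80 could be wired together, with module names mirroring the description above and the concrete callables they delegate to left as assumptions:

```python
class VideoProcessingApparatus:
    """Sketch of apparatus 80: acquisition, separation, interpolation and
    synthesis modules chained in the order of the method steps."""

    def __init__(self, acquire, separate, interpolate, synthesize):
        self.acquire = acquire          # acquiring module 801
        self.separate = separate        # separation module 803 (preset neural network algorithm)
        self.interpolate = interpolate  # interpolation module 805
        self.synthesize = synthesize    # synthesis module 807

    def run(self, video_source):
        video = self.acquire(video_source)
        first_part, second_part = self.separate(video)    # allowed / prohibited parts
        interpolated_part = self.interpolate(first_part)
        return self.synthesize(second_part, interpolated_part)
```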
Example 6
Embodiments of the present application may provide a computing device that may be any one of a group of computing devices. Optionally, in this embodiment, the computing device may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computing device may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the above-mentioned computing device may execute program code of the following steps in the method of processing video data: acquiring a video to be processed; performing state separation processing on a video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which prohibits interpolation operation in the video to be processed; performing interpolation processing on the first part to obtain an interpolation part; and synthesizing the second part and the interpolation part to obtain a target video.
Optionally, fig. 9 is a block diagram of a computing device according to an embodiment of the present application. As shown in fig. 9, the computing device 9 may include: one or more processors 902 (only one of which is shown), memory 904, and a peripherals interface 906.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for processing video data in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the above-described method for processing video data. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memories may further include a memory located remotely from the processor, which may be connected to computing device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a video to be processed; performing state separation processing on a video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which prohibits interpolation operation in the video to be processed; carrying out interpolation processing on the first part to obtain an interpolation part; and synthesizing the second part and the interpolation part to obtain the target video.
Optionally, the processor may further execute the program code of the following steps: acquiring a first frame image and a second frame image in a video to be processed, wherein the first frame image and the second frame image are two continuous adjacent frames of images; inputting a first frame image and a second frame image into a first network in a first order; inputting the first frame image and the second frame image into a second network according to a second sequence, wherein the first sequence and the second sequence are opposite to each other; and performing state separation processing on the first frame image and the second frame image based on the first network and the second network to obtain a first part and a second part.
Optionally, the convolution kernel parameters of the first network are the same as the convolution kernel parameters of the second network.
Optionally, the processor may further execute the program code of the following steps: acquiring a preset separation proportion; in a first network, performing state separation processing on a first frame image and a second frame image based on a preset separation ratio to obtain a third part and a fourth part; and in the second network, performing state separation processing on the first frame image and the second frame image based on a preset separation ratio to obtain a fifth part and a sixth part, wherein the first part comprises the third part and the fifth part, and the second part comprises the fourth part and the sixth part.
Optionally, the processor may further execute the program code of the following steps: acquiring first interpolation time and second interpolation time, wherein the sum of the first interpolation time and the second interpolation time is a constant; calculating the product of the first interpolation time and the third part to obtain a first result; calculating the product of the second interpolation time and the fifth part to obtain a second result; an interpolation section is obtained based on the first result and the second result.
Optionally, the processor may further execute the program code of the following steps: inputting the second portion and the interpolation portion into a synthesis network; and synthesizing the second part and the interpolation part based on a synthesis network to obtain a target video.
It can be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration and does not limit the structure of the electronic device; the computing device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. For example, computing device 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 9, or have a different configuration than shown in fig. 9.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 7
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the method of processing video data provided in embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computing devices in a computing device group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a video to be processed; performing state separation processing on a video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which prohibits interpolation operation in the video to be processed; carrying out interpolation processing on the first part to obtain an interpolation part; and synthesizing the second part and the interpolation part to obtain the target video.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a first frame image and a second frame image in a video to be processed, wherein the first frame image and the second frame image are two continuous adjacent frames of images; inputting a first frame image and a second frame image into a first network in a first order; inputting the first frame image and the second frame image into a second network according to a second sequence, wherein the first sequence and the second sequence are opposite to each other; and performing state separation processing on the first frame image and the second frame image based on the first network and the second network to obtain a first part and a second part.
Optionally, the convolution kernel parameters of the first network are the same as the convolution kernel parameters of the second network.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a preset separation proportion; in a first network, performing state separation processing on a first frame image and a second frame image based on a preset separation proportion to obtain a third part and a fourth part; and in the second network, performing state separation processing on the first frame image and the second frame image based on a preset separation ratio to obtain a fifth part and a sixth part, wherein the first part comprises the third part and the fifth part, and the second part comprises the fourth part and the sixth part.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring first interpolation time and second interpolation time, wherein the sum of the first interpolation time and the second interpolation time is a constant; calculating the product of the first interpolation time and the third part to obtain a first result; calculating the product of the second interpolation time and the fifth part to obtain a second result; an interpolation part is obtained according to the first result and the second result.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: inputting the second portion and the interpolation portion into a synthesis network; and synthesizing the second part and the interpolation part based on a synthesis network to obtain the target video.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (15)

1. A method of processing video data, comprising:
acquiring a video to be processed;
performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which prohibits interpolation operation in the video to be processed;
performing interpolation processing on the first part to obtain an interpolation part;
and synthesizing the second part and the interpolation part to obtain a target video.
2. The method according to claim 1, wherein performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed comprises:
acquiring a first frame image and a second frame image in the video to be processed, wherein the first frame image and the second frame image are two continuous adjacent images;
inputting the first frame image and the second frame image into a first network in a first order;
inputting the first frame image and the second frame image into a second network according to a second sequence, wherein the first sequence and the second sequence are opposite to each other;
and performing the state separation processing on the first frame image and the second frame image based on the first network and the second network to obtain the first part and the second part.
3. The method of claim 2, wherein the convolution kernel parameters of the first network are the same as the convolution kernel parameters of the second network.
4. The method of claim 2, wherein performing the state separation process on the first frame image and the second frame image based on the first network and the second network to obtain the first portion and the second portion comprises:
acquiring a preset separation ratio;
in the first network, performing the state separation processing on the first frame image and the second frame image based on the preset separation proportion to obtain a third part and a fourth part;
in the second network, the state separation processing is performed on the first frame image and the second frame image based on the preset separation ratio to obtain a fifth part and a sixth part, wherein the first part includes the third part and the fifth part, and the second part includes the fourth part and the sixth part.
5. The method of claim 4, wherein interpolating the first portion to obtain an interpolated portion comprises:
acquiring first interpolation time and second interpolation time, wherein the sum of the first interpolation time and the second interpolation time is a constant;
calculating the product of the first interpolation time and the third part to obtain a first result;
calculating the product of the second interpolation time and the fifth part to obtain a second result;
and obtaining the interpolation part according to the first result and the second result.
6. The method of claim 1, wherein the synthesizing the second portion and the interpolation portion to obtain the target video comprises:
inputting the second portion and the interpolation portion into a synthesis network;
and synthesizing the second part and the interpolation part based on the synthesis network to obtain the target video.
7. A method of processing video data, comprising:
displaying a video to be processed in an interactive interface;
under the condition that the interactive interface receives a separation instruction, obtaining a separation result of the video to be processed, wherein a preset neural network model is adopted to perform state separation on the video to be processed, and the separation result comprises: a first part allowing interpolation operation and a second part prohibiting interpolation operation in the video to be processed;
and displaying a target video in the interactive interface, wherein the target video is obtained by synthesizing a result obtained by performing interpolation processing on the first part and the second part.
8. The method of claim 7, further comprising:
responding to a selection instruction, and determining a part to be displayed from the separation result;
and displaying the part to be displayed.
9. A method of processing video data, comprising:
in the process of playing a video to be processed, detecting a control instruction sent by a client, wherein the control instruction is used for adjusting the frame rate of the video to be processed;
based on the control instruction, performing state separation processing on the video to be processed by adopting a preset neural network algorithm, and performing interpolation processing on a part which is allowed to be interpolated in the video to be processed to obtain a target video;
feeding the target video back to the client, wherein the client plays the target video.
10. A method of processing image data, comprising:
acquiring a plurality of continuously changed images to be processed;
performing state separation on the multiple images to be processed based on a preset neural network model, and performing interpolation processing on a part, which allows interpolation operation, of the multiple images to be processed to obtain a target image;
and displaying the target image.
11. The method according to claim 10, wherein performing state separation on the plurality of images to be processed based on a preset neural network model, and performing interpolation processing on a part of the plurality of images to be processed allowing interpolation operation to obtain a target image, comprises:
performing state separation on the multiple images to be processed based on the preset neural network model to obtain a first part and a second part, wherein the first part is a part allowing interpolation operation in the multiple images to be processed, and the second part is a part prohibiting interpolation operation in the multiple images to be processed;
carrying out interpolation processing on the first part to obtain an interpolation part;
and synthesizing the second part and the interpolation part to obtain the target image.
12. An apparatus for processing video data, comprising:
the acquisition module is used for acquiring a video to be processed;
the separation module is used for carrying out state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which forbids interpolation operation in the video to be processed;
the interpolation module is used for carrying out interpolation processing on the first part to obtain an interpolation part;
and the synthesis module is used for synthesizing the second part and the interpolation part to obtain a target video.
13. A storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method for processing video data according to any one of claims 1 to 9.
14. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of processing video data according to any one of claims 1 to 9.
15. A computing device, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
acquiring a video to be processed;
performing state separation processing on the video to be processed based on a preset neural network algorithm to obtain a first part and a second part in the video to be processed, wherein the first part is a part which allows interpolation operation in the video to be processed, and the second part is a part which prohibits interpolation operation in the video to be processed;
performing interpolation processing on the first part to obtain an interpolation part;
and synthesizing the second part and the interpolation part to obtain a target video.
CN202110187923.2A 2021-02-18 2021-02-18 Method of processing video data and method of processing image data Pending CN114979703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110187923.2A CN114979703A (en) 2021-02-18 2021-02-18 Method of processing video data and method of processing image data

Publications (1)

Publication Number Publication Date
CN114979703A true CN114979703A (en) 2022-08-30

Family

ID=82954160

Country Status (1)

Country Link
CN (1) CN114979703A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
CN108322685A (en) * 2018-01-12 2018-07-24 广州华多网络科技有限公司 Video frame interpolation method, storage medium and terminal
US20200314382A1 (en) * 2018-01-12 2020-10-01 Guangzhou Huaduo Network Technology Co., Ltd. Video frame interpolation method, storage medium and terminal
US20190289257A1 (en) * 2018-03-15 2019-09-19 Disney Enterprises Inc. Video frame interpolation using a convolutional neural network
CN109379550A (en) * 2018-09-12 2019-02-22 上海交通大学 Video frame rate upconversion method and system based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SIMON NIKLAUS ET AL: "Video Frame Interpolation via Adaptive Separable Convolution", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 25 December 2017 (2017-12-25), pages 261 - 270 *
HOU JINGXUAN; ZHAO YAO; LIN CHUNYU; LIU MEIQIN; BAI HUIHUI: "Research on frame rate up-conversion algorithm based on convolutional networks", Application Research of Computers, no. 02, 15 March 2017 (2017-03-15), pages 611 - 614 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination