WO2023193521A1 - Video inpainting method, related apparatus, device and storage medium - Google Patents


Info

Publication number
WO2023193521A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
optical flow
frame
target mask
video frame
Application number
PCT/CN2023/075576
Other languages
French (fr)
Chinese (zh)
Inventor
钟立耿
朱允全
谯睿智
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Publication of WO2023193521A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 — Image enhancement or restoration
    • G06T 5/77 — Retouching; Inpainting; Scratch removal
    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00 — Image analysis
    • G06T 7/20 — Analysis of motion
    • G06T 7/269 — Analysis of motion using gradient-based methods
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10016 — Video; Image sequence
    • G06T 2207/20 — Special algorithmic details
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30 — Subject of image; Context of image processing
    • G06T 2207/30168 — Image quality inspection

Definitions

  • This application relates to the field of data processing technology, especially to video repair technology.
  • Video inpainting is the task of filling missing regions of video frames with plausible content. It mainly uses information from the unmasked regions of the video to repair the masked regions, for example to repair damaged videos, remove unwanted objects, retarget videos, or restore underexposed images.
  • Video restoration technology is mainly divided into two types.
  • One uses optical flow propagation plus image inpainting: available pixels are first propagated into the corresponding regions via optical flow, and isolated pixel blocks are then filled by image inpainting. The other uses an end-to-end neural network, filling the occluded region with a generative model.
  • The above video repair technologies have at least the following problems.
  • Content filled based on optical flow has higher definition, but relies too heavily on the optical flow.
  • The optical flow itself is easily disturbed and its estimation may be inaccurate, so distortions and incorrect filling easily occur.
  • The end-to-end neural network method takes semantic information into account and usually avoids distortions and serious errors.
  • However, with complex backgrounds the filled content easily becomes blurred.
  • Embodiments of the present application provide a video repair method, related devices, equipment, and storage media.
  • This application uses optical flow quality as the basis for selecting a video repair method, so that different video repair methods complement each other's strengths, which helps obtain video images with a better repair effect.
  • this application provides a video repair method, which is executed by a computer device, including:
  • the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
  • the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask area obtained by expanding the original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
  • the optical flow data sequence includes K optical flow data, and there is a one-to-one correspondence between the K optical flow data and the K video frame pairs;
  • based on each optical flow data in the optical flow data sequence, the pixels included in the target mask area of each target mask frame are clustered to obtain the optical flow clustering result of each target mask frame;
  • a video repair device including:
  • An acquisition module is used to obtain a video sample sequence for the video to be repaired, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
  • the acquisition module is also used to obtain a target mask sample sequence according to the video sample sequence, wherein the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask area obtained by expanding the original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
  • the acquisition module is also used to obtain the optical flow data sequence according to the video sample sequence, wherein the optical flow data sequence includes K optical flow data, and there is a one-to-one correspondence between the K optical flow data and the K video frame pairs;
  • the processing module is used to cluster the pixels included in the target mask area of each target mask frame based on each optical flow data in the optical flow data sequence, obtaining the optical flow clustering result of each target mask frame;
  • a determination module, used to determine the optical flow quality score based on the optical flow clustering results of each target mask frame;
  • the repair module is used to repair the video to be repaired using a video repair method that matches the optical flow quality score.
  • Another aspect of the present application provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • When the processor executes the computer program, the methods of the above aspects are implemented.
  • Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the methods of the above aspects are implemented.
  • Another aspect of the present application provides a computer program product, including a computer program, which implements the methods of the above aspects when executed by a processor.
  • In the embodiments of this application, a video repair method is provided. First, a video sample sequence corresponding to the video to be repaired is obtained, and a target mask sample sequence is then obtained from it, where each target mask frame includes a target mask area obtained by expanding the original mask area. The optical flow data sequence is also obtained from the video sample sequence; based on each optical flow data in that sequence, the pixels included in the target mask area of each target mask frame are clustered to obtain the optical flow clustering result of each target mask frame. On this basis, the optical flow quality score is determined from those clustering results, and the video to be repaired is repaired using a video repair method that matches the score.
  • the optical flow clustering results of the masked area are used to predict the optical flow quality.
  • When the optical flow quality is good, the optical flow method can be used as the video repair method to obtain filled content with higher clarity and credibility.
  • When the optical flow quality is poor, the generative model can be used as the video repair method to obtain a more stable filling effect. It can be seen that this application uses the optical flow quality as the basis for selecting a video repair method, so that different video repair methods complement each other's strengths, which helps obtain a video picture with a better repair effect.
  • Figure 1 is an architectural schematic diagram of a video repair system in an embodiment of the present application
  • Figure 2 is an effect diagram of video frame filling based on the optical flow method in the embodiment of the present application
  • Figure 3 is an effect diagram of video frame filling based on the model method in the embodiment of the present application.
  • Figure 4 is a schematic flow chart of a video repair method in an embodiment of the present application.
  • Figure 5 is a schematic diagram of generating a target mask frame in an embodiment of the present application.
  • Figure 6 is another schematic diagram of generating a target mask frame in an embodiment of the present application.
  • Figure 7 is another schematic diagram of generating a target mask frame in an embodiment of the present application.
  • Figure 8 is another schematic diagram of generating a target mask frame in an embodiment of the present application.
  • Figure 9 is a schematic diagram of determining a two-dimensional optical flow value based on forward optical flow in an embodiment of the present application.
  • Figure 10 is a schematic diagram of determining a two-dimensional optical flow value based on backward optical flow in an embodiment of the present application
  • Figure 11 is a schematic diagram of the effect of removing a mark based on a video repair application in an embodiment of the present application
  • Figure 12 is a schematic diagram of the effect of removing subtitles based on a video repair application in an embodiment of the present application
  • Figure 13 is a schematic diagram of the effect of object removal based on video repair application in the embodiment of the present application.
  • Figure 14 is a schematic diagram comparing the effects of video frame restoration based on the optical flow method and the model method in the embodiment of the present application;
  • Figure 15 is a schematic diagram of the video repair device in the embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a terminal in an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a server in an embodiment of the present application.
  • video has gradually become the mainstream method of information exchange, and massive videos pose more challenges to video quality management.
  • Videos may be defective for various reasons. For example, a mosaic pattern in the video picture affects the user's viewing experience. For another example, station logos or advertisement patterns may be introduced during video creation. Based on this, this application proposes a video repair method aiming to remove unwanted objects from the video or restore damaged pictures.
  • Video repair methods specifically involve AI-based computer vision (CV) technology and machine learning (ML): repairable objects (for example, station logos, subtitles, etc.) are identified from the video through CV technology, and the video picture is repaired through a neural network trained by ML.
  • The video repair system includes a server and a terminal, and a client is deployed on the terminal.
  • the client can run on the terminal in the form of a browser, or can also run on the terminal in the form of an independent application (application, APP).
  • the server involved in this application can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud services.
  • the terminal can be a mobile phone, computer, intelligent voice interaction device, smart home appliance, vehicle terminal, aircraft, etc., but is not limited to this.
  • the terminal and the server can be connected directly or indirectly through wired or wireless communication methods, which is not limited in this application. There is no limit on the number of servers and terminals.
  • the solution provided by this application can be completed independently by the terminal, or independently by the server, or it can also be completed by the terminal and the server in cooperation. This application does not specifically limit this.
  • the user can upload the video to the server through the terminal, and the server can directly call the video repair function. That is, the selected video repair algorithm (ie, optical flow method or model method) is first determined, and based on this, the corresponding video repair algorithm is used to repair the video. Finally, the repaired video is stored in the database.
  • When the terminal requests the server to play a video, the server can obtain the corresponding video from the database and feed it back to the terminal.
  • the user can upload the video to the server through the terminal, and the server stores the video uploaded by the terminal into the database.
  • the corresponding video can be selected from the database and then the video repair function can be called. That is, the selected video repair algorithm (ie, optical flow method or model method) is first determined. Based on this, the video is repaired using the corresponding video repair algorithm. Finally, the repaired video is stored in the database.
  • Figure 2 is an effect diagram of video frame filling based on the optical flow method in the embodiment of the present application.
  • the mask object is detected in the video frame.
  • The video frame shown in (b) in Figure 2 can then be obtained. It can be seen that in the case of object occlusion and complex background motion, the filling effect of the optical flow method is greatly affected, and erroneous pixels caused by optical flow estimation errors gradually expand as they propagate, resulting in incorrect filled content.
  • Figure 3 is an effect diagram of video frame filling based on the model method in an embodiment of the present application.
  • a mask object is detected in the video frame.
  • The video frame shown in (b) in Figure 3 can then be obtained. It can be seen that the filled part is blurry, and high-resolution input is difficult to process due to limitations such as video memory, but the overall effect is relatively stable, and obvious, high-contrast errors are not prone to occur.
  • The video repair method of this application can determine in advance which repair approach to choose for picture repair, and then use the more appropriate video repair method to repair the video picture, achieving a more robust filling effect.
  • the video repair method in the present application will be introduced below. Please refer to Figure 4.
  • the video repair method in the embodiment of the present application can be executed by a computer device, and the computer device can be a terminal or a server.
  • the embodiment of the present application includes:
  • the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1.
  • the computer device can obtain the video to be repaired, and then extract K video frame pairs from the video to be repaired to form a video sample sequence.
  • Each video frame pair includes two adjacent video frames.
  • a video frame has a corresponding video frame number.
  • the video sample sequence includes K video frame pairs; that is, the first video frame pair is expressed as (x1, x2), the second video frame pair as (x11, x12), and so on.
  • Expressed in terms of the size-normalized frames, the first video frame pair is (x_r1, x_r2), the second is (x_r11, x_r12), and so on.
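The pairing of sampled frames can be sketched as follows; the helper name and the 10-frame sampling step (matching the example indices x1, x2, x11, x12) are assumptions for illustration, not specified by the application:

```python
def make_frame_pairs(frames, step=10):
    """Extract K video frame pairs: each sampled frame is paired with
    its immediate neighbour, e.g. (x1, x2), (x11, x12), ..."""
    return [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, step)]

# With 30 frames and a step of 10, three pairs are extracted.
pairs = make_frame_pairs(list(range(30)), step=10)
```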
  • the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask area obtained by expanding the original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs.
  • At least one corresponding original mask frame can be obtained.
  • In each original mask frame, the corresponding original mask area is marked; the original mask area is then expanded by a certain number of pixels to obtain the target mask area, and the target mask frame is obtained from the target mask area.
  • a target mask sample sequence including K target mask frames is obtained.
  • the optical flow data sequence includes K optical flow data, and there is a one-to-one correspondence between the K optical flow data and K video frame pairs.
  • the computer device may generate corresponding optical flow data respectively according to K video frame pairs in the video sample sequence, thereby obtaining an optical flow data sequence including K optical flow data.
  • The optical flow data can be expressed as a two-channel optical flow matrix: one channel records the horizontal offset of every pixel in the video frame pair, and the other records the vertical offset.
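The two-channel layout can be sketched with NumPy; the shape (288, 512) matches the normalized frame size mentioned later, and the constant offsets are placeholder values (a real system would obtain the flow from an optical flow estimator):

```python
import numpy as np

# Two-channel optical flow matrix for one video frame pair:
# channel 0 holds horizontal offsets, channel 1 vertical offsets.
H, W = 288, 512
flow = np.zeros((H, W, 2), dtype=np.float32)
flow[..., 0] = 1.5   # every pixel moves 1.5 px to the right
flow[..., 1] = -0.5  # and 0.5 px upward

# The two-dimensional optical flow value of the pixel at (y=10, x=20):
u, v = flow[10, 20]
```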
  • the computer device can perform step 120 first and then step 130, or it can perform step 130 first and then step 120, or it can also perform steps 120 and 130 at the same time.
  • The execution order of step 120 and step 130 is not limited in any way.
  • the optical flow data sequence and the target mask sample sequence are aligned, that is, the optical flow data in the optical flow data sequence corresponds to the target mask frame in the target mask sample sequence. Based on this, for each target mask frame, the corresponding optical flow data is used to assign corresponding two-dimensional optical flow values to each pixel in the target mask area. Then, based on the two-dimensional optical flow value of each pixel, a clustering algorithm is used to cluster these pixels, thereby obtaining the optical flow clustering result of each target mask frame.
  • This application can use density-based spatial clustering of applications with noise (DBSCAN), mean shift clustering (meanshift), or other clustering methods to cluster the pixel points, which is not limited here.
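A minimal, self-contained DBSCAN over the 2-D flow vectors of the masked pixels might look as follows; in practice `sklearn.cluster.DBSCAN` would typically be used, and the `eps`/`min_pts` values here are arbitrary:

```python
import numpy as np

def dbscan_2d(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN over 2-D optical-flow vectors (illustrative only).
    Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [-1] * n
    visited = [False] * n
    cluster = 0

    def neighbours(i):
        d = np.linalg.norm(points - points[i], axis=1)
        return [j for j in range(n) if d[j] <= eps]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nb = neighbours(i)
        if len(nb) < min_pts:
            continue                      # not a core point; stays noise for now
        labels[i] = cluster
        seed, k = list(nb), 0
        while k < len(seed):              # expand the cluster from core points
            j = seed[k]; k += 1
            if not visited[j]:
                visited[j] = True
                nb2 = neighbours(j)
                if len(nb2) >= min_pts:
                    seed.extend(nb2)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels

# Flow vectors of masked pixels: one tight cluster plus an outlier.
flows = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.1], [8.0, 8.0]])
labels = dbscan_2d(flows, eps=0.5, min_pts=3)
```

A single dominant cluster with little noise suggests coherent motion in the masked region, i.e. good optical flow quality.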
  • the computer device can combine the optical flow clustering results of each target mask frame to comprehensively determine the quality of the optical flow, thereby generating a corresponding optical flow quality score.
  • the optical flow quality score in this application is the first score or the second score.
  • If the optical flow quality score is the first score, it means the optical flow quality is good.
  • The first score may be "1".
  • If the optical flow quality score is the second score, it means the optical flow quality is poor.
  • The second score may be "0".
  • the computer device may select a corresponding video repair method based on the optical flow quality score. That is, if the optical flow quality score is the first score, the optical flow method is used to repair the video to be repaired. If the optical flow quality score is the second score, the neural network is called to repair the video to be repaired.
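The selection logic can be sketched as a simple dispatch; the score constants and return labels are illustrative stand-ins for the two repair paths:

```python
FIRST_SCORE, SECOND_SCORE = 1, 0  # good vs. poor optical flow quality

def choose_repair_method(score):
    """Pick the repair path: the optical flow method when quality is good,
    otherwise the neural-network (generative) method."""
    return "optical_flow_method" if score == FIRST_SCORE else "neural_network_method"
```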
  • The process of using the optical flow method to repair the video mainly includes: performing optical flow estimation with adjacent frames, filling the original mask area in each frame via optical flow, and propagating the pixel gradients of the unmasked area into the original mask area along the optical flow. Poisson reconstruction is then performed on the pixel gradients to generate red-green-blue (RGB) pixels. Finally, image inpainting is applied to areas that cannot be filled by optical flow.
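A heavily simplified sketch of just the flow-propagation step: masked pixels pull colour from the adjacent frame along the (integer-rounded) flow. Gradient propagation, Poisson reconstruction, and occlusion handling are all omitted, and the function name is illustrative:

```python
import numpy as np

def propagate_fill(cur, adj, flow, mask):
    """Fill each masked pixel of `cur` from frame `adj`, displaced by the
    optical flow (channel 0 = horizontal, channel 1 = vertical)."""
    out = cur.copy()
    for y, x in zip(*np.nonzero(mask)):
        sy = int(round(y + flow[y, x, 1]))
        sx = int(round(x + flow[y, x, 0]))
        if 0 <= sy < adj.shape[0] and 0 <= sx < adj.shape[1]:
            out[y, x] = adj[sy, sx]
    return out

adj = np.arange(16, dtype=np.float64).reshape(4, 4)  # adjacent frame
cur = adj.copy()
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True                  # the damaged pixel to fill
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                 # source lies one pixel to the right
filled = propagate_fill(cur, adj, flow, mask)
```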
  • The process of calling the neural network to repair the video is to receive the frame sequence information as input and output the repaired video frames after processing by the neural network.
  • neural networks mostly use an encoder-decoder structure.
  • The neural network used in this application can be a flow-edge guided video completion (FGVC) network, a spatial-temporal transformer network (STTN), or a decoupled spatial-temporal attention network (DSTT), etc., which is not limited here.
  • the embodiment of this application provides a video repair method.
  • The optical flow clustering results of the masked region are used to predict the optical flow quality.
  • the optical flow method can be used as a video repair method to obtain filling content with higher clarity and credibility.
  • When the optical flow quality is poor, the generative model can be used as the video repair method to obtain a more stable filling effect. It can be seen that this application uses the optical flow quality as the basis for selecting a video repair method, so that different repair methods complement each other, which is conducive to obtaining a video picture with a better repair effect.
  • obtaining a video sample sequence corresponding to the video to be repaired may specifically include:
  • Obtain a video sequence from the video to be repaired where the video sequence includes T original video frames, each original video frame displays a target object, and T is an integer greater than 1;
  • the size of each original video frame in the K video frame pairs to be processed is normalized respectively to obtain K video frame pairs, and the K video frame pairs are used as a video sample sequence.
  • a way of generating a sequence of video samples is introduced.
  • adjacent original video frames can be extracted at certain intervals.
  • the sequence includes K video frame pairs to be processed; that is, the first video frame pair to be processed is expressed as (x1, x2), the second as (x11, x12), and so on.
  • each original video frame in the pair of video frames to be processed is subjected to size normalization processing to obtain the corresponding video frame.
  • Adjacent video frames constitute a video frame pair
  • K video frame pairs constitute a video sample sequence
  • the video frame after size normalization has a fixed size, for example, 512 ⁇ 288.
  • embodiments of this application provide a way to generate a video sample sequence.
  • extracting several video frame pairs to be processed from the video sequence for subsequent processing can reduce the amount of data processing and save data processing resources.
  • normalizing the size of the original video frame can not only align the statistics of each video frame, but also reduce the size of the video frame, thereby improving processing efficiency.
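The size normalization can be sketched as a nearest-neighbour resize to the fixed 512×288 size mentioned above (pure NumPy for illustration; a production pipeline would use `cv2.resize` with bilinear interpolation):

```python
import numpy as np

def normalize_size(frame, out_w=512, out_h=288):
    """Nearest-neighbour resize of an (H, W, C) frame to (out_h, out_w, C)."""
    h, w = frame.shape[:2]
    ys = np.arange(out_h) * h // out_h   # source row for each output row
    xs = np.arange(out_w) * w // out_w   # source column for each output column
    return frame[ys][:, xs]

frame = np.zeros((576, 1024, 3), dtype=np.uint8)  # e.g. a 1024x576 frame
resized = normalize_size(frame)
```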
  • obtaining the target mask sample sequence according to the video sample sequence may specifically include:
  • the corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
  • a method of generating a target mask frame based on a single video frame is introduced. It can be known from the foregoing embodiments that the target object is displayed in the video to be repaired. For this purpose, the target object needs to be masked to obtain the corresponding original mask area. Then the original mask area is expanded according to a certain number of pixels to obtain the target mask area.
  • the target object can be a logo, subtitle, object, etc.
  • The methods of identifying target objects include but are not limited to manual annotation and model recognition, for example using a fully convolutional network (FCN) to identify the target object.
  • m_sF = {m1, m11, ...};
  • m_sF is normalized to obtain the original masks.
  • m_sB = {m2, m12, ...} is obtained;
  • m_sB is normalized to obtain the original masks.
  • the original mask frame sequence includes K original mask frames.
  • Alternatively, x_srF = {x_r1, x_r11, ...} is obtained; x_srF is then masked to obtain the original masks.
  • the original mask frame sequence includes K original mask frames.
  • Figure 5 is a schematic diagram of generating a target mask frame in an embodiment of the present application. Taking the original mask frame shown in (a) of Figure 5 as an example, the 15 pixels marked "1" constitute the original mask area. Assume the original mask area is expanded by 2 pixels to obtain the target mask area (i.e., the gray area composed of pixels marked "1"). Based on this, the target mask frame shown in (b) of Figure 5 is obtained.
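The "expand by N pixels" step is a binary dilation with a square neighbourhood; a pure-NumPy sketch follows (OpenCV's `cv2.dilate` would be the usual choice in practice):

```python
import numpy as np

def dilate(mask, r):
    """Expand a binary mask by r pixels in every direction (square kernel)."""
    h, w = mask.shape
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

original = np.zeros((7, 7), dtype=np.uint8)
original[3, 3] = 1               # a one-pixel original mask area
target = dilate(original, 2)     # expanded by 2 pixels -> a 5x5 block
```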
  • each original mask frame is processed until the target mask sample sequence is obtained.
  • the embodiment of the present application provides a way to generate a target mask frame based on a single video frame.
  • The original mask area in the original mask frame corresponding to the video frame is expanded to obtain the target mask frame corresponding to the video frame, which may include:
  • an XOR operation is performed on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
  • a way of expanding the original mask area is introduced. It can be known from the foregoing embodiments that for each original mask frame in the original mask frame sequence, the original mask area can also be expanded to obtain a target mask area. In this way, the target mask frame containing the target mask area is obtained.
  • Figure 6 is another schematic diagram of generating a target mask frame in an embodiment of the present application. Taking the original mask frame shown in (a) of Figure 6 as an example, the 15 pixels marked "1" constitute the original mask area.
  • The original mask area is expanded by a first number of pixels (for example, 2 pixels) to obtain the first mask area (the gray area composed of pixels marked "1"), i.e., the mask frame shown in (b) of Figure 6.
  • The original mask area is expanded by a second number of pixels (for example, 4 pixels) to obtain the second mask area (the gray area composed of pixels marked "1"), i.e., the mask frame shown in (c) of Figure 6.
  • An XOR operation is performed on the first mask area and the second mask area to obtain the target mask frame shown in (d) of Figure 6, where the target mask frame includes the target mask area (the gray area composed of pixels marked "1").
  • each original mask frame is processed until the target mask sample sequence is obtained.
  • m_dst = m_da ⊕ m_db, where:
  • m_dst represents the t-th target mask frame;
  • m_da represents the mask frame including the first mask area;
  • a represents the first number of pixels;
  • m_db represents the mask frame including the second mask area;
  • b represents the second number of pixels;
  • "⊕" represents the XOR operator.
  • the embodiment of the present application provides a way to expand the original mask area.
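The expansion steps (dilate the original mask area by a pixels, dilate it by b pixels, then XOR the two) can be sketched as follows; the result is a ring-shaped target mask area that excludes the pixels closest to the original mask:

```python
import numpy as np

def dilate(mask, r):
    """Expand a binary mask by r pixels (square neighbourhood)."""
    h, w = mask.shape
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

m = np.zeros((9, 9), dtype=np.uint8)
m[4, 4] = 1                 # original mask area
m_da = dilate(m, 1)         # first mask area  (a = 1 pixel)
m_db = dilate(m, 2)         # second mask area (b = 2 pixels)
m_dst = m_da ^ m_db         # target mask area: the ring between the two
```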
  • In general, the optical flow inside the original mask area is filled in from the surrounding optical flow; if the surrounding optical flow is chaotic, the optical flow inside the original mask area cannot be filled well. Considering that pixels close to the original mask area may contain some noise, the target mask area, which is offset outward from the original mask area, contains less noise, which helps improve the judgment of optical flow quality.
  • obtaining the target mask sample sequence according to the video sample sequence may specifically include:
  • the corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
  • a method of generating a target mask frame based on multiple video frames is introduced. It can be known from the foregoing embodiments that the target object is displayed in the video to be repaired. For this purpose, the target object needs to be masked to obtain the corresponding original mask area. Then the original mask area is expanded according to a certain number of pixels to obtain the target mask area.
  • the target object can be a logo, subtitle, object, etc. It can be understood that the method of identifying the target object includes but is not limited to manual annotation and model recognition. For example, FCN is used to identify the target object.
  • In the first original mask frames, m_r1 corresponds to (x1, x2), m_r11 corresponds to (x11, x12), and so on; in the second original mask frames, m_r2 corresponds to (x1, x2), m_r12 corresponds to (x11, x12), and so on.
  • Expressed in terms of the size-normalized frames: in the first original mask frames, m_r1 corresponds to (x_r1, x_r2), m_r11 corresponds to (x_r11, x_r12), and so on; in the second original mask frames, m_r2 corresponds to (x_r1, x_r2), m_r12 corresponds to (x_r11, x_r12), and so on.
  • Figure 7 is another schematic diagram of generating a target mask frame in an embodiment of the present application.
  • (a) of Figure 7 illustrates the first original mask frame, in which the 13 pixels marked "1" constitute its original mask area.
  • (b) of Figure 7 illustrates the second original mask frame, in which the 13 pixels marked "1" constitute its original mask area.
  • After the two original mask areas are combined, the original mask frame shown in (c) of Figure 7 is obtained, in which the 15 pixels marked "1" constitute the original mask area of this frame.
  • Assume the original mask area is expanded by 2 pixels to obtain the target mask area (the gray area composed of pixels marked "1"); based on this, the target mask frame shown in (d) of Figure 7 is obtained.
  • each original mask frame is processed until the target mask sample sequence is obtained.
  • Embodiments of the present application provide a method of generating target mask frames based on multiple video frames. This method takes into account that the original mask areas of the two frames in a video frame pair may differ, so the original mask areas of the front and back frames are first combined to obtain a more accurate original mask area, improving the processing effect for video frames.
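The combination of the two frames' original mask areas described above can be sketched as a logical OR (the array contents are illustrative):

```python
import numpy as np

m_front = np.zeros((5, 5), dtype=np.uint8)
m_front[1, 1] = 1               # mask area of the front frame of the pair
m_back = np.zeros((5, 5), dtype=np.uint8)
m_back[1, 2] = 1                # mask area of the back frame (slightly shifted)
m_combined = m_front | m_back   # union covers the object in both frames
```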
  • The original mask area in the original mask frame corresponding to the video frame pair is expanded to obtain the target mask frame corresponding to the video frame pair, which may include:
  • an XOR operation is performed on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
  • a way of expanding the original mask area is introduced. It can be known from the foregoing embodiments that for each original mask frame in the original mask frame sequence, the original mask area can also be expanded to obtain a target mask area. In this way, the target mask frame containing the target mask area is obtained.
  • Figure 8 is another schematic diagram of generating a target mask frame in an embodiment of the present application.
  • Figure 8(a) illustrates the first original mask frame, in which the 13 pixels marked "1" constitute the original mask area of the first original mask frame.
  • Figure 8(b) illustrates the second original mask frame, in which the 13 pixels marked "1" constitute the original mask area of the second original mask frame.
  • Combining the two original mask areas yields the original mask frame shown in Figure 8(c), in which the 15 pixels marked "1" constitute the original mask area of this original mask frame.
  • The original mask area is expanded by the first number of pixels (for example, 2 pixels) to obtain the first mask area (i.e., the gray area composed of the pixels marked "1"), yielding the mask frame shown in Figure 8(d).
  • The original mask area is expanded by the second number of pixels (for example, 4 pixels) to obtain the second mask area (i.e., the gray area composed of the pixels marked "1"), yielding the mask frame shown in Figure 8(e).
  • An XOR operation is performed on the first mask area and the second mask area to obtain the target mask frame shown in Figure 8(f), where the target mask frame includes the target mask area (i.e., the gray area composed of the pixels marked "1").
  • each original mask frame is processed until the target mask sample sequence is obtained.
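For illustration, the expand-and-XOR procedure described above can be sketched in Python. This is an illustrative sketch, not part of the embodiment: the function names (`dilate`, `target_mask`) are our own, the masks are assumed to be 2D lists of 0/1 values, and a simple Chebyshev-distance (8-neighbour) dilation stands in for whatever morphological expansion the embodiment uses.

```python
def dilate(mask, r):
    """Expand the mask area by r pixels (Chebyshev / 8-neighbour dilation)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel turns on if any original mask pixel lies within r steps.
            if any(mask[ny][nx]
                   for ny in range(max(0, y - r), min(h, y + r + 1))
                   for nx in range(max(0, x - r), min(w, x + r + 1))):
                out[y][x] = 1
    return out

def target_mask(m1, m2, a=2, b=4):
    """Combine the two original masks, dilate by a and b pixels, then XOR."""
    union = [[p | q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
    m_da, m_db = dilate(union, a), dilate(union, b)
    # XOR keeps only the ring between the two dilations (the target mask area),
    # which lies away from the original mask area and so contains less noise.
    return [[p ^ q for p, q in zip(r1, r2)] for r1, r2 in zip(m_da, m_db)]
```

The XOR of the two dilations produces a ring of pixels around the original mask area, matching the gray target mask area shown in Figure 8(f).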
  • For example, the target mask frame can be calculated as follows: m_dst = m_da ⊕ m_db;
  • where m_dst represents the t-th target mask frame;
  • m_da represents the mask frame including the first mask area, obtained by expanding the original mask area by a pixels;
  • a represents the first number of pixels;
  • m_db represents the mask frame including the second mask area, obtained by expanding the original mask area by b pixels;
  • b represents the second number of pixels;
  • "⊕" represents the XOR operator.
  • the embodiment of the present application provides a way to expand the original mask area.
  • during inpainting, the optical flow inside the original mask area is filled in from the surrounding optical flow. If the surrounding optical flow is relatively chaotic, the optical flow inside the original mask area cannot be filled well. Considering that pixels close to the original mask area may contain some noise, the target mask area, which is offset away from the original mask area, contains less noise, which is beneficial to improving the accuracy of the optical flow quality judgment.
  • obtaining the optical flow data sequence according to the video sample sequence may specifically include:
  • optical flow data corresponding to each of the K video pairs is used as an optical flow data sequence
  • Obtaining the optical flow data sequence based on the video sample sequence may include:
  • optical flow data corresponding to each of the K video pairs is used as an optical flow data sequence.
  • Figure 9 is a schematic diagram of determining a two-dimensional optical flow value based on forward optical flow in an embodiment of the present application.
  • in the previous video frame, the coordinates of a pixel are (3,4).
  • in the next video frame, the coordinates of the same pixel are (4,5).
  • therefore, the horizontal offset of this pixel from the previous video frame to the next video frame is 1 (i.e., 4-3), and the vertical offset is 1 (i.e., 5-4). It can be seen that the two-dimensional optical flow value of this pixel is (1,1).
  • Figure 10 is a schematic diagram of determining a two-dimensional optical flow value based on backward optical flow in an embodiment of the present application.
  • in the previous video frame, the coordinates of a pixel are (1,3).
  • in the next video frame, the coordinates of the same pixel are (4,5).
  • therefore, the horizontal offset of this pixel from the next video frame to the previous video frame is -4 (i.e., 1-4), and the vertical offset is -2 (i.e., 3-5). It can be seen that the two-dimensional optical flow value of this pixel is (-4,-2).
  • the embodiments of this application provide two ways of determining optical flow data based on video frame pairs. Through the above methods, optical flow data can be generated based on either forward optical flow or backward optical flow, improving the flexibility of the solution.
  • obtaining the optical flow clustering results may include:
  • for each target mask frame, determining the two-dimensional optical flow values of X pixels in the target mask area of the target mask frame based on the optical flow data corresponding to the target mask frame in the optical flow data sequence, where the target mask frame and its corresponding optical flow data correspond to the same video frame pair, and X is an integer greater than 1;
  • for each target mask frame, clustering the X pixels according to the two-dimensional optical flow values of the X pixels in the target mask area to obtain the optical flow clustering result of the target mask frame.
  • a method of clustering pixels in a target mask area is introduced.
  • the target mask sample sequence includes K target mask frames, and it is necessary to perform optical flow clustering on the pixels in the target mask area in each target mask frame. It is understandable that in actual situations, the number of pixels included in the target mask area may be large. Therefore, the pixels in the target mask area may also be randomly sampled in advance to obtain X pixels. Among them, X is an integer greater than 1. For example, X can be set to 15000.
  • the optical flow clustering result of each target mask frame includes the category label assigned to each pixel after clustering. Pixels with a category label of "0" are noise pixels and need to be eliminated. After elimination, the total number of categories corresponding to the target mask frame is obtained. Taking the t-th target mask frame as an example, the corresponding total number of categories can be expressed as C_t, that is, there are C_t clusters, and the c-th cluster includes N_ct pixels.
  • the embodiment of the present application provides a way to cluster pixels in the target mask area.
  • the DBSCAN algorithm can be used to cluster pixels.
  • adaptive clustering can be achieved without setting the number of categories in advance.
  • the DBSCAN algorithm can better judge outliers and can find clusters of arbitrary shapes.
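To make the clustering step concrete, the following is a compact pure-Python sketch of DBSCAN over two-dimensional optical flow values. It is an assumption-laden illustration: the `eps` and `min_pts` values are arbitrary, and a production system would use a library implementation; the only deliberate match to the embodiment is that noise pixels receive the label 0, to be eliminated before counting categories.

```python
def dbscan(points, eps=0.5, min_pts=4):
    """Minimal DBSCAN; returns a label per point, 0 = noise, 1.. = clusters."""
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = 0  # noise: category label "0", eliminated later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster  # border point reclaimed by a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels
```

Because DBSCAN grows clusters from density-connected points, the number of categories C_t emerges adaptively, matching the advantages noted above (no preset category count, outlier handling, arbitrary cluster shapes).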
  • determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
  • if the category single ratio is greater than or equal to the proportional threshold, the optical flow quality score is determined to be the first score;
  • if the category single ratio is less than the proportional threshold, the optical flow quality score is determined to be the second score.
  • a method of determining the optical flow quality score based on a category single ratio is provided.
  • the optical flow clustering result of each target mask frame includes the category label corresponding to each pixel after clustering. By eliminating pixels with a category label of "0", the total number of categories corresponding to the target mask frame can be obtained.
  • For example, the following method can be used to calculate the category single ratio based on the optical flow clustering results: CR = (1/K) × Σ_{t=1}^{K} 1(C_t ≤ 1);
  • where CR represents the category single ratio;
  • t represents the frame number of the target mask frame;
  • K represents the total number of target mask frames;
  • c represents the category label;
  • C_t represents the total number of categories in the t-th target mask frame;
  • i represents the pixel number;
  • N_ct represents the number of pixels corresponding to the c-th category label in the t-th target mask frame.
  • That is, the proportion of frames among the K target mask frames in which the total number of categories is less than or equal to the category number threshold (for example, 1) is calculated, thereby obtaining the category single ratio.
  • For example, the criterion for determining optical flow quality can be defined as: Q = 1 if CR ≥ CR_threshold, and Q = 0 otherwise;
  • where Q represents the optical flow quality score;
  • CR represents the category single ratio;
  • CR_threshold represents the proportional threshold.
  • For example, the proportional threshold can be set to 0.8, or other reasonable values, which are not limited here.
  • the embodiment of the present application provides a way to determine the optical flow quality score based on a single proportion of categories.
  • the above method takes into account that the larger the category single ratio, the smaller the total number of categories and the more stable the video's optical flow. Therefore, the category single ratio is used to filter out videos with disturbed optical flow and serves as a basis for judging optical flow quality, improving the feasibility and operability of the solution.
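Under the formulas above, the category single ratio and the resulting score can be sketched as follows, taking the first score as 1 and the second score as 0 as in the document's own example; the function names are ours and the thresholds are the example values (1 and 0.8):

```python
def category_single_ratio(total_categories, cat_threshold=1):
    """CR: fraction of the K frames whose category count C_t <= threshold."""
    k = len(total_categories)
    return sum(1 for c_t in total_categories if c_t <= cat_threshold) / k

def quality_by_cr(total_categories, cr_threshold=0.8):
    """Q = 1 (first score) when CR >= CR_threshold, else 0 (second score)."""
    return 1 if category_single_ratio(total_categories) >= cr_threshold else 0

# e.g. 9 of 10 frames collapse to a single cluster -> CR = 0.9 >= 0.8 -> Q = 1
print(quality_by_cr([1, 1, 1, 1, 1, 2, 1, 1, 1, 1]))  # 1
```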
  • alternatively, determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
  • for the optical flow clustering result of each target mask frame, determining the moving average of each cluster based on the two-dimensional optical flow values of the pixels in the cluster, where the optical flow clustering result is used to characterize one or more clusters;
  • accumulating the moving average of each target mask frame to obtain the total moving distance;
  • if the total moving distance is greater than or equal to the distance threshold, determining the optical flow quality score to be the first score;
  • if the total moving distance is less than the distance threshold, determining the optical flow quality score to be the second score.
  • a method for determining the optical flow quality score based on the total distance moved is introduced.
  • the optical flow clustering result of each target mask frame includes the category label corresponding to each pixel after clustering. By eliminating pixels with a category label of "0", the total number of categories corresponding to the target mask frame can be obtained.
  • For example, the following method can be used to calculate the total moving distance accumulated over the K target mask frames: D = Σ_{t=1}^{K} D_t;
  • where D represents the total moving distance;
  • D_t represents the moving average of the t-th target mask frame;
  • t represents the frame number of the target mask frame, and K represents the total number of target mask frames.
  • The moving average of a target mask frame can be calculated as follows: D_t = (1/C_t) × Σ_{c=1}^{C_t} D_tc;
  • where D_t represents the moving average of the t-th target mask frame;
  • D_tc represents the moving average of the c-th cluster in the t-th target mask frame;
  • c represents the category label;
  • C_t represents the total number of categories in the t-th target mask frame.
  • The moving average of a cluster can be calculated as follows: D_tc = (1/N_ct) × Σ_{i=1}^{N_ct} ‖v_i‖;
  • where i represents the pixel number;
  • N_ct represents the number of pixels corresponding to the c-th category label in the t-th target mask frame;
  • v_i represents the two-dimensional optical flow value of the i-th pixel, and ‖·‖ represents the Euclidean distance.
  • For example, the criterion for determining optical flow quality can be defined as: Q = 1 if D ≥ D_threshold, and Q = 0 otherwise;
  • where Q represents the optical flow quality score;
  • D represents the total moving distance;
  • D_threshold represents the distance threshold.
  • For example, the distance threshold can be set to 4, or other reasonable values, which are not limited here.
  • the embodiment of the present application provides a way to determine the optical flow quality score based on the total distance moved.
  • the above method takes into account that the larger the total moving distance, the more obvious the inter-frame motion, which is beneficial to optical flow estimation. Therefore, the total moving distance is used to filter out relatively static videos and serves as a basis for judging optical flow quality, improving the feasibility and operability of the solution.
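The distance-based score above can be sketched as follows. This is an illustrative sketch: the data layout (one dict per frame mapping category labels to lists of 2D flow values, with the noise label 0 already removed) and the function names are assumptions, and the example threshold of 4 is taken from the text.

```python
from math import hypot

def total_moving_distance(frames):
    """frames: per target mask frame, a dict {category_label: [(u, v), ...]}
    of 2-D optical flow values per cluster (noise label 0 already removed)."""
    total = 0.0
    for clusters in frames:
        if not clusters:
            continue
        # D_tc: mean Euclidean norm of the flow vectors in cluster c
        d_tc = [sum(hypot(u, v) for u, v in pts) / len(pts)
                for pts in clusters.values()]
        # D_t: mean over the C_t clusters, accumulated into D
        total += sum(d_tc) / len(d_tc)
    return total

def quality_by_distance(frames, d_threshold=4.0):
    """Q = 1 (first score) when D >= D_threshold, else 0 (second score)."""
    return 1 if total_moving_distance(frames) >= d_threshold else 0
```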
  • alternatively, determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
  • for the optical flow clustering result of each target mask frame, determining the moving average of each cluster based on the two-dimensional optical flow values of the pixels in the cluster, where the optical flow clustering result is used to characterize one or more clusters;
  • for the optical flow clustering result of each target mask frame, determining the moving average of the target mask frame based on the moving average of each cluster;
  • accumulating the moving average of each target mask frame to obtain the total moving distance;
  • if the category single ratio is greater than or equal to the proportional threshold and the total moving distance is greater than or equal to the distance threshold, determining the optical flow quality score to be the first score;
  • otherwise, determining the optical flow quality score to be the second score.
  • a method of jointly determining the optical flow quality score based on a single proportion of a category and the total distance moved is introduced.
  • the proportion of frames in which the total number of categories is less than or equal to the category number threshold (for example, 1) among the K target mask frames can be counted, that is, a single category proportion is obtained.
  • the total moving distance of K target mask frames can be calculated.
  • For example, the criterion for determining optical flow quality can be defined as: Q = 1 if D ≥ D_threshold and CR ≥ CR_threshold, and Q = 0 otherwise;
  • where D represents the total moving distance;
  • D_threshold represents the distance threshold, which can be set to 4, or other reasonable values, which are not limited here;
  • CR represents the category single ratio;
  • CR_threshold represents the proportional threshold, which can be set to 0.8, or other reasonable values, which are not limited here.
  • the embodiment of the present application provides a way to jointly determine the optical flow quality score based on a single proportion of categories and the total distance moved.
  • the single ratio of the category can be used to filter out the video whose optical flow is disturbed.
  • the total distance of movement can be used to filter out the relatively static video. Therefore, the combination of the two can reflect the optical flow quality more comprehensively and accurately, thus improving the reliability of the optical flow quality score.
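A self-contained sketch of the joint criterion follows. Combining the two thresholds with a logical AND is our reading of "the combination of the two"; the data layout and function name are assumptions, and the thresholds are the example values from the text.

```python
from math import hypot

def joint_quality(total_categories, frames, cr_threshold=0.8, d_threshold=4.0):
    """frames: per target mask frame, {category_label: [(u, v), ...]}."""
    k = len(total_categories)
    # category single ratio: fraction of frames with at most one category
    cr = sum(1 for c_t in total_categories if c_t <= 1) / k
    d = 0.0  # total moving distance accumulated over the K frames
    for clusters in frames:
        if clusters:
            d_tc = [sum(hypot(u, v) for u, v in pts) / len(pts)
                    for pts in clusters.values()]
            d += sum(d_tc) / len(d_tc)
    # first score (1) only when the flow is both stable and clearly moving
    return 1 if cr >= cr_threshold and d >= d_threshold else 0
```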
  • using a video repair method that matches the optical flow quality score to repair the video to be repaired may specifically include:
  • if the optical flow quality score is the first score, the optical flow method is used to repair the video to be repaired;
  • if the optical flow quality score is the second score, the neural network is called to repair the video to be repaired.
  • the optical flow quality score can be a first score or a second score.
  • the following uses the first score as "1" and the second score as "0" as an example for introduction.
  • For example, the video repair method can be selected as follows: y = F_1(x, m) if Q = 1, and y = F_2(x, m) if Q = 0;
  • where F_1(x, m) indicates that the optical flow method is used for video repair processing;
  • F_2(x, m) indicates that the neural network is called for video repair processing;
  • Q represents the optical flow quality score.
  • the goal of video repair is that the repaired video sequence differs from the video to be repaired only in the original mask area, and that the repaired video sequence is natural and consistent in time and space. Since naturalness and consistency are difficult to define with a formula, when training the neural network, the filled video sequence is expected to be close to the real video sequence y_gt.
  • y gt represents the true value of the video sequence without the original mask area.
  • embodiments of this application provide a method for video repair based on optical flow quality scores.
  • when the optical flow quality is poor, the model method is used to fill the content, thereby avoiding erroneous filling caused by inaccurate optical flow estimation and achieving an overall more stable filling effect.
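The selection between F_1 and F_2 amounts to a simple dispatch on the quality score. A minimal sketch, with hypothetical stand-ins for the two repair methods (the function names and callables are ours, not the embodiment's API):

```python
def repair_video(video, mask, quality_score, flow_repair, model_repair):
    """Dispatch to F1 (optical flow method) when Q is the first score (1),
    otherwise to F2 (the neural network / model method)."""
    if quality_score == 1:   # good optical flow -> propagate known pixels
        return flow_repair(video, mask)
    return model_repair(video, mask)  # poor optical flow -> model filling

# Hypothetical stand-ins for F1 / F2:
result = repair_video("video", "mask", 0,
                      flow_repair=lambda v, m: "flow-filled",
                      model_repair=lambda v, m: "model-filled")
print(result)  # model-filled
```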
  • in another optional embodiment provided by the embodiments of this application, before the video sample sequence is acquired, the video to be repaired and a repair object list may be displayed, and the step of acquiring the video sample sequence for the video to be repaired is performed in response to a selection operation on a target object.
  • after the video to be repaired is repaired using a video repair method that matches the optical flow quality score, the repaired video may also be played in response to a playback operation on the repaired video.
  • a method of intelligently repairing videos is introduced.
  • the present application can be applied to various video repair tasks, such as removing logos, removing subtitles, removing objects, etc. If the user wants to use videos from certain platforms, but the videos carry the logo of that platform, which affects the look and feel, they can use a video repair application to remove the logo. Similarly, users can erase subtitles from some videos, or remove certain moving objects from videos. The following will be introduced separately with the illustrations.
  • Figure 11 is a schematic diagram of the effect of removing a logo based on a video repair application in an embodiment of the present application.
  • the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application.
  • the list of repairable objects shows that there is at least one repairable object (e.g. logo, subtitles, boats, clouds, etc.).
  • the user selects the "one-click removal" control corresponding to the "logo", thereby triggering a selection operation on the target object (ie, the logo).
  • the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video, in which the logo is no longer present.
  • the repaired video can be played when the user triggers the playback action on the repaired video.
  • Figure 12 is a schematic diagram of the effect of removing subtitles based on a video repair application in an embodiment of the present application.
  • the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application.
  • the repair object list shows that there is at least one repairable object (for example, a sign, a subtitle, a boat, a cloud, etc.).
  • the user selects the "one-click removal" control corresponding to "subtitles”, thereby triggering a selection operation on the target object (ie, subtitles).
  • the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video. There are no subtitles in the repaired video.
  • the repaired video can be played when the user triggers the play command for the repaired video.
  • Figure 13 is a schematic diagram of the effect of removing objects based on a video repair application in an embodiment of the present application.
  • the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application.
  • the repair object list shows that there is at least one repairable object (for example, a sign, a subtitle, a boat, a cloud, etc.).
  • the user selects the "one-click removal" control corresponding to "boat”, thereby triggering a selection operation on the target object (ie, boat).
  • the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video, in which the boat is no longer present.
  • the repaired video can be played when the user triggers the play command for the repaired video.
  • embodiments of this application provide a method of intelligently repairing videos.
  • users can use the video repair application to choose to repair one or more objects in the video to achieve the purpose of intelligent repair. This not only improves the practicality of the solution, but also improves the efficiency of video repair.
  • Figure 14 is a schematic diagram comparing the effects of video frame repair based on the optical flow method and the model method in an embodiment of the present application. As shown in the figure, in one example, Figure 14(a) shows the effect of filling based on the optical flow method, and Figure 14(b) shows the effect of filling based on the model method.
  • in this example, the original mask area is located in the lower left corner of the video frame (i.e., the area enclosed by the rectangular frame).
  • since the lens movement is smooth and the optical flow estimation is good, this application chooses to use the optical flow method for filling.
  • in another example, Figure 14(c) shows the effect of filling based on the optical flow method,
  • and Figure 14(d) shows the effect of filling based on the model method.
  • in this example, the original mask area is located in the lower left corner of the video frame (i.e., the area enclosed by the rectangular frame). Since the optical flow is affected by the character's watch, this application chooses to use the model method for filling.
  • Figure 15 is a schematic diagram of a video repair device in the embodiment of the present application.
  • the video repair device 20 includes:
  • the acquisition module 210 is used to acquire a video sample sequence corresponding to the video to be repaired, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
  • the acquisition module 210 is also used to obtain a target mask sample sequence according to the video sample sequence, wherein the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask area obtained by expanding the original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
  • the acquisition module 210 is also used to obtain an optical flow data sequence according to the video sample sequence, where the optical flow data sequence includes K optical flow data, and there is a one-to-one correspondence between the K optical flow data and the K video frame pairs;
  • the processing module 220 is configured to perform clustering processing on the pixels included in the target mask area of each target mask frame based on each optical flow data in the optical flow data sequence, to obtain the optical flow clustering result of each target mask frame;
  • the determination module 230 is used to determine the optical flow quality score according to the optical flow clustering result of each target mask frame
  • the repair module 240 is used to repair the video to be repaired using a video repair method that matches the optical flow quality score.
  • the acquisition module 210 is specifically used to acquire a video sequence from the video to be repaired, where the video sequence includes T original video frames, each original video frame displays a target object, and T is an integer greater than 1;
  • the size of each original video frame in the K video frame pairs to be processed is normalized respectively to obtain K video frame pairs, and the K video frame pairs are used as a video sample sequence.
  • the acquisition module 210 is specifically configured to obtain, for each video frame pair in the video sample sequence, an original mask frame corresponding to the video frame pair according to any video frame in the video frame pair, where the original mask frame includes a pair of The original mask area obtained after masking the target object in any video frame;
  • for each video frame pair in the video sample sequence, the original mask area in the original mask frame corresponding to the video frame pair is expanded to obtain the target mask frame corresponding to the video frame pair;
  • the corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
  • the acquisition module 210 is specifically configured to, for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair according to the first number of pixels to obtain the first mask area corresponding to the video frame pair, and expand the original mask area according to the second number of pixels to obtain the second mask area corresponding to the video frame pair;
  • an XOR operation is performed on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
  • the acquisition module 210 is specifically configured to, for each video frame pair in the video sample sequence, obtain the first original mask frame corresponding to the video frame pair according to the previous video frame in the video frame pair, and obtain the second original mask frame corresponding to the video frame pair according to the next video frame in the video frame pair, where the first original mask frame and the second original mask frame respectively include the original mask areas obtained after masking the target object in the previous video frame and the next video frame;
  • the corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
  • the acquisition module 210 is specifically configured to, for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair according to the first number of pixels to obtain the first mask area corresponding to the video frame pair, and expand the original mask area according to the second number of pixels to obtain the second mask area corresponding to the video frame pair;
  • an XOR operation is performed on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
  • the acquisition module 210 is specifically used, for each video frame pair in the video sample sequence, to determine the optical flow data corresponding to the video frame pair according to the horizontal offset and vertical offset of each pixel in the next video frame relative to the corresponding pixel in the previous video frame;
  • optical flow data corresponding to each of the K video frame pairs is used as an optical flow data sequence
  • the acquisition module 210 is specifically used, for each video frame pair in the video sample sequence, to determine the optical flow data corresponding to the video frame pair according to the horizontal offset and vertical offset of each pixel in the previous video frame relative to the corresponding pixel in the next video frame;
  • optical flow data corresponding to each of the K video frame pairs is used as an optical flow data sequence.
  • the processing module 220 is specifically configured to, for each target mask frame, determine the two-dimensional optical flow values of X pixels in the target mask area of the target mask frame according to the optical flow data corresponding to the target mask frame in the optical flow data sequence, where the optical flow data corresponding to the target mask frame and the target mask frame correspond to the same video frame pair, and X is an integer greater than 1;
  • for each target mask frame, the X pixels are clustered according to the two-dimensional optical flow values of the X pixels in the target mask area to obtain the optical flow clustering result of the target mask frame.
  • the determination module 230 is specifically configured to determine the total number of categories of each target mask frame based on the optical flow clustering result of each target mask frame;
  • if the category single ratio calculated from the total numbers of categories is greater than or equal to the proportional threshold, the optical flow quality score is determined to be the first score;
  • if the category single ratio is less than the proportional threshold, the optical flow quality score is determined to be the second score.
  • the determination module 230 is specifically configured to, based on the optical flow clustering result of each target mask frame, determine the moving average of each cluster according to the two-dimensional optical flow values of the pixels in the cluster, where the optical flow clustering result is used to characterize one or more clusters;
  • the moving average of each target mask frame is accumulated to obtain the total moving distance;
  • if the total moving distance is greater than or equal to the distance threshold, the optical flow quality score is determined to be the first score;
  • if the total moving distance is less than the distance threshold, the optical flow quality score is determined to be the second score.
  • the determination module 230 is specifically configured to determine the total number of categories of each target mask frame according to the optical flow clustering results;
  • the moving average of each cluster is determined based on the two-dimensional optical flow values of the pixels in the cluster, where the optical flow clustering result is used to characterize one or more clusters;
  • for the optical flow clustering result of each target mask frame, the moving average of the target mask frame is determined based on the moving average of each cluster;
  • the moving average of each target mask frame is accumulated to obtain the total moving distance;
  • if the category single ratio is greater than or equal to the proportional threshold and the total moving distance is greater than or equal to the distance threshold, the optical flow quality score is determined to be the first score;
  • otherwise, the optical flow quality score is determined to be the second score.
  • the repair module 240 is specifically configured to use the optical flow method to repair the video to be repaired if the optical flow quality score is the first score.
  • if the optical flow quality score is the second score, the neural network is called to repair the video to be repaired.
  • the video repair device 20 further includes a display module 250;
  • the display module 250 is used to display the video to be repaired and a repair object list, where the repair object list includes at least one repairable object;
  • the acquisition module 210 is also configured to respond to the selection operation on the target object, and execute the step of acquiring the video sample sequence for the video to be repaired, wherein the target object belongs to at least one repairable object;
  • the display module 250 is also configured to, after the video to be repaired is repaired using a video repair method that matches the optical flow quality score, play the repaired video in response to a playback operation on the repaired video.
  • the embodiment of the present application also provides a terminal, as shown in Figure 16.
  • a terminal for convenience of explanation, only the parts related to the embodiment of the present application are shown. If the specific technical details are not disclosed, please refer to the method part of the embodiment of the present application.
  • the terminal is a mobile phone as an example for explanation:
  • FIG. 16 shows a block diagram of a partial structure of a mobile phone related to the terminal provided by the embodiment of the present application.
  • the mobile phone includes: a radio frequency (RF) circuit 310, a memory 320, an input unit 330 (which includes a touch panel 331 and other input devices 332), a display unit 340 (which includes a display panel 341), a sensor 350, an audio circuit 360 (which is connected to a speaker 361 and a microphone 362), a wireless fidelity (WiFi) module 370, a processor 380, a power supply 390, and other components.
  • the structure shown in FIG. 16 does not limit the mobile phone, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the memory 320 can be used to store software programs and modules, and the processor 380 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 320 .
  • the memory 320 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data created based on the use of the mobile phone (such as audio data, phone books, etc.), and the like.
  • memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • the processor 380 is the control center of the mobile phone, using various interfaces and lines to connect the various parts of the entire mobile phone; by running or executing the software programs and/or modules stored in the memory 320 and calling the data stored in the memory 320, it performs the various functions of the mobile phone and processes data.
  • the processor 380 may include one or more processing units; optionally, the processor 380 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the above modem processor may alternatively not be integrated into the processor 380.
  • the steps performed by the terminal in the above embodiment may be based on the terminal structure shown in FIG. 16 .
  • FIG 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 400 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), memory 432, and one or more storage media 430 (for example, one or more mass storage devices) storing applications 442 or data 444.
  • the memory 432 and the storage medium 430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 422 may be configured to communicate with the storage medium 430 and execute a series of instruction operations in the storage medium 430 on the server 400 .
  • server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
  • the steps performed by the server in the above embodiment may be based on the server structure shown in FIG. 17 .
  • An embodiment of the present application also provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • when the processor executes the computer program, the steps of the methods described in the foregoing embodiments are implemented.
  • Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the methods described in the foregoing embodiments are implemented.
  • the embodiments of the present application also provide a computer program product, which includes a computer program. When the computer program is executed by a processor, the steps of the methods described in the foregoing embodiments are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a video inpainting method based on artificial intelligence, with application scenarios at least comprising different types of terminal, such as a mobile phone, a computer, and a vehicle-mounted terminal. The present application comprises: acquiring a video sample sequence; acquiring a target mask sample sequence according to the video sample sequence; acquiring an optical flow data sequence according to the video sample sequence; on the basis of each piece of optical flow data in the optical flow data sequence, performing clustering processing on pixel points comprised by a target mask region in each target mask frame, to obtain an optical flow clustering result for each target mask frame; determining an optical flow quality score according to the optical flow clustering result of each target mask frame; and performing inpainting processing on a video to be inpainted using a video inpainting mode matching the optical flow quality score. The present application also provides a related apparatus. According to the present application, the optical flow quality is used as a basis for selecting a video inpainting mode, enabling different video inpainting modes to complement each other, thereby helping to obtain a video picture having a better inpainting effect.

Description

A video inpainting method, related apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 2022103555942, entitled "A video inpainting method, related apparatus, device and storage medium" and filed with the China Patent Office on April 6, 2022, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of data processing technology, and in particular to video inpainting technology.
Background
Video inpainting is a task of filling missing regions in video frames with plausible content; it mainly restores masked regions by using information from the unmasked regions of the video. Examples include repairing damaged videos, removing unwanted objects, video retargeting, and restoring underexposed images.
At present, video inpainting techniques fall mainly into two categories. The first combines optical flow propagation with image inpainting: available pixels are first propagated to the corresponding regions via optical flow, and isolated pixel blocks are then filled by image inpainting. The second is an end-to-end neural network approach that fills occluded regions with a generative model.
However, the above video inpainting techniques have at least the following problems. Content filled based on optical flow has high definition, but the approach depends heavily on optical flow, which is easily disturbed and whose estimation may be inaccurate, so distortion and incorrect filling occur easily. The end-to-end neural network approach takes semantic information into account and usually avoids distortion and serious errors, but with complex backgrounds the filled content tends to be blurred.
Summary
Embodiments of this application provide a video inpainting method, related apparatus, device, and storage medium. This application uses optical flow quality as the basis for selecting a video inpainting mode, so that different video inpainting modes can complement each other, which helps to obtain video pictures with a better inpainting effect.
In view of this, one aspect of this application provides a video inpainting method, executed by a computer device, including:
obtaining a video sample sequence corresponding to a video to be inpainted, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
obtaining a target mask sample sequence according to the video sample sequence, where the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask region obtained by expanding an original mask region, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
obtaining an optical flow data sequence according to the video sample sequence, where the optical flow data sequence includes K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs;
clustering, based on each piece of optical flow data in the optical flow data sequence, the pixels included in the target mask region of each target mask frame, to obtain an optical flow clustering result for each target mask frame;
determining an optical flow quality score according to the optical flow clustering result of each target mask frame; and
inpainting the video to be inpainted using a video inpainting mode matching the optical flow quality score.
Another aspect of this application provides a video inpainting apparatus, including:
an acquisition module, configured to obtain a video sample sequence for a video to be inpainted, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
the acquisition module being further configured to obtain a target mask sample sequence according to the video sample sequence, where the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask region obtained by expanding an original mask region, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
the acquisition module being further configured to obtain an optical flow data sequence according to the video sample sequence, where the optical flow data sequence includes K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs;
a processing module, configured to cluster, based on each piece of optical flow data in the optical flow data sequence, the pixels included in the target mask region of each target mask frame, to obtain an optical flow clustering result for each target mask frame;
a determination module, configured to determine an optical flow quality score according to the optical flow clustering result of each target mask frame; and
an inpainting module, configured to inpaint the video to be inpainted using a video inpainting mode matching the optical flow quality score.
Another aspect of this application provides a computer device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the methods of the above aspects are implemented.
Another aspect of this application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the methods of the above aspects are implemented.
Another aspect of this application provides a computer program product, including a computer program, which implements the methods of the above aspects when executed by a processor.
It can be seen from the above technical solutions that the embodiments of this application have the following advantages:
The embodiments of this application provide a video inpainting method. First, a video sample sequence corresponding to the video to be inpainted is obtained, and a target mask sample sequence can then be obtained according to the video sample sequence, where each target mask frame includes a target mask region obtained by expanding an original mask region. Next, an optical flow data sequence is obtained according to the video sample sequence, and based on each piece of optical flow data in the optical flow data sequence, the pixels included in the target mask region of each target mask frame are clustered to obtain an optical flow clustering result for each target mask frame. On this basis, an optical flow quality score can be determined according to the optical flow clustering result of each target mask frame, and the video to be inpainted is inpainted using a video inpainting mode matching the score. In this way, the optical flow clustering results of the masked region are used to pre-judge the optical flow quality. When the optical flow quality is good, the optical flow method can be used as the inpainting mode to obtain filled content with higher definition and credibility; when the optical flow quality is poor, a generative model can be used to obtain a more stable filling effect. It can be seen that this application uses optical flow quality as the basis for selecting a video inpainting mode, so that different inpainting modes complement each other, which helps to obtain video pictures with a better inpainting effect.
Description of Drawings
FIG. 1 is a schematic architectural diagram of a video inpainting system in an embodiment of this application;
FIG. 2 is an effect diagram of video frame filling based on the optical flow method in an embodiment of this application;
FIG. 3 is an effect diagram of video frame filling based on the model method in an embodiment of this application;
FIG. 4 is a schematic flowchart of a video inpainting method in an embodiment of this application;
FIG. 5 is a schematic diagram of generating a target mask frame in an embodiment of this application;
FIG. 6 is another schematic diagram of generating a target mask frame in an embodiment of this application;
FIG. 7 is yet another schematic diagram of generating a target mask frame in an embodiment of this application;
FIG. 8 is still another schematic diagram of generating a target mask frame in an embodiment of this application;
FIG. 9 is a schematic diagram of determining a two-dimensional optical flow value based on forward optical flow in an embodiment of this application;
FIG. 10 is a schematic diagram of determining a two-dimensional optical flow value based on backward optical flow in an embodiment of this application;
FIG. 11 is a schematic diagram of the effect of removing a logo with a video inpainting application in an embodiment of this application;
FIG. 12 is a schematic diagram of the effect of removing subtitles with a video inpainting application in an embodiment of this application;
FIG. 13 is a schematic diagram of the effect of removing an object with a video inpainting application in an embodiment of this application;
FIG. 14 is a schematic diagram comparing the effects of video frame inpainting based on the optical flow method and the model method in an embodiment of this application;
FIG. 15 is a schematic diagram of a video inpainting apparatus in an embodiment of this application;
FIG. 16 is a schematic structural diagram of a terminal in an embodiment of this application;
FIG. 17 is a schematic structural diagram of a server in an embodiment of this application.
Detailed Description
With the advent of the multimedia and artificial intelligence (AI) era, video has gradually become the mainstream medium of information exchange, and the massive volume of video poses more challenges for video quality management. A video may be defective for various reasons: for example, a mosaic pattern in the picture degrades the user's viewing experience, or a station logo or advertising pattern may be introduced while the video is produced. Based on this, this application proposes a video inpainting method aimed at removing unwanted objects from a video, restoring damaged pictures, and the like.
The video inpainting method specifically involves AI-based computer vision (CV) technology and machine learning (ML): repairable objects (for example, station logos, subtitles, etc.) are identified from the video through CV technology, and the video picture is inpainted by a neural network trained through ML.
To improve the effect of video picture inpainting, this application proposes a video inpainting method applied to the video inpainting system shown in FIG. 1. As shown in the figure, the system includes a server and a terminal, and a client is deployed on the terminal; the client may run on the terminal in the form of a browser or of an independent application (APP), among others, and its specific presentation form is not limited here. The server involved in this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud services. The terminal may be a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, or the like, but is not limited thereto. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application; the numbers of servers and terminals are likewise not limited. The solution provided by this application may be completed by the terminal alone, by the server alone, or by the terminal and the server in cooperation, which is not specifically limited in this application.
The workflows of two video inpainting scenarios are introduced below with reference to the architecture shown in FIG. 1.
Exemplarily, in one case, a user uploads a video to the server through the terminal, and the server can directly invoke the video inpainting function: it first decides which video inpainting algorithm to use (that is, the optical flow method or the model method), then inpaints the video with that algorithm, and finally stores the inpainted video in a database. When the terminal requests the server to play the video, the server obtains the corresponding video from the database and returns it to the terminal.
Exemplarily, in another case, a user uploads a video to the server through the terminal, and the server stores the uploaded video in a database. When the video needs to be inpainted, the corresponding video is selected from the database and the video inpainting function is invoked: the video inpainting algorithm to use (that is, the optical flow method or the model method) is decided first, the video is inpainted with that algorithm, and the inpainted video is finally stored in the database.
There are certain differences between the inpainting effects of the optical flow method and the model method, which are introduced below with reference to the figures.
1. Filling the mask region based on the optical flow method
For ease of introduction, please refer to FIG. 2, which shows the effect of video frame filling based on the optical flow method in an embodiment of this application. As shown in part (a) of FIG. 2, a masked object is detected in the video frame; after filling with the optical flow method, the video frame shown in part (b) of FIG. 2 is obtained. It can be seen that, under object occlusion and complex background motion, the filling effect of the optical flow method is greatly affected, and erroneous pixels caused by optical flow estimation errors spread gradually as they propagate, resulting in incorrectly filled content.
2. Filling the mask region based on the model method
For ease of introduction, please refer to FIG. 3, which shows the effect of video frame filling based on the model method in an embodiment of this application. As shown in part (a) of FIG. 3, a masked object is detected in the video frame; after filling with the model method, the video frame shown in part (b) of FIG. 3 is obtained. It can be seen that the filled part is relatively blurred, and high-resolution input is difficult to process owing to video memory limitations, but the overall effect is relatively stable and obvious high-contrast errors rarely occur.
In view of the above, and limited by the inpainting quality of the optical flow method and the model method, this application proposes a video inpainting method that decides in advance which inpainting mode to use for picture restoration and then inpaints the video picture with the more suitable mode, thereby achieving a more robust filling effect. The video inpainting method of this application is introduced below; please refer to FIG. 4. The method in the embodiments of this application may be executed by a computer device, which may be a terminal or a server. The embodiments of this application include the following steps:
110. Obtain a video sample sequence corresponding to a video to be inpainted, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1.
In one or more embodiments, the computer device can obtain the video to be inpainted and extract K video frame pairs from it to form the video sample sequence; each video frame pair includes two adjacent video frames, and each video frame has a corresponding frame number. Exemplarily, if no normalization is performed, the video sample sequence can be expressed as xs = {(x1, x2), (x11, x12), …}, where the sequence includes K video frame pairs: the first pair is (x1, x2), the second pair is (x11, x12), and so on. Exemplarily, if normalization has been performed, the sequence is expressed as xsr = {(xr1, xr2), (xr11, xr12), …}, where the first pair is (xr1, xr2), the second pair is (xr11, xr12), and so on.
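The pair sampling above can be sketched as follows. The stride of 10 between sampled pairs simply mirrors the example indices (x1, x2), (x11, x12) and is an illustrative assumption, not a value fixed by this application:

```python
def sample_frame_pairs(num_frames, stride=10):
    """Return 1-based index pairs (i, i+1) of adjacent frames, one pair
    every `stride` frames, mirroring {(x1, x2), (x11, x12), ...}."""
    pairs = []
    for start in range(0, num_frames - 1, stride):
        pairs.append((start + 1, start + 2))
    return pairs
```

For a 15-frame clip this yields exactly the two pairs used in the running example, `[(1, 2), (11, 12)]`.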
120. Obtain a target mask sample sequence according to the video sample sequence, where the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask region obtained by expanding an original mask region, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs.
In one or more embodiments, after obtaining the video sample sequence, the computer device can obtain, for each video frame pair, at least one corresponding original mask frame. For each original mask frame, the corresponding original mask region is marked and then expanded by a certain number of pixels to obtain the target mask region, from which the target mask frame is obtained. A target mask sample sequence containing K target mask frames is thereby obtained.
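The expansion step amounts to morphological dilation of a binary mask. Below is a minimal pure-Python sketch; a production system would typically use an image-processing library, and the 4-connected neighbourhood and the expansion width are illustrative assumptions:

```python
def dilate_mask(mask, pixels=1):
    """Expand a binary mask (list of 0/1 rows) by `pixels`,
    growing one ring of 4-connected neighbours per iteration."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for _ in range(pixels):
        grown = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if out[y][x]:
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            grown[ny][nx] = 1
        out = grown
    return out
```

Dilating a single-pixel mask once grows it into a cross; a wider margin around the original mask region is obtained by increasing `pixels`.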
130. Obtain an optical flow data sequence according to the video sample sequence, where the optical flow data sequence includes K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs.
In one or more embodiments, the computer device can generate corresponding optical flow data for each of the K video frame pairs in the video sample sequence, thereby obtaining an optical flow data sequence including K pieces of optical flow data. The optical flow data can be expressed as a two-channel optical flow matrix: one channel records the horizontal offset of every pixel in the video frame pair, and the other records the vertical offset of every pixel.
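As a sketch of this data layout, the two channels can be held as separate matrices u (horizontal offsets) and v (vertical offsets); pixel (y, x) of the first frame then maps to (y + v[y][x], x + u[y][x]) in the second frame. The helper below only illustrates the representation and is not part of the application:

```python
def flow_displace(u, v, y, x):
    """Map pixel (y, x) of frame t to its estimated position in frame t+1,
    where u and v are the horizontal- and vertical-offset channels."""
    return y + v[y][x], x + u[y][x]
```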
It should be noted that, in practical applications, the computer device may perform step 120 before step 130, perform step 130 before step 120, or perform both simultaneously; the embodiments of this application place no limitation on the execution order of steps 120 and 130.
140. Based on each piece of optical flow data in the optical flow data sequence, cluster the pixels included in the target mask region of each target mask frame to obtain an optical flow clustering result for each target mask frame.
In one or more embodiments, the optical flow data sequence is aligned with the target mask sample sequence; that is, each piece of optical flow data in the optical flow data sequence corresponds to a target mask frame in the target mask sample sequence. Based on this, for each target mask frame, the corresponding optical flow data is used to assign a two-dimensional optical flow value to each pixel in the target mask region. Then, based on the two-dimensional optical flow values of these pixels, a clustering algorithm is applied to them, thereby obtaining the optical flow clustering result of each target mask frame.
It can be understood that this application may cluster the pixels with density-based spatial clustering of applications with noise (DBSCAN), with mean-shift clustering (meanshift), or with other clustering methods, which are not limited here.
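To illustrate the clustering step, the sketch below groups two-dimensional flow vectors by transitive epsilon-linking with union-find. This is a simplified stand-in for the DBSCAN or mean-shift methods named above, and the distance threshold `eps` is an assumed parameter:

```python
def cluster_flow_vectors(vectors, eps=1.0):
    """Label 2-D flow vectors: vectors within Euclidean distance `eps`
    are linked, and transitively connected groups share one label."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (ui, vi), (uj, vj) = vectors[i], vectors[j]
            if (ui - uj) ** 2 + (vi - vj) ** 2 <= eps * eps:
                parent[find(i)] = find(j)

    labels, roots = [], {}
    for i in range(n):
        r = find(i)
        labels.append(roots.setdefault(r, len(roots)))
    return labels
```

A mask whose pixels all move coherently yields a single dominant cluster, whereas scattered, inconsistent flow yields many small clusters — the property the next step exploits.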
150. Determine an optical flow quality score according to the optical flow clustering result of each target mask frame.
In one or more embodiments, the computer device can combine the optical flow clustering results of the individual target mask frames to judge the overall quality of the optical flow, thereby generating a corresponding optical flow quality score. Exemplarily, the optical flow quality score in this application is either a first score or a second score: the first score indicates that the optical flow quality is good (for example, the first score may be "1"), while the second score indicates that the optical flow quality is poor (for example, the second score may be "0").
It can be understood that, in practical applications, other values may also be set for the first score and the second score; the values here are merely illustrative and should not be understood as limiting this application.
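One way to map cluster labels to the binary score is to test whether a single cluster dominates the masked pixels. The dominance threshold below is a hypothetical choice for illustration, not a rule stated by this application:

```python
def flow_quality_score(labels, dominance=0.8):
    """Return the first score (1) if the largest cluster covers at least
    `dominance` of the masked pixels, else the second score (0)."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1 if max(counts.values()) / len(labels) >= dominance else 0
```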
160. Inpaint the video to be inpainted using a video inpainting mode matching the optical flow quality score.
In one or more embodiments, the computer device can select the corresponding video inpainting mode according to the optical flow quality score. That is, if the optical flow quality score is the first score, the optical flow method is used to inpaint the video to be inpainted; if the optical flow quality score is the second score, a neural network is invoked to inpaint the video to be inpainted.
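The selection rule reduces to a dispatch on the score; the two callables below stand in for the optical flow and neural network backends and are placeholders, not APIs defined by this application:

```python
def inpaint(video, score, flow_inpaint, model_inpaint):
    """Dispatch to the backend matching the optical flow quality score:
    first score (1) -> optical flow method, second score (0) -> neural network."""
    return flow_inpaint(video) if score == 1 else model_inpaint(video)
```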
具体地,采用光流法对待修复视频进行修复处理的过程主要包括:使用相邻帧进行光流估计,然后对各个帧中的原始掩膜区域进行光流填充,使用光流将未遮掩区域的像素梯度传播至原始掩膜区域。再对像素梯度进行泊松重建,生成红绿蓝(red green blue,RGB)像素。最后,对光流无法填充的区域进行图像修复。Specifically, the process of using the optical flow method to repair the video to be repaired mainly includes: using adjacent frames for optical flow estimation, then filling the original mask area in each frame with optical flow, and using optical flow to fill the unmasked area with optical flow. Pixel gradients are propagated to the original mask area. Then Poisson reconstruction is performed on the pixel gradient to generate red green blue (RGB) pixels. Finally, image inpainting is performed on areas that cannot be filled by optical flow.
调用神经网络对待修复视频进行修复处理的过程为,将接收帧序列信息作为输入,经过神经网络处理后输出修复好的视频帧。可以理解的是,神经网络多采用编码器-解码器结构。本申请采用的神经网络可以是细粒度图像分类(fine-grained visual categorization,FGVC)网络,或,时空变换器网络(spatial-temporal transformer network,STTN),或,解耦时空注意网络(decoupled spatial-temporal attention network,DSTT)等,此处不做限定。The process of calling the neural network to repair the video to be repaired is to receive the frame sequence information as input, and then output the repaired video frame after being processed by the neural network. It is understandable that neural networks mostly use an encoder-decoder structure. The neural network used in this application can be a fine-grained visual categorization (FGVC) network, or a spatial-temporal transformer network (STTN), or a decoupled spatial-temporal attention network (decoupled spatial- Temporal attention network, DSTT), etc., are not limited here.
本申请实施例提供了一种视频修复的方法。通过上述方式，利用被遮掩区域的光流聚类结果对光流质量进行预判，在光流质量较好的情况下，可使用光流法作为视频修复方式，以获得清晰度及可信度较高的填充内容。在光流质量较差的情况下，可使用生成模型作为视频修复方式，得到稳定性较高的填充效果。可见，本申请将光流质量作为选择视频修复方式的依据，达到使不同的视频修复方式彼此之间取长补短的目的，从而有利于获得修复效果更好的视频画面。The embodiment of this application provides a video repair method. In the above manner, the optical flow clustering result of the masked region is used to pre-judge the optical flow quality. When the optical flow quality is good, the optical flow method can be used as the video repair method to obtain filled content with higher clarity and credibility. When the optical flow quality is poor, a generative model can be used as the video repair method to obtain a more stable filling effect. It can be seen that this application uses the optical flow quality as the basis for selecting the video repair method, so that different video repair methods complement each other's strengths, which is conducive to obtaining video pictures with a better repair effect.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,获取待修复视频对应的视频样本序列,具体可以包括:Optionally, based on the above-mentioned embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiment of the present application, obtaining a video sample sequence corresponding to the video to be repaired may specifically include:
从待修复视频中获取视频序列,其中,视频序列包括T个原始视频帧,每个原始视频帧显示有目标对象,T为大于1的整数;Obtain a video sequence from the video to be repaired, where the video sequence includes T original video frames, each original video frame displays a target object, and T is an integer greater than 1;
从视频序列中抽取K个待处理视频帧对,其中,每个待处理视频帧对包括相邻的两个原始视频帧;Extract K video frame pairs to be processed from the video sequence, where each video frame pair to be processed includes two adjacent original video frames;
对K个待处理视频帧对中各个原始视频帧的尺寸分别进行归一化处理,得到K个视频帧对,并将K个视频帧对作为视频样本序列。The size of each original video frame in the K video frame pairs to be processed is normalized respectively to obtain K video frame pairs, and the K video frame pairs are used as a video sample sequence.
在一个或多个实施例中，介绍了一种生成视频样本序列的方式。由前述实施例可知，视频样本序列来源于待修复视频，待修复视频表示为x={xt}(t=1,2,…,T)，可见，待修复视频包括T个原始视频帧，即，xt表示第t个原始视频帧。In one or more embodiments, a way of generating a video sample sequence is introduced. As can be seen from the foregoing embodiments, the video sample sequence originates from the video to be repaired, which is expressed as x={xt}(t=1,2,…,T); that is, the video to be repaired includes T original video frames, and xt represents the t-th original video frame.
具体地，可按照一定间隔抽取相邻的原始视频帧。例如，每隔10帧抽一组相邻的原始视频帧，那么抽取到的序列可以表示为xs，xs={(x1,x2),(x11,x12),…}，其中，该序列包括K个待处理视频帧对，即，第1个待处理视频帧对表示为(x1,x2)，第2个待处理视频帧对表示为(x11,x12)，以此类推。基于此，对待处理视频帧对中的各个原始视频帧分别进行尺寸归一化处理，从而得到相应的视频帧。相邻的视频帧构成视频帧对，而K个视频帧对组成视频样本序列，其中，视频样本序列可表示为xsr，xsr={(xr1,xr2),(xr11,xr12),…}。Specifically, adjacent original video frames can be extracted at a certain interval. For example, if a group of adjacent original video frames is extracted every 10 frames, the extracted sequence can be expressed as xs={(x1,x2),(x11,x12),…}, where the sequence includes K video frame pairs to be processed; that is, the first video frame pair to be processed is (x1,x2), the second is (x11,x12), and so on. Based on this, each original video frame in the video frame pairs to be processed is size-normalized to obtain the corresponding video frames. Adjacent video frames constitute a video frame pair, and the K video frame pairs constitute the video sample sequence, which can be expressed as xsr={(xr1,xr2),(xr11,xr12),…}.
需要说明的是,经过尺寸归一化处理后的视频帧具有固定尺寸,例如,512×288。It should be noted that the video frame after size normalization has a fixed size, for example, 512×288.
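The sampling and size-normalization steps above can be sketched as follows; the nearest-neighbour resize is a stand-in for whatever interpolation an actual implementation uses.

```python
def sample_frame_pairs(num_frames, interval=10):
    """0-based indices of adjacent frame pairs taken every `interval`
    frames; for T=30 this gives [(0, 1), (10, 11), (20, 21)], i.e. the
    (x1, x2), (x11, x12), ... pairs (1-based) from the text."""
    return [(t, t + 1) for t in range(0, num_frames - 1, interval)]

def normalize_size(frame, out_w=512, out_h=288):
    """Nearest-neighbour resize of a 2-D list to the fixed target size
    (a real implementation would typically use a library resize)."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]
```

The fixed 512×288 defaults mirror the example size given above.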
其次,本申请实施例提供了一种生成视频样本序列的方式。通过上述方式,一方面,从视频序列中抽取若干个待处理视频帧对用于后续处理,能够减少数据处理量,节省数据处理资源。另一方面,对原始视频帧进行尺寸归一化处理,不仅可以对齐各个视频帧的统计量,还能够缩小视频帧的尺寸,从而提升处理效率。Secondly, embodiments of this application provide a way to generate a video sample sequence. Through the above method, on the one hand, extracting several video frame pairs to be processed from the video sequence for subsequent processing can reduce the amount of data processing and save data processing resources. On the other hand, normalizing the size of the original video frame can not only align the statistics of each video frame, but also reduce the size of the video frame, thereby improving processing efficiency.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据视频样本序列获取目标掩膜样本序列,具体可以包括:Optionally, based on the above-mentioned embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiment of the present application, obtaining the target mask sample sequence according to the video sample sequence may specifically include:
针对视频样本序列中的每个视频帧对，根据该视频帧对中的任一个视频帧，获取该视频帧对对应的原始掩膜帧，其中，原始掩膜帧包括对该任一个视频帧中的目标对象进行掩膜处理后得到的原始掩膜区域；For each video frame pair in the video sample sequence, obtain the original mask frame corresponding to the video frame pair according to either video frame in the pair, where the original mask frame includes the original mask area obtained by masking the target object in that video frame;
针对视频样本序列中的每个视频帧对,对该视频帧对对应的原始掩膜帧中的原始掩膜区域进行扩张,得到该视频帧对对应的目标掩膜帧;For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair;
将K个视频帧对各自对应的目标掩膜帧作为目标掩膜样本序列。The corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
在一个或多个实施例中,介绍了一种基于单视频帧生成目标掩膜帧的方式。由前述实施例可知,待修复视频中显示有目标对象,对此,需要对目标对象进行掩膜处理,从而得到相应的原始掩膜区域。再按照一定像素点个数对原始掩膜区域进行扩张,得到目标掩膜区域。In one or more embodiments, a method of generating a target mask frame based on a single video frame is introduced. It can be known from the foregoing embodiments that the target object is displayed in the video to be repaired. For this purpose, the target object needs to be masked to obtain the corresponding original mask area. Then the original mask area is expanded according to a certain number of pixels to obtain the target mask area.
需要说明的是，目标对象可以是标志，字幕，物体等。可以理解的是，识别目标对象的方式包含但不限于人工标注以及模型识别等，例如，采用全卷积网络(fully convolutional network，FCN)识别出目标对象。It should be noted that the target object may be a logo, subtitles, an object, etc. It can be understood that the ways of identifying the target object include, but are not limited to, manual annotation and model recognition; for example, a fully convolutional network (FCN) may be used to identify the target object.
示例性地，一种处理方式为，待修复视频为x={xt}(t=1,2,…,T)。对该待修复视频中的每个原始视频帧进行掩膜处理，由此，得到m={mt}(t=1,2,…,T)。假设每隔10帧抽一组相邻的原始视频帧，则抽取到视频样本序列表示为xs={(x1,x2),(x11,x12),…}，由此，得到对应的掩膜帧序列表示为ms={(m1,m2),(m11,m12),…}，其中，(m1,m2)对应于(x1,x2)，(m11,m12)对应于(x11,x12)，以此类推。若采用前向光流，则基于ms提取每个视频帧对的前一个视频帧，得到msF={m1,m11,…}，再对msF进行归一化，得到的原始掩膜帧序列表示为msr={mr1,mr11,…}；其中，mr1对应于(x1,x2)，mr11对应于(x11,x12)，以此类推。若采用后向光流，则基于ms提取每个视频帧对的后一个视频帧，得到msB={m2,m12,…}，再对msB进行归一化，得到的原始掩膜帧序列表示为msr={mr2,mr12,…}；其中，mr2对应于(x1,x2)，mr12对应于(x11,x12)，以此类推。其中，原始掩膜帧序列包括K个原始掩膜帧。For example, in one processing method, the video to be repaired is x={xt}(t=1,2,…,T). Mask processing is performed on each original video frame in the video to be repaired, thereby obtaining m={mt}(t=1,2,…,T). Assuming a group of adjacent original video frames is extracted every 10 frames, the extracted video sample sequence is xs={(x1,x2),(x11,x12),…}, and the corresponding mask frame sequence is ms={(m1,m2),(m11,m12),…}, where (m1,m2) corresponds to (x1,x2), (m11,m12) corresponds to (x11,x12), and so on. If forward optical flow is used, the former video frame of each video frame pair is extracted based on ms to obtain msF={m1,m11,…}, and msF is then normalized to obtain the original mask frame sequence msr={mr1,mr11,…}, where mr1 corresponds to (x1,x2), mr11 corresponds to (x11,x12), and so on. If backward optical flow is used, the latter video frame of each video frame pair is extracted based on ms to obtain msB={m2,m12,…}, and msB is then normalized to obtain the original mask frame sequence msr={mr2,mr12,…}, where mr2 corresponds to (x1,x2), mr12 corresponds to (x11,x12), and so on. The original mask frame sequence includes K original mask frames.
示例性地，一种处理方式为，待修复视频为x={xt}(t=1,2,…,T)。假设每隔10帧抽一组相邻的原始视频帧，则抽取到的序列表示为xs={(x1,x2),(x11,x12),…}。对xs中的每个原始视频帧进行归一化处理，得到视频样本序列表示为xsr={(xr1,xr2),(xr11,xr12),…}。若采用前向光流，则基于xsr提取每个视频帧对的前一个视频帧，得到xsrF={xr1,xr11,…}，再对xsrF进行掩膜处理，得到的原始掩膜帧序列表示为msr={mr1,mr11,…}；其中，mr1对应于(xr1,xr2)，mr11对应于(xr11,xr12)，以此类推。若采用后向光流，则基于xsr提取每个视频帧对的后一个视频帧，得到xsrB={xr2,xr12,…}，再对xsrB进行掩膜处理，得到的原始掩膜帧序列表示为msr={mr2,mr12,…}；其中，mr2对应于(xr1,xr2)，mr12对应于(xr11,xr12)，以此类推。其中，原始掩膜帧序列包括K个原始掩膜帧。For example, in one processing method, the video to be repaired is x={xt}(t=1,2,…,T). Assuming a group of adjacent original video frames is extracted every 10 frames, the extracted sequence is xs={(x1,x2),(x11,x12),…}. Each original video frame in xs is normalized to obtain the video sample sequence xsr={(xr1,xr2),(xr11,xr12),…}. If forward optical flow is used, the former video frame of each video frame pair is extracted based on xsr to obtain xsrF={xr1,xr11,…}, and mask processing is then performed on xsrF to obtain the original mask frame sequence msr={mr1,mr11,…}, where mr1 corresponds to (xr1,xr2), mr11 corresponds to (xr11,xr12), and so on. If backward optical flow is used, the latter video frame of each video frame pair is extracted based on xsr to obtain xsrB={xr2,xr12,…}, and mask processing is then performed on xsrB to obtain the original mask frame sequence msr={mr2,mr12,…}, where mr2 corresponds to (xr1,xr2), mr12 corresponds to (xr11,xr12), and so on. The original mask frame sequence includes K original mask frames.
具体地，为了便于理解，请参阅图5，图5为本申请实施例中生成目标掩膜帧的一个示意图，以图5中(a)图示出的原始掩膜帧为例，其中，标记为"1"的15个像素点构成原始掩膜区域。假设按照2个像素个数对原始掩膜区域进行扩张，得到目标掩膜区域（即，由标记为"1"的像素点构成的灰色区域）。基于此，得到如图5中(b)图所示的目标掩膜帧。Specifically, for ease of understanding, please refer to Figure 5, which is a schematic diagram of generating a target mask frame in an embodiment of the present application. Taking the original mask frame shown in Figure 5(a) as an example, the 15 pixels marked "1" constitute the original mask area. Assuming the original mask area is expanded by 2 pixels, the target mask area (i.e., the gray area composed of pixels marked "1") is obtained. Based on this, the target mask frame shown in Figure 5(b) is obtained.
以此类推,对每个原始掩膜帧进行处理,直至得到目标掩膜样本序列。目标掩膜样本序列可表示为{mdst}(t=1,2,…,K)。其中,mdst表示第t个目标掩膜帧。By analogy, each original mask frame is processed until the target mask sample sequence is obtained. The target mask sample sequence can be expressed as {m dst } (t=1,2,...,K). Among them, m dst represents the t-th target mask frame.
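The expansion of a mask area by a fixed number of pixels, as in the Figure 5 example, can be sketched as a binary dilation. The square (Chebyshev) neighbourhood below is an assumption; the embodiment only says the area is expanded by a certain number of pixels.

```python
def dilate(mask, radius):
    """Expand a binary mask by `radius` pixels using a square (Chebyshev)
    neighbourhood -- an assumed structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))))
    return out

# A single marked pixel grows into a (2*radius+1)^2 block.
seed = [[0] * 5 for _ in range(5)]
seed[2][2] = 1
```

In practice a library morphology routine would be used instead of these explicit loops.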
其次，本申请实施例提供了一种基于单视频帧生成目标掩膜帧的方式。通过上述方式，考虑到视频帧对中前后两个视频帧的原始掩膜区域差别不大，因此，可以仅对其中一个原始掩膜帧进行区域扩张处理，由此降低操作的复杂度。Secondly, the embodiment of the present application provides a way to generate a target mask frame based on a single video frame. In the above manner, considering that the original mask areas of the two video frames in a video frame pair differ little, only one of the original mask frames may be subjected to area expansion processing, thereby reducing the complexity of the operation.
可选地，在上述图4对应的实施例的基础上，本申请实施例提供的另一个可选实施例中，针对视频样本序列中的每个视频帧对，对该视频帧对对应的原始掩膜帧中的原始掩膜区域进行扩张，得到该视频帧对对应的目标掩膜帧，具体可以包括：Optionally, based on the above embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiments of the present application, for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair may specifically include:
针对视频样本序列中的每个视频帧对,按照第一像素个数对该视频帧对对应的原始掩膜帧中的原始掩膜区域进行扩张,得到该视频帧对对应的第一掩膜区域;For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair according to the number of first pixels to obtain the first mask area corresponding to the video frame pair. ;
针对视频样本序列中的每个视频帧对,按照第二像素个数对该视频帧对对应的原始掩膜帧中的原始掩膜区域进行扩张,得到该视频帧对对应的第二掩膜区域,其中,第二像素个数大于第一像素个数;For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair according to the number of second pixels to obtain the second mask area corresponding to the video frame pair. , where the number of second pixels is greater than the number of first pixels;
针对视频样本序列中的每个视频帧对,对该视频帧对对应的第一掩膜区域以及第二掩膜区域进行异或操作,得到该视频帧对对应的目标掩膜帧。For each video frame pair in the video sample sequence, an XOR operation is performed on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
在一个或多个实施例中，介绍了一种扩张原始掩膜区域的方式。由前述实施例可知，针对原始掩膜帧序列中的每个原始掩膜帧而言，还可以对原始掩膜区域进行扩张，得到目标掩膜区域。以此获得包含有目标掩膜区域的目标掩膜帧。In one or more embodiments, a way of expanding the original mask area is introduced. As can be seen from the foregoing embodiments, for each original mask frame in the original mask frame sequence, the original mask area can be expanded to obtain the target mask area, thereby obtaining the target mask frame containing the target mask area.
具体地，为了便于理解，请参阅图6，图6为本申请实施例中生成目标掩膜帧的另一个示意图，以图6中(a)图示出的原始掩膜帧为例，其中，标记为"1"的15个像素点构成原始掩膜区域。假设按照第一像素个数（例如，2个像素个数）对原始掩膜区域进行扩张，得到第一掩膜区域（即，由标记为"1"的像素点构成的灰色区域），即得到如图6中(b)图所示的掩膜帧。假设按照第二像素个数（例如，4个像素个数）对原始掩膜区域进行扩张，得到第二掩膜区域（即，由标记为"1"的像素点构成的灰色区域），即得到如图6中(c)图所示的掩膜帧。进而，对第一掩膜区域以及第二掩膜区域进行异或操作，得到如图6中(d)图所示的目标掩膜帧，其中，目标掩膜帧包括目标掩膜区域（即，由标记为"1"的像素点构成的灰色区域）。Specifically, for ease of understanding, please refer to Figure 6, which is another schematic diagram of generating a target mask frame in an embodiment of the present application. Taking the original mask frame shown in Figure 6(a) as an example, the 15 pixels marked "1" constitute the original mask area. Assuming the original mask area is expanded by the first number of pixels (for example, 2 pixels), the first mask area (i.e., the gray area composed of pixels marked "1") is obtained, yielding the mask frame shown in Figure 6(b). Assuming the original mask area is expanded by the second number of pixels (for example, 4 pixels), the second mask area (i.e., the gray area composed of pixels marked "1") is obtained, yielding the mask frame shown in Figure 6(c). An XOR operation is then performed on the first mask area and the second mask area to obtain the target mask frame shown in Figure 6(d), where the target mask frame includes the target mask area (i.e., the gray area composed of pixels marked "1").
以此类推，对每个原始掩膜帧进行处理，直至得到目标掩膜样本序列。目标掩膜样本序列可表示为{mdst=mda^mdb}(t=1,2,…,K)。其中，mdst表示第t个目标掩膜帧，mda表示包括第一掩膜区域的掩膜帧，a表示第一像素个数，mdb表示包括第二掩膜区域的掩膜帧，b表示第二像素个数，"^"表示异或操作符。By analogy, each original mask frame is processed until the target mask sample sequence is obtained. The target mask sample sequence can be expressed as {mdst=mda^mdb}(t=1,2,…,K), where mdst denotes the t-th target mask frame, mda denotes the mask frame including the first mask area, a denotes the first number of pixels, mdb denotes the mask frame including the second mask area, b denotes the second number of pixels, and "^" denotes the XOR operator.
实际应用中,第一像素个数可以是7,第二像素个数可以是9,由此,目标掩膜样本序列可表示为{mdst=md7^md9}(t=1,2,…,K)。需要说明的是,第一像素个数和第二像素个数还可以根据情况进行调整,此处不做限定。In practical applications, the number of first pixels can be 7, and the number of second pixels can be 9. Therefore, the target mask sample sequence can be expressed as {m dst =m d7 ^m d9 }(t=1,2, …,K). It should be noted that the number of first pixels and the number of second pixels can also be adjusted according to the situation, and are not limited here.
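Putting the two expansions and the XOR together, the target mask frame mdst = mda ^ mdb can be sketched as follows. This is a pure-Python sketch; the square-neighbourhood dilation is an assumption about the expansion step.

```python
def dilate(mask, radius):
    # Square (Chebyshev) dilation; the structuring element is an assumption.
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[yy][xx]
                     for yy in range(max(0, y - radius), min(h, y + radius + 1))
                     for xx in range(max(0, x - radius), min(w, x + radius + 1))))
             for x in range(w)]
            for y in range(h)]

def target_mask(orig, a, b):
    """mdst = mda ^ mdb with b > a: a ring of pixels offset from the
    original mask area (e.g. a=7, b=9 in practice, per the text)."""
    ma, mb = dilate(orig, a), dilate(orig, b)
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(ma, mb)]
```

Because the larger dilation strictly contains the smaller one, the XOR leaves exactly the band of pixels between the two expansion radii.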
再次,本申请实施例提供了一种扩张原始掩膜区域的方式。通过上述方式,原始掩膜区域内部的光流是通过周边的光流得到的,如果周边的光流比较混乱,那么原始掩膜区域内部的光流也无法得到很好的填充。考虑到紧贴原始掩膜区域的像素点可能存在一些噪声,因此,偏离原始掩膜区域而得到的目标掩膜区域具有更少的噪声,从而有利于提升光流质量的判定效果。Thirdly, the embodiment of the present application provides a way to expand the original mask area. Through the above method, the optical flow inside the original mask area is obtained from the peripheral optical flow. If the peripheral optical flow is relatively chaotic, the optical flow inside the original mask area cannot be well filled. Considering that there may be some noise in the pixels close to the original mask area, the target mask area obtained by deviating from the original mask area has less noise, which is beneficial to improving the judgment effect of optical flow quality.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据视频样本序列获取目标掩膜样本序列,具体可以包括:Optionally, based on the above-mentioned embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiment of the present application, obtaining the target mask sample sequence according to the video sample sequence may specifically include:
针对视频样本序列中的每个视频帧对，根据该视频帧对中的前一个视频帧，获取该视频帧对对应的第一原始掩膜帧，并根据该视频帧对中的后一个视频帧，获取该视频帧对对应的第二原始掩膜帧，其中，第一原始掩膜帧以及第二原始掩膜帧分别包括对前一个视频帧和后一个视频帧中的目标对象进行掩膜处理后得到的原始掩膜区域；For each video frame pair in the video sample sequence, obtain the first original mask frame corresponding to the video frame pair according to the former video frame in the pair, and obtain the second original mask frame corresponding to the video frame pair according to the latter video frame in the pair, where the first original mask frame and the second original mask frame respectively include the original mask areas obtained by masking the target object in the former video frame and in the latter video frame;
针对视频样本序列中的每个视频帧对,对该视频帧对对应的第一原始掩膜帧以及第二原始掩膜帧进行并集处理,得到该视频帧对对应的原始掩膜帧;For each video frame pair in the video sample sequence, perform a union process on the first original mask frame and the second original mask frame corresponding to the video frame pair to obtain the original mask frame corresponding to the video frame pair;
针对视频样本序列中的每个视频帧对,对该视频帧对对应的原始掩膜帧中的原始掩膜区域进行扩张,得到该视频帧对对应的目标掩膜帧;For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair;
将K个视频帧对各自对应的目标掩膜帧作为目标掩膜样本序列。The corresponding target mask frames of the K video frame pairs are used as the target mask sample sequence.
在一个或多个实施例中,介绍了一种基于多视频帧生成目标掩膜帧的方式。由前述实施例可知,待修复视频中显示有目标对象,对此,需要对目标对象进行掩膜处理,从而得到相应的原始掩膜区域。再按照一定像素点个数对原始掩膜区域进行扩张,从而得到目标掩膜区域。In one or more embodiments, a method of generating a target mask frame based on multiple video frames is introduced. It can be known from the foregoing embodiments that the target object is displayed in the video to be repaired. For this purpose, the target object needs to be masked to obtain the corresponding original mask area. Then the original mask area is expanded according to a certain number of pixels to obtain the target mask area.
需要说明的是,目标对象可以是标志,字幕,物体等。可以理解的是,识别目标对象的方式包含但不限于人工标注以及模型识别等,例如,采用FCN识别出目标对象。It should be noted that the target object can be a logo, subtitle, object, etc. It can be understood that the method of identifying the target object includes but is not limited to manual annotation and model recognition. For example, FCN is used to identify the target object.
示例性地，一种处理方式为，待修复视频为x={xt}(t=1,2,…,T)。可对待修复视频中的每个原始视频帧进行掩膜处理，由此，得到m={mt}(t=1,2,…,T)。假设每隔10帧抽一组相邻的原始视频帧，则抽取到视频样本序列表示为xs={(x1,x2),(x11,x12),…}，由此，得到对应的掩膜帧序列表示为ms={(m1,m2),(m11,m12),…}。再对ms进行归一化，得到的原始掩膜帧序列表示为msr={(mr1,mr2),(mr11,mr12),…}。原始掩膜帧序列包括K个第一原始掩膜帧（即，{mr1,mr11,…}）以及K个第二原始掩膜帧（即，{mr2,mr12,…}）；在第一原始掩膜帧中，mr1对应于(x1,x2)，mr11对应于(x11,x12)，以此类推；在第二原始掩膜帧中，mr2对应于(x1,x2)，mr12对应于(x11,x12)，以此类推。For example, in one processing method, the video to be repaired is x={xt}(t=1,2,…,T). Mask processing may be performed on each original video frame in the video to be repaired, thereby obtaining m={mt}(t=1,2,…,T). Assuming a group of adjacent original video frames is extracted every 10 frames, the extracted video sample sequence is xs={(x1,x2),(x11,x12),…}, and the corresponding mask frame sequence is ms={(m1,m2),(m11,m12),…}. ms is then normalized to obtain the original mask frame sequence msr={(mr1,mr2),(mr11,mr12),…}. The original mask frame sequence includes K first original mask frames (i.e., {mr1,mr11,…}) and K second original mask frames (i.e., {mr2,mr12,…}). In the first original mask frames, mr1 corresponds to (x1,x2), mr11 corresponds to (x11,x12), and so on; in the second original mask frames, mr2 corresponds to (x1,x2), mr12 corresponds to (x11,x12), and so on.
示例性地，一种处理方式为，待修复视频为x={xt}(t=1,2,…,T)。假设每隔10帧抽一组相邻的原始视频帧，则抽取到的序列表示为xs={(x1,x2),(x11,x12),…}。对xs中的每个原始视频帧进行归一化处理，得到视频样本序列表示为xsr={(xr1,xr2),(xr11,xr12),…}。再对xsr进行掩膜处理，得到的原始掩膜帧序列表示为msr={(mr1,mr2),(mr11,mr12),…}。其中，原始掩膜帧序列包括K个第一原始掩膜帧（即，{mr1,mr11,…}）以及K个第二原始掩膜帧（即，{mr2,mr12,…}）；在第一原始掩膜帧中，mr1对应于(xr1,xr2)，mr11对应于(xr11,xr12)，以此类推；在第二原始掩膜帧中，mr2对应于(xr1,xr2)，mr12对应于(xr11,xr12)，以此类推。For example, in one processing method, the video to be repaired is x={xt}(t=1,2,…,T). Assuming a group of adjacent original video frames is extracted every 10 frames, the extracted sequence is xs={(x1,x2),(x11,x12),…}. Each original video frame in xs is normalized to obtain the video sample sequence xsr={(xr1,xr2),(xr11,xr12),…}. Mask processing is then performed on xsr to obtain the original mask frame sequence msr={(mr1,mr2),(mr11,mr12),…}. The original mask frame sequence includes K first original mask frames (i.e., {mr1,mr11,…}) and K second original mask frames (i.e., {mr2,mr12,…}). In the first original mask frames, mr1 corresponds to (xr1,xr2), mr11 corresponds to (xr11,xr12), and so on; in the second original mask frames, mr2 corresponds to (xr1,xr2), mr12 corresponds to (xr11,xr12), and so on.
具体地，为了便于理解，请参阅图7，图7为本申请实施例中生成目标掩膜帧的又一个示意图，图7中(a)图示出的为第一原始掩膜帧，其中，标记为"1"的13个像素点构成第一原始掩膜帧的原始掩膜区域。图7中(b)图示出的为第二原始掩膜帧，其中，标记为"1"的13个像素点构成第二原始掩膜帧的原始掩膜区域。对第一原始掩膜帧以及第二原始掩膜帧进行并集处理之后，得到如图7中(c)图所示的原始掩膜帧，其中，标记为"1"的15个像素点构成该原始掩膜帧的原始掩膜区域。假设按照2个像素个数对原始掩膜区域进行扩张，得到目标掩膜区域（即，由标记为"1"的像素点构成的灰色区域）。基于此，得到如图7中(d)图所示的目标掩膜帧。Specifically, for ease of understanding, please refer to Figure 7, which is another schematic diagram of generating a target mask frame in an embodiment of the present application. Figure 7(a) illustrates the first original mask frame, in which the 13 pixels marked "1" constitute the original mask area of the first original mask frame. Figure 7(b) illustrates the second original mask frame, in which the 13 pixels marked "1" constitute the original mask area of the second original mask frame. After the union of the first original mask frame and the second original mask frame, the original mask frame shown in Figure 7(c) is obtained, in which the 15 pixels marked "1" constitute the original mask area of that frame. Assuming the original mask area is expanded by 2 pixels, the target mask area (i.e., the gray area composed of pixels marked "1") is obtained. Based on this, the target mask frame shown in Figure 7(d) is obtained.
以此类推,对每个原始掩膜帧进行处理,直至得到目标掩膜样本序列。目标掩膜样本序列可表示为{mdst}(t=1,2,…,K)。其中,mdst表示第t个目标掩膜帧。By analogy, each original mask frame is processed until the target mask sample sequence is obtained. The target mask sample sequence can be expressed as {m dst } (t=1,2,...,K). Among them, m dst represents the t-th target mask frame.
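The union step that merges the original mask areas of the two frames in a pair, before the expansion, can be sketched as:

```python
def union_masks(mask_prev, mask_next):
    """Per-pixel union (logical OR) of the mask frames of the former and
    latter video frames in a pair, prior to the expansion step."""
    return [[int(a or b) for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(mask_prev, mask_next)]
```

The resulting union mask is then expanded exactly as in the single-frame case.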
其次，本申请实施例提供了一种基于多视频帧生成目标掩膜帧的方式。该方式考虑到视频帧对中前后两个视频帧的原始掩膜区域可能具有差异，因此，先对前后两帧的原始掩膜区域取并集，获得更准确的原始掩膜区域。由此，提升视频帧的处理效果。Secondly, the embodiments of the present application provide a way to generate target mask frames based on multiple video frames. This approach takes into account that the original mask areas of the two video frames in a video frame pair may differ; therefore, the union of the original mask areas of the two frames is taken first to obtain a more accurate original mask area, thereby improving the processing effect on the video frames.
可选地，在上述图4对应的实施例的基础上，本申请实施例提供的另一个可选实施例中，针对视频样本序列中的每个视频帧对，对该视频对对应的原始掩膜帧中的原始掩膜区域进行扩张，得到该视频对对应的目标掩膜帧，具体可以包括：Optionally, based on the above embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiments of the present application, for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video pair to obtain the target mask frame corresponding to the video pair may specifically include:
针对视频样本序列中的每个视频帧对,按照第一像素个数对该视频对对应的原始掩膜帧中的原始掩膜区域进行扩张,得到该视频对对应的第一掩膜区域;For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video pair according to the number of first pixels to obtain the first mask area corresponding to the video pair;
针对视频样本序列中的每个视频帧对，按照第二像素个数对该视频对对应的原始掩膜帧中的原始掩膜区域进行扩张，得到该视频对对应的第二掩膜区域，其中，第二像素个数大于第一像素个数；For each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video pair according to the second number of pixels to obtain the second mask area corresponding to the video pair, where the second number of pixels is greater than the first number of pixels;
针对视频样本序列中的每个视频帧对,对该视频对对应的第一掩膜区域以及第二掩膜区域进行异或操作,得到该视频对对应的目标掩膜帧。For each video frame pair in the video sample sequence, an XOR operation is performed on the first mask area and the second mask area corresponding to the video pair to obtain the target mask frame corresponding to the video pair.
在一个或多个实施例中,介绍了一种扩张原始掩膜区域的方式。由前述实施例可知,针对原始掩膜帧序列中的每个原始掩膜帧,还可以对其中的原始掩膜区域进行扩张,得到目标掩膜区域。以此获得包含有目标掩膜区域的目标掩膜帧。In one or more embodiments, a way of expanding the original mask area is introduced. It can be known from the foregoing embodiments that for each original mask frame in the original mask frame sequence, the original mask area can also be expanded to obtain a target mask area. In this way, the target mask frame containing the target mask area is obtained.
具体地，为了便于理解，请参阅图8，图8为本申请实施例中生成目标掩膜帧的再一个示意图，图8中(a)图示出的为第一原始掩膜帧，其中，标记为"1"的13个像素点构成第一原始掩膜帧的原始掩膜区域。图8中(b)图示出的为第二原始掩膜帧，其中，标记为"1"的13个像素点构成第二原始掩膜帧的原始掩膜区域。对第一原始掩膜帧以及第二原始掩膜帧进行并集处理之后，得到如图8中(c)图所示的原始掩膜帧，其中，标记为"1"的15个像素点构成该原始掩膜帧的原始掩膜区域。假设按照第一像素个数（例如，2个像素个数）对原始掩膜区域进行扩张，得到第一掩膜区域（即，由标记为"1"的像素点构成的灰色区域），即得到如图8中(d)图所示的掩膜帧。假设按照第二像素个数（例如，4个像素个数）对原始掩膜区域进行扩张，得到第二掩膜区域（即，由标记为"1"的像素点构成的灰色区域），即得到如图8中(e)图所示的掩膜帧。基于此，对第一掩膜区域以及第二掩膜区域进行异或操作，得到如图8中(f)图所示的目标掩膜帧，其中，目标掩膜帧包括目标掩膜区域（即，由标记为"1"的像素点构成的灰色区域）。Specifically, for ease of understanding, please refer to Figure 8, which is yet another schematic diagram of generating a target mask frame in an embodiment of the present application. Figure 8(a) illustrates the first original mask frame, in which the 13 pixels marked "1" constitute its original mask area. Figure 8(b) illustrates the second original mask frame, in which the 13 pixels marked "1" constitute its original mask area. After the union of the first original mask frame and the second original mask frame, the original mask frame shown in Figure 8(c) is obtained, in which the 15 pixels marked "1" constitute the original mask area of that frame. Assuming the original mask area is expanded by the first number of pixels (for example, 2 pixels), the first mask area (i.e., the gray area composed of pixels marked "1") is obtained, yielding the mask frame shown in Figure 8(d). Assuming the original mask area is expanded by the second number of pixels (for example, 4 pixels), the second mask area (i.e., the gray area composed of pixels marked "1") is obtained, yielding the mask frame shown in Figure 8(e). Based on this, an XOR operation is performed on the first mask area and the second mask area to obtain the target mask frame shown in Figure 8(f), where the target mask frame includes the target mask area (i.e., the gray area composed of pixels marked "1").
以此类推，对每个原始掩膜帧进行处理，直至得到目标掩膜样本序列。目标掩膜样本序列可表示为{mdst=mda^mdb}(t=1,2,…,K)。其中，mdst表示第t个目标掩膜帧，mda表示包括第一掩膜区域的掩膜帧，a表示第一像素个数，mdb表示包括第二掩膜区域的掩膜帧，b表示第二像素个数，"^"表示异或操作符。By analogy, each original mask frame is processed until the target mask sample sequence is obtained. The target mask sample sequence can be expressed as {mdst=mda^mdb}(t=1,2,…,K), where mdst denotes the t-th target mask frame, mda denotes the mask frame including the first mask area, a denotes the first number of pixels, mdb denotes the mask frame including the second mask area, b denotes the second number of pixels, and "^" denotes the XOR operator.
实际应用中,第一像素个数可以是7,第二像素个数可以是9,由此,目标掩膜样本序列可表示为{mdst=md7^md9}(t=1,2,…,K)。需要说明的是,第一像素个数和第二像素个数还可以根据情况进行调整,此处不做限定。In practical applications, the number of first pixels can be 7, and the number of second pixels can be 9. Therefore, the target mask sample sequence can be expressed as {m dst =m d7 ^m d9 }(t=1,2, …,K). It should be noted that the number of first pixels and the number of second pixels can also be adjusted according to the situation, and are not limited here.
再次,本申请实施例提供了一种扩张原始掩膜区域的方式。通过上述方式,原始掩膜区域内部的光流是通过周边的光流得到的,如果周边的光流比较混乱,那么原始掩膜区域内部的光流也无法得到很好的填充。考虑到紧贴原始掩膜区域的像素点可能存在一些噪声,而偏离原始掩膜区域得到的目标掩膜区域具有更少的噪声,从而有利于提升光流质量的判定效果。Thirdly, the embodiment of the present application provides a way to expand the original mask area. Through the above method, the optical flow inside the original mask area is obtained from the peripheral optical flow. If the peripheral optical flow is relatively chaotic, the optical flow inside the original mask area cannot be well filled. Considering that there may be some noise in the pixels close to the original mask area, the target mask area obtained by deviating from the original mask area has less noise, which is beneficial to improving the judgment effect of optical flow quality.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据视频样本序列获取光流数据序列,具体可以包括:Optionally, based on the above-mentioned embodiment corresponding to Figure 4, in another optional embodiment provided by the embodiment of the present application, obtaining the optical flow data sequence according to the video sample sequence may specifically include:
针对视频样本序列中的每个视频帧对，根据该视频对中后一个视频帧中各个像素点相对于前一个视频帧中各个像素点的水平偏移量以及竖直偏移量，确定该视频对对应的光流数据；For each video frame pair in the video sample sequence, determine the optical flow data corresponding to the video pair according to the horizontal offset and the vertical offset of each pixel in the latter video frame of the pair relative to the corresponding pixel in the former video frame;
将K个视频对各自对应的光流数据作为光流数据序列;The optical flow data corresponding to each of the K video pairs is used as an optical flow data sequence;
或,or,
根据视频样本序列获取光流数据序列,具体可以包括:Obtaining the optical flow data sequence based on the video sample sequence may include:
针对视频样本序列中的每个视频帧对，根据该视频对中前一个视频帧中各个像素点相对于后一个视频帧中各个像素点的水平偏移量以及竖直偏移量，确定该视频对对应的光流数据；For each video frame pair in the video sample sequence, determine the optical flow data corresponding to the video pair according to the horizontal offset and the vertical offset of each pixel in the former video frame of the pair relative to the corresponding pixel in the latter video frame;
将K个视频对各自对应的光流数据作为光流数据序列。The optical flow data corresponding to each of the K video pairs is used as an optical flow data sequence.
在一个或多个实施例中,介绍了基于视频帧对确定光流数据的两种方式。由前述实施例可知,视频样本序列包括K个视频帧对,每个视频帧对包括两个视频帧。若视频帧已经过尺寸归一化处理,则视频样本序列可表示为xsr={(xr1,xr2),(xr11,xr12),…}。假设视频帧的尺寸为512×288,那么光流数据Flt可以表示为通道数为2,尺寸大小为512×288的光流矩阵。而光流数据序列表示为{Flt}(t=1,…,K)。由此,结合光流数据可确定各个像素点对应的二维光流值(w′,h′),其中,w′表示像素点的水平偏移量,h′表示像素点的竖直偏移量。In one or more embodiments, two ways of determining optical flow data based on video frame pairs are introduced. As can be seen from the foregoing embodiments, the video sample sequence includes K video frame pairs, and each video frame pair includes two video frames. If the video frames have been size-normalized, the video sample sequence can be expressed as xsr={(xr1,xr2),(xr11,xr12),…}. Assuming the size of a video frame is 512×288, the optical flow data Flt can be represented as an optical flow matrix with 2 channels and a size of 512×288, and the optical flow data sequence is expressed as {Flt}(t=1,…,K). Accordingly, the two-dimensional optical flow value (w′,h′) corresponding to each pixel can be determined from the optical flow data, where w′ denotes the horizontal offset of the pixel and h′ denotes the vertical offset of the pixel.
下面将以一个像素点为例,结合图示介绍确定光流数据的方式。The following takes a single pixel as an example to illustrate, with reference to the figures, how the optical flow data is determined.
一、基于前向光流确定光流数据;1. Determine optical flow data based on forward optical flow;
具体地,若采用前向光流,则需要根据视频帧对中后一个视频帧中各个像素点相对于前一个视频帧中各个像素点的水平偏移量以及竖直偏移量,确定该视频帧对对应的光流数据。为了便于理解,请参阅图9,图9为本申请实施例中基于前向光流确定二维光流值的一个示意图,在前一个视频帧中,像素点坐标为(3,4)。在后一个视频帧中,像素点坐标为(4,5)。该像素点从前一个视频帧到后一个视频帧的水平偏移量为1(即,4-3),竖直偏移量为1(即,5-4),可见,该像素点的二维光流值为(1,1)。Specifically, if forward optical flow is used, the optical flow data corresponding to the video frame pair is determined according to the horizontal offset and vertical offset of each pixel in the later video frame relative to the corresponding pixel in the earlier video frame. For ease of understanding, please refer to Figure 9, which is a schematic diagram of determining a two-dimensional optical flow value based on forward optical flow in an embodiment of the present application. In the earlier video frame, the pixel coordinates are (3,4); in the later video frame, the pixel coordinates are (4,5). The horizontal offset of this pixel from the earlier frame to the later frame is 1 (i.e., 4-3) and the vertical offset is 1 (i.e., 5-4), so the two-dimensional optical flow value of this pixel is (1,1).
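To make the Figure 9 arithmetic concrete, the following is a minimal illustrative Python sketch; the function name is hypothetical and not part of the application.

```python
# Hypothetical helper: 2-D optical flow value of one tracked pixel under
# forward optical flow (offset from the earlier frame to the later frame).
def forward_flow(prev_xy, next_xy):
    return (next_xy[0] - prev_xy[0], next_xy[1] - prev_xy[1])

# Figure 9 example: (3,4) in the earlier frame, (4,5) in the later frame.
print(forward_flow((3, 4), (4, 5)))  # (1, 1)
```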
二、基于后向光流确定光流数据;2. Determine optical flow data based on backward optical flow;
具体地,若采用后向光流,则需要根据视频帧对中前一个视频帧中各个像素点相对于后一个视频帧中各个像素点的水平偏移量以及竖直偏移量,确定该视频帧对对应的光流数据。为了便于理解,请参阅图10,图10为本申请实施例中基于后向光流确定二维光流值的一个示意图,在前一个视频帧中,像素点坐标为(1,3)。在后一个视频帧中,像素点坐标为(4,5)。该像素点从后一个视频帧到前一个视频帧的水平偏移量为-4(即,1-4),竖直偏移量为-2(即,3-5),可见,该像素点的二维光流值为(-4,-2)。Specifically, if backward optical flow is used, the optical flow data corresponding to the video frame pair is determined according to the horizontal offset and vertical offset of each pixel in the earlier video frame relative to the corresponding pixel in the later video frame. For ease of understanding, please refer to Figure 10, which is a schematic diagram of determining a two-dimensional optical flow value based on backward optical flow in an embodiment of the present application. In the earlier video frame, the pixel coordinates are (1,3); in the later video frame, the pixel coordinates are (4,5). The horizontal offset of this pixel from the later frame back to the earlier frame is -4 (i.e., 1-4) and the vertical offset is -2 (i.e., 3-5), so the two-dimensional optical flow value of this pixel is (-4,-2).
其次,本申请实施例提供了基于视频帧对确定光流数据的两种方式。通过上述方式,支持基于前向光流或后向光流生成光流数据,从而提升方案的灵活性。Secondly, the embodiments of this application provide two ways of determining optical flow data based on video frame pairs. Through the above methods, optical flow data can be generated based on either forward or backward optical flow, improving the flexibility of the solution.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,基于光流数据序列中的各个光流数据,对每个目标掩膜帧中目标掩膜区域包括的像素点进行聚类处理,得到每个目标掩膜帧的光流聚类结果,具体可以包括:Optionally, based on the embodiment corresponding to Figure 4 above, in another optional embodiment provided by the embodiments of this application, clustering the pixels included in the target mask area of each target mask frame based on each piece of optical flow data in the optical flow data sequence, to obtain the optical flow clustering result of each target mask frame, may specifically include:
针对每个目标掩膜帧,根据光流数据序列中该目标掩膜帧对应的光流数据,确定该目标掩膜帧中目标掩膜区域中X个像素点的二维光流值,其中,目标掩膜帧对应的光流数据与该目标掩膜帧对应于同一视频帧对,X为大于1的整数;For each target mask frame, determine the two-dimensional optical flow values of X pixels in the target mask area of the target mask frame according to the optical flow data corresponding to that target mask frame in the optical flow data sequence, where the optical flow data corresponding to the target mask frame and the target mask frame correspond to the same video frame pair, and X is an integer greater than 1;
针对每个目标掩膜帧,根据其中目标掩膜区域中X个像素点的二维光流值,对X个像素点进行聚类处理,得到该目标掩膜帧的光流聚类结果。For each target mask frame, cluster the X pixels according to the two-dimensional optical flow values of the X pixels in the target mask area to obtain the optical flow clustering result of the target mask frame.
在一个或多个实施例中,介绍了一种对目标掩膜区域内的像素点进行聚类的方式。由前述实施例可知,目标掩膜样本序列包括K个目标掩膜帧,需要对每个目标掩膜帧中目标掩膜区域内的像素点进行光流聚类。可以理解的是,实际情况下,目标掩膜区域包括的像素点数量可能较大,因此,还可以预先对目标掩膜区域内的像素点进行随机采样,得到X个像素点。其中,X为大于1的整数,例如,X可以设置为15000。In one or more embodiments, a method of clustering pixels in a target mask area is introduced. As can be seen from the foregoing embodiments, the target mask sample sequence includes K target mask frames, and it is necessary to perform optical flow clustering on the pixels in the target mask area in each target mask frame. It is understandable that in actual situations, the number of pixels included in the target mask area may be large. Therefore, the pixels in the target mask area may also be randomly sampled in advance to obtain X pixels. Among them, X is an integer greater than 1. For example, X can be set to 15000.
具体地,目标掩膜样本序列为{mdst}(t=1,2,…,K),光流数据序列为{Flt}(t=1,…,K)。基于此,可计算Fl′t = Flt * mdst,其中,"*"表示元素相乘,由此,可保留目标掩膜帧中标记为"1"的像素点对应的二维光流值,而其余部分置为0。于是,可采用DBSCAN算法对目标掩膜帧内的X个像素点进行聚类,聚类依据为每个像素点的二维光流值,以此得到目标掩膜帧的光流聚类结果。Specifically, the target mask sample sequence is {mdst}(t=1,2,…,K), and the optical flow data sequence is {Flt}(t=1,…,K). Based on this, Fl′t = Flt * mdst can be computed, where "*" denotes element-wise multiplication; this retains the two-dimensional optical flow values of the pixels marked "1" in the target mask frame and sets the rest to 0. The DBSCAN algorithm can then be used to cluster the X pixels within the target mask frame, using the two-dimensional optical flow value of each pixel as the clustering basis, thereby obtaining the optical flow clustering result of the target mask frame.
需要说明的是,每个目标掩膜帧的光流聚类结果包括经过聚类后各个像素点对应的类别标签。其中,类别标签为“0”的像素点属于噪声像素点,需要进行剔除,剔除后得到目标掩膜帧对应的总类别数量。以第t个目标掩膜帧为例,其对应的总类别数量可表示为Ct,即,具有Ct个聚类簇。聚类簇可包括Nct个像素点。It should be noted that the optical flow clustering result of each target mask frame includes the category label corresponding to each pixel after clustering. Among them, pixels with a category label of "0" belong to noise pixels and need to be eliminated. After elimination, the total number of categories corresponding to the target mask frame is obtained. Taking the t-th target mask frame as an example, the corresponding total number of categories can be expressed as C t , that is, there are C t clusters. A cluster may include N ct pixels.
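The clustering step above can be sketched as follows. This is a compact pure-Python DBSCAN written for illustration only (the application does not specify eps or min_samples, so the values below are assumptions; a production pipeline would typically use an optimized library implementation), applied to the two-dimensional optical flow values of pixels sampled from the target mask region. Noise is marked with label -1 here, corresponding to the category label "0" in the text.

```python
# Minimal DBSCAN sketch over 2-D optical flow values; eps and min_samples
# are assumed values, not taken from the application.
def dbscan(points, eps=0.5, min_samples=5):
    """Return one cluster label per point; -1 marks noise pixels."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1          # provisional noise
            continue
        cluster += 1                # new core point starts a cluster
        labels[i] = cluster
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:     # noise reached from a core: border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:
                seeds.extend(nb_j)  # expand only from core points
    return labels

# Two coherent motion groups (flows near (1,1) and near (5,0)) plus one stray.
flows = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (1.1, 1.1),
         (5.0, 0.0), (5.1, 0.1), (4.9, 0.0), (5.0, 0.2), (5.1, -0.1),
         (20.0, 20.0)]
labels = dbscan(flows, eps=0.5, min_samples=3)
total_categories = len({l for l in labels if l != -1})
print(total_categories)  # 2
```

Pixels with coherent motion land in the same cluster, the stray flow value is flagged as noise, and the number of remaining clusters gives the frame's total category count Ct.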
其次,本申请实施例提供了一种对目标掩膜区域内的像素点进行聚类的方式。通过上述方式,可采用DBSCAN算法对像素点进行聚类,一方面能够实现自适应聚类,无需提前设定类别数量。另一方面,DBSCAN算法能够较好地判断离群点,且能发现任意形状的聚类簇。Secondly, the embodiment of the present application provides a way to cluster pixels in the target mask area. Through the above method, the DBSCAN algorithm can be used to cluster pixels. On the one hand, adaptive clustering can be achieved without setting the number of categories in advance. On the other hand, the DBSCAN algorithm can better judge outliers and can find clusters of arbitrary shapes.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据每个目标掩膜帧的光流聚类结果,确定光流质量分值,具体可以包括:Optionally, based on the embodiment corresponding to Figure 4 above, in another optional embodiment provided by the embodiments of this application, determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
根据每个目标掩膜帧的光流聚类结果,确定每个目标掩膜帧的总类别数量;Based on the optical flow clustering results of each target mask frame, determine the total number of categories for each target mask frame;
统计总类别数量小于或等于类别数量阈值的目标掩膜帧数;Count the number of target mask frames in which the total number of categories is less than or equal to the category number threshold;
根据目标掩膜帧数与K值之间的比值,确定类别单一比例; According to the ratio between the number of target mask frames and the K value, determine the single proportion of the category;
若类别单一比例大于比例阈值,则确定光流质量分值为第一分值;If the single proportion of a category is greater than the proportion threshold, the optical flow quality score is determined to be the first score;
若类别单一比例小于或等于比例阈值,则确定光流质量分值为第二分值。If the category single ratio is less than or equal to the ratio threshold, the optical flow quality score is determined to be the second score.
在一个或多个实施例中,提供了一种基于类别单一比例(clean rate,CR)确定光流质量分值的方式。由前述实施例可知,每个目标掩膜帧的光流聚类结果包括经过聚类后各个像素点对应的类别标签。剔除类别标签为“0”的像素点,可以得到目标掩膜帧对应的总类别数量。In one or more embodiments, a method of determining the optical flow quality score based on a category single ratio (clean rate, CR) is provided. As can be seen from the foregoing embodiments, the optical flow clustering result of each target mask frame includes the category label corresponding to each pixel after clustering. By eliminating pixels with a category label of "0", the total number of categories corresponding to the target mask frame can be obtained.
具体地,对于光流聚类结果而言,可采用如下方式计算类别单一比例:Specifically, for the optical flow clustering results, the single-category ratio can be calculated as follows:

CR = (1/K) × Σ_{t=1,…,K} 𝟙(Ct)

其中,CR表示类别单一比例。t表示目标掩膜帧的帧号,K表示目标掩膜帧的总帧数。c表示类别标签,Ct表示第t个目标掩膜帧的总类别数量。i表示像素编号,Nct表示第t个目标掩膜帧对应第c个类别标签的像素个数。𝟙(·)表示示性函数,输入为1则返回1,否则返回0。Among them, CR represents the single-category ratio, t represents the frame number of the target mask frame, and K represents the total number of target mask frames. c represents the category label, and Ct represents the total number of categories in the t-th target mask frame. i represents the pixel number, and Nct represents the number of pixels corresponding to the c-th category label in the t-th target mask frame. 𝟙(·) is an indicator function that returns 1 if its input is 1 and 0 otherwise.
基于此,可统计出K个目标掩膜帧中,总类别数量小于或等于类别数量阈值(例如,1)的帧数占比,即,得到类别单一比例。Based on this, the proportion of frames among the K target mask frames in which the total number of categories is less than or equal to the category number threshold (for example, 1) can be calculated, that is, a single category ratio is obtained.
结合类别单一比例,可定义光流质量的判别标准为:Combined with the single-category ratio, the criterion for judging optical flow quality can be defined as:

Q = 1, CR > CRthreshold; Q = 0, CR ≤ CRthreshold
其中,Q表示光流质量分值。CR表示类别单一比例。CRthreshold表示比例阈值,示例性地,比例阈值可设置为0.8,或者其他合理取值,此处不做限定。Among them, Q represents the optical flow quality score. CR represents category single ratio. CR threshold represents a proportional threshold. For example, the proportional threshold can be set to 0.8, or other reasonable values, which are not limited here.
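The clean-rate criterion can be sketched as below. Function names and the per-frame category counts are illustrative; the category-number threshold 1 and ratio threshold 0.8 follow the examples in the text.

```python
# Illustrative clean rate (CR): the fraction of the K target mask frames
# whose total category count Ct is at most the category-number threshold.
def clean_rate(category_counts, category_threshold=1):
    k = len(category_counts)
    return sum(1 for c_t in category_counts if c_t <= category_threshold) / k

def flow_quality_by_cr(category_counts, cr_threshold=0.8):
    # First score (1) when CR exceeds the ratio threshold, else second score (0).
    return 1 if clean_rate(category_counts) > cr_threshold else 0

counts = [1, 1, 1, 1, 2, 1, 1, 1, 1, 1]   # Ct for K = 10 frames
print(clean_rate(counts))          # 0.9
print(flow_quality_by_cr(counts))  # 1
```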
其次,本申请实施例提供了一种基于类别单一比例确定光流质量分值的方式。上述方式考虑到类别单一比例越大,表示总类别数量越少,视频光流越稳定。因此,利用类别单一比例过滤掉光流受到干扰的视频,由此作为判定光流质量的依据,从而提升方案的可行性和可操作性。Secondly, the embodiment of the present application provides a way to determine the optical flow quality score based on a single proportion of categories. The above method takes into account that the larger the proportion of a single category, the smaller the total number of categories and the more stable the video optical flow is. Therefore, a single ratio of categories is used to filter out videos with disturbed optical flow, which is used as a basis for judging optical flow quality, thus improving the feasibility and operability of the solution.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据每个目标掩膜帧的光流聚类结果,确定光流质量分值,具体可以包括:Optionally, based on the embodiment corresponding to Figure 4 above, in another optional embodiment provided by the embodiments of this application, determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
针对每个目标掩膜帧的光流聚类结果,根据其中每个聚类簇中各个像素点的二维光流值,确定每个聚类簇的移动平均值,其中,光流聚类结果用于表征一个或多个聚类簇;For the optical flow clustering result of each target mask frame, the moving average of each cluster is determined based on the two-dimensional optical flow value of each pixel in each cluster, where the optical flow clustering result Used to characterize one or more clusters;
针对每个目标掩膜帧的光流聚类结果,根据其中每个聚类簇的移动平均值,确定目标掩膜帧的移动平均值;For the optical flow clustering result of each target mask frame, determine the moving average of the target mask frame based on the moving average of each cluster;
对每个目标掩膜帧的移动平均值进行累加处理,得到移动总距离;The moving average of each target mask frame is accumulated to obtain the total moving distance;
若移动总距离大于或等于距离阈值,则确定光流质量分值为第一分值;If the total distance moved is greater than or equal to the distance threshold, the optical flow quality score is determined to be the first score;
若移动总距离小于距离阈值,则确定光流质量分值为第二分值。If the total distance moved is less than the distance threshold, the optical flow quality score is determined to be the second score.
在一个或多个实施例中,介绍了一种基于移动总距离确定光流质量分值 的方式。由前述实施例可知,每个目标掩膜帧的光流聚类结果包括经过聚类后各个像素点对应的类别标签。剔除类别标签为“0”的像素点,可以得到目标掩膜帧对应的总类别数量。In one or more embodiments, a method for determining the optical flow quality score based on the total distance moved is introduced. The way. As can be seen from the foregoing embodiments, the optical flow clustering result of each target mask frame includes the category label corresponding to each pixel after clustering. By eliminating pixels with a category label of "0", the total number of categories corresponding to the target mask frame can be obtained.
具体地,对于光流聚类结果而言,可采用如下方式计算K个目标掩膜帧累计得到的移动总距离:Specifically, for the optical flow clustering results, the total moving distance accumulated over the K target mask frames can be calculated as follows:

D = Σ_{t=1,…,K} Dt
其中,D表示移动总距离。Dt表示第t个目标掩膜帧的移动平均值。t表示目标掩膜帧的帧号,K表示目标掩膜帧的总帧数。Among them, D represents the total distance moved. D t represents the moving average of the t-th target mask frame. t represents the frame number of the target mask frame, and K represents the total number of target mask frames.
可采用如下方式计算目标掩膜帧的移动平均值:The moving average of a target mask frame can be calculated as follows:

Dt = (1/Ct) × Σ_{c=1,…,Ct} Dtc
其中,Dt表示第t个目标掩膜帧的移动平均值。Dtc表示第t个目标掩膜帧中第c个聚类簇的移动平均值。c表示类别标签,Ct表示第t个目标掩膜帧的总类别数量。Among them, D t represents the moving average of the t-th target mask frame. D tc represents the moving average of the c-th cluster in the t-th target mask frame. c represents the category label, and C t represents the total number of categories in the t-th target mask frame.
可采用如下方式计算聚类簇的移动平均值:The moving average of a cluster can be calculated as follows:

Dtc = (1/Nct) × Σ_{i=1,…,Nct} ||Flt(c,i)||

其中,Dtc表示第t个目标掩膜帧中第c个聚类簇的移动平均值。Flt(c,i)表示第t个目标掩膜帧中第c个聚类簇内第i个像素点的二维光流值。i表示像素编号,Nct表示第t个目标掩膜帧对应第c个类别标签的像素个数。||·||表示欧式距离。Among them, Dtc represents the moving average of the c-th cluster in the t-th target mask frame. Flt(c,i) represents the two-dimensional optical flow value of the i-th pixel in the c-th cluster of the t-th target mask frame. i represents the pixel number, and Nct represents the number of pixels corresponding to the c-th category label in the t-th target mask frame. ||·|| represents the Euclidean distance.
基于此,可统计出K个目标掩膜帧的移动总距离。结合移动总距离,可定义光流质量的判别标准为:Based on this, the total moving distance of the K target mask frames can be calculated. Combined with the total moving distance, the criterion for judging optical flow quality can be defined as:

Q = 1, D ≥ Dthreshold; Q = 0, D < Dthreshold
其中,Q表示光流质量分值。D表示移动总距离。Dthreshold表示距离阈值,示例性地,距离阈值可设置为4,或者其他合理取值,此处不做限定。Among them, Q represents the optical flow quality score. D represents the total distance moved. D threshold represents a distance threshold. For example, the distance threshold can be set to 4, or other reasonable values, which are not limited here.
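The distance computation can be sketched as below. This is an illustrative Python sketch under an assumed aggregation (cluster averages are averaged per frame, then summed over the K frames, matching the symbol descriptions above); the distance threshold 4 follows the example in the text.

```python
import math

# Illustrative helpers; the cluster/frame structures are assumed for this sketch.
def cluster_moving_avg(cluster_flows):
    # Dtc: mean Euclidean norm of the 2-D flow values in one cluster.
    return sum(math.hypot(w, h) for (w, h) in cluster_flows) / len(cluster_flows)

def frame_moving_avg(clusters):
    # Dt: average of the cluster moving averages of one frame.
    return sum(cluster_moving_avg(c) for c in clusters) / len(clusters)

def total_distance(frames):
    # D: accumulated over the K target mask frames.
    return sum(frame_moving_avg(f) for f in frames)

# K = 2 frames, each with a single cluster of 2-D flow values.
frames = [
    [[(3.0, 4.0), (3.0, 4.0)]],   # D1 = 5.0
    [[(0.0, 2.0), (0.0, 4.0)]],   # D2 = 3.0
]
d = total_distance(frames)
print(d)                    # 8.0
print(1 if d >= 4 else 0)   # 1: D >= Dthreshold (4), first score
```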
其次,本申请实施例提供了一种基于移动总距离确定光流质量分值的方式。上述方式考虑到移动总距离越大,表示帧运动越明显,有利于进行光流估计。因此,利用移动总距离过滤掉较为静止的视频,由此作为判定光流质量的依据,从而提升方案的可行性和可操作性。Secondly, the embodiment of the present application provides a way to determine the optical flow quality score based on the total distance moved. The above method takes into account that the larger the total distance moved, the more obvious the frame motion is, which is beneficial to optical flow estimation. Therefore, the total moving distance is used to filter out relatively stationary videos, which is used as a basis for determining the optical flow quality, thereby improving the feasibility and operability of the solution.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,根据每个目标掩膜帧的光流聚类结果,确定光流质量分值,具体可以包括:Optionally, based on the embodiment corresponding to Figure 4 above, in another optional embodiment provided by the embodiments of this application, determining the optical flow quality score according to the optical flow clustering result of each target mask frame may specifically include:
根据每个目标掩膜帧的光流聚类结果,确定每个目标掩膜帧的总类别数量; Based on the optical flow clustering results of each target mask frame, determine the total number of categories for each target mask frame;
统计总类别数量小于或等于类别数量阈值的目标掩膜帧数;Count the number of target mask frames in which the total number of categories is less than or equal to the category number threshold;
根据目标掩膜帧数与K值之间的比值,确定类别单一比例;According to the ratio between the number of target mask frames and the K value, determine the single proportion of the category;
针对每个目标掩膜帧的光流聚类结果,根据其中每个聚类簇中各个像素点的二维光流值,确定每个聚类簇的移动平均值,其中,光流聚类结果用于表征一个或多个聚类簇;For the optical flow clustering result of each target mask frame, the moving average of each cluster is determined based on the two-dimensional optical flow value of each pixel in each cluster, where the optical flow clustering result Used to characterize one or more clusters;
针对每个目标掩膜帧的光流聚类结果,根据其中每个聚类簇的移动平均值,确定该目标掩膜帧的移动平均值;For the optical flow clustering result of each target mask frame, determine the moving average of the target mask frame based on the moving average of each cluster cluster;
对每个目标掩膜帧的移动平均值进行累加处理,得到移动总距离;The moving average of each target mask frame is accumulated to obtain the total moving distance;
若类别单一比例大于比例阈值,且,移动总距离大于或等于距离阈值,则确定光流质量分值为第一分值;If the single proportion of a category is greater than the proportion threshold, and the total distance moved is greater than or equal to the distance threshold, then the optical flow quality score is determined to be the first score;
若类别单一比例小于或等于比例阈值,且,移动总距离小于距离阈值,则确定光流质量分值为第二分值。If the single proportion of a category is less than or equal to the proportion threshold, and the total distance moved is less than the distance threshold, then the optical flow quality score is determined to be the second score.
在一个或多个实施例中,介绍了一种基于类别单一比例以及移动总距离,共同确定光流质量分值的方式。由前述实施例可知,一方面可统计K个目标掩膜帧中,总类别数量小于或等于类别数量阈值(例如,1)的帧数占比,即,得到类别单一比例。另一方面可统计出K个目标掩膜帧的移动总距离。可以理解的是,类别单一比例和移动总距离的确定方式可参阅前述实施例,此处不做赘述。In one or more embodiments, a method of jointly determining the optical flow quality score based on a single proportion of a category and the total distance moved is introduced. As can be seen from the foregoing embodiments, on the one hand, the proportion of frames in which the total number of categories is less than or equal to the category number threshold (for example, 1) among the K target mask frames can be counted, that is, a single category proportion is obtained. On the other hand, the total moving distance of K target mask frames can be calculated. It can be understood that the method for determining the single proportion of a category and the total distance of movement may refer to the foregoing embodiments, and will not be described in detail here.
具体地,结合类别单一比例和移动总距离,可定义光流质量的判别标准为:Specifically, combining the single-category ratio and the total moving distance, the criterion for judging optical flow quality can be defined as:

Q = 1, CR > CRthreshold and D ≥ Dthreshold; Q = 0, CR ≤ CRthreshold and D < Dthreshold
其中,Q表示光流质量分值。D表示移动总距离。Dthreshold表示距离阈值,示例性地,距离阈值可设置为4,或者其他合理取值,此处不做限定。CR表示类别单一比例。CRthreshold表示比例阈值,示例性地,比例阈值可设置为0.8,或者其他合理取值,此处不做限定。Among them, Q represents the optical flow quality score. D represents the total distance moved. D threshold represents a distance threshold. For example, the distance threshold can be set to 4, or other reasonable values, which are not limited here. CR represents category single ratio. CR threshold represents a proportional threshold. For example, the proportional threshold can be set to 0.8, or other reasonable values, which are not limited here.
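A sketch of the combined criterion follows. The text defines only the two aligned cases; this illustration assumes that any mixed case (e.g., CR above its threshold while D is below its threshold) also maps to the second score, which is an assumption rather than something the application states.

```python
# Illustrative combined criterion over the clean rate CR and total distance D;
# thresholds 0.8 and 4 follow the examples in the text.
def flow_quality(cr, d, cr_threshold=0.8, d_threshold=4.0):
    return 1 if (cr > cr_threshold and d >= d_threshold) else 0

print(flow_quality(0.9, 8.0))   # 1: stable flow with clear motion
print(flow_quality(0.5, 2.0))   # 0: disturbed or near-static flow
```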
其次,本申请实施例提供了一种基于类别单一比例以及移动总距离,共同确定光流质量分值的方式。通过上述方式,一方面,利用类别单一比例,能够过滤掉光流受到干扰的视频,另一方面,利用移动总距离,能够过滤掉较为静止的视频。由此,两者相结合可以更全面准确地反映光流质量,从而提升光流质量分值的可靠性。Secondly, the embodiment of the present application provides a way to jointly determine the optical flow quality score based on a single proportion of categories and the total distance moved. Through the above method, on the one hand, the single ratio of the category can be used to filter out the video whose optical flow is disturbed. On the other hand, the total distance of movement can be used to filter out the relatively static video. Therefore, the combination of the two can reflect the optical flow quality more comprehensively and accurately, thus improving the reliability of the optical flow quality score.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,采用与光流质量分值匹配的视频修复方式,对待修复视频进行修复处理,具体可以包括:Optionally, based on the embodiment corresponding to Figure 4 above, in another optional embodiment provided by the embodiments of this application, performing repair processing on the video to be repaired using a video repair method matching the optical flow quality score may specifically include:
若光流质量分值为第一分值,则采用光流法对待修复视频进行修复处理。If the optical flow quality score is the first score, the optical flow method is used to repair the video to be repaired.
若光流质量分值为第二分值,则调用神经网络对待修复视频进行修复处理。 If the optical flow quality score is the second score, the neural network is called to repair the video to be repaired.
在一个或多个实施例中,介绍了一种基于光流质量分值实现视频修复的方法。由前述实施例可知,光流质量分值可以为第一分值或第二分值,下面将以第一分值为“1”,第二分值为“0”作为示例进行介绍。In one or more embodiments, a method for video repair based on optical flow quality scores is introduced. As can be seen from the foregoing embodiments, the optical flow quality score can be a first score or a second score. The following uses the first score as "1" and the second score as "0" as an example for introduction.
具体地,可采用如下方式选择视频修复方式:Specifically, the video repair method can be selected as follows:

y = F1(x, m), Q = 1; y = F2(x, m), Q = 0
其中,F1(x,m)表示采用光流法进行视频修复处理。F2(x,m)表示调用神经网络进行视频修复处理。Q表示光流质量分值。Among them, F 1 (x, m) indicates that the optical flow method is used for video repair processing. F 2 (x,m) means calling the neural network for video repair processing. Q represents the optical flow quality score.
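The selection rule can be sketched as a simple dispatch; F1 and F2 below are placeholder callables standing in for the optical-flow and neural-network repair methods respectively, not the application's actual implementations.

```python
# Illustrative dispatch on the optical flow quality score Q.
def repair_video(x, m, q, f1, f2):
    return f1(x, m) if q == 1 else f2(x, m)

f1 = lambda x, m: "flow-based result"    # placeholder for F1(x, m)
f2 = lambda x, m: "model-based result"   # placeholder for F2(x, m)
print(repair_video("video", "mask", 1, f1, f2))  # flow-based result
print(repair_video("video", "mask", 0, f1, f2))  # model-based result
```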
需要说明的是,本申请的目标是求解视频序列y={yt}(t=0,1,2,…,T)。该视频序列仅在原始掩膜区域与待修复视频相异,并且使得视频序列在时间和空间上是自然且一致的。由于自然和一致难以进行公式化定义,因此,在进行神经网络训练时,希望填充完成的视频序列与真实视频序列ygt接近。其中,ygt表示不带原始掩膜区域的视频序列真值。基于此,通过构建算法F,视频序列y的求解可以被定义为y=F(x,m)。It should be noted that the goal of this application is to solve the video sequence y={y t } (t=0,1,2,...,T). The video sequence differs from the video to be repaired only in the original mask area, and makes the video sequence natural and consistent in time and space. Since naturalness and consistency are difficult to define in a formula, when training a neural network, it is hoped that the filled video sequence is close to the real video sequence y gt . Among them, y gt represents the true value of the video sequence without the original mask area. Based on this, by constructing algorithm F, the solution of video sequence y can be defined as y=F(x,m).
其次,本申请实施例提供了一种基于光流质量分值实现视频修复的方法。通过上述方式,在进行视频修复前,如果判断光流质量良好,则直接使用光流法即可得到清晰可靠的填充内容。如果光流不可靠,那么采用模型法填充内容,从而规避光流估计失准带来的错误填充,得到整体更加稳定的填充效果。Secondly, embodiments of this application provide a method for video repair based on optical flow quality scores. Through the above method, before video repair, if the optical flow quality is judged to be good, clear and reliable filling content can be obtained directly by using the optical flow method. If the optical flow is unreliable, the model method is used to fill the content, thereby avoiding erroneous filling caused by inaccurate optical flow estimation and obtaining an overall more stable filling effect.
可选地,在上述图4对应的实施例的基础上,本申请实施例提供的另一个可选实施例中,还可以包括:Optionally, based on the above-mentioned embodiment corresponding to Figure 4, another optional embodiment provided by the embodiment of this application may also include:
显示待修复视频以及修复对象列表,其中,修复对象列表包括至少一个可修复对象;Display the video to be repaired and a list of repairable objects, where the list of repairable objects includes at least one repairable object;
响应针对于目标对象的选择操作,执行获取针对待修复视频的视频样本序列的步骤,其中,目标对象属于至少一个可修复对象;In response to the selection operation on the target object, perform the step of obtaining a video sample sequence for the video to be repaired, wherein the target object belongs to at least one repairable object;
采用与光流质量分值匹配的视频修复方式,对待修复视频进行修复处理之后,还可以包括:Using a video repair method that matches the optical flow quality score, after repairing the video to be repaired, it can also include:
响应针对已修复视频的播放操作,播放已修复视频。Plays the repaired video in response to a playback operation on the repaired video.
在一个或多个实施例中,介绍了一种智能修复视频的方式。由前述实施例可知,本申请可应用于各类视频修复任务,例如,移除标志,移除字幕,移除物体等。如果用户希望使用某些平台的视频,但由于视频带有该平台的标志,影响观感,那么可以使用视频修复应用进行台标去除。类似地,用户可以将字幕从一些视频中抹除,或者,将某些运动物体从视频中移除。下面将结合图示分别进行介绍。In one or more embodiments, a method of intelligently repairing videos is introduced. As can be seen from the foregoing embodiments, the present application can be applied to various video repair tasks, such as removing logos, removing subtitles, removing objects, etc. If the user wants to use videos from certain platforms, but the videos carry the logo of that platform, which affects the look and feel, they can use a video repair application to remove the logo. Similarly, users can erase subtitles from some videos, or remove certain moving objects from videos. Each of these cases is introduced below with reference to the figures.
示例性地,请参阅图11,图11为本申请实施例中基于视频修复应用移除标志的一个效果示意图,如图所示,在视频修复应用提供的界面上显示待修复视频以及修复对象列表,其中,修复对象列表显示有至少一个可修复对象 (例如,标志,字幕,船,云朵等)。假设用户选择“标志”对应的“一键去除”控件,由此,触发针对目标对象(即,标志)的选择操作。于是,响应该选择操作,调用视频修复功能。基于此,采用合适的视频修复方式对视频进行修复,以此得到已修复视频,已修复视频中不存在标志。当用户触发针对已修复视频的播放操作时,可播放已修复视频。Exemplarily, please refer to Figure 11. Figure 11 is a schematic diagram of the effect of removing a flag based on a video repair application in an embodiment of the present application. As shown in the figure, the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application. , where the list of repairable objects shows that there is at least one repairable object (e.g. logo, subtitles, boats, clouds, etc.). Assume that the user selects the "one-click removal" control corresponding to the "logo", thereby triggering a selection operation on the target object (ie, the logo). Then, in response to the selection operation, the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video. There is no mark in the repaired video. The repaired video can be played when the user triggers the playback action on the repaired video.
示例性地,请参阅图12,图12为本申请实施例中基于视频修复应用移除字幕的一个效果示意图,如图所示,在视频修复应用提供的界面上显示待修复视频以及修复对象列表,其中,修复对象列表显示有至少一个可修复对象(例如,标志,字幕,船,云朵等)。假设用户选择“字幕”对应的“一键去除”控件,由此,触发针对目标对象(即,字幕)的选择操作。于是,响应该选择操作,调用视频修复功能。基于此,采用合适的视频修复方式对视频进行修复,以此得到已修复视频,已修复视频中不存在字幕。当用户触发针对已修复视频的播放指令时,可播放已修复视频。Exemplarily, please refer to Figure 12. Figure 12 is a schematic diagram of the effect of removing subtitles based on a video repair application in an embodiment of the present application. As shown in the figure, the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application. , wherein the repair object list shows that there is at least one repairable object (for example, a sign, a subtitle, a boat, a cloud, etc.). Assume that the user selects the "one-click removal" control corresponding to "subtitles", thereby triggering a selection operation on the target object (ie, subtitles). Then, in response to the selection operation, the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video. There are no subtitles in the repaired video. The repaired video can be played when the user triggers the play command for the repaired video.
示例性地,请参阅图13,图13为本申请实施例中基于视频修复应用移除物体的一个效果示意图,如图所示,在视频修复应用提供的界面上显示待修复视频以及修复对象列表,其中,修复对象列表显示有至少一个可修复对象(例如,标志,字幕,船,云朵等)。假设用户选择“船”对应的“一键去除”控件,由此,触发针对目标对象(即,船)的选择操作。于是,响应该选择操作,调用视频修复功能。基于此,采用合适的视频修复方式对视频进行修复,以此得到已修复视频,已修复视频中不存在物体“船”。当用户触发针对已修复视频的播放指令时,可播放已修复视频。Exemplarily, please refer to Figure 13. Figure 13 is a schematic diagram of the effect of removing objects based on a video repair application in an embodiment of the present application. As shown in the figure, the video to be repaired and a list of repair objects are displayed on the interface provided by the video repair application. , wherein the repair object list shows that there is at least one repairable object (for example, a sign, a subtitle, a boat, a cloud, etc.). Assume that the user selects the "one-click removal" control corresponding to "boat", thereby triggering a selection operation on the target object (ie, boat). Then, in response to the selection operation, the video repair function is called. Based on this, a suitable video repair method is used to repair the video to obtain a repaired video. There is no object "ship" in the repaired video. The repaired video can be played when the user triggers the play command for the repaired video.
需要说明的是,图11,图12和图13示出的界面元素,界面排布方式以及界面文案等,均为一个示意,不应理解为对本申请的限定。It should be noted that the interface elements, interface arrangement, interface copy, etc. shown in Figures 11, 12 and 13 are all schematic and should not be understood as limitations of this application.
其次,本申请实施例提供了一种智能修复视频的方式。通过上述方式,用户可借助视频修复应用选择对视频中的一个或多个对象进行修复,达到智能化修复的目的。由此,不仅提升方案的实用性,还能够提升视频修复效率。Secondly, embodiments of this application provide a method of intelligently repairing videos. Through the above method, users can use the video repair application to choose to repair one or more objects in the video to achieve the purpose of intelligent repair. This not only improves the practicality of the solution, but also improves the efficiency of video repair.
可见,本申请能够准确并高效地判断视频片段中光流质量的优劣,进而据此在调用视频修复方式之前选择光流法或者模型法,即采用较优的视频修复方式进行修复,使得修复效果要优于二者单独使用的效果。下面将结合实例介绍基于光流法和模型法实现视频帧修复的效果。请参阅图14,图14为本申请实施例中基于光流法和模型法实现视频帧修复的效果对比示意图,如图所示,一个示例中,图14中(a)图示出的是基于光流法填充的效果,图14中(b)图示出的是基于模型法填充的效果。其中,原始掩膜区域位于视频帧的左下角(即,矩形框圈出的区域),该示例中镜头移动平滑,光流估计良好,因此,本申请选择使用光流法进行填充。另一个示例中,图14中(c)图示出的是基于光流法填充的效果,图14中(d)图示出的是基于模型法填充的效果。其中,原始掩膜区域位于视频帧的左下角(即,矩形框圈出的区域),该示例中由于光流受到人物手表的影响,因此,本申请选择使用模型法进行填充。It can be seen that this application can accurately and efficiently judge the quality of the optical flow in a video clip, and accordingly select either the optical flow method or the model method before invoking the video repair process, i.e., repair with the better-suited method, so that the repair effect is better than using either method alone. The following introduces, with examples, the effect of video frame repair based on the optical flow method and the model method. Please refer to Figure 14, which is a schematic comparison of video frame repair effects based on the optical flow method and the model method in an embodiment of the present application. In one example, (a) in Figure 14 shows the effect of filling based on the optical flow method, and (b) in Figure 14 shows the effect of filling based on the model method. The original mask area is located in the lower left corner of the video frame (i.e., the area enclosed by the rectangular box); in this example the camera moves smoothly and the optical flow is well estimated, so this application chooses the optical flow method for filling. In another example, (c) in Figure 14 shows the effect of filling based on the optical flow method, and (d) in Figure 14 shows the effect of filling based on the model method. The original mask area is again located in the lower left corner of the video frame (i.e., the area enclosed by the rectangular box); since the optical flow in this example is disturbed by the person's watch, this application chooses the model method for filling.
The video repair apparatus in this application is described in detail below. Please refer to Figure 15, which is a schematic diagram of an embodiment of the video repair apparatus in an embodiment of this application. The video repair apparatus 20 includes:
an acquisition module 210, configured to acquire a video sample sequence corresponding to a video to be repaired, where the video sample sequence includes K video frame pairs, each video frame pair includes two adjacent video frames, and K is an integer greater than or equal to 1;
the acquisition module 210 is further configured to acquire a target mask sample sequence according to the video sample sequence, where the target mask sample sequence includes K target mask frames, each target mask frame includes a target mask area obtained by expanding an original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
the acquisition module 210 is further configured to acquire an optical flow data sequence according to the video sample sequence, where the optical flow data sequence includes K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs;
a processing module 220, configured to cluster, based on each piece of optical flow data in the optical flow data sequence, the pixels included in the target mask area of each target mask frame, to obtain an optical flow clustering result of each target mask frame;
a determination module 230, configured to determine an optical flow quality score according to the optical flow clustering result of each target mask frame;
a repair module 240, configured to repair the video to be repaired using a video repair method that matches the optical flow quality score.
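The data flow through the modules above can be sketched in a few lines. This is a non-authoritative outline: the class and method names below mirror the reference numerals 210–240 of Figure 15, but the method bodies are simplified placeholders, not the patented algorithms.

```python
# Sketch of the module layout of video repair apparatus 20 (Figure 15).
# The bodies are placeholder assumptions, not the actual implementation.

class VideoRepairApparatus:
    def acquire_frame_pairs(self, frames):
        # Acquisition module 210: K pairs of adjacent frames.
        return list(zip(frames, frames[1:]))

    def cluster(self, mask_frames, flow_sequence):
        # Processing module 220: placeholder clustering result
        # (total category count per target mask frame).
        return [1 for _ in mask_frames]

    def score(self, clustering_results, threshold=1):
        # Determination module 230: first score (1) when every frame's
        # category count stays within the threshold, else second score (0).
        return 1 if all(c <= threshold for c in clustering_results) else 0

    def repair_method(self, score):
        # Repair module 240: pick the repair method matching the score.
        return "optical flow method" if score == 1 else "model method"

apparatus = VideoRepairApparatus()
pairs = apparatus.acquire_frame_pairs(["f0", "f1", "f2"])  # K = 2 pairs
method = apparatus.repair_method(apparatus.score(apparatus.cluster(pairs, pairs)))
```

With three input frames the sketch produces two adjacent pairs, and the all-single-category placeholder clustering leads to the first score and hence the optical flow method.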
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to acquire a video sequence from the video to be repaired, where the video sequence includes T original video frames, each original video frame displays a target object, and T is an integer greater than 1;
extract K to-be-processed video frame pairs from the video sequence, where each to-be-processed video frame pair includes two adjacent original video frames; and
normalize the sizes of the original video frames in the K to-be-processed video frame pairs to obtain the K video frame pairs, and use the K video frame pairs as the video sample sequence.
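The size-normalization step above can be illustrated as follows. The nearest-neighbour interpolation and the 64×64 target size are assumptions made only for this sketch; the text requires merely that all frames be brought to a common size.

```python
import numpy as np

def normalize_frame(frame, target_h, target_w):
    # Nearest-neighbour resize to a fixed size (assumed interpolation;
    # the embodiment does not specify one).
    h, w = frame.shape[:2]
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    return frame[rows][:, cols]

def normalize_pairs(frame_pairs, target_h, target_w):
    # Normalize both frames of every to-be-processed pair so that all
    # K resulting pairs share one size (the video sample sequence).
    return [
        (normalize_frame(a, target_h, target_w),
         normalize_frame(b, target_h, target_w))
        for a, b in frame_pairs
    ]

pairs = [(np.zeros((120, 160, 3)), np.zeros((120, 160, 3))),
         (np.zeros((90, 100, 3)), np.zeros((90, 100, 3)))]
sample_sequence = normalize_pairs(pairs, 64, 64)  # K = 2 pairs, all 64x64
```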
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, acquire, according to either video frame of the video frame pair, the original mask frame corresponding to the video frame pair, where the original mask frame includes the original mask area obtained by masking the target object in that video frame;
for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair; and
use the target mask frames respectively corresponding to the K video frame pairs as the target mask sample sequence.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair by a first number of pixels to obtain a first mask area corresponding to the video frame pair;
for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair by a second number of pixels to obtain a second mask area corresponding to the video frame pair, where the second number of pixels is greater than the first number of pixels; and
for each video frame pair in the video sample sequence, perform an XOR operation on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
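The two-dilation-plus-XOR construction above yields a ring of pixels around the original mask. A minimal numpy sketch follows; the cross-shaped (4-neighbour) structuring element and the concrete pixel counts are illustrative assumptions, standing in for a library call such as cv2.dilate.

```python
import numpy as np

def dilate(mask, pixels):
    # Binary dilation of a boolean mask with a cross-shaped
    # structuring element, applied `pixels` times.
    out = mask.astype(bool)
    for _ in range(pixels):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        out = grown
    return out

def target_mask(original_mask, first_pixels, second_pixels):
    # Expand by the first pixel count and by the larger second pixel
    # count, then XOR the two results: what remains is a ring around
    # the original mask, i.e. the target mask area.
    assert second_pixels > first_pixels
    return dilate(original_mask, first_pixels) ^ dilate(original_mask, second_pixels)

m = np.zeros((9, 9), dtype=bool)
m[4, 4] = True                  # original mask: a single pixel
ring = target_mask(m, 1, 2)     # 8-pixel ring at distance 2
```

Because every pixel inside the first dilation also belongs to the second, the XOR cancels the interior and keeps only the band between the two dilation radii.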
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, acquire a first original mask frame corresponding to the video frame pair according to the earlier video frame of the pair, and acquire a second original mask frame corresponding to the video frame pair according to the later video frame of the pair, where the first original mask frame and the second original mask frame include the original mask areas obtained by masking the target object in the earlier video frame and in the later video frame, respectively;
for each video frame pair in the video sample sequence, take the union of the first original mask frame and the second original mask frame corresponding to the video frame pair to obtain the original mask frame corresponding to the video frame pair;
for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair; and
use the target mask frames respectively corresponding to the K video frame pairs as the target mask sample sequence.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair by a first number of pixels to obtain a first mask area corresponding to the video frame pair;
for each video frame pair in the video sample sequence, expand the original mask area in the original mask frame corresponding to the video frame pair by a second number of pixels to obtain a second mask area corresponding to the video frame pair, where the second number of pixels is greater than the first number of pixels; and
for each video frame pair in the video sample sequence, perform an XOR operation on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, determine the optical flow data corresponding to the video frame pair according to the horizontal offset and the vertical offset of each pixel in the later video frame of the pair relative to each pixel in the earlier video frame; and
use the optical flow data respectively corresponding to the K video frame pairs as the optical flow data sequence;
or,
the acquisition module 210 is specifically configured to: for each video frame pair in the video sample sequence, determine the optical flow data corresponding to the video frame pair according to the horizontal offset and the vertical offset of each pixel in the earlier video frame of the pair relative to each pixel in the later video frame; and
use the optical flow data respectively corresponding to the K video frame pairs as the optical flow data sequence.
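Either variant above stores, per video frame pair, a horizontal and a vertical offset for every pixel. The following sketch illustrates that H × W × 2 layout using an assumed pure camera translation; a real system would obtain the per-pixel offsets from a flow estimator rather than a constant field.

```python
import numpy as np

def translation_flow(h, w, dx, dy):
    # Optical flow data for a pure global translation: every pixel of
    # the later frame is offset by (dx, dy) relative to the earlier
    # frame. Channel 0 holds the horizontal offset, channel 1 the
    # vertical offset; this only illustrates the data layout.
    flow = np.empty((h, w, 2), dtype=np.float32)
    flow[..., 0] = dx
    flow[..., 1] = dy
    return flow

# One piece of flow data per video frame pair, here K = 3.
flow_sequence = [translation_flow(4, 6, 2.0, -1.0) for _ in range(3)]
```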
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the processing module 220 is specifically configured to: for each target mask frame, determine, according to the optical flow data corresponding to the target mask frame in the optical flow data sequence, the two-dimensional optical flow values of X pixels in the target mask area of the target mask frame, where the optical flow data corresponding to the target mask frame and the target mask frame correspond to the same video frame pair, and X is an integer greater than 1; and
for each target mask frame, cluster the X pixels according to the two-dimensional optical flow values of the X pixels in the target mask area, to obtain the optical flow clustering result of the target mask frame.
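The per-frame clustering step above could look like the following sketch. The quantization-based grouping is a deliberately simple stand-in, since the text does not fix a clustering algorithm; k-means over the same (u, v) vectors would serve equally well.

```python
import numpy as np

def cluster_masked_flow(flow, mask, decimals=0):
    # Gather the 2-D flow values of the X pixels in the target mask
    # area and group them: vectors that quantize to the same value
    # form one cluster. Returns the per-pixel labels and the total
    # number of categories.
    ys, xs = np.nonzero(mask)
    vectors = flow[ys, xs]                       # X x 2 flow values
    keys = np.round(vectors, decimals)
    uniques = np.unique(keys, axis=0)
    label_of = {tuple(u): i for i, u in enumerate(uniques)}
    labels = np.array([label_of[tuple(k)] for k in keys])
    return labels, len(uniques)

mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                            # X = 4 masked pixels
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[1, 1] = (5.0, 0.0)                          # one outlier flow vector
_, categories = cluster_masked_flow(flow, mask)  # two clusters
_, uniform_categories = cluster_masked_flow(np.zeros((4, 4, 2), np.float32), mask)
```

A uniform flow field over the mask yields a single category, which the scoring step below treats as evidence of well-behaved optical flow.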
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the determination module 230 is specifically configured to: determine the total number of categories of each target mask frame according to the optical flow clustering result of each target mask frame;
count the number of target mask frames whose total number of categories is less than or equal to a category number threshold;
determine a single-category ratio according to the ratio between the counted number of target mask frames and K;
if the single-category ratio is greater than a ratio threshold, determine the optical flow quality score to be a first score; and
if the single-category ratio is less than or equal to the ratio threshold, determine the optical flow quality score to be a second score.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the determination module 230 is specifically configured to: for the optical flow clustering result of each target mask frame, determine the moving average of each cluster according to the two-dimensional optical flow values of the pixels in that cluster, where the optical flow clustering result characterizes one or more clusters;
for the optical flow clustering result of each target mask frame, determine the moving average of the target mask frame according to the moving average of each cluster;
accumulate the moving averages of the target mask frames to obtain a total moving distance;
if the total moving distance is greater than or equal to a distance threshold, determine the optical flow quality score to be a first score; and
if the total moving distance is less than the distance threshold, determine the optical flow quality score to be a second score.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the determination module 230 is specifically configured to: determine the total number of categories of each target mask frame according to the optical flow clustering result of each target mask frame;
count the number of target mask frames whose total number of categories is less than or equal to a category number threshold;
determine a single-category ratio according to the ratio between the counted number of target mask frames and K;
for the optical flow clustering result of each target mask frame, determine the moving average of each cluster according to the two-dimensional optical flow values of the pixels in that cluster, where the optical flow clustering result characterizes one or more clusters;
for the optical flow clustering result of each target mask frame, determine the moving average of the target mask frame according to the moving average of each cluster;
accumulate the moving averages of the target mask frames to obtain a total moving distance;
if the single-category ratio is greater than a ratio threshold and the total moving distance is greater than or equal to a distance threshold, determine the optical flow quality score to be a first score; and
if the single-category ratio is less than or equal to the ratio threshold, or the total moving distance is less than the distance threshold, determine the optical flow quality score to be a second score.
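The combined criterion above (single-category ratio AND total moving distance) can be sketched as follows. The concrete threshold values are illustrative assumptions; the embodiments leave them open.

```python
def optical_flow_quality_score(category_counts, frame_move_averages, k,
                               category_threshold=1, ratio_threshold=0.8,
                               distance_threshold=5.0):
    # Combined criterion: first score (1) only when most frames have a
    # single flow category AND the accumulated movement is large enough;
    # otherwise second score (0). Threshold values are assumed.
    single_frames = sum(1 for c in category_counts if c <= category_threshold)
    single_ratio = single_frames / k           # single-category ratio
    total_distance = sum(frame_move_averages)  # accumulated moving averages
    if single_ratio > ratio_threshold and total_distance >= distance_threshold:
        return 1                               # -> optical flow method
    return 0                                   # -> model method

good = optical_flow_quality_score([1, 1, 1], [2.0, 2.0, 2.0], k=3)
bad = optical_flow_quality_score([1, 3, 4], [0.5, 0.5, 0.5], k=3)
```

The first call models smooth camera motion (one cluster per frame, sizable movement) and yields the first score; the second call models fragmented flow and yields the second score.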
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application,
the repair module 240 is specifically configured to: if the optical flow quality score is the first score, repair the video to be repaired using the optical flow method; and
if the optical flow quality score is the second score, invoke a neural network to repair the video to be repaired.
Optionally, based on the embodiment corresponding to Figure 15 above, in another embodiment of the video repair apparatus 20 provided by this embodiment of the application, the video repair apparatus 20 further includes a display module 250;
the display module 250 is configured to display the video to be repaired and a repair object list, where the repair object list includes at least one repairable object;
the acquisition module 210 is further configured to perform, in response to a selection operation on a target object, the step of acquiring the video sample sequence corresponding to the video to be repaired, where the target object belongs to the at least one repairable object; and
the display module 250 is further configured to, after the video to be repaired has been repaired using the video repair method that matches the optical flow quality score, play the repaired video in response to a playback operation on the repaired video.
An embodiment of this application further provides a terminal, as shown in Figure 16. For ease of description, only the parts related to this embodiment of the application are shown; for specific technical details not disclosed, please refer to the method part of the embodiments of this application. In this embodiment, the terminal being a mobile phone is taken as an example:
Figure 16 is a block diagram of a partial structure of a mobile phone related to the terminal provided by an embodiment of this application. Referring to Figure 16, the mobile phone includes components such as a radio frequency (RF) circuit 310, a memory 320, an input unit 330 (which includes a touch panel 331 and other input devices 332), a display unit 340 (which includes a display panel 341), a sensor 350, an audio circuit 360 (to which a speaker 361 and a microphone 362 are connected), a wireless fidelity (WiFi) module 370, a processor 380, and a power supply 390. Those skilled in the art can understand that the mobile phone structure shown in Figure 16 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 320 may be configured to store software programs and modules, and the processor 380 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 320. The memory 320 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 320 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The processor 380 is the control center of the mobile phone; it connects all parts of the entire mobile phone by using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 320 and invoking the data stored in the memory 320. Optionally, the processor 380 may include one or more processing units; optionally, the processor 380 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 380.
The steps performed by the terminal in the foregoing embodiments may be based on the terminal structure shown in Figure 16.
Figure 17 is a schematic structural diagram of a server provided by an embodiment of this application. The server 400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) storing application programs 442 or data 444. The memory 432 and the storage media 430 may provide transient storage or persistent storage. A program stored in a storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Furthermore, the central processing unit 422 may be configured to communicate with the storage medium 430 and execute, on the server 400, the series of instruction operations in the storage medium 430.
The server 400 may further include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, and/or one or more operating systems 441, for example, Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps performed by the server in the foregoing embodiments may be based on the server structure shown in Figure 17.
An embodiment of this application further provides a computer device, including a memory and a processor. The memory stores a computer program, and when executing the computer program, the processor implements the steps of the methods described in the foregoing embodiments.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps of the methods described in the foregoing embodiments are implemented.
An embodiment of this application further provides a computer program product, including a computer program. When the computer program is executed by a processor, the steps of the methods described in the foregoing embodiments are implemented.
It can be understood that the specific implementations of this application involve data related to user information. When the foregoing embodiments of this application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use, and processing of the related data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features thereof may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (17)

  1. A video repair method, performed by a computer device, comprising:
    acquiring a video sample sequence corresponding to a video to be repaired, wherein the video sample sequence comprises K video frame pairs, each video frame pair comprises two adjacent video frames, and K is an integer greater than or equal to 1;
    acquiring a target mask sample sequence according to the video sample sequence, wherein the target mask sample sequence comprises K target mask frames, each target mask frame comprises a target mask area obtained by expanding an original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
    acquiring an optical flow data sequence according to the video sample sequence, wherein the optical flow data sequence comprises K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs;
    clustering, based on each piece of optical flow data in the optical flow data sequence, pixels comprised in the target mask area of each target mask frame, to obtain an optical flow clustering result of each target mask frame;
    determining an optical flow quality score according to the optical flow clustering result of each target mask frame; and
    repairing the video to be repaired using a video repair method that matches the optical flow quality score.
  2. The method according to claim 1, wherein the acquiring a video sample sequence corresponding to a video to be repaired comprises:
    acquiring a video sequence from the video to be repaired, wherein the video sequence comprises T original video frames, each original video frame displays a target object, and T is an integer greater than 1;
    extracting K to-be-processed video frame pairs from the video sequence, wherein each to-be-processed video frame pair comprises two adjacent original video frames; and
    normalizing the sizes of the original video frames in the K to-be-processed video frame pairs to obtain the K video frame pairs, and using the K video frame pairs as the video sample sequence.
  3. The method according to claim 1, wherein the acquiring a target mask sample sequence according to the video sample sequence comprises:
    for each video frame pair in the video sample sequence, acquiring, according to either video frame of the video frame pair, an original mask frame corresponding to the video frame pair, wherein the original mask frame comprises an original mask area obtained by masking a target object in that video frame;
    for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video frame pair to obtain a target mask frame corresponding to the video frame pair; and
    using the target mask frames respectively corresponding to the K video frame pairs as the target mask sample sequence.
  4. The method according to claim 3, wherein the expanding, for each video frame pair in the video sample sequence, the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair comprises:
    for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video frame pair by a first number of pixels to obtain a first mask area corresponding to the video frame pair;
    for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video frame pair by a second number of pixels to obtain a second mask area corresponding to the video frame pair, wherein the second number of pixels is greater than the first number of pixels; and
    for each video frame pair in the video sample sequence, performing an XOR operation on the first mask area and the second mask area corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair.
  5. The method according to claim 1, wherein the acquiring a target mask sample sequence according to the video sample sequence comprises:
    for each video frame pair in the video sample sequence, acquiring a first original mask frame corresponding to the video frame pair according to the earlier video frame of the video frame pair, and acquiring a second original mask frame corresponding to the video frame pair according to the later video frame of the video frame pair, wherein the first original mask frame and the second original mask frame comprise the original mask areas obtained by masking the target object in the earlier video frame and in the later video frame, respectively;
    for each video frame pair in the video sample sequence, performing a union operation on the first original mask frame and the second original mask frame corresponding to the video frame pair to obtain an original mask frame corresponding to the video frame pair;
    for each video frame pair in the video sample sequence, expanding the original mask area in the original mask frame corresponding to the video frame pair to obtain a target mask frame corresponding to the video frame pair; and
    using the target mask frames respectively corresponding to the K video frame pairs as the target mask sample sequence.
  6. The method according to claim 5, wherein said dilating, for each video frame pair in the video sample sequence, the original mask area in the original mask frame corresponding to the video frame pair to obtain the target mask frame corresponding to the video frame pair comprises:
    for each video frame pair in the video sample sequence, dilating the original mask area in the original mask frame corresponding to the video frame pair by a first number of pixels, to obtain a first mask area corresponding to the video frame pair;
    for each video frame pair in the video sample sequence, dilating the original mask area in the original mask frame corresponding to the video frame pair by a second number of pixels, to obtain a second mask area corresponding to the video frame pair, wherein the second number of pixels is greater than the first number of pixels;
    for each video frame pair in the video sample sequence, performing an exclusive-OR (XOR) operation on the first mask area and the second mask area corresponding to the video frame pair, to obtain a target mask frame corresponding to the video frame pair.
  7. The method according to claim 1, wherein said obtaining an optical flow data sequence according to the video sample sequence comprises:
    for each video frame pair in the video sample sequence, determining optical flow data corresponding to the video frame pair according to horizontal offsets and vertical offsets of pixels in the latter video frame of the video frame pair relative to the corresponding pixels in the former video frame;
    using the optical flow data respectively corresponding to the K video frame pairs as the optical flow data sequence;
    or,
    said obtaining an optical flow data sequence according to the video sample sequence comprises:
    for each video frame pair in the video sample sequence, determining optical flow data corresponding to the video frame pair according to horizontal offsets and vertical offsets of pixels in the former video frame of the video frame pair relative to the corresponding pixels in the latter video frame;
    using the optical flow data respectively corresponding to the K video frame pairs as the optical flow data sequence.
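The optical flow data described above is conventionally stored as an H×W×2 array: channel 0 holds each pixel's horizontal offset, channel 1 its vertical offset, with one such field per adjacent frame pair. A sketch under that assumption (`flow_from_shift` is a hypothetical helper producing the flow of a uniform translation; a real system would obtain the fields from a flow estimator):

```python
import numpy as np

def flow_from_shift(h: int, w: int, dx: float, dy: float) -> np.ndarray:
    """Dense flow for a uniform shift: every pixel of one frame of the pair
    is offset by (dx, dy) relative to its counterpart in the other frame."""
    flow = np.empty((h, w, 2), dtype=np.float32)
    flow[..., 0] = dx  # horizontal offset of each pixel
    flow[..., 1] = dy  # vertical offset of each pixel
    return flow

# one flow field per adjacent frame pair -> the optical flow data sequence
frame_pairs = [(0, 1), (1, 2), (2, 3)]  # K = 3 pairs, frame indices only
flow_sequence = [flow_from_shift(4, 4, 1.0, 0.0) for _ in frame_pairs]
```

Whether the offsets are measured from the former frame to the latter or vice versa corresponds to the two alternatives in the claim; only the sign and the reference frame of the field change.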
  8. The method according to claim 1, wherein said clustering, based on each piece of optical flow data in the optical flow data sequence, the pixels comprised in the target mask area in each target mask frame to obtain the optical flow clustering result of each target mask frame comprises:
    for each target mask frame, determining two-dimensional optical flow values of X pixels in the target mask area in the target mask frame according to the optical flow data corresponding to the target mask frame in the optical flow data sequence, wherein the optical flow data corresponding to the target mask frame and the target mask frame correspond to the same video frame pair, and X is an integer greater than 1;
    for each target mask frame, clustering the X pixels according to the two-dimensional optical flow values of the X pixels in the target mask area, to obtain the optical flow clustering result of the target mask frame.
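The claim does not fix a clustering algorithm. As one simple, deterministic stand-in for k-means or mean-shift, the X masked flow vectors can be grouped by quantizing them into bins (`cluster_flow` and `bin_size` are illustrative choices, not from the patent):

```python
import numpy as np

def cluster_flow(flow: np.ndarray, mask: np.ndarray, bin_size: float = 1.0):
    """Group the masked pixels by their quantized 2-D flow vector.

    Returns one cluster label per masked pixel and the number of clusters.
    """
    vecs = flow[mask]                       # (X, 2) flow values, X > 1
    bins = np.floor(vecs / bin_size).astype(int)
    uniq, labels = np.unique(bins, axis=0, return_inverse=True)
    return labels, len(uniq)

flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[:2] = (3.0, 0.0)                       # top half moves right, rest is static
mask = np.ones((4, 4), dtype=bool)          # take every pixel as masked
labels, n_clusters = cluster_flow(flow, mask)
```

Pixels sharing roughly the same motion fall into one cluster, so the number of clusters indicates how many distinct motions occur inside the mask — the quantity the following claims score.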
  9. The method according to claim 1, wherein said determining an optical flow quality score according to the optical flow clustering result of each target mask frame comprises:
    determining a total number of categories of each target mask frame according to the optical flow clustering result of each target mask frame;
    counting the number of target mask frames whose total number of categories is less than or equal to a category number threshold;
    determining a single-category proportion according to the ratio of the counted number of target mask frames to K;
    if the single-category proportion is greater than a proportion threshold, determining the optical flow quality score to be a first score;
    if the single-category proportion is less than or equal to the proportion threshold, determining the optical flow quality score to be a second score.
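The category-based scoring rule above reduces to a few lines. A sketch with assumed threshold and score values (the patent leaves `cat_threshold`, `prop_threshold`, and the two scores unspecified):

```python
def score_by_category(cluster_counts, cat_threshold=1, prop_threshold=0.8,
                      first_score=1.0, second_score=0.0):
    """cluster_counts: total number of clusters found in each target mask frame.

    Frames whose count is at or below cat_threshold have near-uniform motion;
    if enough frames do, the flow is judged reliable (first score).
    """
    k = len(cluster_counts)
    single = sum(1 for c in cluster_counts if c <= cat_threshold)
    proportion = single / k
    return first_score if proportion > prop_threshold else second_score
```

Intuitively, when almost every frame's mask moves as a single rigid block, optical-flow propagation is trustworthy; fragmented motion suggests the flow is too noisy to guide inpainting.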
  10. The method according to claim 1, wherein said determining an optical flow quality score according to the optical flow clustering result of each target mask frame comprises:
    for the optical flow clustering result of each target mask frame, determining an average movement value of each cluster according to the two-dimensional optical flow values of the pixels in the cluster, wherein the optical flow clustering result represents one or more clusters;
    for the optical flow clustering result of each target mask frame, determining an average movement value of the target mask frame according to the average movement values of the clusters;
    accumulating the average movement values of all the target mask frames to obtain a total movement distance;
    if the total movement distance is greater than or equal to a distance threshold, determining the optical flow quality score to be a first score;
    if the total movement distance is less than the distance threshold, determining the optical flow quality score to be a second score.
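The movement-based rule above can be sketched as follows. The claim does not say how the per-cluster average is formed; this sketch takes the magnitude of each cluster's mean flow vector, and `dist_threshold` and the score values are assumed placeholders:

```python
import numpy as np

def score_by_movement(frames_clusters, dist_threshold=5.0,
                      first_score=1.0, second_score=0.0):
    """frames_clusters: for each target mask frame, a list of (N_i, 2) arrays
    holding the 2-D flow values of the pixels in each cluster."""
    total = 0.0
    for clusters in frames_clusters:
        # average movement value of each cluster: magnitude of its mean flow
        per_cluster = [float(np.linalg.norm(c.mean(axis=0))) for c in clusters]
        # average movement value of the frame, accumulated over all frames
        total += sum(per_cluster) / len(per_cluster)
    return first_score if total >= dist_threshold else second_score
```

The rationale: when the masked region moves enough across the sequence, background pixels hidden in one frame become visible in others, so flow-based propagation has real pixels to borrow; a nearly static mask leaves nothing to propagate.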
  11. The method according to claim 1, wherein said determining an optical flow quality score according to the optical flow clustering result of each target mask frame comprises:
    determining a total number of categories of each target mask frame according to the optical flow clustering result of each target mask frame;
    counting the number of target mask frames whose total number of categories is less than or equal to a category number threshold;
    determining a single-category proportion according to the ratio of the counted number of target mask frames to K;
    for the optical flow clustering result of each target mask frame, determining an average movement value of each cluster according to the two-dimensional optical flow values of the pixels in the cluster, wherein the optical flow clustering result represents one or more clusters;
    for the optical flow clustering result of each target mask frame, determining an average movement value of the target mask frame according to the average movement values of the clusters;
    accumulating the average movement values of all the target mask frames to obtain a total movement distance;
    if the single-category proportion is greater than a proportion threshold and the total movement distance is greater than or equal to a distance threshold, determining the optical flow quality score to be a first score;
    if the single-category proportion is less than or equal to the proportion threshold, or the total movement distance is less than the distance threshold, determining the optical flow quality score to be a second score.
  12. The method according to any one of claims 9 to 11, wherein said repairing the video to be repaired by using a video inpainting method matching the optical flow quality score comprises:
    if the optical flow quality score is the first score, repairing the video to be repaired by using an optical-flow-based method;
    if the optical flow quality score is the second score, invoking a neural network to repair the video to be repaired.
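The dispatch in claim 12 is a two-way switch on the score. A sketch in which `flow_based_inpaint` and `network_inpaint` are hypothetical callables standing in for the two repair back-ends (the patent names neither):

```python
FIRST_SCORE, SECOND_SCORE = 1.0, 0.0  # assumed score values

def inpaint(video, score, flow_based_inpaint, network_inpaint):
    """Route to flow propagation when flow quality is good, else to a network."""
    if score == FIRST_SCORE:
        return flow_based_inpaint(video)   # e.g. propagate pixels along flow
    return network_inpaint(video)          # e.g. a learned inpainting model
```

This expresses the core trade-off the scoring serves: flow propagation gives sharp, temporally consistent results when the flow is reliable, while a neural network is the safer fallback when it is not.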
  13. The method according to claim 1, further comprising:
    displaying the video to be repaired and a repair object list, wherein the repair object list comprises at least one repairable object;
    in response to a selection operation on a target object, performing the step of obtaining the video sample sequence for the video to be repaired, wherein the target object belongs to the at least one repairable object;
    wherein after said repairing the video to be repaired by using the video inpainting method matching the optical flow quality score, the method further comprises:
    in response to a play operation on the repaired video, playing the repaired video.
  14. A video inpainting apparatus, comprising:
    an acquisition module, configured to acquire a video sample sequence corresponding to a video to be repaired, wherein the video sample sequence comprises K video frame pairs, each video frame pair comprises two adjacent video frames, and K is an integer greater than or equal to 1;
    the acquisition module being further configured to acquire a target mask sample sequence according to the video sample sequence, wherein the target mask sample sequence comprises K target mask frames, each target mask frame comprises a target mask area obtained by dilating an original mask area, and there is a one-to-one correspondence between the K target mask frames and the K video frame pairs;
    the acquisition module being further configured to acquire an optical flow data sequence according to the video sample sequence, wherein the optical flow data sequence comprises K pieces of optical flow data, and there is a one-to-one correspondence between the K pieces of optical flow data and the K video frame pairs;
    a processing module, configured to cluster, based on each piece of optical flow data in the optical flow data sequence, the pixels comprised in the target mask area in each target mask frame, to obtain an optical flow clustering result of each target mask frame;
    a determination module, configured to determine an optical flow quality score according to the optical flow clustering result of each target mask frame;
    a repair module, configured to repair the video to be repaired by using a video inpainting method matching the optical flow quality score.
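The apparatus claim above decomposes into an acquisition, processing, determination, and repair stage. One way the data flow between these modules could be wired up, with every stage injected as a callable (all names and the plumbing are illustrative assumptions, not from the patent):

```python
class VideoInpaintingPipeline:
    """Mirrors the apparatus: acquisition, processing, determination, repair."""

    def __init__(self, get_pairs, get_masks, get_flows, cluster, score, repair):
        self.get_pairs, self.get_masks, self.get_flows = get_pairs, get_masks, get_flows
        self.cluster, self.score, self.repair = cluster, score, repair

    def run(self, video):
        pairs = self.get_pairs(video)     # K adjacent frame pairs
        masks = self.get_masks(pairs)     # K dilated target mask frames
        flows = self.get_flows(pairs)     # K optical flow fields, 1:1 with pairs
        results = [self.cluster(f, m) for f, m in zip(flows, masks)]
        return self.repair(video, self.score(results))
```

The one-to-one correspondences required by the claim (pairs ↔ masks ↔ flows) show up here as the parallel `zip` over sequences of length K.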
  15. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 13.
  16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 13.
  17. A computer program product, comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 13.
PCT/CN2023/075576 2022-04-06 2023-02-13 Video inpainting method, related apparatus, device and storage medium WO2023193521A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210355594.2A CN115170400A (en) 2022-04-06 2022-04-06 Video repair method, related device, equipment and storage medium
CN202210355594.2 2022-04-06

Publications (1)

Publication Number Publication Date
WO2023193521A1 true WO2023193521A1 (en) 2023-10-12

Family

ID=83482792

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/075576 WO2023193521A1 (en) 2022-04-06 2023-02-13 Video inpainting method, related apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN115170400A (en)
WO (1) WO2023193521A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170400A (en) * 2022-04-06 2022-10-11 腾讯科技(深圳)有限公司 Video repair method, related device, equipment and storage medium
CN117152658A (en) * 2023-05-10 2023-12-01 瀚博半导体(上海)有限公司 Method, apparatus, system, device and medium for video processing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533615A (en) * 2019-08-30 2019-12-03 上海大学 A kind of old film large area method for repairing damage based on generation confrontation network
CN111105382A (en) * 2019-12-31 2020-05-05 北京大学 Video repair method
US20200357099A1 (en) * 2019-05-09 2020-11-12 Adobe Inc. Video inpainting with deep internal learning
CN112200732A (en) * 2020-04-30 2021-01-08 南京理工大学 Video deblurring method with clear feature fusion
CN113436100A (en) * 2021-06-28 2021-09-24 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for repairing video
CN115170400A (en) * 2022-04-06 2022-10-11 腾讯科技(深圳)有限公司 Video repair method, related device, equipment and storage medium


Also Published As

Publication number Publication date
CN115170400A (en) 2022-10-11

Similar Documents

Publication Publication Date Title
US10755173B2 (en) Video deblurring using neural networks
WO2023193521A1 (en) Video inpainting method, related apparatus, device and storage medium
CN109168026A (en) Instant video display methods, device, terminal device and storage medium
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN112565653B (en) Video frame insertion method, system, electronic equipment and storage medium
CN114071223A (en) Optical flow-based video interpolation frame generation method, storage medium and terminal equipment
Zhang et al. Sparse representation-based video quality assessment for synthesized 3D videos
WO2023056896A1 (en) Definition determination method and apparatus, and device
CN111353965B (en) Image restoration method, device, terminal and storage medium
CN111179195A (en) Depth image hole filling method and device, electronic equipment and storage medium thereof
CN112163993A (en) Image processing method, device, equipment and storage medium
CN114598919A (en) Video processing method, video processing device, computer equipment and storage medium
CN110049347A (en) In method, system, terminal and the device of live streaming interface configurations image
CN113034412A (en) Video processing method and device
CN108810319A (en) Image processing apparatus and image processing method
CN107491934B (en) 3D interview system based on virtual reality
CN111988520B (en) Picture switching method and device, electronic equipment and storage medium
CN115049572A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110941413B (en) Display screen generation method and related device
KR20230022153A (en) Single-image 3D photo with soft layering and depth-aware restoration
WO2020108248A1 (en) Video playback method and apparatus
CN113628121A (en) Method and device for processing data and training multimedia data
CN113762016A (en) Key frame selection method and device
CN112055131A (en) Video processing system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23784085

Country of ref document: EP

Kind code of ref document: A1