CN108805898B - Video image processing method and device

Video image processing method and device

Info

Publication number
CN108805898B
CN108805898B
Authority
CN
China
Prior art keywords
video image
frame video
current frame
image
mask
Prior art date
Legal status
Active
Application number
CN201810551722.4A
Other languages
Chinese (zh)
Other versions
CN108805898A (en)
Inventor
吴兴龙
Current Assignee
Douyin Vision Co Ltd
Beijing Volcano Engine Technology Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201810551722.4A
Publication of CN108805898A
Application granted
Publication of CN108805898B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/20 Analysis of motion
    • G06T 7/215 Motion-based segmentation
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a video image processing method and a video image processing device, wherein the method comprises the following steps: acquiring a current frame video image; performing image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image; determining historical motion information of the current frame video image according to the current frame video image and the previous frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information; and calculating a fusion weight according to the historical motion information, and performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image. The video image processing method and device can avoid jitter and delay, improve the stability and smoothness of the video, and improve the accuracy of motion tracking.

Description

Video image processing method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a video image processing method and apparatus.
Background
With the popularization and development of various video software applications, various video processing algorithms are widely applied to the processing of various video images. Among them, the video segmentation technique is widely used as a basic video processing means.
Conventional video segmentation techniques generally apply image segmentation frame by frame, i.e., a segmentation mask is computed independently for each frame. However, because consecutive frames may be segmented inconsistently, this approach often produces noticeable jitter, resulting in poor video stability.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video image processing method and apparatus to improve the stability and smoothness of a video.
A video image processing method, said method comprising the steps of:
acquiring a current frame video image;
performing image segmentation on the current frame video image by adopting a convolutional neural network to obtain a first mask image corresponding to the current frame video image;
determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image;
and calculating to obtain a fusion weight according to the historical motion information of the current frame video image, and performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image.
In one embodiment, the step of determining the historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image comprises:
determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
In an embodiment, the step of obtaining a third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image includes:
respectively superposing the horizontal pixel offset of each pixel of the current frame video image and the horizontal pixel of the second mask image, and calculating to obtain the horizontal pixel of the third mask image;
and respectively superposing the vertical pixel offset of each pixel of the current frame video image and the vertical pixel of the second mask image, and calculating to obtain the vertical pixel of the third mask image.
In one embodiment, the step of obtaining the fusion weight according to the historical motion information of the current frame video image comprises:
calculating according to the optical flow information of the current frame video image, a preset first parameter and a second parameter to obtain a first reference value;
comparing the first reference value with a preset second reference value, and taking the minimum value of the first reference value and the second reference value as the fusion weight;
the first parameter and the second parameter are constants, and the first reference value and the second reference value are both greater than zero and less than 1.
In one embodiment, the step of calculating a first reference value according to the optical flow information of the current frame video image, a preset first parameter and a second parameter includes:
taking a natural constant e as the base and the product of the optical flow information of the current frame video image and the preset first parameter as the exponent, performing an exponential operation to obtain a third parameter;
and calculating to obtain the first reference value according to the third parameter and the preset second parameter.
In one embodiment, the second reference value has a value range of [0.8, 0.95], the first parameter has a value range of [4, 6], and the second parameter has a value range of [0.6, 0.9].
In an embodiment, the step of performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image includes:
and taking the fusion weight as a first weight corresponding to the first mask image, taking the difference between a preset total weight and the fusion weight as a second weight corresponding to the third mask image, and performing weighted fusion on the first mask image and the third mask image according to the first weight and the second weight.
In one embodiment, the present invention also provides a video image processing apparatus, comprising:
the acquisition module is used for acquiring a current frame video image;
the first segmentation module is used for carrying out image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image;
the smoothing module is used for determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information;
and the fusion module is used for calculating to obtain fusion weight according to the historical motion information of the current frame video image, and performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image.
In one embodiment, the present invention further provides an electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
Furthermore, the invention also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.
According to the video image processing method and device, the current frame video image is segmented to obtain its first mask image; a third mask image corresponding to the current frame video image is obtained from the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image; and a fusion weight is calculated for each pixel from that historical motion information, so that the first mask image and the third mask image can be fused according to the fusion weight to obtain the fusion mask image of the current frame video image. Because the fusion weight is derived from the historical motion information of the current frame video image, fusing the first and third mask images avoids the jitter and delay caused by inconsistency between adjacent frames of the video and improves the stability and smoothness of the video. Meanwhile, this segmentation-and-fusion approach can accurately identify specific features in the video image, thereby improving the accuracy of motion tracking for those features.
Drawings
FIG. 1 is a diagram of an application environment of a video image processing method in one embodiment;
FIG. 2 is a flow diagram illustrating a video image processing method according to one embodiment;
FIG. 3 is a flowchart illustrating a step of obtaining a third mask image corresponding to a current frame video image according to historical motion information in an embodiment;
FIG. 4 is a flowchart illustrating a process of obtaining a third mask image corresponding to a current frame video image according to a second mask image and historical motion information in an embodiment;
FIG. 5 is a flowchart illustrating the step of performing weighted fusion on the first mask image and the third mask image according to the calculated fusion weight in one embodiment;
FIG. 6 is a block diagram showing the structure of a video image processing apparatus according to an embodiment;
FIG. 7 is a block diagram of a video image processing apparatus according to another embodiment;
FIG. 8 is a diagram of the internal structure of an electronic device in one embodiment;
FIG. 9 is a diagram of an image before and after segmentation using a convolutional neural network, according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The video image processing method provided by the application can be applied to the application environment shown in fig. 1. In which a terminal 101 communicates with a server 102 via a network. Specifically, the video image processing method may be applied to the server 102 described above, or to the terminal 101 (e.g., in a video software application installed on the terminal 101). The terminal 101 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 102 may be implemented by an independent server or a server cluster formed by a plurality of servers. The following is an example of the application of the above method to a video software application installed on the terminal 101:
For example, the video software on the terminal 101 may obtain a video file from the server 102 through the network and play it. During playback of the video file, the terminal may acquire the current frame video image in real time, perform image segmentation on it to obtain the first mask image corresponding to the current frame video image, and store that first mask image. Meanwhile, the terminal may determine the historical motion information of the current frame video image from the current frame video image and its previous frame video image; for example, the historical motion information may be optical flow information. The terminal can then obtain a third mask image of the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image.
Furthermore, the terminal can calculate a fusion weight according to the historical motion information of the current frame video image, and perform weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image, thereby realizing image segmentation of the current frame video image. Because the fusion weight is calculated from the historical motion information of the current frame video image, fusing the first mask image and the third mask image avoids the jitter and delay caused by inconsistency between adjacent frames of the video, improving the stability and smoothness of the video. Meanwhile, this segmentation-and-fusion method can accurately identify specific features in the video image, thereby improving the accuracy of motion tracking for those features.
In an embodiment, as shown in fig. 2, the video image processing method according to the embodiment of the present application is used for segmenting and fusing video images to improve stability and smoothness of the video. The method comprises the following steps:
s100, acquiring a current frame video image; specifically, the current frame video image may be a current frame of a video file being played. Further, the video file may be an offline video file stored in a memory on the terminal, or an online video file acquired by the terminal from a server. For example, the video file may be an online video file acquired by a terminal (e.g., a mobile phone) from a server, and at this time, when a user requests to play a specified video file through video software on the mobile phone, the terminal may transmit the video playing request to the server through a network, and the server may return a playing address and the like of the specified video file, so that the specified video file may be played on the mobile phone. In the playing process of the video file, the terminal can acquire the current frame video image of the current appointed video file in real time.
S200, performing image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image. Specifically, a convolutional neural network may be used to perform the image segmentation. Further, the convolutional neural network in the embodiment of the present application may adopt a conventional CNN (Convolutional Neural Network); that is, to classify a pixel, an image block around that pixel is used as the input of the CNN for training and prediction. Optionally, the process of performing image segmentation on the current frame video image with the convolutional neural network includes identifying each object in the current image, classifying the identified objects, and finally performing a series of operations such as detection and pooling on the edges of the target object to obtain a mask image of the target object, so as to distinguish the target object from other objects in the current frame video image. Optionally, the convolutional neural network in the embodiment of the present application may also adopt an FCN (Fully Convolutional Network). The current frame video image before segmentation and the first mask image obtained after segmentation are shown in fig. 9.
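By way of a non-authoritative sketch (the patent specifies no code, and Python is assumed here for all examples), step S200 might be wrapped as follows; the pretrained network seg_net and its probability-map output are illustrative assumptions, not part of the patent:

    import numpy as np

    def segment_frame(frame_bgr, seg_net):
        # First mask image M_t1. `seg_net` is a hypothetical pretrained
        # CNN/FCN callable returning per-pixel foreground probabilities
        # in [0, 1]; the patent only requires that segmentation yield a mask.
        probs = seg_net(frame_bgr.astype(np.float32) / 255.0)
        return probs.astype(np.float32)  # H x W first mask image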
S300, determining historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image. Specifically, in this embodiment of the present application, the second mask image corresponding to the previous video image may be a mask image obtained by calculation according to a convolutional neural network, or a fused mask image obtained by mutually fusing a mask image obtained by calculation according to a convolutional neural network and a mask image obtained by historical motion information. Furthermore, the historical motion information of the current frame video image is used for representing the difference between the current frame video image and the previous frame video image, that is, the historical motion information of the current frame video image is used for representing the offset of each pixel in the current frame video image, so that the problem of inconsistency of the previous frame video image and the next frame video image can be avoided by combining the historical motion information of the current frame video image and the second mask image corresponding to the previous frame video image, and the problem of image jitter is further avoided. Alternatively, the above-mentioned historical motion information may be represented using optical flow information.
S400, calculating according to historical motion information of the current frame video image to obtain fusion weight, and fusing the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image. Optionally, a first weight corresponding to the first mask image and a second weight corresponding to the third mask image may be determined according to the fusion weight, so that the first mask image and the third mask image may be subjected to weighted fusion according to the first weight and the second weight, and the fusion mask image of the current frame video image is obtained through calculation, thereby implementing image segmentation of the current frame video image.
According to the video image processing method in the embodiment of the application, the fusion weight is calculated from the historical motion information of the current frame video image, and the first mask image and the third mask image are fused accordingly, so the jitter and delay caused by inconsistency between adjacent frames of the video can be avoided, and the stability and smoothness of the video are improved. Meanwhile, this video image segmentation-and-fusion method can accurately identify specific features in the video image, thereby improving the accuracy of motion tracking for those features.
In one embodiment, as shown in fig. 3, the step S300 may include:
s310, determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information of the current frame video image comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image. Specifically, the optical flow information is used to represent a correspondence relationship between a current frame video image and a previous frame video image, that is, historical motion information of the current frame video image in the embodiment of the present application is obtained by adopting an optical flow method. Where the optical flow is the "instantaneous velocity" of pixel motion of a spatially moving object on the observation imaging plane. The optical flow method is a method for finding out the corresponding relationship between a current frame video image and a previous frame video image by using the change of pixels in an image sequence in a time domain and the correlation between adjacent frames and calculating the motion information of an object between the adjacent frames. The above-mentioned horizontal pixel shift amount may be a horizontal motion velocity value of the pixel, and the above-mentioned vertical pixel shift amount may be a vertical motion velocity value of the pixel. Alternatively, a sparse optical flow method or a dense optical flow method may be employed in the embodiments of the present application. In the embodiment of the application, optical flow information can be obtained by computing the interface calcd optical flow Farnenback provided by Opencv (OpenCV is a cross-platform computer vision library issued based on BSD license (open source) and can run on Linux, Windows, Android and Mac OS operating systems).
S320, obtaining a third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the optical flow information of the current frame video image. The third mask image corresponding to the current frame video image is obtained by utilizing the historical motion information of the pixels in the image, and the problem of video jitter caused by the inconsistency of the previous frame and the next frame can be avoided by considering the dynamic motion characteristics of the pixels in the image.
In one embodiment, as shown in fig. 4, the step S320 may further include:
s321, overlapping the horizontal pixel offset of each pixel of the current frame video image with the horizontal pixel of the second mask image respectively, and calculating to obtain the horizontal pixel of the third mask image;
and S322, respectively superposing the vertical pixel offset of each pixel of the current frame video image and the vertical pixel of the second mask image, and calculating to obtain the vertical pixel of the third mask image.
Specifically, the third mask image is given by M_t2(i, j) = M_t-1(i + F(i, j, 0), j + F(i, j, 1)), where M_t2(i, j) represents a pixel of the third mask image corresponding to the current frame video image, M_t-1(i, j) represents a pixel of the second mask image corresponding to the previous frame video image, F(i, j, 0) represents the horizontal pixel offset of the pixel, F(i, j, 1) represents the vertical pixel offset of the pixel, i indexes the horizontal pixel coordinate, and j indexes the vertical pixel coordinate.
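A minimal sketch of this warping step, assuming the flow convention above (the per-pixel offsets relate current-frame coordinates to previous-frame coordinates, matching the formula M_t2(i, j) = M_t-1(i + F(i, j, 0), j + F(i, j, 1))); cv2.remap samples the previous mask at the shifted coordinates:

    import cv2
    import numpy as np

    def warp_mask(prev_mask, flow):
        # Third mask image M_t2: resample the second mask image M_t-1 at
        # coordinates shifted by the per-pixel offsets of the flow field.
        h, w = prev_mask.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (xs + flow[..., 0]).astype(np.float32)  # i + F(i, j, 0)
        map_y = (ys + flow[..., 1]).astype(np.float32)  # j + F(i, j, 1)
        return cv2.remap(prev_mask, map_x, map_y, cv2.INTER_LINEAR)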
In one embodiment, as shown in fig. 5, the step S400 may include:
s410, calculating according to optical flow information of a current frame video image, a preset first parameter and a preset second parameter to obtain a first reference value; specifically, optical flow information of a current frame video image is obtained according to a horizontal pixel offset and a vertical pixel offset of each pixel, and then a first reference value is obtained extremely according to the optical flow information of the current frame video image, a preset first parameter and a preset second parameter. Alternatively, the optical flow information F (i, j) ═ F (i, j,0) × F (i, j,0) + F (i, j,1) × F (i, j,1) of the current frame video image, where F (i, j,0) represents the horizontal pixel shift amount of the current pixel, F (i, j,1) represents the vertical pixel shift amount of the current pixel, i represents the horizontal pixel, and j represents the vertical pixel. The above-mentioned horizontal pixel shift amount may be a horizontal motion velocity value of the pixel, and the above-mentioned vertical pixel shift amount may be a vertical motion velocity value of the pixel.
Further, the product of the optical flow information of the current frame video image and the first parameter may be obtained by multiplication; then, taking the natural constant e as the base and this product as the exponent, an exponential operation is performed to obtain a third parameter, and the first reference value is calculated from the third parameter and the preset second parameter. Specifically, the reciprocal of the exponential operation result is multiplied by the preset second parameter, and this product is subtracted from 1 to obtain the first reference value.
S420, comparing the first reference value with a preset second reference value, and taking the minimum of the two as the fusion weight of the current pixel, wherein the first parameter and the second parameter are constants, and both the first reference value and the second reference value are greater than zero and less than 1. According to the video image processing method in the embodiment of the application, the fusion weight is calculated from the historical motion information, and the first mask image and the third mask image are fused using this weight, so the jitter and delay caused by inconsistency between adjacent frames of the video can be avoided, and the stability and smoothness of the video are improved. Meanwhile, this video image segmentation-and-fusion method can accurately identify specific features in the video image, thereby improving the accuracy of motion tracking for those features.
Optionally, the fusion weight is W(i, j) = min(r1, 1 - (1/e^(r2 × F(i, j))) × r3), where r1 denotes the second reference value, r2 denotes the first parameter, r3 denotes the second parameter, 1 - (1/e^(r2 × F(i, j))) × r3 denotes the first reference value, and F(i, j) denotes the optical flow information, i.e., the motion velocity of the pixel. Further, the value range of the second reference value may be [0.8, 0.95], the value range of the first parameter may be [4, 6], and the value range of the second parameter may be [0.6, 0.9].
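Combining S410 and S420, a sketch of the per-pixel weight computation; the defaults r1 = 0.9, r2 = 5, r3 = 0.8 are the example values used later in this description and lie within the stated ranges:

    import numpy as np

    def fusion_weight(flow, r1=0.9, r2=5.0, r3=0.8):
        # F(i, j): squared flow magnitude F(i,j,0)^2 + F(i,j,1)^2.
        f = flow[..., 0] ** 2 + flow[..., 1] ** 2
        # First reference value 1 - (1 / e^(r2 * F)) * r3, clamped from
        # above by the second reference value r1.
        first_ref = 1.0 - np.exp(-r2 * f) * r3
        return np.minimum(r1, first_ref)  # W(i, j)

Under this choice, nearly static pixels receive a weight close to 1 - r3, so the temporally warped mask dominates and jitter is suppressed, while fast-moving pixels are capped at r1, so the freshly segmented mask dominates and delay is avoided.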
In one embodiment, the step S400 may further include:
and taking the fusion weight as a first weight corresponding to the first mask image, taking the difference between the preset total weight and the fusion weight as a second weight corresponding to the third mask image, and performing weighted fusion on the first mask image and the third mask image according to the first weight and the second weight. Specifically, the first weight is equal to the fusion weight W (i, j), and the second weight is (1-W (i, j)).
At this time, the fusion mask image is M(i, j) = W(i, j) × M_t1(i, j) + (1 - W(i, j)) × M_t2(i, j), where M_t1(i, j) represents the first mask image corresponding to the current frame video image and M_t2(i, j) represents the third mask image corresponding to the current frame video image.
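The weighted fusion itself is then a per-pixel blend; a sketch reusing the illustrative helpers from the previous examples, where the fused result feeds back as the "second mask image" of the next frame, matching the recursive definition given earlier:

    def fuse_masks(first_mask, third_mask, weight):
        # M(i, j) = W(i, j) * M_t1(i, j) + (1 - W(i, j)) * M_t2(i, j)
        return weight * first_mask + (1.0 - weight) * third_mask

    # Per-frame pipeline sketch (names assumed from the examples above):
    # flow    = compute_flow(prev_gray, curr_gray)
    # m_t1    = segment_frame(frame, seg_net)
    # m_t2    = warp_mask(prev_fused_mask, flow)
    # m_fused = fuse_masks(m_t1, m_t2, fusion_weight(flow))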
For example, when the second reference value r1 = 0.9, the first parameter r2 = 5, and the second parameter r3 = 0.8, the first reference value is 1 - (1/e^(5 × F(i, j))) × 0.8. It is then necessary to compare the first reference value with the second reference value 0.9: if the first reference value is less than 0.9, the first reference value is used as the fusion weight, and the fusion mask image is calculated according to the weighted-fusion formula above.
If the first reference value is greater than 0.9, the second reference value is used as the fusion weight, and the fusion mask image is calculated in the same way. In this case, the first weight corresponding to the first mask image is 0.9, the second weight corresponding to the third mask image is 0.1, and the fusion mask image is M(i, j) = 0.9 × M_t1(i, j) + 0.1 × M_t2(i, j).
When the second reference value r1 = 1 and the second parameter r3 = 0, the first reference value 1 - (1/e^(r2 × F(i, j))) × r3 equals 1, so the fusion weight W(i, j) = 1 and the fusion mask image is simply the first mask image; this is equivalent to applying no smoothing to the first mask image obtained by the convolutional calculation.
When the second reference value r1 = 0.5, the first parameter r2 = 0, and the second parameter r3 = 0.5, the first reference value 1 - (1/e^(0 × F(i, j))) × 0.5 = 0.5, so the fusion weight W(i, j) = 0.5 and the fusion mask image is M(i, j) = 0.5 × M_t1(i, j) + 0.5 × M_t2(i, j), which corresponds to average smoothing.
It should be understood that although the various steps in the flow charts of fig. 2-5 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily executed sequentially but may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, the present application further provides a video image processing apparatus, which includes an obtaining module 100, a first segmentation module 200, a smoothing module 300, and a fusion module 400. The obtaining module 100 is configured to obtain a current frame video image; the first segmentation module 200 is configured to perform image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image. Specifically, in the embodiment of the present application, a conventional CNN (Convolutional Neural Network) may be used as the convolutional neural network; that is, to classify a pixel, an image block around that pixel is used as the input of the CNN for training and prediction, and the specific implementation may refer to methods for implementing image segmentation with a conventional CNN. Optionally, the convolutional neural network in the embodiment of the present application may also adopt an FCN (Fully Convolutional Network). The current frame video image before segmentation and the first mask image obtained after segmentation are shown in fig. 9.
The smoothing module 300 is configured to determine historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtain a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image; specifically, in this embodiment of the present application, the second mask image corresponding to the previous video image may be a mask image obtained by calculation according to a convolutional neural network, or a fused mask image obtained by mutually fusing a mask image obtained by calculation according to a convolutional neural network and a mask image obtained by historical motion information. Furthermore, the historical motion information of the current frame video image is used for representing the difference between the current frame video image and the previous frame video image, so that the problem of inconsistency of the previous frame video image and the next frame video image can be avoided by combining the historical motion information of the current frame video image and the second mask image corresponding to the previous frame video image, and the problem of image jitter is further avoided. Alternatively, the above-mentioned historical motion information may be represented using optical flow information.
The fusion module 400 is configured to calculate to obtain a fusion weight according to the historical motion information, and fuse the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image. In the embodiment of the application, a first weight corresponding to the first mask image and a second weight corresponding to the third mask image can be determined according to the fusion weight, so that the first mask image and the third mask image can be subjected to weighted fusion according to the first weight and the second weight, the fusion mask image of the current frame video image is obtained through calculation, and the image segmentation of the current frame video image is realized.
According to the video image processing apparatus in the embodiment of the application, the fusion weight is calculated from the historical motion information of the current frame video image, and the first mask image and the third mask image are fused accordingly, so the jitter and delay caused by inconsistency between adjacent frames of the video can be avoided, and the stability and smoothness of the video are improved. Meanwhile, this video image segmentation-and-fusion method can accurately identify specific features in the video image, thereby improving the accuracy of motion tracking for those features.
In one embodiment, as shown in fig. 7, the smoothing module 300 may include a speed calculation unit 310 and a smoothing unit 320. The speed calculation unit 310 is configured to calculate and obtain optical flow information according to the current frame video image and a previous frame video image of the current frame video image, where the optical flow information is used to represent historical motion information of the current frame video image, and the optical flow information includes a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
In an embodiment, the smoothing unit 320 is configured to separately superimpose the horizontal pixel offset of each pixel of the current frame video image and the horizontal pixel of the second mask image, and calculate to obtain the horizontal pixel of the third mask image; and respectively superposing the vertical pixel offset of each pixel of the current frame video image and the vertical pixel of the second mask image, and calculating to obtain the vertical pixel of the third mask image.
In one embodiment, as shown in fig. 7, the fusion module 400 further includes a weight calculation unit 410 and a fusion unit 420. The weight calculation unit 410 is configured to calculate and obtain a first reference value according to the optical flow information, a preset first parameter and a second parameter; and comparing the first reference value with a preset second reference value, and taking the minimum value of the first reference value and the second reference value as a fusion weight, wherein the first parameter and the second parameter are constants, and the first reference value and the second reference value are both greater than zero and less than 1. The fusion unit 420 is configured to use the fusion weight as a first weight corresponding to the first mask image, use a difference between a preset total weight and the fusion weight as a second weight corresponding to the third mask image, and perform weighted fusion on the first mask image and the third mask image according to the first weight and the second weight.
In an embodiment, the weight calculating unit 410 is specifically configured to perform an exponential operation by using a natural constant e as a base number and using a product of the optical flow information of the current frame video image and the preset first parameter as an exponent to obtain a third parameter;
and calculating to obtain the first reference value according to the third parameter and the preset second parameter.
Optionally, the fusion weight is W(i, j) = min(r1, 1 - (1/e^(r2 × F(i, j))) × r3);
where r1 represents the second reference value, r2 represents the first parameter, r3 represents the second parameter, 1 - (1/e^(r2 × F(i, j))) × r3 represents the first reference value, and F(i, j) represents the optical flow information.
In one embodiment, the second reference value has a value range of [0.8, 0.95], the first parameter has a value range of [4, 6], and the second parameter has a value range of [0.6, 0.9].
For specific limitations of the video image processing apparatus, reference may be made to the above limitations of the video image processing method, which are not described herein again. The respective modules in the video image processing apparatus described above may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the electronic device, or can be stored in a memory in the electronic device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, an electronic device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a video processing method. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the structure shown in fig. 8 is a block diagram of only a portion of the structure relevant to the present disclosure, and does not constitute a limitation on the electronic device to which the present disclosure may be applied, and that a particular electronic device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
Specifically, the electronic device may include a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
acquiring a current frame video image;
performing image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image;
determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image;
and calculating to obtain a fusion weight according to the historical motion information of the current frame video image, and fusing the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image.
In one embodiment, when the processor executes the step of determining the historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image, the following steps are specifically executed:
determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
In an embodiment, when the processor executes the step of obtaining the third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image, the following steps are specifically executed:
respectively superposing the horizontal pixel offset of each pixel of the current frame video image and the horizontal pixel of the second mask image, and calculating to obtain the horizontal pixel of the third mask image;
and respectively superposing the vertical pixel offset of each pixel of the current frame video image and the vertical pixel of the second mask image, and calculating to obtain the vertical pixel of the third mask image.
In one embodiment, when the processor performs the step of obtaining the fusion weight according to the historical motion information of the current frame video image, the following steps are specifically performed:
calculating according to the optical flow information, a preset first parameter and a second parameter to obtain a first reference value;
and comparing the first reference value with a preset second reference value, and taking the minimum value of the first reference value and the second reference value as the fusion weight, wherein the first parameter and the second parameter are constants, and the first reference value and the second reference value are both greater than zero and less than 1.
In one embodiment, the step of calculating a first reference value according to the optical flow information of the current frame video image, a preset first parameter and a second parameter includes:
taking a natural constant e as the base and the product of the optical flow information of the current frame video image and the preset first parameter as the exponent, performing an exponential operation to obtain a third parameter;
and calculating to obtain the first reference value according to the third parameter and the preset second parameter.
In one embodiment, the fusion weight is W(i, j) = min(r1, 1 - (1/e^(r2 × F(i, j))) × r3);
wherein r1 represents the second reference value, r2 represents the first parameter, r3 represents the second parameter, 1 - (1/e^(r2 × F(i, j))) × r3 represents the first reference value, and F(i, j) represents the optical flow information.
In one embodiment, the second reference value has a value range of [0.8, 0.95], the first parameter has a value range of [4, 6], and the second parameter has a value range of [0.6, 0.9].
In an embodiment, when the processor performs the step of performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain the fusion mask image of the current frame video image, the following steps are specifically performed:
and taking the fusion weight as a first weight corresponding to the first mask image, taking the difference between a preset total weight and the fusion weight as a second weight corresponding to the third mask image, and performing weighted fusion on the first mask image and the third mask image according to the first weight and the second weight.
It should be clear that, in the embodiment of the present application, the process of implementing the video image segmentation and smoothing processing by the electronic device is consistent with the execution process of the video image processing method described above, and specific reference may be made to the description above.
Furthermore, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring a current frame video image;
performing image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image;
determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image;
and calculating to obtain a fusion weight according to the historical motion information of the current frame video image, and fusing the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image.
In one embodiment, when the computer program is executed by the processor to implement the step of determining the historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image, the following steps are specifically implemented:
determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
In an embodiment, when the computer program is executed by the processor to implement the step of obtaining the third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information, the following steps are specifically implemented:
respectively superposing the horizontal pixel offset of each pixel of the current frame video image and the horizontal pixel of the second mask image, and calculating to obtain the horizontal pixel of the third mask image;
and respectively superposing the vertical pixel offset of each pixel of the current frame video image and the vertical pixel of the second mask image, and calculating to obtain the vertical pixel of the third mask image.
In one embodiment, when the computer program is executed by the processor to implement the step of obtaining the fusion weight according to the historical motion information, the following steps are specifically implemented:
calculating according to the optical flow information of the current frame video image, a preset first parameter and a second parameter to obtain a first reference value;
and comparing the first reference value with a preset second reference value, and taking the minimum value of the first reference value and the second reference value as the fusion weight, wherein the first parameter and the second parameter are constants, and the first reference value and the second reference value are both greater than zero and less than 1.
In one embodiment, when the computer program is executed by the processor to calculate and obtain the first reference value according to the optical flow information of the current frame video image, the preset first parameter and the second parameter, the following steps are specifically implemented:
taking a natural constant e as the base and the product of the optical flow information of the current frame video image and the preset first parameter as the exponent, performing an exponential operation to obtain a third parameter;
and calculating to obtain the first reference value according to the third parameter and the preset second parameter.
In one embodiment, the fusion weight is W(i, j) = min(r1, 1 - (1/e^(r2 × F(i, j))) × r3);
where r1 represents the second reference value, r2 represents the first parameter, r3 represents the second parameter, 1 - (1/e^(r2 × F(i, j))) × r3 represents the first reference value, and F(i, j) represents the optical flow information.
In one embodiment, the second reference value has a value range of [0.8, 0.95], the first parameter has a value range of [4, 6], and the second parameter has a value range of [0.6, 0.9].
In an embodiment, when the computer program is executed by the processor to implement the step of performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain the fusion mask image of the current frame video image, the following steps are specifically implemented:
and taking the fusion weight as a first weight corresponding to the first mask image, taking the difference between a preset total weight and the fusion weight as a second weight corresponding to the third mask image, and performing weighted fusion on the first mask image and the third mask image according to the first weight and the second weight.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and their description is specific and detailed, but it should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A method for processing video images, said method comprising the steps of:
acquiring a current frame video image;
performing image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image;
determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image;
calculating a fusion weight according to the historical motion information of the current frame video image, and fusing the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image;
wherein the step of determining the historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image comprises:
determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
2. The method according to claim 1, wherein the step of obtaining a third mask image corresponding to the current frame video image according to the second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image comprises:
superposing the horizontal pixel offset of each pixel of the current frame video image on the horizontal pixel coordinates of the second mask image to obtain the horizontal pixel coordinates of the third mask image;
and superposing the vertical pixel offset of each pixel of the current frame video image on the vertical pixel coordinates of the second mask image to obtain the vertical pixel coordinates of the third mask image.
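A hedged sketch of the superposition described in claim 2 follows, written as a literal forward scatter in NumPy: every pixel of the second mask image is displaced by its horizontal and vertical offsets to form the third mask image. It assumes the flow describes the motion of the previous frame's pixels, clips out-of-range targets, and resolves colliding pixels by last write wins; an inverse mapping (for example, cv2.remap) would be a common production alternative, and all names are illustrative.

import numpy as np

def warp_previous_mask(mask2, flow):
    """Move every pixel of the previous frame's mask (mask2) by its optical
    flow offset, yielding the third mask image on the current frame's grid."""
    h, w = mask2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Superpose the horizontal/vertical offsets on the pixel coordinates.
    xs_new = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ys_new = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    mask3 = np.zeros_like(mask2)
    mask3[ys_new, xs_new] = mask2[ys, xs]  # forward scatter, nearest pixel
    return mask3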
3. The method according to claim 1, wherein the step of obtaining the fusion weight according to the historical motion information of the current frame video image comprises:
calculating a first reference value according to the optical flow information of the current frame video image, a preset first parameter and a second parameter;
comparing the first reference value with a preset second reference value, and taking the smaller of the first reference value and the second reference value as the fusion weight;
wherein the first parameter and the second parameter are constants, and the first reference value and the second reference value are both greater than zero and less than 1.
4. The method according to claim 3, wherein the step of calculating the first reference value according to the optical flow information of the current frame video image, the preset first parameter and the second parameter comprises:
performing an exponential operation with a natural constant e as the base and the product of the optical flow information of the current frame video image and the preset first parameter as the exponent, to obtain a third parameter;
and calculating the first reference value according to the third parameter and the preset second parameter.
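Claims 3 and 4 do not fix how the third parameter and the second parameter combine into the first reference value. One reading consistent with the ranges in claim 5 below, namely a per-pixel weight that starts below the cap and grows with motion, is ref1 = 1 - param2 / third. The sketch below implements that assumed form with default values drawn from the claimed ranges; it is an assumption, not a quotation from the patent.

import numpy as np

def fusion_weight(flow, param1=5.0, param2=0.75, ref2=0.9):
    """Per-pixel fusion weight from the optical flow magnitude. The 'third
    parameter' of claim 4 would be exp(param1 * magnitude); the line below
    computes 1 - param2 / third in an overflow-safe, equivalent form."""
    magnitude = np.linalg.norm(flow, axis=2)           # per-pixel flow magnitude
    ref1 = 1.0 - param2 * np.exp(-param1 * magnitude)  # first reference value in (0, 1)
    return np.minimum(ref1, ref2)                      # capped by the second reference value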
5. The method according to claim 3 or 4, wherein the second reference value is in the range of [0.8, 0.95], the first parameter is in the range of [4, 6], and the second parameter is in the range of [0.6, 0.9].
6. The method according to claim 1, wherein the step of performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain the fusion mask image of the current frame video image comprises:
and taking the fusion weight as a first weight corresponding to the first mask image, taking the difference between a preset total weight and the fusion weight as a second weight corresponding to the third mask image, and performing weighted fusion on the first mask image and the third mask image according to the first weight and the second weight.
7. A video image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a current frame video image;
the first segmentation module is used for carrying out image segmentation on the current frame video image to obtain a first mask image corresponding to the current frame video image;
the smoothing module is used for determining historical motion information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, and obtaining a third mask image corresponding to the current frame video image according to a second mask image corresponding to the previous frame video image and the historical motion information of the current frame video image;
the fusion module is used for calculating a fusion weight according to the historical motion information of the current frame video image, and performing weighted fusion on the first mask image and the third mask image according to the fusion weight to obtain a fusion mask image of the current frame video image;
wherein, the determining the historical motion information of the current frame video image according to the current frame video image and the previous frame video image of the current frame video image comprises:
determining optical flow information of the current frame video image according to the current frame video image and a previous frame video image of the current frame video image, wherein the optical flow information is used for representing historical motion information of the current frame video image, and the optical flow information comprises a horizontal pixel offset and a vertical pixel offset of each pixel in the current frame video image.
8. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201810551722.4A 2018-05-31 2018-05-31 Video image processing method and device Active CN108805898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810551722.4A CN108805898B (en) 2018-05-31 2018-05-31 Video image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810551722.4A CN108805898B (en) 2018-05-31 2018-05-31 Video image processing method and device

Publications (2)

Publication Number Publication Date
CN108805898A (en) 2018-11-13
CN108805898B (en) 2020-10-16

Family

ID=64089770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810551722.4A Active CN108805898B (en) 2018-05-31 2018-05-31 Video image processing method and device

Country Status (1)

Country Link
CN (1) CN108805898B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260548B (en) * 2018-11-30 2023-07-21 浙江宇视科技有限公司 Mapping method and device based on deep learning
CN111489320A (en) * 2019-01-29 2020-08-04 华为技术有限公司 Image processing method and device
CN110348369B (en) * 2019-07-08 2021-07-06 北京字节跳动网络技术有限公司 Video scene classification method and device, mobile terminal and storage medium
CN111882578A (en) * 2019-07-19 2020-11-03 广州虎牙科技有限公司 Foreground image acquisition method, foreground image acquisition device and electronic equipment
CN110276739B (en) * 2019-07-24 2021-05-07 中国科学技术大学 Video jitter removal method based on deep learning
CN112351221B (en) 2019-08-09 2024-02-13 北京字节跳动网络技术有限公司 Image special effect processing method, device, electronic equipment and computer readable storage medium
CN110838132B (en) * 2019-11-15 2022-08-05 北京字节跳动网络技术有限公司 Object segmentation method, device and equipment based on video stream and storage medium
CN112927144A (en) * 2019-12-05 2021-06-08 北京迈格威科技有限公司 Image enhancement method, image enhancement device, medium, and electronic apparatus
CN111028346B (en) * 2019-12-23 2023-10-10 北京奇艺世纪科技有限公司 Reconstruction method and device of video object
CN111464834B (en) * 2020-04-07 2023-04-07 腾讯科技(深圳)有限公司 Video frame processing method and device, computing equipment and storage medium
CN111901595B (en) * 2020-06-29 2021-07-20 北京大学 Video coding method, device and medium based on deep neural network
CN114494927A (en) * 2020-11-12 2022-05-13 阿里巴巴集团控股有限公司 Image processing method and device, electronic equipment and readable storage medium
CN114511481A (en) * 2020-11-16 2022-05-17 中兴通讯股份有限公司 Image fusion method, video image processing apparatus, and computer-readable storage medium
CN112837323A (en) * 2021-01-12 2021-05-25 全时云商务服务股份有限公司 Video processing method, system and storage medium based on portrait segmentation
CN115018877B (en) * 2021-03-03 2024-09-17 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for displaying special effects in ground area
CN113066092B (en) * 2021-03-30 2024-08-27 联想(北京)有限公司 Video object segmentation method and device and computer equipment
CN113313661B (en) * 2021-05-26 2024-07-26 Oppo广东移动通信有限公司 Image fusion method, device, electronic equipment and computer readable storage medium
CN113902760B (en) * 2021-10-19 2022-05-17 深圳市飘飘宝贝有限公司 Object edge optimization method, system, device and storage medium in video segmentation
CN114125462B (en) * 2021-11-30 2024-03-12 北京达佳互联信息技术有限公司 Video processing method and device
CN114549535A (en) * 2022-01-28 2022-05-27 北京百度网讯科技有限公司 Image segmentation method, device, equipment, storage medium and product
CN114693702B (en) * 2022-03-24 2023-04-07 小米汽车科技有限公司 Image processing method, image processing device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW577227B (en) * 2002-04-23 2004-02-21 Ind Tech Res Inst Method and apparatus for removing background of visual content

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102420985A (en) * 2011-11-29 2012-04-18 宁波大学 Multi-view video object extraction method
CN104410855A (en) * 2014-11-05 2015-03-11 广州中国科学院先进技术研究所 Jitter detection method of monitoring video
CN104966286A (en) * 2015-06-04 2015-10-07 电子科技大学 3D video saliency detection method
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
CN106412441A (en) * 2016-11-04 2017-02-15 珠海市魅族科技有限公司 Video anti-shake control method and terminal
CN107808389A (en) * 2017-10-24 2018-03-16 上海交通大学 Unsupervised methods of video segmentation based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Video Object Segmentation with Re-identification; Xiaoxiao Li et al.; CVPR 2017 Workshop, DAVIS Challenge on Video Object Segmentation 2017 (Winning Entry); 2017-12-31; pp. 1-6 *
3D Face Reconstruction Based on RGB-D Images; Wu Xinglong; China Masters' Theses Full-text Database (Information Science and Technology); 2013-09-15; I138-347 *
Object Detection Method Fusing Optical Flow Velocity and Background Modeling; Zhang Shuifa et al.; Journal of Image and Graphics; 2011-02-16; pp. 236-243 *

Also Published As

Publication number Publication date
CN108805898A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108805898B (en) Video image processing method and device
CN108830900B (en) Method and device for processing jitter of key point
CN107967693B (en) Video key point processing method and device, computing equipment and computer storage medium
CN109493417B (en) Three-dimensional object reconstruction method, device, equipment and storage medium
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
US11036975B2 (en) Human pose estimation
CN112651291B (en) Gesture estimation method and device based on video, medium and electronic equipment
CN107920257B (en) Video key point real-time processing method and device and computing equipment
CN110956131A (en) Single-target tracking method, device and system
CN112348828A (en) Example segmentation method and device based on neural network and storage medium
CN111062263A (en) Method, device, computer device and storage medium for hand pose estimation
WO2021217937A1 (en) Posture recognition model training method and device, and posture recognition method and device
CN111144398A (en) Target detection method, target detection device, computer equipment and storage medium
EP4290459A1 (en) Augmented reality method and related device thereof
US12020508B2 (en) Systems and methods for predicting elbow joint poses
CN114937125B (en) Reconstructable metric information prediction method, reconstructable metric information prediction device, computer equipment and storage medium
CN110824496B (en) Motion estimation method, motion estimation device, computer equipment and storage medium
CN115564639A (en) Background blurring method and device, computer equipment and storage medium
CN114663598A (en) Three-dimensional modeling method, device and storage medium
CN114202554A (en) Mark generation method, model training method, mark generation device, model training device, mark method, mark device, storage medium and equipment
CN116524088A (en) Jewelry virtual try-on method, jewelry virtual try-on device, computer equipment and storage medium
CN114998814B (en) Target video generation method and device, computer equipment and storage medium
CN110838138A (en) Repetitive texture detection method, device, computer equipment and storage medium
CN115294280A (en) Three-dimensional reconstruction method, apparatus, device, storage medium, and program product
CN111091022A (en) Machine vision efficiency evaluation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room B0035, 2nd floor, No. 3 Courtyard, 30 Shixing Street, Shijingshan District, Beijing, 100041
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: Room B0035, 2nd floor, No. 3 Courtyard, 30 Shixing Street, Shijingshan District, Beijing, 100041
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: Room B0035, 2nd floor, No. 3 Courtyard, 30 Shixing Street, Shijingshan District, Beijing, 100041
Patentee after: Douyin Vision Co.,Ltd.
Address before: Room B0035, 2nd floor, No. 3 Courtyard, 30 Shixing Street, Shijingshan District, Beijing, 100041
Patentee before: Tiktok vision (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20230712
Address after: 1309, 13th Floor, Building 4, Zijin Digital Park, Haidian District, Beijing, 100080
Patentee after: Beijing volcano Engine Technology Co.,Ltd.
Address before: Room B0035, 2nd floor, No. 3 Courtyard, 30 Shixing Street, Shijingshan District, Beijing, 100041
Patentee before: Douyin Vision Co.,Ltd.