WO2014205643A1 - Method and system capable of alignment of video frame sequences - Google Patents

Method and system capable of alignment of video frame sequences

Info

Publication number
WO2014205643A1
WO2014205643A1 (PCT/CN2013/077844)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
sequence
video
virtual
alignment
Prior art date
Application number
PCT/CN2013/077844
Other languages
French (fr)
Inventor
Jianping Song
Shilin Wang
Lin Du
Xiaojun Ma
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Priority to PCT/CN2013/077844 priority Critical patent/WO2014205643A1/en
Publication of WO2014205643A1 publication Critical patent/WO2014205643A1/en

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10: Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106: Processing image signals
    • H04N 13/167: Synchronising or controlling image signals


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for alignment of video frame sequences. The method comprises the steps of: determining an initial aligned frame in a video frame sequence with respect to a reference frame in a reference video frame sequence; and creating a virtual frame in the video frame sequence, which virtual frame is more temporally aligned to the reference frame than the initial aligned frame. The created virtual frame is selected as the frame in the video frame sequence corresponding to the reference frame in the reference video frame sequence.

Description

METHOD AND SYSTEM CAPABLE OF ALIGNMENT OF VIDEO FRAME SEQUENCES
TECHNICAL FIELD
The present invention relates to a method and system capable of alignment of video frame sequences.
BACKGROUND ART
Multiple cameras can be utilized in a number of applications such as multi-view TV, free viewpoint TV, 3D teleconference and 3D surveillance. In a 3D application, an accurate 3D scene can be presented only when the image of the left view and the image of the right view are of the same scene captured at the same time. However, in an implementation of actual 3D applications, temporal latency and spatial shift may be introduced between video frame sequences. Typically, a temporal misalignment between video frame sequences occurs when the input sequences have different frame rates (e.g., NTSC and PAL), or when there is a time shift between the sequences (e.g., when the cameras are not activated simultaneously). On the other hand, a spatial misalignment results from the different positions, orientations and internal calibration parameters of all the cameras. Therefore, it would be necessary to establish both the temporal synchronization (temporal alignment) and the spatial alignment between the video frame sequences. This establishment would be required in applications such as tele-immersion, video-based surveillance, panoramic video mosaicing and video metrology.
While a hardware synchronization method can embed timestamps into each video frame sequence on-the-fly and requires no post-processing, it requires specialized hardware as well as setting up a camera network in advance. On the other hand, a computer vision-based software synchronization algorithm can be used to perform post-processing on video frame sequences recorded by cameras that are not networked, such as common consumer hand-held video cameras or cameras embedded in mobile phones, or to synchronize historical videos for which hardware synchronization is not applicable.
The alignment may include shifting and correcting spatial and temporal detection of the video sequences. Many alignment methods exist, one of which is a spatio-temporal alignment method. The spatio-temporal alignment method can be divided into two main classes: a feature-based method and a direct method. The feature-based method uses detected features as the main input for alignment (e.g., two-frame feature correspondences or multi-frame feature trajectories). From the feature correspondences or trajectories, an algebraic or geometric error is computed and used as a measure of synchronization. On the other hand, the direct method relies on colors, intensities and intensity gradients to determine the spatio-temporal alignment of overlapping video frame sequences. As a result, the direct method tends to align sequences more accurately when their intensities are similar, while the feature-based method is appropriate when scene appearance varies greatly from sequence to sequence (e.g., due to wide baselines, different magnification, or cameras with distinct spectral sensitivities).
Existing alignment methods merely implement inter-frame alignment. That is, for each frame in one video sequence, a corresponding frame in the other video sequence is found and transformed so that the two frames are aligned both in time and space. The frame rate of a video frame sequence is typically 25 or 30 frames per second. This means there may be a time shift of up to 0.5 frame, i.e., up to 20 milliseconds (= 1/25 x 1000 / 2 at 25 frames per second), between two aligned frames. For some applications, such a time shift may cause problems. For example, in a 3D reconstruction application, the depth map should be built based on the left and right views. A time shift of up to 0.5 frame, or 20 milliseconds, between the left and right views may observably impact the precision of the depth map.
A method for video synchronization is shown in the technical paper: Xiaochun Cao, Lin Wu, Jiangjian Xiao, H. Foroosh, Jigui Zhu and Xiaohong Li, "Video synchronization and its application to object transfer", Image and Vision Computing, 2010. A method for video frame alignment is shown in the technical paper: S. Kuthirummal, C.V. Jawahar and P.J. Narayanan, "Video frame alignment in multiple views", Proceedings 2002 International Conference on Image Processing, 2002. Also, a method for spatio-temporal alignment of sequences is shown in the technical paper: Y. Caspi and M. Irani, "Spatio-temporal alignment of sequences", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(11), pages 1409-1424, November 2002.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, a method for alignment of video frame sequences may comprise the steps of: determining an initial aligned frame in a video frame sequence with respect to a reference frame in a reference video frame sequence; and creating a virtual frame in the video frame sequence, which virtual frame is more temporally aligned to the reference frame than the initial aligned frame. The created virtual frame may be selected as the frame in the video frame sequence corresponding to the reference frame in the reference video frame sequence.
According to another aspect of the present invention, a system for alignment of video frame sequences may comprise a processor which is configured to perform the processes of: determining an initial aligned frame in a video frame sequence with respect to a reference frame in a reference video frame sequence; and creating a virtual frame in the video frame sequence which virtual frame is more temporally aligned to the reference frame than the initial aligned frame. The created virtual frame may be selected as the frame in the video frame sequence corresponding to the reference frame in the reference video frame sequence. The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects, features and advantages of the present invention will become apparent from the following description in connection with the accompanying drawings in which:
Fig. 1A illustrates an example of an array of video cameras shooting the same scene;
Fig. 1B illustrates the reference frame f_r in the reference sequence as well as the nearest initial aligned frame f_i with respect to the reference frame f_r and the created virtual frame t_i in the i-th sequence;
FIG. 2 is a flow chart illustrating the processes for alignment of video frame sequences according to an embodiment of the present invention; and
Fig. 3 is a block diagram schematically illustrating a system 300 for performing the processes for the alignment of video frame sequences according to an embodiment of the present invention.
DETAILED DESCRIPTION
In the following description, various aspects of an embodiment of the present invention will be described. For the purpose of explanation, specific configurations and details are set forth in order to provide a thorough understanding. However, it will also be apparent to one skilled in the art that the present invention may be implemented without the specific details presented herein.
Fig. 1A illustrates an example of an array of video cameras shooting the same scene. In the example shown in Fig. 1A, a single dynamic scene 105 is shot simultaneously by N respective cameras 110a, 110b, 110c, ... 110n arranged at distinct viewpoints. It is assumed that each camera captures frames at a constant, but unknown, frame rate. It is also assumed that the cameras are unsynchronized, i.e., they began capturing video frames at different times, possibly with distinct frame rates. In order to temporally align the resulting video frame sequences, the correspondence relationship between frame numbers in one "reference" sequence and frame numbers in all other sequences should be determined. This correspondence can be expressed as a set of linear equations,

t_i = α_i f_r + β_i    Eq. (1)

where f_r denotes the frame numbers of the reference sequence, and t_i denotes the aligned frame numbers of the i-th sequence. α_i and β_i are unknown constants representing the temporal dilation and the temporal shift, respectively, between the i-th sequence and the reference sequence. In general, α_i and β_i are not necessarily integers. However, f_r must be an integer. As a result, the computed frame number t_i is not necessarily an integer. If t_i is not an integer, it implies that the most temporally aligned frame of the i-th sequence, with respect to reference frame f_r, is not captured by camera i. Therefore, frame t_i may be a virtual frame that is not part of the sequence captured by camera i. However, existing alignment techniques handle only captured frames. Therefore, existing alignment techniques may find that frame f_i is the nearest frame to the reference frame f_r, where f_i usually is the rounding of t_i to the nearest integer. Therefore,

f_i ≤ t_i ≤ f_i + 1, if f_i < t_i    Eq. (2)

or

f_i - 1 ≤ t_i ≤ f_i, if f_i > t_i    Eq. (3)
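To make Eq. (1) concrete, here is a minimal Python sketch that maps a reference frame number to its aligned (generally non-integer) frame time t_i and to the nearest captured frame f_i. The constants alpha_i and beta_i are assumed to have been obtained already by some synchronization technique, and the numeric values below are purely illustrative.

```python
def aligned_frame_time(f_r: int, alpha_i: float, beta_i: float) -> float:
    """Eq. (1): time, in frame units, of the i-th sequence frame aligned
    with reference frame f_r."""
    return alpha_i * f_r + beta_i

def initial_aligned_frame(f_r: int, alpha_i: float, beta_i: float) -> int:
    """Nearest captured frame f_i: the rounding of t_i per Eqs. (2)/(3)."""
    return round(aligned_frame_time(f_r, alpha_i, beta_i))

# Illustrative constants: a 30 fps sequence against a 25 fps reference,
# with a temporal shift of 7.3 frames.
alpha_i, beta_i = 30.0 / 25.0, 7.3
for f_r in (100, 101, 102):
    t_i = aligned_frame_time(f_r, alpha_i, beta_i)
    print(f"f_r={f_r}: t_i={t_i:.2f}, f_i={initial_aligned_frame(f_r, alpha_i, beta_i)}")
```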
Fig. 1B illustrates the reference frame f_r in the reference sequence as well as the nearest initial aligned frame f_i with respect to the reference frame f_r and the created virtual frame t_i in the i-th sequence, which case is represented by the above described equation (3).
In an embodiment of the present invention, an existing video alignment technique or a combination of existing video alignment techniques is performed to determine the initial frame f_i. A feature-based alignment approach is one such video alignment technique. This approach comprises first extracting distinctive features from each frame, matching these features in order to establish a global correspondence, and then estimating the geometric transformation between the feature images. This kind of approach has been used since the early days of stereo matching and has more recently gained popularity for image stitching applications.
Another video alignment technique is known as direct alignment. This technique uses pixel intensities in a video frame for synchronization and is suitable for videos containing significant lighting changes, e.g., fireworks or flickering fire. For example, a direct alignment algorithm may synchronize two sequences in a coarse-to-fine manner. Firstly, a Gaussian sequence pyramid is computed, which is the video sequence equivalent of a Gaussian image pyramid. At each level of the pyramid, an iterative algorithm minimizes the sum-of-squared differences in pixel intensities between sequences according to the current estimate of the spatio-temporal model.
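As an illustration of this coarse-to-fine idea, the sketch below searches for the integer time shift that minimizes the summed squared intensity difference between two sequences after spatial downsampling. The 2x2 box averaging is a crude stand-in for a true Gaussian pyramid level, and all parameter values are assumptions of this sketch.

```python
import numpy as np

def coarse_frame(frame: np.ndarray, levels: int = 2) -> np.ndarray:
    """Crude stand-in for a Gaussian pyramid level: repeated 2x2 averaging."""
    f = frame.astype(np.float64)
    for _ in range(levels):
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
        f = f[:h, :w]
        f = 0.25 * (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2])
    return f

def best_integer_shift(ref_seq, other_seq, max_shift: int = 10) -> int:
    """Return the integer time shift minimizing the mean sum-of-squared
    differences over the temporally overlapping, downsampled frames."""
    ref = [coarse_frame(f) for f in ref_seq]
    oth = [coarse_frame(f) for f in other_seq]
    best, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(r, r + shift) for r in range(len(ref))
                 if 0 <= r + shift < len(oth)]
        if not pairs:
            continue
        cost = sum(np.sum((ref[a] - oth[b]) ** 2) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = shift, cost
    return best
```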
Once the initial aligned frame f_i is found, an iterative procedure is performed to refine the alignment and to look for the most aligned virtual frame t_i.
Fig. 2 is a flow chart illustrating the processes for alignment of video frame sequences according to an embodiment of the present invention.
At step 210, the differences D(f_i - 1), D(f_i) and D(f_i + 1) between the reference frame f_r and frames f_i - 1, f_i and f_i + 1 of the i-th sequence are computed, respectively. Note that frames f_i - 1 and f_i + 1 are the previous frame and the next frame of frame f_i, respectively. The difference is a numerical value that indicates the degree of difference between two frames: f_r and f_i - 1, f_r and f_i, and f_r and f_i + 1. According to an embodiment of the present invention, any number of difference estimation techniques, or combination of techniques, may be used to determine the difference values.
One such difference estimation technique is known as the pixel-wise difference magnitude ("PDM"). The PDM calculates the sum of the differences of all pixel values of two corresponding frames, such as the frame f_i and the reference frame f_r. The pixel difference is typically squared, or its absolute value is taken, to result in a positive magnitude before summation.
Another difference estimation technique is known as the normalized pixel difference ("NPD"). The NPD determines the difference by normalizing or scaling the pixels from both source frames to correct for camera intensity or color differences. Then the difference magnitude may be calculated using another difference estimation technique, such as the PDM.
Another technique is known as frequency-domain comparison ("FDC"). Using this technique, a spatial transform such as a two-dimensional Fourier transform or wavelet transform may be used to parameterize image regions into transform coefficients. The magnitude difference can be computed between the transform coefficients, which can be selected or weighted to favor or discard particular spatial frequencies.
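As an illustration only, the following Python sketch gives minimal implementations of the three difference measures named above for grayscale frames. The zero-mean, unit-variance normalization in npd and the choice of retained low-frequency Fourier coefficients in fdc are assumptions of this sketch, not prescriptions of the embodiment.

```python
import numpy as np

def pdm(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Pixel-wise difference magnitude: sum of squared pixel differences."""
    d = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    return float(np.sum(d * d))

def npd(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Normalized pixel difference: scale both frames to zero mean and
    unit variance (one possible normalization) before applying the PDM."""
    def normalize(f: np.ndarray) -> np.ndarray:
        f = f.astype(np.float64)
        return (f - f.mean()) / (f.std() + 1e-12)
    return pdm(normalize(frame_a), normalize(frame_b))

def fdc(frame_a: np.ndarray, frame_b: np.ndarray, keep: int = 16) -> float:
    """Frequency-domain comparison: compare the magnitudes of the lowest
    keep x keep 2D Fourier coefficients, favoring low spatial frequencies.
    Assumes the frames are at least keep x keep pixels."""
    fa = np.abs(np.fft.fft2(frame_a.astype(np.float64)))[:keep, :keep]
    fb = np.abs(np.fft.fft2(frame_b.astype(np.float64)))[:keep, :keep]
    return float(np.sum((fa - fb) ** 2))
```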
At step 220, the frame with the lesser difference value is selected from frames f_i - 1 and f_i + 1. The selected frame and frame f_i are denoted as frame p and frame q, respectively, such that frame p is temporally prior to frame q. Therefore, if the difference value D(f_i - 1) is less than D(f_i + 1), frame p denotes frame f_i - 1 and frame q denotes frame f_i. On the other hand, if the difference value D(f_i - 1) is greater than D(f_i + 1), frame p denotes frame f_i and frame q denotes frame f_i + 1.
At step 230, the terminating condition of the iterative process is checked. If the difference between the difference values D(p) and D(q) is less than a predefined threshold value δ ("Yes"), it means that frame p and frame q are similar enough. Therefore, the process to refine the alignment can terminate, and the process is passed to step 280 to select the most aligned virtual frame t_i. If the difference between the difference values D(p) and D(q) is greater than the predefined threshold value δ ("No"), the process will continue and try to find a more aligned frame. The predefined threshold value δ may be changed by a user via a user interface of the system described below. At step 240, the motion between frame p and frame q is estimated. Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence. Motion estimation is a key part of video compression as used by MPEG-1, -2 and -4 as well as many other video codecs. It should be noted that the motion need not be estimated at each iteration; the motion estimated at the last iteration can be used to quickly derive the motion between frame p and frame q. A simple block-matching sketch of this step is given below.
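The embodiment does not mandate a particular motion estimator; as one hedged possibility for step 240, the sketch below performs exhaustive block matching with a sum-of-absolute-differences cost. Block size and search range are illustrative parameters.

```python
import numpy as np

def block_motion(prev: np.ndarray, curr: np.ndarray,
                 block: int = 16, search: int = 8) -> np.ndarray:
    """Exhaustive block matching: for each block of `prev`, find the
    displacement within +/- `search` pixels minimizing the sum of
    absolute differences ("SAD") in `curr`. Returns one (dy, dx)
    motion vector per block."""
    H, W = prev.shape
    p = prev.astype(np.float64)
    c = curr.astype(np.float64)
    vectors = np.zeros((H // block, W // block, 2), dtype=np.int32)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            ref_blk = p[y0:y0 + block, x0:x0 + block]
            best, best_cost = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue  # candidate block falls outside the frame
                    cost = np.abs(ref_blk - c[y:y + block, x:x + block]).sum()
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            vectors[by, bx] = best
    return vectors
```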
At step 250, a motion compensation technique is used to create a frame s from frame p with half of the estimated motion. Motion compensation is a key part of video compression as used by MPEG-1, -2 and -4 as well as many other video codecs. At step 260, the difference value D(s) between the reference frame f_r and the synthesized frame s is estimated. Note that frame s is synthesized from frame p and the estimated motion, and the difference value D(p) between the reference frame f_r and frame p is already computed. Therefore, the difference value D(s) can be computed from frame p and the estimated motion, which may be more efficient than computing D(s) directly from frame s. At step 270, the denotation of frames p and q is updated so that whichever of frames p and q has the greater difference value is discarded. The remaining frame and the synthesized frame s are denoted as the new frames p and q, such that the new frame p is still temporally prior to the new frame q. Therefore, if D(p) is less than D(q), frame q is discarded and the new frame q is the synthesized frame s; otherwise frame p is discarded and the new frame p is the synthesized frame s.
Once the denotation of frames p and q is updated, the process is passed again to step 230 to check the difference between the new frames p and q. If the difference between D(p) and D(q) is less than the predefined threshold value δ ("Yes"), it means that the new frame p and the new frame q are similar enough, and the process to refine the alignment can terminate; the process is then passed to step 280 to select the most aligned frame t_i. If the difference between D(p) and D(q) is greater than the predefined threshold value δ ("No"), the process will continue and try to find a more aligned frame in another iteration.
Finally, at step 280, the frame with the lesser difference value is selected from frames p and q. According to an embodiment of the present invention, this frame is the most temporally aligned frame of the i-th video sequence with respect to the reference frame f_r of the reference sequence. This frame is denoted as t_i. This created virtual frame t_i is selected as the frame in the i-th video frame sequence corresponding to the reference frame f_r in the reference video frame sequence. In the presentation of the i-th video frame sequence, the virtual frame t_i may be presented while the initial aligned frame f_i may be omitted from the presentation. The complete refinement loop is sketched below.
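For concreteness, the refinement loop of steps 210-280 can be expressed as a bisection over candidate frames, as in the following sketch. Here diff may be any difference measure such as pdm above, and average_mid is a deliberately naive stand-in for the motion-compensated halfway synthesis of steps 240-260: a faithful implementation would warp frame p by half the estimated motion and reuse cached difference values rather than recomputing them.

```python
import numpy as np

def refine_alignment(ref, prev_f, f, next_f, diff, synth_mid,
                     delta=1e-3, max_iter=8):
    """Illustrative sketch of steps 210-280 in Fig. 2 (not normative).
    ref:       reference frame f_r
    prev_f, f, next_f: frames f_i - 1, f_i, f_i + 1 of the i-th sequence
    diff:      difference measure, e.g. pdm(a, b)
    synth_mid: synthesizes a frame halfway between two frames
    delta:     termination threshold on |D(p) - D(q)|
    """
    # Steps 210-220: pick the neighbor with the smaller difference,
    # keeping frame p temporally prior to frame q.
    if diff(ref, prev_f) < diff(ref, next_f):
        p, q = prev_f, f
    else:
        p, q = f, next_f
    for _ in range(max_iter):
        d_p, d_q = diff(ref, p), diff(ref, q)
        # Step 230: terminate when p and q are similar enough.
        if abs(d_p - d_q) < delta:
            break
        # Steps 240-260: synthesize the halfway frame s and score it.
        s = synth_mid(p, q)
        # Step 270: discard the worse endpoint and keep s, preserving order.
        if d_p < d_q:
            q = s
        else:
            p = s
    # Step 280: the frame with the lesser difference is the virtual t_i.
    return p if diff(ref, p) <= diff(ref, q) else q

def average_mid(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Naive halfway synthesis by averaging; an assumed stand-in for
    motion-compensated interpolation with half of the estimated motion."""
    return 0.5 * (p.astype(np.float64) + q.astype(np.float64))
```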
In the same manner, the most temporally aligned frames of the other video sequences with respect to the reference frame f_r of the reference sequence can be found.
In a variant of the embodiment, the alignment process described above may be applied to every frame of the reference video sequence. Therefore, for each frame of the reference sequence, an aligned frame is calculated in each of the other sequences. As a result, N - 1 new sequences are calculated that are aligned with the reference sequence frame by frame.
In still another variant of the embodiment, instead of computing the aligned frame for each frame of the reference sequence, only a portion of the frames of the reference sequence is selected and their corresponding aligned frames are computed. In fact, it can be observed in Eq. (1) that only α_i and β_i are unknown constants. Therefore, if two aligned frames t_i' and t_i'' are already computed with respect to two reference frames f_r' and f_r'', then

t_i' = α_i f_r' + β_i    Eq. (4)

and

t_i'' = α_i f_r'' + β_i    Eq. (5)

According to Eqs. (4) and (5), the constants α_i and β_i can be calculated. Then, for any reference frame f_r, the time of frame t_i can be calculated according to Eq. (1). Therefore, for any reference frame f_r, the virtual aligned frame t_i can be created once the constants α_i and β_i become known.
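A two-line solver makes this concrete; the following is a sketch under the linear model of Eq. (1), with purely illustrative numbers.

```python
def solve_alignment_model(f_r1: int, t_i1: float,
                          f_r2: int, t_i2: float) -> tuple[float, float]:
    """Recover alpha_i and beta_i of Eq. (1) from two aligned pairs,
    following Eqs. (4) and (5)."""
    alpha_i = (t_i2 - t_i1) / (f_r2 - f_r1)
    beta_i = t_i1 - alpha_i * f_r1
    return alpha_i, beta_i

# Illustrative values: two refined alignments determine the whole model.
alpha_i, beta_i = solve_alignment_model(100, 127.3, 200, 247.3)
t_i_150 = alpha_i * 150 + beta_i  # virtual frame time for reference frame 150
print(alpha_i, beta_i, t_i_150)   # approximately 1.2, 7.3, 187.3
```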
In still another variation of the embodiment, instead of using the same sequence as the reference sequence throughout, an already aligned sequence can be used as the reference sequence when aligning another sequence. For example, the system may use sequence i as the reference sequence to align sequence j, and then use the aligned sequence j as the reference to align sequence k.

Fig. 3 is a block diagram schematically illustrating a system 300 for performing the processes for alignment of video frame sequences according to an embodiment of the present invention, as described above in connection with the flow chart shown in Fig. 2.
The system 300 illustrated in Fig. 3 may include a CPU (Central Processing Unit) 310, a storage unit 320, a user interface module 330, and an interface (I/F) module 340 connected via a bus 350. A memory (not shown) such as a RAM (Random Access Memory) may also be connected to the CPU 310 via a direct connection or via the bus 350. The connection between the CPU 310 and other parts of the system 300 may be a direct connection, and the connection is not limited to one using the bus 350.
The CPU 310 is an example of a processing unit that controls the processes as described above in connection with the flow chart shown in Fig. 2. The CPU 310 is configured to perform the processes of "video synchronization" for determining an initial frame, as discussed above in connection with Figs. 1A and 1B, and "frame alignment" of video frame sequences, including steps 210-280, as described above in connection with the flow chart shown in Fig. 2. The storage unit 320 may store one or more programs to be executed by the CPU 310, and various data including the reference video frame sequence, the other video frame sequences and intermediate data obtained during computations performed by the CPU 310. The storage unit 320 may be formed by a semiconductor memory device, a storage unit that uses a magnetic recording medium, an optical storage unit that uses an optical recording medium, a magneto-optical storage unit that uses a magneto-optic recording medium, or any combination of such devices or units. Examples of the user interface module 330 include a keyboard and the like, to be operated by the user when inputting data, instructions, and the like to the system 300. The user can change the threshold value δ with the user interface module 330 as mentioned above in connection with step 230.
The I/F module 340 may provide an interface between the system 300 and an external apparatus (not illustrated) via a cable interface, a wireless interface, or a combination of cable and wireless interfaces. The I/F module 340 may be connected to a network, such as the Internet, and the system 300 may receive data (for example, the reference video frame sequence and the other video frame sequences as source sequences), instructions, programs, and the like via the I/F module 340. Also, the system 300 may output data such as aligned sequences via the I/F module 340.
A program may cause a computer or a processing unit, such as the CPU 310, to execute the video sequence alignment process. Such a program may be stored in any suitable non-volatile computer-readable storage medium, including a semiconductor memory device, a magnetic recording medium, an optical recording medium, and a magneto-optic recording medium. All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

Claims

1. A method for alignment of video frame sequences, comprising the steps of:
determining an initial aligned frame in a video frame sequence with respect to a reference frame in a reference video frame sequence; and
creating a virtual frame in the video frame sequence which virtual frame is more temporally aligned to the reference frame than the initial aligned frame,
wherein the created virtual frame is selected as the frame in the video frame sequence corresponding to the reference frame in the reference video frame sequence.
2. The method as claimed in claim 1, wherein the creating step is repeated to create the virtual frame so that a difference value between the reference frame and a newly created virtual frame becomes smaller than a difference value between the reference frame and a previously created virtual frame until the difference value between the reference frame and the newly created virtual frame becomes smaller than a predetermined value.
3. The method as claimed in claim 2, wherein the virtual frame is created so that a motion between the newly created virtual frame and the previously created frame becomes a half of a motion between the previously created frame and another previously created frame.
4. The method as claimed in claim 1, wherein the determining step and the creating step are applied for every reference frame in the reference sequence to create virtual frames in the video frame sequence corresponding to the reference frames, respectively.
5. A system for alignment of video frame sequences comprising a processor which is configured to perform the processes of:
determining an initial aligned frame in a video frame sequence with respect to a reference frame in a reference video frame sequence; and
creating a virtual frame in the video frame sequence which virtual frame is more temporally aligned to the reference frame than the initial aligned frame,
wherein the created virtual frame is selected as the frame in the video frame sequence corresponding to the reference frame in the reference video frame sequence.
6. The system as claimed in claim 5, wherein the creating process is repeated to create the virtual frame so that a difference value between the reference frame and a newly created virtual frame becomes smaller than a difference value between the reference frame and a previously created virtual frame until the difference value between the reference frame and the newly created virtual frame becomes smaller than a predetermined value.
7. The system as claimed in claim 6, wherein the virtual frame is created so that a motion between the newly created virtual frame and the previously created frame becomes a half of a motion between the previously created frame and another previously created frame.
8. The system as claimed in claim 5, wherein the determining process and the creating process are applied for every reference frame in the reference sequence to create virtual frames in the video frame sequence corresponding to the reference frames, respectively.
PCT/CN2013/077844 2013-06-25 2013-06-25 Method and system capable of alignment of video frame sequences WO2014205643A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/077844 WO2014205643A1 (en) 2013-06-25 2013-06-25 Method and system capable of alignment of video frame sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/077844 WO2014205643A1 (en) 2013-06-25 2013-06-25 Method and system capable of alignment of video frame sequences

Publications (1)

Publication Number Publication Date
WO2014205643A1 (en)

Family

ID=52140765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/077844 WO2014205643A1 (en) 2013-06-25 2013-06-25 Method and system capable of alignment of video frame sequences

Country Status (1)

Country Link
WO (1) WO2014205643A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110043691A1 (en) * 2007-10-05 2011-02-24 Vincent Guitteny Method for synchronizing video streams
US20120087571A1 (en) * 2010-10-08 2012-04-12 Electronics And Telecommunications Research Institute Method and apparatus for synchronizing 3-dimensional image
WO2012101542A1 (en) * 2011-01-28 2012-08-02 Koninklijke Philips Electronics N.V. Motion vector based comparison of moving objects


Similar Documents

Publication Publication Date Title
CN108886598B (en) Compression method and device of panoramic stereo video system
US20140055632A1 (en) Feature based high resolution motion estimation from low resolution images captured using an array source
EP1661384B1 (en) Semantics-based motion estimation for multi-view video coding
US9485495B2 (en) Autofocus for stereo images
EP2214137B1 (en) A method and apparatus for frame interpolation
WO2016074639A1 (en) Methods and systems for multi-view high-speed motion capture
Liu et al. Joint subspace stabilization for stereoscopic video
US8736669B2 (en) Method and device for real-time multi-view production
US8531505B2 (en) Imaging parameter acquisition apparatus, imaging parameter acquisition method and storage medium
KR20130112311A (en) Apparatus and method for reconstructing dense three dimension image
JP2009520975A (en) A method for obtaining a dense parallax field in stereo vision
KR100943635B1 (en) Method and apparatus for generating disparity map using digital camera image
Argyriou et al. Image, video and 3D data registration: medical, satellite and video processing applications with quality metrics
Sun et al. Rolling shutter distortion removal based on curve interpolation
JP2004356747A (en) Method and apparatus for matching image
WO2014205643A1 (en) Method and system capable of alignment of video frame sequences
KR20110133677A (en) Method and apparatus for processing 3d image
KR20070107543A (en) Apparatus for realtime-generating a depth-map by processing streaming streo images
WO2017166081A1 (en) Image registration method and device for terminal, and terminal
Asikuzzaman et al. Object-based motion estimation using the EPD similarity measure
Congote et al. Real-time depth map generation architecture for 3d videoconferencing
Fedak et al. Image and video super-resolution via accurate motion estimation
Yuan et al. A generic video coding framework based on anisotropic diffusion and spatio-temporal completion
KR101344943B1 (en) System for real-time stereo matching
Chen et al. A shape-adaptive low-complexity technique for 3D free-viewpoint visual applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13888364

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13888364

Country of ref document: EP

Kind code of ref document: A1