WO2019157922A1 - Image processing method and apparatus, and AR device - Google Patents

Image processing method and apparatus, and AR device

Info

Publication number
WO2019157922A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
image
frame image
pose
coordinate
Prior art date
Application number
PCT/CN2019/072918
Other languages
English (en)
French (fr)
Inventor
李中源
刘力
张小军
Original Assignee
视辰信息科技(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 视辰信息科技(上海)有限公司
Publication of WO2019157922A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • Embodiments of the present invention relate to the field of computer vision, and in particular to an image processing method and apparatus, and an AR device.
  • Target tracking is one of the hotspots of computer vision research and is widely applied, for example in video surveillance, traffic flow monitoring, driverless driving, face recognition, and augmented reality (AR).
  • For example, a camera's tracking focus and a drone's automatic target tracking both require target tracking technology; tracking of specific objects, such as human body tracking, vehicle tracking in traffic monitoring systems, face tracking, and gesture tracking in intelligent interaction systems, likewise relies on target tracking technology.
  • Target tracking means establishing, in a continuous video sequence, the positional relationship of the object to be tracked and obtaining its complete motion trajectory: given the target's coordinate position in the first frame, compute the target's exact position in the next frame image.
  • During motion the target may exhibit changes in the image, such as changes of position or shape, changes of scale, background occlusion, or changes of illumination brightness. Research on target tracking algorithms also revolves around handling these changes and the specific applications.
  • A complete target tracking procedure first detects where the target is and then tracks it. Setting aside the computational cost, replacing tracking with detection is more accurate, i.e., determining the target position by detection in every frame; however, the computation required for detection generally far exceeds that required for tracking. Under existing hardware conditions, applications on the market mostly perform target tracking by tracking.
  • Whole-image alignment is better able to provide an accurate initial value for subsequent feature tracking under large displacement and jitter, ensuring that feature tracking succeeds and outputs the final precise position.
  • Whole-image alignment iteratively aligns the images of two consecutive frames; it needs neither feature information extracted from the images nor information such as a salient plane. Because the whole image contains both target information and background information, whole-image alignment is disturbed by the non-target area (i.e., the background). Moreover, because the camera or the object moves, the two frames are not fully consistent; when their background difference is too large, the tracking success rate of prior-art whole-image alignment is low.
  • Aspects of the present invention provide an image processing method and apparatus, and an AR device, which can improve the tracking success rate of the entire system.
  • An aspect of the present invention provides an image processing method, including: determining a consecutive first frame image and second frame image in a continuous sequence of video frame images, wherein the first frame image is the previous frame image of the second frame image; determining, with the target area of the first frame image as a template, position information of the target area in the second frame image; and performing, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image; wherein the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, a function satisfying the following conditions is the robust error function:
  • a) ρ(t,σ) is always greater than zero for any t; b) ρ(t,σ) is monotonically decreasing for t ≤ 0; c) ρ(t,σ) is monotonically increasing for t ≥ 0; d) ρ(t,σ) is piecewise differentiable; e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude grows more slowly than t² and faster than |t|; where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the robust error function is the Huber function, ρ(t,σ) = t²/2 for |t| ≤ σ₁ and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁, or the Geman-McClure function, ρ(t,σ) = t² / (t² + σ₁²), where σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
  • Optionally, after the iterative processing is completed, the method further includes: tracking, in the second frame image, the location of at least one target in the target area by feature tracking and matching.
  • Optionally, the method further includes: determining pose information of a target feature on the first frame image, and determining the target area of the first frame image according to the pose information of the target feature.
  • Optionally, the first frame image is a template image, and the second frame image is a current frame image.
  • Another aspect of the present invention provides an image processing apparatus, comprising:
  • a transceiver, configured to acquire a continuous sequence of video frame images;
  • a processor, configured to determine a consecutive first frame image and second frame image in the continuous sequence of video frame images, the first frame image being the previous frame image of the second frame image; to determine, with the target area of the first frame image as a template, position information of the target area in the second frame image; and to perform, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image;
  • wherein the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, a function satisfying the following conditions is the robust error function:
  • a) ρ(t,σ) is always greater than zero for any t; b) ρ(t,σ) is monotonically decreasing for t ≤ 0; c) ρ(t,σ) is monotonically increasing for t ≥ 0; d) ρ(t,σ) is piecewise differentiable; e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude grows more slowly than t² and faster than |t|; where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • Optionally, the robust error function is the Huber function, ρ(t,σ) = t²/2 for |t| ≤ σ₁ and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁, or the Geman-McClure function, ρ(t,σ) = t² / (t² + σ₁²), where σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
  • Optionally, the processor is further configured to track, in the second frame image after the iterative processing is completed, the location of at least one target in the target area by feature tracking and matching.
  • Optionally, the processor is further configured to determine pose information of a target feature on the first frame image, and to determine the target area of the first frame image according to the pose information of the target feature.
  • Optionally, the first frame image is a template image, and the second frame image is a current frame image.
  • Another aspect of the present invention provides an AR device comprising any one of the foregoing image processing apparatuses.
  • The image processing method and apparatus and the AR device described above process images using a robust-function-based iterative method; the robust error function iteration provides a very good initial value for the final feature matching tracking, which greatly improves the success rate of feature matching tracking and thus the tracking success rate of the entire system.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
  • FIG. 2 compares a robust error function curve with a quadratic curve according to another embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the technical system of the AR capability spectrum according to another embodiment of the present invention.
  • As shown in FIG. 1, an embodiment of the present invention provides an image processing method. The image processing apparatus can read a sequence of video frame images from a video; the video may be recorded in real time, recorded in advance and stored on a specific device (disk, memory, etc.), or captured in real time by a photosensitive device such as a camera and fed into the image processing apparatus.
  • The image processing method may be executed on a chip with computing capability; that is, the image processing apparatus may be a computer or a portable mobile device (such as a mobile phone).
  • Step 101: Determine a consecutive first frame image and second frame image in a continuous sequence of video frame images.
  • The second frame image is the current frame image (current frame), and the first frame image is the previous frame image (last frame) of the second frame image and serves as the template frame image.
  • Step 102: Determine pose information of a target feature in the first frame image, and determine a target area according to the pose information of the target feature.
  • For example, the target area is extracted from the pose information of the previous frame; with this target area as a template, a search is performed in the low-resolution current frame image, and the pose information of the target area is determined and provided as the iteration initial value to the whole-image alignment processing flow.
  • The pose information includes position information and/or posture information; the posture information may include at least one of shape transformation information, scale change information, and rotation information, where the shape transformation information includes shape changes caused by viewing-angle transitions and the scale change information includes scale changes caused by distance changes.
  • In another embodiment of the present invention, the pose generally has six degrees of freedom, covering displacement and rotation; for example, the object undergoes a series of transformations such as rotation or perspective in the field of view, producing the pose information.
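  • As an illustration of the pose transformation function W(x; p), the following minimal Python sketch uses a 2D affine parameterization with six coefficients. This parameterization is an assumption for illustration only; the embodiments leave the concrete form of W (up to a full six-degree-of-freedom pose) unspecified.

```python
import numpy as np

def warp_affine(x, p):
    """One possible W(x; p): a 2-D affine warp.

    x : (2,) pixel coordinate (u, v) in the image.
    p : (6,) transformation coefficients; p = 0 gives the identity warp.
    Hypothetical parameterization -- the patent does not fix the form of W.
    """
    u, v = x
    A = np.array([[1.0 + p[0], p[2],       p[4]],
                  [p[1],       1.0 + p[3], p[5]]])
    return A @ np.array([u, v, 1.0])

# With p = 0 the warp is the identity: W(x; 0) = x.
print(warp_affine(np.array([10.0, 20.0]), np.zeros(6)))  # -> [10. 20.]
```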
  • Step 103: With the target area of the first frame image as a template, determine position information of the target area in the second frame image.
  • Step 104: With the position information as the iteration initial value, perform iterative processing for whole-image alignment of the first frame image and the second frame image.
  • For example, the iterative processing uses the Lucas-Kanade (LK) iterative algorithm, and the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function ρ(t, σ), where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • The specific implementation of the LK iterative algorithm is not limited and may include multiple concrete algorithms, for example the Forward Additive algorithm, the Inverse Compositional algorithm, the Forward Compositional algorithm, or the Efficient Second-order Minimization (ESM) algorithm.
  • The Lucas-Kanade (LK) iterative algorithm is a common optical flow algorithm; optical flow algorithms divide into dense and sparse optical flow algorithms. The algorithm used in this embodiment is a dense optical flow algorithm, for example the Inverse Compositional algorithm.
  • In this embodiment, the iterative processing for whole-image alignment of the first frame image and the second frame image may be iterated using a robust error function combined with an optical flow algorithm; in a preferred solution, it may be iterated using a robust error function combined with a dense optical flow algorithm.
  • In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the Δp transformation is applied to the first frame image and the p transformation to the second frame image, and the two transformed images are subtracted. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
  • In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the first frame image stays unchanged; the second frame image is transformed on the basis of the existing transformation coefficient p with Δp additionally superimposed, and is then subtracted from the first frame image. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
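  • For concreteness, the sketch below shows this idea in Python: a robustified forward-additive LK alignment reduced to a pure-translation warp W(x; p) = x + p, with Huber-style weights standing in for the robust error function. The translation-only warp, the threshold value, and the Gauss-Newton solver details are simplifying assumptions for illustration, not the patent's implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates, sobel

def robust_lk_translation(T, I, p0=(0.0, 0.0), sigma1=5.0, iters=50):
    """Whole-image alignment by robustified forward-additive LK,
    reduced to a pure-translation warp W(x; p) = x + p.

    T  : template image (previous frame), 2-D float array.
    I  : current frame, 2-D float array.
    p0 : initial displacement (the iteration initial value from
         the template search).
    Minimizes sum_x rho( I(W(x; p)) - T(x), sigma ) with Huber weights.
    """
    p = np.array(p0, dtype=float)
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]].astype(float)
    for _ in range(iters):
        # I(W(x; p)): sample the current frame at the shifted coordinates.
        Iw = map_coordinates(I, [ys + p[1], xs + p[0]], order=1)
        err = Iw - T                                    # per-pixel residual
        # Huber weights: full weight inside sigma1, down-weighted outside,
        # so background pixels with large residuals influence the sum less.
        a = np.abs(err)
        wts = np.where(a <= sigma1, 1.0, sigma1 / np.maximum(a, 1e-12))
        # Image gradients give the forward-additive Jacobian dI(x+p)/dp.
        gx = sobel(Iw, axis=1, mode='nearest') / 8.0
        gy = sobel(Iw, axis=0, mode='nearest') / 8.0
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)  # N x 2
        w = wts.ravel()
        H = J.T @ (J * w[:, None])                      # weighted Hessian
        b = J.T @ (w * err.ravel())
        dp = np.linalg.solve(H, -b)                     # Gauss-Newton step
        p += dp
        if np.linalg.norm(dp) < 1e-3:                   # converged: aligned
            break
    return p
```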
  • In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the second frame image is transformed with Δp and then with p, and is subtracted from the first frame image. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
  • In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
  • Σ_x ρ( I(W(x;p)) − T(x), σ )
  • where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the second frame image is transformed with p and subtracted from the first frame image, and p is analyzed and solved for via first- and second-order Taylor expansions. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
  • In the above embodiments, the objective function is the target of the iterative solution. In a preferred solution, the objective function is to be minimized, i.e., driven to its minimum value; in other words, the purpose of the iteration is to bring this expression to its minimum, and the iteration proceeds by updating the transformation parameter p with Δp.
  • The objective function iteratively updates the transformation coefficient p so that, after the second frame image I is transformed, the sum of its per-pixel differences from the first frame image T is minimized. Since the term is quadratic, its minimum value is 0; a minimum of zero means every pixel of I and T has the same value, i.e., the two images can be regarded as identical. In practical applications, however, I and T always differ somewhat, so the transformation coefficient p is determined iteratively such that I and T become as similar as possible.
  • In the above embodiments, the robust error function can take many forms; for example, the Huber function and the Geman-McClure function are both robust error functions. In fact, any function satisfying the following conditions can be called a robust error function:
  • a) ρ(t,σ) is always greater than zero for any t;
  • b) ρ(t,σ) is monotonically decreasing for t ≤ 0;
  • c) ρ(t,σ) is monotonically increasing for t ≥ 0;
  • d) ρ(t,σ) is piecewise differentiable;
  • e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude grows more slowly than t² and faster than |t|;
  • where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
  • In one embodiment, the robust error function may be the Huber function, i.e.:
  • ρ(t,σ) = t²/2 for |t| ≤ σ₁, and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁,
  • where t is the argument of the robust error function, σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
  • In another embodiment, the scale parameter vector σ is written in full as (σ₁, σ₂, σ₃, ..., σ_s)ᵀ; how many scale parameters there are and how they are chosen depend on the robust function itself. In a preferred solution, σ = σ₁.
  • In another embodiment, the robust error function may also be the Geman-McClure function, i.e.:
  • ρ(t,σ) = t² / (t² + σ₁²),
  • with σ = σ₁.
  • The growth in magnitude of a general ρ(t) approximates a negative log probability function P: ρ(t) ∝ −log P[ I(W(x;p)) − T(x) ], where t is the argument of the robust error function, p is the pose transformation coefficient, T(x) is the pixel value of coordinate x in the first frame image, and I(W(x;p)) is the pixel value on the second frame image after coordinate x undergoes the p pose transformation.
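  • In Python, the two penalties above can be written as follows. This is a minimal sketch using the textbook forms of the Huber and Geman-McClure functions; the exact constants of the published formulas appear only as images, so the forms below are assumptions consistent with the stated names.

```python
import numpy as np

def huber(t, sigma1):
    """Huber penalty rho(t, sigma): quadratic for small residuals,
    linear beyond the threshold sigma1, so its magnitude grows more
    slowly than t^2 outside the threshold."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= sigma1,
                    0.5 * t**2,
                    sigma1 * np.abs(t) - 0.5 * sigma1**2)

def geman_mcclure(t, sigma1):
    """Geman-McClure penalty: saturating, so gross outliers (e.g.
    background pixels that moved in or out) barely affect the sum."""
    t = np.asarray(t, dtype=float)
    return t**2 / (t**2 + sigma1**2)
```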
  • The pose in the previous frame (last frame), that is, the position of the target area (target), is known; the target area must then be accurately tracked to its position in the current frame image (current frame). The pose is generally output through feature track (feature point tracking and matching): by tracking feature points in the target area that have a certain distinctiveness, once the positions of these feature points in both frames are known, the pose in the current frame can be computed. These feature points are generally sparse, which satisfies real-time requirements on mobile devices.
  • After the iterative processing is completed, the location of at least one target in the target area is tracked in the second frame image by feature tracking and matching.
  • For example, after the iterative processing, feature tracking continues. The general flow of feature tracking is to search in the current frame near the position the corresponding feature point had in the previous frame, or to search near a position in the current frame predicted for that feature point by some prediction means. For real-time efficiency, the search radius is typically a few pixels.
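  • A minimal sketch of such a local search follows, assuming a sum-of-squared-differences (SSD) matcher over a small window; the patent does not specify the matching criterion, so SSD and the radius value are illustrative assumptions.

```python
import numpy as np

def track_feature(prev_patch, cur_img, prev_pos, radius=4):
    """Search for a feature near its previous (or predicted) position.

    prev_patch : small template around the feature in the last frame.
    cur_img    : current frame.
    prev_pos   : integer (row, col) of the feature in the last frame,
                 or a position predicted by some motion model.
    radius     : search radius in pixels (a few pixels, for real time).
    Returns the (row, col) in the current frame with the smallest SSD.
    """
    h, w = prev_patch.shape
    r0, c0 = prev_pos
    best, best_pos = np.inf, prev_pos
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0:
                continue                       # outside the image
            cand = cur_img[r:r + h, c:c + w]
            if cand.shape != prev_patch.shape:
                continue                       # window clipped at border
            ssd = np.sum((cand - prev_patch) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```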
  • As described above, the objective function drives the difference between corresponding pixels toward a minimum. When the two images are identical, the difference can reach 0, because every pixel has the same value. In practice, however, the two images are never fully consistent: for example, after a pose transformation (take the simple case of pure translation), part of the background moves out of the frame and some new background moves in. During alignment, the parts of the two frames showing the same scene can be brought to a very small difference, but the parts where the background itself differs (moved out or moved in) have, in theory, genuinely different pixel values, which no alignment operation can make equal. So in actual operation the objective function often cannot reach 0. Note that the iterative formula minimizes the sum of the differences over all pixels: regions of the two frames showing the same scene can be aligned successfully, so their difference is small; regions with different scenes cannot be aligned, so their difference is large.
  • The significance of introducing a robust error function is therefore: when the pixel difference between two points exceeds a certain range (the threshold on t in the formula, t being the pixel difference), the second branch of the robust error function is used for weighting (the magnitude of the second-branch formula grows more slowly than the quadratic term), so that the difference can still exert some influence on the convergence of the iteration but over a reduced range, thereby improving the success rate of whole-image alignment when the backgrounds are inconsistent.
  • FIG. 2 compares the robust error function curve with a quadratic curve: curve 1 is the quadratic function f(t) = t² and curve 2 is the robust error function of this embodiment. The abscissa t can be read as the difference between pixels, and the ordinate as the influence of that difference on the final iteration result. When the pixel difference is within a certain range, the two curves affect the iteration identically; but once the pixel difference reaches a certain level (the scene-inconsistent parts of the two frames), the influence of the quadratic term climbs rapidly while that of the robust error rises slowly, avoiding alignment failures caused by over-large pixel differences in scene-inconsistent regions dominating the iterative convergence.
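  • The effect is easy to check numerically; the small sketch below contrasts the quadratic penalty t² with a Huber penalty (σ₁ = 5 is an arbitrary illustrative threshold, not a value from the patent).

```python
sigma1 = 5.0
for t in [1.0, 5.0, 20.0, 100.0]:
    quad = t**2
    hub = 0.5 * t**2 if abs(t) <= sigma1 else sigma1 * abs(t) - 0.5 * sigma1**2
    print(f"t={t:6.1f}  quadratic={quad:10.1f}  huber={hub:8.1f}")
# At t = 100 the quadratic term contributes 10000.0 while Huber contributes
# 487.5, so mismatched background pixels no longer dominate the iteration.
```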
  • FIG. 3 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present invention.
  • the image processing apparatus 31 includes a transceiver 311, a processor 312, a memory 313, and a bus 314.
  • the transceiver 311, the processor 312, and the memory 313 communicate with each other through the bus 314.
  • In this embodiment of the present invention, the transceiver 311 includes a transmitting unit (for example, a transmitting circuit) and a receiving unit (for example, a receiving circuit).
  • In this embodiment of the present invention, the processor 312 may be a central processing unit (CPU); the processor 312 may also be another general-purpose control processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose control processor may be a microcontroller or any conventional control processor, such as a single-chip microcomputer.
  • The memory 313 is configured to store program code or instructions, the program code including computer operation instructions, and the processor 312 is configured to execute the program code or instructions stored in the memory 313, so that the transceiver 311, the processor 312, and the memory 313 perform the related functions described below.
  • The memory 313 may include volatile memory, for example a random access memory (RAM), which may include static RAM or dynamic RAM. The memory 313 may also include non-volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory. The memory 313 may also be an external flash memory, at least one disk storage, or a buffer.
  • The bus 314 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus system can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean there is only one bus or one type of bus.
  • The image processing apparatus 31 reads a sequence of video frame images from a video; the video may be recorded in real time, recorded in advance and stored on a specific device (disk, memory, etc.), or captured in real time by a photosensitive device such as a camera and fed into the image processing apparatus. The image processing method may be executed on a chip with computing capability; that is, the image processing apparatus 31 may be a computer or a portable mobile device (such as a mobile phone).
  • the transceiver 311 is configured to acquire a sequence of video frame images.
  • the processor 312 is configured to determine a continuous first frame image and a second frame image in a continuous sequence of video frame images.
  • The second frame image is the current frame image (current frame), and the first frame image is the previous frame image (last frame) of the second frame image and serves as the template frame image.
  • The processor 312 is further configured to determine pose information of a target feature in the first frame image, and to determine the target area according to the pose information of the target feature.
  • For example, the processor 312 extracts the target area from the pose information of the previous frame; with this target area as a template, it searches in the low-resolution current frame image, determines the pose information of the target area, and provides it as the iteration initial value to the whole-image alignment.
  • The pose information includes position information and/or posture information; the posture information may include at least one of shape transformation information, scale change information, and rotation information, where the shape transformation information includes shape changes caused mostly by viewing-angle transitions and the scale change information includes scale changes caused by distance changes.
  • In another embodiment of the present invention, the pose generally has six degrees of freedom, covering displacement and rotation; for example, the object undergoes a series of transformations such as rotation or perspective in the field of view, producing the pose information.
  • The processor 312 is further configured to determine, with the target area of the first frame image as a template, position information of the target area in the second frame image.
  • The processor 312 is further configured to perform, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image.
  • The processor 312 is further configured to track, in the second frame image after the iterative processing is completed, the location of at least one target in the target area by feature tracking and matching. For the specific feature tracking process performed by the processor 312, refer to the feature tracking process described for the method embodiment of FIG. 1; details are not repeated here.
  • The image processing method and apparatus respectively described in the above embodiments can implement iterative processing for whole-image alignment of two consecutive images, and can also be applied to augmented reality (AR) applications/devices.
  • In one embodiment, the technical system of the AR capability spectrum contained in an AR application/device, as shown in FIG. 4, mainly includes two core elements: a) reality perception, i.e., the ability to understand, recognize, and track the real world; and b) AR content, i.e., the ability to render, fuse, interact with, and author virtual content, where:
  • AR content is the next-generation content form after text, pictures, and video. Its two defining features are a high degree of 3D and strong interactivity. AR content is a critical link in the AR industry chain: its quantity and quality directly determine the end-user experience. How efficiently the production, storage, distribution, and exchange of AR content can be completed will play a decisive role in the prosperity of AR applications, so AR applications necessarily require AR content tools.
  • Reality perception means perceiving the spatial environment and object targets in the real environment through hardware devices such as cameras and sensors; that is, giving a mobile phone or AR glasses the ability to understand reality visually, the way humans do.
  • In a preferred solution, reality perception can be further divided into spatial perception and object perception. The space in reality perception refers to a small-scale environment that is relatively static within a larger-scale environment; for example, if the large-scale environment is the entire earth, then land, a country, a city, a commercial district, a room, or a desktop can, under certain conditions, all be regarded as static spatial environments.
  • The object target in object perception refers to an object that is frequently in motion relative to the large-scale environment. With perception of dynamic object targets, virtual content can move along with a moving dynamic target; for example, a virtual character standing on a card can move as the card moves, so that the two appear to be one.
  • In a preferred solution, object perception further divides into perception of the human body (i.e., recognition and tracking of the human body, the face, gestures, etc.) and of non-human objects (i.e., artificial markers, planar images, three-dimensional rigid bodies, non-rigid bodies, general objects, etc.).
  • An embodiment of the present invention proposes a target tracking algorithm within the object perception capability of reality perception in the AR capability spectrum, introducing a robust error function into the image alignment processing flow. In a preferred solution, the overall target tracking flow is implemented step by step by 1) initial displacement judgment; 2) robust error function iteration; and 3) feature matching tracking.
  • Feature matching tracking can give fairly precise pose information, and the system generally performs target tracking based on the pose information finally obtained from feature matching tracking. The robust error function iteration provides a very good initial value for the final feature matching tracking, which greatly improves the success rate of feature matching tracking and thus the tracking success rate of the entire system.
  • In one embodiment, a complete AR application/device needs to perceive the world, like or nearly like a human, through reality perception capabilities. Reality perception is generally achieved through machine learning and computer vision; with this ability, an AR application/device can perceive what is in the real world and where it is. On the basis of perceived reality, the AR application/device presents appropriate content to the user. Since the real world is stereoscopic (3D), this content also has strong 3D attributes; and since the information is exceptionally rich and many-sided, the user must be able to interact with the AR content through some interactive means.
  • In summary, the image processing method and apparatus described above process images with an iterative method based on a robust error function, avoiding iteration failures caused by over-large image differences and making the tracking result more stable.
  • In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • All or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium and executed by a processor inside the communication device, and when executed, the program can perform all or part of the steps of the above method embodiments.
  • The processor may be implemented as one or more processor chips, or may be part of one or more application-specific integrated circuits (ASICs); and the aforementioned storage medium may include, but is not limited to, the following types of storage media: flash memory, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide an image processing method and apparatus, and an AR device. A consecutive first frame image and second frame image are determined in a continuous sequence of video frame images, wherein the second frame image is the current frame image and the first frame image is the previous frame image of the second frame image; position information of the target area in the second frame image is determined with the target area of the first frame image as a template; and iterative processing for whole-image alignment of the first frame image and the second frame image is performed with the position information as the iteration initial value, wherein the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function. The image processing method and apparatus and the AR device disclosed in the embodiments of the present invention improve the tracking success rate of the entire system.

Description

Image processing method and apparatus, and AR device
TECHNICAL FIELD
Embodiments of the present invention relate to the field of computer vision, and in particular to an image processing method and apparatus, and an AR device.
BACKGROUND
Target tracking is one of the hotspots of computer vision research and is widely applied, for example in video surveillance, traffic flow monitoring, driverless driving, face recognition, and augmented reality (AR). For example, a camera's tracking focus and a drone's automatic target tracking both require target tracking technology. Tracking of specific objects, such as human body tracking, vehicle tracking in traffic monitoring systems, face tracking, and gesture tracking in intelligent interaction systems, likewise relies on target tracking technology.
Target tracking means establishing, in a continuous video sequence, the positional relationship of the object to be tracked and obtaining its complete motion trajectory: given the target's coordinate position in the first frame, compute the target's exact position in the next frame image. During motion the target may exhibit changes in the image, such as changes of position or shape, changes of scale, background occlusion, or changes of illumination brightness. Research on target tracking algorithms also revolves around handling these changes and the specific applications.
A complete target tracking procedure first detects where the target is and then tracks it. Setting aside the computational cost, replacing tracking with detection is more accurate, i.e., determining the target position by detection in every frame; however, the computation required for detection generally far exceeds that required for tracking. Under existing hardware conditions, applications on the market mostly perform target tracking by tracking.
Whole-image alignment is better able to provide an accurate initial value for subsequent feature tracking under large displacement and jitter, ensuring that feature tracking succeeds and outputs the final precise position. At present, whole-image alignment iteratively aligns the images of two consecutive frames; it needs neither feature information extracted from the images nor information such as a salient plane. Because the whole image contains both target information and background information, whole-image alignment is disturbed by the non-target area (i.e., the background).
However, in prior-art whole-image alignment, the two frames are not fully consistent because the camera or the object moves; when the background difference between the two frames is too large, the tracking success rate is low.
SUMMARY
Aspects of the present invention provide an image processing method and apparatus, and an AR device, which can improve the tracking success rate of the entire system.
One aspect of the present invention provides an image processing method, comprising:
determining a consecutive first frame image and second frame image in a continuous sequence of video frame images, wherein the first frame image is the previous frame image of the second frame image;
determining, with the target area of the first frame image as a template, position information of the target area in the second frame image;
performing, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image;
wherein the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, a function satisfying the following conditions is the robust error function:
a) ρ(t,σ) is always greater than zero for any t;
b) ρ(t,σ) is monotonically decreasing for t ≤ 0;
c) ρ(t,σ) is monotonically increasing for t ≥ 0;
d) ρ(t,σ) is piecewise differentiable;
e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude grows more slowly than t² and faster than |t|;
where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the robust error function is:
ρ(t,σ) = t²/2 for |t| ≤ σ₁, and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁,
or,
ρ(t,σ) = t² / (t² + σ₁²),
where t is the argument of the robust error function, σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
Preferably, after the iterative processing is completed, the method further includes:
tracking, in the second frame image, the location of at least one target in the target area by feature tracking and matching.
Preferably, the method further includes:
determining pose information of a target feature on the first frame image, and determining the target area of the first frame image according to the pose information of the target feature.
Preferably, the first frame image is a template image, and the second frame image is a current frame image.
Another aspect of the present invention provides an image processing apparatus, comprising:
a transceiver, configured to acquire a continuous sequence of video frame images;
a processor, configured to determine a consecutive first frame image and second frame image in the continuous sequence of video frame images, the first frame image being the previous frame image of the second frame image; to determine, with the target area of the first frame image as a template, position information of the target area in the second frame image; and to perform, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image;
wherein the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, a function satisfying the following conditions is the robust error function:
a) ρ(t,σ) is always greater than zero for any t;
b) ρ(t,σ) is monotonically decreasing for t ≤ 0;
c) ρ(t,σ) is monotonically increasing for t ≥ 0;
d) ρ(t,σ) is piecewise differentiable;
e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude grows more slowly than t² and faster than |t|;
where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
Preferably, the robust error function is:
ρ(t,σ) = t²/2 for |t| ≤ σ₁, and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁,
or,
ρ(t,σ) = t² / (t² + σ₁²),
where t is the argument of the robust error function, σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
Preferably, the processor is further configured to track, in the second frame image after the iterative processing is completed, the location of at least one target in the target area by feature tracking and matching.
Preferably, the processor is further configured to determine pose information of a target feature on the first frame image, and to determine the target area of the first frame image according to the pose information of the target feature.
Preferably, the first frame image is a template image, and the second frame image is a current frame image.
Another aspect of the present invention provides an AR device comprising any one of the foregoing image processing apparatuses.
The image processing method and apparatus and the AR device described above process images using a robust-function-based iterative method; the robust error function iteration provides a very good initial value for the final feature matching tracking, which greatly improves the success rate of feature matching tracking and thus the tracking success rate of the entire system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
FIG. 2 compares a robust error function curve with a quadratic curve according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present invention;
FIG. 4 is a schematic diagram of the technical system of the AR capability spectrum according to another embodiment of the present invention.
DETAILED DESCRIPTION
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects. The terms "system" and "network" are often used interchangeably herein.
As shown in FIG. 1, an embodiment of the present invention provides an image processing method. An image processing apparatus can read a sequence of video frame images from a video; the video may be recorded in real time, recorded in advance and stored on a specific device (disk, memory, etc.), or captured in real time by a photosensitive device such as a camera and fed into the image processing apparatus. The image processing method may be executed on a chip with computing capability; that is, the image processing apparatus may be a computer or a portable mobile device (such as a mobile phone).
Step 101: Determine a consecutive first frame image and second frame image in a continuous sequence of video frame images.
The second frame image is the current frame image (current frame), and the first frame image is the previous frame image (last frame) of the second frame image and serves as the template frame image.
Step 102: Determine pose information of a target feature in the first frame image, and determine a target area according to the pose information of the target feature.
For example, the target area is extracted from the pose information of the previous frame; with this target area as a template, a search is performed in the low-resolution current frame image, and the pose information of the target area is determined and provided as the iteration initial value to the whole-image alignment processing flow. The pose information includes position information and/or posture information; the posture information may include at least one of shape transformation information, scale change information, and rotation information, where the shape transformation information includes shape changes caused by viewing-angle transitions and the scale change information includes scale changes caused by distance changes. In another embodiment of the present invention, the pose generally has six degrees of freedom, covering displacement and rotation; for example, the object undergoes a series of transformations such as rotation or perspective in the field of view, producing the pose information.
Step 103: With the target area of the first frame image as a template, determine position information of the target area in the second frame image.
Step 104: With the position information as the iteration initial value, perform iterative processing for whole-image alignment of the first frame image and the second frame image.
For example, the iterative processing uses the Lucas-Kanade (LK) iterative algorithm, and the iterative processing for whole-image alignment of the first frame image and the second frame image is iterated using a robust error function ρ(t, σ), where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
For example, the specific implementation of the LK iterative algorithm is not limited and may include multiple concrete algorithms, such as the Forward Additive algorithm, the Inverse Compositional algorithm, the Forward Compositional algorithm, or the Efficient Second-order Minimization (ESM) algorithm.
The Lucas-Kanade (LK) iterative algorithm is a common optical flow algorithm; optical flow algorithms divide into dense and sparse optical flow algorithms. The algorithm used in this embodiment is a dense optical flow algorithm, for example the Inverse Compositional algorithm.
In this embodiment, the iterative processing for whole-image alignment of the first frame image and the second frame image may be iterated using a robust error function combined with an optical flow algorithm; in a preferred solution, it may be iterated using a robust error function combined with a dense optical flow algorithm.
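As a sketch of how these variants differ only in where the update Δp is applied, the Python fragment below uses a translation-only warp, assumed purely for illustration; with full pose warps the updates compose as transformations rather than adding.

```python
# Where each LK variant applies the update dp (translation-only warp,
# W(x; p) = x + p, so compositions of warps reduce to vector sums):
#   Forward Additive:      p <- p + dp   (dp solved against I warped by p)
#   Forward Compositional: W(W(x;dp);p)  => p <- p + dp for translations
#   Inverse Compositional: W(x;p) composed with W(x;dp)^-1 => p <- p - dp
#                          (dp is computed on the template side)
def update(p, dp, variant):
    if variant in ("forward_additive", "forward_compositional"):
        return p + dp
    if variant == "inverse_compositional":
        return p - dp
    raise ValueError(variant)
```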
In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( T(W(x;Δp)) − I(W(x;p)), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are x and Δp; W(x;p) denotes the pose transformation function whose two parameters are x and p; T(W(x;Δp)) denotes the pixel value at coordinate x on the first frame image after the Δp transformation; I(W(x;p)) denotes the pixel value at coordinate x on the second frame image after the p transformation; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the Δp transformation is applied to the first frame image and the p transformation to the second frame image, and the two transformed images are subtracted. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p+Δp)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value at coordinate x on the second frame image after the p+Δp transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the first frame image stays unchanged; the second frame image is transformed on the basis of the existing transformation coefficient p with Δp additionally superimposed, and is then subtracted from the first frame image. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(W(x;Δp);p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function whose two parameters are the coordinate x and the transformation parameter Δp; W(W(x;Δp);p) denotes the pose transformation function whose two parameters are W(x;Δp) and p; I(W(W(x;Δp);p)) denotes the pixel value on the second frame image after coordinate x is transformed first by the Δp pose and then by the p pose; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the second frame image is transformed with Δp and then with p, and is subtracted from the first frame image. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
In another embodiment of the present invention, the iterative processing for whole-image alignment of the first frame image and the second frame image uses a robust error function as a penalty function, the objective function for the iteration being:
Σ_x ρ( I(W(x;p)) − T(x), σ )
where ρ is the robust error function; W is the pose transformation function, which takes two parameters; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function whose two parameters are x and p; I(W(x;p)) denotes the pixel value on the second frame image after coordinate x undergoes the p pose transformation; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
In this embodiment, the processing flow for the first frame image and the second frame image is as follows: the second frame image is transformed with p and subtracted from the first frame image, and p is analyzed and solved for via first- and second-order Taylor expansions. The subtraction represents the sum of the differences between corresponding pixels of the two transformed images; when this difference reaches its minimum, the images are aligned, which is the meaning of the above objective function.
In the above embodiments, the objective function is the target of the iterative solution. In a preferred solution, the objective function is to be minimized, i.e., driven to its minimum value; in other words, the purpose of the iteration is to bring this expression to its minimum, and the iteration proceeds by updating the transformation parameter p with Δp.
The objective function iteratively updates the transformation coefficient p so that, after the second frame image I is transformed, the sum of its per-pixel differences from the first frame image T is minimized. Since the term is quadratic, its minimum value is 0; a minimum of zero means every pixel of I and T has the same value, i.e., the two images can be regarded as identical. In practical applications, however, I and T always differ somewhat, so the transformation coefficient p is determined iteratively such that I and T become as similar as possible.
In the above embodiments, the robust error function can take many forms; for example, the Huber function and the Geman-McClure function are both robust error functions. In fact, any function satisfying the following conditions can be called a robust error function:
a) ρ(t,σ) is always greater than zero for any t;
b) ρ(t,σ) is monotonically decreasing for t ≤ 0;
c) ρ(t,σ) is monotonically increasing for t ≥ 0;
d) ρ(t,σ) is piecewise differentiable;
e) where ρ(t,σ) is monotonically increasing (decreasing), its magnitude should grow more slowly than t² and faster than |t|;
where t is the argument of the robust error function and σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error.
In one embodiment, the robust error function may be the Huber function, i.e.:
ρ(t,σ) = t²/2 for |t| ≤ σ₁, and ρ(t,σ) = σ₁(|t| − σ₁/2) for |t| > σ₁,
where t is the argument of the robust error function, σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function.
In another embodiment, the scale parameter vector σ is written in full as (σ₁, σ₂, σ₃, σ₄, ..., σ_s)ᵀ; how many scale parameters there are and how they are chosen depend on the robust function itself. In a preferred solution, σ = σ₁.
In another embodiment, the robust error function may also be the Geman-McClure function, i.e.:
ρ(t,σ) = t² / (t² + σ₁²),
where t is the argument of the robust error function, σ is the scale parameter vector, used to control how strongly the robust error function penalizes the error, and σ₁ is the threshold set in the robust error function. In this embodiment, σ = σ₁.
The growth in magnitude of a general ρ(t) approximates a negative log probability function P, with the formula:
ρ(t) ∝ −log P[ I(W(x;p)) − T(x) ]
where t is the argument of the robust error function, p is the pose transformation coefficient, T(x) is the pixel value of coordinate x in the first frame image, and I(W(x;p)) is the pixel value on the second frame image after coordinate x undergoes the p pose transformation.
The pose in the previous frame image (last frame), that is, the position of the target area (target), is known; the target area must then be accurately tracked to its position in the current frame image (current frame). The pose is generally output through feature track (feature point tracking and matching): by tracking feature points in the target area that have a certain distinctiveness, once the positions of these feature points in both frames are known, the pose in the current frame image (current frame) can be computed. These feature points are generally sparse, and so can satisfy real-time requirements on mobile devices.
After the iterative processing is completed, the location of at least one target in the target area is tracked in the second frame image by feature tracking and matching.
For example, after the iterative processing is completed, feature tracking (feature track) continues. The general flow of feature tracking is to search in the current frame image (current frame) near the position the corresponding feature point had in the previous frame, or to search near a position in the current frame predicted for that feature point by some prediction means. For real-time efficiency, the search radius is typically a few pixels.
As described above, the objective function drives the difference between corresponding pixels toward a minimum. When the two images are identical, the difference can reach 0, because every pixel has the same value. In practice, however, the two images can never be fully consistent: for example, after a pose transformation (take the simple case of pure translation), part of the background moves out of the frame and some new background moves in. During alignment, the parts of the two frames showing the same scene can be brought to a very small difference, but the parts where the background itself differs (moved out or moved in) have, in theory, genuinely different pixel values, which the alignment operation cannot make equal. So in actual operation the objective function often cannot reach 0. Note that the iterative formula minimizes the sum of the differences over all pixels: regions of the two frames showing the same scene can be aligned successfully, so their difference is small; regions with different scenes cannot be aligned, so their difference is large. The significance of introducing a robust error function is therefore: when the pixel difference between two points exceeds a certain range (the threshold on t in the formula, t being the pixel difference), the second branch of the robust error function is used for weighting (the magnitude of the second-branch formula grows more slowly than the quadratic term), so that the difference can still exert some influence on the convergence of the iteration but over a reduced range, thereby improving the success rate of whole-image alignment when the backgrounds are inconsistent.
FIG. 2 compares a robust error function curve with a quadratic curve according to another embodiment of the present invention. In FIG. 2, curve 1 is the quadratic function f(t) = t² (a square curve) and curve 2 is the robust error function curve of this embodiment. The abscissa t can be read as the difference between pixels, and the ordinate as the influence of that difference on the final iteration result. When the pixel difference is within a certain range, the two affect the iteration identically; but once the pixel difference reaches a certain level (the scene-inconsistent parts of the two frames), the influence of the original quadratic term climbs rapidly while that of the robust error rises slowly, avoiding the image alignment failures that would arise, when the scene difference is too large, from over-large pixel differences in scene-inconsistent regions influencing the iterative convergence too strongly.
FIG. 3 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present invention. The image processing apparatus 31 includes a transceiver 311, a processor 312, a memory 313, and a bus 314, where the transceiver 311, the processor 312, and the memory 313 communicate with one another through the bus 314.
In this embodiment of the present invention, the transceiver 311 includes a transmitting unit (for example, a transmitting circuit) and a receiving unit (for example, a receiving circuit).
In this embodiment of the present invention, the processor 312 may be a central processing unit (CPU); the processor 312 may also be another general-purpose control processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose control processor may be a microcontroller or any conventional control processor, such as a single-chip microcomputer.
The memory 313 is configured to store program code or instructions, the program code including computer operation instructions, and the processor 312 is configured to execute the program code or instructions stored in the memory 313 so that the transceiver 311, the processor 312, and the memory 313 perform the related functions described below. The memory 313 may include volatile memory, for example a random access memory (RAM), which may include static RAM or dynamic RAM. The memory 313 may also include non-volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory. The memory 313 may also be an external flash memory, at least one disk storage, or a buffer.
The bus 314 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus system can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean there is only one bus or one type of bus.
The image processing apparatus 31 reads a sequence of video frame images from a video; the video may be recorded in real time, recorded in advance and stored on a specific device (disk, memory, etc.), or captured in real time by a photosensitive device such as a camera and fed into the image processing apparatus. The image processing method may be executed on a chip with computing capability; that is, the image processing apparatus 31 may be a computer or a portable mobile device (such as a mobile phone).
The transceiver 311 is configured to acquire a sequence of video frame images.
The processor 312 is configured to determine a consecutive first frame image and second frame image in a continuous sequence of video frame images.
The second frame image is the current frame image (current frame), and the first frame image is the previous frame image (last frame) of the second frame image and serves as the template frame image.
The processor 312 is further configured to determine pose information of a target feature in the first frame image, and to determine the target area according to the pose information of the target feature.
For example, the processor 312 extracts the target area from the pose information of the previous frame; with this target area as a template, it searches in the low-resolution current frame image, determines the pose information of the target area, and provides it as the iteration initial value to the whole-image alignment. The pose information includes position information and/or posture information; the posture information may include at least one of shape transformation information, scale change information, and rotation information, where the shape transformation information includes shape changes caused mostly by viewing-angle transitions and the scale change information includes scale changes caused by distance changes. In another embodiment of the present invention, the pose generally has six degrees of freedom, covering displacement and rotation; for example, the object undergoes a series of transformations such as rotation or perspective in the field of view, producing the pose information.
The processor 312 is further configured to determine, with the target area of the first frame image as a template, position information of the target area in the second frame image.
The processor 312 is further configured to perform, with the position information as the iteration initial value, iterative processing for whole-image alignment of the first frame image and the second frame image.
The processor 312 is further configured to track, in the second frame image after the iterative processing is completed, the location of at least one target in the target area by feature tracking and matching. For the specific feature tracking process performed by the processor 312, refer to the feature tracking process described for the method embodiment of FIG. 1; details are not repeated here.
For the specific content of the iterative processing, refer to the iterative processing described for the method embodiment of FIG. 1; details are not repeated here.
The image processing method and apparatus described in the foregoing embodiments can perform iterative whole-image alignment of two consecutive images, and are also applicable to augmented reality (AR) applications/devices.
In one embodiment, the technical system of the AR capability spectrum contained in an AR application/device, as shown in FIG. 4, mainly includes two core elements: a) reality perception, i.e., the ability to understand, recognize, and track the real world; and b) AR content, i.e., the ability to render, fuse, interact with, and author virtual content, where:
AR content is the next content form after text, pictures, and video. Its two major characteristics are a high degree of 3D and strong interactivity. AR content is a crucial link in the AR industry chain: its quantity and quality directly determine the end-user experience. How efficiently AR content can be produced, stored, distributed, and exchanged plays a decisive role in the prosperity of AR applications, so AR applications necessarily require AR content tools.
Reality perception refers to perceiving the spatial environment and object targets in the real environment through hardware devices such as cameras and sensors; that is, giving a mobile phone or AR glasses the human-like ability to understand reality visually.
In a preferred solution, reality perception can further be divided into spatial perception and object perception. In reality perception, space refers to a small-scale environment that is relatively stationary within a larger-scale environment; for example, if the large-scale environment is the entire Earth, then land, countries, cities, business districts, rooms, and desktops can all, under certain conditions, be regarded as static spatial environments. The object targets of object perception are objects that are usually in motion within the large-scale environment. With the perception of dynamic object targets, virtual content can be made to move along with a moving target; for example, a virtual character standing on a card can move as the card moves, so that the two appear to be one.
In a preferred solution, object perception is further divided into the perception of human objects (i.e., recognition and tracking of human bodies, faces, gestures, etc.) and of non-human objects (i.e., artificial markers, planar images, three-dimensional rigid bodies, non-rigid bodies, generic objects, etc.).
One embodiment of the present invention proposes, within the object perception capability of the reality perception part of the AR capability spectrum, a target tracking algorithm that introduces a robust error function into the image alignment pipeline. In a preferred solution, the overall target tracking flow is realized step by step: 1) initial displacement estimation; 2) robust error function iteration; 3) feature matching and tracking, as sketched below. Feature matching and tracking can produce fairly accurate pose information, and the system generally performs target tracking based on this final pose information from feature matching and tracking. The robust error function iteration provides a very good initial value for the final feature matching and tracking, greatly improving its success rate and thereby the tracking success rate of the whole system.
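As a sketch of how the three stages might compose for one frame pair; every helper named here (extract_target_region, warp, solve_pose) is a hypothetical stand-in, used alongside the illustrative functions sketched earlier, and none of them is an API from this disclosure:

def track_target(last_frame, cur_frame, last_pose, features):
    """Illustrative composition of the three tracking stages."""
    # 1) Initial displacement: coarse template search at low resolution
    template = extract_target_region(last_frame, last_pose)  # hypothetical helper
    p0 = coarse_position(template, cur_frame)

    # 2) Robust error function iteration: whole-image alignment refines p0
    #    and supplies a good initial value for feature matching
    p = robust_align_translation(last_frame, cur_frame, p0)

    # 3) Feature matching/tracking, seeded near the aligned positions,
    #    yields the precise pose the system ultimately reports
    matches = [search_feature(last_frame, cur_frame, warp(pt, p))  # hypothetical warp
               for pt in features]
    return solve_pose(features, matches)  # hypothetical pose solver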
In one embodiment, a complete AR application/device needs reality perception capabilities in order to perceive the world like, or nearly like, a human. Reality perception is generally achieved through machine learning and computer vision; with this capability, the AR application/device can perceive what exists in the real world and where it is. On the basis of perceiving reality, the AR application/device presents appropriate content to the user. Since the real world is three-dimensional (3D), this content also has strong 3D properties; and since the information is extraordinarily rich and multifaceted, the user must be able to interact with the AR content through some interactive means.
In summary, the image processing method and apparatus described above process images with an iterative approach based on a robust error function, avoiding iteration failures caused by excessive image differences and making the tracking results more stable.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into modules or units is only a division by logical function, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
A person skilled in the art will understand that all or some of the steps of the foregoing method embodiments may be implemented by program instructions on related hardware. The foregoing program may be stored in a computer-readable storage medium and executed by a processor inside a communication device; when the program is executed, the processor can perform all or some of the steps of the foregoing method embodiments. The processor may be implemented as one or more processor chips, or may be part of one or more application-specific integrated circuits (ASICs); and the foregoing storage medium may include, but is not limited to, the following types of storage media: flash memory, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, optical discs, and other media capable of storing program code.
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (21)

  1. An image processing method, comprising:
    determining a consecutive first frame image and second frame image in a consecutive sequence of video frame images, wherein the first frame image is the frame preceding the second frame image;
    determining, using a target region of the first frame image as a template, position information of the target region in the second frame image;
    performing, with the position information as the initial iteration value, iterative whole-image alignment processing on the first frame image and the second frame image;
    wherein the iterative whole-image alignment processing of the first frame image and the second frame image iterates using a robust error function.
  2. The method according to claim 1, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p)) − T(W(x;Δp)), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function with arguments x and Δp; W(x;p) denotes the pose transformation function with arguments x and p; T(W(x;Δp)) denotes the pixel value of coordinate x in the first frame image after the transformation Δp; I(W(x;p)) denotes the pixel value of coordinate x in the second frame image after the transformation p; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  3. The method according to claim 1, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p+Δp)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function with arguments coordinate x and transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value of coordinate x in the second frame image after the transformation p+Δp; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  4. The method according to claim 1, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W((W(x;Δp));p)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function with arguments coordinate x and transformation parameter Δp; W((W(x;Δp));p) denotes the pose transformation function with arguments W(x;Δp) and p; I(W((W(x;Δp));p)) denotes the pixel value of coordinate x in the second frame image after first undergoing the pose transformation Δp and then the pose transformation p; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  5. The method according to claim 1, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function with arguments x and p; I(W(x;p)) denotes the pixel value of coordinate x in the second frame image after the pose transformation p; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  6. The method according to any one of claims 2 to 5, wherein a function satisfying the following conditions is the robust error function:
    a) ρ(t,σ) is always greater than zero for any t;
    b) for t less than or equal to zero, ρ(t,σ) is always monotonically decreasing;
    c) for t greater than or equal to zero, ρ(t,σ) is always monotonically increasing;
    d) ρ(t,σ) is piecewise differentiable;
    e) where ρ(t,σ) is monotonically increasing (or decreasing), its value grows in magnitude more slowly than t² and faster than |t|;
    where t denotes the argument of the robust error function, and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  7. The method according to claim 6, wherein the robust error function is:
    Figure PCTCN2019072918-appb-100005
    where t denotes the argument of the robust error function, σ is the scale parameter vector used to control how strongly the robust error function penalizes errors, and σ₁ is the threshold set in the robust error function.
  8. The method according to claim 1, wherein after the iterative processing is completed, the method further comprises:
    tracking, in the second frame image by means of feature tracking and matching, the position of at least one target in the target region.
  9. The method according to claim 1, wherein the method further comprises:
    determining pose information of a target feature in the first frame image, and determining the target region of the first frame image according to the pose information of the target feature.
  10. The method according to claim 1, wherein the first frame image is a template image and the second frame image is a current frame image.
  11. An image processing apparatus, comprising:
    a transceiver, configured to acquire a consecutive sequence of video frame images;
    a processor, configured to: determine a consecutive first frame image and second frame image in the consecutive sequence of video frame images, the first frame image being the frame preceding the second frame image; determine, using a target region of the first frame image as a template, position information of the target region in the second frame image; and perform, with the position information as the initial iteration value, iterative whole-image alignment processing on the first frame image and the second frame image;
    wherein the iterative whole-image alignment processing of the first frame image and the second frame image iterates using a robust error function.
  12. The image processing apparatus according to claim 11, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p)) − T(W(x;Δp)), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function with arguments x and Δp; W(x;p) denotes the pose transformation function with arguments x and p; T(W(x;Δp)) denotes the pixel value of coordinate x in the first frame image after the transformation Δp; I(W(x;p)) denotes the pixel value of coordinate x in the second frame image after the transformation p; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  13. The image processing apparatus according to claim 11, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p+Δp)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;p+Δp) denotes the pose transformation function with arguments coordinate x and transformation parameter p+Δp; I(W(x;p+Δp)) denotes the pixel value of coordinate x in the second frame image after the transformation p+Δp; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  14. The image processing apparatus according to claim 11, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W((W(x;Δp));p)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; Δp is the update to the pose transformation coefficient p; W(x;Δp) denotes the pose transformation function with arguments coordinate x and transformation parameter Δp; W((W(x;Δp));p) denotes the pose transformation function with arguments W(x;Δp) and p; I(W((W(x;Δp));p)) denotes the pixel value of coordinate x in the second frame image after first undergoing the pose transformation Δp and then the pose transformation p; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  15. The image processing apparatus according to claim 11, wherein the objective function with which the iterative whole-image alignment processing of the first frame image and the second frame image iterates, using the robust error function as the penalty function, is:
    ∑_x ρ(I(W(x;p)) − T(x), σ)
    where ρ is the robust error function; W is the pose transformation function, which takes two arguments; x is the coordinate of a pixel in the image; p is the pose transformation coefficient; W(x;p) denotes the pose transformation function with arguments x and p; I(W(x;p)) denotes the pixel value of coordinate x in the second frame image after the pose transformation p; T(x) denotes the pixel value of coordinate x in the first frame image; and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  16. The image processing apparatus according to any one of claims 12 to 15, wherein a function satisfying the following conditions is the robust error function:
    a) ρ(t,σ) is always greater than zero for any t;
    b) for t less than or equal to zero, ρ(t,σ) is always monotonically decreasing;
    c) for t greater than or equal to zero, ρ(t,σ) is always monotonically increasing;
    d) ρ(t,σ) is piecewise differentiable;
    e) where ρ(t,σ) is monotonically increasing (or decreasing), its value grows in magnitude more slowly than t² and faster than |t|;
    where t denotes the argument of the robust error function, and σ is the scale parameter vector used to control how strongly the robust error function penalizes errors.
  17. The image processing apparatus according to claim 16, wherein the robust error function is:
    Figure PCTCN2019072918-appb-100010
    where t denotes the argument of the robust error function, σ is the scale parameter vector used to control how strongly the robust error function penalizes errors, and σ₁ is the threshold set in the robust error function.
  18. The image processing apparatus according to claim 11, wherein the processor is further configured to, after the iterative processing is completed, track, in the second frame image by means of feature tracking and matching, the position of at least one target in the target region.
  19. The image processing apparatus according to claim 11, wherein the processor is further configured to determine pose information of a target feature in the first frame image, and to determine the target region of the first frame image according to the pose information of the target feature.
  20. The image processing apparatus according to claim 11, wherein the first frame image is a template image and the second frame image is a current frame image.
  21. An AR device, comprising the image processing apparatus according to any one of claims 11 to 20.
PCT/CN2019/072918 2018-02-13 2019-01-24 Image processing method and apparatus, and AR device WO2019157922A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810150484.6 2018-02-13
CN201810150484.6A CN108510520B (zh) 2018-02-13 2018-02-13 Image processing method and apparatus, and AR device

Publications (1)

Publication Number Publication Date
WO2019157922A1 true WO2019157922A1 (zh) 2019-08-22

Family

ID=63375034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072918 WO2019157922A1 (zh) 2018-02-13 2019-01-24 Image processing method and apparatus, and AR device

Country Status (2)

Country Link
CN (1) CN108510520B (zh)
WO (1) WO2019157922A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510520B (zh) * 2018-02-13 2019-03-08 视辰信息科技(上海)有限公司 Image processing method and apparatus, and AR device
CN110555862A (zh) * 2019-08-23 2019-12-10 北京数码视讯技术有限公司 Target tracking method and apparatus, electronic device, and computer-readable storage medium
CN113223185B (zh) * 2021-05-26 2023-09-05 北京奇艺世纪科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN115690333B (zh) * 2022-12-30 2023-04-28 思看科技(杭州)股份有限公司 Three-dimensional scanning method and system
CN116386089B (zh) * 2023-06-05 2023-10-31 季华实验室 Human pose estimation method, apparatus, device, and storage medium for motion scenes

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100268311B1 (ko) * 1993-06-04 2000-10-16 William J. Burke Electronic image stabilization apparatus and method
CN102819849A (zh) * 2012-08-28 2012-12-12 湘潭大学 Three-dimensional motion tracking method for the human upper body based on appearance-constrained two-stage optimization
CN104573614B (zh) * 2013-10-22 2020-01-03 北京三星通信技术研究有限公司 Apparatus and method for tracking human faces
CN104463894B (zh) * 2014-12-26 2020-03-24 山东理工大学 Globally optimized overall registration method for multi-view three-dimensional laser point clouds

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402691A (zh) * 2010-09-08 2012-04-04 中国科学院自动化研究所 Method for tracking human face pose and motion
US20150324663A1 (en) * 2012-01-09 2015-11-12 General Electric Company Image congealing via efficient feature selection
CN106228113A (zh) * 2016-07-12 2016-12-14 电子科技大学 AAM-based fast alignment method for facial feature points
CN108510520A (zh) * 2018-02-13 2018-09-07 视辰信息科技(上海)有限公司 Image processing method and apparatus, and AR device

Also Published As

Publication number Publication date
CN108510520A (zh) 2018-09-07
CN108510520B (zh) 2019-03-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19755014; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19755014; Country of ref document: EP; Kind code of ref document: A1)