CN104103062A - Image processing device and image processing method

Image processing device and image processing method

Info

Publication number
CN104103062A
Authority
CN
China
Prior art keywords
image
imaging plane
depth
target object
frame
Prior art date
Legal status
Pending
Application number
CN201310119788.3A
Other languages
Chinese (zh)
Inventor
范伟
刘伟
何源
孙俊
皆川明洋
堀田悦伸
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201310119788.3A
Publication of CN104103062A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an image processing device and an image processing method. The image processing device comprises an object extraction unit, a registration unit, a feature extraction unit and a depth determination unit. The object extraction unit extracts an image of a target object from a frame of a monocular video sequence as an object image. The registration unit registers an observed object image against a reference object image to obtain registration parameters, where the reference object image and the observed object image are object images extracted from a reference frame and from an observation frame other than the reference frame, respectively. The feature extraction unit uses the registration parameters to extract features that reflect the depth change of the observed object image, compared with the reference object image, relative to the imaging plane. The depth determination unit determines the depth change on the basis of the extracted features.

Description

Image processing device and image processing method

Technical Field

The present disclosure relates generally to video image processing, and in particular to an image processing device and an image processing method capable of determining the depth change of a target object relative to an imaging plane from a monocular video sequence.

Background

Detecting a target object in an image and estimating its distance from the camera is widely used in many fields, such as visual surveillance, obstacle detection and human-computer interaction. Most traditional target depth detection techniques estimate the target depth based on the principle of stereo vision: two calibrated cameras are required, and the depth of the target object is detected by analyzing the correspondence between the image pairs captured by the two cameras.

Traditional techniques rarely address target depth estimation from monocular vision. One method of target depth estimation using monocular images discloses the following scheme: moving targets are detected based on assumptions about the motion properties of the moving target objects, such as maximum velocity, minimum velocity change, consistency and continuity of motion; this detection outputs the position of each moving target object in two consecutive images of the monocular image sequence; the distance of each moving target is then estimated from the output positions using an over-constrained method.

Summary of the Invention

In prior art such as the scheme described above, the motion features of the object are used for depth estimation; the algorithm is relatively complicated and the accuracy is not high.

An object of the present invention is to provide an image processing device and an image processing method that perform depth detection, without motion prediction, by considering the change characteristics of the object image in each frame that are associated with the depth change of the target object, instead of analyzing the motion features of the image.

According to one aspect of the present disclosure, an image processing device is provided, including: an object extraction unit configured to extract an image of a target object from a frame of a monocular video sequence as an object image; a registration unit configured to register an observed object image against a reference object image to obtain registration parameters, the reference object image and the observed object image being object images extracted from a reference frame and from an observation frame other than the reference frame, respectively; a feature extraction unit configured to use the registration parameters to extract features capable of reflecting the depth change of the observed object image, compared with the reference object image, relative to the imaging plane; and a depth determination unit configured to determine the depth change based on the extracted features.

In one embodiment according to the present disclosure, the feature extraction unit may extract, as the feature, the scaling parameter among the registration parameters that reflects the size change of the object image.

In another embodiment according to the present disclosure, if S denotes the scaling parameter of the observation frame with respect to the reference frame, then: when S > 1 + ε_a, the depth determination unit determines that the target object has become shallower (moved closer) relative to the imaging plane; when S < 1 - ε_b, the depth determination unit determines that the target object has become deeper (moved farther away) relative to the imaging plane; otherwise, the depth determination unit determines that the depth change of the target object relative to the imaging plane is uncertain. Here, ε_a and ε_b are small positive numbers serving as margins for the determination.

In another embodiment according to the present disclosure, the image processing device may further include a motion direction identification unit configured to identify the motion direction of the target object relative to the imaging plane. When the image processing device is made to perform depth determination sequentially on the observation frames within a predetermined period, the motion direction identification unit may identify the motion direction from the sequence of results determined by the depth determination unit.

In another embodiment according to the present disclosure, when the result sequence contains a consecutive run of at least a first predetermined number of "deeper" results, the motion direction identification unit may identify that the target object moves away from the imaging plane; and when the result sequence contains a consecutive run of at least a second predetermined number of "shallower" results, the motion direction identification unit may identify that the target object moves toward the imaging plane.

In another embodiment according to the present disclosure, when the result sequence contains a consecutive run of at least a third predetermined number of "deeper" results and the product of the corresponding scaling parameters S is smaller than a first threshold, the motion direction identification unit may identify that the target object moves away from the imaging plane; and when the result sequence contains a consecutive run of at least a fourth predetermined number of "shallower" results and the product of the corresponding scaling parameters S is larger than a second threshold, the motion direction identification unit may identify that the target object moves toward the imaging plane.

In another embodiment according to the present disclosure, the feature extraction unit may include: an alignment unit configured to align the observed object image with the reference object image according to the translation parameters among the registration parameters; and a histogram generation unit configured to generate a histogram of oriented gradients for the edge portion of the aligned object images, as the feature capable of reflecting the depth change.

In another embodiment according to the present disclosure, the object extraction unit may include a detection unit configured to detect the object image using sliding windows of multiple sizes, so as to determine the region where the object image is located.

In another embodiment according to the present disclosure, the object extraction unit may include a segmentation unit configured to perform segmentation on the region where the object image is located, so as to extract the object image.

In another embodiment according to the present disclosure, the detection unit may detect the object image using a trained target object detector.

According to another aspect of the present disclosure, an image processing method is provided, including: extracting an image of a target object from a frame of a monocular video sequence as an object image; registering an observed object image against a reference object image to obtain registration parameters, the reference object image and the observed object image being object images extracted from a reference frame and from an observation frame other than the reference frame, respectively; extracting, by using the registration parameters, features capable of reflecting the depth change of the observed object image, compared with the reference object image, relative to the imaging plane; and determining the depth change based on the extracted features.

By implementing the image processing device and the image processing method according to the present invention, the algorithm for target object depth detection can be simplified, the relative depth change of the target object can be detected in real time, and the detection accuracy can be improved.

Brief Description of the Drawings

The above and other objects, features and advantages of the present invention will be understood more easily with reference to the following description of embodiments of the present invention taken in conjunction with the accompanying drawings. In the drawings, identical or corresponding technical features or components are denoted by identical or corresponding reference numerals, and the sizes and relative positions of elements are not necessarily drawn to scale.

Fig. 1 is a schematic diagram illustrating the principle on which the present invention is based.

Fig. 2 is a block diagram showing the structure of an image processing device 200 for depth detection according to an embodiment of the present disclosure.

Fig. 3 exemplarily shows a window in which a hand is detected and the hand image obtained after segmentation of that window.

Fig. 4 is a block diagram showing the structure of a feature extraction unit 400 according to an embodiment of the present disclosure.

Fig. 5 is a schematic diagram illustrating how the feature extraction unit 400 aligns images and generates edge feature histograms.

Fig. 6 is a block diagram showing the structure of an image processing device 600 for depth detection according to an embodiment of the present disclosure.

Fig. 7 is a flowchart illustrating an image processing method for depth detection according to an embodiment of the present disclosure.

Fig. 8 is a flowchart illustrating an image processing method for depth detection according to an embodiment of the present disclosure.

Fig. 9 is a block diagram showing an exemplary structure of a computer implementing the present disclosure.

Detailed Description

Embodiments of the present invention are described below with reference to the drawings. It should be noted that, for clarity, representations and descriptions of components and processes that are not related to the present invention and are known to those skilled in the art are omitted from the drawings and the specification.

For convenience of description, a hand is used below as an example of the target object. In a human-computer interaction system, an event can be triggered by pushing the hand forward or pulling it back relative to the camera, and the forward push or backward pull of the hand can be determined by detecting the change of the distance (depth) of the hand relative to the imaging plane of the camera. It will be understood that the solution according to the present invention can be applied to any other target object, such as a vehicle, a person (whole body or a part), or any pointing device such as a pointing stick, and to any other application scenario, such as visual surveillance or obstacle detection.

Fig. 1 is a schematic diagram illustrating the principle on which the present invention is based. Given an image sequence captured by a monocular camera, assume that the object whose depth along the optical axis is to be detected is a hand appearing in the image sequence. Fig. 1 shows the depth change between two adjacent frames (frame t-1 and frame t) and the corresponding change of the imaging size on the image plane when a person performs a "forward push" action. In Fig. 1, d denotes the distance from the hand to the camera; f denotes the focal length of the camera; h representatively denotes the size of the hand; s denotes the size of the image formed on the imaging plane by a hand of size h located at distance d; Δd denotes the distance the hand moves between the two frames, more precisely the distance it moves in the depth direction (the optical-axis direction) relative to the imaging plane; and Δs is the change of the imaging size on the image plane in response to the change of the hand in the depth direction.

From the geometric proportions shown in the figure, the relationship between the depth change Δd of the hand and the change Δs of the imaging size on the image plane is given by equation (1):

Δs / s = Δd / (d - Δd) ≈ Δd / d        (1)

When the comparison is made between two consecutive frames, or between frames that are close to each other, Δd << d, so the approximation "≈" is reasonable. Equation (1) shows that the rate of change of the depth of the hand can be estimated by detecting the rate of change of the imaging size.
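As a cross-check, equation (1) follows from the pinhole projection relation suggested by Fig. 1. The short derivation below is a sketch that assumes the projection s = f·h/d, which is implied by the figure rather than stated explicitly in the text:

```latex
% Pinhole projection: an object of size h at depth d images to size s = f h / d.
% After the object moves by \Delta d toward the camera, the image size becomes
% s + \Delta s = f h / (d - \Delta d), hence
\frac{\Delta s}{s}
  = \frac{d}{d - \Delta d} - 1
  = \frac{\Delta d}{d - \Delta d}
  \approx \frac{\Delta d}{d}
  \qquad (\Delta d \ll d)
```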

Furthermore, the change of the object image size caused by a change of the object depth suggests the following: if object images corresponding to different depths are aligned so that their corresponding parts coincide, then, when the observed object image (the object image extracted from the observation frame) has become shallower relative to the imaging plane compared with the reference object image (the object image extracted from the reference frame), the observed object image completely covers the reference object image in the aligned image; conversely, when the observed object image has become deeper relative to the imaging plane compared with the reference object image, the observed object image is smaller than the reference object image in the aligned image and cannot completely cover it. In these two cases the edge portions of the aligned object images therefore exhibit different characteristics, and extracting these characteristics yields histograms that differ significantly. In other words, the depth change of the observed object image relative to the reference object image can be determined from the histograms of the edge features of the aligned object images in these two cases.

The present invention has been made in view of the above points.

Fig. 2 is a block diagram showing the structure of an image processing device 200 for determining the depth change of a target object relative to the imaging plane according to an embodiment of the present invention.

As shown in Fig. 2, the image processing device 200 includes an object extraction unit 201, a registration unit 202, a feature extraction unit 203 and a depth determination unit 204.

The object extraction unit 201 extracts the image of the target object from a frame of the monocular video sequence as the object image for subsequent processing. Any image extraction method known in the art may be used in the object extraction unit 201, as long as the image of the target object can be identified and separated from the video frame so as to meet the needs of subsequent processing.

In one embodiment, for example, the object extraction unit 201 may include a detection unit (not shown) for detecting the object image in the video frame. In one example, the detection unit may detect the object image using sliding windows of multiple sizes to determine the region where the object image is located. Specifically, the detection unit may scan the video frame with a sliding window of a particular size, extract features from the image content inside the sliding window, and feed the extracted features into a classifier to determine whether an object image is present in that window. After the entire frame has been scanned with the sliding window, the size of the sliding window is adjusted and the scanning, extraction and determination steps are repeated until the region where the object image is located has been determined.
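The following is a minimal sketch of such a multi-scale sliding-window scan. The names `extract_features` and `classifier`, as well as the window sizes and stride, are placeholders for whatever descriptor, classifier and scan parameters are actually used; they are not prescribed by the patent.

```python
import numpy as np

def detect_object(frame_gray, classifier, extract_features,
                  window_sizes=((48, 48), (64, 64), (96, 96)), stride=8):
    """Scan the frame with sliding windows of several sizes and return the
    best-scoring window as (x, y, w, h), or None if no window is accepted."""
    best_score, best_box = 0.0, None
    frame_h, frame_w = frame_gray.shape[:2]
    for win_w, win_h in window_sizes:
        for y in range(0, frame_h - win_h + 1, stride):
            for x in range(0, frame_w - win_w + 1, stride):
                patch = frame_gray[y:y + win_h, x:x + win_w]
                # extract_features is expected to resize the patch to its
                # canonical input size so that all feature vectors have equal length
                feat = np.asarray(extract_features(patch)).reshape(1, -1)
                score = classifier.decision_function(feat)[0]  # signed margin
                if score > best_score:
                    best_score, best_box = score, (x, y, win_w, win_h)
    return best_box
```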

The classifier used by the detection unit can be constructed and trained with any customary features. In some embodiments, in order to detect the target object more accurately, a trained target object detector may be used to detect the object image. By using standard machine learning techniques, such as support vector machines, a trained target object detector can detect the target object image more accurately. This is especially useful for target objects with little texture or unclear edges, such as hands, and for scenes with complex backgrounds or varying illumination. In addition, the use of a trained target object detector also makes it possible to detect non-rigid target objects; one example of a non-rigid object is a hand that changes its pose during motion or rotates out of the image plane.
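The text only names "standard machine learning techniques, such as support vector machines". The sketch below assumes HOG features from scikit-image and a linear SVM from scikit-learn as one concrete choice, not as the method prescribed by the patent.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(patch_gray):
    # 9-bin histogram of oriented gradients over 8x8-pixel cells;
    # patch_gray is a grayscale window of a fixed canonical size
    return hog(patch_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def train_hand_detector(positive_patches, negative_patches):
    """Train a linear SVM hand/non-hand classifier from equally sized grayscale windows."""
    patches = list(positive_patches) + list(negative_patches)
    X = np.array([hog_descriptor(p) for p in patches])
    y = np.array([1] * len(positive_patches) + [0] * len(negative_patches))
    return LinearSVC(C=1.0).fit(X, y)
```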

In some embodiments, in order to improve the accuracy of the subsequent depth change detection, the object extraction unit 201 may further include a segmentation unit (not shown). The segmentation unit performs segmentation on the region where the detected object image is located (for example, the window in which the object image was detected), so as to separate the object image, as foreground, from the background in that region. The segmentation may be carried out in any way customary in the art. For example, in an embodiment in which a hand is the target object, a skin color model may be constructed to separate foreground from background in the detected window containing the hand. Fig. 3 exemplarily shows the sliding window in which a hand is detected and the hand image obtained, as the object image, after segmentation of the window region. Segmenting the hand image effectively reduces the noise introduced into the subsequent depth detection processing and thus improves the accuracy of the depth detection result.
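A minimal sketch of this segmentation step, assuming a simple HSV threshold rule in place of the (unspecified) skin color model; the threshold values are illustrative only.

```python
import cv2
import numpy as np

def segment_hand(window_bgr):
    """Separate the hand (foreground) from the background inside the detected window.
    Returns the binary foreground mask and the masked object image."""
    hsv = cv2.cvtColor(window_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # illustrative skin-tone bounds
    upper = np.array([25, 180, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # small morphological opening to suppress isolated background pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return mask, cv2.bitwise_and(window_bgr, window_bgr, mask=mask)
```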

Returning to Fig. 2, if one frame of the monocular video sequence is taken as the reference frame and frames other than the reference frame are taken as observation frames, and if the object image extracted from the reference frame is called the reference object image and the object image extracted from an observation frame is called the observed object image, then the registration unit 202 registers the observed object image against the reference object image to obtain the corresponding registration parameters. The registration parameters generally include a translation parameter describing the translation between the object images, a rotation parameter describing the in-plane rotation of the object image, and a scaling parameter describing the size change of the object image.

The registration unit 202 may perform the registration according to any of various methods known in the art; see, for example, "An Iterative Image Registration Technique Using a Scale-Space Model" by J. Lee, S.S. Young and R. Gutierrez-Osuna, Technical Report, CSE Department, Texas A&M University, 2011. In the embodiment in which a hand is detected, considering the low resolution of the hand image, the region-based image registration method mentioned in the above document may be used to match the pixel intensities of the two images directly as a whole; this is achieved by embedding a scale-space model in a nonlinear least-squares framework.
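The cited scale-space least-squares method is not reproduced here. The sketch below substitutes OpenCV's intensity-based ECC alignment, which likewise yields translation, rotation and scale, and recovers a scaling parameter S from the estimated affine matrix; it assumes the two object images have been placed on a common canvas of equal size.

```python
import cv2
import numpy as np

def register_and_get_scale(reference_gray, observed_gray):
    """Register the observed object image against the reference object image and
    return (warp_matrix, S). Both inputs must be single-channel images of the
    same size (e.g. object crops padded onto a common canvas)."""
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    _, warp = cv2.findTransformECC(reference_gray, observed_gray, warp,
                                   cv2.MOTION_AFFINE, criteria)
    # Isotropic scale of the estimated affine map; under the convention assumed
    # here, S > 1 when the observed object image is larger than the reference.
    S = float(np.sqrt(abs(np.linalg.det(warp[:, :2]))))
    return warp, S
```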

After the registration unit 202 has obtained the registration parameters between the observed object image and the reference object image, the feature extraction unit 203 uses the registration parameters from the registration unit 202 to extract features that reflect the depth change of the observed object image, compared with the reference object image, relative to the imaging plane.

In one embodiment, the feature extraction unit 203 may directly extract the scaling parameter S, which is the registration parameter reflecting the size change of the object image, as the feature reflecting the depth change of the observed object image relative to the imaging plane compared with the reference object image.

When the scaling parameter S is used as the feature reflecting the depth change, the depth determination unit 204 may determine the depth change of the observed object image compared with the reference object image as follows: when S > 1 + ε_a, the target object is determined to have become shallower relative to the imaging plane; when S < 1 - ε_b, the target object is determined to have become deeper relative to the imaging plane; otherwise, the depth change of the target object relative to the imaging plane is determined to be uncertain. Here, ε_a and ε_b are small positive numbers serving as margins for the determination; they may take the same or different values, for example 0.05 or 0.1, depending on the specific design requirements.

For example, with ε_a and ε_b both set to 0.1: when the scaling parameter S = 1.15, the depth determination unit 204 determines the depth change as "shallower" (S > 1 + ε_a = 1.1); when S = 0.8, it determines the depth change as "deeper" (S < 1 - ε_b = 0.9); and when S = 1.05, it determines the depth change as "uncertain" (0.9 < S < 1.1).
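The three-way rule above translates directly into code. A minimal sketch, with ε_a and ε_b passed in as the design margins:

```python
def depth_change_from_scale(S, eps_a=0.1, eps_b=0.1):
    """Classify the depth change of the observation frame relative to the reference
    frame from the scaling parameter S, using the margins eps_a and eps_b."""
    if S > 1 + eps_a:
        return "shallower"   # target object has moved closer to the imaging plane
    if S < 1 - eps_b:
        return "deeper"      # target object has moved away from the imaging plane
    return "uncertain"

# The worked examples above, with eps_a = eps_b = 0.1:
assert depth_change_from_scale(1.15) == "shallower"
assert depth_change_from_scale(0.80) == "deeper"
assert depth_change_from_scale(1.05) == "uncertain"
```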

In other embodiments, instead of directly extracting the scaling parameter S as the feature for determining the depth change, the feature extraction unit 203 may align the object images to be compared according to the translation parameters among the registration parameters from the registration unit 202, and then analyze the edge portion of the aligned images to extract the corresponding features.

Fig. 4 is a block diagram showing the structure of a feature extraction unit 400, which is one example of the feature extraction unit 203. The feature extraction unit 400 may include an alignment unit 401 and a histogram generation unit 402. The alignment unit 401 aligns the observed object image with the reference object image according to the translation parameters among the registration parameters. The histogram generation unit 402 generates a histogram of oriented gradients for the edge portion of the aligned object images as the feature reflecting the depth change; any method known in the art may be used to extract the features of the edge portion of the aligned object images and generate the corresponding histogram.
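One possible realization of the alignment unit 401 and the histogram generation unit 402 is sketched below, assuming that binary object masks are available from the segmentation step. The patent does not fix the exact descriptor, so both the choice of the symmetric-difference band as the "edge portion" and the 9-bin unsigned orientation histogram are illustrative.

```python
import cv2
import numpy as np

def edge_orientation_histogram(ref_mask, obs_mask, shift, n_bins=9):
    """Translate obs_mask by `shift` (dx, dy) to align it with ref_mask, then build a
    normalized oriented-gradient histogram over the band where the two masks differ."""
    dx, dy = shift
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    h, w = ref_mask.shape[:2]
    obs_aligned = cv2.warpAffine(obs_mask, M, (w, h))
    # overlay the two aligned object masks and take image gradients of the composite
    composite = 0.5 * ref_mask.astype(np.float32) + 0.5 * obs_aligned.astype(np.float32)
    gx = cv2.Sobel(composite, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(composite, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi              # unsigned orientation in [0, pi)
    edge_band = cv2.bitwise_xor(ref_mask, obs_aligned) > 0
    hist, _ = np.histogram(angle[edge_band], bins=n_bins, range=(0.0, np.pi),
                           weights=magnitude[edge_band])
    return hist / (hist.sum() + 1e-9)
```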

Refer to Fig. 5. The left side of Fig. 5 exemplarily shows the image obtained after the alignment unit 401 aligns the observed object image and the reference object image according to the translation parameters. Since in this example the reference frame and the observation frame are adjacent frames (or closely spaced frames), the individual edges of the observed and reference object images cannot be distinguished separately in the aligned image. However, by using the histogram generation unit 402 to extract features from the edge portion of the aligned image and to generate the corresponding histogram of oriented gradients, the size relationship between the observed and reference object images can be clearly distinguished.

For example, histogram (a) in Fig. 5 schematically shows the histogram obtained when the observed object image is closer to the imaging plane ("shallower") than the reference object image, and histogram (b) in Fig. 5 schematically shows the histogram obtained when the observed object image is farther from the imaging plane ("deeper") than the reference object image. The comparison of (a) and (b) shows that the generated feature histograms differ markedly depending on whether the observed object image has become shallower or deeper relative to the imaging plane compared with the reference object image. Note that the histograms shown in Fig. 5 are schematic; depending on the method used to extract the edge features, histograms of different forms may be obtained, as long as they reflect the different depth changes.

The depth determination unit 204 may determine the inter-frame depth change of the object image from the histogram provided by the histogram generation unit 402.

In application scenarios such as human-computer interaction, it is not sufficient to determine only the depth change between two frames of the video image sequence; the motion direction of the target object relative to the imaging plane during a specific time period, that is, over a specific sequence of frames, must be determined, and a corresponding operation is then triggered according to the motion direction of the target object. For example, with a hand as the target object, a specific function may be switched on when the hand is determined to move toward the imaging plane of the camera (the hand is "pushed forward"), and switched off when the hand is determined to move away from the imaging plane (the hand is "pulled back"). To realize such an interactive function, in some embodiments the image processing device further includes a motion direction identification unit for identifying the motion direction of the target object relative to the imaging plane.

Fig. 6 is a block diagram showing the structure of an image processing device 600 according to an embodiment of the present invention. The image processing device 600 includes an object extraction unit 601, a registration unit 602, a feature extraction unit 603, a depth determination unit 604 and a motion direction identification unit 605. Since the structures and functions of the object extraction unit 601, the registration unit 602, the feature extraction unit 603 and the depth determination unit 604 are the same as those of the object extraction unit 201, the registration unit 202, the feature extraction unit 203 and the depth determination unit 204 described with reference to Fig. 2, their description is not repeated here and only the motion direction identification unit 605 is described.

The motion direction identification unit 605 can identify the motion direction of the target object relative to the imaging plane. For example, the image processing device 600 is made to perform depth determination sequentially on the observation frames within a predetermined period, so as to obtain a depth determination result for each observation frame. The motion direction identification unit 605 can then identify the motion direction from the sequence of depth determination results supplied by the depth determination unit 604.

It should be noted that, for the predetermined period, one of the video frames within the period may be taken as the reference frame, for example, but not necessarily, the first or the last frame of the period; the frames within the period other than the reference frame are then taken as observation frames. Alternatively, a frame outside the period, for example the frame immediately preceding the period, may be taken as the reference frame, in which case all frames within the period are taken as observation frames. This can be decided according to design requirements. Although a frame in the middle of the video sequence corresponding to the period could also be taken as the reference frame, for convenience a frame at one end of the observed frame sequence is usually used. Embodiments of the identification performed by the motion direction identification unit 605 are described below by way of example.

In one embodiment, the earliest frame within the predetermined period is taken as the reference frame and the subsequent frames are taken as observation frames, and the image processing device 600 performs the depth determination processing on the observation frames in sequence. Since the object extraction, registration, feature extraction and depth determination applied to each observation frame have been fully described above, their description is omitted here. The depth determination unit 604 supplies the depth determination result obtained for each observation frame to the motion direction identification unit 605, which identifies the motion direction from the resulting sequence of depth determination results.

For example, when more than n consecutive "deeper" results appear in the sequence of depth determination results, the motion direction identification unit identifies that the target object moves away from the imaging plane, for example that the hand performs a pull-back motion. When more than m consecutive "shallower" results appear in the sequence, the motion direction identification unit identifies that the target object moves toward the imaging plane, for example that the hand performs a forward-push motion. Here, n and m are preset positive integers that may take the same or different values, chosen according to design requirements.
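A minimal sketch of this run-length rule; the labels "deeper", "shallower" and "uncertain" are the per-frame results of the depth determination step, and n and m are the required run lengths.

```python
def classify_direction(depth_results, n=3, m=3):
    """Identify the motion direction from the sequence of per-frame depth results."""
    run, last = 0, None
    for result in depth_results:
        run = run + 1 if result == last else 1
        last = result
        if last == "deeper" and run >= n:
            return "away"      # e.g. the hand is pulled back
        if last == "shallower" and run >= m:
            return "toward"    # e.g. the hand is pushed forward
    return "undecided"
```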

In some embodiments in which the scaling parameter S is used for the depth determination, the motion direction of the target object may also be identified from both the depth determination results and the scaling parameter S. For example, when more than K consecutive "deeper" results appear in the result sequence from the depth determination unit 604, and the product of the scaling parameters S_i (i = 1, ..., K) corresponding to these K "deeper" observed object images is smaller than a predetermined threshold TH1, the motion direction identification unit 605 can identify that the target object moves away from the imaging plane. Likewise, when more than L consecutive "shallower" results appear in the result sequence, and the product of the scaling parameters S_j (j = 1, ..., L) corresponding to these L "shallower" observed object images is larger than a predetermined threshold TH2, the motion direction identification unit 605 can identify that the target object moves toward the imaging plane. Here, K and L are preset positive integers that may take the same or different values; K, L, TH1 and TH2 are chosen according to design requirements.
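A sketch of the combined rule, pairing each per-frame result with its scaling parameter S; the run lengths K and L and the thresholds TH1 and TH2 shown here are illustrative values only.

```python
def classify_direction_with_scale(depth_results, scales,
                                  K=3, L=3, TH1=0.8, TH2=1.25):
    """Identify the motion direction from per-frame depth results and the
    corresponding scaling parameters S (one per observation frame)."""
    run, last, product = 0, None, 1.0
    for result, s in zip(depth_results, scales):
        if result == last:
            run, product = run + 1, product * s
        else:
            run, last, product = 1, result, s
        if last == "deeper" and run >= K and product < TH1:
            return "away"      # sustained, significant shrinking of the object image
        if last == "shallower" and run >= L and product > TH2:
            return "toward"    # sustained, significant growth of the object image
    return "undecided"
```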

Combining the number of consecutive occurrences of a particular determination result with the scaling parameters corresponding to those occurrences to identify the motion direction of the target object relative to the imaging plane improves the accuracy of the identification result. Moreover, by controlling the magnitudes of the predetermined thresholds TH1 and TH2, relatively slow or insignificant motion of the moving object relative to the imaging plane can be excluded.

The image processing methods used by the image processing devices according to the embodiments of the present disclosure are described below with reference to Figs. 7 and 8.

Fig. 7 is a flowchart illustrating an image processing method for detecting target depth from a monocular video sequence according to an embodiment of the present disclosure.

In step S701, the image of the target object is extracted from a frame of the monocular video sequence as the object image. Any image extraction method known in the art may be used, as long as the image of the target object can be identified and separated from the video frame to meet the needs of subsequent processing.

In one embodiment, the object image may be detected using sliding windows of multiple sizes to determine the region where the object image is located. Specifically, the video frame may be scanned with a sliding window of a particular size, features are extracted from the image content inside the sliding window and fed into a classifier to determine whether an object image is present in that window. After the entire frame has been scanned with the sliding window, the size of the sliding window is adjusted and the scanning, extraction and determination steps are repeated until the region where the object image is located has been determined.

The classifier can be constructed and trained with any customary features. In some embodiments, in order to detect the target object more accurately, a trained target object detector may be used to detect the object image; by using standard machine learning techniques such as support vector machines, a trained target object detector can detect the target object image more accurately.

In some embodiments, in order to improve the accuracy of the subsequent depth change detection, segmentation may also be performed on the region where the detected object image is located (for example, the window in which the object image was detected), so as to separate the object image, as foreground, from the background in that region. The segmentation may be carried out in any way customary in the art; for example, in an embodiment in which a hand is the target object, a skin color model may be constructed to separate foreground from background in the detected window containing the hand. Segmenting the hand image effectively reduces the noise introduced into the subsequent depth detection processing and thus improves the accuracy of the depth detection result.

One frame of the video frame sequence is taken as the reference frame and frames other than the reference frame are taken as observation frames; the object images extracted from the reference frame and from an observation frame are called the reference object image and the observed object image, respectively. In step S702, the observed object image is registered against the reference object image to obtain the corresponding registration parameters. The registration may be performed according to any of various methods known in the art and is not described again here.

After the registration parameters between the observed object image and the reference object image have been obtained, in step S703 the obtained registration parameters are used to extract features that reflect the depth change of the observed object image, compared with the reference object image, relative to the imaging plane.

In one embodiment, the scaling parameter S, which is the registration parameter reflecting the size change of the object image, may be extracted directly as the feature reflecting the depth change of the observed object image relative to the imaging plane compared with the reference object image.

When the scaling parameter S is used as the feature reflecting the depth change, the depth change of the observed object image compared with the reference object image may be determined as follows: when S > 1 + ε_a, the target object may be determined to have become shallower relative to the imaging plane; when S < 1 - ε_b, the target object may be determined to have become deeper relative to the imaging plane; otherwise, the depth change of the target object relative to the imaging plane may be determined to be uncertain. Here, ε_a and ε_b are small positive numbers serving as margins for the determination; they may take the same or different values, for example 0.05 or 0.1, depending on the specific design requirements.

In other embodiments, instead of directly extracting the scaling parameter S as the feature for determining the depth change, the object images to be compared may be aligned according to the translation parameters among the registration parameters, and the edge portion of the aligned images may then be analyzed to extract the corresponding features.

For example, the observed object image may be aligned with the reference object image according to the translation parameters among the registration parameters, and a histogram of oriented gradients may then be generated for the edge portion of the aligned object images as the feature reflecting the depth change. Any method known in the art may be used to extract the features of the edge portion of the aligned object images and generate the corresponding histogram. In general, by extracting features from the edge portion of the aligned images and generating the corresponding histogram of oriented gradients, the size relationship between the observed and reference object images can be clearly distinguished.

In step S704, the depth change of the object image between frames may be determined from the features, such as the histogram, obtained in step S703. The processing then ends.

According to the depth detection method of the embodiment shown in Fig. 7, the depth of an object in a video can be detected in real time while keeping the computational load low.

In application scenarios such as human-computer interaction, it is sometimes necessary to determine the motion direction of the target object relative to the imaging plane over a specific sequence of frames, so that a corresponding operation can be triggered according to the motion direction of the target object. For example, with a hand as the target object, a specific function may be switched on when the hand is determined to move toward the imaging plane of the camera (the hand is "pushed forward"), and switched off when the hand is determined to move away from the imaging plane (the hand is "pulled back").

Fig. 8 is a flowchart illustrating a method of identifying the motion direction of the target object relative to the imaging plane according to an embodiment of the present disclosure. Since the processing of steps S801 to S804 in Fig. 8 is the same as that of steps S701 to S704 described with reference to Fig. 7, it is not repeated here and only step S805 is described.

In step S805, the motion direction of the target object relative to the imaging plane is identified. For example, the processing of steps S801 to S804 may be performed sequentially on the observation frames within a predetermined period to obtain a depth determination result for each observation frame; in step S805, the motion direction may then be identified from the obtained sequence of depth determination results.
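A sketch tying steps S801-S805 together. Here `extract_object_image` is assumed to combine the detection and segmentation sketches given earlier, and the other helpers (`register_and_get_scale`, `depth_change_from_scale`, `classify_direction`) are the illustrative functions above; none of these names are defined by the patent.

```python
def detect_push_pull(frames, reference_index=0, eps=0.1, n=3, m=3):
    """Run S801-S805 over a window of frames: extract the object image in each frame,
    register it against the reference object image, classify the per-frame depth
    change from the scaling parameter S, and identify the overall motion direction."""
    reference_object = extract_object_image(frames[reference_index])      # reference (S801)
    results = []
    for index, frame in enumerate(frames):
        if index == reference_index:
            continue
        observed_object = extract_object_image(frame)                     # S801
        _, S = register_and_get_scale(reference_object, observed_object)  # S802
        results.append(depth_change_from_scale(S, eps, eps))              # S803-S804
    return classify_direction(results, n, m)                              # S805
```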

It should be noted that, for the predetermined period, one of the video frames within the period may be taken as the reference frame, for example, but not necessarily, the first or the last frame of the period; the frames within the period other than the reference frame are then taken as observation frames. Alternatively, a frame outside the period, for example the frame immediately preceding the period, may be taken as the reference frame, in which case all frames within the period are taken as observation frames. This can be decided according to design requirements. Although a frame in the middle of the video sequence corresponding to the period could also be taken as the reference frame, for convenience a frame at one end of the observed frame sequence is usually used. Embodiments of identifying the motion direction are described below by way of example.

For example, when more than n consecutive "deeper" results appear in the sequence of depth determination results, it can be identified that the target object moves away from the imaging plane, for example that the hand performs a pull-back motion. When more than m consecutive "shallower" results appear in the sequence, it can be identified that the target object moves toward the imaging plane, for example that the hand performs a forward-push motion. Here, n and m are preset positive integers that may take the same or different values, chosen according to design requirements.

In some embodiments in which the scaling parameter S is used for the depth determination, the motion direction of the target object may also be identified from both the depth determination results and the scaling parameter S. For example, when more than K consecutive "deeper" results appear in the determination result sequence obtained in step S804, and the product of the scaling parameters S_i (i = 1, ..., K) corresponding to these K "deeper" observed object images is smaller than a predetermined threshold TH1, it can be identified that the target object moves away from the imaging plane. Likewise, when more than L consecutive "shallower" results appear in the sequence, and the product of the scaling parameters S_j (j = 1, ..., L) corresponding to these L "shallower" observed object images is larger than a predetermined threshold TH2, it can be identified that the target object moves toward the imaging plane. Here, K and L are preset positive integers that may take the same or different values; K, L, TH1 and TH2 are chosen according to design requirements.

Note that although in the above embodiments one frame of the video sequence is taken as the reference frame and the other frames are registered against it as observation frames, it is also possible to take the preceding frame as the reference frame and the following frame as the observation frame, and so on in succession; this can be decided according to design requirements.

An exemplary structure of a computer implementing the data processing device of the present invention is described below with reference to Fig. 9. Fig. 9 is a block diagram showing an exemplary structure of such a computer.

In Fig. 9, a central processing unit (CPU) 901 executes various processes according to programs stored in a read-only memory (ROM) 902 or programs loaded from a storage section 908 into a random access memory (RAM) 903. Data required when the CPU 901 executes the various processes is also stored in the RAM 903 as needed.

The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.

The following components are connected to the input/output interface 905: an input section 906 including a keyboard, a mouse and the like; an output section 907 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet.

A drive 910 is also connected to the input/output interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it is installed into the storage section 908 as needed.

When the above steps and processes are implemented by software, the programs constituting the software are installed from a network such as the Internet or from a storage medium such as the removable medium 911.

Those skilled in the art will understand that such a storage medium is not limited to the removable medium 911 shown in Fig. 9, in which the program is stored and which is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 911 include magnetic disks, optical disks (including compact disc read-only memories (CD-ROMs) and digital versatile discs (DVDs)), magneto-optical disks (including MiniDiscs (MDs)) and semiconductor memories. Alternatively, the storage medium may be the ROM 902, a hard disk contained in the storage section 908 or the like, in which the program is stored and which is distributed to the user together with the device containing it.

In the foregoing specification, the present invention has been described with reference to specific embodiments. However, those skilled in the art will understand that various modifications and changes can be made without departing from the scope of the present invention as defined by the claims.

本发明还可以以下面的实施方式实现:The present invention can also be realized in the following embodiments:

1.一种图像处理设备,包括:1. An image processing device, comprising:

对象提取单元,用于从单目视频序列的帧中提取目标对象的图像,作为对象图像;The object extraction unit is used to extract the image of the target object from the frame of the monocular video sequence as the object image;

配准单元，用于将观察对象图像针对参考对象图像进行配准，以获得配准参数，参考对象图像和观察对象图像是分别从参考帧以及参考帧之外的观察帧中提取的对象图像；a registration unit, configured to register the observation object image with respect to the reference object image to obtain registration parameters, the reference object image and the observation object image being object images extracted from the reference frame and an observation frame other than the reference frame, respectively;

特征提取单元,用于通过利用配准参数来提取能够反映观察对象图像与参考对象图像相比相对于成像平面的深度变化的特征;以及a feature extraction unit for extracting a feature capable of reflecting a depth change of the observed object image with respect to the imaging plane compared with the reference object image by using the registration parameter; and

深度确定单元,用于基于所提取的特征确定深度变化。A depth determination unit is configured to determine a depth change based on the extracted features.

2.根据项1的图像处理设备，其中，特征提取单元将配准参数中反映对象图像尺寸变化的缩放参数提取为特征。2. The image processing apparatus according to item 1, wherein the feature extraction unit extracts, as the feature, a scaling parameter reflecting a change in size of the object image among the registration parameters.

3.根据项2的图像处理设备,其中,用S表示观察帧针对参考帧的缩放参数,则深度确定单元:3. The image processing device according to item 2, wherein, denoting the scaling parameter of the observation frame with respect to the reference frame by S, then the depth determination unit:

在S>1+εa时，确定目标对象相对于成像平面变浅；When S>1+εa, it is determined that the target object becomes shallower (closer) relative to the imaging plane;

在S<1-εb时，确定目标对象相对于成像平面变深；以及When S<1-εb, it is determined that the target object becomes deeper (farther) relative to the imaging plane; and

否则，确定目标对象相对于成像平面的深度变化不确定；Otherwise, it is determined that the depth change of the target object relative to the imaging plane is uncertain;

其中，εa和εb是作为确定的余量的小的正数。Here, εa and εb are small positive numbers serving as margins for the determination.
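
As an illustrative note on item 3 (not part of the claimed subject matter), the decision rule can be written in a few lines of Python; the function name depth_change and the default margins eps_a = eps_b = 0.05 are assumptions made only for this sketch.

    def depth_change(S, eps_a=0.05, eps_b=0.05):
        """Classify the depth change of the target object from the scaling parameter S
        obtained by registering the observation object image against the reference one."""
        if S > 1 + eps_a:
            return "shallower"   # the object image grew: the object moved toward the imaging plane
        if S < 1 - eps_b:
            return "deeper"      # the object image shrank: the object moved away from the imaging plane
        return "uncertain"       # the change lies within the margins, so no decision is made

For example, depth_change(1.12) returns "shallower", meaning the object has come closer to the imaging plane than it was in the reference frame.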

4.根据项1至3中任一个的图像处理设备,还包括运动方向识别单元,用于识别目标对象相对于成像平面的运动方向;4. The image processing device according to any one of items 1 to 3, further comprising a motion direction identification unit for identifying the motion direction of the target object relative to the imaging plane;

其中,当使图像处理设备对预定时段内的观察帧顺序执行深度确定时,运动方向识别单元能够依据由深度确定单元确定的结果序列对运动方向进行识别。Wherein, when the image processing device is made to sequentially perform depth determination on the observation frames within a predetermined period, the motion direction identification unit can identify the motion direction according to the result sequence determined by the depth determination unit.

5.根据项4的图像处理设备,其中,运动方向识别单元:5. The image processing device according to item 4, wherein the motion direction recognition unit:

当结果序列中存在连续的第一预定数目以上的"变深"时，识别出目标对象沿远离成像平面的方向运动；以及identifying that the target object moves in a direction away from the imaging plane when there are more than a first predetermined number of consecutive "becoming deeper" results in the result sequence; and

当结果序列中存在连续的第二预定数目以上的"变浅"时，识别出目标对象沿靠近成像平面的方向运动。identifying that the target object moves in a direction toward the imaging plane when there are more than a second predetermined number of consecutive "becoming shallower" results in the result sequence.

6.根据项4的图像处理设备,其中,运动方向识别单元:6. The image processing device according to item 4, wherein the motion direction recognition unit:

当结果序列中存在连续的第三预定数目以上的"变深"，并且对应的缩放参数S的连乘积小于第一阈值时，识别出目标对象沿远离成像平面的方向运动；以及identifying that the target object moves in a direction away from the imaging plane when there are more than a third predetermined number of consecutive "becoming deeper" results in the result sequence and the cumulative product of the corresponding scaling parameters S is less than a first threshold; and

当结果序列中存在连续的第四预定数目以上的"变浅"，并且对应的缩放参数S的连乘积大于第二阈值时，识别出目标对象沿靠近成像平面的方向运动。identifying that the target object moves in a direction toward the imaging plane when there are more than a fourth predetermined number of consecutive "becoming shallower" results in the result sequence and the cumulative product of the corresponding scaling parameters S is greater than a second threshold.
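
As an illustrative note on items 5 and 6 (again only a sketch under assumptions, not the claimed implementation), the rule can be expressed as follows in Python; the minimum run length of 3 and the thresholds 0.8 and 1.25 are placeholder values, and item 5 corresponds to the same loop with the two product checks removed.

    import math

    def motion_direction(results, min_run=3, t_away=0.8, t_toward=1.25):
        """results is a list of (label, S) pairs, one per observation frame, where label is
        "deeper", "shallower" or "uncertain" and S is the corresponding scaling parameter."""
        run_label, run_scales = None, []
        for label, S in results:
            if label == run_label:
                run_scales.append(S)             # extend the current run of identical decisions
            else:
                run_label, run_scales = label, [S]
            prod = math.prod(run_scales)         # cumulative product of S over the current run
            if run_label == "deeper" and len(run_scales) >= min_run and prod < t_away:
                return "away from the imaging plane"
            if run_label == "shallower" and len(run_scales) >= min_run and prod > t_toward:
                return "toward the imaging plane"
        return None                              # no sufficiently consistent run was observed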

7.根据项1的图像处理设备,其中,特征提取单元包括:7. The image processing device according to item 1, wherein the feature extraction unit comprises:

对齐单元,用于根据配准参数中的平移参数将观察对象图像与参考对象图像对齐;以及an alignment unit, configured to align the image of the observation object with the image of the reference object according to the translation parameters in the registration parameters; and

直方图生成单元，用于针对对齐的对象图像的边缘部分生成方向梯度直方图，作为能够反映深度变化的特征。a histogram generation unit, configured to generate a histogram of oriented gradients (HOG) for the edge portion of the aligned object image, as a feature capable of reflecting the depth change.
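
As an illustrative note on item 7, one possible realization (assuming OpenCV and scikit-image are available; the Canny thresholds, the HOG cell layout and the grayscale-input requirement are choices made only for this sketch) is:

    import cv2
    import numpy as np
    from skimage.feature import hog

    def edge_hog_feature(obs_img, ref_img, tx, ty):
        """Align the grayscale observation object image to the reference object image using the
        translation part (tx, ty) of the registration parameters, then describe its edge portion
        with a histogram of oriented gradients."""
        h, w = ref_img.shape[:2]
        M = np.float32([[1, 0, -tx], [0, 1, -ty]])   # undo the estimated translation
        aligned = cv2.warpAffine(obs_img, M, (w, h))
        edges = cv2.Canny(aligned, 50, 150)          # keep only the contour of the object
        return hog(edges, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

Comparing the descriptor computed this way for the observation object image with the one computed for the reference object image yields a feature that reflects the depth change.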

8.根据项1至7中任一个的图像处理设备,其中,对象提取单元包括检测单元,检测单元用于使用多尺寸的滑动窗口检测对象图像,以确定对象图像所在区域。8. The image processing apparatus according to any one of items 1 to 7, wherein the object extraction unit includes a detection unit configured to detect the object image using a multi-sized sliding window to determine the region where the object image is located.

9.根据项8的图像处理设备,其中,提取单元包括分割单元,分割单元用于对对象图像所在区域执行分割处理,以提取出对象图像。9. The image processing apparatus according to item 8, wherein the extracting unit includes a segmentation unit for performing segmentation processing on the region where the object image is located to extract the object image.

10.根据项8或9的图像处理设备,其中,检测单元使用经训练的目标对象检测器检测对象图像。10. The image processing device according to item 8 or 9, wherein the detection unit detects the object image using a trained target object detector.
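
As an illustrative note on items 8 to 10, a multi-size sliding-window scan can be sketched as follows; the window sizes, the step of 16 pixels and the detector callable (standing in for a trained target-object detector) are placeholders, and in practice the accepted boxes would be merged and passed to the segmentation step of item 9.

    def sliding_window_detect(frame, detector, window_sizes=((64, 64), (96, 96), (128, 128)), step=16):
        """Scan the frame with windows of several sizes and return the boxes that the trained
        target-object detector accepts as containing the target object."""
        boxes = []
        h, w = frame.shape[:2]
        for win_w, win_h in window_sizes:
            for y in range(0, h - win_h + 1, step):
                for x in range(0, w - win_w + 1, step):
                    patch = frame[y:y + win_h, x:x + win_w]
                    if detector(patch):          # the detector returns True for the target object
                        boxes.append((x, y, win_w, win_h))
        return boxes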

11.一种图像处理方法,包括:11. An image processing method, comprising:

从单目视频序列的帧中提取目标对象的图像,作为对象图像;Extract the image of the target object from the frame of the monocular video sequence as the object image;

将观察对象图像针对参考对象图像进行配准，以获得配准参数，参考对象图像和观察对象图像是分别从参考帧以及参考帧之外的观察帧中提取的对象图像；registering the observation object image with respect to the reference object image to obtain registration parameters, the reference object image and the observation object image being object images extracted from the reference frame and an observation frame other than the reference frame, respectively;

通过利用配准参数来提取能够反映观察对象图像与参考对象图像相比相对于成像平面的深度变化的特征;以及extracting features that reflect changes in depth of the image of the observed object relative to the imaging plane compared to the image of the reference object by utilizing the registration parameters; and

基于所提取的特征确定深度变化。Depth variation is determined based on the extracted features.

12.根据项11的图像处理方法，其中，将配准参数中反映对象图像尺寸变化的缩放参数提取为特征。12. The image processing method according to item 11, wherein, among the registration parameters, a scaling parameter reflecting a size change of the object image is extracted as the feature.

13.根据项12的图像处理方法,其中,用S表示观察帧针对参考帧的缩放参数,则:13. The image processing method according to item 12, wherein, using S to represent the scaling parameter of the observation frame for the reference frame, then:

在S>1+εa时，确定目标对象相对于成像平面变浅；When S>1+εa, it is determined that the target object becomes shallower (closer) relative to the imaging plane;

在S<1-εb时，确定目标对象相对于成像平面变深；以及When S<1-εb, it is determined that the target object becomes deeper (farther) relative to the imaging plane; and

否则，确定目标对象相对于成像平面的深度变化不确定；Otherwise, it is determined that the depth change of the target object relative to the imaging plane is uncertain;

其中，εa和εb是作为确定的余量的小的正数。Here, εa and εb are small positive numbers serving as margins for the determination.

14.根据项11至13中任一个的图像处理方法,还包括:根据针对预定时段内的观察帧顺序得出的深度确定结果的序列来识别目标对象相对于成像平面的运动方向。14. The image processing method according to any one of items 11 to 13, further comprising: identifying a motion direction of the target object relative to the imaging plane based on a sequence of depth determination results sequentially obtained for observation frames within a predetermined period.

15.根据项14的图像处理方法,其中,15. The image processing method according to item 14, wherein,

当结果序列中存在连续的第一预定数目以上的"变深"时，识别出目标对象沿远离成像平面的方向运动；以及identifying that the target object moves in a direction away from the imaging plane when there are more than a first predetermined number of consecutive "becoming deeper" results in the result sequence; and

当结果序列中存在连续的第二预定数目以上的"变浅"时，识别出目标对象沿靠近成像平面的方向运动。identifying that the target object moves in a direction toward the imaging plane when there are more than a second predetermined number of consecutive "becoming shallower" results in the result sequence.

16.根据项14的图像处理方法,其中,16. The image processing method according to item 14, wherein,

当结果序列中存在连续的第三预定数目以上的"变深"，并且对应的缩放参数S的连乘积小于第一阈值时，识别出目标对象沿远离成像平面的方向运动；以及identifying that the target object moves in a direction away from the imaging plane when there are more than a third predetermined number of consecutive "becoming deeper" results in the result sequence and the cumulative product of the corresponding scaling parameters S is less than a first threshold; and

当结果序列中存在连续的第四预定数目以上的"变浅"，并且对应的缩放参数S的连乘积大于第二阈值时，识别出目标对象沿靠近成像平面的方向运动。identifying that the target object moves in a direction toward the imaging plane when there are more than a fourth predetermined number of consecutive "becoming shallower" results in the result sequence and the cumulative product of the corresponding scaling parameters S is greater than a second threshold.

17.根据项11的图像处理方法,其中,提取反映深度变化的特征包括:17. The image processing method according to item 11, wherein extracting features reflecting depth changes comprises:

根据配准参数中的平移参数将观察对象图像与参考对象图像对齐;以及aligning the image of the observed object with the image of the reference object according to the translation parameter in the registration parameters; and

针对对齐的对象图像的边缘部分生成方向梯度直方图,作为能够反映深度变化的特征。A histogram of oriented gradients is generated for the edge part of the aligned object image as a feature that can reflect the depth variation.

18.根据项11至17中任一个的图像处理方法,其中,提取对象图像包括:使用多尺寸的滑动窗口检测对象图像,以确定对象图像所在区域。18. The image processing method according to any one of items 11 to 17, wherein extracting the object image comprises: detecting the object image using multi-sized sliding windows to determine the region where the object image is located.

19.根据项18的图像处理方法,其中,提取对象图像包括:对对象图像所在区域执行分割处理,以提取出对象图像。19. The image processing method according to item 18, wherein extracting the target image comprises: performing segmentation processing on the region where the target image is located to extract the target image.

20.根据项18或19的图像处理方法,其中,提取对象图像包括:使用经训练的目标对象检测器检测对象图像。20. The image processing method according to item 18 or 19, wherein extracting the object image comprises detecting the object image using a trained target object detector.

Claims (10)

1.一种图像处理设备，包括：1. An image processing device, comprising:

对象提取单元，用于从单目视频序列的帧中提取目标对象的图像，作为对象图像；an object extraction unit for extracting an image of a target object from a frame of a monocular video sequence as an object image;

配准单元，用于将观察对象图像针对参考对象图像进行配准，以获得配准参数，所述参考对象图像和所述观察对象图像是分别从参考帧以及参考帧之外的观察帧中提取的对象图像；a registration unit for registering the observation object image with respect to the reference object image to obtain registration parameters, the reference object image and the observation object image being object images extracted from the reference frame and an observation frame other than the reference frame, respectively;

特征提取单元，用于通过利用所述配准参数来提取能够反映观察对象图像与参考对象图像相比相对于成像平面的深度变化的特征；以及a feature extraction unit for extracting, by using the registration parameters, a feature capable of reflecting a depth change of the observation object image relative to the imaging plane compared with the reference object image; and

深度确定单元，用于基于所提取的特征确定所述深度变化。a depth determination unit for determining the depth change based on the extracted feature.

2.根据权利要求1所述的图像处理设备，其中，所述特征提取单元将所述配准参数中反映对象图像尺寸变化的缩放参数提取为所述特征。2. The image processing device according to claim 1, wherein the feature extraction unit extracts, as the feature, a scaling parameter reflecting a size change of the object image among the registration parameters.

3.根据权利要求2所述的图像处理设备，其中，用S表示观察帧针对参考帧的所述缩放参数，则所述深度确定单元：3. The image processing device according to claim 2, wherein, with S denoting the scaling parameter of the observation frame with respect to the reference frame, the depth determination unit:

在S>1+εa时，确定所述目标对象相对于成像平面变浅；determines that the target object becomes shallower (closer) relative to the imaging plane when S>1+εa;

在S<1-εb时，确定所述目标对象相对于成像平面变深；以及determines that the target object becomes deeper (farther) relative to the imaging plane when S<1-εb; and

否则，确定所述目标对象相对于成像平面的深度变化不确定；otherwise determines that the depth change of the target object relative to the imaging plane is uncertain;

其中，εa和εb是作为所述确定的余量的小的正数。where εa and εb are small positive numbers serving as margins for the determination.

4.根据权利要求1至3中任一个所述的图像处理设备，还包括运动方向识别单元，用于识别所述目标对象相对于所述成像平面的运动方向；4. The image processing device according to any one of claims 1 to 3, further comprising a motion direction identification unit for identifying a motion direction of the target object relative to the imaging plane;

其中，当使所述图像处理设备对预定时段内的观察帧顺序执行深度确定时，所述运动方向识别单元能够依据由深度确定单元确定的结果序列对运动方向进行识别。wherein, when the image processing device is made to sequentially perform depth determination on the observation frames within a predetermined period, the motion direction identification unit can identify the motion direction according to the sequence of results determined by the depth determination unit.

5.根据权利要求4所述的图像处理设备，其中，所述运动方向识别单元：5. The image processing device according to claim 4, wherein the motion direction identification unit:

当所述结果序列中存在连续的第一预定数目以上的"变深"时，识别出所述目标对象沿远离成像平面的方向运动；以及identifies that the target object moves in a direction away from the imaging plane when there are more than a first predetermined number of consecutive "becoming deeper" results in the result sequence; and

当所述结果序列中存在连续的第二预定数目以上的"变浅"时，识别出所述目标对象沿靠近成像平面的方向运动。identifies that the target object moves in a direction toward the imaging plane when there are more than a second predetermined number of consecutive "becoming shallower" results in the result sequence.

6.一种图像处理方法，包括：6. An image processing method, comprising:

从单目视频序列的帧中提取目标对象的图像，作为对象图像；extracting an image of a target object from a frame of a monocular video sequence as an object image;

将观察对象图像针对参考对象图像进行配准，以获得配准参数，所述参考对象图像和所述观察对象图像是分别从参考帧以及参考帧之外的观察帧中提取的对象图像；registering the observation object image with respect to the reference object image to obtain registration parameters, the reference object image and the observation object image being object images extracted from the reference frame and an observation frame other than the reference frame, respectively;

通过利用所述配准参数来提取能够反映观察对象图像与参考对象图像相比相对于成像平面的深度变化的特征；以及extracting, by using the registration parameters, a feature capable of reflecting a depth change of the observation object image relative to the imaging plane compared with the reference object image; and

基于所提取的特征确定所述深度变化。determining the depth change based on the extracted feature.

7.根据权利要求6所述的图像处理方法，其中，将所述配准参数中反映对象图像尺寸变化的缩放参数提取为所述特征。7. The image processing method according to claim 6, wherein a scaling parameter reflecting a size change of the object image among the registration parameters is extracted as the feature.

8.根据权利要求7所述的图像处理方法，其中，用S表示观察帧针对参考帧的所述缩放参数，则：8. The image processing method according to claim 7, wherein, with S denoting the scaling parameter of the observation frame with respect to the reference frame:

在S>1+εa时，确定所述目标对象相对于成像平面变浅；it is determined that the target object becomes shallower (closer) relative to the imaging plane when S>1+εa;

在S<1-εb时，确定所述目标对象相对于成像平面变深；以及it is determined that the target object becomes deeper (farther) relative to the imaging plane when S<1-εb; and

否则，确定所述目标对象相对于成像平面的深度变化不确定；otherwise it is determined that the depth change of the target object relative to the imaging plane is uncertain;

其中，εa和εb是作为所述确定的余量的小的正数。where εa and εb are small positive numbers serving as margins for the determination.

9.根据权利要求6至8中任一个所述的图像处理方法，还包括：根据针对预定时段内的所述观察帧顺序得出的深度确定结果的序列来识别所述目标对象相对于所述成像平面的运动方向。9. The image processing method according to any one of claims 6 to 8, further comprising: identifying a motion direction of the target object relative to the imaging plane according to a sequence of depth determination results sequentially obtained for the observation frames within a predetermined period.

10.根据权利要求9所述的图像处理方法，其中，10. The image processing method according to claim 9, wherein

当所述结果序列中存在连续的第一预定数目以上的"变深"时，识别出所述目标对象沿远离成像平面的方向运动；以及the target object is identified as moving in a direction away from the imaging plane when there are more than a first predetermined number of consecutive "becoming deeper" results in the result sequence; and

当所述结果序列中存在连续的第二预定数目以上的"变浅"时，识别出所述目标对象沿靠近成像平面的方向运动。the target object is identified as moving in a direction toward the imaging plane when there are more than a second predetermined number of consecutive "becoming shallower" results in the result sequence.
CN201310119788.3A 2013-04-08 2013-04-08 Image processing device and image processing method Pending CN104103062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310119788.3A CN104103062A (en) 2013-04-08 2013-04-08 Image processing device and image processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310119788.3A CN104103062A (en) 2013-04-08 2013-04-08 Image processing device and image processing method

Publications (1)

Publication Number Publication Date
CN104103062A true CN104103062A (en) 2014-10-15

Family

ID=51671186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310119788.3A Pending CN104103062A (en) 2013-04-08 2013-04-08 Image processing device and image processing method

Country Status (1)

Country Link
CN (1) CN104103062A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101815225A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for generating depth map and device thereof
CN102413756A (en) * 2009-04-29 2012-04-11 皇家飞利浦电子股份有限公司 Real-time depth estimation from monocular endoscope images
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
US20120306876A1 (en) * 2011-06-06 2012-12-06 Microsoft Corporation Generating computer models of 3d objects
CN102662334A (en) * 2012-04-18 2012-09-12 深圳市兆波电子技术有限公司 Method for controlling distance between user and electronic equipment screen and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙丽娟: "基于边缘梯度直方图的中国静态手语识别", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292907A (en) * 2017-07-14 2017-10-24 灵动科技(北京)有限公司 A kind of method to following target to be positioned and follow equipment
CN107292907B (en) * 2017-07-14 2020-08-21 灵动科技(北京)有限公司 Method for positioning following target and following equipment
CN110516517A (en) * 2018-05-22 2019-11-29 杭州海康威视数字技术股份有限公司 A kind of target identification method based on multiple image, device and equipment
CN110516517B (en) * 2018-05-22 2022-05-06 杭州海康威视数字技术股份有限公司 Target identification method, device and equipment based on multi-frame image
CN109165645A (en) * 2018-08-01 2019-01-08 腾讯科技(深圳)有限公司 A kind of image processing method, device and relevant device
CN109165645B (en) * 2018-08-01 2023-04-07 腾讯科技(深圳)有限公司 Image processing method and device and related equipment
CN113243016A (en) * 2018-12-10 2021-08-10 株式会社小糸制作所 Object recognition system, arithmetic processing device, automobile, vehicle lamp, and method for learning classifier
CN112883769A (en) * 2020-01-15 2021-06-01 赛义德·皮拉斯特 Method for identifying human interaction behavior in aerial video of unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
Winlock et al. Toward real-time grocery detection for the visually impaired
US9213890B2 (en) Gesture recognition system for TV control
CN112052831B (en) Method, device and computer storage medium for face detection
US9727974B2 (en) System for video super resolution using semantic components
US9600898B2 (en) Method and apparatus for separating foreground image, and computer-readable recording medium
US10249046B2 (en) Method and apparatus for object tracking and segmentation via background tracking
CN109033972A (en) A kind of object detection method, device, equipment and storage medium
US9195904B1 (en) Method for detecting objects in stereo images
JP6436077B2 (en) Image processing system, image processing method, and program
CN112435278B (en) A visual SLAM method and device based on dynamic target detection
JP2012038318A (en) Target detection method and device
CN104103062A (en) Image processing device and image processing method
CN102609724A (en) Method for prompting ambient environment information by using two cameras
KR20210010092A (en) Method for selecting image of interest to construct retrieval database and image control system performing the same
Stein et al. Local detection of occlusion boundaries in video
Sanmiguel et al. Pixel-based colour contrast for abandoned and stolen object discrimination in video surveillance
US10147199B2 (en) Method and apparatus for determining an orientation of a video
Cai et al. Robust contour tracking by combining region and boundary information
Okyere et al. Traffic sign recognition using sparse representations and active contour models
Malavika et al. Moving object detection and velocity estimation using MATLAB
Yaakob et al. Moving object extraction in PTZ camera using the integration of background subtraction and local histogram processing
Liu et al. Oriented distance regularized level set evolution for image segmentation
Adegbola et al. Detection and tracking of a moving object using Canny edge and optical flow techniques
Jia et al. Real-time integrated multi-object detection and tracking in video sequences using detection and mean shift based particle filters
Skosana et al. Edge-preserving smoothing filters for improving object classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141015