WO2019222889A1 - Image feature extraction method and apparatus - Google Patents

Image feature extraction method and apparatus

Info

Publication number
WO2019222889A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
layer
pyramid
slice
feature point
Prior art date
Application number
PCT/CN2018/087707
Other languages
English (en)
French (fr)
Inventor
王易诚
方晓鑫
但汉光
黄启才
孙艳英
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201880086693.7A (CN111630523A)
Priority to PCT/CN2018/087707 (WO2019222889A1)
Publication of WO2019222889A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Definitions

  • Embodiments of the present application relate to the field of image processing technologies, and in particular, to a method and an apparatus for extracting image features.
  • SLAM Simultaneous Localization and Mapping
  • AR Augmented Reality
  • SLAM is a combination of mobile device positioning and environmental map creation, that is, the mobile device builds an incremental environmental map based on its own pose estimation and sensor's perception of the environment during the movement, and uses this map to achieve autonomous positioning and navigation.
  • The SLAM system framework mainly includes three threads: Tracking, Local Mapping, and Loop Closing. Tracking mainly extracts Oriented FAST and Rotated BRIEF (ORB) features from the image. Local Mapping mainly completes local map construction. Loop Closing mainly includes closed-loop detection and closed-loop correction. Among them, tracking is an important processing thread in SLAM, and the feature extraction part is completed in this thread. The feature extraction process mainly includes the following steps: 1. establishing an 8-layer pyramid image; 2. using the FAST algorithm to generate feature points of the 8-layer pyramid image; 3. calculating the descriptor information of the feature points based on the 8-layer pyramid image and its feature points.
  • the embodiments of the present application provide an image feature extraction method and device, which are used to improve the speed of feature extraction.
  • In a first aspect, an image feature extraction method is provided, including: acquiring an input image, where the input image is a grayscale image; performing slice and downsampling processing on the input image to obtain an N-layer pyramid image, where the N-layer pyramid image is N images of different resolutions, each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extracting at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is characterized by coordinates and a response score value; and performing screening processing on the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the highest response score value in a preset area of that layer of pyramid image.
  • the input image is sliced so that feature extraction is performed on each slice image, and the feature points of the multiple slice images of each layer of pyramid image are filtered to reduce the amount of data for subsequent processing, thereby increasing the speed of feature extraction.
  • the feature points of each slice image are corner points after non-maximum suppression processing.
  • the data amount of the feature points for subsequent processing can be reduced, thereby increasing the speed of feature extraction.
  • the input image is a layer 0 pyramid image, and slicing and downsampling the input image to obtain an N-layer pyramid image includes: performing slice processing on the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and performing slice processing on the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0 ≤ i ≤ N-2.
  • the method for acquiring an N-layer pyramid image provided above can reduce a memory space required by a mobile device during processing.
  • filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image includes: acquiring the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filtering out a specified number of feature points from the feature point set by using an octree filtering method to obtain multiple target feature points.
  • the method further includes: determining description information of each target feature point, where the description information includes a feature point direction and a descriptor.
  • determining the description information of the target feature point includes: performing Gaussian blur processing on the slice image in each layer of the pyramid image; and determining the feature point direction and the descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature point.
  • the mobile device can determine the description information of the target feature point, and reduce the memory space required during processing.
  • In a second aspect, an image feature extraction device is provided, including: an acquisition unit for acquiring an input image, the input image being a grayscale image; and a processing unit for: slicing and downsampling the input image to obtain an N-layer pyramid image, where each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extracting at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is characterized by coordinates and a response score value; and filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a preset area of that layer of pyramid image.
  • the processing unit is a digital signal processor (DSP).
  • DSP digital signal processor
  • the feature points of each slice image are corner points after the non-maximum value suppression process.
  • the input image is a layer 0 pyramid image, and the processing unit is further configured to: slice the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsample the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slice the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0 ≤ i ≤ N-2.
  • the processing unit is further configured to: acquire the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filter out a specified number of feature points from the feature point set by using an octree filtering method to obtain multiple target feature points.
  • the processing unit is further configured to determine description information of each target feature point, where the description information includes a feature point direction and a descriptor.
  • the processing unit is further configured to: perform Gaussian blur processing on the slice image in each layer of the pyramid image; and determine the feature point direction and the descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature point.
  • In a third aspect, an image feature extraction device is provided, including: an input interface for acquiring an input image, the input image being a grayscale image; and a processor configured to perform the following operations: slicing and downsampling the input image to obtain an N-layer pyramid image, where the N-layer pyramid image is N images with different resolutions, each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extracting at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is characterized by coordinates and a response score value; and filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the highest response score value in a preset area of that layer of pyramid image.
  • the processor is a digital signal processor (DSP).
  • the DSP accesses each slice image through a direct memory access DMA method.
  • a target feature point of each slice image occupies continuous storage space.
  • the input image is a layer 0 pyramid image, and the processor further performs the following operations: performing slice processing on the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slicing the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0 ≤ i ≤ N-2.
  • the processor further performs the following operations: acquiring the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filtering out a specified number of feature points from the feature point set by using an octree filtering method to obtain multiple target feature points.
  • the processor further performs the following operation: determining description information of each target feature point, where the description information includes a feature point direction and a descriptor.
  • the processor further performs the following operations: performing Gaussian blur processing on the slice image in each layer of the pyramid image; and determining the feature point direction and the descriptor of the target feature point according to the processed slice image and the target feature points of the slice image.
  • a readable storage medium stores instructions, and when the instructions are run on a device, the device is caused to execute the image feature extraction method provided by the first aspect or any possible implementation of the first aspect.
  • a computer program product is provided, and when the computer program product runs on a computer, the computer is caused to execute the image feature extraction method provided by the first aspect or any possible implementation of the first aspect.
  • the apparatus, computer storage medium, and computer program product provided above are all used to execute the corresponding method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; they are not repeated here.
  • FIG. 1 is a schematic diagram of an SLAM system overview provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a mobile device according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image feature extraction method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an image feature extraction method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another image feature extraction method according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image feature extraction device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another image feature extraction apparatus according to an embodiment of the present application.
  • Localization means that the mobile device needs to estimate its position relative to other objects in the environment.
  • the environment where the mobile device is located can be given by a map.
  • Mobile devices can also be called mobile devices.
  • Mapping means that the mobile device needs to construct the distribution of objects in the environment it detects. At this time, the location of the mobile device is known. The map reflects the distribution of objects in the environment where the mobile device is located.
  • SLAM is the abbreviation of Simultaneous Localization and Mapping.
  • BRIEF is the abbreviation of Binary Robust Independent Elementary Features.
  • FAST is the abbreviation of Features from Accelerated Segment Test.
  • ORB is short for Oriented FAST and Rotated BRIEF (oFAST and rBRIEF), that is, FAST with orientation and BRIEF with rotation.
  • ORB feature refers to the use of FAST method to detect and extract features.
  • Features here can also be called feature points of the image. A feature point is represented by a series of numerical values; for example, the series of numerical values may include the coordinates of the feature point and its response score value.
  • the FAST feature point itself is not directional, so the calculation of the feature point direction needs to be added to the ORB feature.
  • ORB uses the BRIEF method to calculate the descriptor of the feature point.
  • the advantage of the BRIEF method is that it is fast.
  • the feature point direction and descriptor may be referred to as feature point description information in the following.
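  • The text states that a direction must be computed for each FAST feature point but does not say how. The following is a minimal sketch that assumes the intensity-centroid approach of the published ORB method; the function name `intensity_centroid_angle` and the patch radius are illustrative and do not come from the application.

```python
import math


def intensity_centroid_angle(image, cx, cy, radius):
    """Angle (radians) from the patch centre (cx, cy) to its intensity centroid.

    `image` is a grayscale image stored as a list of pixel rows. The patch is
    the disc of the given radius and must lie fully inside the image.
    """
    m10 = 0  # first-order moment along x
    m01 = 0  # first-order moment along y
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy <= radius * radius:  # keep the patch circular
                value = image[cy + dy][cx + dx]
                m10 += dx * value
                m01 += dy * value
    return math.atan2(m01, m10)


# Patch that is brighter on its right half: the direction points along +x.
right_bright = [[100 if x > 4 else 0 for x in range(9)] for y in range(9)]
angle_right = intensity_centroid_angle(right_bright, 4, 4, 3)

# Patch that is brighter on its lower half: the direction points along +y.
bottom_bright = [[100 if y > 4 else 0 for x in range(9)] for y in range(9)]
angle_down = intensity_centroid_angle(bottom_bright, 4, 4, 3)
```

  • Because image rows grow downward, a positive angle here means a clockwise rotation as seen on screen.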
  • ORB-SLAM system is a real-time monocular detection and relocation SLAM system based on feature points, which can be operated in large-scale, small-scale, indoor and outdoor environments. As shown in Figure 1, it is an overview of the ORB-SLAM system.
  • the SLAM system uses ORB for feature extraction.
  • SLAM usually includes three threads: Tracking, Local Mapping, and Loop Closing.
  • Tracking refers to positioning the mobile device from each frame of image (for example, each frame acquired by the camera on the mobile device) and deciding whether to add a key frame. A key frame can refer to the frame in which a key action in the movement or change of a character or object occurs.
  • Tracking can specifically include: extracting ORB features from the environment image, performing pose estimation based on the ORB features of the previous frame of image, or initializing poses through global relocation, and then tracking the reconstructed local map, optimizing poses, and then according to some rules Identify new keyframes.
  • Local mapping refers to processing new key frames and using local bundle adjustment (Local Bundle Adjustment, Local BA) to complete the reconstruction, which can include inserting key frames, verifying recently generated map points and filtering, and generating new map points.
  • the map points here can refer to physical points in the map.
  • the map points can include depth and absolute coordinate information. A map point may be a three-dimensional association established between feature points of multiple frames of images.
  • Closed-loop detection refers to the closed-loop detection of each newly added key frame, which is optimized using the global BA. It can include: closed-loop detection and closed-loop correction.
  • the closed-loop detection can use the Sim3 algorithm to calculate similarity transformation.
  • the closed-loop correction mainly includes closed-loop fusion and graph optimization of the Essential Graph.
  • the Covisibility Graph and the Essential Graph are two different forms of graph.
  • the Covisibility Graph can focus tracking and map construction on the local covisible area.
  • the Essential Graph can be used to optimize poses and achieve closed-loop detection.
  • the ORB-SLAM system can also include a bag of words for scene recognition.
  • the bag of words can be understood as: extracting features from a large number of images, generating words through clustering, and using a tree structure to store the bag of words. In actual use, by searching in the bag of words by using the feature information of the image, it is easy to compare whether the two images are similar or find words that can represent the image to speed up the retrieval of the image.
  • the embodiment of the present application mainly relates to an image feature extraction part, and the feature extraction part is completed in a tracking thread.
  • the embodiment of the present application is used to speed up feature extraction, reduce the time consumed by feature extraction, and reduce the power consumption of the device, thereby improving the user experience.
  • FIG. 2 is a schematic structural diagram of a mobile device according to an embodiment of the present application.
  • the mobile device may be a mobile phone, a tablet computer, a video camera, a camera, a wearable device, a vehicle-mounted device, or a terminal device.
  • the above-mentioned devices are collectively referred to as mobile devices in this application.
  • the embodiment of the present application is described by using the mobile device as a mobile phone as an example.
  • the mobile phone includes: a memory 201, a processor 202, a sensor component 203, a multimedia component 204, a power supply component 205, and an input / output interface 206.
  • the memory 201 may be used to store data, software programs, and modules.
  • the memory 201 mainly includes a storage program area and a storage data area.
  • the storage program area may store an operating system and at least one application program required by functions, such as a sound playback function and an image playback function.
  • Storage data area can store data created according to the use of mobile devices, such as audio data, image data, phone books, and so on.
  • the mobile device may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
  • the processor 202 is a control center of the mobile device, and uses various interfaces and lines to connect the parts of the entire device. By running or executing the software programs and/or modules stored in the memory 201 and calling the data stored in the memory 201, it performs the various functions of the mobile device and processes data, thereby monitoring the mobile device as a whole.
  • the processor 202 may include one or more processing units.
  • the processor 202 may integrate an application processor (AP) and a digital signal processor (DSP), where the AP mainly handles the operating system, user interface, and application programs.
  • AP Application Processor
  • the sensor component 203 includes one or more sensors for providing various aspects of status assessment for the mobile device.
  • the sensor component 203 may include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the sensor component 203 can detect acceleration / deceleration, orientation, the open / close status of the mobile device, the relative positioning of components, temperature changes in the mobile device, and so on.
  • the sensor component 203 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications, that is, it becomes an integral part of the camera.
  • the multimedia component 204 provides a screen with an output interface between the mobile device and the user.
  • the screen may be a touch panel, and when the screen is a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 204 further includes at least one camera.
  • the multimedia component 204 includes a front camera and / or a rear camera. When the mobile device is in an operation mode, such as a shooting mode or a video mode, the front camera and / or the rear camera can receive external multimedia data.
  • Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the power component 205 is used to provide power for various components of the mobile device.
  • the power component 205 may include a power management system, one or more power sources, and other components associated with the mobile device generating, managing, and distributing power.
  • the input / output interface 206 provides an interface between the processor 202 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a mouse, or a USB (Universal Serial Bus) device.
  • the mobile device may further include an audio component, a communication component, and the like.
  • the audio component includes a microphone
  • the communication component includes a wireless fidelity (WiFi) module, a Bluetooth module, and the like.
  • WiFi wireless fidelity
  • Bluetooth Bluetooth module
  • FIG. 3 is a schematic flowchart of an image feature extraction method according to an embodiment of the present application. The method is applied to a mobile device. Referring to FIG. 3, the method includes the following steps.
  • S301 Obtain an input image, where the input image is a grayscale image.
  • the mobile device may include one or more processors.
  • the processor may be an application processor (AP), and the AP may be configured to execute the image feature extraction method provided by the embodiment of the present application.
  • the multiple processors may include an AP and a digital signal processor (DSP).
  • the DSP may be used to execute the image feature extraction method provided by the embodiment of the present application, and the AP performs the steps in SLAM other than feature extraction, such as local mapping and closed-loop detection.
  • the AP may be an ARM (Advanced Reduced Instruction Set Machine) processor, and the ARM processor may also be simply referred to as ACPU.
  • the input image may be an image obtained by shooting the surrounding environment by a mobile device.
  • the mobile device includes a shooting unit, and the shooting unit may be a camera or a video camera.
  • the input image may be shot by the mobile device through the shooting unit.
  • the input image is a grayscale image.
  • the resolution of the input image may be 640×480.
  • the mobile device may also perform grayscale processing and / or resolution processing on the acquired input image to turn it into an image that meets these conditions.
  • S302 Slicing and down-sampling the input image to obtain an N-layer pyramid image.
  • the N-layer pyramid image is N images with different resolutions.
  • Each layer of the N-layer pyramid image includes multiple slice images.
  • the N-layer pyramid image refers to N images of different resolutions arranged in a pyramid shape, with the resolution decreasing from the bottom of the pyramid to the top.
  • Pyramid images can also be called image pyramids. For example, the image with the highest resolution is placed at the bottom, and a series of images with gradually decreasing resolution is arranged above it in a pyramid shape, until the top of the pyramid holds the single image with the lowest resolution; together these images constitute the image pyramid.
  • N may be a positive integer greater than or equal to 2, for example, N may be 4, 8, or 16, etc. The embodiment of the present application is described by taking N equal to 8 as an example.
  • the resolutions of the 8 images may be 640×480, 512×384, 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120.
  • the input image may be a layer 0 pyramid image, and slicing and down-sampling the input image to obtain an N-layer pyramid image may include: slicing the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and down-sampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slicing the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0 ≤ i ≤ N-2.
  • the layer 0 pyramid image can be the bottom.
  • an image obtained after slicing and downsampling one or more times can be used as the bottom layer, and further slice and downsampling based on the bottom layer to obtain an N-layer pyramid image.
  • the 8-layer pyramid image in this embodiment is merely an example, and the number of layers of the actual pyramid image can be flexibly set.
  • For example, the resolution of the input image is 640×480. The processor slices the input image (that is, the layer 0 pyramid image) to obtain the 8 slice images included in the layer 0 pyramid image, and the resolution of each slice image is 640×60.
  • The processor then performs downsampling processing on each slice image included in the layer 0 pyramid image. Downsampling includes downsampling in the horizontal direction and downsampling in the vertical direction. After the downsampling processing, 8 slice images with a resolution of 512×48 are obtained; together they form the layer 1 pyramid image with a resolution of 512×384.
  • In the same way, the mobile device can obtain the layer 2 to layer 7 pyramid images and the multiple slice images included in each of them. The resolutions of the layer 2 to layer 7 pyramid images can be 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120.
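  • The slice-then-downsample step for the first two layers can be sketched as follows. This is a minimal illustration, not the implementation of the application: the nearest-neighbour filter and the helper names `slice_rows`, `downsample_nearest`, and `next_layer` are assumptions, and the sketch requires the layer height to divide evenly by the slice count (a layer such as 240×180 would need uneven slices or a different slice count, which is not handled here).

```python
def slice_rows(image, num_slices):
    """Split an image (a list of pixel rows) into horizontal slices of equal height."""
    height = len(image)
    assert height % num_slices == 0, "sketch assumes the height divides evenly"
    step = height // num_slices
    return [image[top:top + step] for top in range(0, height, step)]


def downsample_nearest(image, out_w, out_h):
    """Nearest-neighbour resize, horizontally and vertically (assumed filter)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def next_layer(slices, out_w, out_h):
    """Downsample every slice of layer i to get the slices of layer i+1."""
    assert out_h % len(slices) == 0, "sketch assumes the height divides evenly"
    slice_h = out_h // len(slices)
    return [downsample_nearest(s, out_w, slice_h) for s in slices]


# Synthetic 640x480 grayscale input standing in for the camera image.
gray = [[(x + y) % 256 for x in range(640)] for y in range(480)]

layer0 = slice_rows(gray, 8)           # 8 slices of 640x60
layer1 = next_layer(layer0, 512, 384)  # 8 slices of 512x48
```

  • Working slice by slice means only one slice of layer i and one slice of layer (i+1) need to be held at a time, which matches the memory-saving motivation given in the text.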
  • the processor when the processor is a DSP, due to the small capacity of the dynamic random access memory (DRAM) of the DSP, the entire image of each layer of pyramid images may not be placed on the DRAM at one time. Therefore, the DSP can process one slice image in each layer of pyramid at a time.
  • DRAM dynamic random access memory
  • the DSP can access each slice image in a layer of pyramid image through direct memory access (DMA). That is, the DSP reads the data of one slice image of the layer into the DRAM in DMA mode and processes it. After the processing is completed, the processing result is output, and then the next slice image is read for processing, and so on, until the DSP has processed every slice image of the pyramid image in this layer.
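  • The read, process, output cycle described above can be sketched in plain Python. No real DMA is involved: the list copy below only stands in for the transfer into the DSP's local memory, and the names `process_layer_by_slices` and `process_slice` are hypothetical.

```python
def process_layer_by_slices(layer_rows, slice_height, process_slice):
    """Process one pyramid layer slice by slice through a small working buffer.

    Returns the per-slice results and the largest number of rows that were
    resident in the working buffer at any one time.
    """
    results = []
    peak_rows = 0
    for top in range(0, len(layer_rows), slice_height):
        # Stand-in for the DMA read: copy one slice into the local buffer.
        local_buffer = [row[:] for row in layer_rows[top:top + slice_height]]
        peak_rows = max(peak_rows, len(local_buffer))
        # Process the slice and output the result before reading the next one.
        results.append(process_slice(local_buffer, top))
    return results, peak_rows


# Example: a 64x48 layer processed as 8 slices of 6 rows; the per-slice
# "processing" here is just a pixel sum tagged with the slice's top row.
layer = [[(x * y) % 256 for x in range(64)] for y in range(48)]
sums, peak = process_layer_by_slices(
    layer, 6, lambda buf, top: (top, sum(map(sum, buf))))
```

  • The working set never exceeds one slice (`peak` equals the slice height), regardless of how tall the layer is.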
  • the slice image can be processed in two ways. One way is to down-sample the slice image and output it to obtain the layer (i+1) pyramid image; the other way is to perform the following step S303. When the mobile device processes each slice image in the layer (N-1) pyramid image, it does not need to down-sample the slice image.
  • S303 Extract at least one feature point of each slice image in each layer of pyramid images, where each feature point is a corner point of each layer of pyramid images, and each feature point is characterized by a coordinate and a response score value.
  • the mobile device may extract feature points for each layer of pyramid images in order from bottom to top.
  • feature point extraction is performed on each layer of pyramid image
  • each slice image included in each layer of pyramid image may be processed separately.
  • the mobile device can extract all corner points of the slice image.
  • the corner points of an image are extreme points of the image, that is, points that stand out markedly from their neighborhood, and they can be used as the feature points of the image. Corners are very important features in an image: they retain the important features of the image while effectively reducing the amount of data, so the remaining data carries more information.
  • the mobile device may use the FAST method to extract all the corner points of the sliced image, and each corner point may be represented by its coordinates and a response score (Response Score).
  • the coordinates obtained by the FAST method during feature extraction are relative to the current pyramid image.
  • the response score value can be used to mark the robustness of the corner point. When the response score value is larger, it means that the corresponding corner point is more robust.
  • the way the FAST method obtains the coordinates and response score values of the corner points is already described in the related art and is not elaborated on in this embodiment.
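  • For readers unfamiliar with FAST, the following is a minimal sketch of the segment test on a 16-pixel circle of radius 3. The threshold of 20, the arc length of 9, and the particular response score (total contrast beyond the threshold) are assumptions chosen for illustration; the application does not specify them.

```python
# Offsets (dx, dy) of the 16 pixels on a radius-3 circle, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def has_arc(flags, min_len):
    """True if the circular sequence `flags` has min_len consecutive True values."""
    run = 0
    for flag in flags + flags[:min_len - 1]:  # extend to cover wrap-around
        run = run + 1 if flag else 0
        if run >= min_len:
            return True
    return False


def fast_response(image, x, y, threshold=20, arc_len=9):
    """Response score of pixel (x, y), or 0 if it fails the segment test."""
    centre = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > centre + threshold for v in ring]
    darker = [v < centre - threshold for v in ring]
    if not (has_arc(brighter, arc_len) or has_arc(darker, arc_len)):
        return 0
    # Assumed score: summed contrast beyond the threshold; the larger side wins.
    bright_sum = sum(v - centre - threshold for v, b in zip(ring, brighter) if b)
    dark_sum = sum(centre - v - threshold for v, d in zip(ring, darker) if d)
    return max(bright_sum, dark_sum)


def detect_corners(image, threshold=20):
    """Scan pixels at least 3 px from the border; return (x, y, score) tuples."""
    height, width = len(image), len(image[0])
    corners = []
    for y in range(3, height - 3):
        for x in range(3, width - 3):
            score = fast_response(image, x, y, threshold)
            if score > 0:
                corners.append((x, y, score))
    return corners


# A single bright pixel on a dark background passes the test at its own position.
img = [[0] * 15 for _ in range(15)]
img[7][7] = 100
corners = detect_corners(img)
```

  • Each returned tuple carries exactly the two attributes the text names for a feature point: its coordinates and its response score value.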
  • the number of corner points included in a slice image may be large. To reduce it, non-maximum suppression processing may also be performed on the corner points included in each slice image, and the corner points that remain after this processing are used as the feature points of the slice image.
  • the non-maximum suppression processing is performed by using the response score value corresponding to each corner, which may specifically include: for each corner, within the specified area where the corner is located, if the response score value of the corner is the largest among the response score values of all corners included in the specified area, the corner is retained and becomes a feature point; otherwise, the response score value of the corner is set to 0 and the corner is not used as a feature point.
  • the designated area where a corner is located may refer to a designated area in a slice image where the corner is located, and the size of the designated area may be set according to actual requirements, which is not specifically limited in this embodiment of the present application.
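  • The suppression rule above can be sketched as follows. The square window of radius 1 stands in for the "specified area", whose size the application leaves open; dropping both corners on a tie is also an assumption, as is the function name `non_max_suppression`.

```python
def non_max_suppression(corners, radius=1):
    """Keep a corner only if its score is strictly the largest in its window.

    `corners` is a list of (x, y, score) tuples with positive scores; the
    window is the (2*radius + 1) x (2*radius + 1) square centred on the corner.
    """
    score_at = {(x, y): s for x, y, s in corners}
    kept = []
    for x, y, s in corners:
        neighbours = (score_at.get((x + dx, y + dy), 0)
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)
                      if (dx, dy) != (0, 0))
        # Pixels that are not corners count as score 0, as in the text.
        if all(s > n for n in neighbours):
            kept.append((x, y, s))
    return kept


# Three adjacent corners compete; only the strongest survives. The isolated
# corner at (30, 5) has no competitors and is kept.
candidates = [(10, 10, 50), (11, 10, 80), (12, 10, 60), (30, 5, 40)]
feature_points = non_max_suppression(candidates)
```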
  • In practice, S302 and S303 can be completed in one large loop.
  • Taking N equal to 8 as an example, the loop can include 8 sub-loops, each of which corresponds to the feature point extraction of one layer of pyramid image.
  • Specifically, in bottom-to-top order of the 8-layer pyramid image, the layer 0 pyramid image is sliced in the first sub-loop and the feature points of each of its slice images are extracted, giving the feature points of the multiple slice images included in the layer 0 pyramid image; after that, the layer 1 pyramid image (obtained by downsampling the layer 0 pyramid image) is sliced in the second sub-loop and the feature points of each of its slice images are extracted, giving the feature points of the multiple slice images included in the layer 1 pyramid image; and so on, until the feature points of the multiple slice images included in the layer 7 pyramid image are obtained.
  • The feature points of the multiple slice images belonging to the same layer of pyramid image can be called a feature point set, so each layer in the 8-layer pyramid image corresponds to one feature point set, giving 8 feature point sets.
  • The processor can store each feature point set together, for example in a continuous storage space, so that all feature points of the multiple slice images included in a layer of pyramid image can be obtained at one time in S304 described below.
  • S304: Filter the feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a selected preset area.
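  • Because the number of feature points obtained in S303 for each layer of pyramid image is relatively large and exceeds the number actually needed, the processor can filter the feature point set of each layer of pyramid image, which avoids processing unnecessary feature points and reduces the amount of data the mobile device processes subsequently.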
  • For the N-layer pyramid image, the mobile device may filter the feature point set of each layer of pyramid image in order from bottom to top, which may specifically include: acquiring the feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filtering out a specified number of feature points from the feature point set to obtain multiple target feature points.
  • Specifically, for each layer of pyramid image, after obtaining the corresponding feature point set, the mobile device can use an octree to filter out a specified number of feature points from the set, thereby obtaining multiple target feature points. After that, the mobile device determines, from the multiple target feature points, the target feature points belonging to each slice image.
  • Filtering out a specified number of feature points from the feature point set through the octree may specifically proceed as follows.
  • After obtaining the feature point set corresponding to a layer of pyramid image, the mobile device may use the octree to divide that layer into grids (the grids obtained can be called first grids; for example, the layer is divided into a 4×2 grid, that is, eight first grids), and determine from the feature point set the feature point subset belonging to each first grid.
  • For the feature point subset of each first grid, the feature points are sorted by response score value, and the top M feature points with the highest response score values are selected, where M is a positive integer.
  • Similarly, the octree is used to divide each first grid again (the resulting grids can be called second grids), and the feature point subset belonging to each second grid is determined.
  • For the feature point subset of each second grid, the feature points are sorted by response score value, and the top W feature points with the highest response score values are selected, where W is a positive integer.
  • And so on, until one or more of the grids obtained by division contain no feature points, at which point the filtering ends.
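The first level of the grid-based filtering described above can be sketched as follows. This is a simplified illustration that covers only the 4×2 first-grid split and the top-M selection; the recursive subdivision into second grids and the stopping rule are omitted, and the function name and parameters are assumptions.

```python
def filter_by_grid(points, width, height, cols=4, rows=2, top_m=1):
    """Split the image into cols x rows cells and keep the top_m
    highest-scoring points of each cell.

    points: list of (x, y, score); top_m stands in for M in the text.
    """
    cells = {}
    for x, y, score in points:
        cx = min(x * cols // width, cols - 1)   # column index of the cell
        cy = min(y * rows // height, rows - 1)  # row index of the cell
        cells.setdefault((cy, cx), []).append((x, y, score))
    selected = []
    for key in sorted(cells):  # visit cells row by row for a stable order
        ranked = sorted(cells[key], key=lambda p: p[2], reverse=True)
        selected.extend(ranked[:top_m])
    return selected

points = [(5, 5, 10.0), (100, 50, 40.0), (200, 30, 20.0),
          (630, 470, 15.0), (600, 400, 35.0)]
targets = filter_by_grid(points, width=640, height=480)
```

Selecting per cell rather than globally keeps the surviving points spread across the image instead of concentrating them in one high-contrast region.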
  • When the target feature points belonging to the same slice image are determined from the multiple target feature points, every slice image in the layer may contain target feature points, or some slice images may contain target feature points while others contain none.
  • This depends on the actual situation and is not specifically limited in the embodiments of the present application.
  • the target feature points of each slice image occupy continuous storage space.
  • Specifically, the mobile device may apply for a continuous storage space. For each layer of pyramid image, the mobile device stores the target feature points of each slice image in sequence according to the positions of the slice images within that layer, so that the target feature points of each slice image occupy continuous storage space.
  • The mobile device may also record the number of target feature points in each slice image and the starting position of the storage space where they are stored, that is, the address of the storage space holding the first target feature point of that slice image.
  • When the target feature points of a slice image are needed later, the mobile device may read them from the storage space according to that number and starting position.
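The storage layout described above can be sketched with a flat list plus a per-slice index of (start, count). The names `pack_by_slice` and `read_slice` are hypothetical, and a Python list stands in for the continuous memory a DSP would use.

```python
def pack_by_slice(targets, slice_height, num_slices):
    """Store target points slice by slice in one flat list and record,
    for each slice, the pair (start index, count)."""
    buckets = [[] for _ in range(num_slices)]
    for x, y, score in targets:
        idx = min(y // slice_height, num_slices - 1)  # slice the point falls in
        buckets[idx].append((x, y, score))
    storage, index = [], []
    for bucket in buckets:
        index.append((len(storage), len(bucket)))
        storage.extend(bucket)
    return storage, index

def read_slice(storage, index, slice_id):
    """Read back the points of one slice using its start and count."""
    start, count = index[slice_id]
    return storage[start:start + count]

targets = [(100, 50, 40.0), (200, 130, 20.0), (600, 400, 35.0), (30, 10, 12.0)]
storage, index = pack_by_slice(targets, slice_height=60, num_slices=8)
```

A slice with no target feature points simply gets a count of 0, which covers the case mentioned in the text where some slice images contain no target feature points.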
  • the above S304 can also be processed in a large loop.
  • The loop can also include 8 sub-loops, each of which is used to filter the feature point set corresponding to one layer of pyramid image. That is, in bottom-to-top order of the 8-layer pyramid image, the feature point sets corresponding to the layer 0 pyramid image through the layer 7 pyramid image are filtered in the first through eighth sub-loops, respectively.
  • S305: Determine the description information of the target feature points according to the slice image in each layer of the pyramid image and the target feature points of the slice image.
  • the description information includes feature point directions and descriptors.
  • the target feature point needs to have good rotation invariance, and the feature point obtained through the above S303 does not include the feature point direction, so the feature point direction needs to be determined to rotate the neighborhood of the target feature point according to the feature point direction.
  • The feature point direction is determined as follows: with the feature point as the center and a given radius, a circular area is constructed; the gray centroid (gray center of gravity) of the circular area is calculated; and the vector from the feature point to the gray centroid is determined.
  • The direction of this vector is used as the feature point direction.
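The direction calculation can be illustrated with a small sketch based on first-order image moments over the circular area. The function name and the tiny test patch are assumptions made for illustration.

```python
import math

def orientation(image, cx, cy, radius):
    """Angle (in radians) of the vector from (cx, cy) to the gray centroid
    of the circular patch with the given radius."""
    m10 = m01 = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # skip pixels outside the circular area
            value = image[cy + dy][cx + dx]
            m10 += dx * value  # first-order moment along x
            m01 += dy * value  # first-order moment along y
    return math.atan2(m01, m10)

# 7x7 patch that is bright only on its right half: the centroid lies to the
# right of the centre, so the direction points along the positive x axis.
image = [[255 if x > 3 else 0 for x in range(7)] for _ in range(7)]
angle = orientation(image, cx=3, cy=3, radius=3)
```

Only the ratio of the two moments matters for the angle, so the total brightness of the patch does not need to be normalized.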
  • The descriptor is the key information that distinguishes each feature point from other feature points, and it describes the image area surrounding the feature point. By searching in the neighborhood of each target feature point, a unique descriptor can be established for each target feature point.
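As a rough illustration of a binary descriptor in the style of BRIEF, the sketch below compares pairs of pixels around a feature point and packs one bit per comparison. The sampling pairs are hypothetical and far fewer than a real descriptor uses, and the rotation of the pairs by the feature point direction is omitted.

```python
def brief_descriptor(image, x, y, pairs):
    """Pack one bit per point pair: 1 if the first sample is darker."""
    bits = 0
    for i, ((ax, ay), (bx, by)) in enumerate(pairs):
        if image[y + ay][x + ax] < image[y + by][x + bx]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Hypothetical sampling pattern of four pairs, given as (dx, dy) offsets.
pairs = [((-2, 0), (2, 0)), ((2, 0), (-2, 0)), ((0, -2), (0, 2)), ((1, 1), (-1, -1))]

rising = [[x for x in range(9)] for _ in range(9)]        # brighter to the right
falling = [[8 - x for x in range(9)] for _ in range(9)]   # brighter to the left
d_rising = brief_descriptor(rising, 4, 4, pairs)
d_falling = brief_descriptor(falling, 4, 4, pairs)
distance = hamming(d_rising, d_falling)
```

Descriptors of this kind are compared by Hamming distance, so matching reduces to an XOR and a bit count.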
  • For the N-layer pyramid image, the mobile device may process the slice images in each layer and their target feature points in order from bottom to top, which may specifically include: performing Gaussian blur processing on the slice image in each layer of pyramid image, and determining the feature point direction and descriptor according to the processed slice image and its target feature points, to obtain the description information of the target feature points of the slice image.
  • the target feature points of the slice image in each layer of the pyramid image may be obtained from the storage space by the number of the target feature points of the slice image and its starting position in the storage space.
  • slice processing may be performed on the layer of the pyramid image to obtain the slice image.
  • S305 can also be completed in one large loop that includes 8 sub-loops, each of which obtains the description information of the target feature points of the multiple slice images in one layer of pyramid image. That is, the 8-layer pyramid image is processed from bottom to top: in the first sub-loop the layer 0 pyramid image is sliced and, for each slice image in turn, the description information of its target feature points is determined according to the slice image and its target feature points, until the description information of the target feature points of all slice images in the layer 0 pyramid image is determined; then, in the second sub-loop, the layer 1 pyramid image is sliced and processed in the same way, until the description information of the target feature points of all slice images in the layer 1 pyramid image is determined; and so on, until the description information of the target feature points of all slice images in the layer 7 pyramid image is determined.
  • The device includes an ACPU and a DSP, and the feature extraction method provided in the embodiments of the present application is described taking execution by the DSP as an example.
  • the hardware abstraction layer (HAL) is the driving framework of the DSP.
  • The ACPU of the device invokes the DSP by calling related Application Program Interface (API) functions to power the DSP on, power it off, run algorithms, and so on.
  • The HAL is also the data exchange path between the ACPU's double data rate synchronous dynamic random access memory (DDR) and the DSP's DRAM.
  • the input image data collected by the device through the camera unit is stored on DDR.
  • the DSP transfers the data of the input image to the DSP's DRAM for processing by calling related API functions.
  • The method provided by the embodiments of the present application may include three parts in total, corresponding to the three intra-frame pyramid level loops in FIG. 4.
  • The first loop is the pyramid image acquisition and feature point extraction part, the second loop is the feature point filtering part, and the third loop is the description information calculation part.
  • The first loop includes the acquisition of the 8-layer pyramid image (slicing and downsampling), as well as feature point extraction (for example, corner extraction using the FAST algorithm) and non-maximum suppression on the slices of each layer of pyramid image.
  • Specifically, it may include the following steps: 1. Slice the input image and transfer a slice image to the DRAM through DMA, for example a slice image with a resolution of 640×60 (an input image with a resolution of 640×480 is divided into 8 slice images of 640×60); 2. Downsample each slice image in turn, in both the horizontal and vertical directions, which generates a lower-resolution image, for example a 533x80 image; 3. Extract feature points, that is, perform corner detection on the downsampled slice image and calculate a response score value for each corner to generate an image score plane; 4. Perform non-maximum suppression on the image score plane: if a score value is not the maximum within its local area, it is set to 0 (that is, the point is suppressed), and if it is the maximum, it is retained; 5. Repeat steps 1-4 until all slice images of this layer of pyramid image are processed, and then move on to the next layer of pyramid image.
  • the second loop includes filtering the feature point set of each layer of pyramid images in the 8-layer pyramid image.
  • the feature points of each layer of the pyramid image are filtered through an octree to obtain multiple target feature points of each layer of the pyramid image.
  • the target feature points belonging to each slice image among the plurality of target feature points are stored using continuous memory to facilitate DSP management and subsequent use.
  • the DSP may determine the storage location of the target feature points of each slice image according to the positions of the multiple slice images included in each layer of the pyramid image in advance, and store the target feature points belonging to the same slice image using continuous memory, and The number of target feature points of each slice image and its stored starting position can be recorded.
  • Specifically, it may include the following steps: 1. Input all the feature points belonging to the same layer of pyramid image generated in the first loop into the octree for filtering until the specified number of target feature points is reached; 2. Apply for continuous memory and, according to the position of each slice image in its pyramid image, store the target feature points belonging to the same slice image among the filtered target feature points; 3. Record the number of target feature points contained in each slice image in the 8-layer pyramid image and the starting position of their storage.
  • The third loop includes performing Gaussian blur processing on the slice images included in each layer of the 8-layer pyramid image, and calculating the feature point direction information and descriptor of each target feature point based on the processed slice image and the target feature points it contains.
  • Specifically, it may include the following steps: 1. Slice the input image and transfer a slice image to the DRAM, for example a slice image with a resolution of 640×60; 2. Perform Gaussian blur processing on each slice image; 3. Calculate the feature point direction information and descriptor of each target feature point; 4. Repeat steps 1-3 until all slice images of this layer of pyramid image are processed, and then move on to the next layer of pyramid image.
  • each layer of the pyramid image includes N slice images
  • The processing of the 8-layer pyramid image in each part is implemented by the intra-frame pyramid level loop (for level0-level7), and the processing of the multiple slice images included in each layer of pyramid image is implemented by the intra-frame slice loop (for slice0-sliceN).
  • the data or information flow in the feature extraction process for each slice image.
  • the resolutions of the images in the 8-layer pyramid image are 640×480, 512×384, 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120.
  • the feature point set may refer to all feature points of each layer of the pyramid image stored in the DDR;
  • the layer feature points refer to the feature points of a layer of the pyramid image;
  • categorization refers to grouping the target feature points that belong to the same slice image;
  • the target feature point set may refer to the target feature points of all slice images in each layer of the pyramid image, stored in continuous storage space;
  • the description information set may refer to the description information of the target feature points in each layer of the pyramid image, stored in the DDR.
  • In the embodiments of the present application, an N-layer pyramid image is obtained by slicing and downsampling the input image, each layer of pyramid image includes multiple slice images, the feature points of one slice image are extracted at a time, all feature points of each layer of pyramid image are filtered, and the description information of the target feature points is then determined according to the slice image and its target feature points. Slicing reduces the memory space required during processing, and filtering reduces the amount of data processed, thereby increasing the data processing rate and reducing power consumption.
  • In addition, by using a DSP for feature extraction, the data processing rate of SLAM on mobile devices can be further improved.
  • the mobile device includes a hardware structure and / or a software module corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. A professional technician can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
  • the image feature extraction device may be divided into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 6 shows a possible structural diagram of an image feature extraction device involved in the foregoing embodiment.
  • The image feature extraction device includes an obtaining unit 601 and a processing unit 602.
  • The obtaining unit 601 is used to support the image feature extraction device in performing S301 in the foregoing method embodiment;
  • the processing unit 602 is used to support the image feature extraction device in performing S302 to S305 in the foregoing method embodiment, and/or other processes of the techniques described herein.
  • The image feature extraction device further includes an output unit 603; the output unit 603 is configured to output a slice image, the feature points of the slice image, and/or the target feature points of the slice image, and the like.
  • the apparatus can be implemented as software and stored in a storage medium.
  • The foregoing describes the image feature extraction device in the embodiments of the present application from the perspective of modular functional entities; the following describes it from the perspective of hardware processing.
  • the structure of the device may be as shown in FIG. 7.
  • the image feature extraction device includes a memory 701, a processor 702, a communication interface 703, and a bus 704.
  • The communication interface 703 may include an input interface 7031 and an output interface 7032. Accordingly, when the image feature extraction device is a mobile phone, the memory 701 may be the memory 201 in FIG. 2, the processor 702 may be the processor 202 in FIG. 2 (for example, the processor 702 is specifically a DSP), and the communication interface 703 may be the input\output interface 206 in FIG. 2.
  • The input interface 7031 can be used to obtain an input image, and the input image is a grayscale image.
  • In some feasible embodiments, the input interface can obtain the input image by time-division multiplexing; in some feasible embodiments, there can be only one input interface, or there can be multiple input interfaces.
  • Processor 702: configured to perform S302-S305 of the image feature extraction method described above.
  • the processor may be a uniprocessor structure, a multiprocessor structure, a single-threaded processor, a multi-threaded processor, and the like; in some feasible embodiments, the processor may be a central processing unit , General purpose processor, digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, transistor logic device, hardware component or any combination thereof. It may implement or execute various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • Output interface 7032: this output interface is used to output the processing result of the image feature extraction method described above.
  • In some feasible embodiments, the processing result may be output directly by the processor, or may first be stored in the memory and then output through the memory; in some feasible embodiments, there may be only one output interface, or there may be multiple output interfaces.
  • In some feasible embodiments, the processing result output by the output interface may be sent to a memory for storage, sent to another processing flow for further processing, sent to a display device for display, sent to a player terminal for playback, and so on.
  • the memory may store the above-mentioned input image, and related instructions for configuring a processor, and the like.
  • The memory may be a floppy disk; a hard disk, such as a built-in hard disk or a removable hard disk; a magnetic disk; an optical disc; a magneto-optical disc, such as CD_ROM or DCD_ROM; a non-volatile storage device, such as RAM, ROM, PROM, EPROM, EEPROM, or flash memory; or any other form of storage medium known in the technical field.
  • Bus 704: the bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 7, but it does not mean that there is only one bus or one type of bus.
  • Each component of the image feature extraction device provided in the embodiments of the present application is used to implement the functions of the corresponding steps of the foregoing image feature extraction method. Because each step has been described in detail in the foregoing method embodiment, details are not repeated here.
  • An embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium stores instructions.
  • When the computer-readable storage medium runs on a device (for example, the device may be a single-chip microcomputer, a chip, a computer, or the like), the device is caused to perform one or more steps in S301-S305 of the image feature extraction method described above.
  • each component module of the image feature extraction device is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer-readable storage medium.
  • the embodiments of the present application further provide a computer program product containing instructions.
  • Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor therein to execute all or part of the steps of the methods in the embodiments of the present application.

Abstract

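An image feature extraction method and apparatus, relating to the field of image processing technologies and used to speed up image feature extraction. The method includes: obtaining an input image, where the input image is a grayscale image (S301); slicing and downsampling the input image to obtain an N-layer pyramid image, where each layer of pyramid image in the N-layer pyramid image includes multiple slice images (S302); extracting at least one feature point of each slice image in each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image (S303); and filtering the feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a selected preset area (S304).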

Description

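Image feature extraction method and apparatus
Technical Field
Embodiments of the present application relate to the field of image processing technologies, and in particular, to an image feature extraction method and apparatus.
Background
Simultaneous Localization and Mapping (SLAM) is a common technology in Augmented Reality (AR). SLAM integrates mobile device localization with environment map creation: during movement, the mobile device builds an incremental environment map based on its own pose estimation and its sensors' perception of the environment, and at the same time uses this map for autonomous localization and navigation.
The SLAM system framework mainly includes three threads: Tracking, Local Mapping, and Loop Closing. Tracking mainly extracts Oriented FAST and Rotated BRIEF (ORB) features from the image, local mapping mainly completes local map construction, and loop closing mainly includes loop detection and loop correction. Tracking is an important processing thread in SLAM, and feature extraction is completed in this thread. The feature extraction process mainly includes the following steps: 1. build an 8-layer pyramid image; 2. use the FAST algorithm to generate the feature points of the 8-layer pyramid image; 3. calculate the descriptor information of the feature points according to the 8-layer pyramid image and its feature points.
At present, when feature extraction is performed on a general-purpose computer system according to the above steps, the steps are executed one after another, and each step needs to process every layer of the 8-layer pyramid image. As a result, the existing feature extraction process involves a large amount of computation, takes a long time, and consumes a lot of power. With the development of technology, embedded devices based on embedded systems are used more and more widely. Compared with a general-purpose computer system, an embedded system has a lower processing rate and less memory, and takes longer to compute at the same precision. Therefore, how to speed up feature extraction is a problem to be solved.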
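Summary
Embodiments of the present application provide an image feature extraction method and apparatus for increasing the speed of feature extraction.
To achieve the foregoing objective, the embodiments of the present application adopt the following technical solutions:
According to a first aspect, an image feature extraction method is provided. The method includes: obtaining an input image, where the input image is a grayscale image; slicing and downsampling the input image to obtain an N-layer pyramid image, where the N-layer pyramid image is N images with different resolutions, each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extracting at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is represented by coordinates and a response score value; and filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a preset area of that layer of pyramid image.
In the foregoing technical solution, the input image is sliced so that feature extraction is performed on each slice image, and the feature points of the multiple slice images of each layer of pyramid image are filtered to reduce the amount of data processed subsequently, thereby increasing the speed of feature extraction.
In a possible implementation of the first aspect, the feature points of each slice image are the corner points that remain after non-maximum suppression. In this possible implementation, the amount of feature point data processed subsequently can be reduced, thereby increasing the speed of feature extraction.
In a possible implementation of the first aspect, the input image is the layer 0 pyramid image, and slicing and downsampling the input image to obtain the N-layer pyramid image includes: slicing the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slicing the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0≤i≤N-2. In this possible implementation, the provided method for obtaining the N-layer pyramid image can reduce the memory space the mobile device requires during processing.
In a possible implementation of the first aspect, filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image includes: obtaining the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filtering out a specified number of feature points from the feature point set by using an octree filtering manner to obtain multiple target feature points. In this possible implementation, filtering the multiple feature points of the multiple slice images of each layer of pyramid image reduces the amount of data processed subsequently, thereby improving data processing efficiency and reducing power consumption.
In a possible implementation of the first aspect, the method further includes: determining the description information of each target feature point, where the description information includes a feature point direction and a descriptor.
In a possible implementation of the first aspect, determining the description information of the target feature points includes: performing Gaussian blur processing on the slice image in each layer of pyramid image; and determining the feature point direction and descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature points. In this possible implementation, the mobile device can determine the description information of the target feature points while reducing the memory space required during processing.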
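According to a second aspect, an image feature extraction apparatus is provided. The apparatus includes: an obtaining unit, configured to obtain an input image, where the input image is a grayscale image; and a processing unit, configured to: slice and downsample the input image to obtain an N-layer pyramid image, where each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extract at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is represented by coordinates and a response score value; and filter the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a preset area of that layer of pyramid image.
In a possible implementation of the second aspect, the processing unit is a digital signal processor (DSP).
In a possible implementation of the second aspect, the feature points of each slice image are the corner points that remain after non-maximum suppression.
In a possible implementation of the second aspect, the input image is the layer 0 pyramid image, and the processing unit is further configured to: slice the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsample the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slice the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0≤i≤N-2.
In a possible implementation of the second aspect, the processing unit is further configured to: obtain the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filter out a specified number of feature points from the feature point set by using an octree filtering manner to obtain multiple target feature points.
In a possible implementation of the second aspect, the processing unit is further configured to: determine the description information of each target feature point, where the description information includes a feature point direction and a descriptor.
In a possible implementation of the second aspect, the processing unit is further configured to: perform Gaussian blur processing on the slice image in each layer of pyramid image; and determine the feature point direction and descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature points.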
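According to a third aspect, an image feature extraction apparatus is provided. The apparatus includes: an input interface, configured to obtain an input image, where the input image is a grayscale image; and a processor, configured to perform the following operations: slicing and downsampling the input image to obtain an N-layer pyramid image, where the N-layer pyramid image is N images with different resolutions, each layer of pyramid image in the N-layer pyramid image includes multiple slice images, and N is a positive integer greater than or equal to 2; extracting at least one feature point of each slice image in each layer of pyramid image to obtain multiple feature points of the multiple slice images of each layer of pyramid image, where each feature point is a corner point of that layer of pyramid image and is represented by coordinates and a response score value; and filtering the multiple feature points of the multiple slice images of each layer of pyramid image to obtain multiple target feature points of each layer of pyramid image, where each target feature point is the feature point with the largest response score value in a preset area of that layer of pyramid image.
In a possible implementation of the third aspect, the processor is a digital signal processor (DSP).
In a possible implementation of the third aspect, during the processing of each slice image in each layer of pyramid image, the DSP accesses each slice image through direct memory access (DMA).
In a possible implementation of the third aspect, the target feature points of each slice image occupy continuous storage space.
In a possible implementation of the third aspect, the input image is the layer 0 pyramid image, and the processor further performs the following operations: slicing the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slicing the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0≤i≤N-2.
In a possible implementation of the third aspect, the processor further performs the following operations: obtaining the multiple feature points of the multiple slice images of each layer of pyramid image to obtain a feature point set; and filtering out a specified number of feature points from the feature point set by using an octree filtering manner to obtain multiple target feature points.
In a possible implementation of the third aspect, the processor further performs the following operation: determining the description information of each target feature point, where the description information includes a feature point direction and a descriptor.
In a possible implementation of the third aspect, the processor further performs the following operations: performing Gaussian blur processing on the slice image in each layer of pyramid image; and determining the feature point direction and descriptor of the target feature points according to the processed slice image and the target feature points of the slice image.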
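According to a fourth aspect, a readable storage medium is provided. The readable storage medium stores instructions, and when the readable storage medium runs on a device, the device is caused to perform the image feature extraction method provided in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is caused to perform the image feature extraction method provided in the first aspect or any possible implementation of the first aspect.
It can be understood that any apparatus, computer storage medium, or computer program product provided above for the image feature extraction method is used to perform the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding method provided above; details are not repeated here.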
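Brief Description of Drawings
FIG. 1 is a schematic overview of a SLAM system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a mobile device according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image feature extraction method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image feature extraction method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another image feature extraction method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image feature extraction apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another image feature extraction apparatus according to an embodiment of the present application.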
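Detailed Description
Before the embodiments of the present application are described, the technical terms involved in the present application are first explained.
Localization means that a mobile device needs to estimate its position relative to other objects in the environment; the environment in which the mobile device is located can be given by a map. A mobile device may also be called a mobile apparatus.
Mapping means that a mobile device needs to construct the distribution of the objects it detects in the environment, in which case the position of the mobile device is known. The map reflects the distribution of objects in the environment where the mobile device is located.
Simultaneous Localization and Mapping (SLAM) means that a mobile device uses noisy sensor data both to construct the distribution of objects in the environment and to determine its own position. SLAM can be used to solve the problem of localization, navigation, and map construction when a mobile device operates in an unknown environment.
BRIEF is short for Binary Robust Independent Elementary Features. FAST is short for Features from Accelerated Segment Test.
ORB is short for Oriented FAST and Rotated BRIEF (oFAST and rBRIEF). An ORB feature is a feature detected and extracted with the FAST method; such a feature may also be called a feature point of the image, and the feature point is represented by a series of values, which may include, for example, the coordinates and response score value of the feature point. A FAST feature point has no orientation by itself, so the ORB feature needs to add a calculation of the feature point direction. In addition, ORB uses the BRIEF method to calculate the descriptor of a feature point; the advantage of the BRIEF method is its speed. Hereinafter, the feature point direction and the descriptor may be referred to as the description information of the feature point.
The ORB-SLAM system is a feature-point-based real-time monocular SLAM system with detection and relocalization, and can run in large-scale, small-scale, indoor, and outdoor environments. FIG. 1 is an overview of the ORB-SLAM system. The SLAM system uses ORB for feature extraction and usually includes three threads: Tracking, Local Mapping, and Loop Closing.
Tracking means localizing the mobile device through each frame of image (for example, each frame is captured by the camera on the mobile device) and deciding whether to insert a key frame; a key frame may be the frame in which a key action in the movement or change of a character or object occurs. Tracking may specifically include: extracting ORB features from the environment image, performing pose estimation according to the ORB features of the previous frame or initializing the pose through global relocalization, then tracking the reconstructed local map, optimizing the pose, and determining new key frames according to certain rules. Local mapping means processing new key frames and completing reconstruction using Local Bundle Adjustment (Local BA), and may specifically include: inserting key frames, verifying and culling recently generated map points, generating new map points, applying Local BA, and culling the inserted key frames to remove redundant ones. A map point here may refer to a physical point in the map and may include depth and absolute coordinate information; map points may be three-dimensional connections established between the feature points of multiple frames. Loop closing means performing loop detection on every newly added key frame and optimizing with global BA, and may specifically include loop detection and loop correction: loop detection can use the Sim3 algorithm to compute a similarity transformation, and loop correction mainly consists of loop fusion and graph optimization of the Essential Graph. In addition, the Covisibility Graph and the Essential Graph are different forms of graph: the covisibility graph lets tracking and mapping focus on the local covisible area, and the essential graph can be used to optimize poses to achieve loop closing. The ORB-SLAM system may also include a bag of words for place recognition, which can be understood as follows: features are extracted from a large number of images, words are generated by clustering, and the bag of words is generated after the words are stored in a tree structure. In actual use, by looking up the feature information of an image in the bag of words, it is easy to determine whether two images are similar, or to find the words that can represent the image, which speeds up image retrieval.
The embodiments of the present application mainly concern image feature extraction, which is completed in the tracking thread. The embodiments are used to speed up feature extraction, reduce the time it takes, and reduce the power consumption of the device, thereby improving user experience.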
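FIG. 2 is a schematic structural diagram of a mobile device according to an embodiment of the present application. The mobile device may be a mobile phone, a tablet computer, a video camera, a camera, a wearable device, an in-vehicle device, a terminal device, or the like. For ease of description, the devices mentioned above are collectively referred to as mobile devices in the present application. The embodiments of the present application are described by taking a mobile phone as an example of the mobile device. The mobile phone includes a memory 201, a processor 202, a sensor component 203, a multimedia component 204, a power supply component 205, and an input\output interface 206.
The components of the mobile device are described in detail below with reference to FIG. 2:
The memory 201 can be used to store data, software programs, and modules. It mainly includes a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function, such as a sound playback function or an image playback function, and the data storage area can store data created through use of the mobile device, such as audio data, image data, and a phone book. In addition, the mobile device may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 202 is the control center of the mobile device. It connects all parts of the device through various interfaces and lines, and performs the various functions of the mobile device and processes data by running or executing the software programs and/or modules stored in the memory 201 and invoking the data stored in the memory 201, thereby monitoring the mobile device as a whole. Optionally, the processor 202 may include one or more processing units. For example, the processor 202 may integrate an Application Processor (AP) and a Digital Signal Processor (DSP), where the AP mainly handles the operating system, user interface, application programs, and so on, and the DSP can be used for image slicing and related operations in the feature extraction process. It can be understood that the DSP may also not be integrated into the processor 202.
The sensor component 203 includes one or more sensors used to provide status assessments of various aspects of the mobile device. The sensor component 203 may include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor; through the sensor component 203, the acceleration/deceleration, orientation, and open/closed state of the mobile device, the relative positioning of components, or temperature changes of the mobile device can be detected. In addition, the sensor component 203 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications, that is, as part of a camera.
The multimedia component 204 includes a screen that provides an output interface between the mobile device and the user. The screen may be a touch panel, in which case it can be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In addition, the multimedia component 204 includes at least one camera; for example, it includes a front camera and/or a rear camera. When the mobile device is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The power supply component 205 is used to supply power to the components of the mobile device. It may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the mobile device. The input\output interface 206 provides an interface between the processor 202 and peripheral interface modules; for example, a peripheral interface module may be a keyboard, a mouse, or a USB (Universal Serial Bus) device.
Although not shown, the mobile device may also include an audio component, a communication component, and so on. For example, the audio component includes a microphone, and the communication component includes a Wireless Fidelity (WiFi) module, a Bluetooth module, and the like; details are not described here. A person skilled in the art can understand that the mobile phone structure shown in FIG. 2 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.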
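FIG. 3 is a schematic flowchart of an image feature extraction method according to an embodiment of the present application. The method is applied to a mobile device. Referring to FIG. 3, the method includes the following steps.
S301: Obtain an input image, where the input image is a grayscale image.
The mobile device may include one or more processors. When the mobile device includes one processor, the processor may be an application processor (AP), and the AP may be used to execute the image feature extraction method provided in the embodiments of the present application. When the mobile device includes multiple processors, these may include an AP and a digital signal processor (DSP); the DSP may be used to execute the image feature extraction method provided in the embodiments of the present application, and the AP executes the steps of SLAM other than feature extraction, for example local mapping and loop closing. Optionally, the AP may be an ARM (Advanced RISC Machine) processor, which may also be referred to as an ACPU.
In addition, the input image may be an image obtained by the mobile device photographing its surroundings. For example, the mobile device includes a photographing unit, which may be a camera, and the input image may be an image captured by the mobile device through the photographing unit. The input image is a grayscale image, and its resolution may be, for example, 640×480. When the input image obtained by the mobile device is not a grayscale image, or its resolution is not 640×480, the mobile device may also perform grayscale processing and/or resolution processing on the obtained input image so that it meets these conditions.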
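S302: Slice and downsample the input image to obtain an N-layer pyramid image, where the N-layer pyramid image is N images with different resolutions, and each layer of pyramid image in the N-layer pyramid image includes multiple slice images.
The N-layer pyramid image refers to N images of different resolutions arranged in a pyramid shape, with the resolution decreasing from the bottom of the pyramid to the top. A pyramid image may also be called an image pyramid: the image with the highest resolution is placed at the bottom, and a series of images with gradually decreasing resolution is arranged above it in a pyramid shape, until the top of the pyramid contains only the image with the lowest resolution. N may be a positive integer greater than or equal to 2, for example 4, 8, or 16. The embodiments of the present application are described by taking N equal to 8 as an example.
Optionally, in the 8-layer pyramid image in the embodiments of the present application, the resolutions of the 8 images may be 640×480, 512×384, 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120, respectively.
In addition, the input image may be the layer 0 pyramid image, and slicing and downsampling the input image to obtain the N-layer pyramid image may include: slicing the layer 0 pyramid image to obtain multiple slice images of the layer 0 pyramid image; and downsampling the multiple slice images in the layer i pyramid image to obtain the layer (i+1) pyramid image, and slicing the layer (i+1) pyramid image to obtain multiple slice images of the layer (i+1) pyramid image, where 0≤i≤N-2. The layer 0 pyramid image may be the bottom layer. Alternatively, an image obtained after one or more rounds of slicing and downsampling may serve as the bottom layer, and the N-layer pyramid image is obtained by further slicing and downsampling on that basis. Note that the 8-layer pyramid image in this embodiment is only an example, and the actual number of pyramid layers can be set flexibly.
For example, the resolution of the input image is 640×480. After obtaining the input image, the mobile device slices it, that is, slices the layer 0 pyramid image, to obtain the 8 slice images included in the layer 0 pyramid image, each with a resolution of 640×60. The processor downsamples each slice image included in the layer 0 pyramid image, where downsampling includes downsampling in the horizontal direction and in the vertical direction. After downsampling, 8 slice images with a resolution of 512×48 are obtained, which gives the layer 1 pyramid image with a resolution of 512×384; the layer 1 pyramid image is then sliced to obtain the 8 slice images it includes. By analogy, the mobile device can obtain the layer 2 pyramid image through the layer 7 pyramid image and the multiple slice images included in each of them; the resolutions of the layer 2 through layer 7 pyramid images may be 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120, respectively.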
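The slicing and per-slice downsampling in this example can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, images are plain lists of rows, and nearest-neighbour sampling is an assumed filter, since the application does not name one.

```python
def slice_rows(image, num_slices):
    """Split an image (a list of rows) into num_slices horizontal strips."""
    strip_h = len(image) // num_slices
    return [image[i * strip_h:(i + 1) * strip_h] for i in range(num_slices)]

def downsample(strip, out_w, out_h):
    """Nearest-neighbour resize in both the horizontal and vertical directions."""
    in_h, in_w = len(strip), len(strip[0])
    return [[strip[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

# Layer 0: a synthetic 640x480 grayscale image, cut into eight 640x60 slices.
level0 = [[(x + y) % 256 for x in range(640)] for y in range(480)]
slices0 = slice_rows(level0, 8)

# Each slice is downsampled to 512x48; stacking the results gives layer 1.
slices1 = [downsample(s, 512, 48) for s in slices0]
level1 = [row for s in slices1 for row in s]
```

Because each strip is processed independently, only one 640×60 strip and its 512×48 result need to be resident at a time, which is the memory saving the text attributes to slicing.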
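Optionally, when the processor is a DSP, because the capacity of the DSP's Dynamic Random Access Memory (DRAM) is small, it may not be possible to place the entire image of a layer of pyramid image into the DRAM at once for processing. Therefore, the DSP can process one slice image of a layer of pyramid image at a time.
Specifically, for any layer of pyramid image, the DSP can access each slice image in that layer through Direct Memory Access (DMA): the DSP reads the data of one slice image in that layer into the DRAM through DMA, processes it, and outputs the processing result after processing is complete; it then reads the next slice image for processing, and so on, until the DSP has processed every slice image in that layer of pyramid image.
In practice, when the mobile device obtains each slice image of the layer i pyramid image, it can process the slice image along two paths: one path downsamples the slice image and outputs the result to obtain the layer (i+1) pyramid image, and the other path executes step S303 described below. When the mobile device processes each slice image in the layer (N-1) pyramid image, it no longer needs to downsample the slice image.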
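S303: Extract at least one feature point from each slice image in each layer of pyramid image. Each feature point is a corner of that layer of pyramid image and is represented by coordinates and a response score.
For the N layers of pyramid images, the mobile device may extract feature points layer by layer from the bottom up. When extracting feature points from a layer, it may process each slice image of that layer separately.
For each slice image in each layer of pyramid image, the mobile device may extract all corners of the slice image. A corner of an image is an extreme-value point of the image and can serve as a feature point of the image. Corners are important image features: they preserve the important characteristics of the image content while greatly reducing the amount of data, so their information content is high. Specifically, the mobile device may use the FAST method to extract all corners of the slice image. Each corner can be represented by its coordinates and a response score, both of which are obtained during feature extraction with the FAST method. The coordinates are relative to the current pyramid image. The response score indicates how robust the corner is: the larger the response score, the more robust the corresponding corner. How the FAST method obtains the coordinates and the response score of a corner is described in the prior art and is not elaborated in this embodiment.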
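Because the text defers the details of FAST to the prior art, the following is only a simplified sketch of a FAST-style segment test. It assumes the commonly used 16-pixel circle of radius 3 and a contiguous arc of 9 pixels; the threshold `t`, the arc length, and the score definition (sum of absolute differences above the threshold) are illustrative assumptions, and the function names are hypothetical.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, in ring order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2),
          (-1, -3)]


def longest_run(flags):
    """Length of the longest circular run of True values."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def fast_score(img, x, y, t=20, arc=9):
    """Return a response score for pixel (x, y), or 0 if it is not a corner."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > p + t for v in ring]
    darker = [v < p - t for v in ring]
    if max(longest_run(brighter), longest_run(darker)) < arc:
        return 0
    # Assumed score: total contrast of the ring pixels beyond the threshold.
    return sum(abs(v - p) - t for v in ring if abs(v - p) > t)


def detect_corners(img, t=20):
    """Return (x, y, score) for every corner, skipping a 3-pixel border."""
    h, w = len(img), len(img[0])
    corners = []
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            score = fast_score(img, x, y, t)
            if score > 0:
                corners.append((x, y, score))
    return corners


if __name__ == "__main__":
    img = [[0] * 9 for _ in range(9)]
    img[4][4] = 100  # a single bright pixel on a dark background
    print(detect_corners(img))
```

The coordinates returned here are relative to the image passed in; for a slice, an offset equal to the slice's first row would have to be added to obtain coordinates in the pyramid image.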
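Further, a slice image may contain a large number of corners. To reduce the amount of data in subsequent processing, non-maximum suppression may be performed on the corners of each slice image, and the corners remaining after suppression are used as the feature points of the slice image. Non-maximum suppression is performed according to the response score of each corner and may specifically include the following: for each corner, within the specified region in which the corner is located, if the response score of the corner is the largest among the response scores of all corners in that specified region, the corner is kept and becomes a feature point; if the response score of the corner is not the largest among the response scores of all corners in that specified region, the response score of the corner is set to 0 and the corner is not used as a feature point.
It should be noted that the specified region in which a corner is located may be a specified region within the slice image that contains the corner. The size of the specified region may be set according to actual requirements and is not specifically limited in the embodiments of this application.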
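The suppression rule above can be sketched directly on a list of corners. In this minimal sketch the specified region is assumed to be a square window of half-width `radius` centred on the corner; the window shape, its size, and the name `suppress_non_maxima` are illustrative assumptions.

```python
def suppress_non_maxima(corners, radius=3):
    """Keep only corners whose score is the largest in their window.

    corners is a list of (x, y, score) tuples. A corner is kept when no
    other corner inside its (2*radius+1) square window has a higher score.
    Dropping a corner here corresponds to setting its score to 0 in the text.
    """
    kept = []
    for x, y, score in corners:
        is_max = all(
            score >= other_score
            for ox, oy, other_score in corners
            if abs(ox - x) <= radius and abs(oy - y) <= radius
        )
        if is_max:
            kept.append((x, y, score))
    return kept


if __name__ == "__main__":
    corners = [(10, 10, 50), (11, 10, 80), (30, 30, 20)]
    print(suppress_non_maxima(corners))
```

This quadratic scan is chosen for clarity; an implementation working on a score plane, as described for FIG. 4 later in the text, would compare each pixel only with its immediate neighbours.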
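In practice, S302 and S303 may be completed in one outer loop. Taking N equal to 8 as an example, the loop may include 8 sub-loops, each corresponding to feature point extraction for one layer of pyramid image. Specifically, following the 8 layers from the bottom up, the layer-0 pyramid image is sliced in the first sub-loop, and the feature points of each slice image of the layer-0 pyramid image are extracted, yielding the feature points of the slice images of the layer-0 pyramid image. Then, in the second sub-loop, the layer-1 pyramid image (obtained by downsampling the layer-0 pyramid image) is sliced, and the feature points of each of its slice images are extracted, yielding the feature points of the slice images of the layer-1 pyramid image. This continues until the feature points of the slice images of the layer-7 pyramid image are obtained.
Further, the feature points of the slice images that belong to the same layer of pyramid image may be referred to as a feature point set. Each of the 8 layers of pyramid images then corresponds to one feature point set, giving 8 feature point sets. The processor may store each feature point set together, for example, in contiguous storage space, so that all feature points of the slice images of one layer can be fetched at once in S304 below.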
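S304: Filter the feature points of the slice images of each layer of pyramid image to obtain multiple target feature points of each layer. Each target feature point is the selected feature point with the largest response score in a preset region.
The number of feature points obtained for each layer in S303 is relatively large, and the number of feature points actually needed per layer is smaller than the number obtained in S303. Therefore, to keep the processor from performing subsequent processing on unnecessary feature points, and thereby save power and processing time, the processor may filter the feature point set of each layer to reduce the amount of data the mobile device processes later.
For the N layers of pyramid images, the mobile device may filter the feature point set of each layer in order from the bottom up. This may specifically include: obtaining the feature points of the slice images of each layer to obtain a feature point set; and selecting a specified number of feature points from the feature point set to obtain multiple target feature points.
Specifically, for each layer of pyramid image, after the mobile device obtains the feature point set of the layer, it may use an octree to select a specified number of feature points from the set, thereby obtaining multiple target feature points. The mobile device then determines, among the target feature points, those that belong to the same slice image.
Specifically, selecting a specified number of feature points from the feature point set by using the octree may include the following. After the mobile device obtains the feature point set of a layer, it may use the octree to divide the layer of pyramid image into grid cells (the resulting cells may be called first cells; for example, a 4×2 grid gives 8 first cells) and determine, from the feature point set, the subset of feature points belonging to each first cell. For the subset of each first cell, the feature points are sorted by response score and the M feature points with the highest response scores are selected, where M is, for example, a positive integer. In the same way, each first cell is further divided by the octree into cells (called second cells), the subset of feature points belonging to each second cell is determined, the feature points of each subset are sorted by response score, and the W feature points with the highest response scores are selected, where W is, for example, a positive integer. This continues until one or more of the resulting cells contain no feature points, at which point the filtering ends. The octree filtering method above is only an example; other octree filtering methods exist in the prior art and can be used instead, which is not limited in this embodiment. In practice, methods other than an octree may also be used to perform the filtering and obtain the specified number of feature points.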
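One subdivision step of the grid-based selection described above can be sketched as follows. This is a simplified illustration, not the octree procedure itself: it performs a single 4×2 division and keeps the top M points per cell, the recursion into second cells is omitted, and the name `select_per_cell` and its parameters are hypothetical.

```python
def select_per_cell(points, width, height, cols=4, rows=2, top_m=1):
    """Keep the top_m highest-scoring feature points in each grid cell.

    points is a list of (x, y, score) tuples in layer coordinates, and
    width and height are the resolution of that pyramid layer.
    """
    cells = {}
    for x, y, score in points:
        cx = min(x * cols // width, cols - 1)
        cy = min(y * rows // height, rows - 1)
        cells.setdefault((cx, cy), []).append((x, y, score))

    selected = []
    for key in sorted(cells):
        ranked = sorted(cells[key], key=lambda p: p[2], reverse=True)
        selected.extend(ranked[:top_m])
    return selected


if __name__ == "__main__":
    points = [(10, 10, 5), (20, 20, 9), (600, 400, 3)]
    print(select_per_cell(points, 640, 480))
```

Applying the same function again inside each cell, with that cell's bounds, would give the second-level cells mentioned in the text.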
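It should be noted that, after the target feature points belonging to the same slice image are determined, it is possible that every slice image of the layer contains target feature points, or that some slice images contain target feature points while others do not. This depends on the actual situation and is not specifically limited in the embodiments of this application.
Further, for the slice images that contain target feature points, the target feature points of each slice image occupy contiguous storage space. Specifically, the mobile device may allocate a block of contiguous storage space and, for each layer of pyramid image, store the target feature points of each slice image in turn according to the positions of the slice images within the layer, so that the target feature points of each slice image occupy contiguous storage space.
Optionally, the mobile device may also determine the number of target feature points in each slice image and the start position of the storage space that holds the target feature points of that slice image, that is, the address of the storage space that holds the first target feature point of the slice image. When the mobile device later needs to read the target feature points of a slice image, it can read them from the storage space according to their number and start position.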
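The contiguous layout with a per-slice start position and count can be sketched with a flat list and an index table. This is an assumption-laden sketch: slices are taken to be horizontal strips of equal height `slice_height`, a Python list stands in for the contiguous memory block, and the names `pack_by_slice` and `read_slice` are hypothetical.

```python
def pack_by_slice(targets, slice_height, num_slices):
    """Store target feature points slice by slice in one flat list.

    Returns (flat, index), where index[i] = (start, count) describes the
    run of entries in flat that belongs to slice i.
    """
    buckets = [[] for _ in range(num_slices)]
    for x, y, score in targets:
        slice_idx = min(y // slice_height, num_slices - 1)
        buckets[slice_idx].append((x, y, score))

    flat, index = [], []
    for bucket in buckets:
        index.append((len(flat), len(bucket)))  # start position and count
        flat.extend(bucket)
    return flat, index


def read_slice(flat, index, slice_idx):
    """Fetch the target feature points of one slice using start and count."""
    start, count = index[slice_idx]
    return flat[start:start + count]


if __name__ == "__main__":
    targets = [(5, 100, 7), (8, 10, 3), (9, 70, 4)]
    flat, index = pack_by_slice(targets, slice_height=60, num_slices=8)
    print(flat)
    print(index)
    print(read_slice(flat, index, 1))
```

A slice without target feature points simply gets a count of 0, which matches the case in the text where some slices contain none.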
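In practice, S304 may also be completed in one outer loop. Taking N equal to 8 as an example, the loop may also include 8 sub-loops, each used to filter the feature point set of one layer of pyramid image. That is, following the 8 layers from the bottom up, the feature point sets of the layer-0 to layer-7 pyramid images are filtered in the first to eighth sub-loops, respectively.
S305: Determine the description information of the target feature points according to the slice images of each layer of pyramid image and the target feature points of those slice images.
The description information includes a feature point orientation and a descriptor. Target feature points need good rotation invariance, but the feature points obtained in S303 do not include an orientation, so the orientation must be determined in order to rotate the neighborhood of the target feature point accordingly. The orientation is determined as follows: a circular region is constructed with the feature point as its center and a given radius; the intensity centroid of the pixels in this region is computed; and the vector from the feature point to the intensity centroid is taken as the orientation. The descriptor is the key information that distinguishes each feature point from other feature points and describes the image region around the feature point. By searching the neighborhood of each target feature point, a unique descriptor can be established for it.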
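The intensity-centroid orientation described above can be sketched with first-order image moments. This is a minimal sketch: the function name `orientation` and the choice to return the angle in radians are assumptions, and the caller is assumed to keep the circular patch inside the image.

```python
import math


def orientation(img, cx, cy, radius):
    """Angle of the vector from (cx, cy) to the intensity centroid.

    m10 and m01 are the first-order moments of the pixel intensities
    over the circular patch, taken relative to the feature point.
    """
    m10 = m01 = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy <= radius * radius:
                value = img[cy + dy][cx + dx]
                m10 += dx * value
                m01 += dy * value
    return math.atan2(m01, m10)


if __name__ == "__main__":
    # Bright columns to the right of the centre pull the centroid toward +x.
    img = [[100 if x > 3 else 0 for x in range(7)] for _ in range(7)]
    print(orientation(img, 3, 3, 3))
```

Dividing the moments by the total intensity would give the centroid itself; the division is unnecessary for the angle because it scales both moments equally.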
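For the N layers of pyramid images, the mobile device may process the slice images of each layer and the target feature points of those slice images in order from the bottom up. This may specifically include: performing Gaussian blur on the slice images of each layer of pyramid image, and determining the feature point orientation and the descriptor according to the blurred slice image and the target feature points of the slice image, to obtain the description information of the target feature points of the slice image.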
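The Gaussian blur step can be sketched as a separable filter applied first along rows and then along columns. The kernel radius of 3, the sigma of 2.0, and the clamped border handling are assumptions chosen for this sketch; the source does not specify them, and the function names are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(values)
    return [sum(kernel[k + r] * values[min(max(i + k, 0), n - 1)]
                for k in range(-r, r + 1))
            for i in range(n)]


def gaussian_blur(img, radius=3, sigma=2.0):
    """Blur a slice given as a list of pixel rows; the size is unchanged."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    img = [[0] * 7 for _ in range(7)]
    img[3][3] = 255  # a single bright pixel is spread over its neighbours
    blurred = gaussian_blur(img)
    print(round(blurred[3][3], 2), round(blurred[3][4], 2))
```

When a slice is blurred on its own, the clamped border differs from what blurring the whole layer would give near the slice edges, so an implementation may need a few overlapping rows from neighbouring slices.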
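Optionally, for the target feature points of a slice image in each layer of pyramid image, the mobile device may fetch them from the storage space by using the number of target feature points of the slice image and their start position in the storage space. For the slice images of each layer, the mobile device may slice that layer of pyramid image to obtain the slice images.
In practice, S305 may also be completed in one outer loop, which may also include 8 sub-loops, each used to obtain the description information of the target feature points of the slice images of one layer of pyramid image. That is, the 8 layers are processed from the bottom up. In the first sub-loop, the layer-0 pyramid image is sliced, and the description information of the target feature points of each slice image is determined in turn from the slice image and its target feature points, until the description information of the target feature points of all slice images of the layer-0 pyramid image has been determined. Then, in the second sub-loop, the layer-1 pyramid image is sliced, and the description information of the target feature points of each slice image is determined in turn in the same way, until it has been determined for all slice images of the layer-1 pyramid image. This continues until the description information of the target feature points of all slice images of the layer-7 pyramid image has been determined.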
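For ease of understanding, the following uses an example in which the device includes an ACPU and a DSP and the feature extraction method provided in the embodiments of this application is performed by the DSP. As shown in FIG. 4, the hardware abstraction layer (HAL) is the driver framework of the DSP. The ACPU of the device invokes the DSP by calling relevant application programming interface (API) functions, for example, to power the DSP on or off and to run algorithms. The HAL is the data exchange path between the ACPU's double data rate synchronous dynamic random access memory (DDR) and the DSP's DRAM. The data of the input image captured by the device through its camera unit is stored in the DDR, and the DSP moves the input image data to its DRAM for processing by calling relevant API functions.
In FIG. 4, the method provided in the embodiments of this application may include three parts in total, corresponding to the three pyramid level loops in FIG. 4. The first loop is the pyramid image acquisition and feature point extraction part, the second loop is the feature point filtering part, and the third loop is the description information calculation part.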
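The first loop includes obtaining the 8 layers of pyramid images (slicing and downsampling) and, for each sliced layer, feature point extraction (for example, corner extraction with the FAST algorithm) and non-maximum suppression. It may specifically include the following steps: 1. Slice the input image and move a slice image to the DRAM by DMA, for example, a slice image with a resolution of 640×60 (the 640×480 input image is cut into 8 slice images of 640×60). 2. Downsample each slice image in turn, in the horizontal and vertical directions separately, which produces a lower-resolution image, for example, a 533×80 image. 3. Extract feature points, that is, perform corner detection on the downsampled slice image and compute the response score of each corner, producing an image score plane for the computed corners. 4. Determine the suppression threshold from the average brightness of the current slice image and perform non-maximum suppression on the image score plane: if a score is not the maximum in its local region, set it to 0 (that is, the point is suppressed); if it is the maximum in its local region, keep it. 5. Repeat steps 1 to 4 until all slice images of the layer have been processed, then move on to the next layer of pyramid image.
The second loop includes filtering the feature point set of each of the 8 layers of pyramid images. For example, the feature points of each layer are filtered by an octree to obtain multiple target feature points of the layer. The target feature points that belong to each slice image are stored in contiguous memory, which makes them easier for the DSP to manage and use later. Optionally, the DSP may determine in advance, according to the positions of the slice images within each layer, where the target feature points of each slice image are to be stored, store the target feature points belonging to the same slice image in contiguous memory, and record the number of target feature points of each slice image and their start position in storage. This may specifically include the following steps: 1. Input all feature points of the same layer produced in the first loop into the octree for filtering, until the specified number of target feature points remains. 2. Allocate contiguous memory and, according to the position of each slice image within its pyramid image, store the selected target feature points that belong to the same slice image. 3. Record the number of target feature points contained in each slice image of the 8 layers and their start position in storage.
The third loop includes performing Gaussian blur on the slice images of each of the 8 layers of pyramid images and computing the orientation and the descriptor of each target feature point from the blurred slice image and the target feature points it contains. It may specifically include the following steps: 1. Slice the input image and move a slice image to the DRAM by DMA, for example, a slice image with a resolution of 640×60. 2. Perform Gaussian blur on each slice image. 3. Compute the orientation and the descriptor of each target feature point. 4. Repeat steps 1 to 3 until all slice images of the layer have been processed, then move on to the next layer of pyramid image.
It should be noted that FIG. 4 assumes that each layer of pyramid image includes N slice images. In each part, processing of the 8 layers is implemented by the pyramid level loop "for level0-level7", and processing of the slice images within each layer is implemented by the intra-level slice loop "for slice0-sliceN".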
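Correspondingly, FIG. 5 shows the flow of data and information for each slice image during feature extraction. FIG. 5 uses an 8-layer pyramid whose image resolutions are 640×480, 512×384, 448×336, 384×288, 320×240, 240×180, 192×144, and 160×120 as an example. In FIG. 5, slice feature information refers to the feature points of one slice image; the feature point set may refer to all feature points of each layer of pyramid image stored in the DDR; layer feature points refers to the feature points of one layer of pyramid image; classification refers to grouping the target feature points that belong to the same slice image; the target feature point set may refer to the target feature points of all slice images of each layer stored in contiguous storage space; and the description information set may refer to the description information of the target feature points of each layer stored in the DDR.
In the image feature extraction method provided in the embodiments of this application, the mobile device slices and downsamples the input image to obtain N layers of pyramid images, each of which includes multiple slice images. It extracts the feature points of one slice image at a time, filters all feature points of each layer, and then determines the description information of the target feature points from the slice images and their target feature points. Slice-based processing reduces the memory needed during processing, and filtering reduces the amount of data to be processed, which increases the data processing rate and reduces power consumption. In addition, using a DSP for feature extraction can further increase the data processing rate of SLAM on the mobile device.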
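The foregoing mainly describes the feature extraction method provided in the embodiments of this application from the perspective of the mobile device. It can be understood that, to implement the foregoing functions, the mobile device includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered beyond the scope of this application.
In the embodiments of this application, the image feature extraction apparatus may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.
When each functional module corresponds to one function, FIG. 6 shows a possible schematic structure of the image feature extraction apparatus in the foregoing embodiments. The image feature extraction apparatus includes an obtaining unit 601 and a processing unit 602. The obtaining unit 601 is configured to support the image feature extraction apparatus in performing S301 in the method embodiment. The processing unit 602 is configured to support the image feature extraction apparatus in performing S302 to S305 in the method embodiment, and/or other processes of the technology described herein. Optionally, the image feature extraction apparatus further includes an output unit 603, which is configured to output slice images, the feature points of slice images, and/or the target feature points of slice images, among others. The apparatus may be implemented, for example, in software form and stored in a storage medium.
The foregoing describes an image feature extraction apparatus in the embodiments of this application from the perspective of modular functional entities. The following describes a feature extraction apparatus in the embodiments of this application from the perspective of hardware processing.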
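An embodiment of this application provides an image feature extraction apparatus whose structure may be as shown in FIG. 7. The image feature extraction apparatus includes a memory 701, a processor 702, a communication interface 703, and a bus 704. The communication interface 703 may include an input interface 7031 and an output interface 7032. Correspondingly, when the image feature extraction apparatus is a mobile phone, the memory 701 may be the memory 201 in FIG. 2, the processor 702 may be the processor 202 in FIG. 2 (for example, the processor 702 is specifically a DSP), and the communication interface 703 may be the input/output interface 206 in FIG. 2.
Input interface 7031: The input interface may be configured to obtain an input image, where the input image is a grayscale image. In some feasible embodiments, the input interface may obtain the input image by time-division multiplexing. In some feasible embodiments, there may be only one input interface or multiple input interfaces.
Processor 702: The processor is configured to perform the functions of S302 to S305 of the foregoing image feature extraction method. In some feasible embodiments, the processor may have a single-processor or multi-processor structure and may be a single-threaded or multi-threaded processor. In some feasible embodiments, the processor may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various example logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor.
Output interface 7032: The output interface is configured to output the processing result of the foregoing image feature extraction method. In some feasible embodiments, the processing result may be output directly by the processor, or may first be stored in the memory and then output through the memory. In some feasible embodiments, there may be only one output interface or multiple output interfaces. In some feasible embodiments, the processing result output by the output interface may be sent to the memory for storage, sent to another processing flow for further processing, sent to a display device for display, or sent to a player terminal for playback.
Memory 701: The memory may store the foregoing input image, related instructions for configuring the processor, and the like. In some feasible embodiments, there may be one memory or multiple memories. The memory may be a floppy disk; a hard disk such as a built-in hard disk or a removable hard disk; a magnetic disk; an optical disc or magneto-optical disc such as a CD-ROM or DVD-ROM; a memory device such as RAM, ROM, PROM, EPROM, EEPROM, or flash memory; or any other form of storage medium known in the art.
Bus 704: The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 7, but this does not mean that there is only one bus or one type of bus.
The components of the image feature extraction apparatus provided in the embodiments of this application are respectively used to implement the functions of the corresponding steps of the foregoing feature extraction method. Because those steps have been described in detail in the foregoing image feature extraction method embodiments, they are not repeated here.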
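An embodiment of this application further provides a computer-readable storage medium that stores instructions. When the instructions are run on a device (for example, a microcontroller, a chip, or a computer), the device is caused to perform one or more of steps S301 to S305 of the foregoing image feature extraction method. If the component modules of the foregoing image feature extraction apparatus are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
Based on this understanding, an embodiment of this application further provides a computer program product containing instructions. The technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor therein to perform all or some of the steps of the methods described in the embodiments of this application.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.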

Claims (19)

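  1. An image feature extraction method, wherein the method comprises:
    obtaining an input image, wherein the input image is a grayscale image;
    performing slicing and downsampling on the input image to obtain N layers of pyramid images, wherein the N layers of pyramid images are N images with different resolutions, each layer of pyramid image in the N layers of pyramid images comprises a plurality of slice images, and N is a positive integer greater than or equal to 2;
    extracting at least one feature point from each slice image in each layer of pyramid image to obtain a plurality of feature points of the plurality of slice images of each layer of pyramid image, wherein each feature point is a corner of the layer of pyramid image, and each feature point is represented by coordinates and a response score; and
    filtering the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a plurality of target feature points of each layer of pyramid image, wherein each target feature point is the feature point with the largest response score in a preset region of the layer of pyramid image.
  2. The method according to claim 1, wherein each feature point is a corner obtained after non-maximum suppression.
  3. The method according to claim 1 or 2, wherein the input image is a layer-0 pyramid image, and the performing slicing and downsampling on the input image to obtain N layers of pyramid images comprises:
    slicing the layer-0 pyramid image to obtain a plurality of slice images of the layer-0 pyramid image; and
    downsampling the plurality of slice images of a layer-i pyramid image to obtain a layer-(i+1) pyramid image, and slicing the layer-(i+1) pyramid image to obtain a plurality of slice images of the layer-(i+1) pyramid image, wherein 0≤i≤N-2.
  4. The method according to any one of claims 1 to 3, wherein the filtering the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a plurality of target feature points of each layer of pyramid image comprises:
    obtaining the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a feature point set; and
    selecting a specified number of feature points from the feature point set by octree filtering to obtain the plurality of target feature points.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    determining description information of each target feature point, wherein the description information comprises a feature point orientation and a descriptor.
  6. The method according to claim 5, wherein the determining description information of the target feature point comprises:
    performing Gaussian blur on the slice images of each layer of pyramid image; and
    determining the feature point orientation and the descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature points.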
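  7. An image feature extraction apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to obtain an input image, wherein the input image is a grayscale image; and
    a processing unit, configured to: perform slicing and downsampling on the input image to obtain N layers of pyramid images, wherein the N layers of pyramid images are N images with different resolutions, each layer of pyramid image in the N layers of pyramid images comprises a plurality of slice images, and N is a positive integer greater than or equal to 2;
    extract at least one feature point from each slice image in each layer of pyramid image to obtain a plurality of feature points of the plurality of slice images of each layer of pyramid image, wherein each feature point is a corner of the layer of pyramid image, and each feature point is represented by coordinates and a response score; and
    filter the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a plurality of target feature points of each layer of pyramid image, wherein each target feature point is the feature point with the largest response score in a preset region of the layer of pyramid image.
  8. The apparatus according to claim 7, wherein the feature points of each slice image are corners obtained after non-maximum suppression.
  9. The apparatus according to claim 7 or 8, wherein the input image is a layer-0 pyramid image, and the processing unit is further configured to:
    slice the layer-0 pyramid image to obtain a plurality of slice images of the layer-0 pyramid image; and
    downsample the plurality of slice images of a layer-i pyramid image to obtain a layer-(i+1) pyramid image, and slice the layer-(i+1) pyramid image to obtain a plurality of slice images of the layer-(i+1) pyramid image, wherein 0≤i≤N-2.
  10. The apparatus according to any one of claims 7 to 9, wherein the processing unit is further configured to:
    obtain the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a feature point set; and
    select a specified number of feature points from the feature point set by octree filtering to obtain the plurality of target feature points.
  11. The apparatus according to any one of claims 7 to 10, wherein the processing unit is further configured to:
    determine description information of each target feature point, wherein the description information comprises a feature point orientation and a descriptor.
  12. The apparatus according to claim 11, wherein the processing unit is further configured to:
    perform Gaussian blur on the slice images of each layer of pyramid image; and
    determine the feature point orientation and the descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature points.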
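  13. An image feature extraction apparatus, wherein the apparatus comprises:
    an input interface, configured to obtain an input image, wherein the input image is a grayscale image; and
    a processor, configured to perform the following operations:
    performing slicing and downsampling on the input image to obtain N layers of pyramid images, wherein the N layers of pyramid images are N images with different resolutions, each layer of pyramid image in the N layers of pyramid images comprises a plurality of slice images, and N is a positive integer greater than or equal to 2;
    extracting at least one feature point from each slice image in each layer of pyramid image to obtain a plurality of feature points of the plurality of slice images of each layer of pyramid image, wherein each feature point is a corner of the layer of pyramid image, and each feature point is represented by coordinates and a response score; and
    filtering the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a plurality of target feature points of each layer of pyramid image, wherein each target feature point is the feature point with the largest response score in a preset region of the layer of pyramid image.
  14. The apparatus according to claim 13, wherein the processor is a digital signal processor (DSP).
  15. The apparatus according to claim 13 or 14, wherein the feature points of each slice image are corners obtained after non-maximum suppression.
  16. The apparatus according to any one of claims 13 to 15, wherein the input image is a layer-0 pyramid image, and the processor further performs the following operations:
    slicing the layer-0 pyramid image to obtain a plurality of slice images of the layer-0 pyramid image; and
    downsampling the plurality of slice images of a layer-i pyramid image to obtain a layer-(i+1) pyramid image, and slicing the layer-(i+1) pyramid image to obtain a plurality of slice images of the layer-(i+1) pyramid image, wherein 0≤i≤N-2.
  17. The apparatus according to any one of claims 13 to 16, wherein the processor further performs the following operations:
    obtaining the plurality of feature points of the plurality of slice images of each layer of pyramid image to obtain a feature point set; and
    selecting a specified number of feature points from the feature point set by octree filtering to obtain the plurality of target feature points.
  18. The apparatus according to any one of claims 13 to 17, wherein the processor further performs the following operation:
    determining description information of each target feature point, wherein the description information comprises a feature point orientation and a descriptor.
  19. The apparatus according to claim 18, wherein the processor further performs the following operations:
    performing Gaussian blur on the slice images of each layer of pyramid image; and
    determining the feature point orientation and the descriptor according to the processed slice image and the target feature points of the slice image, to obtain the description information of the target feature points.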
PCT/CN2018/087707 2018-05-21 2018-05-21 Image feature extraction method and apparatus WO2019222889A1 (zh)

Publications (1)

Publication Number Publication Date
WO2019222889A1 true WO2019222889A1 (zh) 2019-11-28

Family

ID=68616239

Country Status (2)

Country Link
CN (1) CN111630523A (zh)
WO (1) WO2019222889A1 (zh)

Also Published As

Publication number Publication date
CN111630523A (zh) 2020-09-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919493

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18919493

Country of ref document: EP

Kind code of ref document: A1