CN112424787A

CN112424787A - Method and device for extracting image key points

Info

Publication number: CN112424787A
Application number: CN201880095485.3A
Authority: CN
Inventors: 左韶军; 林天鹏; 占云龙; 赵强; 王林召
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-09-20
Filing date: 2018-09-20
Publication date: 2021-02-26
Anticipated expiration: 2038-09-20
Also published as: WO2020056688A1; CN112424787B

Abstract

A method and a device for extracting image key points belong to the technical field of electronics. The method comprises the following steps: obtaining an image pyramid (301) of an image, the image pyramid comprising N layers of images, N > 1; determining the size of a target window according to the ith layer of image of the image pyramid, and determining at least one output key point of the ith layer of image according to the size of the target window, wherein i is more than or equal to 1 and less than or equal to N; and determining the output key points of the images of all layers of the image pyramid as the key points of the images. By adopting the method, the number of the key points extracted from each image can be balanced.

Description

Method and device for extracting image key points

Technical Field

The present application relates to the field of electronic technologies, and in particular, to a method and an apparatus for extracting image key points.

Background

In image processing, image matching is generally performed using key points among the pixel points of an image. The key points (also called feature points or interest points) of an image are points that stand out in the image and are representative of it; they can be used to identify the image, to match it against other images, or to perform 3D (three-dimensional) reconstruction.

In the process of extracting image key points, an image pyramid is first constructed by down-sampling the image layer by layer. That is, the initial image is used as the layer 1 image; all pixel points in the layer i image are down-sampled at a certain sampling rate, and the pixel points obtained after down-sampling are used as the layer i+1 image, where the pixel size of the layer i+1 image is smaller than that of the layer i image, and i is a positive integer greater than or equal to 1. Then, each layer of image is divided into image blocks; for each image block, a feature score of each pixel point is determined based on the FAST (Features from Accelerated Segment Test) algorithm, and the pixel points whose feature scores are larger than a threshold are determined as key points. Finally, non-maximum suppression can be performed on the key points of the image block according to a window of a preset size, that is, the key point with the highest feature score is extracted within a window centered on a key point, which can be understood as a local maximum search. This processing is performed on each image block of one layer of image in the image pyramid to obtain the key points of that layer of image. Each layer of image is traversed in this way to obtain its key points, thereby obtaining the key points of the whole image.

The above scheme for acquiring image key points has at least the following problem: the size of the non-maximum suppression window is fixed. For an image with very rich texture, too many key points may be extracted; for an image with very sparse texture, too few key points may be extracted. That is, the number of extracted key points differs greatly between images of different texture complexity. This imbalance may affect subsequent processing: for example, too many key points increase the amount of computation for image matching, and too few key points lower the accuracy of image matching.

Disclosure of Invention

The embodiment provides a method and a device for extracting image key points, which can balance the number of key points extracted from each image. The technical scheme is as follows:

in one aspect, a method for extracting image key points is provided, and the method includes: the image processing equipment acquires an image pyramid of an image, wherein the image pyramid comprises N layers of images, and N is greater than 1; determining the size of a target window according to the ith layer of image of the image pyramid, and determining at least one output key point of the ith layer of image according to the size of the target window, wherein i is more than or equal to 1 and less than or equal to N; and determining the output key points of the images of all layers of the image pyramid as the key points of the images.

Through the above processing, the image processing apparatus can adjust the size of the window for non-maximum suppression on a per-layer image basis before performing the non-maximum suppression processing in the process of extracting the key points of the image, so that the size of the window for non-maximum suppression can be changed with each layer of image, and the number of key points extracted per image can be equalized.

In one possible implementation, determining a target window size according to an ith layer image of the image pyramid, and determining at least one output keypoint of the ith layer image according to the target window size includes: extracting at least one candidate key point of the ith layer of image of the image pyramid, determining the size of a target window according to the ith layer of image, and performing non-maximum suppression processing on the at least one candidate key point of the ith layer of image according to the size of the target window to obtain at least one output key point of the ith layer of image. Through the processing, the image processing equipment can perform non-maximum suppression processing on the candidate key points of each layer of image, and due to the extraction of the candidate key points, the number of the key points during the non-maximum suppression processing can be reduced, and the processing efficiency is improved.
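The non-maximum suppression step described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: it assumes the candidate key points are given as a score map in which non-candidates hold 0, that the window is square with an odd side length, and that candidates with equal scores are both kept. The function name `non_max_suppression` is hypothetical.

```python
from typing import List, Tuple


def non_max_suppression(score_map: List[List[int]], window: int) -> List[Tuple[int, int]]:
    """Return (row, col) of candidates that are the maximum within their window.

    score_map holds a feature score at candidate positions and 0 elsewhere.
    window is the side length of the square window centred on each candidate.
    """
    height, width = len(score_map), len(score_map[0])
    half = window // 2
    kept = []
    for y in range(height):
        for x in range(width):
            score = score_map[y][x]
            if score <= 0:
                continue  # not a candidate key point
            # Scan the window, clipped to the image border.
            rows = range(max(0, y - half), min(height, y + half + 1))
            cols = range(max(0, x - half), min(width, x + half + 1))
            if all(score_map[ny][nx] <= score for ny in rows for nx in cols):
                kept.append((y, x))
    return kept


scores = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 7],
]
small_window = non_max_suppression(scores, 3)
large_window = non_max_suppression(scores, 5)
```

With the same score map, the larger window keeps fewer output key points, which is the property the target window size relies on.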

In a possible implementation, the image is a static image, and determining the size of the target window according to the ith layer image of the image pyramid includes: if the ith layer image is the 1st layer image, the image processing device determines a preset window size as the target window size; otherwise, the image processing device determines the (i-1)th layer image in the image pyramid of the image as the reference image of the ith layer image, and determines the target window size according to the reference image. Because adjacent layers of the same image pyramid are obtained from one another by sampling, the texture of the images of different layers can be the same as that of the original image, so the texture complexity of the reference image obtained through this processing is similar to that of the image from which key points are extracted.

In a possible implementation, the image is a frame of image in a video stream, and the determining the size of the target window according to the ith layer image of the image pyramid includes: if the image is any frame image except the 1 st frame image in the video stream, the image processing device determines the ith layer image of the image pyramid of the previous frame image of the image as the reference image of the ith layer image of the image pyramid of the image, and determines the size of the target window according to the reference image. The images of two adjacent frames may have similar pictures, so that the texture complexity of the reference image obtained by the above processing is similar to that of the image for extracting the key point.

In one possible embodiment, the method further comprises: if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is the 1st layer image, the image processing device determines a preset window size as the target window size; if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is an image other than the 1st layer image, the image processing device determines the (i-1)th layer image of the image pyramid of the image as the reference image of the ith layer image of the image pyramid of the image, and determines the target window size according to the reference image. Because the 1st frame image has no previous frame image, its reference image can be determined in a manner similar to that used for a static image, which improves the accuracy of determining the target window size.
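The reference image rules for static images and video frames can be summarized in a short sketch. It assumes 1-based frame and layer indices and treats a static image as frame 1 of a one-frame stream; the function name `select_reference` and the return convention (`None` meaning "use the preset window size") are illustrative assumptions.

```python
from typing import Optional, Tuple


def select_reference(frame_index: int, layer_index: int) -> Optional[Tuple[int, int]]:
    """Return (frame, layer) of the reference image, or None for the preset window.

    Indices are 1-based. A static image is handled as frame_index == 1.
    """
    if frame_index > 1:
        # Any frame after the 1st: same layer of the previous frame's pyramid.
        return (frame_index - 1, layer_index)
    if layer_index == 1:
        # 1st layer of the 1st frame (or of a static image): no reference exists.
        return None
    # Other layers of the 1st frame: previous layer of the same pyramid.
    return (frame_index, layer_index - 1)
```

For example, `select_reference(1, 3)` yields layer 2 of the same pyramid, while `select_reference(5, 3)` yields layer 3 of frame 4.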

In one possible embodiment, determining the target window size from the reference image comprises: the image processing device determines a first key point number of the reference image, and determines the size of the target window according to the first key point number and a second key point number of the reference image, wherein the first key point number is a preset key point number, and the second key point number is the number of output key points of the reference image. Through this processing, the size of the non-maximum suppression window of the current image can be determined according to how close the number of output key points of the reference image is to the preset key point number. When the key points of the image are then extracted based on the adjusted target window size, the number of output key points of each layer of image approaches the preset key point number; if every image approaches the preset key point number, the number of key points is balanced across images.

In one possible embodiment, determining a first number of keypoints for a reference image comprises: and determining the number of first key points corresponding to the reference image according to the number of layers of the reference image in the image pyramid, wherein a preset corresponding relation exists between the number of layers of the reference image in the image pyramid and the number of the first key points. Through the processing, the number of the first key points of each layer of image is different, and the method can adapt to the situation that the pixel size of each layer of image is different.

In a possible implementation manner, in the preset corresponding relationship, the number of the first key points of the two adjacent layers of images satisfies a preset proportion, and the preset proportion is equal to the proportion of the number of the pixel points of the two adjacent layers of images in the image pyramid. Each layer of image in the image pyramid has different first key point numbers, the expected key point numbers of two adjacent layers meet the preset proportion, and the proportion of the first key point number of each layer of image in the total number of the pixel points is ensured to be constant through the processing.
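One way to build such a preset correspondence is to scale a layer 1 key point number by the pixel count of each layer. The sketch below assumes this construction; the function name `expected_keypoints` and the base number 1600 are hypothetical and are not taken from this application.

```python
from typing import List, Tuple


def expected_keypoints(base_count: int, layer_sizes: List[Tuple[int, int]]) -> List[int]:
    """First key point number per layer, proportional to the layer's pixel count.

    base_count is the preset number for layer 1 (a hypothetical value).
    layer_sizes lists (width, height) for layers 1..N.
    """
    base_pixels = layer_sizes[0][0] * layer_sizes[0][1]
    return [round(base_count * w * h / base_pixels) for w, h in layer_sizes]


counts = expected_keypoints(1600, [(100, 100), (50, 50), (25, 25)])
```

Here each layer has one quarter of the pixels of the layer before it, so the first key point numbers fall by the same factor and the share of key points among all pixel points stays constant.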

In one possible implementation, determining the size of the target window according to the first number of key points and the second number of key points of the reference image includes: the image processing device determines the ratio of the number of the second key points to the number of the first key points; determining a target ratio range in which the ratio is positioned and a target window level corresponding to the target ratio range according to the corresponding relation between the preset ratio range and the window level; and determining the size of the target window according to the level of the target window. Through the processing, the degree that the number of the output key points is close to the preset number of the key points can be measured by utilizing the ratio of the number of the second key points to the number of the first key points, the ratio ranges of different ratios correspond to different window levels, and the larger the ratio is, the larger the corresponding window level can be. The size of the non-maximum suppression window is increased for the same image, so that the number of key points obtained after the non-maximum suppression processing can be reduced. Therefore, if the ratio of the number of the second key points to the number of the first key points is too large, for example, greater than 2, the window size is appropriately increased through the corresponding relationship between the ratio range and the window level, the number of the output key points of the current image can be reduced, and the effect of balancing the number of the key points is achieved.
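The lookup from ratio to window level can be sketched as a search over range boundaries. The boundaries 0.5 and 2.0 and the three resulting levels are assumptions chosen for illustration (the text above only gives "greater than 2" as an example), as is the function name `window_level`.

```python
from bisect import bisect_right

# Hypothetical boundaries: [0, 0.5) -> level 0, [0.5, 2.0) -> level 1, [2.0, inf) -> level 2.
RATIO_BOUNDS = [0.5, 2.0]


def window_level(output_count: int, expected_count: int) -> int:
    """Map the ratio of second to first key point number to a window level.

    A larger ratio gives a larger level, which later selects a larger window.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    ratio = output_count / expected_count
    return bisect_right(RATIO_BOUNDS, ratio)
```

A reference image that produced 2500 output key points against a preset number of 1000 has a ratio of 2.5 and is assigned the highest level in this sketch.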

In one possible embodiment, determining the size of the target window according to the target window level includes: determining a target window group corresponding to the ith layer of image according to the corresponding relation between the preset layer number and the window group, wherein the window group comprises at least one window size corresponding to the window level; and determining the window size corresponding to the target window level in the target window group as the target window size. The pixel size of each layer of image is gradually reduced along with the increase of the layer number, and if the window size of the same window level is correspondingly set to be reduced layer by layer, the number of key points of each layer of image can be balanced. Because each layer of image of the image pyramid can be used for multi-scale expression of the image, namely simulating images with different blurring degrees, and balancing the number of key points of each layer of image, each blurring degree can have a certain number of key points, and the image matching accuracy is improved.
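The correspondence between layer number and window group can be represented as a small table. All window sizes below are hypothetical; they only illustrate that the window size of the same level may shrink in higher layers. The names `WINDOW_GROUPS` and `target_window_size` are likewise illustrative.

```python
from typing import Dict, Tuple

# Hypothetical window groups: layer number -> window size for levels 0, 1, 2.
WINDOW_GROUPS: Dict[int, Tuple[int, int, int]] = {
    1: (5, 7, 9),
    2: (5, 7, 9),
    3: (3, 5, 7),
    4: (3, 5, 7),
}


def target_window_size(layer_index: int, level: int) -> int:
    """Look up the window group of the layer, then the size for the window level."""
    group = WINDOW_GROUPS[layer_index]
    return group[level]
```

With this table, window level 2 gives a 9-pixel window on layer 1 but a 7-pixel window on layer 3.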

In one possible embodiment, extracting at least one candidate keypoint of the ith layer image includes: for each image block of the i-th layer image, determining the feature score of each pixel point in the image block according to a preset feature detection algorithm, and determining the pixel points with the feature scores larger than a preset threshold value as candidate key points of the image block; and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image. Through the processing, the pixel points can be screened through the preset threshold value, and the number of key points in non-maximum value inhibition processing is reduced.

In one aspect, an apparatus for extracting image keypoints comprises at least one module, and the at least one module is used for implementing the method for extracting image keypoints.

In one aspect, an image processing apparatus is provided, which includes a memory for storing instructions and a processor for calling the instructions and executing the method for extracting the image keypoints.

In one aspect, a computer-readable storage medium is provided, which, when run on an image processing apparatus, causes the image processing apparatus to perform the above-described method of extracting image keypoints.

In one aspect, a computer program product containing instructions is provided, which when run on an image processing apparatus, causes the image processing apparatus to perform the above-described method of extracting keypoints of an image.

The technical scheme provided by the embodiment has the following beneficial effects:

in this embodiment, when extracting the key points from the image, the image processing apparatus may adjust the size of the target window, that is, adjust the size of the non-maximum-suppressed window based on each layer of image in the image pyramid of the image, so that the size of the non-maximum-suppressed window may be changed with each layer of image, and the number of key points in each image is balanced, so as to reduce the negative influence on image matching.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a diagram of an implementation environment provided in this embodiment;

FIG. 2 is a schematic structural diagram of an image processing apparatus provided in this embodiment;

FIG. 3 is a flowchart of a method for extracting image key points according to this embodiment;

FIG. 4 is a flowchart of a method for obtaining candidate key points according to this embodiment;

FIG. 5 is a schematic diagram of a reference image provided in this embodiment;

FIG. 6 is a flowchart of a method for extracting image key points according to this embodiment;

FIG. 7 is a flowchart of a method for extracting image key points according to this embodiment;

FIG. 8 is a schematic diagram of a reference image provided in this embodiment;

FIG. 9 is a schematic diagram of an apparatus for extracting image key points according to this embodiment.

Detailed Description

The present embodiment provides a method of extracting image keypoints, which can be implemented by an image processing apparatus. Fig. 1 is a diagram of an implementation environment provided in this embodiment. The implementation environment includes a plurality of terminals 101, an image processing apparatus 102 for providing services to the plurality of terminals. A plurality of terminals 101 are connected to the image processing apparatus 102 through a wireless or wired network. The image processing apparatus 102 may provide the terminal 101 with a service of extracting key points of an image. For the image processing apparatus 102, the image processing apparatus 102 may further have at least one database for storing images of key points to be extracted, key points of the above images, and the like. The terminal 101, as a requester of the service, may transmit an image from which a key point is to be extracted to the image processing apparatus 102.

The image processing device 102 may include a processor 210 and a transceiver 220. The transceiver 220 may be coupled to the processor 210 as shown in FIG. 2. The transceiver 220 may be used to transmit and receive messages or data; for example, it may receive images from which key points are to be extracted, transmitted by the terminal 101. The processor 210 may be the control center of the image processing apparatus 102 and uses various interfaces and lines to connect the parts of the entire image processing apparatus 102, such as the transceiver 220. In the present application, the processor 210 may be an ASIC (Application-Specific Integrated Circuit), which may be used to extract image key points. The processor 210 may include one or more processing units. The processor 210 may integrate an application processor, which primarily handles the operating system, and a modem, which primarily handles wireless communications. The processor 210 may also be a digital signal processor, a central processing unit, or the like. The image processing device 102 may also include a memory 230. The memory 230 may be used to store images from which key points are to be extracted, key points of images, and the like. The image processing device 102 may also include an input/output interface 240 that may provide an interface between the processor 210 and peripheral interface modules, which may be keys or the like.

In the process of extracting the key points of the image, the method introduces a reference image to determine the size of a non-maximum suppression window. For an image pyramid formed by images of key points to be extracted, the reference images of the ith layer of images can be the following two types: first, when the image is a still image or a frame image in a video stream, the reference image of the ith layer image may be the i-1 layer image in the same image pyramid; second, when the image is a frame of image in a video stream, the reference image of the i-th layer image may be the i-th layer image in the image pyramid of the previous frame of image. The still image may refer to an independent image, and the key point extracted by the image processing device is independent of other images, for example, the still image may be a shot photograph; correspondingly, one frame of image in the video stream is not an independent image, each frame of image has a time sequence relationship with other frames of images, and the key points extracted from one frame of image by the image processing device are related to the images of the adjacent frames.

The two reference images have the common point that the texture complexity of the reference image and the image of the key point to be extracted is approximate. The reason is that, for the first reference image, the images of two adjacent layers in the same image pyramid can be obtained according to sampling, that is, the texture of the images of different layers can be the same as that of the original image; for the second reference image, because the time interval between the images of the adjacent frames is small, for example, only 40 ms, the pictures represented by the images of the adjacent frames may be relatively close, i.e., the texture between the images of the adjacent frames is similar.

Images with similar texture complexity also have similar image features; that is, when key points are extracted from them by the same method, the numbers of key points obtained are similar. Therefore, if key point extraction has already been completed for the reference image when key points are to be extracted from an image, the number of key points of the reference image can be used to judge whether the corresponding extraction method is suitable, and thus whether to apply the same method or how to adjust it. Because the non-maximum suppression window screens the key points, the number of key points of the reference image can be used to measure whether the size of the corresponding non-maximum suppression window is appropriate, and the window can then be adjusted. In this way the number of key points extracted from each image is balanced, which avoids the problem that images differing greatly in texture complexity also differ greatly in the number of key points obtained.

Of course, the reference image of the i-th layer image may be another reference image based on the same concept, in addition to the above two types. The reference images can be applied to the method for extracting the image key points provided by the present application, and the present application does not limit this.

An embodiment of the present application provides a method for extracting key points of a still image or a video image. Taking as an example the case in which the reference image of the ith layer image is the (i-1)th layer image in the same image pyramid, the processing flow of the method for extracting image key points shown in FIG. 3 is described in detail below with reference to specific implementations, and may be as follows:

in step 301, the image processing apparatus acquires an image pyramid of an image.

Wherein the image pyramid can include N layers of images, N > 1.

In this embodiment, the image processing apparatus has the capability of extracting the image key points. If the image processing apparatus provides a service for extracting image key points for other terminals, a still image or video stream transmitted from the terminal may be received. Alternatively, the image processing device may have a function of capturing an image (for example, the image processing device may be a monitoring device), and may perform key point extraction on the captured image.

The image processing device may further store the image of the key point to be extracted. The image processing device may extract the key points from each frame of image in the video stream in real time, or extract the key points from the stored image, which is not limited in this embodiment.

For the image of the key point to be extracted, the image processing device may construct an image pyramid for the image, and the embodiment does not limit the specific manner of constructing the image pyramid, and for example, the image pyramid may be constructed based on an up-sampling or down-sampling method.

In one possible implementation, the process of the image processing device constructing the image pyramid may be as follows: the image processing equipment takes the image as the 1 st layer image of the image pyramid, downsamples the image of the image pyramid layer by layer according to a preset proportion to obtain the next layer image, and stops downsampling the image of the image pyramid until a construction stopping condition is reached to obtain the image pyramid of the image.

The downsampling refers to generating a thumbnail of an image, and the preset scale may refer to a scale of the downsampling. The image pyramid formed by the downsampling method takes the image of the key point to be extracted as an original image, generates thumbnails of various resolutions, and namely performs multi-scale expression on the image.

The construction stopping condition may be that the constructed image pyramid reaches a preset number of layers, or that the image of the highest layer reaches a preset size. For example, an image pyramid is constructed for an image with a pixel size of 992 × 744, the preset ratio is 1.2, and when the image pyramid reaches the 8 th layer, the construction is stopped, so that the pixel sizes of the images from the 1 st layer to the 8 th layer of the image pyramid are 992 × 744, 827 × 620, 689 × 517, 574 × 431, 478 × 359, 399 × 299, 332 × 249, 277 × 208 respectively.
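The layer sizes in this example can be reproduced with a short calculation. The sketch assumes that each layer's size is obtained by dividing the layer 1 size by the preset ratio raised to the layer offset and rounding to the nearest integer; the rounding rule is not stated in this application and is inferred from the listed numbers. The function name `pyramid_sizes` is hypothetical.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, scale: float, num_layers: int) -> List[Tuple[int, int]]:
    """Pixel size of each pyramid layer, scaled from layer 1 and rounded.

    Layer k (1-based) has size round(width / scale**(k-1)) x round(height / scale**(k-1)).
    """
    return [
        (round(width / scale ** k), round(height / scale ** k))
        for k in range(num_layers)
    ]


sizes = pyramid_sizes(992, 744, 1.2, 8)
```

Under this assumption the result matches the eight sizes listed above. Scaling each layer from the already rounded size of the previous layer would instead give 398 rather than 399 for the width of the 6th layer.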

After the image processing device completes the construction of the image pyramid, key points can be extracted from the image of the image pyramid layer by layer from the image of the layer 1.

Of course, the image processing device may also acquire an image pyramid of the image that has been constructed by another device, which is not limited in this embodiment.

In step 302, the image processing apparatus extracts at least one candidate keypoint of an image of the i-th layer of the image pyramid.

The image processing apparatus may detect candidate keypoints for the images of the image pyramid layer by layer. For example, the candidate keypoints of an image may be detected based on a FAST (Features from Accelerated Segment Test) algorithm, a Scale-Invariant Feature Transform (SIFT) algorithm, a Speeded Up Robust Features (SURF) algorithm, a FREAK (Fast Retina Keypoint) algorithm, or the like.

In one possible implementation, the process of step 302 may be as follows: for each image block of the i-th layer image, the image processing equipment determines the feature score of each pixel point in the image block according to a preset feature detection algorithm, and determines the pixel points with the feature scores larger than a preset threshold value as candidate key points of the image block; and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image.

Taking the FAST algorithm as an example, for each pixel point in the image, the FAST algorithm can examine the pixel points within a preset circular range around that pixel point and calculate the gradient of the pixel point, that is, calculate the feature score of the pixel point. An initial threshold can be preset in the FAST algorithm; if the feature score of a pixel point is greater than the initial threshold, the pixel point is regarded as a corner point, and such pixel points can be used as candidate key points.

Since there may be images with a relatively smooth texture, and key points may not be detected when detecting based on the initial threshold, two thresholds, i.e., the initial threshold and the low threshold, may be set in the FAST algorithm, for example, the initial threshold may be 20 and the low threshold may be 7. For the same image, because the low threshold can detect the corner points with relatively gentle angles, the number of key points obtained based on the low threshold detection is generally greater than that obtained based on the initial threshold detection.

If the image processing apparatus does not detect the keypoints of the image based on the initial threshold, the image needs to be re-detected based on the low threshold. The re-detection of the image increases the processing time consumption, and particularly, when the hardware realizes the extraction of the key points, more registers and longer processing time delay are needed, more cost is consumed, and the processing efficiency is lower. Therefore, another method for obtaining candidate key points is provided in this embodiment, for each image block, a feature score of each pixel point in the image block is determined, and if there is a pixel point whose feature score is greater than a first threshold, the pixel point whose feature score is greater than the first threshold is determined as a candidate key point of the image block; otherwise, determining the pixel points with the characteristic scores larger than a second threshold value as candidate key points of the image block; and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image. By the method, the candidate key points can be detected and extracted once, only the image needs to be traversed once, the pipeline processing can be performed, the image does not need to be read back, the repeated detection can be avoided, the processing efficiency is improved, and the complexity of hardware implementation can be reduced.
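The two-threshold selection for one image block can be sketched as follows. This is a software illustration only, assuming the feature scores of the block have already been computed; the defaults 20 and 7 are the example threshold values mentioned above, and the function name `select_block_candidates` is hypothetical.

```python
from typing import List


def select_block_candidates(
    block_scores: List[List[int]],
    first_threshold: int = 20,
    second_threshold: int = 7,
) -> List[List[int]]:
    """Return the block's score map with non-candidate positions set to 0."""
    score_map = []
    max_score = 0
    # Single pass over the block: filter by the low threshold and track the maximum.
    for row in block_scores:
        kept_row = [s if s > second_threshold else 0 for s in row]
        max_score = max(max_score, max(kept_row, default=0))
        score_map.append(kept_row)
    # If any score exceeds the first threshold, keep only those stronger points.
    if max_score > first_threshold:
        score_map = [[s if s > first_threshold else 0 for s in row] for row in score_map]
    return score_map


strong_block = select_block_candidates([[5, 8], [25, 10]])
weak_block = select_block_candidates([[5, 8], [10, 3]])
```

In the first block a score above 20 exists, so only that point survives; in the second block no score exceeds 20, so the points above 7 are kept instead of the block being detected again.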

As shown in fig. 4, a flowchart of a method for obtaining candidate keypoints may be as follows:

in step 3021, the image processing apparatus divides each layer image of the image pyramid into a plurality of image blocks of a preset size. For example, for each layer of the image, the image processing device may divide the image into a plurality of image blocks having a pixel size of 31 × 15.

In step 3022, for each image block, the image processing apparatus determines the feature score of the pixel point in the image block, and determines the pixel point with the feature score greater than the second threshold as the first candidate key point of the image block.

The image processing device may determine the feature scores of the pixel points as described above, and the description thereof is omitted here. The second threshold may be the above-mentioned low threshold, and the image processing device may obtain a corresponding score map after determining the feature score of each pixel, where the corresponding feature score is recorded at the position of each pixel. Then, the image processing apparatus may detect a keypoint of the image based on a second threshold, and if the feature score is greater than the second threshold, the feature score may be retained on the score map, that is, retained as a first candidate keypoint; if the feature score is not greater than the second threshold, the feature score may be set to 0 on the score map. The detection of the key points is directly based on the low threshold value, so that the repeated detection condition can be avoided.

In step 3023, the image processing apparatus determines, for each image block, whether the largest feature score is larger than a first threshold.

In the process of determining the feature score of each pixel point, the image processing device may further determine and store the maximum feature score, for example, when calculating the score map, a register is used to count the maximum feature score of the candidate keypoint in the image block. Furthermore, after determining the feature score of each pixel, the image processing device may determine whether the maximum feature score of the pixel in the image block is greater than a first threshold.

The first threshold may be the initial threshold, i.e. the first threshold is greater than the second threshold.

In step 3024, if the maximum feature score is greater than the first threshold, the image processing apparatus determines each first candidate keypoint whose feature score is greater than the first threshold as a second candidate keypoint of the image block, and determines the second candidate keypoints as the candidate keypoints of the image block.

Since the detected keypoints based on the first threshold more satisfy the engineering requirements, if the maximum feature score is greater than the first threshold, indicating that at least one keypoint more satisfying the engineering requirements exists, the first candidate keypoints may be screened based on the first threshold. That is, if the feature score of the first candidate keypoint is greater than the first threshold, the feature score may be retained on the score map, i.e., retained as the second candidate keypoint; if the feature score of the first candidate keypoint is not greater than the first threshold, the feature score may be set to 0 on the score map.

Further, after the second candidate keypoints are determined, they may be determined as the candidate keypoints of the image block.

In step 3025, if the largest feature score is not greater than the first threshold, the first candidate keypoint is determined to be a candidate keypoint of the image block.

If the maximum feature score is not greater than the first threshold, which indicates that there is no keypoint more satisfying the engineering requirement, the first candidate keypoint is not screened, that is, the first candidate keypoint is determined as the candidate keypoint of the image block.
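Steps 3022 to 3025 can be illustrated with a minimal sketch. It assumes the feature scores of one image block are already available as a 2D list of non-negative numbers; the function name `select_block_candidates` and its parameters are hypothetical and are not part of the embodiment.

```python
def select_block_candidates(scores, low_threshold, high_threshold):
    """Return the score map of one image block with non-candidates set to 0.

    scores: 2D list of non-negative feature scores (hypothetical input).
    low_threshold: the "second threshold"; high_threshold: the "first threshold".
    """
    if high_threshold <= low_threshold:
        raise ValueError("the first threshold must be greater than the second threshold")

    # Step 3022: keep scores above the low threshold and track the block maximum.
    max_score = 0
    score_map = []
    for row in scores:
        kept_row = []
        for score in row:
            kept_row.append(score if score > low_threshold else 0)
            max_score = max(max_score, score)
        score_map.append(kept_row)

    # Steps 3023-3024: if the maximum exceeds the high threshold,
    # keep only the candidates that exceed the high threshold.
    if max_score > high_threshold:
        score_map = [[s if s > high_threshold else 0 for s in row] for row in score_map]

    # Step 3025: otherwise the low-threshold candidates are returned unchanged.
    return score_map
```

With a block such as `[[5, 12, 30], [8, 25, 3]]`, a low threshold of 7 and a high threshold of 20 keep only the scores 30 and 25, whereas a high threshold of 40 leaves all four low-threshold candidates in place.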

In step 3026, the image processing apparatus determines candidate keypoints for respective image blocks of each layer image as candidate keypoints for each layer image.

After the image processing device determines the candidate key points of the image block, a score map corresponding to the image block can be obtained at the same time. The feature score corresponding to the candidate key point exists at the pixel point position of the candidate key point in the score map, and the value at the pixel point position of the non-candidate key point is 0. Furthermore, for a layer image, the image processing device may summarize the candidate keypoints of each image block into candidate keypoints of the layer image, and may stitch the score maps of each image block according to the position of each image block in the layer image to obtain the score map of the layer image.
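The stitching of block score maps into a layer score map can be sketched as follows. The sketch assumes that all blocks have the same size and are supplied in row-major order; `stitch_score_maps` and `blocks_per_row` are hypothetical names introduced only for illustration.

```python
def stitch_score_maps(block_maps, blocks_per_row):
    """Stitch equally sized block score maps into one layer score map.

    block_maps: list of 2D lists, ordered left to right, top to bottom
                (an assumption of this sketch).
    blocks_per_row: number of blocks that sit side by side in the layer image.
    """
    layer_map = []
    for start in range(0, len(block_maps), blocks_per_row):
        row_of_blocks = block_maps[start:start + blocks_per_row]
        block_height = len(row_of_blocks[0])
        for r in range(block_height):
            # Concatenate row r of every block in this horizontal strip.
            layer_map.append([s for block in row_of_blocks for s in block[r]])
    return layer_map
```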

Here i is an integer with 1 ≤ i ≤ N, so the ith layer image may be any layer image of the image pyramid. For the ith layer image, after the candidate key points and the score map are obtained, non-maximum suppression can be performed on the candidate key points. Before non-maximum suppression, the size of the non-maximum suppression window needs to be determined; the window may be a convolution kernel.

In this embodiment, the size of the non-maximum suppression window of the ith layer image is determined by using a reference image of the ith layer image; the following takes the (i-1)th layer image as the reference image of the ith layer image as an example.

In step 303, the image processing apparatus determines whether the i-th layer image is a 1 st layer image.

In step 304, if the ith layer image is the 1 st layer image, the image processing apparatus sets the preset window size as the target window size corresponding to the ith layer image.

The preset window size may refer to a default window size.

When extracting key points, the image processing device processes the image pyramid layer by layer, starting from the layer 1 image. On this basis, the image processing device can obtain the layer number of the current image and judge whether it is the layer 1 image. If the current image is the layer 1 image, no similar image has had its key points extracted before, so the image processing apparatus may determine the preset window size as the target window size. That is, non-maximum suppression is performed on the layer 1 image based on the preset window size.

In step 305, if the ith layer image is any layer image except the 1st layer image, the image processing device determines the (i-1)th layer image in the image pyramid of the image as a reference image of the ith layer image, determines a first key point number of the reference image, and determines the size of a target window corresponding to the ith layer image according to the first key point number and the second key point number of the reference image.

The first number of keypoints may be a preset number of keypoints, that is, the number of keypoints expected to be obtained when keypoints are extracted from the reference image. The second number of keypoints may refer to the number of output keypoints; the second number of keypoints of the reference image is the number of output keypoints actually obtained for the reference image.

As shown in the reference image schematic diagram of fig. 5, if the current image is any layer image except the layer 1 image, the key points of the previous layer image have already been extracted, so the previous layer image may be determined as the reference image. The size of the non-maximum suppression window of the current image is then determined according to how well the key points extracted from the previous layer image meet the requirement, that is, according to how close the number of output key points of the reference image is to the preset number of key points.

In this embodiment, each layer of image in the image pyramid has a different number of first keypoints, and therefore, the processing for the image processing apparatus to determine the first keypoints may be as follows: and the image processing equipment determines the number of first key points corresponding to the reference image according to the number of layers of the reference image in the image pyramid.

A preset correspondence exists between the layer number of the reference image in the image pyramid and the first key point number. In the preset correspondence, the first key point numbers of two adjacent layers of images meet a preset proportion, and the preset proportion is equal to the proportion between the numbers of pixel points of the two adjacent layers of images in the image pyramid. This ensures that the first key point number of each layer of image is a fixed proportion of that layer's total number of pixel points.

The correspondence between the number of layers and the number of first key points may be set by a technician according to actual requirements. Alternatively, if the image pyramid has many layers, the correspondence may be established as follows: store layer 1 and its first key point number as a correspondence item; for the kth layer, where k is greater than 1, determine the first key point number of the kth layer according to the preset proportion and the first key point number of the (k-1)th layer, and store the kth layer and its first key point number as a correspondence item.

The first number of keypoints of the layer 1 image of the image pyramid may be calculated from the number of image pixels. For example, the first number of keypoints may be 1% of the number of image pixels; when the pixel size of the layer 1 image is 992 × 744 (738,048 pixels), the first number of keypoints may be 7380. The image processing device can then calculate the first number of keypoints layer by layer according to the preset down-sampling proportion used when the image pyramid was constructed, which keeps the first number of keypoints a constant proportion of the total number of pixels in each layer. Calculating the first number of keypoints of each layer from this preset proportion avoids counting the pixels of every layer, which reduces the processing amount and improves processing efficiency.
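The layer-by-layer calculation can be sketched as below. The 1% fraction and the 992 × 744 size come from the example above; the pixel-count proportion of 0.25 between adjacent layers (down-sampling by 2 in each dimension) is an assumption of this sketch, as are the function and parameter names.

```python
def first_keypoint_counts(width, height, num_layers, fraction=0.01, pixel_ratio=0.25):
    """Return the first (preset) keypoint number for layers 1..num_layers.

    fraction: share of layer 1 pixels expected as keypoints (1% in the example).
    pixel_ratio: assumed proportion between the pixel counts of layer k and
                 layer k-1; the real value depends on how the pyramid is built.
    """
    counts = [int(width * height * fraction)]  # layer 1
    for _ in range(1, num_layers):
        # Layer k is derived from layer k-1 without counting pixels again.
        counts.append(int(counts[-1] * pixel_ratio))
    return counts
```

For a 992 × 744 layer 1 image and three layers, this sketch yields 7380, 1845 and 461 under the assumed proportion.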

After the image processing device determines the number of the first keypoints corresponding to the reference image, a specific process of determining the size of the target window corresponding to the ith layer image may be as follows: the image processing device determines the ratio of the number of the second key points to the number of the first key points; determining a target ratio range in which the ratio is positioned and a target window level corresponding to the target ratio range according to the corresponding relation between the preset ratio range and the window level; and determining the size of the target window according to the level of the target window.

In this embodiment, the ratio of the number of the second key points to the number of the first key points is used to measure how close the number of output key points is to the preset number of key points. The image processing apparatus may store in advance a correspondence between ratio ranges and window levels, and this correspondence applies to every layer. The correspondence between the ratio range and the window level can be shown in table 1 below:

TABLE 1 corresponding relationship between ratio range and window level

Range of ratio	[0,1]	(1,1.5]	(1.5,2]	(2,+∞)
Window level	TALL	GRANDE	VENTI	TRENTA

The window levels are TRENTA, VENTI, GRANDE and TALL. They are named after beverage cup sizes from largest to smallest, TRENTA being the largest cup and TALL the smallest; that is, the window sizes of the window levels are ordered TRENTA > VENTI > GRANDE > TALL.

After determining the reference image, the image processing apparatus may obtain a second number of keypoints of the reference image, and may then calculate a ratio of the second number of keypoints to the first number of keypoints. The image processing apparatus may determine a target ratio range in which the ratio is located in the correspondence between the ratio range and the window level, and may further determine a corresponding target window level. After the image processing device determines the target window level, the window size corresponding to the target window level may be obtained, and the window size may be determined as the target window size.
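The lookup in table 1 can be sketched as a small function. The ranges and level names are taken from table 1; the function name `window_level` and its argument names are hypothetical.

```python
def window_level(second_count, first_count):
    """Map the ratio second_count / first_count to a window level (table 1).

    second_count: number of output keypoints of the reference image.
    first_count: preset number of keypoints of the reference image.
    """
    if first_count <= 0:
        raise ValueError("the first keypoint number must be positive")
    ratio = second_count / first_count
    if ratio <= 1:
        return "TALL"      # [0, 1]
    if ratio <= 1.5:
        return "GRANDE"    # (1, 1.5]
    if ratio <= 2:
        return "VENTI"     # (1.5, 2]
    return "TRENTA"        # (2, +inf)
```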

For the same image, increasing the size of the non-maximum suppression window reduces the number of key points obtained after non-maximum suppression. Therefore, if the ratio of the number of second keypoints to the number of first keypoints of the reference image is too large, for example greater than 2, the window size may be increased so as to reduce the number of output keypoints of the current image. Because the reference image and the current image have similar textures, adjusting the window size according to the number of key points actually output by the reference image brings the number of key points extracted from the current image as close as possible to the first number of keypoints and balances the number of key points across the layers.

Each of the above window levels may correspond to a fixed window size, that is, for images of different layers, the window sizes determined according to the same window level are the same. Optionally, in a possible implementation, for images of different layers, the window sizes determined according to the same window level may be different, and the process of determining the target window size according to the target window level may be as follows: the image processing equipment determines a target window group corresponding to the ith layer of image according to the corresponding relation between the preset layer number and the window group; and determining the window size corresponding to the target window level in the target window group as the target window size.

The image processing device may store a window group corresponding to each layer of image in the image pyramid in advance, and each window group may include a window size corresponding to at least one window level. The correspondence between the number of layers and the window group can be shown in table 2 below:

TABLE 2 correspondence of layer number to Window group

	Layer 1	Layer 2	Layer 3	Layer 4	Layer 5	Layer 6	Layer 7	Layer 8
TRENTA	31*11	29*11	27*11	25*11	23*9	23*9	21*7	21*7
VENTI	23*11	21*11	19*11	17*9	15*7	15*7	13*5	13*5
GRANDE	15*11	13*9	11*9	9*7	7*5	7*5	5*3	5*3
TALL	11*11	9*9	9*9	7*7	5*5	5*5	3*3	3*3

The window group of each layer of image comprises 4 window levels, namely TRENTA, VENTI, GRANDE and TALL. As can be seen from table 2, the window sizes of the same level may be the same or different for different layers; generally, the window size of a given level gradually decreases as the layer number increases.

For the ith layer image, the image processing device may determine the target window group corresponding to the layer number according to the correspondence between the layer number and the window group. After determining the target window level in the above process, the image processing apparatus may obtain the window size of the target window level in the target window group as the non-maximum suppression window size. For example, for the layer 2 image, the reference image is the layer 1 image; if the ratio of the number of the second key points to the number of the first key points is calculated to be 1.6, the window level is determined to be VENTI and the window size is 21 × 11.
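The lookup in table 2 can be sketched as a dictionary keyed by window level. The sizes are copied from table 2; storing each size as a pair in the order written in the table, and the names `WINDOW_GROUPS` and `target_window_size`, are choices made for this sketch only.

```python
# Window sizes from table 2, indexed by level and then by layer (layer 1 first).
WINDOW_GROUPS = {
    "TRENTA": [(31, 11), (29, 11), (27, 11), (25, 11), (23, 9), (23, 9), (21, 7), (21, 7)],
    "VENTI":  [(23, 11), (21, 11), (19, 11), (17, 9), (15, 7), (15, 7), (13, 5), (13, 5)],
    "GRANDE": [(15, 11), (13, 9), (11, 9), (9, 7), (7, 5), (7, 5), (5, 3), (5, 3)],
    "TALL":   [(11, 11), (9, 9), (9, 9), (7, 7), (5, 5), (5, 5), (3, 3), (3, 3)],
}


def target_window_size(layer, level):
    """Return the window size for a 1-based layer number and a window level."""
    sizes = WINDOW_GROUPS[level]
    if not 1 <= layer <= len(sizes):
        raise ValueError("layer number is outside table 2")
    return sizes[layer - 1]
```

For the layer 2 example above, `target_window_size(2, "VENTI")` returns `(21, 11)`.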

As the layer number increases, the pixel size of each layer of image gradually decreases; setting the window size of the same window level to decrease layer by layer accordingly helps balance the number of key points across layers. Each layer of the image pyramid provides a multi-scale expression of the image, that is, it simulates the image at a different degree of blur. Balancing the number of key points of each layer therefore ensures that each degree of blur has a certain number of key points, which improves the accuracy of image matching.

Of course, in this embodiment the size of the target window may also be determined based on other information of the reference image; for example, the size of the target window corresponding to the ith layer image may be determined according to the size of the non-maximum suppression window of the reference image. Therefore, after the reference image is determined, the processing of steps 303 to 305 may also be: the image processing apparatus determines a target window size from the reference image.

In step 306, the image processing apparatus performs non-maximum suppression processing on at least one candidate keypoint of the ith layer image according to the size of the target window, to obtain at least one output keypoint of the ith layer image.

After obtaining the size of the target window in the foregoing process, the image processing apparatus may determine, in the score map of the i-th layer image, the keypoint with the largest feature score within the range of the window by using any candidate keypoint as the window center, and set the feature score that is not the largest to 0, that is, perform non-maximum suppression. And traversing candidate key points in the whole score map to suppress non-maximum values, and obtaining at least one key point of the ith layer of image when the traversal is finished. The key points can be used as key points of the image to be output, namely, output key points are obtained.
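The non-maximum suppression of step 306 can be sketched on a score map stored as a 2D list. The sketch assumes odd window dimensions given as width and height, clips the window at the image border, and lets equal scores inside one window all survive; these details, and the names `non_max_suppress` and `count_keypoints`, are assumptions rather than requirements of the embodiment.

```python
def non_max_suppress(score_map, window_w, window_h):
    """Keep a candidate only if it is the maximum inside the window centred on it."""
    rows, cols = len(score_map), len(score_map[0])
    half_w, half_h = window_w // 2, window_h // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            score = score_map[y][x]
            if score == 0:
                continue  # not a candidate keypoint
            # Window bounds, clipped to the score map.
            y0, y1 = max(0, y - half_h), min(rows, y + half_h + 1)
            x0, x1 = max(0, x - half_w), min(cols, x + half_w + 1)
            local_max = max(score_map[j][i] for j in range(y0, y1) for i in range(x0, x1))
            if score >= local_max:
                out[y][x] = score
    return out


def count_keypoints(score_map):
    """Count the positions that still hold a non-zero feature score."""
    return sum(1 for row in score_map for s in row if s > 0)
```

On a 3 × 5 score map with candidates 9, 7 and 8 placed along a diagonal, a 3 × 3 window keeps two output key points while a 9 × 5 window keeps only one, which illustrates how a larger window lowers the number of output key points.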

After obtaining the output key points of the ith layer image, i is incremented by 1, that is, the processing of steps 302 to 306 is repeated on the (i + 1)th layer image to extract its key points, until the extraction of the key points of the highest layer image is completed, and then the processing of step 307 is continued.

Of course, the image processing apparatus may determine the size of the target window based on other methods besides using the reference image, for example, the size of the target window may also be determined according to the pixel size of the ith layer image of the image pyramid, which is not limited in this embodiment. Therefore, the processing of step 302-307 may also be: the image processing equipment determines the size of a target window according to the ith layer of image of the image pyramid, and determines at least one output key point of the ith layer of image according to the size of the target window.

In step 307, the image processing apparatus determines output key points of images of respective layers of the image pyramid as key points of the images.

After the output key points are determined for all the layers of images of the image pyramid, all the output key points can be used as the key points of the images. The image processing device may describe the keypoints of the image, for example, the positions, scales, directions, and the like of the keypoints may be used to describe the keypoints. Further, the image processing apparatus may store the key points for processing such as image matching using the key points in a subsequent process.
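The overall control flow of steps 303 to 307 for a single image can be sketched as a loop over the layers. The window selection and the suppression are passed in as callables so that the sketch stays independent of any particular implementation; `extract_layer_by_layer`, `choose_window` and `suppress` are hypothetical names.

```python
def extract_layer_by_layer(candidate_maps, first_counts, preset_window, choose_window, suppress):
    """Run non-maximum suppression on every layer, using layer i-1 as the reference.

    candidate_maps: score maps of layers 1..N (index 0 is layer 1).
    first_counts: preset keypoint number per layer, same indexing.
    choose_window(layer, second_count, first_count): returns a window size.
    suppress(score_map, window): returns the suppressed score map.
    """
    outputs = []
    reference_output_count = None
    for idx, score_map in enumerate(candidate_maps):
        if idx == 0:
            window = preset_window  # steps 303-304: layer 1 uses the preset size
        else:
            # Step 305: the reference image is the previous layer.
            window = choose_window(idx + 1, reference_output_count, first_counts[idx - 1])
        out = suppress(score_map, window)  # step 306
        reference_output_count = sum(1 for row in out for s in row if s > 0)
        outputs.append(out)
    return outputs  # step 307: the output key points of all layers
```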

In this embodiment, for the ith layer image of the image pyramid, the image processing device uses the (i-1)th layer image as a reference image. Since the texture complexity of the ith layer image is similar to that of the (i-1)th layer image, the size of the non-maximum suppression window can be adjusted based on the reference image, so that the number of extracted key points is close to the number of key points expected to be obtained and the number of key points of each image is balanced, reducing the negative influence on image matching.

In the foregoing process, the reference image of the ith layer image is the (i-1)th layer image of the same image. An embodiment of the present application further provides a method for extracting key points from each frame of image in a video stream. The processing flow of the method for extracting image key points shown in fig. 6 is described in detail below with reference to a specific implementation, taking as an example the case where the reference image of the ith layer image is the ith layer image in the image pyramid of the previous frame of image. The content may be as follows:

in step 601, the image processing apparatus acquires an image pyramid of an image.

The specific processing of step 601 is the same as that of step 301, and is not described herein again.

In step 602, the image processing apparatus extracts at least one candidate keypoint of an image of the i-th layer of the image pyramid.

In step 603, the image processing apparatus determines whether the image is a 1 st frame image.

The step 603 and the step 601-.

In step 604, if the image is a 1 st frame image, the image processing apparatus sets a preset window size as a target window size corresponding to each layer of image.

The image processing device may extract key points from each frame of image in the time order of the video stream. If the image is the 1st frame image, no similar image has had its key points extracted before, so for the image pyramid of the 1st frame image, the target window size corresponding to each layer of image may be a preset window size. Optionally, the preset window sizes of the layers may differ and may decrease layer by layer; for example, the preset window size of each layer of image may be the window size at the TALL level in table 2 above.

Optionally, the 1 st frame image may also determine the reference image based on the method provided in the foregoing embodiment, as shown in the processing flow of the method for extracting the image key points in fig. 7, the specific processing may be as follows:

in step 6041, the image processing apparatus determines whether the i-th layer image is a 1 st layer image.

In step 6042, if the image is a 1 st frame image in the video stream and the i-th layer image of the image pyramid of the image is a 1 st layer image, the image processing apparatus determines the preset window size as a target window size corresponding to the i-th layer image of the image pyramid of the image.

In step 6043, if the image is a 1 st frame image in the video stream and the i-th layer image of the image pyramid of the image is an image other than the 1 st layer image, the image processing apparatus determines the i-1 st layer image of the image pyramid of the image as a reference image of the i-th layer image of the image pyramid of the image, and determines the target window size from the reference image.

The specific process of extracting the key points of the image of frame 1 shown in fig. 7 is the same as the above embodiment, and is not repeated here.

In step 605, if the image is any frame image except the 1 st frame image in the video stream, the image processing apparatus determines an ith layer image of an image pyramid of a previous frame image of the image as a reference image of the ith layer image of the image pyramid of the image, determines a first key point number of the reference image, and determines a target window size corresponding to the ith layer image of the image pyramid of the image according to a second key point number and the first key point number of the reference image.

As shown in the reference image schematic diagram of fig. 8, if the current image is any frame image except the 1st frame image, the key points of the previous frame image have already been extracted, so the image of the corresponding layer in the image pyramid of the previous frame image may be determined as the reference image. The size of the non-maximum suppression window of the current image is then determined according to how well the key points extracted from the reference image meet the requirement, that is, according to how close the number of output key points of the reference image is to the preset number of key points. The specific process of determining the size of the target window according to the reference image is the same as the above embodiment, and is not described herein again.
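The choice of reference image across frames and layers can be summarised in a short sketch. It follows the optional handling of the 1st frame shown in fig. 7 (steps 6041 to 6043); frame and layer numbers are 1-based, and the function name `reference_for` is hypothetical.

```python
def reference_for(frame_index, layer_index):
    """Return (frame, layer) of the reference image, or None to use the preset window."""
    if frame_index > 1:
        # Step 605: same layer in the pyramid of the previous frame.
        return (frame_index - 1, layer_index)
    if layer_index > 1:
        # Step 6043: 1st frame, so fall back to the previous layer of the same frame.
        return (frame_index, layer_index - 1)
    # Step 6042: 1st frame and 1st layer have no reference image.
    return None
```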

In step 606, the image processing device performs non-maximum suppression processing on at least one candidate keypoint of the ith layer of image of the image pyramid of the image according to the size of the target window to obtain at least one output keypoint of the ith layer of image.

In step 607, the image processing apparatus determines output key points of the images of the respective layers of the image pyramid as key points of the images.

The remaining processing for extracting the image key points, except for the method for determining the reference image, is the same as the above embodiment, and is not described again in this embodiment.

In this embodiment, when the image is any frame of image in a video stream, the image processing device uses the ith layer image of the image pyramid of the previous frame image as the reference image. Since the texture complexity of the current frame image is similar to that of the previous frame image, and images of the same layer in the two image pyramids have similar degrees of blur, the size of the non-maximum suppression window can be adjusted based on the reference image, so that the number of extracted key points approaches the desired number of key points and the number of key points of each image is balanced, reducing the negative influence on image matching.

Based on the same technical concept, the present embodiment further provides an apparatus for extracting image key points, where the apparatus may be the image processing device or configured in the image processing device, and as shown in fig. 9, the apparatus includes:

an obtaining module 910, configured to obtain an image pyramid of an image, where the image pyramid includes N layers of images, and N > 1, and specifically may implement the obtaining function in steps 301 and 601, and other implicit steps;

a determining module 920, configured to determine a size of a target window according to an ith layer image of the image pyramid, and determine at least one output key point of the ith layer image according to the size of the target window, where i is greater than or equal to 1 and less than or equal to N; determining the output key points of the images of all layers of the image pyramid as the key points of the images; the determination function in steps 302-307, 602-607 and other implicit steps can be implemented.

Optionally, the determining module 920 is configured to:

extracting at least one candidate key point of the ith layer of image of the image pyramid, determining the size of a target window according to the ith layer of image, and performing non-maximum suppression processing on the at least one candidate key point of the ith layer of image according to the size of the target window to obtain at least one output key point of the ith layer of image.

Optionally, the image is a static image, and the determining module 920 is configured to:

if the ith layer image is the 1 st layer image, determining the size of a preset window as the size of a target window;

otherwise, determining the i-1 layer image in the image pyramid of the image as a reference image of the i layer image, and determining the size of a target window according to the reference image.

Optionally, the image is a frame of image in a video stream, and the determining module 920 is configured to:

if the image is any frame image except the 1 st frame image in the video stream, determining the ith layer image of the image pyramid of the previous frame image of the image as the reference image of the ith layer image of the image pyramid of the image, and determining the size of the target window according to the reference image.

Optionally, the determining module 920 is further configured to:

if the image is a 1 st frame image in a video stream and the ith layer image of the image pyramid of the image is a 1 st layer image, determining the size of a preset window as the size of a target window;

if the image is the 1 st frame image in the video stream and the ith layer image of the image pyramid of the image is an image except the 1 st layer image, determining the i-1 th layer image of the image pyramid of the image as a reference image of the ith layer image of the image pyramid of the image, and determining the size of a target window according to the reference image.

Optionally, the determining module 920 is configured to:

determining a first key point number of the reference image, and determining the size of a target window according to the first key point number and a second key point number of the reference image, wherein the first key point number is a preset key point number, and the second key point number is an output key point number of the reference image.

Optionally, the determining module 920 is configured to:

and determining the number of first key points corresponding to the reference image according to the number of layers of the reference image in the image pyramid, wherein a preset corresponding relation exists between the number of layers of the reference image in the image pyramid and the number of the first key points.

Optionally, in the preset corresponding relationship, the number of the first key points of the two adjacent layers of images meets a preset proportion, and the preset proportion is equal to the proportion of the number of the pixel points of the two adjacent layers of images in the image pyramid.

Optionally, the determining module 920 is configured to:

determining a ratio of the number of the second keypoints and the number of the first keypoints;

determining a target ratio range in which the ratio is positioned and a target window level corresponding to the target ratio range according to a corresponding relation between a preset ratio range and the window level;

and determining the size of the target window according to the level of the target window.

Optionally, the determining module 920 is configured to:

determining a target window group corresponding to the ith layer of image according to a corresponding relation between a preset layer number and the window group, wherein the window group comprises at least one window size corresponding to a window level;

and determining the window size corresponding to the target window level in the target window group as the target window size.

Optionally, the determining module 920 is configured to:

for each image block of the i-th layer image, determining the feature score of each pixel point in the image block according to a preset feature detection algorithm, and determining the pixel point with the feature score larger than a preset threshold value as a candidate key point of the image block;

and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image.

It should be noted that the obtaining module 910 may be implemented by a processor, and the determining module 920 may be implemented by a processor and a memory together.

It should be noted that: the apparatus for extracting key points of an image according to the foregoing embodiments is only illustrated by the above-mentioned division of each functional module when extracting key points of an image, and in practical applications, the above-mentioned function distribution may be completed by different functional modules according to needs, that is, the internal structure of the image processing device is divided into different functional modules, so as to complete all or part of the above-mentioned functions. In addition, the apparatus for extracting image key points and the method for extracting image key points provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.

In the above embodiments, all or part of the implementation may be realized by software, hardware or a combination thereof, and when the implementation is realized by software, all or part of the implementation may be realized in the form of a computer program product. The computer program product comprises one or more computer program instructions which, when loaded and executed on a computer, cause the flow or functions described in the embodiments to be performed, in whole or in part. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, twisted pair) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any medium that can be accessed by a computer or a data storage device including one or more integrated media, servers, data centers, and the like. The medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape, etc.), an optical medium (e.g., optical disk, etc.), or a semiconductor medium (e.g., solid state disk, etc.).

Claims

A method for extracting image key points, characterized in that the method comprises:

acquiring an image pyramid of an image, wherein the image pyramid comprises N layers of images, and N is greater than 1;

determining a target window size according to the ith layer image of the image pyramid, and determining at least one output key point of the ith layer image according to the target window size, wherein 1 ≤ i ≤ N;

and determining the output key points of all layer images of the image pyramid as the key points of the image.
The method of claim 1, wherein determining a target window size from the ith layer image of the image pyramid and determining at least one output keypoint for the ith layer image from the target window size comprises:

extracting at least one candidate key point of the ith layer of image of the image pyramid, determining the size of a target window according to the ith layer of image, and performing non-maximum suppression processing on the at least one candidate key point of the ith layer of image according to the size of the target window to obtain at least one output key point of the ith layer of image.
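The non-maximum suppression step recited above can be illustrated with a minimal Python sketch. It assumes a square window with an odd side length centred on each candidate and represents a candidate as an (x, y, score) tuple. The function name `non_max_suppression` and the tie-breaking rule are illustrative assumptions and are not taken from the claims.

```python
from typing import List, Tuple

Keypoint = Tuple[int, int, float]  # (x, y, feature score); hypothetical representation


def non_max_suppression(candidates: List[Keypoint], window: int) -> List[Keypoint]:
    """Keep a candidate only if it has the highest score inside the
    window x window square centred on it (assumed window shape)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    kept = []
    for i, (x, y, s) in enumerate(candidates):
        is_local_max = True
        for j, (x2, y2, s2) in enumerate(candidates):
            if i == j:
                continue
            inside = abs(x2 - x) <= half and abs(y2 - y) <= half
            # Assumed tie-break: among equal scores the earlier candidate wins.
            if inside and (s2 > s or (s2 == s and j < i)):
                is_local_max = False
                break
        if is_local_max:
            kept.append((x, y, s))
    return kept
```

In this sketch a larger window places more candidates inside each neighbourhood, so fewer output key points survive; a window of size 1 keeps every candidate at a distinct position.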
The method of claim 1, wherein the image is a static image, and wherein determining a target window size from the ith layer image of the image pyramid comprises:

if the ith layer image is the 1st layer image, determining a preset window size as the target window size;

otherwise, determining the (i-1)th layer image in the image pyramid of the image as a reference image of the ith layer image, and determining the target window size according to the reference image.
The method of claim 1, wherein the image is a frame of image in a video stream, and wherein determining the target window size from the ith layer image of the image pyramid comprises:

if the image is any frame image other than the 1st frame image in the video stream, determining the ith layer image of the image pyramid of the previous frame image as the reference image of the ith layer image of the image pyramid of the image, and determining the target window size according to the reference image.
The method of claim 4, further comprising:

if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is the 1st layer image, determining a preset window size as the target window size;

if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is an image other than the 1st layer image, determining the (i-1)th layer image of the image pyramid of the image as a reference image of the ith layer image of the image pyramid of the image, and determining the target window size according to the reference image.
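The reference-image rules for a video stream can be summarised in a short Python sketch. Frames and layers are indexed from 1 as in the claims. The function name `select_reference` and the convention of returning `None` when the preset window size applies are assumptions made for illustration. A static image follows the same rule as the 1st frame.

```python
from typing import Optional, Tuple


def select_reference(frame: int, layer: int) -> Optional[Tuple[int, int]]:
    """Return (frame, layer) of the reference image for the given layer,
    or None when the preset window size is used directly."""
    if frame < 1 or layer < 1:
        raise ValueError("frame and layer are 1-based")
    if frame > 1:
        # Later frames: same layer of the previous frame's pyramid.
        return (frame - 1, layer)
    if layer == 1:
        # 1st frame, 1st layer: no reference image, use the preset window size.
        return None
    # 1st frame, higher layers: previous layer of the same pyramid.
    return (frame, layer - 1)
```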
The method according to any of claims 3-5, wherein said determining a target window size from said reference image comprises:

determining a first key point number of the reference image, and determining the size of a target window according to the first key point number and a second key point number of the reference image, wherein the first key point number is a preset key point number, and the second key point number is an output key point number of the reference image.
The method of claim 6, wherein determining the first number of keypoints for the reference image comprises:

determining the first key point number corresponding to the reference image according to the layer number of the reference image in the image pyramid, wherein a preset correspondence exists between the layer number of the reference image in the image pyramid and the first key point number.
The method according to claim 7, wherein in the preset correspondence, the first key point numbers of two adjacent layers of images satisfy a preset ratio, and the preset ratio is equal to the ratio of the pixel counts of the two adjacent layers of images in the image pyramid.
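One way to build such a preset correspondence is sketched below in Python. It assumes the layer sizes are known as (width, height) pairs and that a key point budget is chosen for the 1st layer; the function name `keypoint_budgets`, the rounding, and the floor of one key point per layer are illustrative assumptions. Because of the rounding, the ratio between adjacent layers matches the pixel ratio only approximately in general.

```python
from typing import List, Sequence, Tuple


def keypoint_budgets(layer_sizes: Sequence[Tuple[int, int]],
                     first_layer_budget: int) -> List[int]:
    """Scale the 1st-layer key point budget by each layer's pixel count."""
    if not layer_sizes:
        raise ValueError("at least one layer is required")
    base_w, base_h = layer_sizes[0]
    base_pixels = base_w * base_h
    budgets = []
    for w, h in layer_sizes:
        # Budget proportional to pixel count, so adjacent layers keep the pixel ratio.
        scaled = first_layer_budget * (w * h) / base_pixels
        budgets.append(max(1, round(scaled)))
    return budgets
```

For example, with a sampling rate of 2 in each dimension, each layer has one quarter of the pixels of the layer before it, and therefore one quarter of its key point budget.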
The method of claim 6, wherein determining a target window size based on the first number of keypoints and the second number of keypoints for the reference image comprises:

determining a ratio of the second key point number to the first key point number;

determining, according to a preset correspondence between ratio ranges and window levels, a target ratio range in which the ratio falls and a target window level corresponding to the target ratio range;

and determining the target window size according to the target window level.
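The mapping from ratio to window level can be sketched in Python as a lookup over sorted range boundaries. The boundary values in `RATIO_UPPER_BOUNDS` and the convention that the level index grows with the ratio are hypothetical; the claims only require that some preset correspondence between ratio ranges and window levels exists.

```python
from bisect import bisect_right

# Hypothetical boundaries defining five half-open ratio ranges:
# [0, 0.5), [0.5, 0.8), [0.8, 1.25), [1.25, 2.0), [2.0, inf)
RATIO_UPPER_BOUNDS = [0.5, 0.8, 1.25, 2.0]


def window_level(output_count: int, preset_count: int) -> int:
    """Map the ratio of output key points to preset key points to a level 0..4."""
    if preset_count <= 0:
        raise ValueError("preset_count must be positive")
    if output_count < 0:
        raise ValueError("output_count must be non-negative")
    ratio = output_count / preset_count
    # bisect_right returns the index of the range that contains the ratio.
    return bisect_right(RATIO_UPPER_BOUNDS, ratio)
```

Under this assumed convention, a reference image that produced far more key points than its preset number yields a high level, which a later step could map to a larger suppression window.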
The method of claim 9, wherein determining a target window size based on the target window level comprises:

determining a target window group corresponding to the ith layer image according to a preset correspondence between layer numbers and window groups, wherein each window group comprises a window size corresponding to each of at least one window level;

and determining the window size corresponding to the target window level in the target window group as the target window size.
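The two-step lookup, first by layer and then by window level, can be sketched in Python with a nested table. Every size in `WINDOW_GROUPS` is a made-up placeholder, as is the function name `target_window_size`; the claims do not specify concrete values.

```python
from typing import Dict, List

# Hypothetical window groups: layer number -> window size for levels 0..4.
WINDOW_GROUPS: Dict[int, List[int]] = {
    1: [3, 5, 7, 9, 11],
    2: [3, 5, 7, 9, 11],
    3: [3, 3, 5, 7, 9],
}


def target_window_size(layer: int, level: int) -> int:
    """Pick the window group for the layer, then the size for the level."""
    if layer not in WINDOW_GROUPS:
        raise KeyError(f"no window group configured for layer {layer}")
    group = WINDOW_GROUPS[layer]
    if not 0 <= level < len(group):
        raise IndexError(f"level {level} is outside the group for layer {layer}")
    return group[level]
```

Keeping a separate group per layer lets smaller, higher layers use smaller windows for the same level, which is one plausible reason for the grouping.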
The method of claim 2, wherein the extracting at least one candidate keypoint of the i-th layer image comprises:

for each image block of the i-th layer image, determining the feature score of each pixel point in the image block according to a preset feature detection algorithm, and determining the pixel point with the feature score larger than a preset threshold value as a candidate key point of the image block;

and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image.
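Block-wise candidate extraction can be sketched in Python as follows. The image is assumed to be a list of rows of grey values, and blocks are assumed to be non-overlapping squares of side `block`. The scoring function is passed in as a parameter; `toy_score` is a deliberately simple stand-in (difference from the mean of the 4-neighbours) and is not the FAST algorithm or any detector named in the source.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]
Keypoint = Tuple[int, int, float]  # (x, y, feature score)


def toy_score(image: Image, x: int, y: int) -> float:
    """Stand-in feature score: |pixel - mean of its in-bounds 4-neighbours|."""
    h, w = len(image), len(image[0])
    neighbours = [image[ny][nx]
                  for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                  if 0 <= nx < w and 0 <= ny < h]
    return abs(image[y][x] - sum(neighbours) / len(neighbours))


def extract_candidates(image: Image, block: int, threshold: float,
                       score_fn: Callable[[Image, int, int], float]) -> List[Keypoint]:
    """Visit the image block by block and keep pixels scoring above the threshold."""
    if block < 1:
        raise ValueError("block must be at least 1")
    h = len(image)
    w = len(image[0]) if h else 0
    candidates = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Candidates of this block; border blocks may be smaller.
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    score = score_fn(image, x, y)
                    if score > threshold:
                        candidates.append((x, y, score))
    return candidates
```

The candidates of all blocks are collected into one list for the layer, which then serves as the input to non-maximum suppression.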
An apparatus for extracting image keypoints, the apparatus comprising:

the image pyramid acquisition module is used for acquiring an image pyramid of an image, wherein the image pyramid comprises N layers of images, and N is greater than 1;

the determining module is used for determining a target window size according to the ith layer image of the image pyramid, and determining at least one output key point of the ith layer image according to the target window size, wherein 1 ≤ i ≤ N; and determining the output key points of all layer images of the image pyramid as the key points of the image.
The apparatus of claim 12, wherein the determining module is configured to:

extracting at least one candidate key point of the ith layer of image of the image pyramid, determining the size of a target window according to the ith layer of image, and performing non-maximum suppression processing on the at least one candidate key point of the ith layer of image according to the size of the target window to obtain at least one output key point of the ith layer of image.
The apparatus of claim 12, wherein the image is a static image, and wherein the determining module is configured to:

if the ith layer image is the 1st layer image, determining a preset window size as the target window size;

otherwise, determining the (i-1)th layer image in the image pyramid of the image as a reference image of the ith layer image, and determining the target window size according to the reference image.
The apparatus of claim 12, wherein the image is a frame of image in a video stream, and wherein the determining module is configured to:

if the image is any frame image other than the 1st frame image in the video stream, determining the ith layer image of the image pyramid of the previous frame image as the reference image of the ith layer image of the image pyramid of the image, and determining the target window size according to the reference image.
The apparatus of claim 15, wherein the determining module is further configured to:

if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is the 1st layer image, determining a preset window size as the target window size;

if the image is the 1st frame image in the video stream and the ith layer image of the image pyramid of the image is an image other than the 1st layer image, determining the (i-1)th layer image of the image pyramid of the image as a reference image of the ith layer image of the image pyramid of the image, and determining the target window size according to the reference image.
The apparatus of any one of claims 14-16, wherein the determining module is configured to:

determining a first key point number of the reference image, and determining the size of a target window according to the first key point number and a second key point number of the reference image, wherein the first key point number is a preset key point number, and the second key point number is an output key point number of the reference image.
The apparatus of claim 17, wherein the determining module is configured to:

determining the first key point number corresponding to the reference image according to the layer number of the reference image in the image pyramid, wherein a preset correspondence exists between the layer number of the reference image in the image pyramid and the first key point number.
The apparatus of claim 18, wherein in the preset correspondence, the first key point numbers of two adjacent layers of images satisfy a preset ratio, and the preset ratio is equal to the ratio of the pixel counts of the two adjacent layers of images in the image pyramid.
The apparatus of claim 17, wherein the determining module is configured to:

determining a ratio of the second key point number to the first key point number;

determining, according to a preset correspondence between ratio ranges and window levels, a target ratio range in which the ratio falls and a target window level corresponding to the target ratio range;

and determining the target window size according to the target window level.
The apparatus of claim 20, wherein the determining module is configured to:

determining a target window group corresponding to the ith layer image according to a preset correspondence between layer numbers and window groups, wherein each window group comprises a window size corresponding to each of at least one window level;

and determining the window size corresponding to the target window level in the target window group as the target window size.
The apparatus of claim 13, wherein the determining module is configured to:

for each image block of the i-th layer image, determining the feature score of each pixel point in the image block according to a preset feature detection algorithm, and determining the pixel point with the feature score larger than a preset threshold value as a candidate key point of the image block;

and determining candidate key points of each image block of the ith layer image as at least one candidate key point of the ith layer image.
An image processing device, comprising a memory for storing instructions and a processor for invoking the instructions and performing the method of any one of claims 1-11.
A computer-readable storage medium comprising instructions which, when run on an image processing device, cause the image processing device to perform the method of any one of claims 1-11.