WO2020056688A1 - Method and apparatus for extracting image key point

Info

Publication number: WO2020056688A1
Application number: PCT/CN2018/106778
Authority: WO
Inventors: 左韶军; 林天鹏; 占云龙; 赵强; 王林召
Original assignee: 华为技术有限公司
Priority date: 2018-09-20
Filing date: 2018-09-20
Publication date: 2020-03-26
Also published as: CN112424787A

Abstract

A method and apparatus for extracting an image key point, which relate to the technical field of electronics. The method comprises: acquiring an image pyramid of an image (301), the image pyramid comprising N layers of images, and N>1; according to an ith layer image of the image pyramid, determining the size of a target window, and determining at least one output key point of the ith layer image according to the size of the target window, wherein 1≤i≤N; and determining an output key point of each layer image of the image pyramid as a key point of the image. By employing the described method, the number of key points extracted from each image may be equalized.

Description

Method and device for extracting key points of image

Technical field

The present application relates to the field of electronic technology, and in particular, to a method and device for extracting key points of an image.

Background technique

In image processing, key points among the pixels of an image are commonly used for image matching. Key points (also known as feature points or points of interest) are prominent, representative points in an image. They can be used to identify images, perform image matching, or implement three-dimensional (3D) reconstruction.

In the process of extracting the key points of an image, an image pyramid is first constructed by down-sampling the image layer by layer. That is, the initial image is used as the first-layer image, and the pixels of the i-th layer image are down-sampled according to a certain ratio to obtain the (i+1)-th layer image, where the pixel size of the (i+1)-th layer image is smaller than that of the i-th layer image and i is a positive integer greater than or equal to 1. Then, each layer image is divided into image blocks. For each image block, the feature score of each pixel is determined based on the FAST (Features from Accelerated Segment Test) algorithm, and the pixels whose feature score is greater than a threshold are determined as key points. Finally, non-maximum suppression can be performed on the key points of the image block according to a window of a preset size; that is, within a window centered on a key point, only the key point with the highest feature score is kept, which can be understood as a local maximum search. By performing the above processing on each image block of one layer image in the image pyramid, the key points of that layer image are obtained. By traversing every layer image in this way, the key points of the entire image are obtained.

The above scheme for obtaining image key points has at least the following problem: the size of the non-maximum suppression window is fixed. For images with very rich textures, too many key points may be extracted; for images with very little texture, too few key points may be extracted. That is, the number of key points extracted from images of different texture complexity differs greatly. This imbalance may affect subsequent processing. For example, too many key points may increase the amount of computation for image matching, whereas too few key points may lower the accuracy of image matching.

Summary of the Invention

This embodiment provides a method and a device for extracting key points of an image, which can balance the number of key points extracted from each image. The technical solution is as follows:

In one aspect, a method for extracting key points of an image is provided. The method includes: an image processing device acquires an image pyramid of an image, where the image pyramid includes N layers of images and N > 1; determines a target window size according to an i-th layer image of the image pyramid, and determines at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and determines the output key points of each layer image of the image pyramid as the key points of the image.

Through the above processing, when extracting the key points of an image, the image processing device can adjust the non-maximum suppression window size based on each layer image before performing non-maximum suppression. The window size can therefore change with each layer image, and the number of key points extracted from each image can be balanced.

In a possible implementation manner, determining the target window size according to the i-th layer image of the image pyramid and determining at least one output key point of the i-th layer image according to the target window size includes: extracting at least one candidate key point of the i-th layer image of the image pyramid; determining the target window size according to the i-th layer image; and performing non-maximum suppression on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image. Through the above processing, the image processing device performs non-maximum suppression only on the candidate key points of each layer image. Because candidate key points are extracted first, fewer points need to be handled during non-maximum suppression, which improves processing efficiency.
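The non-maximum suppression step described above can be illustrated with a minimal sketch. It assumes that candidate key points are stored in a score map in which non-candidates have the value 0, that the window is square with an odd side length, and that ties are kept; the function name `non_max_suppression` and these conventions are illustrative assumptions, not details given in this application.

```python
from typing import List, Tuple

import numpy as np


def non_max_suppression(score_map: np.ndarray, window: int) -> List[Tuple[int, int]]:
    """Keep each candidate whose score is the highest inside a window centered on it.

    Assumptions (not from the source): zero means "not a candidate",
    the window is square with an odd side length, and ties are kept.
    """
    if window % 2 != 1:
        raise ValueError("window is assumed to be odd so that it has a center pixel")
    half = window // 2
    height, width = score_map.shape
    keypoints = []
    for y, x in zip(*np.nonzero(score_map)):
        # Clip the window to the image borders.
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        x0, x1 = max(0, x - half), min(width, x + half + 1)
        if score_map[y, x] >= score_map[y0:y1, x0:x1].max():
            keypoints.append((int(y), int(x)))
    return keypoints


# Four candidates on a 7 x 7 score map (values are made up for illustration).
scores = np.zeros((7, 7))
scores[1, 1], scores[1, 2], scores[3, 3], scores[5, 5] = 30, 25, 35, 40
small_window = non_max_suppression(scores, 3)
large_window = non_max_suppression(scores, 5)
```

In this toy example the 3 x 3 window keeps three key points and the 5 x 5 window keeps one, which is the property the method relies on: for the same image, a larger window yields fewer output key points.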

In a possible implementation manner, the image is a static image, and determining the target window size according to the i-th layer image of the image pyramid includes: if the i-th layer image is the first-layer image, the image processing device determines a preset window size as the target window size; otherwise, the image processing device determines the (i-1)-th layer image in the image pyramid of the image as the reference image of the i-th layer image, and determines the target window size according to the reference image. Two adjacent layers of the same image pyramid are related by sampling, so the texture of every layer can be the same as that of the original image. Therefore, the reference image obtained by the above processing is similar in texture complexity to the image from which key points are being extracted.

In a possible implementation manner, the image is a frame image in a video stream, and determining the target window size according to the i-th layer image of the image pyramid includes: if the image is any frame image in the video stream other than the first frame image, the image processing device determines the i-th layer image of the image pyramid of the previous frame image as the reference image of the i-th layer image of the image pyramid of the image, and determines the target window size according to the reference image. Two adjacent frames may present similar pictures. Therefore, the reference image obtained through the above processing is similar in texture complexity to the image from which key points are being extracted.

In a possible implementation manner, the method further includes: if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is the first-layer image, the image processing device determines a preset window size as the target window size; if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is not the first-layer image, the image processing device determines the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image, and determines the target window size according to the reference image. Because the first frame image has no previous frame image, the reference image can be determined in the same way as for a static image, which improves the accuracy of determining the target window size.

In a possible implementation manner, determining the target window size according to the reference image includes: the image processing device determines the first key point number of the reference image, and determines the target window size according to the first key point number and the second key point number of the reference image, where the first key point number is a preset number of key points and the second key point number is the number of output key points of the reference image. Through the above processing, the non-maximum suppression window size for the current image can be determined according to how close the number of output key points of the reference image is to the preset number of key points. The key points of the image are then extracted based on the adjusted target window size, so that the number of output key points of each layer image approaches the preset number. If every image is close to its preset number of key points, the number of key points is balanced across images.

In a possible implementation manner, determining the first key point number of the reference image includes: determining the first key point number corresponding to the reference image according to the layer number of the reference image in the image pyramid to which it belongs, where there is a preset correspondence between the layer number of the reference image in the image pyramid and the first key point number of the reference image. Through the above processing, each layer image has its own first key point number, which accommodates the different pixel sizes of the layers.

In a possible implementation manner, in the preset correspondence, the first key point numbers of two adjacent layer images satisfy a preset ratio, and the preset ratio is equal to the ratio of the numbers of pixels of the two adjacent layer images in the image pyramid. Each layer of the image pyramid thus has a different first key point number, and the expected key point numbers of two adjacent layers satisfy the preset ratio. Through the above processing, the first key point number of each layer image is guaranteed to be a fixed proportion of the total number of pixels of that layer.
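As an illustration of such a correspondence, the following sketch derives a first key point number for every layer from the pixel counts of the layers. The function name `expected_keypoints_per_layer`, the first-layer count of 1600, and the rounding are assumptions made for the example; the application only requires that adjacent layers satisfy the pixel-count ratio.

```python
from typing import List, Sequence, Tuple


def expected_keypoints_per_layer(
    layer_sizes: Sequence[Tuple[int, int]], first_layer_count: int
) -> List[int]:
    """Scale the first-layer key point number by each layer's share of pixels.

    layer_sizes holds (width, height) per layer, first layer first.
    first_layer_count is a hypothetical preset value, not from the source.
    """
    base_pixels = layer_sizes[0][0] * layer_sizes[0][1]
    return [
        round(first_layer_count * (width * height) / base_pixels)
        for width, height in layer_sizes
    ]


# Toy pyramid in which each layer has a quarter of the pixels of the previous one.
counts = expected_keypoints_per_layer([(100, 100), (50, 50), (25, 25)], 1600)
print(counts)  # [1600, 400, 100]
```

With this choice the ratio of expected key points to pixels is the same in every layer (here 0.16), which is the fixed proportion the paragraph above refers to.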

In a possible implementation manner, determining the target window size according to the first key point number and the second key point number of the reference image includes: the image processing device determines the ratio of the second key point number to the first key point number; determines, according to a preset correspondence between ratio ranges and window levels, the target ratio range in which the ratio lies and the target window level corresponding to the target ratio range; and determines the target window size according to the target window level. Through the above processing, the ratio of the second key point number to the first key point number measures how close the number of output key points is to the preset number of key points. Different ratio ranges correspond to different window levels; the larger the ratio, the higher the corresponding window level and the larger the window size can be. For the same image, increasing the non-maximum suppression window size reduces the number of key points obtained after non-maximum suppression. Therefore, if the ratio of the second key point number to the first key point number is too large, for example greater than 2, the correspondence between ratio ranges and window levels can be used to increase the window size appropriately and thereby reduce the number of output key points of the current image, which balances the number of key points.
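A minimal sketch of the lookup from ratio to window level is given below. The range boundaries in `RATIO_BOUNDS` and the numbering of the levels are hypothetical; the application only states that such a preset correspondence exists and uses a ratio greater than 2 as an example of a ratio that is too large.

```python
from bisect import bisect_left

# Hypothetical upper bounds of the ratio ranges (not taken from the source):
# ratio <= 0.5 -> level 0, (0.5, 1] -> level 1, (1, 2] -> level 2, > 2 -> level 3.
RATIO_BOUNDS = [0.5, 1.0, 2.0]


def target_window_level(second_count: int, first_count: int) -> int:
    """Map the ratio of output key points to preset key points to a window level."""
    if first_count <= 0:
        raise ValueError("the preset key point number must be positive")
    ratio = second_count / first_count
    # bisect_left returns the index of the range that contains the ratio.
    return bisect_left(RATIO_BOUNDS, ratio)


# The reference image produced 2500 output key points against a preset 1000,
# so the ratio is 2.5 and the highest level (largest window) is selected.
level = target_window_level(2500, 1000)
```

A sorted list of boundaries with a binary search keeps the lookup independent of how many ratio ranges are configured.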

In a possible implementation manner, determining the target window size according to the target window level includes: determining a target window group corresponding to the i-th layer image according to a preset correspondence between layer numbers and window groups, where a window group includes a window size corresponding to at least one window level; and determining the window size corresponding to the target window level in the target window group as the target window size. As the layer number increases, the pixel size of each layer image gradually decreases. If the window size of the same window level is set to decrease layer by layer, the number of key points in each layer image can be balanced. Because the layers of the image pyramid express the image at multiple scales, that is, simulate the image at different degrees of blur, balancing the number of key points across layers ensures that each degree of blur has a certain number of key points, which improves the accuracy of image matching.
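The two-stage lookup (layer number to window group, then window level to window size) could be organized as a small table. The concrete sizes in `WINDOW_GROUPS` are invented for illustration; they are only chosen so that, for a given window level, the window size does not grow as the layer number increases.

```python
# Hypothetical window groups: layer number -> window sizes indexed by window level.
# For the same level, the size shrinks (or stays equal) in higher layers.
WINDOW_GROUPS = {
    1: (3, 5, 7, 9),
    2: (3, 5, 5, 7),
    3: (3, 3, 5, 5),
}


def target_window_size(layer: int, level: int) -> int:
    """Select the window group of the layer, then the size for the window level."""
    group = WINDOW_GROUPS[layer]
    return group[level]


size_layer1 = target_window_size(1, 3)  # 9
size_layer3 = target_window_size(3, 3)  # 5
```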

In a possible implementation manner, extracting at least one candidate key point of the i-th layer image includes: for each image block of the i-th layer image, determining a feature score of each pixel point in the image block according to a preset feature detection algorithm, and determining the pixel points whose feature score is greater than a preset threshold as candidate key points of the image block; and determining the candidate key points of each image block of the i-th layer image as the at least one candidate key point of the i-th layer image. Through the above processing, pixel points are filtered by a preset threshold, which reduces the number of key points handled during non-maximum suppression.

In one aspect, an apparatus for extracting key points of an image is provided. The apparatus for extracting key points of an image includes at least one module, and the at least one module is configured to implement the method for extracting key points of an image.

In one aspect, an image processing device is provided. The image processing device includes a memory and a processor. The memory is used to store instructions. The processor is used to call the instructions and execute the method for extracting key points of the image.

In one aspect, a computer-readable storage medium is provided, and when the computer-readable storage medium is run on an image processing device, the image processing device is caused to perform the above-mentioned method for extracting key points of an image.

In one aspect, a computer program product containing instructions is provided, which, when the computer program product runs on an image processing device, causes the image processing device to execute the method for extracting key points of an image described above.

The beneficial effects brought by the technical solution provided in this embodiment are:

In this embodiment, when the image processing device extracts key points from an image, the target window size, that is, the non-maximum suppression window size, can be adjusted based on each layer image in the image pyramid of the image. The window size can therefore change with each layer image, and the number of key points in each image is balanced, which reduces the negative impact on image matching.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solution in this embodiment more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is an implementation environment diagram provided by this embodiment;

FIG. 2 is a schematic structural diagram of an image processing device according to this embodiment;

FIG. 3 is a flowchart of a method for extracting key points of an image provided by this embodiment;

FIG. 4 is a flowchart of a method for obtaining candidate key points provided by this embodiment;

FIG. 5 is a schematic diagram of a reference image provided by this embodiment;

FIG. 6 is a flowchart of a method for extracting key points of an image provided by this embodiment;

FIG. 7 is a flowchart of a method for extracting key points of an image provided by this embodiment;

FIG. 8 is a schematic diagram of a reference image provided by this embodiment;

FIG. 9 is a schematic diagram of an apparatus for extracting key points of an image provided by this embodiment.

Detailed description

This embodiment provides a method for extracting key points of an image, and the method may be implemented by an image processing device. FIG. 1 is a diagram of an implementation environment provided by this embodiment. The implementation environment includes a plurality of terminals 101 and an image processing device 102 that provides services to the terminals. The terminals 101 are connected to the image processing device 102 through a wireless or wired network. The image processing device 102 may provide the terminals 101 with a service for extracting key points of an image. The image processing device 102 may further have at least one database for storing images from which key points are to be extracted, the key points of those images, and the like. As the requester of the service, a terminal 101 may send an image from which key points are to be extracted to the image processing device 102.

The image processing device 102 may include a processor 210 and a transceiver 220. The transceiver 220 may be connected to the processor 210, as shown in FIG. 2. The transceiver 220 may be used to send and receive messages or data; for example, it may receive an image from which key points are to be extracted, sent by a terminal 101. The processor 210 may be the control center of the image processing device 102, and uses various interfaces and lines to connect the parts of the image processing device 102, such as the transceiver 220. In this application, the processor 210 may be an application-specific integrated circuit (ASIC), which may be used to extract key points of an image. The processor 210 may include one or more processing units. The processor 210 may integrate an application processor and a modem, where the application processor mainly handles the operating system and the modem mainly handles wireless communications. The processor 210 may also be a digital signal processor, a central processing unit, or the like. The image processing device 102 may further include a memory 230. The memory 230 may be configured to store images from which key points are to be extracted, the key points of images, and the like. The image processing device 102 may further include an input/output interface 240, which may provide an interface between the processor 210 and a peripheral interface module. The peripheral interface module may be a button or the like.

In the process of extracting the key points of an image, the present application introduces a reference image to determine the non-maximum suppression window size. For the image pyramid formed from the image from which key points are to be extracted, the reference image of the i-th layer image can be of the following two types. First, when the image is a still image or a frame image in a video stream, the reference image of the i-th layer image can be the (i-1)-th layer image in the same image pyramid. Second, when the image is a frame image in a video stream, the reference image of the i-th layer image can be the i-th layer image in the image pyramid of the previous frame image. A static image refers to an independent image, for which the key points extracted by the image processing device are not related to other images; for example, a static image may be a captured photo. Correspondingly, a frame image in a video stream is not an independent image: each frame image has a chronological relationship with the other frame images, and the key points extracted by the image processing device for one frame image are related to the images of adjacent frames.
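The choice between the two types of reference image can be summarized in a short sketch. The function name `select_reference`, the zero-based `frame_index`, and the `(frame_offset, layer)` return convention are assumptions introduced for the example; the rule of preferring the previous frame whenever one exists follows the video-stream implementation described in the summary above.

```python
from typing import Optional, Tuple


def select_reference(layer: int, frame_index: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (frame_offset, layer) of the reference image, or None.

    layer is 1-based. frame_index is None for a static image and a
    zero-based frame number for a video stream (an assumed convention).
    None means no reference exists and the preset window size is used.
    """
    if frame_index is not None and frame_index > 0:
        # Second type: the same layer in the pyramid of the previous frame.
        return (-1, layer)
    if layer > 1:
        # First type: the previous layer in the same pyramid.
        return (0, layer - 1)
    # First layer of a static image or of the first video frame.
    return None


static_first_layer = select_reference(layer=1)            # None
static_third_layer = select_reference(layer=3)            # (0, 2)
video_later_frame = select_reference(layer=2, frame_index=5)  # (-1, 2)
video_first_frame = select_reference(layer=4, frame_index=0)  # (0, 3)
```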

What the above two types of reference image have in common is that the reference image is similar in texture complexity to the image from which key points are to be extracted. For the first type, two adjacent layers of the same image pyramid are related by sampling, so the texture of every layer can be the same as that of the original image. For the second type, the time interval between adjacent frames is small, for example only 40 milliseconds, so the pictures of two adjacent frames may be similar; that is, the textures of two adjacent frame images are similar.

Images with similar texture complexity also have similar image features; that is, when key points are extracted with the same method, the numbers of key points obtained are similar. Therefore, if the key points of the reference image have already been extracted when key points are to be extracted from the image, the number of key points of the reference image can be used to judge whether the extraction method is appropriate, and hence whether to apply the same method or how to adjust it. Since the non-maximum suppression window filters key points, the number of key points of the reference image can also be used to judge whether the size of the corresponding non-maximum suppression window is appropriate, and the window size can be adjusted accordingly. This balances the number of key points extracted from each image and avoids large differences in the number of key points obtained for images whose texture complexity differs greatly.

Of course, the reference image of the i-th layer image may be other reference images obtained based on the same concept in addition to the above two types. These reference images can be applied to the method for extracting key points of an image provided in this application, which is not limited in this application.

An embodiment of the present application provides a method for extracting key points of a still image or a video image. Taking the case where the reference image of the i-th layer image is the (i-1)-th layer image in the same image pyramid as an example, the processing flow of the method for extracting key points of an image shown in FIG. 3 is described in detail below with reference to specific implementation manners. The content may be as follows:

In step 301, the image processing apparatus acquires an image pyramid of an image.

The image pyramid may include N-layer images, N> 1.

In this embodiment, the image processing device has the ability to extract key points of the image. If the image processing device provides a service for extracting key points of an image for other terminals, it can receive still images or video streams sent by the terminals. Alternatively, the image processing device may have a function of acquiring an image (for example, the image processing device may be a monitoring device), and then key points may be extracted from the acquired image.

The image processing device may also store images from which key points are to be extracted. The image processing device may extract key points from each frame image of a video stream in real time, or may extract key points from stored images, which is not limited in this embodiment.

For an image from which key points are to be extracted, the image processing device may construct an image pyramid. This embodiment does not limit the specific method of constructing the image pyramid. For example, the image pyramid may be constructed based on an upsampling or downsampling method.

In a possible implementation manner, the process of constructing the image pyramid by the image processing device may be as follows: the image processing device uses the image as the first-layer image of the image pyramid, and downsamples the images of the image pyramid layer by layer according to a preset ratio to obtain the next-layer image; when the construction stop condition is reached, it stops downsampling and the image pyramid of the image is obtained.

Downsampling refers to generating a thumbnail of an image, and the preset ratio may refer to the downsampling ratio. An image pyramid formed by downsampling uses the image from which key points are to be extracted as the original image and generates thumbnails at multiple resolutions; that is, the image is expressed at multiple scales.

The construction stop condition may be that the constructed image pyramid reaches a preset number of layers, or that the highest-layer image reaches a preset size. For example, for an image with a pixel size of 992 * 744, the image pyramid is constructed with a preset ratio of 1.2, and construction stops when the image pyramid reaches the eighth layer. The pixel sizes of the first to eighth layers of the image pyramid are 992 * 744, 827 * 620, 689 * 517, 574 * 431, 478 * 359, 399 * 299, 332 * 249, and 277 * 208.
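The layer sizes in this example can be reproduced with a short calculation. The sketch below assumes that each layer size is obtained by dividing the original width and height by the preset ratio raised to the layer index and rounding to the nearest integer; the application does not state the rounding rule, but this assumption matches all eight listed sizes. The function name `pyramid_sizes` is illustrative.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, ratio: float, num_layers: int) -> List[Tuple[int, int]]:
    """Return (width, height) for each layer, first layer first.

    Assumed rule: size_k = round(original_size / ratio ** k) for k = 0, 1, ...
    """
    return [
        (round(width / ratio ** k), round(height / ratio ** k))
        for k in range(num_layers)
    ]


sizes = pyramid_sizes(992, 744, 1.2, 8)
for layer, (w, h) in enumerate(sizes, start=1):
    print(f"layer {layer}: {w} * {h}")
# The last line printed is "layer 8: 277 * 208".
```

Here the stop condition is a preset number of layers; stopping when the highest-layer image reaches a preset size would replace the fixed `num_layers` with a loop that checks the computed size.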

After the image processing device completes the construction of the image pyramid, it can start from the first-layer image and extract the key points of the images of the image pyramid layer by layer.

Of course, the image processing device may also acquire an image pyramid that another device has constructed for the image, which is not limited in this embodiment.

In step 302, the image processing device extracts at least one candidate key point of the i-th layer image of the image pyramid.

The image processing device can detect candidate key points of the images of the image pyramid layer by layer. For example, candidate key points may be detected based on the FAST algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, or the FREAK (Fast Retina Keypoint) algorithm.

In a possible implementation manner, the processing of step 302 may be as follows: for each image block of the i-th layer image, the image processing device determines a feature score of each pixel point in the image block according to a preset feature detection algorithm, and determines the pixel points whose feature score is greater than a preset threshold as candidate key points of the image block; the candidate key points of each image block of the i-th layer image are determined as the at least one candidate key point of the i-th layer image.

Taking the FAST algorithm as an example, the FAST algorithm can evaluate each pixel in the image against the pixels within a preset circular range around it to compute the gradient at that pixel, that is, the feature score of the pixel. The FAST algorithm can also set an initial threshold in advance. If the feature score of a pixel is greater than the initial threshold, the pixel is a corner, and such pixels can be used as candidate key points.

Because an image may have a smooth texture, no key points may be detected when detection is based on the initial threshold. Therefore, two thresholds can be set in the FAST algorithm, namely an initial threshold and a low threshold. For example, the initial threshold can be 20 and the low threshold can be 7. For the same image, because the low threshold also detects corners with relatively gentle angles, detection based on the low threshold generally yields more key points than detection based on the initial threshold.

If the image processing device does not detect any key point of the image based on the initial threshold, it needs to re-detect the image based on the low threshold. Re-detecting the image increases the processing time. Especially when key points are extracted in hardware, it requires more registers and longer processing delays, which raises cost and lowers processing efficiency. Therefore, this embodiment provides another method for obtaining candidate key points. For each image block, a feature score of each pixel point in the image block is determined. If there are pixels with a feature score greater than a first threshold, the pixels with feature scores greater than the first threshold are determined as candidate key points of the image block; otherwise, the pixels with feature scores greater than a second threshold are determined as candidate key points of the image block. The candidate key points of each image block of the i-th layer image are determined as the at least one candidate key point of the i-th layer image. With this method, candidate key points are detected and extracted in a single pass: the image only needs to be traversed once, pipeline processing is possible, and the image does not need to be read back. This avoids repeated detection, improves processing efficiency, and reduces the complexity of a hardware implementation.

As shown in the flowchart of the method for obtaining candidate key points in FIG. 4, the method may be as follows:

In step 3021, the image processing device divides each layer image of the image pyramid into a plurality of image blocks of a preset size. For example, for each layer of image, the image processing device may divide the image into a plurality of image blocks with a pixel size of 31 * 15.

In step 3022, for each image block, the image processing device determines a feature score of a pixel point in the image block, and determines a pixel point with a feature score greater than a second threshold as a first candidate key point of the image block.

How the image processing device determines the feature score of a pixel has been described above and is not repeated here. The second threshold may be the above-mentioned low threshold. After determining the feature score of each pixel, the image processing device may obtain a corresponding score map, in which the feature score is recorded at the position of each pixel. Then, the image processing device may detect key points of the image based on the second threshold. If a feature score is greater than the second threshold, it is retained on the score map, that is, the pixel remains as a first candidate key point; if a feature score is not greater than the second threshold, it is set to 0 on the score map. Detecting key points directly based on the low threshold avoids repeated detection.

In step 3023, for each image block, the image processing device determines whether the maximum feature score is greater than a first threshold.

In the process of determining the feature score of each pixel, the image processing device may also determine and store the maximum feature score. For example, while calculating the score map, it may use a register to track the maximum feature score of the candidate key points in the image block. After determining the feature score of each pixel, the image processing device can then determine whether the maximum feature score of the pixels in the image block is greater than the first threshold.

The first threshold may be the above-mentioned initial threshold, that is, the first threshold is greater than the above-mentioned second threshold.

In step 3024, if the maximum feature score is greater than the first threshold, the pixel points among the first candidate key points whose feature score is greater than the first threshold are determined as second candidate key points of the image block, and the second candidate key points are determined as the candidate key points of the image block.

Key points detected based on the first threshold better satisfy engineering requirements. If the maximum feature score is greater than the first threshold, at least one key point that better meets the engineering requirements exists, so the first candidate key points can be filtered based on the first threshold. That is, if the feature score of a first candidate key point is greater than the first threshold, the feature score is retained on the score map and the point becomes a second candidate key point; if the feature score of a first candidate key point is not greater than the first threshold, the feature score is set to 0 on the score map.

Furthermore, after the second candidate key point is determined, it can be determined as a candidate key point of the image block.

In step 3025, if the maximum feature score is not greater than the first threshold, the first candidate key point is determined as a candidate key point of the image block.

If the maximum feature score is not greater than the first threshold, indicating that there are no key points that better meet the engineering requirements, then the first candidate key point is not filtered, that is, the first candidate key point is determined as a candidate key point of the image block.
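As an illustration only, steps 3022 to 3025 can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the function name `select_block_candidates`, the list-of-rows score map, and the threshold values used in the example are assumptions, and feature scores are assumed to be non-negative.

```python
def select_block_candidates(scores, low_thr, high_thr):
    """Two-threshold candidate selection for one image block (illustrative).

    scores   : list of rows of per-pixel feature scores (assumed layout)
    low_thr  : the "second threshold" (low threshold)
    high_thr : the "first threshold" (initial threshold), assumed > low_thr
    Returns a score map in which non-candidate pixels are 0.
    """
    max_score = 0  # assumes non-negative feature scores
    first_pass = []
    for row in scores:
        kept = []
        for s in row:
            # Step 3022: keep scores above the low threshold, zero the rest.
            kept.append(s if s > low_thr else 0)
            # Track the block maximum in the same pass (the "register").
            if s > max_score:
                max_score = s
        first_pass.append(kept)

    # Step 3025: no pixel clears the high threshold, so the first
    # candidate key points are kept unfiltered.
    if max_score <= high_thr:
        return first_pass

    # Step 3024: at least one strong key point exists, so the first
    # candidates are filtered again with the high threshold.
    return [[s if s > high_thr else 0 for s in row] for row in first_pass]


# A textured block keeps only its strong responses ...
textured = select_block_candidates([[5, 30], [12, 25]], low_thr=10, high_thr=20)
# ... while a weakly textured block falls back to the low threshold.
flat = select_block_candidates([[5, 15], [12, 8]], low_thr=10, high_thr=20)
```

In this sketch a weakly textured block still contributes its low-threshold candidates, which is the behaviour the two-threshold scheme is meant to provide.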

In step 3026, the image processing device determines candidate key points of each image block of each layer image as candidate key points of each layer image.

After the image processing device determines the candidate key points of an image block, a score map corresponding to the image block is obtained at the same time. In the score map, the pixel position of each candidate key point holds the feature score of that candidate key point, and the value at the pixel position of every non-candidate key point is 0. Furthermore, for a layer image, the image processing device can aggregate the candidate key points of all image blocks as the candidate key points of the layer image and, at the same time, stitch the score maps of the image blocks together according to the position of each image block in the layer image to obtain the score map of the layer image.
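The stitching of block score maps into a layer score map can be sketched as below. The names `stitch_score_maps` and `layer_candidates`, and the representation of a block as a `(top, left, block_scores)` tuple, are hypothetical and chosen only for this example.

```python
def stitch_score_maps(layer_height, layer_width, blocks):
    """Paste per-block score maps into one layer score map.

    blocks: iterable of (top, left, block_scores), where block_scores is a
    list of rows and (top, left) is the block's position in the layer image.
    """
    layer_map = [[0] * layer_width for _ in range(layer_height)]
    for top, left, block in blocks:
        for r, row in enumerate(block):
            # Copy one block row into its place in the layer map.
            layer_map[top + r][left:left + len(row)] = row
    return layer_map


def layer_candidates(layer_map):
    """List (row, col, score) for every non-zero entry of the score map."""
    return [(r, c, s)
            for r, row in enumerate(layer_map)
            for c, s in enumerate(row)
            if s != 0]


blocks = [(0, 0, [[0, 30], [0, 25]]), (0, 2, [[0, 15], [12, 0]])]
layer_map = stitch_score_maps(2, 4, blocks)
candidates = layer_candidates(layer_map)
```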

Let 1≤i≤N, where i is an integer, then the i-th layer image of the image pyramid is an image of any layer. For the i-th layer image of the image pyramid, after obtaining the candidate key points and the score map, non-maximum value suppression can be performed on the candidate key points. Before performing non-maximum suppression, the size of the non-maximum suppression window needs to be determined, and the window may be a convolution kernel.

This embodiment takes the (i-1)-th layer image as the reference image of the i-th layer image as an example, and uses the reference image to determine the size of the non-maximum suppression window of the i-th layer image.

In step 303, the image processing apparatus determines whether the i-th layer image is a first-layer image.

In step 304, if the i-th layer image is a first-layer image, the image processing device uses the preset window size as the target window size corresponding to the i-th layer image.

The preset window size may refer to a default window size.

Because the image processing device extracts key points layer by layer, starting from the first layer of the image pyramid, it can obtain the layer number of the current image and thereby judge whether the current image is the first-layer image. If the current image is the first-layer image, no key points of a similar image have been extracted before, so the image processing device may determine the preset window size as the target window size. That is, non-maximum suppression is performed on the first-layer image based on the preset window size.

In step 305, if the i-th layer image is any layer image other than the first-layer image, the image processing device determines the (i-1)-th layer image in the image pyramid of the image as the reference image of the i-th layer image, determines the number of first key points of the reference image, and determines the target window size corresponding to the i-th layer image according to the number of first key points and the number of second key points of the reference image.

The first number of keypoints may be a preset number of keypoints, and the first number of keypoints may refer to a desired number of keypoints extracted from a reference image. The second number of key points may refer to the number of output key points, and the second number of key points of the reference image may be the number of output key points of the reference image.

As shown in the reference image diagram in Figure 5, if the current image is any layer image other than the first-layer image, the key points of the previous layer image have already been extracted, so the previous layer image can be determined as the reference image. The size of the non-maximum suppression window of the current image is then determined according to the degree to which the key points extracted from the previous layer meet the requirements, that is, the degree to which the number of output key points of the reference image approaches the preset number of key points.

In this embodiment, each layer of images in the image pyramid has a different number of first keypoints. Therefore, the processing performed by the image processing device to determine the number of first keypoints may be as follows: the image processing device determines the number of first keypoints corresponding to the reference image according to the number of the layer in which the reference image is located in the image pyramid to which it belongs.

Wherein, there is a preset correspondence relationship between the number of layers where the reference image is located in the image pyramid and the number of first key points. In the preset correspondence relationship, the number of first key points of two adjacent layers of images meets a preset ratio, and the preset ratio is equal to the ratio of the number of pixels of two adjacent layers of images in the image pyramid. That is, it is ensured that the number of the first key points in each layer of the image occupies a certain proportion in the total number of pixels.

The correspondence between the number of layers and the number of first key points can be set by the technician according to actual needs. Of course, if there are too many layers in the image pyramid, the correspondence can also be established as follows: layer 1 and its number of first key points are stored as a correspondence entry; for the k-th layer, the number of first key points corresponding to the k-th layer is determined according to the number of first key points corresponding to layer k-1 and the preset ratio, and the k-th layer and its number of first key points are stored as a correspondence entry, where k > 1.

The number of first key points of the first-layer image of the image pyramid can be calculated from the number of image pixels. For example, the number of first key points can be 1% of the number of image pixels: when the pixel size of the first-layer image is 992 * 744 (738,048 pixels), the number of first key points can be 7,380. Furthermore, the image processing device may calculate the number of first key points layer by layer according to the preset down-sampling ratio used when constructing the image pyramid, which ensures that the ratio of the number of first key points to the total number of pixels is constant in each layer image. Calculating the number of first key points of each layer from the preset ratio used when constructing the image pyramid avoids counting the image pixels of each layer, reduces the amount of processing, and improves processing efficiency.
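The layer-by-layer calculation of the number of first key points can be sketched as follows. The function name `expected_keypoint_counts` is hypothetical, and the pixel ratio of 0.25 between adjacent layers (each dimension halved) is an assumed value for illustration only; the text does not fix the down-sampling ratio.

```python
def expected_keypoint_counts(width, height, num_layers, pixel_ratio, fraction=0.01):
    """Number of first key points for each pyramid layer (illustrative).

    width, height : pixel size of the first-layer image
    pixel_ratio   : pixels(layer k) / pixels(layer k-1), an assumed preset
    fraction      : share of pixels expected as key points (1% in the text)
    """
    # Layer 1: a fixed fraction of the pixel count.
    counts = [round(width * height * fraction)]
    # Layer k (k > 1): scale the count of layer k-1 by the preset ratio,
    # so no per-layer pixel counting is needed.
    for _ in range(1, num_layers):
        counts.append(round(counts[-1] * pixel_ratio))
    return counts


counts = expected_keypoint_counts(992, 744, num_layers=3, pixel_ratio=0.25)
```

For the 992 * 744 example, the first entry is 7,380, matching the figure given above; the later entries depend entirely on the assumed ratio.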

After the image processing device determines the number of first keypoints corresponding to the reference image, the specific processing for determining the target window size corresponding to the i-th layer image can be as follows: the image processing device determines the ratio of the number of second keypoints to the number of first keypoints; determines, according to the preset correspondence between ratio ranges and window levels, the target ratio range in which the ratio is located and the target window level corresponding to the target ratio range; and determines the target window size according to the target window level.

In this embodiment, the ratio of the number of second key points to the number of first key points is used to measure how close the number of output key points is to the preset number of key points. The image processing device may store a correspondence between ratio ranges and window levels in advance. The same correspondence holds for every layer. The correspondence between the ratio range and the window level can be as shown in Table 1 below:

Table 1 Correspondence between ratio range and window level

Ratio range	[0,1]	(1,1.5]	(1.5,2]	(2,+∞)
Window level	TALL	GRANDE	VENTI	TRENTA

Among them, the window levels are respectively TRENTA, VENTI, GRANDE, and TALL. With reference to the beverage cup sizes, TRENTA has the largest cup size and TALL has the smallest cup size, that is, the window sizes of different window levels are sorted as TRENTA > VENTI> GRANDE> TALL.

After the image processing device determines the reference image, the second key point number of the reference image can be obtained, and then the ratio of the second key point number to the first key point number can be calculated. The image processing device can determine the target ratio range in which the ratio is located in the correspondence between the ratio range and the window level, and then can determine the corresponding target window level. After the image processing device determines the target window level, it can obtain the window size corresponding to the target window level and determine the window size as the target window size.
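The lookup of Table 1 can be sketched as below. The names `RATIO_UPPER_BOUNDS`, `WINDOW_LEVELS`, and `window_level` are hypothetical; the interval boundaries are those of Table 1, with each range closed on the right.

```python
from bisect import bisect_left

# Right-closed upper bounds of the first three ratio ranges in Table 1;
# anything above the last bound falls into the (2, +inf) range.
RATIO_UPPER_BOUNDS = [1.0, 1.5, 2.0]
WINDOW_LEVELS = ["TALL", "GRANDE", "VENTI", "TRENTA"]


def window_level(output_count, expected_count):
    """Map (second key point count / first key point count) to a window level."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    ratio = output_count / expected_count
    # bisect_left returns the index of the first bound >= ratio, which is
    # exactly the index of the right-closed interval containing the ratio.
    return WINDOW_LEVELS[bisect_left(RATIO_UPPER_BOUNDS, ratio)]


level = window_level(output_count=11808, expected_count=7380)  # ratio 1.6
```

A ratio of exactly 1.5 or 2 stays in the lower range because the ranges are closed on the right, which is why `bisect_left` rather than `bisect_right` is used here.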

For the same image, increasing the window size of non-maximum suppression reduces the number of key points obtained after non-maximum suppression processing. Therefore, if the ratio of the number of second keypoints to the number of first keypoints is too large, for example, greater than 2, the window size can be appropriately increased to reduce the number of second keypoints of the current image. Because the reference image and the current image have similar textures, the window size is adjusted according to the number of key points actually output for the reference image, so that the number of key points extracted from the current image is close to the number of first key points, and the number of key points in each layer of the image is balanced.

Each of the above window levels may correspond to a fixed window size, that is, for images of different layers, the window sizes determined according to the same window level are the same. Optionally, in a possible implementation manner, for images of different layers, the window sizes determined according to the same window level may be different. In that case, the above-mentioned processing for determining the target window size according to the target window level may be as follows: the target window group corresponding to the i-th layer image is determined according to the preset correspondence between the number of layers and the window group; and the window size corresponding to the target window level in the target window group is determined as the target window size.

The image processing device may store in advance a window group corresponding to each layer of images in the image pyramid, and each window group may include a window size corresponding to at least one window level. The correspondence between the number of layers and the window group can be shown in Table 2 below:

Table 2 Correspondence between the number of layers and the window group

Window level	Layer 1	Layer 2	Layer 3	Layer 4	Layer 5	Layer 6	Layer 7	Layer 8
TRENTA	31 * 11	29 * 11	27 * 11	25 * 11	23 * 9	23 * 9	21 * 7	21 * 7
VENTI	23 * 11	21 * 11	19 * 11	17 * 9	15 * 7	15 * 7	13 * 5	13 * 5
GRANDE	15 * 11	13 * 9	11 * 9	9 * 7	7 * 5	7 * 5	5 * 3	5 * 3
TALL	11 * 11	9 * 9	9 * 9	7 * 7	5 * 5	5 * 5	3 * 3	3 * 3

Among them, the window group of each layer of image includes 4 window levels, which respectively correspond to the above-mentioned TRENTA, VENTI, GRANDE, and TALL. It can be seen from Table 2 that the window sizes of the same level in different layers may be the same or different. In general, as the number of layers increases, the window sizes of the same level gradually decrease.

For the i-th layer image, the image processing device may determine the target window group corresponding to the number of layers according to the above-mentioned correspondence between the number of layers and the window group. After the image processing device determines the target window level in the above process, it can obtain the window size of the target window level in the target window group as the window size for non-maximum suppression. For example, for the second-layer image, the reference image is the first-layer image. If the ratio of the number of second keypoints to the number of first keypoints is 1.6, the window level can be determined to be VENTI, and the window size can be 21 * 11.

As the number of layers increases, the pixel size of each layer of the image gradually decreases, and the window size of the same window level is set to decrease layer by layer, which can balance the number of key points of each layer of image. Because each layer of the image pyramid expresses the image at a different scale, that is, simulates the image at a different level of blur, equalizing the number of key points in each layer ensures that each level of blur has a certain number of key points, improving image matching accuracy.
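The lookup in Table 2 can be sketched as a small table-driven function. The names `WINDOW_GROUPS` and `target_window_size` are hypothetical. The tuples keep the two numbers in the order the table lists them; the text does not state which is the width and which is the height.

```python
# Window sizes per level for layers 1..8, transcribed from Table 2.
WINDOW_GROUPS = {
    "TRENTA": [(31, 11), (29, 11), (27, 11), (25, 11), (23, 9), (23, 9), (21, 7), (21, 7)],
    "VENTI":  [(23, 11), (21, 11), (19, 11), (17, 9), (15, 7), (15, 7), (13, 5), (13, 5)],
    "GRANDE": [(15, 11), (13, 9), (11, 9), (9, 7), (7, 5), (7, 5), (5, 3), (5, 3)],
    "TALL":   [(11, 11), (9, 9), (9, 9), (7, 7), (5, 5), (5, 5), (3, 3), (3, 3)],
}


def target_window_size(layer, level):
    """Return the window size for a 1-based layer number and a window level."""
    sizes = WINDOW_GROUPS[level]
    if not 1 <= layer <= len(sizes):
        raise ValueError("layer must be between 1 and %d" % len(sizes))
    return sizes[layer - 1]


# The worked example from the text: layer 2 at level VENTI.
size = target_window_size(2, "VENTI")
```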

Of course, in this embodiment, the target window size may also be determined based on other information of the reference image. For example, the target window size corresponding to the i-th layer image may be determined according to the size of the non-maximum suppression window of the reference image. Therefore, the processing after determining the reference image in steps 303-305 may also be: the image processing device determines the target window size according to the reference image.

In step 306, the image processing device performs non-maximum value suppression processing on at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.

After obtaining the target window size in the above process, the image processing device can take any candidate key point in the score map of the i-th layer image as the center of the window, determine the key point with the largest feature score within the window, and set the feature scores that are not the largest to 0, that is, perform non-maximum suppression. The candidate key points in the entire score map are traversed for non-maximum suppression. When the traversal ends, at least one key point of the i-th layer image is obtained. This key point can be output as a key point of the image, that is, an output key point is obtained.

After obtaining the output key points of the i-th layer image, i can be increased by 1, that is, the processing of steps 302-306 is repeated for the (i+1)-th layer image to extract its key points. After key point extraction is completed for the top-layer image, the process continues with step 307.
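Non-maximum suppression over a score map with a rectangular window can be sketched as follows. This is one possible reading of step 306, not the applicant's implementation: the function name `non_max_suppress` is hypothetical, odd window dimensions are assumed, the first argument after the map is treated as the window width, comparisons are made against the original (unsuppressed) scores, and equal scores within a window are all kept.

```python
def non_max_suppress(score_map, win_w, win_h):
    """Keep a candidate only if it is the maximum inside its own window."""
    height = len(score_map)
    width = len(score_map[0]) if height else 0
    half_w, half_h = win_w // 2, win_h // 2
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            s = score_map[r][c]
            if s == 0:
                continue  # not a candidate key point
            # Window centred on the candidate, clipped to the image border.
            r0, r1 = max(0, r - half_h), min(height, r + half_h + 1)
            c0, c1 = max(0, c - half_w), min(width, c + half_w + 1)
            local_max = max(score_map[rr][cc]
                            for rr in range(r0, r1)
                            for cc in range(c0, c1))
            if s >= local_max:
                out[r][c] = s  # survives as an output key point
    return out


scores = [
    [0, 5, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 7],
]
small = non_max_suppress(scores, win_w=3, win_h=3)
large = non_max_suppress(scores, win_w=5, win_h=3)
```

In this example the 3 * 3 window keeps two key points and the wider 5 * 3 window keeps one, which illustrates why a larger window level is chosen when too many key points are being output.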

Of course, in addition to using the reference image, the image processing device may also determine the target window size based on other methods, for example, the target window size may also be determined according to the pixel size of the i-th layer image of the image pyramid, which is not limited in this embodiment. Therefore, the processing of the above steps 302 to 307 may also be: the image processing device determines the target window size according to the i-th layer image of the image pyramid, and determines at least one output key point of the i-th layer image according to the target window size.

In step 307, the image processing device determines an output key point of each layer image of the image pyramid as a key point of the image.

After the output key points of each layer of the image pyramid are determined, all the output key points can be used as the image key points. The image processing device can describe the key points of the image, for example, the position, scale, and direction of the key points can be used to describe the key points. Furthermore, the image processing device can store the key points so that the key points can be used for image matching and other processing in subsequent processes.

In this embodiment, for the i-th layer image of the image pyramid, the image processing device uses the (i-1)-th layer image as the reference image. Since the texture complexity of the i-th layer image and the (i-1)-th layer image is similar, the window size of the non-maximum suppression can be adjusted based on the reference image so that the number of extracted key points is close to the expected number of key points, and the number of key points of each image is balanced to reduce the negative impact on image matching.

The reference image of the i-th layer image in the above process is the (i-1)-th layer image. An embodiment of the present application also provides a method for extracting key points of each frame image in a video stream, taking as an example the case in which the reference image is the i-th layer image in the image pyramid of the previous frame image. The process flow of the method for extracting key points of an image shown in FIG. 6 is described in detail below in combination with a specific embodiment, and the content may be as follows:
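The control flow of steps 302 to 307 for a static image can be sketched as a driver loop. All names here are hypothetical: `extract_candidates`, `suppress`, and `pick_window` stand in for the candidate extraction, non-maximum suppression, and window selection described above and are passed in as callables so that the sketch stays self-contained.

```python
def extract_static_image_keypoints(layers, extract_candidates, suppress,
                                   pick_window, expected_counts, preset_window):
    """Layer-by-layer extraction where layer i-1 is the reference for layer i.

    layers          : pyramid layers, first layer first
    expected_counts : number of first key points per layer (0-based list)
    pick_window     : callable(layer_no, output_count, expected_count)
    """
    all_keypoints = []
    prev_output_count = None
    for i, layer in enumerate(layers, start=1):
        candidates = extract_candidates(layer)            # step 302
        if i == 1:
            window = preset_window                        # steps 303-304
        else:
            # Step 305: the reference image is layer i-1, so its output
            # count is compared with its own expected count.
            window = pick_window(i, prev_output_count, expected_counts[i - 2])
        outputs = suppress(candidates, window)            # step 306
        prev_output_count = len(outputs)
        all_keypoints.extend((i, kp) for kp in outputs)
    return all_keypoints                                  # step 307


# Stub callables, used only to show the order in which windows are chosen.
seen_windows = []

def fake_extract(layer):
    return list(range(layer))

def fake_suppress(candidates, window):
    seen_windows.append(window)
    return candidates[:4]

def fake_pick(layer_no, output_count, expected_count):
    return ("picked", layer_no, output_count, expected_count)

keypoints = extract_static_image_keypoints(
    [10, 6, 3], fake_extract, fake_suppress, fake_pick, [8, 4, 2], "preset")
```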

In step 601, the image processing apparatus acquires an image pyramid of an image.

The specific processing of step 601 is the same as that of step 301 described above, and details are not described herein again.

In step 602, the image processing apparatus extracts at least one candidate key point of the i-th layer image of the image pyramid.

In step 603, the image processing device determines whether the image is a first frame image.

There is no necessary timing relationship between step 603 and steps 601-602, and it can be performed synchronously with steps 601-602, or can be performed before steps 601-602, which is not limited in this embodiment.

In step 604, if the image is the first frame image, the image processing device uses the preset window size as the target window size corresponding to each layer of the image.

The image processing device can extract key points for each frame image in the chronological order of the video stream. Therefore, if the image is the first frame image, no key points of a similar image have been extracted before, and for the image pyramid of the first frame image the target window size corresponding to each layer image can be a preset window size. Optionally, the preset window sizes of the layer images can differ and can decrease layer by layer. For example, the preset window size of each layer image can be the window size of the TALL level in Table 2 above.

Optionally, the reference image can also be determined based on the method provided in the foregoing embodiment. The process flow of the method for extracting key points of the image shown in FIG. 7 can be as follows:

In step 6041, the image processing device determines whether the i-th layer image is a first-layer image.

In step 6042, if the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is the first-layer image, the image processing device determines the preset window size as the target window size corresponding to the i-th layer image of the image pyramid of the image.

In step 6043, if the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, the image processing device determines the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determines the target window size according to the reference image.

The specific processing for extracting the key points of the first frame image shown in FIG. 7 is the same as that of the foregoing embodiment, and details are not described herein again.

In step 605, if the image is any frame image other than the first frame image in the video stream, the image processing device determines the i-th layer image of the image pyramid of the previous frame image of the image as the reference image of the i-th layer image of the image pyramid of the image, determines the number of first key points of the reference image, and determines the target window size corresponding to the i-th layer image of the image pyramid according to the number of second key points and the number of first key points of the reference image.

As shown in the reference image diagram in Figure 8, if the current image is any frame image other than the first frame image, the key points of the previous frame image have already been extracted, so the image of the corresponding layer in the image pyramid of the previous frame image can be determined as the reference image. The size of the non-maximum suppression window of the current image is then determined according to the degree to which the key points extracted from the reference image meet the requirements, that is, the degree to which the number of output key points of the reference image approaches the preset number of key points. The specific processing for determining the target window size according to the reference image is the same as that in the foregoing embodiment, and details are not described herein again.

In step 606, the image processing device performs non-maximum suppression processing on at least one candidate key point of the i-th layer image of the image pyramid of the image according to the target window size to obtain at least one output key point of the i-th layer image.

In step 607, the image processing device determines an output key point of each layer image of the image pyramid as a key point of the image.

Except for the method of determining the reference image, the rest of the processing for extracting the key points of the image is the same as that of the above embodiment, which is not described in this embodiment.

In this embodiment, when any frame image of the video stream is used as the image, for the i-th layer image of the image pyramid of the image, the image processing device uses the corresponding layer image of the previous frame image as the reference image. Because the texture complexity of adjacent frame images is similar, and the degree of blur of the same layer in their image pyramids is similar, the size of the non-maximum suppression window can be adjusted based on the reference image so that the number of extracted key points is close to the expected number of key points, which equalizes the number of key points in each image and reduces the negative impact on image matching.
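The choice of reference image in the video-stream embodiment can be summarized in a short sketch. The function name `reference_image` and the flag `first_frame_uses_previous_layer` are hypothetical; the flag switches between the first-frame handling of step 604 (preset window for every layer) and that of steps 6041 to 6043 (previous layer as reference).

```python
def reference_image(frame_index, layer_index, first_frame_uses_previous_layer=True):
    """Return (frame, layer) of the reference image, or None for the preset window.

    Both indices are 1-based.
    """
    if frame_index > 1:
        # Step 605: same layer of the previous frame's pyramid.
        return (frame_index - 1, layer_index)
    if first_frame_uses_previous_layer and layer_index > 1:
        # Step 6043: first frame, fall back to the previous layer.
        return (frame_index, layer_index - 1)
    # Step 604 or step 6042: no reference image, use the preset window size.
    return None


refs = [reference_image(f, l) for f in (1, 2) for l in (1, 2, 3)]
```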

Based on the same technical concept, this embodiment also provides a device for extracting key points of an image. The device may be the above-mentioned image processing device or configured in the above-mentioned image processing device. As shown in FIG. 9, the device includes:

An obtaining module 910, configured to obtain an image pyramid of an image, where the image pyramid includes N layers of images, N > 1; specifically, the obtaining function in the above steps 301 and 601, and other implicit steps, can be implemented;

A determining module 920, configured to determine a target window size according to the i-th layer image of the image pyramid, and determine at least one output key point of the i-th layer image according to the target window size, where 1≤i≤N, and to determine the output key points of each layer image of the image pyramid as the key points of the image; specifically, the determining function in the above steps 302-307 and 602-607, and other implicit steps, can be implemented.

Optionally, the determining module 920 is configured to:

Extract at least one candidate key point of the i-th layer image of the image pyramid, determine a target window size according to the i-th layer image, and perform non-maximum suppression processing on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.

Optionally, the image is a static image, and the determining module 920 is configured to:

If the i-th layer image is a first-layer image, determining a preset window size as a target window size;

Otherwise, the image of layer i-1 in the image pyramid of the image is determined as the reference image of the image of layer i, and the target window size is determined according to the reference image.

Optionally, the image is a frame image in a video stream, and the determining module 920 is configured to:

If the image is any frame image other than the first frame image in the video stream, determining the i-th layer image of the image pyramid of the previous frame image of the image as the image pyramid of the image A reference image of the i-th layer image, and a target window size is determined according to the reference image.

Optionally, the determining module 920 is further configured to:

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is the first-layer image, determining the preset window size as the target window size;

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, determining the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determining the target window size according to the reference image.

Optionally, the determining module 920 is configured to:

Determining a first number of key points of the reference image, and determining a target window size according to the first number of key points and the second number of key points of the reference image, where the first number of key points is a preset number of key points, and the second number of key points is the number of output key points of the reference image.

Optionally, the determining module 920 is configured to:

Determining the number of first keypoints corresponding to the reference image according to the number of the layer in which the reference image is located in the image pyramid to which the reference image belongs, where there is a preset correspondence between that layer number and the number of first key points.

Optionally, in the preset correspondence relationship, the number of first key points of two adjacent layers of images meets a preset ratio, and the preset ratio is equal to the ratio of the number of pixels of two adjacent layers of images in the image pyramid.

Optionally, the determining module 920 is configured to:

Determining a ratio between the number of the second key points and the number of the first key points;

Determining a target ratio range in which the ratio is located and a target window level corresponding to the target ratio range according to a preset correspondence between the ratio range and the window level;

A target window size is determined according to the target window level.

Optionally, the determining module 920 is configured to:

Determining a target window group corresponding to the i-th layer image according to a preset correspondence between the number of layers and the window group, where the window group includes a window size corresponding to at least one window level;

A window size corresponding to the target window level in the target window group is determined as a target window size.

Optionally, the determining module 920 is configured to:

For each image block of the i-th layer image, a feature score of each pixel in the image block is determined according to a preset feature detection algorithm, and each pixel whose feature score is greater than a preset threshold is determined as a candidate key point of the image block;

The candidate key points of each image block of the i-th layer image are determined as at least one candidate key point of the i-th layer image.

It should be noted that the foregoing obtaining module 910 may be implemented by a processor, and the determining module 920 may be implemented by a processor and a memory together.

It should be noted that, when extracting image key points, the device for extracting image key points provided in the foregoing embodiments is described using only the above division of functional modules as an example. In actual applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the image processing device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for extracting key points of an image provided by the foregoing embodiment belongs to the same concept as the method embodiment for extracting key points of an image. For specific implementation processes, refer to the method embodiments, and details are not described herein again.

In the above embodiments, all or part may be implemented by software, hardware, or a combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to this embodiment are generated. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, twisted pair) or wirelessly (for example, infrared, wireless, microwave, etc.). The computer-readable storage medium may be any medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more media. The medium may be a magnetic medium (such as a floppy disk, a hard disk, a magnetic tape, etc.), an optical medium (such as an optical disk, etc.), or a semiconductor medium (such as a solid state drive, etc.).

Claims

A method for extracting key points of an image, wherein the method comprises:

Acquiring an image pyramid of an image, the image pyramid comprising N layers of images, where N > 1;

Determining a target window size according to the i-th layer image of the image pyramid, and determining at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and

Determining the output key points of each layer image of the image pyramid as key points of the image.
The method according to claim 1, wherein determining the target window size according to the i-th layer image of the image pyramid, and determining at least one output key point of the i-th layer image according to the target window size, comprises:

Extracting at least one candidate key point of the i-th layer image of the image pyramid, determining the target window size according to the i-th layer image, and performing non-maximum suppression processing on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.
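The non-maximum suppression step of this claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `non_max_suppress`, the `(x, y, score)` tuple layout, the square window shape, and the tie-breaking rule are all assumptions made for illustration.

```python
def non_max_suppress(candidates, window):
    """Keep each candidate that has the highest score inside a square
    window of side `window` (assumed odd) centered on that candidate.

    candidates: list of (x, y, score) tuples (hypothetical layout).
    """
    half = window // 2
    kept = []
    for i, (x, y, score) in enumerate(candidates):
        is_local_max = True
        for j, (ox, oy, other) in enumerate(candidates):
            if i == j:
                continue
            inside = abs(ox - x) <= half and abs(oy - y) <= half
            # Assumed tie rule: among equal scores, the earlier candidate wins.
            if inside and (other > score or (other == score and j < i)):
                is_local_max = False
                break
        if is_local_max:
            kept.append((x, y, score))
    return kept


points = [(0, 0, 5.0), (1, 1, 7.0), (10, 10, 3.0)]
small = non_max_suppress(points, 1)  # window covers only the point itself
large = non_max_suppress(points, 3)  # (0, 0) falls inside the window of (1, 1)
```

In this sketch a larger window suppresses more candidates, which is why adjusting the target window size changes how many output key points a layer yields.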
The method according to claim 1, wherein the image is a static image, and determining the target window size based on the i-th layer image of the image pyramid includes:

If the i-th layer image is a first-layer image, determining a preset window size as a target window size;

Otherwise, determining the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image, and determining the target window size according to the reference image.
The method according to claim 1, wherein the image is a frame image in a video stream, and the determining a target window size according to an i-th layer image of the image pyramid comprises:

If the image is any frame image other than the first frame image in the video stream, determining the i-th layer image of the image pyramid of the previous frame image of the image as the reference image of the i-th layer image of the image pyramid of the image, and determining the target window size according to the reference image.
The method according to claim 4, further comprising:

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is the first-layer image, determining the preset window size as the target window size;

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, determining the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determining the target window size according to the reference image.
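The reference-image rules for static images and for video frames can be summarized in a short Python sketch. The function name `reference_for`, the 1-based indices, and the use of `None` to mean "use the preset window size" are assumptions made for illustration only.

```python
def reference_for(layer, frame=None):
    """Return the (frame, layer) pair of the reference image, or None when
    the preset window size is to be used.

    layer: 1-based layer index in the image pyramid.
    frame: 1-based frame index in a video stream, or None for a static image.
    """
    if frame is not None and frame > 1:
        # Later video frames: same layer of the previous frame's pyramid.
        return (frame - 1, layer)
    if layer == 1:
        # Static image or first frame, first layer: no reference image.
        return None
    # Static image or first frame, other layers: previous layer, same image.
    return (frame, layer - 1)
```

For example, `reference_for(2, frame=5)` selects layer 2 of frame 4, while `reference_for(3)` selects layer 2 of the same static image.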
The method according to any one of claims 3-5, wherein determining the target window size according to the reference image comprises:

Determining a first number of key points of the reference image, and determining the target window size according to the first number of key points and a second number of key points of the reference image, where the first number of key points is a preset number of key points, and the second number of key points is the number of output key points of the reference image.
The method according to claim 6, wherein determining the first number of key points of the reference image comprises:

Determining the first number of key points corresponding to the reference image according to the layer number of the reference image in the image pyramid to which the reference image belongs, where a preset correspondence exists between the layer number and the first number of key points.
The method according to claim 7, wherein, in the preset correspondence, the first numbers of key points of two adjacent layers of images satisfy a preset ratio, and the preset ratio is equal to the ratio of the numbers of pixels of two adjacent layers of images in the image pyramid.
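As a hedged illustration of such a correspondence, the following Python sketch derives a preset number of key points for every layer from a first-layer value, scaling by pixel count so that adjacent layers keep the same ratio as their numbers of pixels. The function name, the `(height, width)` layout, the rounding, and the example values are assumptions.

```python
def first_keypoint_numbers(base_number, layer_shapes):
    """Return one preset key point number per pyramid layer.

    base_number: preset number for the first layer (hypothetical value).
    layer_shapes: list of (height, width) per layer, first layer first.
    """
    base_pixels = layer_shapes[0][0] * layer_shapes[0][1]
    return [
        round(base_number * (height * width) / base_pixels)
        for (height, width) in layer_shapes
    ]


# Each layer halves both dimensions, so it has a quarter of the pixels.
shapes = [(480, 640), (240, 320), (120, 160)]
numbers = first_keypoint_numbers(1600, shapes)
```

With these example shapes the list is 1600, 400, 100, so each layer is budgeted a quarter of the key points of the layer before it.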
The method according to claim 6, wherein determining the target window size according to the first number of key points and the second number of key points of the reference image comprises:

Determining a ratio between the second number of key points and the first number of key points;

Determining, according to a preset correspondence between ratio ranges and window levels, a target ratio range in which the ratio is located and a target window level corresponding to the target ratio range; and

Determining the target window size according to the target window level.
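A minimal Python sketch of the ratio-range lookup follows. The boundaries in `RATIO_BOUNDS`, the direction of the ratio (second number divided by first number), and the convention that a higher level means a larger window are hypothetical; the claim only requires that some preset correspondence exist.

```python
import bisect

# Hypothetical range boundaries: [0, 0.8) -> level 0, [0.8, 1.2) -> level 1,
# [1.2, inf) -> level 2.
RATIO_BOUNDS = [0.8, 1.2]


def window_level(second_number, first_number):
    """Map the ratio of output key points to preset key points to a level."""
    ratio = second_number / first_number
    # bisect_right returns the index of the range that contains the ratio.
    return bisect.bisect_right(RATIO_BOUNDS, ratio)
```

Under these assumed ranges, a reference image that produced three times its preset number of key points maps to level 2, and one that produced half of it maps to level 0.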
The method according to claim 9, wherein determining the target window size according to the target window level comprises:

Determining a target window group corresponding to the i-th layer image according to a preset correspondence between the number of layers and the window group, where the window group includes a window size corresponding to at least one window level;

A window size corresponding to the target window level in the target window group is determined as a target window size.
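The window-group lookup can be sketched in Python as a two-step table access. The table `WINDOW_GROUPS` and every size in it are invented for illustration; the claim only states that each layer number corresponds to a window group holding one window size per window level.

```python
# Hypothetical preset correspondence: layer number -> window group, where
# each group lists the window size for window levels 0, 1 and 2.
WINDOW_GROUPS = {
    1: (3, 5, 7),
    2: (3, 5, 7),
    3: (3, 3, 5),
}


def target_window_size(layer, level):
    """Pick the window group for the layer, then the size for the level."""
    group = WINDOW_GROUPS[layer]
    return group[level]
```

Keeping a separate group per layer lets smaller, coarser layers use smaller windows for the same window level.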
The method according to claim 2, wherein the extracting at least one candidate key point of the i-th layer image comprises:

For each image block of the i-th layer image, determining a feature score of each pixel in the image block according to a preset feature detection algorithm, and determining each pixel whose feature score is greater than a preset threshold as a candidate key point of the image block; and

The candidate key points of each image block of the i-th layer image are determined as at least one candidate key point of the i-th layer image.
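The block-wise candidate extraction can be sketched in Python as follows. The sketch assumes the per-pixel feature scores have already been computed by some detector and are passed in as a 2D list; the function name, the square block shape, and the `(x, y, score)` output layout are illustrative assumptions.

```python
def extract_candidates(scores, block, threshold):
    """Collect pixels whose score exceeds the threshold, block by block.

    scores: 2D list of per-pixel feature scores (rows of equal length).
    block: side length of the square image blocks.
    """
    height, width = len(scores), len(scores[0])
    candidates = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Visit the pixels of one image block, clipped at the borders.
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    if scores[y][x] > threshold:
                        candidates.append((x, y, scores[y][x]))
    return candidates


grid = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 8],
    [5, 0, 0, 0],
]
found = extract_candidates(grid, 2, 4)
```

The candidates of all blocks are gathered into a single list for the layer, which is the input to the later non-maximum suppression step.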
A device for extracting key points of an image, wherein the device comprises:

An acquisition module, configured to acquire an image pyramid of an image, where the image pyramid comprises N layers of images, and N > 1; and

A determining module, configured to determine a target window size according to the i-th layer image of the image pyramid, and determine at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and to determine the output key points of each layer image of the image pyramid as key points of the image.
The apparatus according to claim 12, wherein the determining module is configured to:

Extract at least one candidate key point of the i-th layer image of the image pyramid, determine the target window size according to the i-th layer image, and perform non-maximum suppression processing on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.
The device according to claim 12, wherein the image is a static image, and the determining module is configured to:

If the i-th layer image is the first-layer image, determine a preset window size as the target window size;

Otherwise, determine the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image, and determine the target window size according to the reference image.
The device according to claim 12, wherein the image is a frame image in a video stream, and the determining module is configured to:

If the image is any frame image other than the first frame image in the video stream, determine the i-th layer image of the image pyramid of the previous frame image of the image as the reference image of the i-th layer image of the image pyramid of the image, and determine the target window size according to the reference image.
The apparatus according to claim 15, wherein the determining module is further configured to:

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is the first-layer image, determine the preset window size as the target window size;

If the image is the first frame image in the video stream, and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, determine the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determine the target window size according to the reference image.
The apparatus according to any one of claims 14-16, wherein the determining module is configured to:

Determine a first number of key points of the reference image, and determine the target window size according to the first number of key points and a second number of key points of the reference image, where the first number of key points is a preset number of key points, and the second number of key points is the number of output key points of the reference image.
The apparatus according to claim 17, wherein the determining module is configured to:

Determine the first number of key points corresponding to the reference image according to the layer number of the reference image in the image pyramid to which the reference image belongs, where a preset correspondence exists between the layer number and the first number of key points.
The device according to claim 18, wherein, in the preset correspondence, the first numbers of key points of two adjacent layers of images satisfy a preset ratio, and the preset ratio is equal to the ratio of the numbers of pixels of two adjacent layers of images in the image pyramid.
The apparatus according to claim 17, wherein the determining module is configured to:

Determine a ratio between the second number of key points and the first number of key points;

Determine, according to a preset correspondence between ratio ranges and window levels, a target ratio range in which the ratio is located and a target window level corresponding to the target ratio range; and

Determine the target window size according to the target window level.
The apparatus according to claim 20, wherein the determining module is configured to:

Determine a target window group corresponding to the i-th layer image according to a preset correspondence between layer numbers and window groups, where each window group includes a window size corresponding to at least one window level; and

Determine the window size corresponding to the target window level in the target window group as the target window size.
The apparatus according to claim 13, wherein the determining module is configured to:

For each image block of the i-th layer image, determine a feature score of each pixel in the image block according to a preset feature detection algorithm, and determine each pixel whose feature score is greater than a preset threshold as a candidate key point of the image block; and

Determine the candidate key points of each image block of the i-th layer image as the at least one candidate key point of the i-th layer image.
An image processing device, wherein the image processing device includes a memory and a processor, the memory is configured to store instructions, and the processor is configured to call the instructions to execute the method according to any one of claims 1-11.
A computer-readable storage medium, characterized in that when the computer-readable storage medium is run on an image processing device, the image processing device is caused to perform the method according to any one of claims 1-11.