WO2020056688A1 - Method and apparatus for extracting key points of an image - Google Patents

Method and apparatus for extracting key points of an image

Info

Publication number
WO2020056688A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
layer
key points
window size
pyramid
Prior art date
Application number
PCT/CN2018/106778
Other languages
English (en)
Chinese (zh)
Inventor
左韶军
林天鹏
占云龙
赵强
王林召
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201880095485.3A priority Critical patent/CN112424787B/zh
Priority to PCT/CN2018/106778 priority patent/WO2020056688A1/fr
Publication of WO2020056688A1 publication Critical patent/WO2020056688A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the present application relates to the field of electronic technology, and in particular, to a method and device for extracting key points of an image.
  • key points in image pixels are usually used for image matching.
  • Key points (also known as feature points or points of interest) of the image are prominent and representative points in the image. These points can be used to identify images, perform image matching, or implement 3D (3D) reconstruction.
  • the image pyramid is first constructed by down-sampling the image layer by layer: the initial image is used as the first-layer image, and all pixels of the i-th layer image are down-sampled at a certain ratio to obtain the (i+1)-th layer image, where the pixel size of the (i+1)-th layer image is smaller than that of the i-th layer image, and i is a positive integer greater than or equal to 1.
  • each layer image is divided into image blocks. For each image block, the feature score of each pixel is determined based on the FAST (Features from Accelerated Segment Test) algorithm, and the pixels whose feature score is greater than a threshold are determined as key points of the image block.
  • non-maximum suppression can then be performed on the key points of the image block with a window of a preset size: within a window centered on a key point, only the key point with the highest feature score is kept, which can be understood as a local maximum search.
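  • The fixed-window suppression described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `non_max_suppress`, the list-of-lists score map, and the rule that tied scores are both kept are assumptions made for the example.

```python
def non_max_suppress(score_map, window):
    """Return (row, col) positions whose score is positive and is the
    largest inside the window x window neighbourhood centred on them.

    score_map: 2-D list of feature scores, 0 meaning "not a key point".
    window:    odd window size in pixels (assumed square).
    Ties are kept (assumption): a point survives if no neighbour is
    strictly greater.
    """
    half = window // 2
    rows, cols = len(score_map), len(score_map[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            s = score_map[r][c]
            if s <= 0:
                continue  # not a candidate key point
            neighbourhood = (
                score_map[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
            if s >= max(neighbourhood):
                kept.append((r, c))
    return kept


scores = [
    [0, 0, 0, 0, 0],
    [0, 30, 0, 25, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 40],
]
small = non_max_suppress(scores, 3)  # all three points survive
large = non_max_suppress(scores, 5)  # the point scored 25 is suppressed
```

  • With the same score map, the larger window keeps fewer key points, which is why the window size controls how many key points an image yields.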
  • the above scheme for obtaining image key points has at least the following problems:
  • the size of the non-maximum suppression window is fixed. For an image with complex texture, the number of extracted key points may be too large; for an image with simple texture, the number of extracted key points may be too small. That is, the number of key points extracted from images of different texture complexity differs greatly.
  • the imbalance in the number of key points may affect subsequent processing. For example, too many key points increase the amount of computation in image matching, while too few key points may lower the accuracy of image matching.
  • This embodiment provides a method and a device for extracting key points of an image, which can balance the number of key points extracted by each image.
  • the technical solution is as follows:
  • a method for extracting key points of an image includes: an image processing device acquires an image pyramid of an image, where the image pyramid includes N layers of images, N > 1; determines a target window size according to the i-th layer image of the image pyramid, and determines at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and determines the output key points of each layer image of the image pyramid as the key points of the image.
  • the image processing device can adjust the non-maximum suppression window size based on each layer of the image, so that the window size can change with each layer image and the number of key points extracted from each image can be balanced.
  • determining the target window size according to the i-th layer image of the image pyramid, and determining at least one output key point of the i-th layer image according to the target window size, includes: extracting at least one candidate key point of the i-th layer image of the image pyramid; determining the target window size according to the i-th layer image; and performing non-maximum suppression on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.
  • the image processing device can perform non-maximum suppression on the candidate key points of each layer image. Because candidate key points are extracted first, fewer points need to be handled during non-maximum suppression, which improves processing efficiency.
  • the image is a static image, and determining the target window size according to the i-th layer image of the image pyramid includes: if the i-th layer image is the first-layer image, the image processing device determines the preset window size as the target window size; otherwise, the image processing device determines the (i-1)-th layer image in the image pyramid of the image as the reference image of the i-th layer image, and determines the target window size according to the reference image.
  • adjacent layers of the same image pyramid are obtained from one another by sampling, so the textures of the different layers can be the same as that of the original image. Therefore, the reference image obtained by the above processing is similar in texture complexity to the image from which key points are extracted.
  • the image is a frame image in a video stream, and determining the target window size according to the i-th layer image of the image pyramid includes: if the image is any frame other than the first frame image in the video stream, the image processing device determines the i-th layer image of the image pyramid of the previous frame image as the reference image of the i-th layer image of the image pyramid of the image, and determines the target window size according to the reference image.
  • the images presented by two adjacent frames may have similar pictures. Therefore, the reference image obtained through the above processing is similar in texture complexity to the image from which the key points are extracted.
  • the method further includes: if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is the first-layer image, the image processing device determines the preset window size as the target window size; if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, the image processing device determines the (i-1)-th layer image as the reference image of the i-th layer image of the image pyramid of the image, and determines the target window size according to the reference image. Because the first frame image has no previous frame image, its reference image can be determined in the same way as for a static image, which improves the accuracy of the target window size.
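  • The reference-image rules for static images and video frames can be summarised in a short sketch. The function name `reference_image`, the 1-based layer index, the 0-based frame index, and the use of `None` to signal "use the preset window size" are all assumptions made for illustration.

```python
def reference_image(layer, frame_index=None):
    """Pick the reference image for a layer of an image pyramid.

    layer:       1-based layer number inside the pyramid.
    frame_index: None for a static image, otherwise the 0-based position
                 of the frame in the video stream (assumed convention).

    Returns (frame_index, layer) of the reference image, or None when no
    reference exists and the preset window size should be used.
    """
    if frame_index is not None and frame_index > 0:
        # Later video frame: same layer of the previous frame's pyramid.
        return (frame_index - 1, layer)
    if layer == 1:
        # First layer of a static image or of the first frame.
        return None
    # Static image or first frame: previous layer of the same pyramid.
    return (frame_index, layer - 1)
```

  • For example, layer 2 of frame 5 refers to layer 2 of frame 4, while layer 4 of the first frame falls back to layer 3 of the same pyramid.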
  • determining the target window size according to the reference image includes: the image processing device determines the first key point number of the reference image, and determines the target window size according to the first key point number and the second key point number of the reference image, where the first key point number is a preset number of key points and the second key point number is the number of output key points of the reference image.
  • determining the first key point number of the reference image includes: determining the first key point number corresponding to the reference image according to the layer number of the reference image in the image pyramid to which it belongs, where there is a preset correspondence between the layer number of the reference image in the image pyramid and the first key point number of the reference image.
  • the number of first key points of two adjacent layers of images meets a preset ratio
  • the preset ratio is equal to the ratio of the number of pixels of two adjacent layers of images in the image pyramid.
  • Each layer of the image pyramid has a different number of first keypoints.
  • the expected number of keypoints in two adjacent layers meets a preset ratio.
  • determining the target window size according to the first key point number and the second key point number of the reference image includes: the image processing device determines the ratio of the second key point number to the first key point number; determines, according to a preset correspondence between ratio ranges and window levels, the target ratio range in which the ratio lies and the target window level corresponding to the target ratio range; and determines the target window size according to the target window level.
  • the ratio of the number of second keypoints to the number of first keypoints can be used to measure the degree to which the number of output keypoints is close to the preset number of keypoints.
  • different ratio ranges correspond to different window levels; the larger the ratio, the larger the corresponding window level can be.
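  • A minimal sketch of the ratio-to-level lookup follows. The boundary values in `RATIO_BOUNDS`, the function name `window_level`, and the convention that levels start at 0 are hypothetical; the text only states that preset ratio ranges correspond to window levels. The sketch assumes that a larger ratio maps to a higher level.

```python
from bisect import bisect_right

# Hypothetical boundaries between ratio ranges (not from the source).
# They define the ranges [0, 0.5), [0.5, 1.0), [1.0, 1.5), [1.5, inf).
RATIO_BOUNDS = [0.5, 1.0, 1.5]


def window_level(output_count, expected_count):
    """Map the ratio (second key point number / first key point number)
    of the reference image to a window level.

    The level is the index of the ratio range that contains the ratio,
    so a larger ratio yields a higher level.
    """
    ratio = output_count / expected_count
    return bisect_right(RATIO_BOUNDS, ratio)
```

  • With these example boundaries, a reference image that produced three times the expected number of key points maps to the highest level, which would select the largest suppression window.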
  • determining the target window size according to the target window level includes: determining the target window group corresponding to the i-th layer image according to a preset correspondence between layer numbers and window groups, where a window group includes the window size corresponding to at least one window level; and determining the window size corresponding to the target window level in the target window group as the target window size.
  • the pixel size of each layer image gradually decreases. If the window size of the same window level is set to decrease layer by layer, the number of key points in each layer image can be balanced.
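  • The layer-to-window-group lookup can be sketched as a small table. All sizes in `WINDOW_GROUPS` and the name `target_window_size` are hypothetical values chosen for illustration; the only property taken from the text is that the window size for a given level does not grow from one layer to the next.

```python
# Hypothetical window groups: layer number -> window size per window level.
# For each level, the size is non-increasing as the layer number grows,
# mirroring the shrinking pixel size of higher pyramid layers.
WINDOW_GROUPS = {
    1: [3, 5, 7, 9],
    2: [3, 5, 7, 7],
    3: [3, 3, 5, 5],
}


def target_window_size(layer, level):
    """Return the window size for the given layer and window level."""
    group = WINDOW_GROUPS[layer]  # target window group of this layer
    return group[level]           # size for the target window level
```

  • For instance, the highest level gives a 9-pixel window on layer 1 but only a 5-pixel window on layer 3 in this example table.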
  • each layer of the image pyramid expresses the image at a different scale, that is, simulates the image at a different level of blur. Equalizing the number of key points in each layer ensures that each level of blur has a certain number of key points, which improves the accuracy of image matching.
  • extracting at least one candidate key point of the i-th layer image includes: for each image block of the i-th layer image, determining a feature score of each pixel in the image block according to a preset feature detection algorithm, and determining the pixels whose feature score is greater than a preset threshold as candidate key points of the image block; and determining the candidate key points of each image block of the i-th layer image as the at least one candidate key point of the i-th layer image.
  • an apparatus for extracting key points of an image includes at least one module, and the at least one module is configured to implement the method for extracting key points of an image.
  • an image processing device includes a memory and a processor.
  • the memory is used to store instructions.
  • the processor is used to call the instructions and execute the method for extracting key points of the image.
  • a computer-readable storage medium is provided, and when the computer-readable storage medium is run on an image processing device, the image processing device is caused to perform the above-mentioned method for extracting key points of an image.
  • a computer program product containing instructions, which, when the computer program product runs on an image processing device, causes the image processing device to execute the method for extracting key points of an image described above.
  • the target window size, that is, the non-maximum suppression window size, can be adjusted based on each layer image in the image pyramid of the image, so that the window size can change with each layer image and the number of key points in each image is balanced, reducing the negative impact on image matching.
  • FIG. 1 is an implementation environment diagram provided by this embodiment
  • FIG. 2 is a schematic structural diagram of an image processing device according to this embodiment.
  • FIG. 3 is a flowchart of a method for extracting key points of an image provided by this embodiment
  • FIG. 4 is a flowchart of a method for obtaining candidate key points provided by this embodiment
  • FIG. 5 is a schematic diagram of a reference image provided by this embodiment.
  • FIG. 6 is a flowchart of a method for extracting key points of an image provided by this embodiment
  • FIG. 7 is a flowchart of a method for extracting key points of an image provided by this embodiment.
  • FIG. 8 is a schematic diagram of a reference image provided by this embodiment.
  • FIG. 9 is a schematic diagram of an apparatus for extracting key points of an image provided by this embodiment.
  • FIG. 1 is a diagram of an implementation environment provided by this embodiment.
  • the implementation environment includes a plurality of terminals 101 and an image processing apparatus 102 for providing services to the plurality of terminals.
  • the plurality of terminals 101 are connected to the image processing apparatus 102 through a wireless or wired network.
  • the image processing device 102 may provide a service for the terminal 101 to extract key points of an image.
  • the image processing device 102 may further have at least one database for storing images whose key points are to be extracted, the key points of those images, and the like.
  • the terminal 101 may send an image whose key points are to be extracted to the image processing device 102.
  • the image processing apparatus 102 may include a processor 210 and a transceiver 220.
  • the transceiver 220 may be connected to the processor 210 as shown in FIG. 2.
  • the transceiver 220 may be used to send and receive messages or data; for example, it may receive an image whose key points are to be extracted, sent by the terminal 101.
  • the processor 210 may be a control center of the image processing apparatus 102, and uses various interfaces and lines to connect various parts of the entire image processing apparatus 102, such as the transceiver 220 and the like.
  • the processor 210 may be an ASIC (Application-Specific Integrated Circuit), which may be used to extract key points of an image.
  • the processor 210 may include one or more processing units.
  • the processor 210 may integrate an application processor and a modem, where the application processor mainly processes an operating system and the modem mainly processes wireless communications.
  • the processor 210 may also be a digital signal processor, a central processing unit, or the like.
  • the image processing apparatus 102 may further include a memory 230.
  • the memory 230 may be configured to store images whose key points are to be extracted, key points of images, and the like.
  • the image processing device 102 may further include an input / output interface 240, which may provide an interface between the processor 210 and a peripheral interface module.
  • the peripheral interface module may be a button or the like.
  • the present application introduces a reference image to determine the window size of the non-maximum suppression.
  • the reference image of the i-th layer image can be of the following two types: first, when the image is a static image or a frame image in a video stream, the reference image of the i-th layer image can be the (i-1)-th layer image in the same image pyramid; second, when the image is a frame image in a video stream, the reference image of the i-th layer image can be the i-th layer image in the image pyramid of the previous frame image.
  • a static image may refer to an independent image, whose extracted key points are not related to other images; for example, a static image may be a captured photo. Correspondingly, a frame image in a video stream is not an independent image.
  • the above two types of reference images have in common that the texture complexity of the reference image and the image of the key point to be extracted is similar.
  • the reason is as follows. For the first type of reference image, adjacent layers of the same image pyramid are obtained from one another by sampling, so the textures of the different layers can be the same as that of the original image. For the second type of reference image, the time interval between adjacent frames is small (for example, only 40 milliseconds), so two adjacent frames may show similar pictures, that is, their textures are similar.
  • images with similar texture complexity also have similar image features, that is, when key points are extracted by the same method, the numbers of key points obtained are similar. Therefore, if the key points of the reference image have already been extracted when key points are extracted from the image, the number of key points of the reference image can be used to judge whether the method used to extract them is appropriate, and thus whether to apply the same method or how to adjust it. Because the non-maximum suppression window filters the key points, the number of key points of the reference image can also be used to judge whether the corresponding non-maximum suppression window size is appropriate and to adjust it. This equalizes the number of key points extracted from each image and avoids large differences in the number of key points between images whose texture complexity differs greatly.
  • the reference image of the i-th layer image may be other reference images obtained based on the same concept in addition to the above two types. These reference images can be applied to the method for extracting key points of an image provided in this application, which is not limited in this application.
  • An embodiment of the present application provides a method for extracting key points of a static image or a video image. Taking the case where the reference image of the i-th layer image is the (i-1)-th layer image in the same image pyramid as an example, the processing flow of the method for extracting key points of an image shown in FIG. 3 is described in detail below in combination with specific implementations:
  • in step 301, the image processing apparatus acquires an image pyramid of an image.
  • the image pyramid may include N-layer images, N> 1.
  • the image processing device has the ability to extract key points of the image. If the image processing device provides a service for extracting key points of an image for other terminals, it can receive still images or video streams sent by the terminals. Alternatively, the image processing device may have a function of acquiring an image (for example, the image processing device may be a monitoring device), and then key points may be extracted from the acquired image.
  • the image processing device may also store images whose key points are to be extracted.
  • the image processing device may extract key points for each frame of images in the video stream in real time, or may extract key points for stored images, which is not limited in this embodiment.
  • the image processing device may construct an image pyramid.
  • the embodiment does not limit the specific method of constructing the image pyramid.
  • the image pyramid may be constructed based on an upsampling or downsampling method.
  • the process of constructing the image pyramid by the image processing device may be as follows: the image processing device uses the image as the first-layer image of the image pyramid and downsamples the image layer by layer according to a preset ratio to obtain the next-layer image; when the construction stop condition is reached, it stops downsampling and obtains the image pyramid of the image.
  • Downsampling refers to generating a thumbnail of an image
  • the preset ratio may refer to a downsampling ratio.
  • the image pyramid formed by downsampling takes the image whose key points are to be extracted as the original image and generates thumbnails at multiple resolutions, that is, the image is expressed at multiple scales.
  • the construction stop condition may be that the constructed image pyramid reaches a preset number of layers, or that the highest-layer image reaches a preset size. For example, for an image with a size of 992 * 744, the image pyramid is constructed with a preset ratio of 1.2, and construction stops when the image pyramid reaches the eighth layer. The pixel sizes of the first to eighth layers of the image pyramid are 992 * 744, 827 * 620, 689 * 517, 574 * 431, 478 * 359, 399 * 299, 332 * 249, and 277 * 208.
  • after the image processing device completes the construction of the image pyramid, it can start from the first-layer image and extract the key points of the images of the image pyramid layer by layer.
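  • The layer sizes in the example can be reproduced with a few lines of arithmetic. The sketch below assumes that each layer is scaled directly from the original size (divided by the ratio raised to the layer offset) and rounded to the nearest integer; the text does not state the rounding rule, but this assumption matches all eight listed sizes. The function name `pyramid_sizes` is illustrative.

```python
def pyramid_sizes(width, height, ratio, layers):
    """Return the (width, height) of each pyramid layer.

    Layer k (0-based) is the original size divided by ratio**k, rounded
    to the nearest integer (assumed rounding rule). Construction stops
    when the preset number of layers is reached.
    """
    return [
        (round(width / ratio ** k), round(height / ratio ** k))
        for k in range(layers)
    ]


sizes = pyramid_sizes(992, 744, 1.2, 8)
```

  • Scaling each layer from the previous rounded layer instead would give 398 rather than 399 for the sixth layer width, which is why the sketch scales from the original size.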
  • the image processing device may also acquire an image pyramid constructed by other devices on the image, which is not limited in this embodiment.
  • in step 302, the image processing device extracts at least one candidate key point of the i-th layer image of the image pyramid.
  • the image processing device can detect candidate key points of the images of the image pyramid layer by layer. For example, candidate key points can be detected based on the FAST algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, or the FREAK (Fast Retina Keypoint) algorithm.
  • the processing of step 302 may be as follows: for each image block of the i-th layer image, the image processing device determines a feature score of each pixel in the image block according to a preset feature detection algorithm, and determines the pixels whose feature score is greater than a preset threshold as candidate key points of the image block; the candidate key points of each image block of the i-th layer image are determined as the at least one candidate key point of the i-th layer image.
  • the FAST algorithm can compare each pixel in the image with the pixels within a preset circular range around it and compute the gradient of the pixel, that is, the feature score of the pixel.
  • the FAST algorithm can also set an initial threshold in advance. If the feature score of a pixel is greater than the initial threshold, it indicates that the pixel is a corner, and these pixels can be used as candidate key points.
  • for example, the initial threshold can be 20, and a lower threshold (the low threshold) can be 7.
  • the key points detected based on the low threshold are generally more than the key points detected based on the initial threshold.
  • if the image processing device does not detect any key point of the image based on the initial threshold, it needs to re-detect the image based on the low threshold. Re-detection increases processing time; in particular, when key points are extracted in hardware, it requires more registers and a longer processing delay, which raises cost and lowers processing efficiency. Therefore, this embodiment provides another method for obtaining candidate key points. For each image block, a feature score of each pixel in the image block is determined.
  • if the maximum feature score in the image block is greater than the first threshold, pixels with feature scores greater than the first threshold are determined as candidate key points of the image block; otherwise, pixels with feature scores greater than the second threshold are determined as candidate key points of the image block. The candidate key points of each image block of the i-th layer image are determined as at least one candidate key point of the i-th layer image.
  • the method may be as follows:
  • in step 3021, the image processing device divides each layer image of the image pyramid into a plurality of image blocks of a preset size. For example, for each layer image, the image processing device may divide the image into a plurality of image blocks with a pixel size of 31 * 15.
  • in step 3022, for each image block, the image processing device determines a feature score of each pixel in the image block, and determines the pixels whose feature score is greater than a second threshold as first candidate key points of the image block.
  • the second threshold may be the above-mentioned low threshold. After determining the feature score of each pixel, the image processing device may obtain a corresponding score map, in which the feature score is recorded at the position of each pixel. The image processing device may then detect key points of the image based on the second threshold: if a feature score is greater than the second threshold, it is retained on the score map, that is, the pixel is kept as a first candidate key point; if a feature score is not greater than the second threshold, it can be set to 0 on the score map. Detecting key points directly with the low threshold avoids repeated detection.
  • in step 3023, for each image block, the image processing device determines whether the maximum feature score is greater than a first threshold.
  • the image processing device may also determine and store the maximum feature score, for example, while calculating the score map, use a register to count the maximum feature score of candidate key points in the image block. Furthermore, after determining the feature score of each pixel, the image processing device can determine whether the maximum feature score of the pixel in the image block is greater than a first threshold.
  • the first threshold may be the above-mentioned initial threshold, that is, the first threshold is greater than the above-mentioned second threshold.
  • in step 3024, if the maximum feature score is greater than the first threshold, the pixels among the first candidate key points whose feature score is greater than the first threshold are determined as second candidate key points of the image block, and the second candidate key points are determined as the candidate key points of the image block.
  • in this case, the first candidate key points can be filtered based on the first threshold. That is, if the feature score of a first candidate key point is greater than the first threshold, the feature score is retained on the score map and the point becomes a second candidate key point; if the feature score of a first candidate key point is not greater than the first threshold, the feature score can be set to 0 on the score map.
  • after the second candidate key points are determined, they can be determined as the candidate key points of the image block.
  • in step 3025, if the maximum feature score is not greater than the first threshold, the first candidate key points are determined as the candidate key points of the image block.
  • in this case, the first candidate key points are not filtered, that is, they are directly determined as the candidate key points of the image block.
  • in step 3026, the image processing device determines the candidate key points of each image block of each layer image as the candidate key points of that layer image.
  • when the candidate key points of an image block are determined, a score map corresponding to the image block can be obtained at the same time.
  • on the score map, the position of each candidate key point holds the corresponding feature score, and the value at the position of every other pixel is 0.
  • the image processing device can gather the candidate key points of each image block as the candidate key points of the layer image and, at the same time, stitch the score maps of the image blocks together according to the position of each image block in the layer image to obtain the score map of the layer image.
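  • The per-block selection described above (ending with step 3026) can be sketched as below. The function name `block_candidates` and the list-of-lists representation are assumptions; the default thresholds 20 and 7 are the example values mentioned in the text. Feature scores are assumed to be non-negative and the block non-empty.

```python
def block_candidates(scores, first_threshold=20, second_threshold=7):
    """Build the score map of one image block with two thresholds.

    scores: 2-D list of non-negative feature scores for the block.
    Returns a score map in which non-candidate positions are set to 0.
    """
    max_score = 0
    score_map = []
    for row in scores:
        # Keep only first candidate key points (score above the low
        # threshold) while tracking the block's maximum feature score.
        score_map.append([s if s > second_threshold else 0 for s in row])
        max_score = max(max_score, max(row))
    if max_score > first_threshold:
        # The block has strong corners: filter the first candidates
        # again with the higher threshold.
        score_map = [
            [s if s > first_threshold else 0 for s in row]
            for row in score_map
        ]
    return score_map


strong = block_candidates([[5, 8, 25], [0, 21, 10]])
weak = block_candidates([[5, 8, 12], [0, 7, 10]])
```

  • The feature scores are computed once per block; the second filtering pass only rereads the score map, so no block needs to be re-detected with a different threshold.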
  • the i-th layer image of the image pyramid is an image of any layer.
  • for the i-th layer image of the image pyramid, after the candidate key points and the score map are obtained, non-maximum suppression can be performed on the candidate key points.
  • the size of the non-maximum suppression window needs to be determined, and the window may be a convolution kernel.
  • This embodiment takes the (i-1)-th layer image as the reference image of the i-th layer image as an example, and uses the reference image to determine the size of the non-maximum suppression window of the i-th layer image.
  • in step 303, the image processing apparatus determines whether the i-th layer image is the first-layer image.
  • in step 304, if the i-th layer image is the first-layer image, the image processing device uses the preset window size as the target window size corresponding to the i-th layer image.
  • the preset window size may refer to a default window size.
  • because the image processing device starts from the first-layer image and extracts the key points of the images of the image pyramid layer by layer, it knows the layer number of the current image and can therefore judge whether the current image is the first-layer image. If the current image is the first-layer image, no key points have yet been extracted from a similar image, so the image processing device may determine the preset window size as the target window size. That is, non-maximum suppression is performed on the first-layer image based on the preset window size.
  • in step 305, if the i-th layer image is any layer image other than the first-layer image, the image processing device determines the (i-1)-th layer image in the image pyramid of the image as the reference image of the i-th layer image, determines the first key point number of the reference image, and determines the target window size corresponding to the i-th layer image according to the first key point number and the second key point number of the reference image.
  • the first number of keypoints may be a preset number of keypoints, and the first number of keypoints may refer to a desired number of keypoints extracted from a reference image.
  • the second number of key points may refer to the number of output key points, and the second number of key points of the reference image may be the number of output key points of the reference image.
  • the image of the previous layer can be determined as the reference image, and the non-maximum suppression window size of the current image is determined according to how well the key points extracted from the previous layer meet the requirement, that is, how close the number of output key points of the reference image is to the preset number of key points.
  • each layer image in the image pyramid has a different first key point number. Therefore, the image processing device may determine the first key point number as follows: the image processing device determines the first key point number corresponding to the reference image according to the layer number of the reference image in the image pyramid to which it belongs.
  • the number of first key points of two adjacent layers of images meets a preset ratio, and the preset ratio is equal to the ratio of the number of pixels of two adjacent layers of images in the image pyramid. That is, it is ensured that the number of the first key points in each layer of the image occupies a certain proportion in the total number of pixels.
  • the correspondence between the number of layers and the number of first key points can be set by the technician according to actual needs.
  • the process of establishing the correspondence between the number of layers and the number of first key points can also be as follows:
  • layer 1 and its number of first key points are stored as one correspondence entry; for the k-th layer, where k > 1, the number of first key points corresponding to the k-th layer is determined according to the preset number of first key points corresponding to the (k-1)-th layer, and the k-th layer and its number of first key points are stored as one correspondence entry.
  • the number of first key points of the first layer image of the image pyramid can be calculated from the number of image pixels.
  • the number of first key points can be 1% of the number of image pixels.
  • the image processing device may calculate the number of the first key points layer by layer according to a preset ratio of down-sampling when constructing the image pyramid, and ensure that the ratio of the number of the first key points to the total number of pixels in each layer image is constant.
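  • The layer-by-layer calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `expected_keypoint_counts`, the per-axis down-sampling factor `scale=0.5`, and the rounding rule are assumptions; only the 1% share of pixels comes from the example in the text.

```python
def expected_keypoint_counts(width, height, num_layers, scale=0.5, fraction=0.01):
    """Return the number of first (expected) key points for each pyramid layer.

    width, height: size of the first-layer image in pixels.
    scale: assumed per-axis down-sampling factor between adjacent layers.
    fraction: share of pixels expected to be key points (1% in the text's example).
    """
    counts = []
    w, h = float(width), float(height)
    for _ in range(num_layers):
        # Keep the share of key points among all pixels constant in every layer.
        counts.append(max(1, round(w * h * fraction)))
        w *= scale
        h *= scale
    return counts


if __name__ == "__main__":
    # With scale=0.5 each layer has a quarter of the pixels of the layer below,
    # so the expected counts of adjacent layers differ by the same factor of 4.
    print(expected_keypoint_counts(640, 480, 3))
```

  • Because the count is tied to the pixel count, the ratio between the expected counts of two adjacent layers equals the ratio between their pixel counts, which is the preset ratio described above.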
  • The specific processing for determining the target window size corresponding to the i-th layer image can be as follows: the image processing device determines the ratio of the number of second key points to the number of first key points; according to a preset correspondence between ratio ranges and window levels, it determines the target ratio range in which the ratio falls and the target window level corresponding to that range; and it determines the target window size according to the target window level.
  • The ratio of the number of second key points to the number of first key points measures how close the number of output key points is to the preset number of key points.
  • The image processing device may store the correspondence between ratio ranges and window levels in advance. The same correspondence applies to every layer.
  • the corresponding relationship between the ratio range and the window level can be shown in Table 1 below:
  • the window levels are respectively TRENTA, VENTI, GRANDE, and TALL.
  • TRENTA has the largest cup size and TALL has the smallest cup size; that is, the window sizes of the window levels are ordered TRENTA > VENTI > GRANDE > TALL.
  • The number of second key points of the reference image can be obtained, and the ratio of the number of second key points to the number of first key points can then be calculated.
  • The image processing device can determine, in the correspondence between ratio ranges and window levels, the target ratio range in which the ratio falls, and then the corresponding target window level. After determining the target window level, it can obtain the window size corresponding to that level and determine it as the target window size.
  • Increasing the window size of non-maximum suppression reduces the number of key points obtained after non-maximum suppression processing. Therefore, if the ratio of the number of second key points to the number of first key points is too large, for example greater than 2, the window size can be increased appropriately to reduce the number of second key points of the current image. Because the reference image and the current image have similar textures, adjusting the window size according to the number of key points actually output for the reference image brings the number of key points extracted from the current image close to the number of first key points, and balances the number of key points across the layers of the image.
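  • The mapping from ratio to window level can be sketched as a range lookup. The boundaries in `RATIO_BOUNDS` below are hypothetical placeholders, since the ranges of Table 1 are not reproduced in this text; they were chosen only so that a ratio of 1.6 maps to VENTI and a ratio above 2 maps to the largest window, as in the examples given. The function name `window_level` is also an assumption.

```python
import bisect

# Hypothetical ratio boundaries (NOT the values of Table 1, which are not available here).
RATIO_BOUNDS = [1.0, 1.5, 2.0]
# Window levels ordered from the smallest window to the largest window.
WINDOW_LEVELS = ["TALL", "GRANDE", "VENTI", "TRENTA"]


def window_level(output_count, expected_count):
    """Map the ratio (second key points / first key points) to a window level."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    ratio = output_count / expected_count
    # bisect_right returns the index of the ratio range that contains the ratio.
    return WINDOW_LEVELS[bisect.bisect_right(RATIO_BOUNDS, ratio)]


if __name__ == "__main__":
    print(window_level(160, 100))  # ratio 1.6
    print(window_level(250, 100))  # ratio 2.5, too many key points
```

  • A larger ratio selects a larger window level, so a reference layer that produced too many key points leads to stronger suppression in the current layer.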
  • Each of the above window levels may correspond to a fixed window size, that is, for images of different layers, the window sizes determined according to the same window level are the same.
  • Alternatively, for images of different layers, the window sizes determined according to the same window level may be different.
  • In that case, the processing for determining the target window size according to the target window level may be as follows: according to a preset correspondence between layer numbers and window groups, determine the target window group corresponding to the i-th layer image; then determine the window size corresponding to the target window level in the target window group as the target window size.
  • the image processing device may store in advance a window group corresponding to each layer of images in the image pyramid, and each window group may include a window size corresponding to at least one window level.
  • the correspondence between the number of layers and the window group can be shown in Table 2 below:
  • The window group of each layer image includes 4 window levels, which correspond to the above-mentioned TRENTA, VENTI, GRANDE, and TALL. It can be seen from Table 2 that the window sizes of the same level in different layers may be the same or different. In general, as the layer number increases, the window size of the same level gradually decreases.
  • The image processing device may determine the target window group corresponding to the layer number according to the above-mentioned correspondence between layer numbers and window groups. After determining the target window level in the above process, it can obtain the window size of the target window level in the target window group and use it as the non-maximum suppression window size.
  • the reference image is the first-layer image. If the ratio of the number of the second keypoints to the number of the first keypoints is 1.6, the window level can be determined to be VENTI, and the window size can be 21 * 11.
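  • The per-layer window groups can be sketched as a nested lookup table. All sizes in `WINDOW_GROUPS` are illustrative placeholders, because Table 2 is not reproduced in this text; only the 21 * 11 size appears in the example above, and placing it at the VENTI level of layer 2 is an assumption. The function name `target_window_size` is likewise hypothetical.

```python
# Illustrative window groups: layer number -> {window level: (width, height)}.
# These are placeholder values, not the contents of Table 2.
WINDOW_GROUPS = {
    1: {"TRENTA": (31, 15), "VENTI": (25, 13), "GRANDE": (21, 11), "TALL": (15, 7)},
    2: {"TRENTA": (25, 13), "VENTI": (21, 11), "GRANDE": (15, 7), "TALL": (11, 5)},
    3: {"TRENTA": (21, 11), "VENTI": (15, 7), "GRANDE": (11, 5), "TALL": (7, 3)},
}


def target_window_size(layer, level):
    """Return the window size of the given level in the window group of the layer."""
    try:
        return WINDOW_GROUPS[layer][level]
    except KeyError as exc:
        raise ValueError(f"no window size for layer {layer}, level {level}") from exc


if __name__ == "__main__":
    # Layer 2 uses layer 1 as its reference image; a ratio of 1.6 gave level VENTI.
    print(target_window_size(2, "VENTI"))
```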
  • Because the layers of the image pyramid express the image at multiple scales, that is, they simulate images with different degrees of blur, equalizing the number of key points across the layers ensures that each degree of blur contributes a certain number of key points, which improves image matching accuracy.
  • the target window size may also be determined based on other information of the reference image.
  • the target window size corresponding to the i-th layer image may be determined according to the size of the non-maximum suppression window of the reference image. Therefore, the processing after determining the reference image in steps 303-305 may also be: the image processing device determines the target window size according to the reference image.
  • In step 306, the image processing device performs non-maximum suppression processing on at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.
  • In the score map of the i-th layer image, the image processing device can take any candidate key point as the center of the window, determine the key point with the largest feature score within the window, and set the feature scores of the points that are not the maximum to 0; that is, non-maximum suppression is performed. The candidate key points in the entire score map are traversed in this way. When the traversal ends, at least one key point of the i-th layer image is obtained, and each such key point can be output as a key point of the image, that is, as an output key point.
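  • The window-based suppression described above can be sketched in plain Python. This is a simplified variant under stated assumptions: the function name `non_max_suppression` is hypothetical, the score map is a list of rows, candidates are `(row, col)` pairs, and instead of zeroing scores in place the sketch returns the surviving candidates; candidates with equal scores in the same window all survive.

```python
def non_max_suppression(score_map, candidates, win_w, win_h):
    """Keep the candidates that have the largest score inside their own window.

    score_map: 2-D list of feature scores, indexed as score_map[row][col].
    candidates: list of (row, col) candidate key points.
    win_w, win_h: window width and height in pixels, centred on each candidate.
    """
    rows, cols = len(score_map), len(score_map[0])
    half_w, half_h = win_w // 2, win_h // 2
    candidate_set = set(candidates)
    kept = []
    for r, c in candidates:
        is_max = True
        # Scan the window centred on (r, c), clipped to the image borders.
        for rr in range(max(0, r - half_h), min(rows, r + half_h + 1)):
            for cc in range(max(0, c - half_w), min(cols, c + half_w + 1)):
                if (rr, cc) in candidate_set and score_map[rr][cc] > score_map[r][c]:
                    is_max = False
        if is_max:
            kept.append((r, c))
    return kept


if __name__ == "__main__":
    scores = [[0] * 5 for _ in range(5)]
    scores[1][1], scores[1][2], scores[4][4] = 5, 9, 3
    cands = [(1, 1), (1, 2), (4, 4)]
    print(non_max_suppression(scores, cands, 3, 3))  # small window
    print(non_max_suppression(scores, cands, 7, 7))  # larger window keeps fewer points
```

  • Running the sketch with a larger window leaves fewer surviving key points, which is the property the target window size relies on.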
  • After the output key points of the i-th layer image are obtained, i can be increased by 1; that is, the processing of steps 302-306 is repeated for the (i+1)-th layer image to extract its key points. Once key points have been extracted from the top-layer image, the process continues with step 307.
  • the image processing device may also determine the target window size based on other methods, for example, the target window size may also be determined according to the pixel size of the i-th layer image of the image pyramid, which is not limited in this embodiment. Therefore, the processing of the above steps 302 to 307 may also be: the image processing device determines the target window size according to the i-th layer image of the image pyramid, and determines at least one output key point of the i-th layer image according to the target window size.
  • In step 307, the image processing device determines the output key points of each layer image of the image pyramid as the key points of the image.
  • the image processing device can describe the key points of the image, for example, the position, scale, and direction of the key points can be used to describe the key points. Furthermore, the image processing device can store the key points so that the key points can be used for image matching and other processing in subsequent processes.
  • In this embodiment, the image processing device uses the (i-1)-th layer image as the reference image. Since the texture complexity of the i-th layer image and the (i-1)-th layer image is similar, the window size of non-maximum suppression can be adjusted based on the reference image so that the number of extracted key points is close to the expected number of key points. This balances the number of key points across the images and reduces the negative impact on image matching.
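  • The control flow for a static image can be sketched as a loop over the layers. This is only an outline under stated assumptions: `process_pyramid`, `extract_layer`, and `level_for_ratio` are hypothetical names, the two callables stand in for the key point extraction and the ratio lookup that the text describes, and layers are 0-indexed in the code.

```python
def process_pyramid(expected_counts, extract_layer, level_for_ratio, preset_level):
    """Choose a window level for each layer and collect the output key point counts.

    expected_counts: number of first (expected) key points for each layer.
    extract_layer(i, level): hypothetical callable that runs extraction and
        non-maximum suppression on layer i and returns the number of output key points.
    level_for_ratio(ratio): hypothetical callable mapping a ratio to a window level.
    preset_level: window level used for the first layer, which has no reference image.
    """
    levels, outputs = [], []
    for i in range(len(expected_counts)):
        if i == 0:
            level = preset_level
        else:
            # The previous layer is the reference image of the current layer.
            ratio = outputs[i - 1] / expected_counts[i - 1]
            level = level_for_ratio(ratio)
        levels.append(level)
        outputs.append(extract_layer(i, level))
    return levels, outputs


if __name__ == "__main__":
    fake_outputs = {0: 250, 1: 20, 2: 6}  # made-up counts for demonstration only
    levels, outputs = process_pyramid(
        [100, 25, 6],
        lambda i, level: fake_outputs[i],
        lambda ratio: "LARGE" if ratio > 2 else "SMALL",
        "SMALL",
    )
    print(levels, outputs)
```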
  • In the above process, the reference image of the i-th layer image is the (i-1)-th layer image.
  • An embodiment of the present application further provides a method for extracting the key points of each frame image in a video stream. Taking as an example the case where the reference image is the i-th layer image in the image pyramid of the previous frame, the process flow of the method for extracting key points of the image shown in FIG. 6 is described in detail below. The content can be as follows:
  • In step 601, the image processing device acquires an image pyramid of an image.
  • The specific processing of step 601 is the same as that of step 301 described above, and details are not described herein again.
  • In step 602, the image processing device extracts at least one candidate key point of the i-th layer image of the image pyramid.
  • In step 603, the image processing device determines whether the image is the first frame image.
  • There is no necessary timing relationship between step 603 and steps 601-602: step 603 can be performed synchronously with steps 601-602 or before them, which is not limited in this embodiment.
  • In step 604, if the image is the first frame image, the image processing device uses the preset window size as the target window size corresponding to each layer of the image.
  • The image processing device can extract key points for each frame image in the chronological order of the video stream. Therefore, if the image is the first frame image, no similar image has had key points extracted before it, and for the image pyramid of the first frame the target window size corresponding to each layer image can be a preset window size. The preset window sizes of the layers can differ and can decrease layer by layer; for example, the preset window size of each layer image can be the window size of the TALL level in Table 2 above.
  • Optionally, for the first frame image, the reference image can also be determined based on the method provided in the foregoing embodiment. The corresponding process flow of the method for extracting key points of the image, shown in FIG. 7, can be as follows:
  • In step 6041, the image processing device determines whether the i-th layer image is the first-layer image.
  • In step 6042, if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is the first-layer image, the image processing device determines the preset window size as the target window size of the first-layer image of the image pyramid of the image.
  • In step 6043, if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, the image processing device determines the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determines the target window size according to the reference image.
  • In step 605, if the image is any frame image other than the first frame image in the video stream, the image processing device determines the i-th layer image of the image pyramid of the previous frame image as the reference image of the i-th layer image of the image pyramid of the image, determines the number of first key points of the reference image, and determines the target window size corresponding to the i-th layer image of the image pyramid according to the number of second key points and the number of first key points of the reference image.
  • For any frame image other than the first frame image, the image of the corresponding layer in the image pyramid of the previous frame image can be determined as the reference image, so that the size of the non-maximum suppression window of the current image is determined according to the degree to which the key points extracted from the reference image meet the requirement, that is, how closely the number of output key points of the reference image approaches the preset number of key points.
  • the specific processing for determining the target window size according to the reference image is the same as that in the foregoing embodiment, and details are not described herein again.
  • In step 606, the image processing device performs non-maximum suppression processing on at least one candidate key point of the i-th layer image of the image pyramid of the image according to the target window size to obtain at least one output key point of the i-th layer image.
  • In step 607, the image processing device determines the output key points of each layer image of the image pyramid as the key points of the image.
  • In this embodiment, the image processing device uses the corresponding layer image of the previous frame image as the reference image. Because adjacent frames have similar texture complexity, and images of the same layer in their image pyramids have a similar degree of blur, the size of the non-maximum suppression window can be adjusted based on the reference image so that the number of extracted key points is close to the expected number of key points. This balances the number of key points across the images and reduces the negative impact on image matching.
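  • The choice of reference image for a video stream can be summarized in a small decision function. This is a sketch under stated assumptions: the name `select_reference`, the 1-based indices, and the `chain_first_frame` flag (which switches between the FIG. 6 behavior, where every layer of the first frame uses the preset window size, and the FIG. 7 variant) are not part of the original text.

```python
def select_reference(frame_idx, layer_idx, chain_first_frame=True):
    """Return the (frame, layer) of the reference image, or None for the preset window.

    frame_idx, layer_idx: 1-based indices of the current frame and pyramid layer.
    chain_first_frame: if True, layers above the first in the first frame use the
        previous layer of the same frame as reference (FIG. 7 variant); if False,
        every layer of the first frame uses its preset window size (FIG. 6).
    """
    if frame_idx > 1:
        # Same layer in the image pyramid of the previous frame.
        return (frame_idx - 1, layer_idx)
    if layer_idx > 1 and chain_first_frame:
        # Previous layer in the image pyramid of the first frame.
        return (1, layer_idx - 1)
    return None


if __name__ == "__main__":
    print(select_reference(1, 1))         # first frame, first layer: preset window
    print(select_reference(1, 3))         # first frame, FIG. 7 variant
    print(select_reference(1, 3, False))  # first frame, FIG. 6 behavior
    print(select_reference(5, 3))         # later frame
```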
  • this embodiment also provides a device for extracting key points of an image.
  • the device may be the above-mentioned image processing device or configured in the above-mentioned image processing device. As shown in FIG. 9, the device includes:
  • an obtaining module 910, configured to obtain an image pyramid of an image, where the image pyramid includes N layers of images and N > 1; specifically, it can implement the obtaining function in the above steps 301 and 601, and other implicit steps;
  • a determining module 920, configured to determine a target window size according to the i-th layer image of the image pyramid, and determine at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and to determine the output key points of each layer image of the image pyramid as the key points of the image; specifically, it can implement the determining function in the above steps 302-307 and 602-607, and other implicit steps.
  • the determining module 920 is configured to:
  • extract at least one candidate key point of the i-th layer image of the image pyramid; determine a target window size according to the i-th layer image; and perform non-maximum suppression processing on the at least one candidate key point of the i-th layer image according to the target window size to obtain at least one output key point of the i-th layer image.
  • the image is a static image
  • the determining module 920 is configured to:
  • if the i-th layer image is the first-layer image, determine a preset window size as the target window size;
  • if the i-th layer image is any layer image other than the first-layer image, determine the (i-1)-th layer image in the image pyramid of the image as the reference image of the i-th layer image, and determine the target window size according to the reference image.
  • the image is a frame image in a video stream
  • the determining module 920 is configured to:
  • if the image is any frame image other than the first frame image in the video stream, determine the i-th layer image of the image pyramid of the previous frame image of the image as the reference image of the i-th layer image of the image pyramid of the image, and determine the target window size according to the reference image.
  • the determining module 920 is further configured to:
  • if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is the first-layer image, determine the preset window size as the target window size;
  • if the image is the first frame image in the video stream and the i-th layer image of the image pyramid of the image is an image other than the first-layer image, determine the (i-1)-th layer image of the image pyramid of the image as the reference image of the i-th layer image of the image pyramid of the image, and determine the target window size according to the reference image.
  • the determining module 920 is configured to:
  • the number of the second key points is the number of output key points of the reference image.
  • the determining module 920 is configured to:
  • determine the number of first key points corresponding to the reference image according to the layer number at which the reference image is located in the image pyramid to which it belongs, where a preset correspondence exists between the layer number and the number of first key points.
  • the number of first key points of two adjacent layers of images meets a preset ratio
  • the preset ratio is equal to the ratio of the number of pixels of two adjacent layers of images in the image pyramid.
  • the determining module 920 is configured to:
  • a target window size is determined according to the target window level.
  • the determining module 920 is configured to:
  • a window size corresponding to the target window level in the target window group is determined as a target window size.
  • the determining module 920 is configured to:
  • a feature score of each pixel in the image block is determined according to a preset feature detection algorithm, and each pixel whose feature score is greater than a preset threshold is determined as a candidate key point of the image block.
  • the candidate key points of each image block of the i-th layer image are determined as at least one candidate key point of the i-th layer image.
  • the foregoing obtaining module 910 may be implemented by a processor, and the determining module 920 may be implemented by a processor and a memory together.
  • In the embodiments of this application, the target window size can be adjusted based on each layer image in the image pyramid of the image. That is, the window size of non-maximum suppression is adjusted so that it can change with each layer of the image, which balances the number of key points across the images and reduces the negative impact on image matching.
  • It should be noted that, when the device for extracting image key points provided by the foregoing embodiment extracts key points, the division into the foregoing functional modules is used only as an example. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the image processing device can be divided into different functional modules to complete all or part of the functions described above.
  • the apparatus for extracting key points of an image provided by the foregoing embodiment belongs to the same concept as the method embodiment for extracting key points of an image. For specific implementation processes, refer to the method embodiments, and details are not described herein again.
  • All or part of the foregoing embodiments may be implemented by software, hardware, or a combination thereof.
  • When implemented using software, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, twisted pair) or wirelessly (for example, infrared, radio, microwave).
  • The computer-readable storage medium may be any medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more media.
  • the medium may be a magnetic medium (such as a floppy disk, a hard disk, a magnetic tape, etc.), an optical medium (such as an optical disk, etc.), or a semiconductor medium (such as a solid state hard disk, etc.).

Abstract

Disclosed are a method and an apparatus for extracting key points of an image, relating to the field of electronic technology. The method comprises: acquiring an image pyramid of an image (301), the image pyramid comprising N layers of images, with N > 1; determining a target window size according to an i-th layer image of the image pyramid, and determining at least one output key point of the i-th layer image according to the target window size, where 1 ≤ i ≤ N; and determining the output key points of each layer image of the image pyramid as key points of the image. With this method, the number of key points extracted from each image can be equalized.
PCT/CN2018/106778 2018-09-20 2018-09-20 Procédé et appareil d'extraction de point clé d'image WO2020056688A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880095485.3A CN112424787B (zh) 2018-09-20 2018-09-20 提取图像关键点的方法及装置
PCT/CN2018/106778 WO2020056688A1 (fr) 2018-09-20 2018-09-20 Procédé et appareil d'extraction de point clé d'image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/106778 WO2020056688A1 (fr) 2018-09-20 2018-09-20 Procédé et appareil d'extraction de point clé d'image

Publications (1)

Publication Number Publication Date
WO2020056688A1 true WO2020056688A1 (fr) 2020-03-26

Family

ID=69888165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106778 WO2020056688A1 (fr) 2018-09-20 2018-09-20 Procédé et appareil d'extraction de point clé d'image

Country Status (2)

Country Link
CN (1) CN112424787B (fr)
WO (1) WO2020056688A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930245A (zh) * 2012-09-24 2013-02-13 深圳市捷顺科技实业股份有限公司 一种车辆跟踪方法及系统
CN105069477A (zh) * 2015-08-13 2015-11-18 天津津航技术物理研究所 AdaBoost级联分类器检测图像目标的方法
CN105512638A (zh) * 2015-12-24 2016-04-20 黄江 一种基于融合特征的人脸检测与对齐方法
US20170011520A1 (en) * 2015-07-09 2017-01-12 Texas Instruments Incorporated Window grouping and tracking for fast object detection
CN106529448A (zh) * 2016-10-27 2017-03-22 四川长虹电器股份有限公司 利用聚合通道特征进行多视角人脸检测的方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650615B (zh) * 2016-11-07 2018-03-27 深圳云天励飞技术有限公司 一种图像处理方法及终端

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378865A (zh) * 2021-08-16 2021-09-10 航天宏图信息技术股份有限公司 一种影像金字塔的匹配方法和装置
CN113378865B (zh) * 2021-08-16 2021-11-05 航天宏图信息技术股份有限公司 一种影像金字塔的匹配方法和装置
CN117911956A (zh) * 2024-03-19 2024-04-19 洋县阿拉丁生物工程有限责任公司 用于食品加工设备的加工环境动态监测方法及系统
CN117911956B (zh) * 2024-03-19 2024-05-31 洋县阿拉丁生物工程有限责任公司 用于食品加工设备的加工环境动态监测方法及系统

Also Published As

Publication number Publication date
CN112424787A (zh) 2021-02-26
CN112424787B (zh) 2024-07-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933848

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18933848

Country of ref document: EP

Kind code of ref document: A1