CN109740674B - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium Download PDF

Info

Publication number
CN109740674B
CN109740674B
Authority
CN
China
Prior art keywords
image
current frame
frame image
feature
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910011494.6A
Other languages
Chinese (zh)
Other versions
CN109740674A (en)
Inventor
马福强
陈丽莉
楚明磊
吕耀宇
薛鸿臻
闫桂新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Beijing BOE Optoelectronics Technology Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Beijing BOE Optoelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd and Beijing BOE Optoelectronics Technology Co Ltd
Priority to CN201910011494.6A
Publication of CN109740674A
Application granted
Publication of CN109740674B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, apparatus, device and storage medium. The method comprises the following steps: acquiring a current frame image captured by a camera, and extracting visual features of the current frame image; generating a feature vector of the current frame image according to the visual features of the current frame image; dividing the feature vector of the current frame image into a plurality of sub-vectors and quantizing the sub-vectors to generate a feature index of the visual features of the current frame image; matching the feature index of the visual features of the current frame image with the feature index of the visual features of each training image, and determining matching feature pairs between the current frame image and each training image, wherein the feature index of the visual features of each training image is obtained based on sub-codebooks; and determining the training images whose number of matching feature pairs is greater than a first preset threshold as similar images of the current frame image. The technical scheme enables rapid image recognition.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates generally to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
In recent years, with the rapid development of semiconductor technology and the promotion of artificial intelligence wave, rapid image recognition and tracking algorithms become research hotspots in the fields of augmented reality, robot positioning and the like.
At present, image recognition is mainly realized based on a tree-structured bag-of-words (BoW) model. To achieve a good recognition effect, a large-scale tree-shaped visual dictionary needs to be established, which makes the image recognition process time-consuming and the memory occupancy of the tree-shaped visual dictionary high, limiting its use on memory-constrained platforms such as embedded platforms.
Disclosure of Invention
In view of the above-mentioned shortcomings or drawbacks of the prior art, it is desirable to provide a scheme capable of rapidly recognizing an image.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a current frame image acquired by a camera, and extracting visual features of the current frame image;
generating a feature vector of the current frame image according to the visual features of the current frame image;
dividing the feature vector of the current frame image into a plurality of sub-vectors, quantizing the plurality of sub-vectors, and generating a feature index of the visual feature of the current frame image;
matching the characteristic index of the visual characteristic of the current frame image with the characteristic index of the visual characteristic of each training image in an image training set obtained by training in advance, and determining the matching characteristic pair of the current frame image and each training image; the feature index of the visual feature of each training image is obtained based on a sub-codebook, wherein the sub-codebook is a codebook obtained by dividing the space where the visual feature of each training image is located into a plurality of subspaces and training in each subspace;
and determining the training images with the number of the matching feature pairs larger than a first preset threshold value as similar images of the current frame image.
Optionally, the feature index of the visual feature of each training image is determined as follows:
acquiring an image training set, and extracting visual features of training images in the image training set;
dividing the visual features of the training images into M subspaces, and performing cluster analysis in each subspace to obtain M sub-codebooks each composed of k codewords;
and generating a feature index of the visual features of the training image according to at least one of the sub-codebooks.
Optionally, after generating the feature vector of the current frame image, the method further includes:
calculating the similarity between the feature vector of the current frame image and the feature vector of each training image obtained by pre-training, and determining the similarity between the current frame image and each training image;
determining the training images with the similarity greater than a second preset threshold as quasi-similar images; then
Matching the feature index of the visual feature of the current frame image with the feature index of the visual feature of each training image in an image training set obtained by training in advance, and determining the matching feature pair of the current frame image and each training image includes:
and matching the feature index of the visual feature of the current frame image with the visual feature of the quasi-similar image, and determining a matching feature pair of the current frame image and the quasi-similar image.
Optionally, after determining the training images with the number of the matching feature pairs larger than a first preset threshold as similar images of the current frame image, the method further includes:
determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image;
continuously acquiring a next frame image of the current frame image;
and determining the position of the similar image in the next frame of image according to the first camera pose.
Optionally, determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image, including:
determining matching feature point pairs of the current frame image and the similar image according to the matching feature pairs of the current frame image and the similar image;
and determining the first camera pose according to the 3D coordinates of the matched feature points in the similar image and the 2D coordinates of the matched feature points in the current frame image.
Optionally, determining a position of the similar image in the next frame of image according to the first camera pose includes:
projecting the 3D coordinates of the matched feature points in the similar image to the current frame image according to the first camera pose, and determining the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
determining a second camera pose from the current frame image to the next frame image by using a least square method based on photometric errors according to the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
projecting the 3D coordinates of the matched feature points in the current frame image according to the second camera pose to obtain the projected 2D coordinates of the matched feature points in the current frame image;
and determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image.
Optionally, after obtaining the projected 2D coordinates of the matched feature points in the current frame image, the method further includes:
sequentially judging whether the projected 2D coordinates of each matched feature point in the current frame image are located in the image coordinate range of the next frame image;
determining the number of matched feature points of which the projected 2D coordinates in the current frame image are located in the image coordinate range of the next frame image according to the judgment result; then
Determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image, including:
and when the number of the matching feature points of the projected 2D coordinates in the current frame image within the image coordinate range of the next frame image is larger than a third preset threshold, determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matching feature points in the current frame image.
In a second aspect, an embodiment of the present application further provides an image recognition apparatus, including:
the characteristic extraction unit is used for acquiring a current frame image acquired by a camera and extracting visual characteristics of the current frame image;
the feature vector generating unit is used for generating a feature vector of the current frame image according to the visual features of the current frame image;
a feature index generating unit, configured to divide the feature vector of the current frame image into a plurality of sub-vectors, quantize the plurality of sub-vectors, and generate a feature index of the visual feature of the current frame image;
the matching unit is used for matching the characteristic index of the visual characteristic of the current frame image with the characteristic index of the visual characteristic of each training image in an image training set obtained by training in advance and determining the matching characteristic pair of the current frame image and each training image; the feature index of the visual feature of each training image is obtained based on a sub-codebook, wherein the sub-codebook is a codebook obtained by dividing the space where the visual feature of each training image is located into a plurality of subspaces and training in each subspace;
and the image identification unit is used for determining the training images with the number of the matching feature pairs larger than a first preset threshold value as the similar images of the current frame image.
In a third aspect, an embodiment of the present application further provides an apparatus, including: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the image processing method as described above.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the image processing method as described above.
The image processing scheme provided by the embodiments of the application provides a new visual feature matching method: the feature index of the visual features of the current frame image is matched with the feature index of the visual features of each training image in a pre-trained image training set, wherein the feature index of the visual features of the current frame image is obtained by dividing the feature vector of the current frame image into a plurality of sub-vectors and quantizing those sub-vectors, the feature index of the visual features of each training image is obtained based on sub-codebooks, and a sub-codebook is a codebook obtained by dividing the space in which the visual features of the training images are located into a plurality of subspaces and training within each subspace. The feature index obtained in this manner greatly reduces the storage scale, thereby improving the matching speed and enabling rapid image recognition.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 illustrates an exemplary flowchart of an image processing method provided in an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of training feature indices of visual features of training images;
FIG. 3 shows a schematic diagram of training a sub-codebook using all visual features of each training image;
FIG. 4 shows a schematic diagram of a feature index of visual features of a training image generated from M sub-codebooks;
FIG. 5 shows a schematic diagram of obtaining quasi-similar images of the current frame image;
FIG. 6 shows a schematic diagram of image tracking;
fig. 7 is a block diagram illustrating an exemplary structure of an image processing apparatus according to an embodiment of the present application;
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As mentioned in the background, the image recognition process is mainly implemented based on a tree-structured bag-of-words (BoW) model. To achieve a good recognition effect, a large-scale tree-shaped visual dictionary needs to be established, which makes the image recognition process time-consuming and the memory occupancy of the tree-shaped visual dictionary high, limiting its use on memory-constrained platforms such as embedded platforms.
In view of the above-mentioned drawbacks of the prior art, embodiments of the present application provide an image processing scheme. The scheme provides a new visual feature matching method: the feature index of the visual features of the current frame image is matched with the feature index of the visual features of each training image in a pre-trained image training set, wherein the feature index of the visual features of the current frame image is obtained by dividing the feature vector of the current frame image into a plurality of sub-vectors and quantizing those sub-vectors, the feature index of the visual features of each training image is obtained based on sub-codebooks, and a sub-codebook is a codebook obtained by dividing the space in which the visual features of the training images are located into a plurality of subspaces and training within each subspace. The feature index obtained in this manner greatly reduces the storage scale, thereby improving the matching speed and enabling rapid image recognition.
The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, an exemplary flowchart of an image processing method provided in an embodiment of the present application is shown.
The method comprises the following steps:
step 110, acquiring a current frame image acquired by the camera, and extracting visual features of the current frame image.
In the embodiment of the present application, the visual features of the current frame image may be extracted based on a Scale-Invariant Feature Transform (SIFT) algorithm, a Speeded-Up Robust Features (SURF) algorithm, or an Oriented FAST and Rotated BRIEF (ORB) algorithm, but the method for extracting the visual features of the current frame image is not limited thereto; for example, texture features, histogram of oriented gradients features, color histogram features, and the like of the current frame image may also be extracted.
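As an illustration only (not part of the claimed method), the following Python sketch shows how such visual features could be extracted with OpenCV; the choice of ORB and the parameter nfeatures=500 are assumptions made for the example.

```python
# Illustrative sketch: extracting visual features of the current frame image.
# cv2.SIFT_create() could be substituted for cv2.ORB_create() if SIFT features are wanted.
import cv2

def extract_visual_features(frame_bgr):
    """Return keypoints and descriptors of one camera frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    # descriptors: N x 32 (ORB, binary) or N x 128 (SIFT, float)
    return keypoints, descriptors
```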
Step 120, dividing the visual features of the current frame image into a plurality of sub-vectors, and quantizing the plurality of sub-vectors to generate a feature index of the visual features of the current frame image.
Specifically, each visual feature in the current frame image may be divided into M subspaces according to its vector dimension. Assuming the visual feature of the current frame image is a SIFT feature, whose dimension is 128, the 128-dimensional SIFT feature is first divided into M sub-vectors, each of dimension 128/M; then each sub-vector is quantized in turn, and finally a feature index is generated from the quantization results of the sub-vectors.
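Purely for illustration, a minimal Python sketch of this splitting and quantization step is given below; it assumes M pre-trained sub-codebooks of k codewords each (see the training sketch later in this description) and float descriptors such as SIFT.

```python
# Illustrative sketch: product-quantization-style encoding of one 128-dimensional descriptor.
import numpy as np

def encode_descriptor(descriptor, sub_codebooks):
    """Split the descriptor into M sub-vectors and quantize each against its sub-codebook.

    sub_codebooks: list of M arrays, each of shape (k, 128 // M).
    Returns the feature index as the tuple (q1, q2, ..., qM).
    """
    M = len(sub_codebooks)
    sub_vectors = np.split(descriptor.astype(np.float32), M)
    index = []
    for sub_vec, codebook in zip(sub_vectors, sub_codebooks):
        q = int(np.argmin(np.linalg.norm(codebook - sub_vec, axis=1)))  # nearest codeword
        index.append(q)
    return tuple(index)
```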
And step 130, matching the feature index of the visual feature of the current frame image with the feature index of the visual feature of each training image in an image training set obtained by pre-training, and determining the matching feature pair of the current frame image and each training image.
Here, a matching feature pair refers to two visual features whose feature indices match. For example, if the feature index of a visual feature in the current frame image is 001 and the feature index of a visual feature in a training image is also 001, the two visual features form a matching feature pair.
It should be noted that if the feature index of a visual feature in the current frame image is 001 and several visual features in the training image also have the feature index 001, then one of those visual features is selected to form a matching feature pair with the visual feature in the current frame image.
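By way of example only, matching by feature index can be sketched as a lookup table keyed by the index; taking the first candidate when several training features share an index mirrors the selection described above. All names below are illustrative.

```python
# Illustrative sketch: determining matching feature pairs by identical feature index.
from collections import defaultdict

def match_by_index(current_codes, training_codes):
    """current_codes / training_codes: lists of (feature_id, feature_index) tuples."""
    lookup = defaultdict(list)
    for feat_id, code in training_codes:
        lookup[code].append(feat_id)

    matches = []
    for feat_id, code in current_codes:
        candidates = lookup.get(code)
        if candidates:
            # several training features may share this index; pick one of them
            matches.append((feat_id, candidates[0]))
    return matches  # matching feature pairs between the two images
```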
Wherein the feature index of the visual features of each training image is obtained based on the sub-codebooks.
The sub-codebook is a codebook obtained by dividing the space where the visual features of each training image are located into a plurality of subspaces and training in each subspace.
The codebook refers to k clustering centers obtained by clustering visual features by using a clustering algorithm, each clustering center is called a codeword, and a set of the k clustering centers is called a codebook.
Specifically, the feature index of the visual feature of each training image may be determined in the manner shown in fig. 2:
step 210, obtaining an image training set, and extracting visual features of training images in the image training set.
The method for extracting the visual features of each training image is the same as the method for extracting the visual features of the current frame image, and is not repeated here.
Step 220, dividing the visual features of each training image into M subspaces, and performing cluster analysis in each subspace to obtain M sub codebooks composed of k codewords.
The subspace is a space where the corresponding dimensional subvectors of all visual features of the training image are located.
FIG. 3 is a schematic diagram of training sub-codebooks using all the visual features of each training image. Taking M = 3 as an example, all visual features of each training image are divided into 3 subspaces, and cluster analysis is performed in each subspace to obtain 3 sub-codebooks each composed of k codewords.
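For illustration, the cluster analysis in each subspace could be sketched as follows; the use of scikit-learn's KMeans and the default values M = 3, k = 256 are assumptions of the example, not requirements of the scheme.

```python
# Illustrative sketch: training M sub-codebooks from all training-image descriptors.
import numpy as np
from sklearn.cluster import KMeans

def train_sub_codebooks(all_descriptors, M=3, k=256):
    """all_descriptors: array of shape (N, D), with D divisible by M.

    Returns M sub-codebooks, each an array of k codewords of dimension D // M.
    """
    sub_spaces = np.split(all_descriptors.astype(np.float32), M, axis=1)
    sub_codebooks = []
    for sub in sub_spaces:
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(sub)
        sub_codebooks.append(km.cluster_centers_)  # k codewords of this subspace
    return sub_codebooks
```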
And step 230, generating a feature index of the visual features of the training image according to the at least one sub-codebook.
In the embodiment of the application, the feature index of the visual feature of the training image can be generated according to the M sub-codebooks. Fig. 4 is a schematic diagram of a feature index of visual features of a training image generated from M sub-codebooks.
Specifically, a sub-vector of each visual feature of the training image is quantized in each subspace, and the feature index is generated from the quantization results of the M sub-vectors of each visual feature, as shown in formula (1):
index = (q1, q2, ..., qM); (1)
wherein qi is the quantization result of the i-th sub-vector, and index is the feature index of the visual feature of the training image generated according to the M sub-codebooks.
Optionally, in order to further reduce the size of the feature index and improve the matching speed, M − 1 or M − 2 of the sub-codebooks may also be used to generate the feature index.
Step 140, determining the training images with the number of the matching feature pairs larger than a first preset threshold as similar images of the current frame image.
After the matching feature pairs of the current frame image and each training image are determined, the number of matching feature pairs is counted, and the training images for which this number is greater than a first preset threshold are determined as similar images of the current frame image.
The embodiments of the application thus provide an image processing scheme with a new visual feature matching method: the feature index of the visual features of the current frame image is matched with the feature index of the visual features of each training image in a pre-trained image training set, wherein the feature index of the visual features of the current frame image is obtained by dividing the feature vector of the current frame image into a plurality of sub-vectors and quantizing those sub-vectors, the feature index of the visual features of each training image is obtained based on sub-codebooks, and a sub-codebook is a codebook obtained by dividing the space in which the visual features of the training images are located into a plurality of subspaces and training within each subspace. The feature index obtained in this manner greatly reduces the storage scale, thereby improving the matching speed and enabling rapid image recognition.
Optionally, after the visual features of the current frame image are extracted in step 110, the training images may be pre-screened to obtain quasi-similar images of the current frame image.
Specifically, as shown in fig. 5, the method includes the following steps:
step 510, generating a bag-of-words vector of the current frame image according to the visual characteristics of the current frame image.
Specifically, the visual features of the current frame image are first extracted and feature descriptors are constructed for them; the feature descriptors are then clustered by a clustering algorithm (such as the k-means algorithm) to train a codebook. Next, the visual features are quantized by a KNN (K-nearest neighbor) algorithm, and finally an image histogram vector weighted by TF-IDF (term frequency-inverse document frequency), i.e. the BoW vector, is obtained.
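A compact Python sketch of the TF-IDF weighted BoW vector is given below for illustration; the codebook and the idf weights are assumed to have been computed from the training set beforehand.

```python
# Illustrative sketch: building a TF-IDF weighted bag-of-words vector for one image.
import numpy as np

def bow_vector(descriptors, codebook, idf):
    """descriptors: (N, D); codebook: (K, D) codewords; idf: (K,) inverse document frequencies."""
    # quantize each descriptor to its nearest codeword (1-nearest-neighbor assignment)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    tf = np.bincount(words, minlength=codebook.shape[0]).astype(np.float32)
    tf /= max(float(tf.sum()), 1.0)          # term frequency
    vec = tf * idf                           # TF-IDF weighting
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```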
Step 520, calculating the similarity between the bag-of-words vector of the current frame image and the bag-of-words vector of each training image obtained by pre-training, and determining the similarity between the current frame image and each training image.
The bag-of-words vector of each training image is obtained in the same way as that of the current frame image, which is not repeated here.
In addition, the Euclidean distance or the cosine distance between the two BoW vectors may be calculated as the criterion of similarity between the current frame image and each training image.
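As a small illustration, cosine similarity between two such BoW vectors could be computed as below (Euclidean distance would work analogously).

```python
# Illustrative sketch: cosine similarity between two BoW vectors.
import numpy as np

def bow_similarity(v1, v2):
    denom = float(np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.dot(v1, v2)) / denom if denom > 0 else 0.0
```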
In step 530, the training images with similarity greater than the second preset threshold are determined as quasi-similar images.
Therefore, a part of quasi-similar images can be screened out from a large number of training images, and the time consumption of the subsequent visual feature matching process is further shortened.
Based on the above steps 510 to 530, the step 130 may specifically include:
and matching the characteristic index of the visual characteristic of the current frame image with the visual characteristic of the quasi-similar image to determine a matching characteristic pair of the current frame image and the quasi-similar image.
The image processing method can be applied to the technical field of image recognition and tracking.
Optionally, after determining the training images with the number of the matching feature pairs larger than the first preset threshold as similar images of the current frame image, the embodiment of the present application may further include an image tracking step shown in fig. 6:
and step 610, determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image.
In the embodiment of the application, the matching feature point pair of the current frame image and the similar image can be determined according to the matching feature pair of the current frame image and the similar image; that is, each visual feature corresponds to a feature point, and thus a set of matching feature pairs corresponds to a set of matching feature point pairs.
After the matching feature point pairs of the current frame image and the similar image are determined, the first camera pose can be determined according to the 3D coordinates of the matching feature points in the similar image and the 2D coordinates of the matching feature points in the current frame image.
Specifically, the plane in which the matching feature points of the similar image lie is assumed to be the plane z = 0, so that a 2D pixel coordinate (u, v) becomes a 3D coordinate (u, v, 0). The first camera pose T = [R | t] is then calculated from the corresponding 3D-2D matching feature point pairs using a PnP algorithm, where R is the rotation matrix and t is the translation vector.
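For illustration only, the PnP step could be sketched with OpenCV's solvePnP as below; lifting the reference points onto the plane z = 0 follows the assumption above, and the function and variable names are those of the example, not of the claimed method.

```python
# Illustrative sketch: first camera pose T = [R | t] from 3D-2D matching feature point pairs.
import cv2
import numpy as np

def first_camera_pose(pts_2d_similar, pts_2d_current, K):
    """pts_2d_similar: (N, 2) pixel coordinates in the similar (reference) image,
    lifted to 3D by placing them on the plane z = 0; pts_2d_current: (N, 2)."""
    pts_3d = np.hstack([pts_2d_similar, np.zeros((len(pts_2d_similar), 1))])
    ok, rvec, tvec = cv2.solvePnP(pts_3d.astype(np.float32),
                                  pts_2d_current.astype(np.float32),
                                  K.astype(np.float32), None)
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix R from the rotation vector
    return R, tvec               # first camera pose: rotation R and translation t
```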
Step 620, continue to acquire the next frame image of the current frame image.
And step 630, determining the position of the similar image in the next frame of image according to the pose of the first camera.
Step 630 may be implemented as follows:
firstly, projecting the 3D coordinates of the matched feature points in the similar images to the current frame image according to the first camera pose, and determining the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
This can be determined according to the following formulas (2) and (3):
P′ = R·P + t; (2)
[u, v, 1]^T = (1/z)·K·P′; (3)
wherein P is the 3D coordinate of a matching feature point in the similar image, P′ is the 3D coordinate of the matching feature point in the current frame image, z is the depth of P′ (its third component), (u, v) is the 2D coordinate of the matching feature point in the current frame image, and K is the camera intrinsic matrix.
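A short sketch of this projection step, following formulas (2) and (3) above, is given for illustration; array shapes are assumptions of the example.

```python
# Illustrative sketch: project the matched 3D points of the similar image into the current frame.
import numpy as np

def project_to_current_frame(P, R, t, K):
    """P: (N, 3) 3D points in the similar image; R, t: first camera pose; K: camera intrinsics."""
    P_prime = (R @ P.T).T + t.reshape(1, 3)   # formula (2): P' = R * P + t
    uv_h = (K @ P_prime.T).T                  # formula (3): z * [u, v, 1]^T = K * P'
    uv = uv_h[:, :2] / uv_h[:, 2:3]           # divide by the depth z
    return uv, P_prime                        # 2D and 3D coordinates in the current frame image
```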
Secondly, determining a second camera pose from the current frame image to the next frame image by using a least square method based on photometric errors according to the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
The second camera pose can be determined according to the following formula (4):
T* = argmin over T = (R, t) of Σ_{i=1..n} || I1(pi) − I2( (1/zi)·K·(R·Pi + t) ) ||^2; (4)
wherein T is the second camera pose, Pi is the 3D coordinate of the i-th matching feature point in the current frame image, pi is its 2D coordinate in the current frame image, n is the number of matching feature point pairs, K is the camera intrinsic matrix, R and t are the rotation and translation to be estimated, zi is the (known) depth value in the projection process, and I1(·) and I2(·) are the image gray values at the corresponding points in the current frame image and the next frame image, respectively.
And solving the above formula by using a Gauss Newton method or a Levenberg Marquardt method to obtain the second camera pose from the current frame image to the next frame image.
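Purely as a sketch of how such a minimization might be set up, the example below uses SciPy's Levenberg-Marquardt solver with a Rodrigues parameterization of the pose and nearest-pixel gray-value sampling; a practical implementation would use sub-pixel interpolation and robust handling of points that leave the image. All library choices and names here are assumptions, not the patent's prescription.

```python
# Illustrative sketch: second camera pose by minimizing the photometric error of formula (4).
import cv2
import numpy as np
from scipy.optimize import least_squares

def second_camera_pose(gray_cur, gray_next, pts_2d, pts_3d, K):
    """pts_2d, pts_3d: 2D and 3D coordinates of the matched feature points in the current frame."""
    ref_gray = np.array([float(gray_cur[int(v), int(u)]) for u, v in pts_2d])
    h, w = gray_next.shape

    def residuals(x):
        R, _ = cv2.Rodrigues(x[:3])
        t = x[3:].reshape(3, 1)
        uv_h = (K @ (R @ pts_3d.T + t)).T          # project into the next frame
        uv = uv_h[:, :2] / uv_h[:, 2:3]
        res = []
        for (u, v), g0 in zip(uv, ref_gray):
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < w and 0 <= vi < h:
                res.append(float(gray_next[vi, ui]) - g0)  # photometric error of this point
            else:
                res.append(0.0)                            # point projected outside the image
        return np.array(res)

    x0 = np.zeros(6)                                       # start from the identity pose
    sol = least_squares(residuals, x0, method='lm')
    R, _ = cv2.Rodrigues(sol.x[:3])
    return R, sol.x[3:]                                    # second camera pose (R, t)
```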
And thirdly, projecting the 3D coordinates of the matched feature points in the current frame image according to the second camera pose to obtain the projected 2D coordinates of the matched feature points in the current frame image.
And fourthly, determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image.
After obtaining the projected 2D coordinates of the matching feature points in the current frame image, it may be sequentially determined whether the projected 2D coordinates of each matching feature point in the current frame image are within the image coordinate range of the next frame image; and determining the number of the matching feature points of which the projected 2D coordinates in the current frame image are located in the image coordinate range of the next frame image according to the judgment result.
If the number of matching feature points whose projected 2D coordinates in the current frame image fall within the image coordinate range of the next frame image is too small, the similar image no longer exists in the next frame image and the tracking process ends. At this point, the method can return to acquiring the next frame of image and continue tracking.
And if the number of the matching feature points of the projected 2D coordinates in the current frame image in the image coordinate range of the next frame image is larger than a third preset threshold, indicating that the similar image still exists in the next frame image, and determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matching feature points in the current frame image.
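A minimal sketch of this in-range check and decision is given below for illustration; the threshold value is an input of the example, not a fixed number.

```python
# Illustrative sketch: count projected points inside the next frame and decide whether tracking continues.
import numpy as np

def still_tracked(projected_uv, image_shape, third_preset_threshold):
    """projected_uv: (N, 2) projected 2D coordinates; image_shape: (height, width)."""
    h, w = image_shape
    inside = (projected_uv[:, 0] >= 0) & (projected_uv[:, 0] < w) & \
             (projected_uv[:, 1] >= 0) & (projected_uv[:, 1] < h)
    n_inside = int(np.count_nonzero(inside))
    return n_inside > third_preset_threshold, n_inside
```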
In the embodiment of the application, the tracking and positioning of the image are realized by a least square method.
It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Further referring to fig. 7, it shows an exemplary structural block diagram of an image processing apparatus provided in an embodiment of the present application.
The device includes:
a feature extraction unit 71, configured to obtain a current frame image acquired by a camera, and extract visual features of the current frame image;
a feature vector generating unit 72, configured to generate a feature vector of the current frame image according to the visual features of the current frame image;
a feature index generating unit 73, configured to divide the feature vector of the current frame image into a plurality of sub-vectors, quantize the plurality of sub-vectors, and generate a feature index of the visual feature of the current frame image;
a matching unit 74, configured to match the feature index of the visual feature of the current frame image with the feature index of the visual feature of each training image in an image training set obtained through pre-training, and determine a matching feature pair between the current frame image and each training image; the feature index of the visual feature of each training image is obtained based on a sub-codebook, wherein the sub-codebook is a codebook obtained by dividing the space where the visual feature of each training image is located into a plurality of subspaces and training in each subspace;
an image recognition unit 75, configured to determine the training images with the number of the matching feature pairs being greater than a first preset threshold as similar images of the current frame image.
Optionally, the apparatus may further include:
a training unit to:
acquiring an image training set, and extracting visual features of training images in the image training set;
dividing the visual features of the training images into M subspaces, and performing cluster analysis in each subspace to obtain the M sub-codebooks each consisting of k codewords;
and generating a feature index of the visual features of the training image according to at least one of the sub-codebooks.
Optionally, the apparatus may further include:
a quasi-similar image determination unit, configured to:
generating a bag-of-words vector of the current frame image according to the visual characteristics of the current frame image;
calculating the similarity between the bag-of-word vector of the current frame image and the bag-of-word vector of each training image obtained by pre-training, and determining the similarity between the current frame image and each training image;
and determining the training images with the similarity larger than a second preset threshold as quasi-similar images.
The matching unit 74 is specifically configured to:
and matching the feature index of the visual feature of the current frame image with the visual feature of the quasi-similar image, and determining a matching feature pair of the current frame image and the quasi-similar image.
Optionally, the apparatus may further include:
the first camera pose determining unit is used for determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image;
the acquisition unit is used for continuously acquiring a next frame image of the current frame image;
and the positioning unit is used for determining the position of the similar image in the next frame of image according to the first camera pose.
Optionally, the first camera pose determining unit is specifically configured to:
determining matching feature point pairs of the current frame image and the similar image according to the matching feature pairs of the current frame image and the similar image;
and determining the first camera pose according to the 3D coordinates of the matched feature points in the similar image and the 2D coordinates of the matched feature points in the current frame image.
Optionally, the positioning unit is specifically configured to:
projecting the 3D coordinates of the matched feature points in the similar image to the current frame image according to the first camera pose, and determining the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
determining a second camera pose from the current frame image to the next frame image by using a least square method based on photometric errors according to the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
projecting the 3D coordinates of the matched feature points in the current frame image according to the second camera pose to obtain the projected 2D coordinates of the matched feature points in the current frame image;
and determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image.
Optionally, the method may further include:
a determination unit configured to:
sequentially judging whether the projected 2D coordinates of each matched feature point in the current frame image are located in the image coordinate range of the next frame image;
and determining the number of the matched feature points of which the projected 2D coordinates in the current frame image are located in the image coordinate range of the next frame image according to the judgment result.
The positioning unit is specifically configured to:
and when the number of the matching feature points of the projected 2D coordinates in the current frame image within the image coordinate range of the next frame image is larger than a third preset threshold, determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matching feature points in the current frame image.
It should be understood that the subsystems or units recited in the apparatus 700 correspond to various steps in the method described with reference to fig. 1-6. Thus, the operations and features described above for the method are equally applicable to the apparatus 700 and the units included therein, and are not described in detail here.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing a server according to embodiments of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as necessary.
In particular, the processes described above with reference to fig. 1-6 may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of fig. 1-6. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not in some cases constitute a limitation of the unit or module itself.
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the image processing methods described herein.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (9)

1. An image processing method, characterized in that the method comprises:
acquiring a current frame image acquired by a camera, and extracting visual features of the current frame image;
dividing the visual features of the current frame image into a plurality of sub-vectors, quantizing the sub-vectors, and generating a feature index of the visual features of the current frame image;
matching the characteristic index of the visual characteristic of the current frame image with the characteristic index of the visual characteristic of each training image in an image training set obtained by training in advance, and determining the matching characteristic pair of the current frame image and each training image; the feature index of the visual feature of each training image is obtained based on a sub-codebook, wherein the sub-codebook is a codebook obtained by dividing the space where the visual feature of each training image is located into a plurality of subspaces and training in each subspace;
determining the training images with the number of the matching feature pairs larger than a first preset threshold value as similar images of the current frame image;
determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image;
continuously acquiring a next frame image of the current frame image;
and determining the position of the similar image in the next frame of image according to the first camera pose.
2. The method of claim 1, wherein the feature index of the visual features of each training image is determined as follows:
acquiring an image training set, and extracting visual features of training images in the image training set;
dividing the visual features of the training images into M subspaces, and performing cluster analysis in each subspace to obtain the M sub-codebooks each consisting of k codewords;
and generating a feature index of the visual features of the training image according to at least one of the sub-codebooks.
3. The method of claim 1, wherein after extracting the visual features of the current frame image, the method further comprises:
generating a bag-of-words vector of the current frame image according to the visual characteristics of the current frame image;
calculating the similarity between the bag-of-word vector of the current frame image and the bag-of-word vector of each training image obtained by pre-training, and determining the similarity between the current frame image and each training image;
determining the training images with the similarity greater than a second preset threshold as quasi-similar images; then
Matching the feature index of the visual feature of the current frame image with the feature index of the visual feature of each training image in an image training set obtained by training in advance, and determining the matching feature pair of the current frame image and each training image:
and matching the feature index of the visual feature of the current frame image with the visual feature of the quasi-similar image, and determining a matching feature pair of the current frame image and the quasi-similar image.
4. The method of claim 1, wherein determining a first camera pose from the similar image to the current frame image based on the matching feature pair of the current frame image and the similar image comprises:
determining matching feature point pairs of the current frame image and the similar image according to the matching feature pairs of the current frame image and the similar image;
and determining the first camera pose according to the 3D coordinates of the matched feature points in the similar image and the 2D coordinates of the matched feature points in the current frame image.
5. The method of claim 1, wherein determining the position of the similar image in the next frame image according to the first camera pose comprises:
projecting the 3D coordinates of the matched feature points in the similar image to the current frame image according to the first camera pose, and determining the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
determining a second camera pose from the current frame image to the next frame image by using a least square method based on photometric errors according to the 2D coordinates and the 3D coordinates of the matched feature points in the current frame image;
projecting the 3D coordinates of the matched feature points in the current frame image according to the second camera pose to obtain the projected 2D coordinates of the matched feature points in the current frame image;
and determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image.
6. The method of claim 5, wherein after obtaining the projected 2D coordinates of the matching feature points in the current frame image, the method further comprises:
sequentially judging whether the projected 2D coordinates of each matched feature point in the current frame image are located in the image coordinate range of the next frame image;
determining the number of matched feature points of which the projected 2D coordinates in the current frame image are located in the image coordinate range of the next frame image according to the judgment result; then
Determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matched feature points in the current frame image, including:
and when the number of the matching feature points of the projected 2D coordinates in the current frame image within the image coordinate range of the next frame image is larger than a third preset threshold, determining the position of the similar image in the next frame image according to the projected 2D coordinates of the matching feature points in the current frame image.
7. An image processing apparatus, characterized in that the apparatus comprises:
the characteristic extraction unit is used for acquiring a current frame image acquired by a camera and extracting visual characteristics of the current frame image;
the feature vector generating unit is used for generating a feature vector of the current frame image according to the visual features of the current frame image;
a feature index generating unit, configured to divide the feature vector of the current frame image into a plurality of sub-vectors, quantize the plurality of sub-vectors, and generate a feature index of the visual feature of the current frame image;
the matching unit is used for matching the characteristic index of the visual characteristic of the current frame image with the characteristic index of the visual characteristic of each training image in an image training set obtained by training in advance and determining the matching characteristic pair of the current frame image and each training image; the feature index of the visual feature of each training image is obtained based on a sub-codebook, wherein the sub-codebook is a codebook obtained by dividing the space where the visual feature of each training image is located into a plurality of subspaces and training in each subspace;
the image identification unit is used for determining the training images with the number of the matching feature pairs larger than a first preset threshold value as similar images of the current frame image;
the first camera pose determining unit is used for determining a first camera pose from the similar image to the current frame image according to the matching feature pair of the current frame image and the similar image;
the acquisition unit is used for continuously acquiring a next frame image of the current frame image;
and the positioning unit is used for determining the position of the similar image in the next frame of image according to the first camera pose.
8. An apparatus, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of claims 1-6.
9. A computer-readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1-6.
CN201910011494.6A 2019-01-07 2019-01-07 Image processing method, device, equipment and storage medium Active CN109740674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910011494.6A CN109740674B (en) 2019-01-07 2019-01-07 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910011494.6A CN109740674B (en) 2019-01-07 2019-01-07 Image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109740674A CN109740674A (en) 2019-05-10
CN109740674B true CN109740674B (en) 2021-01-22

Family

ID=66363613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910011494.6A Active CN109740674B (en) 2019-01-07 2019-01-07 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109740674B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242230A (en) * 2020-01-17 2020-06-05 腾讯科技(深圳)有限公司 Image processing method and image classification model training method based on artificial intelligence
CN111311758A (en) * 2020-02-24 2020-06-19 Oppo广东移动通信有限公司 Augmented reality processing method and device, storage medium and electronic equipment
CN111703656A (en) * 2020-05-19 2020-09-25 河南中烟工业有限责任公司 Method for correcting orientation of circulating smoke box skin
CN112668632B (en) * 2020-12-25 2022-04-08 浙江大华技术股份有限公司 Data processing method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440348A (en) * 2013-09-16 2013-12-11 重庆邮电大学 Vector-quantization-based overall and local color image searching method
CN104199923A (en) * 2014-09-01 2014-12-10 中国科学院自动化研究所 Massive image library retrieving method based on optimal K mean value Hash algorithm
CN105426533A (en) * 2015-12-17 2016-03-23 电子科技大学 Image retrieving method integrating spatial constraint information
CN108984642A (en) * 2018-06-22 2018-12-11 西安工程大学 A kind of PRINTED FABRIC image search method based on Hash coding


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于SIFT的图像检索技术研究";朱玉滨;《中国优秀硕士论文全文数据库》;20140915;第4.2节 *

Also Published As

Publication number Publication date
CN109740674A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109740674B (en) Image processing method, device, equipment and storage medium
CN107992842B (en) Living body detection method, computer device, and computer-readable storage medium
CN110659582A (en) Image conversion model training method, heterogeneous face recognition method, device and equipment
CN107273458B (en) Depth model training method and device, and image retrieval method and device
US8428397B1 (en) Systems and methods for large scale, high-dimensional searches
US8538164B2 (en) Image patch descriptors
CN112036292B (en) Word recognition method and device based on neural network and readable storage medium
CN108229532B (en) Image recognition method and device and electronic equipment
US8571306B2 (en) Coding of feature location information
US8687892B2 (en) Generating a binary descriptor representing an image patch
JP5591178B2 (en) Method for classifying objects in test images
EP2791869A1 (en) Image classification
WO2016037844A1 (en) Method and apparatus for image retrieval with feature learning
Liu et al. An improved InceptionV3 network for obscured ship classification in remote sensing images
CN114973222B (en) Scene text recognition method based on explicit supervision attention mechanism
CN113095333B (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN108875487A (en) Pedestrian is identified the training of network again and is identified again based on its pedestrian
CN112163114B (en) Image retrieval method based on feature fusion
CN111223128A (en) Target tracking method, device, equipment and storage medium
CN111488810A (en) Face recognition method and device, terminal equipment and computer readable medium
US20170309004A1 (en) Image recognition using descriptor pruning
Mantecón et al. Enhanced gesture-based human-computer interaction through a Compressive Sensing reduction scheme of very large and efficient depth feature descriptors
CN115937567A (en) Image classification method based on wavelet scattering network and ViT
CN108154107A (en) A kind of method of the scene type of determining remote sensing images ownership
DONG et al. Maritime background infrared imagery classification based on histogram of oriented gradient and local contrast features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant