CN107967482A - Icon identification method and device - Google Patents

Icon identification method and device

Info

Publication number
CN107967482A
Authority
CN
China
Prior art keywords
image, icon, feature, points, point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711000477.XA
Other languages
Chinese (zh)
Inventor
许超然
张正顺
陆遥
霍焯亮
马祥园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cypress Medical Science And Technology Co Ltd
Guangdong Coast Conan Car Networking Technology Co Ltd
Original Assignee
Guangzhou Cypress Medical Science And Technology Co Ltd
Guangdong Coast Conan Car Networking Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cypress Medical Science And Technology Co Ltd, Guangdong Coast Conan Car Networking Technology Co Ltd filed Critical Guangzhou Cypress Medical Science And Technology Co Ltd
Priority to CN201711000477.XA priority Critical patent/CN107967482A/en
Publication of CN107967482A publication Critical patent/CN107967482A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Abstract

The present invention provides an icon identification method and device. The method includes the following steps: acquiring an image to be recognized that contains a specific icon; processing the image to be recognized with an image processing method to detect the feature points of the specific icon, and extracting the feature vector of each feature point to form a unique feature vector group; matching the generated feature vector group one by one against the feature vector group uniquely corresponding to each reference icon in a preset feature database to determine a target icon that matches the specific icon; and acquiring the icon information corresponding to the target icon from a third-party database and outputting it. The method performs icon identification based on the SIFT algorithm, which has scale invariance; this improves the precision of icon identification and strengthens the scheme's robustness to image rotation, brightness change, and scale change.

Description

Icon identification method and device
Technical Field
The invention relates to the field of image recognition research, in particular to an icon recognition method and device.
Background
Obtaining the information contained in an image through image-based search is the problem most closely tied to users, and the one with the greatest application value and market potential, in the current mobile internet era. Behind a trademark lies much useful and important information with wide application prospects in every aspect of daily life. Faced with goods of direct interest, a user hopes to photograph the part of the goods containing the trademark image with a mobile phone, accurately obtain the information contained in the trademark image by an image search method, and further obtain information related to the goods. Therefore, designing a method for automatically identifying the information contained in an image through recognition and analysis of the image has important commercial application value, and is a technical problem that needs to be solved as soon as possible.
Processing and recognizing an image to obtain the information it contains relies on image recognition as the core technology. At present, many methods perform poorly at image recognition: corner points or edges are often extracted directly during feature extraction, and the extracted features have low discrimination, so recognition precision is low and robustness to image rotation, brightness change, and scale change is weak.
In conclusion, the prior art has poor adaptability to the environment, cannot effectively identify the target under different illumination, different poses and the like, and has low identification precision and poor user experience.
Disclosure of Invention
The invention provides an icon identification method and an icon identification device, which are used for realizing the identification of an icon and the output of icon information.
In a first aspect, the present invention provides a method for identifying an icon, where the method includes:
acquiring an image to be identified containing a specific icon;
processing the image to be identified by using an image processing method to detect the feature points of the specific icon and extracting feature vectors of the feature points to form a unique feature vector group;
matching the generated feature vector group with a feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine a target icon matched with the specific icon;
and acquiring and outputting icon information corresponding to the target icon from a third-party database.
Specifically, the feature points include corner points, edge points, bright points in dark areas, and dark points in bright areas of the image to be recognized / the specific icon.
Preferably, before the processing the image to be recognized by using an image processing method to detect the feature points of the specific icon and extract the feature vectors of the feature points to form a unique feature vector group, the method further includes:
and converting the image to be identified from the color image into a gray image.
Specifically, the step of querying a preset icon information database to obtain and output icon information corresponding to the target icon includes:
and querying a preset trademark information database to acquire and output trademark information corresponding to the target trademark image.
Preferably, the step of processing the image to be recognized by using an image processing method to detect feature points of the image to be recognized and extract feature vectors of the feature points to form a feature vector group specifically includes:
down-sampling the image to be identified to generate a first image sequence containing a plurality of down-sampled images which are sequenced according to a preset sequencing rule;
performing Gaussian convolution operation on each image in the first image sequence by using different scale space factors to generate a second image sequence, wherein the scale space factors are used for representing the smoothness degree of the images;
subtracting two adjacent images in the second image sequence to obtain a third image sequence, detecting an extreme point in the third image sequence as a feature point of the specific icon, wherein a function expression corresponding to the third image sequence is as follows:
D(x, y, σ) = [G(x, y, kσ) - G(x, y, σ)] * I(x, y)
           = L(x, y, kσ) - L(x, y, σ)
specifically, the preset ordering rule includes: and performing ascending or descending arrangement according to the size of each image pixel in the first image sequence.
Specifically, the step of subtracting two adjacent images in the second image sequence to obtain a third image sequence, and detecting an extreme point in the third image sequence as the feature point of the specific icon specifically includes:
and comparing the pixel value of each pixel point of each picture in the third image sequence with the pixel values of all adjacent points, and when the pixel value of a certain pixel point is larger or smaller than the pixel values of the adjacent pixel points of the image domain and the scale domain, judging that the pixel point is a local extreme point of the function expression corresponding to the third image sequence, wherein the characteristic point of the specific icon consists of the local extreme points.
Preferably, after the determining that the pixel point is the local extreme point of the function expression corresponding to the third image sequence, the method further includes:
and deriving the Taylor's expression of the function expression corresponding to the third image sequence at the local extreme point, and making the derivative value equal to zero, wherein the calculated coordinates of the pixel points are the coordinates of the feature points of the specific icon.
Specifically, the step of processing the image to be recognized by using the image processing method to detect the feature points of the image to be recognized and extract the feature vectors of the feature points to form a feature vector group specifically includes:
partitioning the image in the specific neighborhood of each feature point, and calculating Gaussian image gradient information corresponding to the pixel points in each partition, wherein the gradient information comprises the modulus and the direction of the gradient, and the calculation formulas of the modulus and direction of the gradient are as follows:
m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²]
θ(x, y) = arctan[(L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))]
and counting the gradient information of each pixel point in the specific neighborhood of any one feature point and drawing a gradient information histogram to generate a feature vector, wherein the feature vector group is formed by the collection of the feature vectors of all the feature points.
Specifically, the step of partitioning the specific neighborhood images of each feature point and calculating the gaussian image gradient information corresponding to the pixel points in each partition specifically includes:
respectively taking four points in four quadrants of a plane rectangular coordinate system with the feature point as an origin in each feature point scale space, and obtaining 16 pixel points adjacent to the feature point;
and for any one feature point, calculating gradient information of 16 adjacent pixel points in 8 directions, and taking a 128-dimensional vector formed by taking the calculated 128 gradient information as elements as a feature vector of the feature point of the specific icon.
Preferably, the step of matching the generated feature vector group with the feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine the target icon matched with the specific icon specifically includes:
calculating the cosine distance between each component vector of the generated feature vector group and each component vector of the feature vector group uniquely corresponding to each reference icon in the feature database, and judging that the feature points corresponding to two component vectors are successfully matched when the cosine distance between them is smaller than a preset threshold value.
Specifically, after the determining that the feature points corresponding to the two partial vectors are successfully matched, the method further includes:
counting the number of the feature points successfully matched with the specific icon and each reference icon, converting the number of the successfully matched feature points into the similarity of the matched reference icon relative to the specific icon, and sequencing the matched reference icons according to the similarity;
and taking the reference icon with the highest similarity with the specific icon as a target icon.
Specifically, the third-party database includes mapping relationships between the reference icons and the icon information.
Specifically, the step of obtaining and outputting icon information corresponding to the target icon from the third-party database specifically includes:
and sending an acquisition request containing the icon information of the target icon to a server corresponding to the third-party database.
In a second aspect, the present invention provides an icon recognition apparatus comprising:
an acquisition module: acquiring an image to be identified containing a specific icon;
an extraction module: processing the image to be identified by using an image processing method to detect the feature points of the specific icon and extracting feature vectors of the feature points to form a unique feature vector group;
a matching module: matching the generated feature vector group with a feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine a target icon matched with the specific icon;
an output module: and acquiring and outputting icon information corresponding to the target icon from a third party icon information database.
Compared with the prior art, the scheme provided by the invention has the following advantages:
1. The greatest beneficial effect of the method is as follows: the image to be identified containing the specific icon is obtained; the image to be identified is processed with an image processing method to extract the feature points in the image and generate feature vectors that describe them; each generated feature vector group is matched one by one against the feature vector group uniquely corresponding to each reference icon in a preset feature database to determine a target icon matched with the specific icon; and the icon information corresponding to the target icon is acquired from a third-party database and output. The scheme uses image recognition and matching to link a trademark image with the specific information the trademark contains, so that the user can conveniently query and search.
2. The method incorporates a scale-invariant feature matching algorithm that maintains invariance to image rotation, scale change, and brightness change, can extract key features with good discrimination, also maintains a degree of stability under viewing-angle change, affine transformation, and noise, and is suitable for fast and accurate matching in a massive database. The algorithm's advantages are abundance, since even a few objects can generate a large number of SIFT feature vectors; high speed, since an optimized SIFT matching algorithm can even meet real-time requirements; and extensibility, since it can conveniently be combined with feature vectors of other forms.
3. The invention solves the problem that image registration and target identification and tracking performance are affected during icon recognition by factors such as the state of the icon itself, the environment of the scene, and the imaging characteristics of the imaging equipment. Specifically, the method and the device can handle, during icon recognition, rotation, scaling, and translation of the icon; image affine/projective transformation (viewpoint change); illumination effects; target occlusion; cluttered scenes; the influence of noise; and the like.
4. According to the method and the device, when the matching of the target icon is implemented, not only can a best matching target icon be determined, but also a matching result which follows the sequence of the matching degrees from large to small can be determined, and a user can select a final matching result according to actual conditions and requirements, so that the flexibility of the scheme is improved, and the user experience is improved.
In conclusion, the method ingeniously combines the image processing and recognition method to realize the acquisition of the icon information contained in the icon through the recognition of the icon, improves the adaptability of the method to the environment, and has strong robustness and high matching precision.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of an embodiment of a method for identifying icons according to the present invention;
FIG. 2 is a schematic diagram of the second image sequence in the icon recognition method according to the present invention;
FIG. 3 is a two-dimensional Gaussian function surface plot in an icon identification method of the present invention;
FIG. 4 is a schematic diagram illustrating the generation of the third image sequence in the icon recognition method according to the present invention;
fig. 5 is a schematic diagram illustrating extreme value detection of the gaussian difference image in the icon identifying method according to the present invention;
FIG. 6 is a statistical histogram of gradient information of neighboring pixels of the feature points in an icon identification method according to the present invention;
fig. 7 is a schematic diagram of gradient information of adjacent pixel points of the feature points in the icon recognition method according to the present invention;
FIG. 8 is a flow chart of an embodiment of an icon identifying apparatus according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As will be understood by those skilled in the art, the terms used in the present application should be construed as follows:
SIFT algorithm: SIFT (Scale-Invariant Feature Transform) is a computer vision algorithm used to detect and describe local features in an image. It finds extreme points in scale space and extracts their position, scale, and rotation invariants.
This algorithm was published in 1999 by David Lowe and perfected in 2004. The application range of the method comprises object recognition, robot map perception and navigation, image stitching, 3D model establishment, gesture recognition, image tracking and action comparison. This algorithm is patented by the university of British Columbia.
Description and detection of local image features can help in identifying objects. SIFT features are based on local appearance at interest points on an object, regardless of the size and rotation of the image. Their tolerance to light, noise, and slight viewing-angle changes is also quite high. Based on these characteristics, they are highly distinctive and relatively easy to retrieve: objects are easily identified and rarely misidentified even in feature databases with large numbers of entries. The detection rate of partially occluded objects using SIFT feature description is also quite high; even as few as 3 SIFT features are enough to calculate position and orientation. At current computer hardware speeds, and with a small feature database, recognition speed can approach real time. SIFT features carry a large amount of information and are suitable for fast and accurate matching in a massive database.
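As an illustration of the terms above, the following is a minimal sketch of SIFT detection and matching using OpenCV (it assumes opencv-python 4.4 or later, where SIFT_create lives in the main module, and hypothetical file names; it is not the patent's own implementation):

```python
# Minimal sketch: SIFT keypoint detection and descriptor matching with OpenCV.
# Assumes opencv-python >= 4.4 and two hypothetical image files.
import cv2

query = cv2.imread("query_icon.png", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("reference_icon.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_q, desc_q = sift.detectAndCompute(query, None)      # 128-dim descriptors
kp_r, desc_r = sift.detectAndCompute(reference, None)

# Brute-force matching with Lowe's ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(desc_q, desc_r, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(f"{len(good)} matched feature points")
```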
Feature points of an image: given two images of the same scene, stable points are extracted by some method such that corresponding matching points exist between the two images. These points have three attributes: scale, direction, and magnitude.
The scale space theory is as follows: the scale space idea was originally proposed by Iijima in 1962 and gradually gained attention through the promotion of Witkin, Koenderink, and others; it is now widely used in the computer vision field.
The basic idea of the scale space theory is as follows: a parameter regarded as scale is introduced into an image information processing model, scale space representation sequences under the multi-scale are obtained by continuously changing the scale parameter, main outlines of the sequences are extracted, the main outlines serve as feature vectors, and edge and corner detection, feature extraction on different resolutions and the like are achieved.
The scale space method brings the traditional single-scale image information processing technology into a dynamic analysis framework of continuously changing scales, making the essential characteristics of the image easier to obtain. The degree of blur of each scale image in the scale space increases gradually, which can simulate the formation process of a target on the retina as a person moves from near the target to far away.
The scale space satisfies visual invariance. The visual interpretation of this invariance is as follows: when we observe an object with our eyes, then on one hand, when the illumination conditions of the background change, the brightness level and contrast of the image perceived by the retina differ; therefore the analysis of the image by scale space operators must be unaffected by changes in the gray level and contrast of the image, that is, it must satisfy gray-scale invariance and contrast invariance. On the other hand, with respect to a fixed coordinate system, when the relative position between the observer and the object changes, the position, size, angle, and shape of the image perceived by the retina differ; therefore the analysis of the image by scale space operators must be independent of the position, size, angle, and affine transformation of the image, that is, it must satisfy translation invariance, scale invariance, Euclidean invariance, and affine invariance.
Laplacian of Gaussian (LOG): the Laplace operator is an excellent edge detection operator widely applied in edge detection. It detects edges by finding the zero crossings of the second derivative of the image. Because the Laplace operator detects edges by differentiating the image, it is sensitive to discrete points and noise. Therefore, Gaussian convolution filtering is first applied to the image for noise reduction, and then the Laplace operator is used for edge detection; this improves the operator's robustness to noise and discrete points, and the combination is the Laplacian of Gaussian operator.
Gaussian Difference Operator (DOG): the Difference of Gaussians is obtained by subtracting one blurred version of an original grayscale image from another, less blurred version; the blurred images are obtained by convolving the original grayscale image with Gaussian kernels of different standard deviations. Gaussian blurring with a Gaussian kernel only suppresses high-frequency information. Subtracting one image from the other preserves the spatial information lying in the band of frequencies retained between the two blurred images. The DOG therefore acts as a band-pass filter that removes all frequency components except those retained between the two images.
In the SIFT (scale-invariant feature transform) technique, SIFT feature points, which are key points located at image edges where the Gaussian difference attains extreme values, can be obtained by convolving the pixel matrix function of the original image with a Gaussian function. Through Gaussian blur processing, the gray values of the key points change sharply, so using them as the feature points of the image gives good stability and anti-interference performance: they are insensitive to changes in conditions such as image translation, rotation, scaling, illumination, affine projection, and 3D projection, which facilitates later image registration.
Referring to fig. 1, an embodiment of a method for identifying an icon according to the present invention includes the following steps:
and S11, acquiring the image to be identified containing the specific icon.
In the embodiment of the invention, the icons are graphic symbols with reference meanings, and have the characteristics of high concentration, rapid information transmission and convenience in memory. Specifically, it includes words, figures, letters, numbers, three-dimensional logos, colors, etc., and combinations of the above elements, logos having a distinctive feature, such as a trademark image. Furthermore, not only can the original icon with clear patterns and regular positions be used as the image to be recognized, but also the icon obtained by performing conversion operations such as rotation, scaling and the like on the original icon can be used as the image to be recognized.
Specifically, the acquiring of the image to be recognized including the specific icon may specifically be performed by taking a picture including the specific icon, or cutting a partial image including the specific icon from an existing picture as the image to be recognized.
Preferably, in the embodiment of the present invention, after the image to be recognized is acquired, the image to be recognized is converted from a color image (RGB image) into a grayscale image, and then the subsequent steps are performed. In one possible design, the present invention preferably converts the RGB image of the image to be recognized into a grayscale image, where any color is composed of three primary colors, red, green, and blue, and if the color of a certain point is RGB (R, G, B), the grayscale processing methods are as follows:
1. floating-point method: Gray = R*0.3 + G*0.59 + B*0.11;
2. integer method: Gray = (R*30 + G*59 + B*11)/100;
3. shift method: Gray = (R*77 + G*151 + B*28) >> 8;
4. average value method: Gray = (R + G + B)/3;
5. taking green only: Gray = G;
After obtaining Gray by any of these methods, replace R, G, and B in the original RGB(R, G, B) uniformly with Gray to form a new color RGB(Gray, Gray, Gray); substituting it for the original RGB(R, G, B) yields the grayscale image corresponding to the original image to be recognized.
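As a minimal sketch of the five rules above (assuming an H × W × 3 uint8 RGB array; the function name is illustrative):

```python
# Sketch of the five grayscale conversion rules above for an RGB uint8 array.
import numpy as np

def to_gray(rgb: np.ndarray, method: str = "float") -> np.ndarray:
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    if method == "float":      # Gray = 0.3 R + 0.59 G + 0.11 B
        gray = 0.3 * r + 0.59 * g + 0.11 * b
    elif method == "integer":  # Gray = (30 R + 59 G + 11 B) / 100
        gray = (30 * r + 59 * g + 11 * b) // 100
    elif method == "shift":    # Gray = (77 R + 151 G + 28 B) >> 8
        gray = (77 * r + 151 * g + 28 * b) >> 8
    elif method == "average":  # Gray = (R + G + B) / 3
        gray = (r + g + b) // 3
    else:                      # Gray = G (green only)
        gray = g
    return gray.astype(np.uint8)
```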
It should be noted that grayscale processing before image processing of the image to be recognized only simplifies the subsequent processing; ignoring the color information may degrade matching performance to some extent. Therefore, in actual operation, whether grayscale processing is required can be decided as needed. Specifically, the embodiment of the present invention may or may not perform grayscale processing.
At the present stage, the SIFT algorithm can not only perform feature matching on the gray level image, but also perform feature matching on the color image. At present, various SIFT algorithms aiming at color image feature matching appear, such as HSV-SIFT, opponentSIFT, W-SIFT and the like. These color image SIFT algorithms are all completed by calculating 128-dimensional feature vectors of three channels respectively, and then synthesizing a 3 × 128-dimensional feature vector, and finally completing color target matching. The dimensionality of the feature vectors is far higher than 128 dimensions of the classic SIFT algorithm, obviously along with the increase of the dimensionality, the time for image matching is greatly increased, the efficiency is low, and the calculation difficulty is correspondingly increased.
Therefore, for convenience of description, the embodiment of the present invention uses a gray scale image as an example to describe the icon recognition method of the present invention, and does not limit the application scope of the present invention.
And S12, processing the image to be identified by using an image processing method to detect the feature points of the specific icon and extracting feature vectors of the feature points to form a unique feature vector group.
In the embodiment of the present invention, the feature points include corner points, edge points, bright points in a dark area, and dark points in a bright area of the image to be recognized/the specific icon. Or the feature point is a local extreme point with direction information detected under images in different scale spaces, so that the feature point has three features of scale, direction and size.
In a possible design, the present invention preferably performs the following steps of forming a feature vector group by detecting feature points of the specific icon and extracting feature vectors of the feature points:
1. down-sampling the image to be identified to generate a first image sequence containing a plurality of down-sampled images which are sequenced according to a preset sequencing rule;
In the embodiment of the present invention, down-sampling is a sampling process in which the number of sampling points is reduced. For an N × M image with a down-sampling coefficient of k, one point is taken every k points in each row and each column of the original image to form a new image. Therefore, each down-sampled image has fewer pixels than the previous one. In general, the main purposes of down-sampling, i.e. reducing, an image are two: (1) making the image conform to the size of the display area; and (2) generating a thumbnail of the corresponding image.
In the embodiment of the present invention, the preset ordering rule includes ascending or descending arrangement by the pixel size of each image. For convenience of description, the first image sequence is preferably ordered from bottom to top, from large to small: images with more pixels are placed in lower layers and images with fewer pixels in upper layers, so that the first image sequence and the second image sequence are arranged into an image layer structure similar to a pyramid.
2. Performing Gaussian convolution operation on each image in the first image sequence by using different scale space factors to generate a second image sequence, wherein the scale space factors are used for representing the smoothness degree of the images;
In general, we often describe the information of an image through extrema or gradients; however, scale information is equally important. Finer scales add more detail to the image, while coarser scales lose many details. Therefore, when describing image scale changes, a multi-scale space is established by introducing a continuously varying parameter regarded as scale, that is, the scale space factor; a scale space representation sequence at multiple scales is obtained; and feature vectors of the feature points of the image to be recognized are extracted from this sequence at different resolutions.
In the embodiment of the invention, in order to extract the feature vectors of the feature points of the image to be recognized on different resolutions, the SIFT algorithm is preferably used for processing the image and extracting the feature vectors of the feature points. The SIFT algorithm searches the feature points for the image to be identified on different scale spaces, the acquisition of the scale spaces needs to be realized by using gaussian convolution operation (gaussian blur), and Lindeberg et al have proved that a gaussian convolution kernel is a unique transformation kernel for realizing scale transformation and is a unique linear kernel. In the embodiment of the invention, the Gaussian convolution operation is Gaussian blur. Gaussian blur is often used where image size is reduced. When down-sampling is performed, the image is usually subjected to low-pass filtering processing before sampling. This ensures that no spurious high frequency information is present in the sampled image.
In the first step, the down-sampling of the image to be recognized is completed; the first image sequence generated by down-sampling forms an image pyramid, arranged from large images at the bottom to small images at the top. The original image is the first layer of the pyramid, and the new image obtained by each down-sampling is one layer of the pyramid (one image per layer). On the basis of the first step, a Gaussian convolution operation is performed on each image in the first image sequence using different scale space factors to generate the second image sequence.
Specifically, referring to fig. 2, fig. 2 shows a schematic diagram of the second image sequence. In fig. 2, the arrows on the left side represent fine interval variations in spatial scale factor, and the image layer structure on the right side is a plurality of the second image sequences. In order to enable the scale space factors to show the continuity of the scale space factors, on the basis of the first step, gaussian blur of different scales is performed on each image generated after down sampling by using different space scale factors to generate a second image sequence. The second image sequence is based on an image pyramid model generated by original downsampling, one image of each layer of the image pyramid is subjected to Gaussian blur by using different scale parameters (the spatial scale factors), each layer of the pyramid contains a plurality of Gaussian blur images, the plurality of images of each layer of the pyramid are combined to form a group (Octave), as shown in fig. 2, each layer of the pyramid only has one group of images, the number of the groups is equal to the number of the layers of the pyramid, and the scale spatial factors are used for representing the smoothness degree of the images.
It should be noted that building the scale space pyramid simply by down-sampling makes the scale-change granularity too coarse and not very accurate: down-sampling alone cannot produce scale changes at fine intervals and cannot embody the continuity of the scale space. This is why the concept of the scale space factor is introduced; although computing the scale space factors consumes much more time than down-sampling, their introduction is a positive countermeasure to the problem of fine-interval scale change.
Preferably, the specific principle of generating the second image sequence by performing gaussian blurring on each down-sampled image to be identified in the present invention is as follows:
the Gaussian convolution operation is carried out by using a Gaussian function and an original image, and the formula is as follows:
L(x, y, σ) = G(x, y, σ) * I(x, y)    (1-1)
wherein L(x, y, σ) is the scale space of the image to be identified, G(x, y, σ) is a scale-variable two-dimensional Gaussian function, and I(x, y) is the original image. The expression of the scale-variable two-dimensional Gaussian function G(x, y, σ) is:
G(x, y, σ) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))    (1-2)
Gaussian blurring is an image filtering technique commonly used in the image processing field to reduce image noise and the level of detail; it produces an image with the visual effect of looking through a translucent screen. It uses a normal distribution (Gaussian function) to compute a blur template and convolves this template with the original image to blur it.
specifically, the derivation process of the formula (1-2) is as follows:
the formula (1-2) is obtained by taking N dimension as two dimension and using N dimension as normal distribution equation of space. Preferably, the invention applies the two-dimensional Gaussian template of the formula (1-2) to perform Gaussian blur on the image to be identified. Wherein, the N-dimensional space normal distribution equation is as follows:
where σ is the standard deviation of the normal distribution of the expression (1-3), and the larger the value is, the more blurred (smoothed) the image is. r is the blur radius, which refers to the distance of the template element from the center of the template. If the size of the two-dimensional template is m × n, the gaussian calculation formula corresponding to the element (x, y) on the template is:
where m, n represent the dimensions of the Gaussian template (commonly determined as (6σ + 1) × (6σ + 1)), (x, y) represents the pixel position in the image, and σ is the scale space factor: the smaller its value, the less the image is smoothed and the smaller the corresponding scale. Large scales correspond to the overview features of the image, and small scales correspond to its detail features.
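A short sketch of building such a template (the function name and the odd-size rounding are illustrative choices):

```python
# Sketch: build a Gaussian template from the element formula above, with the
# template size taken as roughly 6*sigma + 1, forced to an odd integer.
import numpy as np

def gaussian_template(sigma: float) -> np.ndarray:
    size = int(6 * sigma + 1) | 1           # force an odd template size
    ax = np.arange(size)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    cx = cy = (size - 1) / 2.0              # template center (m/2, n/2)
    g = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    g /= 2 * np.pi * sigma ** 2
    return g / g.sum()                      # normalize weights to sum to 1
```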
In two-dimensional space, the contour lines of the surface generated by this formula are concentric circles, normally distributed from the center, as shown in fig. 3. The original image is transformed with a convolution matrix formed from the pixels whose distribution is not zero. The value of each pixel becomes a weighted average of the values of the surrounding neighboring pixels. The original pixel has the largest Gaussian distribution value and therefore the largest weight, and neighboring pixels receive smaller and smaller weights the farther they are from the original pixel. This blurring process preserves edges better than other, more uniform blurring filters.
In summary, in this step, the gaussian blur is performed on each image in the first image sequence by using different scale space factors on the basis of the first image sequence generated by down-sampling in the step one, so as to generate the second image sequence with scale space continuity, that is, the gaussian pyramid.
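The following is a sketch of steps one and two under stated assumptions (SciPy's gaussian_filter for the blur; k = 2^(1/S) and S + 3 images per group, as in Lowe's construction; names are illustrative):

```python
# Sketch of steps 1-2: down-sample into octaves (first image sequence), then
# blur each octave image with increasing scale factors sigma0 * k**i, where
# k = 2**(1/S), producing the second image sequence (Gaussian pyramid).
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, n_octaves=4, scales_per_octave=3, sigma0=1.6):
    k = 2.0 ** (1.0 / scales_per_octave)
    pyramid = []
    octave_base = image.astype(np.float32)
    for _ in range(n_octaves):
        octave = [gaussian_filter(octave_base, sigma0 * k ** i)
                  for i in range(scales_per_octave + 3)]   # S + 3 per group
        pyramid.append(octave)
        octave_base = octave_base[::2, ::2]   # down-sample by 2 for next octave
    return pyramid
```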
3. Subtracting two adjacent images in the second image sequence to obtain a third image sequence, detecting an extreme point in the third image sequence as a feature point of the specific icon, wherein a function expression corresponding to the third image sequence is as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)
=L(x,y,kσ)-L(x,y,σ) (1-4)
in the embodiment of the present invention, the above three steps are used to detect the feature points of the image to be recognized, and the feature points of an image are typically edge points.
In image processing, an edge point can be regarded as a pixel where the first derivative of the image takes an extreme value, or where the second derivative crosses zero; therefore, the first or second derivative of the image can be computed to determine its edge points. Pixels where the first derivative is extremal correspond to pixels where the second derivative is 0. That is, the edge points of the image may be represented as extreme points of its first derivative or as zero crossings of its second derivative. The Laplace operator performs edge detection by finding the zero crossings of the second derivative of the image. Therefore, the problem of detecting the feature points of the image to be identified is converted into the problem of finding the pixel points at extreme points of the first derivative, or zeros of the second derivative, of the first image sequence.
Preferably, since the Laplace operator performs edge detection by differentiating the image, the Laplace operator is sensitive to discrete points and noise. Therefore, the robustness of the operator to noise and discrete points can be improved by performing Gaussian convolution filtering on the image for noise reduction and then performing edge detection by using the Laplace operator, so that the Laplace Gaussian operator LOG (Laplace of Gaussian) becomes a preferable method for realizing edge detection.
Specifically, the LOG operator is a combination of the Gaussian and Laplace operators; the algorithm directly takes the second derivative of the Gaussian blur model. In other words, the principle by which the Laplacian of Gaussian detects the edge points of the image to be recognized is as follows: first, Gaussian blur is applied to the image to be recognized; then the second derivative of the blurred image is computed; the pixels where the second derivative equals zero are the edges of the image. As early as 1994, Lindeberg found that the Difference of Gaussians (DOG) function is very similar to the scale-normalized LOG function. The specific derivation process is as follows:
and (3) carrying out Laplacian operation on the smoothed image, or directly carrying out Laplacian operation on a Gaussian smoothing operator:
we normalize the Labraus transformObtaining:
from this we have found that the DOG function is actually an approximation to the normalized LOG function. Therefore, local extrema of different scale spaces can be obtained by establishing a series of DOG functions of different scale parameters. The DOG function is a method of grayscale image enhancement and corner detection. Therefore, in the embodiment of the invention, the extreme value detection can be carried out by using a more efficient DOG operator instead of the Laplace operator, so that the efficiency of the algorithm can be improved, and the calculation amount can be reduced. Wherein, the gaussian difference function formula is as follows:
D(x, y, σ) = [G(x, y, kσ) - G(x, y, σ)] * I(x, y)
           = L(x, y, kσ) - L(x, y, σ)    (1-4)
in summary, the problem of detecting the feature point of the image to be identified is to convert the feature point into the problem of finding the extreme point of the gaussian difference function, and the image corresponding to the gaussian difference function is the gaussian difference image, that is, the third image sequence.
In particular, as can be seen from equation (1-4), the third image sequence is actually generated by subtracting two adjacent images in the image pyramid model, that is, the second image sequence.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating generation of the third image sequence corresponding to the gaussian difference function. In fig. 4, the left image sequence is the second image sequence, and the right image sequence is the third image sequence obtained by subtracting two adjacent images of the left second image sequence from each other. The variation of the pixel values on the image can be seen in fig. 4. (if there is no change, there is no feature-the feature must be the point where the pixel value changes as much as possible). The DOG image depicts the outline of the image to be identified.
Subtracting one image from the other can preserve the spatial information contained in the frequency bands maintained in the two images. In this way, the DOG acts as a band pass filter that removes all other frequency information except those that remain in the original image. Which can be used to increase visibility of edges and other details. And when carrying out extreme value detection, only carrying out extreme value point detection on each pixel in the third image sequence.
In a possible design, the present invention preferably detects the extreme point in the third image sequence by the following method:
comparing the pixel value of each pixel point of each picture in the third image sequence with the pixel values of all adjacent points, and when the pixel value of a certain pixel point is larger or smaller than the pixel values of the adjacent pixel points of the image domain and the scale domain, judging that the pixel point is a local extreme point of a function expression corresponding to the third image sequence, wherein the characteristic point of the specific icon consists of the local extreme points.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating the principle of spatial extremum detection. The characteristic points are composed of local extreme points of DOG space, and the preliminary exploration of the characteristic points is completed by comparing two adjacent layers of images of each DOG in the same group. To find the extreme points of the DOG function, each pixel point is compared with all its neighbors to see if it is larger or smaller than its neighbors in the image domain and scale domain. As shown in fig. 5, the middle detection point a is compared with 26 points, which are 8 adjacent points of the same scale and 9 × 2 points corresponding to upper and lower adjacent scales, to ensure that extreme points are detected in both scale space and two-dimensional image space.
For comparison at adjacent scales: in each group of the Gaussian difference pyramid with 4 layers, as on the right of fig. 4, extremum detection over two scales can only be performed in the middle two layers; other scales must be handled in different groups. In order to detect extreme points at S scales in each group, S + 2 layers of images are needed per group of the DOG pyramid; since the DOG pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid, S + 3 layers of images are needed per group of the Gaussian pyramid. In actual calculation, S ranges from 3 to 5.
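A sketch of step three and the 26-neighbour test, operating on one octave of the pyramid from the previous sketch (a brute-force loop for clarity, not efficiency):

```python
# Sketch: build the DOG images (third image sequence) by subtracting adjacent
# blurred images in one octave, then run the 26-neighbour extremum test
# (8 same-scale neighbours plus 2 x 9 neighbours at adjacent scales).
import numpy as np

def detect_extrema(octave):
    dog = [octave[i + 1] - octave[i] for i in range(len(octave) - 1)]
    points = []
    for s in range(1, len(dog) - 1):        # only middle DOG layers are testable
        below, cur, above = dog[s - 1], dog[s], dog[s + 1]
        for y in range(1, cur.shape[0] - 1):
            for x in range(1, cur.shape[1] - 1):
                cube = np.stack([below[y-1:y+2, x-1:x+2],
                                 cur[y-1:y+2, x-1:x+2],
                                 above[y-1:y+2, x-1:x+2]])
                v = cur[y, x]
                if v == cube.max() or v == cube.min():
                    points.append((x, y, s))  # candidate local extreme point
    return dog, points
```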
However, the above method detects that the local extreme points are not all stable feature points, because some extreme points have weak response and the DOG operator generates strong edge response. Therefore, the detection of the local extreme point in the above DOG scale space needs further verification to accurately locate the feature point, that is, the accurate location of the feature point.
In a possible design, the present invention preferably achieves precise positioning of the feature points by the following method:
and deriving the Taylor's expression of the function expression corresponding to the third image sequence at the local extreme point, and making the derivative value equal to zero, wherein the calculated coordinates of the pixel points are the coordinates of the feature points of the specific icon.
In order to improve the stability of the local extreme points and achieve accurate positioning of the feature points, curve fitting needs to be performed on the DOG three-dimensional quadratic function in scale space, using a Taylor expansion of the DOG function at the feature point:
D(X) = D + (∂D/∂X)ᵀ X + (1/2) Xᵀ (∂²D/∂X²) X    (1-6)
where X = (x, y, σ)ᵀ. By taking the derivative of equation (1-6) and setting it to zero, the extremum position X_max can be found accurately:
X_max = −(∂²D/∂X²)⁻¹ (∂D/∂X)
Preferably, to enhance matching stability and noise immunity while positioning accurately, feature points with low contrast and unstable edge response points are removed. Substituting X_max into D(x, y, σ): if |D(X_max)| ≥ 0.03, the feature point is stable and is retained; otherwise, it is discarded.
In conclusion, the detection of the feature points is completed. Through the scale invariance extremum point, the scale invariance property can be obtained. In the embodiment of the present invention, extracting the feature vector of the feature point specifically means extracting a stable feature vector of the feature point.
In the embodiment of the present invention, the feature vector is used to describe each feature point, and the feature vector has two parameters, namely a magnitude parameter and a direction parameter. In order to make the feature vector character have rotation invariance, a reference direction needs to be allocated to each feature point by using local features of the image. In the embodiment of the present invention, by using the gradient direction distribution characteristics and the gradient modulus values of the feature point neighborhood pixels, a direction parameter and a size can be specified for each feature point, so as to extract the feature vector having invariance for the feature point.
In a possible design, the present invention preferably extracts the feature vectors of the feature points by the following methods:
partitioning the image in the specific neighborhood of each feature point, and calculating Gaussian image gradient information corresponding to the pixel points in each partition, wherein the gradient information comprises the modulus and the direction of the gradient, and the calculation formulas of the modulus and direction of the gradient are as follows:
m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²]
θ(x, y) = arctan[(L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))]
and counting the gradient information of each pixel point in the specific neighborhood of any one feature point and drawing a gradient information histogram to generate a feature vector, wherein the feature vector set of all the feature points forms the feature vector group.
Referring to fig. 6, fig. 6 is a diagram illustrating gradient and direction histogram statistics of pixels in a neighborhood of the feature point. After the gradient calculation of the feature points is completed, the gradient and the direction of the pixels in the neighborhood are counted by using the histogram. The gradient histogram divides the 0-360 degree directional range into 36 bins, with 10 degrees per bin. As shown in fig. 6, the direction of the peak of the histogram represents the main direction of the feature vector of the feature point (for simplicity, only eight directions of the histogram are shown).
Lowe's experimental results show that characterizing the feature vectors of the feature points with 4 × 4 × 8 = 128-dimensional vectors, that is, computing gradient information in 8 directions for the 16 neighboring pixel points in a 4 × 4 window of the feature point's scale space, gives the best overall effect. Therefore, the feature vectors characterizing the feature points are preferably generated by the following method:
referring to fig. 7, fig. 7 shows a schematic diagram of gradient information of 16 pixel points in the specific neighborhood of the feature point. The point B located in the center is the feature point, and the points in each small box are adjacent pixel points, where the modulus (length of black line) and direction (direction of black line) of the gradient of each pixel point are marked. Respectively taking four points from four quadrants of a plane rectangular coordinate system which takes the characteristic point as an origin in each characteristic point scale space, and obtaining 16 pixel points adjacent to the characteristic point in total;
and for any one feature point, calculating gradient information of 16 adjacent pixel points in 8 directions, and taking a 128-dimensional vector formed by taking the calculated 128 gradient information as elements as a feature vector of the feature point of the specific icon.
And S13, matching the generated feature vector group with a feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine a target icon matched with the specific icon.
Specifically, the cosine distance is calculated between each component vector of the generated feature vector group and each component vector of the feature vector group uniquely corresponding to each reference icon in the feature database; when the cosine distance between two component vectors is smaller than a preset threshold value, it is determined that the feature points corresponding to the two component vectors are successfully matched.
The cosine distance is one of the similarity measurement functions; the principle is to use the cosine function to calculate the similarity of two pictures. A similarity measurement function is used to measure the common characteristics among different images, and may be one or more of the Euclidean distance, the cosine function, or the Pearson correlation function.
In one possible design, the invention prefers the cosine function as the similarity measurement function to measure the similarity between each component vector of the feature vector group and each component vector of the feature vector group uniquely corresponding to each reference icon in the feature database. If one component vector of the feature vector group is A(a1, a2, ..., a128), and one component vector of the feature vector group uniquely corresponding to a reference icon in the feature database is B(b1, b2, ..., b128), where a1, ..., a128 and b1, ..., b128 are the gradient information of the corresponding component vectors, the distance between the two component vectors is calculated by the cosine formula:
cos(A, B) = (A · B) / (|A| |B|)    (1-11)
where the numerator is the dot product of the two vectors and |A| is the length of a vector. As can be seen from formula (1-11), the function varies from −1 to 1 as the angle varies; the cosine of the angle between the vectors gives the cosine distance of the two component vectors. When the cosine distance of two component vectors is smaller than a preset threshold value, it is determined that the feature points corresponding to the two component vectors are successfully matched.
In this embodiment of the present invention, after it is determined that the feature points corresponding to the two component vectors are successfully matched, the method further includes:
counting the number of feature points successfully matched between the specific icon and each reference icon, converting the number of successfully matched feature points into a similarity of the matched reference icon relative to the specific icon, and ranking the matched reference icons according to similarity;
and taking the reference icon with the highest similarity to the specific icon as the target icon.
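A hedged sketch of this counting-and-ranking step, assuming descriptor arrays like those produced by the SIFT snippet above and reading "converting the number into the similarity" as taking the fraction of matched feature points:

```python
import numpy as np

def count_matches(query: np.ndarray, ref: np.ndarray,
                  threshold: float = 0.1) -> int:
    """Count feature points of the specific icon whose nearest feature
    point in one reference icon lies within the cosine-distance threshold."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    distances = 1.0 - q @ r.T                 # pairwise cosine distances
    return int((distances.min(axis=1) < threshold).sum())

def rank_reference_icons(query: np.ndarray, reference_db: dict) -> list:
    """reference_db maps icon name -> descriptor array. Returns (name,
    similarity) pairs sorted with the highest similarity first; the first
    entry is the target icon."""
    scores = {name: count_matches(query, ref) / len(query)
              for name, ref in reference_db.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```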
And S14, acquiring and outputting icon information corresponding to the target icon from a third-party database.
In the embodiment of the present invention, the specific icon includes a trademark image, a famous painting, a landmark building, and the like, and this step specifically includes:
sending an acquisition request for the icon information of the target icon to the server corresponding to the third-party database. The third-party database contains mapping relationships between the reference icons and icon information; the mapping relationship may be understood as each reference icon being stored in the third-party database in association with its icon information. After receiving the acquisition request, the server corresponding to the third-party database queries the third-party database according to the target icon contained in the request, determines the icon information corresponding to the target icon according to the mapping relationship, and feeds the icon information back to the system; the system receives the icon information fed back by the server and outputs it.
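A sketch of the acquisition request; the endpoint URL and JSON field names are hypothetical assumptions, since the patent does not specify a wire format:

```python
import requests

def fetch_icon_info(target_icon_id: str) -> dict:
    """Send an acquisition request for the icon information of the
    target icon to the server corresponding to the third-party database."""
    response = requests.post(
        "https://thirdparty.example.com/icon-info",   # hypothetical endpoint
        json={"target_icon": target_icon_id},         # hypothetical field
        timeout=10,
    )
    response.raise_for_status()
    # The server looks up the reference icon <-> icon information mapping
    # and feeds the icon information back.
    return response.json()
```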
Referring to Fig. 8, the present invention further provides an icon recognition apparatus, which in one embodiment includes an acquisition module 11, an extraction module 12, a matching module 13, and an output module 14. Wherein,
the acquisition module 11 is configured to acquire an image to be identified containing a specific icon;
the extraction module 12 is configured to process the image to be identified by using an image processing method to detect the feature points of the specific icon and to extract the feature vectors of the feature points to form a unique feature vector group;
the matching module 13 is configured to match the generated feature vector group, one by one, with the feature vector group uniquely corresponding to each reference icon in a preset feature database, to determine a target icon matched with the specific icon;
the output module 14 is configured to acquire and output icon information corresponding to the target icon from a third-party icon information database.
The specific implementation principle by which each module carries out the corresponding step is consistent with that of the icon recognition method described above and is not repeated here.
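For illustration, the four modules of Fig. 8 can be sketched as a single class that reuses the helpers assumed in the earlier snippets (cv2.SIFT_create, rank_reference_icons, fetch_icon_info); the method boundaries mirror the module boundaries above:

```python
import cv2

class IconRecognitionApparatus:
    def __init__(self, reference_db: dict):
        self.reference_db = reference_db      # preset feature database
        self.sift = cv2.SIFT_create()

    def acquire(self, path: str):
        """Acquisition module 11: obtain the image to be identified."""
        return cv2.imread(path)

    def extract(self, image):
        """Extraction module 12: detect feature points and form the
        feature vector group."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, descriptors = self.sift.detectAndCompute(gray, None)
        return descriptors

    def match(self, descriptors) -> str:
        """Matching module 13: rank the reference icons and pick the
        best-matching one as the target icon."""
        return rank_reference_icons(descriptors, self.reference_db)[0][0]

    def output(self, target_icon: str) -> dict:
        """Output module 14: fetch and return the icon information."""
        return fetch_icon_info(target_icon)
```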
Taken together, the greatest beneficial effect of the above embodiments is that the method employs a scale-invariant feature matching algorithm that maintains invariance to image rotation, scale scaling, and brightness changes; it realizes the acquisition of the information contained in an icon directly from the icon, improves adaptability to the environment, and offers strong robustness and high matching precision.
The method and the device solve the problem that image registration and target identification and tracking performance are affected by factors such as the state of the icon itself, the environment of the scene, and the imaging characteristics of the imaging equipment during icon recognition. Specifically, the invention can cope with rotation, scaling, and translation (RST) of the icon; image affine/projective transformation; illumination effects; target occlusion; cluttered scenes; the influence of noise; and the like.
According to the method and the device, when matching the target icon, not only can a single best-matching target icon be determined, but matching results can also be returned in descending order of matching degree, so that a user can select the final matching result according to actual conditions and requirements, which improves the flexibility of the scheme and the user experience.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
While the foregoing has described in detail the icon recognition method and apparatus provided by the present invention, those skilled in the art may make modifications and variations in light of the above teachings; therefore, the content of this description should not be construed as limiting the scope of the present invention, which is defined by the following claims.

Claims (15)

1. An icon recognition method, the method comprising:
acquiring an image to be identified containing a specific icon;
processing the image to be identified by using an image processing method to detect the characteristic points of the specific icon and extracting characteristic vectors of the characteristic points to form a unique characteristic vector group;
matching the generated feature vector group with a feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine a target icon matched with the specific icon;
and acquiring and outputting icon information corresponding to the target icon from a third-party database.
2. The method according to claim 1, wherein the feature points comprise corner points, edge points, bright points in dark areas, and dark points in bright areas of the image to be recognized/the specific icon.
3. The method according to claim 1, wherein before processing the image to be recognized by using an image processing method to detect the feature points of the specific icon and extract the feature vectors of the feature points to form a unique feature vector group, the method further comprises:
and converting the image to be identified from the color image into a gray image.
4. The method according to claim 1, wherein the specific icon includes a trademark image, and the step of querying a preset icon information database to obtain and output icon information corresponding to the target icon specifically includes:
and querying a preset trademark information database to acquire and output trademark information corresponding to the target trademark image.
5. The method according to claim 1, wherein the step of processing the image to be recognized by using an image processing method to detect feature points of the image to be recognized and extract feature vectors of the feature points to form a feature vector group specifically comprises:
down-sampling the image to be identified to generate a first image sequence containing a plurality of down-sampled images which are sequenced according to a preset sequencing rule;
performing Gaussian convolution operation on each image in the first image sequence by using different scale space factors to generate a second image sequence, wherein the scale space factors are used for representing the smoothness degree of the images;
subtracting two adjacent images in the second image sequence to obtain a third image sequence, detecting an extreme point in the third image sequence as a feature point of the specific icon, wherein a function expression corresponding to the third image sequence is as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)
=L(x,y,kσ)-L(x,y,σ)
6. The method of claim 5, wherein the preset sequencing rule comprises: arranging the images in the first image sequence in ascending or descending order according to their pixel sizes.
7. The method according to claim 5, wherein the step of subtracting two adjacent images in the second image sequence to obtain a third image sequence, and the step of detecting an extreme point in the third image sequence as the feature point of the specific icon specifically comprises:
and comparing the pixel value of each pixel point of each image in the third image sequence with the pixel values of all of its adjacent points; when the pixel value of a certain pixel point is larger or smaller than the pixel values of all of its adjacent pixel points in both the image domain and the scale domain, determining that the pixel point is a local extreme point of the function expression corresponding to the third image sequence, wherein the feature points of the specific icon consist of these local extreme points.
8. The method according to claim 7, wherein said determining that the pixel point is a local extreme point of a function expression corresponding to the third image sequence further comprises:
and taking the derivative of the Taylor expansion of the function expression corresponding to the third image sequence at the local extreme point and setting the derivative equal to zero, wherein the pixel point coordinates thus calculated are the coordinates of the feature points of the specific icon.
9. The method according to claim 1, wherein the step of processing the image to be recognized by using an image processing method to detect feature points of the image to be recognized and extract feature vectors of the feature points to form a feature vector group specifically comprises:
partitioning the image in the specific neighborhood of each feature point into blocks, and calculating the Gaussian image gradient information corresponding to the pixel points in each block, wherein the gradient information comprises the magnitude (modulus) and direction of the gradient, calculated as follows:

m(x, y) = √{[L(x+1, y) - L(x-1, y)]^2 + [L(x, y+1) - L(x, y-1)]^2}

θ(x, y) = arctan{[L(x, y+1) - L(x, y-1)] / [L(x+1, y) - L(x-1, y)]}

wherein L denotes the Gaussian image;
and counting the gradient information of each pixel point in the specific neighborhood of any one feature point and drawing a gradient information histogram to generate a feature vector, wherein the feature vector set of all the feature points forms the feature vector group.
10. The method according to claim 9, wherein the step of partitioning the image in the specific neighborhood of each feature point into blocks and calculating the Gaussian image gradient information corresponding to the pixel points in each block specifically comprises:
taking four points from each of the four quadrants of a plane rectangular coordinate system with the feature point as the origin, in the scale space of each feature point, to obtain 16 pixel points adjacent to the feature point in total;
and for any one feature point, calculating the gradient information of its 16 adjacent pixel points in 8 directions, and using the 128-dimensional vector composed of the 128 calculated gradient values as elements as the feature vector of the feature point of the specific icon.
11. The method according to claim 1, wherein the step of matching the generated feature vector group with the feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine the target icon matched with the specific icon specifically comprises:
calculating the cosine distance between each component vector of the generated feature vector group and each component vector of the feature vector group uniquely corresponding to each reference icon in the feature database, and when the cosine distance between two component vectors is smaller than a preset threshold value, determining that the feature points corresponding to the two component vectors are successfully matched.
12. The method of claim 11, wherein after determining that the feature points corresponding to the two component vectors are successfully matched, the method further comprises:
counting the number of feature points successfully matched between the specific icon and each reference icon, converting the number of successfully matched feature points into a similarity of the matched reference icon relative to the specific icon, and ranking the matched reference icons according to similarity;
and taking the reference icon with the highest similarity to the specific icon as the target icon.
13. The method of claim 1, wherein the third-party database comprises a mapping relationship between each reference icon and icon information.
14. The method according to claim 1, wherein the step of obtaining and outputting icon information corresponding to the target icon from a third-party database specifically comprises:
and sending an acquisition request for the icon information of the target icon to a server corresponding to the third-party database.
15. An icon recognition apparatus, characterized by comprising:
an acquisition module: acquiring an image to be identified containing a specific icon;
an extraction module: processing the image to be identified by using an image processing method to detect the characteristic points of the specific icon and extracting characteristic vectors of the characteristic points to form a unique characteristic vector group;
a matching module: matching the generated feature vector group with a feature vector group uniquely corresponding to each reference icon in a preset feature database one by one to determine a target icon matched with the specific icon;
an output module: and acquiring and outputting icon information corresponding to the target icon from a third-party icon information database.
CN201711000477.XA 2017-10-24 2017-10-24 Icon-based programming method and device Pending CN107967482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711000477.XA CN107967482A (en) 2017-10-24 2017-10-24 Icon-based programming method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711000477.XA CN107967482A (en) 2017-10-24 2017-10-24 Icon-based programming method and device

Publications (1)

Publication Number Publication Date
CN107967482A true CN107967482A (en) 2018-04-27

Family

ID=61999735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711000477.XA Pending CN107967482A (en) 2017-10-24 2017-10-24 Icon-based programming method and device

Country Status (1)

Country Link
CN (1) CN107967482A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150904A (en) * 2013-02-05 2013-06-12 中山大学 Bayonet vehicle image identification method based on image features
CN103473551A (en) * 2013-09-16 2013-12-25 中国传媒大学 Station logo recognition method and system based on SIFT operators
CN104537376A (en) * 2014-11-25 2015-04-22 深圳创维数字技术有限公司 A method, a relevant device, and a system for identifying a station caption
CN105550381A (en) * 2016-03-17 2016-05-04 北京工业大学 Efficient image retrieval method based on improved SIFT (scale invariant feature transform) feature

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109005336A (en) * 2018-07-04 2018-12-14 维沃移动通信有限公司 A kind of image capturing method and terminal device
CN109344313A (en) * 2018-07-31 2019-02-15 中山大学 A kind of Automatic identification method based on trademark image
WO2020024958A1 (en) * 2018-08-03 2020-02-06 北京京东尚科信息技术有限公司 Method and system for generating video abstract
CN109815854A (en) * 2019-01-07 2019-05-28 亮风台(上海)信息科技有限公司 It is a kind of for the method and apparatus of the related information of icon to be presented on a user device
CN109886324A (en) * 2019-02-01 2019-06-14 广州云测信息技术有限公司 Icon-based programming method and apparatus
CN109682414A (en) * 2019-02-28 2019-04-26 襄阳爱默思智能检测装备有限公司 A kind of characteristic present of jewel identity and recognition methods
CN110503682A (en) * 2019-08-08 2019-11-26 深圳市优讯通信息技术有限公司 The recognition methods of rectangle control, device, terminal and storage medium
CN110580171A (en) * 2019-09-17 2019-12-17 RealMe重庆移动通信有限公司 APP classification method, related device and product
CN110580171B (en) * 2019-09-17 2023-06-09 RealMe重庆移动通信有限公司 APP classification method, related device and product
CN110852146A (en) * 2019-09-23 2020-02-28 合肥赛为智能有限公司 Unmanned aerial vehicle image feature point detection method
CN110852146B (en) * 2019-09-23 2023-05-16 合肥赛为智能有限公司 Unmanned aerial vehicle image feature point detection method
CN111753572A (en) * 2020-05-14 2020-10-09 南京翱翔信息物理融合创新研究院有限公司 Complex background low-quality two-dimensional bar code detection method based on deep learning
CN111611990A (en) * 2020-05-22 2020-09-01 北京百度网讯科技有限公司 Method and device for identifying table in image
CN111611990B (en) * 2020-05-22 2023-10-31 北京百度网讯科技有限公司 Method and device for identifying tables in images
CN111695498A (en) * 2020-06-10 2020-09-22 西南林业大学 Wood identity detection method
CN112270372A (en) * 2020-11-06 2021-01-26 首都师范大学 Method, device, computer equipment and medium for determining target object
CN112270372B (en) * 2020-11-06 2023-09-29 首都师范大学 Method, device, computer equipment and medium for determining target object
WO2022143083A1 (en) * 2020-12-29 2022-07-07 华为技术有限公司 Application search method and device, and medium
CN112785623A (en) * 2021-01-12 2021-05-11 四川中科朗星光电科技有限公司 Practical robust method for image recognition of rigid target
CN112712066A (en) * 2021-01-19 2021-04-27 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN113343917A (en) * 2021-06-30 2021-09-03 上海申瑞继保电气有限公司 Histogram-based substation equipment identification method
CN113343917B (en) * 2021-06-30 2024-05-31 上海申瑞继保电气有限公司 Substation equipment identification method based on histogram
CN113689397A (en) * 2021-08-23 2021-11-23 湖南视比特机器人有限公司 Workpiece circular hole feature detection method and workpiece circular hole feature detection device
CN113743423A (en) * 2021-09-08 2021-12-03 浙江云电笔智能科技有限公司 Intelligent temperature monitoring method and system
CN115205564A (en) * 2022-09-16 2022-10-18 山东辰升科技有限公司 Unmanned aerial vehicle-based hull maintenance inspection method
CN115880512A (en) * 2023-02-01 2023-03-31 有米科技股份有限公司 Icon matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180427)