CN105701501B

CN105701501B - A trademark image recognition method

Info

Publication number: CN105701501B
Application number: CN201610004214.5A
Authority: CN
Inventors: 唐攀攀; 彭宇新
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2016-01-04
Filing date: 2016-01-04
Publication date: 2019-01-18
Anticipated expiration: 2036-01-04
Also published as: CN105701501A

Abstract

The present invention provides a trademark image recognition method comprising the following steps: preparing training samples of the trademark to be recognized, extracting their local features, and clustering and quantizing those features; performing feature selection using mutual information; extracting from the test image the same kind of local features as from the training samples, and quantizing them with the previously obtained cluster centers; filtering the key points of the test image with the features retained by feature selection, then matching key points between the test image and each positive sample; constraining the matches with spatial topological relations to remove wrongly matched key point pairs, and counting the number of distinct visual words among the remaining matched pairs as the similarity between the test image and the positive sample; and determining from the similarity whether the test image contains the trademark. The invention addresses two aspects, filtering out irrelevant feature points and eliminating false matches; the two are complementary and reinforce each other, thereby improving trademark recognition accuracy.

Description

A trademark image recognition method

Technical field

The invention belongs to the technical field of visual object detection and recognition, and in particular relates to a trademark image recognition method.

Background art

In recent years, with the rapid development and spread of Internet technology, and especially the continued growth of social networking sites, the number of images on the network has risen rapidly, making online media the most promising platform for advertising and commerce. Brand tracking is a service that has emerged in recent years; it evaluates the growth of a brand by analyzing how often the brand is exposed in the media and how users rate it. There are two traditional analysis approaches. The first relies on manual analysis and counting; because the volume of online media is huge, this costs a great deal of manpower and time. The second retrieves relevant images through a website's keyword search function, but keywords are not necessarily closely related to image content: on the one hand, the search results contain a lot of noise, since images that carry the keyword are often unrelated to the brand; on the other hand, many images that contain the brand but lack the keyword cannot be retrieved. Designing a method that analyzes image content and automatically recognizes the trademarks it contains therefore has significant commercial value.

In recent years, researchers have proposed several trademark recognition methods based on the "bag of words" model: SIFT or SURF features are first extracted from images and clustered to form visual words, and the original features are then quantized to these visual words, so that every image can be represented as a set of visual words. Compared with using the original features directly, the quantized features have far lower dimensionality and can be used for large-scale trademark recognition tasks. However, "bag of words" methods have two shortcomings. The first is that quantization, while reducing the dimensionality of the original features, also reduces their expressive power to some extent, so that two features that were originally different may be treated as identical after quantization; this is called a "false match". To reduce such false matches as far as possible, researchers have proposed a series of methods. Representative work includes the method of S. Romberg et al., proposed in 2013 in "Bundle min-hashing for logo recognition", which considers multiple neighboring key points simultaneously, and the method of C. Wan et al., proposed in 2013 in "Tree-based shape descriptor for scalable logo detection", which combines four key points whose spatial positions satisfy certain conditions into a tree structure. These methods alleviate false matching to some extent, but their robustness to affine transformations of the image and to missing key points is still insufficient.

The second shortcoming of "bag of words" methods is that an image contains a large number of key points unrelated to the trademark; these key points not only interfere with correct recognition of the trademark but also seriously slow recognition. A fairly direct way to remove them is to consider only the key points that fall inside the trademark region. However, the number of key points detected varies widely across trademark types: too few harms recognition quality, while too many reduces recognition speed. In addition, many of the key points that fall inside the trademark region are related to the background and have little to do with the trademark itself. Considering only the key points inside the trademark region therefore cannot effectively remove the trademark-irrelevant key points.

In summary, existing trademark recognition techniques have two shortcomings: first, there is no false-match elimination method that is robust to both affine transformations of the image and missing key points; second, there is no method that effectively filters out trademark-irrelevant key points during recognition.

Summary of the invention

In view of the deficiencies of the prior art, the invention proposes a new trademark image recognition method. The method first uses mutual-information-based feature selection to filter out a large number of feature points unrelated to the trademark, and then uses a new topological constraint method to eliminate false matches; combining the two achieves fast recognition and a high recognition rate.

A trademark image recognition method of the invention comprises the following steps:

First, mutual-information-based feature selection is used to filter out a large number of feature points unrelated to the trademark, comprising the following steps:

(1) For each trademark to be recognized, prepare a certain number of training samples (no fewer than 5), ensuring that the trademark appears at least once in every sample;

(2) Extract local features, such as SIFT (Scale-Invariant Feature Transform) or SURF (Speeded Up Robust Features) features, from every training sample, cluster and quantize the features, and represent every training sample as a set of visual words;

(3) For each trademark, take the samples containing the trademark as positive samples and the samples not containing it as negative samples, compute the mutual information of every visual word contained in the positive samples, rank the visual words by mutual information, and select the n (default 100) words with the largest mutual information as the features for recognizing this trademark.

Further, trademark recognition is performed on the test image based on the selected features, comprising the following steps:

(4) Extract the same kind of local features from every test image, quantize them using the cluster centers obtained in step (2), and represent the test image as a set of visual words;

(5) Key point matching: first filter the key points of the test image using the n visual-word features selected in step (3), retaining only the key points whose visual words belong to this set of n visual words. Then match the test image against every positive sample of the trademark to be recognized to obtain the initial matched point pairs, and record the position information of the matched key point pairs.

Further, false matches are eliminated using the topological relations of the feature points, with the following specific steps:

(6) For each matched point pair obtained in step (5), find the k (default 10) key points nearest to each match point in its own image, add to each of these k key points its symmetric point about the central point (the match point), and then sort the 2k key points in clockwise order, obtaining two sequences of length 2k, each treated as joined end to end (cyclic);

(7) Compute the longest common subsequence (LCS) of the two sequences obtained in step (6), and take the ratio of the LCS length to the total of 2k key points as the matching degree of the matched point pair; if the matching degree is less than the threshold α (default 0.6), the pair is considered a false match and is removed from the matched point pairs. The false-match elimination method of the invention can also be used on its own to remove wrongly matched key point pairs;

(8) Count the number of distinct visual words among the remaining matched point pairs as the similarity between the test image and the sample;

(9) Compute the maximum similarity between the test image and all positive samples of the trademark to be recognized, and use it as the confidence that the test image contains the trademark; if the confidence is greater than the threshold β, the test image is considered to contain this trademark.
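The decision rule in step (9) can be sketched in a few lines of Python. This is only an illustration: the function name `contains_trademark` is hypothetical, and `beta` must be supplied by the caller because the text gives no default value for it.

```python
def contains_trademark(similarities, beta):
    """Decide whether a test image contains the trademark.

    similarities: similarity of the test image to each positive sample.
    beta: decision threshold (no default is stated in the text).
    Returns (decision, confidence), where confidence is the maximum similarity.
    """
    confidence = max(similarities, default=0)
    return confidence > beta, confidence
```

With similarities `[3, 12, 7]` and `beta = 10`, the confidence is 12 and the image is accepted.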

The beneficial effects of the invention are as follows: it can quickly and accurately determine automatically whether an image contains a given trademark. On an ordinary PC, the speed of recognizing one kind of trademark is about 20ms/, and the recognition accuracy can exceed 90%, which reaches the level of practical application. The invention achieves these effects for the following reasons: feature selection filters out a large number of feature points unrelated to the target trademark, reducing their interference with the recognition result and lowering the time complexity of subsequent steps; on this basis, a topological constraint that is robust to both affine transformations of the image and missing key points is used to eliminate false matches, further improving recognition accuracy.

Brief description of the drawings

Fig. 1 is the technical flowchart of the invention.

Fig. 2 is the technical flowchart illustrated with specific images.

Fig. 3 shows the effect of feature selection.

Fig. 4 is a schematic diagram of adding symmetric points.

Fig. 5 is a schematic diagram of the topological constraint.

Fig. 6 shows the effect of eliminating false matches.

Detailed description of embodiments

The invention is described in further detail below with reference to the accompanying drawings and a specific example.

The invention is a trademark image recognition method; its technical flow is shown in Fig. 1 and Fig. 2 and specifically comprises the following steps:

(1) Prepare positive samples and extract visual features

Prepare a certain number of training samples for each trademark to be recognized, and extract one kind of local feature, such as SIFT or SURF features, from these training samples. Then cluster and quantize the extracted features to obtain the visual-word representation of each key point, as shown in Formula one:

Formula one: k={ P (k), S (k), I (k) }

Here k denotes the k-th key point, P(k) its position, S(k) its scale, and I(k) its nearest cluster center.
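The representation in Formula one can be sketched in Python as follows. This is a minimal illustration under assumptions not in the text: the names `KeyPoint` and `quantize` are hypothetical, descriptors and cluster centers are plain lists of numbers, and visual-word ids are 1-based so that a negated id (used later for symmetric points) never collides with an original id.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class KeyPoint:
    position: tuple  # P(k): (x, y) location in the image
    scale: float     # S(k): detection scale
    word: int        # I(k): id of the nearest cluster center (1-based, an assumption)


def quantize(positions, scales, descriptors, centers):
    """Assign every descriptor to its nearest cluster center (visual word)."""
    keypoints = []
    for pos, scale, desc in zip(positions, scales, descriptors):
        # Index of the center with the smallest Euclidean distance to the descriptor.
        nearest = min(range(len(centers)), key=lambda c: dist(desc, centers[c]))
        keypoints.append(KeyPoint(tuple(pos), float(scale), nearest + 1))
    return keypoints
```

In practice the descriptors would come from a SIFT or SURF extractor and the centers from a clustering step; both are taken as given here.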

(2) Feature selection based on mutual information

For each trademark, take the samples among all training samples that contain the trademark as positive samples and the samples that do not contain it as negative samples, and compute the mutual information of every visual word in the positive samples, as shown in Formula two:

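Formula two: $I(t, c) = \sum_{i \in \{0,1\}} \sum_{j \in \{0,1\}} \frac{N_{ij}}{N} \log \frac{N \cdot N_{ij}}{N_{i\cdot} \, N_{\cdot j}}$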

Here c denotes the class of the positive samples and t denotes a visual word in the positive samples. N_ij denotes the number of images that contain (i=1) or do not contain (i=0) the visual word t and that belong (j=1) or do not belong (j=0) to the positive samples. For example, N₁₀ denotes the number of images that contain visual word t but do not belong to class c, and so on. N_i.=N_i0+N_i1, N_.j=N_0j+N_1j, and N=N₀₀+N₀₁+N₁₀+N₁₁.

We then sort all visual words by I(t, c) in descending order and select the first n (default 100; other values are possible) visual words as the features for subsequently recognizing this trademark. Mutual information is an important measure of the relevance between a feature and a class: the larger its value, the more relevant the feature is to the class, and the more it influences the result during classification (recognition). Feature selection can therefore filter out a large number of visual words unrelated to the target trademark (the corresponding key points are filtered out with them), which on the one hand reduces their interference with the recognition result and on the other hand greatly shortens the time for subsequent matching and verification, improving overall recognition speed. The effect of feature selection is shown in Fig. 3, which contains four pairs of images; in each pair, the left image is before feature selection and the right image is after.
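The selection step can be sketched in Python from the counts N_ij defined above. This is a sketch under stated assumptions: it uses the standard mutual information of a 2x2 contingency table with a base-2 logarithm (the text does not state the base), treats a zero count as contributing zero, and represents each image as a set of visual-word ids. The function names are hypothetical.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """I(t, c) from counts N_ij (i: image contains t, j: image is a positive sample)."""
    n = n11 + n10 + n01 + n00
    counts = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    row = {1: n11 + n10, 0: n01 + n00}  # N_i. = N_i0 + N_i1
    col = {1: n11 + n01, 0: n10 + n00}  # N_.j = N_0j + N_1j
    mi = 0.0
    for (i, j), nij in counts.items():
        if nij > 0:  # a zero count contributes nothing (0 * log 0 taken as 0)
            mi += nij / n * log2(n * nij / (row[i] * col[j]))
    return mi


def select_words(positive, negative, n=100):
    """Return the n visual words of the positive samples with the largest I(t, c).

    positive, negative: lists of sets of visual-word ids, one set per image.
    """
    scores = {}
    for t in set().union(*positive):
        n11 = sum(t in img for img in positive)
        n10 = sum(t in img for img in negative)
        scores[t] = mutual_information(n11, n10, len(positive) - n11, len(negative) - n10)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

A word that occurs in every positive image and in no negative image gets the highest score, while a word spread evenly over both classes scores zero.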

(3) Key point matching

For a test image, to determine whether it contains a given trademark, we compute the similarity between it and every positive sample of this trademark and then decide based on the maximum similarity: if the maximum is greater than the threshold β, the test image is considered to contain this trademark. The similarity between a test image and a positive sample is computed as follows:

As in (1), the same kind of local features are first extracted from the test image and quantized using the cluster centers obtained in (1), so that the test image is represented as a set of visual words. The n features obtained in (2) are then used to filter the key points of the test image, retaining only the key points whose visual words are among these n; in this way the number of key points remaining in the test image and in the positive sample is of the same order of magnitude as n.

Key point matching is then performed between the two images: two key points are considered matched if they are quantized to the same cluster center, that is, if their I(k) in Formula one are equal. This yields the initial matched point pairs. Because the descriptive power of a single key point is limited, false matches arise easily, so a topological constraint designed by the inventors is used next to reduce them.
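Filtering and initial matching can be sketched as follows. The sketch assumes each key point is a tuple `(x, y, word)` and that every pair of key points sharing a selected visual word counts as an initial match, which is one reading of the rule above; the function name is hypothetical.

```python
def match_keypoints(test_kps, sample_kps, selected_words):
    """Return initial matches as (test_index, sample_index) pairs.

    test_kps, sample_kps: lists of (x, y, word) tuples.
    selected_words: visual words kept by feature selection.
    """
    selected = set(selected_words)
    # Index the positive sample's key points by visual word, keeping selected words only.
    by_word = {}
    for j, (_, _, word) in enumerate(sample_kps):
        if word in selected:
            by_word.setdefault(word, []).append(j)
    # A test key point matches every sample key point with the same visual word;
    # test key points with unselected words find no entry and are thereby filtered out.
    pairs = []
    for i, (_, _, word) in enumerate(test_kps):
        for j in by_word.get(word, []):
            pairs.append((i, j))
    return pairs
```

Indexing by visual word avoids comparing every test key point with every sample key point.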

(4) Reduce false matches using topological constraints

For two matched key points, which we call central points, we first find the k key points spatially nearest to each central point. Since the position P(k) of every key point is known (see Formula one), we only need to compute the Euclidean distance between the central point and every other key point, sort the key points by Euclidean distance in ascending order, and select the first k; if there are fewer than k key points, all of them are selected.
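A minimal sketch of the neighbor search, assuming key point positions are `(x, y)` tuples and sorting by increasing distance so that the nearest points come first. The function name is hypothetical, and a brute-force scan is used for clarity rather than a spatial index.

```python
from math import dist


def nearest_neighbors(positions, center_index, k=10):
    """Indices of the k key points closest to the central point, nearest first."""
    center = positions[center_index]
    others = [i for i in range(len(positions)) if i != center_index]
    others.sort(key=lambda i: dist(center, positions[i]))
    return others[:k]  # if fewer than k points exist, all of them are returned
```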

We then add, for each of these k key points, its symmetric point about the central point, as shown in Fig. 4, where (a) and (b) show the key point distribution before the symmetric points are added and (c) and (d) show it afterwards. To distinguish the original points from the added symmetric points, we set the cluster center of each symmetric point to the negative of the original point's cluster center, as shown in Formula three:

Formula three: k '={ P (k), S (k) ,-I (k) }

Then, taking the central point as the coordinate origin and any key point as the starting point, the 2k key points are sorted in the clockwise direction, giving two sequences S₁ and S₂, as shown in Fig. 5; S₁ and S₂ are regarded as joined end to end (cyclic).

We then compute the longest common subsequence of S₁ and S₂, which can be solved by dynamic programming. Given two arrays A and B, the length of their longest common subsequence can be obtained from the recurrence in Formula four:
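Adding symmetric points and ordering them around the central point can be sketched as follows. The assumptions are: neighbors are `(x, y, word)` tuples with nonzero word ids, image coordinates have y growing downward (so increasing `atan2` angle appears clockwise on screen), and the arbitrary starting point is wherever the angle sort begins. The function name is hypothetical.

```python
from math import atan2


def cyclic_word_sequence(center, neighbors):
    """Visual-word sequence of the 2k points around a central point.

    center: (x, y) of the central point.
    neighbors: list of (x, y, word) for its k nearest key points.
    """
    cx, cy = center
    points = []
    for x, y, word in neighbors:
        points.append((x, y, word))
        # Mirror the point through the center and negate its visual word.
        points.append((2 * cx - x, 2 * cy - y, -word))
    # Sort by angle around the center; the result is read as a cyclic sequence.
    points.sort(key=lambda p: atan2(p[1] - cy, p[0] - cx))
    return [word for _, _, word in points]
```

Because the sequence is treated as cyclic, the choice of starting angle does not affect the later comparison.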

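Formula four: $dp[i][j] = \begin{cases} 0, & i = 0 \text{ or } j = 0 \\ dp[i-1][j-1] + 1, & A[i] = B[j] \\ \max(dp[i-1][j],\ dp[i][j-1]), & A[i] \neq B[j] \end{cases}$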

Here dp[i][j] denotes the length of the longest common subsequence of A[1…i] and B[1…j]; if A has length m and B has length n, then dp[m][n] is the desired answer. Unlike A and B, S₁ and S₂ are cyclic, so we enumerate the starting point of one of the sequences and compute the longest common subsequence for each starting point, recording the longest one found during enumeration as the longest common subsequence of S₁ and S₂, denoted LCS(S₁,S₂), as shown in Fig. 5.

Next we compute the matching degree of the central points, as shown in Formula five, namely the ratio of the longest common subsequence length to the original sequence length:
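The dynamic program and the enumeration of starting points can be sketched in Python as follows. The function names are hypothetical, and the sketch rotates only the first sequence, as the paragraph above describes.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two linear sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def cyclic_lcs_length(s1, s2):
    """Best LCS length over every starting point (rotation) of s1."""
    if not s1 or not s2:
        return 0
    return max(lcs_length(s1[r:] + s1[:r], s2) for r in range(len(s1)))
```

For example, `[1, 2, 3, 4]` and `[3, 4, 1, 2]` share only 2 elements in order as linear sequences, but all 4 once the first sequence is rotated by two positions. With sequences of length 2k the cost is on the order of (2k)³ operations per matched pair, which stays small for the default k = 10.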

Formula five: $r = \frac{\text{Length of } LCS(S_1, S_2)}{\min\{\#S_1, \#S_2\}}$

Here Length of LCS(S₁,S₂) denotes the length of the longest common subsequence of S₁ and S₂, min{#S₁,#S₂} denotes the smaller of the numbers of key points in S₁ and S₂, and r is the matching degree of the central points, ranging from 0 to 1. If the matching degree of two key points is less than a threshold α (default 0.6), they are considered a false match and are discarded. Finally, the number of distinct visual words among the remaining matched point pairs is counted as the similarity between the test image and the positive sample. Fig. 6 shows the effect of eliminating false matches, where (a) and (b) show the original matches and (c) and (d) show the matches after false matches are eliminated.

The following experimental results show that, compared with existing methods, the trademark image recognition method of the invention achieves higher recognition accuracy.
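The matching degree and the final similarity count can be sketched as follows. The sketch assumes the LCS length of each matched pair has already been computed, and represents each pair by a hypothetical tuple `(word, lcs_len, len_s1, len_s2)`; the function names are also hypothetical.

```python
def matching_degree(lcs_len, len_s1, len_s2):
    """r = LCS length divided by the shorter sequence length (0 if a sequence is empty)."""
    shorter = min(len_s1, len_s2)
    return lcs_len / shorter if shorter else 0.0


def similarity(matches, alpha=0.6):
    """Number of distinct visual words among matched pairs that survive the threshold.

    matches: list of (word, lcs_len, len_s1, len_s2), one entry per matched pair.
    """
    kept_words = {
        word
        for word, lcs_len, len_s1, len_s2 in matches
        if matching_degree(lcs_len, len_s1, len_s2) >= alpha  # r < alpha is a false match
    }
    return len(kept_words)
```

Counting distinct words rather than pairs means that several surviving pairs with the same visual word contribute only once.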

This example uses the FlickrLogos-32 dataset, proposed in "Scalable Logo Recognition in Real-World Images" (S. Romberg, L. G. Pueyo, R. Lienhart and R. van Zwol, ACM International Conference on Multimedia Retrieval, 2011). It contains 8240 images in 32 trademark classes, of which 320 images serve as the training set, 3960 as the validation set, and 3960 as the test set. Four methods are compared in the experiments: the invention and the following three existing methods:

Existing method one: the SLR method of the dataset's authors, which uses triangles formed from neighboring key points that satisfy certain conditions to enhance the descriptive power of a single key point;

Existing method two: the method in "Tree-based Shape Descriptor for Scalable Logo Detection" (C. Wan, Z. Zhao, X. Guo and A. Cai, IEEE Visual Communications and Image Processing, 2013), which uses four key points to construct a tree-structured descriptor, TSD, that enhances the descriptive power of a single key point and is affine invariant;

Existing method three: the CBB method in "Correlation-Based Burstiness for Logo Retrieval" (J. Revaud, M. Douze and C. Schmid, ACM International Conference on Multimedia, 2012), which automatically learns from the training set the image regions that interfere with correct trademark recognition and reduces their weight, thereby improving recognition results.

The experiments evaluate trademark recognition accuracy using the precision and recall metrics from the information retrieval field, and combine the two using the F1 value; the higher the F1 value, the better the recognition result.

Table 1. Comparison of experimental results with existing methods

Method	Existing method one	Existing method two	Existing method three	The present invention
Precision	0.980	0.980	0.980	0.994
Recall	0.610	0.680	0.730	0.868
F1 value	0.752	0.803	0.837	0.927
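The F1 value is the harmonic mean of precision and recall. The short Python sketch below (the function name is hypothetical) shows the standard definition, which reproduces the F1 row of Table 1 from its precision and recall rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For the invention's column, `f1_score(0.994, 0.868)` rounds to 0.927, matching the table.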

As Table 1 shows, the invention achieves the best trademark recognition results. Existing method one constructs a triangle descriptor from three key points that satisfy certain constraints; however, the triangles it constructs are not affine invariant and are not well suited to real-world images. Existing method two constructs a tree-structured descriptor that is affine invariant but is sensitive to missing key points: if any one of the four key points that make up the descriptor is missing, the whole descriptor fails. Existing method three is entirely different from the invention, and the two can complement each other in practical applications to further improve recognition results. The invention first uses feature selection to filter out a large number of irrelevant feature points and then uses topological constraints to further remove false matches; combining the two greatly improves trademark image recognition results.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims

1. A trademark image recognition method, comprising the following steps:

(1) using training samples of the trademark to be recognized, extracting the local features of the training samples, and clustering and quantizing the local features;

(2) performing feature selection on the training samples using mutual information, filtering out key points unrelated to the trademark;

(3) extracting from the test image the same kind of local features as from the training samples, and quantizing the local features using the previously obtained cluster centers;

(4) filtering the key points of the test image using the features retained by feature selection, and then performing key point matching between the test image and the positive samples of the trademark to obtain initial matched point pairs;

(5) for each matched point pair, taking the matched key points as central points, finding the k key points in the image nearest to each central point, and adding to each of these k key points its symmetric point about the central point, the visual word number of a symmetric point being the negative of the visual word number of the original point;

(6) taking the central point as the coordinate origin, sorting all 2k key points in clockwise order to obtain a cyclic sequence;

(7) computing the longest common subsequence of the two cyclic sequences obtained, and taking the ratio of the longest common subsequence length to the original sequence length as the matching degree of the two central points; if the matching degree is less than a threshold, the pair is considered a false match and is removed from the matched pairs;

(8) counting the number of distinct visual words among the finally remaining matched point pairs as the similarity between the test image and the positive sample;

(9) computing the maximum similarity between the test image and all positive samples of the trademark to be recognized, and determining from the similarity whether the test image contains the trademark.

2. The method of claim 1, characterized in that, in step (1), the number of training samples for each trademark is no fewer than 5, the local features extracted include SIFT and SURF, and the cluster centers obtained are used to quantize subsequent test samples.

3. The method of claim 1, characterized in that, in step (2), for each trademark, the samples containing the trademark are taken as positive samples and the other samples not containing it as negative samples, the mutual information of each visual word in the positive samples is then computed, and the n visual words with the largest mutual information are selected as the features for recognizing the trademark.

4. The method of claim 1, characterized in that, in step (4), of the key points extracted from the test image, only those falling within the feature set of the trademark to be recognized are retained, so that the number of key points to be matched is greatly reduced; key point matching is then performed between the test image and the positive sample, and two key points are considered matched if and only if they are quantized to the same cluster center.

5. The method of claim 1, characterized in that, in step (7), the longest common subsequence of the two sequences is computed by dynamic programming; because the sequences are cyclic, the starting point of one of the sequences is enumerated and the longest common subsequence is computed for each starting point, and the longest one found during enumeration is recorded as the longest common subsequence of the two cyclic sequences.

6. The method of claim 1, characterized in that, in step (8), the number of remaining matched point pairs is not used directly as the similarity between the test image and the positive sample; instead, the number of distinct visual words among these matched pairs is counted. The advantage is that when the number of cluster centers is small, the probability that two different key points are quantized to the same cluster center increases, and counting distinct visual words helps reduce the interference caused by changes in the number of cluster centers.