WO2019127504A1

WO2019127504A1 - Similarity measurement method and device, and storage device

Info

Publication number: WO2019127504A1
Application number: PCT/CN2017/120231
Authority: WO
Inventors: 韩琨; 阳光
Original assignee: 深圳配天智能技术研究院有限公司
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2019-07-04
Also published as: CN109313709A

Abstract

A similarity measurement method and device, and a device having a storage function, relating to the technical field of identification. The method comprises: acquiring a feature of an object to be identified (S101); calculating a difference between the feature of the object and a feature of a preset template, and processing the difference by using a preset policy, such that a processed difference is greater than or equal to the difference before processing (S102); and calculating similarity between the object and the preset template by using the processed difference (S103). The method accurately and quickly identifies and classifies an object to be identified, thereby improving the speed and success rate of identification.

Description

Method, device and storage device for similarity measurement

[Technical Field]

The present application relates to the field of identification technologies, and in particular, to a method and device for measuring similarity and a device having a storage function.

[Background]

When information is identified, certain features of the information that discriminate between different targets are usually calculated and then compared with the features of preset templates in order to identify and classify the information. For example, in some simple classification scenarios, a single threshold is enough to make the distinction. However, in the course of long-term research and development, the inventors of the present application found that the recognition speed and accuracy of this approach are low and that thresholding can cause misclassification. In relatively complicated scenarios in particular, the features of different preset templates are similar and easily confused, which causes identification errors and misclassification, reduces the recognition rate, and can even make it impossible to decide on features that are prone to confusion.

[Summary of the Invention]

The technical problem to be solved by the present application is to provide a method and device for measuring similarity and a device having a storage function, which can improve recognition speed and recognition rate.

To solve the above technical problem, one technical solution adopted by the present application is to provide a method for measuring similarity, the method comprising: acquiring a feature of an object to be identified; calculating a difference between the feature of the object and a feature of a preset template, and processing the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing; and calculating the similarity between the object and the preset template by using the processed difference.

To solve the above technical problem, another technical solution adopted by the present application is to provide a similarity measuring device. The device comprises a processor, a memory, and a communication circuit, the processor being coupled to the memory and the communication circuit. In operation, the processor acquires a feature of the object to be identified through the communication circuit, calculates a difference between the feature of the object and a feature of the preset template, processes the difference by using a preset strategy so that the processed difference is greater than or equal to the difference before processing, and calculates the similarity between the object and the preset template by using the processed difference.

To solve the above technical problem, another technical solution adopted by the present application is to provide a device having a storage function. The device stores a program, and when the program is executed, the above method for measuring similarity is implemented.

The beneficial effects of the present application are as follows. Unlike the prior art, when the similarity between the object to be identified and the preset template is measured, the difference between the features is processed so that the processed difference is greater than or equal to the difference before processing. This enlarges the difference between the features, so that the object to be identified can be identified and classified more accurately and quickly, improving recognition speed and recognition rate.

[Description of the Drawings]

FIG. 1 is a schematic flow chart of a first embodiment of a method for measuring similarity of the present application;

FIG. 2 is a schematic flow chart of a second embodiment of a method for measuring similarity of the present application;

FIG. 3 shows the template coding of a one-dimensional code in the UPC-A code;

FIG. 4 is a schematic structural diagram of a first embodiment of a similarity measuring device of the present application;

FIG. 5 is a schematic structural diagram of a first embodiment of an apparatus having a storage function according to the present application.

[Detailed Description]

In order to make the objects, technical solutions, and effects of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings.

The present application provides a method and device for measuring similarity, which can be applied at least to image recognition scenarios, in particular scenarios in which the features of several given preset templates are similar and easily confused. When the similarity between the image to be recognized and a preset template is calculated, the difference between the features is processed so as to enlarge it, so that the image to be recognized can be identified and classified more accurately and quickly, improving recognition speed and recognition rate. Details are given below.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a first embodiment of a method for measuring similarity of the present application. As shown in FIG. 1, in this embodiment, the similarity measurement method includes:

S101: Acquire a feature of the object to be identified.

In this step, the object to be identified may be an image, such as a one-dimensional code image or a two-dimensional code image. The acquired object feature may be at least one of area, width, perimeter, and density. These object features discriminate to some degree between different preset templates, so the object to be identified can be identified and classified by comparing the object features with the preset template features.

S102: Calculate a difference between the feature of the object and the feature of the preset template, and process the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before the process.

Specifically, the preset template corresponding to the object to be identified is selected, the object is compared with the preset template, and the difference between the object feature and the preset template feature is calculated in order to calculate the similarity between the object and the preset template. When the difference between the feature of the object and the feature of the preset template is calculated, the difference is processed by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing. Processing the difference in this way amplifies it and enlarges the difference between the features, which is equivalent to a penalty on the original similarity: the similarity between the object feature and the preset template feature is reduced in order to achieve a better global similarity. The differences of all the features may be processed, or only the differences of some of the features.

The smaller the difference between the object feature and the preset template feature, the closer the two features are, the greater the similarity between them, and the larger the corresponding similarity measure. The greater the difference between the object feature and the preset template feature, the further apart the two features are, the smaller the similarity between them, and the smaller the similarity measure. The difference between the object feature and the preset template feature can be calculated by using a similarity coefficient function or a distance function.

For example, suppose the distance difference between feature A1 of the object to be identified and feature A of the preset template, calculated with a conventional distance function, is 3. In the similarity measurement method of the present application, this difference may be amplified: if the square of the conventional difference is taken as the final difference, the final difference becomes 9 (the square of 3). After this processing, the difference between the object feature and the preset template feature is larger, that is, the two are even less similar, so similar features are easier to distinguish, which improves recognition speed and recognition rate.

S103: Calculate the similarity between the object and the preset template by using the processed difference.

After the difference between the features is processed, the difference between all the features is integrated to calculate the similarity between the object to be identified and the preset template.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a second embodiment of a method for measuring similarity of the present application. In this embodiment, after the features of the object to be identified are acquired, the object features are clustered, and the similarity is measured after clustering. Clustering the features first simplifies the calculation steps and improves recognition speed and recognition rate. As shown in FIG. 2, in this embodiment, the similarity measurement method includes:

S201: Acquire a feature of the object to be identified.

S202: Calculate a metric value of the object feature to be identified, and cluster the object feature.

First, the metric value of each object feature (such as width) is calculated. The metric value of each object feature can be calculated by using a binarization algorithm, a Gabor wavelet transform algorithm, or a deep convolutional network algorithm. These feature values can be combined into an n-dimensional vector, where n is the number of features.

After the metric value of each object feature is obtained, the object features are clustered. Depending on the recognition environment of the application, the features can be divided into two classes, four classes, and so on. The object features can be clustered by using a k-means clustering algorithm, the Otsu algorithm (OTSU), or a density algorithm.
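The clustering in S202 can be sketched as follows. This is a minimal one-dimensional k-means written for illustration only; the function name `kmeans_1d`, the evenly spaced initial centres, and the sample pixel widths are assumptions, not details given in the application.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iterations: int = 50) -> Tuple[List[float], List[int]]:
    """Cluster scalar metric values into k groups with Lloyd's algorithm."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): spread the k centres evenly over the range.
    centres = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move again
        labels = new_labels
    return centres, labels


# Hypothetical metric values: bar widths measured in pixels.
widths_px = [3.0, 3.2, 6.1, 5.9, 9.0, 12.2, 2.9, 6.0]
centres, labels = kmeans_1d(widths_px, k=4)
print(labels)
```

With these sample values the widths near 3, 6, 9, and 12 pixels fall into four separate clusters, which is the kind of four-class division the text mentions.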

S203: Calculate a difference between the feature of the object and the feature of the preset template, and process the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before the processing.

After the classification result of the features is obtained, the preset template corresponding to the object to be identified is selected for similarity comparison, and the similarity between the object and the preset template is calculated. The way of calculating the similarity differs between recognition scenarios. In a simple application scenario, a simple distance calculation or sorting may be used; a similarity function or a distance function may also be used to calculate the similarity between the object to be identified and the preset template.

When the similarity measure between the features is calculated, for example when the distance difference is calculated, the distance difference is processed by using a preset strategy. The preset strategy may be to calculate the square or the cube of the distance difference, to add to the distance difference, or to amplify the distance difference by another formula, so that the processed distance difference is greater than or equal to the distance difference before processing, the difference between the features is enlarged, and the similarity between the features is reduced. In other embodiments, the final similarity measure may be processed as a whole; that is, after the overall similarity measure is obtained, its square, cube, or the like is calculated.
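The preset strategies named above can be sketched as small functions. The names `square`, `cube`, `add_offset`, and `guarded`, and the offset value of 1.0, are hypothetical. The `guarded` wrapper is an added assumption: it enforces the "greater than or equal to" condition for differences below 1, where squaring or cubing alone would shrink the value.

```python
from typing import Callable, Dict


def square(diff: float) -> float:
    return diff ** 2


def cube(diff: float) -> float:
    return diff ** 3


def add_offset(diff: float, offset: float = 1.0) -> float:
    # "offset" is an illustrative constant, not a value taken from the text.
    return diff + offset


def guarded(strategy: Callable[[float], float]) -> Callable[[float], float]:
    """Wrap a strategy so the result never drops below the input difference."""
    def apply(diff: float) -> float:
        magnitude = abs(diff)
        return max(magnitude, strategy(magnitude))
    return apply


strategies: Dict[str, Callable[[float], float]] = {
    "square": guarded(square),
    "cube": guarded(cube),
    "offset": guarded(add_offset),
}

for raw in (0.5, 1.0, 3.0):
    processed = {name: fn(raw) for name, fn in strategies.items()}
    print(raw, processed)
```

For a raw difference of 3.0 the squared strategy returns 9.0, matching the example in the first embodiment, while a raw difference of 0.5 is left at 0.5 rather than reduced to 0.25.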

Alternatively, the formula

$$ d = \sum_{i=1}^{M} (r_i - w_i)^2 $$

is used to calculate the distance difference between the features of the object to be identified and the features of the preset template, where d is the total distance difference after processing, M is the number of features, r_i is the feature value of the i-th feature in the preset template, and w_i is the feature value of the i-th feature in the object to be identified. The formula is suited to the case where the distance difference between an object feature and the corresponding preset template feature is greater than 1. When the distance difference is greater than 1, squaring it increases it, so that the difference between the features becomes larger and the similarity becomes smaller.
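As a worked illustration, assume the formula is the sum of squared per-feature differences that the paragraph above describes, and take hypothetical feature values r = (1, 4, 1, 1) for the preset template and w = (2, 2, 2, 1) for the object. The first line below is the unprocessed sum of absolute differences, the second is the processed total d:

```latex
\begin{aligned}
\sum_{i=1}^{4} \lvert r_i - w_i \rvert &= 1 + 2 + 1 + 0 = 4 \\
d = \sum_{i=1}^{4} (r_i - w_i)^2 &= 1 + 4 + 1 + 0 = 6
\end{aligned}
```

Only the position whose difference exceeds 1 is enlarged (from 2 to 4); the positions with a difference of 0 or 1 are unchanged, which agrees with the statement that the formula is suited to differences greater than 1.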

S204: Calculate the similarity between the object and the preset template by using the processed difference.

S205: Rank the similarities between the object and the preset templates, or exclude the preset templates whose similarity with the object is less than a preset threshold.

After the similarity between the object and each preset template is calculated, the obtained similarities may be processed, for example by sorting them from high to low or by excluding the preset templates whose similarity is less than a preset threshold. The preset threshold is set adaptively according to the application scenario and the differences between the preset templates. The higher the similarity between the object and a preset template, the more likely the object belongs to the same class as that template; the lower the similarity, the less likely the object belongs to that template. In this way the recognition rate can be greatly improved, and for objects with relatively clear features the recognition result can be obtained directly.
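Step S205 can be sketched as a filter followed by a sort. The function name `rank_and_filter`, the template names, the similarity scores, and the threshold of 0.3 are all hypothetical; the application does not state how a difference is converted into a similarity score, so the scores here are simply given.

```python
from typing import Dict, List, Tuple


def rank_and_filter(similarities: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Drop templates below the threshold, then sort the rest from high to low."""
    kept = [(name, s) for name, s in similarities.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical similarity scores for four preset templates.
scores = {"template_a": 0.91, "template_b": 0.35, "template_c": 0.62, "template_d": 0.08}
candidates = rank_and_filter(scores, threshold=0.3)
print(candidates)
```

Here `template_d` is excluded and the remaining templates are returned with the most similar one first, so a later recognition stage only needs to examine the surviving candidates.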

In one application scenario, the similarity measurement method provided by the present application can be applied to identify a one-dimensional code image, specifically as follows.

One-dimensional codes are usually composed of black and white bars of varying widths. In simple commodity codes such as UPC-A, each character is composed of two black bars and two white bars. The width of the narrowest black or white bar is called a module, and the total width of a character is 7 modules. The width of each black or white bar is allowed to be 1, 2, 3, or 4 times the module, so a character is described by four widths, each expressed as a multiple of the module width, and different characters have different width combinations. UPC-A supports only the ten digits 0-9, each with its own width code. Please refer to FIG. 3, which shows the width coding of the one-dimensional code in UPC-A. As shown in FIG. 3, the width codes of the digits 0-9 are: digit 0: (3, 2, 1, 1); digit 1: (2, 2, 2, 1); digit 2: (2, 1, 2, 2); digit 3: (1, 4, 1, 1); digit 4: (1, 1, 3, 2); digit 5: (1, 2, 3, 1); digit 6: (1, 1, 1, 4); digit 7: (1, 3, 1, 2); digit 8: (1, 2, 1, 3); digit 9: (3, 1, 1, 2).
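The width codes listed above can be held in a small lookup table and checked against the constraints stated in the same paragraph (four widths per character, each between 1 and 4 modules, 7 modules in total, no two digits alike). The names `UPC_A_WIDTHS` and `check_templates` are illustrative.

```python
from typing import Dict, Tuple

# Width codes for the digits 0-9 as listed in the text (units: modules).
UPC_A_WIDTHS: Dict[int, Tuple[int, int, int, int]] = {
    0: (3, 2, 1, 1),
    1: (2, 2, 2, 1),
    2: (2, 1, 2, 2),
    3: (1, 4, 1, 1),
    4: (1, 1, 3, 2),
    5: (1, 2, 3, 1),
    6: (1, 1, 1, 4),
    7: (1, 3, 1, 2),
    8: (1, 2, 1, 3),
    9: (3, 1, 1, 2),
}


def check_templates(templates: Dict[int, Tuple[int, ...]]) -> bool:
    """Verify the constraints the text states for UPC-A width codes."""
    codes = list(templates.values())
    seven_modules_each = all(sum(code) == 7 for code in codes)
    widths_in_range = all(1 <= w <= 4 for code in codes for w in code)
    all_distinct = len(set(codes)) == len(codes)
    return seven_modules_each and widths_in_range and all_distinct


print(check_templates(UPC_A_WIDTHS))  # prints True
```

A check of this kind is a cheap way to catch a mistyped template before it is used for matching.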

For a barcode image, a simple feature, namely the width, is calculated first. After binarization, the width of each black and white bar can be obtained simply by counting the number of pixels.
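The width measurement can be sketched on a single scanline as binarization followed by run-length counting. The fixed grey-level threshold of 128, the function names, and the sample scanline are assumptions made for this sketch; the application does not specify a particular binarization algorithm at this point.

```python
from typing import List, Tuple


def binarize(row: List[int], threshold: int = 128) -> List[int]:
    """Map grey levels to 1 (dark bar) or 0 (light space); the fixed threshold is an assumption."""
    return [1 if pixel < threshold else 0 for pixel in row]


def run_lengths(bits: List[int]) -> List[Tuple[int, int]]:
    """Return (value, width_in_pixels) for each run of identical pixels."""
    runs: List[Tuple[int, int]] = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((bit, 1))  # start a new run
    return runs


# Hypothetical scanline: 6 dark, 4 light, 2 dark, 2 light pixels.
scanline = [20] * 6 + [230] * 4 + [35] * 2 + [240] * 2
print(run_lengths(binarize(scanline)))
```

Each tuple gives the colour of a bar and its width in pixels, which is the raw feature that the later classification step works on.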

After the width values are obtained, the widths can be classified. If the barcode above contains four widths, any classification method can be used, for example simple clustering with k-means, to divide the widths of the black and white bars, expressed in pixels, into the four classes 1, 2, 3, and 4.
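Since the text allows any classification method, the sketch below uses a simpler stand-in for k-means: it estimates the module size from the 7-module character width and rounds each bar to the nearest multiple. The function name `classify_widths` and the sample pixel widths are hypothetical.

```python
from typing import List

MODULES_PER_CHARACTER = 7  # total character width stated in the text for UPC-A


def classify_widths(widths_px: List[float]) -> List[int]:
    """Convert the four pixel widths of one character into classes 1-4."""
    module_px = sum(widths_px) / MODULES_PER_CHARACTER  # estimated pixels per module
    classes = []
    for width in widths_px:
        nearest = int(width / module_px + 0.5)  # round half up to the nearest multiple
        classes.append(min(4, max(1, nearest)))  # clamp to the allowed widths 1-4
    return classes


print(classify_widths([6, 4, 2, 2]))
print(classify_widths([5.8, 6.3, 6.1, 2.9]))
```

The first call yields the classes 3, 2, 1, 1 and the second yields 2, 2, 2, 1, so slightly noisy pixel measurements still map onto clean module counts.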

After feature classification, these features are compared with the coding widths of the one-dimensional code to calculate the similarity. Formula (1) is used to penalize the difference between the widths of a character and the widths of a code:

$$ d = \sum_{i=1}^{M} (r_i - w_i)^2 \qquad (1) $$

where d is the total difference after processing, r_i is the width at the i-th position of the code, w_i is the width of the i-th black or white bar of the character, and M is the number of widths. The larger d is, the greater the difference between the features and the lower the similarity. This formula is suited to the case where the width difference at some position is greater than 1, because squaring a width difference greater than 1 makes it larger. For example, if the code width at a position and the width of the corresponding black or white bar of the character are 4 and 2, the result is not simply the classification deviation of 2 (4 minus 2) but a penalty of 4 (2 times 2). That is to say, where the widths differ by more than 1, the character is more likely not to be that code.

For example, if the widths of a character are classified as 2, 2, 2, 1, then the difference from the code template of the digit 1 is 0, and the penalty against the other templates is larger, especially for the digits 3, 6, and 8, whose templates contain black or white bars that differ from the classified width by more than one. It can thus be determined that the character represents the digit 1, or the code templates with very low similarity can be excluded (for example, the character is very unlikely to be the digit 3).
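The example can be checked numerically. The sketch below assumes the penalty is the sum of squared width differences described for formula (1); `UPC_A_WIDTHS` repeats the width codes listed earlier in the text, and the function name `penalized_difference` is illustrative.

```python
from typing import Dict, Sequence, Tuple

# Width codes for the digits 0-9 as listed in the text (units: modules).
UPC_A_WIDTHS: Dict[int, Tuple[int, int, int, int]] = {
    0: (3, 2, 1, 1), 1: (2, 2, 2, 1), 2: (2, 1, 2, 2), 3: (1, 4, 1, 1),
    4: (1, 1, 3, 2), 5: (1, 2, 3, 1), 6: (1, 1, 1, 4), 7: (1, 3, 1, 2),
    8: (1, 2, 1, 3), 9: (3, 1, 1, 2),
}


def penalized_difference(template: Sequence[int], observed: Sequence[int]) -> int:
    """Sum of squared width differences (assumed form of formula (1))."""
    return sum((r - w) ** 2 for r, w in zip(template, observed))


observed = (2, 2, 2, 1)  # classified widths of the character in the example
differences = {digit: penalized_difference(code, observed)
               for digit, code in UPC_A_WIDTHS.items()}

# List the templates from most similar (smallest d) to least similar.
for digit, d in sorted(differences.items(), key=lambda item: item[1]):
    print(digit, d)
```

Under this assumption the digit 1 scores 0, the digits 3 and 8 score 6, and the digit 6 scores 12, while every other digit scores 4 or less, so the three templates singled out in the text are indeed the ones penalized most.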

After the similarity between the character and each code template is obtained, the similarities can be processed, for example by sorting them from high to low and excluding the code templates with low similarity. For example, if the similarity between the character and the code template of 1 or 2 is high, the character is likely to be the digit 1 or 2; if the similarity between the character and the code template of 3 or 6 is low, the character is unlikely to be the digit 3 or 6. Characters can thus be identified quickly, or dissimilar codes eliminated, which improves recognition speed and recognition rate. UPC-A has only 10 width codes in total, whereas for Code 128 the different width code templates can number in the hundreds. This method eliminates a large share of the dissimilar codes and thereby greatly improves the recognition rate; images with high barcode quality can be identified directly.

Please refer to FIG. 4. FIG. 4 is a schematic structural diagram of a first embodiment of a similarity measuring device according to the present application. The similarity measuring device in this embodiment can implement the above similarity measurement method. The device includes a processor 401, a memory 402, and a communication circuit 403. The processor 401 is coupled to the memory 402 and the communication circuit 403. In operation, the processor 401 executes instructions and cooperates with the memory 402 and the communication circuit 403 to implement the above similarity measurement method. The specific working process is consistent with the foregoing method embodiments and is therefore not repeated here; for details, please refer to the description of the corresponding method steps above. The similarity measuring device may be a barcode recognizer, an image scanner, or the like.

Please refer to FIG. 5. FIG. 5 is a schematic structural diagram of a first embodiment of a device having a storage function according to the present application. In this embodiment, the storage device 50 stores a program 501, and when the program 501 is executed, the above similarity measurement method is implemented. The specific working process is the same as in the foregoing method embodiments and is therefore not repeated here; for details, refer to the description of the corresponding method steps. The device having the storage function may be any medium capable of storing program code, such as a USB flash drive, an optical disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), or a magnetic disk, and may also be a terminal, a server, or the like.

In the above solution, when the similarity between the object to be identified and the preset template is measured, the difference between the features is processed so that the processed difference is greater than or equal to the difference before processing. This enlarges the difference between the features, so that the object to be identified can be identified and classified more accurately and quickly, improving recognition speed and recognition rate.

In the several embodiments provided in the present application, it should be understood that the disclosed system, device, and method may be implemented in other manners. For example, the device implementations described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium. The software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

The above is only an embodiment of the present application and does not thereby limit the patent scope of the application. Any equivalent structure or equivalent process transformation made using the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

1. A method for measuring similarity, characterized in that the method comprises:

acquiring a feature of an object to be identified;

calculating a difference between the feature of the object and a feature of a preset template, and processing the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing; and

calculating a similarity between the object and the preset template by using the processed difference.

2. The method according to claim 1, wherein the calculating a difference between the feature of the object and a feature of a preset template, and processing the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing, comprises:

using the formula

$$ d = \sum_{i=1}^{M} (r_i - w_i)^2 $$

to calculate the difference between the object feature and the preset template feature, where d is the total difference after processing, M is the number of features, r_i is the feature value of the i-th feature in the preset template, w_i is the feature value of the i-th feature in the object to be identified, and (r_i - w_i) is greater than 1.

3. The method according to claim 1, wherein the calculating a difference between the feature of the object and a feature of a preset template comprises: calculating a metric value of the object feature, and clustering the object feature.

4. The method according to claim 3, wherein the calculating a difference between the feature of the object and a feature of a preset template comprises: calculating, by using a distance function, a distance difference between the clustered object feature and the feature of the preset template.

5. The method according to claim 3, wherein the clustering the object feature comprises: clustering the object feature by using a k-means clustering algorithm, an Otsu algorithm, or a density algorithm.

6. The method according to claim 3, wherein the calculating a metric value of the object feature comprises: calculating the metric value of the object feature by using a binarization algorithm, a Gabor wavelet transform algorithm, or a deep convolutional network.

7. The method according to claim 1, wherein the calculating a similarity between the object and the preset template by using the processed difference comprises:

ranking the similarities between the object and the preset templates, or excluding a preset template whose similarity with the object is less than a preset threshold.

8. The method according to claim 1, wherein the object feature is at least one of area, width, perimeter, and density.

9. The method according to claim 1, wherein the object to be identified is a one-dimensional code image.

10. A similarity measuring device, characterized in that the device comprises a processor, a memory, and a communication circuit, the processor being coupled to the memory and the communication circuit;

in operation, the processor acquires a feature of an object to be identified through the communication circuit, then calculates a difference between the feature of the object and a feature of a preset template, and processes the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing; and calculates a similarity between the object and the preset template by using the processed difference.

11. The device according to claim 10, wherein the calculating a difference between the feature of the object and a feature of a preset template, and processing the difference by using a preset strategy, so that the processed difference is greater than or equal to the difference before processing, comprises:

the processor, in operation, using the formula

$$ d = \sum_{i=1}^{M} (r_i - w_i)^2 $$

to calculate the difference between the object feature and the preset template feature, where d is the total difference after processing, M is the number of features, r_i is the feature value of the i-th feature in the preset template, w_i is the feature value of the i-th feature in the object to be identified, and (r_i - w_i) is greater than 1.

12. The device according to claim 10, wherein the calculating a difference between the feature of the object and a feature of a preset template comprises:

the processor, in operation, calculating a metric value of the object feature and clustering the object feature.

13. The device according to claim 12, wherein the calculating a difference between the feature of the object and a feature of a preset template comprises:

the processor, in operation, calculating, by using a distance function, a distance difference between the clustered object feature and the feature of the preset template.

14. The device according to claim 12, wherein the clustering the object feature comprises: clustering the object feature by using a k-means clustering algorithm, an Otsu algorithm, or a density algorithm.

15. The device according to claim 12, wherein the calculating a metric value of the object feature comprises: calculating the metric value of the object feature by using a binarization algorithm, a Gabor wavelet transform algorithm, or a deep convolutional network.

16. The device according to claim 10, wherein the calculating a similarity between the object and the preset template by using the processed difference comprises:

the processor, in operation, ranking the similarities between the object and the preset templates, or excluding a preset template whose similarity with the object is less than a preset threshold.

17. The device according to claim 10, wherein the object feature is at least one of area, width, perimeter, and density.

18. The device according to claim 10, wherein the object to be identified is a one-dimensional code image.

19. A device having a storage function, characterized in that the device stores a program, and when the program is executed, the method for measuring similarity according to any one of claims 1 to 9 is implemented.