CN112802009A

CN112802009A - Similarity calculation method and device for product detection data set

Info

Publication number: CN112802009A
Application number: CN202110210120.4A
Authority: CN
Inventors: 林大; 旷黎明; 师文庆; 韩锦; 潘正颐; 侯大为
Original assignee: Changzhou Weiyizhi Technology Co Ltd
Current assignee: Changzhou Weiyizhi Technology Co Ltd
Priority date: 2021-02-25
Filing date: 2021-02-25
Publication date: 2021-05-14

Abstract

The application discloses a similarity calculation method and device for product detection data sets. The method comprises: calculating the similarity K' of each defect between two data sets according to the weight of the defect on different feature dimensions; calculating the similarity K1 between the two data sets according to the similarity K' of each defect and the number ratio of each defect; calculating the cosine similarity between the two vectors formed by the numbers of the defects in the two data sets, and multiplying the cosine similarity by the similarity K1 to obtain the similarity K2 between the two data sets; and normalizing the similarity K2 to obtain the final similarity K between the two data sets. Because the similarity of the data sets is calculated by weighting each feature dimension of the defects, the similarity of two data sets can be obtained quickly, so that the historical parameter configuration of a data set with higher similarity can be used as the initial parameter configuration of the data set to be trained, which improves the training efficiency of the model.

Description

Similarity calculation method and device for product detection data set

Technical Field

The application belongs to the technical field of product detection and relates to a similarity calculation method and device for product detection data sets in the industrial internet.

Background

In artificial-intelligence-based solutions for detecting surface defects on products, after an annotation team has labeled the defects in the captured images, related images generally need to be grouped into a data set for training an intelligent detection model.

Currently, a model is usually trained by feeding a data set into it, training it, and obtaining the configuration of the relevant parameters. Because each model is trained from scratch and each training run takes a long time, training efficiency is low when many models need to be trained.

Disclosure of Invention

To solve the problem of low model training efficiency in the related art, the application provides a similarity calculation method and a similarity calculation device for product detection data sets. The technical solution is as follows:

In a first aspect, the present application provides a similarity calculation method for product detection data sets, the method comprising:

calculating the similarity K' of each defect between the two data sets according to the weight of each defect on different feature dimensions;

calculating to obtain the similarity K1 between the two data sets according to the similarity K' of each defect and the quantity ratio of each defect, wherein the similarity K1 is used for depicting the contribution of the quantity ratio of each defect to the similarity of the two data sets;

calculating the cosine similarity between the two vectors formed by the numbers of the defects in the two data sets, and multiplying the cosine similarity by the similarity K1 to obtain the similarity K2 between the two data sets, wherein the cosine similarity characterizes the structural proportion of the defect numbers, and the similarity K2 characterizes the contribution of the structural proportion of the defect numbers to the similarity of the two data sets;

and carrying out normalization processing on the similarity K2, and mapping the similarity K2 to [0,1] to obtain the final similarity K between the two data sets.

Optionally, calculating the similarity K' of each defect between the two data sets according to the weight of each defect on different feature dimensions includes:

for each defect, extracting the feature values $[f_{i1}, f_{i2}, f_{i3}, f_{i4}, f_{i5}, \dots, f_{in}]$ of the defect in each of the two data sets, where $f_{ij}$ represents the feature value of the defect in the $i$-th data set on the $j$-th dimension and $n$ is the total number of feature dimensions of the defect;

calculating the similarity K' of the defect between the two data sets according to the feature values of the defect in the two data sets and the weight of each feature dimension, where the weights of the feature dimensions are $[w_1, w_2, w_3, w_4, w_5, \dots, w_n]$ and $K' = v_1 w_1 + v_2 w_2 + v_3 w_3 + v_4 w_4 + v_5 w_5 + \dots + v_n w_n$; here $v_j = f_{1j}/f_{2j}$ represents the similarity of the current defect's $j$-th feature value between the two data sets, and $v_j$ takes values in the interval $[0,1]$.

Optionally, the calculating the similarity K1 between the two data sets according to the similarity K' of each defect and the ratio of the number of each defect includes:

acquiring the number of each defect in the two data sets;

adding the number of each defect to obtain the total number of the defects;

dividing the number of each defect by the sum of the number of the defects to obtain the number ratio of each defect;

multiplying the number ratio of each defect by the similarity K' of each defect to obtain a product value of each defect;

the product values of the respective defects are added to obtain the similarity K1.
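The five steps above reduce to a weighted sum. A minimal Python sketch follows; it assumes (hypothetically) that defects are identified by labels used as dictionary keys and that a K' value is available for every defect type being counted.

```python
from typing import Dict


def similarity_k1(k_prime: Dict[str, float], counts1: Dict[str, int],
                  counts2: Dict[str, int]) -> float:
    """K1: per-defect similarities K' weighted by each defect's number ratio."""
    # Number of each defect across the two data sets
    per_defect = {d: counts1.get(d, 0) + counts2.get(d, 0) for d in k_prime}
    # Total number of defects
    total = sum(per_defect.values())
    if total == 0:
        raise ValueError("no defects were counted in either data set")
    # Number ratio times K', summed over all defects
    return sum(per_defect[d] / total * k_prime[d] for d in k_prime)


# Hypothetical values: K' per defect and defect counts in each data set
k1 = similarity_k1({"A": 0.9, "B": 0.6}, {"A": 30, "B": 10}, {"A": 50, "B": 10})
print(round(k1, 3))  # 0.84
```

Because the number ratios sum to 1, K1 is a weighted average of the K' values, so frequent defect types dominate it.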

Optionally, the calculating a cosine similarity between two vectors of the number of each defect in the two data sets, and multiplying the cosine similarity by the similarity K1 to obtain a similarity K2 between the two data sets includes:

obtaining a first vector $(P_{11}, P_{12}, \dots, P_{1m})$ and a second vector $(P_{21}, P_{22}, \dots, P_{2m})$ from the numbers of the defects in the two data sets, where $P_{ij}$ is the number of defect $j$ in the $i$-th data set and $m$ is the number of defect types;

calculating cosine similarity between the first vector and the second vector;

and multiplying the similarity K1 by the cosine similarity to obtain the similarity K2.
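The cosine step can be sketched as below, assuming (hypothetically) that the two count vectors list the defect types in the same order. The helper names and sample counts are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Cosine of the angle between the defect-count vectors of the two data sets."""
    if len(p1) != len(p2):
        raise ValueError("both vectors must have one entry per defect type (m)")
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("a data set with no defects has no defined direction")
    return dot / (norm1 * norm2)


def similarity_k2(k1: float, p1: Sequence[float], p2: Sequence[float]) -> float:
    """K2 = K1 * cos(first vector, second vector)."""
    return k1 * cosine_similarity(p1, p2)


# Hypothetical counts of defects (A, B, C) in data sets 1 and 2
k2 = similarity_k2(0.9, [20, 10, 20], [10, 20, 20])
print(round(k2, 3))  # 0.8
```

Since defect counts are non-negative, the cosine lies in [0,1], so multiplying by it can only reduce K1.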

Optionally, the normalizing the similarity K2, and mapping the similarity K2 to [0,1], to obtain a final similarity K between two data sets, includes:

acquiring the weight sum W of each characteristic dimension of the defect;

dividing the similarity K2 by the weight sum W to obtain the similarity K.
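A minimal sketch of the normalization, with a hypothetical function name and sample values:

```python
from typing import Sequence


def normalize_similarity(k2: float, weights: Sequence[float]) -> float:
    """Final similarity K = K2 / W, where W is the sum of the feature-dimension weights."""
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("the weight sum W must be positive")
    return k2 / w_sum


# Hypothetical: weights (2, 1, 1) give W = 4, so K2 = 3.0 maps to K = 0.75
print(normalize_similarity(3.0, [2.0, 1.0, 1.0]))  # 0.75
```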

In a second aspect, the present application also provides an apparatus for calculating similarity of product inspection data sets, the apparatus comprising:

the first calculation module is used for calculating the similarity K' of each defect between the two data sets according to the weight of each defect on different feature dimensions;

the second calculation module is used for calculating the similarity K1 between the two data sets according to the similarity K' of each defect calculated by the first calculation module and the quantity ratio of each defect, and the similarity K1 is used for depicting the contribution of the quantity ratio of each defect to the similarity of the two data sets;

the third calculation module is used for calculating cosine similarity between two vectors of the number of each defect in the two data sets, multiplying the cosine similarity with the similarity K1 calculated by the second calculation module to obtain the similarity K2 between the two data sets, wherein the cosine similarity is used for describing the structure proportion of the number of the defects, and the similarity K2 is used for describing the contribution of the structure proportion of the number of each defect to the similarity of the two data sets;

and the processing module is used for carrying out normalization processing on the similarity K2 calculated by the third calculation module, and mapping the similarity K2 to [0,1] to obtain the final similarity K between the two data sets.

Optionally, the first computing module includes:

an extraction unit, configured to extract, for each defect, the feature values $[f_{i1}, f_{i2}, f_{i3}, f_{i4}, f_{i5}, \dots, f_{in}]$ of the defect in each of the two data sets, where $f_{ij}$ represents the feature value of the defect in the $i$-th data set on the $j$-th dimension and $n$ is the total number of feature dimensions of the defect;

a first calculating unit, configured to calculate the similarity K' of the defect between the two data sets according to the feature values of the defect in the two data sets extracted by the extraction unit and the weight of each feature dimension, where the weights of the feature dimensions are $[w_1, w_2, w_3, w_4, w_5, \dots, w_n]$ and $K' = v_1 w_1 + v_2 w_2 + v_3 w_3 + v_4 w_4 + v_5 w_5 + \dots + v_n w_n$; here $v_j = f_{1j}/f_{2j}$ represents the similarity of the current defect's $j$-th feature value between the two data sets, and $v_j$ takes values in the interval $[0,1]$.

Optionally, the second computing module includes:

a first acquiring unit for acquiring the number of each defect in the two data sets;

a second calculating unit, configured to add the number of each defect acquired by the first acquiring unit to obtain a defect number sum;

a third calculating unit, configured to divide the number of each defect by the sum of the numbers of defects calculated by the second calculating unit to obtain a number ratio of each defect;

a fourth calculating unit, configured to multiply the number ratio of each defect calculated by the third calculating unit by the similarity K' of each defect to obtain a product value of each defect;

a fifth calculating unit, configured to add the product values of the defects calculated by the fourth calculating unit to obtain the similarity K1.

Optionally, the third computing module includes:

a vector acquisition module, configured to obtain a first vector $(P_{11}, P_{12}, \dots, P_{1m})$ and a second vector $(P_{21}, P_{22}, \dots, P_{2m})$ from the numbers of the defects in the two data sets, where $P_{ij}$ is the number of defect $j$ in the $i$-th data set and $m$ is the number of defect types;

a sixth calculating unit, configured to calculate a cosine similarity between the first vector and the second vector acquired by the vector acquisition module;

and the seventh calculating unit is used for multiplying the similarity K1 by the cosine similarity calculated by the sixth calculating unit to obtain the similarity K2.

Optionally, the processing module includes:

the second acquisition unit is used for acquiring the weight sum W of each characteristic dimension of the defect;

an eighth calculating unit, configured to divide the similarity K2 by the weight sum W obtained by the second obtaining unit to obtain the similarity K.

Based on the technical scheme, the application can at least realize the following beneficial effects:

the similarity of the two data sets is calculated by adopting a method of weighting based on each feature dimension of the defect, so that the similarity of the two data sets can be rapidly acquired, the historical parameter configuration of the data set with higher similarity is further used for setting the initial parameter configuration of the data set for training, and the training efficiency of the model can be improved to a certain extent.
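The text does not specify how the historical configuration is selected. As one possible reading, the sketch below returns the stored configuration of the historical data set with the largest final similarity K. The cut-off `min_similarity`, the data set names and the configuration keys are all hypothetical.

```python
from typing import Dict, Optional


def pick_initial_config(similarities: Dict[str, float],
                        history_configs: Dict[str, dict],
                        min_similarity: float = 0.5) -> Optional[dict]:
    """Return a copy of the parameter configuration of the most similar historical data set.

    similarities maps each historical data set to its final similarity K with the
    new data set; history_configs maps the same names to saved training configs.
    min_similarity is a hypothetical cut-off: below it, None is returned and
    training falls back to default parameters.
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    if similarities[best] < min_similarity:
        return None
    return dict(history_configs[best])  # copy so the stored history is not modified


sims = {"housing_2020": 0.82, "metal_plate": 0.41}
configs = {"housing_2020": {"lr": 0.01, "epochs": 50},
           "metal_plate": {"lr": 0.001, "epochs": 120}}
print(pick_initial_config(sims, configs))  # {'lr': 0.01, 'epochs': 50}
```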

In addition, the weights of the different dimensions can be adjusted to the actual needs of the business, so that the similarity between two defect objects is calculated dynamically and the similarity between the data sets changes accordingly. When the similarity of the data sets is calculated, both the local contribution of each defect's quantity to the overall similarity and the contribution of the overall structural proportion of the defect quantities are taken into account. Finally, normalization maps the similarity to the interval [0,1], so that similarity values are comparable: a larger value indicates that the two data sets are more similar.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow chart of a method of similarity calculation for product inspection data sets provided in one embodiment of the present application;

FIG. 2 is a schematic diagram of a similarity calculation apparatus for product inspection data sets provided in an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a similarity calculation apparatus for a product inspection data set provided in another embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

Fig. 1 is a flowchart of a method for calculating similarity of product inspection data sets provided in an embodiment of the present application, and the method for calculating similarity of product inspection data sets provided in the present application can be applied to a computer, such as a computer used by a client or a server, and the computer stores an execution program for implementing the following steps. The similarity calculation method for the product detection data set provided by the application can comprise the following steps:

Step 101, calculating the similarity K' of each defect between two data sets according to the weight of each defect on different feature dimensions;

For each defect, the feature values $[f_{i1}, f_{i2}, f_{i3}, f_{i4}, f_{i5}, \dots, f_{in}]$ of the defect are extracted in each of the two data sets, where $f_{ij}$ represents the feature value of the defect in the $i$-th data set on the $j$-th dimension and $n$ is the total number of feature dimensions of the defect.

The similarity K' of the defect between the two data sets is calculated according to the feature values of the defect in the two data sets and the weight of each feature dimension, where the weights of the feature dimensions are $[w_1, w_2, w_3, w_4, w_5, \dots, w_n]$ and $K' = v_1 w_1 + v_2 w_2 + v_3 w_3 + v_4 w_4 + v_5 w_5 + \dots + v_n w_n$; here $v_j = f_{1j}/f_{2j}$ represents the similarity of the current defect's $j$-th feature value between the two data sets, and $v_j$ takes values in the interval $[0,1]$.

For example, suppose the two data sets include a defect A. The feature values of defect A extracted in the first data set are $[f_{11}^{a}, f_{12}^{a}, f_{13}^{a}, f_{14}^{a}, f_{15}^{a}, \dots, f_{1n}^{a}]$, where $f_{1j}^{a}$ represents the feature value of defect A in the 1st data set on the $j$-th dimension; the feature values of defect A extracted in the second data set are $[f_{21}^{a}, f_{22}^{a}, f_{23}^{a}, f_{24}^{a}, f_{25}^{a}, \dots, f_{2n}^{a}]$, where $f_{2j}^{a}$ represents the feature value of defect A in the 2nd data set on the $j$-th dimension.

The similarity Ka' of defect A between the two data sets is:

$K_a' = v_1^{a} w_1 + v_2^{a} w_2 + v_3^{a} w_3 + v_4^{a} w_4 + v_5^{a} w_5 + \dots + v_n^{a} w_n$, where $v_j^{a} = f_{1j}^{a}/f_{2j}^{a}$.

As another example, suppose the two data sets include a defect B. The feature values of defect B extracted in the first data set are $[f_{11}^{b}, f_{12}^{b}, f_{13}^{b}, f_{14}^{b}, f_{15}^{b}, \dots, f_{1n}^{b}]$, where $f_{1j}^{b}$ represents the feature value of defect B in the 1st data set on the $j$-th dimension; the feature values of defect B extracted in the second data set are $[f_{21}^{b}, f_{22}^{b}, f_{23}^{b}, f_{24}^{b}, f_{25}^{b}, \dots, f_{2n}^{b}]$, where $f_{2j}^{b}$ represents the feature value of defect B in the 2nd data set on the $j$-th dimension.

The similarity Kb' of defect B between the two data sets is:

$K_b' = v_1^{b} w_1 + v_2^{b} w_2 + v_3^{b} w_3 + v_4^{b} w_4 + v_5^{b} w_5 + \dots + v_n^{b} w_n$, where $v_j^{b} = f_{1j}^{b}/f_{2j}^{b}$.
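As a purely illustrative calculation with hypothetical numbers ($n = 3$, weights $w = (0.4, 0.4, 0.2)$), suppose defect A has feature values (3, 6, 9) in the first data set and (6, 6, 10) in the second:

```latex
v_1^{a} = \frac{3}{6} = 0.5, \quad v_2^{a} = \frac{6}{6} = 1, \quad v_3^{a} = \frac{9}{10} = 0.9

K_a' = 0.5 \times 0.4 + 1 \times 0.4 + 0.9 \times 0.2 = 0.2 + 0.4 + 0.18 = 0.78
```

Here $W = 0.4 + 0.4 + 0.2 = 1$, so $K_a'$ cannot exceed 1; identical feature values in both data sets would give $K_a' = W$.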

Step 102, calculating the similarity K1 between the two data sets according to the similarity K' of each defect and the number ratio of each defect;

The similarity K1 characterizes the contribution of the number ratio of each defect to the similarity of the two data sets.

In one possible implementation, when step 102 is implemented, the following steps may be included:

S21, acquiring the number of each defect in the two data sets;

For example: i is initialized to 1, the number of the i-th defect in the two data sets is obtained, i is set to i + 1, and this step is repeated until the numbers of all defects in the two data sets have been obtained.

For example, if the numbers of defect A in the first and second data sets are a1 and a2, the total number of defect A in the two data sets is a1 + a2; if the numbers of defect B in the first and second data sets are b1 and b2, the total number of defect B in the two data sets is b1 + b2; and if the numbers of defect C in the first and second data sets are c1 and c2, the total number of defect C in the two data sets is c1 + c2.
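Step S21 amounts to counting labels. A minimal Python sketch, assuming (hypothetically) that each data set is available as a list of images and each image as a list of defect labels, can use `collections.Counter`:

```python
from collections import Counter
from typing import Iterable, List


def count_defects(images: Iterable[List[str]]) -> Counter:
    """Count how often each defect label occurs in one data set."""
    counts: Counter = Counter()
    for labels in images:  # one list of defect labels per annotated image
        counts.update(labels)
    return counts


# Hypothetical annotations: defect labels per image
data_set_1 = [["A", "B"], ["A"], ["C", "A"]]
data_set_2 = [["A"], ["B", "B"], ["C"]]

counts_1 = count_defects(data_set_1)  # a1 = 3, b1 = 1, c1 = 1
counts_2 = count_defects(data_set_2)  # a2 = 1, b2 = 2, c2 = 1
per_defect = counts_1 + counts_2      # a1 + a2, b1 + b2, c1 + c2
print(sorted(per_defect.items()))     # [('A', 4), ('B', 3), ('C', 2)]
```

Adding two `Counter` objects drops labels whose combined count is zero, which does not change any of the totals used later.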

S22, adding the number of each defect to obtain the total number of the defects;

Continuing the above example, the total number of defects A, B and C in the two data sets is total = a1 + a2 + b1 + b2 + c1 + c2.

S23, dividing the number of each defect by the sum of the number of the defects to obtain the number ratio of each defect;

After the total number of defects is obtained in step S22, for each defect, the number of the current defect in the two data sets is divided by the total number of defects to obtain the number ratio of the current defect.

Continuing the above example, the number ratio of defect A is (a1 + a2)/total, that of defect B is (b1 + b2)/total, and that of defect C is (c1 + c2)/total.

S24, multiplying the number ratio of each defect by the similarity K' of each defect to obtain a product value of each defect;

S25, adding the product values of the defects to obtain the similarity K1.

Continuing the above example, weighting the similarity of each defect by its number ratio gives the data set similarity $K_1 = K_a' \cdot (a1 + a2)/\mathrm{total} + K_b' \cdot (b1 + b2)/\mathrm{total} + K_c' \cdot (c1 + c2)/\mathrm{total}$.
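With hypothetical values a1 = 30, a2 = 50, b1 = 10, b2 = 10, c1 = 5, c2 = 15 and per-defect similarities $K_a' = 0.9$, $K_b' = 0.6$, $K_c' = 0.75$, the formula evaluates as follows:

```latex
\mathrm{total} = 30 + 50 + 10 + 10 + 5 + 15 = 120

K_1 = 0.9 \times \frac{80}{120} + 0.6 \times \frac{20}{120} + 0.75 \times \frac{20}{120}
    = 0.6 + 0.1 + 0.125 = 0.825
```

Defect A, which accounts for two thirds of all defects, supplies most of K1.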

Step 103, calculating the cosine similarity between the two vectors formed by the numbers of the defects in the two data sets, and multiplying the cosine similarity by the similarity K1 to obtain the similarity K2 between the two data sets;

The cosine similarity here characterizes the structural proportion of the defect numbers, and the similarity K2 characterizes the contribution of this structural proportion to the similarity of the two data sets.

In one possible implementation manner, when step 103 is implemented, the following steps may be included:

S31, obtaining a first vector $(P_{11}, P_{12}, \dots, P_{1m})$ and a second vector $(P_{21}, P_{22}, \dots, P_{2m})$ from the numbers of the defects in the two data sets, where $P_{ij}$ is the number of defect $j$ in the $i$-th data set and $m$ is the number of defect types;

S32, calculating the cosine similarity between the first vector and the second vector;

S33, multiplying the similarity K1 by the cosine similarity to obtain the similarity K2.

Assume that defects A, B and C are selected, with numbers a1, b1 and c1 in the first data set and a2, b2 and c2 in the second data set. Taking the defect numbers as vectors gives two vectors, (a1, b1, c1) and (a2, b2, c2), whose cosine similarity is calculated and denoted cos. With the data set similarity K1 obtained in steps 101 and 102, the contribution of the structure of the defect numbers to the data set similarity is then calculated, giving the similarity K2 = K1 × cos.
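For a concrete, hypothetical instance, take (a1, b1, c1) = (20, 10, 20), (a2, b2, c2) = (10, 20, 20) and $K_1 = 0.9$:

```latex
\cos = \frac{20 \cdot 10 + 10 \cdot 20 + 20 \cdot 20}
            {\sqrt{20^2 + 10^2 + 20^2}\,\sqrt{10^2 + 20^2 + 20^2}}
     = \frac{800}{30 \times 30} = \frac{8}{9} \approx 0.889

K_2 = K_1 \cdot \cos = 0.9 \times \frac{8}{9} = 0.8
```

Because defect counts are non-negative, cos lies in [0,1]; it equals 1 only when the two count vectors are proportional.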

Step 104, normalizing the similarity K2 and mapping it to [0,1] to obtain the final similarity K between the two data sets.

First, the weight sum W of the feature dimensions of the defect is acquired; then the similarity K2 is divided by the weight sum W to obtain the similarity K.

For example, if the weights of the feature dimensions of the defect are $[w_1, w_2, w_3, w_4, w_5, \dots, w_n]$, their sum is $W = w_1 + w_2 + \dots + w_n$. From steps 101 to 103, the maximum value of K2 is W: each $v_j$ is at most 1, so each K' is at most W (for non-negative weights); the number ratios sum to 1, so K1 is at most W; and the cosine of two non-negative count vectors is at most 1, so K2 is at most W. K2 is therefore normalized to [0,1] to give the final data set similarity $K = K2/W$.
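Putting steps 101 to 104 together, the following Python sketch computes K for two data sets. It is illustrative only: the input layout (dictionaries keyed by defect label), the function names and the sample values are hypothetical, input validation is omitted for brevity, and $v_j$ is computed as the smaller feature value divided by the larger (an assumption that keeps $v_j$ in [0,1] for non-negative features).

```python
import math
from typing import Dict, List, Sequence, Tuple


def _v(x: float, y: float) -> float:
    # Assumption: non-negative features; smaller / larger keeps v_j in [0, 1]
    return 1.0 if x == y else min(x, y) / max(x, y)


def dataset_similarity(features: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                       counts1: Dict[str, int], counts2: Dict[str, int],
                       weights: Sequence[float]) -> float:
    """Final similarity K between two data sets (steps 101 to 104)."""
    defects: List[str] = sorted(features)
    # Step 101: K' of each defect, a weighted sum of per-dimension similarities
    k_prime = {d: sum(_v(a, b) * w for a, b, w in zip(f1, f2, weights))
               for d, (f1, f2) in features.items()}
    # Step 102: weight each K' by the defect's number ratio over both data sets
    per_defect = {d: counts1.get(d, 0) + counts2.get(d, 0) for d in defects}
    total = sum(per_defect.values())
    k1 = sum(per_defect[d] / total * k_prime[d] for d in defects)
    # Step 103: multiply by the cosine similarity of the two defect-count vectors
    p1 = [counts1.get(d, 0) for d in defects]
    p2 = [counts2.get(d, 0) for d in defects]
    cos = sum(a * b for a, b in zip(p1, p2)) / (
        math.sqrt(sum(a * a for a in p1)) * math.sqrt(sum(b * b for b in p2)))
    k2 = k1 * cos
    # Step 104: divide by the weight sum W to map the result to [0, 1]
    return k2 / sum(weights)


features = {"A": ([1.0, 4.0, 3.0], [2.0, 4.0, 6.0]),   # K'_A = 2.5
            "B": ([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])}   # K'_B = 4.0
k = dataset_similarity(features, {"A": 20, "B": 10}, {"A": 10, "B": 20}, [2.0, 1.0, 1.0])
print(round(k, 3))  # 0.65
```

Two identical data sets give K = 1, which matches the upper bound argued above.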

In summary, with the similarity calculation method for product detection data sets provided by the application, the similarity of two data sets is calculated by weighting each feature dimension of the defects, so it can be obtained quickly; the historical parameter configuration of a data set with higher similarity can then be used as the initial parameter configuration of the data set to be trained, which improves the training efficiency of the model to a certain extent.

The following is an embodiment of a similarity calculation apparatus for a product detection data set, and since the apparatus embodiment corresponds to the method embodiment, for the following explanation of technical features in the similarity calculation apparatus for a product detection data set, reference may be made to the above explanation of corresponding technical features in the method embodiment, and details are not repeated here.

Fig. 2 is a schematic structural diagram of a similarity calculation apparatus for a product inspection dataset provided in an embodiment of the present application, which may be implemented by software, hardware, or a combination of software and hardware, and may include: a first calculation module 210, a second calculation module 220, a third calculation module 230, and a processing module 240.

The first calculation module 210 may be configured to calculate a similarity K' of each defect between the two data sets according to the weight of each defect in different feature dimensions;

the second calculating module 220 may be configured to calculate a similarity K1 between the two data sets according to the similarity K' of each defect calculated by the first calculating module 210 and the quantity ratio of each defect, where the similarity K1 is used to characterize the contribution of the quantity ratio of each defect to the similarity of the two data sets;

the third calculating module 230 may be configured to calculate cosine similarity between two vectors of the number of each defect in the two data sets, and multiply the cosine similarity with the similarity K1 calculated by the second calculating module 220 to obtain a similarity K2 between the two data sets, where the cosine similarity is used to characterize the structural proportion of the number of defects, and the similarity K2 is used to characterize the contribution of the structural proportion of the number of each defect to the similarity of the two data sets;

the processing module 240 may be configured to perform normalization processing on the similarity K2 calculated by the third calculating module 230, and map the similarity K2 to [0,1], so as to obtain a final similarity K between two data sets.

In a possible implementation manner, please refer to fig. 3, which is a schematic structural diagram of a similarity calculation apparatus for a product inspection data set provided in another embodiment of the present application, wherein the first calculation module 210 may include: an extraction unit 211 and a first calculation unit 212.

The extraction unit 211 may be configured to extract, for each defect, the feature values $[f_{i1}, f_{i2}, f_{i3}, f_{i4}, f_{i5}, \dots, f_{in}]$ of the defect in each of the two data sets, where $f_{ij}$ represents the feature value of the defect in the $i$-th data set on the $j$-th dimension and $n$ is the total number of feature dimensions of the defect;

the first calculating unit 212 may be configured to calculate the similarity K' of the defect between the two data sets according to the feature values of the defect in the two data sets extracted by the extraction unit 211 and the weight of each feature dimension, where the weights of the feature dimensions are $[w_1, w_2, w_3, w_4, w_5, \dots, w_n]$ and $K' = v_1 w_1 + v_2 w_2 + v_3 w_3 + v_4 w_4 + v_5 w_5 + \dots + v_n w_n$; here $v_j = f_{1j}/f_{2j}$ represents the similarity of the current defect's $j$-th feature value between the two data sets, and $v_j$ takes values in the interval $[0,1]$.

Still referring to fig. 3, the second calculation module 220 may include: a first acquisition unit 221, a second calculation unit 222, a third calculation unit 223, a fourth calculation unit 224, and a fifth calculation unit 225.

A first acquiring unit 221 for acquiring the number of each defect in the two data sets;

a second calculating unit 222, configured to add the numbers of the defects acquired by the first acquiring unit 221 to obtain a defect number sum;

a third calculating unit 223 for dividing the number of each defect by the sum of the numbers of defects calculated by the second calculating unit 222 to obtain the number ratio of each defect;

a fourth calculating unit 224, configured to multiply the ratio of the number of each defect calculated by the third calculating unit 223 by the similarity K' of each defect to obtain a product value of each defect;

a fifth calculating unit 225, configured to add the product values of the defects calculated by the fourth calculating unit 224 to obtain the similarity K1.

Still referring to fig. 3, the third calculation module 230 may include: a vector acquisition module 231, a sixth calculation unit 232 and a seventh calculation unit 233.

The vector acquisition module 231 may be configured to obtain a first vector $(P_{11}, P_{12}, \dots, P_{1m})$ and a second vector $(P_{21}, P_{22}, \dots, P_{2m})$ from the numbers of the defects in the two data sets, where $P_{ij}$ is the number of defect $j$ in the $i$-th data set and $m$ is the number of defect types;

the sixth calculating unit 232 may be configured to calculate a cosine similarity between the first vector and the second vector acquired by the vector acquiring module 231;

the seventh calculating unit 233 may be configured to multiply the similarity K1 with the cosine similarity calculated by the sixth calculating unit 232 to obtain the similarity K2.

Still referring to fig. 3, the processing module 240 may include: a second acquisition unit 241 and an eighth calculation unit 242.

The second obtaining unit 241 may be configured to obtain a total weight W of each feature dimension of the defect;

the eighth calculating unit 242 may be configured to divide the similarity K2 by the weight sum W obtained by the second obtaining unit 241 to obtain the similarity K.
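The module and unit structure described above can be mirrored in code, for example with one callable class per module wired together by the apparatus. The sketch below is a hypothetical Python arrangement, not the applicant's implementation: only the module roles (210 to 240) come from the text, while the class names, the dictionary-based inputs, the assumption that feature extraction (unit 211) happens upstream, and the smaller/larger ratio for $v_j$ are all illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, Tuple[Sequence[float], Sequence[float]]]
Counts = Dict[str, int]


@dataclass
class FirstCalculationModule:  # module 210; unit 211's extraction is assumed done upstream
    weights: Sequence[float]

    def __call__(self, features: Features) -> Dict[str, float]:
        def v(x: float, y: float) -> float:
            # Assumption: non-negative features; smaller / larger keeps v_j in [0, 1]
            return 1.0 if x == y else min(x, y) / max(x, y)
        # unit 212: K' = sum of v_j * w_j for each defect
        return {d: sum(v(a, b) * w for a, b, w in zip(f1, f2, self.weights))
                for d, (f1, f2) in features.items()}


class SecondCalculationModule:  # module 220: units 221 to 225
    def __call__(self, k_prime: Dict[str, float], c1: Counts, c2: Counts) -> float:
        per_defect = {d: c1.get(d, 0) + c2.get(d, 0) for d in k_prime}   # unit 221
        total = sum(per_defect.values())                                 # unit 222
        return sum(per_defect[d] / total * k_prime[d] for d in k_prime)  # units 223-225


class ThirdCalculationModule:  # module 230: vector acquisition 231, units 232 and 233
    def __call__(self, k1: float, defects: List[str], c1: Counts, c2: Counts) -> float:
        p1 = [c1.get(d, 0) for d in defects]  # first vector (P_11, ..., P_1m)
        p2 = [c2.get(d, 0) for d in defects]  # second vector (P_21, ..., P_2m)
        dot = sum(a * b for a, b in zip(p1, p2))
        cos = dot / (math.sqrt(sum(a * a for a in p1)) * math.sqrt(sum(b * b for b in p2)))
        return k1 * cos  # unit 233: K2 = K1 * cos


@dataclass
class ProcessingModule:  # module 240: units 241 and 242
    weights: Sequence[float]

    def __call__(self, k2: float) -> float:
        return k2 / sum(self.weights)  # K = K2 / W


class SimilarityApparatus:
    """Chains modules 210 -> 220 -> 230 -> 240."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.first = FirstCalculationModule(weights)
        self.second = SecondCalculationModule()
        self.third = ThirdCalculationModule()
        self.processing = ProcessingModule(weights)

    def similarity(self, features: Features, c1: Counts, c2: Counts) -> float:
        k_prime = self.first(features)
        k1 = self.second(k_prime, c1, c2)
        k2 = self.third(k1, sorted(features), c1, c2)
        return self.processing(k2)


apparatus = SimilarityApparatus(weights=[2.0, 1.0, 1.0])
features = {"A": ([1.0, 4.0, 3.0], [2.0, 4.0, 6.0]),
            "B": ([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])}
print(round(apparatus.similarity(features, {"A": 20, "B": 10}, {"A": 10, "B": 20}), 3))  # 0.65
```

Keeping each module a separate callable makes it easy to swap, for instance, the weighting used by module 210 without touching the other modules.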

In summary, the similarity calculation device for the product detection data sets provided by the application calculates the similarity of the two data sets by adopting the weighting mode based on each feature dimension of the defect, can quickly acquire the similarity of the two data sets, so that the historical parameter configuration of the data set with higher similarity is further used to set the initial parameter configuration of the data set for training, and the training efficiency of the model can be improved to a certain extent.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method of similarity calculation for a product inspection data set, the method comprising:

2. The method of claim 1, wherein calculating the similarity K' of each defect between two data sets according to the weight of each defect in different feature dimensions comprises:

3. The method according to claim 1, wherein calculating the similarity K1 between the two data sets according to the similarity K' of each defect and the ratio of the number of each defect comprises:

acquiring the number of each defect in the two data sets;

adding the number of each defect to obtain the total number of the defects;

4. The method of claim 1, wherein the calculating the cosine similarity between two vectors of the number of each defect in the two data sets, and multiplying the cosine similarity by the similarity K1 to obtain the similarity K2 between the two data sets comprises:

calculating cosine similarity between the first vector and the second vector;

5. The method according to claim 1, wherein normalizing the similarity K2 to map it to [0,1] to obtain the final similarity K between the two data sets comprises:

acquiring the weight sum W of each characteristic dimension of the defect;

dividing the similarity K2 by the weight sum W to obtain the similarity K.

6. An apparatus for calculating similarity of product inspection data sets, the apparatus comprising:

7. The apparatus of claim 6, wherein the first computing module comprises:

8. The apparatus of claim 6, wherein the second computing module comprises:

9. The apparatus of claim 6, wherein the third computing module comprises:

10. The apparatus of claim 6, wherein the processing module comprises: