CN112560858A - Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction


Info

Publication number: CN112560858A
Application number: CN202011088800.5A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN112560858B (granted)
Inventors: 张冬明, 张菁, 张翠, 张广朋, 姚嘉诚
Assignees: Beijing University of Technology; National Computer Network and Information Security Management Center
Application filed by Beijing University of Technology and National Computer Network and Information Security Management Center
Priority: CN202011088800.5A
Legal status: Active (granted)

Classifications

    • G: Physics
        • G06: Computing; Calculating or Counting
            • G06V: Image or Video Recognition or Understanding
                • G06V 30/00: Character recognition; recognising digital ink; document-oriented image-based pattern recognition
                • G06V 30/10: Character recognition
                • G06V 30/14: Image acquisition
                • G06V 30/148: Segmentation of character regions
                • G06V 30/153: Segmentation of character regions using recognition of characters or words
            • G06F: Electric Digital Data Processing
                • G06F 18/00: Pattern recognition
                • G06F 18/20: Analysing
                • G06F 18/22: Matching criteria, e.g. proximity measures
                • G06F 18/23: Clustering techniques
                • G06F 18/232: Non-hierarchical techniques
                • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
                • G06F 18/24: Classification techniques
                • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                • G06F 18/2411: Approaches based on the proximity to a decision surface, e.g. support vector machines
    • Y: General tagging of new technological developments
        • Y02: Technologies or applications for mitigation or adaptation against climate change
            • Y02D: Climate change mitigation technologies in information and communication technologies (ICT)
                • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a character-picture detection and rapid matching method combining a lightweight network with personalized feature extraction. A deep-learning method based on a lightweight network classifies pictures, detecting character pictures versus non-character pictures and further dividing character pictures into complex-background and simple-background classes; personalized features representing picture content are then extracted for each of the two classes; finally, rapid matching is performed with the method corresponding to the extracted features, improving matching speed while preserving accuracy. The method effectively reduces matching time, makes comprehensive and efficient use of character-picture content, and meets character-picture matching requirements with both robustness and real-time performance.

Description

Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction
Technical Field
The invention takes character pictures as the research object and provides a rapid character-picture matching method combining a lightweight network with personalized feature extraction. First, internet pictures are classified by a deep-learning method based on a lightweight network, detecting character pictures versus non-character pictures and further dividing character pictures into complex-background and simple-background classes; next, personalized features representing picture content are extracted for each of the two classes; finally, rapid matching is performed with the method corresponding to the extracted features, improving matching speed while preserving accuracy.
Background
With the development of the internet, smartphones and communication technology, character-picture data on the internet is growing rapidly. This data brings abundant information and great convenience to people's life and work, but has also become a main channel for spreading incitement, violence, sensitive speech, false information and the like, causing great harm to national security, social stability and everyday life. Internet character pictures take characters as their main body and evade supervision through format mixing, handwriting and similar means, so for the real-time blocking of illegal character pictures it is necessary to design a method that can effectively extract character-picture features and match them quickly.
Character pictures on the network mainly take two forms: complex-background character pictures, formed by embedding character data into a complex background, and simple-background character pictures similar to long microblog screenshots. Because character pictures are so varied, matching them with a single kind of feature is not robust; the pictures must first be detected and classified by their characteristics, and a personalized feature-extraction method then selected to achieve fast and accurate matching. Internet pictures spread quickly, while OCR methods are computationally heavy and hard to run in real time; even with AI accelerators the system cost remains high, and current OCR methods still cannot accurately recognize handwriting, scene text, mixed-layout text and the like. To detect and accurately identify illegal character pictures in real time, a fast and effective detection and classification method must therefore be designed, since it directly affects subsequent matching performance. Traditional detection and classification methods usually extract global or local picture features and classify them with an SVM classifier; but internet pictures are large in volume and varied in content, and the traditional methods lack the generalization and real-time processing capability that practical application requires. In recent years deep learning has become the mainstream approach thanks to its excellent performance in image classification and recognition, yet deep neural networks have large parameter counts and low running speed and cannot be used directly for rapid picture detection and classification.
For the two classes of character pictures obtained after detection and classification, the key to further improving matching performance is improving the personalized feature-expression capability of the pictures. A complex-background character picture contains many interference factors, but the character region is still the key region of the picture; matching technologies for such pictures mainly comprise depth-feature-based and local-feature-based character-picture matching. Depth-feature-based matching uses a deep neural network to detect text of arbitrary shape in the picture and, after the character region is detected, measures similarity on the extracted depth features to complete matching; but such methods rarely consider running time and efficiency, which limits them in practical environments. Local-feature-based matching extracts local picture features with a local feature operator and can effectively represent complex-background character pictures; the main operators are SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and ORB (Oriented FAST and Rotated BRIEF), among which ORB features have expressive power similar to SIFT and SURF but detection one to two orders of magnitude faster, so they can represent complex-background character pictures effectively and meet real-time matching requirements. A simple-background character picture is a pure character picture, and the mainstream recognition technology for it is Optical Character Recognition (OCR): OCR recognises the text content in the picture, reconstructs an electronic document consistent with the character content of the original, and then applies text matching; but OCR accuracy is insufficient for heavily deformed or handwritten characters, its computing requirements are high and its processing speed low, so it cannot meet real-time character-picture matching requirements. Image feature words, by contrast, are image features formed by detecting character regions with morphological processing, encoding the relevant regions, and concatenating them in a fixed order; they represent simple-background character pictures effectively, and both feature extraction and subsequent matching are fast, meeting real-time matching requirements.
Therefore, the invention provides a rapid character-picture matching method combining a lightweight network with personalized feature extraction. First, complex-background and simple-background character pictures are screened out of internet pictures according to background complexity, and ORB features or feature words are extracted respectively; finally, matching is performed with the method corresponding to the extracted feature type: ORB-feature similarity is measured by directly computing vector distance, feature-word similarity by computing the repetition rate, and the matching result is returned.
Disclosure of Invention
The invention provides a rapid character-picture matching method combining a lightweight network with personalized feature extraction. First, according to the complexity of the picture background, two classes of character pictures are detected and classified from internet pictures with a lightweight network, and personalized features are extracted according to the distinct characteristics of each class: ORB features are extracted from complex-background character pictures, since they match well on pictures with many edges and corners and have affine invariance; feature words are extracted from simple-background, character-rich pictures, since they encode the characters of a region rather than single characters, can distinguish different character combinations, and have controllable length. Finally, matching is performed with the method corresponding to the extracted feature type: ORB-feature similarity is measured by the Manhattan distance of the vectors, and feature-word similarity by the repetition rate, achieving character-picture matching that is both robust and real-time. The main flow of the method is shown in FIG. 1 and can be divided into the following steps: first, pictures are divided into two classes according to the background complexity of the character picture, and ORB features or feature words are extracted respectively; finally, matching is performed with the corresponding method for the extracted feature type, ORB features by directly computing vector distance and feature words by computing the feature-word repetition rate, and the matching result is returned.
1. Character picture detection classification based on lightweight network
The invention uses a lightweight network to detect and classify character pictures. With the development of deep learning, researchers have proposed many kinds of lightweight neural networks; the MobileNet family offers small size, low computation and high accuracy, giving it a clear advantage among lightweight networks. MobileNet-V3 is the latest member of the family and comes in two versions, MobileNet-V3 large and MobileNet-V3 small; MobileNet-V3 small has fewer parameters and better suits the real-time requirements of the invention.
MobileNet-V3 small has a distinctive MobileNet-V3 block structure, shown in FIG. 2. It first converts the input channels to expansion channels with a 1x1 convolution; a depthwise separable convolution (Dwise) is then applied to the expansion channels, greatly improving the network's computational efficiency; a pooling operation follows, and a lightweight attention model with an SE (Squeeze-and-Excitation) structure is optionally used to linearly connect the channels, exploiting the relationships among feature channels to strengthen the network's learning capacity; finally a 1x1 convolution is applied and the result is added to the input, forming an inverted residual structure with a linear bottleneck that makes the network deeper, the model smaller and the speed higher. MobileNet-V3 small replaces the swish activation with H-swish, which avoids loss of numerical precision during quantization while preserving running speed.
The invention trains the model with about 60,000 complex-background character pictures, 60,000 simple-background character pictures and 300,000 other pictures (natural pictures, icon pictures and the like), and calls the model from C++ to detect and classify character pictures.
2. Personalized feature extraction of character pictures
Character-picture feature extraction is the key step in character-picture matching, and the expressive power of the features directly determines the matching effect. The invention extracts personalized features from the detected and classified complex-background or simple-background character pictures: ORB features from complex-background character pictures and feature words from simple-background character pictures.
(1) ORB feature extraction of complex background character class pictures
An ORB feature matrix is extracted for the complex-background character picture; compared with operators such as SIFT and SURF, the ORB operator preserves expressive power while reducing feature-extraction time. After the ORB operator is extracted, the ORB feature matrix is coded with the VLAD (Vector of Locally Aggregated Descriptors) method to obtain a VLAD feature vector; finally, PCA (Principal Component Analysis) dimension reduction is applied to the VLAD feature vector to obtain the final feature vector, effectively reducing subsequent matching time. The specific steps are as follows.
ORB feature extraction
ORB feature extraction is fast and rotation invariant, and mainly comprises two steps: oFAST (Oriented FAST, Features from Accelerated Segment Test) keypoint detection and rBRIEF (Rotated BRIEF, Binary Robust Independent Elementary Features) feature description.
oFAST keypoint detection determines feature points by comparing the pixel value of a point with the pixel values around it. First, the brightness of the central pixel is compared with that of the 16 surrounding pixels on a circle of radius 3, given a threshold h and central brightness $I_p$: if 9 or more of the surrounding pixels are all brighter than $I_p + h$ or all darker than $I_p - h$, the point is judged a feature point. Non-maximum suppression then prevents selecting several feature points in closely adjacent areas. Next, a scale factor scaleFactor and pyramid layer count nlevels are set, the original image is scaled down into nlevels images, and the union of the feature points extracted from the nlevels differently scaled images serves as the picture's oFAST feature points. Finally, the centroid of each feature point within a radius r is computed as a moment, and the vector from the feature-point coordinates to the centroid serves as the feature point's orientation. The pyramid operation gives the oFAST features scale invariance, and oFAST is computationally simple with high extraction speed.
The rBRIEF feature description gives the ORB algorithm rotation invariance, meaning it can detect the same keypoints in images rotated by any angle. First, 256 point pairs are selected in the 31 x 31 neighbourhood of each extracted oFAST feature point, with coordinates following a Gaussian distribution $(0, S^2/25)$; the random pixel pairs are then rotated by the keypoint's orientation angle so that the random points share the keypoint's direction, yielding rotation invariance; finally rBRIEF compares the intensities of the random pixel pairs and assigns 1 or 0 accordingly to create a 256-bit binary string descriptor. The set of feature vectors created for all keypoints in the image is called the ORB descriptor.
VLAD coding
A clustering algorithm is then applied to the ORB features to obtain k cluster centers, and VLAD coding produces $v_{i,j}$, where $v_{i,j}$ is the sum, over the feature points x assigned to cluster center $c_i$, of the difference between dimension j of x and dimension j of $c_i$. The invention selects 32 cluster centers, so the resulting $v_{i,j}$ has dimension 32 × 256; connecting the rows end to end yields the VLAD-coded ORB feature V of dimension d = 32 × 256 = 8192. VLAD coding preserves the distance from each feature point to its nearest cluster center and takes every dimension of the feature point into account, describing local image information more finely without losing information.
PCA dimensionality reduction
Finally, PCA dimension reduction is applied to the VLAD features, mapping them to a low-dimensional space of dimension f = 1024. The PCA method operates on the feature matrix $M_{n,d}$ formed by stacking the VLAD features of n images, merging similar features and reducing feature dimension, which helps prevent overfitting; at the same time, dimension reduction speeds up the algorithm and reduces the memory needed to store the data.
(2) Feature word extraction of simple background character pictures
The invention extracts feature words from simple-background character pictures. First the LBP (Local Binary Pattern) features of the image are extracted; LBP is an operator describing local image texture, with notable advantages such as rotation invariance and gray-scale invariance. After extracting the LBP features, the invention preprocesses the picture and detects the character regions, then computes LBP feature histograms within the individual regions to generate feature-word vectors. The specific steps are as follows.
LBP feature extraction
The original LBP operator is defined in a 3 × 3 window: the central pixel of the window is taken as a threshold, and the gray values of the 8 neighbouring pixels are compared with it; if a surrounding pixel's value exceeds the central pixel's value, its position is marked 1, otherwise 0. Comparing the 8 points in the 3 × 3 neighbourhood thus yields an 8-bit binary number (usually converted to decimal, i.e. an LBP code, 256 kinds in all), the LBP value of the window's central pixel, which reflects the texture information of the region. After the LBP value of every pixel is obtained, the LBP codes are grouped into 10 classes by the edge class each code corresponds to, and the class is used as the pixel value to generate the LBP image.
LBP histogram statistics
When performing the LBP histogram statistics, it is necessary to pre-process the character pictures to obtain the character regions in the pictures, and then perform the histogram statistics on the character regions respectively. The method comprises the following specific steps:
1) the character picture is binarised with the Otsu method, and if the white area of the binarised image is larger than the black area the image is inverted to obtain a new binary image;
2) one opening and one closing operation are applied to the black-and-white binary image, the contours of the image's white regions are extracted, and a rectangular bounding box is generated for each contour;
3) rectangular frames smaller than a user-defined value N are removed;
4) a closing operation (convolution kernel size 3 × 3) is applied to the original black-and-white binary image, and an LBP image is generated with the LBP feature-extraction method above;
5) the LBP image histogram inside each rectangular frame is counted and quantised to obtain feature words, which are encoded as 64-bit integers; for example, 'a' is encoded as '0000', 'b' as '0001', and 'ab' as '00000001', so the feature words of each image finally consist of several 64-bit integers.
3. Fast matching of character pictures
The method matches with the method corresponding to the extracted feature type: ORB-feature similarity is measured by directly computing vector distance, feature-word similarity by computing the repetition rate; the similarity is compared with a hit threshold, and pictures whose distance is below the threshold are returned. The specific measurements are as follows.
(1) Similarity measurement of ORB features of complex-background character pictures
When matching, the Manhattan distance $d_o$ between the feature vector of the picture to be matched and a feature vector in the database is used as the image-similarity criterion:

$$d_o = \sum_{i=1}^{N} |x_i - y_i| \tag{1}$$

The larger $d_o$, the farther the distance and the lower the image similarity, and vice versa. The smallest distance $d_o^{\min}$ is then compared with the hit threshold; if it is below the threshold the match is judged successful, otherwise it fails.
(2) Similarity measurement of feature words of simple-background character pictures
When matching, the feature word strings of the picture to be retrieved and of a database picture are compared, checking whether each feature word of the query string occurs in the database string. Occurring words are counted as matched feature words, and the proportion of matched words is the image-similarity criterion. Let the two picture feature word strings be $W_A = \{a_1, \ldots, a_m\}$ and $W_B = \{b_1, \ldots, b_n\}$, where each $a_i$ and $b_j$ is a 64-bit integer in the feature word string and m, n are the lengths of the two strings. The metric distance $d_w$ between feature words is

$$d_w = 1 - \frac{\operatorname{count}(W_A, W_B)}{\min(m, n)} \tag{2}$$

where count is a counting function returning the number of identical feature words in the two strings. The minimum value $d_w^{\min}$ is compared with T; if it is less than T the match succeeds, otherwise it fails.
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
First, the invention uses a lightweight network to detect and classify character pictures from internet pictures by their characteristics, dividing them into complex-background and simple-background pictures so that personalized features can subsequently be extracted per class, improving feature expressiveness. Second, the invention applies VLAD coding and PCA dimension reduction to the ORB features extracted from complex-background pictures, reducing subsequent matching time; the dimension reduction removes part of the redundancy, and experiments show it also improves matching accuracy slightly. Third, the invention extracts feature words from simple-background pictures, which consist mainly of characters without many interference factors, and computes the distance between feature words during matching, effectively reducing matching time. Experiments show the method makes comprehensive and efficient use of character-picture content and meets character-picture matching requirements with both robustness and real-time performance.
Description of the drawings:
FIG. 1 is a flow chart of the character-picture detection and rapid matching method combining a lightweight network and personalized feature extraction;
FIG. 2 is a schematic diagram of the MobileNet-V3 block structure;
FIG. 3 is a flow chart of character-picture detection and classification based on the lightweight network;
FIG. 4 is a flow chart of ORB feature extraction for complex-background character pictures;
FIG. 5 is a schematic diagram of oFAST feature extraction;
FIG. 6 is a flow chart of feature-word extraction for simple-background character pictures;
FIG. 7 is a schematic diagram of the LBP edge feature classes.
Detailed Description
Based on the above description, a specific implementation flow follows, though the scope of protection of this patent is not limited to it. The specific workflow of the invention is: first, the lightweight network MobileNet-V3 small detects and classifies character pictures among internet pictures, yielding complex-background and simple-background character pictures; for complex-background character pictures, ORB features are extracted and subjected to VLAD coding and PCA dimension reduction, lowering redundancy among features and the time needed for subsequent matching, and finally picture similarity is measured with the Manhattan distance and the matching result returned; for simple-background pictures, the character regions are segmented, LBP features are extracted over the whole picture, the LBP histograms of the corresponding character regions are collected, and finally picture similarity is measured in Hamming space and the matching result returned.
1. Character picture detection classification based on lightweight network
The complex-background and simple-background character pictures among the internet pictures are first detected and classified.
The MobileNet-V3 small network used in the invention has a distinctive MobileNet-V3 block structure, shown in FIG. 2. First, a 1x1 convolution converts the input channels into expansion channels; a depthwise separable convolution is then applied to the expansion channels, greatly improving the network's computational efficiency; next, a pooling operation is applied and the lightweight attention model of the SE structure is optionally used to linearly connect the channels, with the following specific steps:
1) a global average pooling operation;
2) feature dimension reduction through the 1st fully-connected layer; after a ReLU activation, the features are expanded back to the original dimension through the 2nd fully-connected layer;
$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$
3) a normalized weight between 0 and 1 is obtained through H-sigmoid and weighted onto the features of each channel:

$$\text{H-sigmoid}(x) = \frac{\mathrm{ReLU6}(x+3)}{6} \tag{4}$$
The SE structure exploits the relationships among feature channels to strengthen the network's learning capacity. Finally, a 1x1 convolution is applied to the channels and the result is added to the input, forming an inverted residual structure with a linear bottleneck that makes the network deeper, the model smaller and the speed higher. MobileNet-V3 small replaces the swish activation with H-swish, computed as:

$$\text{H-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x+3)}{6} \tag{5}$$
where x is the input value; this avoids loss of numerical precision during quantization and preserves running speed. The specification of the MobileNet-V3 small network used in the invention is shown in Table 1, where the operator "bneck" denotes a MobileNet-V3 block structure.
TABLE 1 MobileNet-V3 small network specification
[Table 1 is reproduced as an image in the original publication and is not recoverable here.]
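For illustration, the activations in equations (3)-(5) can be sketched in C++ as follows (a minimal sketch; the function names are ours, not the patent's):

```cpp
#include <algorithm>

// Activations used by MobileNet-V3 small, per eqs. (3)-(5).
float relu(float x)     { return std::max(0.0f, x); }                  // eq. (3)
float relu6(float x)    { return std::min(std::max(x, 0.0f), 6.0f); }
float hsigmoid(float x) { return relu6(x + 3.0f) / 6.0f; }             // eq. (4)
float hswish(float x)   { return x * relu6(x + 3.0f) / 6.0f; }         // eq. (5)
```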
The invention trains the model with about 60,000 complex-background character pictures, 60,000 simple-background character pictures and 300,000 other pictures (natural pictures, icon pictures and the like). During training the batch size is set to 128, the input is a 224 × 224 × 3 picture, and the RMSprop (Root Mean Square Prop) optimisation algorithm is used, computed as follows:
$$S_{dw} = \beta S_{dw} + (1-\beta)\, dw^2 \tag{6}$$

$$w = w - \alpha \frac{dw}{\sqrt{S_{dw}} + \varepsilon} \tag{7}$$
where dw is the gradient, $S_{dw}$ accumulates the exponentially weighted average of the squared gradient and scales the update, α (default 0.001) is the learning rate, β (default 0.9) weights past gradients against the current one, and ε is a small constant preventing division by zero. A learning-rate schedule is used during training: if the loss does not change for 3 iterations, the learning rate is reduced by a factor of 0.4. Early stopping is also set: if the loss stops changing for several iterations, training ends early; the model trained by this method stopped automatically after 30 iterations.
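A minimal C++ sketch of one RMSprop update per eqs. (6)-(7); the epsilon guard is a standard addition, not stated in the text:

```cpp
#include <cmath>
#include <vector>

// One RMSprop step per eqs. (6)-(7); alpha = 0.001 and beta = 0.9 follow the text.
void rmsprop_step(std::vector<float>& w, const std::vector<float>& dw,
                  std::vector<float>& s_dw,
                  float alpha = 0.001f, float beta = 0.9f, float eps = 1e-8f) {
    for (size_t i = 0; i < w.size(); ++i) {
        s_dw[i] = beta * s_dw[i] + (1.0f - beta) * dw[i] * dw[i];  // eq. (6)
        w[i]   -= alpha * dw[i] / (std::sqrt(s_dw[i]) + eps);      // eq. (7)
    }
}
```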
The flow of calling the model for character-picture detection and classification is shown in FIG. 3: a 3x3 convolution first extracts image features, a convolution module made of 11 MobileNet-V3 block structures then learns them, Avg-pool (average pooling) reduces the computation, and finally a 1x1 convolution converts the result into the final class, completing detection and classification. Experiments show that detection and classification of a single picture takes as little as 4.4 ms, meeting the real-time and robustness requirements.
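As an illustration of calling a trained model from C++, the following sketch uses OpenCV's dnn module; the model file format, input preprocessing and class order are assumptions, not taken from the patent:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>

// Classify one picture with a trained classifier (hypothetical ONNX export of
// MobileNet-V3 small); returns the arg-max class index.
int classify(const cv::Mat& img, cv::dnn::Net& net) {
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0 / 255.0, cv::Size(224, 224),
                                          cv::Scalar(), /*swapRB=*/true);
    net.setInput(blob);
    cv::Mat prob = net.forward();          // 1 x 3 class scores (assumed order)
    cv::Point cls;
    cv::minMaxLoc(prob.reshape(1, 1), nullptr, nullptr, nullptr, &cls);
    return cls.x;  // e.g. 0 = complex background, 1 = simple background, 2 = other
}
```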
2. Personalized feature extraction of character pictures
Character-picture feature extraction is the key step in character-picture matching, and the expressive power of the features directly determines the matching effect. Character pictures take many forms, and a single feature cannot adapt to all of them, so the invention integrates the ORB operator and feature words and designs a personalized feature-extraction method, strengthening the features' ability to express character-picture content and achieving fast, accurate character-picture matching.
The specific integration is as follows. For complex-background character pictures, ORB features are extracted: their expressive power resembles that of other local features but extraction is faster, and ORB extraction parameters and subsequent PCA parameters suited to character-picture matching are determined experimentally, so that picture content is expressed accurately while feature extraction remains fast, giving fast and robust picture matching. For simple-background pictures, a distinctive feature-word extraction scheme is designed: the picture's LBP features are extracted and reduced from 256 dimensions to 10 via the edge-class scheme, and each individual feature-word character is then encoded as a 4-bit binary number, so that every feature word can be represented by a 40-bit binary integer (stored in 64 bits), raising matching speed while reducing feature redundancy.
Against the problem that a single feature cannot adapt to the changeable forms of character pictures, the invention integrates the ORB operator and feature words and designs a feature-extraction scheme targeted at the character-picture matching task, effectively improving character-picture matching efficiency.
2.1 ORB feature extraction for complex-background character pictures
An ORB feature matrix is extracted for the complex-background character picture; the extraction steps are shown in FIG. 4. Compared with operators such as SIFT and SURF, the ORB operator preserves expressive power while reducing feature-extraction time. After the ORB operator is extracted, the ORB feature matrix is coded with the VLAD method to obtain a VLAD feature vector; finally, PCA dimension reduction is applied to obtain the final feature vector, effectively reducing subsequent matching time. The specific steps are as follows.
ORB feature extraction
First, oFAST keypoint detection is performed: each pixel point p in the image is compared with the 16 pixels on a circle of radius 3 around it, as shown in FIG. 5. A threshold h is set, and p is compared with the surrounding pixels as follows:

$$S_x = \begin{cases} d, & I_x \le I_p - h \\ s, & I_p - h < I_x < I_p + h \\ b, & I_x \ge I_p + h \end{cases} \tag{8}$$

where $I_x$ is the brightness of a surrounding pixel and $I_p$ that of the central pixel p; d means the pixel is darker than $I_p$, b brighter, and s similar. If 9 or more of the surrounding pixels have $S_x$ equal to d or b, the point is a keypoint. A non-maximum suppression algorithm then resolves multiple feature points at adjacent positions: a response is computed for each feature point as the sum of absolute deviations between p and its 16 surrounding pixels, and among closely adjacent feature points only the one with the larger response is kept, the rest being deleted.
A scale factor scaleFactor and pyramid layer count nlevels are then set, and the original image is scaled down into nlevels images. All feature points extracted from the differently scaled images and the original image serve as the image's feature points. Finally, the centroid of each feature point within a radius r is computed as a moment, and the vector from the feature-point coordinates to the centroid serves as the feature point's orientation.
Finally, rBRIEF feature description is performed on the extracted oFAST feature points. First, 256 point pairs are selected from the 31 × 31 neighbourhood of each oFAST feature point, with coordinates following a Gaussian distribution $(0, S^2/25)$, and collected into a matrix

$$D = \begin{pmatrix} x_1 & x_2 & \cdots & x_{2n} \\ y_1 & y_2 & \cdots & y_{2n} \end{pmatrix} \tag{9}$$

Multiplying D by the rotation matrix $R_\theta$ rotates the point pairs by the keypoint orientation θ, giving a new point set:

$$D_\theta = R_\theta D \tag{10}$$

Finally, comparing the intensities of the point pairs at the rotated positions generates a 256-bit binary string descriptor.
To reduce the time required to extract ORB features, the invention sets the threshold h to 20 to cut the number of oFAST feature points, and nlevels to 3 to cut the number of pyramid layers. So that the extracted ORB features keep a degree of scale invariance despite the reduced pyramid depth, scaleFactor is set to 1.3. Also, to reduce the time required by subsequent VLAD coding, the feature points are ranked by score on top of the reduced total, and only the 400 highest-scoring feature points are kept for rBRIEF description, where the score of a feature point is the number of surrounding pixels whose $S_x$ is d or b.
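With OpenCV, these parameter choices map directly onto cv::ORB::create; a sketch follows (OpenCV keeps the highest-scoring keypoints by Harris score, which stands in for the $S_x$-based score described above):

```cpp
#include <vector>
#include <opencv2/features2d.hpp>

// ORB configured per the text: FAST threshold h = 20, nlevels = 3,
// scaleFactor = 1.3, at most 400 keypoints kept.
cv::Mat extract_orb(const cv::Mat& gray) {
    cv::Ptr<cv::ORB> orb = cv::ORB::create(
        /*nfeatures=*/400, /*scaleFactor=*/1.3f, /*nlevels=*/3,
        /*edgeThreshold=*/31, /*firstLevel=*/0, /*WTA_K=*/2,
        cv::ORB::HARRIS_SCORE, /*patchSize=*/31, /*fastThreshold=*/20);
    std::vector<cv::KeyPoint> kps;
    cv::Mat desc;            // up to 400 x 32 bytes = 256-bit rBRIEF strings
    orb->detectAndCompute(gray, cv::noArray(), kps, desc);
    return desc;
}
```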
VLAD coding
The ORB features are then VLAD encoded. The VLAD feature coding formula is:

$$v_{i,j} = \sum_{x :\, \mathrm{NN}(x) = c_i} (x_j - c_{i,j}) \tag{11}$$

where x is an image feature point, $c_i$ is one of the k cluster centers obtained by K-means clustering of the image's feature points (k is set to 32), and NN(x) is the cluster center closest to x. $v_{i,j}$ is the sum, over the feature points x assigned to center $c_i$, of the differences between dimension j of x and dimension j of $c_i$. With feature dimension j = 256 and i = 32, $v_{i,j}$ has dimension 32 × 256; connecting the rows end to end yields the VLAD-coded ORB feature V of dimension d = 32 × 256 = 8192.
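A minimal C++ sketch of eq. (11), assuming the rBRIEF descriptors have been unpacked to 256-dimensional float vectors and the 32 centers come from k-means:

```cpp
#include <cfloat>
#include <opencv2/core.hpp>

// VLAD coding per eq. (11): accumulate residuals of each descriptor against
// its nearest of the 32 cluster centers, then flatten to a 1 x 8192 vector.
cv::Mat vlad_encode(const cv::Mat& desc,      // n x 256, CV_32F
                    const cv::Mat& centers) { // 32 x 256, CV_32F
    cv::Mat v = cv::Mat::zeros(centers.rows, desc.cols, CV_32F);
    for (int r = 0; r < desc.rows; ++r) {
        int best = 0; double bestDist = DBL_MAX;
        for (int c = 0; c < centers.rows; ++c) {
            double d = cv::norm(desc.row(r), centers.row(c), cv::NORM_L2);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        v.row(best) += desc.row(r) - centers.row(best);  // residual sum
    }
    return v.reshape(1, 1);                              // VLAD feature V
}
```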
PCA dimensionality reduction
Finally, PCA dimension reduction is applied, reducing dimension d to f and removing redundancy among the features. First the VLAD-coded d-dimensional features of n images are obtained and stacked into a feature matrix $M_{n,d}$ of n d-dimensional vectors, whose covariance matrix $S_{d,d}$ is

$$S_{d,d} = \frac{1}{n}\,(M_{n,d} - \bar{M})^{\top} (M_{n,d} - \bar{M}) \tag{12}$$

where $S_{d,d}$ has dimension d × d and $\bar{M}$ is the per-dimension mean. The eigenvalues $\lambda_d = [\lambda_1, \lambda_2, \ldots, \lambda_f, \ldots, \lambda_d]$ of $S_{d,d}$ are then solved, and the eigenvectors of the f largest eigenvalues are assembled into a reduction matrix $w_{d,f}$; finally the reduction is computed directly as

$$z_{n,f} = M_{n,d} \times w_{d,f} \tag{13}$$

where $z_{n,f}$ is the reduced feature matrix. Saving the reduction matrix $w_{d,f}$ allows the dimension of a single feature vector to be reduced, which corresponds to the n = 1 case of equation (13).
Repeated tests show that the value of f strongly influences final matching accuracy and speed: f is directly proportional to the time spent on feature matching and PCA projection, while its relationship to matching accuracy is not linear. The invention sets f to 1024, keeping matching precision sufficiently high while reducing feature-matching and PCA-projection time.
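With OpenCV this reduction can be sketched via cv::PCA; fitting corresponds to eqs. (12)-(13), and projecting a single vector is the n = 1 case:

```cpp
#include <opencv2/core.hpp>

// Fit PCA on the n x 8192 matrix of VLAD vectors (rows are samples), keeping
// f = 1024 components.
cv::PCA fit_pca(const cv::Mat& M, int f = 1024) {
    return cv::PCA(M, cv::noArray(), cv::PCA::DATA_AS_ROW, f);
}

// Project one VLAD vector (n = 1 case of eq. (13)).
cv::Mat reduce(const cv::PCA& pca, const cv::Mat& v) {
    return pca.project(v);  // 1 x 1024 reduced feature vector
}
```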
2.2 Feature-word extraction for simple-background character pictures
Feature words are extracted for simple-background character pictures: the LBP features of the image are extracted first, LBP feature histograms are then counted, and feature-word vectors are generated. The feature-word extraction flow is shown in FIG. 6; the specific steps are as follows.
LBP feature extraction
First the LBP values of the picture are extracted. The original LBP operator is defined in a 3 × 3 window: the central pixel of the window is taken as a threshold and the gray values of the 8 neighbouring pixels are compared with it:

$$\mathrm{LBP}(c) = \sum_{p=0}^{7} S\big(I(p) - I(c)\big)\, 2^{p} \tag{14}$$

where I(p) is the gray value of the p-th non-central pixel in the window, I(c) the gray value of the central pixel c, and S(·) the threshold function

$$S(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{15}$$

If a surrounding pixel value is not less than the central pixel value, its position is marked 1, otherwise 0. Comparing the 8 points in the 3 × 3 neighbourhood thus yields an 8-bit binary number (usually converted to decimal, i.e. an LBP code, 256 kinds in all), the LBP value of the window's central pixel, which reflects the texture information of the region.
To reduce the dimensionality of the subsequent LBP features and improve matching speed and feature expressiveness, the number of LBP codes is reduced from 256 to 10 by a hand-designed grouping of the codes. The scheme is shown in FIG. 7: after the LBP value of each pixel is extracted, the pixel is assigned to one of 10 classes by the edge class its LBP corresponds to, where u denotes the number of 0/1 transitions, and LBP codes with more than 2 transitions go to the 10th class. Note that centrally symmetric LBP codes belong to the same class (i.e. a cyclic left shift by 4 bits, e.g. 00000011 and 00110000 both belong to class 3), and widened edges of the same kind also share a class (e.g. 00110000 and 01111000 both belong to class 3); finally the LBP image is generated with the edge class as the pixel value. This shortens the subsequently generated feature words and removes redundant information in the original LBP features, improving matching precision.
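A C++ sketch of the LBP image computation per eqs. (14)-(15); since the exact FIG. 7 lookup table is not reproduced in the text, edge_class() below implements the stated "more than 2 transitions go to class 10" rule and uses an assumed grouping for the remaining codes:

```cpp
#include <opencv2/core.hpp>

// Stand-in for the FIG. 7 lookup: codes with more than 2 circular 0/1
// transitions go to class 10, as stated; grouping the remaining (uniform)
// codes by their number of set bits is our assumption, not the patent's table.
int edge_class(unsigned char code) {
    int u = 0, ones = 0;
    for (int p = 0; p < 8; ++p) {
        u    += ((code >> p) & 1) != ((code >> ((p + 1) % 8)) & 1);
        ones += (code >> p) & 1;
    }
    return (u > 2) ? 10 : 1 + ones;   // assumed classes 1..9 for uniform codes
}

// 3x3 LBP per eqs. (14)-(15), writing the 10-class value into the image.
cv::Mat lbp_image(const cv::Mat& gray) {              // gray: CV_8U
    cv::Mat out = cv::Mat::zeros(gray.size(), CV_8U);
    const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    for (int y = 1; y < gray.rows - 1; ++y)
        for (int x = 1; x < gray.cols - 1; ++x) {
            unsigned char c = gray.at<unsigned char>(y, x), code = 0;
            for (int p = 0; p < 8; ++p)          // eq. (14): S(I(p)-I(c)) * 2^p
                code |= (gray.at<unsigned char>(y + dy[p], x + dx[p]) >= c) << p;
            out.at<unsigned char>(y, x) =
                static_cast<unsigned char>(edge_class(code));
        }
    return out;
}
```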
LBP histogram statistics
When performing the LBP histogram statistics, it is necessary to pre-process the character pictures to obtain the character regions in the pictures, and then perform the histogram statistics on the character regions respectively. The method comprises the following specific steps:
1) the character picture G is binarised with the Otsu method to obtain a binary image $G_{bin}$; the Otsu method selects the segmentation threshold t that maximises the between-class variance:

$$t^{*} = \arg\max_{t}\; \omega_0(t)\,\omega_1(t)\,\big(\mu_0(t) - \mu_1(t)\big)^{2} \tag{16}$$

where $\omega_0, \omega_1$ are the foreground and background pixel proportions and $\mu_0, \mu_1$ their mean gray values. If the white area of the binary image is larger than the black area, the image is inverted to obtain a new binary image $G_{bin}$;
2) an opening and a closing operation (convolution kernel size 5 × 5) are applied to the black-and-white binary image $G_{bin}$ to obtain an image $G_{tmp}$; the contours of the white regions of $G_{tmp}$ are extracted and a rectangular bounding box is generated for each contour, $B = [b_1, b_2, \ldots, b_j, \ldots, b_n]$;
3) rectangular frames smaller than the user-defined value N are removed;
4) for original black and white binary image GbinPerforming a closed loop operation (convolution kernel size of 3 × 3), and generating LBP image G by the LBP feature extraction methodlbp
5) the histogram of the LBP image inside each rectangular frame is counted as the LBP feature; the i-th dimension of the feature is the number of pixels whose value equals i in the LBP image, and since there are 10 LBP edge classes the feature has 10 dimensions:

$$F_i^{j} = \operatorname{count}\big(p \in b_j : G_{lbp}(p) = i\big), \quad i = 1, \ldots, 10 \tag{17}$$

where $b_j$ is the j-th rectangular frame and count is a counting function returning the number of pixels in the LBP image whose value equals i.
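Steps 1) to 5) can be sketched with OpenCV as follows, reusing lbp_image() from the sketch above; the reading of "size smaller than N" as a minimum box side is an assumption:

```cpp
#include <array>
#include <vector>
#include <opencv2/imgproc.hpp>

// Steps 1)-5): Otsu binarisation, open/close, contour bounding boxes, LBP
// image, then one 10-bin histogram per surviving box (eq. (17)).
std::vector<std::array<int, 10>> region_histograms(const cv::Mat& gray, int N) {
    cv::Mat bin;
    cv::threshold(gray, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    if (cv::countNonZero(bin) > static_cast<int>(bin.total() / 2))
        bin = 255 - bin;                                           // step 1)
    cv::Mat k5 = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
    cv::Mat tmp;
    cv::morphologyEx(bin, tmp, cv::MORPH_OPEN, k5);
    cv::morphologyEx(tmp, tmp, cv::MORPH_CLOSE, k5);               // step 2)
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(tmp, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    cv::Mat k3 = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(bin, bin, cv::MORPH_CLOSE, k3);               // step 4)
    cv::Mat lbp = lbp_image(bin);                                  // classes 1..10
    std::vector<std::array<int, 10>> feats;
    for (const auto& c : contours) {
        cv::Rect b = cv::boundingRect(c);
        if (b.width < N || b.height < N) continue;                 // step 3)
        std::array<int, 10> h{};                                   // eq. (17)
        for (int y = b.y; y < b.y + b.height; ++y)
            for (int x = b.x; x < b.x + b.width; ++x) {
                int v = lbp.at<unsigned char>(y, x);
                if (v >= 1 && v <= 10) h[v - 1]++;
            }
        feats.push_back(h);
    }
    return feats;
}
```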
The LBP feature is quantised with the user-defined value Q as quantisation factor:

$$F_q^{i} = \left\lfloor \frac{F_i}{Q} \right\rfloor \tag{18}$$
the quantized LBP feature Fq iAnd obtaining a characteristic word as an index lookup table:
TABLE 2 characteristic word index Table
Figure BDA0002721348910000154
Finally, to speed up picture matching, the invention encodes the feature words from character strings into 64-bit integers; for example, 'a' is encoded as '0000', 'b' as '0001', and 'ab' as '00000001', so that the feature words of each image finally consist of several 64-bit integers.
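A sketch of eq. (18) plus the 64-bit packing: ten 4-bit codes occupy 40 of the 64 bits; table_lookup() stands in for Table 2, which is not reproduced here:

```cpp
#include <array>
#include <cstdint>

// Stand-in for the Table 2 index lookup (the table is an image in the
// original): here the quantised value itself is clamped to 4 bits.
uint8_t table_lookup(int q) { return static_cast<uint8_t>(q > 15 ? 15 : q); }

uint64_t encode_word(const std::array<int, 10>& hist, int Q) {
    uint64_t word = 0;
    for (int i = 0; i < 10; ++i) {
        int q = hist[i] / Q;                         // eq. (18)
        word = (word << 4) | (table_lookup(q) & 0xF);
    }
    return word;                                     // one 64-bit feature word
}
```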
3. Fast matching of character pictures
The method matches with the method corresponding to the extracted feature type: ORB-feature similarity is measured by directly computing vector distance, feature-word similarity by computing the repetition rate. A hit threshold T is set, the similarity distance is compared with it, and pictures whose distance is below the threshold are returned. The hit threshold T is set as follows:
1) the feature distances between each template picture in the database and the other template pictures are computed, giving the quartiles $p_{75}$ and $p_{25}$, the median $p_{50}$, and the minimum distance m.
2) an outlier threshold $T_o$ is computed:

$$T_o = p_{25} - 1.5\,(p_{75} - p_{25}) \tag{19}$$
3) if the minimum distance is smaller than the outlier threshold, the outlier threshold is taken as the picture's hit threshold; otherwise the minimum distance is taken as the hit threshold T:

$$T = \begin{cases} T_o, & m < T_o \\ m, & m \ge T_o \end{cases} \tag{20}$$
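A C++ sketch of the threshold computation; eq. (19) is used in the reconstructed Tukey-fence form given above, and the quartiles are taken by simple index selection:

```cpp
#include <algorithm>
#include <vector>

// Per-picture hit threshold T per eqs. (19)-(20); d holds the distances from
// one template picture to every other template picture.
double hit_threshold(std::vector<double> d) {
    std::sort(d.begin(), d.end());
    double p25 = d[d.size() / 4];
    double p75 = d[(3 * d.size()) / 4];
    double m   = d.front();                   // minimum distance
    double To  = p25 - 1.5 * (p75 - p25);     // eq. (19), reconstructed
    return (m < To) ? To : m;                 // eq. (20)
}
```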
the specific measurement modes are divided into the following two types of calculation respectively.
3.1 Similarity measurement of ORB features of complex-background character pictures
When matching, the Manhattan distance between the feature vector of the picture to be matched and a feature vector in the database is used as the image-similarity criterion. Let the two picture feature vectors be $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_N)$, where $x_i$ and $y_i$ are the values of the i-th dimension and N is the vector length; the feature-vector distance is

$$d_o = \sum_{i=1}^{N} |x_i - y_i| \tag{21}$$

The larger $d_o$, the farther the distance and the lower the image similarity; conversely, the smaller, the higher. After comparison with the whole database, the minimum distance $d_o^{\min}$ is compared with T: if it is less than T the match succeeds; otherwise it fails.
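A minimal C++ sketch of eq. (21) and the nearest-neighbour scan over the database of reduced 1024-dimensional vectors:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Manhattan distance per eq. (21).
float manhattan(const std::vector<float>& x, const std::vector<float>& y) {
    float d = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) d += std::fabs(x[i] - y[i]);
    return d;
}

// A picture hits if its nearest database vector is closer than the threshold T.
bool match_orb(const std::vector<float>& q,
               const std::vector<std::vector<float>>& db, float T) {
    float best = std::numeric_limits<float>::max();
    for (const auto& v : db) best = std::min(best, manhattan(q, v));
    return best < T;
}
```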
3.2 Similarity measurement of feature words of simple-background character pictures
When matching, the feature word strings of the picture to be retrieved and of a database picture are compared, checking whether the feature words of the query string occur in the database string. Occurring words count as matched feature words, and the proportion of matched words is the image-similarity criterion: the higher the proportion, the higher the image similarity, and conversely the lower. Let the two picture feature word strings be $W_A = \{a_1, \ldots, a_m\}$ and $W_B = \{b_1, \ldots, b_n\}$, where each $a_i$ and $b_j$ is a 64-bit integer in the feature word string and m, n are the lengths of the two strings. The metric distance $d_w$ between feature words is

$$d_w = 1 - \frac{\operatorname{count}(W_A, W_B)}{\min(m, n)} \tag{22}$$

where count is a counting function returning the number of identical feature words in the two strings. The minimum value $d_w^{\min}$ is compared with T: if it is less than T the match succeeds; otherwise it fails.
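A C++ sketch of eq. (22); duplicate feature words in the query are each counted, which is one possible reading of the counting function:

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Repetition-rate distance per eq. (22) between two feature-word strings.
double word_distance(const std::vector<uint64_t>& a,
                     const std::vector<uint64_t>& b) {
    std::unordered_set<uint64_t> setB(b.begin(), b.end());
    size_t hits = 0;
    for (uint64_t w : a) hits += setB.count(w);          // count(W_A, W_B)
    size_t denom = std::min(a.size(), b.size());
    return 1.0 - static_cast<double>(hits) / static_cast<double>(denom);
}
```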
3.3 Performance comparison results
To verify the performance of the proposed method, Table 3 compares its processing performance with that of other methods. The test platform is an Intel Core i5-7500 CPU running Ubuntu 16.04 LTS in single-threaded mode. The results show that, compared with other character-picture matching methods, the proposed method greatly improves both precision and matching speed and is better suited to fast, accurate character-picture matching.
TABLE 3 Comparison of character-picture processing performance

Method           Accuracy    Time consuming (ms)
BOVW [1]         86.3%       >800
SIFT [2]         85.6%       >500
The invention    92.3%       <100

[1] Shekhar R, Jawahar C V. Word Image Retrieval Using Bag of Visual Words [C]. Document Analysis Systems, 2012: 297-301.
[2] Ubul K, Yadikar N, Amat A, et al. Uyghur document image retrieval based on gradient co-occurrence matrix [C] // 2015 Chinese Automation Congress (CAC). Wuhan, China: IEEE, 2015: 762-.

Claims (8)

1. A character-picture detection and rapid matching method combining a lightweight network and personalized feature extraction, characterized in that: first, internet pictures are classified by a deep-learning method based on a lightweight network, detecting character pictures versus non-character pictures and further dividing character pictures into complex-background and simple-background classes, from which ORB features or feature words are extracted respectively; finally, matching is performed with the method corresponding to the extracted feature type, ORB features by directly computing vector distance and feature words by computing the feature-word repetition rate, and the matching result is returned;
character-picture feature extraction is the key step in character-picture matching, and the expressive power of the features directly determines the matching effect; personalized features are extracted from the detected and classified complex-background or simple-background character pictures, ORB features from complex-background character pictures and feature words from simple-background character pictures.
2. The character-picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 1, characterized in that: the ORB features of complex-background character pictures are extracted as follows: an ORB feature matrix is extracted for the complex-background character picture; after the ORB operator is extracted, the ORB feature matrix is coded with the VLAD method to obtain a VLAD feature vector; finally, PCA dimension reduction is applied to the VLAD feature vector to obtain the final feature vector, reducing subsequent matching time.
3. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 2, characterized in that: ORB feature extraction comprises two steps, oFAST key point detection and rBRIEF feature description;
firstly, oFAST key point detection determines feature points by comparing the value of a pixel with the values of the pixels around it; the brightness of a central pixel is compared with the brightness of the pixels on a circle around it, given a threshold h and a central brightness value I_p: if the brightness values of the surrounding pixels are all greater than I_p + h or all less than I_p - h, the point is judged to be a feature point; non-maximum suppression is then applied to avoid selecting several feature points in closely adjacent areas; next, a scaling factor scaleFactor and a number of pyramid levels nlevels are set, the original image is shrunk by the scaling factor into nlevels images, and the feature points extracted from the nlevels images at different scales are pooled as the oFAST feature points of the image; finally, the centroid of the patch within radius r of each feature point is computed from image moments, and the vector from the feature point to this centroid defines the feature point's direction;
secondly, the rBRIEF feature description gives the ORB algorithm rotation invariance: first, a point set whose coordinates follow a Gaussian distribution is selected in the neighbourhood of each extracted oFAST feature point; then the random pixel pairs are rotated by the direction angle of the key point, so that the random points are aligned with the key point's direction and rotation invariance is obtained; finally, rBRIEF compares the intensities of the random pixel pairs and assigns 1 or 0 accordingly to create a corresponding 256-bit binary string descriptor vector, and the set of all descriptor vectors created for all key points in the image is called the ORB descriptor.
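For illustration only (not part of the claims): OpenCV's ORB implementation exposes the scaleFactor, nlevels and FAST-threshold parameters named in claim 3; a minimal sketch, with illustrative parameter values and a hypothetical file name:

```python
import cv2

# Illustrative parameter values; the patent does not fix them.
orb = cv2.ORB_create(nfeatures=500,    # key points kept after non-max suppression
                     scaleFactor=1.2,  # pyramid scaling factor
                     nlevels=8,        # number of pyramid levels
                     fastThreshold=20) # brightness threshold h of oFAST

img = cv2.imread("text_image.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors is an N x 32 uint8 array: N rBRIEF descriptors of 256 bits each.
```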
4. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 3, characterized in that: a clustering algorithm is used to obtain k cluster centres for the ORB features, and VLAD coding is carried out to obtain v_ij, where v_ij is the sum, over all feature points x assigned to the cluster with centre c_i, of the differences between each dimension j of x and the corresponding dimension of c_i, i.e. v_ij = Σ_{x ∈ cluster i} (x_j - c_ij);
PCA dimension reduction is then applied to the VLAD feature f, mapping it to a low-dimensional space; the PCA method solves the feature matrix M_{n,d} formed by stacking the VLAD features of n images, merging similar feature dimensions, which reduces the feature dimensionality and prevents overfitting.
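For illustration only (not part of the claims): a minimal Python sketch of VLAD encoding followed by PCA, assuming scikit-learn's KMeans and PCA; the cluster count k and the output dimensionality are illustrative choices, not values from the patent:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def vlad_encode(descriptors, kmeans):
    """Sum, per cluster centre, the residuals of the descriptors assigned
    to it, then flatten and L2-normalise (the v_ij of claim 4)."""
    k, d = kmeans.n_clusters, descriptors.shape[1]
    assignments = kmeans.predict(descriptors)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members) > 0:
            v[i] = (members - kmeans.cluster_centers_[i]).sum(axis=0)
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Offline: fit the codebook and the PCA on a training set.
# kmeans = KMeans(n_clusters=64, n_init=10).fit(training_descriptors)
# pca = PCA(n_components=128).fit(training_vlad_matrix)   # the M_{n,d} of claim 4
# Online: final = pca.transform(vlad_encode(desc.astype(np.float32), kmeans)[None, :])
```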
5. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 1, characterized in that: feature words are extracted for simple-background character pictures; the LBP features of the image are extracted first, and because a simple-background character picture may contain several character regions, after LBP feature extraction the picture is preprocessed and the character regions are detected; LBP feature histogram statistics are then computed region by region to generate the feature word vector.
6. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 5, characterized in that: in LBP feature extraction, the central pixel of a window is taken as the threshold and the grey values of the adjacent pixels are compared with it; if a surrounding pixel's value is greater than the central pixel's value, that position is marked 1, otherwise 0; this yields the LBP value of the window's central pixel, which reflects the texture information of the region; after the LBP value of every pixel has been obtained, each pixel is assigned a value according to the edge-type class corresponding to its LBP value, and the LBP image is generated from these values.
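For illustration only (not part of the claims): a minimal NumPy sketch of the basic 3x3 LBP comparison described in claim 6, using the strict greater-than test of the claim:

```python
import numpy as np

def lbp_image(gray):
    """Compute the 8-neighbour LBP value of every interior pixel."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = gray[1:-1, 1:-1]
    # 8 neighbours in clockwise order starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour > centre).astype(np.uint8) << bit
    return out
```

The subsequent mapping of LBP values to edge-type classes (the last step of claim 6) would be applied on top of this raw LBP image.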
7. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 5, characterized in that: in the LBP histogram statistics, the character picture is first preprocessed to obtain the character regions in the picture, and histogram statistics are then computed for each character region separately; the specific steps are as follows (an illustrative sketch of steps 1) to 3) is given after the list):
1) binarize the character picture with the Otsu method; if the white area of the binarized image is larger than the black area, invert the image to obtain a new binary image;
2) apply one opening operation and one closing operation to the black-and-white binary image, extract the contours of the white regions of the image and generate a rectangular bounding box for each contour;
3) remove the rectangular boxes whose size is smaller than the user-defined value N;
4) apply a closing operation to the original black-and-white binary image and generate the LBP image with the LBP feature extraction method described above;
5) compute the LBP image histogram inside each rectangular box and quantize it to obtain feature words, encoding the feature words as 64-bit integers; for example, if 'a' is encoded as '0000' and 'b' as '0001', then 'ab' is encoded as '00000001'; the feature words of each image finally consist of several 64-bit integers.
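For illustration only (not part of the claims): a minimal OpenCV sketch of the region-detection steps 1) to 3); the morphology kernel size and the minimum box size N are illustrative values:

```python
import cv2
import numpy as np

def text_regions(gray, min_size=15):
    """Return bounding boxes of candidate character regions (steps 1-3)."""
    # 1) Otsu binarisation; invert if the white area dominates.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    if np.count_nonzero(binary) > binary.size // 2:
        binary = cv2.bitwise_not(binary)
    # 2) one opening then one closing, then contour bounding boxes.
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # 3) drop boxes smaller than the user-defined value N.
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if w >= min_size and h >= min_size]
```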
8. The character picture detection and rapid matching method combining a lightweight network and personalized feature extraction according to claim 1, characterized in that: matching is carried out with the method corresponding to the extracted image feature type; the similarity of ORB features is measured by directly computing the vector distance, the similarity of feature words is measured by computing the repetition rate, the result is compared with a hit threshold, and pictures whose distance is smaller than the threshold are returned; the specific methods are as follows:
(1) similarity measurement of the ORB features of complex-background character pictures;
during matching, the Manhattan distance d_o between the feature vector of the picture to be matched and each feature vector in the database is used as the basis for judging image similarity; d_o is computed as
d_o = Σ_{i=1}^{d} |x_i - y_i|
where x and y are the two d-dimensional feature vectors; the larger the value of d_o, the farther the distance and the lower the image similarity, and conversely the smaller the value, the higher the similarity; once the minimum distance d_o^min over the database has been obtained, if it is smaller than the hit threshold the match is judged successful, otherwise the match fails;
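For illustration only (not part of the claims): a minimal NumPy sketch of this L1 nearest-neighbour test; the hit threshold is an illustrative value:

```python
import numpy as np

def match_orb(query, db_features, hit_threshold=0.8):
    """Return the index of the best database match, or None on failure."""
    d = np.abs(db_features - query).sum(axis=1)  # d_o per database image
    best = int(np.argmin(d))
    return best if d[best] < hit_threshold else None
```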
(2) similarity measurement of the feature words of simple-background character pictures;
during image matching, the feature word string of the image to be retrieved is compared with the feature word strings of the images in the database, checking whether the feature words in the string of the image to be retrieved also occur in the feature word string of the database image; words occurring in both strings are regarded as matched feature words, and the proportion of matched feature words is used as the image similarity measure; let the feature word strings of the two pictures be W_1 = {w_1^1, w_1^2, ..., w_1^m} and W_2 = {w_2^1, w_2^2, ..., w_2^n}, where each w_1^i and w_2^j is a 64-bit integer in the feature word string and m and n are the lengths of the two feature word strings; the metric distance d_w between the feature word strings is computed as
d_w = 1 - count(W_1, W_2) / min(m, n)
where count is a counting function that counts the number of identical feature words in the two feature word strings; once the minimum d_w value has been obtained, it is compared with T; if it is smaller than T the match succeeds, otherwise the match fails.
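For illustration only (not part of the claims): a minimal Python sketch of the repetition-rate test above; since the formula images of the published text are not recoverable, the normalisation by min(m, n) and the threshold value are assumptions:

```python
def match_words(query_words, db_words, T=0.5):
    """Feature-word matching: True if the repetition-rate distance d_w < T."""
    if not query_words or not db_words:
        return False
    common = len(set(query_words) & set(db_words))  # count(W_1, W_2)
    d_w = 1.0 - common / min(len(query_words), len(db_words))
    return d_w < T
```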
CN202011088800.5A 2020-10-13 2020-10-13 Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction Active CN112560858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011088800.5A CN112560858B (en) 2020-10-13 2020-10-13 Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction

Publications (2)

Publication Number Publication Date
CN112560858A (en) 2021-03-26
CN112560858B (en) 2023-04-07

Family

ID=75041230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011088800.5A Active CN112560858B (en) 2020-10-13 2020-10-13 Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction

Country Status (1)

Country Link
CN (1) CN112560858B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262478A1 (en) * 2014-09-09 2017-09-14 Thomson Licensing Method and apparatus for image retrieval with feature learning
US20180053293A1 (en) * 2016-08-19 2018-02-22 Mitsubishi Electric Research Laboratories, Inc. Method and System for Image Registrations
CN111460247A (en) * 2019-01-21 2020-07-28 重庆邮电大学 Automatic detection method for network picture sensitive characters
CN110070090A (en) * 2019-04-25 2019-07-30 上海大学 A kind of logistic label information detecting method and system based on handwriting identification
CN111275697A (en) * 2020-02-10 2020-06-12 西安交通大学 Battery silk-screen quality detection method based on ORB feature matching and LK optical flow method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHEN GENG et al.: "A Comparative Study of Local Feature Extraction Algorithms for Web Pornographic Image Recognition", Proceedings of 2015 IEEE International Conference on Progress in Informatics and Computing *
LIU Yi et al.: "A block-search two-level recognition method for low-quality Chinese characters", Journal of Computer-Aided Design & Computer Graphics *
WEI Wenle et al.: "ORB-LBP feature matching algorithm with fused descriptors", Electronics Optics & Control *
YU Song et al.: "Character recognition in natural backgrounds based on convolutional neural networks", Computer Applications and Software *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021651A (en) * 2021-11-04 2022-02-08 桂林电子科技大学 Block chain violation information perception method based on deep learning
CN114021651B (en) * 2021-11-04 2024-03-29 桂林电子科技大学 Block chain illegal information sensing method based on deep learning
CN114978840A (en) * 2022-05-13 2022-08-30 天津理工大学 Physical layer safety and high spectrum efficiency communication method in wireless network
CN114978840B (en) * 2022-05-13 2023-08-18 天津理工大学 Physical layer safety and high-spectrum efficiency communication method in wireless network
CN114973285A (en) * 2022-05-26 2022-08-30 中国平安人寿保险股份有限公司 Image processing method and apparatus, device, and medium

Similar Documents

Publication Publication Date Title
Shi et al. Script identification in the wild via discriminative convolutional neural network
CN112560858B (en) Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction
Bui et al. Using grayscale images for object recognition with convolutional-recursive neural network
US7519201B2 (en) Detecting humans via their pose
Bhunia et al. Text recognition in scene image and video frame using color channel selection
Zhang et al. Small Object Detection via Precise Region-Based Fully Convolutional Networks.
JP2014232533A (en) System and method for OCR output verification
CN111079514A (en) Face recognition method based on CLBP and convolutional neural network
CN105335760A (en) Image number character recognition method
Sampath et al. Decision tree and deep learning based probabilistic model for character recognition
Sampath et al. Fuzzy-based multi-kernel spherical support vector machine for effective handwritten character recognition
CN110738672A (en) image segmentation method based on hierarchical high-order conditional random field
Lin et al. Low‐complexity face recognition using contour‐based binary descriptor
Lee et al. ILBPSDNet: based on improved local binary pattern shallow deep convolutional neural network for character recognition
Chen et al. Image retrieval based on quadtree classified vector quantization
CN112070116B (en) Automatic artistic drawing classification system and method based on support vector machine
Chua et al. Visual IoT: ultra-low-power processing architectures and implications
CN105844299B (en) A kind of image classification method based on bag of words
Montazer et al. Farsi/Arabic handwritten digit recognition using quantum neural networks and bag of visual words method
Gorokhovatskyi et al. Image Pair Comparison for Near-duplicates Detection
Zhou et al. Morphological Feature Aware Multi-CNN Model for Multilingual Text Recognition.
Li et al. A pre-training strategy for convolutional neural network applied to Chinese digital gesture recognition
Yadav et al. A survey: comparative analysis of different variants of local binary pattern
Shiravale et al. Recent advancements in text detection methods from natural scene images
Cheng et al. A local feature descriptor based on Local Binary Patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant