CN111967488A - Mobile phone shot text image matching method based on twin convolutional neural network - Google Patents

Mobile phone shot text image matching method based on twin convolutional neural network

Info

Publication number
CN111967488A
Authority
CN
China
Prior art keywords
image, mobile phone, convolutional neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010576688.3A
Other languages
Chinese (zh)
Inventor
刘丽 (Liu Li)
胡煜鑫 (Hu Yuxin)
邱桃荣 (Qiu Taorong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202010576688.3A
Publication of CN111967488A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method for matching text images shot by a mobile phone, based on a twin (Siamese) convolutional neural network. The method comprises two main steps: image feature extraction and image similarity calculation. A twin convolutional neural network is trained to extract features from text images shot by a mobile phone; the network has two branches, each consisting of a GhostNet, and the two branches share weights. During training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function. By increasing the diversity of the training samples, the extracted features are made robust to the uneven illumination, defocus blur, projective transformation, and similar problems of images shot by a mobile phone. Any text image is then input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature. Finally, the similarity of two images is computed as the Euclidean distance between their features.

Description

Mobile phone shot text image matching method based on twin convolutional neural network
Technical Field
The invention belongs to the field of image processing, and particularly relates to a method for matching text images shot by a mobile phone based on a twin convolutional neural network.
Background
With the popularization of smartphones, people often photograph text that interests them in the physical world and then use the resulting images to search for related information on the internet or in digital libraries, so an effective method for matching text images shot by mobile phones is urgently needed.
Images shot by a mobile phone are easily degraded by factors such as uneven illumination and defocus blur; moreover, because shooting angles vary, such images usually undergo severe projective transformation, which makes subsequent matching very challenging. Traditional methods for matching text images shot by a mobile phone first segment each word in the image, then describe the image by the spatial relationships among the words, representing one image with multiple feature vectors. However, word segmentation in a text image is itself a difficult problem: because the quality of text images shot by a mobile phone is generally poor, over-segmentation or the merging of several words often occurs, which harms subsequent matching.
To solve this problem, the invention proposes a method for matching text images shot by a mobile phone based on a twin convolutional neural network. A twin convolutional neural network is trained to extract image features, and by increasing the diversity of the positive and negative samples in the training set, the extracted features are made highly robust to uneven illumination, defocus blur, projective transformation, and similar problems, thereby avoiding the word segmentation required by traditional methods and the problems it causes.
Disclosure of Invention
The invention provides a method for matching text images shot by a mobile phone based on a twin convolutional neural network, which comprises the following steps:
A twin convolutional neural network is trained to extract image features. The network has two branches, each consisting of a GhostNet, and the two branches share weights. During training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function of the network.
To extract features that are robust to the uneven illumination, defocus blur, projective transformation, and similar problems of text images shot by a mobile phone, the diversity of the training samples is increased. Specifically, the two images forming a positive sample come from the same text: one is shot with a mobile phone and the other is obtained by other means, such as scanning with a scanner, so the two images differ markedly in illumination, viewing angle, brightness, resolution, and so on. The two images forming a negative sample come from different texts.
The trained twin convolutional neural network can extract the features of any text image: the image is input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature.
Based on the extracted features, the similarity of two images is computed as the Euclidean distance between their features.
The invention has the beneficial effects that:
the method for matching the text images shot by the mobile phone based on the twin convolutional neural network has good robustness to the problems of uneven illumination, view angle transformation, defocusing blur and the like of the images shot by the mobile phone.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a twin convolutional neural network proposed by the present invention;
FIG. 3 is a flow chart of image feature extraction using a trained twin convolutional neural network proposed by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. Except for the contents specifically described below, the procedures, conditions, and experimental methods used to carry out the invention are common knowledge in the art, and the invention is not particularly limited in these respects.
The invention provides a method for matching text images shot by a mobile phone based on a twin convolutional neural network, which comprises two main parts: image feature extraction and image similarity calculation. To extract image features, a twin convolutional neural network is first trained; the network has two branches, each consisting of a GhostNet, the two branches share weights, and contrastive loss is used as the loss function. Any image is then input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature. The similarity of two images is computed as the Euclidean distance between their features.
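As a concrete illustration, the following minimal PyTorch sketch shows the twin structure just described. It assumes the third-party timm library for the GhostNet backbone (model name "ghostnet_100") and an illustrative embedding size of 256; neither of these details is given in the patent, so the sketch is one possible realization rather than the patented implementation itself.

```python
# Minimal sketch of the twin (Siamese) network: two branches that are in
# fact one weight-shared GhostNet followed by a final fully connected layer.
# Assumes the `timm` library; "ghostnet_100" and embedding_dim=256 are
# illustrative choices, not specified in the patent.
import torch
import torch.nn as nn
import timm


class SiameseGhostNet(nn.Module):
    def __init__(self, embedding_dim: int = 256):
        super().__init__()
        # A single backbone instance, so both branches share all weights.
        self.backbone = timm.create_model(
            "ghostnet_100", pretrained=False, num_classes=0)  # pooled features
        self.fc = nn.Linear(self.backbone.num_features, embedding_dim)

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        # One branch: GhostNet features, then the final fully connected
        # layer whose output serves as the image feature.
        return self.fc(self.backbone(x))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # A training pair (I1, I2): each image passes through one branch.
        return self.forward_once(x1), self.forward_once(x2)
```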
To extract image features, a twin convolutional neural network is trained; its structure is shown in FIG. 2. The network has two branches, each consisting of a GhostNet, and the two branches share weights. Two images I1 and I2 are input into the network each time, and each image passes through one branch. The last layer of each branch is a fully connected layer, whose outputs are denoted S1 and S2. The loss function of the network is defined as follows:
L = (1/2N) Σ [ y·d² + (1 - y)·max(m - d, 0)² ]
where the sum runs over the N training pairs, y is the label of a training pair (y = 1 if the two input images form a positive sample; y = 0 if they form a negative sample), and m is a margin threshold set to 1.5. d denotes the Euclidean distance between the outputs S1 and S2 produced by the network for the two images I1 and I2, defined as follows:
d = ||S1 - S2||2
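In code, this loss could look as follows. This is a sketch of the standard contrastive-loss form matching the definitions above; the exact normalization constant of the patent's original formula is not recoverable from the source, so the conventional 1/2 factor is assumed.

```python
# Contrastive loss as defined above: y = 1 for positive pairs, y = 0 for
# negative pairs, margin m = 1.5, d = Euclidean distance between the two
# branch outputs. The 1/2 factor follows the conventional formulation.
import torch


def contrastive_loss(s1: torch.Tensor, s2: torch.Tensor,
                     y: torch.Tensor, m: float = 1.5) -> torch.Tensor:
    d = torch.norm(s1 - s2, p=2, dim=1)          # per-pair distance
    loss = y * d.pow(2) + (1 - y) * torch.clamp(m - d, min=0).pow(2)
    return loss.mean() / 2                       # average over the N pairs
```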
in order to enable the network to have strong robustness to the problems of uneven illumination, defocused blurring, projective transformation and the like of text images shot by a mobile phone, a training sample with diversity is provided for the network. In particular, the two images constituting the positive sample are from the same text, one of which is taken by a mobile phone and the other of which is obtained by other means, such as scanning with a scanner. The two images have obvious differences in the aspects of illumination, visual angle, brightness, resolution and the like; the two images that constitute the negative examples are from different texts.
The network is trained with the averaged stochastic gradient descent (ASGD) algorithm, with the learning rate set to 0.00001 and the weight decay coefficient set to 0.0005.
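A minimal training-loop sketch with these hyperparameters follows, using PyTorch's torch.optim.ASGD and reusing SiameseGhostNet and contrastive_loss from the earlier sketches; train_loader is an assumed DataLoader yielding (img1, img2, label) batches and is not shown.

```python
# Training with averaged SGD, learning rate 1e-5, weight decay 5e-4, as
# stated above. SiameseGhostNet and contrastive_loss come from the earlier
# sketches; train_loader is an assumed DataLoader of (img1, img2, label).
import torch

model = SiameseGhostNet()
optimizer = torch.optim.ASGD(model.parameters(), lr=0.00001, weight_decay=0.0005)

model.train()
for img1, img2, label in train_loader:
    s1, s2 = model(img1, img2)                  # each image through one branch
    loss = contrastive_loss(s1, s2, label.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```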
The trained network is used to extract image features: as in training, any image is input into either branch of the trained network, and the output of the network's last fully connected layer is taken as the image feature, as shown in FIG. 3.
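Concretely, feature extraction could look like the sketch below, reusing the model from the earlier sketches; the 224x224 input size is an assumption, since the patent does not state the input resolution.

```python
# Extracting the feature of a single image with the trained network: the
# image passes through one branch, and the output of the final fully
# connected layer is its feature (cf. FIG. 3).
import torch

model.eval()
with torch.no_grad():
    image = torch.rand(1, 3, 224, 224)    # stand-in for a preprocessed image
    feature = model.forward_once(image)   # the image feature S
```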
Image similarity calculation
Given two images I and I' whose features are denoted S and S' respectively, the similarity s(I, I') of the two images is computed as the Euclidean distance between their features, using the formula below; the smaller the Euclidean distance, the more similar the two images, and the larger the distance, the less similar they are.
s(I, I') = ||S - S'||2
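In code, this similarity measure is simply the Euclidean distance between two feature vectors produced as in the extraction sketch above:

```python
# Euclidean-distance similarity: a smaller value means the two images are
# more similar; feat_a and feat_b are features from the extraction step.
import torch


def similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    return torch.norm(feat_a - feat_b, p=2)
```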
The method for matching text images shot by a mobile phone based on a twin convolutional neural network is thus highly robust to the uneven illumination, viewing-angle changes, defocus blur, and similar problems of images shot by a mobile phone.

Claims (5)

1. A method for matching text images shot by a mobile phone based on a twin convolutional neural network, characterized by comprising the following two steps:
the method comprises the following steps: the image features are extracted, so that the method has good robustness to the problems of uneven illumination, out-of-focus blur, projection transformation and the like frequently existing in the text images shot by the mobile phone.
Step two: based on the extracted image features, calculate the similarity of two images using the Euclidean distance.
2. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 1, characterized in that: the feature extraction of step one is realized by training a twin convolutional neural network, wherein the network has two branches, each consisting of a GhostNet, and the two branches share weights.
3. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 2, characterized in that: during training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function of the network, defined as follows:
L = (1/2N) Σ [ y·d² + (1 - y)·max(m - d, 0)² ]
where the sum runs over the N training pairs; y is the label of a training pair: y = 1 if the two input images form a positive sample and y = 0 if they form a negative sample; m is a margin threshold, set to 1.5;
d denotes the Euclidean distance between the outputs S1 and S2 produced by the network for the two images I1 and I2, defined as follows:
d = ||S1 - S2||2
the features of any text image can be extracted by utilizing the trained twin convolutional neural network; specifically, an image is input into any branch of a trained twin convolutional neural network, and the last layer output of the GhostNet is used as the characteristic of the image.
4. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 1, characterized in that: to extract features that are robust for text images shot by a mobile phone, step one increases the diversity of the samples in the training set; specifically, the two images forming a positive sample come from the same text, one shot with a mobile phone and the other obtained by other means, the two images differing markedly in illumination, viewing angle, brightness, resolution, and so on; the two images forming a negative sample come from different texts.
5. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 4, characterized in that: the other image is obtained by scanning with a scanner.
CN202010576688.3A 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network Pending CN111967488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010576688.3A CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010576688.3A CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Publications (1)

Publication Number Publication Date
CN111967488A 2020-11-20

Family

ID=73361887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010576688.3A Pending CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Country Status (1)

Country Link
CN (1) CN111967488A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529390A (en) * 2020-12-02 2021-03-19 平安医疗健康管理股份有限公司 Task allocation method and device, computer equipment and storage medium
CN112712078A (en) * 2020-12-31 2021-04-27 上海智臻智能网络科技股份有限公司 Text detection method and device
CN113065645A (en) * 2021-04-30 2021-07-02 华为技术有限公司 Twin attention network, image processing method and device
CN114049634A (en) * 2022-01-12 2022-02-15 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114239630A (en) * 2021-11-03 2022-03-25 广东科学技术职业学院 Method and device for detecting copied two-dimensional code and readable medium
WO2022134728A1 (en) * 2020-12-25 2022-06-30 苏州浪潮智能科技有限公司 Image retrieval method and system, and device and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281745A (en) * 2013-06-03 2013-09-04 南昌大学 Wireless sensing network routing method of quotient topology energy hierarchical game
CN110704712A (en) * 2019-09-20 2020-01-17 武汉大学 Scene picture shooting position range identification method and system based on image retrieval

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281745A (en) * 2013-06-03 2013-09-04 南昌大学 Wireless sensing network routing method of quotient topology energy hierarchical game
CN110704712A (en) * 2019-09-20 2020-01-17 武汉大学 Scene picture shooting position range identification method and system based on image retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GLOVER, F. et al.: "Parametric ghost image process for fixed-charge problems: A study of transportation networks", Journal of Heuristics *
HAN, Kai et al.: "GhostNet: More Features from Cheap Operations", https://arxiv.org/abs/1911.11907 *
QIU, Taorong et al.: "Improved quantum search algorithm and its application in solving core attributes" (in Chinese), Computer Engineering and Applications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529390A (en) * 2020-12-02 2021-03-19 平安医疗健康管理股份有限公司 Task allocation method and device, computer equipment and storage medium
WO2022134728A1 (en) * 2020-12-25 2022-06-30 苏州浪潮智能科技有限公司 Image retrieval method and system, and device and medium
CN112712078A (en) * 2020-12-31 2021-04-27 上海智臻智能网络科技股份有限公司 Text detection method and device
CN113065645A (en) * 2021-04-30 2021-07-02 华为技术有限公司 Twin attention network, image processing method and device
CN113065645B (en) * 2021-04-30 2024-04-09 华为技术有限公司 Twin attention network, image processing method and device
CN114239630A (en) * 2021-11-03 2022-03-25 广东科学技术职业学院 Method and device for detecting copied two-dimensional code and readable medium
CN114049634A (en) * 2022-01-12 2022-02-15 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114049634B (en) * 2022-01-12 2022-05-13 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111967488A (en) Mobile phone shot text image matching method based on twin convolutional neural network
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN107679491B (en) 3D convolutional neural network sign language recognition method fusing multimodal data
CN109583340B (en) Video target detection method based on deep learning
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
EP3772036A1 (en) Detection of near-duplicate image
CN111144376B (en) Video target detection feature extraction method
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN109657612B (en) Quality sorting system based on facial image features and application method thereof
CN109948721B (en) Video scene classification method based on video description
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
Konwer et al. Staff line removal using generative adversarial networks
CN112232351A (en) License plate recognition system based on deep neural network
CN111652231B (en) Casting defect semantic segmentation method based on feature self-adaptive selection
CN113011253A (en) Face expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112085017A (en) Tea tender shoot image segmentation method based on significance detection and Grabcut algorithm
CN111932645A (en) Method for automatically generating ink and wash painting based on generation countermeasure network GAN
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN111861935A (en) Rain removing method based on image restoration technology
CN112070686A (en) Backlight image cooperative enhancement method based on deep learning
CN116740572A (en) Marine vessel target detection method and system based on improved YOLOX
CN115713546A (en) Lightweight target tracking algorithm for mobile terminal equipment
Lei et al. Noise-robust wagon text extraction based on defect-restore generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201120