CN111967488A - Mobile phone shot text image matching method based on twin convolutional neural network - Google Patents

Mobile phone shot text image matching method based on twin convolutional neural network

Info

Publication number
CN111967488A
Authority
CN
China
Prior art keywords
image, mobile phone, convolutional neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010576688.3A
Other languages
Chinese (zh)
Inventor
刘丽 (Liu Li)
胡煜鑫 (Hu Yuxin)
邱桃荣 (Qiu Taorong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202010576688.3A
Publication of CN111967488A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method for matching text images shot by a mobile phone, based on a twin (Siamese) convolutional neural network. The method comprises two main steps: image feature extraction and image similarity calculation. A twin convolutional neural network is trained to extract features from text images shot by a mobile phone; the network has two branches, each consisting of a GhostNet, and the two branches share weights. During training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function. By increasing the diversity of the training samples, the extracted features are made robust to the uneven illumination, defocus blur, projective transformation, and similar problems of images shot by a mobile phone. Any text image is then input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature. Finally, the similarity of two images is computed as the Euclidean distance between their features.

Description

Mobile phone shot text image matching method based on twin convolutional neural network
Technical Field
The invention belongs to the field of image processing, and particularly relates to a method for matching text images shot by a mobile phone based on a twin convolutional neural network.
Background
With the popularization of smartphones, people often photograph text that interests them in the physical world and then use the resulting images to search for related information on the internet or in digital libraries, so an effective method for matching text images shot by mobile phones is urgently needed.
Images shot by a mobile phone are easily degraded by factors such as uneven illumination and defocus blur; moreover, because shooting angles vary, such images usually undergo severe projective transformation, which makes subsequent matching very challenging. Traditional methods for matching text images shot by a mobile phone first segment each word in the image, then describe the image by the spatial relationships among the words, representing one image with multiple feature vectors. However, word segmentation in a text image is itself a difficult problem: because the quality of text images shot by a mobile phone is generally poor, over-segmentation or the merging of several words often occurs, which harms subsequent matching.
To solve this problem, the invention proposes a method for matching text images shot by a mobile phone based on a twin convolutional neural network. A twin convolutional neural network is trained to extract image features, and by increasing the diversity of the positive and negative samples in the training set, the extracted features are made highly robust to uneven illumination, defocus blur, projective transformation, and similar problems, thereby avoiding the word segmentation required by traditional methods and the problems it causes.
Disclosure of Invention
The invention provides a method for matching text images shot by a mobile phone based on a twin convolutional neural network, which comprises the following steps:
A twin convolutional neural network is trained to extract image features. The network has two branches, each consisting of a GhostNet, and the two branches share weights. During training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function of the network.
To extract features that are robust to the uneven illumination, defocus blur, projective transformation, and similar problems of text images shot by a mobile phone, the diversity of the training samples is increased. Specifically, the two images forming a positive sample come from the same text: one is shot with a mobile phone and the other is obtained by other means, such as scanning with a scanner, so the two images differ markedly in illumination, viewing angle, brightness, resolution, and so on. The two images forming a negative sample come from different texts.
The trained twin convolutional neural network can extract the features of any text image: the image is input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature.
Based on the extracted features, the similarity of two images is computed as the Euclidean distance between their features.
The invention has the beneficial effects that:
the method for matching the text images shot by the mobile phone based on the twin convolutional neural network has good robustness to the problems of uneven illumination, view angle transformation, defocusing blur and the like of the images shot by the mobile phone.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a twin convolutional neural network proposed by the present invention;
FIG. 3 is a flow chart of image feature extraction using a trained twin convolutional neural network proposed by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. Except for the contents specifically described below, the procedures, conditions, and experimental methods used to carry out the invention are common knowledge in the art, and the invention is not particularly limited in these respects.
The invention provides a method for matching text images shot by a mobile phone based on a twin convolutional neural network, which comprises two main parts: image feature extraction and image similarity calculation. To extract image features, a twin convolutional neural network is first trained; the network has two branches, each consisting of a GhostNet, the two branches share weights, and contrastive loss is used as the loss function. Any image is then input into either branch of the trained network, and the output of the last layer of GhostNet is taken as the image feature. The similarity of two images is computed as the Euclidean distance between their features.
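As a concrete illustration, the following minimal PyTorch sketch shows the twin structure just described. It assumes the third-party timm library for the GhostNet backbone (model name "ghostnet_100") and an illustrative embedding size of 256; neither of these details is given in the patent, so the sketch is one possible realization rather than the patented implementation itself.

```python
# Minimal sketch of the twin (Siamese) network: two branches that are in
# fact one weight-shared GhostNet followed by a final fully connected layer.
# Assumes the `timm` library; "ghostnet_100" and embedding_dim=256 are
# illustrative choices, not specified in the patent.
import torch
import torch.nn as nn
import timm


class SiameseGhostNet(nn.Module):
    def __init__(self, embedding_dim: int = 256):
        super().__init__()
        # A single backbone instance, so both branches share all weights.
        self.backbone = timm.create_model(
            "ghostnet_100", pretrained=False, num_classes=0)  # pooled features
        self.fc = nn.Linear(self.backbone.num_features, embedding_dim)

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        # One branch: GhostNet features, then the final fully connected
        # layer whose output serves as the image feature.
        return self.fc(self.backbone(x))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # A training pair (I1, I2): each image passes through one branch.
        return self.forward_once(x1), self.forward_once(x2)
```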
To extract image features, a twin convolutional neural network is trained; its structure is shown in FIG. 2. The network has two branches, each consisting of a GhostNet, and the two branches share weights. Two images I1 and I2 are input into the network each time, and each image passes through one branch. The last layer of each branch is a fully connected layer, whose outputs are denoted S1 and S2. The loss function of the network is defined as follows:
L = (1/2N) Σ [ y·d² + (1 - y)·max(m - d, 0)² ]
where the sum runs over the N training pairs, y is the label of a training pair (y = 1 if the two input images form a positive sample; y = 0 if they form a negative sample), and m is a margin threshold set to 1.5. d denotes the Euclidean distance between the outputs S1 and S2 produced by the network for the two images I1 and I2, defined as follows:
d = ||S1 - S2||2
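In code, this loss could look as follows. This is a sketch of the standard contrastive-loss form matching the definitions above; the exact normalization constant of the patent's original formula is not recoverable from the source, so the conventional 1/2 factor is assumed.

```python
# Contrastive loss as defined above: y = 1 for positive pairs, y = 0 for
# negative pairs, margin m = 1.5, d = Euclidean distance between the two
# branch outputs. The 1/2 factor follows the conventional formulation.
import torch


def contrastive_loss(s1: torch.Tensor, s2: torch.Tensor,
                     y: torch.Tensor, m: float = 1.5) -> torch.Tensor:
    d = torch.norm(s1 - s2, p=2, dim=1)          # per-pair distance
    loss = y * d.pow(2) + (1 - y) * torch.clamp(m - d, min=0).pow(2)
    return loss.mean() / 2                       # average over the N pairs
```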
in order to enable the network to have strong robustness to the problems of uneven illumination, defocused blurring, projective transformation and the like of text images shot by a mobile phone, a training sample with diversity is provided for the network. In particular, the two images constituting the positive sample are from the same text, one of which is taken by a mobile phone and the other of which is obtained by other means, such as scanning with a scanner. The two images have obvious differences in the aspects of illumination, visual angle, brightness, resolution and the like; the two images that constitute the negative examples are from different texts.
The network is trained with the averaged stochastic gradient descent (ASGD) algorithm, with the learning rate set to 0.00001 and the weight decay coefficient set to 0.0005.
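A minimal training-loop sketch with these hyperparameters follows, using PyTorch's torch.optim.ASGD and reusing SiameseGhostNet and contrastive_loss from the earlier sketches; train_loader is an assumed DataLoader yielding (img1, img2, label) batches and is not shown.

```python
# Training with averaged SGD, learning rate 1e-5, weight decay 5e-4, as
# stated above. SiameseGhostNet and contrastive_loss come from the earlier
# sketches; train_loader is an assumed DataLoader of (img1, img2, label).
import torch

model = SiameseGhostNet()
optimizer = torch.optim.ASGD(model.parameters(), lr=0.00001, weight_decay=0.0005)

model.train()
for img1, img2, label in train_loader:
    s1, s2 = model(img1, img2)                  # each image through one branch
    loss = contrastive_loss(s1, s2, label.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```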
The trained network is used to extract image features: as in training, any image is input into either branch of the trained network, and the output of the network's last fully connected layer is taken as the image feature, as shown in FIG. 3.
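Concretely, feature extraction could look like the sketch below, reusing the model from the earlier sketches; the 224x224 input size is an assumption, since the patent does not state the input resolution.

```python
# Extracting the feature of a single image with the trained network: the
# image passes through one branch, and the output of the final fully
# connected layer is its feature (cf. FIG. 3).
import torch

model.eval()
with torch.no_grad():
    image = torch.rand(1, 3, 224, 224)    # stand-in for a preprocessed image
    feature = model.forward_once(image)   # the image feature S
```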
Image similarity calculation
Given two images I and I' whose features are denoted S and S' respectively, the similarity s(I, I') of the two images is computed as the Euclidean distance between their features, using the formula below; the smaller the Euclidean distance, the more similar the two images, and the larger the distance, the less similar they are.
s(I, I') = ||S - S'||2
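In code, this similarity measure is simply the Euclidean distance between two feature vectors produced as in the extraction sketch above:

```python
# Euclidean-distance similarity: a smaller value means the two images are
# more similar; feat_a and feat_b are features from the extraction step.
import torch


def similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    return torch.norm(feat_a - feat_b, p=2)
```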
The method for matching text images shot by a mobile phone based on a twin convolutional neural network is thus highly robust to the uneven illumination, viewing-angle changes, defocus blur, and similar problems of images shot by a mobile phone.

Claims (5)

1. A method for matching text images shot by a mobile phone based on a twin convolutional neural network, characterized by comprising the following two steps:
the method comprises the following steps: the image features are extracted, so that the method has good robustness to the problems of uneven illumination, out-of-focus blur, projection transformation and the like frequently existing in the text images shot by the mobile phone.
Step two: based on the extracted image features, calculate the similarity of two images using the Euclidean distance.
2. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 1, characterized in that: the feature extraction of step one is realized by training a twin convolutional neural network, wherein the network has two branches, each consisting of a GhostNet, and the two branches share weights.
3. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 2, characterized in that: during training, two text images are input into the network each time, each image passing through one branch, and contrastive loss is used as the loss function of the network, defined as follows:
L = (1/2N) Σ [ y·d² + (1 - y)·max(m - d, 0)² ]
where the sum runs over the N training pairs; y is the label of a training pair: y = 1 if the two input images form a positive sample and y = 0 if they form a negative sample; m is a margin threshold, set to 1.5;
d denotes the Euclidean distance between the outputs S1 and S2 produced by the network for the two images I1 and I2, defined as follows:
d = ||S1 - S2||2
the features of any text image can be extracted by utilizing the trained twin convolutional neural network; specifically, an image is input into any branch of a trained twin convolutional neural network, and the last layer output of the GhostNet is used as the characteristic of the image.
4. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 1, characterized in that: to extract features that are robust for text images shot by a mobile phone, step one increases the diversity of the samples in the training set; specifically, the two images forming a positive sample come from the same text, one shot with a mobile phone and the other obtained by other means, the two images differing markedly in illumination, viewing angle, brightness, resolution, and so on; the two images forming a negative sample come from different texts.
5. The method for matching text images shot by a mobile phone based on a twin convolutional neural network as claimed in claim 4, characterized in that: the other image is obtained by scanning with a scanner.
CN202010576688.3A 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network Pending CN111967488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010576688.3A CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010576688.3A CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Publications (1)

Publication Number Publication Date
CN111967488A 2020-11-20

Family

ID=73361887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010576688.3A Pending CN111967488A (en) 2020-06-22 2020-06-22 Mobile phone shot text image matching method based on twin convolutional neural network

Country Status (1)

Country Link
CN (1) CN111967488A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529390A (en) * 2020-12-02 2021-03-19 平安医疗健康管理股份有限公司 Task allocation method and device, computer equipment and storage medium
CN112712078A (en) * 2020-12-31 2021-04-27 上海智臻智能网络科技股份有限公司 Text detection method and device
CN113065645A (en) * 2021-04-30 2021-07-02 华为技术有限公司 Twin attention network, image processing method and device
CN114049634A (en) * 2022-01-12 2022-02-15 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114239630A (en) * 2021-11-03 2022-03-25 广东科学技术职业学院 Method and device for detecting copied two-dimensional code and readable medium
WO2022134728A1 (en) * 2020-12-25 2022-06-30 苏州浪潮智能科技有限公司 Image retrieval method and system, and device and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281745A (en) * 2013-06-03 2013-09-04 南昌大学 Wireless sensing network routing method of quotient topology energy hierarchical game
CN110704712A (en) * 2019-09-20 2020-01-17 武汉大学 Scene picture shooting position range identification method and system based on image retrieval

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281745A (en) * 2013-06-03 2013-09-04 南昌大学 Wireless sensing network routing method of quotient topology energy hierarchical game
CN110704712A (en) * 2019-09-20 2020-01-17 武汉大学 Scene picture shooting position range identification method and system based on image retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GLOVER, F. et al.: "Parametric ghost image process for fixed-charge problems: A study of transportation networks", Journal of Heuristics *
HAN, Kai et al.: "GhostNet: More Features from Cheap Operations", https://arxiv.org/abs/1911.11907 *
QIU, Taorong et al.: "Improved quantum search algorithm and its application in solving core attributes" (in Chinese), Computer Engineering and Applications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529390A (en) * 2020-12-02 2021-03-19 平安医疗健康管理股份有限公司 Task allocation method and device, computer equipment and storage medium
WO2022134728A1 (en) * 2020-12-25 2022-06-30 苏州浪潮智能科技有限公司 Image retrieval method and system, and device and medium
CN112712078A (en) * 2020-12-31 2021-04-27 上海智臻智能网络科技股份有限公司 Text detection method and device
CN113065645A (en) * 2021-04-30 2021-07-02 华为技术有限公司 Twin attention network, image processing method and device
CN113065645B (en) * 2021-04-30 2024-04-09 华为技术有限公司 Twin attention network, image processing method and device
CN114239630A (en) * 2021-11-03 2022-03-25 广东科学技术职业学院 Method and device for detecting copied two-dimensional code and readable medium
CN114049634A (en) * 2022-01-12 2022-02-15 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114049634B (en) * 2022-01-12 2022-05-13 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111967488A (en) Mobile phone shot text image matching method based on twin convolutional neural network
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN107679491B (en) 3D convolutional neural network sign language recognition method fusing multimodal data
CN109583340B (en) Video target detection method based on deep learning
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
EP3772036A1 (en) Detection of near-duplicate image
CN111144376B (en) Video target detection feature extraction method
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN109657612B (en) Quality sorting system based on facial image features and application method thereof
CN109948721B (en) Video scene classification method based on video description
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
Konwer et al. Staff line removal using generative adversarial networks
CN112232351A (en) License plate recognition system based on deep neural network
CN111652231B (en) Casting defect semantic segmentation method based on feature self-adaptive selection
CN113011253A (en) Face expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112085017A (en) Tea tender shoot image segmentation method based on significance detection and Grabcut algorithm
CN111932645A (en) Method for automatically generating ink and wash painting based on generation countermeasure network GAN
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN111861935A (en) Rain removing method based on image restoration technology
CN112070686A (en) Backlight image cooperative enhancement method based on deep learning
CN116740572A (en) Marine vessel target detection method and system based on improved YOLOX
CN115713546A (en) Lightweight target tracking algorithm for mobile terminal equipment
Lei et al. Noise-robust wagon text extraction based on defect-restore generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201120