CN114387490A - Backbone design of end-side OCR recognition system based on NAS search
- Publication number
- CN114387490A
- Authority
- CN
- China
- Prior art keywords
- network
- search
- architecture
- ocr
- backbone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention discloses a Backbone design for an end-side OCR recognition system based on NAS search. The OCR system is divided into three modules: a differentiable Backbone, a detection head and a recognition head. The detection head and the recognition head can be replaced by common lightweight detection and recognition architectures and are not discussed further; the invention focuses on constructing a lightweight Backbone. The invention designs a Backbone architecture for an end-side OCR system through multi-task architecture search. Drawing on the experience of predecessors, it designs the overall architecture of the OCR Backbone and four searchable OPs, jointly optimizes the latency and parameter count of the network architecture together with the detection and recognition losses through differentiable search, and finds an optimal trade-off among model accuracy, model parameter count and model latency. The method can replace a manually designed Backbone in finding the optimal deployment architecture.
Description
Technical Field
The invention relates to the fields of OCR, AutoML and NAS, and in particular to a Backbone design for an end-side OCR recognition system based on NAS search.
Background
OCR (optical character recognition) refers to the process of converting characters in an image into computer text by a character recognition method. It is widely applied to the recognition of documents, bills, certificates and the like, and is one of the few deep-learning-based technologies that has truly been deployed in production. OCR is generally divided into two steps: text detection and recognition, followed by post-processing. There are generally two approaches to text detection and recognition: two-stage (text detection followed by text recognition) and single-stage end-to-end detection and recognition. Post-processing can be roughly divided into two types: post-processing based on prior knowledge and post-processing based on deep learning.
AutoML technology has developed continuously since 2016. Since 2018 in particular, papers on automatic parameter tuning and automatic search have appeared at the top conferences. NAS, as one branch of AutoML, has attracted the attention of leading researchers, and companies and universities have invested in its research. NAS stands for Neural Architecture Search: a neural architecture is searched automatically by defining a search space and a search algorithm, which reduces the reliance on human prior knowledge and human bias, in the hope of finding a better neural network architecture.
Current OCR recognition modes can be divided into two types: server-side deployment and end-side deployment. Deploying the model on a server has the advantage that a large model can be used, so the recognition rate is higher. The disadvantages are that data must be transmitted between the two ends, which adds transmission time and the risk of transmission failure, and that images usually have to be compressed for transmission, which may distort the image and reduce recognition accuracy. Deploying the model on the end side directly avoids the image loss caused by data transmission and compression. Its disadvantages are that a large model cannot be deployed on the end side, so the model must be shrunk by compression and pruning, which causes some loss of precision, and that the computing capability of the end side is limited, so the model must still take computing capacity and latency into account. The main constraint on deploying OCR on the end side lies in the Backbone. The invention therefore uses NAS to search for an OCR Backbone that performs better on the end side, reducing the bias of manual Backbone design and improving recognition accuracy and speed, so that the model is better suited to end-side deployment.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a Backbone design for an end-side OCR recognition system based on NAS search.
The invention provides the following technical solution:
The invention provides a Backbone design for an end-side OCR recognition system based on NAS search, comprising the following steps:
First, design of the overall OCR architecture:
The OCR system is divided into three modules: a differentiable Backbone, a detection head and a recognition head. The detection head and the recognition head can be replaced by common lightweight detection and recognition architectures and are not discussed here; the focus is on constructing a lightweight Backbone.
Second, architecture design of the Backbone:
The overall architecture of the OCR recognition Backbone is designed first, by making architectural optimizations to the image classification network in NASNet:
N denotes the number of layers and S denotes the downsampling factor applied to the image or feature map. The structure uses a total downsampling factor of 16, which substantially enlarges the receptive field of the network and improves the detection of targets with a large aspect ratio, such as text.
Third, design of the pooling cell:
According to the results of previous NAS work, whether or not the pooling cell is searched contributes little to network performance. Therefore, to reduce the search time and in view of resource constraints (the search here runs on a single GPU only), the pooling cell is designed by hand:
The pooling cell has the following advantages. First, it widens the network; as GoogLeNet shows, a wider network can collect different kinds of information and thereby improve accuracy. Second, following the idea of residual networks, it incorporates shallow-layer information. Finally, the information is integrated through a summation operation. Introducing this fixed pooling cell reduces the search space.
Fourth, design of the search space of the convolution cell:
The connection pattern is not searched; only the OP type is searched, and four types of OP are defined.
Four end-side-oriented OPs are designed based on the depthwise (dw) convolution proposed in MobileNet, and together they form a convolution cell.
The way the OPs are combined within a convolution cell is given by Formula 1, which computes the convolution cell of each layer from the input feature map X, the output feature map X' and the architecture parameters $w_i$ of that layer.
Fifth, differentiable design:
Because the architecture parameters are discrete, they cannot be differentiated directly. The architecture parameters are therefore re-parameterized by combining a probability distribution with softmax, so that they can be differentiated together with the network. The specific procedure is as follows:
Step 1: assuming the network output is an n-dimensional vector $a$, generate independent samples $[b_1, b_2, \ldots, b_n]$ of the same dimension as $a$ from a uniform distribution on (0, 1);
Step 2: compute $c_i = -\log(-\log(b_i))$ for each sample;
Step 3: add the corresponding elements to obtain a new vector $a' = [a_1 + c_1, \ldots, a_n + c_n]$;
Step 4: compute the final result with the softmax formula, shown as Formula 2:
$p_i = \frac{\exp(a'_i / \tau)}{\sum_{j=1}^{n} \exp(a'_j / \tau)}$
where τ denotes the temperature, whose value decreases as the number of training epochs increases;
Sixth, design of latency and model parameter count:
Because the OCR model is deployed on the end side, the size and latency of the model must be taken into account during the search. A scheme is therefore designed that allows them to be optimized differentiably along with model training. The specific steps are as follows:
Step 1: separately compute the running time and the parameter size of each designed op, denoted $t_i^{(l)}$ and $m_i^{(l)}$ respectively, where i is the index of the op, i ≤ 4, and l indicates which convolution cell it is located in, l ≤ 6;
Step 2: multiply the re-parameterized architecture parameter of each op in each layer by the corresponding $t_i^{(l)}$ and $m_i^{(l)}$ and sum over all layers, which gives the computation latency and the parameter count of the designed Backbone network;
Step 3: optimize the computed network latency and parameter count jointly with the network's detection and recognition losses as a multi-task objective. The formula is:
$l_{total} = l_{det} + l_{recog} + \alpha \cdot l_t + \beta \cdot l_m$
where α and β are the weights of the latency loss $l_t$ and the parameter loss $l_m$. The larger these weights, the lighter the searched network, so a balance must be struck between accuracy on the one hand and model size and latency on the other; the weights can be tuned according to experimental results. After the search is finished, the op with the largest softmax value of the architecture parameters is selected in each cell, and the selected ops are combined into the final Backbone.
Compared with the prior art, the invention has the following beneficial effects:
The method designs a Backbone architecture for an end-side OCR system through multi-task architecture search. Drawing on the experience of predecessors, it designs the overall architecture of the OCR Backbone and four searchable OPs, jointly optimizes the latency and parameter count of the network architecture together with the detection and recognition losses through differentiable search, and finds an optimal trade-off among model accuracy, model parameter count and model latency. The method can replace a manually designed Backbone in finding the optimal deployment architecture.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a diagram of a prior art NAS search process;
FIG. 2 is a schematic diagram of the overall OCR architecture of the present invention;
FIG. 3 is a schematic diagram of the overall Backbone network of the present invention;
FIG. 4 is a schematic diagram of a pooling cell of the present invention;
FIG. 5 is a schematic diagram of 4 ops according to the invention;
FIG. 6 is a schematic diagram of the 4 op combinations of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation. Wherein like reference numerals refer to like parts throughout.
Example 1
Referring to FIGS. 1 to 6, the present invention provides a Backbone design for an end-side OCR recognition system based on NAS search, comprising the following:
First, design of the overall OCR architecture:
As shown in FIG. 2, the OCR system is divided into three modules: a differentiable Backbone, a detection head and a recognition head. The detection head and the recognition head can be replaced by common lightweight detection and recognition architectures, which are not discussed here; we focus on constructing a lightweight Backbone.
Second, architecture design of the Backbone:
The overall architecture of the OCR recognition Backbone must be designed first. Here we make some architectural optimizations to the image classification network in NASNet; the overall Backbone architecture is shown in FIG. 3:
In FIG. 3, N denotes the number of layers and S denotes the downsampling factor applied to the image or feature map. The structure of the invention uses a total downsampling factor of 16, which substantially enlarges the receptive field of the network and improves the detection of targets with a large aspect ratio, such as text.
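As a rough illustration of what a total downsampling factor of 16 means for the Backbone output, the sketch below computes the output feature-map size for a given input size. The function name `feature_map_size`, the ceil rounding (which assumes padded strided layers) and the example input sizes are assumptions made for illustration; none of them comes from the patent.

```python
def feature_map_size(height: int, width: int, stride: int = 16) -> tuple:
    """Return (rows, cols) of the output map after total downsampling by `stride`."""
    if height <= 0 or width <= 0 or stride <= 0:
        raise ValueError("height, width and stride must be positive")
    # Ceil division: assumes each strided layer pads its input ("same"-style).
    rows = -(-height // stride)
    cols = -(-width // stride)
    return rows, cols


# Hypothetical input sizes, chosen only for illustration.
square = feature_map_size(640, 640)    # (40, 40)
text_line = feature_map_size(32, 320)  # (2, 20)
```

A wide text line is thus reduced to a short, wide feature map, which is the shape the detection and recognition heads would consume.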
Third, design of the pooling cell:
According to the results of previous NAS work, whether or not the pooling cell is searched contributes little to network performance. Therefore, to reduce the search time and in view of resource constraints (here we search on a single GPU only), we design the pooling cell shown in FIG. 4:
The designed pooling cell has the following advantages. First, it widens the network; as GoogLeNet shows, a wider network can collect different kinds of information and thereby improve accuracy. Second, following the idea of residual networks, it incorporates shallow-layer information. Finally, the information is integrated through a summation operation. Introducing the pooling cell shown in FIG. 4 reduces the search space.
Fourth, design of the search space of the convolution cell:
Here we do not search the connection pattern; we search only the OP type, and we define the four types of OP shown in FIG. 5.
Based on the depthwise (dw) convolution proposed in MobileNet, we design four end-side-oriented OPs that together form a convolution cell.
The way the OPs are combined within a convolution cell is shown in FIG. 6, and the calculation is given by Formula 1, which computes the convolution cell of each layer from the input feature map X, the output feature map X' and the architecture parameters $w_i$ of that layer.
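Formula 1 itself is not reproduced in this text. One common reading in differentiable NAS, assumed here rather than confirmed by the source, is a weighted sum of the candidate op outputs, X' = Σ_i w_i · op_i(X). The sketch below shows that reading on plain Python lists; `mixed_op`, the toy ops and the weight values are hypothetical stand-ins and are not the four OPs of FIG. 5.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mixed_op(x: Vector,
             ops: Sequence[Callable[[Vector], Vector]],
             w: Sequence[float]) -> Vector:
    """Weighted sum of candidate op outputs: X' = sum_i w_i * op_i(X)."""
    if len(ops) != len(w):
        raise ValueError("need exactly one weight per op")
    out = [0.0] * len(x)
    for op, w_i in zip(ops, w):
        y = op(x)
        out = [o + w_i * y_j for o, y_j in zip(out, y)]
    return out


# Toy stand-ins for the four candidate ops (assumptions, not the patent's OPs).
toy_ops = [
    lambda x: list(x),               # identity
    lambda x: [2.0 * v for v in x],  # scaling
    lambda x: [v + 1.0 for v in x],  # shift
    lambda x: [0.0 for _ in x],      # zero
]
weights = [0.1, 0.2, 0.3, 0.4]       # illustrative architecture parameters
result = mixed_op([1.0, 2.0], toy_ops, weights)
```

Because the output is a smooth function of the weights, gradients can flow to the architecture parameters during training, which is what makes the cell searchable by gradient descent under this reading.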
Fifth, differentiable design:
Because the architecture parameters are discrete, they cannot be differentiated directly. We therefore re-parameterize the architecture parameters by combining a probability distribution with softmax, so that they can be differentiated together with the network. The specific procedure is as follows:
Step 1: assuming the network output is an n-dimensional vector $a$, generate independent samples $[b_1, b_2, \ldots, b_n]$ of the same dimension as $a$ from a uniform distribution on (0, 1);
Step 2: compute $c_i = -\log(-\log(b_i))$ for each sample;
Step 3: add the corresponding elements to obtain a new vector $a' = [a_1 + c_1, \ldots, a_n + c_n]$;
Step 4: compute the final result with the softmax formula, shown as Formula 2:
$p_i = \frac{\exp(a'_i / \tau)}{\sum_{j=1}^{n} \exp(a'_j / \tau)}$
where τ denotes the temperature, whose value decreases as the number of training epochs increases;
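Steps 1 to 4 correspond to what is commonly called the Gumbel-softmax re-parameterization. A minimal pure-Python sketch follows, assuming the samples in Step 1 are drawn from a uniform distribution on (0, 1). The function names, the clamping constant and the exponential temperature schedule (`tau0`, `decay`, `tau_min`) are illustrative assumptions; the text only states that τ decreases as the epoch count grows.

```python
import math
import random


def gumbel_softmax(a, tau, rng=random):
    """Relaxed one-hot sample over the entries of `a` at temperature `tau`."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    eps = 1e-12
    # Step 1: independent uniform samples, clamped away from 0 and 1.
    b = [min(max(rng.random(), eps), 1.0 - eps) for _ in a]
    # Step 2: transform uniform samples into Gumbel noise.
    c = [-math.log(-math.log(b_i)) for b_i in b]
    # Step 3: perturb the architecture parameters.
    a_prime = [a_i + c_i for a_i, c_i in zip(a, c)]
    # Step 4: temperature softmax (max subtracted for numerical stability).
    top = max(a_prime)
    exps = [math.exp((v - top) / tau) for v in a_prime]
    total = sum(exps)
    return [e / total for e in exps]


def temperature(epoch, tau0=5.0, decay=0.9, tau_min=0.1):
    """Hypothetical schedule: exponential decay with a lower bound."""
    return max(tau_min, tau0 * decay ** epoch)


random.seed(0)
probs = gumbel_softmax([0.5, 1.0, -0.3, 0.2], tau=temperature(0))
```

At high temperature the result is close to uniform, so all four ops receive gradient; as τ shrinks the result approaches a one-hot choice, which matches the final discrete selection of one op per cell.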
Sixth, design of latency and model parameter count:
Because the OCR model is deployed on the end side, we must take the size and latency of the model into account during the search. We therefore design a scheme that allows them to be optimized differentiably along with model training. The specific steps are as follows:
Step 1: separately compute the running time and the parameter size of each designed op, denoted $t_i^{(l)}$ and $m_i^{(l)}$ respectively, where i is the index of the op, i ≤ 4, and l indicates which convolution cell it is located in, l ≤ 6;
Step 2: multiply the re-parameterized architecture parameter of each op in each layer by the corresponding $t_i^{(l)}$ and $m_i^{(l)}$ and sum over all layers, which gives the computation latency and the parameter count of the designed Backbone network;
Step 3: optimize the computed network latency and parameter count jointly with the network's detection and recognition losses as a multi-task objective. The formula is:
$l_{total} = l_{det} + l_{recog} + \alpha \cdot l_t + \beta \cdot l_m$
where α and β are the weights of the latency loss $l_t$ and the parameter loss $l_m$. The larger these weights, the lighter the searched network, so a balance must be struck between accuracy on the one hand and model size and latency on the other; the weights can be tuned according to experimental results. After the search is finished, the op with the largest softmax value of the architecture parameters is selected in each cell, and the selected ops are combined into the final Backbone.
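The latency and parameter terms and the final op selection can be sketched as follows. This is a minimal illustration under stated assumptions: the per-op costs are made-up numbers rather than measurements, only two cells are shown instead of up to six, and the names `expected_cost`, `total_loss` and `select_ops` are hypothetical.

```python
def expected_cost(probs, costs):
    """Sum over cells and ops of (op probability) * (per-op cost)."""
    return sum(p * c
               for p_row, c_row in zip(probs, costs)
               for p, c in zip(p_row, c_row))


def total_loss(l_det, l_recog, l_t, l_m, alpha, beta):
    """l_total = l_det + l_recog + alpha * l_t + beta * l_m."""
    return l_det + l_recog + alpha * l_t + beta * l_m


def select_ops(probs):
    """After the search, keep the op with the largest probability in each cell."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Illustrative softmax outputs for two cells with four candidate ops each.
probs = [[0.7, 0.1, 0.1, 0.1],
         [0.25, 0.25, 0.4, 0.1]]
# Made-up per-op latency (ms) and parameter counts (thousands), not measured.
latency_ms = [[1.0, 2.0, 3.0, 4.0],
              [1.0, 2.0, 3.0, 4.0]]
params_k = [[5.0, 10.0, 15.0, 20.0],
            [5.0, 10.0, 15.0, 20.0]]

l_t = expected_cost(probs, latency_ms)  # expected latency of the relaxed network
l_m = expected_cost(probs, params_k)    # expected parameter count
loss = total_loss(l_det=1.0, l_recog=2.0, l_t=l_t, l_m=l_m, alpha=0.1, beta=0.01)
chosen = select_ops(probs)              # index of the selected op in each cell
```

Because both cost terms are linear in the op probabilities, they remain differentiable with respect to the architecture parameters, so raising α or β pushes probability mass toward cheaper ops.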
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (1)
1. A Backbone design for an end-side OCR recognition system based on NAS search, characterized by comprising the following steps:
First, design of the overall OCR architecture:
The OCR system is divided into three modules: a differentiable Backbone, a detection head and a recognition head. The detection head and the recognition head can be replaced by common lightweight detection and recognition architectures and are not discussed here; the focus is on constructing a lightweight Backbone.
Second, architecture design of the Backbone:
The overall architecture of the OCR recognition Backbone is designed first, by making architectural optimizations to the image classification network in NASNet:
N denotes the number of layers and S denotes the downsampling factor applied to the image or feature map. The structure uses a total downsampling factor of 16, which substantially enlarges the receptive field of the network and improves the detection of targets with a large aspect ratio, such as text.
Third, design of the pooling cell:
According to the results of previous NAS work, whether or not the pooling cell is searched contributes little to network performance. Therefore, to reduce the search time and in view of resource constraints (the search here runs on a single GPU only), the pooling cell is designed by hand:
The pooling cell has the following advantages. First, it widens the network; as GoogLeNet shows, a wider network can collect different kinds of information and thereby improve accuracy. Second, following the idea of residual networks, it incorporates shallow-layer information. Finally, the information is integrated through a summation operation. Introducing this fixed pooling cell reduces the search space.
Fourth, design of the search space of the convolution cell:
The connection pattern is not searched; only the OP type is searched, and four types of OP are defined.
Four end-side-oriented OPs are designed based on the depthwise (dw) convolution proposed in MobileNet, and together they form a convolution cell.
The way the OPs are combined within a convolution cell is given by Formula 1, which computes the convolution cell of each layer, where X denotes the input feature map, X' denotes the output feature map, and $w_i$ denotes the architecture parameters of the layer.
Fifth, differentiable design:
Because the architecture parameters are discrete, they cannot be differentiated directly. The architecture parameters are therefore re-parameterized by combining a probability distribution with softmax, so that they can be differentiated together with the network. The specific procedure is as follows:
Step 1: assuming the network output is an n-dimensional vector $a$, generate independent samples $[b_1, b_2, \ldots, b_n]$ of the same dimension as $a$ from a uniform distribution on (0, 1);
Step 2: compute $c_i = -\log(-\log(b_i))$ for each sample;
Step 3: add the corresponding elements to obtain a new vector $a' = [a_1 + c_1, \ldots, a_n + c_n]$;
Step 4: compute the final result with the softmax formula, shown as Formula 2:
$p_i = \frac{\exp(a'_i / \tau)}{\sum_{j=1}^{n} \exp(a'_j / \tau)}$
where τ denotes the temperature, whose value decreases as the number of training epochs increases;
Sixth, design of latency and model parameter count:
Because the OCR model is deployed on the end side, the size and latency of the model must be taken into account during the search. A scheme is therefore designed that allows them to be optimized differentiably along with model training. The specific steps are as follows:
Step 1: separately compute the running time and the parameter size of each designed op, denoted $t_i^{(l)}$ and $m_i^{(l)}$ respectively, where i is the index of the op, i ≤ 4, and l indicates which convolution cell it is located in, l ≤ 6;
Step 2: multiply the re-parameterized architecture parameter of each op in each layer by the corresponding $t_i^{(l)}$ and $m_i^{(l)}$ and sum over all layers, which gives the computation latency and the parameter count of the designed Backbone network;
Step 3: optimize the computed network latency and parameter count jointly with the network's detection and recognition losses as a multi-task objective. The formula is:
$l_{total} = l_{det} + l_{recog} + \alpha \cdot l_t + \beta \cdot l_m$
where α and β are the weights of the latency loss $l_t$ and the parameter loss $l_m$. The larger these weights, the lighter the searched network, so a balance must be struck between accuracy on the one hand and model size and latency on the other; the weights can be tuned according to experimental results. After the search is finished, the op with the largest softmax value of the architecture parameters is selected in each cell, and the selected ops are combined into the final Backbone.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111471433.1A CN114387490A (en) | 2021-12-04 | 2021-12-04 | Backbone design of end-side OCR recognition system based on NAS search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111471433.1A CN114387490A (en) | 2021-12-04 | 2021-12-04 | Backbone design of end-side OCR recognition system based on NAS search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114387490A (en) | 2022-04-22 |
Family
ID=81196390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111471433.1A Pending CN114387490A (en) | 2021-12-04 | 2021-12-04 | Backbone design of end-side OCR recognition system based on NAS search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114387490A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116522999A (en) * | 2023-06-26 | 2023-08-01 | 深圳思谋信息科技有限公司 | Model searching and time delay predictor training method, device, equipment and storage medium |
CN116522999B (en) * | 2023-06-26 | 2023-12-15 | 深圳思谋信息科技有限公司 | Model searching and time delay predictor training method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |