CN112883964A - Method for detecting characters in natural scene - Google Patents

Method for detecting characters in natural scene

Info

Publication number
CN112883964A
Authority
CN
China
Prior art keywords
representing
attention
feature map
module
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110176924.7A
Other languages
Chinese (zh)
Other versions
CN112883964B (en)
Inventor
巫义锐
刘文翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202110176924.7A priority Critical patent/CN112883964B/en
Publication of CN112883964A publication Critical patent/CN112883964A/en
Application granted granted Critical
Publication of CN112883964B publication Critical patent/CN112883964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words


Abstract

The invention discloses a method for detecting characters in a natural scene, and belongs to the technical field of character detection methods. The method comprises the following steps: 1, inputting 7200 pictures of characters to be trained; 2, acquiring basic feature information through the convolution layers, and removing redundant information and enlarging the receptive field through the pooling layers; 3, adding channel attention and receptive field attention to optimize the feature information; 4, layering the network to enhance the detection capability for objects of different sizes and to generate target points; 5, cascading the generated content to remove false positive results and obtain the final text area; 6, comparing the text region obtained after cascading with the annotated text region, calculating the loss and adjusting the network parameters; and 7, inputting the pictures to be detected into the trained network to obtain their detection results. With the receptive field attention and spatial attention, the recall rate and accuracy of the model can be improved. By means of the last cascade module, false positive results can be removed.

Description

Method for detecting characters in natural scene
Technical Field
The invention relates to a method for detecting characters in a natural scene, and belongs to the technical field of character detection methods.
Background
In recent years, the rapid development of mobile devices and automatic driving has drawn attention to character detection. For situations such as traveling abroad, the need to convert photographed characters into text is growing, and understanding characters in natural scenes receives increasing attention. Detecting text in natural scenes is still challenging, because text in a natural scene may appear in different orientations and, unlike characters in books, may be curved. How to handle the detection problems caused by multi-language, curved and multi-oriented text urgently needs to be solved.
Disclosure of Invention
Aiming at the lack of a receptive field attention module in conventional character detection methods, the invention provides a method for detecting characters in a natural scene, which enhances feature information with a multi-dimensional attention module and adds a cascade structure that helps remove false positive results.
The invention adopts the following technical scheme for solving the technical problems:
a method for detecting characters in a natural scene comprises the following steps:
step 1, inputting 7200 pictures of characters to be trained;
step 2, obtaining basic feature information through the convolution layers, removing redundant information and enlarging the receptive field through the pooling layers, and adding a residual network;
step 3, adding channel attention and receptive field attention to optimize the feature information;
step 4, layering the network to enhance the detection capability for objects of different sizes and to generate target points;
step 5, cascading the generated content to remove false positive results and obtain the final text area;
step 6, comparing the text region obtained after cascading with the annotated text region to calculate the loss and adjust the network parameters;
and step 7, inputting the pictures to be detected into the trained network to obtain their detection results.
The step 2 comprises the following processes:
step 21, converting the input picture into a feature map I of size n × n × 3, where n is the length and width of the feature map and 3 is the number of channels;
step 22, extracting feature information from the feature map I obtained in step 21:
the feature information is extracted with ResNet-50, which has 5 convolution modules; ResNet-50 processes the input feature information as follows:
F = Res_50(I)
where I represents the n × n × 3 feature map of the processed input picture, Res_50(·) represents the 50-layer residual network, and F represents the feature map after ResNet-50 processing; ResNet-50 is divided into 5 modules, represented by the following formula:
F = F_i, i = {1,2,3,4,5}
where i = {1,2,3,4,5} indexes the 5 convolution modules and F_i represents the feature map of each module.
Step 3 comprises the following processes:
step 31, channel attention and receptive field attention are first obtained through an ISTK module; the ISTK module processes the feature maps of layers 2, 3, 4 and 5 generated by ResNet-50, with the formula:
F_i^istk = f_istk(F_i), i = {2,3,4,5}
where f_istk(·) represents the channel and receptive field attention module and F_i^istk represents the feature map generated after passing through the ISTK module;
step 32, the feature map is first processed by convolution, with the formula:
K_{i,λ} = f_{conv,λ}(F_i), λ = {1,2,3}
where K_{i,λ} represents the feature result processed by convolution kernels of different sizes, and f_{conv,λ}(·) represents the convolution operation with convolution kernels of different sizes;
the feature maps generated by the different convolution kernel sizes are passed through two fully-connected layers and a pooling layer to generate a new weight-related feature map, and its weights are then calculated through softmax; the specific formula is as follows:
w_i = softmax(f_fc(f_avg(K_{i,λ})))
where w_i represents the weight coefficients generated by the respective convolution kernels, softmax is one way of calculating the weights, f_fc(·) represents the two fully-connected operations, and f_avg(·) represents the average pooling operation; softmax is calculated as follows:
w_{i,λ} = exp(C_{i,λ}) / Σ_λ exp(C_{i,λ})
where w_{i,λ} represents the weight of the ith channel for the λth convolution kernel and C_{i,λ} represents the attention index of the ith channel for the λth convolution kernel;
step 33, a new feature map is generated from the generated weight values and feature maps, expressed by the following formula:
F_i^istk = relu(sum(w_i · K_i))
where sum(·) represents a summing function, w_i represents the weight of the ith feature map, the feature map passing through the attention module is obtained by weighting each feature map by its weight and summing, F_i^istk represents the new feature map, and relu(·) represents the activation function, specifically:
f_relu(x) = max(0, x)
where x represents the value to be activated and f_relu(x) represents the value after activation;
step 34, NLNet is added to the second layer as a spatial attention module, with the formula:
F_2^nl = f_NLN(F_2^istk)
where F_2^istk represents the second-layer feature result from the previous step, and f_NLN(·) is the global relation (non-local) network module, specifically expressed as the following formula:
y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j)
where C(x) denotes the normalization factor, f(x_i, x_j) represents the relation between feature map positions i and j, and g(x_j) computes the feature value at position j.
Step 4 comprises the following processes:
layering is performed through the FPN and anchors of different levels are generated through the RPN, with the formula:
b_0 = f_RPN(P)
where b_0 denotes the initial bounding boxes, f_RPN(·) represents the RPN module, and the bounding boxes of different lengths and widths selected in the first stage, namely the target points, are generated by the RPN at the different levels.
Step 5 comprises the following processes:
the initial bounding boxes obtained in step 4 are in turn used as references and cyclically fed into RoIAlign:
m_k = f_{M,k}(R_a(b_{k-1}, P)), k = 1,2,3
b_k = f_{B,k}(R_a(b_{k-1}, P)), k = 1,2,3
where k represents the stage index of the cascade; here k takes 3, meaning three stages are passed through; m_k is the segmentation mask of the kth stage, b_k is the detection box of the kth stage, b_{k-1} is the detection box of the (k-1)th stage, P represents the feature map generated in step 3, R_a represents RoIAlign, f_{M,k}(·) represents the mask produced from RoIAlign at stage k, and f_{B,k}(·) represents the bounding box generated at stage k.
The invention has the following beneficial effects:
The method modifies Cascade Mask R-CNN (cascade mask region-based convolutional neural network) so that it carries the contextual information of the picture, and the receptive field attention and spatial attention can improve the recall rate and accuracy of the model. By means of the last cascade module, false positive results can be removed.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a picture of a character to be trained.
Fig. 3 is a characteristic diagram of the entire network structure.
Fig. 4 is a view of an attention module structure.
FIG. 5 is a mask (segmentation) diagram.
Fig. 6 is a detection result picture.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the method for detecting characters in a natural scene of the present invention includes the following steps:
Step 1: the pictures of characters to be trained are input; an example is shown in FIG. 2.
Step 2 comprises the following steps.
The operation can refer to the basic feature extraction module in FIG. 3.
The input picture is converted into a feature map I of size n × n × 3, where n is the length and width of the feature map and 3 is the number of channels.
Feature information is then extracted from the obtained feature map.
The feature information is extracted with ResNet-50 (residual network), which has 5 convolution modules; the processing of the input feature information by ResNet-50 can be expressed as
F = Res_50(I)
where I represents the n × n × 3 feature map of the processed input picture, Res_50(·) represents the 50-layer residual network, and F represents the feature map after ResNet-50 processing; ResNet-50 is divided into 5 modules, which can be represented by the following formula:
F = F_i, i = {1,2,3,4,5}
where i = {1,2,3,4,5} indexes the 5 convolution modules and F_i represents the feature map of each module.
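For illustration only (this is not code from the patent), the following is a minimal PyTorch sketch of extracting the five stage feature maps F_1 to F_5 from a ResNet-50 backbone as described above; the class name, the 640 × 640 input size and all variable names are assumptions made for this example.

import torch
import torch.nn as nn
import torchvision

class ResNet50Backbone(nn.Module):
    # Splits ResNet-50 into its 5 convolution modules and returns F_1 .. F_5.
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        self.stage1 = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)  # module 1 (stem)
        self.stage2 = r.layer1                                          # module 2
        self.stage3 = r.layer2                                          # module 3
        self.stage4 = r.layer3                                          # module 4
        self.stage5 = r.layer4                                          # module 5

    def forward(self, x):
        feats = []
        for stage in (self.stage1, self.stage2, self.stage3, self.stage4, self.stage5):
            x = stage(x)
            feats.append(x)   # F_i, i = 1..5
        return feats

backbone = ResNet50Backbone()
image = torch.randn(1, 3, 640, 640)   # an n x n x 3 picture as a 1 x 3 x n x n tensor
features = backbone(image)            # list of the 5 stage feature maps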
Step 3 comprises the following steps.
The operation can refer to FIG. 4.
Channel attention and receptive field attention are first obtained through an ISTK (independently selected text convolution kernel) module; the ISTK module processes the feature maps of layers 2, 3, 4 and 5 generated by ResNet-50, with the formula:
F_i^istk = f_istk(F_i), i = {2,3,4,5}
where f_istk(·) represents the channel and receptive field attention module and F_i^istk represents the feature map generated after passing through the ISTK module. The details of the f_istk(·) module are as follows.
The feature map is first processed by convolution, with the formula:
K_{i,λ} = f_{conv,λ}(F_i), λ = {1,2,3}
where K_{i,λ} represents the feature result processed by convolution kernels of different sizes, and f_{conv,λ}(·) represents the convolution operation with convolution kernels of different sizes.
The feature maps generated by the different convolution kernel sizes are passed through two fully-connected layers and a pooling layer to generate a new weight-related feature map, and its weights are then calculated through softmax; the specific formula is as follows:
w_i = softmax(f_fc(f_avg(K_{i,λ})))
where w_i represents the weight coefficients generated by the respective convolution kernels, softmax is one way of calculating the weights, f_fc(·) represents the two fully-connected operations, and f_avg(·) represents the average pooling operation; softmax is calculated as follows:
w_{i,λ} = exp(C_{i,λ}) / Σ_λ exp(C_{i,λ})
where w_{i,λ} represents the weight of the ith channel for the λth convolution kernel and C_{i,λ} represents the attention index of the ith channel for the λth convolution kernel.
A new feature map can be generated according to the generated weight values and feature maps, and can be represented by the following formula:
Figure BDA0002940232260000055
where sum () represents a summing function,
Figure BDA0002940232260000056
the weight of the ith feature map is represented, and the feature map passing through the attention module can be obtained by summing the weights of the ith feature map and the ith feature map, wherein
Figure BDA0002940232260000061
Representing a new feature graph, relu () representing an activation function, in particular the following formula:
Figure BDA0002940232260000062
x represents the value to be activated, wherein frelu(x) Representing the value after activation.
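The patent does not give the ISTK module as code; as a hedged illustration, the following PyTorch sketch implements a selective-kernel-style channel and receptive field attention block consistent with the formulas above (convolution branches of different kernel sizes, average pooling, two fully-connected layers, and a softmax over branches). The kernel sizes (1, 3, 5), the reduction ratio and all names are assumptions, not details fixed by the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ISTKAttention(nn.Module):
    # Sketch of channel / receptive-field attention: branches K_{i,lambda} with different
    # kernel sizes, f_avg (average pooling), f_fc (two FC layers), softmax branch weights,
    # weighted sum of branches, and a final relu producing the new feature map.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)  # f_conv,lambda
            for k in (1, 3, 5)
        ])
        hidden = max(channels // reduction, 8)
        self.fc1 = nn.Linear(channels, hidden)                    # first FC of f_fc
        self.fc2 = nn.Linear(hidden, channels * len(self.branches))  # second FC of f_fc

    def forward(self, x):
        k = [branch(x) for branch in self.branches]           # K_{i,lambda}
        u = sum(k)                                             # fuse the branches
        s = F.adaptive_avg_pool2d(u, 1).flatten(1)             # f_avg
        z = self.fc2(F.relu(self.fc1(s)))                      # f_fc
        # softmax over the branch dimension gives per-channel weights w_{i,lambda}
        w = z.view(x.size(0), len(self.branches), -1).softmax(dim=1)
        out = sum(w[:, i].unsqueeze(-1).unsqueeze(-1) * k[i] for i in range(len(k)))
        return F.relu(out)                                     # new feature map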
Adding NLNet (spatial attention) to the second layer, the formula can be expressed as:
Figure BDA0002940232260000063
wherein
Figure BDA0002940232260000064
Representing the result of convolution of the feature values of the second layer of the previous step, wherein fNLN() The global contact network module may be specifically expressed as the following formula:
Figure BDA0002940232260000065
wherein c (x) denotes normalization, f (x)i,xj) Is represented as the relation of the sought characteristic maps i and j, g (x)j) The eigenvalue of the j point is calculated.
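As a concrete illustration of the non-local formula y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j), the following is a minimal PyTorch sketch of an NLNet-style spatial attention block in its embedded-Gaussian form; the choice of embedded-Gaussian pairwise function, the channel reduction and the residual connection are common conventions assumed here rather than details stated in the patent.

import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    # f(x_i, x_j) is softmax(theta(x_i) . phi(x_j)), which already contains the 1/C(x)
    # normalization; g(x_j) is a 1x1 convolution; the output is added back residually.
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)            # f(x_i, x_j) / C(x)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection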
Step 4 comprises the following steps:
Reference may be made to the FPN (feature pyramid network) part of FIG. 3.
The network is layered with the FPN and anchors (target points) of different levels are generated with the RPN (region proposal network); the formula can be expressed as:
b_0 = f_RPN(P)
where b_0 denotes the initial bounding boxes, f_RPN(·) represents the RPN module, and the bounding boxes of different lengths and widths selected in the first stage, namely the target points, are generated by the RPN at the different levels.
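As an illustration of f_RPN, the following is a minimal PyTorch sketch of a standard RPN head applied to the FPN feature maps; it predicts, per anchor, an objectness score and four box-regression offsets from which the initial boxes b_0 would be decoded. The number of anchors per location and the layer structure are assumptions for this sketch.

import torch
import torch.nn as nn

class RPNHead(nn.Module):
    # For each FPN level, a shared 3x3 convolution followed by two 1x1 branches:
    # one objectness score and 4 box-regression offsets per anchor.
    def __init__(self, channels, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(channels, num_anchors, kernel_size=1)
        self.bbox_deltas = nn.Conv2d(channels, num_anchors * 4, kernel_size=1)

    def forward(self, fpn_features):
        scores, deltas = [], []
        for p in fpn_features:                  # one feature map per FPN level
            t = torch.relu(self.conv(p))
            scores.append(self.objectness(t))   # (b, A, h, w)
            deltas.append(self.bbox_deltas(t))  # (b, 4A, h, w)
        return scores, deltas                   # decoded against the anchors to get b_0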
Step 5 comprises the following steps:
Reference may be made to the cascade detection part of FIG. 3.
The initial bounding boxes obtained in step 4 can in turn be used as references and cyclically fed into RoIAlign (a pooling layer):
m_k = f_{M,k}(R_a(b_{k-1}, P)), k = 1,2,3
b_k = f_{B,k}(R_a(b_{k-1}, P)), k = 1,2,3
where k represents the stage index of the cascade; here k takes 3, meaning three stages are passed through; m_k is the segmentation mask of the kth stage, b_k is the detection box of the kth stage, b_{k-1} is the detection box of the (k-1)th stage, P represents the feature map generated in step 3, R_a represents RoIAlign, f_{M,k}(·) represents the mask produced from RoIAlign at stage k, and f_{B,k}(·) represents the bounding box generated at stage k.
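A minimal sketch of the three-stage cascade loop is given below, using torchvision's roi_align as R_a; the stage heads f_{B,k} and f_{M,k} are left abstract (plain callables here), and the RoI output size and spatial scale are assumptions for illustration.

import torch
from torchvision.ops import roi_align

def cascade_refine(feature_map, boxes, box_heads, mask_heads, spatial_scale=0.25):
    # Each stage k pools features with RoIAlign from the previous stage's boxes b_{k-1}
    # and produces refined boxes b_k and a mask m_k.
    masks = []
    for box_head, mask_head in zip(box_heads, mask_heads):   # stages k = 1, 2, 3
        rois = roi_align(feature_map, [boxes], output_size=(7, 7),
                         spatial_scale=spatial_scale, aligned=True)   # R_a(b_{k-1}, P)
        boxes = box_head(rois)          # b_k = f_{B,k}(...)
        masks.append(mask_head(rois))   # m_k = f_{M,k}(...)
    return boxes, masks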
Step 6 comprises the following steps:
The distance between the coordinates generated by the present network and the actual coordinates is calculated using a cross-entropy loss function, which is formulated as follows:
Loss = -Σ_x p(x) log q(x)
where p(·) is the expected output, q(·) is the actual output, log is the logarithm function, and Loss is the resulting cross-entropy loss.
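As a worked illustration of Loss = -Σ_x p(x) log q(x), with p as the expected output and q as the actual output, the following small PyTorch sketch computes the loss on made-up example tensors.

import torch

def cross_entropy(p, q, eps=1e-12):
    # Loss = -sum_x p(x) * log q(x); eps added for numerical stability.
    return -(p * torch.log(q + eps)).sum(dim=-1).mean()

# Example: expected one-hot targets p and predicted probabilities q for 2 samples.
p = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
q = torch.tensor([[0.9, 0.1], [0.2, 0.8]])
loss = cross_entropy(p, q)   # (-log 0.9 - log 0.8) / 2, about 0.164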
Step 7: after passing through the cascade module, the resulting segmented image is shown in FIG. 5.
Step 8: the resulting text area is shown in FIG. 6.

Claims (5)

1. A method for detecting characters in a natural scene is characterized by comprising the following steps:
step 1, inputting 7200 pictures of characters to be trained;
step 2, obtaining basic feature information through the convolution layers, removing redundant information and enlarging the receptive field through the pooling layers, and adding a residual network;
step 3, adding channel attention and receptive field attention to optimize the feature information;
step 4, layering the network to enhance the detection capability for objects of different sizes and to generate target points;
step 5, cascading the generated content to remove false positive results and obtain the final text area;
step 6, comparing the text region obtained after cascading with the annotated text region to calculate the loss and adjust the network parameters;
and step 7, inputting the pictures to be detected into the trained network to obtain their detection results.
2. The method for detecting the characters in the natural scene according to claim 1, wherein: the step 2 comprises the following processes:
step 21, converting the input picture into a feature map I of size n × n × 3, where n is the length and width of the feature map and 3 is the number of channels;
step 22, extracting feature information from the feature map I obtained in step 21:
the feature information is extracted with ResNet-50, which has 5 convolution modules; ResNet-50 processes the input feature information as follows:
F = Res_50(I)
where I represents the n × n × 3 feature map of the processed input picture, Res_50(·) represents the 50-layer residual network, and F represents the feature map after ResNet-50 processing; ResNet-50 is divided into 5 modules, represented by the following formula:
F = F_i, i = {1,2,3,4,5}
where i = {1,2,3,4,5} indexes the 5 convolution modules and F_i represents the feature map of each module.
3. The method for detecting the characters in the natural scene according to claim 2, wherein the step 3 comprises the following steps:
step 31, channel attention and receptive field attention are first obtained through an ISTK module; the ISTK module processes the feature maps of layers 2, 3, 4 and 5 generated by ResNet-50, with the formula:
F_i^istk = f_istk(F_i), i = {2,3,4,5}
where f_istk(·) represents the channel and receptive field attention module and F_i^istk represents the feature map generated after passing through the ISTK module;
step 32, the feature map is first processed by convolution, with the formula:
K_{i,λ} = f_{conv,λ}(F_i), λ = {1,2,3}
where K_{i,λ} represents the feature result processed by convolution kernels of different sizes, and f_{conv,λ}(·) represents the convolution operation with convolution kernels of different sizes;
the feature maps generated by the different convolution kernel sizes are passed through two fully-connected layers and a pooling layer to generate a new weight-related feature map, and its weights are then calculated through softmax; the specific formula is as follows:
w_i = softmax(f_fc(f_avg(K_{i,λ})))
where w_i represents the weight coefficients generated by the respective convolution kernels, softmax is one way of calculating the weights, f_fc(·) represents the two fully-connected operations, and f_avg(·) represents the average pooling operation; softmax is calculated as follows:
w_{i,λ} = exp(C_{i,λ}) / Σ_λ exp(C_{i,λ})
where w_{i,λ} represents the weight of the ith channel for the λth convolution kernel and C_{i,λ} represents the attention index of the ith channel for the λth convolution kernel;
step 33, a new feature map is generated from the generated weight values and feature maps, expressed by the following formula:
F_i^istk = relu(sum(w_i · K_i))
where sum(·) represents a summing function, w_i represents the weight of the ith feature map, the feature map passing through the attention module is obtained by weighting each feature map by its weight and summing, F_i^istk represents the new feature map, and relu(·) represents the activation function, specifically:
f_relu(x) = max(0, x)
where x represents the value to be activated and f_relu(x) represents the value after activation;
step 34, NLNet is added to the second layer as a spatial attention module, with the formula:
F_2^nl = f_NLN(F_2^istk)
where F_2^istk represents the second-layer feature result from the previous step, and f_NLN(·) is the global relation (non-local) network module, specifically expressed as the following formula:
y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j)
where C(x) denotes the normalization factor, f(x_i, x_j) represents the relation between feature map positions i and j, and g(x_j) computes the feature value at position j.
4. The method for detecting the characters in the natural scene according to claim 1, wherein the step 4 comprises the following steps:
layering through the FPN and generating anchors of different levels through the RPN, with the formula:
b_0 = f_RPN(P)
where b_0 denotes the initial bounding boxes, f_RPN(·) represents the RPN module, and the bounding boxes of different lengths and widths selected in the first stage, namely the target points, are generated by the RPN at the different levels.
5. The method for detecting the characters in the natural scene according to claim 4, wherein the step 5 comprises the following steps:
the initial bounding boxes obtained in step 4 are in turn used as references and cyclically fed into RoIAlign:
m_k = f_{M,k}(R_a(b_{k-1}, P)), k = 1,2,3
b_k = f_{B,k}(R_a(b_{k-1}, P)), k = 1,2,3
where k represents the stage index of the cascade; here k takes 3, meaning three stages are passed through; m_k is the segmentation mask of the kth stage, b_k is the detection box of the kth stage, b_{k-1} is the detection box of the (k-1)th stage, P represents the feature map generated in step 3, R_a represents RoIAlign, f_{M,k}(·) represents the mask produced from RoIAlign at stage k, and f_{B,k}(·) represents the bounding box generated at stage k.
CN202110176924.7A 2021-02-07 2021-02-07 Method for detecting characters in natural scene Active CN112883964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110176924.7A CN112883964B (en) 2021-02-07 2021-02-07 Method for detecting characters in natural scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110176924.7A CN112883964B (en) 2021-02-07 2021-02-07 Method for detecting characters in natural scene

Publications (2)

Publication Number Publication Date
CN112883964A true CN112883964A (en) 2021-06-01
CN112883964B CN112883964B (en) 2022-07-29

Family

ID=76056307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110176924.7A Active CN112883964B (en) 2021-02-07 2021-02-07 Method for detecting characters in natural scene

Country Status (1)

Country Link
CN (1) CN112883964B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165697A (en) * 2018-10-12 2019-01-08 福州大学 A kind of natural scene character detecting method based on attention mechanism convolutional neural networks
CN112149619A (en) * 2020-10-14 2020-12-29 南昌慧亦臣科技有限公司 Natural scene character recognition method based on Transformer model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482538A (en) * 2022-11-15 2022-12-16 上海安维尔信息科技股份有限公司 Material label extraction method and system based on Mask R-CNN
CN115661828A (en) * 2022-12-08 2023-01-31 中化现代农业有限公司 Character direction identification method based on dynamic hierarchical nested residual error network
CN115661828B (en) * 2022-12-08 2023-10-20 中化现代农业有限公司 Character direction recognition method based on dynamic hierarchical nested residual error network

Also Published As

Publication number Publication date
CN112883964B (en) 2022-07-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant