CN110929746A - Electronic file title positioning, extracting and classifying method based on deep neural network - Google Patents

Electronic file title positioning, extracting and classifying method based on deep neural network

Info

Publication number
CN110929746A
CN110929746A
Authority
CN
China
Prior art keywords
title
image
neural network
frame
electronic file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910454209.8A
Other languages
Chinese (zh)
Inventor
葛季栋
李传艺
刘宇翔
姚林霞
乔洪波
周筱羽
骆斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201910454209.8A priority Critical patent/CN110929746A/en
Publication of CN110929746A publication Critical patent/CN110929746A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an electronic file title positioning, extracting and classifying method based on a deep neural network, which comprises the following steps: inputting file pictures into a neural network to extract a plurality of feature maps at multiple sizes; calculating category scores and border positions from the output feature maps; and selecting the title position and title category in the document through several title election algorithms. The invention aims to solve the problem that electronic file images often have to be classified manually during actual electronic file processing. Titles are extracted directly at the image level rather than through OCR (optical character recognition) or similar means, so the position and category of an image title can be obtained accurately from image features alone, which improves overall robustness and raises the accuracy of image classification.

Description

Electronic file title positioning, extracting and classifying method based on deep neural network
Technical Field
The invention relates to a classification method for electronic files, in particular to a method for positioning, extracting and classifying the titles of electronic files based on a deep neural network, and belongs to the fields of computer vision and deep learning.
Background
To promote the case-synchronized generation of electronic files, deepen the integration of modern information technology with court work, and support the continued upgrade of the "smart court", courts across the country now generate electronic files for the cases they accept; for most cases the electronic file is produced synchronously with the case and covers the whole process of filing, handling, archiving and settlement. Case handlers must convert case materials into electronic documents in real time and generate the electronic file, ensuring that the entire case-handling process leaves traces in the system. Related personnel such as department heads, collegial panels and case-management offices can use the electronic file system to track case-handling progress online, review cases and examine case quality, raising the level of judicial management. Courts at all levels can transfer case files online through the electronic file system, improving the efficiency of cooperation between courts. The parties and their litigation agents can scan and upload electronic litigation materials on equipment provided by the provincial high courts, apply to consult and print the electronic file information of their cases, and follow case progress in real time, which better promotes judicial openness and supervision of enforcement.
However, cases and electronic files still require manual processing: workers must browse information of the relevant types, other data-mining and information-extraction tasks likewise depend on files of specific types, and cataloguing personnel must identify and split the electronic data, extract the file titles and enter the file names by hand, which is time-consuming and laborious.
The value of classifying electronic file pictures is twofold. On the one hand, when every picture in an electronic file carries a clear type label, related personnel can find the pictures they need to browse more quickly, track specific information and check whether any material is missing, which greatly improves the efficiency of consulting electronic files. On the other hand, as a first step in building the smart court, many subsequent artificial-intelligence steps such as information extraction depend heavily on already-classified pictures; classifying thousands of pictures by hand consumes considerable manpower, so automatically labelling electronic file pictures provides great convenience for these subsequent steps and saves a large amount of time and labour.
In computer vision, image classification is a fundamental problem, but when it is applied to electronic file images, classifying whole document images directly does not give an ideal result, because the global appearance of text-type document images is roughly the same, new types of litigation material keep appearing, and the distinguishability of the file materials is limited. The invention therefore applies another classical computer-vision task, object detection and recognition, to locate and classify the title region of the text image. Object detection methods fall roughly into two kinds. Two-stage methods first extract regions of interest and then extract image features of those regions again for subsequent classification and inference. One-stage methods extract features of the whole image end to end, shrinking the feature maps in a pyramid while each layer outputs prediction boxes of different aspect ratios and target sizes; the burden of predicting different target sizes is thus shared across layers, and the multi-task form of jointly predicting title categories and regressing the length and width of title boxes lets the two tasks reinforce each other and improve accuracy. During computation, because many long text boxes with extremely high aspect ratios must be predicted, several convolution layers with very elongated kernels are added, so the features of text titles and characters are easily lost, and computing the title category from high-level features alone can hardly achieve a satisfactory result; lower-level features must therefore be concatenated when predicting the title category, so that font features do not vanish as the network deepens. On this basis the method builds on a basic end-to-end object-detection deep network, simplifies the traditional pipeline of scanning the electronic file, running global OCR (optical character recognition) and then performing text analysis and title extraction, and focuses on positioning, extracting and classifying electronic file titles.
Disclosure of Invention
The invention discloses an electronic file title positioning, extracting and classifying method based on a deep neural network, together with an electronic file image preprocessing method. The method can greatly reduce the manpower and time a court spends manually consulting, classifying and archiving electronic files, provides convenient retrieval when a judge needs to consult file data or files of a specific type, and supplies clear image categories for the subsequent information extraction from specific document types, such as litigation documents and judgment documents, in artificial-intelligence-related processing.
The invention relates to an electronic file title positioning, extracting and classifying method based on a deep neural network, which is characterized by comprising the following steps of:
Step (1), inputting the file pictures into a neural network to extract a plurality of feature maps at multiple sizes.
Step (2), calculating a category score and a border position according to the output feature maps.
Step (3), inferring the title position and the title category in the document by using a plurality of title election algorithms.
2. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (1) the file pictures are input into the neural network to extract a plurality of feature maps at multiple sizes, the specific sub-steps comprising:
and (1.1) carrying out size correction on the file image and carrying out image preprocessing.
Step (1.2) input the preprocessed portfolio image into the underlying neural network and transmit into the title proposal neural network when the signature size becomes 1/8 initial.
Step (1.3) multiple long transverse convolutions and dilation convolutions are performed on the feature map and combined in the title proposed neural network.
And (1.4) finishing the final reduction of the feature map to 1/32 of the original map, and calculating and classifying two 1/8, two 1/16 and one 1/32 feature maps in the middle process.
And (1.5) image enhancement is carried out by adopting image rotation, filling, blurring, interception and brightness contrast adjustment in a training stage. And selecting a corresponding frame as a positive category mark through the Jaccard, selecting a specified number of frames with the lowest predicted values as negative category marks, and dividing the data set into a training, verifying and testing set. And training the network by continuously changing the hierarchical structure of the network parameters, and finally jointly evaluating the network by the f-measure of the frame and the classified call.
3. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (2) the category score and the border position are calculated according to the output feature maps, the specific sub-steps comprising:
Step (2.1), for each layer of the feature map, predicting through box-regression convolution the positions, mapped back to the original image, of title centers in several vertical columns and the lengths and widths of titles at various aspect ratios.
Step (2.2), continuing to compute the title-category features of each layer's feature map through an additional multi-layer classification module, and outputting the title category of each point mapped into the original image.
4. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (3) the title position and the title category in the document are selected by a plurality of title election algorithms, the specific sub-steps comprising:
Step (3.1), judging for every point in the image the probability of each title category, and, where a title exists, obtaining the predicted values of the title box's center, height and width.
Step (3.2), screening all prediction boxes by a threshold.
Step (3.3), clipping all predicted title-box values that exceed the image boundary.
Step (3.4), sorting all remaining title boxes in descending order of the probability of each title category and extracting the top k boxes with the highest probability.
Step (3.5), selecting the several box categories and box positions with the highest predicted probability by using the NMS algorithm.
Step (3.6), reprocessing the several results obtained in step (3.5) with a post-suppression NMS algorithm and finally electing one bounding-box result.
Step (3.7), evaluating the effect of the title proposal and classification network jointly by the f-measure of box predictions with IOU greater than 0.5 and the classification accuracy.
Compared with the prior art, the invention has the following notable advantages. It replaces the traditional approach to electronic file cataloguing, in which OCR is run on the file image and the title is then extracted by text analysis, with direct prediction of the title box position from image features, while the title box category is computed at the same time through shared convolutions; this simplifies the pipeline and removes both OCR errors and the cases where an indistinct title name cannot be classified. Image features are strongly robust to handwriting, rotation and blur, so the title position can still be detected and a plausible title category estimated where OCR fails, and time is saved. When an additional electronic file picture category must be supported, only additional training is required, with no extra processing steps. Because the extracted image features are distinctive, the judgment accuracy is reliable. When manual verification is needed, the title box marked on the image and the category on the box make it possible to check errors visually, which eases subsequent review. With the method, titles in a large volume of plain-text electronic file images can be extracted and recognized well without extra steps such as OCR, and for handwritten or blurred text images that OCR struggles to recognize, the title position can still be located and recognized from image features. When images of a new category must be added, only image samples of the new category need to be added for training, and the network converges quickly.
Drawings
FIG. 1 is a flow chart of the method for extracting and classifying electronic file titles based on a deep neural network
FIG. 2 is the general architecture of the title-locating network
FIG. 3 shows the special modules used for title boxes in the title extraction network
FIG. 4 shows examples of electronic file titles
FIG. 5 is a schematic diagram of part of the pre-allocated borders (only aspect ratios 3 and 13 are shown; in reality there are more than two, and the aspect ratios are placed in two columns for visibility)
FIG. 6 is the general network architecture of the classifier module
FIG. 7 is a flow chart of the post-suppression NMS algorithm
FIG. 8 is an experimental comparison between the conventional TextBoxes method and the electronic file title extraction, positioning and classification network presented herein
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention aims to solve the problem of electronic file cataloguing and provides an electronic file title positioning, extracting and classifying method based on a deep neural network. Using a deep neural network, the title position and title category in the electronic volume are extracted directly, avoiding the step of finding the document title by text extraction after OCR recognition of the whole page. The method replaces the traditional approach to electronic file cataloguing, in which OCR is run on the file image and the title is then extracted by text analysis, with direct prediction of the title box position from image features, while the title box category is computed at the same time through shared convolutions; this simplifies the pipeline and removes both OCR errors and the cases where an indistinct title name cannot be classified. Image features are strongly robust to handwriting, rotation and blur, so the title position can still be detected and a plausible title category estimated where OCR fails, and time is saved. When an additional electronic file picture category must be supported, only additional training is required, with no extra processing steps. Because the extracted image features are distinctive, the judgment accuracy is reliable. When manual verification is needed, the title box marked on the image and the category on the box make it possible to check errors visually, which eases subsequent review. With the method, titles in a large volume of plain-text electronic file images can be extracted and recognized well without extra steps such as OCR, and for handwritten or blurred text images that OCR struggles to recognize, the title position can still be located and recognized from image features. When images of a new category must be added, only image samples of the new category need to be added for training, and the network converges quickly. The invention mainly comprises the following steps:
Step (1), inputting the file pictures into a neural network to extract a plurality of feature maps at multiple sizes.
Step (2), calculating a category score and a border position according to the output feature maps.
Step (3), inferring the title position and the title category in the document by using a plurality of title election algorithms.
The detailed workflow of the electronic file title positioning, extracting and classifying method based on the deep neural network is shown in FIG. 1. The above steps are described in detail below.
1. Because electronic file images vary widely in size and proportion, a series of preprocessing operations is needed before the file images are input into the neural network to extract feature maps at multiple sizes, so that every file image can be fed into the deep neural network for processing. The specific steps are:
step (1.1) resizes the portfolio image to a fixed scale resolution (e.g., 320 x 320).
Step (1.2) inputs the preprocessed file image into a base neural network and passes it into the title proposal neural network when the feature map size reaches 1/8 of the initial size. The base neural network is chosen to extract the features of the electronic file image; base networks such as the Inception series, VGG or ResNet can be used, and their pre-trained models help the title proposal network learn the basic image features better and faster. The base network extracts the character features of the electronic file image and image features at levels such as the relations between the characters in each line. The structure of the whole network is shown in FIG. 2.
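A minimal sketch of this hand-off, assuming a VGG16 backbone (one of the bases the text names) truncated where the feature map reaches 1/8 of the input; the exact cut point is an assumption based on torchvision's layer ordering:

```python
import torch
import torchvision

# Truncate a pretrained VGG16 at pool3, where a 320x320 input becomes a
# 40x40 (1/8-scale) feature map handed to the title proposal network.
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
backbone_1_8 = torch.nn.Sequential(*list(vgg)[:17])   # conv1_1 .. pool3

x = torch.randn(1, 3, 320, 320)   # a preprocessed file image
feat = backbone_1_8(x)            # -> (1, 256, 40, 40), 1/8 of the input
```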
Step (1.3) performs multiple long horizontal convolutions and dilated convolutions on the feature map in the title proposal neural network and combines them. Since the title part of an electronic volume image is a long strip rather than the roughly square shapes of conventional object detection tasks, the features of the title box are mostly extracted with long horizontal convolutions, while vertical convolutions extract the features of other information on the image; and because high-level image features must be obtained without shrinking the feature map, forms such as dilated convolution and pooling are adopted so that the features of the image's title box are captured better. The structure of this network module is shown in FIG. 3.
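A sketch of such a module, with assumed kernel shapes (1x7 horizontal, 7x1 vertical, 3x3 dilated); the patent fixes only the convolution types, not their exact sizes:

```python
import torch
import torch.nn as nn

class TitleProposalBlock(nn.Module):
    """Combines a long horizontal convolution (strip-shaped title boxes),
    a vertical convolution (other layout cues) and a dilated convolution
    (larger receptive field without shrinking the map), as in FIG. 3."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.horizontal = nn.Conv2d(in_ch, out_ch, (1, 7), padding=(0, 3))
        self.vertical = nn.Conv2d(in_ch, out_ch, (7, 1), padding=(3, 0))
        self.dilated = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)   # merge the branches
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.horizontal(x), self.vertical(x), self.dilated(x)], 1)
        return self.relu(self.fuse(y))
```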
Step (1.4) finishes the final reduction of the feature map to 1/32 of the original image, and performs box regression and classification on five intermediate feature maps: two at 1/8, two at 1/16 and one at 1/32 scale. Since title sizes in electronic volume images span a very large range, as shown in FIG. 4, the title box must be predicted from an image pyramid of several layers with different sizes so that the network covers a wide range of scales.
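The multi-scale prediction can be sketched as below; the channel counts, the number of default boxes per location and the class count are illustrative assumptions:

```python
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """Box-regression and classification heads over the five intermediate
    feature maps (two at 1/8, two at 1/16, one at 1/32 scale)."""

    def __init__(self, chans=(256, 256, 512, 512, 512), boxes=6, classes=5):
        super().__init__()
        self.reg = nn.ModuleList(
            nn.Conv2d(c, boxes * 4, (1, 5), padding=(0, 2)) for c in chans)
        self.cls = nn.ModuleList(
            nn.Conv2d(c, boxes * (classes + 1), (1, 5), padding=(0, 2))
            for c in chans)

    def forward(self, feats):                  # feats: list of 5 tensors
        return ([r(f) for r, f in zip(self.reg, feats)],   # box offsets
                [c(f) for c, f in zip(self.cls, feats)])   # class scores
```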
Step (1.5) performs image augmentation in the training stage by rotation, padding, blurring, cropping and brightness-contrast adjustment. The matching boxes are selected as positive category labels via the Jaccard index, a specified number of boxes with the lowest predicted values are selected as negative category labels, and the data set is divided into training, validation and test sets. The network is trained while its parameters and hierarchy are adjusted, and it is finally evaluated jointly by the f-measure of the boxes and the classification recall. Meanwhile, in the training stage, softmax is used to convert the predicted values for box-category judgment, and cross entropy is used for the loss value of the training network; for negative samples, the top-k negative samples with the worst prediction results are selected, where k is the number of positive samples. The specific calculation formula is as follows:
$L_{conf}(x,c) = -\sum_{i \in Pos} x_{ij}^{p} \log(\hat{c}_{i}^{p}) - \sum_{i \in Neg} \log(\hat{c}_{i}^{0}), \qquad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{p} \exp(c_{i}^{p})}$
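In code, the hard-negative selection described above might look like this (a sketch; the tensor shapes are assumptions):

```python
import torch.nn.functional as F

def conf_loss(logits, labels):
    """Softmax cross-entropy over all default boxes, keeping every positive
    plus the top-k worst-predicted negatives, k = number of positives.
    logits: (N, classes+1) with class 0 = background; labels: (N,) int64."""
    per_box = F.cross_entropy(logits, labels, reduction="none")
    pos = labels > 0
    k = int(pos.sum())
    neg = per_box[~pos]
    hard_neg, _ = neg.topk(min(k, neg.numel()))   # most confusing negatives
    return per_box[pos].sum() + hard_neg.sum()
```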
The length and width of the box and the loss of the center point are computed with smooth L1; the center-point regression is scaled by a factor of 0.1 and the length-width regression by a factor of 0.15, so that their distributions become similar. The specific calculation formulas are as follows:
$L_{loc}(x,l,g) = \sum_{i \in Pos} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)$
$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
$\hat{g}_{j}^{cx} = \frac{g_{j}^{cx} - d_{i}^{cx}}{0.1\, d_{i}^{w}}, \quad \hat{g}_{j}^{cy} = \frac{g_{j}^{cy} - d_{i}^{cy}}{0.1\, d_{i}^{h}}, \quad \hat{g}_{j}^{w} = \frac{1}{0.15} \log\frac{g_{j}^{w}}{d_{i}^{w}}, \quad \hat{g}_{j}^{h} = \frac{1}{0.15} \log\frac{g_{j}^{h}}{d_{i}^{h}}$
The sum of the two losses is divided by the number N of selected positive samples; the specific calculation formula is:
$L(x,c,l,g) = \frac{1}{N}\left(L_{conf}(x,c) + L_{loc}(x,l,g)\right)$
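A corresponding sketch of the regression loss and the joint objective; the target encoding with the 0.1/0.15 factors is assumed to follow the SSD convention:

```python
import torch.nn.functional as F

def loc_loss(pred_offsets, target_offsets):
    """Smooth-L1 over the matched (positive) boxes; targets are assumed to
    be already encoded with the 0.1 center / 0.15 size scaling."""
    return F.smooth_l1_loss(pred_offsets, target_offsets, reduction="sum")

def total_loss(l_conf, l_loc, num_pos):
    """Joint objective: the summed losses divided by N positive samples."""
    return (l_conf + l_loc) / max(num_pos, 1)
```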
the positive sample is matched with pre-configured frames on the feature map through a marking frame during input, wherein the pre-configured frames are generally distributed as shown in fig. 5 and are frames with various aspect ratios, a node of each feature map is responsible for a plurality of frames with different aspect ratios generated by a plurality of central points of the area in the original map, and a specific calculation formula of the length and the width of each frame is as follows:
$w_{k}^{a} = v_{k}\sqrt{a_{r}}$
$h_{k}^{a} = \frac{v_{k}}{\sqrt{a_{r}}}$
where a_r is the aspect ratio, v_k is a preset box size, and f_k is the ratio of the box step size to the image size. The IOU is computed via the Jaccard index, i.e. the ratio of the intersection area of two rectangular boxes to the area of their union; an IOU greater than 0.5 marks a positive sample. The training goal is to select an optimal title proposal model; pre-training can first be carried out on a general title data set and then transferred to electronic file title extraction by transfer learning, which reduces over-fitting of the model.
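A sketch of the pre-configured boxes and the Jaccard matching; the aspect-ratio list and the grid layout are assumptions consistent with FIG. 5:

```python
import numpy as np

def default_boxes(fw, fh, f_k, v_k, aspect_ratios=(3, 5, 7, 9, 13)):
    """One set of long boxes per feature-map cell: centers on a regular
    grid, w = v_k * sqrt(a), h = v_k / sqrt(a), in normalized coordinates."""
    out = []
    for j in range(fh):
        for i in range(fw):
            cx, cy = (i + 0.5) * f_k, (j + 0.5) * f_k
            out += [[cx, cy, v_k * np.sqrt(a), v_k / np.sqrt(a)]
                    for a in aspect_ratios]
    return np.array(out)          # (N, 4) as (cx, cy, w, h)

def jaccard(a, b):
    """IOU of two (x1, y1, x2, y2) boxes; IOU > 0.5 marks a positive match."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```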
2. To convert the image features extracted by the neural network into the final multi-scale title detection result, the category score and box position must be calculated from the output feature maps in step (2). The specific sub-steps are:
and (2.1) predicting the position of the center of the title in a plurality of vertical rows in the original image and the length and width of the title in various aspect ratios by using the point mapping for each layer of feature map through frame regression convolution.
Step (2.2) continues to compute the title-category features of each layer's feature map through an additional multi-layer classification module and outputs the title category of each point mapped into the original image. The general structure is shown in FIG. 6. Since most titles share many characters, and high-level image features may no longer carry the values of the image's low-level features, especially after features have been computed over a long title, the title category is predicted jointly from low-level features combined with high-level features, which amplifies the distinction between small fonts inside a large title box.
3. To integrate all preset title boxes output by the neural network and finally obtain, by screening, a single title box position and category for the electronic file image, the title position and title category in the document must be selected in step (3) through several title election algorithms. The specific sub-steps are:
and (3.1) judging the possibility of each title of all points in the image, and if the titles exist, acquiring predicted values of the center and the height width of the title frame. I.e. the classification output and the title frame regression output in the step (2).
Step (3.2) screens all prediction boxes by a threshold. Since most of the title boxes are background or need not be adopted, and the number of title boxes output across the feature maps is very large, the boxes with small prediction scores are excluded directly by a threshold to simplify subsequent processing and reduce subsequent computation.
Step (3.3) clips all predicted title-box values that exceed the image boundary. Since the regression values of the output title box may exceed the boundary of the image, any value that exceeds it is corrected to the image boundary.
Step (3.4) sorts all remaining title boxes in descending order of the probability of each title category and extracts the top k boxes with the highest probability, thereby obtaining all the boxes most likely to be the title of the electronic volume.
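Steps (3.2) to (3.4) can be sketched together as one filtering pass; the 0.1 score threshold and k = 200 are illustrative assumptions:

```python
import numpy as np

def select_candidates(boxes, scores, img_w, img_h, thresh=0.1, top_k=200):
    """boxes: (N, 4) pixel (x1, y1, x2, y2); scores: (N,) best title-class
    probability per box. Returns the clipped top-k candidates."""
    keep = scores > thresh                            # step (3.2): threshold
    boxes, scores = boxes[keep].copy(), scores[keep]
    boxes[:, 0::2] = boxes[:, 0::2].clip(0, img_w)    # step (3.3): clip x
    boxes[:, 1::2] = boxes[:, 1::2].clip(0, img_h)    # step (3.3): clip y
    order = np.argsort(-scores)[:top_k]               # step (3.4): top-k
    return boxes[order], scores[order]
```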
Step (3.5) selects the several box categories and box positions with the highest predicted probability using the NMS algorithm.
Step (3.6) reprocesses the several results obtained in step (3.5) with a post-suppression NMS algorithm and finally elects one bounding-box result. The general flow of the post-suppression NMS algorithm is shown in FIG. 7. Its purpose is to filter out final judgment errors caused by mistakenly outputting the predicted box category with the highest probability: because the network ultimately adopts only one bounding box, the conventional NMS algorithm degenerates, for electronic file title positioning, into selecting only the single box with the highest probability, which produces some fluctuation. The post-suppression NMS algorithm reduces such interference in the detected title result.
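A sketch of step (3.5) and a stand-in for the post-suppression election of step (3.6); the patent does not disclose the exact suppression rule, so the score-margin test below is an assumption:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Plain NMS over (x1, y1, x2, y2) boxes, as in step (3.5)."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order, keep = np.argsort(-scores), []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(i)
        iw = np.maximum(0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < iou_thresh]
    return keep

def elect_title(boxes, scores, labels, margin=0.05):
    """Step (3.6) stand-in: keep the best NMS survivor; a small score margin
    over the runner-up flags ambiguous elections for manual review."""
    keep = sorted(nms(boxes, scores), key=lambda i: -scores[i])
    best = keep[0]
    ambiguous = len(keep) > 1 and scores[best] - scores[keep[1]] < margin
    return boxes[best], labels[best], ambiguous
```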
Step (3.7) evaluates the effect of the title proposal and classification network jointly by the f-measure of box predictions with IOU greater than 0.5 and the classification accuracy. Because the aspect ratios of electronic file titles vary widely, no single feature characterizes them well, and the title fonts of different types of electronic files differ greatly. In the scenario of intelligent electronic file cataloguing, a good electronic file image classification network should take this font variability into account; the invention therefore evaluates jointly with the f-measure of box prediction and the accuracy of classification. In the experimental evaluation, the conventional TextBoxes method is compared with the electronic file title extraction, positioning and classification network presented herein on five different types of electronic file images; the results are shown in FIG. 8. The title positions and category judgments extracted by the invention are superior to the other methods on all five types of electronic file images; the other conventional methods cannot cover all titles of the electronic file and cannot extract the title category information of the electronic file well.
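For completeness, the joint evaluation can be sketched as follows, assuming one ground-truth title per image; iou_fn can be any IOU helper such as the jaccard() sketch above:

```python
def evaluate(predictions, ground_truth, iou_fn):
    """predictions / ground_truth: per-image (box, label) pairs, box = None
    when nothing was detected. Returns (f-measure at IOU > 0.5, accuracy)."""
    tp = fp = fn = correct = 0
    for (p_box, p_label), (g_box, g_label) in zip(predictions, ground_truth):
        if p_box is None:
            fn += 1
        elif iou_fn(p_box, g_box) > 0.5:
            tp += 1
            correct += int(p_label == g_label)
        else:
            fp += 1
            fn += 1
    precision, recall = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-9)
    return f_measure, correct / max(tp, 1)
```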
The method for positioning, extracting and classifying electronic file titles based on a deep neural network according to the present invention has been described in detail above with reference to the accompanying drawings. The invention has the following advantages. It replaces the traditional approach to electronic file cataloguing, in which OCR is run on the file image and the title is then extracted by text analysis, with direct prediction of the title box position from image features, while the title box category is computed at the same time through shared convolutions; this simplifies the pipeline and removes both OCR errors and the cases where an indistinct title name cannot be classified. Image features are strongly robust to handwriting, rotation and blur, so the title position can still be detected and a plausible title category estimated where OCR fails, and time is saved. When an additional electronic file picture category must be supported, only additional training is required, with no extra processing steps. Because the extracted image features are distinctive, the judgment accuracy is reliable. When manual verification is needed, the title box marked on the image and the category on the box make it possible to check errors visually, which eases subsequent review. With the method, titles in a large volume of plain-text electronic file images can be extracted and recognized well without extra steps such as OCR, and for handwritten or blurred text images that OCR struggles to recognize, the title position can still be located and recognized from image features. When images of a new category must be added, only image samples of the new category need to be added for training, and the network converges quickly. Compared with a conventional object detection network, the network modules specially designed for electronic files allow the network to detect the title positions of almost all electronic files well while also extracting the features of all the characters in electronic file titles.
It is to be understood that the invention is not limited to the specific arrangements and instrumentalities described above and shown in the drawings; a detailed description of known techniques is omitted here for brevity. The present embodiments are to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (4)

1. A method for extracting and classifying electronic file titles based on a deep neural network, characterized by comprising the following steps:
step (1), inputting the file pictures into a neural network to extract a plurality of feature maps at multiple sizes;
step (2), calculating a category score and a border position according to the output feature maps;
step (3), inferring the title position and the title category in the document by using a plurality of title election algorithms.
2. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (1) the file pictures are input into the neural network to extract a plurality of feature maps at multiple sizes, the specific sub-steps comprising:
step (1.1), correcting the size of the file image and performing image preprocessing;
step (1.2), inputting the preprocessed file image into the base neural network and passing it into the title proposal neural network when the feature map size reaches 1/8 of the initial size;
step (1.3), performing multiple long horizontal convolutions and dilated convolutions on the feature map in the title proposal neural network and combining the results;
step (1.4), finishing the final reduction of the feature map to 1/32 of the original image, and performing box regression and classification on five intermediate feature maps: two at 1/8, two at 1/16 and one at 1/32 scale;
step (1.5), in the training stage, performing image augmentation by rotation, padding, blurring, cropping and brightness-contrast adjustment; selecting the matching boxes as positive category labels via the Jaccard index, selecting a specified number of boxes with the lowest predicted values as negative category labels, and dividing the data set into training, validation and test sets; training the network while adjusting its parameters and hierarchy, and finally evaluating it jointly by the f-measure of the boxes and the classification recall.
3. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (2) the category score and the border position are calculated according to the output feature maps, the specific sub-steps comprising:
step (2.1), for each layer of the feature map, predicting through box-regression convolution the positions, mapped back to the original image, of title centers in several vertical columns and the lengths and widths of titles at various aspect ratios;
step (2.2), continuing to compute the title-category features of each layer's feature map through an additional multi-layer classification module, and outputting the title category of each point mapped into the original image.
4. The method for extracting and classifying electronic file titles based on the deep neural network as claimed in claim 1, wherein in step (3) the title position and the title category in the document are selected by a plurality of title election algorithms, the specific sub-steps comprising:
step (3.1), judging for every point in the image the probability of each title category, and, where a title exists, obtaining the predicted values of the title box's center, height and width;
step (3.2), screening all prediction boxes by a threshold;
step (3.3), clipping all predicted title-box values that exceed the image boundary;
step (3.4), sorting all remaining title boxes in descending order of the probability of each title category and extracting the top k boxes with the highest probability;
step (3.5), selecting the several box categories and box positions with the highest predicted probability by using the NMS algorithm;
step (3.6), reprocessing the several results obtained in step (3.5) with a post-suppression NMS algorithm and finally electing one bounding-box result;
step (3.7), evaluating the effect of the title proposal and classification network jointly by the f-measure of box predictions with IOU greater than 0.5 and the classification accuracy.
CN201910454209.8A 2019-05-24 2019-05-24 Electronic file title positioning, extracting and classifying method based on deep neural network Pending CN110929746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910454209.8A CN110929746A (en) 2019-05-24 2019-05-24 Electronic file title positioning, extracting and classifying method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910454209.8A CN110929746A (en) 2019-05-24 2019-05-24 Electronic file title positioning, extracting and classifying method based on deep neural network

Publications (1)

Publication Number Publication Date
CN110929746A 2020-03-27

Family

ID=69855684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910454209.8A Pending CN110929746A (en) 2019-05-24 2019-05-24 Electronic file title positioning, extracting and classifying method based on deep neural network

Country Status (1)

Country Link
CN (1) CN110929746A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860524A (en) * 2020-07-28 2020-10-30 上海兑观信息科技技术有限公司 Intelligent classification device and method for digital files
CN112132710A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Legal element processing method and device, electronic equipment and storage medium
CN112446372A (en) * 2020-12-08 2021-03-05 电子科技大学 Text detection method based on channel grouping attention mechanism
CN112560902A (en) * 2020-12-01 2021-03-26 中国农业科学院农业信息研究所 Book identification method and system based on spine visual information
CN112766246A (en) * 2021-04-09 2021-05-07 上海旻浦科技有限公司 Document title identification method, system, terminal and medium based on deep learning
CN113781607A (en) * 2021-09-17 2021-12-10 平安科技(深圳)有限公司 Method, device and equipment for processing annotation data of OCR (optical character recognition) image and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682697A (en) * 2016-12-29 2017-05-17 华中科技大学 End-to-end object detection method based on convolutional neural network
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN109584227A (en) * 2018-11-27 2019-04-05 山东大学 A kind of quality of welding spot detection method and its realization system based on deep learning algorithm of target detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682697A (en) * 2016-12-29 2017-05-17 华中科技大学 End-to-end object detection method based on convolutional neural network
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN109584227A (en) * 2018-11-27 2019-04-05 山东大学 A kind of quality of welding spot detection method and its realization system based on deep learning algorithm of target detection

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860524A (en) * 2020-07-28 2020-10-30 上海兑观信息科技技术有限公司 Intelligent classification device and method for digital files
CN112132710A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Legal element processing method and device, electronic equipment and storage medium
CN112132710B (en) * 2020-09-23 2023-02-03 平安国际智慧城市科技股份有限公司 Legal element processing method and device, electronic equipment and storage medium
CN112560902A (en) * 2020-12-01 2021-03-26 中国农业科学院农业信息研究所 Book identification method and system based on spine visual information
CN112446372A (en) * 2020-12-08 2021-03-05 电子科技大学 Text detection method based on channel grouping attention mechanism
CN112766246A (en) * 2021-04-09 2021-05-07 上海旻浦科技有限公司 Document title identification method, system, terminal and medium based on deep learning
CN113781607A (en) * 2021-09-17 2021-12-10 平安科技(深圳)有限公司 Method, device and equipment for processing annotation data of OCR (optical character recognition) image and storage medium
CN113781607B (en) * 2021-09-17 2023-09-19 平安科技(深圳)有限公司 Processing method, device, equipment and storage medium for labeling data of OCR (optical character recognition) image

Similar Documents

Publication Publication Date Title
CN111325203B (en) American license plate recognition method and system based on image correction
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN109857889B (en) Image retrieval method, device and equipment and readable storage medium
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
US20200302248A1 (en) Recognition system for security check and control method thereof
CN110717534B (en) Target classification and positioning method based on network supervision
CN109740603A (en) Based on the vehicle character identifying method under CNN convolutional neural networks
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
Ahranjany et al. A very high accuracy handwritten character recognition system for Farsi/Arabic digits using convolutional neural networks
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN110781648A (en) Test paper automatic transcription system and method based on deep learning
CN114155527A (en) Scene text recognition method and device
CN110674777A (en) Optical character recognition method in patent text scene
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN112365497A (en) High-speed target detection method and system based on Trident Net and Cascade-RCNN structures
CN111242131B (en) Method, storage medium and device for identifying images in intelligent paper reading
CN114092938A (en) Image recognition processing method and device, electronic equipment and storage medium
CN108960005B (en) Method and system for establishing and displaying object visual label in intelligent visual Internet of things
CN111832497B (en) Text detection post-processing method based on geometric features
CN117911697A (en) Hyperspectral target tracking method, system, medium and equipment based on large model segmentation
CN117557860A (en) Freight train foreign matter detection method and system based on YOLOv8
CN112418262A (en) Vehicle re-identification method, client and system
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination