CN115471856A - Invoice image information identification method and device and storage medium

Invoice image information identification method and device and storage medium

Info

Publication number
CN115471856A
CN115471856A
Authority
CN
China
Prior art keywords
layer
forest
invoice image
invoice
image information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211012411.3A
Other languages
Chinese (zh)
Inventor
张文洋
杨桂珍
尹旭
褚夕
杨寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202211012411.3A priority Critical patent/CN115471856A/en
Publication of CN115471856A publication Critical patent/CN115471856A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means


Abstract

The invention relates to an invoice image information identification method and device and a storage medium. An invoice image is acquired and preprocessed, and the preprocessed image is input into a residual neural network optimized by an evolutionary algorithm to identify the regions to be identified in the invoice. The residual neural network comprises a first convolution layer, a first pooling layer connected with the first convolution layer, a second convolution layer connected with the first pooling layer, a second pooling layer connected with the second convolution layer, a first residual block connected with the second pooling layer, a second residual block connected with the first residual block, a third convolution layer connected with the second residual block, a third pooling layer connected with the third convolution layer, a third residual block connected with the third pooling layer, a global average pooling layer connected with the third residual block, and a depth forest classifier based on multi-objective optimization connected with the global average pooling layer. The relevant digital information in each region to be identified is recognized, and the recognized information is stored in a formatted manner.

Description

Invoice image information identification method and device and storage medium
Technical Field
The invention relates to the field of invoice identification, in particular to an invoice image information identification method, an invoice image information identification device and a storage medium.
Background
In recent years, with the deepening of tax system reform and the implementation of tax laws such as the replacement of business tax with value-added tax, government supervision has been continuously strengthened, the level of supervision has improved rapidly, the means of supervision have been continuously enriched, and the use of invoices in China has increased sharply; special and general value-added tax invoices account for a very large share, about 90% of all invoices. At the present stage the invoice reimbursement process is still very cumbersome: financial reimbursement in enterprises, public institutions and other organizations relies mainly on manual entry by staff of the financial or business departments, which carries high labor costs, consumes a great deal of time, and is prone to entry errors and insufficient accuracy when processing is concentrated at the end of a month or year.
At present, traditional algorithms can identify information in scanned special and general value-added tax invoice images, but they struggle with invoice images captured by mobile phones in natural scenes, with varying zoom ratios, degrees of blur, brightness, inclination angles, sizes and background interference; and when key information in an invoice must be located intelligently, traditional template matching and anchor-point positioning methods lack generality. Text detection and recognition is an important application of computer vision: text recognition technology converts the text contained in an image into a form a computer can directly interpret and use. Deep learning, also called representation learning, is an important branch of machine learning and one of the most active research directions today. Deep learning has profoundly changed pattern recognition; it can be viewed as a credit-assignment mechanism for the latent associations between behaviors and outcomes in adaptive systems. Deep learning was introduced into machine learning as early as 1986 and applied to artificial neural networks around 2000. Deep neural networks are among its most successful lines of research, with major breakthroughs in speech recognition, face recognition, natural language processing, medicine, security and other fields. The development of deep architectures began with artificial neural networks, which have long been a focus of research. The first generation of artificial neural networks consisted of single-layer perceptrons, whose performance was limited by their simple computations. Second-generation artificial neural networks updated neuron weight parameters according to the error rate using the back-propagation algorithm. With the advent of the support vector machine, low-dimensional inseparable problems could be converted into high-dimensional separable ones, surpassing the first and second generations; meanwhile, the Boltzmann machine emerged and addressed problems of the back-propagation algorithm in second-generation networks. Subsequently a large number of deep learning algorithms and neural networks appeared, such as feedforward neural networks, convolutional neural networks, recurrent neural networks, deep belief networks, autoencoders and adversarial networks. Zheng Wei et al. proposed a method and system for automatically identifying and managing value-added tax invoices, mainly addressing the low efficiency and accuracy of invoice entry. The method first acquires an invoice image automatically and preprocesses it to obtain a gray-scale map, then identifies and extracts the invoice information: each region of the invoice content is detected by a cascade target detector, the content of each detected region is recognized by an invoice content recognizer to obtain a result and a score, the score is divided into three levels according to a set confidence interval, and finally the recognized information is manually corrected before being stored in a database.
Zheng Dixin et al. provide an invoice identification method and apparatus and a computer storage medium: text recognition is performed on an invoice image to obtain a text recognition result, the result is split into at least one text line, and the entry information corresponding to entries in the invoice image is determined from the text recognition result contained in each line. Specifically, the text recognition result contained in a first text line is analyzed to determine the correspondence between at least one entry contained in that line and at least one piece of entry information, and the text recognition result contained in the next text line is then analyzed on the basis of that correspondence. The method is innovative and addresses the accuracy of invoice image recognition, but the proposed algorithm needs a large number of samples to support network training, overfits easily on small samples, and there is still no effective method for recognizing images from the unbalanced training samples that are common in practice.
Disclosure of Invention
In order to solve the technical problems or at least partially solve the technical problems, the invention provides an invoice image information identification method, an invoice image information identification device and a storage medium.
The invention provides an invoice image information identification method, which comprises: collecting an invoice image and preprocessing it, and inputting the preprocessed image into a residual neural network optimized by an evolutionary algorithm to identify the regions to be identified in the invoice; the residual neural network comprises a first convolution layer, a first pooling layer connected with the first convolution layer, a second convolution layer connected with the first pooling layer, a second pooling layer connected with the second convolution layer, a first residual block connected with the second pooling layer, a second residual block connected with the first residual block, a third convolution layer connected with the second residual block, a third pooling layer connected with the third convolution layer, a third residual block connected with the third pooling layer, a global average pooling layer connected with the third residual block, and a depth forest classifier based on multi-objective optimization connected with the global average pooling layer; and identifying the relevant digital information in each region to be identified and storing the identified information in a formatted manner.
Furthermore, the preprocessing comprises applying an affine transformation to the image to bring it to a preset size, applying a perspective transformation to correct perspective deformation of the region to be identified, and performing edge detection to extract the effective information of the image.
Furthermore, the first, second and third residual blocks have the same structure, each comprising an input layer connected to an inner convolutional layer; the output of the inner convolutional layer feeds both a threshold screening layer and a global convolutional layer; the output of the global convolutional layer passes through a batch normalization layer, a ReLU activation function layer and a Sigmoid activation function layer; the output of the inner convolutional layer is weighted by the output of the Sigmoid activation function layer and then input into the threshold screening layer; and the input layer is combined with the output of the threshold screening layer to form the block output.
Furthermore, the deep forest classifier adopts a cascade forest structure in which each forest layer is an ensemble of decision trees; the feature vector generated by each forest layer is concatenated with the original feature vector and input into the next layer, up to the last forest layer, and the maximum of the per-class averages of the last forest layer's results is taken as the classification result output by the deep forest classifier.
Further, automatically determining the number of cascade layers according to whether increasing the number of cascade layers improves the performance of the deep forest classifier comprises: each forest generates a class vector through k-fold cross validation, that is, each sample is used as a training sample k−1 times, producing k−1 class vectors; validation data are obtained from the images; each time a new forest layer is grown, the performance of the whole deep forest classifier is evaluated on the validation data, and if the performance shows no significant improvement, no further layers are added.
Furthermore, each layer of the cascade forest structure comprises a random forest and a completely random forest. When the decision trees of the random forest are constructed, √d features are randomly selected from the whole feature space as candidate features, where d is the number of input features, and the feature with the best Gini value is then selected as the splitting feature of the node; the completely random forest instead randomly selects 1 feature from the whole feature space as the splitting feature of each node.
Furthermore, the hyper-parameters involved in the deep forest classifier include the number w_i of random forests in each forest layer, the number θ_i of completely random forests in each forest layer, and the number b_i of decision trees contained in each forest; these hyper-parameters are optimized in a multi-objective manner.

The multi-objective optimization uses the deep forest activation function h as the first optimization function and the order of magnitude β of the deep forest parameters as the second optimization function, and the hyper-parameters are optimized through the first and second optimization functions.
the first and second optimization functions are constrained by a first and second objective, where the first objective is the root mean square error over the training set as:
Figure BDA0003811435100000043
the second objective is sparsity:
Figure BDA0003811435100000044
wherein x is tr Is a training set sample, N tr Is the number of training set samples, o represents the Hadamard product of two numbers converted into vectors, omega ii ,b i Respectively, hyper-parameters of the deep forest classifier; n represents the number of neurons in each layer and the number of layers of the L neural network;
The objective function of the multi-objective optimization model minimizes the two objectives simultaneously, min F = (E_RMSE, sparsity), so that the model is as sparse as possible on the premise of good performance.
Furthermore, the residual neural network is optimized by an evolutionary algorithm whose selection operator rearranges the population individuals using a ranking method; after rearrangement, the probability of an individual being selected is

s = p_0 / (1 − (1 − p_0)^a),
p = s(1 − p_0)^(b−1),

where a is the population size of the evolutionary algorithm, p_0 is the probability that the optimal individual is selected, s is the value obtained by normalizing p_0, and b is the position of the individual after the population is rearranged.
the evolution algorithm optimization adopts the reciprocal of the sum of squared errors as a fitness function:
f (j) =1/E (j), wherein,
Figure BDA0003811435100000053
e is the sum of squared errors, P is the overall output, w is the weight, x is the input characteristic, F is the fitness, j is the number of generations, y j Is a theoretical output.
In a second aspect, the present invention provides an invoice image information recognition device, comprising a processing unit, a bus unit, a storage unit and an image acquisition unit, wherein the bus unit connects the storage unit, the processing unit and the image acquisition unit, the storage unit stores a computer program, and the computer program, when executed by the processing unit, implements the invoice image information recognition method.
In a third aspect, the present invention provides a storage medium for implementing an invoice image information recognition method, wherein the storage medium stores a computer program, and the computer program implements the invoice image information recognition method when executed by a processor.
Compared with the prior art, the technical scheme provided by the embodiments of the invention has the following advantages. The acquired invoice image is preprocessed, and the preprocessed image is input into a residual neural network optimized by an evolutionary algorithm to identify the regions to be identified in the invoice; the residual neural network comprises a first convolution layer, a first pooling layer connected with the first convolution layer, a second convolution layer connected with the first pooling layer, a second pooling layer connected with the second convolution layer, a first residual block connected with the second pooling layer, a second residual block connected with the first residual block, a third convolution layer connected with the second residual block, a third pooling layer connected with the third convolution layer, a third residual block connected with the third pooling layer, a global average pooling layer connected with the third residual block, and a depth forest classifier based on multi-objective optimization connected with the global average pooling layer; the relevant digital information in each region to be identified is identified, and the identified information is stored in a formatted manner. The introduction of the global average pooling layer greatly reduces the number of parameters to be computed and greatly improves the computation speed of the residual neural network; unlike a fully connected layer, it needs no large number of trainable parameters, which avoids overfitting. The global average pooling layer aggregates spatial information and is therefore more robust to spatial transformations of the input data. The first, second and third residual blocks avoid gradient explosion and vanishing by using shortcut connections to skip convolutions, which helps construct deeper neural network structures.
The first, second and third residual blocks also refine the threshold using an attention mechanism, so that the residual neural network automatically generates a threshold for each input to suppress noise, and each group of input data receives its own feature-channel weighting according to the importance of the sample. The data are processed by the global convolutional layer and then pass through the batch normalization layer, the ReLU activation function layer and the Sigmoid activation function layer; the Sigmoid activation function maps the output into [0,1], the mapped scaling coefficient is denoted α, and the final threshold can be expressed as α × A, so that different samples correspond to different thresholds; screening through the threshold screening layer then eliminates or weakens the noise.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic diagram of a residual neural network provided in an embodiment of the present invention;
fig. 2 is a schematic diagram of a first residual block, a second residual block, and a third residual block according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a deep forest classifier provided in an embodiment of the present invention;
fig. 4 is a schematic diagram of an invoice image information recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Example 1
Referring to fig. 1, the present invention provides an invoice image information recognition method, including:
acquiring an invoice image and preprocessing the acquired invoice image. Specifically, the preprocessing includes performing affine transformation on the invoice image to reach a preset size, performing perspective transformation on the invoice image to correct perspective deformation of an area to be identified of the invoice image, and performing edge detection on the invoice image to extract effective information of the image. At present, domestic invoices are mainly divided into two categories: one type is an electronic invoice and the other type is a paper invoice. The electronic invoice has a standard format and very clear font printing, so that the invoice image of the electronic invoice can be directly used as original input data; the paper invoice can be photographed by a mobile phone, a camera and other photographic equipment, or is converted into an electronic image through scanning to serve as original input data. The invoice image obtained by photographing or scanning the paper invoice usually has the problems of unfixed size, perspective deformation and the like. The invention can effectively improve the invoice image obtained by photographing or scanning by preprocessing the invoice image.
The preprocessed image is input into the residual neural network optimized by the evolutionary algorithm to identify the regions to be identified in the invoice. The selection operator of the evolutionary algorithm rearranges the population individuals using a ranking method; after rearrangement, the probability of an individual being selected is

s = p_0 / (1 − (1 − p_0)^a),
p = s(1 − p_0)^(b−1),

where a is the population size of the evolutionary algorithm, p_0 is the probability that the optimal individual is selected, s is the value obtained by normalizing p_0, and b is the position of the individual after the population is rearranged.
the evolution algorithm optimization adopts the reciprocal of the sum of squared errors as a fitness function:
f (j) =1/E (j), wherein,
Figure BDA0003811435100000082
e is the sum of squares of the errors, P is the overall output, w is the weight, x is the input characteristic, F is the fitness, j is the number of generations, y j Is a theoretical output.
Referring to fig. 1, the residual neural network comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a first residual block, a second residual block, a third convolution layer, a third pooling layer, a third residual block, a global average pooling layer, and a depth forest classifier based on multi-objective optimization, connected in sequence. The global average pooling layer directly averages the feature map of each channel, i.e., each feature map outputs one value, and the result is input to the multi-objective-optimization-based depth forest classifier; the relevant digital information in each region to be identified is then recognized and stored in a formatted manner. The introduction of the global average pooling layer greatly reduces the number of parameters to be computed and greatly improves the computation speed of the residual neural network; unlike a fully connected layer, it needs no large number of trainable parameters, which avoids overfitting. The global average pooling layer aggregates spatial information and is therefore more robust to spatial transformations of the input data.
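For orientation only, here is a PyTorch sketch of the backbone described above; the channel counts, kernel sizes and pooling strides are assumptions, since the patent does not specify them, and `block` stands for the attention-threshold residual block sketched after fig. 2 below.

```python
import torch.nn as nn

class InvoiceBackbone(nn.Module):
    """Sketch of the residual backbone of fig. 1: conv/pool pairs, three
    residual blocks, and global average pooling feeding the deep forest
    classifier (one output value per channel)."""
    def __init__(self, block, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),         # first convolution layer
            nn.MaxPool2d(2),                              # first pooling layer
            nn.Conv2d(channels, channels, 3, padding=1),  # second convolution layer
            nn.MaxPool2d(2),                              # second pooling layer
            block(channels),                              # first residual block
            block(channels),                              # second residual block
            nn.Conv2d(channels, channels, 3, padding=1),  # third convolution layer
            nn.MaxPool2d(2),                              # third pooling layer
            block(channels),                              # third residual block
            nn.AdaptiveAvgPool2d(1),                      # global average pooling
        )

    def forward(self, x):
        # One value per channel, to be passed to the deep forest classifier.
        return self.features(x).flatten(1)
```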
Referring to fig. 2, the first, second and third residual blocks have the same structure, each comprising an input layer connected to an inner convolutional layer; the output of the inner convolutional layer feeds both a threshold screening layer and a global convolutional layer; the output of the global convolutional layer passes through a batch normalization layer, a ReLU activation function layer and a Sigmoid activation function layer; the output of the inner convolutional layer is weighted by the output of the Sigmoid activation function layer and then input into the threshold screening layer; and the input layer is combined with the output of the threshold screening layer to form the block output. The first, second and third residual blocks avoid gradient explosion and vanishing by using shortcut connections to skip convolutions, which helps construct deeper neural network structures. They also refine the threshold using an attention mechanism, so that the residual neural network automatically generates a threshold for each input to suppress noise, and each group of input data receives its own feature-channel weighting according to the importance of the sample. The data are processed by the global convolutional layer and then pass through the batch normalization layer, the ReLU activation function layer and the Sigmoid activation function layer; the Sigmoid activation function maps the output into [0,1], the mapped scaling coefficient is denoted α, and the final threshold can be expressed as α × A, so that different samples correspond to different thresholds; screening through the threshold screening layer then eliminates or weakens the noise.
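One possible reading of this block, sketched in PyTorch: it follows the deep residual shrinkage pattern the description suggests, with A taken as the channel-wise mean of absolute activations and the threshold screening layer realized as soft thresholding. Both choices are interpretations for illustration, not the patent's definitive layers.

```python
import torch
import torch.nn as nn

class ThresholdResidualBlock(nn.Module):
    """Residual block with an attention-generated threshold (fig. 2)."""
    def __init__(self, channels):
        super().__init__()
        self.inner = nn.Conv2d(channels, channels, 3, padding=1)  # inner conv layer
        self.attn = nn.Sequential(              # global conv -> BN -> ReLU -> Sigmoid
            nn.Conv2d(channels, channels, 1),   # "global" convolution (assumed 1x1)
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),                       # maps to [0, 1]: scaling alpha
        )

    def forward(self, x):
        u = self.inner(x)
        # A: channel-wise mean of |u| per sample (assumed definition of A).
        a = torch.mean(torch.abs(u), dim=(2, 3), keepdim=True)
        alpha = self.attn(a)                    # per-sample, per-channel alpha
        tau = alpha * a                         # final threshold: alpha * A
        # Threshold screening realized as soft thresholding (interpretation).
        screened = torch.sign(u) * torch.relu(torch.abs(u) - tau)
        return x + screened                     # shortcut connection
```

Under these assumptions, `InvoiceBackbone(ThresholdResidualBlock)` builds the network of fig. 1.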
Referring to fig. 3, the deep forest classifier adopts a cascade forest structure in which each forest layer is an ensemble of decision trees; the feature vector generated by each forest layer is concatenated with the original feature vector and input into the next layer, up to the last forest layer, and the maximum of the per-class averages of the last forest layer's results is taken as the classification result output by the deep forest classifier.
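A minimal sketch of one cascade level in Python with scikit-learn: RandomForestClassifier with max_features="sqrt" stands in for the random forest (√d candidate features) and ExtraTreesClassifier with max_features=1 for the completely random forest, while the class vectors are produced by k-fold cross validation as the description requires. Forest and tree counts are illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def cascade_layer(X, y, n_trees=100, k=3):
    """One forest layer: each forest's class vector (from k-fold CV) is
    concatenated with the original features to form the next layer's input."""
    forests = [
        RandomForestClassifier(n_estimators=n_trees, max_features="sqrt"),
        ExtraTreesClassifier(n_estimators=n_trees, max_features=1),
    ]
    class_vecs = [cross_val_predict(f, X, y, cv=k, method="predict_proba")
                  for f in forests]
    for f in forests:
        f.fit(X, y)  # refit on all data for use at prediction time
    augmented = np.hstack([X] + class_vecs)  # input to the next layer
    return forests, class_vecs, augmented
```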
In one possible embodiment, automatically determining the number of cascade layers according to whether increasing the number of cascade layers improves the performance of the deep forest classifier comprises: each forest generates a class vector through k-fold cross validation, that is, each sample is used as a training sample k−1 times, producing k−1 class vectors; validation data are obtained from the images; each time a new forest layer is grown, the performance of the whole deep forest classifier is evaluated on the validation data, and if the performance shows no significant improvement, no further layers are added.
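A sketch of that growth criterion, under stated assumptions: for brevity the class vectors come from a direct fit rather than the k-fold cross validation used above, the labels are assumed to be integers 0..y−1, and the improvement tolerance is an invented parameter.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

def grow_cascade(X_tr, y_tr, X_val, y_val, max_layers=10, tol=1e-3):
    """Grow forest layers one at a time; stop when accuracy on the held-out
    validation data no longer improves noticeably."""
    f_tr, f_val, best_acc = X_tr, X_val, 0.0
    for _ in range(max_layers):
        forests = [RandomForestClassifier(n_estimators=100, max_features="sqrt"),
                   ExtraTreesClassifier(n_estimators=100, max_features=1)]
        for f in forests:
            f.fit(f_tr, y_tr)
        proba_tr = [f.predict_proba(f_tr) for f in forests]
        proba_val = [f.predict_proba(f_val) for f in forests]
        # Class vectors are concatenated with the current features.
        f_tr = np.hstack([f_tr] + proba_tr)
        f_val = np.hstack([f_val] + proba_val)
        # Cascade prediction at this depth: average the layer's class vectors.
        acc = np.mean(np.argmax(np.mean(proba_val, axis=0), axis=1) == y_val)
        if acc <= best_acc + tol:
            break  # no significant improvement: stop adding layers
        best_acc = acc
    return f_tr, f_val
```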
In one possible embodiment, each layer of the cascade forest structure comprises a random forest and a completely random forest; when the decision trees of the random forest are constructed, √d features are randomly selected from the whole feature space as candidate features, where d is the number of input features, and the feature with the best Gini value is then selected as the splitting feature of the node; the completely random forest randomly selects 1 feature from the whole feature space as the splitting feature of each node.
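To make the two split rules concrete, here is a from-scratch sketch; the median split used to score candidates is a simplification for illustration (a real decision tree would scan thresholds), so only the √d-candidate versus single-feature contrast should be read as the described technique.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def choose_split_feature(X, y, completely_random=False, rng=np.random):
    """Random forest: score sqrt(d) random candidate features by Gini;
    completely random forest: pick 1 random feature with no scoring."""
    d = X.shape[1]
    if completely_random:
        return rng.randint(d)  # a single blindly chosen feature
    cand = rng.choice(d, size=max(1, int(np.sqrt(d))), replace=False)

    def split_gini(j):
        # Weighted Gini of a median split on feature j (simplified scoring).
        mask = X[:, j] <= np.median(X[:, j])
        n_l, n_r = mask.sum(), (~mask).sum()
        if n_l == 0 or n_r == 0:
            return 1.0
        return (n_l * gini(y[mask]) + n_r * gini(y[~mask])) / len(y)

    return min(cand, key=split_gini)  # best (lowest) Gini among candidates
```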
In one possible embodiment, the hyper-parameters involved in the deep forest classifier include the number w_i of random forests in each forest layer, the number θ_i of completely random forests in each forest layer, and the number b_i of decision trees contained in each forest; these hyper-parameters are optimized in a multi-objective manner.

The multi-objective optimization uses the deep forest activation function h as the first optimization function and the order of magnitude β of the deep forest parameters as the second optimization function, and the hyper-parameters are optimized through the first and second optimization functions.
the first and second optimization functions are constrained by a first objective and a second objective, wherein the first objective is a root mean square error over a training set as:
Figure BDA0003811435100000103
the second objective is sparsity:
Figure BDA0003811435100000104
wherein x tr Is a training set sample, N tr Is the number of training set samples, o represents the Hadamard product of two numbers converted into vectors, omega ii ,b i Respectively, the hyper-parameters of the deep forest classifier; n represents the number of neurons in each layer and the number of layers of the L neural network;
The objective function of the multi-objective optimization model minimizes the two objectives simultaneously, min F = (E_RMSE, sparsity), so that the deep forest classifier is as sparse as possible on the premise of good performance.
The maximum of the averaged results of the last forest layer in the deep forest classifier is taken as the output classification:

Fin(c) = Max_y{ Ave_m[ c_11, c_12, ..., c_1y, c_21, c_22, ..., c_2y, ..., c_m1, c_m2, ..., c_my ] },

where m is the number of forests contained in each layer of the deep forest, y is the number of classes in the data set, c is a class of the data set, Fin(c) is the classification result output by the deep forest classification model, Max_y takes the maximum over the per-class averages of the last layer's results, and Ave_m is the average of the last layer's results across forests. The invoice information is identified through this classification and finally stored in a json file in a formatted manner.
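A short sketch of this final aggregation and of the formatted storage; the json field names are invented for illustration.

```python
import json
import numpy as np

def final_class(last_layer_vectors):
    """last_layer_vectors: m arrays of shape (y,), one per forest in the
    last layer. Fin(c) = argmax over classes of the per-class average."""
    avg = np.mean(np.stack(last_layer_vectors), axis=0)  # Ave_m
    return int(np.argmax(avg))                           # Max_y

# Formatted storage of the recognized fields (field names are illustrative).
record = {"invoice_code": "...", "invoice_number": "...",
          "amount": "...", "date": "..."}
with open("invoice.json", "w", encoding="utf-8") as fh:
    json.dump(record, fh, ensure_ascii=False, indent=2)
```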
Example 2
Referring to fig. 4, an embodiment of the present invention provides an invoice image information recognition device, comprising a processing unit, a bus unit, a storage unit and an image acquisition unit, wherein the bus unit connects the storage unit, the processing unit and the image acquisition unit, the storage unit stores a computer program and the images acquired by the image acquisition unit, and the computer program, when executed by the processing unit, implements the invoice image information recognition method.
Example 3
The embodiment of the invention provides a storage medium for realizing an invoice image information identification method, wherein the storage medium stores a computer program, and the computer program realizes the invoice image information identification method when being executed by a processor.
In the embodiments provided herein, it should be understood that the disclosed structures and methods may be implemented in other ways. For example, the above-described structural embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and there may be other divisions when the actual implementation is performed, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, structures or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An invoice image information identification method, characterized by comprising the following steps: acquiring an invoice image and preprocessing the acquired invoice image, and inputting the preprocessed image into a residual neural network optimized by an evolutionary algorithm to identify the regions to be identified in the invoice; the residual neural network comprises a first convolution layer, a first pooling layer connected with the first convolution layer, a second convolution layer connected with the first pooling layer, a second pooling layer connected with the second convolution layer, a first residual block connected with the second pooling layer, a second residual block connected with the first residual block, a third convolution layer connected with the second residual block, a third pooling layer connected with the third convolution layer, a third residual block connected with the third pooling layer, a global average pooling layer connected with the third residual block, and a depth forest classifier based on multi-objective optimization connected with the global average pooling layer; and identifying the relevant digital information in each region to be identified, and storing the identified information in a formatted manner.
2. The invoice image information identification method according to claim 1, wherein the preprocessing comprises performing affine transformation on the invoice image to reach a preset size, performing perspective transformation on the invoice image to correct perspective deformation of an area to be identified of the invoice image, and performing edge detection on the invoice image to extract effective information of the image.
3. The invoice image information recognition method of claim 1, wherein the first, second and third residual blocks have the same structure and comprise input layers, the input layers are connected with an internal convolution layer, the output of the internal convolution layer is connected with a threshold screening layer, the output of the internal convolution layer is connected with a global convolution layer, the output of the global convolution layer is connected with a batch normalization layer, a ReLU activation function layer and a Sigmoid activation function layer, the output of the internal convolution layer is weighted with the output of the Sigmoid activation function layer and then input to the threshold screening layer, and the input layers are weighted with the output of the threshold screening layer and then output.
4. The invoice image information recognition method of claim 1, wherein the deep forest classifier adopts a cascade forest structure in which each forest layer is an ensemble of decision trees; the feature vector generated by each forest layer is concatenated with the original feature vector and input into the next layer, up to the last forest layer, and the maximum of the per-class averages of the last forest layer's results is taken as the classification result output by the deep forest classifier.
5. The invoice image information recognition method of claim 4, wherein automatically determining the number of cascade layers according to whether increasing the number of cascade layers improves the performance of the deep forest classifier comprises: each forest generates a class vector through k-fold cross validation, that is, each sample is used as a training sample k−1 times, producing k−1 class vectors; validation data are obtained from the images; each time a new forest layer is grown, the performance of the whole deep forest classifier is evaluated on the validation data, and if the performance shows no significant improvement, no further layers are added.
6. The invoice image information identification method according to claim 4, characterized in that each layer of the cascade forest structure comprises a random forest and a completely random forest; when the decision trees of the random forest are constructed, √d features are randomly selected from the whole feature space as candidate features, where d is the number of input features, and the feature with the best Gini value is then selected as the splitting feature of the node; the completely random forest randomly selects 1 feature from the whole feature space as the splitting feature of each node.
7. The invoice image information recognition method of claim 4, characterized in that the hyper-parameters involved in the deep forest classifier include the number w_i of random forests in each forest layer, the number θ_i of completely random forests in each forest layer, and the number b_i of decision trees contained in each forest, and the hyper-parameters are optimized in a multi-objective manner;

the multi-objective optimization uses the deep forest activation function h as the first optimization function and the order of magnitude β of the deep forest parameters as the second optimization function, and the hyper-parameters are optimized through the first and second optimization functions;
the first and second optimization functions are constrained by a first and second objective, where the first objective is the root mean square error over the training set as:
Figure FDA0003811435090000024
the second objective is sparsity:
Figure FDA0003811435090000025
wherein x tr Is a training set sample, N tr Is the number of training set samples, o represents the Hadamard product of two numbers converted into vectors, omega ii ,b i Respectively, the hyper-parameters of the deep forest classifier; n represents the number of neurons in each layer and the number of layers of the L neural network;
the objective function of the multi-objective optimization model minimizes the two objectives simultaneously, min F = (E_RMSE, sparsity), so that the deep forest classifier is as sparse as possible on the premise of good performance.
8. The invoice image information identification method according to claim 1, characterized in that the residual neural network is optimized by an evolutionary algorithm whose selection operator rearranges the population individuals using a ranking method; after rearrangement, the probability of an individual being selected is

s = p_0 / (1 − (1 − p_0)^a),
p = s(1 − p_0)^(b−1),

where a is the population size of the evolutionary algorithm, p_0 is the probability that the optimal individual is selected, s is the value obtained by normalizing p_0, and b is the position of the individual after the population is rearranged;
the evolutionary algorithm uses the reciprocal of the sum of squared errors as the fitness function:

F(j) = 1/E(j), where E(j) = Σ (y_j − P)²,

E is the sum of squared errors, P is the overall output computed from the weights w and the input features x, F is the fitness, j is the generation number, and y_j is the theoretical output.
9. An invoice image information recognition device, characterized by comprising: a processing unit, a bus unit, a storage unit and an image acquisition unit, wherein the bus unit connects the storage unit, the processing unit and the image acquisition unit, the storage unit stores a computer program, and the computer program, when executed by the processing unit, implements the invoice image information recognition method according to any one of claims 1-8.
10. A storage medium for implementing an invoice image information recognition method, the storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the invoice image information recognition method according to any one of claims 1-8.
CN202211012411.3A 2022-08-23 2022-08-23 Invoice image information identification method and device and storage medium Pending CN115471856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211012411.3A CN115471856A (en) 2022-08-23 2022-08-23 Invoice image information identification method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211012411.3A CN115471856A (en) 2022-08-23 2022-08-23 Invoice image information identification method and device and storage medium

Publications (1)

Publication Number Publication Date
CN115471856A true CN115471856A (en) 2022-12-13

Family

ID=84367715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211012411.3A Pending CN115471856A (en) 2022-08-23 2022-08-23 Invoice image information identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN115471856A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809262A (en) * 2024-03-01 2024-04-02 广州宇中网络科技有限公司 Real-time image recognition method and customer behavior analysis system
CN117809262B (en) * 2024-03-01 2024-05-28 广州宇中网络科技有限公司 Real-time image recognition method and customer behavior analysis system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination